
Hybrid Search Implementation Guide

NeuraFlow implements a hybrid search system that combines dense vector semantic search with BM25 lexical search using Reciprocal Rank Fusion (RRF). This approach provides consistent retrieval quality across varying query lengths and supports multilingual content.


Overview

Architecture Components


Problem Statement

When using text-embedding-3-large for semantic search, we observed inconsistent similarity scores:

| Query Type | Example | Semantic Score Range |
|---|---|---|
| Single word | "pricing" | 0.10 - 0.20 |
| Short phrase | "return policy" | 0.15 - 0.25 |
| Full question | "What is the return policy for electronics?" | 0.25 - 0.40 |

Problems:

  • Inconsistent thresholds: A threshold of 0.3 would filter out valid short queries but be too permissive for long queries
  • Poor precision on short queries: Single-word queries produced many false positives
  • Semantic drift: Embeddings sometimes captured related but irrelevant concepts

Why text-embedding-ada-002 Was Not the Solution

While text-embedding-ada-002 produced more consistent scores (similar to post-RRF scores), it was unsuitable for production due to:

| Limitation | Impact |
|---|---|
| English-centric | Poor performance on non-English content |
| Legacy model | No future updates or improvements |
| Limited multilingual | Inconsistent cross-language retrieval |

Business requirement: supporting multilingual content and planned localization required text-embedding-3-large.


Experimental Findings & Background

This section documents the empirical observations that led to the adoption of hybrid search.

Semantic Search Score Analysis with text-embedding-3-large

During testing with the text-embedding-3-large embedding model, we observed that semantic similarity scores were significantly lower than expected, particularly for shorter queries:

| Query Length | Example Query | Observed Score Range | Retrieval Quality |
|---|---|---|---|
| 1 word | "pricing" | 0.1 - 0.2 | Poor - many relevant results filtered out |
| 2-3 words | "return policy" | 0.15 - 0.25 | Inconsistent |
| Short phrase | "how to return items" | 0.20 - 0.30 | Moderate |
| Full sentence | "What is the return policy for electronics?" | 0.25 - 0.40 | Better but still low |
| Long query | "Can you explain the process for returning defective electronics purchased online?" | 0.30 - 0.45 | Best, but threshold still problematic |

The Threshold Dilemma

The inconsistent score ranges created a fundamental problem when selecting a retrieval threshold:

| Threshold | Effect on Short Queries | Effect on Long Queries |
|---|---|---|
| 0.15 | Retrieves relevant content | Too permissive, includes noise |
| 0.25 | Filters out valid results | Still somewhat permissive |
| 0.35 | Most results filtered out | Good precision |
| 0.40 | Almost nothing retrieved | Misses some relevant content |

Conclusion: No single threshold could provide consistent retrieval quality across varying query lengths.

Comparison with text-embedding-ada-002

Interestingly, the legacy text-embedding-ada-002 model produced scores that were more aligned with expected retrieval behavior:

| Query Type | ada-002 Score | 3-large Score | Difference |
|---|---|---|---|
| 1 word | 0.50 - 0.65 | 0.10 - 0.20 | -0.40 |
| Short phrase | 0.55 - 0.70 | 0.20 - 0.30 | -0.35 |
| Full sentence | 0.60 - 0.80 | 0.30 - 0.45 | -0.30 |

The text-embedding-ada-002 scores were similar to what we later achieved with hybrid search + RRF fusion. However, switching back to ada-002 was not viable because:

  1. Multilingual retrieval: text-embedding-3-large provides superior performance for non-English content
  2. Future localization: Product roadmap includes support for multiple languages
  3. Model longevity: ada-002 is a legacy model with no future improvements expected
  4. Cross-lingual search: text-embedding-3-large enables searching across language boundaries

The Hybrid Search Solution

Introducing hybrid search with BM25 lexical matching and RRF fusion brought the fused scores far more in line with actual content relevance:

| Query Type | Pure Semantic (3-large) | Hybrid + RRF | Improvement |
|---|---|---|---|
| 1 word | 0.10 - 0.20 | 0.50 - 0.70 | +300% |
| Short phrase | 0.20 - 0.30 | 0.55 - 0.72 | +175% |
| Full sentence | 0.30 - 0.45 | 0.60 - 0.80 | +80% |

Key Finding: Universal Threshold

After implementing hybrid search, a single threshold of 0.5 reliably retrieved accurate and relevant content across all query types.

Summary of Findings

| Aspect | Finding |
|---|---|
| Root cause | text-embedding-3-large produces lower absolute similarity scores than ada-002 |
| Impact | Threshold selection was query-length dependent |
| Solution | Hybrid search (semantic + BM25) with RRF fusion |
| Result | Consistent scores enabling a universal 0.5 threshold |
| Trade-off accepted | Slight increase in query latency (~30ms) for significantly improved consistency |
| Why not ada-002 | Required text-embedding-3-large for multilingual support and future localization |

Solution: Hybrid Search with RRF

How Hybrid Search Solves the Problem

Results after implementing hybrid search:

| Query Type | Semantic Score | RRF Fused Score | Improvement |
|---|---|---|---|
| Single word | 0.10 - 0.20 | 0.50 - 0.70 | +250% |
| Short phrase | 0.15 - 0.25 | 0.55 - 0.75 | +200% |
| Full question | 0.25 - 0.40 | 0.60 - 0.85 | +100% |

Key benefit: A universal threshold of 0.5 now works consistently across all query types.


Search Types

NeuraFlow supports three search modes:

Search Type Comparison

| Feature | Semantic | Lexical | Hybrid |
|---|---|---|---|
| Best for | Conceptual queries | Exact keyword matches | General use |
| Query length | Longer queries | Any length | Any length |
| Synonym handling | Excellent | None | Good |
| Typo tolerance | Good | Poor | Moderate |
| Speed | Fast | Very fast | Moderate |
| Recommended | Conceptual search | Known keywords | Default |

Technical Implementation

Reciprocal Rank Fusion (RRF)

RRF combines rankings from multiple search methods without requiring score normalization:

RRF Formula:

RRF_score(d) = Σ_i 1 / (k + rank_i(d))

Where:

  • d = document
  • k = ranking constant (typically 60)
  • rank_i(d) = rank of document in search method i
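As a quick sanity check, the formula can be implemented in a few lines of Python (the ranked lists below are hypothetical). Note that raw RRF scores are on the order of 1/k, so the 0.5-0.8 fused ranges quoted in this guide imply a normalization step applied on top of the raw fusion scores.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per document, with rank starting
    at 1 for the top hit. Documents absent from a list contribute nothing
    for that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical ranked lists from the two search methods
dense = ["doc_a", "doc_b", "doc_c"]   # semantic ranking
bm25 = ["doc_b", "doc_d", "doc_a"]    # lexical ranking

fused = rrf_fuse([dense, bm25])
# doc_b wins: rank 2 semantically plus rank 1 lexically beats doc_a's 1 and 3
```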

Qdrant Named Vectors

The collection uses named vectors for hybrid search:

from qdrant_client.models import Distance, Modifier, SparseVectorParams, VectorParams

# Collection configuration
vectors_config = {
    "dense": VectorParams(
        size=3072,  # text-embedding-3-large
        distance=Distance.COSINE,
    )
}

sparse_vectors_config = {
    "bm25": SparseVectorParams(
        modifier=Modifier.IDF,  # Required for BM25 scoring
    )
}

Document Indexing

Each document chunk is stored with both dense and sparse vectors:

from qdrant_client.models import Document, PointStruct

PointStruct(
    id=point_id,
    vector={
        "dense": [0.123, 0.456, ...],  # 3072-dim embedding
        "bm25": Document(
            text="chunk text content",
            model="Qdrant/bm25",  # Server-side tokenization
        ),
    },
    payload={
        "document_id": "uuid",
        "chunk_index": 0,
        "text": "chunk content",
        "metadata": {...},
    },
)

Search Flow
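The flow can be sketched end-to-end in plain Python. The two backend functions below are hypothetical stand-ins for Qdrant's dense and sparse searches (each returning chunk IDs in rank order), and the 2x prefetch factor mirrors the default mentioned under Performance Considerations; this is an illustrative sketch, not the production code path.

```python
def hybrid_search(query: str, limit: int = 10, k: int = 60):
    """Sketch of the hybrid flow: prefetch from both indexes, fuse with RRF."""
    prefetch = limit * 2  # prefetch 2x the requested limit from each index
    dense_hits = semantic_topk(query, prefetch)
    sparse_hits = lexical_topk(query, prefetch)

    # Reciprocal Rank Fusion over both ranked lists
    scores: dict[str, float] = {}
    for hits in (dense_hits, sparse_hits):
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)

    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:limit]

# Toy backends standing in for the real vector store
def semantic_topk(query, n):
    return ["chunk-2", "chunk-7", "chunk-1"][:n]

def lexical_topk(query, n):
    return ["chunk-7", "chunk-9"][:n]

top = hybrid_search("vacation policy", limit=3)
# chunk-7 ranks first: it appears near the top of both lists
```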


API Reference

Search Endpoint

POST /api/v1/kb/search
Content-Type: application/json

{
  "query": "What is the vacation policy?",
  "search_type": "hybrid",
  "limit": 10,
  "score_threshold": 0.5,
  "document_ids": ["uuid-1", "uuid-2"]
}

Request Parameters

| Field | Type | Default | Description |
|---|---|---|---|
| query | string | required | Search query (1-500 characters) |
| search_type | enum | "semantic" | "semantic", "lexical", or "hybrid" |
| limit | int | 10 | Max results (1-50) |
| score_threshold | float | 0.0 | Min score filter (0.0-1.0) |
| document_ids | UUID[] | null | Filter by specific documents |

Response Format

{
  "query": "What is the vacation policy?",
  "results": [
    {
      "document_id": "550e8400-e29b-41d4-a716-446655440000",
      "chunk_index": 5,
      "text": "Employees are entitled to 20 days of paid vacation per year...",
      "score": 0.72,
      "metadata": {
        "filename": "employee_handbook.pdf",
        "page": 15
      }
    }
  ],
  "total_results": 1
}

Search Type Enum

from enum import Enum

class SearchType(str, Enum):
    SEMANTIC = "semantic"  # Dense vector similarity
    LEXICAL = "lexical"    # BM25 keyword search
    HYBRID = "hybrid"      # Combined with RRF fusion

Agent Flow Integration

KB Tools Configuration

The knowledge base tool in agent flows uses hybrid search by default:

# core/app/services/graph/tools/kb_tools.py

def create_kb_tools(
    kb_service,
    user_id: Optional[UUID] = None,
    document_ids: Optional[List[str]] = None,
    score_threshold: float = 0.3,
    limit: int = 5,
    search_type: str = "hybrid",  # Default to hybrid
):

Search Type in Agent Nodes

Agent nodes can specify search type in their configuration:

{
  "type": "agent",
  "data": {
    "name": "Support Agent",
    "knowledgeBases": [
      {
        "id": "doc-uuid",
        "title": "Employee Handbook"
      }
    ],
    "searchType": "hybrid",
    "scoreThreshold": 0.5
  }
}

Configuration

Environment Variables

# Embedding Model Configuration
OPENAI_EMBEDDING_MODEL=text-embedding-3-large

# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=<optional-api-key>
QDRANT_COLLECTION_NAME=knowledge_base

# Default Search Settings (optional)
KB_DEFAULT_SEARCH_TYPE=hybrid
KB_SCORE_THRESHOLD=0.5
KB_SEARCH_LIMIT=10
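The application's actual settings loader is not shown here, but the three optional search variables can be read with standard-library fallbacks along these lines (the helper below is illustrative, not part of the codebase):

```python
import os

def kb_search_defaults() -> dict:
    """Read optional KB search settings from the environment with fallbacks."""
    return {
        "search_type": os.environ.get("KB_DEFAULT_SEARCH_TYPE", "hybrid"),
        "score_threshold": float(os.environ.get("KB_SCORE_THRESHOLD", "0.5")),
        "limit": int(os.environ.get("KB_SEARCH_LIMIT", "10")),
    }

# e.g. override the threshold for a high-precision deployment
os.environ["KB_SCORE_THRESHOLD"] = "0.6"
defaults = kb_search_defaults()
```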

Embedding Model Options

| Model | Dimensions | Multilingual | Recommended |
|---|---|---|---|
| text-embedding-3-large | 3072 | Excellent | Yes |
| text-embedding-3-small | 1536 | Good | Budget |
| text-embedding-ada-002 | 1536 | Limited | Legacy |

Score Threshold Guidelines

| Use Case | Threshold | Rationale |
|---|---|---|
| High recall (chatbots) | 0.3 - 0.4 | Include borderline relevant results |
| Balanced | 0.5 | Recommended default |
| High precision | 0.6 - 0.7 | Only highly relevant results |
| Strict matching | 0.8+ | Near-exact matches only |
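To make the recall/precision trade-off concrete, here is a toy threshold filter over hypothetical scored results (the scores and texts are made up for illustration):

```python
def filter_by_threshold(results, threshold):
    """Keep only results whose score meets the minimum threshold."""
    return [r for r in results if r["score"] >= threshold]

results = [
    {"text": "exact policy chunk", "score": 0.82},
    {"text": "related chunk", "score": 0.55},
    {"text": "borderline chunk", "score": 0.35},
]

high_recall = filter_by_threshold(results, 0.3)     # keeps all 3 results
balanced = filter_by_threshold(results, 0.5)        # keeps the top 2
high_precision = filter_by_threshold(results, 0.7)  # keeps only the top 1
```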

Developer Setup

Prerequisites

| Requirement | Details |
|---|---|
| Qdrant | v1.7+ with hybrid search support |
| OpenAI API | Access to embedding models |
| Python | 3.11+ |

Step 1: Configure Qdrant Collection

The collection is created automatically with hybrid support:

# Automatic collection creation with hybrid vectors
qdrant_client._create_hybrid_collection(
    collection_name="knowledge_base",
    vector_size=3072,  # text-embedding-3-large
    distance=Distance.COSINE,
    enable_hybrid=True,
)

Step 2: Index Documents

Documents are indexed with both dense and sparse vectors:

# Document processing pipeline
1. Parse document → chunks
2. Generate embeddings (text-embedding-3-large)
3. Upsert hybrid vectors (dense + BM25 text)

Step 3: Run a Search

With documents indexed, searches go through the retrieval service:

from app.services.knowledge_base.retrieval_service import RetrievalService

retrieval = RetrievalService()

# Hybrid search (recommended)
results = await retrieval.search(
    query="vacation policy",
    limit=10,
    score_threshold=0.5,
    search_type="hybrid",
)

Troubleshooting

"Low scores on short queries"

| Cause | Solution |
|---|---|
| Using semantic search only | Switch to search_type: "hybrid" |
| Threshold too high | Lower to 0.3-0.5 for hybrid search |
| Old embedding model | Ensure you are using text-embedding-3-large |

"BM25 search returns no results"

| Cause | Solution |
|---|---|
| Sparse vectors not indexed | Re-index documents with hybrid vectors |
| Query too short | BM25 needs at least one meaningful term |
| Collection missing BM25 config | Recreate the collection with sparse vectors |

"Hybrid search slower than semantic"

| Cause | Solution |
|---|---|
| Large prefetch limit | Reduce from 2x to 1.5x if needed |
| Many documents | Add document_id filters |
| Network latency | Consider colocating Qdrant with the application |

"Inconsistent scores across deployments"

| Cause | Solution |
|---|---|
| Different embedding models | Standardize on text-embedding-3-large |
| Collection recreation | Re-embed all documents together |
| Version mismatch | Ensure Qdrant version consistency |

Performance Considerations

Latency Comparison

| Search Type | Avg Latency | Notes |
|---|---|---|
| Semantic | ~50ms | Single vector search |
| Lexical | ~30ms | BM25 is very fast |
| Hybrid | ~80ms | Two searches + fusion |

Optimization Tips

  1. Use document filters: Filter by document_ids when possible
  2. Limit prefetch: Default 2x is usually sufficient
  3. Cache embeddings: Query embeddings can be cached for repeated searches
  4. Batch searches: Use search_by_multiple_documents for efficiency
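Tip 3 above (caching query embeddings) can be sketched with the standard library. The embed_query function here is a hypothetical stand-in for the OpenAI embedding call, not the project's embedding_service:

```python
from functools import lru_cache

CALLS = 0  # counts how often the "API" is actually hit

@lru_cache(maxsize=1024)
def embed_query(query: str) -> tuple:
    """Hypothetical stand-in for the OpenAI embedding call.

    Returns a tuple (hashable) so the result can be cached; a real
    implementation would call the embeddings API here instead.
    """
    global CALLS
    CALLS += 1
    return tuple(float(ord(c)) for c in query[:8])

embed_query("vacation policy")
embed_query("vacation policy")  # served from the cache, no second call
```

Repeated searches with the same query text then skip the embedding round-trip entirely.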

Quick Reference

Search Type Selection

| Setting | Value | Rationale |
|---|---|---|
| Search Type | hybrid | Best overall performance |
| Score Threshold | 0.5 | Balanced precision/recall |
| Limit | 5-10 | Sufficient context for LLMs |
| Embedding Model | text-embedding-3-large | Multilingual support |

API Quick Start

# Hybrid search (recommended)
curl -X POST "http://localhost:8009/api/v1/kb/search" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "vacation policy",
    "search_type": "hybrid",
    "limit": 10,
    "score_threshold": 0.5
  }'


Appendix: Code References

Backend Files

| File | Purpose |
|---|---|
| core/app/services/knowledge_base/retrieval_service.py | Search orchestration, filtering, reranking |
| core/app/vector_store/vector_operations.py | Qdrant operations (semantic/lexical/hybrid) |
| core/app/vector_store/qdrant_client.py | Qdrant connection, hybrid collection setup |
| core/app/services/knowledge_base/embedding_service.py | OpenAI embedding generation |
| core/app/schemas/knowledge_base.py | SearchType enum, request/response schemas |
| core/app/api/v1/knowledge_base.py | REST API endpoint |
| core/app/services/graph/tools/kb_tools.py | Agent flow KB tool integration |

Key Functions

| Function | Location | Purpose |
|---|---|---|
| hybrid_search() | vector_operations.py:765 | RRF fusion search |
| semantic_search() | vector_operations.py:628 | Dense vector search |
| lexical_search() | vector_operations.py:697 | BM25 sparse search |
| search() | retrieval_service.py:73 | High-level search router |
| upsert_hybrid_vectors_batch() | qdrant_client.py:505 | Index documents with hybrid vectors |

Support

For issues or questions: