Hybrid Search Implementation Guide
NeuraFlow implements a hybrid search system that combines dense vector semantic search with BM25 lexical search using Reciprocal Rank Fusion (RRF). This approach provides consistent retrieval quality across varying query lengths and supports multilingual content.
Problem Statement
The Challenge with Pure Semantic Search
When using text-embedding-3-large for semantic search, we observed inconsistent similarity scores:
| Query Type | Example | Semantic Score Range |
|---|---|---|
| Single word | "pricing" | 0.10 - 0.20 |
| Short phrase | "return policy" | 0.15 - 0.25 |
| Full question | "What is the return policy for electronics?" | 0.25 - 0.40 |
Problems:
- Inconsistent thresholds: A threshold of 0.3 would filter out valid short queries but be too permissive for long queries
- Poor precision on short queries: Single-word queries produced many false positives
- Semantic drift: Embeddings sometimes captured related but irrelevant concepts
Why text-embedding-ada-002 Was Not the Solution
While text-embedding-ada-002 produced more consistent scores (similar to post-RRF scores), it was unsuitable for production due to:
| Limitation | Impact |
|---|---|
| English-centric | Poor performance on non-English content |
| Legacy model | No future updates or improvements |
| Limited multilingual | Inconsistent cross-language retrieval |
Business requirement: Support multilingual content and future localization capabilities required text-embedding-3-large.
Experimental Findings & Background
This section documents the empirical observations that led to the adoption of hybrid search.
Semantic Search Score Analysis with text-embedding-3-large
During testing with the text-embedding-3-large embedding model, we observed that semantic similarity scores were significantly lower than expected, particularly for shorter queries:
| Query Length | Example Query | Observed Score Range | Retrieval Quality |
|---|---|---|---|
| 1 word | "pricing" | 0.1 - 0.2 | Poor - many relevant results filtered out |
| 2-3 words | "return policy" | 0.15 - 0.25 | Inconsistent |
| Short phrase | "how to return items" | 0.20 - 0.30 | Moderate |
| Full sentence | "What is the return policy for electronics?" | 0.25 - 0.40 | Better but still low |
| Long query | "Can you explain the process for returning defective electronics purchased online?" | 0.30 - 0.45 | Best, but threshold still problematic |
The Threshold Dilemma
The inconsistent score ranges created a fundamental problem when selecting a retrieval threshold:
| Threshold | Effect on Short Queries | Effect on Long Queries |
|---|---|---|
| 0.15 | Retrieves relevant content | Too permissive, includes noise |
| 0.25 | Filters out valid results | Still somewhat permissive |
| 0.35 | Most results filtered out | Good precision |
| 0.40 | Almost nothing retrieved | Misses some relevant content |
Conclusion: No single threshold could provide consistent retrieval quality across varying query lengths.
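The dilemma can be seen concretely with a handful of illustrative hits drawn from the score ranges above (the scores and relevance labels here are invented for the sketch):

```python
# Illustrative (query_type, score, relevant) triples, using the score
# ranges reported in the tables above; labels are made up for the demo.
hits = [
    ("single word", 0.18, True),     # relevant, but embeds with a low score
    ("single word", 0.12, False),
    ("full question", 0.32, False),  # noise that still clears 0.3
    ("full question", 0.41, True),
]

def retrieved(threshold):
    """Apply a single global score threshold to all hits."""
    return [h for h in hits if h[1] >= threshold]

# A 0.30 threshold drops the relevant single-word hit yet keeps
# long-query noise; 0.15 keeps the noise too. No single value works.
at_030 = retrieved(0.30)
at_015 = retrieved(0.15)
```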
Comparison with text-embedding-ada-002
Interestingly, the legacy text-embedding-ada-002 model produced scores that were more aligned with expected retrieval behavior:
| Query Type | ada-002 Score | 3-large Score | Difference |
|---|---|---|---|
| 1 word | 0.50 - 0.65 | 0.10 - 0.20 | -0.40 |
| Short phrase | 0.55 - 0.70 | 0.20 - 0.30 | -0.35 |
| Full sentence | 0.60 - 0.80 | 0.30 - 0.45 | -0.30 |
The text-embedding-ada-002 scores were similar to what we later achieved with hybrid search + RRF fusion. However, switching back to ada-002 was not viable because:
- Multilingual retrieval: text-embedding-3-large provides superior performance for non-English content
- Future localization: Product roadmap includes support for multiple languages
- Model longevity: ada-002 is a legacy model with no future improvements expected
- Cross-lingual search: text-embedding-3-large enables searching across language boundaries
The Hybrid Search Solution
By introducing hybrid search with BM25 lexical matching and RRF fusion, the fused scores aligned far more closely with actual content relevance:
| Query Type | Pure Semantic (3-large) | Hybrid + RRF | Improvement |
|---|---|---|---|
| 1 word | 0.10 - 0.20 | 0.50 - 0.70 | +300% |
| Short phrase | 0.20 - 0.30 | 0.55 - 0.72 | +175% |
| Full sentence | 0.30 - 0.45 | 0.60 - 0.80 | +80% |
Key Finding: Universal Threshold
After implementing hybrid search, a single threshold of 0.5 retrieved accurate, relevant content consistently across all query types.
Summary of Findings
| Aspect | Finding |
|---|---|
| Root cause | text-embedding-3-large produces lower absolute similarity scores than ada-002 |
| Impact | Threshold selection was query-length dependent |
| Solution | Hybrid search (semantic + BM25) with RRF fusion |
| Result | Consistent scores enabling a universal 0.5 threshold |
| Trade-off accepted | Slight increase in query latency (~30ms) for significantly improved consistency |
| Why not ada-002 | Required text-embedding-3-large for multilingual support and future localization |
Solution: Hybrid Search with RRF
How Hybrid Search Solves the Problem
Results after implementing hybrid search:
| Query Type | Semantic Score | RRF Fused Score | Improvement |
|---|---|---|---|
| Single word | 0.10 - 0.20 | 0.50 - 0.70 | +250% |
| Short phrase | 0.15 - 0.25 | 0.55 - 0.75 | +200% |
| Full question | 0.25 - 0.40 | 0.60 - 0.85 | +100% |
Key benefit: A universal threshold of 0.5 now works consistently across all query types.
Search Types
NeuraFlow supports three search modes:
Search Type Comparison
| Feature | Semantic | Lexical | Hybrid |
|---|---|---|---|
| Best for | Conceptual queries | Exact keyword matches | General use |
| Query length | Longer queries | Any length | Any length |
| Synonym handling | Excellent | None | Good |
| Typo tolerance | Good | Poor | Moderate |
| Speed | Fast | Very fast | Moderate |
| Recommended | Conceptual search | Known keywords | Default |
Technical Implementation
Reciprocal Rank Fusion (RRF)
RRF combines rankings from multiple search methods without requiring score normalization:
RRF Formula:
RRF_score(d) = Σ 1 / (k + rank_i(d))
Where:
- d = document
- k = ranking constant (typically 60)
- rank_i(d) = rank of document d in search method i
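The formula can be sketched in a few lines of Python (a toy implementation for intuition; in NeuraFlow the fusion itself is performed server-side by Qdrant):

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document ids using Reciprocal Rank Fusion.

    rankings: iterable of ranked lists, best document first.
    k: ranking constant; 60 is the commonly used value.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            # Each method contributes 1 / (k + rank) for every doc it ranks
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d2"]  # dense-vector ranking
lexical = ["d1", "d4", "d3"]   # BM25 ranking
fused = rrf_fuse([semantic, lexical])
# d1 ranks high in both lists, so it comes out on top after fusion
```

Note that RRF only looks at ranks, never raw scores, which is exactly why it sidesteps the score-normalization problem described above.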
Qdrant Named Vectors
The collection uses named vectors for hybrid search:
# Collection Configuration
from qdrant_client.models import Distance, Modifier, SparseVectorParams, VectorParams

vectors_config = {
    "dense": VectorParams(
        size=3072,  # text-embedding-3-large
        distance=Distance.COSINE,
    )
}
sparse_vectors_config = {
    "bm25": SparseVectorParams(
        modifier=Modifier.IDF,  # Required for BM25 scoring
    )
}
Document Indexing
Each document chunk is stored with both dense and sparse vectors:
from qdrant_client.models import Document, PointStruct

PointStruct(
    id=point_id,
    vector={
        "dense": [0.123, 0.456, ...],  # 3072-dim embedding
        "bm25": Document(
            text="chunk text content",
            model="Qdrant/bm25",  # Server-side tokenization
        ),
    },
    payload={
        "document_id": "uuid",
        "chunk_index": 0,
        "text": "chunk content",
        "metadata": {...}
    }
)
API Reference
Search Endpoint
POST /api/v1/kb/search
Content-Type: application/json
{
"query": "What is the vacation policy?",
"search_type": "hybrid",
"limit": 10,
"score_threshold": 0.5,
"document_ids": ["uuid-1", "uuid-2"]
}
Request Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| query | string | required | Search query (1-500 characters) |
| search_type | enum | "semantic" | "semantic", "lexical", or "hybrid" |
| limit | int | 10 | Max results (1-50) |
| score_threshold | float | 0.0 | Min score filter (0.0-1.0) |
| document_ids | UUID[] | null | Filter by specific documents |
Response Format
{
"query": "What is the vacation policy?",
"results": [
{
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"chunk_index": 5,
"text": "Employees are entitled to 20 days of paid vacation per year...",
"score": 0.72,
"metadata": {
"filename": "employee_handbook.pdf",
"page": 15
}
}
],
"total_results": 1
}
Search Type Enum
class SearchType(str, Enum):
SEMANTIC = "semantic" # Dense vector similarity
LEXICAL = "lexical" # BM25 keyword search
HYBRID = "hybrid" # Combined with RRF fusion
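Because SearchType subclasses str, raw values from JSON requests coerce directly to the enum and compare against plain strings, e.g.:

```python
from enum import Enum

class SearchType(str, Enum):
    SEMANTIC = "semantic"  # Dense vector similarity
    LEXICAL = "lexical"    # BM25 keyword search
    HYBRID = "hybrid"      # Combined with RRF fusion

# A raw request value maps straight to the enum member...
st = SearchType("hybrid")
assert st is SearchType.HYBRID
# ...and the member still behaves like the underlying string.
assert st == "hybrid"
```

An unknown value such as "fuzzy" raises ValueError, which gives request validation for free.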
Agent Flow Integration
KB Tools Configuration
The knowledge base tool in agent flows uses hybrid search by default:
# core/app/services/graph/tools/kb_tools.py
def create_kb_tools(
kb_service,
user_id: Optional[UUID] = None,
document_ids: Optional[List[str]] = None,
score_threshold: float = 0.3,
limit: int = 5,
search_type: str = "hybrid", # Default to hybrid
):
Search Type in Agent Nodes
Agent nodes can specify search type in their configuration:
{
"type": "agent",
"data": {
"name": "Support Agent",
"knowledgeBases": [
{
"id": "doc-uuid",
"title": "Employee Handbook"
}
],
"searchType": "hybrid",
"scoreThreshold": 0.5
}
}
Configuration
Environment Variables
# Embedding Model Configuration
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=<optional-api-key>
QDRANT_COLLECTION_NAME=knowledge_base
# Default Search Settings (optional)
KB_DEFAULT_SEARCH_TYPE=hybrid
KB_SCORE_THRESHOLD=0.5
KB_SEARCH_LIMIT=10
Embedding Model Options
| Model | Dimensions | Multilingual | Recommended |
|---|---|---|---|
| text-embedding-3-large | 3072 | Excellent | Yes |
| text-embedding-3-small | 1536 | Good | Budget |
| text-embedding-ada-002 | 1536 | Limited | Legacy |
Score Threshold Guidelines
| Use Case | Threshold | Rationale |
|---|---|---|
| High recall (chatbots) | 0.3 - 0.4 | Include borderline relevant results |
| Balanced | 0.5 | Recommended default |
| High precision | 0.6 - 0.7 | Only highly relevant results |
| Strict matching | 0.8+ | Near-exact matches only |
Developer Setup
Prerequisites
| Requirement | Details |
|---|---|
| Qdrant | v1.7+ with hybrid search support |
| OpenAI API | Access to embedding models |
| Python | 3.11+ |
Step 1: Configure Qdrant Collection
The collection is created automatically with hybrid support:
# Automatic collection creation with hybrid vectors
qdrant_client._create_hybrid_collection(
collection_name="knowledge_base",
vector_size=3072, # text-embedding-3-large
distance=Distance.COSINE,
enable_hybrid=True,
)
Step 2: Index Documents
Documents are indexed with both dense and sparse vectors:
# Document processing pipeline
1. Parse document → chunks
2. Generate embeddings (text-embedding-3-large)
3. Upsert hybrid vectors (dense + BM25 text)
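Step 1 of the pipeline can be sketched as a fixed-size splitter with overlap (a simplification: the real parser is format-aware, and the size/overlap values here are illustrative, not the production settings):

```python
def chunk_text(text, size=200, overlap=40):
    """Split text into fixed-size chunks that overlap by `overlap`
    characters, so content cut at a boundary still appears whole in
    at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start, step = [], 0, size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

doc = "abcdefghij" * 50  # 500-character stand-in document
chunks = chunk_text(doc)
# chunk starts at 0, 160, 320, 480 -> 4 chunks;
# adjacent chunks share their 40-character overlap
```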
Step 3: Query with Hybrid Search
from app.services.knowledge_base.retrieval_service import RetrievalService
retrieval = RetrievalService()
# Hybrid search (recommended)
results = await retrieval.search(
query="vacation policy",
limit=10,
score_threshold=0.5,
search_type="hybrid",
)
Troubleshooting
"Low scores on short queries"
| Cause | Solution |
|---|---|
| Using semantic search only | Switch to search_type: "hybrid" |
| Threshold too high | Lower to 0.3-0.5 for hybrid search |
| Old embedding model | Ensure using text-embedding-3-large |
"BM25 search returns no results"
| Cause | Solution |
|---|---|
| Sparse vectors not indexed | Re-index documents with hybrid vectors |
| Query too short | BM25 needs at least 1 meaningful term |
| Collection missing BM25 config | Recreate collection with sparse vectors |
"Hybrid search slower than semantic"
| Cause | Solution |
|---|---|
| Large prefetch limit | Reduce from 2x to 1.5x if needed |
| Many documents | Add document_id filters |
| Network latency | Consider Qdrant colocation |
"Inconsistent scores across deployments"
| Cause | Solution |
|---|---|
| Different embedding models | Standardize on text-embedding-3-large |
| Collection recreation | Re-embed all documents together |
| Version mismatch | Ensure Qdrant version consistency |
Performance Considerations
Latency Comparison
| Search Type | Avg Latency | Notes |
|---|---|---|
| Semantic | ~50ms | Single vector search |
| Lexical | ~30ms | BM25 is very fast |
| Hybrid | ~80ms | Two searches + fusion |
Optimization Tips
- Use document filters: Filter by document_ids when possible
- Limit prefetch: Default 2x is usually sufficient
- Cache embeddings: Query embeddings can be cached for repeated searches
- Batch searches: Use search_by_multiple_documents for efficiency
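The embedding-cache tip can be sketched with functools.lru_cache; _embed below is a deterministic, hypothetical stand-in for the real OpenAI call (the production path goes through embedding_service instead):

```python
from functools import lru_cache

def _embed(query: str) -> tuple:
    # Hypothetical stand-in for the embedding API call; returns a
    # deterministic fake vector purely for illustration.
    return tuple(float(ord(c)) for c in query)

@lru_cache(maxsize=1024)
def cached_query_embedding(query: str) -> tuple:
    # Repeated searches for the same query reuse the cached vector,
    # saving one embedding round-trip per cache hit.
    return _embed(query)

v1 = cached_query_embedding("vacation policy")
v2 = cached_query_embedding("vacation policy")
# The second call is served from the cache (same object, one hit recorded)
```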
Quick Reference
Recommended Defaults
| Setting | Value | Rationale |
|---|---|---|
| Search Type | hybrid | Best overall performance |
| Score Threshold | 0.5 | Balanced precision/recall |
| Limit | 5-10 | Sufficient context for LLMs |
| Embedding Model | text-embedding-3-large | Multilingual support |
API Quick Start
# Hybrid search (recommended)
curl -X POST "http://localhost:8009/api/v1/kb/search" \
-H "Content-Type: application/json" \
-d '{
"query": "vacation policy",
"search_type": "hybrid",
"limit": 10,
"score_threshold": 0.5
}'
Appendix: Code References
Backend Files
| File | Purpose |
|---|---|
| core/app/services/knowledge_base/retrieval_service.py | Search orchestration, filtering, reranking |
| core/app/vector_store/vector_operations.py | Qdrant operations (semantic/lexical/hybrid) |
| core/app/vector_store/qdrant_client.py | Qdrant connection, hybrid collection setup |
| core/app/services/knowledge_base/embedding_service.py | OpenAI embedding generation |
| core/app/schemas/knowledge_base.py | SearchType enum, request/response schemas |
| core/app/api/v1/knowledge_base.py | REST API endpoint |
| core/app/services/graph/tools/kb_tools.py | Agent flow KB tool integration |
Key Functions
| Function | Location | Purpose |
|---|---|---|
| hybrid_search() | vector_operations.py:765 | RRF fusion search |
| semantic_search() | vector_operations.py:628 | Dense vector search |
| lexical_search() | vector_operations.py:697 | BM25 sparse search |
| search() | retrieval_service.py:73 | High-level search router |
| upsert_hybrid_vectors_batch() | qdrant_client.py:505 | Index documents with hybrid vectors |
Support
For issues or questions:
- Internal: Brain Station 23 AI Platform team
- Qdrant: Qdrant Documentation
- OpenAI: OpenAI API Documentation