Hybrid Search Implementation Guide
NeuraFlow implements a hybrid search system that combines dense vector semantic search with BM25 lexical search using Reciprocal Rank Fusion (RRF). This approach provides consistent retrieval quality across varying query lengths and supports multilingual content.
Problem Statement
The Challenge with Pure Semantic Search
When using text-embedding-3-large for semantic search, we observed inconsistent similarity scores:
| Query Type | Example | Semantic Score Range |
|---|---|---|
| Single word | "pricing" | 0.10 - 0.20 |
| Short phrase | "return policy" | 0.15 - 0.25 |
| Full question | "What is the return policy for electronics?" | 0.25 - 0.40 |
Problems:
- Inconsistent thresholds: A threshold of 0.3 would filter out valid short queries but be too permissive for long queries
- Poor precision on short queries: Single-word queries produced many false positives
- Semantic drift: Embeddings sometimes captured related but irrelevant concepts
Why text-embedding-ada-002 Was Not the Solution
While text-embedding-ada-002 produced more consistent scores (similar to post-RRF scores), it was unsuitable for production due to:
| Limitation | Impact |
|---|---|
| English-centric | Poor performance on non-English content |
| Legacy model | No future updates or improvements |
| Limited multilingual | Inconsistent cross-language retrieval |
Business requirement: Support multilingual content and future localization capabilities required text-embedding-3-large.
Experimental Findings & Background
This section documents the empirical observations that led to the adoption of hybrid search.
Semantic Search Score Analysis with text-embedding-3-large
During testing with the text-embedding-3-large embedding model, we observed that semantic similarity scores were significantly lower than expected, particularly for shorter queries:
| Query Length | Example Query | Observed Score Range | Retrieval Quality |
|---|---|---|---|
| 1 word | "pricing" | 0.1 - 0.2 | Poor - many relevant results filtered out |
| 2-3 words | "return policy" | 0.15 - 0.25 | Inconsistent |
| Short phrase | "how to return items" | 0.20 - 0.30 | Moderate |
| Full sentence | "What is the return policy for electronics?" | 0.25 - 0.40 | Better but still low |
| Long query | "Can you explain the process for returning defective electronics purchased online?" | 0.30 - 0.45 | Best, but threshold still problematic |
The Threshold Dilemma
The inconsistent score ranges created a fundamental problem when selecting a retrieval threshold:
| Threshold | Effect on Short Queries | Effect on Long Queries |
|---|---|---|
| 0.15 | Retrieves relevant content | Too permissive, includes noise |
| 0.25 | Filters out valid results | Still somewhat permissive |
| 0.35 | Most results filtered out | Good precision |
| 0.40 | Almost nothing retrieved | Misses some relevant content |
Conclusion: No single threshold could provide consistent retrieval quality across varying query lengths.
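The dilemma can be seen concretely with a handful of illustrative hits drawn from the score ranges above (the scores and relevance labels here are invented for the sketch):

```python
# Illustrative (query_type, score, relevant) triples, using the score
# ranges reported in the tables above; labels are made up for the demo.
hits = [
    ("single word", 0.18, True),     # relevant, but embeds with a low score
    ("single word", 0.12, False),
    ("full question", 0.32, False),  # noise that still clears 0.3
    ("full question", 0.41, True),
]

def retrieved(threshold):
    """Apply a single global score threshold to all hits."""
    return [h for h in hits if h[1] >= threshold]

# A 0.30 threshold drops the relevant single-word hit yet keeps
# long-query noise; 0.15 keeps the noise too. No single value works.
at_030 = retrieved(0.30)
at_015 = retrieved(0.15)
```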
Comparison with text-embedding-ada-002
Interestingly, the legacy text-embedding-ada-002 model produced scores that were more aligned with expected retrieval behavior:
| Query Type | ada-002 Score | 3-large Score | Difference |
|---|---|---|---|
| 1 word | 0.50 - 0.65 | 0.10 - 0.20 | -0.40 |
| Short phrase | 0.55 - 0.70 | 0.20 - 0.30 | -0.35 |
| Full sentence | 0.60 - 0.80 | 0.30 - 0.45 | -0.30 |
The text-embedding-ada-002 scores were similar to what we later achieved with hybrid search + RRF fusion. However, switching back to ada-002 was not viable because:
- Multilingual retrieval: text-embedding-3-large provides superior performance for non-English content
- Future localization: Product roadmap includes support for multiple languages
- Model longevity: ada-002 is a legacy model with no future improvements expected
- Cross-lingual search: text-embedding-3-large enables searching across language boundaries
The Hybrid Search Solution
By introducing hybrid search with BM25 lexical matching and RRF fusion, the fused scores aligned far more closely with actual content relevance:
| Query Type | Pure Semantic (3-large) | Hybrid + RRF | Improvement |
|---|---|---|---|
| 1 word | 0.10 - 0.20 | 0.50 - 0.70 | +300% |
| Short phrase | 0.20 - 0.30 | 0.55 - 0.72 | +175% |
| Full sentence | 0.30 - 0.45 | 0.60 - 0.80 | +80% |
Key Finding: Universal Threshold
After implementing hybrid search, a single threshold of 0.5 retrieved accurate, relevant content consistently across all query types.
Summary of Findings
| Aspect | Finding |
|---|---|
| Root cause | text-embedding-3-large produces lower absolute similarity scores than ada-002 |
| Impact | Threshold selection was query-length dependent |
| Solution | Hybrid search (semantic + BM25) with RRF fusion |
| Result | Consistent scores enabling a universal 0.5 threshold |
| Trade-off accepted | Slight increase in query latency (~30ms) for significantly improved consistency |
| Why not ada-002 | Required text-embedding-3-large for multilingual support and future localization |
Solution: Hybrid Search with RRF
How Hybrid Search Solves the Problem
Results after implementing hybrid search:
| Query Type | Semantic Score | RRF Fused Score | Improvement |
|---|---|---|---|
| Single word | 0.10 - 0.20 | 0.50 - 0.70 | +250% |
| Short phrase | 0.15 - 0.25 | 0.55 - 0.75 | +200% |
| Full question | 0.25 - 0.40 | 0.60 - 0.85 | +100% |
Key benefit: A universal threshold of 0.5 now works consistently across all query types.
Search Types
NeuraFlow supports three search modes:
Search Type Comparison
| Feature | Semantic | Lexical | Hybrid |
|---|---|---|---|
| Best for | Conceptual queries | Exact keyword matches | General use |
| Query length | Longer queries | Any length | Any length |
| Synonym handling | Excellent | None | Good |
| Typo tolerance | Good | Poor | Moderate |
| Speed | Fast | Very fast | Moderate |
| Recommended | Conceptual search | Known keywords | Default |
Technical Implementation
Reciprocal Rank Fusion (RRF)
RRF combines rankings from multiple search methods without requiring score normalization:
RRF Formula:
RRF_score(d) = Σ 1 / (k + rank_i(d))
Where:
- d = document
- k = ranking constant (typically 60)
- rank_i(d) = rank of document d in search method i
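The formula can be sketched in a few lines of Python (a toy implementation for intuition; in NeuraFlow the fusion itself is performed server-side by Qdrant):

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document ids using Reciprocal Rank Fusion.

    rankings: iterable of ranked lists, best document first.
    k: ranking constant; 60 is the commonly used value.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            # Each method contributes 1 / (k + rank) for every doc it ranks
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["d3", "d1", "d2"]  # dense-vector ranking
lexical = ["d1", "d4", "d3"]   # BM25 ranking
fused = rrf_fuse([semantic, lexical])
# d1 ranks high in both lists, so it comes out on top after fusion
```

Note that RRF only looks at ranks, never raw scores, which is exactly why it sidesteps the score-normalization problem described above.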
Qdrant Named Vectors
The collection uses named vectors for hybrid search:
# Collection Configuration
from qdrant_client.models import Distance, Modifier, SparseVectorParams, VectorParams

vectors_config = {
    "dense": VectorParams(
        size=3072,  # text-embedding-3-large
        distance=Distance.COSINE,
    )
}
sparse_vectors_config = {
    "bm25": SparseVectorParams(
        modifier=Modifier.IDF,  # Required for BM25 scoring
    )
}
Document Indexing
Each document chunk is stored with both dense and sparse vectors:
from qdrant_client.models import Document, PointStruct

PointStruct(
    id=point_id,
    vector={
        "dense": [0.123, 0.456, ...],  # 3072-dim embedding
        "bm25": Document(
            text="chunk text content",
            model="Qdrant/bm25",  # Server-side tokenization
        ),
    },
    payload={
        "document_id": "uuid",
        "chunk_index": 0,
        "text": "chunk content",
        "metadata": {...}
    }
)
API Reference
Search Endpoint
POST /api/v1/kb/search
Content-Type: application/json
{
"query": "What is the vacation policy?",
"search_type": "hybrid",
"limit": 10,
"score_threshold": 0.5,
"document_ids": ["uuid-1", "uuid-2"]
}
Request Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| query | string | required | Search query (1-500 characters) |
| search_type | enum | "semantic" | "semantic", "lexical", or "hybrid" |
| limit | int | 10 | Max results (1-50) |
| score_threshold | float | 0.0 | Min score filter (0.0-1.0) |
| document_ids | UUID[] | null | Filter by specific documents |
Response Format
{
"query": "What is the vacation policy?",
"results": [
{
"document_id": "550e8400-e29b-41d4-a716-446655440000",
"chunk_index": 5,
"text": "Employees are entitled to 20 days of paid vacation per year...",
"score": 0.72,
"metadata": {
"filename": "employee_handbook.pdf",
"page": 15
}
}
],
"total_results": 1
}
Search Type Enum
class SearchType(str, Enum):
SEMANTIC = "semantic" # Dense vector similarity
LEXICAL = "lexical" # BM25 keyword search
HYBRID = "hybrid" # Combined with RRF fusion
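Because SearchType subclasses str, raw values from JSON requests coerce directly to the enum and compare against plain strings, e.g.:

```python
from enum import Enum

class SearchType(str, Enum):
    SEMANTIC = "semantic"  # Dense vector similarity
    LEXICAL = "lexical"    # BM25 keyword search
    HYBRID = "hybrid"      # Combined with RRF fusion

# A raw request value maps straight to the enum member...
st = SearchType("hybrid")
assert st is SearchType.HYBRID
# ...and the member still behaves like the underlying string.
assert st == "hybrid"
```

An unknown value such as "fuzzy" raises ValueError, which gives request validation for free.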
Agent Flow Integration
KB Tools Configuration
The knowledge base tool in agent flows uses hybrid search by default:
# core/app/services/graph/tools/kb_tools.py
def create_kb_tools(
kb_service,
user_id: Optional[UUID] = None,
document_ids: Optional[List[str]] = None,
score_threshold: float = 0.3,
limit: int = 5,
search_type: str = "hybrid", # Default to hybrid
):
Search Type in Agent Nodes
Agent nodes can specify search type in their configuration:
{
"type": "agent",
"data": {
"name": "Support Agent",
"knowledgeBases": [
{
"id": "doc-uuid",
"title": "Employee Handbook"
}
],
"searchType": "hybrid",
"scoreThreshold": 0.5
}
}
Configuration
Environment Variables
# Embedding Model Configuration
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=<optional-api-key>
QDRANT_COLLECTION_NAME=knowledge_base
# Default Search Settings (optional)
KB_DEFAULT_SEARCH_TYPE=hybrid
KB_SCORE_THRESHOLD=0.5
KB_SEARCH_LIMIT=10
Embedding Model Options
| Model | Dimensions | Multilingual | Recommended |
|---|---|---|---|
| text-embedding-3-large | 3072 | Excellent | Yes |
| text-embedding-3-small | 1536 | Good | Budget |
| text-embedding-ada-002 | 1536 | Limited | Legacy |
Score Threshold Guidelines
| Use Case | Threshold | Rationale |
|---|---|---|
| High recall (chatbots) | 0.3 - 0.4 | Include borderline relevant results |
| Balanced | 0.5 | Recommended default |
| High precision | 0.6 - 0.7 | Only highly relevant results |
| Strict matching | 0.8+ | Near-exact matches only |
Developer Setup
Prerequisites
| Requirement | Details |
|---|---|
| Qdrant | v1.7+ with hybrid search support |
| OpenAI API | Access to embedding models |
| Python | 3.11+ |
Step 1: Configure Qdrant Collection
The collection is created automatically with hybrid support:
# Automatic collection creation with hybrid vectors
qdrant_client._create_hybrid_collection(
collection_name="knowledge_base",
vector_size=3072, # text-embedding-3-large
distance=Distance.COSINE,
enable_hybrid=True,
)
Step 2: Index Documents
Documents are indexed with both dense and sparse vectors:
# Document processing pipeline
1. Parse document → chunks
2. Generate embeddings (text-embedding-3-large)
3. Upsert hybrid vectors (dense + BM25 text)
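Step 1 of the pipeline can be sketched as a fixed-size splitter with overlap (a simplification: the real parser is format-aware, and the size/overlap values here are illustrative, not the production settings):

```python
def chunk_text(text, size=200, overlap=40):
    """Split text into fixed-size chunks that overlap by `overlap`
    characters, so content cut at a boundary still appears whole in
    at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start, step = [], 0, size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks

doc = "abcdefghij" * 50  # 500-character stand-in document
chunks = chunk_text(doc)
# chunk starts at 0, 160, 320, 480 -> 4 chunks;
# adjacent chunks share their 40-character overlap
```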
Step 3: Query with Hybrid Search
from app.services.knowledge_base.retrieval_service import RetrievalService
retrieval = RetrievalService()
# Hybrid search (recommended)
results = await retrieval.search(
query="vacation policy",
limit=10,
score_threshold=0.5,
search_type="hybrid",
)
Troubleshooting
"Low scores on short queries"
| Cause | Solution |
|---|---|
| Using semantic search only | Switch to search_type: "hybrid" |
| Threshold too high | Lower to 0.3-0.5 for hybrid search |
| Old embedding model | Ensure using text-embedding-3-large |
"BM25 search returns no results"
| Cause | Solution |
|---|---|
| Sparse vectors not indexed | Re-index documents with hybrid vectors |
| Query too short | BM25 needs at least 1 meaningful term |
| Collection missing BM25 config | Recreate collection with sparse vectors |
"Hybrid search slower than semantic"
| Cause | Solution |
|---|---|
| Large prefetch limit | Reduce from 2x to 1.5x if needed |
| Many documents | Add document_id filters |
| Network latency | Consider Qdrant colocation |
"Inconsistent scores across deployments"
| Cause | Solution |
|---|---|
| Different embedding models | Standardize on text-embedding-3-large |
| Collection recreation | Re-embed all documents together |
| Version mismatch | Ensure Qdrant version consistency |
Performance Considerations
Latency Comparison
| Search Type | Avg Latency | Notes |
|---|---|---|
| Semantic | ~50ms | Single vector search |
| Lexical | ~30ms | BM25 is very fast |
| Hybrid | ~80ms | Two searches + fusion |
Optimization Tips
- Use document filters: Filter by document_ids when possible
- Limit prefetch: Default 2x is usually sufficient
- Cache embeddings: Query embeddings can be cached for repeated searches
- Batch searches: Use search_by_multiple_documents for efficiency
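The embedding-cache tip can be sketched with functools.lru_cache; _embed below is a deterministic, hypothetical stand-in for the real OpenAI call (the production path goes through embedding_service instead):

```python
from functools import lru_cache

def _embed(query: str) -> tuple:
    # Hypothetical stand-in for the embedding API call; returns a
    # deterministic fake vector purely for illustration.
    return tuple(float(ord(c)) for c in query)

@lru_cache(maxsize=1024)
def cached_query_embedding(query: str) -> tuple:
    # Repeated searches for the same query reuse the cached vector,
    # saving one embedding round-trip per cache hit.
    return _embed(query)

v1 = cached_query_embedding("vacation policy")
v2 = cached_query_embedding("vacation policy")
# The second call is served from the cache (same object, one hit recorded)
```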
Quick Reference
Recommended Defaults
| Setting | Value | Rationale |
|---|---|---|
| Search Type | hybrid | Best overall performance |
| Score Threshold | 0.5 | Balanced precision/recall |
| Limit | 5-10 | Sufficient context for LLMs |
| Embedding Model | text-embedding-3-large | Multilingual support |
API Quick Start
# Hybrid search (recommended)
curl -X POST "http://localhost:8009/api/v1/kb/search" \
-H "Content-Type: application/json" \
-d '{
"query": "vacation policy",
"search_type": "hybrid",
"limit": 10,
"score_threshold": 0.5
}'
Appendix: Code References
Backend Files
| File | Purpose |
|---|---|
| core/app/services/knowledge_base/retrieval_service.py | Search orchestration, filtering, reranking |
| core/app/vector_store/vector_operations.py | Qdrant operations (semantic/lexical/hybrid) |
| core/app/vector_store/qdrant_client.py | Qdrant connection, hybrid collection setup |
| core/app/services/knowledge_base/embedding_service.py | OpenAI embedding generation |
| core/app/schemas/knowledge_base.py | SearchType enum, request/response schemas |
| core/app/api/v1/knowledge_base.py | REST API endpoint |
| core/app/services/graph/tools/kb_tools.py | Agent flow KB tool integration |
Key Functions
| Function | Location | Purpose |
|---|---|---|
| hybrid_search() | vector_operations.py:765 | RRF fusion search |
| semantic_search() | vector_operations.py:628 | Dense vector search |
| lexical_search() | vector_operations.py:697 | BM25 sparse search |
| search() | retrieval_service.py:73 | High-level search router |
| upsert_hybrid_vectors_batch() | qdrant_client.py:505 | Index documents with hybrid vectors |
Support
For issues or questions:
- Internal: Brain Station 23 AI Platform team
- Qdrant: Qdrant Documentation
- OpenAI: OpenAI API Documentation