
# 🔍 Search Gateway
Federated search across multiple knowledge sources with AI-powered reranking
Home Lab Project · 5 Providers · AI Search Mode
## The Problem
My knowledge was scattered across multiple systems:
- ChromaDB - Vector embeddings from documents
- Content Processor - Extracted sections and metadata
- Knowledge Graph - Entities and relationships
- Work Suite - Notes and tasks
- Skills (LSK) - Structured knowledge units
Each had its own search API. Finding something meant checking multiple places and mentally merging results.
## The Solution
Search Gateway federates search across all providers, merges results intelligently, and reranks using a cross-encoder model.
One query → all sources → unified, ranked results.
```bash
curl -X POST http://localhost:8700/api/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "how does the agent memory system work?"}'
```
Returns results from all providers, deduplicated and ranked by relevance.
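The same call from Python, using httpx; the response field names here (`results`, `provider`, `score`, `title`) are assumptions, since the exact schema isn't shown above:

```python
import httpx

resp = httpx.post(
    "http://localhost:8700/api/v1/search",
    json={"query": "how does the agent memory system work?"},
    timeout=30.0,
)
resp.raise_for_status()

for hit in resp.json().get("results", []):
    # Each hit carries its originating provider and a final rerank score
    # (assumed field names)
    print(hit.get("provider"), hit.get("score"), hit.get("title"))
```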
## Key Features
- 🔍 5 Search Providers - ChromaDB, Content, KGraph, WorkSuite, LSK
- ⚡ Parallel Fan-out - Query all providers simultaneously
- 🎯 Cross-Encoder Reranking - mxbai-rerank for precise ordering
- 🧠 AI Search Mode - LLM-powered tool calling for complex queries
- 📝 Query Enhancement - Acronym expansion, intent detection
- 💾 Saved Searches - Save and rerun common queries
- 📊 Analytics - Track search patterns and performance
## Architecture
```mermaid
flowchart TB
    subgraph Client["Client"]
        Q[Query]
    end
    subgraph Gateway["Search Gateway"]
        direction TB
        API[API Gateway]
        E[Query Enhancer<br/>Acronyms, Intent]
        O[Orchestrator]
        M[Result Merger<br/>Dedup, Weight]
        R[Reranker<br/>Cross-Encoder]
    end
    subgraph Providers["Search Providers"]
        P1[ChromaDB<br/>Vectors]
        P2[Content<br/>Documents]
        P3[KGraph<br/>Entities]
        P4[WorkSuite<br/>Notes]
        P5[LSK<br/>Skills]
    end
    subgraph Backend["Backend"]
        VR[Valet Runtime<br/>Reranking]
    end
    Q --> API --> E --> O
    O -->|Parallel| P1 & P2 & P3 & P4 & P5
    P1 & P2 & P3 & P4 & P5 --> M
    M --> R --> VR
    VR --> API
```
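The orchestrator's parallel fan-out can be sketched with `asyncio.gather`, assuming each provider exposes an async `search` coroutine (the provider interface here is hypothetical, not the gateway's actual code):

```python
import asyncio

async def fan_out(query: str, providers: list) -> list:
    """Query all providers concurrently; tolerate individual failures."""
    tasks = [asyncio.create_task(p.search(query)) for p in providers]
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    merged = []
    for provider, outcome in zip(providers, outcomes):
        if isinstance(outcome, Exception):
            # A slow or down provider degrades results
            # instead of sinking the whole search
            continue
        merged.extend(outcome)
    return merged
```

With `return_exceptions=True`, one failing provider never fails the whole query, and total latency is bounded by the slowest healthy provider.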
## Search Flow
```mermaid
sequenceDiagram
    participant C as Client
    participant G as Gateway
    participant E as Enhancer
    participant P as Providers
    participant R as Reranker
    C->>G: Search query
    G->>E: Enhance query
    E-->>G: Expanded query + intent
    par Fan-out
        G->>P: ChromaDB search
        G->>P: Content search
        G->>P: KGraph search
    end
    P-->>G: Results (parallel)
    G->>G: Merge & deduplicate
    G->>R: Rerank with cross-encoder
    R-->>G: Reranked results
    G->>G: Apply boosts
    G-->>C: Final ranked results
```
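One way the "merge & deduplicate" step might look: deduplicate on a document id and keep the best provider-weighted score per document (the weights and field names are illustrative):

```python
def merge_results(results: list[dict], weights: dict[str, float]) -> list[dict]:
    """Deduplicate by doc_id, keeping the best weighted score per document."""
    best: dict[str, dict] = {}
    for r in results:
        # Per-provider weights compensate for incomparable native scores
        weighted = r["score"] * weights.get(r["provider"], 1.0)
        key = r["doc_id"]
        if key not in best or weighted > best[key]["weighted_score"]:
            best[key] = {**r, "weighted_score": weighted}
    return sorted(best.values(), key=lambda r: r["weighted_score"], reverse=True)
```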
## Providers
| Provider | Source | What It Searches |
|---|---|---|
| ChromaDB | Vector store | Semantic similarity on embeddings |
| Content | Content Processor | Documents, sections, metadata |
| KGraph | Knowledge Graph | Entities, relationships, facts |
| WorkSuite | Work Suite | Notes, tasks, bookmarks |
| LSK | Skills Registry | Structured knowledge units |
Each provider returns results in a standardized format with scores. The gateway handles the translation.
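That standardized format might look something like this sketch; the field names and the `Protocol` shape are assumptions, not the gateway's actual schema:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class SearchResult:
    doc_id: str
    title: str
    snippet: str
    score: float   # provider-native score; not comparable across providers
    provider: str  # e.g. "chromadb", "kgraph"

class SearchProvider(Protocol):
    name: str
    async def search(self, query: str, limit: int = 10) -> list[SearchResult]: ...
```

Each concrete provider adapts its backend's response into `SearchResult`, so everything downstream (merging, reranking, boosts) works on one shape.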
## Query Enhancement
Before searching, queries are enhanced:
```mermaid
flowchart LR
    Q[Raw Query] --> A[Acronym<br/>Expansion]
    A --> I[Intent<br/>Detection]
    I --> T[Term<br/>Extraction]
    T --> EQ[Enhanced Query]
```
- Acronym Expansion: "API" → "Application Programming Interface"
- Intent Detection: Categorize as lookup, comparison, how-to, etc.
- Term Extraction: Identify key entities and concepts
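A minimal sketch of the first two steps; the acronym table and intent patterns are illustrative stand-ins for whatever the real service uses:

```python
import re

# Illustrative lookup table; the real acronym list would be larger
ACRONYMS = {"API": "Application Programming Interface", "LLM": "large language model"}

INTENT_PATTERNS = [
    (re.compile(r"^how (do|does|to)\b", re.I), "how-to"),
    (re.compile(r"\b(vs|versus|compare)\b", re.I), "comparison"),
]

def enhance(query: str) -> dict:
    # Expand known acronyms in place so both forms hit the providers
    expanded = query
    for acronym, full in ACRONYMS.items():
        expanded = re.sub(rf"\b{acronym}\b", f"{acronym} ({full})", expanded)
    # First matching pattern wins; default to a plain lookup
    intent = next((label for pat, label in INTENT_PATTERNS if pat.search(query)), "lookup")
    return {"query": expanded, "intent": intent}
```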
## Reranking
Initial results are scored by each provider, but these scores aren't comparable. The gateway uses a cross-encoder to rerank:
```mermaid
flowchart LR
    subgraph Initial["Initial Results"]
        R1[Doc A: 0.85]
        R2[Doc B: 0.72]
        R3[Doc C: 0.91]
    end
    subgraph Reranker["Cross-Encoder"]
        CE[mxbai-rerank<br/>Query + Doc pairs]
    end
    subgraph Final["Final Order"]
        F1[Doc C: 0.94]
        F2[Doc A: 0.87]
        F3[Doc B: 0.61]
    end
    Initial --> CE --> Final
```
The cross-encoder considers both query and document text together, producing more accurate relevance scores.
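The mxbai-rerank checkpoints are published by mixedbread-ai and load as standard cross-encoders; here is a sketch of the reranking step using sentence-transformers directly. The gateway actually routes this through Valet Runtime, so treat this as a stand-in, not its real API:

```python
from sentence_transformers import CrossEncoder

# Stand-in for the Valet Runtime call
model = CrossEncoder("mixedbread-ai/mxbai-rerank-base-v1")

def rerank(query: str, docs: list[str], top_k: int = 10) -> list[tuple[str, float]]:
    # Score each (query, document) pair jointly, unlike bi-encoder retrieval
    scores = model.predict([(query, doc) for doc in docs])
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]
```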
## AI Search Mode
For complex questions, AI Search uses LLM tool calling:
```json
{
  "query": "Compare the memory systems in Agent Platform vs Content Processor",
  "max_tool_calls": 5,
  "include_tool_trace": true
}
```
The LLM decides which searches to run, synthesizes results, and generates an answer with citations:
```mermaid
flowchart TB
    Q[Complex Question] --> LLM[LLM Planner]
    LLM --> T1[Tool: search_chromadb]
    LLM --> T2[Tool: search_kgraph]
    LLM --> T3[Tool: search_content]
    T1 & T2 & T3 --> S[Synthesize]
    S --> A[Answer + Sources]
```
The response includes:
- Tool trace: See exactly which searches were run
- Source citations: Links back to original documents
- Confidence scoring: How certain the answer is
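At its core this is a bounded tool-calling loop. A schematic sketch follows; the tool names match the diagram above, but the `llm.plan` / `llm.synthesize` planner API is hypothetical:

```python
async def ai_search(question: str, llm, tools: dict, max_tool_calls: int = 5) -> dict:
    """Let the LLM plan searches, then synthesize an answer with citations."""
    trace, evidence = [], []
    for _ in range(max_tool_calls):
        step = await llm.plan(question, evidence)  # hypothetical planner call
        if step.action == "answer":
            break
        # e.g. tools["search_chromadb"], tools["search_kgraph"]
        results = await tools[step.tool](step.arguments)
        trace.append({"tool": step.tool, "args": step.arguments})
        evidence.extend(results)
    answer = await llm.synthesize(question, evidence)  # hypothetical
    return {"answer": answer.text, "sources": answer.citations, "tool_trace": trace}
```

Capping the loop at `max_tool_calls` keeps a confused planner from burning tokens indefinitely.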
## Caching
Results are cached to reduce load:
| Cache Type | TTL | Purpose |
|---|---|---|
| Query results | 5 min | Avoid repeated searches |
| Provider status | 30 sec | Health check caching |
| Rerank results | 10 min | Expensive cross-encoder calls |
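Since the cache is in-memory and stateless, something as small as this sketch covers all three cases (the gateway's actual implementation may differ):

```python
import time

class TTLCache:
    """Minimal in-memory cache with per-instance TTL."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._store[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)

query_cache = TTLCache(ttl_seconds=300)   # 5 min, per the table above
rerank_cache = TTLCache(ttl_seconds=600)  # 10 min for expensive cross-encoder calls
```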
## Tech Stack
| Component | Technology | Why |
|---|---|---|
| API | FastAPI | Async, streaming, OpenAPI |
| Rate Limiting | SlowAPI | Protect AI endpoints |
| Reranking | mxbai-rerank via Valet | Cross-encoder accuracy |
| Metrics | Prometheus | Observability |
| WebSocket | FastAPI WS | Real-time updates |
| Cache | In-memory | Fast, stateless |
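SlowAPI's standard FastAPI integration looks roughly like this; the route path and limit value are illustrative, not the gateway's actual configuration:

```python
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/api/v1/ai-search")
@limiter.limit("10/minute")  # keep expensive LLM-backed calls from being hammered
async def ai_search(request: Request):
    ...
```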
## What I Learned
- Federation is powerful - One query over every silo replaced five separate searches and the mental merging that came with them
- Reranking matters - Cross-encoders dramatically improve result quality
- Parallel is fast - Concurrent fan-out means total latency tracks the slowest provider, not the sum of all of them
- Intent helps - Knowing what kind of answer users want improves relevance
## What's Next
- Hybrid search (keyword + semantic)
- Query suggestions / autocomplete
- Search history and personalization
- More provider integrations