Hybrid retrieval, not pure vector
BM25 catches exact-phrase queries that embeddings miss. Embeddings catch paraphrases. Use both; rerank.
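One way to "use both" is to fuse the two rankings before the reranker sees them. The sketch below uses reciprocal rank fusion (RRF), which is a swapped-in choice and not something the text above prescribes. The function name `rrf_fuse`, the constant `k=60`, and the document ids are all illustrative assumptions. The two input rankings are assumed to come from a BM25 index and an embedding index that already exist.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of doc ids with reciprocal rank fusion.

    Each doc earns 1 / (k + rank) from every list it appears in, so a doc
    ranked well by both retrievers beats a doc ranked first by only one.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings: BM25 favors the exact phrase, embeddings the paraphrase.
bm25_ranking = ["clause-7", "clause-2", "faq-1"]
embedding_ranking = ["faq-1", "clause-7", "memo-4"]

fused = rrf_fuse([bm25_ranking, embedding_ranking])
print(fused)
```

The fused list would then go to whatever reranker you use. RRF only needs ranks, so the BM25 and cosine scores never have to be put on a common scale.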
Build a legal intake agent that collects matter details, checks conflicts, summarizes documents, and routes prospects safely.
┌──────────────┐    ┌──────────────────┐    ┌─────────────────────┐
│ inbound      │──▶ │ router / intent  │──▶ │ retrieval (hybrid)  │
│ (ticket,     │    │ classify         │    │ bm25 + embeddings   │
│ webhook)     │    └──────────────────┘    └─────────┬───────────┘
└──────────────┘             │                        │
                             ▼                        ▼
                    ┌─────────────────┐     ┌─────────────────────┐
                    │ guard / PII     │     │ refusal classifier  │
                    │ detect &        │ ◀── │ grounded? cited?    │
                    │ redact          │     │ in scope?           │
                    └─────────────────┘     └─────────┬───────────┘
                                                      │
                                                      ▼
                                             ┌────────────────────┐
                                             │ reply + citations  │
                                             │ emit trace + eval  │
                                             └────────┬───────────┘
                                                      │
                                                      ▼
                                             ┌────────────────────┐
                                             │ human handoff /    │
                                             │ ticket close       │
                                             └────────────────────┘

Six steps. Each one has an eval. Each one logs a trace. The kill switch sits in front of the router, so we can stop the world at any point without a deploy.
A separate classifier decides “can this be answered grounded in our docs?” before the agent ever speaks.
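The gate in front of the agent might look something like the sketch below. It is a threshold heuristic standing in for a trained classifier, not the classifier itself. The `Passage` type, the `should_answer` name, the 0.5 cutoff, and the assumption that retrieval scores fall in [0, 1] are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    section_id: str
    score: float  # assumed: relevance score in [0, 1] from the reranker


def should_answer(passages, in_scope, min_score=0.5, min_support=1):
    """Return True only if the question is in scope and enough retrieved
    passages clear the relevance threshold to ground an answer."""
    if not in_scope:
        return False
    supported = [p for p in passages if p.score >= min_score]
    return len(supported) >= min_support


strong = [Passage("policy-3.2", 0.81), Passage("faq-1", 0.32)]
weak = [Passage("faq-1", 0.32)]

print(should_answer(strong, in_scope=True))   # grounded and in scope
print(should_answer(weak, in_scope=True))     # nothing clears the threshold
print(should_answer(strong, in_scope=False))  # out of scope, refuse
```

When the gate returns False, the request would go to refusal or human handoff, and the agent never drafts a reply.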
No citation, no send. Every customer-visible answer references the doc section that authorized it. Audit is one click.
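A minimal sketch of the "no citation, no send" check, assuming citations appear in the draft as bracketed section ids such as `[policy-3.2]`. That marker format, the `check_citations` name, and the example ids are illustrative assumptions, not a format the text above specifies.

```python
import re

# Assumed citation syntax: a section id in square brackets, e.g. [policy-3.2].
CITATION = re.compile(r"\[([A-Za-z0-9_.\-]+)\]")


def check_citations(draft, retrieved_ids):
    """Return (ok, cited_ids).

    ok is True only when the draft cites at least one section and every
    cited section is among the ones retrieval actually returned.
    """
    cited = CITATION.findall(draft)
    if not cited:
        return False, []
    unknown = [c for c in cited if c not in retrieved_ids]
    return (not unknown), cited


retrieved = {"policy-3.2", "faq-1"}

ok, cited = check_citations(
    "Conflicts are checked at intake [policy-3.2].", retrieved
)
blocked, _ = check_citations("Conflicts are checked at intake.", retrieved)
```

Storing `cited` alongside the sent reply is what would make the audit a single lookup.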
A flag that disables the route without a deploy. A rollout dial at 1%, 10%, 50%, 100%. The first time you need it, you’ll be glad it’s there.
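The flag and the dial can be sketched together. The version below hashes a stable key (a ticket or prospect id, say) into one of 100 buckets, so the same key stays in or out as the dial moves up. The names `bucket_for` and `route_enabled` are hypothetical, and the flag values are assumed to be read from some config store at request time.

```python
import hashlib

ROLLOUT_STAGES = (1, 10, 50, 100)  # the dial positions named in the text


def bucket_for(key):
    """Map a stable key to a bucket in 0..99 using SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route_enabled(key, kill_switch, rollout_percent):
    """The kill switch wins; otherwise admit keys whose bucket falls under the dial."""
    if kill_switch:
        return False
    return bucket_for(key) < rollout_percent


# Raising the dial only ever adds keys; it never drops one already admitted.
keys = [f"ticket-{i}" for i in range(1000)]
admitted_per_stage = [
    {k for k in keys if route_enabled(k, kill_switch=False, rollout_percent=pct)}
    for pct in ROLLOUT_STAGES
]
```

Hashing instead of drawing a random number per request keeps a given prospect's experience consistent across retries.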