RAG without hype: a production checklist we actually use

Most retrieval-augmented generation demos collapse the moment real users show up. These are the boring things we check before a RAG system is allowed anywhere near production.

Evaluation first

No eval set, no merge. We build a golden set of 200–500 Q&A pairs reviewed by a subject-matter expert before writing a single embedding. Every change is graded against it.
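A minimal sketch of such a merge gate follows. It assumes each golden question is also annotated with the IDs of the chunks that answer it (the Q&A pairs alone do not say this), and it grades only retrieval, using hit rate at k: whether any relevant chunk lands in the top k. `GoldenItem`, `hit_rate_at_k`, `gate` and the 0.01 tolerance are hypothetical names and values for illustration. Grading the generated answers themselves would need a separate check.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenItem:
    """One expert-reviewed question plus the IDs of chunks that answer it (assumed annotation)."""
    question: str
    relevant_ids: frozenset[str]


def hit_rate_at_k(
    golden: Sequence[GoldenItem],
    retrieve: Callable[[str, int], list[str]],
    k: int = 5,
) -> float:
    """Fraction of golden questions with at least one relevant chunk in the top k."""
    if not golden:
        raise ValueError("empty golden set: no eval set, no merge")
    hits = 0
    for item in golden:
        top_k = retrieve(item.question, k)[:k]
        if item.relevant_ids.intersection(top_k):
            hits += 1
    return hits / len(golden)


def gate(candidate: float, baseline: float, tolerance: float = 0.01) -> bool:
    """Allow the merge only if the candidate does not regress beyond the tolerance."""
    return candidate >= baseline - tolerance


# Usage with a toy retriever standing in for the real one.
golden = [
    GoldenItem("How do I reset my password?", frozenset({"faq-12"})),
    GoldenItem("What is the refund window?", frozenset({"policy-3"})),
]


def fake_retrieve(question: str, k: int) -> list[str]:
    return ["faq-12", "faq-7"] if "password" in question else ["faq-1"]


score = hit_rate_at_k(golden, fake_retrieve, k=2)  # 0.5
```

Comparing against a stored baseline, rather than a fixed threshold, means any change that loses ground on the golden set is blocked. That holds even when the absolute score still looks acceptable.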

Retrieval quality

Hybrid search (dense + BM25), with a keyword-only fallback for exact product codes.
Chunking informed by the document type (code, policy, transcript), not a fixed 512 tokens.
Re-ranking with a small cross-encoder when the latency budget allows.
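The hybrid search item can be sketched as follows. The sketch assumes the dense and BM25 retrievers have already returned ranked lists of chunk IDs. The checklist does not say how the two lists are merged, so this sketch uses reciprocal rank fusion (RRF, with the commonly used constant k = 60) as a stand-in. It also treats "keyword-only" as "the BM25 ranking alone" and uses a hypothetical `PRODUCT_CODE` regex for codes shaped like `AB-1234`.

```python
import re
from collections import defaultdict

# Hypothetical product-code shape, e.g. "AB-1234"; adjust to the real catalogue.
PRODUCT_CODE = re.compile(r"\b[A-Z]{2,}-\d{3,}\b")


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked ID lists; each list adds 1 / (k + rank) to every ID it contains."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so results are deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hybrid_search(query: str, dense_ranking: list[str], bm25_ranking: list[str]) -> list[str]:
    """Fuse dense and BM25 results, but trust BM25 alone when the query names a product code."""
    if PRODUCT_CODE.search(query):
        # Embeddings blur near-identical codes; exact-term matching does not.
        return list(bm25_ranking)
    return reciprocal_rank_fusion([dense_ranking, bm25_ranking])
```

RRF only looks at ranks, so the dense and BM25 scores never need to be put on a common scale.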
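Type-aware chunking can be sketched as a dispatch table keyed by document type. The split rules are assumptions for illustration. Code is cut before top-level `def`/`class` lines, which assumes Python-style source. Policies are split on blank lines, so each paragraph or clause stays whole. Transcripts are treated as one speaker turn per line. The pieces are then greedily packed up to a hypothetical `MAX_CHARS`, with characters standing in for tokens.

```python
import re
from typing import Callable

MAX_CHARS = 2000  # hypothetical budget; characters stand in for tokens


def _pack(pieces: list[str], max_chars: int, sep: str) -> list[str]:
    """Greedily join consecutive pieces, starting a new chunk when the next one would overflow.

    A single piece longer than max_chars is kept whole rather than cut mid-unit.
    """
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}{sep}{piece}" if current else piece
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def chunk_code(text: str, max_chars: int) -> list[str]:
    # Split before each top-level def/class so a function is never cut in half.
    parts = re.split(r"(?m)^(?=def |class )", text)
    return _pack([p.rstrip("\n") for p in parts if p.strip()], max_chars, "\n\n")


def chunk_policy(text: str, max_chars: int) -> list[str]:
    # Split on blank lines so each paragraph or clause stays whole.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return _pack(paragraphs, max_chars, "\n\n")


def chunk_transcript(text: str, max_chars: int) -> list[str]:
    # Assumes one speaker turn per line; turns are grouped but never split.
    turns = [line.strip() for line in text.splitlines() if line.strip()]
    return _pack(turns, max_chars, "\n")


CHUNKERS: dict[str, Callable[[str, int], list[str]]] = {
    "code": chunk_code,
    "policy": chunk_policy,
    "transcript": chunk_transcript,
}


def chunk(text: str, doc_type: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Dispatch on document type; an unknown type raises KeyError instead of silently falling back."""
    return CHUNKERS[doc_type](text, max_chars)
```

Failing loudly on an unknown type is deliberate here. A new document type then gets a chunker chosen for it, instead of inheriting a fixed-size split by accident.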
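For the re-ranking item, one way to respect the latency budget is to skip the cross-encoder whenever its estimated cost would overrun the request deadline. `Scorer`, `maybe_rerank` and `est_cost_s` are hypothetical names. The scorer could, for instance, wrap `CrossEncoder.predict` from sentence-transformers over (query, passage) pairs. `est_cost_s` would come from your own latency measurements.

```python
import time
from typing import Callable, Sequence

# Hypothetical scorer: (query, passages) -> one relevance score per passage.
Scorer = Callable[[str, Sequence[str]], Sequence[float]]


def maybe_rerank(
    query: str,
    candidates: Sequence[str],
    scorer: Scorer,
    deadline: float,
    est_cost_s: float,
    top_n: int = 5,
) -> list[str]:
    """Re-rank with the cross-encoder only if it is expected to finish before the deadline.

    `deadline` is an absolute time.monotonic() value for the whole request.
    """
    remaining = deadline - time.monotonic()
    if remaining < est_cost_s:
        # Not enough budget left: keep the first-stage (hybrid) order.
        return list(candidates[:top_n])
    scores = scorer(query, candidates)
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_n]]
```

Falling back to the first-stage order, rather than waiting or timing out, means a slow request degrades to hybrid-search quality instead of failing.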

Guardrails that matter

Input filtering, output filtering, and cost ceilings. Every call is observable: prompt, retrieved chunks, model response, cost, latency. We replay problem queries in CI.
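A minimal sketch of such a guardrail wrapper follows. It assumes a retriever and a model client are injected as plain functions and that the client reports the cost of each call. The filters are placeholder predicates. The ceiling is a simple running total checked before each call, so a single call can still overshoot it. Every name here (`Guarded`, `CallTrace`, `CostCeilingExceeded`, `REFUSAL`) is hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Callable

# Hypothetical hooks: swap in the real retriever and model client.
Retrieve = Callable[[str], list[str]]
Generate = Callable[[str, list[str]], tuple[str, float]]  # -> (response, cost in USD)

REFUSAL = "Sorry, I can't help with that."


@dataclass
class CallTrace:
    """One observable call: prompt, retrieved chunks, model response, cost, latency."""
    prompt: str
    chunks: list[str]
    response: str  # raw model output, before the output filter
    cost_usd: float
    latency_s: float


class CostCeilingExceeded(RuntimeError):
    """Raised instead of calling the model once the spend ceiling is reached."""


@dataclass
class Guarded:
    retrieve: Retrieve
    generate: Generate
    ceiling_usd: float
    input_ok: Callable[[str], bool]
    output_ok: Callable[[str], bool]
    spent_usd: float = 0.0
    traces: list[CallTrace] = field(default_factory=list)

    def __call__(self, prompt: str) -> str:
        if not self.input_ok(prompt):
            return REFUSAL  # input filter: blocked before any spend
        if self.spent_usd >= self.ceiling_usd:
            # Checked before the call, so one call may still push spend past the ceiling.
            raise CostCeilingExceeded(f"spent ${self.spent_usd:.2f} of ${self.ceiling_usd:.2f}")
        start = time.monotonic()
        chunks = self.retrieve(prompt)
        response, cost = self.generate(prompt, chunks)
        self.spent_usd += cost
        self.traces.append(CallTrace(prompt, chunks, response, cost, time.monotonic() - start))
        return response if self.output_ok(response) else REFUSAL  # output filter

    def dump_jsonl(self) -> str:
        """Serialise traces as JSON Lines, e.g. for a log store or as CI fixtures."""
        return "\n".join(json.dumps(asdict(t)) for t in self.traces)
```

Keeping the raw response in the trace, even when the output filter replaces it, is what makes a blocked answer debuggable later.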
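Replaying problem queries in CI can be as small as a list of past failures, each paired with an expectation the fixed pipeline must meet. The JSON Lines format and the `must_retrieve` / `must_not_say` fields are hypothetical. The pipeline is assumed to return the answer together with the IDs of the chunks it used.

```python
import json
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ReplayCase:
    """A query that once misbehaved, plus what the pipeline must now do."""
    query: str
    must_retrieve: str       # hypothetical: chunk ID that must appear in the context
    must_not_say: str = ""   # hypothetical: phrase that must not reappear in the answer


def load_cases(jsonl: str) -> list[ReplayCase]:
    return [ReplayCase(**json.loads(line)) for line in jsonl.splitlines() if line.strip()]


def replay(
    cases: Sequence[ReplayCase],
    pipeline: Callable[[str], tuple[str, list[str]]],
) -> list[str]:
    """Return one message per broken expectation; CI fails if the list is non-empty."""
    failures: list[str] = []
    for case in cases:
        answer, chunk_ids = pipeline(case.query)
        if case.must_retrieve not in chunk_ids:
            failures.append(f"{case.query!r}: missing chunk {case.must_retrieve}")
        if case.must_not_say and case.must_not_say.lower() in answer.lower():
            failures.append(f"{case.query!r}: answer contains {case.must_not_say!r}")
    return failures
```

In CI, a test would load the stored cases, run them through the real pipeline and assert that `replay(...)` returns an empty list.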