CODLIY · AI & ML

RAG without hype: a production checklist we actually use

April 24, 2026 · 1 min read

Most retrieval-augmented generation demos collapse the moment real users show up. These are the boring things we check before a RAG system is allowed anywhere near production.

Evaluation first

No eval set, no merge. We build a golden set of 200–500 Q&A pairs reviewed by a subject-matter expert before writing a single embedding. Every change is graded against it.
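
A minimal sketch of what a "no eval set, no merge" gate could look like. The names here (`GoldenCase`, `pass_rate`, `gate`) are hypothetical, and the grader is a deliberately naive substring check standing in for whatever grading a real setup uses (SME rubric, LLM judge, exact match).

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenCase:
    question: str
    expected: str  # reference answer approved by the subject-matter expert


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting noise does not fail a case.
    return " ".join(text.lower().split())


def pass_rate(cases: Sequence[GoldenCase], answer_fn: Callable[[str], str]) -> float:
    # Assumption: a case passes if the expected answer appears in the response.
    if not cases:
        raise ValueError("golden set is empty; nothing to grade against")
    passed = sum(
        normalize(case.expected) in normalize(answer_fn(case.question))
        for case in cases
    )
    return passed / len(cases)


def gate(
    cases: Sequence[GoldenCase],
    answer_fn: Callable[[str], str],
    baseline: float,
    tolerance: float = 0.0,
) -> bool:
    # True when the candidate system does not regress below the baseline.
    return pass_rate(cases, answer_fn) >= baseline - tolerance
```

In CI, `answer_fn` would wrap the candidate pipeline and `baseline` would be the pass rate of the current main branch; a `False` result blocks the merge.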

Retrieval quality

  • Hybrid search (dense + BM25), with a fallback to keyword-only for exact product codes.
  • Chunking informed by the document type (code, policy, transcript), not a fixed 512 tokens.
  • Re-ranking with a small cross-encoder when latency budget allows.
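
The hybrid search item above can be sketched as follows. Two things are assumptions, not something this checklist prescribes: the fusion method (reciprocal rank fusion, one common way to merge a dense ranking with a BM25 ranking) and the product-code pattern, which is a made-up shape. The `dense` and `keyword` retrievers are hypothetical callables that return ranked document ids.

```python
import re
from typing import Callable, Sequence

# A retriever takes (query, k) and returns up to k document ids, best first.
Retriever = Callable[[str, int], list[str]]

# Hypothetical product-code shape such as "AB-1234"; adjust to the real catalog.
PRODUCT_CODE = re.compile(r"\b[A-Z]{2,4}-\d{3,6}\b")


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per document.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by doc id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def hybrid_search(
    query: str, dense: Retriever, keyword: Retriever, top_k: int = 5
) -> list[str]:
    # Exact product codes skip the dense side and go keyword-only.
    if PRODUCT_CODE.search(query):
        return keyword(query, top_k)
    # Over-fetch from both sides, then fuse and trim.
    fused = rrf_fuse([dense(query, top_k * 2), keyword(query, top_k * 2)])
    return fused[:top_k]
```

A cross-encoder re-ranker, when the latency budget allows one, would slot in after `rrf_fuse` and reorder the fused candidates before trimming to `top_k`.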

Guardrails that matter

Input filtering, output filtering, and cost ceilings. Every call is observable: prompt, retrieved chunks, model response, cost, latency. We replay problem queries in CI.
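
As an illustration of the observability and cost-ceiling points, here is a small sketch. `CallTrace`, `CostCeiling`, and `to_log_line` are hypothetical names; the trace fields mirror the list above (prompt, retrieved chunks, model response, cost, latency), and the storage format (one JSON object per line) is an assumption.

```python
import json
from dataclasses import asdict, dataclass


class BudgetExceeded(RuntimeError):
    """Raised when recording a call would push spend past the ceiling."""


@dataclass
class CallTrace:
    prompt: str
    retrieved_chunks: list[str]
    response: str
    cost_usd: float
    latency_ms: float


@dataclass
class CostCeiling:
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        # Refuse the charge (and leave spend unchanged) if it would exceed the limit.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"charge of {cost_usd} would exceed ceiling of {self.limit_usd}"
            )
        self.spent_usd += cost_usd


def to_log_line(trace: CallTrace) -> str:
    # One JSON object per call, so problem queries can be pulled out and replayed.
    return json.dumps(asdict(trace), sort_keys=True)
```

Replaying in CI could then mean loading the logged lines for flagged queries, re-running each `prompt` through the current pipeline, and grading the new response against the golden set.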

The quiet part

In our experience, roughly 80% of RAG quality comes from the data pipeline, not the model.

Get the ingestion, normalization and eval loop right first. The rest is tuning.