Skip to main content
← Back to the Journal
Product · RAG Ops·June 2026·9 min read

The weekly RAG scorecard before blaming the frontier model.

A product squad blamed GPT until telemetry showed retrieval missed the paragraph half the time — hallucination followed retrieval failure.

Nuqta complements five RAG metrics before blaming an LLM with a single Monday grid so operators cannot hide behind anecdotes [6].

What belongs on the worksheet.

Columns: Recall@k (gold chunk present), citation hit-rate on human-reviewed answers, latency p95 to first streamed token, drift rate when assistants answer sans citing corpora anchors [4]. Pair with retrieval architecture refresher via hybrid lexical + vector retrieval.

Rules that keep dashboards honest.

  • Freeze fifty user pain queries thirty days minimum before swapping embedding families [6].
  • Snapshot before/after release on identical queries — reroute incident energy if regressions originate upstream [6].
  • Lock corpus + query embedding revisions while measuring — otherwise false motion appears as progress.
Four numbers suffice when the query batch is sacred; thirty vanity metrics waste the room.
FIG. 1 — WEEKLY SCORECARD: RECALL@K · CITE · LATENCY · DRIFT

Caveat.

LLM versioning shifts outputs even when corpus static — annotate provider release IDs every Monday row [6].

The invitation.

Before next Monday noon, circulate the worksheet with fifty locked queries populated — delaying that ritual signals political fear of measurement.

Frequently asked questions.

  • Does this replace prior article? Adds operational cadence atop conceptual metrics [5].
  • Time cost? Half-day setup first week then fifteen-minute rotations weekly afterward [6].
  • Arabic nuance? Measure on your corpora not generic leaderboard corpora.
  • Blame attribution? Recall failures route to ingestion — drift routes to prompting or grounding policy [2][6].
  • Team size? PM + infra + taxonomy steward suffices under two hundred WAU pilots.

Sources.

[1] Lewis et al. — Dense Passage Retrieval.

[2] NIST — AI RMF governance hooks.

[3] Elasticsearch / Pinecone docs — fused retrieval references (vendor-neutral concept).

[4] Nuqta — frozen Arabic embedding QA protocol, April 2026.

[5] Nuqta — client scorecard appendix referenced here for governance boards.

[6] Nuqta — operational RAG scorecard bundle, June 2026.

Related posts

Share this article

← Back to the JournalNuqta · Journal