RAG Done Properly
RAG shows up in nearly half of agent take-home assignments — but building a pipeline is table stakes. The senior differentiator is *measuring* it: ingest → chunk → embed → index → retrieve → rerank → generate with citations, with an evaluation harness running from day one.
- Explain what embeddings are, why cosine similarity finds meaning, and where dense retrieval structurally fails
- Choose and defend a chunking strategy (fixed-size vs. structural, size, overlap) with your own numbers
- Stand up Qdrant locally and implement hybrid search: BM25 + dense vectors fused with RRF
- Add a cross-encoder reranking stage and explain the bi-encoder/cross-encoder trade-off
- Apply query rewriting, decomposition, and HyDE when the user's question is a bad search query
- Build a labeled eval set and report precision@k, recall@k, MRR, faithfulness, and answer relevance
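
To make two of these objectives concrete, here is a minimal sketch of reciprocal rank fusion (RRF) and the three ranking metrics named above, in plain Python with no dependencies. It is an illustration under stated assumptions, not the lab's reference solution: the function names, the document ids, and the toy rankings are all hypothetical, and `k=60` is only the commonly used RRF constant, so treat it as a tunable default.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    `rankings` is a list of ranked doc-id lists (best first), e.g. one from
    BM25 and one from dense search. k=60 is a conventional default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k slots filled by relevant docs (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for d in retrieved[:k] if d in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all labeled-relevant docs that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant doc, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """MRR over a list of (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical ids): two retrievers disagree on the ordering.
bm25_hits = ["d3", "d1", "d7", "d2"]
dense_hits = ["d1", "d4", "d3", "d9"]
relevant = {"d1", "d4"}  # hand-labeled ground truth for this one query

fused = rrf_fuse([bm25_hits, dense_hits])
print("fused order:", fused)
print("precision@3:", precision_at_k(fused, relevant, 3))
print("recall@3:   ", recall_at_k(fused, relevant, 3))
print("MRR:        ", mean_reciprocal_rank([(fused, relevant)]))
```

On this toy query, BM25 alone places only one of the two relevant documents in its top 3, while the fused list places both. That kind of before/after comparison is what a labeled eval set lets you report with numbers. Faithfulness and answer relevance judge the generated answer rather than the ranking, so they are not covered by this sketch.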
Best external resources
Curated reading, docs, and tools that pair with this module.
- [Docs] The eval framework for Lab 03: faithfulness, relevance, context metrics.
- [Docs] Local vector DB for the lab; the hybrid-search tutorial is directly relevant.
- [Essay] Best practitioner essays on RAG patterns and evaluation design.
- [Docs] Reranking implementation you'll use in the lab.
- [Essay] Prepend LLM-generated context to chunks: −49% retrieval failures, −67% with reranking. Measured, replicable.
- [Course] Free 1–2 hr targeted fills: advanced retrieval, reranking, RAG evaluation.