Module 3 · Phase 2: Knowledge & state · Weeks 6–8

RAG Done Properly

RAG shows up in nearly half of agent take-home assignments, but building a pipeline is table stakes. The senior differentiator is *measuring* it. Build the full path (ingest → chunk → embed → index → retrieve → rerank → generate with citations), and have an evaluation harness running against it from day one.

After this module you can
  • Explain what embeddings are, why cosine similarity between them tracks semantic similarity, and where dense retrieval structurally fails
  • Choose and defend a chunking strategy (fixed-size vs. structural, size, overlap) with your own numbers
  • Stand up Qdrant locally and implement hybrid search: BM25 + dense vectors fused with Reciprocal Rank Fusion (RRF)
  • Add a cross-encoder reranking stage and explain the bi-encoder/cross-encoder trade-off
  • Apply query rewriting, decomposition, and HyDE (Hypothetical Document Embeddings) when the user's question is a bad search query
  • Build a labeled eval set and report precision@k, recall@k, MRR, faithfulness, and answer relevance
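To make the first objective concrete: cosine similarity measures the angle between two embedding vectors and ignores their length, so two texts score high when their embeddings point in a similar direction. Below is a minimal pure-Python sketch, assuming embeddings arrive as plain lists of floats. The 3-dimensional vectors are invented toy values; real embedding models produce hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between vectors a and b, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy "embeddings" (hypothetical values, for illustration only).
query = [0.9, 0.1, 0.0]
doc_same_topic = [1.8, 0.2, 0.1]   # roughly the same direction, about twice as long
doc_other_topic = [0.0, 0.2, 0.9]  # points somewhere else entirely

print(round(cosine_similarity(query, doc_same_topic), 3))
print(round(cosine_similarity(query, doc_other_topic), 3))
```

Because only direction matters, scaling a vector leaves its score unchanged. If embeddings are normalized to unit length up front, cosine similarity reduces to a plain dot product, which is why vector stores commonly normalize on insert.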
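For the chunking objective, fixed-size windows with overlap are the simplest baseline to measure against. This is a minimal sketch, assuming size is counted in whitespace-separated words; production pipelines more often count model tokens. `chunk_words`, `chunk_size`, and `overlap` are illustrative names, and the defaults are placeholders rather than recommendations.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


# Ten dummy words, chunked into windows of 4 that overlap by 1.
doc = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(doc, chunk_size=4, overlap=1):
    print(chunk)
```

Overlap gives text near a boundary a second chance to land in a chunk together with its surrounding context, at the cost of indexing some words twice. Structural chunking (splitting on headings, paragraphs, or code blocks) is the alternative to compare it against on your own eval set.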
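Hybrid search needs a way to merge the BM25 ranking with the dense-vector ranking. Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every list it appears in. Because it uses only ranks, the two retrievers' raw scores, which live on incomparable scales, never need calibrating. Below is a minimal sketch, assuming each retriever returns an ordered list of unique document IDs. k = 60 is the commonly used default, and the IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs; returns (doc_id, fused_score) pairs, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by doc ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["doc3", "doc1", "doc7"]    # hypothetical keyword-retrieval results
dense_hits = ["doc1", "doc5", "doc3"]   # hypothetical vector-retrieval results

for doc_id, score in reciprocal_rank_fusion([bm25_hits, dense_hits]):
    print(doc_id, round(score, 4))
```

Here doc1 edges out doc3: ranks 2 and 1 give 1/62 + 1/61, slightly more than the 1/61 + 1/63 that doc3 earns for ranks 1 and 3. Documents found by only one retriever (doc5, doc7) still appear, just lower.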
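For the last objective, the retrieval-side metrics are plain arithmetic over a labeled eval set. This is a minimal sketch, assuming each eval query is labeled with the set of chunk IDs that should be retrieved and that the retriever returns an ordered list of IDs. The queries and IDs are invented. Faithfulness and answer relevance grade the generated answer rather than the ranking and are usually scored by a human or an LLM judge, so they are not covered here.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant IDs (divides by k, not by hits returned)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(retrieved[:k]) & relevant) / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant IDs that show up somewhere in the top k."""
    if not relevant:
        raise ValueError("recall is undefined when no documents are labeled relevant")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Mean over queries of 1/rank of the first relevant hit (0 if none is retrieved)."""
    if not runs:
        raise ValueError("need at least one query")
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


# Hypothetical labeled eval set: query -> chunk IDs a human marked as relevant.
labels = {
    "how do I rotate API keys?": {"doc2", "doc9"},
    "what is the refund window?": {"doc4"},
}
# Hypothetical retriever output per query, best first.
results = {
    "how do I rotate API keys?": ["doc9", "doc1", "doc2", "doc5"],
    "what is the refund window?": ["doc7", "doc3", "doc4"],
}

K = 3
for query, relevant in labels.items():
    retrieved = results[query]
    print(f"{query!r}: P@{K}={precision_at_k(retrieved, relevant, K):.2f} "
          f"R@{K}={recall_at_k(retrieved, relevant, K):.2f}")
print(f"MRR={mean_reciprocal_rank([(results[q], labels[q]) for q in labels]):.3f}")
```

As k grows, recall@k can only rise while precision@k tends to fall, and MRR rewards putting the first relevant chunk near the top. Reporting all three for each chunking or retrieval configuration is what turns pipeline tweaks into numbers you can defend.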

Lessons

Best external resources

Curated reading, docs, and tools that pair with this module.