Module 5 · Phase 3: Scale & interoperability · Weeks 12–14

Multi-Agent Systems & Frameworks

Frameworks enter now that you've earned them by building everything by hand. This module covers LangGraph for stateful, checkpointed, resumable agent graphs; orchestrator-worker and handoff patterns; and the senior-level judgment call that interviews probe hardest: when multi-agent is worth the coordination cost (usually it isn't).

After this module you can

▸ Build a LangGraph StateGraph from memory: typed state, nodes, fixed and conditional edges, compile, invoke
▸ Enable checkpointing so a graph can resume after a crash, time-travel to past states, and pause for humans
▸ Implement human-in-the-loop interrupts that pause a graph mid-run and resume with human feedback
▸ Implement orchestrator-worker and handoff patterns with structured briefs, not raw transcripts
▸ Quantify error compounding and coordination cost, and argue when multi-agent is and isn't justified
▸ Run a single-agent baseline comparison and report quality, cost, and latency honestly

Lessons

What Frameworks Actually Buy You

You built the loop, memory, retries, and tracing by hand in Modules 1–4. That was the point: now you can evaluate a framework's version of each instead of trusting it blindly. LangGraph's pitch in one sentence: your agent loop, reified as a graph with persistent state.

State, Nodes & Conditional Edges

The state schema is the most important design decision in a LangGraph system: it's the contract between every node, the thing the checkpointer persists, and — in multi-agent graphs — the communication channel between agents. Get it right and routing, fan-out, and debugging all get easier.
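To make the "state as contract" idea concrete, here is a framework-free sketch of how a typed state with per-key reducers could merge the partial updates that nodes return. It imitates the idea behind reducers but is not LangGraph's implementation. `ResearchState`, `apply_update`, and the two searcher nodes are hypothetical names invented for this illustration.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class ResearchState(TypedDict):
    # Plain key: a node's update overwrites the previous value.
    question: str
    # Annotated key: updates are combined with the reducer (list concatenation).
    findings: Annotated[list[str], operator.add]


def apply_update(state: dict, update: dict, schema: type) -> dict:
    """Merge a node's partial update into state, using each key's reducer if it has one."""
    hints = get_type_hints(schema, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        if key not in hints:
            # The schema is the contract: unknown keys are rejected.
            raise KeyError(f"{key!r} is not part of the state schema")
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)
        else:
            merged[key] = value
    return merged


def searcher_a(state: dict) -> dict:
    return {"findings": [f"A: source for {state['question']}"]}


def searcher_b(state: dict) -> dict:
    return {"findings": [f"B: source for {state['question']}"]}


state = {"question": "What is checkpointing?", "findings": []}
for node in (searcher_a, searcher_b):
    state = apply_update(state, node(state), ResearchState)

print(state["findings"])
```

Because `findings` carries a reducer, both searchers can write to it without clobbering each other, which is what makes fan-out workable. A key without a reducer, such as `question`, is simply overwritten.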

Checkpoints, Resume & Human-in-the-Loop

The checkpointer persists graph state after every step, keyed by thread ID. That one mechanism gives you crash recovery, time-travel debugging, and — combined with interrupts — humans who can approve or reject an agent's work days after the process exited.
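The mechanism can be sketched without any framework: save a snapshot after every step, key it by thread ID, and start from the latest snapshot on the next run. This is a toy model under simplifying assumptions (linear steps, in-memory storage), not LangGraph's checkpointer API. `ToyCheckpointer`, `run`, and the step functions are hypothetical.

```python
from typing import Callable

Step = Callable[[dict], dict]


class ToyCheckpointer:
    """In-memory history of (next_step_index, state) snapshots per thread ID."""

    def __init__(self) -> None:
        self.history: dict[str, list[tuple[int, dict]]] = {}

    def save(self, thread_id: str, next_step: int, state: dict) -> None:
        self.history.setdefault(thread_id, []).append((next_step, dict(state)))

    def latest(self, thread_id: str) -> tuple[int, dict]:
        snapshots = self.history.get(thread_id)
        return snapshots[-1] if snapshots else (0, {})


def run(steps: list[Step], saver: ToyCheckpointer, thread_id: str, initial: dict) -> dict:
    """Run steps in order, checkpointing after each; resume from the last snapshot."""
    next_step, saved = saver.latest(thread_id)
    state = dict(saved) if next_step else dict(initial)
    for index in range(next_step, len(steps)):
        state = {**state, **steps[index](state)}
        saver.save(thread_id, index + 1, state)
    return state


calls: list[str] = []
crash_once = {"armed": True}


def plan(state: dict) -> dict:
    calls.append("plan")
    return {"plan": "search then write"}


def search(state: dict) -> dict:
    calls.append("search")
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("simulated crash")
    return {"notes": "3 sources"}


saver = ToyCheckpointer()
steps = [plan, search]
try:
    run(steps, saver, "thread-1", {"question": "q"})
except RuntimeError:
    pass  # the "process" died after plan was checkpointed

final = run(steps, saver, "thread-1", {"question": "q"})
print(calls)  # plan runs once; only search is retried
```

The same `history` list is what a time-travel view would read from: each entry is a past state you could inspect or restart from. A human approval gate fits the same shape, with the run stopping at a step and a later call resuming from the saved snapshot.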

Orchestrator-Workers, Handoffs & What Crosses the Boundary

The two structural patterns behind almost every multi-agent system — a central planner delegating to specialists, versus peers transferring control — and the design decision that determines whether either works: what actually gets passed between agents.
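One way to picture "what gets passed between agents" is a small, typed brief that the delegating agent fills in, instead of forwarding its whole transcript. The sketch below is an illustration only: the `Brief` class, its fields, and the sample transcript are assumptions, not a schema from this course or from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Brief:
    """What crosses the agent boundary: a compact task contract, not a transcript."""

    objective: str
    context: tuple[str, ...] = ()      # only the facts the worker needs
    constraints: tuple[str, ...] = ()
    output_format: str = "bullet list, one citation per claim"

    def to_prompt(self) -> str:
        lines = [f"Objective: {self.objective}"]
        lines += [f"Context: {fact}" for fact in self.context]
        lines += [f"Constraint: {rule}" for rule in self.constraints]
        lines.append(f"Return: {self.output_format}")
        return "\n".join(lines)


# The planner's raw transcript stays on the planner's side of the boundary.
transcript = [
    "user: compare vector databases for our RAG stack",
    "planner: thinking about which dimensions matter...",
    "planner: maybe latency, cost, filtering?",
    "planner: I will ask a searcher about filtering support",
]

brief = Brief(
    objective="Find how two candidate vector databases support metadata filtering",
    context=("The user is choosing a vector database for a RAG stack",),
    constraints=("Use primary documentation only", "At most 5 sources"),
)

worker_prompt = brief.to_prompt()
print(worker_prompt)
```

The worker sees the objective, the minimum context, the constraints, and the expected return shape. None of the planner's intermediate musings leak through, and the brief is small enough to log verbatim at every handoff.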

When Multi-Agent Is Worth It (Usually It Isn't)

The senior-engineer take that interviews reward: multi-agent adds latency, cost, and compounding error rates, and most systems that ship as five agents should have shipped as one good agent. Learn the three legitimate justifications, the math of compounding failure, and how to run the baseline comparison that keeps you honest.
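The compounding math is short enough to check yourself. Assuming each step in a sequential chain succeeds independently with the same probability p, the whole chain succeeds with probability p to the power n. The 95% per-step figure below is an illustrative input, not a measured number, and `chain_success` is a hypothetical helper.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in a sequential chain succeeds.

    Assumes steps fail independently and share the same success rate.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


for n in (1, 3, 5, 10):
    print(n, round(chain_success(0.95, n), 3))
# 1 0.95
# 3 0.857
# 5 0.774
# 10 0.599
```

Under these assumptions, a five-step pipeline of individually reliable agents fails on roughly one run in four, and a ten-step one on about two in five. Real systems can do better (a critic that catches errors) or worse (correlated failures), which is why the measured baseline comparison matters more than the formula.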

12 questions · pass ≥ 80%

Lab: Multi-Agent Research System with Single-Agent Baseline (portfolio)

Build a planner → parallel searchers → writer → critic research system in LangGraph that answers questions with a cited brief — checkpointed, resumable, with a human approval gate and logged structured handoffs — then benchmark it honestly against a single agent with the same tools. Starter code lives in labs/lab05-multi-agent/.
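For the benchmark step, one possible shape for the report is sketched below: score both systems on the same questions with the same rubric, then report the quality difference alongside cost and latency ratios. The `Run` fields, the `compare` helper, and every number in the sample data are placeholders invented for illustration, not lab results or part of the starter code.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    quality: float    # rubric score in [0, 1], graded identically for both systems
    cost_usd: float
    latency_s: float


def summarize(runs: list[Run]) -> dict[str, float]:
    """Average each metric over a set of runs."""
    return {
        "quality": mean(r.quality for r in runs),
        "cost_usd": mean(r.cost_usd for r in runs),
        "latency_s": mean(r.latency_s for r in runs),
    }


def compare(baseline: list[Run], candidate: list[Run]) -> dict[str, float]:
    """Quality delta (candidate minus baseline) plus cost and latency ratios."""
    b, c = summarize(baseline), summarize(candidate)
    return {
        "quality_delta": c["quality"] - b["quality"],
        "cost_ratio": c["cost_usd"] / b["cost_usd"],
        "latency_ratio": c["latency_s"] / b["latency_s"],
    }


# Placeholder inputs: replace with your own measured runs.
single_agent = [Run(0.70, 0.10, 20.0), Run(0.80, 0.10, 20.0)]
multi_agent = [Run(0.80, 0.40, 30.0), Run(0.90, 0.40, 30.0)]

report = compare(single_agent, multi_agent)
for metric, value in report.items():
    print(f"{metric}: {value:.2f}")
```

Reporting all three numbers together is the point: a quality gain reads very differently once it sits next to the cost and latency multiples it took to get there.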

Best external resources

Curated reading, docs, and tools that pair with this module.

LangChain Academy — Intro to LangGraph

Free official course; do modules 1–4 before Lab 05.

Anthropic — How we built our multi-agent research system

Real production numbers on orchestrator-worker, including an honest account of token cost.

Cognition — Don't Build Multi-Agents

The counterargument. Read both sides; interviews reward the synthesis.

Hugging Face AI Agents Course

Free, certified; broad framework coverage (smolagents, LlamaIndex, LangGraph).

LangGraph reference docs

The API reference for Lab 05 — state, reducers, checkpointers, interrupts.