Module 8 · Phase 5: Capstone & interview readiness · Weeks 21–26

Capstone: The Autonomous Coding Agent

The portfolio anchor. An autonomous software-development agent that takes a GitHub issue, explores the codebase, implements a fix in a sandbox, runs the tests, and opens a PR gated on human approval — shipped with eval results, a cost analysis, and an honest limitations doc. Everything from Modules 1–7 converges here, then you turn it into interview narratives.

After this module you can

▸ Architect an issue-to-PR coding agent with checkpointed plans and a sandboxed workspace
▸ Implement codebase exploration that locates relevant files without stuffing the whole repo into context
▸ Choose and apply an edit strategy (search/replace vs. full-file) and defend the trade-off
▸ Run a test-driven repair loop with bounded retries: reproduce (red) → fix → verify (green)
▸ Gate PR creation behind human-in-the-loop approval showing diff, test results, and cost
▸ Assemble a small SWE-bench-style local eval set and report success rate, a partial-success taxonomy, and cost/time per issue
▸ Write a frank limitations doc that scopes the agent honestly
▸ Turn the capstone into system-design answers and STAR behavioral stories at a senior bar

Lessons

Architecture & Codebase Exploration

The capstone is the sum of every prior module: the agent loop from Lab 02, RAG-style retrieval, memory, evals, tracing, and HITL. First the architecture and scope; then the hardest sub-problem — finding the few relevant files in a repo far too large for the context window.
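To make the exploration problem concrete, here is a minimal sketch of one cheap first-pass filter: rank files by how many identifier-like terms they share with the issue text, then read only the top few. This is an illustrative baseline, not the module's prescribed method; the function names `tokenize` and `rank_files`, the in-memory `files` mapping, and the scoring rule are all assumptions made for the example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and keep identifier-like terms of 3+ characters."""
    return set(re.findall(r"[a-z_][a-z0-9_]{2,}", text.lower()))


def rank_files(issue_text: str, files: dict[str, str], top_k: int = 3) -> list[str]:
    """Return up to top_k paths ranked by term overlap with the issue.

    `files` maps a path to its content (hypothetical input shape; a real
    agent would stream files from the sandboxed checkout instead).
    """
    issue_terms = tokenize(issue_text)
    scored = []
    for path, content in files.items():
        # Path components count too: "auth/login.py" matches "login".
        overlap = len(issue_terms & (tokenize(path) | tokenize(content)))
        if overlap:
            scored.append((overlap, path))
    # Highest overlap first; break ties alphabetically so output is stable.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:top_k]]


if __name__ == "__main__":
    repo = {
        "auth/login.py": "def login(user, password): ...",
        "billing/invoice.py": "def total(invoice): ...",
    }
    print(rank_files("login fails when password contains unicode", repo))
```

A lexical filter like this only narrows the candidate set; in practice it would feed a second stage (symbol search, embeddings, or the model reading the shortlisted files) rather than decide on its own.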

Edit Strategies & the Test-Driven Repair Loop

Now the agent changes code and proves the change works. Search/replace versus full-file rewrites and why the choice matters; then the heart of the capstone: a red-to-green repair loop with bounded retries that writes a failing test, makes it pass, and never spins forever.
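The two ideas in this lesson can be sketched together: a search/replace edit that refuses ambiguous matches, and a repair loop that confirms the test is red, tries a bounded number of edits, and stops. This is a minimal sketch under stated assumptions: `Edit`, `apply_search_replace`, `repair_loop`, and the injected `propose_edit` and `tests_pass` callables are hypothetical names, and a real agent would call a model and a sandboxed test runner where the callables sit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Edit:
    search: str   # exact text expected to appear once in the file
    replace: str  # text to put in its place


def apply_search_replace(source: str, edit: Edit) -> str:
    """Apply one edit, rejecting it unless the search text matches exactly once."""
    count = source.count(edit.search)
    if count != 1:
        raise ValueError(f"expected exactly 1 match, found {count}")
    return source.replace(edit.search, edit.replace, 1)


def repair_loop(
    source: str,
    propose_edit: Callable[[str, str], Edit],  # (source, feedback) -> Edit
    tests_pass: Callable[[str], bool],         # runs the reproducing test
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the patched source once tests pass, or None when the budget runs out."""
    # Red: the reproducing test must fail before any fix is attempted.
    if tests_pass(source):
        raise RuntimeError("reproducing test already passes; bug not reproduced")
    feedback = "reproducing test fails on the original source"
    for attempt in range(1, max_attempts + 1):
        edit = propose_edit(source, feedback)
        try:
            # Each attempt starts from the original source, so a bad
            # candidate never contaminates the next one.
            candidate = apply_search_replace(source, edit)
        except ValueError as exc:
            feedback = f"attempt {attempt}: edit rejected ({exc})"
            continue
        # Green: stop as soon as the test passes.
        if tests_pass(candidate):
            return candidate
        feedback = f"attempt {attempt}: tests still failing"
    return None  # bounded: give up and report instead of spinning
```

Returning `None` rather than raising on exhaustion is a design choice for this sketch: running out of attempts is an expected outcome that the caller should record and report, not an exceptional one.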

PR Etiquette, HITL Gating & Evaluating a Coding Agent

A green test suite isn't a merge. The agent produces a well-formed PR gated on human approval, then you measure whether the whole thing actually works — a small SWE-bench-style eval set you assemble yourself, with a partial-success taxonomy and cost/time per issue.
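As an illustration of the reporting side, the sketch below aggregates per-issue results into a success rate, a partial-success taxonomy, and mean cost and time per issue. The `IssueResult` shape and the outcome labels (`resolved`, `wrong_file`, `tests_still_red`) are assumptions invented for this example; the taxonomy you actually use should come from reading your own failed runs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class IssueResult:
    issue_id: str
    outcome: str     # hypothetical labels, e.g. "resolved", "wrong_file"
    cost_usd: float  # total model spend for this issue
    seconds: float   # wall-clock time for this issue


def summarize(results: list[IssueResult]) -> dict:
    """Aggregate an eval run into headline numbers plus a failure taxonomy."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    taxonomy = Counter(r.outcome for r in results)
    return {
        "issues": n,
        # Only fully resolved issues count toward the success rate.
        "success_rate": taxonomy["resolved"] / n,
        # Everything else stays visible by category instead of being
        # collapsed into a single "failed" bucket.
        "taxonomy": dict(taxonomy),
        "mean_cost_usd": sum(r.cost_usd for r in results) / n,
        "mean_seconds": sum(r.seconds for r in results) / n,
    }


if __name__ == "__main__":
    run = [
        IssueResult("issue-1", "resolved", 0.50, 60.0),
        IssueResult("issue-2", "resolved", 0.25, 120.0),
        IssueResult("issue-3", "wrong_file", 0.75, 30.0),
        IssueResult("issue-4", "tests_still_red", 0.50, 30.0),
    ]
    print(summarize(run))
```

With an eval set this small, report the raw counts alongside the rate so readers can judge how much weight the percentage deserves.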

The Limitations Doc & Interview Readiness

Two things carry the capstone into your career. The first is a frank limitations doc that scopes the agent honestly, the artifact senior engineers respect most. The second is turning the whole project into interview capital: system-design answers, STAR stories, and a take-home strategy for Gate G4.

12 questions · pass ≥ 80%

Lab: Capstone: Autonomous Coding Agent (Issue → Tested, HITL-Gated PR) · portfolio

Build the portfolio anchor: an agent that takes a GitHub issue, explores the repo, implements a fix in a sandbox, writes a reproducing test and iterates to green with bounded retries, and opens a draft PR gated on human approval. Ship it with full tracing, a per-issue cost report, an evaluation across ≥10 issues with a partial-success taxonomy, and a frank limitations doc. Then turn it into Gate G4 interview material.
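The approval gate at the end of that pipeline can be sketched as a function that shows the reviewer the diff, test results, and cost, and opens the PR only on an explicit "approve". This is a minimal sketch, not the lab's reference solution: `PrDraft`, `render_approval_request`, and `gate_pr` are hypothetical names, and `ask` and `open_pr` are injected callables standing in for your UI and your GitHub client.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PrDraft:
    title: str
    diff: str
    tests_passed: int
    tests_failed: int
    cost_usd: float


def render_approval_request(draft: PrDraft) -> str:
    """Build the text a human reviewer sees before anything is pushed."""
    return "\n".join([
        f"Proposed PR: {draft.title}",
        f"Tests: {draft.tests_passed} passed, {draft.tests_failed} failed",
        f"Cost: ${draft.cost_usd:.2f}",
        "Diff:",
        draft.diff,
        "Type 'approve' to open a draft PR; anything else cancels.",
    ])


def gate_pr(
    draft: PrDraft,
    ask: Callable[[str], str],          # shows the prompt, returns the reply
    open_pr: Callable[[PrDraft], str],  # opens the PR, returns an identifier
) -> Optional[str]:
    """Open the PR only if tests are green and a human explicitly approves."""
    # Never ask a human to approve a red suite.
    if draft.tests_failed > 0:
        return None
    answer = ask(render_approval_request(draft))
    # Default-deny: only the exact word "approve" proceeds.
    if answer.strip().lower() != "approve":
        return None
    return open_pr(draft)
```

For a terminal prototype, the built-in `input` can be passed as `ask`; the default-deny check means an empty reply or a typo cancels rather than merges.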

Best external resources

Curated reading, docs, and tools that pair with this module.

AI Engineering Field Guide

Real interview processes and take-home assignments from 50+ companies.

DataCamp — Top 30 Agentic AI Interview Questions

Cross-check your quiz mastery against an external bank.

The benchmark your capstone is a miniature of — mine it for eval-design ideas.

65% on SWE-bench Verified in ~100 lines of Python. Read every line before building your capstone: it is the existence proof that simple works.

SWE-agent (Princeton/Stanford)

The research-grade version: agent-computer interface design, trajectories to study.