Architecture & Codebase Exploration
The capstone combines every prior module: the agent loop from Lab 02, RAG-style retrieval, memory, evals, tracing, and human-in-the-loop (HITL) approval. This lesson covers the architecture and scope first, then the hardest sub-problem: finding the few relevant files in a repo far too large for the context window.
Hiring research is blunt about this: an issue-to-PR coding agent is the single strongest portfolio piece you can build. It exercises everything — planning, retrieval, tool use, sandboxing, test-driven iteration, evaluation, cost accounting, and human oversight — in one artifact that a reviewer can actually run. This module builds it, then converts it into interview capital.
The six stages
- Input: a GitHub issue URL (or a local issue file) — title, body, and any repro steps.
- Explore: map the repo, locate the relevant code, and state your understanding of the bug plus a plan. Checkpoint the plan.
- Implement: write the fix in a sandboxed workspace — a git worktree or container, never the real tree.
- Verify: run the repo's test suite, write at least one new test reproducing the issue (red → green), iterate on failures up to a bounded retry cap.
- Deliver: open a draft PR (or produce a patch + PR description) — gated on HITL approval showing the diff, test results, and cost.
- Observe & evaluate: full tracing and a per-issue cost report; run across an eval set and report results.
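The "bounded retry cap" in the Verify stage can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the course's implementation: `SuiteResult`, `verify_with_retries`, `run_tests`, `propose_fix`, and the default cap of 3 are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class SuiteResult:
    """Outcome of one test run (hypothetical shape)."""
    passed: bool
    output: str  # captured test output, handed back to the agent on failure


def verify_with_retries(
    run_tests: Callable[[], SuiteResult],
    propose_fix: Callable[[str], None],
    max_attempts: int = 3,
) -> Tuple[bool, int]:
    """Run the tests; on failure pass the output to propose_fix and retry.

    Returns (passed, attempts_used). The loop stops at max_attempts so a
    stuck agent cannot burn tokens indefinitely.
    """
    for attempt in range(1, max_attempts + 1):
        result = run_tests()
        if result.passed:
            return True, attempt
        # Only ask for another fix if there is an attempt left to check it.
        if attempt < max_attempts:
            propose_fix(result.output)
    return False, max_attempts
```

In a real pipeline `run_tests` would shell out to the repo's test runner inside the sandbox and `propose_fix` would call the model with the failure output. Injecting both as callables keeps the retry policy testable on its own.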
Architecture decisions worth defending
| Decision | Options | Sensible default |
|---|---|---|
| Orchestration | Hand-rolled agent loop vs. a graph framework (e.g. LangGraph) | Checkpointed plan either way; a framework earns its keep once you need durable resume across stages |
| Plan durability | In-memory vs. persisted checkpoints | Persist — a crash mid-fix shouldn't discard exploration work |
| Sandbox | Git worktree vs. container | Worktree for speed and simplicity on trusted repos; container when running untrusted code |
| Edit strategy | Search/replace vs. full-file rewrite | Search/replace by default (cheaper, safer diffs); covered in the next lesson |
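The "persist" default for plan durability can be illustrated with an atomic write, so a crash in the middle of a write never leaves a half-written checkpoint behind. This is a sketch, not the course's code: `save_plan`, `load_plan`, and the `.tmp` suffix are hypothetical, and the plan fields used with them are illustrative.

```python
import json
import os
import pathlib
from typing import Optional


def save_plan(plan: dict, path: pathlib.Path) -> None:
    """Write the plan to a temp file, then swap it into place."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(plan, indent=2))
    # os.replace is atomic when both paths are on the same filesystem,
    # so readers see either the old checkpoint or the new one, never a mix.
    os.replace(tmp, path)


def load_plan(path: pathlib.Path) -> Optional[dict]:
    """Return the checkpointed plan, or None if no checkpoint exists yet."""
    if not path.is_file():
        return None
    return json.loads(path.read_text())
```

On startup the pipeline can call `load_plan` first and skip the explore stage when it returns a plan, which is what makes a crash mid-fix cheap to recover from.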
Exploration strategy: agentic search over dumping
There are two ways to locate relevant code, and the agentic one usually wins for this task. Embedding-based retrieval (chunk the repo, embed, semantic search on the issue text) is fine for concept-level queries such as 'where is auth handled?'. But bugs are usually tied to specific symbols, error strings, and call sites, and that is where agentic grep-and-read shines: give the agent tools to search for symbols, list a directory, and read a file, and let it navigate the way a human engineer does. In practice you combine them: semantic search to seed candidates, then grep/read to confirm and expand.
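Because bugs tend to hinge on specific symbols and error strings, one cheap way to seed the search is to pull grep-able terms out of the issue text before the agent starts exploring. The sketch below is a rough heuristic under stated assumptions, not part of the course code: `extract_search_terms` and its three regexes are hypothetical, and they will both miss and over-match on real issues.

```python
import re
from typing import List

# Backticked code spans, e.g. `parse_config`.
_BACKTICKED = re.compile(r"`([^`\n]+)`")
# Exception-style names, e.g. KeyError, ValidationError.
_EXCEPTION = re.compile(r"\b[A-Z][A-Za-z0-9]*(?:Error|Exception)\b")
# Identifiers (optionally dotted) immediately followed by "(", e.g. load_user(
_CALL = re.compile(r"\b([A-Za-z_][\w.]*)\(")


def extract_search_terms(issue_text: str, limit: int = 10) -> List[str]:
    """Return candidate search terms: backticked spans first, then
    exception names, then called identifiers, with duplicates removed."""
    found = (
        _BACKTICKED.findall(issue_text)
        + _EXCEPTION.findall(issue_text)
        + _CALL.findall(issue_text)
    )
    seen, terms = set(), []
    for term in found:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            terms.append(term)
    return terms[:limit]
```

The returned terms can be offered to the agent as starting points for its symbol search, or merged with semantic-search candidates. The agent should still read the files it finds rather than trust the seed list.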
```python
import pathlib
import subprocess

# Resolve once so a symlinked sandbox still compares correctly below.
REPO = pathlib.Path("/sandbox/repo").resolve()


def _inside_repo(target: pathlib.Path) -> bool:
    """True if target is the repo root or lives beneath it."""
    return target == REPO or REPO in target.parents


def search_symbol(pattern: str, max_results: int = 40) -> str:
    """Grep the repo for a symbol or error string. Uses ripgrep if available."""
    # --max-count caps matches per file; max_results caps the total returned.
    for tool in (["rg", "-n", "--max-count", "3"], ["grep", "-rn"]):
        try:
            # "--" stops a pattern starting with "-" being parsed as a flag.
            out = subprocess.run(
                [*tool, "--", pattern, str(REPO)],
                capture_output=True, text=True, timeout=20,
            ).stdout
        except FileNotFoundError:
            continue  # tool not installed; fall back to the next one
        except subprocess.TimeoutExpired:
            return "error: search timed out; try a narrower pattern"
        lines = out.splitlines()[:max_results]
        return "\n".join(lines) or "no matches"
    return "error: neither rg nor grep is installed"


def list_dir(rel: str = ".") -> str:
    target = (REPO / rel).resolve()
    # Constrain to the repo; never let the agent wander the filesystem.
    if not _inside_repo(target):
        return "error: path escapes the repo"
    if not target.is_dir():
        return f"error: {rel} is not a directory"
    entries = sorted(p.name + ("/" if p.is_dir() else "") for p in target.iterdir())
    return "\n".join(entries)


def read_file(rel: str, start: int = 1, end: int = 400) -> str:
    target = (REPO / rel).resolve()
    if not _inside_repo(target):
        return "error: path escapes the repo"
    if not target.is_file():
        return f"error: {rel} is not a file"
    start = max(start, 1)  # a start below 1 would slice from the end of the file
    text = target.read_text(errors="replace").splitlines()
    window = text[start - 1:end]
    # Line numbers help the model reference and later edit precisely.
    return "\n".join(f"{i + start:>5} {ln}" for i, ln in enumerate(window))
```

```python
import json
import pathlib

PLAN_PATH = pathlib.Path("/sandbox/plan.json")
MAX_STEPS = 30  # assumed cap; bounds cost if the model never records a plan


def explore_and_plan(issue_text: str) -> dict:
    # call_model, EXPLORE_TOOLS, and run_explore_tool are not defined in this section.
    system = (
        "You are a senior engineer triaging a bug. Use search_symbol, "
        "list_dir, and read_file to locate the relevant code. Do NOT guess "
        "at file contents; read them. When confident, call record_plan with "
        "the files you will edit, your understanding of the bug, and the fix "
        "approach. Read only what you need; the repo is large."
    )
    messages = [{"role": "user", "content": f"Issue:\n{issue_text}"}]
    for _ in range(MAX_STEPS):
        resp = call_model(system, EXPLORE_TOOLS, messages)  # search/list/read/record_plan
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            # Without a new message the next call would repeat this turn forever.
            messages.append({
                "role": "user",
                "content": "Keep exploring with the tools, then call record_plan.",
            })
            continue
        results = []
        plan = None
        for block in resp.content:
            if block.type != "tool_use":
                continue
            if block.name == "record_plan":
                plan = block.input  # {"files":[...], "bug":"...", "fix":"..."}
            else:
                results.append(run_explore_tool(block))
        if plan is not None:
            # Checkpoint: survives a crash so W22's implement stage can resume.
            PLAN_PATH.write_text(json.dumps(plan, indent=2))
            return plan
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"no plan recorded after {MAX_STEPS} steps")
```

- An issue-to-PR agent is the strongest single portfolio piece; it exercises every prior module.
- Scope narrowly and honestly ('simple bug-fix issues, Python repos <10k LOC'); precise scoping signals seniority.
- Six stages: input → explore → implement (sandboxed) → verify → deliver (HITL) → observe/evaluate.
- The context window can't hold the repo; exploration is a retrieval problem.
- Prefer agentic grep-and-read (seeded by optional semantic search); bugs are about specific symbols and call sites.
- Constrain all file paths to the repo, window reads, and checkpoint the plan so implementation can resume.