Module 1: LLM API Mastery · Lesson 4 of 5 · 20 min

Structured Outputs & JSON Schema

When you need data, not prose: forcing model output to conform to a schema, and what to do when it doesn't.

Half of real-world LLM use isn't chat — it's extraction and classification: pull fields from an email, route a ticket, score a document. Downstream code needs types, not vibes. There are three levels of rigor for getting JSON out of a model.

| Approach | How | Guarantee |
| --- | --- | --- |
| Prompt & pray | "Respond only with JSON…" | None. Fine for prototypes only. |
| JSON mode | response_format: {type: "json_object"} (OpenAI) | Syntactically valid JSON, but any shape. |
| Schema-enforced | OpenAI structured outputs (json_schema, strict: true); or a forced tool call whose input_schema is your output schema | Conforms to your schema. OpenAI strict mode enforces this through constrained decoding; a forced tool call reliably produces the schema's shape but should still be validated. |
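The gap between the second and third rows is easy to underestimate. The following is a minimal sketch, using only the standard library, of a reply that is perfectly valid JSON yet useless to downstream code. The `check_shape` helper and the sample reply are illustrative assumptions, not part of any SDK.

```python
import json

# Hypothetical reply produced under JSON mode: parses fine, wrong shape.
reply = '{"type": "billing", "severity": "high"}'

ALLOWED_CATEGORIES = ("billing", "bug", "feature_request", "other")

def check_shape(obj: object) -> list[str]:
    """Return a list of schema problems; an empty list means the shape is OK."""
    if not isinstance(obj, dict):
        return ["top-level value is not an object"]
    problems = []
    if obj.get("category") not in ALLOWED_CATEGORIES:
        problems.append("category missing or not in enum")
    sev = obj.get("severity")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(sev, int) or isinstance(sev, bool) or not 1 <= sev <= 5:
        problems.append("severity must be an integer from 1 to 5")
    if not isinstance(obj.get("summary"), str):
        problems.append("summary missing or not a string")
    return problems

parsed = json.loads(reply)       # succeeds: the syntax is valid
problems = check_shape(parsed)   # every field fails the shape check
```

JSON mode would accept this reply; only a schema check (or schema-enforced decoding) catches that the field names and types are wrong.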
the tool-call trick for guaranteed structure (Anthropic)
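# Define your desired OUTPUT as a tool schema, then force the model to "call" it.
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
ticket_text = "I was charged twice for my subscription this month."  # example input
resp = client.messages.create(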
    model="claude-sonnet-4-5", max_tokens=1024,
    tools=[{
        "name": "record_ticket",
        "description": "Record the classified support ticket.",
        "input_schema": {
            "type": "object",
            "properties": {
                "category": {"type": "string",
                             "enum": ["billing", "bug", "feature_request", "other"]},
                "severity": {"type": "integer", "minimum": 1, "maximum": 5},
                "summary":  {"type": "string"},
            },
            "required": ["category", "severity", "summary"],
        },
    }],
    tool_choice={"type": "tool", "name": "record_ticket"},   # MUST call it
    messages=[{"role": "user", "content": f"Classify this ticket: {ticket_text}"}],
)
data = next(b for b in resp.content if b.type == "tool_use").input
# data is a dict matching the schema — no parsing prose
tool_choice forces the call, so the 'tool' is really just an output mold. This is the standard Anthropic pattern for structured extraction. On OpenAI, prefer native structured outputs with strict: true, which constrains decoding to the schema.
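For the OpenAI side, here is a hedged sketch of what the strict request body might look like for the same ticket schema. It assumes the Chat Completions `response_format` shape with `type: "json_schema"`. The `strict_ready` helper is hypothetical and checks only the two object-level rules strict mode is documented to require: every property listed in `required`, and `additionalProperties` set to false. Expressing the 1 to 5 range as an enum is an illustrative choice, not a requirement. The API call itself is left as a comment so the block runs without credentials.

```python
# Sketch of an OpenAI-style strict structured-output request body.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string",
                     "enum": ["billing", "bug", "feature_request", "other"]},
        # Range expressed as an enum (illustrative choice).
        "severity": {"type": "integer", "enum": [1, 2, 3, 4, 5]},
        "summary": {"type": "string"},
    },
    "required": ["category", "severity", "summary"],
    "additionalProperties": False,
}

def strict_ready(schema: dict) -> bool:
    """Hypothetical pre-flight check for the object-level strict-mode rules."""
    props = set(schema.get("properties", {}))
    return (schema.get("additionalProperties") is False
            and props == set(schema.get("required", [])))

response_format = {
    "type": "json_schema",
    "json_schema": {"name": "ticket", "strict": True, "schema": TICKET_SCHEMA},
}

# Illustrative call, not executed here; the model name is a placeholder:
# from openai import OpenAI
# resp = OpenAI().chat.completions.create(
#     model="<your-model>",
#     messages=[{"role": "user", "content": "Classify this ticket: ..."}],
#     response_format=response_format,
# )
# data = json.loads(resp.choices[0].message.content)
```

Running a check like `strict_ready` before sending the request surfaces schema mistakes locally instead of as an API error.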

Validate anyway — and repair

validate with Pydantic, repair with a feedback retry
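from typing import Literal

from pydantic import BaseModel, Field, ValidationError

class Ticket(BaseModel):
    # Mirror the tool schema: enum for category, bounded integer for severity.
    category: Literal["billing", "bug", "feature_request", "other"]
    severity: int = Field(ge=1, le=5)
    summary: str

def extract(text: str, max_retries: int = 2) -> Ticket:
    base_prompt = f"Classify this ticket: {text}"
    prompt = base_prompt
    last_error = None
    for _ in range(max_retries + 1):
        # your API call; assumed to return a dict (e.g. the tool_use input)
        raw = call_model(prompt)
        try:
            # if call_model returns a JSON string, use Ticket.model_validate_json(raw)
            return Ticket.model_validate(raw)
        except ValidationError as e:
            last_error = e
            # feed the error BACK to the model; it usually self-corrects
            prompt = (f"{base_prompt}\n"
                      f"Your previous output failed validation:\n{e}\n"
                      f"Return corrected JSON only.")
    # chain the last validation error so the failure is debuggable
    raise RuntimeError("extraction failed after retries") from last_error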
Order of mitigations for malformed output: (1) validate and retry with the error message included — cheapest fix, works most of the time; (2) tighten the schema/prompt (enums, strict mode, lower temperature); (3) fall back to a stronger model or a deterministic parser. Never silently json.loads and hope.
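The escalation order above can be sketched as a small loop: retry with the error fed back (step 1), then move to the next, stronger model (step 3). Step 2, tightening the schema or prompt, is a design-time change and does not appear in the code. Everything here is an assumption made for illustration: the `Model` callable type, the stub models, and the deliberately tiny `validate` function stand in for real API calls and a real schema.

```python
import json
from typing import Callable

Model = Callable[[str], str]  # hypothetical: prompt in, raw text out

def validate(raw: str) -> dict:
    """Parse and check the reply; raise ValueError with a message on failure."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"invalid JSON: {e}") from e
    if not isinstance(obj, dict) or not isinstance(obj.get("severity"), int):
        raise ValueError("expected an object with an integer 'severity'")
    return obj

def extract_with_escalation(text: str, models: list[Model], retries: int = 1) -> dict:
    """Try each model in order; within a model, retry with the error fed back."""
    for model in models:                      # step 3: fall back to the next model
        prompt = f"Classify this ticket: {text}"
        for _ in range(retries + 1):
            try:
                return validate(model(prompt))
            except ValueError as e:           # step 1: retry with the error included
                prompt = (f"Classify this ticket: {text}\n"
                          f"Previous output failed validation: {e}\n"
                          "Return corrected JSON only.")
    raise RuntimeError("all models and retries exhausted")

# Stub models standing in for real API calls.
def weak_model(prompt: str) -> str:
    return "Sure! The severity is high."      # never valid JSON

def strong_model(prompt: str) -> str:
    return '{"severity": 4}'

result = extract_with_escalation("card charged twice", [weak_model, strong_model])
```

Here the weak stub fails both attempts, so the loop escalates and the strong stub's reply is returned. Raising at the end, rather than returning a partial result, keeps the failure visible to the caller.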
Tip
Tool calling vs. structured output — when to use which? Tool = the model needs information or effects mid-task (search, DB query), possibly several times. Structured output = you need the final answer in a shape (classification, extraction). If there's no action to perform, don't dress extraction up as an agent loop — force one schema'd output and be done.
Key takeaways
  • JSON mode guarantees syntax; schema enforcement (strict structured outputs / forced tool call) guarantees shape.
  • Constrain aggressively: enums, min/max, required — every constraint removes a failure mode.
  • Validate with Pydantic/Zod even when 'guaranteed'; retry with the validation error fed back.
  • Extraction ≠ agent. No action needed → one forced structured call.