Module 1: LLM API Mastery · Lesson 3 of 5 · 30 min
Tool Calling End-to-End
The mechanism that turns a text generator into something that can act. Crucial mental model: the model never executes anything — it emits structured JSON, and your code does the work.
◆ Key insight
Tool calling is just structured output plus a convention. The model generates JSON that matches a schema you provided; you run the corresponding function; you append the result to the messages; the model continues. The model has no network access, no filesystem, no side effects — you are its hands.
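To make "you are its hands" concrete, here is a minimal sketch that involves no API call at all. The tool_request dict is a hand-written stand-in for what a model might emit, and TOOL_REGISTRY and execute_tool_request are hypothetical names chosen for illustration, not part of any SDK.

```python
import json


def get_weather(city: str) -> str:
    """Stub tool: a real version would call a weather service."""
    return json.dumps({"city": city, "temp_c": 21, "sky": "clear"})


# Hypothetical registry mapping tool names to the functions that implement them.
TOOL_REGISTRY = {"get_weather": get_weather}


def execute_tool_request(request: dict) -> str:
    """Look up the requested tool by name and call it with the generated arguments."""
    func = TOOL_REGISTRY[request["name"]]
    return func(**request["input"])


# Hand-written stand-in for the structured output a model might emit.
tool_request = {"name": "get_weather", "input": {"city": "Tokyo"}}

result = execute_tool_request(tool_request)
print(result)
```

Nothing in this sketch touches a model: the "request" is just data, and the only thing that acts is the function your code chooses to call.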
The four-step dance
- You send messages plus tools: each tool has a name, a description, and a JSON-schema input_schema for its parameters.
- The model decides it needs a tool: the response contains a tool_use block (Anthropic) / tool_calls array (OpenAI) with the tool name, generated arguments, and a unique id. On Anthropic, stop_reason is "tool_use"; on OpenAI, finish_reason is "tool_calls".
- You execute the actual function with those arguments, then append (a) the assistant message verbatim, and (b) a tool_result referencing the same id, with the output as a string.
- The model continues: it may answer, or request another tool. Loop until stop_reason is no longer "tool_use" (normally "end_turn").
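As a picture of what the history looks like after one round trip, the sketch below writes the messages out as plain dicts and checks that the ids pair up. It assumes dict-shaped content blocks (the SDK returns objects with the same field names), the ids are made up, and unmatched_tool_ids is a hypothetical helper rather than anything the API provides.

```python
def unmatched_tool_ids(assistant_content: list, user_content: list) -> set:
    """Return ids that appear on only one side of a tool_use / tool_result exchange."""
    requested = {b["id"] for b in assistant_content if b["type"] == "tool_use"}
    answered = {b["tool_use_id"] for b in user_content if b["type"] == "tool_result"}
    return requested ^ answered  # symmetric difference: missing or extra ids


# Plain-dict picture of the history after one round trip (ids are invented).
messages = [
    {"role": "user", "content": "Should I bike to work in Tokyo today?"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "Let me check the weather."},
        {"type": "tool_use", "id": "toolu_example_1", "name": "get_weather",
         "input": {"city": "Tokyo"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_example_1",
         "content": '{"city": "Tokyo", "temp_c": 21, "sky": "clear"}'},
    ]},
]

problems = unmatched_tool_ids(messages[1]["content"], messages[2]["content"])
print(problems or "every tool_use has a matching tool_result")
```

A check like this is cheap to run before each request while debugging, since a mismatch is easier to read locally than in an API error.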
import json
import anthropic

client = anthropic.Anthropic()

TOOLS = [{
    "name": "get_weather",
    "description": (
        "Get current weather for a city. Use whenever the user asks about "
        "weather, temperature, or outdoor conditions. Returns Celsius."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Tokyo'"},
        },
        "required": ["city"],
    },
}]

def get_weather(city: str) -> str:
    return json.dumps({"city": city, "temp_c": 21, "sky": "clear"})  # stub

messages = [{"role": "user", "content": "Should I bike to work in Tokyo today?"}]
while True:
    resp = client.messages.create(
        model="claude-sonnet-4-5", max_tokens=1024,
        tools=TOOLS, messages=messages,
    )
    if resp.stop_reason != "tool_use":
        print(resp.content[0].text)
        break
    # 1) append the assistant turn EXACTLY as returned
    messages.append({"role": "assistant", "content": resp.content})
    # 2) run every requested tool, append results
    results = []
    for block in resp.content:
        if block.type == "tool_use":
            output = get_weather(**block.input)  # your code acts
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,  # must match!
                "content": output,
            })
    messages.append({"role": "user", "content": results})

Two invariants trip everyone up: the assistant message containing tool_use must be resent verbatim, and every tool_result must reference a real tool_use_id from the immediately preceding assistant turn. Return a result for a tool that was never called (or drop one that was) and the API rejects the request with a 400. The strict pairing is how the model keeps causality straight.

OpenAI's shape, for comparison
from openai import OpenAI

openai_client = OpenAI()

resp = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {  # 'parameters', not 'input_schema'
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
)
msg = resp.choices[0].message
if msg.tool_calls:
    messages.append(msg)  # assistant turn, verbatim
    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments)  # arrives as a STRING
        messages.append({
            "role": "tool",  # dedicated role
            "tool_call_id": tc.id,
            "content": get_weather(**args),
        })

Key differences: OpenAI nests schemas under function.parameters, arguments arrive as a JSON string you must parse (and which can be malformed, so validate it), and results use a dedicated tool role rather than a block inside a user message. The concepts are identical; only the plumbing differs.

⚠ Tool descriptions are prompts
The model chooses tools by reading their names, descriptions, and parameter schemas; it never sees your implementation. A bad description ("weather tool") yields wrong tool choices and garbage arguments. A good one says what the tool does, when to use it, what it returns, and its units/limits. Anthropic's own guidance: extremely detailed descriptions are the single highest-leverage factor in tool-use quality.

Key takeaways
- The model requests; your code executes. All side effects are yours.
- Loop on stop_reason == "tool_use"; resend assistant turns verbatim; match tool_use_id exactly.
- Multiple tool calls can arrive in one turn; answer all of them.
- Tool errors go back as tool_result content (with is_error: true on Anthropic) so the model can recover.
- Invest in tool descriptions the way you invest in prompts, because they are prompts.
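The takeaways about answering every call and reporting errors can be sketched together. This is one possible shape, assuming Anthropic-style tool_result dicts; TOOL_REGISTRY, run_tool_safely, and the get_tides tool name are hypothetical, the ids are invented, and the requests list is a hand-written stand-in for the tool_use blocks of a single assistant turn.

```python
import json


def get_weather(city: str) -> str:
    """Stub tool that rejects empty input, to exercise the error path."""
    if not city:
        raise ValueError("city must be a non-empty string")
    return json.dumps({"city": city, "temp_c": 21, "sky": "clear"})


# Hypothetical registry mapping tool names to the functions that implement them.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_safely(tool_use_id: str, name: str, tool_input: dict) -> dict:
    """Run one tool and always return a tool_result dict, even on failure."""
    result = {"type": "tool_result", "tool_use_id": tool_use_id}
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return {**result, "content": f"Unknown tool: {name}", "is_error": True}
    try:
        return {**result, "content": func(**tool_input)}
    except Exception as exc:  # broad on purpose: any failure is reported to the model
        return {**result, "content": f"{type(exc).__name__}: {exc}", "is_error": True}


# Stand-in for three tool_use blocks arriving in one assistant turn.
requests = [
    {"id": "toolu_example_1", "name": "get_weather", "input": {"city": "Tokyo"}},
    {"id": "toolu_example_2", "name": "get_weather", "input": {"city": ""}},
    {"id": "toolu_example_3", "name": "get_tides", "input": {}},
]

# One result per request, in order, so every id is answered.
results = [run_tool_safely(r["id"], r["name"], r["input"]) for r in requests]
for r in results:
    print(r)
```

Returning the failure as content instead of raising keeps the id pairing intact and gives the model something to react to, such as asking the user for a valid city.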