Production-Quality MCP Server
Wrap a real API you actually use (Pacvue-adjacent, GitHub, Jira, or similar) in a Python MCP server with task-level tools, a resource, and a prompt — tested at three layers, hardened against bad inputs, with a sandboxed run_python tool. This is the artifact Gate G3's practical test attacks live. Starter code lives in labs/lab06-mcp-server/.
What you're building
A stdio MCP server that a stranger could clone, configure with their own credentials, and connect to Claude Desktop or Claude Code by following your README alone. It exposes at least four task-level tools (not endpoint mirrors), at least one resource, and at least one prompt. Every tool description states purpose, arguments, output shape, and when not to use it. Large results paginate with explicit more-available signals; every error path returns actionable text; one destructive tool sits behind a two-phase confirm; and a run_python tool executes agent code in a locked-down Docker container. Then you prove all of it with tests.
Pick an API you genuinely use: a real one forces the real design decisions (which five tasks matter, what a shaped summary looks like, where pagination bites). GitHub or Jira work if work APIs are off-limits, but avoid toys; "production-quality weather wrapper" is an oxymoron on a portfolio.
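The three response contracts named above (a more-available trailer, actionable error text, and a two-phase confirm) can be sketched without any API at all. This is a minimal illustration, not the starter code: `render_page`, `explain_http_error`, `preview_or_delete`, the exact message wording, and the in-memory `store` dict standing in for the real API are all assumptions made for the example.

```python
from typing import Dict, Sequence

MAX_PAGE_SIZE = 50  # assumed cap, mirroring "page_size max 50" in the starter docstring


def render_page(lines: Sequence[str], page: int = 1, page_size: int = 20) -> str:
    """Return one page of already-shaped result lines plus an explicit trailer."""
    if page < 1:
        return "ERROR: page must be >= 1. Retry with page=1."
    if not lines:
        return "No results matched. Try a broader query."
    page_size = max(1, min(page_size, MAX_PAGE_SIZE))  # clamp instead of failing
    start = (page - 1) * page_size
    chunk = list(lines[start:start + page_size])
    if not chunk:
        return (f"No results on page {page}; only {len(lines)} results exist. "
                "Retry with a lower page number.")
    remaining = len(lines) - (start + len(chunk))
    if remaining > 0:
        chunk.append(f"MORE AVAILABLE: {remaining} more results. "
                     f"Call again with page={page + 1}.")
    else:
        chunk.append("END OF RESULTS.")
    return "\n".join(chunk)


def explain_http_error(status: int, item_id: str = "") -> str:
    """Map an HTTP status to text that tells the agent what to do next."""
    if status == 401:
        return ("ERROR 401: credentials were rejected or have expired. "
                "Ask the user to refresh MYAPI_KEY and restart the server.")
    if status == 404:
        return (f"ERROR 404: no item with id {item_id!r}. "
                "Call search_things to find valid ids.")
    if status == 429:
        return "ERROR 429: rate limit reached. Wait before retrying, or narrow the query."
    return f"ERROR {status}: unexpected API response. Report this status to the user."


def preview_or_delete(store: Dict[str, str], item_id: str, confirm: bool = False) -> str:
    """Two-phase confirm: the default call only previews; confirm=True mutates."""
    if item_id not in store:
        return explain_http_error(404, item_id)
    if not confirm:
        return (f"PREVIEW: would permanently delete {item_id} ({store[item_id]}). "
                "Nothing changed. Call again with confirm=True to proceed.")
    del store[item_id]  # stand-in for the real destructive API call
    return f"DELETED: {item_id}."
```

Keeping these as plain functions that return strings means the unit tests can exercise them directly, with the MCP tool functions reduced to thin wrappers around the API call.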
Suggested structure
# server.py
import os

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("my-api-server")

API_KEY = os.environ.get("MYAPI_KEY")
if not API_KEY:
    raise SystemExit("MYAPI_KEY not set -- see README configuration.")


@mcp.tool()
def search_things(query: str, status: str = "open",
                  page: int = 1, page_size: int = 20) -> str:
    """One-paragraph purpose. Output shape. When NOT to use this.
    Pagination contract: page_size max 50; response states if more exist."""
    # TODO: call the real API; shape results into lines;
    #       append MORE AVAILABLE trailer when truncated;
    #       branch on 401/404/429 -> instructive error strings
    ...


@mcp.tool()
def dangerous_thing(item_id: str, confirm: bool = False) -> str:
    """DESTRUCTIVE. confirm=False returns a preview only."""
    ...


@mcp.tool()
def run_python(code: str, timeout_s: int = 30) -> str:
    """Sandboxed execution: docker run --rm --network none
    --memory 512m, timeout, non-root, read-only. See Lesson 5."""
    ...


@mcp.resource("myapi://reference/statuses")
def statuses() -> str: ...


@mcp.prompt()
def investigate(item_id: str) -> str: ...


if __name__ == "__main__":
    mcp.run()  # stdio
# tests/test_tools.py    -- unit: tool logic with the API mocked
# tests/test_protocol.py -- integration: stdio_client + ClientSession,
#                           initialize, list_tools, call each tool,
#                           assert on error strings for bad IDs
# tests/test_sandbox.py  -- prove "import socket; ...connect..." FAILS,
#                           and that a 60s sleep is killed by timeout

- ☐Python MCP server with ≥4 tools, ≥1 resource, ≥1 prompt; runs over stdio; connects to a real client (Claude Desktop or Claude Code)
- ☐Tools are task-level, not endpoint mirrors; every description states purpose, arguments, output shape, and when not to use it
- ☐Large results are paginated with explicit "more available" signals; all errors return as actionable messages (tested: wrong ID, expired auth, rate limit)
- ☐Credentials come via env vars; one destructive tool is gated behind a documented confirm parameter (confirm=False returns a preview only; confirm=True executes)
- ☐Test suite: unit tests for tool logic plus an integration test speaking the MCP protocol
- ☐A run_python sandboxed-execution tool — Docker container, no network, 512MB/30s limits — with a test proving socket-connect code fails
- ☐README with client setup instructions someone else could follow cold
- ◇Serve the same server over streamable HTTP behind bearer-token auth, and document the trust-model differences from stdio in the README
- ◇Add response-budget telemetry: log tokens-returned per tool call, and use a week of your own usage to tune page sizes and truncation caps
- ◇Swap the Docker sandbox for a hosted sandbox service (E2B or similar) behind the same tool interface, proving the isolation tests still pass unchanged
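The run_python requirement in the checklist above (Docker container, no network, 512MB/30s limits) might be wired up roughly as follows. This is a sketch under stated assumptions rather than the lab's reference solution: the image name, the numeric non-root user, the `--pids-limit` and `--cap-drop` hardening, and the helper names `build_sandbox_argv` and `run_python_sketch` are illustrative choices. The Docker flags themselves are standard `docker run` options.

```python
import subprocess
import uuid

SANDBOX_IMAGE = "python:3.12-slim"  # assumption: any image with a Python interpreter


def build_sandbox_argv(code: str, name: str) -> list[str]:
    """Build the docker command line; kept separate so tests can inspect it."""
    return [
        "docker", "run", "--rm", "--name", name,
        "--network", "none",      # no network: socket connects must fail
        "--memory", "512m",       # memory cap from the checklist
        "--read-only",            # read-only root filesystem
        "--user", "65534:65534",  # non-root (conventionally 'nobody')
        "--cap-drop", "ALL",      # extra hardening, not required by the checklist
        "--pids-limit", "128",    # extra hardening against fork bombs
        SANDBOX_IMAGE, "python", "-c", code,
    ]


def run_python_sketch(code: str, timeout_s: int = 30) -> str:
    """Run agent code in a throwaway container and return text for the agent."""
    name = f"sandbox-{uuid.uuid4().hex[:12]}"
    try:
        proc = subprocess.run(
            build_sandbox_argv(code, name),
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The timeout only kills the docker CLI process; stop the container too.
        subprocess.run(["docker", "kill", name], capture_output=True)
        return (f"ERROR: execution exceeded {timeout_s}s and was killed. "
                "Reduce the work or split it into smaller steps.")
    except FileNotFoundError:
        return "ERROR: docker CLI not found on the server host; see README setup."
    if proc.returncode != 0:
        return f"ERROR (exit {proc.returncode}):\n{proc.stderr[-2000:]}"
    return proc.stdout
```

Passing the code with `python -c` avoids mounting any host directory into the container. Naming the container is what lets the timeout path kill it explicitly, which is the behavior the 60s-sleep test is meant to prove.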
Be honest — the gates only mean something if the criteria really pass.