step-by-step
- accuracy
- 95%
- cost / run
- $0.33
- steps
- 2.5
The only configuration that asked the agent to check its work before submitting. Won on accuracy and cost less than baseline, because the agent didn't have to retry.
step-by-step · baseline · with-anthropic-pdf-skill · with-pdfplumber-mcp · terse-prompt · few-shotclaude-haiku-4-5-20251001, identical across all six configurations.parsebench-table·20260513-014902. SDK: claude-agent-sdk.step-by-stepThe only configuration that asked the agent to check its work before submitting. Won on accuracy and cost less than baseline, because the agent didn't have to retry.
baselineStructured prompt, no self-check. The control group.
with-anthropic-pdf-skillBaseline plus a bundled PDF skill in context. Same accuracy as baseline — the skill didn't move the needle here.
with-pdfplumber-mcpTerse prompt plus a specialized MCP server for table extraction. Cheapest run, but accuracy dropped — the model reading the PDF natively beat the specialized tool.
terse-promptOne sentence of guidance. The agent burned nearly twice the baseline cost figuring out a workflow on its own.
few-shotGiven a worked example to pattern-match against. The example anchored the agent on a structure that didn't fit every page. Worst accuracy, highest cost.