When we saw how much Opus 4.8 cost, we decided to look at the bottom shelf of the model aisle. What resulted is a sort of recession-proof benchmark: how much hard work can a cheaper model accomplish, provided it's wrapped by a solid agent harness (Goose)?
So we reached for Claude Haiku 4.5 and gave it an extremely annoying PDF page to extract (a page from ParseBench, lifted straight from an arXiv paper). We tested two agent configurations: one that could see, via the pdf-vision MCP server, and one that could only read text, via Goose's built-in pdf_tool.
A few gentle spoilers on what we found, before you read on:
Every step below is recorded with the Agent Voyager Project (AVP), a free, open, platform-agnostic standard for capturing what an agent does. Numbers and quotes are verbatim from the trajectories, on claude-haiku-4-5.
This is page 7 of a 2012 econometrics paper, pulled from ParseBench. Four separate tables are crammed onto it. The one that matters is Table 7: two six-by-six correlation matrices stacked on top of each other, triangular, half the cells blank, and values like 0.47 [0.49] where two numbers share one cell.

The task we gave Goose was easy to state: download the page, rebuild it as an HTML table, do not get it wrong.
First, the obvious move. Goose's built-in PDF reader (pdf_tool, a pdfplumber wrapper) pulls the text off the page. Here is what it handed back.
… Mar c h FB 4 - 7.309 O 9 - 1.513 69.312 1531.360 7.270 Ta ble 7. C ross c or r e latio n c oe f fic ien ts fo r six C P I ti me se rie s a nd their fir s t diff e r e nc e s. Or i g inal se rie s include 1 24 r e a din g s, and th e ir f irst di ff e re n c e s 123 r e a din g s. F FB SE F V O R PR R SH O F 1 FB 0.99998 1 SE F V 0.99714 0.99671 1 O R PR 0.98356 0.98295 0.98702 1 R SH 0. 97533 0.97478 0.97736 0.99698 1 O 0.97752 0.97661 0.98664 0.95629 0.93924 1 d F d FB d SE F V d O R PR d R SH d O d F 1 d FB 0.994 1 d SE F V 0.47 [ 0. 49] 0.4 8 [ 0. 49] 1 …
Every table on the page, poured into one run-on stream. Aug ust. ORP R. ti me se rie s. No rows, no columns, no way to tell where one table ends and the next begins. Goose even tried to pull images for the structure and got back “No images found in PDF,” so it worked with the text. It rebuilt the matrices, re-read to check, and declared victory.
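To make the damage concrete, here is a small sketch that splits one row of the flat output on whitespace and compares it with the same row in a pipe-delimited form. The flat string is copied from the pdf_tool output quoted above and the pipe-delimited string from the markdown export shown later in this piece. The variable names are ours, for illustration only.

```python
# One row fragment of the flat pdf_tool output, copied verbatim.
flat_row = "d SE F V 0.47 [ 0. 49] 0.4 8 [ 0. 49] 1"

# The same row as it appears in the structure-preserving markdown export.
structured_row = "|dSEFV|0.47 [0.49]|0.48 [0.49]|1||||"

# Whitespace is the only delimiter the flat text offers.
flat_tokens = flat_row.split()

# Drop the outer pipes, then split on the inner ones so blanks survive.
cells = structured_row[1:-1].split("|")
filled = [c for c in cells if c]

print(len(flat_tokens), flat_tokens)
print(len(cells), filled)
```

The flat row breaks into 14 fragments with nothing to mark where the label ends or which fragments belong in one cell. The structured row yields 7 cells, 4 of them filled and 3 blank, which is exactly the triangular shape the page has.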
“All values match perfectly.”
Goose, on claude-haiku-4-5, right before scoring 53%
It scored 53%, a clear fail, and nothing in the run flagged it: five turns and five cents, which made it the fastest run we recorded on the hardest page on the board. The score on its own just says “fail,” but the trajectory shows something worse, which is an agent that was confidently wrong and had no idea.
Model: claude-haiku-4-5. Tools: shell, write, edit, computercontroller__web_scrape, computercontroller__pdf_tool.
The whole voyage, recorded: a quick, smooth, doomed five turns. The full AVP JSON holds the agent's own words at each step.
Download the PDF page at {input}, read it, rebuild it as a single HTML <table>, then re-read the original and verify your table matches before returning. Output only the HTML.
Same model, same page. This time we swapped the built-in reader for pdf-vision, an MCP server whose headline trick is rendering the page to an image so the model can look at it. We even told Goose, in the prompt, to trust the picture over the text.
The eyes did not work. The image kept coming back empty, so Goose tried to see the page another way, and then kept trying as each new approach failed.
“Now let me render the page as an image to see the table structure.”
“Let me get the page image with file output so I can view it.”
“Now let me view the image using the computer controller.”
“Let me use computer controller to view the image.”
“Let me examine the PDF more carefully to identify which page contains the table.”
“Each Y position represents a row. Let me build the complete tables.”
“Let me take a completely different approach.”
“Perfect! The markdown format shows the tables clearly.”
That was eight attempts to look at a picture that was never going to load, one of which piped a PNG through base64 in the terminal in the hope of reading it by hand. When the eyes failed, Goose fell back to the same tool's other trick, layout-aware text, exporting the page as markdown with the structure the flat reader had destroyed.
Table 7. Cross correlation coefficients for six CPI time series and their first differences.

||F|FB|SEFV|ORPR|RSH|O|
|---|---|---|---|---|---|---|
|F|1||||||
|FB|0.99998|1|||||
|SEFV|0.99714|0.99671|1||||
|ORPR|0.98356|0.98295|0.98702|1|||
|RSH|0.97533|0.97478|0.97736|0.99698|1||
|O|0.97752|0.97661|0.98664|0.95629|0.93924|1|

||dF|dFB|dSEFV|dORPR|dRSH|dO|
|---|---|---|---|---|---|---|
|dF|1||||||
|dFB|0.994|1|||||
|dSEFV|0.47 [0.49]|0.48 [0.49]|1||||
|dORPR|0.12 [0.26]|0.12 [0.26]|0.31 [0.35]|1|||
|dRSH|0.13 [0.30]|0.12 [0.28]|0.10 [0.29]|0.31 [0.37]|1||
|dO|-0.18 [0.30]|-0.18 [0.28]|0.06 [0.29]|0.002 [-0.21]|0.04 [-0.29]|1|
That export had real rows and columns, the two matrices kept separate, and the bracketed cells intact. From there it was straightforward, and Goose finished the table.
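Why "straightforward"? Once the rows and cells are delimited, turning them into HTML is mechanical. The sketch below is our own illustration, not the code Goose wrote: `parse_pipe_table` and `to_html` are hypothetical helper names, it handles one matrix at a time, and how the agent combined the two matrices into a single table is not shown here.

```python
from html import escape

# First rows of the second matrix, copied from the export above.
SAMPLE = """\
||dF|dFB|dSEFV|dORPR|dRSH|dO|
|---|---|---|---|---|---|---|
|dF|1||||||
|dFB|0.994|1|||||
|dSEFV|0.47 [0.49]|0.48 [0.49]|1||||
"""

def parse_pipe_table(md: str) -> list[list[str]]:
    """Return the table rows as lists of cell strings, blanks included."""
    rows = []
    for line in md.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # not a table row
        # Slice off only the outer pipes so empty edge cells are kept.
        cells = [c.strip() for c in line[1:-1].split("|")]
        # Skip the |---|---| separator row under the header.
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        rows.append(cells)
    return rows

def to_html(rows: list[list[str]]) -> str:
    """Render rows as an HTML table, treating the first row as the header."""
    header, *body = rows
    out = ["<table>"]
    out.append("<tr>" + "".join(f"<th>{escape(c)}</th>" for c in header) + "</tr>")
    for row in body:
        out.append("<tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>")
    out.append("</table>")
    return "\n".join(out)

rows = parse_pipe_table(SAMPLE)
html_table = to_html(rows)
print(html_table)
```

Slicing off only the outermost pipes, rather than stripping every leading and trailing pipe, is what keeps the blank cells of the triangular matrix in place.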
Goose scored 100% and passed, on the one page on the board that plain text couldn't touch.
goose-vision, claude-haiku-4-5
Model: claude-haiku-4-5. Tools: shell, write, edit. MCP: pdf-vision.
Twenty-four turns of the same run, recorded end to end. Watch how much longer this voyage is than the last one.
Reproduce the table from a PDF page as a single HTML <table>.
1. Download it: curl -sL '{input}' -o page.pdf
2. Render the page to an image with the pdf-vision get_page_image tool and LOOK at it. The image is the ground truth for the 2D layout: how many columns there are, which cells are merged or span rows, section-header rows, and which values visually share one cell (e.g. multiple holders inside a single cell). Use get_page_text only to copy exact text; trust the image for structure.
3. Rebuild as ONE HTML <table> that matches what you see: one <tr> per visual row, <th> for header cells, <td> for data, colspan/rowspan for merged cells. Keep column order and exact cell text. Do NOT split a value into extra columns or merge rows that are visually separate.
4. Verify against the image: the header column count and each row's cell count must match the page. Fix and redo if not. Do not submit a table you can see is wrong.
5. Output only the final HTML <table>.
The model never once saw the page. The win came entirely from text, and specifically from the tool that kept the structure intact. pdf-vision is a misleading name for what actually rescued this run.
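Step 4 of the prompt asks for a cell-count check, and that part can be done mechanically without any image. Here is a minimal sketch using Python's standard `html.parser`. The class and function names are hypothetical, it counts `colspan` but ignores `rowspan`, and it only checks shape: it cannot tell whether the values in the cells are right.

```python
from html.parser import HTMLParser

class CellCounter(HTMLParser):
    """Count the column width of every <tr> in a table."""

    def __init__(self):
        super().__init__()
        self.rows = []  # one integer per row: total columns covered

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append(0)
        elif tag in ("td", "th") and self.rows:
            # A merged cell covers `colspan` columns; default is 1.
            span = int(dict(attrs).get("colspan") or 1)
            self.rows[-1] += span

def row_widths(html: str) -> list[int]:
    parser = CellCounter()
    parser.feed(html)
    return parser.rows

def is_rectangular(html: str) -> bool:
    """True when every row covers the same number of columns."""
    widths = row_widths(html)
    return bool(widths) and len(set(widths)) == 1

good = "<table><tr><th></th><th>F</th><th>FB</th></tr><tr><td>F</td><td>1</td><td></td></tr></table>"
bad = "<table><tr><th></th><th>F</th><th>FB</th></tr><tr><td>F</td><td>1</td></tr></table>"

print(row_widths(good), is_rectangular(good))
print(row_widths(bad), is_rectangular(bad))
```

A check like this catches a dropped blank cell in a triangular matrix, which is the kind of slip that is easy to make and hard to spot by re-reading.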
Haiku did not get smarter between the two runs. What changed is that the second time it had a tool worth being stubborn with, and a harness that refused to stop. Twenty-four turns, eight dead ends, one quiet pivot, a perfect answer. That is the harness wrangling a weaker model all the way to the finish.
The dollar figure says vision cost about 7x more. The score says vision won. Neither tells you the agent never used its eyes, or that the real hero was a markdown export it reached for on turn 22. AVP captures every step, every tool call, every line of the agent's own reasoning, in one open format. The gap between “vision won” and “persistence won, here is the exact turn” is the whole reason we record trajectories.
Cheaper models can do harder things than their price tag suggests, if the harness is good and you can see what it is doing. We are going to keep poking at that. More from the lab soon.