A prompt engineer shared a deceptively simple finding this week: swap screenshots for JSON when feeding context to a coding agent, and reliability goes up noticeably. The post, surfaced on Hacker News, pairs neatly with a new tool called Shotlist that asks agents to prove their work with real screenshots on the output side. Read together, they sketch a clean input-output discipline for coding agents that teams can adopt right now.
The pattern
The core idea is straightforward. When a coding agent needs to understand a UI state, a component tree, or an error condition, the instinct is to pass a screenshot. Screenshots are convenient but lossy: the model must OCR text, infer layout, and guess at the boundaries of interactive elements. JSON encodes all of that explicitly. A button's label, its disabled state, its position in a hierarchy, and its associated event handler all become unambiguous key-value pairs rather than pixel clusters the model has to interpret.
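To make the contrast concrete, here is a minimal sketch of what one button might look like as structured context. The `UIElement` class, its field names, and the sample values are all hypothetical, chosen only to mirror the properties listed above; a real project would pull them from its own component props or accessibility tree.

```python
import json
from dataclasses import dataclass, asdict
from typing import List

# Hypothetical shape for one UI element; the field names are illustrative,
# not taken from any framework or from the original post.
@dataclass
class UIElement:
    role: str
    label: str
    disabled: bool
    path: List[str]   # position in the component hierarchy, root first
    on_click: str     # name of the associated event handler

def to_context_block(element: UIElement) -> str:
    """Serialize one element to a JSON string for the agent's context."""
    return json.dumps(asdict(element), indent=2, sort_keys=True)

submit = UIElement(
    role="button",
    label="Submit order",
    disabled=True,
    path=["CheckoutPage", "OrderForm", "SubmitButton"],
    on_click="handleSubmit",
)

context_block = to_context_block(submit)
print(context_block)
```

Every fact the model would otherwise have to infer from pixels, such as whether the button is disabled, is stated outright in the string that goes into the prompt.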
Shotlist inverts the same logic for outputs. Instead of trusting an agent's text summary of what it did, Shotlist requires the agent to capture and attach real screenshots as evidence. The claim and the proof live in the same artifact.
Why now
Coding agents have matured past toy demos, but trust is still the bottleneck. Teams shipping LLM features in 2026 are not asking "can the agent do this?" They are asking "how do I know it actually did this, correctly?" Both techniques address that question from opposite ends. Structured input reduces the surface area for misinterpretation. Verifiable output reduces the surface area for undetected failure. The agent tooling ecosystem is converging on auditability as the next frontier after raw capability.
How it works in practice
- Serialize your UI state before the prompt. Use your framework's accessibility tree, a JSON snapshot of component props, or a structured error object. Pass that as the context block instead of, or alongside, a screenshot.
- Define a schema for the agent's expected input. If the agent always receives the same JSON shape, you can validate inputs before they hit the model and catch upstream bugs early.
- Instrument Shotlist (or equivalent) on the output side. Require the agent to attach a screenshot or structured diff for any action that mutates state. Treat unverified outputs as unconfirmed.
- Log both artifacts together. The JSON input and the screenshot output form a paired record. Debugging a failed run becomes a diff exercise rather than a log archaeology expedition.
- Iterate on the schema, not just the prompt. When the agent makes a mistake, ask whether the input JSON was missing a field rather than immediately rewriting the prompt text. Schema gaps are often the root cause.
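The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not Shotlist's API or any library's: the schema in `REQUIRED_FIELDS`, the function names, and the record layout are all hypothetical, and the validator is hand-rolled so the example needs only the standard library.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

# Hypothetical input schema: required field name -> expected Python type.
REQUIRED_FIELDS = {"role": str, "label": str, "disabled": bool}

def validate_input(state: dict) -> list:
    """Return a list of problems; an empty list means the input is valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in state:
            problems.append(f"missing field: {name}")
        elif not isinstance(state[name], expected):
            problems.append(
                f"wrong type for {name}: expected {expected.__name__}"
            )
    return problems

def paired_record(run_id: str, state: dict, screenshot: Optional[bytes]) -> dict:
    """Bundle the JSON input with a hash of the screenshot evidence.

    Missing or empty screenshot bytes count as no evidence, so the
    record is marked "unconfirmed" rather than "verified".
    """
    has_evidence = bool(screenshot)
    return {
        "run_id": run_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input": state,
        "evidence_sha256": (
            hashlib.sha256(screenshot).hexdigest() if has_evidence else None
        ),
        "status": "verified" if has_evidence else "unconfirmed",
    }

state = {"role": "button", "label": "Submit order", "disabled": True}
problems = validate_input(state)
if problems:
    # Catch upstream bugs before the input ever reaches the model.
    raise ValueError("; ".join(problems))

record = paired_record("run-001", state, screenshot=None)
print(json.dumps(record, indent=2))
```

Because the input and the evidence hash sit in one record, a failed run can be compared field by field against a successful one. When the agent gets something wrong, adding an entry to `REQUIRED_FIELDS` is the schema-first fix the last step describes.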
The trade-off
JSON serialization adds a step. Someone has to write the code that extracts component state and formats it correctly, and that extraction layer can itself introduce bugs or go stale as the UI changes. For highly visual tasks, like pixel-level layout review, screenshots may still carry information that no reasonable JSON schema captures. The technique is strongest for logic-heavy coding tasks (form validation, API integration, state management) where structure matters more than aesthetics. It is weaker for pure design review.
Shotlist adds latency and storage overhead. Screenshot evidence is not free. Teams need to decide which agent actions warrant proof and which are low-stakes enough to trust without it.
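One way to make that decision explicit is a small policy function. The sketch below is an assumption about how a team might encode it, not part of Shotlist; the action names and the `LOW_STAKES_ACTIONS` set are hypothetical.

```python
# Hypothetical allowlist of read-only actions that skip screenshot evidence.
LOW_STAKES_ACTIONS = {"read_file", "list_directory", "search_code"}

def requires_evidence(action: str) -> bool:
    """Default to requiring proof: only allowlisted actions are exempt."""
    return action not in LOW_STAKES_ACTIONS

def classify(action: str, has_screenshot: bool) -> str:
    """Label an agent action as trusted, verified, or unconfirmed."""
    if not requires_evidence(action):
        return "trusted"
    return "verified" if has_screenshot else "unconfirmed"

print(classify("read_file", has_screenshot=False))
print(classify("submit_form", has_screenshot=True))
print(classify("delete_record", has_screenshot=False))
```

The allowlist errs toward safety: an action nobody has classified yet is treated as needing proof, so the latency and storage cost is paid by default and waived only deliberately.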
Where it goes next
The logical endpoint is a standardized agent context protocol in which UI frameworks emit structured accessibility snapshots automatically, with no manual serialization required. Browser vendors and frontend frameworks are already moving toward richer accessibility trees. When those mature, the "JSON instead of screenshots" technique becomes the default, not a manual optimization. On the verification side, expect screenshot diffing and visual regression to merge with agent audit logs as a first-class feature in agent orchestration platforms.
The teams winning with agents right now are the ones treating input quality and output verifiability as engineering problems, not prompt-writing problems.