Get started

Install AVP

AVP is an open standard for running agents and recording what they do. The avp CLI installs, runs, and scores agents for you. The only prerequisite on your machine is Docker; the CLI manages the rest. Supported on macOS and Linux.

1

Install and run Docker

Every agent run executes in a sandbox backed by a Docker daemon. Any of Docker Desktop, OrbStack, or colima works. Skip this if Docker is already running.

brew install --cask docker  # or: brew install colima docker && colima start
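If you are not sure whether Docker is already running, a quick check like the one below can tell you before you go further. This is a minimal sketch: `docker_status` is a hypothetical helper name, not part of avp, and it relies only on the standard `docker info` command, which fails when no daemon is reachable.

```shell
# Hypothetical helper (not part of avp): report whether a Docker daemon
# is reachable from this shell.
docker_status() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "missing"   # the docker client is not installed
  elif docker info >/dev/null 2>&1; then
    echo "running"   # the client is installed and a daemon answered
  else
    echo "stopped"   # the client is installed but no daemon is reachable
  fi
}

docker_status
```

If this prints `stopped`, start Docker Desktop or OrbStack, or run `colima start`, and try again.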

2

Install the avp CLI

curl -LsSf https://astral.sh/uv/install.sh | sh   # install uv, if you don't have it
uv tool install avp-cli                            # installs the `avp` command

That's it: avp is now on your PATH. Run avp with no arguments at any time to see the full command map.
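To confirm the install worked, you can check that your shell resolves the `avp` command. The sketch below uses only the shell's built-in `command -v`; `avp_location` is a hypothetical helper name. If the command is not found, the likely cause is that uv's tool directory is not on your PATH in the current shell; `uv tool update-shell` followed by opening a new terminal usually fixes that.

```shell
# Hypothetical helper (not part of avp): print where the avp command
# resolves, or a hint if the shell cannot find it.
avp_location() {
  if command -v avp >/dev/null 2>&1; then
    command -v avp
  else
    echo "avp not found on PATH; try 'uv tool update-shell', then open a new shell"
  fi
}

avp_location
```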

3

Install an agent

Agents are prebuilt; the CLI fetches and installs them.

avp agent install goose
avp agent install claude-code   # optional; needs the claude CLI on PATH
avp agent list
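Because the claude-code agent needs the claude CLI on your PATH, a setup script might guard that install instead of letting it fail. The following is a sketch under that assumption; `install_claude_code_if_possible` is a hypothetical helper, and the only avp command it uses is the one shown above.

```shell
# Hypothetical helper: install the optional claude-code agent only when
# the claude CLI it depends on is available.
install_claude_code_if_possible() {
  if command -v claude >/dev/null 2>&1; then
    avp agent install claude-code
  else
    echo "claude CLI not found on PATH; skipping claude-code"
  fi
}
```

Call `install_claude_code_if_possible` after installing goose, then run `avp agent list` to see what ended up installed.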

4

Run your first eval

An eval is a JSON file: a dataset, a scorer, and the agent configs (“commissions”) to compare. avp init scaffolds one; avp eval run runs it and prints a ranked board.

export ANTHROPIC_API_KEY=sk-ant-...
avp init capitals --agent goose
avp eval run capitals.eval.json
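Since the eval is a plain JSON file, you can inspect what `avp init` scaffolded before or after running it. The sketch below assumes `python3` is installed and uses its standard `json.tool` module to pretty-print the file; `show_eval` is a hypothetical helper name, and nothing here depends on the file's internal field names.

```shell
# Hypothetical helper: pretty-print an eval file so its dataset, scorer,
# and commissions are easier to read. Requires python3.
show_eval() {
  file="${1:-capitals.eval.json}"
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  python3 -m json.tool "$file"
}
```

For example, `show_eval capitals.eval.json` prints the scaffolded eval with indentation, and also fails loudly if the file is not valid JSON.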

Every avp eval and avp run invocation executes the agent inside an isolated sandbox with a default-deny network allowlist, and the agent's writes stay in its workspace. The first run sets up the sandbox stack; later runs start in a couple of seconds.

Full reference, the four specs, and the conformance suite live in the repository.