# Evaluate browser agents with Prime Intellect

> Run browser agent evaluations using BrowserEnv and the Prime CLI.

Use `BrowserEnv` with the Prime CLI to evaluate browser agents on structured tasks. Each evaluation run spins up Browserbase sessions, feeds observations to your model, and collects reward signals — giving you reproducible benchmarks for browser-capable models.

<Frame>
  <img src="https://mintcdn.com/browserbase/1xxeq1pLS7jUKujV/images/integrations/prime/eval-run.png?fit=max&auto=format&n=1xxeq1pLS7jUKujV&q=85&s=7577980b88c1f2cdd7f74fef34df5665" alt="Prime Intellect evaluation run showing browser agent rollouts with reward signals" width="2574" height="1498" data-path="images/integrations/prime/eval-run.png" />
</Frame>

## Prerequisites

<CardGroup cols={3}>
  <Card title="Browserbase account" icon="key" href="https://www.browserbase.com/overview">
    API key from your Browserbase dashboard
  </Card>

  <Card title="Prime CLI" icon="terminal" href="https://docs.primeintellect.ai/tutorials-environments/evaluating">
    Install via `uv add prime`
  </Card>

  <Card title="verifiers" icon="box" href="https://github.com/PrimeIntellect-ai/verifiers">
    Install with browser extras: `uv add 'verifiers[browser]'`
  </Card>
</CardGroup>

## Install and configure

### Set Browserbase credentials

Export your Browserbase credentials so `BrowserEnv` can create sessions:

```bash theme={null}
export BROWSERBASE_API_KEY=your_browserbase_api_key
```

### Install the Prime CLI

```bash theme={null}
uv add prime
prime login
```

### Install verifiers with browser support

```bash theme={null}
# Quotes stop shells such as zsh from treating [browser] as a glob pattern
uv add 'verifiers[browser]'
```

## Choose a BrowserEnv mode

`BrowserEnv` supports two observation/action modes. The mode is selected when you run an evaluation — either through the environment's default or via `-a` args.

### DOM mode (recommended)

The agent receives structured DOM content and issues natural language instructions via [Stagehand](https://docs.stagehand.dev/) tools (`navigate`, `observe`, `act`, `extract`). This is the default and works well for most browser tasks.

```bash theme={null}
prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY
```

### CUA mode

The agent receives screenshots and uses coordinate-based tool calls (`click`, `type_text`, `scroll`, `screenshot`). Use this for vision models trained on screenshot-grounded interaction.

```bash theme={null}
prime eval run browser-cua-example -m anthropic/claude-opus-4.5 -k PRIME_API_KEY
```

<Info>
  By default, CUA mode deploys a sandbox server that exposes [Understudy](https://github.com/browserbase/stagehand/tree/main/packages/core/lib/v3/understudy), Browserbase's custom CDP driver built to avoid performance limitations of Playwright. You can also run against a local server with `-a '{"use_sandbox": false}'`. See [Operational notes](#operational-notes) below.
</Info>
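The DOM and CUA example commands above differ only in the environment ID and the model. If you switch between modes often, a small shell wrapper can choose the environment for you. The sketch below is a hypothetical helper (`eval_cmd_for_mode` is not part of the Prime CLI) that assumes the `browser-dom-example` and `browser-cua-example` IDs shown above. It prints the command instead of running it, so you can review it first.

```bash
# Hypothetical helper: map a BrowserEnv mode to the example environment
# used on this page and print the matching `prime eval run` command.
eval_cmd_for_mode() {
  mode="$1"
  model="$2"
  case "$mode" in
    dom) env_id="browser-dom-example" ;;
    cua) env_id="browser-cua-example" ;;
    *)
      echo "unknown mode: $mode (expected dom or cua)" >&2
      return 1
      ;;
  esac
  echo "prime eval run $env_id -m $model -k PRIME_API_KEY"
}

eval_cmd_for_mode dom openai/gpt-4.1
eval_cmd_for_mode cua anthropic/claude-opus-4.5
```

To execute a printed command, pipe it to a shell, for example `eval_cmd_for_mode dom openai/gpt-4.1 | sh`.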

## Run an evaluation

### Install a hub environment

Install a published Browserbase environment from the Prime hub:

```bash theme={null}
prime env install browser-dom-example
```

### Run with default settings

```bash theme={null}
prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY
```

<Frame>
  <img src="https://mintcdn.com/browserbase/q8GesOQFytbF4QXg/images/integrations/prime/prime-eval-screenshot.png?fit=max&auto=format&n=q8GesOQFytbF4QXg&q=85&s=3e29fdc5744c98ffc1c5746ca89a7d61" alt="CLI output from a browser-dom-example evaluation run showing tool calls, reward, and metrics" width="2300" height="1678" data-path="images/integrations/prime/prime-eval-screenshot.png" />
</Frame>

### Override evaluation parameters

Control the number of examples and rollouts per example with `-n` and `-r`:

```bash theme={null}
prime eval run browser-dom-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -n 10 \
  -r 2
```

| Flag                     | Short | Description                                                            |
| ------------------------ | ----- | ---------------------------------------------------------------------- |
| `--model`                | `-m`  | Model to evaluate (e.g. `openai/gpt-4.1`, `anthropic/claude-opus-4.5`) |
| `--api-key-var`          | `-k`  | Environment variable name for the model API key                        |
| `--num-examples`         | `-n`  | Number of task examples to evaluate                                    |
| `--rollouts-per-example` | `-r`  | Rollouts per example                                                   |
| `--env-args`             | `-a`  | JSON args passed to the environment's `load_environment()`             |
| `--max-concurrent`       | `-c`  | Max concurrent requests                                                |
| `--save-results`         | `-s`  | Save results to disk                                                   |
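
As a sketch of the long-form flags, the following hypothetical wrapper (`run_full_eval` is not a Prime command) combines every flag from the table in one run. The `--max-concurrent 4` and `max_turns` values are illustrative, and it assumes `--save-results` is a boolean switch that takes no value. Setting `DRY_RUN=1` makes `${DRY_RUN:+echo}` expand to `echo`, so the command is printed rather than executed.

```bash
# Hypothetical wrapper: one evaluation using every flag from the table.
# With DRY_RUN=1, ${DRY_RUN:+echo} expands to "echo" and the command is printed;
# otherwise it expands to nothing and `prime` runs directly.
run_full_eval() {
  ${DRY_RUN:+echo} prime eval run browser-dom-example \
    --model openai/gpt-4.1 \
    --api-key-var PRIME_API_KEY \
    --num-examples 10 \
    --rollouts-per-example 2 \
    --max-concurrent 4 \
    --env-args '{"max_turns": 20}' \
    --save-results
}

# Preview the command without running it
DRY_RUN=1 run_full_eval
```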

### Pass environment args

Use `-a` to pass JSON arguments to the environment. These are forwarded to the `load_environment()` function:

```bash theme={null}
# DOM mode with custom Stagehand model and max turns
prime eval run browser-dom-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -a '{"max_turns": 20, "stagehand_model": "openai/gpt-4.1"}'

# CUA mode with proxies and Verified
prime eval run browser-cua-example \
  -m anthropic/claude-opus-4.5 \
  -k PRIME_API_KEY \
  -a '{"proxies": true, "verified": true}'
```
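
Single quotes keep the JSON intact, but they also stop the shell from expanding variables, so `-a '{"max_turns": $MAX_TURNS}'` passes the literal text `$MAX_TURNS`. One way to parameterize the args is to build the JSON with `printf` and pass it in double quotes. This is a minimal sketch: `MAX_TURNS`, `STAGEHAND_MODEL`, and `ENV_ARGS` are illustrative variable names, and `printf` does not escape values, so they must not contain double quotes or backslashes.

```bash
# Variables are not expanded inside single quotes, so build the JSON with
# printf and pass the result in double quotes.
MAX_TURNS=20
STAGEHAND_MODEL="openai/gpt-4.1"

# %d and %s are filled from the variables; values are not JSON-escaped
ENV_ARGS=$(printf '{"max_turns": %d, "stagehand_model": "%s"}' \
  "$MAX_TURNS" "$STAGEHAND_MODEL")

# prints {"max_turns": 20, "stagehand_model": "openai/gpt-4.1"}
echo "$ENV_ARGS"

# Then: prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY -a "$ENV_ARGS"
```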

### Run a published benchmark

Browserbase publishes browser benchmarks on the Prime hub:

```bash theme={null}
# Mind2Web benchmark
prime eval run browserbase/mind2web \
  -m anthropic/claude-opus-4.5 \
  -r 1 -n 10 \
  -a '{"max_turns": 50, "proxies": true, "verified": true}'

# WebVoyager benchmark
prime eval run browserbase/webvoyager \
  -m anthropic/claude-opus-4.5 \
  -r 1 -n 4 \
  -a '{"max_turns": 5, "proxies": true, "verified": true}'
```
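
To run both benchmarks in one pass, you can loop over the environment IDs. This sketch is a hypothetical wrapper (`run_benchmarks` is not a Prime command) that applies one set of settings to both benchmarks, unlike the per-benchmark values above. Set `DRY_RUN=1` to print each command instead of running it.

```bash
# Hypothetical wrapper: run both published benchmarks with shared settings.
# With DRY_RUN=1 each command is printed instead of executed.
run_benchmarks() {
  env_args='{"max_turns": 50, "proxies": true, "verified": true}'
  for bench in browserbase/mind2web browserbase/webvoyager; do
    # Stop at the first failed run
    ${DRY_RUN:+echo} prime eval run "$bench" \
      -m anthropic/claude-opus-4.5 \
      -r 1 -n 4 \
      -a "$env_args" || return 1
  done
}

# Preview both commands
DRY_RUN=1 run_benchmarks
```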

### Run from a local environment

If your environment lives in a local directory, pass its path instead of a hub environment ID:

```bash theme={null}
prime eval run ./my_browser_env -m openai/gpt-4.1 -k PRIME_API_KEY
```

## Operational notes

<AccordionGroup>
  <Accordion title="CUA Mode: Sandbox vs Local Server" icon="display">
    By default, CUA mode deploys a sandbox server using a pre-built Docker image ([`deepdream19/cua-server:latest`](https://hub.docker.com/r/deepdream19/cua-server)) that exposes [Understudy](https://github.com/browserbase/stagehand/tree/main/packages/core/lib/v3/understudy), Browserbase's custom CDP driver. This is the recommended setup.

    For local development, you can run the CUA server yourself and disable the sandbox:

    ```bash theme={null}
    prime eval run browser-cua-example \
      -m openai/gpt-4.1 \
      -k PRIME_API_KEY \
      -a '{"use_sandbox": false, "server_url": "http://localhost:3000"}'
    ```
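
    One way to start the server locally is to run the same image with Docker. This is a sketch under assumptions: it assumes the container listens on port 3000 (matching the `server_url` above) and reads `BROWSERBASE_API_KEY` and `OPENAI_API_KEY` from its environment, so check the image's documentation for the actual port and variables. `start_cua_server` is a hypothetical name, and `DRY_RUN=1` prints the command instead of running it.

    ```bash
    # Sketch: start the CUA server image locally.
    # Assumes the container listens on port 3000 and reads these variables;
    # `-e NAME` with no value forwards NAME from your current shell.
    start_cua_server() {
      ${DRY_RUN:+echo} docker run --rm -d \
        -p 3000:3000 \
        -e BROWSERBASE_API_KEY \
        -e OPENAI_API_KEY \
        deepdream19/cua-server:latest
    }

    # Preview the command without starting a container
    DRY_RUN=1 start_cua_server
    ```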
  </Accordion>

  <Accordion title="Browserbase Proxies and Verified" icon="shield">
    Enable [Proxies](/platform/identity/proxies) and [Verified](/platform/identity/overview) via environment args:

    ```bash theme={null}
    prime eval run browser-dom-example \
      -m openai/gpt-4.1 \
      -k PRIME_API_KEY \
      -a '{"proxies": true, "verified": true}'
    ```

    These are passed through to Browserbase session creation.
  </Accordion>

  <Accordion title="Environment Variables" icon="key">
    **DOM mode** requires:

    * `BROWSERBASE_API_KEY` — Browserbase API key
    * `MODEL_API_KEY` — API key for Stagehand's underlying model

    **CUA mode** requires:

    * `BROWSERBASE_API_KEY` — Browserbase API key
    * `PRIME_API_KEY` — Required when using sandbox mode (default). Set via `prime login` or as an env var.

    **CUA mode** optional:

    * `OPENAI_API_KEY` — Forwarded into the sandbox container if set
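
    To fail fast before a long run, you can check these variables in your shell. This sketch is a hypothetical helper (`check_env` is not part of the Prime CLI) built from the lists above. It treats `PRIME_API_KEY` as a required environment variable for CUA mode; if you authenticated with `prime login` instead, or run with `use_sandbox: false`, remove it from the CUA list.

    ```bash
    # Hypothetical helper: check the required variables for a mode.
    # Usage: check_env dom   or   check_env cua
    check_env() {
      case "$1" in
        dom) required="BROWSERBASE_API_KEY MODEL_API_KEY" ;;
        cua) required="BROWSERBASE_API_KEY PRIME_API_KEY" ;;
        *) echo "usage: check_env dom|cua" >&2; return 2 ;;
      esac
      missing=""
      for var in $required; do
        # Indirect lookup that works in POSIX sh as well as bash
        eval "value=\${$var:-}"
        [ -n "$value" ] || missing="$missing $var"
      done
      if [ -n "$missing" ]; then
        echo "missing:$missing" >&2
        return 1
      fi
      echo "ok"
    }
    ```

    For example, `check_env dom && prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY` only starts the run when the DOM-mode variables are set.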
  </Accordion>
</AccordionGroup>

## Related resources

<CardGroup cols={2}>
  <Card title="Prime Intellect evaluation docs" icon="book" href="https://docs.primeintellect.ai/tutorials-environments/evaluating">
    Full documentation on Prime's evaluation workflow
  </Card>

  <Card title="Prime verifiers environments" icon="book" href="https://github.com/PrimeIntellect-ai/verifiers">
    Source code and docs for verifiers environments
  </Card>

  <Card title="Browserbase getting started" icon="rocket" href="/welcome/getting-started">
    Core Browserbase documentation
  </Card>

  <Card title="RL Training Guide" icon="brain" href="/integrations/prime-intellect/rl-training">
    Wire BrowserEnv into Prime RL training workflows
  </Card>
</CardGroup>
