
# Evaluate browser agents with Prime Intellect

> Run browser agent evaluations using BrowserEnv and the Prime CLI.

Use `BrowserEnv` with the Prime CLI to evaluate browser agents on structured tasks. Each evaluation run spins up Browserbase sessions, feeds observations to your model, and collects reward signals, giving you reproducible benchmarks for browser-capable models.

<Frame>
  <img src="https://mintcdn.com/browserbase/1xxeq1pLS7jUKujV/images/integrations/prime/eval-run.png?fit=max&auto=format&n=1xxeq1pLS7jUKujV&q=85&s=7577980b88c1f2cdd7f74fef34df5665" alt="Prime Intellect evaluation run showing browser agent rollouts with reward signals" width="2574" height="1498" data-path="images/integrations/prime/eval-run.png" />
</Frame>

## Prerequisites

<CardGroup cols={3}>
  <Card title="Browserbase account" icon="key" href="https://www.browserbase.com/overview">
    API key from your Browserbase dashboard
  </Card>

  <Card title="Prime CLI" icon="terminal" href="https://docs.primeintellect.ai/tutorials-environments/evaluating">
    Install via `uv add prime`
  </Card>

  <Card title="verifiers" icon="box" href="https://github.com/PrimeIntellect-ai/verifiers">
    Install with browser extras: `uv add 'verifiers[browser]'`
  </Card>
</CardGroup>

## Install and configure

### Set Browserbase credentials

Export your Browserbase credentials so `BrowserEnv` can create sessions:

```bash theme={null}
export BROWSERBASE_API_KEY=your_browserbase_api_key
```

### Install the Prime CLI

```bash theme={null}
uv add prime
prime login
```

### Install verifiers with browser support

```bash theme={null}
# Quote the package spec so shells such as zsh don't expand the brackets as a glob
uv add 'verifiers[browser]'
```

## Choose a BrowserEnv mode

`BrowserEnv` supports two observation/action modes. The mode is selected when you run an evaluation, either through the environment's default or via `-a` args.

### DOM mode (recommended)

The agent receives structured DOM content and issues natural language instructions via [Stagehand](https://docs.stagehand.dev/) tools (`navigate`, `observe`, `act`, `extract`). This is the default and works well for most browser tasks.

```bash theme={null}
prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY
```

### CUA mode

The agent receives screenshots and uses coordinate-based tool calls (`click`, `type_text`, `scroll`, `screenshot`). Use this for vision models trained on screenshot-grounded interaction.

```bash theme={null}
prime eval run browser-cua-example -m anthropic/claude-opus-4.5 -k PRIME_API_KEY
```

<Info>
  CUA mode deploys a sandbox server by default to handle the connection to Browserbase's custom CDP driver, [Understudy](https://github.com/browserbase/stagehand/tree/main/packages/core/lib/v3/understudy), which overcomes performance limitations of Playwright. To run against a local server instead, pass `-a '{"use_sandbox": false}'`. See [Operational notes](#operational-notes) below.
</Info>

## Run an evaluation

### Install a hub environment

Install a published Browserbase environment from the Prime hub:

```bash theme={null}
prime env install browser-dom-example
```

### Run with default settings

```bash theme={null}
prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY
```

<Frame>
  <img src="https://mintcdn.com/browserbase/q8GesOQFytbF4QXg/images/integrations/prime/prime-eval-screenshot.png?fit=max&auto=format&n=q8GesOQFytbF4QXg&q=85&s=3e29fdc5744c98ffc1c5746ca89a7d61" alt="CLI output from a browser-dom-example evaluation run showing tool calls, reward, and metrics" width="2300" height="1678" data-path="images/integrations/prime/prime-eval-screenshot.png" />
</Frame>

### Override evaluation parameters

Control how many examples are evaluated and how many rollouts run per example:

```bash theme={null}
prime eval run browser-dom-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -n 10 \
  -r 2
```

| Flag                     | Short | Description                                                            |
| ------------------------ | ----- | ---------------------------------------------------------------------- |
| `--model`                | `-m`  | Model to evaluate (e.g. `openai/gpt-4.1`, `anthropic/claude-opus-4.5`) |
| `--api-key-var`          | `-k`  | Name of the environment variable holding the model API key (not the key itself) |
| `--num-examples`         | `-n`  | Number of task examples to evaluate                                    |
| `--rollouts-per-example` | `-r`  | Rollouts per example                                                   |
| `--env-args`             | `-a`  | JSON args passed to the environment's `load_environment()`             |
| `--max-concurrent`       | `-c`  | Max concurrent requests                                                |
| `--save-results`         | `-s`  | Save results to disk                                                   |
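
To compare several models on the same environment, you can wrap these flags in a small loop. The following is a minimal sketch assuming a bash shell; `sweep_models`, `MODELS`, `ENV_ID`, `NUM_EXAMPLES`, `ROLLOUTS`, and `DRY_RUN` are hypothetical names, not part of the Prime CLI. It uses only flags from the table, and with the default `DRY_RUN=1` it prints each command instead of running it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical defaults; override any of them from the environment.
ENV_ID="${ENV_ID:-browser-dom-example}"
NUM_EXAMPLES="${NUM_EXAMPLES:-10}"
ROLLOUTS="${ROLLOUTS:-2}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print each command; 0 = run it

MODELS=("openai/gpt-4.1" "anthropic/claude-opus-4.5")

# Run (or print) one evaluation per model, using only documented flags.
# Note: -k takes the NAME of the env var holding the key, not the key itself.
sweep_models() {
  local model
  for model in "${MODELS[@]}"; do
    local cmd=(prime eval run "$ENV_ID" -m "$model" -k PRIME_API_KEY
               -n "$NUM_EXAMPLES" -r "$ROLLOUTS")
    if [[ "$DRY_RUN" == "1" ]]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

sweep_models
```

Set `DRY_RUN=0` to launch the runs. Storing each command in a bash array keeps every argument intact when it is executed, including a JSON string passed to `-a` if you add one.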

### Pass environment args

Use `-a` to pass JSON arguments to the environment. These are forwarded to the `load_environment()` function:

```bash theme={null}
# DOM mode with custom Stagehand model and max turns
prime eval run browser-dom-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -a '{"max_turns": 20, "stagehand_model": "openai/gpt-4.1"}'

# CUA mode with proxies and Verified
prime eval run browser-cua-example \
  -m anthropic/claude-opus-4.5 \
  -k PRIME_API_KEY \
  -a '{"proxies": true, "verified": true}'
```
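
The examples above keep the JSON inside single quotes, so shell variables are not expanded there. If you want to vary values from a script, one option is to build the string with `printf` and pass it as a single double-quoted argument. This is a minimal sketch assuming bash and values that contain no double quotes or backslashes (those would need JSON escaping); `MAX_TURNS`, `STAGEHAND_MODEL`, and `ENV_ARGS` are illustrative variable names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustrative values; the keys match the DOM-mode -a example above.
MAX_TURNS=20
STAGEHAND_MODEL="openai/gpt-4.1"

# printf fills in the values while keeping JSON's double quotes intact.
# %d expects an integer value for max_turns.
ENV_ARGS=$(printf '{"max_turns": %d, "stagehand_model": "%s"}' \
  "$MAX_TURNS" "$STAGEHAND_MODEL")

echo "$ENV_ARGS"
# prints {"max_turns": 20, "stagehand_model": "openai/gpt-4.1"}

# Pass the whole string as ONE argument; the double quotes around
# "$ENV_ARGS" stop the shell from splitting it on spaces:
#   prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY -a "$ENV_ARGS"
```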

### Run a published benchmark

Browserbase publishes browser benchmarks on the Prime hub:

```bash theme={null}
# Mind2Web benchmark
prime eval run browserbase/mind2web \
  -m anthropic/claude-opus-4.5 \
  -r 1 -n 10 \
  -a '{"max_turns": 50, "proxies": true, "verified": true}'

# WebVoyager benchmark
prime eval run browserbase/webvoyager \
  -m anthropic/claude-opus-4.5 \
  -r 1 -n 4 \
  -a '{"max_turns": 5, "proxies": true, "verified": true}'
```

### Run from a local environment

If your environment lives in a local directory, pass its path instead of a hub environment ID:

```bash theme={null}
prime eval run ./my_browser_env -m openai/gpt-4.1 -k PRIME_API_KEY
```

## Operational notes

<AccordionGroup>
  <Accordion title="CUA Mode: Sandbox vs Local Server" icon="display">
    By default, CUA mode deploys a sandbox server using a pre-built Docker image ([`deepdream19/cua-server:latest`](https://hub.docker.com/r/deepdream19/cua-server)) that exposes Browserbase's CDP framework, [Understudy](https://github.com/browserbase/stagehand/tree/main/packages/core/lib/v3/understudy). This is the recommended setup.

    For local development, you can run the CUA server yourself and disable the sandbox:

    ```bash theme={null}
    prime eval run browser-cua-example \
      -m openai/gpt-4.1 \
      -k PRIME_API_KEY \
      -a '{"use_sandbox": false, "server_url": "http://localhost:3000"}'
    ```
  </Accordion>

  <Accordion title="Browserbase Proxies and Verified" icon="shield">
    Enable [Proxies](/platform/identity/proxies) and [Verified](/platform/identity/overview) via environment args:

    ```bash theme={null}
    prime eval run browser-dom-example \
      -m openai/gpt-4.1 \
      -k PRIME_API_KEY \
      -a '{"proxies": true, "verified": true}'
    ```

    These are passed through to Browserbase session creation.
  </Accordion>

  <Accordion title="Environment Variables" icon="key">
    **DOM mode** requires:

    * `BROWSERBASE_API_KEY`: Browserbase API key
    * `MODEL_API_KEY`: API key for Stagehand's underlying model

    **CUA mode** requires:

    * `BROWSERBASE_API_KEY`: Browserbase API key
    * `PRIME_API_KEY`: Required when using sandbox mode (default). Set via `prime login` or as an env var.

    **CUA mode** optional:

    * `OPENAI_API_KEY`: Forwarded into the sandbox container if set
  </Accordion>
</AccordionGroup>
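
The requirements above can be checked before launching a run. The following is a minimal sketch of a preflight script, assuming bash; `preflight.sh`, `check_env`, and `server_is_up` are hypothetical names, not part of the Prime CLI or verifiers. It only inspects exported variables, so if you authenticated with `prime login` rather than exporting `PRIME_API_KEY`, it will still report that variable as missing. For a CUA run with `use_sandbox` set to `false`, it also checks that something accepts connections at the local server address (default `localhost:3000`, matching the local-server example above).

```bash
#!/usr/bin/env bash
# preflight.sh: hypothetical pre-run checks for BrowserEnv evaluations.
# Usage: ./preflight.sh [dom|cua] [use_sandbox: true|false] [host] [port]

# Prints each required variable that is unset or empty to stderr.
# Returns 0 if all are set, 1 if any are missing, 2 for an unknown mode.
check_env() {
  local mode="$1" use_sandbox="${2:-true}"
  local required=(BROWSERBASE_API_KEY)
  case "$mode" in
    dom) required+=(MODEL_API_KEY) ;;
    cua) if [[ "$use_sandbox" == "true" ]]; then required+=(PRIME_API_KEY); fi ;;
    *) echo "unknown mode: $mode (expected dom or cua)" >&2; return 2 ;;
  esac
  local var missing=0
  for var in "${required[@]}"; do
    # ${!var} reads the variable whose name is stored in $var.
    if [[ -z "${!var:-}" ]]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Succeeds only if something accepts TCP connections on host:port.
# Uses bash's /dev/tcp redirection; intended for a local server.
server_is_up() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

mode="${1:-dom}" use_sandbox="${2:-true}"
check_env "$mode" "$use_sandbox" && echo "required variables for $mode mode are set"

if [[ "$mode" == "cua" && "$use_sandbox" == "false" ]]; then
  host="${3:-localhost}" port="${4:-3000}"
  if server_is_up "$host" "$port"; then
    echo "CUA server reachable at $host:$port"
  else
    echo "nothing listening at $host:$port; start the CUA server first" >&2
  fi
fi
```

For example, `./preflight.sh cua false` checks a local-server CUA setup, while `./preflight.sh dom` checks the DOM-mode variables.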

## Related resources

<CardGroup cols={2}>
  <Card title="Prime Intellect evaluation docs" icon="book" href="https://docs.primeintellect.ai/tutorials-environments/evaluating">
    Full documentation on Prime's evaluation workflow
  </Card>

  <Card title="Prime verifiers environments" icon="book" href="https://github.com/PrimeIntellect-ai/verifiers">
    Source code and docs for verifiers environments
  </Card>

  <Card title="Browserbase getting started" icon="rocket" href="/welcome/getting-started">
    Core Browserbase documentation
  </Card>

  <Card title="RL training guide" icon="brain" href="/integrations/prime-intellect/rl-training">
    Wire BrowserEnv into Prime RL training workflows
  </Card>
</CardGroup>
