# Evaluate browser agents with Prime Intellect

> Run browser agent evaluations using BrowserEnv and the Prime CLI.

Use `BrowserEnv` with the Prime CLI to evaluate browser agents on structured tasks. Each evaluation run spins up Browserbase sessions, feeds observations to your model, and collects reward signals, giving you reproducible benchmarks for browser-capable models.

Prime Intellect evaluation run showing browser agent rollouts with reward signals

## Prerequisites

* A Browserbase API key from your Browserbase dashboard
* The Prime CLI, installed via `uv add prime`
* `verifiers` with browser extras, installed via `uv add "verifiers[browser]"`

## Install and configure

### Set Browserbase credentials

Export your Browserbase credentials so `BrowserEnv` can create sessions:

```bash
export BROWSERBASE_API_KEY=your_browserbase_api_key
```

### Install the Prime CLI

```bash
uv add prime
prime login
```

### Install verifiers with browser support

Quote the package spec so that shells such as zsh do not treat the brackets as a glob pattern:

```bash
uv add "verifiers[browser]"
```

## Choose a BrowserEnv mode

`BrowserEnv` supports two observation/action modes. The mode is selected when you run an evaluation, either through the environment's default or via `-a` args.

### DOM mode (recommended)

The agent receives structured DOM content and issues natural-language instructions via [Stagehand](https://docs.stagehand.dev/) tools (`navigate`, `observe`, `act`, `extract`). This is the default and works well for most browser tasks.

```bash
prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY
```

### CUA mode

The agent receives screenshots and uses coordinate-based tool calls (`click`, `type_text`, `scroll`, `screenshot`). Use this for vision models trained on screenshot-grounded interaction.

```bash
prime eval run browser-cua-example -m anthropic/claude-opus-4.5 -k PRIME_API_KEY
```

CUA mode deploys a sandbox server by default to handle the connection to Browserbase's custom CDP driver, [Understudy](https://github.com/browserbase/stagehand/tree/main/packages/core/lib/v3/understudy), which overcomes performance limitations of Playwright. You can also run against a local server with `-a '{"use_sandbox": false}'`. See [Operational Notes](#operational-notes) below.
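
Both modes depend on the Browserbase key exported above and on whichever variable name you pass to `-k`. As a minimal pre-flight sketch (the `require_env` helper is hypothetical and not part of the Prime CLI), the function below reports any variables that are unset or empty, so a run can be skipped before any sessions are created:

```bash
# Hypothetical helper, not part of the Prime CLI.
# Prints "missing: NAME" to stderr for each unset or empty variable
# and returns non-zero if any are missing.
require_env() {
  local missing name value
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    # Callers are expected to pass plain variable names only.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_env BROWSERBASE_API_KEY PRIME_API_KEY && prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY` starts the run only when both variables are set.
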
## Run an evaluation

### Install a hub environment

Install a published Browserbase environment from the Prime hub:

```bash
prime env install browser-dom-example
```

### Run with default settings

```bash
prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY
```

CLI output from a browser-dom-example evaluation run showing tool calls, reward, and metrics

### Override evaluation parameters

Control the number of examples, rollouts, and environment-specific args:

```bash
prime eval run browser-dom-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -n 10 \
  -r 2
```

| Flag                     | Short | Description                                                            |
| ------------------------ | ----- | ---------------------------------------------------------------------- |
| `--model`                | `-m`  | Model to evaluate (e.g. `openai/gpt-4.1`, `anthropic/claude-opus-4.5`) |
| `--api-key-var`          | `-k`  | Environment variable name for the model API key                        |
| `--num-examples`         | `-n`  | Number of task examples to evaluate                                    |
| `--rollouts-per-example` | `-r`  | Rollouts per example                                                   |
| `--env-args`             | `-a`  | JSON args passed to the environment's `load_environment()`             |
| `--max-concurrent`       | `-c`  | Max concurrent requests                                                |
| `--save-results`         | `-s`  | Save results to disk                                                   |

### Pass environment args

Use `-a` to pass JSON arguments to the environment. These are forwarded to the `load_environment()` function:

```bash
# DOM mode with custom Stagehand model and max turns
prime eval run browser-dom-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -a '{"max_turns": 20, "stagehand_model": "openai/gpt-4.1"}'

# CUA mode with proxies and Verified
prime eval run browser-cua-example \
  -m anthropic/claude-opus-4.5 \
  -k PRIME_API_KEY \
  -a '{"proxies": true, "verified": true}'
```

### Run a published benchmark

Browserbase publishes browser benchmarks on the Prime hub:

```bash
# Mind2Web benchmark
prime eval run browserbase/mind2web \
  -m anthropic/claude-opus-4.5 \
  -r 1 -n 10 \
  -a '{"max_turns": 50, "proxies": true, "verified": true}'

# WebVoyager benchmark
prime eval run browserbase/webvoyager \
  -m anthropic/claude-opus-4.5 \
  -r 1 -n 4 \
  -a '{"max_turns": 5, "proxies": true, "verified": true}'
```

### Run from a local environment

If your environment lives in a local directory:

```bash
prime eval run ./my_browser_env -m openai/gpt-4.1 -k PRIME_API_KEY
```

## Operational notes

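
Most settings in this section are JSON values passed through `-a`. Hand-written JSON inside shell quotes is easy to get wrong, so here is a minimal sketch of a helper that assembles the object from `key=value` pairs. The `env_args_json` name is hypothetical and not part of the Prime CLI, and the helper assumes keys and values contain no double quotes or backslashes:

```bash
# Hypothetical helper, not part of the Prime CLI.
# Builds a flat JSON object from key=value pairs.
# Values that are true, false, or non-negative integers are emitted bare;
# everything else is emitted as a JSON string (no escaping is performed).
env_args_json() {
  local out pair key value
  out=""
  for pair in "$@"; do
    key=${pair%%=*}      # text before the first "="
    value=${pair#*=}     # text after the first "="
    case "$value" in
      true|false) ;;                        # booleans stay bare
      ''|*[!0-9]*) value="\"$value\"" ;;    # empty or non-numeric: quote it
    esac
    out="${out:+$out, }\"$key\": $value"
  done
  printf '{%s}\n' "$out"
}

# Usage (not run here):
#   prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY \
#     -a "$(env_args_json proxies=true verified=true)"
```

Keeping the JSON construction in one place avoids mismatched quotes when several options are combined. For nested or arbitrary values, a real JSON tool is a safer choice than this sketch.
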
By default, CUA mode deploys a sandbox server using a pre-built Docker image ([`deepdream19/cua-server:latest`](https://hub.docker.com/r/deepdream19/cua-server)) that exposes Browserbase's CDP framework, [Understudy](https://github.com/browserbase/stagehand/tree/main/packages/core/lib/v3/understudy). This is the recommended setup.

For local development, you can run the CUA server yourself and disable the sandbox:

```bash
prime eval run browser-cua-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -a '{"use_sandbox": false, "server_url": "http://localhost:3000"}'
```

Enable [Proxies](/platform/identity/proxies) and [Verified](/platform/identity/overview) via environment args:

```bash
prime eval run browser-dom-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -a '{"proxies": true, "verified": true}'
```

These are passed through to Browserbase session creation.

**DOM mode** requires:

* `BROWSERBASE_API_KEY`: Browserbase API key
* `MODEL_API_KEY`: API key for Stagehand's underlying model

**CUA mode** requires:

* `BROWSERBASE_API_KEY`: Browserbase API key
* `PRIME_API_KEY`: Required when using sandbox mode (default). Set via `prime login` or as an env var.

**CUA mode** optional:

* `OPENAI_API_KEY`: Forwarded into the sandbox container if set

## Related resources

* Full documentation on Prime's evaluation workflow
* Source code and docs for verifiers environments
* Core Browserbase documentation
* Wire BrowserEnv into Prime RL training workflows