Overview

Use BrowserEnv with the Prime CLI to evaluate browser agents on structured tasks. Each evaluation run spins up Browserbase sessions, feeds observations to your model, and collects reward signals — giving you reproducible benchmarks for browser-capable models.
[Screenshot: Prime Intellect evaluation run showing browser agent rollouts with reward signals]

Prerequisites

Browserbase Account

API key and project ID from your Browserbase dashboard

Prime CLI

Install via uv add prime

verifiers

Install with browser extras: uv add 'verifiers[browser]' (quote the extras so your shell does not treat the brackets as a glob pattern)

Install and Configure

Set Browserbase Credentials

Export your Browserbase credentials so BrowserEnv can create sessions:
export BROWSERBASE_API_KEY=your_browserbase_api_key
export BROWSERBASE_PROJECT_ID=your_browserbase_project_id
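BrowserEnv cannot create sessions if either variable is unset, so it can help to sanity-check them before launching a run. A minimal sketch (the helper below is illustrative, not part of the Prime CLI):

```python
import os

REQUIRED = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")

def missing_credentials(env=os.environ):
    """Return the names of required Browserbase variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Example with an incomplete environment:
print(missing_credentials({"BROWSERBASE_API_KEY": "bb_live_..."}))
# -> ['BROWSERBASE_PROJECT_ID']
```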

Install the Prime CLI

uv add prime
prime login

Install verifiers with Browser Support

uv add 'verifiers[browser]'

Choose a BrowserEnv Mode

BrowserEnv supports two observation/action modes. The mode is selected when you run an evaluation, either through the environment's default or via -a args.

DOM Mode

The agent receives structured DOM content and issues natural-language instructions via Stagehand tools (navigate, observe, act, extract). This is the default and works well for most browser tasks.
prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY

CUA Mode

The agent receives screenshots and uses coordinate-based tool calls (click, type_text, scroll, screenshot). Use this for vision models trained on screenshot-grounded interaction.
prime eval run browser-cua-example -m anthropic/claude-opus-4.5 -k PRIME_API_KEY
By default, CUA mode deploys a sandbox server that handles the connection to Browserbase's custom CDP driver, Understudy, which works around performance limitations of Playwright. You can also run against a local server with -a '{"use_sandbox": false}'; see Operational Notes below.

Run an Evaluation

Install a Hub Environment

Install a published Browserbase environment from the Prime hub:
prime env install browser-dom-example

Run with Default Settings

prime eval run browser-dom-example -m openai/gpt-4.1 -k PRIME_API_KEY
[Screenshot: CLI output from a browser-dom-example evaluation run showing tool calls, reward, and metrics]

Override Evaluation Parameters

Control the number of examples, rollouts, and environment-specific args:
prime eval run browser-dom-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -n 10 \
  -r 2
| Flag | Short | Description |
| --- | --- | --- |
| --model | -m | Model to evaluate (e.g. openai/gpt-4.1, anthropic/claude-opus-4.5) |
| --api-key-var | -k | Environment variable name for the model API key |
| --num-examples | -n | Number of task examples to evaluate |
| --rollouts-per-example | -r | Rollouts per example |
| --env-args | -a | JSON args passed to the environment's load_environment() |
| --max-concurrent | -c | Max concurrent requests |
| --save-results | -s | Save results to disk |

Pass Environment Args

Use -a to pass JSON arguments to the environment. These are forwarded to the load_environment() function:
# DOM mode with custom Stagehand model and max turns
prime eval run browser-dom-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -a '{"max_turns": 20, "stagehand_model": "openai/gpt-4.1"}'

# CUA mode with proxies and stealth
prime eval run browser-cua-example \
  -m anthropic/claude-opus-4.5 \
  -k PRIME_API_KEY \
  -a '{"proxies": true, "advanced_stealth": true}'
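Because the -a payload is plain JSON forwarded to load_environment() as keyword arguments, scripted runs can build it programmatically instead of hand-quoting it. A sketch (the helper name is illustrative; the flags mirror the examples above):

```python
import json
import shlex

def eval_command(env_id: str, model: str, key_var: str, **env_args) -> str:
    """Compose a `prime eval run` invocation with a JSON -a payload."""
    cmd = ["prime", "eval", "run", env_id, "-m", model, "-k", key_var]
    if env_args:
        # json.dumps produces the exact JSON string the -a flag expects;
        # shlex.join shell-quotes it safely.
        cmd += ["-a", json.dumps(env_args)]
    return shlex.join(cmd)

print(eval_command("browser-cua-example", "anthropic/claude-opus-4.5",
                   "PRIME_API_KEY", proxies=True, advanced_stealth=True))
# -> prime eval run browser-cua-example -m anthropic/claude-opus-4.5 -k PRIME_API_KEY -a '{"proxies": true, "advanced_stealth": true}'
```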

Run a Published Benchmark

Browserbase publishes browser benchmarks on the Prime hub:
# Mind2Web benchmark
prime eval run browserbase/mind2web \
  -m anthropic/claude-opus-4.5 \
  -r 1 -n 10 \
  -a '{"max_turns": 50, "proxies": true, "advanced_stealth": true}'

# WebVoyager benchmark
prime eval run browserbase/webvoyager \
  -m anthropic/claude-opus-4.5 \
  -r 1 -n 4 \
  -a '{"max_turns": 5, "proxies": true, "advanced_stealth": true}'

Run from a Local Environment

If your environment lives in a local directory:
prime eval run ./my_browser_env -m openai/gpt-4.1 -k PRIME_API_KEY

Operational Notes

By default, CUA mode deploys a sandbox server using a pre-built Docker image (deepdream19/cua-server:latest) that exposes Browserbase's CDP framework, Understudy. This is the recommended setup. For local development, you can run the CUA server yourself and disable the sandbox:
prime eval run browser-cua-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -a '{"use_sandbox": false, "server_url": "http://localhost:3000"}'
Enable Proxies and Stealth Mode via environment args:
prime eval run browser-dom-example \
  -m openai/gpt-4.1 \
  -k PRIME_API_KEY \
  -a '{"proxies": true, "advanced_stealth": true}'
These are passed through to Browserbase session creation.
DOM mode requires:
  • BROWSERBASE_API_KEY — Browserbase API key
  • BROWSERBASE_PROJECT_ID — Browserbase project ID
  • MODEL_API_KEY — API key for Stagehand’s underlying model
CUA mode requires:
  • BROWSERBASE_API_KEY — Browserbase API key
  • BROWSERBASE_PROJECT_ID — Browserbase project ID
  • PRIME_API_KEY — Required when using sandbox mode (default). Set via prime login or as an env var.
CUA mode optional:
  • OPENAI_API_KEY — Forwarded into the sandbox container if set
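The per-mode requirements above can be collapsed into a small preflight check. A sketch (the Prime CLI performs its own validation; this helper is illustrative only):

```python
import os

# Required environment variables per BrowserEnv mode, per the lists above.
REQUIRED_VARS = {
    "dom": ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID", "MODEL_API_KEY"],
    # PRIME_API_KEY is needed for the default sandbox mode.
    "cua": ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID", "PRIME_API_KEY"],
}

def preflight(mode: str, env=os.environ) -> list[str]:
    """Return the required variables for `mode` that are unset or empty."""
    return [v for v in REQUIRED_VARS[mode] if not env.get(v)]

missing = preflight("cua", {"BROWSERBASE_API_KEY": "x",
                            "BROWSERBASE_PROJECT_ID": "y"})
print(missing)  # -> ['PRIME_API_KEY']
```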

Further Resources

  • Prime Intellect Evaluating Docs: full documentation on Prime's evaluation workflow
  • Prime verifiers Environments: source code and docs for verifiers environments
  • Browserbase Getting Started: core Browserbase documentation
  • RL Training Guide: wire BrowserEnv into Prime RL training workflows