
Overview

Once you’ve validated your browser environment with evaluations, you can use BrowserEnv in Prime Intellect’s reinforcement learning pipelines. This guide covers how to wire Browserbase-backed environments into Hosted Training and self-managed prime-rl setups.
This guide focuses on connecting BrowserEnv to Prime’s training workflows. For RL concepts, reward shaping, and training configuration details, see Prime’s training documentation.

Before You Train

  1. Validate with evals first. Run prime eval run against your environment and verify that reward signals are meaningful. Invest time in understanding and perfecting your judge rubric — your model will only learn if it has a good judge producing accurate reward signals. Training on a broken reward function wastes compute.
  2. Check reward distribution. Ensure your environment produces a useful range of rewards — not all zeros or all ones.
  3. Test at small scale. Start with a short training run to confirm the environment, credentials, and networking all work before committing to a full run.
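The reward-distribution check in step 2 can be sketched in a few lines. This is a minimal illustration, assuming you have already collected per-rollout rewards from an eval run into a plain list of floats; how you extract them from the eval output is not covered here. The function name and the 0.05 spread threshold are arbitrary choices for the example, not part of any Prime or Browserbase API.

```python
from statistics import mean, pstdev


def summarize_rewards(rewards, min_std=0.05):
    """Summarize eval rewards and flag distributions with too little spread."""
    if not rewards:
        raise ValueError("no rewards to summarize")
    std = pstdev(rewards)
    return {
        "n": len(rewards),
        "mean": mean(rewards),
        "std": std,
        "min": min(rewards),
        "max": max(rewards),
        # Near-zero spread means every rollout scored about the same
        # (for example all zeros or all ones), so there is little signal
        # to separate good behavior from bad.
        "degenerate": std < min_std,  # min_std is an arbitrary example threshold
    }


# Hypothetical rewards from a small eval run.
print(summarize_rewards([0.0, 0.0, 1.0, 0.5, 1.0, 0.0, 0.25, 1.0]))
```

If `degenerate` comes back true, the task is probably too easy, too hard, or scored by a rubric that does not discriminate between rollouts, and it is worth revisiting before training.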

Setup

Install and Configure the Prime CLI

uv add prime
prime login
Then, from your environment package directory, install the package in editable mode with its browser extra:
uv pip install -e ".[browser]"

Install Your BrowserEnv Environment

prime lab setup
prime env install browser-dom-example

Set Browserbase Credentials

The training workers need access to Browserbase. Export your credentials:
export BROWSERBASE_API_KEY=your_browserbase_api_key
export BROWSERBASE_PROJECT_ID=your_browserbase_project_id
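A missing variable tends to surface only once a worker tries to open a browser session, so a quick preflight check can save a failed launch. The sketch below only confirms that both variables are set and non-empty; it does not verify that the values are valid with Browserbase. The helper name is hypothetical.

```python
import os

REQUIRED_VARS = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")


def missing_browserbase_vars(env=None):
    """Return the names of required Browserbase variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with an explicit mapping: only the API key is set.
print(missing_browserbase_vars({"BROWSERBASE_API_KEY": "example"}))
# prints ['BROWSERBASE_PROJECT_ID']
```

Calling `missing_browserbase_vars()` with no arguments checks the current process environment, which is what you would want to run just before launching training.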

Hosted Training

With Prime’s Hosted Training, you define your environment in a TOML config and Prime handles orchestration.

Minimal Config

model = "Qwen/Qwen3-4B-Instruct-2507"
max_steps = 50
batch_size = 128
rollouts_per_example = 8

[sampling]
max_tokens = 512

[[env]]
id = "browserbase/webvoyager"
The [[env]] section references a published Browserbase environment from the Prime hub. Environment-specific args can be passed via [env.args] — the same args you’d pass via -a in prime eval run.

Launch Training

prime rl run training_config.toml
[Figure: Prime Intellect training run dashboard showing reward curves and metrics for a Browserbase WebVoyager environment]
Once training completes, post-trained models can be deployed on Prime infrastructure with the selected checkpoint.
Browserbase credentials (BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID) are read from environment variables by the training workers. Set them in your training environment configuration or pass them as secrets through Prime’s secret management. They do not go in the TOML config file.

Self-Managed prime-rl

For full control over the training loop, use prime-rl directly.

Setup

prime lab setup --prime-rl
uv add "verifiers[browser]"

Run Training

prime rl run rl_config.toml
Refer to the prime-rl documentation for full configuration options.

Choosing DOM vs CUA for Training

| | DOM Mode | CUA Mode |
| --- | --- | --- |
| Observation | Structured DOM / accessibility tree | Screenshots |
| Action space | Text-based instructions (navigate, act, observe, extract) | Coordinate-based (click, type, scroll, screenshot) |
| Best for | Instruction-following tasks, form filling, navigation | Visual/pixel-grounded tasks, complex UIs |
| Compute cost | Lower: text observations are compact | Higher: screenshot rendering and transfer |
| Recommended when | Your model processes text and you want faster rollouts | Your model is vision-based or you need pixel-level grounding |

Performance and Cost Considerations

Each training step involves creating or reusing a browser session, loading a page, and exchanging observations/actions. This adds latency compared to text-only environments. Budget for slower rollouts in your training timeline.
Every rollout step consumes Browserbase session time. Monitor your usage on the Browserbase dashboard and factor session costs into your training budget. See Measuring Usage for tracking details.
CUA mode is heavier than DOM mode — screenshots must be rendered, tokenized, and consumed by the model. If your task doesn’t require visual grounding, DOM mode will give you faster and cheaper training runs.
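For budgeting, a rough estimate of total browser time is often enough. The sketch below assumes one browser session per rollout and takes the number of rollouts per step as an input: with the minimal config that is 128 if `batch_size` counts rollouts, or 128 × 8 if it counts examples, so check Prime's training documentation for which applies. The 2-minute average session length is a made-up placeholder; measure your own from an eval run.

```python
def estimate_browser_hours(max_steps, rollouts_per_step, avg_session_minutes):
    """Estimate total browser session hours, assuming one session per rollout."""
    if max_steps < 0 or rollouts_per_step < 0 or avg_session_minutes < 0:
        raise ValueError("inputs must be non-negative")
    total_rollouts = max_steps * rollouts_per_step
    return total_rollouts * avg_session_minutes / 60.0


# max_steps and batch_size come from the minimal config above;
# the average session length is a hypothetical placeholder.
hours = estimate_browser_hours(max_steps=50, rollouts_per_step=128, avg_session_minutes=2.0)
print(f"{hours:.1f} browser hours")
```

Multiply the result by your plan's browser-hour rate to get a cost figure, and treat it as a lower bound if failed rollouts are retried.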

Related resources

Prime Hosted Training: Getting started with Prime’s Hosted Training
Prime verifiers Training Docs: Environment configuration for training workflows
BrowserEnv Evals Guide: Validate your environment before training
Browserbase Cost Optimization: Reduce session costs in production workloads