Overview
Once you’ve validated your browser environment with evaluations, you can use BrowserEnv in Prime Intellect’s reinforcement learning pipelines. This guide covers how to wire Browserbase-backed environments into Hosted Training and self-managed prime-rl setups.
This guide focuses on connecting BrowserEnv to Prime’s training workflows. For RL concepts, reward shaping, and training configuration details, see Prime’s training documentation.
Before You Train
- Validate with evals first. Run prime eval run against your environment and verify that reward signals are meaningful. Invest time in understanding and perfecting your judge rubric — your model will only learn if it has a good judge producing accurate reward signals. Training on a broken reward function wastes compute.
- Check reward distribution. Ensure your environment produces a useful range of rewards — not all zeros or all ones.
- Test at small scale. Start with a short training run to confirm the environment, credentials, and networking all work before committing to a full run.
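The reward-distribution check above can be automated with a few lines of Python. This is a minimal sketch, not part of Prime's tooling — it assumes you have collected per-rollout rewards into a plain list, and the sample rewards shown are hypothetical:

```python
from statistics import mean, stdev

def check_reward_distribution(rewards, min_spread=0.05):
    """Flag degenerate reward distributions (all zeros, all ones, near-constant)."""
    if not rewards:
        raise ValueError("no rewards collected")
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    # A near-constant reward gives the policy almost no learning signal.
    return {"mean": mu, "std": sigma, "degenerate": sigma < min_spread}

# Hypothetical rewards gathered from a small eval run
rewards = [0.0, 0.6, 1.0, 0.4, 0.8, 0.0, 1.0, 0.5]
stats = check_reward_distribution(rewards)
print(stats)
```

If `degenerate` comes back true on an all-ones or all-zeros batch, revisit your judge rubric before spending training compute.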
Setup
Install and Configure the Prime CLI
Install Your BrowserEnv Environment
Set Browserbase Credentials
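The variable names below (`BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`) come from this guide; the values are placeholders to substitute with your real credentials:

```shell
# Placeholders — substitute your actual Browserbase credentials
export BROWSERBASE_API_KEY="your-api-key"
export BROWSERBASE_PROJECT_ID="your-project-id"
```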
The training workers need access to Browserbase. Export your credentials in the shell where training runs.
Hosted Training
With Prime’s Hosted Training, you define your environment in a TOML config and Prime handles orchestration.
Minimal Config
The [[env]] section references a published Browserbase environment from the Prime hub. Environment-specific args can be passed via [env.args] — the same args you’d pass via -a in prime eval run.
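A sketch of what such a config might look like. The table names ([[env]] and [env.args]) come from this guide, but the key names and values below are illustrative placeholders, not a verbatim schema — consult Prime's Hosted Training docs for the exact fields:

```toml
# Illustrative only — field names are placeholders, not Prime's exact schema.
[[env]]
id = "your-org/browser-env"    # a published environment on the Prime hub (placeholder)

[env.args]                      # same args you'd pass with -a in `prime eval run`
task = "form-filling"           # hypothetical environment-specific arg
```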
Launch Training

Where do Browserbase credentials go?
Browserbase credentials (BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID) are read from environment variables by the training workers. Set them in your training environment configuration or pass them as secrets through Prime’s secret management. They do not go in the TOML config file.
Self-Managed prime-rl
For full control over the training loop, use prime-rl directly.
Setup
Run Training
Choosing DOM vs CUA for Training
| | DOM Mode | CUA Mode |
|---|---|---|
| Observation | Structured DOM / accessibility tree | Screenshots |
| Action space | Text-based instructions (navigate, act, observe, extract) | Coordinate-based (click, type, scroll, screenshot) |
| Best for | Instruction-following tasks, form filling, navigation | Visual/pixel-grounded tasks, complex UIs |
| Compute cost | Lower — text observations are compact | Higher — screenshot rendering and transfer |
| Recommended when | Your model processes text and you want faster rollouts | Your model is vision-based or you need pixel-level grounding |
Performance and Cost Considerations
Rollout Latency
Each training step involves creating or reusing a browser session, loading a page, and exchanging observations/actions. This adds latency compared to text-only environments. Budget for slower rollouts in your training timeline.
Session Costs
Every rollout step consumes Browserbase session time. Monitor your usage on the Browserbase dashboard and factor session costs into your training budget. See Measuring Usage for tracking details.
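To put rough numbers on session time before a run, a back-of-envelope estimate helps. Every figure below is a hypothetical input — substitute your own rollout counts, durations, and Browserbase pricing:

```python
# Back-of-envelope browser-session-time estimate; all numbers are hypothetical.
rollouts_per_step = 64       # parallel rollouts per training step
steps = 500                  # planned training steps
seconds_per_rollout = 45     # average browser-session time per rollout

total_session_hours = rollouts_per_step * steps * seconds_per_rollout / 3600
print(f"Estimated browser session time: {total_session_hours:.0f} hours")
```

Multiply the result by your per-hour session rate to get a ballpark training-run cost, then compare it against actuals on the Browserbase dashboard.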
CUA vs DOM Overhead
CUA mode is heavier than DOM mode — screenshots must be rendered, tokenized, and consumed by the model. If your task doesn’t require visual grounding, DOM mode will give you faster and cheaper training runs.
Related Resources
Prime Hosted Training
Getting started with Prime’s Hosted Training
Prime verifiers Training Docs
Environment configuration for training workflows
BrowserEnv Evals Guide
Validate your environment before training
Browserbase Cost Optimization
Reduce session costs in production workloads