Engineering

Introducing BrowserEnv: Train browser agents on real websites

Harsehaj Dhami, Growth Engineer
Kyle Jeong, Growth Engineer
March 25, 2026
4 min read

TL;DR: Browserbase and Prime Intellect have partnered to launch BrowserEnv, a reinforcement learning environment for training and evaluating browser agents on real web tasks.


Everyone wants AI models that can actually use the browser to get work done, but most models weren’t trained to interact with real websites. They were trained on static datasets instead of environments where they can practice navigating pages, clicking elements, and completing multi-step workflows. This is why many browser agents look impressive in demos but struggle in real-world use.

The missing piece is a reliable and scalable training environment.

Training browser agents requires significant infrastructure: running browsers at scale, interacting with live websites without getting blocked, resetting sessions between tasks, and verifying results. This is the infrastructure frontier labs are already building.

For example, Microsoft trained and evaluated their computer-use model Fara-7B using Browserbase, which required reliable access to real websites and scalable browser environments for evaluation and reinforcement learning workflows.

We have partnered with Prime Intellect to make this infrastructure accessible to everyone with BrowserEnv.

BrowserEnv is a reinforcement learning environment designed specifically for training browser agents. It runs on Browserbase, which provides scalable browser infrastructure and access to real websites. Prime Intellect provides the training platform.

Together, they make it possible to train and evaluate computer-use models on real browser tasks without building the infrastructure yourself. All you need is a dataset of tasks.

Researchers and developers can train open models like Qwen or other computer-use models using reinforcement learning, while BrowserEnv handles browser orchestration, task execution, and verification.

Training Qwen3-VL on WebVoyager with BrowserEnv

To validate our stack end to end, we fine-tuned Qwen/Qwen3-VL-8B-Instruct on real WebVoyager tasks using BrowserEnv and Prime Intellect. We plugged the prime/webvoyager-no-anti-bot environment into Prime’s RL pipeline, so the model could practice real navigation flows across sites like Amazon, Allrecipes, GitHub, Booking, and more without getting stuck on anti-bot walls.

BrowserEnv handled browser orchestration on Browserbase, Prime handled rollouts and optimization, and WebVoyager provided a standardized benchmark of 600 filtered tasks.

We started from the public WebVoyager environment in the Prime hub, switched it to CUA mode, and pointed it at Qwen3-VL-8B-Instruct. The training run used a relatively small but realistic configuration: 200 steps, batch size 32, 8 rollouts per example, learning rate 1e-4, and an oversampling factor of 2, with modest parallelism.

model = "Qwen/Qwen3-VL-8B-Instruct"
max_steps = 200
batch_size = 32
rollouts_per_example = 8
learning_rate = 0.0001
oversampling_factor = 2
max_async_level = 2

[sampling]
max_tokens = 512

[[env]]
id = "prime/webvoyager-no-anti-bot"
args = { mode = "cua", viewport_width = 800, viewport_height = 600, keep_recent_screenshots = 2 }

In this setup, each training step created or reused a Browserbase session, loaded a WebVoyager task, and let Qwen3-VL act through coordinate-based CUA primitives while a verifier judged task completion and produced reward signals. Over the course of the run, the model improved on multi-step tasks such as searching, filtering, and extracting information from live pages, rather than just static HTML.
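The control flow of a single rollout can be sketched in a few lines. This is an illustrative sketch only: `Action`, `FakeBrowser`, `scripted_policy`, `simple_verifier`, and `run_episode` are hypothetical names, not BrowserEnv or Prime Intellect APIs. The fake browser records actions instead of driving a real session, and the sketch assumes `keep_recent_screenshots` means "keep only the N most recent screenshots in the model's context".

```python
from dataclasses import dataclass, field
from typing import Callable, List

# All names here are hypothetical stand-ins that only illustrate the loop.


@dataclass
class Action:
    kind: str  # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeBrowser:
    """Stand-in for a remote browser session; records actions instead of rendering."""
    width: int = 800
    height: int = 600
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real session would return image bytes; a label is enough here.
        return f"screenshot-{len(self.log)}"

    def apply(self, action: Action) -> None:
        # Coordinate-based actions must land inside the viewport.
        in_bounds = 0 <= action.x < self.width and 0 <= action.y < self.height
        if action.kind == "click" and not in_bounds:
            raise ValueError("click outside viewport")
        self.log.append(action)


def scripted_policy(task: str, screenshots: List[str]) -> Action:
    """Toy policy in place of a model: click, type the task text, then stop."""
    step = int(screenshots[-1].split("-")[1])
    if step == 0:
        return Action("click", x=400, y=300)
    if step == 1:
        return Action("type", text=task)
    return Action("done")


def simple_verifier(task: str, browser: FakeBrowser) -> float:
    """Toy verifier: reward 1.0 if the task text was typed, else 0.0."""
    typed = any(a.kind == "type" and a.text == task for a in browser.log)
    return 1.0 if typed else 0.0


def run_episode(
    browser: FakeBrowser,
    policy: Callable[[str, List[str]], Action],
    verifier: Callable[[str, FakeBrowser], float],
    task: str,
    max_turns: int = 10,
    keep_recent_screenshots: int = 2,
) -> float:
    """Observe, act, repeat; then ask the verifier for a scalar reward."""
    screenshots: List[str] = []
    for _ in range(max_turns):
        screenshots.append(browser.screenshot())
        screenshots = screenshots[-keep_recent_screenshots:]  # trim context
        action = policy(task, screenshots)
        if action.kind == "done":
            break
        browser.apply(action)
    return verifier(task, browser)


browser = FakeBrowser()
reward = run_episode(browser, scripted_policy, simple_verifier, "vegetarian lasagna")
print(reward, [a.kind for a in browser.log])
```

In a real run, the policy would be the vision-language model, the browser would be a live session, and the verifier's reward would feed the RL optimizer. The loop shape (screenshot, action, verification) is the part this sketch is meant to convey.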

The output of this training run is a LoRA adapter that can be deployed on the Prime Intellect platform.

This training workflow is reproducible by anyone with a Browserbase account and a Prime Intellect account. You can even start from the same ingredients we used: BrowserEnv on Browserbase, the WebVoyager no-anti-bot environment in Prime, and an open vision-language model like Qwen3-VL.

Frontier labs are already training browser agents this way, and now anyone with access to the internet can do the same.

BrowserEnv is generally available today. Learn more at browserenv.com and start training your own browser agents.
