BrowserGym WebArena

Version: 0.14.3 | Verified: Mon Apr 27 | Auth: no | Language: python

WebArena benchmark environment for BrowserGym, version 0.14.3. Provides a Gymnasium-compatible environment for evaluating web agents on realistic web interaction tasks.

pip install browsergym-webarena
error ModuleNotFoundError: No module named 'browsergym.webarena'
cause Package not installed or imported with wrong name.
fix
Run 'pip install browsergym-webarena' and import as 'from browsergym.webarena import ...'
error gym.error.UnregisteredEnv: Cannot find environment with id 'browsergym/webarena'
cause The generic environment ID is deprecated and is not registered in versions after 0.14.0.
fix
Use a specific task ID, e.g., 'browsergym/webarena.0'.
error Error: Docker containers not running. Make sure you have started the WebArena infrastructure.
cause Required Docker services for the benchmark websites are not running.
fix
Follow the WebArena setup instructions to start the Docker containers.
error TypeError: expected string or bytes-like object
cause Agent returned a non-string action (e.g., a dict or array) to the environment.
fix
Convert action to a string before calling 'env.step(action)'.
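One way to catch this early is to validate the action before it reaches the environment. The sketch below assumes a hypothetical helper named ensure_action_string; it is not part of BrowserGym.

```python
from typing import Any


def ensure_action_string(action: Any) -> str:
    """Return the action as a string, or raise a descriptive TypeError.

    Hypothetical helper for illustration; not part of BrowserGym.
    """
    if isinstance(action, str):
        return action
    if isinstance(action, bytes):
        # Bytes are decoded rather than rejected, assuming UTF-8 text.
        return action.decode("utf-8")
    raise TypeError(
        f"env.step() expects a string action, got {type(action).__name__}: {action!r}"
    )
```

Calling str() on a dict would silence the TypeError but hand the environment text it probably cannot parse, so this sketch raises a clearer error instead of converting blindly.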
breaking WebArena requires specific Docker images for the benchmark websites. The environment fails to initialize if the Docker containers are not running.
fix Pull and run the required Docker images as per the WebArena setup instructions before using the environment.
deprecated The 'browsergym/webarena' environment ID is deprecated in favor of task-specific IDs like 'browsergym/webarena.0'.
fix Use 'browsergym/webarena.<task_id>' where <task_id> is an integer from 0 to 811.
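A small helper can build task-specific IDs and reject values outside the documented range. This is a minimal sketch; webarena_env_id and WEBARENA_NUM_TASKS are hypothetical names, and the bound of 812 tasks is taken from the 0 to 811 range stated above.

```python
WEBARENA_NUM_TASKS = 812  # task IDs 0..811, per the range stated above


def webarena_env_id(task_id: int) -> str:
    """Build a task-specific environment ID such as 'browsergym/webarena.0'."""
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(task_id, bool) or not isinstance(task_id, int):
        raise TypeError(f"task_id must be an int, got {type(task_id).__name__}")
    if not 0 <= task_id < WEBARENA_NUM_TASKS:
        raise ValueError(
            f"task_id must be between 0 and {WEBARENA_NUM_TASKS - 1}, got {task_id}"
        )
    return f"browsergym/webarena.{task_id}"
```

The returned string can then be passed to 'gym.make(...)' in place of a hard-coded ID.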
gotcha Observations are dictionaries, not a single array. Fields include 'screenshot' (PIL Image) and 'text' (str), among others.
fix Access individual fields by key, for example 'obs["screenshot"]' or 'obs["text"]'.
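Field names and value types can differ between versions, so it may help to inspect an observation before relying on a particular key. The sketch below uses a hypothetical helper, summarize_observation, that works on any dictionary-like observation.

```python
from typing import Any, Dict, Mapping


def summarize_observation(obs: Mapping[str, Any]) -> Dict[str, str]:
    """Map each observation key to a short description of its value."""
    summary: Dict[str, str] = {}
    for key, value in obs.items():
        description = type(value).__name__
        # Array-like values (for example NumPy arrays) expose a .shape attribute.
        shape = getattr(value, "shape", None)
        if shape is not None:
            description += f" shape={tuple(shape)}"
        summary[key] = description
    return summary
```

Calling 'summarize_observation(obs)' right after 'env.reset()' shows which keys are present and what kind of value each one holds.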
gotcha The environment uses Playwright under the hood. Do not run multiple environments in the same process without proper cleanup, or you may face port conflicts.
fix Use 'env.close()' after each episode or use context managers.
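The cleanup can be made automatic with 'contextlib.closing' from the Python standard library, which calls 'close()' on exit even when an exception is raised. In this sketch StubEnv is a hypothetical stand-in so the pattern runs without Docker; in practice, pass the object returned by 'gym.make'.

```python
from contextlib import closing


class StubEnv:
    """Hypothetical stand-in for a real environment; only reset() and close()."""

    def __init__(self):
        self.closed = False

    def reset(self):
        return {"text": ""}, {}

    def close(self):
        self.closed = True


def reset_with_cleanup(env):
    """Reset the environment and always close it, even if reset() fails."""
    with closing(env):
        obs, info = env.reset()
        return obs, info
```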
gotcha Each WebArena task defines its own evaluation function (teardown). The agent must return an action string at every step; otherwise, the evaluation may not work correctly.
fix Ensure your agent returns a string action from the 'action_space' (Text space) each step.

Create a WebArena environment and reset it. Use 'headless=True' on machines without a display, such as servers.

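import gymnasium as gym
import browsergym.webarena  # importing registers the WebArena tasks with Gymnasium

# Task 0; headless=True runs the browser without a visible window.
env = gym.make('browsergym/webarena.0', headless=True)
obs, info = env.reset()
# run your agent
env.close()  # release the browser when the episode is done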
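The "run your agent" step can be sketched as a standard Gymnasium loop. This is a minimal sketch, not the package's own runner: run_episode and policy are hypothetical names, and the environment is assumed to follow the Gymnasium API, where 'reset()' returns (obs, info) and 'step(action)' returns (obs, reward, terminated, truncated, info).

```python
from typing import Any, Callable, Dict


def run_episode(
    env: Any,
    policy: Callable[[Dict[str, Any]], str],
    max_steps: int = 30,
) -> float:
    """Run one episode and return the cumulative reward.

    `policy` is a hypothetical callable that maps an observation dictionary
    to an action string. The caller remains responsible for env.close().
    """
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        # Stop when the task ends or the environment cuts the episode short.
        if terminated or truncated:
            break
    return total_reward
```

With a real environment this would be called as 'run_episode(env, my_policy)' between 'gym.make(...)' and 'env.close()', where my_policy is your agent's decision function.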