Browser Use
Browser Use is a Python library that lets AI agents navigate and interact with web browsers programmatically. It supports tasks such as clicking buttons, filling forms, and scraping data, so an agent can carry out web-based actions autonomously. The library is actively maintained with frequent updates.
Warnings
- breaking: Starting with version `0.12.5`, `litellm` was removed as a core dependency due to a supply chain attack. If you rely on `ChatLiteLLM` or other `litellm` features, you must install `litellm` separately (`pip install litellm`).
- breaking: Version `0.12.3` introduced Browser Use CLI 2.0, which switched from Playwright to direct Chrome DevTools Protocol (CDP). This change improves performance but may alter underlying browser interaction behavior, and code that relied on Playwright-specific assumptions may need adjusting. SDK 3.0 also brought breaking changes to the `client.run()` API.
- gotcha: Browser Use primarily relies on the Chrome DevTools Protocol (CDP), so it supports only Chrome/Chromium-based browsers. Safari and Firefox are not supported.
- gotcha: The library requires Python 3.11 or higher. Older Python versions will cause installation or runtime errors.
- gotcha: API keys (e.g., `BROWSER_USE_API_KEY`, `OPENAI_API_KEY`, `BROWSERLESS_TOKEN`) are essential for most functionality. Set them as environment variables, typically loaded from a `.env` file using `python-dotenv` and `load_dotenv()`. If they are missing or incorrect, requests fail with authentication errors.
Install
- pip install browser-use
- uv pip install browser-use
- curl -fsSL https://browser-use.com/cli/install.sh | bash
Imports
- Agent
from browser_use import Agent
- Browser
from browser_use import Browser
- ChatBrowserUse
from browser_use import ChatBrowserUse
Quickstart
import asyncio
import os

from dotenv import load_dotenv
from browser_use import Agent, Browser, ChatBrowserUse

load_dotenv()


async def main():
    # Set API keys as environment variables (e.g., in a .env file)
    # BROWSER_USE_API_KEY=your_browser_use_key
    # OPENAI_API_KEY=your_openai_key (or other LLM provider)

    # Optionally configure the browser (headless by default)
    browser = Browser(
        # use_cloud=False,  # Set to True to use Browser Use Cloud
        # headless=False,  # Set to False to see the browser window
        # window_size={'width': 1920, 'height': 1080}
    )

    agent = Agent(
        task="Go to example.com and extract the main heading text",
        llm=ChatBrowserUse(api_key=os.environ.get('BROWSER_USE_API_KEY', '')),  # Or ChatOpenAI, ChatGoogle, etc.
        browser=browser,
    )

    result = await agent.run()
    print("Task completed.")
    print(f"Final result: {result.final_result()}")
    print(f"Visited URLs: {result.urls()}")


if __name__ == "__main__":
    asyncio.run(main())
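The quickstart awaits `agent.run()` with no upper bound on how long the task may take. One way to cap it is a standard `asyncio.wait_for` wrapper, sketched below. `run_with_timeout` and `slow_task` are hypothetical names: `slow_task` only stands in for an agent run so the snippet works without a browser or API key. How Browser Use cleans up the browser when a run is cancelled is not covered here and should be checked separately.

```python
import asyncio


async def run_with_timeout(awaitable, timeout_s):
    """Await `awaitable`; return None if it takes longer than `timeout_s` seconds."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the underlying task before raising.
        return None


async def slow_task(delay_s):
    """Stand-in for a long-running coroutine such as an agent run."""
    await asyncio.sleep(delay_s)
    return "done"
```

In the quickstart, this would be used as `result = await run_with_timeout(agent.run(), 120)`, followed by a check for `None` before calling `result.final_result()`.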