Cua Agent
Cua (Computer Use) Agent is a Python library designed for AI-driven computer interaction, allowing large language models to perceive and act upon a graphical user interface. It focuses on enabling agents to understand screen content and execute actions like clicking, typing, and dragging programmatically. The current version is 0.7.38, and it is under active, rapid development with frequent releases.
Warnings
- gotcha Cua Agent requires platform-specific dependencies for full functionality on macOS and Windows. Running only `pip install cua-agent` installs the core features; GUI interaction requires the additional `[mac]` or `[win]` extra.
- gotcha An OpenAI API key (or a compatible LLM provider endpoint) and a multimodal model (e.g., `gpt-4o`) are mandatory for the agent to function. Misconfiguration or missing keys will prevent the agent from running tasks.
- gotcha GUI automation is inherently fragile. Small changes in UI elements, screen resolution, or application states can break agent tasks. The agent's performance depends heavily on the clarity of the screen and the robustness of the underlying LLM's visual understanding.
- gotcha The agent can be resource-intensive because of frequent screen capturing, image processing, and LLM API calls. Expect high CPU, memory, and network usage, and note that every LLM API call adds to your provider bill.
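Because GUI tasks can fail intermittently and each attempt costs API calls, one common mitigation is to retry a task a bounded number of times with a growing delay. The sketch below is a generic, standard-library helper and not part of Cua Agent; `run_with_retries` and its parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call task() until it succeeds or max_attempts is reached.

    Hypothetical helper, not a Cua Agent API. The delay doubles after
    each failed attempt (base_delay, 2 * base_delay, ...).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception as exc:  # broad on purpose: GUI failures vary widely
            last_error = exc
            if attempt < max_attempts:
                # Wait before the next attempt so the UI can settle.
                sleep(base_delay * 2 ** (attempt - 1))

    # Every attempt failed: surface the last error as the cause.
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error
```

With the `agent` object from the Quickstart, a call might look like `run_with_retries(lambda: agent.run_task("Open a web browser and navigate to example.com"))`. Keep `max_attempts` small, since every retry repeats the screen captures and LLM calls.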
Install
- pip install cua-agent
- pip install "cua-agent[mac]"  # For macOS users
  pip install "cua-agent[win]"  # For Windows users
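To pick the right install command in a setup script, the platform can be detected with `sys.platform`. This is a minimal sketch: `install_target` is a hypothetical helper, and falling back to the plain package on other platforms (for example Linux) is an assumption, since only the `[mac]` and `[win]` extras are named above.

```python
import sys


def install_target(platform: str = sys.platform) -> str:
    """Return the pip requirement string for a given sys.platform value.

    Hypothetical helper for illustration only.
    """
    if platform == "darwin":   # macOS
        return "cua-agent[mac]"
    if platform == "win32":    # Windows (32- and 64-bit)
        return "cua-agent[win]"
    # Assumption: no extra is needed or available on other platforms.
    return "cua-agent"


# Print the command to run on the current machine.
print(f'pip install "{install_target()}"')
```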
Imports
- CuaAgent
from cua_agent.agent import CuaAgent
- CuaConfig
from cua_agent.config import CuaConfig
- get_full_screen_screenshot
from cua_agent.utils import get_full_screen_screenshot
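Before relying on the imports above, it can help to confirm that the package is importable in the active environment. The following sketch uses only the standard library; `is_installed` is a hypothetical helper, and the top-level module name `cua_agent` is taken from the import lines above.

```python
import importlib.util


def is_installed(top_level_module: str) -> bool:
    """Return True if a top-level module can be found on the import path.

    Hypothetical helper. Pass a top-level name only: find_spec() imports
    parent packages for dotted names and raises if a parent is missing.
    """
    return importlib.util.find_spec(top_level_module) is not None


if not is_installed("cua_agent"):
    print("cua_agent is not importable; run: pip install cua-agent")
```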
Quickstart
import os

from cua_agent.agent import CuaAgent
from cua_agent.config import CuaConfig

# Ensure OPENAI_API_KEY is set in your environment
# or pass it directly to CuaConfig
openai_api_key = os.environ.get('OPENAI_API_KEY', 'sk-your-openai-key')
if not openai_api_key or openai_api_key == 'sk-your-openai-key':
    print("Warning: OPENAI_API_KEY environment variable not set. Please set it or provide it in CuaConfig.")

config = CuaConfig(
    openai_api_key=openai_api_key,
    model='gpt-4o'  # Or another suitable multimodal model
)
agent = CuaAgent(config=config)

# Example: Ask the agent to open a browser and navigate
# This assumes a browser (like Chrome/Edge) is installed
# and can be opened by 'command-space'/'windows-key' search.
# Actual execution depends on your OS and installed applications.
try:
    print("Agent performing task...")
    # Note: This is an example, actual commands might vary based on your system and language.
    # The agent uses LLM reasoning to interpret this and interact with your GUI.
    agent.run_task("Open a web browser and navigate to example.com")
    print("Agent task completed.")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure you have platform-specific dependencies installed (e.g., pip install \"cua-agent[mac]\")")
    print("and your OpenAI API key is correctly configured.")