Cua Agent

0.7.38 · active · verified Sun Apr 12

Cua (Computer Use) Agent is a Python library for AI-driven computer interaction: it lets large language models perceive and act on a graphical user interface. It focuses on letting agents understand screen content and programmatically execute actions such as clicking, typing, and dragging. The current version is 0.7.38, and the library is under active development with frequent releases.
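
To make the action vocabulary above concrete, here is a minimal sketch of how clicking, typing, and dragging could be modeled as plain data. It is an illustration only and not the Cua Agent API: the names `Click`, `TypeText`, `Drag`, `describe`, and `run_plan` are hypothetical, and nothing here touches a real screen.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical action types; these are NOT part of the Cua Agent library.
@dataclass(frozen=True)
class Click:
    x: int
    y: int


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class Drag:
    start: Tuple[int, int]
    end: Tuple[int, int]


Action = Union[Click, TypeText, Drag]


def describe(action: Action) -> str:
    """Return a human-readable description of one GUI action."""
    if isinstance(action, Click):
        return f"click at ({action.x}, {action.y})"
    if isinstance(action, TypeText):
        return f"type {action.text!r}"
    if isinstance(action, Drag):
        return f"drag from {action.start} to {action.end}"
    raise TypeError(f"unsupported action: {action!r}")


def run_plan(plan: List[Action]) -> List[str]:
    """Dry-run a plan: describe each action instead of executing it."""
    return [describe(action) for action in plan]


if __name__ == "__main__":
    for line in run_plan([Click(10, 20), TypeText("example.com"), Drag((0, 0), (5, 5))]):
        print(line)
```

Keeping actions as immutable data lets a model's plan be logged or reviewed before anything is executed, which is one reasonable design for this kind of tool.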

Warnings

Install

Imports

Quickstart

This quickstart initializes the Cua Agent from a configuration that includes your OpenAI API key, then runs a simple task. Install the platform-specific dependencies first (e.g., `[mac]` or `[win]`), and make your OpenAI API key available through the `OPENAI_API_KEY` environment variable or by passing it directly to `CuaConfig`.

import os
from cua_agent.agent import CuaAgent
from cua_agent.config import CuaConfig

# Read the API key from the environment (or pass it directly to CuaConfig).
# Stop early if it is missing instead of sending a placeholder key to the API.
openai_api_key = os.environ.get('OPENAI_API_KEY')
if not openai_api_key:
    raise SystemExit("OPENAI_API_KEY is not set. Set it in your environment or provide the key in CuaConfig.")

config = CuaConfig(
    openai_api_key=openai_api_key,
    model='gpt-4o' # Or another suitable multimodal model
)

agent = CuaAgent(config=config)

# Example: Ask the agent to open a browser and navigate
# This assumes a browser (like Chrome/Edge) is installed
# and can be opened by 'command-space'/'windows-key' search.
# Actual execution depends on your OS and installed applications.
try:
    print("Agent performing task...")
    # Note: this is only an example; the actions taken may vary with your system and language.
    # The agent uses LLM reasoning to interpret the instruction and interact with your GUI.
    agent.run_task("Open a web browser and navigate to example.com")
    print("Agent task completed.")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure you have platform-specific dependencies installed (e.g., pip install \"cua-agent[mac]\")")
    print("and your OpenAI API key is correctly configured.")
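
The key handling in the quickstart can be factored into a small standalone check. The following is a sketch using only the standard library; `require_api_key` and `PLACEHOLDER_PREFIXES` are hypothetical names introduced for illustration and are not part of Cua Agent.

```python
import os

# Hypothetical marker for sample keys such as 'sk-your-openai-key'.
PLACEHOLDER_PREFIXES = ("sk-your-",)


def require_api_key(name="OPENAI_API_KEY", environ=None):
    """Return the key stored under `name`, or raise if it is missing or a placeholder.

    `environ` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if environ is None else environ
    value = (env.get(name) or "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set")
    if value.startswith(PLACEHOLDER_PREFIXES):
        raise RuntimeError(f"{name} still holds a placeholder value")
    return value


if __name__ == "__main__":
    try:
        key = require_api_key()
        print("API key found.")
    except RuntimeError as err:
        print(f"Configuration problem: {err}")
```

The returned value could then be passed as `openai_api_key` to `CuaConfig`, so a missing key surfaces before the agent starts interacting with the GUI.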
