Argilla
The Argilla Python client library (SDK) logs, manages, and explores data for AI feedback, model monitoring, and fine-tuning. It covers data annotation and fine-tuning LLMs with human and AI feedback. The current version is 2.8.0, and minor versions are typically released every one to two months.
Common errors
- ModuleNotFoundError: No module named 'rubrix'
  cause: The code still imports the old `rubrix` package, but only `argilla` is installed.
  fix: Uninstall `rubrix` if it is still present (`pip uninstall rubrix`), make sure `argilla` is installed (`pip install argilla`), and change every `import rubrix as rb` to `import argilla as rg`.
- argilla.errors.ArgillaClientError: Cannot connect to the Argilla server at [...]
  cause: The Argilla server is not running, the `api_url` is wrong, or the `api_key` is invalid or missing.
  fix: Verify that the server is running and reachable from your machine. Double-check the `ARGILLA_API_URL` and `ARGILLA_API_KEY` environment variables, or the `api_url` and `api_key` values passed to the client.
- pydantic.v1.ValidationError: [...] value is not a valid enumeration member; permitted: [...]
  cause: A value in a record is not one of the values the dataset's schema allows, for example a label that the dataset does not define, or the record does not match the fields the dataset expects.
  fix: Compare each record's field names and values against the dataset's configuration (in v2.x, the fields and questions declared in `rg.Settings`) and correct the records or the configuration before logging.
- TypeError: init() got an unexpected keyword argument 'workspace'
  cause: The installed client's initializer does not accept a `workspace` argument. This usually means the code was written for a different client version than the one installed.
  fix: Check the installed version (`pip show argilla`), make sure the client and server versions are compatible, and remove `workspace` from the initialization call. In v2.x the workspace is chosen per dataset, for example `rg.Dataset(name=..., workspace=...)`.
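Many connection errors trace back to an unset or malformed URL or key. The following is a minimal sketch, using only the standard library, of checking those values before handing them to the client. `resolve_connection` is a hypothetical helper, not part of Argilla, and the defaults are the local values used in the Quickstart.

```python
import os
from urllib.parse import urlparse

DEFAULT_API_URL = "http://localhost:6900"  # local default, as in the Quickstart
DEFAULT_API_KEY = "argilla.apikey"         # local default, as in the Quickstart


def resolve_connection(env=None):
    """Return (api_url, api_key) read from env, or raise ValueError.

    Hypothetical helper: it only validates the shape of the settings and
    does not contact the server.
    """
    env = os.environ if env is None else env
    api_url = env.get("ARGILLA_API_URL", DEFAULT_API_URL).strip().rstrip("/")
    api_key = env.get("ARGILLA_API_KEY", DEFAULT_API_KEY).strip()

    parsed = urlparse(api_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"ARGILLA_API_URL is not a valid http(s) URL: {api_url!r}")
    if not api_key:
        raise ValueError("ARGILLA_API_KEY is empty")
    return api_url, api_key
```

Calling `resolve_connection()` with no arguments reads `os.environ`; passing a dictionary makes the check easy to exercise in tests.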
Warnings
- breaking The library was renamed from `rubrix` to `argilla` in v1.0. Uninstall `rubrix`, install `argilla`, and update every import path.
- breaking The client entry point changed in v2.0. The module-level `rg.init(api_url=..., api_key=...)` call was replaced by an explicit client object, `client = rg.Argilla(api_url=..., api_key=...)`.
- breaking Data model classes for records underwent a major refactor in v2.0. Task-specific classes such as `rg.TextClassificationRecord` were replaced by a single `rg.Record`, whose fields, questions, and metadata are declared in the dataset's `rg.Settings`.
- gotcha In v2.x a `workspace` is not a connection setting. It is chosen per dataset (`rg.Dataset(name=..., workspace=...)`), and the client falls back to a default workspace when it is omitted. Only the 1.x `rg.init()` accepted a `workspace` argument.
- gotcha In the 1.x client, `rg.log()` automatically creates a dataset with default settings if no dataset with the given name exists, which can produce unexpected dataset configurations. In v2.x a dataset must be created explicitly (`rg.Settings`, then `dataset.create()`) before `dataset.records.log()` is called.
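Because of the rename from `rubrix` to `argilla`, every import of the old package has to change. The following is a rough sketch of a helper that rewrites the module name in import statements of a source string. `rewrite_rubrix_imports` is a hypothetical name, not an Argilla tool. It keeps existing aliases such as `rb`, and it does not touch API calls, which still need manual review.

```python
import re

# Matches the start of "import rubrix ..." and "from rubrix... import ..."
# at the beginning of a line (optionally indented). The \b keeps modules
# such as "rubrix_utils" untouched.
_RUBRIX_IMPORT = re.compile(r"^([ \t]*)(import|from)[ \t]+rubrix\b", re.MULTILINE)


def rewrite_rubrix_imports(source: str) -> str:
    """Replace the module name `rubrix` with `argilla` in import statements."""
    return _RUBRIX_IMPORT.sub(r"\1\2 argilla", source)
```

A typical use is to read each `.py` file, pass its text through the function, and review the resulting diff before writing it back.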
Install
-
pip install argilla
Imports
- argilla
  wrong: import rubrix as rb
  correct: import argilla as rg
- rg.Argilla
  wrong: rg.init('http://localhost:6900', api_key='owner.apikey')
  correct: client = rg.Argilla(api_url=..., api_key=...)
- rg.Record
  wrong: rg.TextClassificationRecord(text="...")
  correct: rg.Record(fields={"text": "..."})
Quickstart
import os

import argilla as rg

# Connect to the Argilla server. The local defaults are used when
# ARGILLA_API_URL and ARGILLA_API_KEY are not set in the environment.
client = rg.Argilla(
    api_url=os.environ.get("ARGILLA_API_URL", "http://localhost:6900"),
    api_key=os.environ.get("ARGILLA_API_KEY", "argilla.apikey"),
)

dataset_name = "my_first_argilla_text_dataset"

# In v2.x the dataset schema is declared up front: the fields shown to
# annotators, the questions they answer, and any metadata properties.
settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[rg.LabelQuestion(name="label", labels=["label_A", "label_B"])],
    metadata=[rg.TermsMetadataProperty(name="source")],
)

# Create a list of simple text records.
records = [
    rg.Record(
        fields={"text": "This is my first text record for Argilla."},
        metadata={"source": "quickstart"},
        # Model predictions can be attached as suggestions, for example:
        # suggestions=[rg.Suggestion(question_name="label", value="label_A", score=0.9)],
    ),
    rg.Record(
        fields={"text": "Argilla helps with data annotation and LLM fine-tuning."},
        metadata={"source": "docs_example"},
    ),
]

# client.datasets(name=...) returns None when the dataset does not exist.
dataset = client.datasets(name=dataset_name)
if dataset is not None:
    print(f"Dataset '{dataset_name}' already exists.")
    # Depending on the use case, you might append the new records:
    # dataset.records.log(records)
else:
    print(f"Dataset '{dataset_name}' not found. Creating it and logging new records.")
    dataset = rg.Dataset(name=dataset_name, settings=settings, client=client)
    dataset.create()
    dataset.records.log(records)
    print(f"Logged {len(records)} records to dataset '{dataset_name}'.")
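The Quickstart builds its records by hand. When records come from a file or a model instead, a quick check before logging can catch schema problems locally rather than as server-side validation errors. The following is a sketch under the assumption that the raw inputs are plain dictionaries keyed by field name. `find_invalid_records` is a hypothetical helper, not part of Argilla.

```python
def find_invalid_records(records, required_fields):
    """Return (index, missing_fields) pairs for records that lack a
    non-empty value for any of the required text fields.

    Hypothetical pre-flight check; it inspects plain dictionaries only
    and does not talk to the Argilla server.
    """
    problems = []
    for index, record in enumerate(records):
        missing = [
            field
            for field in required_fields
            if not str(record.get(field) or "").strip()
        ]
        if missing:
            problems.append((index, missing))
    return problems
```

An empty result means every record carries the required fields; otherwise the indices point at the rows to fix or drop before logging.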