Argilla

2.8.0 · active · verified Thu Apr 16

The Argilla Python SDK is a client library for logging, managing, and exploring data for AI feedback, monitoring, and fine-tuning. It provides tools for data annotation, model monitoring, and fine-tuning LLMs with human and AI feedback. The current version is 2.8.0, and minor versions are typically released every one to two months.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart initializes the Argilla client and logs text records to a dataset, creating the dataset if it does not already exist. Connection parameters are read from environment variables, with local-server defaults as a fallback.

import os

import argilla as rg

# Connect to the Argilla server. Connection parameters are read from the
# ARGILLA_API_URL and ARGILLA_API_KEY environment variables, falling back to
# the usual values for a local server (http://localhost:6900, 'argilla.apikey').
client = rg.Argilla(
    api_url=os.environ.get("ARGILLA_API_URL", "http://localhost:6900"),
    api_key=os.environ.get("ARGILLA_API_KEY", "argilla.apikey"),
)

dataset_name = "my_first_argilla_text_dataset"

# In Argilla 2.x a dataset is defined by its settings: the fields shown to
# annotators, the questions they answer, and any metadata properties.
settings = rg.Settings(
    fields=[rg.TextField(name="text")],
    questions=[rg.LabelQuestion(name="label", labels=["label_A", "label_B"])],
    metadata=[rg.TermsMetadataProperty(name="source")],
)

# Create a list of simple text records
records = [
    rg.Record(
        fields={"text": "This is my first text record for Argilla."},
        metadata={"source": "quickstart"},
        # Model predictions can be attached as suggestions, for example:
        # suggestions=[rg.Suggestion(question_name="label", value="label_A", score=0.9)]
    ),
    rg.Record(
        fields={"text": "Argilla helps with data annotation and LLM fine-tuning."},
        metadata={"source": "docs_example"},
    ),
]

# client.datasets(name=...) returns None when no dataset with that name exists.
# Pass workspace="..." here and to rg.Dataset to target a specific workspace.
dataset = client.datasets(name=dataset_name)

if dataset is None:
    print(f"Dataset '{dataset_name}' not found. Creating and logging new records.")
    dataset = rg.Dataset(name=dataset_name, settings=settings, client=client)
    dataset.create()
    dataset.records.log(records)
    print(f"Logged {len(records)} records to dataset '{dataset_name}'.")
else:
    # Iterating over dataset.records fetches the records from the server.
    total = sum(1 for _ in dataset.records)
    print(f"Dataset '{dataset_name}' already exists with {total} records.")
    # Depending on the use case, you may want to append the new records:
    # dataset.records.log(records)
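The environment-variable fallback used in the quickstart can be factored into a small helper, which makes the defaults easy to test without a running server. This is a minimal sketch using only the standard library; `ConnectionSettings` and `resolve_connection` are hypothetical names chosen for illustration and are not part of the Argilla SDK. The variable names and default values are the ones the quickstart uses.

```python
import os
from typing import Mapping, NamedTuple, Optional


class ConnectionSettings(NamedTuple):
    """Hypothetical container for connection parameters (not an Argilla class)."""

    api_url: str
    api_key: str
    workspace: Optional[str]


def resolve_connection(env: Mapping[str, str] = os.environ) -> ConnectionSettings:
    """Read connection parameters from `env`, falling back to local defaults.

    The defaults mirror the quickstart: a local server at
    http://localhost:6900 with the API key 'argilla.apikey'.
    The workspace is optional and stays None when the variable is unset.
    """
    return ConnectionSettings(
        api_url=env.get("ARGILLA_API_URL", "http://localhost:6900"),
        api_key=env.get("ARGILLA_API_KEY", "argilla.apikey"),
        workspace=env.get("ARGILLA_WORKSPACE"),
    )
```

The resolved `api_url` and `api_key` can then be passed to the client constructor in place of the inline `os.environ.get` calls. Accepting the environment as a parameter lets tests supply a plain dictionary instead of modifying the real process environment.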
