SetFit

1.1.3 · active · verified Thu Apr 16

SetFit is a Python library for efficient few-shot learning with Sentence Transformers. It trains accurate text classifiers from minimal labeled data by fine-tuning pre-trained Sentence Transformer models. The library is prompt-free, fast to train, and offers multilingual support. The current version is 1.1.3; the project is actively maintained, with frequent patch releases for compatibility and minor fixes alongside larger feature releases.

Quickstart

This quickstart walks through the typical workflow for training a SetFit text classifier: loading a `SetFitModel` from the Hugging Face Hub, preparing a dataset (sampling a few examples per class to simulate a few-shot setting), configuring training via `TrainingArguments`, training with a trainer (`Trainer` in SetFit 1.x; the older `SetFitTrainer` name is deprecated), evaluating the model, and making predictions. The example uses a small BGE model and the SST-2 dataset, sampling 8 examples per class for training.
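
As a rough picture of what "sampling 8 examples per class" means, the sketch below does the same thing on plain Python dictionaries. The helper name `sample_few_shot` and the toy rows are hypothetical and exist only for illustration; this is not the library's `sample_dataset` implementation, which operates on `datasets.Dataset` objects.

```python
import random
from collections import defaultdict


def sample_few_shot(rows, label_key="label", num_samples=8, seed=42):
    """Pick up to `num_samples` rows per label (illustrative helper, not SetFit code)."""
    # Group rows by their label value.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)
    sampled = []
    for label in sorted(by_label):
        group = by_label[label]
        # A class with fewer rows than requested contributes all of its rows.
        k = min(num_samples, len(group))
        sampled.extend(rng.sample(group, k))

    # Shuffle so the classes are interleaved rather than grouped.
    rng.shuffle(sampled)
    return sampled


# Toy data (made up): 20 rows for each of two classes.
rows = [{"text": f"example {i}", "label": i % 2} for i in range(40)]
few_shot = sample_few_shot(rows, num_samples=8)
print(len(few_shot))  # 16: 8 rows for each of the 2 classes
```

With two classes and `num_samples=8`, the few-shot training set holds 16 rows, which is the regime the quickstart trains in.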

from datasets import load_dataset
from sentence_transformers.losses import CosineSimilarityLoss
from setfit import SetFitModel, Trainer, TrainingArguments, sample_dataset

# 1. Initialize a SetFit model
model = SetFitModel.from_pretrained("BAAI/bge-small-en-v1.5")

# 2. Load and prepare a dataset (SetFit/sst2 for sentiment classification)
dataset = load_dataset("SetFit/sst2")

# Simulate few-shot regime: 8 examples per class
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
eval_dataset = dataset["validation"]

# SetFit/sst2 already exposes 'text' and 'label' columns, so no column mapping is needed.
# For a dataset with other column names, pass e.g.
# column_mapping={"sentence": "text", "label": "label"} to the Trainer.

# 3. Define TrainingArguments
training_args = TrainingArguments(
    batch_size=16,
    num_iterations=20,          # Number of text pairs to generate for contrastive learning
    num_epochs=1,               # Number of epochs to use for contrastive learning
    body_learning_rate=2e-5,    # Learning rate for the Sentence Transformer body
    loss=CosineSimilarityLoss,  # Loss function for contrastive learning
    seed=42,
    eval_strategy="epoch",      # Named `evaluation_strategy` in older releases
    save_strategy="epoch",
)

# 4. Create the Trainer (SetFitTrainer is the deprecated pre-1.0 name)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    metric="accuracy",
)

# 5. Train the model
trainer.train()

# 6. Evaluate the model
metrics = trainer.evaluate()
print(f"Evaluation Metrics: {metrics}")

# 7. Make predictions
sentences = ["The movie was great!", "I didn't like the food."]
predictions = model.predict(sentences)
print(f"Predictions: {predictions}")

# 8. Push model to Hugging Face Hub (requires `huggingface_hub` login)
# trainer.push_to_hub("my-awesome-setfit-model")
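
The `num_iterations` argument above controls how many text pairs are generated for contrastive learning. The sketch below shows one simple way such pairs can be built from a few labeled examples: for each text, one same-label pair (target 1.0) and one different-label pair (target 0.0) per iteration. The function name `generate_pairs` and the toy sentences are hypothetical; SetFit's own sampler may differ in its details, so treat this as an illustration of the idea rather than the library's implementation.

```python
import random


def generate_pairs(texts, labels, num_iterations=20, seed=42):
    """Build (text_a, text_b, target) triples for contrastive training (illustrative only).

    target is 1.0 when both texts share a label and 0.0 otherwise.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(num_iterations):
        for i, (text, label) in enumerate(zip(texts, labels)):
            # Candidates with the same label, excluding the text itself.
            positives = [
                t for j, (t, l) in enumerate(zip(texts, labels)) if l == label and j != i
            ]
            # Candidates with any other label.
            negatives = [t for t, l in zip(texts, labels) if l != label]
            if positives:
                pairs.append((text, rng.choice(positives), 1.0))
            if negatives:
                pairs.append((text, rng.choice(negatives), 0.0))
    return pairs


# Toy data (made up): two positive and two negative reviews.
texts = ["great film", "loved it", "terrible plot", "waste of time"]
labels = [1, 1, 0, 0]
pairs = generate_pairs(texts, labels, num_iterations=2)
print(len(pairs))  # 16: 4 texts * 2 pairs each * 2 iterations
```

Because every text yields pairs on every iteration, even a 16-example training set produces hundreds of pairs at `num_iterations=20`, which is why a handful of labels per class can be enough to fine-tune the embedding model.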
