TRL

0.29.1 · active · verified Thu Mar 26

Hugging Face library for post-training LLMs: SFT, DPO, GRPO, PPO, and reward modeling. Current version is 0.29.1 (Mar 2026). Requires Python >=3.10.

Warnings

Extremely high API churn: major parameter renames land between minor versions, so pin the exact version you develop against. tokenizer= was renamed to processing_class= in 0.12. Still pre-1.0 (Development Status: Pre-Alpha).
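
The rename in practice (a before/after sketch; DPOTrainer is shown, but the same kwarg applies across trainers):

# TRL < 0.12:  DPOTrainer(model=model, args=args, tokenizer=tokenizer)
# TRL >= 0.12: DPOTrainer(model=model, args=args, processing_class=tokenizer)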

Install
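
From PyPI. Given the churn noted above, pinning is safer; the == pin below assumes you want this card's version:

pip install trl            # latest release
pip install trl==0.29.1    # pinned to this card's version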

Imports
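
Everything the quickstart below uses:

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer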

Quickstart

SFT then DPO pipeline. Use SFTConfig/DPOConfig for all training args.

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# SFT — minimal setup
trainer = SFTTrainer(
    model='Qwen/Qwen2.5-0.5B',
    args=SFTConfig(output_dir='sft_output', num_train_epochs=1),
    train_dataset=load_dataset('trl-lib/Capybara', split='train'),
)
trainer.train()
trainer.save_model()  # required: train() only writes checkpoint-* subdirs; this puts final weights (and tokenizer) at sft_output so the DPO step can load it

# DPO — after SFT
from trl import DPOConfig, DPOTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('sft_output')
tokenizer = AutoTokenizer.from_pretrained('sft_output')

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir='dpo_output', beta=0.1),
    train_dataset=load_dataset('trl-lib/ultrafeedback_binarized', split='train'),
    processing_class=tokenizer,
)
trainer.train()
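
GRPO is listed above but not covered by the quickstart. A minimal sketch patterned on the upstream TRL docs: the length-based reward function and the trl-lib/tldr dataset are illustrative assumptions, and given the churn warning above, kwarg names should be verified against your installed version.

# GRPO: online RL against a programmatic reward
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# reward functions receive the generated completions and return one score each
def reward_len(completions, **kwargs):
    # toy reward: prefer completions close to 20 characters
    return [-abs(20 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model='Qwen/Qwen2.5-0.5B',
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir='grpo_output'),
    train_dataset=load_dataset('trl-lib/tldr', split='train'),
)
trainer.train()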
