{"id":216,"library":"trl","title":"TRL","description":"Hugging Face library for post-training LLMs: SFT, DPO, GRPO, PPO, reward modeling. Current version is 0.29.1 (Mar 2026). Requires Python >=3.10. Extremely high API churn — major parameter renames across versions. tokenizer= renamed to processing_class= in 0.12. Still pre-1.0 (Development Status: Pre-Alpha).","status":"active","version":"0.29.1","language":"python","source_language":"en","source_url":"https://github.com/huggingface/trl/releases","tags":["fine-tuning","rlhf","dpo","sft","grpo","llm","huggingface","post-training"],"install":[{"cmd":"pip install trl","lang":"bash","label":"Standard"},{"cmd":"pip install trl[peft]","lang":"bash","label":"With PEFT/LoRA support"},{"cmd":"pip install trl[quantization]","lang":"bash","label":"With bitsandbytes quantization"},{"cmd":"pip install trl[vllm]","lang":"bash","label":"With vLLM for GRPO online generation"}],"dependencies":[{"reason":"Required. Must be compatible version — TRL pins minimum transformers versions per release.","package":"transformers","optional":false},{"reason":"Required. Installed automatically.","package":"accelerate","optional":false},{"reason":"Required for LoRA/QLoRA training. Install separately or use trl[peft].","package":"peft","optional":true},{"reason":"Required in practice for dataset loading. Not installed automatically.","package":"datasets","optional":true}],"imports":[{"note":"tokenizer= parameter renamed to processing_class= in TRL 0.12. 
Training args like max_seq_length (now max_length) belong in SFTConfig and are no longer passed directly to SFTTrainer.","wrong":"trainer = SFTTrainer(\n    model=model,\n    train_dataset=dataset,\n    tokenizer=tokenizer,  # deprecated since 0.12, removed in future release\n    max_seq_length=512,   # moved to SFTConfig, not SFTTrainer directly\n)","symbol":"SFTTrainer","correct":"from datasets import load_dataset\nfrom transformers import AutoTokenizer\nfrom trl import SFTConfig, SFTTrainer\n\n# Load the tokenizer explicitly so processing_class has something to receive\ntokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen2.5-0.5B')\n\ntrainer = SFTTrainer(\n    model='Qwen/Qwen2.5-0.5B',\n    args=SFTConfig(output_dir='output', max_length=512),\n    train_dataset=load_dataset('trl-lib/Capybara', split='train'),\n    processing_class=tokenizer,  # not tokenizer=\n)"},{"note":"When using PEFT with DPOTrainer, do not pass ref_model. TRL automatically recovers reference behavior by disabling the adapter. Passing ref_model with PEFT wastes memory and may conflict.","wrong":"trainer = DPOTrainer(\n    model=model,\n    ref_model=ref_model,  # unnecessary when using PEFT: causes extra memory usage\n    args=training_args,\n    train_dataset=dataset,\n    tokenizer=tokenizer,  # deprecated\n)","symbol":"DPOTrainer","correct":"from trl import DPOConfig, DPOTrainer\n\ntrainer = DPOTrainer(\n    model=model,\n    args=DPOConfig(output_dir='output', beta=0.1),\n    train_dataset=dataset,\n    processing_class=tokenizer,\n    # With PEFT: no ref_model needed, the adapter is disabled to recover reference behavior\n)"}],"quickstart":{"code":"from datasets import load_dataset\nfrom trl import SFTConfig, SFTTrainer\n\n# SFT: minimal setup\ntrainer = SFTTrainer(\n    model='Qwen/Qwen2.5-0.5B',\n    args=SFTConfig(output_dir='sft_output', num_train_epochs=1),\n    train_dataset=load_dataset('trl-lib/Capybara', split='train'),\n)\ntrainer.train()\ntrainer.save_model()  # write final weights and tokenizer to sft_output so the DPO step can load them\n\n# DPO: after SFT\nfrom trl import DPOConfig, DPOTrainer\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\nmodel = AutoModelForCausalLM.from_pretrained('sft_output')\ntokenizer = AutoTokenizer.from_pretrained('sft_output')\n\ntrainer 
= DPOTrainer(\n    model=model,\n    args=DPOConfig(output_dir='dpo_output', beta=0.1),\n    train_dataset=load_dataset('trl-lib/ultrafeedback_binarized', split='train'),\n    processing_class=tokenizer,\n)\ntrainer.train()","lang":"python","description":"SFT then DPO pipeline. Use SFTConfig/DPOConfig for all training args."},"warnings":[{"fix":"Replace tokenizer=tokenizer with processing_class=tokenizer in all trainer constructors.","message":"tokenizer= parameter deprecated since TRL 0.12 and being removed. All trainers (SFTTrainer, DPOTrainer, etc.) now use processing_class= instead. Passing tokenizer= raises DeprecationWarning now, TypeError in future release.","severity":"breaking","affected_versions":">= 0.12"},{"fix":"Use SFTConfig(max_length=512, ...) and pass as args=SFTConfig(...) to SFTTrainer.","message":"Training args like max_seq_length, packing, dataset_text_field have moved from SFTTrainer constructor to SFTConfig. Passing them directly to SFTTrainer raises TypeError in recent versions.","severity":"breaking","affected_versions":">= 0.10"},{"fix":"Pin exact TRL version in requirements. Read the GitHub releases page before upgrading: https://github.com/huggingface/trl/releases","message":"TRL has extremely high API churn — major parameter renames, removals, and behavioral changes in almost every minor release. Code written for 0.8 likely fails on 0.15+. Pin versions in production.","severity":"breaking","affected_versions":"all"},{"fix":"With PEFT models, omit ref_model entirely. DPOTrainer handles reference behavior automatically via adapter disable/enable.","message":"DPOTrainer with PEFT does not keep a separate reference model in memory — it disables the adapter to recover reference behavior. 
Passing ref_model= with a PEFT model wastes memory and raises a warning about sync_ref_model incompatibility.","severity":"gotcha","affected_versions":"all"},{"fix":"Pass the reward function directly (reward_funcs=my_reward_fn) or wrap it in a list (reward_funcs=[my_reward_fn]). Use the list form when combining several reward functions.","message":"GRPOTrainer reward_funcs accepts either a single reward function or a list of them. A single callable is wrapped into a list internally, so passing one does not raise TypeError; the list form is only needed when combining multiple reward functions.","severity":"gotcha","affected_versions":"all"},{"fix":"For chat format datasets use a 'messages' column with OpenAI-style message dicts. For plain text use a 'text' column. Check dataset_text_field in SFTConfig if using a custom column name.","message":"SFTTrainer dataset format: conversational datasets (with a 'messages' column) are handled differently from text datasets (with a 'text' column). Mixing formats or using the wrong column name causes silent empty-loss training.","severity":"gotcha","affected_versions":"all"},{"fix":"Ensure your Python environment (version and OS distribution) has readily available pre-built `torch` wheels. Consider using a Python version officially supported by PyTorch (e.g., Python 3.10, 3.11) on a widely supported base OS (e.g., Debian/Ubuntu). If using Alpine, you may need to build `torch` from source or switch to a glibc-based image.","message":"TRL's core dependency, `torch`, often lacks pre-built wheels for newly released Python versions or for non-glibc environments (e.g., Alpine Linux, which uses musl). 
This results in `pip` installation failures because `torch` cannot be resolved.","severity":"breaking","affected_versions":"all"}],"env_vars":null,"last_verified":"2026-05-12T11:38:58.448Z","next_check":"2026-06-26T00:00:00.000Z","problems":[],"ecosystem":"pypi","meta_description":null,"install_score":0,"install_tag":"stale","quickstart_score":0,"quickstart_tag":"stale","pypi_latest":null,"install_checks":{"last_tested":"2026-05-12","tag":"stale","tag_description":"widespread failures or data too old to trust","results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"peft","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"quantization","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"vllm","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":17.16,"mem_mb":181.4,"disk_size":"5.0G"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"peft","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim 
(glibc)","variant":"quantization","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"vllm","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"peft","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"quantization","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"vllm","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":22.34,"mem_mb":204.3,"disk_size":"5.1G"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"peft","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":22.7,"mem_mb":208.7,"disk_size":"5.1G"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim 
(glibc)","variant":"quantization","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":21.59,"mem_mb":204.3,"disk_size":"5.3G"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"vllm","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"peft","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"quantization","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"vllm","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":24.18,"mem_mb":197.6,"disk_size":"5.1G"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"peft","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":25.37,"mem_mb":202,"disk_size":"5.1G"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim 
(glibc)","variant":"quantization","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":24.42,"mem_mb":197.6,"disk_size":"5.2G"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"vllm","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"peft","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"quantization","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"vllm","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"default","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":21.36,"mem_mb":201.8,"disk_size":"5.1G"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"peft","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":21.91,"mem_mb":206.2,"disk_size":"5.1G"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim 
(glibc)","variant":"quantization","exit_code":0,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":21.02,"mem_mb":201.8,"disk_size":"5.2G"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"vllm","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"peft","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"quantization","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"vllm","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"default","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"peft","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim 
(glibc)","variant":"quantization","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"vllm","exit_code":1,"wheel_type":null,"failure_reason":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null}]},"quickstart_checks":{"last_tested":"2026-04-23","tag":"stale","tag_description":"widespread failures or data too old to trust","results":[{"runtime":"python:3.10-alpine","exit_code":1},{"runtime":"python:3.10-slim","exit_code":-1},{"runtime":"python:3.11-alpine","exit_code":1},{"runtime":"python:3.11-slim","exit_code":-1},{"runtime":"python:3.12-alpine","exit_code":1},{"runtime":"python:3.12-slim","exit_code":-1},{"runtime":"python:3.13-alpine","exit_code":1},{"runtime":"python:3.13-slim","exit_code":-1},{"runtime":"python:3.9-alpine","exit_code":1},{"runtime":"python:3.9-slim","exit_code":-1}]}}