PyTorch Native Library for LLM Fine-tuning
torchtune is a PyTorch-native library designed for authoring, fine-tuning, and experimenting with Large Language Models (LLMs). It provides hackable training recipes for techniques like SFT, LoRA, QLoRA, FSDP, DPO, PPO, and QAT, supporting popular architectures such as Llama, Gemma, Mistral, Phi, and Qwen. While it offers a componentized design, memory efficiency, and strong integrations, active feature development for torchtune officially ceased in July 2025. The library will receive critical bug fixes and security patches through 2025, but no new features will be added, as the PyTorch team is developing a new product.
Warnings
- breaking Active feature development for torchtune officially ceased in July 2025. The library will receive critical bug fixes and security patches through 2025, but no new features will be added. Plan migrations accordingly and watch for announcements from the PyTorch team about its successor product.
- gotcha Memory issues, particularly 'CUDA out of memory', are common when fine-tuning large LLMs. torchtune offers various memory optimizations, but careful configuration is required.
- gotcha Downloading gated models from Hugging Face Hub (e.g., Llama models) requires a valid Hugging Face authentication token (`HF_TOKEN`). Without it, download commands will fail.
- gotcha Documentation for custom datasets and generation, especially via YAML configuration, is sparse; expect some trial and error or reading the library's source code to get non-default setups working.
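The memory optimizations mentioned above can be passed as command-line overrides to `tune run`. A minimal sketch of assembling such a command; the override names (`batch_size`, `gradient_accumulation_steps`, `enable_activation_checkpointing`) are assumptions based on typical torchtune recipe configs, so verify them against the config you copied:

```python
# Sketch: build a `tune run` command with common memory-saving overrides.
# Override keys are assumed from typical torchtune recipe configs -- check
# your copied YAML for the exact names it exposes.
def build_tune_cmd(config_path, overrides):
    cmd = ["tune", "run", "lora_finetune_single_device", "--config", config_path]
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd

cmd = build_tune_cmd("./my_llama2_lora_config.yaml", {
    "batch_size": 1,                        # smaller batches lower peak memory
    "gradient_accumulation_steps": 8,       # recover effective batch size
    "enable_activation_checkpointing": True # trade compute for memory
})
print(" ".join(cmd))
```

Pass the resulting list to `subprocess.run(cmd)` or join it for a shell.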
Install
- pip install torch torchvision torchao
- pip install torchtune
Imports
- lora_llama2_7b
from torchtune.models.llama2 import lora_llama2_7b
- Message
from torchtune.data import Message
- InputOutputToMessages
from torchtune.data import InputOutputToMessages
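In practice, `InputOutputToMessages` is usually wired in through the dataset section of a recipe config rather than imported directly; the `instruct_dataset` builder applies this transform internally. A hedged sketch of such a section, where the `source`, `data_files`, and `column_map` fields are assumptions to check against your torchtune version's dataset docs:

```yaml
# Hypothetical dataset section for a copied recipe config.
dataset:
  _component_: torchtune.datasets.instruct_dataset
  source: json
  data_files: ./my_data.json   # assumed local JSON file with input/output pairs
  column_map:
    input: question            # map your column names to the transform's
    output: answer             # expected "input"/"output" keys
```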
- FullModelHFCheckpointer
from torchtune.training import FullModelHFCheckpointer
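Recipe configs reference the checkpointer by its component path rather than importing it. A sketch of a typical section; the `checkpoint_files` entries are placeholders, so list the actual shard files present in your download directory:

```yaml
# Hypothetical checkpointer section; shard filenames below are placeholders.
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Llama-2-7b-hf
  checkpoint_files:
    - model-00001-of-00002.safetensors
    - model-00002-of-00002.safetensors
  output_dir: /tmp/torchtune_output
  model_type: LLAMA2
```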
Quickstart
# Ensure you have a Hugging Face token (HF_TOKEN) set as an environment variable or pass it directly.
# export HF_TOKEN="hf_YOUR_TOKEN"
import os
# 1. Download Llama2 7B model weights and tokenizer
# You need access to Meta Llama models on Hugging Face.
# This command will download files to /tmp/Llama-2-7b-hf
print("Downloading model...")
hf_token = os.environ.get("HF_TOKEN")
if not hf_token:
    raise SystemExit("HF_TOKEN is not set; downloading gated Llama models will fail.")
os.system(
    "tune download meta-llama/Llama-2-7b-hf --output-dir /tmp/Llama-2-7b-hf "
    f'--ignore-patterns "*.bin" --hf-token {hf_token}'
)
# 2. Copy a LoRA fine-tuning config for single device
print("Copying config...")
os.system("tune cp llama2/7B_lora_single_device ./my_llama2_lora_config.yaml")
# (Optional) Modify my_llama2_lora_config.yaml if needed, e.g., adjust batch_size,
# dataset path, or output_dir. Ensure tokenizer path points to the downloaded model.
# Example: Update output_dir to a persistent location and ensure tokenizer path is correct.
# 3. Run the LoRA fine-tuning recipe
print("Running fine-tuning...")
os.system(
"tune run lora_finetune_single_device --config ./my_llama2_lora_config.yaml "
"checkpointer.checkpoint_dir=/tmp/Llama-2-7b-hf "
"tokenizer.path=/tmp/Llama-2-7b-hf/tokenizer.model "
"output_dir=/tmp/torchtune_output"
)
print("Fine-tuning command executed. Check /tmp/torchtune_output for results.")
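To sanity-check that the run produced checkpoints, you can list the output directory with a small stdlib helper. The path matches the `output_dir` override used above; the exact filenames written depend on the recipe and checkpointer:

```python
import os

def list_outputs(out_dir):
    # Return sorted filenames in out_dir, or an empty list if the run
    # produced nothing (or the path does not exist).
    if not os.path.isdir(out_dir):
        return []
    return sorted(os.listdir(out_dir))

print(list_outputs("/tmp/torchtune_output"))
```

An empty list means the recipe did not get far enough to write a checkpoint; check the training logs for errors.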