PEFT
Hugging Face's Parameter-Efficient Fine-Tuning library: LoRA, QLoRA, LoHa, IA3, prompt tuning, and more. Current version: 0.18.1 (Jan 2026). Requires Python >=3.10. PEFT <0.18.0 is incompatible with Transformers v5.
Warnings
- breaking PEFT <0.18.0 is incompatible with Transformers v5. Using peft<0.18.0 with transformers>=5.0 will raise ImportError or cause silent incorrect behavior.
- breaking Python 3.9 support dropped in PEFT 0.18.0.
- breaking merge_and_unload() produces incorrect results (different outputs than unmerged peft_model) when the base model is quantized (bitsandbytes 4-bit/8-bit). This is a fundamental limitation — quantized weights cannot be cleanly merged.
- breaking prepare_model_for_kbit_training() must be called before get_peft_model() when using bitsandbytes quantization. Skipping it causes dtype mismatch errors during the backward pass.
- gotcha save_pretrained() on a PeftModel saves only the adapter weights (small, ~MBs), not the full model. This is correct behavior but surprises users expecting a complete loadable checkpoint.
- gotcha target_modules must match the actual layer names of your model architecture. q_proj/v_proj is correct for LLaMA but wrong for GPT-2 (which uses c_attn). Use model.named_modules() to inspect, or set target_modules='all-linear'.
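To see which names target_modules can match before configuring LoRA, inspecting named_modules() works on any torch model. A minimal sketch using a toy module (the q_proj/v_proj names here are illustrative; run the same loop on your real pretrained model):

```python
import torch.nn as nn

# Toy stand-in for an attention block, just to demonstrate the inspection.
model = nn.ModuleDict({
    'attn': nn.ModuleDict({
        'q_proj': nn.Linear(8, 8),
        'v_proj': nn.Linear(8, 8),
    }),
})

# Collect the leaf names of all Linear layers — these are the strings
# that LoraConfig(target_modules=...) matches against.
linear_names = sorted({
    name.split('.')[-1]
    for name, mod in model.named_modules()
    if isinstance(mod, nn.Linear)
})
print(linear_names)  # ['q_proj', 'v_proj']
```

If the printed names don't include the ones in your LoraConfig, get_peft_model will raise an error about missing target modules.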
Install
- pip install peft
- pip install peft bitsandbytes
Imports
- get_peft_model
from peft import LoraConfig, get_peft_model, TaskType

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    bias='none',
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
- merge_and_unload
# Only works on a non-quantized (full-precision) base model:
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
peft_model = PeftModel.from_pretrained(model, adapter_id)
merged = peft_model.merge_and_unload()
merged.save_pretrained('merged_model')
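The kbit warning above boils down to call order. A sketch of a QLoRA-style setup (assumes a CUDA machine with bitsandbytes installed and access to the checkpoint; the model id is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import (LoraConfig, TaskType, get_peft_model,
                  prepare_model_for_kbit_training)

model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-3.2-1B',
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map='auto',
)

# 1) Prepare FIRST: casts norms/lm_head to a trainable dtype and sets up
#    input gradients so the backward pass works through frozen 4-bit layers.
model = prepare_model_for_kbit_training(model)

# 2) THEN wrap with the adapter. Reversing these two steps causes the
#    dtype-mismatch errors described in the warnings above.
config = LoraConfig(r=16, lora_alpha=32, target_modules='all-linear',
                    task_type=TaskType.CAUSAL_LM)
model = get_peft_model(model, config)
```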
Quickstart
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType, prepare_model_for_kbit_training
import torch
# Load base model
model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-3.2-1B',
    torch_dtype=torch.bfloat16,
    device_map='auto'
)
# Configure LoRA
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules='all-linear',  # applies LoRA to every linear layer (QLoRA style)
    lora_dropout=0.05,
    bias='none',
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
# e.g.: trainable params: 6,815,744 || all params: 1,242,343,424 || trainable%: 0.55
# (exact counts depend on the model and the LoRA config)
# After training — save adapter only:
model.save_pretrained('lora_adapter/')
# Reload for inference:
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B', torch_dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(base, 'lora_adapter/')
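The trainable-parameter count printed by print_trainable_parameters() is simple arithmetic: each targeted weight matrix of shape (out, in) gains two low-rank factors, A of shape (r, in) and B of shape (out, r), i.e. r * (in + out) extra parameters. A quick sanity check for a single 2048x2048 projection at r=16 (dimensions are illustrative, not tied to any particular model):

```python
def lora_params(in_features, out_features, r):
    """Extra trainable parameters LoRA adds to one linear layer:
    A is (r, in_features), B is (out_features, r)."""
    return r * (in_features + out_features)

full = 2048 * 2048                      # parameters in the frozen base matrix
added = lora_params(2048, 2048, r=16)   # parameters LoRA trains instead
print(added)                            # 65536
print(f'{added / full:.2%}')            # 1.56%
```

Summing this over every targeted layer reproduces the printed total, which is why r and target_modules dominate adapter size.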