PEFT
Hugging Face Parameter-Efficient Fine-Tuning library. LoRA, QLoRA, LoHa, IA3, prompt tuning and more. Current version is 0.18.1 (Jan 2026). Requires Python >=3.10. PEFT <0.18.0 is incompatible with Transformers v5.
pip install peft

Common errors
error ModuleNotFoundError: No module named 'peft' ↓
cause The 'peft' library is not installed in the Python environment you are currently using or is not accessible in the execution path.
fix
Install the 'peft' library using pip:
pip install peft

error TypeError: LoraConfig.__init__() got an unexpected keyword argument '<argument-name>' ↓
cause You are attempting to load a PEFT adapter configuration (e.g., LoraConfig) that was saved with a newer version of the 'peft' library than is currently installed, leading to unrecognized keyword arguments.
fix
Upgrade your 'peft' library to the latest version:
pip install -U peft

error ValueError: Target module <MODULE_NAME> is not supported. ↓
cause The PEFT method (e.g., LoRA) is trying to inject adapters into a module type within your base model that it does not explicitly support or recognize. This can happen with custom model architectures, specific quantization layers (like `QuantLinear`), or newer model types (like `Gemma4ClippableLinear`).
fix
Manually specify the target_modules in your PeftConfig (e.g., LoraConfig) to include only the supported linear layers or other modules you intend to fine-tune. You may need to inspect model.named_modules() to find the correct layer names. For example, LoraConfig(target_modules=["q_proj", "v_proj"]).

error ValueError: Attempting to unscale FP16 gradients. ↓
cause This error typically occurs when using automatic mixed precision (AMP) with `fp16=True` in the Trainer while the trainable PEFT weights are inadvertently left in `torch.float16`; the gradient scaler refuses to unscale FP16 gradients.
fix
Explicitly cast the trainable PEFT parameters to torch.float32 after initializing the PEFT model. While PEFT versions 0.12.0+ generally handle this, specific setups might still require it:
from peft import get_peft_model
# ... (model and peft_config setup)
peft_model = get_peft_model(model, peft_config)
# Ensure trainable parameters are in float32
for param in peft_model.parameters():
    if param.requires_grad:
        param.data = param.data.float()
# Then proceed with Trainer(model=peft_model, args=TrainingArguments(fp16=True, ...), ...)

Warnings
breaking PEFT <0.18.0 is incompatible with Transformers v5. Using peft<0.18.0 with transformers>=5.0 will raise ImportError or cause silent incorrect behavior. ↓
fix Upgrade to peft>=0.18.0 before upgrading to Transformers v5.
breaking Python 3.9 support dropped in PEFT 0.18.0. ↓
fix Pin peft<0.18.0 for Python 3.9 environments, or upgrade Python to 3.10+.
breaking merge_and_unload() produces incorrect results (different outputs than unmerged peft_model) when the base model is quantized (bitsandbytes 4-bit/8-bit). This is a fundamental limitation — quantized weights cannot be cleanly merged. ↓
fix To merge and save a full-precision model: reload the base model without quantization (torch_dtype=torch.float16), then load the adapter and merge. Only quantize after merging if needed.
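A minimal sketch of that merge procedure, assuming base_id and adapter_id point to your base checkpoint and saved adapter:
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)  # no quantization here
peft_model = PeftModel.from_pretrained(base, adapter_id)
merged = peft_model.merge_and_unload()   # folds the LoRA deltas into the base weights
merged.save_pretrained('merged_model/')  # standalone full-precision checkpoint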
breaking prepare_model_for_kbit_training() must be called before get_peft_model() when using bitsandbytes quantization. Skipping it causes dtype mismatch errors during the backward pass. ↓
fix Pattern: model = prepare_model_for_kbit_training(model) then model = get_peft_model(model, config). Enable gradient checkpointing first: model.gradient_checkpointing_enable().
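A sketch of the required ordering for a 4-bit base model; the checkpoint name matches the quickstart below and the LoRA hyperparameters are placeholders:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training
import torch
model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-3.2-1B',
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
    device_map='auto'
)
model.gradient_checkpointing_enable()           # enable checkpointing first
model = prepare_model_for_kbit_training(model)  # must come before get_peft_model
config = LoraConfig(r=16, lora_alpha=32, target_modules='all-linear', task_type=TaskType.CAUSAL_LM)
model = get_peft_model(model, config)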
gotcha save_pretrained() on a PeftModel saves only the adapter weights (small, ~MBs), not the full model. This is correct behavior but surprises users expecting a complete loadable checkpoint. ↓
fix To load: use PeftModel.from_pretrained(base_model, adapter_path). The base model must be loaded separately. To get a standalone model: use merge_and_unload() on a non-quantized base, then save_pretrained().
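A minimal sketch of the save/reload round trip, assuming peft_model is an existing PeftModel and base_id is the base checkpoint id:
from transformers import AutoModelForCausalLM
from peft import PeftModel
peft_model.save_pretrained('lora_adapter/')           # writes only adapter_config.json + adapter weights (~MBs)
base = AutoModelForCausalLM.from_pretrained(base_id)  # the base model is loaded separately
restored = PeftModel.from_pretrained(base, 'lora_adapter/')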
gotcha target_modules must match the actual layer names of your model architecture. q_proj/v_proj is correct for LLaMA but wrong for GPT-2 (which uses c_attn). Use model.named_modules() to inspect, or set target_modules='all-linear'. ↓
fix Use target_modules='all-linear' to safely target all linear layers regardless of architecture name. Or inspect: {name for name, mod in model.named_modules() if isinstance(mod, torch.nn.Linear)}.
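A short sketch of that inspection step (assumes model is already loaded; the names shown vary by architecture):
import torch
from peft import LoraConfig
linear_names = {name.split('.')[-1]
                for name, mod in model.named_modules()
                if isinstance(mod, torch.nn.Linear)}
print(linear_names)  # e.g. {'q_proj', 'k_proj', 'v_proj', 'o_proj', ...} on LLaMA-style models
config = LoraConfig(target_modules=sorted(linear_names))  # or simply target_modules='all-linear'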
breaking pip cannot find a compatible `torch` distribution for the current environment. This typically occurs on musl-based images such as Alpine Linux, or on Python versions too new for `torch` to have published pre-built wheels, and it makes `peft` installation fail through its `torch` dependency. ↓
fix Use a Python version and base image for which `torch` publishes pre-built wheels, e.g., Python 3.10-3.13 on a Debian/Ubuntu (glibc) slim image; see the compatibility table below. Building `torch` from source on musl is possible but complex.
Install
pip install peft bitsandbytes

Install compatibility (stale; last tested 2026-05-12)
python os / libc status wheel install import disk
3.10 alpine (musl) - - - -
3.10 alpine (musl) - - - -
3.10 slim (glibc) - - 13.83s 4.8G
3.10 slim (glibc) - - 13.92s 4.9G
3.11 alpine (musl) - - - -
3.11 alpine (musl) - - - -
3.11 slim (glibc) - - 18.99s 4.8G
3.11 slim (glibc) - - 18.89s 5.0G
3.12 alpine (musl) - - - -
3.12 alpine (musl) - - - -
3.12 slim (glibc) - - 21.01s 4.8G
3.12 slim (glibc) - - 20.55s 5.0G
3.13 alpine (musl) - - - -
3.13 alpine (musl) - - - -
3.13 slim (glibc) - - 16.12s 4.8G
3.13 slim (glibc) - - 16.40s 5.0G
3.9 alpine (musl) - - - -
3.9 alpine (musl) - - - -
3.9 slim (glibc) - - - -
3.9 slim (glibc) - - - -
Imports
- get_peft_model

wrong
# Applying LoRA to quantized model without prepare step:
from peft import get_peft_model
model = AutoModelForCausalLM.from_pretrained(name, load_in_4bit=True)
model = get_peft_model(model, config)  # missing prepare_model_for_kbit_training

correct
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType
model = prepare_model_for_kbit_training(model)  # required first when the base model is quantized
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    bias='none',
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

- merge_and_unload

wrong
# merge_and_unload on quantized model produces wrong results:
model = AutoModelForCausalLM.from_pretrained(base_id, load_in_4bit=True)
peft_model = PeftModel.from_pretrained(model, adapter_id)
merged = peft_model.merge_and_unload()  # outputs differ from unmerged peft_model

correct
# Only works on a non-quantized (full-precision) base model:
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
peft_model = PeftModel.from_pretrained(model, adapter_id)
merged = peft_model.merge_and_unload()
merged.save_pretrained('merged_model')
Quickstart (stale; last tested 2026-04-23)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType, prepare_model_for_kbit_training
import torch
# Load base model
model = AutoModelForCausalLM.from_pretrained(
'meta-llama/Llama-3.2-1B',
torch_dtype=torch.bfloat16,
device_map='auto'
)
# Configure LoRA
config = LoraConfig(
r=16,
lora_alpha=32,
target_modules='all-linear', # applies to all linear layers (QLoRA style)
lora_dropout=0.05,
bias='none',
task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
# trainable params: 6,815,744 || all params: 1,242,343,424 || trainable%: 0.55
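# (Sketch) Train with the transformers Trainer; 'train_ds' is a hypothetical
# pre-tokenized dataset and the hyperparameters below are placeholders.
from transformers import Trainer, TrainingArguments, DataCollatorForLanguageModeling
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-1B')
tokenizer.pad_token = tokenizer.eos_token
args = TrainingArguments(
    output_dir='lora_out/',
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=10
)
trainer = Trainer(
    model=model,                # the PeftModel returned by get_peft_model above
    args=args,
    train_dataset=train_ds,     # hypothetical tokenized dataset
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
trainer.train()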
# After training — save adapter only:
model.save_pretrained('lora_adapter/')
# Reload for inference:
base = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B', torch_dtype=torch.bfloat16)
from peft import PeftModel
peft_model = PeftModel.from_pretrained(base, 'lora_adapter/')