PEFT

0.18.1 · verified Tue May 12 · auth: no · python install: stale · quickstart: stale

Hugging Face's Parameter-Efficient Fine-Tuning library: LoRA, QLoRA, LoHa, IA3, prompt tuning, and more. Current version is 0.18.1 (Jan 2026). Requires Python >=3.10. PEFT <0.18.0 is incompatible with Transformers v5.

pip install peft
error ModuleNotFoundError: No module named 'peft'
cause The peft package is not installed in the Python environment that is actually running your code, typically because pip installed it into a different interpreter (e.g., outside the active virtualenv or conda env).
fix
Install the 'peft' library using pip: pip install peft
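If the error persists after installing, pip likely targeted a different interpreter than the one running your code. A quick check (shell commands; adjust `python` to whatever launches your script):

python -c "import sys; print(sys.executable)"  # which interpreter runs your code
python -m pip install peft                     # install into that same interpreter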
error TypeError: LoraConfig.__init__() got an unexpected keyword argument '<argument-name>'
cause You are attempting to load a PEFT adapter configuration (e.g., LoraConfig) that was saved with a newer version of the 'peft' library than is currently installed, leading to unrecognized keyword arguments.
fix
Upgrade your 'peft' library to the latest version: pip install -U peft
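To confirm the mismatch, compare the installed version against the saved adapter config; unrecognized keys in adapter_config.json come from a newer release (the lora_adapter/ path is a placeholder):

python -c "import peft; print(peft.__version__)"
cat lora_adapter/adapter_config.json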
error ValueError: Target module <MODULE_NAME> is not supported.
cause The PEFT method (e.g., LoRA) is trying to inject adapters into a module type within your base model that it does not explicitly support or recognize. This can happen with custom model architectures, specific quantization layers (like `QuantLinear`), or newer model types (like `Gemma4ClippableLinear`).
fix
Manually specify the target_modules in your PeftConfig (e.g., LoraConfig) to include only the supported linear layers or other modules you intend to fine-tune. You may need to inspect model.named_modules() to find the correct layer names. For example, LoraConfig(target_modules=["q_proj", "v_proj"]).
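A minimal sketch of that inspection, assuming `model` is an already-loaded base model; the printed leaf names feed directly into target_modules:

import torch.nn as nn
from peft import LoraConfig

# Collect the leaf names of every nn.Linear module in the model
linear_names = {name.split('.')[-1] for name, mod in model.named_modules()
                if isinstance(mod, nn.Linear)}
print(linear_names)  # e.g. {'q_proj', 'k_proj', 'v_proj', 'o_proj', ...} on LLaMA-style models

config = LoraConfig(target_modules=['q_proj', 'v_proj'])  # pick from the printed names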
error ValueError: Attempting to unscale FP16 gradients.
cause This error typically occurs when using automatic mixed precision (AMP) with `fp16=True` in the Trainer, but trainable PEFT weights are inadvertently left in `torch.float16` or `torch.bfloat16`, which conflicts with the gradient scaling process.
fix
Explicitly cast the trainable PEFT parameters to torch.float32 after initializing the PEFT model. While PEFT versions 0.12.0+ generally handle this, specific setups might still require it:
from peft import get_peft_model

# ... (model and peft_config setup)
peft_model = get_peft_model(model, peft_config)

# Ensure trainable parameters are in float32
for param in peft_model.parameters():
    if param.requires_grad:
        param.data = param.data.float()

# Then proceed with Trainer(model=peft_model, fp16=True, ...)
breaking PEFT <0.18.0 is incompatible with Transformers v5. Using peft<0.18.0 with transformers>=5.0 will raise ImportError or cause silent incorrect behavior.
fix Upgrade to peft>=0.18.0 before upgrading to Transformers v5.
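One way to land on a consistent pair in a single step, expressed as pip version specifiers:

pip install -U "peft>=0.18.0" "transformers>=5.0"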
breaking Python 3.9 support dropped in PEFT 0.18.0.
fix Pin peft<0.18.0 for Python 3.9 environments, or upgrade Python to 3.10+.
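For an environment pinned to Python 3.9:

pip install "peft<0.18.0"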
breaking merge_and_unload() produces incorrect results (different outputs than unmerged peft_model) when the base model is quantized (bitsandbytes 4-bit/8-bit). This is a fundamental limitation — quantized weights cannot be cleanly merged.
fix To merge and save a full-precision model: reload the base model without quantization (torch_dtype=torch.float16), then load the adapter and merge. Only quantize after merging if needed.
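A minimal sketch of the reload-then-merge pattern, reusing the model and adapter paths from the quickstart below (substitute your own):

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model WITHOUT a quantization_config
base = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-3.2-1B',
    torch_dtype=torch.float16,
)
merged = PeftModel.from_pretrained(base, 'lora_adapter/').merge_and_unload()
merged.save_pretrained('merged_model/')  # standalone, full-precision checkpoint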
breaking prepare_model_for_kbit_training() must be called before get_peft_model() when using bitsandbytes quantization. Skipping it causes dtype mismatch errors during the backward pass.
fix Pattern: model = prepare_model_for_kbit_training(model) then model = get_peft_model(model, config). Enable gradient checkpointing first: model.gradient_checkpointing_enable().
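A minimal sketch of that ordering with a bitsandbytes 4-bit load (model name reused from the quickstart):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-3.2-1B',
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map='auto',
)
model.gradient_checkpointing_enable()           # 1. gradient checkpointing first
model = prepare_model_for_kbit_training(model)  # 2. cast norms, enable input grads
model = get_peft_model(model, LoraConfig(task_type='CAUSAL_LM'))  # 3. inject adapters last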
gotcha save_pretrained() on a PeftModel saves only the adapter weights (small, ~MBs), not the full model. This is correct behavior but surprises users expecting a complete loadable checkpoint.
fix To load: use PeftModel.from_pretrained(base_model, adapter_path). The base model must be loaded separately. To get a standalone model: use merge_and_unload() on a non-quantized base, then save_pretrained().
gotcha target_modules must match the actual layer names of your model architecture. q_proj/v_proj is correct for LLaMA but wrong for GPT-2 (which uses c_attn). Use model.named_modules() to inspect, or set target_modules='all-linear'.
fix Use target_modules='all-linear' to safely target all linear layers regardless of architecture name. Or inspect: {name for name, mod in model.named_modules() if isinstance(mod, torch.nn.Linear)}.
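For illustration, the same intent spelled out per architecture (layer names as documented above; only 'all-linear' is architecture-agnostic):

from peft import LoraConfig

llama_config = LoraConfig(target_modules=['q_proj', 'v_proj'])  # LLaMA-family attention projections
gpt2_config = LoraConfig(target_modules=['c_attn'])             # GPT-2's fused QKV projection
generic_config = LoraConfig(target_modules='all-linear')        # every linear layer, any model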
breaking pip cannot find a compatible `torch` distribution for the current environment, so installing `peft` fails through its `torch` dependency. This typically occurs on musl-based images such as Alpine Linux, for which `torch` publishes no pre-built wheels (see the matrix below).
fix Use a Python version and base image for which `torch` pre-built wheels exist, e.g., Python 3.10-3.13 on Debian/Ubuntu-based (glibc) images. Note that PEFT 0.18+ itself requires Python >=3.10. Building `torch` from source is an alternative, but complex.
pip install peft bitsandbytes
python  os / libc      status  wheel  install  import  disk
3.9     alpine (musl)  -       -      -        -       -
3.9     alpine (musl)  -       -      -        -       -
3.9     slim (glibc)   -       -      -        -       -
3.9     slim (glibc)   -       -      -        -       -
3.10    alpine (musl)  -       -      -        -       -
3.10    alpine (musl)  -       -      -        -       -
3.10    slim (glibc)   -       -      13.83s   -       4.8G
3.10    slim (glibc)   -       -      13.92s   -       4.9G
3.11    alpine (musl)  -       -      -        -       -
3.11    alpine (musl)  -       -      -        -       -
3.11    slim (glibc)   -       -      18.99s   -       4.8G
3.11    slim (glibc)   -       -      18.89s   -       5.0G
3.12    alpine (musl)  -       -      -        -       -
3.12    alpine (musl)  -       -      -        -       -
3.12    slim (glibc)   -       -      21.01s   -       4.8G
3.12    slim (glibc)   -       -      20.55s   -       5.0G
3.13    alpine (musl)  -       -      -        -       -
3.13    alpine (musl)  -       -      -        -       -
3.13    slim (glibc)   -       -      16.12s   -       4.8G
3.13    slim (glibc)   -       -      16.40s   -       5.0G
(- = failed or not measured; two rows per configuration = two runs)

LoRA fine-tuning on all linear layers. Save adapter only — not the full model.

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, PeftModel, TaskType, get_peft_model
import torch

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-3.2-1B',
    torch_dtype=torch.bfloat16,
    device_map='auto'
)

# Configure LoRA
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules='all-linear',  # applies to all linear layers (QLoRA style)
    lora_dropout=0.05,
    bias='none',
    task_type=TaskType.CAUSAL_LM
)

model = get_peft_model(model, config)
model.print_trainable_parameters()
# trainable params: 6,815,744 || all params: 1,242,343,424 || trainable%: 0.55

# After training — save adapter only:
model.save_pretrained('lora_adapter/')

# Reload for inference:
base = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B', torch_dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(base, 'lora_adapter/')