PEFT
Hugging Face's Parameter-Efficient Fine-Tuning library: LoRA, QLoRA, LoHa, IA3, prompt tuning, and more. Current version: 0.18.1 (Jan 2026). Requires Python >=3.10. PEFT <0.18.0 is incompatible with Transformers v5.
Warnings
- breaking PEFT <0.18.0 is incompatible with Transformers v5. Using peft<0.18.0 with transformers>=5.0 will raise ImportError or cause silent incorrect behavior.
- breaking Python 3.9 support dropped in PEFT 0.18.0.
- breaking merge_and_unload() produces incorrect results (different outputs than unmerged peft_model) when the base model is quantized (bitsandbytes 4-bit/8-bit). This is a fundamental limitation — quantized weights cannot be cleanly merged.
- breaking prepare_model_for_kbit_training() must be called before get_peft_model() when using bitsandbytes quantization. Skipping it causes dtype mismatch errors during the backward pass.
- gotcha save_pretrained() on a PeftModel saves only the adapter weights (small, ~MBs), not the full model. This is correct behavior but surprises users expecting a complete loadable checkpoint.
- gotcha target_modules must match the actual layer names of your model architecture. q_proj/v_proj is correct for LLaMA but wrong for GPT-2 (which uses c_attn). Use model.named_modules() to inspect, or set target_modules='all-linear'.
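To see which names target_modules can match before configuring LoRA, inspecting named_modules() works on any torch model. A minimal sketch using a toy module (the q_proj/v_proj names here are illustrative; run the same loop on your real pretrained model):

```python
import torch.nn as nn

# Toy stand-in for an attention block, just to demonstrate the inspection.
model = nn.ModuleDict({
    'attn': nn.ModuleDict({
        'q_proj': nn.Linear(8, 8),
        'v_proj': nn.Linear(8, 8),
    }),
})

# Collect the leaf names of all Linear layers — these are the strings
# that LoraConfig(target_modules=...) matches against.
linear_names = sorted({
    name.split('.')[-1]
    for name, mod in model.named_modules()
    if isinstance(mod, nn.Linear)
})
print(linear_names)  # ['q_proj', 'v_proj']
```

If the printed names don't include the ones in your LoraConfig, get_peft_model will raise an error about missing target modules.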
Install
- pip install peft
- pip install peft bitsandbytes
Imports
- get_peft_model
from peft import LoraConfig, get_peft_model, TaskType

config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=['q_proj', 'v_proj'],
    lora_dropout=0.05,
    bias='none',
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
- merge_and_unload
# Only works on a non-quantized (full-precision) base model:
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
peft_model = PeftModel.from_pretrained(model, adapter_id)
merged = peft_model.merge_and_unload()
merged.save_pretrained('merged_model')
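The kbit warning above boils down to call order. A sketch of a QLoRA-style setup (assumes a CUDA machine with bitsandbytes installed and access to the checkpoint; the model id is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import (LoraConfig, TaskType, get_peft_model,
                  prepare_model_for_kbit_training)

model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-3.2-1B',
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map='auto',
)

# 1) Prepare FIRST: casts norms/lm_head to a trainable dtype and sets up
#    input gradients so the backward pass works through frozen 4-bit layers.
model = prepare_model_for_kbit_training(model)

# 2) THEN wrap with the adapter. Reversing these two steps causes the
#    dtype-mismatch errors described in the warnings above.
config = LoraConfig(r=16, lora_alpha=32, target_modules='all-linear',
                    task_type=TaskType.CAUSAL_LM)
model = get_peft_model(model, config)
```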
Quickstart
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType, prepare_model_for_kbit_training
import torch
# Load base model
model = AutoModelForCausalLM.from_pretrained(
    'meta-llama/Llama-3.2-1B',
    torch_dtype=torch.bfloat16,
    device_map='auto'
)
# Configure LoRA
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules='all-linear',  # applies LoRA to every linear layer (QLoRA style)
    lora_dropout=0.05,
    bias='none',
    task_type=TaskType.CAUSAL_LM
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
# e.g.: trainable params: 6,815,744 || all params: 1,242,343,424 || trainable%: 0.55
# (exact counts depend on the model and the LoRA config)
# After training — save adapter only:
model.save_pretrained('lora_adapter/')
# Reload for inference:
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-3.2-1B', torch_dtype=torch.bfloat16)
peft_model = PeftModel.from_pretrained(base, 'lora_adapter/')
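The trainable-parameter count printed by print_trainable_parameters() is simple arithmetic: each targeted weight matrix of shape (out, in) gains two low-rank factors, A of shape (r, in) and B of shape (out, r), i.e. r * (in + out) extra parameters. A quick sanity check for a single 2048x2048 projection at r=16 (dimensions are illustrative, not tied to any particular model):

```python
def lora_params(in_features, out_features, r):
    """Extra trainable parameters LoRA adds to one linear layer:
    A is (r, in_features), B is (out_features, r)."""
    return r * (in_features + out_features)

full = 2048 * 2048                      # parameters in the frozen base matrix
added = lora_params(2048, 2048, r=16)   # parameters LoRA trains instead
print(added)                            # 65536
print(f'{added / full:.2%}')            # 1.56%
```

Summing this over every targeted layer reproduces the printed total, which is why r and target_modules dominate adapter size.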