Unsloth

2026.4.4 · active · verified Sat Apr 11

Unsloth is a library that speeds up training, reinforcement learning, and finetuning of large language models (LLMs) by roughly 2-5X on consumer GPUs, often while reducing VRAM usage. As of version 2026.4.4 it provides drop-in optimizations for Hugging Face Transformers models. Releases follow a calendar-based versioning scheme (year.month.patch), reflecting frequent updates.

Warnings

Install
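Unsloth is installed from PyPI; a CUDA-capable NVIDIA GPU is required at runtime. The exact extras depend on your CUDA and PyTorch versions, so treat the commands below as the common path and check the project's install matrix for your setup:

```shell
# Base install from PyPI (pulls in torch, transformers, peft, trl, bitsandbytes)
pip install unsloth

# To pick up the latest fixes without touching pinned dependencies:
pip install --upgrade --no-deps unsloth unsloth_zoo
```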

Imports

Quickstart

This quickstart demonstrates how to load a pre-trained LLM using Unsloth's `FastLanguageModel.from_pretrained` with 4-bit quantization and then prepare it for PEFT (LoRA) finetuning using `get_peft_model`. It sets up the model for efficient training on a GPU.

from unsloth import FastLanguageModel
import torch

# 1. Load a pre-trained model and tokenizer
max_seq_length = 2048 # Max sequence length for your model
dtype = None # None for auto detection (bfloat16 preferred, float16 as fallback)
load_in_4bit = True # Use 4-bit quantization

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-instruct-v0.2-bnb-4bit", # or "mistralai/Mistral-7B-Instruct-v0.2"
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

# 2. Prepare the model for training (add LoRA adapters)
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # LoRA attention dimension
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha = 16, # Alpha for LoRA scaling
    lora_dropout = 0.05, # Dropout for LoRA layers
    bias = "none", # "none" trains no bias terms; this is the optimized setting
    use_gradient_checkpointing = "unsloth", # Unsloth's offloaded checkpointing; True also works
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

# 3. Define a simple dataset (for demonstration)
# In a real scenario, you would load your actual dataset here
# from datasets import Dataset
# dataset = Dataset.from_dict({"text": ["...", "..."]})

print("Model and tokenizer loaded and prepared for finetuning!")
print(f"Using dtype: {model.dtype}, 4-bit quantization: {load_in_4bit}")
# Further steps: prepare a dataset and train with TRL's SFTTrainer
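The data-preparation step mentioned above might look like the sketch below: each raw example is mapped into Mistral's `[INST] ... [/INST]` instruct format so the `"text"` field can be fed to an SFT trainer. The `instruction`/`response` field names are hypothetical; adapt them to your dataset's schema.

```python
# Hypothetical data-preparation step: map raw instruction/response pairs
# into Mistral's instruct chat format ahead of SFT training.
def format_example(example: dict) -> dict:
    # "instruction" and "response" are assumed field names in the raw data
    text = f"<s>[INST] {example['instruction']} [/INST] {example['response']}</s>"
    return {"text": text}

sample = {"instruction": "Say hi.", "response": "Hi!"}
print(format_example(sample)["text"])
# → <s>[INST] Say hi. [/INST] Hi!</s>
```

With a Hugging Face `datasets.Dataset`, this function would typically be applied via `dataset.map(format_example)` before being passed to the trainer.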
