{"id":4315,"library":"unsloth","title":"Unsloth","description":"Unsloth is a library for training, reinforcement learning, and finetuning of large language models (LLMs) 2-5x faster on consumer GPUs, typically with substantially lower VRAM usage. As of version 2026.4.4, it continues to provide these optimizations for Hugging Face Transformers models. Releases use calendar-based version numbers (year.month.release, e.g. 2026.4.4) and ship frequently.","status":"active","version":"2026.4.4","language":"en","source_language":"en","source_url":"https://github.com/unslothai/unsloth","tags":["ML","LLM","finetuning","transformers","AI","GPU","quantization","PEFT","LoRA"],"install":[{"cmd":"pip install \"unsloth[cu121]\" --upgrade","lang":"bash","label":"For PyTorch built against CUDA 12.1"},{"cmd":"pip install \"unsloth[cu118]\" --upgrade","lang":"bash","label":"For PyTorch built against CUDA 11.8"},{"cmd":"pip install unsloth --upgrade","lang":"bash","label":"Base install (requires manual PyTorch/CUDA setup)"}],"dependencies":[{"reason":"Supported Python version range for unsloth.","package":"python","version":">=3.9, <3.15"},{"reason":"Core deep learning framework. The CUDA version of the installed PyTorch build should match the `[cuXXX]` extra used to install unsloth.","package":"torch","optional":false},{"reason":"Unsloth integrates deeply with Hugging Face Transformers for model loading and training.","package":"transformers","optional":false}],"imports":[{"symbol":"FastLanguageModel","correct":"from unsloth import FastLanguageModel"},{"note":"PatchModel is often imported alongside FastLanguageModel for convenience.","symbol":"PatchModel","correct":"from unsloth import FastLanguageModel, PatchModel"}],"quickstart":{"code":"from unsloth import FastLanguageModel\nimport torch\n\n# 1. Load a pre-trained model and tokenizer\nmax_seq_length = 2048 # Maximum sequence length used during training\ndtype = None # None for auto detection (bfloat16 where supported, float16 otherwise)\nload_in_4bit = True # Use 4-bit quantization to reduce VRAM usage\n\nmodel, tokenizer = FastLanguageModel.from_pretrained(\n    model_name = \"unsloth/mistral-7b-instruct-v0.2-bnb-4bit\", # or \"mistralai/Mistral-7B-Instruct-v0.2\"\n    max_seq_length = max_seq_length,\n    dtype = dtype,\n    load_in_4bit = load_in_4bit,\n)\n\n# 2. Prepare the model for training (add LoRA adapters)\nmodel = FastLanguageModel.get_peft_model(\n    model,\n    r = 16, # LoRA rank\n    target_modules = [\n        \"q_proj\", \"k_proj\", \"v_proj\", \"o_proj\",\n        \"gate_proj\", \"up_proj\", \"down_proj\",\n    ],\n    lora_alpha = 16, # LoRA scaling factor (updates are scaled by lora_alpha / r)\n    lora_dropout = 0.05, # Any value works; 0 is the optimized setting\n    bias = \"none\", # Any value works; \"none\" is the optimized setting\n    use_gradient_checkpointing = \"unsloth\", # True or \"unsloth\" (\"unsloth\" saves more VRAM on long contexts)\n    random_state = 3407,\n    use_rslora = False, # Rank-stabilized LoRA disabled\n    loftq_config = None, # No LoftQ initialization\n)\n\n# 3. Define a simple dataset (for demonstration)\n# In a real scenario, you would load your actual dataset here:\n# from datasets import Dataset\n# dataset = Dataset.from_dict({\"text\": [\"...\", \"...\"]})\n\nprint(\"Model and tokenizer loaded and prepared for finetuning!\")\nprint(f\"Using dtype: {model.dtype}, quantization config: {getattr(model.config, 'quantization_config', None)}\")\n# Next steps: format the dataset and train with a Hugging Face / TRL trainer (e.g. SFTTrainer)","lang":"python","description":"This quickstart loads a pre-trained LLM with 4-bit quantization using Unsloth's `FastLanguageModel.from_pretrained`, then attaches LoRA adapters for PEFT finetuning with `FastLanguageModel.get_peft_model`. The result is a model ready for memory-efficient training on a GPU."},"warnings":[{"fix":"Install Unsloth with the CUDA extra that matches the CUDA version of your PyTorch build (e.g., `pip install \"unsloth[cu121]\"` for CUDA 12.1). Check that version with `torch.version.cuda`.","message":"On NVIDIA GPUs, Unsloth performance depends on installing the `[cuXXX]` extra that matches your CUDA version (e.g., `[cu121]` for CUDA 12.1). A mismatch can cause runtime errors or significant performance degradation.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure your environment has a compatible NVIDIA GPU with enough VRAM for the chosen model, and that PyTorch can use it (`torch.cuda.is_available()` should return `True`).","message":"Unsloth is designed and optimized for GPU acceleration and is not a practical option for CPU-only finetuning; without a supported GPU it typically raises an error on import.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Check the Unsloth GitHub README or release notes for recommended `transformers` and `peft` versions. Update Unsloth together with its dependencies, or pin all of them to a known-working combination.","message":"Unsloth patches `transformers` and `peft` internals, so it depends on specific versions of those libraries. A new `transformers` or `peft` release can introduce breaking changes that require an Unsloth update.","severity":"gotcha","affected_versions":"All versions"},{"fix":"When upgrading, check the latest Unsloth documentation or GitHub README for current API signatures and recommended parameters. Pay attention to changes in `dtype`, `load_in_4bit`, `r`, `target_modules`, and `use_gradient_checkpointing`.","message":"The arguments and default values of `FastLanguageModel.from_pretrained` and `FastLanguageModel.get_peft_model` can change between Unsloth releases, especially those related to quantization, LoRA configuration, and gradient checkpointing.","severity":"breaking","affected_versions":"Not tied to a specific version; check release notes when upgrading"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}