{"id":4976,"library":"liger-kernel","title":"Liger Kernel","description":"Liger Kernel (LinkedIn GPU Efficient Runtime Kernel) is an open-source Python library offering a collection of highly optimized Triton kernels for Large Language Model (LLM) training. It can increase multi-GPU training throughput by up to 20% and reduce memory usage by 60-80%, enabling longer context lengths and larger batch sizes. The library is actively developed; its current version is 0.7.0, and frequent releases introduce support for new models and algorithms.","status":"active","version":"0.7.0","language":"en","source_language":"en","source_url":"https://github.com/linkedin/Liger-Kernel","tags":["LLM","training","Triton","kernels","optimization","HuggingFace","PyTorch","GPU","memory efficiency"],"install":[{"cmd":"pip install liger-kernel","lang":"bash","label":"Stable Release"},{"cmd":"pip install liger-kernel-nightly","lang":"bash","label":"Nightly Build"}],"dependencies":[{"reason":"Core dependency for PyTorch integration and GPU operations.","package":"torch","optional":false},{"reason":"Required for patching Hugging Face models, a common use case.","package":"transformers","optional":true},{"reason":"Underlying framework for kernel implementation.","package":"triton","optional":false}],"imports":[{"note":"Simplest way to automatically patch supported Hugging Face Causal LMs.","symbol":"AutoLigerKernelForCausalLM","correct":"from liger_kernel.transformers import AutoLigerKernelForCausalLM"},{"note":"For model-specific patching APIs (e.g., Llama, Gemma, etc.).","symbol":"apply_liger_kernel_to_llama","correct":"from liger_kernel.transformers import apply_liger_kernel_to_llama"},{"note":"For composing custom models with the optimized fused linear cross-entropy loss.","symbol":"LigerFusedLinearCrossEntropyLoss","correct":"from liger_kernel.transformers import LigerFusedLinearCrossEntropyLoss"}],"quickstart":{"code":"import torch\nfrom transformers import 
AutoTokenizer\nfrom liger_kernel.transformers import AutoLigerKernelForCausalLM\n\n# Liger-Kernel requires a GPU\nif not torch.cuda.is_available():\n    raise SystemExit(\"CUDA not available. Liger-Kernel requires a GPU.\")\n\n# 1. Load the tokenizer\nmodel_name = \"PY007/TinyLlama-1.1B-Chat-v0.1\"\ntokenizer = AutoTokenizer.from_pretrained(model_name)\n\n# 2. Load the model through AutoLigerKernelForCausalLM.\n# This monkey-patches compatible layers with optimized Liger kernels\n# during loading, so use the returned model directly.\nmodel = AutoLigerKernelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).cuda()\n\nprint(f\"Model type after Liger Kernel patching: {type(model)}\")\n\n# Example usage (inference/forward pass; for training, integrate into your training loop)\ninput_text = \"Hello, my name is\"\ninputs = tokenizer(input_text, return_tensors=\"pt\").to(\"cuda\")\n\nwith torch.no_grad():\n    outputs = model(**inputs)\n    logits = outputs.logits\n\nprint(\"Model successfully patched and executed.\")\n# For full training integration, use a Hugging Face Trainer or TRL trainer\n# and set use_liger_kernel=True in your training arguments.","lang":"python","description":"The simplest way to integrate Liger-Kernel is to load a supported Hugging Face Causal LM through `AutoLigerKernelForCausalLM`, which automatically patches compatible layers during loading. For training, it also integrates with `transformers.Trainer` and TRL trainers via the `use_liger_kernel=True` training argument."},"warnings":[{"fix":"Ensure you have a supported GPU and a PyTorch installation (`torch >= 2.1.2`) that is compatible with Triton. Verify that `torch.cuda.is_available()` returns True.","message":"Liger-Kernel fundamentally relies on GPU hardware (NVIDIA, AMD, or Intel) and the Triton framework for its performance optimizations. 
It will not provide benefits on CPU-only setups and requires a compatible PyTorch and Triton installation.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Thoroughly benchmark your training pipeline with and without `torch.compile` when Liger-Kernel is enabled. If slowdowns occur, consider running without `torch.compile` or investigating potential incompatibilities with the specific model architecture.","message":"While Liger-Kernel generally integrates well with `torch.compile`, there have been specific reports where using both together for certain models (e.g., Orpheus-TTS) led to significantly slower training, despite memory reductions. Benchmark your specific workload.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Understand that advertised performance benefits are scenario-dependent. While Liger-Kernel generally improves efficiency, conduct your own benchmarks to confirm improvements for your specific use case.","message":"Optimal performance gains (e.g., 20% throughput increase, 60% memory reduction) are typically observed under specific benchmark conditions, such as training LLaMA 3-8B with `bf16` precision, `AdamW` optimizer, `FSDP1` on multiple A100 GPUs, and large sequence lengths/batch sizes. Results may vary for different models, hardware, or training configurations.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always check Liger-Kernel release notes for `transformers` compatibility. For `transformers` versions below 4.52.0, you might need an older Liger-Kernel release or face unexpected behavior. Upgrade to Liger-Kernel 0.7.0 or newer for full Transformers v5 compatibility.","message":"When upgrading `transformers` library, especially around major version changes or specific model refactorings, ensure Liger-Kernel has corresponding support. 
Version 0.7.0 explicitly added full support for Transformers v5 and all versions >= 4.52.0.","severity":"gotcha","affected_versions":"<0.7.0 with Transformers >= v5, or any version with Transformers < 4.52.0"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}