Liger Kernel
Liger Kernel (LinkedIn GPU Efficient Runtime Kernel) is an open-source Python library providing a collection of highly optimized Triton kernels for Large Language Model (LLM) training. In published benchmarks it increases multi-GPU training throughput by up to 20% and reduces memory usage by up to 60-80%, enabling longer context lengths and larger batch sizes. The library is actively developed, with its current version being 0.7.0, and frequent releases add support for new models and algorithms.
Warnings
- gotcha Liger-Kernel fundamentally relies on GPU hardware (NVIDIA, AMD, or Intel) and the Triton framework for its performance optimizations. It will not provide benefits on CPU-only setups and requires a compatible PyTorch and Triton installation.
- gotcha While Liger-Kernel generally integrates well with `torch.compile`, there have been specific reports where using both together for certain models (e.g., Orpheus-TTS) led to significantly slower training, despite memory reductions. Benchmark your specific workload.
- gotcha Optimal performance gains (e.g., 20% throughput increase, 60% memory reduction) are typically observed under specific benchmark conditions, such as training LLaMA 3-8B with `bf16` precision, `AdamW` optimizer, `FSDP1` on multiple A100 GPUs, and large sequence lengths/batch sizes. Results may vary for different models, hardware, or training configurations.
- gotcha When upgrading the `transformers` library, especially across major version changes or model refactorings, ensure Liger-Kernel has corresponding support. Version 0.7.0 explicitly added full support for Transformers v5 and all versions >= 4.52.0.
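Several of the warnings above boil down to "benchmark your specific workload." A minimal, library-agnostic sketch of such a comparison is below; the helper name and parameters are hypothetical, not part of Liger-Kernel, and on CUDA you would also call `torch.cuda.synchronize()` before reading the clock.

```python
import time

def measure_throughput(step_fn, tokens_per_step, n_steps=10, warmup=3):
    """Hypothetical helper: time a training-step callable, return tokens/sec.

    step_fn         -- zero-argument callable that runs one training step
    tokens_per_step -- batch_size * sequence_length processed per step
    """
    for _ in range(warmup):  # discard warmup steps (compilation, cache fills)
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps * tokens_per_step / elapsed
```

Run it once with Liger patching enabled and once without, on the same model, data, and hardware, and compare both throughput and peak memory before deciding.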
Install
- stable
pip install liger-kernel
- nightly
pip install liger-kernel-nightly
Imports
- AutoLigerKernelForCausalLM
from liger_kernel.transformers import AutoLigerKernelForCausalLM
- apply_liger_kernel_to_llama
from liger_kernel.transformers import apply_liger_kernel_to_llama
- LigerFusedLinearCrossEntropyLoss
from liger_kernel.chunked_loss import LigerFusedLinearCrossEntropyLoss
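The model-specific patch functions imported above are meant to be called once, before the model is instantiated. A hedged sketch of that usage follows; the keyword flags shown are based on commonly documented Liger options, so check the exact signature in the release you install.

```python
def enable_liger_for_llama():
    # Sketch only: requires the liger-kernel package and a supported GPU,
    # and the exact keyword flags may differ between releases.
    from liger_kernel.transformers import apply_liger_kernel_to_llama

    apply_liger_kernel_to_llama(
        rope=True,        # fused rotary position embedding
        rms_norm=True,    # fused RMSNorm
        swiglu=True,      # fused SwiGLU MLP
        cross_entropy=False,
        fused_linear_cross_entropy=True,  # chunked, memory-saving loss
    )

# Call enable_liger_for_llama() once, before from_pretrained(...).
```

This patches the Hugging Face LLaMA modeling code in place, so any LLaMA-family model loaded afterwards picks up the Liger kernels.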
Quickstart
import torch
from transformers import AutoTokenizer
from liger_kernel.transformers import AutoLigerKernelForCausalLM
# Ensure a GPU is available
if not torch.cuda.is_available():
    print("CUDA not available. Liger-Kernel requires a GPU.")
    raise SystemExit(1)
# 1. Load the tokenizer and the model. AutoLigerKernelForCausalLM applies the
#    Liger kernel patches (monkey-patching compatible layers) for supported
#    architectures, then loads the model as usual.
model_name = "PY007/TinyLlama-1.1B-Chat-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoLigerKernelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).cuda()
print(f"Model type after Liger Kernel patching: {type(model)}")
# Example usage (inference/forward pass - for training, integrate into your training loop)
input_text = "Hello, my name is"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
print("Model successfully patched and executed.")
# For full training integration, you would typically use a Hugging Face Trainer
# or TRL trainer and set use_liger_kernel=True in your training arguments.
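The comments above refer to the `use_liger_kernel` flag in the Hugging Face `Trainer`. A sketch of that configuration is below; the surrounding values (output dir, batch size) are placeholders, and the flag requires a recent `transformers` release.

```python
from transformers import TrainingArguments

# Configuration fragment: pass these args to transformers.Trainer
# (or a TRL trainer) together with your model and dataset as usual.
training_args = TrainingArguments(
    output_dir="./liger-output",
    per_device_train_batch_size=4,
    bf16=True,
    use_liger_kernel=True,  # patches supported models with Liger kernels
)
```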