Liger Kernel

0.7.0 · active · verified Sun Apr 12

Liger Kernel (LinkedIn GPU Efficient Runtime) is an open-source Python library offering a collection of highly optimized Triton kernels for Large Language Model (LLM) training. It can increase multi-GPU training throughput by up to 20% and reduce memory usage by up to 60%, enabling longer context lengths and larger batch sizes. The library is actively developed; the current release is 0.7.0, and new versions frequently add support for additional models and algorithms.
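Much of the memory saving comes from fused kernels that avoid materializing large intermediate tensors, such as the full (sequence, vocabulary) logits matrix during the loss computation. The sketch below illustrates that chunking idea in plain PyTorch on CPU; it is not Liger's actual Triton implementation, and the function name `chunked_cross_entropy` is an illustrative placeholder.

```python
import torch
import torch.nn.functional as F

def chunked_cross_entropy(hidden, weight, targets, chunk_size=2):
    """Compute cross-entropy over vocabulary projections chunk by chunk,
    so the full (seq_len, vocab) logits tensor is never materialized at
    once. Illustrative CPU sketch only; Liger fuses this on GPU in Triton."""
    losses = []
    n = hidden.shape[0]
    for start in range(0, n, chunk_size):
        h = hidden[start:start + chunk_size]
        logits = h @ weight.t()  # only a small (chunk, vocab) slice exists at a time
        losses.append(
            F.cross_entropy(logits, targets[start:start + chunk_size], reduction="sum")
        )
    return torch.stack(losses).sum() / n  # same as mean over all tokens

torch.manual_seed(0)
hidden = torch.randn(6, 8)            # 6 token positions, hidden size 8
weight = torch.randn(10, 8)           # vocab size 10
targets = torch.randint(0, 10, (6,))

full = F.cross_entropy(hidden @ weight.t(), targets)  # materializes all logits
chunked = chunked_cross_entropy(hidden, weight, targets)
print(torch.allclose(chunked, full, atol=1e-5))
```

Because the per-chunk sums divided by the token count equal the mean loss, the chunked version is numerically equivalent while holding only one logits slice in memory at a time.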

Warnings

Liger-Kernel requires a CUDA-capable GPU; its kernels are written in Triton and will not run on CPU-only machines.

Install

pip install liger-kernel

Imports

from liger_kernel.transformers import AutoLigerKernelForCausalLM

Quickstart

The simplest way to integrate Liger-Kernel is to load a Hugging Face causal LM through `AutoLigerKernelForCausalLM`, which applies the kernel patches automatically. For training, it also integrates with `transformers.Trainer` and TRL trainers via the `use_liger_kernel=True` flag.

import torch
from transformers import AutoTokenizer
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# Ensure a GPU is available
if not torch.cuda.is_available():
    print("CUDA not available. Liger-Kernel requires a GPU.")
    raise SystemExit(1)

# 1. Load the tokenizer
model_name = "PY007/TinyLlama-1.1B-Chat-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 2. Load the model through AutoLigerKernelForCausalLM
# This applies the Liger kernel patches for the model's architecture and
# returns a model whose compatible layers use Liger kernels. Use the return
# value -- it does not retroactively patch a model loaded elsewhere.
model = AutoLigerKernelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16
).cuda()

print(f"Model type after Liger Kernel patching: {type(model)}")

# Example usage (inference/forward pass - for training, integrate into your training loop)
input_text = "Hello, my name is"
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

print("Model successfully patched and executed.")
# For full training integration, you would typically use a Hugging Face Trainer 
# or TRL trainer and set use_liger_kernel=True in your training arguments.
