SGLang Kernel Library

0.5.10.post1 · active · verified Thu Apr 16

sgl-kernel is the core kernel library for SGLang, providing high-performance GPU-accelerated operations for LLM inference, including optimized attention, MoE routing, and CUDA graph execution. It is primarily used as a dependency of the main `sglang` library, which is currently at version `0.5.10.post1` and sees frequent updates.
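Because `sgl-kernel` is normally pulled in as a dependency of `sglang`, it can help to confirm which versions of both are installed. The following is a minimal sketch using only the Python standard library. It assumes the distributions are named `sglang` and `sgl-kernel`; the helper names `installed_version` and `report_versions` are hypothetical and are not part of either library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report_versions(dist_names=("sglang", "sgl-kernel")):
    """Map each distribution name to its installed version (None when missing)."""
    # The default names are assumptions about how the packages are published.
    return {name: installed_version(name) for name in dist_names}


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

A `None` entry means the distribution is not installed in the current environment.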

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to use `sglang`, the main library that leverages `sgl-kernel` for its high-performance execution. It shows a simple LLM generation task. Note that `sgl-kernel` itself does not expose a high-level API for direct user interaction; its functionality is accessed through `sglang`.

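import os

import sglang as sgl

os.environ['SGLANG_DEV_MODE'] = 'True' # Optional: for development features

# Launch a local SGLang runtime (which utilizes sgl-kernel for execution).
# The model ID below is only an example; substitute a Hugging Face model ID
# or local model path that you have access to. Pass it as a keyword argument,
# since the first positional parameter of Runtime is not the model path.
runtime = sgl.Runtime(model_path="meta-llama/Llama-3.1-8B-Instruct")

# Route all @sgl.function calls to this runtime
sgl.set_default_backend(runtime)

@sgl.function
def generate_joke(s, topic):
    # Append the prompt, then generate up to 64 tokens into the "joke" variable
    s += f"Give me a joke about {topic}."
    s += sgl.gen("joke", max_tokens=64, temperature=0.7)

# Run the function against the default backend
state = generate_joke.run(topic="cats")

print(f"Joke about cats: {state['joke']}")

# Stop the runtime's server process and release the GPU
runtime.shutdown()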
