CPM Kernels
1.0.11 | verified Fri May 01 | auth: no | python
CUDA kernels for the CPM (Chinese Pre-trained Model) family, providing optimized operations such as rotary position embedding, layer normalization, and activation functions. The current version is 1.0.11; releases are irregular.
pip install cpm-kernels
Common errors
error ImportError: cannot import name 'rotary_embedding' from 'cpm_kernels'
cause Wrong import path; rotary_embedding is in cpm_kernels.library.
fix
Use 'from cpm_kernels.library import rotary_embedding' instead.
error RuntimeError: CUDA error: no kernel image is available for execution on the device
cause The installed cpm-kernels wheel may not have been compiled for your specific GPU architecture.
fix
Reinstall cpm-kernels from source so that it compiles for your GPU:
pip install --no-binary cpm-kernels cpm-kernels
Warnings
gotcha Requires a CUDA-capable GPU and a PyTorch build compiled with CUDA. Calls raise runtime errors when no GPU is available.
fix Ensure torch.cuda.is_available() is True before using.
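The check above can be folded into a small preflight helper. This is a minimal sketch, not part of cpm-kernels: the name cuda_preflight is hypothetical, and it relies only on standard PyTorch calls (torch.cuda.is_available, torch.cuda.get_device_capability, torch.cuda.get_device_name). Reporting the compute capability may also help when diagnosing the "no kernel image is available" error listed under Common errors.

```python
import importlib.util


def cuda_preflight():
    """Return (ok, message) describing whether CUDA kernels can plausibly run here.

    Hypothetical helper for illustration; it is not provided by cpm-kernels.
    """
    # Avoid an ImportError if PyTorch itself is missing.
    if importlib.util.find_spec("torch") is None:
        return False, "PyTorch is not installed"

    import torch

    if not torch.cuda.is_available():
        return False, "PyTorch cannot see a CUDA device"

    # Compute capability (major, minor) of the current device.
    major, minor = torch.cuda.get_device_capability()
    name = torch.cuda.get_device_name()
    return True, f"{name}, compute capability {major}.{minor}"


ok, message = cuda_preflight()
print(("ready: " if ok else "not ready: ") + message)
```

Running this before importing cpm_kernels turns a missing GPU into a readable message rather than a runtime error deep inside a kernel call.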
deprecated Some kernel functions (e.g., 'fused_ln') are deprecated in newer versions and may be removed. Check documentation.
fix Use recommended alternatives (e.g., torch's native layernorm) or avoid deprecated calls.
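As one way to follow the fix above, layer normalization can be done with PyTorch's built-in torch.nn.functional.layer_norm. This is a sketch under assumptions: the tensor shape and the identity weight/bias are illustrative, and whether the result matches the deprecated kernel's numerics or parameter conventions is not confirmed here and should be verified against your model.

```python
import torch
import torch.nn.functional as F

hidden_size = 64  # illustrative size, not taken from cpm-kernels
x = torch.randn(2, 4, hidden_size)

# Scale and shift parameters, initialised to the identity transform.
weight = torch.ones(hidden_size)
bias = torch.zeros(hidden_size)

# Normalise over the last dimension; eps=1e-5 is PyTorch's default.
y = F.layer_norm(x, (hidden_size,), weight=weight, bias=bias, eps=1e-5)

print(y.shape)
```

The same call works on CUDA tensors, so it can replace a deprecated kernel call without changing where the data lives.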
Imports
- CPMKernel
from cpm_kernels import CPMKernel
- rotary_embedding
wrong: from cpm_kernels import rotary_embedding
correct: from cpm_kernels.library import rotary_embedding
Quickstart
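import torch
from cpm_kernels import CPMKernel

# cpm-kernels is CUDA-only: fail early instead of falling back to CPU.
if not torch.cuda.is_available():
    raise RuntimeError("cpm-kernels requires a CUDA-capable GPU")

x = torch.randn(2, 4, 64, device="cuda")
kernel = CPMKernel()
# Apply rotary position embedding starting at position 0.
result = kernel.rotary_embedding(x, start=0)
print(result.shape)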