CPM Kernels

1.0.11 verified Fri May 01 auth: no python

CUDA kernels for the CPM (Chinese Pre-trained Model) family, providing optimized operations such as rotary position embedding, layer normalization, and activation functions. The current version is 1.0.11; releases do not follow a fixed schedule.

pip install cpm-kernels
error ImportError: cannot import name 'rotary_embedding' from 'cpm_kernels'
cause Wrong import path; rotary_embedding is in cpm_kernels.library.
fix Use 'from cpm_kernels.library import rotary_embedding' instead.
error RuntimeError: CUDA error: no kernel image is available for execution on the device
cause The installed cpm-kernels wheel may not have been compiled for your specific GPU architecture.
fix Reinstall cpm-kernels from source so that it is compiled for your GPU: pip install --no-binary cpm-kernels cpm-kernels
gotcha Requires a CUDA-capable GPU and a PyTorch build compiled with CUDA support. Calls fail at runtime if no GPU is available.
fix Check that torch.cuda.is_available() returns True before calling any kernel.
deprecated Some kernel functions (e.g., 'fused_ln') are deprecated in newer versions and may be removed. Check the documentation for the version you have installed.
fix Switch to the recommended alternatives (e.g., PyTorch's native torch.nn.LayerNorm) and remove calls to deprecated functions.
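
The GPU-related entries above can be checked before any kernel is called. The following is a minimal sketch, not part of cpm-kernels: the helper name cuda_preflight is hypothetical, and it relies only on standard PyTorch calls (torch.cuda.is_available, torch.cuda.get_device_capability, torch.cuda.get_device_name). The reported compute capability may help when diagnosing the "no kernel image is available" error.

```python
def cuda_preflight():
    """Return (ok, message) describing whether a CUDA GPU is usable from PyTorch.

    Hypothetical helper for illustration; it is not part of cpm-kernels.
    """
    try:
        import torch
    except ImportError:
        return False, "PyTorch is not installed"

    if not torch.cuda.is_available():
        return False, "PyTorch cannot see a CUDA device (CPU-only build or no GPU)"

    # Compute capability (e.g. 8.0) identifies the GPU architecture that
    # compiled kernels must target.
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    return True, f"{name}, compute capability {major}.{minor}"


ok, message = cuda_preflight()
print("CUDA ready:" if ok else "CUDA not ready:", message)
```

Running this once at startup gives a readable message instead of a failure deep inside a kernel call.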

Initialize CPMKernel and apply rotary position embedding to a random tensor.

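import torch
from cpm_kernels import CPMKernel

# The kernels are CUDA-only, so fail early instead of falling back to the CPU.
if not torch.cuda.is_available():
    raise RuntimeError("cpm-kernels requires a CUDA-capable GPU")

# Create the random input directly on the GPU.
x = torch.randn(2, 4, 64, device="cuda")
kernel = CPMKernel()
result = kernel.rotary_embedding(x, start=0)
print(result.shape)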
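
When an import fails with "cannot import name ...", as in the ImportError entry above, it can help to check which modules of the installed version actually expose the name. The sketch below uses only the standard library (importlib, pkgutil); the helper find_name is hypothetical and makes no assumption about how cpm_kernels is laid out. It returns an empty list if the package is not installed or cannot be imported.

```python
import importlib
import pkgutil


def find_name(package_name, attr_name):
    """Return dotted names of modules under package_name that expose attr_name.

    Hypothetical helper for illustration; it is not part of cpm-kernels.
    """
    try:
        package = importlib.import_module(package_name)
    except Exception:
        return []  # package missing, or it failed while importing

    hits = [package_name] if hasattr(package, attr_name) else []

    # Only packages have __path__; a plain module has no submodules to walk.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return hits

    for info in pkgutil.walk_packages(
        search_path, prefix=package_name + ".", onerror=lambda name: None
    ):
        try:
            module = importlib.import_module(info.name)
        except Exception:
            continue  # skip submodules that fail to import (e.g. no CUDA libraries)
        if hasattr(module, attr_name):
            hits.append(info.name)
    return hits


if __name__ == "__main__":
    print(find_name("cpm_kernels", "rotary_embedding"))
```

Each returned module name can be used directly in a "from ... import ..." statement. Note that the helper imports every submodule it inspects, so it is best run in a scratch session.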