VSA - Video Sparse Attention Kernel


VSA (Video Sparse Attention) is a CUDA kernel for efficient sparse attention in video diffusion models, distributed as part of the FastVideo library. The current PyPI release is 0.0.5, while the latest GitHub release is 0.1.7; development is active with frequent releases. Requires Python >= 3.10 and the CUDA Toolkit.

pip install vsa
error ModuleNotFoundError: No module named 'vsa'
cause VSA not installed or installed incorrectly.
fix
Run pip install vsa in the same Python (>=3.10) environment you import from.
error RuntimeError: CUDA error: no kernel image is available for execution on the device
cause VSA kernel compiled for a different CUDA architecture than the GPU supports.
fix
Set environment variable TORCH_CUDA_ARCH_LIST before installation, e.g., export TORCH_CUDA_ARCH_LIST="8.0" for Ampere.
error ImportError: libcuda.so: cannot open shared object file
cause CUDA driver library not found.
fix
Install NVIDIA drivers and ensure LD_LIBRARY_PATH includes CUDA library path.
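Whether the driver library can be located at all can be checked from Python before importing VSA; a small standard-library sketch:

```python
import ctypes.util

def cuda_driver_present() -> bool:
    """Return True if the CUDA driver library (libcuda) is discoverable."""
    return ctypes.util.find_library("cuda") is not None

print(cuda_driver_present())
```

If this prints False, fix the driver install or LD_LIBRARY_PATH before retrying the import.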
breaking PyPI version 0.0.5 is outdated and may have API incompatibilities with the latest GitHub releases.
fix Install directly from GitHub or wait for a new PyPI release: pip install git+https://github.com/hao-ai-lab/FastVideo.git#subdirectory=csrc/attn/video_sparse_attn
gotcha VSA requires a CUDA-compatible GPU and the CUDA Toolkit to be installed. Without it, import will fail.
fix Ensure nvcc is in PATH and torch is CUDA-enabled.
deprecated The 'v0' code paths were removed in release v0.1.2. If you rely on any v0 features, upgrade carefully.
fix Update your code to use the new API if present, or stay on v0.0.5.

Basic usage of VSA for sparse video attention.

import torch
import vsa

# Create query, key, value tensors (batch, heads, seq_len, dim).
# bf16 is typical for fused attention kernels -- check the VSA docs for supported dtypes.
q = torch.randn(1, 8, 1024, 64, device='cuda', dtype=torch.bfloat16)
k = torch.randn(1, 8, 1024, 64, device='cuda', dtype=torch.bfloat16)
v = torch.randn(1, 8, 1024, 64, device='cuda', dtype=torch.bfloat16)

# Sparse attention over the video token sequence
output = vsa.video_sparse_attn(q, k, v)
print(output.shape)  # same shape as q: (1, 8, 1024, 64)
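On machines without a CUDA GPU, a dense reference is handy for sanity-checking shapes and, on GPU machines, for comparing against the sparse kernel's output. A minimal numpy sketch of standard scaled dot-product attention (a dense reference, not VSA's sparse algorithm):

```python
import numpy as np

def dense_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Dense scaled dot-product attention over (batch, heads, seq_len, dim)."""
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.swapaxes(-1, -2)) * scale      # (b, h, s, s)
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (b, h, s, d)

q = np.random.randn(1, 8, 16, 64).astype(np.float32)
out = dense_attention(q, q, q)
print(out.shape)  # (1, 8, 16, 64)
```

Because sparse attention drops low-scoring blocks, expect the sparse kernel's output to approximate, not exactly match, this dense reference.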