VSA - Video Sparse Attention Kernel


VSA (Video Sparse Attention) is a CUDA kernel for efficient sparse attention in video diffusion models, distributed as part of the FastVideo library. The current PyPI release is 0.0.5, while the latest GitHub release is 0.1.7; development is active with frequent releases. Requires Python >= 3.10 and the CUDA Toolkit.

pip install vsa
error ModuleNotFoundError: No module named 'vsa'
cause VSA not installed or installed incorrectly.
fix
Run pip install vsa in the same Python (>=3.10) environment you import from.
error RuntimeError: CUDA error: no kernel image is available for execution on the device
cause VSA kernel compiled for a different CUDA architecture than the GPU supports.
fix
Set environment variable TORCH_CUDA_ARCH_LIST before installation, e.g., export TORCH_CUDA_ARCH_LIST="8.0" for Ampere.
error ImportError: libcuda.so: cannot open shared object file
cause CUDA driver library not found.
fix
Install NVIDIA drivers and ensure LD_LIBRARY_PATH includes CUDA library path.
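Whether the driver library can be located at all can be checked from Python before importing VSA; a small standard-library sketch:

```python
import ctypes.util

def cuda_driver_present() -> bool:
    """Return True if the CUDA driver library (libcuda) is discoverable."""
    return ctypes.util.find_library("cuda") is not None

print(cuda_driver_present())
```

If this prints False, fix the driver install or LD_LIBRARY_PATH before retrying the import.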
breaking PyPI version 0.0.5 is outdated and may have API incompatibilities with the latest GitHub releases.
fix Install directly from GitHub or wait for a new PyPI release: pip install git+https://github.com/hao-ai-lab/FastVideo.git#subdirectory=csrc/attn/video_sparse_attn
gotcha VSA requires a CUDA-compatible GPU and the CUDA Toolkit to be installed. Without it, import will fail.
fix Ensure nvcc is in PATH and torch is CUDA-enabled.
deprecated The 'v0' code paths were removed in release v0.1.2. If you rely on any v0 features, upgrade carefully.
fix Update your code to use the new API if present, or stay on v0.0.5.

Basic usage of VSA for sparse video attention.

import torch
import vsa

# Create query, key, value tensors (batch, heads, seq_len, dim).
# bf16 is typical for fused attention kernels -- check the VSA docs for supported dtypes.
q = torch.randn(1, 8, 1024, 64, device='cuda', dtype=torch.bfloat16)
k = torch.randn(1, 8, 1024, 64, device='cuda', dtype=torch.bfloat16)
v = torch.randn(1, 8, 1024, 64, device='cuda', dtype=torch.bfloat16)

# Sparse attention over the video token sequence
output = vsa.video_sparse_attn(q, k, v)
print(output.shape)  # same shape as q: (1, 8, 1024, 64)
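On machines without a CUDA GPU, a dense reference is handy for sanity-checking shapes and, on GPU machines, for comparing against the sparse kernel's output. A minimal numpy sketch of standard scaled dot-product attention (a dense reference, not VSA's sparse algorithm):

```python
import numpy as np

def dense_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Dense scaled dot-product attention over (batch, heads, seq_len, dim)."""
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.swapaxes(-1, -2)) * scale      # (b, h, s, s)
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (b, h, s, d)

q = np.random.randn(1, 8, 16, 64).astype(np.float32)
out = dense_attention(q, q, q)
print(out.shape)  # (1, 8, 16, 64)
```

Because sparse attention drops low-scoring blocks, expect the sparse kernel's output to approximate, not exactly match, this dense reference.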