fa3-fwd

0.0.3, verified Mon Apr 27

fa3-fwd provides a forward-only implementation of FlashAttention-3 for efficient attention computation on GPUs. Version 0.0.3 is a pre-release with no stable release cadence.

pip install fa3-fwd
error ModuleNotFoundError: No module named 'fa3_fwd'
cause Wrong import path, missing install, or Python environment issue.
fix Run 'pip install fa3-fwd' and use 'import fa3_fwd' (underscore, not hyphen).
error RuntimeError: FlashAttention only supported on CUDA
cause Tensors are on CPU instead of GPU.
fix Move tensors to the GPU before calling: q = q.cuda(), k = k.cuda(), v = v.cuda(), or create them with device='cuda'.
error TypeError: flash_attn_forward() missing 3 required positional arguments: 'q', 'k', 'v'
cause Fewer than three arguments were passed; q, k, and v are all required.
fix Call with three tensors: flash_attn_forward(q, k, v).
breaking Only forward pass is implemented; no backward pass. Cannot be used for training.
fix Use the full FlashAttention-3 library if a backward pass is needed.
deprecated The API is experimental and may change without notice in future versions.
fix Pin the version (e.g. fa3-fwd==0.0.3) if stability is required.
gotcha Requires a CUDA-capable GPU and a CUDA-enabled PyTorch build; raises RuntimeError on CPU tensors.
fix Ensure all tensors are on a CUDA device.
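The CPU failure mode above can be caught early. A minimal sketch, where ensure_cuda is a hypothetical helper (not part of fa3-fwd):

```python
import torch

def ensure_cuda(*tensors):
    # Hypothetical helper, not part of fa3-fwd: fail fast with the same
    # RuntimeError message the kernel would raise, before any work is done.
    if not torch.cuda.is_available():
        raise RuntimeError("FlashAttention only supported on CUDA")
    # Move any CPU tensors over; tensors already on the GPU are unchanged.
    return tuple(t.cuda() for t in tensors)
```

Call q, k, v = ensure_cuda(q, k, v) before invoking flash_attn_forward.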

Basic usage of the flash-attention forward pass:

import torch
from fa3_fwd import flash_attn_forward

# Query, key, and value tensors must live on the GPU; bfloat16 keeps the kernel fast.
q = torch.randn(1, 8, 64, 128, device='cuda', dtype=torch.bfloat16)
k = torch.randn(1, 8, 64, 128, device='cuda', dtype=torch.bfloat16)
v = torch.randn(1, 8, 64, 128, device='cuda', dtype=torch.bfloat16)

out = flash_attn_forward(q, k, v)
print(out.shape)  # same shape as q here, since q, k, v all match
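Since fa3-fwd has no CPU path, a plain-PyTorch reference can sanity-check results on small inputs. This is standard scaled dot-product attention, not the fused kernel, and the (batch, heads, seqlen, head_dim) layout is an assumption:

```python
import torch

def reference_attn_forward(q, k, v):
    # Plain scaled dot-product attention: numerically what an attention
    # forward pass computes, but materializing the full score matrix.
    # Assumes a (batch, heads, seqlen, head_dim) layout.
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale
    attn = torch.softmax(scores, dim=-1)
    return torch.matmul(attn, v)
```

When comparing against flash_attn_forward output, use a loose tolerance: bf16 and fused-kernel accumulation order both perturb the result slightly.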