{"id":2361,"library":"xformers","title":"xFormers","description":"xFormers is a PyTorch-based library of composable, optimized building blocks for Transformer models. It aims to accelerate deep learning research with flexible, efficient components, including optimized attention mechanisms and fused operations that often outperform native PyTorch implementations in speed and memory usage. The library is actively developed by Meta Platforms, Inc. and is updated frequently; the current stable version is 0.0.35.","status":"active","version":"0.0.35","language":"en","source_language":"en","source_url":"https://github.com/facebookresearch/xformers","tags":["pytorch","transformers","deep-learning","gpu-acceleration","attention-mechanisms","machine-learning","optimization"],"install":[{"cmd":"pip install xformers","lang":"bash","label":"Latest stable release (requires latest PyTorch)"},{"cmd":"pip install -U xformers --index-url https://download.pytorch.org/whl/cu126","lang":"bash","label":"For CUDA 12.6 (adjust URL for other CUDA versions like cu118, cu128, cu130)"},{"cmd":"pip install --pre -U xformers","lang":"bash","label":"Latest development binaries"},{"cmd":"pip install ninja\npip install -v --no-build-isolation -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers","lang":"bash","label":"From source (for specific PyTorch versions or nightlies)"}],"dependencies":[{"reason":"Core dependency; the installed PyTorch and CUDA versions must match the ones the xformers wheel was built against.","package":"torch","optional":false},{"reason":"Optional, but recommended for some optimized kernels like Triton Flash Attention.","package":"triton","optional":true},{"reason":"Optional, significantly speeds up building from source.","package":"ninja","optional":true}],"imports":[{"note":"The primary optimized attention function.","symbol":"memory_efficient_attention","correct":"from xformers.ops import 
memory_efficient_attention"},{"note":"Common attention bias for causal masking.","symbol":"LowerTriangularMask","correct":"from xformers.ops.fmha.attn_bias import LowerTriangularMask"},{"note":"Base class for attention operators, used for dispatching or enforcing specific backends.","symbol":"AttentionOpBase","correct":"from xformers.ops import AttentionOpBase"}],"quickstart":{"code":"import torch\nfrom xformers.ops import memory_efficient_attention, LowerTriangularMask\n\n# Use CUDA when available; the optimized kernels target GPUs, so the calls below may fail on CPU\ndevice = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n\n# Assume batch_size=2, seq_len=128, num_heads=8, head_dim=64\nbatch_size = 2\nseq_len = 128\nnum_heads = 8\nhead_dim = 64\n\n# Create dummy query, key, value tensors\n# xFormers memory_efficient_attention typically expects (batch_size, seq_len, num_heads, head_dim)\nquery = torch.randn(batch_size, seq_len, num_heads, head_dim, device=device)\nkey = torch.randn(batch_size, seq_len, num_heads, head_dim, device=device)\nvalue = torch.randn(batch_size, seq_len, num_heads, head_dim, device=device)\n\n# It's common to use float16 (half precision) for performance with xFormers\nquery = query.half()\nkey = key.half()\nvalue = value.half()\n\n# Example 1: Standard memory-efficient attention\n# xFormers automatically dispatches to the best available operator\noutput_attn = memory_efficient_attention(query, key, value)\nprint(f\"Output attention shape (standard): {output_attn.shape}\")\n\n# Example 2: Causal attention with a lower triangular mask\n# Note: since v0.0.21, a tensor attn_bias must carry an explicit head dimension.\n# LowerTriangularMask is a mask object rather than a tensor, so no manual expansion is needed.\nattn_bias = LowerTriangularMask()\noutput_causal_attn = memory_efficient_attention(query, key, value, attn_bias=attn_bias)\nprint(f\"Output attention shape (causal): {output_causal_attn.shape}\")\n\n# To verify installation and available kernels:\n# import subprocess\n# 
subprocess.run([\"python\", \"-m\", \"xformers.info\"])","lang":"python","description":"This quickstart shows how to call `xformers.ops.memory_efficient_attention` on dummy PyTorch tensors for both standard and causal attention. It uses the typical `(batch_size, seq_len, num_heads, head_dim)` tensor layout and half precision (float16), a common choice for GPU performance. Running `python -m xformers.info` reports the installation status and available kernels. PyTorch and CUDA must be installed and configured correctly."},"warnings":[{"fix":"Always install xformers with a matching PyTorch wheel via `--index-url https://download.pytorch.org/whl/cuXXX` (e.g., `cu126`) or build from source if a specific PyTorch/CUDA combination is needed.","message":"xFormers has strict compatibility requirements with PyTorch and CUDA versions. Installing `xformers` via pip without specifying a PyTorch index URL can lead to incompatibility issues or an unwanted PyTorch upgrade.","severity":"breaking","affected_versions":"All versions"},{"fix":"If reproducibility is critical, avoid using xFormers' non-deterministic kernels or use alternative attention implementations. Check `xformers.info` output for deterministic kernel availability.","message":"Many xFormers optimizations, particularly `memory_efficient_attention`, can produce non-deterministic results, so repeated runs with the same inputs may yield slightly different outputs.","severity":"gotcha","affected_versions":"All versions using optimized kernels"},{"fix":"Upgrade GPU hardware or use older xFormers/PyTorch versions if V100 compatibility is essential. Consider using Flash-Attention 3 on Ampere GPUs or Flash-Attention 2 through PyTorch on Linux.","message":"Support for V100 and older NVIDIA GPUs was dropped, following PyTorch's deprecation schedule. 
Building Flash-Attention 2 as part of xFormers is also deprecated.","severity":"breaking","affected_versions":"0.0.33.post2 and later"},{"fix":"Consult the official xFormers documentation and CHANGELOG for updated API usage and recommended alternatives for constructing Transformer components.","message":"Many classes and modules within `xformers.factory`, `xformers.triton`, and `xformers.components` have been or will be deprecated.","severity":"deprecated","affected_versions":"0.0.22 and later (tracking issue #848)"},{"fix":"Give your `attn_bias` tensor the correct dimensions explicitly, including the head dimension (e.g., with `.expand()` or `.repeat()` if broadcasting is intended).","message":"The `memory_efficient_attention` function now expects the `attn_bias` argument to explicitly have a head dimension. It no longer automatically broadcasts batch/head dimensions for `attn_bias`.","severity":"breaking","affected_versions":"0.0.21 and later"},{"fix":"Prefer pre-built wheels installed via `pip install` with the appropriate PyTorch/CUDA index URL. If building from source, install Visual Studio Build Tools (C++ desktop development) and a matching CUDA Toolkit, and run `git config --global core.longpaths true`.","message":"Building xFormers from source on Windows can be complex due to dependencies on Visual Studio Build Tools, specific CUDA Toolkit versions, and potential long path issues. Pre-built wheels are highly recommended.","severity":"gotcha","affected_versions":"All versions when building from source on Windows"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}