{"id":10206,"library":"sageattention","title":"SageAttention","description":"SageAttention is a Python library providing accurate and efficient 8-bit plug-and-play attention mechanisms, including Mixture-of-Experts (MoE) implementations. It aims to accelerate large language models with minimal performance drop. The current bleeding-edge version is 2.0.1, though the PyPI package might lag behind GitHub releases. Releases typically occur when major architectural changes or significant features are implemented.","status":"active","version":"2.0.1","language":"en","source_language":"en","source_url":"https://github.com/thu-ml/SageAttention","tags":["attention","transformer","moe","mixture-of-experts","8-bit","deep-learning","pytorch"],"install":[{"cmd":"pip install sageattention","lang":"bash","label":"PyPI release (may be older)"},{"cmd":"pip install git+https://github.com/thu-ml/SageAttention.git","lang":"bash","label":"Latest development version from GitHub"}],"dependencies":[{"reason":"Core deep learning framework for tensor operations.","package":"torch","optional":false},{"reason":"Provides elegant tensor manipulations.","package":"einops","optional":false},{"reason":"Used for implementing Rotary Position Embeddings.","package":"rotary_embedding_torch","optional":false}],"imports":[{"symbol":"SageMoE","correct":"from sageattention.sagemoe.moe_layer import SageMoE"},{"symbol":"TransformerBlock","correct":"from sageattention.sagemoe.transformer_block import TransformerBlock"},{"note":"Higher-level model, primarily for causal language modeling.","symbol":"SageAttentionForCausalLM","correct":"from sageattention.models import SageAttentionForCausalLM"}],"quickstart":{"code":"import torch\nfrom sageattention.sagemoe.moe_layer import SageMoE\nfrom sageattention.sagemoe.transformer_block import TransformerBlock\n\n# Example for SageMoE\n# Initialize a Mixture-of-Experts layer\nmoe_model = SageMoE(dim=512, num_experts=8, top_k=2)\n# Create a dummy input tensor\nx_moe = 
torch.randn(1, 10, 512) # (batch_size, sequence_length, embedding_dimension)\n# Pass input through the MoE layer\noutput_moe = moe_model(x_moe)\nprint(f\"SageMoE Output Shape: {output_moe.shape}\")\n\n# Example for TransformerBlock\n# Initialize a Transformer block with attention and MoE\ntransformer_block = TransformerBlock(dim=512, heads=8, dim_head=64, ff_mult=4, num_experts=8, top_k=2)\n# Create a dummy input tensor\nx_transformer = torch.randn(1, 10, 512)\n# Pass input through the Transformer block\noutput_transformer = transformer_block(x_transformer)\nprint(f\"TransformerBlock Output Shape: {output_transformer.shape}\")\n","lang":"python","description":"This quickstart demonstrates how to instantiate and use the core `SageMoE` (Mixture-of-Experts) layer and a `TransformerBlock`, which internally uses SageAttention. It initializes dummy input tensors and shows the output shapes."},"warnings":[{"fix":"Users migrating from versions prior to 2.0.0 should review the official GitHub README and examples for v2.0.x. Many classes, function signatures, and import paths may have changed. Update your code to reflect the new API.","message":"Major architectural changes were introduced in v2.0.0. This update added support for MoE and new models, significantly altering internal structures and potentially public APIs for lower-level components.","severity":"breaking","affected_versions":"Code written for <2.0.0 when run against >=2.0.0"},{"fix":"For the absolute latest features, bug fixes, and the API demonstrated in the main GitHub README, install directly from GitHub: `pip install git+https://github.com/thu-ml/SageAttention.git`. Be aware this might install a pre-release or development version.","message":"The PyPI package version (currently 1.0.6) significantly lags behind the latest GitHub releases (currently 2.0.1). 
This means `pip install sageattention` might not give you the features or fixes shown in the latest documentation or GitHub issues.","severity":"gotcha","affected_versions":"All versions where PyPI is outdated compared to GitHub."},{"fix":"Always ensure input tensors conform to the expected dimensions, typically `(batch_size, sequence_length, embedding_dimension)`, where `embedding_dimension` must match the `dim` parameter passed to the module's constructor. Consult the module's `__init__` or `forward` method signature for precise requirements.","message":"Incorrect tensor shapes are a common source of runtime errors when working with attention and MoE modules.","severity":"gotcha","affected_versions":"All versions."}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Ensure the library is installed in your current environment: `pip install sageattention` (for stable PyPI) or `pip install git+https://github.com/thu-ml/SageAttention.git` (for latest GitHub). Verify your Python interpreter is pointing to the correct environment.","cause":"The `sageattention` library, or specific sub-modules, are not installed or are not accessible in your current Python environment.","error":"ModuleNotFoundError: No module named 'sageattention.sagemoe'"},{"fix":"Verify that the `dim` parameter during module initialization correctly matches the last dimension of your input tensor. Ensure batch and sequence dimensions are consistent across inputs if applicable. For example, if `dim=512`, your input should typically be `(batch, seq_len, 512)`.","cause":"Input tensors provided to SageAttention modules (e.g., `SageMoE`, `TransformerBlock`) do not have the expected dimensions or the `dim` parameter does not match the input's last dimension.","error":"RuntimeError: The size of tensor a (X) must match the size of tensor b (Y) at non-singleton dimension Z"},{"fix":"This error frequently signals a breaking change in the API. 
Consult the latest GitHub README or the source code for the specific module to find the correct constructor arguments for your installed version. You may need to adapt your initialization code.","cause":"You are attempting to use constructor arguments or an API from a previous major version (e.g., pre-2.0.0) with a newer installed version of SageAttention.","error":"TypeError: __init__() got an unexpected keyword argument 'some_old_argument_name'"}]}