{"library":"sageattention","title":"SageAttention","description":"SageAttention is a Python library that provides accurate and efficient 8-bit quantized attention as a plug-and-play replacement for standard attention kernels. It aims to accelerate inference for large models with minimal loss in end-to-end accuracy. The latest version referenced here is 2.0.1; the PyPI package may lag behind the GitHub repository, so install from GitHub when the newest release is needed. Releases typically accompany major architectural changes or significant new features.","language":"python","status":"active","last_verified":"Fri Apr 17","install":{"commands":["pip install sageattention","pip install git+https://github.com/thu-ml/SageAttention.git"],"cli":null},"imports":["from sageattention import sageattn"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import torch\nfrom sageattention import sageattn\n\n# SageAttention kernels run on CUDA GPUs and expect half-precision inputs\n# Create dummy query, key, and value tensors\n# Shape: (batch_size, num_heads, sequence_length, head_dim), i.e. the \"HND\" layout\nq = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device=\"cuda\")\nk = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device=\"cuda\")\nv = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device=\"cuda\")\n\n# Compute attention with the quantized kernel\noutput = sageattn(q, k, v, tensor_layout=\"HND\", is_causal=False)\nprint(f\"SageAttention Output Shape: {output.shape}\")\n","lang":"python","description":"This quickstart calls `sageattn`, the library's core quantized attention function, on dummy query, key, and value tensors and prints the output shape. It requires a CUDA-capable GPU and half-precision (FP16 or BF16) inputs.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":null}