{"id":6630,"library":"fla-core","title":"Core operations for flash-linear-attention","description":"fla-core is a Python library providing efficient, Triton-based implementations of core operations and kernels for state-of-the-art linear attention and state-space models. It is a minimal-dependency subset of the larger 'flash-linear-attention' project, focused on the fundamental computational building blocks. It is currently at version 0.4.2 and follows a regular release cadence, typically in lockstep with its parent project, flash-linear-attention.","status":"active","version":"0.4.2","language":"en","source_language":"en","source_url":"https://github.com/fla-org/flash-linear-attention","tags":["deep-learning","attention","pytorch","gpu","triton","kernels","sequence-modeling","state-space-models"],"install":[{"cmd":"pip install fla-core","lang":"bash","label":"Install from PyPI"}],"dependencies":[{"reason":"Core deep learning framework for tensor operations and GPU acceleration.","package":"torch","optional":false},{"reason":"Required for high-performance GPU kernels.","package":"triton","optional":false},{"reason":"For flexible tensor manipulations.","package":"einops","optional":false}],"imports":[{"note":"Commonly used fused normalization module from fla-core.","symbol":"FusedRMSNormGated","correct":"from fla.modules import FusedRMSNormGated"},{"note":"Example of a kernel import for Kimi Delta Attention operations.","symbol":"chunk_kda","correct":"from fla.ops.kda import chunk_kda"},{"note":"fla-core does not include high-level layers or models; these ship with the full 'flash-linear-attention' package, which installs into the same 'fla' namespace. With only 'fla-core' installed, importing from 'fla.layers' or 'fla.models' fails. There is no importable 'flash_linear_attention' module; install 'flash-linear-attention' and import from 'fla.layers'.","wrong":"from flash_linear_attention.layers import MultiScaleRetention","symbol":"MultiScaleRetention","correct":"from fla.layers import MultiScaleRetention"}],"quickstart":{"code":"import torch\nfrom fla.modules import FusedRMSNormGated\n\n# fla-core kernels require a CUDA-enabled GPU\nif not torch.cuda.is_available():\n    raise RuntimeError(\"CUDA not available. fla-core requires a CUDA-enabled GPU.\")\n\ndevice = torch.device(\"cuda\")\n\n# Model parameters\nhidden_size = 768\nbatch_size = 4\nsequence_length = 512\n\n# Initialize the fused, gated RMSNorm module from fla-core\nnorm_layer = FusedRMSNormGated(hidden_size).to(device)\n\n# Dummy hidden states and gate tensor; the module fuses RMS normalization\n# with elementwise gating, as used in gated linear attention layers\nhidden_states = torch.randn(batch_size, sequence_length, hidden_size, device=device, dtype=torch.float16)\ngate = torch.randn_like(hidden_states)\n\n# Forward pass: normalize and gate in a single fused Triton kernel\noutput = norm_layer(hidden_states, gate)\n\nprint(f\"Input tensor shape: {hidden_states.shape}\")\nprint(f\"Output tensor shape: {output.shape}\")\nprint(\"FusedRMSNormGated operation successful, demonstrating fla-core usage.\")","lang":"python","description":"This quickstart demonstrates a fused normalization module from `fla-core`. It initializes `FusedRMSNormGated` and applies it, together with a gate tensor, to dummy hidden states on a CUDA-enabled GPU, illustrating how to integrate the low-level, optimized operations that `fla-core` provides."},"warnings":[{"fix":"If you need layers or models, install the full `flash-linear-attention` package: `pip install flash-linear-attention`.","message":"The `fla-core` package is a minimal subset of `flash-linear-attention`. It contains core kernels and modules (e.g., `fla.ops` and `fla.modules`) but lacks higher-level layers and models (`fla.layers`, `fla.models`). Attempting to import these high-level components with only `fla-core` installed raises an `ImportError`.","severity":"gotcha","affected_versions":">=0.3.2"},{"fix":"Review the documentation or source code for the specific kernel being used to confirm the expected input dimension ordering (e.g., (B, T, H, K) vs. (B, H, T, K)).","message":"The input tensor format for some kernels switched from 'head-first' to 'sequence-first', changing the expected dimension ordering (e.g., `(batch, heads, sequence, dim)` became `(batch, sequence, heads, dim)`).","severity":"breaking","affected_versions":">=0.2.0,<0.4.0"},{"fix":"Ensure Triton is installed correctly (`pip install triton`) and that the appropriate backend is configured for your GPU hardware. Check the Triton FAQs for platform-specific instructions.","message":"fla-core relies heavily on the Triton compiler for its optimized kernels. Specific Triton versions (>=3.0 or nightly) and a correctly installed backend are required, especially on AMD ROCm or Intel XPU GPUs.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade your Python environment to version 3.10 or newer.","message":"fla-core requires Python 3.10 or higher; older Python versions lead to installation or runtime errors.","severity":"gotcha","affected_versions":"<=0.4.2"},{"fix":"Install the latest nightly build of Triton, which often includes fixes for such hardware-specific issues. Refer to Triton's GitHub issues or FAQs for current workarounds.","message":"Users on H100 GPUs may encounter 'MMA Assertion Error' or 'LinearLayout Assertion Error' failures caused by known Triton issues.","severity":"gotcha","affected_versions":"All versions, depending on Triton compatibility."},{"fix":"Ensure your PyTorch installation is version 2.5 or newer: `pip install 'torch>=2.5'`.","message":"fla-core expects PyTorch >= 2.5; older versions may cause compatibility issues or runtime errors.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}