{"id":5045,"library":"rotary-embedding-torch","title":"Rotary Embedding for PyTorch","description":"This library provides a PyTorch implementation of Rotary Positional Embedding (RoPE), a core component of modern transformer architectures such as LLaMA, designed to improve a model's ability to handle long sequences. It offers an easy-to-use API for applying rotary embeddings to query and key tensors. The current version is 0.8.9, and the project follows a rapid release cadence for bug fixes and minor improvements.","status":"active","version":"0.8.9","language":"en","source_language":"en","source_url":"https://github.com/lucidrains/rotary-embedding-torch","tags":["pytorch","deep-learning","transformer","attention","positional-embedding","rope"],"install":[{"cmd":"pip install rotary-embedding-torch","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Core deep learning framework","package":"torch","optional":false}],"imports":[{"symbol":"RotaryEmbedding","correct":"from rotary_embedding_torch import RotaryEmbedding"}],"quickstart":{"code":"import torch\nfrom rotary_embedding_torch import RotaryEmbedding\n\n# Rotary dimension: the number of head dimensions to rotate.\n# The author recommends rotating half of the head dimensions\n# (head_dim = 64 below).\ndim = 32\nrotary_emb = RotaryEmbedding(dim=dim)\n\n# Dummy query and key tensors\n# shape: (batch_size, num_heads, sequence_length, head_dim)\nseq_len = 1024\nq = torch.randn(1, 8, seq_len, 64)\nk = torch.randn(1, 8, seq_len, 64)\n\n# Apply rotary embeddings to queries and keys before attention\nq_rot = rotary_emb.rotate_queries_or_keys(q)\nk_rot = rotary_emb.rotate_queries_or_keys(k)\n\nprint(f\"Original query shape: {q.shape}\")\nprint(f\"Rotary-embedded query shape: {q_rot.shape}\")","lang":"python","description":"This example initializes `RotaryEmbedding` and applies it to example query and key tensors with `rotate_queries_or_keys`, as done inside a self-attention block just before computing attention scores. The `dim` parameter sets how many of the trailing head dimensions are rotated; it must be even and no larger than the head dimension, and is commonly set to half of it."},"warnings":[{"fix":"Upgrade to `rotary-embedding-torch>=0.8.0`.","message":"Prior to v0.8.0, a bug in the scale multiplication could produce incorrect positional embeddings, particularly when using specific scaling factors. Models trained with affected versions might show subtle performance degradation or instability.","severity":"gotcha","affected_versions":"< 0.8.0"},{"fix":"Upgrade to `rotary-embedding-torch>=0.8.6` to ensure compatibility with `torch.compile`.","message":"When using `torch.compile` for performance optimization, versions prior to v0.8.6 might fail because the cached `seq_len` was stored as a non-integer type, which could lead to compilation failures or incorrect behavior under JIT.","severity":"gotcha","affected_versions":"< 0.8.6"},{"fix":"Leave `cache_if_possible=True` (the default) and set `cache_max_seq_len` at least as large as the longest sequence your model will handle, to avoid repeated re-computation.","message":"`RotaryEmbedding` caches its rotation frequencies up to `cache_max_seq_len` (8192 by default). Inputs whose sequence length exceeds that limit fall back to on-the-fly computation on every call, which adds overhead, especially during inference with widely varying sequence lengths.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}