Lion Optimizer for PyTorch
lion-pytorch provides an efficient implementation of the Lion optimizer for PyTorch. Based on the paper 'Symbolic Discovery of Optimization Algorithms', Lion often outperforms AdamW and other adaptive optimizers, especially in large-scale models, thanks to its sign-based update mechanism. The library is actively maintained, currently at version 0.2.4, and requires Python 3.9+.
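To see what "sign-based update" means, here is a minimal sketch of Lion's update rule for a single scalar parameter, following the algorithm in the paper. The function name `lion_step` is illustrative only and is not part of the lion-pytorch API.

```python
def sign(x):
    # Returns -1, 0, or 1
    return (x > 0) - (x < 0)

def lion_step(p, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    # Interpolate momentum and gradient, then keep only the sign
    update = sign(beta1 * m + (1 - beta1) * grad)
    # Decoupled weight decay, then a fixed-magnitude sign-based step
    p = p * (1 - lr * wd)
    p = p - lr * update
    # Momentum uses a *different* beta than the update interpolation
    m = beta2 * m + (1 - beta2) * grad
    return p, m

p, m = 1.0, 0.0
p, m = lion_step(p, grad=0.5, m=m, lr=0.1)
print(p, m)
```

Because every step has magnitude exactly `lr` per coordinate (regardless of gradient scale), Lion is memory-efficient (one momentum buffer instead of Adam's two) but more sensitive to the learning rate.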
Common errors
- `ModuleNotFoundError: No module named 'lion_pytorch'`
  - Cause: The `lion-pytorch` library is not installed in your current Python environment.
  - Fix: Run `pip install lion-pytorch` to install the library.
- `TypeError: Lion.__init__() got an unexpected keyword argument 'eps'`
  - Cause: You are passing an `eps` parameter to the `Lion` optimizer, which it does not support. This parameter is common in optimizers like Adam/AdamW.
  - Fix: Remove the `eps` argument from the `Lion` optimizer's constructor.
- `TypeError: Lion.__init__() missing 1 required positional argument: 'params'`
  - Cause: The `Lion` optimizer constructor requires an iterable of model parameters (e.g., `model.parameters()`) as its first argument.
  - Fix: Ensure you pass `model.parameters()` to the optimizer: `optimizer = Lion(model.parameters(), lr=...)`.
Warnings
- gotcha Lion often requires a significantly lower learning rate (e.g., 1/3 to 1/10 of the AdamW value) for optimal performance and stability. Directly applying learning rates common for AdamW may lead to issues like NaN losses or poor convergence.
- gotcha While powerful, Lion's optimal performance often requires re-tuning hyperparameters (especially `lr` and `betas`) specific to your task and model, rather than using it as a direct drop-in replacement with existing AdamW settings.
- gotcha Unlike optimizers like Adam, Lion uses only two `betas` values for momentum and update, and does *not* accept an `eps` (epsilon) parameter. Attempting to pass `eps` will result in a `TypeError`.
Install
-
pip install lion-pytorch
Imports
- Lion
from lion_pytorch import Lion
Quickstart
import torch
from torch import nn
from lion_pytorch import Lion
# 1. Define a simple PyTorch model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 2)
        self.relu = nn.ReLU()
        self.output = nn.Linear(2, 1)

    def forward(self, x):
        return self.output(self.relu(self.linear(x)))
model = SimpleModel()
# 2. Instantiate the Lion optimizer
# Note: Lion often requires a smaller learning rate than AdamW (e.g., 1e-4)
optimizer = Lion(model.parameters(), lr=1e-4, weight_decay=1e-2)
# 3. Create dummy data and target
inputs = torch.randn(32, 10) # 32 samples, 10 features
targets = torch.randn(32, 1) # 32 samples, 1 target value
# 4. Define a loss function
criterion = nn.MSELoss()
# 5. Training loop (one step for quickstart demonstration)
optimizer.zero_grad() # Clear gradients from previous step
outputs = model(inputs) # Forward pass
loss = criterion(outputs, targets) # Calculate loss
loss.backward() # Backward pass (compute gradients)
optimizer.step() # Update model parameters
print(f"Loss after one optimization step: {loss.item():.4f}")