PyTorch-Ignite
PyTorch-Ignite is a lightweight and user-friendly library designed to simplify training and evaluating neural networks with PyTorch. It provides a high-level API for setting up training loops, handling events, and integrating various experiment tracking tools. Currently at version 0.5.4, it maintains an active release cadence with frequent bug fixes and feature enhancements.
Common errors
- AttributeError: module 'ignite.contrib' has no attribute 'metrics'
  cause: Importing metrics or handlers from the deprecated `ignite.contrib` module after PyTorch-Ignite v0.5.0.
  fix: Update the import paths. For example, `from ignite.contrib.metrics import Accuracy` becomes `from ignite.metrics import Accuracy`.
- TypeError: 'LRScheduler' object is not callable
  cause: Calling an `LRScheduler` instance directly (e.g., `lr_scheduler(engine)`) after PyTorch-Ignite v0.4.9, where its API changed to an attachable handler.
  fix: Wrap your PyTorch scheduler and attach it to the trainer instead of calling it. Example: `trainer.add_event_handler(Events.ITERATION_STARTED, LRScheduler(torch_lr_scheduler))`.
- ValueError: Distributed environment is not initialized.
  cause: Using `ignite.distributed` functionality (e.g., `idist.spawn`, `idist.get_rank()`) without initializing the distributed backend first.
  fix: Call `ignite.distributed.initialize(backend)` (or `torch.distributed.init_process_group` if managing the process group manually) at the start of your script, before any distributed operations.
- RuntimeError: Expected all tensors to be on the same device, but found tensors on cuda:0 and cpu
  cause: A general PyTorch error that surfaces in Ignite when the model or data are not moved to the correct device, or when different parts of the pipeline live on mixed devices.
  fix: Keep the model and input data on the same device. Use `.to(device)` on the model and tensors (moving batches in your process function or via a custom `collate_fn`), where `device = 'cuda' if torch.cuda.is_available() else 'cpu'`.
Warnings
- breaking All modules under `ignite.contrib.metrics` and `ignite.contrib.handlers` were moved directly to `ignite.metrics` and `ignite.handlers` respectively.
- breaking The `LRScheduler` handler was refactored to wrap a PyTorch scheduler and be attached to `Events.ITERATION_STARTED`, changing its usage pattern significantly.
- gotcha When using `ignite.distributed` (idist) for distributed training, ensure the distributed backend is properly initialized (e.g., `idist.initialize()`) before using `idist` utilities or distributed engines.
- gotcha Event filtering with `every`, `once`, `before`, `after` can be powerful but also complex. Misunderstanding their interaction can lead to handlers not being triggered as expected.
Install
- pip install pytorch-ignite
Imports
- Engine
from ignite.engine import Engine
- Events
from ignite.engine import Events
- create_supervised_trainer
from ignite.engine import create_supervised_trainer
- Accuracy
from ignite.metrics import Accuracy
- ModelCheckpoint
from ignite.handlers import ModelCheckpoint
Quickstart
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from ignite.engine import Engine, Events, create_supervised_trainer, create_supervised_evaluator
from ignite.metrics import Accuracy, Loss
# 1. Define a simple model, optimizer, loss function
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 2)

    def forward(self, x):
        return self.fc(x)
model = SimpleModel()
optimizer = optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
# 2. Create dummy data
X = torch.randn(100, 10)
y = torch.randint(0, 2, (100,))
dataset = TensorDataset(X, y)
dataloader = DataLoader(dataset, batch_size=10)
# 3. Create trainer and evaluator
trainer = create_supervised_trainer(model, optimizer, criterion)
evaluator = create_supervised_evaluator(model, metrics={'accuracy': Accuracy(), 'nll': Loss(criterion)})
# 4. Define handlers for events
@trainer.on(Events.EPOCH_COMPLETED)
def log_training_results(engine):
    evaluator.run(dataloader)
    metrics = evaluator.state.metrics
    print(f"Epoch {engine.state.epoch}/{engine.state.max_epochs} - Avg accuracy: {metrics['accuracy']:.2f}, Avg loss: {metrics['nll']:.2f}")
# 5. Run the training
trainer.run(dataloader, max_epochs=2)
print("\nTraining complete.")