Captum
Captum is an open-source model interpretability and understanding library for PyTorch. It provides a comprehensive suite of attribution algorithms to explain predictions of deep learning models, assess the importance of layers and neurons, and evaluate model robustness and concept influence. The library, currently at version 0.8.0, maintains an active development cycle with regular feature additions and improvements.
Warnings
- breaking Support for Python 3.8 was dropped in Captum v0.8.0; Python 3.9 or newer is required for Captum 0.8.0 and later.
- breaking While PyPI metadata specifies PyTorch >= 1.10, the v0.8.0 release notes state that support for PyTorch 1.10 has been dropped. Target PyTorch >= 1.11 for full compatibility and ongoing support, even though PyTorch 1.10 may still work as the nominal minimum.
- deprecated Captum Insights, the interactive visualization widget, will be deprecated in the next major release. Users should plan to migrate to alternative visualization methods or integrate custom solutions.
- gotcha The DeepLift attribution algorithm does not support all non-linear activation types, which can cause errors or unexpected attributions when a model contains an unsupported activation function.
- gotcha When performing LLM attribution with text-based inputs, using default empty string baselines can result in out-of-distribution inputs, leading to less meaningful attributions. Defining custom, contextually relevant baselines is often recommended for better results.
Install
-
pip install captum
Imports
- IntegratedGradients
from captum.attr import IntegratedGradients
- DeepLift
from captum.attr import DeepLift
- LayerConductance
from captum.attr import LayerConductance
- LLMAttribution
from captum.attr import LLMAttribution
Quickstart
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients
# 1. Define a simple PyTorch model
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin1 = nn.Linear(3, 3)
        self.relu = nn.ReLU()
        self.lin2 = nn.Linear(3, 2)

    def forward(self, input):
        return self.lin2(self.relu(self.lin1(input)))
model = ToyModel()
model.eval() # Set model to evaluation mode
# 2. Define input and baseline tensors
input_tensor = torch.rand(2, 3, requires_grad=True)
baseline_tensor = torch.zeros(2, 3)
# 3. Instantiate an attribution algorithm (e.g., Integrated Gradients)
ig = IntegratedGradients(model)
# 4. Compute attributions
# target specifies the output index to explain (e.g., target=0 for the first output class)
attributions, delta = ig.attribute(input_tensor, baselines=baseline_tensor, target=0, return_convergence_delta=True)
print('Input Tensor:', input_tensor)
print('IG Attributions:', attributions)
print('Convergence Delta:', delta)