Accelerate
accelerate 1.13.0 · verified Tue May 12 · auth: no · python install: stale · quickstart: stale
Hugging Face library for running PyTorch training across any distributed configuration with minimal code changes. Current version: 1.13.0 (Mar 2026). Requires Python >=3.10. Core pattern: Accelerator() + accelerator.prepare() + accelerator.backward(). Run accelerate config before first use.
pip install accelerate

Common errors
error ModuleNotFoundError: No module named 'accelerate' ↓
cause The 'accelerate' library is not installed or not accessible in the current Python environment.
fix
Install the library using 'pip install accelerate'.
error bash: accelerate: command not found ↓
cause The 'accelerate' command-line tool is not found, possibly due to installation issues or PATH misconfiguration.
fix
Ensure 'accelerate' is installed and accessible by checking the installation path and verifying the PATH environment variable.
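A stdlib-only diagnostic sketch: pip installs console scripts such as `accelerate` into the environment's scripts directory, so printing that directory and checking whether it is on PATH usually locates the problem (exact paths depend on your environment):

```python
import os
import sysconfig

# pip installs console scripts such as `accelerate` into the
# environment's scripts directory; if that directory is missing
# from PATH, the command is "not found" even though the package
# imports fine from Python.
scripts_dir = sysconfig.get_path("scripts")
on_path = scripts_dir in os.environ.get("PATH", "").split(os.pathsep)
print(scripts_dir, on_path)
```

If `on_path` is False, add the scripts directory to PATH; some versions also let you bypass PATH entirely with `python -m accelerate.commands.launch …` (verify against your installed version).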
error ImportError: cannot import name 'partialstate' from 'accelerate' ↓
cause Python imports are case-sensitive; 'accelerate' exports the class PartialState, not 'partialstate'.
fix
Use the correct capitalization: from accelerate import PartialState.
error TypeError: 'NoneType' object is not iterable ↓
cause An operation is attempting to iterate over a 'None' object, indicating that a variable expected to be iterable is 'None'.
fix
Ensure that the variable is properly initialized and assigned an iterable value before iteration.
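A minimal illustration of the failure and the guard; `get_batches` is a hypothetical helper that returns None instead of raising when its input is missing, the classic setup for this TypeError:

```python
def get_batches(source):
    # Hypothetical loader: returns None when the source is missing
    # instead of raising -- the classic setup for this TypeError.
    if source is None:
        return None
    return [source[i:i + 2] for i in range(0, len(source), 2)]

batches = get_batches(None)
# for b in batches:  # TypeError: 'NoneType' object is not iterable

# Guard: fall back to an empty iterable (or raise early with a clear message)
for b in batches or []:
    pass
```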
Warnings
breaking accelerate config must be run before first use. Without a config file, Accelerate falls back to single-process CPU mode silently — multi-GPU training simply won't use multiple GPUs. ↓
fix Run accelerate config once after install, or programmatically: from accelerate.utils import write_basic_config; write_basic_config(). For CI: set ACCELERATE_CONFIG_FILE env var pointing to a pre-built config.
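For CI, the file pointed to by ACCELERATE_CONFIG_FILE is plain YAML. A minimal single-process sketch follows; key names match the default_config.yaml that accelerate config generates, but verify them against the file your version produces. Note that "NO" must be quoted, since bare NO parses as a YAML boolean:

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: "NO"
mixed_precision: fp16
num_machines: 1
num_processes: 1
use_cpu: false
```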
breaking Python 3.9 support dropped in 1.13.0. Accelerate now requires Python >=3.10. ↓
fix Upgrade Python to 3.10+. Pin accelerate<1.13.0 for Python 3.9 environments.
breaking Accelerator() initialized outside the training function raises ValueError when using notebook_launcher for multi-GPU; without notebook_launcher, the same mistake silently falls back to a single GPU with no error. ↓
fix Always initialize Accelerator() inside the training function passed to notebook_launcher. Never create it at notebook cell level or module level when using multi-GPU in notebooks.
breaking accelerator.load_state() fails with PyTorch 2.6+ due to torch.load weights_only=True default flip. Optimizer states with custom objects (omegaconf.ListConfig, etc.) raise UnpicklingError. ↓
fix Use torch.serialization.add_safe_globals([ListConfig]) to allowlist custom types, or pass weights_only=False to the underlying load call if the checkpoint source is trusted.
breaking DeepSpeed integration: only one nn.Module per Accelerator instance is supported. Passing multiple models to accelerator.prepare() with DeepSpeed raises AssertionError. ↓
fix With DeepSpeed, create a separate Accelerator instance per model, or merge models before wrapping.
gotcha accelerate launch consumes flags positionally: anything before the script path is parsed as a launcher flag, so script flags placed there are misinterpreted. Flags intended for the script must come after the script path (or after --). ↓
fix Put script flags after the script path: accelerate launch script.py --my-arg value. If a flag is ambiguous, separate explicitly: accelerate launch -- script.py --my-arg value.
gotcha loss.backward() instead of accelerator.backward(loss) silently bypasses mixed precision gradient scaling. Training proceeds but gradients are wrong under fp16/bf16 — numerical instability or NaN loss. ↓
fix Replace all loss.backward() calls with accelerator.backward(loss) throughout the training loop.
breaking Installation of core dependencies like numpy fails due to missing C compilers in the environment, particularly common in minimal Docker images (e.g., Alpine). This prevents packages requiring compilation from being built from source. ↓
fix Ensure that build-essential tools, including a C compiler (e.g., gcc, g++), are installed in your environment before attempting to install Python packages that require compilation. For Alpine-based images, this typically involves `apk add build-base python3-dev`.
Install
accelerate config
python -c "from accelerate.utils import write_basic_config; write_basic_config(mixed_precision='fp16')"

Install compatibility (stale, last tested: 2026-05-12)

| python | os / libc     | status | wheel install | import | disk |
|--------|---------------|--------|---------------|--------|------|
| 3.9    | alpine (musl) | -      | -             | -      | -    |
| 3.9    | slim (glibc)  | -      | -             | -      | -    |
| 3.10   | alpine (musl) | -      | -             | -      | -    |
| 3.10   | slim (glibc)  | -      | -             | 4.00s  | 4.7G |
| 3.11   | alpine (musl) | -      | -             | -      | -    |
| 3.11   | slim (glibc)  | -      | -             | 6.93s  | 4.7G |
| 3.12   | alpine (musl) | -      | -             | -      | -    |
| 3.12   | slim (glibc)  | -      | -             | 7.68s  | 4.7G |
| 3.13   | alpine (musl) | -      | -             | -      | -    |
| 3.13   | slim (glibc)  | -      | -             | 5.69s  | 4.7G |
Imports
- Accelerator

wrong:

```python
# Module-level Accelerator initialization breaks notebook_launcher multi-GPU
accelerator = Accelerator()  # at top of notebook cell

def training_function():
    # ValueError: Accelerator should only be initialized inside your training function
    ...
```

correct:

```python
from accelerate import Accelerator

def training_function():
    # Accelerator MUST be initialized inside the training function for notebook_launcher
    accelerator = Accelerator(mixed_precision='fp16')
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
    for batch in dataloader:
        optimizer.zero_grad()
        loss = model(batch)
        accelerator.backward(loss)  # NOT loss.backward()
        optimizer.step()
```

- accelerator.backward

wrong:

```python
loss = criterion(outputs, targets)
loss.backward()  # bypasses mixed precision scaling and gradient accumulation handling
```

correct:

```python
loss = criterion(outputs, targets)
accelerator.backward(loss)
```
Quickstart (stale, last tested: 2026-04-23)

```python
from accelerate import Accelerator
import torch
import torch.nn as nn

def train():
    accelerator = Accelerator(mixed_precision='bf16')
    model = nn.Linear(10, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    dataloader = ...  # your DataLoader

    # prepare() handles device placement and distributed wrapping
    model, optimizer, dataloader = accelerator.prepare(
        model, optimizer, dataloader
    )

    model.train()
    for batch in dataloader:
        optimizer.zero_grad()
        outputs = model(batch['input'])
        loss = nn.functional.mse_loss(outputs, batch['target'])
        accelerator.backward(loss)  # not loss.backward()
        optimizer.step()

    # Save on main process only
    accelerator.wait_for_everyone()
    if accelerator.is_main_process:
        accelerator.save_model(model, 'output/')
```