PyTorch

2.10.0 verified Tue May 12 auth: no python install: stale quickstart: stale

Deep learning framework with GPU-accelerated tensor operations. Current version is 2.10.0 (Jan 2026). The install command varies by platform and CUDA version: plain pip install torch gives a CPU-only build on Windows, while on Linux x86_64 the PyPI wheel bundles one default CUDA version that may not match your driver. The torch.load weights_only default changed to True in 2.6, which broke loading of many existing checkpoints. TorchScript is deprecated as of 2.10.

pip install torch

Common errors

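error torch.cuda.is_available() returns False

cause The installed wheel has no CUDA support, or it has CUDA support but cannot see a usable NVIDIA driver. A plain `pip install torch` installs a CPU-only build on Windows even if a CUDA-enabled GPU is present. On Linux x86_64 it installs the default CUDA build from PyPI, which still reports False if the driver is too old for that CUDA version.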

fix

Install the correct CUDA-enabled version of PyTorch by following the instructions on the official PyTorch website (e.g., pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 for CUDA 11.8).
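A quick way to tell a CPU-only wheel from a driver problem is to inspect the build itself. The sketch below is illustrative only: describe_torch_build is a hypothetical helper name, and it assumes nothing beyond torch.__version__, torch.version.cuda, and torch.cuda.is_available().

```python
import importlib.util


def describe_torch_build() -> dict:
    """Summarize the installed torch build (hypothetical helper, not part of PyTorch)."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        # Wheels from download.pytorch.org usually carry a local suffix such as +cpu or +cu128.
        "version": torch.__version__,
        # CUDA version the wheel was compiled against; None on CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # True only when the build has CUDA support AND a usable driver/GPU is visible.
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch_build())
```

If built_with_cuda is None, the wheel is the problem and a reinstall with the right --index-url is needed. If it is set but cuda_available is False, look at the driver instead.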

error RuntimeError: CUDA out of memory. Tried to allocate X MiB (GPU Y; Z GiB total capacity; W GiB already allocated; V MiB free; U GiB reserved in total by PyTorch)

cause The model, batch size, or intermediate tensors require more GPU memory than is available on the device.

fix

Reduce the batch size, decrease model complexity, free up unused tensors explicitly using del and torch.cuda.empty_cache(), or switch to a GPU with more memory.
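The del plus torch.cuda.empty_cache() part of the fix can be sketched as follows. release_gpu_cache is a hypothetical helper name and the tensor is a stand-in for a large intermediate. The sketch falls back to CPU so it runs anywhere, in which case the cache call is skipped.

```python
import gc

import torch


def release_gpu_cache() -> None:
    """Collect unreachable Python objects, then release PyTorch's cached CUDA blocks."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

activations = torch.zeros(1024, 1024, device=device)  # stand-in for a large intermediate tensor
del activations       # drop the last reference so the allocator can reuse the memory
release_gpu_cache()   # hand cached, now-unused blocks back to the driver

if device.type == "cuda":
    # memory_allocated: bytes held by live tensors; memory_reserved: bytes held by the caching allocator.
    print(torch.cuda.memory_allocated(device), torch.cuda.memory_reserved(device))
```

empty_cache() only returns memory that no live tensor is using, so it does not help while references to the large tensors still exist.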

error RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load(..., map_location='cpu') to load the tensor to CPU.

cause You are trying to load a PyTorch model or tensor that was saved while on a CUDA-enabled device, into an environment where CUDA is not available or detected.

fix

When loading, explicitly map the tensors to the CPU using torch.load(path, map_location='cpu').
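A minimal round trip showing map_location in use. The in-memory buffer and the "weight" key are illustrative stand-ins for a real checkpoint file, and the tensor here is saved from CPU, so the sketch shows the call shape rather than a real GPU-to-CPU move.

```python
import io

import torch

# Save a small state dict to an in-memory buffer (a file path works the same way).
buffer = io.BytesIO()
torch.save({"weight": torch.ones(2, 3)}, buffer)
buffer.seek(0)

# map_location="cpu" remaps every stored tensor to CPU regardless of where it was saved.
state = torch.load(buffer, map_location="cpu", weights_only=True)
print(state["weight"].device)
```

After loading on CPU, move the model to whichever device is available with model.to(device).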

error TypeError: Only tensors, lists, tuples of tensors, or ints, floats, bools, None are valid inputs for JIT functions. Got a <class 'dict'> at position 1.

cause TorchScript (used by `torch.jit.script` or `torch.jit.trace`) has limitations on the types of inputs and operations it can handle, and complex types like dictionaries are often not directly supported as inputs or intermediate values in a traced graph.

fix

Refactor the TorchScripted part of your code to only use supported types (tensors, lists/tuples of tensors, primitive types) as inputs and intermediate values, or consider using torch.export for newer PyTorch versions.

Warnings

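breaking torch.load() weights_only default changed from False to True in PyTorch 2.6. Existing torch.load() calls without an explicit weights_only= raise pickle.UnpicklingError when the checkpoint pickles anything outside the default allowlist, such as custom classes or numpy arrays. Checkpoints built only from tensors and plain containers (for example model and optimizer state_dict() outputs) still load. The change broke checkpoint loading in many existing projects.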

fix For state_dict-only checkpoints: torch.load(path, weights_only=True). For full checkpoints with optimizer etc: torch.load(path, weights_only=False) — only on trusted files. To allowlist specific types: torch.serialization.add_safe_globals([MyClass]).
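One way to apply the fix above is to try the restricted loader first and fall back only when the caller explicitly vouches for the file. This is a sketch: load_checkpoint and its trusted flag are hypothetical names, not PyTorch API.

```python
import os
import pickle
import tempfile

import torch
import torch.nn as nn


def load_checkpoint(path, trusted=False):
    """Load with weights_only=True; fall back to full unpickling only for trusted files."""
    try:
        return torch.load(path, map_location="cpu", weights_only=True)
    except pickle.UnpicklingError:
        if not trusted:
            raise  # refuse to run arbitrary pickle code from an unvetted file
        return torch.load(path, map_location="cpu", weights_only=False)


model = nn.Linear(4, 2)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    torch.save(model.state_dict(), path)  # state_dict-only checkpoint
    state = load_checkpoint(path)         # loads under the restricted unpickler

model.load_state_dict(state)
```

Keeping trusted=False as the default means a checkpoint from an unknown source fails loudly instead of silently executing pickled code.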

breaking Plain pip install torch installs a CPU-only build on Windows, and on Linux x86_64 it installs whichever CUDA build PyPI currently hosts. Any other CUDA or ROCm build requires a custom --index-url. LLM-generated install instructions often omit this. torch.cuda.is_available() returns False after a CPU-only install.

fix Use the PyTorch install selector: https://pytorch.org/get-started/locally/. For CUDA 12.8: pip install torch --index-url https://download.pytorch.org/whl/cu128

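breaking torchvision and torchaudio must come from the release built against the installed torch version; each torch release pairs with one specific torchvision and torchaudio release, and their version numbers differ from torch's. Installing the latest torch next to a mismatched torchvision causes ImportError or silent incorrect behavior.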

fix Install all PyTorch ecosystem packages together with the same --index-url: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
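To check what is actually installed before debugging a mismatch, the versions can be read from package metadata without importing the packages. installed_versions is a hypothetical helper name; it only uses the standard library.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "torchaudio")):
    """Return {package: version or None} using installed distribution metadata."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return versions


print(installed_versions())
```

Wheels installed from the same --index-url typically share the same local suffix (for example +cu128), so differing suffixes are a hint that the packages came from different sources.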

deprecated TorchScript (torch.jit.script, torch.jit.trace) is deprecated as of PyTorch 2.10. The PyTorch team recommends migrating to torch.export for model deployment.

fix Migrate to torch.export.export() for export/deployment. torch.jit still works but will receive no new features and will eventually be removed.

gotcha Forgetting model.eval() during inference leaves BatchNorm and Dropout layers in training mode: Dropout randomly zeroes activations and BatchNorm normalizes with the current batch's statistics, so outputs differ from run to run and predictions are wrong.

fix Always call model.eval() before inference. Pair with torch.no_grad() to disable gradient computation: with torch.no_grad(): output = model(x)
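The effect is easy to see with a Dropout layer. The small model below is illustrative only.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))
x = torch.randn(4, 8)

# Eval mode: Dropout is a no-op, so repeated calls on the same input agree.
model.eval()
with torch.no_grad():
    first, second = model(x), model(x)
deterministic = torch.equal(first, second)

# Train mode: Dropout samples a fresh random mask on every call,
# so these two outputs will generally differ.
model.train()
with torch.no_grad():
    noisy_a, noisy_b = model(x), model(x)

print(deterministic)
```

torch.no_grad() only disables gradient tracking. It does not switch layers to inference behavior, which is why both calls are needed.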

gotcha Gradients accumulate in .grad across loss.backward() calls, so optimizer.zero_grad() must be called once per step before loss.backward(). Forgetting it sums gradients across batches, a silent training bug.

fix Call optimizer.zero_grad() at the start of each training step, before the forward pass. Since PyTorch 2.0 it defaults to set_to_none=True, which frees the gradient tensors instead of filling them with zeros; pass set_to_none=True explicitly only on older versions.
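A tiny demonstration of the accumulation, using a single scalar parameter so the gradient values are easy to follow. The parameter and the 3 * w loss are illustrative only.

```python
import torch

w = torch.ones(1, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(3 * w).sum().backward()
after_first = w.grad.clone()    # d(3w)/dw = 3

(3 * w).sum().backward()        # no zero_grad() in between
after_second = w.grad.clone()   # 3 + 3: the new gradient was added to the old one

optimizer.zero_grad()           # resets w.grad (to None by default on PyTorch 2.0+)
print(after_first.item(), after_second.item())
```

Deliberate accumulation over several batches uses the same mechanism, with zero_grad() called only once per optimizer step.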

gotcha Tensors on different devices cannot be combined. CPU tensor + CUDA tensor raises RuntimeError. Common when target labels stay on CPU while model outputs are on CUDA.

fix Move all tensors to the same device: x, y = x.to(device), y.to(device) at the start of each training step.

breaking ERROR: No matching distribution found for torch often indicates that PyTorch wheels are not available for your specific Python version, operating system, or architecture (e.g., very new Python versions, Alpine Linux, or unusual hardware).

fix Verify your Python version, OS, and architecture are officially supported by PyTorch. Consult the PyTorch install selector (https://pytorch.org/get-started/locally/) to find the correct installation command, which might involve using a specific `--index-url`, a different Python environment, or a different base image if using Docker.
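The values pip matches wheels against can be printed from the standard library before consulting the install selector. wheel_environment is a hypothetical helper name; the libc field relies on platform.libc_ver(), which may return an empty string on musl-based or non-Linux systems, reported here as "unknown".

```python
import platform


def wheel_environment() -> dict:
    """Collect the interpreter and platform details that decide which wheels pip can use."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),    # e.g. Linux, Darwin, Windows
        "machine": platform.machine(),  # e.g. x86_64, arm64, aarch64
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
    }


print(wheel_environment())
```

An "unknown" libc on Linux is consistent with a musl-based image such as Alpine, which is one of the cases the error above describes.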

Install

pip install torch --index-url https://download.pytorch.org/whl/cu128

pip install torch --index-url https://download.pytorch.org/whl/cu124

pip install torch --index-url https://download.pytorch.org/whl/rocm6.2

Install compatibility stale last tested: 2026-05-12

python  os / libc      status  wheel  install  import  disk
3.10    alpine (musl)          -      -        -       -
3.10    slim (glibc)           -      -        -       4.6G
3.10    slim (glibc)           -      -        -       5.1G
3.10    slim (glibc)           -      -        -       -
3.11    alpine (musl)          -      -        -       -
3.11    slim (glibc)           -      -        -       4.6G
3.11    slim (glibc)           -      -        -       5.0G
3.11    slim (glibc)           -      -        -       -
3.12    alpine (musl)          -      -        -       -
3.12    slim (glibc)           -      -        -       4.6G
3.12    slim (glibc)           -      -        -       5.0G
3.12    slim (glibc)           -      -        -       6.7G
3.12    slim (glibc)           -      -        -       -
3.13    alpine (musl)          -      -        -       -
3.13    slim (glibc)           -      -        -       4.6G
3.13    slim (glibc)           -      -        -       5.0G
3.13    slim (glibc)           -      -        -       6.7G
3.13    slim (glibc)           -      -        -       -
3.9     alpine (musl)          -      -        -       -
3.9     slim (glibc)           -      -        -       6.4G
3.9     slim (glibc)           -      -        -       5.0G
3.9     slim (glibc)           -      -        -       -

Imports

torch.load

wrong

model.load_state_dict(torch.load('model.pt'))  # relies on the implicit default: weights_only=True in 2.6+, so files holding non-allowlisted objects raise UnpicklingError

correct

# For state_dict-only checkpoints (tensors and plain containers):
model.load_state_dict(torch.load('model.pt', weights_only=True))

# For checkpoints with non-tensor objects (optimizer states, custom classes):
checkpoint = torch.load('checkpoint.pt', weights_only=False)  # only for trusted files

The torch.load weights_only default flipped from False to True in 2.6. Existing torch.load() calls without an explicit weights_only= raise UnpicklingError if the checkpoint contains objects outside the default allowlist.

device

wrong

model = MyModel().cuda()  # raises on machines without CUDA
tensor = tensor.cuda()

correct

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = MyModel().to(device)
tensor = tensor.to(device)

Always use .to(device) with a device variable rather than hardcoding .cuda(), which raises an error on machines without CUDA.

Quickstart stale last tested: 2026-04-23

Standard training loop and inference pattern. Always use model.eval() + torch.no_grad() for inference.

import torch
import torch.nn as nn

# Device setup
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Simple model
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 1)
).to(device)

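# Synthetic data so the example runs end to end (replace with your own Dataset)
dataset = torch.utils.data.TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
test_x = torch.randn(8, 10)

# Training step
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

model.train()
for x, y in dataloader: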
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Inference
model.eval()
with torch.no_grad():
    predictions = model(test_x.to(device))

# Save / load
torch.save(model.state_dict(), 'model.pt')
model.load_state_dict(torch.load('model.pt', weights_only=True))