PyTorch
PyTorch is an open-source machine learning framework that accelerates the path from research prototyping to production deployment. It provides two high-level features: tensor computation (like NumPy) with strong GPU acceleration, and a deep neural network library built on a tape-based autograd system. On PyPI the framework is published as the `torch` package (2.2.2 at the time of writing); the separate `pytorch` PyPI name is not the framework and will fail to install. `torchvision` and `torchaudio` are companion libraries commonly installed alongside it. PyTorch has frequent updates, typically releasing stable versions multiple times a year, with patch releases in between.
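The tape-based autograd system mentioned above can be seen in a minimal sketch: operations on tensors created with `requires_grad=True` are recorded, and `backward()` traverses that record to compute gradients.

```python
import torch

# Minimal autograd sketch: build y = x^2 + 2x and differentiate it.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 2 + 2 * x   # operations on x are recorded on the "tape"
y.backward()         # replays the tape: dy/dx = 2x + 2

print(x.grad)        # gradient at x = 3 is 8
```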
Common errors
-
RuntimeError: CUDA out of memory.
cause: The GPU ran out of memory, usually due to a batch size that is too large, a model with too many parameters, or unused tensors that are never released.
fix: Reduce the batch size, decrease model complexity, use `torch.no_grad()` for inference, or call `torch.cuda.empty_cache()` (often not sufficient alone if the core issue is size).
-
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
cause: An operation attempted to combine tensors located on different devices (e.g., one on CPU, another on GPU).
fix: Ensure all tensors involved in an operation are on the same device. Use `tensor.to(device)` to move tensors, where `device` is typically 'cpu' or 'cuda'.
-
RuntimeError: The size of tensor a (X) must match the size of tensor b (Y) at non-singleton dimension Z
cause: Shape mismatch between tensors, often in loss calculations, concatenation, or model input/output.
fix: Inspect the `.shape` of all involved tensors. Use `tensor.view()`, `tensor.permute()`, `tensor.squeeze()`, `tensor.unsqueeze()`, or ensure model input/output dimensions are correct.
-
RuntimeError: expected scalar type Float but found Long
cause: A tensor operation or module expects a specific data type (e.g., `torch.float32`) but receives a different one (e.g., `torch.int64`). Common with loss functions or model inputs.
fix: Convert tensor data types explicitly using `tensor.to(torch.float32)`, `tensor.long()`, `tensor.double()`, etc. Ensure `DataLoader` outputs the correct types.
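The device- and dtype-mismatch errors above can be reproduced and fixed in a few lines. This is a hedged sketch: the device selection line is the standard idiom and falls back to CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Device mismatch: all operands must live on the same device.
a = torch.ones(3)               # on CPU by default
b = torch.ones(3).to(device)    # on the chosen device
c = a.to(device) + b            # fix: move both operands first

# Dtype mismatch: nn.Linear expects floating-point input, not Long.
x_int = torch.arange(4).reshape(2, 2)       # dtype torch.int64 (Long)
lin = nn.Linear(2, 2).to(device)
out = lin(x_int.to(device, torch.float32))  # fix: cast to float32

print(c, out.shape)
```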
Warnings
- breaking The `torch.autograd.Variable` class was deprecated and is now effectively an alias for `torch.Tensor`. Direct tensor operations now support autograd automatically.
- deprecated The `volatile=True` argument for tensors was deprecated and removed. It was used to signal that computations in a graph should not track gradients (e.g., during inference); use the `torch.no_grad()` context manager instead.
- gotcha Extracting a scalar value with `float(tensor)`, `int(tensor)`, or `tensor.item()` raises a runtime error if the tensor has more than one element; reduce (e.g., `tensor.sum()`) or index the tensor first.
- gotcha The framework is installed as `torch`, not `pytorch` (`pip install pytorch` fails with an error message telling you to install `torch`). The default PyPI wheels may be CPU-only depending on your platform; for a specific CUDA build, use `pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cuXXX`, where `cuXXX` matches your CUDA version.
- gotcha When moving models or tensors between CPU and GPU, `.cuda()` moves them to the default GPU only, while `.to(device)` is more flexible. Note that `Tensor.to()` returns a new tensor and must be reassigned (`t = t.to(device)`), whereas `Module.to()` moves parameters in place.
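The device-movement gotcha above is worth seeing concretely; a minimal sketch, assuming the standard CPU-fallback device idiom:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensor.to() returns a NEW tensor; the result must be reassigned.
t = torch.zeros(2)
t.to(device)       # no effect on t itself -- the return value is discarded
t = t.to(device)   # correct: rebind the name

# Module.to() moves parameters in place (and also returns self).
model = nn.Linear(2, 2)
model.to(device)   # `model = model.to(device)` works equally well

print(t.device, next(model.parameters()).device)
```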
Install
-
pip install torch torchvision torchaudio
Imports
- torch
import torch
- nn
import torch.nn as nn
- optim
import torch.optim as optim
- DataLoader
from torch.utils.data import DataLoader, TensorDataset
Quickstart
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader
# 1. Prepare Data
x_data = torch.randn(100, 1)
y_data = 2 * x_data + 1 + torch.randn(100, 1) * 0.1 # y = 2x + 1 + noise
# Create a Dataset and DataLoader
dataset = TensorDataset(x_data, y_data)
dataloader = DataLoader(dataset, batch_size=10, shuffle=True)
# 2. Define Model
class LinearRegression(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)  # One input feature, one output feature

    def forward(self, x):
        return self.linear(x)
model = LinearRegression()
# 3. Define Loss and Optimizer
criterion = nn.MSELoss() # Mean Squared Error Loss
optimizer = optim.SGD(model.parameters(), lr=0.01) # Stochastic Gradient Descent
# 4. Train the Model
num_epochs = 100
for epoch in range(num_epochs):
    for batch_x, batch_y in dataloader:
        # Forward pass
        outputs = model(batch_x)
        loss = criterion(outputs, batch_y)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
# 5. Make Predictions (disable gradient tracking for inference)
model.eval()
with torch.no_grad():
    predicted_value = model(torch.tensor([[5.0]]))
print(f"\nPredicted value for x=5.0: {predicted_value.item():.4f}")
print(f"Learned parameters: Weight={model.linear.weight.item():.4f}, Bias={model.linear.bias.item():.4f}")