Torchvision

Version 0.26.0; verified Tue May 12; auth: no; python install: stale

Torchvision is a PyTorch domain library that provides popular datasets, model architectures, and common image and video transformations for computer vision. It is actively maintained, and its releases are synchronized with PyTorch: the current version, 0.26.0, is paired with torch 2.11.0. It aims to simplify data loading, preprocessing, and model development for computer vision researchers and practitioners.

pip install torchvision
error ModuleNotFoundError: No module named 'torchvision'
cause The torchvision library is not installed in the current Python environment.
fix
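Install torchvision into the same environment as torch, e.g. python -m pip install torchvision (running pip through the interpreter you use avoids installing into a different environment). Note that the pytorch conda channel (conda install torchvision -c pytorch) stopped receiving new releases after PyTorch 2.5, so use pip or a community channel such as conda-forge for current versions.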
error TypeError: img should be PIL Image. Got <class 'numpy.ndarray'>
cause A v1 torchvision transform that operates on PIL Images or tensors (e.g., `Resize`, `CenterCrop`) received a NumPy array, which it does not accept.
fix
Convert the array with PIL.Image.fromarray(image_np) before applying the transform, or convert it to a C x H x W tensor with torch.from_numpy(image_np).permute(2, 0, 1).
error RuntimeError: Dataset not found or corrupted.
cause The specified dataset (e.g., from `torchvision.datasets`) could not be found at the provided `root` path, or its automatic download/extraction failed.
fix
Verify that the root path is correct and accessible, ensure you have an active internet connection if download=True is used, and check disk space. You may need to manually download and extract the dataset.
error RuntimeError: DataLoader worker (pid XXXXX) exited unexpectedly
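cause A worker process started by `torch.utils.data.DataLoader` (with `num_workers > 0`) died without reporting a Python exception. Typically it was killed, for example by the out-of-memory killer or because shared memory in `/dev/shm` ran out (a frequent issue in Docker containers), or it crashed in native code inside the dataset's `__getitem__`, a transform, or the collate function. Ordinary Python exceptions in those functions are usually re-raised in the main process with a "Caught ... in DataLoader worker process" traceback instead.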
fix
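Set num_workers=0 in the DataLoader to run data loading in the main process; if the cause is an error or crash in your own code, this reveals the specific error message and traceback. If loading then succeeds, suspect memory: reduce batch_size or num_workers, or give the container more shared memory (e.g. docker run --shm-size=2g).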
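A minimal debugging sketch, assuming a hypothetical `FlakyDataset` whose item 3 is broken (standing in for a corrupt file). It calls `__getitem__` directly to list every failing index, then shows that with num_workers=0 the original exception surfaces in the main process.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class FlakyDataset(Dataset):
    """Hypothetical dataset whose item 3 is broken, standing in for a bad file."""

    def __len__(self):
        return 8

    def __getitem__(self, idx):
        if idx == 3:
            raise ValueError(f"corrupt sample at index {idx}")
        return torch.full((3, 4, 4), float(idx))

def find_bad_indices(dataset):
    """Call __getitem__ in the main process and collect every failing index."""
    bad = []
    for i in range(len(dataset)):
        try:
            dataset[i]
        except Exception as exc:  # keep going to report all failures, not just the first
            bad.append((i, repr(exc)))
    return bad

ds = FlakyDataset()
print(find_bad_indices(ds))

# With num_workers=0 the loader runs in this process, so the original
# exception and its full traceback are raised here directly
caught = None
loader = DataLoader(ds, batch_size=2, num_workers=0)
try:
    for batch in loader:
        pass
except ValueError as exc:
    caught = exc
    print("caught in main process:", exc)
```

Scanning indices directly is slow for large datasets but pinpoints the exact sample to inspect.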
error requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: ...
cause This error often occurs when `torchvision.models` tries to download pre-trained weights (e.g., `models.resnet18(weights=...)` or the legacy `pretrained=True`) and the request is blocked by a firewall, proxy, or other network configuration.
fix
Check your network connectivity, proxy settings, and firewall rules. Alternatively, download the pre-trained weights manually from the official PyTorch model zoo and place them in the cache directory, ~/.cache/torch/hub/checkpoints/ by default ($TORCH_HOME/hub/checkpoints/ if TORCH_HOME is set).
breaking The video decoding and encoding utilities (`torchvision.io.video.*`, `read_video`, `write_video`, `VideoReader` class) were removed in Torchvision 0.26.0.
fix Migrate any video decoding/encoding code to the `TorchCodec` library (github.com/meta-pytorch/torchcodec).
deprecated The video decoding and encoding capabilities of TorchVision were deprecated starting from version 0.22 and were slated for removal. While initially targeted for 0.25, they were fully removed in 0.26.0.
fix Users on older versions should plan to migrate to `TorchCodec` before upgrading to 0.26.0 or newer.
gotcha Since version 0.25.0, KeyPoints are no longer clamped by default after a transform. This is a behavior change from previous versions.
fix If clamping is desired, explicitly use the `SanitizeKeyPoints` transform to remove keypoints outside the image area, or refer to `ClampKeyPoints` for precise control.
gotcha The `torchvision.transforms.v2` API is the recommended and actively developed set of transforms. It offers better performance and supports transforming not just images, but also bounding boxes, masks, videos, and keypoints simultaneously.
fix Prefer `from torchvision.transforms import v2` over `from torchvision import transforms` for new code and consider migrating existing code for improved functionality and performance.
gotcha A version mismatch between `torch` and `torchvision` is a common cause of runtime errors (e.g., 'undefined symbol', 'CUDA toolkit version is incompatible').
fix Always install `torch` and `torchvision` with compatible versions, ideally from the same installation command or by consulting the official PyTorch installation matrix for matching versions.
gotcha Since v0.8.0, all random transformations in `torchvision.transforms` draw from PyTorch's global random number generator (seeded with `torch.manual_seed`) instead of Python's `random` module. Setting `random.seed()` does not affect these transforms.
fix To ensure reproducibility for random transforms, use `torch.manual_seed(seed_value)`.
breaking `torchvision` may not have pre-built wheels for every Python version or platform. In particular, no wheels are published for musl-based distributions such as Alpine Linux, so installs there fail with errors like 'No matching distribution found' or a failed source build (every Alpine row in the install test matrix ends in build_error).
fix Check `torchvision`'s official installation instructions and compatibility matrix for supported Python versions and OS distributions. If installation fails, use a supported Python version (in the test matrix, 3.10 through 3.13 installed from wheels on glibc-based slim images, while 3.9 timed out) and a glibc-based distribution such as Debian or Ubuntu. Building from source is possible but generally not recommended unless necessary.
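To install CUDA-specific builds, pin matching versions of both packages (X.Y.Z, A.B.C and cuXXX are placeholders for the versions and the CUDA tag):
pip install torch==X.Y.Z+cuXXX torchvision==A.B.C+cuXXX -f https://download.pytorch.org/whl/torch_stable.html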
Install test matrix:

python | os / libc variant | package            | status      | wheel | install | import | disk | mem    | side effects
3.9    | alpine (musl)     | torch==X.Y.Z+cuXXX | build_error | -     | -       | -      | -    | -      | -
3.9    | alpine (musl)     | torch==X.Y.Z+cuXXX |             | -     | -       | -      | -    | -      | -
3.9    | alpine (musl)     | torchvision        | build_error | -     | -       | -      | -    | -      | -
3.9    | alpine (musl)     | torchvision        |             | -     | -       | -      | -    | -      | -
3.9    | slim (glibc)      | torch==X.Y.Z+cuXXX | build_error | -     | 4.6s    | -      | -    | -      | -
3.9    | slim (glibc)      | torch==X.Y.Z+cuXXX |             | -     | -       | -      | -    | -      | -
3.9    | slim (glibc)      | torchvision        | timeout     | -     | -       | -      | -    | -      | -
3.9    | slim (glibc)      | torchvision        |             | -     | -       | -      | -    | -      | -
3.10   | alpine (musl)     | torch==X.Y.Z+cuXXX | build_error | -     | -       | -      | -    | -      | -
3.10   | alpine (musl)     | torch==X.Y.Z+cuXXX |             | -     | -       | -      | -    | -      | -
3.10   | alpine (musl)     | torchvision        | build_error | -     | -       | -      | -    | -      | -
3.10   | alpine (musl)     | torchvision        |             | -     | -       | -      | -    | -      | -
3.10   | slim (glibc)      | torch==X.Y.Z+cuXXX | build_error | -     | 3.5s    | -      | -    | -      | -
3.10   | slim (glibc)      | torch==X.Y.Z+cuXXX |             | -     | -       | -      | -    | -      | -
3.10   | slim (glibc)      | torchvision        |             | wheel | 75.6s   | 8.62s  | 4.7G | 117.6M | clean
3.10   | slim (glibc)      | torchvision        |             | -     | -       | -      | -    | -      | -
3.11   | alpine (musl)     | torch==X.Y.Z+cuXXX | build_error | -     | -       | -      | -    | -      | -
3.11   | alpine (musl)     | torch==X.Y.Z+cuXXX |             | -     | -       | -      | -    | -      | -
3.11   | alpine (musl)     | torchvision        | build_error | -     | -       | -      | -    | -      | -
3.11   | alpine (musl)     | torchvision        |             | -     | -       | -      | -    | -      | -
3.11   | slim (glibc)      | torch==X.Y.Z+cuXXX | build_error | -     | 3.1s    | -      | -    | -      | -
3.11   | slim (glibc)      | torch==X.Y.Z+cuXXX |             | -     | -       | -      | -    | -      | -
3.11   | slim (glibc)      | torchvision        |             | wheel | 68.7s   | 13.05s | 4.7G | 132.0M | clean
3.11   | slim (glibc)      | torchvision        |             | -     | -       | -      | -    | -      | -
3.12   | alpine (musl)     | torch==X.Y.Z+cuXXX | build_error | -     | -       | -      | -    | -      | -
3.12   | alpine (musl)     | torch==X.Y.Z+cuXXX |             | -     | -       | -      | -    | -      | -
3.12   | alpine (musl)     | torchvision        | build_error | -     | -       | -      | -    | -      | -
3.12   | alpine (musl)     | torchvision        |             | -     | -       | -      | -    | -      | -
3.12   | slim (glibc)      | torch==X.Y.Z+cuXXX | build_error | -     | 2.4s    | -      | -    | -      | -
3.12   | slim (glibc)      | torch==X.Y.Z+cuXXX |             | -     | -       | -      | -    | -      | -
3.12   | slim (glibc)      | torchvision        |             | wheel | 59.1s   | 14.32s | 4.7G | 128.1M | clean
3.12   | slim (glibc)      | torchvision        |             | -     | -       | -      | -    | -      | -
3.13   | alpine (musl)     | torch==X.Y.Z+cuXXX | build_error | -     | -       | -      | -    | -      | -
3.13   | alpine (musl)     | torch==X.Y.Z+cuXXX |             | -     | -       | -      | -    | -      | -
3.13   | alpine (musl)     | torchvision        | build_error | -     | -       | -      | -    | -      | -
3.13   | alpine (musl)     | torchvision        |             | -     | -       | -      | -    | -      | -
3.13   | slim (glibc)      | torch==X.Y.Z+cuXXX | build_error | -     | 1.9s    | -      | -    | -      | -
3.13   | slim (glibc)      | torch==X.Y.Z+cuXXX |             | -     | -       | -      | -    | -      | -
3.13   | slim (glibc)      | torchvision        |             | wheel | 56.5s   | 10.18s | 4.7G | 127.8M | clean
3.13   | slim (glibc)      | torchvision        |             | -     | -       | -      | -    | -      | -

This quickstart demonstrates how to use `torchvision` to preprocess an image and run inference with a pre-trained ResNet-18 model. It defines transforms with `torchvision.transforms.v2.Compose`, loads the model and its pre-trained weights from `torchvision.models` (the weights are downloaded on first use), and maps the output to a human-readable label. Because the input is random noise, the predicted class itself is meaningless; the point is the pipeline.

import torch
from torchvision import models
from torchvision.transforms import v2

# 1. Create a dummy uint8 image tensor (C x H x W random noise standing in for a loaded image)
H, W = 256, 256 # Example image dimensions
dummy_image = torch.randint(0, 256, size=(3, H, W), dtype=torch.uint8)

# 2. Define image transforms using the recommended v2 API
preprocess = v2.Compose([
    v2.Resize((224, 224), antialias=True), # Resize for common model input sizes
    v2.ToDtype(torch.float32, scale=True), # Convert to float and scale pixel values to [0, 1]
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), # ImageNet normalization
])

# Apply transforms to the dummy image
input_tensor = preprocess(dummy_image)
input_batch = input_tensor.unsqueeze(0) # Add a batch dimension (models expect batches)

# 3. Load a pre-trained image classification model (e.g., ResNet-18)
# .DEFAULT selects the best available pre-trained weights for this architecture
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval() # Set the model to evaluation mode for inference

# Get the categories the model was trained on for human-readable output
categories = weights.meta["categories"]

# 4. Perform inference
with torch.no_grad(): # Disable gradient calculation for inference to save memory and computations
    output = model(input_batch)

# 5. Get the predicted class
probabilities = torch.nn.functional.softmax(output, dim=1)
predicted_probability, predicted_idx = torch.max(probabilities, 1)
predicted_label = categories[predicted_idx.item()]

print(f"Predicted class: {predicted_label} (Probability: {predicted_probability.item():.2f})")
print("Quickstart successful: Image processed and classified using torchvision.")