Ascend NPU Bridge for PyTorch
torch-npu is a PyTorch extension that serves as an NPU bridge, adapting the Ascend Neural Network Processing Unit (NPU) to the PyTorch framework. It lets developers leverage the computational capabilities of Huawei Ascend AI Processors for deep learning training and inference within the PyTorch ecosystem. The current version is 2.9.0, with regular updates tracking PyTorch releases and the Ascend software stack.
Warnings
- breaking: `torch-npu` and `torch` versions must be strictly aligned. Installing `torch-npu` will often pull in a compatible `torch` version automatically, but manual installation requires careful matching. Mismatches can lead to installation failures or runtime errors.
- gotcha: torch-npu requires pre-installation of Huawei's CANN (Compute Architecture for Neural Networks) toolkit and the HDK (drivers/firmware). These are system-level components, not Python packages. Ensure the CANN environment variables are sourced before running Python scripts.
- gotcha: `torch.npu.set_device()` can only be called once per Python process. Unlike `torch.cuda.set_device()`, it is not possible to switch between NPU devices or reset the default device multiple times within a single Python runtime.
- gotcha: Ascend NPUs currently do not support the `torch.float64` (double) data type. If a double tensor is created or implicitly used, it is automatically cast to `torch.float32` (float).
- gotcha: For distributed training or explicit NPU device selection, environment variables such as `ASCEND_RT_VISIBLE_DEVICES` or `HCCL_WHITELIST_DISABLE=1` are often required. Incorrect configuration can lead to devices not being utilized or to communication errors.
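The environment variables from the last warning must be in place before the NPU backend initializes. A minimal sketch of setting them from Python, assuming they use the same comma-separated index format as `CUDA_VISIBLE_DEVICES` (the indices `0,1` are placeholder values):

```python
import os

# Set NPU-related environment variables *before* importing torch_npu,
# since the backend reads them at initialization time.
# "0,1" is a placeholder: expose only NPUs 0 and 1 to this process.
os.environ.setdefault("ASCEND_RT_VISIBLE_DEVICES", "0,1")
# Disable the HCCL communication whitelist check for multi-device runs.
os.environ.setdefault("HCCL_WHITELIST_DISABLE", "1")

# setdefault leaves any value already exported in the shell untouched.
print(os.environ["ASCEND_RT_VISIBLE_DEVICES"])
```

Alternatively, export these in the shell (or a job script) before launching Python, alongside sourcing the CANN `set_env.sh`.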
Install
- Default:
  pip install pyyaml setuptools
  pip install torch==2.9.0
  pip install torch-npu==2.9.0
- With a CPU-only `torch` build:
  pip install pyyaml setuptools
  pip install torch==2.9.0+cpu --index-url https://download.pytorch.org/whl/cpu
  pip install torch-npu==2.9.0
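Because the version pairing is strict, a quick sanity check after install can save debugging time. This helper is a hypothetical sketch, not part of torch-npu: it compares only the `major.minor` component after stripping local build tags (`+cpu`) and post-release suffixes (`.post1`):

```python
def versions_match(torch_version: str, npu_version: str) -> bool:
    """Return True when torch and torch-npu share the same major.minor.

    Hypothetical helper: normalizes strings like "2.9.0+cpu" and
    "2.9.0.post1" to "2.9" before comparing.
    """
    def major_minor(v: str) -> str:
        v = v.split("+")[0].split(".post")[0]
        return ".".join(v.split(".")[:2])

    return major_minor(torch_version) == major_minor(npu_version)

# Usage sketch: with both packages installed you would pass the real
# torch.__version__ and torch_npu.__version__ strings instead.
print(versions_match("2.9.0+cpu", "2.9.0.post1"))  # True
print(versions_match("2.8.0", "2.9.0"))            # False
```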
Imports
- torch_npu
import torch
import torch_npu
- is_available
torch.npu.is_available()
- set_device
torch.npu.set_device('npu:0')
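The calls above can be combined into a small device-selection helper. This is a sketch, not a torch-npu API: it falls back to `"cpu"` when either package is missing or no NPU is visible. Per the warning above, set the chosen device only once per process.

```python
def pick_device() -> str:
    """Return "npu:0" if the NPU backend is importable and available, else "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401 -- importing registers the NPU backend
        if torch.npu.is_available():
            return "npu:0"
    except ImportError:
        pass  # torch or torch_npu not installed; fall back to CPU
    return "cpu"

# Usage sketch:
#   device = torch.device(pick_device())
print(pick_device())
```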
Quickstart
# Ensure CANN environment variables are sourced (e.g., from .bashrc or executed directly)
# source /usr/local/Ascend/ascend-toolkit/set_env.sh
import torch
import torch_npu # Essential for initializing NPU backend
# Check NPU availability
if torch.npu.is_available():
    print(f"NPU is available. Device count: {torch.npu.device_count()}")
    device = torch.device("npu:0")
    # Example tensor operations on NPU
    x = torch.randn(2, 2).to(device)
    y = torch.randn(2, 2).to(device)
    z = x.mm(y)
    print(f"Tensor on NPU:\n{x}")
    print(f"Result of matrix multiplication on NPU:\n{z}")
else:
    print("NPU is not available, using CPU.")
    device = torch.device("cpu")
    x = torch.randn(2, 2).to(device)
    y = torch.randn(2, 2).to(device)
    z = x.mm(y)
    print(f"Tensor on CPU:\n{x}")
    print(f"Result of matrix multiplication on CPU:\n{z}")