{"id":6266,"library":"torch-npu","title":"Ascend NPU Bridge for PyTorch","description":"torch-npu is a PyTorch extension that serves as an NPU bridge, adapting the Ascend Neural Network Processing Unit (NPU) to the PyTorch framework. It enables developers to leverage the powerful computational capabilities of Huawei Ascend AI Processors for deep learning training and inference within the PyTorch ecosystem. The current version is 2.9.0, with regular releases aligned with PyTorch versions and the Ascend software stack.","status":"active","version":"2.9.0","language":"en","source_language":"en","source_url":"https://github.com/Ascend/pytorch","tags":["pytorch","npu","ascend","hardware-acceleration","deep-learning","huawei"],"install":[{"cmd":"pip install pyyaml setuptools\npip install torch==2.9.0\npip install torch-npu==2.9.0","lang":"bash","label":"Basic Installation (with x86 CPU PyTorch)"},{"cmd":"pip install pyyaml setuptools\npip install torch==2.9.0+cpu --index-url https://download.pytorch.org/whl/cpu\npip install torch-npu==2.9.0","lang":"bash","label":"Installation for x86 (CPU-only PyTorch from the official PyTorch index)"},{"cmd":"pip install pyyaml setuptools\npip install torch==2.9.0\npip install torch-npu==2.9.0","lang":"bash","label":"Installation for Aarch64"}],"dependencies":[{"reason":"Core PyTorch library; must be version-aligned with torch-npu.","package":"torch","optional":false},{"reason":"Huawei's Compute Architecture for Neural Networks; a system-level prerequisite for NPU operation.","package":"CANN","optional":false},{"reason":"Ascend drivers and firmware; a system-level prerequisite for NPU operation.","package":"HDK","optional":false},{"reason":"Runtime dependency for torch-npu.","package":"pyyaml","optional":false},{"reason":"Runtime dependency for torch-npu.","package":"setuptools","optional":false}],"imports":[{"note":"While `torch.npu` functions are exposed via `torch` after initialization, explicitly importing `torch_npu` ensures the NPU 
backend is properly loaded and initialized.","wrong":"import torch.npu # without prior import torch_npu","symbol":"torch_npu","correct":"import torch\nimport torch_npu"},{"note":"Used to check if an Ascend NPU device is accessible.","symbol":"is_available","correct":"torch.npu.is_available()"},{"note":"Sets the current NPU device.","symbol":"set_device","correct":"torch.npu.set_device('npu:0')"}],"quickstart":{"code":"# Ensure CANN environment variables are sourced (e.g., from .bashrc or executed directly)\n# source /usr/local/Ascend/ascend-toolkit/set_env.sh\n\nimport torch\nimport torch_npu # Essential for initializing NPU backend\n\n# Check NPU availability\nif torch.npu.is_available():\n    print(f\"NPU is available. Device count: {torch.npu.device_count()}\")\n    device = torch.device(\"npu:0\")\n    # Example tensor operations on NPU\n    x = torch.randn(2, 2).to(device)\n    y = torch.randn(2, 2).to(device)\n    z = x.mm(y)\n    print(f\"Tensor on NPU:\\n{x}\")\n    print(f\"Result of matrix multiplication on NPU:\\n{z}\")\nelse:\n    print(\"NPU is not available, using CPU.\")\n    device = torch.device(\"cpu\")\n    x = torch.randn(2, 2).to(device)\n    y = torch.randn(2, 2).to(device)\n    z = x.mm(y)\n    print(f\"Tensor on CPU:\\n{x}\")\n    print(f\"Result of matrix multiplication on CPU:\\n{z}\")","lang":"python","description":"This quickstart demonstrates how to check for NPU availability and perform a basic matrix multiplication on an Ascend NPU. It's crucial to first set up the CANN environment variables before running any NPU-accelerated code."},"warnings":[{"fix":"Always install `torch` and `torch-npu` with the same major.minor.patch version number (e.g., `torch==2.9.0` with `torch-npu==2.9.0`).","message":"`torch-npu` and `torch` versions must be strictly aligned. Installing `torch-npu` will often attempt to install a compatible `torch` version, but manual installation requires careful matching. 
Mismatches can lead to installation failures or runtime errors.","severity":"breaking","affected_versions":"All versions"},{"fix":"Follow the official Ascend documentation to install CANN and HDK. Source the environment script, typically `source /usr/local/Ascend/ascend-toolkit/set_env.sh`, in your shell session or an activation script.","message":"torch-npu requires pre-installation of Huawei's CANN (Compute Architecture for Neural Networks) and HDK (drivers/firmware). These are system-level components, not Python packages. Ensure the CANN environment variables are sourced before running Python scripts.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Set the desired NPU device once at the beginning of your script, or use `tensor.to('npu:X')` directly for device placement without relying on a mutable default device. For multi-device scenarios in a single process, ensure tensors are explicitly placed on their target devices and avoid relying on `set_device` after initial setup.","message":"`torch.npu.set_device()` can only be called once per Python process. Unlike `torch.cuda.set_device()`, it is not possible to switch between NPU devices or set the default device multiple times within a single Python runtime.","severity":"gotcha","affected_versions":"Before 2.1.0; partially fixed in 2.1.0 and later, but still more limited than CUDA."},{"fix":"Design models and data pipelines to primarily use `torch.float32` or `torch.float16` for NPU operations to avoid implicit type conversions and potential precision issues.","message":"Ascend NPUs currently do not support the `torch.float64` (double) data type. If a double tensor is created or implicitly used, it will be automatically cast to `torch.float32` (float).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Refer to distributed training documentation for NPU (e.g., HCCL backend). 
Set `ASCEND_RT_VISIBLE_DEVICES` to specify visible NPU cards (e.g., `ASCEND_RT_VISIBLE_DEVICES=0,1`). For some scenarios, `export HCCL_WHITELIST_DISABLE=1` might be necessary.","message":"For distributed training or explicit NPU device selection, environment variables like `ASCEND_RT_VISIBLE_DEVICES` or `HCCL_WHITELIST_DISABLE=1` are often required. Incorrect configuration can lead to devices not being utilized or communication errors.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z","problems":[]}