{"id":6014,"library":"numba-cuda","title":"Numba CUDA Target","description":"Numba-cuda provides a CUDA target for the Numba Python JIT compiler, enabling Python functions to be compiled and executed on NVIDIA GPUs. It allows users to write custom GPU kernels and device functions directly in a subset of Python. The library, currently at version 0.30.0, is actively developed by NVIDIA, with its release cycle now decoupled from the main Numba project to facilitate more frequent updates and new feature development.","status":"active","version":"0.30.0","language":"en","source_language":"en","source_url":"https://github.com/NVIDIA/numba-cuda","tags":["cuda","gpu","numba","jit","high-performance-computing","nvidia","parallel-computing","python"],"install":[{"cmd":"pip install numba-cuda","lang":"bash","label":"PyPI"},{"cmd":"conda install -c conda-forge numba-cuda","lang":"bash","label":"Conda (conda-forge)"}],"dependencies":[{"reason":"Core JIT compiler; numba-cuda is a target extension.","package":"numba"},{"reason":"Kernels often operate on NumPy arrays, which are automatically transferred to/from the device.","package":"numpy"},{"reason":"Used for NVVM bindings and interacting with the CUDA Driver API (since v0.29.0).","package":"cuda-python"},{"reason":"Runtime dependency for CUDA-enabled GPUs; required for compilation and execution. Install via `conda` or the NVIDIA CUDA SDK.","package":"cudatoolkit","optional":true}],"imports":[{"note":"All CUDA-specific functionality is exposed through the `numba.cuda` module.","symbol":"cuda","correct":"from numba import cuda"}],"quickstart":{"code":"import numpy as np\nfrom numba import cuda\n\n# Check for CUDA availability (runtime dependency)\nif not cuda.is_available():\n    print(\"CUDA is not available. Please ensure you have an NVIDIA GPU and CUDA drivers installed.\")\n    raise SystemExit\n\n# Define a CUDA kernel\n@cuda.jit\ndef add_vectors(x, y, out):\n    idx = cuda.grid(1)\n    if idx < len(out):\n        out[idx] = x[idx] + y[idx]\n\n# Host-side code\nN = 1000000\nx_host = np.arange(N, dtype=np.float32)\ny_host = np.arange(N, dtype=np.float32)\nout_host = np.empty_like(x_host)\n\n# Allocate memory on the device and copy data\nx_device = cuda.to_device(x_host)\ny_device = cuda.to_device(y_host)\nout_device = cuda.device_array_like(out_host)\n\n# Configure the kernel launch\nthreadsperblock = 256\nblockspergrid = (N + (threadsperblock - 1)) // threadsperblock\n\n# Launch the kernel\nadd_vectors[blockspergrid, threadsperblock](x_device, y_device, out_device)\n\n# Copy the result back to the host\nout_device.copy_to_host(out_host)\n\n# Verify the result\nexpected_out = x_host + y_host\nassert np.allclose(out_host, expected_out)\nprint(\"Vector addition on GPU successful!\")","lang":"python","description":"This quickstart demonstrates a basic vector addition using a Numba CUDA kernel. It covers defining a kernel with `@cuda.jit`, allocating and transferring data between host (CPU) and device (GPU) memory, configuring and launching the kernel, and copying results back to the host. Ensure you have a CUDA-enabled GPU and appropriate drivers installed."},"warnings":[{"fix":"Always include `pip install numba-cuda` (or `conda install -c conda-forge numba-cuda`) in your environment setup alongside `numba`.","message":"The built-in CUDA target in the main `numba` package is deprecated. New features and most bug fixes are now exclusively implemented in `numba-cuda`. While the old target remains for compatibility, it is strongly recommended to install `numba-cuda` for active development and to ensure access to the latest capabilities.","severity":"deprecated","affected_versions":"Numba v0.61.0 and later when not explicitly installing numba-cuda."},{"fix":"Ensure your `try`/`except` blocks are robust to potential changes in error types. For maximum compatibility, catch broader exception types or consult release notes if you encounter unexpected `TypingError` propagation.","message":"In `numba-cuda` v0.28.0, there was an attempt to shift error classes from `numba.core.errors.TypingError` to `numba.cuda.errors` namespaces. This caused compatibility issues with existing code that relied on catching the old error types and was subsequently reverted. Users should be aware that such internal error type changes can be breaking.","severity":"breaking","affected_versions":"v0.28.0 (reverted in subsequent patches)"},{"fix":"Avoid relying on Numba's internal implementation details. Stick to the public API documented in `numba.cuda` for memory management (`cuda.to_device`, `cuda.device_array`), kernel launching, and device interactions.","message":"The internal `DeviceArray` implementation underwent refactoring, and certain internal `enums` and `ctypes` code were removed in `numba-cuda` v0.23.0 and v0.28.0, respectively. Code that directly interacted with these internal components or undocumented APIs may break.","severity":"breaking","affected_versions":"v0.23.0, v0.28.0"},{"fix":"Pass empty or pre-allocated device arrays as arguments to your kernel, and have the kernel write its output into these arrays. Copy the results back to the host after kernel execution if needed.","message":"Numba CUDA kernel functions cannot return values. Any results computed within a kernel must be written to arrays passed as arguments to the kernel. This is a common pattern in CUDA C/C++ and applies to Numba CUDA kernels as well.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Run your kernel once with dummy data to trigger compilation, then measure the execution time of subsequent calls. Use `cuda.synchronize()` to ensure all GPU operations have completed before measuring elapsed time.","message":"The first call to a Numba CUDA kernel includes the Just-In-Time (JIT) compilation overhead, which can be significant. For accurate performance benchmarking, always time subsequent calls to the kernel after the initial compilation has completed (e.g., by performing a 'warm-up' run).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure your target GPU has compute capability 5.0 or higher. Upgrade your NVIDIA drivers and CUDA Toolkit to version 11.2 or newer.","message":"Support for NVIDIA GPUs with compute capability less than 5.0 is deprecated and will be removed in future releases. Additionally, Numba-CUDA requires a minimum CUDA Toolkit version of 11.2.","severity":"breaking","affected_versions":"All versions (deprecation active)"}],"env_vars":null,"last_verified":"2026-04-14T00:00:00.000Z","next_check":"2026-07-13T00:00:00.000Z"}