Numba CUDA Target

0.30.0 · active · verified Tue Apr 14

Numba-cuda provides a CUDA target for the Numba Python JIT compiler, enabling Python functions to be compiled and executed on NVIDIA GPUs. It allows users to write custom GPU kernels and device functions directly in a subset of Python. The library, currently at version 0.30.0, is actively developed by NVIDIA, with its release cycle now decoupled from the main Numba project to facilitate more frequent updates and new feature development.

Install

pip install numba-cuda

Imports

from numba import cuda

Quickstart

This quickstart demonstrates a basic vector addition using a Numba CUDA kernel. It covers defining a kernel with `@cuda.jit`, allocating and transferring data between host (CPU) and device (GPU) memory, configuring and launching the kernel, and copying results back to the host. Ensure you have a CUDA-enabled GPU and appropriate drivers installed.

import sys

import numpy as np
from numba import cuda

# Check for CUDA availability (runtime dependency)
if not cuda.is_available():
    print("CUDA is not available. Please ensure you have an NVIDIA GPU and CUDA drivers installed.")
    sys.exit(1)

# Define a CUDA kernel
@cuda.jit
def add_vectors(x, y, out):
    idx = cuda.grid(1)
    if idx < len(out):
        out[idx] = x[idx] + y[idx]

# Host-side code
N = 1000000
x_host = np.arange(N, dtype=np.float32)
y_host = np.arange(N, dtype=np.float32)
out_host = np.empty_like(x_host)

# Allocate memory on the device and copy data
x_device = cuda.to_device(x_host)
y_device = cuda.to_device(y_host)
out_device = cuda.device_array_like(out_host)

# Configure the kernel launch
threadsperblock = 256
blockspergrid = (N + (threadsperblock - 1)) // threadsperblock

# Launch the kernel
add_vectors[blockspergrid, threadsperblock](x_device, y_device, out_device)

# Copy the result back to the host
out_device.copy_to_host(out_host)

# Verify the result
expected_out = x_host + y_host
assert np.allclose(out_host, expected_out)
print("Vector addition on GPU successful!")
