{"id":1974,"library":"cupy-cuda12x","title":"CuPy (CUDA 12.x)","description":"CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python, utilizing NVIDIA CUDA or AMD ROCm platforms. It provides an `ndarray` and a rich set of routines with an API designed to be a drop-in replacement for NumPy and SciPy. The current version, 14.0.1, is a stable release; the project generally follows a bi-monthly release cadence for stable versions, with major updates occurring less frequently.","status":"active","version":"14.0.1","language":"en","source_language":"en","source_url":"https://github.com/cupy/cupy","tags":["GPU","scientific computing","numerical computation","CUDA","NumPy","SciPy","array library"],"install":[{"cmd":"pip install cupy-cuda12x","lang":"bash","label":"Base installation for CUDA 12.x"},{"cmd":"pip install \"cupy-cuda12x[ctk]\"","lang":"bash","label":"Installation with CUDA Toolkit component wheels (Python-managed CUDA dependencies)"}],"dependencies":[{"reason":"Core dependency for array functionality and API compatibility.","package":"numpy","optional":false},{"reason":"Optional, required for SciPy-compatible routines (e.g., sparse matrices, signal processing, special functions).","package":"scipy","optional":true},{"reason":"Optional, required for bfloat16 data type support (CuPy v14+).","package":"ml_dtypes","optional":true},{"reason":"System-level dependency for NVIDIA GPUs. A compatible driver must be installed regardless of how CuPy or CUDA Toolkit components are installed.","package":"CUDA driver","optional":false}],"imports":[{"note":"Standard alias, similar to 'import numpy as np'.","symbol":"cupy","correct":"import cupy as cp"},{"note":"For SciPy-compatible routines on GPU.","symbol":"cupyx.scipy","correct":"import cupyx.scipy as sp"}],"quickstart":{"code":"import cupy as cp\n\n# Check if a GPU is available\nif cp.cuda.is_available():\n    print(\"GPU is available. Current device:\", cp.cuda.Device().id)\n\n    # Create a CuPy array on the GPU\n    x_gpu = cp.arange(10, dtype=cp.float32).reshape(2, 5)\n    print(\"CuPy array on GPU:\\n\", x_gpu)\n\n    # Perform an operation on the GPU\n    y_gpu = x_gpu * 2 + 1\n    print(\"Result of GPU operation:\\n\", y_gpu)\n\n    # Transfer the result back to CPU (NumPy array)\n    y_cpu = cp.asnumpy(y_gpu)\n    print(\"Result on CPU (NumPy array):\\n\", y_cpu)\n\n    # Example: Matrix multiplication\n    a_gpu = cp.array([[1, 2], [3, 4]], dtype=cp.float32)\n    b_gpu = cp.array([[5, 6], [7, 8]], dtype=cp.float32)\n    c_gpu = a_gpu @ b_gpu\n    print(\"\\nMatrix multiplication on GPU:\\n\", c_gpu)\n\n    # For accurate performance timings, ensure GPU operations complete\n    cp.cuda.Stream.null.synchronize()\nelse:\n    print(\"No GPU available. CuPy will not be functional.\")","lang":"python","description":"This quickstart demonstrates basic CuPy array creation, GPU computation, and data transfer between the GPU and CPU. It also includes a check for GPU availability and emphasizes `cp.cuda.Stream.null.synchronize()` for accurate benchmarking, as GPU operations are often asynchronous."},"warnings":[{"fix":"Review code for type-sensitive operations, especially when mixing dtypes or interacting with NumPy arrays, and adjust for NumPy v2 compatibility. Test thoroughly.","message":"CuPy v14 updates its type promotion rules and casting behavior to align with NumPy v2 semantics. Code relying on NumPy v1 specific behaviors in earlier CuPy versions (v13 and prior) may behave differently.","severity":"breaking","affected_versions":"14.0.0 and later"},{"fix":"Upgrade to Python 3.10+ and a compatible CUDA Toolkit version (12.x for `cupy-cuda12x`). If cuDNN is needed, integrate an alternative Python binding for cuDNN.","message":"CuPy v14 drops support for CUDA 11 and Python 3.9. Additionally, all cuDNN-related functionality has been completely removed from CuPy. Users requiring cuDNN should consider external libraries like cuDNN Frontend.","severity":"breaking","affected_versions":"14.0.0 and later"},{"fix":"While generally beneficial for performance, for long-running processes or when strict memory limits are needed, use `cp.get_default_memory_pool().free_all_blocks()` to explicitly release unused cached memory. Monitor pool usage with `cp.get_default_memory_pool().used_bytes()` and device free memory with `cp.cuda.Device().mem_info`.","message":"CuPy uses a memory pool for GPU allocations. This means GPU memory might not be immediately released back to the system even after arrays go out of scope, which can cause utilities like `nvidia-smi` to report higher memory usage than expected.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Keep data on the GPU as much as possible for computations. Perform all necessary operations on CuPy arrays directly. Transfer data to CPU only when the final result is needed or for visualization/storage.","message":"Frequent data transfers between CPU (host) and GPU (device) are a major performance bottleneck due to PCIe bandwidth limitations. Avoid 'round-tripping' arrays in hot loops.","severity":"gotcha","affected_versions":"All versions"},{"fix":"This is expected behavior and generally not a problem in long-running applications. For benchmarking or time-critical initial runs, consider a warm-up execution or factor in the first-call compilation time.","message":"The first time a CuPy kernel is called for specific shapes and data types, it may experience a brief pause for JIT compilation. Subsequent calls with the same parameters will use the cached, pre-compiled kernel.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure the `cupy-cudaXX` package suffix matches your installed CUDA version (e.g., `cupy-cuda12x` for CUDA 12.x). If using PyPI `[ctk]` extras, only a compatible CUDA driver is required. Refer to CuPy's installation matrix for exact compatibility.","message":"Installing a `cupy-cudaXX` package that does not match your system's CUDA Toolkit version or compatible driver can lead to import failures or runtime errors.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If explicit blocking is required to prevent data races, use the `blocking=True` argument when calling `cp.array()` or `cp.asarray()`.","message":"In CuPy v13+, the default behavior for transferring NumPy arrays backed by pinned memory from CPU to GPU (`cupy.array()`, `cupy.asarray()`) changed from blocking to asynchronous. This can improve performance but may introduce data races if the source array is modified on the CPU before the asynchronous transfer completes.","severity":"breaking","affected_versions":"13.0.0 and later"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}