NVIDIA cuFFT for CUDA 12

11.4.1.4 verified Tue May 12 auth: no python install: stale quickstart: stale

nvidia-cufft-cu12 provides the native runtime libraries for NVIDIA's CUDA Fast Fourier Transform (cuFFT) product, a GPU-accelerated library for performing FFT calculations. It is a fundamental component for various scientific and engineering applications, including deep learning, computer vision, and computational physics. The library is actively maintained by the Nvidia CUDA Installer Team and receives frequent updates; the current version is 11.4.1.4, released on June 5, 2025. It primarily serves as a low-level dependency for higher-level Python frameworks and libraries that leverage GPU-accelerated FFTs.

pip install nvidia-cufft-cu12

Common errors

error OSError: libcufft.so.X: cannot open shared object file: No such file or directory ↓

cause This error occurs when a program tries to load the cuFFT library at runtime but cannot find the shared object file (libcufft.so.X on Linux or cufft64_X.dll on Windows) because its directory is not in the system's library search path, or an incompatible CUDA Toolkit version is installed.

fix

Ensure the NVIDIA CUDA Toolkit is correctly installed and that the directory containing the cuFFT library is on the loader's search path. On Linux, add the directory holding libcufft.so.X (e.g., /usr/local/cuda/lib64) to LD_LIBRARY_PATH; on Windows, add the directory holding cufft64_X.dll (e.g., C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\vX.X\bin) to PATH. Also verify that the CUDA version expected by the application matches your installed CUDA Toolkit. A change to LD_LIBRARY_PATH applies to newly started processes without further steps; sudo ldconfig is only needed if you instead register the directory system-wide (for example, through a file in /etc/ld.so.conf.d/). For Python applications on Windows, os.add_dll_directory() can add the DLL directory at runtime.
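As a quick diagnostic for the fix above, the following sketch searches a few candidate directories for the cuFFT shared library and prints the directory that would need to be on the loader path. It assumes the pip wheel unpacks under `<site-packages>/nvidia/cufft/`, which matches how recent wheels appear to be laid out but is not guaranteed. `default_roots` and `find_cufft_libs` are hypothetical helper names, not part of any NVIDIA API.

```python
import glob
import os
import site


def default_roots():
    """Candidate directories that might hold cuFFT (assumed layouts, not guaranteed)."""
    roots = []
    # Assumption: the pip wheel unpacks under <site-packages>/nvidia/cufft/.
    site_dirs = list(site.getsitepackages()) + [site.getusersitepackages()]
    for sp in site_dirs:
        roots.append(os.path.join(sp, "nvidia", "cufft"))
    # System-wide CUDA Toolkit locations, if the environment names one.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if os.environ.get(var):
            roots.append(os.environ[var])
    roots.append("/usr/local/cuda")
    return roots


def find_cufft_libs(roots):
    """Return sorted paths of cuFFT shared libraries found under the given roots."""
    patterns = ("libcufft.so*", "cufft64_*.dll")
    found = set()
    for root in roots:
        for pattern in patterns:
            found.update(glob.glob(os.path.join(root, "**", pattern), recursive=True))
    return sorted(found)


if __name__ == "__main__":
    libs = find_cufft_libs(default_roots())
    if not libs:
        print("No cuFFT library found in the candidate directories.")
    for path in libs:
        print(path)
        print("  directory to add to the loader path:", os.path.dirname(path))
```

If the script finds nothing, the library is probably installed somewhere else or not installed at all. Extend the list returned by `default_roots` before concluding either way.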

error fatal error: cufft.h: No such file or directory ↓

cause This compilation error indicates that the C/C++ compiler cannot locate the `cufft.h` header file, which is necessary for projects that directly use the cuFFT library's API. This usually happens when the CUDA Toolkit's include directory is not correctly specified in the compiler's search paths.

fix

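Add the CUDA Toolkit's include directory to your compiler's include paths. nvcc normally searches its own toolkit's include directory automatically; if it does not, pass the directory with the -I flag (e.g., -I/usr/local/cuda/include). Note that -cudalib=cufft is not an include flag: it belongs to the NVIDIA HPC SDK compilers (e.g., nvc++) and selects the cuFFT library for linking. If using a different compiler (like g++ or icpc), explicitly add -I<CUDA_HOME>/include to your compilation flags, where <CUDA_HOME> is your CUDA installation directory.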
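To find the right value for the -I flag, a small sketch like the one below can check candidate toolkit roots for `cufft.h`. It assumes that `CUDA_HOME` or `CUDA_PATH`, when set, point at the toolkit root, and it falls back to `/usr/local/cuda`. `cuda_include_flag` is a hypothetical helper name.

```python
import os


def cuda_include_flag(candidates):
    """Return '-I<dir>' for the first candidate root containing include/cufft.h, else None."""
    for home in candidates:
        include_dir = os.path.join(home, "include")
        if os.path.isfile(os.path.join(include_dir, "cufft.h")):
            return "-I" + include_dir
    return None


if __name__ == "__main__":
    # Assumption: CUDA_HOME / CUDA_PATH, if set, point at the toolkit root.
    candidates = [os.environ[v] for v in ("CUDA_HOME", "CUDA_PATH") if os.environ.get(v)]
    candidates.append("/usr/local/cuda")
    flag = cuda_include_flag(candidates)
    print(flag or "cufft.h not found in the candidate directories")
```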

error ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. ... requires nvidia-cufft-cu12==X.Y.Z which is incompatible. ↓

cause This error occurs when installing Python packages (e.g., PyTorch, TensorFlow) that have specific version requirements for NVIDIA CUDA runtime libraries, including `nvidia-cufft-cu12`. The `pip` dependency resolver detects a conflict between an already installed `nvidia-cufft-cu12` version and the version required by the package being installed.

fix

Identify the exact nvidia-cufft-cu12 version required by your main deep learning framework (e.g., PyTorch or TensorFlow) and ensure you install that specific version, or a compatible one. Often, installing the framework with its recommended CUDA support (e.g., pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 or pip install tensorflow[and-cuda]) will automatically manage these dependencies. If conflicts persist, consider creating a clean virtual environment and installing all required packages together.
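To see which installed packages constrain `nvidia-cufft-cu12`, the sketch below reads the metadata of every installed distribution using only the standard library. It is an illustration rather than a full requirement parser: the requirement name is taken with a simple regular expression, and `requirements_on` is a hypothetical helper name.

```python
import re
from importlib import metadata


def _normalize(name):
    """PEP 503 normalisation so 'nvidia_cufft_cu12' and 'nvidia-cufft-cu12' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirements_on(package, dists=None):
    """Map each installed distribution to its requirement strings that name `package`."""
    target = _normalize(package)
    hits = {}
    for dist in (dists if dists is not None else metadata.distributions()):
        for req in dist.requires or []:
            # The requirement name is the leading run of name characters.
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req.strip())
            if match and _normalize(match.group(0)) == target:
                hits.setdefault(dist.metadata["Name"], []).append(req)
    return hits


if __name__ == "__main__":
    name = "nvidia-cufft-cu12"
    try:
        print("installed:", name, metadata.version(name))
    except metadata.PackageNotFoundError:
        print(name, "is not installed")
    for requirer, reqs in requirements_on(name).items():
        print(requirer, "requires", reqs)
```

Comparing the installed version with the printed requirement strings shows which framework pins the version that pip is complaining about.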

error CUFFT_INTERNAL_ERROR ↓

cause This is a generic error from the cuFFT library indicating an internal driver or library issue. It can stem from various problems, including insufficient GPU memory, an invalid CUDA context, or an unexpected state during cuFFT plan creation or execution.

fix

Troubleshoot by checking GPU memory availability before cuFFT operations. Ensure a valid CUDA context is established and not prematurely destroyed (e.g., avoid cudaDeviceReset() in critical sections without re-initializing the context). If the problem persists, try updating your NVIDIA drivers and CUDA Toolkit to the latest compatible versions, or simplify your cuFFT calls to isolate the problematic operation.
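The memory check suggested above can be sketched as follows. The `headroom` factor is a hypothetical rule of thumb (input plus output plus workspace approximated as a multiple of the input size), not a figure from the cuFFT documentation, and `fft_fits` is an illustrative helper name. Querying the device assumes CuPy is installed; `cupy.cuda.runtime.memGetInfo()` returns free and total bytes.

```python
def fft_fits(n_elements, itemsize, free_bytes, headroom=3.0):
    """Rough check: needed bytes are approximated as headroom * input size."""
    needed = int(n_elements * itemsize * headroom)
    return needed <= free_bytes, needed


if __name__ == "__main__":
    try:
        import cupy as cp  # optional; only needed to query the real device
        free, total = cp.cuda.runtime.memGetInfo()
    except Exception as exc:  # no CuPy, no driver, or no GPU
        print("Could not query GPU memory:", exc)
    else:
        # 2**26 complex64 values (8 bytes each) as an example problem size.
        ok, needed = fft_fits(n_elements=1 << 26, itemsize=8, free_bytes=free)
        print(f"free={free} total={total} needed~{needed} fits={ok}")
```

A negative result points toward memory pressure as the cause. A positive one does not rule it out, since the real workspace size depends on the transform.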

Warnings

breaking Deprecated GPU architectures: From CUDA 12.0 onwards, GPU architectures SM35 and SM37 are no longer supported. The minimum required architecture is SM50. Older CUDA versions (e.g., 11.0) also deprecated earlier architectures like SM30. ↓

fix Ensure your GPU hardware has a compute capability of SM50 or higher for CUDA 12.0+ applications.
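A minimal sketch of that check, assuming CuPy is available to query the device (its `getDeviceProperties` call returns a dictionary that includes the `major` and `minor` compute capability fields). `MIN_CC` and `meets_minimum` are hypothetical names used for illustration.

```python
MIN_CC = (5, 0)  # SM50, the minimum stated above for CUDA 12.0+


def meets_minimum(major, minor, minimum=MIN_CC):
    """True if compute capability (major, minor) is at least `minimum`."""
    return (major, minor) >= minimum


if __name__ == "__main__":
    try:
        import cupy as cp  # optional; only needed to query the real device
        props = cp.cuda.runtime.getDeviceProperties(0)
        major, minor = props["major"], props["minor"]
    except Exception as exc:  # no CuPy, no driver, or no GPU
        print("Could not query device 0:", exc)
    else:
        verdict = "supported" if meets_minimum(major, minor) else "below the SM50 minimum"
        print(f"compute capability {major}.{minor}: {verdict}")
```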

deprecated Legacy cuFFT callback functionality: Support for callback routines using separately compiled device code (legacy callbacks) has been deprecated since CUDA 11.4. As of CUDA 11.8, CUDA Graphs capture is no longer supported for legacy callback routines that load data in out-of-place mode transforms. ↓

fix Migrate to Link-Time Optimized (LTO) callbacks, which are supported from CUDA 12.6 Update 2 onwards, to avoid deprecation issues and leverage improved performance.

gotcha Performance degradation with legacy callbacks: Users have reported significant performance decreases (up to 20% or more) when using legacy cuFFT callbacks in CUDA 11.8 and newer (e.g., 12.2, 12.4, 12.9+) compared to CUDA 11.7. This often manifests as increased time spent in `cuMemFree_v2` during `cufftExecC2R` or `R2C` operations. ↓

fix Consider updating to CUDA 12.6 Update 2 or newer and migrating to LTO callbacks, or investigate the performance implications for your specific callback implementations.

gotcha Memory leak with `nvc++ -cudalib=cufft`: A potential memory leak was reported in cuFFT version 10.9.0.58 (shipped with CUDA 11.8) when programs were built with `nvc++` and the `-cudalib=cufft` flag. It was linked to cuFFT failing to deallocate internal structures when the CUDA context active at program finalization was not the one used for plan creation. ↓

fix This issue was not observed in CUDA 12.0 and later. If using CUDA 11.8, ensure your CUDA context management is consistent or consider upgrading to a newer CUDA Toolkit version. Replacing `-cudalib=cufft` with `-lcufft` during compilation was also noted as a workaround.

gotcha Interference of `cudaDeviceReset()` with `cufftPlanMany`: Calling `cudaDeviceReset()` before `cufftPlanMany` can lead to `CUFFT_INTERNAL_ERROR`. While adding `cudaSetDevice(0)` after the reset might mitigate it, `cudaDeviceReset()` is generally not recommended for regular use. ↓

fix Avoid using `cudaDeviceReset()` in critical paths before cuFFT plan creation. If absolutely necessary, re-establish the CUDA device context (e.g., with `cudaSetDevice(0)`) after `cudaDeviceReset()`.

gotcha `CUFFT_INTERNAL_ERROR` in `cufftXtSetGPU` for multi-GPU FFTs: When performing large multi-GPU FFTs, `cufftXtSetGPU` can return an opaque 'internal error,' potentially indicating an out-of-memory condition or an unspecified library issue. ↓

fix Monitor GPU memory usage for large multi-GPU FFTs. If the error persists, consider reducing data size or reporting as a bug with NVIDIA, as 'internal error' provides limited actionable information.

gotcha Installation timeouts/failures with concurrent downloads from `pypi.nvidia.com`: Users attempting to install `nvidia-cufft-cu12` (and other NVIDIA PyPI packages) with tools that use concurrent downloads (e.g., `uv`) may experience failures due to timeout or network issues with `pypi.nvidia.com`. ↓

fix Try setting the environment variable `UV_CONCURRENT_DOWNLOADS=1` (for `uv` users) or similar mechanisms to limit concurrent downloads when installing from `pypi.nvidia.com`.

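breaking Attempting to install some NVIDIA Python packages directly from PyPI.org can result in a `RuntimeError` indicating the package is a placeholder; such packages are hosted on the NVIDIA Python Package Index and require a specific installation method. The install results on this page show `nvidia-cufft-cu12` itself installing from a wheel with plain `pip install`, so this mainly matters if the placeholder error appears for another NVIDIA package in the same environment. ↓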

fix Install `nvidia-pyindex` first using `pip install nvidia-pyindex`, then install the desired package. Alternatively, configure pip to use the NVIDIA Python Package Index directly by adding `--extra-index-url https://pypi.nvidia.com` to your pip command or by configuring your pip.conf/pip.ini.

Install

pip install nvmath-python[cu12]

Install compatibility stale last tested: 2026-05-12

| python | os / libc | variant | status | wheel | install | import | disk |
|---|---|---|---|---|---|---|---|
| 3.10 | alpine (musl) | nvidia-cufft-cu12 | build_error | - | - | - | - |
| 3.10 | alpine (musl) | cu12 | build_error | - | - | - | - |
| 3.10 | alpine (musl) | nvidia-cufft-cu12 | | - | - | - | - |
| 3.10 | alpine (musl) | cu12 | | - | - | - | - |
| 3.10 | slim (glibc) | nvidia-cufft-cu12 | | wheel | 6.6s | - | 390M |
| 3.10 | slim (glibc) | cu12 | | wheel | 56.4s | - | 3.4G |
| 3.10 | slim (glibc) | nvidia-cufft-cu12 | | - | - | - | - |
| 3.10 | slim (glibc) | cu12 | | - | - | - | - |
| 3.11 | alpine (musl) | nvidia-cufft-cu12 | build_error | - | - | - | - |
| 3.11 | alpine (musl) | cu12 | build_error | - | - | - | - |
| 3.11 | alpine (musl) | nvidia-cufft-cu12 | | - | - | - | - |
| 3.11 | alpine (musl) | cu12 | | - | - | - | - |
| 3.11 | slim (glibc) | nvidia-cufft-cu12 | | wheel | 6.2s | - | 392M |
| 3.11 | slim (glibc) | cu12 | | wheel | 54.7s | - | 3.4G |
| 3.11 | slim (glibc) | nvidia-cufft-cu12 | | - | - | - | - |
| 3.11 | slim (glibc) | cu12 | | - | - | - | - |
| 3.12 | alpine (musl) | nvidia-cufft-cu12 | build_error | - | - | - | - |
| 3.12 | alpine (musl) | cu12 | build_error | - | - | - | - |
| 3.12 | alpine (musl) | nvidia-cufft-cu12 | | - | - | - | - |
| 3.12 | alpine (musl) | cu12 | | - | - | - | - |
| 3.12 | slim (glibc) | nvidia-cufft-cu12 | | wheel | 5.8s | - | 384M |
| 3.12 | slim (glibc) | cu12 | | wheel | 49.0s | - | 3.4G |
| 3.12 | slim (glibc) | nvidia-cufft-cu12 | | - | - | - | - |
| 3.12 | slim (glibc) | cu12 | | - | - | - | - |
| 3.13 | alpine (musl) | nvidia-cufft-cu12 | build_error | - | - | - | - |
| 3.13 | alpine (musl) | cu12 | build_error | - | - | - | - |
| 3.13 | alpine (musl) | nvidia-cufft-cu12 | | - | - | - | - |
| 3.13 | alpine (musl) | cu12 | | - | - | - | - |
| 3.13 | slim (glibc) | nvidia-cufft-cu12 | | wheel | 5.7s | - | 383M |
| 3.13 | slim (glibc) | cu12 | | wheel | 46.2s | - | 3.4G |
| 3.13 | slim (glibc) | nvidia-cufft-cu12 | | - | - | - | - |
| 3.13 | slim (glibc) | cu12 | | - | - | - | - |
| 3.9 | alpine (musl) | nvidia-cufft-cu12 | build_error | - | - | - | - |
| 3.9 | alpine (musl) | cu12 | build_error | - | - | - | - |
| 3.9 | alpine (musl) | nvidia-cufft-cu12 | | - | - | - | - |
| 3.9 | alpine (musl) | cu12 | | - | - | - | - |
| 3.9 | slim (glibc) | nvidia-cufft-cu12 | | wheel | 7.0s | - | 390M |
| 3.9 | slim (glibc) | cu12 | | wheel | 53.0s | - | 2.9G |
| 3.9 | slim (glibc) | nvidia-cufft-cu12 | | - | - | - | - |
| 3.9 | slim (glibc) | cu12 | | - | - | - | - |

Imports

cuFFT functionality
This package is a low-level runtime library that provides the underlying C/C++ binaries for GPU-accelerated FFTs; it is not directly imported in Python. Python users typically reach cuFFT through frameworks such as PyTorch or TensorFlow, or through dedicated wrappers such as `nvmath-python`, all of which use `nvidia-cufft-cu12` under the hood.
nvmath.fft
```
from nvmath.fft import fft, ifft
```
This is the recommended way to directly access cuFFT functionalities in Python via NVIDIA's `nvmath-python` library, which depends on `nvidia-cufft-cu12`.
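Because the runtime package has no importable module of its own, one way to see which Python front end is present is to look for the modules named in this section. The sketch below does that with the standard library only and does not import the packages themselves; `available` is a hypothetical helper name.

```python
import importlib.util


def available(modules):
    """Map each top-level module name to whether it can be found (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Front ends named in this section; any one of them can sit on top of cuFFT.
    for name, found in available(["nvmath", "torch", "tensorflow"]).items():
        print(f"{name}: {'found' if found else 'not installed'}")
```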

Quickstart stale last tested: 2026-04-24

This quickstart demonstrates how to perform a 1D complex-to-complex FFT and inverse FFT using `nvmath-python`, which leverages the `nvidia-cufft-cu12` runtime library. Ensure `cupy` is also installed (it's a dependency of `nvmath-python[cu12]`) for GPU array operations.

import nvmath.fft as nvfft
import cupy as cp

# Requires a CUDA-capable GPU and a working nvmath-python install
# (e.g., pip install nvmath-python[cu12]), plus CuPy for GPU arrays.

# Example: Perform a 1D complex-to-complex FFT using nvmath-python
size = 1024
x = cp.arange(size, dtype=cp.complex64)

# Perform forward FFT
y = nvfft.fft(x)

# Perform inverse FFT
z = nvfft.ifft(y)

print(f"Original data (first 5 elements): {x[:5].tolist()}")
print(f"FFT result (first 5 elements): {y[:5].tolist()}")
print(f"Inverse FFT result (first 5 elements): {z[:5].tolist()}")
print(f"Difference from original (max abs error): {cp.max(cp.abs(x - z))}")