cuDF - GPU Dataframe (C++)
`libcudf-cu12` packages libcudf, the C++ library underlying cuDF, a GPU-accelerated DataFrame library for Python that is part of the NVIDIA RAPIDS ecosystem. cuDF enables pandas-like data manipulation directly on the GPU, using CUDA for high performance. This PyPI package primarily serves as a runtime dependency of the Python `cudf-cu12` package. cuDF uses calendar versioning (YY.MM.patch) with a rapid release cadence, and releases regularly introduce new features and breaking changes. The current stable version is `26.4.0`.
Common errors
- ModuleNotFoundError: No module named 'cudf' (or ImportError: cannot import name 'cudf' from 'cudf')
  cause: The Python `cudf` package is not installed or not accessible in the environment, even if `libcudf-cu12` (the C++ runtime package) is present. Installing `libcudf-cu12` alone may not provide the `cudf` module, and installation or environment path issues can also occur.
  fix: Verify that `cudf-cu12` is present in your environment: `pip list | grep cudf`. If it is missing or the wrong version is installed, reinstall it directly: `pip install --upgrade --no-deps cudf-cu12`, or `pip install cudf-cu12==<VERSION>` matching your CUDA version.
- CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
  cause: The system lacks a compatible NVIDIA GPU, the CUDA drivers are not correctly installed, or the environment is misconfigured so that the GPU is not accessible. This indicates a fundamental issue with the CUDA setup.
  fix: Ensure that an NVIDIA CUDA-capable GPU is installed, the NVIDIA drivers are up to date, and the CUDA Toolkit is correctly installed and configured. Verify that `nvidia-smi` runs successfully. Check whether the `CUDA_VISIBLE_DEVICES` environment variable is set incorrectly and hiding the GPU.
- PyArrowTypeError: Expected array of type Int64, got array of type Timestamp[ns]
  cause: The installed `cudf` and `pyarrow` versions are incompatible. `cudf` has strict `pyarrow` requirements, and even minor mismatches can lead to type or schema validation failures during data conversion.
  fix: Pin `pyarrow` to the version required by your `cudf` installation. Check the `Requires-Dist` metadata of `cudf-cu12` or refer to the `cudf` documentation for the `pyarrow` version compatible with your `cudf` release, then install it (e.g., `pip install pyarrow==<REQUIRED_VERSION>`).
- cupy.cuda.runtime.CUDARuntimeError: cudaErrorInitializationError: initialization error
  cause: This error often indicates a mismatch between the `cupy-cudaXX` package and the installed CUDA Toolkit, or a more fundamental problem with GPU context initialization or available resources.
  fix: Ensure that your `cupy-cudaXX` package matches your system's CUDA Toolkit version (e.g., `cupy-cuda12x` for CUDA 12.x). Verify that `cudf` and `cupy` are compatible. Restarting the Python kernel or environment can sometimes resolve transient GPU context issues.
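Several of the checks above can be gathered in one pass before digging into a specific error. The following is a minimal sketch using only the Python standard library; the function name `diagnose_gpu_environment` and the report keys are hypothetical and chosen for illustration. It does not talk to the GPU: it only reports whether the relevant modules can be located, what `CUDA_VISIBLE_DEVICES` is set to, and whether `nvidia-smi` is on the `PATH`.

```python
import importlib.util
import os
import shutil


def diagnose_gpu_environment(modules=("cudf", "cupy", "pyarrow")):
    """Collect basic facts relevant to the import and CUDA errors listed above.

    Hypothetical helper for illustration; it inspects the environment only
    and never initializes CUDA.
    """
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    # None means the variable is unset; a value such as "" or "-1" typically hides all GPUs
    report["CUDA_VISIBLE_DEVICES"] = os.environ.get("CUDA_VISIBLE_DEVICES")
    # nvidia-smi is installed with the NVIDIA driver; if it is not on PATH,
    # the driver may be missing or PATH may be misconfigured
    report["nvidia-smi on PATH"] = shutil.which("nvidia-smi") is not None
    return report


if __name__ == "__main__":
    for key, value in diagnose_gpu_environment().items():
        print(f"{key}: {value}")
```

A `False` for `module:cudf` points toward the first error in the list, while a missing `nvidia-smi` or a restrictive `CUDA_VISIBLE_DEVICES` value points toward the device and initialization errors.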
Warnings
- breaking: cuDF introduces breaking changes frequently across releases, affecting API signatures (e.g., the removal of `DataFrame.apply_chunks`), internal behaviors, and default arguments. Examples include changes to partitioning APIs and the removal of resource management functions.
- gotcha: Strict compatibility requirements exist for `pyarrow`, `cupy-cudaXX`, and the underlying CUDA Toolkit. Mismatched versions are a common source of runtime errors, unexpected behavior, and performance issues, especially after minor cuDF updates. For `v26.04.00`, `pyarrow 19` is required.
- deprecated: Several APIs have been deprecated or removed in recent versions, including specific `nvtext` functions (`byte_pair_encoding`), `DataFrame.apply_rows`, `DataFrame.apply_chunks`, and certain `cudf::round` overloads for float types.
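To see which dependency versions an installed package actually declares, its metadata can be read with `importlib.metadata` from the standard library. The sketch below is illustrative only: the helper names `matching_requirements` and `declared_requirement` are hypothetical, and the name matching is a simplified approximation of requirement parsing rather than a full PEP 508 parser.

```python
import re
from importlib import metadata


def _normalize(name):
    # Treat runs of "-", "_" and "." as equivalent and ignore case
    return re.sub(r"[-_.]+", "-", name).lower()


def matching_requirements(requirements, dependency):
    """Return the requirement strings whose project name equals `dependency`."""
    wanted = _normalize(dependency)
    matches = []
    for req in requirements or []:
        # The project name is the leading run of name characters in the string
        name = re.match(r"[A-Za-z0-9._-]+", req.strip())
        if name and _normalize(name.group(0)) == wanted:
            matches.append(req)
    return matches


def declared_requirement(distribution, dependency):
    """Look up what an installed distribution declares for one dependency."""
    try:
        return matching_requirements(metadata.requires(distribution), dependency)
    except metadata.PackageNotFoundError:
        return []  # the distribution is not installed in this environment


if __name__ == "__main__":
    # Prints an empty list when cudf-cu12 is not installed
    print(declared_requirement("cudf-cu12", "pyarrow"))
    print(declared_requirement("cudf-cu12", "cupy-cuda12x"))
```

Comparing the printed specifiers against the versions reported by `pip list` is one way to spot a mismatch before it surfaces as a runtime error.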
Install
- pip install libcudf-cu12
Imports
- cudf
import libcudf
import cudf
- DataFrame
import cudf
gdf = cudf.DataFrame(...)
Quickstart
import cudf
import numpy as np
# Create a cuDF DataFrame directly on the GPU
data = {
    'col1': np.random.rand(10),
    'col2': np.random.randint(0, 100, 10),
}
gdf = cudf.DataFrame(data)
print("cuDF DataFrame head:")
print(gdf.head())
# float() works whether mean() returns a Python float or a NumPy scalar
print(f"\nMean of col1: {float(gdf['col1'].mean())}")