cuDF - GPU Dataframe
cuDF is a GPU-accelerated Python DataFrame library that mirrors the pandas API, enabling data scientists to perform data manipulation and analytics tasks entirely on NVIDIA GPUs. It is a core component of the RAPIDS suite of open-source libraries, designed to significantly speed up data processing for large datasets by leveraging GPU parallelism and memory bandwidth. cuDF is actively developed with frequent releases, typically aligned with the RAPIDS project's release cycle.
Common errors
- ModuleNotFoundError: No module named 'cudf'
cause cuDF was not installed correctly, the Python environment is not configured to find it, or the CUDA version specified in the `pip install` command (`cudf-cu12`) does not match the system's CUDA environment.
fix Ensure you are installing `cudf-cu12` (or the correct CUDA version suffix) with `pip install cudf-cu12 rmm-cu12 --extra-index-url https://pypi.nvidia.com`. Verify your CUDA toolkit installation and that `nvcc --version` reports a compatible version. Restart your Python kernel or environment.
- error: subprocess-exited-with-error × Getting requirements to build wheel did not run successfully.
cause This often occurs during `pip install` on Windows, or with incompatible Python versions (e.g., Python 3.12 was an issue for older cuDF versions), or when build dependencies are missing.
fix Ensure you are using a Python version supported by the specific cuDF release (e.g., `>=3.11` for current). For Windows, consider using WSL2 or the official RAPIDS Docker images, as native Windows support can be challenging. Ensure `rmm-cu12` is included in the install command.
- Arrow versions required by Ibis and the Arrow versions required by cuDF.
cause Dependency conflicts, especially concerning `pyarrow`, are common. cuDF often has strict (and sometimes rapidly changing) requirements for `pyarrow` due to ABI compatibility with its underlying C++ components.
fix Try installing `pyarrow` first, then `cudf-cu12` (e.g., `pip install pyarrow==<compatible_version>` then `pip install cudf-cu12 rmm-cu12 --extra-index-url https://pypi.nvidia.com`). If using conda, ensure all `rapidsai`, `conda-forge`, and `nvidia` channels are configured and use a single `conda install` command for all relevant packages to resolve dependencies holistically.
- Operations not running on GPU / fallback to CPU unexpectedly.
cause This can happen if data is implicitly copied to the CPU, if an operation is not yet GPU-accelerated by cuDF, or if `cudf.pandas` is not correctly installed/enabled before `pandas` is imported.
fix For `cudf.pandas`, ensure it's enabled as the very first step (`%load_ext cudf.pandas` or `import cudf.pandas; cudf.pandas.install()`). Profile your code to identify CPU bottlenecks (`cudf.pandas` has profiling features). Convert `pandas` DataFrames to `cudf.DataFrame` explicitly (`cudf.DataFrame.from_pandas(pdf)`) to keep data on the GPU where possible. Review cuDF documentation for supported operations and their GPU acceleration status.
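Several of the errors above come down to what the active Python environment can actually see. The following is a minimal diagnostic sketch that uses only the standard library. The helper name `describe_environment` is hypothetical, and the check only reports whether the modules and `nvcc` are discoverable, not whether their versions are compatible with each other.

```python
import importlib.util
import shutil


def describe_environment(modules=("cudf", "rmm", "pyarrow")):
    """Report which modules are locatable and whether nvcc is on PATH.

    Hypothetical helper for illustration. It does not import the modules,
    so it is safe to run on a machine without a GPU.
    """
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    # shutil.which returns the full path to nvcc, or None if it is not on PATH.
    report["nvcc"] = shutil.which("nvcc")
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cudf` shows `False` in the kernel where the import fails but `True` in a terminal, the two are most likely using different environments.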
Warnings
- breaking Starting with v26.04.00, cuDF requires PyArrow 19 or newer. Older PyArrow versions will lead to installation failures or runtime errors due to ABI incompatibilities.
- breaking The `DataFrame.apply_rows`, `DataFrame.apply_chunks`, and `Groupby.apply_grouped` APIs were deprecated in v25.10.00 and fully removed in v25.12.00.
- breaking cuDF v25.08.00 and later officially drop support for CUDA 11, requiring CUDA 12.x or newer. Users on older CUDA versions will experience build or runtime failures.
- deprecated The `nvtext::byte_pair_encoding` APIs were deprecated in v26.04.00.
- gotcha When using `cudf.pandas`, avoid mixing it with direct `import cudf` statements in the same session/script. `cudf.pandas` operates as a pandas accelerator and manages data movement, while direct `cudf` imports expect GPU DataFrames.
- gotcha The default behavior for filling missing values in `Series/Index.values` changed to `np.nan` for numeric types in v25.10.00. This might alter downstream computations if `None` or another sentinel value was implicitly expected.
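The PyArrow requirement in the first warning can be checked before upgrading cuDF. This is a rough sketch, assuming that comparing the major version number is enough. The function names are hypothetical, and the threshold of 19 is taken from the warning above rather than from any cuDF API.

```python
from importlib import metadata

MIN_PYARROW_MAJOR = 19  # from the warning above for cuDF v26.04.00 and later


def major_version(version):
    """Return the leading integer of a version string such as '19.0.1'."""
    return int(version.split(".")[0])


def pyarrow_is_new_enough(version, minimum=MIN_PYARROW_MAJOR):
    """Compare only the major version against the documented minimum."""
    return major_version(version) >= minimum


def installed_pyarrow_version():
    """Return the installed pyarrow version string, or None if absent."""
    try:
        return metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pyarrow_version()
    if found is None:
        print("pyarrow is not installed")
    elif pyarrow_is_new_enough(found):
        print(f"pyarrow {found} satisfies the >= {MIN_PYARROW_MAJOR} requirement")
    else:
        print(f"pyarrow {found} is older than {MIN_PYARROW_MAJOR}")
```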
Install
- pip install cudf-cu12 rmm-cu12 --extra-index-url https://pypi.nvidia.com
- conda install -c rapidsai -c conda-forge -c nvidia cudf=26.04 python='>=3.11,<3.12' cuda-version=12.0
Imports
- cudf
import cudf
- cudf.pandas
import pandas as pd; import cudf.pandas; cudf.pandas.install() # wrong: pandas is imported before install()
import cudf.pandas; cudf.pandas.install() # correct: runs before import pandas
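The ordering matters because `cudf.pandas.install()` has to run before `pandas` is first imported. One way to catch a wrong order early is to look at `sys.modules`, as in this sketch. Both helper names are hypothetical, and the only cuDF call used is `cudf.pandas.install()` as shown above.

```python
import sys


def pandas_already_imported(modules=None):
    """Return True if pandas has been imported in this interpreter."""
    modules = sys.modules if modules is None else modules
    return "pandas" in modules


def enable_cudf_pandas():
    """Try to enable the accelerator; return True on success.

    Hypothetical wrapper: it refuses to run if pandas is already loaded
    and returns False when cuDF is not installed.
    """
    if pandas_already_imported():
        raise RuntimeError(
            "pandas is already imported; restart the interpreter and "
            "enable cudf.pandas first"
        )
    try:
        import cudf.pandas
    except ImportError:
        return False
    cudf.pandas.install()
    return True
```

In a script, `enable_cudf_pandas()` would be called at the very top, followed by `import pandas as pd`.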
Quickstart
import cudf
# Create a cuDF DataFrame from a dictionary
data = {'col1': [1, 2, 3, 4], 'col2': [10.0, 20.0, 15.0, 25.0], 'col3': ['A', 'B', 'C', 'A']}
df = cudf.DataFrame(data)
print("Original DataFrame:")
print(df)
# Perform a groupby aggregation
grouped_df = df.groupby('col3').agg({'col1': 'sum', 'col2': 'mean'})
print("\nGrouped DataFrame:")
print(grouped_df)
# Using the cudf.pandas accelerator (restart kernel if pandas was already imported)
# %load_ext cudf.pandas # For Jupyter/IPython
# import cudf.pandas; cudf.pandas.install() # For scripts, before import pandas
# import pandas as pd
# pdf = pd.DataFrame(data) # This would be accelerated by cuDF.pandas
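To sanity-check the grouped output without a GPU, the same aggregation can be reproduced in plain Python. This is only a CPU cross-check of the arithmetic (sum of `col1` and mean of `col2` per `col3` value), not how cuDF computes it. The helper `groupby_sum_mean` is a hypothetical name.

```python
from collections import defaultdict

data = {
    "col1": [1, 2, 3, 4],
    "col2": [10.0, 20.0, 15.0, 25.0],
    "col3": ["A", "B", "C", "A"],
}


def groupby_sum_mean(table, key, sum_col, mean_col):
    """Group rows by `key`, summing `sum_col` and averaging `mean_col`."""
    sums = defaultdict(int)
    mean_values = defaultdict(list)
    for k, s, m in zip(table[key], table[sum_col], table[mean_col]):
        sums[k] += s
        mean_values[k].append(m)
    return {
        k: {sum_col: sums[k], mean_col: sum(mean_values[k]) / len(mean_values[k])}
        for k in sorted(sums)
    }


expected = groupby_sum_mean(data, "col3", "col1", "col2")
print(expected)
# {'A': {'col1': 5, 'col2': 17.5}, 'B': {'col1': 2, 'col2': 20.0}, 'C': {'col1': 3, 'col2': 15.0}}
```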