{"id":8044,"library":"cudf-cu12","title":"cuDF - GPU DataFrame","description":"cuDF is a GPU-accelerated Python DataFrame library that mirrors the pandas API, enabling data scientists to perform data manipulation and analytics tasks entirely on NVIDIA GPUs. It is a core component of the RAPIDS suite of open-source libraries, designed to speed up processing of large datasets by leveraging GPU parallelism and memory bandwidth. cuDF is actively developed, with frequent releases typically aligned with the RAPIDS release cycle.","status":"active","version":"26.4.0","language":"en","source_language":"en","source_url":"https://github.com/rapidsai/cudf","tags":["GPU","DataFrame","RAPIDS","data science","pandas-like","cuda"],"install":[{"cmd":"pip install cudf-cu12 rmm-cu12 --extra-index-url https://pypi.nvidia.com","lang":"bash","label":"Recommended for CUDA 12.x"},{"cmd":"conda install -c rapidsai -c conda-forge -c nvidia cudf=26.04 python=3.11 cuda-version=12.0","lang":"bash","label":"Conda installation example for CUDA 12.x"}],"dependencies":[{"reason":"Requires Python >=3.11.","package":"python","optional":false},{"reason":"Essential for I/O operations (e.g., Parquet, Feather) and data interoperability; strict version requirements often apply.","package":"pyarrow","optional":false},{"reason":"RAPIDS Memory Manager, used by cuDF for efficient GPU memory allocation.","package":"rmm","optional":false},{"reason":"Requires an NVIDIA GPU and a driver that supports CUDA 12.x. The pip wheels and conda packages provide the CUDA runtime libraries they need, so a separate system-wide CUDA Toolkit installation is generally not required.","package":"cuda-toolkit","optional":false}],"imports":[{"note":"Standard import for direct cuDF DataFrame operations.","symbol":"cudf","correct":"import cudf"},{"note":"To enable pandas accelerator mode, call `cudf.pandas.install()` *before* pandas is imported or used. In Jupyter/IPython, run `%load_ext cudf.pandas` as the first command; for scripts, `python -m cudf.pandas script.py` is an alternative. Mixing objects from direct `import cudf` usage with `cudf.pandas`-accelerated pandas objects is not recommended.","wrong":"import pandas as pd; import cudf.pandas; cudf.pandas.install()","symbol":"cudf.pandas","correct":"import cudf.pandas; cudf.pandas.install() # before import pandas"}],"quickstart":{"code":"import cudf\n\n# Create a cuDF DataFrame from a dictionary\ndata = {'col1': [1, 2, 3, 4], 'col2': [10.0, 20.0, 15.0, 25.0], 'col3': ['A', 'B', 'C', 'A']}\ndf = cudf.DataFrame(data)\nprint(\"Original DataFrame:\")\nprint(df)\n\n# Perform a groupby aggregation\ngrouped_df = df.groupby('col3').agg({'col1': 'sum', 'col2': 'mean'})\nprint(\"\\nGrouped DataFrame:\")\nprint(grouped_df)\n\n# To use the cudf.pandas accelerator instead, run the lines below in a fresh kernel or process\n# (importing cudf above already imports pandas, so install() would come too late here)\n# %load_ext cudf.pandas  # For Jupyter/IPython\n# import cudf.pandas; cudf.pandas.install() # For scripts, before import pandas\n# import pandas as pd\n# pdf = pd.DataFrame(data) # This would be accelerated by cudf.pandas\n","lang":"python","description":"This quickstart demonstrates creating a cuDF DataFrame and performing a basic groupby aggregation, similar to pandas. The trailing comments show how to enable `cudf.pandas` for zero-code-change GPU acceleration of existing pandas workflows, which must be activated before `pandas` is imported."},"warnings":[{"fix":"Ensure your environment has `pyarrow>=19` installed. If using conda, update `pyarrow` and `cudf` together.","message":"Starting with v26.04.00, cuDF requires PyArrow 19 or newer. Older PyArrow versions will lead to installation failures or runtime errors due to ABI incompatibilities.","severity":"breaking","affected_versions":">=26.04.00"},{"fix":"Rewrite the logic with vectorized column operations where possible. For custom row-wise logic, use `DataFrame.apply(func, axis=1)` or `Series.apply` (Numba-compiled UDFs); for per-group logic, use `GroupBy.apply`. With Dask-cuDF, `map_partitions` can apply custom logic to each partition.","message":"The `DataFrame.apply_rows`, `DataFrame.apply_chunks`, and `GroupBy.apply_grouped` APIs were deprecated in v25.10.00 and removed in v25.12.00.","severity":"breaking","affected_versions":">=25.10.00"},{"fix":"Install an NVIDIA driver that supports CUDA 12.x and use the package suffix that matches your CUDA major version (e.g., `cudf-cu12` and `rmm-cu12` for CUDA 12.x).","message":"cuDF v25.08.00 and later drop support for CUDA 11 and require CUDA 12.x or newer. Users on CUDA 11 will see installation or runtime failures.","severity":"breaking","affected_versions":">=25.08.00"},{"fix":"Consult the `nvtext` documentation for recommended replacement APIs for byte pair encoding, or consider alternative text processing methods.","message":"The `nvtext::byte_pair_encoding` APIs were deprecated in v26.04.00.","severity":"deprecated","affected_versions":">=26.04.00"},{"fix":"Choose either `cudf.pandas` (installed before importing `pandas`) for pandas-style workflows or `import cudf` for explicit cuDF usage. Do not use both in the same execution context unless you explicitly manage conversions between `cudf.DataFrame` and `pandas.DataFrame`.","message":"When using `cudf.pandas`, avoid mixing it with direct `import cudf` usage in the same session or script. `cudf.pandas` acts as a pandas accelerator that manages data movement between GPU and CPU for you, while the `cudf` API works directly with `cudf.DataFrame` objects on the GPU.","severity":"gotcha","affected_versions":"All"},{"fix":"Handle missing values explicitly with `.fillna()` or `.replace()` if you need a fill value other than `np.nan`.","message":"In v25.10.00, the default fill value for missing entries in `Series.values` and `Index.values` changed to `np.nan` for numeric types. This might alter downstream computations if `None` or another sentinel value was implicitly expected.","severity":"gotcha","affected_versions":">=25.10.00"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure you are installing `cudf-cu12` (or the package whose suffix matches your CUDA version) with `pip install cudf-cu12 rmm-cu12 --extra-index-url https://pypi.nvidia.com`. Check the driver with `nvidia-smi`, which reports the highest CUDA version the installed driver supports. Restart your Python kernel or environment after installing.","cause":"cuDF was not installed correctly, the active Python environment is not the one cuDF was installed into, or the CUDA suffix in the `pip install` command (`cudf-cu12`) does not match the system's CUDA environment.","error":"ModuleNotFoundError: No module named 'cudf'"},{"fix":"Use a Python version supported by the specific cuDF release (e.g., `>=3.11` for current releases). cuDF does not support native Windows; use WSL2 or the official RAPIDS Docker images instead. Ensure `rmm-cu12` is included in the install command.","cause":"This often occurs during `pip install` on native Windows, with unsupported Python versions (e.g., Python 3.12 with older cuDF releases), or when build dependencies are missing.","error":"error: subprocess-exited-with-error × Getting requirements to build wheel did not run successfully."},{"fix":"Try installing a compatible `pyarrow` first, then `cudf-cu12` (e.g., `pip install pyarrow==<compatible_version>` followed by `pip install cudf-cu12 rmm-cu12 --extra-index-url https://pypi.nvidia.com`). With conda, configure the `rapidsai`, `conda-forge`, and `nvidia` channels and install all relevant packages in a single `conda install` command so dependencies are resolved together.","cause":"Dependency conflicts, especially involving `pyarrow`, are common. cuDF often has strict (and sometimes rapidly changing) `pyarrow` requirements due to ABI compatibility with its underlying C++ components.","error":"Dependency conflict between the Arrow versions required by Ibis and the Arrow versions required by cuDF."},{"fix":"For `cudf.pandas`, enable it as the very first step (`%load_ext cudf.pandas` or `import cudf.pandas; cudf.pandas.install()`). Profile your code to find operations that fall back to the CPU (e.g., with the `%%cudf.pandas.profile` cell magic or `python -m cudf.pandas --profile script.py`). When using cuDF directly, convert `pandas` DataFrames explicitly with `cudf.DataFrame.from_pandas(pdf)` and keep data on the GPU where possible. Review the cuDF documentation for supported operations and their GPU acceleration status.","cause":"This can happen if data is implicitly copied to the CPU, if an operation is not yet GPU-accelerated by cuDF, or if `cudf.pandas` was not enabled before `pandas` was imported.","error":"Operations not running on GPU / fallback to CPU unexpectedly."}]}