cuDF - GPU DataFrame

26.4.0 · active · verified Thu Apr 16

cuDF is a GPU-accelerated Python DataFrame library that closely follows the pandas API, letting data scientists run data manipulation and analytics tasks on NVIDIA GPUs. It is a core component of RAPIDS, a suite of open-source libraries, and speeds up processing of large datasets by exploiting GPU parallelism and memory bandwidth. cuDF is actively developed, with frequent releases that typically follow the RAPIDS release cycle.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart creates a cuDF DataFrame and runs a basic groupby aggregation, much as you would in pandas. The commented lines at the end show how to enable `cudf.pandas` for zero-code-change GPU acceleration of existing pandas workflows. It must be activated before `pandas` is imported.

import cudf

# Create a cuDF DataFrame from a dictionary
data = {'col1': [1, 2, 3, 4], 'col2': [10.0, 20.0, 15.0, 25.0], 'col3': ['A', 'B', 'C', 'A']}
df = cudf.DataFrame(data)
print("Original DataFrame:")
print(df)

# Perform a groupby aggregation
grouped_df = df.groupby('col3').agg({'col1': 'sum', 'col2': 'mean'})
print("\nGrouped DataFrame:")
print(grouped_df)

# Using the cudf.pandas accelerator. It must be enabled before pandas is imported;
# if pandas was already imported, restart the kernel or interpreter first.
# %load_ext cudf.pandas  # In Jupyter/IPython
# import cudf.pandas; cudf.pandas.install()  # In scripts, before `import pandas`
# import pandas as pd
# pdf = pd.DataFrame(data)  # Now accelerated by cudf.pandas where supported
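
Because the accelerator only takes effect when it is enabled before `pandas` is imported, a script can check for that condition up front. The following is a minimal sketch, not part of cuDF: `pandas_already_imported` and `enable_cudf_pandas` are hypothetical helper names introduced here for illustration, and the only cuDF call used is `cudf.pandas.install()` from the quickstart above.

```python
import sys


def pandas_already_imported(modules=None):
    """Return True if pandas is already loaded in this interpreter.

    `modules` defaults to sys.modules; passing a dict makes the check
    easy to exercise without touching the real import state.
    """
    modules = sys.modules if modules is None else modules
    return "pandas" in modules


def enable_cudf_pandas():
    """Hypothetical guard: enable cudf.pandas only when it is still safe.

    Returns True if the accelerator was installed, and False if pandas
    was imported earlier or cuDF is not installed.
    """
    if pandas_already_imported():
        # Too late for this process: restart the interpreter or kernel.
        return False
    try:
        import cudf.pandas
    except ImportError:
        # cuDF is not available; plain pandas will run on the CPU.
        return False
    cudf.pandas.install()
    return True
```

Calling `enable_cudf_pandas()` at the very top of a script, before any `import pandas`, lets the rest of the code stay unchanged whether or not the accelerator could be enabled.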
