BigQuery DataFrames (bigframes)

2.39.0 · active · verified Sun Apr 12

BigQuery DataFrames (bigframes) provides a scalable Python DataFrame and machine learning (ML) API powered by the BigQuery engine. Its pandas-like interface lets you analyze and manipulate data directly within BigQuery, so terabyte-scale datasets are processed by the BigQuery engine rather than in local memory, and it integrates with BigQuery ML and Vertex AI. The library is actively maintained, currently at version 2.39.0, with frequent releases that add features and improvements.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to initialize BigQuery DataFrames with your GCP project ID and load data from a public BigQuery table. It then performs a basic operation (`head()`) to trigger query execution and display results. Ensure you have authenticated to Google Cloud and enabled the BigQuery API for your project.

import os

import bigframes.pandas as bpd

# Set your GCP Project ID. Ensure the BigQuery API is enabled for this project.
# For local development, authenticate using `gcloud auth application-default login`.
PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'your-gcp-project-id')

bpd.options.bigquery.project = PROJECT_ID
# bpd.options.bigquery.location = "US" # Uncomment and set if your dataset is not in US multi-region
# bpd.options.bigquery.ordering_mode = "partial" # Recommended for performance

# Load a public BigQuery dataset into a BigQuery DataFrame
df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")

# Perform a simple operation and display the head (triggers computation)
print(df.head())
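
The quickstart falls back to the literal placeholder `'your-gcp-project-id'` when `GCP_PROJECT_ID` is unset, so a missing variable only shows up later as a BigQuery error. The following is a minimal sketch of a fail-fast variant, not part of the library: the helper names `resolve_project_id` and `load_penguins` and the `PLACEHOLDER` constant are hypothetical, and the bigframes import is deferred into the function so the validation step can be exercised without credentials.

```python
import os

# Hypothetical constant: the placeholder string used in the quickstart above.
PLACEHOLDER = "your-gcp-project-id"


def resolve_project_id(env=None):
    """Return the project ID from GCP_PROJECT_ID, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    project_id = env.get("GCP_PROJECT_ID", "").strip()
    if not project_id or project_id == PLACEHOLDER:
        raise ValueError("Set GCP_PROJECT_ID to a real Google Cloud project ID.")
    return project_id


def load_penguins(project_id):
    """Configure bigframes for `project_id` and return the public penguins table."""
    # Deferred import: needs the bigframes package and Google Cloud credentials.
    import bigframes.pandas as bpd

    bpd.options.bigquery.project = project_id
    return bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")


# Example usage (requires authentication and the BigQuery API to be enabled):
#   df = load_penguins(resolve_project_id())
#   print(df.head())
```

Keeping the environment lookup separate from the bigframes call means a configuration mistake is reported before any session is created or query is issued.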
