{"id":4456,"library":"bigframes","title":"BigQuery DataFrames (bigframes)","description":"BigQuery DataFrames (bigframes) provides a scalable Python DataFrame and machine learning (ML) API powered by the BigQuery engine. It offers a pandas-like interface for analyzing and manipulating data directly within BigQuery, enabling efficient processing of terabytes of data and seamless integration with BigQuery ML and Vertex AI. The library is actively maintained, currently at version 2.39.0, with a rapid release cadence introducing new features and improvements.","status":"active","version":"2.39.0","language":"en","source_language":"en","source_url":"https://github.com/googleapis/python-bigquery-dataframes","tags":["dataframes","bigquery","google-cloud","analytics","machine-learning","pandas-api-compatible"],"install":[{"cmd":"pip install bigframes","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Implicitly used for backend BigQuery interactions and often needed for explicit client operations (e.g., dataset creation).","package":"google-cloud-bigquery","optional":false},{"reason":"Provides a pandas-compatible API; often used in conjunction with bigframes for local operations or `to_pandas()` conversions.","package":"pandas","optional":true}],"imports":[{"note":"The primary entry point for the pandas-like API.","symbol":"bigframes.pandas","correct":"import bigframes.pandas as bpd"},{"note":"Used for scikit-learn-like machine learning APIs.","symbol":"bigframes.ml","correct":"import bigframes.ml"},{"note":"Provides access to BigQuery SQL functions without pandas equivalents.","symbol":"bigframes.bigquery","correct":"import bigframes.bigquery"}],"quickstart":{"code":"import bigframes.pandas as bpd\nimport os\n\n# Set your GCP Project ID. Ensure the BigQuery API is enabled for this project.\n# For local development, authenticate using `gcloud auth application-default login`.\nPROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'your-gcp-project-id')\n\nbpd.options.bigquery.project = PROJECT_ID\n# bpd.options.bigquery.location = \"US\" # Uncomment and set if your dataset is not in US multi-region\n# bpd.options.bigquery.ordering_mode = \"partial\" # Recommended for performance\n\n# Load a public BigQuery dataset into a BigQuery DataFrame\ndf = bpd.read_gbq(\"bigquery-public-data.ml_datasets.penguins\")\n\n# Perform a simple operation and display the head (triggers computation)\nprint(df.head())\n","lang":"python","description":"This quickstart demonstrates how to initialize BigQuery DataFrames with your GCP project ID and load data from a public BigQuery table. It then performs a basic operation (`head()`) to trigger query execution and display results. Ensure you have authenticated to Google Cloud and enabled the BigQuery API for your project."},"warnings":[{"fix":"Set `bpd.options.bigquery.allow_large_results = True` or pass `allow_large_results=True` directly to the method, e.g., `df.to_pandas(allow_large_results=True)`.","message":"In BigQuery DataFrames v2.0+, the default for `allow_large_results` changed from `True` to `False` for methods that return results to the client (e.g., `peek()`, `to_pandas()`, `to_pandas_batches()`). This can lead to 'BigQuery has a maximum response size limit' errors for large results.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Understand that BigQuery DataFrames operations build a query plan. Use methods like `head()`, `to_pandas()`, or printing the object to trigger execution and retrieve results.","message":"BigQuery DataFrames uses lazy evaluation. Operations are generally not executed immediately but are instead translated into BigQuery SQL and run only when results are explicitly requested (e.g., by calling `head()`, `to_pandas()`, `to_arrow()`, `plot()`, or printing the DataFrame/Series).","severity":"gotcha","affected_versions":"All"},{"fix":"Avoid converting large DataFrames to pandas locally. Perform aggregations, filtering, and transformations using BigQuery DataFrames APIs first. Only use `to_pandas()` on small, already reduced datasets, or when absolutely necessary.","message":"Converting large BigQuery DataFrames to pandas DataFrames using `to_pandas()` can lead to out-of-memory errors on the client side, as it pulls all data into local memory. This negates the scalability benefits of BigQuery DataFrames.","severity":"gotcha","affected_versions":"All"},{"fix":"Be aware of potential BigQuery storage costs. For long-running or frequently used temporary results, consider managing them explicitly. You can close sessions using `bpd.close_session()` to potentially clean up temporary resources faster, though tables persist for 7 days.","message":"BigQuery DataFrames stores temporary data (e.g., intermediate results) in BigQuery tables within your specified project. These tables persist for seven days by default in `_anonymous_` datasets, incurring storage costs.","severity":"gotcha","affected_versions":"All"},{"fix":"Set `bpd.options.bigquery.location = \"YOUR_REGION\"` (e.g., \"EU\", \"asia-east1\") before calling `read_gbq()` if your data resides outside the 'US' multi-region.","message":"When using `read_gbq()`, if your BigQuery dataset is not located in the default 'US' multi-region, you must explicitly set the location using `bpd.options.bigquery.location` or a `NotFound` exception will occur.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}