PyBigQuery (OBSOLETE)
PyBigQuery is an **obsolete** SQLAlchemy dialect for Google BigQuery. It is no longer maintained and should not be used for new projects; its final release was version 0.10.2. Users are strongly advised to migrate to the actively maintained, modern successor: the `sqlalchemy-bigquery` PyPI package, developed in the GitHub repository `googleapis/python-bigquery-sqlalchemy`.
Common errors
- `This application is running on Python 3.10.x (or newer), but pybigquery only officially supports up to 3.9.x.`
  - Cause: The Python environment is newer than the outdated `pybigquery` library supports.
  - Fix: Downgrade your environment to Python 3.9 or earlier, or (strongly recommended) migrate your project to `sqlalchemy-bigquery`.
- `sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:bigquery`
  - Cause: The `pybigquery` package (which registers the `bigquery` dialect with SQLAlchemy) is not installed, or its installation is corrupted.
  - Fix: Ensure `pybigquery` is correctly installed in your active environment: `pip install pybigquery`. If the problem persists, try `pip install --force-reinstall pybigquery`.
- `AttributeError: 'LegacyConnection' object has no attribute 'execute'` (or similar SQLAlchemy 2.0 API errors)
  - Cause: SQLAlchemy 2.0-specific API patterns are being used with the older PyBigQuery dialect, which lacks SQLAlchemy 2.0 compatibility.
  - Fix: Stick to SQLAlchemy 1.x API patterns (e.g., `connection.execute(text(...))`). This error is also a strong signal that migration to `sqlalchemy-bigquery` is overdue for full SQLAlchemy 2.0 support.
- `google.api_core.exceptions.NotFound: 404 Not Found: Dataset 'your_dataset' was not found in project 'your-gcp-project-id'.`
  - Cause: The BigQuery project ID or dataset ID in the SQLAlchemy connection string is incorrect, or the authenticated Google Cloud user/service account lacks the necessary permissions.
  - Fix: Verify the project and dataset in your connection string, and confirm your Google Cloud credentials (e.g., the `GOOGLE_APPLICATION_CREDENTIALS` environment variable or `gcloud auth application-default login`) carry the "BigQuery Data Viewer" and "BigQuery Job User" roles for that project and dataset.
- `No module named 'pybigquery.sqlalchemy'` (when trying to import)
  - Cause: Importing modules directly from within the `pybigquery` package, misinterpreting how SQLAlchemy dialect plugins are integrated.
  - Fix: PyBigQuery is a SQLAlchemy dialect plugin; you generally never import from it directly. Import `create_engine` from `sqlalchemy` and use a `bigquery://` connection string. Installing `pybigquery` is enough to register the dialect.
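For the Python-version failure above, a small startup guard can turn a confusing import-time crash into an actionable message. A minimal sketch using only the standard library (the `check_python_for_pybigquery` helper is hypothetical, not part of pybigquery):

```python
import sys

def check_python_for_pybigquery(version=sys.version_info[:2]) -> str:
    """Hypothetical guard: pybigquery only officially supports Python < 3.10."""
    major, minor = version
    if (major, minor) >= (3, 10):
        return (f"Python {major}.{minor} is unsupported by pybigquery; "
                "downgrade to 3.9 or migrate to sqlalchemy-bigquery.")
    return f"Python {major}.{minor} is within pybigquery's supported range."

# Run once at process start, before importing anything that pulls in the dialect.
print(check_python_for_pybigquery())
```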
Warnings
- **breaking:** PyBigQuery (final version 0.10.2) is **obsolete and no longer maintained**. Known issues, limitations, and security vulnerabilities will not be addressed. All users should migrate to the actively maintained successor, `sqlalchemy-bigquery`.
- **gotcha:** PyBigQuery only officially supports Python versions `<3.10`. Using it with Python 3.10, 3.11, or newer will very likely produce `ModuleNotFoundError`, `ImportError`, or other runtime compatibility failures due to unmaintained dependencies.
- **gotcha:** PyBigQuery does not support SQLAlchemy 2.0-style usage. Its last release predates SQLAlchemy 2.0, so 2.0-specific connection and execution patterns are unavailable.
- **gotcha:** The PyPI project page for `pybigquery` links its GitHub source URL to the repository of its *successor*, `googleapis/python-bigquery-sqlalchemy`. This can cause significant confusion, making users believe they are looking at the source or issue tracker for `pybigquery` when it is actually the modern library's.
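If migration is temporarily blocked, pinning both the dialect and SQLAlchemy below 2.0 avoids the API-mismatch errors described above. A requirements sketch (the exact pins are an assumption for a frozen legacy environment, not a recommendation):

```text
# requirements.txt -- legacy pin, sketch only; migrate off this ASAP
pybigquery==0.10.2
sqlalchemy<2.0
```

The recommended path remains installing the successor (`pip install sqlalchemy-bigquery`) and dropping `pybigquery` entirely.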
Install
```shell
pip install pybigquery
```
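After installing, the package can be sanity-checked without connecting to BigQuery: if it is discoverable on the import path, SQLAlchemy can load the `bigquery` dialect plugin. A minimal standard-library sketch (the helper name is hypothetical):

```python
import importlib.util

def dialect_package_installed(package: str = "pybigquery") -> bool:
    """Return True if the given dialect package is importable in this environment."""
    return importlib.util.find_spec(package) is not None

# A False result here explains the NoSuchModuleError listed under "Common errors".
print("pybigquery installed:", dialect_package_installed())
```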
Imports
- `create_engine`

  ```python
  from sqlalchemy import create_engine
  ```
Quickstart
```python
import os

from sqlalchemy import create_engine, text

# Ensure GOOGLE_APPLICATION_CREDENTIALS points to your service account key file,
# or use `gcloud auth application-default login` for Application Default Credentials.
# Replace 'your-gcp-project-id' and 'your_dataset' with actual values or set env vars.
project_id = os.environ.get("BIGQUERY_PROJECT_ID", "your-gcp-project-id")
dataset_id = os.environ.get("BIGQUERY_DATASET_ID", "your_dataset")

if project_id == "your-gcp-project-id":
    print("WARNING: Set BIGQUERY_PROJECT_ID env var or replace 'your-gcp-project-id'.")
if dataset_id == "your_dataset":
    print("WARNING: Set BIGQUERY_DATASET_ID env var or replace 'your_dataset'.")

try:
    # Connect using the 'bigquery' dialect registered by pybigquery.
    # For production, ensure credentials are configured securely.
    engine = create_engine(f"bigquery://{project_id}/{dataset_id}")
    with engine.connect() as connection:
        # Example query (replace with your specific table/data).
        # Note: text() is required for string-based SQL statements in SQLAlchemy 1.4+.
        result = connection.execute(text("SELECT 1 AS num, 'hello' AS str"))
        for row in result:
            print(f"Retrieved row: num={row.num}, str='{row.str}'")
    print("Successfully connected to BigQuery using PyBigQuery (obsolete).")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Troubleshooting hint: check BigQuery project/dataset IDs, credentials, and Python version.")
    print("RECOMMENDATION: migrate to 'sqlalchemy-bigquery' for active support.")
```
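The connection URL used in the quickstart can be checked offline; only the `bigquery://` scheme and the project/dataset path segments matter before credentials come into play. A sketch with a hypothetical helper (not part of pybigquery or SQLAlchemy):

```python
from typing import Optional

def bigquery_url(project_id: str, dataset_id: Optional[str] = None) -> str:
    """Build a 'bigquery://' SQLAlchemy URL; the dataset segment is optional."""
    if dataset_id:
        return f"bigquery://{project_id}/{dataset_id}"
    return f"bigquery://{project_id}"

print(bigquery_url("your-gcp-project-id", "your_dataset"))
# -> bigquery://your-gcp-project-id/your_dataset
```

Omitting the dataset yields a project-level URL, in which case queries must use fully qualified `dataset.table` names.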