SQLAlchemy BigQuery Dialect
SQLAlchemy BigQuery Dialect (sqlalchemy-bigquery) is a Python library that provides a SQLAlchemy dialect for connecting to and interacting with Google BigQuery. It lets users apply SQLAlchemy's Core and ORM functionality to query, manipulate, and manage BigQuery data, treating it like a traditional relational database. The library is actively maintained by Google Cloud with regular releases; the current version is 1.16.0.
Warnings
- breaking Version 1.16.0 removed support for Python 3.8. Users on Python 3.8 or older must upgrade their Python environment to at least 3.9 to use `sqlalchemy-bigquery` 1.16.0 and later. Python 3.7 support was removed in earlier versions.
- breaking Version 1.11.0 raised the minimum required SQLAlchemy version to 1.4.x and introduced full support for SQLAlchemy 2.0. Applications using older SQLAlchemy versions (pre-1.4) will need to upgrade SQLAlchemy when updating to `sqlalchemy-bigquery` 1.11.0 or newer.
- gotcha When querying tables from non-default BigQuery projects or datasets, use the fully qualified `project.dataset.table` format for table names, or pass `project.dataset` as the `schema` parameter of `Table` objects.
- gotcha For optimal performance when handling large datasets, especially when reading results, install `sqlalchemy-bigquery` with the `bqstorage` extra (`pip install sqlalchemy-bigquery[bqstorage]`). This leverages the BigQuery Storage API.
- gotcha The `sqlalchemy.JSON()` column type is not natively supported for DDL (Data Definition Language) statements in `sqlalchemy-bigquery` as of version 1.16.0. Attempting to use it in `Table` definitions or Alembic migrations may result in an `sqlalchemy.exc.UnsupportedFeature` error.
Install
- pip install sqlalchemy-bigquery
- pip install 'sqlalchemy-bigquery[bqstorage]'  (quote the extra so shells like zsh don't expand the brackets)
Imports
- create_engine
from sqlalchemy.engine import create_engine
- Table, Column, Integer, String, MetaData
from sqlalchemy import Table, Column, Integer, String, MetaData
- text
from sqlalchemy import text
Quickstart
import os
from sqlalchemy import create_engine, text, Table, MetaData

# Configure your Google Cloud project ID. This can also be inferred from the environment.
# For local development, ensure GOOGLE_APPLICATION_CREDENTIALS points to your service account key file.
PROJECT_ID = os.environ.get('GCP_PROJECT_ID', 'your-gcp-project-id')
DATASET_ID = os.environ.get('BIGQUERY_DATASET_ID', 'your_dataset_id')
TABLE_NAME = os.environ.get('BIGQUERY_TABLE_NAME', 'your_table_name')

# Connection string format: 'bigquery://[PROJECT_ID]/[DATASET_ID]'
# If PROJECT_ID is omitted, it is inferred from the environment.
# Authentication typically happens via the GOOGLE_APPLICATION_CREDENTIALS env var or the gcloud CLI.
engine = create_engine(f'bigquery://{PROJECT_ID}/{DATASET_ID}')

try:
    # Example 1: Execute a raw SQL query
    with engine.connect() as connection:
        result = connection.execute(text(f"SELECT 1 FROM `{PROJECT_ID}.{DATASET_ID}.{TABLE_NAME}` LIMIT 1"))
        print(f"Raw SQL query result: {result.scalar()}")

    # Example 2: Reflect table metadata and query using SQLAlchemy Core
    metadata = MetaData()
    my_table = Table(TABLE_NAME, metadata, autoload_with=engine)
    with engine.connect() as connection:
        # Select all columns from the table, limited to 5 rows
        select_stmt = my_table.select().limit(5)
        result = connection.execute(select_stmt)
        print("\nQuerying table via SQLAlchemy Core (first 5 rows):")
        for row in result:
            print(row)
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please ensure GOOGLE_APPLICATION_CREDENTIALS is set, your project/dataset/table exist, and you have permissions.")