BigQuery Schema Generator

1.6.1 · active · verified Sun Apr 12

bigquery-schema-generator is a Python library that generates BigQuery schemas from newline-delimited JSON or CSV data. Unlike BigQuery's native auto-detection, which inspects only up to the first 500 rows, this tool processes every input record, so fields or type changes that first appear later in the data are still reflected in the schema. The current version is 1.6.1, and the project is actively maintained.
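The difference between sampling and a full scan can be illustrated with a toy sketch. It is not the library's algorithm: the helpers `infer_type`, `merge_types`, and `infer_schema` are hypothetical, and the widening rules are deliberately simplified. The point is only that a field whose type changes after record 500 is missed by a sample of the first 500 records.

```python
from typing import Any, Dict, Iterable, Optional


def infer_type(value: Any) -> str:
    """Map a Python value to a simplified type name (toy rules only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    return "STRING"


def merge_types(old: Optional[str], new: str) -> str:
    """Combine the type seen so far with a newly observed type."""
    if old is None or old == new:
        return new
    if {old, new} == {"INTEGER", "FLOAT"}:
        return "FLOAT"  # widen INTEGER to FLOAT
    return "STRING"  # fall back to the most permissive type


def infer_schema(records: Iterable[Dict[str, Any]],
                 limit: Optional[int] = None) -> Dict[str, str]:
    """Infer field types from all records, or only the first `limit`."""
    schema: Dict[str, str] = {}
    for index, record in enumerate(records):
        if limit is not None and index >= limit:
            break
        for key, value in record.items():
            schema[key] = merge_types(schema.get(key), infer_type(value))
    return schema


# 500 integer scores followed by a single float score
records = [{"score": 1}] * 500 + [{"score": 2.5}]

sampled = infer_schema(records, limit=500)  # sees only the integers
full = infer_schema(records)                # sees the float as well

print(sampled)  # {'score': 'INTEGER'}
print(full)     # {'score': 'FLOAT'}
```

A schema built from the sample would reject the last record on load, while the full scan yields a type that fits every record.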

Quickstart

This example uses `SchemaGenerator` as a library to deduce a BigQuery schema from a list of Python dictionaries. Constructing the generator with `input_format='dict'` lets it accept dictionaries directly. The `deduce_schema` method processes the records and returns an internal schema map together with a list of error logs, and `flatten_schema` converts that map into the BigQuery-compatible JSON schema format.

import json
from bigquery_schema_generator.generate_schema import SchemaGenerator

# Example data as a list of dictionaries
data = [
    {
        "id": "rec1",
        "name": "Alice",
        "values": [10, 20]
    },
    {
        "id": "rec2",
        "name": "Bob",
        "values": [30]
    },
    {
        "id": "rec3",
        "name": None,  # null value; the field stays a NULLABLE STRING
        "values": []   # empty array; the field stays REPEATED
    }
]

# Initialize the schema generator to read Python dicts instead of JSON text
generator = SchemaGenerator(input_format='dict')

# Deduce the schema; deduce_schema returns a (schema_map, error_logs) tuple
schema_map, error_logs = generator.deduce_schema(data)
schema = generator.flatten_schema(schema_map)

# Report any records the generator could not reconcile
for error in error_logs:
    print(f"Problem on line {error['line_number']}: {error['msg']}")

# Print the generated BigQuery schema in JSON format
print(json.dumps(schema, indent=2))
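Once a flattened schema is available, it can be handy to review it as a compact listing before loading data. The sketch below assumes the schema follows BigQuery's JSON schema layout (a list of objects with `name`, `type`, `mode`, and an optional nested `fields` list). The helper `summarize_schema` is hypothetical, and `sample_schema` is written by hand for illustration rather than captured from the generator.

```python
import json
from typing import Any, Dict, List


def summarize_schema(fields: List[Dict[str, Any]], prefix: str = "") -> List[str]:
    """Return one 'path: TYPE (MODE)' line per field, recursing into nested fields."""
    lines: List[str] = []
    for field in fields:
        path = prefix + field["name"]
        mode = field.get("mode", "NULLABLE")  # BigQuery treats a missing mode as NULLABLE
        lines.append(f"{path}: {field['type']} ({mode})")
        # RECORD fields carry their children under the "fields" key
        lines.extend(summarize_schema(field.get("fields", []), prefix=path + "."))
    return lines


# Hand-written sample in BigQuery's JSON schema layout (illustrative only)
sample_schema = [
    {"mode": "NULLABLE", "name": "id", "type": "STRING"},
    {"mode": "REPEATED", "name": "values", "type": "INTEGER"},
    {
        "mode": "NULLABLE",
        "name": "owner",
        "type": "RECORD",
        "fields": [{"mode": "NULLABLE", "name": "email", "type": "STRING"}],
    },
]

summary = summarize_schema(sample_schema)
print("\n".join(summary))

# The same listing survives a round trip through JSON text,
# which is the form the schema takes when saved to a file.
restored = json.loads(json.dumps(sample_schema))
assert summarize_schema(restored) == summary
```

Dotted paths such as `owner.email` make nested RECORD fields easy to scan in the listing.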
