{"id":4896,"library":"bigquery-schema-generator","title":"BigQuery Schema Generator","description":"bigquery-schema-generator is a Python library that generates BigQuery schemas from newline-delimited JSON or CSV data. BigQuery's native schema auto-detection samples only up to the first 500 records. This tool instead processes all input records, which produces a more complete and accurate schema. The current version is 1.6.1, and the library is actively maintained with regular updates and bug fixes.","status":"active","version":"1.6.1","language":"en","source_language":"en","source_url":"https://github.com/bxparks/bigquery-schema-generator","tags":["BigQuery","schema","schema generation","data engineering","utility","JSON","CSV"],"install":[{"cmd":"pip install bigquery-schema-generator","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"symbol":"SchemaGenerator","correct":"from bigquery_schema_generator.generate_schema import SchemaGenerator"}],"quickstart":{"code":"import json\nfrom bigquery_schema_generator.generate_schema import SchemaGenerator\n\n# Example data as a list of dictionaries\ndata = [\n    {\n        \"id\": \"rec1\",\n        \"name\": \"Alice\",\n        \"values\": [10, 20]\n    },\n    {\n        \"id\": \"rec2\",\n        \"name\": \"Bob\",\n        \"values\": [30]\n    },\n    {\n        \"id\": \"rec3\",\n        \"name\": None,  # null here; still STRING (NULLABLE) from the other records\n        \"values\": []    # empty array; still REPEATED INTEGER from the other records\n    }\n]\n\n# input_format='dict' tells the generator that each record is already a Python dict\ngenerator = SchemaGenerator(input_format='dict')\n\n# deduce_schema() returns the internal schema map plus a list of per-record errors\nschema_map, error_logs = generator.deduce_schema(data)\nfor error in error_logs:\n    print('Problem on line %s: %s' % (error['line_number'], error['msg']))\n\n# Convert the internal schema map into BigQuery's JSON schema format (a list of field dicts)\nschema = generator.flatten_schema(schema_map)\n\n# Print the generated BigQuery schema in JSON format\nprint(json.dumps(schema, indent=2))","lang":"python","description":"This example demonstrates how to use `SchemaGenerator` as a library to deduce a BigQuery schema from a list of Python dictionaries. 
With `input_format='dict'`, the `deduce_schema` method processes the records and returns a `(schema_map, error_logs)` tuple, and `flatten_schema` converts the internal schema map into BigQuery's JSON schema format (a list of field definitions)."},"warnings":[{"fix":"Upgrade to `bigquery-schema-generator` version 1.6.1 or later (`pip install --upgrade bigquery-schema-generator`).","message":"Prior to version 1.6.1, multiple type mismatches for a single field in the input data could cause the schema generator to 'forget' the field's type. This produced repeated warnings and could make the deduced schema unstable. Use version 1.6.1 or newer for reliable type inference on inconsistent data.","severity":"gotcha","affected_versions":"<1.6.1"},{"fix":"Review generated schemas if your data contains `null` values in fields that hold arrays in other records. If your pipeline depends on the pre-1.6.0 behavior, pin an earlier release (`pip install 'bigquery-schema-generator<1.6.0'`).","message":"As of version 1.6.0, a `null` field is allowed to convert to `REPEATED` mode. This aligns with `bq load`, which accepts `null` for a `REPEATED` field and treats it as an empty array (`[]`). Previously, such `null` fields would typically be omitted or deduced as `NULLABLE`. Be aware of this change if your schema-generation logic relied on the earlier interpretation of nulls in potentially repeated fields.","severity":"gotcha","affected_versions":">=1.6.0"},{"fix":"If updating an existing table, ensure new fields in the generated schema are `NULLABLE` or `REPEATED`. If `REQUIRED` fields are critical, consider creating a new table. Alternatively, use BigQuery's `ALTER TABLE` statements cautiously to change column modes where BigQuery permits it.","message":"When using `SchemaGenerator` output with existing BigQuery tables, keep BigQuery's schema-evolution rules in mind. You cannot add `REQUIRED` columns to an existing table; new columns must be `NULLABLE` or `REPEATED`. 
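As a pre-flight check, you can compare a generated schema against the table's current schema before loading. The sketch below uses two hypothetical helpers, `find_new_required_fields` and `relax_new_required_fields`. These are illustrative names and are not part of bigquery-schema-generator or any BigQuery client. The sketch assumes both schemas are lists of field dicts in BigQuery's JSON schema format, as returned by `flatten_schema`, and it inspects only top-level fields.

```python
# Hypothetical pre-flight check; not part of bigquery-schema-generator.
# Assumes schemas are lists of dicts in BigQuery JSON schema format
# (keys 'name', 'type', optional 'mode'); nested RECORD fields are not inspected.


def find_new_required_fields(existing_schema, new_schema):
    '''Return names of top-level fields in new_schema that are new and REQUIRED.'''
    # BigQuery column names are case-insensitive, so compare lowercased names.
    existing_names = {field['name'].lower() for field in existing_schema}
    offending = []
    for field in new_schema:
        # A missing 'mode' key means NULLABLE in BigQuery's JSON schema format.
        mode = field.get('mode', 'NULLABLE').upper()
        if mode == 'REQUIRED' and field['name'].lower() not in existing_names:
            offending.append(field['name'])
    return offending


def relax_new_required_fields(existing_schema, new_schema):
    '''Return a copy of new_schema with newly added REQUIRED fields set to NULLABLE.'''
    to_relax = set(find_new_required_fields(existing_schema, new_schema))
    relaxed = []
    for field in new_schema:
        copied = dict(field)  # shallow copy so the caller's schema is not mutated
        if copied['name'] in to_relax:
            copied['mode'] = 'NULLABLE'
        relaxed.append(copied)
    return relaxed


if __name__ == '__main__':
    existing = [{'name': 'id', 'type': 'STRING', 'mode': 'REQUIRED'}]
    generated = [
        {'name': 'id', 'type': 'STRING', 'mode': 'REQUIRED'},
        {'name': 'score', 'type': 'INTEGER', 'mode': 'REQUIRED'},
        {'name': 'tags', 'type': 'STRING', 'mode': 'REPEATED'},
    ]
    print(find_new_required_fields(existing, generated))  # ['score']
    print(relax_new_required_fields(existing, generated)[1]['mode'])  # NULLABLE
```

Run a check like this on the `flatten_schema` output before `bq load` or a schema update. It turns the load-time error into an explicit list of fields to relax or drop. An existing `REQUIRED` column such as `id` is left alone, because only newly added columns are restricted.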
Applying a generated schema that introduces new `REQUIRED` fields to an existing table fails with an error. By default, newly deduced fields are `NULLABLE` (or `REPEATED` for arrays). The `--infer_mode` flag works only with CSV input and marks a field `REQUIRED` when every record has a non-null, non-empty value for it.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}