GetSchema
GetSchema is a Python library that infers a JSON Schema definition from sample data records. The project is actively maintained, with frequent patch releases for bug fixes and minor feature enhancements. The current version is 0.2.11.
Warnings
- breaking Versions prior to 0.2.5 incorrectly converted falsy values such as 0, 0.0, empty strings (''), and false to `null` during schema inference. This was a bug and is fixed in v0.2.5 and later.
- gotcha The default inference for fields containing only `null` values changed in v0.2.10. From v0.2.3 through v0.2.9, such fields are inferred as `{"type": "string"}`. From v0.2.10, they default to `{"type": ["null", "string"]}` (nullable string).
- gotcha In versions prior to v0.2.3, a field that contained only `null` values across all sample records was inferred as `{"type": "null"}`. v0.2.3 changed this default to `{"type": "string"}`.
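Because the default for all-null fields differs across releases, one option is to pin those fields to a single type after inference, so the result does not depend on the installed version. The following is a minimal sketch, not part of GetSchema: the helper names `all_null_fields` and `pin_null_field_types` are hypothetical, the schema is assumed to be a plain dict with a top-level `properties` mapping, and the sample schema is hand-written rather than produced by the library.

```python
import copy


def all_null_fields(records):
    """Names of top-level fields whose value is None in every record that has them."""
    seen, non_null = set(), set()
    for record in records:
        for key, value in record.items():
            seen.add(key)
            if value is not None:
                non_null.add(key)
    return sorted(seen - non_null)


def pin_null_field_types(schema, records, pinned_type=("null", "string")):
    """Return a copy of `schema` with every all-null field set to `pinned_type`.

    Assumes `schema` has a top-level "properties" dict (an assumption about
    the schema shape, not something checked against the library).
    """
    result = copy.deepcopy(schema)
    properties = result.get("properties", {})
    for name in all_null_fields(records):
        if name in properties:
            properties[name] = {"type": list(pinned_type)}
    return result


records = [{"id": 1, "note": None}, {"id": 2, "note": None}]

# Hand-written stand-in for a schema in the pre-0.2.3 style described above.
old_style_schema = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "note": {"type": "null"}},
}

pinned = pin_null_field_types(old_style_schema, records)
print(pinned["properties"]["note"])  # {'type': ['null', 'string']}
```

The helper only looks at top-level fields and leaves the input schema untouched. Nested objects would need a recursive variant.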
Install
pip install getschema
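Since the warnings above depend on version thresholds (0.2.3, 0.2.5, 0.2.10), it can help to check which release is installed before relying on a particular behavior. This sketch uses only the standard library's `importlib.metadata`. The helper names `version_tuple` and `installed_getschema_version` are hypothetical, and the version parsing is a simplification that keeps only the first three numeric components.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn a string like '0.2.11' into (0, 2, 11).

    Simplification: keeps only the first three runs of digits, so
    pre-release or post-release suffixes are ignored.
    """
    return tuple(int(number) for number in re.findall(r"\d+", text)[:3])


def installed_getschema_version():
    """Return the installed getschema version string, or None if absent."""
    try:
        return version("getschema")
    except PackageNotFoundError:
        return None


installed = installed_getschema_version()
if installed is None:
    print("getschema is not installed")
elif version_tuple(installed) < version_tuple("0.2.5"):
    print(f"getschema {installed}: falsy values may be converted to null")
elif version_tuple(installed) < version_tuple("0.2.10"):
    print(f"getschema {installed}: all-null fields default to string")
else:
    print(f"getschema {installed}: all-null fields default to nullable string")
```

Comparing tuples rather than raw strings matters here, because as strings "0.2.10" sorts before "0.2.9".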
Imports
- infer_schema
import getschema # or from getschema import infer_schema
Quickstart
import getschema
import json
sample_records = [
{"name": "Alice", "age": 30, "city": "New York"},
{"name": "Bob", "age": 24, "city": "London", "email": "bob@example.com"},
{"name": "Charlie", "age": None, "city": "Paris"},
{"name": "David", "age": 35, "city": "Berlin", "hobbies": ["reading", "hiking"]}
]
# Infer schema from a list of records
schema = getschema.infer_schema(sample_records)
print(json.dumps(schema, indent=2))