Adjust Precision for Schema
This library (version 0.3.4) is intended for Singer.io data integration targets. It addresses precision differences that arise between source systems, Python's native numeric types, and target data warehouses or databases, with the goal of keeping decimal and floating-point values consistent and accurate through the ETL process. Based on available PyPI data, releases appear to be irregular.
Warnings
- gotcha Without an explicit `multipleOf` or `precision`/`scale` definition in your Singer.io JSON Schema, the library may not be able to correctly infer the desired precision for numeric fields. Ensure your schemas are as explicit as possible for critical numeric types.
- gotcha Floating-point inaccuracies in Python can lead to unexpected rounding behavior. While this library aims to mitigate this, always test the precision adjustments with edge cases (e.g., `X.Y4999` vs `X.Y5000`) to ensure desired rounding.
- gotcha Schema evolution and changes in source data precision can silently break downstream data pipelines if not properly managed. Relying solely on automatic precision adjustment without validation can mask underlying data quality issues.
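The first two gotchas can be made concrete with a minimal sketch that uses only Python's standard `decimal` module. It does not call this library. `scale_from_multiple_of` and `round_to_scale` are hypothetical helper names, and half-up rounding is an assumption about what a pipeline might want, not a statement of how the library rounds.

```python
from decimal import Decimal, ROUND_HALF_UP


def scale_from_multiple_of(multiple_of):
    """Derive a number of decimal places from a JSON Schema `multipleOf` value.

    Hypothetical helper. Assumes `multipleOf` is a power of ten,
    e.g. 0.01 -> 2, 0.001 -> 3, 1 -> 0.
    """
    exponent = Decimal(str(multiple_of)).normalize().as_tuple().exponent
    return max(0, -exponent)


def round_to_scale(value, scale, rounding=ROUND_HALF_UP):
    """Round `value` to `scale` decimal places using decimal arithmetic.

    Converting through str() keeps binary floating-point noise out of the
    Decimal (assumption: the float's shortest repr is the intended value).
    """
    quantum = Decimal(1).scaleb(-scale)  # scale=2 -> quantum with exponent -2
    return Decimal(str(value)).quantize(quantum, rounding=rounding)


scale = scale_from_multiple_of(0.01)

# Edge cases on either side of a rounding boundary.
print(round_to_scale("1.234999", scale))  # 1.23
print(round_to_scale("1.235000", scale))  # 1.24

# Built-in float rounding can surprise: 2.675 is stored slightly below 2.675.
print(round(2.675, 2))                    # 2.67
print(round_to_scale(2.675, scale))       # 2.68
```

Running the same edge cases through whatever adjustment step a target actually uses shows whether its rounding matches expectations.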
Install
pip install adjust-precision-for-schema
Imports
- adjust_precision
from adjust_precision_for_schema import adjust_precision
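Since the import name above has not been confirmed against the package, it may help to list what the installed module actually exports before relying on it. The sketch below uses only the standard library; `public_names` is a hypothetical helper, not part of this package.

```python
import importlib


def public_names(module_name):
    """Return the sorted public attribute names of a module, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return sorted(name for name in dir(module) if not name.startswith("_"))


names = public_names("adjust_precision_for_schema")
if names is None:
    print("adjust_precision_for_schema is not installed")
else:
    print(names)  # compare against the name used in the import above
```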
Quickstart
from adjust_precision_for_schema import adjust_precision
# Example Singer SCHEMA message (simplified)
# This schema defines a 'price' field with a logical 'decimal' type
# and an implied precision/scale (e.g., up to 2 decimal places).
schema_message = {
    "type": "SCHEMA",
    "stream": "products",
    "schema": {
        "type": "object",
        "properties": {
            "id": {"type": "integer"},
            "name": {"type": "string"},
            "price": {
                "type": ["number", "null"],
                "_singer_type": "decimal",
                "maximum": 1000000000000000000000000000000000000.00,
                "multipleOf": 0.01
            }
        }
    },
    "key_properties": ["id"]
}
# Example Singer RECORD message
record_message = {
    "type": "RECORD",
    "stream": "products",
    "record": {
        "id": 1,
        "name": "Product A",
        "price": 123.456789  # Value with more precision than the schema intends
    }
}
# Another record whose value sits just below a rounding boundary
record_message_2 = {
    "type": "RECORD",
    "stream": "products",
    "record": {
        "id": 2,
        "name": "Product B",
        "price": 99.99999999999999  # Expected to round up to 100.00 at two decimal places
    }
}
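# Hypothetical call that adjusts a record's precision based on the schema.
# The function name, argument order, and return type are inferred, not confirmed
# against the package; verify them against the installed version before use.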
adjusted_record_1 = adjust_precision(record_message['record'], schema_message['schema'])
adjusted_record_2 = adjust_precision(record_message_2['record'], schema_message['schema'])
print("Original Record 1 Price:", record_message['record']['price'])
print("Adjusted Record 1 Price:", adjusted_record_1['price'])
print("Original Record 2 Price:", record_message_2['record']['price'])
print("Adjusted Record 2 Price:", adjusted_record_2['price'])
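Because the call above is inferred, a check that does not depend on the library can confirm whether records actually conform to the schema's `multipleOf` after adjustment, which also guards against the silent breakage described in the third gotcha. This is a minimal sketch using only the standard library; `conforms_to_multiple_of` and `find_precision_violations` are hypothetical names, and only top-level properties are inspected.

```python
from decimal import Decimal


def conforms_to_multiple_of(value, multiple_of):
    """True if `value` is an exact multiple of `multiple_of` in decimal arithmetic."""
    return Decimal(str(value)) % Decimal(str(multiple_of)) == 0


def find_precision_violations(record, schema):
    """List top-level numeric fields whose values do not fit the schema's `multipleOf`."""
    violations = []
    for name, spec in schema.get("properties", {}).items():
        multiple_of = spec.get("multipleOf")
        value = record.get(name)
        # Skip fields without a constraint, nulls, and booleans (bool is an int subclass).
        if multiple_of is None or value is None or isinstance(value, bool):
            continue
        if isinstance(value, (int, float, Decimal)) and not conforms_to_multiple_of(value, multiple_of):
            violations.append(name)
    return violations


schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "price": {"type": ["number", "null"], "multipleOf": 0.01},
    },
}

print(find_precision_violations({"id": 1, "price": 123.456789}, schema))  # ['price']
print(find_precision_violations({"id": 1, "price": 123.46}, schema))      # []
```

Running such a check on both the original and the adjusted record makes it visible when an adjustment did nothing, or when a source started emitting more precision than the schema allows.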