Pandas Data Types for SQL Systems
db-dtypes is a specialized Python library that provides Pandas Extension Dtypes to represent and interact with data types commonly found in relational databases, particularly BigQuery and Spanner. It aims to ensure data integrity and improve performance by providing faithful representations of types like Date, Time, and JSON. The library is actively maintained by Google, with multiple releases per year, and its current version is 1.5.0.
Warnings
- breaking Python 2 support was dropped as of January 1, 2020. The library currently requires Python >= 3.9. Older versions of the library (1.4.3 onwards) also explicitly dropped support for Python 3.7 and 3.8.
- breaking With `google-cloud-bigquery` version 3.0.0, `db-dtypes` became a required dependency for the `pandas` extra. BigQuery's `DATE` and `TIME` data types now map directly to `dbdate` and `dbtime` dtypes, and the `date_as_object` parameter was removed.
- gotcha For `db-dtypes` extension types (like `dbdate`, `dbtime`, `dbjson`) to be recognized by pandas, the `db_dtypes` module must be imported at least once. This import performs the necessary registration of the custom dtypes.
- gotcha When converting BigQuery `DATE` data to pandas `dbdate` dtype, any date values falling outside the range of `pandas.Timestamp.min` (1677-09-22) and `pandas.Timestamp.max` (2262-04-11) will map to the generic `object` dtype in pandas instead of `dbdate`.
- gotcha Standard Pandas `float64` can introduce precision errors when dealing with database `DECIMAL` or `NUMERIC` types. While `db-dtypes` aims to improve type fidelity, there isn't a direct `DecimalDtype` class for explicit import like `DateDtype`.
Install
-
pip install db-dtypes
Imports
- db_dtypes
import db_dtypes # noqa: F401
- DateDtype
from db_dtypes import DateDtype
- TimeDtype
from db_dtypes import TimeDtype
- JSONDtype
from db_dtypes import JSONDtype
Quickstart
import datetime
import pandas as pd
import db_dtypes # noqa: F401
# Using dbdate dtype
dates = pd.Series([datetime.date(2023, 1, 1), '2023-01-02'], dtype='dbdate')
print('Dates Series:')
print(dates)
# Using dbtime dtype
times = pd.Series([datetime.time(10, 30, 0), '15:45:00.123'], dtype='dbtime')
print('\nTimes Series:')
print(times)
# Using dbjson dtype
json_data = pd.Series([{'key': 'value'}, [1, 2, 3], 'null'], dtype='dbjson')
print('\nJSON Series:')
print(json_data)