DataProperty
DataProperty is a Python library for extracting and analyzing properties of values such as numbers, strings, and dates. For individual values or entire data matrices, it determines characteristics such as type, alignment, display width, and digit counts. The library is actively maintained (version 1.1.0 is the latest at the time of writing) and has a regular release cadence covering bug fixes, performance improvements, and Python version compatibility.
Warnings
- breaking Python 3.7 and 3.8 are no longer supported as of `dataproperty` version 1.0.2.
- breaking Python 3.6 is no longer supported as of `dataproperty` version 1.0.0.
- breaking The `set_log_level` and `is_multibyte_str` functions were removed in version 1.0.0.
- gotcha The `decimal` context used by the `get_integer_digit` function was changed from a global to a local scope in version 1.0.2. If your application modified the global `decimal` context and expected `dataproperty` to honor it, behavior might differ.
- gotcha Versions prior to 0.54.2 had issues with `dict` inputs, causing preprocessing failures or improper padding length calculations.
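The `decimal` gotcha comes down to the difference between changing the process-wide context (`decimal.getcontext()`) and working inside `decimal.localcontext()`. The standard-library sketch below illustrates the local-context pattern; `count_integer_digits` is a hypothetical helper, not dataproperty's actual implementation, and the precision of 60 is an arbitrary choice.

```python
import decimal


def count_integer_digits(value: float) -> int:
    """Hypothetical helper (not dataproperty's code): count the digits in the
    integer part of ``value`` using a local decimal context."""
    with decimal.localcontext() as ctx:
        # Changes to ctx apply only inside this block; the caller's
        # global context is restored when the block exits.
        ctx.prec = 60
        integer_part = abs(decimal.Decimal(value)).to_integral_value(
            rounding=decimal.ROUND_DOWN
        )
        # str(int(0)) is "0", so values below 1 count as one digit.
        return len(str(int(integer_part)))


if __name__ == "__main__":
    decimal.getcontext().prec = 5  # an application-wide setting
    print(count_integer_digits(123.45))
    print(decimal.getcontext().prec)  # the helper did not overwrite it
```

Because `localcontext()` starts from a copy of the current context, any setting the helper does not override explicitly is still inherited from the application.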
Install
pip install dataproperty
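Since several breaking changes are tied to specific releases (1.0.0 and 1.0.2), it can help to check which version is installed. This is a minimal sketch using the standard-library `importlib.metadata`; `installed_version` and `release_tuple` are hypothetical helpers, and the version parsing is deliberately simple (pre-release suffixes such as `rc1` are ignored), so use `packaging.version` if you need full PEP 440 comparisons.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(dist_name: str = "dataproperty") -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def release_tuple(version_str: str) -> Tuple[int, ...]:
    """Hypothetical helper: turn "1.0.2" into (1, 0, 2).

    Only leading numeric components are kept, so "1.0.2rc1" also becomes
    (1, 0, 2).
    """
    parts = []
    for piece in version_str.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print("dataproperty is not installed")
    elif release_tuple(current) < (1, 0, 2):
        print(f"dataproperty {current}: predates the 1.0.2 changes")
    else:
        print(f"dataproperty {current}: 1.0.2 or later")
```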
Imports
- DataProperty
from dataproperty import DataProperty
- DataPropertyExtractor
from dataproperty import DataPropertyExtractor
Quickstart
import datetime
from dataproperty import DataPropertyExtractor
dp_extractor = DataPropertyExtractor()
dt = datetime.datetime(2017, 1, 1, 0, 0, 0)
inf = float("inf")
nan = float("nan")
data_matrix = [
[1, 1.1, "aa", 1, 1, True, inf, nan, dt],
[2, 2.2, "bbb", 2.2, 2.2, False, "inf", "nan", dt],
[3, 3.33, "cccc", -3, "ccc", "true", inf, "NAN", "2017-01-01T01:23:45+0900"],
]
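dp_extractor.headers = ["int", "float", "str", "num", "mix", "bool", "inf", "nan", "time"]
# Build a matrix of DataProperty objects, then aggregate it into one ColumnDataProperty per column
col_dp_list = dp_extractor.to_column_dp_list(dp_extractor.to_dp_matrix(data_matrix))
for col_dp in col_dp_list:
    # Each entry summarizes a column's inferred type, alignment, width, and digit counts
    print(col_dp)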