Visions Data Type Inference
Visions is a Python library for declarative data type inference and validation. It lets users define custom data types and the hierarchical relationships between them, then detect and cast these types within data structures such as pandas Series. The current version is 0.8.1; releases arrive at a moderate cadence and mostly deliver compatibility fixes and API improvements.
Warnings
- breaking In `v0.7.0`, the API for defining and interacting with typesets underwent a significant breaking change. Public methods on typesets became static, and a new declarative API was introduced.
- gotcha Users may encounter compatibility issues with specific Python, Pandas, or NumPy versions, especially when using older `visions` versions. For example, `bottleneck` was removed in `v0.7.2` due to Python 3.9+ incompatibility, and `imghdr` was removed in `v0.8.0`.
- gotcha While `visions` primarily targets Pandas DataFrames, `v0.7.2` introduced explicit support for NumPy and Spark backends. Behavior and available features may differ significantly between backends, so results can be unexpected unless the backend in use is accounted for explicitly.
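Because the warnings above depend on which versions are installed, it can help to print them before debugging a compatibility problem. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical and not part of `visions`.

```python
import sys
from importlib import metadata


def installed_versions(packages=("visions", "pandas", "numpy")):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is absent from the current environment
            versions[name] = None
    return versions


# Report the interpreter and package versions relevant to the warnings above
print(f"python: {sys.version_info.major}.{sys.version_info.minor}")
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing this output against the versions named in the warnings shows whether an upgrade of `visions` is the likely fix.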
Install
-
pip install visions
Imports
- detect_type
from visions import detect_type
- cast_to_inferred
from visions import cast_to_inferred
- StandardTypeset
from visions.typesets import StandardTypeset
- Integer
from visions.types import Integer
Quickstart
import pandas as pd
from visions import cast_to_inferred, detect_type, infer_type
from visions.typesets import StandardTypeset
# Example data: integers stored as strings
data = pd.Series(["1", "2", "3", "4", "5"])
# Initialize a typeset
typeset = StandardTypeset()
# Detect the type the data currently has (types print as their name)
detected_type = detect_type(data, typeset)
print(f"Detected type: {detected_type}")
# Infer the type the data could be cast to by following the typeset's relations
inferred_type = infer_type(data, typeset)
print(f"Inferred type: {inferred_type}")
# Cast the data to its inferred type
cast_data = cast_to_inferred(data, typeset)
print(f"Cast data type: {detect_type(cast_data, typeset)}")
print(f"Cast data content: {cast_data.to_list()}")
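The overview describes types connected by relationships, with detection and casting built on top. As a rough illustration of that idea only, here is a library-free sketch. It is not how `visions` is implemented, and every name in it (`CONTAINS`, `RELATIONS`, `detect`, `infer`) is hypothetical.

```python
# Library-free sketch of type detection and inference over a relation graph.
# All names here are hypothetical; none of them belong to the visions API.


def is_int_like(values):
    """True when every value is an int (bool is excluded on purpose)."""
    return all(isinstance(v, int) and not isinstance(v, bool) for v in values)


def is_str_like(values):
    """True when every value is a str."""
    return all(isinstance(v, str) for v in values)


def strings_are_ints(values):
    """True when every string parses as an integer."""
    try:
        for v in values:
            int(v)
    except ValueError:
        return False
    return True


# Membership tests: which type does the data already have?
CONTAINS = {"Integer": is_int_like, "String": is_str_like}

# Inference relations: (source, target) -> (applicability test, transformer)
RELATIONS = {
    ("String", "Integer"): (strings_are_ints, lambda vs: [int(v) for v in vs]),
}


def detect(values):
    """Return the first type whose membership test passes, else 'Generic'."""
    for name, test in CONTAINS.items():
        if test(values):
            return name
    return "Generic"


def infer(values):
    """Follow inference relations until none applies; return (type, values)."""
    current = detect(values)
    changed = True
    while changed:
        changed = False
        for (source, target), (test, transform) in RELATIONS.items():
            if source == current and test(values):
                values, current, changed = transform(values), target, True
                break
    return current, values


print(detect(["1", "2", "3"]))
print(infer(["1", "2", "3"]))
```

Detection only reports the type the data already has, while inference also walks the relations and applies their transformers, which is why the two can disagree on string-encoded numbers.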