DataShape
0.5.2 verified Fri May 01 auth: no python maintenance
A data description language for specifying and validating the structure of tabular and array data. The current version is 0.5.2. The library is in maintenance mode, and no active development is expected.
pip install datashape==0.5.2
Common errors
error TypeError: expected shape ...
cause Using 'dshape' with an incompatible numpy dtype or array.
fix Ensure the numpy array's dtype and shape match the datashape string exactly, e.g., '3 * int32' requires an array of length 3 with dtype int32.
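The dtype and length requirement in this fix can be checked with plain NumPy before the array reaches datashape. The following is a minimal sketch that assumes only NumPy; the helper name fits_fixed_int32 is hypothetical and is not part of datashape.

```python
import numpy as np


def fits_fixed_int32(arr, length=3):
    """Return True if arr is 1-D, has `length` items, and has dtype int32.

    Hypothetical helper mirroring the '3 * int32' example; not a datashape API.
    """
    return arr.ndim == 1 and arr.shape[0] == length and arr.dtype == np.int32


good = np.array([1, 2, 3], dtype=np.int32)
wrong_dtype = np.array([1, 2, 3], dtype=np.int64)
wrong_length = np.array([1, 2], dtype=np.int32)

print(fits_fixed_int32(good))
print(fits_fixed_int32(wrong_dtype))
print(fits_fixed_int32(wrong_length))

# A dtype mismatch can usually be repaired with an explicit cast.
fixed = wrong_dtype.astype(np.int32)
print(fits_fixed_int32(fixed))
```

A length mismatch, unlike a dtype mismatch, cannot be cast away; either the data or the shape string has to change.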
error ImportError: cannot import name 'DataShape' from 'datashape'
cause The installed version does not expose DataShape at the top level (for example, a newer version that moved it), or the import path is incorrect.
fix Use 'from datashape import DataShape' and confirm that the installed version is 0.5.x. If an older version is installed, upgrade: pip install --upgrade datashape.
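Before changing imports, it can help to confirm which version is actually installed. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("datashape")
if found is None:
    print("datashape is not installed")
elif found.startswith("0.5."):
    print(f"datashape {found} is in the 0.5.x series")
else:
    print(f"datashape {found} is outside the 0.5.x series")
```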
Warnings
breaking The 'var' syntax for variable-length dimensions changed in 0.5.0. Previously 'var' was used directly; it must now be written as 'var * type'.
fix Update shape strings to use 'var * type', e.g., 'var * int32' instead of 'var int32'.
deprecated The 'datashape.predicates' module is deprecated and may be removed in a future version. Use top-level functions like 'isrecord', 'isscalar', etc., from 'datashape' directly.
fix Replace 'from datashape.predicates import isrecord' with 'from datashape import isrecord'.
gotcha DataShape's 'match()' method returns a boolean, not a validation report. It may raise TypeError if the data type is incompatible.
fix Wrap match() in a try-except block for type validation, or use a 'validate()' method where one is available (0.5.2 does not provide one).
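The try-except advice above can be sketched as a small wrapper that turns a TypeError into a plain False. This is an illustration only: matches_or_false and demo_check are hypothetical names, and demo_check is a stand-in so the example runs without datashape. In real use, the callable passed in would be the bound match method described above.

```python
def matches_or_false(check, data):
    """Call check(data) and treat a TypeError as 'no match'.

    `check` stands in for a bound method such as ds.match; this helper is
    hypothetical and not part of datashape.
    """
    try:
        return bool(check(data))
    except TypeError:
        return False


def demo_check(data):
    """Stand-in checker: accepts lists of length 3, rejects non-lists."""
    if not isinstance(data, list):
        raise TypeError("expected a list")
    return len(data) == 3


print(matches_or_false(demo_check, [1, 2, 3]))
print(matches_or_false(demo_check, [1, 2]))
print(matches_or_false(demo_check, "abc"))
```

Collapsing the exception into False loses the reason for the failure, so log the TypeError message first if that detail matters.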
Imports
- DataShape
  wrong: import datashape.DataShape
  correct: from datashape import DataShape
- dshape
  wrong: from datashape.predicates import dshape
  correct: from datashape import dshape
Quickstart
import numpy as np
from datashape import dshape

# Define a simple shape: a fixed-length array of three int32 values
ds = dshape('3 * int32')
print(ds)

# Validate data against the shape
arr = np.array([1, 2, 3], dtype=np.int32)
result = ds.match(arr)
print(result)