TacoReader

Version 2.4.21 | verified Mon Apr 27 | auth: no | Python

TacoReader is a Python library for querying AI-ready datasets, with lazy SQL evaluation through DuckDB. It reads TACO datasets in three formats: ZIP (.tacozip), folder, and TacoCat consolidated. Current version: 2.4.21; requires Python >=3.10, <3.14. Release cadence: feature releases every few months, patch releases as needed.

pip install tacoreader
error ModuleNotFoundError: No module named 'tacoreader'
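cause tacoreader is not installed, or it was installed into a different Python environment than the one running your code.
fix Install it with `pip install tacoreader` (or `python -m pip install tacoreader` to target the current interpreter) and confirm with `python -m pip list`. Make sure the interpreter is a supported version (>=3.10, <3.14).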
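A quick way to tell "not installed" apart from "installed in another environment" is to report which interpreter is running and whether it can see the package. This is a standard-library sketch; `python_supported` and `diagnose` are hypothetical helper names, and the bounds are the >=3.10, <3.14 range stated above.

```python
from __future__ import annotations

import importlib.util
import sys

# Supported interpreter range stated for tacoreader 2.4.21: >=3.10, <3.14.
MIN_PYTHON = (3, 10)
MAX_PYTHON_EXCLUSIVE = (3, 14)


def python_supported(version: tuple[int, int] | None = None) -> bool:
    """Return True if (major, minor) falls inside the supported range."""
    if version is None:
        version = (sys.version_info.major, sys.version_info.minor)
    return MIN_PYTHON <= version < MAX_PYTHON_EXCLUSIVE


def diagnose(package: str = "tacoreader") -> dict[str, object]:
    """Report the running interpreter and whether it can import `package`."""
    return {
        "interpreter": sys.executable,
        "python_supported": python_supported(),
        # find_spec returns None when the package is not importable here.
        "installed": importlib.util.find_spec(package) is not None,
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If `installed` is False while `pip list` shows tacoreader, the package went into a different environment; installing with `python -m pip` from the interpreter printed under `interpreter` targets the right one.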
error AttributeError: module 'tacoreader' has no attribute 'load'
cause The code imports the wrong symbol, or the installed tacoreader predates the addition of `load`.
fix Use `from tacoreader import load` and make sure tacoreader is >=2.3.0. Check the installed version with `python -c 'import tacoreader; print(tacoreader.__version__)'`.
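The same version check can run from code, for example in a test or startup script. This sketch reads the installed version through `importlib.metadata` instead of `tacoreader.__version__`, so it does not need to import the package. The 2.3.0 threshold comes from the fix above; `parse_release` and `has_load_api` are hypothetical helpers, and the parser only understands plain numeric releases (suffixes such as `rc1` are dropped). For full PEP 440 comparisons, `packaging.version.Version` is the usual tool.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version

# Minimum release that, per the fix above, provides `load`.
MIN_VERSION = (2, 3, 0)


def parse_release(text: str) -> tuple[int, int, int]:
    """Turn '2.4.21' into (2, 4, 21); missing parts become 0, suffixes are dropped."""
    parts: list[int] = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def has_load_api(package: str = "tacoreader") -> bool:
    """True if `package` is installed at MIN_VERSION or newer."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= MIN_VERSION
```

Padding to three components matters: without it, `(2, 3)` would compare as smaller than `(2, 3, 0)`.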
error duckdb.InvalidInputException: Invalid Input Error: No files found that match the pattern
cause The dataset path is incorrect, or the file format is not recognized.
fix Confirm that the path exists and points to either a `.tacozip` file or a folder with the correct dataset structure. If a relative path could resolve against the wrong working directory, use an absolute path.
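Checking the path before calling `load()` replaces DuckDB's pattern error with a clearer message. A minimal sketch, assuming only the two layouts named above (a `.tacozip` file or a folder): it does not validate a folder's internal structure, which this page does not describe, and TacoCat inputs would need their own check. `check_dataset_path` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path


def check_dataset_path(path: str | Path) -> Path:
    """Resolve `path` to an absolute Path and fail early if it cannot be a dataset."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.exists():
        raise FileNotFoundError(f"dataset path does not exist: {resolved}")
    if resolved.is_dir():
        # Folder layout: existence is checked, internal structure is not.
        return resolved
    if resolved.suffix != ".tacozip":
        raise ValueError(f"expected a .tacozip file or a folder, got: {resolved.name}")
    return resolved
```

Returning the resolved path also covers the "use absolute paths" advice, since the value passed on to `load()` no longer depends on the working directory.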
breaking As of v2.4.0, `load()` accepts `pathlib.Path` objects directly. Code that converted paths to strings before calling `load()` should now pass the `Path` itself.
fix Replace `load(str(path))` with `load(path)`.
deprecated The `base_path` parameter of `load()` is deprecated as of v2.4.0. Pass absolute paths, or paths relative to the dataset root, instead.
fix Remove the `base_path` argument and pass a path that is either absolute or relative to the dataset root.
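For code that relied on `base_path`, one possible migration is to join the base directory and the relative path yourself and pass the resulting `pathlib.Path` straight to `load()`, which v2.4.0 accepts without `str()` conversion. This is a sketch: `dataset_location` is a hypothetical helper, `/data/taco` is an illustrative directory, and the `load()` call stays commented out because it needs tacoreader installed.

```python
from __future__ import annotations

from pathlib import Path


def dataset_location(relative: str | Path, root: str | Path | None = None) -> Path:
    """Build the absolute path that base_path used to supply implicitly."""
    path = Path(relative)
    if root is not None and not path.is_absolute():
        # Joining here stands in for the deprecated base_path argument.
        path = Path(root) / path
    return path.resolve()


# Hypothetical usage (requires tacoreader >= 2.4.0):
# from tacoreader import load
# dataset = load(dataset_location("scenes/demo.tacozip", root="/data/taco"))
```

Keeping the join in one helper makes the remaining call sites easy to find once everything passes absolute paths.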
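gotcha When `Dataset.concat()` is called on datasets that were already filtered, those filters are preserved in the result. Code that expects concatenation to clear them will see fewer rows than expected, possibly none.
fix Check which filters each input carries before concatenating. To apply one filter uniformly, concatenate the unfiltered datasets and apply the filter to the combined result.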
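The behavior is easier to see in a toy model. The sketch below is plain Python, not tacoreader's implementation: `ToyDataset`, `concat`, and `is_2024` are hypothetical stand-ins that record filters lazily and, as described above, keep each input's filters when the inputs are combined.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class ToyDataset:
    """Hypothetical stand-in: raw rows plus the filters recorded so far."""

    rows: list[Row]
    filters: list[Predicate] = field(default_factory=list)

    def filter(self, pred: Predicate) -> ToyDataset:
        # Lazy: remember the predicate; rows are only dropped in collect().
        return ToyDataset(self.rows, [*self.filters, pred])

    def collect(self) -> list[Row]:
        return [row for row in self.rows if all(p(row) for p in self.filters)]


def concat(parts: list[ToyDataset]) -> ToyDataset:
    # Each input's filters are applied first, so they carry into the result.
    return ToyDataset([row for part in parts for row in part.collect()])


def is_2024(row: Row) -> bool:
    return row["time"] == 2024


a = ToyDataset([{"time": 2023}, {"time": 2024}])
b = ToyDataset([{"time": 2023}, {"time": 2025}])

pre_filtered = concat([a.filter(is_2024), b.filter(is_2024)]).collect()
post_filtered = concat([a, b]).filter(is_2024).collect()
print(pre_filtered)                           # [{'time': 2024}]
print(post_filtered)                          # [{'time': 2024}]
print(concat([b.filter(is_2024)]).collect())  # []
```

Here `b` contributes nothing because its preserved filter matches none of its rows. Filtering once after concatenation gives the same rows in this case, but keeps the filter visible in a single place.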

Load a TACO dataset, filter by time, and execute a SQL query.

from tacoreader import load

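# Open the dataset (placeholder path; point it at your own .tacozip)
dataset = load("path/to/dataset.tacozip")
# Keep only records whose time value is 2024
filtered = dataset.filter("time==2024")
# Run SQL on the filtered data; the query refers to it as "dataset"
result = filtered.sql("SELECT * FROM dataset LIMIT 5")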
print(result)