TacoReader
2.4.21 · verified Mon Apr 27 · auth: no · python
TacoReader is a Python library for querying AI-ready datasets with lazy SQL evaluation via DuckDB. It reads TACO datasets in ZIP (.tacozip), folder, or TacoCat consolidated formats. The current version is 2.4.21 and requires Python >=3.10, <3.14. Feature releases ship every few months, with patch releases as needed.
pip install tacoreader
Common errors
error ModuleNotFoundError: No module named 'tacoreader'
cause tacoreader is not installed, or the wrong Python environment is active.
fix Install with `pip install tacoreader`. Verify with `python -m pip list`. Use a supported Python version (>=3.10, <3.14).
error AttributeError: module 'tacoreader' has no attribute 'load'
cause The wrong symbol is imported, or the installed version predates the addition of `load`.
fix Use `from tacoreader import load` and ensure tacoreader >=2.3.0. Check the version with `python -c 'import tacoreader; print(tacoreader.__version__)'`.
error duckdb.InvalidInputException: Invalid Input Error: No files found that match the pattern
cause The dataset path is incorrect or the file format is not recognized.
fix Verify that the dataset path exists and either has the .tacozip extension or is a folder with the correct structure. Use absolute paths if the location is ambiguous.
Warnings
breaking In v2.4.0, `load()` accepts `pathlib.Path` objects directly. If you were converting a Path to str, pass the Path directly instead.
fix Replace `str(path)` with `path` when calling `load()`.
deprecated The `base_path` parameter of `load()` is deprecated as of v2.4.0. Call `load()` with absolute paths, or make sure paths are relative to the dataset root.
fix Remove the `base_path` argument and provide paths as absolute or relative to the dataset root.
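One way to move away from `base_path` is to build the absolute path yourself before calling `load()`. The sketch below assumes the directory formerly passed as `base_path` is known; `absolute_dataset_path` is a hypothetical helper, not a tacoreader function, and the `/data/taco` location is illustrative only.

```python
from pathlib import Path


def absolute_dataset_path(relative, root):
    """Join `relative` onto `root` and return an absolute, normalised Path."""
    return (Path(root).expanduser() / relative).resolve()


# Hypothetical locations, for illustration only.
path = absolute_dataset_path("dataset.tacozip", "/data/taco")
print(path)

# With tacoreader >=2.4.0 installed, the Path could then be passed straight
# to load() without str(), per the breaking-change note in this document:
# from tacoreader import load
# dataset = load(path)
```

Because the helper returns a `pathlib.Path`, it also avoids the `str(path)` conversion that the v2.4.0 note says is no longer needed.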
gotcha `Dataset.concat()` preserves the filters of pre-filtered datasets. If you expect filters to be cleared, you may get empty results.
fix Account for filter propagation when combining datasets; if needed, apply the same filter after concat.
Imports
- load
  from tacoreader import load
- TacoDataset
  from tacoreader import TacoDataset
Quickstart
from tacoreader import load
dataset = load("path/to/dataset.tacozip")
# Apply a filter and query
filtered = dataset.filter("time==2024")
result = filtered.sql("SELECT * FROM dataset LIMIT 5")
print(result)