Tabulator (dataflows-tabulator)
Tabulator is a Python library that provides a consistent and robust interface for streaming and processing tabular data from various sources and formats, including CSV, Excel, JSON, and SQL databases. It serves as a foundational data-reading component within the `dataflows` framework. This page covers version 1.54.3.
Common errors
- ModuleNotFoundError: No module named 'dataflows_tabulator'
Cause: The package `dataflows-tabulator` is installed, but you tried to import it under its PyPI name with underscores, which is not the module name.
Fix: Change `import dataflows_tabulator` to `import tabulator`.
- tabulator.errors.TabulatorException: Missing dependency for 'json' format. Please install 'ijson'.
Cause: You are reading a JSON file, but the optional `ijson` dependency (used for efficient JSON parsing) is not installed.
Fix: Install the `json` extra with `pip install dataflows-tabulator[json]`, or install `ijson` directly with `pip install ijson`.
- UnicodeDecodeError: 'utf-8' codec can't decode byte 0x__ in position _: invalid start byte
Cause: The file is not encoded in UTF-8, but `tabulator` decoded it as UTF-8.
Fix: Specify the correct encoding when creating the stream, for example `tabulator.Stream('data.csv', encoding='latin-1')`. Common alternatives for Western European text are 'latin-1' (the same codec as 'iso-8859-1') and 'cp1252'.
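The decoding failure behind the last error can be reproduced with the standard library alone. This is a minimal illustration, independent of `tabulator`, using a made-up sample row: in cp1252 the character 'ü' is stored as the single byte 0xFC, which can never begin a UTF-8 sequence, so decoding as UTF-8 fails with "invalid start byte" while decoding with the codec the bytes were written in succeeds.

```python
# Sample CSV bytes as a cp1252 (Windows "ANSI") tool would write them.
# In cp1252, 'ü' is the single byte 0xFC, which is never a valid UTF-8 start byte.
raw = "id,name\n1,Müller\n".encode("cp1252")

utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    # Keep a reference: the name bound by "except ... as" is cleared after the block.
    utf8_error = err
    print("UTF-8 failed:", err)

# Decoding with the codec the bytes were actually written in recovers the text.
text = raw.decode("cp1252")
print(text)
```

Passing `encoding=` to `tabulator.Stream` is the library-level equivalent of choosing the codec in the last step.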
Warnings
- gotcha The PyPI package is named `dataflows-tabulator`, but the Python module you should `import` is simply `tabulator`. Attempting to `import dataflows_tabulator` will result in a `ModuleNotFoundError`.
- gotcha While core formats like CSV, Excel, and ODS are supported out-of-the-box, reading/writing certain other formats (e.g., large JSON, HTML, SQL databases) requires installing additional 'extra' dependencies. Without these, `tabulator` will raise an error when attempting to use those formats.
- gotcha When reading files, `tabulator` attempts to infer the file format and encoding. This inference is usually reliable but can fail with non-standard files or specific character encodings. Incorrect inference can lead to parsing errors or garbled text.
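To see how an encoding guess can go wrong without raising anything, here is a hypothetical helper, `guess_decode` (not part of `tabulator`), that tries candidate encodings in order. It is a deliberately simplified stand-in: real detectors usually examine byte statistics rather than trying codecs one by one. The point it illustrates is that a permissive codec such as 'latin-1' accepts every byte sequence, so a wrong guess can "succeed" and yield garbled text instead of an error.

```python
from typing import Sequence, Tuple


def guess_decode(
    raw: bytes,
    candidates: Sequence[str] = ("utf-8", "cp1252", "latin-1"),
) -> Tuple[str, str]:
    """Hypothetical helper: return (encoding, text) for the first candidate that decodes."""
    for encoding in candidates:
        try:
            return encoding, raw.decode(encoding)
        except UnicodeDecodeError:
            # This candidate rejected some byte; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


utf8_bytes = "Müller".encode("utf-8")

# Trying UTF-8 first decodes the UTF-8 bytes correctly.
print(guess_decode(utf8_bytes))

# Trying latin-1 first also "succeeds" on the same bytes, but produces mojibake.
print(guess_decode(utf8_bytes, candidates=("latin-1", "utf-8")))

# cp1252 bytes fail as UTF-8 and fall through to the cp1252 candidate.
print(guess_decode("Müller".encode("cp1252")))
```

When a file decodes without error but the text looks wrong, passing the encoding explicitly removes the guesswork.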
Install
- pip install dataflows-tabulator
- pip install dataflows-tabulator[json,html,sql]
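Because a missing extra only shows up as an error once the corresponding format is used, it can be useful to check up front. The sketch below relies on the standard library's `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, and apart from `ijson` (named in the JSON error on this page) the module names each extra installs are assumptions you should confirm against the package metadata.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Hypothetical helper: return the top-level module names that cannot be found."""
    # find_spec returns None for a top-level name that is not importable.
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'ijson' backs JSON reading according to the error message on this page.
# Add the modules your other extras provide (assumption: check the declared extras).
missing = missing_modules(["ijson"])
if missing:
    print("Missing optional modules:", ", ".join(missing))
else:
    print("All optional modules are importable")
```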
Imports
- tabulator
import dataflows_tabulator  # wrong: raises ModuleNotFoundError
import tabulator  # correct
- Stream
from tabulator import Stream
Quickstart
import os
import tabulator
# Example CSV data, written to a local file for this demo
csv_data = "id,name\n1,Alice\n2,Bob\n"
file_path = 'example.csv'
with open(file_path, 'w', encoding='utf-8') as f:
    f.write(csv_data)
# Read the data using tabulator; for local files, simply pass the path.
# headers=1 treats row 1 as the header row.
# The with-block opens the stream and closes it even if an error occurs.
with tabulator.Stream(file_path, headers=1) as table:
    print("Headers:", table.headers)
    print("Rows:")
    for row in table.iter():
        print(row)
# Cleanup (optional)
os.remove(file_path)