{"id":7137,"library":"dataflows-tabulator","title":"Tabulator (dataflows-tabulator)","description":"Tabulator is a Python library providing a consistent and robust interface for streaming and processing tabular data from various sources and formats, including CSV, Excel, JSON, and SQL databases. It serves as a foundational data reading component within the `dataflows` framework. Currently at version 1.54.3, the library maintains a stable release cadence with regular updates.","status":"active","version":"1.54.3","language":"en","source_language":"en","source_url":"https://github.com/akariv/tabulator-py","tags":["dataflows","tabular data","csv","excel","json","data processing","io","data-io"],"install":[{"cmd":"pip install dataflows-tabulator","lang":"bash","label":"Base Install (includes Excel, ODS support)"},{"cmd":"pip install dataflows-tabulator[json,html,sql]","lang":"bash","label":"Install with common optional extras"}],"dependencies":[{"reason":"Core dependency for data packaging concepts","package":"datapackage","optional":false},{"reason":"Required for JSON path operations","package":"jsonpath-rw-ext","optional":false},{"reason":"Required for JSON schema validation","package":"jsonschema","optional":false},{"reason":"Core dependency for table schema management","package":"tableschema","optional":false},{"reason":"Required for reading/writing modern Excel files (.xlsx)","package":"openpyxl","optional":false},{"reason":"Required for reading legacy Excel files (.xls)","package":"xlrd","optional":false},{"reason":"Required for writing legacy Excel files (.xls)","package":"xlwt","optional":false},{"reason":"Required for reading/writing Open Document Spreadsheet files (.ods)","package":"odfpy","optional":false},{"reason":"Command-line interface toolkit","package":"click","optional":false},{"reason":"Markdown parser (used for some internal documentation/rendering)","package":"mistune","optional":false},{"reason":"Optional: Required for efficient parsing of large 
JSON files","package":"ijson","optional":true},{"reason":"Optional: Required for reading HTML tables","package":"beautifulsoup4","optional":true},{"reason":"Optional: Required for efficient HTML parsing","package":"lxml","optional":true},{"reason":"Optional: Required for reading/writing SQL database tables","package":"sqlalchemy","optional":true}],"imports":[{"note":"The PyPI package is `dataflows-tabulator`, but the importable module is `tabulator`.","wrong":"import dataflows_tabulator","symbol":"tabulator","correct":"import tabulator"},{"symbol":"Stream","correct":"from tabulator import Stream"}],"quickstart":{"code":"import tabulator\nimport os\n\n# Example CSV data (in-memory string for quickstart)\ncsv_data = \"id,name\\n1,Alice\\n2,Bob\"\n\n# Create a simple CSV file for demonstration\nfile_path = 'example.csv'\nwith open(file_path, 'w', encoding='utf-8') as f:\n    f.write(csv_data)\n\n# Read the data using tabulator\n# For local files, simply pass the path\n# headers=1 takes the header names from row 1 of the source\ntable = tabulator.Stream(file_path, headers=1)\ntable.open()\n\nprint(\"Headers:\", table.headers)\nprint(\"Rows:\")\nfor row in table:\n    print(row)\n\ntable.close()\n\n# Cleanup (optional)\nos.remove(file_path)\n","lang":"python","description":"This quickstart demonstrates how to read a local CSV file using `tabulator.Stream`. It opens the stream, prints the headers, iterates through the rows as lists, and then closes the stream. `headers=1` tells the stream to take the header names from the first row of the data source."},"warnings":[{"fix":"Always use `import tabulator` or `from tabulator import ...` in your code.","message":"The PyPI package is named `dataflows-tabulator`, but the Python module you should `import` is simply `tabulator`. 
Attempting to `import dataflows_tabulator` will result in a `ModuleNotFoundError`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install `dataflows-tabulator` with the necessary extras, for example: `pip install dataflows-tabulator[json,html,sql]`. Alternatively, install the specific dependency (e.g., `pip install ijson`) separately.","message":"While core formats like CSV, Excel, and ODS are supported out-of-the-box, reading/writing certain other formats (e.g., large JSON, HTML, SQL databases) requires installing additional 'extra' dependencies. Without these, `tabulator` will raise an error when attempting to use those formats.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Explicitly provide the `format` and `encoding` arguments to `tabulator.Stream()` when encountering issues, e.g., `tabulator.Stream('data.csv', format='csv', encoding='latin-1')`.","message":"When reading files, `tabulator` attempts to infer the file format and encoding. This inference is usually reliable but can fail with non-standard files or specific character encodings. 
Incorrect inference can lead to parsing errors or garbled text.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Change your import statement from `import dataflows_tabulator` to `import tabulator`.","cause":"The Python package `dataflows-tabulator` is installed, but you tried to import it using its PyPI name with underscores, which is not the correct module name.","error":"ModuleNotFoundError: No module named 'dataflows_tabulator'"},{"fix":"Install the `json` extra for `dataflows-tabulator`: `pip install dataflows-tabulator[json]` or `pip install ijson` directly.","cause":"You are attempting to read a JSON file, but the optional `ijson` dependency (needed for efficient JSON parsing) has not been installed.","error":"tabulator.errors.TabulatorException: Missing dependency for 'json' format. Please install 'ijson'."},{"fix":"Specify the correct encoding when creating the stream, for example: `tabulator.Stream('data.csv', encoding='latin-1')`. Common alternatives to UTF-8 include 'latin-1' (also known as 'iso-8859-1') and 'cp1252'.","cause":"The file you are trying to read is not encoded in UTF-8, but `tabulator` decoded it as UTF-8 because encoding inference did not identify the actual encoding.","error":"UnicodeDecodeError: 'utf-8' codec can't decode byte 0x__ in position _: invalid start byte"}]}