Pyexcel IO
Pyexcel-io is a Python library providing a unified API to read and write structured data from various file formats (CSV, TSV, and extensions like XLS, XLSX, ODS via plugins) and databases (SQLAlchemy, Django models). It focuses on data processing, not formatting, fonts, or charts. It is actively maintained with frequent minor releases.
Warnings
- breaking Pyexcel-io v0.6.0 dropped support for Python 2 and older Python 3 versions (3.3, 3.4, 3.5). Python 3.6+ is now required.
- breaking The plugin interface was completely rewritten in v0.6.0, making plugins developed for older versions (e.g., v0.3.x) incompatible. This also impacts PyInstaller users requiring updated configurations.
- gotcha The `chardet` library, previously an implicit dependency for `csvz` and `tsvz` readers, became optional in v0.6.7. If your application relies on these formats, you may need to install `chardet` explicitly.
- gotcha In versions prior to 0.6.7, pathnames containing dots ('.') could cause `file_name` errors when using `get_writer`.
- gotcha When multiple pyexcel plugins (e.g., `pyexcel-ods` and `pyexcel-odsr`) are installed that support the same file format, `pyexcel-io` might pick one unpredictably. This can lead to unexpected behavior or an error.
- gotcha Pyexcel-io specifically focuses on tabular data; it does not support reading or writing fonts, colors, charts, images, or password-protected Excel files.
- gotcha While pyexcel-io supports 'streaming' for large files, many underlying plugins for formats like XLS, XLSX, and ODS (e.g., `pyexcel-xls`, `pyexcel-xlsx`, `pyexcel-ods`) still load the entire file into memory due to their zipped XML structure. Only `csv` readers, `pyexcel-xlsxr`, `pyexcel-odsr`, and `pyexcel-htmlr` truly support partial/streaming reads for memory efficiency.
Install
-
pip install pyexcel-io
Imports
- get_data
from pyexcel_io import get_data
- save_data
from pyexcel_io import save_data
Quickstart
import os
from pyexcel_io import save_data, get_data
from io import StringIO
# --- Writing data to a virtual CSV file (StringIO) ---
data_to_write = [
["Name", "Age", "City"],
["Alice", 30, "New York"],
["Bob", 24, "London"]
]
output_stream = StringIO()
save_data(output_stream, data_to_write, file_type='csv')
# Simulate getting content from the stream
output_stream.seek(0)
csv_content = output_stream.getvalue()
print("Generated CSV content:\n" + csv_content)
# --- Reading data from the virtual CSV file (StringIO) ---
input_stream = StringIO(csv_content)
# get_data returns a dictionary where keys are sheet names (or 'csv' for single file)
dread_data = get_data(input_stream, file_type='csv')
# Access the first (and only) sheet, usually named 'csv'
if 'csv' in dread_data:
print("\nRead data:")
for row in dread_data['csv']:
print(row)
else:
print("Error: Could not read data from CSV stream.")
# Example with a physical file (uncomment to test locally)
# file_name = "example.csv"
# save_data(file_name, data_to_write)
# data_from_file = get_data(file_name)
# print(f"\nData from {file_name}: {data_from_file[file_name]}")
# os.remove(file_name) # Clean up