Fastexcel
Fastexcel is a high-performance Python library for reading Excel files (.xlsx), implemented in Rust. It focuses on speed and memory efficiency, making it suitable for large datasets. The library is actively maintained with frequent minor releases, typically on a monthly cadence.
Warnings
- breaking Fastexcel has dropped support for older Python versions: v0.13.0 dropped Python 3.8, and v0.17.1 dropped Python 3.9. The current minimum required Python version is 3.10.
- gotcha Passing `schema_sample_rows=0` when loading a sheet is no longer allowed and raises an error. Use a positive number of rows, or `None` to sample every row.
- gotcha Prior to v0.17.1, cells containing Excel error values like `#DIV/0!` might not have been consistently treated as null during type inference.
- gotcha In versions prior to v0.19.0, using `use_columns` with `load_table` when `column_names` was *not* specified could lead to incorrect behavior or errors.
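The version thresholds in the warnings above can be checked at runtime. The following is a minimal sketch, not part of Fastexcel: `parse_version` and `environment_notes` are hypothetical helper names, the note strings are made up for illustration, and the thresholds are copied from the warnings as stated.

```python
import sys
from importlib import metadata

# Thresholds taken from the warnings above.
MIN_PYTHON = (3, 10)
ERROR_NULLS_SINCE = (0, 17, 1)
USE_COLUMNS_FIX_SINCE = (0, 19, 0)


def parse_version(text):
    """Turn '0.17.1' into (0, 17, 1); non-numeric suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def environment_notes(python_version, fastexcel_version):
    """Hypothetical helper: list which of the warnings above apply."""
    notes = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        notes.append("python-too-old")
    installed = parse_version(fastexcel_version)
    if installed < ERROR_NULLS_SINCE:
        notes.append("error-cells-may-not-be-null")
    if installed < USE_COLUMNS_FIX_SINCE:
        notes.append("use_columns-without-column_names")
    return notes


if __name__ == "__main__":
    try:
        installed = metadata.version("fastexcel")
    except metadata.PackageNotFoundError:
        print("fastexcel is not installed")
    else:
        print(environment_notes(sys.version_info, installed))
```

Run as a script, it prints the notes that apply to the current interpreter and the installed Fastexcel version. An empty list means none of the version-dependent warnings apply.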
Install
pip install fastexcel
Imports
- read_excel
from fastexcel import read_excel
Quickstart
import os

import pandas as pd
from fastexcel import read_excel

# Create a dummy Excel file for demonstration
# (pandas needs an Excel writer such as openpyxl for .xlsx output)
file_path = "dummy_data.xlsx"
data = {'ColumnA': [1, 2, 3], 'ColumnB': ['X', 'Y', 'Z']}
df = pd.DataFrame(data)
df.to_excel(file_path, index=False)

try:
    # Open the workbook; read_excel returns an ExcelReader
    reader = read_excel(file_path)

    # Load the first sheet by index and convert it to a pandas DataFrame
    # (to_pandas() may also need pyarrow installed, depending on the fastexcel version)
    sheet = reader.load_sheet(0)
    print("\nData from first sheet:")
    print(sheet.to_pandas())

    # Sheet names are available directly on the reader
    if reader.sheet_names:
        first_sheet_name = reader.sheet_names[0]
        sheet_by_name = reader.load_sheet(first_sheet_name)
        print(f"\nData from sheet '{first_sheet_name}' by name:")
        print(sheet_by_name.to_pandas())
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Clean up the dummy file
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"\nCleaned up {file_path}")