Fast XLSX Reader
Pyxlsx is a fast and memory-efficient Python library for reading data from XLSX (Excel) files. It focuses on parsing core cell data, shared strings, and basic styling, which makes it suitable for high-performance data extraction. The current stable version is 1.1.3; releases appear as needed, mainly for bug fixes and minor improvements.
Common errors
-
ModuleNotFoundError: No module named 'lxml'
cause The essential `lxml` dependency was not installed correctly or is missing from your environment.
fix Run `pip install lxml` or `pip install pyxlsx` (which should pull `lxml`). If `lxml` installation fails, consult its documentation for system-specific prerequisites.
-
FileNotFoundError: [Errno 2] No such file or directory: 'your_file.xlsx'
cause The path to the XLSX file passed to `pyxlsx.Reader` is incorrect, or the file does not exist at that location.
fix Verify that the path is either absolute or correct relative to the directory the script is run from. Double-check the filename and extension.
-
ValueError: Invalid XML: missing required element 'worksheet'
cause The XLSX file is either corrupted, not a valid XLSX file, or encrypted/password-protected in a way `pyxlsx` cannot handle.
fix Ensure the `.xlsx` file is valid and can be opened in Excel or another spreadsheet program. Try opening and re-saving the file, or use a different file. `pyxlsx` does not support encrypted files.
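A quick pre-check can separate the second and third errors above before the file ever reaches the reader. The sketch below uses only the standard library and relies on the fact that an unencrypted XLSX file is a ZIP archive, while a password-encrypted workbook is not. The helper name `looks_like_xlsx` and the `REQUIRED_PARTS` tuple are hypothetical, not part of `pyxlsx`, and the check is a heuristic rather than full validation.

```python
import zipfile
from pathlib import Path

# Hypothetical helper, not part of pyxlsx: a cheap sanity check to run
# before handing a path to the reader.
REQUIRED_PARTS = ("[Content_Types].xml", "xl/workbook.xml")


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP archive containing the core workbook parts."""
    p = Path(path)
    if not p.is_file():
        # Missing file or a directory: the reader would fail with FileNotFoundError.
        return False
    if not zipfile.is_zipfile(str(p)):
        # Plain XLSX files are ZIP archives; encrypted workbooks and
        # mislabeled files (for example CSV renamed to .xlsx) are not.
        return False
    with zipfile.ZipFile(str(p)) as archive:
        names = set(archive.namelist())
    return all(part in names for part in REQUIRED_PARTS)
```

A `False` result only says the file is unlikely to parse; a `True` result does not guarantee that every worksheet inside is well-formed.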
Warnings
- gotcha Pyxlsx is a reader-only library. It does not support writing, modifying, or creating XLSX files. Features like macros, charts, and complex formulas are also not parsed.
- gotcha The `lxml` dependency can be difficult to install, especially on systems without pre-compiled wheels or a working C compiler setup. `pyxlsx` depends on `lxml` for its performance.
- gotcha Pyxlsx provides basic parsing of cell values. While it handles basic number and date formatting, complex date/time formats, custom number formats, or cells with errors might be returned as their raw underlying value or require additional post-processing.
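If a date cell comes back as its raw underlying value, that value is normally an Excel serial number (days since the workbook's epoch, with the time of day as the fractional part). The following is a minimal post-processing sketch, assuming the workbook uses Excel's default 1900 date system. `excel_serial_to_datetime` is a hypothetical helper, not part of `pyxlsx`, and whether a given cell is returned raw depends on its format.

```python
from datetime import datetime, timedelta

# Assumption: the workbook uses the 1900 date system (the Excel default).
# The epoch is 1899-12-30 rather than 1899-12-31 because Excel counts a
# nonexistent 1900-02-29 as serial 60; this offset is correct from serial 61 on.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial number (int or float) to a datetime."""
    if serial < 61:
        # Serials 1..60 fall before or on the fictitious leap day and
        # would need separate handling.
        raise ValueError("serials below 61 are not handled by this sketch")
    return EXCEL_EPOCH + timedelta(days=serial)


print(excel_serial_to_datetime(45000))    # 2023-03-15 00:00:00
print(excel_serial_to_datetime(44927.5))  # 2023-01-01 12:00:00
```

Workbooks saved with the 1904 date system use a different epoch, so the conversion above would be off by about four years for them.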
Install
-
pip install pyxlsx
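After installing, it can be useful to confirm that both `pyxlsx` and its `lxml` dependency are visible to the interpreter you actually run. A small sketch using the standard library's `importlib` follows; `missing_packages` is a hypothetical helper name, and the check only tests that each package can be located, not that it imports cleanly.

```python
from importlib.util import find_spec


def missing_packages(names=("pyxlsx", "lxml")):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("pyxlsx and lxml are both available.")
```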
Imports
- Reader
from pyxlsx import Reader
Quickstart
import os

import openpyxl
from pyxlsx import Reader

# Create a dummy XLSX file for demonstration
file_path = "example.xlsx"
workbook = openpyxl.Workbook()
sheet = workbook.active
sheet['A1'] = "Name"
sheet['B1'] = "Age"
sheet['A2'] = "Alice"
sheet['B2'] = 30
sheet['A3'] = "Bob"
sheet['B3'] = 24
workbook.save(file_path)
print(f"Created dummy file: {file_path}")

# Use pyxlsx to read the file
try:
    reader = Reader(file_path)
    # pyxlsx iterators are one-pass; convert to list if you need to iterate multiple times
    rows_data = list(reader.rows())
    if rows_data:
        print("Header row:", rows_data[0])
        print("Data rows:")
        for row in rows_data[1:]:
            print(row)
    else:
        print("No data found in the file.")
except FileNotFoundError:
    print(f"Error: The file {file_path} was not found. Please ensure it exists.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
finally:
    # Clean up the dummy file
    if os.path.exists(file_path):
        os.remove(file_path)
        print(f"Cleaned up dummy file: {file_path}")
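Since the Quickstart treats the first row as a header, a common next step is to pair each data row with the header names. The sketch below is plain Python and makes no `pyxlsx` calls. It assumes each row is a sequence of plain cell values, as the Quickstart output suggests, and `rows_to_dicts` is a hypothetical helper name.

```python
def rows_to_dicts(rows):
    """Turn an iterable of rows (header first) into a list of dicts.

    Consumes `rows` in a single pass, so it also works with one-pass iterators.
    """
    iterator = iter(rows)
    try:
        header = [str(cell) for cell in next(iterator)]
    except StopIteration:
        # No rows at all, not even a header.
        return []
    # zip() stops at the shorter sequence, so a short row simply omits
    # the trailing columns instead of raising.
    return [dict(zip(header, row)) for row in iterator]


records = rows_to_dicts([["Name", "Age"], ["Alice", 30], ["Bob", 24]])
print(records)  # [{'Name': 'Alice', 'Age': 30}, {'Name': 'Bob', 'Age': 24}]
```

Because the function reads its input once, it can be given the row iterator directly instead of a list, which keeps memory use low for large sheets.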