Pyreadstat: Read/Write SAS, SPSS, Stata Files
Pyreadstat is a Python library for reading SAS (.sas7bdat, .xpt), SPSS (.sav, .zsav), and Stata (.dta) files into pandas or polars data frames, and for writing data frames back to SPSS, Stata, and SAS XPORT (.xpt) files. Writing .sas7bdat files is not supported. It is currently at version 1.3.3 and is actively maintained, with irregular releases that track new pandas/polars versions and add features.
Warnings
- breaking Older versions of pyreadstat (prior to 1.3.3) may not be compatible with pandas 3.0 due to API changes in pandas. This can lead to errors when reading or writing dataframes.
- gotcha Character encodings in statistical files (SAS, SPSS, Stata) are not always declared correctly in the file itself. If you encounter encoding errors or corrupted text, pass the correct encoding explicitly through the `encoding` parameter of the read function (for example `encoding="LATIN1"`).
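- gotcha Reading very large statistical files can consume significant memory and may raise `MemoryError`, because by default `pyreadstat` loads the entire file into memory. This matters most for files with millions of rows or many columns. To reduce memory use, restrict the read with `usecols`, `row_limit` and `row_offset`, or iterate over the file with `read_file_in_chunks`.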
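- gotcha SAS stores dates as days since 1960-01-01, datetimes as seconds since 1960-01-01, and times as seconds since midnight. pyreadstat converts a column only when its format is one it recognizes as a date, datetime, or time format. For other formats, pass the format names through `extra_date_formats` or `extra_datetime_formats`, or set `disable_datetime_conversion=True` and convert the raw numbers yourself.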
Install
- pip install pyreadstat
- pip install "pyreadstat[polars]"
Imports
- pyreadstat
import pyreadstat
Quickstart
import os

import pandas as pd
import pyreadstat

# Example data to round-trip through a SAS transport (XPORT) file.
# pyreadstat can read .sas7bdat but cannot write it, so .xpt is used here.
data = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']}
df_to_write = pd.DataFrame(data)
output_file = 'test_sas_file.xpt'

try:
    # Write the DataFrame as a SAS XPORT file
    pyreadstat.write_xport(df_to_write, output_file)
    print(f"Dummy SAS XPORT file '{output_file}' created successfully.")

    # Read the file back; every read function returns (DataFrame, metadata)
    df_read, meta = pyreadstat.read_xport(output_file)
    print("\nDataFrame read from file:")
    print(df_read)
    print("\nMetadata read from file:")
    print(f"Column Names: {meta.column_names}")
    print(f"Column Labels: {meta.column_labels}")
    print(f"Table Name: {meta.table_name}")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Remove the temporary file
    if os.path.exists(output_file):
        os.remove(output_file)
        print(f"Cleaned up '{output_file}'.")