{"id":5027,"library":"pyreadstat","title":"Pyreadstat: Read/Write SAS, SPSS, Stata Files","description":"Pyreadstat is a Python library that reads SAS (.sas7bdat, .xpt), SPSS (.sav, .zsav), and Stata (.dta) files into pandas or polars data frames and writes data frames back to SPSS, Stata, and SAS XPORT (.xpt) files; writing .sas7bdat is not supported. It is currently at version 1.3.3 and has an active, though irregular, release cadence driven by new pandas/polars versions and feature additions.","status":"active","version":"1.3.3","language":"en","source_language":"en","source_url":"https://github.com/Roche/pyreadstat","tags":["sas","spss","stata","pandas","polars","data-interchange","statistical-files","data-wrangling"],"install":[{"cmd":"pip install pyreadstat","lang":"bash","label":"Standard installation"},{"cmd":"pip install \"pyreadstat[polars]\"","lang":"bash","label":"With Polars support (optional)"}],"dependencies":[{"reason":"Core dependency for DataFrame operations.","package":"pandas","optional":false},{"reason":"Core dependency, provides a common API for DataFrame operations.","package":"narwhals","optional":false},{"reason":"Optional dependency for Polars DataFrame support.","package":"polars","optional":true}],"imports":[{"symbol":"pyreadstat","correct":"import pyreadstat"}],"quickstart":{"code":"import os\n\nimport pandas as pd\nimport pyreadstat\n\n# pyreadstat cannot write .sas7bdat, so round-trip through SAS XPORT (.xpt) instead\ndata = {'col1': [1, 2, 3], 'col2': ['A', 'B', 'C']}\ndf_to_write = pd.DataFrame(data)\noutput_file = 'test_sas_file.xpt'\n\ntry:\n    # Write the DataFrame to a SAS XPORT file\n    pyreadstat.write_xport(df_to_write, output_file)\n    print(f\"SAS XPORT file '{output_file}' created successfully.\")\n\n    # Read the file back; read_* functions return (DataFrame, metadata)\n    df_read, meta = pyreadstat.read_xport(output_file)\n\n    print(\"\\nDataFrame read from file:\")\n    print(df_read)\n    print(\"\\nMetadata read from file:\")\n    print(f\"Column Names: {meta.column_names}\")\n    print(f\"Column Labels: {meta.column_labels}\")\n    print(f\"Table Name: {meta.table_name}\")\n\nexcept Exception as e:\n    print(f\"An error occurred: {e}\")\nfinally:\n    if os.path.exists(output_file):\n        os.remove(output_file)\n        print(f\"Cleaned up '{output_file}'.\")","lang":"python","description":"This quickstart writes a simple pandas DataFrame to a SAS XPORT (.xpt) file and reads it back, showing basic data and metadata retrieval. XPORT is used because pyreadstat can read .sas7bdat files (`read_sas7bdat`) but cannot write them. The same `read_*` and `write_*` pattern applies to SPSS (`read_sav`/`write_sav`) and Stata (`read_dta`/`write_dta`) files."},"warnings":[{"fix":"Upgrade pyreadstat to version 1.3.3 or newer to ensure compatibility with pandas 3.0. If you cannot upgrade pyreadstat, pin pandas to `<3.0`.","message":"Versions of pyreadstat prior to 1.3.3 may not be compatible with pandas 3.0 because of API changes in pandas. This can lead to errors when reading or writing data frames.","severity":"breaking","affected_versions":"<1.3.3"},{"fix":"Pass the `encoding` parameter to `read_sas7bdat()`, `read_sav()`, or `read_dta()`. The value must be an iconv-compatible encoding name, such as 'LATIN1', 'CP1252', or 'UTF-8'. Example: `pyreadstat.read_sas7bdat('file.sas7bdat', encoding='LATIN1')`.","message":"Statistical files (SAS, SPSS, Stata) do not always declare their character encoding correctly. If you encounter encoding errors or corrupted text, explicitly specify the correct encoding when reading the file.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Read only what you need: pass `usecols` to select columns, `row_limit` and `row_offset` to select a row range, or iterate with `pyreadstat.read_file_in_chunks()` to process the file in pieces. If the file still does not fit, extract a subset in SAS/SPSS/Stata first, or convert it once to a columnar format such as Parquet or Feather for repeated processing.","message":"Reading very large statistical files can consume significant memory, potentially leading to `MemoryError`, because the `read_*` functions load the entire file into memory by default. 
This is particularly relevant for files with millions of rows or numerous columns.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If SAS date/time variables are not parsed correctly, inspect `meta.original_variable_types` to see the SAS format attached to each column. If a column uses a date or datetime format that pyreadstat does not recognize, pass that format name to `read_sas7bdat` via `extra_date_formats` or `extra_datetime_formats`. Otherwise, convert the numeric values manually after reading.","message":"SAS stores dates as numeric days and datetimes as numeric seconds since 1960-01-01. pyreadstat converts only columns whose SAS format it recognizes as a date, datetime, or time format; columns with other formats are returned as plain numbers and need the `extra_date_formats`/`extra_datetime_formats` parameters or manual conversion.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}