pandas-read-xml (Legacy)
`pandas-read-xml` reads XML files directly into pandas DataFrames. It simplifies converting hierarchical XML data into a tabular format, with options for specifying the path to the row elements and for automatic flattening. Note that `pandas.read_xml` was added to the core pandas library in version 1.3.0 and largely supersedes this standalone package on newer pandas installations. The latest version of this standalone library is 0.3.1.
Warnings
- breaking The functionality of this library has been incorporated into the main `pandas` library itself as `pandas.read_xml()` since `pandas` version 1.3.0. For new projects or installations with pandas >= 1.3.0, it is generally recommended to use `pd.read_xml()` directly instead of this standalone package.
- deprecated The `pandas-read-xml` GitHub repository explicitly states: 'Note that this isn't a mature or anything close to a complete solution. So I don't recommend using it in "production".' This suggests it was intended as a temporary solution before native pandas support.
- gotcha Working with complex or deeply nested XML structures can be challenging. Both `pandas-read-xml` and `pandas.read_xml` might require careful use of XPath expressions, handling of XML namespaces, and potentially pre-processing with XSLT to flatten data.
- gotcha The `root_is_rows` and `transpose` arguments in `pandas-read-xml` can be tricky, as can the way `pandas.read_xml` uses its `xpath` argument to decide which elements become rows. If the XML structure does not match the default assumptions, the result may be a transposed DataFrame or misinterpreted rows and columns.
- gotcha XML data can have mixed shapes under the same tag (e.g., some instances hold a single element, others a list of elements), which makes flattening difficult. `pandas_read_xml` includes `flatten()` and `auto_flatten()` helpers to address this, but it remains a complex issue.
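The mixed single-element/list problem from the last gotcha can be illustrated without any XML library. The sketch below is not the library's `flatten()` logic. It assumes the XML has already been converted to Python dicts, and the `as_list` helper and the sample `records` are hypothetical names invented for illustration.

```python
# Hypothetical parsed records: "item" is a dict in one record and a list in another,
# which is what typically happens when a tag appears once versus several times.
records = [
    {"order": "A1", "item": {"name": "Apple"}},
    {"order": "A2", "item": [{"name": "Banana"}, {"name": "Cherry"}]},
]


def as_list(value):
    """Normalize a value that may be missing, a single element, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


# One output row per item, regardless of how the item was originally stored.
rows = [
    {"order": rec["order"], "item_name": item["name"]}
    for rec in records
    for item in as_list(rec.get("item"))
]

for row in rows:
    print(row)
```

Normalizing every such field to a list before building rows keeps the resulting table shape consistent. A list of dicts like `rows` can then be passed to `pandas.DataFrame`.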
Install
pip install pandas-read-xml
Imports
- read_xml
import pandas_read_xml as pdx
df = pdx.read_xml(...)
Quickstart
import pandas_read_xml as pdx
import io
xml_data = """<?xml version='1.0' encoding='utf-8'?>
<root>
<item id="1">
<name>Apple</name>
<price>1.00</price>
</item>
<item id="2">
<name>Banana</name>
<price>0.50</price>
</item>
</root>"""
# To read from a file, replace io.StringIO(xml_data) with 'path/to/your/file.xml'
df = pdx.read_xml(io.StringIO(xml_data), ['root', 'item'])
print(df)
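For installations with pandas >= 1.3.0, the same data can be read with the built-in `pandas.read_xml`, as the warnings above recommend. This is a minimal sketch rather than a drop-in replacement. It assumes the `etree` parser (so `lxml` is not required) and uses an `xpath` of `.//item` to select the row elements, and the XML declaration is omitted from the sample string for simplicity.

```python
import io

import pandas as pd

xml_data = """<root>
  <item id="1">
    <name>Apple</name>
    <price>1.00</price>
  </item>
  <item id="2">
    <name>Banana</name>
    <price>0.50</price>
  </item>
</root>"""

# Each element matched by `xpath` becomes one row; its attributes and
# child elements become columns.
df = pd.read_xml(io.StringIO(xml_data), xpath=".//item", parser="etree")
print(df)
```

Here the `xpath` argument plays the role of the `['root', 'item']` key list in the example above: it tells the parser which elements to treat as rows.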