Pydifact: EDIFACT Parser and Serializer
Pydifact is a Python library for reading and writing EDIFACT (Electronic Data Interchange For Administration, Commerce and Transport) files. EDIFACT is an old format, but it remains a standard in many business sectors; one example is the transfer of medical reports in Austria. The library is a work in progress: the current version is 0.2.3, the API is not yet stable, and releases may introduce breaking changes.
Common errors
- AttributeError: module 'collections' has no attribute 'Iterable'
  cause: This error typically occurs when an older version of `pydifact` (e.g., <0.2.0) runs on Python 3.10 or newer. Python 3.10 removed the deprecated `collections.Iterable` alias; the class is available only as `collections.abc.Iterable`.
  fix: Upgrade `pydifact` to version 0.2.0 or higher: `pip install --upgrade pydifact`. These versions explicitly require Python >=3.10 and use `collections.abc.Iterable`.
- EdiFactSyntaxError: ...
  cause: Pydifact is designed to raise `EdiFactSyntaxError` when it encounters malformed EDIFACT syntax, rather than silently attempting to parse or ignore errors.
  fix: Wrap parsing operations in `try`/`except EdiFactSyntaxError`, or preprocess and validate EDIFACT files so that they conform to the expected syntax before passing them to pydifact.
- My code only reads the first message, not all messages in an interchange.
  cause: Users sometimes confuse the `Interchange` and `Message` classes, or assume that iterating directly over an `Interchange` yields messages. The `Interchange` object represents the entire EDIFACT file, which can contain multiple messages.
  fix: After creating an `Interchange` object, call `interchange.get_messages()` explicitly to retrieve an iterable of `Message` objects. Then iterate over each `Message` and read its `segments` property.
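The `AttributeError` in the first entry above comes down to where Python exposes its abstract base classes. The following is a minimal standard-library sketch, independent of pydifact; the helper name `is_iterable` is illustrative only.

```python
import collections
import collections.abc
import sys


def is_iterable(obj) -> bool:
    """Return True if obj supports iteration, using the supported ABC location."""
    return isinstance(obj, collections.abc.Iterable)


# On Python 3.10+ the old alias no longer exists, so code that still
# references `collections.Iterable` fails with AttributeError.
legacy_alias_present = hasattr(collections, "Iterable")

print(is_iterable(["UNH", "UNT"]))
print(is_iterable(42))
print(sys.version_info[:2], legacy_alias_present)
```

If a traceback points at `collections.Iterable` inside an installed package, the package predates this change and needs upgrading; patching the alias back in locally only hides the problem.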
Warnings
- breaking: The API is not yet stable, and breaking changes can occur frequently between minor versions. Always consult the `CHANGELOG.md` before upgrading.
- breaking: The `SegmentCollection` class has been removed/deprecated. Its functionality has moved to the `Interchange` and `Message` classes.
- breaking: Calls to `Segment()` now *must* provide the segment tag name as the first positional parameter.
- breaking: Support for Python versions older than 3.10 has been dropped.
- gotcha: If you define custom control characters with positional arguments, the 'reserved' character parameter introduced in some versions may shift argument positions and cause unexpected behavior. Pass control characters as keyword arguments instead.
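The positional-argument gotcha in the last warning can be reproduced without pydifact. The sketch below uses two hypothetical dataclasses, `OldChars` and `NewChars`, which are stand-ins and not pydifact classes; their field names are assumptions chosen to mimic a signature before and after a parameter is inserted in the middle.

```python
from dataclasses import dataclass


# Hypothetical stand-ins, NOT pydifact classes: they only mimic a signature
# before and after a new parameter is inserted in the middle.
@dataclass
class OldChars:
    component_separator: str = ":"
    data_separator: str = "+"
    decimal_point: str = ","
    escape_character: str = "?"
    segment_terminator: str = "'"


@dataclass
class NewChars:
    component_separator: str = ":"
    data_separator: str = "+"
    decimal_point: str = ","
    escape_character: str = "?"
    reserved_character: str = " "  # newly inserted parameter
    segment_terminator: str = "'"


# The caller wants "." as decimal point and "~" as segment terminator.
args = (":", "+", ".", "?", "~")

old = OldChars(*args)             # fifth value is the segment terminator
new_positional = NewChars(*args)  # fifth value silently becomes reserved_character
new_keyword = NewChars(decimal_point=".", segment_terminator="~")

print(repr(old.segment_terminator))
print(repr(new_positional.segment_terminator))
print(repr(new_keyword.segment_terminator))
```

With keyword arguments, the call keeps its meaning regardless of where new parameters are added, which is why they are the safer choice while the API is still changing.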
Install
- pip install pydifact
Imports
- Interchange
from pydifact.segmentcollection import SegmentCollection  # outdated: SegmentCollection has been removed/deprecated (see Warnings)
from pydifact.segmentcollection import Interchange  # current
- Segment
from pydifact.segments import Segment
Quickstart
from pydifact.segmentcollection import Interchange
from pydifact.segments import Segment
# Example EDIFACT data (Interchange containing one message)
edifact_data = (
    "UNA:+,? '\n"
    "UNB+UNOC:1+1234+3333+200102:2212+42'\n"
    "UNH+42z42+PAORES:93:1:IA'\n"
    "MSG+1:45'\n"
    "IFT+3+XYZCOMPANY AVAILABILITY'\n"
    "ERC+A7V:1:AMD'\n"
    "UNT+5+42z42'\n"
    "UNZ+1+42'"  # UNZ carries the number of messages in the interchange (one here)
)
# --- Reading an EDIFACT interchange from a string ---
interchange = Interchange.from_str(edifact_data)
print("\n--- Reading Interchange ---")
for message in interchange.get_messages():
    for segment in message.segments:
        print(f"Segment tag: {segment.tag}, content: {segment.elements}")
# --- Creating an EDIFACT interchange ---
# Assumes the constructor signatures shown in the pydifact README; check them against your installed version.
from pydifact.segmentcollection import Message

# The interchange builds its own UNB/UNZ envelope from these arguments when serialized.
new_interchange = Interchange(
    syntax_identifier=("UNOC", 1),
    sender="SENDER",
    recipient="RECEIVER",
    control_reference="REF123",
)
# Message(reference_number, identifier); add_message() wraps the segments in UNH/UNT.
new_message = Message("1", ("ORDERS", "D", "96A", "UN"))
# Segment(tag, *elements): plain strings are simple elements, lists are composite elements.
new_message.add_segment(Segment("BGM", "220", "ORDER123"))
new_message.add_segment(Segment("DTM", ["137", "202301011000", "203"]))
new_interchange.add_message(new_message)
print("\n--- Serializing Interchange ---")
print(new_interchange.serialize(break_lines=True))
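The `UNA` line at the top of the example data is the service string advice: six characters after the `UNA` tag that name the delimiters used in the rest of the interchange. The following is a minimal standard-library sketch of how those six positions map to roles, assuming the position order defined by the EDIFACT syntax rules. `parse_una` and the dictionary keys are illustrative names, not part of pydifact.

```python
def parse_una(una_segment: str) -> dict:
    """Map the six service characters of a UNA segment to their roles."""
    if not una_segment.startswith("UNA") or len(una_segment) < 9:
        raise ValueError("expected 'UNA' followed by six service characters")
    chars = una_segment[3:9]
    roles = (
        "component_separator",  # splits parts of a composite element
        "data_separator",       # splits elements within a segment
        "decimal_mark",         # decimal notation in numeric values
        "release_character",    # escapes a following delimiter
        "reserved",             # normally a space
        "segment_terminator",   # ends each segment
    )
    return dict(zip(roles, chars))


delimiters = parse_una("UNA:+,? '")
for role, char in delimiters.items():
    print(f"{role}: {char!r}")
```

Reading the header this way makes it clear why position matters when control characters are overridden: each slot has a fixed meaning, including the reserved one.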