DAWG2 Python Library
DAWG2-Python is a pure-Python library providing read-only access to Directed Acyclic Word Graphs (DAWGs), also known as Deterministic Acyclic Finite State Automata (DAFSAs). It works specifically with `.dawg` files created by the `dawgdic` C++ library or the original `DAWG` Python extension. The current version is 0.9.0. Releases are infrequent and typically cover minor updates or dependency adjustments; the project aims for API and binary compatibility with the original `DAWG` library where possible.
Common errors
- AttributeError: 'DAWG' object has no attribute 'save'
cause: Attempting to save a DAWG created or loaded with `dawg_python` (dawg2-python). This library is read-only.
fix: Use the original `dawg` library (e.g., `import dawg; d = dawg.DAWG(words); d.save('file.dawg')`) to create and save DAWG files. `dawg_python` is only for loading and querying.
- TypeError: DAWG() takes no arguments
cause: Attempting to initialize `dawg_python.DAWG` with a list of words or other data. The constructor is intended for internal use or loading, not direct data input for building a DAWG.
fix: Initialize the `DAWG` object without arguments (e.g., `d = dawg_python.DAWG()`) and then call `.load('filepath.dawg')` to load an existing DAWG file. Creating DAWGs from data requires the original `dawg` library.
- ModuleNotFoundError: No module named 'dawg'
cause: This error likely occurs if you are following documentation that refers to the *original* `dawg` library (the C-extension version) for creating `.dawg` files, but you have only installed `dawg2-python`.
fix: If you intend to *create* `.dawg` files, you must install the `dawg` library (note the missing `_python`): `pip install dawg`. If you only intend to *read* existing `.dawg` files, ensure you are importing `dawg_python` correctly and loading an already saved file.
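Since two of the errors above come down to which of the two modules is installed, a quick check can help before debugging further. The following is a minimal sketch using only the standard library; the helper name `dawg_backends` is hypothetical and not part of either package.

```python
from importlib.util import find_spec


def dawg_backends():
    """Report which of the two DAWG modules can be imported in this environment."""
    return {
        # Original C-extension library: needed to build and save .dawg files.
        "dawg": find_spec("dawg") is not None,
        # dawg2-python: read-only loader for existing .dawg files.
        "dawg_python": find_spec("dawg_python") is not None,
    }


status = dawg_backends()
if not status["dawg_python"]:
    print("dawg_python is missing: run `pip install dawg2-python` to read .dawg files")
if not status["dawg"]:
    print("dawg is missing: .dawg files cannot be created or saved in this environment")
```

If only `dawg_python` is reported as available, the environment can load and query existing files but cannot produce new ones.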
Warnings
- gotcha DAWG2-Python is a pure-Python library for *reading* existing DAWG files. It does not support creating, modifying, or saving DAWG structures. Attempting to use methods like `save()` or passing data to the constructor will result in errors.
- breaking As of version 0.8.0, the minimum required Python version is 3.8.
- gotcha The build system was migrated from `setup.py` to `poetry` in version 0.8.0. This mainly affects maintainers and anyone building from source.
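To see whether the Python 3.8 requirement is the cause of an install problem, the interpreter and the installed package version can be inspected together. This is a sketch under the assumption that the distribution is registered under the name used in the install command (`dawg2-python`); `check_environment` is a hypothetical helper.

```python
import sys


def check_environment(dist_name="dawg2-python"):
    """Return (python_is_new_enough, installed_version_or_None)."""
    py_ok = sys.version_info >= (3, 8)
    installed = None
    if py_ok:
        # importlib.metadata is in the standard library from Python 3.8 onward,
        # so it is imported only after the version check has passed.
        from importlib import metadata
        try:
            installed = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            installed = None
    return py_ok, installed


py_ok, installed = check_environment()
print(f"Python >= 3.8: {py_ok}; dawg2-python version: {installed}")
```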
Install
pip install dawg2-python
Imports
- DAWG
from dawg_python import DAWG
- BytesDAWG
from dawg_python import BytesDAWG
- RecordDAWG
from dawg_python import RecordDAWG
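The three classes above are loaded the same way (construct, then `.load(path)`), with one difference: `RecordDAWG` takes a `struct` format string that has to match the one used when the file was built. A hedged sketch of a small dispatcher follows; `load_reader` and `READER_KINDS` are hypothetical names, and `dawg_python` is imported lazily so the function can be defined even where the package is absent.

```python
READER_KINDS = ("DAWG", "BytesDAWG", "RecordDAWG")


def load_reader(path, kind="DAWG", record_format=None):
    """Load a .dawg file with the matching dawg_python reader class."""
    if kind not in READER_KINDS:
        raise ValueError(f"unknown reader kind: {kind!r}")
    if kind == "RecordDAWG" and record_format is None:
        raise ValueError("RecordDAWG needs the struct format used when the file was built")

    import dawg_python  # lazy import: only required when a file is actually loaded

    if kind == "RecordDAWG":
        # The format string is passed to the constructor, e.g. ">HH".
        return dawg_python.RecordDAWG(record_format).load(path)
    # DAWG and BytesDAWG are constructed without arguments here.
    return getattr(dawg_python, kind)().load(path)
```

For example, `load_reader("records.dawg", "RecordDAWG", record_format=">HH")` would load a hypothetical file whose values were packed as two unsigned shorts. The reader class has to correspond to the class that wrote the file, because each one expects a different on-disk layout.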
Quickstart
import os
import tempfile

import dawg  # This is the *original* DAWG library, for creating the .dawg file
import dawg_python  # This is dawg2-python, for reading the .dawg file


# Create a dummy .dawg file using the original 'dawg' library
# (dawg2-python is for reading, not creating/saving).
# CompletionDAWG is used on both sides because iterkeys() is provided by
# CompletionDAWG, not by the plain DAWG class.
def create_dummy_dawg(filepath):
    words = ['apple', 'apricot', 'banana', 'bandana', 'cat']
    d = dawg.CompletionDAWG(words)
    d.save(filepath)


# Use a temporary file for demonstration
with tempfile.NamedTemporaryFile(suffix='.dawg', delete=False) as tmp_file:
    dawg_filepath = tmp_file.name

try:
    # 1. Create a DAWG file (requires the 'dawg' library installed)
    print(f"Creating a temporary DAWG file at {dawg_filepath}...")
    create_dummy_dawg(dawg_filepath)
    print("DAWG file created.")

    # 2. Load and query the DAWG file using dawg2-python
    print("Loading DAWG using dawg2-python...")
    reader = dawg_python.CompletionDAWG().load(dawg_filepath)
    print("DAWG loaded successfully.")

    # Querying words
    print(f"'apple' in DAWG: {'apple' in reader}")
    print(f"'banana' in DAWG: {'banana' in reader}")
    print(f"'orange' in DAWG: {'orange' in reader}")

    # Getting prefixes: the stored keys that are prefixes of the given string
    print(f"Prefixes for 'bandana': {list(reader.prefixes('bandana'))}")

    # Iterating through all words
    print("First 5 words:")
    for i, word in enumerate(reader.iterkeys()):
        if i >= 5:
            break
        print(f"- {word}")
finally:
    # Clean up the temporary file
    if os.path.exists(dawg_filepath):
        os.remove(dawg_filepath)
        print(f"Cleaned up temporary file: {dawg_filepath}")
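To make clear what the three queries in the quickstart are expected to return, here is a plain-Python reference model over the same word list. It is only an illustration of the query semantics, not how the library stores or searches data, and the function names are hypothetical. The sorted iteration order is an assumption made for the model.

```python
words = ['apple', 'apricot', 'banana', 'bandana', 'cat']
keys = set(words)


def contains(key):
    """Membership test, the counterpart of `key in reader`."""
    return key in keys


def prefixes(key):
    """Stored keys that are prefixes of `key`, shortest first."""
    return [key[:i] for i in range(1, len(key) + 1) if key[:i] in keys]


def iterkeys(prefix=""):
    """Stored keys starting with `prefix` (sorted order is an assumption)."""
    return (k for k in sorted(keys) if k.startswith(prefix))


print(contains('apple'), contains('orange'))  # True False
print(prefixes('bandana'))                    # ['bandana']
print(list(iterkeys('ap')))                   # ['apple', 'apricot']
```

Note that `prefixes('bandana')` returns only `'bandana'` itself, because no shorter stored word (such as `'ban'`) exists in this list.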