DAWG-Python

0.7.2 · active · verified Thu Apr 16

Pure-Python reader for DAWGs (Directed Acyclic Word Graphs, also known as Deterministic Acyclic Finite State Automata). It loads and queries existing DAWG files created by the dawgdic C++ library or the DAWG Python C extension. It is read-only: it does not build or save DAWGs itself. The current version is 0.7.2, and releases are infrequent.
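To make the data structure concrete, here is a minimal standard-library sketch of the idea behind a DAWG: a trie in which nodes that accept the same set of suffixes are merged into one state. It is a toy illustration only. The helper names (`build_trie`, `count_trie_nodes`, `minimize`) are hypothetical, and it does not reflect the binary format or the algorithm used by dawgdic or DAWG-Python.

```python
def build_trie(words):
    """Build a plain trie; each node is a dict with a final flag and edges."""
    root = {"final": False, "edges": {}}
    for word in words:
        node = root
        for ch in word:
            node = node["edges"].setdefault(ch, {"final": False, "edges": {}})
        node["final"] = True
    return root


def count_trie_nodes(node):
    """Count every node in the trie, including the root."""
    return 1 + sum(count_trie_nodes(child) for child in node["edges"].values())


def minimize(node, registry):
    """Return a state id for `node`, reusing the id of any equivalent node.

    Two nodes are equivalent when they have the same final flag and the same
    labelled edges to equivalent children, i.e. they accept the same suffixes.
    """
    signature = (
        node["final"],
        tuple(sorted((ch, minimize(child, registry))
                     for ch, child in node["edges"].items())),
    )
    return registry.setdefault(signature, len(registry))


words = ["tap", "taps", "top", "tops"]
trie = build_trie(words)

registry = {}            # signature -> state id; one entry per merged state
minimize(trie, registry)

trie_nodes = count_trie_nodes(trie)
dawg_states = len(registry)
print(f"trie nodes: {trie_nodes}, merged states: {dawg_states}")
# prints "trie nodes: 8, merged states: 5"
```

The shared endings "p" and "ps" are stored once instead of twice, which is why DAWG files for large word lists tend to be compact.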

Common errors

Warnings

Install

Imports
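The distribution name and the import name differ. DAWG-Python is imported as `dawg_python`, while `dawg` is the module of the separate DAWG C extension. The following is a small sketch, using only the standard library, for checking which of the two is importable in the current environment; the helper name `available_dawg_modules` is hypothetical.

```python
import importlib.util


def available_dawg_modules(candidates=("dawg_python", "dawg")):
    """Return the candidate module names that can be imported here."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


found = available_dawg_modules()
print(f"Importable DAWG modules: {found}")
```

If only `dawg_python` is listed, files can be loaded and queried but not built in this environment.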

Quickstart

This quickstart creates a small DAWG file for illustration, then loads and queries it, and shows basic use of `IntDAWG` for keys with associated integer values. Loading and querying is the core use case. DAWG-Python itself is read-only, so the file has to be produced by another tool such as dawgdic or the DAWG C extension.

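import os

import dawg          # separate C extension (PyPI: DAWG), used only to build the sample files
import dawg_python   # DAWG-Python: pure-Python, read-only

# 1. Create a sample DAWG file (in a real scenario, this might come from dawgdic).
# DAWG-Python cannot build or save DAWGs, so the C extension does it here.
words_to_build = ['apple', 'apricot', 'banana', 'cat', 'dog']
# For large sets, words should be pre-sorted for performance.
dawg_file_path = 'sample_data.dawg'
dawg.CompletionDAWG(words_to_build).save(dawg_file_path)

# 2. Load the DAWG from a file (primary use case)
loaded_dawg = dawg_python.CompletionDAWG().load(dawg_file_path)

# 3. Query the loaded DAWG
print(f"Is 'apple' in DAWG? {'apple' in loaded_dawg}")
# keys(prefix) needs CompletionDAWG; the plain DAWG class has no keys().
print(f"Words starting with 'a': {loaded_dawg.keys('a')}")
# prefixes(key) lists the stored keys that are prefixes of the argument.
print(f"Stored prefixes of 'apricot': {loaded_dawg.prefixes('apricot')}")

# Clean up the temporary file
os.remove(dawg_file_path)

# Example with IntDAWG for words with integer payloads
int_data = [('hello', 10), ('world', 20)]
int_file_path = 'sample_ints.dawg'
dawg.IntDAWG(int_data).save(int_file_path)
int_dawg_obj = dawg_python.IntDAWG().load(int_file_path)
print(f"Value for 'world': {int_dawg_obj['world']}")
os.remove(int_file_path)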
