Pynini
Pynini is a Python library providing efficient Python bindings for the OpenFst C++ library, enabling the construction, manipulation, and compilation of finite-state transducers (FSTs) and finite-state acceptors (FSAs). It's widely used for tasks in natural language processing (NLP) such as text normalization, phonology, and speech recognition. The current version is 2.1.7, with active development and regular releases.
Common errors
-
TypeError: unsupported operand type(s) for +: 'pynini.Fst' and 'str'
cause Attempting to concatenate a `pynini.Fst` object directly with a Python string.
fix Convert the Python string into a finite-state acceptor using `pynini.accep()` before concatenation. Example: `my_fst + pynini.accep("literal_string")`.
-
pynini.exceptions.FstCompilerException: OpenFst compilation failed for ...
cause This error indicates that the underlying OpenFst library failed to compile the provided grammar or string. Common causes include malformed FST text syntax, invalid UTF-8 characters, or exceeding OpenFst's internal limits.
fix Carefully examine the input string, text file, or arguments passed to functions like `pynini.string_map()` or `pynini.string_file()`. Ensure correct OpenFst FST text format, proper character escaping, and valid UTF-8 encoding. Test with smaller, simpler inputs to isolate the issue.
-
FileNotFoundError: No such file or directory: '...' (OpenFst)
cause Typically occurs when `pynini.string_file()` is called with a path to a non-existent file, or the Python process lacks read permissions for the specified file.
fix Verify that the file path is absolutely correct and that the file exists at that location. Ensure the Python process has the necessary read permissions for the file. Use an absolute path or ensure the file is in the current working directory.
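Several of the failures above trace back to the file handed to `pynini.string_file()`. The following is a minimal pre-flight sketch in plain Python, not part of Pynini: `check_string_file` is a hypothetical helper name, and it assumes the usual tab-separated layout of one to three columns per line (input, optional output, optional weight). It reports problems before the file reaches the FST compiler.

```python
from pathlib import Path
from typing import List


def check_string_file(path: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper: it only inspects the file and never calls Pynini.
    """
    p = Path(path)
    if not p.is_file():
        return [f"{path}: file does not exist or is not a regular file"]
    try:
        raw = p.read_bytes()
    except OSError as err:  # e.g. missing read permission
        return [f"{path}: cannot read file ({err})"]
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return [f"{path}: invalid UTF-8 at byte {err.start}"]

    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        columns = line.split("\t")
        # Assumed layout: input [TAB output [TAB weight]]
        if len(columns) > 3:
            problems.append(
                f"{path}:{lineno}: expected at most 3 columns, found {len(columns)}"
            )
    return problems
```

If the returned list is empty, the same path can then be passed on to `pynini.string_file()`.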
Warnings
- gotcha Every Pynini FST is defined over a single weight semiring, exposed as its arc type: `standard` (tropical, the default), `log`, or `log64`. Binary operations such as composition and union require both operands to share an arc type; mixing arc types results in a runtime error.
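- gotcha Pynini FST objects are mutable, and the library distinguishes constructive functions from destructive methods. Module-level functions such as `pynini.compose()`, `pynini.union()`, `pynini.concat()`, and `pynini.closure()` return new FST objects, whereas the same-named methods on an `Fst` instance (e.g. `fst.union(other)`, `fst.concat(other)`, `fst.closure()`) modify that instance in place. Call `fst.copy()` first if the original is still needed.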
- gotcha Although `pip install pynini` usually works, Pynini is a compiled extension built on the C++ OpenFst library. On some Linux distributions or macOS versions, or when building with custom compilation flags, installation can fail with linker errors or missing symbols related to the bundled OpenFst.
- gotcha Working with very large grammars or performing complex FST operations can be memory and CPU intensive, potentially leading to slow execution or out-of-memory errors.
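To make the semiring warning above concrete, here is a small plain-Python sketch (it does not use Pynini) of how path weights combine in the tropical and log semirings. The function names are illustrative only. Both semirings extend a path (⊗) by adding weights, but they combine alternative paths (⊕) differently, which is why FSTs over different semirings cannot be mixed in one operation.

```python
import math


def tropical_plus(a: float, b: float) -> float:
    """Tropical ⊕: keep the better (smaller) of two path weights."""
    return min(a, b)


def log_plus(a: float, b: float) -> float:
    """Log ⊕: -log(e^-a + e^-b), computed in a numerically stable way."""
    if a == math.inf:  # +inf is the semiring zero (an impossible path)
        return b
    if b == math.inf:
        return a
    lo, hi = min(a, b), max(a, b)
    return lo - math.log1p(math.exp(lo - hi))


def times(a: float, b: float) -> float:
    """⊗ is ordinary addition of weights in both semirings."""
    return a + b


# Two alternative paths with weights 1.0 and 2.0:
# the tropical sum keeps only the best one, the log sum accounts for both.
print(tropical_plus(1.0, 2.0))
print(log_plus(1.0, 2.0))
```

The log sum is always at most the tropical sum, since adding probability mass from a second path can only lower the negative-log weight.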
Install
-
pip install pynini
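After installing, it can help to confirm which version the interpreter actually sees. This is a minimal sketch using only the standard library (Python 3.8 or later); `installed_version` is a hypothetical helper name, not a Pynini function.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pynini")
print(f"pynini version: {version}" if version else "pynini is not installed")
```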
Imports
- Fst
import pynini
- string_file
import pynini
- union
import pynini
- accep
import pynini
- compose
import pynini
Quickstart
import pynini as pn
# Create a simple acceptor for 'hello'
hello_fst = pn.accep("hello")
# Create an acceptor for 'world'
world_fst = pn.accep("world")
# Concatenate them with a space acceptor using the overloaded '+' operator
hello_world_fst = hello_fst + pn.accep(" ") + world_fst
print(f"Hello World FST: {hello_world_fst}")
print(f"Shortest path for Hello World: {pn.shortestpath(hello_world_fst).string()}")
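# Create a transducer mapping 'a' to 'b' and 'c' to 'd'
a_to_b_map = pn.string_map([("a", "b"), ("c", "d")])
# string_map only matches whole strings ("a" or "c"), so wrap it in a
# context-independent rewrite rule that applies it anywhere in the input
sigma_star = pn.union(*"abcdeklp_").closure().optimize()
rewrite_rule = pn.cdrewrite(a_to_b_map, "", "", sigma_star)
# Input string as an acceptor
input_string_fst = pn.accep("apple_cake")
# Compose the input with the rewrite rule
output_fst = pn.compose(input_string_fst, rewrite_rule)
# An empty composition has no start state; otherwise read the output side
# of the shortest path as a string
if output_fst.start() != pn.NO_STATE_ID:
    best_path = pn.shortestpath(output_fst).project("output").rmepsilon()
    print(f"'apple_cake' -> '{best_path.string()}'")
else:
    print("No valid output path found.")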