Morfessor

2.0.6 verified Mon Apr 27 auth: no python

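Morfessor is a tool for unsupervised and semi-supervised morphological segmentation of words, widely used in NLP and computational linguistics. The morfessor package on PyPI is Morfessor 2.0, which implements the Morfessor Baseline method; category-based variants such as Morfessor FlatCat are distributed as separate packages. The current version is 2.0.6, and releases are infrequent.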

pip install morfessor
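error ModuleNotFoundError: No module named 'morfessor.baseline'
cause The name morfessor is not resolving to the installed Morfessor 2.0 package, typically because the package is missing from the active environment or a local file named morfessor.py shadows it. In 2.0.x the morfessor.baseline submodule exists, and its main classes are also exported at the top level.
fix
Install the package into the active environment with pip install morfessor, rename any local morfessor.py, and import from the top level: from morfessor import BaselineModel.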
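To tell a missing install apart from a shadowing file, it can help to ask Python where the name resolves without importing the model code. The following is a minimal sketch using only the standard library; the helper name diagnose is hypothetical and is not part of Morfessor.

```python
import importlib.util


def diagnose(package="morfessor", submodule="baseline"):
    """Describe how `package.submodule` resolves in the current environment."""
    spec = importlib.util.find_spec(package)
    if spec is None:
        return f"{package} is not installed in this environment"
    if spec.submodule_search_locations is None:
        # A plain module (for example a stray morfessor.py) hides the package.
        return f"{package} resolves to a single file, not a package: {spec.origin}"
    try:
        sub = importlib.util.find_spec(f"{package}.{submodule}")
    except ImportError:
        sub = None
    if sub is None:
        return f"{package} found at {spec.origin}, but {package}.{submodule} is missing"
    return f"{package}.{submodule} found at {sub.origin}"


print(diagnose())
```

If the reported path points into your project directory rather than site-packages, a local file is shadowing the installed package.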
error MorfessorBaseline not defined
cause The package has no class named MorfessorBaseline. In Morfessor 2.0 the Baseline model class is BaselineModel.
fix
Use morfessor.BaselineModel instead of MorfessorBaseline.
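breaking Python version support changed across releases. This entry lists Python 3.6+ as required from version 2.0.0; confirm the supported versions against the PyPI metadata of the release you install.
fix Use a supported Python 3 interpreter. If legacy Python is needed, pin an older release that PyPI lists as compatible with it.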
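breaking Morfessor 2.0 is a rewrite with a new Python library API: the model class is `morfessor.BaselineModel`, and corpus and model file I/O goes through `morfessor.MorfessorIO`. Code that uses other class names, such as `MorfessorBaseline` or `Morfessor`, will break.
fix Replace those names with `BaselineModel` in imports and instantiation, and read corpora through `MorfessorIO`.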
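gotcha Pretrained models are not included. A `BaselineModel` must be trained, or loaded from a saved model file, before it can segment. `segment()` only looks up words seen in training and raises `KeyError` for anything else; use `viterbi_segment()` for new words.
fix Load training data with `model.load_data(...)`, then call `model.train_batch()` before segmenting. `load_data()` takes an iterable of (count, word) tuples, such as the output of `MorfessorIO.read_corpus_file('corpus.txt')`, not a file name.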
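`load_data()` consumes an iterable of (count, word) tuples rather than a file name. The sketch below builds that shape from in-memory text using only the standard library; the helper build_training_data and the sample text are hypothetical. The training step at the end is guarded, so it runs only when morfessor is installed, and it assumes the two-element tuple shape used by recent 2.0.x releases.

```python
from collections import Counter


def build_training_data(text):
    """Turn whitespace-separated text into sorted (count, word) tuples."""
    counts = Counter(text.lower().split())
    return [(count, word) for word, count in sorted(counts.items())]


data = build_training_data("happy unhappy happiness unhappiness happy")
print(data)

try:
    import morfessor
except ImportError:
    morfessor = None  # the data-shaping part above still works without it

if morfessor is not None:
    model = morfessor.BaselineModel()
    model.load_data(data)   # assumption: (count, word) tuples, as in recent 2.0.x
    model.train_batch()
    # viterbi_segment returns (segments, cost); the split depends on the data.
    print(model.viterbi_segment("unhappiness")[0])
```

A corpus this small will not produce meaningful morphs; it only illustrates the order of calls and the expected data shape.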

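Basic training and segmentation with the Morfessor Baseline model.

import morfessor

# Read a training corpus and train a Baseline model on it
io = morfessor.MorfessorIO()
train_data = list(io.read_corpus_file('corpus.txt'))
model = morfessor.BaselineModel()
model.load_data(train_data)
model.train_batch()

# viterbi_segment() handles unseen words and returns (segments, cost)
segments, cost = model.viterbi_segment('unhappiness')
print(segments)  # the split depends on the training corpus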