WordSegment

1.3.1 verified Mon Apr 27 auth: no python maintenance

English word segmentation library that segments concatenated text into separate words. Version 1.3.1 released October 2019; no active development since then. Designed for segmenting hashtags, URLs, or other concatenated phrases.

pip install wordsegment
error ImportError: No module named 'wordsegment'
cause The library is not installed in the Python environment that is running the code.
fix
Run 'pip install wordsegment' in the correct Python environment.
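
A quick way to confirm which environment is actually running, and whether it can see the package, is sketched below. It uses only the standard library; the helper name is_installed is made up for this example.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the running interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# Show which interpreter is running, then whether it can see the package.
print("interpreter:", sys.executable)
print("wordsegment importable:", is_installed("wordsegment"))
```

If the second line prints False, installing with that same interpreter ('python -m pip install wordsegment') avoids the usual mismatch between pip and the Python that runs the script.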
error TypeError: segment() missing 1 required positional argument: 'text'
cause segment() was called without its 'text' argument.
fix
Pass the text as a single string argument, e.g. segment('yourtext').
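gotcha The library scores candidate splits with word-frequency counts (unigrams and bigrams) from a general English web corpus, so proper nouns and domain-specific terms that are rare or absent in that data may be split incorrectly. Input is lowercased before segmentation, so the original casing is lost, and text containing digits can segment unexpectedly.
fix Check the output on representative inputs. Consider adding custom words to the model.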
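
To illustrate why a frequency-based segmenter splits unfamiliar terms, and why adding a custom word helps, here is a toy unigram-only sketch. It is not the library's implementation: the function names, the counts, and the corpus size are all invented for the example.

```python
import math


def score(word, counts, total):
    """Log10 probability of a word; unknown words are penalised by length."""
    if word in counts:
        return math.log10(counts[word] / total)
    return math.log10(10.0 / total) - len(word)


def toy_segment(text, counts, total, max_len=20):
    """Return the split of text with the highest summed unigram score."""
    # best[i] holds (score, words) for the best segmentation of text[:i].
    best = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            prev_score, prev_words = best[start]
            word = text[start:end]
            candidates.append(
                (prev_score + score(word, counts, total), prev_words + [word])
            )
        best.append(max(candidates, key=lambda c: c[0]))
    return best[-1][1]


TOTAL = 1_000_000  # assumed corpus size, for illustration only
counts = {"open": 3000, "face": 2000, "book": 2000}  # invented counts

before = toy_segment("openfacebook", counts, TOTAL)
counts["facebook"] = 1500  # hypothetical count for the custom term
after = toy_segment("openfacebook", counts, TOTAL)

print(before)  # ['open', 'face', 'book']
print(after)   # ['open', 'facebook']
```

With wordsegment itself, the analogous step would be adding entries to its unigram table after load(). The 1.x releases appear to expose that table as wordsegment.UNIGRAMS, but check the documentation of the installed version before relying on it.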
deprecated This library is no longer actively maintained. Python 2 support may have issues; Python 3 is recommended.
fix Consider using an alternative like 'wordninja' or a more modern NLP library.
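gotcha The 'segment' function is designed for a single run of concatenated text. In the 1.x releases it lowercases the input and drops every character other than letters and digits before segmenting, so spaces already present are discarded rather than kept as word boundaries, and the result may not respect the original spacing.
fix Pass only the concatenated run to segment, or split the input on whitespace first and segment each token separately.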
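
One way to keep the word boundaries an input already has is to split on whitespace and segment each token on its own. The sketch below assumes nothing about the library beyond "a callable that maps a string to a list of words"; segment_tokens and fake_segment are hypothetical names, and fake_segment is only a stand-in so the example runs without the package.

```python
def segment_tokens(text, segment_fn):
    """Segment each whitespace-delimited token on its own.

    segment_fn is any callable mapping a string to a list of words,
    e.g. wordsegment.segment after wordsegment.load() has been called.
    """
    words = []
    for token in text.split():
        words.extend(segment_fn(token))
    return words


def fake_segment(token):
    """Stand-in segmenter for demonstration: knows two strings only."""
    table = {"thisis": ["this", "is"], "atest": ["a", "test"]}
    return table.get(token.lower(), [token.lower()])


print(segment_tokens("thisis atest", fake_segment))  # ['this', 'is', 'a', 'test']
```

In real use, segment_fn would be the library's segment function, so each token is segmented independently and existing spaces are never merged away.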

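Basic segmentation example using the default model. Call load() once before segment() so the word-frequency data is read into memory.

from wordsegment import load, segment
load()  # required in 1.x before the first segment() call
print(segment('wheninthecourse'))  # Output: ['when', 'in', 'the', 'course']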