{"id":4854,"library":"wordninja","title":"Wordninja","description":"Wordninja is a Python library that probabilistically splits concatenated words using unigram frequencies from English Wikipedia. For example, it segments 'imateapot' into ['im', 'a', 'teapot']. The current version is 2.0.0, released in August 2019; the project prioritizes stability over new feature development.","status":"maintenance","version":"2.0.0","language":"en","source_language":"en","source_url":"https://github.com/keredson/wordninja","tags":["nlp","text-processing","word-segmentation","word-splitting"],"install":[{"cmd":"pip install wordninja","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"note":"`split` is a module-level function; the project's documentation calls it through the module as `wordninja.split(...)`.","wrong":"from wordninja import split","symbol":"split","correct":"import wordninja\nwordninja.split('text')"}],"quickstart":{"code":"import wordninja\n\n# Segment a concatenated string into a list of likely English words.\nsplit_words = wordninja.split('thisisateststring')\nprint(split_words)\n\n# A second, longer input is handled the same way.\nsplit_phrase = wordninja.split('hellofromtheotherside')\nprint(split_phrase)","lang":"python","description":"Imports the library and uses the `split` function to segment a concatenated string into a list of words."},"warnings":[{"fix":"To preserve punctuation, pre-process the text yourself, or consider the `wordninja-enhanced` fork (e.g., `pip install wordninja-enhanced`), which includes punctuation preservation and other features not present in the original library.","message":"The default `wordninja.split()` function discards punctuation from the input string; it does not appear in the output. For example, 'hello.world' becomes ['hello', 'world'].","severity":"gotcha","affected_versions":"2.0.0"},{"fix":"For critical applications, consider reviewing results for specific edge cases or exploring custom language models. 
The library supports custom language models supplied as gzipped text files with one word per line, in decreasing order of probability.","message":"The probabilistic model can produce over-aggressive or incorrect splits for some words and abbreviations, such as 'patreon' splitting into ['pat', 're', 'on'] or 'esg' into ['e', 's', 'g'].","severity":"gotcha","affected_versions":"2.0.0"},{"fix":"If working with non-English languages or specific domains, ensure your custom word list adheres strictly to the required format. The `wordninja-enhanced` fork offers out-of-the-box support for several additional languages.","message":"The library is primarily designed for English. Custom language models are supported, but each must be a gzipped text file with one word per line, in decreasing order of probability.","severity":"gotcha","affected_versions":"2.0.0"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}