{"id":5231,"library":"fugashi","title":"Fugashi: Fast Pythonic Japanese Tokenization","description":"Fugashi is a Cython wrapper for MeCab, providing fast and Pythonic Japanese tokenization and morphological analysis. It offers pre-built wheels for common platforms and simplifies dictionary installation, primarily recommending UniDic. The library is actively maintained and currently at version 1.5.2.","status":"active","version":"1.5.2","language":"en","source_language":"en","source_url":"https://github.com/polm/fugashi","tags":["nlp","japanese","tokenization","mecab","morphological analysis"],"install":[{"cmd":"pip install fugashi","lang":"bash","label":"Base installation"},{"cmd":"pip install 'fugashi[unidic-lite]'","lang":"bash","label":"Install with lightweight UniDic"},{"cmd":"pip install 'fugashi[unidic]' && python -m unidic download","lang":"bash","label":"Install with full UniDic (requires download)"}],"dependencies":[{"reason":"Core tokenizer engine. Wheels include binaries, but manual installation from source is needed on unsupported platforms.","package":"MeCab","optional":false},{"reason":"Recommended Japanese dictionary for morphological analysis. 
Required for most use cases, available as `unidic-lite` (lightweight) or `unidic` (full).","package":"UniDic","optional":false}],"imports":[{"symbol":"Tagger","correct":"from fugashi import Tagger"},{"note":"Use when working with custom or non-UniDic dictionaries.","symbol":"GenericTagger","correct":"from fugashi import GenericTagger"},{"note":"Helper for creating named tuple wrappers for custom dictionary features.","symbol":"create_feature_wrapper","correct":"from fugashi import create_feature_wrapper"}],"quickstart":{"code":"from fugashi import Tagger\n\n# Initialize Tagger with '-Owakati' for whitespace-separated output\ntagger = Tagger('-Owakati')\n\ntext = \"麩菓子は、麩を主材料とした日本の菓子。\"\n\n# Get whitespace-separated tokens\nwakati_output = tagger.parse(text)\nprint(f\"Wakati: {wakati_output}\")\n\n# Iterate through words to get detailed features (UniDic assumed by default)\nprint(\"\\nDetailed analysis:\")\nfor word in tagger(text):\n    print(f\"Surface: {word.surface}\\tLemma: {word.feature.lemma}\\tPOS: {word.pos}\")\n\n# Example with GenericTagger and custom features (if not using UniDic or need specific fields)\n# from fugashi import GenericTagger, create_feature_wrapper\n# CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')\n# custom_tagger = GenericTagger(wrapper=CustomFeatures)\n# print(\"\\nCustom Tagger example:\")\n# for word in custom_tagger.parseToNodeList(\"テスト\"): # Example, requires a configured custom dictionary\n#     print(f\"Surface: {word.surface}\\tAlpha: {word.feature.alpha}\")","lang":"python","description":"Initializes a `Tagger` for Japanese text, demonstrating basic wakati (whitespace-separated) tokenization and access to morphological features such as lemma and part of speech via UniDic. Calling `tagger(text)` returns a list of node objects for convenient iteration and attribute access."},"warnings":[{"fix":"Upgrade to Python 3.9+.","message":"Support for Python 3.6 and earlier versions was dropped in fugashi v1.2.0. 
Users on older Python versions must upgrade Python or pin fugashi to v1.1.2 or earlier.","severity":"breaking","affected_versions":">=1.2.0"},{"fix":"Install a dictionary via `pip install 'fugashi[unidic-lite]'` or `pip install 'fugashi[unidic]' && python -m unidic download`.","message":"Fugashi requires a MeCab dictionary to function. Forgetting to install one (e.g., `unidic-lite` or `unidic`) is a common error and causes `Tagger` initialization to fail.","severity":"gotcha","affected_versions":"All"},{"fix":"Install MeCab from source for your platform before running `pip install fugashi`. Consult the MeCab documentation for details.","message":"On platforms without pre-built wheels (e.g., musl-based Linux distributions such as Alpine, PowerPC systems, or 32-bit Windows), MeCab itself must be installed from source *before* installing fugashi.","severity":"gotcha","affected_versions":"All"},{"fix":"Iterate over the list returned by `tagger(text)` or `tagger.parseToNodeList(text)`, which works directly with list comprehensions and other Python idioms.","message":"Older MeCab wrappers (like `mecab-python3`) often return a linked-list structure from `parseToNode`. Fugashi instead returns a Python list of nodes, which is easier to work with.","severity":"deprecated","affected_versions":"Older MeCab wrappers (<fugashi v1.0)"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}