{"id":6937,"library":"unidic","title":"UniDic for Python","description":"UniDic is a dictionary for the MeCab morphological analyzer, designed for modern written Japanese. The `unidic` Python package provides this dictionary data so that it can be used with MeCab wrappers like `fugashi` or `mecab-python3`. The current version is 1.1.0; new releases generally incorporate updated UniDic data or minor quality-of-life improvements.","status":"active","version":"1.1.0","language":"en","source_language":"en","source_url":"https://github.com/polm/unidic-py","tags":["NLP","Japanese","dictionary","MeCab"],"install":[{"cmd":"pip install unidic","lang":"bash","label":"Install UniDic package"},{"cmd":"python -m unidic download","lang":"bash","label":"Download dictionary data (required after install)"}],"dependencies":[{"reason":"Required for performing morphological analysis using the UniDic data. unidic provides the dictionary, not the analyzer.","package":"mecab-python3","optional":true},{"reason":"Alternative MeCab wrapper for performing morphological analysis using the UniDic data. 
unidic provides the dictionary, not the analyzer.","package":"fugashi","optional":true}],"imports":[{"symbol":"unidic","correct":"import unidic"},{"note":"This constant provides the path to the installed UniDic dictionary, to be passed to a MeCab Tagger.","symbol":"DICDIR","correct":"import unidic\nunidic.DICDIR"}],"quickstart":{"code":"import os\nimport subprocess\nimport sys\n\nimport fugashi  # MeCab wrapper; mecab-python3 also works\nimport unidic\n\n# Download the dictionary only if it is missing; the download is large.\nif not os.path.exists(os.path.join(unidic.DICDIR, 'sys.dic')):\n    try:\n        # sys.executable makes the download target this interpreter's environment\n        subprocess.run([sys.executable, '-m', 'unidic', 'download'], check=True)\n        print(\"UniDic dictionary downloaded successfully.\")\n    except subprocess.CalledProcessError as e:\n        sys.exit(f\"Error downloading UniDic dictionary (exit code {e.returncode})\")\n\n# Initialize MeCab Tagger with UniDic\ntagger = fugashi.Tagger(f'-d \"{unidic.DICDIR}\"')\n\n# Analyze a Japanese sentence\nsentence = \"今日の天気は晴れです\"\nresult = tagger.parse(sentence)\nprint(f\"Sentence: {sentence}\")\nprint(f\"Analysis: \\n{result}\")\n\n# Access individual tokens (fugashi specific)\nwords = tagger.parseToNodeList(sentence)\nfor word in words:\n    if word.surface == '':  # Defensive: skip any empty-surface node\n        continue\n    # UniDic fields such as lemma and pos1 live on word.feature\n    print(f\"Surface: {word.surface}, Lemma: {word.feature.lemma}, POS: {word.feature.pos1}\")","lang":"python","description":"After installing `unidic` and a MeCab wrapper like `fugashi` (or `mecab-python3`), you must first download the actual dictionary data using `python -m unidic download`. 
Then, you can import `unidic` and pass `unidic.DICDIR` to your MeCab tagger to enable Japanese morphological analysis."},"warnings":[{"fix":"To keep the previous default (UniDic 2.3.0), pin the package to a release before 1.1.0 (for example `pip install 'unidic<1.1.0'`) and rerun `python -m unidic download`.","message":"Version 1.1.0 changed the 'latest' or default UniDic version from 2.3.0 to 3.1.0. If your application relied on the older default dictionary, its behavior may change.","severity":"breaking","affected_versions":">=1.1.0"},{"fix":"Always execute `python -m unidic download` after installation and before first use. Consider `unidic-lite` if disk space is a major concern, as it's a smaller version of UniDic.","message":"The `unidic` package itself is small, but after `pip install`, you MUST run `python -m unidic download` to fetch the actual dictionary data. This download can be large (around 770MB-1GB on disk). If this step is skipped, the package will not function.","severity":"gotcha","affected_versions":"All"},{"fix":"Install either `fugashi` (`pip install fugashi`) or `mecab-python3` (`pip install mecab-python3`) in addition to `unidic`.","message":"UniDic is a dictionary, not an analyzer. It requires a separate MeCab wrapper like `fugashi` or `mecab-python3` to perform morphological analysis. Without one of these, `unidic` alone cannot process text.","severity":"gotcha","affected_versions":"All"},{"fix":"Only install `unidic` if Japanese NLP is a requirement for your project. Consider making it an optional dependency if your library supports multiple languages.","message":"The `unidic` dictionary is specifically for Japanese language processing. There is no reason to install it if your application does not analyze Japanese text.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}