{"id":5505,"library":"sudachidict-full","title":"Sudachi Dictionary (Full Edition)","description":"sudachidict-full is a data package that provides the 'full' edition, the largest of the three Sudachi dictionaries (small, core, full), for use with SudachiPy, a Japanese morphological analyzer. It exposes no tokenization API of its own; SudachiPy loads its dictionary file at runtime. The current version is 20260116, and new versions are released regularly (typically every 2-3 months) to update dictionary entries and synonyms.","status":"active","version":"20260116","language":"en","source_language":"en","source_url":"https://github.com/WorksApplications/SudachiDict","tags":["nlp","japanese","dictionary","morphological-analysis"],"install":[{"cmd":"pip install sudachidict-full sudachipy","lang":"bash","label":"Install with SudachiPy"}],"dependencies":[{"reason":"This package provides dictionary data for SudachiPy. SudachiPy is required to perform morphological analysis.","package":"sudachipy","optional":false}],"imports":[{"note":"sudachidict-full ships only dictionary data and is not normally imported directly. Load it through SudachiPy by passing dict='full' when constructing the Dictionary (older SudachiPy releases use dict_type='full').","symbol":"Dictionary","correct":"from sudachipy.dictionary import Dictionary"}],"quickstart":{"code":"from sudachipy import tokenizer\nfrom sudachipy import dictionary\n\n# sudachidict-full must be installed for the 'full' dictionary to load.\n# SudachiPy defaults to the 'core' dictionary (sudachidict-core), so request\n# 'full' explicitly with dict='full'. Older SudachiPy releases used the\n# dict_type='full' keyword instead.\n\n# Create a Sudachi tokenizer instance backed by the 'full' dictionary\ntokenizer_obj = dictionary.Dictionary(dict='full').create()\nmode = tokenizer.Tokenizer.SplitMode.C\n\ntext = \"寿司は美味しい。\"\n\n# Tokenize the text\nprint(f\"Original text: {text}\")\n\nmorphemes = tokenizer_obj.tokenize(text, mode)\n\nprint(\"\\nTokenization 
results (Surface form, Part-of-Speech, Dictionary form):\")\nfor m in morphemes:\n    print(f\"  {m.surface()}\\t{m.part_of_speech()}\\t{m.dictionary_form()}\")\n\n# Example of locating the dictionary file (for advanced configuration).\n# Assumes the package keeps its data at resources/system.dic.\n# import os, sudachidict_full\n# dict_path = os.path.join(os.path.dirname(sudachidict_full.__file__), 'resources', 'system.dic')\n# print(f\"\\nPath to the 'full' dictionary: {dict_path}\")\n","lang":"python","description":"This quickstart shows how to use `sudachipy` with the `sudachidict-full` dictionary. `sudachipy` loads the `core` dictionary by default, so the `full` dictionary must be requested explicitly when creating a `Dictionary` instance. The example tokenizes a simple Japanese sentence and prints each morpheme's surface form, part of speech, and dictionary form."},"warnings":[{"fix":"Ensure `sudachipy` is installed: `pip install sudachipy`.","message":"sudachidict-full is a data package providing dictionary files, not a standalone library for performing morphological analysis. You must install `sudachipy` separately to utilize the dictionary data.","severity":"gotcha","affected_versions":"All versions of sudachidict-full"},{"fix":"Review your application's reliance on Sudachi's normalization features and adapt post-processing or logic as needed. Test with affected inputs to understand the new behavior.","message":"Starting from version 20251022, Sudachi's internal dictionary normalization has been partly discontinued and replaced with a synonym dictionary. 
This change may lead to different tokenization results or altered behavior for applications relying on the previous normalization process.","severity":"breaking","affected_versions":">=20251022"},{"fix":"Pass `dict='full'` when constructing the dictionary, e.g., `dictionary.Dictionary(dict='full').create()`. Older SudachiPy releases use the `dict_type='full'` keyword instead.","message":"Installing `sudachidict-full` does not make it the default dictionary. `sudachipy`'s `dictionary.Dictionary().create()` loads the `core` dictionary (`sudachidict-core`) unless told otherwise, regardless of which other editions (`sudachidict-small`, `sudachidict-full`) are installed, and it raises an error if `sudachidict-core` is missing. To use the 'full' dictionary, request it explicitly when creating the `Dictionary` instance.","severity":"gotcha","affected_versions":"All versions of sudachidict-full"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}