{"id":3285,"library":"sudachidict-core","title":"Sudachi Dictionary Core Edition","description":"SudachiDict-core is the default dictionary for SudachiPy, a Python-based Japanese morphological analyzer. It provides a comprehensive basic vocabulary for tokenization and linguistic analysis. The dictionary packages are updated frequently, often multiple times a quarter, incorporating new words and improving synonym definitions. The current version is 20260116.","status":"active","version":"20260116","language":"en","source_language":"en","source_url":"https://github.com/WorksApplications/SudachiDict","tags":["natural-language-processing","nlp","japanese","morphological-analysis","dictionary","tokenization"],"install":[{"cmd":"pip install sudachipy sudachidict-core","lang":"bash","label":"Install SudachiPy and Core Dictionary"}],"dependencies":[{"reason":"This package provides dictionary data for use with SudachiPy, a Japanese morphological analyzer. While technically a standalone data package, it is primarily consumed by SudachiPy.","package":"sudachipy","optional":true}],"imports":[{"note":"sudachidict-core installs dictionary files that SudachiPy automatically discovers or can be explicitly configured to use. You do not import symbols from 'sudachidict-core' itself.","symbol":"sudachidict-core","correct":"This is a data-only package. 
It provides dictionary resources for SudachiPy and is not directly imported into Python code."}],"quickstart":{"code":"from sudachipy import Dictionary, SplitMode\n\n# Load the core dictionary from the installed sudachidict-core package.\n# Core is the default edition; dict_type='core' makes the choice explicit.\ndict_obj = Dictionary(dict_type='core')\ntokenizer = dict_obj.create()\n\ntext = \"外国人参政権\"\n\n# Mode A splits text into the shortest units.\nmode = SplitMode.A\n\n# Passing the mode to tokenize() follows the example in SudachiPy's README.\n# SudachiPy v0.6.0+ also accepts a default mode at creation time:\n#   tokenizer = dict_obj.create(mode=SplitMode.A)\nmorphemes = tokenizer.tokenize(text, mode)\n\nprint(f\"Original text: {text}\")\nprint(f\"Tokens (Mode A): {[m.surface() for m in morphemes]}\")\n\n# Inspect the attributes of the first morpheme\nif morphemes:\n    first_morpheme = morphemes[0]\n    print(f\"\\nFirst morpheme: {first_morpheme.surface()}\")\n    print(f\"  Part-of-speech: {first_morpheme.part_of_speech()}\")\n    print(f\"  Normalized form: {first_morpheme.normalized_form()}\")\n    print(f\"  Dictionary form: {first_morpheme.dictionary_form()}\")\n    print(f\"  Reading form: {first_morpheme.reading_form()}\")\n","lang":"python","description":"This quickstart demonstrates how to use the 'sudachidict-core' dictionary through the 'SudachiPy' library to perform Japanese morphological analysis. 
It initializes the tokenizer with the core dictionary and then tokenizes an example Japanese compound noun."},"warnings":[{"fix":"Pin the 'sudachidict-core' version in your project dependencies (e.g., `sudachidict-core==20260116`) to ensure consistent behavior. Review the release notes before upgrading.","message":"Dictionary releases are versioned by date (e.g., '20260116') rather than by semantic versioning. Any release can change tokenization, part-of-speech tags, and normalization behavior, particularly through additions or modifications in 'synonyms.txt'.","severity":"breaking","affected_versions":"All versions (behavioral changes between date-based releases)"},{"fix":"Install `sudachidict-core` via `pip`, then use `sudachipy.Dictionary(dict_type='core').create()` to load and utilize the dictionary through `SudachiPy`.","message":"This package is a dictionary resource, not a Python library with classes or functions to import. Its role is to supply data to the 'SudachiPy' morphological analyzer. Importing `sudachidict_core` directly gives you no tokenization API; all analysis goes through `sudachipy`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Choose the dictionary edition (`sudachidict-small`, `sudachidict-core`, or `sudachidict-full`) that matches your application's requirements. 'Core' is a good general-purpose choice, while 'full' includes more proper nouns. Install the specific dictionary package and ensure SudachiPy is configured to use it.","message":"Sudachi offers three dictionary editions: 'small', 'core' (default), and 'full', each covering a different scope of vocabulary. Using 'core' when 'full' is needed for specific proper nouns (or vice versa) leads to suboptimal tokenization results.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure that the environment where `pip install sudachidict-core` is run has an internet connection. 
In restricted environments, you may need to pre-download the dictionary files or configure a local package mirror.","message":"Depending on the release and the install path, the dictionary file (e.g., `system.dic`) may not be bundled in the archive that `pip` downloads: a source distribution can fetch it from a remote server during `pip install`, whereas a prebuilt wheel carries the file itself. Either way, installation needs network access or a local mirror.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For modern `SudachiPy` (v0.5.2+), installing `sudachidict-core` (or another edition) makes it discoverable by default. You can explicitly specify `dict_type='core'` when creating a `Dictionary` object if needed.","message":"For SudachiPy versions prior to v0.5.2, a separate `sudachipy link` command was often required to make the dictionary available. This command is no longer available in SudachiPy v0.5.2 and later.","severity":"deprecated","affected_versions":"SudachiPy < 0.5.2"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}