Sudachi Dictionary (Full Edition)

20260116 · active · verified Mon Apr 13

sudachidict-full is a data package that provides the largest ('full') edition of the Sudachi dictionary for SudachiPy, a Japanese morphological analyzer. It exposes no tokenization API of its own; SudachiPy loads it as dictionary data when the full edition is requested. The current version is 20260116, and new versions are released regularly (typically every 2-3 months) to update dictionary entries and synonyms.
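The version number above reads as a release date in YYYYMMDD form. As a minimal sketch under that assumption, the following uses only the standard library to report which release is installed and to turn it into a date. The helper names `release_date` and `installed_release` are made up for this illustration, and a version string with any suffix beyond the eight digits would need extra handling.

```python
from datetime import date, datetime
from importlib import metadata


def release_date(version: str) -> date:
    """Parse a date-stamped version such as '20260116' (assumed YYYYMMDD)."""
    return datetime.strptime(version, "%Y%m%d").date()


def installed_release(dist_name: str = "sudachidict-full"):
    """Return the installed version string of the distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_release()
    if version is None:
        print("sudachidict-full is not installed")
    else:
        print(f"installed release: {version}")
```

Comparing `release_date(installed_release())` with today's date is one way to notice that the installed dictionary is several releases behind.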

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to use `sudachipy` with the `sudachidict-full` dictionary. Installing `sudachidict-full` alongside `sudachipy` does not by itself change which dictionary is loaded: a `Dictionary` created with no arguments uses the `core` edition unless the SudachiPy configuration says otherwise, so the example requests the full edition explicitly. It then tokenizes a simple Japanese sentence.

from sudachipy import tokenizer
from sudachipy import dictionary

# sudachidict-full must be installed for dict="full" to resolve.
# Without the dict argument, SudachiPy falls back to the 'core' edition
# (which requires sudachidict-core to be installed).
# Older SudachiPy releases named this argument dict_type.

# Create a Sudachi tokenizer instance backed by the 'full' dictionary
tokenizer_obj = dictionary.Dictionary(dict="full").create()
mode = tokenizer.Tokenizer.SplitMode.C

text = "寿司は美味しい。"

# Tokenize the text
print(f"Original text: {text}")

morphemes = tokenizer_obj.tokenize(text, mode)

print("\nTokenization results (Surface form, Part-of-Speech, Dictionary form):")
for m in morphemes:
    # SudachiPy morphemes expose dictionary_form(); there is no base_form() method.
    print(f"  {m.surface()}\t{m.part_of_speech()}\t{m.dictionary_form()}")

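# Example of locating the installed dictionary file (for advanced configuration).
# Assumption: the package ships its data as resources/system.dic inside the
# sudachidict_full package directory; check your installed copy to confirm.
# import importlib.util
# from pathlib import Path
# spec = importlib.util.find_spec("sudachidict_full")
# dict_path = Path(spec.origin).parent / "resources" / "system.dic"
# print(f"\nPath to the 'full' dictionary: {dict_path}")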
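Because the dictionary edition is chosen by name, it can help to check which editions are actually installed before building a tokenizer. The sketch below does this with the standard library only, probing for the import names `sudachidict_small`, `sudachidict_core`, and `sudachidict_full`. The helper names `installed_editions` and `largest_edition` are hypothetical and not part of SudachiPy.

```python
import importlib.util

# The three SudachiDict editions, ordered from smallest to largest.
EDITIONS = ("small", "core", "full")


def installed_editions(find_spec=importlib.util.find_spec):
    """Return the editions whose sudachidict_<edition> package is importable."""
    return [e for e in EDITIONS if find_spec(f"sudachidict_{e}") is not None]


def largest_edition(available):
    """Pick the largest edition among the given names, or None if there is none."""
    names = set(available)
    ranked = [e for e in EDITIONS if e in names]
    return ranked[-1] if ranked else None


if __name__ == "__main__":
    found = installed_editions()
    print("installed editions:", found)
    print("largest available:", largest_edition(found))
```

The value returned by `largest_edition` could then be passed as the `dict` argument when creating the `Dictionary`, with a clear error raised when it is `None`.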
