UniDic-lite

1.0.8 · maintenance · verified Wed Apr 15

unidic-lite is a slimmed-down version of UniDic, a dictionary for Japanese morphological analysis, packaged for Python. Unlike the larger `unidic` package, it installs directly via pip with no additional download step. It bundles UniDic 2.1.2 (from 2013) and occupies approximately 250MB of disk space after installation. The current version is 1.0.8, released in January 2021; releases are infrequent because the package primarily serves as a static dictionary resource.
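To check the disk footprint on a given machine, a minimal sketch like the following can sum the file sizes under the dictionary directory. `directory_size_mb` is a hypothetical helper written for this example, not part of unidic-lite, and it measures only the directory that `unidic_lite.DICDIR` points to, so the figure may differ somewhat from the total installed size.

```python
import os


def directory_size_mb(path):
    """Sum the sizes of all regular files under path, in megabytes (10**6 bytes).

    Hypothetical helper for illustration; not part of unidic-lite.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total / 1_000_000


try:
    import unidic_lite
except ImportError:
    print("unidic-lite is not installed")
else:
    size = directory_size_mb(unidic_lite.DICDIR)
    print(f"{size:.0f} MB in {unidic_lite.DICDIR}")
```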

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to use `unidic-lite` with the `fugashi` library, a common MeCab wrapper. `unidic_lite.DICDIR` provides the path to the installed dictionary, which is passed to the `Tagger` at initialization through MeCab's `-d` option.

```python
import unidic_lite
from fugashi import Tagger

# Point the Tagger at the unidic-lite dictionary via MeCab's -d option
tagger = Tagger(f'-d "{unidic_lite.DICDIR}"')

text = "すもももももももものうち"

# Analyze the text: one row per token (surface form, part of speech, lemma)
words = []
for word in tagger(text):
    words.append(f'{word.surface}\t{word.feature.pos1}\t{word.feature.lemma}')

print('\n'.join(words))
```
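A slightly more defensive variant of the setup step, as a sketch: it checks that the dictionary package is importable before building the `-d` argument, so a missing install produces a readable message rather than an import error. `dictionary_args` is a hypothetical helper name introduced here; it is not provided by unidic-lite or fugashi.

```python
import importlib


def dictionary_args(module_name="unidic_lite"):
    """Build a MeCab '-d' argument string for an installed dictionary package.

    Hypothetical helper for illustration. Returns None when the package
    cannot be imported or does not expose a DICDIR attribute.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    dicdir = getattr(module, "DICDIR", None)
    if dicdir is None:
        return None
    # Same quoting as in the quickstart: wrap the path in double quotes.
    return f'-d "{dicdir}"'


args = dictionary_args()
if args is None:
    print("unidic-lite is not installed; try: pip install unidic-lite")
else:
    print(args)
```

The resulting string can then be handed to `Tagger(args)` in place of the inline f-string used in the quickstart.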
