Janome: Japanese Morphological Analyzer

0.5.0 · active · verified Thu Apr 16

Janome is a Japanese morphological analysis engine (also described as a tokenizer or part-of-speech tagger) written in pure Python, with a built-in dictionary and language model. It aims to be easy to install and provides concise, well-designed APIs for various Python applications. Its built-in dictionary is mecab-ipadic-2.7.0-20070801. The current version is 0.5.0, released in July 2023; releases have typically come about 6 to 18 months apart.

Install

pip install Janome

Imports

from janome.tokenizer import Tokenizer

Quickstart

Initializes a Tokenizer, analyzes a Japanese sentence, and prints each token with its morphological information. The example also shows 'wakati-gaki' (word segmentation) mode, which yields only surface forms.

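from janome.tokenizer import Tokenizer

t = Tokenizer()
text = 'すもももももももものうち'

# tokenize() returns a generator of Token objects; printing a Token
# shows its surface form followed by its morphological features
for token in t.tokenize(text):
    print(token)

# 'wakati-gaki' mode (surface forms only): the result is also a generator,
# so wrap it in list() before printing
tokens_wakati = list(t.tokenize(text, wakati=True))
print(tokens_wakati)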
