natto-py (MeCab binding)

1.0.1 · active · verified Sun Apr 12

natto-py is a Python package that binds to MeCab, the part-of-speech and morphological analyzer for Japanese, through a foreign function interface (FFI). Because it does not depend on SWIG, installing it requires no C compiler; MeCab itself and a dictionary must already be present on the system. The current version is 1.0.1. The library is actively maintained, with releases on an irregular schedule.

Warnings

Install

pip install natto-py

Imports

from natto import MeCab

Quickstart

This quickstart creates a `MeCab` instance with `natto-py` and parses a sentence two ways: as a single output string and node by node. The `with` statement ensures the underlying MeCab resources are released when the block exits, and the `-F` and `-U` options format each node as its surface followed by nine feature fields.

import os
from natto import MeCab

# Optional: Set MeCab path and charset if auto-detection fails
# os.environ['MECAB_PATH'] = os.environ.get('MECAB_PATH', '/usr/local/lib/libmecab.so')
# os.environ['MECAB_CHARSET'] = os.environ.get('MECAB_CHARSET', 'utf8')

# Instantiate MeCab with output-format options for detailed parsing
# -F (--node-format): surface (%m) followed by feature fields %f[0]..%f[8]
# -U (--unk-format): placeholder fields for words not found in the dictionary
with MeCab(r'-F%m,%f[0],%f[1],%f[2],%f[3],%f[4],%f[5],%f[6],%f[7],%f[8]\n -U?,?,?,?,?,?,?,?,?,?\n') as nm:
    text = 'これは日本語のテキストです。'
    print(f"Parsed text (string output):\n{nm.parse(text)}\n")

    print("Parsed text (node output with features):")
    for n in nm.parse(text, as_nodes=True):
        if not n.is_eos():  # Skip the end-of-sentence node
            print(f'Surface: {n.surface}, Feature: {n.feature}, Cost: {n.cost}')
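
The comma-separated format used above is easy to mistype and awkward to consume downstream, so the following is a minimal sketch of two helpers: one that builds the `-F`/`-U` option string, and one that splits a formatted line into named fields. Both `build_format_options` and `parse_feature_line` are hypothetical names, not part of natto-py, and the field names assume the IPADIC feature layout; other dictionaries order their features differently.

```python
from typing import Dict

# Assumption: feature order follows IPADIC. These names are illustrative only.
IPADIC_FIELDS = (
    "pos", "pos_detail1", "pos_detail2", "pos_detail3",
    "conj_type", "conj_form", "base_form", "reading", "pronunciation",
)


def build_format_options(num_features: int = len(IPADIC_FIELDS)) -> str:
    """Build a -F/-U option string: surface plus num_features feature fields."""
    node_fields = ["%m"] + [f"%f[{i}]" for i in range(num_features)]
    unk_fields = ["?"] * (num_features + 1)
    # The raw r"\n" stays a literal backslash-n, which MeCab expands itself.
    return ("-F" + ",".join(node_fields) + r"\n"
            + " -U" + ",".join(unk_fields) + r"\n")


def parse_feature_line(line: str) -> Dict[str, str]:
    """Split one formatted node line into its surface and named features."""
    expected = len(IPADIC_FIELDS) + 1
    # Split from the right so a surface that is itself a comma stays intact.
    parts = line.rstrip("\n").rsplit(",", len(IPADIC_FIELDS))
    if len(parts) != expected:
        raise ValueError(f"expected {expected} fields, got {len(parts)}: {line!r}")
    record = {"surface": parts[0]}
    record.update(zip(IPADIC_FIELDS, parts[1:]))
    return record


if __name__ == "__main__":
    print(build_format_options())
    # Sample line in the shape the format above would produce for an IPADIC entry.
    print(parse_feature_line("これ,名詞,代名詞,一般,*,*,*,これ,コレ,コレ\n"))
```

In the quickstart, the result of `build_format_options()` could be passed to `MeCab(...)` in place of the hand-written string, and each non-EOS node's `n.feature` could be passed to `parse_feature_line`, on the assumption that `n.feature` carries the formatted line when `-F` is set. Splitting from the right assumes the feature values themselves contain no commas.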
