MeCab Python 3 Wrapper

1.0.12 · active · verified Sun Apr 12

mecab-python3 is a Python wrapper for the MeCab morphological analyzer, which is used primarily for Japanese text. It exposes MeCab's tokenization and part-of-speech tagging. The library currently supports Python 3.8 and later; new releases mostly add compatibility with new Python versions or fix critical bugs.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates basic morphological analysis using MeCab, including word segmentation ('wakati-gaki') and detailed analysis. It highlights the necessity of a dictionary for operation and includes basic error handling for common setup issues.

import MeCab

# The library needs a dictionary to function. unidic-lite is recommended.
# Ensure it's installed: pip install unidic-lite

try:
    # Initialize Tagger for 'wakati-gaki' (word segmentation)
    wakati = MeCab.Tagger('-Owakati')
    text_wakati = wakati.parse('すもももももももものうち')
    print(f"Wakati-gaki: {text_wakati.strip().split()}")

    # Initialize Tagger for detailed analysis (default)
    # With no options, mecab-python3 uses an installed unidic or unidic-lite dictionary if it finds one,
    # and otherwise falls back to the system MeCab configuration (mecabrc).
    tagger = MeCab.Tagger('')
    text_full = tagger.parse('これは日本語の形態素解析のテストです。')
    print(f"\nDetailed Analysis:\n{text_full}")

except RuntimeError as e:
    # A missing or misconfigured dictionary surfaces here, when the Tagger is created.
    print(f"MeCab initialization failed: {e}")
    print("Please ensure a dictionary is installed and correctly configured, e.g. 'pip install unidic-lite'.")
    print("On Windows, you may also need the Microsoft Visual C++ Redistributable.")
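
The detailed analysis above comes back as one string: one line per token, followed by an EOS line. A minimal sketch of turning that string into (surface, features) pairs follows. The helper name split_parse_output is hypothetical and not part of mecab-python3, and the sample text is hand-written to imitate the layout rather than real analyzer output. It assumes the surface form is separated from the remaining fields by the first tab, which may not hold for every dictionary or custom output format.

```python
from typing import List, Tuple


def split_parse_output(raw: str) -> List[Tuple[str, str]]:
    """Split Tagger.parse() text into (surface, features) pairs.

    Hypothetical helper: assumes one token per line, with the surface
    form before the first tab and an 'EOS' line ending the sentence.
    """
    tokens = []
    for line in raw.splitlines():
        if line == "EOS":  # MeCab's end-of-sentence marker
            break
        if not line:  # skip blank lines
            continue
        surface, _, features = line.partition("\t")
        tokens.append((surface, features))
    return tokens


# Hand-written sample in the line-per-token layout; NOT real analyzer output.
sample = "これ\t代名詞\nは\t助詞\nテスト\t名詞\nEOS\n"
tokens = split_parse_output(sample)
for surface, features in tokens:
    print(surface, features)
```

In real use, the string returned by tagger.parse(...) would be passed in place of sample. The features part is kept as a single string because its field layout depends on the dictionary in use.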
