MeCab Python 3 Wrapper
mecab-python3 is a Python wrapper for the MeCab morphological analyzer, which is primarily used for Japanese text. It exposes MeCab's tokenization and part-of-speech tagging. The library currently supports Python 3.8 and later; new releases are usually driven by support for new Python versions and by critical bug fixes.
Warnings
- breaking Python 2 support was officially dropped in version 1.0.3. Users on Python 2 must use an older version.
- breaking The bundled IPAdic dictionary was removed in version 1.0.0 (first released as 1.0.0a1). MeCab now relies on a system-installed dictionary or one installed as a Python package, such as `unidic-lite`.
- gotcha MeCab requires a dictionary to perform analysis. While `mecab-python3` installs the Python wrapper, a dictionary like `unidic-lite` or `unidic` must be installed separately (e.g., `pip install unidic-lite`) or by using the `mecab-python3[unidic-lite]` extra. Without a dictionary, `MeCab.Tagger()` will fail with a `RuntimeError`.
- gotcha On Windows, a Microsoft Visual C++ Redistributable is required for `mecab-python3` wheels to function correctly. This is a common oversight leading to runtime errors.
- gotcha Since version 1.0.4, errors during MeCab initialization (e.g., a missing dictionary) consistently raise a `RuntimeError` instead of printing messages directly to `stdout`. This is an improvement, but it can break scripts that parsed `stdout` to detect errors.
- gotcha The `MeCab.VERSION` attribute (if accessed) refers to the version of the underlying MeCab C++ library (typically 0.996), not the `mecab-python3` Python package version. It has not changed for many years and is not generally useful for tracking the Python wrapper's version.
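Because `MeCab.VERSION` reports the C++ library version, the version of the wrapper itself is better read from the installed package metadata. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names are assumed to be the PyPI names `mecab-python3`, `unidic-lite`, and `unidic`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(dists=("mecab-python3", "unidic-lite", "unidic")):
    """Return {distribution name: version string, or None if not installed}.

    Hypothetical helper: it reads pip metadata, so it reports the Python
    package versions rather than MeCab.VERSION (the C++ library version).
    """
    found = {}
    for name in dists:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
    try:
        import MeCab
        # Version of the underlying MeCab C++ library, not of the wrapper.
        print(f"MeCab C++ library: {MeCab.VERSION}")
    except ImportError:
        print("mecab-python3 is not importable")
```

Reading metadata avoids importing MeCab at all, so the check also works when the extension module itself fails to load.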
Install
- pip install mecab-python3
- pip install mecab-python3[unidic-lite]
- pip install unidic-lite
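To confirm that a dictionary package actually landed in the environment before constructing a `Tagger`, a quick import-level check may help. This is a minimal sketch with `importlib.util.find_spec`, assuming the import names `unidic` and `unidic_lite` for the `unidic` and `unidic-lite` packages. The helper `find_pip_dictionary` is hypothetical, and checking the full `unidic` before `unidic-lite` is an assumed preference order. Note that the `unidic` package also needs its data downloaded (`python -m unidic download`), which this check does not detect.

```python
import importlib.util


def find_pip_dictionary(candidates=("unidic", "unidic_lite")):
    """Return the import name of the first findable dictionary package, or None.

    Hypothetical helper. find_spec only checks that the module can be located;
    it does not import it or verify that the dictionary files exist.
    """
    for module_name in candidates:
        if importlib.util.find_spec(module_name) is not None:
            return module_name
    return None


if __name__ == "__main__":
    dic = find_pip_dictionary()
    if dic is None:
        print("No pip-installed dictionary found; try: pip install unidic-lite")
    else:
        print(f"Found dictionary package: {dic}")
```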
Imports
- MeCab
import MeCab
Quickstart
try:
    import MeCab
except ImportError as e:
    # mecab-python3 is not installed, or (on Windows) the Visual C++
    # runtime that the wheel depends on could not be loaded.
    raise SystemExit(
        f"Could not import MeCab: {e}\n"
        "Install mecab-python3; on Windows, also install the Microsoft Visual C++ Redistributable."
    )

# MeCab needs a dictionary to function; unidic-lite is recommended.
# Install it with: pip install unidic-lite
try:
    # Tagger for wakati-gaki (space-separated word segmentation)
    wakati = MeCab.Tagger('-Owakati')
    text_wakati = wakati.parse('すもももももももものうち')
    print(f"Wakati-gaki: {text_wakati.split()}")

    # Tagger with default options for full morphological analysis.
    # With no arguments, mecab-python3 uses an installed unidic or
    # unidic-lite package if it finds one, otherwise the system mecabrc.
    tagger = MeCab.Tagger()
    text_full = tagger.parse('これは日本語の形態素解析のテストです。')
    print(f"\nDetailed analysis:\n{text_full}")
except RuntimeError as e:
    # Raised when MeCab cannot initialize, e.g. when no dictionary is found.
    print(f"MeCab initialization failed: {e}")
    print("Please ensure a dictionary (e.g., unidic-lite) is installed and correctly configured.")
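The string returned by `parse()` has one token per line, with the surface form first, and each sentence ends with an `EOS` line. The column layout of the features depends on the installed dictionary's output format, so this format-agnostic sketch splits each line only at the first tab. The helper name `split_tokens` and the sample string are illustrative; the sample is not real MeCab output.

```python
def split_tokens(parsed):
    """Split MeCab's default parse() output into (surface, features) pairs.

    Hypothetical helper: each line is split at the first tab only, so the
    feature text is kept as-is whatever column layout the dictionary uses.
    'EOS' lines (end of sentence) and blank lines are skipped.
    """
    tokens = []
    for line in parsed.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, features = line.partition("\t")
        tokens.append((surface, features))
    return tokens


# Illustrative input only (made-up feature text, not real MeCab output).
sample = "これ\tFEATURES-A\nは\tFEATURES-B\nEOS\n"
print(split_tokens(sample))  # [('これ', 'FEATURES-A'), ('は', 'FEATURES-B')]

if __name__ == "__main__":
    try:
        import MeCab
        tagger = MeCab.Tagger()
    except (ImportError, RuntimeError) as e:
        print(f"MeCab unavailable: {e}")
    else:
        for surface, features in split_tokens(tagger.parse("すもももももももものうち")):
            print(surface, "|", features)
```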