pykakasi
pykakasi is a Python natural language processing (NLP) library that transliterates Japanese text (hiragana, katakana, and kanji) into rōmaji (the Latin/Roman alphabet). It handles characters in NFC form, and its algorithms are based on kakasi, a library written in C. The current version is 2.3.0. There is no fixed release cadence; updates ship as needed.
Warnings
- deprecated The old API methods (`setMode()`, `getConverter()`, `do()`) and the `wakati` class were deprecated in pykakasi v2.1.0 and will be removed entirely in v3.0. Migrate to the `kakasi().convert()` method, available since v2.0.0, to stay compatible.
- gotcha pykakasi is distributed under the GNU General Public License v3.0 or later (GPLv3+). This license has implications for commercial use, potentially requiring source code disclosure if the library is incorporated into proprietary software.
- gotcha The library expects Unicode text in Normalization Form C (NFC). Text in Normalization Form D (NFD) stores voiced sound marks as separate combining characters (for example, が becomes か followed by U+3099), which can produce incorrect or unexpected conversion results.
- gotcha The original GitHub repository (miurahr/pykakasi) was archived in July 2022 and now points to Codeberg. Use the Codeberg repository for the official source and for contributions.
- gotcha At least one third-party analysis (a Snyk report cited in a blog post) rates the project's maintenance status as 'Inactive', even though the library remains widely used. Expect potentially slow responses to issues and little new feature development.
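Because pykakasi expects NFC input, a simple safeguard is to normalize text before converting it. The sketch below uses only the standard library's `unicodedata.normalize`; the helper name `to_nfc` is hypothetical and not part of pykakasi. It shows an NFD-encoded が (two code points) collapsing back into the single precomposed character.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Hypothetical helper: return text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# "が" in NFD form: base kana "か" (U+304B) followed by the
# combining voiced sound mark (U+3099).
decomposed = "\u304b\u3099"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # 2 1
print(composed == "\u304c")            # True: U+304C is the precomposed "が"
```

In practice you would pass the normalized string on, e.g. `kks.convert(to_nfc(text))`, so pykakasi never sees split diacritics.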
Install
pip install pykakasi
Imports
- kakasi
from pykakasi import kakasi
Quickstart
import pykakasi
kks = pykakasi.kakasi()
text = "かな漢字交じり文"
# Legacy API: deprecated since v2.1.0 and scheduled for removal in v3.0.
# Configure conversion modes (optional; defaults to Hepburn romaji, no spaces)
kks.setMode('H', 'a') # Hiragana to romaji
kks.setMode('K', 'a') # Katakana to romaji
kks.setMode('J', 'a') # Kanji to romaji
kks.setMode('r', 'Hepburn') # Use Hepburn Romanization
kks.setMode('s', True) # Add spaces
kks.setMode('C', True) # Capitalize
converter = kks.getConverter()
result_old_api = converter.do(text)
print(f"Old API result: {result_old_api}")
# Recommended new API (v2.0.0+)
result_new_api = pykakasi.kakasi().convert(text)
print("\nNew API result (default modes):")
for item in result_new_api:
    # Each item is a dict describing one converted segment
    print(f"Original: {item['orig']}, Kana: {item['kana']}, Hiragana: {item['hira']}, Romaji: {item['hepburn']}")
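The new `convert()` API returns per-segment dicts instead of a single string, so code migrating from `setMode('s', True)` and `setMode('C', True)` needs to assemble the string itself. The sketch below assumes only that each item has a `'hepburn'` key (as used above); `join_romaji` is a hypothetical helper, and its capitalization only approximates the legacy `'C'` option. A hand-written stand-in list replaces real `convert()` output so the example runs without pykakasi installed.

```python
from typing import Iterable, Mapping


def join_romaji(
    items: Iterable[Mapping[str, str]],
    key: str = "hepburn",
    sep: str = " ",
    capitalize: bool = True,
) -> str:
    """Hypothetical helper: join one romanized field of convert() items.

    Roughly emulates the legacy separator ('s') and capitalize ('C') modes.
    """
    words = []
    for item in items:
        word = item[key].strip()
        if not word:
            continue  # skip empty or whitespace-only segments
        # Upper-case only the first letter; leave the rest untouched
        words.append(word[:1].upper() + word[1:] if capitalize else word)
    return sep.join(words)


# Stand-in for the list returned by pykakasi.kakasi().convert(...)
sample = [
    {"orig": "かな", "hepburn": "kana"},
    {"orig": "漢字", "hepburn": "kanji"},
]
print(join_romaji(sample))                            # Kana Kanji
print(join_romaji(sample, sep="", capitalize=False))  # kanakanji
```

With pykakasi installed, the equivalent call would be `join_romaji(pykakasi.kakasi().convert(text))`; passing `key="kunrei"` or `key="passport"` selects another romanization, assuming those keys are present in your version's output.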