Ko Speech Tools

0.1.0 verified Fri May 01 auth: no python

Korean speech and NLP tools for tokenization, romanization, and pronunciation analysis. The current version is 0.1.0; the library is in early development and follows a monthly release cadence.

pip install ko-speech-tools
error ModuleNotFoundError: No module named 'ko_speech_tools'
cause The package is not installed in the active Python environment, or the hyphenated install name 'ko-speech-tools' was used where the underscore import name 'ko_speech_tools' is required.
fix
Run 'pip install ko-speech-tools' in the environment that runs your code, then import with underscores: 'import ko_speech_tools'.
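A quick way to confirm which case applies is to ask the running interpreter whether it can see the module at all. The sketch below uses only the standard library; the helper name `is_importable` is made up for illustration and is not part of ko-speech-tools.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


# pip must install into this exact interpreter for the import to succeed.
print("Interpreter:", sys.executable)
print("ko_speech_tools importable:", is_importable("ko_speech_tools"))
```

If this prints False, installing with `python -m pip install ko-speech-tools`, using the interpreter shown above, avoids installing into a different environment by accident.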
error AttributeError: module 'ko_speech_tools' has no attribute 'KoSpeechTokenizer'
cause Older version of the package (pre-0.1.0) did not export KoSpeechTokenizer at top level, or the import path was wrong.
fix
Upgrade to 0.1.0: 'pip install --upgrade ko-speech-tools' and use 'from ko_speech_tools import KoSpeechTokenizer'.
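Before upgrading, it can help to check which version is actually installed. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Pass the install (distribution) name, not the import name.
print("ko-speech-tools version:", installed_version("ko-speech-tools"))
```

A result of None means the package is not installed in this environment; any other value can be compared against the version you expect.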
error TypeError: romanize() missing 1 required positional argument: 'text'
cause romanize() was called without arguments, or the text was passed under a keyword other than 'text'.
fix
Call romanize(text) or romanize(text=your_string).
gotcha The library is in early alpha (0.1.0); expect breaking changes in minor versions. Pin to an exact version in production.
fix Use ko-speech-tools==0.1.0 in requirements.txt.
gotcha Romanization does not handle all ambiguous Korean syllables (e.g., homographs) correctly. Always verify output for sensitive applications.
fix Use an additional disambiguation step or a more mature romanizer for critical tasks.
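One simple form such an extra step could take is a table of hand-verified romanizations that is consulted before the romanizer. The sketch below is an assumption about how you might structure this, not a feature of the library: `romanize_with_overrides` and the override table are hypothetical, and the romanizer is passed in as a plain callable so any implementation can be plugged in.

```python
from typing import Callable, Mapping


def romanize_with_overrides(
    text: str,
    romanizer: Callable[[str], str],
    overrides: Mapping[str, str],
) -> str:
    """Romanize whitespace-separated words, preferring verified overrides."""
    words = text.split()
    # Use the hand-checked spelling when one exists; otherwise defer
    # to whatever romanizer callable was supplied.
    return " ".join(
        overrides[word] if word in overrides else romanizer(word)
        for word in words
    )
```

A word-level table like this only covers words that always need the same correction. It cannot separate two readings of one spelling by context, so true homographs still need manual review.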
deprecated The function 'tokenize_syllables' will be renamed to 'syllable_tokenize' in the upcoming v0.2.0.
fix Use 'KoSpeechTokenizer.syllable_tokenize()' instead of 'tokenize_syllables()'.
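Code that has to run on both sides of the rename can look the method up by name. This sketch assumes both names are methods on the tokenizer object and take the text as their only argument, which the notes above do not confirm; `syllable_tokenize_compat` is a hypothetical helper.

```python
def syllable_tokenize_compat(tokenizer, text):
    """Call whichever syllable-tokenizing method the tokenizer exposes."""
    # Prefer the new name, then fall back to the old one.
    for name in ("syllable_tokenize", "tokenize_syllables"):
        method = getattr(tokenizer, name, None)
        if callable(method):
            return method(text)
    raise AttributeError(
        "tokenizer exposes neither syllable_tokenize nor tokenize_syllables"
    )
```

Once every environment is on a release with the new name, the helper can be dropped in favor of calling the method directly.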

Basic usage: tokenize, romanize, and get pronunciation of Korean text.

from ko_speech_tools import KoSpeechTokenizer, romanize, pronounce

text = "안녕하세요"

# Split the text into tokens.
tokenizer = KoSpeechTokenizer()
tokens = tokenizer.tokenize(text)
print("Tokens:", tokens)

# Convert the Hangul text to Latin script.
romanized = romanize(text)
print("Romanized:", romanized)

# Get the pronunciation of the text.
pron = pronounce(text)
print("Pronunciation:", pron)