KoNLPy
KoNLPy (Korean Natural Language Processing in Python) is a Python package for Korean text analysis. It provides a consistent API over several popular Korean morphological analyzers. Most of them run on the JVM: Hannanum, Kkma, Komoran, and Okt (Open Korean Text). Mecab, which is backed by a C++ library, is also wrapped but is unsupported on Windows. The current version is 0.6.0. Releases are irregular, but the library is maintained to track its upstream NLP tools.
Common errors
-
OSError: 'JVM' is not running.
cause The Java Development Kit (JDK) is either not installed, not found in the system's PATH, or the `JAVA_HOME` environment variable is not correctly set. The underlying JPype library cannot locate an available Java Virtual Machine.
fix Install JDK 8 or higher (e.g., OpenJDK). Set the `JAVA_HOME` environment variable to the root directory of your JDK installation (e.g., `C:\Program Files\Java\jdk-11` or `/usr/lib/jvm/java-11-openjdk`). Add the `bin` subdirectory of your JDK (`%JAVA_HOME%\bin` or `$JAVA_HOME/bin`) to your system's PATH environment variable. Restart your terminal/IDE.
-
FileNotFoundError: 'mecab-ko-dic' not found.
cause The Mecab dictionary required by the `Mecab` tagger is missing or incorrectly configured. This often happens on systems where Mecab is not supported (such as Windows) or when its native dependencies were not met during installation.
fix On Windows, consider using `Okt`, `Komoran`, `Kkma`, or `Hannanum` instead of `Mecab`. On Linux/macOS, make sure both Mecab and its dictionary (`mecab-ko-dic`) are installed by following the instructions for your OS, which typically involves installing `python-mecab-ko`.
-
java.lang.OutOfMemoryError: Java heap space
cause The Java Virtual Machine (JVM) that KoNLPy uses has exhausted its allocated memory. This usually occurs when processing extremely long texts or a large volume of data.
fix Increase the JVM's maximum heap size by setting the `_JAVA_OPTIONS` environment variable before running your Python script. For example, to allocate 4 GB of memory, use `export _JAVA_OPTIONS="-Xmx4g"` on Linux/macOS or `set _JAVA_OPTIONS=-Xmx4g` in the Windows command prompt (without quotes, because `cmd.exe` would store them as part of the value).
-
AttributeError: module 'konlpy' has no attribute 'tag'
cause This error occurs when attempting to call a tagger directly from the top-level `konlpy` module (e.g., `konlpy.tag.Okt()`) without correctly importing the `tag` submodule or the specific tagger class.
fix Ensure you import the tagger class correctly. The standard way is `from konlpy.tag import Okt` and then `okt = Okt()`. Alternatively, you can import the submodule as `from konlpy import tag` and then use `tagger = tag.Okt()`.
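Before debugging a JVM startup failure, it can help to check the two settings the first entry above points at. The following is a minimal sketch using only the standard library. `diagnose_java_env` is a hypothetical helper, not part of KoNLPy, and it is only a rough pre-flight check: JPype may still locate a JVM by other means even when this reports a problem.

```python
import os
import shutil


def diagnose_java_env(env=None, which=shutil.which):
    """Return a list of human-readable problems with the Java setup.

    Hypothetical helper for illustration; not part of KoNLPy or JPype.
    """
    env = os.environ if env is None else env
    problems = []

    java_home = env.get("JAVA_HOME")
    if not java_home:
        problems.append("JAVA_HOME is not set")
    elif not os.path.isdir(java_home):
        problems.append(f"JAVA_HOME points to a missing directory: {java_home}")

    # shutil.which searches PATH for the `java` executable.
    if which("java") is None:
        problems.append("no `java` executable found on PATH")

    return problems


if __name__ == "__main__":
    for problem in diagnose_java_env():
        print("Problem:", problem)
```

An empty result means both checks passed. It does not guarantee that the installed Java version is one KoNLPy can use.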
Warnings
- breaking KoNLPy relies on Java-based NLP tools and therefore requires a Java Development Kit (JDK) 8 or higher to be installed and properly configured in your system's PATH. Without a correctly configured JVM, most taggers will fail to initialize or run, often with `OSError: 'JVM' is not running.`
- gotcha The Mecab tagger (`konlpy.tag.Mecab`) is not officially supported on Windows due to its reliance on a C++ library and specific dictionary setup. Installing it on any platform other than Linux or macOS is difficult and error-prone.
- gotcha Processing very large texts or numerous documents concurrently can lead to `java.lang.OutOfMemoryError: Java heap space` errors. This is due to the underlying JVM's default memory limits.
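One way to reduce pressure on the JVM heap is to pass a long document to the tagger in smaller pieces instead of as a single string. The sketch below assumes the text is already split into lines and that splitting at line boundaries is acceptable for your analysis. `iter_chunks` and `pos_in_chunks` are hypothetical helper names, and `tagger` is any object with a `pos(text)` method, such as an `Okt` instance.

```python
def iter_chunks(lines, max_chars=10_000):
    """Group lines into chunks of at most roughly `max_chars` characters.

    A single line longer than `max_chars` is emitted as its own chunk.
    """
    chunk, size = [], 0
    for line in lines:
        # Start a new chunk when adding this line would exceed the budget.
        if chunk and size + len(line) > max_chars:
            yield "\n".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield "\n".join(chunk)


def pos_in_chunks(tagger, lines, max_chars=10_000):
    """Tag a long document chunk by chunk and concatenate the results."""
    tagged = []
    for chunk in iter_chunks(lines, max_chars):
        tagged.extend(tagger.pos(chunk))
    return tagged
```

Chunking complements a larger heap rather than replacing it. Recent KoNLPy releases also appear to accept a `max_heap_size` argument (in megabytes) on the JVM-backed tagger constructors. Check the API reference for your installed version before relying on it, and note that heap settings only take effect when the JVM is first started.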
Install
-
pip install konlpy
Imports
- Okt
from konlpy.tag import Okt
- Kkma
from konlpy.tag import Kkma
- Komoran
from konlpy.tag import Komoran
- Mecab
from konlpy.tag import Mecab
- Hannanum
from konlpy.tag import Hannanum
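Because not every tagger initializes on every machine (Mecab on Windows, for instance), a script can try its preferred tagger first and fall back to another one. This is a minimal sketch of that pattern. `first_available` and `konlpy_factories` are hypothetical helpers, and the Mecab-then-Okt order is only an example preference.

```python
def first_available(factories):
    """Return (name, tagger) for the first factory that initializes.

    `factories` is a list of (name, zero-argument callable) pairs.
    """
    errors = {}
    for name, factory in factories:
        try:
            return name, factory()
        except Exception as exc:  # e.g. missing JVM or missing dictionary
            errors[name] = exc
    raise RuntimeError(f"no tagger could be initialized: {errors}")


def konlpy_factories():
    """Build the candidate list lazily; requires `pip install konlpy`."""
    from konlpy.tag import Mecab, Okt
    return [("Mecab", Mecab), ("Okt", Okt)]


# Example usage (needs KoNLPy and a working JVM):
# name, tagger = first_available(konlpy_factories())
# print(f"Using {name}")
```

Catching a broad `Exception` is deliberate here, since the failure type differs between a missing JVM and a missing dictionary. Keep in mind that the taggers use different tag sets, so downstream code that inspects tags may need to know which tagger was selected.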
Quickstart
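from konlpy.tag import Okt
# Creating a tagger starts the JVM, so this line needs a working Java setup.
okt = Okt()
text = "아버지가 방에 들어가신다."
print(f"Original text: {text}")
# morphs() returns a list of morphemes.
print(f"Tokenization: {okt.morphs(text)}")
# pos() returns a list of (morpheme, tag) tuples.
print(f"Part-of-speech tagging: {okt.pos(text)}")
# nouns() returns only the nouns.
print(f"Nouns: {okt.nouns(text)}")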
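Since `pos()` returns `(morpheme, tag)` tuples, a common follow-up step is filtering by tag. The sketch below uses a hypothetical helper, `keep_tags`, and hand-written sample pairs so that it runs without a JVM. The pairs are illustrative and are not captured tagger output. The tag names follow Okt's style, and other taggers use different tag sets.

```python
def keep_tags(pos_pairs, wanted):
    """Keep tokens whose tag is in `wanted`, preserving order."""
    return [token for token, tag in pos_pairs if tag in wanted]


# Hand-written pairs for illustration only; not captured tagger output.
sample = [("아버지", "Noun"), ("가", "Josa"), ("방", "Noun"), ("에", "Josa")]
print(keep_tags(sample, {"Noun"}))  # prints ['아버지', '방']
```

With a real tagger, the same helper would be called as `keep_tags(okt.pos(text), {"Noun", "Verb"})`.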