Kiwi, the Korean Tokenizer for Python

0.23.1 · active · verified Thu Apr 16

Kiwipiepy is a fast, accurate Korean morphological analyzer (tokenizer) for Python that wraps the high-performance C++ library Kiwi. Its features include part-of-speech tagging, named entity recognition, dialect analysis, and typo correction. The library is actively maintained, and its releases often align with those of the core Kiwi library.

Quickstart

This quickstart shows how to initialize the `Kiwi` tokenizer and run basic morphological analysis on a Korean sentence. It also shows the `split_complex` option, which requests a more granular analysis.

from kiwipiepy import Kiwi

# Initialize the Kiwi tokenizer
kiwi = Kiwi()

# Analyze a Korean sentence
text = "안녕하세요 한국어 형태소 분석기 키위입니다."
result = kiwi.tokenize(text)

# Print the analysis result
for token in result:
    print(f"Token: {token.form}, Tag: {token.tag}, Start: {token.start}, Len: {token.len}")

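# Example with an additional option: split_complex=True asks Kiwi to split
# morphemes that can be decomposed further into smaller units.
# Note that this input is written without spaces.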
text_complex = "그녀는책을읽었다"
result_complex = kiwi.tokenize(text_complex, split_complex=True)
print("\nComplex word analysis:")
for token in result_complex:
    print(f"Token: {token.form}, Tag: {token.tag}")
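The token fields printed in the quickstart (`form`, `tag`, `start`, `len`) are enough for simple post-processing. The following is a minimal sketch, not part of the library: `Tok`, `surface_spans`, and `filter_by_tag_prefix` are hypothetical names introduced here, the sample tokens are hand-written rather than actual Kiwi output, and it assumes `start` and `len` are character offsets into the input string.

```python
from collections import namedtuple

# Hypothetical stand-in for the token objects returned by kiwi.tokenize();
# only the four attributes used in the quickstart are modelled.
Tok = namedtuple("Tok", ["form", "tag", "start", "len"])


def surface_spans(text, tokens):
    """Return (form, tag, surface) triples by slicing text with start/len.

    Assumes start and len are character offsets into text.
    """
    return [(t.form, t.tag, text[t.start:t.start + t.len]) for t in tokens]


def filter_by_tag_prefix(tokens, prefix):
    """Keep only tokens whose tag starts with prefix (e.g. "NN" for nouns)."""
    return [t for t in tokens if t.tag.startswith(prefix)]


# Hand-written sample for illustration; not actual Kiwi output.
text = "책을 읽었다"
sample = [
    Tok("책", "NNG", 0, 1),
    Tok("을", "JKO", 1, 1),
    Tok("읽", "VV", 3, 1),
    Tok("었", "EP", 4, 1),
    Tok("다", "EF", 5, 1),
]

nouns = filter_by_tag_prefix(sample, "NN")
print([t.form for t in nouns])
for form, tag, surface in surface_spans(text, sample):
    print(form, tag, repr(surface))
```

With a real analyzer, the list returned by `kiwi.tokenize(text)` could be passed in place of `sample`, provided its tokens expose the same four attributes.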
