{"id":9060,"library":"kiwipiepy","title":"Kiwi, the Korean Tokenizer for Python","description":"Kiwipiepy is a fast and accurate Korean morphological analyzer (tokenizer) for Python, wrapping the high-performance C++ library Kiwi. It supports various features like part-of-speech tagging, named entity recognition, dialect analysis, and typo correction. The library is actively maintained with frequent updates, often aligning with the core Kiwi library's releases.","status":"active","version":"0.23.1","language":"en","source_language":"en","source_url":"https://github.com/bab2min/kiwipiepy","tags":["Korean","NLP","tokenizer","morphological analysis","text processing"],"install":[{"cmd":"pip install kiwipiepy","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"symbol":"Kiwi","correct":"from kiwipiepy import Kiwi"}],"quickstart":{"code":"from kiwipiepy import Kiwi\n\n# Initialize the Kiwi tokenizer\nkiwi = Kiwi()\n\n# Analyze a Korean sentence\ntext = \"안녕하세요 한국어 형태소 분석기 키위입니다.\"\nresult = kiwi.tokenize(text)\n\n# Print the analysis result\nfor token in result:\n    print(f\"Token: {token.form}, Tag: {token.tag}, Start: {token.start}, Len: {token.len}\")\n\n# Example with additional options (e.g., split complex words)\ntext_complex = \"그녀는책을읽었다\"\nresult_complex = kiwi.tokenize(text_complex, split_complex=True)\nprint(\"\\nComplex word analysis:\")\nfor token in result_complex:\n    print(f\"Token: {token.form}, Tag: {token.tag}\")","lang":"python","description":"This quickstart demonstrates how to initialize the `Kiwi` tokenizer and perform basic morphological analysis on a Korean sentence. It also includes an example of using the `split_complex` option for more granular analysis."},"warnings":[{"fix":"Move `oov_handling` from `Kiwi(oov_handling=...)` to `kiwi.tokenize(text, oov_handling=...)`. 
Check the documentation for the valid `oov_handling` string values.","message":"The `oov_handling` parameter has moved from the `Kiwi` constructor to the `tokenize()` method and now supports new strategies. Code that passes `oov_handling` to `Kiwi()` will break.","severity":"breaking","affected_versions":"0.23.0+"},{"fix":"Pass typo correction options to `kiwi.tokenize(text, typos=True, match_typo_with_stem=True, ...)` instead of `Kiwi(typos=True, ...)`.","message":"Typo correction options such as `typos` and `match_typo_with_stem` have moved from the `Kiwi` constructor to the `tokenize()` method. Passing them during initialization raises a `TypeError`.","severity":"breaking","affected_versions":"0.23.0+"},{"fix":"Omit `model_type` to use the current default model. If you must set it explicitly, confirm the supported values in the documentation for your installed version.","message":"The `knlm` and `sbg` model types are older, smaller models and are no longer the default. Specifying `model_type='knlm'` or `model_type='sbg'` may produce warnings or unexpected behavior.","severity":"deprecated","affected_versions":"0.22.1+"},{"fix":"For critical multithreaded applications, create a separate `Kiwi` instance for each thread, or synchronize all dictionary modifications.","message":"v0.22.0 improved the thread safety of `Kiwi` objects, but concurrent modification of user dictionaries or other internal state on a single `Kiwi` instance shared across threads can still cause race conditions or unexpected behavior. Creating one `Kiwi` instance per thread is generally safer under heavy concurrency.","severity":"gotcha","affected_versions":"All versions, especially pre-0.22.0"},{"fix":"Keep the `Kiwi` instance unmodified and in scope while performing follow-up operations such as `join()` on results obtained from it. 
Upgrade to the latest version to benefit from the fixes in v0.22.0.","message":"In earlier versions, operations such as `Kiwi.join()` could fail or return incorrect results if the `Kiwi` instance or its associated `MorphemeSet` was modified or deleted after tokenization, because of lingering references.","severity":"gotcha","affected_versions":"0.20.0 - 0.21.x"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install kiwipiepy` to install the library.","cause":"The kiwipiepy library is not installed in the current Python environment.","error":"ModuleNotFoundError: No module named 'kiwipiepy'"},{"fix":"Move the `oov_handling` argument to the `kiwi.tokenize()` method: `kiwi.tokenize(text, oov_handling=...)`.","cause":"Passing `oov_handling` to the `Kiwi` constructor in version 0.23.0 or later.","error":"TypeError: Kiwi.__init__() got an unexpected keyword argument 'oov_handling'"},{"fix":"Move typo correction options to the `kiwi.tokenize()` method: `kiwi.tokenize(text, typos=True, match_typo_with_stem=True)`.","cause":"Passing typo correction options (`typos`, `match_typo_with_stem`, etc.) to the `Kiwi` constructor in version 0.23.0 or later.","error":"TypeError: Kiwi.__init__() got an unexpected keyword argument 'typos'"},{"fix":"Remove the `model_type` argument to use the default model, or pass a value that the documentation for your installed version lists as supported.","cause":"Using an outdated or unrecognized `model_type` when initializing `Kiwi`.","error":"ValueError: invalid model_type 'knlm'"},{"fix":"Ensure you are on the latest `kiwipiepy` version. 
If the issue persists, simplify the input, avoid concurrent dictionary modifications, or report the specific input that causes the crash to the library maintainers.","cause":"Releases such as v0.20.1, v0.20.4, and v0.22.0 fixed many segfaults related to specific inputs, pretokenized spans, and typo correction, but some edge cases may still trigger one, often involving complex inputs or concurrent dictionary modifications.","error":"segmentation fault (core dumped)"}]}