{"id":8799,"library":"zhon","title":"Zhon","description":"Zhon is a Python library that provides constants commonly used in Chinese text processing, such as CJK characters, Chinese punctuation marks, and patterns for Pinyin and Zhuyin. The library is currently active, with version 2.1.1 released in November 2024, and generally follows an infrequent release cadence.","status":"active","version":"2.1.1","language":"en","source_language":"en","source_url":"https://github.com/tsroten/zhon","tags":["chinese","mandarin","segmentation","tokenization","punctuation","hanzi","unicode","radicals","han","cjk","cedict","cc-cedict","traditional","simplified","characters","pinyin","zhuyin"],"install":[{"cmd":"pip install zhon","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Requires Python 3.9 or newer.","package":"python","optional":false}],"imports":[{"symbol":"hanzi","correct":"import zhon.hanzi"},{"symbol":"pinyin","correct":"import zhon.pinyin"},{"symbol":"zhuyin","correct":"import zhon.zhuyin"},{"symbol":"cedict","correct":"import zhon.cedict"}],"quickstart":{"code":"import re\nimport zhon.hanzi\n\ntext = 'I broke a plate: 我打破了一个盘子.'\ncjk_characters = re.findall('[{}]'.format(zhon.hanzi.characters), text)\n\nprint(f\"Original text: {text}\")\nprint(f\"Found CJK characters: {cjk_characters}\")\n\n# Example with Pinyin\nimport zhon.pinyin\npinyin_text = 'Yuànzi lǐ tíngzhe yí liàng chē.'\npinyin_words = re.findall(zhon.pinyin.word, pinyin_text, re.I)\n\nprint(f\"Original Pinyin text: {pinyin_text}\")\nprint(f\"Found Pinyin words: {pinyin_words}\")","lang":"python","description":"This quickstart demonstrates how to use `zhon.hanzi.characters` to find CJK characters in a string and `zhon.pinyin.word` to extract Pinyin words, leveraging Python's `re` module."},"warnings":[{"fix":"Be aware of the scope of Han ideographs. 
For language-specific filtering, additional logic or libraries may be required.","message":"The `zhon.hanzi.characters` constant represents all Han ideographs, which include characters used in Chinese, Japanese, and Korean. It does not filter for 'Chinese only' characters and should not be relied on by itself for language identification.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade your Python environment to 3.9 or newer. If compatibility with older Python is strictly necessary, consider pinning to an older Zhon version (e.g., `<2.0.0`), though these versions may be unmaintained.","message":"Older versions of Zhon (prior to 2.x) supported Python 2.7. Version 2.1.1 and newer explicitly require Python >= 3.9. Attempting to use Zhon 2.x on older Python environments will result in installation or runtime errors.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"Update regex pattern formatting to use f-strings or `.format()`: `re.findall(f'[{zhon.hanzi.characters}]', text)` or `re.findall('[{}]'.format(zhon.hanzi.characters), text)`.","message":"Earlier documentation and examples for regex pattern building sometimes used the `%` operator with a `%s` placeholder for string formatting (e.g., `re.findall('[%s]' % zhon.hanzi.characters, text)`). This style still works in Python 3, but f-strings and `.format()` are the more readable and commonly recommended approach.","severity":"deprecated","affected_versions":"<2.x examples, but applies to all Python 3 versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure your Python environment is version 3.9 or newer. Use `python --version` to check. 
If needed, create a new virtual environment with a compatible Python version.","cause":"Attempting to install or run `zhon` version 2.1.1 or higher with a Python version older than 3.9.","error":"ERROR: Package 'zhon' requires a different Python: ..."},{"fix":"Constants are organized into thematic submodules. Correctly import and access them, for example: `import zhon.hanzi` then `zhon.hanzi.characters`.","cause":"Trying to access constants directly from the top-level `zhon` module instead of its submodules (e.g., `zhon.hanzi`, `zhon.pinyin`).","error":"AttributeError: module 'zhon' has no attribute 'characters'"}]}