Zhon

2.1.1 · active · verified Thu Apr 16

Zhon is a Python library that provides constants commonly used in Chinese text processing: CJK characters, Chinese punctuation marks, and regular-expression patterns for Pinyin and Zhuyin. Version 2.1.1 was released in November 2024; releases are infrequent.

Common errors

Warnings

Install

Zhon is published on PyPI and can be installed with `pip install zhon`.

Imports

Quickstart

This quickstart demonstrates how to use `zhon.hanzi.characters` to find CJK characters in a string and `zhon.pinyin.word` to extract Pinyin words, leveraging Python's `re` module.

import re

import zhon.hanzi
import zhon.pinyin

# Find every CJK character by placing zhon.hanzi.characters inside a character class.
text = 'I broke a plate: 我打破了一个盘子.'
cjk_characters = re.findall('[{}]'.format(zhon.hanzi.characters), text)

print(f"Original text: {text}")
print(f"Found CJK characters: {cjk_characters}")

# Extract Pinyin words; re.I makes matching case-insensitive,
# so capitalized words such as 'Yuànzi' can match.
pinyin_text = 'Yuànzi lǐ tíngzhe yí liàng chē.'
pinyin_words = re.findall(zhon.pinyin.word, pinyin_text, re.I)

print(f"Original Pinyin text: {pinyin_text}")
print(f"Found Pinyin words: {pinyin_words}")
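
The same character-class technique can be applied to the library's punctuation constant. The following is a minimal sketch, not part of the original quickstart: it strips Chinese punctuation from a string using `zhon.hanzi.punctuation`. The helper name `strip_chinese_punctuation` is hypothetical, and the fallback string used when Zhon is not installed is a small hand-picked stand-in, not the library's actual punctuation set.

```python
import re

try:
    # Real constant from the library when it is installed.
    from zhon.hanzi import punctuation as HANZI_PUNCTUATION
except ImportError:
    # Assumption: hand-picked stand-in so the sketch still runs without zhon.
    # It is far smaller than the library's actual punctuation set.
    HANZI_PUNCTUATION = '。，！？；：、「」（）《》'


def strip_chinese_punctuation(text):
    """Remove Chinese punctuation marks, leaving all other characters."""
    # re.escape guards against any regex-special characters in the constant.
    pattern = '[{}]'.format(re.escape(HANZI_PUNCTUATION))
    return re.sub(pattern, '', text)


print(strip_chinese_punctuation('你好，世界！'))  # 你好世界
```

ASCII punctuation such as `,` and `.` is left untouched, since the constant covers Chinese punctuation marks only.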
