Hanzi Identifier

1.3.0 · active · verified Thu Apr 16

Hanzi Identifier (`hanzidentifier`) is a Python module that identifies Chinese text as Simplified or Traditional. It uses CC-CEDICT data to classify characters. The current stable version is 1.3.0. Releases are irregular, with major and minor updates arriving every few years.

Quickstart

This quickstart demonstrates the core functionality of `hanzidentifier`: checking whether a string contains Chinese characters with `has_chinese`, identifying a string's type with `identify` (Simplified, Traditional, Both, Mixed, or Unknown), and using the helper functions `is_simplified` and `is_traditional`.

import hanzidentifier

# Basic identification: these characters are shared by both scripts
print(f"'你好!' identifies as: {hanzidentifier.identify('你好!')}")
print(f"'你好!' is Simplified: {hanzidentifier.is_simplified('你好!')}")
print(f"'你好!' is Traditional: {hanzidentifier.is_traditional('你好!')}")

# Example with strictly Simplified Chinese
print(f"'软件' identifies as: {hanzidentifier.identify('软件')}")
print(f"'软件' is Simplified: {hanzidentifier.is_simplified('软件')}")

# Example with strictly Traditional Chinese
print(f"'軟體' identifies as: {hanzidentifier.identify('軟體')}")
print(f"'軟體' is Traditional: {hanzidentifier.is_traditional('軟體')}")

# Example mixing Simplified (国) and Traditional (國) characters
print(f"'国家和國家' identifies as: {hanzidentifier.identify('国家和國家')}")

# Example with no Chinese characters
print(f"'Hello World' has Chinese: {hanzidentifier.has_chinese('Hello World')}")
print(f"'Hello World' identifies as: {hanzidentifier.identify('Hello World')}")
