{"id":9029,"library":"hanzidentifier","title":"Hanzi Identifier","description":"Hanzi Identifier is a Python module designed to identify Chinese text as either Simplified or Traditional characters. It leverages the CC-CEDICT data for character identification. The current stable version is 1.3.0. The library has an irregular release cadence, with major and minor updates occurring every few years.","status":"active","version":"1.3.0","language":"en","source_language":"en","source_url":"https://github.com/tsroten/hanzidentifier","tags":["chinese","hanzi","simplified","traditional","linguistics","text processing"],"install":[{"cmd":"pip install hanzidentifier","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Provides the CC-CEDICT data used for character identification.","package":"zhon","optional":false}],"imports":[{"symbol":"hanzidentifier","correct":"import hanzidentifier"},{"note":"Often used directly, but can also be accessed via hanzidentifier.identify.","symbol":"identify","correct":"from hanzidentifier import identify"},{"symbol":"is_simplified","correct":"from hanzidentifier import is_simplified"},{"symbol":"is_traditional","correct":"from hanzidentifier import is_traditional"},{"symbol":"has_chinese","correct":"from hanzidentifier import has_chinese"}],"quickstart":{"code":"import hanzidentifier\n\n# Basic identification\nprint(f\"'你好！' identifies as: {hanzidentifier.identify('你好！')}\")\nprint(f\"'你好！' is Simplified: {hanzidentifier.is_simplified('你好！')}\")\nprint(f\"'你好！' is Traditional: {hanzidentifier.is_traditional('你好！')}\")\n\n# Example with strictly Simplified Chinese\nprint(f\"'软件' identifies as: {hanzidentifier.identify('软件')}\")\nprint(f\"'软件' is Simplified: {hanzidentifier.is_simplified('软件')}\")\n\n# Example with strictly Traditional Chinese\nprint(f\"'軟體' identifies as: {hanzidentifier.identify('軟體')}\")\nprint(f\"'軟體' is Traditional: {hanzidentifier.is_traditional('軟體')}\")\n\n# Example with mixed 
characters\nprint(f\"'国家和國家' identifies as: {hanzidentifier.identify('国家和國家')}\")\n\n# Example with no Chinese characters\nprint(f\"'Hello World' has Chinese: {hanzidentifier.has_chinese('Hello World')}\")\nprint(f\"'Hello World' identifies as: {hanzidentifier.identify('Hello World')}\")\n","lang":"python","description":"This quickstart demonstrates the core functionality of `hanzidentifier`: checking for Chinese characters with `has_chinese`, identifying a string's type (Simplified, Traditional, Both, Mixed, Unknown) with `identify`, and using the helper functions `is_simplified` and `is_traditional`."},"warnings":[{"fix":"Refer to the CHANGES.rst file in the GitHub repository for detailed migration steps if upgrading from pre-1.0 versions. Check the constant names used in your code against the current ones.","message":"Version 1.0 (released 2014-04-12) introduced breaking changes, including renaming some constants. Code written for versions prior to 1.0 will likely fail on 1.0 and later.","severity":"breaking","affected_versions":"<1.0"},{"fix":"Treat `BOTH` as compatible with either system. To accept any text readable as Simplified, use `hanzidentifier.is_simplified(s)`, which is equivalent to `hanzidentifier.identify(s) in (hanzidentifier.SIMPLIFIED, hanzidentifier.BOTH)`. To require exclusively Simplified text, compare `hanzidentifier.identify(s) == hanzidentifier.SIMPLIFIED`. Apply the same pattern with `TRADITIONAL` and `is_traditional()` for Traditional text.","message":"The `identify()` function returns `hanzidentifier.BOTH` when every Chinese character in the string is valid in both the Simplified and Traditional character sets (for example, '你好'). For such strings `is_simplified()` and `is_traditional()` both return `True`. Each returns `False` only when the string contains a character exclusive to the other system (`identify()` then returns `MIXED` or the other system's constant) or contains no recognized Chinese characters.","severity":"gotcha","affected_versions":">=1.0"},{"fix":"If differentiating between Chinese, Japanese Kanji, and Korean Hanja is critical, combine `hanzidentifier` with other language detection libraries or specific CJK character set checkers.","message":"hanzidentifier is designed to identify Chinese characters. 
While many Japanese Kanji and Korean Hanja share ideographs with Chinese, this library does not distinguish between these languages. It will identify shared characters as Chinese.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install hanzidentifier` in your terminal to install the package.","cause":"The `hanzidentifier` library has not been installed in your current Python environment.","error":"ModuleNotFoundError: No module named 'hanzidentifier'"},{"fix":"Check the output of `hanzidentifier.identify()` first. If it returns `hanzidentifier.MIXED` or `hanzidentifier.TRADITIONAL`, the string contains at least one Traditional-only character; find and normalize those characters. If it returns `hanzidentifier.UNKNOWN`, no Chinese characters were recognized. If it returns `hanzidentifier.BOTH` or `hanzidentifier.SIMPLIFIED`, `is_simplified()` should already return `True`.","cause":"`is_simplified()` returns `True` only when every recognized Chinese character in the string is either Simplified-only or shared by both systems, which is equivalent to `identify()` returning `SIMPLIFIED` or `BOTH`. A single Traditional-only character makes the result `MIXED` (or `TRADITIONAL`), and a string with no recognized Chinese characters is `UNKNOWN`. In those cases `is_simplified()` returns `False`.","error":"My code expects `hanzidentifier.is_simplified()` to return `True` for a simplified text, but it returns `False` even though the text looks simplified."},{"fix":"Use `hanzidentifier.has_chinese()` to confirm that the string contains at least one recognized Chinese character. If it returns `False`, `identify()` will return `UNKNOWN`. Non-Chinese characters are ignored, so removing surrounding English text or punctuation does not change the result.","cause":"The `identify()` function returns `UNKNOWN` only when the string contains no characters found in the CC-CEDICT data. Characters outside that data are ignored rather than counted against the result, so a string that mixes English with recognized Chinese characters is still classified by its Chinese characters (for example, text containing only '你好' alongside English identifies as `BOTH`). `UNKNOWN` therefore points to characters that are absent from CC-CEDICT or to non-Chinese script such as Japanese kana.","error":"`hanzidentifier.identify()` returns `hanzidentifier.UNKNOWN` for a string that appears to contain Chinese characters."}]}