faust-cchardet
faust-cchardet is a high-speed universal character encoding detector implemented as a binding to the `uchardet` library. It is an actively maintained fork of the original, unmaintained `cChardet` project. It is currently at version 2.1.19, and its releases are driven mainly by build improvements and Python version compatibility updates.
Warnings
- gotcha The original `cchardet` project is no longer maintained and can fail to build, especially with newer Python versions or operating systems. Always install `faust-cchardet`, its actively maintained fork, instead.
- gotcha The Python package name is `faust-cchardet`, but the module you import in your Python code is `cchardet`.
- breaking Python 2.7 support was dropped in version 2.1.6.
- gotcha `faust-cchardet` provides C bindings to `uchardet` for performance. Do not confuse it with the pure Python `chardet` library, which has its own breaking changes and API differences (e.g., in `chardet` 7.0.0, Python 3.7-3.9 support was dropped, which may not apply to `faust-cchardet`).
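The package-name versus module-name distinction above can be checked at runtime. The following is a minimal sketch, not part of the library: the helper name `cchardet_available` is hypothetical, and the version lookup uses `getattr` because the presence of a `__version__` attribute is an assumption.

```python
import importlib.util


def cchardet_available() -> bool:
    """Return True if the `cchardet` module can be imported.

    The module is named `cchardet` even though the pip package
    is `faust-cchardet` (hypothetical helper for illustration).
    """
    return importlib.util.find_spec("cchardet") is not None


if cchardet_available():
    import cchardet

    # `__version__` is assumed to exist; fall back to "unknown" if it does not.
    print("cchardet version:", getattr(cchardet, "__version__", "unknown"))
else:
    print("cchardet is not installed; run: pip install faust-cchardet")
```

Looking up the module spec first avoids an `ImportError` at import time and makes the install hint explicit.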
Install
pip install faust-cchardet
Imports
- detect
import cchardet as chardet
result = chardet.detect(byte_string)
Quickstart
import cchardet as chardet
# Example 1: Detect encoding of a known byte string
text_bytes_utf8 = b"Hello, world!\xc3\xa9 This has a \xc3\xa9 character."
result_utf8 = chardet.detect(text_bytes_utf8)
print(f"UTF-8 example: {result_utf8}")
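# Example 2: Detect encoding of Latin-1 bytes (very short inputs may be guessed as a related single-byte encoding)
text_bytes_latin1 = "Français".encode('latin-1')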
result_latin1 = chardet.detect(text_bytes_latin1)
print(f"Latin-1 example: {result_latin1}")
# The result is a dictionary with 'encoding' and 'confidence' keys ('encoding' can be None when nothing is detected).
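A common next step after detection is decoding the bytes. The sketch below shows one way to do that, under stated assumptions: `decode_bytes` and its `fallback` parameter are hypothetical names, not part of `cchardet`, and the fallback to UTF-8 with replacement characters is a design choice rather than library behavior. It only relies on `detect()` returning a dictionary with an `'encoding'` key.

```python
from typing import Optional

try:
    import cchardet  # module installed by the faust-cchardet package
except ImportError:
    cchardet = None  # allows the sketch to run without the package


def decode_bytes(data: bytes, fallback: str = "utf-8") -> str:
    """Decode `data` using the detected encoding, else `fallback`.

    Hypothetical helper: not part of cchardet itself.
    """
    detected: Optional[str] = None
    if cchardet is not None:
        # 'encoding' may be None when detection yields nothing (e.g. empty input).
        detected = cchardet.detect(data).get("encoding")
    try:
        return data.decode(detected or fallback)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: decode leniently instead of raising.
        return data.decode(fallback, errors="replace")


print(decode_bytes(b"Hello, world!"))
```

Catching `LookupError` covers the case where the detector reports an encoding name that Python's codec registry does not recognize.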