faust-cchardet

2.1.19 · active · verified Mon Apr 13

faust-cchardet is a fast universal character encoding detector for Python, implemented as a binding to the `uchardet` library. It is an actively maintained fork of the original `cChardet` project, which is no longer maintained. It detects character sets across many languages and encodings. The current version is 2.1.19; releases are driven mainly by build improvements and compatibility updates for new Python versions.

Warnings

Install

Imports

Quickstart

This example demonstrates how to import `cchardet` and use its `detect` function to identify the encoding of a byte string.

import cchardet as chardet

# Example 1: Detect encoding of a known byte string
text_bytes_utf8 = b"Hello, world!\xc3\xa9 This has a \xc3\xa9 character."
result_utf8 = chardet.detect(text_bytes_utf8)
print(f"UTF-8 example: {result_utf8}")

# Example 2: Detect encoding of Latin-1 bytes (very short inputs may be misdetected)
text_bytes_latin1 = "Français".encode('latin-1')
result_latin1 = chardet.detect(text_bytes_latin1)
print(f"Latin-1 example: {result_latin1}")

# The result is a dictionary containing 'encoding' and 'confidence'.
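
The result dictionary is usually consumed by a decoding step. The sketch below shows one way to do that. The helper name `decode_bytes`, the `min_confidence` threshold, and the `fallback` encoding are illustrative assumptions, not part of the library. It also assumes `encoding` (and `confidence`) may be `None` when detection fails. The detector is passed in as a parameter, so the helper works with any `detect`-style function.

```python
from typing import Callable


def decode_bytes(
    data: bytes,
    detect: Callable[[bytes], dict],
    min_confidence: float = 0.5,  # hypothetical threshold, tune for your data
    fallback: str = "utf-8",      # hypothetical fallback encoding
) -> str:
    """Decode `data` using the encoding reported by `detect`.

    `detect` is expected to return a dict with 'encoding' and
    'confidence' keys, as in the quickstart above.
    """
    result = detect(data)
    encoding = result.get("encoding")
    # Assumption: confidence may be None when nothing was detected.
    confidence = result.get("confidence") or 0.0

    # Use the fallback when detection failed or is not trusted.
    if encoding is None or confidence < min_confidence:
        encoding = fallback

    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or undecodable bytes: decode leniently instead.
        return data.decode(fallback, errors="replace")
```

With the library installed, a call would look like `decode_bytes(raw, chardet.detect)`, using the `chardet` alias from the import above.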
