{"id":5144,"library":"cchardet","title":"cChardet - High-speed Universal Character Encoding Detector","description":"cChardet is a high-speed universal character encoding detector implemented as a C extension for Python. It provides fast and accurate detection of text encoding, leveraging the underlying `uchardet` library (a port of Mozilla's `chardet`). The current stable version is 2.1.7, with alpha releases for 2.2.0 indicating ongoing development and support for newer Python versions.","status":"active","version":"2.1.7","language":"en","source_language":"en","source_url":"https://github.com/PyYoshi/cChardet","tags":["encoding","detection","chardet","c-extension","speed","unicode"],"install":[{"cmd":"pip install cchardet","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"note":"While 'from cchardet import detect' technically works, the primary API usually involves 'import cchardet' and then accessing methods from the module directly, like cchardet.detect().","wrong":"from cchardet import detect","symbol":"detect","correct":"import cchardet\ncchardet.detect(b'some bytes')"},{"note":"Used for streaming detection of encoding.","symbol":"UniversalDetector","correct":"from cchardet import UniversalDetector"}],"quickstart":{"code":"import cchardet\n\n# Example 1: Detect encoding of a simple byte string\ndata = 'これは日本語です'.encode('shift_jis')\nresult = cchardet.detect(data)\nprint(f\"Detected encoding: {result['encoding']}, confidence: {result['confidence']:.2f}\")\n\n# Example 2: Using UniversalDetector for streaming data\nfrom cchardet import UniversalDetector\n\ndetector = UniversalDetector()\nfor line in [b'Hello, world!', b'\\xcf\\x84\\xce\\xb7\\xce\\xbd \\xce\\xba\\xce\\xb1\\xce\\xbb\\xce\\xb7\\xce\\xbc\\xce\\xb5\\xce\\xbd \\xcf\\x81\\xce\\xb1!']:\n    detector.feed(line)\ndetector.close()\nstreaming_result = detector.result\nprint(f\"Streaming detected encoding: {streaming_result['encoding']}, confidence: 
{streaming_result['confidence']:.2f}\")","lang":"python","description":"This quickstart demonstrates basic character encoding detection using `cchardet.detect()` for a single byte string and `UniversalDetector` for streaming data. `detect()` returns a dictionary with the keys 'encoding' and 'confidence'."},"warnings":[{"fix":"Ensure your Python environment meets the requirements of the specific `cchardet` version. For new projects, pair `cchardet` 2.1.7 with a Python version that release supports, or consider the 2.2.x series for newer Python versions once it is stable.","message":"Python version support has changed significantly across releases. Version 2.1.6 dropped Python 2.7. Version 2.1.7 dropped Python 3.5. The 2.2.x alpha releases indicate that support for Python 3.6-3.8 will be dropped in favor of Python 3.10, 3.11, and 3.12. Always check the required Python version for the specific `cchardet` release you intend to use.","severity":"breaking","affected_versions":">=2.1.6"},{"fix":"If upgrading from a pre-2.0.0 version, test your application's encoding detection against diverse input data to confirm that results remain consistent and correct.","message":"Version 2.0.0 replaced the underlying `uchardet-enhanced` library with `uchardet`. Both are based on Mozilla's chardet, but the change may introduce subtle differences in detection results for edge cases or less common encodings.","severity":"breaking","affected_versions":">=2.0.0"},{"fix":"If you encounter build errors during `pip install`, make sure a C compiler is installed. On Windows, this typically means installing 'Build Tools for Visual Studio'. On Linux, install 'build-essential' or an equivalent package. If local compilation remains problematic, consider building inside a Docker image that already ships a compiler toolchain.","message":"`cchardet` is a C extension. 
If pre-built wheels are not available for your operating system, Python version, and architecture, `pip` will attempt to compile the package from source. This requires a C compiler (e.g., GCC, Clang, MSVC) to be installed and properly configured on your system.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always inspect the `confidence` value in the returned dictionary. Handle low-confidence results explicitly, for example by trying another detection method, falling back to a default encoding, or prompting the user.","message":"The `detect` function can return a low confidence for ambiguous input. A high confidence (e.g., above 0.9) generally indicates a reliable detection, while lower values warrant further validation or a fallback mechanism.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}