Charset Normalizer
Charset-normalizer is a truly universal charset encoding detector for Python. It detects the encoding of raw bytes/files using a heuristic, non-training-based approach and can optionally identify the spoken language of the content. It supports every IANA character set name for which CPython ships a codec. The library also ships a `normalizer` CLI tool and a drop-in `detect()` shim for Chardet migration. Current version is 3.4.6 (released March 2026); releases follow Semantic Versioning with frequent minor/patch cadence.
Warnings
- breaking Class aliases CharsetNormalizerMatch, CharsetNormalizerMatches, CharsetDetector, and CharsetDoctor were removed in 3.0. Code referencing these names will raise ImportError or AttributeError.
- breaking Python 3.6 support was dropped in 3.1.0, and Python 3.5 support was dropped in 2.1.0. Installing 3.x on Python 3.6 is unsupported.
- gotcha detect() is the legacy Chardet-compatible shim and is officially deprecated. It also lowers confidence automatically for small byte samples (3.4.3+), so results on short inputs may differ from Chardet.
- gotcha Feeding truncated or incomplete multi-byte byte sequences (e.g. a partial UTF-16 or UTF-32 file) will likely produce incorrect or empty detection results. The library is not designed for streaming partial payloads.
- gotcha from_bytes/from_path return a CharsetMatches container, not a string or a single result. Calling str() directly on the container gives unexpected output. Always call .best() first, then check for None.
- gotcha The import name uses an underscore (charset_normalizer) but the PyPI/install name uses a hyphen (charset-normalizer). Using import charset-normalizer raises a SyntaxError.
- deprecated Internal module charset_normalizer.assets was moved into charset_normalizer.constant in 3.3.x. Any code importing from charset_normalizer.assets directly will break on 3.3+.
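The CharsetMatches gotcha above is the most common stumbling block, so here is a minimal sketch of the safe access pattern (the sample string is illustrative):

```python
from charset_normalizer import from_bytes

# from_bytes returns a CharsetMatches container, not a string.
matches = from_bytes("naïve café".encode("utf-8"))

# Pick the single best candidate first; it may be None for
# undetectable (e.g. binary) payloads.
best = matches.best()
if best is None:
    print("undetectable (possibly binary)")
else:
    print(best.encoding)  # a Python codec name, e.g. 'utf_8'
    print(str(best))      # str() on a single match is the decoded text
```

Calling `str()` on the container itself would give you the list-like repr of all candidate matches, which is rarely what you want.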
Install
- pip install charset-normalizer
- pip install charset-normalizer -U
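Installing the package also puts the `normalizer` CLI mentioned above on your PATH. A quick sketch of how it can be invoked (the sample file and its contents are made up for illustration):

```shell
# Create a small sample file (hypothetical path).
printf 'héllo wörld\n' > /tmp/cn_sample.txt

# Full JSON-style detection report for the file.
normalizer /tmp/cn_sample.txt

# -m / --minimal prints only the best encoding guess.
normalizer -m /tmp/cn_sample.txt
```

Run `normalizer --help` for the full flag list of the installed version, since options vary slightly across releases.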
Imports
- from_bytes
from charset_normalizer import from_bytes
- from_path
from charset_normalizer import from_path
- from_fp
from charset_normalizer import from_fp
- detect
from charset_normalizer import detect
- is_binary
from charset_normalizer import is_binary
- CharsetMatches
from charset_normalizer.models import CharsetMatches
- CharsetMatch
from charset_normalizer.models import CharsetMatch
Quickstart
from charset_normalizer import from_bytes, from_path, detect
# --- from raw bytes ---
raw = b'\xff\xfe' + 'Hello, world!'.encode('utf-16-le')
results = from_bytes(raw)
best = results.best()
if best is not None:
    print('Encoding:', best.encoding)  # e.g. 'utf_16'
    print('Language:', best.language)  # e.g. 'English' or ''
    print('Decoded :', str(best))      # decoded unicode string
else:
    print('Could not detect encoding (possibly binary data)')
# --- from a file path ---
# results2 = from_path('./data/sample.txt')
# print(str(results2.best()))
# --- Chardet-compatible legacy shim (deprecated but stable) ---
result = detect(raw)
print(result) # {'encoding': 'UTF-16', 'confidence': 1.0, 'language': ''}
if result['encoding']:
    decoded = raw.decode(result['encoding'])
    print('Legacy decoded:', decoded)