pycld3

0.22 verified Mon Apr 27 auth: no python

Python bindings, built with Cython, for Google's Compact Language Detector v3 (CLD3), a compact neural-network model for language identification. The current version is 0.22. Releases are infrequent, and there has been no active development since 2020.

pip install pycld3
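A quick way to confirm that the install produced an importable module is to ask the import system directly. The sketch below assumes the module installed by pycld3 is named `cld3`; the helper name `module_available` is hypothetical and exists only for this illustration.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


# Assumption: pycld3 installs a module named "cld3"; adjust if yours differs.
if module_available("cld3"):
    print("cld3 module found")
else:
    print("cld3 module not found; try: pip install pycld3")
```

The check only looks the module up and does not import it, so it is safe to run even when the native extension is broken.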
error ModuleNotFoundError: No module named 'pycld3'
cause Trying to import the package by its PyPI name. The distribution is called pycld3, but the module it installs is named 'cld3'.
fix
Use 'import cld3' instead of 'import pycld3'.
error OSError: libcld3.so: cannot open shared object file: No such file or directory
cause Missing native library; pycld3 ships its own shared library but it may not be installed correctly.
fix
Reinstall pycld3: pip install --force-reinstall pycld3
error AttributeError: module 'cld3' has no attribute 'CLD3'
cause API mismatch: the code expects a 'CLD3' class, but pycld3 exposes module-level functions rather than a class.
fix
Use 'cld3.get_language' and 'cld3.get_frequent_languages' instead.
breaking Python 3.6+ only: pycld3 0.22 drops support for Python 3.5 and earlier. Import error on older versions.
fix Use Python 3.6 or newer.
gotcha Check the reliability flag: detection can return a low-probability guess, especially for very short input. Always check the `is_reliable` boolean field of the result before trusting the output.
fix If `result.is_reliable` is False, treat the detection as uncertain (for example, fall back to a default language).
gotcha Thread safety: The underlying C++ library is not thread-safe. Do not call get_language or get_frequent_languages concurrently from multiple threads without locking.
fix Guard calls with a lock, or restrict detection to a single thread.
gotcha Package name vs. module name: the PyPI distribution is `pycld3`, but the module it installs is `cld3`. There is no `pycld3` module to import.
fix Install with `pip install pycld3`, then import with `import cld3`.
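The thread-safety gotcha above can be handled by routing every call through one shared lock. The following is a minimal sketch, not the library's own mechanism: `serialize_calls` and `fake_detect` are hypothetical names, and the stand-in detector exists only so the example runs without the native extension installed.

```python
import functools
import threading
from concurrent.futures import ThreadPoolExecutor


def serialize_calls(func, lock):
    """Wrap func so that only one thread at a time can be inside it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with lock:
            return func(*args, **kwargs)
    return wrapper


# Hypothetical stand-in for a real detection function.
def fake_detect(text):
    return "und" if not text.strip() else "en"


# One lock shared by every wrapped function that touches the same library.
detector_lock = threading.Lock()
safe_detect = serialize_calls(fake_detect, detector_lock)

with ThreadPoolExecutor(max_workers=4) as pool:
    labels = list(pool.map(safe_detect, ["Hello", "", "Bonjour"]))

print(labels)  # ['en', 'und', 'en']
```

In real use, the library's detection functions would be passed in place of the stand-in, all wrapped with the same lock so that no two of them run concurrently.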

Basic usage of pycld3: detect the language of a string and get the top N candidate languages.

import cld3  # module installed by the pycld3 package

# Detect the language of a text
result = cld3.get_language('This is a test')
print(result)  # LanguagePrediction(language=..., probability=..., is_reliable=..., proportion=...)

# Get the top N candidate languages
results = cld3.get_frequent_languages('This is a test', num_langs=3)
print(results)  # LanguagePrediction entries, one per detected language

# Language identification for multiple texts
for text in ['Bonjour', 'Hola', 'Hello']:
    result = cld3.get_language(text)
    print(f"{text} -> {result.language}")
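The loop above prints whatever language comes back, including unreliable guesses for very short strings. A minimal sketch of gating on reliability follows. It assumes the result object exposes `language`, `probability`, and `is_reliable` fields (adjust the names if your installed version differs). The `Prediction` stand-in, the `trusted_language` helper, and the 0.7 cutoff are hypothetical choices for illustration.

```python
from collections import namedtuple
from typing import Optional

# Stand-in with the assumed field names, so the sketch runs without the library.
Prediction = namedtuple("Prediction", "language probability is_reliable proportion")


def trusted_language(prediction, min_probability: float = 0.7) -> Optional[str]:
    """Return the language code only when the prediction looks trustworthy."""
    if prediction is None:
        return None
    if not prediction.is_reliable or prediction.probability < min_probability:
        return None
    return prediction.language


good = Prediction("en", 0.99, True, 1.0)
weak = Prediction("fr", 0.41, False, 1.0)

print(trusted_language(good))  # en
print(trusted_language(weak))  # None
```

Returning None instead of a guessed code lets the caller decide on a fallback, such as a default language or a request for more text.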