{"id":4185,"library":"pycld2","title":"pycld2 Language Detection","description":"pycld2 provides Python bindings to Google Chromium's Compact Language Detection library (CLD2). It supports detection for over 165 languages and aims to consolidate the C++ library and its bindings into a single installable Python package. Version 0.42 was released in March 2025; releases follow no fixed schedule.","status":"active","version":"0.42","language":"en","source_language":"en","source_url":"https://github.com/aboSamoor/pycld2","tags":["language detection","nlp","cld2","text analysis"],"install":[{"cmd":"pip install pycld2","lang":"bash","label":"Standard Installation"},{"cmd":"sudo apt-get install build-essential python3-dev # On Debian/Ubuntu\n# On macOS: Install Xcode Command Line Tools (xcode-select --install)\n# On Windows: Install MSVC Build Tools (part of Visual Studio Community)","lang":"bash","label":"System Dependencies for Compilation"}],"dependencies":[{"reason":"pycld2 compiles C++ code during installation, so system build tools are required.","package":"C/C++ compiler (e.g., GCC, Clang, MSVC)","optional":false},{"reason":"Required for compiling Python extensions. 
(e.g., python3-dev on Debian/Ubuntu)","package":"Python development headers","optional":false}],"imports":[{"note":"The `detect` function is the primary entry point for language detection.","wrong":"from pycld2 import detect # Valid, but community examples predominantly use `import pycld2 as cld2`.","symbol":"detect","correct":"import pycld2 as cld2\nisReliable, textBytesFound, details = cld2.detect(text)"}],"quickstart":{"code":"import pycld2 as cld2\n\n# Example 1: Basic detection\ntext_russian = \"а неправильный формат идентификатора дн назад\"\nisReliable, textBytesFound, details = cld2.detect(text_russian)\n# details holds up to three (languageName, languageCode, percent, score) tuples\n\nprint(f\"Text: '{text_russian}'\")\nprint(f\"Is reliable: {isReliable}\")\nprint(f\"Detected language: {details[0][0]} ({details[0][1]})\")\nprint(f\"Details: {details}\")\n\nprint('\\n---\\n')\n\n# Example 2: Detecting multiple languages and getting per-segment vectors\ntext_mixed = \"\"\"France is the largest country in Western Europe. A accès aux chiens et aux frontaux qui lui ont été il peut consulter. The quick brown fox jumped over the lazy dog.\"\"\"\nisReliable, textBytesFound, details, vectors = cld2.detect(\n    text_mixed,\n    returnVectors=True\n)\n# each vector is a (byteOffset, byteLength, languageName, languageCode) tuple\n\nprint(f\"Text: '{text_mixed}'\")\nprint(f\"Is reliable: {isReliable}\")\nprint(f\"Detected language (summary): {details[0][0]} ({details[0][1]})\")\nprint(f\"Segment language vectors: {vectors}\")","lang":"python","description":"This quickstart shows how to use `pycld2.detect()` for basic language identification and how to obtain per-segment language vectors from mixed-language text. `detect` returns a tuple `(isReliable, textBytesFound, details)`, where `details` holds up to three `(languageName, languageCode, percent, score)` entries for the top candidate languages. With `returnVectors=True`, it also returns a fourth element: a sequence of `(byteOffset, byteLength, languageName, languageCode)` tuples, one per detected segment."},"warnings":[{"fix":"Ensure that your system has the necessary C/C++ build tools (e.g., GCC, Clang, MSVC) and Python development headers installed before attempting `pip install pycld2`. 
For Debian/Ubuntu, `sudo apt-get install build-essential python3-dev` is often required. On macOS, run `xcode-select --install`. On Windows, install the Visual Studio Build Tools (MSVC).","message":"Installation commonly fails with 'Failed building wheel for pycld2' errors, particularly on less common architectures (e.g., ARM/aarch64) or on Windows, usually because a C/C++ compiler or the Python development headers are missing.","severity":"breaking","affected_versions":"All versions"},{"fix":"Always pass either a standard Python `str` (which `pycld2` encodes to UTF-8 internally) or `bytes` explicitly encoded as UTF-8 (e.g., `my_text.encode('utf-8')`).","message":"The `detect()` function requires UTF-8 input: either a `str` or UTF-8 encoded `bytes`. Passing bytes in another encoding (e.g., Latin-1 text containing non-ASCII characters) raises `pycld2.error`, because those bytes are not valid UTF-8.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Only enable `debugScoreAsQuads=True` if you explicitly need this debugging feature, as it incurs a substantial CPU cost.","message":"Setting the `debugScoreAsQuads` parameter to `True` in `detect()` can significantly slow detection, potentially by a factor of two.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Do not rely on `hintEncoding`, as it has no effect; instead, make sure the input (`utf8Bytes`) is valid UTF-8.","message":"The `hintEncoding` parameter of `detect()` currently does not work and provides no biasing hint to the detector.","severity":"deprecated","affected_versions":"All versions up to 0.42"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}