{"id":1531,"library":"langdetect","title":"Language Detection (langdetect)","description":"langdetect is a pure-Python port of Google's language-detection library that identifies the language of a given text. It supports 55 languages out of the box and returns either a single best guess or a list of candidate languages with probabilities. The current version is 1.0.9, released in 2021; releases are infrequent, and the project is effectively in maintenance mode.","status":"maintenance","version":"1.0.9","language":"en","source_language":"en","source_url":"https://github.com/Mimino666/langdetect","tags":["language-detection","nlp","i18n","natural-language-processing"],"install":[{"cmd":"pip install langdetect","lang":"bash","label":"Install langdetect"}],"dependencies":[],"imports":[{"symbol":"detect","correct":"from langdetect import detect"},{"symbol":"detect_langs","correct":"from langdetect import detect_langs"},{"note":"Set `DetectorFactory.seed = 0` before the first detection call to get reproducible results for short texts.","symbol":"DetectorFactory","correct":"from langdetect import DetectorFactory"},{"note":"Catch this for inputs that cannot be reliably detected.","symbol":"LangDetectException","correct":"from langdetect import LangDetectException"}],"quickstart":{"code":"from langdetect import DetectorFactory, LangDetectException, detect, detect_langs\n\n# Detection is non-deterministic by default; fix the seed for reproducible results,\n# especially with short texts where probabilities are close\nDetectorFactory.seed = 0\n\ntext_en = \"This is a simple English sentence.\"\ntext_fr = \"Ceci est une simple phrase française.\"\ntext_mixed = \"Hallo Welt! This is a mixed text.\"\n\ntry:\n    # Detect the most probable language\n    print(f\"'{text_en}' detected as: {detect(text_en)}\")\n    print(f\"'{text_fr}' detected as: {detect(text_fr)}\")\n    print(f\"'{text_mixed}' detected as: {detect(text_mixed)}\")  # Mixed-language text; the result may be unreliable\n\n    # Get a list of candidate languages with their probabilities\n    print(f\"Probabilities for '{text_en}': {[str(lang) for lang in detect_langs(text_en)]}\")\n    print(f\"Probabilities for '{text_fr}': {[str(lang) for lang in detect_langs(text_fr)]}\")\n\n    # Text with no detectable features (here, digits only) raises LangDetectException\n    text_no_features = \"12345\"\n    print(f\"Attempting to detect '{text_no_features}'...\")\n    print(f\"Probabilities for '{text_no_features}': {[str(lang) for lang in detect_langs(text_no_features)]}\")\n\nexcept LangDetectException as e:\n    # Raised when the input is empty or contains no usable characters\n    print(f\"Detection failed: {e}\")","lang":"python","description":"Detects the most probable language of a text and retrieves a list of candidate languages with their probabilities. It also catches `LangDetectException`, which is raised for input with no detectable features, and sets `DetectorFactory.seed` for reproducible results."},"warnings":[{"fix":"Wrap `detect()` and `detect_langs()` calls in a `try...except LangDetectException` block, and consider pre-validating input length or content.","message":"The library raises `langdetect.lang_detect_exception.LangDetectException` for empty strings and for text with no detectable features, such as digits or punctuation only. Very short texts may not raise but often produce unreliable guesses.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Set `DetectorFactory.seed = 0` (after `from langdetect import DetectorFactory`) before the first detection call to make results reproducible, which is especially important for testing and debugging.","message":"The detection algorithm is non-deterministic. For short or ambiguous texts, the same input can yield different results across runs unless a seed is set.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For applications that require high accuracy or sensitivity to specific language nuances, evaluate alternatives such as `fasttext` or `cld3` (Google's Compact Language Detector v3), which use more recent models and may perform better.","message":"The language profiles used by `langdetect` come from the original language-detection library (the port tracks its 03/03/2014 version) and are not actively updated. Results may be less accurate than those of newer language detection libraries, especially for modern slang, domain-specific text, or less common languages and dialects.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}