Language Detection (langdetect)

1.0.9 · maintenance · verified Thu Apr 09

langdetect is a pure-Python port of Google's language-detection library that identifies the language of a given text. It supports 55 languages and returns either a single best guess or a list of candidate languages with probabilities. The current version, 1.0.9, was released in 2021; releases are infrequent and the project is effectively in maintenance mode.

Warnings

Install

Imports

Quickstart

The example below detects the primary language of a text and retrieves the candidate languages with their probabilities. It also handles `LangDetectException`, which is raised when the input has no detectable features, and sets `DetectorFactory.seed` for reproducible results, since the detection algorithm is non-deterministic.

from langdetect import DetectorFactory, LangDetectException, detect, detect_langs

# For reproducible results, especially with short texts where probabilities are close.
# langdetect exposes the seed as a class attribute, not as a set_seed() function.
DetectorFactory.seed = 0

text_en = "This is a simple English sentence."
text_fr = "Ceci est une simple phrase française."
text_mixed = "Hallo Welt! This is a mixed text."

try:
    # Detect the primary language
    print(f"'{text_en}' detected as: {detect(text_en)}")
    print(f"'{text_fr}' detected as: {detect(text_fr)}")
    print(f"'{text_mixed}' detected as: {detect(text_mixed)}") # May vary due to mix

    # Get a list of detected languages with their probabilities
    print(f"Probabilities for '{text_en}': {[str(l) for l in detect_langs(text_en)]}")
    print(f"Probabilities for '{text_fr}': {[str(l) for l in detect_langs(text_fr)]}")

    # Handling input with no detectable features (digits only, so nothing to analyze)
    text_undetectable = "12345"
    print(f"Attempting to detect '{text_undetectable}'...")
    print(f"Probabilities for '{text_undetectable}': {[str(l) for l in detect_langs(text_undetectable)]}")

except LangDetectException as e:
    # Raised when the text has no usable features, e.g. empty or non-linguistic input
    print(f"An error occurred: {e}. This happens when the input contains no detectable text.")
