Lingua Language Detector

2.2.0 · active · verified Sat Apr 11

Lingua Language Detector is an accurate natural language detection library for Python, suitable for both short text snippets and mixed-language texts. It is built on Python bindings to Lingua's Rust implementation, which gives it high performance and low memory consumption, and it detects 75 languages entirely offline. The current version is 2.2.0, and the project is actively developed, with regular minor and patch releases.

Warnings

Install

Imports

Quickstart

This quickstart builds a language detector restricted to three languages, detects the language of a single text, and then identifies the individual languages within a mixed-language text. The detector is configured with the `Language` enum and created through `LanguageDetectorBuilder`. Note that `detect_language_of` returns `None` when no language can be reliably determined.

from lingua import Language, LanguageDetectorBuilder

# Build a detector for specific languages
languages = [Language.ENGLISH, Language.FRENCH, Language.GERMAN]
detector = LanguageDetectorBuilder.from_languages(*languages).build()

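# Detect a single language; the result is None if no language is reliably detected
text_single = "languages are awesome"
detected_language_single = detector.detect_language_of(text_single)
if detected_language_single is not None:
    print(f"Detected language (single): {detected_language_single.name}")
else:
    print("Detected language (single): unknown")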

# Detect multiple languages in mixed text (experimental)
text_mixed = "Hello world, comment ça va? Das ist ein Test."
detected_languages_mixed = detector.detect_multiple_languages_of(text_mixed)
print("Detected languages (mixed):")
for result in detected_languages_mixed:
    print(f"  - {result.language.name}: '{text_mixed[result.start_index:result.end_index]}' ({result.start_index}-{result.end_index})")
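The mixed-language loop can also be wrapped in a small reusable helper. The following is a minimal sketch, not part of the library: `split_by_language` is a hypothetical name, and it assumes only that the detector exposes `detect_multiple_languages_of` and that each result carries `language`, `start_index`, and `end_index`, as used in the quickstart.

```python
from typing import Any, List, Tuple


def split_by_language(detector: Any, text: str) -> List[Tuple[str, str]]:
    """Return (language name, substring) pairs for each detected section.

    Hypothetical helper: `detector` is assumed to behave like the detector
    built in the quickstart, i.e. detect_multiple_languages_of(text) yields
    results with .language, .start_index and .end_index.
    """
    segments = []
    for result in detector.detect_multiple_languages_of(text):
        # Slice the original text with the reported offsets.
        snippet = text[result.start_index:result.end_index].strip()
        if snippet:  # skip sections that contain only whitespace
            segments.append((result.language.name, snippet))
    return segments
```

Calling `split_by_language(detector, text_mixed)` with the detector from the quickstart would return one pair per detected section. Because multi-language detection is experimental, the section boundaries may not line up exactly with sentence boundaries.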
