FastText Language Detection
fasttext-langdetect is a Python wrapper around Facebook's FastText language identification model. It offers fast language detection, with accuracy of up to 95%, across more than 170 languages. The library, currently at version 1.0.5 (last released in January 2023), exposes a single `detect()` function that identifies the language of a given text string. The required FastText model is downloaded on first use.
Warnings
- gotcha The FastText language model files are downloaded on the first call to the `detect()` function, which requires an active internet connection and sufficient disk space (about 126 MB for the full model, about 917 KB for the low-memory version). Subsequent calls use the cached models.
- gotcha The `low_memory` parameter (defaulting to `False`) controls which model is used. Setting `low_memory=True` loads a compressed model that uses less memory but may result in slightly lower detection accuracy compared to the full model (`low_memory=False`).
- gotcha Detection accuracy can drop for very short inputs (e.g., single words or short phrases) and for extremely long inputs. FastText models generally perform best on text segments of roughly 10-80 characters.
- gotcha Pre-trained FastText models are often trained on clean, well-structured text. Noisy inputs (e.g., text with spelling errors, unusual capitalization, slang, or mixed languages/code-switching) can lead to reduced detection accuracy.
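One way to work around the length and noise caveats above is to normalize whitespace, split long text into short chunks, and combine the per-chunk results. The sketch below is only an illustration: `normalize`, `split_into_chunks`, and `detect_by_vote` are hypothetical helpers, not part of fasttext-langdetect, and the code assumes the detector returns a dictionary with `lang` and `score` keys. The detector is passed in as a callable so the helpers can be tried without downloading a model.

```python
import re
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional

# Hypothetical alias: any callable mapping text to a dict with "lang" and
# "score" keys (the assumed result shape of ftlangdetect.detect).
Detector = Callable[[str], Dict[str, Any]]


def normalize(text: str) -> str:
    """Collapse newlines, tabs, and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def split_into_chunks(text: str, max_chars: int = 80) -> List[str]:
    """Split text on word boundaries into chunks of at most max_chars.

    A single word longer than max_chars becomes its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in normalize(text).split(" "):
        if not word:
            continue
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def detect_by_vote(
    text: str, detect_fn: Detector, max_chars: int = 80
) -> Optional[str]:
    """Detect each chunk and return the language with the highest summed score.

    Returns None when the text contains no words.
    """
    totals: Dict[str, float] = defaultdict(float)
    for chunk in split_into_chunks(text, max_chars):
        result = detect_fn(chunk)
        totals[str(result["lang"])] += float(result["score"])
    if not totals:
        return None
    return max(totals, key=totals.get)
```

With the library installed, the helper could be called as `detect_by_vote(long_text, lambda t: detect(text=t))`. Summing scores rather than counting chunks lets confident chunks outweigh uncertain ones; whether that improves results on a given corpus would need to be checked empirically.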
Install
pip install fasttext-langdetect
Imports
- detect
from ftlangdetect import detect
Quickstart
from ftlangdetect import detect
# Detect language with default settings (low_memory=False for higher accuracy)
result_full = detect(text="Bugün hava çok güzel")
print(f"Full model result: {result_full}")
# Detect language with low_memory option (smaller model, slightly less accurate)
result_low_memory = detect(text="Bugün hava çok güzel", low_memory=True)
print(f"Low-memory model result: {result_low_memory}")
# Example with English text
english_text = "Hello, world! How are you?"
result_en = detect(text=english_text)
print(f"English text result: {result_en}")
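The results printed above can be filtered by confidence before they are used downstream. The following is a minimal sketch that assumes each result is a dictionary with a `lang` code and a `score` between 0 and 1; `language_or_default` is a hypothetical helper, the 0.5 threshold is an arbitrary starting point, and the sample dictionaries are illustrative values rather than real model output.

```python
from typing import Any, Dict


def language_or_default(
    result: Dict[str, Any], min_score: float = 0.5, default: str = "und"
) -> str:
    """Return the detected language code, or `default` for low-confidence results.

    Assumes `result` has "lang" and "score" keys; "und" is the ISO 639
    code for an undetermined language.
    """
    if float(result.get("score", 0.0)) >= min_score:
        return str(result.get("lang", default))
    return default


# Illustrative inputs only (not real model output).
confident = {"lang": "tr", "score": 0.99}
uncertain = {"lang": "tr", "score": 0.20}

print(language_or_default(confident))
print(language_or_default(uncertain))
```

A threshold like this is most useful for the short or noisy inputs mentioned in the warnings, where a low score is a hint that the predicted language should not be trusted.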