Fast Language Detection
fast-langdetect is a fast, accurate language detection library built on FastText, a library developed by Facebook. The project claims roughly 80x faster detection than conventional methods, with accuracy of up to 95%. It supports Python 3.9 to 3.13, works offline with a lightweight model, and is under active development.
Warnings
- breaking The configuration system was overhauled in v0.3.0, replacing environment variables (e.g., `FTLANG_CACHE`) with a dedicated `LangDetectConfig` class for explicit management. Existing code relying on environment variables for configuration will break.
- gotcha Detection accuracy can drop for inputs that are much shorter or much longer than about 80 characters. By default, inputs are truncated to 80 characters, so any text beyond that point is not used for detection.
- gotcha Models differ in memory footprint and accuracy. The 'lite' model is memory-friendly (~45-60 MB RSS) and works offline, while the 'full' model (~170-210 MB RSS) is more accurate but uses more memory. With `model='auto'`, the library falls back to the 'lite' model only if a `MemoryError` occurs.
- gotcha The `model='auto'` fallback is triggered by `MemoryError` only. Other failures during model loading, such as `FileNotFoundError`, `PermissionError`, or network-related errors, raise standard Python exceptions; they are neither suppressed nor cause a fallback.
- gotcha As of v0.4.0, newline characters in input text are always replaced with spaces internally to prevent errors in the underlying FastText model. The replacement is logged only at DEBUG level, so it goes unnoticed unless debug logging is enabled.
- gotcha The pre-trained FastText language identification models bundled or downloaded by `fast-langdetect` are licensed under the Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0) license. This is separate from the MIT license for the `fast-langdetect` code itself.
Install
pip install fast-langdetect
Imports
- detect_language
from fast_langdetect import detect_language
- LangDetector
from fast_langdetect import LangDetector
- LangDetectConfig
from fast_langdetect import LangDetectConfig
Quickstart
from fast_langdetect import detect_language
text1 = "Hello, how are you?"
text2 = "Bonjour, comment allez-vous?"
text3 = "Este es un texto muy largo en español, con muchas palabras y frases para probar la detección de idioma."
# Detect language with default settings (lite model)
result1 = detect_language(text1)
print(f"'{text1}' detected as: {result1.lang} (confidence: {result1.score:.2f})")
# Detect language using the 'full' model for potentially higher accuracy
result2 = detect_language(text2, model='full')
print(f"'{text2}' detected as: {result2.lang} (confidence: {result2.score:.2f})")
# Request the top 2 candidates with the 'auto' model,
# which falls back to the lite model only on MemoryError
result3 = detect_language(text3, model='auto', k=2)
print(f"'{text3}' detected top 2 as: {result3}")
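The third sample above is longer than 80 characters, so under the default truncation noted in Warnings only its leading part would be used for detection. A minimal sketch of that pre-processing, combined with the newline replacement, follows. `prepare_input` is a hypothetical helper and not the library's code, and the order of the two steps is an assumption.

```python
DEFAULT_MAX_LENGTH = 80  # default truncation length stated in Warnings


def prepare_input(text: str, max_length: int = DEFAULT_MAX_LENGTH) -> str:
    """Replace newlines with spaces, then keep at most max_length characters."""
    # Assumption: newlines are replaced before the text is truncated.
    single_line = text.replace("\n", " ")
    return single_line[:max_length]


if __name__ == "__main__":
    text3 = (
        "Este es un texto muy largo en español, con muchas palabras "
        "y frases para probar la detección de idioma."
    )
    prepared = prepare_input(text3)
    # Compare the original length with the length after truncation.
    print(len(text3), len(prepared))
    print(repr(prepare_input("Hello\nworld")))
```

If the most distinctive part of a text comes late, for example after a long boilerplate prefix, passing a representative excerpt rather than the whole string may give a better result.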