{"id":3485,"library":"fast-langdetect","title":"Fast Language Detection","description":"fast-langdetect is an ultra-fast and highly accurate language detection library based on FastText, a library developed by Facebook. It is up to 80x faster than conventional methods and achieves up to 95% accuracy. The library supports Python 3.9 through 3.13, works offline with a lightweight model, and is under active development.","status":"active","version":"1.0.0","language":"en","source_language":"en","source_url":"https://github.com/LlmKira/fast-langdetect","tags":["language detection","fasttext","nlp","ai","machine learning"],"install":[{"cmd":"pip install fast-langdetect","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Handles model downloads and caching.","package":"robust-downloader","optional":false},{"reason":"Used for HTTP requests, likely by robust-downloader.","package":"requests","optional":false},{"reason":"The underlying FastText model inference engine.","package":"fasttext-predict","optional":false}],"imports":[{"note":"Primary function for quick, direct language detection.","symbol":"detect_language","correct":"from fast_langdetect import detect_language"},{"note":"Used to create custom detector instances for advanced configuration.","symbol":"LangDetector","correct":"from fast_langdetect import LangDetector"},{"note":"Configuration class for setting parameters like cache directory or input length.","symbol":"LangDetectConfig","correct":"from fast_langdetect import LangDetectConfig"}],"quickstart":{"code":"from fast_langdetect import detect_language\n\ntext1 = \"Hello, how are you?\"\ntext2 = \"Bonjour, comment allez-vous?\"\ntext3 = \"Este es un texto muy largo en español, con muchas palabras y frases para probar la detección de idioma.\"\n\n# Detect language with default settings (lite model)\nresult1 = detect_language(text1)\nprint(f\"'{text1}' detected as: {result1.lang} (confidence: {result1.score:.2f})\")\n\n# Detect language using the 'full' model for potentially higher accuracy\nresult2 = detect_language(text2, model='full')\nprint(f\"'{text2}' detected as: {result2.lang} (confidence: {result2.score:.2f})\")\n\n# Detect language with 'auto' model, which falls back to lite on MemoryError\n# Also request top 2 languages\nresult3 = detect_language(text3, model='auto', k=2)\nprint(f\"'{text3}' detected top 2 as: {result3}\")","lang":"python","description":"This quickstart demonstrates how to use `fast-langdetect` to detect the language of various text inputs. It shows the default 'lite' model, explicit selection of the 'full' model for potentially higher accuracy, and the 'auto' model with its MemoryError fallback behavior."},"warnings":[{"fix":"Migrate to `LangDetectConfig` objects: for example, pass `config=LangDetectConfig(cache_dir='your/path')` to `LangDetector`. Alternatively, if you are not setting `cache_dir` explicitly, set the `FTLANG_CACHE` environment variable before importing and initializing any components.","message":"The configuration system was overhauled in v0.3.0, replacing environment variables (e.g., `FTLANG_CACHE`) with a dedicated `LangDetectConfig` class for explicit management. Existing code relying on environment variables for configuration will break.","severity":"breaking","affected_versions":">=0.3.0"},{"fix":"For longer texts, disable truncation via `LangDetectConfig(max_input_length=None)` if you accept the potential performance or accuracy implications. For very short texts, be aware that accuracy is inherently limited.","message":"Detection accuracy can be reduced for text samples that are significantly shorter or longer than approximately 80 characters. Inputs are truncated to 80 characters by default.","severity":"gotcha","affected_versions":"All"},{"fix":"Explicitly choose `model='lite'` for memory-constrained environments, `model='full'` for the highest accuracy, or `model='auto'` with awareness of its specific fallback condition.","message":"Different models have different memory footprints and accuracy. The 'lite' model is memory-friendly (~45-60 MB RSS) and works offline, while the 'full' model (~170-210 MB RSS) offers higher accuracy but consumes more memory. The `model='auto'` setting only falls back to the 'lite' model if a `MemoryError` occurs.","severity":"gotcha","affected_versions":"All"},{"fix":"Implement standard Python error handling (try/except blocks) for potential I/O or network issues when using the library.","message":"The `model='auto'` fallback mechanism applies to `MemoryError` only. Other issues such as `FileNotFoundError`, `PermissionError`, or network-related errors during model loading raise standard Python exceptions and are neither silently handled nor subject to any fallback.","severity":"gotcha","affected_versions":"All"},{"fix":"Be aware that newlines in input text are replaced with spaces before detection. If exact text formatting is critical for other parts of your pipeline, perform any necessary pre-processing before passing text to `fast-langdetect`.","message":"As of v0.4.0, newline characters in input text are always replaced with spaces internally to prevent errors with the underlying FastText model. This transformation is logged at DEBUG level and otherwise happens silently.","severity":"gotcha","affected_versions":">=0.4.0"},{"fix":"If redistributing or modifying the model files, ensure compliance with the CC BY-SA 3.0 license terms, including its attribution and share-alike conditions.","message":"The pre-trained FastText language identification models bundled or downloaded by `fast-langdetect` are licensed under the Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0) license. This is separate from the MIT license for the `fast-langdetect` code itself.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}