{"id":4768,"library":"spacy-language-detection","title":"spaCy Language Detection","description":"spacy-language-detection is a fully customizable language detection component for spaCy pipelines, designed for spaCy 3.0 and later. It was forked from `spacy-langdetect` to address issues and ensure compatibility with modern spaCy versions. The library enables detection of language at both document and sentence levels. The current version is 0.2.1, with releases typically focused on bug fixes and ongoing spaCy compatibility.","status":"active","version":"0.2.1","language":"en","source_language":"en","source_url":"https://github.com/davebulaval/spacy-language-detection","tags":["spaCy","NLP","language detection","text processing"],"install":[{"cmd":"pip install spacy-language-detection","lang":"bash","label":"Install package"},{"cmd":"python -m spacy download en_core_web_sm","lang":"bash","label":"Download a spaCy model (e.g., English)"}],"dependencies":[{"reason":"Core dependency for NLP pipeline integration; requires spaCy >= 3.0.","package":"spacy","optional":false},{"reason":"Default underlying language detection library; installed automatically.","package":"langdetect","optional":false}],"imports":[{"symbol":"LanguageDetector","correct":"from spacy_language_detection import LanguageDetector"},{"note":"Required for registering the custom pipeline component in spaCy 3.x","symbol":"Language","correct":"from spacy.language import Language"}],"quickstart":{"code":"import spacy\nfrom spacy.language import Language\nfrom spacy_language_detection import LanguageDetector\n\ndef get_lang_detector(nlp, name):\n    return LanguageDetector(seed=42) # Using a seed for reproducibility\n\nnlp_model = spacy.load(\"en_core_web_sm\")\nLanguage.factory(\"language_detector\", func=get_lang_detector)\nnlp_model.add_pipe('language_detector', last=True)\n\ntext = \"This is English text. Er lebt mit seinen Eltern und seiner Schwester in Berlin. Yo me divierto todos los días en el parque.\"\ndoc = nlp_model(text)\n\nprint(f\"Document language: {doc._.language}\")\nfor i, sent in enumerate(doc.sents):\n    print(f\"Sentence {i+1}: {sent} -> {sent._.language}\")","lang":"python","description":"This quickstart demonstrates how to add the `spacy-language-detection` component to a spaCy 3.x pipeline. It registers a custom language detector factory and adds it as the last component in the pipeline. It then processes a multilingual text and prints the detected language for the entire document and each individual sentence. Ensure you have a spaCy model (e.g., `en_core_web_sm`) downloaded before running."},"warnings":[{"fix":"Use `Language.factory('component_name', func=your_factory_function)` to register the component, then `nlp.add_pipe('component_name', ...)` to add it to the pipeline. Refer to the quickstart example.","message":"For spaCy 3.x, adding custom pipeline components requires using `Language.factory` to register a component factory, then `nlp.add_pipe` with the factory name. Direct instantiation like `nlp.add_pipe(LanguageDetector())` (common in spaCy 2.x and older `spacy-langdetect`) will not work.","severity":"breaking","affected_versions":">=0.2.0 (for spacy-language-detection); spaCy >= 3.0"},{"fix":"Initialize `LanguageDetector` with a `seed` parameter, e.g., `LanguageDetector(seed=42)`.","message":"The underlying `langdetect` library (used by default) is non-deterministic without a seed. For reproducible results, pass a `seed` argument to the `LanguageDetector` constructor.","severity":"gotcha","affected_versions":"<0.2.0 (for missing seed arg), all versions (for langdetect non-determinism)"},{"fix":"If token-level detection is required, consider using an older version of `spacy-langdetect` (the predecessor project) or implementing custom token-level logic.","message":"Token-level language detection was removed in version 0.2 of `spacy-language-detection` to simplify the component and focus on Doc and Span level detection.","severity":"breaking","affected_versions":">=0.2.0"},{"fix":"Migrate to `spacy-language-detection` for spaCy 3.x and later compatibility.","message":"This library (`spacy-language-detection`) is a fork of the original `spacy-langdetect` project, created to address compatibility issues with spaCy 3.x and add features like the `seed` argument. The original `spacy-langdetect` is less actively maintained and may not work correctly with newer spaCy versions.","severity":"deprecated","affected_versions":"All versions of `spacy-langdetect` when used with spaCy 3.x"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}