{"id":9202,"library":"py3langid","title":"py3langid","description":"py3langid is an actively maintained fork of the original `langid.py` library for fast and accurate language identification. It targets Python 3, with a modernized codebase and faster execution. The current version is 0.3.0.","status":"active","version":"0.3.0","language":"en","source_language":"en","source_url":"https://github.com/adbar/py3langid","tags":["language identification","NLP","natural language processing","text processing"],"install":[{"cmd":"pip install py3langid","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Required for numerical operations and efficient feature vector handling. The required NumPy version depends on the Python version.","package":"numpy","optional":false}],"imports":[{"note":"The primary function for language identification. The module is often imported as `langid` for compatibility with the original library.","symbol":"classify","correct":"import py3langid as langid\nlangid.classify(text)"},{"note":"Used for advanced scenarios like setting a subset of target languages or enabling probability normalization.","symbol":"LanguageIdentifier","correct":"from py3langid.langid import LanguageIdentifier, MODEL_FILE"}],"quickstart":{"code":"import py3langid as langid\n\n# classify() returns (language, score); by default the score is an unnormalized log-probability\ntext_en = 'This text is in English.'\nlang, prob = langid.classify(text_en)\nprint(f\"Text: '{text_en}' -> Language: {lang}, Score: {prob}\")\n\ntext_de = 'Dieser Text ist auf Deutsch.'\nlang, prob = langid.classify(text_de)\nprint(f\"Text: '{text_de}' -> Language: {lang}, Score: {prob}\")\n\n# Example with probability normalization\nfrom py3langid.langid import LanguageIdentifier, MODEL_FILE\nidentifier = LanguageIdentifier.from_pickled_model(MODEL_FILE, norm_probs=True)\ntext_norm = 'This should be enough text.'\nlang_norm, prob_norm = 
identifier.classify(text_norm)\nprint(f\"Text (normalized): '{text_norm}' -> Language: {lang_norm}, Normalized Probability: {prob_norm}\")","lang":"python","description":"This quickstart demonstrates basic language classification using `py3langid.classify()`. It also shows how to use `LanguageIdentifier` for more control, such as enabling probability normalization to get scores between 0 and 1."},"warnings":[{"fix":"Upgrade Python to 3.8 or newer, or pin `py3langid<0.3.0` in your project dependencies.","message":"Support for Python 3.6 and 3.7 was dropped in py3langid v0.3.0. Users on these older Python versions need to upgrade their Python interpreter or stay on py3langid v0.2.x.","severity":"breaking","affected_versions":">=0.3.0"},{"fix":"If results are inconsistent or `uint32` typing is specifically required, pass `datatype='uint32'` to `classify()` (e.g., `langid.classify(text, datatype='uint32')`).","message":"The default NumPy data type for feature vectors changed from `uint32` to `uint16` in v0.2.0 to improve speed. The change is usually transparent, but it can affect applications that are sensitive to exact data types or that compare results with older versions.","severity":"breaking","affected_versions":">=0.2.0"},{"fix":"For training new models, refer to the original `langid.py` project's documentation and Python 2 environment setup. `py3langid` primarily focuses on the classification aspect in Python 3.","message":"The original `langid.py` (and by extension `py3langid`) training scripts remain Python 2-only. 
Users expecting to retrain models with custom data using the provided tools might encounter compatibility issues with Python 3.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Install the package with `pip install py3langid` in your active Python environment.","cause":"The `py3langid` package is not installed, or the Python environment where it is installed is not active.","error":"ModuleNotFoundError: No module named 'py3langid'"},{"fix":"Review the confidence score returned by `classify()`. For ambiguous cases, use `langid.rank(text)` to see the scores for all candidate languages. If you need scores in the 0-1 range, create the `LanguageIdentifier` with `norm_probs=True`. For highly specific use cases a custom-trained model might be necessary, but note that the training scripts are Python 2-only.","cause":"Language identification models are trained on specific datasets and may perform poorly on text with unusual characteristics, code-switching, or languages underrepresented in the training data (e.g., Romanized Indian languages).","error":"Incorrect language detection for specific texts or less common languages."},{"fix":"When initializing `LanguageIdentifier`, pass `norm_probs=True` to enable normalization: `identifier = LanguageIdentifier.from_pickled_model(MODEL_FILE, norm_probs=True)`. The module-level `langid.classify()` does not normalize scores to the 0-1 range by default.","cause":"By default, `py3langid` returns unnormalized log-probabilities for speed. These are not scaled to a 0-1 range unless explicitly requested.","error":"Classifier returns large negative numbers or unexpected probability values instead of 0-1 range."}]}