py3langid

0.3.0 · active · verified Thu Apr 16

py3langid is an actively maintained fork of the original `langid.py` library for fast, accurate language identification. It targets Python 3, with a modernized codebase and faster execution than the original. The current version is 0.3.0.

Quickstart

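This quickstart demonstrates basic language classification with `py3langid.classify()`, which returns a language code and a score. By default the score is an unnormalized log-probability: it is enough to pick the most likely language, but it is not on a 0-to-1 scale. For more control, the example then creates a `LanguageIdentifier` with probability normalization enabled (`norm_probs=True`), which rescales the score to a probability between 0 and 1.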

import py3langid as langid

# classify() returns a (language_code, score) tuple; by default the score is an
# unnormalized log-probability, so it is usually a negative number
text_en = 'This text is in English.'
lang, prob = langid.classify(text_en)
print(f"Text: '{text_en}' -> Language: {lang}, Score: {prob}")

text_de = 'Dieser Text ist auf Deutsch.'
lang, prob = langid.classify(text_de)
print(f"Text: '{text_de}' -> Language: {lang}, Score: {prob}")

# Example with probability normalization
from py3langid.langid import LanguageIdentifier, MODEL_FILE
identifier = LanguageIdentifier.from_pickled_model(MODEL_FILE, norm_probs=True)
text_norm = 'This should be enough text.'
lang_norm, prob_norm = identifier.classify(text_norm)
print(f"Text (normalized): '{text_norm}' -> Language: {lang_norm}, Normalized Probability: {prob_norm}")
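
To illustrate what normalization does, the sketch below turns a set of per-language log scores into probabilities that sum to 1 using a softmax with the usual max-subtraction trick to avoid underflow. It is a minimal, self-contained sketch of the general idea, not py3langid's own code; the function name `normalize_log_scores` and the score values are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict


def normalize_log_scores(log_scores: Dict[str, float]) -> Dict[str, float]:
    """Convert unnormalized log-probabilities into probabilities summing to 1.

    This is a softmax. Subtracting the maximum score before exponentiating
    keeps math.exp() from underflowing to 0.0 when every log score is a
    large negative number (as language-ID log scores typically are).
    """
    max_score = max(log_scores.values())
    exps = {lang: math.exp(score - max_score) for lang, score in log_scores.items()}
    total = sum(exps.values())
    return {lang: e / total for lang, e in exps.items()}


# Hypothetical log scores for three candidate languages (not real model output)
scores = {'en': -56.8, 'de': -62.3, 'fr': -70.1}
probs = normalize_log_scores(scores)
best = max(probs, key=probs.get)
print(best, round(probs[best], 4))
```

Because softmax is monotonic, normalizing changes the scale of the scores but not which language ranks first; it only makes the top score readable as a confidence between 0 and 1.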