pycld2 Language Detection

0.42 · active · verified Sat Apr 11

pycld2 provides Python bindings to Compact Language Detector 2 (CLD2), the language detection library from Google's Chromium project. It supports detection of over 165 languages and aims to consolidate the C++ library and its bindings into a single installable Python package. Version 0.42 was released in March 2025; releases follow no regular schedule.

Warnings

Install

Imports

Quickstart

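This quickstart shows how to use `pycld2.detect()` for basic language identification and for obtaining segment-level language vectors from mixed-language text. The `detect` function returns a tuple `(isReliable, textBytesFound, details)`: a reliability flag, the number of text bytes examined, and `details`, which holds up to three detected languages, each as `(languageName, languageCode, percent, score)`. When called with `returnVectors=True`, the tuple gains a fourth element containing the segment-level language vectors.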

import pycld2 as cld2

# Example 1: Basic detection
text_russian = "а неправильный формат идентификатора дн назад"
isReliable, textBytesFound, details = cld2.detect(text_russian)

print(f"Text: '{text_russian}'")
print(f"Is reliable: {isReliable}")
# details[0] is the top-ranked language: index 0 is its name, index 1 its code
print(f"Detected language: {details[0][0]} ({details[0][1]})")
print(f"Details: {details}")

print('\n---\n')

# Example 2: Detecting multiple languages and getting vectors
text_mixed = """France is the largest country in Western Europe. A accès aux chiens et aux frontaux qui lui ont été il peut consulter. The quick brown fox jumped over the lazy dog."""
isReliable, textBytesFound, details, vectors = cld2.detect(
    text_mixed,
    returnVectors=True
)

print(f"Text: '{text_mixed}'")
print(f"Is reliable: {isReliable}")
# The summary is still the top-ranked language for the text as a whole
print(f"Detected language (summary): {details[0][0]} ({details[0][1]})")
print(f"Segment language vectors: {vectors}")
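
The vectors printed above are easier to read once they are mapped back onto the text. The following is a minimal sketch, assuming each vector has the shape `(bytesOffset, bytesLength, languageName, languageCode)` and that the offsets count bytes of the UTF-8 encoded input rather than characters. The helper name `split_by_vectors` and the sample vectors are hypothetical and written by hand for illustration; they are not real pycld2 output.

```python
def split_by_vectors(text, vectors):
    """Return (language_code, segment_text) pairs, one per vector.

    Assumes each vector is (bytesOffset, bytesLength, languageName,
    languageCode), with offsets measured in UTF-8 bytes.
    """
    raw = text.encode("utf-8")  # slice on bytes, not on characters
    segments = []
    for offset, length, _name, code in vectors:
        chunk = raw[offset:offset + length]
        # errors="replace" guards against a slice that splits a multi-byte character
        segments.append((code, chunk.decode("utf-8", errors="replace")))
    return segments


# Hand-written sample input (hypothetical, not produced by pycld2)
sample_text = "Hello world. Bonjour à tous."
sample_vectors = (
    (0, 13, "ENGLISH", "en"),
    (13, 16, "FRENCH", "fr"),
)

for code, segment in split_by_vectors(sample_text, sample_vectors):
    print(code, repr(segment))
```

If the assumption about the vector shape holds, the `vectors` value returned by `cld2.detect(text_mixed, returnVectors=True)` can be passed in place of `sample_vectors`, together with `text_mixed`.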
