Polyglot


Polyglot is a natural-language processing pipeline aimed at large multilingual applications. Version 16.7.4 (latest) provides tokenization, language detection, named entity recognition, sentiment analysis, and word embeddings for more than 130 languages. Release cadence is irregular; the last update was in April 2021.
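A hedged sketch of the API surface behind the capabilities listed above (attribute names per the polyglot docs; the snippet is guarded so it degrades cleanly when the library or its models are missing):

```python
# The main features are exposed as lazy attributes on polyglot.text.Text.
CAPABILITIES = ["words", "sentences", "language", "entities"]

try:
    from polyglot.text import Text
except ImportError:
    Text = None  # polyglot not installed; see the install notes below

if Text is not None:
    blob = Text("Polyglot supports many languages.")
    for attr in CAPABILITIES:
        try:
            print(attr, "->", getattr(blob, attr))
        except Exception as exc:  # missing models raise LookupError
            print(attr, "-> needs downloaded models:", exc)
```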

pip install polyglot
error ImportError: No module named 'polyglot'
cause Polyglot not installed.
fix
pip install polyglot
error ModuleNotFoundError: No module named 'icu'
cause PyICU not installed or missing system ICU libraries.
fix
Install the system ICU development package (e.g. libicu-dev on Debian/Ubuntu), then run: pip install pyicu
error LookupError: Resource ... not found. Please use the NLTK Downloader to obtain the resource
cause Polyglot uses an NLTK-style resource system; the required models have not been downloaded.
fix
Run: polyglot download <package> (e.g. polyglot download embeddings2.en)
breaking Polyglot requires ICU (pyicu) and CLD2 (pycld2) to be installed. Without them, imports fail or produce cryptic errors.
fix Install the system package libicu-dev, then: pip install pyicu pycld2 (pycld2 builds against its own bundled CLD2 sources).
gotcha Text object's .words, .sentences, and .entities are lazy and raise LookupError if the required models are missing. Run the polyglot downloader to fetch them first.
fix Run: polyglot download embeddings2.en ner2.en
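Models can also be fetched programmatically via the downloader module. A minimal sketch, assuming the "<task>2.<lang>" package-naming convention the downloader uses (verify the exact names with polyglot download for your version); the model_packages helper is hypothetical, added here for illustration:

```python
# `model_packages` is a hypothetical helper for this example, not polyglot API.
def model_packages(lang, tasks=("embeddings2", "ner2")):
    """Build downloader package names for a language code, e.g. 'ner2.en'."""
    return [f"{task}.{lang}" for task in tasks]

try:
    from polyglot.downloader import downloader

    for package in model_packages("en"):
        downloader.download(package)  # safe to re-run; skips if up to date
except ImportError:
    pass  # polyglot not installed; see the install notes above
```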
deprecated Sentiment analysis using the Text.sentiment attribute is deprecated in favor of using the explicit polyglot.sentiment module.
fix Use polyglot.sentiment.SentimentAnalyzer instead.

Basic usage: detect language of a string.

from polyglot.text import Text

text = Text("Hello, world!")
print(text.language)       # detected Language object
print(text.language.code)  # ISO 639 code, e.g. "en"
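For detection without the Text wrapper, polyglot also exposes a Detector class whose result carries a code and a confidence score. A hedged sketch (the describe helper is hypothetical, added for illustration):

```python
# `describe` is a hypothetical helper for this example, not polyglot API.
def describe(language):
    """Format an object exposing .code and .confidence, as Detector returns."""
    return f"{language.code} ({language.confidence:.0f}%)"

try:
    from polyglot.detect import Detector

    detector = Detector("Bonjour tout le monde")
    print(describe(detector.language))  # best guess, e.g. French
except ImportError:
    print("polyglot not installed; see the install notes above")
```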