lexical-diversity

0.1.1 verified Mon Apr 27 auth: no python

A Python library for calculating lexical diversity metrics such as TTR, HDD, MTLD, and vocd-D. Version 0.1.1 is current, with an irregular release cadence.

pip install lexical-diversity
error TypeError: expected string or bytes-like object
cause Passing a list of tokens instead of a single string.
fix
Use ' '.join(tokens) to convert the list to a string.
error ValueError: input must be a string
cause Non-string input (e.g., integer or None).
fix
Pass an actual text string. Wrapping the value in str() silences the error, but str(None) yields the literal text "None", so validate the input first.
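The two fixes above can be combined into one small input guard. This is a minimal sketch using only the standard library; `as_text` is a hypothetical helper name and is not part of lexical-diversity.

```python
def as_text(value):
    """Return value as a single string, or raise TypeError for unusable input."""
    if isinstance(value, str):
        return value
    if isinstance(value, (list, tuple)):
        # Every item must already be a string before joining.
        if not all(isinstance(tok, str) for tok in value):
            raise TypeError("all tokens must be strings")
        return " ".join(value)
    # None, integers, and other types are rejected instead of being passed to str().
    raise TypeError(f"expected str or a sequence of str, got {type(value).__name__}")


tokens = ["the", "cat", "sat"]
print(as_text(tokens))  # the cat sat
```

Raising on None or numeric input, rather than coercing it, keeps bad data from being scored as if it were text.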
deprecated Some functions, such as 'mtld', may need preprocessed text. Punctuation removal is not handled automatically.
fix Strip punctuation and lowercase the text before passing it to 'mtld'.
gotcha Input must be a single string, not a list of tokens. Passing a list is a common cause of the TypeError.
fix Ensure input is a string: ' '.join(tokens) if needed.
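The punctuation and lowercasing step mentioned above could look like the following. This is a sketch under stated assumptions: `preprocess` is a hypothetical helper, only ASCII punctuation from `string.punctuation` is removed, and the right cleaning rules depend on your corpus.

```python
import string


def preprocess(text):
    """Lowercase text and delete ASCII punctuation characters."""
    # Translation table that maps every punctuation character to nothing.
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table)


cleaned = preprocess("The cat sat. The CAT, again!")
print(cleaned)  # the cat sat the cat again
```

Deleting apostrophes merges contractions ("don't" becomes "dont"), and non-ASCII punctuation such as curly quotes is left untouched, so adjust the table if that matters for your data.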

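Calculate the type-token ratio (TTR) of a lemmatized text.

from lexical_diversity import lex_div as ld

text = "the cat sat on the mat another cat"
# flemmatize() takes the raw string and returns a list of lemmatized tokens
tokens = ld.flemmatize(text)
# ttr() takes the token list: unique tokens divided by total tokens
ttr = ld.ttr(tokens)
print(ttr)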
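To sanity-check the value printed above, TTR can be computed by hand as the number of unique tokens divided by the total number of tokens. The sketch below uses plain whitespace splitting and no lemmatization, so it is an independent illustration, not the library's implementation. `type_token_ratio` is a hypothetical name.

```python
def type_token_ratio(text):
    """Unique tokens divided by total tokens, using whitespace tokenization."""
    tokens = text.split()
    if not tokens:
        raise ValueError("text contains no tokens")
    return len(set(tokens)) / len(tokens)


text = "the cat sat on the mat another cat"
# 6 unique tokens (the, cat, sat, on, mat, another) out of 8 total
print(type_token_ratio(text))  # 0.75
```

A library result may differ from this figure when lemmatization merges or splits word forms.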