Textstat
Textstat is a Python library (version 0.7.13) that computes statistics from text. It reports readability, complexity, and grade level through metrics such as Flesch Reading Ease, Gunning Fog, and the SMOG Index. The library is actively maintained, with regular patch releases for bug fixes and minor improvements.
Warnings
- breaking: The handling of the CMU Pronouncing Dictionary (`cmudict`) for syllable counting in 'en_US' changed several times between versions 0.7.5 and 0.7.11. Version 0.7.13 defaults to `nltk.corpus.cmudict`, whereas prior versions used `python-cmudict`. Syllable counting can therefore fail if `nltk` is not installed or the `cmudict` data has not been downloaded.
- gotcha: The `1.0.0-alpha` pre-releases introduce a completely new object-oriented API (e.g., `from textstat import Text`). This API is not compatible with the stable 0.7.x series, which uses direct functional imports (e.g., `from textstat import flesch_reading_ease`). Using the alpha API with a stable installation results in `ImportError` or `AttributeError`.
- gotcha: As of version 0.7.13, the grade levels returned by `text_standard` are clamped to sensible bounds. The same text may therefore produce different (potentially less extreme) grade levels than it did in earlier versions, which did not clamp.
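The `cmudict` warning above can be checked before any syllable-based metric is called. The following is a minimal sketch, assuming the NLTK data lives under the usual `corpora/cmudict` resource name; `cmudict_available` is a hypothetical helper name, not part of textstat.

```python
def cmudict_available(download_if_missing: bool = False) -> bool:
    """Report whether NLTK and its cmudict corpus can be found.

    Hypothetical helper for illustration; textstat does not provide it.
    """
    try:
        import nltk  # imported lazily so the check works without nltk installed
    except ImportError:
        return False  # nltk itself is missing

    try:
        # Raises LookupError when the corpus is not in any NLTK data directory.
        nltk.data.find("corpora/cmudict")
        return True
    except LookupError:
        if download_if_missing:
            # nltk.download returns True on success, False otherwise.
            return bool(nltk.download("cmudict", quiet=True))
        return False


if __name__ == "__main__":
    print("cmudict ready:", cmudict_available())
```

Passing `download_if_missing=True` attempts the download, which needs network access; the default only inspects what is already on disk.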
Install
pip install textstat
Imports
- flesch_reading_ease
from textstat import flesch_reading_ease
- text_standard
from textstat import text_standard
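Because the functional imports above only apply to the stable 0.7.x series, it can help to check which series is installed before importing. This is a rough sketch that assumes the major version number alone separates the two APIs (0 for functional, anything higher for the object-oriented pre-releases); `api_series` and `installed_textstat_series` are hypothetical names.

```python
from importlib.metadata import PackageNotFoundError, version


def api_series(version_string: str) -> str:
    """Classify a textstat version string by its major version number.

    Assumption: 0.x releases expose the functional API, later ones the
    object-oriented API described for the 1.0.0 alpha pre-releases.
    """
    major = version_string.split(".", 1)[0]
    return "functional" if major == "0" else "object-oriented"


def installed_textstat_series() -> str:
    """Return the API series of the installed textstat, or 'not installed'."""
    try:
        return api_series(version("textstat"))
    except PackageNotFoundError:
        return "not installed"


if __name__ == "__main__":
    print("textstat API series:", installed_textstat_series())
```

Reading the version from package metadata avoids importing textstat at all, so the check itself cannot trigger the `ImportError` or `AttributeError` it is meant to prevent.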
Quickstart
import textstat
text = (
"Playing games has always been thought to be important to "
"the development of well-balanced and creative children; "
"however, what part, if any, they should play in the lives "
"of adults has never been researched that deeply. I believe "
"that playing games is every bit as important for adults "
"as for children. Not only is taking time out to play games "
"with our children and other adults valuable to building "
"interpersonal relationships but is also a wonderful way "
"to release built up tension."
)
# Calculate Flesch Reading Ease score
flesch_score = textstat.flesch_reading_ease(text)
print(f"Flesch Reading Ease: {flesch_score}")
# Get the overall readability grade level
grade_level = textstat.text_standard(text)
print(f"Readability Grade Level: {grade_level}")
# Syllable count. For en_US this may need the NLTK cmudict data;
# if that data has not been downloaded, run the following once first:
#   import nltk
#   nltk.download('cmudict')
syllable_count = textstat.syllable_count(text)
print(f"Syllable Count: {syllable_count}")
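To interpret the Flesch Reading Ease number printed by the quickstart, it helps to see how the score depends on word, sentence, and syllable counts. The sketch below applies the standard published Flesch formula to counts supplied by the caller. It is an illustration only: the function name is hypothetical, and textstat's own tokenization, syllable counting, and rounding may give a different result for the same text.

```python
def flesch_reading_ease_from_counts(words: int, sentences: int, syllables: int) -> float:
    """Apply the standard Flesch Reading Ease formula to raw counts.

    Hypothetical helper for illustration; not textstat's implementation.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    # Longer sentences and longer words both lower the score.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    # 100 words in 5 sentences with 150 syllables in total.
    score = flesch_reading_ease_from_counts(words=100, sentences=5, syllables=150)
    print(f"{score:.1f}")  # prints 59.6
```

Higher scores indicate easier text, which is why a different syllable count for the same passage (for example after a change in the `cmudict` backend) can shift the reported score.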