Wordfreq

3.1.1 · maintenance · verified Sun Apr 12

Wordfreq is a Python library that provides high-quality estimates of word frequencies in over 40 languages, drawn from diverse data sources such as books, web text, and social media. The library, currently at version 3.1.1, offers both 'small' and 'large' wordlists to trade memory use against coverage. Packaging updates may continue, but the underlying word frequency data is a snapshot through approximately 2021 and is unlikely to be updated further, owing to concerns about generative AI 'polluting' language usage data. As far as its data is concerned, the project is therefore effectively in maintenance mode.

Warnings

Install

Imports

Quickstart

This quickstart shows how to use the `word_frequency` and `zipf_frequency` functions to look up word frequencies in different languages and on different scales. `word_frequency` returns the frequency as a proportion between 0 and 1, while `zipf_frequency` returns a value on the more human-friendly Zipf scale: the base-10 logarithm of the number of occurrences per billion words.

from wordfreq import word_frequency, zipf_frequency

# Get the raw frequency (between 0 and 1)
freq_en = word_frequency('the', 'en')
print(f"Frequency of 'the' in English: {freq_en}")

freq_fr_cafe = word_frequency('café', 'fr')
print(f"Frequency of 'café' in French: {freq_fr_cafe}")

# Get the Zipf frequency (logarithmic scale, base-10 logarithm of occurrences per billion words)
zipf_en = zipf_frequency('computer', 'en')
print(f"Zipf frequency of 'computer' in English: {zipf_en}")

zipf_nonexistent = zipf_frequency('nonexistentword123', 'en')
print(f"Zipf frequency of 'nonexistentword123' in English: {zipf_nonexistent}")

# Select a specific wordlist: the default is 'best'; 'small' or 'large' can be requested explicitly
zipf_large = zipf_frequency('quantum', 'en', wordlist='large')
print(f"Zipf frequency of 'quantum' (large list) in English: {zipf_large}")
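The two scales above are related by simple arithmetic: a Zipf value is the base-10 logarithm of occurrences per billion words, so a proportion `f` corresponds to `log10(f) + 9`. The sketch below illustrates that conversion in plain Python without importing wordfreq. The function names `freq_to_zipf_sketch` and `zipf_to_freq_sketch` are hypothetical and chosen for illustration; the library's own handling of rounding and of words missing from a wordlist may differ.

```python
import math


def freq_to_zipf_sketch(freq: float) -> float:
    """Convert a proportion (0 < freq <= 1) to the Zipf scale.

    Hypothetical helper for illustration only; not part of wordfreq.
    """
    if freq <= 0:
        raise ValueError("freq must be positive; log10 is undefined at 0")
    # Occurrences per billion words = freq * 1e9, so log10 adds 9.
    return math.log10(freq) + 9


def zipf_to_freq_sketch(zipf: float) -> float:
    """Convert a Zipf value back to a proportion between 0 and 1."""
    return 10 ** (zipf - 9)


# A word used once per million words (a proportion of 1e-6)
# occurs 1,000 times per billion words.
print(freq_to_zipf_sketch(1e-6))

# Going the other way: Zipf 6 means a million occurrences per billion words.
print(zipf_to_freq_sketch(6.0))
```

Under this definition, each step of 1 on the Zipf scale is a tenfold change in frequency, which is why it tends to be easier to read than very small decimals.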
