Clean-Text

0.7.1 · active · verified Fri Apr 17

Clean-Text (PyPI: clean-text) provides functions to preprocess and normalize text for NLP tasks. Its options cover lowercasing, whitespace normalization, and removing or replacing emojis, URLs, digits, and punctuation. The current version is 0.7.1; the project is active, though releases are infrequent.

Common errors

Warnings

Install

Imports

Quickstart

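Demonstrates basic usage of the `clean` function. With its default arguments, `clean` repairs broken Unicode, transliterates to ASCII (which also drops emojis), lowercases, and normalizes whitespace. Handling of URLs, digits, punctuation, and emojis is opt-in through flags such as `no_urls`, `no_digits`, `no_punct`, and `no_emoji`. The example also shows how to customize cleaning behavior by disabling a default transformation.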

# The PyPI package is `clean-text`, but the importable module is `cleantext`.
from cleantext import clean

text = "  Hello World! 👋  Check out my site: https://example.com This is a test. 123 😊 "
cleaned_text = clean(text)
print(f"Original: '{text}'")
print(f"Cleaned: '{cleaned_text}'")

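# To customize cleaning, for example to keep emojis, turn off ASCII transliteration.
# no_emoji and no_punct are passed explicitly for clarity; False is assumed to be their default.
text_custom = "  Hello World! 👋  This is a test. :) "
cleaned_custom = clean(text_custom, to_ascii=False, no_emoji=False, no_punct=False)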
print(f"Custom Cleaned: '{cleaned_custom}'")
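
To make the individual transformations concrete, the following is a minimal standard-library sketch of the same kind of normalization. It is not clean-text's implementation: `simple_clean`, its regular expressions, and the `<URL>` and `0` placeholders are illustrative assumptions, and the parameter names were chosen only to echo the library's flag style.

```python
import re

# Hypothetical patterns for illustration; the real library uses more thorough ones.
URL_RE = re.compile(r"https?://\S+")
DIGIT_RE = re.compile(r"\d")
WHITESPACE_RE = re.compile(r"\s+")


def simple_clean(text, lower=True, no_urls=False, no_digits=False):
    """Illustrative stand-in for a text-normalization pipeline."""
    if lower:
        text = text.lower()
    if no_urls:
        # Replace each URL with a placeholder token instead of deleting it.
        text = URL_RE.sub("<URL>", text)
    if no_digits:
        # Mask every digit so numbers keep their length but lose their value.
        text = DIGIT_RE.sub("0", text)
    # Collapse runs of whitespace and trim both ends.
    return WHITESPACE_RE.sub(" ", text).strip()


sample = "  Hello World!  Check out my site: https://example.com This is a test. 123 "
print(simple_clean(sample))
print(simple_clean(sample, no_urls=True, no_digits=True))
```

Lowercasing runs before URL replacement so the placeholder token keeps its capitalization; with the opposite order, `<URL>` would be lowercased along with the rest of the text.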
