{"id":9583,"library":"clean-text","title":"Clean-Text","description":"Clean-Text (PyPI: clean-text) provides functions to preprocess and normalize text for NLP tasks. It can lowercase text, transliterate it to ASCII, normalize whitespace, strip punctuation and emojis, and replace URLs, emails, phone numbers, digits, and currency symbols with placeholder tokens. The current version is 0.7.1; the project is active, but releases are infrequent.","status":"active","version":"0.7.1","language":"en","source_language":"en","source_url":"https://github.com/luizalabs/clean-text","tags":["text-processing","nlp","text-cleaning","normalization","preprocessing"],"install":[{"cmd":"pip install clean-text","lang":"bash","label":"Install core library"},{"cmd":"pip install clean-text[gpl]","lang":"bash","label":"Install with the optional GPL-licensed unidecode dependency"}],"dependencies":[{"reason":"Used for emoji removal (`no_emoji=True`). Installed together with the core library.","package":"emoji","optional":false},{"reason":"Optional GPL-licensed package used for ASCII transliteration (`to_ascii=True`). Without it, the library falls back to Python's `unicodedata`, which gives worse results. Install with `pip install clean-text[gpl]`.","package":"unidecode","optional":true}],"imports":[{"symbol":"clean","correct":"from cleantext import clean"}],"quickstart":{"code":"from cleantext import clean\n\ntext = \"  Hello World! 👋  Check out my site: https://example.com This is a test. 123 😊 \"\n# Defaults: lowercase, fix Unicode errors, transliterate to ASCII, normalize whitespace.\ncleaned_text = clean(text)\nprint(f\"Original: '{text}'\")\nprint(f\"Cleaned: '{cleaned_text}'\")\n\n# Opt in to stricter cleaning with keyword arguments:\ncleaned_strict = clean(text, no_urls=True, no_digits=True, no_punct=True, no_emoji=True)\nprint(f\"Strict: '{cleaned_strict}'\")\n\n# Disable defaults to keep capitalization and non-ASCII characters such as emojis:\ntext_custom = \"  Hello World! 👋  This is a test. :) \"\ncleaned_custom = clean(text_custom, lower=False, to_ascii=False)\nprint(f\"Custom Cleaned: '{cleaned_custom}'\")","lang":"python","description":"Demonstrates the basic usage of the `clean` function. By default, it lowercases text, fixes Unicode errors, transliterates to ASCII, and normalizes whitespace; handling of URLs, digits, punctuation, and emojis is opt-in via keyword arguments. 
The example also shows how to customize cleaning behavior through keyword arguments."},"warnings":[{"fix":"Review the `clean` function's parameters and set them explicitly: for example, `lower=False` and `to_ascii=False` to preserve case and non-ASCII characters, or `no_urls=True`, `no_digits=True`, `no_punct=True`, and `no_emoji=True` to enable additional cleaning.","message":"By default, `clean` lowercases text and transliterates it to ASCII, which can silently alter or drop accented characters, emojis, and other non-ASCII text. Handling of URLs, digits, punctuation, and emojis stays off unless explicitly enabled.","severity":"gotcha","affected_versions":">=0.1.0"},{"fix":"Install the optional dependency with `pip install clean-text[gpl]`.","message":"ASCII transliteration (`to_ascii=True`) works best with the GPL-licensed `unidecode` package, which is not installed by default. Without it, the library falls back to Python's `unicodedata` and logs a warning that results may be worse.","severity":"gotcha","affected_versions":">=0.1.0"},{"fix":"Pass `lang='de'` when cleaning German text. For other languages, check that the default handling gives acceptable results before relying on it.","message":"The `clean` function's `lang` parameter defaults to 'en' and only switches on language-specific handling, such as `lang='de'` for German special characters. It does not remove stop words.","severity":"gotcha","affected_versions":">=0.6.0"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Ensure `clean-text` is installed in your active Python environment by running `pip install clean-text`. 
You can verify installation with `pip show clean-text`. Note that the package is imported as `cleantext`, not `clean_text`: use `from cleantext import clean`.","cause":"The import name is wrong (the `clean-text` distribution provides the module `cleantext`, not `clean_text`), or the package is not installed in the currently active Python environment.","error":"ModuleNotFoundError: No module named 'clean_text'"},{"fix":"Install the optional dependency with `pip install clean-text[gpl]`, or ignore the warning if the `unicodedata` fallback is acceptable for your data.","cause":"The optional `unidecode` package is not installed, so ASCII transliteration (`to_ascii=True`) falls back to Python's built-in `unicodedata`.","error":"Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results."},{"fix":"Review and customize the `clean` function's parameters. For example, use `clean(text, lower=False, to_ascii=False)` to retain capitalization and non-ASCII characters such as emojis, and leave `no_punct` and `no_emoji` at `False` to keep punctuation and emojis.","cause":"The defaults lowercase text and transliterate it to ASCII, and options such as `no_punct=True` or `no_emoji=True` remove punctuation and emojis when enabled.","error":"My text is over-cleaned! All my punctuation/emojis/capitalization is gone!"}]}