{"id":8445,"library":"proces","title":"Text Preprocessing Library","description":"proces is a Python library (version 0.1.7) designed for efficient text preprocessing. It offers a flexible `TextCleaner` class with various options to clean, normalize, and prepare raw text data for natural language processing (NLP) tasks, including removing HTML, URLs, mentions, hashtags, numbers, punctuation, and handling case conversion and whitespace. As a 0.x.x release, its API might evolve.","status":"active","version":"0.1.7","language":"en","source_language":"en","source_url":"https://github.com/Ailln/proces","tags":["text processing","nlp","preprocessing","text cleaning"],"install":[{"cmd":"pip install proces","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Optional dependency for advanced text processing features like stopwords removal, requires additional data downloads.","package":"nltk","optional":true}],"imports":[{"note":"While 'import proces' works, importing `TextCleaner` directly is the canonical way to access the primary class for text cleaning.","wrong":"import proces; cleaner = proces.TextCleaner()","symbol":"TextCleaner","correct":"from proces import TextCleaner"}],"quickstart":{"code":"from proces import TextCleaner\n\n# Basic cleaning: lowercase, remove punctuation, strip whitespace\ncleaner = TextCleaner(lower=True, remove_punctuation=True, strip_whitespace=True)\ntext_input = \"  Hello, World! This is a Sample Text with HTML <br> tags. And @mentions, #hashtags, links: http://example.com 123  \"\ncleaned_text = cleaner.clean(text_input)\nprint(f\"Original: {text_input}\")\nprint(f\"Cleaned (basic): {cleaned_text}\")\n\n# Advanced cleaning: remove HTML, URLs, mentions, hashtags, numbers, replace with tokens\nadvanced_cleaner = TextCleaner(\n    lower=True,\n    remove_html=True,\n    remove_urls=True,\n    remove_mentions=True,\n    remove_hashtags=True,\n    remove_numbers=True,\n    remove_punctuation=True,\n    strip_whitespace=True,\n    replace_numbers_with='<NUM>',\n    replace_urls_with='<URL>',\n    replace_mentions_with='<MENTION>',\n    replace_hashtags_with='<HASHTAG>'\n)\ncleaned_advanced_text = advanced_cleaner.clean(text_input)\nprint(f\"Cleaned (advanced): {cleaned_advanced_text}\")","lang":"python","description":"Demonstrates basic and advanced usage of the TextCleaner class to preprocess a sample string, applying various cleaning rules and replacement tokens."},"warnings":[{"fix":"Pin your project's dependency to a specific patch version (e.g., 'proces==0.1.7') or be prepared to adapt code when upgrading to new minor versions.","message":"As a library in early development (version 0.x.x), the API of 'proces' is subject to change without strict backward compatibility guarantees. Future minor versions might introduce breaking changes.","severity":"breaking","affected_versions":"0.1.x"},{"fix":"Always use specific imports like `from proces import TextCleaner` to avoid namespace collisions and clarify intent. Double-check your `pip install` command to ensure you're installing 'proces' from PyPI (text preprocessing) and not a similarly named package.","message":"The generic package name 'proces' can easily be confused with Python's built-in 'multiprocessing' module or other process management libraries. Ensure you are importing the correct 'proces' for text preprocessing.","severity":"gotcha","affected_versions":"All versions"},{"fix":"To enable stopwords removal, you must provide your own list of stopwords (e.g., `cleaner = TextCleaner(remove_stopwords=True, stopwords_list=['a', 'the', 'is'])`) or download them via `nltk` and pass them in.","message":"The `TextCleaner` class allows for removing stopwords, but it does not come with a default set of stopwords. If `remove_stopwords=True` is set without providing a `stopwords_list`, it will have no effect.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure the library is installed using `pip install proces`. If using a virtual environment, ensure it's activated.","cause":"The 'proces' library is not installed in your current Python environment.","error":"ModuleNotFoundError: No module named 'proces'"},{"fix":"Instantiate the `TextCleaner` class first: `from proces import TextCleaner; cleaner = TextCleaner(); cleaned_text = cleaner.clean(my_text)`.","cause":"You are attempting to call a method like `clean()` directly on the imported `proces` module, rather than on an instance of the `TextCleaner` class.","error":"AttributeError: module 'proces' has no attribute 'clean'"},{"fix":"Consult the `proces` library's documentation (e.g., GitHub README) for the available `TextCleaner` initialization parameters and ensure you are only using supported arguments.","cause":"You are attempting to use an unsupported configuration option in the `TextCleaner` constructor. The library's functionality is limited to its documented parameters.","error":"TypeError: TextCleaner.__init__() got an unexpected keyword argument 'remove_emoji'"}]}