{"id":5343,"library":"nlpaug","title":"NLPAug: Natural Language Processing Augmentation Library","description":"NLPAug is a Python library for data augmentation in natural language processing. It generates synthetic textual data to help deep learning models become more robust and less prone to overfitting on small datasets. The library supports various augmentation techniques across character, word, and sentence levels. Currently at version 1.1.11, it maintains an active release cadence with several minor updates throughout the year.","status":"active","version":"1.1.11","language":"en","source_language":"en","source_url":"https://github.com/makcedward/nlpaug","tags":["NLP","data augmentation","text processing","deep learning","machine learning","text generation"],"install":[{"cmd":"pip install nlpaug","lang":"bash","label":"Basic Install"},{"cmd":"pip install nlpaug[transformers] nlpaug[nltk] nlpaug[gensim] nlpaug[audio]","lang":"bash","label":"Install with all optional dependencies"}],"dependencies":[{"reason":"Core dependency for numerical operations; installed with the base package.","package":"numpy","optional":false},{"reason":"Core dependency for network requests; installed with the base package.","package":"requests","optional":false},{"reason":"Required for transformer-based augmenters like ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug, and AbstSummAug.","package":"torch","optional":true},{"reason":"Required for transformer-based augmenters like ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug, and AbstSummAug.","package":"transformers","optional":true},{"reason":"Required for some transformer models used in contextual augmenters.","package":"sentencepiece","optional":true},{"reason":"Required for LambadaAug.","package":"simpletransformers","optional":true},{"reason":"Required for AntonymAug and SynonymAug (uses WordNet).","package":"nltk","optional":true},{"reason":"Required for WordEmbsAug (word2vec, GloVe, 
fastText models).","package":"gensim","optional":true},{"reason":"Required for audio augmenters like PitchAug, SpeedAug, and VtlpAug.","package":"librosa","optional":true},{"reason":"Required for audio augmenters (specifically for visualizations or certain audio processing, if used with librosa).","package":"matplotlib","optional":true}],"imports":[{"symbol":"KeyboardAug","correct":"from nlpaug.augmenter.char import KeyboardAug"},{"symbol":"SynonymAug","correct":"from nlpaug.augmenter.word import SynonymAug"},{"symbol":"ContextualWordEmbsAug","correct":"from nlpaug.augmenter.word import ContextualWordEmbsAug"},{"symbol":"RandomSentAug","correct":"from nlpaug.augmenter.sentence import RandomSentAug"},{"symbol":"Sequential","correct":"from nlpaug.flow import Sequential"},{"symbol":"DownloadUtil","correct":"from nlpaug.util.file.download import DownloadUtil"}],"quickstart":{"code":"import nlpaug.augmenter.char as nac\n\ntext = \"The quick brown fox jumps over the lazy dog.\"\n\n# Initialize a Keyboard Augmenter\n# Simulates typos based on keyboard proximity\naug = nac.KeyboardAug(aug_char_p=0.1, aug_word_p=0.1, aug_char_min=1)\n\n# Augment the text\naugmented_text = aug.augment(text)\n\nprint(f\"Original: {text}\")\nprint(f\"Augmented: {augmented_text[0]}\")","lang":"python","description":"This quickstart demonstrates character-level augmentation using KeyboardAug. It initializes an augmenter to simulate typos by replacing characters with nearby keys on the keyboard. The `augment` method returns a list of augmented texts, even if `n=1` (default)."},"warnings":[{"fix":"Use `nlpaug.util.file.download.DownloadUtil.download_xxx()` for models (e.g., word2vec, GloVe) and `nltk.download('wordnet')`, `nltk.download('omw-1.4')` for NLTK data before initializing the respective augmenters.","message":"Many augmenters require external model or data downloads (e.g., NLTK data for SynonymAug, pre-trained word embeddings for WordEmbsAug, or transformer models). 
These are not automatically installed with the base package.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always treat the output of `augment()` as a list. If only one augmented output is expected, index the result, e.g. `augmented_text[0]`.","message":"Since version 0.0.9, `augment()` returns a list of strings instead of a single string when `n > 1` (the default is `n=1`). Code that expects a bare string may break.","severity":"breaking","affected_versions":"<0.0.9"},{"fix":"Consult the official documentation for the latest API and use the recommended replacement classes and parameters.","message":"Several augmenter classes and parameters have been deprecated or replaced. For example, `WordNetAug` was replaced by `SynonymAug`, `QwertyAug` by `KeyboardAug`, and the `aug_n` parameter by `top_k`.","severity":"deprecated","affected_versions":">=0.0.7 (QwertyAug, StopWordsAug), >=0.0.9 (WordNetAug, aug_n parameter)"},{"fix":"Ensure `nlpaug` and `transformers` are updated to their latest versions for performance optimizations. Consider reducing `n` (number of augmented samples) or using batching for large datasets if performance is critical.","message":"Transformer-based augmenters (e.g., `ContextualWordEmbsAug`, `ContextualWordEmbsForSentenceAug`) can be slow, especially with older `transformers` library versions or large `n` values.","severity":"gotcha","affected_versions":"All 1.x versions (performance improvements rolled out in 1.1.9, 1.1.10)"},{"fix":"Pin `transformers` and `torch` versions as recommended in `nlpaug`'s GitHub README or installation guides. Updating `nlpaug` to the latest version often includes compatibility fixes.","message":"Compatibility issues with underlying libraries, particularly `transformers` and `torch`, have been observed. 
Specific versions may be required for certain augmenters to function correctly.","severity":"gotcha","affected_versions":"All 1.x versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}