NLPAug: Natural Language Processing Augmentation Library
NLPAug is a Python library designed for natural language processing data augmentation. It helps improve deep learning model performance by generating synthetic textual data, making models more robust and less prone to overfitting on small datasets. The library supports various augmentation techniques across character, word, and sentence levels. Currently at version 1.1.11, it maintains an active release cadence with several minor updates throughout the year.
Warnings
- gotcha Many augmenters require external model or data downloads (e.g., NLTK data for SynonymAug, pre-trained word embeddings for WordEmbsAug, or transformer models). These are not automatically installed with the base package.
- breaking The return type of `augment()` changed in version 0.0.9: it returns a list of strings rather than a single string when `n > 1` (the default is `n=1`). Code that expects a plain string may break, so check the return type before indexing or concatenating the result.
- deprecated Several augmenter classes and parameters have been deprecated or replaced. For example, `WordNetAug` was replaced by `SynonymAug`, `QwertyAug` by `KeyboardAug`, and the `aug_n` parameter by `top_k`.
- gotcha Performance of transformer-based augmenters (e.g., `ContextualWordEmbsAug`, `ContextualWordEmbsForSentenceAug`) can be slower than expected, especially with older `transformers` library versions or large `n` values.
- gotcha Compatibility issues with underlying libraries, particularly `transformers` and `torch`, have been observed. Specific versions may be required for certain augmenters to function correctly.
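The return-type change noted in the warnings can be absorbed with a small normalizing helper. The sketch below is not part of NLPAug: `as_list` is a hypothetical name, and it assumes `augment()` yields a string, a list of strings, or `None`.

```python
from typing import List, Union


def as_list(result: Union[str, List[str], None]) -> List[str]:
    """Normalize an augment() return value to a list of strings."""
    if result is None:
        return []
    if isinstance(result, str):
        # Single-string return style: wrap it so callers can always iterate.
        return [result]
    # List (or other iterable) return style: copy into a plain list.
    return list(result)
```

Wrapping each call as `as_list(aug.augment(text))` lets downstream code iterate over the result whichever return style the installed version uses.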
Install
- pip install nlpaug
- pip install nlpaug[transformers] nlpaug[nltk] nlpaug[gensim] nlpaug[audio]
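Since the optional dependencies are not pulled in by the base package, it can help to check what is importable before constructing an augmenter. This is a minimal sketch using only the standard library. The augmenter-to-module mapping is an assumption drawn from the warnings in this document, not an official list, and `missing_optional_deps` is a hypothetical helper.

```python
import importlib.util
from typing import Dict, List

# Assumed mapping of augmenters to the third-party modules they rely on.
OPTIONAL_DEPS: Dict[str, List[str]] = {
    "SynonymAug": ["nltk"],
    "WordEmbsAug": ["gensim"],
    "ContextualWordEmbsAug": ["transformers", "torch"],
}


def missing_optional_deps(
    deps: Dict[str, List[str]] = OPTIONAL_DEPS,
) -> Dict[str, List[str]]:
    """Return, per augmenter, the top-level modules that cannot be found."""
    missing: Dict[str, List[str]] = {}
    for augmenter, modules in deps.items():
        absent = [m for m in modules if importlib.util.find_spec(m) is None]
        if absent:
            missing[augmenter] = absent
    return missing
```

This only confirms that the Python modules are importable. Corpora such as the NLTK data used by `SynonymAug`, and any embedding or transformer weights, still have to be downloaded separately.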
Imports
- KeyboardAug
from nlpaug.augmenter.char import KeyboardAug
- SynonymAug
from nlpaug.augmenter.word import SynonymAug
- ContextualWordEmbsAug
from nlpaug.augmenter.word import ContextualWordEmbsAug
- RandomSentAug
from nlpaug.augmenter.sentence import RandomSentAug
- Sequential
from nlpaug.flow import Sequential
- DownloadUtil
from nlpaug.util.file.download import DownloadUtil
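The imports above can be combined into a pipeline. The sketch below chains two character-level augmenters with `Sequential`, chosen on the assumption that they need no external model downloads. `build_typo_pipeline` and `augment_once` are hypothetical helper names, and the probability values are illustrative rather than recommended.

```python
from typing import List


def build_typo_pipeline():
    """Build a flow that applies keyboard typos, then random character swaps."""
    # Imported lazily so this module loads even where nlpaug is absent.
    import nlpaug.augmenter.char as nac
    from nlpaug.flow import Sequential

    return Sequential([
        nac.KeyboardAug(aug_char_p=0.1, aug_word_p=0.1),
        nac.RandomCharAug(action="swap", aug_char_p=0.1, aug_word_p=0.1),
    ])


def augment_once(text: str) -> List[str]:
    """Run the pipeline on one string and always return a list of strings."""
    result = build_typo_pipeline().augment(text)
    # Some releases return a str, others a list, so normalize here.
    return [result] if isinstance(result, str) else list(result)
```

Each augmenter in the list runs in order on the output of the previous one, so the second step sees text that already contains keyboard typos.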
Quickstart
import nlpaug.augmenter.char as nac
text = "The quick brown fox jumps over the lazy dog."
# Initialize a Keyboard Augmenter
# Simulates typos based on keyboard proximity
aug = nac.KeyboardAug(aug_char_p=0.1, aug_word_p=0.1, aug_char_min=1)
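# Augment the text. Depending on the installed version, augment() may return
# a list of strings or a plain string, so handle both.
augmented = aug.augment(text)
augmented_text = augmented[0] if isinstance(augmented, list) else augmented
print(f"Original: {text}")
print(f"Augmented: {augmented_text}")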
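To get several variants from one input, the same augmenter can be called with the `n` argument mentioned in the warnings. This is a sketch under that assumption. `keyboard_variants` is a hypothetical wrapper, and the number of results should not be relied on to equal `n` exactly.

```python
from typing import List


def keyboard_variants(text: str, n: int = 3) -> List[str]:
    """Request n keyboard-typo variants of text and return them as a list."""
    # Imported lazily so this module loads even where nlpaug is absent.
    import nlpaug.augmenter.char as nac

    aug = nac.KeyboardAug(aug_char_p=0.1, aug_word_p=0.1, aug_char_min=1)
    result = aug.augment(text, n=n)
    # Normalize: a single result may come back as a plain string.
    return [result] if isinstance(result, str) else list(result)
```

Because the augmentation is random, repeated calls produce different variants. Seed or cache the output if reproducible training data is needed.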