Normality
Normality is a Python micro-package (current version 3.1.0) that provides a small set of text normalization functions for easy reuse. It accepts Unicode or UTF-8 encoded text and removes various classes of characters, such as diacritics and punctuation. The library is actively maintained and is useful as a preparation step for further text analysis.
Common errors
-
ImportError: No module named 'icu'
cause The `pyicu` dependency, which is mandatory since `normality` version 3.0, is not installed in the environment.
fix Install the `pyicu` package: `pip install pyicu`.
-
AttributeError: module 'normality' has no attribute 'normaltest'
cause `normality` is a text processing library, not a statistical library. This error usually means the code was meant to call `scipy.stats.normaltest` but called it on the `normality` module by mistake.
fix If you intend to perform statistical normality tests, import from `scipy.stats`: `from scipy.stats import normaltest`. If you want text normalization, use functions like `normality.normalize()`.
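The first error above can be caught before `normality` is ever imported. The following is a minimal sketch using only the standard library; the helper name `has_module` is hypothetical and not part of `normality` or `pyicu`.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be located."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(name) is not None


# `icu` is the import name of the `pyicu` package, as in the error message.
if not has_module("icu"):
    print("The 'icu' module is missing; try: pip install pyicu")
```

Running this check at application startup gives a clearer message than a bare `ImportError` raised from deep inside an import chain.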
Warnings
- breaking As of version 3.0, `normality` requires `pyicu` as a mandatory dependency. If `pyicu` cannot be installed, users should revert to `normality < 3.0.0` to avoid `ImportError` or runtime issues.
- gotcha The `normality` library for text normalization is often confused with statistical 'normality tests' (e.g., Shapiro-Wilk, D'Agostino-Pearson, Anderson-Darling) found in libraries like SciPy (`scipy.stats.normaltest`). These are entirely different concepts; `normality` focuses on text string manipulation, not statistical analysis of data distributions.
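To see whether the `pyicu` requirement from the first warning applies to an environment, the installed version can be inspected. This is a rough sketch based on the standard library's `importlib.metadata`; the helper names `requires_pyicu` and `installed_normality_version` are hypothetical, and the version rule (major version 3 or later) is taken from the warning above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def requires_pyicu(version_string: str) -> bool:
    """Return True for versions 3.0 and later, per the warning above."""
    major = int(version_string.split(".")[0])
    return major >= 3


def installed_normality_version() -> Optional[str]:
    """Return the installed `normality` version, or None if it is absent."""
    try:
        return version("normality")
    except PackageNotFoundError:
        return None


found = installed_normality_version()
if found is None:
    print("normality is not installed")
else:
    print(f"normality {found}: pyicu required = {requires_pyicu(found)}")
```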
Install
-
pip install normality
Imports
- normalize
from normality import normalize
- slugify
from normality import slugify
- collapse_spaces
from normality import collapse_spaces
- ascii_text
from normality import ascii_text
- latinize_text
from normality import latinize_text
Quickstart
from normality import normalize, slugify, collapse_spaces
text = normalize('Nie wieder "Grüne Süppchen" kochen!')
print(f"Normalized: {text}")
# Expected: nie wieder grune suppchen kochen
slug = slugify('My first blog post!')
print(f"Slugified: {slug}")
# Expected: my-first-blog-post
spaced_text = 'this \n\n\r\nhas\tlots of \nodd spacing.'
cleaned_text = collapse_spaces(spaced_text)
print(f"Collapsed spaces: {cleaned_text}")
# Expected: this has lots of odd spacing.
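To build intuition for what the quickstart calls do, the sketch below approximates the same steps with the standard library only. It is not `normality`'s implementation (which relies on `pyicu` and handles far more scripts and edge cases); `rough_normalize` and `rough_slug` are hypothetical names used purely for illustration.

```python
import unicodedata


def rough_normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    # NFKD splits accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    chars = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        if category.startswith("M"):
            continue  # drop combining marks (the diacritics)
        if category[0] in ("L", "N"):
            chars.append(ch.lower())  # keep letters and digits
        else:
            chars.append(" ")  # punctuation, controls, spaces become a space
    # split() with no arguments collapses any run of whitespace.
    return " ".join("".join(chars).split())


def rough_slug(text: str) -> str:
    """Join the normalized words with hyphens."""
    return rough_normalize(text).replace(" ", "-")


print(rough_normalize('Nie wieder "Grüne Süppchen" kochen!'))
# prints: nie wieder grune suppchen kochen
print(rough_slug("My first blog post!"))
# prints: my-first-blog-post
```

For real data, prefer the library functions: this approximation only removes marks that Unicode decomposition exposes and does not transliterate non-Latin scripts.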