Bangla Unicode Normalizer

0.1.7 · maintenance · verified Fri Apr 17

bnunicodenormalizer (v0.1.7) is a Python library designed for normalizing Bangla Unicode text. It provides tools to clean and standardize Bangla text by addressing inconsistent character representations, digit forms, and other common challenges, making the text suitable for various Natural Language Processing (NLP) tasks. The library saw active development in mid-2023 and is currently in maintenance.
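The sketch below shows the kind of inconsistent character representation described above. It uses only Python's standard `unicodedata` module, not bnunicodenormalizer. The same visible Bangla letter or vowel sign can be stored as different code-point sequences. Standard NFC normalization reconciles these two cases. It appears here only to illustrate the problem and is not the library's own method.

```python
import unicodedata

# Two encodings of the letter য় (YYA): the single code point U+09DF,
# and YA (U+09AF) followed by the nukta sign (U+09BC).
yya_single = "\u09DF"
yya_pair = "\u09AF\u09BC"

# Two encodings of KA with vowel sign ো (O): the single sign U+09CB,
# and its two-part form U+09C7 + U+09BE.
ko_single = "\u0995\u09CB"
ko_pair = "\u0995\u09C7\u09BE"


def nfc(s: str) -> str:
    """Apply Unicode canonical composition (NFC)."""
    return unicodedata.normalize("NFC", s)


def code_points(s: str) -> list:
    """List the code points of a string, e.g. ['U+0995', 'U+09CB']."""
    return [f"U+{ord(c):04X}" for c in s]


for a, b in [(yya_single, yya_pair), (ko_single, ko_pair)]:
    # The raw strings differ even though they render as the same text...
    print(a == b, code_points(a), code_points(b))
    # ...but they compare equal once both are normalized to NFC.
    # (U+09DF is a composition exclusion, so NFC keeps the two-code-point form.)
    print(nfc(a) == nfc(b), code_points(nfc(a)))
```

Text that mixes such forms will fail naive string comparisons and inflate vocabularies in NLP pipelines. That is the motivation for a dedicated normalizer.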

Quickstart

Demonstrates how to initialize the Normalizer and use it to normalize a simple Bangla text string. The normalizer is called on one word at a time and returns a dictionary. A sentence is therefore split on whitespace, each word is normalized, and the results are joined back together.

from bnunicodenormalizer import Normalizer

# Initialize the normalizer with its default settings.
bn_normalize = Normalizer()

text_to_normalize = "এই টেস্টিং টেক্সট।  ১০০ টাকা ।"

# The normalizer works on single words: split the text on whitespace
# and normalize each word. Each call returns a dict whose 'normalized'
# entry holds the result (None if the word could not be normalized).
words = [bn_normalize(word)["normalized"] for word in text_to_normalize.split()]
normalized_text = " ".join(w for w in words if w is not None)

print(f"Original: {text_to_normalize}")
print(f"Normalized: {normalized_text}")

# To inspect the whole dictionary returned for a single word:
# print(bn_normalize("টাকা"))
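bnunicodenormalizer's Normalizer is designed around single words. Sentence-level text is therefore commonly split on whitespace, normalized word by word, and rejoined. The sketch below wraps those steps in a helper. The name `normalize_sentence` is hypothetical and is not part of bnunicodenormalizer. The helper accepts any callable that returns a dict with a `'normalized'` entry, so it should work with a `Normalizer()` instance. It can also be exercised with a stand-in function when the library is not installed.

```python
from typing import Callable, Mapping, Optional


def normalize_sentence(
    text: str,
    normalize_word: Callable[[str], Mapping[str, Optional[str]]],
) -> str:
    """Hypothetical helper: normalize Bangla text word by word.

    `normalize_word` is assumed to behave like a bnunicodenormalizer
    Normalizer instance: it takes one word and returns a dict whose
    'normalized' value is the normalized word, or None if it was dropped.
    """
    normalized = (normalize_word(word)["normalized"] for word in text.split())
    # Skip words the normalizer could not salvage (returned as None).
    return " ".join(w for w in normalized if w is not None)


if __name__ == "__main__":
    # Stand-in word normalizer for demonstration only: it discards a lone
    # danda and leaves every other word unchanged.
    def fake_normalizer(word: str) -> dict:
        return {"normalized": None if word == "।" else word}

    print(normalize_sentence("এই টেস্টিং টেক্সট।  ১০০ টাকা ।", fake_normalizer))
```

With the library installed, the call would be `normalize_sentence(text, Normalizer())`. Passing the word normalizer in as a parameter keeps the helper easy to test and lets you reuse one configured Normalizer across many sentences.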
