{"id":8777,"library":"wetextprocessing","title":"WeTextProcessing","description":"WeTextProcessing is an active Python library providing production-ready Text Normalization (TN) and Inverse Text Normalization (ITN) capabilities. It primarily supports Chinese, English, and Japanese languages, leveraging Finite State Transducers (FSTs) for efficient processing. The library has a consistent release cadence, with multiple minor updates released throughout 2024 to introduce new features, improvements, and bug fixes.","status":"active","version":"1.0.4.1","language":"en","source_language":"en","source_url":"https://github.com/wenet-e2e/WeTextProcessing","tags":["text-processing","nlp","text-normalization","inverse-text-normalization","chinese","english","japanese","fst"],"install":[{"cmd":"pip install WeTextProcessing","lang":"bash","label":"Standard installation"}],"dependencies":[{"reason":"Core dependency for building and running Finite State Transducers. Requires specific platform support (Linux/macOS).","package":"pynini","optional":false},{"reason":"Used for resource loading, a common Python dependency.","package":"importlib-resources","optional":false}],"imports":[{"note":"For `WeTextProcessing` (the pypi-slug), specific language normalizers are imported from `tn.<lang>.normalizer` or `itn.<lang>.inverse_normalizer`. 
The `wetext` package uses a different top-level `Normalizer` import.","wrong":"from wetextprocessing import Normalizer","symbol":"Normalizer (Chinese TN)","correct":"from tn.chinese.normalizer import Normalizer"},{"note":"Inverse Normalizer for Chinese is specifically imported from its language-specific path within the `itn` submodule.","wrong":"from wetextprocessing import InverseNormalizer","symbol":"InverseNormalizer (Chinese ITN)","correct":"from itn.chinese.inverse_normalizer import InverseNormalizer"},{"note":"For English Text Normalization, use the specific English normalizer.","wrong":"from tn.chinese.normalizer import Normalizer","symbol":"Normalizer (English TN)","correct":"from tn.english.normalizer import Normalizer as EnNormalizer"}],"quickstart":{"code":"from tn.chinese.normalizer import Normalizer as ZhNormalizer\nfrom itn.chinese.inverse_normalizer import InverseNormalizer\nfrom tn.english.normalizer import Normalizer as EnNormalizer\n\n# Chinese Text Normalization with erhua removal\nzh_tn_model = ZhNormalizer(remove_erhua=True, overwrite_cache=True)\nzh_tn_text = \"你好WeTextProcessing 1.0，全新版本儿，简直666\"\nprint(f\"Chinese TN: {zh_tn_text} => {zh_tn_model.normalize(zh_tn_text)}\")\n\n# Chinese Inverse Text Normalization\nzh_itn_model = InverseNormalizer(enable_0_to_9=False, overwrite_cache=True)\nzh_itn_text = \"你好WeTextProcessing 一点零，全新版本儿，简直六六六\"\nprint(f\"Chinese ITN: {zh_itn_text} => {zh_itn_model.normalize(zh_itn_text)}\")\n\n# English Text Normalization\nen_tn_model = EnNormalizer(overwrite_cache=True)\nen_tn_text = \"Hello WeTextProcessing 1.0, life is short, just use wetext, 666, 9 and 10\"\nprint(f\"English TN: {en_tn_text} => {en_tn_model.normalize(en_tn_text)}\")","lang":"python","description":"This quickstart demonstrates how to perform Chinese Text Normalization (TN), Chinese Inverse Text Normalization (ITN), and English Text Normalization using the `WeTextProcessing` library. 
It showcases specific imports for each language and the use of `overwrite_cache=True` when modifying normalizer parameters, ensuring rules are rebuilt."},"warnings":[{"fix":"Review English TN usage with version 1.0.0 or later. Test thoroughly to ensure desired normalization behavior. If migrating from older versions, be aware of potential changes in output for English text.","message":"Version 1.0.0 introduced significant changes to English Text Normalization rules, simplifying them compared to NeMo. While resulting in smaller FST sizes and faster build times, existing English TN implementations might require review and adjustment.","severity":"breaking","affected_versions":">=1.0.0"},{"fix":"For Windows users, it is highly recommended to use Windows Subsystem for Linux (WSL) or a Linux virtual machine for development and deployment. Alternatively, ensure you have a compatible `pynini` wheel for your specific Python version and platform before installing `WeTextProcessing`.","message":"The `pynini` dependency, fundamental to WeTextProcessing, is primarily designed for Linux and macOS environments. Direct installation on Windows is not straightforward and often leads to errors. While there's a separate `wetext` package that doesn't depend on Pynini, `WeTextProcessing` requires it.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always pass `overwrite_cache=True` in the constructor of `Normalizer` or `InverseNormalizer` if you are changing its parameters from the default, especially during initial setup or rule modification. For subsequent uses with the same parameters, `overwrite_cache=False` (the default) can be used to load compiled rules faster.","message":"If you modify any parameters when initializing a `Normalizer` or `InverseNormalizer` (e.g., `remove_erhua`, `enable_0_to_9`), you must set `overwrite_cache=True` for the changes to take effect and for the underlying FSTs to be rebuilt. 
Failing to do so will result in the model reusing cached rules, ignoring your parameter changes.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If your application requires specific logging behavior from WeTextProcessing, configure logging explicitly in your application code rather than relying on the library's default global setup.","message":"Starting from version 1.0.1, the global logging configuration was disabled within the library to prevent it from overwriting logging levels of other programs in the same environment. If your application relies on WeTextProcessing configuring logging globally, this behavior has changed.","severity":"gotcha","affected_versions":">=1.0.1"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Use explicit, language-specific imports. For example, `from tn.chinese.normalizer import Normalizer` for Chinese Text Normalization, or `from tn.english.normalizer import Normalizer as EnNormalizer` for English Text Normalization.","cause":"Attempting to import `Normalizer` or `InverseNormalizer` directly from the top-level `wetextprocessing` package instead of the specific `tn.<lang>` or `itn.<lang>` submodules.","error":"ModuleNotFoundError: No module named 'wetextprocessing'"},{"fix":"Install WeTextProcessing within a Linux environment (e.g., WSL on Windows) or ensure a pre-compiled `pynini` wheel compatible with your system and Python version is available and installed before installing WeTextProcessing. `conda install -c conda-forge pynini` is often recommended for Conda users.","cause":"The `pynini` dependency requires specific compilation steps and is primarily supported on Linux and macOS. 
This error typically occurs on Windows or other unsupported platforms during `pip install WeTextProcessing`.","error":"Failed to build wheel for pynini / ERROR: Could not build wheels for pynini which use PEP 517 and cannot be installed directly"},{"fix":"Ensure you are using the correct language-specific Normalizer. For `remove_erhua`, you must use `from tn.chinese.normalizer import Normalizer`. For English, there are different or fewer configurable options. Consult the documentation for available parameters for each language's normalizer.","cause":"You are likely trying to pass a language-specific parameter (like `remove_erhua` for Chinese) to a `Normalizer` instance that does not support it (e.g., an English normalizer, or a generic `wetext` normalizer if that package was used).","error":"TypeError: Normalizer() got an unexpected keyword argument 'remove_erhua'"}]}