{"id":2306,"library":"tensorflow-text","title":"TensorFlow Text","description":"TensorFlow Text is a library providing text-related operations, modules, and subgraphs for TensorFlow. It handles common text preprocessing tasks required by text-based models and offers sequence-modeling features not found in core TensorFlow. The library is actively maintained and typically releases new versions in lockstep with major and minor TensorFlow releases.","status":"active","version":"2.20.1","language":"en","source_language":"en","source_url":"https://github.com/tensorflow/text","tags":["tensorflow","nlp","text processing","tokenization","machine learning"],"install":[{"cmd":"pip install tensorflow-text==2.20.1","lang":"bash","label":"Install specific version (matching TensorFlow)"},{"cmd":"pip install -U tensorflow-text","lang":"bash","label":"Upgrade to latest (ensure TensorFlow is compatible)"}],"dependencies":[{"reason":"TensorFlow Text is built on TensorFlow and requires a closely matching TensorFlow version. 
The minor version of tensorflow-text must match the minor version of tensorflow (e.g., tensorflow-text==2.x.y requires tensorflow==2.x.*).","package":"tensorflow","optional":false},{"reason":"Common dependency for numerical operations in the TensorFlow ecosystem.","package":"numpy","optional":false},{"reason":"Explicitly limited to 0.1.8 in v2.19.0 release notes.","package":"dm-tree","optional":false}],"imports":[{"symbol":"tensorflow_text","correct":"import tensorflow_text as tf_text"}],"quickstart":{"code":"import tensorflow as tf\nimport tensorflow_text as tf_text\n\n# Create a WhitespaceTokenizer (splits on whitespace only)\ntokenizer = tf_text.WhitespaceTokenizer()\n\n# Batch of input strings as a tf.string tensor\ntext_tensor = tf.constant([\"Hello TensorFlow Text!\", \"This is a great library.\"])\n\n# Tokenize the batch; the result is a RaggedTensor because rows have different token counts\ntokens = tokenizer.tokenize(text_tensor)\n\n# Convert to Python lists for readable output (tokens are bytes objects)\nprint(\"Original text:\", text_tensor.numpy())\nprint(\"Tokenized text:\", tokens.to_list())","lang":"python","description":"This quickstart demonstrates basic tokenization using the `WhitespaceTokenizer` from TensorFlow Text. It takes a TensorFlow string tensor and outputs a `RaggedTensor` of tokens, illustrating the common workflow of keeping text processing inside TensorFlow ops. Because the tokenizer splits only on whitespace, punctuation stays attached to adjacent words (for example, `Text!` is a single token)."},"warnings":[{"fix":"Always install `tensorflow-text` with a minor version matching your `tensorflow` installation (e.g., `pip install tensorflow==2.20.0 tensorflow-text==2.20.1`). When upgrading TensorFlow, upgrade `tensorflow-text` at the same time to the matching release.","message":"TensorFlow Text versions are tightly coupled with TensorFlow versions. 
Installing a `tensorflow-text` version whose minor version does not match your installed `tensorflow` can lead to import errors (for example, undefined-symbol errors when the custom-op libraries are loaded) or runtime failures.","severity":"breaking","affected_versions":"All versions"},{"fix":"On unsupported platforms, build `tensorflow-text` from source in the same environment as your `tensorflow` installation, or switch to a supported platform.","message":"Starting with TensorFlow Text 2.11, pre-built pip packages are only provided for Linux x86_64 and Intel-based Macs. Users on other platforms (e.g., Windows, Linux aarch64, Apple Silicon Macs) may need to build from source.","severity":"gotcha","affected_versions":">=2.11.0"},{"fix":"Upgrade to TensorFlow Text 2.20.1 or later, which includes the memory-safety fixes for both tokenizers.","message":"Older versions of `FastWordpieceTokenizer` and `WhitespaceTokenizer` contained memory-safety bugs (e.g., issues with `StringVocab` lifetime and out-of-bounds reads).","severity":"gotcha","affected_versions":"<2.20.1 (FastWordpieceTokenizer), <2.18.0 (WhitespaceTokenizer)"},{"fix":"Upgrade to TensorFlow Text 2.20.0 or later, which widened these input sizes to `int32_t` to support larger inputs.","message":"Some text operations in older versions represented input sizes as `int16_t`, which could fail or misbehave on large inputs.","severity":"gotcha","affected_versions":"<2.20.0"},{"fix":"Upgrade to TensorFlow Text 2.19.0 or later, which includes fixes for these mismatches.","message":"Earlier releases had mismatched punctuation definitions across Unicode versions, which could lead to inconsistent tokenization.","severity":"gotcha","affected_versions":"<2.19.0"},{"fix":"Remove any usage of `use_unique_shared_resource_name` from your code; see the TensorFlow Text 2.16.1 release notes if you relied on it explicitly.","message":"The `use_unique_shared_resource_name` option was removed in version 2.16.1. 
Code relying on this option will break.","severity":"deprecated","affected_versions":">=2.16.1"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}