TensorFlow Text

2.20.1 · active · verified Thu Apr 09

TensorFlow Text is a library providing text-related operations, modules, and subgraphs for TensorFlow. It facilitates common text preprocessing tasks required by text-based models and offers features useful for sequence modeling not found in core TensorFlow. The library is actively maintained and typically releases new versions in lockstep with major and minor TensorFlow releases.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates basic tokenization using the `WhitespaceTokenizer` from TensorFlow Text. It takes a TensorFlow string tensor and outputs a `RaggedTensor` of tokens, illustrating the common workflow for text processing within the TensorFlow graph.

import tensorflow as tf
import tensorflow_text as tf_text

# Create a WhitespaceTokenizer
tokenizer = tf_text.WhitespaceTokenizer()

# Input text as a TensorFlow tensor
text_tensor = tf.constant(["Hello TensorFlow Text!", "This is a great library."])

# Tokenize the text
tokens = tokenizer.tokenize(text_tensor)

# Print the input strings and the tokens.
# to_list() renders the RaggedTensor as nested Python lists of byte strings.
print("Original text:", text_tensor.numpy())
print("Tokenized text:", tokens.to_list())
