TFX Basic Shared Libraries (tfx-bsl)

1.17.1 · active · verified Thu Apr 16

tfx-bsl (TFX Basic Shared Libraries) is a foundational Python library in the TensorFlow Extended (TFX) ecosystem. It provides low-level, high-performance data manipulation primitives, backed by C++ extensions, for working with `tf.Example` protos and Apache Arrow data structures. It is a core dependency of other TFX libraries such as TensorFlow Data Validation (TFDV) and TensorFlow Transform (TFT). The current version is 1.17.1; releases follow the TFX release cadence, which typically aligns with TensorFlow releases.

Common errors

Warnings

Install
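
The package is published on PyPI as `tfx-bsl` and is typically installed with `pip install tfx-bsl`. The following is a minimal sketch for confirming which version, if any, is present in the current environment. It uses only the standard library; the helper name `installed_version` is hypothetical and not part of tfx-bsl.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "tfx-bsl" is the PyPI distribution name; the import name is tfx_bsl.
    version = installed_version("tfx-bsl")
    print(f"tfx-bsl: {version or 'not installed'}")
```

Checking the distribution metadata avoids importing the package itself, so the check stays cheap even though tfx-bsl loads C++ extensions on import.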

Imports

Quickstart

Demonstrates how to convert serialized `tf.train.Example` protos to an Apache Arrow `RecordBatch` and back using `ExamplesToRecordBatchDecoder` and `RecordBatchToExamplesEncoder` from `tfx_bsl.coders.example_coder`. This conversion is a core data transformation that `tfx-bsl` provides to TFX components.

import tensorflow as tf
import pyarrow as pa
from tfx_bsl.coders.example_coder import ExamplesToRecordBatchDecoder, RecordBatchToExamplesEncoder

# 1. Create a list of serialized tf.train.Example protos
examples_list = [
    tf.train.Example(features=tf.train.Features(feature={
        'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 2])),
        'feature2': tf.train.Feature(float_list=tf.train.FloatList(value=[1.0, 2.0]))
    })).SerializeToString(),
    tf.train.Example(features=tf.train.Features(feature={
        'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[3])),
        'feature2': tf.train.Feature(float_list=tf.train.FloatList(value=[3.0]))
    })).SerializeToString()
]

# 2. Decode the serialized examples into an Apache Arrow RecordBatch.
#    No schema is passed here, so column types are inferred from the data.
decoder = ExamplesToRecordBatchDecoder()
record_batch = decoder.DecodeBatch(examples_list)
assert isinstance(record_batch, pa.RecordBatch)

print(f"\nDecoded RecordBatch schema:\n{record_batch.schema}")
print(f"Decoded RecordBatch content:\n{record_batch.to_pydict()}")

# 3. Encode the Apache Arrow RecordBatch back to serialized tf.train.Example protos.
#    The encoder accepts an optional TensorFlow Metadata schema (not an Arrow
#    schema); none is needed for the flat features used here.
encoder = RecordBatchToExamplesEncoder()
encoded_examples_list = encoder.encode(record_batch)

print(f"\nRe-encoded examples (first one):\n{tf.train.Example.FromString(encoded_examples_list[0])}")

# Verify round-trip (simplified check: compares counts only)
assert len(examples_list) == len(encoded_examples_list)
print("\nSuccessfully decoded to Arrow and re-encoded to TF.Example.")
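
The length check above only confirms that no examples were dropped. A stricter check compares the parsed messages pairwise. Comparing raw bytes is not reliable, because protobuf serialization does not guarantee a fixed order for map entries such as the `feature` map. The sketch below is a generic helper under that assumption; the name `roundtrip_matches` and its `parse` parameter are hypothetical and not part of tfx-bsl.

```python
from typing import Any, Callable, Iterable


def roundtrip_matches(
    original: Iterable[bytes],
    encoded: Iterable[bytes],
    parse: Callable[[bytes], Any],
) -> bool:
    """Return True when both sequences parse to pairwise-equal values."""
    original = list(original)
    encoded = list(encoded)
    # A length mismatch means records were dropped or duplicated.
    if len(original) != len(encoded):
        return False
    # Compare parsed values rather than bytes, since two different
    # serializations can represent the same message.
    return all(parse(a) == parse(b) for a, b in zip(original, encoded))
```

With the quickstart variables, this would be called as `roundtrip_matches(examples_list, encoded_examples_list, tf.train.Example.FromString)`. Whether it returns `True` depends on the decoder inferring the same feature types that the original examples used, which is expected for the int64 and float features here but is worth checking for other data.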
