TFX Basic Shared Libraries (tfx-bsl)
tfx-bsl (TFX Basic Shared Libraries) is a foundational Python library within the TensorFlow Extended (TFX) ecosystem. It provides low-level, high-performance data manipulation primitives, backed by optimized C++ extensions, including efficient handling of `tf.Example` and Apache Arrow data structures. It is a core dependency of other TFX libraries and components, such as TensorFlow Data Validation (TFDV) and TensorFlow Transform (TFT). The current version is 1.17.1; releases follow the TFX release cadence, which typically aligns with TensorFlow releases.
Common errors
- tensorflow.python.framework.errors_impl.NotFoundError: ...tfx_bsl_extension.so: undefined symbol: _ZN10tensorflow7tfrt_gpu...
  cause: The `tfx-bsl` C++ extension was compiled against a different TensorFlow Application Binary Interface (ABI) than the one the installed `tensorflow` version provides.
  fix: Install compatible versions of `tfx-bsl` and `tensorflow`. The safest way is to install `tfx` itself, which manages these dependencies, or to consult the TFX compatibility matrix (`https://www.tensorflow.org/tfx/releases#python_package_compatibility`).
- ImportError: cannot import name 'DataType' from 'pyarrow.lib'
  cause: The installed `pyarrow` version is incompatible with `tfx-bsl`. `pyarrow`'s internal API for `DataType` changed in an incompatible way.
  fix: Upgrade or downgrade `pyarrow` to a version compatible with your `tfx-bsl` installation. If you install `tfx`, this dependency should be managed automatically. If you install manually, check the `pyarrow` constraints in `tfx-bsl`'s `install_requires`.
- TypeError: descriptor 'decode' requires a 'tfx_bsl.coders.example_coder.ExampleToRecordBatchDecoder' object but received a 'bytes' object
  cause: The decoding method was called on the decoder class itself, so the byte string was taken as the instance argument, or a single example was passed instead of an iterable of examples.
  fix: Instantiate the decoder first, then call the decoding method on that instance with a list of serialized examples rather than a single byte string, as in the Quickstart.
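The version-mismatch errors above are easier to diagnose with the installed versions in hand. The following is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_versions` and the package list are illustrative assumptions, not part of `tfx-bsl`.

```python
from importlib import metadata

# Packages whose versions commonly need to agree. This list is an
# illustrative assumption; adjust it to your environment.
PACKAGES = ("tfx-bsl", "tensorflow", "pyarrow", "apache-beam", "tfx")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Print a short report that can be compared against the compatibility matrix.
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing this report with the TFX compatibility matrix shows which package needs to be upgraded or downgraded.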
Warnings
- breaking Version mismatches between `tfx-bsl`, `tensorflow`, `tfx`, and `apache-beam` are the most common cause of runtime errors. Ensure all TFX-related packages are installed with compatible versions, ideally from the same TFX release train.
- gotcha Recent `tfx-bsl` versions target Python 3.9-3.11 only. Python 3.8 and earlier are not supported, and interpreter versions newer than that range are not supported either.
- gotcha Using `pip install tfx-bsl` without explicitly installing `tensorflow` or its `[tensorflow]` extra will result in a version of `tfx-bsl` that might not be fully functional or compatible with your existing TensorFlow installation.
- breaking TFX-BSL relies on custom C++ extensions. Issues with build environments, incompatible compilers, or missing system dependencies (like specific glibc versions) can lead to `ImportError` or segmentation faults.
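The Python version constraint noted above can be checked before installation. This is a minimal sketch: the helper name `is_supported_python` is hypothetical, and the 3.9-3.11 bounds are taken from the note above, so confirm them against the release you actually install.

```python
import sys

# Supported range as stated in the warning above (an assumption to verify
# against the specific tfx-bsl release being installed).
MIN_SUPPORTED = (3, 9)
MAX_SUPPORTED = (3, 11)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version lies in the supported range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if not is_supported_python():
    print(
        f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} "
        "is outside the 3.9-3.11 range targeted by recent tfx-bsl releases."
    )
```

Running a check like this at the top of a setup script catches an unsupported interpreter before a failed wheel build or import does.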
Install
- pip install tfx-bsl
- pip install tfx-bsl[tensorflow]
Imports
- ExamplesToRecordBatchDecoder
from tfx_bsl.coders.example_coder import ExamplesToRecordBatchDecoder
- RecordBatchToExamplesEncoder
from tfx_bsl.coders.example_coder import RecordBatchToExamplesEncoder
Quickstart
import tensorflow as tf
from tfx_bsl.coders.example_coder import (
    ExamplesToRecordBatchDecoder,
    RecordBatchToExamplesEncoder,
)

# 1. Create a list of serialized tf.train.Example protos.
examples_list = [
    tf.train.Example(features=tf.train.Features(feature={
        'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 2])),
        'feature2': tf.train.Feature(float_list=tf.train.FloatList(value=[1.0, 2.0])),
    })).SerializeToString(),
    tf.train.Example(features=tf.train.Features(feature={
        'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[3])),
        'feature2': tf.train.Feature(float_list=tf.train.FloatList(value=[3.0])),
    })).SerializeToString(),
]

# 2. Decode the serialized examples into an Apache Arrow RecordBatch.
# Without a schema argument, the decoder infers column types from the data.
decoder = ExamplesToRecordBatchDecoder()
record_batch = decoder.DecodeBatch(examples_list)
print(f"\nDecoded RecordBatch schema:\n{record_batch.schema}")
print(f"Decoded RecordBatch content:\n{record_batch}")

# 3. Encode the RecordBatch back into serialized tf.train.Example protos.
# The optional constructor argument is a TensorFlow Metadata schema, not an
# Arrow schema, so it is omitted here.
encoder = RecordBatchToExamplesEncoder()
encoded_examples_list = list(encoder.encode(record_batch))
print(f"\nRe-encoded examples (first one):\n{tf.train.Example.FromString(encoded_examples_list[0])}")

# Verify the round trip (simplified check: the example counts match).
assert len(examples_list) == len(encoded_examples_list)
print("\nSuccessfully decoded to Arrow and re-encoded to TF.Example.")