{"id":8713,"library":"tfx-bsl","title":"TFX Basic Shared Libraries (tfx-bsl)","description":"tfx-bsl (TFX Basic Shared Libraries) is a foundational Python library within the TensorFlow Extended (TFX) ecosystem. It provides low-level, high-performance data manipulation primitives, including efficient handling of TF.Example and Apache Arrow data structures, and optimized C++ extensions. It serves as a core dependency for many TFX libraries and components like TensorFlow Data Validation (TFDV) and TensorFlow Transform (TFT). The current version is 1.17.1, and it follows the TFX release cadence, typically aligning with TensorFlow releases.","status":"active","version":"1.17.1","language":"en","source_language":"en","source_url":"https://github.com/tensorflow/tfx-bsl","tags":["tensorflow","tfx","data-processing","apache-beam","arrow","machine-learning"],"install":[{"cmd":"pip install tfx-bsl","lang":"bash","label":"Install core tfx-bsl"},{"cmd":"pip install tfx-bsl[tensorflow]","lang":"bash","label":"Install with TensorFlow dependency (recommended for full functionality)"}],"dependencies":[{"reason":"Tightly coupled for data processing (e.g., tf.train.Example, tf.io.TFRecordOptions).","package":"tensorflow","optional":true},{"reason":"Required for distributed data processing using Beam runners, commonly used by TFX components.","package":"apache-beam","optional":true},{"reason":"Core dependency for Apache Arrow data structures and operations.","package":"pyarrow","optional":false}],"imports":[{"note":"Commonly used for converting `tf.train.Example` protobufs to Apache Arrow `RecordBatch`.","symbol":"ExampleToRecordBatchDecoder","correct":"from tfx_bsl.coders.example_coder import ExampleToRecordBatchDecoder"},{"note":"Used for converting Apache Arrow `RecordBatch` back to `tf.train.Example` protobufs.","symbol":"RecordBatchToExamplesEncoder","correct":"from tfx_bsl.coders.example_coder import RecordBatchToExamplesEncoder"}],"quickstart":{"code":"import 
tensorflow as tf\nimport pyarrow as pa\nfrom tfx_bsl.coders.example_coder import ExampleToRecordBatchDecoder, RecordBatchToExamplesEncoder\n\n# 1. Create a list of serialized tf.train.Example protos\nexamples_list = [\n    tf.train.Example(features=tf.train.Features(feature={\n        'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[1, 2])),\n        'feature2': tf.train.Feature(float_list=tf.train.FloatList(value=[1.0, 2.0]))\n    })).SerializeToString(),\n    tf.train.Example(features=tf.train.Features(feature={\n        'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[3])),\n        'feature2': tf.train.Feature(float_list=tf.train.FloatList(value=[3.0]))\n    })).SerializeToString()\n]\n\n# 2. Decode the serialized examples to an Apache Arrow RecordBatch.\n# The batch-decoding method is DecodeBatch; with no schema argument,\n# feature types are inferred from the data.\ndecoder = ExampleToRecordBatchDecoder()\nrecord_batch = decoder.DecodeBatch(examples_list)\n\nprint(f\"\\nDecoded RecordBatch schema:\\n{record_batch.schema}\")\nprint(f\"Decoded RecordBatch content:\\n{record_batch}\")\n\n# 3. Encode the Apache Arrow RecordBatch back to serialized TF.Examples.\n# The encoder's optional argument is a TensorFlow Metadata schema proto,\n# not a pyarrow schema, so it is omitted here.\nencoder = RecordBatchToExamplesEncoder()\nencoded_examples_list = list(encoder.encode(record_batch))\n\nprint(f\"\\nRe-encoded examples (first one):\\n{tf.train.Example.FromString(encoded_examples_list[0])}\")\n\n# Verify round-trip (simplified check: same number of examples)\nassert len(examples_list) == len(encoded_examples_list)\nprint(\"\\nSuccessfully decoded to Arrow and re-encoded to TF.Example.\")","lang":"python","description":"Demonstrates how to convert `tf.train.Example` protobufs to an Apache Arrow `RecordBatch` and back using `tfx_bsl`'s `ExampleToRecordBatchDecoder` and `RecordBatchToExamplesEncoder`. 
This is a core data transformation task that `tfx-bsl` facilitates for TFX components."},"warnings":[{"fix":"Always install `tfx` (which pins `tfx-bsl`) or refer to the official TFX compatibility matrix: `https://www.tensorflow.org/tfx/releases#python_package_compatibility`.","message":"Version mismatches between `tfx-bsl`, `tensorflow`, `tfx`, and `apache-beam` are the most common cause of runtime errors. Ensure all TFX-related packages are installed with compatible versions, ideally from the same TFX release train.","severity":"breaking","affected_versions":"All versions"},{"fix":"Ensure your environment uses Python 3.9, 3.10, or 3.11. Check `requires_python` from PyPI for the exact range for your specific `tfx-bsl` version.","message":"Python 3.8 and earlier are not supported by recent `tfx-bsl` versions. Python 4.x is also not yet supported. The package specifically targets Python 3.9-3.11.","severity":"gotcha","affected_versions":"1.x.x onwards"},{"fix":"Prefer `pip install tfx` (which includes `tfx-bsl` and `tensorflow`) or `pip install tfx-bsl[tensorflow]` to ensure core dependencies are aligned.","message":"Using `pip install tfx-bsl` without explicitly installing `tensorflow` or its `[tensorflow]` extra will result in a version of `tfx-bsl` that might not be fully functional or compatible with your existing TensorFlow installation.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Use pre-built wheels if possible. If building from source, ensure your C++ toolchain (gcc/g++) is compatible with TensorFlow's requirements and that all necessary libraries are present. Often, this means sticking to official Docker images or tested environments.","message":"TFX-BSL relies on custom C++ extensions. 
Issues with build environments, incompatible compilers, or missing system dependencies (like specific glibc versions) can lead to `ImportError` or segmentation faults.","severity":"breaking","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure `tfx-bsl` and `tensorflow` are installed at compatible versions. The safest way is to install `tfx` itself, which manages these dependencies, or consult the TFX compatibility matrix (`https://www.tensorflow.org/tfx/releases#python_package_compatibility`).","cause":"Mismatch between the TensorFlow version and the `tfx-bsl` C++ extension. `tfx-bsl` was compiled against a different TensorFlow Application Binary Interface (ABI).","error":"tensorflow.python.framework.errors_impl.NotFoundError: ...tfx_bsl_extension.so: undefined symbol: _ZN10tensorflow7tfrt_gpu..."},{"fix":"Upgrade or downgrade `pyarrow` to a version compatible with your `tfx-bsl` installation. If installing `tfx`, this dependency should be managed automatically. If installing manually, check `tfx-bsl`'s `install_requires` for `pyarrow` constraints.","cause":"Incompatible versions of `tfx-bsl` and `pyarrow`. `pyarrow`'s internal API for `DataType` changed in an incompatible way.","error":"ImportError: cannot import name 'DataType' from 'pyarrow.lib'"},{"fix":"Instantiate the decoder first (e.g., `decoder = ExampleToRecordBatchDecoder()`), then call the batch-decoding method on that instance with an iterable (list or tuple) of serialized examples. In `tfx_bsl.coders.example_coder` this method is named `DecodeBatch`: `record_batch = decoder.DecodeBatch([example1_bytes, example2_bytes])`.","cause":"Calling the decoding method on the `ExampleToRecordBatchDecoder` class itself with a byte string as the first argument, instead of calling it on a decoder instance, or passing a single serialized example where an iterable of examples is expected.","error":"TypeError: descriptor 'decode' requires a 'tfx_bsl.coders.example_coder.ExampleToRecordBatchDecoder' object but received a 'bytes' object"}]}