{"id":2803,"library":"tensorflow-metadata","title":"TensorFlow Metadata","description":"TensorFlow Metadata (TFMD) provides standard representations for metadata that are useful when training machine learning models with TensorFlow. This includes formats for describing tabular data schemas (e.g., `tf.Examples`), collections of summary statistics over datasets, and problem statements. It is a foundational library used by other TensorFlow Extended (TFX) components such as TensorFlow Data Validation (TFDV) and TensorFlow Transform (TFT). The library is actively maintained, with version 1.17.3 being the current release.","status":"active","version":"1.17.3","language":"en","source_language":"en","source_url":"https://github.com/tensorflow/metadata","tags":["tensorflow","metadata","mlops","schema","statistics","protobuf","tfx"],"install":[{"cmd":"pip install tensorflow-metadata","lang":"bash","label":"Latest stable version"}],"dependencies":[{"reason":"Crucial for defining and serializing metadata schemas and statistics. 
Often a source of version conflicts in the TensorFlow ecosystem.","package":"protobuf"},{"reason":"Used for various foundational utilities within the TensorFlow ecosystem.","package":"absl-py"},{"reason":"Provides common protobuf definitions shared across Google APIs and projects.","package":"googleapis-common-protos"}],"imports":[{"note":"For defining data schemas. The importable package is `tensorflow_metadata`, not `tf_metadata`.","symbol":"Schema","correct":"from tensorflow_metadata.proto.v0 import schema_pb2"},{"note":"For working with dataset statistics.","symbol":"Statistics","correct":"from tensorflow_metadata.proto.v0 import statistics_pb2"},{"note":"Belongs to the separate `ml-metadata` (MLMD) package, which handles artifact and execution tracking; it is not shipped with `tensorflow-metadata` and must be installed separately.","symbol":"MetadataStore","correct":"from ml_metadata.metadata_store import metadata_store"}],"quickstart":{"code":"from tensorflow_metadata.proto.v0 import schema_pb2\n\n# Create a simple schema definition\nschema = schema_pb2.Schema()\n\n# Add a feature named 'age' of type INT\nfeature_age = schema.feature.add()\nfeature_age.name = \"age\"\nfeature_age.type = schema_pb2.FeatureType.INT\nfeature_age.int_domain.is_categorical = False\nfeature_age.presence.min_fraction = 1.0  # 'age' must always be present\nfeature_age.int_domain.min = 0\nfeature_age.int_domain.max = 120\n\n# Add a feature named 'city' of type BYTES (string), which is categorical\nfeature_city = schema.feature.add()\nfeature_city.name = \"city\"\nfeature_city.type = schema_pb2.FeatureType.BYTES\nfeature_city.string_domain.is_categorical = True\nfeature_city.string_domain.value.extend([\"New York\", \"London\", \"Tokyo\"])\n\nprint(\"Generated Schema (protobuf format):\")\nprint(schema)\n\n# Serialize the schema to bytes\nserialized_schema = schema.SerializeToString()\nprint(f\"\\nSerialized Schema (bytes): {len(serialized_schema)} bytes\")\n\n# Deserialize the schema back from bytes\ndeserialized_schema = schema_pb2.Schema()\ndeserialized_schema.ParseFromString(serialized_schema)\nprint(\"\\nDeserialized 
Schema:\")\nprint(deserialized_schema)","lang":"python","description":"This quickstart demonstrates how to define a simple data schema using `tensorflow-metadata`'s protobuf definitions. It shows how to add features with different types and constraints, then serializes and deserializes the schema for storage or transfer."},"warnings":[{"fix":"Consult `tensorflow-metadata`'s `setup.py` or `RELEASE.md` for exact `protobuf` version requirements for your Python version (e.g., `protobuf>=4.25.2,<5` for Python 3.11). Consider using a virtual environment and carefully managing dependencies.","message":"Frequent and critical dependency conflicts with the `protobuf` library. `tensorflow-metadata` often pins `protobuf` to specific major/minor versions, which can clash with other libraries in the TensorFlow ecosystem.","severity":"breaking","affected_versions":"All versions, especially when used in complex environments."},{"fix":"Upgrade to Python 3.9 or higher. The library currently supports Python >=3.9,<4.","message":"Support for Python 3.8 was deprecated starting from version 1.15.0.","severity":"deprecated","affected_versions":"1.15.0 and later."},{"fix":"Always use the stable versions available on PyPI (`pip install tensorflow-metadata`) for production or reliable development. 
Only use nightly builds if you need the absolute latest features and are prepared to handle instability.","message":"Nightly builds of `tensorflow-metadata` (and related TF projects) are explicitly stated to be unstable and prone to breakages, with fixes potentially taking a week or more.","severity":"gotcha","affected_versions":"Nightly builds."},{"fix":"If you rely on statistics for nested features, re-evaluate existing pipelines and logic when upgrading to 1.15.0 or later, as the reported values might change.","message":"Version 1.15.0 introduced a semantic change to how `min/max/avg/tot num-values` are calculated for nested features, now relying on the innermost level.","severity":"breaking","affected_versions":"1.15.0 and later."},{"fix":"Remove any usage of `NaturalLanguageDomain.location_constraint_regex` from your code.","message":"The field `NaturalLanguageDomain.location_constraint_regex` was removed in version 1.15.0. It was previously documented as 'please do not use' and was never fully implemented.","severity":"breaking","affected_versions":"1.15.0 and later."}],"env_vars":null,"last_verified":"2026-04-10T00:00:00.000Z","next_check":"2026-07-09T00:00:00.000Z"}