TensorFlow Datasets (Nightly)
tensorflow/datasets is a library of ready-to-use datasets for machine learning pipelines. It is built for TensorFlow but also supports other frameworks, including JAX and PyTorch. The `tfds-nightly` package provides daily releases with the latest features and bug fixes, often before they reach the stable `tensorflow-datasets` release.
Common errors
- ModuleNotFoundError: No module named 'tensorflow_datasets'
  cause: The `tensorflow-datasets` or `tfds-nightly` package is not installed in the current Python environment, or the intended environment is not activated.
  fix: Install the package with `pip install tfds-nightly` (or `pip install tensorflow-datasets` for the stable release). If you use a virtual environment, make sure it is activated.
- ValueError: None values not supported.
  cause: TensorFlow cannot convert `None` to a tensor. This error typically appears when input data contains unexpected `None` values, or after changes in how `tfds` handles missing data.
  fix: Inspect your dataset and preprocessing pipeline for `None` values. For Hugging Face datasets, note the change in `None` handling introduced in v4.9.3. Filter out `None` values, replace them with explicit defaults, or use `tfds.features.Optional` where appropriate.
- NonMatchingChecksumError: Checksum mismatch for downloaded file...
  cause: The downloaded file (or a file on the local disk) does not match the expected checksum. This indicates possible corruption, an update to the source data, or a local file system issue.
  fix: Delete the corrupted file from the `downloads` folder and try again. If the upstream data has genuinely changed, the dataset builder's registered checksums must be updated. For custom datasets, run `tfds build --register_checksums` to update the checksum.
- TypeError: Unknown resource path: : MultiplexedPath
  cause: This error can occur when running `tfds build`. It may be related to file system access or to an internal path-resolution issue in the library.
  fix: Make sure your environment is clean and all dependencies are up to date (`pip install --upgrade tfds-nightly`). If the problem persists, search the TensorFlow Datasets GitHub issues for related bugs or workarounds.
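For a `NonMatchingChecksumError`, it can help to check a suspect file in the `downloads` folder by hand before deleting it. The sketch below hashes a local file in chunks with the standard library's `hashlib` and compares the result with an expected digest. It assumes the expected value is a SHA-256 hex string (the hash TFDS records in its checksum files) and, optionally, an expected size in bytes. The helper names `sha256_of_file` and `matches_expected` are hypothetical and not part of the TFDS API.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return (sha256_hex_digest, size_in_bytes) for a local file."""
    digest = hashlib.sha256()
    size = 0
    with Path(path).open("rb") as f:
        # Read in chunks (1 MiB by default) so large archives never need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


def matches_expected(path, expected_sha256, expected_size=None):
    """Return True if the file's SHA-256 (and size, when given) match the expected values."""
    actual_sha256, actual_size = sha256_of_file(path)
    if expected_size is not None and actual_size != expected_size:
        return False
    # Hex digests are compared case-insensitively.
    return actual_sha256 == expected_sha256.lower()
```

If `matches_expected` returns `False` for a cached download, deleting that file and re-running the download is the fix described in the entry above. Streaming the file in chunks keeps memory use constant regardless of archive size.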
Warnings
- breaking Nightly builds often include API changes and experimental features that may be unstable or subject to further modification before a stable release. Code written against a `tfds-nightly` version might break with subsequent nightly updates.
- breaking The handling of `None` values when processing Hugging Face datasets (e.g., via `HuggingfaceDatasetBuilder`) changed from defaulting to `0`/`0.0` for int/float features to using NumPy's `-inf`. This can silently alter data or cause downstream errors if your code expected the old default behavior for missing values.
- gotcha Using `tfds build` for Beam-based datasets requires `apache-beam`, which `tfds-nightly` does not always install as a direct dependency; install it separately if it is missing. Some releases have also pinned `apache-beam` versions (e.g., `<2.65.0` in v4.9.9), which can conflict with other packages in your environment.
- gotcha A past bug caused `import tensorflow_datasets` to fail with a `ModuleNotFoundError` on Windows, because the library imported the `resource` module, which is not available there. This has been fixed in later versions, but similar platform-specific dependency issues can arise in nightly builds.
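Because the placeholder for missing values has changed between versions (`None` in raw inputs, `-inf` after the Hugging Face builder change noted in the warnings above), it can be safer to replace missing entries with an explicit default yourself rather than rely on the library's behavior. The following is a minimal pure-Python sketch for a flat list of scalar values. `is_missing` and `fill_missing` are hypothetical helpers, and treating negative infinity as "missing" is an assumption that only holds if your real data never contains `-inf`.

```python
import math


def is_missing(value):
    """Return True for None or a negative-infinity placeholder."""
    if value is None:
        return True
    try:
        # math.isinf accepts ints, floats, and NumPy scalars.
        return math.isinf(value) and value < 0
    except (TypeError, OverflowError):
        # Non-numeric values (e.g. strings) and ints too large for a float
        # are never treated as missing here.
        return False


def fill_missing(values, default):
    """Replace missing entries in a flat sequence with an explicit default."""
    return [default if is_missing(v) else v for v in values]


# Example: -inf and None both become 0.0; +inf and ordinary values are kept.
print(fill_missing([1.5, float("-inf"), None, float("inf")], default=0.0))
```

For large NumPy arrays, a vectorized mask built with `np.isneginf` would be the natural equivalent of this per-element loop.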
Install
- pip install tfds-nightly
- pip install tfds-nightly tensorflow matplotlib
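To confirm which TFDS distribution the active interpreter can see (useful when debugging the `ModuleNotFoundError` above, and for recording which nightly your code was written against), you can query installed package metadata. This minimal sketch uses only the standard library's `importlib.metadata`; the helper name `installed_tfds_distributions` is hypothetical. Both `tfds-nightly` and `tensorflow-datasets` provide the same `tensorflow_datasets` import name, so having both installed makes it unclear which one you are importing.

```python
from importlib import metadata


def installed_tfds_distributions():
    """Map each TFDS distribution name to its installed version, or None if absent."""
    versions = {}
    for dist in ("tfds-nightly", "tensorflow-datasets"):
        try:
            versions[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            versions[dist] = None
    return versions


found = installed_tfds_distributions()
print(found)
if all(found.values()):
    # Both distributions install the same `tensorflow_datasets` package.
    print("Both packages are installed; consider uninstalling one of them.")
elif not any(found.values()):
    print("Neither package is installed in this interpreter's environment.")
```

Logging the printed nightly version alongside your results makes it easier to pin or reproduce a working build later.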
Imports
- tfds
import tensorflow_datasets as tfds
Quickstart
import os
# Optionally set the TFDS data directory (where datasets are cached).
# Setting it before the import ensures it is in place whenever TFDS reads it.
os.environ['TFDS_DATA_DIR'] = '/tmp/tfds_data'
import tensorflow_datasets as tfds
# Load a dataset (e.g., MNIST)
ds, info = tfds.load(
    'mnist',
    split='train',
    shuffle_files=True,
    as_supervised=True,  # Returns (image, label) tuples
    with_info=True,
)
print(f"Dataset info: {info.description}")
print(f"Number of training examples: {info.splits['train'].num_examples}")
# Inspect the first example; ds.take(n) yields at most n elements
for image, label in ds.take(1):
    print(f"Image shape: {image.shape}, Label: {label.numpy()}")