TensorFlow I/O GCS Filesystem
TensorFlow I/O GCS Filesystem (`tensorflow-io-gcs-filesystem`) lets TensorFlow read and write Google Cloud Storage (GCS) directly by enabling `tf.io.gfile` operations on `gs://` paths. It is part of the broader TensorFlow I/O project and provides a modular installation of the GCS-specific functionality. The library is actively maintained, with frequent releases that typically track new TensorFlow versions or important bug fixes.
Warnings
- breaking TensorFlow I/O (and its sub-packages, such as GCS Filesystem) is often tightly coupled to a specific TensorFlow version. Version mismatches can lead to runtime errors or build failures.
- gotcha This package (`tensorflow-io-gcs-filesystem`) provides *only* GCS filesystem support. It does not include other features found in the full `tensorflow-io` meta-package (e.g., other file formats like Avro, or other filesystems like S3).
- gotcha Proper Google Cloud Storage authentication is crucial. Operations will fail with permission errors if authentication is not correctly configured in the environment where your TensorFlow code runs.
- gotcha The GCS filesystem handler is registered by importing `tensorflow_io_gcs_filesystem`. If this import is omitted, `tf.io.gfile` will not recognize `gs://` paths.
- gotcha The library requires Python versions >=3.7 and <3.13.
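Since misconfigured credentials are the most common failure mode, it can help to check up front which Application Default Credentials source the process will likely use. A minimal sketch, assuming only the standard environment-variable convention; the helper name `describe_adc_source` is illustrative, not part of the library:

```python
import os

def describe_adc_source():
    """Report which Application Default Credentials source appears to be configured.

    Illustrative helper only; the actual credential resolution is performed by
    the GCS client at request time.
    """
    key_file = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_file:
        return f"service-account key file: {key_file}"
    # Without the env var, the client falls back to gcloud ADC
    # (`gcloud auth application-default login`) or the GCE/GKE metadata server.
    return "no GOOGLE_APPLICATION_CREDENTIALS set; relying on gcloud ADC or metadata server"
```

Calling this before the quickstart below makes permission errors easier to diagnose.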
Install
pip install tensorflow-io-gcs-filesystem
Imports
- GCS Filesystem Registration
import tensorflow_io_gcs_filesystem as _
Quickstart
import tensorflow as tf
import tensorflow_io_gcs_filesystem as _ # Registers the GCS filesystem handler
import os
# Ensure you have GCS authentication set up, e.g.,
# by running `gcloud auth application-default login`
# or setting the GOOGLE_APPLICATION_CREDENTIALS environment variable.
# For testing, you might use a public bucket or ensure permissions are granted.
bucket_name = os.environ.get('GCS_TEST_BUCKET', 'your-gcs-bucket-name') # Replace with your bucket
file_name = "registry_test_file.txt"
file_path = f"gs://{bucket_name}/{file_name}"
print(f"Attempting to interact with GCS path: {file_path}")
try:
    # 1. Write a file to GCS
    with tf.io.gfile.GFile(file_path, 'w') as f:
        f.write("Hello from TensorFlow I/O GCS Filesystem registry check!")
    print(f"Successfully wrote to {file_path}")

    # 2. Read the file from GCS
    with tf.io.gfile.GFile(file_path, 'r') as f:
        content = f.read()
    print(f"Successfully read from {file_path}. Content: '{content}'")

    # 3. List files in the bucket (or a prefix within it)
    parent_path = f"gs://{bucket_name}"
    print(f"Listing contents of {parent_path}:")
    for item in tf.io.gfile.listdir(parent_path):
        print(f"  - {item}")

    # 4. Check whether the file exists
    exists = tf.io.gfile.exists(file_path)
    print(f"File {file_path} exists: {exists}")

    # 5. Delete the file from GCS
    tf.io.gfile.remove(file_path)
    print(f"Successfully deleted {file_path}")

    # Verify deletion
    exists_after_delete = tf.io.gfile.exists(file_path)
    print(f"File {file_path} exists after delete: {exists_after_delete}")
except tf.errors.FailedPreconditionError as e:
    print(f"Error: Failed to connect to GCS. Please ensure you have authenticated and "
          f"your bucket '{bucket_name}' exists and is accessible. Error details: {e}")
    print("HINT: Try running `gcloud auth application-default login` or setting "
          "`GOOGLE_APPLICATION_CREDENTIALS`.")
except tf.errors.NotFoundError as e:
    print(f"Error: GCS path not found. Please check bucket name and permissions. Error details: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
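The quickstart builds its `gs://` path with a single f-string; for anything beyond one file it can be convenient to centralize URI construction. A small sketch of such a helper (`gcs_uri` is hypothetical, not part of the library):

```python
def gcs_uri(bucket, *parts):
    """Join a bucket name and path components into a gs:// URI.

    Hypothetical convenience helper; it strips stray slashes so components
    compose cleanly, and skips empty components.
    """
    path = "/".join(p.strip("/") for p in parts if p)
    return f"gs://{bucket}/{path}" if path else f"gs://{bucket}"
```

For example, `gcs_uri("my-bucket", "data", "file.txt")` yields `"gs://my-bucket/data/file.txt"`, which can be passed directly to any `tf.io.gfile` operation.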