TensorStore
TensorStore is an open-source C++ and Python library (version 0.1.82) for efficiently reading and writing large multi-dimensional arrays. It provides a uniform API across array formats (e.g., Zarr, N5, Neuroglancer precomputed) and storage systems (local filesystems, Google Cloud Storage, Amazon S3-compatible object stores, HTTP servers, and in-memory storage). It features an asynchronous API, read/writeback caching, transactions with strong ACID guarantees, and optimistic concurrency for safe access from multiple processes or machines. The project is actively maintained with frequent releases.
Warnings
- breaking TensorStore requires Python 3.11 or later. Users attempting to install or run it with older Python versions (e.g., 3.10 or below) will encounter installation failures or runtime errors.
- gotcha TensorStore's Python API is heavily asynchronous. Most I/O operations return `tensorstore.Future` objects that complete in the background. If you do not call `.result()` on a future (or `await` it in an `async` context), your code may proceed before the operation has finished, so writes may not yet be durable and read results may not yet be available.
- gotcha When using cloud storage drivers (e.g., Google Cloud Storage, Amazon S3), proper authentication credentials must be configured in your environment. Operations will fail with permission errors if credentials are missing or incorrect.
- gotcha Building TensorStore from source (instead of using pre-built PyPI wheels) requires specific C++ compilers (e.g., GCC 10+, Clang 8+, MSVC 2022+) and the Bazel build system. On Windows, long path names can lead to compilation errors like `fatal error C1083: Cannot open include file`.
Install
- pip install tensorstore
- conda install -c conda-forge tensorstore
Imports
- tensorstore
import tensorstore as ts
Quickstart
import tensorstore as ts
import numpy as np
import os
# Define a temporary directory for local storage
output_dir = 'tmp_tensorstore_dataset'
# Create a new N5 dataset on the local filesystem
dataset = ts.open({
    'driver': 'n5',
    'kvstore': {
        'driver': 'file',
        'path': output_dir,
    },
    'metadata': {
        'compression': {'type': 'gzip'},
        'dataType': 'uint32',
        'dimensions': [1000, 20000],
        'blockSize': [100, 100],
    },
    'create': True,
    'delete_existing': True,
}).result()
# Asynchronously write to a sub-region
write_future = dataset[80:82, 99:102].write(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint32))
# Wait for the write to complete
write_future.result()
# Read back a larger region
read_data = dataset[80:83, 99:102].read().result()
print(f"Data written to {output_dir}")
print(f"Read data:\n{read_data}")
# Clean up the temporary directory
import shutil
shutil.rmtree(output_dir)
print(f"Cleaned up {output_dir}")