TensorStore

0.1.82 · active · verified Thu Apr 09

TensorStore is an open-source C++ and Python library (version 0.1.82) for efficiently reading and writing large multi-dimensional arrays. It provides a uniform API across array formats (Zarr, N5, Neuroglancer precomputed) and storage systems (local filesystems, Google Cloud Storage, Amazon S3-compatible object stores, HTTP servers, and in-memory storage). It offers an asynchronous API, read/writeback caching, transactions with strong ACID guarantees, and optimistic concurrency for safe concurrent access from multiple processes and machines. The project maintains an active release cadence with frequent updates.

Warnings

Install

pip install tensorstore

Imports

import tensorstore as ts
import numpy as np

Quickstart

This example demonstrates creating a new N5 dataset on the local filesystem, writing a small NumPy array to a sub-region, and then reading back a larger region. It highlights the asynchronous nature of TensorStore operations, requiring `.result()` or `await` to wait for completion.

import tensorstore as ts
import numpy as np

# Define a temporary directory for local storage
output_dir = 'tmp_tensorstore_dataset'

# Create a new N5 dataset on the local filesystem
dataset = ts.open({
    'driver': 'n5',
    'kvstore': {
        'driver': 'file',
        'path': output_dir,
    },
    'metadata': {
        'compression': {'type': 'gzip'},
        'dataType': 'uint32',
        'dimensions': [1000, 20000],
        'blockSize': [100, 100],
    },
    'create': True,
    'delete_existing': True,
}).result()

# Asynchronously write to a sub-region
write_future = dataset[80:82, 99:102].write(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint32))

# Wait for the write to complete
write_future.result()

# Read back a slightly larger region; row 82 was never written, so it is
# returned as the N5 fill value (0)
read_data = dataset[80:83, 99:102].read().result()

print(f"Data written to {output_dir}")
print(f"Read data:\n{read_data}")

# Clean up the temporary directory
import shutil
shutil.rmtree(output_dir)
print(f"Cleaned up {output_dir}")
