TileDB-SOMA

2.3.0 · active · verified Fri Apr 17

TileDB-SOMA is a Python API for efficient storage and retrieval of single-cell biological data, built on the TileDB embedded array database. It implements the SOMA (Stack Of Matrices, Annotated) specification, which defines scalable, language-agnostic access to annotated matrices. The library is actively maintained, with minor releases typically every 1-2 months.

Quickstart

This quickstart creates a simple SOMA DataFrame from a PyArrow schema, writes pandas data to it, and reads the data back. Together these steps cover the basic create, write, and read operations for a SOMA object.

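import os
import shutil

import pandas as pd
import pyarrow as pa
import tiledbsoma as soma

# Local filesystem path for the SOMA object
soma_path = "./my_soma_df_quickstart"

# Remove any copy left over from a previous run (works for local paths only)
if os.path.exists(soma_path):
    shutil.rmtree(soma_path)

# Arrow schema for the DataFrame. Every SOMA DataFrame carries an int64
# soma_joinid column, so it is declared here and supplied in the data below.
schema = pa.schema([
    pa.field("soma_joinid", pa.int64()),
    pa.field("gene_id", pa.string()),
    pa.field("feature_val", pa.float32()),
])

# Prepare data
data = pd.DataFrame({
    "soma_joinid": [0, 1, 2],
    "gene_id": ["geneA", "geneB", "geneC"],
    "feature_val": [1.1, 2.2, 3.3],
})

# Create a SOMA DataFrame indexed by gene_id
with soma.DataFrame.create(
    soma_path,
    schema=schema,
    index_column_names=["gene_id"],
) as sdf:
    # write() takes an Arrow table, so convert the pandas data first
    sdf.write(pa.Table.from_pandas(data, schema=schema, preserve_index=False))

print(f"SOMA DataFrame created at: {soma_path}")

# Read data back
with soma.DataFrame.open(soma_path) as sdf_read:
    read_df = sdf_read.read().concat().to_pandas()
    print("\nRead DataFrame:")
    print(read_df)

# Clean up the local directory
shutil.rmtree(soma_path)
print(f"\nCleaned up {soma_path}")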
