TileDB-SOMA
TileDB-SOMA is a Python API for efficient storage and retrieval of single-cell biological data, built on the TileDB embedded array database. It implements the SOMA ("Stack Of Matrices, Annotated") specification, enabling scalable, language-agnostic access to annotated matrices. The library is updated regularly, typically with a minor release every 1-2 months.
Common errors
- tiledb.libtiledb.TileDBError: [TileDB::StorageManager] Error: Path is not a TileDB array: ...
  cause Attempting to open a path that does not contain a valid TileDB array, or that does not contain a SOMA object of the expected type.
  fix Verify that the path exists and points to a correctly initialized SOMA object (e.g., a `DataFrame`, `Collection`, or `SparseNDArray`). Open existing objects with `soma.open()` or a type-specific method such as `soma.Collection.open()`; `create()` is only for new objects.
- ModuleNotFoundError: No module named 'tiledb'
  cause TileDB-Py (the package imported as `tiledb`), which `tiledbsoma` depends on, is not installed or not importable in the current environment.
  fix Install TileDB-Py with `pip install tiledb` (on conda-forge the package is named `tiledb-py`). It is normally installed automatically with `tiledbsoma`, but can be missing in some environments.
- ValueError: Column '...' specified as an index column but not found in schema.
  cause A name in the `index_column_names` passed to `DataFrame.create()` does not match any field in the `pyarrow.Schema`.
  fix Ensure every entry in `index_column_names` exactly matches (including case) a field name in the `pyarrow.Schema` passed to `create()`. `soma_joinid` is also accepted, since it is added to the schema automatically when omitted.
Warnings
- breaking Users who ingested BPCells data with versions 2.1.0 or 2.1.1 must re-ingest with version 2.1.2 or later to ensure data correctness.
- deprecated Support for macOS on Intel (x86_64) is being deprecated.
- gotcha `create()` requires a path where no SOMA object exists yet; calling it on an existing path raises an error. Reopen existing objects with `open()` instead.
- gotcha Define the schema carefully. Data written to a SOMA object must conform to its PyArrow schema (column names and types); mismatches can raise errors at write time.
Install
-
pip install tiledbsoma
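After installing, it can be useful to confirm which versions actually resolved, for example before acting on the version-specific warnings above. This sketch uses only the standard library (`importlib.metadata`, Python 3.8+); `installed_version` is a hypothetical helper, and `tiledb` is the PyPI distribution name of TileDB-Py.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Hypothetical helper: installed version string of a distribution, or None."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI
for dist in ("tiledbsoma", "tiledb"):
    print(dist, installed_version(dist) or "not installed")
```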
Imports
- soma
import tiledbsoma as soma
- DataFrame
from tiledbsoma import DataFrame
- Collection
from tiledbsoma import Collection
Quickstart
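import os
import shutil

import pandas as pd
import pyarrow as pa
import tiledbsoma as soma

# Define a local path for the SOMA object
soma_path = "./my_soma_df_quickstart"

# Remove leftovers from a previous run: create() fails if the path already exists.
# soma_path is a local directory here, so shutil.rmtree can delete it.
if os.path.exists(soma_path):
    shutil.rmtree(soma_path)

# Create a SOMA DataFrame. soma_joinid (int64) is not listed in the schema,
# so tiledbsoma adds it automatically.
with soma.DataFrame.create(
    soma_path,
    schema=pa.schema([
        pa.field("gene_id", pa.string()),
        pa.field("feature_val", pa.float32()),
    ]),
    index_column_names=["gene_id"],
) as sdf:
    # Prepare data: every row needs a soma_joinid, and dtypes must match the schema
    data = pd.DataFrame({
        "soma_joinid": [0, 1, 2],
        "gene_id": ["geneA", "geneB", "geneC"],
        "feature_val": [1.1, 2.2, 3.3],
    }).astype({"soma_joinid": "int64", "feature_val": "float32"})
    # write() expects a pyarrow.Table, not a pandas DataFrame
    sdf.write(pa.Table.from_pandas(data, preserve_index=False))
print(f"SOMA DataFrame created at: {soma_path}")

# Read data back
with soma.DataFrame.open(soma_path) as sdf_read:
    read_df = sdf_read.read().concat().to_pandas()
print("\nRead DataFrame:")
print(read_df)

# Clean up the local directory
shutil.rmtree(soma_path)
print(f"\nCleaned up {soma_path}")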