{"id":10295,"library":"tiledbsoma","title":"TileDB-SOMA","description":"TileDB-SOMA is a Python API for efficient storage and retrieval of single-cell biological data, building on the TileDB embedded array database. It implements the SOMA (Single Object for Multi-omic Access) specification, enabling scalable, language-agnostic access to annotated matrices. The library receives regular updates, typically with minor releases every 1-2 months.","status":"active","version":"2.3.0","language":"en","source_language":"en","source_url":"https://github.com/single-cell-data/TileDB-SOMA/tree/main/apis/python","tags":["data-science","bioinformatics","single-cell","hpc","storage","array-database"],"install":[{"cmd":"pip install tiledbsoma","lang":"bash","label":"Install with pip"}],"dependencies":[{"reason":"Core dependency for TileDB array storage backend.","package":"tiledb-py","optional":false},{"reason":"Fundamental package for numerical computation.","package":"numpy","optional":false},{"reason":"Data manipulation and analysis, common for data ingestion/retrieval.","package":"pandas","optional":false},{"reason":"Used for defining SOMA object schemas and data interchange.","package":"pyarrow","optional":false},{"reason":"Primary integration target for single-cell data.","package":"anndata","optional":false}],"imports":[{"note":"The idiomatic import for the main API.","symbol":"soma","correct":"import tiledbsoma as soma"},{"note":"Top-level symbols are directly available from `tiledbsoma`.","wrong":"from tiledbsoma.soma import DataFrame","symbol":"DataFrame","correct":"from tiledbsoma import DataFrame"},{"note":"Top-level symbols are directly available from `tiledbsoma`.","wrong":"from tiledbsoma.collection import Collection","symbol":"Collection","correct":"from tiledbsoma import Collection"}],"quickstart":{"code":"import os\nimport shutil\n\nimport pandas as pd\nimport pyarrow as pa\nimport tiledbsoma as soma\n\n# Define a local path for the SOMA object\nsoma_path = \"./my_soma_df_quickstart\"\n\n# Clean up if it exists (a local SOMA object is a directory on disk)\nif os.path.exists(soma_path):\n    shutil.rmtree(soma_path)\n\n# Define the Arrow schema; soma_joinid is the row identifier every SOMA DataFrame carries\nschema = pa.schema([\n    pa.field(\"soma_joinid\", pa.int64()),\n    pa.field(\"gene_id\", pa.string()),\n    pa.field(\"feature_val\", pa.float32()),\n])\n\n# Create a SOMA DataFrame indexed on gene_id\n# Recent releases also accept a domain= argument here; consult the DataFrame.create docs for your version\nwith soma.DataFrame.create(\n    soma_path,\n    schema=schema,\n    index_column_names=[\"gene_id\"],\n) as sdf:\n    # Prepare data with dtypes that match the schema\n    data = pd.DataFrame({\n        \"soma_joinid\": [0, 1, 2],\n        \"gene_id\": [\"geneA\", \"geneB\", \"geneC\"],\n        \"feature_val\": [1.1, 2.2, 3.3],\n    }).astype({\"feature_val\": \"float32\"})\n    # write() expects a pyarrow.Table, so convert the pandas DataFrame first\n    sdf.write(pa.Table.from_pandas(data, schema=schema, preserve_index=False))\n\nprint(f\"SOMA DataFrame created at: {soma_path}\")\n\n# Read data back\nwith soma.DataFrame.open(soma_path) as sdf_read:\n    read_df = sdf_read.read().concat().to_pandas()\n    print(\"\\nRead DataFrame:\")\n    print(read_df)\n\n# Clean up\nshutil.rmtree(soma_path)\nprint(f\"\\nCleaned up {soma_path}\")\n","lang":"python","description":"This quickstart creates a simple SOMA DataFrame from a PyArrow schema, converts pandas data to a PyArrow table and writes it, then reads the data back. It covers the fundamental create, write, and read operations for a basic SOMA object."},"warnings":[{"fix":"Upgrade to TileDB-SOMA 2.1.2 or newer and re-ingest any BPCells data created with versions 2.1.0 or 2.1.1.","message":"Users who ingested BPCells data with versions 2.1.0 or 2.1.1 must re-ingest with version 2.1.2 or later to ensure data correctness.","severity":"breaking","affected_versions":"2.1.0, 2.1.1"},{"fix":"Users on Intel-based macOS systems should consider migrating to Apple Silicon (M-series) for continued support and optimal performance. Future versions may drop Intel support entirely.","message":"Support for macOS on Intel is being deprecated.","severity":"deprecated","affected_versions":">=2.2.0"},{"fix":"Before creating a new SOMA object, ensure the path does not exist, or remove the existing object first if overwriting is intended (for a local path, `shutil.rmtree(path)` is one option).","message":"SOMA object paths must be unique for creation. 
Attempting to create an object at an existing path will raise an error.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Carefully define your `pyarrow.Schema` when creating SOMA objects. Ensure the `dtype` of each column in your input data (e.g., a pandas DataFrame) is compatible with the corresponding PyArrow type in the schema.","message":"Data written to a SOMA object must conform to the PyArrow schema defined at creation. Type mismatches can lead to errors.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Verify that the path exists and points to a correctly initialized SOMA object (e.g., a `DataFrame`, `Collection`, `SparseNDArray`). Use `soma.open()` or `soma.Collection.open()` for existing objects, not `create()`.","cause":"Attempting to open a path that does not contain a valid TileDB array or a SOMA object of the expected type.","error":"tiledb.libtiledb.TileDBError: [TileDB::StorageManager] Error: Path is not a TileDB array: ..."},{"fix":"Install the TileDB-Py package, which is published on PyPI as `tiledb`: `pip install tiledb`. It is typically installed automatically with `tiledbsoma`, but might be missing in some environments.","cause":"The TileDB-Py package (imported as `tiledb`) is not installed or not accessible in the current environment.","error":"ModuleNotFoundError: No module named 'tiledb'"},{"fix":"Ensure that the column names listed in `index_column_names` exactly match field names in the `pyarrow.Schema` used to create the DataFrame.","cause":"The `index_column_names` passed to `DataFrame.create()` do not match any fields in the `pyarrow.Schema`.","error":"ValueError: Column '...' specified as an index column but not found in schema."}]}