cellxgene-census

1.17.0 verified Mon Apr 27 auth: no python

Python API to facilitate access to and analysis of the CZ CELLxGENE Discover Census, a curated collection of single-cell RNA-seq data. As of v1.17.0, requires Python 3.10-3.12. Released monthly.

pip install cellxgene-census
error tiledb.cc.TileDBError: [TileDB::Query] Error: Cannot read from non-existent fragment
cause The requested Census version has been retired and its underlying data was removed from the remote server.
fix
Use a more recent census_version string (e.g., '2023-12-01' or later), or omit the argument to use the default 'stable' release.
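To find out which versions can still be opened, the package provides `cellxgene_census.get_census_version_directory()`. The sketch below assumes that call returns a mapping keyed by version name (date strings plus aliases such as 'stable' and 'latest'). The helper `newest_dated_version` is hypothetical, and the directory shown is a stand-in with placeholder dates.

```python
import re

# Matches date-stamped version tags such as '2023-12-01'.
DATE_TAG = re.compile(r"\d{4}-\d{2}-\d{2}")

def newest_dated_version(directory):
    """Return the most recent YYYY-MM-DD key, ignoring aliases such as 'stable'."""
    dated = [name for name in directory if DATE_TAG.fullmatch(name)]
    if not dated:
        raise ValueError("no date-stamped Census versions found")
    # ISO dates sort chronologically when compared as strings.
    return max(dated)

# Stand-in for the real directory; keys and values here are placeholders.
example_directory = {"stable": {}, "latest": {}, "2023-07-25": {}, "2023-12-15": {}}
print(newest_dated_version(example_directory))
```

With the package installed, the same helper could be called as `newest_dated_version(cellxgene_census.get_census_version_directory())`.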
error KeyError: 'feature_id not found in index'
cause The var_value_filter references a column name that doesn't exist in the var dataframe for that organism/measurement.
fix
Check available columns: `list(census['census_data']['homo_sapiens'].ms['RNA'].var.keys())` returns the valid column names for that organism and measurement.
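A small pre-flight check can catch a bad column name before the query runs. The sketch below is a hypothetical helper, not part of cellxgene-census, and the `available` list is an illustrative placeholder for the names read from the var dataframe.

```python
def missing_columns(requested, available):
    """Return the requested column names that are not available, in request order."""
    available_set = set(available)
    return [name for name in requested if name not in available_set]

# Placeholder for the var column names of the chosen organism and measurement.
available = ["soma_joinid", "feature_id", "feature_name", "feature_length"]

# 'gene_symbol' is a deliberately wrong name used to trigger the check.
print(missing_columns(["feature_id", "gene_symbol"], available))
```

An empty result means every column referenced in the filter exists.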
breaking Census versions are date-stamped (e.g., '2023-12-01') and older versions are periodically retired. Always specify a valid, recent version or use 'latest' with caution.
fix Pin to a specific version string or use the 'latest' alias, but be aware that 'latest' may change under you.
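One way to make the pinned-versus-floating distinction explicit in a pipeline is to classify the version string before opening the Census. The sketch below is a hypothetical helper. It assumes the only floating aliases are 'stable' and 'latest', and that pinned versions are always YYYY-MM-DD strings.

```python
import re
from datetime import date

# Assumed floating aliases; these may resolve to a different release over time.
ALIASES = {"stable", "latest"}

def is_pinned(census_version):
    """Return True for a valid YYYY-MM-DD tag and False for a floating alias."""
    if census_version in ALIASES:
        return False
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", census_version):
        raise ValueError(f"not a Census version tag: {census_version!r}")
    # Raises ValueError for impossible dates such as month 13.
    date.fromisoformat(census_version)
    return True

print(is_pinned("2023-12-01"))
print(is_pinned("latest"))
```

A reproducible analysis could refuse to run, or log a warning, whenever `is_pinned` returns False.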
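deprecated In `cellxgene_census.get_anndata`, the `column_names` parameter is deprecated in favor of `obs_column_names` and `var_column_names`. `X_name` and `X_layers` are separate parameters, not a rename: `X_name` selects the layer loaded into `adata.X`, and `X_layers` loads additional layers into `adata.layers`. Likewise, `obs_value_filter` (a filter expression string) and `obs_coords` (row coordinates) coexist. Passing a keyword that the function does not define raises TypeError.
fix Update calls: replace `column_names={'obs': [...], 'var': [...]}` with `obs_column_names=[...]` and `var_column_names=[...]`; keep `X_name` and `obs_value_filter` as they are.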
gotcha The Census uses TileDB arrays; concurrent writes are not allowed. Multiple processes opening the same Census for writing will corrupt data.
fix Open the Census with `open_soma()`, ideally in a `with` block. It opens the data read-only and takes no `mode` argument. Never write to a local Census copy from several processes at once.

Open a pinned Census version (a year-month-day string) and fetch the expression data for selected genes.

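from cellxgene_census import open_soma, get_anndata

# Pin a date-stamped release; omit census_version to use the default.
with open_soma(census_version='2023-12-01') as census:
    adata = get_anndata(
        census=census,
        organism='Homo sapiens',
        measurement_name='RNA',
        # The filter is a single expression string; the gene list is written inline.
        var_value_filter="feature_id in ['ENSG00000161798', 'ENSG00000188229']",
    )
    # Shape is (cells, genes); with no obs_value_filter, all cells are returned.
    print(adata.shape)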
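When the gene list comes from a variable rather than a literal, the filter expression has to be assembled as a string. The sketch below is a hypothetical helper, not part of cellxgene-census. It assumes the values contain no quote characters.

```python
def in_filter(column, values):
    """Build a value-filter expression such as "feature_id in ['A', 'B']"."""
    if not values:
        raise ValueError("values must not be empty")
    # repr() of a plain string yields a single-quoted literal.
    quoted = ", ".join(repr(str(value)) for value in values)
    return f"{column} in [{quoted}]"

genes = ["ENSG00000161798", "ENSG00000188229"]
print(in_filter("feature_id", genes))
```

The resulting string can be passed as `var_value_filter`, or as `obs_value_filter` for a cell-level column.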