Annotated Data (anndata)
anndata is a Python package for efficiently handling annotated data matrices, both in memory and on disk. Positioned between pandas and xarray, it provides sparse data support, lazy operations, and a PyTorch interface, and it is a core data structure in single-cell analysis workflows. The library is under active development, with frequent patch releases that fix bugs and add features on top of its minor and major versions.
Warnings
- gotcha Subsetting an AnnData object (e.g., `adata_subset = adata[:, list_of_vars]`) returns a 'view' of the original object, not a full copy. Modifying most attributes of a view (for example, adding a column to `.obs`) emits an `ImplicitModificationWarning` and converts the view into an independent AnnData object (copy-on-modify). Assigning to `.X` on a view, however, *can* write through to the underlying original AnnData object. Always call `.copy()` explicitly on a subset (`adata_subset = adata[...].copy()`) if you intend to make independent changes.
- breaking Starting with `anndata 0.11.0`, support for Python 3.9 has been dropped. If you are using an older Python version, you will need to upgrade your Python environment to 3.10 or higher to use `anndata 0.11.0` and later.
- breaking The top-level `anndata.read_*` functions (e.g., `anndata.read_h5ad`) have moved to the `anndata.io` module. Direct imports such as `from anndata import read_h5ad` still work, but the `anndata.io` module is the recommended location for read/write functions.
- deprecated The `anndata.__version__` attribute is deprecated. For programmatic version checking, use `importlib.metadata.version('anndata')` instead.
- gotcha Using `anndata.concat()` with `join='outer'` on sparse datasets can significantly increase file size and memory consumption due to the explicit filling of missing variables with zeros. This can quickly lead to out-of-memory errors for large datasets.
- gotcha Writing keys that contain forward slashes to `.h5ad` files (e.g., `adata.uns['my/nested/key']`) was re-allowed in `0.12.3` but will be disallowed in future versions. Because HDF5 treats `/` as a group path separator, such keys can be interpreted as nested groups, which can lead to corrupted file structures or errors on subsequent reads.
Install
pip install anndata
Imports
- AnnData
import anndata as ad
adata = ad.AnnData(...)
Quickstart
import anndata as ad
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
# Create a sparse data matrix
counts = csr_matrix(np.random.poisson(1, size=(10, 5)), dtype=np.float32)
# Create observation (cell) and variable (gene) metadata
obs_data = pd.DataFrame({
'cell_type': ['T cell', 'B cell', 'T cell', 'NK cell', 'B cell', 'T cell', 'NK cell', 'B cell', 'T cell', 'NK cell'],
'patient': ['P1', 'P1', 'P2', 'P1', 'P2', 'P1', 'P2', 'P1', 'P2', 'P2']
}, index=[f'Cell_{i}' for i in range(10)])
var_data = pd.DataFrame({
'gene_name': [f'Gene_{i}' for i in range(5)],
'chromosome': ['chr1', 'chr2', 'chr1', 'chr3', 'chr2']
}, index=[f'Gene_{i}' for i in range(5)])
# Initialize an AnnData object
adata = ad.AnnData(X=counts, obs=obs_data, var=var_data)
print(adata)
print(adata.obs.head())
print(adata.var.head())
print(adata.X.shape)