Annotated Data (anndata)

0.12.10 verified Fri Apr 24 auth: no python

anndata is a Python package for handling annotated data matrices, both in memory and on disk. Positioned between pandas and xarray, it offers sparse data support, lazy operations, and a PyTorch interface, and is widely used in single-cell data analysis workflows. The library is actively developed, with frequent patch releases between minor versions.

pip install anndata
error ModuleNotFoundError: No module named 'anndata'
cause The 'anndata' package is not installed in the Python environment.
fix
Install the package using pip: 'pip install anndata'.
error ImportError: cannot import name 'PathLike' from 'anndata.compat'
cause An outdated version of 'anndata' is installed, lacking the 'PathLike' attribute in 'anndata.compat'.
fix
Upgrade 'anndata' to the latest version using pip: 'pip install --upgrade anndata'.
error ValueError: setting an array element with a sequence
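cause A SciPy sparse matrix was passed to code that expects a dense NumPy array, typically a NumPy conversion or a downstream function. 'adata.X' itself can hold a sparse matrix, so the assignment alone is usually not the source.
fix
Convert to a dense array only where one is required: 'dense = adata.layers['counts'].toarray()'. Densifying a large matrix can use a lot of memory.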
error ValueError: Output dtype not compatible with inputs.
cause Subsetting an AnnData object with incompatible data types in 'X' can lead to this error.
fix
Ensure that 'X' has a compatible data type before subsetting, or convert it appropriately.
error TypeError: float() argument must be a string or a number, not 'csr_matrix'
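cause A sparse matrix reached code that converts its input to floats element by element, which a 'csr_matrix' does not support. Storing a sparse matrix in 'adata.X' is allowed; the failure usually comes from a later dense-only operation.
fix
Convert the sparse matrix to a dense array at the point where a dense array is needed: 'dense = adata.layers['counts'].toarray()'.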
gotcha Subsetting an AnnData object (e.g., `adata_subset = adata[:, list_of_vars]`) typically returns a view of the original object, not a copy. Modifying most attributes of a view triggers copy-on-modify, which turns the view into an independent AnnData object. Assigning to `.X` on a view is the exception: it *can* write through to the original AnnData object.
fix Use `adata_subset = adata[...].copy()` to ensure you're working with an independent copy. Be mindful when modifying `.X` directly on a view.
breaking Starting with `anndata 0.11.0`, Python 3.9 is no longer supported; `anndata 0.11.0` and later require Python 3.10 or higher.
fix Upgrade your Python environment to version 3.10 or newer.
breaking The top-level `anndata.read_*` functions (e.g., `anndata.read_h5ad`) have moved to the `anndata.io` module. Direct imports such as `from anndata import read_h5ad` still work, but `anndata.io` is the recommended location for all read/write operations.
fix Update import statements from `import anndata as ad; ad.read_h5ad(...)` to `import anndata.io as aio; aio.read_h5ad(...)` or `from anndata.io import read_h5ad`.
deprecated The `anndata.__version__` attribute is deprecated. For programmatic version checking, use `importlib.metadata.version('anndata')` instead.
fix Replace `anndata.__version__` with `importlib.metadata.version('anndata')`.
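A short sketch of a version lookup through `importlib.metadata`. The helper name `installed_version` is hypothetical; it returns `None` instead of raising when the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


anndata_version = installed_version("anndata")
print(anndata_version)
```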
gotcha Using `anndata.concat()` with `join='outer'` on sparse datasets can significantly increase file size and memory consumption due to the explicit filling of missing variables with zeros. This can quickly lead to out-of-memory errors for large datasets.
fix Choose the `join` strategy deliberately. `join='inner'` keeps only the variables shared by all inputs and avoids the fill; if an outer join is required, make sure enough memory and disk space are available.
gotcha Writing keys with forward slashes in `.h5ad` files (`adata.uns['my/nested/key']`) was re-allowed in `0.12.3` but will be disallowed in future versions. This can lead to corrupted file structures or errors in subsequent reads.
fix Avoid using forward slashes in keys for `.obs`, `.var`, `.uns`, etc. Use `anndata.settings.disallow_forward_slash_in_h5ad = True` to proactively enforce the future behavior and identify problematic keys.
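One way to find problematic keys before writing is to walk the nested mapping yourself. The sketch below uses a hypothetical helper, `find_slash_keys`, on a plain dictionary standing in for `adata.uns`. It is not part of anndata.

```python
from collections.abc import Mapping


def find_slash_keys(mapping, prefix=""):
    """Return the paths of all keys containing '/' in a nested mapping."""
    hits = []
    for key, value in mapping.items():
        path = f"{prefix}[{key!r}]"
        if "/" in str(key):
            hits.append(path)
        if isinstance(value, Mapping):
            # Recurse into nested mappings such as nested uns entries.
            hits.extend(find_slash_keys(value, path))
    return hits


# Stand-in for adata.uns, with illustrative keys.
uns = {"my/nested/key": 1, "params": {"a/b": 2, "ok": 3}}
hits = find_slash_keys(uns)
print(hits)  # ["['my/nested/key']", "['params']['a/b']"]
```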
runtime      status  import time  mem     disk
3.10-alpine          1.73s        42.6MB  325.0M
3.10-slim            2.09s        42.6MB  313M
3.11-alpine
3.11-slim            3.71s        51.8MB  383M
3.12-alpine
3.12-slim            4.22s        50.7MB  366M
3.13-alpine
3.13-slim            4.14s        51.5MB  364M
3.9-alpine
3.9-slim             1.88s        39.1MB  318M

This quickstart demonstrates how to create an AnnData object from a sparse matrix and annotate it with observation (cell-level) and variable (gene-level) metadata using pandas DataFrames. It then prints a summary of the AnnData object and its annotations.

import anndata as ad
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Create a sparse data matrix
counts = csr_matrix(np.random.poisson(1, size=(10, 5)), dtype=np.float32)

# Create observation (cell) and variable (gene) metadata
obs_data = pd.DataFrame({
    'cell_type': ['T cell', 'B cell', 'T cell', 'NK cell', 'B cell', 'T cell', 'NK cell', 'B cell', 'T cell', 'NK cell'],
    'patient': ['P1', 'P1', 'P2', 'P1', 'P2', 'P1', 'P2', 'P1', 'P2', 'P2']
}, index=[f'Cell_{i}' for i in range(10)])

var_data = pd.DataFrame({
    'gene_name': [f'Gene_{i}' for i in range(5)],
    'chromosome': ['chr1', 'chr2', 'chr1', 'chr3', 'chr2']
}, index=[f'Gene_{i}' for i in range(5)])

# Initialize an AnnData object
adata = ad.AnnData(X=counts, obs=obs_data, var=var_data)

print(adata)
print(adata.obs.head())
print(adata.var.head())
print(adata.X.shape)