Annotated Data (anndata)

0.12.10 · active · verified Fri Apr 10

anndata is a Python package for handling annotated data matrices in memory and on disk. Positioned between pandas and xarray, it offers sparse data support, lazy operations, and a PyTorch interface, and it is a cornerstone of single-cell data analysis workflows. The library is actively developed, with frequent patch releases on top of stable minor and major versions.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to create an AnnData object from a sparse matrix and annotate it with observation (cell-level) and variable (gene-level) metadata using pandas DataFrames. It then prints a summary of the AnnData object and its annotations.

import anndata as ad
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Create a sparse count matrix (10 cells x 5 genes); the seed makes the example reproducible
rng = np.random.default_rng(0)
counts = csr_matrix(rng.poisson(1, size=(10, 5)), dtype=np.float32)

# Create observation (cell) and variable (gene) metadata
obs_data = pd.DataFrame({
    'cell_type': ['T cell', 'B cell', 'T cell', 'NK cell', 'B cell', 'T cell', 'NK cell', 'B cell', 'T cell', 'NK cell'],
    'patient': ['P1', 'P1', 'P2', 'P1', 'P2', 'P1', 'P2', 'P1', 'P2', 'P2']
}, index=[f'Cell_{i}' for i in range(10)])

var_data = pd.DataFrame({
    'gene_name': [f'Gene_{i}' for i in range(5)],
    'chromosome': ['chr1', 'chr2', 'chr1', 'chr3', 'chr2']
}, index=[f'Gene_{i}' for i in range(5)])

# Initialize an AnnData object
adata = ad.AnnData(X=counts, obs=obs_data, var=var_data)

print(adata)
print(adata.obs.head())
print(adata.var.head())
print(adata.X.shape)
