Scanpy

1.12.1 · active · verified Sun Apr 12

Scanpy is a scalable toolkit for analyzing single-cell gene expression data, built jointly with anndata. It provides functionality for preprocessing, visualization, clustering, trajectory inference, and differential expression testing. Its Python implementation handles datasets of more than one million cells efficiently. The current version is 1.12.1, and the project publishes major, minor, and patch releases on a regular cadence.

Quickstart

This quickstart demonstrates a typical single-cell RNA sequencing (scRNA-seq) analysis workflow using Scanpy. It covers loading a dataset, essential preprocessing steps (filtering, normalization, log-transformation, highly variable gene selection, regressing out unwanted sources of variation, and scaling), dimensionality reduction (PCA, UMAP), and clustering (Leiden). Finally, it plots the UMAP embedding colored by cluster and by QC metrics. The `pbmc3k` dataset is used as a readily available example.

import scanpy as sc
import matplotlib.pyplot as plt

# Set verbosity for Scanpy (0: errors, 1: warnings, 2: info, 3: hints)
sc.settings.verbosity = 3

# Load a sample dataset (e.g., pbmc3k from 10x Genomics)
# This downloads the data into sc.settings.datasetdir if it is not already present
adata = sc.datasets.pbmc3k()

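# Basic filtering: drop near-empty cells and rarely detected genes
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# QC metrics: flag mitochondrial genes (human symbols start with "MT-") and add
# total_counts, n_genes_by_counts and pct_counts_mt to adata.obs from raw counts
adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)

# Normalize each cell to 10,000 total counts, then log-transform
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Keep the full normalized matrix in .raw, then subset to highly variable genes
# (.copy() turns the subset view into an independent AnnData object)
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
adata.raw = adata
adata = adata[:, adata.var.highly_variable].copy()

# Regress out library size and mitochondrial fraction, then scale each gene
sc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt'])
sc.pp.scale(adata, max_value=10)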

# Dimensionality reduction and clustering
sc.pp.pca(adata)
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)
sc.tl.umap(adata)
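# Leiden clustering with the igraph backend (requires the igraph package);
# the igraph flavor also requires directed=False
sc.tl.leiden(adata, flavor='igraph', n_iterations=2, directed=False)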

# Visualization
sc.pl.umap(adata, color=['leiden', 'n_genes_by_counts', 'total_counts'], show=False)
plt.tight_layout()
plt.show()
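The filtering calls `sc.pp.filter_cells(adata, min_genes=200)` and `sc.pp.filter_genes(adata, min_cells=3)` count nonzero entries per cell and per gene, and `pct_counts_mt` is the share of a cell's counts coming from mitochondrial genes. The arithmetic can be sketched in NumPy as below. This is a minimal sketch assuming a small dense cells × genes count matrix; `qc_and_filter` is a hypothetical helper, not part of Scanpy, which itself works on (often sparse) AnnData objects.

```python
import numpy as np


def qc_and_filter(counts, gene_names, min_genes=200, min_cells=3, mt_prefix="MT-"):
    """Hypothetical helper (not a Scanpy function): filter a dense
    cells x genes count matrix, then compute per-cell QC metrics."""
    counts = np.asarray(counts, dtype=float)
    gene_names = np.asarray(gene_names, dtype=str)

    # Like filter_cells(min_genes=...): keep cells with >= min_genes detected genes
    genes_per_cell = (counts > 0).sum(axis=1)
    counts = counts[genes_per_cell >= min_genes]

    # Like filter_genes(min_cells=...): keep genes detected in >= min_cells cells
    cells_per_gene = (counts > 0).sum(axis=0)
    keep = cells_per_gene >= min_cells
    counts, gene_names = counts[:, keep], gene_names[keep]

    # QC columns analogous to calculate_qc_metrics, computed on raw counts
    total_counts = counts.sum(axis=1)
    n_genes_by_counts = (counts > 0).sum(axis=1)
    is_mt = np.char.startswith(gene_names, mt_prefix)
    pct_counts_mt = 100 * counts[:, is_mt].sum(axis=1) / total_counts

    qc = {
        "total_counts": total_counts,
        "n_genes_by_counts": n_genes_by_counts,
        "pct_counts_mt": pct_counts_mt,
    }
    return counts, gene_names, qc


# Toy example: 3 cells x 4 genes, thresholds lowered to fit the toy data
toy = [[1, 0, 2, 0], [0, 0, 0, 0], [3, 1, 0, 1]]
genes = ["MT-CO1", "GENE_A", "GENE_B", "GENE_C"]
filtered, kept_genes, qc = qc_and_filter(toy, genes, min_genes=2, min_cells=1)
print(filtered.shape, qc["pct_counts_mt"])
```

Filtering before computing the QC columns mirrors the order used in the quickstart, so the metrics describe only the cells and genes that survive.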
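`sc.pp.normalize_total(adata, target_sum=1e4)` rescales each cell so that its counts sum to 10,000, and `sc.pp.log1p` then applies the natural logarithm of 1 + x. A NumPy sketch of that arithmetic follows; `normalize_and_log` is a hypothetical helper, and leaving all-zero cells at zero is an assumption of this sketch.

```python
import numpy as np


def normalize_and_log(counts, target_sum=1e4):
    """Hypothetical helper: per-cell total-count normalization followed by log1p."""
    counts = np.asarray(counts, dtype=float)
    # Library size (total counts) of each cell, kept as a column for broadcasting
    library_size = counts.sum(axis=1, keepdims=True)
    # Assumption: all-zero cells are divided by 1 so they stay at zero
    library_size = np.where(library_size == 0, 1.0, library_size)
    # Rescale every cell so that its counts sum to target_sum
    normalized = counts / library_size * target_sum
    # Natural logarithm of (1 + x), applied element-wise
    return np.log1p(normalized)


logged = normalize_and_log([[1, 3], [2, 2], [0, 0]], target_sum=10)
print(np.expm1(logged).sum(axis=1))
```

Undoing the log with `np.expm1` recovers per-cell totals equal to `target_sum`, which is a quick way to check that normalization happened before the log transform.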
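`sc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt'])` removes the linear effect of per-cell covariates from every gene. The sketch below assumes an ordinary-least-squares fit with an intercept and returns the residuals; `regress_out_ols` is hypothetical, handles numeric covariates only, and Scanpy's own fitting backend and residual conventions may differ.

```python
import numpy as np


def regress_out_ols(X, covariates):
    """Hypothetical helper: remove the linear effect of per-cell covariates
    from every gene (column) of X via ordinary least squares."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(covariates, dtype=float)
    if C.ndim == 1:
        C = C[:, None]  # a single covariate becomes one column
    # Design matrix: intercept column plus one column per covariate
    design = np.column_stack([np.ones(C.shape[0]), C])
    # Fit all genes at once: design @ beta approximates X column by column
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    # Residuals are the part of each gene not explained by the covariates
    return X - design @ beta


# Toy data: gene 0 is exactly 2 + 3 * c, gene 1 alternates around zero
c = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([2 + 3 * c, [1.0, -1.0, 1.0, -1.0]])
residuals = regress_out_ols(X, c)
```

A gene that is fully explained by the covariate ends up with zero residuals, while the remaining signal in other genes is left uncorrelated with the covariate.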
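`sc.pp.scale(adata, max_value=10)` standardizes each gene to zero mean and unit variance and then clips extreme values at 10. A NumPy sketch follows. The hypothetical `scale_genes` uses the sample (ddof=1) standard deviation and clips at both -max_value and +max_value; both choices are assumptions to check against the documentation for your Scanpy version.

```python
import numpy as np


def scale_genes(X, max_value=10.0):
    """Hypothetical helper: z-score each gene (column) and clip extreme values."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Sample standard deviation (ddof=1); the convention here is an assumption
    std = X.std(axis=0, ddof=1)
    # Constant genes would divide by zero; treat their std as 1 so they become 0
    std[std == 0] = 1.0
    Z = (X - mean) / std
    # Clip to [-max_value, max_value]; symmetric clipping is an assumption
    return np.clip(Z, -max_value, max_value)


# 200 cells: gene 0 is zero except for one outlier cell, gene 1 is constant
X = np.zeros((200, 2))
X[0, 0] = 1.0
Z = scale_genes(X, max_value=10.0)
```

Without clipping, the single outlier cell in gene 0 would reach a z-score of about 14; the cap keeps rare, extreme values from dominating the PCA that follows.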
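`sc.pp.pca` projects the scaled matrix onto its leading principal components, which `sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)` then uses to build the neighborhood graph. Conceptually this is a singular value decomposition of the gene-centered matrix; the hypothetical `pca_svd` below shows that computation in NumPy. Scanpy's actual solver (for example ARPACK or a randomized method) may differ, and component signs are arbitrary.

```python
import numpy as np


def pca_svd(X, n_comps=2):
    """Hypothetical helper: PCA of a cells x genes matrix via SVD."""
    X = np.asarray(X, dtype=float)
    # Center each gene; PCA describes variation around the mean
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Cell coordinates in PC space (the role adata.obsm["X_pca"] plays)
    scores = U[:, :n_comps] * S[:n_comps]
    # Gene loadings, one column per component (the role adata.varm["PCs"] plays)
    loadings = Vt[:n_comps].T
    # Variance captured by each component
    variance = S[:n_comps] ** 2 / (X.shape[0] - 1)
    return scores, loadings, variance


# Five cells lying exactly on the direction (3, 4) / 5 in a 2-gene space
X = np.outer([-2.0, -1.0, 0.0, 1.0, 2.0], [3.0, 4.0])
scores, loadings, variance = pca_svd(X, n_comps=1)
```

Because these toy cells lie on a single line, one component reconstructs the data exactly (`scores @ loadings.T` equals the centered matrix), which is the same idea that lets 40 PCs summarize thousands of genes.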
