Scanpy
Scanpy is a scalable toolkit for analyzing single-cell gene expression data, built jointly with anndata. It covers preprocessing, visualization, clustering, trajectory inference, and differential expression testing. The Python-based implementation efficiently handles datasets of more than one million cells. The current version is 1.12.1, and the project follows a regular release cadence with major, minor, and patch releases.
Warnings
- breaking Scanpy 1.12.0 dropped support for Python versions older than 3.12 and requires anndata>=0.10, so older environments must be upgraded before installing it. Earlier, Scanpy 1.10.4 dropped Python 3.9 support.
- gotcha Scanpy's internal (non-public) APIs are not officially supported and may change without notice in minor or patch releases. Stick to the documented public API.
- gotcha `sc.tl.leiden` clustering results have been reported to differ between Scanpy minor versions (e.g., 1.9.3 vs 1.10.4), even with `random_state` set. This may be due to changes in underlying dependencies such as `numpy`, or to changes in the behavior of `sc.pp.neighbors`.
- deprecated The `scanpy.__version__` attribute is deprecated. Use `scanpy.version()` instead.
- deprecated Some functions in `scanpy.pp` (the preprocessing module) raise `FutureWarning` ahead of upcoming behavior changes.
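Because several of the warnings above depend on which versions are installed, it can help to record them next to any results. The following is a minimal sketch using only the standard library. The helper names `collect_versions` and `python_is_supported` and the default package list are hypothetical choices for illustration, not part of Scanpy.

```python
import sys
from importlib import metadata


def collect_versions(packages=("scanpy", "anndata", "numpy")):
    """Map each package name to its installed version, or None if absent."""
    versions = {"python": ".".join(map(str, sys.version_info[:3]))}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


def python_is_supported(minimum=(3, 12)):
    """Compare the running interpreter with the stated minimum version."""
    return sys.version_info[:2] >= tuple(minimum)


if __name__ == "__main__":
    for name, ver in collect_versions().items():
        print(f"{name}: {ver or 'not installed'}")
    print("Python meets the 3.12 minimum:", python_is_supported())
```

Saving this dictionary alongside clustering output makes it easier to tell whether a difference between two runs coincides with a version change.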
Install
- pip install scanpy
- pip install 'scanpy[leiden]'
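After installing, a quick check can confirm that the optional clustering dependencies are importable before running a pipeline. This sketch assumes the `leiden` extra provides the `leidenalg` and `igraph` modules. The helper name `missing_modules` is hypothetical.

```python
from importlib.util import find_spec


def missing_modules(modules=("scanpy", "leidenalg", "igraph")):
    """Return the top-level module names that cannot be located for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("scanpy and the Leiden dependencies are importable")
```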
Imports
- scanpy
import scanpy as sc
- AnnData
import anndata as ad
Quickstart
import scanpy as sc
import matplotlib.pyplot as plt

# Set verbosity for Scanpy (0: errors, 1: warnings, 2: info, 3: hints)
sc.settings.verbosity = 3

# Load a sample dataset (pbmc3k from 10x Genomics)
# This downloads the data if not present in the datasetdir
adata = sc.datasets.pbmc3k()

# Basic preprocessing pipeline
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# Compute the QC metrics used below (total_counts, pct_counts_mt, n_genes_by_counts)
adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)

sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
# Copy so that later steps modify a real AnnData object rather than a view
adata = adata[:, adata.var.highly_variable].copy()
sc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt'])
sc.pp.scale(adata, max_value=10)

# Dimensionality reduction and clustering
sc.pp.pca(adata)
sc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)
sc.tl.umap(adata)
sc.tl.leiden(adata)  # needs the optional 'leiden' extra from the Install section

# Visualization
sc.pl.umap(adata, color=['leiden', 'n_genes_by_counts', 'total_counts'], show=False)
plt.tight_layout()
plt.show()
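To make the two normalization calls in the quickstart concrete, here is a minimal NumPy sketch of what `normalize_total` with `target_sum=1e4` followed by `log1p` computes on a small dense matrix. It is not Scanpy's implementation. The function name `normalize_and_log` and the toy matrix are hypothetical, and the sketch ignores sparse input and the functions' other options.

```python
import numpy as np


def normalize_and_log(counts, target_sum=1e4):
    """Scale each row (cell) to sum to target_sum, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for empty cells (a choice made for this sketch).
    totals[totals == 0] = 1.0
    scaled = counts / totals * target_sum
    return np.log1p(scaled)


# Toy matrix: 2 cells (rows) by 3 genes (columns).
toy = np.array([[1, 3, 0], [10, 0, 10]])
result = normalize_and_log(toy)

# Undoing the log recovers the scaled counts, whose rows sum to target_sum.
row_sums = np.expm1(result).sum(axis=1)
print(row_sums)
```

The point of the scaling step is that cells sequenced at different depths become comparable. The log transform then compresses the range of highly expressed genes.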