{"id":5050,"library":"scanpy","title":"Scanpy","description":"Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It provides comprehensive functionalities for preprocessing, visualization, clustering, trajectory inference, and differential expression testing. The Python-based implementation efficiently handles datasets of more than one million cells. Currently at version 1.12.1, it maintains a regular release cadence with major, minor, and patch releases.","status":"active","version":"1.12.1","language":"en","source_language":"en","source_url":"https://github.com/scverse/scanpy","tags":["single-cell","bioinformatics","data-analysis","genomics","RNA-seq","anndata"],"install":[{"cmd":"pip install scanpy","lang":"bash","label":"Basic installation"},{"cmd":"pip install 'scanpy[leiden]'","lang":"bash","label":"Recommended for clustering (includes igraph and leidenalg)"}],"dependencies":[{"reason":"Core data structure for annotated data matrices. 
Scanpy is built around AnnData objects.","package":"anndata","optional":false},{"reason":"Required for graph-based clustering such as Leiden; installed by the `scanpy[leiden]` extra.","package":"igraph","optional":true},{"reason":"Required for the Leiden clustering algorithm; installed by the `scanpy[leiden]` extra.","package":"leidenalg","optional":true},{"reason":"Enables out-of-core workflows and parallel processing for some functions, especially with large datasets.","package":"dask","optional":true}],"imports":[{"note":"The conventional alias for Scanpy.","symbol":"scanpy","correct":"import scanpy as sc"},{"note":"The `anndata` package is conventionally imported as `ad`, which makes the `AnnData` class available as `ad.AnnData`. Scanpy readers and dataset loaders also return AnnData objects.","symbol":"AnnData","correct":"import anndata as ad"}],"quickstart":{"code":"import scanpy as sc\nimport matplotlib.pyplot as plt\n\n# Set verbosity for Scanpy (0: errors, 1: warnings, 2: info, 3: hints)\nsc.settings.verbosity = 3\n\n# Load the pbmc3k sample dataset from 10x Genomics\n# This downloads the data if not present in the datasetdir\nadata = sc.datasets.pbmc3k()\n\n# Basic filtering\nsc.pp.filter_cells(adata, min_genes=200)\nsc.pp.filter_genes(adata, min_cells=3)\n\n# Compute the QC metrics used below (total_counts, pct_counts_mt, n_genes_by_counts)\nadata.var['mt'] = adata.var_names.str.startswith('MT-')\nsc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)\n\n# Normalization and highly variable gene selection\nsc.pp.normalize_total(adata, target_sum=1e4)\nsc.pp.log1p(adata)\nsc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)\n# Copy so that later in-place steps operate on a real AnnData, not a view\nadata = adata[:, adata.var.highly_variable].copy()\nsc.pp.regress_out(adata, ['total_counts', 'pct_counts_mt'])\nsc.pp.scale(adata, max_value=10)\n\n# Dimensionality reduction and clustering\nsc.pp.pca(adata)\nsc.pp.neighbors(adata, n_neighbors=10, n_pcs=40)\nsc.tl.umap(adata)\nsc.tl.leiden(adata)\n\n# Visualization\nsc.pl.umap(adata, color=['leiden', 'n_genes_by_counts', 'total_counts'], show=False)\nplt.tight_layout()\nplt.show()","lang":"python","description":"This quickstart demonstrates a typical single-cell RNA sequencing (scRNA-seq) analysis workflow using Scanpy. 
It covers loading a dataset, essential preprocessing steps (filtering, normalization, log-transformation, highly variable gene selection, regression, scaling), dimensionality reduction (PCA, UMAP), and clustering (Leiden). Finally, it visualizes the UMAP embedding colored by cluster and QC metrics. The `pbmc3k` dataset is used as a readily available example."},"warnings":[{"fix":"Upgrade Python to 3.12 or newer and make sure `anndata` is at least version 0.10. Running `pip install 'scanpy>=1.12.0'` pulls in a compatible `anndata` automatically.","message":"Scanpy 1.12.0 removed support for Python versions older than 3.12 and now requires anndata>=0.10, so users on older Python environments need to upgrade. Earlier, Scanpy 1.10.4 removed Python 3.9 support.","severity":"breaking","affected_versions":"1.10.4, 1.12.0 and later"},{"fix":"Refer to the official API documentation for supported functions and classes. If a desired feature is not in the public API, consider opening an issue on the Scanpy GitHub repository.","message":"Scanpy's internal (non-public) APIs are not officially supported and may change or disappear in minor or patch releases, breaking code that calls them directly. Stick to the documented public API.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Where reproducibility is critical, pin exact versions of all major dependencies (Scanpy, anndata, numpy, scikit-learn, leidenalg, igraph) and record the full environment with `pip freeze` or similar. Bit-for-bit reproducibility across all environments may still not be achievable.","message":"`sc.tl.leiden` clustering results, even with `random_state` set, have been reported to differ between Scanpy minor versions (e.g., 1.9.3 vs 1.10.4). 
This may be due to changes in underlying dependencies such as `numpy` or in the behavior of `sc.pp.neighbors`.","severity":"gotcha","affected_versions":"Versions 1.10.0 and later (potentially earlier for specific dependency combinations)"},{"fix":"Replace `sc.__version__` with `importlib.metadata.version('scanpy')` from the Python standard library to retrieve the installed version.","message":"The `scanpy.__version__` attribute is deprecated. Use `importlib.metadata.version('scanpy')` instead.","severity":"deprecated","affected_versions":"1.11.5 and later"},{"fix":"Watch for `FutureWarning` messages in your console output and adapt your code to the new patterns or parameters named in the warnings or release notes.","message":"Some functions in `scanpy.pp` (the preprocessing module) raise `FutureWarning` ahead of upcoming behavior changes.","severity":"deprecated","affected_versions":"1.11.0.dev11+g0cfd0224 and later (pre-release leading to 1.11.0)"}],"env_vars":null,"last_verified":"2026-04-12T00:00:00.000Z","next_check":"2026-07-11T00:00:00.000Z"}