scvi-tools

1.4.2 · active · verified Fri Apr 17

scvi-tools provides a suite of deep probabilistic models for analyzing single-cell omics data. It is built on PyTorch and AnnData and covers tasks such as dimensionality reduction, batch correction, and differential expression. The library is actively developed, with frequent minor releases and occasional major updates.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates the core workflow: loading an AnnData object, registering it with `scvi.model.SCVI.setup_anndata`, initializing and training an `SCVI` model, and then extracting the latent representation and normalized expression. Each scvi-tools model requires its own `setup_anndata` call on the data before the model is constructed.

import scvi
import scanpy as sc

# For reproducibility
scvi.settings.seed = 0

# Load a dataset
# In a real scenario, you'd load your own AnnData object
# For this example, we'll use a built-in dataset
adata = scvi.data.pbmc_dataset()

# Keep the raw counts in a dedicated layer (scVI is trained on raw counts)
adata.layers["counts"] = adata.X.copy()

# Required step: register the AnnData object with the model class
# This tells scvi-tools which layer holds the counts and which obs column is the batch
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")

# Initialize the SCVI model
model = scvi.model.SCVI(adata, n_latent=30, n_layers=2)

# Train the model
# This can take several minutes depending on hardware and dataset size
model.train()

# Get latent representation and store it in adata.obsm
adata.obsm["X_scVI"] = model.get_latent_representation()

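# Get normalized expression and store it in adata.layers
# return_numpy=True returns an array instead of a DataFrame
# To project all cells onto one batch, pass transform_batch=<a category of adata.obs["batch"]>
adata.layers["scvi_normalized"] = model.get_normalized_expression(return_numpy=True)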

print(f"Latent representation shape: {adata.obsm['X_scVI'].shape}")
print(f"Normalized expression layer shape: {adata.layers['scvi_normalized'].shape}")

# Further analysis could involve using scanpy on the latent space
# sc.pp.neighbors(adata, use_rep="X_scVI")
# sc.tl.umap(adata)
# sc.pl.umap(adata, color=["str_labels", "batch"])
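
Because the setup step registers a counts layer, it can help to confirm that the layer really holds raw counts (non-negative whole numbers) rather than log-normalized values before training. The following is a minimal sketch of such a check, not part of scvi-tools: the helper name `looks_like_raw_counts` is hypothetical, and it is written in plain Python so it runs on nested lists or on a small dense slice of a matrix. For a full-size dataset a vectorized check would be more appropriate.

```python
from typing import Iterable


def looks_like_raw_counts(rows: Iterable[Iterable[float]]) -> bool:
    """Return True if every value is a non-negative whole number.

    Hypothetical helper for illustration; not an scvi-tools function.
    """
    for row in rows:
        for value in row:
            v = float(value)
            # NaN and infinity fail is_integer(), so they are rejected too
            if v < 0 or not v.is_integer():
                return False
    return True


# Toy matrices (cells x genes), made up for illustration
raw = [[0, 3, 1], [2, 0, 7]]
log_normalized = [[0.0, 1.386, 0.693], [1.099, 0.0, 2.079]]

print(looks_like_raw_counts(raw))
print(looks_like_raw_counts(log_normalized))
```

With an AnnData object, one way to use it would be on a small dense sample of the registered layer, converting to a dense array first if the layer is sparse.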
