scvi-tools
scvi-tools provides a suite of deep learning models for the probabilistic analysis of single-cell omics data. It is built on PyTorch and AnnData and offers methods for tasks such as dimensionality reduction, batch correction, and differential expression. The library is actively developed, with frequent minor releases and occasional major updates.
Common errors
- AttributeError: 'AnnData' object has no attribute '_scvi_data_registry'
  cause: Initializing an scvi-tools model on an AnnData object that has not been registered with `setup_anndata`.
  fix: Call the model's `setup_anndata` classmethod, for example `scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")`, before initializing the model. Releases before 0.15.0 exposed this step as the module-level function `scvi.data.setup_anndata`.
- ImportError: cannot import name 'SCVI' from 'scvi'
  cause: Importing a model class directly from the top-level `scvi` package.
  fix: Model classes live in the `scvi.model` submodule. For example, import `SCVI` with `from scvi.model import SCVI`.
- RuntimeError: CUDA error: device-side assert triggered
  cause: An assertion failed inside a GPU kernel, often because of an out-of-range index or invalid values in the input. Mismatched CUDA versions between PyTorch and your system can also surface as CUDA runtime errors. Running out of GPU memory is reported separately as a "CUDA out of memory" error.
  fix: Rerun on the CPU, or with the environment variable `CUDA_LAUNCH_BLOCKING=1`, to get a more informative traceback. Verify your PyTorch and CUDA installation with `torch.cuda.is_available()` and `torch.cuda.get_device_name(0)`. If using Conda, reinstall `scvi-tools` with `conda install -c pytorch -c conda-forge -c bioconda scvi-tools`. For out-of-memory errors, try reducing `batch_size` during training.
- TypeError: 'numpy.ndarray' object is not callable
  cause: Calling an AnnData attribute that holds an array as if it were a function (e.g., `adata.X()`), often after a data transformation.
  fix: Access AnnData attributes without call parentheses (e.g., `adata.X`, `adata.layers['counts']`, `adata.obs['cell_type']`). If in doubt, check the type of the attribute you are accessing.
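The GPU checks mentioned above can be collected into one small diagnostic. This is a minimal sketch, not part of scvi-tools: the helper name `describe_accelerator` is hypothetical, and it only relies on `torch.cuda.is_available()`, `torch.cuda.get_device_name(0)`, and `torch.version.cuda`. It returns early if PyTorch is not installed, so it is safe to run in any environment.

```python
import importlib.util


def describe_accelerator():
    """Report whether PyTorch is importable and whether it can see a CUDA GPU.

    Hypothetical helper for illustration; not an scvi-tools function.
    """
    info = {
        "torch_installed": False,
        "torch_version": None,
        "torch_cuda_build": None,
        "cuda_available": False,
        "device_name": None,
    }
    # Avoid a hard ImportError when PyTorch is missing from the environment.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch_installed"] = True
    info["torch_version"] = torch.__version__
    # CUDA version the PyTorch build was compiled against (None for CPU-only builds).
    info["torch_cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


for key, value in describe_accelerator().items():
    print(f"{key}: {value}")
```

If `torch_cuda_build` is `None`, the installed PyTorch is a CPU-only build, which would explain training that runs on the CPU even though a GPU is present.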
Warnings
- breaking Data setup changed in scvi-tools 0.15.0: the module-level `scvi.data.setup_anndata` function was replaced by a `setup_anndata` classmethod on each model (for example `scvi.model.SCVI.setup_anndata`). Code written for earlier versions must be updated when migrating.
- gotcha GPU support requires careful installation of `pytorch` with the correct CUDA version. Mismatched CUDA versions between your system, `pytorch` wheel, and potentially `cudatoolkit` in a Conda environment can lead to `RuntimeError: CUDA error` or models running slowly on CPU.
- gotcha Forgetting to call `setup_anndata` before initializing a model is a common mistake for users familiar with other single-cell libraries. Call the classmethod on the model you intend to train, for example `scvi.model.SCVI.setup_anndata(adata, ...)`. This step is mandatory for all models.
- deprecated The `transform_batch` argument in `model.get_normalized_expression()` was deprecated and then removed in version 1.4.0. Using it will raise an error.
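Code that must run against several scvi-tools versions can gate version-dependent arguments explicitly. The sketch below takes the 1.4.0 cutoff from the deprecation note above at face value. The helpers `version_tuple` and `normalized_expression_kwargs` are hypothetical names introduced for illustration, and the version parsing is deliberately simple (major.minor.patch only).

```python
import re


def version_tuple(version):
    """Turn '1.4.0' or '1.4.0rc1' into a comparable tuple such as (1, 4, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '1.3' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def normalized_expression_kwargs(scvi_version, batch=None):
    """Build keyword arguments for get_normalized_expression for a given version.

    Assumption (from the warning above): transform_batch is unavailable from 1.4.0 on.
    """
    kwargs = {}
    if batch is not None and version_tuple(scvi_version) < (1, 4, 0):
        kwargs["transform_batch"] = batch
    return kwargs


print(normalized_expression_kwargs("1.3.2", batch="batch_0"))
print(normalized_expression_kwargs("1.4.0", batch="batch_0"))
```

With a trained model, this could be used as `model.get_normalized_expression(**normalized_expression_kwargs(scvi.__version__, batch="batch_0"))`, where `"batch_0"` stands in for one of your own batch categories.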
Install
- pip install scvi-tools
- pip install 'scvi-tools[gpu]' --extra-index-url https://download.pytorch.org/whl/cu118
- conda install -c pytorch -c conda-forge -c bioconda scvi-tools
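After installing, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library. The function name `installed_versions` and the default package list are illustrative choices, not part of scvi-tools.

```python
from importlib import metadata


def installed_versions(packages=("scvi-tools", "torch", "anndata")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

A `None` entry means the package is missing from the active environment, which is a common cause of import errors when several Conda or virtualenv environments coexist.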
Imports
- scvi
import scvi
- SCVI
wrong: from scvi import SCVI
correct: from scvi.model import SCVI
- setup_anndata
outdated (before 0.15.0): from scvi.data import setup_anndata
current: scvi.model.SCVI.setup_anndata(adata, ...)
Quickstart
import scvi
import scanpy as sc
import numpy as np
# For reproducibility
scvi.settings.seed = 0
# Load a dataset
# In a real scenario, you'd load your own AnnData object
# For this example, we'll use a built-in dataset
adata = scvi.data.pbmc_dataset()
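# pbmc_dataset is assumed to store raw counts in adata.X; keep a copy in a "counts" layer
adata.layers["counts"] = adata.X.copy()
# Required step: register the AnnData object with the model class
# This records which layer holds the counts, the batch key, etc.
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch")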
# Initialize the SCVI model
model = scvi.model.SCVI(adata, n_latent=30, n_layers=2)
# Train the model
# This can take several minutes depending on hardware and dataset size
model.train()
# Get latent representation and store it in adata.obsm
adata.obsm["X_scVI"] = model.get_latent_representation()
# Get normalized expression and store it in adata.layers
# (transform_batch is omitted; see the deprecation note under Warnings)
adata.layers["scvi_normalized"] = model.get_normalized_expression()
print(f"Latent representation shape: {adata.obsm['X_scVI'].shape}")
print(f"Normalized expression layer shape: {adata.layers['scvi_normalized'].shape}")
# Further analysis could involve using scanpy on the latent space
# sc.pp.neighbors(adata, use_rep="X_scVI")
# sc.tl.umap(adata)
# sc.pl.umap(adata, color=["str_labels", "batch"])  # use label columns present in your adata.obs