{"id":4188,"library":"pydeseq2","title":"PyDESeq2","description":"PyDESeq2 is a Python implementation of the DESeq2 method for differential expression analysis (DEA) with bulk RNA-seq data. It supports single-factor and multi-factor designs, Wald tests with multiple testing correction, and optional log-fold-change (LFC) shrinkage. The library is actively maintained, with version 0.5.4 as the latest stable release. It is part of the scverse ecosystem and uses AnnData for data handling.","status":"active","version":"0.5.4","language":"en","source_language":"en","source_url":"https://github.com/owkin/PyDESeq2","tags":["bioinformatics","rna-seq","differential-expression","deseq2","anndata","data-science"],"install":[{"cmd":"pip install pydeseq2","lang":"bash","label":"Install latest version"},{"cmd":"conda install -c bioconda pydeseq2","lang":"bash","label":"Install via Bioconda"}],"dependencies":[{"reason":"Core data structure for storing count matrices and metadata, part of the scverse ecosystem.","package":"anndata","optional":false},{"reason":"Used for parsing R-style design formulas.","package":"formulaic","optional":false},{"reason":"Fundamental package for numerical computing.","package":"numpy","optional":false},{"reason":"Used for handling dataframes for counts and metadata.","package":"pandas","optional":false},{"reason":"Provides numerical optimization, special functions, and statistical distributions.","package":"scipy","optional":false},{"reason":"Machine learning utilities.","package":"scikit-learn","optional":false},{"reason":"Optional dependency for plotting results.","package":"matplotlib-base","optional":true}],"imports":[{"symbol":"DeseqDataSet","correct":"from pydeseq2.dds import DeseqDataSet"},{"symbol":"DeseqStats","correct":"from pydeseq2.ds import DeseqStats"},{"note":"Useful for getting started with built-in example data.","symbol":"load_example_data","correct":"from pydeseq2.utils import 
load_example_data"}],"quickstart":{"code":"import pandas as pd\nfrom pydeseq2.dds import DeseqDataSet\nfrom pydeseq2.ds import DeseqStats\n\n# 1. Create toy count data (built as genes x samples, then transposed) and metadata\n# In a real scenario, load with pd.read_csv('counts.csv', index_col=0).T\n# and pd.read_csv('metadata.csv', index_col=0)\ncounts_df = pd.DataFrame(\n    {\n        'sample1': [100, 50, 200, 10, 5, 150],\n        'sample2': [120, 60, 210, 12, 6, 160],\n        'sample3': [10, 5, 20, 100, 50, 15],\n        'sample4': [15, 7, 25, 110, 55, 18]\n    },\n    index=['geneA', 'geneB', 'geneC', 'geneD', 'geneE', 'geneF']\n).T  # Transpose to samples x genes\n\nmetadata = pd.DataFrame(\n    {\n        'condition': ['treated', 'treated', 'control', 'control'],\n        'batch': ['batch1', 'batch2', 'batch1', 'batch2']\n    },\n    index=['sample1', 'sample2', 'sample3', 'sample4']\n)\n\n# Ensure counts_df index (samples) matches metadata index (samples)\nassert counts_df.index.equals(metadata.index)\n\n# 2. Filter low-count genes (optional, but good practice)\ngenes_to_keep = counts_df.columns[counts_df.sum(axis=0) >= 10]\ncounts_df = counts_df[genes_to_keep]\n\n# 3. Initialize DeseqDataSet with a formulaic design\ndds = DeseqDataSet(\n    counts=counts_df,\n    metadata=metadata,\n    design='~condition'\n)\n\n# 4. Run the DESeq2 pipeline (normalization, dispersion, LFC estimation)\ndds.deseq2()\n\n# 5. Perform statistical testing; summary() runs the Wald test with\n# multiple testing correction and populates results_df\ndeseq_stats = DeseqStats(dds, contrast=['condition', 'treated', 'control'])\ndeseq_stats.summary()\n\n# 6. Apply LFC shrinkage (optional). coeff must name a column of\n# dds.obsm['design_matrix']; formulaic labels it 'condition[T.treated]' here\ndeseq_stats.lfc_shrink(coeff='condition[T.treated]')\n\n# 7. Access results\nresults = deseq_stats.results_df\nprint(results.head())\n\n# You can also access attributes directly from dds, e.g., normalized counts\n# (layers are NumPy arrays, so slice them rather than calling .head())\n# print(dds.layers['normed_counts'][:5])","lang":"python","description":"This quickstart demonstrates a typical PyDESeq2 workflow for differential expression analysis. 
It covers building toy count and metadata tables, initializing a `DeseqDataSet` with a `formulaic` design string, running the DESeq2 pipeline, performing Wald tests, and applying LFC shrinkage. Results are accessible via the `DeseqStats` object's `results_df` attribute."},"warnings":[{"fix":"Upgrade your Python environment to 3.11 or higher.","message":"Python 3.10 is no longer supported starting from PyDESeq2 v0.5.3.","severity":"breaking","affected_versions":">=0.5.3"},{"fix":"Update your `DeseqDataSet` initialization to use a `formulaic` string (e.g., `design='~condition + batch'`). Ensure your Python version is >=3.11.","message":"The `design` argument of `DeseqDataSet` changed in v0.5.0 to accept `formulaic` string formulas (e.g., `'~condition'`) instead of pandas DataFrames for the design matrix directly. Support for Python 3.9 was also dropped.","severity":"breaking","affected_versions":">=0.5.0"},{"fix":"If accessing 1D variables, check `dds.obs` or `dds.var` instead of `dds.obsm` or `dds.varm`.","message":"In v0.5.2, 1D variables stored in the `obsm` and `varm` attributes of the AnnData-like `DeseqDataSet` were moved to `obs` and `var` respectively, for better consistency with AnnData standards.","severity":"gotcha","affected_versions":">=0.5.2"},{"fix":"Upgrade PyDESeq2 to v0.5.4 or newer to ensure full compatibility with pandas 3.x.","message":"PyDESeq2 v0.5.4 includes fixes for pandas 3 data type and copy/write bugs. Users combining older PyDESeq2 versions with pandas 3.x might encounter unexpected behavior.","severity":"gotcha","affected_versions":"<0.5.4"},{"fix":"Consult the PyDESeq2 documentation for specific implementation details and known differences if comparing results with R's DESeq2.","message":"PyDESeq2 is a re-implementation of the R DESeq2 method. 
While it aims for similar results and features, there might be subtle differences in computed values or available functionalities compared to the original R package.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}