GSEApy - Gene Set Enrichment Analysis
GSEApy is a Python package for Gene Set Enrichment Analysis (GSEA) and related methods such as Enrichr, ssGSEA, and GSVA. It analyzes gene expression data to identify significantly enriched gene sets and pathways. The current version is 1.1.13; the library is actively maintained, with frequent minor releases for bug fixes, compatibility updates, and API improvements, especially for integration with bioinformatics tools and data formats.
Common errors
- TypeError: 'Gene_sets' object is not subscriptable
  cause: This error often occurs when `gseapy.dotplot` or `gseapy.load_gmt` is used with Pandas 3.0+ and an older gseapy version.
  fix: Update gseapy to the latest version (1.1.13 or newer): `pip install --upgrade gseapy`.
- AttributeError: 'int' object has no attribute 'upper'
  cause: This typically arises when gene names in your input data are not strings (e.g., integers) or contain mixed types, and gseapy attempts to call string methods like `.upper()` on them. This was partly addressed for Pandas 3.0 compatibility.
  fix: Ensure all gene identifiers in your input DataFrame's index are strings. Convert them explicitly if necessary (e.g., `df.index = df.index.astype(str)`).
- ValueError: The gene_list cannot be empty, please input a list of gene symbols. (or similar error with empty gene lists)
  cause: Using an empty or invalid gene list as input to `gseapy.enrichr` or other functions, especially in gseapy versions prior to 1.1.4.
  fix: Verify that `gene_list` contains valid gene symbols and is not empty. Update gseapy to 1.1.4 or newer for better error handling.
- GSEA results are inconsistent across different runs or machines when permutation_type='gene_set'.
  cause: A bug in gseapy v1.1.6 and v1.1.7 caused the gene name order to be inconsistent, leading to unreliable results for gene-set permutations.
  fix: Upgrade gseapy to v1.1.8 or later (`pip install --upgrade gseapy`) and re-run your analysis.
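Two of the errors above (non-string gene names and empty gene lists) can be caught before any gseapy call. The following is a minimal sketch in plain Python, not part of gseapy: `clean_gene_list` and its `uppercase` parameter are hypothetical names introduced here for illustration, and uppercasing by default is an assumption that suits human gene symbols.

```python
from typing import Iterable, List


def clean_gene_list(genes: Iterable[object], uppercase: bool = True) -> List[str]:
    """Return gene identifiers as unique, non-empty strings (hypothetical helper)."""
    cleaned: List[str] = []
    seen = set()
    for gene in genes:
        if gene is None:
            continue  # skip missing entries
        symbol = str(gene).strip()  # integers (e.g. numeric IDs) become strings
        if uppercase:
            symbol = symbol.upper()  # assumption: uppercase symbols are wanted
        if symbol and symbol not in seen:
            seen.add(symbol)
            cleaned.append(symbol)
    if not cleaned:
        # Fail early with a clear message instead of passing an empty list on
        raise ValueError("gene list is empty after cleaning")
    return cleaned


genes = clean_gene_list(["tp53", " MYC ", 7157, None, "", "TP53"])
print(genes)  # ['TP53', 'MYC', '7157']
```

The cleaned list can then be passed as `gene_list` to `gseapy.enrichr`. Pass `uppercase=False` if the gene set library uses mixed-case symbols.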
Warnings
- breaking: Versions of gseapy prior to 1.1.12 may encounter compatibility issues when used with Pandas 3.0+, leading to `AttributeError` or `TypeError` during data processing and plotting. Key fixes were implemented in v1.1.12 and v1.1.13.
- breaking: A bug in gseapy v1.1.6 and v1.1.7 caused incorrect gene name order when calling `gsea()` with `permutation_type='gene_set'`, leading to potentially invalid results.
- deprecated: GSEApy dropped support for Python 3.7 starting with version 1.1.10. Attempting to install or run gseapy 1.1.10+ on Python 3.7 will likely lead to dependency resolution errors or runtime issues.
- gotcha: Results from `gsea()` or `prerank()` might be inconsistent between different runs or environments due to a potential compilation issue, which was addressed in v1.1.9.
- gotcha: Gene symbol matching can be sensitive to casing. Although gseapy attempts to convert lowercase symbols to uppercase implicitly (since v1.1.5), inconsistent casing between your input data and gene set libraries can lead to genes not being found and thus ignored.
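Most of the warnings above depend on which gseapy version is installed. As a rough sketch, the version thresholds listed in them can be encoded as a small check. `gseapy_version_issues` and `parse_version` are hypothetical helpers written for this page, not gseapy functions, and the parsing assumes simple numeric version strings such as `1.1.13`.

```python
import re
from typing import List, Optional, Tuple

# Thresholds copied from the warnings above
PANDAS3_SAFE = (1, 1, 12)
REPRODUCIBILITY_FIX = (1, 1, 9)
BAD_GENE_SET_PERMUTATION = {(1, 1, 6), (1, 1, 7)}


def parse_version(text: str) -> Tuple[int, ...]:
    """Take the first three numeric components of a version string."""
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])


def gseapy_version_issues(gseapy_version: str,
                          pandas_version: Optional[str] = None) -> List[str]:
    """List the known issues from the warnings that apply to these versions."""
    version = parse_version(gseapy_version)
    issues: List[str] = []
    if version in BAD_GENE_SET_PERMUTATION:
        issues.append("permutation_type='gene_set' results are unreliable; use 1.1.8+")
    if version < REPRODUCIBILITY_FIX:
        issues.append("gsea()/prerank() results may vary between runs; addressed in 1.1.9")
    if pandas_version is not None:
        if parse_version(pandas_version)[0] >= 3 and version < PANDAS3_SAFE:
            issues.append("Pandas 3.0+ compatibility problems; use 1.1.12+")
    return issues


# Hard-coded versions for illustration only
for problem in gseapy_version_issues("1.1.7", "3.0.0"):
    print(problem)
```

In a real environment the installed versions could be read with the standard library, for example `importlib.metadata.version("gseapy")`, and passed to the check in place of the hard-coded strings.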
Install
- pip install gseapy
Imports
- gsea
  import gseapy as gp
  gp.gsea(...)
- enrichr
  import gseapy as gp
  gp.enrichr(...)
- prerank
  import gseapy as gp
  gp.prerank(...)
- ssgsea
  import gseapy as gp
  gp.ssgsea(...)
- plot
  import gseapy as gp
  gp.plot.dotplot(...)
Quickstart
import gseapy as gp
import os
# Example gene list for Enrichr
gene_list = ['TP53', 'MYC', 'EGFR', 'BRAF', 'KRAS', 'RB1', 'PTEN', 'PIK3CA']
# Run Enrichr analysis
enr = gp.enrichr(
    gene_list=gene_list,
    gene_sets=['KEGG_2021_Human', 'GO_Biological_Process_2021'],  # Specify gene set libraries
    organism='Human',  # Default
    outdir='enrichr_results_example',  # Output directory
    cutoff=0.5,  # Adjusted P-value cutoff (affects plotted terms only)
    no_plot=True,  # Set to False to generate plots
    verbose=False
)
print(f"Enrichr results saved to: {enr.outdir}")
# Optional: Clean up the generated directory
# import shutil
# if os.path.exists('enrichr_results_example'):
#     shutil.rmtree('enrichr_results_example')
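As an alternative to the manual cleanup in the Quickstart, the output directory can be a temporary one that removes itself. This is only a sketch using the standard library: `run_in_temp_outdir` is a hypothetical helper, and `fake_analysis` stands in for a real call such as `gp.enrichr(..., outdir=outdir)` so that the example runs without network access.

```python
import os
import tempfile
from typing import Callable


def run_in_temp_outdir(analysis: Callable[[str], object]) -> object:
    """Call analysis(outdir) with a temporary directory that is deleted afterwards."""
    with tempfile.TemporaryDirectory(prefix="enrichr_") as outdir:
        return analysis(outdir)


def fake_analysis(outdir: str) -> str:
    """Placeholder for a real analysis; writes one file into outdir."""
    path = os.path.join(outdir, "report.txt")
    with open(path, "w") as handle:
        handle.write("placeholder")
    return path


written = run_in_temp_outdir(fake_analysis)
print(os.path.exists(written))  # False: the directory was removed on exit
```

Because the files are gone once the helper returns, anything needed later should be returned from the analysis function and kept in memory.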