GEOparse

2.0.4 · active · verified Fri Apr 17

GEOparse is a Python library for accessing, parsing, and handling data from the Gene Expression Omnibus (GEO) database. It simplifies programmatic retrieval of GEO Series (GSE), GEO DataSets (GDS), and GEO Sample (GSM) entries and exposes their metadata and expression tables. The current version is 2.0.4; releases are made as needed, typically for bug fixes or feature enhancements.

Common errors

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to download and parse a GEO Series (GSE) entry, access its metadata, and iterate through its associated GEO Samples (GSMs) to view their data tables. It uses 'GSE1' for a minimal example.

import GEOparse

# Download a small GEO Series (GSE1 is very small for quick testing)
print("Downloading GSE1 data...")
gse = GEOparse.get_GEO(geo="GSE1", destdir="./")

print(f"\nSuccessfully parsed GEO Series: {gse.name}")
print(f"Title: {gse.metadata.get('title', ['N/A'])[0]}")
print(f"Number of samples (GSMs): {len(gse.gsms)}")

# Access and print information for the first sample
if gse.gsms:
    first_gsm_name = list(gse.gsms.keys())[0]
    first_gsm = gse.gsms[first_gsm_name]
    print(f"\nFirst Sample (GSM): {first_gsm.name}")
    print(f"Sample Title: {first_gsm.metadata.get('title', ['N/A'])[0]}")
    print(f"Sample Type: {first_gsm.metadata.get('type', ['N/A'])[0]}")
    print(f"Sample Table Head:\n{first_gsm.table.head(2)}")
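
The quickstart reads each metadata field as a list and takes its first element with `.get(key, ['N/A'])[0]`. That expression would raise an `IndexError` if a key were present but held an empty list. The sketch below shows one way to guard against that, assuming the metadata behaves like a plain dict mapping strings to lists of strings, as the quickstart implies. The helper name `first_value` and the sample dictionary are hypothetical and are not part of GEOparse.

```python
from typing import Mapping, Sequence


def first_value(
    metadata: Mapping[str, Sequence[str]], key: str, default: str = "N/A"
) -> str:
    """Return the first value stored under `key`, or `default` if missing or empty."""
    values = metadata.get(key)
    # A missing key (None) and an empty list are both treated as "no value".
    if not values:
        return default
    return values[0]


# Hypothetical stand-in for gse.metadata or gsm.metadata (made-up values).
example_metadata = {
    "title": ["Example series"],
    "type": [],
}

print(first_value(example_metadata, "title"))    # key present with a value
print(first_value(example_metadata, "type"))     # key present but empty
print(first_value(example_metadata, "summary"))  # key missing
```

In the quickstart, a call such as `first_value(gse.metadata, 'title')` could stand in for `gse.metadata.get('title', ['N/A'])[0]` under the same assumption.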
