{"id":9771,"library":"geoparse","title":"GEOparse","description":"GEOparse is a Python library designed to access, parse, and handle data from the Gene Expression Omnibus (GEO) database. It simplifies the programmatic retrieval of GEO Series (GSE), GEO DataSets (GDS), and GEO Sample (GSM) entries, providing easy access to metadata and expression tables. The current version is 2.0.4, and releases are made on an as-needed basis, typically for bug fixes or feature enhancements.","status":"active","version":"2.0.4","language":"en","source_language":"en","source_url":"https://github.com/guma44/GEOparse","tags":["bioinformatics","genomics","geo","ncbi","data-access","python3"],"install":[{"cmd":"pip install geoparse","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Data manipulation, especially for expression tables.","package":"pandas"},{"reason":"XML/HTML parsing, used for handling GEO SOFT files.","package":"lxml"},{"reason":"HTTP requests for downloading GEO files from NCBI servers.","package":"requests"},{"reason":"Numerical operations, often a dependency of pandas but explicitly listed.","package":"numpy"}],"imports":[{"symbol":"GEOparse","correct":"import GEOparse"},{"note":"Can also be accessed via `GEOparse.get_GEO` after `import GEOparse`.","symbol":"get_GEO","correct":"from GEOparse import get_GEO"}],"quickstart":{"code":"import GEOparse\nimport os\n\n# Download a small GEO Series (GSE1 is very small for quick testing)\nprint(\"Downloading GSE1 data...\")\ngse = GEOparse.get_GEO(geo=\"GSE1\", destdir=\"./\")\n\nprint(f\"\\nSuccessfully parsed GEO Series: {gse.name}\")\nprint(f\"Title: {gse.metadata.get('title', ['N/A'])[0]}\")\nprint(f\"Number of samples (GSMs): {len(gse.gsms)}\")\n\n# Access and print information for the first sample\nif gse.gsms:\n    first_gsm_name = list(gse.gsms.keys())[0]\n    first_gsm = gse.gsms[first_gsm_name]\n    print(f\"\\nFirst Sample (GSM): {first_gsm.name}\")\n    print(f\"Sample Title: 
{first_gsm.metadata.get('title', ['N/A'])[0]}\")\n    print(f\"Sample Type: {first_gsm.metadata.get('type', ['N/A'])[0]}\")\n    print(f\"Sample Table Head:\\n{first_gsm.table.head(2)}\")\n","lang":"python","description":"This quickstart demonstrates how to download and parse a GEO Series (GSE) entry, access its metadata, and inspect the first associated GEO Sample (GSM), including a preview of its data table. It uses 'GSE1' as a minimal example."},"warnings":[{"fix":"Adjust your code to expect a single object from `get_GEO`. Access associated GSMs via the returned object's `.gsms` attribute (e.g., `gse.gsms`).","message":"The return type of `GEOparse.get_GEO()` changed significantly from versions 1.x to 2.x. Previously, it might have returned a tuple (e.g., `(gse_object, gsm_list)`). In 2.x, it consistently returns a single GEOparse object (e.g., `GSE`, `GDS`, or `GSM`).","severity":"breaking","affected_versions":"<2.0.0"},{"fix":"Consider downloading files to disk using `destdir`, processing data in chunks if possible, or filtering data early. 
Ensure sufficient disk space and memory before attempting to parse extremely large datasets.","message":"Processing very large GEO datasets can consume significant amounts of RAM and disk space, potentially leading to out-of-memory errors or long download/parsing times.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Implement robust error handling (e.g., `try-except` blocks for `requests.exceptions.ConnectionError`), check your internet connection, verify the GEO accession ID on the NCBI GEO website, and consider implementing retry logic for downloads.","message":"Downloads from the GEO database can occasionally fail due to network connectivity issues, server-side problems at NCBI, or incorrect GEO accession IDs.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Update your code to match the 2.x API. Instead of `gse, gsms = GEOparse.get_GEO(...)` and then `gse.name`, simply use `gse = GEOparse.get_GEO(...)` and then `gse.name`. Access individual samples via `gse.gsms`.","cause":"The attribute `name` is being accessed on a tuple instead of a GEOparse object. This typically points to a mismatch between code written for the tuple-style return of `get_GEO` in GEOparse 1.x and the single object (e.g., `GSE`) returned in 2.x.","error":"AttributeError: 'tuple' object has no attribute 'name'"},{"fix":"Check your internet connection and proxy settings. Temporarily disable any firewalls or VPNs that might interfere. Try running the download again. 
If the issue persists, the GEO server might be experiencing temporary problems; try again later.","cause":"The connection to the NCBI GEO server was abruptly closed during download, often due to network instability, a firewall blocking the connection, or the server resetting it.","error":"requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))"},{"fix":"Not all metadata fields are present for every entry. Metadata values are stored as lists, so use the `.get()` method with a list default to prevent `KeyError`: `gsm.metadata.get('some_metadata_field_name', ['N/A'])[0]`.","cause":"You are trying to access a specific metadata field (e.g., `gsm.metadata['some_field']`) that does not exist for the current GEO Series or Sample.","error":"KeyError: 'some_metadata_field_name'"}]}