MyGene.Info Python Client
mygene is an easy-to-use Python wrapper for the MyGene.Info services, which provide REST web services to query and retrieve gene annotation data. It is currently at version 3.2.2 and is actively maintained, with releases tied to updates in the underlying MyGene.info API and its `biothings_client` dependency.
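Because the client wraps plain REST endpoints, it can help to see the raw URLs that roughly correspond to client calls. The sketch below only builds URLs offline and sends no requests. The base URL `https://mygene.info/v3` and the `/gene/<id>` and `/query` paths are assumptions based on the public MyGene.info API. `gene_url` and `query_url` are hypothetical helpers, not part of `mygene`.

```python
from urllib.parse import quote, urlencode

# Assumed base URL of the current (v3) MyGene.info API
BASE_URL = "https://mygene.info/v3"


def gene_url(gene_id, **params):
    """Hypothetical helper: raw URL roughly matching mg.getgene(gene_id, **params)."""
    url = f"{BASE_URL}/gene/{quote(str(gene_id))}"
    # Only append a query string when extra parameters (e.g. fields) are given
    return f"{url}?{urlencode(params)}" if params else url


def query_url(q, **params):
    """Hypothetical helper: raw URL roughly matching mg.query(q, **params)."""
    return f"{BASE_URL}/query?{urlencode({'q': q, **params})}"


print(gene_url(1017, fields="symbol,name"))
print(query_url("CDK2", fields="symbol,name,taxid", species="human", size=2))
```

The client adds conveniences on top of these endpoints, such as batching in `querymany()`, so the URLs are a mental model rather than a replacement for the client.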
Warnings
- breaking Since v3.1.0, the `mygene` package has been a thin wrapper around `biothings_client`. While the `mygene.MyGeneInfo()` interface is maintained, the underlying MyGene.info v3 API changed the data structures of fields such as `refseq`, `accession`, `ensembl`, and `exons`. The default 'dotfield' behavior also changed: pass `dotfield=1` explicitly to restore the old behavior.
- deprecated The `findgenes()` method was deprecated in version 2.0.0. It is kept as an alias of `querymany()` for backward compatibility, but new code should call `querymany()` directly.
- gotcha The `getgene()` method may return no result for an Ensembl gene ID that includes a version postfix (e.g., 'ENSG00000000003.14'). Strip the postfix (here, '.14') before querying.
- gotcha In older MyGene.info API versions (and the older `mygene` client versions built on them), the 'filter' parameter specified which fields to return. MyGene.info API v2 renamed it to 'fields', and the client still accepts 'filter' for backward compatibility. Use 'fields' for clarity and future compatibility.
- gotcha While the `mygene` client defaults to the latest MyGene.info API (v3), older API versions (e.g., v2) can still be accessed by explicitly setting `mg.url`. However, data from older API versions are no longer updated.
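Following the breaking data-structure change noted in the warnings, a v3 field such as `ensembl` can come back either as a single object or as a list of objects (for example, when a gene maps to more than one Ensembl gene), so downstream code is often simpler if it normalizes the value first. This is a minimal sketch: `as_list` and `ensembl_gene_ids` are hypothetical helpers, the nested `gene` key is assumed from the v3 `ensembl` field layout, and the sample records use made-up placeholder IDs rather than real API output.

```python
def as_list(value):
    """Hypothetical helper: wrap a single object in a list, pass lists through, map None to []."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def ensembl_gene_ids(gene):
    """Collect Ensembl gene IDs whether 'ensembl' is a dict, a list of dicts, or missing."""
    return [entry["gene"] for entry in as_list(gene.get("ensembl")) if entry.get("gene")]


# Hand-written records showing the two shapes (placeholder IDs, not real API output)
single = {"ensembl": {"gene": "ENSG_EXAMPLE_1"}}
multiple = {"ensembl": [{"gene": "ENSG_EXAMPLE_1"}, {"gene": "ENSG_EXAMPLE_2"}]}
print(ensembl_gene_ids(single), ensembl_gene_ids(multiple))
```

The same `as_list` pattern applies to the other fields named in the warning whenever their shape varies between records.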
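Given the `findgenes()` deprecation noted in the warnings, code that still calls it can usually switch to `querymany()`. The sketch below wraps a `querymany()` call. `symbols_by_query` is a hypothetical helper, the `scopes`, `fields`, `species`, and `verbose` keyword arguments are assumed from the client's usual `querymany()` usage, and `StubClient` is a made-up offline stand-in for `mygene.MyGeneInfo()` so the example runs without network access.

```python
def symbols_by_query(client, ids):
    """Hypothetical helper: map each query term to its gene symbol (None if not found)."""
    hits = client.querymany(ids, scopes="entrezgene", fields="symbol",
                            species="human", verbose=False)
    result = {}
    for hit in hits:
        # Unmatched terms are assumed to come back as {'query': ..., 'notfound': True};
        # setdefault keeps the first hit when one term matches several genes
        symbol = None if hit.get("notfound") else hit.get("symbol")
        result.setdefault(hit["query"], symbol)
    return result


class StubClient:
    """Offline stand-in for mygene.MyGeneInfo(); returns canned hits."""

    def querymany(self, qterms, **kwargs):
        canned = {"1017": {"query": "1017", "symbol": "CDK2"}}
        return [canned.get(q, {"query": q, "notfound": True}) for q in qterms]


print(symbols_by_query(StubClient(), ["1017", "0"]))
```

With network access, the same helper could be called with a real `mygene.MyGeneInfo()` instance in place of `StubClient()`.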
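For the Ensembl version-postfix gotcha in the warnings, the postfix can be stripped before calling `getgene()`. This is a minimal sketch: `strip_ensembl_version` is a hypothetical helper, and it assumes the postfix is always a trailing dot followed by digits.

```python
import re

# Matches a trailing ".<digits>" version postfix, e.g. the ".14" in "ENSG00000000003.14"
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_ensembl_version(ensembl_id):
    """Hypothetical helper: drop surrounding whitespace and a trailing version postfix."""
    return _VERSION_SUFFIX.sub("", ensembl_id.strip())


ids = ["ENSG00000000003.14", "ENSG00000000005"]
print([strip_ensembl_version(i) for i in ids])
```

IDs without a postfix pass through unchanged, so the helper can be applied to a mixed list before passing each ID to `mg.getgene()`.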
Install
pip install mygene
Imports
- MyGeneInfo
from mygene import MyGeneInfo
Quickstart
import mygene
mg = mygene.MyGeneInfo()
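# Get information for a single gene (Entrez ID 1017 is CDK2)
gene_info = mg.getgene(1017)
# getgene() can return None when the ID is not found, so check before using the result
if gene_info:
    print(f"Gene Symbol: {gene_info.get('symbol')}, Name: {gene_info.get('name')}")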
# Query for genes by symbol, returning only specific fields
query_results = mg.query('CDK2', fields='symbol,name,taxid', species='human', size=2)
for hit in query_results.get('hits', []):
    print(f"Query Hit: {hit.get('symbol')} ({hit.get('taxid')}) - {hit.get('name')}")