PubChemPy
PubChemPy is a simple Python wrapper around the PubChem PUG REST API, providing an intuitive interface to query chemical information from PubChem. It allows programmatic access to compounds, substances, assays, and their properties. The current version is 1.0.5, and releases are infrequent, primarily addressing bug fixes and minor enhancements.
Warnings
- gotcha PubChem imposes rate limits on its PUG REST API (its usage policy asks for no more than 5 requests per second and 400 requests per minute). PubChemPy does not manage these limits itself, so users must add delays between calls or batch their queries, for example by passing a list of CIDs to a single `get_properties` or `get_compounds` call, to avoid throttling errors (PubChem documents HTTP 503, `PUGREST.ServerBusy`, for too many requests).
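- gotcha API calls may return an empty list or `None` when no data is found, or raise `pubchempy.PubChemPyError` for problems such as invalid CIDs or API errors. Check that a result is non-empty before indexing into it, and wrap API calls in a `try`/`except` block that catches `PubChemPyError`.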
- gotcha The behavior of `Compound.fingerprint` was corrected in v1.0.4 to align with the CACTVS fingerprint specification. Users relying on outputs from older versions might observe different fingerprint values.
- gotcha Proxy configuration was a known issue in older versions (addressed in v1.0.4). While PubChemPy leverages `requests` for HTTP, direct proxy configuration might still be needed via environment variables (`http_proxy`, `https_proxy`) or by passing a `proxies` dictionary to `requests` if direct `pubchempy` support is lacking for a specific function.
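The rate-limit warning above can be handled by splitting a long identifier list into batches and pausing between calls. The following is a minimal sketch, not part of PubChemPy: `chunked` and `fetch_in_batches` are hypothetical helper names, and the default `batch_size` and `min_interval` values are illustrative assumptions that should be checked against PubChem's current usage policy.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(
    fetch: Callable[[Sequence[T]], List[R]],
    ids: Sequence[T],
    batch_size: int = 100,        # assumption: illustrative batch size
    min_interval: float = 0.25,   # assumption: seconds between request starts
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[R]:
    """Call `fetch` once per batch, keeping request starts `min_interval` apart."""
    results: List[R] = []
    last_call = None
    for batch in chunked(list(ids), batch_size):
        if last_call is not None:
            # Sleep only for the part of the interval that has not yet elapsed.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.extend(fetch(batch))
    return results
```

With PubChemPy installed, a call might look like `fetch_in_batches(lambda cids: pcp.get_properties(['molecular_weight'], list(cids), 'cid'), all_cids)`, where `all_cids` is your own list of compound IDs. The `sleep` and `clock` parameters exist only so the pacing can be tested without real delays.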
Install
pip install PubChemPy
Imports
- pubchempy
import pubchempy as pcp
- Compound
from pubchempy import Compound
- Substance
from pubchempy import Substance
- PubChemPyError
from pubchempy import PubChemPyError
Quickstart
import pubchempy as pcp

try:
    # Search for compounds by name
    compounds = pcp.get_compounds('aspirin', 'name')
    if compounds:
        aspirin = compounds[0]
        print(f"Compound Name: {aspirin.iupac_name}")
        print(f"CID: {aspirin.cid}")
        print(f"Molecular Formula: {aspirin.molecular_formula}")
        print(f"Canonical SMILES: {aspirin.canonical_smiles}")

        # Retrieve specific properties for a compound
        properties = pcp.get_properties(
            ['molecular_weight', 'xlogp'],  # List of properties to fetch
            aspirin.cid,
            'cid'  # Namespace: 'cid' for compound IDs
        )
        if properties:
            print(f"Molecular Weight: {properties[0]['MolecularWeight']}")
            print(f"XLogP: {properties[0]['XLogP']}")
        else:
            print("Could not retrieve additional properties.")
    else:
        print("Aspirin not found in PubChem.")
except pcp.PubChemPyError as e:
    print(f"A PubChemPy API error occurred: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
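The quickstart reports an error once and stops. For failures that may be transient, such as throttling, one option is to retry the call with exponential backoff. The sketch below is an assumption-laden illustration rather than a PubChemPy feature: `with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary defaults.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    attempts: int = 3,          # assumption: illustrative default
    base_delay: float = 1.0,    # assumption: seconds before the first retry
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on `retry_on` with delays of base_delay * 2**n."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            # Re-raise the original error once the last attempt has failed.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

With PubChemPy installed, usage might look like `with_retries(lambda: pcp.get_compounds('aspirin', 'name'), retry_on=(pcp.PubChemPyError,))`. Note that catching `PubChemPyError` broadly also retries permanent failures such as an invalid identifier, so a narrower exception class is preferable where the installed version provides one.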