PubChemPy
PubChemPy is a simple Python wrapper around the PubChem PUG REST API, providing an intuitive interface to query chemical information from PubChem. It allows programmatic access to compounds, substances, assays, and their properties. The current version is 1.0.5, and releases are infrequent, primarily addressing bug fixes and minor enhancements.
Common errors
-
pubchempy.NotFoundError: The input record was not found (e.g. invalid CID)
cause This error occurs when attempting to retrieve a Compound object using `Compound.from_cid()` with a PubChem Compound Identifier (CID) that does not exist in the database.
fix Ensure the CID is valid. If searching by name or SMILES, use `get_compounds()` or `get_substances()`, which return an empty list rather than raising an error when nothing is found, and check that the list is not empty before accessing its elements.
-
pubchempy.TimeoutError: The request timed out, from server overload or too broad a request.
cause This error indicates that the request to the PubChem PUG REST API took too long to complete, often due to server overload, a very broad search query, or requesting a large number of records at once.
fix Break large requests into smaller, paginated queries using the `listkey_count` and `listkey_start` parameters, or retrieve lists of CIDs/SIDs first and then fetch full records individually or in small batches. Avoid requesting all properties for many compounds at once.
-
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get issuer certificate
cause This SSL certificate error typically occurs in environments with strict network policies, corporate proxies, or outdated certificate stores, which prevent `pubchempy` (which uses `urllib`) from verifying the SSL certificate of the PubChem server.
fix If behind a proxy, configure `urllib` to use it. A common but less secure workaround, if the issue persists and you trust the connection, is to disable SSL certificate verification by setting an unverified default SSL context (e.g., `import ssl; ssl._create_default_https_context = ssl._create_unverified_context`). Note that this turns off verification for every `urllib` HTTPS request in the process, not only those made by `pubchempy`.
-
pubchempy.BadRequestError: Request is improperly formed (syntax error in the URL, POST body, etc.)
cause This error signifies that the request sent to the PubChem API has a syntax error or is malformed, such as an incorrect identifier type for a namespace, or invalid parameters in the query.
fix Carefully review the parameters passed to `pubchempy` functions, especially `namespace`, `searchtype`, and any keyword arguments, to ensure they conform to the PubChem API specifications and are correctly formatted. For example, ensure the `namespace` matches the `identifier` type (e.g., 'name' for a chemical name).
-
IndexError: list index out of range (when trying to access a compound after get_compounds returns an empty list)
cause This problem occurs when `pubchempy.get_compounds()` or a similar search function returns an empty list because no Compound records match the query, and the caller then tries to access an element (e.g., `results[0]`) of that empty list.
fix Always check whether the list returned by `get_compounds()` or `get_substances()` is empty before accessing its elements. If a name exists as a PubChem Substance but not a Compound, try `get_substances()` instead.
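The empty-list check recommended above can be wrapped in a small helper. The following is a minimal sketch, not part of PubChemPy: `first_compound` and its `search` parameter are hypothetical names introduced here so the logic can be exercised without network access. By default it falls back to `pcp.get_compounds`.

```python
from typing import Any, Callable, List, Optional


def first_compound(
    name: str,
    search: Optional[Callable[[str, str], List[Any]]] = None,
) -> Optional[Any]:
    """Return the first result matching `name`, or None when nothing matches.

    `search` is a hypothetical injection point (an assumption of this sketch);
    when omitted, pubchempy.get_compounds is used.
    """
    if search is None:
        # Imported lazily so the helper itself does not require network access.
        import pubchempy as pcp
        search = pcp.get_compounds
    results = search(name, "name")
    # An empty list means "no match"; never index into it directly.
    return results[0] if results else None
```

A caller would then write `compound = first_compound('aspirin')` and test `if compound is None:` instead of indexing `results[0]` directly.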
Warnings
- gotcha PubChem imposes rate limits on its PUG REST API (its usage policy asks for no more than 5 requests per second). PubChemPy does not manage these limits itself, so add delays between calls or batch queries where possible (for example, by passing a list of CIDs in a single call to `get_compounds` or `get_properties`) to avoid being throttled with HTTP 503 (Server Busy) responses.
- gotcha API calls might return empty lists or `None` if no data is found, or raise `pubchempy.PubChemPyError` for issues like invalid CIDs or API errors. Robust error handling is crucial.
- gotcha The behavior of `Compound.fingerprint` was corrected in v1.0.4 to align with the CACTVS fingerprint specification. Users relying on outputs from older versions might observe different fingerprint values.
- gotcha Proxy configuration was a known issue in older versions (addressed in v1.0.4). PubChemPy makes its HTTP requests through `urllib`, which reads proxy settings from the `http_proxy` and `https_proxy` environment variables, so set these if requests fail behind a proxy.
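The delay-and-batch advice in the rate-limit warning can be sketched as follows. This is an illustration under stated assumptions, not a PubChemPy feature: `chunked` and `fetch_in_batches` are hypothetical helpers, and the default batch size of 100 and interval of 0.25 seconds are arbitrary choices meant to stay under the per-second limit. The `fetch` callable stands in for any PubChemPy call that accepts a list of CIDs.

```python
import time
from typing import Any, Callable, Iterator, List, Sequence


def chunked(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(
    cids: Sequence[int],
    fetch: Callable[[List[int]], List[Any]],
    batch_size: int = 100,       # assumed value, tune for your workload
    min_interval: float = 0.25,  # assumed minimum seconds between requests
) -> List[Any]:
    """Call `fetch` once per batch, pausing so calls are at least `min_interval` apart."""
    results: List[Any] = []
    last_call = None
    for batch in chunked(cids, batch_size):
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        results.extend(fetch(list(batch)))
    return results
```

With PubChemPy, `fetch` could be something like `lambda batch: pcp.get_properties(['molecular_weight'], batch, 'cid')`, so that each request carries many CIDs instead of one.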
Install
-
pip install PubChemPy
Imports
- pubchempy
import pubchempy as pcp
- Compound
from pubchempy import Compound
- Substance
from pubchempy import Substance
- PubChemPyError
from pubchempy import PubChemPyError
Quickstart
import pubchempy as pcp

try:
    # Search for compounds by name
    compounds = pcp.get_compounds('aspirin', 'name')
    if compounds:
        aspirin = compounds[0]
        print(f"Compound Name: {aspirin.iupac_name}")
        print(f"CID: {aspirin.cid}")
        print(f"Molecular Formula: {aspirin.molecular_formula}")
        print(f"Canonical SMILES: {aspirin.canonical_smiles}")

        # Retrieve specific properties for a compound
        properties = pcp.get_properties(
            ['molecular_weight', 'xlogp'],  # List of properties to fetch
            aspirin.cid,
            'cid'  # Namespace: 'cid' for compound IDs
        )
        if properties:
            print(f"Molecular Weight: {properties[0]['MolecularWeight']}")
            print(f"XLogP: {properties[0]['XLogP']}")
        else:
            print("Could not retrieve additional properties.")
    else:
        print("Aspirin not found in PubChem.")
except pcp.PubChemPyError as e:
    print(f"A PubChemPy API error occurred: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")