ChEMBL Structure Pipeline

1.2.4 verified Fri May 01 auth: no python

A Python toolkit for standardizing and processing chemical structures in the ChEMBL database. Current version 1.2.4. It applies curation rules (e.g., standardization, salt stripping, charge neutralization) commonly used in ChEMBL. Release cadence is irregular, typically a few updates per year.

pip install chembl-structure-pipeline
error AttributeError: module 'chembl_structure_pipeline' has no attribute 'ChEMBLStructurePipeline'
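cause The installed package does not expose a `ChEMBLStructurePipeline` attribute. The project README documents module-level functions in the `standardizer` and `checker` submodules, not a pipeline class.
fix
Call the functions directly, e.g. `standardizer.standardize_molblock(molblock)` after `from chembl_structure_pipeline import standardizer`. When pinning a minimum version, quote the specifier so the shell does not treat `>` as a redirect: pip install "chembl-structure-pipeline>=1.0".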
error chembl_structure_pipeline.exceptions.StandardizationError: No molecules could be read from input
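cause Input is not a valid molblock (e.g., a SMILES string was passed instead).
fix
Convert the SMILES to a molblock with RDKit (after `from rdkit import Chem`): Chem.MolToMolBlock(Chem.MolFromSmiles(smiles)). Chem.MolFromSmiles returns None for an unparseable SMILES, so check the result before converting.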
error ImportError: No module named 'rdkit'
cause The package requires RDKit but it is not installed.
fix
Install RDKit: pip install rdkit (the older rdkit-pypi name on PyPI is a legacy distribution) or conda install -c conda-forge rdkit.
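A quick pre-flight check can tell a missing dependency apart from other import problems before any pipeline code runs. The sketch below uses only the standard library. The helper names `module_available` and `installed_version` are illustrative and not part of either package, and the distribution name passed to `installed_version` is assumed to match what pip installed.

```python
from importlib import metadata, util


def module_available(module_name):
    """Return True if a top-level module can be located without importing it."""
    return util.find_spec(module_name) is not None


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report on the two modules the pipeline needs at import time.
    for module_name in ("rdkit", "chembl_structure_pipeline"):
        status = "found" if module_available(module_name) else "MISSING"
        print(f"{module_name}: {status}")
    # Assumed distribution name; adjust if your index uses a different spelling.
    print("pipeline version:", installed_version("chembl-structure-pipeline"))
```

Running this in the same environment as the failing script shows whether the ImportError comes from a missing RDKit install or from running in a different interpreter than the one pip installed into.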
gotcha The pipeline expects molblock (V3000 or V2000) strings, not SMILES. Failing to convert SMILES to molblock will cause silent failures or errors.
fix Use RDKit to convert SMILES to molblock before calling pipeline methods.
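One way to guard against passing SMILES by accident is to normalize inputs before they reach the pipeline. The following is a minimal sketch, not part of the package: `looks_like_molblock` and `ensure_molblock` are hypothetical helper names, and the molblock test is a heuristic that assumes the counts line carries a V2000 or V3000 tag. RDKit is imported lazily, so it is only needed when a conversion actually happens.

```python
def looks_like_molblock(text):
    """Heuristic check: counts line (4th line) ends in V2000/V3000 and an 'M  END' line exists."""
    lines = text.splitlines()
    if len(lines) < 5:
        return False
    has_version = lines[3].rstrip().endswith(("V2000", "V3000"))
    has_end = any(line.strip() == "M  END" for line in lines)
    return has_version and has_end


def ensure_molblock(text):
    """Return `text` unchanged if it looks like a molblock, else convert it from SMILES."""
    if looks_like_molblock(text):
        return text
    # Imported here so the molblock check above works without RDKit installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(text.strip())
    if mol is None:
        raise ValueError(f"Input is neither a molblock nor a parseable SMILES: {text!r}")
    return Chem.MolToMolBlock(mol)
```

Calling `ensure_molblock("CCO")` would return a molblock for ethanol when RDKit is installed, while an existing molblock passes through untouched. Raising on unparseable input turns the silent failure described above into an explicit error.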
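gotcha `standardize_molblock` is a module-level function, available as `chembl_structure_pipeline.standardizer.standardize_molblock`. The project README presents it as the standard entry point and does not document a `ChEMBLStructurePipeline.standardize` replacement.
fix Call the module-level functions (`standardize_molblock`, `get_parent_molblock`), and check the release notes of your installed version before assuming a class-based API exists.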
gotcha The pipeline may remove stereochemistry information during standardization. This is by design but can be surprising for chiral molecules.
fix Review the ChEMBL curation rules; if stereochemistry must be preserved, compare each structure before and after standardization and handle affected molecules separately.
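A rough way to spot lost stereo annotations is to count wedge and hash bonds in the molblock before and after processing. The sketch below is a standard-library heuristic with hypothetical helper names. It assumes V2000 molblocks with fixed-width fields, where a bond's stereo field is 1 for a wedge and 6 for a hash, and it does not replace a proper stereocenter comparison with a cheminformatics toolkit.

```python
def count_wedged_bonds_v2000(molblock):
    """Count bonds whose V2000 stereo field is 1 (wedge) or 6 (hash)."""
    lines = molblock.splitlines()
    counts_line = lines[3]  # three header lines precede the counts line
    if "V2000" not in counts_line:
        raise ValueError("expected a V2000 molblock")
    n_atoms = int(counts_line[0:3])
    n_bonds = int(counts_line[3:6])
    first_bond = 4 + n_atoms
    wedged = 0
    for line in lines[first_bond:first_bond + n_bonds]:
        # Bond line fields are 3 characters wide: atom1, atom2, type, stereo.
        if line[9:12].strip() in ("1", "6"):
            wedged += 1
    return wedged


def lost_wedges(before, after):
    """Return True if `after` has fewer wedge/hash bonds than `before`."""
    return count_wedged_bonds_v2000(after) < count_wedged_bonds_v2000(before)
```

A `True` result from `lost_wedges(original, standardized)` only flags a molecule for manual review. A lower wedge count can also result from the structure being redrawn, so it is a prompt to inspect, not proof of lost chirality.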
pip install chembl-structure-pipeline[rdkit]

Basic usage: standardize a molblock, then derive its salt-stripped parent. The example uses the module-level functions documented in the project README.

from chembl_structure_pipeline import standardizer

# A sample V3000 molblock (MOL format).
# The header must be exactly three lines (title, program, comment) before the
# counts line, and property lines start with "M" followed by two spaces.
molblock = '''
  ChemDraw03312216322D

  0  0  0     0  0              0 V3000
M  V30 BEGIN CTAB
M  V30 COUNTS 5 4 0 0 0
M  V30 BEGIN ATOM
M  V30 1 C -1.2990 0.0000 0.0000 0
M  V30 2 C 0.0000 1.2990 0.0000 0
M  V30 3 C 1.2990 0.0000 0.0000 0
M  V30 4 C 0.0000 -1.2990 0.0000 0
M  V30 5 C 0.0000 0.0000 0.0000 0
M  V30 END ATOM
M  V30 BEGIN BOND
M  V30 1 1 1 2
M  V30 2 1 2 3
M  V30 3 1 3 4
M  V30 4 1 4 1
M  V30 END BOND
M  V30 END CTAB
M  END
'''

# Standardize the molecule; the result is a molblock string
standardized = standardizer.standardize_molblock(molblock)
print(standardized)

# Get the parent (salt-stripped) structure; the function returns a
# (molblock, exclude_flag) tuple
parent, exclude_flag = standardizer.get_parent_molblock(molblock)
print(parent)