RDKit
RDKit is a free, open-source toolkit for chemoinformatics, the field that applies computer science to chemistry. It provides functionality for analyzing molecules, predicting chemical properties, visualizing structures, and preparing data for drug discovery. The core is written in C++, with extensive Python bindings. Development is active: there are two major releases per year, versioned YYYY.03 and YYYY.09 (typically published in March/April and September/October), plus roughly monthly patch releases.
Warnings
- breaking The PyPI package name for RDKit changed from `rdkit-pypi` to `rdkit`. Older environments or `requirements.txt` files may still reference the deprecated `rdkit-pypi` name and should be updated to `rdkit`.
- gotcha `Chem.MolFromSmiles` (and similar parsers) returns `None` on failure, for example for an invalid SMILES string or when sanitization fails, and logs an error message instead of raising an exception. Calling a method on the result without first checking for `None` raises `AttributeError`.
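- gotcha Molecules parsed from SMILES store hydrogens implicitly, as hydrogen counts on the heavy atoms rather than as atoms in the graph, so `GetNumAtoms()` on such a molecule returns only the heavy-atom count by default. Add explicit hydrogens with `Chem.AddHs()` before 3D conformer generation or any calculation that needs hydrogens as real atoms.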
- breaking Major releases may introduce backwards-incompatible changes, particularly in stereochemistry perception, MCS (Maximum Common Substructure) searches, canonicalization, ring finding, and default conformer-generation parameters (e.g., ETKDG). These changes usually improve accuracy, but they can change results (such as canonical SMILES or embedded coordinates) or require code changes; each release's notes list them under "Backwards incompatible changes".
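- gotcha Sanitization is a critical step: `Chem.SanitizeMol()` raises a subclass of `Chem.rdchem.MolSanitizeException`, such as `KekulizeException` or `AtomValenceException`, for chemically invalid structures, for example aromatic rings that cannot be kekulized or a neutral nitrogen with four bonds that should carry a positive charge. Parsers such as `MolFromSmiles` sanitize by default and return `None` instead of raising. Left unhandled, these errors can halt a processing pipeline.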
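The `None` return and the sanitization exceptions can be handled in one place. The sketch below is one possible approach, assuming you want a reason string rather than an exception: it parses with `sanitize=False` so that syntax errors (which still yield `None`) are separated from chemistry errors, then calls `Chem.SanitizeMol()` and catches the `Chem.rdchem` exception classes, most specific first. `parse_smiles` and its `(mol, reason)` return convention are hypothetical, and the test SMILES are chosen only to trigger each failure type.

```python
from rdkit import Chem, RDLogger

# Optional: silence RDKit's own error log so only our reason strings are shown.
RDLogger.DisableLog("rdApp.*")


def parse_smiles(smiles):
    """Hypothetical helper: return (mol, None) on success or (None, reason) on failure."""
    # With sanitize=False the parser only checks SMILES syntax.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return None, "syntax error"
    try:
        # Run valence, aromaticity and kekulization checks explicitly.
        Chem.SanitizeMol(mol)
    except Chem.rdchem.KekulizeException as exc:
        return None, f"kekulization failed: {exc}"
    except Chem.rdchem.AtomValenceException as exc:
        return None, f"valence error: {exc}"
    except Chem.rdchem.MolSanitizeException as exc:
        # Base class: catches any other sanitization problem.
        return None, f"sanitization failed: {exc}"
    return mol, None


# Valid, unclosed ring, neutral 4-bonded N, non-kekulizable aromatic ring.
for smi in ["OCC", "C1CC", "C[N](C)(C)C", "c1cccc1"]:
    mol, reason = parse_smiles(smi)
    print(smi, "->", Chem.MolToSmiles(mol) if mol is not None else reason)
```

Separating parsing from sanitization is a design choice: it lets a pipeline log why a record was rejected instead of only seeing `None`.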
Install
- pip
pip install rdkit
- conda
conda create -n my_rdkit_env -c conda-forge rdkit
conda activate my_rdkit_env
Imports
- Chem
from rdkit import Chem
- AllChem
from rdkit.Chem import AllChem
- Draw
from rdkit.Chem import Draw
- DataStructs
from rdkit import DataStructs
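`DataStructs` is commonly used to compare molecular fingerprints. A minimal sketch of Tanimoto similarity between two molecules follows; it assumes the `rdFingerprintGenerator` module (not in the import list above) and uses radius 2 and 2048 bits as illustrative settings. `tanimoto` is a hypothetical helper name.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator

# Morgan (circular) fingerprint generator; radius and size are illustrative choices.
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def tanimoto(smiles_a, smiles_b):
    """Hypothetical helper: Tanimoto similarity of two molecules' Morgan fingerprints."""
    mol_a = Chem.MolFromSmiles(smiles_a)
    mol_b = Chem.MolFromSmiles(smiles_b)
    if mol_a is None or mol_b is None:
        raise ValueError("could not parse one of the SMILES strings")
    fp_a = morgan_gen.GetFingerprint(mol_a)  # ExplicitBitVect
    fp_b = morgan_gen.GetFingerprint(mol_b)
    return DataStructs.TanimotoSimilarity(fp_a, fp_b)


print(tanimoto("CCO", "OCC"))   # two SMILES for the same molecule (ethanol)
print(tanimoto("CCO", "CCCO"))  # ethanol vs. 1-propanol
```

Older code often calls `AllChem.GetMorganFingerprintAsBitVect` instead; recent releases steer users toward the generator API, so which call to use may depend on the installed version.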
Quickstart
from rdkit import Chem
from rdkit.Chem import Draw
# Create a molecule from a SMILES string
smiles_string = "CCO"
molecule = Chem.MolFromSmiles(smiles_string)
if molecule is not None:
    print(f"Successfully created molecule from SMILES: {smiles_string}")
    print(f"Number of heavy atoms: {molecule.GetNumHeavyAtoms()}")
    # Add explicit hydrogens (needed e.g. before 3D geometry generation)
    mol_with_hs = Chem.AddHs(molecule)
    print(f"Total number of atoms (including Hs): {mol_with_hs.GetNumAtoms()}")
    # Visualize the molecule (requires Pillow)
    # img = Draw.MolToImage(molecule)
    # img.show()  # Uncomment to display the image
else:
    print(f"Failed to create molecule from SMILES: {smiles_string}")
    print("This can happen for invalid SMILES strings or when sanitization fails.")
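The quickstart adds hydrogens but stops before any geometry. The sketch below shows the implicit vs. explicit hydrogen counts and a 3D embedding with explicitly chosen ETKDGv3 parameters and a fixed random seed, so results do not silently depend on a release's defaults. The seed value 42 is arbitrary, and the MMFF cleanup step is optional.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("CCO")
print(mol.GetNumAtoms())                    # heavy atoms only: 3
print(mol.GetNumAtoms(onlyExplicit=False))  # including implicit Hs: 9

# Explicit hydrogens are needed for a sensible 3D geometry.
mol_h = Chem.AddHs(mol)

# Choose the embedding method and seed explicitly instead of relying on defaults.
params = AllChem.ETKDGv3()
params.randomSeed = 42  # arbitrary seed, for reproducible coordinates
conf_id = AllChem.EmbedMolecule(mol_h, params)  # conformer id, or -1 on failure
if conf_id == -1:
    raise RuntimeError("3D embedding failed")

# Optional force-field cleanup: 0 = converged, 1 = needs more iterations.
status = AllChem.MMFFOptimizeMolecule(mol_h)
print(conf_id, status, mol_h.GetNumConformers())
```

Even with a fixed seed, coordinates may differ between major releases, as noted in the warnings above.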