Therapeutics Data Commons (TDC)

1.1.15 verified Fri May 01 auth: no python

Therapeutics Data Commons (TDC) is a unified, open-source framework for machine learning in therapeutics. It provides standardized datasets, benchmarks, and tools for tasks such as drug-target interaction, ADMET prediction, and clinical trial outcome prediction. The current version is 1.1.15, and the project is updated frequently.

pip install PyTDC
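After installing, it can help to confirm that the package landed in the interpreter that actually runs your code. The following is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, not part of TDC.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the active interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # PyTDC is the name used with pip; the importable module is `tdc`.
    print("interpreter:", sys.executable)
    print("tdc importable:", is_installed("tdc"))
```

If the second line prints False, pip most likely installed into a different environment than the interpreter shown on the first line.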
error ModuleNotFoundError: No module named 'tdc'
cause PyTDC is not installed, or it is installed in a different environment from the interpreter running the code.
fix
Run in the active environment: pip install PyTDC
error AttributeError: module 'tdc' has no attribute 'single_pred'
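cause Accessing a submodule as an attribute of the top-level tdc package (for example tdc.single_pred after a bare import tdc) instead of importing it directly.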
fix
Use: from tdc.single_pred import ADME
error urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
cause The dataset download failed because of a network problem.
fix
Check the internet connection, or set TDC_CACHE_DIR to a local path. Alternatively, load a local data file with DataLoader.
breaking Python 3.7 support was dropped after version 0.9.6. Use Python 3.8+.
fix Upgrade Python to 3.8 or higher.
gotcha Data downloads may fail behind a corporate firewall or on a slow network, because TDC fetches datasets from remote servers.
fix Set the environment variable TDC_CACHE_DIR to a writable directory, or use data.DataLoader for custom files.
gotcha Do not assume that the 'random' split method yields the same split on every run. Pass seed explicitly for reproducibility.
fix Always pass the seed parameter, for example get_split(method='random', seed=42).
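To illustrate why an explicit seed matters for random splits, here is a minimal standard-library sketch. It is not TDC's implementation: `random_split`, the 0.7/0.1/0.2 fractions, and the index-based output are assumptions chosen for illustration.

```python
import random


def random_split(n_items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle indices 0..n_items-1 with a seeded RNG and cut them into three parts."""
    indices = list(range(n_items))
    # A local Random instance keeps the global random state untouched.
    random.Random(seed).shuffle(indices)
    n_train = round(frac[0] * n_items)
    n_valid = round(frac[1] * n_items)
    return {
        "train": indices[:n_train],
        "valid": indices[n_train:n_train + n_valid],
        "test": indices[n_train + n_valid:],  # whatever remains
    }
```

Calling `random_split(100, seed=42)` twice returns identical partitions, while a different seed reshuffles the indices. Recording the seed alongside results is what makes a reported split repeatable.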

Load the Caco2 permeability dataset and split it into train/valid/test sets.

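from tdc.single_pred import ADME
# Load the Caco2 permeability dataset (downloaded from TDC's remote servers)
data = ADME(name='Caco2_Wang')
# Seeded random split, indexed by 'train', 'valid', and 'test'
split = data.get_split(method='random', seed=42)
train, valid, test = split['train'], split['valid'], split['test']
print(train.head())  # first rows of the training split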
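Because the loading step fetches data over the network, a transient failure can interrupt it. A minimal retry sketch follows, assuming the failure surfaces as urllib.error.URLError as in the error entry earlier on this page; `with_retries` and its parameters are hypothetical names, not part of TDC.

```python
import time
import urllib.error


def with_retries(load, attempts=3, delay_seconds=2.0):
    """Call load() until it succeeds or attempts run out; re-raise the last URLError."""
    for attempt in range(1, attempts + 1):
        try:
            return load()
        except urllib.error.URLError:
            if attempt == attempts:
                raise  # out of attempts, surface the original error
            # Linear backoff: wait a little longer after each failure.
            time.sleep(delay_seconds * attempt)
```

With the loader from the example above, this could be used as `data = with_retries(lambda: ADME(name='Caco2_Wang'))`. Retrying only helps with transient failures; a blocked connection still needs a network-level fix.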