Therapeutics Data Commons (TDC)
1.1.15 · verified Fri May 01 · auth: no · python
Therapeutics Data Commons (TDC) is a unified, open-source framework for machine learning in therapeutics. It provides standardized datasets, benchmarks, and tools for tasks such as drug-target interaction, ADMET prediction, and clinical trial outcome prediction. The current version is 1.1.15, and the project is updated frequently.
pip install PyTDC
Common errors
error ModuleNotFoundError: No module named 'tdc'
cause PyTDC is not installed, or it is installed in a different environment from the one running the code. The package is installed as PyTDC but imported as tdc.
fix
Run: pip install PyTDC
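To confirm which interpreter is running and whether it can locate the package, a check along these lines may help. This is a minimal sketch using only the standard library; the helper name is_importable is hypothetical and not part of TDC.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # sys.executable shows which Python environment is actually in use.
    print("interpreter:", sys.executable)
    print("tdc importable:", is_importable("tdc"))
```

If the second line prints False, installing with python -m pip install PyTDC from that same interpreter ensures the package lands in the environment that runs the code.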
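error AttributeError: module 'tdc' has no attribute 'single_pred'
cause The code accesses tdc.single_pred as an attribute of the top-level package (for example after a bare import tdc) without importing the submodule itself.
fix
Import from the submodule directly: from tdc.single_pred import ADME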
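One general way this kind of AttributeError can arise in Python is that a submodule only becomes an attribute of its package once something has imported it. The sketch below demonstrates that behavior with a throwaway package; demo_pkg and its contents are stand-ins created for illustration and say nothing about how tdc is laid out internally.

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway package whose __init__.py does not import its submodule.
# "demo_pkg" is a stand-in for illustration, not the real tdc package.
tmp_dir = tempfile.mkdtemp()
pkg_dir = Path(tmp_dir) / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "single_pred.py").write_text("ADME = 'placeholder'\n")
sys.path.insert(0, tmp_dir)

import demo_pkg

# The submodule has not been imported yet, so it is not an attribute.
attribute_before = hasattr(demo_pkg, "single_pred")

# Importing the submodule explicitly loads it and binds it on the package.
from demo_pkg.single_pred import ADME

attribute_after = hasattr(demo_pkg, "single_pred")
print(attribute_before, attribute_after)  # False True
```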
error urllib.error.URLError: <urlopen error [Errno 111] Connection refused>
cause A network problem occurred while downloading a dataset.
fix
Check the internet connection, or set TDC_CACHE_DIR to a writable local path. Loaders also accept a path argument (default './data') that sets the download directory. Alternatively, load a local data file with DataLoader.
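For transient network failures, wrapping the loader call in a small retry helper is one option. The following is a sketch, assuming the failure surfaces as urllib.error.URLError as in the entry above; call_with_retries and its parameters are hypothetical names, not part of TDC.

```python
import time
import urllib.error


def call_with_retries(load, attempts=3, delay_seconds=2.0):
    """Call load() and retry on URLError, waiting longer after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return load()
        except urllib.error.URLError:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            # Linear backoff: wait 1x, 2x, 3x ... the base delay.
            time.sleep(delay_seconds * attempt)
```

With TDC installed, usage could look like call_with_retries(lambda: ADME(name='Caco2_Wang')). A retry only helps with intermittent failures; a blocked connection still needs a network or proxy fix.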
Warnings
breaking Python 3.7 support was dropped after version 0.9.6. Use Python 3.8+.
fix Upgrade Python to 3.8 or higher.
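A script can check the interpreter version up front instead of failing later during installation or import. This is a minimal sketch; MIN_PYTHON and python_is_supported are hypothetical names, and the (3, 8) threshold is taken from the warning above.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the warning


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given version meets the minimum major.minor."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    print(
        f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
        f"found {sys.version_info.major}.{sys.version_info.minor}"
    )
```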
gotcha Dataset downloads may fail behind a corporate firewall or on a slow network, because TDC fetches data from remote servers.
fix Set the environment variable TDC_CACHE_DIR to a writable directory, or use data.DataLoader for custom files.
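Before pointing TDC at a directory, it can be worth confirming that the directory exists and is writable. The helper below is a sketch using only the standard library; ensure_writable_dir is a hypothetical name, and how the resulting path is handed to TDC follows the fix above.

```python
import os
import tempfile


def ensure_writable_dir(path: str) -> str:
    """Create path if needed, confirm a file can be written, return it absolute."""
    os.makedirs(path, exist_ok=True)
    # Writing a temporary file is a more direct probe than os.access().
    with tempfile.NamedTemporaryFile(dir=path):
        pass
    return os.path.abspath(path)
```

If the directory is not writable, the helper raises an OSError (such as PermissionError) immediately, which is easier to diagnose than a failure partway through a download.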
gotcha Random splits are determined by the seed passed to get_split. Pass the seed explicitly rather than relying on a default, so that splits are reproducible across runs and the choice is visible in the code.
fix Always provide the seed parameter: get_split(method='random', seed=42).
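The effect of the seed can be illustrated without TDC. The sketch below is not TDC's implementation; random_split is a hypothetical helper that shuffles with a seeded generator and cuts the result into three parts, which is enough to show why a fixed seed gives a repeatable split.

```python
import random


def random_split(items, frac=(0.7, 0.1, 0.2), seed=42):
    """Shuffle items with a seeded RNG and cut into train/valid/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(frac[0] * len(shuffled))
    n_valid = int(frac[1] * len(shuffled))
    return {
        "train": shuffled[:n_train],
        "valid": shuffled[n_train:n_train + n_valid],
        "test": shuffled[n_train + n_valid:],
    }


first = random_split(range(100), seed=42)
second = random_split(range(100), seed=42)
print(first == second)  # True: same seed, same split
```

Changing the seed changes the shuffle, so reporting the seed alongside results lets others reproduce the same partition.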
Imports
- TDC
from tdc import TDC
- SinglePredDataset
wrong: from tdc import SinglePredDataset
correct: from tdc.single_pred import SinglePredDataset
- DataLoader
from tdc.utils import DataLoader
Quickstart
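from tdc.single_pred import ADME

# Load the Caco2_Wang ADME dataset (downloaded on first use).
data = ADME(name='Caco2_Wang')
# Split into train/valid/test DataFrames with a fixed seed for reproducibility.
split = data.get_split(method='random', seed=42)
train, valid, test = split['train'], split['valid'], split['test']
print(train.head())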