Open Graph Benchmark (OGB)

Version 1.3.6 | verified Fri May 01 | auth: no | language: Python

A Python library for downloading and preprocessing the Open Graph Benchmark (OGB) datasets and for evaluating models on them with standardized evaluators. Version 1.3.6 fixes compatibility with pandas 2.0. Releases are periodic, with occasional major dataset updates.

pip install ogb
error ModuleNotFoundError: No module named 'ogb'
cause OGB is not installed or installed in a different environment.
fix
python -m pip install ogb  (run it with the same Python interpreter that will import ogb)
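Because this error often comes from installing into one environment and running another, it can help to check from inside the failing interpreter. A minimal sketch using only the standard library; `is_installed` and `pip_install_command` are hypothetical helper names, not part of ogb.

```python
import importlib.util
import shlex
import sys


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported by the interpreter running this code."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package: str) -> str:
    """Build a pip command tied to this interpreter, so the package is
    installed into the same environment that will import it."""
    return f"{shlex.quote(sys.executable)} -m pip install {shlex.quote(package)}"


if not is_installed("ogb"):
    print("ogb is not importable from", sys.executable)
    print("Try:", pip_install_command("ogb"))
```

Using `sys.executable -m pip` rather than a bare `pip` avoids picking up a pip that belongs to a different Python installation.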
error urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed>
cause SSL certificate issues, often in corporate or restrictive networks.
fix
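Fix the local certificate store (e.g., install your organization's CA certificates) or route traffic through a proxy. Alternatively, download the dataset manually and place it under the dataset root directory.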
error ValueError: Invalid dataset name ogbn-arxv. (followed by the list of available datasets)
cause Typo in dataset name or dataset not part of OGB.
fix
Use the correct dataset name, e.g., 'ogbn-arxiv', 'ogbl-ppa', 'ogbg-molhiv'. See ogb.stanford.edu/docs for the full list.
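OGB dataset names carry a task prefix (ogbn- for node, ogbl- for link, ogbg- for graph property prediction), and each prefix is loaded from its own module. A minimal sketch of a pre-flight check; `loader_module_for` is a hypothetical helper, and it only validates the prefix, not whether the dataset actually exists.

```python
# Hypothetical helper: map an OGB dataset-name prefix to the module that loads it.
TASK_MODULES = {
    "ogbn": "ogb.nodeproppred",   # node property prediction
    "ogbl": "ogb.linkproppred",   # link property prediction
    "ogbg": "ogb.graphproppred",  # graph property prediction
}


def loader_module_for(name: str) -> str:
    """Return the OGB module for `name`, or raise ValueError with a hint."""
    prefix, sep, rest = name.partition("-")
    if not sep or not rest or prefix not in TASK_MODULES:
        raise ValueError(
            f"{name!r} does not look like an OGB dataset name; expected an "
            "'ogbn-', 'ogbl-' or 'ogbg-' prefix (see ogb.stanford.edu/docs)"
        )
    return TASK_MODULES[prefix]


print(loader_module_for("ogbg-molhiv"))  # ogb.graphproppred
```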
error RuntimeError: Could not download file from ... Reason: HTTP Error 404: Not Found
cause The dataset URL has changed, or the installed ogb version is out of date.
fix
Update ogb to latest version: pip install --upgrade ogb
deprecated Datasets ogbl-wikikg and ogbl-citation are deprecated; use ogbl-wikikg2 and ogbl-citation2 instead.
fix Use ogbl-wikikg2 and ogbl-citation2; old names will raise a deprecation warning.
deprecated ogbg-code dataset is deprecated due to target leakage; use ogbg-code2.
fix Replace ogbg-code with ogbg-code2 in dataset name.
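The renames above can be applied automatically before a loader is called. A minimal sketch; `DEPRECATED_NAMES` and `current_dataset_name` are hypothetical names, and the mapping covers only the three datasets listed here.

```python
import warnings

# Deprecated names from the list above, mapped to their replacements.
DEPRECATED_NAMES = {
    "ogbl-wikikg": "ogbl-wikikg2",
    "ogbl-citation": "ogbl-citation2",
    "ogbg-code": "ogbg-code2",  # replaced because of target leakage
}


def current_dataset_name(name: str) -> str:
    """Return the replacement for a deprecated OGB dataset name (emitting a
    DeprecationWarning), or the name unchanged if it is not deprecated."""
    replacement = DEPRECATED_NAMES.get(name)
    if replacement is None:
        return name
    warnings.warn(
        f"{name} is deprecated; using {replacement} instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return replacement
```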
breaking Version 1.3.4 has an "import stuck" bug: importing ogb or downloading a dataset may hang.
fix Upgrade to ogb==1.3.5 or later (pip install --upgrade ogb).
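gotcha OGB datasets are downloaded from external URLs, so downloads may fail on offline or restricted networks.
fix Download the dataset on a connected machine, copy it over, and pass its parent directory as the loader's root argument, e.g. PygNodePropPredDataset(name='ogbn-arxiv', root='/path/to/cache').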
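OGB loaders look for data under a root directory (the `root` argument, `dataset` by default), in a subdirectory named after the dataset with hyphens replaced by underscores (e.g. `dataset/ogbn_arxiv`). A minimal sketch, assuming that layout, for checking whether a pre-downloaded copy is in place before going offline; `ogb_dataset_dir` and `is_cached` are hypothetical helpers.

```python
import tempfile
from pathlib import Path


def ogb_dataset_dir(root: str, name: str) -> Path:
    """Where `name` is assumed to live under `root`,
    e.g. ('dataset', 'ogbn-arxiv') -> dataset/ogbn_arxiv."""
    return Path(root) / name.replace("-", "_")


def is_cached(root: str, name: str) -> bool:
    """Rough offline check: True if the dataset directory already exists.
    It does not verify that the files inside are complete."""
    return ogb_dataset_dir(root, name).is_dir()


# Demo with a throwaway directory standing in for a pre-downloaded cache.
with tempfile.TemporaryDirectory() as cache_root:
    print(is_cached(cache_root, "ogbn-arxiv"))  # False: nothing copied yet
    ogb_dataset_dir(cache_root, "ogbn-arxiv").mkdir()
    print(is_cached(cache_root, "ogbn-arxiv"))  # True
```

If the check passes, the same directory would be passed to the loader as `root` so it reads the existing files instead of downloading.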
gotcha Pandas 2.0 breaks compatibility with older ogb versions (<1.3.6).
fix Use pandas<2.0 or upgrade ogb to 1.3.6+.
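The two version-related problems above (the 1.3.4 hang and the pandas 2.0 incompatibility before 1.3.6) can be checked from installed package metadata. A minimal sketch; `parse_version` and `known_issues` are hypothetical helpers, and the parser is deliberately simplified (it drops pre-release suffixes), so it is not a full PEP 440 implementation.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Simplified parser: '2.0.3' -> (2, 0, 3); suffixes such as 'rc1'
    are dropped and missing parts count as 0."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def known_issues(ogb_version: str, pandas_version: str) -> list:
    """Return the known problems listed above for this ogb/pandas combination."""
    issues = []
    ogb_v = parse_version(ogb_version)
    if ogb_v == (1, 3, 4):
        issues.append("ogb 1.3.4 may hang on import/download: upgrade to ogb>=1.3.5")
    if ogb_v < (1, 3, 6) and parse_version(pandas_version) >= (2, 0, 0):
        issues.append("pandas>=2.0 requires ogb>=1.3.6 (or pin pandas<2.0)")
    return issues


try:
    print(known_issues(version("ogb"), version("pandas")) or "no known issues")
except PackageNotFoundError as exc:
    print("package not installed:", exc)
```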

Load the ogbn-arxiv dataset with the PyTorch Geometric (PyG) wrapper and run a dummy evaluation:

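import torch
from ogb.nodeproppred import PygNodePropPredDataset, Evaluator

# Download (on first use) and load ogbn-arxiv as a PyG dataset
dataset = PygNodePropPredDataset(name='ogbn-arxiv')
split_idx = dataset.get_idx_split()
train_idx, valid_idx, test_idx = split_idx['train'], split_idx['valid'], split_idx['test']
graph = dataset[0]  # a single graph; graph.y has shape [num_nodes, 1]

# Score predictions with the dataset's evaluator
evaluator = Evaluator(name='ogbn-arxiv')
# Dummy predictions (for illustration only): random class labels for the test nodes.
# y_true and y_pred must have the same 2-D shape [n, 1], holding class indices.
y_true = graph.y[test_idx]
y_pred = torch.randint(0, dataset.num_classes, y_true.shape)
result = evaluator.eval({'y_true': y_true, 'y_pred': y_pred})
print(result)  # {'acc': ...}; random guessing scores roughly 1/num_classes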
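The evaluator for ogbn-arxiv reports accuracy under the key 'acc'. To make the shape contract concrete, here is a plain-Python sketch of an accuracy metric over [n, k] label arrays, assuming accuracy is the per-column match rate averaged over columns. It is an illustration, not OGB's own code: `accuracy` is a hypothetical function and it assumes every entry is labeled.

```python
def accuracy(y_true, y_pred):
    """Accuracy for [n, k] nested lists: per-column match rate averaged over
    the k columns (k == 1 for ogbn-arxiv). Assumes no missing labels."""
    if not y_true:
        raise ValueError("y_true is empty")
    same_shape = len(y_true) == len(y_pred) and all(
        len(t) == len(p) for t, p in zip(y_true, y_pred)
    )
    if not same_shape:
        raise RuntimeError("y_true and y_pred must have the same shape")
    n_rows, n_cols = len(y_true), len(y_true[0])
    per_column = []
    for j in range(n_cols):
        correct = sum(1 for t, p in zip(y_true, y_pred) if t[j] == p[j])
        per_column.append(correct / n_rows)
    return {"acc": sum(per_column) / n_cols}


print(accuracy([[0], [1], [2], [1]], [[0], [1], [0], [0]]))  # {'acc': 0.5}
```

Rejecting mismatched shapes up front mirrors why the example above builds `y_pred` with `y_true.shape` rather than a flat vector.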