OSSDatasets
0.3.7 verified Sat May 09 auth: no python
OSSDatasets provides scalable access to software engineering datasets such as SWE-bench and GitHub issue data. It downloads datasets and caches them locally for reproducible ML experiments. The current version is 0.3.7; releases follow no fixed schedule.
pip install ossdata
Common errors
error ModuleNotFoundError: No module named 'ossdata'
cause The package is not installed, or it is installed in a different environment.
fix Run `pip install ossdata` in your active Python environment.
error ValueError: Dataset 'swe_bench' not found. Available datasets: ['swe-bench', 'swe-bench-lite', 'github-issues']
cause Typo in the dataset name (underscore instead of hyphen).
fix Use the exact dataset name with hyphens: `load_dataset('swe-bench', ...)`.
Warnings
gotcha The dataset name must match exactly and is case-sensitive. Allowed names: 'swe-bench', 'swe-bench-lite', 'github-issues'.
fix Check the exact dataset name in the docs.
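One way to catch name typos before they reach `load_dataset` is to normalize the name first. The sketch below is not part of ossdata: `normalize_dataset_name` and `KNOWN_DATASETS` are hypothetical names, and the list of datasets is copied from the error message shown earlier, so it may go stale if the library adds datasets.

```python
# Names copied from the "Available datasets" list in the ValueError message.
# This tuple is an assumption and is not read from the library itself.
KNOWN_DATASETS = ("swe-bench", "swe-bench-lite", "github-issues")


def normalize_dataset_name(name: str) -> str:
    """Hypothetical helper: map common typos onto an exact dataset name.

    Trims whitespace, lowercases, and turns underscores into hyphens,
    then rejects anything that is still not a known name.
    """
    candidate = name.strip().lower().replace("_", "-")
    if candidate not in KNOWN_DATASETS:
        raise ValueError(
            f"Unknown dataset {name!r}; expected one of {list(KNOWN_DATASETS)}"
        )
    return candidate


print(normalize_dataset_name("swe_bench"))         # swe-bench
print(normalize_dataset_name(" SWE-Bench-Lite "))  # swe-bench-lite
```

The normalized string can then be passed to `load_dataset` as the dataset name.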
gotcha The returned data is a list of dicts, not a Hugging Face Dataset or pandas DataFrame, so you may need to convert it manually.
fix Wrap with `pd.DataFrame(dataset)` or use dict comprehensions.
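If pandas is not available, the dict-comprehension route might look like the sketch below. It uses a small stand-in list instead of a real download, and the field names (`instance_id`, `repo`, `resolved`) are hypothetical, not the actual dataset schema. It also assumes every record has the same keys.

```python
# Stand-in for the list of dicts that load_dataset returns.
# Field names are hypothetical and chosen only for illustration.
records = [
    {"instance_id": "demo-1", "repo": "org/project", "resolved": True},
    {"instance_id": "demo-2", "repo": "org/other", "resolved": False},
]

# Row-oriented -> column-oriented, using a dict comprehension.
# Keys are taken from the first record, so all records must share them.
columns = {key: [row[key] for row in records] for key in records[0]}
print(columns["repo"])  # ['org/project', 'org/other']

# Filtering stays a plain list comprehension.
resolved_ids = [row["instance_id"] for row in records if row["resolved"]]
print(resolved_ids)  # ['demo-1']
```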
breaking Versions 0.2.x and earlier used a different API: `import ossdata`, then `ossdata.get_dataset(...)`. The old API was removed in 0.3.0.
fix Use `from ossdata import load_dataset` and call `load_dataset(...)`.
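To find out which side of the 0.3.0 break an environment is on, the installed version can be read with the standard library's `importlib.metadata`. This is a minimal sketch: `installed_version` and `uses_new_api` are hypothetical helpers, and the parsing assumes a plain `major.minor.patch` version string.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "ossdata"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def uses_new_api(version_string: str) -> bool:
    """Hypothetical helper: True for 0.3.0 and later (the load_dataset API)."""
    # Assumes numeric major and minor components, e.g. "0.3.7".
    major, minor = (int(part) for part in version_string.split(".")[:2])
    return (major, minor) >= (0, 3)


found = installed_version()
if found is None:
    print("ossdata is not installed in this environment")
elif uses_new_api(found):
    print(f"ossdata {found}: use `from ossdata import load_dataset`")
else:
    print(f"ossdata {found}: old get_dataset API; upgrade to 0.3.0 or later")
```

The same check also covers the `ModuleNotFoundError` case, since a missing package is reported instead of raising.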
Imports
- load_dataset
from ossdata import load_dataset
Quickstart
from ossdata import load_dataset
# Load the SWE-bench train split (cached automatically after the first download)
dataset = load_dataset('swe-bench', split='train')
# The result is a list of dicts, so slicing shows the first two records
print(dataset[:2])