OSSDatasets

0.3.7 verified Sat May 09 auth: no python

OSSDatasets provides access to software engineering datasets such as SWE-bench and GitHub issue data. It downloads each dataset once and caches it locally, so repeated runs of an ML experiment read the same data. The current version is 0.3.7; releases follow no fixed schedule.

pip install ossdata
error ModuleNotFoundError: No module named 'ossdata'
cause The package is not installed, or it is installed in a different environment from the one running your code.
fix Run `pip install ossdata` in your active Python environment.
error ValueError: Dataset 'swe_bench' not found. Available datasets: ['swe-bench', 'swe-bench-lite', 'github-issues']
cause The dataset name contains a typo (an underscore instead of a hyphen).
fix Use the exact dataset name with hyphens: `load_dataset('swe-bench', ...)`.
gotcha The dataset name must be exact; allowed names: 'swe-bench', 'swe-bench-lite', 'github-issues' (case-sensitive).
fix Check the exact dataset name in the docs.
gotcha The returned data format is a list of dicts, not a Hugging Face Dataset or pandas DataFrame. You may need to convert manually.
fix Wrap the result with `pd.DataFrame(dataset)` or reshape it with dict comprehensions.
breaking Versions 0.2.x and earlier used a different API: `import ossdata`, then `ossdata.get_dataset(...)`. The old API was removed in 0.3.0.
fix Use `from ossdata import load_dataset` and call `load_dataset(...)`.
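The name-related error and gotcha above both come down to passing an exact, hyphenated, case-sensitive name. One way to guard against that is to normalize the name before handing it to `load_dataset`. The sketch below is a hypothetical helper, not part of ossdata; `normalize_dataset_name` and `KNOWN_DATASETS` are illustrative names, and the allowed list is copied from the error message quoted above, so it may not match other versions.

```python
# Allowed names copied from the ValueError message quoted above.
# This tuple is an assumption and may differ between ossdata versions.
KNOWN_DATASETS = ("swe-bench", "swe-bench-lite", "github-issues")


def normalize_dataset_name(name: str) -> str:
    """Hypothetical helper: map common typos onto a known dataset name."""
    # Trim whitespace, lowercase, and turn underscores into hyphens.
    candidate = name.strip().lower().replace("_", "-")
    if candidate not in KNOWN_DATASETS:
        raise ValueError(
            f"Dataset {name!r} not found. Available datasets: {list(KNOWN_DATASETS)}"
        )
    return candidate


print(normalize_dataset_name("swe_bench"))
```

The returned string can then be passed as the first argument to `load_dataset`. Failing early in your own code keeps the typo from surfacing in the middle of a longer pipeline.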

Load a dataset by name (e.g., 'swe-bench') with automatic caching. Returns a list of dicts.

from ossdata import load_dataset

# Load SWE-bench dataset (cached automatically)
dataset = load_dataset('swe-bench', split='train')
print(dataset[:2])
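Because the result is a plain list of dicts, it can be reshaped with the standard library alone. The sketch below uses stand-in records in place of a real `load_dataset` call so that it runs without a download; the field names (`instance_id`, `repo`, `problem_statement`) are assumptions chosen for illustration and are not confirmed by the ossdata documentation.

```python
from collections import Counter

# Stand-in for the list of dicts returned by load_dataset(...).
# The field names here are hypothetical.
records = [
    {"instance_id": "a-1", "repo": "org/alpha", "problem_statement": "Crash on start"},
    {"instance_id": "a-2", "repo": "org/alpha", "problem_statement": "Wrong output"},
    {"instance_id": "b-1", "repo": "org/beta"},
]

# Collect every key that appears in any record.
fields = sorted({key for row in records for key in row})

# Column-oriented view built with a dict comprehension; missing keys become None.
columns = {field: [row.get(field) for row in records] for field in fields}

# Count how many records belong to each repository.
per_repo = Counter(row["repo"] for row in records)

print(columns["instance_id"])
print(per_repo.most_common(1))
```

If pandas is installed, `pd.DataFrame(records)` accepts the same list directly and fills missing keys with NaN.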