DataLad

1.4.1 verified Fri May 01 auth: no python

DataLad is a distributed system for joint management of code, data, and their relationships, built on top of Git and git-annex. Current version is 1.4.1, with frequent bug-fix releases. Requires Python >=3.10.

pip install datalad
error ImportError: cannot import name 'Dataset' from 'datalad'
cause The top-level datalad package does not export Dataset; the class is exposed through datalad.api.
fix
Use 'from datalad.api import Dataset'
error AttributeError: module 'datalad' has no attribute 'api'
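cause A local file named datalad.py shadows the installed datalad package, so the import resolves to that file, which has no api attribute. A plain 'import datalad' may also leave the api submodule unloaded.
fix
Rename any local script or module named 'datalad.py'. If no such file exists, import the submodule explicitly with 'import datalad.api'.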
error CommandNotFoundError: git-annex is not installed
cause The git-annex binary is missing from PATH.
fix
Install git-annex with the system package manager (e.g., apt-get install git-annex; NeuroDebian provides git-annex-standalone) and confirm that it is on PATH.
error requests.exceptions.InvalidURL: Failed to parse: ///example/dataset
cause The '///' prefix is DataLad shorthand for its default superdataset and resolves only for dataset paths that exist there; any other source needs a full URL.
fix
Use a full Git URL such as 'https://github.com/datalad-datasets/longnow-podcasts.git'.
breaking Python 3.9 support dropped in DataLad 1.3.0; requires Python >=3.10.
fix Upgrade Python to 3.10 or later.
breaking patoolib API change in version 2.0 broke DataLad <1.4.1; DataLad 1.4.1 includes a fix.
fix Upgrade to DataLad 1.4.1 or pin patoolib<2.0.
gotcha DataLad uses custom markers for pytest; extensions must register them or use the DataLad pytest plugin introduced in 1.4.0.
fix In conftest.py, use pytest_plugins = ['datalad.tests.fixtures'] to auto-register.
gotcha Operations on adjusted branches (crippled filesystems) may require extra merge steps; fixed in 1.3.3.
fix Upgrade to 1.3.3 or later, or run 'git annex merge' or 'git annex adjust' manually.
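
Several of the entries above are environment problems (Python version, git-annex missing from PATH, a local datalad.py shadowing the package) that can be checked before DataLad is imported. The following is a minimal sketch using only the standard library. The function name preflight and its parameters are hypothetical and not part of DataLad, and the 3.10 minimum is taken from the requirement stated above.

```python
import shutil
import sys
from pathlib import Path


def preflight(search_dir=".", min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper, not part of DataLad.
    """
    problems = []

    # The notes above state that DataLad requires Python >= 3.10.
    if sys.version_info[:2] < tuple(min_python):
        problems.append("Python %d.%d or later is required" % tuple(min_python))

    # git-annex must be discoverable on PATH.
    if shutil.which("git-annex") is None:
        problems.append("git-annex not found on PATH")

    # A local datalad.py would shadow the installed package on import.
    candidate = Path(search_dir) / "datalad.py"
    if candidate.exists():
        problems.append(f"{candidate} may shadow the datalad package")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

An empty result only means these three checks passed; it says nothing about the installed DataLad or patoolib versions.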

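Install a dataset from a Git URL and print its ID.

from datalad.api import install

# Use a full Git URL; install() returns the Dataset that was installed.
ds = install(path='longnow-podcasts',
             source='https://github.com/datalad-datasets/longnow-podcasts.git')
print(ds.id)  # ID of the installed dataset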