IDC Index

0.11.14 · verified Fri May 01 · auth: no · python

A Python package that simplifies access to the data available in the NCI Imaging Data Commons (IDC). It provides queryable Pandas/DuckDB-based indices for DICOM studies, series, and analysis results. Latest version: 0.11.14. New versions are released approximately every 2-4 weeks.

pip install idc-index
error AttributeError: 'IDCClient' object has no attribute 'get_collections'
cause The `get_collections()` method was removed in v0.11 and replaced by the `available_collections` property.
fix
Use `client.available_collections` instead.
error ModuleNotFoundError: No module named 'idc_index_data'
cause The `idc-index-data` package is not installed or not up-to-date.
fix
Run `pip install idc-index-data` or `pip install --upgrade idc-index`.
error ValueError: The truth value of a DataFrame is ambiguous
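cause Using the DataFrame returned by `client.get_series(...)` in a boolean context, such as `if series:` or `bool(series)`. pandas raises this error for any DataFrame, empty or not.
fix
Test `series.empty` explicitly: write `if not series.empty:` instead of `if series:`.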
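The truthiness error in this entry is general pandas behavior and can be reproduced without any IDC data. The following is a minimal sketch, assuming only that the query result is a pandas DataFrame; the column names and the `has_rows` helper are made up for illustration.

```python
import pandas as pd

# Stand-in for a query result; the column names are hypothetical.
series = pd.DataFrame({"SeriesInstanceUID": ["1.2.3"], "Modality": ["CT"]})
empty = series.iloc[0:0]  # same columns, zero rows


def has_rows(df: pd.DataFrame) -> bool:
    """Return True when the DataFrame contains at least one row."""
    return not df.empty


# bool(df) raises ValueError whether or not the DataFrame has rows.
try:
    bool(series)
    raised = False
except ValueError:
    raised = True

print(has_rows(series), has_rows(empty), raised)
```

Checking `len(df) > 0` works equally well; `.empty` simply states the intent more directly.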
error RuntimeError: Cannot connect to IDC index. Make sure you have internet access.
cause The first-time index download failed, the cached index is corrupt, or the network is unavailable.
fix
Check your internet connection, delete `~/.cache/idc-index` and retry, or set the environment variable `IDC_INDEX_CACHE_DIR` to a writable path.
deprecated The method `get_collections()` was removed in v0.11. Use `client.available_collections` property instead.
fix Replace `client.get_collections()` with `client.available_collections`
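Code that has to run against both older and newer installs could wrap the lookup in a small helper. This is only a sketch: the two attribute names are taken from the entries above and have not been checked against the library, and `list_collections` and the two demo classes are hypothetical stand-ins, not part of idc-index.

```python
def list_collections(client):
    """Return collections via the newer property or the older method.

    Both attribute names come from the notes above (an assumption);
    adjust them to match the version you have installed.
    """
    if hasattr(client, "available_collections"):
        return client.available_collections
    if hasattr(client, "get_collections"):
        return client.get_collections()
    raise AttributeError("client exposes neither collections accessor")


# Hypothetical stand-ins for the two client generations, for demonstration only.
class NewStyleClient:
    available_collections = ["collection_a", "collection_b"]


class OldStyleClient:
    def get_collections(self):
        return ["collection_a", "collection_b"]


print(list_collections(NewStyleClient()))
print(list_collections(OldStyleClient()))
```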
gotcha IDCClient() will download the index data on first instantiation if not already cached. This can be slow (>1GB download). Use `IDCClient(lazy=True)` to defer downloading until a query is made.
fix client = IDCClient(lazy=True)
gotcha Index data (parquet files) is versioned with the `idc-index-data` package. If you have an older version of idc-index-data, new methods may fail or return empty results. Always keep both packages up-to-date.
fix Run `pip install --upgrade idc-index idc-index-data`
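Before upgrading, it can help to see which versions of the two packages are actually installed. A minimal sketch using only the standard library (`importlib.metadata`); the `installed_version` helper is a name introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as given in the pip commands above.
for name in ("idc-index", "idc-index-data"):
    print(name, installed_version(name) or "not installed")
```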
breaking In v0.11.0, the return value of `get_patient_study_series()` changed from a dict to a namedtuple. Code that looks up values by dict key will break.
fix Access elements by attribute (e.g., `result.patient_id`) instead of by dict key (`result['patient_id']`).
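The difference between dict-style and namedtuple-style access can be shown with the standard library alone. In this sketch the `Result` type and its field names are hypothetical; the real fields returned by the library may differ.

```python
from collections import namedtuple

# Hypothetical result type; the field names are illustrative only.
Result = namedtuple("Result", ["patient_id", "study_uid", "series_uid"])
result = Result("PAT-001", "1.2.3", "1.2.3.4")

print(result.patient_id)   # attribute access
print(result[0])           # positional index also works
legacy = result._asdict()  # dict copy for code that still uses keys
print(legacy["patient_id"])

# String keys are not supported on a namedtuple.
try:
    result["patient_id"]
    key_lookup_failed = False
except TypeError:
    key_lookup_failed = True
print(key_lookup_failed)
```

Calling `_asdict()` at the boundary is one way to keep older dict-based code working while it is migrated to attribute access.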

Creates a client, then queries for CT series from the TCGA-LUAD collection.

from idc_index import IDCClient

client = IDCClient()
# Get the CT series in the TCGA-LUAD collection
series = client.get_series(collection_id="TCGA-LUAD", modality="CT")
# Number of matching series
print(len(series))
# The result is a pandas DataFrame; preview the first rows
print(series.head())