NVIDIA DALI for CUDA 12 (nvidia-dali-cuda120)
Version 2.1.0 (verified Sat May 09), auth: no, language: python
NVIDIA DALI (Data Loading Library) is a GPU-accelerated data loading and augmentation library for deep learning. This package (nvidia-dali-cuda120) is the CUDA 12 build of DALI. It is built with the CUDA 12.x toolkit and, through CUDA enhanced compatibility, runs on any driver that supports CUDA 12.0 or later. The current version is 2.1.0, and new releases ship roughly monthly. Supported Python versions are 3.10–3.14. Requires an NVIDIA GPU, a driver that supports CUDA 12 (R525+), and nvJPEG2000 support. On a CUDA 12 system, install this package instead of the generic nvidia-dali.
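Before installing, or after an import failure, it helps to see which DALI distributions are already present in the environment. The sketch below reads installed package metadata with the standard library. `installed_dali_distributions` and `unexpected_dali_distributions` are hypothetical helpers, not part of DALI. It also assumes that every distribution whose name starts with "nvidia-dali" is a DALI build, so review the list before uninstalling anything.

```python
import re
from importlib import metadata

WANTED = "nvidia-dali-cuda120"


def normalize(name):
    """PEP 503 normalization, so 'nvidia_dali_cuda120' and 'nvidia-dali-cuda120' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_dali_distributions():
    """Sorted, normalized names of installed distributions that start with 'nvidia-dali'."""
    found = set()
    for dist in metadata.distributions():
        try:
            meta = dist.metadata
        except (OSError, TypeError, ValueError):
            continue  # skip distributions whose metadata cannot be read
        name = (meta.get("Name") if meta is not None else None) or ""
        if normalize(name).startswith("nvidia-dali"):
            found.add(normalize(name))
    return sorted(found)


def unexpected_dali_distributions(installed, wanted=WANTED):
    """Every DALI-looking distribution except the variant you meant to install."""
    return [name for name in installed if name != wanted]


if __name__ == "__main__":
    installed = installed_dali_distributions()
    print("installed:", installed or "none")
    print("candidates to uninstall:", unexpected_dali_distributions(installed) or "none")
```

Any name in the second list, such as the generic nvidia-dali, is a candidate for `pip uninstall` before installing nvidia-dali-cuda120.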
pip install nvidia-dali-cuda120
Common errors
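error ModuleNotFoundError: No module named 'nvidia.dali'
cause DALI is not installed in the active environment, or you installed the generic 'nvidia-dali' package (without a CUDA suffix) or a variant built for a different CUDA version. The generic package may not exist for your Python version.
fix
Uninstall any existing DALI packages, then install the CUDA 12 variant:
pip uninstall -y nvidia-dali nvidia-dali-cuda120 && pip install nvidia-dali-cuda120
If another -cudaXXX variant is installed, add its name to the uninstall command. On a system with a different CUDA major version, install the variant whose suffix matches it instead. This package needs a driver that supports CUDA 12.
error RuntimeError: cuInit returned 999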
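cause Error 999 is CUDA_ERROR_UNKNOWN, which means the CUDA driver failed to initialize. With this package, the usual cause is that the installed driver does not match what the DALI build expects, for example a driver too old for CUDA 12. A broken driver or GPU setup can also produce it.
fix
Check the driver version and the CUDA version it supports with
nvidia-smi. This package needs a driver that supports CUDA 12 (R525 or newer). Upgrade the driver if it is older. On a system with a different CUDA major version, install the matching -cudaXXX variant.
error TypeError: pipeline_def() got an unexpected keyword argument 'enable_experimental_executor'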
cause The `enable_experimental_executor` argument was introduced in DALI 2.0. Older versions do not accept it.
fix
Remove enable_experimental_executor, or upgrade to DALI 2.0+ with pip install --upgrade nvidia-dali-cuda120.
error AttributeError: module 'nvidia.dali.fn' has no attribute 'decoders'
cause `from nvidia.dali import fn` is the correct import. The `fn.decoders` submodule (for example `fn.decoders.image`) is reachable through it without any extra import. If the attribute is missing, the installed DALI is most likely outdated or broken.
fix
Keep `from nvidia.dali import fn` and call fn.decoders.image(...). If the error persists, reinstall the package with pip install --upgrade --force-reinstall nvidia-dali-cuda120.
Warnings
breaking Starting with DALI 2.0, the default executor is the new 'dynamic' executor. If you relied on the exact scheduling order of the old executor, your pipeline may behave differently. To use the old executor, set `enable_experimental_executor=False` in your pipeline definition.
fix Add `enable_experimental_executor=False` to pipeline_def or Pipeline constructor to revert to old executor behavior.
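If the same code has to run on releases both before and after 2.0, the version check can live in one place. The sketch below makes two assumptions. First, it uses the argument name `enable_experimental_executor` exactly as this page gives it, which has not been independently verified. Second, it assumes releases before 2.0 already use the old executor, so nothing needs to be passed for them. `parse_version` and `executor_kwargs` are hypothetical helpers, not DALI APIs.

```python
import re


def parse_version(text):
    """Leading numeric parts of a version string: '2.1.0' -> (2, 1, 0), '1.52.0.dev' -> (1, 52, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev'
        parts.append(int(match.group()))
    return tuple(parts)


def executor_kwargs(dali_version, legacy_executor):
    """Pipeline kwargs that request the old executor where that argument exists.

    Assumption (from this page): only DALI 2.0+ accepts enable_experimental_executor,
    and earlier releases already run the old executor, so they get no extra kwargs.
    """
    if legacy_executor and parse_version(dali_version) >= (2, 0):
        return {"enable_experimental_executor": False}
    return {}
```

In a training script you might obtain the version with `importlib.metadata.version("nvidia-dali-cuda120")`. You can then unpack the result into the decorator together with the other pipeline arguments, for example `@pipeline_def(batch_size=4, num_threads=2, device_id=0, **extra)`.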
deprecated Python 3.9 support was dropped in DALI 2.0. Python 3.10 or newer is required.
fix Upgrade Python to 3.10 or later.
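A small guard at the top of a training script can turn an unsupported interpreter into a clear message instead of a confusing install or import failure. The sketch below is a minimal version. The bounds mirror the 3.10–3.14 range stated on this page, and `python_support_status` is a hypothetical helper.

```python
import sys

MIN_PYTHON = (3, 10)  # DALI 2.0 dropped Python 3.9, per this page
MAX_LISTED = (3, 14)  # upper end of the supported range listed on this page


def python_support_status(version_info=sys.version_info):
    """Classify an interpreter version against the range listed for this package."""
    major_minor = tuple(version_info[:2])
    if major_minor < MIN_PYTHON:
        return "unsupported"
    if major_minor > MAX_LISTED:
        return "untested"
    return "supported"


if __name__ == "__main__":
    status = python_support_status()
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]}: {status}")
```

Returning "untested" for versions above the listed range, rather than failing outright, leaves the decision to the user when a new Python release appears before the package metadata catches up.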
gotcha The '-cuda120' suffix names the CUDA major version (12), not CUDA 12.0 specifically. This package runs on CUDA 12.x systems (e.g., 12.4, 12.5, 12.6, 12.8) as long as the driver supports CUDA 12. A system on a different CUDA major version needs the '-cudaXXX' variant built for that version. Installing the wrong variant can lead to import or runtime errors.
fix Run `nvidia-smi` to check which CUDA version the driver supports. For CUDA 12.x, install nvidia-dali-cuda120. For another major version, install the package with the matching suffix.
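The `nvidia-smi` check can be scripted as well. The sketch below asks `nvidia-smi` for the driver version of the first GPU and compares its branch against the R525 minimum stated at the top of this page. It assumes a dotted numeric version string such as "535.104.05", and the exact minimum for your platform should be confirmed in NVIDIA's compatibility table. All function names are hypothetical.

```python
import shutil
import subprocess

# Assumption: R525 is the minimum driver branch for CUDA 12, as stated on this page.
MIN_DRIVER_MAJOR = 525


def parse_driver_version(text):
    """Turn a driver version string such as '535.104.05' into (535, 104, 5)."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda12(version_text):
    """True if the driver branch is at least R525."""
    return parse_driver_version(version_text)[0] >= MIN_DRIVER_MAJOR


def query_driver_version():
    """Driver version of the first GPU reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("nvidia-smi not found or failed; is the NVIDIA driver installed?")
    elif driver_supports_cuda12(version):
        print(f"driver {version}: CUDA 12 capable")
    else:
        print(f"driver {version}: older than R525, upgrade the driver for CUDA 12")
```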
gotcha The DecodersSplit operator (fn.decoders.split) was removed in DALI 1.50. Use separate decoder calls per output instead.
fix Replace `split` with individual decoder calls (e.g., `fn.decoders.image(images)`, `fn.decoders.video(videos)`).
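Before upgrading, it helps to find every call site that needs this migration. The sketch below scans Python files for `fn.decoders.split(` with a regular expression. It assumes the operator is always called through an `fn` namespace, so calls made through other aliases would be missed. `split_calls_in_text` and `find_split_calls` are hypothetical helpers.

```python
import re
from pathlib import Path

# Matches calls such as "fn.decoders.split(" or "dali.fn.decoders.split ("
SPLIT_CALL = re.compile(r"\bfn\.decoders\.split\s*\(")


def split_calls_in_text(text):
    """(line_number, stripped_line) for every line that calls fn.decoders.split."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if SPLIT_CALL.search(line)
    ]


def find_split_calls(root="."):
    """Yield (path, line_number, line) for each match in .py files under root."""
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # unreadable or non-UTF-8 file
        for number, line in split_calls_in_text(text):
            yield path, number, line


if __name__ == "__main__":
    for path, number, line in find_split_calls("."):
        print(f"{path}:{number}: {line}")
```

Each reported line is a place to substitute individual decoder calls as described in the fix above.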
Imports
- pipeline_def
from nvidia.dali import pipeline_def (from nvidia.dali.pipeline import pipeline_def also works)
- fn
from nvidia.dali import fn (equivalent to import nvidia.dali.fn as fn)
- types
from nvidia.dali import types
- Pipeline
from nvidia.dali.pipeline import Pipeline (recent releases also export it as from nvidia.dali import Pipeline)
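You can check which import forms resolve against the release you actually have installed. The sketch below is minimal. `resolve` is a hypothetical helper that imitates `from module import name` with `importlib`, including the fallback to a submodule. The list of forms simply mirrors the entries above, and nothing here is a DALI API.

```python
import importlib


def resolve(module_name, attr):
    """Imitate `from module_name import attr`; return the object, or None if it fails."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        return None
    if hasattr(module, attr):
        return getattr(module, attr)
    # `from pkg import sub` also imports submodules that are not yet attributes
    try:
        return importlib.import_module(f"{module_name}.{attr}")
    except (ImportError, OSError):
        return None


IMPORT_FORMS = [
    ("nvidia.dali", "pipeline_def"),
    ("nvidia.dali.pipeline", "pipeline_def"),
    ("nvidia.dali", "fn"),
    ("nvidia.dali", "types"),
    ("nvidia.dali.pipeline", "Pipeline"),
    ("nvidia.dali", "Pipeline"),
]

if __name__ == "__main__":
    for module_name, attr in IMPORT_FORMS:
        status = "ok" if resolve(module_name, attr) is not None else "FAILED"
        print(f"from {module_name} import {attr}: {status}")
```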
Quickstart
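from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import DALIGenericIterator

@pipeline_def(batch_size=4, num_threads=2, device_id=0)
def simple_pipeline():
    # Encoded JPEGs plus integer labels; name the reader so the iterator can find it
    jpegs, labels = fn.readers.file(file_root='/data/images', random_shuffle=True, name='Reader')
    # 'mixed': parse on the CPU, decode on the GPU
    images = fn.decoders.image(jpegs, device='mixed')
    images = fn.resize(images, resize_x=224, resize_y=224)
    # Cast to float, transpose HWC -> CHW, normalize with ImageNet mean/std on a 0-255 scale
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout='CHW',
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
    return images, labels

pipe = simple_pipeline()
pipe.build()
# reader_name lets the iterator take the epoch size from the reader
train_loader = DALIGenericIterator(pipe, ['images', 'labels'], reader_name='Reader')
for data in train_loader:
    print(data[0]['images'].shape)  # torch.Size([4, 3, 224, 224])
    break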
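The quickstart multiplies the ImageNet mean and std by 255 because the decoder produces uint8 pixels in [0, 255], and `crop_mirror_normalize` computes (x - mean) / std. Dividing (x - 255m) by 255s is algebraically the same as normalizing x / 255 with the usual m and s. The plain-Python check below compares the two forms for every pixel value. The function names are illustrative only.

```python
# ImageNet statistics on the usual [0, 1] scale, as used in the quickstart
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_0_255(pixel, channel):
    """(x - mean) / std with mean and std pre-multiplied by 255, as in the pipeline."""
    return (pixel - IMAGENET_MEAN[channel] * 255) / (IMAGENET_STD[channel] * 255)


def normalize_0_1(pixel, channel):
    """Scale the pixel to [0, 1] first, then apply the unscaled mean and std."""
    return (pixel / 255 - IMAGENET_MEAN[channel]) / IMAGENET_STD[channel]


if __name__ == "__main__":
    worst = max(
        abs(normalize_0_255(p, c) - normalize_0_1(p, c))
        for c in range(3)
        for p in range(256)
    )
    print(f"largest difference over all uint8 values: {worst:.2e}")
```

Scaling the constants instead of the pixels avoids an extra division per pixel inside the pipeline, while giving a model the same inputs it would receive from a [0, 1]-based preprocessing path.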