NVIDIA DALI for CUDA 12.0

2.1.0 verified Sat May 09 auth: no python

NVIDIA DALI (Data Loading Library) is a GPU-accelerated data loading and augmentation library for deep learning. This package (nvidia-dali-cuda120) is built specifically for CUDA 12.0. The current version is 2.1.0, and releases ship roughly monthly. It supports Python 3.10–3.14 and requires an NVIDIA GPU with a CUDA 12.0 driver (R525+) and nvJPEG2000 support. CUDA 12.0 users should install this package instead of the generic nvidia-dali.

pip install nvidia-dali-cuda120

Common errors

error ModuleNotFoundError: No module named 'nvidia.dali'

cause No usable DALI package is installed in the active Python environment. This typically happens when pip was pointed at the generic 'nvidia-dali' package (without a CUDA suffix), which may not exist for your Python version, or at the wrong CUDA variant.

fix

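Uninstall any existing DALI package, then install the CUDA 12.0 variant: pip uninstall -y nvidia-dali nvidia-dali-cudaXXX && pip install nvidia-dali-cuda120 (replace XXX with the suffix of the variant currently installed). Install this variant only if your system actually uses CUDA 12.0; for another CUDA version, replace 120 with the matching suffix.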
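
Before uninstalling anything, it can help to see which DALI distributions pip has actually placed in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_dali_distributions` is hypothetical, and the prefix match on "nvidia-dali" is an assumption that will also pick up plugin packages sharing that prefix.

```python
from importlib import metadata


def installed_dali_distributions():
    """Return {distribution name: version} for installed packages named nvidia-dali*."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name") or ""
        # Normalize underscores so "nvidia_dali_cuda120" and "nvidia-dali-cuda120" both match.
        if name.lower().replace("_", "-").startswith("nvidia-dali"):
            found[name] = dist.version
    return found


dali_dists = installed_dali_distributions()
# An empty dict means the import error is expected: nothing DALI-related is installed here.
print(dali_dists if dali_dists else "no nvidia-dali* distribution found")
```

If the output lists more than one core DALI variant, removing all of them and reinstalling a single variant is the approach the fix above describes.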

error RuntimeError: cuInit returned 999

cause The DALI CUDA variant does not match the installed CUDA driver version. This typically happens when running on a system with a different CUDA version than the package was built for.

fix

Check your CUDA driver version with nvidia-smi and install the corresponding DALI package (e.g., nvidia-dali-cuda124 for CUDA 12.4).
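
The nvidia-smi check can also be scripted. This sketch assumes the banner printed by nvidia-smi contains a field of the form "CUDA Version: X.Y"; the function names are hypothetical and the sample banner line is illustrative, not captured from a real machine.

```python
import re
import shutil
import subprocess


def parse_driver_cuda_version(smi_output):
    """Extract (major, minor) from the 'CUDA Version: X.Y' field, or None if absent."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_cuda_version():
    """Run nvidia-smi when it is on PATH; return None if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=False, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return parse_driver_cuda_version(result.stdout)


# Illustrative banner line, used here only to exercise the parser.
sample_banner = "| NVIDIA-SMI 550.54.15    Driver Version: 550.54.15    CUDA Version: 12.4 |"
sample_version = parse_driver_cuda_version(sample_banner)
print("parsed from sample:", sample_version)
print("detected on this machine:", driver_cuda_version())
```

The version reported by nvidia-smi is the highest CUDA version the driver supports, which is the value to compare against the suffix of the installed DALI package.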

error TypeError: pipeline_def() got an unexpected keyword argument 'enable_experimental_executor'

cause The `enable_experimental_executor` argument was introduced in DALI 2.0. If you are using an older version, this argument does not exist.

fix

Remove enable_experimental_executor or upgrade to DALI 2.0+ with pip install --upgrade nvidia-dali-cuda120.
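
When the same script has to run against several DALI versions, one option is to drop keyword arguments the installed version does not accept instead of editing the call site. The sketch below is a generic helper built on `inspect.signature`; `filter_supported_kwargs` and `old_style_pipeline` are hypothetical names, and the stand-in function only imitates an older constructor that lacks the argument.

```python
import inspect


def filter_supported_kwargs(func, /, **kwargs):
    """Keep only the keyword arguments that func's signature names explicitly."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter forwards everything, so nothing can be ruled out from the signature.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


# Hypothetical stand-in for an older constructor without the newer argument.
def old_style_pipeline(batch_size, num_threads, device_id):
    return (batch_size, num_threads, device_id)


wanted = dict(batch_size=4, num_threads=2, device_id=0, enable_experimental_executor=False)
accepted = filter_supported_kwargs(old_style_pipeline, **wanted)
print(sorted(accepted))
```

With real DALI the helper would need to inspect the Pipeline constructor rather than the decorator, on the assumption that pipeline_def forwards its keyword arguments to Pipeline.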

error AttributeError: module 'nvidia.dali.fn' has no attribute 'decoders'

cause `from nvidia.dali import fn` is the correct import, so the import statement itself is not the problem. The error means the `fn` module that was loaded has no `decoders` submodule, which typically points to a DALI release that predates `fn.decoders` (very old releases expose the operator as `fn.image_decoder`).

fix

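Keep from nvidia.dali import fn and call the decoder as fn.decoders.image(...); there is no top-level decoders module to import from nvidia.dali. If the attribute is still missing, upgrade with pip install --upgrade nvidia-dali-cuda120. In DALI <1.50, fn.experimental.decoders.image(...) may be used instead.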
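
Code that must tolerate both operator layouts can look the decoder up by name and fall back. This is a minimal sketch with a hypothetical helper, `resolve_first`; the `SimpleNamespace` objects are stand-ins for the real `fn` module so the example runs without DALI installed.

```python
from functools import reduce
from types import SimpleNamespace


def resolve_first(root, *dotted_paths):
    """Return the first attribute reachable from root via one of the dotted paths."""
    for path in dotted_paths:
        try:
            return reduce(getattr, path.split("."), root)
        except AttributeError:
            continue
    raise AttributeError(f"none of {dotted_paths} found on {root!r}")


# Stand-in for a legacy fn module that only has the flat operator name.
legacy_fn = SimpleNamespace(image_decoder=lambda data: ("legacy", data))
# Stand-in for a current fn module with the decoders submodule.
modern_fn = SimpleNamespace(decoders=SimpleNamespace(image=lambda data: ("modern", data)))

legacy_decode = resolve_first(legacy_fn, "decoders.image", "image_decoder")
modern_decode = resolve_first(modern_fn, "decoders.image", "image_decoder")
print(legacy_decode("jpeg-bytes"), modern_decode("jpeg-bytes"))
```

Against a real installation the call would be `resolve_first(fn, "decoders.image", "image_decoder")` after `from nvidia.dali import fn`.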

Warnings

breaking Starting with DALI 2.0, the default executor is the new 'dynamic' executor. If you relied on the exact scheduling order of the old executor, your pipeline may behave differently. To use the old executor, set `enable_experimental_executor=False` in your pipeline definition.

fix Add `enable_experimental_executor=False` to pipeline_def or Pipeline constructor to revert to old executor behavior.

deprecated Python 3.9 support dropped in DALI 2.0. Requires Python >=3.10.

fix Upgrade Python to 3.10 or later.
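
A startup guard can surface the Python requirement with a clear message before any DALI import is attempted. This is a small sketch; `MIN_PYTHON` and `python_is_supported` are hypothetical names, and the bound simply mirrors the >=3.10 requirement stated above.

```python
import sys

MIN_PYTHON = (3, 10)  # lower bound taken from the requirement stated above


def python_is_supported(version_info=sys.version_info):
    """Compare only (major, minor) against the minimum supported version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is below the required 3.10")
else:
    print("Python version is new enough for DALI 2.0+")
```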

gotcha The 'nvidia-dali-cuda120' package is specific to CUDA 12.0. If your system uses a different CUDA version (e.g., 12.4, 12.5, 12.6, 12.8), you must install the corresponding '-cudaXXX' variant. Installing the wrong variant may lead to silent performance degradation or runtime errors.

fix Run `nvidia-smi` to check driver CUDA version, then install the matching package (e.g., `pip install nvidia-dali-cuda124`).

gotcha The DecodersSplit operator (fn.decoders.split) was removed in DALI 1.50. Use separate decoder calls per output instead.

fix Replace `split` with individual decoder calls (e.g., `fn.decoders.image(images)`, `fn.decoders.video(videos)`).

Imports

pipeline_def
wrong
```
from nvidia.dali.pipeline import pipeline_def
```
correct
```
from nvidia.dali import pipeline_def
```
pipeline_def is a decorator exposed at the top level of the nvidia.dali package; importing it from there is the form used throughout the DALI documentation.
fn
wrong
```
import nvidia.dali.fn as fn
```
correct
```
from nvidia.dali import fn
```
fn is a module, so `import nvidia.dali.fn as fn` also works; `from nvidia.dali import fn` is the canonical form.
types
```
from nvidia.dali import types
```
Used for DALIDataType, DALIInterpType, etc.

Pipeline
wrong
```
from nvidia.dali import Pipeline
```
correct
```
from nvidia.dali.pipeline import Pipeline
```
The Pipeline class lives in the nvidia.dali.pipeline submodule.

Quickstart

Basic image classification pipeline using DALI with PyTorch integration. Ensure /data/images contains subdirectories per class with JPEG images.
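
The directory requirement can be sanity-checked before the pipeline is built. The sketch below counts JPEG files per class subdirectory using only the standard library; `summarize_image_root` is a hypothetical helper, the temporary directory is demo data, and the idea that labels follow sorted subdirectory order is an assumption to confirm against the fn.readers.file documentation.

```python
import tempfile
from pathlib import Path


def summarize_image_root(root):
    """Map each class subdirectory name to its JPEG count, in sorted name order."""
    summary = {}
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        jpegs = [f for f in class_dir.iterdir() if f.suffix.lower() in {".jpg", ".jpeg"}]
        summary[class_dir.name] = len(jpegs)
    return summary


# Build a throwaway layout to show the expected shape of the data directory.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("cat/a.jpg", "dog/b.JPEG", "dog/notes.txt"):
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    demo_summary = summarize_image_root(tmp)

print(demo_summary)
```

Running `summarize_image_root('/data/images')` on the real dataset should show every class with a non-zero count; an empty result means the root has no class subdirectories.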

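```
from nvidia.dali import pipeline_def, fn, types
from nvidia.dali.plugin.pytorch import DALIGenericIterator

@pipeline_def(batch_size=4, num_threads=2, device_id=0)
def simple_pipeline():
    # Read encoded files and integer labels from the class subdirectories.
    jpegs, labels = fn.readers.file(file_root='/data/images', random_shuffle=True)
    # 'mixed' takes encoded input on the CPU and outputs decoded images on the GPU.
    images = fn.decoders.image(jpegs, device='mixed')
    images = fn.resize(images, resize_x=224, resize_y=224)
    # Normalize with ImageNet statistics scaled to 0..255 and convert HWC to CHW.
    images = fn.crop_mirror_normalize(
        images,
        dtype=types.FLOAT,
        output_layout='CHW',
        mean=[0.485 * 255, 0.456 * 255, 0.406 * 255],
        std=[0.229 * 255, 0.224 * 255, 0.225 * 255])
    return images, labels

pipe = simple_pipeline()
pipe.build()
# The names map, in order, to the outputs returned by the pipeline function.
train_loader = DALIGenericIterator(pipe, ['images', 'labels'])
for data in train_loader:
    # data holds one dict per pipeline; tensors are keyed by output name.
    print(data[0]['images'].shape)
    break
```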
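
The mean and std values in the pipeline are multiplied by 255 because decoded images hold 8-bit channel values in the 0..255 range, while the familiar constants are defined on a 0..1 scale. The following pure-Python sketch reproduces the per-channel arithmetic, (value - mean) / std, for a single pixel; `normalize_pixel` is a hypothetical helper for illustration and is not how DALI implements the operator.

```python
MEAN = [0.485 * 255, 0.456 * 255, 0.406 * 255]  # per-channel mean on the 0..255 scale
STD = [0.229 * 255, 0.224 * 255, 0.225 * 255]   # per-channel std on the 0..255 scale


def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel with 0..255 channel values."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]


white = normalize_pixel([255, 255, 255])
black = normalize_pixel([0, 0, 0])
print("white:", white)
print("black:", black)
```

Scaling the constants rather than the image avoids a separate division of every pixel by 255 and gives the same normalized values.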