oneMKL SYCL Sparse

Version 2026.0.0 · verified Fri May 01 · auth: no · python

Intel oneAPI Math Kernel Library (oneMKL) Sparse BLAS routines for SYCL devices. This package provides optimized sparse linear algebra operations (e.g., sparse matrix-vector multiply, sparse triangular solvers) on Intel GPUs and CPUs using SYCL. Current version: 2026.0.0. Released quarterly as part of Intel's oneAPI toolkit.
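To make the triangular-solve operation concrete, here is a minimal host-side sketch of what it computes: solving L x = b for a lower-triangular matrix L stored in CSR form, by forward substitution. It is plain Python, the helper name `csr_lower_solve` is hypothetical, and it is not how oneMKL implements its device solver; it only illustrates the math and can serve as a small reference check.

```python
def csr_lower_solve(row_ptr, col, val, b):
    """Solve L x = b by forward substitution; L is lower-triangular in CSR form.

    Assumes every row stores a nonzero diagonal entry. Entries above the
    diagonal (col > row) are ignored.
    """
    n = len(row_ptr) - 1
    x = [0.0] * n
    for i in range(n):
        acc = b[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col[k]
            if j < i:
                acc -= val[k] * x[j]  # subtract contributions of already-solved unknowns
            elif j == i:
                diag = val[k]
        if not diag:
            raise ValueError(f"missing or zero diagonal in row {i}")
        x[i] = acc / diag
    return x


# L = [[2, 0, 0],
#      [1, 1, 0],
#      [0, 3, 4]]
print(csr_lower_solve([0, 1, 3, 5], [0, 0, 1, 1, 2],
                      [2.0, 1.0, 1.0, 3.0, 4.0], [2.0, 3.0, 11.0]))
# prints [1.0, 2.0, 1.25]
```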

pip install onemkl-sycl-sparse
error ModuleNotFoundError: No module named 'onemkl_sycl_sparse'
cause Package not installed or wrong Python environment.
fix
Install into the interpreter you are actually running: python -m pip install onemkl-sycl-sparse
error AttributeError: module 'onemkl_sycl_sparse' has no attribute 'sparse'
cause Incorrect import. The correct import is 'from onemkl_sycl_sparse import sparse'.
fix
Use: from onemkl_sycl_sparse import sparse
error dpctl._sycl.SyclQueueCreationError: No device of requested type available
cause No compatible SYCL device (GPU/CPU) found or driver issues.
fix
Check the Intel GPU/CPU drivers and install the Intel oneAPI runtime, or fall back to a CPU device with the "cpu" selector: dpctl.SyclQueue("cpu")
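The fallback can be automated by trying selectors in order. A minimal sketch follows; the helper `first_available_queue` is hypothetical, and the queue constructor and exception type are passed in so the logic stays independent of any particular library.

```python
def first_available_queue(selectors, make_queue, error_type):
    """Return (selector, queue) for the first selector that yields a queue.

    make_queue is called with each selector string in turn; failures of
    type error_type are recorded and the next selector is tried.
    """
    failures = []
    for sel in selectors:
        try:
            return sel, make_queue(sel)
        except error_type as exc:
            failures.append(f"{sel}: {exc}")
    raise RuntimeError("no SYCL device available; tried " + "; ".join(failures))
```

With dpctl this would be called as `first_available_queue(["gpu", "cpu"], dpctl.SyclQueue, dpctl.SyclQueueCreationError)`, assuming the installed dpctl exposes the exception at the top level as recent releases do. Recording each failure keeps the driver error messages visible when no device works at all.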
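gotcha All input arrays (CSR row pointer, col, val, and the x/y vectors) must be USM allocations on the same SYCL device as the queue. Do not pass host (NumPy) arrays directly; copy them to USM first.
fix Use dpctl.tensor.asarray(host_array, sycl_queue=queue) to copy NumPy data into a USM array on the queue's device, and dpctl.tensor.empty(..., sycl_queue=queue) for output buffers.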
gotcha Matrix handles must be explicitly created and destroyed. Forgetting sparse.destroy_handle leads to memory leaks.
fix Always pair sparse.create_handle with sparse.destroy_handle, ideally in a try-finally block or context manager (if available).
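If the package does not ship a context manager, a small wrapper gives the same guarantee. This is a minimal sketch built on the standard library's `contextlib.contextmanager`; the helper name `managed_handle` is hypothetical, and the create/destroy functions are passed in rather than assumed.

```python
from contextlib import contextmanager


@contextmanager
def managed_handle(create, destroy, *args):
    """Yield create(*args) and always call destroy() on it afterwards."""
    handle = create(*args)
    try:
        yield handle
    finally:
        destroy(handle)  # runs on normal exit and when the body raises
```

Usage would look like `with managed_handle(sparse.create_handle, sparse.destroy_handle, queue) as handle: ...`. Because destruction sits in `finally`, an exception inside the `with` body still releases the handle before propagating.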
gotcha Some internal operations of the sparse module may expect Fortran-style (column-major) ordering, whereas CSR arrays are row-major by construction. Within each row of a CSR matrix, column indices should be sorted and contain no duplicates.
fix Pre-process the CSR arrays so column indices are sorted and deduplicated per row (e.g., with SciPy: build a scipy.sparse.csr_matrix, then call sum_duplicates() and sort_indices()).
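Hand-built inputs often start as (row, col, value) triplets (COO), which are not yet CSR. A minimal pure-Python sketch that produces CSR arrays satisfying the sorted/unique requirement follows. The helper `coo_to_csr` is hypothetical; summing duplicate entries follows the same convention as SciPy's `sum_duplicates()`.

```python
def coo_to_csr(nrows, rows, cols, vals):
    """Convert COO triplets to CSR (row_ptr, col_idx, values).

    Column indices come out sorted within each row; duplicate (row, col)
    entries are summed so every column appears at most once per row.
    """
    merged = {}
    for r, c, v in zip(rows, cols, vals):
        if not 0 <= r < nrows:
            raise ValueError(f"row index {r} out of range")
        merged[(r, c)] = merged.get((r, c), 0.0) + v

    row_ptr = [0] * (nrows + 1)
    col_idx, values = [], []
    for r, c in sorted(merged):  # tuples sort by row, then by column
        col_idx.append(c)
        values.append(merged[(r, c)])
        row_ptr[r + 1] += 1      # count nonzeros per row
    for i in range(nrows):       # prefix sum turns counts into offsets
        row_ptr[i + 1] += row_ptr[i]
    return row_ptr, col_idx, values


print(coo_to_csr(3, [0, 0, 1, 2], [0, 1, 1, 2], [1.0, 2.0, 3.0, 4.0]))
# prints ([0, 2, 3, 4], [0, 1, 1, 2], [1.0, 2.0, 3.0, 4.0])
```

Note that the CSR row array is a row pointer of length nrows + 1, not a per-nonzero row index.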
deprecated The old import path 'oneapi.mkl' was deprecated and has been removed as of 2026.0.0.
fix Use 'from onemkl_sycl_sparse import sparse'.
conda install -c intel onemkl-sycl-sparse

Performs a sparse matrix-vector multiply using CSR format on a SYCL GPU device.

import dpctl
import dpctl.tensor as dpt
import numpy as np
from onemkl_sycl_sparse import sparse

# Create a SYCL queue with the "gpu" filter selector ("cpu" selects a CPU device)
queue = dpctl.SyclQueue("gpu")

# 3x3 matrix in CSR format:
#   [[1, 2, 0],
#    [0, 3, 0],
#    [0, 0, 4]]
row_ptr = np.array([0, 2, 3, 4], dtype=np.int64)  # nrows + 1 offsets into col/val
col = np.array([0, 1, 1, 2], dtype=np.int64)      # column of each nonzero, sorted per row
val = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float64)
nrows, ncols = 3, 3

# Copy host (NumPy) data into USM arrays allocated on the queue's device
d_row_ptr = dpt.asarray(row_ptr, sycl_queue=queue)
d_col = dpt.asarray(col, sycl_queue=queue)
d_val = dpt.asarray(val, sycl_queue=queue)

# Input vector x and output vector y, also in USM
x = np.array([1.0, 2.0, 3.0], dtype=np.float64)
d_x = dpt.asarray(x, sycl_queue=queue)
d_y = dpt.empty(nrows, dtype=np.float64, sycl_queue=queue)

# Create the handle and always release it, even if an operation fails
handle = sparse.create_handle(queue)
try:
    sparse_matrix = sparse.init_csr_matrix(handle, nrows, ncols, d_row_ptr, d_col, d_val)
    # Sparse matrix-vector multiply: y = 1.0 * A @ x + 0.0 * y
    # ASSUMPTION: the SpMV entry point is named gemv, mirroring oneMKL's C++
    # oneapi::mkl::sparse::gemv; the exact Python name and argument order are
    # unverified. (omatadd in oneMKL is sparse matrix addition, not SpMV.)
    sparse.gemv(handle, sparse_matrix, 1.0, d_x, 0.0, d_y)
    y = dpt.asnumpy(d_y)  # copy the result back to host memory
    print(y)  # mathematically, A @ x = [5, 6, 12]
finally:
    sparse.destroy_handle(handle)
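To check a device result on the host, a plain-Python CSR matrix-vector product gives reference values. A minimal sketch, using the same 3x3 matrix as the example (nonzeros 1, 2, 3, 4 at positions (0,0), (0,1), (1,1), (2,2)); the helper `csr_spmv` is hypothetical and not part of onemkl_sycl_sparse.

```python
def csr_spmv(row_ptr, col, val, x):
    """Compute y = A @ x on the host for a CSR matrix (reference check)."""
    nrows = len(row_ptr) - 1
    y = [0.0] * nrows
    for i in range(nrows):
        # nonzeros of row i live in val[row_ptr[i]:row_ptr[i + 1]]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * x[col[k]]
    return y


y_ref = csr_spmv([0, 2, 3, 4], [0, 1, 1, 2], [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
print(y_ref)  # prints [5.0, 6.0, 12.0]
```

When comparing against a device result, prefer a tolerance check such as numpy.allclose(y, y_ref) over exact equality, since device floating-point summation order may differ.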