oneMKL SYCL LAPACK

2026.0.0 verified Fri May 01 auth: no python

Intel® oneAPI Math Kernel Library (oneMKL) SYCL LAPACK routines, providing dense linear algebra operations (such as matrix factorizations and linear solvers) optimized for Intel CPUs and GPUs via SYCL. Current version 2026.0.0. Released as part of the Intel oneAPI toolkit with quarterly updates.

pip install onemkl-sycl-lapack
error ModuleNotFoundError: No module named 'onemkl_sycl_lapack'
cause The LAPACK routines are not exposed as a top-level onemkl_sycl_lapack module; they live in a submodule of the onemkl package.
fix
Install the package with pip install onemkl-sycl-lapack, then import the routines from onemkl._onemkl_lapack.
error TypeError: getrf() missing 1 required positional argument: 'queue'
cause The SYCL queue is now a mandatory first argument (breaking change in 2026.0).
fix
Pass a valid dpctl.SyclQueue as the first argument: getrf(queue, m, n, a, lda, pivots).
error dpctl.tensor.usm_ndarray does not have attribute 'shape'
cause NumPy and dpctl tensor operations were mixed, so the object in hand is not the array type the code expects. A genuine dpctl.tensor.usm_ndarray does expose shape.
fix
Read a.shape directly on the dpctl tensor, or convert it to a NumPy array first with dpctl.tensor.asnumpy(a).
breaking All LAPACK routines now require a SYCL queue as the first argument. In older versions, the queue was optional or implicit.
fix Update calls: `getrf(m, n, a, lda, pivots)` -> `getrf(queue, m, n, a, lda, pivots)`
deprecated Direct import from `onemkl_sycl_lapack` is deprecated. Use `onemkl._onemkl_lapack` or a higher-level wrapper.
fix Change imports to `from onemkl._onemkl_lapack import ...`
gotcha Input arrays must be device-accessible (`dpctl.tensor` arrays), not NumPy arrays in host memory. Passing host memory may cause silent errors or segmentation faults.
fix Use `dpctl.tensor.from_numpy(numpy_array, sycl_queue=queue)` to transfer data to the device. Note that the keyword is `sycl_queue`, not `queue`.
gotcha The LAPACK routines modify their input arrays in place. Copy the data first if the original is needed later.
fix Call `dpctl.tensor.copy(input)` and pass the copy to the routine.
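
The host-memory gotcha above can be caught before a routine is ever called. The following is a minimal sketch, not part of the package: `require_usm_array` is a hypothetical helper that checks for the `__sycl_usm_array_interface__` attribute, which dpctl USM-backed arrays expose and plain host containers do not.

```python
def require_usm_array(obj, name="array"):
    """Raise TypeError unless obj exposes the SYCL USM array interface.

    Hypothetical helper for illustration; it only inspects the attribute
    and does not verify which device or queue the memory belongs to.
    """
    if not hasattr(obj, "__sycl_usm_array_interface__"):
        raise TypeError(
            f"{name} must be a USM-backed device array, got {type(obj).__name__}"
        )
    return obj


# A plain Python list (like a host NumPy array) is rejected up front
# instead of being handed to a device routine.
try:
    require_usm_array([[1.0, 2.0], [3.0, 4.0]], name="a")
except TypeError as exc:
    message = str(exc)
    print(message)
```

Calling a guard like this on each array argument turns a possible segmentation fault into an ordinary Python exception with a readable message.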

Basic LU factorization using oneMKL SYCL LAPACK, with dpctl handling device memory and queue management.

import dpctl
import dpctl.tensor as dpt
import numpy as np
from onemkl._onemkl_lapack import getrf

# Create a SYCL queue on the default-selected device
# (use dpctl.SyclQueue("gpu") to request a GPU explicitly)
queue = dpctl.SyclQueue()

# Build the matrix on the host, then copy it into device (USM) memory
a = np.array([[1., 2.], [3., 4.]], dtype=np.float64)
a_dev = dpt.from_numpy(a, sycl_queue=queue)

# Perform LU factorization (getrf)
# Note: a_dev is overwritten in place with the L and U factors
m = a_dev.shape[0]
n = a_dev.shape[1]
# getrf produces min(m, n) pivot indices
pivots = dpt.empty(min(m, n), dtype=np.int64, sycl_queue=queue)

# The function signature: getrf(queue, m, n, a, lda, pivots)
# lda is the leading dimension of a; here it equals m
getrf(queue, m, n, a_dev, m, pivots)

print("LU factorization completed on device.")
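
To make the roles of `m`, `n`, `lda` and `pivots` concrete, here is a pure-Python reference sketch of LU factorization with partial pivoting. It is not oneMKL's implementation, and `getrf_reference` is a hypothetical name. It assumes the conventions of reference LAPACK, which this page does not confirm for the Python binding: column-major storage where element (i, j) sits at `a[i + j * lda]`, 1-based pivot indices, and `min(m, n)` pivots.

```python
def getrf_reference(m, n, a, lda):
    """LU with partial pivoting on a flat column-major buffer (illustration only).

    The buffer is overwritten in place: L (unit diagonal, stored below the
    diagonal) and U (on and above the diagonal). Returns 1-based pivot rows.
    """
    if lda < max(1, m):
        raise ValueError("lda must be at least max(1, m)")
    ipiv = []
    for k in range(min(m, n)):
        # Partial pivoting: pick the row with the largest magnitude in column k.
        p = max(range(k, m), key=lambda i: abs(a[i + k * lda]))
        ipiv.append(p + 1)
        if a[p + k * lda] == 0.0:
            continue  # column is entirely zero below row k: nothing to eliminate
        if p != k:
            # Swap rows k and p across all n columns.
            for j in range(n):
                a[k + j * lda], a[p + j * lda] = a[p + j * lda], a[k + j * lda]
        for i in range(k + 1, m):
            a[i + k * lda] /= a[k + k * lda]  # multiplier, stored as part of L
            for j in range(k + 1, n):
                a[i + j * lda] -= a[i + k * lda] * a[k + j * lda]
    return ipiv


# The matrix [[1, 2], [3, 4]] stored column by column.
a = [1.0, 3.0, 2.0, 4.0]
ipiv = getrf_reference(2, 2, a, 2)
print(ipiv)  # [2, 2]: row 2 was chosen as the pivot for the first column
```

If the binding does follow the column-major convention, a default (row-major) NumPy array transferred as-is would be read as its transpose, so it is worth checking the memory layout before relying on the factors.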