Intel oneMKL SYCL BLAS
2026.0.0 · verified Fri May 01 · auth: no · python
BLAS routines from the Intel oneAPI Math Kernel Library (oneMKL), implemented for SYCL devices. Current version: 2026.0.0. Releases follow the Intel oneAPI release cadence (usually annual).
pip install onemkl-sycl-blas
Common errors
error ModuleNotFoundError: No module named 'onemkl_sycl_blas'
cause The module path changed in v2025.0.0, so the old `onemkl_sycl_blas` import no longer resolves.
fix Run `pip install onemkl-sycl-blas` and use `from onemkl._blas import blas`.
error RuntimeError: Queue must be created with a SYCL device
cause Using a default queue without a SYCL device (e.g., no GPU or Level Zero runtime).
fix Ensure a SYCL device is available (e.g., `dpctl.lsplatform()` shows devices) and pass a valid queue.
error ValueError: Input array is not contiguous in memory
cause The array is not stored contiguously in column-major order (for example, it is a row-major array or a strided view).
fix Use `np.asfortranarray()` or pass `order='F'` when creating arrays.
Warnings
breaking v2025.0.0 changed the internal module path from `onemkl_sycl_blas` to `onemkl._blas`. All imports using the old path will fail.
fix Replace `import onemkl_sycl_blas` with `from onemkl._blas import blas`.
gotcha All BLAS routines expect column-major layout (Fortran order). Passing row-major arrays will produce incorrect results without error.
fix Ensure arrays are column-major (use `np.asfortranarray()`) or transpose correctly.
gotcha The library does not support all BLAS operations (e.g., SPMV, TRSV). Check documentation before assuming availability.
fix Consult the Intel oneMKL documentation for supported routines.
deprecated v2024.x and earlier used `dpnp` arrays. `dpnp` is deprecated; use `numpy` arrays with `dpctl` queue instead.
fix Migrate from `dpnp` to `numpy` + `dpctl`.
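The column-major gotcha above can be illustrated with NumPy alone. The sketch below is not part of the library. It only shows how to check an array's layout, how `np.asfortranarray()` changes the layout without changing the values, and what a column-major routine would effectively read if it were handed a row-major buffer. The variable names are illustrative.

```python
import numpy as np

# A small 2x3 matrix in NumPy's default row-major (C) order.
a_c = np.arange(6, dtype=np.float64).reshape(2, 3)

# Same values, stored column by column (Fortran order).
a_f = np.asfortranarray(a_c)

print("row-major array is F-contiguous:", a_c.flags["F_CONTIGUOUS"])
print("converted array is F-contiguous:", a_f.flags["F_CONTIGUOUS"])
print("values unchanged:", np.array_equal(a_c, a_f))

# Reinterpret the row-major buffer as if it were column-major.
# This mimics what a column-major routine sees when given the wrong layout:
# the same memory, but a different matrix.
misread = a_c.ravel(order="C").reshape(a_c.shape, order="F")
print("matrix as intended:\n", a_c)
print("matrix as misread:\n", misread)
```

Checking `flags["F_CONTIGUOUS"]` before a call is a cheap way to catch a layout mistake that would otherwise pass silently.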
Imports
- blas
  wrong: `from onemkl_sycl_blas import blas`
  correct: `from onemkl._blas import blas`
- ColumnMajor, RowMajor
  wrong: `from onemkl_sycl_blas.enums import ColumnMajor`
  correct: `from onemkl._blas import ColumnMajor, RowMajor`
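For code that has to run against both the old and the new module path, one option is to try the paths in order. The following is a minimal sketch under that assumption. `import_first_available` and `BLAS_MODULE_CANDIDATES` are hypothetical names introduced here, and the two module paths are simply the ones listed above, not independently verified.

```python
import importlib


def import_first_available(candidates):
    """Return (name, module) for the first module name that can be imported."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate path
    raise ImportError(f"none of these modules could be imported: {candidates}")


# Module paths as listed in this page, newest first (assumed, not verified here).
BLAS_MODULE_CANDIDATES = ["onemkl._blas", "onemkl_sycl_blas"]

try:
    found_name, blas_module = import_first_available(BLAS_MODULE_CANDIDATES)
    # Both import forms above expose a `blas` attribute on the module.
    blas = getattr(blas_module, "blas", None)
    print(f"using {found_name}")
except ImportError as exc:
    blas = None
    print(f"BLAS bindings not found: {exc}")
```

A fallback like this is mainly useful during migration. Once every environment is on v2025.0.0 or later, the plain `from onemkl._blas import blas` form is simpler.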
Quickstart
import dpctl
import numpy as np
from onemkl._blas import blas, ColumnMajor
# Create a SYCL queue on the default-selected device (not necessarily a GPU; dpctl.SyclQueue("gpu") requests one)
queue = dpctl.SyclQueue()
# Prepare matrices in column-major (Fortran) order, as the BLAS routines require
m, n, k = 4, 4, 4
alpha = np.float64(1.0)
beta = np.float64(0.0)
A = np.asfortranarray(np.random.rand(m, k))  # rand() already returns float64
B = np.asfortranarray(np.random.rand(k, n))
C = np.zeros((m, n), dtype=np.float64, order='F')
# Compute C = alpha * A * B + beta * C
blas.gemm(queue, ColumnMajor, 'N', 'N', m, n, k, alpha, A, m, B, k, beta, C, m)
print(C)
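To make the argument list of the `gemm` call easier to read, here is a pure-Python reference for the same operation, C = alpha * A * B + beta * C, on flat column-major buffers with no transposes. It is a sketch for understanding and for checking small results, not the library's implementation. `gemm_reference` is a hypothetical helper, and its parameters only mirror the order `m, n, k, alpha, A, lda, B, ldb, beta, C, ldc` used in the Quickstart. In column-major storage, element (i, j) of a matrix with leading dimension `ld` lives at index `i + j * ld`.

```python
def gemm_reference(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Update c in place with alpha*A*B + beta*C on column-major flat buffers."""
    for j in range(n):          # column of C
        for i in range(m):      # row of C
            acc = 0.0
            for p in range(k):  # inner dimension
                acc += a[i + p * lda] * b[p + j * ldb]
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc]
    return c


# A is 2x3 and B is 3x2, each stored column by column.
m, n, k = 2, 2, 3
a = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]     # A = [[1, 2, 3], [4, 5, 6]]
b = [7.0, 9.0, 11.0, 8.0, 10.0, 12.0]  # B = [[7, 8], [9, 10], [11, 12]]
c = [0.0] * (m * n)

# Leading dimensions equal the row counts: lda = m, ldb = k, ldc = m.
gemm_reference(m, n, k, 1.0, a, m, b, k, 0.0, c, m)
print(c)  # column-major C: [58.0, 139.0, 64.0, 154.0]
```

For NumPy arrays, `X.ravel(order='F')` yields the flat column-major buffer, so a small Quickstart result can be compared against this reference element by element.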