Intel oneMKL SYCL BLAS

2026.0.0 verified Fri May 01 auth: no python

BLAS routines from the Intel oneAPI Math Kernel Library (oneMKL) for SYCL devices. The current version is 2026.0.0; releases follow the Intel oneAPI release cadence (usually annual).

pip install onemkl-sycl-blas
error ModuleNotFoundError: No module named 'onemkl_sycl_blas'
cause The module path changed in v2025.0.0, so the old `onemkl_sycl_blas` import no longer resolves.
fix
Run `pip install onemkl-sycl-blas`, then import with `from onemkl._blas import blas`.
error RuntimeError: Queue must be created with a SYCL device
cause The default queue was created on a system with no usable SYCL device (e.g., no GPU or no Level Zero runtime).
fix
Ensure a SYCL device is available (e.g., dpctl.lsplatform() lists at least one device) and pass a queue created on that device.
error ValueError: Input array is not contiguous in memory
cause The array is not contiguous in column-major (Fortran) order, for example because it is row-major or a strided view of another array.
fix
Use np.asfortranarray() or ensure order='F' when creating arrays.
breaking v2025.0.0 changed the internal module path from `onemkl_sycl_blas` to `onemkl._blas`. All imports using the old path will fail.
fix Replace `import onemkl_sycl_blas` with `from onemkl._blas import blas`.
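gotcha All BLAS routines expect column-major layout (Fortran order). Passing a contiguous row-major array raises no error but silently produces incorrect results, because the same buffer is read as a different matrix.
fix Convert arrays to column-major with `np.asfortranarray()` (or create them with `order='F'`), or account for the transposition explicitly when passing row-major data.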
gotcha The library does not support all BLAS operations (e.g., SPMV, TRSV). Check documentation before assuming availability.
fix Consult the Intel oneMKL documentation for supported routines.
deprecated v2024.x and earlier used `dpnp` arrays. `dpnp` is deprecated; use `numpy` arrays with `dpctl` queue instead.
fix Migrate from `dpnp` to `numpy` + `dpctl`.
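
The layout requirement in the entries above can be checked with NumPy alone before anything is handed to a BLAS routine. The following is a minimal sketch, not part of the library: `as_column_major` and `leading_dimension` are hypothetical helper names introduced here for illustration, and the only real calls used are `np.asfortranarray()` and the array `flags`.

```python
import numpy as np

def as_column_major(a, dtype=np.float64):
    """Hypothetical helper: return `a` as an F-contiguous array of `dtype`.

    np.asfortranarray copies only when the input is not already
    column-major with the requested dtype.
    """
    return np.asfortranarray(a, dtype=dtype)

def leading_dimension(a):
    """Hypothetical helper: leading dimension (in elements) of a
    2-D column-major array, i.e. its number of rows."""
    if a.ndim != 2 or not a.flags.f_contiguous:
        raise ValueError("expected a 2-D column-major (F-contiguous) array")
    return max(1, a.shape[0])

# NumPy creates row-major (C order) arrays by default.
row_major = np.arange(6, dtype=np.float64).reshape(2, 3)

# A strided slice is a view that is contiguous in neither order.
strided_view = np.arange(24, dtype=np.float64).reshape(4, 6)[::2, ::2]

fixed = as_column_major(row_major)
fixed_view = as_column_major(strided_view)

for name, arr in [("row_major", row_major), ("strided_view", strided_view),
                  ("fixed", fixed), ("fixed_view", fixed_view)]:
    print(name, "F-contiguous:", arr.flags.f_contiguous)
```

The converted arrays hold the same values as the originals; only the memory order changes, so `leading_dimension(fixed)` is the row count of the matrix.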

Basic GEMM operation using oneMKL SYCL BLAS. Note: matrices must be in column-major order.

import dpctl
import numpy as np
from onemkl._blas import blas, ColumnMajor

# Create a SYCL queue on the default-selected device (use dpctl.SyclQueue("gpu") to request a GPU explicitly)
queue = dpctl.SyclQueue()

# Prepare matrices in column-major (Fortran) order, as the routines require
m, n, k = 4, 4, 4
alpha = np.float64(1.0)
beta = np.float64(0.0)
A = np.asfortranarray(np.random.rand(m, k), dtype=np.float64)
B = np.asfortranarray(np.random.rand(k, n), dtype=np.float64)
C = np.zeros((m, n), dtype=np.float64, order='F')

# Compute C = alpha * A * B + beta * C (leading dimensions for column-major 'N', 'N': lda=m, ldb=k, ldc=m)
blas.gemm(queue, ColumnMajor, 'N', 'N', m, n, k, alpha, A, m, B, k, beta, C, m)
print(C)
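
A result like the one above can be sanity-checked against plain NumPy, and the same sketch shows why row-major inputs go wrong silently. This is an illustration under stated assumptions, not library code: `gemm_reference` and `read_column_major` are hypothetical helpers, and `read_column_major` only imitates the column-major indexing rule (element (i, j) at offset `i + j * ld`) that BLAS routines conventionally use.

```python
import numpy as np

def gemm_reference(alpha, A, B, beta, C):
    """Hypothetical helper: alpha * A @ B + beta * C evaluated by NumPy,
    which is independent of the arrays' memory layout."""
    return alpha * (A @ B) + beta * C

def read_column_major(array, rows, cols, ld):
    """Hypothetical helper: interpret the array's memory buffer as a
    column-major rows x cols matrix with leading dimension `ld`."""
    flat = array.ravel(order="K")  # elements in the order they sit in memory
    idx = np.arange(rows)[:, None] + ld * np.arange(cols)[None, :]
    return flat[idx]

m, n, k = 2, 3, 4
A = np.arange(m * k, dtype=np.float64).reshape(m, k)        # row-major
B = np.arange(k * n, dtype=np.float64).reshape(k, n) + 1.0  # row-major
C0 = np.zeros((m, n))

expected = gemm_reference(1.0, A, B, 0.0, C0)

# Column-major copies are read back as the intended matrices.
A_seen_ok = read_column_major(np.asfortranarray(A), m, k, ld=m)
B_seen_ok = read_column_major(np.asfortranarray(B), k, n, ld=k)

# Row-major buffers read with the same dimensions yield different matrices.
A_seen_bad = read_column_major(A, m, k, ld=m)
B_seen_bad = read_column_major(B, k, n, ld=k)
wrong = gemm_reference(1.0, A_seen_bad, B_seen_bad, 0.0, C0)

print("column-major inputs match:", np.array_equal(A_seen_ok, A))
print("row-major inputs match:   ", np.array_equal(A_seen_bad, A))
print("row-major product correct:", np.allclose(wrong, expected))
```

To check an in-place GEMM call, compute `gemm_reference` from a copy of `C` taken before the call, then compare the output with `np.allclose`.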