cuda-wasm: CUDA to WebAssembly/WebGPU Transpiler

1.1.1 verified Fri May 01 auth: no javascript

cuda-wasm (v1.1.1) is a transpiler that converts CUDA C++ code to WebAssembly and WebGPU, so GPU-accelerated code can run in browsers and Node.js without NVIDIA hardware. It is a clean-room implementation based on public CUDA specifications and translates CUDA syntax to Rust/WebGPU patterns. The package ships with TypeScript types and supports Node >=16.0.0. Key differentiators: no NVIDIA dependencies, near-native performance via WebGPU, and ruv-FANN neural network integration. Released under MIT/Apache-2.0 and under active development.

error TypeError: WebGPU not supported in this browser
cause Browser lacks WebGPU support (e.g., Safari, older Chrome).
fix Use a browser with WebGPU enabled (Chrome 113+, Edge 113+, Firefox Nightly).
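One way to avoid hitting this error at kernel launch is to feature-detect WebGPU first. The sketch below uses only the standard `navigator.gpu` and `requestAdapter()` browser APIs; the helper names `hasWebGPUApi` and `getAdapterOrNull` are hypothetical and not part of cuda-wasm.

```javascript
// Hypothetical helpers (not part of cuda-wasm): detect WebGPU before use.
// `nav` defaults to the global navigator; passing it in keeps the check testable.
function hasWebGPUApi(nav = globalThis.navigator) {
  // navigator.gpu is undefined in browsers that do not expose WebGPU.
  return Boolean(nav && nav.gpu);
}

async function getAdapterOrNull(nav = globalThis.navigator) {
  if (!hasWebGPUApi(nav)) return null;
  try {
    // requestAdapter() resolves to null when no suitable GPU adapter exists.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ?? null;
  } catch {
    return null;
  }
}
```

If `getAdapterOrNull()` resolves to `null`, the page can show a fallback message instead of attempting to run a kernel.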
error Error: CUDA_RUNTIME_API_NOT_SUPPORTED
cause The CUDA source calls runtime API functions (e.g., cudaMalloc), which are not transpiled.
fix Remove runtime API calls from the source; allocate buffers on the host side and pass them to the kernel as arguments.
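As a rough illustration of allocating memory outside the kernel, the sketch below builds the argument list for the `vectorAdd` kernel from the quickstart using plain typed arrays in place of `cudaMalloc`/`cudaMemcpy`. The helper `makeVectorAddArgs` is hypothetical, and the argument order simply mirrors that kernel's signature.

```javascript
// Hypothetical helper (not part of cuda-wasm): host-side allocation that
// stands in for cudaMalloc/cudaMemcpy. Buffers are created in JavaScript and
// passed to the kernel as arguments, in the order (A, B, C, N).
function makeVectorAddArgs(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('input vectors must have the same length');
  }
  const n = a.length;
  return [
    Float32Array.from(a),  // A: input, copied into a typed array
    Float32Array.from(b),  // B: input, copied into a typed array
    new Float32Array(n),   // C: zero-initialised output buffer
    n,                     // N: element count
  ];
}
```

The returned array can be passed as the `args` option shown in the quickstart.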
breaking CUDA runtime API and driver API are NOT supported. Only kernel execution is transpiled.
fix Use transpileCUDA and runKernel for kernels; do not call CUDA runtime or driver API functions.
breaking Not all CUDA features are implemented. See compatibility table.
fix Consult the documentation for supported features; avoid dynamic parallelism and PTX.
gotcha WebGPU requires HTTPS or localhost. Using file:// or non-secure origins will fail.
fix Serve your app over HTTPS or use localhost; do not open HTML directly from filesystem.
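The origin rule above can be checked up front. The sketch below mirrors the rule as stated in this entry (HTTPS or localhost) using the standard `URL` class; `isAllowedOrigin` is a hypothetical helper, and treating `127.0.0.1` like `localhost` is an assumption. In a browser, the built-in `isSecureContext` global is the authoritative answer.

```javascript
// Hypothetical helper (not part of cuda-wasm): apply the "HTTPS or localhost"
// rule to a page URL before attempting to use WebGPU.
function isAllowedOrigin(href) {
  const { protocol, hostname } = new URL(href);
  if (protocol === 'https:') return true;
  // Assumption: loopback addresses are treated the same as "localhost".
  const isLoopback = hostname === 'localhost' || hostname === '127.0.0.1';
  return protocol === 'http:' && isLoopback;
}
```

For example, `isAllowedOrigin('file:///home/user/index.html')` returns `false`, which matches the failure described above.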
deprecated The separate @cuda-wasm/web package is deprecated; it has been merged into the main package.
fix Use cuda-wasm directly; no need for additional packages.
npm install cuda-wasm
yarn add cuda-wasm
pnpm add cuda-wasm

Transpile a CUDA kernel and run it on WebGPU, demonstrating vector addition.

import { transpileCUDA } from 'cuda-wasm/transpile';
import { runKernel } from 'cuda-wasm/runtime';

const cudaCode = `
__global__ void vectorAdd(const float* A, const float* B, float* C, int N) {
  int i = threadIdx.x + blockIdx.x * blockDim.x;
  if (i < N) C[i] = A[i] + B[i];
}
`;

async function main() {
  const wasmModule = await transpileCUDA(cudaCode);
  const result = await runKernel(wasmModule, 'vectorAdd', {
    args: [new Float32Array([1,2,3]), new Float32Array([4,5,6]), new Float32Array(3), 3],
    grid: [1, 1, 1],
    block: [3, 1, 1]
  });
  console.log(result); // Float32Array [5, 7, 9]
}
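// Surface transpile or launch failures instead of leaving the rejection unhandled.
main().catch(console.error);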
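The example above launches a single block of three threads. For larger inputs, the usual CUDA convention is to pick a fixed block size and round the block count up so every element is covered, which is why the kernel guards with `if (i < N)`. The sketch below shows that arithmetic plus a CPU reference for checking results; `launchDims` and `vectorAddReference` are hypothetical helpers, and the default block size of 256 is an assumption, not a cuda-wasm requirement.

```javascript
// Hypothetical helpers (not part of cuda-wasm).

// Compute 1-D launch dimensions: ceil(n / blockSize) blocks of blockSize threads.
function launchDims(n, blockSize = 256) {
  if (!Number.isInteger(n) || n < 0) throw new RangeError('n must be a non-negative integer');
  if (!Number.isInteger(blockSize) || blockSize < 1) throw new RangeError('blockSize must be a positive integer');
  // At least one block, so an empty input still yields a valid launch shape.
  const blocks = Math.max(1, Math.ceil(n / blockSize));
  return { grid: [blocks, 1, 1], block: [blockSize, 1, 1] };
}

// CPU reference for vectorAdd, useful for validating kernel output.
function vectorAddReference(a, b) {
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) out[i] = a[i] + b[i];
  return out;
}
```

With `n = 1000` and the default block size, `launchDims` yields 4 blocks of 256 threads; the last 24 threads fall outside the array and are skipped by the bounds check in the kernel.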