Description
The current WebNN spec supports both asynchronous and synchronous execution modes. The synchronous mode, comprising the `MLContext.computeSync()`, `ML.createContextSync()` and `MLGraphBuilder.buildSync()` methods (available only in dedicated workers), was introduced for easy integration with ML frameworks written in C++ and compiled to Wasm; for example, the ONNXRuntime WebNN EP (Execution Provider) previously used sync execution.
The Chromium WebNN prototype supports both execution modes to gather implementation feedback. The Chrome team encouraged the WG to check whether the sync APIs are really necessary before launch.
Recently, the ONNXRuntime WebNN EP experimented with the async execution mode (onnxruntime#19145) and compared its performance against sync. For sync execution, ONNXRuntime runs the WebNN EP in a dedicated worker and calls the WebNN `computeSync()` method there; the JavaScript user code in the main thread communicates (via `postMessage`) with the WebNN EP in the worker thread through the ONNXRuntime Wasm proxy. For async execution, ONNXRuntime runs the WebNN EP in the main thread and calls the WebNN async `compute()` method through Asyncify.
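For contrast, a minimal sketch of the async path, assuming the promise-returning methods from the WebNN spec draft of that period (`createContext()`, `build()`, `compute()`); the multiply graph is illustrative, and the snippet degrades gracefully where WebNN is absent:

```javascript
// Async execution sketch: every step returns a promise, so a Wasm caller
// needs Asyncify (or, in the future, JSPI) to bridge the await points.
async function runAsyncInference(inputData) {
  if (typeof navigator === 'undefined' || !navigator.ml) {
    return 'WebNN not available in this environment';
  }
  const context = await navigator.ml.createContext({ deviceType: 'gpu' });
  const builder = new MLGraphBuilder(context);
  const desc = { dataType: 'float32', dimensions: [4] };
  const a = builder.input('a', desc);
  const b = builder.constant(desc, new Float32Array([2, 2, 2, 2]));
  const c = builder.mul(a, b);               // c = a * 2 (illustrative graph)
  const graph = await builder.build({ c });
  // compute() resolves with the result buffers instead of blocking.
  const result = await context.compute(
    graph, { a: inputData }, { c: new Float32Array(4) });
  return result.outputs.c;
}

runAsyncInference(new Float32Array([1, 2, 3, 4])).then(console.log);
```

Running this on the main thread removes the worker and the `postMessage` round-trip that the sync setup required, at the cost of an Asyncify suspend/resume per awaited call when invoked from Wasm.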
According to test results across 35 models (including CNNs and transformers), the difference in model inference time between the two execution modes is minimal. For GPU, async is actually slightly faster than sync (async / sync 103% on average), while for CPU, async is a bit slower than sync (async / sync 95% on average). This is because the WebNN EP on CPU currently supports fewer operators (referring to the implementation status). For each unsupported op, model inference falls back to running a Wasm op and then returns to compute the next WebNN sub-graph. The more ops fall back, the more async compute calls are made (and the more Asyncify overhead is incurred).
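The fallback pattern described above can be sketched as a partitioned execution loop. All names here (`runPartitionedModel`, the partition objects, the toy element-wise kernels) are hypothetical, standing in for the runtime's real graph partitioning:

```javascript
// Illustrative partitioned execution: unsupported ops run as synchronous Wasm
// kernels between WebNN sub-graphs. Each awaited compute crossing costs one
// Asyncify suspend/resume, so more fallbacks mean more async-call overhead.
async function runPartitionedModel(partitions, input) {
  let tensor = input;
  for (const part of partitions) {
    if (part.kind === 'webnn') {
      // One async boundary per WebNN sub-graph (stand-in for compute()).
      tensor = await part.computeSubgraph(tensor);
    } else {
      // Fallback op: runs synchronously in Wasm, no async crossing.
      tensor = part.runWasmKernel(tensor);
    }
  }
  return tensor;
}

// Toy usage: two WebNN sub-graphs split by one Wasm fallback op.
const partitions = [
  { kind: 'webnn', computeSubgraph: async (t) => t.map((x) => x + 1) },
  { kind: 'wasm', runWasmKernel: (t) => t.map((x) => x * 2) },
  { kind: 'webnn', computeSubgraph: async (t) => t.map((x) => x - 1) },
];
runPartitionedModel(partitions, [1, 2, 3]).then((r) => console.log(r));
```

Every op moved into the WebNN backend merges neighbouring sub-graphs, shrinking the number of entries in `partitions` and hence the number of async crossings, which matches the CPU result reported above.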
With more ops supported by the WebNN CPU/XNNPACK backend, fewer ops would fall back, which means less Asyncify overhead. And with JSPI (JavaScript Promise Integration) coming, the remaining overhead should shrink further. The performance of the async execution mode is expected to improve accordingly.
With onnxruntime#19145 merged, the ONNXRuntime WebNN EP now uses only the WebNN async execution mode and no longer uses sync execution.
Based on this implementation experience, the proposal for the WebNN spec is to remove support for sync execution. That would simplify both the spec and the implementation. Wasm ML frameworks can use the WebNN async methods via Asyncify today and migrate to JSPI once it is available.