Description
The current WebNN spec supports both asynchronous and synchronous execution modes. The synchronous mode, comprising the `MLContext.computeSync()`, `ML.createContextSync()` and `MLGraphBuilder.buildSync()` methods (available only in dedicated workers), was introduced for easy integration with ML frameworks written in C++ and compiled to Wasm; for example, the ONNXRuntime WebNN EP (Execution Provider) previously used sync execution.
The Chromium WebNN prototype supports both execution modes to gather implementation feedback. The Chrome team encouraged the WG to check whether the sync APIs are really necessary before launch.
Recently, the ONNXRuntime WebNN EP experimented with the async execution mode (onnxruntime#19145) and compared its performance against sync. For sync execution, ONNXRuntime runs the WebNN EP in a dedicated worker and calls the WebNN `computeSync()` method there; the JavaScript user code in the main thread communicates (via `postMessage`) with the WebNN EP in the worker thread through the ONNXRuntime Wasm proxy. For async execution, ONNXRuntime runs the WebNN EP in the main thread and calls the WebNN async `compute()` method through Asyncify.
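For contrast, a minimal sketch of the async path, assuming the promise-returning methods from the WebNN spec draft of that period (`createContext()`, `build()`, `compute()`); the multiply graph is illustrative, and the snippet degrades gracefully where WebNN is absent:

```javascript
// Async execution sketch: every step returns a promise, so a Wasm caller
// needs Asyncify (or, in the future, JSPI) to bridge the await points.
async function runAsyncInference(inputData) {
  if (typeof navigator === 'undefined' || !navigator.ml) {
    return 'WebNN not available in this environment';
  }
  const context = await navigator.ml.createContext({ deviceType: 'gpu' });
  const builder = new MLGraphBuilder(context);
  const desc = { dataType: 'float32', dimensions: [4] };
  const a = builder.input('a', desc);
  const b = builder.constant(desc, new Float32Array([2, 2, 2, 2]));
  const c = builder.mul(a, b);               // c = a * 2 (illustrative graph)
  const graph = await builder.build({ c });
  // compute() resolves with the result buffers instead of blocking.
  const result = await context.compute(
    graph, { a: inputData }, { c: new Float32Array(4) });
  return result.outputs.c;
}

runAsyncInference(new Float32Array([1, 2, 3, 4])).then(console.log);
```

Running this on the main thread removes the worker and the `postMessage` round-trip that the sync setup required, at the cost of an Asyncify suspend/resume per awaited call when invoked from Wasm.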
According to test results across 35 models (including CNNs and transformers), the difference in model inference time between the two execution modes is minimal. For GPU, async is actually slightly faster than sync (async / sync 103% on average), while for CPU, async is a bit slower than sync (async / sync 95% on average). This is because the WebNN EP on CPU currently supports fewer operators (referring to the implementation status). For each unsupported op, model inference falls back to running a Wasm op and then returns to compute the next WebNN sub-graph. The more ops fall back, the more async compute calls are made (and the more Asyncify overhead is incurred).
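The fallback pattern described above can be sketched as a partitioned execution loop. All names here (`runPartitionedModel`, the partition objects, the toy element-wise kernels) are hypothetical, standing in for the runtime's real graph partitioning:

```javascript
// Illustrative partitioned execution: unsupported ops run as synchronous Wasm
// kernels between WebNN sub-graphs. Each awaited compute crossing costs one
// Asyncify suspend/resume, so more fallbacks mean more async-call overhead.
async function runPartitionedModel(partitions, input) {
  let tensor = input;
  for (const part of partitions) {
    if (part.kind === 'webnn') {
      // One async boundary per WebNN sub-graph (stand-in for compute()).
      tensor = await part.computeSubgraph(tensor);
    } else {
      // Fallback op: runs synchronously in Wasm, no async crossing.
      tensor = part.runWasmKernel(tensor);
    }
  }
  return tensor;
}

// Toy usage: two WebNN sub-graphs split by one Wasm fallback op.
const partitions = [
  { kind: 'webnn', computeSubgraph: async (t) => t.map((x) => x + 1) },
  { kind: 'wasm', runWasmKernel: (t) => t.map((x) => x * 2) },
  { kind: 'webnn', computeSubgraph: async (t) => t.map((x) => x - 1) },
];
runPartitionedModel(partitions, [1, 2, 3]).then((r) => console.log(r));
```

Every op moved into the WebNN backend merges neighbouring sub-graphs, shrinking the number of entries in `partitions` and hence the number of async crossings, which matches the CPU result reported above.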
With more ops supported by the WebNN CPU/XNNPACK backend, fewer ops would fall back, which means less Asyncify overhead. And with JSPI (JavaScript Promise Integration) coming, the remaining overhead should shrink further. The performance of the async execution mode is expected to improve accordingly.
With onnxruntime#19145 merged, the ONNXRuntime WebNN EP now uses only the WebNN async execution mode and no longer uses sync execution.
Based on this implementation experience, the proposal for the WebNN spec is to remove support for sync execution. That would simplify both the spec and the implementation. Wasm ML frameworks can use the WebNN async methods via Asyncify today and migrate to JSPI once it is available.