
Drop the support of synchronous execution #531

Closed
huningxin opened this issue Jan 26, 2024 · 4 comments · Fixed by #548

Comments

@huningxin
Contributor

The current WebNN spec supports both asynchronous and synchronous execution modes. In particular, the synchronous execution mode, comprising the MLContext.computeSync(), ML.createContextSync() and MLGraphBuilder.buildSync() methods (available only in dedicated workers), was introduced for easy integration with ML frameworks written in C++ and compiled to Wasm; for example, the ONNXRuntime WebNN EP (Execution Provider) previously used sync execution.
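For reference, a minimal sketch of the two modes, assuming the 2024-era API shape (`dataType`/`dimensions` descriptor fields); the graph and values are illustrative:

```js
// Async mode: every step returns a promise; usable on the main thread.
const context = await navigator.ml.createContext({ deviceType: 'gpu' });
const builder = new MLGraphBuilder(context);
const a = builder.input('a', { dataType: 'float32', dimensions: [2, 2] });
const b = builder.input('b', { dataType: 'float32', dimensions: [2, 2] });
const graph = await builder.build({ c: builder.add(a, b) });
const inputs = { a: new Float32Array(4).fill(1), b: new Float32Array(4).fill(2) };
const results = await context.compute(graph, inputs, { c: new Float32Array(4) });
console.log(results.outputs.c); // Float32Array [3, 3, 3, 3]

// Sync counterparts (dedicated workers only; the subject of this proposal):
// ML.createContextSync(), MLGraphBuilder.buildSync(), MLContext.computeSync().
```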

The Chromium WebNN prototype supports both execution modes to gather implementation feedback. The Chrome team encouraged the WG to check whether the sync APIs are really necessary before launch.

Recently, the ONNXRuntime WebNN EP experimented (onnxruntime#19145) with the async execution mode and compared its performance against sync. For sync execution, ONNXRuntime runs the WebNN EP in a dedicated worker and calls the WebNN computeSync() method there; the JavaScript user code in the main thread communicates (via postMessage) with the WebNN EP in the worker thread through the ONNXRuntime Wasm proxy. For async execution, ONNXRuntime runs the WebNN EP in the main thread and calls the WebNN async compute() method through asyncify.
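Roughly, the sync integration pattern looks like the sketch below. The file name and message shapes are hypothetical, and `context` and `graph` are assumed to be created inside the worker with the sync methods:

```js
// webnn-worker.js — sync WebNN calls are only exposed in dedicated workers.
self.onmessage = (e) => {
  const outputs = { c: new Float32Array(4) };
  context.computeSync(graph, e.data, outputs); // blocks this worker thread
  self.postMessage(outputs);
};

// main.js — user code proxies each inference request to the worker.
const worker = new Worker('webnn-worker.js');
function runInWorker(inputs) {
  return new Promise((resolve) => {
    worker.onmessage = (e) => resolve(e.data);
    worker.postMessage(inputs);
  });
}
const outputs = await runInWorker({ a: new Float32Array(4), b: new Float32Array(4) });
```

In the async pattern, the framework instead calls `await context.compute(...)` directly on the main thread, and asyncify lets the compiled C++ code suspend across that await.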

According to the test results across 35 models (including CNNs and transformers), the difference in model inference time between the two execution modes is minimal. For GPU, async is actually slightly faster than sync (async / sync 103% on average), while for CPU, async is a bit slower than sync (async / sync 95% on average). That's because the WebNN EP on CPU currently supports fewer operators (see the implementation status). For each unsupported op, model inference falls back to running the Wasm op and then returns to compute the next WebNN sub-graph. The more ops fall back, the more async compute() calls there are (and the more asyncify overhead).

With more ops supported by the WebNN CPU/XNNPACK backend, fewer ops would fall back, which means less asyncify overhead. And with JSPI (JavaScript Promise Integration) coming, the asyncify overhead should shrink even further. The performance of the async execution mode is expected to improve.

With onnxruntime#19145 merged, the ONNXRuntime WebNN EP now uses only the WebNN async execution mode and no longer uses sync execution.

Based on this implementation experience, the proposal for the WebNN spec is to remove support for sync execution. That would simplify the spec as well as the implementation. Wasm ML frameworks can use the WebNN async methods via asyncify today and migrate to JSPI once it is available.
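For illustration, a migration sketch using the JSPI wrappers (`WebAssembly.Suspending` / `WebAssembly.promising`, per the JSPI proposal; the module bytes and the `webnn_compute` / `run_inference` names are hypothetical, and `context`, `graph`, `inputs`, `outputs` are assumed to be set up as above):

```js
// The Wasm module imports env.webnn_compute and calls it as an ordinary
// synchronous function; Suspending() parks the Wasm stack until the
// returned promise settles, without blocking the thread.
const imports = {
  env: {
    webnn_compute: new WebAssembly.Suspending(async () => {
      const result = await context.compute(graph, inputs, outputs);
      // ...copy result.outputs back into Wasm linear memory here...
    }),
  },
};
const { instance } = await WebAssembly.instantiate(wasmBytes, imports);

// promising() wraps a Wasm export so a suspended call surfaces as a Promise.
const runInference = WebAssembly.promising(instance.exports.run_inference);
await runInference();
```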

@a-sully
Contributor

a-sully commented Jan 26, 2024

Big +1 to this proposal from the Chrome team. Thank you for the detailed exploration!

@anssiko
Member

anssiko commented Jan 26, 2024

Thank you @huningxin, please feel free to proceed with a PR.

For context, this issue was discussed in https://www.w3.org/2024/01/25-webmachinelearning-minutes.html#t05

PR #532 awaits the landing of the PR that addresses this issue.

@wacky6

wacky6 commented Jan 29, 2024

@huningxin Is there a detailed benchmark result that can be shared?

I think 103% / 95% on average is good news to hear, but I wonder whether the performance delta is consistent across all models, or whether there's a certain grouping (i.e. the distribution of the numbers).

@huningxin
Contributor Author

@wacky6

Is there a detailed benchmark result that can be shared?

I added the details at onnxruntime/pull/19145. The updated results include more models. The average async / sync on webnn-cpu becomes 93.45%, while webnn-gpu stays at 103.84%. The newly added models have ops fallback on webnn-cpu, which decreases the CPU number a bit.

huningxin added a commit to huningxin/webnn that referenced this issue Feb 1, 2024
Remove the definition and algorithm steps for
- ML.createContextSync()
- MLGraphBuilder.buildSync()
- MLContext.computeSync()

Fix webmachinelearning#531
huningxin added a commit to huningxin/webnn that referenced this issue Feb 15, 2024
Remove the definition and algorithm steps for
- ML.createContextSync()
- MLGraphBuilder.buildSync()
- MLContext.computeSync()

Fix webmachinelearning#531
anssiko pushed a commit that referenced this issue Feb 15, 2024
* Remove the definition and algorithm steps for
- ML.createContextSync()
- MLGraphBuilder.buildSync()
- MLContext.computeSync()

* Use [=reject=] |promise| with a {{TypeError}}
* Abort after rejecting promise in parallel steps

Fix #531