
Explainer for text conversion stuff #1409



Open: wants to merge 4 commits into main


morlovich
Collaborator

Relevant to #961

Similarly, a `Uint8Array` containing UTF-8 data can be converted to a `String` by calling
`protectedAudience.decodeUtf8(someArray)`. Note that this works only on `Uint8Array`s and
will not handle other similar types.
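Because `decodeUtf8()` only accepts `Uint8Array`s, a caller holding an `ArrayBuffer`, a `DataView`, or another typed array may want to wrap the bytes first. A minimal sketch of such a helper, assuming nothing beyond standard JavaScript; `asUint8Array` is a hypothetical name, not part of the Protected Audience API, and it creates a view over the same bytes rather than copying them:

```javascript
// Hypothetical helper, not part of the Protected Audience API.
// Returns a Uint8Array view over the same bytes, so the result can be passed
// to protectedAudience.decodeUtf8(), which only accepts Uint8Arrays.
function asUint8Array(data) {
  if (data instanceof Uint8Array) {
    return data; // Already the right type; no wrapping needed.
  }
  if (data instanceof ArrayBuffer) {
    return new Uint8Array(data); // View over the whole buffer.
  }
  if (ArrayBuffer.isView(data)) {
    // Other typed arrays (Uint16Array, Uint8ClampedArray, ...) and DataView:
    // view exactly the bytes they cover, without copying.
    return new Uint8Array(data.buffer, data.byteOffset, data.byteLength);
  }
  throw new TypeError("asUint8Array: expected an ArrayBuffer or ArrayBuffer view");
}

// Usage inside a bidder or seller script, where protectedAudience exists:
//   const text = protectedAudience.decodeUtf8(asUint8Array(someDataView));
```

Since the result shares memory with its input, later writes to the underlying buffer are visible through the returned view.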

Collaborator


Suggested change
These utility functions are useful for passing `String`s into and out of WebAssembly functions, where `String`s must pass through the [`WebAssembly.Memory` `ArrayBuffer`](https://developer.mozilla.org/en-US/docs/WebAssembly/Reference/JavaScript_interface/Memory).
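For the Wasm memory case, a minimal sketch of writing a string into a `WebAssembly.Memory` and reading it back might look like the following. It assumes `protectedAudience.encodeUtf8()` returns a `Uint8Array` of UTF-8 bytes. The names `utf8Codec`, `writeString`, and `readString` are hypothetical, the byte offset would normally come from the Wasm module's own allocator, and the `TextEncoder`/`TextDecoder` fallback exists only so the sketch also runs outside a worklet.

```javascript
// Pick a UTF-8 codec: protectedAudience in bidder/seller/reporting scripts,
// the standard classes elsewhere (e.g. when testing this sketch).
function utf8Codec() {
  if (globalThis.protectedAudience &&
      protectedAudience.encodeUtf8 && protectedAudience.decodeUtf8) {
    return {
      // Assumption: encodeUtf8 returns a Uint8Array of UTF-8 bytes.
      encode: (s) => protectedAudience.encodeUtf8(s),
      decode: (bytes) => protectedAudience.decodeUtf8(bytes),
    };
  }
  const enc = new TextEncoder();
  const dec = new TextDecoder();
  return { encode: (s) => enc.encode(s), decode: (b) => dec.decode(b) };
}

// Copy `str` into `memory` at byte offset `ptr`; returns the byte length.
// In real code `ptr` would come from the module's allocator (hypothetical here).
function writeString(memory, ptr, str, codec) {
  const bytes = codec.encode(str);
  if (ptr + bytes.length > memory.buffer.byteLength) {
    throw new RangeError("string does not fit in memory");
  }
  new Uint8Array(memory.buffer, ptr, bytes.length).set(bytes);
  return bytes.length;
}

// Read `len` bytes at `ptr` back into a string. decodeUtf8 only accepts
// Uint8Arrays, so the bytes are viewed as one (no copy).
function readString(memory, ptr, len, codec) {
  return codec.decode(new Uint8Array(memory.buffer, ptr, len));
}
```

The length is returned separately because UTF-8 byte counts differ from JavaScript string lengths whenever non-ASCII characters are present.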

Collaborator Author


Did only the first part since I don't want to tell people with more WebAssembly experience than me how to interface with it...

Collaborator


I think my previous suggested text is reasonable. We also probably want to add something like:
Utilities like wasm-bindgen are often used to generate WebAssembly function bindings automatically, and they may rely on Web classes like `TextEncoder` and `TextDecoder` to perform these conversions. We can supply implementations of these classes using `encodeUtf8()` and `decodeUtf8()` like so:

```javascript
class TextEncoder { encode(s) { return protectedAudience.encodeUtf8(s); } }
class TextDecoder { decode(bytes) { return protectedAudience.decodeUtf8(bytes); } }
```
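A slightly more defensive variant of the two classes above adds feature detection and tolerates a few standard usages: glue code written for the real classes may pass an encoding label and options, or call `decode()` with no argument. The sketch below assumes only that `protectedAudience.encodeUtf8()` and `decodeUtf8()` behave as described in this explainer; `installUtf8Polyfill` is a hypothetical name, and it takes the target object as a parameter (normally `globalThis`) so it can be exercised outside a worklet.

```javascript
// Sketch: install minimal TextEncoder/TextDecoder stand-ins on `target`
// (normally globalThis), backed by protectedAudience.encodeUtf8()/decodeUtf8().
// Does nothing if protectedAudience is missing, and never replaces real classes.
function installUtf8Polyfill(target) {
  const pa = target.protectedAudience;
  if (!pa || typeof pa.encodeUtf8 !== "function" ||
      typeof pa.decodeUtf8 !== "function") {
    return;
  }
  if (typeof target.TextEncoder === "undefined") {
    target.TextEncoder = class TextEncoder {
      get encoding() { return "utf-8"; }
      // Standard TextEncoder.encode() treats a missing argument as "".
      encode(input = "") { return pa.encodeUtf8(String(input)); }
    };
  }
  if (typeof target.TextDecoder === "undefined") {
    target.TextDecoder = class TextDecoder {
      // Only UTF-8 is supported; options such as { fatal, ignoreBOM } are
      // accepted but not implemented by this stand-in.
      constructor(label = "utf-8", _options) {
        const normalized = String(label).trim().toLowerCase();
        if (normalized !== "utf-8" && normalized !== "utf8") {
          throw new RangeError(`Unsupported encoding: ${label}`);
        }
      }
      get encoding() { return "utf-8"; }
      decode(input) {
        if (input === undefined) return ""; // Matches standard TextDecoder.
        // decodeUtf8() only accepts Uint8Arrays, so wrap other buffer types.
        const bytes = input instanceof Uint8Array ? input
          : ArrayBuffer.isView(input)
            ? new Uint8Array(input.buffer, input.byteOffset, input.byteLength)
            : new Uint8Array(input);
        return pa.decodeUtf8(bytes);
      }
    };
  }
}

// In a bidder, seller, or reporting script: installUtf8Polyfill(globalThis);
```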

Collaborator


We should probably include links to wasm-bindgen and TextEncoder and TextDecoder.

Collaborator Author


> I think my previous suggested text is reasonable.

Memory array isn't the only way.
https://developer.mozilla.org/en-US/docs/WebAssembly/Guides/JavaScript_builtins

(And your polyfill example lacks feature detection, and it is written in too modern a JS dialect for me to check by eyeballing.)

Collaborator


> Memory array isn't the only way.

I think I covered that by proposing "may rely on..."

JS builtins seem like they might allow calling built-in JS functions on JS Strings from WASM, but I don't think they allow WASM to efficiently access the characters of the string.
I'm not saying my proposed text is perfect; feel free to edit and improve it. But I think we need to give readers more help in using these new additions, and we need to explain the motivation for them. An explainer's central responsibility is to discuss motivation.

Collaborator Author

morlovich, Apr 18, 2025


It has a "charCode at index X" builtin, but I have no idea how fast it is, or how well it fits the tooling on the other side.

Thanks for the context/motivation.

Maybe I could say something like:

Tools like wasm-bindgen frequently perform these conversions using the `TextEncoder` and `TextDecoder` interfaces in order to pass `String`s between JavaScript and WASM efficiently. Since these classes are not available in the bidder, seller, or reporting script environments, the `protectedAudience.encodeUtf8()` and `protectedAudience.decodeUtf8()` functions provide an efficient way to polyfill the minimal subset of their functionality that is needed. For example, a version incorporating feature detection:

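```javascript
// Define a minimal global TextEncoder (works in strict mode as well).
globalThis.TextEncoder = function TextEncoder() {};
if (globalThis.protectedAudience && protectedAudience.encodeUtf8) {
  // Call through protectedAudience rather than storing a detached reference.
  TextEncoder.prototype.encode = function (s) {
    return protectedAudience.encodeUtf8(s);
  };
} else {
  TextEncoder.prototype.encode = slowerJavaScriptEncodeImplementation;
}
```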
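The fallback branch above needs some pure-JavaScript encoder. As one possible stand-in for the hypothetical `slowerJavaScriptEncodeImplementation`, a straightforward UTF-8 encoder could look like this. It combines surrogate pairs into code points and, like the standard `TextEncoder`, replaces lone surrogates with U+FFFD.

```javascript
// Hypothetical stand-in for slowerJavaScriptEncodeImplementation:
// encode a JavaScript string (UTF-16) as UTF-8 bytes in plain JavaScript.
function slowerJavaScriptEncodeImplementation(str = "") {
  const s = String(str);
  const out = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    // High surrogate followed by a low surrogate: combine into one code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++;
      }
    }
    // Any surrogate still left here is unpaired: emit U+FFFD instead.
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd;
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
               0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    }
  }
  return new Uint8Array(out);
}
```

Building a plain array and copying it into a `Uint8Array` keeps the code simple; a faster version might preallocate `3 * s.length` bytes (the most UTF-8 any single UTF-16 code unit can need) and return a `subarray` of the bytes actually written.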

Collaborator Author


Pushed this, so PTAL if you can (though I dunno if CMA pre-review process stuff is doable in half a day).

Labels: None yet
Projects: None yet
2 participants