Event-Level Click Conversion Measurement API #418

csharrison · 2019-09-05T13:44:35Z

Hello TAG!

I'm requesting a TAG review of:

Name: Event-Level Click Conversion Measurement API
Specification URL: N/A
Explainer (containing user needs and example code)¹: url
GitHub issues (if you prefer feedback filed there): url
Tests: N/A
Primary contacts (and their relationship to the specification): @csharrison, @michaelkleber, @johnivdel

Further details:

Relevant time constraints or deadlines: We’d like to discuss this at TPAC (Sept 16, 2019). Other than that no hard time constraints.
I have read and filled out the Self-Review Questionnaire on Security and Privacy. The assessment is here.
I have reviewed the TAG's API Design Principles
The group where the work on this specification is: No group yet

We recommend the explainer to be in Markdown. On top of the usual information expected in the explainer, it is strongly recommended to add:

Links to major pieces of multi-stakeholder review or discussion of this specification: N/A
Links to major unresolved issues or opposition with this specification:
- Publisher opt-in through an HTML attribute WICG/attribution-reporting-api#6

You should also know that...

We’re still at a very early stage here, just looking to get TAG review earlier rather than later. We also have some nascent ideas in https://github.com/csharrison/conversion-measurement-api/blob/master/AGGREGATE.md but those should be reviewed separately since they’re definitely not ready yet.

We'd prefer the TAG provide feedback as (please select one):

open issues in our GitHub repo for each point of feedback
open a single issue in our GitHub repo for the entire review
leave review feedback as a comment in this issue and @-notify [github usernames]

Please preview the issue and check that the links work before submitting. In particular, if anything links to a URL which requires authentication (e.g. Google document), please make sure anyone with the link can access the document.

¹ For background, see our explanation of how to write a good explainer.

hadleybeeman · 2019-09-12T06:19:23Z

We started to look at this in our F2F in Tokyo but ran out of time, so we weren't able to start the review.

We are noting that this is a competing proposal to the Private Click Measurement proposal.

@hober and I will do a deeper dive as soon as we have the chance.

csharrison · 2019-09-12T16:18:21Z

Thanks! Yeah we tried to align our proposal as much as possible with the Private Click Measurement proposal. We're supportive of trying to land on a unified solution that can be used by all browsers to suit their (potentially differing) needs. This is related to privacycg/private-click-measurement#11.

lknik · 2019-10-22T17:03:12Z

This thread looks to be of interest with input from Mozilla and WebKit. Input about possible risk of misuse would be welcome.

Would you be so kind as to expand the answer to 2.2 (security and privacy health check)?

Now it's:

2.2. Is this specification exposing the minimum amount of information necessary to power the feature?
Yes

Are you sure about 2.12, i.e. can't these identifiers serve as temporary ID?

csharrison · 2019-10-23T19:03:05Z

@lknik I updated question 2.2 of the questionnaire to go into a bit more depth on minimum information necessary since there is some nuance there. I also just updated the Privacy Considerations of the explainer to go into more depth (link).

Can you elaborate on what precisely you mean by "can't these identifiers serve as temporary ID"?

lknik · 2019-10-29T21:36:06Z

@csharrison

Sure. Are you sure the information exchanged using this API does not constitute a temporary identifier?

csharrison · 2019-10-29T21:54:48Z

@lknik I may have misunderstood that question in the questionnaire as the browser creating some new global identifier that is readable across sites and can be used for re-identifying users across the web, which this API does not do.

I updated the section, since I suppose you could consider the join of <impression metadata, conversion metadata> as a temporary identifier that just isn't exposed to script and is only sent in non-credentialed requests. Can you take a look and see if it makes sense to you?

lknik · 2019-11-05T21:46:00Z

Hi,

Thanks for additional clarifications! Will have a look.

So

This requirement is based around the fact that sophisticated ads ranking is done via complex machine learning, where individual inferences need to be annotated with labels. Without the 64 bits to identify which ad / click converted, we don’t get the labelling.

Is "we" here the royal "we" or any particular "we"? ;)

More to the point, I still don't quite get why exactly 64b is needed. I don't want to be overly picky (though maybe I am a bit), but while I understand your reply to 2.2, I don't get where those 64b come from.

In the meantime, a more architectural question. What I wonder about is the overlap with another existing spec (also implemented already, I think) that is intended to deal with similar functionality, specifically Ad Click Attribution, which defines the following:

<a adCampaignId="[6-bit ad campaign id]" adDestination="[ad click destination URL]">

The Conversion Measurement API defines:

<a addestination="[eTLD+1]" impressiondata="[string]" impressionexpiry="[unsigned long long]" reportingdomain="[eTLD+1]">

Of course we may end up with sites using the following in practice:

<a addestination="[eTLD+1]" adCampaignId="[6-bit ad campaign id]" impressiondata="[string]" impressionexpiry="[unsigned long long]" reportingdomain="[eTLD+1]">

...with some implementations selectively ignoring some of the attributes. Though here I must thank you for clearly defining what adDestination is (which is apparently not the case with Ad Click Attribution; yes, I realise that, based on the spec text, adDestination in both specs translates to the same thing).

But I wonder if you could not discuss/agree on any convergence in particular. Because at the moment it seems we're exploring two different specs that deal with indeed pretty similar tasks. I am not sure if this kind of enrichment makes the web platform a better place.

csharrison · 2019-11-06T18:02:52Z

Is "we" here the royal "we" or any particular "we"? ;)

Sorry that's the royal we. Labeling in general is not possible unless we can pick the specific inference (i.e. ad selection) and say whether it was a success or failure (or maybe something a little finer grained).

More to the point, I still don't quite get why exactly 64b is needed. I don't want to be overly picky (though maybe I am a bit), but while I understand your reply to 2.2, I don't get where those 64b come from.

64 exact bits aren't fully needed, but you basically want a scheme that lets you avoid too many collisions (i.e. the scheme is "event-level" where we can pinpoint a specific event like an ad-click and label that). We could probably reduce this down, but as you probably know, anything >= 33 bits can identify an individual on earth, and you likely need a lot less to identify a user on a particular site in a given 2 day window, so we only really start getting meaningful privacy protections if the impression metadata is reduced below 32 bits. However, as soon as you start getting into a regime where your click ids are colliding a lot (you are using the full bit space), utility decreases a lot.

In this case in the design we chose 64 bits because we felt moving below 33 would prove tricky utility wise due to collisions, and reducing the 64 bits to some number > 33 didn't really improve privacy at the margin.
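To make the two thresholds above concrete, here is a rough sketch of the arithmetic. It assumes a world population of about 8 billion and click IDs drawn uniformly at random, and it uses the standard birthday-bound approximation. The function names and the one-million-click figure are illustrative assumptions, not anything taken from the explainer.

```python
import math

# Rough figure, assumed only for illustration.
WORLD_POPULATION = 8_000_000_000


def bits_to_identify(population: int) -> int:
    """Smallest number of bits that gives every member a distinct ID."""
    return math.ceil(math.log2(population))


def collision_probability(num_ids: int, bits: int) -> float:
    """Birthday-bound approximation of the chance that at least two of
    num_ids uniformly random IDs collide in a space of 2**bits values."""
    space = 2.0 ** bits
    # 1 - exp(-n(n-1) / (2 * space)), written with expm1 for accuracy
    # when the probability is tiny.
    return -math.expm1(-num_ids * (num_ids - 1) / (2.0 * space))


if __name__ == "__main__":
    print("bits to single out one person:", bits_to_identify(WORLD_POPULATION))
    for bits in (32, 48, 64):
        p = collision_probability(1_000_000, bits)
        print(f"{bits}-bit IDs, 1,000,000 clicks: collision probability ~ {p:.3g}")
```

Under these assumptions, a million random 32-bit click IDs almost certainly contain a collision, while at 64 bits a collision is very unlikely.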

But I wonder if you could not discuss/agree on any convergence in particular. Because at the moment it seems we're exploring two different specs that deal with indeed pretty similar tasks. I am not sure if this kind of enrichment makes the web platform a better place.

Yes I think we should try to align on convergence. I think many of these API differences can be resolved by aligning on API surface, and having some sort of "configuration" advertising the valid inputs. For instance, a UA that wants the impression metadata to be 6 bits can use the same attribute but advertise that they only accept 6 bit input. Similarly, the reportingdomain could only accept e.g. publisher or addestination domains for UAs that want those guarantees.
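One way to picture the "configuration" idea is a shared validation routine parameterised by per-browser limits. The sketch below is hypothetical: `UAConfig`, its fields, and both helper functions are invented for illustration and do not correspond to any specified API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UAConfig:
    """Hypothetical per-browser limits; none of these names come from a spec."""
    impression_data_bits: int          # e.g. 6 for one UA, 64 for another
    allow_third_party_reporters: bool  # False: only publisher or advertiser


def accepts_impression_data(value: int, config: UAConfig) -> bool:
    """True if the impression metadata fits in the bit width this UA advertises."""
    return 0 <= value < (1 << config.impression_data_bits)


def accepts_reporting_domain(reporting: str, publisher: str,
                             ad_destination: str, config: UAConfig) -> bool:
    """True if this UA would send reports to the given reporting domain."""
    if config.allow_third_party_reporters:
        return True
    return reporting in (publisher, ad_destination)


if __name__ == "__main__":
    strict = UAConfig(impression_data_bits=6, allow_third_party_reporters=False)
    permissive = UAConfig(impression_data_bits=64, allow_third_party_reporters=True)
    for name, cfg in (("strict", strict), ("permissive", permissive)):
        print(name,
              accepts_impression_data(1000, cfg),
              accepts_reporting_domain("adtech.example", "publisher.example",
                                       "shop.example", cfg))
```

The same markup could then be served to every browser, with each UA accepting only the inputs that fall within its own advertised limits.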

cc @johnwilander

hober · 2019-11-07T00:15:25Z

as you probably know, anything >= 33 bits can identify an individual on earth, and you likely need a lot less to identify a user on a particular site in a given 2 day window, so we only really start getting meaningful privacy protections if the impression metadata is reduced below 32 bits.
[…]
In this case in the design we chose 64 bits[…]

So, by that logic, you don't believe your proposal gets meaningful privacy protection. Noted.

csharrison · 2019-11-07T00:37:43Z

So, by that logic, you don't believe your proposal gets meaningful privacy protection. Noted.

I don't think that's a fair characterization. My statement was about the size of the impression id, i.e. the publisher-side identifier. An identifier on its own, even a high-entropy one, isn't necessarily bad for privacy even if it can be used to identify a user. For instance, the fetch() API can be used to send arbitrarily large identifiers.

The privacy sensitive aspect of this API is what information is allowed to be joined with this publisher-side identifier that couldn't before (i.e. what this gives you over using the fetch API). In this case, we allow a noisy, very low entropy (e.g. 3 bit) cross site identifier to be joined, gated on a click and some action on the advertiser site (a conversion). Abstractly, the API introduces something like a rate-limited, low-entropy, noisy message channel from advertisers --> publishers, where messages can only be sent on clicks.
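A minimal sketch of what such a noisy, low-entropy channel could look like, assuming the noise works like randomized response: with some probability the browser replaces the 3-bit conversion value with a uniformly random one. The noise model, the `noise_rate` parameter, and the function name are assumptions made for illustration, not the mechanism defined in the explainer.

```python
import random

CONVERSION_BITS = 3  # the "very low entropy (e.g. 3 bit)" value from the comment above


def noisy_conversion_data(true_value: int, noise_rate: float,
                          rng: random.Random) -> int:
    """Return the conversion value the browser would report.

    With probability noise_rate the true value is replaced by a uniformly
    random value in the same 3-bit range (hypothetical noise model).
    """
    if not 0 <= true_value < 2 ** CONVERSION_BITS:
        raise ValueError("conversion data must fit in 3 bits")
    if rng.random() < noise_rate:
        return rng.randrange(2 ** CONVERSION_BITS)
    return true_value


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs produce the same sequence
    reports = [noisy_conversion_data(5, noise_rate=0.25, rng=rng) for _ in range(10)]
    print(reports)
```

Because any single report may be random, a recipient cannot be certain that a particular click led to a particular conversion value, while aggregate statistics remain usable.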

The privacy of the API would be improved if the impression side ID were < 32 bits, but I still think the API has meaningful privacy protections.

michaelkleber · 2019-11-07T15:23:07Z

@hober It's helpful to be more precise than just saying "privacy", and indeed the list of high-level threats in the Target Privacy Threat Model that PING is working on should give us the language to communicate better here.

A lot of this proposal is focused on threat "Unexpected Recognition, cross-site" — that is, on preventing anyone from recognizing the same user across two different sites. We talked about why that was our primary focus in our privacy model explainer. Fixing that problem definitely is "meaningful privacy protection".

The impression ID here is deliberately large enough to uniquely identify which ad impression it was that converted, so it also allows a small amount of what the Privacy Threat Model calls "information disclosure". That's the "rate-limited, low-entropy, noisy message channel" that @csharrison described. Putting the browser in control of the rate, entropy, and noise is also a "meaningful privacy protection". And sure, blocking information flow altogether is of course "more private", but it also doesn't solve the problem at hand.

lknik · 2019-12-04T16:52:46Z

Thanks a lot for the updates. My input follows.

In this case in the design we chose 64 bits because we felt moving below 33 would prove tricky utility wise due to collisions, and reducing the 64 bits to some number > 33 didn't really improve privacy at the margin.

So it seems to me you acknowledge that size (erm...) matters, which is sensible indeed, but on the other hand we're all also aware that much less than 33b is sufficient for tracking. In this case, the 64b is closer to "just a bit more than we really need", or closer to "just right", and if so, why? Sounds a bit arbitrary to some extent still. So if this is an arbitrary choice, why not, say, 60 bits, or 1024 bits, or no limit and leaving it to the browser? Which would mean that for example Safari would have its bit length, and other browsers, maybe other numbers (since it seems both UAs go for different numbers anyway).

There is, of course, also the report uri attribute.

But I wonder if you could not discuss/agree on any convergence in particular.

Yes I think we should try to align on convergence. [...]

That would indeed be great. I don't know, however, how such cooperation should start. Is there any progress on starting the conversation somehow, somewhere?

@michaelkleber

Putting the browser in control of the rate, entropy, and noise is also a "meaningful privacy protection". And sure, blocking information flow altogether is of course "more private", but it also doesn't solve the problem at hand.

I see your point. Though I actually wonder what is the main point here. Solve cross-site tracking potential? Civilise tracking a bit to give compelling arguments for softer anti-tracker blocking, since you mention above blocking information flows?

michaelkleber · 2019-12-09T18:30:14Z

Hi @lknik: In consultation with @johnwilander, we're going to try to align our two proposals through discussion and issues on https://github.com/WICG/ad-click-attribution.

csharrison · 2019-12-09T20:48:42Z

Hey @lknik , thanks for the response.

In this case, the 64b is closer to "just a bit more than we really need", or closer to "just right", and if so, why? Sounds a bit arbitrary to some extent still. So if this is an arbitrary choice, why not, say, 60 bits, or 1024 bits, or no limit and leaving it to the browser?

I agree this is an arbitrary number. We picked it for two reasons:

- It is large enough to prevent collisions, which improves the utility. We wanted data to be "event-level" at the impression side.
- It is small enough that it won't be misused as something other than a click identifier. For instance, we didn't want people putting arbitrary text in there, and relying on that behavior especially in the event that browsers in the future would want to ramp down the maximum. This is why we didn't like "no limit".

60 bits vs 64 bits I believe makes no practical difference to utility or privacy, so we went with the rounder number.

lknik · 2020-01-13T17:15:43Z

We discussed it during the telecon today. For the moment we propose to close the issue. Should you feel the need, please come back for additional feedback once you progress on the layering of the approach with the PCM one (it will be more meaningful to look at it at that point). Thank you, tuning out!

csharrison · 2020-09-11T22:14:20Z

Hey TAG reviewers,
Sorry for the delay. Here are our conclusions from ongoing discussions with @johnwilander:

- Naming: our attribute names are quite different from PCM and we’re working on a new set of names after receiving some feedback that should hopefully make things more clear. We’ll update that issue with the proposed names but it may be something we can align on.
- Bits: PCM is potentially interested in supplying 3 bits of conversion data (aligning with this proposal) but we are still interested in event-level data on the impression side. We agreed that the 64 bit ID could live in a different attribute in the Privacy CG F2F in May.
- Noise: We are still interested in adding noise to our 3 bits but they are not. This could be resolved in a spec by having noise be browser determined and the noise rate explicitly included in the conversion report.
- Report delays: There are some fundamental privacy reasons why we can’t implement the same reporting delays as PCM due to our 64 bits, so we can’t quite align there. We could get around this in the spec by having delays be browser controlled, and having the reports say something about which delays were imposed by the UA.
- 3rd party reporters: we proposed a possible way to align on API surface but PCM is not interested in supporting this use case in general and would only support reports going to publishers / advertisers directly.
- Multiple conversions per click: PCM is not interested in supporting multiple conversions per click if they occur >24 hrs after the first click, which makes it difficult to align since this is an important use case we want to support. This feature also influences our proposal for report delays. This is discussed in this issue.
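On the noise point, including the noise rate in each report would let a recipient correct aggregate counts regardless of which browser produced them. The sketch below assumes a hypothetical noise model in which, with probability `noise_rate`, the reported value is replaced by one drawn uniformly from all possible values. The function name and the model are illustrative assumptions rather than anything agreed in these discussions.

```python
from typing import List


def debias_counts(observed: List[float], noise_rate: float) -> List[float]:
    """Estimate true per-value counts from noisy per-value counts.

    Under the assumed model, the expected observed count for value k is
        (1 - noise_rate) * true_k + noise_rate * total / num_values,
    so the estimate inverts that relationship for each k.
    """
    if not 0.0 <= noise_rate < 1.0:
        raise ValueError("noise_rate must be in [0, 1)")
    total = sum(observed)
    uniform_share = noise_rate * total / len(observed)
    return [(count - uniform_share) / (1.0 - noise_rate) for count in observed]


if __name__ == "__main__":
    # Eight possible 3-bit conversion values; counts here are made up.
    observed = [82.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5]
    print(debias_counts(observed, noise_rate=0.2))
```

A browser that adds no noise would simply report a rate of zero, in which case the estimate equals the observed counts.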

cc @johnwilander @michaelkleber @johnivdel. Happy to discuss some of these items more in the next privacy CG F2F if it's useful.

csharrison added the Progress: untriaged label Sep 5, 2019

torgo assigned hober Sep 10, 2019

torgo added Progress: unreviewed and removed Progress: untriaged labels Sep 10, 2019

torgo assigned lknik, hadleybeeman and torgo Sep 10, 2019

lknik added Missing: security & privacy review Review type: deep thoughts Topic: HTML labels Oct 23, 2019

torgo added this to the 2020-01-06-week milestone Dec 16, 2019

michaelkleber mentioned this issue Dec 23, 2019

Layering "Click Through Conversion Measurement Event-Level API " on top of PCM privacycg/private-click-measurement#26

Closed

torgo modified the milestones: 2020-01-06-week, 2020-01-13-week Jan 6, 2020

lknik closed this as completed Jan 13, 2020

abebis mentioned this issue Jan 21, 2020

Consider defining a modern JS API in addition to the tracking pixel mechanism privacycg/private-click-measurement#31

Open

yoavweiss mentioned this issue Mar 11, 2020

How does someone go about transferring a repo from WICG to another CG? WICG/admin#93

Closed

csharrison mentioned this issue Oct 15, 2020

Define structure of JSON used for conversion reports privacycg/private-click-measurement#30

Closed

cynthia added the Provenance: Privacy Sandbox label Mar 17, 2022

johnivdel mentioned this issue Mar 23, 2022

Review Request for Attribution Reporting API #724

Open

rhiaro added Mode: none Does not require TAG review and removed Progress: unreviewed labels May 6, 2024
