Event-Level Click Conversion Measurement API #418

Closed
3 of 5 tasks
csharrison opened this issue Sep 5, 2019 · 16 comments

@csharrison

Hello TAG!

I'm requesting a TAG review of:

  • Name: Event-Level Click Conversion Measurement API
  • Specification URL: N/A
  • Explainer (containing user needs and example code)¹: url
  • GitHub issues (if you prefer feedback filed there): url
  • Tests: N/A
  • Primary contacts (and their relationship to the specification): @csharrison, @michaelkleber, @johnivdel

Further details:

We recommend the explainer to be in Markdown. On top of the usual information expected in the explainer, it is strongly recommended to add:

You should also know that...

We’re still at a very early stage here, just looking to get TAG review earlier rather than later. We also have some nascent ideas in https://github.com/csharrison/conversion-measurement-api/blob/master/AGGREGATE.md but those should be reviewed separately since they’re definitely not ready yet.

We'd prefer the TAG provide feedback as (please select one):

  • open issues in our GitHub repo for each point of feedback
  • open a single issue in our GitHub repo for the entire review
  • leave review feedback as a comment in this issue and @-notify [github usernames]

Please preview the issue and check that the links work before submitting. In particular, if anything links to a URL which requires authentication (e.g. Google document), please make sure anyone with the link can access the document.

¹ For background, see our explanation of how to write a good explainer.

@hadleybeeman
Member

We started to look at this in our F2F in Tokyo but ran out of time, so we weren't able to start the review.

We note that this is a competing proposal to the Private Click Measurement proposal.

@hober and I will do a deeper dive as soon as we have the chance.

@csharrison
Author

Thanks! Yeah we tried to align our proposal as much as possible with the Private Click Measurement proposal. We're supportive of trying to land on a unified solution that can be used by all browsers to suit their (potentially differing) needs. This is related to privacycg/private-click-measurement#11.

@lknik
Member

lknik commented Oct 22, 2019

This thread looks to be of interest with input from Mozilla and WebKit. Input about possible risk of misuse would be welcome.

Would you be so kind as to expand the answer to 2.2 (security and privacy health check)?

Now it's:

2.2. Is this specification exposing the minimum amount of information necessary to power the feature?
Yes

Are you sure about 2.12, i.e. can't these identifiers serve as a temporary ID?

@csharrison
Author

csharrison commented Oct 23, 2019

@lknik I updated question 2.2 of the questionnaire to go into a bit more depth on minimum information necessary since there is some nuance there. I also just updated the Privacy Considerations of the explainer to go into more depth (link).

Can you elaborate on what precisely you mean by "can't these identifiers serve as a temporary ID"?

@lknik
Member

lknik commented Oct 29, 2019

@csharrison

Sure. Are you sure the information exchanged using this API does not constitute a temporary identifier?

@csharrison
Author

@lknik I may have misunderstood that question in the questionnaire as the browser creating some new global identifier that is readable across sites and can be used for re-identifying users across the web, which this API does not do.

I updated the section, since I suppose you could consider the join of <impression metadata, conversion metadata> as a temporary identifier that just isn't exposed to script and is only sent in non-credentialed requests. Can you take a look and see if it makes sense to you?

@lknik
Member

lknik commented Nov 5, 2019

Hi,

Thanks for additional clarifications! Will have a look.

So

This requirement is based around the fact that sophisticated ads ranking is done via complex machine learning, where individual inferences need to be annotated with labels. Without the 64 bits to identify which ad / click converted, we don’t get the labelling.

Is "we" here the royal "we" or any particular "we"? ;)

More to the point, I still don't quite get why exactly 64b is needed. I don't want to be overly picky (although maybe I am a bit), but while I understand your reply to 2.2, I don't get where those 64b come from.

In the meantime, a more architectural question/comment/remark/angle. Basically, what I wonder about is the overlap with another existing spec (also implemented already, I think) that intends to deal with similar functionality, specifically Ad Click Attribution, which defines the following:

<a adCampaignId="[6-bit ad campaign id]" adDestination="[ad click destination URL]">

The Conversion Measurement API defines:

<a addestination="[eTLD+1]" impressiondata="[string]" impressionexpiry=[unsigned long long] reportingdomain="[eTLD+1]">

Of course we may end up with sites using the following in practice:

<a addestination="[eTLD+1]" adCampaignId="[6-bit ad campaign id]" impressiondata="[string]" impressionexpiry=[unsigned long long] reportingdomain="[eTLD+1]">

...with some implementations selectively ignoring some of the attributes. Though here I must thank you for clearly defining what adDestination is (which is apparently not the case with Ad Click Attribution; yes, I realise that, based on the spec text, adDestination in both specs translates to the same thing).

But I wonder if you could not discuss/agree on any convergence in particular. Because at the moment it seems we're exploring two different specs that deal with indeed pretty similar tasks. I am not sure if this kind of enrichment makes the web platform a better place.

@csharrison
Author

Is "we" here a royal "we" or any particular "we"? ;)

Sorry that's the royal we. Labeling in general is not possible unless we can pick the specific inference (i.e. ad selection) and say whether it was a success or failure (or maybe something a little finer grained).

More to the point, I still don't quite get why exactly 64b is needed. I don't want to be overly picky (although maybe I am a bit), but while I understand your reply to 2.2, I don't get where those 64b come from.

Exactly 64 bits aren't strictly needed, but you basically want a scheme that lets you avoid too many collisions (i.e. the scheme is "event-level" where we can pinpoint a specific event like an ad-click and label that). We could probably reduce this down, but as you probably know, anything >= 33 bits can identify an individual on earth, and you likely need a lot less to identify a user on a particular site in a given 2 day window, so we only really start getting meaningful privacy protections if the impression metadata is reduced below 32 bits. However, as soon as you start getting into a regime where your click ids are colliding a lot (you are using the full bit space), utility decreases a lot.

In this case in the design we chose 64 bits because we felt moving below 33 would prove tricky utility wise due to collisions, and reducing the 64 bits to some number > 33 didn't really improve privacy at the margin.
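
To put rough numbers on the collision versus identifiability trade-off described above, here is a minimal sketch. It assumes click IDs are drawn uniformly at random (real ad systems may assign IDs differently), uses the standard birthday approximation, and takes a world population of about 7.7 billion as a rough 2019 figure. The function names are hypothetical and not part of any proposal.

```python
# Illustrative only: assumes uniformly random IDs and a rough population figure.
import math

WORLD_POPULATION = 7_700_000_000  # approximate 2019 value (assumption)


def bits_to_distinguish(count: int) -> int:
    """Smallest number of bits whose value space covers `count` distinct items."""
    return (count - 1).bit_length()


def collision_probability(num_ids: int, bits: int) -> float:
    """Birthday approximation of the chance that at least two of `num_ids`
    uniformly random IDs collide in a space of 2**bits values."""
    space = 2.0 ** bits
    # 1 - exp(-n(n-1) / (2N)); expm1 keeps precision when the result is tiny.
    return -math.expm1(-num_ids * (num_ids - 1) / (2.0 * space))


if __name__ == "__main__":
    print("bits needed to cover the world population:",
          bits_to_distinguish(WORLD_POPULATION))
    # One million clicks is an arbitrary example volume.
    for bits in (16, 32, 33, 64):
        print(bits, "bits ->", collision_probability(1_000_000, bits))
```

Under these assumptions, IDs of around 32 bits collide almost surely at a volume of a million clicks, while 64-bit IDs almost never do, which matches the utility argument made in the comment.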

But I wonder if you could not discuss/agree on any convergence in particular. Because at the moment it seems we're exploring two different specs that deal with indeed pretty similar tasks. I am not sure if this kind of enrichment makes the web platform a better place.

Yes I think we should try to align on convergence. I think many of these API differences can be resolved by aligning on API surface, and having some sort of "configuration" advertising the valid inputs. For instance, a UA that wants the impression metadata to be 6 bits can use the same attribute but advertise that they only accept 6 bit input. Similarly, the reportingdomain could only accept e.g. publisher or addestination domains for UAs that want those guarantees.
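
As a rough illustration of the "configuration" idea above, the sketch below models a user agent that advertises how many bits of impression data it accepts and whether it restricts the reporting domain. Everything here is hypothetical: the `UAConfig` type, its field names, and the choice to reject (rather than truncate) oversized input are assumptions, not something either proposal defines.

```python
# Hypothetical sketch of per-UA input limits; not an API from either proposal.
from dataclasses import dataclass


@dataclass(frozen=True)
class UAConfig:
    """Limits a user agent could advertise (names are assumptions)."""
    impression_data_bits: int        # e.g. 6 for one UA, 64 for another
    restrict_reporting_domain: bool  # True: only publisher or addestination


def accept_impression(config: UAConfig, impression_data: int, publisher: str,
                      addestination: str, reportingdomain: str) -> bool:
    """Return True if this UA would store the impression under its limits."""
    # Assumption: out-of-range data is rejected instead of being truncated.
    if not 0 <= impression_data < 2 ** config.impression_data_bits:
        return False
    if config.restrict_reporting_domain and reportingdomain not in (
            publisher, addestination):
        return False
    return True


if __name__ == "__main__":
    six_bit = UAConfig(impression_data_bits=6, restrict_reporting_domain=True)
    print(accept_impression(six_bit, 42, "news.example", "shop.example",
                            "shop.example"))
```

The same markup could then be served to every browser, with each UA applying its own limits to the shared attributes.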

cc @johnwilander

@hober
Contributor

hober commented Nov 7, 2019

as you probably know, anything >= 33 bits can identify an individual on earth, and you likely need a lot less to identify a user on a particular site in a given 2 day window, so we only really start getting meaningful privacy protections if the impression metadata is reduced below 32 bits.
[…]
In this case in the design we chose 64 bits[…]

So, by that logic, you don't believe your proposal gets meaningful privacy protection. Noted.

@csharrison
Author

csharrison commented Nov 7, 2019

So, by that logic, you don't believe your proposal gets meaningful privacy protection. Noted.

I don't think that's a fair characterization. My statement was about the size of the impression id, i.e. the publisher-side identifier. An identifier on its own, even a high-entropy one, isn't necessarily bad for privacy even if it can be used to identify a user. For instance, the fetch() API can be used to send arbitrarily large identifiers.

The privacy sensitive aspect of this API is what information is allowed to be joined with this publisher-side identifier that couldn't before (i.e. what this gives you over using the fetch API). In this case, we allow a noisy, very low entropy (e.g. 3 bit) cross site identifier to be joined, gated on a click and some action on the advertiser site (a conversion). Abstractly, the API introduces something like a rate-limited, low-entropy, noisy message channel from advertisers --> publishers, where messages can only be sent on clicks.
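
A minimal sketch of what a "noisy, very low entropy" conversion value could look like, assuming a simple randomized-response scheme: with some probability the browser replaces the real 3-bit value with a uniformly random one. The noise rate of 5% and the function name are assumptions for illustration; this thread does not specify them.

```python
# Illustrative randomized-response sketch; rate and names are assumptions.
import random

CONVERSION_BITS = 3   # "e.g. 3 bit" in the comment above
NOISE_RATE = 0.05     # hypothetical value, not taken from this thread


def noisy_conversion_data(true_value: int, rng: random.Random,
                          noise_rate: float = NOISE_RATE) -> int:
    """Return the conversion value a browser might report for one conversion."""
    if not 0 <= true_value < 2 ** CONVERSION_BITS:
        raise ValueError("conversion data must fit in 3 bits")
    if rng.random() < noise_rate:
        # Replace the real value with a uniformly random 3-bit value.
        return rng.randrange(2 ** CONVERSION_BITS)
    return true_value


if __name__ == "__main__":
    rng = random.Random()
    print([noisy_conversion_data(5, rng) for _ in range(10)])
```

Because any single reported value may be random, a recipient cannot be certain about an individual conversion, while aggregate counts remain estimable if the noise rate is known.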

The privacy of the API would be improved if the impression side ID were < 32 bits, but I still think the API has meaningful privacy protections.

@michaelkleber

michaelkleber commented Nov 7, 2019

@hober It's helpful to be more precise than just saying "privacy", and indeed the list of high-level threats in the Target Privacy Threat Model that PING is working on should give us the language to communicate better here.

A lot of this proposal is focused on threat "Unexpected Recognition, cross-site" — that is, on preventing anyone from recognizing the same user across two different sites. We talked about why that was our primary focus in our privacy model explainer. Fixing that problem definitely is "meaningful privacy protection".

The impression ID here is deliberately large enough to uniquely identify which ad impression it was that converted, so it also allows a small amount of what the Privacy Threat Model calls "information disclosure". That's the "rate-limited, low-entropy, noisy message channel" that @csharrison described. Putting the browser in control of the rate, entropy, and noise is also a "meaningful privacy protection". And sure, blocking information flow altogether is of course "more private", but it also doesn't solve the problem at hand.

@lknik
Member

lknik commented Dec 4, 2019

Thanks a lot for the updates. My follow-up input is below.

In this case in the design we chose 64 bits because we felt moving below 33 would prove tricky utility wise due to collisions, and reducing the 64 bits to some number > 33 didn't really improve privacy at the margin.

So it seems to me you acknowledge that size (erm...) matters, which is sensible indeed, but on the other hand we're all also aware that much less than 33b is sufficient for tracking. In this case, the 64b is closer to "just a bit more than we really need", or closer to "just right", and if so, why? Sounds a bit arbitrary to some extent still. So if this is an arbitrary choice, why not, say, 60 bits, or 1024 bits, or no limit and leaving it to the browser? Which would mean that for example Safari would have its bit length, and other browsers, maybe other numbers (since it seems both UAs go for different numbers anyway).

There is, of course, also the report uri attribute.

But I wonder if you could not discuss/agree on any convergence in particular.

Yes I think we should try to align on convergence. [...]

Indeed, that would be great. I don't know, however, how such co-op should start. Is there any progress on starting the conversation somehow, somewhere?

@michaelkleber

Putting the browser in control of the rate, entropy, and noise is also a "meaningful privacy protection". And sure, blocking information flow altogether is of course "more private", but it also doesn't solve the problem at hand.

I see your point. Though I actually wonder what is the main point here. Solve cross-site tracking potential? Civilise tracking a bit to give compelling arguments for softer anti-tracker blocking, since you mention above blocking information flows?

@michaelkleber

Hi @lknik: In consultation with @johnwilander, we're going to try to align our two proposals through discussion and issues on https://github.com/WICG/ad-click-attribution.

@csharrison
Author

Hey @lknik , thanks for the response.

In this case, the 64b is closer to "just a bit more than we really need", or closer to "just right", and if so, why? Sounds a bit arbitrary to some extent still. So if this is an arbitrary choice, why not, say, 60 bits, or 1024 bits, or no limit and leaving it to the browser?

I agree this is an arbitrary number. We picked it for two reasons:

  1. It is large enough to prevent collisions, which improves the utility. We wanted data to be "event-level" at the impression side.
  2. It is small enough that it won't be misused as something other than a click identifier. For instance, we didn't want people putting arbitrary text in there, and relying on that behavior especially in the event that browsers in the future would want to ramp down the maximum. This is why we didn't like "no limit".

60 bits vs 64 bits I believe makes no practical difference to utility or privacy, so we went with the rounder number.

@lknik
Member

lknik commented Jan 13, 2020

We discussed it during the telecon today. For the moment we propose to close the issue. Should you feel the need, please come back for additional feedback once you progress on the layering of the approach with the PCM one (it will be more meaningful to look at it at that point). Thank you, tuning out!

@lknik closed this as completed Jan 13, 2020
@csharrison
Author

Hey TAG reviewers,
Sorry for the delay. Here are our conclusions from ongoing discussions with @johnwilander:

  • Naming: our attribute names are quite different from PCM and we’re working on a new set of names after receiving some feedback that should hopefully make things clearer. We’ll update that issue with the proposed names but it may be something we can align on.
  • Bits: PCM is potentially interested in supplying 3 bits of conversion data (aligning with this proposal) but we are still interested in event-level data on the impression side. At the Privacy CG F2F in May, we agreed that the 64-bit ID could live in a different attribute.
  • Noise: We are still interested in adding noise to our 3 bits but PCM is not. This could be resolved in a spec by having noise be browser determined and the noise rate explicitly included in the conversion report.
  • Report delays: There are some fundamental privacy reasons why we can’t implement the same reporting delays as PCM due to our 64 bits, so we can’t quite align there. We could get around this in the spec by having delays be browser controlled, and having the reports say something about which delays were imposed by the UA.
  • 3rd party reporters: we proposed a possible way to align on API surface but PCM is not interested in supporting this use case in general and would only support reports going to publishers / advertisers directly.
  • Multiple conversions per click: PCM is not interested in supporting multiple conversions per click if they occur >24 hrs after the first click, which makes it difficult to align since this is an important use case we want to support. This feature also influences our proposal for report delays. This is discussed in this issue.
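
To make the "Noise" and "Report delays" items above more concrete, here is a purely hypothetical sketch of a conversion report that carries the browser-chosen noise rate and delay alongside the data. The field names and the JSON shape are assumptions for illustration only; neither proposal defines this format in this thread.

```python
# Hypothetical report shape; field names are assumptions, not a spec.
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ConversionReport:
    impression_data: str       # publisher-side ID, kept as a string
    conversion_data: int       # 3-bit advertiser-side value
    noise_rate: float          # probability the UA randomized conversion_data
    report_delay_seconds: int  # delay the UA imposed before sending

    def __post_init__(self) -> None:
        if not 0 <= self.conversion_data < 8:
            raise ValueError("conversion_data must fit in 3 bits")
        if not 0.0 <= self.noise_rate <= 1.0:
            raise ValueError("noise_rate must be a probability")


def to_json(report: ConversionReport) -> str:
    """Serialize the report so a recipient can read the UA's noise and delay."""
    return json.dumps(asdict(report), sort_keys=True)


if __name__ == "__main__":
    # All values below are made-up examples.
    print(to_json(ConversionReport("12345", 5, 0.05, 2 * 24 * 60 * 60)))
```

With a shape like this, a UA that adds no noise could report a noise rate of 0, so recipients would not need per-browser special cases.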

cc @johnwilander @michaelkleber @johnivdel. Happy to discuss some of these items more in the next privacy CG F2F if it's useful.
