
☂️ Select a fraud prevention mechanism #27

Open

johnwilander opened this issue Dec 13, 2019 · 15 comments
Assignees
Labels
fraud prevention Related to fraud prevention layering Layering additional data and functionality on top of PCM

Comments

@johnwilander
Collaborator

johnwilander commented Dec 13, 2019

We need to determine what fraud prevention mechanism to use. Each possible mechanism should be labeled fraud prevention.

Labeled layering because the mechanism should work for the Event-level Conversion Measurement folks too.

Old summary below.


Fraud prevention has been discussed in #6 and at W3C TPAC in September this year.

The Attack

The secure HTTP POST request to the ad click source's eTLD+1 at /.well-known/ad-click-attribution/<6-bit ad attribution data>/<6-bit ad campaign id> can be spoofed to report fraudulent attribution data.
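To illustrate, here is a sketch of such a forged report using made-up values; only the URL shape follows the description above, and the domain and the two 6-bit values are hypothetical:

```python
# Hypothetical forged report: the domain and the 6-bit values are invented for
# this sketch; only the URL shape follows the description above. Nothing in the
# request proves that a real ad click or conversion ever took place.
import requests

click_source = "https://clicksource.example"
attribution_data = 45  # 6-bit ad attribution data the fraudster wants reported
campaign_id = 12       # 6-bit ad campaign id being targeted

url = f"{click_source}/.well-known/ad-click-attribution/{attribution_data}/{campaign_id}"
response = requests.post(url)
print(response.status_code)
```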

The Countermeasure

The suggested way to prevent this kind of fraudulent attribution reporting is to create blinded cryptographic proof of trustworthiness at the time of the ad click, the conversion, or both, and provide that proof as part of the attribution report.

The existing proposal for such proof is Trust Tokens, as outlined in the Trust Token API proposal.

To discuss:

  • The Trust Token API describes a web API whereas in the context of PCM, the browser would initiate the creation of Trust Tokens at the click source, the conversion site, or both. How do we untie Trust Tokens from the Trust Token API?
  • The Trust Token API proposal refers to Privacy Pass, which is currently not a standard. What is the plan in terms of standardization?
  • The underlying technology is a so-called blinded signature. Are all the pieces in Trust Tokens needed for the purposes of fraud prevention in PCM? If not, how close could we get to just blinded signatures?
  • The Trust Token API proposal mentions “a list of allowed or disallowed issuers.” Will this solution scale if we were to support trust tokens not only by click sources but also by click destinations? Basically any website that’s on either side of an ad click that needs attribution reports would have to be on the list. Is this where the disallow list comes in? We should discuss this in the context of PCM.
@johnwilander johnwilander added the layering Layering additional data and functionality on top of PCM label Dec 13, 2019
@johnwilander
Collaborator Author

Ping @michaelkleber @csharrison.

@benjaminsavage

Thanks @johnwilander for filing this issue!

I think your suggested countermeasure (blinded cryptographic proof of trustworthiness) is a very promising one.

I too think it makes sense to just use blinded signatures (as opposed to somehow trying to utilize the full Trust Token API).

Here is one idea for how the browser could initiate the process:

Step 1:

The publisher website displays an ad, with an adCampaignID and adDestination attribute.

Step 2:

The person clicks on the ad.

Step 3:

The browser generates a secret, random "nonce" that is kept in a secret location, inaccessible via any JavaScript API, and not accessible by any browser extension.

Step 4:

The browser "blinds" this "nonce" with a blinding function. It generates a random "blinding factor" to do so. I would recommend using a different "blinding factor" each time. As before, this random "blinding factor" should be kept in a secret location, inaccessible via any JavaScript API, and not accessible by any browser extension.
Pseudocode:

blinding_factor = genRandomBlindingFactor();
blinded_nonce = blind(nonce, blinding_factor);

Step 5:

The browser sends this "blinded_nonce" to the publisher website, in a request that is tied to the specific click that just happened an instant ago, asking for a signature.
The objective would be to ensure that:

  1. The publisher never generates more than one signature per ad-click
  2. The publisher can provide a signature that matches the adCampaignID and adDestination of the click

There are two approaches that come to mind here. Either the ad-click could be a redirect that first bounces the user through a "blind signature" step before sending them onwards to the actual adDestination, or the browser could issue an out of band request to the blind signature endpoint.

Step 6:

The publisher website would receive the request to provide a signature for an ad click that just happened moments ago. This request must contain sufficient information such that the publisher knows which ad-click this request refers to. This is so they:

  1. Can ensure the ad was actually clicked, not just viewed.
  2. Can ensure they haven't already given out a signature for this click.
  3. Can look up the adCampaignID and adDestination associated with that click.

If no signature has already been given out for this click, the publisher should provide a cryptographic signature corresponding to the adCampaignID and adDestination of that click.

There are probably more elegant / advanced ways of doing this (would love help from you cryptographers out there!), but just for simplicity let's assume the publisher generates a unique Key-Value pair for each combination of adCampaignID and adDestination.

The publisher signs the blinded-nonce with the appropriate private key.
Pseudocode:

private_key = lookupPrivateKey(adCampaignID, adDestination);
blind_signature = RSASign(blinded_nonce, private_key);

Step 7:

When the browser receives this blind_signature from the publisher, it should validate that this value is actually a legitimate signature, and not somehow being used as a tracking vector.

To do this, the browser should download the Public key that corresponds to the given adCampaignID, adDestination pair.

The publisher should make these public keys available somehow (perhaps another endpoint?).

To ensure the publisher isn't rotating them super quickly (as part of some kind of attempt at tracking), requests for these keys should not have cookies (i.e. anonymous requests), and perhaps the keys can be cached somewhere outside of the publisher's control (there isn't a good reason for them to change often). Our friends at Google have thought more about this problem than I have, and I'm sure the folks who worked on Key Transparency have more thoughts.
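As a rough sketch (the endpoint path, parameter names, and response format below are assumptions for illustration, not anything specified), such a cookie-less key fetch might look like:

```python
# Hypothetical sketch of an anonymous public-key fetch, keyed by the
# (adCampaignID, adDestination) pair. The endpoint and response format are
# invented for illustration; requests sends no cookies unless given a cookie
# jar, which mirrors the "anonymous request" requirement described above.
import requests

def fetch_public_key(click_source, ad_campaign_id, ad_destination):
    url = f"{click_source}/.well-known/ad-signature-key"
    response = requests.get(url, params={
        "adCampaignID": ad_campaign_id,
        "adDestination": ad_destination,
    })
    response.raise_for_status()
    return response.json()["public_key"]
```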

Step 8:

With the correct public key available, the browser can quickly check that the signature is valid:
Pseudocode:

is_valid = verify(blinded_nonce, blind_signature, public_key);

If the signature is not valid, the browser should drop data about this ad click. It should not be eligible for generating a conversion report.

If the signature is valid the browser can "unblind" the signature using the random blinding factor from before.
Pseudocode:

publisher_signature = unblind(blind_signature, blinding_factor);

At this point, the browser can dispose of:

  • The blinded_nonce
  • The blinding_factor
  • The blind_signature

None of these will be needed again.

The publisher_signature and the nonce should be kept together with the other information about this click requesting attribution (namely the adCampaignID and the adDestination).

Step 9:

At a much later time, when the anonymous conversion report is generated, the report can contain 2 additional fields: the nonce and the publisher_signature.

Neither of these adds any additional tracking information. The publisher website has never seen either, and they are un-linkable to the blinded_nonce and the blind_signature the publisher website saw earlier (this "un-linkability" is the key property of blind signatures).

The publisher can validate that the signature is valid, and could only have been generated by them.
Pseudocode:

public_key = lookupPublicKey(adCampaignID, adDestination);
is_valid = verify(nonce, publisher_signature, public_key);

If the signature is invalid, this report is likely fraudulent and should be ignored. If the signature is valid, the publisher knows a few things:

  1. At some point in the past, they must have given out a blind-signature to this browser
  2. The blind-signature they gave out was for an ad-click that was actually directing people to the adDestination shown in the anonymous conversion report, and was a click on an ad with the adCampaignID specified in the anonymous conversion report.
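
To make the end-to-end flow concrete, here is a self-contained toy sketch of Steps 3 through 9 using textbook RSA blind signatures. The key is deliberately tiny and there is no hashing or padding, so this only illustrates the blind / sign / unblind / verify relationships described above, not a production-ready scheme.

```python
# Toy textbook-RSA blind-signature walk-through (NOT secure: tiny key, no
# hashing, no padding). All numbers are illustrative.
import secrets
from math import gcd

# Ad click source key pair for one (adCampaignID, adDestination) pair.
p, q = 61, 53
n = p * q        # public modulus
e = 17           # public exponent
d = 2753         # private exponent: e * d == 1 (mod (p-1)*(q-1))

# Step 3: the browser picks a secret nonce (kept away from JS and extensions).
nonce = secrets.randbelow(n - 2) + 2

# Step 4: the browser blinds the nonce with a fresh random blinding factor.
while True:
    blinding_factor = secrets.randbelow(n - 2) + 2
    if gcd(blinding_factor, n) == 1:
        break
blinded_nonce = (nonce * pow(blinding_factor, e, n)) % n

# Step 6: the click source signs the blinded nonce; it never sees the nonce itself.
blind_signature = pow(blinded_nonce, d, n)

# Steps 7-8: the browser checks the blind signature, then unblinds it.
assert pow(blind_signature, e, n) == blinded_nonce
publisher_signature = (blind_signature * pow(blinding_factor, -1, n)) % n

# Step 9: much later, the report carries (nonce, publisher_signature); the click
# source can verify it without being able to link it back to the original click.
print("report verifies:", pow(publisher_signature, e, n) == nonce)
```

The unlinkability comes from the blinding factor: the click source only ever sees blinded_nonce and blind_signature, while the eventual report only contains nonce and publisher_signature.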

Fraud Prevention guarantees

I think this would make life significantly more difficult for fraudsters. In order to generate fake conversions they would need to do a few things:

  1. Actually click on ads shown on the publisher's site
  2. Run some kind of jailbroken / hacked version of their browser that somehow enables them to access the nonce and publisher_signature.
  3. Only send fake conversion reports corresponding to the ads the publisher actually served them.

The third one is a really nice guarantee (that we got by virtue of using a different Key-Value pair per adCampaignID, adDestination pair). This makes it MUCH harder to mess around with a competitor (you might never get served an ad for a competitor).

The publisher can also choose to limit the types of ads it shows to possibly compromised / sketchy looking browsers (e.g. those that click on a LOT of ads...). They could serve a lot of "honey-pot ads" (ads no real person would ever click on, for example ugly ads advertising expensive, low-quality goods, in a language the end user doesn't understand). If the publisher starts seeing conversions on these "honey-pot ads" that's a good sign that at least some of those clients it deems "possibly compromised / sketchy" are indeed engaging in conversion fraud. This is a nice trick because it's very hard for automated scripts to differentiate between real ads and "honey-pot ads". It's hard to write a script that can say: "This ad doesn't seem appealing".

All of these ideas are drawn from our "Private Fraud Prevention" repo: https://github.com/siyengar/private-fraud-prevention

@csharrison

Hi, sorry for the delay here. I think integrating blind signatures makes sense for this API. One concern I brought up in WICG/attribution-reporting-api#13 is that some of these techniques require additional thought if we want to add noise to the values we report.

@johnwilander
Collaborator Author

> Thanks @johnwilander for filing this issue!
>
> I think your suggested countermeasure (blinded cryptographic proof of trustworthiness) is a very promising one.

Given @csharrison's positive comment, I think this looks like a viable way forward.

> I too think it makes sense to just use blinded signatures (as opposed to somehow trying to utilize the full Trust Token API).
>
> Here is one idea for how the browser could initiate the process:
>
> Step 1:
>
> The publisher website displays an ad, with an adCampaignID and adDestination attribute.

While "publisher" works in concrete examples, we try to avoid it in general descriptions. The term muddies the waters since many don't think of publishers as engaged in cross-site tracking and the whole thing gets a false aura of benign, happy path behavior. The truth is this web technology will be available to all websites including social networks, search engines, click bait sites, shopping sites, bank sites, splash screens from service providers etc.

> […]
>
> Step 5:
>
> The browser sends this "blinded_nonce" to the publisher website, in a request that is tied to the specific click that just happened an instant ago, asking for a signature.
> The objective would be to ensure that:
>
>   1. The publisher never generates more than one signature per ad-click

This sounds like defense against some threat. Could you elaborate, please?

>   2. The publisher can provide a signature that matches the adCampaignID and adDestination of the click
>
> There are two approaches that come to mind here. Either the ad-click could be a redirect that first bounces the user through a "blind signature" step before sending them onwards to the actual adDestination, or the browser could issue an out of band request to the blind signature endpoint.

We don't want to add extra navigations or redirects if we can avoid them so out-of-band sounds best.

> Step 6:
>
> The publisher website would receive the request to provide a signature for an ad click that just happened moments ago. This request must contain sufficient information such that the publisher knows which ad-click this request refers to. This is so they:
>
>   1. Can ensure the ad was actually clicked, not just viewed.

If these ad clicks were restricted to first-party links, we'd already be in a good place since the out-of-band request could carry cookies. But we've received the request for third-party serving of ad links, in which case cookies are unlikely in browsers with anti-tracking measures.

>   2. Can ensure they haven't already given out a signature for this click.
>   3. Can look up the adCampaignID and adDestination associated with that click.
>
> […]
>
> Step 9:
>
> At a much later time, when the anonymous conversion report is generated, the report can contain 2 additional fields: the nonce and the publisher_signature.

We suggest that the browser requests the ad click source's public key at this point (without cookies), checks the signature, and only sends the report if the signature is valid. This makes it much harder for the ad click source site to personalize the signature for tracking purposes. And if the ad click source site is able to personalize the signature and serve up the personalized public key at this later point, it already has the means of cross-site tracking.

> […]
>
> All of these ideas are drawn from our "Private Fraud Prevention" repo: https://github.com/siyengar/private-fraud-prevention

We are considering blinded signatures at both ends — "create blinded cryptographic proof of trustworthiness at the time of the ad click, the conversion, or both" — i.e. both at the time of the ad click and at the time of conversion. That way, the resulting report can be validated for both those events. This becomes extra valuable if we do double reporting or optional reporting as explored in #31.

@csharrison

cc @dvorak42 for visibility in ideas for integrating blind signatures with conversion measurement.

One additional point I'd like to make is that many publishers / advertisers may not be sophisticated enough to run their own signature endpoints, so from our perspective it would be important to have the capability to delegate the signing operation to a third party.

@benjaminsavage

While "publisher" works in concrete examples, we try to avoid it in general descriptions.

I'm happy to use whatever term you prefer. What do you suggest?

> The publisher never generates more than one signature per ad-click
>
> This sounds like defense against some threat. Could you elaborate, please?

Certainly.

Putting myself in the shoes of the attacker, if I am going to successfully forge conversion events, and I need a valid signature to use in each fake conversion report, I will want to extract these signatures as efficiently as possible.

If the only way to get a valid signature from the website that displayed the ad is to generate a click on an ad shown there, I will need to scale that operation up somehow. The simplest way is to just click one time and try to generate a large number of signatures out of that one click. If we can cut off this path to scale, that will force the attacker to click on many ads (one click per signature). That will make their fake accounts used to harvest signatures stand out, making them easier to take down.

Perhaps a simpler way to think about it is this:

The maximum number of fake conversions that can be generated is bounded by the number of signatures given out. For this reason, it is very important to control the rate at which signatures are given out.

> If these ad clicks were restricted to first-party links, we'd already be in a good place since the out-of-band request could carry cookies. But we've received the request for third-party serving of ad links, in which case cookies are unlikely in browsers with anti-tracking measures.

There are certainly publishers (like Facebook) who serve their own ads. We could certainly use first-party cookies to validate that this browser did actually click the ad in question, and to validate that we have not already given out a signature for this click.

But you are absolutely correct that this is a rather unusual case. The vast majority of publishers rely on ad networks to serve ads for them.

The Chrome team is proposing a partitioning of cookies in the future, once third-party cookies are removed; I wonder if we can use these partitioned cookies.

If the ad shown on news.example was served by ad-tech.example, it seems that ad-tech.example could simply drop a first-party cookie, scoped to news.example and log that along with any ad-clicks.

We just need to find some way of asking ad-tech.example to generate a signature related to this scoped cookie. I'm confident we can find some technical solution that isn't useful for cross-site tracking.

> We suggest that the browser requests the ad click source's public key at this point (without cookies), checks the signature, and only sends the report if the signature is valid. This makes it much harder for the ad click source site to personalize the signature for tracking purposes. And if the ad click source site is able to personalize the signature and serve up the personalized public key at this later point, it already has the means of cross-site tracking.

I think we are saying the same thing - which is great!

Only one minor difference: I am suggesting that the signature which is generated is somehow bound to the adDestination and adCampaignID (which are already present in the eventual conversion report, so no new information is disclosed).

I think we can do this by requesting a public key just as you said, without cookies, and that we can check the signature is valid afterwards. I just want to fetch a signature for that specific combination of adDestination and adCampaignID. This is all about making life harder for fraudsters.

As explained later, if we pursue this route, clicking on an ad and obtaining a signature will only allow you to generate a fake conversion for the associated ad. Since the fraudster does not control which ad is served, this provides an extremely effective deterrent.

If the signature can be used to fake a conversion for any ad campaign, life is now much easier for fraudsters. They can click on any ad and use it to generate a fake conversion for whichever advertiser they please.

So just to recap: the signature should not be personalized; the same public key should apply to all people who click on ads directing to the same destination with the same campaignID.

> We are considering blinded signatures at both ends — "create blinded cryptographic proof of trustworthiness at the time of the ad click, the conversion, or both" — i.e. both at the time of the ad click and at the time of conversion. That way, the resulting report can be validated for both those events.

I'm glad you're exploring this. I think it would be incredibly valuable if we could find a way to pull this off. Unfortunately, like @csharrison said:

> many publishers / advertisers may not be sophisticated enough to run their own signature endpoints, so from our perspective it would be important to have the capability to delegate the signing operation to a third party

I totally agree with him. I'm not exactly sure what you have in mind here in terms of "delegation", but if we are talking about something along the lines of: "Here is some open-source code. Please deploy it on your web-server, set up to respond to requests to this endpoint. Please integrate it with your conversion firing logic via a local database"... that would be far too heavy of a lift.

We have found it very difficult to convince advertisers to prioritize the engineering work required to do much simpler things! I don't have any solutions for this, but I would be happy to brainstorm together on how we might solve this challenge.

@johnwilander
Collaborator Author

While "publisher" works in concrete examples, we try to avoid it in general descriptions.

I'm happy to use whatever term you prefer. What do you suggest?

Click source or ad click source is what we use in the proposal. I think that term is free of bias and to the point. I.e., any click source will be able to use this technology.

> The publisher never generates more than one signature per ad-click
>
> This sounds like defense against some threat. Could you elaborate, please?
>
> Certainly.
>
> Putting myself in the shoes of the attacker, if I am going to successfully forge conversion events, and I need a valid signature to use in each fake conversion report, I will want to extract these signatures as efficiently as possible.

Aha! You're viewing fraudsters as the attackers while I mostly view trackers as the attackers. Both are useful. Let's make sure to be specific on what attack perspective we're using.

> If the only way to get a valid signature from the website that displayed the ad is to generate a click on an ad shown there, I will need to scale that operation up somehow. The simplest way is to just click one time and try to generate a large number of signatures out of that one click. If we can cut off this path to scale, that will force the attacker to click on many ads (one click per signature). That will make their fake accounts used to harvest signatures stand out, making them easier to take down.

Got it. Make it hard to hoard valid signatures.

> Perhaps a simpler way to think about it is this:
>
> The maximum number of fake conversions that can be generated is bounded by the number of signatures given out. For this reason, it is very important to control the rate at which signatures are given out.
>
> If these ad clicks were restricted to first-party links, we'd already be in a good place since the out-of-band request could carry cookies. But we've received the request for third-party serving of ad links, in which case cookies are unlikely in browsers with anti-tracking measures.
>
> There are certainly publishers (like Facebook) who serve their own ads. We could certainly use first-party cookies to validate that this browser did actually click the ad in question, and to validate that we have not already given out a signature for this click.
>
> But you are absolutely correct that this is a rather unusual case. The vast majority of publishers rely on ad networks to serve ads for them.
>
> The Chrome team is proposing a partitioning of cookies in the future, once third-party cookies are removed; I wonder if we can use these partitioned cookies.

It's unlikely that Safari will go back to partitioned cookies since we already shipped and later removed them. I didn't know Chrome's plan of record was to allow third-party cookies in the form of partitioned cookies and I don't know what the other browsers' plans are.

I don't think PCM should mandate other means of storage to be available or work in a specific way. Whatever is needed should be specified here. Perhaps we could use a nonce that is provided in the link metadata and submitted as part of the out-of-band signature request. All we're trying to achieve is tying a click and a request together. No storage should be needed for that.

> If the ad shown on news.example was served by ad-tech.example, it seems that ad-tech.example could simply drop a first-party cookie, scoped to news.example and log that along with any ad-clicks.

I don't think we should rely on third parties storing things in the first-party space.

Regardless, as discussed in #7, supporting metadata in links served by third parties does not imply that the eventual conversion report will go to the third party. Reports will go to first parties. It's their users.

> […]
>
> We are considering blinded signatures at both ends — "create blinded cryptographic proof of trustworthiness at the time of the ad click, the conversion, or both" — i.e. both at the time of the ad click and at the time of conversion. That way, the resulting report can be validated for both those events.
>
> I'm glad you're exploring this. I think it would be incredibly valuable if we could find a way to pull this off. Unfortunately, like @csharrison said:
>
> > many publishers / advertisers may not be sophisticated enough to run their own signature endpoints, so from our perspective it would be important to have the capability to delegate the signing operation to a third party
>
> I totally agree with him. I'm not exactly sure what you have in mind here in terms of "delegation", but if we are talking about something along the lines of: "Here is some open-source code. Please deploy it on your web-server, set up to respond to requests to this endpoint. Please integrate it with your conversion firing logic via a local database"... that would be far too heavy of a lift.

I don't lack confidence in what site owners can do, and I don't have a "this can only work through third parties" mentality. Things can change, and empowering first-party websites is a net benefit for both the web platform and users. Therefore, I'm mostly interested in finding ways for this to work with the first parties in control of data and data use.

> We have found it very difficult to convince advertisers to prioritize the engineering work required to do much simpler things! I don't have any solutions for this, but I would be happy to brainstorm together on how we might solve this challenge.

I think those are two different conversations. One is about a major website serving ads and the other is about browsers and the web platform. We're building for the future and the future will be different.

The ad click source is the first party site where the click happens, not the potential third party serving the ad and the link. We need to think about what this means for these signatures if we allow third parties to serve the links.

@benjaminsavage

> Click source or ad click source is what we use in the proposal. I think that term is free of bias and to the point. I.e., any click source will be able to use this technology.

OK, I'll refer to it as the "ad click source" then =)

> Aha! You're viewing fraudsters as the attackers while I mostly view trackers as the attackers. Both are useful. Let's make sure to be specific on what attack perspective we're using.

Good call out! I agree both perspectives are useful. I'll make sure to clarify which one I'm using.

> Got it. Make it hard to hoard valid signatures.

Exactly!

> It's unlikely that Safari will go back to partitioned cookies since we already shipped and later removed them. I didn't know Chrome's plan of record was to allow third-party cookies in the form of partitioned cookies and I don't know what the other browsers' plans are.
>
> I don't think PCM should mandate other means of storage to be available or work in a specific way. Whatever is needed should be specified here. Perhaps we could use a nonce that is provided in the link metadata and submitted as part of the out-of-band signature request. All we're trying to achieve is tying a click and a request together. No storage should be needed for that.

I like this suggestion. It's definitely a lot simpler and cleaner, and it achieves the same goal of ensuring the ad click source can validate that:

  1. The ad was actually clicked
  2. Only one signature is given out per click

I guess we are talking about something like this pseudocode:

<a adCampaignId="53" adDestination="megastore.com" nonce="1234567890" />

The ad server would generate a unique nonce per ad tag and maintain a record of the nonces of ads that were clicked. The browser can immediately send this "nonce" back to the ad server to request a signature in the "out of band" request, and then throw it away, as it is no longer necessary and will certainly not feature in the anonymous conversion report.

Is this what you had in mind? I think this would work just fine, it's "a lot simpler and cleaner" and as you say, it doesn't require any storage.
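
If it helps to make that concrete, here is a rough, illustrative sketch of the ad-server side (in-memory bookkeeping and all function names are assumptions for the sketch; the blind-signing step reuses the toy RSA key from the earlier sketch):

```python
# Illustrative only: the ad server issues a unique nonce per ad tag, records
# which nonces were actually clicked, and hands out at most one signature per
# clicked nonce.
import secrets

issued = {}  # nonce -> {"adCampaignID", "adDestination", "clicked", "signed"}

def issue_ad_tag(ad_campaign_id, ad_destination):
    nonce = secrets.token_urlsafe(16)
    issued[nonce] = {"adCampaignID": ad_campaign_id,
                     "adDestination": ad_destination,
                     "clicked": False, "signed": False}
    return nonce  # embedded in the ad's link markup, as in the <a> example above

def record_click(nonce):
    if nonce in issued:
        issued[nonce]["clicked"] = True

def sign_blinded(blinded_nonce, ad_campaign_id, ad_destination):
    # Placeholder blind-signing step using the toy RSA key from the earlier
    # sketch; a real server would pick the key pair for this campaign/destination.
    d, n = 2753, 3233
    return pow(blinded_nonce, d, n)

def handle_signature_request(nonce, blinded_nonce):
    entry = issued.get(nonce)
    # Refuse unless this nonce exists, was actually clicked, and is not yet signed.
    if entry is None or not entry["clicked"] or entry["signed"]:
        return None
    entry["signed"] = True
    return sign_blinded(blinded_nonce, entry["adCampaignID"], entry["adDestination"])
```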

@johnwilander
Collaborator Author

> […]
>
> <a adCampaignId="53" adDestination="megastore.com" nonce="1234567890" />
>
> The ad server would generate a unique nonce per ad tag and maintain a record of the nonces of ads that were clicked. The browser can immediately send this "nonce" back to the ad server to request a signature in the "out of band" request, and then throw it away, as it is no longer necessary and will certainly not feature in the anonymous conversion report.
>
> Is this what you had in mind? I think this would work just fine, it's "a lot simpler and cleaner" and as you say, it doesn't require any storage.

Yes. We just need to come up with a name for the attribute that's precise, short, and not perceived as a tracking ID of sorts. 🙂 Then we'd have to think through the privacy and fraud implications of it.

Would it withstand a fraudster's automation? Do you consider malicious browsers in your threat model? A fraudster can download an open source engine, change it, and run from there.

@benjaminsavage

> Yes. We just need to come up with a name for the attribute that's precise, short, and not perceived as a tracking ID of sorts. 🙂 Then we'd have to think through the privacy and fraud implications of it.

Sounds good.

I don't see any tracking risk here, so long as the browser disposes of this nonce immediately after sending the "out of band" request. It's just being sent back to the same website that generated it immediately after a first-party interaction.

I'm not picky about the name =).

> Would it withstand a fraudster's automation? Do you consider malicious browsers in your threat model? A fraudster can download an open source engine, change it, and run from there.

Excellent question. Yes, I absolutely consider malicious browsers in my threat model.

Even more concerning (and common) are malicious browser extensions. We see a lot of fraudulent activity coming from malicious Chrome extensions in particular.

Thinking through the fraudster's likely attack scenario, I assume they will first try to distribute malicious browser extensions that scrape the page content to extract legitimate values for "nonce". They could attempt to forge "out of band" requests to try to hoard signatures. Fortunately, if those ads had not been clicked we could refuse to return a signature. That way they would also have to simulate an ad click. This would at least limit the rate at which signatures could be generated. We could try to detect compromised accounts by looking for people who seem to click ads way more frequently than the norm, but this is just a heuristic. We could serve honeypot ads, and use this to figure out which users have compromised browsers, but this would not catch all of the abuse.

Ideally I would like to make it impossible for a browser extension to perform this type of an attack. Do you think we could hide this "nonce" from browser extensions somehow?

If we could make it impossible to harvest "nonce" values with browser extensions, the next thing the fraudsters would try would be to just randomly generate values for "nonce" and hope to occasionally get one right. We can mitigate this attack by using really large values for "nonce", say 64-bit values (or more). If the ad-click-source generates values for "nonce" randomly, this would make it incredibly unlikely that the fraudster would ever generate a valid value, much less one that had actually been clicked, and for which a signature had not yet been generated. If the "out of band" request were to come within a few seconds of a click, that would further shrink the window of opportunity.
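
As a minimal illustration (assuming the nonce is simply a random integer), a 64-bit value already makes blind guessing hopeless:

```python
# Sketch: a 64-bit random nonce gives a guess space of 2**64 (about 1.8e19),
# so randomly guessing an issued, clicked, not-yet-signed nonce is hopeless.
import secrets

nonce = secrets.randbits(64)
print(nonce, 2 ** 64)
```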

Assuming we successfully mitigate the "random guess" attack, then the fraudsters would move on to malicious browsers.

The most difficult thing about such an attack is scale. How can you deploy a malicious browser across the phones or computers of thousands of people?

Android applications tend to be the main vector of attack we see for distributing Malware these days. We should assume fraudsters will distribute malicious Android applications under the guise of games or utility apps. These apps will have forked browser functionality within them. They will have to load web-pages (likely in an invisible web-view), extract values for "nonce", occasionally simulate a click (but not too often!), then issue "out of band" requests and hoard the signatures (likely sending them back home to a command and control center).

This is a challenging threat to combat. It's easier for websites with login (like Facebook) to prevent this, but could be a significant threat to news sites that do not require login.

Here are a few ideas that come to mind regarding means of mitigating this threat:

  1. Mobile App Stores need to step up their game in terms of app review. Too many malicious apps make it onto app stores, the Google Play store in particular.
  2. We should develop a highly effective industry-level collaboration to report malware to app stores so that they can enforce quickly. Fraud hurts the entire industry. Collaborating together to flag risks seems like a great idea to me.
  3. Let's research ways of cryptographically signing requests from real browsers, so that we can just differentiate between "This request came from an authentic Safari browser" and "This request did not come from an authentic Safari browser". We need to find a way of doing this which cannot be used as a cross-site tracking vector. I'm not exactly sure how to do this, but such technology would immediately cut the legs out from a vast category of attacks of all sorts. It would make life so much easier for those teams at Facebook that work to combat abuse and fraud and bots.

@benjaminsavage

> I don't lack confidence in what site owners can do, and I don't have a "this can only work through third parties" mentality. Things can change, and empowering first-party websites is a net benefit for both the web platform and users. Therefore, I'm mostly interested in finding ways for this to work with the first parties in control of data and data use.

I love your optimism!

OK, let's leave behind the question of "can we convince very large numbers of website owners to make changes" for a moment, and just discuss how we would theoretically want this to function.

I think it's very similar to the click signing. The only difference is that the conversion-source is signing a "conversion" instead of a "click".

Walking through my "fraudster threat model" again, here's how I expect this to be attacked:

  1. Fraudsters will try to hoard as many signatures as possible. We need to apply the same approach as we did on the click-signing side to ensure that at most one signature is given out per actual conversion. We can apply the same approach we discussed above, utilizing some kind of an attribute on the conversion which can be sent back to the conversion-source to uniquely identify which conversion is being signed. This will allow the conversion-source to ensure it only gives out one signature per real conversion.
  2. Fraudsters will try to associate conversions from one browser with clicks from a different one. This will help them scale up operations. To make this substantially harder, I recommend we ensure that in the final anonymous conversion report the same nonce is signed by the ad-click-source as the conversion-source. We can do this without enabling cross-site tracking by using different blinding factors to blind the nonce each time.
  3. Fraudsters will try "upgrade attacks". Example: The fraudster generates a legitimate "Add to cart" event (perhaps indicated by an "Ad Attribution Data" value of 1), and obtains a legitimate signature for it. Then they send a fraudulent conversion report where they lie and mis-report it as a "High value purchase" (perhaps indicated by an "Ad Attribution Data" value of 63). For this reason, it's important that the conversion-source doesn't just sign the nonce, but that their signature is somehow bound to the true value of the "Ad Attribution Data". The simplest way is for the conversion-source to just maintain 64 public/private keypairs, one per value (sketched below). Perhaps there is a more elegant solution out there, but even this rather brute-force approach would work. The browser could just download the corresponding public key and validate that the signature is legit. This will be even less of a problem if we go forward with the proposal in Let browsers have different privacy settings #11, re-allocating the 12 bits to an 8,4 or even 9,3 allocation. At 3 bits, the conversion-source only needs to expose 8 public/private keypairs.
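
A minimal sketch of that brute-force idea, using one RSA key pair per attribution value and raw (textbook) blind signing purely for illustration; a real design would add hashing/padding and vetted parameters. The 8 values correspond to the 3-bit case mentioned above, and all function names are made up for the sketch.

```python
# Sketch: one key pair per "Ad Attribution Data" value, so a signature obtained
# for one value (e.g. "Add to cart" = 1) cannot be replayed as a signature for
# another value. Textbook blind RSA, illustrative only.
import secrets
from math import gcd
from cryptography.hazmat.primitives.asymmetric import rsa

def make_keypair():
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    pub = key.public_key().public_numbers()
    return {"n": pub.n, "e": pub.e, "d": key.private_numbers().d}

# Conversion source: one key pair per attribution value (8 values = 3 bits).
keypairs = {value: make_keypair() for value in range(8)}

def blind(nonce, value):
    k = keypairs[value]
    while True:
        r = secrets.randbelow(k["n"] - 2) + 2
        if gcd(r, k["n"]) == 1:
            break
    return (nonce * pow(r, k["e"], k["n"])) % k["n"], r

def sign_conversion(blinded_nonce, value):
    # The conversion source signs with the key pair for the *true* value.
    k = keypairs[value]
    return pow(blinded_nonce, k["d"], k["n"])

def unblind(blind_sig, r, value):
    k = keypairs[value]
    return (blind_sig * pow(r, -1, k["n"])) % k["n"]

def verify_report(nonce, signature, value):
    # Verification uses the public key for the value claimed in the report;
    # a different value's key will not verify the signature.
    k = keypairs[value]
    return pow(signature, k["e"], k["n"]) == nonce

nonce = secrets.randbits(64)
blinded, r = blind(nonce, 1)                             # real "Add to cart"
signature = unblind(sign_conversion(blinded, 1), r, 1)
print(verify_report(nonce, signature, 1))                # True
print(verify_report(nonce, signature, 7))                # False: no "upgrade"
```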

There is a similar discussion going on between @csharrison and me on the Chrome proposal: WICG/attribution-reporting-api#13

The WebKit proposal has the benefit of not adding random noise to the "Ad Attribution Data". This means that we can sign over the conversion value. The downside of the Chrome proposal is that a randomly noised conversion is essentially indistinguishable from conversion fraud.

@TanviHacks TanviHacks added the agenda+F2F Request to add this issue or PR to the agenda for our upcoming F2F. label Apr 27, 2020
@hober hober added the fraud prevention Related to fraud prevention label Apr 30, 2020
@hober hober changed the title Fraud Prevention ☂️ Select a Fraud Prevention mechanism Apr 30, 2020
@hober hober changed the title ☂️ Select a Fraud Prevention mechanism ☂️ Select a fraud prevention mechanism Apr 30, 2020
@laughinghan

laughinghan commented May 16, 2020

> Privacy Pass [...] which is currently not a standard. What is the plan in terms of standardization?

If it helps, both Privacy Pass and the underlying interactive cryptographic protocol, VOPRFs (verifiable oblivious pseudo-random functions), are undergoing IETF standardization.

@hober hober removed the agenda+F2F Request to add this issue or PR to the agenda for our upcoming F2F. label Jul 6, 2020
@chris-wood

@johnwilander perhaps I missed it, but is there a reason why blind signatures are needed here instead of a VOPRF (Trust Token or Privacy Pass)? The sketch above does not seem to rely on signatures being publicly verifiable.

@johnwilander
Collaborator Author

> @johnwilander perhaps I missed it, but is there a reason why blind signatures are needed here instead of a VOPRF (Trust Token or Privacy Pass)? The sketch above does not seem to rely on signatures being publicly verifiable.

We'll share more about our proposal when we have things ready.

@johnwilander
Collaborator Author

See #41 (comment).
