Creative Rendering K Anon Tuple: Account for Bidding Logic Deployments With Less Disruption #679

Open
thegreatfatzby opened this issue Jun 28, 2023 · 5 comments

thegreatfatzby commented Jun 28, 2023

Background
From the main doc, K-anon for URL rendering is enforced not just on the creative but on the tuple (owner, biddingLogicUrl, creative). I'm guessing "creative" resolves to the renderURL; can someone confirm that? (Even if it were ad_render_id, the issue I'm describing would still apply.)

It's hard to say how frequently DSPs will want to deploy their bidding logic; the frequency will likely depend on how much logic they decide to put into the bidding and reporting functions. One possibility we're exploring is to keep the bidding function itself pretty "dumb", but I'm not sure we'll succeed in that. At this point I haven't thought of much logic to put into the buyer reporting function. For seller reporting I think there may be some price resolution, but that doesn't affect how often the biddingLogicUrl changes.

But however frequently they deploy, it's possible, likely even, that they will want some versioning, with multiple versions in prod or across prod/staging/whatever, to ensure compatibility in cases like:

  • A change to IG.userBiddingSignals structure is compatible with a new version of generateBid.
  • A contextual auction signal is compatible with the particular version.
  • In the case of an SSP and DSP using some protocol (either internally or through something like OpenRTB) getting the IG to the version it needs.
  • There's also A/B testing. I know there are other levers for that which I haven't fully grokked yet, so I'll hold off on that for now.

Deployment setups could include one where the version doesn't change often and is part of an app config, so it does not change on every DSP Contextual Bidder deployment. But it could also be that you want your "Contextual Bidder App" to line up with your "PaAPI Bidding App", and so want a new version on every deploy. That isn't necessary, but I can see reasons teams might want to do it, in which case unrelated deploys would result in changes to the tuple.

Also let's say a deployment can be anywhere from minutes to hours.

Challenge
The challenge I'm seeing here is that a deployment that resulted in a new biddingLogicUrl would result in a reset of K for all creatives coming from that DSP, including ones with no change, meaning each Creative now has to get back to K again. With K currently at 50, this means the first 49 winning bids for that creative will be bypassed.
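
To make the reset concrete, here is a minimal sketch. It assumes, as the paragraph above does, that K is counted per tuple and that a creative can render only once its tuple has been seen K times; the real browser and server mechanism is more involved. `KAnonCounter`, `blocked_wins`, and the `dsp.example` URLs are hypothetical names used only for illustration.

```python
from collections import Counter

K = 50  # threshold quoted in this issue


class KAnonCounter:
    """Toy model: one count per (owner, biddingLogicUrl, renderUrl) tuple."""

    def __init__(self, k: int = K) -> None:
        self.k = k
        self.counts: Counter = Counter()

    def record_win(self, owner: str, bidding_logic_url: str, render_url: str) -> bool:
        """Record a winning bid; return True if the creative may render."""
        key = (owner, bidding_logic_url, render_url)
        self.counts[key] += 1
        return self.counts[key] >= self.k


def blocked_wins(counter: KAnonCounter, owner: str, bidding_logic_url: str,
                 render_url: str, wins: int) -> int:
    """Count how many of `wins` consecutive wins fall below the threshold."""
    return sum(
        not counter.record_win(owner, bidding_logic_url, render_url)
        for _ in range(wins)
    )


counter = KAnonCounter()
owner = "https://dsp.example"
ad = "https://dsp.example/ads/1"

# 100 wins under the old script URL: the first 49 are blocked.
old = blocked_wins(counter, owner, "https://dsp.example/bidding-1.0", ad, 100)
# Same creative, new script URL: the tuple is new, so counting starts over.
new = blocked_wins(counter, owner, "https://dsp.example/bidding-1.1", ad, 100)
print(old, new)  # prints: 49 49
```

The unchanged creative pays the 49-win penalty twice, purely because the script URL in its tuple changed.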

To get a sense of the impact, I took a simple, imperfect proxy: the number of distinct creatives in an hour multiplied by 49 (the number of wins each creative would have to go through before displaying again), divided by the number of imps in that hour. The result varies between roughly 2.8% and 3.9% (see the query below, with some numbers "redacted" that I can share if need be).

Again, this isn't precise but is just meant to be directionally indicative of the issue.
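
The proxy is simple arithmetic, sketched below for a single hour. The function name and the input numbers are hypothetical (the real counts are not shared in this issue); only the formula comes from the query in the Data section.

```python
def share_of_imps_below_k(distinct_creatives: int, imps: int, k: int = 50) -> float:
    """Proxy from this issue: each creative loses its first k-1 wins after a reset."""
    if imps <= 0:
        raise ValueError("imps must be positive")
    return distinct_creatives * (k - 1) / imps


# Hypothetical hour: 1,000 distinct creatives across 1,500,000 impressions.
share = share_of_imps_below_k(1_000, 1_500_000)
print(f"{share:.4f}")  # prints: 0.0327
```

A creative that wins fewer than 49 times in the hour cannot actually lose 49 impressions, which is one reason the figure is only directional.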

Finally, I can imagine a fun theoretical that is unlikely in practice: if a new biddingLogicUrl version were instantly deployed across all Interest Groups (i.e. dsp.com/bidding-1.0 --> dsp.com/bidding-1.1), then nothing would serve, as nothing would hit K for a while. (In this case you could just deploy to the same URL, but you might not want to, or be able to, because maybe there's a prod and a staging version.)

"Proposal"
I'm not sure exactly what to do here. I was initially going to say: have the K-anon process allow a version parameter in the biddingLogicUrl and constrain that parameter to be an integer less than some bound, but I can certainly see how that opens up attacks. I don't know if we could instead do k-anon on the biddingLogicUrl by itself and combine that with a separate check on the (owner, renderUrl) tuple, so that both have to be k-anon, but independently.
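
As a rough sketch of the second idea, the check below keeps two independent counts instead of one joint tuple. This is only an illustration of the shape of the idea: the counters, `may_render`, and the example URLs are hypothetical, and whether independent checks preserve the privacy goal is exactly the open question.

```python
from collections import Counter

K = 50  # threshold quoted in this issue

script_counts: Counter = Counter()    # keyed by biddingLogicUrl alone
creative_counts: Counter = Counter()  # keyed by (owner, renderUrl)


def may_render(owner: str, bidding_logic_url: str, render_url: str) -> bool:
    """Both keys must reach K, but each is counted independently."""
    return (
        script_counts[bidding_logic_url] >= K
        and creative_counts[(owner, render_url)] >= K
    )


owner = "https://dsp.example"
ad = "https://dsp.example/ads/1"
creative_counts[(owner, ad)] = 1_000                      # long-running creative
script_counts["https://dsp.example/bidding-1.0"] = 1_000  # old script version

# A freshly deployed script has to reach K once, across all creatives combined.
before = may_render(owner, "https://dsp.example/bidding-1.1", ad)
script_counts["https://dsp.example/bidding-1.1"] = K
after = may_render(owner, "https://dsp.example/bidding-1.1", ad)
print(before, after)  # prints: False True
```

Under this model a deploy costs one warm-up for the new script URL rather than one warm-up per creative.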

So, I've been thinking about this way too long, and maybe I'm missing something obvious and will face palm. For now I'll just say "I'd like to see a way for deployments of bidding logic to be less disruptive to creative K measurement", and as a starting point, "let's just add a small'ish integer query param that will be ignored for the creative K tuple".
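
A minimal sketch of what "ignored for the creative K tuple" could mean: normalize the script URL before it goes into the tuple. The parameter name `v`, the bound of 16, and the function name are all assumptions made for illustration; nothing like this exists in the API as described in this thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

VERSION_PARAM = "v"  # hypothetical parameter name
MAX_VERSION = 16     # hypothetical bound on the "small'ish" integer


def k_anon_script_url(bidding_logic_url: str) -> str:
    """Return the URL as it would enter the k-anon tuple, version param dropped."""
    parts = urlsplit(bidding_logic_url)
    kept = []
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        if name == VERSION_PARAM:
            # Reject anything that is not a small non-negative integer, so the
            # ignored parameter cannot carry arbitrary per-user data.
            if not (value.isascii() and value.isdigit() and int(value) < MAX_VERSION):
                raise ValueError(f"invalid version parameter: {value!r}")
            continue
        kept.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(kept)))


a = k_anon_script_url("https://dsp.example/bidding.js?v=3")
b = k_anon_script_url("https://dsp.example/bidding.js?v=4")
print(a == b)  # prints: True
```

Even with the bound, the version still distinguishes up to `MAX_VERSION` groups of users, which is the attack surface mentioned above.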

Data

select ymdh as Hour,
       (count(distinct creative_id) * 49) / sum(imps) as PercentImpsBelowKOnDeploy
from SOMETABLE
where ymdh >= sysdate() - interval '30 hours'
group by 1
order by 1 desc;
        Hour         | PercentImpsBelowKOnDeploy 
---------------------+---------------------------
 2023-06-28 12:00:00 |      0.030325694428750557
 2023-06-28 11:00:00 |      0.032238033710425202
 2023-06-28 10:00:00 |      0.034657536625179512
 2023-06-28 09:00:00 |      0.036066384842779339
 2023-06-28 08:00:00 |      0.034605923239306866
 2023-06-28 07:00:00 |      0.032344153791927879
 2023-06-28 06:00:00 |      0.033789860221664039
 2023-06-28 05:00:00 |      0.035021230571789399
 2023-06-28 04:00:00 |      0.034278940798184122
 2023-06-28 03:00:00 |      0.036455398260428181
 2023-06-28 02:00:00 |      0.036733613036232399
 2023-06-28 01:00:00 |      0.035984223304680108
 2023-06-28 00:00:00 |      0.037938807373984773
 2023-06-27 23:00:00 |      0.038521965067499511
 2023-06-27 22:00:00 |      0.037405716138151708
 2023-06-27 21:00:00 |      0.032852726986401520
 2023-06-27 20:00:00 |      0.028944437098990982
 2023-06-27 19:00:00 |      0.028187273766684694
 2023-06-27 18:00:00 |      0.028512300908509566
 2023-06-27 17:00:00 |      0.028858935464712514
 2023-06-27 16:00:00 |      0.028110001272252524
 2023-06-27 15:00:00 |      0.027841584374986311
 2023-06-27 14:00:00 |      0.027565488743761911
 2023-06-27 13:00:00 |      0.028340132613049315
 2023-06-27 12:00:00 |      0.029739302711551697
 2023-06-27 11:00:00 |      0.031927725961592038
 2023-06-27 10:00:00 |      0.033653095688971993
(27 rows)

@thegreatfatzby
Contributor Author

Minor follow up that I believe I've guessed the answer to:

  • Question: actually, why isn't the biddingWasmHelperURL part of the K-anon tuple for creative rendering?
  • Guess: because it cannot be used in the reporting functions which is where an easy privacy leak with 1-1 WASM <-> consumer functions could let you cheat?

@michaelkleber
Collaborator

You're correct that the k-anon scope is (owner, biddingLogicURL, renderURL). And yeah I see how this might make updating the logic awkward. It's worth noting that it's only the URL that needs to stay the same — it's perfectly fine to have that URL load a new and different script!

The reason that the biddingLogicURL needs to be part of this k-anon check is that the biddingLogicURL loads the script which contains not only the bidding JS but also the reporting JS. Basically, everything non-1p that feeds into reportWin() is what needs to be k-anonymous... and the JavaScript code itself is one of those things. This also explains why it's OK for the contents of the script to change over time but not the URL: the URL is the thing that gets carried around from site to site, while the script itself is fetched anew over the network. So long as you don't have a Michael-Kleber-specific biddingLogicURL, you can't have a Michael-Kleber-specific event-level report.

Does this ability to change the script contents fix your concern about roll-out? It seems like owners of 3rd-party scripts are used to deploying new script contents under a fixed URL, since that's obviously essential if you hand your URL to a website to include in their site HTML somewhere you don't control.

@thegreatfatzby
Contributor Author

Hey Fwend,

Understood that we can have the URL point to a different file, overwrite the previous file, or some such. I also get that the attack vector comes mostly, at least as far as I can tell, from the reporting function.

As to whether it fixes my concern: the short answer is no. The better and longer answer is that it does not in theory, and I think it will constrain or impact deployments in important ways. I suspect that most of the time when an immediate "swap" is done for a script, it's either an asset or something very simple. Anything with business-critical logic (even UX side, let alone auction side, let alone something like pricing logic) would typically get a phased rollout to mitigate risk, even if that doesn't mean a new version on every deploy. I do see this being an issue, but without more discussion internally I don't feel I can say right now that you must fix this immediately.

@thegreatfatzby
Contributor Author

thegreatfatzby commented Sep 3, 2023

Hey @michaelkleber, sorry, a silly thing I never wrote down here: if we decoupled the bidding and reporting functions, or at least allowed that as an option, wouldn't updating the bid function (or, to be more accurate, versioning it) be much less problematic from a privacy attack vector standpoint? (Maybe not quite zero.) Could we even remove it from the k-tuple at that point?

@michaelkleber
Collaborator

Yes, I think you're exactly right.
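
A sketch of what the decoupled version might look like, assuming a separate reporting script URL could be declared on the interest group. The `reporting_logic_url` field and the `k_anon_key` function are hypothetical; the thread only agrees on the idea, not on any API shape.

```python
from typing import NamedTuple, Optional, Tuple


class InterestGroup(NamedTuple):
    owner: str
    bidding_logic_url: str
    reporting_logic_url: Optional[str] = None  # hypothetical field


def k_anon_key(ig: InterestGroup, render_url: str) -> Tuple[str, str, str]:
    """Key the tuple on whichever script contains the reporting code."""
    if ig.reporting_logic_url is not None:
        # Decoupled: the bidding script URL no longer affects the tuple.
        return (ig.owner, ig.reporting_logic_url, render_url)
    # Coupled (as described in this thread): the bidding script carries the
    # reporting JS, so its URL must stay in the tuple.
    return (ig.owner, ig.bidding_logic_url, render_url)


ad = "https://dsp.example/ads/1"
v1 = InterestGroup("https://dsp.example", "https://dsp.example/bidding-1.0",
                   "https://dsp.example/reporting.js")
v2 = v1._replace(bidding_logic_url="https://dsp.example/bidding-1.1")
print(k_anon_key(v1, ad) == k_anon_key(v2, ad))  # prints: True
```

With the reporting URL held fixed, versioning the bidding script leaves every creative's tuple, and therefore its K count, untouched.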
