forDebuggingOnly availability #632
Comments
Hi @jonasz, yes, they are supported in both Mode A and Mode B.
@jonasz We are actually thinking more about the privacy risks of the two parts of the forDebuggingOnly APIs. We need to think about this further; let us get back to you soon, hopefully next week.
Do you think some sampled mode could be acceptable in the long term? Something small enough that it doesn't allow any user identification, like 1% of forDebuggingOnly.reportWin & forDebuggingOnly.reportLoss?
@ajvelasquezgoog friendly ping, any updates on this issue?
We thank everyone interested for your patience in getting updates on this matter. We have been working closely with, and collecting feedback from, stakeholders over the last several weeks, and have examined the efforts required to adapt to the full removal of these functions by the 3PCD deadline. The incremental feedback that we have received over the last few months on this plan can be summarized as follows:
Given that, we think there is a path to continue supporting these use cases with a level of fidelity that will be acceptable, and that also continues to meet our privacy goals. In essence, the proposal entails the introduction of 3 Chrome-controlled variables that will modify the current behavior of the forDebuggingOnly functions:

New Variable 1: Sampling Rate. Denotes how often a call to the forDebuggingOnly functions actually results in a report being sent.

New Variable 2: Cooldown Period. Denotes for how long (in days) a single Chrome client, for a given calling adtech, will keep returning the same FALSE result after the randomizing function determines that the result should be FALSE.

New Variable 3: Lockout Period. Denotes for how long (in days) a single Chrome client, for any and all calling adtechs, will return a FALSE result after it returns TRUE once after running the randomizing function.

In other words, once any one ad tech's call results in a report being sent from a given client, that client will not send debug reports to any ad tech until the lockout period expires.
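As a rough illustration of how these three variables could interact, here is a small JavaScript sketch. This is my own toy model of the described behavior, not Chrome's implementation; all names and parameter values here are placeholders.

```javascript
// Toy model of the proposed downsampling behavior (not Chrome's code).
// All parameter values here are placeholders.
const SAMPLING_RATE = 1 / 1000;   // chance a call actually sends a report
const COOLDOWN_DAYS = 365;        // per-adtech cooldown after a FALSE roll
const LOCKOUT_DAYS = 3 * 365;     // global lockout after a TRUE roll
const DAY_MS = 24 * 60 * 60 * 1000;

class BrowserDebugState {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.cooldownUntil = new Map(); // adtech origin -> timestamp (per adtech)
    this.lockoutUntil = 0;          // single timestamp shared by all adtechs
  }

  // Returns true if a debug report would be sent for this call.
  maybeSendReport(adtechOrigin, rng = Math.random) {
    const t = this.now();
    if (t < this.lockoutUntil) return false; // global lockout: nobody gets reports
    if (t < (this.cooldownUntil.get(adtechOrigin) ?? 0)) return false; // caller cooling down
    if (rng() < SAMPLING_RATE) {
      // TRUE roll: send the report, then lock out *every* adtech on this client.
      this.lockoutUntil = t + LOCKOUT_DAYS * DAY_MS;
      return true;
    }
    // FALSE roll: only the calling adtech enters its cooldown.
    this.cooldownUntil.set(adtechOrigin, t + COOLDOWN_DAYS * DAY_MS);
    return false;
  }
}
```

The key asymmetry in the proposal: a FALSE roll affects only the calling adtech, while a successfully sent report silences the client for everyone.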
Based on these variables, and based on 2 reasonable assumptions we can make:
We calculate that, in legitimate scenarios like the ones detailed in the opening paragraphs of this reply, each participating adtech should be getting between ~4.7K and ~5.4K daily reports. We also want to highlight the protections that we see against malicious scenarios with this approach: a malicious actor that knows that the sample rate is 1/1000 would still be constrained by the cooldown and lockout periods described above. We believe that with this proposal, we can accomplish the goals we set out in our opening paragraphs. Any and all feedback is very appreciated! @jonasz here you go
We are thrilled to see long-term support for these debugging APIs and look forward to the improved observability as we mature our integrations. I wanted to raise two concerns with the details of the above proposal. In practice, we have observed some overhead when enabling debug codepaths due to the additional code profiling and report building. Given the highly latency-sensitive worklet execution environment, we would recommend a mechanism to detect availability of the reporting API before doing that work. Additionally, we are concerned about the shared lockout period, given that the threshold for critical situations may differ across adtechs. If one buyer decides to frequently invoke the API, or unintentionally introduces a major bug which accelerates their call rate to 100%, should this lock out another buyer who needs to debug their own rare exceptions or sudden incidents?
Hmm. There are two different things you might be asking here:
I think 1 would need to be an API that actually performed the die roll, and so triggered the cooling-off period the 999/1000 of the time that 2 returned false. I'm not sold on either one of these, but which are you asking for?
If one buyer spams the API for whatever reason, the worst they could do is lock out 1/1000 of people for everyone else. The cooling-off period that happens 999/1000 times isn't shared state — it is only the 3-year lock-out that would let one ad tech affect another ad tech.
Thank you for the responses. Could you kindly elaborate on why triggering the cooling-off period is necessary when detecting API availability? Is the concern that we would have access to the 1/1000 device-sticky decision to truly send the debug report, and that this may influence or leak out of the internal worklet execution? I believe we're asking for (1), but without tripping the cooldown period, given this (a) effectively incurs the statistical cost of always invoking the API and (b) may be a surprising side effect for all developers. I'm afraid this may incentivize us to always invoke the API if it's detected rather than save it for error states; alternatively, we should just accept the overhead and restrict building the event messages to truly exceptional scenarios. Great point about the difference between global lock-out and per-adtech cooling-off periods; I agree that the interplay of these successfully mitigates the impact of a spammy adtech. One final note: as a user of forDebuggingOnly, the two states are easy to mix up, so intuitive naming would help.
An API of the form "If I asked for a report right now, then would you send it?" would completely eliminate the 1-year cooling-off period, right? After all, nobody would ever call the debugging API if they knew that it would not send a report. Your request would allow circumvention of all the "protections against malicious scenarios" that Alonso described above. Or maybe I'm still misunderstanding what you're asking for?

On the other hand, I don't see any harm in an API of the form "Am I currently cooling down and/or locked out?" That would let you build your debugging requests much less often than without it, even though you would still only have a 1/1000 chance of sending each one that you built. @JensenPaul WDYT?

(Regarding "lockout" vs "cooldown": I personally feel like "lockout" feels more global, like "the door is locked", while "cooldown" seems more caller-specific, as in "you are over-heated, go take a walk and cool down, and then you can come back and join the rest of us." But if other people have opinions on these, or other more intuitive names for the two states, please share!)
Ah, I was assuming that the FALSE die roll was cast once per worklet function execution and there was no way to coordinate a loop-based attack external to these functions. Thinking outside of that box, it does become clear why the check itself requires a cooldown. Any mechanisms to minimize the overhead of the API usage would still be welcome. Overall, the statefulness of this API makes it more difficult to conceptually model an observability framework compared to traditional random sampling. I wonder if there might be issues here with a population more prone to exceptional circumstances gradually dwindling over time due to the cooldown, as well as the true rate of an exception becoming invisible without a fully transparent sampling rate. I also worry about the long-term repercussions of an initial, overly lax threshold of exceptional events, e.g. an adtech accidentally locking themselves out of the API for a year.
Thanks. I think the "Am I currently cooling down and/or locked out?" API would indeed help with minimizing the overhead, we'll explore that.
I agree with this concern, but I haven't come up with any other way to preserve the privacy goals.
Yes, great point, that does seem like it's too easy to accidentally shoot yourself in the foot. Instead of a 1-year cool-down when you don't get a report, I wonder if we could instead have a shorter timeout, like 1 week, that would trigger 90% of the time, and a 1-year timeout the other 10%. Then even if an ad tech shipped a bug that asked everyone to send a debug report all at once, they would recover the ability to debug on 90% of their traffic a week later. (All percentages and time durations subject to change, but at least it's an idea.)
Okay, I've done a little simulating of this anti-footgun two-cooldowns idea — thank you, Google Sheets, for the "Iterative calculation" capability in the File > Settings > Calculation menu. Suppose that when you ask for a debug report in a Chrome instance which is not in the cool-down or lock-out state:

- 1/1000 of the time you get the report, and the browser then enters the 3-year lock-out state (for all ad techs);
- the other 999/1000 of the time you get nothing, and the browser enters a cool-down state for your ad tech only: a 2-week cool-down 90% of the time, and a 1-year cool-down the other 10%.
Which is to say: if you accidentally push into production a bug that asks everyone in the world to send you a debug report, you would regain your ability to do selective debugging on 90% of browsers after two weeks, instead of after one year. In that case, with 100 ad techs spamming the API as much as possible, each one gets around 6500 debug reports per day per billion Chrome instances. If there were only a single ad tech using the API, they would instead get around 20K reports per day per billion, so the global lock-out mechanism cuts the number of reports to about 1/3 of what it would be otherwise. The truth will probably be somewhere between those two extremes.
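For readers without access to the spreadsheet, the single-ad-tech scenario can be reproduced with a small Monte Carlo sketch. This is my own approximation of the 14-day / 1-year / 3-year parameters described above, not the spreadsheet itself, and the function and variable names are made up:

```javascript
// Monte Carlo sketch: one ad tech calls the API on every browser every day,
// under 1/1000 sampling, 14-day/1-year cooldowns, and a 3-year lockout.
const P_REPORT = 1 / 1000;
const LOCKOUT = 3 * 365;            // days of lockout after a sent report
const SHORT_CD = 14, LONG_CD = 365; // the two cooldown lengths, in days
const P_SHORT = 0.9;                // 90% of cooldowns are the short one

function reportsPerDay(numBrowsers, numDays, rng = Math.random) {
  const freeAt = new Float64Array(numBrowsers); // day each browser is free again
  let reports = 0;
  for (let day = 0; day < numDays; day++) {
    for (let b = 0; b < numBrowsers; b++) {
      if (day < freeAt[b]) continue;            // cooling down or locked out
      if (rng() < P_REPORT) {
        reports++;
        freeAt[b] = day + LOCKOUT;              // report sent: global lockout
      } else {
        freeAt[b] = day + (rng() < P_SHORT ? SHORT_CD : LONG_CD);
      }
    }
  }
  return reports / numDays; // average reports per day across the population
}
```

In the long run each browser attempts a roll roughly every 0.9·14 + 0.1·365 ≈ 49 days, so the per-browser report rate is about 0.001/49 ≈ 2e-5 per day, i.e. on the order of 20K reports per day per billion browsers, consistent with the figure quoted above.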
I'm curious for more insight into the rationale for the 1- or 3-year "long-term" lock-out / cool-down intervals, and the math that goes into what the minimum privacy-safe interval would need to be. Follow-up question: have we considered the 1/1000 being defined per adtech?
The numbers are admittedly somewhat arbitrary! But sure, here is my thinking:
Sorry that I don't have a closed-form formula for the reports-per-day figure. I had one back when there was only one kind of cool-down, but once a second cool-down rate came along, simulation seemed like the only viable way. |
I suppose that's the most important question -- given the great lengths to which PS goes to ensure anonymity, there seems to be some wiggle room in these endpoints which could, in principle, allow some non-"me"-specific information to be used for debugging that wouldn't be about the user. For example: am I scoring k-anon bids the way I'm expecting, as a seller?
I completely agree that the browser can be more relaxed about information when it is either information from a single site or information shared across many users. But bidding functions necessarily have information from two sites (the IG Join site and the publisher site hosting the auction), with no k-anonymity constraint on either of them; and scoring in a whole auction implicitly involves information from many sites (the IG Join sites of every IG that bids). I don't see any way that the browser can possibly be more relaxed about that sort of many-site, user-specific information.
In order to assess the sampling and other parameters, it would be useful if the API provided three bits that tell whether the report is sampled, whether the device is in the cooldown period, and whether the device is in the lockout period, respectively, before rolling out this sampling mechanism. These could be reported via URL params appended to the reporting URL string:
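For example, a sketch of what such annotation could look like; the `fdo_*` parameter names are invented here and not part of any spec:

```javascript
// Hypothetical annotation of a forDebuggingOnly reporting URL with the three
// proposed diagnostic bits. The parameter names are made up for illustration.
function annotateDebugUrl(reportUrl, { sampled, inCooldown, inLockout }) {
  const url = new URL(reportUrl);
  url.searchParams.set('fdo_sampled', sampled ? '1' : '0');
  url.searchParams.set('fdo_cooldown', inCooldown ? '1' : '0');
  url.searchParams.set('fdo_lockout', inLockout ? '1' : '0');
  return url.toString();
}
```

For instance, annotating `https://adtech.example/debug?auction=123` with `{ sampled: true, inCooldown: false, inLockout: false }` yields a URL ending in `&fdo_sampled=1&fdo_cooldown=0&fdo_lockout=0`.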
We’re aware that it is possible for each adtech company to implement all the logic to simulate sampling/cooldown/lockout themselves while the 3P cookie is still available. However, it would be additional work with some inaccuracy (as 3P cookies don’t map to devices perfectly).
Just to clarify -- this cooldown is per ad tech (i.e. tied to the calling adtech's origin)?
And I want to make sure I understand the distinction, and the implications thereof.
@ardianp-google: Good point, we should make it easy for consumers of the reports to understand what impact downsampling will have. I doubt we can offer all three bits, but I think the one bit from option 2 above gets a lot of the benefit.

@rdgordon-index: Yes, the 999/1000 cooldown is per ad tech, while the 1/1000 lockout happens only after sending a report, and is global across all ad techs. The way to think about the global nature is "Once a browser sends a single report, it will wait years before sending another one."
Doesn't this provide another 'key abuse' mechanism, where ad techs can inadvertently affect each other's debug calls?
There is a risk, but remember that if another ad tech calls the API for everyone in the world, they have no impact on your debugging call on 99.9% of browsers. It's true that if another ad tech keeps calling the API over and over, then some fraction of the population ends up locked out in the steady state, and if lots of other ad techs do this, then the fraction of the population you have available for reporting goes down. I've put together a little Google Sheets calculator that uses the parameters I suggested above to approximate what happens in a few scenarios. (Thank you to @alexmturner for pointing out the 4x4 matrix whose principal eigenvector makes this run.) https://docs.google.com/spreadsheets/d/1q-uBH7F_NAEWjqcGSChXj6TFbsQ4WK-p83RJTZrty9s/edit#gid=0 For example, with the above cooldown parameters, and even with 25 ad techs calling the API as often as possible, 35.9% of browsers could end up in the lockout state — so you would still get reports from the other ~2/3 of the population.
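The 35.9% figure can also be sanity-checked with a back-of-the-envelope alternating-renewal model. This is my own approximation, not the spreadsheet's eigenvector computation, and all names here are made up:

```javascript
// Rough steady-state estimate of the fraction of browsers in global lockout
// when N ad techs call the API as often as possible, under the proposed
// 1/1000 sampling, 90% 14-day / 10% 1-year cooldowns, and 3-year lockout.
const P_REPORT = 1 / 1000;
const LOCKOUT_DAYS = 3 * 365;
const EXPECTED_COOLDOWN = 0.9 * 14 + 0.1 * 365; // ~49.1 days per failed roll

function lockoutFraction(numSpammingAdtechs) {
  // An unlocked browser grants each spamming adtech roughly one roll per
  // cooldown cycle, so reports arrive at about this rate per day:
  const lambda = numSpammingAdtechs * P_REPORT / EXPECTED_COOLDOWN;
  // The browser alternates between ~1/lambda unlocked days and
  // LOCKOUT_DAYS locked days, giving this long-run locked fraction:
  return (lambda * LOCKOUT_DAYS) / (1 + lambda * LOCKOUT_DAYS);
}
```

With 25 spamming ad techs, this crude model lands near 0.36, in line with the 35.9% quoted above.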
I was wondering, aside from the discussion about the target shape of the API - can we assume that in
The downsampling idea for
The proposal in its current state cannot support our needs. First, we need info from won displays in order to compare online data with reported data, e.g. for the modeling signals field. Second, we would need the same number of reports (100,000) for losses, to ensure there is no error leading to systematic loss. This means the sampling should apply independently to wins and losses. We are also a bit worried about the bias introduced by the cooldown and lockout periods, which means only new Chrome browsers will send debug reports. Potentially, automated bots will generate more reports than real Chrome users. With the following parameters, and using the spreadsheet above:
We would get 100,000 events per day for wins and for losses. Please note that, in parallel, we made the complementary proposal #871 for offline debugging needs.
Hello Fabian, happy new year, and sorry for the delay in responding. Certainly this proposed debugging API will not serve all needs, and if your goal is "to compare online data with reported data, e.g. for the modeling signals field" to find cases of buggy behavior, then I think the laboratory simulation approach discussed in #871 is quite valuable.
I think this different treatment of wins and losses would already be in your power: the win-reporting and loss-reporting functions are separate calls, so you could apply your own sampling to each of them at different rates before ever invoking them.
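A sketch of what that client-side split could look like inside a worklet; the reporting function names follow the Protected Audience explainer, but the wrapper functions and the sampling rates here are made up:

```javascript
// Apply our own sampling before ever touching the browser's downsampled API,
// so wins and losses can be sampled at different rates. Rates are invented.
const WIN_SAMPLE_RATE = 1.0;    // always request win reports
const LOSS_SAMPLE_RATE = 0.01;  // request only 1% of loss reports

function maybeReportWin(url, rng = Math.random) {
  if (rng() < WIN_SAMPLE_RATE) forDebuggingOnly.reportAdAuctionWin(url);
}

function maybeReportLoss(url, rng = Math.random) {
  if (rng() < LOSS_SAMPLE_RATE) forDebuggingOnly.reportAdAuctionLoss(url);
}
```

Because unsent calls never reach the API, this kind of self-sampling also reduces how often an ad tech burns its own cooldown rolls on uninteresting losses.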
I don't think these numbers are realistic. First, the value "100 ad techs" in the spreadsheet is not meant to be the total number of ad techs; it is meant to be the number of ad techs that are calling the reporting APIs constantly, and so are always in the cooldown-or-lockout period. This is a worst-case scenario, meant to illustrate that you would still be able to get a reasonable number of reports even if many ad techs were conspiring to run a denial-of-service attack to prevent all reporting.

I think it is much more likely that ad techs would be selective in exactly the way you want to be: call the API only on a small fraction of "normal" traffic, and call it at a higher rate when something "interesting" happens. This would put many fewer people into lockout, and everyone doing this would get many more "interesting" reports than the spreadsheet's lower bound.

A noteworthy part of my 14d-1yr-3yr parameters is that ad techs who did decide to call the API every time would mostly hurt themselves, because they mostly would end up in the cool-down period. Your changes have a big effect: they mean that an ad tech who calls the API all the time would hurt other ad techs a lot more, and hurt themselves a lot less. That means much less incentive for people to be thoughtful about how they use the API.

I also don't feel that your parameters have a particularly good privacy story. They would lead to each browser sending a debugging report roughly every 3 months. That means that if the ad tech ecosystem decided to use this as a tracking mechanism, they could join up every person's behavior across 5 sites per year. With my proposed parameters, a browser only sends a report around once every 8 years — so in a year, around 85% of people would send no report at all, and the other 15% could at worst be linked up across only two sites (and those people would surely send no reports at all for three years thereafter).
This was discussed in the WICG call of 17/01/2024.
I agree on this point above.
I agree that a 1% probability combined with a short cool-down period would result in too many browsers in a global lock-out state, which is not desirable, because we would be too heavily impacted by other ad techs. However, a configuration could be found where we would receive more reports while limiting the number of browsers in the global lockout state. The main levers could be to reduce the lockout period (to a few months) and slightly increase the probability of sending a report.
The cooldown and global lockout periods seem really too long, as they would strongly bias reporting towards new Chrome browsers, and we don't know exactly what that implies. Something like 90 days, or even 30 days combined with only being able to retrieve information about one single interest group, still seems reasonable to me. Given how fast the industry changes and how often users change their devices, I don't see how having a browser send a report only around once every 8 years is a reasonable setting.
In which Chrome version will this flag be available?
#632 (comment) -- just wanted to clarify, since the explainer was merged -- I wasn't expecting any changes to the fDO endpoints yet -- can you confirm?
@rdgordon-index That's right, downsampling will only start to happen as part of the removal of 3rd-party cookies. |
Thanks -- I missed this all-important line - https://github.com/WICG/turtledove/pull/1020/files#diff-d65ba9778fe3af46de3edfce2266b5b035192f8869280ec07179963b81f4e624R1232 |
Hey @michaelkleber, can you help me understand what this means a bit better? I asked around, and I don't think we actually have clarity here yet, at least not the kind we can make an implementation choice with, even for short-term adoption purposes. The removal of 3PC has already started and has a planned ramp-up starting sometime in Q3 of 2024, so "as part of the removal of 3PC" could/should be interpreted as having already happened; but it seems like this is meaning to say that forDebuggingOnly is still usable 100% of the time for some further period?

I'd ask that we detail this, broken down something like the following. Let's call "Unsampled/Unconstrained Availability of forDebuggingOnly" reporting the state where it can be called and will work immediately in any auction without limits or lockouts, and "Sampled Availability..." the state we'll get to eventually, with lockouts and whatnot.

Current cohorts: Mode B Treatment 1.* labels

For the set of Chrome browsers currently with unpartitioned 3PC access disabled AND sandbox APIs available:
Everything else

For All \ aboveCohort, same questions.

Next ramp-up round, whenever that is

Currently planned for Q3 2024, but let's just say on date X, when more browsers move into the "yes PS APIs but no unpartitioned 3PC access" group. So, similar questions as above:
I can understand why we'd want forDebuggingOnly not to have an official support date, but (a) it seems like we're now giving one to some deprecated*URN functions, (b) publicly stating that implementation priorities are forcing this would be reasonable, and (c) I have at least one choice to make based on the robustness of this timeline, and I suspect I'm not the only one.
Hello @ajvelasquezgoog, do you know the answer to this?
Feature rollout status update: it runs the downsampling algorithm on forDebuggingOnly reports and updates the cooldown and lockout state. Explainer: https://github.com/WICG/turtledove/blob/main/FLEDGE.md#712-downsampling
Thank you @qingxinwu.
I see the
Hi,

I was wondering, what is the plan for the forDebuggingOnly reporting functions and their availability? Will they be supported during the Mode A and Mode B testing phases?

Best regards,
Jonasz