
Feedback on Contribution bounding value, scope, and epsilon #23

Open
alexmturner opened this issue Mar 14, 2023 · 11 comments

@alexmturner
Collaborator

alexmturner commented Mar 14, 2023

Hi all,

We're seeking some feedback on the Private Aggregation API's contribution budget. We'd appreciate any thoughts on both the value of the numeric bound and its scope (currently per-origin per-day, and separate for FLEDGE and Shared Storage).

In particular, one change we're considering is moving the scope from per-origin to per-site. This would mitigate the abuse potential of cases like wildcard domains, where origins are (arguably) easier to mint than sites, making it easier to exceed privacy limits. (See more discussion here.)

Thanks!

[January 2024 edit:] Additionally, we would like to expand the scope of this issue to gather feedback on epsilon. The Aggregation Service currently supports epsilon values up to 64. Note that the Aggregation Service adds noise to summary reports that is drawn from a Laplace distribution with a mean of zero and a standard deviation of

sqrt(2) * L1 / epsilon

where L1 is currently 2^16. We are interested in understanding the smallest value of epsilon required to support the minimum viable functionality of your system.
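
For concreteness, here is a minimal sketch of that relationship, assuming only the parameters stated above (L1 = 2^16 and the Laplace standard deviation formula); the helper name is ours, not part of any API:

```ts
// Noise scale for Aggregation Service summary reports, per the formula above.
const L1 = 2 ** 16;

// Standard deviation of the zero-mean Laplace noise added to each summary value.
function noiseStdDev(epsilon: number): number {
  return Math.SQRT2 * (L1 / epsilon);
}

console.log(noiseStdDev(64)); // ~1448 at the current maximum epsilon
console.log(noiseStdDev(1));  // ~92682, larger than the whole L1 budget
```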

@alexmturner
Collaborator Author

In addition to the change from per-origin to per-site, we're considering changing the time component of the contribution bound. Specifically, we're considering applying the existing contribution bound (a maximum value sum of 2^16) over a 10-minute window instead of a daily window. We hope this will allow more flexibility and simplify budget management. As a backstop to prevent worst-case leakage, we're considering a new, larger daily bound, e.g. 2^20. We'd appreciate any feedback on this proposal!

alexmturner added a commit that referenced this issue May 16, 2023
Switches from per-origin per-day budgets to per-site per-10 min. Also
adds a new per-site per-day looser budget as a backstop to prevent
worst-case leakage.

See original proposal here:
#23 (comment)
alexmturner added a commit that referenced this issue May 22, 2023
Switches from per-origin per-day budgets to per-site per-10 min. Also
adds a new per-site per-day looser budget as a backstop to prevent
worst-case leakage.

See original proposal here:
#23 (comment)
@xottabut

Hi Alex (@alexmturner), I have a few questions regarding the budget in the context of the Private Aggregation API.
I am reading this document to understand the Private Aggregation API's contribution budget.

  • Is the L1 budget per (1) one event (i.e. an ad impression) or (2) all events that happen during a period X (the last 24 hours)? The documentation says: "each user agent will limit the contribution that it could make to the output of a query. In the case of a histogram operation, the user agent could bound the L1 norm of the values, i.e. the sum of all the contributions across all buckets". It is also not completely clear what the query is here.
  • The document says: "We initially plan to use an L1 bound of 2^16 = 65 536" and then later on the same page: "We plan to enforce a per-site budget that resets every 10 minutes; that is, we will bound the contributions that any site can make to a histogram over any 10 minute window. As a backstop to limit worst-case leakage, we plan a separate, looser per-site bound that resets daily, limiting the daily L1 norm to 2^20 = 1 048 576." So what are the final limitations? 2^16 for any 10-minute window and 2^20 for the last 24 hours (two different limits, or only one of them)?
  • The "site" in "per-site" or "origin" in "per-origin" is it referring to the publisher site(page) or reporting origin?

Two other documents with information about the budget, but for the Attribution Reporting API, are:

Thanks!

@alexmturner
Collaborator Author

Hi! Sorry for the delay in responding.

  • The L1 budget for Private Aggregation is over a time period as there isn't a clear notion of event in this API. We limit the sum of contributions' values to 2^16 over a rolling 10 min window for any site. Additionally, we limit the sum of contributions' values to 2^20 over a rolling 24 hour window for any site.
  • The query in that first bullet point is referring to a query to the aggregation service.
  • The origin/site for budgeting is the reporting origin/site.
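
To make these semantics concrete, here is a minimal sketch (our own illustration, not the browser's implementation) of how the two rolling per-site limits combine:

```ts
// Illustrative rolling-window budgeter using the limits stated above.
type Contribution = { timestampMs: number; value: number };

const TEN_MIN_MS = 10 * 60 * 1000;
const DAY_MS = 24 * 60 * 60 * 1000;
const TEN_MIN_LIMIT = 2 ** 16; // per-site, rolling 10-minute window
const DAILY_LIMIT = 2 ** 20;   // per-site, rolling 24-hour window

class SiteBudget {
  private history: Contribution[] = [];

  // Sum of contributions newer than the cutoff timestamp.
  private sumSince(cutoffMs: number): number {
    return this.history
      .filter((c) => c.timestampMs > cutoffMs)
      .reduce((acc, c) => acc + c.value, 0);
  }

  // Accepts the contribution only if both rolling limits still hold.
  tryContribute(value: number, nowMs: number): boolean {
    const fitsTenMin = this.sumSince(nowMs - TEN_MIN_MS) + value <= TEN_MIN_LIMIT;
    const fitsDaily = this.sumSince(nowMs - DAY_MS) + value <= DAILY_LIMIT;
    if (fitsTenMin && fitsDaily) {
      this.history.push({ timestampMs: nowMs, value });
      return true;
    }
    return false;
  }
}
```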

Hope this answers your questions, but let me know if anything is still unclear :)

@alexmturner
Collaborator Author

Closing as this change has been made.

@xottabut

Thank you Alex for the response.
Sorry, but I feel like I am missing something here about "each user agent will limit the contribution that it could make to the output of a query."

If the query refers to a query to the aggregation service, in other words one aggregation-service job that takes one batch of aggregatable reports, does that mean that in the following case the user's contribution will be at most 65 536?
Case:
A user contributes one aggregation key key_1=65 536 at 00:00, then the same user contributes key_1=65 536 (or even key_2=65 536) at 00:15, which the user agent's limit allows. But on the ad-tech side these two reports are collected into one batch and in total contribute 2 * 65 536, which is over the mentioned limit. Will the contribution be lost, or cut down to 65 536?

@alexmturner
Collaborator Author

Ah yes, this wording is a bit confusing; I'll follow up to improve it. The idea is that the user agent is limiting the contribution it can make to the output of a query -- but you're right that the limit isn't a single number, but rather a 'rate' over time that depends on when the reports were triggered.
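
Running the case above through the SiteBudget sketch from earlier shows this 'rate' behavior: both contributions pass the rolling 10-minute check individually, so a batch containing both reports holds 2 * 65 536 from this one user:

```ts
// Two maximum-value contributions 15 minutes apart both pass the checks.
const budget = new SiteBudget();
const t0 = Date.parse("2023-01-01T00:00:00Z");
console.log(budget.tryContribute(2 ** 16, t0));                  // true
console.log(budget.tryContribute(2 ** 16, t0 + 15 * 60 * 1000)); // true
```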

alexmturner added a commit that referenced this issue Jul 13, 2023
Clarify that limits are inherently a rate, not an absolute number. Addresses comments raised in #23.
alexmturner added a commit that referenced this issue Jul 13, 2023
Clarify that limits are inherently a rate, not an absolute number. Addresses comments raised in #23.
@chrisbmcandrew

We continue to be excited about the Aggregation Service API, and its ability to combine with Shared Storage to deliver advanced reach reporting as previously mentioned.

We believe that adjusting the contribution budget would ensure the viability of Reach & Frequency measurement. Brand advertisers specifically rely on accurate Reach measurement to gauge the performance of their campaigns across the web, and without a reasonable contribution budget the accuracy and effectiveness of Reach measurement would be greatly impacted. The two settings are:

A per-site budget that resets every 10 minutes.

  • The reporting window limits Reach reporting for campaigns that deliver multiple ad events within the 10 minutes, as additional events beyond the contribution budget would be dropped. Given how users browse the web, a single ten-minute window can present significant opportunities for ad delivery, and a large subset of ad events would be lost. This loss results in a wide standard deviation on per-campaign reach reporting, which would limit the creation and usefulness of reports generated from these campaigns. Reducing the overall backstop cap would still allow for a reasonable limitation while ensuring measurement that aligns with how users experience ads across a single session and day.

A backstop per-site 24-hour bound limiting the L1 norm to X^x (the current limit is 2^20 = 1 048 576).

  • Based on how users browse the web, the combination of a rolling window and a daily cap creates additional loss. Again, the result is a wider range of reported Reach values and reduced usefulness of the output. Reducing the overall backstop cap would still allow for a limitation while ensuring measurement that aligns with how users experience ads across a single session and day.

In both cases the reported numbers are aggregates and use a Virtual Persons methodology that maintains the overall privacy goals. We look forward to an update on these two settings to ensure Brand advertising is maintained while still providing a safe and private API.

@menonasha

Appreciate the feedback - reopening this issue for discussion - we will come back with thoughts.

We wanted to clarify: is the feedback that the use case requires increased budgets for both the 10-minute cap and the daily backstop? We ask because you mention reducing the overall backstop cap in both paragraphs.

@alexmturner alexmturner reopened this Dec 1, 2023
@chrisbmcandrew

@alexmturner @menonasha Yes, the impact of both is that typical browsing behavior across 10-minute windows and across a day presents significant opportunities for campaigns to reach users, and a large subset of ad events would be lost. Loss of a large quantity of these events, whether due to the 10-minute cap or the 1-day cap, results in unmeasurable Reach and Frequency, which is critical to brand advertisers.

@menonasha

We do understand that the contribution budget window could cause events to be dropped if a user is served a significant number of ads during the window. Ad techs should consider optimizing for the contribution budget such as by accounting for different campaign sizes or limiting the number of reports per campaign per user in a ten minute window. We would be interested to understand from the ecosystem whether the contribution budget still causes inaccurate Reach measurements after implementing optimization tactics.
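
As one hypothetical illustration of such a tactic (our own sketch, not a recommendation of any specific scheme): dividing the rolling 10-minute budget evenly across active campaigns and their expected reports, so that no single campaign can exhaust the window:

```ts
// Split the 10-minute window budget so the cap cannot be exceeded even if
// every active campaign sends its full quota of reports per user.
const WINDOW_BUDGET = 2 ** 16;

function perReportValueCap(activeCampaigns: number, reportsPerCampaign: number): number {
  const slots = Math.max(1, activeCampaigns) * Math.max(1, reportsPerCampaign);
  return Math.floor(WINDOW_BUDGET / slots);
}

console.log(perReportValueCap(4, 2)); // 8192 per report for 4 campaigns x 2 reports each
```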

In terms of providing additional budget either at the ten minute window or at the daily window, this change would allow for larger potential information gain on a given user, and so this is not in our immediate roadmap.

We would be interested to hear additional ideas of improvements we could make to solve this challenge of losing ad events while maintaining privacy. We welcome additional feedback and public discussion on this subject as we work towards a solution over the long term that addresses these concerns.

@alexmturner alexmturner changed the title Contribution bounding value and scope Feedback on Contribution bounding value, scope, and epsilon Jan 11, 2024
@alexmturner
Collaborator Author

We have added this context to the original post as well, but we would like to expand the scope of this issue to gather feedback on epsilon. The Aggregation Service currently supports epsilon values up to 64. Note that the Aggregation Service adds noise to summary reports that is drawn from a Laplace distribution with a mean of zero and a standard deviation of

sqrt(2) * L1 / epsilon

where L1 is currently 2^16. We are interested in understanding the smallest value of epsilon required to support the minimum viable functionality of your system.
