Consider splitting contributions into multiple reports instead of truncating #81

alexmturner opened this issue Jun 28, 2023 · 10 comments
Labels: compat (Issue may affect web compatibility), enhancement (New feature or request)

Comments

@alexmturner
Collaborator

Feedback from discussion in #44. See also this explainer section. The interaction between this and providing a context ID still needs to be thought through. To avoid performance concerns, we will likely still need a limit on the number of reports.

This would be a web-visible change, as the number of reports being sent would change. However, the main compatibility concern is around feature detection -- developers may wish to know if they can safely use more contributions. One option is to provide a method/attribute with the maximum number of allowed contributions.
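For illustration only, feature detection against such a hypothetical attribute might look like the sketch below; the `maxContributionsPerReport` name and its placement on the `privateAggregation` object are assumptions, not a shipped API.

```js
// Hypothetical sketch of feature detection inside a worklet.
// `maxContributionsPerReport` is an assumed attribute name, not a real API.
function allowedContributions() {
  if (typeof privateAggregation === 'undefined') {
    // Private Aggregation is unavailable in this context.
    return 0;
  }
  if ('maxContributionsPerReport' in privateAggregation) {
    return privateAggregation.maxContributionsPerReport;
  }
  // Fall back to the current default limit.
  return 20;
}
```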

@alexmturner added the compat (Issue may affect web compatibility) and enhancement (New feature or request) labels on Jun 28, 2023
@csharrison

I also want to think about this in the context of performance-sensitive callers that want to make sure padding (#56) is not bloating storage. For example, if context_id-providing callers are sure they will always make a single contribution, we should let them pre-specify that and receive a more optimized payload.

@maplgh

maplgh commented Feb 14, 2024

The truncation to 20 contributions, proposed and introduced in issue #32, hinders our use of the Private Aggregation API (PAA).

We are considering using PAA to obtain debug and troubleshooting data, among other use cases. The debug and troubleshooting metrics will require multiple auction-scope contributions (e.g. auction count, win count, loss count) and at least one candidate-scope contribution to record rejection reasons. Based on our analysis, the debug and troubleshooting contributions alone will hit the limit of 20, and we still have the other use cases under consideration.

It would be great to learn the status of this issue on splitting contributions into multiple reports.

@dmcardle
Contributor

dmcardle commented May 2, 2024

To avoid complex privacy mitigations, we're no longer considering splitting contributions into multiple reports. Instead, we are designing several solutions that enable sites to customize the number of contributions per report.

Increase default report size for Protected Audience callers

Applies to: Protected Audience buyers and Protected Audience sellers

We recognize that 20 contributions are not enough for some use cases and that it will take some time to launch a more thorough solution. Until then, we’re considering increasing the default — for Protected Audience script runners only — from 20 contributions per report to 100.

Raising the limit would certainly improve utility, but it would also increase the costs for adtechs. Estimates show that processing reports with 100 contributions could roughly double the processing time and cost of operating the Aggregation Service. Changing the default would affect all adtechs, even those that may not need additional contributions.

We’re seeking feedback from adtechs: is this appealing as a temporary solution or is the cost too high?

Global configuration solution for all callers

Applies to: Protected Audience buyers, Protected Audience sellers, and Shared Storage callers

Each reporting origin could publish their desired number of contributions per report at a well-known location. This makes the contribution count a global setting, meaning it cannot be customized for each auction unless per-call report sizing is in effect (see next section).

In this design, the browser fetches the well-known configuration once per report to determine the payload size. Each report must have exactly the number of contributions specified in the configuration to avoid leaking cross-site information out of the isolated context. The browser either drops excess contributions or pads with null contributions to hit the target.

To bound local resource consumption, the browser must cap the number of contributions at some reasonable upper bound. We’ve penciled in 1000 contributions as the upper limit, pending analysis. We’re also seeking feedback from adtechs on this value.
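As a rough sketch of the mechanism only, the browser could fetch and clamp the published value along the following lines; the well-known path and field name are placeholders, not a finalized format.

```js
// Conceptual sketch only: the path and field name below are assumed
// placeholders, not part of any finalized design.
const WELL_KNOWN_PATH = '/.well-known/private-aggregation/report-config'; // assumed

async function contributionsPerReport(reportingOrigin) {
  const response = await fetch(new URL(WELL_KNOWN_PATH, reportingOrigin));
  const config = await response.json(); // e.g. { "contributions_per_report": 100 }
  // Clamp to the proposed upper bound of 1000; fall back to the default of 20.
  return Math.min(config.contributions_per_report ?? 20, 1000);
}
```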

Because this design adds a new network request, it may slightly increase report loss. This change should be minor, but we will analyze any impact further if we proceed with this option.

Per-call report sizing solution for sellers and Shared Storage callers

Applies to: Protected Audience sellers and Shared Storage callers

We’re also designing a more focused solution for Protected Audience sellers and Shared Storage callers. Protected Audience buyers are excluded because this capability could enable them to leak cross-site information.

We can add a new field to the privateAggregationConfig dictionary used when invoking Shared Storage or runAdAuction(). When triggering an operation that could send an aggregatable report, callers could also set the number of contributions per report, similar to how they can set a context ID today. Customizing the contribution count in this way overrides the global setting described in the previous section.

Generated reports would have exactly the configured number of contributions, with the browser either dropping excess contributions or adding null contributions. This is necessary to prevent the encrypted payload size from leaking cross-site information.
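As an illustrative sketch only (the `maxContributions` field name and the module URL are assumptions, not a finalized API; only `contextId` exists today), a Shared Storage caller might request a custom report size like this:

```js
// Hypothetical sketch for a Shared Storage caller. `maxContributions` is an
// assumed name for the proposed new field.
await window.sharedStorage.worklet.addModule('report-metrics.js'); // placeholder module URL
await window.sharedStorage.run('report-metrics', {
  privateAggregationConfig: {
    contextId: 'example-context-id',
    maxContributions: 100, // assumed field; would override the global setting
  },
});
```

A Protected Audience seller could pass an analogous config when calling runAdAuction().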

Just like the prior solution, we are penciling in 1000 contributions per report as the upper limit, pending analysis. We’re also seeking feedback from adtechs on this value.

Requests for feedback

To recap, we’re requesting feedback on each of the following:

  1. Protected Audience buyers/sellers only: changing the default from 20 contributions per report to 100. Given that this could double the cost of operating the Aggregation Service for all adtechs, is it an appealing stopgap solution?

  2. Protected Audience buyers/sellers and Shared Storage callers: publishing the desired number of contributions per report at a well-known location. Any concerns with the design or capping at 1000 contributions per report?

  3. Protected Audience sellers and Shared Storage callers: using a new configuration field to set the number of contributions per report alongside the context ID. Any concerns with the design or capping at 1000 contributions per report?


(Edited on 2024-06-04 to clarify applicable use cases for each solution.)

@rushilw-google
Contributor

In addition to increasing the contribution limit, we may also be able to improve the utility from each contribution by allowing pre-aggregation of values that share the same bucket and filtering ID.

Suppose an adtech wants to measure bid rejection reasons across their Interest Groups (IGs), using one bucket for each reason. If there are 3 unique bid rejection reasons across 15 eligible IGs, current reporting will still result in 15 contributions (one for each IG) being added to a report. Here, we could pre-aggregate the counts of bids rejected for each of the 3 unique reasons, which would decrease the total number of contributions needed from 15 to 3. That would free up the remaining contributions to be used for other measurements.
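As a conceptual sketch (not part of any spec), the merge step the browser could apply before assembling a report might look like the following, with the grouping key assumed to be the bucket plus filtering ID:

```js
// Conceptual sketch of browser-side pre-aggregation: contributions that share
// the same bucket and filtering ID are merged by summing their values.
function preAggregate(contributions) {
  const merged = new Map(); // "bucket:filteringId" -> merged contribution
  for (const contribution of contributions) {
    const key = `${contribution.bucket}:${contribution.filteringId ?? 0}`;
    const existing = merged.get(key);
    if (existing) {
      // e.g. 15 per-IG rejection contributions collapse into 3 merged ones.
      existing.value += contribution.value;
    } else {
      merged.set(key, { ...contribution });
    }
  }
  return [...merged.values()];
}
```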

Pre-aggregating contributions could be beneficial to buyer and seller measurements including latency reporting. We seek adtech feedback on whether this mechanism would allow better utility from the Private Aggregation API.

@maplgh

maplgh commented May 25, 2024

Thanks for the responses on this issue! Below is the feedback on the limit changes and pre-aggregation proposals.

  1. Protected Audience only: changing the default from 20 contributions per report to 100. Given that this could double the cost of operating the Aggregation Service for all adtechs, is it an appealing stopgap solution?

This is a welcome change in the short term as it will allow capturing more data for our testing of the API.

  2. No context ID: publishing the desired number of contributions per report at a well-known location. Any concerns with the design or capping at 1000 contributions per report?
  3. Context ID: using a new configuration field to set the number of contributions per report alongside the context ID. Any concerns with the design or capping at 1000 contributions per report?

These changes sound great, as they provide a configurable way to significantly increase the contribution limits. As of now we have no concerns with the 1000-contribution limit, in part because the contribution budget cap becomes the limiting factor.

In addition to increasing the contribution limit, we may also be able to improve the utility from each contribution by allowing pre-aggregation of values that share the same bucket and filtering ID.

This proposal sounds reasonable, and it is how I originally expected it to work, given that the limit's main purpose is to manage the report size.

@michal-kalisz

Feedback on Proposal

  1. About registered contributions:
  • We do not want to lose any registered contributions.
  • In our case, the number of contributions reported can vary significantly depending on the interest group and user data.
  2. Method for retrieving limits:
  • It would be beneficial to have a method that returns the current contribution limit.
  3. Information on truncated reports:
  • If a report is truncated (the number of contributions exceeds the limit), it would be useful to receive this information in an aggregated and privacy-preserving manner. For instance, a mechanism similar to Aggregate Debug Reporting (mentioned in issue 705), but adapted for the Private Aggregation API, could be implemented.
  4. Pre-aggregation:
  • The idea of pre-aggregation seems promising as well.
  5. Clarification on batching:
  • It's not entirely clear how batching is applied: is it according to what is described in the explainer? For Protected Audience, can a single batch contain contributions from different bidding functions or different auctions? If so, is there a defined time window that groups these contributions from different auctions? Could we get a detailed description of how this would be implemented?

Broader Questions
Taking a step back, we have some questions regarding the problem we're trying to solve, specifically:

  1. Batching rejection:
  • Why was splitting into batches dismissed in the considerations?
  2. Privacy threats and padding:
  • What privacy threats does padding aim to address?
  • We understand that one threat is the possibility of transmitting certain information through the payload size. Is the payload size itself a threat, or is it only a concern in conjunction with the IP address?
  • We have the IP address because we receive requests directly from the user's browser. If the IP address were not available, would this eliminate the concern?
  • Does IP Protection mitigate this risk?
  • Are you also designing solutions for B&AS, where the entire flow could be enclosed within a secure environment (TEE)?
  3. Alternative ideas to mitigate payload-size inference:
  • Have you considered other ideas to prevent inference based on payload size? For example, instead of padding to the limit, adding a random number of null contributions? This doesn't entirely solve the problem, but it reduces the amount of information that can be observed by the reporting endpoint. Given that report sending is delayed by random(1H), the amount of disclosed information is minimal. Using the Private Aggregation API inherently sends 1 bit of information, but it's unclear how this could be exploited in a meaningful way.
  • Another idea could be keeping the limit low but allowing splitting into batches with a random element, so that inference based on the number of batches is affected by noise.

We look forward to your comments on these points.

@menonasha

We appreciate the feedback, @michal-kalisz. We definitely understand that you don't want to lose any registered contributions, and that is one of the motivations for the more flexible contribution approaches we are proposing.

First, to address some of your broader questions: we did not proceed with splitting into batches because it would increase the potential leak via report counts and would require us either to implement randomized null reporting at a high rate or to make the number of reports deterministic. This would significantly increase the number of reports that need to be processed through the Aggregation Service for Protected Audience callers. Another reason is that putting more contributions into one report generally has better performance properties than splitting them up. We might still add randomized null reporting in the future, but we don't want to increase the number of null reports we'd have to send.

With regard to padding: the payload-size threat is lessened without the IP address, but even then, and with the randomized delay, the payload size could still leak a non-zero amount of cross-site information. If we were to add random null contributions instead of padding, we would need to add a large number of contributions to reach reasonable privacy, so it is better to pad to a fixed value.

We had some questions on your feedback. Can you explain why you would want an accessor for the contribution limit if you are able to set it in the global configuration? Additionally, are you supportive of the idea of increasing the number of contributions from 20 to 100 in the immediate term, given the increased Aggregation Service costs? Lastly, I wanted to highlight this proposal for aggregate debugging, where we might be able to include an error code for when a report is truncated but still sent.

@maleangrubb

We also have this issue. See WICG/turtledove#1187

Increasing to 100 in the immediate term would be a huge help.

@michal-kalisz

Thank you, Asha, for the detailed response.

Increasing the limit from 20 to 100 seems to be a good solution in the immediate term.

The idea of the method for retrieving limits was to check how many contributions could still be reported before the limit is exceeded. But I understand that, at the time a contribution is created, this is not yet known.

In my response, I wanted to ask whether, if we operate in a TEE environment, some of these restrictions could be lifted. For example, instead of having reports go from the device to the ad tech network and then to storage connected to the Aggregation Service/TEE, could the flow be contained within connected TEE environments?

Michał

@tylerdev0

From the demand side, we would also appreciate the increase in contribution limits. Because buyers do not have access to latency statistics (except from sellers), we rely on sellers to provide this data for better visibility into bidding function performance. However, the sellers we have partnered with have hit their contribution limit and therefore cannot provide us with latency stats.

Increasing the limit, at least in the short term, would be appreciated.
