
[Aggregated API] Using the API for both low latency reactive monitoring and detailed client reporting #732

Open · alois-bissuel opened this issue Mar 23, 2023 · 5 comments
Labels
possible-future-enhancement Feature request with no current decision on adoption

Comments

@alois-bissuel
Contributor

Hello,

There are some cases where we want to use the aggregated API for two use cases which are quite different:

  1. A low-latency campaign monitoring system, where knowing attributed sales with little delay is paramount for correct delivery. Little to no delay can be tolerated here.
  2. Detailed client reporting, where the precision and richness of the data presented are key. Here delays are more acceptable (i.e. a trade-off can be made between delay and signal-to-noise ratio).

We struggle to accommodate the two use cases within the API in its current form. Because the data can only be processed once, we have to sacrifice one of the use cases (i.e. either use a single detailed encoding and process the data hourly, meaning use case 2 gets drowned in noise, or process the data daily and sacrifice use case 1).

Supporting the two use cases at the same time could be done by allowing several passes over the data in the aggregation service. To keep the differential privacy properties of the aggregation service, we could keep track of the already consumed budget (e.g. the first pass uses ε/4, the second ε/2, and the last ε/4). Another approach would be to define broad key spaces (e.g. split the 128-bit space into 4 buckets) and allow aggregation only once per key space. This way one would encode the fast-paced campaign monitoring metrics in the first key space and query the aggregation service hourly for them, and encode the client reporting metrics elsewhere and aggregate them weekly.
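To make these two ideas a bit more concrete, here is a minimal sketch (purely illustrative: neither mechanism exists in the aggregation service today, and the prefix width, ε split, and names are assumptions on my part):

```python
# Illustrative only: neither per-pass epsilon tracking nor key-space
# partitioning exists in the aggregation service today. The 2-bit prefix,
# the epsilon split, and all names are assumptions for the sake of the example.

KEY_SPACE_BITS = 128
PREFIX_BITS = 2  # split the 128-bit key space into 4 buckets


def make_key(key_space: int, payload: int) -> int:
    """Encode an aggregation key as a 2-bit key-space prefix + 126-bit payload."""
    assert 0 <= key_space < (1 << PREFIX_BITS)
    assert 0 <= payload < (1 << (KEY_SPACE_BITS - PREFIX_BITS))
    return (key_space << (KEY_SPACE_BITS - PREFIX_BITS)) | payload


# Idea 1: several passes over the same reports, each pass consuming part of a
# total epsilon budget (e.g. eps/4 for the hourly pass, eps/2 for the weekly
# pass, eps/4 kept in reserve).
class EpsilonLedger:
    def __init__(self, total: float) -> None:
        self.remaining = total

    def spend(self, epsilon: float) -> None:
        if epsilon > self.remaining:
            raise ValueError("privacy budget exhausted")
        self.remaining -= epsilon


# Idea 2: each key-space prefix may be aggregated only once, so the hourly
# monitoring keys and the weekly reporting keys live under different prefixes.
consumed_key_spaces: set[int] = set()


def query_key_space(key_space: int) -> None:
    if key_space in consumed_key_spaces:
        raise ValueError(f"key space {key_space} was already aggregated")
    consumed_key_spaces.add(key_space)
```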

Both methods have their pros and cons: the latter is more precise (as one doesn't burn part of the budget on both use cases at the same time), while the former allows for less regret (i.e. one can always reserve some budget for a final aggregation in case of a mistake).

For both methods, the storage requirements of the aggregation service can be kept under control by setting a sensible but low limit on the number of times the data can be processed.

@csharrison
Collaborator

cc @hostirosti @ruclohani for visibility.

Thanks for filing, @alois-bissuel. I agree a more flexible way of consuming privacy budget should help satisfy the use case. Between use cases (1) and (2) you mention, do you expect the keys to be similar, or will e.g. (2) query finer-grained slices?

@RonShub

RonShub commented Mar 30, 2023

Hello,
+1 for the hourly and daily use case request

It would be super useful for us if it were possible to increase the limit from 1 to 2, i.e. the same aggregatable report could appear in two batches and hence contribute to two summary reports.
This is crucial for our clients for two main reasons:

  1. Facilitate our hourly->daily (or possibly daily->weekly if the SNR is too low) pull strategy, which will enable our clients to get noisier data first and data with improved SNR later (sketched below)
  2. Fault tolerance: in case an internal issue happens on our side with the hourly request and we lose the aggregation service response, we would be able to fall back to the daily response (and vice versa)

For our use case, we expect the keys to be similar.
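A minimal sketch of the pull strategy above, assuming the per-report limit were raised from 1 to 2 (this is the requested behaviour, not something the aggregation service supports today; identifiers are made up):

```python
from collections import defaultdict

# Hypothetical: today each aggregatable report may only be included in one
# batch. This sketch assumes a limit of 2, so the same report can feed an
# hourly batch (fast, noisier) and a daily batch (slower, better SNR).
MAX_BATCHES_PER_REPORT = 2

batches_used: dict[str, int] = defaultdict(int)  # report id -> batches consumed


def add_to_batch(batch: list[str], report_id: str) -> None:
    if batches_used[report_id] >= MAX_BATCHES_PER_REPORT:
        raise ValueError(f"report {report_id} already used in {MAX_BATCHES_PER_REPORT} batches")
    batches_used[report_id] += 1
    batch.append(report_id)


hourly_batch: list[str] = []
daily_batch: list[str] = []
add_to_batch(hourly_batch, "report-001")  # quick, noisy hourly summary
add_to_batch(daily_batch, "report-001")   # same report reused for the daily summary
# If the hourly response is lost on our side, the daily batch still covers the report.
```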

@michal-kalisz

Hi,
@alois-bissuel thanks for bringing this up. We have similar cases and addressing them seems very valuable, especially from an operational point of view.
At first glance, the second solution (dividing the 128-bit key space into several buckets) seems better.

@csharrison
Collaborator

@alois-bissuel quick clarification, you said:

Both methods have their pros and cons: the latter is more precise (as one doesn't burn part of the budget on both use cases at the same time)

Is this true? If we split the key space into a few sections and allowed you to query those sections independently, you would still need to allocate separate budget across those key spaces, just in the form of the client's L1 contribution bound rather than an epsilon.

Is this because the two use-cases will actually have different data / keys, and so querying the "high latency / detailed" key space during a low latency query is wasteful?
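To spell out that budget split with a rough back-of-the-envelope (assuming Laplace-style noise with scale on the order of L1/ε added to each bucket; the exact mechanism and the numbers below are only illustrative):

```python
import math

# Back-of-the-envelope only: assumes Laplace-style noise with scale L1/epsilon
# added to every bucket; the real mechanism and parameters may differ.
L1 = 65_536        # client-side L1 contribution bound per source event
EPSILON = 10.0     # illustrative query epsilon

noise_stddev = math.sqrt(2) * L1 / EPSILON  # same noise whatever the key split

# All of L1 devoted to one use case: values can be scaled up to the full bound.
single_signal = L1
print("relative noise, one use case:", noise_stddev / single_signal)

# L1 split 50/50 across two key spaces: each use case's contributions are
# capped at half the bound, so its relative noise doubles even though each
# key space could be queried independently.
split_signal = L1 // 2
print("relative noise, two use cases:", noise_stddev / split_signal)
```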

@alois-bissuel
Contributor Author

Catching up on my issues, sorry for the delay:

Thanks for filing, @alois-bissuel. I agree a more flexible way of consuming privacy budget should help satisfy the use case. Between use cases (1) and (2) you mention, do you expect the keys to be similar, or will e.g. (2) query finer-grained slices?

As there will be less data available for use case (1) (i.e. the fast-paced aggregation case), I expect us to encode fewer dimensions in the key so that more data is aggregated per key.
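For illustration, a coarse key for the fast pass versus a detailed key for the reporting pass could look like this (field names and bit widths are hypothetical):

```python
# Hypothetical key layouts; field names and bit widths are made up.

def coarse_key(campaign_id: int) -> int:
    # Fast hourly monitoring (use case 1): only the campaign dimension, so each
    # bucket aggregates many reports and stands out above the noise quickly.
    return campaign_id


def detailed_key(campaign_id: int, product_id: int, country: int) -> int:
    # Weekly client reporting (use case 2): finer slices are acceptable because
    # more data accumulates per key over the longer aggregation window.
    return (campaign_id << 48) | (product_id << 16) | country
```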

Both methods have their pros and cons: the latter is more precise (as one doesn't burn part of the budget on both use cases at the same time)

Is this true? If we split the key space into a few sections and allowed you to query those sections independently, you would still need to allocate separate budget across those key spaces, just in the form of the client's L1 contribution bound rather than an epsilon.

Indeed, I was not clear there. I was thinking of a separate encoding for each use case and thus finer budget tracking. Of course the L1 budget still applies. I guess my first proposal (i.e. allocating an epsilon budget per pass) rules out a different encoding per use case, hence my remark (and your final comment).
