Skip to content

Commit

Permalink
Merge branch 'main' into patch-7
Browse files Browse the repository at this point in the history
  • Loading branch information
caraitto committed Mar 4, 2023
2 parents 5cba6fa + 84cc14e commit fc5846d
Show file tree
Hide file tree
Showing 5 changed files with 211 additions and 52 deletions.
Binary file added DP_kanon_server.pdf
Binary file not shown.
57 changes: 32 additions & 25 deletions FLEDGE_extended_PA_reporting.md
Original file line number Diff line number Diff line change
Expand Up @@ -88,10 +88,12 @@ function generateBid(interestGroup, auctionSignals, perBuyerSignals, trustedBidd
});
```

The above logic will trigger a report if the generated bid wins (see [reserved.win](#reporting-bidding-data-for-wins)). And another one,
if the user later clicks on the winning ad (this needs to be triggered by the fenced frame itself, see
reporting for post-auction signals). When the buyer receives an aggregated report they can infer what
the click-through-rate (CTR) was for users on different “interest group age” buckets.
The above logic will trigger a report if the generated bid wins (see
[reserved.win](#reporting-bidding-data-for-wins)). And another one, if the user later clicks on the
winning ad (this needs to be triggered by the fenced frame itself, see
[reportPrivateAggregationEvent](#reporting-bidding-data-associated-with-an-event-in-a-frame). When
the buyer receives an aggregated report they can infer what the click-through-rate (CTR) was for
users on different “interest group age” buckets.

### Example 2: Getting the average bid gap for an ad.

Expand All @@ -102,13 +104,14 @@ do this, we introduce a field to the “contributions” object called `signalVa
the report to depend on post auction information. A `signalValue` object is composed of the following
values:
* `baseValue`: The name of the auction result value we want to report. For instance, winningBid.
* `offset`: Optional offset to add/subtract to the auction result value
* `scale`: Optional scale factor by which we want to multiply the output. This is useful for
controlling the amount of noise added by the aggregation service. Scale is applied after `offset`
is subtracted.
* `scale`: Optional scale factor by which we want to multiply the auction result value. This is
useful for controlling the amount of noise added by the aggregation service. Scale is applied
before `offset` is added.
* `offset`: Optional offset to add to the auction result value.

After the auction happens, the final value of the generated report is `scale`*(`baseValue` + `offset`). The
following example shows how to return the gap between an ad bid and the winning bid:

After the auction happens, the final value of the generated report is (`baseValue` * `scale`) + `offset`.
The following example shows how to return the gap between an ad bid and the winning bid:

```
function generateBid(...) {
Expand All @@ -120,7 +123,7 @@ function generateBid(...) {
value: {
baseValue: "winningBid",
scale: 2, // Number which will be multiplied by browser value
offset: -bid // Numbers which will be added to browser value, prior to scaling
offset: -bid * 2 // Numbers which will be added to browser value, after scaling
}
});
Expand All @@ -133,7 +136,7 @@ contribution would be generated as:

```
bucket: 1596n
value: 200 // = (winningBid + -bid) * scale = (200 - 100) * 2
value: 200 // = (winningBid * scale) + offset = (200 * 2) + (-100 * 2)
```
This would correspond to the gap by which the advertiser lost the auction, scaled by 2.
Scaling is important as it allows bidders to better control the amount of noise which will
Expand Down Expand Up @@ -182,16 +185,16 @@ The parameters consist of:
be sent (see [Triggering reports](#triggering-reports) below), and
* a `contribution` object which contains:
* a `bucket` which is a 128bit ID or a `signalBucket` which tells the browser how to calculate the bucket (represented as BigInt) and
* a `value` which is an integer or a `signalValue` which tells the browser how to calculate the value.
* a `value` which is a non-negative integer or a `signalValue` which tells the browser how to calculate the value.


Where `signalBucket` and `signalValue` is a dictionary which consists of:
* a `baseValue` field indicating which value the browser should use to calculate the resulting bucket or value. A `signalValue.baseValue` or `signalBucket.baseValue` may be any of the following:
* `winningBid`: the value used is the winning bid value.
* `highestScoringOtherBid`: the value used is the bid value that was scored as second highest.
* `scriptRunTime`: milliseconds of CPU time that the calling function required, when called.
* `signalsFetchTime`: milliseconds required to fetch the trusted bidding or scoring signals, when called from `generateBid()` or `scoreAd()` respectively.
* `bidRejectReason`: one of the following values:
* `winning-bid`: the value used is the winning bid value.
* `highest-scoring-other-bid`: the value used is the bid value that was scored as second highest.
* `script-run-time`: milliseconds of CPU time that the calling function required, when called.
* `signals-fetch-time`: milliseconds required to fetch the trusted bidding or scoring signals, when called from `generateBid()` or `scoreAd()` respectively.
* `bid-reject-reason`: one of the following values:
* 0: indicates ad creative URL did not meet the k-anonymity threshold
* 1: indicates seller rejected bid because “Invalid Bid”
* 2: indicates seller rejected bid because “Bid was Below Auction Floor”
Expand All @@ -205,8 +208,8 @@ Where `signalBucket` and `signalValue` is a dictionary which consists of:
* The auction was aborted (i.e. calling endAdAuction())
* an auction that never rendered the ad
* optional `offset` and `scale` that allow the reporter to manipulate the browser signal to customize the buckets and values they will receive in reports:
* `scale` will be multiplied by the browser provided value
* `offset` will be added to the browser provided value (allows you to shift buckets, etc)
* `scale` will be multiplied by the browser provided value. Scale is applied before `offset` is added. Default value is 1.0.
* `offset` will be added to the browser provided value (allows you to shift buckets, etc). Default value is 0.

## Triggering reports

Expand All @@ -216,15 +219,15 @@ A fenced frame can trigger the sending of contributions associated with an arbit
by calling into a new API:

```
window.fence.reportPrivateAggregationEvent("click");
window.fence.reportEvent("click");
```

This will cause any contributions associated with a call to `reportContributionForEvent()`
with an event-type of `click` to be reported/sent.

In this example, `"click"` is an event-name chosen by the auction bidder. There are a number
of event names that are reserved and invoked directly by the browser. All reserved values will
have the `"reserved."` prefix.
have the `"reserved."` prefix, and all non-reserved values cannot have the `"reserved."` prefix.

### Reporting bidding data for wins

Expand Down Expand Up @@ -256,7 +259,11 @@ buyers a way to express which sellers they’re comfortable sharing this informa
add a new mechanism which allows each buyer to declare a set of approved sellers: The interestGroup
provided to `navigator.joinAdInterestGroup()` will contain a new field named `sellerCapabilities`, a
dict keyed by seller origin (or "*", to set defaults for non-specified seller origins) with lists of
permission strings as values (as described below).
permission strings as values (as described below). Sellers might want to only score bids from
interest groups that will share aggregate statistics, so a field, `requiredSellerCapabilities` will
also be added to the auction config. Any interest group that doesn't permit (for the auction's seller)
all the `sellerCapabilities` listed in `requiredSellerCapabilities` will not participate in the
auction.

For the seller to declare reporting, the `auctionConfig` passed to `runAdAuction` is amended to
contain a configuration for the seller latency report.
Expand Down Expand Up @@ -288,8 +295,8 @@ The seller is able to measure the following for each buyer, assuming permission
* `interestGroupCount`: The number of the interest groups which could participate in the auction
(i.e. the number of intererest groups on the machine for this buyer -- note the count *isn't* limited by the auction config's `perBuyerGroupLimits`). This requires the `interest-group-counts` `sellerCapabilities` permission.
* `bidCount`: The number of valid bids generated by this buyer. This requires the `interest-group-counts` `sellerCapabilities` permission.
* `totalGenerateBidLatency`: The sum of execution time for all generateBids() in milliseconds. If the interest group didn't fetch any trusted signals, then 0 milliseconds is reported. This requires the `latency-stats` `sellerCapabilities` permission.
* `totalSignalsFetchLatency`: The total time spent fetching trusted buyer signals in milliseconds. This requires the `latency-stats` `sellerCapabilities` permission.
* `totalGenerateBidLatency`: The sum of execution time for all generateBids() in milliseconds. This requires the `latency-stats` `sellerCapabilities` permission.
* `totalSignalsFetchLatency`: The total time spent fetching trusted buyer signals in milliseconds. If the interest group didn't fetch any trusted signals, then 0 milliseconds is reported. This requires the `latency-stats` `sellerCapabilities` permission.

Given the `auctionConfig` above, if buyer1.com had two interest groups participate in the auction,
their trusted buyer signals fetch taking 10ms, their `generateBid()` scripts running for 2ms and
Expand Down
111 changes: 111 additions & 0 deletions FLEDGE_k_anonymity_differential_privacy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
# AboveThreshold with Periodic Restarts Algorithm

At a high level, the job of the k-anonymity server is to keep track, for each
"set" _S_, how many distinct users have invoked `Join` on _S_ over a given
lookback period _w_ (say, last 7 days). If the set size exceeds a predetermined
threshold _k_, the set is said to have k-anonymity status. The k-anonymity
server continually updates the k-anonymity status of each set known to it, and
publishes the status of each set via the `Query` endpoint.

In this document, we explain how the k-anonymity status of a set is updated and
published, to limit the ability of a bad actor to learn information about
individual user behavior by observing changes in the output of the Query
endpoint.

We employ the following techniques to protect user's privacy:


* The status of each set is only updated at periodic intervals (such as once an
hour) rather than continuously. This limits the ability of tying a specific set
status change (from no to yes) to a specific `Join` action based on timing.

* We add a calibrated amount of noise to both the true set size and the
k-anonymity threshold when evaluating the status of a set. This ensures
differential privacy properties for the algorithm and masks the contribution
of any single user's actions on the status of the set.

* We limit the frequency with which the status of a set can change. While it is
important to support fast ramp-ups (a large number of users Join-ing a set
should result in the set achieving k-anonymity quickly), we limit the speed at
which k-anonymity status is lost.


In [Differentially Private Algorithms for 𝑘-Anonymity Server][1] we analyze the
algorithm and formally bound the amount of information leakage.

We begin by describing the main [parameters](#parameters) of the algorithm and
presenting the [algorithm](#algorithm) itself at a high level. We then propose a list
of [initial parameter settings](#proposed-parameter-settings).

## Parameters

We will use the following parameters in the AboveThresholdWithPeriodicRestart
algorithm:

| Parameter | Definition |
|----------- |------------|
|$p$ | $p$ stands for the **query update period**. The $k$-anonymity query is evaluated at increments of $p$, that is, at approximate times $t_0$, $t_0+p$, $t_0+2p$... for some starting timestamp $t_0$. For convenience, we define the time values $(w, t)$ in multiples of $p$. For example, if $p$ is 1 hour, the window length will be an integer number of hours. |
|$w$ | We use $w$ for the set member **counting window length**. The window corresponds to a duration of $wp$ time. For a time $t\in \mathbb{N}$ (corresponding to timestamp $t_0+tp$), the set size subject to the $k$-anonymity requirement, is defined as the number of unique set members who joined the set in the interval $(t-w,t]$. We denote this count for the set $S$ by $C(S,t-w,t)$. |
|$k$ | We use $k$ for the **$k$-anonymity** threshold. The $k$-anonymity status of a set $S$ will be a noisy response to the query: does $S$ have at least $k$ members (i.e., is $C(S,t-w,t)\geq k)$? |
|$\varepsilon, \delta$ | $\varepsilon, \delta$ are the standard differential-privacy parameters. Let $f:\mathbb{N} \rightarrow {0 (false),1 (true)}$ (we can write $f\in2^\mathbb{N}$) be a function corresponding to a sequence of Query server responses for set $S$, where $t\in \mathbb{N}$. We denote this by $Query(S)=f$. We say that the query function is $(\varepsilon, \delta)$-private if for any $S$, any set $F\in 2^\mathbb{N}$ of possible Query server responses, and inputs $X, X'$ which are equal except for a single user $u$ addition to a set $S$, $P(Query(S)\in F\vert X)\leq e^\varepsilon P(Query(S)\in F\vert X')+\delta$. |

## The AboveThresholdWithPeriodRestartAlgorithm

The following paragraphs describe the algorithm at a high level. For further details,
see [Differentially Private Algorithms for 𝑘-Anonymity Server][1].

Let $A_t(S)$ be the evaluation of the $k$-anonymity query for set $S$ at time $t$.


**Parameters**: $p$ (Query update period); $w$ (counting window size), $k$ (threshold),
$\varepsilon, \delta$ (privacy parameters).

**Inputs**: Set $S$, $t$ (current time), $C(S,t-w,t)$ (exact count of unique clients
whose membership in $S$ overlaps with interval $(t-w,t]$).

**State**: $A_{t-1}(S)$ (most recent algorithm evaluation at time $t-1$)

**Algorithm $A_t(S)$**:

```
// algorithm restart
if t-1 = 0 mod w:
state = false;
// modify k with truncated Laplace noise mu (function of epsilon)
generate k'=k+mu;
if state == true:
A_t(S)=true;
else:
generate nu; // truncated Laplace noise
A_t(S)=(C(S,t-w,t)+nu <= k');
```

## Proposed parameter settings

We propose the following parameter values to determine k-anonymity status of
a set.


| Parameter | Value |
|--------------------|-----------|
|Update period $p$ | $p=1$ hour. The k-anonymity server will re-evaluate the _k_-anonymity status of each set once every hour. |
|Lookback window $w$ | $w=7$ days (168 units of $p$). Join events from the past 7 days will count towards measuring the current set size. |
|Set size $k$ | $k=50$. A set must contain at least 50 users (in a noisy sense) to earn positive k-anonymity status. |


The differential privacy algorithm needs the $\varepsilon$ (epsilon) and
$\delta$ (delta) parameters to control the amount of noise added to the true
set size and the set size threshold. We choose the lowest round value of
$\varepsilon$ that permits $\delta < 10^{-5}$ given the choice of parameters
$p$, $w$, and $k$ above.


| Parameter | Value |
|----------------------|-----------|
|$\varepsilon, \delta$ | $\varepsilon=3$, $\delta<10^{-5}$ |


We verified that these parameters resulted in a limited amount of noise (as
shown in our document [Differentially Private Algorithms for 𝑘-Anonymity Server][1]).

[1]: DP_kanon_server.pdf

0 comments on commit fc5846d

Please sign in to comment.