Merge branch 'main' into patch-7

WICG · Mar 4, 2023 · fc5846d · fc5846d
2 parents 5cba6fa + 84cc14e
commit fc5846d
Show file tree

Hide file tree

Showing 5 changed files with 211 additions and 52 deletions.
diff --git a/DP_kanon_server.pdf b/DP_kanon_server.pdf
diff --git a/FLEDGE_extended_PA_reporting.md b/FLEDGE_extended_PA_reporting.md
@@ -88,10 +88,12 @@ function generateBid(interestGroup, auctionSignals, perBuyerSignals, trustedBidd
  });
 ```
 
-The above logic will trigger a report if the generated bid wins (see [reserved.win](#reporting-bidding-data-for-wins)). And another one,
-if the user later clicks on the winning ad (this needs to be triggered by the fenced frame itself, see
-reporting for post-auction signals). When the buyer receives an aggregated report they can infer what
-the click-through-rate (CTR) was for users on different “interest group age” buckets.
+The above logic will trigger a report if the generated bid wins (see
+[reserved.win](#reporting-bidding-data-for-wins)). And another one, if the user later clicks on the
+winning ad (this needs to be triggered by the fenced frame itself, see
+[reportPrivateAggregationEvent](#reporting-bidding-data-associated-with-an-event-in-a-frame). When
+the buyer receives an aggregated report they can infer what the click-through-rate (CTR) was for
+users on different “interest group age” buckets.
 
 ### Example 2: Getting the average bid gap for an ad. 
 
@@ -102,13 +104,14 @@ do this, we introduce a field to the “contributions” object called `signalVa
 the report to depend on post auction information. A `signalValue` object is composed of the following
 values:
 * `baseValue`: The name of the auction result value we want to report. For instance, winningBid. 
-* `offset`: Optional offset to add/subtract to the auction result value
-* `scale`: Optional scale factor by which we want to multiply the output. This is useful for
-controlling the amount of noise added by the aggregation service. Scale is applied after `offset`
-is subtracted.
+* `scale`: Optional scale factor by which we want to multiply the auction result value. This is
+useful for controlling the amount of noise added by the aggregation service. Scale is applied
+before `offset` is added.
+* `offset`: Optional offset to add to the auction result value.
 
-After the auction happens, the final value of the generated report is `scale`*(`baseValue` + `offset`). The
-following example shows how to return the gap between an ad bid and the winning bid:
+
+After the auction happens, the final value of the generated report is (`baseValue` * `scale`) + `offset`.
+The following example shows how to return the gap between an ad bid and the winning bid:
 
 ```
 function generateBid(...) {
@@ -120,7 +123,7 @@ function generateBid(...) {
  value: {
  baseValue: "winningBid",
  scale: 2, // Number which will be multiplied by browser value
- offset: -bid // Numbers which will be added to browser value, prior to scaling
+ offset: -bid * 2 // Numbers which will be added to browser value, after scaling
  }
  });
 
@@ -133,7 +136,7 @@ contribution would be generated as:
 
 ```
 bucket: 1596n
-value: 200 // = (winningBid + -bid) * scale = (200 - 100) * 2 
+value: 200 // = (winningBid * scale) + offset = (200 * 2) + (-100 * 2)
 ```
 This would correspond to the gap by which the advertiser lost the auction, scaled by 2.
 Scaling is important as it allows bidders to better control the amount of noise which will
@@ -182,16 +185,16 @@ The parameters consist of:
 be sent (see [Triggering reports](#triggering-reports) below), and
 * a `contribution` object which contains:
  * a `bucket` which is a 128bit ID or a `signalBucket` which tells the browser how to calculate the bucket (represented as BigInt) and
- * a `value` which is an integer or a `signalValue` which tells the browser how to calculate the value.
+ * a `value` which is a non-negative integer or a `signalValue` which tells the browser how to calculate the value.
 
 
 Where `signalBucket` and `signalValue` is a dictionary which consists of:
 * a `baseValue` field indicating which value the browser should use to calculate the resulting bucket or value. A `signalValue.baseValue` or `signalBucket.baseValue` may be any of the following:
- * `winningBid`: the value used  is the winning bid value.
- * `highestScoringOtherBid`: the value used is the bid value that was scored as second highest.
- * `scriptRunTime`: milliseconds of CPU time that the calling function required, when called.
- * `signalsFetchTime`: milliseconds required to fetch the trusted bidding or scoring signals, when called from `generateBid()` or `scoreAd()` respectively.
- * `bidRejectReason`: one of the following values:
+ * `winning-bid`: the value used is the winning bid value.
+ * `highest-scoring-other-bid`: the value used is the bid value that was scored as second highest.
+ * `script-run-time`: milliseconds of CPU time that the calling function required, when called.
+ * `signals-fetch-time`: milliseconds required to fetch the trusted bidding or scoring signals, when called from `generateBid()` or `scoreAd()` respectively.
+ * `bid-reject-reason`: one of the following values:
  * 0: indicates ad creative URL did not meet the k-anonymity threshold
  * 1: indicates seller rejected bid because “Invalid Bid”
  * 2: indicates seller rejected bid because “Bid was Below Auction Floor”
@@ -205,8 +208,8 @@ Where `signalBucket` and `signalValue` is a dictionary which consists of:
  * The auction was aborted (i.e. calling endAdAuction())
  * an auction that never rendered the ad
 * optional `offset` and `scale` that allow the reporter to manipulate the browser signal to customize the buckets and values they will receive in reports:
- * `scale` will be multiplied by the browser provided value
- * `offset` will be added to the browser provided value (allows you to shift buckets, etc)
+ * `scale` will be multiplied by the browser provided value. Scale is applied before `offset` is added. Default value is 1.0.
+ * `offset` will be added to the browser provided value (allows you to shift buckets, etc). Default value is 0.
 
 ## Triggering reports
 
@@ -216,15 +219,15 @@ A fenced frame can trigger the sending of contributions associated with an arbit
 by calling into a new API:
 
 ```
-window.fence.reportPrivateAggregationEvent("click");
+window.fence.reportEvent("click");
 ```
 
 This will cause any contributions associated with a call to `reportContributionForEvent()`
 with an event-type of `click` to be reported/sent. 
 
 In this example, `"click"` is an event-name chosen by the auction bidder. There are a number
 of event names that are reserved and invoked directly by the browser. All reserved values will
-have the `"reserved."` prefix.
+have the `"reserved."` prefix, and all non-reserved values cannot have the `"reserved."` prefix.
 
 ### Reporting bidding data for wins
 
@@ -256,7 +259,11 @@ buyers a way to express which sellers they’re comfortable sharing this informa
 add a new mechanism which allows each buyer to declare a set of approved sellers: The interestGroup
 provided to `navigator.joinAdInterestGroup()` will contain a new field named `sellerCapabilities`, a
 dict keyed by seller origin (or "*", to set defaults for non-specified seller origins) with lists of
-permission strings as values (as described below).
+permission strings as values (as described below). Sellers might want to only score bids from
+interest groups that will share aggregate statistics, so a field, `requiredSellerCapabilities` will
+also be added to the auction config. Any interest group that doesn't permit (for the auction's seller)
+all the `sellerCapabilities` listed in `requiredSellerCapabilities` will not participate in the
+auction.
 
 For the seller to declare reporting, the `auctionConfig` passed to `runAdAuction` is amended to
 contain a configuration for the seller latency report.
@@ -288,8 +295,8 @@ The seller is able to measure the following for each buyer, assuming permission
 * `interestGroupCount`: The number of the interest groups which could participate in the auction
 (i.e. the number of intererest groups on the machine for this buyer -- note the count *isn't* limited by the auction config's `perBuyerGroupLimits`). This requires the `interest-group-counts` `sellerCapabilities` permission.
 * `bidCount`: The number of valid bids generated by this buyer. This requires the `interest-group-counts` `sellerCapabilities` permission.
-* `totalGenerateBidLatency`: The sum of execution time for all generateBids() in milliseconds. If the interest group didn't fetch any trusted signals, then 0 milliseconds is reported. This requires the `latency-stats` `sellerCapabilities` permission.
-* `totalSignalsFetchLatency`: The total time spent fetching trusted buyer signals in milliseconds. This requires the `latency-stats` `sellerCapabilities` permission.
+* `totalGenerateBidLatency`: The sum of execution time for all generateBids() in milliseconds. This requires the `latency-stats` `sellerCapabilities` permission.
+* `totalSignalsFetchLatency`: The total time spent fetching trusted buyer signals in milliseconds. If the interest group didn't fetch any trusted signals, then 0 milliseconds is reported. This requires the `latency-stats` `sellerCapabilities` permission.
 
 Given the `auctionConfig` above, if buyer1.com had two interest groups participate in the auction,
 their trusted buyer signals fetch taking 10ms, their `generateBid()` scripts running for 2ms and

diff --git a/FLEDGE_k_anonymity_differential_privacy.md b/FLEDGE_k_anonymity_differential_privacy.md
@@ -0,0 +1,111 @@
+# AboveThreshold with Periodic Restarts Algorithm
+
+At a high level, the job of the k-anonymity server is to keep track, for each
+"set" _S_, how many distinct users have invoked `Join` on _S_ over a given
+lookback period _w_ (say, last 7 days). If the set size exceeds a predetermined
+threshold _k_, the set is said to have k-anonymity status. The k-anonymity
+server continually updates the k-anonymity status of each set known to it, and
+publishes the status of each set via the `Query` endpoint.
+
+In this document, we explain how the k-anonymity status of a set is updated and
+published, to limit the ability of a bad actor to learn information about
+individual user behavior by observing changes in the output of the Query
+endpoint.
+
+We employ the following techniques to protect user's privacy:
+
+
+* The status of each set is only updated at periodic intervals (such as once an
+hour) rather than continuously. This limits the ability of tying a specific set
+status change (from no to yes) to a specific `Join` action based on timing.
+
+* We add a calibrated amount of noise to both the true set size and the
+k-anonymity threshold when evaluating the status of a set. This ensures
+differential privacy properties for the algorithm and masks the contribution
+of any single user's actions on the status of the set.
+
+* We limit the frequency with which the status of a set can change. While it is
+important to support fast ramp-ups (a large number of users Join-ing a set
+should result in the set achieving k-anonymity quickly), we limit the speed at
+which k-anonymity status is lost.
+
+
+In [Differentially Private Algorithms for 𝑘-Anonymity Server][1] we analyze the
+algorithm and formally bound the amount of information leakage.
+
+We begin by describing the main [parameters](#parameters) of the algorithm and
+presenting the [algorithm](#algorithm) itself at a high level. We then propose a list
+of [initial parameter settings](#proposed-parameter-settings).
+
+## Parameters
+
+We will use the following parameters in the AboveThresholdWithPeriodicRestart
+algorithm:
+
+| Parameter | Definition |
+|----------- |------------|
+|$p$ | $p$ stands for the **query update period**. The $k$-anonymity query is evaluated at increments of $p$, that is, at approximate times $t_0$, $t_0+p$, $t_0+2p$... for some starting timestamp $t_0$. For convenience, we define the time values $(w, t)$ in multiples of $p$. For example, if $p$ is 1 hour, the window length will be an integer number of hours. |
+|$w$ | We use $w$ for the set member **counting window length**. The window corresponds to a duration of $wp$ time. For a time $t\in \mathbb{N}$ (corresponding to timestamp $t_0+tp$), the set size subject to the $k$-anonymity requirement, is defined as the number of unique set members who joined the set in the interval $(t-w,t]$. We denote this count for the set $S$ by $C(S,t-w,t)$. |
+|$k$ | We use $k$ for the **$k$-anonymity** threshold. The $k$-anonymity status of a set $S$ will be a noisy response to the query: does $S$ have at least $k$ members (i.e., is $C(S,t-w,t)\geq k)$? |
+|$\varepsilon, \delta$ | $\varepsilon, \delta$ are the standard differential-privacy parameters. Let $f:\mathbb{N} \rightarrow {0 (false),1 (true)}$ (we can write $f\in2^\mathbb{N}$) be a function corresponding to a sequence of Query server responses for set $S$, where $t\in \mathbb{N}$. We denote this by $Query(S)=f$. We say that the query function is $(\varepsilon, \delta)$-private if for any $S$, any set $F\in 2^\mathbb{N}$ of possible Query server responses, and inputs $X, X'$ which are equal except for a single user $u$ addition to a set $S$, $P(Query(S)\in F\vert X)\leq e^\varepsilon P(Query(S)\in F\vert X')+\delta$. |
+
+## The AboveThresholdWithPeriodRestartAlgorithm
+
+The following paragraphs describe the algorithm at a high level. For further details,
+see [Differentially Private Algorithms for 𝑘-Anonymity Server][1].
+
+Let $A_t(S)$ be the evaluation of the $k$-anonymity query for set $S$ at time $t$.
+
+
+**Parameters**: $p$ (Query update period); $w$ (counting window size), $k$ (threshold),
+$\varepsilon, \delta$ (privacy parameters).
+
+**Inputs**: Set $S$, $t$ (current time), $C(S,t-w,t)$ (exact count of unique clients
+whose membership in $S$ overlaps with interval $(t-w,t]$).
+
+**State**: $A_{t-1}(S)$ (most recent algorithm evaluation at time $t-1$)
+
+**Algorithm $A_t(S)$**:
+
+```
+// algorithm restart
+if t-1 = 0 mod w:
+ state = false;
+ // modify k with truncated Laplace noise mu (function of epsilon)
+ generate k'=k+mu;
+if state == true:
+ A_t(S)=true;
+else:
+ generate nu; // truncated Laplace noise
+ A_t(S)=(C(S,t-w,t)+nu <= k');
+```
+
+## Proposed parameter settings
+
+We propose the following parameter values to determine k-anonymity status of
+a set.
+
+
+| Parameter | Value |
+|--------------------|-----------|
+|Update period $p$ | $p=1$ hour. The k-anonymity server will re-evaluate the _k_-anonymity status of each set once every hour. |
+|Lookback window $w$ | $w=7$ days (168 units of $p$). Join events from the past 7 days will count towards measuring the current set size. |
+|Set size $k$ | $k=50$. A set must contain at least 50 users (in a noisy sense) to earn positive k-anonymity status. |
+
+
+The differential privacy algorithm needs the $\varepsilon$ (epsilon) and
+$\delta$ (delta) parameters to control the amount of noise added to the true
+set size and the set size threshold. We choose the lowest round value of
+$\varepsilon$ that permits $\delta < 10^{-5}$ given the choice of parameters
+$p$, $w$, and $k$ above.
+
+
+| Parameter | Value |
+|----------------------|-----------|
+|$\varepsilon, \delta$ | $\varepsilon=3$, $\delta<10^{-5}$ |
+
+
+We verified that these parameters resulted in a limited amount of noise (as
+shown in our document [Differentially Private Algorithms for 𝑘-Anonymity Server][1]).
+
+[1]: DP_kanon_server.pdf