Skip to content

Latest commit

 

History

History
316 lines (265 loc) · 17.9 KB

FLEDGE_k_anonymity_server.md

File metadata and controls

316 lines (265 loc) · 17.9 KB

Privacy Sandbox k-Anonymity Server

The k-anonymity server

The FLEDGE proposal calls for k-anonymity thresholds on network updates to interest groups. A browser should not request interest group updates unless there are at least $k$ other browsers that reported being in the same interest group within a TTL period. Joining an interest group is a local action stored by the user's browser, so implementing k-anonymity thresholds requires a central server to count how many different browsers have joined a given interest group and reveal to all browsers when that count becomes at least k. In this explainer we discuss this counting server and how we're taking a private approach to its design.

Interest groups are one use case for k-anonymity thresholds. FLEDGE also calls for thresholds on the renderUrl that won an auction, and there are other Privacy Sandbox APIs, such as Shared Storage, that may impose k-anonymity thresholds. Other browser features, like start_url parameters for progressive web apps, might benefit from k-anonymity thresholds as well.

Given the variety of use cases, we intend to implement a k-anonymity server that's quite general. Let's define an object as the browser-stored state (e.g. an interest group) that we wish to have a k-anonymity threshold for. We'll design the server to operate on integers s = Hash(object); that is, every object will be hashed consistently across browsers. On the server each hash will map to a single set, where that set contains all the browsers that have told the server they have the object as local browser-state. To support different use cases, with possibly different server-side behavior, we'll define a type for each set.

The diagram below shows the write path, which we call Join, from a browser to the server. The browser has an identifier or token, b, that is used by the server for counting purposes. The browser hashes the object, computing a set hash s. It sends these parameters, along with the type of the set (e.g. "interest group"), t, to the server: Join(b, t, s). The server will store this membership and apply a type-defined TTL that's configured on the server. When the TTL expires, the server will stop considering the browser b as part of (t,s) unless the browser sends another Join request that resets the TTL for its membership.

join request

The server also needs a read endpoint to expose back to the browser whether a particular set is above the k-anonymity threshold. We call this endpoint Query, and it takes a type t and the set s to check: Query(t, s). It returns a boolean that is true if the set has met the k-anonymity threshold.

The list of sets above the threshold, each represented as (t, s) and used to serve Query requests, is updated periodically with recent Join requests. A given typed set $S$ is included in the update if the cardinality of the set is at least $|S|\geq k\pm\epsilon$. If the cardinality falls below this threshold the set may be removed on the next update.

query request

The browser will periodically call Query for its local objects and update one bit per-object, is_kanon, with the result. This bit can be used by the browser to enforce thresholds. For example, the browser could only request network updates for interest groups where is_kanon == true or only render an ad if the stored renderUrl has the bit set.

This simple design implements the business functionality of the server, but there are many other things we want to do to ensure the server protects the privacy of Chrome users.

How we're thinking about privacy

We recognize the sensitivity of the information being sent to this server: the set hashes might represent browsing behavior, whether an interest group or otherwise. These are highly sensitive, and we don't want the server, or someone interacting with the server, to be able to link set hashes back to individual users.

In designing this server we're taking an iterative, privacy-focused approach. Our initial design contains robust privacy protections that are outlined in more detail below. Over time we plan to further strengthen privacy protections as research areas advance and new technologies and tools become available.

What we're doing now

Oblivious HTTP

For requests to the k-anonymity server that are intended to be anonymous, such as Join and Query requests described above, we plan to use an Oblivious HTTP relay so Google is blind to the IP addresses of end users. The payload for Join and Query requests contain set hashes, which we consider sensitive browsing behavior, so we'd like Google to know as little as possible about user requests. Oblivious HTTP is well suited to this use case, where we have small, anonymous, stateless, requests and we don't need the server that knows the browsing data, i.e. the set hashes, privy to identifying information, including IP address, of the user making the request. To implement Oblivious HTTP we're engaging a third-party company to operate a relay resource on our behalf. Chrome browsers, when making Join or Query requests, send to this relay, in the body of an HTTP POST message, an encrypted payload for the k-anonymity server. The payload is a Binary HTTP message that has been encrypted using Hybrid Public Key Encryption. Chrome will encrypt the message using keys fetched directly from the k-anonymity server (on the Google domain, not through the relay). The relay will forward the request to a gateway that will run on Google servers near the k-anonymity server. The relay is therefore oblivious to the content of the request but privy to the requestor's IP address and the k-anonymity server (and gateway) are oblivious to the requestor but privy to the request content.

ohttp diagram

This design lets us separate between two entities user identifying data that we can't remove by other means (such as the IP address) and the content of the user's request. No single entity sees both, and we expect a contractual relationship that ensures these entities don't collude to share request data.

We evaluated several other proxying designs that could avoid Google becoming privy to user IP addresses. Among the other options we considered, Oblivious HTTP stood out as offering a solution with the most coverage for Chrome users in restrictive network environments, such as those with existing forward proxies required to access the internet. As a normal, small, POST request to the relay, Oblivious HTTP traffic is expected to work in more restricted environments than, say, HTTP CONNECT or MASQUE proxy requests.

Low-entropy identifiers

Next, we're taking a conservative approach to the browser identifier, b, that is sent with Join requests. Even the operator of the k-anonymity server shouldn't be able to identify unique users calling Join. If the operator cheats, and examines the database stored by Join, we're protecting users by only sending to Join a value b with entropy limited to $j$ bits. We expect $8\leq j\leq 16$, meaning there are no more than 65,536 different possible identifiers; which is far lower than the number of 1-day active users of desktop Chrome.

With our IP blindness principles, we have no way of distinguishing users that share a b when they call Join(b, t, s). When we count the cardinality of a set on the server, each distinct b will be counted only once, even if multiple users join the same set with identical values of b. This means that our cardinality calculation may undercount and that we can only count up to a limit of $2^j$. We expect all the k-anonymity thresholds we need to enforce will have $k\leq 2^j$. Chrome code that calls Join will be part of the Chromium open source codebase, and can enforce on the user's device that b is not longer than $j$ bits.

Abuse and invalid traffic

Abusive or malicious writes to Join can undermine the k-anonymity thresholds, misleading browsers into thinking they are members of a k-anonymous interest group when, in fact, the other members of the group are not real. It is important that we protect Join against malicious write traffic, and, to maintain privacy, that we do this in an anonymous way.

To protect this endpoint we will use Trust Tokens. Every write to Join will require a one-time-use Trust Token be attached to the request, and tokens will be bound to a specific low-entropy identifier, b. Each browser will be issued tokens with its assigned b, and it can spend those tokens as it wishes to make Join calls to the server.

We will operate a Trust Token issuer specific to this server and these tokens; we'll call this issuer Sign. In our current proposal, Sign will require, at least initially for desktop Chrome, that the user be signed-in to Chrome with a Google Account. Requiring sign-in lets us rate limit the number of tokens issued to a given user, assign each user a stable value for b, and prevent naive abuse of Join by anonymous users. Even though the user is signed-in, and Google Account credentials are used to issue Trust Tokens, the Trust Tokens received by Join cannot be linked back to the Google Account they were issued to. The Trust Token issuer can learn which users join a large number of interest groups. To guard against this, we're exploring options that include having the client request tokens at a constant rate and discard unused tokens.

Query is a read-only API, so it doesn't have the same abuse concerns as Join. We won't require Trust Tokens, or a Google Account, for a browser to call Query.

Differential privacy of public data

We're working to ensure that the data this server exposes through Query, that is the set of k-anonymous hashes, meets a quantifiable level of differential privacy and does not reveal information about what sets a non-malicious user may have joined. In addition, we want to bound false negatives and false positives within the set of k-anonymous hashes because false positives pose a privacy risk and false negatives limit the utility of FLEDGE.

Privacy enhancements we are exploring

To build on the privacy protections we are implementing today, there are a few different areas we're researching that could offer even better privacy to Chrome users. None of these approaches are ready for production today, but we commit to continue investing in research, prototyping, and testing in these areas.

Private information retrieval

The Query endpoint to this server receives sensitive information in the form of set hashes that the browser wants to check the k-anonymous property of. If Query requests are batched, with multiple set hashes in a single request, then that request contains cross-site data known to be from a single browser. Private information retrieval (PIR) is a technique that could allow the server to process Query requests, either batched or unbatched, without the server knowing which set hashes are being queried. We're exploring both single-party PIR, which currently has a lot of network and computational overhead, and multi-party PIR, which has less overhead but the additional complexity of operating two non-colluding servers with consistent copies of the dataset.

Anonymous Counting Tokens to replace low-entropy identifiers

We're working on researching and testing a privacy improvement to low-entropy browser identifiers. The $j$-bit identifier, b, is constant for a given browser, which allows some inferences to be made by the Join server, in spite of collisions between users. To improve the privacy of this scheme and increase the accuracy of our cardinality calculations, while maintaining our ability to prevent abusive traffic, we're developing a new token scheme that we're calling Anonymous Counting Tokens.

Anonymous Counting Tokens will allow a token issuer (the Sign server) to issue tokens to a client that are associated with a value that the client provides. The issuer will be able to ensure that the client can obtain only a single token for a given value without knowing the value the client is requesting a token for, i.e. the tokens are blind.

The Sign server will use Anonymous Counting Tokens to issue tokens signed for a single set hash, in contrast to the low-entropy identifier design where tokens are signed for a value derived from a first party identity the Sign server is given by the client. Each user will be able to request only a single token for each set hash and the Join server will verify, in what will become a Join(t, s) call, that the token was issued for the set hash s that the client is attempting to join. It may be surprising that this behavior is even possible without the token issuer learning anything about the set hashes joined by a particular user. A forthcoming paper will describe the cryptographic technique that makes this possible.

To support validity periods for the Anonymous Counting Tokens where each client can get one token per value per period, we will have a registration mechanism which will enable clients to refresh their parameters per TTL period and obtain a fresh token for each period.

Similar to low-entropy identifiers, Sign will require with Anonymous Counting Tokens that the server has a first party identity with the user (a Google Account or other trusted identity provider) for the server to issue tokens.

Trusted execution environments

We are exploring approaches to transition components of this server to run in trusted execution environments (TEEs) with open source code. TEEs implemented by chip manufacturers and cloud providers could allow the browser to verify the server code executing matches the open source project and offer encryption of the server's RAM while in-use, protecting some of the server's data from insider access. Combined with thoughtful key management, TEEs could offer an opportunity to increase the privacy of other server functions like counting cardinalities and persisting state.

Device attestations

Our initial reliance on Google Accounts to issue Trust Tokens that authenticate writes is necessary partly because we don't have other methods of authenticating a Chrome browser to a server. Some platforms, like Android with SafetyNet attestations, can assure the server that a request originates from a legitimate device and client application. Desktop Chrome, however, runs on many different platforms with varying degrees of platform-level security. In the future we hope to develop methods of attesting to our server that requests are from a legitimate instance of desktop Chrome without necessarily requiring the user to be signed-in to their Google Account.

The impact of our privacy decisions on advertisers

The choices we are making here to protect user privacy impact the behavior of the server, and the behavior of FLEDGE within Chrome.

By using low-entropy identifiers that intentionally collide among browsers we by design undercount the cardinality of a given set hash. This has the potential to require more than k browsers to join a set before it is marked k-anonymous. Even if $n>k$ browsers join a given set, if all those browsers by chance have the same b, then the set won't be marked k-anonymous. We will mitigate this by choosing a uniform distribution of b identifiers across browsers. Over time we hope to migrate from low-entropy identifiers to the anonymous token scheme, which does not undercount cardinality in the same way.

To ensure differential privacy of the output data from this server, i.e. the set of k-anonymous set hashes, we must limit how frequently we update the data. We must also add noise to the membership of a given set hash in the output. These restrictions mean that an interest group will not be marked k-anonymous immediately after the $k^{\rm th}$ user joins; there may be some delay due to added noise. This noise is expected to be larger than the undercounting error from low-entropy identifiers. Noise is required to provide a differentially-private output data set, so we don't anticipate changing this behavior.

To prevent abuse of the Join API, we are only allowing writes from users that are signed-in to Chrome. Developers can still use FLEDGE for users that aren't signed-in to Google within the Chrome browser, including making calls to Query to check k-anonymity thresholds. However, we recognize that the addition of those users to interest groups won't contribute to counts that make a set hash k-anonymous. We recognize that this may bias the system against interest groups that might be more popular with signed-out users. Over time we expect to reduce, or otherwise eliminate, this potential bias by adding support for device attestation or other approaches to device-level trust.

The Trust Token issuer, Sign, will enforce limits on token issuance to each Google Account. Tokens are one-time-use, so these limits will restrict the number of Join calls a browser can make in a given period of time. This doesn't necessarily limit the number of interest groups the browser can join locally, only the number it will be considered a part of when computing k-anonymity on the server. This limit will be per-user, and the browser can make decisions about which interest groups to spend its tokens on and make Join calls for.