Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Matchkey Specification: UAV as fallback MKP #34

Open
bmcase opened this issue Feb 22, 2023 · 1 comment
Open

Matchkey Specification: UAV as fallback MKP #34

bmcase opened this issue Feb 22, 2023 · 1 comment

Comments

@bmcase
Copy link
Collaborator

bmcase commented Feb 22, 2023

Matchkey Specification: UAV as fallback MKP

Summary:

  • We propose a way to optimize the matchkey size by letting the User Agent Vendor (UAV) be a fallback matchkey provider for its User Agents (UAs) instead of having the UA generate random on-device matchkeys when none has been set by a Matchkey Provider (MKP).

Our original design for setting matchkeys allowed for two ways of setting the matchkey 1) where a MKP calls the set_match_key() API and 2) when no MKP has set a key the device would generate its own matchkey the first time get_encrypted_matchkey() is called. The second case leads to the need for much larger size matchkeys to avoid collisions (think birthday paradox). Here we propose an additional way to set the matchkey which allows the User Agent Vendor to be a fallback matchkey provider for its User Agents.

The User Agent Vendor would supply each of its User Agents with a unique matchkey. These matchkeys would only enable same-user-agent attributions, thus enabling the same functionality as having the UA set its own matchkey randomly. But what we achieve here is a significant optimization in terms of the size required for these matchkeys. Since they can be set uniquely as the UAV we don’t need to increase the size to bound the probability of spurious collisions. Since sorting in the MPC attribution scales with the number of bits in the matchkey and sort is still the most expensive stage in MPC, this translates to a large optimization for the MPC.

Setting the matchkey

We propose three ways of setting the matchkey: 1) by a Matchkey Provider (MKP), 2) by the User Agent Vendor (UAV), or 3) by the UA itself. We also propose that the matchkey itself encodes how it was set so there is no overlap between matchkeys set from the different methods.

For the structure of the matchkey

  1. The first bit is called set_by_MKP and is a 1 if the matchkey has been set by that matchkey provider and 0 otherwise. When a MKP sets a matchkey they will set the 44 bit mk_from_MKP part to be unique for each of their users.

  2. In the case a MKP has not set their matchkey on a UA and the get_encrypted_matchkey() is called, the API will still return a matchkey but it will be one unique to that user agent. For setting such a matchkey we propose that the UAV can act as a fallback MKP supplying each of its user agents with a unique matchkey. The second bit of the matchkey is called set_by_UAV and is a 1 if the UAV has set the matchkey and 0 otherwise. In the case the UAV has set the matchkey, they will use the next 8 bits to set their UAV_id. The reason for this is that we don’t want collisions coming from different UAVs and don’t want to rely on the User Agent String separate from these IPA APIs to determine which UA a user is using. With the remaining 34 bits the UAV will provide each of its UA with a unique key. Allocating 8 bits for the UAV_id would help to future proof the design and if not that many are needed immediately then browsers with lots of users could take multiple entries (from which they can allocate as though they were additional bits), which would allow small and large browsers to coexist.

  3. In the case that neither a MKP nor a UAV have set a matchkey the first 2 bits will be zeros and the UA will generate and store its own matchkey the first time get_encrypted_matchkey() is called. Since UAs would be doing this independently of each other there will be the possibility of collisions.

IPA Matchkey Specification (1)

Hiding user counts

When the MKP or the UAV sets a matchkey uniquely for each of its users, they likely do not want the matchkeys to reveal anything about their total user counts (in case there is a way with device access to learn what matchkeys have been set on your device). To prevent this a MKP/UAV could use a keyed pseudo-random permutation (PRP) to map from a user index [0, number_of_users) to a still small space [0,2^34). Since this is a permutation, it will not introduce any collisions but will prevent a few matchkeys from telling anything about how many total users a MKP has. This may be one such available primitive to use for the PRP.

Collisions for UA generated matchkeys

As mentioned above, for matchkeys that are generated independently on devices when no MKP or UAV has set the matchkey there is the possibility of collisions. We would like to see how many such matchkeys a query can contain before there start to be spurious collisions. The expected number of matchekeys that are the same as some other in the query is $E=n(1-(1-1/N)^{n-1})$ where n is query size and N is the number of matchkeys.

  • Using this formula, if we can tolerate an expected number of collisions < 10, then with 43 bits for the matchkey we need to have fewer than ~9.3M randomly generated matchkeys in a query.
  • If we can tolerate an expected number of collisions < 1, then with 43 bits for the matchkey we need to have fewer than ~3M randomly generated matchkeys in a query.

Thanks @martinthomson for lots of good discussions on this and for suggesting how to hide the user counts and @benjaminsavage, @richajaindce and @akoshelev for reviewing.

@bmcase
Copy link
Collaborator Author

bmcase commented Feb 23, 2023

cc @npdoty who asked a question related to matchkey collisions at the last PATCG

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant