-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Matchkey Specification: UAV as fallback MKP #34
Comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Matchkey Specification: UAV as fallback MKP
Summary:
Our original design for setting matchkeys allowed for two ways of setting the matchkey 1) where a MKP calls the
set_match_key()
API and 2) when no MKP has set a key the device would generate its own matchkey the first timeget_encrypted_matchkey()
is called. The second case leads to the need for much larger size matchkeys to avoid collisions (think birthday paradox). Here we propose an additional way to set the matchkey which allows the User Agent Vendor to be a fallback matchkey provider for its User Agents.The User Agent Vendor would supply each of its User Agents with a unique matchkey. These matchkeys would only enable same-user-agent attributions, thus enabling the same functionality as having the UA set its own matchkey randomly. But what we achieve here is a significant optimization in terms of the size required for these matchkeys. Since they can be set uniquely as the UAV we don’t need to increase the size to bound the probability of spurious collisions. Since sorting in the MPC attribution scales with the number of bits in the matchkey and sort is still the most expensive stage in MPC, this translates to a large optimization for the MPC.
Setting the matchkey
We propose three ways of setting the matchkey: 1) by a Matchkey Provider (MKP), 2) by the User Agent Vendor (UAV), or 3) by the UA itself. We also propose that the matchkey itself encodes how it was set so there is no overlap between matchkeys set from the different methods.
For the structure of the matchkey
The first bit is called
set_by_MKP
and is a 1 if the matchkey has been set by that matchkey provider and 0 otherwise. When a MKP sets a matchkey they will set the 44 bitmk_from_MKP
part to be unique for each of their users.In the case a MKP has not set their matchkey on a UA and the get_encrypted_matchkey() is called, the API will still return a matchkey but it will be one unique to that user agent. For setting such a matchkey we propose that the UAV can act as a fallback MKP supplying each of its user agents with a unique matchkey. The second bit of the matchkey is called
set_by_UAV
and is a 1 if the UAV has set the matchkey and 0 otherwise. In the case the UAV has set the matchkey, they will use the next 8 bits to set theirUAV_id
. The reason for this is that we don’t want collisions coming from different UAVs and don’t want to rely on the User Agent String separate from these IPA APIs to determine which UA a user is using. With the remaining 34 bits the UAV will provide each of its UA with a unique key. Allocating 8 bits for theUAV_id
would help to future proof the design and if not that many are needed immediately then browsers with lots of users could take multiple entries (from which they can allocate as though they were additional bits), which would allow small and large browsers to coexist.In the case that neither a MKP nor a UAV have set a matchkey the first 2 bits will be zeros and the UA will generate and store its own matchkey the first time
get_encrypted_matchkey()
is called. Since UAs would be doing this independently of each other there will be the possibility of collisions.Hiding user counts
When the MKP or the UAV sets a matchkey uniquely for each of its users, they likely do not want the matchkeys to reveal anything about their total user counts (in case there is a way with device access to learn what matchkeys have been set on your device). To prevent this a MKP/UAV could use a keyed pseudo-random permutation (PRP) to map from a user index [0, number_of_users) to a still small space [0,2^34). Since this is a permutation, it will not introduce any collisions but will prevent a few matchkeys from telling anything about how many total users a MKP has. This may be one such available primitive to use for the PRP.
Collisions for UA generated matchkeys
As mentioned above, for matchkeys that are generated independently on devices when no MKP or UAV has set the matchkey there is the possibility of collisions. We would like to see how many such matchkeys a query can contain before there start to be spurious collisions. The expected number of matchekeys that are the same as some other in the query is$E=n(1-(1-1/N)^{n-1})$ where n is query size and N is the number of matchkeys.
Thanks @martinthomson for lots of good discussions on this and for suggesting how to hide the user counts and @benjaminsavage, @richajaindce and @akoshelev for reviewing.
The text was updated successfully, but these errors were encountered: