
Seeking FLEDGE participants' expectations/experience on the usage of the FLEDGE trusted server #290

Closed
peiwenhu opened this issue Apr 18, 2022 · 12 comments
Labels: Looking for feedback (Design issues looking for partner feedback)

Comments

@peiwenhu
Contributor

Hi FLEDGE participants, this is @peiwenhu, working on the FLEDGE trusted server system that will eventually replace the current BYOS model. We would like to understand more about how you expect to use the trusted server so that the product considers your needs. We are also interested in any input beyond the starter questions listed below.

I: Dataset

  1. [If you run the BYOS mode server] How often do you update the data served by your key value server in the current BYOS mode? (Every minute, hour, day, etc?)
  2. How often would you prefer to update your data served by your key value server eventually?
  3. [If you run the BYOS mode server] How do you perform data updates to your key value server in the current BYOS mode? (batching, streaming, etc?)
  4. How would you prefer to perform data updates to your key value server eventually?
  5. For data updates, roughly what percentage of the dataset is involved each time? (average, 90th percentile, 100th percentile)
  6. How much data staleness can you tolerate (from when you start to update the key/value dataset to when they are available to user devices)? What is the impact if you must accommodate data staleness > 5min? 10min?
  7. How large do you expect your dataset to be?
  8. Would you be able to break your data down into different shards served completely separately by different trustedBiddingSignalsUrl/trustedScoringSignalsUrl endpoints? (See the sketch below.)
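
For question 8, a purely illustrative sketch of what such sharding could look like on the buyer side. Field names follow the FLEDGE explainer; the origins, shard URLs, and keys are made up for illustration and are not a recommended layout.

```ts
// Hypothetical sketch: two interest groups whose bidding signals are served
// by two completely separate key/value shards.
const shoesGroup = {
  owner: 'https://dsp.example',
  name: 'running-shoes',
  biddingLogicUrl: 'https://dsp.example/bid.js',
  // Shard A: product metadata served by one KV deployment.
  trustedBiddingSignalsUrl: 'https://kv-products-a.dsp.example/getvalues',
  trustedBiddingSignalsKeys: ['product-123', 'product-456'],
};

const travelGroup = {
  owner: 'https://dsp.example',
  name: 'travel-deals',
  biddingLogicUrl: 'https://dsp.example/bid.js',
  // Shard B: a different vertical served by a separate KV deployment.
  trustedBiddingSignalsUrl: 'https://kv-products-b.dsp.example/getvalues',
  trustedBiddingSignalsKeys: ['hotel-789'],
};

// 30-day membership duration, in seconds.
(navigator as any).joinAdInterestGroup(shoesGroup, 30 * 24 * 60 * 60);
(navigator as any).joinAdInterestGroup(travelGroup, 30 * 24 * 60 * 60);
```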

II: Performance

  1. How many requests per second do you expect your system to handle?
  2. Have you run into any performance issues in the current BYOS model or would you like to call out any particular areas worth paying attention to?

III: Interface

  1. Chrome intends to provide the specific server that key/value data is queried from. We are also designing peripheral components that will allow other operations, such as updates and versioning of the data, plus techniques to support high workloads. Would you prefer to build that infrastructure separately or use the Chrome reference implementation?
  2. [If you run the BYOS mode server] What debugging scenarios have you run into in the current BYOS mode?
  3. What debugging functionalities would you like to have?
  4. For the “data version header”: what are your thoughts if the server decides which version of the data to return, the data of a specific version can be looked up internally, but the client software cannot specify a version in its requests? The server may also decide not to return the version at all. (See the sketch after this list.)
  5. [If you run the BYOS mode server] Do you run your server with any known difference from the server outlined in the FLEDGE proposal?
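
For question 4, a rough sketch of how the data version mechanism could surface, with the server choosing the version and advertising it via a Data-Version response header. The request shape, response body, and worklet field below are illustrative assumptions, not a finalized interface.

```ts
// Illustrative only: the server picks the snapshot version and advertises it
// via a Data-Version response header; clients cannot request a version.
//
//   GET /getvalues?hostname=publisher.example&keys=product-123
//   HTTP/1.1 200 OK
//   Data-Version: 102
//   { "product-123": { "price": 19.99 } }
//
// In the bidding worklet, the version the browser saw could then be echoed
// into reporting so offline logs can be joined against the right snapshot.
function generateBid(interestGroup: any, auctionSignals: any,
                     perBuyerSignals: any, trustedBiddingSignals: any,
                     browserSignals: any) {
  const kvVersion = browserSignals.dataVersion; // absent if no header was sent
  return {
    ad: { kvVersion },
    bid: 1.0,
    render: interestGroup.ads[0].renderUrl,
  };
}
```
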
@jonasz
Contributor

jonasz commented Apr 29, 2022

Hi @peiwenhu, I was wondering, what is roughly the timeline you have in mind for discussion, implementation, and deployment of the trusted server? We may have some input for your questions, but we'd only be able to join the discussion sometime in the future, as we're fully booked working on the Origin Trial now.

@p-j-l
Contributor

p-j-l commented Apr 29, 2022

@jonasz, no rush - the timeline we're thinking of here for design and implementation is well after the Origin Trial that's in progress right now.

@arouzaud

arouzaud commented May 5, 2022

Hello,
On Criteo's side, we certainly have various inputs to discuss with you regarding the trusted server, although we first want to assess the capabilities of FLEDGE with a BYOS system during the Origin Trial. We will take some time to elaborate on them on our end and come back to you in the future (potentially after the OT).

@p-j-l
Contributor

p-j-l commented May 6, 2022

We've tried to refine the questions a bit to make them easier to answer; let us know if this makes it easier to give some quick feedback:

  1. Is a propagation latency of 5 minutes from data update to data serving acceptable?
  2. Do you expect your key-value set to be larger than 400GB?
  3. How frequently do you want to make updates to your key-value pairs?
  4. When doing a data update, what percentage of the total data-set is updated?
  5. How would you like to update your key-value pairs (database mutations, HTTP-based API, file drop)? (See the sketch after this list.)
  6. Is the “data version header” important to you? How do you plan on using it?
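
For question 5, a purely hypothetical sketch of two of these update paths; every endpoint, path, and payload shape below is made up for illustration and is not a committed API of the reference implementation.

```ts
// Hypothetical sketch of two of the update paths named above.
async function pushUpdates(): Promise<void> {
  // (a) HTTP-based API: one low-latency mutation per request, suited to
  // frequent incremental updates such as budget/pacing data.
  await fetch('https://kv-admin.dsp.example/v1/keys/campaign-9-budget', {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ value: { remainingMicros: 1_250_000 } }),
  });

  // (b) File drop / batch upload: send many key/value records in one shot
  // and let the server ingest them asynchronously.
  const records = [
    { key: 'product-123', value: '{"price":19.99}' },
    { key: 'product-456', value: '{"price":4.50}' },
  ];
  await fetch('https://kv-admin.dsp.example/v1/batches?mode=full-rewrite', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(records),
  });
}
```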

@jianch2022

re1. 5-minute latency for data updates is good enough for everything except budget data.
re2. I think 400 GB is good enough for the KV payload.
re3. For now, updates to our KeyVal can be hourly. It would be better if we could update every 15 or 10 minutes.
re4. Normally the percentage of the total data updated is small. There could be edge cases such as advertisers making big changes at the end of a month/quarter.
re5. I think this can be flexible; either option might be OK.
re6. The data version might be useful when we want to roll back data in some cases, but I guess we can manage it in the backend.

@appascoe
Collaborator

I can take a first pass at these answers, but we'll need quite a bit of time to answer them more definitively:

  1. A latency of 5 minutes sounds a bit high to me. For example, if a campaign is misconfigured, it could very, very quickly overspend its budget, and we'd want to shut it off ASAP. I'd expect something closer to 1s. This may not be doable for bulk updates, but maybe single query updates could be prioritized?
  2. 400GB is probably fine.
  3. This is pretty dependent on the situation, as outlined in 1. I'd expect that most of our KV pairs would be updated every five minutes, typically.
  4. As we'd use the KV store for pacing, I'd expect that almost all of our data would be updated every five minutes.
  5. We'd almost certainly prefer an HTTP API, and would express our usage of the KV store in QPS.
  6. Yes, I believe the data version header would be useful. We still need to scope out a bunch of ways that we could use the KV store, but, for example, I could see passing back some model parameters that would only be compatible with certain models in the bidding function. Alternatively, we could just encode that as part of the JSON value.

@p-j-l
Contributor

p-j-l commented May 11, 2022

To add a little more context here: our aim right now is to work out what features to build into a very first version of a reference implementation so that we can start testing with anyone who'd like to.

I fully expect the requirements will change as we get into this and we're open to adding features as we go.

@suprajasekhar

Here are some expectations and estimates for the trusted auction server from Google Ad Manager.

  • Is a propagation latency of 5 minutes from data update to data serving acceptable?

Yes, up to 15 minutes of data propagation delay is acceptable.

  • Do you expect your key-value set to be larger than 400GB?

Since we are in the early phase of the FLEDGE origin trials, it is fuzzy how the key-value set will grow. It largely depends on how buyers decide to structure their creatives in FLEDGE. For example, with product-level TURTLEDOVE (“Ads composed of multiple pieces”) and considering creatives immutable, the size of the creative population may be significantly different compared to today's state. Hence the numbers below are only a loose estimate, and we hope the key-value server design will offer sufficient scalability for the data size to grow over time.

         Median    p75       p99
Year 1   135 GB    275 GB    520 GB
Year 3   235 GB    750 GB    2 TB
  • How frequently do you want to make updates to your key-value pairs?

We would like to do both incremental and batch updates to the key-value pairs. The incremental updates will be continuous and the batch updates will happen once every 2 hours.

  • When doing a data update, what percentage of the total data-set is updated?

During incremental updates, we will update ~3.5K keys per second, and occasionally (p99 estimate) we will update close to 100% of the data set. This could happen due to either new policies or changes to models.

  • How would you like to update your key-value pairs (database mutations, http based api, file drop)

An HTTP-based API will be the most convenient, provided it can support both the incremental update QPS (mentioned above) and a batch mode, which would require support for long and reliable file uploads to update the entire dataset.

  • Is the “data version header” important to you? How do you plan on using it?

The data version header will be helpful when rolling out changes, and for logging and debugging, though we don’t plan to use it immediately.

@chinyetlin

This is a revised response, with slightly more detail than a previous comment, to reflect the use cases of Google's buy-side platform:

  1. Is a propagation latency of 5 minutes from data update to data serving acceptable?

    • For ad metadata from the advertisers and the budget data, 5 minutes is acceptable.
    • For budget data, we prefer to have a lower latency, i.e. within 30 seconds at the 99.9th percentile and within 1 second at the 50th percentile.
  2. Do you expect your key-value set to be larger than 400GB?

    • Yes. 400GB is probably not enough to contain all our key-value sets.
    • Our dataset can be split into multiple key-spaces (e.g. smaller virtual datasets), each of which should remain under 400 GB for the near-term future. However, our assumption is that we are able to execute some customized logic to join multiple datasets to produce the final output. If this is not possible and we need to precompute all possible combinations of the data joins, the size will greatly exceed this limit.
  3. How frequently do you want to make updates to your key-value pairs?

    • For ad metadata, we would like to support both incremental and batch updates to the key-value pairs. The batch update frequency is every 2 hours, while the incremental updates are continuous, on the order of 1K QPS.
    • Budget data updates are on the order of 45K QPS.
    • These incremental update rates are based on the current remarketing ads and may be subject to future organic growth.
  4. When doing a data update, what percentage of the total data-set is updated?

    • It is preferable to load all key-value pairs in a single batch in two scenarios: (1) when the server starts up, and (2) when the server needs to roll back to a previous “good” version to mitigate data corruption. Otherwise, it is more common to load data updates for a small set of keys, triggered by changes made by our customers.
    • For the batch update, if we take into account the budget data, close to 100% of the data set needs to be updated.
    • For the incremental update
      • The ad metadata changes by about 3% per minute.
      • The budget data changes by about 62% per minute.
  5. How would you like to update your key-value pairs (database mutations, http based api, file drop)

    • A combination of an API endpoint and file drop. An HTTP-based API is probably more useful for the incremental updates if it can support up to 1K QPS. File drop might be more useful for the batch updates.
    • The server can load from the batch update when it starts up, then consume the subsequent incremental updates so that it catches up with the latest changes. We might want certain data freshness constraints, i.e. the server can only serve incoming requests after it catches up with the more recent incremental updates. We might need to define some data staleness signals to enforce this kind of constraint. (A rough sketch of this startup/catch-up flow is included after this list.)
  6. Is the “data version header” important to you? How do you plan on using it?

    • The data version header will be very useful for reconstructing auction/bidding signals when logging the winning event. We plan to use these signals for model training.
    • It would be much more useful if we were able to assign a data version to each virtual dataset. With a single data version, we need a more complicated design to cover multiple datasets with different update patterns, e.g. budget data might be updated per minute while other data could be updated only once per day.
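
A rough sketch of the startup/catch-up flow described in answer 5 above, under assumed interfaces; loadSnapshot, streamUpdates, and startServing are hypothetical hooks for illustration, not the trusted server's actual API.

```ts
// Illustrative only: load the latest batch snapshot, replay incremental
// updates, and gate serving on a data-staleness signal.
const MAX_STALENESS_MS = 60_000; // example freshness constraint

interface Update { key: string; value: string; timestampMs: number; }

async function startKvServer(
  loadSnapshot: () => Promise<{ data: Map<string, string>; asOfMs: number }>,
  streamUpdates: (sinceMs: number) => AsyncIterable<Update>,
  startServing: (data: Map<string, string>) => void,
): Promise<void> {
  // 1. Bulk-load the most recent batch snapshot (e.g. from a file drop).
  const { data, asOfMs } = await loadSnapshot();
  let watermarkMs = asOfMs;
  let serving = false;

  // 2. Replay incremental updates; only start serving once fresh enough.
  for await (const u of streamUpdates(watermarkMs)) {
    data.set(u.key, u.value);
    watermarkMs = Math.max(watermarkMs, u.timestampMs);
    if (!serving && Date.now() - watermarkMs < MAX_STALENESS_MS) {
      serving = true;
      startServing(data); // only now answer lookup requests
    }
  }
}
```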

@maciejkowalczyk

maciejkowalczyk commented Aug 25, 2022

Hi,
Here are answers for RTB House's current idea of Trusted Bidding Server usage replacing the BYOS model:

Is a propagation latency of 5 minutes from data update to data serving acceptable?

Yes, it is

Do you expect your key-value set to be larger than 400GB?

We should be able to shard our KV data in order to fit this limit, but we would prefer to work with 1 TB instances.

How frequently do you want to make updates to your key-value pairs?

We expect to have a constant stream of up to 10k-20k updates per second, plus two full-rewrite batches per day.

When doing a data update, what percentage of the total data-set is updated?

The incremental updates would each change a single key, while the batches would update almost all of our ~1G keys.

How would you like to update your key-value pairs (database mutations, http based api, file drop)

I'm not sure what you mean by "database mutations".
We could send files for the batches. However, for incremental updates, an HTTP API might be preferable as it would hopefully result in lower latency.

Is the “data version header” important to you? How do you plan on using it?

We currently don't plan on using the Data-Version header.

Do you think these numbers are feasible?

@peiwenhu
Contributor Author

@maciejkowalczyk thanks, the numbers are reasonable. Today Privacy Sandbox has an initial version of the server source code available in its own GitHub repo. The functionality coverage will increase as development goes on, but the current implementation is ready for basic trials. We recommend you try it out and provide any feedback there.

@JensenPaul added the "Looking for feedback" label on Aug 29, 2022
@lx3-g

lx3-g commented Apr 11, 2023

The realtime data update functionality has been implemented. Please check out the docs below for more info:
Loading data
Realtime updates capabilities
Please let us know if you have any feedback.
