Aggregation Service load testing framework

We welcome your feedback on this document as we prepare to add it to our public guidance repository.

We encourage ad techs to run load testing on 100% of production traffic:

  1. Ad techs should assess conversion attribution measurement using the Attribution Reporting API as their reporting use case.
  2. Ad techs should make design decisions while minimizing noise (reference: design decisions modeled).
  3. While testing, ad techs should keep track of the number of jobs they run per day (e.g., per-advertiser jobs), the estimated distribution of conversion event volume, the number of aggregate keys as input per processing job (refer to the output_domain_blob_prefix job parameter in the Aggregation Service API documentation), and the estimated average conversion events per input report.
  4. For testing, ad techs should look up the recommended instance type in the sizing guidance table based on their expected job size (i.e., report volume and domain size) and size their deployed Aggregation Service accordingly. Reference: Sizing guidance for Aggregation Service on AWS
  5. Ad techs should execute aggregation jobs for load tests.

Goals

This guidance is specific to aggregate conversion attribution measurement and will include key setup and configuration instructions intended for use by ad techs to:

  • Estimate load expectations for aggregate conversion attribution measurement.
  • Optimize their key setup and configuration for performance and noise based on the dimensions and goals they intend to measure, and the size and segmentation of their advertisers.

Prerequisite

This guide is intended for an ad tech audience. Before going through the following steps, you should review our documentation on working with noise and summary report design decisions, and experiment with Noise Lab to find an optimal configuration.

Steps

1. Initial aggregation key setup strategy

Determine how many different key structures (i.e. set of dimensions) you need based on your business type and objectives. Note that optimizing your key structure could help reduce the noise in reports.

The number of advertisers you have
For example, say you have 1,000 advertisers.

The similarities between your advertisers
Similarities should be assessed based on volume of conversions, relative conversion values, and general coverage of advertiser characteristics. The more similar the advertisers within each group are, the more finely tuned your results will be (due to less variance in output values), and therefore the lower the impact of noise. Refer to advanced key management for more details. For example, an ad tech can segment its advertisers by industry, spend, and conversion volume as follows:

  • Industry (for example: Insurance, Jewelry, Growth Retail)
  • Spend (for example: <$50,000/quarter, $50,000-$150,000/quarter, $150,000-$250,000/quarter)
  • Conversion volume (Low, Medium, High)

The number of aggregate key structures to be created
For example, 27 (3 x 3 x 3): 3 industries, 3 spend tiers, and 3 conversion volume groupings.
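
As a rough illustration, the 27 key structures in this example can be enumerated as the cross product of the three segmentation axes. The segment labels come from the example above; the variable names are hypothetical and only serve this sketch.

```python
from itertools import product

# Segmentation axes from the example above (labels are illustrative).
industries = ["Insurance", "Jewelry", "Growth Retail"]
spend_tiers = ["<$50,000", "$50,000-$150,000", "$150,000-$250,000"]
conversion_volumes = ["Low", "Medium", "High"]

# One key structure per (industry, spend tier, conversion volume) combination.
key_structures = list(product(industries, spend_tiers, conversion_volumes))

print(len(key_structures))  # 27
```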

2. Identify aggregation key dimensions

Next, identify the important dimensions you want to track for both impressions and conversions to estimate the number of source and trigger side keys.

For each aggregation key structure, the important dimensions that you need to track for impressions will help you determine the number of source side keys. Dimensions will depend on the advertiser type from step 1 above (i.e., industry, spend, conversion volume). The following examples help explain dimensions:

  • Key Structure 1: (Industry = insurance, spend = <50,000, conversion volume = low)

    • A: 4 dimensions: Campaign (e.g.: 50 possibilities), Ad group (e.g.: 20 possibilities), Device type (e.g.: 5 possibilities), Geo (e.g.: 50 possibilities)
      1. Possible dimensional combinations = 50 x 20 x 5 x 50 = 250,000. This represents the number of possible dimensional combinations for source side keys for key structure 1.
      2. Need to reserve 18 bits (18 bits = 262,144 possible combinations)
  • Key Structure 2: (Industry = insurance, spend = <50,000, conversion volume = medium)

    • A: 4 dimensions: Campaign (e.g.: 30 possibilities), Ad group (e.g.: 80 possibilities), Ad type (e.g.: 3 possibilities), Geo (e.g.: 50 possibilities).
      1. Possible dimensional combinations = 30 x 80 x 3 x 50 = 360,000. This represents the number of possible dimensional combinations for source side keys for key structure 2.
      2. Need to reserve 19 bits (19 bits = 524,288 possible combinations)
  • Key Structure 3: Repeat (similarly plan for all Key Structures you have)
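
The bit counts above follow from taking the smallest number of bits b such that 2^b is at least the number of dimensional combinations. A minimal sketch of that arithmetic follows; the helper name bits_needed is hypothetical and not part of any API.

```python
from math import prod

def bits_needed(cardinalities):
    """Return (combinations, bits) for a list of per-dimension cardinalities."""
    combinations = prod(cardinalities)
    # Smallest b such that 2**b >= combinations (at least 1 bit).
    bits = max(1, (combinations - 1).bit_length())
    return combinations, bits

# Key Structure 1: campaign, ad group, device type, geo
print(bits_needed([50, 20, 5, 50]))  # (250000, 18)
# Key Structure 2: campaign, ad group, ad type, geo
print(bits_needed([30, 80, 3, 50]))  # (360000, 19)
```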

For each aggregation key structure, the important dimensions that you need to track for conversions will help you determine the trigger side keys. For example:

  • Key Structure 1: (Industry = insurance, spend = <50,000, conversion volume = low)

    • A: 2 dimensions: Product category (e.g.: 100 possibilities), Conversion type (e.g.: 5 possibilities)
      1. Possible dimensional combinations = 100 x 5 = 500
      2. Need to reserve 9 bits (9 bits = 512 possible combinations)
  • Key Structure 2: (Industry = insurance, spend = <50,000, conversion volume = medium)

    • A: 3 dimensions: Product category (e.g.: 50 possibilities), Product type (10 possibilities), Conversion type (3 possibilities)
      1. Possible dimensional combinations = 50 x 10 x 3 = 1,500
      2. Need to reserve 11 bits (11 bits = 2,048 possible combinations)
  • Key Structure 3: Repeat (similarly plan for all Key Structures you have)
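
In the Attribution Reporting API, a source side key piece and a trigger side key piece are combined with a bitwise OR to form the full aggregation key. The sketch below shows one possible way to reserve the bits for Key Structure 1. The layout (source side bits placed above the trigger side bits), the mixed-radix encoding, and all names are assumptions made for illustration, not a prescribed scheme.

```python
def encode(indices, cardinalities):
    """Mixed-radix encode one index per dimension into a single integer."""
    value = 0
    for index, size in zip(indices, cardinalities):
        if not 0 <= index < size:
            raise ValueError("dimension index out of range")
        value = value * size + index
    return value

# Hypothetical layout for Key Structure 1:
# 18 source side bits sit above 9 trigger side bits.
TRIGGER_BITS = 9
SOURCE_DIMS = [50, 20, 5, 50]  # campaign, ad group, device type, geo
TRIGGER_DIMS = [100, 5]        # product category, conversion type

# Source key piece: upper bits set, trigger side bits left as zeros.
source_piece = encode([49, 19, 4, 49], SOURCE_DIMS) << TRIGGER_BITS
# Trigger key piece: lower bits only.
trigger_piece = encode([99, 4], TRIGGER_DIMS)

# The two pieces do not overlap, so OR-ing them yields the full key.
full_key = source_piece | trigger_piece
print(hex(full_key), full_key.bit_length())
```

Because the largest index in every dimension is used here, the resulting key shows that 18 + 9 = 27 bits are enough for this key structure.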

Estimates for Aggregate Keys

  • Key Structure 1: 250,000 impression keys x 500 conversion keys = 125,000,000 keys
  • Key Structure 2: 360,000 impression keys x 1,500 conversion keys = 540,000,000 keys
  • Key Structure 3: Repeat (similarly plan for all Key Structures you have)
  • Max Aggregate Keys = 540,000,000 keys (across all the key structures). Need to reserve 30 bits (30 bits = 1.07B possible combinations)
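
The estimate above can be reproduced with a few lines of arithmetic. This is a sketch using only the two example key structures; the dictionary layout and names are hypothetical.

```python
# (source side combinations, trigger side combinations) per key structure
key_structures = {
    "Key Structure 1": (250_000, 500),
    "Key Structure 2": (360_000, 1_500),
}

# Aggregate keys per structure = source side keys x trigger side keys.
aggregate_keys = {name: s * t for name, (s, t) in key_structures.items()}

# The largest structure determines how many bits to reserve.
max_keys = max(aggregate_keys.values())
bits = (max_keys - 1).bit_length()

print(aggregate_keys)
print(max_keys, bits)  # 540000000 30
```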

Anticipated conversion volume

For each aggregation key structure, the anticipated volume can be explained using the following examples:

  • Key Structure 1: (Industry = insurance, spend = <50,000, conversion volume = low)
    • A: Anticipate that Key Structure 1 will constitute about $500,000 worth of advertiser spend over the next quarter at an average CPM of $8. Anticipate that this will result in 62,500,000 impressions that need to be registered.
    • Anticipate that the average impression to conversion rate that Key Structure 1 will constitute over the next quarter is 0.08%, resulting in 50,000 attributed conversions that need to be captured. For each conversion, measure the purchase value and purchase count.
  • Key Structure 2: (Industry = insurance, spend = <50,000, conversion volume = medium)
    • A: Anticipate that Key Structure 2 will constitute about $800,000 worth of advertiser spend over the next quarter at an average CPM of $10. Anticipate that this will result in 80,000,000 impressions that need to be registered.
    • Anticipate that the average impression to conversion rate that Key Structure 2 will constitute over the next quarter is 0.03125%, resulting in 25,000 attributed conversions that need to be captured. For each conversion, measure the purchase value and purchase count.
  • Repeat for each Key Structure
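
The volume figures above come from two steps: impressions = spend / CPM x 1,000, and attributed conversions = impressions x conversion rate. A minimal sketch of that calculation follows; the function name is hypothetical.

```python
def estimate_volume(spend_usd, cpm_usd, conversion_rate):
    """Return (impressions, attributed conversions) for a quarter."""
    # CPM is the price per 1,000 impressions.
    impressions = spend_usd / cpm_usd * 1_000
    conversions = impressions * conversion_rate
    return round(impressions), round(conversions)

# Key Structure 1: $500,000 spend, $8 CPM, 0.08% conversion rate
print(estimate_volume(500_000, 8, 0.0008))      # (62500000, 50000)
# Key Structure 2: $800,000 spend, $10 CPM, 0.03125% conversion rate
print(estimate_volume(800_000, 10, 0.0003125))  # (80000000, 25000)
```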

Reporting delivery and batching frequency (batch per advertiser)

For each aggregation key structure, you will need conversion reports delivered on a recurring basis. We recommend that ad techs batch by advertiser (for cleaner separation of data per report and more efficient aggregation) and use the report's shared_info.scheduled_report_time field for batching.

  • A: Hourly
  • B: Daily
  • C: Weekly

Notes

  • For batching by advertiser, verify SLAs with advertisers.
  • More frequent batching means fewer reports per batch, so noise has a larger relative impact on each batch's results. (Refer: Decision: Batch frequency).

  • To avoid errors due to incorrect batching, ensure batches use the scheduled_report_time field, not the report arrival time. For example, if you batch every hour, your batch for 11am should only include reports with a scheduled_report_time between 10am and 11am, and not reports that arrived between 10am and 11am with a different scheduled_report_time (e.g., 9am).
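
The batching rule in the last note can be sketched as follows. This example assumes each report is already parsed into a dictionary whose shared_info.scheduled_report_time holds Unix epoch seconds as a string, that hourly windows are half-open (10am inclusive, 11am exclusive), and that the "id" and "arrived" fields exist only for illustration. All function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

BATCH_SECONDS = 3600  # hourly batching

def batch_start(report):
    """Start of the hourly window containing the report's scheduled_report_time."""
    scheduled = int(report["shared_info"]["scheduled_report_time"])
    return scheduled - scheduled % BATCH_SECONDS

def group_into_batches(reports):
    """Group reports by scheduled_report_time window, ignoring arrival time."""
    batches = defaultdict(list)
    for report in reports:
        batches[batch_start(report)].append(report)
    return dict(batches)

def ts(hour, minute=0):
    """Epoch seconds for a time on an arbitrary example day (UTC)."""
    return int(datetime(2024, 1, 1, hour, minute, tzinfo=timezone.utc).timestamp())

reports = [
    {"id": "a", "arrived": ts(10, 5), "shared_info": {"scheduled_report_time": str(ts(10, 2))}},
    # Arrived after 10am but was scheduled at 9:55am, so it belongs to the 9am-10am window.
    {"id": "b", "arrived": ts(10, 40), "shared_info": {"scheduled_report_time": str(ts(9, 55))}},
    {"id": "c", "arrived": ts(10, 45), "shared_info": {"scheduled_report_time": str(ts(10, 30))}},
]

batches = group_into_batches(reports)
for start in sorted(batches):
    print(start, [r["id"] for r in batches[start]])
```

Report "b" lands in the earlier window even though it arrived alongside "a" and "c", which is the behavior the note describes.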

Estimates for Report Volume

  • Key Structure 1: 50,000 attributed conversions / 2,160 (hourly reporting, hours in a quarter) ≈ 23.1, rounded up to 24 summary reports per hour per advertiser (24 x 1,000 advertisers = 24K summary reports)
  • Key Structure 2: 25,000 attributed conversions / 2,160 (hourly reporting, hours in a quarter) ≈ 11.6, rounded up to 12 summary reports per hour per advertiser (12 x 1,000 advertisers = 12K summary reports)
  • Key Structure 3: Repeat
  • Total number of summary reports per hour = 24 summary reports for key structure 1 + 12 summary reports for key structure 2 + ... = ... per hour per advertiser
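
The per-hour figures above can be checked with the sketch below, which assumes a 90-day quarter (2,160 hours), rounds up to whole reports, and uses the 1,000-advertiser example from step 1. The names are hypothetical.

```python
from math import ceil

HOURS_PER_QUARTER = 90 * 24  # 2,160 hours
ADVERTISERS = 1_000

def reports_per_hour(attributed_conversions):
    """Reports per hour per advertiser, rounded up to a whole report."""
    return ceil(attributed_conversions / HOURS_PER_QUARTER)

per_advertiser = {
    "Key Structure 1": reports_per_hour(50_000),
    "Key Structure 2": reports_per_hour(25_000),
}

for name, count in per_advertiser.items():
    # Per-advertiser rate, then the total across all advertisers.
    print(name, count, count * ADVERTISERS)
```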

Feedback summary

Understanding the following estimates from ad techs helps us plan features and improvements to support the scale that ad techs require. We suggest that you share the estimates below with us. See our sizing guidance for Aggregation Service on AWS for more information:

  • Max input domain keys (keys to aggregate for) per aggregation service job
  • Max input reports volume per job (attributed conversions)
  • Estimated contributions per report (key/value pairs in a report)
  • Estimated distribution of attributed conversions per job
  • Estimated distribution of domain keys in a job
  • Estimated number of jobs per hour/day/week