[WPE][browserperfdash-benchmark] Allow to run run-benchmark plans subtest-by-subtest #47255

clopez · 2025-06-26T19:25:20Z

`00cc05f`

[WPE][browserperfdash-benchmark] Allow to run run-benchmark plans subtest-by-subtest
https://bugs.webkit.org/show_bug.cgi?id=295050

Reviewed by Nikolas Zimmermann.

On the RPi4 32-bit bots, JetStream2 frequently crashes or times out before completing,
which prevents any results from being reported because the current runner only uploads
data if all subtests finish successfully.

We have experimented with running selected subtest sets, but the flakiness remains:
some subtests pass while others fail inconsistently, which makes it very difficult to
select a working set of subtests.

This patch adds to browserperfdash-benchmark the ability to run a given benchmark
plan subtest-by-subtest, so that only one subtest runs at a time.
If a subtest fails, the runner skips it and proceeds to the next. Partial results from the
passing subtests are then uploaded to the dashboard, improving visibility into progress
and regressions even when the full benchmark cannot complete.
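
The skip-and-continue behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from `browserperfdash_runner.py`: the function `run_split_subtests`, the `run_one` callback and the subtest names are all hypothetical, and the real runner may detect failures differently (for example through exit codes or timeouts rather than exceptions).

```python
from typing import Callable, Dict, Iterable, List, Tuple


def run_split_subtests(
    subtests: Iterable[str],
    run_one: Callable[[str], dict],
) -> Tuple[Dict[str, dict], List[str]]:
    """Run each subtest on its own and keep the results of those that pass.

    `run_one` is a hypothetical callback that runs a single subtest and
    either returns its results or raises on a crash or timeout.
    """
    results: Dict[str, dict] = {}
    failed: List[str] = []
    for name in subtests:
        try:
            results[name] = run_one(name)
        except Exception as error:
            # One failing subtest must not discard the results of the others.
            failed.append(name)
            print(f"Subtest {name} failed, skipping it: {error}")
    return results, failed


if __name__ == "__main__":
    def fake_run(name: str) -> dict:
        # Stand-in for a real benchmark run; "subtest-b" always fails.
        if name == "subtest-b":
            raise RuntimeError("timed out")
        return {"score": 100.0}

    partial, skipped = run_split_subtests(
        ["subtest-a", "subtest-b", "subtest-c"], fake_run
    )
    # `partial` holds whatever passed and is what would be uploaded.
    print(sorted(partial), skipped)
```

Collecting results per subtest means the upload step only needs a non-empty `results` mapping, instead of requiring the whole plan to finish.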

To differentiate the standard benchmark plan from the one that is run subtest-by-subtest,
the string `-split-subtests` is appended to the end of the benchmark plan name. So, to
run the plan jetstream2 subtest-by-subtest, specify a virtual benchmark plan named
`jetstream2-split-subtests`.
Since the benchmark plan name is visible on the dashboard, it is also possible to tell
the complete jetstream2 run apart from the subtest-by-subtest one there.

All the benchmark plans that support subtests can be specified this way (not only JetStream2),
and the list of those is visible when the flag `--list-plans` is passed to the runner.
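
As an illustration of the naming scheme, the sketch below maps a requested plan name back to the real plan and derives the virtual plan names for every plan that has subtests. The helper names `parse_plan_name` and `list_plan_names`, the plan `hypothetical-plan`, the subtest names and the alphabetical ordering are assumptions made for the example; only the `-split-subtests` suffix and the `jetstream2` plan name come from the description above.

```python
from typing import Dict, List, Tuple

SPLIT_SUFFIX = "-split-subtests"


def parse_plan_name(requested: str) -> Tuple[str, bool]:
    """Return (real_plan_name, run_split) for a requested plan name."""
    if requested.endswith(SPLIT_SUFFIX):
        return requested[: -len(SPLIT_SUFFIX)], True
    return requested, False


def list_plan_names(plans: Dict[str, List[str]]) -> List[str]:
    """List regular plans plus a virtual split plan for each plan with subtests.

    `plans` maps a plan name to its list of subtests (empty if it has none).
    """
    names: List[str] = []
    for plan, subtests in sorted(plans.items()):
        names.append(plan)
        if subtests:
            # Only plans that define subtests can be run subtest-by-subtest.
            names.append(plan + SPLIT_SUFFIX)
    return names


if __name__ == "__main__":
    print(parse_plan_name("jetstream2-split-subtests"))
    print(list_plan_names({"jetstream2": ["subtest-a", "subtest-b"],
                           "hypothetical-plan": []}))
```

Keeping the suffix in the plan name itself means the dashboard needs no extra field to separate the two kinds of run.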

* Tools/Scripts/webkitpy/browserperfdash/browserperfdash_runner.py:
(BrowserPerfDashRunner.__init__):
(BrowserPerfDashRunner._parse_config_file):
(BrowserPerfDashRunner._get_plan_version_hash):
(BrowserPerfDashRunner._get_benchmark_runner_split_subtess_plans):
(BrowserPerfDashRunner._run_benchmark_runner_plan_split_subtests):
(BrowserPerfDashRunner._run_plan):
(BrowserPerfDashRunner.run):

Canonical link: https://commits.webkit.org/296877@main

3b430de

Misc	iOS, visionOS, tvOS & watchOS	macOS	Linux	Windows
✅ 🧪 style	✅ 🛠 ios	✅ 🛠 mac	✅ 🛠 wpe	⏳ 🛠 win
✅ 🧪 bindings	✅ 🛠 ios-sim	✅ 🛠 mac-AS-debug	⏳ 🧪 wpe-wk2	⏳ 🧪 win-tests
✅ 🧪 webkitperl	~~🧪 ios-wk2~~	~~🧪 api-mac~~	~~🧪 api-wpe~~
✅ 🧪 webkitpy	⏳ 🧪 ios-wk2-wpt	~~🧪 mac-wk1~~	~~🛠 wpe-cairo~~
⏳ 🛠 🧪 jsc	~~🧪 api-ios~~	~~🧪 mac-wk2~~	~~🛠 gtk~~
	✅ 🛠 vision	⏳ 🧪 mac-AS-debug-wk2	~~🧪 gtk-wk2~~
	✅ 🛠 vision-sim	~~🧪 mac-wk2-stress~~	~~🧪 api-gtk~~
	✅ 🧪 vision-wk2	⏳ 🧪 mac-intel-wk2	~~🛠 playstation~~
✅ 🛠 🧪 unsafe-merge	✅ 🛠 tv
	~~🛠 tv-sim~~
	~~🛠 watch~~
	~~🛠 watch-sim~~

webkit-early-warning-system · 2025-06-26T19:25:28Z

EWS run on previous version of this PR (hash 0802621)

Misc	iOS, visionOS, tvOS & watchOS	macOS	Linux	Windows
✅ 🧪 style	✅ 🛠 ios	✅ 🛠 mac	✅ 🛠 wpe	✅ 🛠 win
✅ 🧪 bindings	✅ 🛠 ios-sim	✅ 🛠 mac-AS-debug	✅ 🧪 wpe-wk2	⏳ 🧪 win-tests
✅ 🧪 webkitperl	✅ 🧪 ios-wk2	✅ 🧪 api-mac	✅ 🧪 api-wpe
✅ 🧪 webkitpy	✅ 🧪 ios-wk2-wpt	✅ 🧪 mac-wk1	✅ 🛠 wpe-cairo
	✅ 🧪 api-ios	✅ 🧪 mac-wk2	✅ 🛠 gtk
	✅ 🛠 vision	✅ 🧪 mac-AS-debug-wk2	✅ 🧪 gtk-wk2
	✅ 🛠 vision-sim	✅ 🧪 mac-wk2-stress	✅ 🧪 api-gtk
	✅ 🧪 vision-wk2	✅ 🧪 mac-intel-wk2	✅ 🛠 playstation
	✅ 🛠 tv
	✅ 🛠 tv-sim
	✅ 🛠 watch
	✅ 🛠 watch-sim

Tools/Scripts/webkitpy/browserperfdash/browserperfdash_runner.py

aoikonomopoulos · 2025-06-27T11:14:41Z

Left some (mostly stylistic) suggestions; conceptually this looks solid and is sorely needed!

nikolaszimmermann

Looks good, thanks! Please resolve @aoikonomopoulos's comments prior to landing.

webkit-early-warning-system · 2025-07-01T18:43:24Z

EWS run on current version of this PR (hash 3b430de)


webkit-commit-queue · 2025-07-01T18:46:41Z

Committed 296877@main (00cc05f): https://commits.webkit.org/296877@main

Reviewed commits have been landed. Closing PR #47255 and removing active labels.

clopez requested review from JonWBedard and gsnedders as code owners June 26, 2025 19:25

clopez self-assigned this Jun 26, 2025

clopez added the New Bugs label Jun 26, 2025

clopez requested review from aoikonomopoulos, nikolaszimmermann and a team June 26, 2025 19:29