[WPE][browserperfdash-benchmark] Allow to run run-benchmark plans subtest-by-subtest #47255


Open · wants to merge 1 commit into base: main

Conversation

@clopez (Contributor) commented Jun 26, 2025

Commit 0802621

[WPE][browserperfdash-benchmark] Allow to run run-benchmark plans subtest-by-subtest
https://bugs.webkit.org/show_bug.cgi?id=295050

Reviewed by NOBODY (OOPS!).

On the RPi4 32-bit bots, JetStream2 frequently crashes or times out before completing,
which prevents any results from being reported because the current runner only uploads
data if all subtests finish successfully.

We have experimented with running selected subtest sets, but the flakiness remains:
some subtests pass while others fail inconsistently, which makes it very difficult to
select a working set of subtests.

This patch implements in browserperfdash-benchmark the ability to run a given benchmark
plan subtest-by-subtest, running only one subtest at a time.
If a subtest fails, the runner skips it and proceeds to the next. Partial results from the
passing subtests are then uploaded to the dashboard, improving visibility into progress
and regressions even when the full benchmark cannot complete.
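
The skip-and-continue behavior can be pictured with a minimal sketch (illustrative only, not the patch's actual code; `run_single_subtest` and `upload_results` are hypothetical stand-ins for the benchmark runner and the dashboard-upload helper):

```python
# Illustrative sketch only -- run_single_subtest() and upload_results() are
# hypothetical stand-ins, not functions from the actual patch.
def run_plan_split_subtests(plan_name, subtests_to_run):
    partial_results = {}
    failed_subtests = []
    for subtest in subtests_to_run:
        try:
            partial_results[subtest] = run_single_subtest(plan_name, subtest)
        except Exception as error:
            # A crash or timeout in one subtest no longer aborts the whole plan.
            failed_subtests.append(subtest)
            print('Subtest "%s" failed (%s); continuing with the next one.' % (subtest, error))
    if partial_results:
        # Upload whatever completed, even if some subtests failed.
        upload_results(plan_name, partial_results)
    return len(failed_subtests)
```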

To differentiate the standard benchmark plan from the one that is run subtest-by-subtest,
the string `-split-subtests` is appended to the end of the benchmark plan name. So, to
run the plan jetstream2 subtest-by-subtest, one specifies a virtual benchmark plan named
`jetstream2-split-subtests`.
Since the benchmark plan name is visible on the dashboard, it is also possible to distinguish
there between the complete jetstream2 run and the one that was run subtest-by-subtest.

All the benchmark plans that support subtests (not only JetStream2) can be specified this way,
and the list of them is shown when the flag `--list-plans` is passed to the runner.
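
The virtual-plan naming amounts to a suffix check along these lines (a sketch under assumed names; the constant and helper below are illustrative, not the patch's actual identifiers):

```python
# Sketch of the '-split-subtests' naming convention; names are illustrative.
SPLIT_SUBTESTS_SUFFIX = '-split-subtests'

def resolve_plan_name(requested_plan):
    # 'jetstream2-split-subtests' -> ('jetstream2', True)
    # 'jetstream2'                -> ('jetstream2', False)
    if requested_plan.endswith(SPLIT_SUBTESTS_SUFFIX):
        return requested_plan[:-len(SPLIT_SUBTESTS_SUFFIX)], True
    return requested_plan, False
```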

* Tools/Scripts/webkitpy/browserperfdash/browserperfdash_runner.py:
(BrowserPerfDashRunner.__init__):
(BrowserPerfDashRunner._parse_config_file):
(BrowserPerfDashRunner._get_plan_version_hash):
(BrowserPerfDashRunner._get_benchmark_runner_split_subtess_plans):
(BrowserPerfDashRunner._run_benchmark_runner_plan_split_subtests):
(BrowserPerfDashRunner._run_plan):
(BrowserPerfDashRunner.run):

| Misc | iOS, visionOS, tvOS & watchOS | macOS | Linux | Windows |
|---|---|---|---|---|
| ✅ 🧪 style | ✅ 🛠 ios | ✅ 🛠 mac | ✅ 🛠 wpe | ✅ 🛠 win |
| ✅ 🧪 bindings | ✅ 🛠 ios-sim | ✅ 🛠 mac-AS-debug | ✅ 🧪 wpe-wk2 | ⏳ 🧪 win-tests |
| ✅ 🧪 webkitperl | ✅ 🧪 ios-wk2 | ✅ 🧪 api-mac | ✅ 🧪 api-wpe | |
| ✅ 🧪 webkitpy | ✅ 🧪 ios-wk2-wpt | ✅ 🧪 mac-wk1 | ✅ 🛠 wpe-cairo | |
| | ✅ 🧪 api-ios | ✅ 🧪 mac-wk2 | ✅ 🛠 gtk | |
| | ✅ 🛠 vision | ✅ 🧪 mac-AS-debug-wk2 | ✅ 🧪 gtk-wk2 | |
| | ✅ 🛠 vision-sim | ✅ 🧪 mac-wk2-stress | ✅ 🧪 api-gtk | |
| | ✅ 🧪 vision-wk2 | ✅ 🧪 mac-intel-wk2 | ✅ 🛠 playstation | |
| | ✅ 🛠 tv | | | |
| | ✅ 🛠 tv-sim | | | |
| | ✅ 🛠 watch | | | |
| | ✅ 🛠 watch-sim | | | |

@clopez self-assigned this on Jun 26, 2025
@clopez added the New Bugs label (Unclassified bugs are placed in this component until the correct component can be determined) on Jun 26, 2025
@clopez requested review from aoikonomopoulos, nikolaszimmermann and a team on Jun 26, 2025 19:29
@@ -224,6 +222,14 @@ def _get_test_data_json_string(self, temp_result_file):
         del temp_result_json['debugOutput']
         return json.dumps(temp_result_json)

+    def _get_benchmark_runner_split_subtess_plans(self, benchmark_runner_plans):

Review comment (Contributor):
Minor typo, subtess -> subtest.

subtests_passed = []
subtests_failed = []
for subtest in subtests_to_run:
    current_subtest_index += 1

Review comment (Contributor):
Could this use enumerate() instead?
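
For reference, the suggested change would look roughly like this (assuming the index is meant to be 1-based, since the quoted code increments the counter before use):

```python
# enumerate() maintains the counter; start=1 assumes a 1-based index.
for current_subtest_index, subtest in enumerate(subtests_to_run, start=1):
    ...  # loop body unchanged
```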

main_benchmark_name = None
metrics_aggregation_entry = None
split_subtest_data = {}
split_subtest_data_is_empty = True

Review comment (Contributor):
I may be missing something, but couldn't the single use of split_subtest_data_is_empty be replaced by len(split_subtest_data) == 0 (or even not split_subtest_data, but personally I find the latter version confusing).
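
Concretely, the two spellings being compared are:

```python
if len(split_subtest_data) == 0:  # explicit emptiness check
    ...
if not split_subtest_data:        # idiomatic, but arguably less obvious
    ...
```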

metrics_aggregation_entry = temp_result_json[main_benchmark_name]['metrics']
split_subtest_data[main_benchmark_name] = {}
split_subtest_data[main_benchmark_name]['metrics'] = metrics_aggregation_entry
split_subtest_data[main_benchmark_name]['tests'] = {}

Review comment (Contributor):
Another style thing, but perhaps we could construct the dict locally and then expose it? I.e.

split_subtest_data[main_benchmark_name] = {
    'metrics': temp_result_json[main_benchmark_name]['metrics'],
    'tests': {},
}

It's fine the way it is, it's just that the repetition of the long names makes it feel a tad too verbose for me.

@@ -407,7 +491,8 @@ def run(self):
 try:
     with tempfile.NamedTemporaryFile() as temp_result_file:
         self._result_data['local_timestamp_teststart'] = datetime.now().strftime('%s')
-        self._run_plan(plan, temp_result_file.name)
+        number_failed_subtests = self._run_plan(plan, temp_result_file.name)
+        subtests_failed = type(number_failed_subtests) is int and number_failed_subtests > 0

Review comment (Contributor):
Would it be simpler to have all callees of _run_plan return an int here, even if it's a hardcoded 0?
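
In other words (a sketch of the suggestion, not the patch's code): if every `_run_plan` code path returns an int count of failed subtests, even a hardcoded 0, the type check disappears:

```python
# If _run_plan() always returns an int (0 when nothing failed),
# the caller no longer needs the type(...) is int guard:
number_failed_subtests = self._run_plan(plan, temp_result_file.name)
subtests_failed = number_failed_subtests > 0
```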

@aoikonomopoulos (Contributor) commented:

Left some (mostly stylistic) suggestions; conceptually this looks solid and is sorely needed!

@clopez added the WPE WebKit label and removed the New Bugs label on Jun 27, 2025

@nikolaszimmermann (Contributor) left a comment:
Looks good, thanks! Please resolve @aoikonomopoulos' comments prior to landing.

Labels: WPE WebKit (WebKit WPE component)
Projects: None yet
4 participants