Open Bug 1707753 Opened 3 years ago Updated 5 months ago

6 tests skipped on apple_silicon

Categories

(Toolkit :: Application Update, task, P3)

Tracking

REOPENED
Tracking Status
firefox90 --- affected

People

(Reporter: jmaher, Assigned: nalexander, NeedInfo)

Details

(Whiteboard: [fidedi-ope])

Attachments

(2 files)

Last week we turned on tests for Apple Silicon (macOS 11.2.3 on new Apple-based hardware). We are using the new simplified test config process:
https://firefox-source-docs.mozilla.org/testing/ci-configs/index.html

As the tests are live, we are now filing bugs to help close the loop and hope to fix any issues over the next 7 weeks. As the process outlines, there are tier-3 jobs running on m-c that run these skipped tests and expect a failure, timeout, or crash; if a test doesn't fail, the job turns orange.

Here are some failing tests:
toolkit/components/taskscheduler/tests/xpcshell/test_TaskSchedulerMacOSImpl.js
toolkit/components/taskscheduler/tests/xpcshell/test_TaskScheduler.js

and some more failing tests:
toolkit/mozapps/update/tests/unit_base_updater/marStageSuccessComplete.js
toolkit/mozapps/update/tests/unit_base_updater/marStageSuccessPartial.js
toolkit/mozapps/update/tests/unit_base_updater/marAppInUseStageSuccessComplete_unix.js
toolkit/mozapps/update/tests/unit_base_updater/marAppApplyUpdateStageSuccess.js

:nalexander - I am giving you a heads up as triage owner that these tests are skipped on the new apple silicon platform.

Flags: needinfo?(nalexander)

(In reply to Joel Maher ( :jmaher ) (UTC -0800) from comment #1)

> :nalexander - I am giving you a heads up as triage owner that these tests are skipped on the new apple silicon platform.

Thanks jmaher. The task scheduler ones look like some differences in configuration.

I can't explain the unit_base_updater failures -- Kirk, any ideas?

Flags: needinfo?(nalexander) → needinfo?(ksteuber)

It looks to me like the unit_base_updater tests are all failing the same way. They effectively time out after hitting this error:

JavaScript error: /opt/worker/tasks/task_161911974565346/build/tests/xpcshell/head.js, line 241: uncaught exception:
Waiting for file to exist, path: /opt/worker/tasks/task_161911974565346/build/tests/xpcshell/tests/toolkit/mozapps/update/tests/unit_base_updater/marStageSuccessComplete/dir.app/Contents/Resources/postup_app.log - timed out after 50 tries.

I had to do some digging to figure out what the story with that is. All three of these tests call checkPostUpdateAppLog(), which checks the contents of postup_app.log. In testing, we override PostUpdate by changing updater.ini, which specifies what ought to be called for PostUpdate. I believe that the binary specified is TestAUSHelper, which I think is copied to that location. That binary ought to write to postup_app.log. But the error message suggests that either it isn't writing the file, or there is a long enough delay that we time out waiting for it.
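
The error above comes from a bounded polling loop: the harness checks repeatedly for postup_app.log and gives up after 50 tries. A minimal Python sketch of that pattern, for illustration only. The real helper lives in the JavaScript test harness; the function name, the polling interval, and the exception type here are assumptions.

```python
import os
import time


def wait_for_file(path, tries=50, interval=0.1):
    """Poll until `path` exists, giving up after `tries` checks.

    Hypothetical helper: the name, interval, and exception type are
    illustrative and do not mirror the real test harness.
    """
    for _ in range(tries):
        if os.path.exists(path):
            return True
        time.sleep(interval)
    raise TimeoutError(
        f"Waiting for file to exist, path: {path} - timed out after {tries} tries."
    )
```

A timeout from a loop like this cannot distinguish "the file was never written" from "the file was written too late", which is why the failure is hard to pinpoint from the test log alone.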

It's pretty hard to pinpoint where the failure is happening here. The log shows that we are reading a success value from update.status, but I believe something could still go wrong anywhere between the invocation of LaunchCallbackAndPostProcessApps and the writing of the log file, and that would result in the file not being written. This leaves us with a pretty large number of possible causes.

I'm not entirely sure how to proceed here. I don't have access to any relevant hardware, and I'm not sure whether it would be worth it for me to get hardware for this and set it all up just to figure this out. Adding a lot more logging and re-running on try is the best option I can think of, but it would still be pretty tricky. The logging that I would need would be in the updater binary, and AFAIK I can't easily get at the updater logs generated by a test. Maybe I could add some code to the test to read the logs to include them as part of the test log? This sounds like something I would likely need to dedicate some time to.
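
The idea of folding the updater logs into the test log could look roughly like the following. This is a Python sketch of the approach only: the real tests are JavaScript, and the function name, the prefix, and whatever log path gets passed in are all hypothetical.

```python
import os


def dump_log_into_test_output(log_path, emit=print, prefix="updater log"):
    """Echo every line of `log_path` through `emit`, tagged with `prefix`.

    Returns the number of log lines emitted. A missing file is reported
    rather than raised, since an absent log is itself a useful diagnostic.
    """
    if not os.path.exists(log_path):
        emit(f"{prefix}: {log_path} does not exist")
        return 0
    count = 0
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            emit(f"{prefix}: {line.rstrip()}")
            count += 1
    return count
```

Passing the test logger as `emit` would put the updater's own messages next to the assertion that failed.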

Flags: needinfo?(ksteuber)

Thanks for digging in, :bytesized. As a note, we have temporarily turned off these tests while we work to get things running in arm64 mode instead of emulated x86_64; we also need to upgrade the machines for security issues. Having this data when we turn the tests back on will help us validate things, test things differently, or confirm we are still in the same boat.

Priority: -- → P3
Whiteboard: [fidedi-ope]

Newer versions of macOS mark copied applications as quarantined and
pop a UI dialog when they are run. It's possible to manually
unquarantine them by removing specific extended attributes. It
appears the attributes must be removed from directories and files.
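
A minimal sketch of the unquarantine step described above, in Python rather than the JavaScript the tests use. It is not the patch itself: the function names are hypothetical. `com.apple.quarantine` is the standard macOS quarantine attribute and `xattr -d` is the stock macOS command for deleting an attribute.

```python
import os
import subprocess
import sys

QUARANTINE_ATTR = "com.apple.quarantine"


def paths_to_unquarantine(root):
    """Return `root` plus every directory and file beneath it."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames) + sorted(filenames):
            paths.append(os.path.join(dirpath, name))
    return paths


def unquarantine(root):
    """Remove the quarantine attribute from `root` and its contents.

    Returns how many paths had the attribute removed. Does nothing on
    platforms other than macOS.
    """
    if sys.platform != "darwin":
        return 0
    removed = 0
    for path in paths_to_unquarantine(root):
        # `xattr -d` exits non-zero when the attribute is absent; that is
        # not an error for our purposes, so only count the successes.
        result = subprocess.run(
            ["xattr", "-d", QUARANTINE_ATTR, path],
            capture_output=True,
        )
        if result.returncode == 0:
            removed += 1
    return removed
```

Walking directories as well as files reflects the observation that the attribute must be cleared on both.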

Assignee: nobody → nalexander
Status: NEW → ASSIGNED

The existing code works around two limitations of nsIProcess:
there's no support for output redirection and the environment is
inherited from the parent directly. Subprocess.jsm is not limited
in these ways.

I hypothesize that the existing /bin/sh launch mechanism fails due
to changes to the shell serving /bin/sh shebangs that rolled out
with macOS 10.15 (Catalina). These failures do not reproduce locally
for me (with an M1 with macOS 11.0). The invocations are silent and
this should at least provide logging when they fail.
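
To illustrate the two capabilities mentioned above (output redirection and an explicitly controlled environment, without a shell in between), here is a Python analog. It is not Subprocess.jsm and not the patch; the function name and the `extra_env` parameter are made up for the example.

```python
import os
import subprocess
import sys


def launch_with_logging(argv, extra_env=None):
    """Run `argv` directly (no shell) with an explicit environment.

    Returns (exit code, combined stdout/stderr text) so that a failed
    launch leaves something to log instead of failing silently.
    """
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env.update(extra_env or {})
    result = subprocess.run(
        argv,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr into stdout for one log
        text=True,
    )
    return result.returncode, result.stdout
```

Because the child is started from an argument list rather than a generated shell script, the behavior of whatever serves `/bin/sh` no longer matters, and the captured output can go straight into the test log.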

Depends on D148559

Kirk: throwing out these two patches that I worked up during a deep dive into this ticket moons ago. I'm quite sure they didn't actually address the issue, and I never could repro on an actual M1 device.

Try build is percolating at https://treeherder.mozilla.org/#/jobs?repo=try&revision=3f1a3bce4216f591ea663fa7092fff8d7c15a5fc.

We might want to land these if they're neutral, even if they don't address the issue, since they set the stage. Or we might wait until we get a loaner M1 to repro in automation.

I rebased and pushed a new try build at https://treeherder.mozilla.org/jobs?repo=try&revision=11e5fa13b9cc87535b541d6d8588ab9040258f07. We'll see what happens.

(In reply to Nick Alexander :nalexander [he/him] from comment #9)

> I rebased and pushed a new try build at https://treeherder.mozilla.org/jobs?repo=try&revision=11e5fa13b9cc87535b541d6d8588ab9040258f07. We'll see what happens.

It looks like this is enough: https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=TSWgobErRV6NY6qN2CofeQ.0&revision=11e5fa13b9cc87535b541d6d8588ab9040258f07. I'll work with :bytesized to get this landed.

OK, this has been an odyssey, but I'm getting closer. First, the earlier try build didn't actually run any tests due to a weirdness in test-verify. But https://treeherder.mozilla.org/jobs?repo=try&revision=5948f297f4294f7df361df6c7174811cebf0bea5 actually does run the macOS tests, and they're green. However, there's a different unexpected interaction with the Subprocess.sys.mjs changes: the tests succeed locally with XRE_PROFILE_PATH set, but fail with -profile only. That's counter to my expectations, so I'm going to try to run it down.

Pushed by nalexander@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4750f3371d2d
Mark copied app files as unquarantined in macOS updater unit tests. r=bytesized,application-update-reviewers
https://hg.mozilla.org/integration/autoland/rev/f2cbd63247d4
Use `Subprocess.jsm` to launch application, rather than `nsIProcess` + `/bin/sh`. r=bytesized,application-update-reviewers
Regressions: 1873298

Backed out for causing xpcshell failures in marAppApplyUpdateAppBinInUseStageSuccess_win.js

  • Failure line: TEST-UNEXPECTED-FAIL | toolkit/mozapps/update/tests/unit_base_updater/marAppApplyUpdateAppBinInUseStageSuccess_win.js | runUpdateUsingApp - [runUpdateUsingApp : 4579] the status file state should equal the expected value - "applied" == "succeeded"
Flags: needinfo?(nalexander)
Pushed by nalexander@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5c225644025f
Mark copied app files as unquarantined in macOS updater unit tests. r=bytesized,application-update-reviewers
https://hg.mozilla.org/integration/autoland/rev/85a01700b713
Use `Subprocess.jsm` to launch application, rather than `nsIProcess` + `/bin/sh`. r=bytesized,application-update-reviewers

Backed out for causing xpcshell failures in marAppApplyUpdateAppBinInUseStageSuccess_win.js

  • Failure line: TEST-UNEXPECTED-FAIL | toolkit/mozapps/update/tests/unit_base_updater/marAppApplyUpdateAppBinInUseStageSuccess_win.js | runUpdateUsingApp - [runUpdateUsingApp : 4613] the status file state should equal the expected value - "applied" == "succeeded"
Attachment #9280122 - Attachment description: Bug 1707753 - Mark copied app files as unquarantined in macOS updater unit tests. r?bytesized → WIP: Bug 1707753 - Mark copied app files as unquarantined in macOS updater unit tests. r?bytesized
Attachment #9280123 - Attachment description: Bug 1707753 - Use `Subprocess.jsm` to launch application, rather than `nsIProcess` + `/bin/sh`. r?bytesized → WIP: Bug 1707753 - Use `Subprocess.jsm` to launch application, rather than `nsIProcess` + `/bin/sh`. r?bytesized
Attachment #9280123 - Attachment description: WIP: Bug 1707753 - Use `Subprocess.jsm` to launch application, rather than `nsIProcess` + `/bin/sh`. r?bytesized → Bug 1707753 - Use `Subprocess.jsm` to launch application, rather than `nsIProcess` + `/bin/sh`. r?bytesized
Attachment #9280122 - Attachment description: WIP: Bug 1707753 - Mark copied app files as unquarantined in macOS updater unit tests. r?bytesized → Bug 1707753 - Mark copied app files as unquarantined in macOS updater unit tests. r?bytesized
Pushed by nalexander@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/624787a1c115
Mark copied app files as unquarantined in macOS updater unit tests. r=bytesized,application-update-reviewers
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → 124 Branch
Regressions: 1877654
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 124 Branch → ---