Closed Bug 1872913 Opened 6 months ago Closed 5 months ago

Hanging at shutdown as mMainThreadDebuggeeEventTarget is paused and will not execute CancelingOnParentRunnable

Categories

(Core :: DOM: Workers, defect, P2)

Tracking

RESOLVED FIXED
123 Branch
Tracking Status
firefox123 --- fixed

People

(Reporter: tsmith, Assigned: jstutte)

References

(Blocks 1 open bug)

Details

(Keywords: pernosco)

Attachments

(1 file)

Found while fuzzing m-c 20240103-9ea90dc23395 (--enable-debug --enable-fuzzing)

I don't have a test case but I do have a Pernosco session: https://pernos.co/debug/BKJlE9c0OL2DIiMuKywbPw/index.html

stderr:

[Parent 2109771, Main Thread] WARNING: '!top', file /builds/worker/checkouts/gecko/dom/xul/MenuBarListener.cpp:99
[Parent 2109771, IPC I/O Parent] WARNING: Process 2110195 may be hanging at shutdown; will wait for up to 8000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc:184
[Parent 2109771, IPC I/O Parent] WARNING: Process 2110195 hanging at shutdown; attempting crash report (fatal error).: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc:207
#0 0x7f0396297041 in mozilla::MediaTr
#0 0x70000002  (linux-vdso.so.1+0x70000002) (BuildId: 274539cc9764a41518e972bafd6e92fc673c25be)
#1 0x7fad6819bd07 in _raw_syscall /home/twsmith/code/rr/src/preload/raw_syscall.S:120
#2 0x7fad68195908 in traced_raw_syscall /home/twsmith/code/rr/src/preload/syscallbuf.c:350:10
#3 0x7fad68198d02 in sys_futex /home/twsmith/code/rr/src/preload/syscallbuf.c:2040:14
#4 0x7fad68198d02 in syscall_hook_internal /home/twsmith/code/rr/src/preload/syscallbuf.c:4134:5
#5 0x7fad6819bacb in syscall_hook /home/twsmith/code/rr/src/preload/syscallbuf.c:4311:17
#6 0x7fad6819bacb in syscall_hook /home/twsmith/code/rr/src/preload/syscallbuf.c:4295:16
#7 0x7fad68195322 in _syscall_hook_trampoline /home/twsmith/code/rr/src/preload/syscall_hook.S:308
#8 0x7fad6819538c in __morestack /home/twsmith/code/rr/src/preload/syscall_hook.S:443
#9 0x7fad681953a8 in _syscall_hook_trampoline_48_3d_00_f0_ff_ff /home/twsmith/code/rr/src/preload/syscall_hook.S:462
#10 0x7fad6816337b in futex_wait_cancelable /build/glibc-wuryBv/glibc-2.31/nptl/../sysdeps/nptl/futex-internal.h:183:13
#11 0x7fad6816337b in __pthread_cond_wait_common /build/glibc-wuryBv/glibc-2.31/nptl/pthread_cond_wait.c:508:14
#12 0x7fad6816337b in pthread_cond_wait@@GLIBC_2.3.2 /build/glibc-wuryBv/glibc-2.31/nptl/pthread_cond_wait.c:647:10
#13 0x555b24315b49 in mozilla::detail::ConditionVariableImpl::wait(mozilla::detail::MutexImpl&) /builds/worker/checkouts/gecko/mozglue/misc/ConditionVariable_posix.cpp:106:11
#14 0x555b24315c2f in mozilla::detail::ConditionVariableImpl::wait_for(mozilla::detail::MutexImpl&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&) /builds/worker/checkouts/gecko/mozglue/misc/ConditionVariable_posix.cpp:113:5
#15 0x7fad485390e9 in mozilla::OffTheBooksCondVar::Wait(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) /builds/worker/checkouts/gecko/xpcom/threads/BlockingResourceBase.cpp:534:20
#16 0x7fad48538fb0 in mozilla::OffTheBooksCondVar::Wait() /builds/worker/checkouts/gecko/xpcom/threads/BlockingResourceBase.cpp:514:21
#17 0x7fad48545afc in mozilla::TaskController::GetRunnableForMTTask(bool) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:619:19
#18 0x7fad48585f04 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1134:38
#19 0x7fad4858e975 in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:480:10
#20 0x7fad509cc7d0 in mozilla::dom::workerinternals::RuntimeService::Cleanup() /builds/worker/checkouts/gecko/dom/workers/RuntimeService.cpp:1619:14
#21 0x7fad509d3341 in mozilla::dom::workerinternals::RuntimeService::Observe(nsISupports*, char const*, char16_t const*) /builds/worker/checkouts/gecko/dom/workers/RuntimeService.cpp:1909:5
#22 0x7fad48434737 in nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) /builds/worker/checkouts/gecko/xpcom/ds/nsObserverList.cpp:71:19
#23 0x7fad48436be1 in nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) /builds/worker/checkouts/gecko/xpcom/ds/nsObserverService.cpp:288:19
#24 0x7fad48332431 in mozilla::AppShutdown::AdvanceShutdownPhaseInternal(mozilla::ShutdownPhase, bool, char16_t const*, nsCOMPtr<nsISupports> const&) /builds/worker/checkouts/gecko/xpcom/base/AppShutdown.cpp:433:21
#25 0x7fad48332a87 in mozilla::AppShutdown::AdvanceShutdownPhase(mozilla::ShutdownPhase, char16_t const*, nsCOMPtr<nsISupports> const&) /builds/worker/checkouts/gecko/xpcom/base/AppShutdown.cpp:456:3
#26 0x7fad485fa0d2 in mozilla::ShutdownXPCOM(nsIServiceManager*) /builds/worker/checkouts/gecko/xpcom/build/XPCOMInit.cpp:612:5
#27 0x7fad485f9e24 in NS_ShutdownXPCOM /builds/worker/checkouts/gecko/xpcom/build/XPCOMInit.cpp:564:10
#28 0x7fad504c0ec4 in mozilla::dom::ContentProcess::CleanUp() /builds/worker/checkouts/gecko/dom/ipc/ContentProcess.cpp:189:3
#29 0x7fad5561c916 in XRE_InitChildProcess(int, char**, XREChildData const*) /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:660:16
#30 0x7fad55631c86 in mozilla::BootstrapImpl::XRE_InitChildProcess(int, char**, XREChildData const*) /builds/worker/checkouts/gecko/toolkit/xre/Bootstrap.cpp:67:12
#31 0x555b24257f18 in content_process_main(mozilla::Bootstrap*, int, char**) /builds/worker/checkouts/gecko/browser/app/../../ipc/contentproc/plugin-container.cpp:57:28
#32 0x555b242581b9 in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:375:18
#33 0x7fad67c12082 in __libc_start_main /build/glibc-wuryBv/glibc-2.31/csu/../csu/libc-start.c:308:16
#34 0x555b2422e0e8 in _start (/home/twsmith/workspace/browsers/m-c-20240103160634-fuzzing-noopt-debug/firefox-bin+0xce0e8) (BuildId: f0cc65e059645bdf2e7305d49c15f7207c65d749)
Flags: needinfo?(jstutte)
See Also: → 1769913

Thanks for the Pernosco trace! I added some comments there. I think it points us to the cause of bug 1769913 and helps us fix it.

Flags: needinfo?(jstutte)

(In reply to Jens Stutte [:jstutte] from bug 1769913 comment #6)

Bug 1872913 contains an interesting case for such a shutdown hang as of comment 0.

It seems we post the CancelingOnParentRunnable (which holds the strong worker ref that blocks our shutdown) to the worker's mMainThreadDebuggeeEventTarget, but an incoming nsGlobalWindowInner::Suspend pauses that queue before the runnable is ever executed. I wonder whether the CancelingOnParentRunnable should instead be dispatched directly to the main thread queue. Or do we need to drain/unpause the throttled queue on Cancel? (I fear that could always race with asynchronous events causing it to pause again.)

Let's try to avoid pausing while canceling. I assume there are potentially more/different runnables in that queue.
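
To make the suspected sequence concrete, here is a minimal standalone sketch of it. It is a toy model, not Gecko code: PausableQueue, Worker, SimulateShutdown and the guardPauseWhileCanceling flag are all hypothetical names invented for illustration, and the guard only approximates the idea of "do not pause while canceling". It is not the landed patch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a pausable event target (assumption: this only
// loosely mirrors the role of mMainThreadDebuggeeEventTarget).
class PausableQueue {
 public:
  explicit PausableQueue(bool guardPauseWhileCanceling)
      : mGuard(guardPauseWhileCanceling) {}

  void Dispatch(std::function<void()> task) {
    assert(task && "cannot dispatch an empty task");
    mTasks.push_back(std::move(task));
  }

  // Marks the owner as canceling; with the guard on, the queue is unpaused
  // here and later pause requests are refused.
  void BeginCanceling() {
    mCanceling = true;
    if (mGuard) mPaused = false;
  }

  // Returns false if the pause request was refused.
  bool SetPaused(bool paused) {
    if (paused && mGuard && mCanceling) return false;
    mPaused = paused;
    return true;
  }

  // Runs all pending tasks unless paused; returns how many ran.
  std::size_t Drain() {
    std::size_t ran = 0;
    while (!mPaused && !mTasks.empty()) {
      auto task = std::move(mTasks.front());
      mTasks.pop_front();
      task();
      ++ran;
    }
    return ran;
  }

 private:
  std::deque<std::function<void()>> mTasks;
  bool mGuard;
  bool mPaused = false;
  bool mCanceling = false;
};

struct Worker {};  // Hypothetical placeholder for the worker object.

// Returns true if the strong worker ref was released, i.e. shutdown could
// proceed; false models the hang.
bool SimulateShutdown(bool guardPauseWhileCanceling) {
  PausableQueue queue(guardPauseWhileCanceling);
  auto worker = std::make_shared<Worker>();
  std::weak_ptr<Worker> watcher = worker;

  queue.BeginCanceling();
  // The queued runnable holds the last strong ref until it runs.
  queue.Dispatch([ref = std::move(worker)]() mutable { ref.reset(); });
  // An incoming suspend tries to pause the queue before the runnable ran.
  queue.SetPaused(true);

  // Bounded stand-in for the shutdown loop that spins the event queue.
  for (int i = 0; i < 3 && !watcher.expired(); ++i) queue.Drain();
  return watcher.expired();
}
```

In this model, SimulateShutdown(false) returns false because the runnable stays stuck in the paused queue and keeps the worker alive, while SimulateShutdown(true) returns true because the pause request is refused once canceling has started.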

Severity: -- → S3
Priority: -- → P2
Assignee: nobody → jstutte
Status: NEW → ASSIGNED
Summary: hanging at shutdown → Hanging at shutdown as mMainThreadDebuggeeEventTarget is paused and will not execute CancelingOnParentRunnable
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/eb1eeea41f02
Ensure mMainThreadDebuggeeEventTarget is not paused during canceling of a worker. r=asuth

Yes, that test revealed a logic problem with the patch: we now run some runnables in a situation they do not expect to run in.

Flags: needinfo?(jstutte)
Pushed by jstutte@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1c1cef3dae75
Ensure mMainThreadDebuggeeEventTarget is not paused during canceling of a worker. r=asuth
Status: ASSIGNED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → 123 Branch

Is there a reason for this bug to not have the bugmon keyword? I'd like to make it check if it is fixed for good.

Flags: needinfo?(twsmith)

(In reply to Jens Stutte [:jstutte] from comment #9)

Is there a reason for this bug to not have the bugmon keyword? I'd like to make it check if it is fixed for good.

Bugmon requires a test case and we don't have a reliable one for this issue.

Flags: needinfo?(twsmith)

(In reply to Tyson Smith [:tsmith] from comment #10)

Bugmon requires a test case and we don't have a reliable one for this issue.

Fair enough, should have known. Thanks
