Open Bug 1885211 Opened 3 months ago Updated 3 months ago

Assertion failure: size > 0, at /builds/worker/checkouts/gecko/dom/streams/UnderlyingSourceCallbackHelpers.cpp:290

Categories

(Core :: DOM: Streams, defect, P3)

defect

Tracking

()

Tracking Status
firefox125 --- affected

People

(Reporter: tsmith, Assigned: saschanaz)

References

(Blocks 1 open bug)

Details

(Keywords: assertion, pernosco)

Found while fuzzing m-c 20240227-8ea0c0ea3c7c (--enable-debug --enable-fuzzing)

Unfortunately we do not have a reliable test case to reduce. This is reported regularly by the fuzzers at a low volume.

Assertion failure: size > 0, at /builds/worker/checkouts/gecko/dom/streams/UnderlyingSourceCallbackHelpers.cpp:290

#0 0x7fd485090025 in mozilla::dom::InputToReadableStreamAlgorithms::OnInputStreamReady(nsIAsyncInputStream*) /builds/worker/checkouts/gecko/dom/streams/UnderlyingSourceCallbackHelpers.cpp:290:3
#1 0x7fd48508fabe in mozilla::dom::InputStreamHolder::OnInputStreamReady(nsIAsyncInputStream*) /builds/worker/checkouts/gecko/dom/streams/UnderlyingSourceCallbackHelpers.cpp:226:23
#2 0x7fd47ff5c583 in nsMultiplexInputStream::AsyncWaitCompleted() /builds/worker/checkouts/gecko/xpcom/io/nsMultiplexInputStream.cpp:967:13
#3 0x7fd47ff72c31 in AsyncWaitRunnable::Run() /builds/worker/checkouts/gecko/xpcom/io/nsMultiplexInputStream.cpp:804:14
#4 0x7fd484fb1774 in mozilla::dom::(anonymous namespace)::ExternalRunnableWrapper::WorkerRun(JSContext*, mozilla::dom::WorkerPrivate*) /builds/worker/checkouts/gecko/dom/workers/WorkerPrivate.cpp:212:37
#5 0x7fd484fa28f8 in mozilla::dom::WorkerRunnable::Run() /builds/worker/checkouts/gecko/dom/workers/WorkerRunnable.cpp:378:12
#6 0x7fd47ffd3831 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1193:16
#7 0x7fd47ffda80d in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:480:10
#8 0x7fd484f8f15f in mozilla::dom::WorkerPrivate::DoRunLoop(JSContext*) /builds/worker/checkouts/gecko/dom/workers/WorkerPrivate.cpp:3388:7
#9 0x7fd484f72791 in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() /builds/worker/checkouts/gecko/dom/workers/RuntimeService.cpp:2117:42
#10 0x7fd47ffd3831 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1193:16
#11 0x7fd47ffda80d in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:480:10
#12 0x7fd480cbe255 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:330:5
#13 0x7fd480bd33d1 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
#14 0x7fd480bd33d1 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
#15 0x7fd47ffceb03 in nsThread::ThreadFunc(void*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:370:10
#16 0x7fd49428f66f in _pt_root /builds/worker/checkouts/gecko/nsprpub/pr/src/pthreads/ptthread.c:201:5
#17 0x7fd494b39ac2 in start_thread nptl/pthread_create.c:442:8
#18 0x7fd494bcaa03 in __clone misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:100

A Pernosco session is available here: https://pernos.co/debug/uhwNjHTAp5zb7ax15lhZig/index.html

Keywords: pernosco

The same test case sometimes will trigger another assertion that looks related:

Assertion failure: !mRawPtr, at /builds/worker/workspace/obj-build/dist/include/mozilla/AlreadyAddRefed.h:136

#0 0x7f6d53b25035 in ~already_AddRefed /builds/worker/workspace/obj-build/dist/include/mozilla/AlreadyAddRefed.h:136:5
#1 0x7f6d53b25035 in mozilla::RemoteLazyInputStreamThread::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) /builds/worker/checkouts/gecko/dom/file/ipc/RemoteLazyInputStreamThread.cpp:154:5
#2 0x7f6d4fe84340 in nsIEventTarget::Dispatch(nsIRunnable*, unsigned int) /builds/worker/workspace/obj-build/dist/include/nsIEventTarget.h:38:14
#3 0x7f6d50202603 in PostContinuationEvent_Locked /builds/worker/checkouts/gecko/xpcom/io/nsStreamUtils.cpp:463:21
#4 0x7f6d50202603 in nsAStreamCopier::PostContinuationEvent() /builds/worker/checkouts/gecko/xpcom/io/nsStreamUtils.cpp:455:12
#5 0x7f6d502008dc in OnOutputStreamReady /builds/worker/checkouts/gecko/xpcom/io/nsStreamUtils.cpp:425:5
#6 0x7f6d502008dc in non-virtual thunk to nsAStreamCopier::OnOutputStreamReady(nsIAsyncOutputStream*) /builds/worker/checkouts/gecko/xpcom/io/nsStreamUtils.cpp
#7 0x7f6d5020177f in operator() /builds/worker/checkouts/gecko/xpcom/io/nsPipe3.cpp:86:47
#8 0x7f6d5020177f in already_AddRefed<mozilla::CancelableRunnable> NS_NewCancelableRunnableFunction<CallbackHolder::CallbackHolder(nsIAsyncOutputStream*, nsIOutputStreamCallback*, unsigned int, nsIEventTarget*)::'lambda'()>(char const*, CallbackHolder::CallbackHolder(nsIAsyncOutputStream*, nsIOutputStreamCallback*, unsigned int, nsIEventTarget*)::'lambda'()&&)::FuncCancelableRunnable::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:636:9
#9 0x7f6d501f080b in CallbackHolder::Notify() /builds/worker/checkouts/gecko/xpcom/io/nsPipe3.cpp:118:19
#10 0x7f6d501efcc6 in ~nsPipeEvents /builds/worker/checkouts/gecko/xpcom/io/nsPipe3.cpp:1155:14
#11 0x7f6d501efcc6 in nsPipe::OnPipeException(nsresult, bool) /builds/worker/checkouts/gecko/xpcom/io/nsPipe3.cpp:984:1
#12 0x7f6d501ef838 in nsPipe::OnInputStreamException(nsPipeInputStream*, nsresult) /builds/worker/checkouts/gecko/xpcom/io/nsPipe3.cpp:916:9
#13 0x7f6d501f1906 in nsPipeInputStream::CloseWithStatus(nsresult) /builds/worker/checkouts/gecko/xpcom/io/nsPipe3.cpp:1286:10
#14 0x7f6d53b1f929 in mozilla::RemoteLazyInputStream::Close() /builds/worker/checkouts/gecko/dom/file/ipc/RemoteLazyInputStream.cpp:528:23
#15 0x7f6d501e8812 in nsMultiplexInputStream::Close() /builds/worker/checkouts/gecko/xpcom/io/nsMultiplexInputStream.cpp:302:32
#16 0x7f6d5535a2f8 in Shutdown /builds/worker/checkouts/gecko/dom/streams/UnderlyingSourceCallbackHelpers.cpp:195:13
#17 0x7f6d5535a2f8 in DisconnectFromOwner /builds/worker/checkouts/gecko/dom/streams/UnderlyingSourceCallbackHelpers.cpp:189:3
#18 0x7f6d5535a2f8 in non-virtual thunk to mozilla::dom::InputStreamHolder::DisconnectFromOwner() /builds/worker/checkouts/gecko/dom/streams/UnderlyingSourceCallbackHelpers.cpp
#19 0x7f6d521ba6cb in operator() /builds/worker/checkouts/gecko/dom/base/nsIGlobalObject.cpp:207:18
#20 0x7f6d521ba6cb in std::_Function_handler<void (mozilla::GlobalTeardownObserver*, bool*), nsIGlobalObject::DisconnectGlobalTeardownObservers()::$_0>::_M_invoke(std::_Any_data const&, mozilla::GlobalTeardownObserver*&&, bool*&&) /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_function.h:297:2
#21 0x7f6d5215b90f in operator() /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_function.h:687:14
#22 0x7f6d5215b90f in nsIGlobalObject::ForEachGlobalTeardownObserver(std::function<void (mozilla::GlobalTeardownObserver*, bool*)> const&) const /builds/worker/checkouts/gecko/dom/base/nsIGlobalObject.cpp:197:5
#23 0x7f6d5215a96f in nsIGlobalObject::DisconnectGlobalTeardownObservers() /builds/worker/checkouts/gecko/dom/base/nsIGlobalObject.cpp:205:3
#24 0x7f6d55259ad9 in mozilla::dom::WorkerPrivate::DoRunLoop(JSContext*) /builds/worker/checkouts/gecko/dom/workers/WorkerPrivate.cpp:3346:23
#25 0x7f6d5523d151 in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() /builds/worker/checkouts/gecko/dom/workers/RuntimeService.cpp:2123:42
#26 0x7f6d502617b1 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1193:16
#27 0x7f6d5026878d in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:480:10
#28 0x7f6d50f5126e in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:300:20
#29 0x7f6d50e66471 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:363:3
#30 0x7f6d50e66471 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:345:3
#31 0x7f6d5025ca83 in nsThread::ThreadFunc(void*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:370:10
#32 0x7f6d645bd66f in _pt_root /builds/worker/checkouts/gecko/nsprpub/pr/src/pthreads/ptthread.c:201:5
#33 0x7f6d64e67ac2 in start_thread nptl/pthread_create.c:442:8
#34 0x7f6d64ef8a03 in __clone misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:100

It seems like the problem is happening at this moment (special views link, you want the special views).

In particular, it seems like the pull algorithm is being invoked while the InputStreamHolder already had an AsyncWait call in flight where the AsyncWaitRunnable had been created and dispatched already, but had not yet run. This breaks/desynchronizes the state machine. At the moment the assertion is firing, the underlying nsMultiplexInputStream has an AsyncWait in flight where a StreamNeeded request has been issued which is why the call to Available() returns 0.

The pernosco link above is using the pernosco link button from the side that has a "with current views" to provided a very curated "executions of" merged thing with a lot of useful constraints applied. The journal is not as good. The lines in the executions of that look like nsMultiplexInputStream.cpp:795 — if (aEventTarget) { → 0x7f8c540949e0 are telling us the address of the runnable created in the call to AsyncWaitRunnable::Create in the preceding executions of line so that one can match up the create and the run.

I am marking this as a P3 because this involves a rather wacky composite blob with a rare race condition, but I think the pernosco trace potentially makes this actionable in a P2 horizon.

Severity: -- → S3
Flags: needinfo?(krosylight)
Priority: -- → P3
Assignee: nobody → krosylight
Flags: needinfo?(krosylight)

I made a merged view of PullCallback (line 256), AsyncWait (line 212), OnInputStreamReady (line 274). Indeed I can see that pull algorithm is called while the previous AsyncWait (called with WAIT_CLOSURE_ONLY) did not respond yet.

I think the assumption was that it should be safe to do this; the next Available call would anyway give us NS_BASE_STREAM_CLOSED if it was closed. Somehow that's not the case here and the next Available call returns NS_OK, potentially meaning that the underlying stream was trying to ping the listener when the stream is not closed? 🤔

AsyncWaitRunnable::Create does not seem to use AsyncWaitFlags, could that be a problem?

(In reply to Kagami [:saschanaz] (they/them) from comment #5)

AsyncWaitRunnable::Create does not seem to use AsyncWaitFlags, could that be a problem?

I'm not sure how. I only see WAIT_CLOSURE_ONLY as a defined flag and that seems like it would not be helpful in this context since we do expect the stream to potentially produce more data.

(In reply to Kagami [:saschanaz] (they/them) from comment #4)

I think the assumption was that it should be safe to do this; the next Available call would anyway give us NS_BASE_STREAM_CLOSED if it was closed. Somehow that's not the case here and the next Available call returns NS_OK, potentially meaning that the underlying stream was trying to ping the listener when the stream is not closed? 🤔

My understanding of the trace is that:

  • We have a very complicated composite blob that results in a multiplex stream consisting of 13 sub-things that are either strings or other blobs.
  • There is a specific race happening where the multiplex stream pull callback is invoked such that the multiplex stream now has reached a RemoteLazyInputStream which will end up sending an async StreamNeeded request so that it can have the underlying stream data needed. While we are waiting for the stream to be created, Available() will return 0 with NS_OK because the stream is not yet closed, it's just doing async stuff.
  • The assertion that's firing was assuming that the OnInputStreamReady callback would be due to the most recent AsyncWait it invoked having completed, so there would be no async stuff happening where the stream is in an indeterminate state. But this is an incorrect assumption because the InputToReadableStreamAlgorithms already had an outstanding call to AsyncWait that had already dispatched its notification runnable, but which had not yet run.
  • There is still going to be another OnInputStreamReady callback that will happen in the future!
    • And there is a slightly different variant of this race where when the PullCallback called AsyncWait before the runnable was dispatched which would moot this firing OnInputStreamReady event.

I don't know enough about the InputToReadableStreamAlgorithms state machine here to understand if this is a case where you could just do something like one of the following:

  • Don't call AsyncWait in PullCallbackImpl if we already have an AsyncWait call in flight because it's redundant and potentially creates this exact race.
  • Just turn the assertion instead into "if success and size == 0 then this was just a wacky race and we can just early return now and wait for the next OnInputStreamReady"

Or if this is something where there's more complex state machine things that need to be dealt with.

But this is an incorrect assumption because the InputToReadableStreamAlgorithms already had an outstanding call to AsyncWait that had already dispatched its notification runnable, but which had not yet run.

But it should not dispatch any runnable unless it's closed, as the AsyncWait is with WAIT_CLOSURE_ONLY. That call exists to close the DOM stream when the XPCOM stream is closed.

Don't call AsyncWait in PullCallbackImpl if we already have an AsyncWait call in flight because it's redundant and potentially creates this exact race.

But PullCallbackImpl needs to reset the async wait flag to subscribe data instead of just stream closure, the new AsyncWait needs to be made for that purpose.

You need to log in before you can comment on or make changes to this bug.