filterResponseData sometimes loses data
Categories
(WebExtensions :: Request Handling, defect, P2)
Tracking
(Not tracked)
People
(Reporter: dennis.lissov, Unassigned)
Details
Attachments
(3 files, 1 obsolete file)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0
Steps to reproduce:
Run with web-ext
the attached test extension (a modified version of an example extension).
Open https://www.sberbank.ru/ru/person and refresh it with F5 a few dozen times checking that the page loads completely (I just look that after a couple seconds the scrollbar stabilizes on the same length).
Actual results:
Sometimes (rarely) the page does not load completely. I could not observe the same problem without the test extension, so it looks like a rare bug (race condition?) somewhere in the response stream filtering functionality.
For example, in this series of tests the problem happened on the 15th run and only the website header was visible.
Expected results:
An extension that does not modify responses in any way should not cause pages to fail to load.
Reporter | ||
Comment 1•4 years ago
|
||
Attached a screenshot of a particularly early broken load. Sorry that I cannot provide a more reliable reproducer.
Comment 2•4 years ago
|
||
Bugbug thinks this bug should belong to this component, but please revert this change in case of error.
Reporter | ||
Comment 3•4 years ago
|
||
I've reproduced the same difference with fetch
and here's an archive with an example pair of files. A git diff --word-diff
shows that the difference between these is that one chunk in the middle is missing from the "bad" file.
Reporter | ||
Comment 5•4 years ago
|
||
I've updated the reproducer. The attached extension reproduces the bug automatically and reliably for me (see the logs in its console or in the browser console for older Firefox versions). Normally of the 30 tries some 4 or so are faulty.
The interesting point here is that it does not cause any problems if I comment out the filter.disconnect()
call in the ondata
callback. Also this does not look like a regression: I looked back down to Firefox 57 (launched via mozregression) and observed the same problem.
Reporter | ||
Comment 6•4 years ago
|
||
Ok, I've looked at this with cout
-based debugging and found the following two races:
-
In
StreamFilterParent::FinishDisconnect
settingmState
toDisconnected
andmDisconnected
totrue
happens in two different closures sent to different threads,
which means thatStreamFilterParent::OnDataAvailable
can be called withmDisconnected == false
andmState == State::Disconnected
,
which, in turn, causes that method to send that chunk to the child (and, presumably, lose it forever). -
StreamFilterParent::OnDataAvailable
delegates sending a chunk to the child toStreamFilterParent::DoSendData
on another thread, which drops the chunk ifRecvDisconnect
changesmState
toDisconnecting
between these two points.
After hacking around both problems I could not reproduce losing chunks anymore, but had the problem of (presumably) sometimes buffering the chunk from point 2 out of order. NI-ing :kmag as the original author of this code.
Comment 7•4 years ago
|
||
Thanks for your analysis! I haven't personally verified the bug, but the analysis looks plausible so I'm going to mark it as confirmed.
Have you tried to verify whether the bug gets fixed by moving the RunOnActorThread
call inside the RunOnMainThread
closure? (at https://searchfox.org/mozilla-central/rev/25d491b7924493b5d4bedc07841a1cd92f271100/toolkit/components/extensions/webrequest/StreamFilterParent.cpp#323-327 )
If it works, could you submit a patch? https://wiki.mozilla.org/WebExtensions/Contribution_Onramp#Submitting_a_Patch
This race condition is probably difficult to capture in a unit test, so if you manage to verify that the bug gets fixed without introducing new regressions/issues, then I'll approve the patch.
Reporter | ||
Comment 8•4 years ago
|
||
I've considered that, but that does not close the second race observed... and reading the source I found that this module seems to be able to cause much more interesting races.
For example, write
from the extension causes the actor thread to enqueue calling mOrigListener->OnDataAvailable
on the IO thread, while close
from the extension causes the actor thread to enqueue calling mOrigListener->OnStopRequest
on the main thread. As a result, in the case of off-main-thread delivery write
and close
one after another enqueue both on different threads in arbitrary order or even in parallel (which, AFAIU, violates nsIStreamListener
's contract).
I'm going to try digging into this and refactoring this code.
Comment 9•4 years ago
|
||
FYI there is another race in bug 1645941.
Reporter | ||
Comment 10•4 years ago
|
||
Hmm, looks like I should wait for bug 1633935 and bug 1645941.
Updated•4 years ago
|
Updated•4 years ago
|
Reporter | ||
Comment 11•2 years ago
|
||
Sorry, I've lost all track of this. The reproducer shows the bug is still there on Nightly 2022-07-25, not sure whether the analysis is still valid or something has changed significantly since 2020. Hope to have time for another look at this in a couple weeks or so.
Description
•