Open Bug 1644621 Opened 4 years ago Updated 2 years ago

filterResponseData sometimes loses data

Categories

(WebExtensions :: Request Handling, defect, P2)

79 Branch
defect

Tracking

(Not tracked)

People

(Reporter: dennis.lissov, Unassigned)

Details

Attachments

(3 files, 1 obsolete file)

Attached file http-response-noop-filter.xpi (obsolete) —

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:79.0) Gecko/20100101 Firefox/79.0

Steps to reproduce:

Run with web-ext the attached test extension (a modified version of an example extension).
Open https://www.sberbank.ru/ru/person and refresh it with F5 a few dozen times checking that the page loads completely (I just look that after a couple seconds the scrollbar stabilizes on the same length).

Actual results:

Sometimes (rarely) the page does not load completely. I could not observe the same problem without the test extension, so it looks like a rare bug (race condition?) somewhere in the response stream filtering functionality.
For example, in this series of tests the problem happened on the 15th run and only the website header was visible.

Expected results:

An extension that does not modify responses in any way should not cause pages to fail to load.

Attached a screenshot of a particularly early broken load. Sorry that I cannot provide a more reliable reproducer.

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Product: Firefox → WebExtensions
Attached file example-files.zip

I've reproduced the same difference with fetch and here's an archive with an example pair of files. A git diff --word-diff shows that the difference between these is that one chunk in the middle is missing from the "bad" file.

Flags: needinfo?(rob)

I've updated the reproducer. The attached extension reproduces the bug automatically and reliably for me (see the logs in its console or in the browser console for older Firefox versions). Normally of the 30 tries some 4 or so are faulty.

The interesting point here is that it does not cause any problems if I comment out the filter.disconnect() call in the ondata callback. Also this does not look like a regression: I looked back down to Firefox 57 (launched via mozregression) and observed the same problem.

Attachment #9155486 - Attachment is obsolete: true

Ok, I've looked at this with cout-based debugging and found the following two races:

  1. In StreamFilterParent::FinishDisconnect setting mState to Disconnected and mDisconnected to true happens in two different closures sent to different threads,
    which means that StreamFilterParent::OnDataAvailable can be called with mDisconnected == false and mState == State::Disconnected,
    which, in turn, causes that method to send that chunk to the child (and, presumably, lose it forever).

  2. StreamFilterParent::OnDataAvailable delegates sending a chunk to the child to StreamFilterParent::DoSendData on another thread, which drops the chunk if RecvDisconnect changes mState to Disconnecting between these two points.

After hacking around both problems I could not reproduce losing chunks anymore, but had the problem of (presumably) sometimes buffering the chunk from point 2 out of order. NI-ing :kmag as the original author of this code.

Flags: needinfo?(kmaglione+bmo)

Thanks for your analysis! I haven't personally verified the bug, but the analysis looks plausible so I'm going to mark it as confirmed.

Have you tried to verify whether the bug gets fixed by moving the RunOnActorThread call inside the RunOnMainThread closure? (at https://searchfox.org/mozilla-central/rev/25d491b7924493b5d4bedc07841a1cd92f271100/toolkit/components/extensions/webrequest/StreamFilterParent.cpp#323-327 )

If it works, could you submit a patch? https://wiki.mozilla.org/WebExtensions/Contribution_Onramp#Submitting_a_Patch
This race condition is probably difficult to capture in a unit test, so if you manage to verify that the bug gets fixed without introducing new regressions/issues, then I'll approve the patch.

Status: UNCONFIRMED → NEW
Component: Untriaged → Request Handling
Ever confirmed: true
Flags: needinfo?(rob)

I've considered that, but that does not close the second race observed... and reading the source I found that this module seems to be able to cause much more interesting races.

For example, write from the extension causes the actor thread to enqueue calling mOrigListener->OnDataAvailable on the IO thread, while close from the extension causes the actor thread to enqueue calling mOrigListener->OnStopRequest on the main thread. As a result, in the case of off-main-thread delivery write and close one after another enqueue both on different threads in arbitrary order or even in parallel (which, AFAIU, violates nsIStreamListener's contract).

I'm going to try digging into this and refactoring this code.

FYI there is another race in bug 1645941.

Hmm, looks like I should wait for bug 1633935 and bug 1645941.

Severity: -- → S3
Priority: -- → P2
Summary: filterResponseData rarely loses data → filterResponseData sometimes loses data
Flags: needinfo?(kmaglione+bmo)

Sorry, I've lost all track of this. The reproducer shows the bug is still there on Nightly 2022-07-25, not sure whether the analysis is still valid or something has changed significantly since 2020. Hope to have time for another look at this in a couple weeks or so.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: