Closed Bug 1872496 Opened 6 months ago Closed 4 months ago

Webtransport: Receiving bytes on the server side from an incoming bidrectional stream

Categories

(Core :: Networking: HTTP, defect)

Firefox 118
defect

Tracking

()

RESOLVED FIXED
125 Branch
Tracking Status
firefox125 --- fixed

People

(Reporter: marten.richter, Assigned: marten.richter)

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0

Steps to reproduce:

I am the author of a node.js web transport package based on library quiche (the Google variant). So far, I have had only unit tests against chromium. Today, I implemented unit tests against Firefox using Playwright.
So far, all tests except one test pass (the failing test is actually flacky).

The flaky test is:
https://github.com/fails-components/webtransport/blob/f8b411a025f1963a1824503512dc02386bc93e5c/main/test/bidirectional-streams.spec.js#L95
If it is stalled it comes only until Line 113.
On the server side looks like:
https://github.com/fails-components/webtransport/blob/a1cbd68ebc77329dc2177fa4872fae3fbeab2185/main/test/fixtures/server.js#L81
if the test is flaky, it does not proceed on line 85.

The server sends 5 times a packet of 5 bytes to the client.
The client forwards them through a pipe through back to the server for comparison.
I can actually see that if the test passes, the bytes arrive together as one 25 byte package. If it does not pass, I do not see activity on the server's UDP socket.
Currently, I assume that the UA tries to buffer the writes and concat them but misses sending them in some runs and waits forever.
(It is the same test, but seems to be a separate problem than https://bugs.chromium.org/p/chromium/issues/detail?id=1455664&q=&can=4, which I patched for chromium).
Of course, it can still be a problem on my side...

Do you have an idea what might be going on? Or where to look in the source code of firefox?

If you need more information, I will provide them.

Actual results:

The test is flacky.

Expected results:

The test should pass as on chromium.

The Bugbug bot thinks this bug should belong to the 'Core::Networking: HTTP' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Networking: HTTP
Product: Firefox → Core

I try to setup a debugger. But I failed outside of playwright, as I found no way to accept my local generated certificate for the webtransport connection. (On chromium I use a serverCertificateHashes, but this is not merged yet or is it supported in nightly?). No popup is showing up....

Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged][necko-priority-new]
Severity: S3 → --
Priority: P2 → --
Whiteboard: [necko-triaged][necko-priority-new]
Flags: needinfo?(marten.richter)

Yes, it is very different.
The wpt test uses an echo on the server side.
The unit test is very controlled, I have also comparable tests and they are not very sensitive to timing errors.
I connect the readable and writeable of a bidirectional stream on the browser side with a pipeTo.
So I have the echo on the browser side.
This test also revealed a bug in Chromium, which I identified after some debugging and patched.
The test is very sensitive to timing. I expect that it has something to do, with the timing of the webtransport stream data and the fin...
But more can only be said, if the network events are in some form traced ....

Flags: needinfo?(marten.richter)

Jesup, do you have any commentary for the purposes of triage?

Flags: needinfo?(rjesup)

I now have serverCertificateHashes working, and I started yesterday to debug.
So far, I can tell that the bug seems to be on the Firefox side. The data is received but depending if it works or does not work in http3webtransportstream.cc WriteSegments is called or not called.
So it seems that closing the incoming side closes the outgoing side of the bidirectional state, but the outstanding writes and also the fin, etc., are not written to the server. Furthermore, the stream is never cleanly closed, and the session seems to prevail over a tab refresh (so a memory or connection leak). I have also seen some illegal instruction messages in the debug window around the event, but I do not know if this is related.
I will continue to investigate.

Please exchange WriteSegments with ReadSegments.

I am coming closer.
It seems that it depends on whether the fin arrived at the relevant source code together with the data or if it took longer.
The problem seems that https://searchfox.org/mozilla-central/rev/3ce1eac61cb824c77ef564bb04a215bb7070fac0/netwerk/protocol/http/Http3Session.cpp#423 issues a close on the read side and write side simultaneously. So that the outstanding data and the fin is not sent back on the outgoing side of the stream.
At least on quiche from the chromium project, as far as I can remember, it waits for full stream closure, that the read and write side is properly closed, and then closes the stream.
I have also seen a similar mechanism in Firefox, but I have no clue why the stream is fully terminated. I will continue to investigate.

The reason why the stream is closed is this:

[Parent 23348: Socket Thread]: D/nsHttp Http3WebTransportStream::WriteSegments rv=0x80470002 countWrittenSingle=0 socketin=0 [this=1f6b8869700]

rv=0x80470002 means the base stream closed coming from the pipe stream, which caused the whole incoming web transport stream to shut down.

This one is coming from:
https://searchfox.org/mozilla-central/rev/3ce1eac61cb824c77ef564bb04a215bb7070fac0/netwerk/protocol/http/Http3WebTransportStream.cpp#454
or
https://searchfox.org/mozilla-central/rev/3ce1eac61cb824c77ef564bb04a215bb7070fac0/netwerk/protocol/http/Http3WebTransportStream.cpp#466

So there are two ways to fix this: either add a check here if the send side is closed, or later, when rv=0x80470002`arrives, only issue a close if the send side is closed.

Ok:

diff --git a/netwerk/protocol/http/Http3WebTransportStream.cpp b/netwerk/protocol/http/Http3WebTransportStream.cpp
--- a/netwerk/protocol/http/Http3WebTransportStream.cpp
+++ b/netwerk/protocol/http/Http3WebTransportStream.cpp
@@ -527,6 +527,10 @@ nsresult Http3WebTransportStream::WriteS
       if (rv == NS_BASE_STREAM_WOULD_BLOCK) {
         rv = NS_OK;
       }
+      if (rv == NS_BASE_STREAM_CLOSED) {
+        mReceiveStreamPipeOut->Close();
+        rv = NS_OK;
+      }
       again = false;
     } else if (NS_FAILED(mSocketInCondition)) {
       if (mSocketInCondition != NS_BASE_STREAM_WOULD_BLOCK) {

At least for my test case, it eliminates this bug.
(I could submit a patch, but I am working on another patch, and I have no idea how to work in parallel with hg (I am more familiar with git, but the tutorial was for hg only) I think it is better to get feedback from the developers first, I will now check if I can run the WPT tests).

When using the webtransport's stream in a pipeto, the
NS_BASE_STREAM_CLOSED is not filtered and prevents sometimes (1 out of
20) a proper closure and sending of remaining data of the bidirectional stream.
The change prevents this problem.

Assignee: nobody → marten.richter
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Pushed by archaeopteryx@coole-files.de:
https://hg.mozilla.org/integration/autoland/rev/0fcb60f34e99
WebTransport: Fix WebTransport's stream closure. r=jesup,necko-reviewers
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 125 Branch
Flags: needinfo?(rjesup)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: