-
-
Notifications
You must be signed in to change notification settings - Fork 8.3k
extmod/modlwip.c: Fix crash when calling recv on listening socket #17333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Code size report:
|
583b17e
to
210f5c9
Compare
@@ -0,0 +1,39 @@ | |||
# Test recv on listening socket after accept without saving connection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unless I'm missing something subtle, this test looks identical to the existing tcp_accept_recv.py
test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I referenced that unit test when adding the one here but for some reason thought it was performing recv on the correct accepted socket rather than directly on the original socket.
I just ran the test between cpython and a rpi pico_2W and it doesn't hard fault reset, but nor does it raise an exception, it never receives anything and hangs.
But then I test again with pyb-d on current master and it passes multi_net/tcp_accept_recv.py
but reliably crashes on multi_net/tcp_accept_recv_listen.py
Turns out yeah it's rather subtle
s.bind(socket.getaddrinfo("0.0.0.0", PORT)[0][-1])
- s.listen()
+ s.listen(1)
multitest.next()
With 1
it all passes. Without (or with 2
etc...) it crashes - aka if the original socket is still listening for new incomming connections it crashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've now consolidating the two tests, modding the original to expose the bug and re-tested before and after with pyb-d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean you combined the two tests into the original? Did you forget to push the changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry yes, turns out git had complained about me missing the -f
on the push
210f5c9
to
bdb6296
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #17333 +/- ##
=======================================
Coverage 98.57% 98.57%
=======================================
Files 169 169
Lines 21968 21968
=======================================
Hits 21654 21654
Misses 314 314 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
bdb6296
to
22471c6
Compare
tests/multi_net/tcp_accept_recv.py
Outdated
@@ -11,13 +11,14 @@ def instance0(): | |||
s = socket.socket() | |||
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) | |||
s.bind(socket.getaddrinfo("0.0.0.0", PORT)[0][-1]) | |||
s.listen(1) | |||
s.listen() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might change the original test with 1 as an argument.
Maybe it's better to do a loop here to perform both tests (with no argument, and with an argument of 1, and then maybe some other values like 0 for good measure).
Also need the client to do a loop as well, and use multitest.wait()
and multitest.broadcast()
to sync with the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've got an updated looping test just pushed that passes on the pyb-d with this patch applied.
22471c6
to
baaf54f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM! I ran the new unit test between PYBD_SF2 (fix applies) and ESP32_GENERIC_S3 (fix doesn't apply) and noted no issues.
I believe this also fixes #16697 - I can't reproduce the crash with test code from the issue any more, and it makes sense that this would be the root cause. So that's good news! EDIT: have added this to the description so that it creates the "close on merge" linkage.
I've run this updated test on a PYBD_SF6 and RPI_PICO2_W, and it passes without the fix from this PR. Not sure what to make of that... @andrewleech can you please rerun the test here without the fix and see if it also passes for you? |
Ah I missed that one, yes I do believe it should be essentially the same bug. The uninitialised pbuf was the same problem I found when C debugging this hard fault. |
That's a bit strange, I'll try to test again to ensure the test is hitting the actual problem. |
baaf54f
to
a8bba8c
Compare
There was a important
and the pyboard-d resets / hardfaults. With the patch applied I get:
|
Oops, good catch @dpgeorge . Thanks @andrewleech for finding the root cause. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this works now. Tested without the fix and the new test crashes PYBD_SF6 and RPI_PICO2_W. With the fix in this PR the test passes.
a8bba8c
to
cf751cf
Compare
I was just about to merge this when I realised that it's not handling the non-blocking case correctly. The new check added in this PR should go at the top of I think the non-blocking version of the test should also be added to the test, as an additional combination. So, it'll have a loop of 8 different tests (4 in blocking mode, 4 in non-blocking mode). Add (@andrewleech sorry but I force pushed to your branch, so be sure to pull --rebase) |
e333d72
to
dd7888a
Compare
Great catch @dpgeorge I've updated as suggested and re-tested:
|
Add check to prevent calling recv on a socket in the listening state. This prevents a crash/hard fault when user code mistakenly tries to recv on the listening socket instead of on the accepted connection. Add corresponding test case to demonstrate the bug. Signed-off-by: Andrew Leech <[email protected]>
dd7888a
to
2f04381
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I retested on PYBD_SF6, with the test here, both before and after the fix in this PR. Test fails without the fix and passes with the fix.
Summary
Fix an issue where calling
recv()
on a listening socket afteraccept()
would cause a hard fault crash onMicroPython hardware. This happens when user code incorrectly tries to receive data on the listening socket
instead of the accepted connection.
With this fix, MicroPython now properly raises an
OSError
witherrno=107
(ENOTCONN) similar to CPython,instead of crashing. This makes the behavior consistent with standard socket implementations and improves error
handling.
The bug is demonstrated by the following code pattern:
Once data is sent to this socket from
The first device crashes with a hard fault.
Closes #16697 (the root cause of the
wrap_socket()
crash is calling recv on a listening socket.)Testing
Added a new test file tests/multi_net/tcp_accept_recv_listen.py that demonstrates the fix by intentionally
triggering the error condition and checking for the appropriate exception.
Testing was performed against a pyboard-d (WIFI):
The test confirms that a proper OSError is raised with the correct errno value (107/ENOTCONN) instead of
crashing.
While this bug was found / fixed manually the unit test and documentation was created with AI support from Claude Code.