Closed Bug 1416529 Opened 7 years ago Closed 6 years ago

AddressSanitizer: heap-use-after-free @ mozilla::net::Http2Session::ProcessConnectedPush()

Categories

(Core :: Networking: HTTP, defect, P1)

58 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr52 59+ fixed
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 + fixed
firefox60 + fixed

People

(Reporter: bc, Assigned: u408661)

References

(Blocks 1 open bug)

Details

(Keywords: csectype-uaf, sec-high, Whiteboard: [necko-triaged][post-critsmash-triage][adv-main59+][adv-esr52.7+])

Attachments

(4 files, 2 obsolete files)

Attached file Linux Debug Log
1. https://ngodung.tk/

You may need to reload to see the crash. I use Web Developer Web Console to open a tab and then open Web Console in that tab and do
setInterval('opener.document.location.reload()', 45000)

2. heap-use-after-free: opt, debug asan Linux, Beta/57, Nightly/58. Crash Windows Debug

==4104==ERROR: AddressSanitizer: heap-use-after-free on address 0x6130002e0700 at pc 0x7f405ee795b2 bp 0x7f40510a9400 sp 0x7f40510a93f8
READ of size 8 at 0x6130002e0700 thread T7 (Socket Thread)
    #0 0x7f405ee795b1 in mozilla::net::Http2Session::ProcessConnectedPush(mozilla::net::Http2Stream*, mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*) /builds/worker/workspace/build/src/netwerk/protocol/http/Http2Session.cpp:3472:38
    #1 0x7f405ee76b42 in mozilla::net::Http2Session::WriteSegmentsAgain(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*, bool*) /builds/worker/workspace/build/src/netwerk/protocol/http/Http2Session.cpp:3004:12
    #2 0x7f405ef9f1ae in mozilla::net::nsHttpConnection::OnSocketReadable() /builds/worker/workspace/build/src/netwerk/protocol/http/nsHttpConnection.cpp:1954:13
    #3 0x7f405efa1288 in mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) /builds/worker/workspace/build/src/netwerk/protocol/http/nsHttpConnection.cpp:2290:19
    #4 0x7f405efa16cc in non-virtual thunk to mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) /builds/worker/workspace/build/src/netwerk/protocol/http/nsHttpConnection.cpp:2260:19
    #5 0x7f405e8d969e in mozilla::net::nsSocketInputStream::OnSocketReady(nsresult) /builds/worker/workspace/build/src/netwerk/base/nsSocketTransport2.cpp:298:19
    #6 0x7f405e8e5b1f in mozilla::net::nsSocketTransport::OnSocketReady(PRFileDesc*, short) /builds/worker/workspace/build/src/netwerk/base/nsSocketTransport2.cpp:2206:20
    #7 0x7f405e8f3ff6 in mozilla::net::nsSocketTransportService::DoPollIteration(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>*) /builds/worker/workspace/build/src/netwerk/base/nsSocketTransportService2.cpp:1193:29
    #8 0x7f405e8f2b73 in mozilla::net::nsSocketTransportService::Run() /builds/worker/workspace/build/src/netwerk/base/nsSocketTransportService2.cpp:947:13
    #9 0x7f405e8f4a6c in non-virtual thunk to mozilla::net::nsSocketTransportService::Run() /builds/worker/workspace/build/src/netwerk/base/nsSocketTransportService2.cpp:857:27
    #10 0x7f405e6e183f in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp:1037:14

Windows 7 
[Child 3228, Main Thread] WARNING: Unsupported TextureClient backend type: 'aMoz2DBackend != gfx::BackendType::SKIA && aMoz2DBackend != gfx::BackendType::DIRECT2D && aMoz2DBackend != gfx::BackendType::DIRECT2D1_1', file z:/build/build/src/gfx/layers/client/TextureClient.cpp, line 1255
[Child 3228, MediaPDecoder #2] WARNING: Unimplemented function NotifyDataArrived: file z:/build/build/src/dom/media/mp3/MP3Demuxer.cpp, line 83

###!!! [Parent][MessageChannel] Error: (msgtype=0x2A000B,name=PCompositorWidget::Msg_UnobserveVsync) Channel error: cannot send/recv

Unable to read VR Path Registry from C:\Users\mozauto\AppData\Local\openvr\openvrpaths.vrpath
[GPU 4036, Main Thread] WARNING: Shutting down GPU process early due to a crash!: file z:/build/build/src/gfx/ipc/GPUParent.cpp, line 433
Group: core-security → network-core-security
Flags: needinfo?(mcmanus)
Flags: needinfo?(mcmanus)
nick can you jump on this?
Flags: needinfo?(hurley)
Assignee: nobody → hurley
Priority: -- → P1
Whiteboard: [necko-triaged]
Nick, any updates on this one?
I've been looking at this one, and so far my main theory is that the stream is somehow getting freed without being removed from the list of push connected streams (which is why we would crash derefing the stream).

I feel like long-term we should go all-in on making stream pointers refptrs, but that's a big patch that touches a lot of places, and is risky in its own right. But I don't have a smaller, more targeted fix in mind (yet). I'm going to continue investigating that route a bit before jumping down the refptr rabbit hole.

One other thought I've had is that it could be that the entire session has been freed somehow, though I've had that thought for bug 1402014, and I haven't been able figure out how that could happen, either.
Flags: needinfo?(hurley)
:bc - are you still able to reproduce this? I haven't yet been able to with an ASAN build. (I ask, because I have a theory and a patch that would fix if my theory is correct, and I'm looking for someone to test it out, since I can't do it myself.)
Flags: needinfo?(bob)
bughunter wasn't able to reproduce any crashes on that site using today's builds with repeated attempts. I have debug asan build on Fedora from yesterday that I am doing the setInterval reload hack. So far it hasn't reproduced either. I'll let it run for a while but it doesn't look like it is reproducible any more. I haven't confirmed it but perhaps bug 1401459 fixed this.
Flags: needinfo?(bob)
Thanks Bob. Looking at the patch for 1401459, I don't think that would affect what's going on here (but perhaps I'm wrong). If my theory holds true, it's possible that this was triggered by a server bug, and if they updated the server software on the site, then it's possible it's no longer behaving in a way that would expose our bug.

I'll give this a shot with 1401459 backed out locally to see if that makes it reproducible.
Finally hit a different asan error after reloading for "a while". I've seen this recently on a number of other urls but haven't gotten a rount tuit filing it.

==27056==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060001a9260 at pc 0x7fe65754e3cb bp 0x7fe645087210 sp 0x7fe645087208
READ of size 8 at 0x6060001a9260 thread T26 (Cache2 I/O)
    #0 0x7fe65754e3ca in mozilla::Runnable::Release() /builds/worker/workspace/build/src/xpcom/threads/nsThreadUtils.cpp:50:1

Maybe we can just morph this bug or I can file a new one if you like.
No, don't morph this, this is definitely an issue (just a tricky one to track down, apparently). File a new one - though I feel like I've seen that signature before recently, so a quick search might be worth it before filing.
I don't have access to security bugs other than release ones. Searching for them is a waste of time. Filing them is mostly a waste since they end up as dupes. If its still around, I'll file it later.
I *think* I have a better idea of what's going on here, after a brain break and looking at the code again. This could, indeed, have stopped appearing with the site in comment 0 after a server update. Here's my theory:

1. Server sends PUSH_PROMISE to us, potentially along with some response HEADERS and DATA (though that part isn't necessary here). It is, however, key that it does not send us *all* the DATA.
2. We generate a transaction that matches with the pushed stream, and start getting data from it through a synthetic pulled stream, as usual.
2. Server sends GOAWAY to us (the last-good-id in the GOAWAY doesn't really matter, but let's assume it's lower than the promised ID from step 1). This causes us to remove the synthetic pull stream created in step 2 (as it doesn't have stream ID, and thus matches our criteria for "transactions we can restart after GOAWAY"). That stream gets deleted, but never lets its push source know about that.
3. Server sends us more DATA on the pushed stream, which now has a dangling pointer to the freed consumer stream, and calls session->ConnectPushedStream(danglingPointer).
4. Next WriteSegments cycle, the server sees danglingPointer in mPushesReadyForRead, and happily tries to call danglingPointer->WriteSegments (though the stack seen in this and the duplicate bug).
5. Boom.

Long-term, the solution here is to stop using raw pointers. However, that's a hairy patch that will take a lot of time to get right. Short-term, we can ensure in step 2 that the stream that gets deleted first removes references of itself from the push source.

Patch should be coming sometime today.
Attached patch patch (obsolete) — Splinter Review
Attachment #8948023 - Flags: review?(mcmanus)
Attachment #8945193 - Attachment is obsolete: true
I'm a little confused, but I think I've worked this out.. I just want to reword it all to make sure you and I understand the problem and the solution the same way.. I agree its tricky.

The immediate cause of the crash is that there is a stale pointer in the mPushesReadyForRead queue... the pointer gets its vtable de'refd and that causes the access error. The streams in that queue are pull streams connected to push streams.

When a stream is closed it is scrubbed from that queue. So the stale entry must have been added after the stream was closed (and then presumably deleted). There are two ways to get into the queue - one is to add yourself (which we're going to presume a closed stream isn't doing), and the other is from http2pushedstream::connectpushstream() which adds its corresponding pull stream when there is some data around to connect. I can totally believe that is happening here after the pull stream has been paired up with the push and then closed.

the trigger on closing the pull stream could come from something as simple as session::closeTransaction that would get called on a nav event if a pair was still in progress.

your fix is that when the pull stream is closed, it sets tells its push stream to break the pairing.. and that means in the future http2pushedstream::connectpushstream() won't add the pullstream to the queue.

fwiw the session code tried to do something similar when it called cleanupstream() on the pull stream (see the setconsumerstream(nullptr) in there) but I bet it returned early due to DeferCleanup().

did I get that right? good sleuthing nick.

If that's right I would make a couple tiny changes.. your change should probably also be called from the http2stream dtor just in case. and #2 - I'm not so sure about calling pushSource->close() from the pull stream. that's not a pattern streams have with each other and I'm concerned about what happens if http2session isn't in the loop. unless you think its the right thing to do I think the patch works fine without the close.

I also think that bits that check the pushSource in session::cleanupstream can be moved to be before the defer cleanup check. I think its ok to break that pairing even if we're going to defer actual closing.
Flags: needinfo?(hurley)
Attachment #8948023 - Flags: review?(mcmanus)
Yep, that's exactly what I'm thinking. Good times.

Adding the setconsumerstream(nullptr) to the dtor is good, I'll do that. And I think it's fine to not close() the push from the pull stream. I'm a little less bullish on moving the pushSource check in cleanupstream before the defer check, so unless you're really set on that, I think I'll leave it out for now.

Now that I'm thinking about it... does it really seem necessary to put in the setconsumerstream(nullptr) I've added here in addition to adding one in the dtor? I would think the centralized (dtor) version should be enough for our purposes here, and the other just seems a little extra code for no real gain (since the dtor gets called almost immediately afterwards, anyway).
Flags: needinfo?(hurley) → needinfo?(mcmanus)
I would make the change in cleanupStream.. I think that code is actually the real bug here and is probably the only change you need to make (cleanupStream was called before the ->Close() you're adding code to) though the ones you propose are sensible.

I would do it both in the close and the dtor.. from a state perspective you don't want the pull stream being scheduled after the close, but its lifetime is really controlled by that session hashtable and I don't feel comfortably asserting those things close together in time all the time.

The other possibility if this doesn't resolve the problem is to make mPushesReadyForRead a queue of weak pointers.. http2stream already implements weak pointer and there are only a few call sites that deal with that queue so it wouldn't be hard.
Flags: needinfo?(mcmanus)
Attached patch patchSplinter Review
Attachment #8949245 - Flags: review?(mcmanus)
Attachment #8948023 - Attachment is obsolete: true
Attachment #8949245 - Flags: review?(mcmanus) → review+
Comment on attachment 8949245 [details] [diff] [review]
patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch? Unclear. The patch makes it very clear that we're mishandling the closing of streams connected to pushes in some cases, however the details of the actual sequence of events required to hit the bug aren't immediately obvious (which is why it took me a while to figure this one out).

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? Nope.

Which older supported branches are affected by this flaw? All.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Not yet, they should be effectively the same (perhaps minus some line fuzz) and no more risky than the current patch. I'll get those put together before landing.

How likely is this patch to cause regressions; how much testing does it need? Not very likely - this is all code around when we're closing an h2 session down. Just some manual testing to make sure there are no new crashes against an existing h2 server (gmail or youtube, for example) should be fine.

It's worth noting that while I'm fairly confident in my assessment of the issue and the way to fix it, since I've never been able to reproduce the crash, I can't promise 100% that the crash will be fixed. 99% sure, though :)
Attachment #8949245 - Flags: sec-approval?
sec-approval+ for trunk.
Please nominate beta (59) and ESR52 patches as well to land after it is on trunk.
Attachment #8949245 - Flags: sec-approval? → sec-approval+
Attached patch patch for betaSplinter Review
Attached patch patch for esr52Splinter Review
https://hg.mozilla.org/mozilla-central/rev/2ef6e47e1087
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Comment on attachment 8949803 [details] [diff] [review]
patch for beta

Approval Request Comment
[Feature/Bug causing the regression]: http/2
[User impact if declined]: Potential UAF crashes (these have been seen in the wild)
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Not yet.
[Needs manual test from QE? If yes, steps to reproduce]: You need a server that behaves in a particular way. Right now, we don't have any information to reliably reproduce those. Bug, having some testing against known http/2 servers (google, youtube, anything hosted on cloudflare or fastly) to ensure no new crashes might be good.
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Not particularly
[Why is the change risky/not risky?]: Reorders some cleanup to prevent an accidental placing of an invalid pointer into a queue for later use.
[String changes made/needed]: None
Attachment #8949803 - Flags: approval-mozilla-beta?
Comment on attachment 8949804 [details] [diff] [review]
patch for esr52

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: It's sec-high.
User impact if declined: Potential UAF crashes
Fix Landed on Version: 60 (current nightly)
Risk to taking this patch (and alternatives if risky): Low risk. No alternatives.
String or UUID changes made by this patch: None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8949804 - Flags: approval-mozilla-esr52?
Comment on attachment 8949803 [details] [diff] [review]
patch for beta

Sec-high, Beta59+
Attachment #8949803 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8949804 [details] [diff] [review]
patch for esr52

ESR52.7+
Attachment #8949804 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Group: network-core-security → core-security-release
Flags: qe-verify-
Whiteboard: [necko-triaged] → [necko-triaged][post-critsmash-triage]
Whiteboard: [necko-triaged][post-critsmash-triage] → [necko-triaged][post-critsmash-triage][adv-main59+][adv-esr52.7+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: