Closed Bug 1416529 Opened 7 years ago Closed 6 years ago

AddressSanitizer: heap-use-after-free @ mozilla::net::Http2Session::ProcessConnectedPush()

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla60

Tracking Flags:

Tracking

Status

firefox-esr52

59+

fixed

firefox57

---

wontfix

firefox58

---

wontfix

firefox59

fixed

firefox60

fixed

People

(Reporter: bc, Assigned: u408661)

References

(Blocks 1 open bug)

Details

(Keywords: csectype-uaf, sec-high, Whiteboard: [necko-triaged][post-critsmash-triage][adv-main59+][adv-esr52.7+])

Attachments

(4 files, 2 obsolete files)

Linux Debug Log 7 years ago Bob Clary [:bc] (inactive) 173.20 KB, text/plain		Details
asan debug mozilla::Runnable::Release() 7 years ago Bob Clary [:bc] (inactive) 25.32 KB, text/plain		Details
patch 7 years ago u408661 1.07 KB, patch		Details \| Diff \| Splinter Review
patch 6 years ago u408661 4.74 KB, patch	mcmanus : review+ abillings : sec-approval+	Details \| Diff \| Splinter Review
patch for beta 6 years ago u408661 4.83 KB, patch	ritu : approval-mozilla-beta+	Details \| Diff \| Splinter Review
patch for esr52 6 years ago u408661 4.66 KB, patch	ritu : approval-mozilla-esr52+	Details \| Diff \| Splinter Review

Bob Clary [:bc] (inactive)

Reporter

Description

•

7 years ago

Attached file Linux Debug Log — Details

1. https://ngodung.tk/

You may need to reload to see the crash. I use Web Developer Web Console to open a tab and then open Web Console in that tab and do
setInterval('opener.document.location.reload()', 45000)

2. heap-use-after-free: opt, debug asan Linux, Beta/57, Nightly/58. Crash Windows Debug

==4104==ERROR: AddressSanitizer: heap-use-after-free on address 0x6130002e0700 at pc 0x7f405ee795b2 bp 0x7f40510a9400 sp 0x7f40510a93f8
READ of size 8 at 0x6130002e0700 thread T7 (Socket Thread)
    #0 0x7f405ee795b1 in mozilla::net::Http2Session::ProcessConnectedPush(mozilla::net::Http2Stream*, mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*) /builds/worker/workspace/build/src/netwerk/protocol/http/Http2Session.cpp:3472:38
    #1 0x7f405ee76b42 in mozilla::net::Http2Session::WriteSegmentsAgain(mozilla::net::nsAHttpSegmentWriter*, unsigned int, unsigned int*, bool*) /builds/worker/workspace/build/src/netwerk/protocol/http/Http2Session.cpp:3004:12
    #2 0x7f405ef9f1ae in mozilla::net::nsHttpConnection::OnSocketReadable() /builds/worker/workspace/build/src/netwerk/protocol/http/nsHttpConnection.cpp:1954:13
    #3 0x7f405efa1288 in mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) /builds/worker/workspace/build/src/netwerk/protocol/http/nsHttpConnection.cpp:2290:19
    #4 0x7f405efa16cc in non-virtual thunk to mozilla::net::nsHttpConnection::OnInputStreamReady(nsIAsyncInputStream*) /builds/worker/workspace/build/src/netwerk/protocol/http/nsHttpConnection.cpp:2260:19
    #5 0x7f405e8d969e in mozilla::net::nsSocketInputStream::OnSocketReady(nsresult) /builds/worker/workspace/build/src/netwerk/base/nsSocketTransport2.cpp:298:19
    #6 0x7f405e8e5b1f in mozilla::net::nsSocketTransport::OnSocketReady(PRFileDesc*, short) /builds/worker/workspace/build/src/netwerk/base/nsSocketTransport2.cpp:2206:20
    #7 0x7f405e8f3ff6 in mozilla::net::nsSocketTransportService::DoPollIteration(mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>*) /builds/worker/workspace/build/src/netwerk/base/nsSocketTransportService2.cpp:1193:29
    #8 0x7f405e8f2b73 in mozilla::net::nsSocketTransportService::Run() /builds/worker/workspace/build/src/netwerk/base/nsSocketTransportService2.cpp:947:13
    #9 0x7f405e8f4a6c in non-virtual thunk to mozilla::net::nsSocketTransportService::Run() /builds/worker/workspace/build/src/netwerk/base/nsSocketTransportService2.cpp:857:27
    #10 0x7f405e6e183f in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp:1037:14

Windows 7 
[Child 3228, Main Thread] WARNING: Unsupported TextureClient backend type: 'aMoz2DBackend != gfx::BackendType::SKIA && aMoz2DBackend != gfx::BackendType::DIRECT2D && aMoz2DBackend != gfx::BackendType::DIRECT2D1_1', file z:/build/build/src/gfx/layers/client/TextureClient.cpp, line 1255
[Child 3228, MediaPDecoder #2] WARNING: Unimplemented function NotifyDataArrived: file z:/build/build/src/dom/media/mp3/MP3Demuxer.cpp, line 83

###!!! [Parent][MessageChannel] Error: (msgtype=0x2A000B,name=PCompositorWidget::Msg_UnobserveVsync) Channel error: cannot send/recv

Unable to read VR Path Registry from C:\Users\mozauto\AppData\Local\openvr\openvrpaths.vrpath
[GPU 4036, Main Thread] WARNING: Shutting down GPU process early due to a crash!: file z:/build/build/src/gfx/ipc/GPUParent.cpp, line 433

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: core-security → network-core-security

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Flags: needinfo?(mcmanus)

Keywords: csectype-uaf, sec-high

Patrick McManus [:mcmanus]

Updated

•

7 years ago

Flags: needinfo?(mcmanus)

Patrick McManus [:mcmanus]

Comment 1

•

7 years ago

nick can you jump on this?

Flags: needinfo?(hurley)

Honza Bambas (:mayhemer)

Updated

•

7 years ago

Assignee: nobody → hurley

Priority: -- → P1

Whiteboard: [necko-triaged]

Frederik Braun [:freddy]

Comment 2

•

7 years ago

Nick, any updates on this one?

u408661

Assignee

Comment 3

•

7 years ago

I've been looking at this one, and so far my main theory is that the stream is somehow getting freed without being removed from the list of push connected streams (which is why we would crash derefing the stream).

I feel like long-term we should go all-in on making stream pointers refptrs, but that's a big patch that touches a lot of places, and is risky in its own right. But I don't have a smaller, more targeted fix in mind (yet). I'm going to continue investigating that route a bit before jumping down the refptr rabbit hole.

One other thought I've had is that it could be that the entire session has been freed somehow, though I've had that thought for bug 1402014, and I haven't been able figure out how that could happen, either.

Flags: needinfo?(hurley)

u408661

Assignee

Comment 4

•

7 years ago

:bc - are you still able to reproduce this? I haven't yet been able to with an ASAN build. (I ask, because I have a theory and a patch that would fix if my theory is correct, and I'm looking for someone to test it out, since I can't do it myself.)

Flags: needinfo?(bob)

Bob Clary [:bc] (inactive)

Reporter

Comment 5

•

7 years ago

bughunter wasn't able to reproduce any crashes on that site using today's builds with repeated attempts. I have debug asan build on Fedora from yesterday that I am doing the setInterval reload hack. So far it hasn't reproduced either. I'll let it run for a while but it doesn't look like it is reproducible any more. I haven't confirmed it but perhaps bug 1401459 fixed this.

Flags: needinfo?(bob)

u408661

Assignee

Comment 6

•

7 years ago

Thanks Bob. Looking at the patch for 1401459, I don't think that would affect what's going on here (but perhaps I'm wrong). If my theory holds true, it's possible that this was triggered by a server bug, and if they updated the server software on the site, then it's possible it's no longer behaving in a way that would expose our bug.

I'll give this a shot with 1401459 backed out locally to see if that makes it reproducible.

Bob Clary [:bc] (inactive)

Reporter

Comment 7

•

7 years ago

Attached file asan debug mozilla::Runnable::Release() (obsolete) — Details

Finally hit a different asan error after reloading for "a while". I've seen this recently on a number of other urls but haven't gotten a rount tuit filing it.

==27056==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060001a9260 at pc 0x7fe65754e3cb bp 0x7fe645087210 sp 0x7fe645087208
READ of size 8 at 0x6060001a9260 thread T26 (Cache2 I/O)
    #0 0x7fe65754e3ca in mozilla::Runnable::Release() /builds/worker/workspace/build/src/xpcom/threads/nsThreadUtils.cpp:50:1

Maybe we can just morph this bug or I can file a new one if you like.

u408661

Assignee

Comment 8

•

7 years ago

No, don't morph this, this is definitely an issue (just a tricky one to track down, apparently). File a new one - though I feel like I've seen that signature before recently, so a quick search might be worth it before filing.

Bob Clary [:bc] (inactive)

Reporter

Comment 9

•

7 years ago

I don't have access to security bugs other than release ones. Searching for them is a waste of time. Filing them is mostly a waste since they end up as dupes. If its still around, I'll file it later.

u408661

Assignee

Comment 11

•

7 years ago

I *think* I have a better idea of what's going on here, after a brain break and looking at the code again. This could, indeed, have stopped appearing with the site in comment 0 after a server update. Here's my theory:

1. Server sends PUSH_PROMISE to us, potentially along with some response HEADERS and DATA (though that part isn't necessary here). It is, however, key that it does not send us *all* the DATA.
2. We generate a transaction that matches with the pushed stream, and start getting data from it through a synthetic pulled stream, as usual.
2. Server sends GOAWAY to us (the last-good-id in the GOAWAY doesn't really matter, but let's assume it's lower than the promised ID from step 1). This causes us to remove the synthetic pull stream created in step 2 (as it doesn't have stream ID, and thus matches our criteria for "transactions we can restart after GOAWAY"). That stream gets deleted, but never lets its push source know about that.
3. Server sends us more DATA on the pushed stream, which now has a dangling pointer to the freed consumer stream, and calls session->ConnectPushedStream(danglingPointer).
4. Next WriteSegments cycle, the server sees danglingPointer in mPushesReadyForRead, and happily tries to call danglingPointer->WriteSegments (though the stack seen in this and the duplicate bug).
5. Boom.

Long-term, the solution here is to stop using raw pointers. However, that's a hairy patch that will take a lot of time to get right. Short-term, we can ensure in step 2 that the stream that gets deleted first removes references of itself from the push source.

Patch should be coming sometime today.

u408661

Assignee

Comment 12

•

7 years ago

Attached patch patch (obsolete) — Details — Splinter Review

Attachment #8948023 - Flags: review?(mcmanus)

u408661

Assignee

Updated

•

7 years ago

Attachment #8945193 - Attachment is obsolete: true

Patrick McManus [:mcmanus]

Comment 13

•

7 years ago

I'm a little confused, but I think I've worked this out.. I just want to reword it all to make sure you and I understand the problem and the solution the same way.. I agree its tricky.

The immediate cause of the crash is that there is a stale pointer in the mPushesReadyForRead queue... the pointer gets its vtable de'refd and that causes the access error. The streams in that queue are pull streams connected to push streams.

When a stream is closed it is scrubbed from that queue. So the stale entry must have been added after the stream was closed (and then presumably deleted). There are two ways to get into the queue - one is to add yourself (which we're going to presume a closed stream isn't doing), and the other is from http2pushedstream::connectpushstream() which adds its corresponding pull stream when there is some data around to connect. I can totally believe that is happening here after the pull stream has been paired up with the push and then closed.

the trigger on closing the pull stream could come from something as simple as session::closeTransaction that would get called on a nav event if a pair was still in progress.

your fix is that when the pull stream is closed, it sets tells its push stream to break the pairing.. and that means in the future http2pushedstream::connectpushstream() won't add the pullstream to the queue.

fwiw the session code tried to do something similar when it called cleanupstream() on the pull stream (see the setconsumerstream(nullptr) in there) but I bet it returned early due to DeferCleanup().

did I get that right? good sleuthing nick.

If that's right I would make a couple tiny changes.. your change should probably also be called from the http2stream dtor just in case. and #2 - I'm not so sure about calling pushSource->close() from the pull stream. that's not a pattern streams have with each other and I'm concerned about what happens if http2session isn't in the loop. unless you think its the right thing to do I think the patch works fine without the close.

I also think that bits that check the pushSource in session::cleanupstream can be moved to be before the defer cleanup check. I think its ok to break that pairing even if we're going to defer actual closing.

Flags: needinfo?(hurley)

Patrick McManus [:mcmanus]

Updated

•

7 years ago

Attachment #8948023 - Flags: review?(mcmanus)

u408661

Assignee

Comment 14

•

6 years ago

Yep, that's exactly what I'm thinking. Good times.

Adding the setconsumerstream(nullptr) to the dtor is good, I'll do that. And I think it's fine to not close() the push from the pull stream. I'm a little less bullish on moving the pushSource check in cleanupstream before the defer check, so unless you're really set on that, I think I'll leave it out for now.

Now that I'm thinking about it... does it really seem necessary to put in the setconsumerstream(nullptr) I've added here in addition to adding one in the dtor? I would think the centralized (dtor) version should be enough for our purposes here, and the other just seems a little extra code for no real gain (since the dtor gets called almost immediately afterwards, anyway).

Flags: needinfo?(hurley) → needinfo?(mcmanus)

Patrick McManus [:mcmanus]

Comment 15

•

6 years ago

I would make the change in cleanupStream.. I think that code is actually the real bug here and is probably the only change you need to make (cleanupStream was called before the ->Close() you're adding code to) though the ones you propose are sensible.

I would do it both in the close and the dtor.. from a state perspective you don't want the pull stream being scheduled after the close, but its lifetime is really controlled by that session hashtable and I don't feel comfortably asserting those things close together in time all the time.

The other possibility if this doesn't resolve the problem is to make mPushesReadyForRead a queue of weak pointers.. http2stream already implements weak pointer and there are only a few call sites that deal with that queue so it wouldn't be hard.

Flags: needinfo?(mcmanus)

u408661

Assignee

Comment 16

•

6 years ago

Attached patch patch — Details — Splinter Review

Attachment #8949245 - Flags: review?(mcmanus)

u408661

Assignee

Updated

•

6 years ago

Attachment #8948023 - Attachment is obsolete: true

Patrick McManus [:mcmanus]

Updated

•

6 years ago

Attachment #8949245 - Flags: review?(mcmanus) → review+

u408661

Assignee

Comment 17

•

6 years ago

Comment on attachment 8949245 [details] [diff] [review]
patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch? Unclear. The patch makes it very clear that we're mishandling the closing of streams connected to pushes in some cases, however the details of the actual sequence of events required to hit the bug aren't immediately obvious (which is why it took me a while to figure this one out).

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? Nope.

Which older supported branches are affected by this flaw? All.

If not all supported branches, which bug introduced the flaw?

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Not yet, they should be effectively the same (perhaps minus some line fuzz) and no more risky than the current patch. I'll get those put together before landing.

How likely is this patch to cause regressions; how much testing does it need? Not very likely - this is all code around when we're closing an h2 session down. Just some manual testing to make sure there are no new crashes against an existing h2 server (gmail or youtube, for example) should be fine.

It's worth noting that while I'm fairly confident in my assessment of the issue and the way to fix it, since I've never been able to reproduce the crash, I can't promise 100% that the crash will be fixed. 99% sure, though :)

Attachment #8949245 - Flags: sec-approval?

Al Billings [:abillings - ex-MoCo]

Comment 18

•

6 years ago

sec-approval+ for trunk.
Please nominate beta (59) and ESR52 patches as well to land after it is on trunk.

status-firefox57: affected → wontfix

status-firefox58: affected → wontfix

status-firefox59: --- → affected

status-firefox60: --- → affected

status-firefox-esr52: --- → affected

tracking-firefox59: --- → +

tracking-firefox60: --- → +

tracking-firefox-esr52: --- → 59+

Al Billings [:abillings - ex-MoCo]

Updated

•

6 years ago

Attachment #8949245 - Flags: sec-approval? → sec-approval+

u408661

Assignee

Comment 19

•

6 years ago

Attached patch patch for beta — Details — Splinter Review

u408661

Assignee

Comment 20

•

6 years ago

Attached patch patch for esr52 — Details — Splinter Review

u408661

Assignee

Comment 21

•

6 years ago

remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/2ef6e47e10879a0b4476871e3abf8b353282134e

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 22

•

6 years ago

https://hg.mozilla.org/mozilla-central/rev/2ef6e47e1087

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox60: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla60

u408661

Assignee

Comment 23

•

6 years ago

Comment on attachment 8949803 [details] [diff] [review]
patch for beta

Approval Request Comment
[Feature/Bug causing the regression]: http/2
[User impact if declined]: Potential UAF crashes (these have been seen in the wild)
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Not yet.
[Needs manual test from QE? If yes, steps to reproduce]: You need a server that behaves in a particular way. Right now, we don't have any information to reliably reproduce those. Bug, having some testing against known http/2 servers (google, youtube, anything hosted on cloudflare or fastly) to ensure no new crashes might be good.
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Not particularly
[Why is the change risky/not risky?]: Reorders some cleanup to prevent an accidental placing of an invalid pointer into a queue for later use.
[String changes made/needed]: None

Attachment #8949803 - Flags: approval-mozilla-beta?

u408661

Assignee

Comment 24

•

6 years ago

Comment on attachment 8949804 [details] [diff] [review]
patch for esr52

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: It's sec-high.
User impact if declined: Potential UAF crashes
Fix Landed on Version: 60 (current nightly)
Risk to taking this patch (and alternatives if risky): Low risk. No alternatives.
String or UUID changes made by this patch: None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Attachment #8949804 - Flags: approval-mozilla-esr52?

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 25

•

6 years ago

Comment on attachment 8949803 [details] [diff] [review]
patch for beta

Sec-high, Beta59+

Attachment #8949803 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 26

•

6 years ago

Comment on attachment 8949804 [details] [diff] [review]
patch for esr52

ESR52.7+

Attachment #8949804 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 27

•

6 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/8463e0614769

status-firefox59: affected → fixed

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: network-core-security → core-security-release

Ryan VanderMeulen [:RyanVM]

Comment 28

•

6 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr52/rev/f87ef3774d5e

status-firefox-esr52: affected → fixed

Matt Wobensmith [:mwobensmith][:matt:]

Updated

•

6 years ago

Flags: qe-verify-

Whiteboard: [necko-triaged] → [necko-triaged][post-critsmash-triage]

Al Billings [:abillings - ex-MoCo]

Updated

•

6 years ago

Whiteboard: [necko-triaged][post-critsmash-triage] → [necko-triaged][post-critsmash-triage][adv-main59+][adv-esr52.7+]

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: core-security-release

Sylvestre Ledru [:Sylvestre]

Updated

•

4 years ago

Blocks: asan-maintenance

You need to log in before you can comment on or make changes to this bug.