Closed
Bug 1342823
Opened 7 years ago
Closed 7 years ago
Crash in RefPtr<T>::RefPtr<T> | TakeFrameRequestCallbacksFrom
Categories
(Core :: DOM: Animation, defect)
Core
DOM: Animation
Tracking
()
People
(Reporter: marcia, Assigned: bzbarsky)
References
Details
(4 keywords, Whiteboard: [adv-main53+][adv-esr52.1+][adv-esr45.9+][post-critsmash-triage])
Crash Data
Attachments
(6 files, 2 obsolete files)
2.57 KB,
patch
|
Details | Diff | Splinter Review | |
3.17 KB,
patch
|
farre
:
review+
abillings
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
12.86 KB,
patch
|
farre
:
review+
abillings
:
approval-mozilla-aurora+
abillings
:
sec-approval+
|
Details | Diff | Splinter Review |
14.78 KB,
patch
|
farre
:
review+
abillings
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
14.57 KB,
patch
|
farre
:
review+
jcristau
:
approval-mozilla-esr52+
|
Details | Diff | Splinter Review |
14.12 KB,
patch
|
farre
:
review+
jcristau
:
approval-mozilla-esr45+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-74aa902b-d29f-4f91-bc08-73ba42170226. ============================================================= Seen while looking at nightly crash stats - crashes started using the 20170218030212 build: http://bit.ly/2lK81v3. Majority of crashes are on Windows 10.
Comment 2•7 years ago
|
||
Oh, I misread the summary. Olli, any thoughts on this (or, if not, where it should live?)?
Component: DOM → DOM: Animation
Flags: needinfo?(afarre) → needinfo?(bugs)
Comment 3•7 years ago
|
||
Is this a new crash? There was similar fixed very recently I think. Searching...
Comment 4•7 years ago
|
||
I'd say this is bug 1335450. Marcia, are there more recent crashes with the same stack?
Flags: needinfo?(bugs) → needinfo?(mozillamarcia.knous)
Comment 5•7 years ago
|
||
Oh, or is this a regression from that bug, if this started 20170218030212.
Flags: needinfo?(bzbarsky)
Reporter | ||
Comment 6•7 years ago
|
||
(In reply to Olli Pettay [:smaug] (review request overload, closing the queue for a day or two) from comment #4) > I'd say this is bug 1335450. > Marcia, are there more recent crashes with the same stack? looks as if there are few with Build 20170227030203
Flags: needinfo?(mozillamarcia.knous)
![]() |
Assignee | |
Comment 7•7 years ago
|
||
Did the crashes start then, or just continue? Per discussion in bug 1335450 the patch for it should be more or less a no-op on trunk... but that still leaves the question of what's causing the crashes on trunk open. :( Maybe we should just give up and refcount these pointers in the refresh driver. Then failure to remove ourselves will be a leak instead of a crash, of course...
Group: dom-core-security
Flags: needinfo?(bzbarsky)
Comment 8•7 years ago
|
||
Would use of nsWeakPtr be to slow?
Comment 9•7 years ago
|
||
But, if this is a regression, regression window would be good.
Keywords: regressionwindow-wanted
Updated•7 years ago
|
Crash Signature: [@ RefPtr<T>::RefPtr<T> | TakeFrameRequestCallbacksFrom] → [@ RefPtr<T>::RefPtr<T> | TakeFrameRequestCallbacksFrom] [@ RefPtr<T>::RefPtr<T> | nsTArray_Impl<T>::AppendElement<T> | TakeFrameRequestCallbacksFrom]
Comment 11•7 years ago
|
||
Quoting myself from bug 1342411 comment 3: The spike seems to start around Dec 14 2016, which is very close to the release of 50.1.0 on 2016-12-13. I can't find any crashes in 50.0* for example. I think this is the list of changes that comprise 50.1.0: https://hg.mozilla.org/releases/mozilla-release/pushloghtml?fromchange=FIREFOX_50_0_2_RELEASE&tochange=FIREFOX_50_1_0_RELEASE
Comment 12•7 years ago
|
||
Fwiw, the vast majority of crashes are on 32-bit: 414 (64-bit) vs 5563 (32-bit) in the past 6 months. 100% of crashes are on Windows for the first two signatures. I did find a similar signature for Linux though which is probably this bug: bp-12243d70-2761-4dc1-a7d3-7532b2170301
OS: Windows 8 → All
Hardware: Unspecified → All
Updated•7 years ago
|
Crash Signature: [@ RefPtr<T>::RefPtr<T> | TakeFrameRequestCallbacksFrom] [@ RefPtr<T>::RefPtr<T> | nsTArray_Impl<T>::AppendElement<T> | TakeFrameRequestCallbacksFrom] → [@ RefPtr<T>::RefPtr<T> | TakeFrameRequestCallbacksFrom] [@ RefPtr<T>::RefPtr<T> | nsTArray_Impl<T>::AppendElement<T> | TakeFrameRequestCallbacksFrom] [@ nsCOMPtr<T>::nsCOMPtr | TakeFrameRequestCallbacksFrom ]
Comment 13•7 years ago
|
||
I strongly suspect that one of the nsIDocument raw pointers in mThrottledFrameRequestCallbackDocs have been destroyed: http://searchfox.org/mozilla-central/rev/9c1c7106eef137e3413fd867fc1ddfb1d3f6728c/layout/base/nsRefreshDriver.h#480 When create a nsCOMPtr<nsIDocument> for it, we probably crash trying to increment its ref count: http://searchfox.org/mozilla-central/rev/9c1c7106eef137e3413fd867fc1ddfb1d3f6728c/layout/base/nsRefreshDriver.cpp#1552
Comment 14•7 years ago
|
||
Yeah, I don't see anything that easily guarantees that the raw pointer is removed from that array when the document is destroyed. Would it be so bad to just remove it in the dtor for nsIDocument? Or failing that, make those arrays strong references?
![]() |
Assignee | |
Comment 15•7 years ago
|
||
Can we please avoid dumping multiple issues together? 50.1 suffers from the obvious issue we fixed in bug 1335450. But the crashes on nightly are much more confusing... This bug is specifically about the nightly crashes, or at least I would prefer it to be. > I don't see anything that easily guarantees that the raw pointer is removed from that array > when the document is destroyed. Well, the invariants around when we add/remove ourselves should guarantee it. That is, by the time a document is being destroyed it should have no script global object, no presshell, etc, and any of those things going away should generally remove from the list. That said, the logic around that is a bit complicated, and we're clearly screwing it up somewhere, given the nightly crashes. :( > Would it be so bad to just remove it in the dtor for nsIDocument? Remove from where? By the time ~nsIDocument is running mPresShell had better be null (because the presshell holds a strong ref to the document; see <http://searchfox.org/mozilla-central/rev/9c1c7106eef137e3413fd867fc1ddfb1d3f6728c/layout/base/nsIPresShell.h#1783>). That means we can't reach the object to remove ourselves from. > Or failing that, make those arrays strong references? That's certainly doable. Then this crash will turn into a leak...
![]() |
Assignee | |
Comment 16•7 years ago
|
||
And I strongly urge anyone willing to spend time on this to read bug 1335450 comment 7. That describes the invariants we are trying to maintain and my audit of whether we are maintaining those invariants. Again, clearly I'm missing _something_ if we still have crashes on trunk. But I can't figure out what, and really would appreciate a second set of eyes on this...
Comment 17•7 years ago
|
||
> Can we please avoid dumping multiple issues together? 50.1 suffers from the obvious issue we fixed in bug 1335450. Sorry about that. I came in from the duped bug and didn't immediately notice this was already fixed for the most part. > This bug is specifically about the nightly crashes Absolutely.
Updated•7 years ago
|
status-firefox55:
--- → affected
![]() |
Assignee | |
Comment 18•7 years ago
|
||
> Would use of nsWeakPtr be to slow?
Probably not, though we should measure....
Comment 19•7 years ago
|
||
Andreas, can you take a look here? See comment 16 and we're *hoping* this just requires another person to take a careful look.
Flags: needinfo?(afarre)
Updated•7 years ago
|
Assignee: nobody → afarre
Flags: needinfo?(afarre)
Comment 20•7 years ago
|
||
If you can't figure out what is going wrong, you could add a bool flag that tracks if the nsIDocument is in the callback list and crash if it still is in the dtor. That would at least defang the security issue, and might provide some insight into the actual problem by giving an earlier stack.
Comment 21•7 years ago
|
||
I need access to bug 1335450, can anyone help me there?
Flags: needinfo?(overholt)
![]() |
||
Comment 22•7 years ago
|
||
(In reply to Andreas Farre [:farre] from comment #21) > I need access to bug 1335450, can anyone help me there? You are now cc'd.
Flags: needinfo?(overholt)
Updated•7 years ago
|
Crash Signature: [@ RefPtr<T>::RefPtr<T> | TakeFrameRequestCallbacksFrom] [@ RefPtr<T>::RefPtr<T> | nsTArray_Impl<T>::AppendElement<T> | TakeFrameRequestCallbacksFrom] [@ nsCOMPtr<T>::nsCOMPtr | TakeFrameRequestCallbacksFrom ] → [@ RefPtr<T>::RefPtr<T> | TakeFrameRequestCallbacksFrom] [@ RefPtr<T>::RefPtr<T> | nsTArray_Impl<T>::AppendElement<T> | TakeFrameRequestCallbacksFrom] [@ nsCOMPtr<T>::nsCOMPtr | TakeFrameRequestCallbacksFrom ] [@ nsTArray_Impl<T>::AppendElement<T> | TakeF…
Comment 24•7 years ago
|
||
Note that the additional signature from bug 1349110 is all EXEC crashes.... :-(
Comment 25•7 years ago
|
||
Ok, found something that might be a lead. After sampling some crash reports it turns out that everything seems to come from ticking throttled rAFs.
Comment 26•7 years ago
|
||
(In reply to Andreas Farre [:farre] from comment #25) > Ok, found something that might be a lead. After sampling some crash reports > it turns out that everything seems to come from ticking throttled rAFs. Well, I found some coming from the vsync path as well, but sofar only on linux.
Comment 27•7 years ago
|
||
I've been banging my head against this for a couple of days now, and I cannot for the life of me see how this happens. Boris comment about invariants holds up to my scrutiny, so it feels like something else is going on. I've attached the patch that turns the arrays into strong references, and it is very small and the period where we'd leak a document is also limited, since actually running a rAF callback will remove the document. Any thoughts about this?
![]() |
Assignee | |
Comment 28•7 years ago
|
||
> and the period where we'd leak a document is also limited, since actually running
> a rAF callback will remove the document
Yes, but what guarantees that we'll run an rAF callback? What is the scope of a single refresh driver, especially with e10s? I'd think it's a single tab, which means in a background tab with e10s we never run rAF on that refresh driver at all, unless I'm missing something.
Flags: needinfo?(bzbarsky)
Updated•7 years ago
|
Flags: needinfo?(continuation)
![]() |
Assignee | |
Comment 29•7 years ago
|
||
So I just went through the invariants bits again, and here are some variously-far-fetched hypotheses for what might be happening: 1) Maybe we're unlinking the document while it's still registered with the refresh driver. This will clear mFrameRequestCallbacks, but not unregister. After that nothing will unregister, because it will see that we have no callbacks anyway, so are not registered. This seems extremely unlikely to me, because presshell holds a strong ref to the doc, and it's _not_ cycle-collected, so I can't see any reason the document could get unlinked while the presshell is still alive. 2) Maybe the refresh driver is not correctly removing the doc from its lists after calling TakeFrameRequestCallbacksFrom. I checked that just now, and it looks ok. 3) Maybe something under nsDocument::CreateShell causes rescheduling after we set mPresShell but before we call MaybeRescheduleAnimationFrameNotifications so we reschedule the same document twice. I don't see how any of those things would, but maybe. 4) Maybe the value of mPresShell->GetPresContext() or mPresShell->GetPresContext()->RefreshDriver() can change somehow during a presshell's lifetime. I don't _think_ it can, but haven't managed to prove it yet. 5) Maybe something in SetScriptGlobalObject runs between when we revoke/reschedule stuff and when mScriptGlobalObject changes. This is actually not seeming implausible. In particular, consider the case when SetScriptGlobalObject(nullptr) is called. We will call RevokeAnimationFrameNotifications(). Then if we have a nonzero mOnloadBlockCount, we will RemoveRequest(mOnloadBlocker). This will run our load event, which can call requestAnimationFrame. At this point mScriptGlobalObject is not null yet, so that will schedule us. #5 might be the most likely problem here, actually.... I don't know what we could do about #4 offhand, but for the other problems, and especially for #5... let me try something here.
![]() |
Assignee | |
Comment 30•7 years ago
|
||
MozReview-Commit-ID: 3FLVka93lCq
Attachment #8851152 -
Flags: review?(afarre)
![]() |
Assignee | |
Updated•7 years ago
|
Assignee: afarre → bzbarsky
Status: NEW → ASSIGNED
![]() |
Assignee | |
Comment 31•7 years ago
|
||
MozReview-Commit-ID: 46oVKKbCLbn
Attachment #8851153 -
Flags: review?(afarre)
![]() |
Assignee | |
Comment 32•7 years ago
|
||
Try run (without mention of this bug#, and with the patch hidden in the try-syntax changeset, on top of some other stuff I wanted to test anyway), looks greenish.
Comment 33•7 years ago
|
||
[Tracking Requested - why for this release]: Several of the signatures listed here affect 53 and 52.0.1/ESR, and we're seeing upwards of 400 crashes per day. Note also that bug 1230817 covers almost the same set of signatures, and is open. It was filed a year ago... Al, should we dup and close it? Let it sit idle to avoid suggesting attention (at least until this fix lands/ships)?
status-firefox52:
--- → affected
status-firefox53:
--- → affected
status-firefox-esr52:
--- → unaffected
tracking-firefox53:
--- → ?
Flags: needinfo?(abillings)
Comment 34•7 years ago
|
||
the esr status flag was set by accident i think
Updated•7 years ago
|
Attachment #8851152 -
Flags: review?(afarre) → review+
Updated•7 years ago
|
Attachment #8851153 -
Flags: review?(afarre) → review+
![]() |
Assignee | |
Comment 35•7 years ago
|
||
Comment on attachment 8851153 [details] [diff] [review] part 2. Simplify the management of whether our document's frame request callbacks are scheduled [Security approval request comment] How easily could an exploit be constructed based on the patch? It'll take some work. I haven't managed to get a crash out of it yet, though I haven't tried too hard. You have to figure out where the actual failure is (which we don't know yet, though comment 29 item 5 is a likely option). If that's the one, you have to navigate away from a page in such a way that the navigation completes while there is a pending requestAnimationFrame _and_ before the load event for the page. I haev some ideas for how to do that, with a bit of work. Then you have to ensure the document gets gced before the next refresh driver tick. Again, probably doable. Then you have to figure out how to make use of your dead object. Importantly, this is not the first time we're changing this code in a security bug in the last few months. :( Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? To some extent yes, in that it's clear it's to do with rAF callback scheduling. But not what the problem with that scheduling is, no. Which older supported branches are affected by this flaw? All of them, pretty much. If not all supported branches, which bug introduced the flaw? Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Not yet. Backporting to branches before 54 will require a bit of work because those branches don't have bug 1340086 fixed. How likely is this patch to cause regressions; how much testing does it need? I think this is fairly safe in terms of regression risk.
Attachment #8851153 -
Flags: sec-approval?
Comment 36•7 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #33) > Note also that bug 1230817 covers almost the same set of signatures, and is > open. It was filed a year ago... Al, should we dup and close it? Let it > sit idle to avoid suggesting attention (at least until this fix lands/ships)? I'd wait until we ship the fix and then dupe it to this bug. Are we wanting to backport this to 53 and ESR52 or ESR45? I'm giving sec-approval+ for trunk. We should at least take this there and Aurora.
status-firefox-esr45:
--- → ?
tracking-firefox54:
--- → +
tracking-firefox55:
--- → +
tracking-firefox-esr52:
--- → ?
Flags: needinfo?(abillings)
Updated•7 years ago
|
Attachment #8851153 -
Flags: sec-approval? → sec-approval+
![]() |
Assignee | |
Comment 37•7 years ago
|
||
> Are we wanting to backport this to 53 and ESR52 or ESR45?
It's probably a good idea, yes. I'll work on writing a backported patch.
![]() |
Assignee | |
Updated•7 years ago
|
Attachment #8850916 -
Flags: feedback?
![]() |
Assignee | |
Comment 38•7 years ago
|
||
Comment on attachment 8851152 [details] [diff] [review] part 1. Move some one-bit booleans in nsIDocument next to all the other one-bit booleans Approval Request Comment [Feature/Bug causing the regression]: Probably requestAnimationFrame when we initially implemented it. [User impact if declined]: Random crashes, probably exploitable if one tries hard enough. [Is this code covered by automated tests?]: To some extent. Clearly not for the crashing scenario. [Has the fix been verified in Nightly?]: Not yet. [Needs manual test from QE? If yes, steps to reproduce]: We don't know; that's the problem. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: Somewhat. [Why is the change risky/not risky?]: It's refactoring some code and changing the flow of logic that is not very clear. That said, I think this is something that is safe enough to backport and the regression risk is not too bad. [String changes made/needed]: None.
Attachment #8851152 -
Flags: approval-mozilla-aurora?
![]() |
Assignee | |
Comment 39•7 years ago
|
||
Comment on attachment 8851153 [details] [diff] [review] part 2. Simplify the management of whether our document's frame request callbacks are scheduled This and the previous change applies as-is on Aurora.
Attachment #8851153 -
Flags: approval-mozilla-aurora?
![]() |
Assignee | |
Comment 40•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/66ea2aba89b4955f82161caeb83d9354bf4b9982 https://hg.mozilla.org/integration/mozilla-inbound/rev/cc73d91c9880f819e60a1dc536d046ed1c037e1c
![]() |
Assignee | |
Comment 41•7 years ago
|
||
Andreas, the main changes here are the IsAnimationsPaused() check in UpdateFrameRequestCallbackSchedulingState and the corresponding changes to SuppressEventHandling/ResumeAnimations. But there were various other merge conflicts, so please look this over somewhat carefully. My initial merge lost the UpdateFrameRequestCallbackSchedulingState() call in SuppressEventHandling, for example...
Attachment #8851744 -
Flags: review?(afarre)
![]() |
Assignee | |
Comment 42•7 years ago
|
||
Attachment #8851750 -
Flags: review?(afarre)
![]() |
Assignee | |
Comment 43•7 years ago
|
||
Attachment #8851754 -
Flags: review?(afarre)
![]() |
Assignee | |
Updated•7 years ago
|
Attachment #8851744 -
Flags: approval-mozilla-beta?
![]() |
Assignee | |
Comment 44•7 years ago
|
||
Comment on attachment 8851750 [details] [diff] [review] Rollup backport for esr52; this had various merge conflicts (even starting with the 53 patch as a base), but not substantive changes.. [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: It's sec-crit. User impact if declined: The crashes will continue until morale improves. Then again, they may even with this patch. Fix Landed on Version: 55. Risk to taking this patch (and alternatives if risky): See comment 38. Risk is moderate, but I'm not sure there are less-risky options. String or UUID changes made by this patch: None. See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8851750 -
Flags: approval-mozilla-esr52?
![]() |
Assignee | |
Updated•7 years ago
|
Attachment #8851754 -
Flags: approval-mozilla-esr45?
Updated•7 years ago
|
Attachment #8851152 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•7 years ago
|
Attachment #8851153 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•7 years ago
|
Attachment #8851744 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/mozilla-central/rev/66ea2aba89b4 https://hg.mozilla.org/mozilla-central/rev/cc73d91c9880
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 46•7 years ago
|
||
uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/99da1ab37c37 https://hg.mozilla.org/releases/mozilla-aurora/rev/ae2b433951e6 NOTE: This can't be uplifted to Beta yet despite the approval since review on the rebased patch is still pending.
![]() |
Assignee | |
Comment 47•7 years ago
|
||
Attachment #8851838 -
Flags: review?(afarre)
![]() |
Assignee | |
Updated•7 years ago
|
Attachment #8851750 -
Attachment is obsolete: true
Attachment #8851750 -
Flags: review?(afarre)
Attachment #8851750 -
Flags: approval-mozilla-esr52?
![]() |
Assignee | |
Comment 48•7 years ago
|
||
Attachment #8851839 -
Flags: review?(afarre)
![]() |
Assignee | |
Updated•7 years ago
|
Attachment #8851754 -
Attachment is obsolete: true
Attachment #8851754 -
Flags: review?(afarre)
Attachment #8851754 -
Flags: approval-mozilla-esr45?
![]() |
Assignee | |
Updated•7 years ago
|
Attachment #8851838 -
Flags: approval-mozilla-esr52?
![]() |
Assignee | |
Updated•7 years ago
|
Attachment #8851839 -
Flags: approval-mozilla-esr45?
Updated•7 years ago
|
Attachment #8851744 -
Flags: review?(afarre) → review+
Updated•7 years ago
|
Attachment #8851838 -
Flags: review?(afarre) → review+
Updated•7 years ago
|
Attachment #8851839 -
Flags: review?(afarre) → review+
Comment 49•7 years ago
|
||
uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/873e61601178
Updated•7 years ago
|
Group: dom-core-security → core-security-release
Comment 50•7 years ago
|
||
Comment on attachment 8851839 [details] [diff] [review] ESR45 patch merged a bit better sec-critical fix for esr45
Attachment #8851839 -
Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
Comment 51•7 years ago
|
||
Comment on attachment 8851838 [details] [diff] [review] ESR52 patch merged a bit better sec-critical fix for esr52
Attachment #8851838 -
Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Comment 52•7 years ago
|
||
uplift |
https://hg.mozilla.org/releases/mozilla-esr52/rev/e469af8e9ccc https://hg.mozilla.org/releases/mozilla-esr45/rev/609145968cfe
Updated•7 years ago
|
![]() |
Assignee | |
Comment 54•7 years ago
|
||
How do things look here since the checkins? Or do we not have enough data yet?
Flags: needinfo?(rjesup)
Comment 55•7 years ago
|
||
No crashes on any of the signatures with a build date after 2017-03-29-00:00:00 :-)
Flags: needinfo?(rjesup)
Updated•7 years ago
|
Whiteboard: [adv-main53+][adv-esr52.1+][adv-esr48.9+]
Updated•7 years ago
|
Whiteboard: [adv-main53+][adv-esr52.1+][adv-esr48.9+] → [adv-main53+][adv-esr52.1+][adv-esr45.9+]
Updated•7 years ago
|
Flags: qe-verify-
Whiteboard: [adv-main53+][adv-esr52.1+][adv-esr45.9+] → [adv-main53+][adv-esr52.1+][adv-esr45.9+][post-critsmash-triage]
Updated•7 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•