Crash in [@ nsIFrame::GetParent]
Categories
(Core :: Layout, defect)
Tracking
()
People
(Reporter: aryx, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: crash, Whiteboard: [no-nag])
Crash Data
[Tracking Requested - why for this release]:
This bug existed earlier but became more frequent with Firefox 110: 494 crashes so far, compared to 124 for the whole Firefox 109 cycle. 25% of the crashes are on Windows 10; many of the others are on Windows 8.1 and 7.
Crash report: https://crash-stats.mozilla.org/report/index/f0913143-3e67-4568-8ab3-97d6f0230307
Reason: EXCEPTION_ACCESS_VIOLATION_WRITE
Top 10 frames of crashing thread:
0 xul.dll nsIFrame::GetParent const layout/generic/nsIFrame.h:895
0 xul.dll mozilla::ViewportUtils::IsZoomedContentRoot layout/base/ViewportUtils.cpp:246
0 xul.dll nsIFrame::GetTransformMatrix::<lambda_0>::operator const layout/generic/nsIFrame.cpp:7417
0 xul.dll nsIFrame::GetTransformMatrix const layout/generic/nsIFrame.cpp:7423
1 xul.dll nsLayoutUtils::GetTransformToAncestor layout/base/nsLayoutUtils.cpp:2089
1 xul.dll TransformGfxRectToAncestor layout/base/nsLayoutUtils.cpp:2343
2 xul.dll nsLayoutUtils::TransformFrameRectToAncestor layout/base/nsLayoutUtils.cpp:2581
2 xul.dll nsLayoutUtils::TransformFrameRectToAncestor layout/base/nsLayoutUtils.h:888
2 xul.dll BoxToRect::AddBox layout/base/nsLayoutUtils.cpp:3692
3 xul.dll nsLayoutUtils::GetAllInFlowBoxes layout/base/nsLayoutUtils.cpp:3598
Comment 1•1 year ago (Reporter)
46% of the crashes of Firefox 110.0.1 are in the first 5 minutes, and 88% with Intel HD Graphics 5500.
Jeff, could you take a look at this signature which started to spike around March 1st?
Comment 2•1 year ago
I don't have any guesses as to what this would be. The graphics drivers aren't loaded into the parent process, so it doesn't seem likely that they are the cause.
I guess it makes sense to keep this in Layout because that's where the code is crashing.
Comment 3•1 year ago
92% of the crashes have CPU Info = family 6 model 61 stepping 4, so this seems like it could be a CPU bug.
A high percentage of the crashes are also on Windows 8.1. Perhaps there is a microcode update for this CPU that is only delivered on Windows 10 or newer?
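For reference, the "family 6 model 61 stepping 4" triple can be related to a raw CPUID signature. The following is a small Python sketch of the standard x86 decoding of the CPUID leaf 1 EAX value. The function name is hypothetical, and the example value 0x306D4 is derived from the triple above rather than taken from any crash report.

```python
from typing import Tuple


def decode_cpuid_signature(eax: int) -> Tuple[int, int, int]:
    """Decode a CPUID leaf 1 EAX value into (family, model, stepping)."""
    stepping = eax & 0xF
    model = (eax >> 4) & 0xF
    family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF
    # The extended model bits only apply to families 6 and 15.
    if family in (6, 15):
        model += ext_model << 4
    # The extended family bits only apply to family 15.
    if family == 15:
        family += ext_family
    return family, model, stepping
```

Under these rules, `decode_cpuid_signature(0x306D4)` yields family 6, model 61, stepping 4.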
Comment 4•1 year ago
The bug is marked as tracked for firefox111 (beta). We have limited time to fix this, the soft freeze is in a day. However, the bug still isn't assigned.
:fgriffith, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit auto_nag documentation.
Comment 5•1 year ago (Reporter)
Firefox 110.0.1 shipped at 100% on March 1 which aligns with the latest crash volume increase. There was already a crash frequency increase with 110.0 but to a lesser extent.
Comment 6•1 year ago
Emilio and I took a quick look. Will NI him to redirect.
Comment 7•1 year ago
Yeah, a lot of the crash reasons are also "impossible". E.g. nsIFrame::GetParent only reads memory, but we're crashing with a write-near-null error. Given Tim's observations in comment 3 and these impossible crash reasons, it seems this might not be actionable as a layout bug... Gabriele, is there any chance you could take a look and sanity-check us?
Comment 8•1 year ago
Let's first discount the crashes that are reads: the addresses are all over the place and several look like bit-flips, so we can probably chalk them up to flaky hardware.
As for the crashes that are writes there's an interesting pattern:
- All crashes are from a very specific version of Broadwell CPUs: family 6 model 61 stepping 4
- All crashes are running Windows 7 or Windows 8.1; this is important because Microsoft started shipping CPU microcode updates with Windows 10
- The highest microcode version in those crashes for that CPU is 0x19, while the highest version available is 0x2f, which confirms these CPUs did not receive microcode updates
- And finally the smoking gun: the crashing instruction is
mov rcx, qword [r13 + 0x30]
which is a read, not a write, so a write access violation at this instruction is impossible
This is most definitely a crash caused by a CPU bug.
CC'ing :afranchuk and :suhaib who are both working on different aspects of crash analysis. This is a very good example of a crash which we'd like to automatically identify as caused by hardware.
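The conditions listed in this comment can be expressed as a simple predicate over a crash report. The following is a minimal sketch in Python, not the actual crash-analysis tooling: the `CrashReport` fields and the `matches_broadwell_pattern` name are hypothetical, and only the constants (family 6, model 61, stepping 4, microcode 0x2f, Windows 7 and 8.1) come from the observations above.

```python
from dataclasses import dataclass

# Values taken from the observations in this comment.
AFFECTED_CPU = (6, 61, 4)        # family, model, stepping
LATEST_MICROCODE = 0x2F          # highest microcode version available
AFFECTED_OS = {"Windows 7", "Windows 8.1"}


@dataclass
class CrashReport:
    """Hypothetical, simplified view of a crash report."""
    reason: str
    cpu_family: int
    cpu_model: int
    cpu_stepping: int
    microcode: int
    os_name: str


def matches_broadwell_pattern(report: CrashReport) -> bool:
    """Return True if the report fits every condition listed above."""
    cpu = (report.cpu_family, report.cpu_model, report.cpu_stepping)
    return (
        report.reason == "EXCEPTION_ACCESS_VIOLATION_WRITE"
        and cpu == AFFECTED_CPU
        and report.os_name in AFFECTED_OS
        and report.microcode < LATEST_MICROCODE
    )
```

A report with microcode 0x19 on Windows 8.1 and the affected CPU would match, while the same report on Windows 10 would not.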
Comment 10•1 year ago
As mentioned by :gsvelto, this crash was caused by a hardware bug; dropping the topcrash keyword.
:gsvelto, will such cases be classified as hardware crashes in the new information that will soon be available in crash reports? If so, the bot could then ignore such crashes.
Comment 11•1 year ago
(In reply to Suhaib Mujahid [:suhaib] from comment #10)
:gsvelto, will such cases be classified as hardware crashes in the new information that will soon be available in crash reports? If so, the bot could then ignore such crashes.
Yes, that's the idea. Given that the crash reason and the crashing instruction can be proven to be incompatible, we should be able to catch it automatically in the stack walker.
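A minimal sketch of that kind of check, written in Python for brevity: it handles only two-operand Intel-syntax `mov` instructions like the one quoted in comment 8 and is not the stack walker's actual implementation. Both function names are hypothetical.

```python
def mov_memory_access(instruction: str) -> str:
    """Classify the memory access of a two-operand Intel-syntax `mov`.

    Returns "read", "write", or "none". Only `mov dst, src` is handled;
    a real checker would need a full disassembler.
    """
    mnemonic, _, operands = instruction.strip().partition(" ")
    if mnemonic.lower() != "mov":
        raise ValueError("only mov is supported in this sketch")
    dst, _, src = operands.partition(",")
    if "[" in dst:
        return "write"   # memory operand is the destination
    if "[" in src:
        return "read"    # memory operand is the source
    return "none"


def reason_is_impossible(reason: str, instruction: str) -> bool:
    """True if a write violation is reported for a mov that cannot write memory."""
    return (
        reason == "EXCEPTION_ACCESS_VIOLATION_WRITE"
        and mov_memory_access(instruction) != "write"
    )
```

For the instruction from comment 8, `mov rcx, qword [r13 + 0x30]`, the memory operand is the source, so a reported write violation would be flagged as incompatible.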
Comment 14•1 year ago
We still have a handful of crashes in 111 that match all the conditions in comment 8. It's possible that changes in the build made the bug less likely to be triggered, but didn't remove it entirely.
Comment 16•1 year ago
Closing this out as WORKSFORME as comment 8 pretty definitively pins this on a CPU microcode bug.
Comment 17•1 year ago
Let's keep this open as long as we're getting crash volume (but we can classify it as low-severity and not worry too much about it, given comment 8).
Also: fortunately the recent spike (20-70 crashes/day) in early March seems to have gone away; we're back down to single-digit crashes per day.