gfx.webrender.debug.dl.dump-content causes weird crashes (memory corruption?)
Categories
(Core :: Graphics: WebRender, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox-esr91 | --- | disabled |
firefox93 | --- | disabled |
firefox94 | --- | disabled |
firefox95 | --- | fixed |
People
(Reporter: botond, Assigned: mikokm)
References
(Regression)
Details
(Keywords: regression)
Attachments
(2 files)
STR
In a debug build, run ./mach run --setpref gfx.webrender.debug.dl.dump-content=true https://bug1734007.bmoattachments.org/attachment.cgi?id=9244249
.
I don't think the issue is specific to this page, I've seen it on other pages as well.
Actual results
Firefox crashes with debug assertions or segmentation faults in unrelated code. I've attached an example backtrace, which is a segmentation fault in nsTArray::Length()
caused by the array's mHdr
being null. The calling code in this case is DOM / style related, but that varies.
I've also seen assertions in TimeStamp code and elsewhere.
The assertions / crashes are seemingly unrelated to the WebRender display list dumping, but invariably go away if I run with --setpref gfx.webrender.debug.dl.dump-content=false
. I suspect memory corruption.
Reporter | ||
Comment 2•3 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #1)
Can you try with an asan build?
I can Try with a capital T: https://treeherder.mozilla.org/jobs?repo=try&revision=915252df71b0645233ddf80cf77627110d2d474b
Reporter | ||
Comment 3•3 years ago
•
|
||
(In reply to Botond Ballo [:botond] from comment #2)
I can Try with a capital T: https://treeherder.mozilla.org/jobs?repo=try&revision=915252df71b0645233ddf80cf77627110d2d474b
Seeing some use-after-poisons.
Reporter | ||
Comment 4•3 years ago
|
||
Regression range from mozregression:
Looks like bug 1724344.
Comment 5•3 years ago
|
||
I can reproduce this locally. Interestingly, I can completely comment out the body of wr_dump_display_list
(and emit_display_list
) and it still occurs, but only with that debug flag enabled.
Comment 6•3 years ago
|
||
I think this is an existing corruption issue which may be made more visible by the patch above.
I was able to see ASAN failures / crashes with a build as far back as 2020 (only when the flag is enabled though):
commit 9010a032c1a74dc4a45e51fef0a30e8bb17139a2 (HEAD)
Author: Lee Salzman <[email protected]>
Date: Fri Oct 23 02:15:59 2020 +0000
Bug 1672898 - increase precision of SWGL's YUV composition stepping. r=mattwoodrow
In this case, the following partial stack trace from ASAN:
==650278==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61300088a4b2 at pc 0x55a273afce09 bp 0x7ffdacf93190 sp 0x7ffdacf92950
WRITE of size 185 at 0x61300088a4b2 thread T0 (Web Content)
#0 0x55a273afce08 in memset /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:780:3
#1 0x7f2aef9870ce in core::intrinsics::write_bytes::h6a1ee2ed9ab6412b /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/intrinsics.rs:2022:14
#2 0x7f2aef9871a8 in core::ptr::mut_ptr::_$LT$impl$u20$$BP$mut$u20$T$GT$::write_bytes::h7a4fee190bcc0ee4 /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/core/src/ptr/mut_ptr.rs:995:18
#3 0x7f2aef9881c6 in peek_poke::strip_red_zone::he77ef14b14274e9c /homecode/work/gecko2/gfx/wr/peek-poke/src/lib.rs:125:9
#4 0x7f2aef920424 in webrender_api::display_list::DisplayListBuilder::emit_display_list::hd5f5681e22279beb /homecode/work/gecko2/gfx/wr/webrender_api/src/display_list.rs:1084:9
#5 0x7f2aef901102 in wr_dump_display_list /homecode/work/gecko2/gfx/webrender_bindings/src/bindings.rs:3766:17
#6 0x7f2ae1e7d14f in mozilla::layers::WebRenderCommandBuilder::BuildWebRenderCommands(mozilla::wr::DisplayListBuilder&, mozilla::wr::IpcResourceUpdateQueue&, nsDisplayList*, nsDisplayListBuilder*, mozilla::layers::WebRenderScrollData&, WrFiltersHolder&&) /homecode/work/gecko2/gfx/layers/wr/WebRenderCommandBuilder.cpp:1577:20
#7 0x7f2ae1e95a91 in mozilla::layers::WebRenderLayerManager::EndTransactionWithoutLayer(nsDisplayList*, nsDisplayListBuilder*, WrFiltersHolder&&, mozilla::layers::WebRenderBackgroundData*) /homecode/work/gecko2/gfx/layers/wr/WebRenderLayerManager.cpp:349:30
#8 0x7f2ae9681168 in nsDisplayList::PaintRoot(nsDisplayListBuilder*, gfxContext*, unsigned int) /homecode/work/gecko2/layout/painting/nsDisplayList.cpp:2454:18
#9 0x7f2ae8d281e0 in nsLayoutUtils::PaintFrame(gfxContext*, nsIFrame*, nsRegion const&, unsigned int, nsDisplayListBuilderMode, nsLayoutUtils::PaintFrameFlags) /homecode/work/gecko2/layout/base/nsLayoutUtils.cpp:3465:13
#10 0x7f2ae8bfe9bd in mozilla::PresShell::Paint(nsView*, nsRegion const&, mozilla::PaintFlags) /homecode/work/gecko2/layout/base/PresShell.cpp:6361:5
#11 0x7f2ae8407880 in nsViewManager::ProcessPendingUpdatesPaint(nsIWidget*) /homecode/work/gecko2/view/nsViewManager.cpp:460:18
I can try digging back further tomorrow if no one else gets to it. Perhaps Jeff or Miko have an idea what the issue might be though? Many of the stack traces I saw while doing local builds were in the DL cache code, but that might be unrelated.
Updated•3 years ago
|
Comment 7•3 years ago
|
||
Set release status flags based on info from the regressing bug 1724344
Comment 8•3 years ago
|
||
I ran mozregression with asan build enabled, and I see ASAN violations when this flag is enabled back to July 2020, which seems to be the underlying cause.
mozregression was only able to narrow it down to this pushlog [1] due to missing old builds, and I'm struggling to build locally from that far back.
The commit that stands out to me (based on what I was seeing in the ASAN traces) from that pushlog is [2]. This seems likely to be the underlying cause, but I'm not sure how all the Gecko-side DL caching code works. Perhaps Miko could take a look?
[2] https://hg.mozilla.org/mozilla-central/rev/422a055dfbae293ad9277720e5d9a0901e5cdad5
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 9•3 years ago
|
||
Modifying the code to
pub fn emit_display_list<W>(
&mut self,
indent: usize,
range: Range<Option<usize>>,
mut sink: W,
) -> usize
where
W: Write
{
let mut temp = BuiltDisplayList::default();
ensure_red_zone::<di::DisplayItem>(&mut temp.payload.items_data);
ensure_red_zone::<di::DisplayItem>(&mut temp.payload.cache_data);
strip_red_zone::<di::DisplayItem>(&mut temp.payload.items_data);
strip_red_zone::<di::DisplayItem>(&mut temp.payload.cache_data);
0
}
is enough to trigger ASAN checks (loop would almost certainly work too).
I think the issue is that strip_red_zone()
is not doing the right thing.
Assignee | ||
Comment 10•3 years ago
|
||
Comment 11•3 years ago
|
||
Pushed by mikokm@gmail.com: https://hg.mozilla.org/integration/autoland/rev/d77b337b99da Remove bogus write_bytes in strip_red_zone r=jrmuizel
Comment 12•3 years ago
|
||
bugherder |
Updated•3 years ago
|
Comment 13•3 years ago
|
||
The patch landed in nightly and beta is affected.
:miko, is this bug important enough to require an uplift?
If not please set status_beta
to wontfix
.
For more information, please visit auto_nag documentation.
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Description
•