Open Bug 1592364 Opened 5 years ago Updated 1 year ago

High CPU usage, possibly linked to animated PNGs (FF 70, FF Trunk, Linux)

Categories

(Core :: Graphics: ImageLib, defect, P3)

defect

Tracking

()

UNCONFIRMED

People

(Reporter: carls, Unassigned)

Details

Attachments

(1 file)

666.62 KB, image/png
Details

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0

Steps to reproduce:

Go to https://datagubbe.se/ffox/ (or any site containing one or more animated PNGs)

Profiling data (both from FF Trunk):
https://perfht.ml/34bnkAj
https://perfht.ml/2BRGMG2

System is Intel Celeron N4100 / 1.1 GHz with 8 GB RAM, running Linux kernel 5.0.0-32-generic (i915.enable_fbc=0) and X.Org X Server 1.20.4

Actual results:

CPU usage rises to around 50% when idling, even more on complex sites.

Reproduced in FF 70 and FF Nightly

Expected results:

Lower CPU usage

Select ImgDecoder in Threads, and profile again.

(In reply to gwarser from comment #1)

Select ImgDecoder in Threads, and profile again.

New profile data with ImgDecoder selected:
https://perfht.ml/2q05e5A

I may be wrong, but it looks like you are decoding AV1 video (AV1 video based image, AVIF?) not PNG. If it's true, then high cpu usage is rather expected.

(In reply to gwarser from comment #3)

I may be wrong, but it looks like you are decoding AV1 video (AV1 video based image, AVIF?) not PNG. If it's true, then high cpu usage is rather expected.

A sample file (the one used for profiling) is available here: https://datagubbe.se/ffox/anim.png
It's entirely possible I'm missing something, but Firefox identifies it as a PNG file when right-clicking and selecting "View Image Info".
The file header also looks like PNG/APNG:

carls@pixel:~/tmp$ xxd anim.png | head
00000000: 8950 4e47 0d0a 1a0a 0000 000d 4948 4452  .PNG........IHDR
00000010: 0000 0280 0000 0280 0802 0000 0083 af5e  ...............^
00000020: 7400 0000 0861 6354 4c00 0000 2c00 0000  t....acTL...,...
00000030: 008d 7c02 1500 0000 1a66 6354 4c00 0000  ..|......fcTL...
00000040: 0000 0002 8000 0002 8000 0000 0000 0000  ................
00000050: 0000 0500 6400 001a 5750 8e00 0000 0674  ....d...WP.....t
00000060: 524e 5300 0000 0000 4018 7a46 0100 015e  [email protected]...^
00000070: 8549 4441 5478 daec bdf9 7323 4972 269a  .IDATx....s#Ir&.
00000080: 1179 2171 1fbc af2a d6d5 dd35 ddd3 3d9a  .y!q...*...5..=.
00000090: d168 7647 d28c 343a d66c 65fb 6fbe 9f65  .hvG..4:.le.o..e

Sorry, I'm no Firefox dev. I'm just trying to help, but I don't know what is happening here. Your profiles from comment 0 do not have recorded anything using noticeable cpu time and not trace of any PNG decoding. In profile from comment 2 still nothing which looks like PNG decoder, but instead few aom_iwht4x4_16_add_sse2 calls which looks to be part the code from AV1 decoder (inverse Walsh Hadamard transform?).

My profiling: https://perfht.ml/36c4b3b PNG decoder uses ~20% of time, but it's on ancient q6600 cpu.

The profile in comment 5 has proper symbols. For some reason the other profiles don't seem to have.

(In reply to Olli Pettay [:smaug] from comment #6)

The profile in comment 5 has proper symbols. For some reason the other profiles don't seem to have.

I'm following the instructions here: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem
I'm using firefox-trunk from ppa:ubuntu-mozilla-daily/ppa

It's quite possible I'm missing something or doing something wrong, any pointers would be appreciated.

I've made yet another profile and tried enabling as much as possible in the profiler settings: https://perfht.ml/2N4scS1

Carl,

Would you mind doing us another favor? Would you mind comparing the performance of Firefox trunk against a) older version(s) of Firefox and b) Chrome on the same hardware.

We have tried to reproduce this bug and cannot seem to do it. We are unsure whether this is a regression and would really appreciate your help.

Thanks for taking the time to report all this information. I hope that we can figure out the root cause.

Will

Flags: needinfo?(carls)

(In reply to Will Hawkins from comment #9)

Carl,

Would you mind doing us another favor? Would you mind comparing the performance of Firefox trunk against a) older version(s) of Firefox and b) Chrome on the same hardware.

We have tried to reproduce this bug and cannot seem to do it. We are unsure whether this is a regression and would really appreciate your help.

Thanks for taking the time to report all this information. I hope that we can figure out the root cause.

Will

I began noticing frequent spikes in CPU temperature after I upgraded from Firefox 69 to Firefox 70, and pinpointed it to Firefox by using htop. Thus I can confirm the behavior is the same in Firefox 70 on my computer, which is the version I normally use for day-to-day browsing.

Chromium idles on about 14% CPU usage on my PoC page ( https://datagubbe.se/ffox/ ), which is less than half that of Firefox.

I'm providing kernel and XOrg info above because I believe my kernel was updated at the same time and thus I can't rule out that it's got something to do with that. Perhaps there are known issues with Firefox in combination with certain kernel versions.

And, I should be thanking you all for patiently helping me out with this bug report and of course for creating a great browser with excellent privacy features.

Flags: needinfo?(carls)

FWIW I am affected by this behavior.

with firefox 70 open (no pages open - default "Firefox Home" landing page)

(p374.ob) [130] $ uname -a
Linux lh 4.15.0-66-generic #75-Ubuntu SMP Tue Oct 1 05:24:09 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
(p374.ob) [131] $ firefox --version
Mozilla Firefox 70.0
(p374.ob) [132] $ uptime
 07:55:08 up 14 days, 12:59,  1 user,  load average: 2.20, 1.83, 1.67

upon closing firefox and waiting a bit...

(p374.ob) [139] $ uptime
 07:58:07 up 14 days, 13:02,  1 user,  load average: 0.13, 1.04, 1.39

and now its behaving normally...

one page open (to make this comment)

(p374.ob) [146] $ uptime
 08:04:51 up 14 days, 13:09,  1 user,  load average: 0.25, 0.79, 1.20

I'm willing to help troubleshoot but request that any data publicly viewable is anonymized...

Carl, any chance you could try to find regression range - assuming you can reproduce the issue easily?
https://mozilla.github.io/mozregression/ can be useful.

I can definitely see Firefox taking more CPU time on https://datagubbe.se/ffox/ than Chrome.

Component: Performance → ImageLib
Flags: needinfo?(carls)

(In reply to Olli Pettay [:smaug] from comment #13)

Carl, any chance you could try to find regression range - assuming you can reproduce the issue easily?
https://mozilla.github.io/mozregression/ can be useful.

I can definitely see Firefox taking more CPU time on https://datagubbe.se/ffox/ than Chrome.

I will look into how to use mozregression as soon as I can find the time (probably this weekend), in the meantime I can say with confidence that the issue was not present in Firefox v68, because I was using that version since I bought my current computer this summer up until v69 was released.

There are a few bugs which have impacted APNG performance in the last few years.

  • < Firefox 62 - Decodes APNGs once, higher memory footprint.
  • Firefox 62 - Bug 1460258 discards and redecodes frames as necessary for large APNGs (if total number of frames exceed 20MB by default). This should have improved the APNG decode/blend speed as well although it would do it more often.
  • Firefox 65 - Bug 1465619 makes us reuse memory buffers, avoid the allocation/free cost, as well as reuse the contents of the buffer.
  • Firefox 69 - Bug 1255106 had some color management changes that should not have had a major impact on APNGs, but maybe is relevant given the timeline.
  • Firefox 71 - Bug 1551088 should reduce PNG decode times (6-7% on average according to telemetry) by improving RGB expansion to RGBA (if opaque) and alpha premultiplication (if transparent).

Here we have a 640x640 sized image, which has a 1.56MB footprint per frame. From the profile traces it is clear that we exceed the memory threshold and opt to redecode. As far as I understand it, Chrome makes the same decision, and ought to redecode as well. You can try changing image.animated.decode-on-demand.threshold-kb to be much larger to not discard/redecode the APNG and see what impact that has on performance.

I wasn't able to reproduce similar performance on an Intel Core i7-4900MQ.

Turns out using mozregression was refreshingly easy. :-)
Embarrassingly enough for me, I could actually reproduce the behavior in much earlier nightly builds as well (including 68.0a1). CPU usage looks to be the same.
Increasing the decoder threshold to ~2 megabytes brought CPU usage down to around 10% on the test page.

The spikes in CPU temperature might be related to shorter polling intervals or something else that has changed in my system when applying general upgrades.

Sorry for causing so much ado about nothing.

Flags: needinfo?(carls)

Sorry, that's supposed to be ~200 megabytes. :-)

Priority: -- → P3
Attached image cpu

It happens to me also at least 5-6 times a day, although I have only some Atlassian pages opened.

Do you still experience this problem?

Flags: needinfo?(tau_xxx)
Flags: needinfo?(carls)

(In reply to Wayne Mery (:wsmwk) from comment #19)

Do you still experience this problem?

I don't experience it anymore.

Flags: needinfo?(tau_xxx)
Severity: normal → S3
Severity: normal → S3

Clear a needinfo that is pending on an inactive user.

Inactive users most likely will not respond; if the missing information is essential and cannot be collected another way, the bug maybe should be closed as INCOMPLETE.

For more information, please visit auto_nag documentation.

Flags: needinfo?(carls)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: