Closed Bug 1861533 Opened 8 months ago Closed 4 months ago

Implement more CacheIR ops in PBL

Categories

(Core :: JavaScript Engine, task, P3)

task

Tracking

()

RESOLVED FIXED
125 Branch
Tracking Status
firefox125 --- fixed

People

(Reporter: jsharp, Assigned: jsharp)

Details

Attachments

(10 files)

I've been working to extend cfallin's "portable baseline" (PBL) tier with more CacheIR op fast-paths. I'm opening this issue to attach a Phabricator patch stack with the current state of that work.

If you've declared that a value doesn't actually need to be a GC root,
then you're opting out of the usual safety guarantees and might as well
allow constructing a Handle to refer to that value too.

The baseline tier sometimes checks whether any of a bit-set of flags are
present on an ArgumentsObject, so PBL needs the ability to do the same.

As a bonus, factoring this pattern to a separate function allows
simplifying the implementation of the existing getters for flags.

Depends on D192030

The ObjectElements header has a "length" field which is mostly only
meaningful for ArrayObject subclasses of NativeObject. However, this
field is read as a hint in NativeObject::growElements, passed to
goodElementsAllocationAmount when deciding how much memory to allocate,
even if the object is not an array.

The baseline tier sometimes increments this field in the CacheIR
StoreDenseElementHole op, even if the object is not an array. So cfallin
and I want to ensure that PBL matches that behavior.

To do so, we need public access to this field, so this commit adds a
public getter and setter.

Depends on D192031

Many CacheIR ops are only ever followed by a ReturnFromIC. Rather than
branching through the dispatch table just to get to a return, we can
package up this special case to return directly.

Depends on D192032

The most important of these cleanups is a bug-fix for
GuardDynamicSlotIsSpecificObject, which was interpreting its "slot"
parameter as a byte-offset rather than as an index into the slots array.

Other changes just make the boilerplate for reading registers and stub
fields more consistent; add comments for incomplete implementations
relative to the baseline tier; or add more helpful tracing.

Depends on D192033

Depends on D192034

Assignee: nobody → jsharp
Attachment #9360532 - Attachment description: WIP: Bug 1861533: Allow using fake-rooted values as handles, r=jandem → Bug 1861533: Allow using fake-rooted values as handles, r=jandem
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #9360533 - Attachment description: WIP: Bug 1861533: Allow checking multiple ArgumentsObject flags, r=jandem → Bug 1861533: Allow checking multiple ArgumentsObject flags, r=jandem
Attachment #9360534 - Attachment description: WIP: Bug 1861533: Allow access to NativeObject "length", r=jandem → Bug 1861533: Allow access to NativeObject "length", r=jandem
Attachment #9360535 - Attachment description: WIP: Bug 1861533: Specialize PREDICT_NEXT for ReturnFromIC, r=jandem → Bug 1861533: Specialize PREDICT_NEXT for ReturnFromIC, r=jandem
Attachment #9360536 - Attachment description: WIP: Bug 1861533: Cleanups for existing PBL CacheIR ops, r=jandem → Bug 1861533: Cleanups for existing PBL CacheIR ops, r=jandem
Attachment #9360537 - Attachment description: WIP: Bug 1861533: Implement more CacheIR ops in PBL, r=jandem → Bug 1861533: Implement more CacheIR ops in PBL, r=jandem

Here's a try-run for these patches: link

One motivation for this work is to get more performance improvements out of PBL. This tier can give correct results even if all CacheIR ops immediately return NextIC, which causes it to eventually run the non-IC fallback for each bytecode op. However, the more of these ops we implement, the better performance should get, ideally. So here's a rough benchmark of speed-up ratios on Octane v2, before and after the final patch of this stack:

1.22747x RayTrace
1.17575x EarleyBoyer
1.10825x Mandreel
1.07161x Gameboy
1.06072x PdfJS
1.02652x RegExp
1.00976x Splay
1.00975x CodeLoad
1.00685x NavierStokes
0.995134x Crypto
0.975806x DeltaBlue
0.965409x Box2D
0.946667x Richards

The overall score computed by Octane shows a 1.04583x speedup. However this should all only be taken as a rough guide to performance impact as I've seen significant variability in trying to measure this benchmark suite.

Even if we assume these measurements are valid, I don't think it's worth paying attention to the benchmarks which show slowdowns until we have implemented all of the CacheIR ops that those benchmarks use. Implementing only part of an IC chain can lead to slowdowns as we evaluate the newly-implemented ops and then throw away the results due to encountering an unimplemented op, then redo the work in the fallback. If there are still slowdowns at that point, we should profile where the time is going; we may find that there are cases where interpreting the IC chain is slower than just using the full fallback, in which case the "hybrid ICs" logic in PBL may need tuning.

Severity: -- → N/A
Priority: -- → P3
Attachment #9360534 - Attachment description: Bug 1861533: Allow access to NativeObject "length", r=jandem → Bug 1861533: Ignore "length" for non-arrays, r=jandem
Attachment #9360532 - Attachment description: Bug 1861533: Allow using fake-rooted values as handles, r=jandem → Bug 1861533: Allow using fake-rooted values as handles, r=jonco

Apparently clang-format wants to apply diffs like this:

  • INT32_OP(BitXor, ^, {});
  • INT32_OP(BitXor, ^,
  •       {
    
  •       });
    

That's just excessive, and also for some reason wasn't required when pbl
landed about a month ago. I propose that we just ignore clang-format in
this case.

Depends on D192034

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:jsharp, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.

Flags: needinfo?(jsharp)
Flags: needinfo?(jdemooij)
Flags: needinfo?(jdemooij)

Sorry for the delay. I discovered that the jit-tests suite has been failing when PBL is built in debug mode. Chris didn't previously turn on debug builds in CI so nobody noticed.

I don't think these patches introduce any new test suite failures so I'd be happy to land them as-is, but Chris argued (and I don't blame him) that I should open a separate bug and land the fixes for the existing failures before landing these new patches. However I haven't had time to make much progress on investigating those failures; I have a fix for one issue but I don't know how many more there are.

So as far as I'm concerned, this patch stack could land as-is, but I would totally understand if you agree with Chris that we should set it aside until we can verify that debug-build jit-tests pass too.

Flags: needinfo?(jsharp)

The various compilers' implementations of emitCompareStringResult
check several fast cases first and only fall back to a general
comparison if none of those cases apply. The general comparisons include
debug asserts that the fast paths must have already been checked. So PBL
needs to check those fast paths too.

Depends on D192035

Depends on D195677

Depends on D195678

The fixes for PBL to pass the jit-test suite in debug builds turned out to be pretty minor so I've just added them to this patch stack, as well as a patch to run these tests in CI going forward. I don't know anything about Mozilla CI configuration so please double-check that I copy-pasted things correctly.

Could you also do a try run for me? I'm pretty confident in the correctness of these patches at this point but, you know, just in case.

(In reply to [email protected] from comment #15)

Could you also do a try run for me? I'm pretty confident in the correctness of these patches at this point but, you know, just in case.

Sure, here you go: https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&revision=1a183b6bf04922f656f07237c259dd685bff908f

Side note; probably worth asking for Level 1 SCM access at this point if you'll be working on maintaining PBL.

jsharp, let us know if/when this is ready to land and one of us can do it :)

Alright, I think it's ready to land! Thank you for your help getting it there. My first Mozilla contribution, how exciting :-)

I've taken over driving this patch-stack and did a rebase and fixed an issue with the added CI config; try builds here for everything but pbl-debug, and here for a rerun of the latter. :jsharp plans to update his patch-stack from this branch when able and then I'll Lando it when green. It seems there are some test timeouts which we'll take a look at.

Here is a green try-run of the pbl-debug CI job; "optimize": false had caused the timeouts. I think this is ready to land once Jamey is able to upload the above branch as updated patches.

Pushed by chris@cfallin.org:
https://hg.mozilla.org/integration/autoland/rev/99ba26092555
Allow using fake-rooted values as handles, r=jonco
https://hg.mozilla.org/integration/autoland/rev/6a531229c73c
Allow checking multiple ArgumentsObject flags, r=jandem
https://hg.mozilla.org/integration/autoland/rev/d942b1570d05
Ignore "length" for non-arrays, r=jandem
https://hg.mozilla.org/integration/autoland/rev/06febe799b24
Specialize PREDICT_NEXT for ReturnFromIC, r=jandem
https://hg.mozilla.org/integration/autoland/rev/844d439c9a8b
Cleanups for existing PBL CacheIR ops, r=jandem
https://hg.mozilla.org/integration/autoland/rev/71e5b4ccf4bd
Don't reformat short INT32_OP invocations, r=jandem
https://hg.mozilla.org/integration/autoland/rev/72d2892bd9bc
Implement more CacheIR ops in PBL, r=jandem
https://hg.mozilla.org/integration/autoland/rev/c053ea8e7a45
Fix debug asserts in PBL CompareStringResult, r=jandem
https://hg.mozilla.org/integration/autoland/rev/a1da44e0fc12
Fix PBL setting frame debuggee flag, r=jandem
https://hg.mozilla.org/integration/autoland/rev/8bfa623afd6f
Add PBL debug builds to CI, r=sfink,jmaher
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: