Heap write analysis is detecting but not reporting errors

firefox58

---

firefox59

firefox60

firefox61

fixed

People

(Reporter: sfink, Assigned: sfink)

References

Details

(Keywords: sec-audit, Whiteboard: [adv-main61-][post-critsmash-triage])

Attachments

(6 files)

Current set of 43 hazards 6 years ago Steve Fink [:sfink] [:s:] 51.20 KB, text/plain		Details
Fix setting exit_status 6 years ago Steve Fink [:sfink] [:s:] 1.51 KB, patch	ted : review+	Details \| Diff \| Splinter Review
heap write analysis: whitelist all atomics 6 years ago Steve Fink [:sfink] [:s:] 1.17 KB, patch	bholley : review+	Details \| Diff \| Splinter Review
heap write analysis: explicitly whitelist stringbuffer canary field 6 years ago Steve Fink [:sfink] [:s:] 1.09 KB, patch	bholley : review+	Details \| Diff \| Splinter Review
heap write analysis: whitelist GetAutoArrayBuffer, which returns a pointer into \|this\| 6 years ago Steve Fink [:sfink] [:s:] 2.25 KB, patch	bholley : review+	Details \| Diff \| Splinter Review
heap write analysis: nsTArray owns its header 6 years ago Steve Fink [:sfink] [:s:] 2.43 KB, patch	bholley : review+	Details \| Diff \| Splinter Review

Assignee

Description

•

6 years ago

The heap write hazard analysis job is running and currently detecting 43 errors, but is showing up green on treeherder. The status code appears to not be getting propagated properly.

That means it is probably reporting quite a few legitimate races in the current code.

Assignee

Comment 1

•

6 years ago

Attached file Current set of 43 hazards — Details

Emilio Cobos Álvarez (:emilio)

Assignee

Updated

•

6 years ago

Assignee: nobody → sphink

Comment 2

•

6 years ago

Most of them look sane fwiw.

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

6 years ago

status-firefox59: --- → ?

status-firefox60: --- → affected

tracking-firefox60: --- → +

Assignee

Comment 3

•

6 years ago

The problem appears to already be in mozilla-release, where it reports 30 hazards.

Liz Henry (:lizzard) (relman/hg->git project)

Assignee

Updated

•

6 years ago

status-firefox-esr52: --- → unaffected

tracking-firefox-esr52: --- → ?

Updated

•

6 years ago

status-firefox58: --- → wontfix

status-firefox59: ? → affected

tracking-firefox59: --- → +

tracking-firefox-esr52: ? → -

Emilio Cobos Álvarez (:emilio)

Comment 4

•

6 years ago

So, just took a quick look:

 * Gecko_ContentList_AppendAll is mt-only.
 * I don't know what Gecko_GetNextStyleChild is complaining about, but nsCanvasFrame::AppendAnonymousContentTo seems thread-safe to me, assuming the argument is owned, which it is.
 * Gecko_GetLookAndFeelSystemColor: We explicitly call LookAndFeel::NativeInit on the main thread before calling into Servo to prevent this race:

https://searchfox.org/mozilla-central/rev/908e3662388a96673a0fc00b84c47f832a5bea7c/layout/style/ServoStyleSet.cpp#461

The rest of them look like from bug 1418161 which explicitly handles the race, or from bug 1410276, which given it runs during destruction of the string buffer I _think_ it's ok (since the thread freeing the last reference must be necessarily the only thread referencing the string buffer still for the code to be sound).

Assignee

Comment 5

•

6 years ago

The reporting error was introduced in bug 1339989. I did 'exit_status = 1' in a shell script. Which means to run a command named 'exit_status' with two arguments, '=' and '1', rather than setting the variable exit_status as I intended. :(

Assignee

Comment 6

•

6 years ago

Attached patch Fix setting exit_status — Details — Splinter Review

Not sure who to pick for review, but it's trivial, and it can't land until we do something about the hazards anyway.

Attachment #8955721 - Flags: review?(ted)

Comment 7

•

6 years ago

I'm going to call _this_ bug "sec-audit" because it sounds like you're still working through the list. If some or all of those errors need to be fixed we might want to spawn new bugs for each instance just to keep it all straight and can rate the security severity of each as appropriate.

(In reply to Emilio Cobos Álvarez [:emilio] from comment #4)
> The rest of them look like [...] which explicitly handles the race,

Is there a way to annotate the code so the tool won't warn in these cases? Let's say we fix some but still end up with a report that has 28 "OK" errors every time it runs. We likely won't notice when 29 and 30 creep in there that might be real problems.

Keywords: sec-audit

Comment 8

•

6 years ago

If it ends up being easier to patch them all in this bug (because "no more errors" ensures we didn't forget one) then please clear sec-audit and replace it with the severity of the worst issue we're fixing. Probably sec-high for a csectype-race.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 9

•

6 years ago

(In reply to Daniel Veditz [:dveditz] from comment #7)
> I'm going to call _this_ bug "sec-audit" because it sounds like you're still
> working through the list.

I think that's right. I attached a patch to fix the reporting problem, but that patch fixes a sec-audit problem, nothing more.

> If some or all of those errors need to be fixed we
> might want to spawn new bugs for each instance just to keep it all straight
> and can rate the security severity of each as appropriate.

Agreed.

> (In reply to Emilio Cobos Álvarez [:emilio] from comment #4)
> > The rest of them look like [...] which explicitly handles the race,
> 
> Is there a way to annotate the code so the tool won't warn in these cases?
> Let's say we fix some but still end up with a report that has 28 "OK" errors
> every time it runs. We likely won't notice when 29 and 30 creep in there
> that might be real problems.

Yes, and that's exactly what we did when this was reporting correctly. There are annotation mechanisms, but we need to look at each one to figure out where to apply the annotation. Most of them are obvious. We want to get this back to zero to make it easy to know what's new.

The way this is set up, we *could* allow 28 errors through, in which case it would only turn the job red if you introduced a 29th, but then it would be a pain to look through the log to identify which is the new one. So we prefer to keep the allowed count at zero. Which worked great until I broke the reporting and it started lying about being green.

Comment 10

•

6 years ago

> I don't know what Gecko_GetNextStyleChild is complaining about

Presumably the virtual call to AppendAnonymousContentTo.  I'm _guessing_ that the nsCanvasFrame version is just one possible target that was picked for that virtual call?  In general, we could be calling any frame's AppendAnonymousContentTo here.

(not currently active) Ted Mielczarek

Comment 11

•

6 years ago

Comment on attachment 8955721 [details] [diff] [review]
Fix setting exit_status

Review of attachment 8955721 [details] [diff] [review]:
-----------------------------------------------------------------

That sucks, I hate shell scripting. Have you ever run this script through shellcheck? Every time I do that to any script I've written it points out that like 90% of my script is wrong.

Attachment #8955721 - Flags: review?(ted) → review+

Emilio Cobos Álvarez (:emilio)

Updated

•

6 years ago

Depends on: 1443101

Assignee

Comment 12

•

6 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #11)
> Comment on attachment 8955721 [details] [diff] [review]
> Fix setting exit_status
> 
> Review of attachment 8955721 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> That sucks, I hate shell scripting. Have you ever run this script through
> shellcheck? Every time I do that to any script I've written it points out
> that like 90% of my script is wrong.

I just started using shellcheck via flycheck, and it was correctly marking this line red. But I'm so used to seeing the green warnings that I didn't pay attention and figured it out the old way instead. :-(

Comment 13

•

6 years ago

I can fix the remaining hazards here.

Flags: needinfo?(bobbyholley)

Updated

•

6 years ago

Depends on: 1443227

Assignee

Updated

•

6 years ago

Depends on: 1443233

Assignee

Comment 14

•

6 years ago

Attached patch heap write analysis: whitelist all atomics — Details — Splinter Review

I haven't really thought through this one. Is it really ok to whitelist atomics? I mean, they tend to be used when you know there are multiple threads around, and the order of operations doesn't matter, but it's not like it would be hard to create a race condition with atomics. Is this a good assumption or not?

I did this because my local run got a huge number of hazards from __atomic_compare_exchange_8 or something like that.

Attachment #8956498 - Flags: review?(bobbyholley)

Assignee

Comment 15

•

6 years ago

Attached patch heap write analysis: explicitly whitelist stringbuffer canary field — Details — Splinter Review

I was originally planning to file separate bugs for all the problems found, but most of them are relatively minor annotation tweaks.

I'm sort of assuming that the mCanary field is intended to be racy. It gets set just before you free things, after all.

Attachment #8956499 - Flags: review?(bobbyholley)

Assignee

Comment 16

•

6 years ago

Attached patch heap write analysis: whitelist GetAutoArrayBuffer, which returns a pointer into |this| — Details — Splinter Review

Also, I'm r?bholley on everything, on the assumption that you'll distribute them out if needed.

If I understand the comment correctly, GetAutoArrayBuffer gives back a pointer that is another way of accessing 'this', and so should have the same safety. Uh... I don't actually remember which hazard this was for, unless it was for the mHdr one, but I have an additional fix for that coming up.

Attachment #8956500 - Flags: review?(bobbyholley)

Assignee

Comment 17

•

6 years ago

Attached patch heap write analysis: nsTArray owns its header — Details — Splinter Review

Example hazard:

Error: Field write nsTArrayHeader.mCapacity
Location: _ZN13nsTArray_baseI27nsTArrayInfallibleAllocator25nsTArray_CopyWithMemutilsE14ShrinkCapacityEmm$void nsTArray_base<Alloc, Copy>::ShrinkCapacity(nsTArray_base<Alloc, Copy>::size_type, size_t) [with Alloc = nsTArrayInfallibleAllocator; Copy = nsTArray_CopyWithMemutils; nsTArray_base<Alloc, Copy>::size_type = long unsigned int; size_t = long unsigned int] @ obj-analyzed/dist/include/nsTArray-inl.h#241 ### SafeArguments: this
Stack Trace:
_ZN13nsTArray_ImplI6RefPtrI6nsAtomE27nsTArrayInfallibleAllocatorE7CompactEv$void nsTArray_Impl<E, Alloc>::Compact() [with E = RefPtr<nsAtom>; Alloc = nsTArrayInfallibleAllocator] @ xpcom/ds/nsTArray.h#2005 ### SafeArguments: this
_ZN13nsTArray_ImplI6RefPtrI6nsAtomE27nsTArrayInfallibleAllocatorE5ClearEv$void nsTArray_Impl<E, Alloc>::Clear() [with E = RefPtr<nsAtom>; Alloc = nsTArrayInfallibleAllocator] @ xpcom/ds/nsTArray.h#1785 ### SafeArguments: this
Gecko_nsStyleSVG_SetContextPropertiesLength @ layout/style/ServoBindings.cpp#2075 ### SafeArguments: aSvg

coming from the lines of code

  size_type size = sizeof(Header) + length * aElemSize;
  void* ptr = nsTArrayFallibleAllocator::Realloc(mHdr, size);
  if (!ptr) {
    return;
  }
  mHdr = static_cast<Header*>(ptr);
  mHdr->mCapacity = length;

The analysis thinks 'ptr' is safe because it comes directly from a realloc, but once it's stored into a member (mHdr) that becomes irrelevant. However, it seems like an nsTArray's header is owned by the nsTArray, so if the array itself is safe, then its header should be too. This patch allows mHdr to inherit its safety from 'this'.

Attachment #8956502 - Flags: review?(bobbyholley)

Comment 18

•

6 years ago

Comment on attachment 8956498 [details] [diff] [review]
heap write analysis: whitelist all atomics

Review of attachment 8956498 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, I think the reduced friction of people introducing new atomics without having to tweak the analysis is worth it.

Attachment #8956498 - Flags: review?(bobbyholley) → review+

Updated

•

6 years ago

Attachment #8956500 - Flags: review?(bobbyholley) → review+

Comment 19

•

6 years ago

Comment on attachment 8956499 [details] [diff] [review]
heap write analysis: explicitly whitelist stringbuffer canary field

Review of attachment 8956499 [details] [diff] [review]:
-----------------------------------------------------------------

It's not racey because it only gets written on the last release, when the caller has exclusive access.

Attachment #8956499 - Flags: review?(bobbyholley) → review+

Comment 20

•

6 years ago

Comment on attachment 8956502 [details] [diff] [review]
heap write analysis: nsTArray owns its header

Review of attachment 8956502 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for fixing all of this!

Attachment #8956502 - Flags: review?(bobbyholley) → review+

Updated

•

6 years ago

No longer depends on: 1443227

Updated

•

6 years ago

Flags: needinfo?(bobbyholley)

Updated

•

6 years ago

status-firefox59: affected → wontfix

status-firefox61: --- → affected

tracking-firefox61: --- → +

Updated

•

6 years ago

Group: core-security → layout-core-security

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Assignee

Updated

•

6 years ago

Keywords: leave-open

Comment 22

•

6 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/3ae1535c7890b8d3e12e48a21ec3841ba8e09fc3
https://hg.mozilla.org/integration/mozilla-inbound/rev/277393616f91d3a05a264b318c3ffb3780c6d5e5
https://hg.mozilla.org/integration/mozilla-inbound/rev/228f3feb90616a96a70b3cd90217c37325e996b7
https://hg.mozilla.org/integration/mozilla-inbound/rev/c407e9656999eec919da180ba4ddd0b4e456971c
https://hg.mozilla.org/mozilla-central/rev/3ae1535c7890
https://hg.mozilla.org/mozilla-central/rev/277393616f91
https://hg.mozilla.org/mozilla-central/rev/228f3feb9061
https://hg.mozilla.org/mozilla-central/rev/c407e9656999

Comment 23

•

6 years ago

Steve, this bug is marked as tracking 60 & 61. As we're starting to get late in the cycle, is there more work planned before the trains start moving?

Flags: needinfo?(sphink)

Updated

•

6 years ago

Priority: -- → P2

Assignee

Comment 24

•

6 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #23)
> Steve, this bug is marked as tracking 60 & 61. As we're starting to get late
> in the cycle, is there more work planned before the trains start moving?

There is certainly more work planned, but I don't expect to get to it soon, so we should probably fork bugs. One for making the job properly report errors, one (probably this one) for the fixes already made and landed, and at least one for further fixes (that are required before we can allow the job to report errors again.) But I'm not sure if that's the most convenient way for you tracking people, so I won't create them until you weigh in.

Flags: needinfo?(sphink) → needinfo?(ryanvm)

Comment 25

•

6 years ago

That all sounds good to me. Given where we are in the cycle, I like the idea of spinning off remaining work to new bugs which can be tracked independently of what's already been landed here.

Flags: needinfo?(ryanvm)

Comment 26

•

6 years ago

(In reply to Steve Fink [:sfink] [:s:] (PTO Apr9-12) from comment #24)
> (In reply to Ryan VanderMeulen [:RyanVM] from comment #23)
> > Steve, this bug is marked as tracking 60 & 61. As we're starting to get late
> > in the cycle, is there more work planned before the trains start moving?
> 
> There is certainly more work planned, but I don't expect to get to it soon,
> so we should probably fork bugs. One for making the job properly report
> errors, one (probably this one) for the fixes already made and landed, and
> at least one for further fixes (that are required before we can allow the
> job to report errors again.) But I'm not sure if that's the most convenient
> way for you tracking people, so I won't create them until you weigh in.

Wait, I'm confused. This is all fixed and landed on FF60, which is currently on beta, right? Is this about uplifting the patches to 59 or something? I don't think we need to do that...

Flags: needinfo?(sphink)

Comment 27

•

6 years ago

Per IRC discussion, this bug is fixed and further follow-ups will go into new bugs. It's too late for any Fx60 backports at this point, but is there still something we'd want to land on ESR60 for the 60.1 release shipping in June?

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox60: affected → wontfix

status-firefox61: affected → fixed

status-firefox-esr60: --- → affected

tracking-firefox-esr60: --- → ?

Keywords: leave-open

Resolution: --- → FIXED

Target Milestone: --- → mozilla61

Assignee

Comment 28

•

6 years ago

I'd say wontfix it. Everything so far has been false positives.

Flags: needinfo?(sphink)

Updated

•

6 years ago

status-firefox-esr60: affected → wontfix

tracking-firefox-esr60: ? → -

Assignee

Comment 29

•

6 years ago

I filed bug 1459323 for the continuation.

Al Billings [:abillings - ex-MoCo]

Updated

•

6 years ago

Group: layout-core-security → core-security-release

Updated

•

6 years ago

Whiteboard: [adv-main61-]

Andrei Vaida [:avaida]

Updated

•

6 years ago

Flags: qe-verify-

Whiteboard: [adv-main61-] → [adv-main61-][post-critsmash-triage]