Closed Bug 1442804 Opened 6 years ago Closed 6 years ago

Heap write analysis is detecting but not reporting errors

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 - unaffected
firefox-esr60 - wontfix
firefox58 --- wontfix
firefox59 + wontfix
firefox60 + wontfix
firefox61 + fixed

People

(Reporter: sfink, Assigned: sfink)

References

Details

(Keywords: sec-audit, Whiteboard: [adv-main61-][post-critsmash-triage])

Attachments

(6 files)

The heap write hazard analysis job is running and currently detecting 43 errors, but is showing up green on treeherder. The status code appears to not be getting propagated properly.

That means it is probably reporting quite a few legitimate races in the current code.
Assignee: nobody → sphink
Most of them look sane fwiw.
The problem appears to already be in mozilla-release, where it reports 30 hazards.
So, just took a quick look:

 * Gecko_ContentList_AppendAll is mt-only.
 * I don't know what Gecko_GetNextStyleChild is complaining about, but nsCanvasFrame::AppendAnonymousContentTo seems thread-safe to me, assuming the argument is owned, which it is.
 * Gecko_GetLookAndFeelSystemColor: We explicitly call LookAndFeel::NativeInit on the main thread before calling into Servo to prevent this race:

https://searchfox.org/mozilla-central/rev/908e3662388a96673a0fc00b84c47f832a5bea7c/layout/style/ServoStyleSet.cpp#461

The rest of them look like from bug 1418161 which explicitly handles the race, or from bug 1410276, which given it runs during destruction of the string buffer I _think_ it's ok (since the thread freeing the last reference must be necessarily the only thread referencing the string buffer still for the code to be sound).
The reporting error was introduced in bug 1339989. I did 'exit_status = 1' in a shell script. Which means to run a command named 'exit_status' with two arguments, '=' and '1', rather than setting the variable exit_status as I intended. :(
Not sure who to pick for review, but it's trivial, and it can't land until we do something about the hazards anyway.
Attachment #8955721 - Flags: review?(ted)
I'm going to call _this_ bug "sec-audit" because it sounds like you're still working through the list. If some or all of those errors need to be fixed we might want to spawn new bugs for each instance just to keep it all straight and can rate the security severity of each as appropriate.

(In reply to Emilio Cobos Álvarez [:emilio] from comment #4)
> The rest of them look like [...] which explicitly handles the race,

Is there a way to annotate the code so the tool won't warn in these cases? Let's say we fix some but still end up with a report that has 28 "OK" errors every time it runs. We likely won't notice when 29 and 30 creep in there that might be real problems.
Keywords: sec-audit
If it ends up being easier to patch them all in this bug (because "no more errors" ensures we didn't forget one) then please clear sec-audit and replace it with the severity of the worst issue we're fixing. Probably sec-high for a csectype-race.
(In reply to Daniel Veditz [:dveditz] from comment #7)
> I'm going to call _this_ bug "sec-audit" because it sounds like you're still
> working through the list.

I think that's right. I attached a patch to fix the reporting problem, but that patch fixes a sec-audit problem, nothing more.

> If some or all of those errors need to be fixed we
> might want to spawn new bugs for each instance just to keep it all straight
> and can rate the security severity of each as appropriate.

Agreed.

> (In reply to Emilio Cobos Álvarez [:emilio] from comment #4)
> > The rest of them look like [...] which explicitly handles the race,
> 
> Is there a way to annotate the code so the tool won't warn in these cases?
> Let's say we fix some but still end up with a report that has 28 "OK" errors
> every time it runs. We likely won't notice when 29 and 30 creep in there
> that might be real problems.

Yes, and that's exactly what we did when this was reporting correctly. There are annotation mechanisms, but we need to look at each one to figure out where to apply the annotation. Most of them are obvious. We want to get this back to zero to make it easy to know what's new.

The way this is set up, we *could* allow 28 errors through, in which case it would only turn the job red if you introduced a 29th, but then it would be a pain to look through the log to identify which is the new one. So we prefer to keep the allowed count at zero. Which worked great until I broke the reporting and it started lying about being green.
> I don't know what Gecko_GetNextStyleChild is complaining about

Presumably the virtual call to AppendAnonymousContentTo.  I'm _guessing_ that the nsCanvasFrame version is just one possible target that was picked for that virtual call?  In general, we could be calling any frame's AppendAnonymousContentTo here.
Comment on attachment 8955721 [details] [diff] [review]
Fix setting exit_status

Review of attachment 8955721 [details] [diff] [review]:
-----------------------------------------------------------------

That sucks, I hate shell scripting. Have you ever run this script through shellcheck? Every time I do that to any script I've written it points out that like 90% of my script is wrong.
Attachment #8955721 - Flags: review?(ted) → review+
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #11)
> Comment on attachment 8955721 [details] [diff] [review]
> Fix setting exit_status
> 
> Review of attachment 8955721 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> That sucks, I hate shell scripting. Have you ever run this script through
> shellcheck? Every time I do that to any script I've written it points out
> that like 90% of my script is wrong.

I just started using shellcheck via flycheck, and it was correctly marking this line red. But I'm so used to seeing the green warnings that I didn't pay attention and figured it out the old way instead. :-(
I can fix the remaining hazards here.
Flags: needinfo?(bobbyholley)
Depends on: 1443227
Depends on: 1443233
I haven't really thought through this one. Is it really ok to whitelist atomics? I mean, they tend to be used when you know there are multiple threads around, and the order of operations doesn't matter, but it's not like it would be hard to create a race condition with atomics. Is this a good assumption or not?

I did this because my local run got a huge number of hazards from __atomic_compare_exchange_8 or something like that.
Attachment #8956498 - Flags: review?(bobbyholley)
I was originally planning to file separate bugs for all the problems found, but most of them are relatively minor annotation tweaks.

I'm sort of assuming that the mCanary field is intended to be racy. It gets set just before you free things, after all.
Attachment #8956499 - Flags: review?(bobbyholley)
Also, I'm r?bholley on everything, on the assumption that you'll distribute them out if needed.

If I understand the comment correctly, GetAutoArrayBuffer gives back a pointer that is another way of accessing 'this', and so should have the same safety. Uh... I don't actually remember which hazard this was for, unless it was for the mHdr one, but I have an additional fix for that coming up.
Attachment #8956500 - Flags: review?(bobbyholley)
Example hazard:

Error: Field write nsTArrayHeader.mCapacity
Location: _ZN13nsTArray_baseI27nsTArrayInfallibleAllocator25nsTArray_CopyWithMemutilsE14ShrinkCapacityEmm$void nsTArray_base<Alloc, Copy>::ShrinkCapacity(nsTArray_base<Alloc, Copy>::size_type, size_t) [with Alloc = nsTArrayInfallibleAllocator; Copy = nsTArray_CopyWithMemutils; nsTArray_base<Alloc, Copy>::size_type = long unsigned int; size_t = long unsigned int] @ obj-analyzed/dist/include/nsTArray-inl.h#241 ### SafeArguments: this
Stack Trace:
_ZN13nsTArray_ImplI6RefPtrI6nsAtomE27nsTArrayInfallibleAllocatorE7CompactEv$void nsTArray_Impl<E, Alloc>::Compact() [with E = RefPtr<nsAtom>; Alloc = nsTArrayInfallibleAllocator] @ xpcom/ds/nsTArray.h#2005 ### SafeArguments: this
_ZN13nsTArray_ImplI6RefPtrI6nsAtomE27nsTArrayInfallibleAllocatorE5ClearEv$void nsTArray_Impl<E, Alloc>::Clear() [with E = RefPtr<nsAtom>; Alloc = nsTArrayInfallibleAllocator] @ xpcom/ds/nsTArray.h#1785 ### SafeArguments: this
Gecko_nsStyleSVG_SetContextPropertiesLength @ layout/style/ServoBindings.cpp#2075 ### SafeArguments: aSvg

coming from the lines of code

  size_type size = sizeof(Header) + length * aElemSize;
  void* ptr = nsTArrayFallibleAllocator::Realloc(mHdr, size);
  if (!ptr) {
    return;
  }
  mHdr = static_cast<Header*>(ptr);
  mHdr->mCapacity = length;

The analysis thinks 'ptr' is safe because it comes directly from a realloc, but once it's stored into a member (mHdr) that becomes irrelevant. However, it seems like an nsTArray's header is owned by the nsTArray, so if the array itself is safe, then its header should be too. This patch allows mHdr to inherit its safety from 'this'.
Attachment #8956502 - Flags: review?(bobbyholley)
Comment on attachment 8956498 [details] [diff] [review]
heap write analysis: whitelist all atomics

Review of attachment 8956498 [details] [diff] [review]:
-----------------------------------------------------------------

Yeah, I think the reduced friction of people introducing new atomics without having to tweak the analysis is worth it.
Attachment #8956498 - Flags: review?(bobbyholley) → review+
Attachment #8956500 - Flags: review?(bobbyholley) → review+
Comment on attachment 8956499 [details] [diff] [review]
heap write analysis: explicitly whitelist stringbuffer canary field

Review of attachment 8956499 [details] [diff] [review]:
-----------------------------------------------------------------

It's not racey because it only gets written on the last release, when the caller has exclusive access.
Attachment #8956499 - Flags: review?(bobbyholley) → review+
Comment on attachment 8956502 [details] [diff] [review]
heap write analysis: nsTArray owns its header

Review of attachment 8956502 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for fixing all of this!
Attachment #8956502 - Flags: review?(bobbyholley) → review+
No longer depends on: 1443227
Flags: needinfo?(bobbyholley)
Group: core-security → layout-core-security
Keywords: leave-open
Steve, this bug is marked as tracking 60 & 61. As we're starting to get late in the cycle, is there more work planned before the trains start moving?
Flags: needinfo?(sphink)
Priority: -- → P2
(In reply to Ryan VanderMeulen [:RyanVM] from comment #23)
> Steve, this bug is marked as tracking 60 & 61. As we're starting to get late
> in the cycle, is there more work planned before the trains start moving?

There is certainly more work planned, but I don't expect to get to it soon, so we should probably fork bugs. One for making the job properly report errors, one (probably this one) for the fixes already made and landed, and at least one for further fixes (that are required before we can allow the job to report errors again.) But I'm not sure if that's the most convenient way for you tracking people, so I won't create them until you weigh in.
Flags: needinfo?(sphink) → needinfo?(ryanvm)
That all sounds good to me. Given where we are in the cycle, I like the idea of spinning off remaining work to new bugs which can be tracked independently of what's already been landed here.
Flags: needinfo?(ryanvm)
(In reply to Steve Fink [:sfink] [:s:] (PTO Apr9-12) from comment #24)
> (In reply to Ryan VanderMeulen [:RyanVM] from comment #23)
> > Steve, this bug is marked as tracking 60 & 61. As we're starting to get late
> > in the cycle, is there more work planned before the trains start moving?
> 
> There is certainly more work planned, but I don't expect to get to it soon,
> so we should probably fork bugs. One for making the job properly report
> errors, one (probably this one) for the fixes already made and landed, and
> at least one for further fixes (that are required before we can allow the
> job to report errors again.) But I'm not sure if that's the most convenient
> way for you tracking people, so I won't create them until you weigh in.

Wait, I'm confused. This is all fixed and landed on FF60, which is currently on beta, right? Is this about uplifting the patches to 59 or something? I don't think we need to do that...
Flags: needinfo?(sphink)
Per IRC discussion, this bug is fixed and further follow-ups will go into new bugs. It's too late for any Fx60 backports at this point, but is there still something we'd want to land on ESR60 for the 60.1 release shipping in June?
Status: NEW → RESOLVED
Closed: 6 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
I'd say wontfix it. Everything so far has been false positives.
Flags: needinfo?(sphink)
I filed bug 1459323 for the continuation.
Group: layout-core-security → core-security-release
Whiteboard: [adv-main61-]
Flags: qe-verify-
Whiteboard: [adv-main61-] → [adv-main61-][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: