Open Bug 919486 Opened 11 years ago Updated 10 years ago

[tracking] Report signed integer overflows within our codebase

Categories

(mozilla.org :: Security Assurance: Applications, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

People

(Reporter: decoder, Unassigned)

References

(Depends on 1 open bug, )

Details

(Keywords: meta, sec-audit)

We have several occurrences of C/C++ signed integer overflows within our codebase. These can be problematic because signed integer overflow is undefined behavior (unless flags like -ftrapv are used, which we don't use) and the compiler can make assumptions about the overflow not happening when performing optimizations. This can easily lead to code that produces different (and incorrect) results depending on the optimization level, as well as possible security issues. One recent issue with GCC is described here in detail:

http://kqueue.org/blog/2013/09/17/cltq/

Furthermore, the CERT Secure Coding Standard explicitly recommends to avoid this behavior (INT32-C, CERT Secure Coding Standard, linked in the URL field).

Fortunately, we have utilities by now that make it easy for us to detect and report signed integer overflows at runtime, rather than using a static analysis that will likely yield false positives.

Currently, a try run (mochitests, crashtests, reftests, jsreftests an xpcshell tests) reveals a total of 125 distinct signed integer overflow issues. I can file and probably even assign these in an automated way with little effort, but I wanted to confirm first that others are ok with this :)

Also I'm not fully sure yet if these bugs should be security-sensitive by default or not. We are checking all of our tests with ASan which would easily detect out-of-bounds access (which is the main problem with integer overflows in security-context), however, Clang does different optimizations compared to GCC (in particular, the blog post linked above only applies to GCC, not to Clang), so we might still have some issues on GCC that we don't detect with ASan. I assume overall most of the bugs will be harmless though.

Thoughts?
Flags: needinfo?(mrbkap)
Depends on: 866608, 811355, 811264, 811122
dveditz also suggested that some overflows could be reported as one bug (e.g. on a file level). The assumption is that some of these errors result from a single pattern that needs to be fixed. I don't know how well that fits the results though. Fwiw, 104 of the 125 errors I'm tracking so far are in layout/.
Flags: needinfo?(dbaron)
Flags: needinfo?(dbaron) → needinfo?(matspal)
I think it's probably worth filing bugs on groups of related problems -- probably on a broader level than per-file.  (Maybe per-module, but with separate bugs for conceptually different bugs?)
I looked through the layout/ entries in the list and they are almost all 'nscoord' arithmetic.
In many cases the code already dealt with it in some way, std::max(0, ...) for example.
I really don't think those are worth spending time on fixing.

There were a couple under layout/xul/ that looked like non-nscoord arithmetic but I don't
think that code is exposed to web content anyway, and they looked harmless enough to me.

The layout code in general is resilient against nscoord overflows - nothing bad will happen.
So I suggest we leave it as is unless it affects the rendering of real web sites.

If we feel like we must fix nscoord overflows, then we should do it in a systematic
way because every single arithmetic operation is affected, not just the ones caught here.
Flags: needinfo?(matspal)
Are you sure compilers won't remove your call to std::max after analyzing the overflow and non-overflow cases?
(In reply to Mats Palmgren (:mats) from comment #6)
> I looked through the layout/ entries in the list and they are almost all
> 'nscoord' arithmetic.
> In many cases the code already dealt with it in some way, std::max(0, ...)
> for example.
> I really don't think those are worth spending time on fixing.

Please note that we're talking about *signed* integer overflow and that the overflow *is* happening. The runtime analysis reveals that the arithmetic operation indeed overflows, in every single case reported here.

It is well possible, that your checks are optimized away because signed integer overflow is undefined behavior and the compiler is fine to assume that it does *not* happen. 

I talked to some other developers and compiler people and they agreed that we should not have any occurrences of signed integer overflow in our codebase in general.

> There were a couple under layout/xul/ that looked like non-nscoord
> arithmetic but I don't
> think that code is exposed to web content anyway, and they looked harmless
> enough to me.

Any such bug remaining in the codebase makes fuzzing a lot harder because this will trigger all the time.

> 
> If we feel like we must fix nscoord overflows, then we should do it in a
> systematic
> way because every single arithmetic operation is affected, not just the ones
> caught here.

That sounds like a good approach to me, if we can solve the problem at its root! :) Thanks for looking so far.
Depends on: 920471
(In reply to Christian Holler (:decoder) from comment #8)
> It is well possible, that your checks are optimized away because signed
> integer overflow is undefined behavior and the compiler is fine to assume
> that it does *not* happen. 

What I'm saying is that that's fine when it comes to nscoord overflow
in layout.  The rendering will be messed up anyway when you use
width:9999999em and the like.

> Any such bug remaining in the codebase makes fuzzing a lot harder because
> this will trigger all the time.

I sympathize strongly with fuzzing efforts in general.  But the cost of
fixing all nscoord calculations is very high, so perhaps we should consider
alternatives?

One alternative is to run these tests in a NS_COORD_IS_FLOAT build.
That will fix the nscoord problem while still allowing layout code to be
checked for other overflow errors.  It will also have the nice side-effect
of ensuring that our NS_COORD_IS_FLOAT support is working properly.
(In reply to Mats Palmgren (:mats) from comment #9)
> 
> What I'm saying is that that's fine when it comes to nscoord overflow
> in layout.  The rendering will be messed up anyway when you use
> width:9999999em and the like.

I think we might be talking past each other. I am not worried about the result of the overflow itself here (if it was unsigned, then it would still overflow, but that would be ok if you expect that). I am worried about the undefined behavior which can lead to all sorts of stuff happening here. Not only can the result of the overflow heavily vary between compilers and optimization levels (leading to inconsistent results or problems that rely on certain optimization behavior), but overflow checks that you added in the code can also be removed by the compiler (and that's a realistic thing, not a theoretical problem).

I also think it does not matter if we "don't care" about the overflow. We should try to weed out undefined behavior from our codebase in general, especially if the type of undefined behavior is known to be optimized by existing compilers already.

Furthermore, we should follow what the secure coding guidelines from e.g. CERT say, and they clearly state that signed integer overflow must be avoided.
We should just turn on -fwrapv or -fno-strict-overflow, switching to a dialect of C++ in which signed integer overflow is defined behavior.  Fixing only the problems we spot with UBSan & STACK is a losing game.  Many other open-source projects, including the Linux Kernel and CPython, made the same decision.

MSVC doesn't support -frwapv, but it's also less likely to surprise us since it updates more slowly than gcc or clang.
(In reply to Jesse Ruderman from comment #11)
> MSVC doesn't support -frwapv, but it's also less likely to surprise us since
> it updates more slowly than gcc or clang.

It's still likely to optimize out things that overflow...
Maybe we could use -ftrapv on debug builds?
(In reply to Mike Hommey [:glandium] from comment #13)
> Maybe we could use -ftrapv on debug builds?

Note this is not a typo for -fwrapv. I do mean -ftrapv.
(In reply to Mike Hommey [:glandium] from comment #14)
> (In reply to Mike Hommey [:glandium] from comment #13)
> > Maybe we could use -ftrapv on debug builds?
> 
> Note this is not a typo for -fwrapv. I do mean -ftrapv.

I don't fully understand that. -ftrapv is used to detect the overflows at runtime right? We already have that. All of the errors I'm trying to report/get fixed here are actual overflows happening at runtime, during our tests.
(In reply to Mats Palmgren (:mats) from comment #6)
> If we feel like we must fix nscoord overflows, then we should do it in a
> systematic
> way because every single arithmetic operation is affected, not just the ones
> caught here.

Absolutely agreed here, by the way.  Your list of ~100 or so code locations in layout just demonstrates poor test suite coverage of overflows; the actual number of code locations I'd estimate to be between 1000 and 10000.

I agree our current approach is bad (bug 575011), but I'd like to hear what the proposed replacement is.  The reason we haven't replaced it is that we don't have an alternative that's clearly better.  And this (meta)bug is almost certainly the wrong place.  Bug 575011 might be the right place, but I think dev-tech-layout would be better.
(In reply to David Baron [:dbaron] (needinfo? me) from comment #16)
> I agree our current approach is bad (bug 575011), but I'd like to hear what
> the proposed replacement is.  The reason we haven't replaced it is that we
> don't have an alternative that's clearly better.  And this (meta)bug is
> almost certainly the wrong place.  Bug 575011 might be the right place, but
> I think dev-tech-layout would be better.

I think dev-platform is the best place to have this discussion because the scope of the problem is literally the entire codebase, not just layout.
Depends on: 922593
Depends on: 922595
Depends on: 922600
Depends on: 922601
Depends on: 922603
(I don't think I have anything to add here)
Flags: needinfo?(mrbkap)
Filed bug 1031653 on my suggestion (comment 11) to use -fwrapv.
You need to log in before you can comment on or make changes to this bug.