Open Bug 1750517 Opened 3 years ago Updated 2 years ago

use conservative, robust communication for update check to prevent broken updater in builds with issues

Categories

(Toolkit :: Application Update, task, P3)

task

Tracking

()

Tracking Status
firefox-esr91 --- affected
firefox96 --- wontfix
firefox97 --- wontfix
firefox98 + wontfix

People

(Reporter: aryx, Unassigned)

References

Details

Secure connections in Nightly started to fail last Friday (bug 1750188). This prevents the affected Nightly from updating to a version with the fix (unknown if the background updater on Windows fixes this for that platform).

The use of a conservative mode for update requests to ensure networking for critical services doesn't break has been discussed.

Dana said "Updates are also signed, so maybe it's safe enough to not do revocation checking (or limit it to a subset of our checks)"

We do something like this for update checking. I'm guessing that you would like this to be done for update downloads as well? AFAIK, we don't have any way of doing that for BITS downloads (which, on Windows, are typically the default). But failed BITS downloads fall back to nsIIncrementalDownload. And it's presumably possible to do this for nsIIncrementalDownload.

@aryx Would that accomplish what you had in mind here?

Another possible option is to have update downloads use HTTP instead of HTTPS. I believe that we started using HTTPS for this somewhat recently. Because the updates are signed, I've never been particularly clear on why this is necessary in the first place. If it is not necessary, perhaps we should consider returning to HTTP transport for update downloads.

Flags: needinfo?(aryx.bugmail)

Benjamin, could you evaluate Kirk's suggestions for more robust application updates in case of connections impacted by HTTPS issues?

Flags: needinfo?(aryx.bugmail) → needinfo?(bbeurdouche)

(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #2)

Benjamin, could you evaluate Kirk's suggestions for more robust application updates in case of connections impacted by HTTPS issues?

302 to Dan who might have an opinion here as well.

Flags: needinfo?(bbeurdouche) → needinfo?(dveditz)
Priority: -- → P3

Changing the priority to p1 as the bug is tracked by a release manager for the current release.
See What Do You Triage for more information

Priority: P3 → P1
Priority: P1 → P3

We do something like this for update checking. I'm guessing that you would like this to be done for update downloads as well?

If we're not using beConservative for the downloads themselves we certainly should. The conservative flag turns off some newer HTTP and DNS features, and if the connection is proxied we'll fall-back and try a direct connection. It would not have helped in this case because that was not the problem! Switching back to insecure downloads (or rather, only secured by package signature) would not have helped either.

Once users were in this state all HTTPS connections would fail according to jschanck, and that includes the update check itself. That the update check was the problem is supported by the error message people got, as in bug 1750188 comment 14, where it says no new version was available and not that the download failed. [Tangent rant: I've always hated that! People can handle the difference between "You're all good" and "Check failed, try later", and there IS a difference! When we pretend everything is fine people will be happy but insecure. If we tell them there is a problem they have a fighting chance of fixing it if it persists, and possibly detecting a local attack.]

beConservative does not turn off certificate or revocation checking, or bypass any number of imagined HTTPS bugs we could have introduced -- it's not passed along to PSM as far as I can tell. There's a reasonable argument to be made that the TLS code should honor the flag and turn off features under active development (like previously TLS 1.3, or maybe currently ECH, maybe some new ciphers), but for most of those we don't ship the feature until it's got robust interop, and usually it's also avoided by not using the feature on the server which we also control. I think we did have a problem one time when our cloud provider turned on something without warning that ended up breaking things temporarily? But if it's broken on the server it can be fixed on the server.

We certainly don't want to turn off certificate checking (== no security at all), and I'm reluctant to turn off revocation checking, too. If we revoke our own cert it's probably a pretty serious problem. If we'd had this flag passed in should we have skipped new CRLite code and fallen back to standard OCSP revocation? That's a harder question for me. If a user had their updates attacked by an AITM it would be trivial to block the OCSP check and avoid the revocation. On the other hand, CRLite helping presumes that we've detected the bad cert, revoked it, and the user has gotten a CRLite update before the AITM starts. You could argue CRLite isn't "conservative", but I'd argue our update checks are amongst the most important places to get revocation right. (I don't care as much about revocation on our download server--TLS there is more important for privacy than security since updates are signed--but as noted above that wouldn't have helped in the 1750188 case)

Since we're mostly dealing only with Firefox and our servers, any problem should be found in testing -- and in this case it was! It was reported within just a few hours of landing on mozilla-central. Not nice that it broke everyone, but builds that are unusable for some chunk of users aren't unknown.

Flags: needinfo?(dveditz)

TL;DR

  1. We should use beConservative in nsIncrementalDownload.cpp
  2. We should brainstorm if "conservative" TLS features make sense. If so, PSM needs to know which requests are conservative and do the whatever we decide

#1 would not have avoided bug 1750188, but it does mean a bunch of the "Be Conservative" work would not actually keep updates working if the download can be broken using the non-conservative features. For example, the recent work to hook proxy failover to the conservative flag would not have actually rescued the folks who were captured by a set of malicious addons we battled last summer/fall.

I'm not sure what goes in #2.

Severity: -- → S3
You need to log in before you can comment on or make changes to this bug.