ImperialViolet

Revocation checking and Chrome's CRL (05 Feb 2012)

When a browser connects to an HTTPS site it receives signed certificates which allow it to verify that it's really connecting to the domain that it should be connecting to. In those certificates are pointers to services, run by the Certificate Authorities (CAs) that issued the certificate, that allow the browser to get up-to-date information.

All the major desktop browsers will contact those services to inquire whether the certificate has been revoked. There are two protocols/formats involved: OCSP and CRL, although the differences aren't relevant here. I mention them only so that readers can recognise the terms in other discussions.

The problem with these checks, that we call online revocation checks, is that the browser can't be sure that it can reach the CA's servers. There are lots of cases where it's not possible: captive portals are one. A captive portal frequently requires you to sign in on an HTTPS site, but blocks traffic to all other sites, including the CA's OCSP servers.

If browsers were to insist on talking to the CA before accepting a certificate, all these cases would stop working. There's also the concern that the CA may experience downtime and it's bad engineering practice to build in single points of failure.

Therefore online revocation checks which result in a network error are effectively ignored (this is called “soft-fail”). I've previously documented the resulting behaviour of several browsers.

But an attacker who can intercept HTTPS connections can also make online revocation checks appear to fail and so bypass the revocation checks! In cases where the attacker can only intercept a subset of a victim's traffic (i.e. the SSL traffic but not the revocation checks), the attacker is likely to be a backbone provider capable of DNS or BGP poisoning to block the revocation checks too.

If the attacker is close to the server then online revocation checks can be effective, but an attacker close to the server can get certificates issued from many CAs and deploy different certificates as needed. In short, even revocation checks don't stop this from being a real mess.

So soft-fail revocation checks are like a seat-belt that snaps when you crash. Even though it works 99% of the time, it's worthless because it only works when you don't need it.

While the benefits of online revocation checking are hard to find, the costs are clear: online revocation checks are slow and compromise privacy. The median time for a successful OCSP check is ~300ms and the mean is nearly a second. This delays page loading and discourages sites from using HTTPS. They are also a privacy concern because the CA learns the IP address of users and which sites they're visiting.

On this basis, we're currently planning on disabling online revocation checks in a future version of Chrome. (There is a class of higher-security certificate, called an EV certificate, where we haven't made a decision about what to do yet.)

Pushing a revocation list

Our current method of revoking certificates in response to major incidents is to push a software update. Microsoft, Opera and Firefox also push software updates for serious incidents rather than rely on online revocation checks. But our software updates require that users restart their browser before they take effect, so we would like a lighter weight method of revoking certificates.

So Chrome will start to reuse its existing update mechanism to maintain a list of revoked certificates, as first proposed to the CA/Browser Forum by Chris Bailey and Kirk Hall of AffirmTrust last April. This list can take effect without having to restart the browser.

An attacker can still block updates, but they have to be able to maintain the block constantly, from the time of revocation, to prevent the update. This is much harder than blocking an online revocation check, where the attacker only has to block the checks during the attack.

Since we're pushing a list of revoked certificates anyway, we would like to invite CAs to contribute their revoked certificates (CRLs) to the list. We have to be mindful of size, but the vast majority of revocations happen for purely administrative reasons and can be excluded. So, if we can get the details of the more important revocations, we can improve user security. Our criteria for including revocations are:

The CRL must be crawlable: we must be able to fetch it over HTTP and robots.txt must not exclude GoogleBot.
The CRL must be valid by RFC 5280 and none of the serial numbers may be negative.
CRLs that cover EV certificates are taken in preference, while still considering point (4).
CRLs that include revocation reasons can be filtered to take less space and are preferred.

For the curious, there is a tool for fetching and parsing Chrome's list of revoked certificates at https://github.com/agl/crlset-tools.

Extracting Mozilla's Root Certificates (30 Jan 2012)

When people need a list of root certificates, they often turn to Mozilla's. However, Mozilla doesn't produce a nice list of PEM encoded certificates. Rather, they keep them in a form which is convenient for NSS to build from: https://mxr.mozilla.org/mozilla/source/security/nss/lib/ckfw/builtins/certdata.txt?raw=1.

Several people have written quick scripts to try and convert this into PEM format, but they often miss something critical: some certificates are explicitly distrusted. These include the DigiNotar certificates and the misissued COMODO certificates. If you don't parse the trust records from the NSS data file, then you end up trusting these too! There's at least one, major example of this that I know of.

(Even with a correct root file, unless you do hard fail revocation checking you're still vulnerable to the misissued COMODO certificates.)

So, at the prodding of Denton Gentry, I've open-sourced a tool for converting NSS's file to PEM format: extract-nss-root-certs. (At the time of writing it requires a 6g built from the weekly or current tree (hg -r weekly), not the release tree. A few of the APIs have changed since the last Go release was done. This will be resolved when Go 1.0 is released.)

BEAST followup (15 Jan 2012)

(See the original post for background.)

Everyone seems to have settled on 1/n-1 record splitting as a workaround for the BEAST attack in TLS 1.0 and SSLv3. Briefly: 1/n-1 record splitting breaks CBC encrypted records in two: the first with only a single byte of application data and the second with the rest. This effectively randomises the IV and stops the attack.

The workaround which OpenSSL tried many years ago, and which hit significant issues, was 0/n record splitting. It's the same thing, but with the first record being empty. The problem with it was that many stacks processed the empty record and returned a 0-byte read, which higher levels took to mean EOF.

1/n-1 record splitting doesn't hit that problem, but it turns out that there's a fair amount of code out there that assumes that the entire HTTP request comes in a single read. The single byte record breaks that.

We first implemented 1/n-1 record splitting in Chrome 15 but backed off after only a couple of days because logging into several large sites broke. But that did motivate the sites to fix things so that we could switch it on in Chrome 16 and it stuck that time.

Opera also implemented it around this time, but I think Chrome took the brunt of the bug reports and it's time consuming dealing with them. Myself and a colleague have been emailing and phoning a lot of sites and vendors while dealing with upset users and site admins. Chrome certainly paid a price for moving before Firefox and IE but then we're nice like that.

Thankfully, this week, Microsoft released a security update which implements 1/n-1 record splitting in SChannel and switches it on in IE. (Although it defaults to off for other users of SChannel, unlike NSS.) Now the sites which broke with Chrome 16 are also broken in a patched IE and that takes some pressure off us. In a few weeks, Firefox 10 should be released and then we'll be about as good as we're going to get.

After taking the brunt with Chrome 16, there is one case that I'm not going to fight: Plesk can't handle POST payloads that don't come in a single read. Chrome (currently) sends POSTs as two writes: one for the HTTP headers and a second for the POST body. That means that each write is split into two records and Plesk breaks because of the second split. IE and Firefox send the headers and body in a single write, so there's only a single split in the HTTP headers, which Plesk handles.

Chrome will start merging small POST bodies into the headers with Chrome 17 (hopefully) and this will fix Plesk. Also, merging as Firefox and IE do saves an extra packet so it's worthwhile on its own. Once again, anything that's mostly true soon becomes an unwritten rule on the Internet.

It's worth contrasting the BEAST response to the renegotiation attack. The BEAST workaround caused a number of problems, but it worked fine for the vast majority of sites. The renegotiation fix requires that very nearly every HTTPS site on the Internet be updated and then that browsers refuse to talk to unpatched servers.

I'd bet that we'll not manage to get enough patched servers for any browser to require it this side of 2020. Unpatched servers can still disable renegotiation to protect themselves, but it's still not hard to find major sites that allow insecure renegotiation (www.chase.com was literally the second site that I tried).

OTR in Go (14 Jan 2012)

“Off the record” is, unfortunately, an overloaded term. To many it's feature in gTalk and AOL IM which indicates that the conversation isn't logged. However, to crypto folks it's a protocol for secure chat.

(In fact, resoloving the ambiguity is on the EFF's wish list.)

Pidgin has been my chat client of choice for some time because it's pretty fully featured and supports OTR via a plugin. However, I just don't trust it from a security point of view. The latest incident was only a couple of weeks ago: CVE-2011-3919.

So, I implemented otr in Go, as well as an XMPP library and client. It's an absolutely minimal client (except for OTR support) and implements only what I absolutely need in a client.

But it does mean that the whole stack, including the TLS library, is implemented in a memory safe language. (On the other hand, pretty much everything in that stack, from the modexp function to the terminal handling code was written by me and has never really been audited. I'm a decent programmer but I'm sure there are some howlers of security issues in there somewhere.)

Certificate Transparency (29 Nov 2011)

(I don't have comments on this blog, but you can comment on my Google+ post.)

Ben Laurie and I have been working on a longer term plan for improving the foundations of the certificate infrastructure on which most Internet transport security is based on these days. Although Chrome has public key pinning for some domains, which limits the set of permitted certificates, we don't see public key pinning as a long term solution (and nor was it ever designed to be).

For the 10 second summary of the plan, I'll quote Ben: “certificates are registered in a public audit log. Servers present proofs that their certificate is registered, along with the certificate itself. Clients check these proofs and domain owners monitor the logs.”. I would add only that anyone can check the logs: the certificate infrastructure would be fully transparent.

We now have an outline of the basic idea and will be continuing to flesh it out in the coming months, hopefully in conjunction with other browser vendors.

But I thought that, at the outset, it would be helpful to describe some of the limitations to the design space, as I see them:

No side-channels

As I've previously described, side-channels occur when a browser needs to contact a server other than the immediate destination in order to verify a certificate. Revocation checking with OCSP is an example of a side-channel used today.

But in order to be effective, side-channels invariably need to block the certificate verification until they complete, and that's a big problem. The Internet isn't fully connected. Captive portals, proxies and firewalls all mean that the only thing you can really depend on talking to is the immediate destination server. Because of this, browsers have never been able to make OCSP lookups blocking, and therefore OCSP is basically useless for security.

And that's not to mention the privacy, performance and functionality issues that arise from needing side-channels. (What happens when the side-channel server goes down?)

So our design requires that the servers send us the information that we require. We can use side-channels to check up on the logs, but it's an asynchronous lookup.

It's not opt-in, it's all-in

SSL Stripping is a very easy and very effective attack. HSTS prevents it and is as easy to deploy as anything can be for HTTPS sites. But, despite all this, and despite a significant amount of evangelism, take up has been very limited, even by sites which are HTTPS only and the subject of attacks.

While HSTS really has to be opt-in, a solution to the certificate problem doesn't. Although our scheme is incrementally deployable, the eventual aim is that it's required for everybody. Thankfully, since certificates have to be renewed there's an obvious means to incrementally deploy it: require it for certificates issued after a certain date. Although an eventual hard requirement is still needed, it's a lot less of a problem.

It's easy on the server operator

Since the aim is to make it a requirement for all servers, we've sacrificed a lot in order to make it very easy on the server operator. For most server operators, their CA will probably take care of fetching the audit proofs, meaning there's no additional work at all.

Some initial designs included short-lived log signatures, which would have solved the revocation problem. (Revocation would simply be a matter of instructing the logs to stop signing a given certificate.) However, this would have required server operators to update their audit proofs on a near-daily basis. After discussions it became clear that such a requirement wouldn't be tenable for many and so we reluctantly dropped it.

We are also sacrificing decentralisation to make things easy on the server. As I've previously argued, decentralisation isn't all it's cracked up to be in most cases because 99.99% of people will never change any default settings, so we haven't given up much. Our design does imply a central set of trusted logs which is universally agreed. This saves the server from possibly having to fetch additional audit proofs at runtime, something which requires server code changes and possible network changes.

There are more valid certificates than the one that's currently serving

Cheap virtual hosting and EC2 have made multi-homed services common. Even small to medium scale sites have multiple servers these days. So any scheme that asserts that the currently serving certificate is the only valid certificate will run into problems when certificates change. Unless all the servers are updated at exactly the same time, then users will see errors during the switch. These schemes also make testing a certificate with a small number of users impossible.

In the end, the only real authority on whether a certificate is valid is the site itself. So we don't rely on external observations to decide on whether a certificate is valid, instead to seek to make the set of valid certificates for a site public knowledge (which it currently isn't), so that the site can determine whether it's correct.

It's not easy to do

We believe that this design will have a significant, positive impact on an important part of Internet security and that it's deployable. We also believe that any design that shares those two properties ends up looking a lot like it. (It's no coincidence that we share several ideas with the EFF's Sovereign Keys.)

None the less, deployment won't be easy and, hopefully, we won't be pushing it alone.

Forward secrecy for Google HTTPS (22 Nov 2011)

As announced on the Google Security Blog, Google HTTPS sites now support forward secrecy. What this means in practice is two things:

Firstly, the preferred cipher suite for most Google HTTPS servers is ECDHE-RSA-RC4-SHA. If you have a client that supports it, you'll be using that ciphersuite. Chrome and Firefox, at least, support it.

Previously we were using RSA-RC4-SHA, which means that the client (i.e. browser) picks a random key for the session, encrypts it with the server's public key and sends it to the server. Since only the holder of the private key can decrypt the session key, the connection is secure.

However, if an attacker obtains the private key in the future then they can decrypt recorded traffic. The encrypted session key can be decrypted just as well in ten years time as it can be decrypted today and, in ten years time, the attacker has much more computing power to break the server's public key. If an attacker obtains the private key, they can decrypt everything encrypted to it, which could be many months of traffic.

ECDHE-RSA-RC4-SHA means elliptic curve, ephemeral Diffie-Hellman, signed by an RSA key. You can see a previous post about elliptic curves for an introduction, but the use of elliptic curves is an implementation detail.

Ephemeral Diffie-Hellman means that the server generates a new Diffie-Hellman public key for each session and signs the public key. The client also generates a public key and, thanks to the magic of Diffie-Hellman they both generate a mutual key that no eavesdropper can know.

The important part here is that there's a different public key for each session. If the attacker breaks a single public key then they can decrypt only a single session. Also, the elliptic curve that we're using (P-256) is estimated to be as strong as a 3248-bit RSA key (by ECRYPT II), so it's unlikely that the attacker will ever be able to break a single instance of it without a large, quantum computer.

While working on this, Bodo Möller, Emilia Kasper and I wrote fast, constant-time implementations of P-224, P-256 and P-521 for OpenSSL. This work has been open-sourced and submitted upstream to OpenSSL. We also fixed several bugs in OpenSSL's ECDHE handling during deployment and those bug fixes are in OpenSSL 1.0.0e.

Session Tickets

The second part of forward secrecy is dealing with TLS session tickets.

Session tickets allow a client to resume a previous session without requiring that the server maintain any state. When a new session is established the server encrypts the state of the session and sends it back to the client, in a session ticket. Later, the client can echo that encrypted session ticket back to the server to resume the session.

Since the session ticket contains the state of the session, and thus keys that can decrypt the session, it too must be protected by ephemeral keys. But, in order for session resumption to be effective, the keys protecting the session ticket have to be kept around for a certain amount of time: the idea of session resumption is that you can resume the session in the future, and you can't do that if the server can't decrypt the ticket!

So the ephemeral, session ticket keys have to be distributed to all the frontend machines, without being written to any kind of persistent storage, and frequently rotated.

Result

We believe that forward secrecy provides a fairly significant benefit to our users and we've contributed our work back to OpenSSL in the hope that others will make use of it.

Classifying solutions to the certificate problem (08 Oct 2011)

This is something that I wrote up internally, but which Chris Palmer suggested would be useful to post publicly for reference. It presents some, somewhat artificial, axis on which I believe that all proposed solutions to addressing weaknesses in the current certificate ecosystem fall. By dividing the problem into those decisions, I hope to make the better solutions clearer.

Axis 1: Private vs public signing

At the moment we have private signing. A CA can sign a certificate and tell nobody about it and the certificate is still valid. This is the reason that we have to go crawl the Internet in order to figure out how many intermediate CA certs there are.

In public signing, certificates aren't valid unless they're public.

There are degrees of how public schemes are. Convergence is a public scheme: certificates have to be visible to the notaries, but it's a lot less public than a scheme where all certificates have to be published. And published where? Highly public schemes imply some centralisation.

Private schemes don't protect us from CAs acting in bad faith or CAs which have been compromised. Public schemes help a lot in these cases, increasingly so the more public they are. Although a certificate might be valid for a short time, evidence of misbehavior is recorded. Public schemes also allow each domain to monitor all the certificates for their domain. Fundamentally, the only entity that can answer the question of whether a certificate is legitimate is the subject of the certificate itself. (See this tale of a Facebook certificate.)

Private schemes have the advantage of protecting the details of internal networks (i.e. not leaking the name www.corp.example.com).

Axis 2: Side channels or not

Revocation checking which calls back to the CA is a side channel. Convergence notaries are too. OCSP stapling is not, because the server provides the client with everything that it needs (assuming that OCSP stapling worked).

Side channels which need to be a hard fail are a problem: it's why revocation checking doesn't actually work. Private networks, hotel networks and server downtime are the issues. Side-channels are also a privacy and performance concern. But they're easier on the server operator and so more likely to be deployed.

In the middle are soft-fail side-channels. These offer limited protection because the connection proceeds even when the side-channel fails. They often queue the certificate up for later reporting.

Axis 3: Clocks or not

OCSP with nonces doesn't need clock sync. OCSP with time stamps does.

Keeping clocks in sync is a problem in itself, but it allows for short lived statements to be cached. It can also be a useful security advantage: Nonces require that a signing key be online because it has to sign a constant stream of requests. With clocks, a single response can be served up repeatedly and the key kept largely off-line. That moves the online key problem to the clock servers, but that's a smaller problem. A compromised clock server key can be handled by querying several concurrently and picking the largest cluster of values.

Examples

OCSP today is {private,side-channel,clock}. The 'let's fix OCSP' solutions are typically {private,side-channel,no-clock} (with an option for no-side-channel). Convergence is {mostly-public,side-channel,no-clock}.

I think the answer lies with {public,no-side-channel,clock}, but it's a trek to get there. Maybe something for a future post.

False Start: Brocade broken again (03 Oct 2011)

I wrote previously that Brocade had released a firmware update for their ServerIron SSL terminators which fixed an incompatibility with False Start. However, it now appears that they didn't fix the underlying issue, rather they special cased Chrome's current behaviour.

Sadly, due to Chrome implementing a workaround for the BEAST attack, their special casing doesn't work anymore and breaks Chrome 15.

So, if you run a Brocade ServerIron you should contact me ASAP. I'll be adding sites running this hardware to the False Start blacklist for Chrome 15 to allow Brocade to release another firmware update.

Chrome and the BEAST (23 Sep 2011)

Thai Duong and Juliano Rizzo today demoed an attack against TLS 1.0's use of cipher block chaining (CBC) in a browser environment. The authors contacted browser vendors several months ago about this and so, in order not to preempt their demo, I haven't discussed any details until now.

Contrary to several press reports, Duong and Rizzo have not found, nor do they claim, any new flaws in TLS. They have shown a concrete proof of concept for a flaw in CBC that, sadly, has a long history. Early reports of the problem date back nearly ten years ago and Bard published two papers detailing the problem.

The problem has been fixed in TLS 1.1 and a workaround for SSL 3.0 and TLS 1.0 is known, so why is this still an issue?

The workaround (prepending empty application data records) is perfectly valid according to the protocol but several buggy implementations of SSL/TLS misbehaved when it was enabled. After Duong and Rizzo notified us, we put the same workaround back into Chrome to see if the state of the Internet had improved in the years since the last attempt. Sadly it was still infeasible to implement this workaround and the change had to be reverted.

Use of TLS 1.1 would also have solved the issue but, despite it being published in 2006, common SSL/TLS libraries still don't implement it. But even if there was widespread deployment of TLS 1.1, it wouldn't have helped avoid problems like this. Due to a different, common bug in SSL 3.0 implementations (nearly 12 years after SSL 3.0 was obsoleted by TLS 1.0), browsers still have to perform SSL 3.0 downgrades to support buggy servers. So even with a TLS 1.1 capable browser and server, an attacker can trigger a downgrade to SSL 3.0 and bypass the protections of TLS 1.1.

Finally, the CBC attacks were believed to be largely theoretical but, as Duong and Rizzo have pointed out today, that's no longer the case.

Initially the authors identified HTML5 WebSockets as a viable method of exploiting the CBC weakness but, due to unrelated reasons, the WebSockets protocol was already in the process of changing in such a way that stopped it. The new WebSockets protocol shipped with Chrome 14 but several plugins have been found to offer features that might allow the attack to be performed.

Duong and Rizzo confirmed that the Java plugin can be used, but Chrome already blocks the execution of Java by default. Other plugins, if installed, can be disabled on the about:plugins page if the user wishes.

The attack is still a difficult one; the attacker has to have high-bandwidth MITM access to the victim. This is typically achieved by being on the same wireless network as the victim. None the less, it's a much less serious issue than a problem which can be exploited by having the victim merely visit a webpage. (Incidentally, we pushed out a fix to all Chrome users for such a Flash bug only a few days ago.)

Also, an attacker with MITM abilities doesn't need to implement this complex attack. SSL stripping and mixed-scripting issues are much easier to exploit and will take continued, sustained effort to address. Duong and Rizzo have highlighted this fact by choosing to attack one of the few HSTS sites.

Thanks to an idea suggested by Xuelei Fan, we have another workaround for the problem which will, hopefully, cause fewer incompatibility problems. This workaround is currently being tested on the Chrome dev and beta channels but we haven't pushed it on the stable channel yet. Since we don't really know if the fix will cause problems it's not something that we want to drop on people without testing. The benefit of a fix to Chrome's TLS stack is also limited as Chrome already uses the newer WebSockets protocol and it doesn't fix problems in plugins.

If it turns out that we've misjudged something we can quickly react, thanks to Chrome's auto-update mechanism.

It's also worth noting that Google's servers aren't vulnerable to this problem. In part due to CBC's history, Google servers have long preferred RC4, a cipher that doesn't involve CBC mode. Mention has also been made of Chrome's False Start feature but, since we don't believe that there are any vectors using Chrome's stack, that's immaterial.

DNSSEC Certificates now in Chrome Stable (19 Sep 2011)

A few months back I described DNSSEC authenticated HTTPS in Chrome. This allows sites to use DNSSEC, rather than traditional, certificates and is aimed at sites which currently use no HTTPS, or self-signed certificates. Since Chrome 14 is now stable, all Chrome users now have this experimental feature.

(Also, the serialisation format has been documented.)

There's an index of all posts and one, long page with them all too. Email: agl AT imperialviolet DOT org.

Adam Langley's Weblog