The novel HTTP/2 'Rapid Reset' DDoS attack (cloud.google.com)
365 points by jsnell 8 months ago | 106 comments



Related ongoing threads:

The largest DDoS attack to date, peaking above 398M rps - https://news.ycombinator.com/item?id=37831062

HTTP/2 Zero-Day Vulnerability Results in Record-Breaking DDoS Attacks - https://news.ycombinator.com/item?id=37830998


Nice to see that the haproxy people had spotted this kind of issue with http/2 and apparently mitigated it back in 2018: https://www.mail-archive.com/[email protected]/msg44134.h...


Nice, I was looking for this type of information for haproxy. Gives me a lot of confidence in their new QUIC feature.


If anyone is curious, Nginx is vulnerable to this

https://www.nginx.com/blog/http-2-rapid-reset-attack-impacti...


IF configured away from the defaults:

By relying on the default keepalive limit, NGINX prevents this type of attack. Creating additional connections to circumvent this limit exposes bad actors via standard layer 4 monitoring and alerting tools.

However, if NGINX is configured with a keepalive that is substantially higher than the default and recommended setting, the attack may deplete system resources.
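
For reference, a sketch of the relevant directives (names per the NGINX docs; the values shown are the documented defaults at the time of writing and should be checked against your version):

    # nginx.conf (illustrative excerpt, not a hardening guide)
    http {
        # Requests served per connection before NGINX closes it. Leaving this
        # at the default bounds how long a rapid-reset client can reuse a
        # single connection.
        keepalive_requests 1000;

        # Concurrent HTTP/2 streams allowed per connection.
        http2_max_concurrent_streams 128;
    }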


> In a typical HTTP/2 server implementation, the server will still have to do significant amounts of work for canceled requests, such as allocating new stream data structures, parsing the query and doing header decompression, and mapping the URL to a resource. For reverse proxy implementations, the request may be proxied to the backend server before the RST_STREAM frame is processed. The client on the other hand paid almost no costs for sending the requests. This creates an exploitable cost asymmetry between the server and the client.

I'm surprised this wasn't foreseen when HTTP/2 was designed. Amplification attacks were already well known from other protocols.

I'm similarly surprised it took this long for this attack to surface, but maybe HTTP/2 wasn't widely enough deployed to be a worthwhile target until recently?
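
To make the cost asymmetry from the quoted passage concrete, here's a minimal sketch using the Python h2 library that just serializes the bytes a client would emit for one request-plus-cancel pair (no network code; the header values like example.com are placeholders):

    # pip install h2
    import h2.config
    import h2.connection
    import h2.errors

    conn = h2.connection.H2Connection(
        config=h2.config.H2Configuration(client_side=True))
    conn.initiate_connection()

    # One HEADERS frame (a complete GET request) immediately followed by
    # RST_STREAM(CANCEL) on the same stream. The attack repeats this for
    # stream IDs 1, 3, 5, ... without ever waiting for a response.
    conn.send_headers(1, [
        (":method", "GET"),
        (":path", "/"),
        (":scheme", "https"),
        (":authority", "example.com"),
    ], end_stream=True)
    conn.reset_stream(1, error_code=h2.errors.ErrorCodes.CANCEL)

    wire = conn.data_to_send()
    print(f"{len(wire)} bytes on the wire for one request+cancel pair")

A few dozen bytes from the client can trigger header decompression, URL routing, and possibly a proxied backend request on the server side.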


It's not really an amplification attack. It's just using TCP connections drastically more efficiently.


Isn’t any kind of attack where a little bit of effort from the attacker causes a lot of work for the victim an amplification attack? Or do you only consider it an amplification attack if it is exploiting layer 3?

I tried looking it up and couldn’t find an authoritative answer. Can you recommend a resource that you like for this subject?


> Isn’t any kind of attack where a little bit of effort from the attacker causes a lot of work for the victim an amplification attack?

That is technically any HTTP request that requires processing to satisfy. For example if I find a page on your site that executes an expensive database query.

Amplification attacks are generally defined as packets that can be sent with a spoofed source address that result in a larger number of packets being returned to the spoofed victim.


Amplification attack usually means the first victim produces more traffic than was sent to it and can direct it at the second victim.


No, this is a resource exhaustion attack: https://en.wikipedia.org/wiki/Resource_exhaustion_attack And it's not HTTP/2's first rodeo: https://www.akamai.com/blog/security/http2-vulnerabilities

One of the 2019 vulns, https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-9514, even sounds extremely similar to the current attack.


> I tried looking it up and couldn’t find an authoritative answer.

I mean, who do you consider authoritative? Googling "amplification attack" will give you plenty of descriptions from tons of sources. Take your pick. Though most will talk about DNS amplification attacks because that's the simplest example.
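
For a rough sense of the DNS case (sizes below are commonly cited ballpark figures, not measurements):

    # A small spoofed-source query elicits a much larger response,
    # which gets delivered to the victim at the spoofed address.
    query_bytes = 60        # typical small DNS query over UDP
    response_bytes = 3000   # large ANY/DNSSEC-style response
    print(f"amplification factor ~{response_bytes / query_bytes:.0f}x")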


You're right. I hadn't had my coffee yet and the asymmetric cost reminded me of amplification attacks. I'm still surprised this attack wasn't foreseen though. It just doesn't seem all that clever or original.


I was surprised too, but if you look at the timelines then RST_STREAM seems to have been present in early versions of SPDY, and SPDY seems mostly to have been designed around 2009. Attacks like Slowloris were coming out at about the same time, but they weren't well-known.

On the other hand, SYN cookies were introduced in 1996, so there's definitely some historic precedent for attacks in the (victim pays Y, attacker pays X, X<<Y) class.


If you are working on the successor protocol of HTTP/1.1, and are not aware of Slowloris the moment it hits and every serious httpd implementation out there gets patched to mitigate it, I'd argue you are in the wrong line of work.


While I agree in principle, Slowloris is a very different attack than this one.


> I'm similarly similarly surprised it took this long for this attack to surface

As with most things like this, probably many hundreds of unimportant people saw it and tried it out.

Trying to do it on Google, with a serious effort, that's the wacky part.


> Trying to do it on Google, with a serious effort, that's the wacky part

If I were the FBI, I'd be looking at people with recently bought Google puts expiring soon. I can't imagine anyone taking a swing at Google infra "for the lulz". Also in contention: nation-states doing a practice run.


That's because you don't think like a 16-year-old.

This is exactly the kind of thing that a smart kid who's still just a foolish high school student would do. I wouldn't be surprised if this attack already exists in the wild; it's not hard to write.

Also, the subsequent attacks were less effective; that's exactly what some kid would do.

You don't even need an expensive botnet. A rich kid whose parents live in a neighborhood with residential fiber, plus a bunch of friends, could probably coordinate it through a Discord server.

Most of us really don't interact with teenagers regularly so we forget they're out there (they also tend to dislike adults so they make themselves especially invisible around us). When it comes to things like this, that's my first assumption until further evidence.

Merely teenaged graffiti for the digital age.


Google infra is attacked with large scale DDoS type attacks literally multiple times a day. They’re usually a nothingburger.


Google options with near expiries have 100s of thousands of contracts of open interest[1]. Unless you found the person some other way (and then could prove that they had also gone long a short-dated put to try to profit) there's literally no way you find anything interesting by doing that.

[1] Add up the "Open int" numbers here https://www.nasdaq.com/market-activity/stocks/goog/option-ch...


Google is rarely the target of intentional attacks. Their cloud customers are the intended victims.


So we needed HTTP2 to deliver ads, trackers and bloated frontend frameworks faster. And now it delivers attacks faster too.


HTTP/2 makes the browsing experience of high-latency connections a lot more tolerable. It also makes loading web pages in general faster.

Luckily, HTTP/1.1 still works. You can always enable it in your browser configuration and in your web servers if you don't like the protocol.


> HTTP/2 makes the browsing experience of high-latency connections a lot more tolerable. It also makes loading web pages in general faster.

HTTP/3 does that quite a bit better in my experience (lots of train rides with spotty onboard Wi-Fi), though. HTTP/2 is still affected by TCP head-of-line blocking: a single packet loss can stall all other streams, even if the lost packet didn't hold data for them.


Are you suggesting that we didn't need HTTP2? What's the real alternative here?


In some alternative history there would have been a push to make HTTP/1.1 pipelining work, trim the fat from bloated websites (loading cookie consent banners from a 3rd-party domain is a travesty on several levels), and maybe use WebSockets for tiny API requests. Plus the prioritization attributes on various resources. Then shoveling everything over ~2 TCP connections would have done the job?


Personally, as a website visitor and occasional author, I don’t want the performance to be good enough to ‘do the job’. I want it to be as fast as possible. I want it to be instant. For that we need unbloated websites and better protocols. It’s not a competition.

After all, you don’t need bloat to suffer from head-of-line blocking. You just need a few images.

(Though, personally I’m a much bigger fan of HTTP/3 than HTTP/2. With a more principled solution to head-of-line blocking and proper 0-RTT, HTTP/3 makes a stronger case for why we need a new protocol than HTTP/2 did. I don’t know why HTTP/2 had to exist at all, really, when QUIC already existed by the time HTTP/2 was being standardized. Oh well.)


> It’s not a competition.

But it is in the context of the three-way tradeoff we're talking about here: complexity of the site vs. load time vs. protocol complexity.

> You just need a few images.

On the HTTP level those can be deferred until after the HTML/styles/JS, at which point you already have the content. What on your site would be "blocked" at that point? It's just images holding each other up.

On the TCP level, SACK and F-RTO should resolve most instances of HOL blocking after 1 RTT. It's not perfect, but I suspect a lot of people experience "slowness" not because the underlying protocols are bad but because they're on old implementations, or because they're on networks with bufferbloat. Upgrade those and we don't need these complex workarounds.

As for HTTP/3... it's a mixed bag. The basic idea is great. The execution is another Googleism. They didn't have the patience to get it into OSes, so now every client has to implement its own network stack, which multiplies the things that need patching if something goes wrong. And it runs over UDP instead of being a different transport at the IP level like SCTP. And TLS is a good default, but the whole CA thing shouldn't have been mandatory. And header compression also seems like a cure for a disease of their own making; compare with the number of headers you needed for HTTP/1.0.


What incentive would most businesses have to do what you're describing?

It is _much_ faster, cheaper, and easier to build a bloated website than an optimized one. Similarly, it is much easier to enable HTTP2 than it is to fix the root of the problem.

I'm not saying that it's right -- anyone without a fast connection or who cares about their privacy isn't getting a great deal here.


Most businesses are not in a position to push through a new network protocol for the entire planet! So if we lived in a world with fewer monopolies then protocols might have evolved more incrementally. Though we'd presumably still have gotten something like BBR because congestion algorithms can be implemented unilaterally.


What incentive do most businesses have to make your checkout process smooth, have automatic doors, or provide shopping carts? Simple: customers like the easiest business to shop at.


Even for leaner websites, HTTP/2 was always going to be an improvement, for HTTP head-of-line blocking and better header compression, if nothing else. These are orthogonal issues for the most part.

Also, they tried prioritization, but it was too unwieldy in practice, the browser vendors didn't agree, and it was deprecated in the latest RFC 9113.


Loading cookie consent banners from a 3rd-party domain is probably a GDPR violation because it transmits user information to a 3rd party without consent.


SCTP (Stream Control Transmission Protocol) or the equivalent. HTTP is really the wrong layer for things like bonding multiple connections, congestion adjustments, etc.

Unfortunately, most computers only pass TCP and UDP (Windows and middleboxes). So, protocol evolution is a dead end.

Thus you have to piggyback on what computers will let through--so you're stuck with creating an HTTP flavor of TCP.


QUIC (the basis for HTTP/3) is basically the spiritual successor to SCTP, except with TLS baked in, so compared with SCTP+DTLS, connection establishment requires significantly fewer roundtrips (0 round trips for session resumption, 1 roundtrip at worst, compared to 4 or so for DTLS).


Nothing in their comment claims that, there's no need to bring absurd strawmen into the discussion.


I'm just trying to figure out what the parent was trying to communicate, since the comment by itself didn't add much to the discussion.

It seems that they're more upset by the state of web development today than they are by HTTP2 or anything else that this thread pertains to.


The comment lists three negative things as "the reason we needed HTTP/2". I don't even see how you could read it other than implying that HTTP/2 was not actually necessary.


It does strongly imply it. HTTP2 is needed for more than just the bloatware of the modern internet.


Another reason to keep foundational protocols small. HTTP/2 has been around for more than a decade (including SPDY), and this is the first time this attack type has surfaced. I wonder what surprises HTTP/3 and QUIC hide...


DNS is a small protocol and is abused by DDoS actors worldwide for relay attacks.


DNS is from 1983, give it some slack


The point I'm trying to make is that "small" protocols aren't less likely to be DDoS vectors.

Avoiding designing in DDoS relay/amplification vectors requires luck or intention, not just making the protocol small.


Small, less complex protocols are inherently less likely to be insecure all things being equal, simply due to reduced attack surface.

DNS was created for a different environment, at a time when security wasn't at the forefront, so it's not a good example of the opposite.


This is such a strong claim that I'd really appreciate something other than "smaller is better".

Abuse and abuse vectors vary wildly in complexity, and some complexity is certainly required precisely to avoid dumb bottlenecks, if not vulnerabilities. So based on what are you saying that something simple will inherently resist abuse better?


> Small, less complex protocols are inherently less likely to be insecure all things being equal, simply due to reduced attack surface.

That feels intuitive in the "less code means fewer bugs means fewer security issues" sense, but it implies that "secure" and "can't be abused" are the same thing.

Related? Sure. Same? No.

Oddly enough, we probably could have prevented the replay/amplification dos attacks that use DNS by making DNS more complex / adding mutual authentication so it's not possible for A to request something that is then sent to B.


We could have prevented the replay/amplification dos attacks that use DNS by making DNS use TCP.

In practice though the only way to "fix" DNS that would've worked in the 80s would've probably been to require the request be padded to larger than the response...


But TCP is way more complex


... yeah? I know? "In practice though the only way to "fix" DNS that would've worked in the 80s would've probably been to require the request be padded to larger than the response..."

It's not as complex as some "mutual authentication" scheme though lmao


I'm also from 1983 and I haven't been DDoSed


DNS is an enormous protocol, almost unmeasurably large.


That's a bit overblown. There's a lot there and some of it conflicts with itself but it's not unmeasurably large by any means. It's a knowable protocol (and yes, I'm aware of the camel meme[1]).

1. https://powerdns.org/dns-camel/


Quiz: which RFCs do you need to know and implement to implement DNS?


QUIC didn't account for amplification attacks in its design and the people complaining about it were initially dismissed.


HTTP/2 is pretty small.


“Cancelation” should really be added to the “hard CS problems” list.

Like the others on that list (off by one, cache invalidation etc) it isn’t actually hard-hard, but rather underestimated and overlooked.

I think if we took half the time we spend on creation, constructors, initialization, and spent that design time thinking about destruction, cleanup, teardown, cancelation etc, we’d have a lot fewer bugs, in particular resource exhaustion bugs.


I really like Rust's async for its ability to immediately cancel Futures, the entire call stack together, at any await point, without needing cooperation from individual calls.


How is that possible if e.g. an external SQL server needs to be told that the operation should be canceled?


I know that's true of C libraries. POSIX thread cancelation is one of those things where its mere existence pervades everything in its implications.


I would like to remind everyone that Google invented HTTP/2.

Now they are spinning us a yarn about how they are heroically saving us from this problem, without mentioning that they created it.

The nerve of these tech companies! Microsoft has been doing this for decades, too.


They tried to solve problems that didn't exist.


Can anyone explain what's novel about this attack that isn't just a plain old request flood?


It depends on what you think a "request flood" attack is.

With HTTP/1.1 you could send one request per RTT [0]. With HTTP/2 multiplexing you could send 100 requests per RTT. With this attack you can send an indefinite number of requests per RTT.

I'd hope the diagram in this article (disclaimer: I'm a co-author) shows the difference, but maybe you mean yet another form of attack than the above?

[0] Modulo HTTP/1.1 pipelining which can cut out one RTT component, but basically no real clients use HTTP/1.1 pipelining, so its use would be a very crisp signal that it's abusive traffic.


I think for this audience a good clarification is:

* HTTP/1.1: 1 request per RTT per connection

* HTTP/2 multiplexing: 100 requests per RTT per connection

* HTTP/2 rapid reset: indefinite requests per connection

In each case attackers are grinding down a performance limitation they had with previous generations of the attack over HTTP. It is a request flood; the thing people need to keep in mind is that HTTP made these floods annoying to generate.
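
A rough back-of-the-envelope comparison, where every number (RTT, stream limit, per-request frame size, link speed) is an illustrative assumption rather than a measurement:

    rtt_s = 0.05            # assume a 50 ms round trip
    max_streams = 100       # common SETTINGS_MAX_CONCURRENT_STREAMS value
    pair_bytes = 100        # assume ~100 bytes for a HEADERS + RST_STREAM pair
    link_Bps = 1e9 / 8      # assume a 1 Gbit/s uplink

    http11      = 1 / rtt_s               # ~20 requests/s per connection
    http2_mux   = max_streams / rtt_s     # ~2,000 requests/s per connection
    rapid_reset = link_Bps / pair_bytes   # ~1.25M requests/s, bandwidth-bound

    print(int(http11), int(http2_mux), int(rapid_reset))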


What does RTT stand for? My gut says "Round Trip (something)" but I could certainly be wrong.


The HTTP spec defines it as Round Trip Time, but in this little discussion thread, "round-trip transaction" would be a better fit.

https://developer.mozilla.org/en-US/docs/Glossary/Round_Trip...


I wonder why exactly this attack can't be pulled off with HTTP/1.1 and TCP RST for cancellation. It seems that (even with SYN cookies involved) an attacker could create new connections, send an HTTP request, then quickly follow up with an RST.

Is it just that the kernel doesn't really communicate the TCP RST all that well to the application, so the HTTP server continues to count the connection against the "open connection limit" even though it isn't open anymore?


The server kernel won't communicate the new connection to the application until you go through SYN-SYNACK-ACK.


The problem for the attacker is that they then run into resource limits on the TCP connections. The resets are essential so that the consumption doesn't count against those limits.


Indeed, that's a crucial clarification. Thanks.


In the style of the above clarification, how would you describe HTTP/3 in "x requests (per y)? per connection"?


What happens if you send more? Does it just get ignored by the server?


For most current HTTP/2 implementations it'll just be ignored, and that is a problem. We've seen versions of the attack doing just that, as covered in the variants section of the article.

Servers should switch to closing the connection if clients exceed the stream limit too often, not just ignoring the bogus streams.
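
A minimal sketch of what that per-connection bookkeeping might look like; the threshold and window are made-up illustrative numbers, not values recommended in the article:

    import time

    class RstStreamTracker:
        """Signal that a connection should be closed (e.g. with GOAWAY)
        if the peer cancels too many streams in a short window."""

        def __init__(self, max_resets: int = 200, window_s: float = 1.0):
            self.max_resets = max_resets
            self.window_s = window_s
            self.count = 0
            self.window_start = time.monotonic()

        def on_rst_stream(self) -> bool:
            now = time.monotonic()
            if now - self.window_start > self.window_s:
                self.count = 0
                self.window_start = now
            self.count += 1
            return self.count > self.max_resets  # True => tear down connection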


By request flood I mean just that: sending an insanely high number of requests per unit of time (per second) to the target server to exhaust its resources.

You're right, with HTTP/1.1 we have a single request in flight (or none, in keep-alive state) at any moment. But that doesn't limit the number of simultaneous connections from a single IP address. An attacker could use the whole TCP port space to create 65535 (theoretically) connections to the server and send requests over them in parallel. That is a lot, too. In the pre-HTTP/2 era this could be mitigated by limiting the number of connections per IP address.

With HTTP/2, however, we can have multiple parallel connections with multiple parallel requests at any moment, which is orders of magnitude more than what was possible with HTTP/1.x. But the preceding mitigation could still be applied to the number of requests across all connections per IP address.

I guess this was overlooked in the implementations or in the protocol itself? Or is it more difficult to apply restrictions because the L7 multiplexing happens entirely in userspace?

Added: The diagram in the article (the "HTTP/2 Rapid Reset attack" figure) doesn't really explain why this is an attack. In my thinking, as soon as a request is reset, the server's resources should be freed, so they wouldn't be exhausted. I think this should be possible in modern async servers.


> But that doesn't limit number of simultaneous connections from a single IP address.

Opening new connections is relatively expensive compared to sending data on an existing connection.

> In my thinking, as soon as the request is reset, the server resources are expected to be freed,

You can't claw back the CPU resources that have already been spent on processing the request before it was cancelled.

> By request flood I mean, request flood, as in sending insanely high number of requests per unit of time (second) to the target server to cause exhaustion of its resources.

Right. And how do you send an insanely high number of requests? What if you could send more?

Imagine the largest attack you could do by "sending an insanely high number of requests" with HTTP/1.1 with a given set of machine and network resources. With H/2 multiplexing you could do 100x that. With this attack, another 10x on top of that.


> An attacker could use the whole port space of TCP to create 65535 (theoretically) connections to the server and to send requests to them in parallel.

This is harder for the client than it is for the server. As a server, it's kind of not great that I'm wasting 64k of my connections on one client, but it's harder for you to make them than it is for me to receive them, so not a huge deal with today's servers.

On this attack, I think the problem arises if you've got a reverse proxy h2 frontend and you don't limit backend connections because you were limiting frontend requests. It sounds like HAProxy won't start a new backend request until the number of pending backend requests is under the session limit, but Google's server must not have been limiting based on that. So: cancel the frontend request, try to cancel the backend request, but before you confirm the backend request is canceled, start another one. (Plus what the sibling mentioned... the backend may spend a lot of resources handling requests that will be canceled immediately.)


You're wrong about that. It's hard to make 65k new connections on your average client OS, but a packet generator has no problem with it.


The new technique avoids the per-client limit on the number of requests per second that the attacker can get the server to process. By sending both requests and stream resets within a single connection, the attacker can send more requests per connection/client than used to be possible, so the attack is perhaps cheaper and/or more difficult to stop.


Is it a fundamental HTTP/2 protocol issue or an implementation issue? Could this be an issue at all if a server has strict limits on requests per IP address, regardless of the number of connections?


Implementation issue. Some implementations are immune.



The blog header popping up constantly makes the page unreadable.



Thanks for sharing Kill Sticky!


Ha, it works great! I like it, thank you!


Not sure about chrome but in Firefox there's a button for "reader view" on many sites which works great for cutting out UI crap like that.


Good point, that also works. For some reason I never remember to use it.


Microsoft dropped its patch details here:

https://github.com/dotnet/runtime/issues/93303


Wouldn’t this same attack apply to QUIC (and HTTP/3)?


It doesn't apply to HTTP/3 because the receiver has to extend the stream concurrency maximum before the sender can open a new stream. This attack works because the sender doesn't have to wait for that after sending a reset in HTTP/2.


But the max is still ~100 streams... and you can open 100 streams with a single UDP packet using 0-RTT connections.

I can send ~1 million UDP packets per second from one machine, so that's 100 million HTTP requests per second you have to deal with. And when I bring in my 20,000 friends, you need to deal with 2 trillion requests per second.

I'd say that's still a problem.


Ok, but it's not the same problem, which was the question asked.


I'm still missing something. Can't you close a QUIC stream and open another one within the same UDP packet?


You can do it a few times, but you can't do it 500 times. For HTTP/3, the highest permitted stream ID is an explicit state variable communicated by the server to the client, eventually forcing a round-trip. That's different from HTTP/2 where the client is entitled to assume that new "stream id window space" (for the lack of a better term) opens up immediately after a stream is closed.

(I'm fudging things a bit. You can probably build attacks that look kind of similar, but we don't think you could build anything that is actually scalable. But we could be wrong about that! Hence the recommendation to apply similar mitigations to HTTP/3 as well, even if it isn't immediately vulnerable.)
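
A toy model of that difference (the numbers are illustrative): in HTTP/3 the client can only open streams up to a cumulative limit the server has explicitly advertised, so cancelling a stream doesn't by itself make room for a new one the way closing a stream does in HTTP/2.

    class H3StreamCredit:
        def __init__(self, initial_max: int = 100):
            self.max_streams = initial_max  # cumulative streams allowed so far
            self.opened = 0                 # cumulative streams opened

        def try_open(self) -> bool:
            if self.opened >= self.max_streams:
                return False                # must wait for a MAX_STREAMS frame
            self.opened += 1
            return True

        def on_max_streams_frame(self, new_max: int) -> None:
            # The server raises the cumulative limit at its own pace,
            # which eventually forces the client into a round trip.
            self.max_streams = max(self.max_streams, new_max)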


Thanks for the explanation. Makes sense.


I got out of web proxy management a while back and haven't had to delve into HTTP2 or HTTP3.

It seems HTTP2 is TCP on TCP for HTTP messages specifically. This must be why HTTP3 is over a UDP based protocol.


HTTP2 is not TCP on TCP (that's a very basic recipe for a complete disaster, the moment any congestion kicks in); it's mostly just multiplexing concurrent HTTP requests over a single TCP connection.

HTTP3 is using UDP for different reasons, although it effectively re-implements TCP from the application's point of view (it's still HTTP under the hood, after all). Basically, with plain old TCP your bandwidth is limited by latency, because every transmitted segment has to be acknowledged sequentially. Some industries/applications (like transferring raw video files over the pond) have been using specialized, UDP-based transfer protocols for a while for this reason: you only need to re-transmit the frames you know didn't make it, in any order that suits you.


TCP's stream nature causes multiplexing to bump into head of line blocking, basically.


HTTP on SCTP on UDP. If only protocols didn't ossify.


Isn't this trivially mitigated by throttling?

And the throttling scheme seems simple enough: give each IP address an initial allowance of A requests, then increase the allowance every T up to a maximum of B. Perhaps A=B=10, T=150ms.
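
That scheme is essentially a per-IP token bucket; a minimal sketch using the example values above (A=B=10, T=150ms) as defaults:

    import time

    class TokenBucket:
        def __init__(self, initial: float = 10, cap: float = 10,
                     refill_interval_s: float = 0.15):
            self.tokens = initial
            self.cap = cap
            self.refill_interval_s = refill_interval_s
            self.last = time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            self.tokens = min(self.cap, self.tokens +
                              (now - self.last) / self.refill_interval_s)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False    # over the allowance: throttle this request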


The whole point of a 'D'DoS is that there are numerous compromised IP addresses, which only need to make maybe one connection each.

You can't simply blacklist weird connections entirely, since legitimate clients can use those features.


The whole point of this attack is to be able to make a lot of requests for each IP address.

If you are making one or few requests per IP you don't need this attack, and also aren't likely to have any effect on a Google-sized entity.


It is a little more complicated because a request is a few layers deep. In HTTP/2 you open a connection, start a stream, then send a request over that stream.

Are you tracking per connection? Per stream? Isn't it normal for multiple requests to happen quite quickly? I load a single page with 50 external assets and those get multiplexed over the current connection - is that okay? Is that abusive? Another stream is handling a video player and it's requesting (HTTP/2) frames of video data - too much? Too fast?



This is a protocol bug, hence it affects pretty much any implementation.



