
QUIC as a solution to protocol ossification


By Jonathan Corbet
January 29, 2018
linux.conf.au
The TCP protocol has become so ubiquitous that, to many people, the terms "TCP/IP" and "networking" are nearly synonymous. The fact that introducing new protocols (or even modifying existing protocols) has become nearly impossible tends to reinforce that situation. That is not stopping people from trying, though. At linux.conf.au 2018, Jana Iyengar, a developer at Google, discussed the current state of the QUIC protocol which, he said, is now used for about 7% of the traffic on the Internet as a whole.

QUIC ("quick UDP Internet connection") is, for now, intended for situations where the HTTP transport protocol is used over TCP. It has been under development for several years (LWN first looked at it in 2013), and was first deployed at Google in 2014. The main use for QUIC now is to move data between Google services and either the Chrome browser or various mobile apps. Using QUIC causes a 15-18% drop in rebuffering in YouTube and a 3.6-8% drop in Google search latency, Iyengar said. Getting that kind of improvement out of applications that have already been aggressively optimized is "somewhat absurd".

Use of QUIC increased slowly during 2015 before suddenly dropping to zero in December. It seems that somebody found a bug that could result in some requests being transmitted unencrypted, so QUIC was shut down until the issue could be fixed. In August 2016, usage abruptly doubled when QUIC was enabled in the YouTube app on phones. If anybody ever doubted that mobile is the future of computing, he said, this should put that doubt to rest. All told, 35% of Google's outbound traffic is carried over QUIC now.

The standard network stack, as used for the world-wide web, employs HTTP on top of the TLS cryptographic layer which, in turn, sits on top of TCP. QUIC replaces those components with a new protocol based on the UDP datagram protocol. From that base, QUIC builds a reliable connection-oriented protocol, complete with TCP-like congestion-control features. There is support for both encryption and HTTP within QUIC; it can combine the cryptographic and HTTP handshakes into a single packet.
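What QUIC gets from UDP is only a bare datagram service; everything above that (streams, acknowledgments, retransmission, congestion control, crypto) lives in QUIC itself, in user space. A minimal Python sketch of that foundation, using only the standard socket API (the payload and addresses are illustrative):

```python
import socket

# UDP gives QUIC only a datagram service: no handshake, no ordering,
# no retransmission. Everything above (streams, ACKs, congestion
# control, crypto) is implemented in user space by QUIC itself.

def make_udp_pair():
    """Create a sender/receiver socket pair on the loopback interface."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))          # kernel picks a free port
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    return tx, rx, rx.getsockname()

tx, rx, addr = make_udp_pair()
tx.sendto(b"quic-payload", addr)       # one datagram out...
data, peer = rx.recvfrom(2048)         # ...one datagram in, boundaries kept
print(data)
```

Because the kernel sees nothing but opaque UDP datagrams, a QUIC stack built on top of this can be shipped and updated with the application rather than with the operating system.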

Thus far, both development and deployment of QUIC have been done primarily by Google. An IETF working group was formed to standardize the protocol in 2016, though. Among other things, standardization will replace the current QUIC cryptographic layer with one based on TLS 1.3 which, Iyengar said, took a number of its ideas from the current QUIC implementation.

Accelerating HTTP

A typical web page has a long list of objects (HTML, CSS, images, etc.) that must be loaded from the server. The HTTP/1.x protocol only allows for a single object to be transmitted at a time; that can be a problem when a large object, which takes a long time to transmit, blocks the transmission of many other objects. This problem, referred to as "head-of-line blocking", increases the time it takes to present a usable web page to the reader. Implementations using HTTP/1.x tend to work around head-of-line blocking by establishing multiple connections in parallel, which has its own problems. Those connections are relatively expensive, compete with each other, and cannot be managed together by congestion-control algorithms and the like.
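The latency cost of head-of-line blocking can be seen in a toy model (the object names and transfer times below are invented for illustration): on a single HTTP/1.x connection a large object delays everything queued behind it, while the multiple-connection workaround lets small objects finish after only their own transfer time.

```python
# Toy model of head-of-line blocking on a single HTTP/1.x connection.
# Transfer costs are in arbitrary time units; all values are invented.
objects = {"big-image": 100, "style.css": 2, "app.js": 3, "logo.png": 1}

def finish_times_serial(objs):
    """One connection: each object waits for everything queued before it."""
    t, done = 0, {}
    for name, cost in objs.items():
        t += cost
        done[name] = t
    return done

serial = finish_times_serial(objects)
# style.css is tiny but finishes at t=102 because big-image blocks it.
print(serial["style.css"])   # 102

# With one connection per object (the HTTP/1.x workaround), each small
# object finishes after only its own transfer time.
parallel = {name: cost for name, cost in objects.items()}
print(parallel["style.css"]) # 2
```

The workaround wins on this toy metric, but as the article notes, those parallel connections are expensive and cannot be congestion-controlled as a unit.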

HTTP/2 was designed to address this problem using multiple "streams" built into a single connection. Multiple objects can be sent over a stream in parallel by multiplexing them into the connection. That helps, but it creates a new problem: the loss of a single packet will stall transmission of all of the streams at once, creating new latency issues. This variant on the head-of-line-blocking problem is built into TCP itself and cannot be fixed with more tweaks at the HTTP level.
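The TCP-level variant can be modeled the same way (again with invented segment numbers and stream names): TCP must deliver bytes in order, so one lost segment stalls delivery for every multiplexed stream, while per-stream loss recovery, as in QUIC, stalls only the stream that actually lost data.

```python
# Toy model: three HTTP/2 streams (A, B, C) multiplexed onto one TCP
# byte stream. Segment 2 (stream B's data) is lost in transit.
segments = [(1, "A"), (2, "B"), (3, "C"), (4, "A"), (5, "B")]
lost = {2}

def tcp_deliverable(segs, lost_seqs):
    """In-order delivery: nothing past the first gap reaches the app."""
    delivered = []
    for seq, stream in segs:
        if seq in lost_seqs:
            break                     # gap: all later segments wait
        delivered.append((seq, stream))
    return delivered

print(tcp_deliverable(segments, lost))      # [(1, 'A')]

# Per-stream recovery (QUIC-style): only the stream that lost data
# is blocked; streams A and C keep delivering.
def per_stream_deliverable(segs, lost_seqs):
    blocked = {stream for seq, stream in segs if seq in lost_seqs}
    return [(seq, s) for seq, s in segs
            if seq not in lost_seqs and s not in blocked]

print(per_stream_deliverable(segments, lost))  # [(1, 'A'), (3, 'C'), (4, 'A')]
```

This is the sense in which the problem "is built into TCP itself": the in-order byte-stream abstraction has no way to express that a gap only matters to one of the streams sharing the connection.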

TCP suffers other problems as well. Its connection setup latency, involving a three-way handshake, is relatively high. Latency is a critical part of a user's experience with a web service, and the setup latency in TCP can be a significant part of that latency. Middleboxes (routers between the endpoints of a connection) interfere with traffic and make it difficult to improve the protocol. They aren't supposed to be looking at TCP headers, but they do so anyway and make decisions based on what they see, often blocking traffic that looks in any way out of the norm. This "ossification" of the protocol makes it nearly impossible to make changes to TCP itself. For example, TCP fast open has been available in the Linux kernel (and others) for years, but still is not really deployed because middleboxes will not allow it.

QUIC tries to resolve a number of these issues. The first time two machines talk over QUIC, a single round-trip is enough to establish the connection. For subsequent connections, cached information can be used to reduce that number to zero; the connection packet can be followed immediately by the request itself. HTTP streams map directly onto streams implemented in QUIC; a packet loss in one stream will not impact the others. The end result is the elimination of many sources of latency in typical interactions over the net.
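The handshake savings can be expressed as simple round-trip arithmetic. This is an idealized accounting (real handshakes vary; TLS 1.3 and TCP Fast Open both change the TCP-side numbers), but it shows why eliminating setup round trips matters most on high-latency links:

```python
# Idealized round trips needed before the first request byte can be
# sent. These counts are a simplification for illustration.
SETUPS = {
    "TCP + TLS 1.2 (first visit)": 1 + 2,  # TCP handshake + 2-RTT TLS
    "QUIC (first visit)":          1,      # combined transport+crypto
    "QUIC (repeat visit)":         0,      # cached info: 0-RTT resume
}

def setup_latency_ms(setup_rtts, rtt_ms):
    """Time spent on connection setup alone, for a given round-trip time."""
    return setup_rtts * rtt_ms

# On a 200ms mobile RTT, the handshake difference dominates:
# 600ms of setup shrinks to 200ms or, on repeat visits, to zero.
for name, rtts in SETUPS.items():
    print(f"{name}: {setup_latency_ms(rtts, 200)} ms of setup")
```

The same arithmetic explains the geographic skew in the reported search-latency numbers: the slower the network, the larger the share of total latency that handshakes represent.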

Requirements, metrics, and implementations

The QUIC developers set out to create a protocol that was both deployable and evolvable. That dictated the use of UDP, which is able to get through middleboxes with a minimum of interference. UDP also facilitates the creation of a user-space implementation, which was also desired. (Iyengar didn't say this, but one reason to want such an implementation is to get the protocol deployed and updated quickly; many systems out there rarely receive kernel updates.) Low-latency connection establishment was a requirement, as was stream multiplexing. Beyond that, there was a desire for more flexible congestion control. This sort of work can be (and has been) done in the Linux kernel, but the bar for inclusion there is high. The QUIC developers wanted to be able to experiment with various algorithms and see how they worked.

One other important requirement was resilience to "NAT rebinding". Most connections onto the Internet go through a network-address translation (NAT) box that hides the original request and port information. For TCP connections, the NAT box can see the SYN and FIN packets and know when a particular binding can be taken down. UDP itself has no "connection" concept, so NAT boxes carrying UDP traffic cannot associate it with a connection created by a higher-level protocol like QUIC. They thus have no indication of when a connection is no longer in use and instead have to rely on timers to decide when to tear down a specific port binding. As a result, UDP port bindings can be taken down while the QUIC connection using them is still active. The next UDP packet associated with that connection will cause a new binding to be established; that will cause the traffic to suddenly appear to be coming from a different port. QUIC packets must thus include the information needed to detect and handle such rebindings.
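This is why QUIC identifies connections with an explicit connection ID rather than the UDP address/port 4-tuple. A schematic sketch of the idea (heavily simplified; real QUIC connection IDs and path-migration logic are far more involved):

```python
# Schematic demultiplexing by connection ID, not by (address, port).
# A NAT rebinding changes the apparent source port mid-connection;
# keying state on the connection ID lets the server keep the session.
connections = {}

def handle_packet(conn_id, src_addr, payload):
    """Look up the connection by its ID and track the current peer address."""
    conn = connections.setdefault(conn_id, {"peer": src_addr, "data": []})
    if conn["peer"] != src_addr:
        # NAT rebinding detected: same connection, new apparent port.
        conn["peer"] = src_addr
    conn["data"].append(payload)
    return conn

handle_packet(0xC0FFEE, ("203.0.113.5", 40001), b"hello")
# The NAT times out the binding; the next packet arrives from a new port.
conn = handle_packet(0xC0FFEE, ("203.0.113.5", 51876), b"world")
print(conn["data"])      # the session survived the rebinding
print(len(connections))  # still one connection, not two
```

Had the state been keyed on `(src_addr, port)`, the second packet would have created a spurious new connection and orphaned the first.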

A member of the audience asked why QUIC was implemented over UDP rather than directly on top of IP. Iyengar pointed to the SCTP protocol as an example of the problem with new IP-based protocols: it comes down to the middleboxes again. SCTP has been around for years, but middleboxes still do not recognize it and tend to block it. As a result, SCTP cannot be reliably used on the net. Actually deploying a new IP-based protocol, he said, is simply impossible on today's Internet. Additionally, working on top of UDP makes a user-space implementation easier.

As noted above, deployment of QUIC has led to significant improvements in performance for Google services. The significant drop in search latency is mainly a result of eliminating round trips during connection setup. As a result, it tends to show the biggest improvement for users on slow networks. A search done in South Korea is likely to show a 1.3% improvement in latency, but in India that improvement is over 13%. Iyengar said that people measurably spend more time watching more videos when they are doing so over QUIC; that was presented as a good thing.

One key feature of QUIC is that the transport headers — buried inside the UDP packets — are encrypted. Beyond the obvious privacy benefits, encryption prevents ossification of the protocol by middleboxes, which can't make routing decisions based on information they can't understand. A few things have to be transmitted in clear text, though; a connection ID is required, for example, to find the key needed to decrypt the rest. The first byte of the clear data was a flags field which, he said, was promptly ossified by a middlebox vendor, leading to packets being dropped when a new flag was set.

That was a classic example of why changing network protocols is hard and what needs to be done to improve the situation. Middleboxes are the control points for the Internet as we know it now. The only defense against ossification of network protocols by middleboxes, he said at the conclusion of the talk, is encryption.

There were some questions from the audience regarding implementations. Most of them are still works in progress, he said. The quic-go implementation is coming along. There are implementations being done by Apple and Microsoft, and a certain amount of interoperability testing has been done on those. When asked about an open-source reference implementation in particular, Iyengar pointed to the Chromium browser, which is open source. Other implementations exist, but everybody is waiting for the IETF process to finish.

The video of this talk is available.

[Your editor would like to thank the Linux Foundation and linux.conf.au for assisting with his travel to this event.]




Also UDT for similar reasons

Posted Jan 29, 2018 17:57 UTC (Mon) by hisdad (subscriber, #5375) [Link]

http://udt.sourceforge.net/

This is more for raw speed for large data sets over long (high RTT × bandwidth) links.

Same answer though, run it on UDP.
--Dad

QUIC as a solution to protocol ossification

Posted Jan 29, 2018 19:49 UTC (Mon) by sorokin (guest, #88478) [Link]

> The only defense against ossification of network protocols by middleboxes, he said at the conclusion of the talk, is encryption.

Why have the developers of these middleboxes done their work so poorly? I mean, why don't middleboxes just pass through all the packets they don't understand? Has ossification only become apparent recently?

Can the developers of the affected middleboxes be contacted and asked for a fix? They must be real people and real companies. Surely the problem won't be fixed overnight, but in a few years, when the hardware is replaced, we can get a de-ossified Internet, right?

(I know very little about how the network-device industry works. I see that people at Google have chosen to work around the problem instead of fixing it. Is it really so difficult to fix? Yes, people behind old middleboxes won't be able to use new and fast protocols, but over time those middleboxes will be phased out, right?)

QUIC as a solution to protocol ossification

Posted Jan 29, 2018 20:28 UTC (Mon) by davecb (subscriber, #1574) [Link]

Many of the middleboxes need to know network details, so as to do, for example, firewalling or deep packet inspection. Others need to know a moderate amount, for example, to do NAT.

The fact that some mess up horribly on well-defined fields, on the other hand, suggests bad implementation work. See https://inl.info.ucl.ac.be/system/files/hotmb11-hesmans.pdf for some illustrative examples (:-))

QUIC as a solution to protocol ossification

Posted Feb 1, 2018 14:06 UTC (Thu) by jgknight (guest, #108323) [Link]

Definitely for security appliances. Inspecting packet headers, checksums, options/flags, etc. is one of the many steps these devices take to determine whether something is legitimate traffic or attack/bad traffic.

Unfortunately, as protocols evolve and add options, and as security-device software is updated, the businesses/ISPs running these boxes tend not to upgrade until absolutely necessary. So sometimes we middlebox vendors are fixing/updating our software to allow new protocols or options, but the businesses/ISPs aren't upgrading.

QUIC as a solution to protocol ossification

Posted Jan 29, 2018 23:04 UTC (Mon) by josh (subscriber, #17465) [Link]

> I mean, why don't middleboxes just pass through all the packets they don't understand?

Some people building middleboxes, especially boxes that break security by design, believe that "fail closed" includes "throw away anything you don't understand". The correct answer in that case isn't "fix the middleboxes", it's "throw away the middleboxes".

Also, some boxes do NAT and similar *really really badly*.

QUIC as a solution to protocol ossification

Posted Jan 30, 2018 4:46 UTC (Tue) by rixed (guest, #120242) [Link]

Many of those problematic middleboxes do firewalling or some other kind of service that relies on inspecting the packets. This business plan is threatened by encryption. Sabotaging any encrypted traffic cannot, therefore, be imputed entirely to incompetence.

My guess is that all those vendors will become much friendlier once there is a standard method in place to reliably spy on QUIC. The shift to TLS might be a first step in this direction.

QUIC as a solution to protocol ossification

Posted Jan 30, 2018 9:31 UTC (Tue) by nim-nim (subscriber, #34454) [Link]

QUIC is a Google solution and Google is very keen on dumb middleware where everything is controlled by endpoints. Middleware is anything interposed between Google cloud services and Google client software, be it networking equipment, operating systems, any security or control framework. When one endpoint is Google cloud, and the other Google Chrome, or Google Android, and everything else has no control of what happens, that effectively means Google is in control. It's usually sold to the general public as putting some control in the hands of end users. Some control being the power to nullify other forms of power and put Google in the driving seat.

The "ossification" Google harps on is someone else exercising some form of non-Google control, usually by virtue of having bought or being the owner or contractor of the network or computing gear, deploying something designed to be controlled by its owner, or applying standards with the same objective.

It takes a lot more time for many actors to agree on something than for Google to agree with itself. And the Internet is much more vast and complex than when most of those standards were first defined. And giants like Google have quietly let rot, if not sabotaged, the standards they didn't like by botching their implementation in their own products. Passive obstruction is a lot more effective than having to explain publicly that they want tech that enables filtering advertisements, or data collection, or whatever is heavily in their interest but not in the interests of others, to be removed from the next standard version. They can just say "look, your standard is old and inefficient and does not work in today's world, give up on it, just take my tech instead". It's not especially inventive, BTW; Microsoft played this game for years, before stomping on everyone else resulted in market-share loss, forcing them to return to the standards track.

Cloud giants are happy to let others pay for and build the physical infrastructure, as long as it is "dumb", meaning they can impose whatever rules they like on it at the software level, with no external oversight. Spending money does not give you any control of what you paid for anymore. Everyone is equal but some are more equal than others. The IETF may have governance and speed issues, but letting cloud giants write their own standards, standards heavily biased towards reinforcing their level of control and ignoring the needs of people not in their league is not a good solution.

And, BTW, Google does not need the old tech it proposes to deprecate – it has custom silicon and software in its datacenters that serves the same control purposes, and is ready to offer you its services (but *not* the tech that enables them). The divergence with general-public tech got so bad at one point that it had to open up some parts, like Kubernetes, or no one could figure out what to do with the cloud hosting they wanted to sell. But there are a lot more parts of the Google infra that were never opened.

QUIC as a solution to protocol ossification

Posted Jan 30, 2018 12:36 UTC (Tue) by tialaramex (subscriber, #21167) [Link]

It has been darkly fascinating to watch people like nim-nim continue this dance over decades.

nim-nim is surprised that things are being moved into the cloud, why is this happening? How can it be that "owners" are preferring the cloud where nim-nim can't give them "control" ? What can be behind this? Surely only some nefarious motive at work.

The reality is that the "control" nim-nim and co. have given to "owners" results in things not working properly and it being more and more difficult to fix it. Layer upon layer of "control" afforded by middle boxes makes it impossible to innovate, and those "owners" choke and writhe. The cloud doesn't do very much, but they're so unused to having any freedom at all from the "control" nim-nim's cohort have "given" them that it feels like unlimited power in contrast.

So a company which spent $1M and three years trying to build a new web site, and failing as middle box after middle box refused to co-operate (sorry "gave them control" by not working) is astonished that Amazon, a company that sells books, can put up that web site for them in under a month for a few grand a year. And us LWN readers might smirk - we could do that for the price of a Raspberry Pi and a lost weekend, except that, well, only if we didn't submit to any of nim-nim's "control" as most home Internet users have.

The real difference in the cloud is that when a nim-nim sales person comes to call, the people in the room ask awkward questions like "How does this device deal with protocol incompatibilities? What's your plan for moving away from SHA-256?" rather than "It's really important to us that we deal with Risk, do you have a form letter that says buying this product will fix our Risk so we can show it to our auditors?". And I'm sure that makes nim-nim sad, but it makes actual _users_ happy, so I'll take that trade thanks.

Historically when I've given concrete examples nim-nim has simply insisted they're wrong, which I guess is brazen enough that it must feel righteous, like insisting "I'm the least racist person you'll ever meet" in the middle of doing something very racist. But I'll give one here again for everybody else's benefit, even if it's predictable that nim-nim will continue to believe that facts are wrong.

Last week a product I'm responsible for broke, again. Why did it break? Well, we have a middle box, you see, to ensure we have "control" over er, our own web servers, before the data reaches the next middle box, which is there to ensure "control" over the actions of the other middle box. And apparently somebody at the middle box's manufacturer heard about "Clickjacking". So, they rolled out an update that prevents it by adding a rule to forbid third party frames. (Anybody who does web development for a living now knows where this is going). This update broke "my" product which a partner uses, because they embed frames and thus look exactly like Clickjacking, which is why we hadn't already used a countermeasure and had in fact _explicitly_ rejected doing so for that product, asking the partner to instead re-design to avoid frames before we addressed the problem.

But why "again"? Well, after we fixed this the first time, another person on the team looking after that middle box saw the corrected rule and since there's no actual change management control on the middle box (it's for "control" remember, no need for any oversight or a decent UX, the people signing the cheques will never use it) they figured it must be a mistake and "fixed" it back how it was before, so frames stopped working again.

Ultimately I'm not all that worried about the nim-nims of the world, because although they're annoying they're fighting gravity here, the middle boxes guarantee the doom of companies that embrace them, the more and deeper you invest in middle boxes the worse and faster the ossification. Call it "control" all you like, if you can't grow and change this universe will kill you. The Dumb Network that nim-nim has attributed to Google is an idea which was old before Google even existed, both the Internet and its predecessors back to at least the Treaty of Bern are founded on the correct observation that a Dumb Network is the only thing that'll actually get the job done.

QUIC as a solution to protocol ossification

Posted Jan 30, 2018 18:30 UTC (Tue) by alonz (subscriber, #815) [Link]

Well, these companies now have the FCC's ear… and FCC is more than happy to agree with their vision of "help".

QUIC as a solution to protocol ossification

Posted Jan 31, 2018 9:30 UTC (Wed) by nim-nim (subscriber, #34454) [Link]

Just to be clear:

1. I don't sell middleboxes, have never sold them, and have never benefited from any middlebox sale, directly or indirectly

2. I *have* tried to work with anti-middleware persons to get them to fix their implementation of some standards. Their bugs were causing pain to tens of thousands of people who had no beef in the middleware vs. non-middleware dispute and probably thought IT people were collectively dangerous imbeciles. Only to meet obfuscation and evasion, and finally understand that the breakage was 100% intentional: they were crippling some use cases and wasting their users' (and sometimes customers') time and energy to push their opinions. Only they would not own up to it, and were lying to their users in pretending they had implemented standards when they had sabotaged the parts of them they didn't like, and then used that as an argument to propose the removal of those parts.

3. In any medium-to-large organization you will have idiots that will do things just because they can and feel they will get away with it. That's what control systems are about. If you don't believe in control systems, share a computer with a score of other persons, all running as admin, with no enforcement at all, and see how long you can cope with being charged with making the thing run.

4. If you think software developers are any more responsible, just look at the junk that gets pushed to app stores and all the nasty permissions those apps ask for, just because app stores were designed with a "no middle control" mindset and the few controls that exist were bolted on later and are completely inefficient and lacking. Networks aren't magic; they're an IT system like a computer, and they have the same control requirements (and will need more as they migrate to SDN. More power means more responsibility; more responsibility means more control requirements).

5. Disabling controls makes things run faster and simpler. Who would have thought of it? (see also: Intel/meltdown)

6. In a "smart object" world where everything from microwaves to lightbulbs runs IP, you *will* have to administer a fairly complex network at home. We'll see how much you like it when Google strips you of any control power, and known-dangerous and unpatched gadgets get to talk freely with the exterior. I believe smart-camera operators had their first wake-up call last year.

I'll leave aside the 'amusing' insults, which really have no place on LWN.

QUIC as a solution to protocol ossification

Posted Feb 2, 2018 23:10 UTC (Fri) by lsl (guest, #86508) [Link]

> 2. I *have* tried to work with anti-middleware persons to get them to fix their implementation of some standards. Their bugs were causing pain to tens of thousands of people who had no beef in the middleware vs. non-middleware dispute and probably thought IT people were collectively dangerous imbeciles. Only to meet obfuscation and evasion, and finally understand that the breakage was 100% intentional: they were crippling some use cases and wasting their users' (and sometimes customers') time and energy to push their opinions. Only they would not own up to it, and were lying to their users in pretending they had implemented standards when they had sabotaged the parts of them they didn't like, and then used that as an argument to propose the removal of those parts.

Some specifics would be nice. Are we really talking about bugs or about things like not implementing plaintext downgrades on encrypted protocols? Or making it harder to perform MITM attacks on TLS users thus breaking the "use cases" of ad/garbage injection (loved by mobile ISPs) and other integrity/confidentiality violations of traffic intended to be encrypted and authenticated?

QUIC as a solution to protocol ossification

Posted Feb 2, 2018 0:44 UTC (Fri) by jschrod (subscriber, #1646) [Link]

> nim-nim has simply insisted they're wrong, which I guess is brazen enough that it must feel righteous, like insisting "I'm the least racist person you'll ever meet" in the middle of doing something very racist.

To associate a poster on lwn.net with racism because you disagree with him is not the style of discussion that I'm used to around here. Please refrain from it. This kind of personal attack doesn't help to communicate your arguments, either.

FWIW: I neither follow nim-nim's opinion nor yours, so please don't accuse me of bias.
excors made a good point in a neighbouring post of yours: https://lwn.net/Comments/745923/

QUIC as a solution to protocol ossification

Posted Feb 7, 2018 19:33 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

Since I wrote the earlier piece, the application I'm responsible for broke again. Senior management were upset and wanted to make sure I knew about it (I work from home, so you can't physically collar me; being aggressively angry doesn't work very well over a video conference either, it just sort of looks comical; but you can send loads of email, and they did), and they had been told very clearly that the problem was definitely with my application. Users, including our customers and our own in-house agents, were unable to use the system, frequently running into an error message, versions of which have now been emailed to me as JPEG screenshots, PDF documents, and other formats over the last several days. The error message isn't mentioned in any official documentation, and the strings from it aren't in any of our source or libraries. But there are mentions in forums found with Google. All of them end up, often after a considerable wild-goose chase, pointing at a particular middle box vendor.

So I explained, calmly, that this is undoubtedly caused by a middle box, I specified the exact brand of middle box they'd most likely bought which causes this error, and that I had previously _emphatically_ warned those implementing the latest middle box that it would likely cause serious problems unless its rules had been appropriately hand-crafted to let our actual traffic through, at which point it would simply be dead weight. I think I even forwarded the emails where those responsible had promised to co-operate with me during installation, and then gone silent.

I was assured that even if we had coincidentally just bought a middle box from the exact manufacturer I specified, the network engineering team responsible were 100% certain that it wasn't their box at fault. The problem must be in my application and I needed to pull my finger out and fix whatever it was. I reiterated that it's the middle box, and suggested means by which they could verify that for themselves, and then I just let it all slide by, because it's futile to engage with such nonsense.

This morning my Product Owner, who for his sins does have to actually sit there in person and listen to this every day, managed to persuade someone to do him a favour and try a simple A/B test. He would fail to get into the application, they would switch off the middle box, then he'd try again. To their astonishment it suddenly worked. Then they switched it back on, his application session failed almost immediately and he wasn't able to get back in. How about that?

Now, nim-nim has explained elsewhere that despite symptoms like these in their opinion it's wrong to blame the middle box. Blame client software, or policy decisions, or Google. Here's the problem even if you want to sympathise with that point of view: Nobody cares. You can fire me and get another team in and re-write the entire application, spending millions of dollars and refunding all the existing customers, and at the end it still doesn't work. Or, you can switch off the middle box and it works immediately. It's a no brainer. It wouldn't even matter if _somehow_ the middle box isn't really the "cause" of your problem: no middle box, no problem.

QUIC as a solution to protocol ossification

Posted Feb 8, 2018 1:36 UTC (Thu) by karkhaz (subscriber, #99844) [Link]

Although I'm sorry for your frustration, I must say that reading these tirades has given me a good chuckle. Thanks for sharing your experience :-)

QUIC as a solution to protocol ossification

Posted Feb 8, 2018 18:34 UTC (Thu) by Wol (subscriber, #4433) [Link]

And it seems quite clear to me that nim-nim THINKS that the owner of the middle box should control the middle box. Except it's also quite clear that the person who *really* controls the middle box is the box vendor, who over-rides the owner's changes.

And the hilarious thing is that the middle box's job is to transmit the customer's data - which it is clearly and blatantly failing to do!

What Google wants, and (maybe slightly altered) what everyone else wants, is for whatever leaves my *source* network to get to my *destination* network INTACT and UNALTERED. What f'ing right does the TRANSPORT network have to interfere and alter my data or throw it away? Because that's what nim-nim is advocating - let the transport network (and indeed, not even that, let the vendors of the equipment running the transport network) dictate what traffic is allowed to pass over the transport network. NOT good if I'm paying for my data to be transported ...

Cheers,
Wol

QUIC as a solution to protocol ossification

Posted Jan 30, 2018 14:36 UTC (Tue) by farnz (subscriber, #17727) [Link]

Ultimately, the dumb network versus smart network argument goes back to IP versus telco protocols. ATM, Frame Relay and other telco protocols all have a smart network, full of middle boxes in control of a chunk of the network. IP as it was when IP replaced ATM, Frame Relay etc had a dumb network with all the intelligence in the end points.

The ossification that Google harps on about is what happens when people who think that ATM should have won because it has the smart network try to force IP into the telco world - the trouble is that the populace chose IP, not ATM, partly because public networks have a long standing habit of using the smarts in their network to drive protocol design in ways favourable to their pockets, and the populace would prefer a dumb commodity network to a smart premium network.

However, network developers don't like this - when the difference between your gigabit service and mine is just price and congestion, there's not a lot to choose between us. When you can have premium "smart" services like per-packet CoS as well, you can find ways to charge more for the same service.

QUIC as a solution to protocol ossification

Posted Jan 30, 2018 21:21 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

> The "ossification" Google harps on is someone else exercising some form of non-Google control
I guess you buy middleboxes that advertise: "Breaks VoIP connections better than any other competitor!" or maybe "Drops random TCP connections based on ECN flag!"

Right?

QUIC as a solution to protocol ossification

Posted Jan 31, 2018 9:41 UTC (Wed) by nim-nim (subscriber, #34454) [Link]

I buy middleboxes that advertise 'restrict the network access of developers who can't be bothered to read security 101 books, do not test what happens when their software is misused, do not feel responsible because cleaning up breakage is a non-dev task, will claim anything is a network or server problem since no one knows who runs those, and, by the time others have wasted energy checking that the infra runs as designed, will have moved on to breaking some other part of their software'.

The kind of guys that hardcode admin/admin backdoors in their software because that enables them to silently fix their mistakes without owning up to them and asking for permission to access and modify production systems they should never touch. And it's "secure" because no one knows the backdoor is there, right?

bad smells

Posted Jan 31, 2018 14:47 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

My mother bought a new house, it has a big downstairs shower which the previous owners had installed during their remodelling. I visited her the day after she took possession, nice place. Any way, some months later I visited again, and this downstairs shower room now had a powerful scent dispenser, spraying masking scent into the air every minute or so. It smelled awful, a mix of the scent and some powerful but unpleasant smell that was hard to name.

I asked my mother about it, and she told me she had to fit the dispenser because the smell was so awful. She had never used the shower, and rarely went in there, but even just in passing she would notice the smell, so she felt she had to "do something" about it.

Bad smells don't magically appear from nowhere, and a masking scent won't fix them. I filled a cup with water and poured it into the shower drain. My mother was astonished that within hours the smell went away, I switched off her dispenser and went home. Showers have drains, which in most homes are piped into the black water sewage output, via a U-bend to trap unpleasant smells from the sewer system. If the U-bend dries up, sewer gases leak into your home which can smell bad and in rare cases even be directly dangerous. Unused drains should be removed, but if you don't want to remove a drain, just rinse it through periodically, no need for more than a cup or so of water, and it'll be fine.

What you've got is an attempt to mask the bad smell from terrible people problems in your organisation.

bad smells

Posted Feb 1, 2018 10:34 UTC (Thu) by nim-nim (subscriber, #34454) [Link]

So, your solution is to let problems spread as much as possible, and to avoid any remediation, until the perfect long-term solution is implemented?

I suppose you're also happy to have your bank let hackers empty your accounts, while they work on the better system that will secure everything next year?

bad smells

Posted Feb 2, 2018 7:20 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

Stay focused and stop rambling.

In the case of networks, the way to fix middleshitboxes is to make them impossible. There's no other way. This means encrypting protocols, making sure there are no fixed bits in the "unused" headers, adding confusing options as a part of the normal workflow, etc.
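For illustration, the "no fixed bits / confusing options" idea is what TLS now standardizes as GREASE (RFC 8701): clients routinely advertise reserved code points that servers must ignore, so middleboxes never get the chance to assume those fields are constant. A minimal sketch in Python (the cipher-suite code points in the demo are just example values):

```python
import random

# GREASE values reserved by RFC 8701 for TLS cipher suites and
# extensions: 0x0A0A, 0x1A1A, ..., 0xFAFA. Conforming peers must
# ignore them, so a middlebox that chokes on an unknown value gets
# flushed out early instead of ossifying the field.
GREASE_VALUES = [0x0A0A + i * 0x1010 for i in range(16)]

def grease(advertised):
    """Insert one random reserved value into a list of advertised
    code points, so 'unknown value' stays a normal, exercised path."""
    out = list(advertised)
    out.insert(random.randrange(len(out) + 1), random.choice(GREASE_VALUES))
    return out

# Hypothetical cipher-suite code points a client might offer.
offered = grease([0x1301, 0x1302, 0x1303])
print([hex(v) for v in offered])
```

The point is not the specific values but the habit: if a protocol field can legally vary, make it vary constantly, so nobody downstream can build a box that depends on it being fixed.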

bad smells

Posted Feb 5, 2018 5:11 UTC (Mon) by immibis (subscriber, #105511) [Link]

I agree. I only lament that I can't run my own middlebox for my personal devices.

But that's the way it has to be - otherwise how would my phone know that my middlebox is valid and the NSA's middlebox isn't? (Perhaps I could explicitly allow it - but then they just blackmail you by cutting off your Internet service entirely if you don't approve the NSA middlebox.)

QUIC as a solution to protocol ossification

Posted Jan 31, 2018 16:19 UTC (Wed) by excors (subscriber, #95769) [Link]

It sounds like there's disagreement about "endpoints should have control, not the network(/middleboxes)" vs "the network should have control, not endpoints" - but I think they're both derived from the same starting point of "competent people should have control, not incompetent people".

There are problems caused by incompetence on both sides - e.g. everyone can (and does) fail to implement standards correctly. Competent people working on endpoints can easily fix any problems on their side but will suffer badly from problems in the network. Competent people working on the network can easily fix any problems on their side but will suffer badly from problems in the endpoints. Everyone will tend to overestimate the level of incompetence in the other side, because that's what they spend most of their time fighting against. And since they are themselves competent, it's quite reasonable for them to say "it would be better if I was in control of everything". But that leads to contradictory conclusions from the two sides.

It would be nice if competent people *were* in control, but I have no idea how to achieve that without some authority to decide who is competent and to give them control over everything (which may be possible within a data center but not for the global internet). I suppose the endpoint people will win eventually regardless of merit, because it'll reach an equilibrium state when all traffic is encrypted and indistinguishable so middleboxes can't do anything.

(Incidentally, I don't mean to insult specific people as incompetent - they might have been perfectly competent a decade ago and e.g. designed a system that everyone agreed was brilliant at the time, but that now turns out to be fatally flawed because we judge it by new criteria. That's not their fault, but the effect is the same.)

QUIC as a solution to protocol ossification

Posted Jan 31, 2018 11:21 UTC (Wed) by jezuch (subscriber, #52988) [Link]

I don't really care what Google wants. Maybe they even want the same thing as I do: SCTP that can be used, IP multicast (which has been in the standard forever) that can be used, etc. I completely agree that control is needed, but right now the default is to break everything the middle box manufacturer doesn't care or know about. It's just lazy and irresponsible.

QUIC as a solution to protocol ossification

Posted Jan 31, 2018 12:50 UTC (Wed) by nim-nim (subscriber, #34454) [Link]

I used to think it was wholly the middle box manufacturer's fault too… until investigation showed that, in many cases, the network client contributed to the problem, and would block any resolution, all the while blaming the middle box.

QUIC as a solution to protocol ossification

Posted Feb 2, 2018 23:20 UTC (Fri) by lsl (guest, #86508) [Link]

I'd love to get some examples of this. I can see how it would be convenient for lazy developers to dismiss potential issues in their programs with "the network is crap". It shouldn't be that hard to prove them wrong, though, should it? So how are they blocking the resolution of the issue?

QUIC as a solution to protocol ossification

Posted Feb 6, 2018 18:53 UTC (Tue) by jezuch (subscriber, #52988) [Link]

Sure, endpoints can be lazy and irresponsible too - and often are! They can also be actively malicious, while middleboxes usually aren't. But endpoints can be (comparatively) easily replaced or fixed; middleboxes can't. And that's why there's so much bad feeling towards them: we can all see a problem that would be easy to fix, but we can't fix it. That's frustrating.

QUIC as a solution to protocol ossification

Posted Feb 6, 2018 19:16 UTC (Tue) by pizza (subscriber, #46) [Link]

> They can also be actively malicious while middleboxes usually aren't.

The level of operational maliciousness is often a matter of how the middlebox owner has configured things.

QUIC as a solution to protocol ossification

Posted Feb 5, 2018 5:14 UTC (Mon) by immibis (subscriber, #105511) [Link]

I don't believe IP multicast can scale to the Internet; every router would need to know about every multicast stream passing through it. This creates major political issues. It does work when all routers and endpoints are under the same owner so they can fix the problems instead of blame-shifting.

QUIC as a solution to protocol ossification

Posted Feb 5, 2018 5:10 UTC (Mon) by immibis (subscriber, #105511) [Link]

Unfortunately there's no way to win.

As a user I want my traffic to be able to be accessed at 2 or 3 points:
* The service I am accessing
* The endpoint I am accessing it from
* Any intermediary device of my choosing

(note that the service may consist of multiple intermediary devices if it wants to; it's a black box from my PoV)

But my ISP would love for it to be:
* The service I am accessing
* The endpoint I am accessing it from
* Any intermediary device of their choosing

And my company that provided the endpoint (hypothetically) would also like it to be:
* The service I am accessing
* The endpoint I am accessing it from
* Any intermediary device of their choosing

If we can't have any middleboxes whatsoever, then we get:
* The service I am accessing
* The endpoint I am accessing it from

That's bad because now I can't see what my client app is actually doing.

But on the other hand, if anyone can add middleboxes (as is the case today) then we end up with:
* The service I am accessing
* My company's ISP's middlebox that breaks stuff
* My company's middlebox that breaks stuff
* My company's ISP's middlebox that breaks stuff
* My ISP's middlebox that breaks stuff
* My own middlebox that breaks stuff
* My endpoint

which is a huge mess, and there's no way to run a new protocol without upgrading four different middleboxes owned by different people - good luck with that - which is why everything is tunneled over HTTP these days!

I can imagine a solution where users have to explicitly permit middleboxes. But then we'll still end up in the same situation as today. Give someone a choice between "press this button" or "have no Internet" and they'll press the button. And it's not their fault for being stupid; you'd do that too, it's not like you *really* have a choice. You can't even scare them away with big warning messages - "Oh, all my traffic may be spied upon by "US Internet Gateway"? I mean I guess that's necessary for the system to work. After all, it stops working when I turn it off."

Any such system will trivially allow your ISP to block all your traffic that doesn't pass through their middlebox. This can only be prevented if they don't have the option to do it in the first place. But that would mean you don't get to install *your* middlebox either. That's why it's a no-win tradeoff.

QUIC as a solution to protocol ossification

Posted Jan 30, 2018 23:55 UTC (Tue) by gdt (subscriber, #6284) [Link]

There's a non-technical aspect as well. Even having a small proportion of the Internet become uncontactable means billions in forgone revenue for an advertiser like Google. Part of protocol ossification is an unwillingness to say "well, it just won't work for that 0.5% of sites". Poor code has always been a problem, but the changed economics means that this code now wags the dog.

QUIC as a solution to protocol ossification

Posted Feb 1, 2018 1:45 UTC (Thu) by ebiederm (subscriber, #35028) [Link]

Partly this is simply the impatience of people wanting to deploy something that works everywhere now.

Partly this is the PNAT firewalls that everyone calls wireless routers in their homes. The fact that they are sharing a single IPv4 address among multiple computers means that, for protocols to work at all, they have to look past the IP layer into the TCP/UDP layer and give some ports to some machines and other ports to other machines.
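That port-sharing can be sketched as a toy NAPT translation table (illustrative only; the addresses and port range are made up, and a real implementation also tracks the transport protocol, checksums, and mapping timeouts):

```python
import itertools

# Toy NAPT table: share one external IPv4 address among internal hosts
# by rewriting (internal_ip, internal_port) to a unique external port.
class Napt:
    def __init__(self, external_ip):
        self.external_ip = external_ip
        self.ports = itertools.count(20000)   # next free external port
        self.out = {}                         # (int_ip, int_port) -> ext_port
        self.back = {}                        # ext_port -> (int_ip, int_port)

    def outbound(self, int_ip, int_port):
        key = (int_ip, int_port)
        if key not in self.out:
            ext_port = next(self.ports)
            self.out[key] = ext_port
            self.back[ext_port] = key
        return self.external_ip, self.out[key]

    def inbound(self, ext_port):
        # A reply to an unknown port has nowhere to go -- which is why
        # unsolicited inbound traffic breaks behind NAPT.
        return self.back.get(ext_port)

nat = Napt("203.0.113.1")
print(nat.outbound("192.168.1.10", 40000))  # ('203.0.113.1', 20000)
print(nat.outbound("192.168.1.11", 40000))  # same internal port, new mapping
```

Note that the mapping key includes the port, which lives in the TCP/UDP header - so the box must parse (and therefore recognize) the transport protocol, which is exactly where unfamiliar protocols like SCTP fall off the edge.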

There has been a lot of good work in recent years on what should be expected from home routers, with specifications so that you can show your hardware is out of spec (to return it, or to reject supporting it if you are the ISP). Together with the multiple addresses of IPv6, we may see some deossification.

But even then, there is only one byte's worth of protocol numbers for protocols that sit directly above IP, which unfortunately means those numbers must be assigned with care. So if you don't have something that is a fantastic improvement over what has gone before, getting a protocol number will be work, and getting everyone to support it will be a huge uphill struggle.
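To make the "one byte" concrete: the IPv4 Protocol field is a single octet at offset 9 of the header, so the space of transports sitting directly on IP tops out at 256 values. A small sketch (only a handful of the IANA-assigned numbers are listed):

```python
import struct

# The IPv4 "Protocol" field is one byte (offset 9 in the header), so at
# most 256 transport protocols can ever sit directly on IP. A few of
# the IANA-assigned numbers:
PROTO = {1: "ICMP", 6: "TCP", 17: "UDP", 33: "DCCP", 132: "SCTP"}

def protocol_of(ipv4_header: bytes) -> str:
    proto = ipv4_header[9]
    return PROTO.get(proto, f"unassigned/other ({proto})")

# A minimal 20-byte IPv4 header carrying TCP (most fields zeroed for
# brevity; a real header needs a valid checksum and addresses).
hdr = struct.pack("!BBHHHBBH4s4s",
                  0x45, 0, 20,      # version/IHL, TOS, total length
                  0, 0,             # identification, flags/fragment
                  64, 6, 0,         # TTL, protocol=TCP, checksum
                  bytes(4), bytes(4))  # src, dst
print(protocol_of(hdr))  # TCP
```

Anything a middlebox doesn't find in its own small version of that table tends to get dropped, which is why new transports hide inside UDP (17) instead of claiming a fresh number.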

To anyone who wants to understand the complexity of setting up an end-to-end connection between two arbitrary computers on the internet, I suggest looking at RFC 5245, Interactive Connectivity Establishment (aka ICE). This is being deployed in WebRTC so that video chat will work in browsers, and is much of the magic that Skype used back in the day to avoid requiring server resources.

When you genuinely need PNAT in every home, and you consider implementation bugs, it becomes increasingly clear that new protocols need to be encrypted just so that, in some small fraction of cases, they are not mistaken for something else and broken by someone's well-meaning but buggy code.

Then you get the challenge that most protocols are designed on nice networks that don't run NAT of any kind. There are no obstructions, so the protocol designers don't make their protocols deal with the reality of the internet. SCTP is a good example of that. Linux to this day does not have a good PNAT solution for multi-homed SCTP, because the design of SCTP makes it essentially impossible unless you can see all of the traffic on all of the paths (and a machine behind you is not meaningfully multi-homed if you can see all of the traffic it emits). Which means that even if someone wanted to add SCTP support to the next generation of NAT boxes, it wouldn't be easy.

If you are willing to live with not supporting the small fraction of problem cases, and have enough pull to get people to fix their broken networks, you can see that things are not completely ossified. The increasing percentage of internet clients with IPv6 support is proof of that. The larger address space really is worth dealing with the mess.

So the ossification picture is not quite as bleak as some people make out. Progress is possible when it is important enough, but it is hard to design a new protocol, and even more so with the prevalence of PNAT on the internet. The best hope for reduced ossification is the deployment of IPv6, which should mean middle boxes are both more standardized and have much less reason to mangle the traffic going through them, since the abundance of IPv6 addresses removes the need for PNAT.

QUIC as a solution to protocol ossification

Posted Feb 1, 2018 4:13 UTC (Thu) by mtaht (guest, #11087) [Link]

Regrettably, problems with address subassignment (e.g. DHCPv6 prefix delegation) are making tethering via IPv6 hard on most cellphones. You might be getting IPv6 on the phone itself, but not on the devices tethered behind it.

I sure hope "5G" deployments get this right.

QUIC as a solution to protocol ossification

Posted Feb 1, 2018 6:37 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

But you don't need prefix delegation for IPv6. For example, iOS simply bridges the tethered WiFi with the LTE interface.

QUIC as a solution to protocol ossification

Posted Feb 1, 2018 6:34 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

> Anyone who wants to understand the complexity of setting up an end-to-end connection between two arbitrary computers on the internet I suggest you look at RFC5245 Interactive Connectivity Establishment (aka ICE). This is being deployed in WebRTC so that video chat will work in browsers, and is much of magic the skype used back in the day to not require server resources.
Around 2007 I was involved in a project that used multiple HTTP connections to transmit VoIP packets, with forward error correction and elaborate TCP retry-avoidance logic. All in all, it took several man-years to get right.

All to make sure it can work through middleboxes that horribly break everything but HTTPS.

Good luck trying to write the same code if you're a small company. But meanwhile, nim-nims can enjoy "control" that they exert by breaking Google.
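As a side note, the simplest form of the forward error correction mentioned above can be sketched as XOR parity over a group of packets, which lets the receiver rebuild any single lost packet in the group; the framing here is purely illustrative, and a real VoIP FEC scheme (e.g. per RFC 5109) carries more metadata:

```python
from functools import reduce

def xor_parity(packets):
    """Parity packet for a group: byte-wise XOR of equal-length payloads."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)),
                        packets))

def recover(received, parity):
    """Rebuild the single missing packet in a group (None marks the loss):
    XORing the survivors with the parity cancels them out, leaving the
    lost payload."""
    present = [p for p in received if p is not None]
    return xor_parity(present + [parity])

group = [b"pkt0", b"pkt1", b"pkt2"]
parity = xor_parity(group)
# Lose packet 1 in transit; rebuild it from the survivors plus parity.
print(recover([group[0], None, group[2]], parity))  # b'pkt1'
```

The appeal for VoIP is that recovery needs no retransmission round trip - exactly the latency you are trying to avoid when tunnelling real-time media over HTTP.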

QUIC as a solution to protocol ossification

Posted Feb 9, 2018 1:46 UTC (Fri) by ras (subscriber, #33059) [Link]

> All to make sure it can work through middleboxes that horribly break everything but HTTPS.

I was at the talk, and yes, he did hammer on and on about middle boxes. It sounded over the top until he said something along the lines of:

"The only fix is to encrypt everything. We had one byte, just one byte, that wasn't encrypted. Then, when we changed it, we started getting reports of broken connections. It had only been out for a few months, and it wasn't published anywhere, so no one could have known what that byte was. Yet ossification had already set in."

At that point I started to feel sorry for him.

QUIC as a solution to protocol ossification

Posted Feb 2, 2018 23:57 UTC (Fri) by lsl (guest, #86508) [Link]

> There has been a lot of good work in recent years on what should be expected from home routers. With specifications so you can show your hardware is out of spec (to return it/reject supporting it if you are the ISP) and the multiple addresses of IPv6 we may see some deossification.

Having such specs available is very nice. Is there anything like it for non-home enterprise firewalls and the like?

It would be even more awesome to have some kind of test suite that admins could run when evaluating such devices, or that you could use to help admins bring their network to a more sane state. Oftentimes such issues are just the result of a botched configuration, with network operators unaware of the implications but happy to fix them once they are pointed out.

I might even write such a test suite myself if it doesn't exist yet. But what do you test for? Obvious candidates would be things like the very helpful RFC 4890 ICMPv6 firewalling recommendations. Then probably the effects of various TCP options, IP multicast, the proper functioning of PMTU discovery, and interaction with very recent TLS and with exotic features of older plaintext protocols, including HTTP.

What are the other areas middlebox vendors have proven themselves incapable of implementing in a sane way? Where to expect problems?

If anyone has pointers to relevant resources, I'd be very happy to read them.

QUIC as a solution to protocol ossification

Posted Jan 29, 2018 20:51 UTC (Mon) by iwan (subscriber, #108557) [Link]

This paper seems to be quite interesting: "Taking a Long Look at QUIC. An Approach for Rigorous Evaluation of Rapidly Evolving Transport Protocols" - https://conferences.sigcomm.org/imc/2017/papers/imc17-fin.... It compares performance of TCP and QUIC.

Researchers claim that performance is sluggish on older cellular networks: "In the case of 3G, we see the benefits of QUIC diminish. Compared to LTE, the bandwidth in 3G is much lower and the loss is higher—which works to QUIC’s benefit (see Fig. 8a). However, the packet reordering rates are higher compared to LTE, and this works to QUIC’s disadvantage."

QUIC as a solution in my firewall currently

Posted Jan 29, 2018 21:22 UTC (Mon) by petur (guest, #73362) [Link]

Since YouTube uses QUIC, this has become an easy way to kill it when the kids at home don't want to stop watching :)

QUIC as a solution in my firewall currently

Posted Jan 30, 2018 1:27 UTC (Tue) by bradfitz (subscriber, #4378) [Link]

Doesn't seem like an effective strategy. Chrome will just detect QUIC brokenness and switch back to HTTP2/TLS/TCP.
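That fallback behaviour can be sketched as "try QUIC under a short deadline, otherwise use TCP"; the connect callables below are stand-ins for illustration, not Chrome's actual logic:

```python
def fetch(url, quic_connect, tcp_connect, quic_timeout=0.3):
    """Try QUIC first; on timeout or error, silently fall back to TCP.
    quic_connect/tcp_connect are stand-in callables, not real transports."""
    try:
        return quic_connect(url, timeout=quic_timeout)
    except (TimeoutError, ConnectionError):
        return tcp_connect(url)

# Simulate a firewall that drops UDP/443: QUIC times out, TCP succeeds.
def blocked_quic(url, timeout):
    raise TimeoutError("UDP/443 blackholed")

def working_tcp(url):
    return f"{url} via HTTP/2 over TLS/TCP"

print(fetch("https://example.com", blocked_quic, working_tcp))
# https://example.com via HTTP/2 over TLS/TCP
```

From the user's point of view the block is invisible; the only effect is losing QUIC's latency advantage, which is why blocking UDP/443 "works" as a degradation rather than an outage.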

QUIC as a solution in my firewall currently

Posted Feb 2, 2018 3:52 UTC (Fri) by TRS-80 (guest, #1804) [Link]

Ah, but then I can MITM and block it. I explicitly block QUIC at work because I can't inspect it. deal_with_it.gif

QUIC as a solution in my firewall currently

Posted Feb 2, 2018 16:18 UTC (Fri) by nybble41 (subscriber, #55106) [Link]

> I explicitly block QUIC at work because I can't inspect it.

That works for now because there is a fallback in place, so most sites continue to work (albeit more slowly) despite blocking QUIC. As QUIC becomes more popular, however, and incidences of brokenness diminish, that fallback ought to be phased out. At that point you will no longer be able to block QUIC without cutting yourself off from most of the Internet—and with the end-to-end principle restored, there will be much rejoicing among those trapped behind your overbearing middleware.

QUIC as a solution in my firewall currently

Posted Feb 2, 2018 17:46 UTC (Fri) by TRS-80 (guest, #1804) [Link]

Perhaps, but they are students and I have a duty of care to protect them from the worst parts of the internet; therefore there is a middlebox between them and it. Is QUIC seeing use outside of Google anyway?

The middlebox we use doesn't currently support ECDHE, so I doubt TLS 1.3 support will be on the cards any time soon either. That will be a big ossification point as well, given how middlebox-unfriendly TLS 1.3 is.

QUIC as a solution in my firewall currently

Posted Feb 2, 2018 18:53 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> Perhaps, but they are students and I have a duty of care to protect them from the worst parts of the internet, therefore a there is a middlebox between them and it.
And then these students get their smartphones and jump right into the worst parts without anyone being the wiser...

If you do have to comply with such laws, you can install blockers directly onto the endpoints rather than on midpoints.

QUIC as a solution in my firewall currently

Posted Feb 3, 2018 5:27 UTC (Sat) by TRS-80 (guest, #1804) [Link]

Phones are not allowed in the classroom, and we tell parents not to give their students data access, or to install a filter on it. Either way, you can't do proper blocking on iOS; the only good solutions are an explicit proxy or an always-on VPN, at which point we're back to middleboxes, so you may as well do it transparently.

QUIC as a solution in my firewall currently

Posted Feb 4, 2018 3:59 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

My brother's daughter recently went to China for a school exchange program. The first week or so her parents were only getting email updates to a non-Google address. Then she re-appeared on Facebook and Gmail - local kids in China had shown her how to work around blocking.

This is how effective Internet blocking is against determined teenagers.

I understand that people still have to go through the motions and pretend that precious little children are totally "protected" by filters. But I'm not seeing why this should be made any easier. It'd be good to stop this hypocrisy fest eventually.

QUIC as a solution in my firewall currently

Posted Feb 9, 2018 15:18 UTC (Fri) by TRS-80 (guest, #1804) [Link]

Well, if you can stop our parents being rich enough to hire lawyers in the case that little Johnny sees something inappropriate using school-provided technology, I'm sure I can update our risk matrix to obviate the need for the web filter. If they do it on parent-provided technology, that's then their problem, not ours.

QUIC as a solution to protocol ossification

Posted Jan 29, 2018 21:35 UTC (Mon) by meyert (subscriber, #32097) [Link]

Which HTTP servers support QUIC?

QUIC as a solution to protocol ossification

Posted Jan 30, 2018 9:38 UTC (Tue) by Lekensteyn (guest, #99903) [Link]

A list of QUIC implementations can be found here: https://github.com/quicwg/base-drafts/wiki/Implementations

Some of the server programs seem more like toys/test programs, though.

QUIC as a solution to protocol ossification

Posted Jan 30, 2018 9:57 UTC (Tue) by Rubusch (subscriber, #80225) [Link]

"SCTP has been around for years, but middleboxes still do not recognize it"

Probably this is obvious to everyone. I played around a bit with SCTP years ago and always wondered why it was not actually used. Thank you so much for this side information, I really appreciate it!

QUIC as a solution to protocol ossification

Posted Jan 31, 2018 0:45 UTC (Wed) by willy (subscriber, #9762) [Link]

There are a number of other protocols developed over the years which never made it, mostly because of middlebox throttling. DCCP is one, and Plan 9 developed IL (Internet Link) for a similar purpose.

UDP abuse

Posted Feb 9, 2018 9:44 UTC (Fri) by flohoff (guest, #32900) [Link]

I don't think "abusing" UDP to build higher-layer protocol abstractions is the way to go. Yes, fixing the "internet" to support a new IP protocol type is hard, as introducing IPv6 or SCTP was.

Using an abstraction on top of UDP will only lead the firewall admins to drop/block those connections. You'll gain nothing. Hell, FW admins still block ICMP for no good reason, just because it seems unnecessary.

So do it right from the beginning and define a new protocol.

UDP abuse

Posted Feb 20, 2018 0:34 UTC (Tue) by flussence (subscriber, #85566) [Link]

I've managed to get IPv6 traffic to work across the internet (in spite of my ISP) with a lot of persistence, but for all the good things I've heard about SCTP, I've never encountered anything that supports it. Are there any examples of use in the wild?

UDP abuse

Posted Feb 20, 2018 2:07 UTC (Tue) by excors (subscriber, #95769) [Link]

WebRTC uses SCTP for data channels (arbitrary user-defined data that goes alongside the standard video/audio), but it's tunnelled over DTLS over UDP, so I'm not sure where that fits in this argument.

QUIC as a solution to protocol ossification

Posted Feb 9, 2018 11:32 UTC (Fri) by kronat (guest, #117266) [Link]

It is worth reading all of the comments above in their entirety.

My opinion on this starts from the same base as nim-nim's: Google has control of the services, Google has control of the providing nodes (servers/cloud nodes), and Google has control of the most-used OS (Android) and the most-used web browser (Chrom{e,ium}).

Even getting something committed to the Linux networking stack is difficult unless it is proven that it will not harm Google's use cases or server performance, and that it works well under gigabits and gigabits of load (see, for instance, all the development done on TCP, which is perfectly fine for wired networks but has performance problems on WiFi cards*).

Is it not worrying, at least?

* This will probably be solved in the next release, after years of not caring.


Copyright © 2018, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds