
October 28, 2010

HTTP cookies, or how not to design protocols

For as long as I can remember, HTTP cookies have been vilified as a grave threat to the privacy of online browsing; wrongly so. That said, the mechanism itself is a very interesting cautionary tale for security engineers - and that will be the theme of today's feature.

Cookies were devised by Lou Montulli, a Netscape engineer, sometime in 1994. Lou outlined his original design in a minimalistic, four-page proposal posted on netscape.com; based on that specification, the implementation shipped in their browser several months later - and other vendors were quick to follow.

It wasn't until 1997 that the first reasonably detailed specification of the mechanism was attempted: RFC 2109. The document captured some of the status quo - but confusingly, it also tried to tweak the design, an effort that proved to be completely unsuccessful; for example, contrary to what is implied by this RFC, most browsers do not support multiple comma-delimited NAME=VALUE pairs in a single Set-Cookie header; do not recognize quoted-string cookie values; and do not use max-age to determine cookie lifetime.

Three years later, another, somewhat better structured effort to redesign cookies - RFC 2965 - proved to be equally futile. Meanwhile, browser vendors tweaked or extended the scheme in their own ways: for example, around 2002, Microsoft unilaterally proposed httponly cookies as a security mechanism to slightly mitigate the impact of cross-site scripting flaws - a concept quickly, if prematurely, embraced by the security community.

All these moves led to a very interesting situation: there is simply no accurate, official account of cookie behavior in modern browsers; the two relevant RFCs, often cited by people arguing on the Internet, are completely out of touch with reality. This forces developers to discover compatible behaviors by trial and error - and makes it an exciting gamble to build security systems around cookies in the first place.

In any case - well-documented or not, cookies emerged as the canonical solution to an increasingly pressing problem of session management; and as web applications have grown more complex and more sensitive, the humble cookie took the world by storm. With it came a flurry of fascinating security flaws.

They have Internet over there, too?

Perhaps the most striking issue - and an early sign of trouble - is the problem of domain scoping.

Unlike the more pragmatic approach employed for JavaScript DOM access, cookies can be set for any domain of which the setter is a member - say, foo.example.com is meant to be able to set a cookie for *.example.com. On the other hand, allowing example1.com to set cookies for example2.com is clearly undesirable, as it allows a variety of sneaky attacks: denial of service at best, and altering site preferences, modifying carts, or stealing personal data at worst.

To that effect, the original Netscape specification provided this elegant but blissfully naive advice:

"Only hosts within the specified domain can set a cookie for a domain and domains must have at least two (2) or three (3) periods in them to prevent domains of the form: ".com", ".edu", and "va.us". Any domain that fails within one of the seven special top level domains listed below only require two periods. Any other domain requires at least three. The seven special top level domains are: "COM", "EDU", "NET", "ORG", "GOV", "MIL", and "INT".

Regrettably, there are at least three glaring problems with this scheme - two of which should have been obvious right away:

  1. Some country-level registrars indeed mirror the top-level hierarchy (e.g. example.co.uk), in which case the three-period rule makes sense; but many others allow direct registrations (e.g., example.fr), or permit both approaches to coexist (say, example.jp and example.co.jp). In the end, the three-period rule managed to break cookies in a significant number of ccTLDs - and consequently, most implementations (Netscape included) largely disregarded the advice. Yup, that's right - as a result, you could set cookies for *.com.pl.

  2. The RFC missed the fact that websites are reachable by means other than their canonical DNS names; in particular, the rule permitted a website at http://1.2.3.4/ to set cookies for *.3.4, or a website at http://example.com.pl./ to set a cookie for *.com.pl.

  3. To add insult to injury, the Internet Assigned Numbers Authority eventually decided to roll out a wide range of new top-level domains, such as .biz, .info, or .jobs - and is now attempting to allow arbitrary gTLD registrations. This last step promises to be yet another nail in the coffin of sane cookie management implementations.

Net effect? All mainstream browsers had a history of embarrassing bugs in this area - and now ship with a giant, hairy, and frequently updated list of real-world "public suffix" domains for which cookies should not be set - as well as an array of checks to exclude non-FQDN names, IP addresses, and pathological DNS notations of all sorts.
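
In rough outline, the resulting public-suffix check behaves like the sketch below; the hard-coded suffix set is a tiny, illustrative stand-in for the real list, which browsers ship in full and update regularly:

  # Illustrative only: a handful of entries standing in for the full public suffix list.
  PUBLIC_SUFFIXES = {"com", "pl", "com.pl", "uk", "co.uk", "appspot.com"}

  def domain_ok(setter_host, cookie_domain):
      """Reject domain= values that name a public suffix; otherwise require the
      setter to actually live within the requested domain."""
      bare = cookie_domain.lstrip(".").lower()
      if bare in PUBLIC_SUFFIXES:
          return False
      host = setter_host.lower()
      return host == bare or host.endswith("." + bare)

  print(domain_ok("foo.example.com", ".example.com"))  # True
  print(domain_ok("www.example.com.pl", ".com.pl"))    # False - public suffix
  print(domain_ok("evil.appspot.com", ".appspot.com")) # False - suffix by opt-in

A real implementation also has to reject bare IP addresses, non-FQDN names, and trailing-dot tricks, as noted above.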

8K ought to be enough for anybody

To make denial-of-service attacks a bit harder, most web servers limit the size of the requests they are willing to process; these limits are very modest - for example, Apache rejects request headers over 8 kB, while IIS draws the line at 16 kB. This is perfectly fine under normal operating conditions - but these thresholds can be easily exceeded when the browser attempts to construct a request with a lot of previously set cookies attached.

The specification neglected this possibility, offered no warning to implementers, and proposed no discovery or resolution mechanism. In fact, it mandated minimum jar size requirements well in excess of the limits enforced by HTTP servers:

"In general, user agents' cookie support should have no fixed limits. They should strive to store as many frequently-used cookies as possible. Furthermore, general-use user agents should provide each of the following minimum capabilities [...]:

* at least 300 cookies
* at least 4096 bytes per cookie (as measured by the size of the characters that comprise the cookie non-terminal in the syntax description of the Set-Cookie header)
* at least 20 cookies per unique host or domain name"

As should be apparent, the suggested minimum - 20 cookies of 4096 bytes each - allows HTTP request headers to balloon up to the 80 kB boundary.

Does this matter from the security perspective? At first sight, no - but this is only until you realize that there are quite a few popular sites that rely on user-name.example.com content compartmentalization; and that any malicious user can set top-level cookies to prevent the visitor from ever being able to access any *.example.com site again.
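
A quick back-of-the-envelope sketch of why this works, assuming Apache's default header field limit of roughly 8 kB and the RFC-sanctioned 20 cookies of 4 kB each (the cookie names and sizes below are made up):

  # How per-domain cookies overwhelm a typical server-side header limit.
  SERVER_HEADER_LIMIT = 8 * 1024            # order of magnitude of Apache's default

  cookies = {f"junk{i}": "x" * 4096 for i in range(20)}   # 20 cookies, 4 kB each
  cookie_header = "Cookie: " + "; ".join(f"{k}={v}" for k, v in cookies.items())

  print(len(cookie_header))                        # ~82,000 bytes - roughly ten times the limit
  print(len(cookie_header) > SERVER_HEADER_LIMIT)  # True: requests to *.example.com now fail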

The only recourse domain owners have in this case is to request their site to be added to the aforementioned public suffix list; there are quite a few entries along these lines there already, including operaunite.com or appspot.com - but this approach obviously does not scale particularly well. The list is also not supported by all existing browsers, and not mandated in any way for new implementations.

"Oh, please. Nobody is actually going to depend on them."

In the RFC 2109 paragraph cited earlier, the specification pragmatically acknowledged that implementations will be forced to limit cookie jar sizes - and then, confusingly, demanded that no fixed limits be put in place, yet specified minimum limits to be obeyed by implementers.

What proved to be missing is any advice on a robust jar pruning algorithm, or even a brief discussion of the security considerations associated with this process; any implementation that enforces the recommended minimums - 300 cookies globally, 20 cookies per unique host name - is clearly vulnerable to a trivial denial-of-service attack: the attacker may use wildcard DNS entries (a.example.com, b.example.com, ...), or even just a couple of throw-away domains, to exhaust the global limit, and have all sensitive cookies purged - kicking the user out of any web applications he is currently logged into. Whoops.
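
A minimal simulation of that attack, assuming a jar capped at the RFC-recommended 300 cookies with plain FIFO eviction (the class, the cap, and the domain names are illustrative, not lifted from any particular browser):

  from collections import OrderedDict

  GLOBAL_LIMIT = 300                        # the RFC-recommended minimum, used as a hard cap

  class ToyCookieJar:
      """FIFO-pruned jar: once the global limit is hit, the oldest cookie goes."""
      def __init__(self):
          self.jar = OrderedDict()          # (domain, name) -> value
      def set(self, domain, name, value):
          if len(self.jar) >= GLOBAL_LIMIT:
              self.jar.popitem(last=False)  # evict the oldest entry
          self.jar[(domain, name)] = value

  jar = ToyCookieJar()
  jar.set("bank.example", "SESSION", "secret-session-id")

  # An attacker-controlled page fills the jar from throw-away wildcard subdomains...
  for i in range(GLOBAL_LIMIT):
      jar.set(f"x{i}.attacker.example", "junk", "A" * 64)

  # ...and the victim's session cookie is silently gone.
  print(("bank.example", "SESSION") in jar.jar)   # False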

It is worth noting that given proper warning, browser vendors would not find it significantly more complicated to structure the limits differently, enforce them on functional domain level, or implement pruning strategies other than FIFO (e.g., taking cookie use counters into account). Convincing them to make these changes now is more difficult.

While the ability to trash your cookie jar is perhaps not a big deal - or rather, since the ability of sites to behave disruptively is poorly mitigated at the HTML or JavaScript level anyway, it makes for a boring topic - the weakness has special consequences in certain contexts; see the next section for more.

Be my very special cookie

Two special types of HTTP cookies are supported by all contemporary web browsers: secure, sent only on HTTPS navigation (protecting the cookie from being leaked to, or tampered with by, rogue proxies); and httponly, exposed only to HTTP servers and not visible to JavaScript (protecting the cookie against cross-site scripting flaws).
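
For reference, both flags are just additional attributes on the Set-Cookie header; the cookie name and value below are made up:

  Set-Cookie: SID=0a1b2c3d; path=/; domain=.example.com; secure; httponly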

Although these ideas appear to be straightforward, the way they were specified implicitly allowed a number of unintended possibilities - all of which, predictably, plagued web browsers through the years. Consider the following questions:

  • Should JavaScript be able to set httponly cookies via document.cookie?

  • Should non-encrypted pages be able to set secure cookies?

  • Should browsers hide jar-stored httponly cookies from APIs offered to plugins such as Flash or Java?

  • Should browsers hide httponly Set-Cookie headers in server responses shared with XMLHttpRequest, Flash, or Java?

  • Should it be possible to drop httponly or secure cookies by overflowing the "plain" cookie jar in the same domain, and then replacing them with vanilla lookalikes?

  • Should it be possible to drop httponly or secure cookies by setting tons of httponly or secure cookies in other domains?

All of this is formally permitted - and some of the aforementioned problems are prevalent to this day, and likely will not be fixed any time soon.

At first sight, the list may appear inconsequential - but these weaknesses have profound consequences for web application design in certain environments. One striking example is rolling out HTTPS-only services that are intended to withstand rogue, active attackers on open wireless networks: if secure cookies can be injected on easy-to-intercept HTTP pages, it suddenly gets a whole lot harder.
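
To make the gap more tangible, below is a Python sketch of the kind of jar-level rule that would close it - refusing writes that could mint, overwrite, or shadow secure or httponly cookies from the wrong context. It is a sketch of a possible safeguard, not a description of what browsers actually do:

  def may_set(jar, new_cookie, from_https, from_javascript):
      """Toy policy check applied before honoring a Set-Cookie header or a
      document.cookie write; `jar` is a list of dicts with 'name', 'secure',
      and 'httponly' keys."""
      if new_cookie.get("secure") and not from_https:
          return False              # plain HTTP must not mint "secure" cookies
      if new_cookie.get("httponly") and from_javascript:
          return False              # document.cookie must not create httponly cookies
      for existing in jar:
          if existing["name"] != new_cookie["name"]:
              continue
          if existing["secure"] and not from_https:
              return False          # ...nor overwrite or shadow an existing secure cookie
          if existing["httponly"] and from_javascript:
              return False          # scripts cannot clobber an httponly cookie
      return True

  jar = [{"name": "SID", "secure": True, "httponly": True}]
  print(may_set(jar, {"name": "SID"}, from_https=False, from_javascript=False))  # False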

If it tastes good, who cares where it comes from?

Cookies diverge from the JavaScript same-origin model in two fairly important and inexplicable ways:
  • domain= scoping is significantly more relaxed than SOP, paying no attention to protocol, port number, or exact host name. This undermines the SOP-derived security model in many compartmentalized applications that also use cookie authentication. The approach also makes it unclear how to handle document.cookie access from non-HTTP URLs - historically leading to quite a few fascinating browser bugs (set location.host while on a data: page and profit!).

  • path= scoping is considerably stricter than what's offered by SOP - but because the same-origin policy itself pays no attention to paths, a script served from any other path within the same origin can still reach these cookies through the DOM, which makes the mechanism completely useless from the security standpoint. Web developers misled by this often mistakenly rely on path scoping for security compartmentalization; heck, even reputable security consultants get it completely wrong.

On top of this somewhat odd scoping scheme, conflict resolution is essentially ignored in the specification; every cookie is identified by a name-domain-path tuple, allowing identically named but differently scoped cookies to coexist and apply to the same request - but the standard fails to provide servers with any metadata to assist in resolving such conflicts, and does not even mandate any particular ordering of such cookies.

This omission adds another interesting twist to the httponly and secure cookie cases; consider these two cookies:

Set on https://www.example.com/:
  FOO=legitimate_value; secure; domain=www.example.com; path=/

Set on http://www.example.com/:
  FOO=injected_over_http; domain=.example.com; path=/

The two cookies are considered distinct, so any browser-level mechanism that limits the attacker's ability to clobber secure cookies will not kick in. Instead, the server will at best receive both FOO values in a single Cookie header, their ordering dependent on the browser and essentially unpredictable (and at worst, the cookies will get clobbered - a problem in Internet Explorer). What next?
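
From the server's perspective, the request then carries something along the lines of the snippet below, with the ordering picked arbitrarily here because it is browser-dependent; a hypothetical Python illustration of how little a naive consumer has to work with:

  # What the server actually receives - secure/domain/path attributes are never echoed back.
  raw = "Cookie: FOO=injected_over_http; FOO=legitimate_value"

  pairs = [p.split("=", 1) for p in raw[len("Cookie: "):].split("; ")]
  print(pairs)        # [['FOO', 'injected_over_http'], ['FOO', 'legitimate_value']]

  # A framework that folds this into a dict silently keeps only one of the two:
  print(dict(pairs))  # {'FOO': 'legitimate_value'} - or the other value, in another browser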

Character set murder mystery

The HTTP/1.0 RFC technically allowed high-bit characters in HTTP headers without further qualification; the HTTP/1.1 RFC later disallowed them. Neither of these documents provided any guidance on how such characters should be handled when encountered, though: rejected, transcoded to 7-bit, treated as ISO-8859-1, as UTF-8, or perhaps handled in some other way.

The specification for cookies further aggravated this problem, cryptically stating:

"If there is a need to place such data in the name or value, some encoding method such as URL style %XX encoding is recommended, though no encoding is defined or required."

There is an obvious problem with saying that you can use certain characters, but that their meaning is undefined; the systemic neglect of this topic has profound consequences in two common cases where user-controlled values frequently appear in HTTP headers: Content-Disposition is one (eventually "solved" with browser-specific escaping schemes); another is, of course, the Cookie header.

As can be expected, given such poor advice, implementers ended up with the least sensible approaches; for example, I have a two-year-old bug open with Mozilla (418394): Firefox has a tendency to mangle high-bit values in HTTP cookies, permitting cookie separators (";") to suddenly materialize in place of UTF-8 in the middle of an otherwise sanitized cookie value; this has led to more than one web application vulnerability to date.
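
Until browsers converge on a behavior, the only robust option for an application is to keep cookie values down to conservative ASCII on its own - for instance by percent-encoding them, roughly as sketched below (the helper names are mine):

  from urllib.parse import quote, unquote

  def encode_cookie_value(value):
      """Percent-encode so that only unambiguous ASCII goes on the wire; ';', '=',
      spaces, and all high-bit characters are escaped before anything can mangle them."""
      return quote(value, safe="")

  def decode_cookie_value(raw):
      return unquote(raw)

  v = encode_cookie_value("za\u017c\u00f3\u0142\u0107; admin=1")
  print(v)                        # za%C5%BC%C3%B3%C5%82%C4%87%3B%20admin%3D1
  print(decode_cookie_value(v))   # the original string back, separators intact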

A session is forever

The last problem I want to mention in this post is far less pressing - but is also an interesting testament to the shortcomings of the original design.

For some reason, presumably due to privacy concerns, the specification decided to distinguish between session cookies, meant to be non-persistent, and cookies with a specified expiration date, which may persist across browser sessions, be stored on disk, and be subject to additional client-enforced restrictions. On the topic of the longevity of the former class of cookies, the RFC conveniently says:

"Each session is relatively short-lived."

Today, however, this is obviously not true, and the distinction feels misguided: with the emergence of portable computers with suspend functionality, and the increased shift toward web-oriented computing, users tend to keep browsers open for weeks or months at a time; session cookies may also be stored and then recovered across auto-updates or software crashes, allowing them to live almost indefinitely.

When session cookies routinely persist longer than many definite-expiry ones, and yet are used as a more secure and less privacy-invasive alternative, we obviously have a problem. We probably need to rethink the concept - and either ditch them altogether, or impose reasonable no-use time limits at which such cookies are evicted from the cookie jar.

Closing words

I find it quite captivating to see the number of subtle problems caused by such a simple and seemingly harmless scheme. It is also depressing how poorly documented and fragile the design remains some 15 years later; and that the introduction of well-intentioned security mechanisms, such as httponly, has only contributed to the misery. An IETF effort to document and clarify some of the security-critical aspects of the mechanism is underway only now - but it won't be able to fix them all.

Some of the telltale design patterns - rapid deployment of poorly specified features, or leaving essential security considerations as "out of scope" - are still prevalent today in the browser world, and can be seen in the ongoing HTML5 work. Hopefully, that's where the similarities will end.

27 comments:

  1. Do you think HTML5's sessionStorage and localStorage can eventually replace the use of cookies completely and do they have any of the same problems?

  2. They do not have many of these problems, most notably the ones related to broad scoping and the general divergence from DOM SOP. Local storage also does not feature any pruning mechanisms, leaving the burden of keeping the store clean on the application itself.

    These two properties unfortunately make the mechanism a bit harder to use, so I do not see it displacing cookies in the short run. Local storage also suffers from fairly poor visibility and management capabilities today, but this can be corrected.

  3. Cookies are just one small example of how HTTP is a *horrible* protocol itself, which has been distorting TCP for entirely too many years.

  4. I've been promoting separate session mechanisms in browsers for years. Something like a public/private keypair in the browser to sign session challenges from the server. In private mode the keypair could be generated per site and web session, in normal mode the keypair could be generated per site or even per browser invocation.

    Any thoughts on that?

    Also, I've found that app servers such as Tomcat enforce different requirements on outgoing versus incoming cookies. It, for instance, accepts JavaScript as a cookie value on the way out but silently drops such cookies on the way in.

  5. Nice article. But why is it posted on an anonymous blog? Why doesn't the author take ownership?

    And Mark Atwood, why is HTTP a horrible protocol? You do know cookies are not part of the HTTP protocol?

  6. Both the cookies and the user agent have been left up to browser implementations for far too long, making use of that data rather broken.

    Even the "referrer" is no longer very useful between url shorteners and sites that produce 250 character long url's to maintain session state when cookies are disabled.

  7. Sounds to me like a classic case of "do the least thing that can possibly work."

  8. @Mark Atwood - I feel the same way. A new protocol should be created and http should be retired.

  9. I think the problem with HTTP is that it was never meant to scale to the proportion that it's at today. It was originally designed as a replacement to Gopher (which was a text-only document web system) to provide graphics and sound capabilities.

    Perhaps someday someone will roll out a new web application system that does not rely on HTTP...

  10. This is the most comprehensive, digestible, high-level post on the state of cookie implementations I've seen to date. Well done. Obviously each one of your sections can be blown out into its own document, but you do a great job hitting a broad set of issues around cookie impls.

    Lou and I worked together on the cookie implementation way back in the day at Netscape. We constantly got requests for "how do cookies work?," or "where's the cookie spec?" As it turns out, to this day, the only real "spec" is the original description document (http://curl.haxx.se/rfc/cookie_spec.html). That doc was hosted on netscape.com until a few years ago when AOL knocked it down.

    The RFCs never really went anywhere, and all browsers yield to the original description document for implementation (with some storage related nuances that aren't as interesting as the underlying rules for domain scoping and expiration). Cookie reflection in JS is yet another manual, but suffice it to say, the JS objects ultimately rely on the underlying HTTP cookie/set-cookie headers, so if you ever want to fully understand how cookies work, you just have to grok the description doc as it overrides any JS foolery.

    long live cookie_spec.html!

  11. @John Wilander: Adam Barth proposed "cake" as a lightweight alternative to HTTP cookies (http://tools.ietf.org/html/draft-abarth-cake-00). That said, I sort of suspect we're stuck with cookies, the same way we're stuck with SMTP.

    @Edwin Martin: There is no particular conspiracy, it's just my blog (and I'm pretty well-known by that handle); it's linked from my home page, and so forth.

    @amy: Failure to account for the unknown is unfortunate, but generally excusable; the really frustrating part about HTTP cookies is that they were designed and implemented not that long ago, when we already understood the security properties of the Internet pretty well; and it failed to account for *existing* considerations, such as the whole ccTLD mess. They narrowly predate DOM SOP, but both technologies were devised by the same product team, too.

  12. @Edwin Martin - HTTP is broken in several ways. I will list some of the more egregious ones here:

    * The Slashdot effect is a direct result of how HTTP works, and it should not happen. A well architected protocol would allow even a server on a DSL line to host something incredibly popular. There will always be scalability issues for websites that interact with their users, but most websites serve up relatively static data, and they should not break under heavy load, no matter how tiny a pipe they're at the end of.
    * Lack of any kind of session handling. Cookies are a poor man's attempt to graft this on.
    * Not encrypted by default.
    * The one attempt to make encryption work breaks the ability of one server to host multiple websites.
    * The deployed encryption solution requires trusting a small number of entities with nebulous responsibilities that they are not necessarily well-equipped to carry out.
    * Lack of any kind of built-in session support means that in order for a POST to be secure from XSS attacks you have to be inordinately careful with exactly how it's implemented (i.e. your POST data must indicate which session it's from in order to disambiguate it from a random POST generated from another website).

  13. Cookies are just one small example of how HTTP is a *horrible* protocol itself, which has been distorting TCP for entirely too many years.

    TCP itself is nothing to write home about either. In an ideal world, a well designed HTTP replacement wouldn't use it at all. Something like SCTP would be much better.

  14. @Omnifarious

    * How does your well architected protocol handle hundreds or thousands of document requests per second on a bandwidth starved DSL line? Imagination? Doing something like that even now is complicated & resource intensive. Something like bittorrent is neither lightweight nor simple to implement. If you're talking about server resources being overloaded, that is not an issue with a properly configured server. NGINX, Apache, LigHTTPd, even IIS can handle a Slashdotting. Parsing HTTP headers is not intensive; it's usually dynamic languages or database backends that chew up your CPU cycles. Which has zero to do with HTTP.

    * Long lived session handling was not the original intent of HTTP. The only command available when it was introduced was "GET". It was meant to be used in the context of client requesting, server serving. It's like asking why your bran flakes don't have marshmallows in them. Sure maybe it'd be nice, but usually people are eating bran flakes to get fiber, not marshmallows.

    * How many protocols out there are encrypted by default? How about the resource costs involved 20+ years ago as far as encrypting or decrypting traffic? HTTP was initially used and somewhat designed to be used between trusted sources. Vital information like people's credit card numbers weren't being transferred back then & wouldn't really be used en masse for about a decade. Man-in-the-middle attacks were unlikely between large universities & even if it did occur, vital information wasn't meant to be transferred over HTTP in the first place.

    * Blaming HTTPS' failings on HTTP is shortsighted.

    * Sounds like you're repeating yourself.

    HTTP was designed in the context of clients requesting and servers serving. The web exploded in popularity throughout the 90s, but a lot of people did not take the web seriously and I think this led to a lot of people rushing for solutions rather than thinking everything through. Combine this with the fact that no one really was in charge of the web, it morphed almost like a virus. This can be seen in HTTP's evolution, HTML's evolution & Javascript's evolution. There were people doing a bunch of things slightly differently, but no one struck out on their own to create a "New HTTP" because more than likely the other guys wouldn't support it and no one would care, you'd just waste money inventing a new protocol. If you were to walk in somewhere and say "oh hey, I've got this great new software that's going to fix all your problems, but it doesn't work with any of your current server software/programs or any of the existing client software & the protocol & methodology is different than what your staff is trained to support, but I assure you once that's all worked out it's really great...". People aren't going to buy into it.

    In a way this hasn't been a bad thing as it's prevented certain companies from converting the web/internet into their own private proprietary network. I wonder what would have happened had MS not been shortsighted and had invested heavily in dominating the web/internet. We might have seen a very different landscape than we do now. Perhaps with a protocol worse than HTTP, even though HTTP is not terrible for what it was meant to do. Yes we've probably outgrown it, but it has gotten us this far & there aren't any viable alternatives in the near future so we must deal with it.

  15. @Gabriel - I do not care why HTTP was designed or when. I care that it is currently inadequate and downright dangerous for what it's being used for. Understanding its history is only useful from the standpoint of designing for the future. I never understand why people want to defend something that is inadequate and bad by saying that it was designed that way in the past. It's not like I personally attacked the designers, and by defending the original design decisions it's implied that I am.

    In a well architected protocol the vast majority of requests would not even talk to the original server. The server would hand out a few signed statements stating which bits of static data corresponded to which URLs and then a distributed cache would handle the rest. Ideally, the information about which site corresponded to which public key would be cached and so the server could go completely offline and the website wouldn't go down.

    Bittorrent is totally unsuited to what HTTP is used for because it doesn't really help until you have at least 50M to transfer, and I wouldn't imagine that a bittorrent-like protocol would be an appropriate replacement for HTTP.

    Not bothering to design better things because of a set of things that already exist that seem to solve a similar problem would've resulted in a whole host of things that we take for granted today never having been created. If something is better in the right ways, and captures the imaginations of the right people, it will be switched to.

    And you're right. I began realizing that a lot of the problems I was highlighting can be traced to either poor handling of encryption, poor session management, poor caching, or some combination thereof.

    Personally, I blame HTTP and its failings on the current abysmal state of telecommunications. So many people designing an infrastructure for centralized control and management because it's what they're used to and the architecture of HTTP encouraged it. If protocols treated everybody as first class citizens from the outset, maybe the cable and telephone companies wouldn't have decided that an asymmetric physical infrastructure is what suited their customers the best.

  16. @Mark Butler - SCTP is to TCP as Subversion is to CVS.

  17. lcamtuf,
    You do point out problems with the specification and standardization process, but what suggestions do you have for fixing the problem? Do you wish to do away with the IETF for failing to produce an RFC consistent with real world implementations? Do you want to cast blame on the browsers for not breaking their existing user base to conform to the whims of the IETF? Do you propose some other standards body that can produce a perfect spec? Would you rather have no cookies at all?

    Certainly anything created by humans will have flaws. My credo when creating cookies, was and still is today: "perfect is the enemy of done". I would say that my experience dealing with the IETF to try and produce a standard that was in tune with reality was extremely frustrating, disappointing and not one I would like to repeat any time soon.

    Having been through the standardization "ringer", I am all too aware of the pitfalls of the process and I am eager to hear your solution to the problem. I am sure you will be able to design a process in which everyone will be heard and yet a thoughtful, complete and perfect specification will be produced in a period of time suitable for an implementation.

    I look forward to hearing your solution,
    :lou

  18. I do not have any good solutions, and I am also acutely aware of the fact that in the web world, standardization has always been an afterthought, often for political reasons. For example, during the first browser wars, W3C seemed to struggle to keep up documenting where browsers were actually headed; and some vendors certainly weren't helping to get there.

    However, cookies are a fairly recent development that proved to be particularly problematic in security engineering - and an example that seems striking in that this standardization catch-up never happened here; that both the original documentation and the subsequent RFCs diverged from reality more than would seem prudent; and more importantly, that all standardization aside, there were quite a few things that probably should have been obvious when drafting the proposals.

    These things probably could have been done better with the resources at hand, including the then-emerging security community.

    That said, hindsight is always 20/20, so I certainly can take my own views with a grain of salt. My point isn't pointing fingers, but rather pointing out that this is an interesting study in the pitfalls we should probably be trying to avoid for future web technologies, no matter how simple they may at first appear.

  19. Michal,
    To put it in perspective, cookies may seem fairly recent, but they do indeed predate many other technologies of the web. Javascript or any web browser scripting language did not exist, nor did SSL or XML or AJAX. (HTTP/1.1, flash, frames, java, etc...) Netscape was about 14 people in the summer of '94 when cookies were designed.

    The domain problem that you use as an example is interesting in that there is no solution that can satisfy everyone. DNS names don't work in a way that can solve the problem, yet that is what we have to work with. The compromise of the two dot rule works for many cases, but can't work for all of the country domains. This was one of the issues debated at length in the IETF RFC process and yet no proposal ever substantially improved on the original simple design. Perhaps we could have eliminated the feature, but that could have had an adverse security impact since it would preclude separate servers for a secure checkout process.

    I think that there is perhaps a common belief that problems with implementations are there because they were never discussed, or were designed by idiots. The truth is usually far more complicated and is hard to express in a final form document.

    Cookies, along with SSL and other features put forth by Netscape at the time, helped to foster trust and commerce on the WWW. These features helped to bring interest and capital to the platform allowing us to have this conversation via the web rather than on AOL or MSN. Waiting for further review prior to implementation might have made things more seamless, but it might have caused the web to never have grown into what it is today.

  20. Yeah, sure - unlike some of the commenters here and on Slashdot, I am pretty far from claiming that cookies are harmful, were designed by clueless people, or that we have not benefitted from this technology in some important and interesting ways. In fact, I do think that the net effect of pretty much any web-related technology embraced in the past 15 years is quite positive, so in a sense, arguing over the fine print is mostly pointless.

    Now, cookies proved to be far more problematic than SOP proper, and it's a shame their security properties were not explored and documented well - which makes them interesting from my perspective. I am not singling this technology out, though - I also had posts about problems with other core bits of the web, such as framed browsing:

    http://lcamtuf.blogspot.com/2010/10/attack-of-monster-frames-mini.html

    ...and defended cookies in the past on other counts:

    http://lcamtuf.blogspot.com/2010/08/cookies-v-people.html

  21. I wonder why we need cookies at all. It'd be better if users had user-specific credentials that when combined with a unique key per website, the domain name or a longer URL component indicating the unique site, would generate a unique one-way hash per user per website that would be reasonably permanent and transfer between browsers and computers by simply typing a couple key phrases in. A semi-permanent anonymous unique id. Why store anything more on the user's computer?

  22. There are benefits to being able to keep application state on the client side, especially in large-scale websites. But I am not sure how big of a deal this is; Adam's "cake" proposal (linked in earlier comments) does away with this aspect.

  23. @Omnifarious

    The Slashdot effect is not a direct result of how HTTP operates; rather it is a shortcoming in their path to their consumers.

    The HTTP protocol provides room to account for crowd-sourcing; as the host of the document(s) you need to ensure that you properly establish the caching headers _and_ your hosting path has HTTP caching/proxy servers in that path. Given proper caching information and an ISP provider (the most important aspect) a site running over 54Kb modem connection can easily serve well over 1 million users. It really comes down to how far is your static content cached and distributed.

    In any case you are confusing state management with security/authentication. This is why authentication at the HTTP layer is a different header from cookies. When you read that requests should be performed over HTTPS, don't presume that it means to protect cookies, but everything...especially the traditional BASIC authentication.

    If you're really concerned about security vulnerabilities, then enforce Digest authentication. You'll get running nonces to avoid replay attacks that you have with Basic auth; while avoiding the 'overhead' of always using an SSL connection. If that's not secure enough then insist that all your HTTP clients must use SSL certificates to login to your server.

    In all cases you'll find that your fallacy is in assuming that HTTP cookies are the same as an authentication token. They are not. Cookies are a memento--a value that the client just passes back when the conditions match.

    Don't agree? Then come up with a new application protocol that all web providers will allow to pass over their networks and nearly all computers in the world will understand. Best of luck, until then I'll ensure that I keep a separation of my cookies from AuthC/AuthZ.

    brs,

    eduardo

  24. And all these problems surface on cookies, a well-known feature of HTTP. Just imagine what happens to semi-forgotten protocols like Web Proxy Autodiscovery.

    Like cookies, WPAD relies on the number of dots to prevent you from setting up a proxy for entire countries. But some implementations are broken, and some TLDs include more than one dot... you get the point.

    A Polish guy actually registered many "interesting" domains and was apparently trying to intercept traffic to two Polish e-banking sites. We have the full story on our blog
    (http://allievi.sssup.it/techblog/?cat=36).

  25. "Character set murder mystery"
    To be honest, I think many protocols have this mistake. Particularly in the earlier days, designers would define a set of 8-bit octets to mean "ASCII" without even thinking about what values above 127 mean.

