{{Short description|Capability that can be built into web servers and web clients}}
{{HTTP}}
'''HTTP compression''' is a capability that can be built into [[web server]]s and [[web client]]s to improve transfer speed and bandwidth utilization.<ref>{{cite web|url=http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/d52ff289-94d3-4085-bc4e-24eb4f312e0e.mspx?mfr=true|title=Using HTTP Compression (IIS 6.0)|publisher=Microsoft}}</ref>
There are two different ways compression can be done in HTTP. At a lower level, a ''Transfer-Encoding'' header field may indicate that the payload of an HTTP message is compressed. At a higher level, a ''Content-Encoding'' header field may indicate that a resource being transferred, cached, or otherwise referenced is compressed.
==Compression scheme negotiation==
1. The [[web client]] advertises which compression schemes it supports by including a list of tokens in the [[HTTP request]]. For ''Content-Encoding'', the list is in a field called ''Accept-Encoding''; for ''Transfer-Encoding'', the field is called ''TE''.
<syntaxhighlight lang="http">
GET /encrypted-area HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip, deflate
</syntaxhighlight>
2. If the server supports one or more compression schemes, the outgoing data may be compressed by one or more methods supported by both parties. If this is the case, the server will add a ''Content-Encoding'' or ''Transfer-Encoding'' field in the HTTP response with the used schemes, separated by commas.
<syntaxhighlight lang="http">
HTTP/1.1 200 OK
Date: Mon, 26 Jun 2016 22:38:34 GMT
Content-Type: text/html; charset=UTF-8
Content-Encoding: gzip
</syntaxhighlight>
The [[web server]] is by no means obligated to use any compression method – this depends on the internal settings of the web server and also may depend on the internal architecture of the website in question.
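The negotiation steps above can be sketched in server-side code. The following Python fragment is an illustrative sketch only (the function names are invented, and real servers also honor quality values and the ''identity'' token): it picks the first client-advertised scheme the server supports and compresses the response body accordingly.

<syntaxhighlight lang="python">
import gzip
import zlib

SUPPORTED = ("gzip", "deflate")  # schemes this hypothetical server offers

def negotiate_encoding(accept_encoding):
    """Return the first client-listed token the server supports, or None."""
    for token in accept_encoding.split(","):
        scheme = token.split(";")[0].strip()   # drop any ;q=... quality value
        if scheme in SUPPORTED:
            return scheme
    return None

def compress_body(body, scheme):
    """Apply the negotiated content coding to the response body."""
    if scheme == "gzip":
        return gzip.compress(body)     # RFC 1952 gzip container
    if scheme == "deflate":
        return zlib.compress(body)     # RFC 1950 zlib-wrapped stream
    return body                        # identity: no transformation

scheme = negotiate_encoding("gzip, deflate")
payload = compress_body(b"<html>hello</html>" * 100, scheme)
# The server would now add "Content-Encoding: gzip" to the response headers.
</syntaxhighlight>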
==Content-Encoding tokens==
The official list of tokens available to servers and clients is maintained by IANA,<ref>{{cite web|url=https://www.iana.org/assignments/http-parameters/http-parameters.xhtml|title=Hypertext Transfer Protocol (HTTP) Parameters|publisher=IANA}}</ref> and it includes:
*br – [[Brotli]], a compression algorithm specifically designed for HTTP content encoding, defined in {{IETF RFC|7932|link=no}}
*[[compress]] – UNIX "compress" program method (historic; deprecated in most applications and replaced by gzip or deflate)
*deflate – compression based on the [[DEFLATE|deflate]] algorithm (described in {{IETF RFC|1951|link=no}}), wrapped inside the [[zlib]] data format ({{IETF RFC|1950|link=no}})
*exi – W3C [[Efficient XML Interchange]]
*[[gzip]] – GNU zip format (described in {{IETF RFC|1952|link=no}})
*[[Identity function|identity]] – No transformation is used. This is the default value for content coding.
*[[Pack200|pack200-gzip]] – Network Transfer Format for Java Archives<ref>{{cite web|url=https://jcp.org/en/jsr/detail?id=200|title=JSR 200: Network Transfer Format for Java Archives|publisher=The Java Community Process Program}}</ref>
*[[zstd]] – [[Zstandard]] compression, defined in {{IETF RFC|8878|link=no}}
In addition to these, a number of unofficial or non-standardized tokens are used in the wild by either servers or clients:
*[[bzip2]] – compression based on the free bzip2 format, supported by [[lighttpd]]<ref>{{cite web|url=http://redmine.lighttpd.net/projects/1/wiki/Docs_ModCompress|title=ModCompress - Lighttpd|publisher=lighty labs}}</ref>
*[[Lempel–Ziv–Markov_chain_algorithm|lzma]] – compression based on (raw) LZMA is available in Opera 20, and in elinks via a compile-time option<ref>[http://elinks.or.cz/documentation/html/manual.html-chunked/ch01s07.html#CONFIG-LZMA elinks LZMA decompression]</ref>
*peerdist – Microsoft Peer Content Caching and Retrieval<ref>{{cite web|url=http://msdn.microsoft.com/en-us/library/dd304322%28v=PROT.10%29.aspx|title=[MS-PCCRTP]: Peer Content Caching and Retrieval: Hypertext Transfer Protocol (HTTP) Extensions|publisher=Microsoft}}</ref>
*[[rsync]]<ref>{{cite web |title=rproxy: Protocol Definition for HTTP rsync Encoding |url=https://rproxy.samba.org/doc/protocol/protocol.html |website=rproxy.samba.org}}</ref> – [[Delta_encoding#Delta_encoding_in_HTTP|delta encoding in HTTP]], implemented by a pair of ''rproxy'' proxies.
*xpress
*[[XZ Utils|xz]]
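Two of the tokens above, gzip and deflate, share the same underlying DEFLATE algorithm and differ only in framing. A short Python check, using only the standard library, illustrates this:

<syntaxhighlight lang="python">
import gzip
import zlib

data = b"The quick brown fox jumps over the lazy dog. " * 50

gz = gzip.compress(data)  # "gzip" token: RFC 1952 container around DEFLATE
zl = zlib.compress(data)  # "deflate" token: RFC 1950 zlib wrapper around DEFLATE

assert gz[:2] == b"\x1f\x8b"  # gzip magic number
assert zl[0] & 0x0F == 0x08   # zlib CMF byte: compression method 8 = DEFLATE

# Both round-trip to the identical original bytes:
assert gzip.decompress(gz) == data
assert zlib.decompress(zl) == data
</syntaxhighlight>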
==Servers that support HTTP compression==
*[[SAP NetWeaver]]
*[[Internet Information Services|Microsoft IIS]]: built-in or using third-party module
*[[Apache HTTP Server]], via '''[[mod_deflate]]''' (which, despite its name, only supports gzip)
*[[Hiawatha (web server)|Hiawatha HTTP server]]: serves pre-compressed files<ref>{{cite web|url=http://www.hiawatha-webserver.org/manpages|title=Extra part of Hiawatha webserver's manual}}</ref>
*[[Cherokee (Webserver)|Cherokee HTTP server]], on-the-fly gzip and deflate compression
*[[Oracle iPlanet Web Server]]
*[[Zeus Web Server]]
*[[lighttpd]]
*[[nginx]] – built-in
*Applications based on [[Tornado (web server)|Tornado]], if "compress_response" is set to True in the application settings (for versions prior to 4.0, set "gzip" to True)
*[[HAProxy]]
*[[Varnish (software)|Varnish]] – built-in; also works with [[Edge Side Includes|ESI]]
*[https://line.github.io/armeria/ Armeria] – Serving pre-compressed files<ref>{{cite web|url=https://line.github.io/armeria/server-http-file.html#serving-pre-compressed-files|title=Serving static files part of Armeria's documentation}}</ref>
*[[NaviServer]] – built-in, dynamic and static compression
*[[Caddy (web server)|Caddy]] – built-in via [https://caddyserver.com/docs/caddyfile/directives/encode encode]
Many [[content delivery network]]s also implement HTTP compression to speed up delivery of resources to end users.
Compression in HTTP can also be achieved by using the functionality of [[server-side scripting]] languages like [[PHP]], or programming languages like [[Java (programming language)|Java]].
Various online tools exist to verify a working implementation of HTTP compression. These online tools usually request multiple variants of a URL, each with different request headers (with varying Accept-Encoding content). HTTP compression is considered to be implemented correctly when the server returns a document in a compressed format.<ref>{{cite web|url=https://httptools.dev/gzip-brotli-check|title=How does the gzip compression check work?|website=httptools.dev|access-date=10 April 2022}}</ref> By comparing the sizes of the returned documents, the effective compression ratio can be calculated (even between different compression algorithms).
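The comparison such tools perform can be simulated locally. In this Python sketch, two made-up response bodies stand in for fetches of the same URL with different ''Accept-Encoding'' headers, and the effective compression ratio is computed from their sizes:

<syntaxhighlight lang="python">
import gzip

def compression_ratio(identity_size, compressed_size):
    """Effective ratio between the uncompressed and compressed responses."""
    return identity_size / compressed_size

# Stand-ins for two responses to the same URL:
body = b"<html><body>" + b"<p>lorem ipsum dolor</p>" * 200 + b"</body></html>"
identity_response = body               # response to Accept-Encoding: identity
gzip_response = gzip.compress(body)    # response to Accept-Encoding: gzip

ratio = compression_ratio(len(identity_response), len(gzip_response))
assert ratio > 1  # the compressed variant is smaller, so compression works
</syntaxhighlight>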
==Problems preventing the use of HTTP compression==
A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted<ref name="google-use-compression">{{cite web|url=https://developers.google.com/speed/articles/use-compression|title=Use compression to make the web faster|publisher=Google Developers}}</ref> daily due to increased page load time when users do not receive compressed content.
Another problem found while deploying HTTP compression on a large scale is due to the '''deflate''' encoding definition: while HTTP 1.1 defines the '''deflate''' encoding as data compressed with deflate (RFC 1951) inside a [[zlib]] formatted stream (RFC 1950), Microsoft server and client products historically implemented it as a "raw" deflated stream,<ref>{{cite web|url=https://stackoverflow.com/questions/9170338/why-are-major-web-sites-using-gzip/9186091#9186091|title=deflate - Why are major web sites using gzip?|publisher=Stack Overflow}}</ref> making its deployment unreliable.
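The discrepancy can be reproduced with Python's zlib module. The sketch below shows that a stream produced under the historical "raw" interpretation is rejected by a decoder expecting the RFC 1950 wrapper, unless that decoder falls back to raw mode:

<syntaxhighlight lang="python">
import zlib

data = b"interoperability " * 40

# "deflate" as HTTP/1.1 defines it: RFC 1951 data inside an RFC 1950 wrapper
zlib_stream = zlib.compress(data)

# The historical "raw" interpretation: RFC 1951 data with no wrapper at all
raw_compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
raw_stream = raw_compressor.compress(data) + raw_compressor.flush()

# A strict decoder expecting the zlib header rejects the raw stream ...
try:
    zlib.decompress(raw_stream)
    header_accepted = True
except zlib.error:
    header_accepted = False
assert not header_accepted

# ... but a lenient decoder can fall back to raw mode (negative wbits):
assert zlib.decompress(raw_stream, wbits=-zlib.MAX_WBITS) == data
assert zlib.decompress(zlib_stream) == data
</syntaxhighlight>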
==Security implications==
{{main article|CRIME|BREACH}}
Compression allows a form of [[chosen plaintext]] attack to be performed: if an attacker can inject any chosen content into the page, they can know whether the page contains their given content by observing the size increase of the encrypted stream. If the increase is smaller than expected for random injections, it means that the compressor has found a repeat in the text, i.e. the injected content overlaps the secret information. This is the idea behind CRIME.
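This size side channel is easy to reproduce with a general-purpose compressor. In the illustrative Python sketch below (the page content and secret are invented), the length of the compressed stream alone distinguishes a correct guess from a wrong one:

<syntaxhighlight lang="python">
import zlib

# A page containing a secret, plus attacker-controlled reflected content:
page = (b"<html>" + b"filler text for the page body " * 20
        + b"Set-Cookie: sessionid=S3CRET;</html>")

def observed_length(guess):
    """Compressed size an eavesdropper sees for the page plus injection."""
    return len(zlib.compress(page + b"sessionid=" + guess))

# A guess that overlaps the secret compresses better (the repeat is cheap):
matching = observed_length(b"S3CRET")
wrong = observed_length(b"QWJZKV")  # characters absent from the page
assert matching < wrong
</syntaxhighlight>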
In 2012, a general attack against the use of data compression, called [[CRIME]], was announced. While the CRIME attack could work effectively against a large number of protocols, including but not limited to TLS, and application-layer protocols such as SPDY or HTTP, only exploits against TLS and SPDY were demonstrated and largely mitigated in browsers and servers. The CRIME exploit against HTTP compression has not been mitigated at all, even though the authors of CRIME have warned that this vulnerability might be even more widespread than SPDY and TLS compression combined.
In 2013, a new instance of the CRIME attack against HTTP compression, dubbed BREACH, was published. A BREACH attack can extract login tokens, email addresses or other sensitive information from TLS encrypted web traffic in as little as 30 seconds (depending on the number of bytes to be extracted), provided the attacker tricks the victim into visiting a malicious web link.<ref name=Gooin20130801>{{cite web|last=Goodin|first=Dan|title=Gone in 30 seconds: New attack plucks secrets from HTTPS-protected pages |url=https://arstechnica.com/security/2013/08/gone-in-30-seconds-new-attack-plucks-secrets-from-https-protected-pages/|work=Ars Technica|publisher=Condé Nast}}</ref>
As of 2016, the TIME and HEIST attacks are also public knowledge.<ref>{{cite web|last=Sullivan|first=Nick|title=CRIME, TIME, BREACH and HEIST: A brief history of compression oracle attacks on HTTPS |url=https://www.helpnetsecurity.com/2016/08/11/compression-oracle-attacks-https/|website=Help Net Security|date=11 August 2016}}</ref>
==References==
==External links==
*{{IETF RFC|9110|link=no}}: HTTP Semantics
*[http://redmine.lighttpd.net/projects/lighttpd/wiki/Docs:Modcompress Compression with lighttpd]
*[http://www.codinghorror.com/blog/2004/08/http-compression-and-iis-6-0.html Coding Horror: HTTP Compression on IIS 6.0] {{Webarchive|url=https://web.archive.org/web/20140206020708/http://www.codinghorror.com/blog/2004/08/http-compression-and-iis-6-0.html |date=2014-02-06 }}
*{{webarchive |url=https://web.archive.org/web/20110716033901/http://www.15seconds.com/Issue/020314.htm |date=July 16, 2011 |title=15 Seconds: Web Site Compression }}
*[http://www.serverwatch.com/tutorials/article.php/3514866 Using HTTP Compression] {{Webarchive|url=https://web.archive.org/web/20160314155152/http://www.serverwatch.com/tutorials/article.php/3514866 |date=2016-03-14 }} by Martin Brown of Server Watch
*[https://web.archive.org/web/20120430023716/https://banu.com/blog/38/dynamic-and-static-http-compression-with-apache-httpd/ Dynamic and static HTTP compression with Apache httpd]