Intent
Transfer a defined set of information from its owner to an
anonymous client in a form the client understands.
Motivation
Servers often wish to expose information for general client consumption.
In doing so, they do not wish to introduce unnecessary coupling with the
client. GET permits servers to expose information without exposing the method
by which that information is produced, and without needing to keep track of
individual clients.
The server wants to be able to support clients of different ages, some
of whom will share the latest and greatest understanding of how to encode
and parse particular kinds of data. Others will be left with legacy
implementations. Likewise, the server may have been deployed for some time
and may be running a legacy implementation while upgraded clients are present
in the architecture.
Clients seek to minimise the load on the server in processing their requests,
on the network in transferring messages, and on themselves in
processing responses. Clients also need to be able to deal with possible error
conditions, including communication failures.
The GET pattern provides a clean client/server separation that is able to
survive independent upgrades of each over time, exercises control over traffic
and processing waste, and deals with possible errors.
Applicability
GET is appropriate whenever a client wants to acquire the whole
of the information behind a known URL, and can decide when it wants
to issue the request (subject to a cache miss).
Here are some common means by which a URL is discovered:
- Direct entry allows a user to enter the URL through an input device
- Configuration allows a document to be prepared ahead of time with links that have particular meaning to the client
- Hyperlinking is a generalisation of configuration. The document that
contains meaningful links may be acquired from anywhere, including an earlier
completed GET request
- Construction is the assembly of information available to the client into
a URL format agreed with the server. This may be achieved by populating a form
supplied by the server in an earlier GET request.
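Construction can be illustrated with a minimal Python sketch. It assumes the server has agreed a query format with `from` and `to` fields; those field names, the base URL and the `construct_url` helper are hypothetical and only stand in for whatever form the server actually supplies.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def construct_url(base_url: str, fields: dict) -> str:
    """Assemble a URL by filling the query part of base_url with form fields."""
    parts = urlsplit(base_url)
    # Sort the fields so the same inputs always yield the same URL,
    # which keeps cache lookups on the URL stable.
    query = urlencode(sorted(fields.items()))
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


url = construct_url(
    "http://example.com/publication-dates",
    {"from": "2007-01-01", "to": "2007-12-31"},
)
print(url)
```

The resulting URL can then be used in a GET request like any URL discovered by the other means listed above.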
Methods of determining when to issue a GET request include:
- One-shot, the issuing of a request at a predefined time, when the URL
first becomes known, or when the client needs access to the information
- Cyclic, the issuing of a request at a predefined rate while the client is active
- On cache expiry, the issuing of a request whenever a cache entry expires.
Note that this requires the server to set a maximum age on cache entries,
something that is not always provided.
- Otherwise-triggered, the issuing of a request based on some form of
back-channel that indicates information at a given URL may have changed
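The cyclic and on-cache-expiry methods can be combined. The following is a rough Python sketch, assuming the client falls back to a fixed polling period when the server supplies no maximum age. The function name and the fallback period are illustrative, and the sketch deliberately ignores other cache directives and the Age header.

```python
import re
from typing import Optional


def next_request_delay(cache_control: Optional[str], fallback_period: float) -> float:
    """Return the number of seconds to wait before issuing the next GET.

    Uses the max-age directive when the server provided one, otherwise
    falls back to the client's own cyclic polling period.
    """
    if cache_control:
        match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
        if match:
            return float(match.group(1))
    return fallback_period


print(next_request_delay("public, max-age=60", 300.0))
print(next_request_delay(None, 300.0))
```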
Structure
Participants
- Client
  - Keeps a URL that lets it access the Server
  - Issues a GET request that includes a condition and weighted acceptable types list
  - Is capable of parsing all forms that the data might be encoded in that are semantically rich enough to use
  - Selects the right parser implementation to use based on the returned document type
  - (optional) Retains a cache of past successful GET responses and
    their related cache control information
  - Is responsible for overall successful execution of the operation,
    including modifications to the request and resubmissions of the
    request
  - Treats a lost response as equivalent to a Resubmit response with
    no required changes
  - Aborts the operation on a failure response, on a resubmission
    response that cannot or will not be satisfied, or on a lost response
    after too many retries.
- Server
  - Evaluates any condition supplied in the GET request before performing significant processing
  - Selects the information to return based on the supplied URL
  - (optional) Is configured with a mechanism to require the client to resubmit its request with or without modifications
  - Guarantees that a GET request is a read-only operation that is
    never interpreted by the server as a request to "buy an airline
    ticket". The server may choose to update log files and other
    information, but is not free to behave as if the client has requested
    or authorised the change.
  - Can return the requested information in various formats. Any format
    which the client might reasonably request with its acceptable types
    list should be supported.
  - Selects the most appropriate encoding based on the supplied
    weighted acceptable types list and any preference it may have itself,
    and returns the document in that format
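The server's selection of an encoding from the weighted acceptable types list might be sketched as follows. This is a simplified Python illustration, not a complete implementation of HTTP content negotiation: it assumes only `q` parameters matter, and it treats the order of `supported` as the server's own preference. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_accept(header: str) -> Dict[str, float]:
    """Parse a weighted acceptable types list into {media type: weight}."""
    weights = {}
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        weight = 1.0  # a type without a q parameter has full weight
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    weight = float(value)
                except ValueError:
                    weight = 0.0  # unparseable weight: treat as not acceptable
        weights[media_type] = weight
    return weights


def select_type(accept: str, supported: List[str]) -> Optional[str]:
    """Pick the supported type with the highest client weight.

    Ties go to the type listed first in `supported`, i.e. the server's own
    preference. Returns None when nothing the server offers is acceptable.
    """
    weights = parse_accept(accept)
    best, best_weight = None, 0.0
    for candidate in supported:
        main = candidate.split("/")[0]
        weight = weights.get(
            candidate, weights.get(main + "/*", weights.get("*/*", 0.0))
        )
        if weight > best_weight:
            best, best_weight = candidate, weight
    return best


chosen = select_type(
    "application/atom+xml, application/rss+xml;q=0.5",
    ["application/rss+xml", "application/atom+xml"],
)
print(chosen)
```

A legacy client that lists only `application/rss+xml` would still be served RSS by the same server, which is the compatibility property the participants above are meant to provide.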
Collaboration
- Client issues requests to Server via the Request Interface, modifying
and resubmitting its request as needed until:
- A success response is elicited
- The request condition is not met, meaning that the cached response is still valid
- A failure response is elicited
- The client is unable to make changes required by a Resubmit response
- Client policy prevents either changes required by a Resubmit response, or
further resubmissions in general
Consequences
The GET pattern introduces a Uniform Interface for transferring identified
sets of information from server to client. Clients and servers of different
ages can communicate without impediment, and communication failures can be
overcome.
The use of an acceptable types list in a GET request means that clients
built during different phases of the architecture will generally be able to
communicate. Document-based communication has a degree of flexibility built
in with must-ignore parameters. The acceptable types list fills a gap
when incompatible changes occur to the set of document types, for example
when a new type deprecates an old type, such as Atom deprecating RSS for news
feed syndication.
An explicit failure response allows problems in the architecture to be
reported and repaired as required. The resubmit feature allows temporary
or permanent changes to the architecture to be accommodated by components
without explicit reconfiguration, simplifying management. Note, however,
the potential security implications of allowing one component to
reconfigure others. A predefined policy for which modifications are
permitted and which are to be treated as failure cases can be useful in
security-sensitive environments.
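One possible shape for such a predefined policy is sketched below in Python. The specific rules (same host only, no downgrade from https to http, at most five resubmissions) are assumptions chosen for illustration, not rules prescribed by the pattern.

```python
from urllib.parse import urlsplit

MAX_RESUBMISSIONS = 5  # assumed limit, chosen for illustration only


def redirect_permitted(original_url: str, new_url: str, attempts: int) -> bool:
    """Decide whether a Resubmit response that changes the URL may be followed."""
    if attempts >= MAX_RESUBMISSIONS:
        return False
    old, new = urlsplit(original_url), urlsplit(new_url)
    if old.hostname != new.hostname:
        return False  # do not let one component point us at another host
    if old.scheme == "https" and new.scheme != "https":
        return False  # never downgrade a secure request
    return True


print(redirect_permitted("https://example.com/a", "https://example.com/b", 0))
print(redirect_permitted("https://example.com/a", "http://example.com/a", 0))
```

A modification rejected by the policy is treated as a failure case rather than being silently followed.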
With common transports such as HTTP, requests sent down parallel TCP
connections, or pipelined requests, may be processed by the server in a
different order to that in which their responses return to the client. This could
cause the client to become confused by "seeing" an older state after a more
recent state. A simple solution is to hold off sending a GET request to a given
URL when the previous related GET has not yet returned.
A client that is holding off sending the next GET request should queue the
first such request for the identified URL. After this point it should not
queue another GET request to the URL until the queued one has been transmitted.
There is no point queuing up multiple GET requests for the same URL. If the
first request has not been issued by the time motivation to issue a second
request comes around, a single request will fulfil the motivation behind both.
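The hold-off and single-entry queue described above could be sketched in Python like this. The `GetGate` class and its method names are hypothetical; the sketch only tracks state and leaves the actual sending of requests to the caller.

```python
class GetGate:
    """Allows one outstanding GET per URL and remembers at most one queued GET."""

    def __init__(self):
        self._in_flight = set()  # URLs with a GET that has not yet returned
        self._pending = set()    # URLs with exactly one queued follow-up GET

    def want(self, url: str) -> bool:
        """Register a motivation to GET url. True means: send the request now."""
        if url in self._in_flight:
            # Any number of further motivations collapse into one queued request.
            self._pending.add(url)
            return False
        self._in_flight.add(url)
        return True

    def returned(self, url: str) -> bool:
        """Register that the outstanding GET returned. True means: send the queued GET now."""
        if url in self._pending:
            self._pending.discard(url)
            return True  # the URL stays in flight for the follow-up request
        self._in_flight.discard(url)
        return False


gate = GetGate()
url = "http://example.com/publication-dates"
print(gate.want(url), gate.want(url), gate.want(url))
print(gate.returned(url), gate.returned(url))
```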
Twin consequences of the GET pattern are that interim states at URLs
may be missed, and that the architecture as a whole does not become overloaded
when it is put under stress. Each GET retrieves the current state
of the resource, so rapid changes may see the next GET arrive several changes
after an earlier GET. These states will be lost unless an additional buffering
mechanism is employed. The client will read back the current state rather than
the old transitional states.
The flip-side of this behaviour is that clients are never stuck reading
old data. They come completely up to date quickly and process the latest
information available. Many algorithms for real-time processing will behave
better under this scenario than if they are fed through old changes. The GET
pattern can be adapted to a buffering model for algorithms that are sensitive
to losses of interim states.
Implementation
GET can be implemented with HTTP using the following mappings:
-
GET(url, condition, weighted acceptable types list)
-
GET url HTTP/1.1
Accept: weighted acceptable types list
If-condition
-
Success(document, type, cache)
-
HTTP/1.1 200 OK
Content-Type: type
Cache-Control: cache
document
All 2xx series response codes can be treated as Success responses for GET.
-
Condition Not Met()
-
HTTP/1.1 304 Not Modified
-
Fail(reason)
-
HTTP/1.1 400 Bad Request
reason
Unknown 1xx series response codes can be treated as a Fail for GET.
300 Multiple Choices is a non-implementable Resubmit response for automated
clients, so should also be treated as Fail alongside other 3xx series codes that
are not understood. 4xx series response codes are Fail, except for
401 Unauthorized and 407 Proxy Authentication Required. These are Resubmit
responses and should only be treated as failures if they are not understood.
5xx series responses should be treated as Fail, except for 503 Service
Unavailable and 504 Gateway Timeout. These are Resubmit and Response Lost
responses, respectively.
-
Resubmit(required changes)
-
Any of: 301 Moved Permanently, 302 Found, 303 See Other, 305 Use Proxy,
307 Temporary Redirect, 401 Unauthorized, or 407 Proxy Authentication Required.
-
Response Lost()
-
Any loss of communication before a response is received. This may
include application or TCP/IP level timeouts, or an explicitly terminated
connection. The 504 Gateway Timeout response is also equivalent to
Response Lost, and indicates that a loss occurred somewhere past the TCP connection
made directly by the client.
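The mapping from HTTP status codes to the pattern's responses can be summarised in a small Python function. This is a sketch of the mapping as described above, assuming a client that understands redirects and both authentication challenges; a client that does not understand one of the Resubmit codes would treat it as Fail instead. The `Outcome` and `classify` names are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "Success"
    CONDITION_NOT_MET = "Condition Not Met"
    RESUBMIT = "Resubmit"
    RESPONSE_LOST = "Response Lost"
    FAIL = "Fail"


# Codes this client understands as Resubmit responses.
RESUBMIT_CODES = {301, 302, 303, 305, 307, 401, 407, 503}


def classify(status: int) -> Outcome:
    """Map an HTTP status code onto a response of the GET pattern."""
    if 200 <= status < 300:
        return Outcome.SUCCESS
    if status == 304:
        return Outcome.CONDITION_NOT_MET
    if status == 504:
        return Outcome.RESPONSE_LOST
    if status in RESUBMIT_CODES:
        return Outcome.RESUBMIT
    # Unknown 1xx, 300 Multiple Choices, and all remaining 3xx, 4xx and 5xx codes.
    return Outcome.FAIL


for code in (200, 304, 302, 404, 503, 504):
    print(code, classify(code).value)
```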
Sample Code
Request request
request.url = "http://example.com/publication-dates"
if (cache_manager.fresh(request.url))
{
    // Do nothing. Our cache entry is still fresh.
}
else if (blocked(request.url))
{
    // An earlier GET to this URL has not yet returned.
    // Only queue one request for the URL.
    request_pending(request.url) = true
}
else
{
try_again:
    request.accept = parser.accept
    request.condition = cache_manager.condition(request.url)
    switch (request())
    {
    Success(document, type, cache):
        cache_manager.update(document, type, cache)
        process(parser(document, type))
    Condition Not Met():
        // Do nothing.
        // We have already processed the
        // latest data with our last request.
    Fail(reason):
        log(reason)
    Resubmit(required_changes):
        if (policy(request, required_changes))
            request.modify(required_changes)
            jump try_again
        else
            log("Policy forbids request modification")
    Response Lost():
        // A lost response is a Resubmit with no required changes.
        if (policy(request, no_required_changes))
            jump try_again
        else
            log("Too many retries")
    }
}
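For readers who prefer something executable, the retry loop of the pseudocode above might look roughly like this in Python. It is a sketch under several assumptions: `send` is a hypothetical transport that returns a tuple naming the outcome, the cache is a plain dictionary keyed by URL, the policy is reduced to a retry limit, and a Condition Not Met response returns the cached copy rather than doing nothing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    url: str
    accept: str = "*/*"
    condition: Optional[str] = None  # validator from the cache, e.g. an ETag


def get(request, send, cache, max_attempts=3):
    """Drive one GET to completion; return (document, type) or None on abort.

    `send(request)` is a hypothetical transport returning one of:
      ("success", document, type, validator), ("not_modified",),
      ("fail", reason), ("resubmit", new_url) or ("lost",).
    `cache` maps URL -> (validator, document, type).
    """
    for _attempt in range(max_attempts):
        cached = cache.get(request.url)
        request.condition = cached[0] if cached else None
        outcome = send(request)
        kind = outcome[0]
        if kind == "success":
            _, document, doc_type, validator = outcome
            cache[request.url] = (validator, document, doc_type)
            return document, doc_type
        if kind == "not_modified":
            # The cached copy is still valid.
            return (cached[1], cached[2]) if cached else None
        if kind == "fail":
            return None
        if kind == "resubmit":
            # A real client would consult its policy before modifying the request.
            request.url = outcome[1]
        # "lost" needs no changes: loop round and resubmit as-is.
    return None  # too many retries


# Usage with a scripted stand-in for the network.
script = iter([
    ("lost",),
    ("resubmit", "http://example.com/new-location"),
    ("success", "<dates/>", "application/xml", '"v1"'),
])
cache = {}
result = get(Request("http://example.com/publication-dates"),
             lambda req: next(script), cache)
print(result)
```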
Known Uses
GET is widely used on the Web, both under direct human control and under
automation. Various aspects of GET are not always used well.
Common errors in applying the GET pattern include:
- "Unsafe" GET handling by servers, where GET is treated as an update request.
- Not including the acceptable types list, meaning that the deployed client
component will not be readily handled by an upgraded server component. The
wrong document type may be returned.
- Returning the wrong type with a document, or using heuristics based on the
URL to determine which parser to invoke for a returned document.
- Using type identifiers that are too generic. Type specifications such as
application/xml or application/rdf+xml
could match multiple formats for the return of data. More specific identifiers such as
application/atom+xml and application/atom+rdf+xml
allow the server to choose a document to return that is more likely to be
understood when the client parses it.
- Returning a different document from the same URL based on session state.
GET requests should not create information on the server that has to be tracked
and available when the client's next GET request arrives. The client should be
able to issue its next request at any time without further coordination with
the server. Sessions may be used to short-cut expensive processing over a
series of requests. However, an expired or lost session should not cause a
given request to fail or fail to be understood.
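Several of the errors above concern document types. As a hedged illustration, a client can avoid URL heuristics by selecting its parser strictly from the returned type, and can derive its acceptable types list from the same registry so the two never drift apart. The registry contents and function names below are assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical registry: one parser per specific media type the client understands.
PARSERS = {
    "application/json": json.loads,
    "application/atom+xml": ET.fromstring,
}


def accept_header() -> str:
    """Build the acceptable types list from the parsers actually available."""
    return ", ".join(PARSERS)


def parse(document: str, content_type: str):
    """Choose the parser from the returned type, never from the URL."""
    media_type = content_type.split(";")[0].strip().lower()
    try:
        parser = PARSERS[media_type]
    except KeyError:
        raise ValueError(f"no parser for {media_type!r}") from None
    return parser(document)


print(accept_header())
print(parse('{"published": "2007-01-01"}', "application/json; charset=utf-8"))
```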
Related Patterns