Hyper Text Transfer Protocol (HTTP)
HTTP • HTTP defines how Web pages are requested and served on the Internet • Early servers and browsers used an ad-hoc approach • A standardized protocol, called HTTP/1.0, was derived from this • The earlier approach is now called HTTP/0.9 • Later, HTTP/1.0 was extended to HTTP/1.1 • The protocol versions are upwardly compatible • servers and browsers which can handle HTTP/1.1 can also handle HTTP/1.0 and HTTP/0.9
History: “HTTP/0.9” • HTTP/0.9 was very simple: • A browser would send a request like this to a server: GET /hobbies.html • In response, the server would send the contents of the requested file. • Only GET requests were supported • Only a file path and name could appear in a GET request • The response had to be a HTML document.
History (contd.) • Different browsers/servers soon extended this basic scheme in various ways • To achieve some standardization, the HTTP/1.0 protocol was specified, in 1996, in a document called RFC1945 • (for historical reasons, an Internet standard spec is called a Request for Comments or RFC) • This was soon extended to HTTP/1.1, in RFC2068, released in January 1997 • An update to RFC2068 was produced in June 1999, as RFC2616 • Various other protocols, based on HTTP, have been produced from time to time • we will see a “cookie” protocol, based on HTTP, which was specified in February 1997, in RFC2109
How HTTP Works • HTTP sits on TCP, which, in turn, sits on IP • Usually, HTTP servers are configured to listen to TCP/IP Port 80 • although sometimes a different port is used, • particularly if two HTTP servers are running on one machine • You can see how HTTP works by pretending to be a browser yourself • Using telnet to connect to a server, you can issue a request and see the response
Example • If you were to point a browser at the URL http://student.cs.ucc.ie you would get a HTML home-page which provides links to various pages for students, etc. • The server on student.cs.ucc.ie uses the standard HTTP port, Port 80, so you can get the same page by • telnetting to Port 80 on student.cs.ucc.ie • and typing a GET request
Connecting to the HTTP server on student.cs.ucc.ie • On any machine, say interzone, specify the address and port in a telnet command: interzone.ucc.ie> telnet student.cs.ucc.ie 80 • You will get the following response: Trying 126.96.36.199... Connected to student.cs.ucc.ie. Escape character is '^]'. • The HTTP server is now listening
Requesting the home page • Issue the following HTTP/1.0 request, noting that you must type two carriage returns: GET / HTTP/1.0 [RETURN] [RETURN] • The response consists of • a status line, • a sequence of headers and • the requested home page • Then you are told that the telnet connection was closed by the server, as you will see on the next slide
The reply to your request: • The server’s response: HTTP/1.1 200 OK ... Content-Type: text/html <HTML> ... </HTML> • Then your local telnet program tells you that the connection was closed by the server: Connection closed by foreign host. interzone.ucc.ie>
Getting a different page: • Consider the page whose URL is http://student.cs.ucc.ie/cs1064/jabowen/ • Telnet to the server: interzone.ucc.ie> telnet student.cs.ucc.ie 80 • When the server is listening, ask for the page like this: GET /cs1064/jabowen/ HTTP/1.0 [RETURN] [RETURN]
What was going on above: • Once connected to a HTTP server, we can • send a HTTP request line, • optionally followed by request headers. • In the cases above, GET / HTTP/1.0 and GET /cs1064/jabowen/ HTTP/1.0 were request lines • Each request line was terminated by pressing [RETURN] • In each case, the second [RETURN] marked the end of an empty list of request headers
GET requests • In GET / HTTP/1.0 • the / is the resource the client wants to get • the HTTP/1.0 tells the server that the client is using the HTTP/1.0 protocol • In GET /cs1064/jabowen/ HTTP/1.0 • the /cs1064/jabowen/ is the resource the client wants to get • the HTTP/1.0 tells the server that the client is using the HTTP/1.0 protocol • In each case, the server responds by sending a status line, a number of response headers and the content of the requested resource.
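The request format described above can be sketched in Python. This is a minimal illustration, not part of the original slides: the function name build_request and the User-Agent value in the usage note are hypothetical. It shows that a request is just a request line, zero or more header lines, and a blank line, each ended by a carriage return and line feed.

```python
from typing import Dict, Optional


def build_request(method: str, path: str, version: str = "HTTP/1.0",
                  headers: Optional[Dict[str, str]] = None) -> bytes:
    """Return the bytes a client would send for one HTTP request.

    Hypothetical helper for illustration only.
    """
    # Request line: method, resource, protocol version.
    lines = [f"{method} {path} {version}"]
    # Optional request headers, one per line.
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # The first CRLF ends the last line; the second CRLF is the blank line
    # (the second [RETURN]) that marks the end of the header list.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

For example, build_request("GET", "/") produces exactly what was typed into telnet earlier: the request line followed by two line endings. Passing headers={"User-Agent": "by-hand"} (an invented value) would place one header line between the request line and the blank line.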
Consider the response: HTTP/1.1 200 OK ... Content-Type: text/html <HTML> ... </HTML> • The first line, HTTP/1.1 200 OK, is a status line • The next few lines, ending in the line Content-Type: text/html, are header lines • The lines bounded by <HTML> and </HTML> form the content of the requested resource.
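The three parts of a response can be separated mechanically. The sketch below assumes the whole response has already been read into memory as bytes and that lines end in CRLF; the function name split_response is hypothetical.

```python
from typing import List, Tuple


def split_response(raw: bytes) -> Tuple[str, List[str], bytes]:
    """Split a raw HTTP response into status line, header lines and content.

    Hypothetical helper; assumes CRLF line endings.
    """
    # The first blank line separates the headers from the content.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The first line is the status line; the rest are header lines.
    return lines[0], lines[1:], body
```

Applied to a response like the one above, this returns the status line HTTP/1.1 200 OK, a list of header lines ending in Content-Type: text/html, and the bytes of the HTML document.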
HEAD requests • HEAD requests were new in HTTP/1.0 • A HEAD request is similar to a GET, the only difference being the use of the word HEAD instead of the word GET, for example: HEAD /cs1064/jabowen/ HTTP/1.0 [RETURN] [RETURN] • The server sends the same status line and the same response headers as if it had received a GET request, • but does not send the actual content of the resource mentioned in the request. • Thus, human clients can use HEAD requests to • easily access information about a resource on a server • without being overwhelmed by the mass of detail that would be received if the resource content were sent in the response
Example HEAD request • Suppose, for example, we wanted to see information about http://student.cs.ucc.ie/cs1064/jabowen/ such as its size, when it was last edited, etc. • We can send the request HEAD /cs1064/jabowen/ HTTP/1.0
Response to example HEAD request:
HTTP/1.1 200 OK
Date: Wed, 13 Dec 2000 12:21:35 GMT
Server: Apache/1.3.14 (Unix) PHP/4.0.3pl1
Last-Modified: Thu, 07 Dec 2000 13:16:18 GMT
ETag: "2160-29c6-3a2f8da2"
Accept-Ranges: bytes
Content-Length: 10694
Connection: close
Content-Type: text/html
Analysis of response: • The first line in the response, HTTP/1.1 200 OK, is the status line, in which • HTTP/1.1 indicates that the server can use HTTP/1.1 (although it can accept requests in earlier HTTP forms) • 200 is a status code which reports how the server handled the request • OK is an English language phrase giving the meaning of the status code • The other lines in the response give information either about the server or the resource:
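The three fields of a status line can be pulled apart with a single split. This is a sketch only: the names StatusLine and parse_status_line are hypothetical, and it assumes the line always carries a reason phrase.

```python
from typing import NamedTuple


class StatusLine(NamedTuple):
    version: str  # protocol version the server is using, e.g. "HTTP/1.1"
    code: int     # numeric status code, e.g. 200
    reason: str   # English phrase explaining the code, e.g. "OK"


def parse_status_line(line: str) -> StatusLine:
    """Split a status line into its three fields (hypothetical helper)."""
    # Split on the first two spaces only: the reason phrase may itself
    # contain spaces, as in "Not Found".
    version, code, reason = line.split(" ", 2)
    return StatusLine(version, int(code), reason)
```

For the status line in the example, parse_status_line("HTTP/1.1 200 OK") gives version "HTTP/1.1", code 200 and reason "OK".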
Analysis (contd.) • Date: Wed, 13 Dec 2000 12:21:35 GMT gives the date/time of the response • Server: Apache/1.3.14 (Unix) PHP/4.0.3pl1 gives details on the server • Last-Modified: Thu, 07 Dec 2000 13:16:18 GMT says when the resource was last modified • ETag: "2160-29c6-3a2f8da2" provides a supposedly-unique string to identify this entity • Accept-Ranges: bytes says that this server could serve up pieces of this resource, pieces specifiable to the nearest byte • Content-Length: 10694 gives the size of the resource in bytes • Connection: close says that the server does not regard this as a persistent connection • Content-Type: text/html gives the type of data in the resource
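As a rough illustration of how a program might read these headers, the sketch below turns header lines into a dictionary and interprets two of them. The function names parse_headers and describe are hypothetical; parsedate_to_datetime is from Python's standard email.utils module, which understands this date format.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, List


def parse_headers(header_lines: List[str]) -> Dict[str, str]:
    """Map lower-cased header names to their values (hypothetical helper)."""
    headers = {}
    for line in header_lines:
        # Split at the first colon only: values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def describe(header_lines: List[str]) -> dict:
    """Pick out the resource's size, type and last-modified time."""
    headers = parse_headers(header_lines)
    return {
        "size": int(headers["content-length"]),
        "type": headers["content-type"],
        "modified": parsedate_to_datetime(headers["last-modified"]),
    }
```

Given the header lines from the example HEAD response, describe would report a size of 10694 bytes, a type of text/html, and a last-modified time of 7 December 2000 at 13:16:18.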
Another example • Suppose we wanted to learn about the resource with URL http://student.cs.ucc.ie/cs1064/jabowen/vh40.gif • We can send the request HEAD /cs1064/jabowen/vh40.gif HTTP/1.0 • Response is:
HTTP/1.1 200 OK
Date: Wed, 13 Dec 2000 12:23:04 GMT
Server: Apache/1.3.14 (Unix) PHP/4.0.3pl1
Last-Modified: Fri, 24 Nov 2000 11:46:00 GMT
ETag: "3133-361-3a1e54f8"
Accept-Ranges: bytes
Content-Length: 865
Connection: close
Content-Type: image/gif
HTTP/1.1 A (fairly) detailed description
We have just seen some example HTTP/1.0 interactions • The same kinds of concepts we saw in these interactions will arise as we examine HTTP/1.1 in more detail • The versions of HTTP have a great deal in common, so, in what follows, much of what is said will be true of all three versions • Therefore, any mention of just “HTTP” will mean that the statement applies to HTTP/0.9, HTTP/1.0 and HTTP/1.1
Overall Operation of HTTP • The HTTP protocol is a request/response protocol. • request • An HTTP message sent by a client to a server • response • An HTTP message sent by a server to a client which has made a request. • client • A program that establishes connections for the purpose of sending requests. • server • A program that accepts connections in order to service requests by sending back responses. • As we shall see, a program may act as both a client and a server.
Message from a client: A client sends, over a connection, to a server • a request line in the form of • a request method, • a URI (Uniform Resource Identifier), and • a protocol version, • possibly followed by a message containing • request modifiers, • information about the client, • and (possibly) body content.
Response from a server: The server responds with • a status line, in the form of • the message's protocol version, • a success or error code and • an English phrase explaining the code • possibly followed by a message containing • server information, • information about the entity in the body content (if any) • and (possibly) body content.
HTTP Communication • Most communication • is started by a user agent and • consists of a request to be applied to a resource on some origin server. • user agent • A client (browser, spider, etc.) which initiates a request. • resource • A data object or service that can be identified by a URI. • origin server • The server on which a resource resides or is to be created.
Simple communication • Involves single connection between user agent (UA) and origin server (O) • This connection is denoted, in diagrams on this and future slides, by ------- ====request chain ==========> UA -----------------------------------O <=========response chain====
More complicated case • Intermediaries present in request/response chain. ====request chain =======================> UA ----------- A ----------- B ----------- C ----------- O <======================response chain==== • Above, 3 intermediaries (A, B, and C) lie between user agent and origin server. • Intermediaries act as both clients and servers • Request or response message that travels the whole chain passes through 4 separate connections: UA-A connection; A-B connection; B-C connection; C-O connection
Simple versus complicated • Distinction is important because some HTTP options may apply • only to the connection with the nearest neighbour, • only to the end-points of the chain, • or to all connections along the chain.
3 forms of intermediary • proxy, an agent which • receives a request for a resource whose URI is in its absolute form and, • if necessary, rewrites all or part of the message and forwards the reformatted request toward the server identified by the URI. • gateway, an agent which • acts as a translation interface to a server for another protocol, such as WAP, etc. • tunnel, an agent which • acts as a relay point between two connections without changing messages; • tunnels are used, for example, in security firewalls
Caching • User agents, proxies and gateways (but not tunnels) may use a local cache to handle requests, instead of forwarding them on to an origin server • A request/response chain is shortened if one of the parties along the chain has a cached response applicable to the request.
Example Network topology The example caching scenarios in the next few slides will use this network:
UA3____________D
               |
UA2_____       |
        |      |
        |      |
UA1_____A______B________C_________O
Caching Example 1 ====request chain ====================> UA1 ----------- A ----------- B -------- C --------- O <==================response chain===== • In the example above: • the user has made a request for a resource on origin server O • neither UA1 nor any of the proxies A, B or C has an appropriate cached response • so the request has been forwarded all the way to O • Four connections are involved in servicing the request
Caching Example 2 request chain UA1…………….... A ……... B …….. C …… O response chain • In the example above: • the user has repeated the same request for a resource on O • UA1 has a cached response to the earlier request and gives this to the user without sending the request anywhere • No connection is involved in servicing the request
Caching Example 3 ===request chain => UA2 ----------------- UA1 …..……...... A …….. B …….. C ……... O <=response chain== • In the example above: • the user at UA2 has requested the same resource on origin server O that was earlier requested by the user at UA1 • UA2 has forwarded the request to proxy A • proxy A has an appropriate cached response, from when it serviced the earlier request from UA1 • Only one connection is involved in servicing the request
Caching Example 4 ===request chain ====> UA3 ---------- D -------- | UA1 …..…... A …….. B …….. C ……... O <===response chain=== • In the example above: • the user at UA3 has requested the same resource on origin server O that was earlier requested by the user at UA1 • UA3 has forwarded the request to proxy D, which has forwarded it to proxy B • proxy B has an appropriate cached response, from when it serviced the earlier request from UA1 • Two connections are involved in servicing the request
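The four caching examples can be reproduced with a small simulation. This is a sketch under simplifying assumptions that are not in the slides: every user agent and proxy caches every response that passes through it, cached entries never expire, and the names NEXT_HOP and fetch are hypothetical. The routing table is read off the example network.

```python
from typing import Dict, Set

# Next hop toward origin server O for each node in the example network.
NEXT_HOP = {"UA1": "A", "UA2": "A", "UA3": "D",
            "A": "B", "D": "B", "B": "C", "C": "O"}


def fetch(start: str, resource: str, caches: Dict[str, Set[str]]) -> int:
    """Service a request from `start`; return the number of connections used."""
    path = [start]
    # Forward the request toward O until a node on the path has a cached copy.
    while path[-1] != "O" and resource not in caches.setdefault(path[-1], set()):
        path.append(NEXT_HOP[path[-1]])
    # The response travels back along the same path. Assumption: every
    # node it passes through (other than O itself) stores a copy.
    for node in path:
        if node != "O":
            caches.setdefault(node, set()).add(resource)
    # One connection per link actually traversed.
    return len(path) - 1
```

Starting with empty caches and requesting the same resource from UA1, UA1 again, UA2 and then UA3, fetch returns 4, 0, 1 and 2, matching the connection counts in Caching Examples 1 to 4.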
To cache or not? • Not all responses are usefully cacheable • As we will see later, some requests may contain modifiers which place special requirements on cache behavior. • The same is true of responses
Caching/Proxy architectures • A wide variety of cache and proxy architectures/configurations exist, including: • national hierarchies of proxy caches to save inter-national and/or inter-continental bandwidth, • systems that broadcast or multicast cache entries, • organizations that distribute subsets of cached data via CD-ROM, • and so on.
Temporary Connections • In most implementations of HTTP/1.0, a server closed a connection after it had serviced the request received on that connection: • We saw this earlier, when the server on student.cs.ucc.ie closed the telnet connection that we had established, after it had sent its response to the HTTP/1.0 GET request we had sent • The use of inline images, sound files, etc., in web pages often requires a client to make multiple requests of the same server when loading one document • Thus the temporary connections provided by HTTP/1.0 meant that loading even one web page required many separate TCP connections (one to fetch each inline image, each sound file, etc.) • This imposed a significant unnecessary load on HTTP servers and caused congestion on the Internet.
Advantages of Persistent Connections Persistent HTTP connections offer a number of advantages: • By opening and closing fewer TCP connections, CPU time is saved • HTTP requests and responses can be pipelined on a connection, allowing a client to make multiple requests without waiting for each response • Network congestion is reduced, because fewer packets are generated by TCP opens • Latency on subsequent requests is reduced since there is no time spent in TCP's connection-opening handshake.
Persistent Connections in HTTP/1.1 • Unlike in HTTP/1.0 and earlier, persistent connections are the default behavior of any HTTP/1.1 connection. • This means that, in HTTP/1.1, when a connection has been opened to service a request, it is kept open for further possible requests from the same client • This is true even if the initial request triggered an error response from the server • But, when no further request has been received after some time-out period, the server may close the connection • However, a client can indicate, when making a request, that it wants the connection closed after the request is serviced
Connection Persistency Negotiation • HTTP/1.1 provides a mechanism by which a client and a server can signal the close of a TCP connection: the Connection: header field • If a HTTP/1.1 client wants a connection closed after it receives a response to its request, it should include, in the request, a Connection: header containing the token "close" • Similarly, if a HTTP/1.1 server intends to close a connection after it sends a response to a request, it should include, in the response, a Connection: header containing the token "close" • If either the client or the server sends the close token in a Connection: header, that request becomes the last one for the connection.
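The negotiation rule can be written as a small decision function. This is a sketch, not a complete implementation: the function names are hypothetical, header names are assumed to have been lower-cased already, and any client that is not using HTTP/1.1 is simply treated as unable to handle persistent connections.

```python
from typing import Dict


def has_close_token(headers: Dict[str, str]) -> bool:
    """True if the Connection: header contains the token "close"."""
    # A Connection: header may carry several comma-separated tokens.
    tokens = headers.get("connection", "").split(",")
    return any(token.strip().lower() == "close" for token in tokens)


def connection_stays_open(request_version: str,
                          request_headers: Dict[str, str],
                          response_headers: Dict[str, str]) -> bool:
    """Decide whether the connection persists after this exchange."""
    # Simplifying assumption: only HTTP/1.1 clients get persistent connections.
    if request_version != "HTTP/1.1":
        return False
    # Either side can end the connection by sending the close token.
    return not (has_close_token(request_headers)
                or has_close_token(response_headers))
```

With no Connection: header on either side, an HTTP/1.1 exchange leaves the connection open; a close token from the client or the server makes that request the last one on the connection.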
Example 1: Introduction • A human, using a telnet client, sends a HTTP/1.0 request to a HTTP/1.1 server • The server assumes that the client, because it is using HTTP/1.0, cannot handle persistent connections and, in its response, signals its intention to close the connection • After printing the response, the telnet client says that the connection was closed by the foreign host
Example 1
interzone.ucc.ie> telnet student.cs.ucc.ie 80
Trying 188.8.131.52...
Connected to student.cs.ucc.ie.
Escape character is '^]'.
HEAD /cs1064/jabowen/ HTTP/1.0

HTTP/1.1 200 OK
Date: Sat, 06 Jan 2001 17:56:44 GMT
Server: Apache/1.3.14 (Unix) PHP/4.0.3pl1
Last-Modified: Wed, 20 Dec 2000 11:34:46 GMT
ETag: "2160-2dee-3a409956"
Accept-Ranges: bytes
Content-Length: 11758
Connection: close
Content-Type: text/html

Connection closed by foreign host.