Java Technology and Applications
240-527 CoE Masters Programme, PSU, Semester 2, 2003-2004
Objectives
• to explain the Hypertext Transfer Protocol (HTTP)
7. HTTP
Overview
1. How a Browser Works
2. HTTP Transactions
3. Client Request Methods
4. HTTP Protocol Versions
5. Server Response Codes
6. Some Advanced Features
7. More Information
1. How a Browser Works
• Browsers use HTTP to communicate with Web servers
• HTTP is a request/response protocol

[Diagram: the client browser sends a request across the network to the Web server; the server sends a response back]
1.1. Details of a Client Request
• From a browser, I request: http://fivedots.coe.psu.ac.th/~ad/
• The browser connects to the site fivedots.coe.psu.ac.th at port 80, and sends the request:
    GET /~ad/ HTTP/1.1
    Host: fivedots.coe.psu.ac.th
    User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01
    Accept: */*
    Accept-Language: en
    Accept-Encoding: gzip,deflate,compress,identity
    Keep-Alive: 300
    Connection: keep-alive

• The first line holds the HTTP method/command, the URL, and the HTTP version used by the client
• The remaining lines are various header information; one per line
Details of a Server Response

    HTTP/1.1 200 OK
    Date: Sun, 12 Oct 2003 04:20:51 GMT
    Server: Apache/1.3.9 (Unix) Debian/GNU PHP/4.0.3pl1
    X-Powered-By: PHP/4.0.3pl1
    Keep-Alive: timeout=15, max=100
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=iso-8859-1

    <html><head><title>Andrew Davison's Home Page at PSU</title></head>
    <body bgcolor=#ffffff test=#000000>
      :   // rest of HTML text for page

• The first line holds the HTTP version used by the server, then the status code and text
• The HTML for the page follows the headers
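The first line of the response can be pulled apart mechanically. A minimal sketch, assuming the status line is available as a string; the function name parse_status_line is made up for illustration:

```python
def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    # Split on the first two spaces only, since the reason text may contain spaces.
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
print(version, code, reason)
```

The same function handles the HTTP/1.0 responses shown on later slides, since only the version string differs.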
1.2. Web Page Images
• My home page contains several images.
• The browser sees them in the text of the Web page:
  • e.g. <img src="me.jpg" align="right" alt="[PIC of Andrew]">
• The browser automatically requests each one.
An Image Request

    GET /~ad/me.jpg HTTP/1.1
    Referer: http://fivedots.coe.psu.ac.th/~ad/
    Host: fivedots.coe.psu.ac.th
    User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01
    Accept: */*
    Accept-Language: en
    Accept-Encoding: gzip,deflate,compress,identity
    Keep-Alive: 300
    Connection: keep-alive

• The Referer header names the page where the link to the image is located
The Image Response

    HTTP/1.1 200 OK
    Date: Sun, 12 Oct 2003 04:20:55 GMT
    Server: Apache/1.3.9 (Unix) Debian/GNU PHP/4.0.3pl1
    Last-Modified: Tue, 17 Oct 2000 09:40:05 GMT
    ETag: "1bf29-1194-39ec1e75"
    Accept-Ranges: bytes
    Content-Length: 4500
    Keep-Alive: timeout=15, max=99
    Connection: Keep-Alive
    Content-Type: image/jpeg; charset=iso-8859-1

    // ... data of the JPEG file
1.3. Clicking on a Link
• In the browser, if I click on the link labelled 'AIT', then the browser examines the associated HTML:
  • <a href="http://www.cs.ait.ac.th/">AIT</a>
• The browser then connects to www.cs.ait.ac.th at port 80, and requests the top page:
Request sent to www.cs.ait.ac.th

    GET / HTTP/1.1
    Referer: http://fivedots.coe.psu.ac.th/~ad/
    Host: www.cs.ait.ac.th
    User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01
    Accept: */*
    Accept-Language: en
    Accept-Encoding: gzip,deflate,compress,identity
    Keep-Alive: 300
    Connection: keep-alive
Server Response
• This server uses HTTP 1.0

    HTTP/1.0 200 OK
    Date: Sun, 12 Oct 2003 06:08:24 GMT
    Server: Apache/1.3.12 Ben-SSL/1.41 PHP/4.0.1pl2
    Last-Modified: Fri, 11 Apr 2003 02:48:54 GMT
    ETag: "214d69-543b-3ad3c616"
    Accept-Ranges: bytes
    Content-Length: 21563
    Content-Type: text/html
    Age: 120
    X-Cache: MISS from cache3.psu.ac.th
    Connection: keep-alive

    <HTML><HEAD>
    // ... rest of Web page text
1.4. Getting a Page with Telnet
• In CoE/PSU, the request needs to be 'local'.

    ad@calvin$ telnet fivedots.coe.psu.ac.th 80
    Trying 172.30.0.5...
    Connected to fivedots.coe.psu.ac.th.
    Escape character is '^]'.
    GET /~ad/index.html HTTP/1.0

    HTTP/1.0 200 OK
    Date: Wed, 22 Oct 2003 05:07:26 GMT
    Server: Apache/1.3.12 Ben-SSL/1.41 PHP/4.0.1pl2
    Last-Modified: Wed, 11 Jun 2003 02:48:54 GMT
    ETag: "214d69-543b-3ad3c616"
    Accept-Ranges: bytes
    // ... rest of headers and HTML text of page

• Two newlines are required after the request line (the second one produces the blank line that ends the request)
• Everything from "HTTP/1.0 200 OK" onward is the response
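The telnet session can be imitated from a program. The following is a rough sketch, not a full HTTP client: build_get_request and fetch are hypothetical helper names, and fetch assumes an HTTP/1.0 exchange in which the server closes the connection once the response has been sent.

```python
import socket


def build_get_request(path, host=None, version="HTTP/1.0"):
    """Return the bytes that would be typed into telnet for a GET."""
    lines = [f"GET {path} {version}"]
    if host is not None:
        lines.append(f"Host: {host}")
    # The final blank line (CRLF CRLF) tells the server the request is complete.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch(host, port, request, timeout=10.0):
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_get_request("/~ad/index.html"))
```

A call such as fetch("fivedots.coe.psu.ac.th", 80, build_get_request("/~ad/index.html")) would return the headers and page text as one byte string, provided the host is reachable from where the code runs.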
The Form HTML Code

    <form method="post"
          action="http://fivedots.coe.psu.ac.th/cgi-bin/ad/echoer">
      <input TYPE="text" NAME="pat1" SIZE="15" MAXLENGTH="15" VALUE="">
      <input TYPE="text" NAME="pat2" SIZE="15" MAXLENGTH="15" VALUE="">
      <input TYPE="text" NAME="pat3" SIZE="15" MAXLENGTH="15" VALUE="">
      <input TYPE="text" NAME="pat4" SIZE="15" MAXLENGTH="15" VALUE="">
      <input TYPE="text" NAME="pat5" SIZE="15" MAXLENGTH="15" VALUE=""></p>
      <br>
      <p><input TYPE="submit" VALUE="Submit">
      <input TYPE="reset" VALUE="Clear">
    </form>
Form Input Request
• The form is sent with the HTTP POST method

    POST /cgi-bin/ad/echoer HTTP/1.1
    Referer: http://fivedots.coe.psu.ac.th/~ad/echoer/eform.html
    Host: fivedots.coe.psu.ac.th
    User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01
    Accept: */*
    Accept-Language: en
    Accept-Encoding: gzip,deflate,compress,identity
    Keep-Alive: 300
    Connection: keep-alive
    Content-type: application/x-www-form-urlencoded
    Content-Length: 39

    pat1=hello&pat2=&pat3=world&pat4=&pat5=
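The entity body and its Content-Length can be reproduced from the form fields. A small sketch using Python's standard urllib.parse.urlencode; the field values are the ones from the request above, with the empty boxes left as empty strings:

```python
from urllib.parse import urlencode

# The five text boxes of the form; only pat1 and pat3 were filled in.
fields = {"pat1": "hello", "pat2": "", "pat3": "world", "pat4": "", "pat5": ""}

body = urlencode(fields)
content_length = len(body.encode("ascii"))

print(body)            # pat1=hello&pat2=&pat3=world&pat4=&pat5=
print(content_length)  # 39
```

Content-Length counts the bytes of the body only, not the headers or the blank line before it.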
Server Response

    HTTP/1.1 200 OK
    Date: Sun, 12 Oct 2003 08:30:07 GMT
    Server: Apache/1.3.9 Debian/GNU PHP/4.0.3pl1
    Keep-Alive: timeout=15, max=100
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=iso-8859-1

    <html><head><title>Query Result</title></head>
    <body background="http://fivedots.coe.psu.ac.th/~ad/chalk.jpg">
    <H1 align=center>Query Result</H1>
    // ... rest of page
1.6. Proxies
• Most clients and servers do not communicate directly
  • the client must send its request via a proxy
  • the proxy acts as a firewall and/or cache
• At PSU, most Web requests must go through the cache.psu.ac.th proxy
  • this is set up in the browser's preferences
• In other applications, it may be necessary to explicitly communicate with the proxy
  • this is done by connecting to the proxy, and sending it the full URL of the page required
Using a Proxy with Telnet
• Students should be able to do this.

    ad@fivedots$ telnet cache.psu.ac.th 8080
    Trying 192.168.98.6...
    Connected to proxy6.psu.ac.th.
    Escape character is '^]'.
    GET http://www.student.math.uwaterloo.ca/~cs488/ HTTP/1.0

    HTTP/1.0 200 OK
    Date: Thu, 21 Nov 2002 06:01:31 GMT
    Server: Apache/1.3.27 (Unix) mod_perl/1.21
    Last-Modified: Wed, 20 Nov 2002 12:00:21 GMT
    ETag: "1b66a-2234-3ddb7955"
    Accept-Ranges: bytes
    Content-Length: 8756
    Content-Type: text/html
    Age: 3263
    X-Cache: HIT from cache.psu.ac.th
    Proxy-Connection: close

    <html>
    // ... rest of Web page text
    </html>
    Connection closed by foreign host.
    ad@fivedots$

• Everything from "HTTP/1.0 200 OK" to "</html>" is the response
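The only difference between a direct request and a proxy request is what goes in the request line. A minimal sketch of that difference; request_line is a hypothetical helper, and the URL is the one from the telnet session:

```python
from urllib.parse import urlsplit


def request_line(url, via_proxy):
    """Build a GET request line: full URL for a proxy, path only for a direct request."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the top page
    if parts.query:
        path += "?" + parts.query
    target = url if via_proxy else path
    return f"GET {target} HTTP/1.0"


url = "http://www.student.math.uwaterloo.ca/~cs488/"
print(request_line(url, via_proxy=True))
print(request_line(url, via_proxy=False))
```

In the proxy case the program connects to the proxy's host and port (cache.psu.ac.th, 8080 in the session above) rather than to the Web server named in the URL.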
2. HTTP Transactions

Request (client browser to Web server, across the network):
    Method URL Version
    General header
    Request header
    Entity header
    Entity body

Response (Web server to client browser):
    Version Status Reason
    General header
    Response header
    Entity header
    Entity body
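Both message layouts share the same outline: a first line, some header lines, a blank line, then the entity body. A rough sketch of splitting a raw message along those boundaries; split_message is an illustrative name, and the message is assumed to be a string using CRLF line endings:

```python
def split_message(raw):
    """Split a raw HTTP message into (start line, header lines, body)."""
    # The first blank line separates the head of the message from the entity body.
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    return start_line, header_lines, body


raw = (
    "POST /cgi-bin/ad/echoer HTTP/1.1\r\n"
    "Content-type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 39\r\n"
    "\r\n"
    "pat1=hello&pat2=&pat3=world&pat4=&pat5="
)
start_line, header_lines, body = split_message(raw)
print(start_line)
print(header_lines)
print(body)
```

The sketch does not sort header lines into general, request/response, and entity groups; that classification depends on the header name.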
Client Request Example

    POST /cgi-bin/ad/echoer HTTP/1.1                   <- Method, URL, Version
    Referer: http://fivedots...                        <- Request headers
    User-Agent: Mozilla/5.0 ...
    Accept: */*
    Accept-Language: en
    Accept-Encoding: gzip,...
    Keep-Alive: 300                                    <- General headers
    Connection: keep-alive
    Content-type: application/x-www-form-urlencoded    <- Entity headers
    Content-Length: 39

    pat1=hello&pat2=&pat3=world&pat4=&pat5=            <- Entity body
Request Components
• HTTP methods:
  • GET, POST, HEAD, PUT, DELETE
  • OPTIONS and TRACE (HTTP 1.1)
  • other non-standardized methods
• General headers
  • optional general information such as the current date/time, or network characteristics
• Request headers
  • information about the client, used by the server
  • e.g. browser info., document formats that the client can understand
• Entity headers
  • used when an entity (a Web document) is about to be sent
  • e.g. encoding scheme, length, type, origin
• Headers may be sent in any order.
• Header names are case-insensitive
  • e.g. Content-Type == Content-type
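Because header names are case-insensitive, a program that reads headers usually normalizes the names before looking them up. A minimal sketch, assuming the header lines have already been separated; parse_headers is a made-up name, and repeated headers simply overwrite each other here:

```python
def parse_headers(lines):
    """Map lower-cased header names to their values."""
    headers = {}
    for line in lines:
        # Split at the first colon only; values such as URLs contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


h = parse_headers(["Content-type: text/html", "Content-Length: 39"])
print(h["content-type"])    # text/html
print(h["content-length"])  # 39
```

With this normalization, "Content-Type" and "Content-type" end up under the same key, and the order in which headers arrive does not matter.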
Server Response Example

    HTTP/1.1 200 OK                    <- Version, Status, Reason
    Date: Tue,...                      <- General headers
    Keep-Alive: timeout=15, max=100
    Connection: Keep-Alive
    Transfer-Encoding: chunked
    Server: Apache...                  <- Response headers
    Content-Type: text/html;...        <- Entity headers

    <html>                             <- Entity body
    // ... rest of page
Server Components
• The general and entity headers are the same as those used in a client request.
• Response header
  • gives the client information about the server configuration
  • e.g. what HTTP methods are supported, request authorization details, or server time-out report
Some Other Headers
• General Headers
  • Cache-Control         caching behaviour
  • Connection            whether the connection should close after this transaction
  • MIME-Version          message encoding
  • Pragma                directives for proxies
  • Via                   info about processing by gateways and proxies between the client and server
• Request Headers
  • Authorization         to request restricted docs.
  • Cookie                send name=value info
  • Host                  required address & port info
  • If-Modified-Since     get doc. if newer
  • If-Match              get doc. if matches etags
  • If-Range              get part of a doc. if changed
  • Max-Forwards          limits no. of proxies/gateways
  • Proxy-Authorization   for proxy
  • Range                 only get part of a doc.
• Response Headers
  • Accept-Ranges         will accept range requests
  • Age                   age of doc. in seconds
  • Proxy-Authenticate    gives auth. scheme
  • Public                supported methods
  • Retry-After           try again after given time
  • Set-Cookie            sends a name=value pair
  • Warning               info used for caching
  • WWW-Authenticate      gives auth. scheme for access to Web pages
• Entity Headers
  • Allow                 methods allowed on URL
  • Content-Location      useful if a doc. is stored in several locations
  • Content-Range         range of partial doc. sent
  • ETag                  entity tag for the doc.
  • Expires               when content may change
  • Last-Modified         when doc. last changed
3. Client Request Methods
• GET
  • retrieve the specified document
• POST
  • for sending (form) information
• HEAD
  • get information about the document, but not the actual document
• PUT
  • store the specified document on the server
• DELETE
  • delete the specified document on the server
• TRACE
  • asks that proxies/gateways add information to the headers of the request, which is sent back in the response
• OPTIONS
  • ask the server to send info about the HTTP methods it supports
3.1. The GET Method
• The main purpose of GET is to request a document from a server
  • see earlier examples in section 1
• But the response can be generated in various ways:
  • a file on the Web server
  • the output of a CGI script
    • the script may examine server-side hardware, files, or do some special calculations
CGI Diagram

[Diagram: the client browser sends a request across the Web/Internet to the Web server; the request becomes input to the CGI script, and the script's output becomes the response sent back to the browser]
A CGI Request
• Data for a CGI script is passed as extra name=value arguments added to the URL:

    GET /cgi-bin/create.pl?user=util-tester&pass=1234 HTTP/1.0
    Referer: ...
    User-Agent: ...
      :

• This request carries two arguments: user and pass
• The arguments are URL-encoded.
URL Encoding
• name=value pairs are combined into a single string separated by &'s.
• This is added to the end of the URL after a ?
• Certain special characters are converted to hexadecimal preceded by a %.
  • e.g. '#' becomes %23, '/' becomes %2F
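The encoding rules can be tried out with Python's standard urllib.parse module. A short sketch; the argument names and values are the ones from the create.pl request, and safe="" asks quote to encode '/' as well, which it would otherwise leave alone:

```python
from urllib.parse import parse_qs, quote, urlencode

# Single characters: hexadecimal code preceded by a %.
print(quote("#", safe=""))  # %23
print(quote("/", safe=""))  # %2F

# name=value pairs joined by & and added to the URL after a ?
query = urlencode({"user": "util-tester", "pass": "1234"})
url = "/cgi-bin/create.pl?" + query
print(url)  # /cgi-bin/create.pl?user=util-tester&pass=1234

# A CGI script (or any server-side code) can decode the query string again.
print(parse_qs(query))
```

parse_qs returns each value inside a list, because the same name may legally appear more than once in a query string.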
3.2. The POST Method
• The main purpose of the POST method is to send form information to a server
  • see the example in section 1.5
• Most servers use CGI programs to process form requests.
• The form's name=value data is URL-encoded.
Forms can use GET
• The <form> tag in HTML can also be used to send data in the GET format:

    <form method="get"
          action="http://fivedots.coe.psu.ac.th/cgi-bin/create.pl">
      <input name="user">
      <input name="pass" type="password">
      <input type="submit" value="Submit">
    </form>
Which Method to Use?
• The GET method adds form input to the end of the URL, and there is often a maximum length limit
  • e.g. some software requires the URL string to be 255 chars or less
• For large input, the POST method is better since HTTP places no limit on the size of the entity body in the request.
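The rule of thumb can be written down as a simple length check. This is only an illustration: choose_method is a hypothetical helper, and the default limit of 255 is the example figure from the slide, not a value fixed by HTTP.

```python
from urllib.parse import urlencode


def choose_method(action_url, fields, max_url_length=255):
    """Suggest GET if the encoded form data fits in the URL, otherwise POST."""
    query = urlencode(fields)
    # GET would send action_url + "?" + query as a single URL string.
    if len(action_url) + 1 + len(query) <= max_url_length:
        return "GET"
    return "POST"


action = "http://fivedots.coe.psu.ac.th/cgi-bin/create.pl"
print(choose_method(action, {"user": "util-tester", "pass": "1234"}))
print(choose_method(action, {"text": "x" * 300}))
```

Length is not the only consideration in practice, but it is the one this slide is concerned with.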
3.3. The HEAD Method
• The HEAD method returns information about a document:
  • this includes its modification time, its size, its type, and details about its server
  • this information is useful in guiding/speeding up search engines and browsers
HEAD using Telnet

    ad@calvin$ telnet fivedots.coe.psu.ac.th 80
    Connected to fivedots.coe.psu.ac.th.
    HEAD /~ad/index.html HTTP/1.0

    HTTP/1.0 200 OK
    Date: Sun, 12 Oct 2003 06:42:48 GMT
    Server: Apache/1.3.12 Ben-SSL/1.41 PHP/4.0.1pl2
    Last-Modified: Tue, 29 Jul 2003 11:11:51 GMT
    ETag: "1f1f6e-522-3982bbf7"
    Accept-Ranges: bytes
    Content-Length: 1314
    Content-Type: text/html
    Age: 157
    Connection: close

    Connection closed by foreign host.
    ad@calvin$

• Everything from "HTTP/1.0 200 OK" to "Connection: close" is the response; no document text is sent
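One way a browser or search engine might use HEAD is to compare Last-Modified with the date of a copy it already holds. A small sketch of that comparison, assuming both dates are in the HTTP date format shown above; is_newer is an illustrative name, and parsedate_to_datetime comes from Python's standard email.utils module:

```python
from email.utils import parsedate_to_datetime


def is_newer(last_modified, cached_date):
    """True if the document changed after the cached copy was fetched."""
    return parsedate_to_datetime(last_modified) > parsedate_to_datetime(cached_date)


# Dates taken from the HEAD response above: the page was last changed in July,
# so a copy fetched on 12 Oct 2003 would still be current.
print(is_newer("Tue, 29 Jul 2003 11:11:51 GMT", "Sun, 12 Oct 2003 06:42:48 GMT"))
```

If the check returns False, the full document does not need to be requested again.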
3.4. The PUT Method
• The PUT method is used for uploading files to a server
  • PUT URL HTTP-version
  • used in HTML editors such as FrontPage
• Usually involves an authorization phase when the server asks for a user name and password before accepting the PUT
  • this is processed by FrontPage using details entered by the user
3.5. The DELETE Method
• The DELETE method deletes the specified file:
  • DELETE URL HTTP-version
• The server will usually ask for authorization information before carrying out the request.
3.6. The TRACE Method
• The TRACE method allows a programmer to see how the client's request is passed through proxies/gateways to the server
  • TRACE URL HTTP-version
• The server echoes the request back together with a Via header (and other optional headers).
TRACE using Telnet

    ad@calvin$ telnet cache.psu.ac.th 8080
    Trying 192.168.98.6...
    Connected to proxy6.psu.ac.th.
    Escape character is '^]'.
    TRACE http://www.cs.ait.ac.th HTTP/1.0

    HTTP/1.0 200 OK
    Date: Wed, 22 Oct 2003 07:11:20 GMT
    Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix)
    Content-Type: message/http
    Age: 118
    X-Cache: MISS from cache.psu.ac.th
    Proxy-Connection: close

    TRACE / HTTP/1.0
      :

• Everything from "HTTP/1.0 200 OK" onward is the response