Learn about HTTP properties, commands, server-client interaction, caching, and improvements in HTTP/1.1 for efficient web communication.
HTTP CS587x Lecture Department of Computer Science Iowa State University
What to Cover • WWW • HTTP/1.0 • Protocol highlights • Problems • HTTP/1.1 • Highlights of improvement
World Wide Web (WWW) • Core Components • Servers • Store files and execute remote commands • Browsers (i.e., clients) • Retrieve and display “pages” of content linked by hypertext • Networks • Send information back and forth upon request • Problems • How to identify an object • How to retrieve an object • How to interpret an object
Semantic Parts of WWW • URI (Uniform Resource Identifier) • protocol://hostname:port/directory/object • http://www.cs.iastate.edu/index.html • ftp://popeye.cs.iastate.edu/welcome.txt • https://finance.yahoo.com/q/cq?s=ibm&d=v1 • Implementation: extend the hierarchical namespace to include • anything in a file system • server-side processing • HTTP (Hypertext Transfer Protocol) • An application protocol for sending and receiving information • HTML (Hypertext Markup Language) • A language specification used to interpret the information received from the server
HTTP Properties • Request-response exchange • Server runs over TCP, port 80 • Client sends HTTP requests and gets responses from the server • Synchronous request/reply protocol • Stateless • No state is maintained by clients or servers across requests and responses • Each request/response pair is treated as an independent message exchange • Resource metadata • Information about resources is often included in web transfers and can be used in several ways
HTTP Commands • GET • Transfer resource from given URL • HEAD • Get resource metadata (headers) only • PUT • Store/modify resource under a given URL • DELETE • Remove resource • POST • Provide input for a process identified by the given URL (usually used to post CGI parameters)
Response Codes of HTTP 1.0 • 2xx success • 3xx redirection • 4xx client error in request • 5xx server error; can’t satisfy the request
Steps of Processing an HTTP Request: http://www.cs.iastate.edu/index.html • The client • Contacts its local DNS server to find the IP address of www.cs.iastate.edu • Initiates a TCP connection on port 80 • Sends the GET request via the established socket: GET /index.html HTTP/1.0 • The server • Sends its response containing the requested file • Tells TCP to terminate the connection • The browser • Parses the file and displays it accordingly • Repeats the same steps for every embedded object
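The client-side steps above can be sketched in a few lines of Java. This is a minimal illustration, not a full client: the class name `SimpleHttpClient` and its methods are hypothetical, and the sketch assumes the server closes the connection after the response, as HTTP/1.0 servers do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class SimpleHttpClient {
    // Build the request line followed by the blank line that ends an HTTP/1.0 request.
    static String buildGetRequest(String path) {
        return "GET " + path + " HTTP/1.0\r\n\r\n";
    }

    // Resolve the host name, open a TCP connection, send the request,
    // and read the response until the server closes the connection.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) { // DNS lookup + TCP connect
            OutputStream out = socket.getOutputStream();
            out.write(buildGetRequest(path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) { // null means the server closed the connection
                response.append(line).append('\n');
            }
            return response.toString();
        }
    }
}
```

A call such as `SimpleHttpClient.fetch("www.cs.iastate.edu", 80, "/index.html")` would return the status line, headers, and body as one string, provided the host is reachable; a browser would then repeat the call for each embedded object.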
Server Response
HTTP/1.0 200 OK
Content-Type: text/html
Content-Length: 1234
Last-Modified: Mon, 19 Nov 2001 15:31:20 GMT

<HTML>
<HEAD> <TITLE>CS Home Page</TITLE> </HEAD>
…
</BODY>
</HTML>
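To make the structure of such a response concrete, here is a small sketch of how a client might split the response head into a status code, its class (2xx, 3xx, 4xx, 5xx), and a header map. The class `ResponseHeadParser` and its method names are hypothetical, and the sketch assumes a well-formed head with CRLF line endings.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class ResponseHeadParser {
    // Extract the numeric code from a status line such as "HTTP/1.0 200 OK".
    static int statusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    // Map the leading digit of the status code to its response class.
    static String statusClass(int code) {
        switch (code / 100) {
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default: return "other";
        }
    }

    // Parse "Name: value" lines that follow the status line, stopping at the blank line.
    // Header names are lower-cased because they are case-insensitive.
    static Map<String, String> headers(String head) {
        Map<String, String> result = new LinkedHashMap<>();
        String[] lines = head.split("\r\n");
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) break;
            int colon = lines[i].indexOf(':'); // first colon separates name from value
            if (colon > 0) {
                result.put(lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT),
                           lines[i].substring(colon + 1).trim());
            }
        }
        return result;
    }
}
```

Splitting on the first colon only matters for values such as the `Last-Modified` date, which contains colons of its own.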
HTTP/1.0 Example (client/server exchange) • Client: Request file 1 • Server: Transfer file 1 • Client: Request file 2 • Server: Transfer file 2 • ... • Client: Request file n • Server: Transfer file n • Client: Finish displaying the page
HTTP Server Implementation
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class WebServerDemo {
    public static void main(String[] args) throws IOException {
        ServerSocket ss = new ServerSocket(80); // listen on the HTTP port
        for (;;) {
            // accept connection
            Socket accept = ss.accept();
            // Start a thread to process the request
            new Handler(accept).start();
        }
    }
}
HTTP Server Implementation
class Handler extends Thread { // Handler for an HTTP request
    Socket socket;
    BufferedReader br;
    PrintWriter pw;

    public Handler(Socket _socket) {
        socket = _socket;
    }

    public void run() {
        try {
            br = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            pw = new PrintWriter(new OutputStreamWriter(socket.getOutputStream()));
            String line = br.readLine(); // Read HTTP request from user
            if (line != null && line.toUpperCase().startsWith("GET")) {
                // parse the string to find the file name
                // locate the file and send it back
                // :::::
            }
            // other commands: post, delete, put, etc.
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
HTTP/1.0 Caching • Client • GET request header: • If-Modified-Since: the server returns a “304 Not Modified” response if the resource has not been modified since the specified time • Request header: • Pragma: no-cache: ignore all caches and get the resource directly from the server • Server • Response header: • Expires: tells the client until what time it is safe to cache the resource
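As a rough sketch of the server-side decision behind If-Modified-Since, the comparison below returns 304 when the resource is unchanged and 200 otherwise. The class `ConditionalGet` is hypothetical, and the sketch assumes both dates use the RFC 1123 format shown in the earlier server response.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class, for illustration only.
class ConditionalGet {
    // Returns 304 (Not Modified) when the resource has not changed since the
    // time the client supplied, otherwise 200 (send the full resource).
    static int statusFor(String ifModifiedSince, String lastModified) {
        ZonedDateTime since =
                ZonedDateTime.parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME);
        ZonedDateTime modified =
                ZonedDateTime.parse(lastModified, DateTimeFormatter.RFC_1123_DATE_TIME);
        return modified.isAfter(since) ? 200 : 304;
    }
}
```

A 304 response carries no body, so the client reuses its cached copy and only the headers cross the network.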
Issues with HTTP/1.0 • Each resource requires a new connection • Large number of embedded objects in a web page • Many short-lived connections • Serial vs. parallel connections • A serial connection downloads one object at a time (e.g., Mosaic), causing long latency before a whole page is displayed • Parallel connections (e.g., Netscape) open several connections at once (typically 4), contributing to network congestion • HTTP uses TCP as the transport protocol • TCP is not optimized for the typical short-lived connections • Most Internet transfers fit in 10 packets (overhead: 7 out of 17) • Too slow for small objects • May never exit the slow-start phase
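The cost of short connections can be estimated with a little arithmetic. The sketch below is an idealized model, not a TCP implementation: it assumes the congestion window starts at a given number of segments and doubles every round trip with no loss, and it reads the "7 out of 17" figure as 7 connection-management packets added to 10 data packets. The class name `SlowStartEstimate` is hypothetical.

```java
// Hypothetical helper class, for illustration only.
class SlowStartEstimate {
    // Round trips needed to send `segments` data segments when the congestion
    // window starts at `initialWindow` and doubles each round trip
    // (idealized: no loss, no receiver-window limit).
    static int roundTrips(int segments, int initialWindow) {
        int sent = 0;
        int window = initialWindow;
        int rtts = 0;
        while (sent < segments) {
            sent += window;
            window *= 2;
            rtts++;
        }
        return rtts;
    }

    // Fraction of all packets on the connection that carry no data.
    static double overheadFraction(int dataPackets, int overheadPackets) {
        return (double) overheadPackets / (dataPackets + overheadPackets);
    }
}
```

Under these assumptions a 10-segment transfer needs 4 round trips (1 + 2 + 4 + 8 segments) and ends while the window is still growing, and 7 overhead packets out of 17 is roughly 41% of the traffic.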
Highlights of HTTP/1.1 • Persistent connections • Pipelined requests/responses • Support for virtual hosting • More explicit support for caching • Internet Cache Protocol (ICP) • Content negotiation/adaptation • Range requests
Persistent Connections • The basic ideas were • reducing the number of TCP connections opened and closed • reducing TCP connection costs • reducing latency by avoiding multiple TCP slow-starts • avoiding bandwidth wastage and reducing overall congestion • A longer-lived TCP connection learns more about network conditions (Why?) • New GET methods (proposed at the time; not part of the final HTTP/1.1 standard) • GETALL • GETLIST
Pipelined Requests/Responses • Buffer requests and responses to reduce the number of packets • Multiple requests can be contained in one TCP segment • Note: responses must be returned in the same order as the requests • Example exchange • Client: Request 1, Request 2, Request 3 (sent back to back) • Server: Transfer 1, Transfer 2, Transfer 3 (in the same order)
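One way to picture pipelining is that the client concatenates several requests into a single buffer and writes them to the persistent connection at once, without waiting for each response. The sketch below only builds that buffer; the class `PipelineSketch` and its method are hypothetical names.

```java
import java.util.List;

// Hypothetical helper class, for illustration only.
class PipelineSketch {
    // Concatenate one GET request per path so they can be written to the
    // connection in a single send. Responses are expected in the same order.
    static String buildPipelinedRequests(String host, List<String> paths) {
        StringBuilder buffer = new StringBuilder();
        for (String path : paths) {
            buffer.append("GET ").append(path).append(" HTTP/1.1\r\n")
                  .append("Host: ").append(host).append("\r\n")
                  .append("\r\n"); // blank line ends each request
        }
        return buffer.toString();
    }
}
```

Because small requests like these are far shorter than a typical TCP segment, several of them can travel in one packet.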
Support for Virtual Hosting • Problem: outsourcing web content to a hosting company • http://www.hostmany.com/A → http://www.A.com • http://www.hostmany.com/B → http://www.B.com • In HTTP/1.0, a request for http://www.A.com/index.html has in its header only: • GET /index.html HTTP/1.0 • It is not possible to run two web servers at the same IP address, because the GET is ambiguous • HTTP/1.1 addresses this by adding the “Host” header: GET /index.html HTTP/1.1 Host: www.A.com
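On the server side, the Host header lets one listening socket serve several sites by mapping the host name to a document root. The following is a minimal sketch of that lookup; `VirtualHostRouter` and the directory names in the usage note are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class VirtualHostRouter {
    private final Map<String, String> docroots = new HashMap<>();

    // Register a site; host names are case-insensitive, so store them lower-cased.
    void addHost(String host, String docroot) {
        docroots.put(host.toLowerCase(Locale.ROOT), docroot);
    }

    // Return the document root for a Host header value, or null if the host
    // is unknown. An optional ":port" suffix is ignored.
    String resolve(String hostHeader) {
        if (hostHeader == null) return null;
        String host = hostHeader.trim().toLowerCase(Locale.ROOT);
        int colon = host.indexOf(':');
        if (colon >= 0) host = host.substring(0, colon);
        return docroots.get(host);
    }
}
```

For example, after `addHost("www.A.com", "/sites/A")` and `addHost("www.B.com", "/sites/B")`, a request carrying `Host: www.A.com` resolves to `/sites/A`, so the same `GET /index.html` is no longer ambiguous.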
Content Negotiation/Adaptation • A resource may have more than one representation • Different languages • Different image sizes, etc. • Example: GET /index.html HTTP/1.1 Host: www.getbelix.com Accept-Language: en-us, fr-BE • Two approaches • Agent-driven: the client receives a set of alternative representations of the response, chooses the best one, and indicates its choice in a second request • Server-driven: the server chooses the representation based on what is available at the server, the headers in the request message, or information about the client, such as its IP address
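A server-driven choice based on Accept-Language could look roughly like the sketch below. It is a simplification: it matches language tags exactly, honors optional `q` weights, and breaks ties in favor of the tag listed first. The class `LanguageNegotiation` is hypothetical, and the list of available languages is assumed to contain lower-case tags.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class LanguageNegotiation {
    // Pick the available language with the highest q-value in the header,
    // or `fallback` when none of the requested languages is available.
    static String choose(String acceptLanguage, List<String> available, String fallback) {
        String best = fallback;
        double bestQ = 0.0;
        for (String item : acceptLanguage.split(",")) {
            String[] parts = item.trim().split(";");
            String tag = parts[0].trim().toLowerCase(Locale.ROOT);
            double q = 1.0; // a tag without an explicit weight counts as q=1
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) q = Double.parseDouble(param.substring(2));
            }
            // Strictly greater: on a tie, the tag listed first wins.
            if (q > bestQ && available.contains(tag)) {
                best = tag;
                bestQ = q;
            }
        }
        return best;
    }
}
```

With the header from the slide, `en-us, fr-BE`, a server that only has French (Belgium) and German versions would pick `fr-be`.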
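Range Request • A user may want to load only some portion of the content • E.g., retrieve only the newly appended portion • E.g., load some pages of a PDF file • Example: GET /bigfile.html HTTP/1.1 Host: www.justwhatiwant.com followed by one of • Range: bytes=2000-3999 (bytes 2000 through 3999) • Range: bytes=-1000 (the last 1000 bytes) • Range: bytes=2000- (from byte 2000 to the end)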
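The three forms of a byte range can be resolved against the size of the resource as sketched below. This handles a single range only and assumes the standard `bytes=` unit prefix; the class `ByteRange` is a hypothetical name.

```java
// Hypothetical helper class, for illustration only.
class ByteRange {
    // Resolve a single-range header value against a resource of `length` bytes.
    // Returns {first, last} as inclusive offsets, or null when the value is
    // malformed, lists several ranges, or cannot be satisfied.
    static long[] resolve(String headerValue, long length) {
        String prefix = "bytes=";
        if (!headerValue.startsWith(prefix)) return null;
        String spec = headerValue.substring(prefix.length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0) return null;
        String firstPart = spec.substring(0, dash).trim();
        String lastPart = spec.substring(dash + 1).trim();
        try {
            long first;
            long last;
            if (firstPart.isEmpty()) {            // "-N": the last N bytes
                if (lastPart.isEmpty()) return null;
                long n = Long.parseLong(lastPart);
                if (n <= 0) return null;
                first = Math.max(0, length - n);
                last = length - 1;
            } else {                              // "A-B" or "A-"
                first = Long.parseLong(firstPart);
                last = lastPart.isEmpty()
                        ? length - 1
                        : Math.min(Long.parseLong(lastPart), length - 1);
            }
            if (first > last || first >= length) return null;
            return new long[] { first, last };
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

For a 10000-byte file, `bytes=2000-3999` resolves to offsets 2000 through 3999, `bytes=-1000` to 9000 through 9999, and `bytes=2000-` to 2000 through 9999.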
Cache-Control Request Directives • no-cache: force revalidation with the origin server • only-if-cached: obtain the resource only from a cache • no-store: do not allow caches to store the request/response • max-age: the response’s age should be no greater than this value • max-stale: an expired response is acceptable, provided it has been stale for no longer than the stated value • min-fresh: the response should remain fresh for at least the stated value • no-transform: proxies should not change the media type
Cache-Control Response Directives • public: OK to cache the response anywhere • private: response is for a specific user only • no-cache: do not serve from cache without prior revalidation • Must revalidate regardless of when the response becomes stale • no-store: caches are not permitted to store the response or the request • no-transform: proxies should not change the media type • must-revalidate: can be cached, but must be revalidated if stale • A file may be associated with an age (expiration) • proxy-revalidate: force shared user agent caches to revalidate the cached response • max-age: the response’s age should be no greater than this value • s-maxage: shared caches use this value as the response’s maximum age (overrides max-age)
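How a cache might combine a few of these response directives into a "serve from cache or revalidate" decision is sketched below. It covers only `no-cache`, `no-store`, `private`, `max-age`, and `s-maxage`, and it assumes the caller already knows the age of the stored response in seconds. The class `FreshnessCheck` is hypothetical.

```java
// Hypothetical helper class, for illustration only.
class FreshnessCheck {
    // Value of a "name=seconds" directive, or -1 when the directive is absent.
    static long directiveSeconds(String cacheControl, String name) {
        for (String directive : cacheControl.split(",")) {
            String[] kv = directive.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase(name)) {
                return Long.parseLong(kv[1].trim());
            }
        }
        return -1;
    }

    // True when the directive appears, with or without a value.
    static boolean hasDirective(String cacheControl, String name) {
        for (String directive : cacheControl.split(",")) {
            if (directive.trim().split("=", 2)[0].trim().equalsIgnoreCase(name)) return true;
        }
        return false;
    }

    // True when the stored response may be served without contacting the origin server.
    static boolean isFresh(String cacheControl, long ageSeconds, boolean sharedCache) {
        if (hasDirective(cacheControl, "no-cache") || hasDirective(cacheControl, "no-store")) {
            return false;
        }
        if (sharedCache && hasDirective(cacheControl, "private")) return false;
        long lifetime = directiveSeconds(cacheControl, "max-age");
        if (sharedCache) {
            long shared = directiveSeconds(cacheControl, "s-maxage");
            if (shared >= 0) lifetime = shared; // s-maxage overrides max-age for shared caches
        }
        return lifetime >= 0 && ageSeconds < lifetime;
    }
}
```

With `public, max-age=60, s-maxage=10`, a 30-second-old response is still fresh for a browser cache but must be revalidated by a shared proxy cache.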
Factors to Consider for Cache Replacement • Cost of storing the resource (size) • Cost of fetching the resource (size + distance) • The time since the last modification of the resource • The number of accesses to the resource in the past • The probability of the resource being accessed in the near future • May be known a priori or estimated from the past access pattern • The heuristic expiration time • If there is no server-specified expiration time, the cache decides on a heuristic expiration time • If no expired resources are available as candidates, then resources that are close to their expiration time are prioritized as candidates for replacement
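The last two factors can be illustrated with a short sketch. It assumes a common heuristic, taking a fraction (often 10%) of the time since the last modification as the freshness lifetime, and it picks the replacement candidate purely by earliest expiration time, ignoring the size and access-frequency factors. The class `CacheReplacementSketch` and its methods are hypothetical.

```java
import java.util.Map;

// Hypothetical helper class, for illustration only.
class CacheReplacementSketch {
    // Heuristic freshness lifetime in seconds: a fraction of the interval
    // between the last modification and the time the resource was fetched.
    static long heuristicLifetimeSeconds(long fetchedAtEpoch, long lastModifiedEpoch,
                                         double fraction) {
        long sinceModified = Math.max(0, fetchedAtEpoch - lastModifiedEpoch);
        return (long) (sinceModified * fraction);
    }

    // Choose the entry with the earliest expiration time. Already-expired
    // entries come first; otherwise the entry closest to expiring is chosen.
    // Returns null for an empty cache.
    static String evictionCandidate(Map<String, Long> expiresAtByKey) {
        String candidate = null;
        long earliest = Long.MAX_VALUE;
        for (Map.Entry<String, Long> entry : expiresAtByKey.entrySet()) {
            if (candidate == null || entry.getValue() < earliest) {
                candidate = entry.getKey();
                earliest = entry.getValue();
            }
        }
        return candidate;
    }
}
```

A real replacement policy would weigh this against the other factors on the slide, for example by combining expiration time with object size and access count into a single score.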
Summary • HTTP 1.0 • HTTP 1.1
What We Have Covered So Far (protocol stack) • Application: DNS, HTTP • Transport: TCP, UDP • Network: IP • Link: Ethernet, FDDI, Token, etc.
HTTP Server (1)
import java.io.*;
import java.net.*;
import java.util.*;

public class WebServerDemo {
    protected String docroot;  // Directory of HTML pages and other files
    protected int port;        // Port number of web server
    protected ServerSocket ss; // Socket for the web server

    class Handler extends Thread { // Handler for an HTTP request
        protected Socket socket;
        protected PrintWriter pw;
        protected BufferedOutputStream bos;
        protected BufferedReader br;
        protected File docroot;

        public Handler(Socket _socket, String _docroot) throws Exception {
            socket = _socket;
            docroot = new File(_docroot).getCanonicalFile(); // Absolute dir of the filepath
        }
HTTP Server (2)
        public void run() {
            try {
                // Prepare our readers and writers
                br = new BufferedReader(new InputStreamReader(socket.getInputStream()));
                bos = new BufferedOutputStream(socket.getOutputStream());
                pw = new PrintWriter(new OutputStreamWriter(bos));
                String line = br.readLine(); // Read HTTP request from user
                socket.shutdownInput();      // Shutdown any further input
                if (line == null) {
                    socket.close();
                    return;
                }
                if (line.toUpperCase().startsWith("GET")) {
                    // Eliminate any trailing ? data, such as for a CGI GET request
                    StringTokenizer tokens = new StringTokenizer(line, " ?");
                    tokens.nextToken();
                    String req = tokens.nextToken();
                    String name;
                    // ... form a full filename
                    if (req.startsWith("/") || req.startsWith("\\"))
                        name = this.docroot + req;
                    else
                        name = this.docroot + File.separator + req;
                    File file = new File(name).getCanonicalFile(); // Get absolute file path
                    // Check to see if request doesn't start with our document root ....
                    if (!file.getAbsolutePath().startsWith(this.docroot.getAbsolutePath())) {
                        pw.println("HTTP/1.0 403 Forbidden");
                        pw.println();
                    }
HTTP Server (3)
                    // run() continued
                    else if (!file.canRead()) { // No access
                        pw.println("HTTP/1.0 403 Forbidden");
                        pw.println();
                    }
                    else if (file.isDirectory()) { // Directory, not file
                        sendDir(bos, pw, file, req);
                    }
                    else {
                        sendFile(bos, pw, file.getAbsolutePath());
                    }
                }
                else { // Unsupported command
                    pw.println("HTTP/1.0 501 Not Implemented");
                    pw.println();
                }
                pw.flush();
                bos.flush();
            } catch (Exception e) {
                e.printStackTrace();
            }
            try {
                socket.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        } // run()

        protected void sendFile(BufferedOutputStream bos, PrintWriter pw, String filename) throws Exception {
            try {
                BufferedInputStream bis = new BufferedInputStream(new FileInputStream(filename));
                byte[] data = new byte[10 * 1024];
                int read = bis.read(data);
                pw.println("HTTP/1.0 200 Okay");
                pw.println();
                pw.flush();
                bos.flush();
                while (read != -1) {
                    bos.write(data, 0, read);
                    read = bis.read(data);
                }
                bos.flush();
            } catch (Exception e) {
                pw.flush();
                bos.flush();
            }
        }
HTTP Server (4)
        protected void sendDir(BufferedOutputStream bos, PrintWriter pw, File dir, String req) throws Exception {
            try {
                pw.println("HTTP/1.0 200 Okay");
                pw.println();
                pw.flush();
                pw.print("<html><head><title>Directory of " + req + "</title></head><body><h1>Directory of " + req + "</h1><table border=\"0\">");
                File[] contents = dir.listFiles();
                for (int i = 0; i < contents.length; i++) {
                    pw.print("<tr><td><a href=\"" + req + contents[i].getName());
                    if (contents[i].isDirectory()) pw.print("/");
                    pw.print("\">");
                    if (contents[i].isDirectory()) pw.print("Dir -> ");
                    pw.println(contents[i].getName() + "</a></td></tr>");
                }
                pw.println("</table></body></html>");
                pw.flush();
            } catch (Exception e) {
                pw.flush();
                bos.flush();
            }
        }
    }

    protected void parseParams(String[] args) throws Exception {
        switch (args.length) { // Check that a filepath has been specified and a port number
            case 1:
            case 0:
                System.err.println("Syntax: <jvm> " + this.getClass().getName() + " docroot port");
                System.exit(0);
            default:
                this.docroot = args[0];
                this.port = Integer.parseInt(args[1]);
                break;
        }
    }
HTTP Server (5)
    public WebServerDemo(String[] args) throws Exception {
        System.out.println("Checking for parameters");
        parseParams(args); // Check for command line parameters
        System.out.print("Starting web server...... ");
        this.ss = new ServerSocket(this.port); // Create a new server socket
        System.out.println("OK");
        for (;;) { // Forever
            Socket accept = ss.accept(); // Accept connection via server socket
            // Start a new handler instance to process the request
            new Handler(accept, docroot).start();
        }
    }

    // Start an instance of the web server
    public static void main(String[] args) throws Exception {
        WebServerDemo webServerDemo = new WebServerDemo(args);
    }
}