1 / 65

The HTTP Protocol Web Server

What Does the WWW Server Do?. Enables browser requests ProvidesSupport for retrieving hypertext documentsManages access to the Web siteProvides several mechanisms for executing server-side scripts Common Gateway Interface (CGI) Application Programmers Interface (API)Produces log files and u

ban
Télécharger la présentation

The HTTP Protocol Web Server

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


    1. The HTTP Protocol && Web Server

    2. What Does the WWW Server Do? Enables browser requests Provides Support for retrieving hypertext documents Manages access to the Web site Provides several mechanisms for executing server-side scripts Common Gateway Interface (CGI) Application Programmers Interface (API) Produces log files and usage statistics

    3. How Does a Web Server Communicate? Web browsers and Web servers communicate using a protocol known as Hypertext Transfer Protocol (HTTP) Current HTTP protocol is version 1.1 HTTP/1.0, as defined by RFC 1945, allows messages to be in the format of MIME-like messages, containing meta-information about the data transferred and modifiers on the request/response semantics. HTTP/1.1 extends the protocol to handle the effects of hierarchical proxies caching the need for persistent connections virtual hosts

    4. A Graphical View

    5. An HTTP Scenario Communication takes place over a TCP/IP connection, generally on port 80

    6. Persistent Connections In the original HTTP protocol each request was made over a new connection so an HTML page with n distinct graphic elements produced n+1 requests TCP uses a three-way handshake when establishing a connection, so there is significant latency in establishing a connection client sends SYN, server replies ACK/SYN, client responds with ACK HTTP1.0 introduced a keep-alive feature the connection between client and server is maintained for a period of time allowing for multiple requests and responses

    7. HTTP/1.0 Keep Alive Connections

    8. MIME MEDIA TYPES HTTP sends the media type of the file using the Content-Type: header Some important media types are text/plain, text/html image/gif, image/jpeg audio/basic, audio/wav model/vrml video/mpeg, video/quicktime application/*, application-specific data that does not fall under any other MIME category, e.g. application/octet-stream

    10. Client HTTP Requests General form of a request has four fields: HTTP_method, identifier, HTTP_version, Body HTTP_Method says what is to be done to the object specified in the URL; some possibilities include GET, HEAD, and POST identifier is the URL of the resource or the body HTTP_version is the current HTTP version, e.g. HTTP/1.1 Body is optional text

    11. HTTP Request Methods GET, retrieve whatever information is identified by the request URL HEAD,identical to GET, except the server does not return the body in the response POST, instructs the server that the request includes a block of data, which is typically used as input to a server-side application PUT,used to modify existing resources or create new ones DELETE, used to remove existing resources TRACE, traces the requests in a chain of web proxy servers

    12. Example of a GET Request A GET request asks the server to retrieve a document <A HREF="http://tlaloc.sfsu.edu/file.html">click here </A> The request from the client may contain the following lines GET /html/file.html HTTP/1.1 Accept: text/plain Accept: application/pdf Accept: application/html Accept: text/x-html Accept: audio/* ... Accept: */* User-Agent: NCSA Mosaic for the X Window System/2.4 Referer: http://tlaloc.sfsu.edu/html/prevfile.html If-Modified-Since: Wed, 11 Feb 1998 13:14:15 GMT {there is a blank line here which terminates the input}

    13. Response of the Server to GET In response to the previous client request, the server responds with the following HTTP/1.1 200 OK Date: Monday, 29-May-98 12:02:12 GMT Server: Apache/1.4 MIME-version: 1.0 Content-Type: text/html Last-modified: Sun, 28-May-98 15:36:13 GMT Content-Length: 145 {a blank line goes here } {the contents of file.html goes here }

    14. HTTP Headers HTTP/1.1 defines 46 headers divided into four categories: general, present in requests or responses request, present only in requests response, present only in responses entity, describe the content of a body

    15. Examples of HTTP Headers - General Connection: close used to prevent persistent connections Date: Wed, 30 Dec 1998 23:56:32 date and time message was generated Transfer-Encoding: chunked indicates if a transformation has been made on the message Upgrade: HTTP/2.0 causes protocol to switch dynamically Via: 1.1 myproxy, 1.1 second-proxy indicates a proxy chain that the request was passed through

    16. Examples of HTTP Headers - Request Accept: text/html, image/* indicates what media types are acceptable Accept-Charset: iso-8859-5 indicates acceptable character sets. By default all are acceptable Accept-Encoding: compress, gzip indicates acceptable encodings Accept-Language: en, fr=0.5 indicates language preferences, English preferred, but French also accepted Authorization: used to pass users credentials to the server

    17. Examples of HTTP Headers - Request From: name@site.com requesting users email address, rarely present Host: www.sfsu.edu:8080 hostname and port of the requesting URL Referer: http://www.sfsu.edu/index.html the URL of the doucment that contains the reference to the requested URL User-agent: Mozilla/4.5 (X11;Solaris 5.1) reports the client software name and version and possibly platform

    18. Examples of HTTP Headers - Response The response-header fields allow the server to pass additional information about the response which cannot be placed in the Status-Line These header fields give information about the server about further access to the resource identified by the Request-URI. response-headers include: Accept-Ranges | Age | ETag | Location | Proxy-Authenticate | Retry-After | Server | Vary | Warning | WWW-Authenticate

    19. Examples of HTTP Headers - Response Age: 1246 Location: http://www.myco.com/page.html Public: GET, HEAD, POST, OPTIONS, PUT Server: Apache/1.3.1 WWW-AUTHENTICATE: sent with 401 Unauthorized status code, it includes authorization parameters Retry-after: 240 used with Service Unavailable status, indicates requested data will be available in 4 minutes

    20. Examples of HTTP Headers - Response A URL may point to a document with multiple representations: languages, formats (html, pdf), or html features based upon user-agent e.g. the request is GET http://www.myco.com/ HTTP/1.1 User-agent: Mozilla/4.5 Accept-language: en the response is HTTP/1.1 200 OK Vary: Accept-language Content-type: text/html Content-language: en

    21. Examples of HTTP Headers - Entity Content-Base: http://www.sfsu.edu/somedir Content-Encoding: gzip Content-Language: en Content-Length: 7890 Content-type:text/html Expires: Wed, 30 Dec 1998 03:43:21 GMT specifies the expiration date/time of the object; a cached copy should not be used beyond; Expires 0/now is immediate Last-Modified: Wed, 30 Dec 1998 01:20:34 GMT specifies the creation or last modification time of the object on the web server

    22. HTTP Status Codes - Informational After receiving and interpreting a request message, a server responds with an HTTP response message. Syntax of response is Status-Line *( general-header | response-header | entity-header ) CRLF [ message-body ] where the Status line is composed of Status-Line = HTTP-Version Status-Code Reason-Phrase CRLF

    23. HTTP Status Codes 101 Switching Protocols, switching either the version or the actual protocol (Informative) 200 OK, request succeeded (Successful) 201 Created, result is newly created (Successful) 301 Moved permanently, new location is specified in Location: header (Redirection) 304 Not Modified: the client or proxy copy is still up-to-date (Redirection) 400 Bad request, server could not understand (Client error) 401 unauthorized, authorization challenge (Client error) 403 forbidden, server refuses to fulfill request; e.g. check protections (Client error) 404 Not found, document does not exist (Client error) 500 Internal server error, generic error code for the server 501 Not implemented, request could not be serviced because server does not support it (Server error)

    24. Simple Proxy Server Show sample software and do several examples On the class web page there is a simple proxy server built by Dan Boneh of Stanford See the Special Tools category start the proxy server on a specific port, e.g. 5678 Point your browser to the proxy server telnet to the proxy server issue commands to your browser and watch as the proxy server prints out the request/response messages

    25. What to Look for in a Web Server Main features platform they run on; complete support for HTTP 1.1 Multithreaded, load balancing Server side includes image mapping, analysis tools Security IP address restriction Domain name restriction Secure transactions: SSL or S_HTTP Proxy server

    26. How Servers Handle Multiple Requests For each request, a complete copy of the server is made and executed Initially a parent process waits for requests; for each new request a child process is spawned Or, a single server program handles many requests simultaneously (multithreaded) the server must keep track of all requests and switch between them as needed writing multithreaded programs is easier in a language that supports multiple threads, e.g. Java

    27. Some (free or commercially) Available Servers Some freeware servers include Apache for UNIX, NT, http://www.apache.org webstar for Mac http://www.biap.com Publisher Platform URL Netra (Sun MicroSystems) Sparcstations/Solaris www.sun.com Webforce (Silicon Graphics) Indigo/MIPS/IRIX www.sgi.com Internet connection (IBM) Intel/OS2 www.ibm.com IIS (Microsoft) Intel, Others/W95//NT www.microsoft.com Enterprise (Netscape) Intel/WinNT www.netscape.com Website (OReilly) Intel/Win95/WinNT www.ora.com Webserver (SPRY) Intel/Win95/WinNT www.spry.com

    28. Server Usage http://www.netcraft.co.uk has studied more than 3,000,000 sites UNIX currently dominates Internet servers Windows NT has a small, but growing share Statistics on intranets are difficult to determine

    29. Server Features Document Root Server Side Includes Authentication Proxy Servers Caching CGI Scripting Application Programmers Interface

    30. Document Tree

    31. Document Types File extensions indicate the type of document, .html, .htm html document .txt unformatted ASCII .ps postscript .gif GIF image .jpeg JPEG image .wrl, .vrml VRML scene Some documents are doubly encoded, e.g. a compressed postscript file. Since there is no file suffix to describe this, the web server uses MIME content headers, e.g. Content-type: application/postscript Content-encoding: gzip

    32. Server-Side Includes A server-side include is an action taken by the server before it delivers the document to the client Unfortunately, this feature requires the server to examine and parse the document, looking for the special commands Most servers supporting this feature come with it disabled as the default Typically, parsable HTML files are denoted by a different suffix; e.g., shtml General form: <!--#command tag1="value1" tag2="value2" -->

    33. Some Server-Side Include Directives

    34. Example of Server-Side Includes <html> <head> <Title>example of server includes</Title> </head> <body> <h1>Examples of Server Side Includes</h1> <!-- a comment: example of server side includes --> <!--#config errmsg="unable to parse include file" --><P> Local Date: <!--#echo var="DATE_LOCAL"--><P> Date (GMT): <!--#echo var="DATE_GMT"--><P> Last Modified On: <!--#echo var="LAST_MODIFIED"--><P> File Size: <!-- #fsize file="includes.html" --> <!--#exec cmd="/cgi-bin/myscript.cgi" --> </body> </html>

    35. Browser Output of Server-Side Includes

    36. Basic User Authentication Basic authentication is supported by all HTTP servers The server administrator creates secure directories accessible via password files maintained by the server Client requests a resource from a secure directory; e.g., GET/secure/test.html HTTP/1.0 Server responds with authentication request to the client; e.g., HTTP/1.0 401 Unauthorized Browser requests user name and password, scrambles them, and retries the request GET http://domain/secure/test.html HTTP/1.0 Authorization: Basic 0<V%L:EB.G-E8W))$J Server decodes name and password and checks it against its password file

    37. Web Servers as Proxy Servers A proxy is a forwarding agent, receiving requests for a URI in its absolute form, rewriting all or part of the message, and forwarding the reformatted request toward the server identified by the URI A gateway is a receiving agent, acting as a layer above some other server(s) and, if necessary, translating the requests to the underlying servers protocol Any party to the communication between browser and final server may employ an internal cache for handling requests. The effect is to shorten the request/response chain

    38. Caching The goal of caching is to eliminate the need to send requests (reducing the number of network round trips) or eliminate the need to send full responses (reducing network bandwidth) The Cache-Control header field contains directives to the caching mechanism, e.g.some request directives are: no-cache do NOT cache max-age client will accept a response whose age is no greater than specified some response directives are: public cachable by any cache private do NOT cache

    39. Caching

    40. Creating Server-Side Applications Servers offer two mechanisms Application Programming Interface (API) Common Gateway Interface (CGI) For Windows/NT IIS API applications are compiled as Dynamic Link Libraries (DLLs) and loaded by the IIS at startup API applications are efficient as no new process has to be started

    41. CGI Scripting A CGI application is a separate executable process invoked by the web server to handle a specific client request Immediately before invoking the CGI program (called a script), the web server creates a set of environment variables which can be used by the invoking program Since the CGI scripts runs as a separate process, it cannot crash the web server. If the program fails, the web server eventually times out

    42. Configuring a Server No matter which operating system or server, you will need to define the location of the server (server root) location of documents to deliver (document root) location of CGI scripts to execute You may also wish to define Access restrictions Fancy indexing server-side includes Other features

    44. Apache Home Page

    46. Apache Settings - httpd.conf Directives are keywords followed by their value,e.g. Port 9637 ServerAdmin yoon@cs.sfsu.edu ServerRoot /home/csc667/WebServer/apache_1.3.4 ErrorLog logs/error_log the above are ones that must be set

    47. Apache Settings - httpd.conf # KeepAlive: Whether or not to allow persistent connections (more than one request per connection). Set to "Off" to deactivate. KeepAlive On # MaxKeepAliveRequests: The maximum number of requests to allow during a persistent connection. Set to 0 to allow an unlimited amount. They recomend you leave this number high, for maximum performance. MaxKeepAliveRequests 100 # KeepAliveTimeout: Number of seconds to wait for the next request KeepAliveTimeout 15

    48. Apache Settings - httpd.conf # Server-pool size regulation. Apache dynamically adapts to the load it sees --- it tries to maintain enough server processes to handle the current load, plus a few spare servers to handle transient load spikes. It does this by periodically checking how many servers are waiting for a request. If there are fewer than MinSpareServers, it creates a new spare. If there are more than MaxSpareServers, some of the spares die off. MinSpareServers 2 -ISD limit MaxSpareServers 5 -ISD limit # StartServers is number of servers to start --- StartServers 2 -ISD limit

    50. Apache Settings - Scripting # ScriptAlias: This controls which directories contain server scripts. Format: ScriptAlias fakename realname ScriptAlias /cgi-bin/ /home/cscixxx/WebServer/apache/cgi-bin/ # If you want to use server side includes, or CGI outside ScriptAliased directories, uncomment the following lines. # AddType allows you to tweak mime.types without actually editing it, or to make certain files to be certain types. Format: AddType type/subtype ext1 # To use CGI scripts: #AddHandler cgi-script .cgi # To use server-parsed HTML files #AddType text/html .shtml #AddHandler server-parsed .shtml

    51. Authentication in Apache there are two methods for controlling access to directories the file access.conf in the conf/ directory can be used per-directory access rules can be set by a file placed in a specific directory the name of the file is set by the directive AccessFileName .htaccess is the default name When the server attempts to retrieve a document, it looks for an access control file in the directory or the parent directory. The file it looks for is set by the directive AccessFileName

    52. Limit Sectioning Directive <Limit> controls which clients can access a directory; directives within LIMIT include: order in which deny and allow are evaluated deny from host1, host2, ... allow from host1, host2, ... require named-users or group-users or AuthUserFile referer allows access only from this directory satisfy all or any

    55. Using LIMIT Example of a directive used to limit access to only SFSU and ISI domains <limit GET> order deny, allow deny from all allow from 128.125 allow from 128.9 </Limit>

    56. Example Suppose you want to restrict files in a directory called secure/ to the user named student1 and password XXXYYY Step 1. create a file called .htaccess in directory secure/ that contains these 7 lines: AuthUserFile /otherdir/.htpasswd AuthGroupFile /dev/null AuthName ByPassword AuthType Basic <Limit GET> require user student1 </Limit>

    57. Example Step 2. Next create the password file /otherdir/.htpasswd using the htpasswd program distributed with apache. Type htpasswd -c /otherdir/.htpasswd student1 and enter the password twice. You are done. Step 3. To add other names and passwords run htpasswd without the -c option.

    58. Example To generalize this example to include several students, Step 4. create a group file called /otherdir/.htgroup which looks like: my-users: student1 student2 student3 student4 Alter the .htaccess file to look like AuthUserFile /otherdir/.htpasswd AuthGroupFile /otherdir/.htgroup Now it points to the group file AuthName ByPassword AuthType Basic <Limit GET> require group my-users </Limit>

    59. Contents of .htaccess The .htaccess file contains pointers to the users file, the group file, and defines the name and type of authorization nunki.usc.edu(10): more .htaccess AuthUserFile /home-scf-03/cscixxx/pwd/.htpasswd AuthGroupFile /home-scf-03/cscixxx/pwd/.htgroup AuthName OnlineCourse AuthType Basic

    61. Using htpasswd This program is included in the Apache distribution as htpasswd.c try htpasswd to get instructions htpasswd [-c] passwordfile username creates a new file called passwordfile and prompts for a password for username

    64. Common Log Format Host client hostname or IP address ident remote identity, often a dash usr authenticated user name time date/time request is received req HTTP request s1 servers HTTP response status code c1 Content-length marmot.usc.edu - - [17/Feb/1999:23:21:09 -0800] "GET / HTTP/1.0" 200 1316 128.125.229.195 - - [23/Feb/1999:10:32:09 -0800] "GET / HTTP/1.1" 200 1316

    65. Sample error_log Contents Here are two lines from the error_log file [Mon Jan 29 11:47:36 1999] httpd: access to /home/ scf-13/csci665/www/pleader/horohome.html failed for internet-gw.watson.ibm.com, reason: No file matching URL: /pleader/horohome.html from - [Tue Feb 6 13:45:02 1999] httpd: access to /home/ scf-13/csci665/www/cocomo/member.html failed for sunset.usc.edu, reason: No file matching URL: /cocomo/member.html from http://nunki.usc.edu:8080/ cocomo/

More Related