Hypertext Transfer Protocol

Presentation Transcript


  1. Hypertext Transfer Protocol IS 373—Web Standards Todd Will

  2. Topics
     • Intro to HTTP
     • Following links
     • What actually happens during a request
     • Content
     • Tips and Tricks
     • For Next Week
     CIS 373---Web Standards-HTTP

  3. Intro
     • HTTP is the Hypertext Transfer Protocol
     • When you browse the web, you use HTTP to transfer data between the server and your client machine
     • Major steps performed:
       • You start a browser that can understand and display HTML
       • You either click on a link or type a URL into the address bar
       • The browser makes a request of a web server (a program that listens for and responds to requests for data from clients)
         • The request can be for any digital resource
       • The web server processes the request and returns the document to the user
       • The web server tells the browser what type of document it is returning
       • The browser displays the document
       • Images, JavaScript files, and style sheets referenced by the page are also downloaded
         • Each additional item retrieved generates an additional request to the server
     • HTTP only defines how the browser and the web server communicate with each other
       • The actual data is moved using TCP/IP
     • This is a simplified version of how HTTP works

  4. HTTP Versions
     • HTTP/0.9
       • Earliest version
       • Very primitive standard
       • Very rarely used anymore
     • HTTP/1.0
       • Largely superseded by HTTP/1.1
     • HTTP/1.1
       • Extends and improves HTTP/1.0
       • Supported by virtually all browsers and web servers
       • Persistent connections: the client can keep the connection open after downloading a file, so the next request does not require setting up a new connection
         • Decreases server load
         • Reduces network overhead
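To see a persistent HTTP/1.1 connection in action, here is a minimal sketch using only Python's standard library. It starts a throwaway local server that speaks HTTP/1.1, then sends two GET requests over a single `http.client.HTTPConnection`. The handler name and the paths `/index.html` and `/style.css` are hypothetical. The server records the client-side port of each request, so if both requests arrive from the same port, they shared one TCP connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

client_ports = []  # client-side port seen by the server for each request


class KeepAliveHandler(BaseHTTPRequestHandler):
    # Declaring HTTP/1.1 makes the server keep connections open by default
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # Content-Length tells the client where this response ends,
        # so the same connection can carry the next request
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
responses = []
for path in ["/index.html", "/style.css"]:  # hypothetical resources
    conn.request("GET", path)
    resp = conn.getresponse()
    # The body must be read fully before the connection can be reused
    responses.append((resp.status, resp.read().decode()))
conn.close()
server.shutdown()
server.server_close()

print(responses)
print("same connection:", client_ports[0] == client_ports[1])
```

Under HTTP/1.0 each of those requests would normally need its own connection. With HTTP/1.1 the second request skips the connection setup, which is where the savings in server load come from.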

  5. What happens in HTTP?
     • Step 1: Parse the URL
       • The browser must identify the URL of the request
       • Most URLs have the form protocol://server/request-URI
         • The protocol tells the browser how to retrieve the document
         • The server part tells the browser which web server to contact
         • The request-URI identifies the specific document to retrieve from that server
     • Step 2: Send the request
       • Usually the protocol will be http
       • Sometimes it is https, which requests the data over a secure (encrypted) connection
       • Assume you wanted the document http://web.njit.edu/~txw5999/index.html
         • The browser connects to web.njit.edu and sends:
             GET /~txw5999/index.html HTTP/1.0
       • Note: the request is all the server sees, regardless of whether it came from a robot, a link validator, or a browser
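The URL-parsing step can be sketched with Python's `urllib.parse.urlsplit`, which splits a URL into scheme, server (netloc), path, query, and fragment. The helper name `request_line_for` is hypothetical, and the sketch only builds the request line; it does not open a connection.

```python
from urllib.parse import urlsplit


def request_line_for(url, version="HTTP/1.0"):
    """Return the server to contact and the GET request line to send to it."""
    parts = urlsplit(url)          # scheme, netloc, path, query, fragment
    path = parts.path or "/"       # an empty path means the server's root document
    if parts.query:
        path += "?" + parts.query  # the query string travels with the request-URI
    # The fragment (#...) is handled by the browser and never sent to the server
    return parts.netloc, f"GET {path} {version}"


server, line = request_line_for("http://web.njit.edu/~txw5999/index.html")
print(server)  # web.njit.edu
print(line)    # GET /~txw5999/index.html HTTP/1.0
```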

  6. Server Response
     • Step 3: The server response
       • Upon receiving the request, the web server must identify the document and return it to the user
     • Sample header content returned to the browser, followed by the HTML page:
         HTTP/1.0 200 OK
         Server: Netscape-Communications/1.1
         Date: Tuesday, 25-Nov-97 01:22:04 GMT
         Last-modified: Thursday, 20-Nov-97 10:44:53 GMT
         Content-length: 6372
         Content-type: text/html

         <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
         <HTML>
         ...
     • HTTP/1.0 tells the browser the version of HTTP used
     • 200 OK is the most common response; it is the code the server returns to say all is well (more on this later)
     • Server: Netscape-Communications/1.1 identifies the web server software that returned the document
     • Date: Tuesday, 25-Nov-97 01:22:04 GMT is the date and time at which the server generated the response
     • Last-modified: Thursday, 20-Nov-97 10:44:53 GMT tells when the document was last modified (useful for caching)
     • Content-length: 6372 is the size of the document in bytes
     • Content-type: text/html tells the browser the type of the returned document; it could also be image/gif or something else
     • <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> declares the version of HTML the page uses
     • The browser does not care how the page was produced; it could come from a script or from a static HTML file
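The structure of the sample response (a status line, header lines, a blank line, then the body) can be pulled apart with plain string handling. This is a simplified sketch for illustration: the function name `parse_response_head` is hypothetical, it assumes CRLF line endings as HTTP specifies, and a real client should rely on a library such as `http.client` instead.

```python
def parse_response_head(raw):
    """Split a raw HTTP response into version, status code, reason, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")   # a blank line ends the headers
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")    # split on the first colon only
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    return version, int(code), reason, headers, body


raw = (
    "HTTP/1.0 200 OK\r\n"
    "Server: Netscape-Communications/1.1\r\n"
    "Date: Tuesday, 25-Nov-97 01:22:04 GMT\r\n"
    "Last-modified: Thursday, 20-Nov-97 10:44:53 GMT\r\n"
    "Content-length: 6372\r\n"
    "Content-type: text/html\r\n"
    "\r\n"
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">\r\n<HTML>'
)

version, code, reason, headers, body = parse_response_head(raw)
print(code, reason, headers["content-type"], headers["content-length"])
```

Splitting each header on the first colon only matters here: the Date value itself contains colons.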

  7. The Client Request
     • All requests follow the same basic pattern:
         [METH] [REQUEST-URI] HTTP/[VER]
         [fieldname1]: [field-value1]
         [fieldname2]: [field-value2]

         [request body, if any]
     • METH is the request method
     • REQUEST-URI identifies the document to be retrieved
     • VER is the HTTP version used
     • Header field names and values are covered on the next slide
     • Getting a document
       • A GET request means "send me a document"
       • Assume you wanted the document http://web.njit.edu/~txw5999/index.html
           GET /~txw5999/index.html HTTP/1.0
       • A longer version of a request:
           GET / HTTP/1.0
           User-Agent: Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4)
           Accept: */*
           Host: web.njit.edu:81
     • HEAD works just like GET, except that only the headers are retrieved
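The request pattern above can be sketched as a small function that assembles the message text. The name `build_request` is hypothetical; the sketch assumes CRLF line endings and ends the header section with an empty line, as HTTP requires, even when there is no body.

```python
def build_request(method, request_uri, headers=None, body="", version="1.0"):
    """Assemble METH REQUEST-URI HTTP/VER, the header lines, a blank line, and the body."""
    lines = [f"{method} {request_uri} HTTP/{version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # CRLF after every line, plus one empty line that marks the end of the headers
    return "\r\n".join(lines) + "\r\n\r\n" + body


msg = build_request("GET", "/", {
    "User-Agent": "Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4)",
    "Accept": "*/*",
    "Host": "web.njit.edu:81",
})
print(msg)
```

A HEAD request uses exactly the same shape, for example `build_request("HEAD", "/")`. Only the server's reply differs.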

  8. GET Header Fields
     • Some of the header fields that can be used with GET are:
     • User-Agent
       • Identifies the user agent (the browser or other client)
       • Example: "Mozilla/4.03 [en] (WinNT; I ;Nav)"
     • Referer
       • The referer field (yes, the standard spells it this way)
       • Contains the URL of the page that linked to the requested page
       • Useful for finding out where your visitors are coming from
     • If-Modified-Since
       • If the browser has the document in its cache, it can set this field to the last-modified time of the cached version
       • If the document has changed since then, the server sends the new version; otherwise it replies 304 Not Modified
       • Ensures that the cache is current without downloading unchanged documents again
     • From
       • Contains the email address of the person using the agent
       • A spammer's dream
       • Web robots sometimes use it so that webmasters can contact the robot's operator
     • Authorization
       • Holds the user's credentials (such as a username and password) when authorization is required to access the page
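On the server side, handling If-Modified-Since comes down to comparing two timestamps. The sketch below is hypothetical (the helper `conditional_status` is not from any server). It uses the standard library's `email.utils` functions, which read and write the `Thu, 20 Nov 1997 10:44:53 GMT` date style used by HTTP/1.1, and it ignores dates it cannot parse, in which case the full document is sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since:
        try:
            cached = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the header and send the document
        if cached.tzinfo is None:
            cached = cached.replace(tzinfo=timezone.utc)  # treat naive dates as GMT
        if last_modified <= cached:
            return 304  # Not Modified: the browser may use its cached copy
    return 200  # no header, or the document changed: send it in full


# Last-modified time of the sample document on the earlier slide
last_modified = datetime(1997, 11, 20, 10, 44, 53, tzinfo=timezone.utc)
header = format_datetime(last_modified, usegmt=True)

print(conditional_status(last_modified, header))                           # 304
print(conditional_status(last_modified, None))                             # 200
print(conditional_status(last_modified, "Mon, 17 Nov 1997 08:00:00 GMT"))  # 200
```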

  9. HTTP Status Codes
     • No need to memorize them, just know they exist
     • The numeric codes are standardized, but the accompanying text (the reason phrase) can differ between servers
     • 1xx Informational
       • Request received, continuing process.
       • 100: Continue
       • 101: Switching Protocols
     • 2xx Success
       • The action was successfully received, understood, and accepted.
       • 200: OK
       • 201: Created
       • 202: Accepted
       • 203: Non-Authoritative Information
       • 204: No Content
       • 205: Reset Content
       • 206: Partial Content
       • 207: Multi-Status (WebDAV)

  10. 3xx Status Codes
     • 3xx Redirection
       • The client must take additional action to complete the request.
       • 300: Multiple Choices
       • 301: Moved Permanently
       • 302: Found
       • 303: See Other (since HTTP/1.1)
       • 304: Not Modified
       • 305: Use Proxy (since HTTP/1.1)
       • 306: No longer used, but reserved (was used for "Switch Proxy")
       • 307: Temporary Redirect (since HTTP/1.1)

  11. 4xx Status Codes
     • 4xx Client Error
       • The request contains bad syntax or cannot be fulfilled.
       • 400: Bad Request
       • 401: Unauthorized
       • 402: Payment Required
       • 403: Forbidden
       • 404: Not Found
       • 405: Method Not Allowed
       • 406: Not Acceptable
       • 407: Proxy Authentication Required
       • 408: Request Timeout
       • 409: Conflict
       • 410: Gone
       • 411: Length Required
       • 412: Precondition Failed
       • 413: Request Entity Too Large
       • 414: Request-URI Too Long
       • 415: Unsupported Media Type
       • 416: Requested Range Not Satisfiable
       • 417: Expectation Failed
       • 449: Retry With (Microsoft extension, non-standard)

  12. 5xx Status Codes
     • 5xx Server Error
       • The server failed to fulfill an apparently valid request.
       • 500: Internal Server Error
       • 501: Not Implemented
       • 502: Bad Gateway
       • 503: Service Unavailable
       • 504: Gateway Timeout
       • 505: HTTP Version Not Supported
       • 509: Bandwidth Limit Exceeded (non-standard)
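Because the first digit carries the meaning, a client can handle codes it has never seen by their class. Here is a minimal sketch; the function `describe` is hypothetical. Python's `http.HTTPStatus` enum supplies standard reason phrases, and it has no entry for non-standard codes such as 449.

```python
from http import HTTPStatus

CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}


def describe(code: int) -> str:
    """Describe a status code by its class (first digit) and, if known, its phrase."""
    category = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # not a code the standard library knows about
        phrase = "(no standard phrase)"
    return f"{code} {phrase}: {category}"


for code in (200, 304, 404, 449, 503):
    print(describe(code))
```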

  13. Browser Cache
     • If your browser has already retrieved a page, it usually stores the page in its cache
     • If you return to that page, the browser first checks whether the data for that page is already on your local drive
     • If it finds the page or images there, it loads them from the cache and requests only changed information from the web server
     • The cache usually has a maximum size or a time limit for keeping stored pages
     • Most browsers have a refresh button that forces a reload of the page
     • Caching reduces the number of requests and the server load, and can reduce bandwidth costs substantially

  14. Proxy Cache
     • Browser caches are stored on the local machine, whereas a proxy cache is stored on a proxy server
     • The proxy is essentially a cache shared by many different users
     • The user's browser checks the proxy to see whether a page is already in its cache
       • If the page is found, it is returned to the user's browser cache
       • If the page is not found, the proxy requests it from the web server
       • After getting the new page from the server, the proxy stores it in its cache for anyone else who may request that page, then returns it to the user's local cache
     • A proxy cache reduces network traffic dramatically and substantially reduces the load on the web server
     • It also skews log statistics dramatically, since requests that the proxy cache can fill are never seen by the web server

  15. Proxy Cache Hierarchy
     • You can also have a hierarchy of proxy caches; for example, each department in a company could have its own smaller proxy cache. A page is loaded from the department proxy cache if it can be found there; if not, the request goes to the company cache. If the company cache cannot fulfill the request, the request is sent to the web server. The returned page or item is then passed to the department proxy cache and then to the browser's local cache.
     • This method reduces network traffic even further
     • However, the pages may not be the most current version that the web server would return
     • Cached pages should be removed as they go out of date
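The lookup order through a cache hierarchy can be sketched as a chain of caches, each of which asks its parent on a miss and keeps a copy on the way back down. The class `Cache`, the cache names, and `fetch_from_server` are hypothetical. The sketch also leaves out expiry, so it shows exactly the staleness risk mentioned above.

```python
class Cache:
    """One level of a cache hierarchy (browser, department proxy, company proxy)."""

    def __init__(self, name, parent=None, origin=None):
        self.name = name
        self.store = {}
        self.parent = parent  # next cache up the hierarchy (None at the top)
        self.origin = origin  # function that fetches from the web server (top level only)

    def get(self, url, trace):
        if url in self.store:
            trace.append(f"hit at {self.name}")
            return self.store[url]
        trace.append(f"miss at {self.name}")
        if self.parent is not None:
            page = self.parent.get(url, trace)
        else:
            trace.append("fetch from web server")
            page = self.origin(url)
        self.store[url] = page  # keep a copy for later requests
        return page


origin_calls = []


def fetch_from_server(url):
    origin_calls.append(url)
    return f"<html>{url}</html>"


company = Cache("company proxy", origin=fetch_from_server)
department = Cache("department proxy", parent=company)
browser_a = Cache("browser A", parent=department)
browser_b = Cache("browser B", parent=department)

first, second = [], []
browser_a.get("http://example.com/", first)
browser_b.get("http://example.com/", second)  # a colleague requests the same page
print(first)
print(second)
```

The second user's request stops at the department proxy, so the web server sees only one request. That is also why the server's logs undercount page views.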

  16. Caching Diagram

  17. Cache Replacement Algorithms
     • LRU (Least Recently Used): replaces the least recently used document
     • FBR (Frequency-Based Replacement): takes into account both the recency and the frequency of access to a page
     • LRU/2: replaces the page whose penultimate (second-to-last) access is least recent among all penultimate accesses
     • SLRU (Segmented LRU): combines both the recency and the frequency of access when making a replacement decision
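Of these policies, LRU is the simplest to implement. A common approach keeps documents in access order and evicts from the "oldest" end. In this minimal sketch, the class `LRUCache` is hypothetical and capacity is counted in documents rather than bytes. Real web caches usually limit total size in bytes.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used document when capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.docs = OrderedDict()  # ordered from least to most recently used

    def get(self, url):
        if url not in self.docs:
            return None             # cache miss
        self.docs.move_to_end(url)  # mark as most recently used
        return self.docs[url]

    def put(self, url, doc):
        """Store a document; return the URL that was evicted, if any."""
        if url in self.docs:
            self.docs.move_to_end(url)
        self.docs[url] = doc
        if len(self.docs) > self.capacity:
            evicted, _ = self.docs.popitem(last=False)  # drop the oldest entry
            return evicted
        return None


cache = LRUCache(capacity=2)
cache.put("/a.html", "A")
cache.put("/b.html", "B")
cache.get("/a.html")                  # /a.html is now the most recently used
evicted = cache.put("/c.html", "C")   # over capacity: /b.html goes
print(evicted, list(cache.docs))
```

FBR, LRU/2, and SLRU extend this idea by also tracking how often, or how recently before the last access, a page was used.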

  18. Replacement Algorithms
     • The performance (hit rate) of all the algorithms increases with cache size
     • For very small caches (< 100 MB), FBR and SLRU perform best, followed by LRU/2 and then LRU
     • For mid-size caches, LRU/2, SLRU, and FBR perform similarly, followed by LRU
     • For large caches (> 1 GB), all algorithms perform similarly, very close to the best

  19. Server Side Programming
     • Server-side scripts run on the web server to respond to requests from the client
     • The client has no way to know whether a page was generated by a script or was a static HTML file
     • Used to change the output of a page dynamically based on some type of input
       • Can accept input from cookies, for example to identify the user and check whether they are authorized to download a file
       • Can also accept parameters passed in the URL (the address bar)
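As one concrete example of a server-side program, here is a minimal sketch written as a Python WSGI application. WSGI (PEP 3333) is Python's standard interface between web servers and applications, and it is used here only for illustration. The query parameter `name` and the cookie `userid` are hypothetical. The output depends on both the URL parameters and a cookie, yet the browser simply receives text.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs


def app(environ, start_response):
    """A minimal WSGI application whose output depends on the request."""
    params = parse_qs(environ.get("QUERY_STRING", ""))    # e.g. "name=Todd"
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))  # e.g. "userid=txw5999"
    user = cookie["userid"].value if "userid" in cookie else "guest"
    name = params.get("name", ["world"])[0]
    body = f"Hello, {name}! You are logged in as {user}.".encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]


def run_app(environ):
    """Call the app directly (no server) and return its status and body."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")


print(run_app({"QUERY_STRING": "name=Todd", "HTTP_COOKIE": "userid=txw5999"}))
print(run_app({}))
```

To serve it for real during development, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` would work. The port number there is arbitrary.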

  20. Server Side Programming
     • When to use server side instead of client side
       • Client side is much faster to run, since it does not need to generate a new request to the web server every time something changes
       • Use server side when the data that needs to be accessed is on the web server and not on the client machine
       • Use server side to interact with a database on the web server
       • Server side is best used when interactions with the server are infrequent
       • You need server side when gathering information over time and storing the data on the web server
     • Take Google, for example
       • It would not be sensible to download Google's entire catalog of pages to the client
       • It is better to send the search query to Google's web server and return only the documents that match the query
       • Checking that the user has entered a search query before sending the request to the web server is best done with a client-side script

  21. CGI
     • Stands for Common Gateway Interface
     • A standard way for a web server to pass requests to external programs (scripts) and return their output to the client
     • Used in the same way by almost all web servers in existence
     • The web server needs to differentiate between scripts and ordinary HTML files
       • CGI scripts are usually placed in a designated CGI directory on the server
       • The web server is configured to treat all files in that folder as CGI scripts
       • The default directory is cgi-bin

  22. More About CGI
     • CGI programs are ordinary executable programs, written in any language (compiled, or interpreted as with Perl)
     • The server passes request information to the CGI program in a number of environment variables
       • For example, the query string: the ?variable=value part seen in URLs
     • Example: the developer could require the IP address as a variable so that a hit counter counts only unique visitors
         <img src="http://stats.vendor.com/cgi-bin/counter.pl?ip address">
     • The CGI script returns a text string that identifies the image to be displayed, so in effect the new image source would be:
         <img src="1000hits.jpg">
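The hit-counter idea can be sketched as a CGI-style program. The server puts request details into environment variables such as `QUERY_STRING` and `REMOTE_ADDR`, and the program writes headers, a blank line, and a body to standard output. Several details here are hypothetical: the function `counter_response`, the query parameter `ip`, and returning the image name as plain text (which mirrors the slide's description). Because each CGI request starts the program afresh, the in-memory set would have to be saved to a file or database in a real counter.

```python
import os
from urllib.parse import parse_qs


def counter_response(environ, seen_ips):
    """CGI-style output for a hypothetical unique-visitor counter.

    environ: the CGI environment variables (os.environ under a real server)
    seen_ips: set of addresses already counted
    """
    # QUERY_STRING holds the part of the URL after "?", e.g. "ip=10.0.0.1"
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # Use the ip passed in the URL if present, otherwise the server-supplied REMOTE_ADDR
    ip = params.get("ip", [environ.get("REMOTE_ADDR", "unknown")])[0]
    seen_ips.add(ip)  # a set counts each address only once
    # CGI output: header lines, a blank line, then the body
    return f"Content-Type: text/plain\n\n{len(seen_ips)}hits.jpg"


if __name__ == "__main__":
    # Under a real CGI server this runs once per request, so the seen
    # addresses would need to be loaded from and saved to persistent storage
    print(counter_response(os.environ, set()), end="")
```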

  23. Server Side Programming
     • CGI is one way to develop server-side scripts
       • Slow and inefficient to use, since the server typically starts a new process for every request
     • A better way is to use a server Application Programming Interface (API)
       • The program essentially becomes part of the server process
       • The programming language is server dependent
       • Much faster, since the program is already in memory and the required data can easily be passed in and results obtained
       • Examples include ASP, JavaServer Pages (JSP), and Python

  24. Server Logs
     • Most servers keep a log of all requests and responses generated by the server
     • A sample log is as follows:
         rip.axis.se - - [04/Jan/1998:21:24:46 +0100] "HEAD /ftp/pub/software/ HTTP/1.0" 200 6312 - "Mozilla/4.04 [en] (WinNT; I)"
         tide14.microsoft.com - - [04/Jan/1998:21:30:32 +0100] "GET /robots.txt HTTP/1.0" 304 158 - "Mozilla/4.0 (compatible; MSIE 4.0; MSIECrawler; Windows 95)"
         microsnot.HIP.Berkeley.EDU - - [04/Jan/1998:22:28:21 +0100] "GET /cgi-bin/wwwbrowser.pl HTTP/1.0" 200 1445 "http://www.ifi.uio.no/~larsga/download/stats/" "Mozilla/4.03 [en] (Win95; U)"
         isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /download/RFCsearch.html HTTP/1.0" 200 2399 "http://www.kvarteret.uib.no/~pas/" "Mozilla/4.04 [en] (Win95; I)"
         isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /standard.css HTTP/1.0" 200 1064 - "Mozilla/4.04 [en] (Win95; I)"

  25. Server Logs (cont)
     • The log can be useful for troubleshooting or finding dead links
     • You can also track page views to determine the popularity of a page
     • It is good practice to review the log to see whether your web server is having any problems
     • Caching of web pages can cause problems, since cached pages are viewed but not counted in the server log
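Reviewing a log usually starts with splitting each line into fields. The sample log above resembles Apache's "combined" format: host, identity, user, timestamp, request, status, size, referer, and user agent. The regular expression below is an assumption fitted to those sample lines, not a complete parser for every log format. With the fields separated, counting statuses, page views, or 404s (dead links) is straightforward.

```python
import re
from collections import Counter

LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'(?P<referer>"[^"]*"|-) "(?P<agent>[^"]*)"'
)

SAMPLE = [
    'rip.axis.se - - [04/Jan/1998:21:24:46 +0100] "HEAD /ftp/pub/software/ HTTP/1.0" 200 6312 - "Mozilla/4.04 [en] (WinNT; I)"',
    'tide14.microsoft.com - - [04/Jan/1998:21:30:32 +0100] "GET /robots.txt HTTP/1.0" 304 158 - "Mozilla/4.0 (compatible; MSIE 4.0; MSIECrawler; Windows 95)"',
    'microsnot.HIP.Berkeley.EDU - - [04/Jan/1998:22:28:21 +0100] "GET /cgi-bin/wwwbrowser.pl HTTP/1.0" 200 1445 "http://www.ifi.uio.no/~larsga/download/stats/" "Mozilla/4.03 [en] (Win95; U)"',
    'isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /download/RFCsearch.html HTTP/1.0" 200 2399 "http://www.kvarteret.uib.no/~pas/" "Mozilla/4.04 [en] (Win95; I)"',
    'isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /standard.css HTTP/1.0" 200 1064 - "Mozilla/4.04 [en] (Win95; I)"',
]

# Keep only the lines the pattern understands, as dictionaries of named fields
entries = [m.groupdict() for m in map(LOG_LINE.match, SAMPLE) if m]

statuses = Counter(e["status"] for e in entries)                          # response codes
page_views = Counter(e["path"] for e in entries if e["method"] == "GET")  # popularity
dead_links = [e["path"] for e in entries if e["status"] == "404"]         # broken links

print(statuses)
print(page_views.most_common(3))
print(dead_links)
```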

  26. Cookies
     • HTTP is stateless: each request is handled as an individual request
     • The only ways to carry data from one request to the next are passing parameters or using cookies
     • Say you want to allow a user to log in to your website and stay logged in across several different pages
       • With plain HTTP, you cannot pass the user information between different pages
       • You would need to generate a cookie to store the user ID of the currently logged-in user
     • Cookies store data
     • Cookies have an expiry date (days, weeks, or just the current session)
     • You need a script to generate and read cookie data (HTML alone cannot do this)

  27. Cookies (cont)
     • Useful for keeping track of user data, but there are privacy issues that need to be resolved
     • Keep in mind that users can turn off cookies in their browser, so design the system so that it does not fail if a cookie cannot be read
     • You can check whether the browser supports cookies by trying to write a cookie and then reading it back
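A script sets a cookie by sending a `Set-Cookie` response header, and the browser sends it back on later requests in a `Cookie` header. The sketch below uses Python's standard `http.cookies.SimpleCookie`. The cookie name `userid`, its value, the one-week lifetime, and the helper `logged_in_user` are all hypothetical. The reader returns `None` instead of failing when the cookie is missing, so the page can still work with cookies turned off.

```python
from http.cookies import SimpleCookie

# Server side, first response: build a Set-Cookie header
outgoing = SimpleCookie()
outgoing["userid"] = "txw5999"
outgoing["userid"]["path"] = "/"
outgoing["userid"]["max-age"] = 7 * 24 * 3600  # one week; omit for a session cookie
header_line = outgoing.output()  # the full "Set-Cookie: ..." header line
print(header_line)


def logged_in_user(cookie_header):
    """Return the userid from a Cookie request header, or None if absent."""
    if not cookie_header:
        return None  # cookies disabled or not yet set: fall back gracefully
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("userid")
    return morsel.value if morsel else None


# Server side, later request: the browser sent back "Cookie: userid=txw5999"
print(logged_in_user("userid=txw5999"))
print(logged_in_user(None))
```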

  28. Tips and Hints about HTTP
     • Hiding the source
       • Can you hide the source? No, but you can try to obscure it by putting blank lines at the top
       • You can make the source look messy so the user has a hard time finding a particular part
       • Web crawlers can save the HTML page without ever displaying it in a browser
     • Downloading images
       • You cannot stop the user from downloading images from your site
       • You can watermark images to show where they came from
       • An HTTP request does not care what kind of document it is for; the server will return it anyway
     • Passing parameters between web pages
       • You can't do this if the page is plain HTML
       • You need to design a dynamic web page, using ASP or Java for example, in order to use the parameters
       • Plain HTML pages just ignore the parameters

  29. Tips (cont)
     • Preventing browsers from caching pages
       • Set the expiration date of the content to a date in the past
       • Keep in mind that the advantage of caching is that the browser can fetch the page from the cache without generating a new request to the web server
     • Using a slash at the end of a URL
       • If the URL points to a directory, include the slash
       • If it is not included, the web server must first check for a file with that name and, if no such file exists, then try to find the directory
       • Some web servers can automatically direct you to an index or default file, but only if you include the slash
       • It is good practice to include it
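Both tips come down to response headers. Here is a minimal sketch with two hypothetical helpers. `no_cache_headers` applies the slide's technique of an expiration date in the past (the Unix epoch, formatted with the standard `email.utils.formatdate`), and it also adds the HTTP/1.1 `Cache-Control: no-store` directive, which tells caches not to keep a copy at all. `trailing_slash_redirect` shows the redirect many servers (Apache's mod_dir, for example) send when a directory is requested without its slash.

```python
from email.utils import formatdate


def no_cache_headers():
    """Response headers asking browsers and proxies not to reuse a cached copy."""
    return [
        # Expiration date in the past: the Unix epoch, in HTTP date format
        ("Expires", formatdate(0, usegmt=True)),
        # HTTP/1.1 directive telling caches not to store the response at all
        ("Cache-Control", "no-store"),
    ]


def trailing_slash_redirect(path, is_directory):
    """If a directory is requested without its trailing slash, redirect to the slash form."""
    if is_directory and not path.endswith("/"):
        return 301, [("Location", path + "/")]  # Moved Permanently
    return None  # no redirect needed


print(no_cache_headers())
print(trailing_slash_redirect("/docs", is_directory=True))
```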

  30. Remember
     • HTTP methods map onto CRUD
       • Create → POST (PUT can also create a resource at a URI the client chooses)
       • Read → GET
       • Update → PUT
       • Delete → DELETE
     • Of these, GET and POST are the most important
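A toy in-memory resource store shows how a server might dispatch each method to a CRUD operation. It follows the common REST convention in which POST to a collection creates an item, GET reads it, PUT replaces it (or creates it at a URI the client picks), and DELETE removes it. This is one convention among several. The class `ResourceStore` and the `/notes` URL scheme are hypothetical, and anything the sketch does not recognize simply gets a 404.

```python
import itertools


class ResourceStore:
    """In-memory 'notes' collection mapping HTTP methods to CRUD operations."""

    def __init__(self):
        self.items = {}
        self._ids = itertools.count(1)  # ids handed out by POST

    def handle(self, method, path, body=None):
        # Hypothetical URL scheme: /notes for the collection, /notes/<id> for one item
        parts = path.strip("/").split("/")
        item_id = int(parts[1]) if len(parts) > 1 else None

        if method == "POST" and item_id is None:          # Create
            new_id = next(self._ids)
            self.items[new_id] = body
            return 201, f"/notes/{new_id}"                 # 201 Created, new location
        if method == "GET" and item_id in self.items:      # Read
            return 200, self.items[item_id]
        if method == "PUT" and item_id is not None:        # Update (or create)
            created = item_id not in self.items
            self.items[item_id] = body
            return (201 if created else 200), body
        if method == "DELETE" and item_id in self.items:   # Delete
            del self.items[item_id]
            return 204, None                               # 204 No Content
        return 404, None                                   # everything else, in this sketch


store = ResourceStore()
print(store.handle("POST", "/notes", "buy milk"))
print(store.handle("GET", "/notes/1"))
print(store.handle("PUT", "/notes/1", "buy bread"))
print(store.handle("DELETE", "/notes/1"))
print(store.handle("GET", "/notes/1"))
```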

  31. For Next Week
     • Read Zeldman Chapter 14
     • HTTP web log reading
     • Next week: Web Accessibility (making the web accessible to everyone, including people with disabilities)
