Unlocking Content Negotiation in HTTP: Language, Charset, and Encoding Management
This guide explores the intricacies of HTTP content negotiation related to languages, charsets, and encoding. It explains how to identify the right filesystem directory from the URI and extract the "filestem" for negotiation. You will learn how to leverage the Accept headers to prioritize candidates for language and charset, ultimately enabling efficient and accurate content delivery. We also discuss response strategies (200, 300, 404) in negotiation scenarios and highlight important headers like Content-Language, Content-Encoding, and Vary.
The Elbert HTTP Server
Adding languages, transfer encoding, charsets; in other words: Content-Negotiation
By: Shawn M. Jones
Dissecting Filenames
<filename>.extension.language.charset.encoding
• Extension: e.g. txt, pdf, png
• Language: e.g. en, de, ja, ko
• Charset: e.g. jis, koi8-r, euc-kr
• Encoding: e.g. Z, gz
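The naming scheme above can be sketched in Python as follows. This is only an illustration, not Elbert's actual code: the function name `dissect` and the three lookup sets are hypothetical, filled in from the examples on the slide, and the sketch assumes the filestem itself contains no dots.

```python
# Hypothetical lookup sets, taken from the examples on the slide.
LANGUAGES = {"en", "de", "ja", "ko"}
CHARSETS = {"jis", "koi8-r", "euc-kr"}
ENCODINGS = {"Z", "gz"}  # case matters: "Z" is compress, "gz" is gzip


def dissect(filename):
    """Split 'stem.extension[.language][.charset][.encoding]' into a dict."""
    stem, *suffixes = filename.split(".")
    parts = {"stem": stem, "extension": None, "language": None,
             "charset": None, "encoding": None}
    for suffix in suffixes:
        if suffix in ENCODINGS:
            parts["encoding"] = suffix
        elif suffix.lower() in CHARSETS:
            parts["charset"] = suffix.lower()
        elif suffix.lower() in LANGUAGES:
            parts["language"] = suffix.lower()
        elif parts["extension"] is None:
            # First suffix that is not a known language/charset/encoding.
            parts["extension"] = suffix
        # Any further unrecognised suffix is ignored in this sketch.
    return parts


print(dissect("index.html.en"))
```

Classifying each suffix by membership rather than by position lets the same function handle files that omit the language, charset, or encoding.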
Making the choice for negotiation
• Get the right real filesystem directory from the URI and document home
• Get the “filestem” from the end of the URI (e.g. index.html.en has a filestem of “index”)
• Search that directory for a list of candidates for negotiation; if none are found, return a 404
• Incorporate all of the q values from the Accept* headers into a dictionary of values for charset, language, etc.
• Loop through the list of candidates
  • If the candidate matches a value from an Accept* header, add that q value to its score
  • If the q value is 0, the candidate is removed from the list
• Determine the highest score among all candidates
• Eliminate candidates that have a score less than the highest
• If there is 1 candidate left, send a 200 with appropriate CN headers
• If there is >1 candidate left, send a 300
• If there are no candidates left, send a 406
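The scoring loop described above might look roughly like the Python sketch below. The names `parse_q` and `negotiate`, the candidate dictionaries, and the attribute keys are all hypothetical. Two assumptions are worth flagging: a listed value with no explicit q parameter is treated as q=1.0, and the sketch returns 406 Not Acceptable when every candidate has been eliminated, on the assumption that this is the intended status for "no suitable representation".

```python
def parse_q(header):
    """Turn 'en;q=0.8, de;q=0.2' into {'en': 0.8, 'de': 0.2}."""
    values = {}
    for item in header.split(","):
        name, *params = [p.strip() for p in item.split(";")]
        if not name:
            continue
        q = 1.0  # assumption: no q parameter means q=1.0
        for param in params:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        values[name.lower()] = q
    return values


def negotiate(candidates, accept_headers):
    """candidates: dicts with a 'name' plus attributes such as 'language'.
    accept_headers: attribute name -> raw Accept* header value.
    Returns (status code, names of the surviving candidates)."""
    if not candidates:
        return 404, []
    prefs = {attr: parse_q(raw) for attr, raw in accept_headers.items()}
    scored = []
    for cand in candidates:
        score, rejected = 0.0, False
        for attr, qvalues in prefs.items():
            value = cand.get(attr)
            if value is None or value.lower() not in qvalues:
                continue  # no match: contributes nothing to the score
            q = qvalues[value.lower()]
            if q == 0:
                rejected = True  # q=0 removes the candidate outright
                break
            score += q
        if not rejected:
            scored.append((score, cand["name"]))
    if not scored:
        return 406, []
    best = max(score for score, _ in scored)
    winners = [name for score, name in scored if score == best]
    return (200 if len(winners) == 1 else 300), winners


variants = [{"name": "index.html.en", "language": "en"},
            {"name": "index.html.de", "language": "de"}]
print(negotiate(variants, {"language": "en;q=0.8, de;q=0.2"}))
```

With no Accept* headers at all, every candidate scores 0, so two or more candidates tie and the sketch returns a 300.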
Completed Work
• 206 Partial Content
• Testing 304 (was done already)
• New 301 redirections, removed old 302 redirections
• Request Headers:
  • Range
  • User-Agent
  • Referer
  • Negotiate: 1.0
  • Accept (needs a few more tests)
• Vary (use only if content negotiation performed)
• Combined Log Format
Completed Work
• Response Headers:
  • Content-Language
    • en, es, de, ja, ko, ru
  • Content-Encoding
    • Compress, Gzip
  • Transfer-Encoding: chunked (every 2 lines for dynamically generated pages)
  • Accept-Ranges
  • Content-Range
  • Content-Type: add non-ASCII charsets
  • Alternatives (for 300)
• 300 in response to Negotiate: 1.0 and Accept request header
• No default q values
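As an illustration of "Transfer-Encoding: chunked (every 2 lines)", here is a small Python sketch that frames a dynamically generated page as HTTP/1.1 chunks, two lines per chunk. The function name `chunk_lines` and its parameters are hypothetical, and UTF-8 output is an assumption; the server's real implementation is not shown in the slides.

```python
def chunk_lines(lines, lines_per_chunk=2):
    """Yield HTTP/1.1 chunked-encoding frames, one per group of lines."""
    for start in range(0, len(lines), lines_per_chunk):
        data = "".join(lines[start:start + lines_per_chunk]).encode("utf-8")
        if data:  # a zero-length chunk would end the body prematurely
            # Chunk size in hex, CRLF, the data itself, CRLF.
            yield b"%x\r\n%s\r\n" % (len(data), data)
    yield b"0\r\n\r\n"  # last-chunk, followed by an empty trailer


page = ["<html>\n", "<body>\n", "hello\n"]
for frame in chunk_lines(page):
    print(frame)
```

The chunk size counts bytes after encoding, not characters, which matters once non-ASCII charsets are in play.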
Work Remaining
• 300 Multiple Choices
  • Need to tie in additional Accept headers
• 406 Not Acceptable
  • If no representations are suitable
  • Entity with list of closest options
• Request Headers:
  • Accept-Charset
  • Accept-Encoding
  • Accept-Language
• Response Headers:
  • Content-Location
  • Alternatives (for 406)
  • TCN
• Structured ETags
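For the Alternatives header, a variant list could be serialised along the lines of the sketch below, which follows the variant-list syntax of RFC 2295 as I understand it. The function name `alternatives_header` and the shape of the input dictionaries are hypothetical, and only a few of the possible variant attributes are handled.

```python
def alternatives_header(variants):
    """variants: dicts with 'uri', 'quality', and optional attributes."""
    described = []
    for v in variants:
        # Each attribute is rendered as a braced "{name value}" pair.
        attrs = "".join(
            " {%s %s}" % (key, v[key])
            for key in ("type", "charset", "language", "length")
            if key in v
        )
        quality = str(round(float(v["quality"]), 3))  # at most 3 decimals
        described.append('{"%s" %s%s}' % (v["uri"], quality, attrs))
    return "Alternatives: " + ", ".join(described)


print(alternatives_header([
    {"uri": "fairlane.txt", "quality": 1.0,
     "type": "text/plain", "length": 193},
]))
```

The same list could also serve as the basis for the entity body of a 300 or 406 response, so that a client without transparent negotiation support still sees the options.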
Questions for Dr. Nelson
• Status 416 is not in the assignment. Should we implement it for bad ranges? Right now I give a 500, which doesn’t seem right.
• Because there are no “default q values”, the following shouldn’t happen, right? It should return a 300?
Accept
dhcp65-74-196-93:~/Desktop/cs595-s06 mln$ telnet www.cs.odu.edu 80
Trying 128.82.4.2...
Connected to xenon.cs.odu.edu.
Escape character is '^]'.
HEAD /~mln/teaching/cs595-s06/a3-test/fairlane HTTP/1.1
Host: www.cs.odu.edu
Connection: close

HTTP/1.1 200 OK
Date: Mon, 13 Mar 2006 04:04:22 GMT
Server: Apache/1.3.26 (Unix) ApacheJServ/1.1.2 PHP/4.3.4
Content-Location: fairlane.txt
Vary: negotiate,accept
TCN: choice
Last-Modified: Mon, 13 Mar 2006 04:00:53 GMT
ETag: "2288-c1-4414ee75;4414ee7a"
Accept-Ranges: bytes
Content-Length: 193
Connection: close
Content-Type: text/plain

Connection closed by foreign host.