Electronic Mail (SMTP, POP, IMAP, MIME)

We will work through the handout from Tanenbaum’s book “Computer Networks.”

Internet E-mail standards were published in two parts in 1982:
■ RFC 822: STANDARD FOR THE FORMAT OF ARPA INTERNET TEXT MESSAGES, by David H. Crocker
■ RFC 821: SIMPLE MAIL TRANSFER PROTOCOL, by Jonathan B. Postel
(Updated as RFC 2822 and RFC 2821 in April 2001.)

Overview: The message is constructed under RFC 822, then passed to SMTP (RFC 821) for transmission.

7.4.3 Message Formats
RFC 822 messages consist of lines of ASCII text, each ending with <CR><LF> and at most 1000 characters long.
Messages are divided into three sections:
■ header fields
■ a blank line (a line with nothing except <CR><LF>)
■ optionally, the message body.

Headers
■ contain readable text (ASCII, no control characters)
■ are divided into lines
■ each line has the form <keyword>: <value>
The keywords To and From are required; the others are optional.

Some other RFC 822 header fields not involved in transport:

RFC 822 states that the message can consist only of ASCII text and SMTP (RFC 821) expects this. ASCII is a 7-bit code, which is transmitted right-adjusted in an 8-bit byte, leaving binary 0 in the high-order position.

MIME – Multipurpose Internet Mail Extensions (RFC 1521, 1993)
In the body of the message we would like to be able to include items such as:
■ messages in languages with accents
■ messages in non-Latin alphabets (Arabic, Russian, Hebrew)
■ messages in languages without alphabets (Chinese and Japanese)
■ messages not containing any kind of text (image, audio and video)
Such material may contain arbitrary sequences of binary digits, so there is no reason for the high-order bit of every byte to be zero. To send non-ASCII information (an arbitrary binary string) we must “disguise” it as ASCII.

Questions:
■ how does the sender disguise the binary string as ASCII?
■ when the recipient receives the “ASCII”, how does she retrieve the binary string?
■ when the recipient retrieves the binary string, how does she know what it is?

First question:
■ how do we disguise the binary string as ASCII?

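[Figure: the characters U, A, B written out as a bit string (beginning 010101 01...) and re-encoded as printable characters (beginning V).]
In this example, disguise is not necessary, since ‘UAB’ is already ASCII text!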

Second question:
■ when the recipient receives the “ASCII”, how does she retrieve the binary string?
The receiver sees the Content-Transfer-Encoding header, and so knows how to reverse the encoding to retrieve the original binary string.
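As a small illustration of the first two questions, the sketch below uses base64, one of the content transfer encodings defined by MIME, on the three bytes of ‘UAB’. The choice of base64 here is an assumption based on the bit grouping in the slide; the variable names are illustrative only.

```python
import base64

raw = b"UAB"                          # three 8-bit bytes = 24 bits

# Sender: regroup the 24 bits into four 6-bit units, each sent as one
# printable ASCII character.
encoded = base64.b64encode(raw)

# Receiver: the Content-Transfer-Encoding header says "base64", so the
# receiver knows to apply the reverse mapping.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == raw)

# The same works for bytes whose high-order bit is set.
binary = bytes([0xFF, 0x00, 0x80])
print(base64.b64decode(base64.b64encode(binary)) == binary)
```

The first six bits of ‘U’ (01010101) are 010101, which is 21, and the 22nd character of the base64 alphabet is ‘V’, matching the slide.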

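Third question:
■ when the recipient retrieves the binary string, how does she know what it is?
The MIME Content-Type header identifies the kind of data (for example text/plain or image/jpeg).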

[Figure: an example message annotated with its parts: RFC 822 headers, required blank line, section boundary, body.]
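The layout of such a message (headers, blank line, then body sections separated by a boundary) can be reproduced with Python's standard email package. This is only a sketch: the addresses, subject, file name and the fake image bytes are all made up for illustration.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "alice@abc.com"          # hypothetical addresses
msg["To"] = "bob@xyz.com"
msg["Subject"] = "Picture attached"
msg.set_content("The picture is attached.\n")     # a text/plain section

# Arbitrary non-ASCII bytes standing in for an image file.
fake_png = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF, 0x00])
msg.add_attachment(fake_png, maintype="image", subtype="png",
                   filename="picture.png")

text_part, image_part = msg.iter_parts()

# Content-Type answers "what is it?";
# Content-Transfer-Encoding answers "how do I undo the disguise?"
print(msg.get_content_type())
print(image_part.get_content_type())
print(image_part["Content-Transfer-Encoding"])
print(image_part.get_content() == fake_png)

# The full text that would be handed to SMTP: headers, blank line, sections.
print(msg.as_string())
```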

Overview: This message has been constructed under RFC 822, and will be passed to SMTP (RFC 821) for transmission.

7.4.4 Message Transfer
This is RFC 821, “Simple Mail Transfer Protocol.”
SMTP is a simple ASCII protocol, running on top of TCP.
First, the client establishes a TCP connection to port 25 of the server (this would have involved a preliminary access to the DNS system to discover a type MX resource record for the destination domain).
We will illustrate the client/server exchange by considering transmission of the message in Figure 7-46.

[Figure: the TCP connection from the client at abc.com to port 25 on the Mail Exchanger for xyz.com is already established. The figure is annotated with: RFC 821 (SMTP) dialog; RFC 822 message; end marker added by SMTP client.]
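The client's half of such a dialog can be sketched as a list of lines. The mailbox names below are hypothetical, the reply codes in the comments are the usual ones a server returns at each step, and the sketch assumes that no message line begins with a period.

```python
def client_side_of_dialog(client_domain, sender, recipient, message_lines):
    """Return, in order, the lines an SMTP client sends for one message."""
    lines = [
        f"HELO {client_domain}",     # server sent 220 on connect; replies 250
        f"MAIL FROM:<{sender}>",     # server replies 250
        f"RCPT TO:<{recipient}>",    # server replies 250
        "DATA",                      # server replies 354 (send the message)
    ]
    lines += message_lines           # the RFC 822 message itself
    lines.append(".")                # end marker added by the SMTP client
    lines.append("QUIT")             # server replies 221 and closes
    return [line + "\r\n" for line in lines]


message = [
    "From: alice@abc.com",           # hypothetical addresses
    "To: bob@xyz.com",
    "Subject: Hello",
    "",                              # required blank line
    "Body text.",
]
for line in client_side_of_dialog("abc.com", "alice@abc.com",
                                  "bob@xyz.com", message):
    print(repr(line))
```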

What if the RFC 822 message itself has a period alone in the first position? Will the SMTP server see this and terminate the message prematurely?

The e-mail message as seen on the user's screen:

    Subject: Test II
    From: Anthony Barnard <barnard@earthlink.net>
    Date: Fri, 20 Jul 2007 11:59:23 -0500
    To: "Anthony (work) Barnard" barnard@cis.uab.edu

    The following two lines have a period in the first position:
    .
    .
    The following two lines have periods in the first two positions:
    ..
    ..
    end test

Wireshark trace of sending message:

    Frame 22 (588 bytes on wire, 588 bytes captured)
    Internet Protocol, Src: 192.168.2.99, Dst: 207.69.189.206
    Transmission Control Protocol, Src Port: 3693 (3693), Dst Port: smtp (25)
    Simple Mail Transfer Protocol
    Message: Message-ID: <46A0E9EB.8030105@earthlink.net>\r\n
    Message: Date: Fri, 20 Jul 2007 11:59:23 -0500\r\n
    Message: From: Anthony Barnard <barnard@earthlink.net>\r\n
    Message: User-Agent: Thunderbird 1.5.0.12 (Windows/20070509)\r\n
    Message: MIME-Version: 1.0\r\n
    Message: To: "Anthony (work) Barnard" <barnard@cis.uab.edu>\r\n
    Message: Subject: Test II\r\n
    Message: Content-Type: text/plain; charset=ISO-8859-1; format=flowed\r\n
    Message: Content-Transfer-Encoding: 7bit\r\n
    Message: \r\n    [the blank line]
    Message: The following two lines have a period in the first position:\r\n
    Message: ..\r\n    [Extra period “stuffed” in by SMTP client]
    Message: ..\r\n
    Message: The following two lines have periods in the first two positions:\r\n
    Message: ...\r\n    [Extra period “stuffed” in by SMTP client]
    Message: ...\r\n
    Message: end test\r\n
    Message: .\r\n    [the end-of-message marker appended by SMTP client]
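The trace answers the question: the SMTP client adds one extra period to every line that begins with a period, so only the marker it appends itself is a lone period. The receiving side removes the extra period again. A minimal sketch of both directions follows; the function names are illustrative and line endings are left out for brevity.

```python
def dot_stuff(lines):
    """Sender: add one extra period to each line that begins with a period."""
    return ["." + line if line.startswith(".") else line for line in lines]


def receive_data(lines):
    """Receiver: stop at the lone-period marker and strip stuffed periods."""
    message = []
    for line in lines:
        if line == ".":              # end-of-message marker
            break
        message.append(line[1:] if line.startswith(".") else line)
    return message


body = [
    "The following two lines have a period in the first position:",
    ".",
    ".",
    "The following two lines have periods in the first two positions:",
    "..",
    "..",
    "end test",
]
on_the_wire = dot_stuff(body) + ["."]    # client appends the end marker
print(on_the_wire)
print(receive_data(on_the_wire) == body)
```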

Introduction to the World Wide Web
Since we are coming off a study of E-mail, it may be helpful to note the influence that it had on the WWW protocols. Both separate the specification of the message from its transmission.
► RFC 822/MIME govern the format of E-mail messages; HTML governs the format of WWW pages.
► RFC 821/SMTP and RFC 1939/POP3 govern the transmission of E-mail messages; HTTP governs the transmission of WWW pages.
However, the correspondence is only loose: HTML looks very different from RFC 822/MIME, whereas HTTP draws from both RFC 822/MIME and RFC 821/SMTP.
Like SMTP and POP3, HTTP is an “ASCII protocol” that can be easily read and understood by humans.

An HTML document! We will revisit this!

Chapter 27 – World Wide Web Skim sections 27.1 – 27.5

27.6 Hypertext Transfer Protocol (HTTP)
► Application Level
► Request/Response
► Stateless
► Bi-directional Transfer
► Capability Negotiation
► Support for Caching
► Support for Intermediaries (proxies)

27.7 HTTP GET Request
Using Comer’s example http://www.cs.purdue.edu/people/comer/: once the TCP connection to the HTTP server www.cs.purdue.edu has been made, the browser sends the command

    GET /people/comer/ HTTP/1.1

followed by the required request header (see later)

    Host: www.cs.purdue.edu

27.8 Error Messages
Not much to say!
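Written out as bytes, the request for Comer's example is just these two lines plus a blank line. The helper name below is illustrative; sending the bytes over a TCP connection to port 80 is left out so that the sketch needs no network access.

```python
def build_get_request(host, path):
    """Build the bytes of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"   # request line
        f"Host: {host}\r\n"          # required request header
        "\r\n"                       # blank line ends the header section
    ).encode("ascii")


request = build_get_request("www.cs.purdue.edu", "/people/comer/")
print(request)
```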

27.9 Persistent Connections
HTTP/1.0 followed the FTP paradigm, using one TCP connection per data transfer: create the data connection, transfer one file, close the data connection.
The default in HTTP/1.1 is a persistent connection.
► Advantages: reduced overhead; pipelining
► Disadvantage: need to identify the beginning and end of each item. We can’t reserve a bit pattern as a “sentinel”, so we have to use the Content-Length response header.
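To see why the length header is enough, the sketch below separates two responses that arrive back to back on one connection. It assumes the whole byte stream is already in memory and that every response carries a Content-Length header; the function name and sample data are illustrative.

```python
def split_responses(stream):
    """Split concatenated HTTP responses using Content-Length."""
    responses = []
    pos = 0
    while pos < len(stream):
        head_end = stream.index(b"\r\n\r\n", pos)     # blank line after headers
        head_lines = stream[pos:head_end].decode("ascii").split("\r\n")
        length = 0
        for line in head_lines[1:]:                   # skip the status line
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())
        body_start = head_end + 4
        body = stream[body_start:body_start + length]
        responses.append((head_lines[0], body))
        pos = body_start + length                     # next response starts here
    return responses


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
    b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nabc"
)
print(split_responses(stream))
```

No byte value is reserved as a terminator, so the bodies may contain any data at all.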

27.10 Data Length and Program Output
It may not be convenient, or even possible, for the server to know the length of an item before sending it. In this case we cannot use a persistent connection: the HTTP server reverts to closing the connection after sending a single file (as in HTTP/1.0). The server tells the client about this by sending a Connection: close header (HTTP headers are covered in the next section).

27.11 Length Encoding and Headers
After the first line of a request or response:
“... HTTP borrows the basic format from e-mail, using the 822 format and MIME extensions. Like a standard 822 message, each HTTP transmission contains a header, a blank line, and the item being sent. Furthermore each line in the header contains a keyword, a colon, and information.”
Some headers: Figure 27.1
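Because the header section uses the e-mail format, an e-mail header parser can read it once the first line has been set aside. The sketch below uses Python's standard email.parser on a few sample response headers; the variable names are illustrative.

```python
from email.parser import HeaderParser

raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 08 Oct 2004 17:30:54 GMT\r\n"
    "Content-Length: 3817\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)

# The first line is HTTP-specific; everything after it is keyword: value.
first_line, _, header_block = raw.partition("\r\n")
headers = HeaderParser().parsestr(header_block)

print(first_line)
print(headers["Content-Length"])      # lookup is case-insensitive
print(headers.get_content_type())
```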

Wireshark example (request):
Key in http://www.cis.uab.edu/barnard/old_home.html

    Hypertext Transfer Protocol
    GET /barnard/old_home.html HTTP/1.1\r\n    [request line]
    Host: www.cis.uab.edu\r\n    [required request header]
    User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.3) Gecko/20040922\r\n
    Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\n
    Keep-Alive: 300\r\n
    Connection: keep-alive\r\n
    \r\n    [required blank line]
    [message body empty]

Wireshark example (reply to request in previous slide):

    Hypertext Transfer Protocol
    HTTP/1.1 200 OK\r\n    [status line, response code 200]
    Date: Fri, 08 Oct 2004 17:30:54 GMT\r\n
    Server: Apache/1.3.29 (Unix) PHP/4.3.5RC3\r\n
    Last-Modified: Mon, 08 Mar 2004 23:52:12 GMT\r\n
    ETag: "770077-ee9-404d072c"\r\n
    Accept-Ranges: bytes\r\n
    Content-Length: 3817\r\n
    Keep-Alive: timeout=15, max=100\r\n
    Connection: Keep-Alive\r\n
    Content-Type: text/html\r\n
    \r\n    [required blank line]
    Line-based text data: text/html    [message body: 3817 bytes of data, /barnard/old_home.html]

Omit rest of Chapter 27 – World Wide Web
Comer’s presentation is inadequate for our purpose, so we will again use parts of Tanenbaum’s presentation (handout).
Preview: To make the WWW usable for E-commerce, 4 key developments were needed:
1. Cookies
2. Forms
3. Three-tier system
4. Security

7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
Statelessness and Cookies
A WWW server is stateless. Like a packet filter, it does not remember anything. But for applications like E-commerce we need state! In 1994 Netscape invented a “fix” to HTTP: “cookies”.

Statelessness and Cookies – continued
Along with a WWW page, the server sends a cookie to the client. On later accesses to the server, the client returns the cookie. This identifies the client and provides continuity from visit to visit.
A cookie is a small file that the client stores on its hard disk (the terminology is that the server sets the cookie).
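The round trip can be sketched with Python's standard http.cookies module. The cookie name and value below are made up for illustration, and the "jar" is just an in-memory object standing in for the file the client keeps on disk.

```python
from http.cookies import SimpleCookie

# Server side: set a cookie alongside the page it returns.
server_cookie = SimpleCookie()
server_cookie["CustomerID"] = "497793521"       # hypothetical name and value
set_cookie_header = server_cookie.output()      # a Set-Cookie response header

# Client side: store what the server set ...
jar = SimpleCookie()
jar.load(set_cookie_header.split(": ", 1)[1])

# ... and return it on a later request to the same server.
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in jar.items()
)

print(set_cookie_header)
print(cookie_header)
```

The server itself remains stateless; the state travels with each request.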

Statelessness and Cookies – continued
Cookies have been set by:
► Tom’s Casino, to identify this client.
► Joe’s Store, to record which items are currently in the shopping cart.
► A WWW portal, to record the client’s news interests.
► Sneaky.com, to track the user’s WWW browsing.
We’ll take a closer look at cookies later.

7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
    Statelessness and Cookies
******** content *********
7.3.2 Static Web Documents
    HTML – The Hypertext Markup Language
    Forms

7.3.2 Static Web Documents
WWW pages are written in Hypertext Markup Language (HTML).
Formatting commands are called tags, e.g.
    <h2> this is a second-level headline </h2>
states that the text between the tags should be displayed at level-2 size.
I will assume that you are familiar with basic HTML.

Forms
HTML 1.0 was basically one-way; HTML 2.0 introduced forms, which can be completed by the client and returned to the server. This was a key step in making E-commerce possible. (Latest is HTML 5.0)

Forms - continued
Upper part of Figure 7-29(a)
In these examples the input tag has no type parameter, so the default, “text”, applies: the user keys in information.
In the first example, the system will assign the keyed-in string to the variable “customer”.

Forms - continued
[Figure 7-29(b): the form as displayed, filled in with: Anthony Barnard; 3037 Westmoreland Drive; Mountain Brook; AL; USA; 123456789; 07/20.]

Forms - continued
Figure 7-29(a)
The input tag has the parameter type with value radio, like car radio buttons: select exactly one of the alternatives.
If VISA is clicked, the value visacard will be assigned to the variable cc.

Input type checkbox – optional – can check or ignore Input type submit – click when ready to upload data to WWW server

When the Submit order button is clicked, the system first assembles the input information into a string:

    customer=Anthony+Barnard&address=3037+Westmoreland+Drive&city=Mountain+Brook&state=AL&country=USA&cardno=123456789&expires=7/20&cc=visacard&product=expensive&express=on
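The same string can be produced with Python's standard urllib.parse. This is a sketch rather than what a browser literally runs: passing safe="/" is a choice made here so that the slash in the expiry date stays unescaped, as it does in the string above.

```python
from urllib.parse import urlencode, parse_qs

form = {
    "customer": "Anthony Barnard",
    "address": "3037 Westmoreland Drive",
    "city": "Mountain Brook",
    "state": "AL",
    "country": "USA",
    "cardno": "123456789",
    "expires": "7/20",
    "cc": "visacard",          # value of the selected radio button
    "product": "expensive",
    "express": "on",           # the checkbox was ticked
}

# name=value pairs joined by "&"; spaces become "+".
encoded = urlencode(form, safe="/")
print(encoded)

# The server-side script reverses the process.
decoded = parse_qs(encoded)
print(decoded["customer"][0])
```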

Every form needs at least one submit button! The ACTION and method parameters specify what should happen next after the submit order button is clicked.

What happens when the submit order button is clicked?
1. Make a TCP connection to widget.com, port 80
2. Use HTTP to POST the string to the script widgetorder in directory cgi-bin
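The second step can be sketched as the bytes the browser would send. The path /cgi-bin/widgetorder is an assumption derived from the script and directory names above, the form string is shortened, and the helper name is illustrative. Opening the TCP connection is omitted so the sketch runs without a network.

```python
def build_post_request(host, path, form_string):
    """Build the bytes of an HTTP/1.1 POST carrying an encoded form."""
    body = form_string.encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"   # tells the server where the body ends
        "\r\n"
    )
    return head.encode("ascii") + body


form_string = "customer=Anthony+Barnard&cc=visacard"    # shortened for the example
request = build_post_request("widget.com", "/cgi-bin/widgetorder", form_string)
print(request.decode("ascii"))
```

Unlike a GET, the request has a non-empty message body, which is why the length header is needed.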

7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
    Statelessness and Cookies
******** content *********
7.3.2 Static Web Documents
    HTML – The Hypertext Markup Language
    Forms
7.3.3 Dynamic Web Documents
    Server-Side Dynamic Page Generation

7.3.3 Dynamic Web Documents
Not all WWW pages can be prepared in advance.
Server-Side Dynamic Web Page Generation
Example of the need for a server to build a page dynamically: you have several items in your shopping cart and have clicked on the PROCEED TO CHECKOUT button. The server needs to build a page showing your purchases, for your confirmation.

7.3.3 Dynamic Web Documents – continued
Recall the ACTION parameter in Figure 7-29(a): “3-tier system”.
Common Gateway Interface (CGI): a standard interface that allows WWW servers to talk to back-end servers.
Scripts are usually stored in the directory cgi-bin.
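Under CGI the WWW server passes request details to the script in environment variables and the POSTed form on standard input, and the script writes headers, a blank line and a page to standard output. The sketch below models that as a plain function so it can be tried without a server; the function name, page text and field defaults are illustrative.

```python
import html
import io
from urllib.parse import parse_qs


def run_cgi(environ, stdin):
    """Read a POSTed form the CGI way and return the script's output."""
    length = int(environ.get("CONTENT_LENGTH") or 0)   # set by the WWW server
    form = parse_qs(stdin.read(length))                # decode name=value pairs
    customer = form.get("customer", ["unknown"])[0]
    product = form.get("product", ["nothing"])[0]
    page = (
        "<html><body><p>Order for "
        f"{html.escape(customer)}: {html.escape(product)}"
        "</p></body></html>"
    )
    # Headers, blank line, then the dynamically built page.
    return "Content-Type: text/html\r\n\r\n" + page


posted = "customer=Anthony+Barnard&product=expensive"
environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(posted))}
print(run_cgi(environ, io.StringIO(posted)))
```

A real script would read os.environ and sys.stdin instead of the arguments used here.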

7.3 THE WORLD WIDE WEB
7.3.1 Architectural Overview
    Statelessness and Cookies
******** content *********
7.3.2 Static Web Documents
    HTML – The Hypertext Markup Language
    Forms
7.3.3 Dynamic Web Documents
    Server-Side Dynamic Page Generation
******** transmission across internet *******
7.3.4 HTTP – The HyperText Transfer Protocol
    Connections
    Methods
    Message Headers
    Example HTTP Usage

7.3.4 HTTP – The Hypertext Transfer Protocol Each interaction consists of one ASCII request, followed by one RFC 822 MIME-like response.

7.3.4 HTTP – The Hypertext Transfer Protocol
Connections (recall from Comer section 27.9)
“In HTTP 1.0 after the connection was established, a single request was sent over and a single response was sent back. Then the TCP connection was released.”
The HTTP 1.1 default is persistent connections: the client can send numerous requests and get numerous responses over the same TCP connection.

7.3.4 HTTP – The Hypertext Transfer Protocol – continued
Requests
“Each request consists of one or more lines of ASCII text, with the first word on the first line being the name of the method requested.”
Example:
    GET filename HTTP/1.1

7.3.4 HTTP – The Hypertext Transfer Protocol – continued
Responses
“Every request gets a response, consisting of a status line and possibly additional information (e.g. all or part of a WWW page).”
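A status line has three space-separated parts: the protocol version, a three-digit code, and a human-readable reason. A minimal sketch of taking one apart follows; the function name is illustrative.

```python
def parse_status_line(line):
    """Split an HTTP status line into (version, code, reason)."""
    # Split at the first two spaces only: the reason may itself contain spaces.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


print(parse_status_line("HTTP/1.1 200 OK\r\n"))
print(parse_status_line("HTTP/1.1 404 Not Found\r\n"))
```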
