TCP/IP and Other Transports for High Bandwidth Applications: Back to Basics
Richard Hughes-Jones, The University of Manchester
www.hep.man.ac.uk/~rich/ then “Talks” then look for “Brasov”
Summer School, Brasov, Romania, July 2005, R. Hughes-Jones Manchester
Structure of the Talks
The aim is to give you a picture of how researchers are using high performance networks to support their work.
• Back to Basics
  • Simple Introduction to Networking
• TCP/IP on High Bandwidth Long Distance Networks
  • But TCP/IP works!
  • The effect of packet loss
  • Advanced TCP Stacks
  • Fairness
• Real Applications on Real Networks
  • Disk-2-disk applications on real networks
  • Memory-2-memory tests
  • Transatlantic disk-2-disk at Gigabit speeds
  • Remote Computing Farms
  • The effect of distance
  • Radio Astronomy e-VLBI
Thanks for allowing me to use their slides to: Sylvain Ravot (CERN), Les Cottrell (SLAC), Brian Tierney (LBL), Robin Tasker (DL)
Simple Introduction to Networking
What is a Protocol Stack?
• The ISO OSI (Open Systems Interconnection) Seven Layer Model defines a framework allowing development of real network protocols
• A layer…
  • performs unique and specific tasks
  • only has knowledge of those layers immediately above and below
  • uses services of the layer below, and provides services to the layer above
  • the services defined by a layer are implementation independent: it’s a definition of how things work
  • conceptually communicates with its peer in the remote system
The Layering Principle
Each layer, what it does, and the data unit it handles (PH, SH, TH, NH, DH are the presentation, session, transport, network and data link headers):
• Layer 7, Application: user processes. Data unit: App data
• Layer 6, Presentation: data interpretation, code transformation. Data unit: PH + App data
• Layer 5, Session: connection, negotiation control. Data unit: SH + PH + App data
• Layer 4, Transport: end-2-end data transfer & integrity, packet sequencing, flow control. Data unit (Segment): TH + SH + PH + App data
• Layer 3, Network: addressing, routing, packet sequencing, flow control. Data unit (Packet): NH + TH + SH + PH + App data
• Layer 2, Data Link: packet assembly/disassembly, transmission control, error checking. Data unit (Frame): DH + NH + TH + SH + PH + App data + FCS
• Layer 1, Physical: electrical, optical, mechanical. Data unit: bits on the “wire”
• Encapsulation:
  • Each protocol layer N adds a Header to the data unit from layer N+1
  • The Header contains control information
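The encapsulation rule can be sketched in a few lines of Python. This is a toy illustration only: the header strings "PH|", "SH|" and so on are placeholders invented for this example, not real protocol headers, and the function names are hypothetical.

```python
# Toy illustration of encapsulation: each layer prepends its own header to the
# unit handed down from the layer above. The data link layer also appends a
# trailer (the FCS). All header contents here are placeholder strings.

LAYER_HEADERS = [b"PH|", b"SH|", b"TH|", b"NH|", b"DH|"]  # layers 6 down to 2
TRAILER = b"|FCS"


def encapsulate(app_data: bytes) -> bytes:
    """Wrap application data on the sending side, top layer first."""
    unit = app_data
    for header in LAYER_HEADERS:
        unit = header + unit
    return unit + TRAILER


def decapsulate(frame: bytes) -> bytes:
    """Strip trailer and headers in the reverse order on the receiving side."""
    if not frame.endswith(TRAILER):
        raise ValueError("missing trailer")
    unit = frame[: -len(TRAILER)]
    for header in reversed(LAYER_HEADERS):
        if not unit.startswith(header):
            raise ValueError("unexpected header")
        unit = unit[len(header):]
    return unit


frame = encapsulate(b"hello")
print(frame)
print(decapsulate(frame))
```

Each layer on the receiving side only looks at its own header, which is what lets a layer "conceptually communicate with its peer" without knowing what the layers above carry.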
What do the Layers do?
• Transport Layer: acts as a go-between for the user and network
  • Provides end-to-end data movement & control
  • Gives the level of reliability/integrity needed by the application
  • Can ensure a reliable service (which the network layer cannot), e.g. assigns sequence numbers to identify “lost” packets
• Network Layer: deals with logical addressing & the transmission of packets, and provides the mechanism for routing.
• Data Link Layer: provides the synchronization and error checking for the data transmitted over a single physical link (may ensure correct delivery of frames)
  • Going down: fits packets from the network layer above into frames.
  • Going up: groups bits from the physical layer into frames.
• Physical Layer: concerned with the transmission of individual bits.
How do the “IP” Protocols fit together?
• Application (with Presentation and Session): TFTP (RFC 783), File Transfer Protocol FTP (RFC 959), ssh, ping, SNMP (RFC 1157), TELNET (RFC 854), DNS, traceroute, Simple Mail Transfer Protocol SMTP (RFC 821), NFS (RFC 1014, 1057 and 1094), POP3/IMAP, HTTP
• Transport: User Datagram Protocol UDP (RFC 768), Transmission Control Protocol TCP (RFC 793)
• Network: Internet Protocol IP (RFC 791), Internet Control Message Protocol ICMP (RFC 792), Routing (OSPF, BGP), Address Resolution Protocols ARP (RFC 826) and RARP (RFC 903)
• Data Link, Network Interface Cards: Ethernet, Token Ring, ISDN, FDDI, SMDS, ATM, SDH/SONET, xDSL
• Physical, Transmission Mode: TP Copper, Fibre Optic, Satellite, Microwave, DWDM, CWDM etc
Some of the “IP” Protocols
• Transmission Control Protocol. TCP provides application programs access to the network using a reliable, connection-oriented transport layer service.
• User Datagram Protocol. UDP provides unreliable, connection-less delivery service using the IP protocol to transport messages between machines. It adds the ability to distinguish among multiple destinations on a single host computer.
• Internet Protocol. IP receives datagrams from the upper-layer software and transmits them to the destination host based upon a best effort, connection-less delivery service.
• Internet Control Message Protocol. ICMP allows internet routers to transmit error messages and test messages.
• Internet Group Management Protocol. IGMP supports multicast, where UDP datagrams are sent to multiple hosts, by letting hosts report their group memberships to routers.
• Address Resolution Protocol. ARP translates from the 32 bit IP address to a 48 bit LAN address.
• Reverse Address Resolution Protocol. RARP translates from the 48 bit LAN address to the 32 bit IP address.
The Physical Layer 1: Ethernet
The Link Layer 2: Ethernet Frame
Frame layout: Preamble | Start Frame delimiter | Frame header (DA, SA, Length/Type) | Data (IP Datagram) | FCS | 12 bytes Inter Frame Gap
• Preamble: 56 bits of alternating 0s and 1s. The preamble gives all the nodes on the network a signal against which to synchronize.
• Start Frame delimiter: marks the start of a frame. It is 8 bits long with the pattern 10101011.
• Media Access Control (MAC) Address: every Ethernet network card has, built into its hardware, a unique six-octet (48-bit) number, usually written in hexadecimal, that differentiates it from all other Ethernet cards in the universe. The destination address (DA) and source address (SA) define the path across the link.
• Length/Type field: two octets long. If the value is <= 1500 (0x05DC) it gives the length of the data. If the value is > 1500 it identifies the network-layer protocol (the “Ethernet Types”, which are assigned from 0x0600 upwards).
• Data: the reason the frame exists. Its maximum size is the MTU (Maximum Transmission Unit).
• Frame Check Sequence: protects the frame contents.
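As a minimal sketch of how the Length/Type rule is applied, the following Python decodes the 14-byte frame header. It assumes the preamble and start frame delimiter have already been stripped (as a network card normally does); the function name and the example frame are hypothetical.

```python
import struct


def parse_ethernet_header(frame: bytes) -> dict:
    """Decode DA, SA and Length/Type from the first 14 bytes of a frame."""
    if len(frame) < 14:
        raise ValueError("need at least 14 bytes of header")
    dst, src, value = struct.unpack("!6s6sH", frame[:14])
    if value <= 1500:
        meaning = "length"      # number of data bytes that follow
    elif value >= 0x0600:
        meaning = "type"        # an "Ethernet Type", e.g. 0x0800 for IP
    else:
        meaning = "undefined"   # 1501..1535 is used for neither purpose
    return {
        "dst": ":".join(f"{b:02x}" for b in dst),
        "src": ":".join(f"{b:02x}" for b in src),
        "value": value,
        "meaning": meaning,
    }


# Hypothetical frame: broadcast destination, made-up source, type 0x0800.
example = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"payload"
print(parse_ethernet_header(example))
```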
The Link Layer: Ethernet VLANs
• VLANs are logical networks built over the same physical cable plant.
• Ethernet frames are assigned to their logical networks using a VLAN header.
• A VLAN tag is identified by the value 0x8100 in the Type field location. The next two octets are composed of the following three fields:
  • User Priority field: 3 bits in length, used to define the priority of the Ethernet frame. This is used to define and deliver a class of service.
  • Canonical format indicator: 1 bit in length. Just don’t ask!
  • VLAN Identifier field: 12 bits in length, contains the VLAN identifier (VID) of this frame.
• The original Length/Type field then follows the inserted VLAN tag.
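The three fields packed into the two octets after 0x8100 can be pulled apart with shifts and masks. This is a sketch under the field widths given above (3 + 1 + 12 bits); the function name and the example tag are made up for illustration.

```python
import struct

VLAN_TYPE = 0x8100


def parse_vlan_tag(tag: bytes) -> dict:
    """Split a 4-byte VLAN tag into priority, canonical format bit and VID."""
    if len(tag) < 4:
        raise ValueError("need 4 bytes")
    tpid, tci = struct.unpack("!HH", tag[:4])
    if tpid != VLAN_TYPE:
        raise ValueError("not a VLAN tag")
    return {
        "priority": tci >> 13,        # top 3 bits
        "cfi": (tci >> 12) & 0x1,     # next 1 bit
        "vid": tci & 0x0FFF,          # low 12 bits
    }


# Hypothetical tag: priority 5, canonical format indicator 0, VLAN 100.
print(parse_vlan_tag(bytes.fromhex("8100a064")))
```

With 12 bits for the identifier, at most 4096 distinct VID values can be encoded.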
The Network Layer 3: IP
• IP Layer properties:
  • Provides best effort delivery
  • It is unreliable: packets may be lost, duplicated, or delivered out of order
  • Connectionless
  • Provides logical addresses
  • Provides routing
  • Demultiplexes data on protocol number
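The Internet datagram
Frame layout: Frame header | IP header | Transport | FCS
IP header (20 bytes without options), fields by bit position within each 32-bit word:
• Word 0: Vers (bits 0-3), Hlen (bits 4-7), Type of serv. (bits 8-15), Total length (bits 16-31)
• Word 1: Identification (bits 0-15), Flags (bits 16-18), Fragment offset (bits 19-31)
• Word 2: TTL (bits 0-7), Protocol (bits 8-15), Header Checksum (bits 16-31)
• Word 3: Source IP address
• Word 4: Destination IP address
• Then: IP Options (if any), Padding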
IP Datagram Format (cont.)
• Type of Service (TOS): now being used for QoS
• Total length: length of datagram in bytes, includes header and data
• Time to live (TTL): specifies how long the datagram is allowed to remain in the internet
  • Routers decrement by 1
  • When TTL = 0 the router discards the datagram
  • Prevents infinite loops
• Protocol: specifies the format of the data area
  • Protocol numbers are administered by a central authority to guarantee agreement, e.g. ICMP=1, TCP=6, UDP=17 …
• Source & destination IP address: (32 bits each) contain the IP addresses of the sender and intended recipient
• Options: (variable length) mainly used to record a route, or timestamps, or specify routing
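The 20-byte header fields described above can be decoded with Python's struct module. This is a minimal sketch that ignores IP options and does not verify the checksum; the function name and the example addresses are hypothetical.

```python
import socket
import struct

PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}


def parse_ipv4_header(data: bytes) -> dict:
    """Decode the fixed 20-byte part of an IP header (options are ignored)."""
    if len(data) < 20:
        raise ValueError("need at least 20 bytes")
    (ver_hlen, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    return {
        "version": ver_hlen >> 4,
        "header_bytes": (ver_hlen & 0x0F) * 4,   # Hlen counts 32-bit words
        "tos": tos,
        "total_length": total_length,            # header plus data, in bytes
        "identification": ident,
        "flags": flags_frag >> 13,               # top 3 bits
        "fragment_offset": flags_frag & 0x1FFF,  # low 13 bits
        "ttl": ttl,
        "protocol": PROTOCOL_NAMES.get(proto, str(proto)),
        "header_checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


# Hypothetical header: TCP datagram of 40 bytes, TTL 64, checksum left as 0.
example = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 40, 1, 0x4000, 64, 6, 0,
    socket.inet_aton("192.168.22.123"), socket.inet_aton("10.0.0.1"),
)
print(parse_ipv4_header(example))
```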
Internet Class-based addresses
• An address looks like 192.168.22.123
• Class A: large number of hosts, few networks
  • 0nnnnnnn hhhhhhhh hhhhhhhh hhhhhhhh
  • 7 network bits (0 and 127 reserved, so 126 networks), 24 host bits (> 16M hosts/net)
  • Initial byte 1-126 (decimal)
• Class B: medium number of hosts and networks
  • 10nnnnnn nnnnnnnn hhhhhhhh hhhhhhhh
  • 16,384 class B networks, 65,534 hosts/network
  • Initial byte 128-191 (decimal)
• Class C: large number of small networks
  • 110nnnnn nnnnnnnn nnnnnnnn hhhhhhhh
  • 2,097,152 networks, 254 hosts/network
  • Initial byte 192-223 (decimal)
• Class D: Multicast (see RFC 1112)
  • 1110 followed by a 28-bit multicast group address
  • Initial byte 224-239 (decimal)
• Class E: Reserved
  • Initial byte 240-255 (decimal)
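Because the class is determined by the leading bits of the first byte, it can be read off with a few comparisons. The sketch below follows the bit patterns (0, 10, 110, 1110, 1111); the function name is hypothetical, and note that 0 and 127 match the class A pattern even though they are reserved.

```python
def address_class(address: str) -> str:
    """Return the class letter implied by the leading bits of the first byte."""
    first = int(address.split(".")[0])
    if not 0 <= first <= 255:
        raise ValueError("first byte out of range")
    if first < 128:      # 0nnnnnnn
        return "A"
    if first < 192:      # 10nnnnnn
        return "B"
    if first < 224:      # 110nnnnn
        return "C"
    if first < 240:      # 1110....
        return "D"
    return "E"           # 1111....


for addr in ("10.0.0.1", "130.88.0.1", "192.168.22.123", "224.0.0.1"):
    print(addr, address_class(addr))
```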
The Transport Layer 4: UDP
• UDP provides:
  • Connectionless service over IP
    • No setup or teardown
    • One packet at a time
  • Minimal overhead, high performance
  • Best effort delivery
• It is unreliable:
  • Packets may be lost
  • Duplicated
  • Out of order
• The application is responsible for:
  • Data reliability
  • Flow control
  • Error handling
UDP Datagram format
Frame layout: Frame header | IP header | UDP header | Application data | FCS
UDP header (8 bytes), fields by bit position within each 32-bit word:
• Word 0: Source port (bits 0-15), Destination port (bits 16-31)
• Word 1: UDP message len (bits 0-15), Checksum (opt.) (bits 16-31)
• Source/destination port: port numbers identify the sending & receiving processes
  • Port number & IP address allow any application on the Internet to be uniquely identified
  • Ports can be static or dynamic
    • Static (< 1024): assigned centrally, known as well known ports
    • Dynamic
• Message length: in bytes, includes the UDP header and data (min 8, max 65,535)
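The 8-byte header is simple enough to build and parse by hand. The following is a sketch only: it leaves the optional checksum as zero (meaning "not computed" for UDP over IPv4), and the function names and port numbers are chosen for illustration.

```python
import struct

UDP_HEADER_LEN = 8


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend an 8-byte UDP header; the optional checksum is left as 0."""
    length = UDP_HEADER_LEN + len(payload)   # header plus data
    if length > 65535:
        raise ValueError("datagram too long")
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + payload


def parse_udp_datagram(datagram: bytes) -> dict:
    """Split a datagram back into its header fields and payload."""
    if len(datagram) < UDP_HEADER_LEN:
        raise ValueError("need at least 8 bytes")
    src_port, dst_port, length, checksum = struct.unpack(
        "!HHHH", datagram[:UDP_HEADER_LEN])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER_LEN:length],
    }


datagram = build_udp_datagram(5000, 53, b"hi")
print(parse_udp_datagram(datagram))
```

An empty payload gives the minimum message length of 8, the header alone.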
The Transport Layer 4: TCP
• TCP (RFC 793, RFC 1122) provides:
  • Connection-oriented service over IP
    • During setup the two ends agree on details
    • Explicit teardown
    • Multiple connections allowed
  • Reliable end-to-end byte stream delivery over an unreliable network
    • It takes care of lost packets, duplicated packets, and out of order packets
• TCP provides:
  • Data buffering
  • Flow control
  • Error detection & handling
  • Limits on network congestion
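The TCP Segment Format
Frame layout: Frame header | IP header | TCP header | Application data | FCS
TCP header (20 bytes without options), fields by bit position within each 32-bit word:
• Word 0: Source port (bits 0-15), Destination port (bits 16-31)
• Word 1: Sequence number
• Word 2: Acknowledgement number
• Word 3: Hlen (bits 0-3), Resv (bits 4-9), Code (bits 10-15), Window (bits 16-31)
• Word 4: Checksum (bits 0-15), Urgent ptr (bits 16-31)
• Then: Options (if any), Padding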
TCP Segment Format (cont.)
• Source/Dest port: TCP port numbers to ID applications at both ends of the connection
• Sequence number: first byte in segment from sender’s byte stream
• Acknowledgement: identifies the number of the byte the sender of this segment expects to receive next
• Code: used to determine segment purpose, e.g. SYN, ACK, FIN, URG
• Window: advertises how much data this station is willing to accept. Can depend on buffer space remaining.
• Options: used for window scaling, SACK, timestamps, maximum segment size etc.
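A minimal sketch of decoding the fixed 20-byte TCP header, including the Code bits, is given below. It ignores options and does not check the checksum; the function name and the example segment values are hypothetical.

```python
import struct

# Code bits from least significant upward.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    if len(segment) < 20:
        raise ValueError("need at least 20 bytes")
    (src_port, dst_port, seq, ack, hlen_code,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    code = hlen_code & 0x3F                      # low 6 bits
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "sequence": seq,
        "acknowledgement": ack,
        "header_bytes": (hlen_code >> 12) * 4,   # Hlen counts 32-bit words
        "flags": [n for bit, n in enumerate(FLAG_NAMES) if code & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
    }


# Hypothetical SYN+ACK segment: Hlen = 5 words, Code = 0x12, window 65535.
example = struct.pack("!HHIIHHHH", 80, 40000, 1000, 2001, 0x5012, 65535, 0, 0)
print(parse_tcp_header(example))
```

The Window field is only 16 bits wide, which is why the window scaling option listed above matters on high bandwidth, long distance paths.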
The RTP Header Format
Frame layout: Frame header | IP header | UDP header | RTP header | Application data | FCS
RTP header (12 bytes), fields by bit position within each 32-bit word:
• Word 0: Ver (bits 0-1), P (bit 2), X (bit 3), CSRC count (bits 4-7), M (bit 8), PT (bits 9-15), Sequence Num. (bits 16-31)
• Word 1: RTP Time Stamp
• Word 2: Synchronization Source Identifier
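The 12-byte fixed RTP header can be decoded in the same style as the other headers. This sketch assumes the standard layout (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, synchronization source); the function name and the example packet values are made up.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header; CSRC entries are not parsed."""
    if len(packet) < 12:
        raise ValueError("need at least 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # 2 bits
        "padding": (b0 >> 5) & 0x1,    # P
        "extension": (b0 >> 4) & 0x1,  # X
        "csrc_count": b0 & 0x0F,       # 4 bits
        "marker": b1 >> 7,             # M
        "payload_type": b1 & 0x7F,     # PT, 7 bits
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical packet: version 2, payload type 96, sequence 1, timestamp 160.
example = struct.pack("!BBHII", 0x80, 0x60, 1, 160, 0x12345678)
print(parse_rtp_header(example))
```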
More Information
• Lectures, tutorials etc. on TCP/IP:
  • www.nv.cc.va.us/home/joney/tcp_ip.htm
  • www.cs.pdx.edu/~jrb/tcpip.lectures.html
  • www.raleigh.ibm.com/cgi-bin/bookmgr/BOOKS/EZ306200/CCONTENTS
  • www.cisco.com/univercd/cc/td/doc/product/iaabu/centri4/user/scf4ap1.htm
  • www.cis.ohio-state.edu/htbin/rfc/rfc1180.html
  • www.jbmelectronics.com/tcp.htm
• Encyclopaedia
  • http://www.freesoft.org/CIE/index.htm
• TCP/IP Resources
  • www.private.org.il/tcpip_rl.html
• Understanding IP addresses
  • http://www.3com.com/solutions/en_US/ncs/501302.html
• Configuring TCP (RFC 1122)
  • ftp://nic.merit.edu/internet/documents/rfc/rfc1122.txt
• Assigned protocols, ports etc (RFC 1010)
  • http://www.es.net/pub/rfcs/rfc1010.txt & /etc/protocols
Any Questions?
Backup Slides
More Information: Some URLs
• UKLight web site: http://www.uklight.ac.uk
• MB-NG project web site: http://www.mb-ng.net/
• DataTAG project web site: http://www.datatag.org/
• UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
• Motherboard and NIC Tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
  • “Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards”, FGCS Special issue 2004, http://www.hep.man.ac.uk/~rich/
• TCP tuning information may be found at: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
• TCP stack comparisons: “Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks”, Journal of Grid Computing 2004
• PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
• Dante PERT: http://www.geant2.net/server/show/nav.00d00h002
tcpdump / tcptrace
• tcpdump: dump all TCP header information for a specified source/destination
  • ftp://ftp.ee.lbl.gov/
• tcptrace: format tcpdump output for analysis using xplot
  • http://www.tcptrace.org/
• NLANR TCP Testrig: nice wrapper for the tcpdump and tcptrace tools
  • http://www.ncne.nlanr.net/TCP/testrig/
• Sample use:
    tcpdump -s 100 -w /tmp/tcpdump.out host hostname
    tcptrace -Sl /tmp/tcpdump.out
    xplot /tmp/a2b_tsg.xpl