TCP Servers: Offloading TCP/IP Processing in Internet Servers



Presentation Transcript


  1. TCP Servers: Offloading TCP/IP Processing in Internet Servers Liviu Iftode Department of Computer Science University of Maryland and Rutgers University

  2. My Research: Network-Centric Systems • TCP Servers and Split-OS [NSF CAREER] • Migratory TCP and Service Continuations • Federated File Systems • Smart Messages [NSF ITR-2] and Spatial Programming for Networks of Embedded Systems • http://discolab.rutgers.edu

  3. Networking and Performance • The transport-layer protocol must be efficient • [Diagram: clients (C), an IP network (WAN) carrying TCP, Internet servers (S), a storage network (SAN), and disks (D)] • IP or not IP? TCP or not TCP?

  4. The Scalability Problem • Apache web server on 1-way and 2-way 300 MHz Intel Pentium II SMP, with clients repeatedly accessing a static 16 KB file

  5. Breakdown of CPU Time for Apache

  6. The TCP/IP Stack • [Diagram: the application enters the kernel through system calls] • Send path: copy_from_application_buffers → TCP_send → IP_send → packet_scheduler → setup_DMA → packet_out • Receive path: packet_in → hardware_interrupt_handler → software_interrupt_handler → IP_receive → TCP_receive → copy_to_application_buffers

  7. Breakdown of CPU Time for Apache

  8. Serialized Networking Actions • [Diagram: the same send and receive paths, with the serialized operations marked] • Send path: system calls → copy_from_application_buffers → TCP_send → IP_send → packet_scheduler → setup_DMA → packet_out • Receive path: packet_in → hardware_interrupt_handler → software_interrupt_handler → IP_receive → TCP_receive → copy_to_application_buffers

  9. TCP/IP Processing is Very Expensive • Protocol processing can take up to 70% of the CPU cycles • For Apache web server on uniprocessors [Hu 97] • Can lead to Receive Livelock [Mogul 95] • Interrupt handling consumes a significant amount of time • Soft Timers [Aron 99] • Serialization affects scalability

  10. Outline • Motivation • TCP Offloading using TCP Server • TCP Server for SMP Servers • TCP Server for Cluster-based Servers • Prototype Evaluation

  11. TCP Offloading Approach • Offload network processing from application hosts to dedicated processors/nodes/I-NICs • Reduce OS intrusion • network interrupt handling • context switches • serializations in the networking stack • cache and TLB pollution • Should adapt to changing load conditions • Software or hardware solution?

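  12. The TCP Server Idea • [Diagram: on the server side, a host processor runs the application and OS while a separate TCP server runs TCP/IP; the two are linked by fast communication, and the client talks to the TCP server]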

  13. TCP Server Performance Factors • Efficiency of the TCP server implementation • event-based server, no interrupts • Efficiency of communication between host(s) and TCP server • non-intrusive, low-overhead • API • asynchronous, zero-copy • Adaptiveness to load

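  14. TCP Servers for Multiprocessor Systems • [Diagram: inside a multiprocessor (SMP) server with CPU 0 through CPU N, the TCP server and the application with the host OS run on separate CPUs and communicate through shared memory; the client connects to the server]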

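  15. TCP Servers for Clusters with Memory-to-Memory Interconnects • [Diagram: inside a cluster-based server, the TCP server node and the host node running the application communicate over a memory-to-memory interconnect; the client connects to the server]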

  16. TCP Servers for Multiprocessor Servers

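  17. SMP-based Implementation • [Diagram: the IO APIC splits interrupt delivery between the TCP server and the host OS running the application; the two interrupt groups shown are network and clock interrupts, and disk and other interrupts]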

  18. SMP-based Implementation (cont’d) • [Diagram] The application on the host OS enqueues a send request on the shared queue • The TCP server dequeues and executes the send request
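
The enqueue/dequeue exchange on slide 18 can be pictured as a bounded ring buffer shared by one producer (the host) and one consumer (the TCP server). The following is a minimal sketch only, not the prototype's kernel code: `SendRequest`, `SharedQueue`, and their fields are hypothetical names, and the real system would live in kernel shared memory with proper memory ordering rather than in a Python list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SendRequest:
    """Hypothetical request record; the slides do not specify its layout."""
    socket_id: int
    payload: bytes


class SharedQueue:
    """Bounded ring buffer; one slot stays empty to tell 'full' from 'empty'."""

    def __init__(self, capacity: int) -> None:
        self._slots: List[Optional[SendRequest]] = [None] * (capacity + 1)
        self._head = 0  # next slot the TCP server will dequeue
        self._tail = 0  # next slot the host will fill

    def enqueue(self, req: SendRequest) -> bool:
        """Host side: returns False instead of blocking when the queue is full."""
        nxt = (self._tail + 1) % len(self._slots)
        if nxt == self._head:
            return False
        self._slots[self._tail] = req
        self._tail = nxt
        return True

    def dequeue(self) -> Optional[SendRequest]:
        """TCP server side: returns None when there is nothing to execute."""
        if self._head == self._tail:
            return None
        req = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        return req
```

Returning `False` on a full queue, rather than blocking, is one possible design choice; it leaves the overflow policy to the caller, which matches the Monitor treating queue overflow as an exceptional event.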

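  19. TCP Server Event-Driven Architecture • [Diagram: components are the Dispatcher, Monitor, Send Handler, Receive Handler, and Asynchronous Event Handler] • The Shared Queue carries requests from the application processors and results to the application processors • The NIC is the network-side interface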

  20. Dispatcher • Kernel thread executing at the highest priority level in the kernel • Schedules the different handlers based on input from the monitor • Executes an infinite loop and does not yield the processor • No other activity can execute on the TCP Server processor

  21. Asynchronous Event Handler (AEH) • Handles asynchronous network events • Interacts with the NIC • Can be an Interrupt Service Routine or a Polling Routine • Is a short running thread • Has the highest priority among TCP server modules • The clock interrupt is used as a guaranteed trigger for the AEH when polling

  22. Send and Receive Handlers • Scheduled in response to a request in the Shared Memory queues • Run at the priority of the network protocol • Interact with the Host processors

  23. Monitor • Observes the state of the system queues and provides hints to the Dispatcher to schedule • Used for book-keeping and dynamic load balancing • Scheduled periodically or when an exception occurs • Queue overflow or empty • Bad checksum for a network packet • Retransmissions on a connection • Can be used to reconfigure the set of TCP servers in response to load variation
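
The scheduling roles described on slides 20 to 23 can be sketched as a single priority-ordered loop. This is a toy model under stated assumptions, not the kernel implementation: the `Dispatcher` class, its `step` method, the fixed `monitor_period`, and the ordering of send before receive are all illustrative choices that the slides do not specify. The real dispatcher never returns; here each call to `step` makes exactly one scheduling decision so the behavior can be observed.

```python
from collections import deque
from typing import Deque, List


class Dispatcher:
    """Toy dispatcher: AEH first, then send/receive handlers, monitor periodically."""

    def __init__(self, monitor_period: int = 4) -> None:
        self.net_events: Deque[str] = deque()     # asynchronous NIC events (AEH input)
        self.send_requests: Deque[str] = deque()  # requests from the shared queue
        self.recv_requests: Deque[str] = deque()
        self.monitor_period = monitor_period      # assumed: monitor runs every N steps
        self.trace: List[str] = []                # records what ran, for inspection
        self._ticks = 0

    def step(self) -> None:
        self._ticks += 1
        # The monitor is scheduled periodically and only observes queue state.
        if self._ticks % self.monitor_period == 0:
            self.trace.append("monitor")
        # The AEH has the highest priority among the TCP server modules.
        if self.net_events:
            self.net_events.popleft()
            self.trace.append("aeh")
        elif self.send_requests:
            self.send_requests.popleft()
            self.trace.append("send")
        elif self.recv_requests:
            self.recv_requests.popleft()
            self.trace.append("receive")
        else:
            # The real dispatcher keeps polling and does not yield the processor.
            self.trace.append("idle")
```

For example, with one pending item of each kind and `monitor_period=3`, four steps record `aeh`, `send`, `monitor`, `receive`, `idle` in that order.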

  24. TCP Servers for Cluster-based Servers

  25. Cluster-based Implementation • [Diagram] The socket stub on the host tunnels the application's socket request over VI channels • The TCP server dequeues and executes the socket request

  26. TCP Server Architecture • [Diagram: components are the Eager Processor, Resource Manager, TCP/IP Provider, Socket Call Processor, Request Handler, and VI Connection Handler] • External interfaces: SAN (to host) and NIC (WAN)

  27. Sockets and VI Channels • Pool of VI’s created at initialization • Avoid cost of creating VI’s in the critical path • Registered memory regions associated with each VI • Send and receive buffers associated with socket • Also used to exchange control data • Socket mapped to a VI on the first socket operation • All subsequent operations on the socket tunneled through the same VI to the TCP server
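
The pooling and lazy mapping on slide 27 can be illustrated with a small sketch. It assumes VIs are represented by integer ids and that a socket gives its VI back when closed; `VIPool`, `vi_for`, and `release` are hypothetical names, and the release step in particular is not described in the slides.

```python
from typing import Dict, List


class VIPool:
    """Pool of pre-created VIs; a socket is bound to one on its first operation."""

    def __init__(self, size: int) -> None:
        # All VIs are created at initialization, off the critical path.
        self._free: List[int] = list(range(size))
        self._by_socket: Dict[int, int] = {}

    def vi_for(self, socket_id: int) -> int:
        """Return the socket's VI, binding one from the pool on first use."""
        if socket_id not in self._by_socket:
            if not self._free:
                raise RuntimeError("VI pool exhausted")
            self._by_socket[socket_id] = self._free.pop()
        # Every later operation on this socket is tunneled through the same VI.
        return self._by_socket[socket_id]

    def release(self, socket_id: int) -> None:
        """Assumed behavior: return the VI to the pool when the socket closes."""
        vi = self._by_socket.pop(socket_id, None)
        if vi is not None:
            self._free.append(vi)
```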

  28. Socket Call Processing • Host library intercepts socket call • Socket call parameters are tunneled to the TCP server over a VI channel • TCP server performs socket operation and returns results to the host • Library returns control to the application immediately or when the socket call completes (asynchronous vs synchronous processing).
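
The synchronous and asynchronous return paths on slide 28 can be sketched with a worker thread standing in for the TCP server node. Everything here is an assumption made for illustration: `HostSocketStub`, its `call` method, and the fake `send` operation are hypothetical, and a real implementation would tunnel the parameters over a VI channel rather than a thread pool.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Tuple, Union


class HostSocketStub:
    """Host-side library stub that forwards socket calls to a stand-in TCP server."""

    def __init__(self) -> None:
        # A single worker thread models the remote TCP server executing requests.
        self._server = ThreadPoolExecutor(max_workers=1)

    @staticmethod
    def _execute(op: str, args: Tuple[Any, ...]) -> int:
        # Stand-in for the real socket operation performed on the TCP server.
        if op == "send":
            return len(args[0])  # pretend all bytes were sent
        raise ValueError(f"unsupported operation: {op}")

    def call(self, op: str, *args: Any, asynchronous: bool = False) -> Union[int, Future]:
        fut = self._server.submit(self._execute, op, args)
        if asynchronous:
            return fut           # control returns to the application immediately
        return fut.result()      # synchronous: wait until the call completes

    def close(self) -> None:
        self._server.shutdown()
```

In the asynchronous case the application keeps the returned future and collects the result later with `fut.result()`, which is one way to expose completion; the slides do not say how the prototype reports it.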

  29. Design Issues for TCP Servers • Splitting of the TCP/IP processing • Where to split? • Asynchronous event handling • Interrupt or polling? • Asynchronous API • Event scheduling and resource allocation • Adaptation to different workloads

  30. Prototypes and Evaluation

  31. SMP-based Prototype • Modified the Linux 2.4.9 SMP kernel on the Intel x86 platform to implement the TCP server • Most parts of the system are kernel modules, with small inline changes to the TCP stack, software interrupt handlers and the task structures • Instrumented the kernel using on-chip performance monitoring counters to profile the system

  32. Evaluation Testbed • Server • 4-Way 550MHz Intel Pentium II Xeon system with 1GB DRAM and 1MB on chip L2 cache • Clients • 4-way SMPs • 2-Way 300 MHz Intel Pentium II system with 512 MB RAM and 256KB on chip L2 cache • NIC : 3-Com 996-BT Gigabit Ethernet • Server Application: Apache 1.3.20 web server • Client program: sclients [Banga 97] • Trace driven execution of clients

  33. Trace Characteristics

  34. Splitting TCP/IP Processing • [Diagram: the TCP/IP stack divided between application processors and dedicated processors, with candidate split points labeled C1, C2, and C3] • Send path: system calls → copy_from_application_buffers → TCP_send → IP_send → packet_scheduler → setup_DMA → packet_out • Receive path: packet_in → interrupt_handler → software_interrupt_handler → IP_receive → TCP_receive → copy_to_application_buffers

  35. Implementations

  36. Throughput

  37. CPU Utilization for Synthetic Trace

  38. Throughput Using Synthetic Trace With Dynamic Content

  39. Adapting TCP Servers to Changing Workloads • Monitor the queues • Identify low and high water marks to change the size of the processor set • Execute a special handler for exceptional events • Queue length lower than the low water mark • Set a flag which dispatcher checks • Dispatcher sleeps if the flag is set • Reroute the interrupts • Queue length higher than the high water mark • Wake up the dispatcher on the chosen processor • Reroute the interrupts
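
The water-mark policy on slide 39 amounts to hysteresis on the queue length. A minimal sketch follows, assuming the processor set changes by one processor per observation and is clamped between a minimum and a maximum; `Reconfigurator`, `observe`, and the specific thresholds are hypothetical, and the interrupt rerouting appears only as comments.

```python
class Reconfigurator:
    """Grow or shrink the set of TCP server processors using two water marks."""

    def __init__(self, low: int, high: int,
                 min_servers: int = 1, max_servers: int = 4) -> None:
        if not low < high:
            raise ValueError("low water mark must be below high water mark")
        self.low = low
        self.high = high
        self.min_servers = min_servers
        self.max_servers = max_servers
        self.active = min_servers  # processors currently acting as TCP servers

    def observe(self, queue_length: int) -> int:
        """Apply one monitor observation and return the new processor count."""
        if queue_length > self.high and self.active < self.max_servers:
            # Wake up the dispatcher on the chosen processor, reroute interrupts.
            self.active += 1
        elif queue_length < self.low and self.active > self.min_servers:
            # Set the flag so one dispatcher sleeps, then reroute interrupts.
            self.active -= 1
        # Between the two marks nothing changes, which avoids oscillation.
        return self.active
```

Keeping a gap between the two marks is what prevents the configuration from flapping when the load hovers near a single threshold.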

  40. Load behaviour and dynamic reconfiguration

  41. Throughput with Dynamic Reconfiguration

  42. Cluster-based Prototype • User-space implementation (bypass host kernel) • Entire socket operation offloaded to TCP Server • C1, C2 and C3 offloaded by default • Optimizations • Asynchronous processing: AsyncSend • Processing ahead: Eager Receive, Eager Accept • Avoiding data copy at host using pre-registered buffers • requires different API: MemNet

  43. Implementations

  44. Evaluation Testbed • Server • Host and TCP Server: 2-Way 300 MHz Intel Pentium II system with 512 MB RAM and 256KB on chip L2 cache • Clients • 4-Way 550MHz Intel Pentium II Xeon system with 1GB DRAM and 1MB on chip L2 cache • NIC: 3-Com 996-BT Gigabit Ethernet • Server application: Custom web server • Flexibility in modifying application to use our API • Client program: httperf

  45. Throughput with Synthetic Trace Using HTTP/1.0

  46. CPU Utilization

  47. Throughput with Synthetic Trace Using HTTP/1.1

  48. Throughput with Real Trace (Forth) Using HTTP/1.0

  49. Related Work • TCP Offloading Engines • Communication Services Platform (CSP) • System architecture for scalable cluster-based servers, using a VIA-based SAN to tunnel TCP/IP packets inside the cluster • Piglet - A vertical OS for multiprocessors • Queue Pair IP - A new endpoint mechanism for inter-network processing inspired by memory-to-memory communication

  50. Conclusions • Offloading networking functionality to a set of dedicated TCP servers yields up to 30% performance improvement • Performance Essentials: • TCP Server architecture • event driven • polling instead of interrupts • adaptive to load • API • asynchronous, zero-copy
