
Realizing the Performance Potential of the Virtual Interface Architecture



  1. Realizing the Performance Potential of the Virtual Interface Architecture Evan Speight, Hazim Abdel-Shafi, and John K. Bennett Rice University, Dept. of Electrical and Computer Engineering Presented by Constantin Serban, R.U.

  2. VIA Goals • Communication infrastructure for System Area Networks (SANs) • Primarily targets high-speed cluster applications • Efficiently harnesses the communication performance of the underlying network

  3. Trends • Peak bandwidth has increased by two orders of magnitude over the past decade, while user-visible latency has decreased only modestly • The latency introduced by the protocol stack is typically several times that of the transport layer • The problem is especially acute for small messages

  4. Targets The VI architecture addresses the following issues: • Decrease latency, especially for small messages (used in synchronization) • Increase aggregate bandwidth (only a fraction of the peak bandwidth is otherwise utilized) • Reduce the CPU processing consumed by messaging overhead

  5. Overhead Overhead comes mainly from two sources: • Every network access requires one or two traps into the kernel • user/kernel mode switches are time-consuming • Usually two data copies occur: • from the user buffer to the message-passing API • from the message layer to the kernel buffer

  6. VIA approach • Remove the kernel from the critical path • communication code moves out of the kernel into user space • Provide a zero-copy protocol • data is sent/received directly from/into the user buffer; no intermediate message copy is performed

  7. VIA emerged as a standardization effort by Compaq, Intel, and Microsoft. It builds on several academic ideas: • the main architecture is most similar to U-Net • essential features derive from VMMC Among current implementations: • GigaNet cLAN – VIA implemented in hardware • Tandem ServerNet – VIA emulated in a software driver • Myricom Myrinet – VIA emulated in NIC firmware

  8. VIA architecture

  9. VIA operations Set-Up/Tear-Down: • VIA is a point-to-point, connection-oriented protocol • VI endpoint: the core concept in VIA • Register/De-Register Memory • Connect/Disconnect • Transmit • Receive • RDMA

  10. VIA operations Set-Up/Tear-Down: VIA is a point-to-point, connection-oriented protocol • VI endpoint: the core concept in VIA • The VipCreateVi function creates a VI endpoint from user space • The user-level library passes the call to the kernel agent, which passes the creation information to the NIC • The OS thus controls application access to the NIC

  11. VIA operations - cont’d Register/De-Register Memory: • All data buffers and descriptors reside in registered memory • The NIC performs DMA I/O operations on this registered memory • Registration pins the pages into physical memory and provides a handle used to manipulate the pages and communicate their addresses to the NIC • It is performed once, usually at the beginning of the communication session

  12. VIA operations - cont’d Connect/Disconnect: • Before communication, each endpoint is connected to a remote endpoint • The connection request is passed to the kernel agent and down to the NIC • VIA does not define an addressing scheme; existing schemes can be used by individual implementations

  13. VIA operations - cont’d Transmit/Receive: • The sender builds a descriptor for the message to be sent. The descriptor points to the actual data buffer; both descriptor and data buffer reside in a registered memory area • The application then posts a doorbell to signal the availability of the descriptor. The doorbell contains the address of the descriptor • Doorbells are maintained in an internal queue inside the NIC

  14. VIA operations - cont’d Transmit/Receive (cont’d): • Meanwhile, the receiver creates a descriptor that points to an empty data buffer and posts a doorbell in the receiver NIC's queue • When the doorbell reaches the head of the sender's queue, the data is sent into the network through a double indirection (doorbell → descriptor → buffer) • The first doorbell/descriptor is picked up from the receiver's queue and its buffer is filled with the data

  15. VIA operations - cont’d RDMA: • As a mechanism derived from VMMC, VIA allows Remote DMA operations: RDMA Read and RDMA Write • Each node allocates a receive buffer and registers it with the NIC. Additional structures containing read and write pointers into the receive buffers are exchanged during connection setup • Each node can then read and write the remote node's memory directly • These operations pose potential implementation problems

  16. Evaluation Benchmarks • Two VI implementations: • GigaNet cLAN: bandwidth 125 MB/s, latency 480 ns • Tandem ServerNet: bandwidth 50 MB/s, latency 300 ns • Performance measured: • Bandwidth and latency • Polling vs. blocking • CPU utilization

  17. Bandwidth

  18. Latency

  19. Latency Polling/Blocking

  20. CPU utilization

  21. MPI performance using VIA • The challenge is to deliver VIA's performance to distributed applications • Software layers such as MPI usually sit between VIA and the application: they provide increased usability but introduce additional overhead • How can this layer be optimized to use VIA efficiently?

  22. MPI VIA - performance

  23. MPI observations • The difference between MPI-UDP and MPI-VIA-baseline is remarkable • MPI-VIA-baseline nevertheless falls dramatically short of native VIA • Several improvements are proposed to bring MPI-VIA closer to native VIA by reducing MPI overhead

  24. MPI Improvements • Eliminating unnecessary copies: MPI over UDP and baseline MPI over VIA use a single set of receive buffers, so data must be copied to the application; instead, allow the user to register any buffer • Choosing a synchronization primitive: all synchronization formerly used OS constructs/events; a better implementation uses atomic processor swap instructions • No acknowledgements: remove per-message acknowledgements by switching to VIA's reliable delivery mode

  25. VIA - Disadvantages • Polling vs. blocking synchronization – a tradeoff between CPU consumption and wake-up overhead • Memory registration: locking large amounts of memory makes virtual memory mechanisms inefficient, and registering/deregistering on the fly is slow • Point-to-point vs. multicast: VIA lacks multicast primitives; implementing multicast over the existing point-to-point mechanism makes such communication inefficient

  26. Conclusion • Low latency for small messages; small messages have a strong impact on application behavior • Significant improvement over UDP communication (does this still hold after recent TCP/UDP hardware offload implementations?) • Achieved at the expense of an uncomfortable API
