
LRPC

LRPC. Firefly RPC, Lightweight RPC, Winsock Direct and VIA. Important optimizations: LRPC. Lightweight RPC (LRPC): for the case where sender and dest are on the same machine (Bershad et al.). Uses memory mapping to pass data.

eve-pearson

Presentation Transcript


  1. LRPC • Firefly RPC, Lightweight RPC, Winsock Direct and VIA

  2. Important optimizations: LRPC • Lightweight RPC (LRPC): for the case where sender and dest are on the same machine (Bershad et al.) • Uses memory mapping to pass data • Reuses the same kernel thread to reduce context-switching costs (the user suspends and the server wakes up on the same kernel thread or “stack”) • Single system call: send_rcv or rcv_send

  3. LRPC (figure) • Source and dest are on the same site, above the O/S • Source does xyz(a, b, c) • O/S and dest are initially idle

  4. LRPC (figure) • Source does xyz(a, b, c) • Control passes directly to dest on the same site • Arguments are directly visible through remapped memory

  5. LRPC performance impact • On the same platform, offers about a 10-fold improvement over a hand-optimized RPC implementation • Does two memory remappings, no context switch • Runs about 50 times faster than the standard RPC from the same vendor (at the time of the research) • Stronger semantics: exactly-once execution is easy to ensure
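The argument-passing idea in the LRPC slides above can be sketched in a few lines. This is only an illustration under stated assumptions: an anonymous `mmap` region stands in for memory mapped into both protection domains, an ordinary function call stands in for the single send_rcv trap, and the argument layout and the body of `xyz` (a sum) are hypothetical.

```python
import mmap
import struct

# Hypothetical layout of the shared argument region:
# three 32-bit ints (a, b, c) followed by one 32-bit int result.
ARG_FORMAT = "<3i"
RES_FORMAT = "<i"
RES_OFFSET = struct.calcsize(ARG_FORMAT)

# Anonymous mapping standing in for a region visible to both client and server.
shared = mmap.mmap(-1, mmap.PAGESIZE)


def server_xyz(region):
    """Server procedure: reads the arguments in place and writes the result in place."""
    a, b, c = struct.unpack_from(ARG_FORMAT, region, 0)
    struct.pack_into(RES_FORMAT, region, RES_OFFSET, a + b + c)  # assumed body of xyz


def client_call_xyz(region, a, b, c):
    """Client stub: write the arguments once into the shared region, then transfer control."""
    struct.pack_into(ARG_FORMAT, region, 0, a, b, c)
    server_xyz(region)  # stands in for the single send_rcv system call
    (result,) = struct.unpack_from(RES_FORMAT, region, RES_OFFSET)
    return result


result = client_call_xyz(shared, 1, 2, 3)
print(result)
```

The point of the sketch is that the arguments are written once and never copied into a message: the server reads them from the same region the client filled in.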

  6. Fbufs • Druschel and Peterson: tool for speeding up layered protocols • Observation: buffer management is a major source of overhead in layered protocols (ISO style) • Solution: use memory management and protection to “cache” buffers on frequently used paths • Stack layers effectively share memory • Tremendous performance improvement seen

  7. Fbufs (figure) • Control flows through a stack of layers, or a pipeline of processes • Data is copied from the “out” buffer to the “in” buffer

  8. Fbufs (figure) • Control flows through a stack of layers, or a pipeline of processes • Data is placed into the “out” buffer • Shaded buffers are mapped into the address space but protected against access

  9. Fbufs (figure) • Control flows through a stack of layers, or a pipeline of processes • Buffer is remapped to eliminate the copy

  10. Fbufs (figure) • Control flows through a stack of layers, or a pipeline of processes • The “in” buffer is reused as the “out” buffer

  11. Fbufs (figure) • Control flows through a stack of layers, or a pipeline of processes • Buffer is remapped to eliminate the copy

  12. Where are Fbufs used? • This specific system is not widely used • But most kernels use similar ideas to reduce the costs of in-kernel layering • And many application-layer libraries use the same sorts of tricks to achieve clean structure without excessive overhead from layer crossing
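The difference between copying a buffer at each layer and handing the same buffer down the stack can be illustrated at application level. The sketch below is an analogy, not the Fbufs mechanism itself: real fbufs rely on virtual-memory remapping and page protection, whereas here a Python `memoryview` plays the role of the shared buffer. The fixed 4-byte header per layer and the function names are assumptions made for the example.

```python
HEADER_LEN = 4  # hypothetical fixed header size consumed by each layer


def copying_layer(packet: bytes) -> bytes:
    """Conventional layer: strips its header by copying the remainder into a new buffer."""
    return bytes(packet[HEADER_LEN:])


def sharing_layer(packet: memoryview) -> memoryview:
    """Fbuf-style layer: strips its header by passing on a view of the same buffer."""
    return packet[HEADER_LEN:]


buffer = bytearray(b"ETH.IP..UDP.payload")

# Conventional path: three layers, three copies of the data.
copied = bytes(buffer)
for _ in range(3):
    copied = copying_layer(copied)

# Shared path: three layers, no copies; every layer sees the original buffer.
view = memoryview(buffer)
for _ in range(3):
    view = sharing_layer(view)

print(copied)       # b'payload'
print(bytes(view))  # b'payload'
```

Because `view` still refers to `buffer`, a later write to `buffer` is visible through `view`, which is exactly what the copying path gives up in exchange for isolation.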

  13. Active messages • Concept developed by Culler and von Eicken for parallel machines • Assumes the sender knows all about the dest, including memory layout and data formats • Message header gives the address of the handler • Applications copy directly into and out of the network interface

  14. Performance impact? • Even with optimizations, standard RPC requires about 1000 instructions to send a null message • Active messages: as few as 6 instructions! One-way latency as low as 35 µs • But the model works only if the “same program” runs on all nodes and if the application has direct control over the communication hardware
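The dispatch step that makes active messages cheap can be sketched as follows. This is a minimal illustration under assumptions: the header format (handler index plus payload length) and the two handlers are hypothetical, and an index into a handler table shared by all nodes stands in for the raw handler address carried in a real active message.

```python
import struct

# Hypothetical header: 16-bit handler index, 16-bit payload length.
HEADER = struct.Struct("<HH")

counters = {"sum": 0}
log = []


def add_handler(payload: bytes) -> None:
    """Handler 0: add a 32-bit integer carried in the payload to a running sum."""
    (value,) = struct.unpack("<i", payload)
    counters["sum"] += value


def log_handler(payload: bytes) -> None:
    """Handler 1: record the payload as text."""
    log.append(payload.decode("ascii"))


# The same program runs on every node, so sender and receiver agree on this table.
HANDLERS = [add_handler, log_handler]


def make_message(handler_index: int, payload: bytes) -> bytes:
    """Sender side: prefix the payload with the handler to run at the destination."""
    return HEADER.pack(handler_index, len(payload)) + payload


def deliver(message: bytes) -> None:
    """Receiver side: read the header and jump straight to the named handler."""
    handler_index, length = HEADER.unpack_from(message, 0)
    payload = message[HEADER.size:HEADER.size + length]
    HANDLERS[handler_index](payload)


deliver(make_message(0, struct.pack("<i", 5)))
deliver(make_message(0, struct.pack("<i", 7)))
deliver(make_message(1, b"done"))
print(counters["sum"], log)
```

There is no generic unmarshalling or scheduling step between arrival and handler execution, which is why the model depends on the sender knowing the receiver's layout in advance.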

  15. U-Net • Low-latency, high-performance communication for ATM on normal UNIX machines, later extended to Fast Ethernet • Developed by von Eicken, Vogels and others at Cornell (1995) • Idea: the application and the ATM controller share a memory-mapped region; I/O is done by adding messages to a queue or reading from a queue • Latency reduced 50-fold relative to UNIX, throughput 10-fold better for small messages!

  16. U-Net concepts • Normally, data flows through the O/S to the driver, then is handed to the device controller • In U-Net the device controller sees the data directly in a shared memory region • The normal architecture gets protection from trust in the kernel • U-Net gets protection using a form of cooperation between the controller and the device driver

  17. U-Net implementation • Reprogram the ATM controller to understand special data structures in the memory-mapped region • Rebuild the ATM device driver to match this model • Pin the shared memory pages, leave them mapped into the I/O DMA map • Disable memory caching for these pages (else changes won’t be visible to the ATM controller)

  18. U-Net architecture (figure) • User’s address space has a direct-mapped communication region • Region is organized as an in-queue, out-queue and freelist • ATM device controller sees the whole region and can transfer directly in and out of it

  19. U-Net protection guarantees • No user can see the contents of any other user’s mapped I/O region (the U-Net controller sees the whole region, but user programs do not) • Driver mediates to create “channels”; a user can only communicate over channels it owns • U-Net controller uses the channel code on incoming/outgoing packets to rapidly find the region in which to store them
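The channel-based protection described above can be modeled roughly as a table that only the driver may write and that the controller consults on every packet. The sketch is a simplified model, not the U-Net data structures: the class and function names, the integer channel ids and the in-memory "packet" tuple are all assumptions made for illustration.

```python
class UserEndpoint:
    """A user program's communication region, reduced here to an in-queue."""

    def __init__(self, name):
        self.name = name
        self.owned = set()   # channel ids this endpoint may use
        self.in_queue = []


class Driver:
    """Trusted driver: the only component allowed to create channels."""

    def __init__(self):
        self.channel_table = {}  # channel id -> owning endpoint
        self.next_id = 1

    def create_channel(self, endpoint):
        channel_id = self.next_id
        self.next_id += 1
        self.channel_table[channel_id] = endpoint
        endpoint.owned.add(channel_id)
        return channel_id


def controller_send(endpoint, channel_id, data):
    """Controller refuses to send on a channel the endpoint does not own."""
    if channel_id not in endpoint.owned:
        raise PermissionError(f"{endpoint.name} does not own channel {channel_id}")
    return (channel_id, data)


def controller_receive(driver, packet):
    """Controller uses the channel id on the packet to find the destination region."""
    channel_id, data = packet
    endpoint = driver.channel_table.get(channel_id)
    if endpoint is None:
        return False  # unknown or corrupted channel id: the packet is lost
    endpoint.in_queue.append(data)
    return True


driver = Driver()
alice, bob = UserEndpoint("alice"), UserEndpoint("bob")
alice_channel = driver.create_channel(alice)
bob_channel = driver.create_channel(bob)

delivered = controller_receive(driver, (alice_channel, b"ping"))
print(delivered, alice.in_queue, bob.in_queue)
```

Demultiplexing is a single table lookup, and a user never touches the table directly, which is how the sketch mirrors the division of trust between driver and controller.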

  20. U-Net reliability guarantees • When queue space is available, has the same properties as the underlying ATM (which should be nearly 100% reliable) • When queues fill up, will lose packets • Also loses packets if the channel information is corrupted, etc.

  21. Minimum U-Net costs? • Build the message in a preallocated buffer in the shared region • Enqueue a descriptor on the “out queue” • ATM controller immediately notices and sends it • Remote machine is polling its “in queue” • ATM controller builds a descriptor for the incoming message • Application sees it immediately: 35 µs latency
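The send and receive path listed above, together with the loss behaviour when queues fill up, can be simulated in a single process. This is a sketch under assumptions: the `Endpoint` class, the descriptor format (buffer index, length), the buffer sizes and the `controller_step` function standing in for the ATM controller are all hypothetical, and no real device or shared mapping is involved.

```python
from collections import deque


class Endpoint:
    """Communication region: fixed buffers plus free list, out-queue and in-queue."""

    def __init__(self, num_buffers=4, buffer_size=64):
        self.region = bytearray(num_buffers * buffer_size)
        self.buffer_size = buffer_size
        self.free_list = deque(range(num_buffers))
        self.out_queue = deque()  # descriptors: (buffer index, length)
        self.in_queue = deque()

    def slot(self, index, length):
        """Writable view of the first `length` bytes of buffer `index`."""
        start = index * self.buffer_size
        return memoryview(self.region)[start:start + length]

    def send(self, data):
        """Build the message in a preallocated buffer and enqueue its descriptor."""
        if len(data) > self.buffer_size:
            raise ValueError("message larger than one buffer")
        if not self.free_list:
            return False
        index = self.free_list.popleft()
        self.slot(index, len(data))[:] = data
        self.out_queue.append((index, len(data)))
        return True

    def poll(self):
        """Application polls the in-queue; returns the message or None."""
        if not self.in_queue:
            return None
        index, length = self.in_queue.popleft()
        data = bytes(self.slot(index, length))
        self.free_list.append(index)
        return data


def controller_step(src, dst):
    """Stand-in for the controller: move one message from src to dst, or drop it."""
    if not src.out_queue:
        return False
    index, length = src.out_queue.popleft()
    data = bytes(src.slot(index, length))
    src.free_list.append(index)
    if not dst.free_list:
        return False  # receiver has no free buffer: the packet is lost
    target = dst.free_list.popleft()
    dst.slot(target, length)[:] = data
    dst.in_queue.append((target, length))
    return True


sender, receiver = Endpoint(), Endpoint(num_buffers=1)
sender.send(b"hello")
sender.send(b"world")
first = controller_step(sender, receiver)
second = controller_step(sender, receiver)  # receiver is full, so this one is dropped
received = receiver.poll()
print(first, second, received)
```

Neither `send` nor `poll` involves a system call in this model; the only work is writing a buffer and moving a descriptor, which is the property the latency figure on the slide depends on.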

  22. Protocols over U-Net • von Eicken and Vogels support IP, UDP and TCP over U-Net • These versions run the TCP stack in user space!

  23. VIA and Winsock Direct • Windows consortium (Microsoft, Intel, others) commercialized U-Net as the Virtual Interface Architecture (VIA) • Runs in NT clusters • But most applications run over UNIX-style sockets (the “Winsock” interface in NT) • Winsock Direct automatically senses and uses VIA where available • Today it is widely used on clusters and may be a key reason that they have been successful
