
Presentation Transcript


  1. Improving IPC by Kernel Design By Jochen Liedtke, German National Research Center for Computer Science Presented By Srinivas Sundaravaradan

  2. MACH and L3 • MACH: µ-kernel system based on message passing • Over 5000 cycles to transfer a short message • Buffered IPC • L3: similar to MACH • Hardware interrupts delivered through messages • No ports

  3. Design Philosophy • Focus on IPC • Any feature that will increase cost must be closely evaluated • When in doubt, design in favor of IPC • Design for Performance • A poorly performing technique is unacceptable • Evaluate feature cost against a concrete baseline • Aim for a concrete performance goal • Comprehensive Design • Consider the synergistic effects of all methods and techniques • Cover all levels of implementation, from design to code

  4. Making IPC Faster • Fewer IPC operations: combined Call and Reply & Receive Next system calls, combining messages • Faster IPC operations: 15 further optimizations • Architectural level: use the redesign of L3 as an opportunity to change the kernel design

  5. Methodology • Theoretical minimum: a null message between address spaces, with the receiver already waiting to receive it • 107 cycles to enter & leave the kernel • 45 cycles for TLB misses • ≈172 cycles in total • Goal: 350 cycles • Achieved: 250 cycles = T
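Tallying the slide's own numbers: the ≈20-cycle remainder is not itemized on this slide, so it appears below only as the difference between the stated total and the two listed costs.

\[
\underbrace{107}_{\text{enter/leave kernel}} + \underbrace{45}_{\text{TLB misses}} + \underbrace{\approx 20}_{\text{remainder, not itemized}} \;\approx\; 172 \text{ cycles (minimum)}, \qquad T = 250 \text{ cycles (achieved)}
\]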

  6. Minimize System Calls • Why minimize system calls? System-call overhead accounts for roughly 60% of T • Traditional IPC: 4 system calls per remote invocation (client send + receive, server receive + reply) • Solution: combined Call and Reply & Receive Next operations (sketched in code after the next slide)

  7. Minimize System Calls [Diagram: the client's single Call replaces its separate Send and Receive (reply); the server blocks in Receive, handles the request, then issues Reply and Receive Next, which replaces its separate Send (reply) and Receive (next). Blocked/unblocked transitions are shown on both sides.]
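A minimal C sketch of the two combined operations. The function names, message type, and the helper stubs ipc_receive and handle are illustrative assumptions, not L3's actual interface; the point is that the client needs one trap per RPC and the server needs one trap per loop iteration.

    /* Hypothetical stubs: one kernel trap combines what used to be two. */
    typedef struct { unsigned long w0, w1; } msg_t;   /* assumed message type */

    /* Client: send a request and block until the reply arrives
       (1 trap, formerly send + receive = 2 traps). */
    int ipc_call(int server, const msg_t *request, msg_t *reply);

    /* Server: send the reply to the previous client and block waiting for
       the next request (1 trap, formerly reply + receive = 2 traps). */
    int ipc_reply_and_receive_next(int client, const msg_t *reply,
                                   int *next_client, msg_t *next_request);

    /* Assumed helpers for the example server loop below. */
    int  ipc_receive(int *client, msg_t *request);   /* wait for a request */
    void handle(const msg_t *request, msg_t *reply); /* application-specific work */

    /* Typical server loop under these assumptions. */
    void server_loop(void)
    {
        int   client;
        msg_t req, rep;
        ipc_receive(&client, &req);            /* wait for the first request */
        for (;;) {
            handle(&req, &rep);
            ipc_reply_and_receive_next(client, &rep, &client, &req);
        }
    }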

  8. A Complex Message • Direct String: data to be transferred directly from the send buffer to the receive buffer • Indirect String: location and size of data to be transferred by reference • Memory Object: description of a region of memory to be mapped into the receiver's address space (shared memory)
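A sketch in C of how such a message might be laid out. The field names and the fixed array bounds are assumptions for illustration only; the slide does not give L3's actual descriptor format.

    /* Illustrative layout only; not L3's real message format. */
    #include <stddef.h>

    struct indirect_string {        /* data passed by reference */
        void  *addr;                /* location of the data */
        size_t len;                 /* size of the data */
    };

    struct memory_object {          /* region to map into the receiver */
        void  *base;                /* start of the region */
        size_t len;                 /* length of the region */
        int    writable;            /* assumed access flag */
    };

    struct complex_msg {
        size_t direct_len;                      /* bytes used in 'direct' */
        char   direct[64];                      /* direct string: copied buffer to buffer */
        size_t n_indirect;
        struct indirect_string indirect[4];     /* indirect strings */
        size_t n_objects;
        struct memory_object   objects[2];      /* memory objects (shared memory) */
    };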

  9. Ways of Message Transfer • Twofold message copy: user space A -> kernel space -> user space B (sketched below) • LRPC mechanism: shared user-level memory • is it secure? • does not support variable-to-variable transfer
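A sketch in C of the "twofold copy" baseline, with hypothetical copy_from_user/copy_to_user helpers standing in for whatever boundary-crossing routine the kernel actually uses; this is the path whose cost the next slide quotes as 20 + 0.75n cycles.

    #include <stddef.h>

    #define MAX_MSG 4096                 /* assumed staging-buffer size */
    static char kernel_buf[MAX_MSG];     /* kernel staging buffer */

    /* Hypothetical helpers that validate and copy across the user/kernel boundary. */
    int copy_from_user(void *dst, const void *src, size_t n);
    int copy_to_user(void *dst, const void *src, size_t n);

    /* Two copies: sender's user space -> kernel buffer -> receiver's user space. */
    int transfer_twofold(const void *src_in_A, void *dst_in_B, size_t n)
    {
        if (n > MAX_MSG)
            return -1;
        if (copy_from_user(kernel_buf, src_in_A, n))   /* copy #1 */
            return -1;
        return copy_to_user(dst_in_B, kernel_buf, n);  /* copy #2 */
    }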

  10. Temporary Mapping… • A two-copy message transfer costs 20 + 0.75n cycles • Instead, L3 copies the data only once, into a special communication window in kernel space • The window is mapped to the receiver for the duration of the call via a single page directory entry (sketched in code after slide 12) [Diagram: the message is copied into the window, which is mapped with kernel-only permission, and the kernel adds the mapping to address space B]

  11. Temporary Mapping… [Diagram: the window entry in the top-level page table points at the receiver's 2nd-level tables, so the window resolves to the receiver's frames in memory]

  12. Temporary Mapping
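A minimal C sketch of the temporary-mapping idea from slides 10–12. The pde_t type, the reserved window slot, and the TLB-flush hook are simplified assumptions about an i486-style two-level MMU, not L3's actual code; permission bits and boundary checks are omitted, and the message is assumed not to cross a 4 MB region.

    #include <stdint.h>
    #include <string.h>

    typedef uint32_t pde_t;                     /* one top-level (page directory) entry */
    #define PDE_COUNT       1024                /* i486: 1024 entries, 4 MB each */
    #define WINDOW_PDE_SLOT 1023                /* assumed slot reserved for the window */
    #define PDE_SHIFT       22                  /* address bits covered below one PDE */

    extern pde_t current_pgdir[PDE_COUNT];      /* sender's page directory (assumed) */
    extern void  flush_tlb_entry(void *addr);   /* assumed TLB invalidation hook */

    /* Map the receiver's 4 MB region containing 'dst' into the communication
       window by copying one page directory entry, copy the message once,
       directly into the receiver's memory, then tear the mapping down. */
    void transfer_via_window(pde_t receiver_pgdir[PDE_COUNT],
                             void *dst, const void *src, size_t n)
    {
        uintptr_t d = (uintptr_t)dst;

        /* 1. Borrow the receiver's 2nd-level table for the destination region. */
        current_pgdir[WINDOW_PDE_SLOT] = receiver_pgdir[d >> PDE_SHIFT];
        char *window = (char *)((uintptr_t)WINDOW_PDE_SLOT << PDE_SHIFT);

        /* 2. Single copy: sender buffer -> receiver memory, via the window. */
        memcpy(window + (d & (((uintptr_t)1 << PDE_SHIFT) - 1)), src, n);

        /* 3. Remove the temporary mapping and invalidate the stale translation. */
        current_pgdir[WINDOW_PDE_SLOT] = 0;
        flush_tlb_entry(window);
    }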

  13. Lazy Scheduling • Scheduler overhead is a significant component of IPC cost • Threads doing IPC are often moved to the wait queue only to be inserted back onto the ready queue moments later • Lazy scheduling defers this queue manipulation (and the locking it requires), so in the common case it is avoided entirely, saving both instruction execution and TLB misses (sketch below)
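A sketch of lazy ready-queue handling in C. The thread structure and the point at which cleanup finally happens are assumptions; the idea shown is only that a thread blocking in IPC is left on the ready queue and removed lazily the next time the scheduler walks it.

    #include <stdbool.h>
    #include <stddef.h>

    enum state { READY, WAITING };              /* simplified thread states */

    struct thread {
        enum state     state;
        bool           on_ready_queue;
        struct thread *next;                    /* singly linked ready queue (assumed) */
    };

    static struct thread *ready_head;

    /* IPC path: the blocking thread is NOT dequeued; only its state changes. */
    static void block_in_ipc(struct thread *t)
    {
        t->state = WAITING;                     /* no queue or lock manipulation */
    }

    /* Unblocking usually needs no queue work either, because the thread is
       normally still sitting on the ready queue. */
    static void unblock(struct thread *t)
    {
        t->state = READY;
        if (!t->on_ready_queue) {               /* only if lazy cleanup already ran */
            t->next = ready_head;
            ready_head = t;
            t->on_ready_queue = true;
        }
    }

    /* Scheduler: stale WAITING entries are dropped only when the queue is
       actually parsed, so the cost is paid rarely instead of on every IPC. */
    static struct thread *pick_next(void)
    {
        struct thread **pp = &ready_head;
        while (*pp) {
            struct thread *t = *pp;
            if (t->state == READY)
                return t;
            *pp = t->next;                      /* lazily remove the blocked thread */
            t->on_ready_queue = false;
        }
        return NULL;                            /* assumed: idle thread handled elsewhere */
    }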

  14. Use Registers for Short Messages • Messages are usually short! • e.g. ack/error replies from drivers, hardware interrupt messages • The Intel 486 has 7 general-purpose registers available for sender info and data • May not work for CPUs with fewer registers (sketch below)
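A sketch of a register-based send stub using GCC-style inline assembly for the i386/i486. The trap vector, the register assignments, and the two-word payload are illustrative assumptions, not L3's actual binary interface.

    /* Hypothetical user-level stub: the short message travels entirely in
       registers, so the kernel never has to touch a memory buffer. */
    static inline void send_short_msg(unsigned int dest_thread,
                                      unsigned int word0,
                                      unsigned int word1)
    {
        __asm__ volatile(
            "int $0x30"                    /* assumed IPC trap vector */
            :                              /* no outputs in this sketch */
            : "a"(dest_thread),            /* EAX: destination thread (assumed) */
              "b"(word0),                  /* EBX: first message word */
              "d"(word1)                   /* EDX: second message word */
            : "memory", "cc");
    }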

  15. Summary of Optimizations • Architectural • System Calls, Messages, Direct Transfer, Strict Process Orientation, Thread Control Blocks • Algorithmic • Thread Identifier, Virtual Queues, Timeouts/Wakeups, Lazy Scheduling, Direct Process Switch, Short messages • Interface • Unnecessary Copies, Parameter passing • Coding • Cache misses, TLB misses, Segment registers, General registers, Jumps and Checks, Process Switch

  16. Results…

  17. Results

  18. Conclusions • L3's message passing is 22 times faster than MACH's • The kernel redesign focused mainly on IPC • Caveats: L3 lacks MACH's ports and message buffering, and many optimizations are specific to the i486 architecture

  19. Thank You !
