1 / 31

Supporting Fine-grain TLP with Selective Timesharing

Supporting Fine-grain TLP with Selective Timesharing. John Giacomoni. Advisor: Dr. Manish Vachharajani University of Colorado at Boulder 2008.04.26. The Rise of Multicore. The Challenge of Multicore. Programming Finding parallelism Extracting parallelism Debugging Scalability

yovela
Télécharger la présentation

Supporting Fine-grain TLP with Selective Timesharing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Supporting Fine-grain TLP withSelective Timesharing John Giacomoni Advisor: Dr. Manish Vachharajani University of Colorado at Boulder 2008.04.26

  2. The Rise of Multicore

  3. The Challenge of Multicore • Programming • Finding parallelism • Extracting parallelism • Debugging • Scalability • Cores scale faster than memory and I/O bandwidth • Low-latency synchronization for many cores • System Design Issues • Isolation from interference (OS, shared resources)

  4. FRACTALSpring 2007 • Development Support for Concurrent Threaded Pipelining • Communicating Concurrent Threads • Low-overhead core-to-core communication is critical • Need to pay attention to computer architecture • Need to hide computer architecture from users • PPoPP ‘08, ANCS ‘07

  5. Why?Why Pipelines? • Multicore systems are the future • Many apps can be pipelined if the granularity is fine enough • ≈ < 1 µs • ≈ 3.5 x interrupt handler

  6. Fine-GrainPipelining Examples • Network processing: • Intrusion detection (NID) • Traffic filtering (e.g., P2P filtering) • Traffic shaping (e.g., packet prioritization)

  7. Network ProcessingScenarios

  8. IP IP APP Dec APP OP App Dec Enc Enc IP OP APP OP Core-Placements 4x4 NUMA Organization (ex: AMD Opteron Barcelona)

  9. Routing/BridgeData Flow OP App IP OS

  10. Example3 Stage Pipeline

  11. Example3 Stage Pipeline

  12. Communication Overhead

  13. Communication Overhead Locks  190ns GigE

  14. Communication Overhead Lamport  160ns Locks  190ns GigE

  15. Communication Overhead Hardware  10ns Lamport  160ns Locks  190ns GigE

  16. Communication Overhead Hardware  10ns FastForward  28ns Lamport  160ns Locks  190ns GigE

  17. More Fine-GrainPipelining Examples • Signal Processing • Media transcoding/encoding/decoding • Software Defined Radios • Encryption • Triple-DES • Counter-Mode AES • Other Domains • ODE Solvers • Fine-grain kernels extracted from sequential applications

  18. ComparativePerformance Lamport FastForward

  19. Routing/BridgeData Flow OP App IP OS

  20. FShm Forward(Bridge) • AES encrypting filter • Link layer encryption • ~10 lines of code • IDS • Complex Rules • IPS • DDoS • Data Recorders • Traffic Analysis • Forensics • CALEA 64B*  1.36 Mfps

  21. Gazing intothe Crystal Ball Hardware  10ns FastForward  28ns Lamport  160ns Locks  190ns Int. Handler OC-48 GigE

  22. Gazing intothe Crystal Ball Hardware  10ns FastForward  14ns FastForward  28ns Lamport  160ns Locks  190ns Int. Handler OC-48 GigE

  23. Routing/BridgeData Flow OP App IP OS

  24. Register BoundPerformance Cycles per Iteration Iteration

  25. Register BoundPerformance Cycles per Iteration Iteration

  26. Effect of Jitteron Pipelines

  27. TheReal World • System Jitter - Symmetric & Asymmetric • Timer Interrupts • Scheduler • I/O • TLB updates

  28. Selective Timesharing • Partitioning the system • Isolate performance critical applications • Processors • Memory (NUMA) • Interconnects • Lasts until application releases resources or user changes priorities • Timeshare normal applications • Normal applications see a system with fewer resources

  29. Register BoundPerformance Cycles per Iteration Iteration

  30. System withSelective Timesharing Cycles per Iteration Iteration

  31. Shared Memory Accelerated Queues Now Available! http://ce.colorado.edu/core Questions? john.giacomoni@colorado.edu

More Related