iGPU: Exception Support and Speculative Execution on GPUs

iGPU: Exception Support and Speculative Execution on GPUs. Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam. Vertical Research Group, University of Wisconsin−Madison. Presented at ISCA 2012. Executive Summary: Compiler/hardware co-design for efficient, general-purpose GPUs.


Presentation Transcript


  1. iGPU: Exception Support and Speculative Execution on GPUs • Jaikrishnan Menon, Marc de Kruijf, Karthikeyan Sankaralingam • Vertical Research Group, University of Wisconsin−Madison • Presented at ISCA 2012

  2. Executive Summary • Compiler/hardware co-design for efficient, general-purpose GPUs • Exception support with 1.5% overhead (no more than 4%) • Demand paging / context switch support with 2.5% overhead (no more than 4%) • Exploiting speculation provides > 10% energy savings

  3. Outline • Motivation and Background • iGPU Mechanisms • General exception handling • Context switching • Speculation support • iGPU Architecture • Software • Hardware • Evaluation • Conclusion

  4. CPU Evolution Retrospective • IBM 360 era – precise exceptions as a performance tradeoff • However, two key shifts in processor design – • Virtual memory no longer optional • Speculative execution on ILP processors

  5. Precise exception handling and speculation were key enablers for modern CPUs

  6. GPU Architectural Trends • A single unified CPU-GPU address space • Significant interest in supporting demand paging • Emerging necessity for supporting speculation • More workloads – “irregular” workloads • Handling reliability problems

  7. Need general purpose exception and speculation support for GPUs

  8. Why not just borrow CPU ideas? • CPUs use buffering to preserve arch. state • Future file, History file, Re-order Buffer … • But GPUs have 1000x as many registers • Not practical!

  9. Fundamental Challenges • Well defined restart point in program • GPU pipeline and SIMT model make this hard • Preserving architecture state prior to restart • Need to save 1000s of registers

  10. Key Ideas of our Solution • Creation of restart points: well-defined restart point in program • Idempotent code regions • Restartable regions producing same effect • Preservation of necessary state: preserving architecture state prior to restart • Regions constructed with small live state: 1 to 3 regs • Save only this live state
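The notion of a restartable region "producing same effect" can be illustrated with a toy model. The sketch below is not the authors' compiler or hardware; the dictionary-based "memory" and the function names are hypothetical. It contrasts a region that never overwrites its inputs (safe to re-execute from its entry) with one that does.

```python
# Toy model (hypothetical): "memory" is a dict and a region is a function on it.

def idempotent_region(mem):
    # Reads a and b, writes only c; the inputs are never overwritten.
    mem["c"] = mem["a"] + mem["b"]

def non_idempotent_region(mem):
    # Overwrites its own input, so a second run sees a different value of a.
    mem["a"] = mem["a"] + mem["b"]

def run_with_restart(region, mem):
    """Run the region twice: the first run stands in for an attempt that was
    interrupted by an exception, the second for re-execution from the
    restart point, with no state restored in between."""
    region(mem)  # interrupted attempt (its writes have already landed)
    region(mem)  # plain re-execution from the region entry
    return mem

ok = run_with_restart(idempotent_region, {"a": 1, "b": 2})
bad = run_with_restart(non_idempotent_region, {"a": 1, "b": 2})
print(ok)   # {'a': 1, 'b': 2, 'c': 3}  same as a single clean run
print(bad)  # {'a': 5, 'b': 2}          a single clean run would give a == 3
```

In this model the idempotent region needs no checkpoint buffer at all: restarting it is correct because its inputs are still intact.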

  11. Outline • Challenges and Implications • iGPU Mechanisms • General exception handling • Context switching • Speculation support • iGPU Architecture • Software • Hardware • Evaluation • Conclusion

  12. Exception Support (Creation idea) • Implicit checkpoints using idempotence • Idempotent regions mark restart points • Register file provides all the required state! • Idempotence guarantees correctness [Figure: idempotent regions A and B, exception handler, restart of B]

  13. Outline • Challenges and Implications • iGPU Mechanisms • General exception handling • Context switching • Speculation support • iGPU Architecture • Software • Hardware • Evaluation

  14. Context Switch • Exception is a page fault ✓ • Page-fault handling: • Cleanly remove process 1 (?) • Start another process and execute ✓ • Get page from disk concurrently ✓ • Restore process 1 (?) • Restart process 1 ✓ [Figure: regions A and B]

  16. Context Switch • Must save and restore architectural state • But GPUs have megabytes of register state • Save only live state • Save only live state at points of minimal live state

  17. Context Switch (Preserve idea) • Implicit minimum live state checkpoints using idempotence • Must save and restore architecture state • But GPUs have megabytes of register state • Save only live state • Save state at points of minimal live state [Figure: regions A and B with an exception handler; candidate cut points annotated with # live registers: 2, 4, 9, 23, 2]
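Choosing a cut point with minimal live state can be sketched as a backward liveness pass over straight-line code. This is only an illustration under simplifying assumptions (no branches, made-up register names and instructions), not the region-construction algorithm used by iGPU.

```python
# Each instruction is (destination, [sources]); the program is hypothetical.
program = [
    ("r1", ["r0"]),
    ("r2", ["r0"]),
    ("r3", ["r1", "r2"]),
    ("r4", ["r1", "r2"]),
    ("r5", ["r3", "r4"]),
    ("r6", ["r5"]),
]
live_out = {"r6"}  # registers still needed after the last instruction

def live_counts(program, live_out):
    """Return the number of live registers at every cut point.
    Index i is the point just before instruction i; the final entry is
    the point after the last instruction."""
    live = set(live_out)
    counts = [len(live)]
    for dest, srcs in reversed(program):
        live.discard(dest)   # the destination is defined here, so dead above
        live.update(srcs)    # the sources must be live above this instruction
        counts.append(len(live))
    counts.reverse()
    return counts

counts = live_counts(program, live_out)
# Consider only interior cut points (between two instructions).
best = min(range(1, len(program)), key=lambda i: counts[i])
print(counts)              # [1, 2, 2, 3, 2, 1, 1]
print(best, counts[best])  # 5 1
```

Cutting just before instruction 5 would require saving a single register, whereas cutting before instruction 3 would require three.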

  18. Outline • Challenges and Implications • iGPU Mechanisms • General exception handling • Context switching • Speculation support • iGPU Architecture • Software • Hardware • Evaluation • Conclusion

  19. Speculation (Tuning the Creation idea) • Implicit checkpoints with low re-execution overhead using idempotence • Speculation generates state that is wrong • Need even more buffers • Recall: buffers are impractical for GPUs • Use idempotence! • Reduce re-execution cost by sub-dividing regions

  20. Speculation [Figure: regions A, B (sub-divided into B1 and B2), and C; misspeculation; # live registers: 2] * Region construction details: Idempotent Processing, PLDI ’12
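The effect of sub-dividing a region on re-execution cost can be estimated with a small calculation. The model below is an assumption made for illustration, not a result from the talk: a misspeculation is detected at a uniformly random instruction, sub-regions are equal in size, and recovery restarts from the start of the enclosing sub-region.

```python
def avg_reexecution_cost(region_len, num_subregions):
    """Average number of instructions that must be re-executed after a
    misspeculation, under the simplified model described above."""
    assert region_len % num_subregions == 0, "model assumes equal sub-regions"
    sub_len = region_len // num_subregions
    total = 0
    for fault_pos in range(region_len):
        # Work already done inside the enclosing sub-region, counting the
        # instruction at which the misspeculation is detected.
        total += fault_pos % sub_len + 1
    return total / region_len

print(avg_reexecution_cost(8, 1))  # 4.5  (one region B)
print(avg_reexecution_cost(8, 2))  # 2.5  (split into B1 and B2)
print(avg_reexecution_cost(8, 4))  # 1.5
```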

  21. Outline • Motivation and Background • iGPU Mechanisms • General exception handling • Context switching • Speculation support • iGPU Architecture • Software • Hardware • Evaluation • Conclusion

  22. iGPU Architecture Application Compiler Hardware

  23. iGPU Architecture - Software • Creation idea: region formation (form regions) via region marker instructions • Preserve idea: state preservation (preserve state) via register re-assignment, moves and spills • Reg. pressure

  24. iGPU Architecture - Software • Kernel Source Code → Source Code Compiler → Device Code Generator → Device Code

  25. iGPU Architecture - Software • Kernel Source Code → Source Code Compiler → Device Code Generator (Region formation) → Idempotent Device Code

  26. iGPU Architecture - Software • Kernel Source Code → Source Code Compiler → Device Code Generator (Region formation, State preservation) → Idempotent Device Code

  27. iGPU Architecture - Hardware (not to scale; Creation idea) [Figure: block diagram showing multiple cores, L2 cache, L1 cache & TLB, fetch unit, decode, SIMD processor, general purpose registers, and RPCs]

  28. iGPU Architecture - Hardware (to scale) • Restart PC (RPC) register: 2 RPCs per warp, one each for Sparse and Short regions • Compare to 1024 GPRs per warp (32 x 32) [Figure: General Purpose Registers next to the Restart PC Register]
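The size comparison on this slide is simple arithmetic. The sketch below reads "32 x 32" as 32 threads per warp times 32 registers per thread, which is an interpretation rather than something the transcript states, and it counts registers rather than bits because register widths are not given.

```python
THREADS_PER_WARP = 32   # assumed meaning of the first "32"
GPRS_PER_THREAD = 32    # assumed meaning of the second "32"
RPCS_PER_WARP = 2       # from the slide: one each for Sparse and Short regions

gprs_per_warp = THREADS_PER_WARP * GPRS_PER_THREAD
ratio = RPCS_PER_WARP / gprs_per_warp

print(gprs_per_warp)   # 1024
print(f"{ratio:.2%}")  # 0.20%
```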

  29. iGPU Architecture - Hardware (Preserve idea) • State preservation handled purely by compiler! • Not hardware’s responsibility

  30. Outline • Motivation and Background • iGPU Mechanisms • General exception handling • Context switching • Speculation support • iGPU Architecture • Software • Hardware • Evaluation • Conclusion

  31. Evaluation

  32. Evaluation – Voltage Speculation

  33. Outline • Motivation and Background • iGPU Mechanisms • General exception handling • Context switching • Speculation support • iGPU Architecture • Software • Hardware • Evaluation • Conclusion

  34. Executive Summary • Compiler/hardware co-design for efficient, general-purpose GPUs • Exception support with 1.5% overhead (no more than 4%) • Demand paging / context switch support with 2.5% overhead (no more than 4%) • Exploiting speculation provides > 10% energy savings

  35. Conclusions • Exception support for GPUs is practical • Enables better integration with CPUs in CPU-GPU architectures • Speculative execution on GPUs • Both for performance and reliability • Presents interesting possibilities in the context of “irregular” workloads

  36. Questions
