1 / 25

No Free Lunch, No Hidden Cost: How Can Co-Design Help?

X. Sharon Hu, Dept. of Computer Science and Engineering, University of Notre Dame
The Salishan Conference on High-Speed Computing




Presentation Transcript


  1. No Free Lunch, No Hidden Cost X. Sharon Hu, Dept. of Computer Science and Engineering, University of Notre Dame How Can Co-Design Help? The Salishan Conference on High-Speed Computing

  2. Theme: Exposing Hidden Execution Costs • Cost of execution: performance and power • Computation • Communication • Data motion • Synchronization • … • How can we strike a balance between the extremes? • Hide as much as possible? • Explicitly manage “all” costs? • My “position”: • Expose widely and choose wisely • Focus on power

  3. Why Take This Position? • Expose widely • Better understanding of the contribution of each cost component • Allowing application-specific tradeoffs • Providing opportunities for powerful co-design tools • Choose wisely • Requiring sophisticated co-design tools • Exploring more algorithm/software options

  4. But Easier Said Than Done! • Heterogeneity • Compute nodes: (multi-core) CPU, GP-GPU, FPGA, … • Memory components: on-chip, on-board, disks, … • Communication infrastructure: bus, NoC, networks, … • Parallelism ("non-determinism") • Data access: movement, coherence, … • Resource contention • Synchronization

  5. Outline • Why expose widely? • How to benefit from exposing widely? • How to choose wisely? • Going forward

  6. Why Expose Widely? (1) • Different programs have different power distributions [Figure: GPU power distribution (NVidia GTX 280) across GPU cores, constant cache, memory, constant SM, and texture cache] Hong and Kim, ISCA 2010

  7. Why Expose Widely? (2) • Data movement impacts different algorithms differently [Figure: energy consumption of three sorting algorithms (Pentium 4 + GeForce 570)]

  8. Why Expose Widely? (3) • The impact of contention is application dependent [Figure: performance degradation due to memory bus contention] Masaaki Kondo et al., SIGARCH 2007

  9. Outline • Why expose widely? • How to benefit from exposing widely? • How to choose wisely? • Going forward

  10. How to Benefit from “Exposing Widely”? • Co-design is the key • Expose all factors impacting the “execution model” • Computation: processing resource • Data motion: memory components and hierarchy • Communication: bus and network • Resource contention, synchronization… • Some examples • Software macromodeling • Hardware module-based modeling • Optimize through power management • Keep in mind Amdahl’s law
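The closing caveat about Amdahl's law can be made concrete: improving only one exposed cost component (computation, data motion, or communication) bounds the overall gain by that component's share of the total cost. A minimal Python sketch (function name and numbers are illustrative):

```python
def amdahl_bound(optimized_fraction, improvement_factor):
    """Overall speedup when only a fraction of total cost is improved.

    Amdahl's law: speedup = 1 / ((1 - f) + f / k), where f is the
    fraction of execution cost affected and k is its improvement factor.
    """
    f, k = optimized_fraction, improvement_factor
    return 1.0 / ((1.0 - f) + f / k)

# Even a near-infinite improvement to 30% of the cost caps the overall
# gain below 1 / 0.7, i.e. about 1.43x:
huge = amdahl_bound(0.3, 1e9)    # ~1.4286
modest = amdahl_bound(0.3, 2.0)  # ~1.1765
```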

  11. Macromodeling: Algorithm Complexity Based • Relate the power/energy of a program to its complexity • Example: E = C1·S + C2·S² + C3·S³ (Tan et al., DAC'01), where S is the size of the array for a sorting algorithm • Example: Ecomm = C0 + C1·S (Loghi et al., ACM TECS'07), where S is the size of exchanged messages • More sophisticated models to account for both computation and communication • How to handle resource contention?
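A macromodel of this kind can be calibrated from a handful of measurements. The sketch below (hypothetical data and helper names, not from the cited papers) fits the cubic energy model E = C1·S + C2·S² + C3·S³ by solving the 3×3 linear system given three (size, energy) calibration points:

```python
# Hypothetical sketch: calibrate the cubic energy macromodel
# E(S) = C1*S + C2*S^2 + C3*S^3 from three (size, energy) measurement
# points by solving the resulting 3x3 linear system directly.

def fit_macromodel(samples):
    """samples: three (S, E) pairs; returns fitted (C1, C2, C3)."""
    A = [[s, s ** 2, s ** 3] for s, _ in samples]   # rows [S, S^2, S^3]
    e = [E for _, E in samples]
    for i in range(3):                  # forward elimination (no pivoting)
        for j in range(i + 1, 3):
            r = A[j][i] / A[i][i]
            A[j] = [a - r * b for a, b in zip(A[j], A[i])]
            e[j] -= r * e[i]
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):                 # back substitution
        c[i] = (e[i] - sum(A[i][k] * c[k] for k in range(i + 1, 3))) / A[i][i]
    return tuple(c)

def predict_energy(c, S):
    c1, c2, c3 = c
    return c1 * S + c2 * S ** 2 + c3 * S ** 3

# Synthetic calibration: generate "measurements" from known coefficients,
# then recover them and predict energy for an unseen input size.
truth = (2.0, 0.5, 0.01)
samples = [(S, predict_energy(truth, S)) for S in (10, 20, 40)]
coeffs = fit_macromodel(samples)
```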

  12. Power Modeling of Bus Contention • Penolazzi, Sander and Hemani, DATE'11 • Characterization step • C%N,1: percentage cycle difference between the N-processor case and the 1-processor case • Can be done by IP providers on chosen benchmarks • Prediction step
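The two-step scheme can be sketched as follows; the function names and cycle counts are illustrative, not taken from the DATE'11 paper:

```python
# Hypothetical sketch of the two-step contention model: first characterize
# the percentage cycle overhead (the C%N,1 figure) on benchmarks, then use
# it to predict N-processor cycles from a 1-processor estimate.

def characterize(cycles_1p, cycles_np):
    """C%N,1: percentage cycle difference, N-processor vs. 1-processor."""
    return 100.0 * (cycles_np - cycles_1p) / cycles_1p

def predict(cycles_1p_estimate, c_pct):
    """Prediction step: inflate a contention-free estimate by C%N,1."""
    return cycles_1p_estimate * (1.0 + c_pct / 100.0)

# Characterize once (e.g. by an IP provider on a benchmark), then predict.
c_pct = characterize(cycles_1p=1_000_000, cycles_np=1_250_000)   # 25.0
cycles = predict(cycles_1p_estimate=400_000, c_pct=c_pct)        # 500000.0
```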

  13. Hierarchical Module-Based Power Modeling • Accumulate energy/power of modules • CPU+GPU example • Access rate: software dependent • Data movement contributes to memory power • Resource contention modifies access rate Adapted from Isci and Martonosi, Micro’03
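A hedged sketch of this accumulation, in the spirit of access-rate-weighted module power models: each module contributes idle power plus an activity term scaled by its (possibly contention-modified) access rate. The module names and wattages below are invented for illustration:

```python
# Hedged sketch of access-rate-weighted module power accumulation.
# Module names and wattages are invented, not measured values.

MODULES = {
    # name:      (max_dynamic_W, idle_W)
    "cpu_core":  (25.0, 5.0),
    "gpu_cores": (80.0, 10.0),
    "memory":    (15.0, 3.0),
}

def total_power(access_rates, contention_scale=1.0):
    """Sum per-module power: idle plus access-rate-scaled dynamic power.

    Resource contention is modeled crudely as a multiplier on the
    effective access rate, capped at full activity (rate = 1.0).
    """
    total = 0.0
    for name, (dyn_max, idle) in MODULES.items():
        rate = min(1.0, access_rates.get(name, 0.0) * contention_scale)
        total += idle + rate * dyn_max
    return total

p = total_power({"cpu_core": 0.6, "gpu_cores": 0.9, "memory": 0.4})  # 111.0
```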

  14. Outline • Why expose widely? • How to benefit from exposing widely? • How to choose wisely? • Going forward

  15. Managing Bus Contention to Reduce Energy • M. Kondo, H. Sasaki and H. Nakamura, 2006 • Counter for memory requests • Register for PU identification • Thresholds for selecting which PU uses which Vdd value
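The counter/threshold mechanism might look like the following sketch; the threshold and Vdd values are made up, and the policy shown (memory-bound PUs, which stall on the bus anyway, get a lower Vdd) follows the slide's intent rather than the paper's exact tables:

```python
# Hypothetical sketch of the counter/threshold mechanism. A per-PU memory
# request counter is compared against thresholds to pick that PU's supply
# voltage; the threshold and Vdd values below are invented. Memory-bound
# PUs (many requests, hence many bus stalls) are assigned a lower Vdd.

THRESHOLDS = [(1000, 0.8), (500, 1.0), (0, 1.2)]  # (min_requests, Vdd)

def select_vdd(mem_requests):
    """Return the Vdd level for a PU given its memory request count."""
    for min_req, vdd in THRESHOLDS:
        if mem_requests >= min_req:
            return vdd
    return THRESHOLDS[-1][1]   # unreachable while a 0 threshold is present

vdd_per_pu = {pu: select_vdd(reqs)
              for pu, reqs in {"PU0": 1500, "PU1": 600, "PU2": 50}.items()}
# vdd_per_pu == {"PU0": 0.8, "PU1": 1.0, "PU2": 1.2}
```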

  16. Application Mapping to Reduce Energy (1) • Application mapping for heterogeneous systems [Figure: jobs J1-J4 mapped to PE 1-PE 4 sharing a memory, each job with rate range [minRi, maxRi] and deadline Di] R. Racu, R. Ernst, A. Hamann, B. Mochocki and X. Hu, "Methods for power optimization in distributed embedded systems with real-time requirements," CASES'06.

  17. Application Mapping to Reduce Energy (2) • Optimization: • Minimize power/energy dissipation • Satisfying timing properties (e.g. average path latency, average lateness, etc.) • … • Search Space: • Scheduling parameter, traffic shaping, … • Task level DVFS, i.e. task speed assignment • Resource level DVFS, i.e., resource speed assignment • …

  18. Application Mapping (3): Sensitivity Analysis R. Racu, R. Ernst, A. Hamann, B. Mochocki and X. Hu, "Methods for power optimization in distributed embedded systems with real-time requirements," CASES'06.

  19. Application Mapping (4): GA-Based Approach [Figure: GA flow in which step 2' produces a scheduling trace and step 3' feeds it to a power analyzer to obtain power dissipation; a power model is needed]
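A toy version of such a GA, with an invented three-task workload, discrete DVFS speed levels, and a CMOS-style convex energy model standing in for the power analyzer:

```python
# Illustrative GA sketch for task-speed (DVFS) assignment: each gene is a
# speed level per task; the fitness rewards low energy and penalizes
# deadline misses. Task set, energy model, and GA parameters are assumed.
import random

TASKS = [4.0, 3.0, 5.0]            # work units per task (invented)
SPEEDS = [0.5, 0.75, 1.0]          # discrete normalized DVFS levels
DEADLINE = 18.0

def evaluate(genome):
    """Fitness: energy plus a large penalty if the deadline is missed."""
    time = sum(w / SPEEDS[g] for w, g in zip(TASKS, genome))
    # CMOS-style convex energy: E ~ work * speed^2 (illustrative model)
    energy = sum(w * SPEEDS[g] ** 2 for w, g in zip(TASKS, genome))
    return energy + (1000.0 if time > DEADLINE else 0.0)

def ga(pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randrange(len(SPEEDS)) for _ in TASKS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, len(TASKS))    # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                # mutation
                child[rng.randrange(len(TASKS))] = rng.randrange(len(SPEEDS))
            children.append(child)
        pop = survivors + children
    return min(pop, key=evaluate)

best = ga()   # per-task speed-level assignment found by the search
```

A real flow would replace `evaluate` with the scheduling trace and power analyzer of the slide, but the search structure is the same.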

  20. A Sample Result

  21. Outline • Why expose widely? • How to benefit from exposing widely? • How to choose wisely? • Going forward

  22. Going Forward: Systematic Co-design Effort • Expose more • More hardware counters/registers • More efficient/accurate high-level power models • Better models for resource contention and synchronization • Choose better • Handling parallelism • Algorithm, OS, hardware • Resource contention • Synchronization • Handling non-determinism • Worst-case bounds • Statistical analysis • Interval-based techniques

  23. ES Design vs. HPCS Design • Differences (maybe) • Application-specific workloads vs. domain-specific workloads • Constraints, objectives, desirables? • Latency, throughput, energy, cost, reliability, fault tolerance, IP protection/privacy, ToM, … • Other issues: homogeneous vs. heterogeneous, levels of complexity, user expertise, … • Similarities • Ever-increasing hardware capability: multi-core, multi-thread, complex communication fabrics, memory hierarchy, … • Productivity gap • Common concerns: latency, throughput, energy, cost, reliability, fault tolerance, …

  24. Leverage Co-Design for HPC • Systematic performance estimation • Formal methods: scenario-based, statistical analysis • Hybrid approaches: analytical+simulation • Seamless migration from one abstraction level to the next • Efficient design space exploration • Efficient search techniques • Multiple-level abstraction models • Multiple-attribute optimization • Others: memory and communication analysis and design

  25. Thank you!
