1 / 71

The Yin and Yang of Hardware Heterogeneity: Can Software Survive? Kathryn S McKinley

The Yin and Yang of Hardware Heterogeneity: Can Software Survive? Kathryn S McKinley. Computation Turing 1936. The Transistor Shockley, Bardeen, Brattain 1947. Virtuous Cycle. Doubling of Transistors faster, smaller, cheaper, …. Software Innovation. Software Innovation.

hina
Télécharger la présentation

The Yin and Yang of Hardware Heterogeneity: Can Software Survive? Kathryn S McKinley

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Yin and Yang of Hardware Heterogeneity: Can Software Survive? Kathryn S McKinley

  2. ComputationTuring 1936

  3. The TransistorShockley, Bardeen, Brattain 1947

  4. Virtuous Cycle Doubling of Transistors faster, smaller, cheaper, … • Software Innovation • Software Innovation • Device Innovation Software Complexity Hardware Complexity Sequential Interface Sequential Interface

  5. Hardware

  6. Dennard Scaling is overPower = Clock Speed × Voltage2 Performance Performance Power

  7. Electricity Dark silicon Electricity costs in U.S. Data Centers $$$$$$ 2011 $7.4 billion 2006 $4.5 billion [U.S. EPA 2007] Battery life [Goulding et al. Hot Chips 2010]

  8. Multicore Hardware 2003 Pentium4 (130) 2006 C2D(65) C2Q(65) 2008 i7(45) 2008 Atom(45) 2009 C2D(45) 2009 AtomD(45) 2010 i5 (32) 130nm 55M tran. 131mm2 1 core 2-way SMT Northwood 45nm 731M tran. 263mm2 4 cores 2-way SMT Bloomfield 45nm 47M tran. 36mm2 1 core 2-way SMT Diamondville 45nm 228M tran. 82mm2 2 cores no SMT Wolfdale 45nm 176M tran. 87mm2 2 cores + GPU 2-way SMT Pineview 32nm 382M tran. 81mm2 2 cores 2-way SMT Clarkdale 65nm 291M tran. 143mm2 2 cores no SMT Conroe Kentsfield Each die shown at correct scale

  9. Virtuous Cycle End of Dennard Scaling Doubling of Transistors faster, smaller, cheaper, … • Software Innovation • Software Innovation • Device Innovation Software Complexity Hardware Complexity Parallel Interface Sequential Interface Sequential Interface

  10. Software

  11. PC Software era ending 12

  12. Software = Mobile + Web + Cloud 13

  13. AppWriters Software People Computer Scientists 14

  14. Languages People Use • C C++ PHP

  15. Software Fast enough Performance Productivity Managing complexity Abstractions 16

  16. Hardware & Softwarepulling in opposite directions

  17. Hardware & Software Marriage

  18. How’s this marriage working out?

  19. What should we measure?

  20. energy = power dt

  21. Multicore Hardware 2003 Pentium4 (130) 2006 C2D(65) C2Q(65) 2008 i7(45) 2008 Atom(45) 2009 C2D(45) 2009 AtomD(45) 2010 i5 (32) 130nm 55M tran. 131mm2 1 core 2-way SMT Northwood 45nm 731M tran. 263mm2 4 cores 2-way SMT Bloomfield 45nm 47M tran. 36mm2 1 core 2-way SMT Diamondville 45nm 228M tran. 82mm2 2 cores no SMT Wolfdale 45nm 176M tran. 87mm2 2 cores + GPU 2-way SMT Pineview 32nm 382M tran. 81mm2 2 cores 2-way SMT Clarkdale 65nm 291M tran. 143mm2 2 cores no SMT Conroe Kentsfield Each die shown at correct scale

  22. Workloads 61 benchmarks native managed Native Non-scalable 27 C, C++, Fortran SPEC CPU2006 Java Non-scalable 18 Java SPEC jvm98, DaCapo, pjbb2005 0.25 0.25 non-scalable Native Scalable 11 C, C++ PARSEC Java Scalable 5 Java DaCapo 0.25 0.25 scalable

  23. Power vs Performance Power is benchmark dependent 2008 2003 Pentium 4 (130) i7 (45) 2006 Core 2 Duo (65) i5 (32) 2010 Core 2 Duo (45) ? 2008 ?

  24. Parallelism did not solve the power problem

  25. Web Page Energy on Windows Phone

  26. What’s next in hardware?

  27. Pareto Analysis (45nm)Workload determines energy efficient architecture

  28. Workload determines energy efficient architecture

  29. Parallelism & Heterogeneity big 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 custom small 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

  30. Motivation Single ISA Heterogeneity NVIDIA Tegra3 5 Cortex A9 (4 x 1.4 GHz, 1 x 500 MHz) Texas Instruments OMAP5432 2 Cortex A15 + 2 Cortex M4

  31. Heterogeneous Hardware Energy Efficiency Complexity

  32. Heterogeneous parallel hardware + software ? July 31, 1922. Train wreck at Laurel, Maryland [Washington Post, August 1, 1922]

  33. Exploiting Heterogeneity Parallelism Ubiquity Differentiation

  34. Case Studies Mobile UI [UIST’13] Managed runtime [ISCA’12] Always awake [NSDI’12] Interactive cloud [ICAC’13]

  35. Case Studies Mobile UI[UIST’13] Managed runtime [ISCA’12] Always awake [NSDI’12] Interactive cloud [ICAC’13]

  36. User Interface I/O on OMAP big/little Kihm, Guimbretière UIST’13 Key board Scrolling Inking App 0 0 0 0 0 0 0 0 A9 big cores M3 little cores

  37. User Interface I/O& Heterogeneity UI I/O Characteristics Parallelism Ubiquity Differentiated

  38. A9+M3 Heterogeneity Battery life Increase

  39. Case Studies Mobile UI [UIST’13] Managed runtime[ISCA’12] Always awake [NSDI’12] Interactive cloud [ICAC’13]

  40. VM Services on little coresCao et al., ISCA’12 VM Services GC + JIT App 0 0 0 0 0 0 0 big cores little cores

  41. VM Services & Heterogeneity VM Services Characteristics Parallelism Ubiquity Differentiated

  42. Measured (fill) Model (empty) Better Better Better Better 2.8 GHz AMD + 2.8 GHz AMD| 2.8 GHz AMD + 1.66 GHz Atom

  43. Case Studies Mobile UI [UIST’13] Managed runtime [ISCA’12] Always awake[NSDI’12] Interactive cloud [ICAC’13]

  44. SomniloquyApplication stubs on little cores wake up big core as neededAgarwal et al., NSDI’12 Wakeup filters App stubs Embedded OS Network stack Applications filtering, notifications, downloads, keep alive Laptop Host Big core RAM, peripherals, … Apps Somniloquy daemon OS Network stack CPU + DRAM + Flash Little core Network interface

  45. Big: Lenovo X60 Laptop + Little: gumstix

  46. Case Studies ✔ One Time Improvements Mobile UI [UIST’13] Managed runtime [ISCA’12] Always awake [NSDI’12] Interactive cloud [ICAC’13]

  47. Case Studies Mobile UI [UIST’13] Managed runtime [ISCA’12] Always awake [NSDI’12] Interactive cloud [ICAC’13]

  48. Interactive Cloud Services Bing, Finance, Recommendations, GamesRen et al. ICAC’13 Interactive Services Characteristics Parallelism Ubiquity Differentiated

  49. Interactive ServicesWorkload Characterization of Bing Search Completion vs Quality Quality Completion Ratio Responsiveness deadline ~100ms = interactive Partial execution trades quality for responsiveness

More Related