
Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture

G. Pokam, F. Bodin. CPC 2004, Chiemsee, Germany, July 7-9.


Presentation Transcript


  1. Understanding the Energy-Delay Tradeoff of ILP-based Compilation Techniques on a VLIW Architecture • G. Pokam, F. Bodin • CPC 2004, Chiemsee, Germany, July 7-9

  2. Motivation • Source of complexity on high-performance VLIW processors: • hardware duplication • many FUs of different types (ALUs, LSUs, FPUs, BR, etc.) • need for a large register file • Power growth factors: architecture complexity, compiler

  3. Motivation • Assume a fixed ; does compiling for higher ILP result in dissipating less power? • Which issues (architecture, software, etc.) affect power when compiling for ILP? • Try to figure out what happens analytically!

  4. Agenda • Motivation • Used metrics • Energy model • Tradeoff analysis • Hyperblock example • Experiments • Conclusions

  5. Metric • Performance to energy ratio (PTE) [Gonzales, R. et al.], expressed in terms of: • nb. of oper. per Basic Block • average nb. of oper. per bundle • energy per Basic Block • higher is better
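
The exact PTE formula did not survive in this transcript, so the following is only a sketch under one plausible reading: performance is taken as the inverse of the delay, the delay of a basic block is its operation count divided by the average number of operations per bundle, and PTE is then 1 / (energy × delay). The function name `pte`, its parameter names, and the numbers are illustrative assumptions, not values from the talk.

```python
def pte(num_ops: float, ops_per_bundle: float, energy: float) -> float:
    """Performance-to-energy ratio under the ASSUMED form 1 / (E * delay).

    num_ops        -- number of operations in the basic block
    ops_per_bundle -- average number of operations per bundle
    energy         -- energy per basic block (arbitrary units)
    """
    if num_ops <= 0 or ops_per_bundle <= 0 or energy <= 0:
        raise ValueError("all inputs must be positive")
    # Assumed delay model: number of bundles needed to issue the block.
    delay_in_bundles = num_ops / ops_per_bundle
    return 1.0 / (energy * delay_in_bundles)


# Illustrative numbers only: the transformed block executes more operations
# and costs slightly more energy, but packs more operations per bundle.
baseline = pte(num_ops=12, ops_per_bundle=1.5, energy=40.0)
transformed = pte(num_ops=14, ops_per_bundle=2.0, energy=42.0)
improves = transformed > baseline  # higher is better
```

Under this reading, a transformation pays off only when the gain in operations per bundle outweighs the combined growth in operation count and energy.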

  6. Agenda • Motivation • Used metrics • Energy model • Tradeoff analysis • Hyperblock example • Experiments • Conclusions

  7. Energy Model • The execution of a bundle dissipates an energy with four components: • energy due to execution of the bundle • energy due to D-cache misses • energy due to I-cache misses • energy base cost • Consider loop-intensive kernels …
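
As a rough illustration of the four components listed on this slide, the sketch below assumes they simply add up per bundle and that a block's energy is the sum over its bundles. The class name `BundleEnergy`, the field names, and the sample values are hypothetical; the slide's actual equation is not recoverable from the transcript.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BundleEnergy:
    """Energy components of one bundle (arbitrary units, hypothetical names)."""
    execution: float    # energy due to execution of the bundle
    dcache_miss: float  # energy due to D-cache misses
    icache_miss: float  # energy due to I-cache misses
    base: float         # energy base cost

    def total(self) -> float:
        # Assumption: the four components are additive.
        return self.execution + self.dcache_miss + self.icache_miss + self.base


def block_energy(bundles: Iterable[BundleEnergy]) -> float:
    """Energy of a basic block, assumed to be the sum over its bundles."""
    return sum(b.total() for b in bundles)


example_block = [
    BundleEnergy(execution=1.0, dcache_miss=0.5, icache_miss=0.25, base=0.25),
    BundleEnergy(execution=2.0, dcache_miss=0.0, icache_miss=0.0, base=0.25),
]
example_total = block_energy(example_block)
```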

  8. Agenda • Motivation • Used metrics • Energy model • Tradeoff analysis • Hyperblock example • Experiments • Conclusions

  9. Analysis • Use as a lever for power exploration • Assume R is a CFG region to be transformed into an ILP region H • a sufficient condition for this is given by

  10. Analysis • Idea: • keep track of IPC values that improve energy efficiency • solve the PTE inequality at : • : avg. #oper. in transformed region • : avg. #oper. in the CFG region R

  11. Analysis where • f: exec. freq. • N: # of oper. • n: # of bundles • s: # of stalls due to dmiss • m: # of BB in region • C is a measure of extra work! • Shape of the ILP transform function depends on the sign of C

  12. vs. • C < 0: • exponential shape means high extra work! • dependence height mismatch • resource contention • e.g. Hyperblock: compensation code • C = 0: • linear shape • negligible extra work • C > 0: • optimal scenario • logarithmic shape • e.g. Hyperblock: instruction merging

  13. Agenda • Motivation • Used metrics • Energy model • Tradeoff analysis • Hyperblock example • Experiments • Conclusions

  14. Hyperblock framework • predication model via the select instruction: slct dest = cond, src1, src2 • only hammock regions are considered • single-entry, single-exit hyperblock
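
To make the select-based predication concrete, here is a small sketch of if-converting a hammock (an if/else that rejoins at a single point). It assumes `slct dest = cond, src1, src2` writes `src1` when `cond` is true and `src2` otherwise; that operand order is an assumption, and the two `abs_diff_*` functions are made-up examples rather than code from the talk.

```python
def slct(cond: bool, src1: int, src2: int) -> int:
    """Model of the select instruction (ASSUMED: src1 if cond else src2)."""
    return src1 if cond else src2


def abs_diff_branchy(a: int, b: int) -> int:
    """Original hammock: two basic blocks guarded by a branch."""
    if a > b:
        r = a - b
    else:
        r = b - a
    return r


def abs_diff_if_converted(a: int, b: int) -> int:
    """Same computation as one straight-line block without a branch."""
    cond = a > b
    t1 = a - b  # 'then' side, now executed unconditionally
    t2 = b - a  # 'else' side, now executed unconditionally
    return slct(cond, t1, t2)
```

Both sides of the hammock now execute on every pass, which is where the extra operations, and hence the extra energy, of a hyperblock come from.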

  15. Transformation heuristic • build the loop tree • traverse the loop tree from innermost to outermost loop • evaluate profit for each candidate loop region • propagate profit to CFG after transformation
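
The traversal order on this slide can be sketched as a post-order walk of the loop tree, so that inner loops are visited before the loops enclosing them. The `LoopNode` type, the `profit` callback, and the toy profit table are hypothetical stand-ins; the slide does not say how profit is computed or propagated, so the callback merely receives the names of the loops already transformed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopNode:
    """One loop in the loop tree; children are the loops nested inside it."""
    name: str
    children: List["LoopNode"] = field(default_factory=list)


def transform_loop_tree(
    root: LoopNode,
    profit: Callable[[LoopNode, List[str]], float],
) -> List[str]:
    """Visit loops innermost-first and keep those with positive profit."""
    transformed: List[str] = []

    def visit(node: LoopNode) -> None:
        for child in node.children:
            visit(child)  # inner loops are handled first
        # Crude stand-in for profit propagation: the estimate for this loop
        # may depend on which inner loops were already transformed.
        if profit(node, transformed) > 0:
            transformed.append(node.name)

    visit(root)
    return transformed


inner = LoopNode("inner")
middle = LoopNode("middle", [inner])
outer = LoopNode("outer", [middle])
toy_profit = {"inner": 0.3, "middle": -0.1, "outer": 0.2}  # made-up values
result = transform_loop_tree(outer, lambda node, done: toy_profit[node.name])
```

With these made-up profits, `middle` is skipped while `inner` and `outer` are selected, in that order.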

  16. Agenda • Motivation • Used metrics • Energy model • Tradeoff analysis • Hyperblock example • Experiments • Conclusions

  17. Platform • Lx Platform from STMicroelectronics • 4-issue VLIW machine • 64 GPRs, 8 CBRs • 4 ALUs, 1 LD/ST, 2 MULs, 1 BU • Instruction-based energy model from STMicroelectronics • Lx compiler • prefetch disabled • only scalar optimizations (-O2)

  18. Methodology • Post-pass optimization • Phase 1, instrumentation: • BB frequency • Dmiss per BB • Phase 2: • Hyperblock formation • Hyperblock optimization (instr. promotion, instr. merging, instr. renaming) • Code versions: original CFG, selective hyperblock, all hyperblock • Diagram labels: source, Lx Compiler, .s file, SALTO, .s file, absciss

  19. Results • relatively larger increase of operation count and static schedule length • negligible IPC improvement

  20. Agenda • Motivation • Used metrics • Energy model • Tradeoff analysis • Hyperblock example • Experiments • Conclusions

  21. Conclusions • Analytical scheme to understand the impact of ILP compilation on energy • Heuristic shows 17% energy-delay improvement on a restricted hyperblock scheme • programs suffer from limited ILP, which quickly turns into wasted energy • need to go beyond compiler-centric approaches in order to overcome ILP limitations • What is missing: • impact of post-optimization passes has not been determined • only a restricted hyperblock scheme has been evaluated

  22. Thanks!
