
Dynamic Voltage/Frequency Scaling in Loop Accelerators using BLADES

Ganesh Dasika (1), Shidhartha Das (2), Kevin Fan (1), Scott Mahlke (1), David Bull (2). (1) University of Michigan, Advanced Computer Architecture Laboratory, Ann Arbor, MI. (2) ARM Ltd., Cambridge, United Kingdom.

Presentation Transcript


  1. Dynamic Voltage/Frequency Scaling in Loop Accelerators using BLADES • Ganesh Dasika (1), Shidhartha Das (2), Kevin Fan (1), Scott Mahlke (1), David Bull (2) • (1) University of Michigan, Advanced Computer Architecture Laboratory, Ann Arbor, MI • (2) ARM Ltd., Cambridge, United Kingdom

  2. Introduction [Austin, IEEE Computer, March 2004]

  3. Razor • Allows voltage/frequency scaling beyond the first-failure point • Exploits the difference between design-time conditions (“slow”) and actual conditions (“typical”) [Das, JSSC 2006]

  4. Razor in General-Purpose Processors • Requires detailed analysis of microarchitectural impact • Must analyze what state should be stored • Lengthening the pipeline for stabilization increases the complexity of forwarding logic • Unpredictable control and data flow • Difficult to determine worst-case vectors

  5. BLADES • Better-than-worst-case Loop Accelerator Design • Incorporate DVFS into ASICs using Razor • Shave off some of the high non-recurring engineering (NRE) cost using high-level synthesis (HLS) • Develop a generic methodology for any application • Razor solution for a templated architecture • Create an ASIC design flow that is aware of Razor-ization costs

  6. Loop Accelerator Template • Hardware realization of a modulo-scheduled loop • Parameterized execution resources, storage, connectivity • Control is statically determined, simple, and not timing-critical • Opportunity to make application-specific optimizations

  7. Razorized Loop Accelerator • “Roll-back” muxes • Added interconnect • Extended register queues • R is the number of extra entries required • R is a function of max pipeline depth and error-detection delay [Figure: loop accelerator datapath with adder and multiplier FUs and Razor error detection]
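The extended register queues on this slide can be pictured with a small software model. The following is a minimal Python sketch, not the authors' hardware: the class name `RollbackQueue`, its methods, and the example sizes are all hypothetical, and R is taken as a plain parameter because the slide does not give the exact function of pipeline depth and error-detection delay.

```python
from collections import deque


class RollbackQueue:
    """Toy model of a register queue with R extra entries for roll-back.

    `depth` is the number of entries the datapath normally reads;
    `extra_entries` plays the role of R on the slide (assumed parameter).
    """

    def __init__(self, depth, extra_entries):
        self.depth = depth
        # The oldest value is dropped automatically once the queue is full.
        self.entries = deque(maxlen=depth + extra_entries)

    def push(self, value):
        """Write one new value per cycle."""
        self.entries.append(value)

    def visible(self):
        """Return the newest `depth` values, oldest first."""
        return list(self.entries)[-self.depth:]

    def roll_back(self, cycles):
        """Discard the last `cycles` writes after an error is detected."""
        if len(self.entries) - cycles < self.depth:
            raise ValueError("not enough backup entries to roll back")
        for _ in range(cycles):
            self.entries.pop()


queue = RollbackQueue(depth=2, extra_entries=2)
for value in [1, 2, 3, 4, 5]:
    queue.push(value)
before = queue.visible()   # newest two values
queue.roll_back(2)         # undo the two most recent writes
after = queue.visible()    # the two values written before them
```

In this model a roll-back of up to R cycles always leaves `depth` valid entries, which is why the queue has to be extended rather than reused as is.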

  8. Error “Life-Cycle” • Error stabilization • Error OR-tree • Error processing in control • Error reset • Roll-back pipelining [Figure: Razor error signals from the adder and multiplier FUs feeding an OR-tree and the control logic]

  9. Issues with Razor • Area, added hold-fixing [Figure: timing diagram with signals D and CLK and the interval t_spec]

  10. Opcode-chaining • 50% FU utilization removes the need for hold-fixing, but requires halving performance or doubling area • Use a hybrid scheme to execute >2 ops per FU [Figure: schedules over Time 0 to Time 5 for FU 0 to FU 3 executing Add0, Add1, Or0, Or1, and for a single FU executing the chained ops Add-Or0 and Add-Or1]
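The utilization trade-off on this slide can be checked with a few lines of Python. This is only an illustrative sketch under a stated assumption: an FU is taken to need hold-fixing whenever it issues operations in back-to-back cycles, which is one simple reading of the slide's "50% FU utilization" claim. The function names and the example schedules are hypothetical.

```python
def utilization(issue_cycles, total_cycles):
    """Fraction of cycles in which the FU starts a new operation."""
    return len(issue_cycles) / total_cycles


def needs_hold_fixing(issue_cycles):
    """Assumed rule: hold-fixing is needed if two issues are one cycle apart."""
    ordered = sorted(issue_cycles)
    return any(later - earlier == 1
               for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical schedules over a 4-cycle window.
full_rate = [0, 1, 2, 3]   # a new op every cycle
half_rate = [0, 2]         # a new op every other cycle

full_util = utilization(full_rate, 4)
half_util = utilization(half_rate, 4)
full_needs_fix = needs_hold_fixing(full_rate)
half_needs_fix = needs_hold_fixing(half_rate)
```

Under this rule the half-rate schedule avoids hold-fixing but completes half as many operations in the same window, which is the cost that opcode-chaining is meant to recover.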

  11. Identifying Opcode Chains • Compiler identifies subgraphs of instructions with 3-4 inputs and 1 output • All arithmetic ops supported • Greedy selection algorithm [Figure: example dataflow graph of load (LD), shift (<<, >>), add (+), AND (&), and store (ST) operations, with labels 1 to 7]
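A greedy pass of the kind named on this slide might look like the sketch below. It is not the authors' compiler algorithm: it only pairs a producer with its single consumer, the graph encoding, the `ARITH_OPS` set, the function name `find_chains`, and the example graph are all assumptions made for illustration, and the 4-input limit is taken from the "3-4 input" bullet.

```python
# Assumed set of chainable arithmetic opcodes; loads and stores are excluded.
ARITH_OPS = {"+", "-", "<<", ">>", "&", "|"}


def find_chains(graph, max_inputs=4):
    """Greedily pick producer/consumer pairs to fuse into one custom op.

    `graph` maps a node name to (opcode, [operand names]). Operand names
    that are not nodes are treated as external inputs.
    """
    consumers = {}
    for node, (_, operands) in graph.items():
        for source in operands:
            consumers.setdefault(source, set()).add(node)

    used, chains = set(), []
    for producer, (opcode, operands) in graph.items():
        users = consumers.get(producer, set())
        # A single consumer keeps the fused subgraph at one output.
        if opcode not in ARITH_OPS or len(users) != 1:
            continue
        consumer = next(iter(users))
        consumer_opcode, consumer_operands = graph[consumer]
        if consumer_opcode not in ARITH_OPS:
            continue
        if producer in used or consumer in used:
            continue
        inputs = set(operands) | (set(consumer_operands) - {producer})
        if len(inputs) <= max_inputs:
            chains.append((producer, consumer))
            used.update((producer, consumer))
    return chains


# Hypothetical dataflow graph, visited in insertion order.
example = {
    "s1": ("<<", ["a", "b"]),
    "a1": ("+", ["s1", "c"]),
    "s2": (">>", ["a1", "d"]),
    "m1": ("&", ["s2", "e"]),
    "ld": ("LD", ["p"]),
    "a2": ("+", ["ld", "m1"]),
}
chains = find_chains(example)
```

Being greedy, the pass commits to the first legal pair it meets and never revisits it, so the result depends on visit order; here `a1` is claimed by the `s1` chain before it can pair with `s2`.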

  12. Custom FUs [Figure: dataflow graphs of load, shift, add, AND, and store operations with opcode chains mapped to custom FUs; labels: “Razor DFF”, “Enabled every 2 cycles”]

  13. Results • idct, sharp, and systolic_dct had multiple CFUs and a lower overall number of FUs • Viterbi and dequant had significant control flow that restricted opportunities for creating custom ops • 22% reduction in hold-fixing overhead in sobel

  14. Conclusion • Application-specific optimizations help mitigate Razor costs • 24% reduction in overhead • 33% energy savings overall • Razor-ization can be optimized with further input from the compiler • Critical-instruction analysis • Error impact analysis

  15. Thank you! http://cccp.eecs.umich.edu

  16. Future Work • Errors in different FUs affect the system differently • Error “impact-analysis” • Data computation not necessarily error-sensitive • Address, branch target/direction critical to functionality • Razor-ization of arbitrary Verilog

  17. Motivation • Using Razor has significant design overhead • Error-recovery system • Added “backup” state • Additional hold-time fixing • Each microarchitecture requires different modifications • Information about the workload cannot be used, since the design must preserve generality

