
Systematic Register Bypass Customization for Application-Specific Processors

Kevin Fan, Nathan Clark, Michael Chu, K. V. Manjunath, Rajiv Ravindran, Mikhail Smelyanskiy, Scott Mahlke. Advanced Computer Architecture Laboratory, University of Michigan.


Presentation Transcript


  1. Systematic Register Bypass Customization for Application-Specific Processors. Kevin Fan, Nathan Clark, Michael Chu, K. V. Manjunath, Rajiv Ravindran, Mikhail Smelyanskiy, Scott Mahlke. Advanced Computer Architecture Laboratory, University of Michigan

  2. Introduction
     • Bypass network allows for data forwarding to reduce pipeline stalls
     • Full bypass: any FU can bypass from any other FU and from any pipeline stage
     • # paths = (issue width)^2 × (bypassable stages) × (input ports per FU) × (output ports per FU)
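As a worked illustration of the full-bypass path count, the sketch below reads the slide's formula as a product of its four terms. The function name and the example machine parameters are hypothetical and chosen only to show the quadratic growth with issue width.

```python
def full_bypass_paths(issue_width, bypass_stages, in_ports, out_ports):
    """Count bypass paths in a fully bypassed machine.

    Assumes the four terms on the slide are multiplied together:
    every FU output port can forward, from every bypassable stage,
    to every input port of every FU.
    """
    return issue_width ** 2 * bypass_stages * in_ports * out_ports


# Hypothetical machines: 2 bypassable stages, 2 input ports and
# 1 output port per FU.
narrow = full_bypass_paths(5, 2, 2, 1)
wide = full_bypass_paths(10, 2, 2, 1)
print(narrow, wide)
```

Doubling the issue width from 5 to 10 quadruples the path count in this model, which is the growth that motivates a partial bypass network.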

  3. Bypass Path Utilization
     • As processors get wider and deeper, the cost of the bypass network increases quadratically [Palacharla ’98]
     • Only a few bypasses are heavily utilized

  4. Designing a Partial Bypass Network
     • Reduce hardware at the cost of runtime
     • Design a sparse bypass network while minimizing performance impact
     • Challenges:
       • Reconcile different requirements for different program regions
       • Interplay between different bypass paths
       • Huge search space, exponential number of possible configurations

  5. Spacewalking Partial Bypass
     • Profile-guided Pareto ascent
     • Rank bypass paths by importance
     • Remove least important path and evaluate performance impact
     • Update rankings with new statistics
     • Repeat until performance degrades too far
     [Figure: spacewalking loop. Usage statistics from the program rank the bypasses from most useful to least useful; the least useful bypass is removed and the new machine is evaluated; the bypass is replaced if performance drops too much. Evaluated machines are plotted as cost versus performance, showing the Pareto machines.]
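The loop on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `spacewalk`, `evaluate`, `rank`, and `min_rel_perf` are hypothetical names, and the toy evaluator simply charges a fixed cycle penalty for each missing path. The sketch also stops at the first removal that drops performance too far, which is a simplification.

```python
def spacewalk(bypasses, evaluate, rank, min_rel_perf):
    """Greedily remove bypass paths while performance stays acceptable.

    evaluate(paths) -> (cycles, stats)   hypothetical profiler/simulator
    rank(paths, stats) -> list of paths, least important first
    """
    current = set(bypasses)
    base_cycles, stats = evaluate(current)
    machines = [(len(current), 1.0)]        # (cost proxy, relative perf)
    while current:
        victim = rank(current, stats)[0]    # least important path
        candidate = current - {victim}
        cycles, new_stats = evaluate(candidate)
        rel_perf = base_cycles / cycles
        if rel_perf < min_rel_perf:         # dropped too far: keep the path
            break
        current, stats = candidate, new_stats   # rankings use new statistics
        machines.append((len(current), rel_perf))
    return current, machines


# Toy model (assumption): removing a path costs a fixed number of cycles.
PENALTY = {"a": 0, "b": 2, "c": 10, "d": 50}

def toy_evaluate(paths):
    cycles = 100 + sum(p for name, p in PENALTY.items() if name not in paths)
    usage = {name: PENALTY[name] for name in paths}
    return cycles, usage

def toy_rank(paths, usage):
    return sorted(paths, key=lambda p: usage[p])

kept, machines = spacewalk(PENALTY, toy_evaluate, toy_rank, min_rel_perf=0.95)
print(sorted(kept), machines)
```

Each entry in `machines` is one evaluated design point, so the list traces a path through the cost/performance space from the full machine toward sparser ones.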

  6. Ranking Bypass Paths
     • Importance is derived from two statistics:
       • % utilization = (cycles bypass was used) / (total cycles)
       • offload potential = (redundant cycles) / (cycles bypass was used)
     [Figure: a bypass path and its equivalent bypass paths at stages +1 and +2]
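The two ratios named on this slide can be computed as in the sketch below. The `BypassStats` record and its field names are hypothetical, and "redundant cycles" is read here as cycles in which an equivalent bypass path could have supplied the value, which is an assumption. How the two ratios combine into a single importance score is left open.

```python
from dataclasses import dataclass


@dataclass
class BypassStats:
    """Hypothetical per-path profile counters."""
    used_cycles: int        # cycles in which the bypass was used
    redundant_cycles: int   # used cycles an equivalent path could cover (assumed meaning)


def utilization(stats, total_cycles):
    """Fraction of all cycles in which the bypass was used."""
    return stats.used_cycles / total_cycles


def offload_potential(stats):
    """Fraction of the bypass's uses that were redundant."""
    if stats.used_cycles == 0:
        return 0.0
    return stats.redundant_cycles / stats.used_cycles


example = BypassStats(used_cycles=200, redundant_cycles=50)
print(utilization(example, total_cycles=1000), offload_potential(example))
```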

  7. A Closer Look
     • Uses more bypasses than necessary
     • Not all edges require 1-stage bypass
     [Figure: data-flow graph of operations Ma, Ib, Ic, Id, Ie, If on units I1, I2, M3, with critical edges marked]

  8. Compiling for Partial Bypass
     • Difficulties:
       • Latencies between operations vary depending on resource assignments
       • Current assignment will affect future decisions
     • Naïve scheduler will arbitrarily place Op c
     • Need to provide resource hints to the scheduler to break ties
     [Figure: data-flow graph of operations Ma, Ib, Ic, Id, Ie, If on units I1, I2, M3, with possible edge latencies of 1 or 2 cycles, comparing the optimal schedule with the scheduler's]

  9. BUG Preference Algorithm
     • Perform pre-scheduling pass over the DFG
     • Bottom-Up Greedy algorithm based on [Ellis ’85]
     • Traverse DFG, critical paths first
     • Select bypass paths to achieve earliest completion time for each operation
     • Take into account time to:
       • Get inputs
       • Execute
       • Send outputs to consumers
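A greatly simplified sketch of such a preference pass is shown below. It is not Ellis's BUG algorithm or the authors' code: it visits operations in order of decreasing critical-path height and greedily picks the unit with the earliest completion time, breaking ties toward the less loaded unit. It models the time to get inputs and to execute (unit latency) but not the time to send outputs to consumers. The DFG, unit names, bypass set, and the rule that a missing bypass costs one extra cycle are all assumptions made for illustration.

```python
def bug_preferences(ops, preds, kind, fus, bypass):
    """Assign each op a preferred unit, most critical ops first."""
    succs = {o: [] for o in ops}
    for o in ops:
        for p in preds.get(o, []):
            succs[p].append(o)

    height = {}
    def h(o):  # longest path from o to a sink, counted in ops
        if o not in height:
            height[o] = 1 + max((h(s) for s in succs[o]), default=0)
        return height[o]

    unit_of, finish = {}, {}
    busy = {fu: set() for fu in fus}           # issue cycles already taken
    # Decreasing height is both critical-first and a topological order.
    for o in sorted(ops, key=lambda x: -h(x)):
        best = None
        for idx, (fu, fu_kind) in enumerate(fus.items()):
            if fu_kind != kind[o]:
                continue
            ready = 0
            for p in preds.get(o, []):
                extra = 0 if (unit_of[p], fu) in bypass else 1  # assumed penalty
                ready = max(ready, finish[p] + extra)
            start = ready
            while start in busy[fu]:
                start += 1
            cand = (start + 1, len(busy[fu]), idx, fu, start)
            if best is None or cand < best:
                best = cand
        done, _, _, fu, start = best
        unit_of[o], finish[o] = fu, done
        busy[fu].add(start)
    return unit_of, finish


# Hypothetical example: one memory op feeding two chains of integer ops.
ops = ["a", "b", "c", "d", "e", "f"]
preds = {"b": ["a"], "c": ["a"], "d": ["b"], "e": ["c"], "f": ["d", "e"]}
kind = {"a": "mem", "b": "int", "c": "int", "d": "int", "e": "int", "f": "int"}
fus = {"I1": "int", "I2": "int", "M3": "mem"}
bypass = {("M3", "I1"), ("I1", "I1"), ("I2", "I2"), ("I2", "I1")}

unit_of, finish = bug_preferences(ops, preds, kind, fus, bypass)
print(unit_of, finish["f"])
```

In this toy setup the memory unit bypasses only to I1, so the chain b, d, f gravitates to I1 while c and e land on the otherwise idle I2. The resulting assignment would then be handed to the scheduler as resource hints.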

  10. BUG Example
      • Place ops b, d, f on unit 1 since M bypasses to it
      • Place ops c, e on unit 2 since resource is free
      [Figure: step-by-step BUG traversal of the data-flow graph (Ma, Ib, Ic, Id, Ie, If) on units I1, I2, M3, with each operation annotated by its candidate units, such as {1,2}, {1}, {2}, {3}]

  11. Bypass Cost Savings
      [Figure: charts of bypass cost savings and relative performance]

  12. Pareto-optimal Machines
      [Figure: Pareto-optimal machines for djpeg (5-wide) and g721dec (9-wide), comparing BUG preferences with ILP preferences]

  13. Bypass Usage is Variable
      [Figure: bypass utilization per benchmark: epic, bfish, rawc, rawd, rasta, cjpeg, djpeg, mesa, unepic, pegenc, pegdec, gsmenc, gsmdec, g721enc, g721dec, mpeg2enc]

  14. Conclusion
      • Significant bypass network cost can be saved without much performance loss
      • Our approach:
        • Intelligent bypass spacewalking
        • Resource hints allow compiler to schedule code effectively
      • 95% of original performance maintained when removing 60% of utilized bypasses
      • http://cccp.eecs.umich.edu
