Systematic Register Bypass Customization for Application-Specific Processors

Kevin Fan, Nathan Clark, Michael Chu, K. V. Manjunath, Rajiv Ravindran, Mikhail Smelyanskiy, Scott Mahlke
Advanced Computer Architecture Laboratory, University of Michigan

Introduction
• A bypass network allows data forwarding to reduce pipeline stalls
• Full bypass: any FU can bypass from any other FU and from any pipeline stage
• # paths = (issue width)^2 × bypassable stages × input ports per FU × output ports per FU
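To make the growth concrete, the path count can be evaluated for a few machine widths. This is a minimal sketch that reads the slide's formula as a plain product of its four terms; the function name and the sample parameter values (2 bypassable stages, 2 input ports, 1 output port) are illustrative assumptions, not figures from the talk.

```python
def full_bypass_paths(issue_width, bypassable_stages,
                      input_ports_per_fu, output_ports_per_fu):
    """Number of paths in a full bypass network (formula read as a product)."""
    return (issue_width ** 2 * bypassable_stages
            * input_ports_per_fu * output_ports_per_fu)


# Assumed example parameters: 2 bypassable stages, 2 inputs and 1 output per FU.
for width in (2, 4, 8):
    print(width, full_bypass_paths(width, 2, 2, 1))
```

Doubling the issue width quadruples the count, which is the quadratic growth the next slide refers to.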

Bypass Path Utilization
• As processors get wider and deeper, the cost of the bypass network increases quadratically [Palacharla ’98]
• Only a few bypasses are heavily utilized

Designing a Partial Bypass Network
• Reduce hardware cost at the expense of some runtime performance
• Design a sparse bypass network while minimizing the performance impact
• Challenges:
  • Reconciling different requirements for different program regions
  • Interplay between different bypass paths
  • Huge search space: the number of possible configurations is exponential

Spacewalking Partial Bypass
• Profile-guided Pareto ascent
• Rank bypass paths by importance
• Remove the least important path and evaluate the performance impact
• Update rankings with new statistics
• Repeat until performance degrades too far
[Figure: spacewalk loop. Bypasses are ranked from most to least useful using program usage statistics; the least useful bypass is removed, the new machine is evaluated, and the bypass is replaced if performance drops too much. The resulting cost/performance Pareto machines are plotted as performance against cost.]
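The loop on this slide can be sketched as a greedy search. This is a simplified illustration, not the authors' implementation: `rank` and `evaluate` are hypothetical callbacks standing in for the profiler and the compile-and-simulate step, and the toy usage numbers at the bottom are invented purely to exercise the loop.

```python
def spacewalk(paths, rank, evaluate, min_relative_perf):
    """Greedily remove bypass paths while relative performance stays acceptable.

    rank(current)     -> paths ordered from least to most important (hypothetical)
    evaluate(current) -> performance of a machine with these paths, higher is better
    """
    current = set(paths)
    baseline = evaluate(current)
    pareto = [(len(current), 1.0)]  # (number of paths, relative performance)
    put_back = set()                # paths whose removal cost too much
    while True:
        # Re-rank on every iteration so the ordering reflects the new machine.
        candidates = [p for p in rank(current) if p not in put_back]
        if not candidates:
            break
        victim = candidates[0]
        trial = current - {victim}
        relative = evaluate(trial) / baseline
        if relative < min_relative_perf:
            put_back.add(victim)    # replace the bypass and try the next one
        else:
            current = trial
            pareto.append((len(current), relative))
    return current, pareto


# Toy model (invented numbers): each missing path costs its usage in performance.
usage = {"a": 50, "b": 5, "c": 1}

def toy_rank(current):
    return sorted(current, key=lambda p: usage[p])

def toy_evaluate(current):
    return 100 - sum(u for p, u in usage.items() if p not in current)

final_paths, pareto = spacewalk(usage, toy_rank, toy_evaluate, 0.9)
print(final_paths, pareto)
```

In this sketch a path that is put back is never retried; whether the real tool revisits such paths is not stated on the slide.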

Ranking Bypass Paths
• Importance is computed from two terms:
  • % utilization = cycles bypass was used / total cycles
  • offload potential = redundant cycles / cycles bypass was used
[Figure: a bypass path and its equivalent bypass paths (+1, +2)]
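The two ratios on this slide are straightforward to compute from profile counters. The sketch below does so and then combines them into a single score. The combination `utilization * (1 - offload potential)` is an assumption made here for illustration only, since the slide text does not preserve how the two terms are combined; the counter names are also hypothetical.

```python
def utilization(cycles_used, total_cycles):
    """Fraction of all cycles in which the bypass path was used."""
    return cycles_used / total_cycles


def offload_potential(redundant_cycles, cycles_used):
    """Fraction of the path's uses counted as redundant (0 if never used)."""
    return redundant_cycles / cycles_used if cycles_used else 0.0


def importance(cycles_used, redundant_cycles, total_cycles):
    # ASSUMPTION: the way the two terms are combined is not given in the
    # transcript; this product is only one plausible choice.
    u = utilization(cycles_used, total_cycles)
    o = offload_potential(redundant_cycles, cycles_used)
    return u * (1.0 - o)


# Invented counters: used in 200 of 1000 cycles, 50 of those uses redundant.
print(importance(200, 50, 1000))
```

Under this assumed combination, a heavily used path whose traffic could mostly be carried elsewhere ranks lower than an equally used path with no alternative.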

A Closer Look
• Uses more bypasses than necessary
• Not all edges require a 1-stage bypass
[Figure: DFG of ops Ma, Ib, Ic, Id, Ie, If scheduled on units I1, I2, M3, with critical edges marked]

Compiling for Partial Bypass
• Difficulties:
  • Latencies between operations vary depending on resource assignments
  • The current assignment will affect future decisions
• A naïve scheduler will arbitrarily place Op c
• Need to provide resource hints to the scheduler to break ties
[Figure: DFG of ops Ma, Ib, Ic, Id, Ie, If with possible edge latencies of 1 or 2 on each edge, comparing the optimal placement with the scheduler's placement on units I1, I2, M3]

BUG Preference Algorithm
• Perform a pre-scheduling pass over the DFG
• Bottom-Up Greedy algorithm based on [Ellis ’85]
• Traverse the DFG, critical paths first
• Select bypass paths to achieve the earliest completion time for each operation
• Take into account the time to:
  • Get inputs
  • Execute
  • Send outputs to consumers
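A much-reduced sketch of a bottom-up greedy pass is shown below. It starts at the sinks of the DFG, recurses into predecessors (deepest chain first), and then gives each op the unit on which it can issue earliest, charging 1 cycle across an edge when a bypass exists and 2 cycles otherwise. Everything here is an assumption for illustration: the function and parameter names, the two latency values, the six-op graph, and the bypass set are made up, and the sketch models only the "get inputs" and "execute" terms, not the cost of sending outputs to consumers.

```python
from functools import lru_cache


def bug_assign(preds, candidate_units, has_bypass,
               bypass_latency=1, no_bypass_latency=2):
    """Return {op: (unit, issue_cycle)} using a simplified bottom-up greedy pass.

    preds           : {op: [predecessor ops]}, with every op present as a key
    candidate_units : {op: [units that can execute it]}
    has_bypass      : set of (producer_unit, consumer_unit) pairs with a bypass
    """
    @lru_cache(maxsize=None)
    def depth(op):
        # Length of the longest chain of ops ending at `op`.
        return 1 + max((depth(p) for p in preds[op]), default=0)

    consumed = {p for ps in preds.values() for p in ps}
    sinks = [op for op in preds if op not in consumed]
    assigned = {}
    busy = set()  # (unit, cycle) slots already taken

    def visit(op):
        if op in assigned:
            return
        # Critical (deepest) predecessors are placed first.
        for p in sorted(preds[op], key=depth, reverse=True):
            visit(p)
        best = None
        for unit in candidate_units[op]:
            ready = 0
            for p in preds[op]:
                p_unit, p_cycle = assigned[p]
                lat = (bypass_latency if (p_unit, unit) in has_bypass
                       else no_bypass_latency)
                ready = max(ready, p_cycle + lat)
            cycle = ready
            while (unit, cycle) in busy:  # unit occupied: slip to a later cycle
                cycle += 1
            if best is None or cycle < best[1]:
                best = (unit, cycle)
        assigned[op] = best
        busy.add(best)

    for s in sorted(sinks, key=depth, reverse=True):
        visit(s)
    return assigned


# Made-up example: one memory op feeding two integer chains that join at f.
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b"], "e": ["c"], "f": ["d", "e"]}
units = {"a": ["M3"], "b": ["I1", "I2"], "c": ["I1", "I2"],
         "d": ["I1", "I2"], "e": ["I1", "I2"], "f": ["I1", "I2"]}
bypasses = {("M3", "I1"), ("I1", "I1"), ("I2", "I2")}

schedule = bug_assign(preds, units, bypasses)
print(schedule)
```

The unit chosen for each op would then be handed to the scheduler as a preference rather than a hard constraint, so the scheduler still has freedom when its own view of resource usage differs.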

BUG Example
• Place ops b, d, f on unit 1 since M bypasses to it
• Place ops c, e on unit 2 since the resource is free
[Figure: step-by-step BUG traversal of the DFG (ops Ma, Ib, Ic, Id, Ie, If) on units I1, I2, M3, with candidate unit sets such as {1,2}, {1}, {2}, {3} annotated on the ops]

Bypass Cost Savings
[Figure: charts labeled Bypass Cost Savings and Relative Performance]

Pareto-optimal Machines
[Figure: Pareto plots for djpeg (5-wide) and g721dec (9-wide), comparing BUG Preferences with ILP Preferences]

Bypass Usage is Variable
[Figure: bypass utilization per benchmark: epic, bfish, rawc, rawd, rasta, cjpeg, djpeg, mesa, unepic, pegenc, pegdec, gsmenc, gsmdec, g721enc, g721dec, mpeg2enc]

Conclusion
• Significant bypass network cost can be saved without much performance loss
• Our approach:
  • Intelligent bypass spacewalking
  • Resource hints allow the compiler to schedule code effectively
• 95% of original performance maintained when removing 60% of utilized bypasses
• http://cccp.eecs.umich.edu
