This seminar by Alok Garg explores the concept of dynamic predication in computer architecture, particularly focusing on its impact on conditional instruction execution. It highlights how predication can eliminate simple branches by converting control dependencies into data dependencies, ultimately improving performance and energy efficiency. Despite its advantages, the seminar discusses limitations such as performance overhead, instruction fetch challenges, and the need for compiler support. Real-world scenarios showcasing dynamic Hammock predication and adaptive predicated execution will also be examined.
Dynamic Predication ACAL Group Seminar Alok Garg
What is Predicated Execution? • Conditional instruction • Executed: if the condition is true • NOP: if the condition is false • Eliminates simple branches • if (A == 0) { S = T } • Converts control dependencies into data dependencies • Branch code: BNEZ R1, L; ADDU R2, R3, R0; L: • Predicated code: CMOVZ R2, R3, R1
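As a minimal sketch of the if-conversion on this slide, the Python below models the semantics (not the timing) of the two instruction sequences. The function names are hypothetical, and `CMOVZ` is assumed to mean "copy the source if the condition register is zero, otherwise keep the old destination value".

```python
def branch_version(s, t, a):
    """Models: BNEZ R1, L ; ADDU R2, R3, R0 ; L:   (R1 = a, R2 = s, R3 = t)."""
    if a != 0:           # BNEZ R1, L: skip the move when A is non-zero
        return s
    return t + 0         # ADDU R2, R3, R0: S = T + 0


def cmovz(dst, src, cond):
    """Assumed CMOVZ semantics: src if cond == 0, otherwise the old dst."""
    return src if cond == 0 else dst


def predicated_version(s, t, a):
    """Models: CMOVZ R2, R3, R1 (no branch, only a data dependence on R1)."""
    return cmovz(s, t, a)
```

Both versions compute the same value of S for every A; the predicated one simply has no branch for the predictor to get wrong.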
Simple Example • Control flow: block A ends in a branch (T / NT) to B or C; both paths merge at D, followed by E • Normal execution: A [B D E] C D E (pipeline flush due to misprediction) • Predicated execution: A [C[!p] B[p]] D E (conditional instructions) • Limitations of software predication: • If the branch is NT 98% of the time • Delayed execution of blocks B or C
Limitations of Predication • ISA support • Predicate registers • Predicated instructions • Performance overhead • Instructions are fetched from both paths • Predicated instructions cannot execute until the predicate value is resolved • Ideal predication speedup: 16.4% • Only a small subset of control-flow graphs is covered • The compiler cannot if-convert complex control flow • Ideal predication for all conditional branches: 37.4%
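The performance-overhead limitation can be made concrete with a back-of-the-envelope cost model. This is a sketch under stated assumptions, not a formula from the seminar: `misprediction_rate`, `flush_penalty`, and `predication_overhead` are hypothetical parameters, with the latter two measured in cycles per dynamic branch.

```python
def branch_cost(misprediction_rate, flush_penalty):
    """Expected cycles lost per branch when it is predicted normally."""
    return misprediction_rate * flush_penalty


def should_predicate(misprediction_rate, flush_penalty, predication_overhead):
    """Predicate only when the expected flush cost exceeds the fixed overhead."""
    return branch_cost(misprediction_rate, flush_penalty) > predication_overhead
```

With illustrative numbers (a 20-cycle flush and a 3-cycle predication overhead), a branch mispredicted 2% of the time is better left as a branch, while one mispredicted 30% of the time is worth predicating.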
Motivation • Some branches are still very hard to predict with conventional branch predictors • Mispredictions lead to costly pipeline flushes • Performance • Energy • Predication is used to avoid pipeline flushes for those hard-to-predict branches
Papers Covered • Dynamic Hammock Predication for Non-Predicated Instruction Set Architectures. Artur Klauser, Todd Austin, Dirk Grunwald, and Brad Calder (PACT 1998) • Wish Branches: Combining Conditional Branching with Predication for Adaptive Predicated Execution. Hyesoon Kim, Onur Mutlu, Jared Stark, and Yale N. Patt (MICRO 2005, IEEE Micro Top Picks 2006) • Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths. Hyesoon Kim, Jose A. Joao, Onur Mutlu, and Yale N. Patt (MICRO 2006)
Types of Control-Flow Graphs • Simple hammock • Nested hammock • Frequently hammock • Loop • Non-merging control flow (figure: one example control-flow graph per type)
Distribution of Mispredicted Branches • Simple + nested hammocks: 16% of all mispredictions • All except non-merging: 66% of all mispredictions
Dynamic Hammock Predication • Targets the first limitation of software predication • Removes the need for ISA support • Dynamic predication for simple hammocks • 11% of all mispredictions • Compiler support to mark simple hammock boundaries • Predication decision • Dynamic decision • Static, profile-based
Support for Dynamic Predication
• Fork context: R1 := …; R2 := …; R3 := …; R4 := …; B-cc (i)
• Then context (cc is false): R1 := R1 + R2; R3 := R1 × 2; BR (k)
• Else context (cc is true): R2 := R1 − R2; R3 := R2 × 2
• Join context: RA := R1; RB := R2; RC := R3; RD := R4
Support for Dynamic Predication (register renaming)
• Rename table at fork: R1 → a, R2 → b, R3 → c, R4 → d
• Rename table after join: R1 → k, R2 → l, R3 → m, R4 → d
• Fork context: R1.a := …; R2.b := …; R3.c := …; R4.d := …; B-cc (i) → PL.e
• Then context (cc is false, predicate value = 0): R1 := R1 + R2 → R1.f := R1.a + R2.b; R3 := R1 × 2 → R3.g := R1.f × 2; BR (k) → removed
• Else context (cc is true, predicate value = 1): R2 := R1 − R2 → R2.i := R1.a − R2.b; R3 := R2 × 2 → R3.j := R2.i × 2
• Join context, inserted selects: R1.k := PL.e : R1.a : R1.f; R2.l := PL.e : R2.i : R2.b; R3.m := PL.e : R3.j : R3.g
• Join context, renamed code: RA := R1 → RA.n := R1.k; RB := R2 → RB.o := R2.l; RC := R3 → RC.p := R3.m; RD := R4 → RD.q := R4.d
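The renaming walk-through on this slide can be sketched as a small simulation. This is an illustrative model under assumptions, not the paper's hardware design: physical register names (`p0`, `p1`, ...) are generated instead of the slide's letters, the tuple encoding of instructions is hypothetical, and a select is taken to mean "predicate ? else-path version : then-path version", which is how the slide's `PL.e : x : y` entries are read here.

```python
import itertools


def dynamic_predicate(fork_table, then_code, else_code):
    """Rename both hammock paths from the fork table and emit join selects.

    fork_table maps architectural registers to physical names.
    then_code / else_code are lists of (dest, op, src1, src2); a source is an
    architectural register name or an integer constant.
    """
    fresh = (f"p{n}" for n in itertools.count())   # hypothetical physical names
    uops = []

    def rename_path(code):
        table = dict(fork_table)                   # each path starts from the fork state
        for dest, op, a, b in code:
            srcs = [table.get(s, s) for s in (a, b)]
            table[dest] = next(fresh)
            uops.append((table[dest], op, *srcs))
        return table

    then_table = rename_path(then_code)
    else_table = rename_path(else_code)

    join_table = dict(fork_table)
    for reg in fork_table:
        if then_table[reg] != else_table[reg]:     # written on at least one path
            join_table[reg] = next(fresh)
            uops.append((join_table[reg], "select", else_table[reg], then_table[reg]))
    return uops, join_table


def execute(uops, regs, predicate):
    """Evaluate renamed uops; regs maps physical names to initial values."""
    regs = dict(regs)

    def val(s):
        return regs[s] if isinstance(s, str) else s

    for dest, op, a, b in uops:
        if op == "add":
            regs[dest] = val(a) + val(b)
        elif op == "sub":
            regs[dest] = val(a) - val(b)
        elif op == "mul":
            regs[dest] = val(a) * val(b)
        elif op == "select":                       # predicate picks the else-path version
            regs[dest] = val(a) if predicate else val(b)
    return regs


def arch_state(uops, join_table, regs, predicate):
    """Architectural register values visible to the join context."""
    phys = execute(uops, regs, predicate)
    return {reg: phys[name] for reg, name in join_table.items()}


# The hammock from the slide, with fork-time values chosen for illustration.
FORK = {"R1": "a", "R2": "b", "R3": "c", "R4": "d"}
THEN = [("R1", "add", "R1", "R2"), ("R3", "mul", "R1", 2)]
ELSE = [("R2", "sub", "R1", "R2"), ("R3", "mul", "R2", 2)]
```

Both paths execute unconditionally; only the three selects depend on the predicate, so no flush is needed whichever way the branch resolves. R4 is untouched on both paths and therefore needs no select, matching the slide's R4 → d entry.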
Wish Branches • Targets the second and third limitations of software predication • Dynamic decision based on a confidence estimator • Improved coverage by predicating loops • Uses compiler-generated predicated blocks • Adds “wish” code for the dynamic decision • Defines how to include simple loops in predication
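The confidence-driven decision can be sketched as follows. The slides only say that a confidence estimator drives the choice; the resetting saturating-counter design, the table size, and the threshold below are assumptions picked for illustration, and `wish_branch_mode` is a hypothetical helper name.

```python
class ConfidenceEstimator:
    """Resetting saturating counters, one per (hashed) branch address.

    Assumption: this design and its parameters are illustrative only.
    """

    def __init__(self, entries=1024, threshold=8, maximum=15):
        self.counters = [0] * entries
        self.threshold = threshold
        self.maximum = maximum

    def _index(self, pc):
        return pc % len(self.counters)

    def is_high_confidence(self, pc):
        return self.counters[self._index(pc)] >= self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            self.counters[i] = min(self.counters[i] + 1, self.maximum)
        else:
            self.counters[i] = 0      # a misprediction resets confidence


def wish_branch_mode(estimator, pc):
    """High confidence: follow the branch predictor; low: run predicated code."""
    return "branch" if estimator.is_high_confidence(pc) else "predicated"
```

A branch that keeps being predicted correctly is executed as a normal branch and avoids the predication overhead; after a misprediction it falls back to predicated execution until confidence builds up again.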
Wish Jumps and Wish Joins (figure panels: code, predicated code, branch code, wish jump/join code)
Wish Loops (figure panels: code, normal code, wish loop code)
Dynamic Number of Wish Branches • Performance improvement: 10.7% over predicated code
Dynamic Number of Wish Loops • Performance improvement: 13.3% over predicated code
Diverge-Merge Processor (DMP) • Targets all three limitations of software predication • Dynamic predication: little compiler support • Dynamic decision based on confidence estimation • Only on frequently executed control-flow paths • Software support • Compiler marks all diverge and merge points • Hardware support: similar to dynamic hammock predication • Enters predication mode at a diverge point • Predicates only frequently executed paths
Frequently Executed Control-Flow Paths • Dynamically predicates blocks B, C, and E • Reduces predication overhead • Improves predication coverage by including complex control-flow graphs
Comparison of Various Predication Schemes (figure: the schemes compared across simple hammock, nested hammock, frequently hammock, loop, and non-merging control flow)
Performance • 19.3% average performance improvement • 38% reduction in pipeline flushes • Consumes 9% less energy
Conclusion • Most hard-to-predict branches (66%) have a convergence point • Dynamic predication is more effective than software predication in terms of: • Number of mispredicted branches covered • Accuracy of coverage • Effectively reduces the number of pipeline flushes