150 likes | 272 Vues
This document outlines the dynamic instruction scheduling within a speculative execution context, focusing on a repetitive loop structure. In it, we provide detailed specification of the instruction flow, including the execution cycles for various operations such as load, add, and multiply, while noting the impact of branching and integer logic on processing efficiency. The instruction pipeline is demonstrated, showcasing how reordering and register renaming can improve performance while managing execution dependencies. The implications for cycle management in a VLIW architecture are also discussed.
E N D
Example Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop
Speculative Dynamic Machine specification • Issue rate of 1 • One broadcast per cycle for CDB • branch takes 1 cycle, • Load takes 1 cycle, • integer alu takes 1 cycle, • float add takes 2 cycle • float multiply takes 3 cycle. • These cycle count doesn’t include write to CDB
Reorder buffer Cycle 0 Reservation table Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop FP register status
Reorder buffer Cycle 1 Reservation table Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop FP register status
Reorder buffer Cycle 2 Reservation table Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop FP register status
Reorder buffer Cycle 3 Reservation table Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop FP register status
Reorder buffer Cycle 4 Reservation table Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop FP register status
Reorder buffer Cycle n Reservation table Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop FP register status
Reorder buffer Cycle n+1 Reservation table Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop FP register status
Reorder buffer Cycle n+2 Reservation table Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop FP register status
Reorder buffer Cycle n+3 Reservation table Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R1, Loop FP register status
VLIW example Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop • Static machine specification • One delay slot between any true data flow dependency • One branch delay slot
Register rename Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F0, 0(R2) Mult.D F0, F0, F2 S.D 0(R2), F0 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F1, 0(R2) Mult.D F1, F1, F2 S.D 0(R2), F1 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop
Instruction reorder Loop: L.D F0, 0(R1) Add.D F0, F0, F2 S.D 0(R1), F0 L.D F1, 0(R2) Mult.D F1, F1, F2 S.D 0(R2), F1 SUBI R1, R1, 8 SUBI R2, R2, 8 BNEZ R2, Loop Loop: L.D F0, 0(R1) L.D F1, 0(R2) Add.D F0, F0, F2 Mult.D F1, F1, F2 S.D 0(R1), F0 S.D 0(R2), F1 SUBI R2, R2, 8 BNEZ R2, Loop SUBI R1, R1, 8
Software pipeline Code for one iteration. L.D F0, 0(R1) L.D F1, 0(R2) Add.D F0, F0, F2 Mult.D F1, F1, F2 S.D 0(R1), F0 S.D 0(R2), F1 SUBI R2, R2, 8 SUBI R1, R1, 8 BNEZ R2, Loop L.D F0, 0(R1) L.D F1, 0(R2) Add.D F0, F0, F2 Mult.D F1, F1, F2 S.D 0(R1), F0 S.D 0(R2), F1 SUBI R2, R2, 8 SUBI R1, R1, 8 BNEZ R2, Loop L.D F0, 0(R1) L.D F1, 0(R2) Add.D F0, F0, F2 Mult.D F1, F1, F2 S.D 0(R1), F0 S.D 0(R2), F1 SUBI R2, R2, 8 SUBI R1, R1, 8 BNEZ R2, Loop 8 copies