This lecture covers instruction-level parallelism (ILP) and its dynamic exploitation, starting with one more example of Tomasulo's algorithm. It compares a three-operand instruction sequence with a two-operand version in the style of IBM 360 assembly language and asks what the operand limit gains and costs. It then introduces dynamic branch prediction: branch prediction buffers, a nested-loop example, and the prediction accuracy of a 4096-entry 2-bit prediction buffer versus an infinite one on the SPEC89 benchmarks.
CSC 4250 Computer Architectures
October 20, 2006
Chapter 3. Instruction-Level Parallelism & Its Dynamic Exploitation
One More Example on Tomasulo’s Algorithm
    L.D    F0,0(R0)
    ADD.D  F0,F0,F2
    MUL.D  F0,F0,F4
    ADD.D  F0,F0,F2
    MUL.D  F0,F0,F4
    S.D    F0,0(R0)
    ADD.D  F0,F4,F2
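The sequence above reuses F0 as the destination of almost every instruction, which is where renaming through reservation stations matters. The following is a minimal sketch of the register-status bookkeeping at issue time only. The station names (Load1, Add1, Mult1, ...) and the unlimited supply of stations are assumptions made for illustration; the sketch does not model timing or the common data bus.

```python
# Hypothetical sketch: which reservation station each instruction waits on
# when the sequence is issued in order. Station names are illustrative only.
from collections import defaultdict

PROGRAM = [
    ("L.D",   "F0", ["0(R0)"]),
    ("ADD.D", "F0", ["F0", "F2"]),
    ("MUL.D", "F0", ["F0", "F4"]),
    ("ADD.D", "F0", ["F0", "F2"]),
    ("MUL.D", "F0", ["F0", "F4"]),
    ("S.D",   None, ["F0", "0(R0)"]),
    ("ADD.D", "F0", ["F4", "F2"]),
]

UNIT = {"L.D": "Load", "S.D": "Store", "ADD.D": "Add", "MUL.D": "Mult"}

def issue_all(program):
    producer = {}                 # register -> station that will write it
    counters = defaultdict(int)   # stations handed out so far, per unit type
    issued = []
    for op, dest, sources in program:
        counters[UNIT[op]] += 1
        station = f"{UNIT[op]}{counters[UNIT[op]]}"
        # A source with a pending producer is recorded as that station's tag;
        # any other source is read directly and causes no wait.
        waits_on = [producer[s] for s in sources if s in producer]
        if dest is not None:
            producer[dest] = station   # later readers of dest now see this tag
        issued.append((station, op, waits_on))
    return issued

issued = issue_all(PROGRAM)
for station, op, waits_on in issued:
    print(station, op, "waits on", waits_on or "nothing")
```

In this sketch the store captures the tag of the second multiply, so the final ADD.D, which reads only F4 and F2, waits on nothing and can overwrite F0 without disturbing the earlier chain.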
IBM 360 Assembly Language
• Only two operands. Advantage? Disadvantage?
• Example:
    L.D    F0,0(R0)
    ADD.D  F0,F2
    MUL.D  F0,F4
    ADD.D  F0,F2
    MUL.D  F0,F4
    S.D    F0,0(R0)
    … …
Modified Loop-Based Example
Loop: L.D     F0,0(R1)
      MUL.D   F0,F0,F2
      ADD.D   F0,F0,F4
      S.D     F0,0(R1)
      DADDIU  R1,R1,#-8
      BNE     R1,R2,Loop
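As a reading aid, the loop can be mirrored one instruction per line in Python. This is only a sketch of what the loop computes, not of how it is scheduled; the dictionary-as-memory model and the sample addresses and values are assumptions, since the slide gives no initial register contents.

```python
def run_loop(memory, r1, r2, f2, f4):
    """Mirror the loop; memory maps a byte address to a double."""
    while True:
        f0 = memory[r1]      # L.D    F0,0(R1)
        f0 = f0 * f2         # MUL.D  F0,F0,F2
        f0 = f0 + f4         # ADD.D  F0,F0,F4
        memory[r1] = f0      # S.D    F0,0(R1)
        r1 -= 8              # DADDIU R1,R1,#-8
        if r1 == r2:         # BNE R1,R2,Loop falls through when R1 == R2
            break
    return memory

# Hypothetical starting state: three doubles at addresses 24, 16 and 8.
result = run_loop({24: 1.0, 16: 2.0, 8: 3.0}, r1=24, r2=0, f2=2.0, f4=1.0)
print(result)
```

Each element is multiplied by F2 and then incremented by F4, and every iteration is independent of the others apart from the reuse of F0.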
Dynamic Branch Prediction
• Static branch prediction is covered in Appendix A
• Branch prediction buffer: a small memory indexed by the low-order bits of the branch instruction's address. Each entry holds a bit that records whether the branch was recently taken or not
• The prediction bit may have been placed there by another branch whose address has the same low-order bits
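The buffer described above can be sketched in a few lines. This is a minimal model, assuming 4-byte instructions, 16 rows, and entries that start as "not taken"; the class name and the two branch addresses are hypothetical and chosen so that both branches fall in the same row.

```python
class OneBitBuffer:
    """One prediction bit per row, with no tags, so branches can share a row."""

    def __init__(self, rows=16):
        self.rows = rows
        self.taken = [False] * rows        # assumed initial state: not taken

    def _row(self, pc):
        return (pc // 4) % self.rows       # low-order bits of the word address

    def predict(self, pc):
        return self.taken[self._row(pc)]

    def update(self, pc, was_taken):
        self.taken[self._row(pc)] = was_taken

buf = OneBitBuffer()
branch_a = 0x1000
branch_b = 0x1040                          # hypothetical: same row as branch_a
buf.update(branch_a, True)                 # branch_a was just taken
aliased_prediction = buf.predict(branch_b)
print(aliased_prediction)                  # bit left behind by branch_a
```

Because the buffer keeps no tag, the lookup for `branch_b` returns the bit that `branch_a` wrote, which is the situation the last bullet describes.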
Figure 3.14. A Branch Prediction Buffer
• Use the 4 low-order address bits of the branch (word address) to choose a row.
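The row selection can be written out as a small calculation. The sketch below assumes 4-byte instructions, so the word address is the byte address divided by 4; the function name and the sample addresses are hypothetical.

```python
def row_index(branch_pc, index_bits=4, instruction_bytes=4):
    """Return the buffer row chosen by the low-order bits of the word address."""
    word_address = branch_pc // instruction_bytes
    return word_address & ((1 << index_bits) - 1)

# Sample branch addresses (assumed) and the rows they select.
rows = {hex(pc): row_index(pc) for pc in (0x1000, 0x1004, 0x103C, 0x1040)}
print(rows)
```

With 4 index bits there are 16 rows, so branches whose byte addresses differ by a multiple of 64 select the same row.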
Nested Loops
Loop1: L.D     F2,1600(R1)
       DADDIU  R2,R0,#80
Loop2: L.D     F0,1000(R2)
       ADD.D   F0,F0,F2
       S.D     F0,1000(R2)
       DADDIU  R2,R2,#-8
       BNEZ    R2,Loop2
       DADDIU  R1,R1,#-8
       BNEZ    R1,Loop1
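The inner branch (BNEZ R2,Loop2) is taken nine times and then falls through once on every pass, because R2 counts down from 80 in steps of 8. The sketch below counts mispredictions of that one branch under a 1-bit predictor and under a 2-bit saturating counter. Several things here are assumptions: the number of outer iterations (the slide gives no initial R1), the initial predictor state of "not taken", the absence of aliasing with other branches, and the use of a plain saturating counter, which is one common 2-bit scheme and may differ in detail from the textbook's state diagram.

```python
def inner_branch_outcomes(outer_iterations, inner_start=80, step=8):
    """Taken/not-taken history of BNEZ R2,Loop2 over all outer iterations."""
    outcomes = []
    for _ in range(outer_iterations):
        r2 = inner_start                   # DADDIU R2,R0,#80
        while True:
            r2 -= step                     # DADDIU R2,R2,#-8
            taken = r2 != 0                # BNEZ   R2,Loop2
            outcomes.append(taken)
            if not taken:
                break
    return outcomes

def mispredictions_1bit(outcomes):
    state, misses = False, 0               # assumed to start as "not taken"
    for taken in outcomes:
        misses += state != taken
        state = taken                      # remember only the last outcome
    return misses

def mispredictions_2bit(outcomes):
    counter, misses = 0, 0                 # 0..3; predict taken when >= 2
    for taken in outcomes:
        misses += (counter >= 2) != taken
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses

outcomes = inner_branch_outcomes(outer_iterations=100)   # 100 is an assumption
one_bit = mispredictions_1bit(outcomes)
two_bit = mispredictions_2bit(outcomes)
print(len(outcomes), one_bit, two_bit)
```

Under these assumptions the 1-bit predictor misses twice per pass through the inner loop (on the exit and again on re-entry), while the 2-bit counter, once warmed up, misses only on the exit.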
Figure 3.8. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer for SPEC89 Benchmarks
Figure 3.9. Prediction Accuracy of 4096-entry 2-bit Prediction Buffer versus an infinite 2-bit Prediction Buffer for SPEC89