
Tomasulo Dynamic Scheduling


idra




Presentation Transcript


  1. Tomasulo Dynamic Scheduling

  2. Dynamic Issue
     • In the IBM 360/91, about 3 years after the CDC 6600 (1966)
     • Goal: high performance without special compilers
     • Things to remember about the ’60s:
        • No caches, no RISC, very few registers, no precise exceptions
     • Differences between the IBM 360 and CDC 6600 ISAs:
        • IBM has only 2 register specifiers per instruction vs. 3 in the CDC 6600
        • IBM has 4 FP registers vs. 8 in the CDC 6600
     • Why study it? It led to the Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, …

  3. Dynamic Issue
     • Goal: take advantage of multiple function units and deal with long memory latencies
     • Advantages:
        • Speed
     • Problems: multiple execution latencies
        • The result is out-of-order completion
        • Forwarding and hazard control become more difficult
        • Precise exceptions would later amplify the problem (a non-issue in the ’60s)
     • Answer: hardware that issues instructions when their hazards clear

  4. Dynamic Issue
     • Hazards = data, structural, control
        • Data: RAW (true data dependence), WAR (anti-dependence), WAW (output dependence)
        • Structural: are the required resources available?
        • Control: is this instruction supposed to execute or not?
     • Implementation: 2 early approaches
        • Control flow: CDC 6600 (scoreboard) (1964)
        • Data flow: Tomasulo, IBM 360/91 (1967)
     • Simple idea: when the opcode and operands are ready, and the appropriate set of resources is available, launch the “execution packet”
     • Interesting wrinkle: it does not use named registers for intermediate storage
        • Implicit introduction of register renaming
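The three data-hazard classes can be stated as a check on a pair of instructions in program order. Below is a minimal Python sketch that only looks at register operands and ignores memory. The `Instr` tuple and the `hazards` function are hypothetical names. The operands are chosen to mirror the MULTD/DIVD/ADDD instructions used in the example later in the deck, and the exact operands are an assumption.

```python
from typing import NamedTuple, Optional, Set, Tuple

class Instr(NamedTuple):
    """Register-register instruction: dest <- op(srcs)."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]

def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Classify the data hazards between two instructions in program order."""
    found = set()
    if earlier.dest is not None and earlier.dest in later.srcs:
        found.add("RAW")  # true dependence: later reads what earlier writes
    if later.dest is not None and later.dest in earlier.srcs:
        found.add("WAR")  # anti-dependence: later overwrites what earlier reads
    if earlier.dest is not None and earlier.dest == later.dest:
        found.add("WAW")  # output dependence: both write the same register
    return found

multd = Instr("MULTD", "F0", ("F2", "F4"))
divd = Instr("DIVD", "F10", ("F0", "F6"))
addd = Instr("ADDD", "F6", ("F8", "F2"))
print(hazards(multd, divd))  # {'RAW'}
print(hazards(divd, addd))   # {'WAR'}
```

Only RAW is a true dependence. WAR and WAW arise from reusing register names, which is why renaming can remove them.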

  5. Tomasulo vs. Scoreboard
     • Control and buffers are distributed with the Function Units (FUs) vs. centralized in the scoreboard
        • FU buffers are called “reservation stations” (RSs); they hold pending operands
     • Registers in instructions are replaced by values or by pointers to reservation stations; this is called register renaming
        • Avoids WAR and WAW hazards
        • There are more reservation stations than registers, so the hardware can do optimizations compilers can’t
     • Results go to FUs from RSs, not through registers, over a Common Data Bus (CDB) that broadcasts results to all FUs
     • Loads and stores are treated as FUs with RSs as well
     • Integer instructions can go past branches, allowing FP ops beyond the basic block in the FP queue
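Register renaming can be sketched as a pass over the instructions in issue order. For each register, the pass keeps the tag of its latest pending producer; this is the role of the register result status. This is a minimal sketch, not the 360/91 hardware. The tuple format and the tag names `Mult1`, `Mult2`, `Add1` are hypothetical. A source with no pending producer is left as a register name, meaning "read the value at issue".

```python
def rename(program, tags):
    """Rename an instruction list in issue order, Tomasulo style.

    program: list of (op, dest, src1, src2) using architectural registers.
    tags: one (hypothetical) reservation-station name per instruction.
    """
    producer = {}  # register -> tag of its latest pending writer
    renamed = []
    for (op, dest, *srcs), tag in zip(program, tags):
        # A pending source becomes its producer's tag; otherwise the current
        # register value would be read at issue (kept as the register name here).
        srcs = [producer.get(s, s) for s in srcs]
        producer[dest] = tag  # later readers of dest now wait on this tag
        renamed.append((op, tag, *srcs))
    return renamed, producer

program = [("MULTD", "F0", "F2", "F4"),
           ("DIVD", "F10", "F0", "F6"),
           ("ADDD", "F6", "F8", "F2")]
renamed, producer = rename(program, ["Mult1", "Mult2", "Add1"])
for instr in renamed:
    print(instr)
# ('MULTD', 'Mult1', 'F2', 'F4')
# ('DIVD', 'Mult2', 'Mult1', 'F6')
# ('ADDD', 'Add1', 'F8', 'F2')
```

DIVD's F6 operand is resolved before ADDD claims F6 as its destination, so ADDD's later write cannot disturb it (no WAR). Every destination is also a distinct tag, so two writers of the same register cannot collide (no WAW).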

  6. Tomasulo Organization
     [Figure: block diagram with the FP Op Queue, FP Registers, Load Buffer, Store Buffer, Common Data Bus, FP Add Reservation Stations, and FP Mul Reservation Stations]

  7. Reservation Station Duties
     • Capture (“snarf”) source operands off the CDB when they appear
        • CDB results are tagged with where they came from
     • When all operands are present, enable the associated FU to execute
     • Since values aren’t really written to registers (until later), no WAR or WAW hazards are possible
     • Structural hazards are checked at two points:
        • At dispatch: a free reservation station of the right type must be available
        • When the execution packet is ready: multiple reservation stations may compete for a shared FU
           • Program order is used as the basis for arbitration if required
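The two structural-hazard checkpoints can be sketched separately from operand capture. At dispatch, issue stalls unless a free station of the right type exists. When several ready stations compete for one shared FU, the oldest instruction in program order wins. The sketch assumes a single FU per type. `Station`, `dispatch`, `arbitrate` and the station names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Station:
    name: str            # hypothetical tag, e.g. "Add1"
    kind: str            # FU type this station feeds: "add" or "mult"
    busy: bool = False
    seq: int = -1        # program-order number of the instruction it holds
    ready: bool = False  # True once both source operands have been captured

def dispatch(stations: List[Station], kind: str, seq: int) -> Optional[str]:
    """Check 1: an instruction can issue only into a free station of its type."""
    for rs in stations:
        if rs.kind == kind and not rs.busy:
            rs.busy, rs.seq, rs.ready = True, seq, False
            return rs.name
    return None  # structural hazard: issue stalls

def arbitrate(stations: List[Station], kind: str) -> Optional[str]:
    """Check 2: if several ready stations want the same (single) FU,
    grant it to the oldest instruction in program order."""
    ready = [rs for rs in stations if rs.kind == kind and rs.busy and rs.ready]
    return min(ready, key=lambda rs: rs.seq).name if ready else None

stations = [Station("Add1", "add"), Station("Add2", "add"), Station("Add3", "add"),
            Station("Mult1", "mult"), Station("Mult2", "mult")]
issued = [dispatch(stations, k, i)
          for i, k in enumerate(["mult", "add", "mult", "add", "mult"])]
print(issued)  # ['Mult1', 'Add1', 'Mult2', 'Add2', None]

stations[1].ready = True  # Add2 (program order 3) becomes ready first
stations[0].ready = True  # Add1 (program order 1) becomes ready later
winner = arbitrate(stations, "add")
print(winner)  # Add1: the older instruction wins the adder
```

The third multiply stalls because both multiply stations are busy. Arbitration ignores the order in which stations became ready and looks only at program order.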

  8. Virtual Registers
     • A tag field is associated with the data
        • The tag field is a virtual register ID
        • It corresponds to reservation station and load buffer names
     • Motivation: the 360’s register weakness
        • It had only 4 FP registers
        • The 9 renamed registers (reservation station slots) were a significant bonus
     • Intel’s x86 architecture is also register-poor
        • With renamed registers it can get around this

  9. Three Stages of the Tomasulo Algorithm
     1. Issue: get an instruction from the FP Op Queue
        If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renames registers).
     2. Execution: operate on the operands (EX)
        When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.
     3. Write result: finish execution (WB)
        Write on the Common Data Bus to all awaiting units; mark the reservation station available.
     • Normal data bus: data + destination (“go to” bus)
     • Common data bus: data + source (“come from” bus)
        • 64 bits of data + 4 bits of Functional Unit source address
        • A unit writes (captures the data) if the source matches the Functional Unit it expects to produce the result
        • The producing unit does the broadcast

  10. Reservation Station Components
     • Op: operation to perform in the unit (e.g., + or –)
     • Vj, Vk: values of the source operands
        • Store buffers have a V field, the result to be stored
     • Qj, Qk: reservation stations producing the source registers (value to be written)
        • Note: no ready flags as in the Scoreboard; Qj, Qk = 0 => ready
        • Store buffers only have Qi, for the RS producing the result
     • Busy: indicates that the reservation station or FU is busy
     • Register result status: indicates which functional unit will write each register, if one exists; blank when there are no pending instructions that will write that register
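The three stages and the station fields can be tied together in a small cycle-level sketch. This is a simplified model under stated assumptions, not the 360/91 design:

- Stores are omitted, and a load is given its value directly instead of an address.
- Each station executes as soon as its operands are ready, so there is no FU contention.
- At most one CDB broadcast happens per cycle, oldest instruction first.
- An operand that arrives on the CDB can be used starting the next cycle.
- Latencies are assumed: LD 2, ADDD/SUBD 2, MULTD 10, DIVD 40.

The empty string stands in for Q = 0. The station names and helper functions are hypothetical. The program is assumed to be the classic textbook sequence, which the slide transcript does not list.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed execution latencies (cycles) and reservation-station type per opcode.
LATENCY = {"LD": 2, "ADDD": 2, "SUBD": 2, "MULTD": 10, "DIVD": 40}
RS_KIND = {"LD": "load", "ADDD": "add", "SUBD": "add", "MULTD": "mult", "DIVD": "mult"}
APPLY = {"ADDD": lambda a, b: a + b, "SUBD": lambda a, b: a - b,
         "MULTD": lambda a, b: a * b, "DIVD": lambda a, b: a / b}

@dataclass
class Station:
    name: str                      # tag broadcast on the CDB, e.g. "Mult1"
    kind: str                      # "load", "add" or "mult"
    busy: bool = False
    op: str = ""
    vj: float = 0.0                # Vj, Vk: operand values
    vk: float = 0.0
    qj: str = ""                   # Qj, Qk: producing tag; "" plays the role of 0
    qk: str = ""
    seq: int = 0                   # program order, used to arbitrate the CDB
    ready_at: int = 0              # cycle in which the last operand became available
    done_at: Optional[int] = None  # last cycle of execution, once started

def read_operand(reg, regs, status):
    """Issue-time renaming: wait on the producer's tag, or copy the value now."""
    return (0.0, status[reg]) if reg in status else (regs[reg], "")

def simulate(program, regs, stations):
    status = {}                    # register result status: register -> tag
    writes, pc, cycle = [], 0, 0
    while pc < len(program) or any(s.busy for s in stations):
        cycle += 1
        # Execute: start stations whose operands arrived in an earlier cycle.
        for s in stations:
            if s.busy and s.done_at is None and not s.qj and not s.qk and s.ready_at < cycle:
                s.done_at = cycle + LATENCY[s.op] - 1
        # Write result: at most one CDB broadcast per cycle, oldest instruction first.
        finished = [s for s in stations if s.busy and s.done_at is not None and s.done_at < cycle]
        if finished:
            w = min(finished, key=lambda s: s.seq)
            value = w.vj if w.op == "LD" else APPLY[w.op](w.vj, w.vk)
            for s in stations:     # every waiting station snoops the CDB
                if s.busy and s.qj == w.name:
                    s.vj, s.qj, s.ready_at = value, "", cycle
                if s.busy and s.qk == w.name:
                    s.vk, s.qk, s.ready_at = value, "", cycle
            # Only registers still mapped to this tag are written; a newer writer keeps its mapping (WAW).
            for reg in [r for r, tag in status.items() if tag == w.name]:
                regs[reg] = value
                del status[reg]
            w.busy, w.done_at = False, None
            writes.append((cycle, w.op))
        # Issue: in program order; stall if no free station of the right type.
        if pc < len(program):
            op, dest, a, b = program[pc]
            free = [s for s in stations if s.kind == RS_KIND[op] and not s.busy]
            if free:
                s = free[0]
                s.busy, s.op, s.seq, s.ready_at = True, op, pc, cycle
                if op == "LD":     # assumption: the loaded value is given directly
                    s.vj, s.qj, s.vk, s.qk = a, "", 0.0, ""
                else:
                    s.vj, s.qj = read_operand(a, regs, status)
                    s.vk, s.qk = read_operand(b, regs, status)
                status[dest] = s.name
                pc += 1
    return writes, regs

stations = [Station(n, k) for k, names in {"load": ["Load1", "Load2", "Load3"],
            "add": ["Add1", "Add2", "Add3"], "mult": ["Mult1", "Mult2"]}.items() for n in names]
program = [("LD", "F6", 6.0, None), ("LD", "F2", 2.0, None),
           ("MULTD", "F0", "F2", "F4"), ("SUBD", "F8", "F6", "F2"),
           ("DIVD", "F10", "F0", "F6"), ("ADDD", "F6", "F8", "F2")]
regs0 = {"F0": 0.0, "F2": 0.0, "F4": 4.0, "F6": 0.0, "F8": 0.0, "F10": 0.0}
writes, regs = simulate(program, regs0, stations)
for cycle, op in writes:
    print(cycle, op)
```

Under these assumptions, results are written in the order LD, LD, SUBD, ADDD, MULTD, DIVD, so completion is out of order. ADDD overwrites F6 long before DIVD executes, yet DIVD still divides by the F6 value it copied at issue.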

  11. Tomasulo Example Cycle 0

  12. Tomasulo Example Cycle 1

  13. Tomasulo Example Cycle 2

  14. Tomasulo Example Cycle 3

  15. Tomasulo Example Cycle 4
     • Similar to Figure 4.9 in your text

  16. Tomasulo Example Cycle 5

  17. Tomasulo Example Cycle 6

  18. Tomasulo Example Cycle 7

  19. Tomasulo Example Cycle 8
     • Note: ADDD can execute (and complete) before DIVD begins executing, because DIVD’s reservation station already holds the old version of F6, which avoids the WAR hazard
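The note comes down to one fact: an available operand is copied into the reservation station at issue, so a later write to the same register cannot affect it. Here is a minimal sketch using plain dicts. The register values, the overwritten F6 value of 10.0, and the tag name `Mult1` are all assumptions chosen for illustration.

```python
regs = {"F6": 6.0}  # assumed value of F6 when DIVD issues

# Issue DIVD F10,F0,F6: F0 is still being produced by the multiplier (tag "Mult1"),
# while F6 is available, so its current value is copied into the station at issue.
divd = {"Vj": None, "Qj": "Mult1", "Vk": regs["F6"], "Qk": None}

# ADDD F6,F8,F2 finishes first and overwrites F6 (the would-be WAR hazard).
regs["F6"] = 10.0

# Later the multiplier broadcasts (tag, value) on the CDB; DIVD captures it.
tag, value = "Mult1", 8.0
if divd["Qj"] == tag:
    divd["Vj"], divd["Qj"] = value, None

f10 = divd["Vj"] / divd["Vk"]
print(f10)  # 1.3333333333333333 (8.0 / old F6 = 6.0, not 8.0 / 10.0)
```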

  20. Tomasulo Example Cycle 9

  21. Tomasulo Example Cycle 10

  22. Tomasulo Example Cycle 11

  23. Tomasulo Example Cycle 12

  24. Tomasulo Example Cycle 13

  25. Tomasulo Example Cycle 14
     • This is Figure 4.10 in the text

  26. Tomasulo Example Cycle 15

  27. Tomasulo Example Cycle 16
     • Now do 38 more DIVD cycles and then write back F10 to finish

  28. Review: Tomasulo
     • Prevents registers from becoming the bottleneck
        • Where’s the new bottleneck?
     • Avoids the WAR and WAW hazards of the Scoreboard
     • If we assume branch prediction (next subject…):
        • Allows loop unrolling in HW
        • Not limited to basic blocks
     • Lasting contributions:
        • Dynamic scheduling
        • Register renaming
        • Load/store disambiguation
           • Out of order is OK if addresses don’t match
     • 360/91 descendants are the PowerPC 604 and 620, MIPS R10000, HP-PA 8000, and Intel Pentium Pro
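Load/store disambiguation ("out of order is OK if addresses don't match") can be sketched as a check a load performs against older stores that are still pending. The function name is hypothetical. Treating a not-yet-computed store address (`None`) as a possible match is a conservative assumption, not something the slide specifies.

```python
from typing import Optional, Sequence

def load_may_bypass(load_addr: int, older_store_addrs: Sequence[Optional[int]]) -> bool:
    """Hypothetical helper: may a load go to memory ahead of older, still-pending stores?

    Assumption: a store whose address is not yet computed (None) might alias the
    load, so the check is conservative and blocks the load.
    """
    return all(addr is not None and addr != load_addr for addr in older_store_addrs)

print(load_may_bypass(0x100, [0x200, 0x300]))  # True: no pending store touches 0x100
print(load_may_bypass(0x100, [0x200, 0x100]))  # False: must wait for the store to 0x100
print(load_may_bypass(0x100, [None]))          # False: unknown address, assume it may match
```

A more aggressive design could forward the matching store's data to the load instead of stalling it. The sketch only shows the basic ordering rule.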
