
Computer Architecture Superscalar Processors

Computer Architecture Superscalar Processors. Ola Flygt Växjö University http://w3.msi.vxu.se/users/ofl/ Ola.Flygt@msi.vxu.se +46 470 70 86 49. Outline. 7.1 Introduction 7.2 Parallel decoding 7.3 Superscalar instruction issue 7.4 Shelving 7.5 Register renaming 7.6 Parallel execution


Presentation Transcript


  1. Computer Architecture: Superscalar Processors Ola Flygt Växjö University http://w3.msi.vxu.se/users/ofl/ Ola.Flygt@msi.vxu.se +46 470 70 86 49

  2. Outline 7.1 Introduction 7.2 Parallel decoding 7.3 Superscalar instruction issue 7.4 Shelving 7.5 Register renaming 7.6 Parallel execution 7.7 Preserving the sequential consistency of instruction execution 7.8 Preserving the sequential consistency of exception processing 7.9 Implementation of superscalar CISC processors using a superscalar RISC core 7.10 Case studies of superscalar processors

  3. Superscalar Processors vs. VLIW

  4. Superscalar Processor: Intro • Parallel Issue • Parallel Execution • {Hardware} Dynamic Instruction Scheduling • Currently the predominant class of processors • Pentium • PowerPC • UltraSparc • AMD K5

  5. Emergence and spread of superscalar processors

  6. Evolution of superscalar processor

  7. Specific tasks of superscalar processing

  8. Parallel decoding {and dependency checking} • What needs to be done

  9. Decoding and Pre-decoding • Superscalar processors tend to use 2, and sometimes 3 or more, pipeline cycles for decoding and issuing instructions • Pre-decoding: • shifts part of the decode task up into the loading phase • results of pre-decoding: • the instruction class • the type of resources required for execution • in some processors (e.g. UltraSparc), the calculated branch target address as well • the results are stored by attaching 4-7 bits to each instruction • Benefit: shortens the overall cycle time or reduces the number of cycles needed
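
The pre-decoding idea on slide 9 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mnemonic table, the four instruction classes, and the 2-bit class code are hypothetical (the slide speaks of 4-7 bits on real processors), and "loading phase" is taken here to mean the moment a line of instructions is filled into the instruction cache.

```python
from dataclasses import dataclass

# Hypothetical instruction classes and their codes (assumption for illustration;
# real processors attach 4-7 predecode bits with richer information).
CLASS_BITS = {"alu": 0b00, "load_store": 0b01, "branch": 0b10, "fp": 0b11}

# Hypothetical mnemonic-to-class table for a toy instruction set.
MNEMONIC_CLASS = {
    "add": "alu", "sub": "alu", "mul": "alu",
    "lw": "load_store", "sw": "load_store",
    "beq": "branch", "jmp": "branch",
    "fadd": "fp", "fmul": "fp",
}


@dataclass(frozen=True)
class PredecodedInstruction:
    text: str            # the instruction as fetched from memory
    predecode_bits: int  # extra bits stored next to it in the cache


def predecode(text: str) -> PredecodedInstruction:
    """Classify one instruction and attach its class code."""
    mnemonic = text.split()[0]
    return PredecodedInstruction(text, CLASS_BITS[MNEMONIC_CLASS[mnemonic]])


def fill_icache_line(instructions):
    """Predecode every instruction while the cache line is being loaded."""
    return [predecode(text) for text in instructions]


def is_branch(entry: PredecodedInstruction) -> bool:
    """The decode stage only inspects the attached bits, not the opcode."""
    return entry.predecode_bits == CLASS_BITS["branch"]
```

The point of the sketch is the split of work: classification happens once at load time, so the later decode stage can test a couple of bits instead of re-examining the full opcode.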

  10. The principle of predecoding

  11. Number of predecode bits used

  12. Specific tasks of superscalar processing: Issue

  13. 7.3 Superscalar instruction issue • How and when to send the instruction(s) to EU(s)

  14. Issue policies

  15. Instruction issue policies of superscalar processors

  16. Issue rate {How many instructions/cycle} • CISC about 2 • RISC:

  17. Issue policies: Handling Issue Blockages

  18. Issue stopped by True dependency • True dependency  (Blocked: need to wait)

  19. Issue order of instructions

  20. Aligned vs. unaligned issue

  21. Issue policies: Use of Shelving

  22. Direct Issue

  23. The principle of shelving: Indirect Issue

  24. Design space of shelving

  25. Scope of shelving

  26. Layout of shelving buffers

  27. Implementation of shelving buffer

  28. Basic variants of shelving buffers

  29. Using a combined buffer for shelving, renaming, and reordering

  30. Number of shelving buffer entries

  31. Number of read and write ports • how many instructions may be written into (input ports) or • read out from (output ports) a particular shelving buffer in a cycle • depends on whether individual, group, or central reservation stations are used

  32. Shelving: Operand fetch policy

  33. 7.4.4 Operand fetch policies

  34. Operand fetch during instruction issue Register file

  35. Operand fetch during instruction dispatch Register file

  36. Shelving: Instruction dispatch scheme

  37. 7.4.5 Instruction dispatch scheme

  38. Dispatch policy • Selection rule • Specifies when instructions are considered executable • e.g. Dataflow principle of operation • Those instructions whose operands are available are executable. • Arbitration rule • Needed when more instructions are eligible for execution than can be dispatched. • e.g. choose the ‘oldest’ instruction. • Dispatch order • Determines whether a non-executable instruction prevents all subsequent instructions from being dispatched.
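
The three parts of the dispatch policy on slide 38 can be combined into one small Python sketch. The names (`ShelvedInstruction`, `dispatch`, `max_dispatch`, `in_order`) are hypothetical and chosen for illustration; the selection rule is the dataflow principle, the arbitration rule is "oldest first", and the `in_order` flag models the dispatch-order choice.

```python
from dataclasses import dataclass


@dataclass
class ShelvedInstruction:
    name: str
    age: int              # smaller value = issued earlier = older
    operands_ready: bool  # True once all source operands are available


def dispatch(shelved, max_dispatch, in_order=False):
    """Return the names of the instructions dispatched in this cycle.

    Selection rule:   an instruction is executable when its operands are ready.
    Arbitration rule: if more are executable than can be dispatched, take the oldest.
    Dispatch order:   with in_order=True, a non-executable instruction blocks
                      every younger instruction behind it.
    """
    chosen = []
    for instr in sorted(shelved, key=lambda s: s.age):
        if len(chosen) == max_dispatch:
            break
        if instr.operands_ready:
            chosen.append(instr.name)
        elif in_order:
            break  # blocked: nothing younger may overtake
    return chosen
```

With a buffer whose oldest entry is still waiting for an operand, out-of-order dispatch sends the next two ready instructions, while in-order dispatch sends nothing in that cycle.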

  39. Dispatch policy: Dispatch order

  40. Trend of Dispatch order

  41. Dispatch rate (instructions/cycle)

  42. Maximum issue rate <= maximum dispatch rate • As a result, the issue rate reaches its maximum more often than the dispatch rate does

  43. Scheme for checking the availability of operands: The principle of scoreboarding
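
As a rough illustration of the scoreboarding principle named on slide 43, the sketch below keeps one status bit per register: the bit is cleared when an instruction that will write the register is issued and set again when the result is written back. The class and method names are hypothetical, and the slide itself gives no implementation detail, so treat this as one simple reading of the idea rather than the scheme of any particular processor.

```python
class Scoreboard:
    """One 'value available' bit per architectural register (toy model)."""

    def __init__(self, num_registers: int):
        self.available = [True] * num_registers

    def issue(self, dest: int) -> None:
        # The destination holds a stale value until the result is produced.
        self.available[dest] = False

    def write_back(self, dest: int) -> None:
        # The result has been written; consumers may now read the register.
        self.available[dest] = True

    def operands_available(self, sources) -> bool:
        # An instruction may proceed only if every source register is available.
        return all(self.available[reg] for reg in sources)
```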

  44. Schemes for checking the availability of operands

  45. Operands fetched during dispatch or during issue

  46. Use of multiple buses for updating multiple individual reservation stations

  47. Internal data paths of the PowerPC 604

  48. Treatment of an empty reservation station

  49. 7.4.6 Detailed Example of Shelving • Issuing the following instructions • cycle i: mul r1, r2, r3 • cycle i+1: ad r2, r3, r5 • ad r3, r4, r6 • format: Rs1, Rs2, Rd
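
To see why these three instructions make a useful shelving example, the following sketch lists the true (read-after-write) dependencies among them, using the slide's `Rs1, Rs2, Rd` operand format. The helper names `parse` and `raw_dependencies` are hypothetical, and the analysis is a plain program-order scan, not a model of the hardware discussed in the slides.

```python
def parse(text):
    """Split 'op Rs1, Rs2, Rd' into (mnemonic, (Rs1, Rs2), Rd)."""
    mnemonic, operands = text.split(maxsplit=1)
    rs1, rs2, rd = [reg.strip() for reg in operands.split(",")]
    return mnemonic, (rs1, rs2), rd


def raw_dependencies(program):
    """Return (consumer_index, producer_index, register) for each true dependency."""
    deps = []
    last_writer = {}  # register name -> index of the latest instruction writing it
    for index, text in enumerate(program):
        _, sources, dest = parse(text)
        for src in sources:
            if src in last_writer:
                deps.append((index, last_writer[src], src))
        last_writer[dest] = index
    return deps


program = ["mul r1, r2, r3", "ad r2, r3, r5", "ad r3, r4, r6"]
print(raw_dependencies(program))
```

Both `ad` instructions read `r3`, which the `mul` writes, so each of them has one operand that is not yet available when it is issued in cycle i+1.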

  50. Example overview
