
CPU and Memory: Design, Implementation, and Enhancement

Enhancement: PART 2. CPU and Memory: Design, Implementation, and Enhancement. Adapted from: The Architecture of Computer Hardware and Systems Software: An Information Technology Approach, 3rd Edition, Irv Englander, John Wiley and Sons, 2003. Wilson Wong, Bentley College


Presentation Transcript


  1. Enhancement: PART 2. CPU and Memory: Design, Implementation, and Enhancement. Adapted from: The Architecture of Computer Hardware and Systems Software: An Information Technology Approach, 3rd Edition, Irv Englander, John Wiley and Sons, 2003. Wilson Wong, Bentley College; Linda Senne, Bentley College

  2. Enhancement: PART 2 Topics: • CISC vs RISC (part 1) • Address Modes (part 1) • Cache (part 1) • Pipelining (part 2) • Scalar and Super Scalar (part 2)

  3. Background Much of this material is based on the “Data Path” (see figure next slide)

  4. The “DATA PATH”

  5. The “DATA PATH” • As you study the DATA PATH figure, here are some things to note: • The external bus (also referred to as the CPU local bus) connects the MAIN MEMORY and the Bus Interface Unit (BIU) • Data and instructions have separate caches within the CPU • The Prefetch Unit looks for the next instruction in memory (or cache) and loads it into the prefetch queues (recall the serial nature of programming)

  6. The “DATA PATH” • The Branch Prediction Unit looks for the next instruction in memory based on a branch-type instruction • Example: if the LMC code is executing at address 40 and the instruction there is 768 (branch to address 68 if the Accumulator = 0), the Branch Prediction Unit goes and gets the instructions at 68, 69, 70, etc. • The Pentium uses prefetch queues that are 64 bytes deep
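
The LMC example above can be sketched in a few lines. This is only an illustration, assuming the usual LMC encoding in which the first digit is the opcode and the last two digits are the address, and assuming opcodes 6, 7 and 8 are the branch instructions; the function names and the `depth` parameter are hypothetical and do not describe how the Pentium's Branch Prediction Unit is actually built.

```python
# Assumed LMC branch opcodes: 6 = branch always, 7 = branch on zero, 8 = branch on positive.
BRANCH_OPCODES = {6, 7, 8}


def decode_lmc(instruction):
    """Split a 3-digit LMC instruction into (opcode digit, two-digit address)."""
    return divmod(instruction, 100)


def prefetch_targets(pc, instruction, depth=3):
    """Return the addresses worth prefetching for the instruction stored at pc.

    'fall_through' is the sequential path; 'taken' is the branch-target path
    and stays empty when the instruction is not a branch.
    """
    opcode, address = decode_lmc(instruction)
    fall_through = [pc + 1 + i for i in range(depth)]
    taken = [address + i for i in range(depth)] if opcode in BRANCH_OPCODES else []
    return {"fall_through": fall_through, "taken": taken}


if __name__ == "__main__":
    # Instruction 768 at address 40: branch to 68 if the accumulator is zero.
    print(prefetch_targets(40, 768))
```

For the instruction 768 at address 40, the taken path is 68, 69, 70 and the fall-through path is 41, 42, 43.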

  7. The “DATA PATH” The sequence of events: • The CPU initiates a fetch request, which is sent over the BIU • The memory subsystem returns the needed data/instructions, which are received by the BIU • The BIU forwards instructions → instruction cache; data → data cache • The prefetcher searches the code cache for the next instruction → instruction queues (D1) • From the 2 prefetch queues, instructions are moved to the control unit to determine if both can be executed at the same time or just one (D2) • Concurrently (and if the instruction is a branch type), the Branch Prediction Unit tries to determine which branch will be taken and fills the instruction queues (D1)

  8. Pipelining • Fetching an instruction from memory is a major bottleneck • So, the first step in pipelining is to get as many instructions as possible into the instruction cache • The actions of fetching and decoding are broken down into “stages” • Many texts use the assembly line concept as an analogy for pipelining • See next page for a five-stage pipeline

  9. Pipelining

  10. Pipelining • Notes for the previous slide… • During clock cycle 1, stage S1 is working on instruction 1, fetching it from memory • During clock cycle 2, stage S2 decodes instruction 1, while S1 fetches instruction 2 • During clock cycle 3, stage S3 fetches the operands for instruction 1; stage S2 decodes instruction 2; stage S1 fetches instruction 3. • During clock cycle 4, stage S4 executes instruction ___, S3 fetches operands for instruction ___, S2 decodes instruction ___, and S1 fetches instruction ___.
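
The clock-by-clock pattern in these notes follows one rule: in cycle c, stage s holds instruction c − s + 1, provided that number is at least 1. A minimal sketch, assuming an ideal pipeline with no stalls (the function name `stage_occupancy` is made up for this illustration):

```python
def stage_occupancy(cycle, num_stages=5):
    """Map each busy stage number to the instruction it holds in a given cycle.

    Cycles, stages and instructions are all numbered from 1. Assumes an ideal
    pipeline: one new instruction enters stage S1 every clock cycle, no stalls.
    """
    occupancy = {}
    for stage in range(1, num_stages + 1):
        instruction = cycle - stage + 1
        if instruction >= 1:  # the stage is still empty while the pipeline fills
            occupancy[stage] = instruction
    return occupancy


if __name__ == "__main__":
    for cycle in range(1, 4):
        busy = stage_occupancy(cycle)
        print(f"cycle {cycle}:", ", ".join(f"S{s} -> instruction {i}" for s, i in busy.items()))
```

For clock cycle 3 this gives S1 on instruction 3, S2 on instruction 2 and S3 on instruction 1, matching the note above; the same call with a later cycle can be used to check the fill-in-the-blank line.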

  11. Pipelining U-pipeline V-pipeline

  12. Pipelining • Notes for the previous slide… • Only one instruction is being completed at a time (scalar) • The two instructions must not conflict over each other's resources • Either the compiler checks or • Conflicts are detected during execution • The u-pipeline (top) is the main pipeline • Can execute any Pentium instruction • The other pipeline, the v-pipeline (bottom), only executes simple integer instructions

  13. Pipelining • The numbers… • Suppose the cycle time is 2 nsec. Then the time for ONE instruction to complete is 2 nsec × 5 stages = 10 nsec (called latency) • But once the pipeline is full, an instruction completes every clock cycle (2 nsec)! • Look: 1 instruction per 2 nsec • = 1 inst / (2 × 10^-9 sec) • = 1,000,000,000 inst / 2 sec • = 500,000,000 inst/sec • Or 500 MIPS. This is like 24 inches = 2 feet, so 24 in / 2 feet = 12 in/foot

  14. Pipelining • Test Question: Suppose the cycle time is 7 nsec and there is an 8-stage pipeline. A) Calculate the latency B) Calculate the MIPS • Solution: a) 7 nsec × 8 = 56 nsec latency b) 1 instruction per 7 nsec = (1/7) × 10^9 inst/sec = (1/7) × 10^3 × 10^6 inst/sec ≈ 143 × 10^6 inst/sec, or 143 MIPS (rounded) • Note: 10^6 = M
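
The latency and MIPS arithmetic from the last two slides can be checked with a short script. This is a sketch under the slides' own assumptions (every stage takes one clock cycle and, once the pipeline is full, one instruction completes per cycle); the function name `pipeline_numbers` is invented for the example.

```python
def pipeline_numbers(cycle_time_ns, stages):
    """Return (latency in nsec, throughput in MIPS) for an ideal pipeline."""
    latency_ns = cycle_time_ns * stages           # time for ONE instruction to pass through all stages
    instructions_per_second = 1e9 / cycle_time_ns  # one instruction completes every cycle
    mips = instructions_per_second / 1e6           # 10^6 = M
    return latency_ns, mips


if __name__ == "__main__":
    for cycle_time_ns, stages in [(2, 5), (7, 8)]:
        latency_ns, mips = pipeline_numbers(cycle_time_ns, stages)
        print(f"{cycle_time_ns} nsec cycle, {stages} stages: "
              f"latency {latency_ns} nsec, about {round(mips)} MIPS")
```

Note that latency grows with the number of stages while the MIPS figure depends only on the cycle time.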

  15. Pipelining Summary • Assembly-line technique to allow overlapping between fetch-execute cycles of sequences of instructions • Only one instruction is being executed to completion at a time • Pipelining is also known as Scalar processing • Average instruction execution time is approximately equal to one clock cycle of the CPU • Problems from stalling • Instructions have different numbers of steps • Problems from branching

  16. Pipelining Questions • Q: A program has 500 instructions. Each instruction averages 6 steps to complete. How many CPU cycles will it take to complete if it is implemented on a CPU that has pipelining capability? • Assume there are no branches or dependencies between instructions (unlikely, but just for academic purposes…) • Solution: Pipelining assumes each INSTRUCTION STEP completes in one CPU cycle, and the steps of successive instructions overlap; so the first instruction takes 6 cycles and each of the remaining 499 instructions completes one cycle after the one before it • Total = 6 + 499 = 505 CPU cycles (without pipelining: 500 inst × 6 steps/inst = 3,000 CPU cycles)
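
A minimal sketch of the cycle counting for this question, assuming an ideal pipeline with one stage per instruction step and no stalls. It contrasts the step-by-step total (instructions × steps, no overlap) with the overlapped count, where the first instruction takes `steps` cycles and each later instruction finishes one cycle after its predecessor. Both function names are made up for the illustration.

```python
def cycles_without_overlap(instructions, steps):
    """Every step of every instruction gets its own cycle; nothing overlaps."""
    return instructions * steps


def cycles_ideal_pipeline(instructions, steps):
    """Fill the pipeline once, then complete one instruction per cycle."""
    if instructions == 0:
        return 0
    return steps + (instructions - 1)


if __name__ == "__main__":
    print("no overlap:    ", cycles_without_overlap(500, 6))
    print("ideal pipeline:", cycles_ideal_pipeline(500, 6))
```

For long programs the overlapped count approaches one cycle per instruction, which is the behaviour described in the Pipelining Summary slide.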

  17. Super Scalar • Process more than one instruction per clock cycle • Separate fetch and execute cycles as much as possible • Buffers for fetch and decode phases • Parallel execution units • The DATA PATH of the Pentium CPU is SUPER SCALAR

  18. Super Scalar

  19. Scalar vs. Superscalar Processing

  20. Branch Problem Solutions • Separate pipelines for both possibilities • Probabilistic approach • Requiring the following instruction to not be dependent on the branch • Instruction Reordering (superscalar processing)

  21. Superscalar Issues • Out-of-order processing – dependencies (hazards) • Data dependencies • Branch (flow) dependencies and speculative execution • Parallel speculative execution or branch prediction • Branch History Table • Register access conflicts • Logical registers
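
The slides list a Branch History Table without saying how it works. One common textbook scheme, shown here purely as an assumption and not as the Pentium's actual design, keeps a 2-bit saturating counter per table entry: values 0 and 1 predict "not taken", values 2 and 3 predict "taken", and each real outcome moves the counter one step. The class name, table size and indexing by address modulo table size are all hypothetical choices.

```python
class BranchHistoryTable:
    """Toy branch history table built from 2-bit saturating counters."""

    def __init__(self, entries=16):
        self.entries = entries
        # Start every counter at 1: "weakly not taken".
        self.counters = [1] * entries

    def _index(self, branch_address):
        # Assumed indexing: low-order part of the branch instruction's address.
        return branch_address % self.entries

    def predict(self, branch_address):
        """Return True when the branch is predicted taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address, taken):
        """Move the counter one step toward the actual outcome, saturating at 0 and 3."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


if __name__ == "__main__":
    bht = BranchHistoryTable()
    for outcome in [True, True, False, True]:  # hypothetical history of the branch at address 40
        print("predicted taken:", bht.predict(40), "| actually taken:", outcome)
        bht.update(40, outcome)
```

The two-bit counter means a single surprise outcome, such as the final exit from a loop, does not immediately flip the prediction.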

  22. Other Enhancements • Timing Issues • Microprogrammed Implementation • Hardware Implementation

  23. Hardware Implementation • Hardware – operations are implemented by logic gates • Advantages • Speed • RISC designs are simple and typically implemented in hardware

  24. Super Scalar Questions • Q: A program has 500 instructions. Each instruction averages 6 steps to complete. How many CPU cycles will it take to complete if it is implemented on a CPU that has SUPER SCALAR capability? • Assume there are no branches or dependencies between instructions (unlikely, but just for academic purposes…) • Solution: SUPER SCALAR means at least one INSTRUCTION completes in every CPU cycle, so 500 inst × 1 cycle/inst = 500 CPU cycles is an upper bound • With two pipelines, such as the Pentium's u- and v-pipelines, and ideal pairing, the total approaches 500 inst ÷ 2 inst/cycle = 250 CPU cycles, plus a few cycles to fill the pipelines
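
A rough sketch of how the cycle count for this question depends on the number of parallel pipelines. It assumes an idealized machine in which `width` instructions enter the pipelines together every cycle, every instruction has the same number of steps, and no pairing restrictions, branches or dependencies ever apply; `width` and the function name are hypothetical, and real superscalar CPUs fall short of this.

```python
import math


def cycles_superscalar(instructions, steps, width):
    """Ideal cycle count when `width` instructions are issued together each cycle."""
    if instructions == 0:
        return 0
    groups = math.ceil(instructions / width)  # number of issue groups
    return steps + (groups - 1)               # fill once, then one group completes per cycle


if __name__ == "__main__":
    for width in (1, 2, 4):
        print(f"width {width}: {cycles_superscalar(500, 6, width)} cycles")
```

With `width` set to 1 the function reduces to an ordinary scalar pipeline; with 2 it models two pipelines that can always pair instructions.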

  25. Microprogrammed Implementation • Microcode consists of tiny programs stored in ROM that implement the CPU's instructions • Advantages • More flexible • Easier to implement complex instructions • Can emulate other CPUs • Disadvantage • Requires more clock cycles
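
To make the idea concrete, a microprogrammed control unit can be pictured as a lookup table from each instruction to a short list of micro-operations. The table below is entirely hypothetical (the mnemonics, register names and micro-operations are invented for illustration, and one micro-operation per clock cycle is an assumption), but it shows why the approach is flexible and why it costs extra cycles.

```python
# Hypothetical microcode "ROM": each instruction maps to its micro-operations.
MICROCODE_ROM = {
    "LOAD":  ["MAR <- IR.address", "MDR <- Memory[MAR]", "A <- MDR"],
    "ADD":   ["MAR <- IR.address", "MDR <- Memory[MAR]", "A <- A + MDR"],
    "STORE": ["MAR <- IR.address", "MDR <- A", "Memory[MAR] <- MDR"],
}


def cycles_for(instruction):
    """Clock cycles for one instruction, assuming one micro-operation per cycle."""
    return len(MICROCODE_ROM[instruction])


def cycles_for_program(program):
    """Total cycles for a list of instruction mnemonics."""
    return sum(cycles_for(instruction) for instruction in program)


if __name__ == "__main__":
    program = ["LOAD", "ADD", "STORE"]
    print("cycles:", cycles_for_program(program))
```

Adding a new instruction, or emulating another CPU, only requires new table entries, whereas a hardwired design would need new logic gates.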
