
TDC 311


devon

Presentation Transcript


  1. TDC 311 The Microarchitecture

  2. Introduction • As mentioned earlier in the class, one Java statement generates multiple machine-code instructions • Each machine-code instruction, in turn, generates one or more microinstructions

  3. Introduction Continued • For example, in Java: counter += 1; • This might generate the following machine code:
       load  reg1,counter
       inc   reg1
       store reg1,counter

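  4. [Figure: data path diagram. Components shown: machine code instruction, Control Store (with Addr input), MIR, Memory (with read/write signals), the registers, the A Bus and B Bus decoders, the C Bus (32 individual signals), and the ALU with its control lines.]
     • Register numbers (assume 31 registers; 0 means no register): PC = 1, MAR = 2, MDR = 3, Reg A = 4, Reg B = 5, Reg C = 6, …, Reg BB = 31
     • ALU control codes: Add = 0, Multiply = 1, Inc A = 2, Inc B = 3, Dec A = 4, Dec B = 5, AND = 6, OR = 7, Pass A = 8, TwosC A = 9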

  5. Clock Subcycles
     • Subcycle 1 – set up signals to drive data path
     • Subcycle 2 – drive A and B buses
     • Subcycle 3 – ALU operation
     • Subcycle 4 – drive C bus
     • Requires 2 complete clock cycles to perform a microinstruction.
     [Timing diagram showing subcycles 1 to 4, annotated: "Cycle starts here", "Next microinstruction loaded from control store", "Registers loaded from C Bus".]

  6. Simple Example
     • Java statement: counter += 1;
     • What might the microinstructions look like?
     • load reg1,counter (assume the address of counter is currently in Register C)
         - Rd=1; Wr=0; A=00110 (Reg C); B=00000; C=00010 (MAR); ALU=1000 (Pass A)
         - Rd=1; all else 0 (counter should now be sitting in MDR)
         - Rd=0; Wr=0; A=00011 (MDR); B=00000; C=00100 (Reg A, i.e. reg1); ALU=1000 (Pass A)
     • inc reg1
         - Rd=0; Wr=0; A=00100 (Reg A, i.e. reg1); B=00000; C=00100 (Reg A); ALU=0010 (Inc A)
     • store reg1,counter (assume the address of counter is still in MAR)
         - Rd=0; Wr=1; A=00100 (Reg A); B=00000; C=00011 (MDR); ALU=1000 (Pass A)
         - Rd=0; Wr=1; all else 0
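
The microinstruction sequence above can be traced with a small simulation. This is only a sketch: the register numbers and ALU codes come from the data path slide, but the `step` function, the dictionary-based memory, and the ordering inside one microinstruction (ALU result onto the C bus first, then the memory read or write) are assumptions made for illustration, not part of the course material.

```python
# Register numbers from the data path slide; 0 means "no register".
PC, MAR, MDR, REG_A, REG_B, REG_C = 1, 2, 3, 4, 5, 6

# ALU control codes from the same slide (only the ones this example needs).
ADD, INC_A, PASS_A = 0, 2, 8


def alu(op, a, b):
    """Apply one ALU operation to the values on the A and B buses."""
    if op == ADD:
        return a + b
    if op == INC_A:
        return a + 1
    if op == PASS_A:
        return a
    raise ValueError(f"ALU op {op} is not modelled in this sketch")


def step(regs, memory, rd, wr, a, b, c, op):
    """Run one microinstruction.

    Assumed ordering: ALU result is loaded from the C bus first,
    then the memory read or write (if any) takes place.
    """
    a_val = regs.get(a, 0)  # register 0 puts nothing on the bus
    b_val = regs.get(b, 0)
    result = alu(op, a_val, b_val)
    if c != 0:
        regs[c] = result
    if rd:
        regs[MDR] = memory[regs[MAR]]
    if wr:
        memory[regs[MAR]] = regs[MDR]


# (Rd, Wr, A, B, C, ALU) for load reg1,counter / inc reg1 / store reg1,counter
MICROPROGRAM = [
    (1, 0, REG_C, 0, MAR, PASS_A),    # address of counter -> MAR
    (1, 0, 0, 0, 0, ADD),             # memory read completes into MDR
    (0, 0, MDR, 0, REG_A, PASS_A),    # MDR -> Reg A (reg1)
    (0, 0, REG_A, 0, REG_A, INC_A),   # Reg A + 1 -> Reg A
    (0, 1, REG_A, 0, MDR, PASS_A),    # Reg A -> MDR, start the write
    (0, 1, 0, 0, 0, ADD),             # memory write completes
]


def run_counter_increment(counter_address, initial_value):
    """Execute the microprogram and return the new value of counter."""
    regs = {r: 0 for r in (PC, MAR, MDR, REG_A, REG_B, REG_C)}
    regs[REG_C] = counter_address     # precondition stated on the slide
    memory = {counter_address: initial_value}
    for micro in MICROPROGRAM:
        step(regs, memory, *micro)
    return memory[counter_address]
```

Calling `run_counter_increment(100, 41)` walks through all six microinstructions and leaves the incremented value at the (hypothetical) address 100.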

  7. Design Issues • Speed vs. cost • reduce the number of clock cycles needed to execute an instruction • simplify the organization so that the clock cycle can be shorter • overlap the execution of instructions • Any way to improve upon the micro-architecture?

  8. Design Issues • Create independent units that fetch and process the instructions? (double-up on other things? Everything?) • Pre-fetch one/two/three instructions? • Perform pipelining?

  9. Pipeline Example

  10. Pipeline Problems • Pipe stall – when a subsequent instruction must wait before it can proceed • What causes stalls? • waiting for memory • waiting for the result of an earlier instruction • determining the next instruction • What if you encounter a branch instruction? • It also takes time to fill the pipeline
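
The cost of filling the pipeline and of stalls can be put into rough numbers. The sketch below assumes an idealised pipeline in which every stage takes one cycle and one instruction enters per cycle; the function names and the stage and stall counts are hypothetical, chosen only for illustration.

```python
def pipeline_cycles(num_instructions, num_stages, stall_cycles=0):
    """Cycles to finish a run on an idealised pipeline.

    The first instruction needs num_stages cycles (filling the pipe);
    each later instruction completes one cycle after the previous one.
    Every stall cycle delays completion by one more cycle.
    """
    if num_instructions == 0:
        return 0
    fill = num_stages - 1
    return fill + num_instructions + stall_cycles


def unpipelined_cycles(num_instructions, num_stages):
    """Cycles when each instruction runs through all stages alone."""
    return num_instructions * num_stages
```

For example, 10 instructions on a 4-stage pipeline take `pipeline_cycles(10, 4)` = 13 cycles instead of 40, and three stall cycles raise that to 16.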

  11. Design Issues
     • Perform branch prediction?
     • Perform out-of-order execution? For example, the sequence
         - add two register contents and store in register
         - increment counter by 1
         - start a write operation
       is changed to:
         - add two register contents and store in register
         - start a write operation
         - increment counter by 1

  12. Design Issues • Perform speculative execution? • Re-use registers that are no longer used? • Have a large register set and keep all current values in registers? • Use cache memory?

  13. Cache Memory • Memory references tend to cluster near locations that were referenced recently (locality principle) • Program code is usually in one region (if written by a good programmer) and data often in another (but grouped together) • Bring the most recently referenced values into a high-speed cache • How does the CPU know whether something is in the cache or not?

  14. Direct-mapped Cache • Most common form of cache memory • Let’s consider a cache which has 2048 entries, each entry holding 32 bytes (not bits) of data • 2048 entries times 32 bytes per entry equals 64 KB

  15. Direct-mapped cache layout (one row per entry):

      Entry   V bit   Tag (16 bits)   Data (32 bytes)   Addresses that use this entry
      2047                                              65504-65535, 131040-131071, …
      2046
      2045
      :
      2                                                 64-95, 65600-65631, …
      1                                                 32-63, 65568-65599, …
      0                                                 0-31, 65536-65567, 131072-131103, …

  16. Cache Address • When a program generates a 32-bit address, it has the following form (most significant bits first):
     • Tag – 16 bits
     • Line – 11 bits
     • Word – 3 bits
     • Byte – 2 bits
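
The field layout can be checked with a few shifts and masks. This is a minimal sketch; the function name `split_address` and the constant names are made up for the example, and it assumes the tag occupies the most significant bits.

```python
# Field widths from the slide: Tag 16, Line 11, Word 3, Byte 2 (32 bits total).
TAG_BITS, LINE_BITS, WORD_BITS, BYTE_BITS = 16, 11, 3, 2


def split_address(address):
    """Split a 32-bit address into (tag, line, word, byte) fields."""
    byte = address & ((1 << BYTE_BITS) - 1)
    word = (address >> BYTE_BITS) & ((1 << WORD_BITS) - 1)
    line = (address >> (BYTE_BITS + WORD_BITS)) & ((1 << LINE_BITS) - 1)
    tag = address >> (BYTE_BITS + WORD_BITS + LINE_BITS)
    return tag, line, word, byte
```

For instance, `split_address(65600)` gives tag 1, line 2, word 0, byte 0, which agrees with the table row listing 65600-65631 under entry 2.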

  17. Cache Hit • To see if a data item is in the cache, use the 11-bit LINE portion (of the address) to point to one of the 2048 cache row entries • Then the 16-bit TAG of the address is compared to the 16-bit TAG value in the cache entry • If there is a match, the data is there

  18. Cache Hit • If the data is there, use the 3-bit WORD portion of the address to tell you which word from the 8 words (32 bytes) in the cache line should be fetched • If necessary, the 2-bit BYTE address will tell you which one of the four bytes to fetch

  19. Cache Memory • Note that since this cache holds only 64 KB, it can hold the data for addresses 0 – 65535. • But the same cache entries may instead hold data for the addresses 65536 – 131071 (or any higher 64 KB range). • That is why you must compare the TAG fields to see if there is a match

  20. Cache Miss • If no match (of TAG fields), then there is a cache miss • The CPU goes to main memory, fetches the block of data containing the requested address, and stores it in the cache (thus wiping out the old block in that cache entry)
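
The hit and miss behaviour described in the last few slides can be sketched as a toy direct-mapped cache. The class and attribute names are hypothetical, main memory is modelled as a plain `bytes` object, and the word and byte fields are merged into one 5-bit offset for simplicity; real hardware does the tag comparison in parallel logic rather than in code.

```python
LINE_BITS, OFFSET_BITS = 11, 5       # 2048 entries, 32 bytes per entry
NUM_LINES = 1 << LINE_BITS
BLOCK_SIZE = 1 << OFFSET_BITS


class DirectMappedCache:
    """Toy model of the 64 KB direct-mapped cache from the slides."""

    def __init__(self, memory):
        self.memory = memory                  # bytes-like main memory
        self.valid = [False] * NUM_LINES      # V bit per entry
        self.tags = [0] * NUM_LINES           # 16-bit tag per entry
        self.data = [bytes(BLOCK_SIZE)] * NUM_LINES
        self.hits = 0
        self.misses = 0

    def read_byte(self, address):
        offset = address & (BLOCK_SIZE - 1)
        line = (address >> OFFSET_BITS) & (NUM_LINES - 1)
        tag = address >> (OFFSET_BITS + LINE_BITS)
        if self.valid[line] and self.tags[line] == tag:
            self.hits += 1
        else:
            # Miss: fetch the whole 32-byte block, replacing the old one.
            self.misses += 1
            start = address - offset
            self.data[line] = self.memory[start:start + BLOCK_SIZE]
            self.tags[line] = tag
            self.valid[line] = True
        return self.data[line][offset]
```

With a 256 KB memory, reading address 36, then 40, then 65572, then 36 again produces a miss, a hit, a miss (65572 maps to the same entry with a different tag), and another miss, because the second block evicted the first.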

  21. Cache Example • Consider that the CPU wants to fetch data from location 36 (decimal), or 00000024 in hex • Tag = 0000 0000 0000 0000 • Line = 0000 0000 001 • Word = 001 • Byte = 00
