TDC 311

TDC 311 The Microarchitecture

Introduction • As mentioned earlier in the class, one Java statement generates multiple machine code statements • Then one machine code statement generates one or more micro-code statements

Introduction Continued
• For example, in Java: counter += 1;
• Might generate the following machine code:
  – load reg1,counter
  – inc reg1
  – store reg1,counter

[Figure: data path diagram. Labels: machine code instruction, Control Store, MIR, Addr, Memory, Read/write signals, A Bus, B Bus, C Bus (32 individual signals), A Bus Decoder, B Bus Decoder, ALU.]
• Register codes (assume 31 registers, 0 means no register): PC = 1, MAR = 2, MDR = 3, Reg A = 4, Reg B = 5, Reg C = 6, …, Reg BB = 31
• ALU control codes: Add = 0, Multiply = 1, Inc A = 2, Inc B = 3, Dec A = 4, Dec B = 5, AND = 6, OR = 7, Pass A = 8, TwosC A = 9

Clock Subcycles
• Subcycle 1 – set up signals to drive data path
• Subcycle 2 – drive A and B buses
• Subcycle 3 – ALU operation
• Subcycle 4 – drive C bus
[Figure: timing diagram of subcycles 1 to 4. Labels: "Cycle starts here", "Next microinstruction loaded from control store", "Registers loaded from C Bus".]
• Requires 2 complete clock cycles to perform a microinstruction.

Simple Example
• Java statement: counter += 1;
• What might the microinstructions look like?
• load reg1,counter (assume the address of counter is currently in Register C)
  – Rd=1; Wr=0; A=00110 (Reg C); B=00000; C=00010 (MAR); ALU=1000 (pass A thru)
  – Rd=1; all else 0 (counter should now be sitting in MDR)
  – Rd=0; Wr=0; A=00011 (MDR); B=00000; C=00100 (Reg A, i.e. reg1); ALU=1000 (pass A thru)
• inc reg1
  – Rd=0; Wr=0; A=00100 (Reg A); B=00000; C=00100 (Reg A); ALU=0010 (Inc A)
• store reg1,counter (assume the address of counter is still in MAR)
  – Rd=0; Wr=1; A=00100 (Reg A); B=00000; C=00011 (MDR); ALU=1000 (pass A thru)
  – Rd=0; Wr=1; all else 0
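The microinstruction sequence for counter += 1 can be traced with a small simulation. This is only a sketch: the register and ALU codes come from the data path slide, but the `step` function, the dictionary used as memory, and the rule that a memory transfer completes on the "all else 0" microinstruction are assumptions made for illustration. Subcycle timing is ignored.

```python
# Register and ALU codes as listed on the data path slide.
MAR, MDR, REG_A, REG_C = 2, 3, 4, 6
INC_A, PASS_A = 0b0010, 0b1000


def step(regs, memory, rd, wr, a, b, c, alu):
    """Execute one microinstruction (hypothetical model, B bus unused here)."""
    if a or c:
        # Data path active: A bus -> ALU -> C bus. Code 0 means "no register".
        a_val = regs.get(a, 0)
        if alu == PASS_A:
            result = a_val
        elif alu == INC_A:
            result = a_val + 1
        else:
            raise ValueError("ALU code not modeled in this sketch")
        if c:
            regs[c] = result
    else:
        # "All else 0": assume the memory transfer completes in this step.
        if rd:
            regs[MDR] = memory[regs[MAR]]
        if wr:
            memory[regs[MAR]] = regs[MDR]


def run_counter_increment(counter_addr, initial):
    """Run the six microinstructions and return the new value of counter."""
    memory = {counter_addr: initial}
    regs = {REG_C: counter_addr}  # address of counter starts in Reg C
    program = [
        # load reg1,counter
        dict(rd=1, wr=0, a=REG_C, b=0, c=MAR, alu=PASS_A),
        dict(rd=1, wr=0, a=0, b=0, c=0, alu=0),
        dict(rd=0, wr=0, a=MDR, b=0, c=REG_A, alu=PASS_A),
        # inc reg1
        dict(rd=0, wr=0, a=REG_A, b=0, c=REG_A, alu=INC_A),
        # store reg1,counter
        dict(rd=0, wr=1, a=REG_A, b=0, c=MDR, alu=PASS_A),
        dict(rd=0, wr=1, a=0, b=0, c=0, alu=0),
    ]
    for micro in program:
        step(regs, memory, **micro)
    return memory[counter_addr]
```

For example, `run_counter_increment(100, 41)` walks the value 41 through MDR and Reg A and writes the incremented value back to the (hypothetical) address 100.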

Design Issues • Speed vs. cost • reduce the number of clock cycles needed to execute an instruction • simplify the organization so that the clock cycle can be shorter • overlap the execution of instructions • Any way to improve upon the micro-architecture?

Design Issues • Create independent units that fetch and process the instructions? (double-up on other things? Everything?) • Pre-fetch one/two/three instructions? • Perform pipelining?

Pipeline Example

Pipeline Problems
• Pipe stall – when a subsequent instruction must wait before it can proceed
• What causes stalls?
  – waiting for memory
  – waiting for the result of an earlier instruction
  – determining the next instruction
• What if you encounter a branch instruction?
• It also takes time to fill the pipeline
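The cost of filling the pipeline and of stalls can be estimated with simple arithmetic. The sketch below assumes an idealized pipeline in which every stage takes one cycle and one instruction enters per cycle; the function names and the 4-stage figure in the usage note are illustrative and not taken from the slides.

```python
def pipeline_cycles(num_instructions, num_stages, stall_cycles=0):
    """Cycles for an idealized pipeline: fill time + one per instruction + stalls."""
    if num_instructions == 0:
        return 0
    fill = num_stages - 1  # cycles before the first instruction completes
    return fill + num_instructions + stall_cycles


def unpipelined_cycles(num_instructions, num_stages):
    """Cycles when each instruction passes through every stage before the next starts."""
    return num_instructions * num_stages
```

With a hypothetical 4-stage pipeline, 10 instructions take `pipeline_cycles(10, 4)` cycles instead of `unpipelined_cycles(10, 4)`, and every stall cycle adds directly to the total.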

Design Issues
• Perform branch prediction?
• Perform out-of-order execution? For example, the sequence
  – add two register contents and store in register
  – increment counter by 1
  – start a write operation
• may be changed to:
  – add two register contents and store in register
  – start a write operation
  – increment counter by 1
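Reordering is only safe when the instructions involved do not depend on each other. A minimal sketch of that check follows, assuming each instruction can be described by the set of registers it reads and writes. The register names (`r1`, `r2`, `r3`, `counter`) and the assumption that the write operation uses the result of the add are hypothetical, chosen only to mirror the example above.

```python
def can_swap(first, second):
    """True if two adjacent instructions share no RAW, WAR, or WAW dependence."""
    raw = first["writes"] & second["reads"]   # second reads what first writes
    war = first["reads"] & second["writes"]   # second overwrites what first reads
    waw = first["writes"] & second["writes"]  # both write the same register
    return not (raw or war or waw)


# Hypothetical encoding of the three instructions from the slide.
add_instr = {"reads": {"r1", "r2"}, "writes": {"r3"}}
inc_instr = {"reads": {"counter"}, "writes": {"counter"}}
write_instr = {"reads": {"r3"}, "writes": set()}  # assumed to write r3 to memory
```

Under these assumptions `can_swap(inc_instr, write_instr)` holds, so the write may start before the increment, while `can_swap(add_instr, write_instr)` does not, because the write needs the result of the add.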

Design Issues • Perform speculative execution? • Re-use registers that are no longer used? • Have a large register set and keep all current values in registers? • Use cache memory?

Cache Memory • Memory references tend to cluster near recently referenced locations (locality principle) • Program code should be in one location (if written by a good programmer) and data often in another (but grouped together) • Bring the most recently referenced values into a high-speed cache • How does the CPU know whether something is in the cache?

Direct-mapped Cache • Most common form of cache memory • Let’s consider a cache which has 2048 entries, each entry holding 32 bytes (not bits) of data • 2048 entries times 32 bytes per entry equals 64 KB

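[Figure: cache layout. Each of the 2048 entries (0 to 2047) holds a V bit, a Tag (16 bits), and Data (32 bytes).]
Addresses that use each entry:
• Entry 2047 – 65504-65535, 131040-131071, …
• Entries 2046, 2045, … (and so on down)
• Entry 2 – 64-95, 65600-65631, …
• Entry 1 – 32-63, 65568-65599, …
• Entry 0 – 0-31, 65536-65567, 131072-131103, …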
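The address ranges in the figure follow directly from the cache dimensions. The sketch below recomputes them. The function name `ranges_for_entry` and its `count` parameter are illustrative only.

```python
ENTRIES = 2048                       # cache entries
LINE_BYTES = 32                      # bytes of data per entry
CACHE_BYTES = ENTRIES * LINE_BYTES   # 65536 bytes = 64 KB


def ranges_for_entry(entry, count=3):
    """Return the first `count` (low, high) address ranges that map to an entry."""
    start = entry * LINE_BYTES
    return [
        (start + k * CACHE_BYTES, start + k * CACHE_BYTES + LINE_BYTES - 1)
        for k in range(count)
    ]
```

Each successive range for an entry is exactly 64 KB above the previous one, which is why several memory blocks compete for the same entry.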

Cache Address • When a program generates a 32-bit address, it has the following form: Tag – 16 bits Line – 11 bits Word – 3 bits Byte – 2 bits

Cache Hit • To see if a data item is in the cache, use the 11-bit LINE portion of the address to select one of the 2048 cache entries • Then the 16-bit TAG of the address is compared to the 16-bit TAG value stored in that cache entry • If the tags match (and the entry's V bit shows it is valid), the data is there

Cache Hit • If the data is there, use the 3-bit WORD portion of the address to tell you which word from the 8 words (32 bytes) in the cache line should be fetched • If necessary, the 2-bit BYTE address will tell you which one of the four bytes to fetch

Cache Memory • Since this cache holds only 64 KB, it could hold the data for addresses 0 – 65535. • But any entry may instead hold data for the corresponding addresses in 65536 – 131071, or in any higher 64 KB range. • That is why you must compare the TAG fields to see if there is a match

Cache Miss • If the TAG fields do not match, there is a cache miss • The CPU goes to main memory, fetches the block of data containing the requested address, and stores it in the cache (thus wiping out the old block in that cache entry)
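The hit and miss behavior can be sketched as a small simulation. This is an illustrative model, not a description of any particular hardware: the class name `DirectMappedCache` and its `access` method are hypothetical, and only the V bit and tag of each entry are tracked, not the 32 bytes of data.

```python
class DirectMappedCache:
    """Tracks only the V bit and tag per entry (data bytes are not modeled)."""

    def __init__(self, entries=2048, line_bytes=32):
        self.entries = entries
        self.line_bytes = line_bytes
        self.valid = [False] * entries
        self.tags = [0] * entries

    def access(self, address):
        """Return True on a hit; on a miss, load the block and return False."""
        block = address // self.line_bytes   # which 32-byte block of memory
        line = block % self.entries          # LINE field selects the entry
        tag = block // self.entries          # TAG field identifies the block
        if self.valid[line] and self.tags[line] == tag:
            return True
        # Miss: the new block replaces whatever the entry held before.
        self.valid[line] = True
        self.tags[line] = tag
        return False
```

Accessing address 36 misses the first time, and a later access to address 40 hits because both lie in the block 32-63. An access to 65572 maps to the same entry with a different tag, so it misses and evicts that block.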

Cache Example • Consider that the CPU wants to fetch data from location 36 decimal (or 00000024 in hex) • Tag = 0000 0000 0000 0000 • Line = 0000 0000 001 • Word = 001 • Byte = 00
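The field breakdown above can be reproduced with shifts and masks. The sketch below assumes the 16/11/3/2-bit address layout described earlier. The function name `split_address` is illustrative.

```python
def split_address(address):
    """Split a 32-bit address into (tag, line, word, byte) fields."""
    byte = address & 0b11             # bits 1-0: byte within the word
    word = (address >> 2) & 0b111     # bits 4-2: word within the 8-word line
    line = (address >> 5) & 0x7FF     # bits 15-5: one of 2048 cache entries
    tag = (address >> 16) & 0xFFFF    # bits 31-16: tag
    return tag, line, word, byte
```

For address 36 (hex 24) this gives tag 0, line 1, word 1, byte 0, matching the fields listed above. Address 65536 + 36 gives the same line, word, and byte but tag 1.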
