
COMPSYS 304


Presentation Transcript


  1. COMPSYS 304 Computer Architecture: Cache. John Morris, Electrical & Computer Engineering / Computer Science, The University of Auckland. [Slide photo: Iolanthe at 13 knots on Cockburn Sound, WA]

  2. Memory Bottleneck • State-of-the-art processor • f = 3 GHz • tclock = 330 ps • 1-2 instructions per cycle • ~25% of instructions reference memory • Memory response: 4 instructions x 330 ps ≈ 1.3 ns needed! • Bulk semiconductor RAM (DRAM): 100 ns+ for a ‘random’ access! • Processor will spend most of its time waiting for memory!
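A quick worked check of the figures above (assuming one instruction per cycle, so a memory reference arrives roughly every 4 instructions):

```latex
t_{\mathrm{needed}} \approx 4 \times t_{\mathrm{clock}} = 4 \times 330\,\mathrm{ps} \approx 1.3\,\mathrm{ns},
\qquad
t_{\mathrm{DRAM}} \gtrsim 100\,\mathrm{ns} \approx 75 \times t_{\mathrm{needed}}
```

so raw DRAM is nearly two orders of magnitude too slow to keep the pipeline fed.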

  3. Cache • Small, fast memory • Typically ~50 kbytes (1998) • 2-cycle access time • Same die as processor • “Off-chip” cache possible • Custom cache chip closely coupled to processor • Use fast static RAM (SRAM) rather than slower dynamic RAM • Several levels possible • 2nd level of the memory hierarchy • “Caches” most recently used memory locations “closer” to the processor • closer = closer in time

  4. Cache • Etymology • cacher (French) = “to hide” • Transparent to a program • Programs simply run slower without it • Modern processors rely on it • Reduces the cost of main memory access • Enables 1-2 instruction/cycle throughput • Typical program • ~25% of instructions are memory accesses

  5. Cache • Relies upon locality of reference • Programs continually use - and re-use - the same locations • Instructions • loops • common subroutines • Data • look-up tables • “working” data sets

  6. Cache - operation • Memory requests checked in cache first • If the word sought is in the cache, it’s read from (or updated in) the cache • Cache hit • If not, the request is passed to main memory and the data is read (written) there • Cache miss [Figure: the CPU issues a virtual address (VA); the MMU translates it to a physical address (PA), which is checked in the cache before going to main memory; both data (D) and instructions (I) take this path]
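The hit/miss flow above, as a minimal C sketch; cache_lookup(), cache_fill() and memory_read() are hypothetical helpers standing in for the hardware, not part of the lecture:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helpers standing in for the cache and main memory. */
bool     cache_lookup(uint32_t addr, uint32_t *word);
void     cache_fill(uint32_t addr, uint32_t word);
uint32_t memory_read(uint32_t addr);

/* The flow on the slide: check the cache first, fall back to memory. */
uint32_t read_word(uint32_t addr)
{
    uint32_t word;
    if (cache_lookup(addr, &word))  /* cache hit: served from cache */
        return word;
    word = memory_read(addr);       /* cache miss: go to main memory */
    cache_fill(addr, word);         /* keep a copy for next time */
    return word;
}
```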

  7. Cache - operation • Hit rates of 95% are usual • Cache: 16 kbytes • Effective Memory Access Time • Cache: 2 cycles • Main memory: 10 cycles • Average access: 0.95*2 + 0.05*10 = 2.4 cycles
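The average on the slide is the standard effective-access-time formula with hit rate h, cache time t_c and main-memory time t_m:

```latex
t_{\mathrm{avg}} = h \, t_c + (1-h) \, t_m = 0.95 \times 2 + 0.05 \times 10 = 2.4 \ \text{cycles}
```

(This model charges the full memory time only on a miss; some texts instead add a miss penalty on top of the cache probe time, giving t_c + (1-h) t_penalty.)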

  8. Cache - organisation • Direct-mapped cache • Each word in the cache has a tag • Assume • cache size: 2^k words • machine words: p bits • byte-addressed memory • m = log2(p/8) low-order bits select the byte within a word • m = 2 for 32-bit machines • Address format (p bits): | tag (p-k-m bits) | cache address (k bits) | byte address (m bits) |
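A minimal sketch of the address split in C, assuming p = 32, k = 11 (a 2-kword cache) and m = 2; the field names follow the slide, the concrete sizes and address are illustrative:

```c
#include <stdint.h>
#include <stdio.h>

#define K 11                 /* cache holds 2^K = 2048 words     */
#define M 2                  /* 32-bit words -> 4 bytes -> m = 2 */

int main(void)
{
    uint32_t addr = 0x12345678;

    uint32_t byte  = addr & ((1u << M) - 1);         /* m low bits           */
    uint32_t index = (addr >> M) & ((1u << K) - 1);  /* k-bit cache address  */
    uint32_t tag   = addr >> (K + M);                /* remaining p-k-m bits */

    printf("tag=%#x index=%#x byte=%u\n",
           (unsigned)tag, (unsigned)index, (unsigned)byte);
    return 0;
}
```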

  9. Cache - organisation • Direct-mapped cache [Figure: a cache line holds a (p-k-m)-bit tag plus a p-bit data word; the k-bit cache address field of the memory address selects one of 2^k lines, and the stored tag is compared with the address’s tag field to produce the Hit? signal]

  10. Cache - Direct Mapped • Conflicts • Two addresses separated by 2^(k+m) bytes will hit the same cache location • 32-bit machine, 64-kbyte (16-kword) cache • m = 2, k = 14 • Any program or data set larger than 64 kbytes will generate conflicts • On a conflict, the ‘old’ word is flushed • An unmodified word • (program, constant data) • is simply overwritten by the new data from memory • Modified data needs to be written back to memory before being overwritten
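To see the conflict concretely: any two addresses that differ only in bits above k+m produce the same cache index. A small check using the slide’s k = 14, m = 2 figures (the sample address is arbitrary):

```c
#include <assert.h>
#include <stdint.h>

#define K 14
#define M 2

static uint32_t line_index(uint32_t addr)
{
    return (addr >> M) & ((1u << K) - 1);
}

int main(void)
{
    uint32_t a = 0x00010123;
    uint32_t b = a + (1u << (K + M));   /* 2^(k+m) = 64 KB apart */

    /* Both map to the same cache line, so they evict each other. */
    assert(line_index(a) == line_index(b));
    return 0;
}
```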

  11. Cache - Conflicts • Modified or dirty words • When a word has been modified in cache • Write-back cache • Only writes data back when needed • A miss then needs two memory accesses • Write the modified word back • Read the new word • Write-through cache • A low-priority write to main memory is queued • Processor is delayed by reads only • Memory write occurs in parallel with other work • Instruction and necessary data fetches take priority
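A sketch of the two write policies in C; memory_write() and queue_write() are hypothetical stand-ins for the bus interface unit:

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical bus-interface helpers. */
void memory_write(uint32_t addr, uint32_t value);
void queue_write(uint32_t addr, uint32_t value);

typedef struct {
    bool     valid, dirty;
    uint32_t tag;
    uint32_t data;
} line_t;

/* Write-back: update the cache only and mark the line dirty;
   memory is updated later, when the line is evicted. */
void write_back_store(line_t *line, uint32_t value)
{
    line->data  = value;
    line->dirty = true;
}

/* On eviction, a dirty line costs an extra memory access
   before the new word can be read in. */
void write_back_evict(line_t *line, uint32_t old_addr)
{
    if (line->dirty)
        memory_write(old_addr, line->data);  /* write modified word back */
}

/* Write-through: update the cache and queue a low-priority memory
   write; the processor only stalls for reads. */
void write_through_store(line_t *line, uint32_t addr, uint32_t value)
{
    line->data = value;          /* the line is never 'dirty'     */
    queue_write(addr, value);    /* completes in parallel with other work */
}
```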

  12. Cache - Write-through or write-back? • Write-through • Allows an intelligent bus interface unit to make efficient use of a serious bottleneck: the processor-memory interface (main memory bus) • Reads (instruction and data) need priority! • They stall the processor • Writes can be delayed • At least until the location is needed! • More on intelligent system interface units later • but ...

  13. Cache - Write-through or write-back? • Write-through • Seems a good idea! • but ... • Multiple writes to the same location waste memory bus bandwidth • Typical programs run better with write-back caches • however • Often you can easily predict which will be best • Some processors (e.g. PowerPC) allow you to classify memory regions as write-back or write-through

  14. Cache - more bits • Cache lines need some status bits • Tag bits plus: • Valid • All set to false on power-up • Set to true as words are loaded into cache • Dirty • Needed by a write-back cache • A write-through cache always queues the write, so lines are never ‘dirty’
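One way to picture the per-line overhead is as a packed struct; the field widths below assume p = 32, k = 12, m = 2 and are purely illustrative:

```c
typedef struct {
    unsigned int tag   : 18;  /* p-k-m address bits stored with the line    */
    unsigned int valid : 1;   /* false at power-up, set when line is loaded */
    unsigned int dirty : 1;   /* write-back only; write-through never sets it */
} line_status_t;
```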

  15. Cache - Improving Performance • Conflicts (addresses 2^(k+m) bytes apart) • Degrade cache performance • Lower hit rate • Murphy’s Law operates • Addresses are never random! • Some locations ‘thrash’ in cache • Continually replaced and restored
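Thrashing is easy to provoke in a direct-mapped cache. A sketch, assuming the 64-kbyte cache above: if the two arrays happen to be laid out 64 KB apart (which the linker may well do here), a[i] and b[i] always collide on the same line, so each access may evict the other’s data:

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_BYTES (1u << 16)   /* 64 KB direct-mapped cache */

static int32_t a[CACHE_BYTES / sizeof(int32_t)];
static int32_t b[CACHE_BYTES / sizeof(int32_t)];   /* possibly 64 KB after a */

int64_t dot(void)
{
    int64_t sum = 0;
    for (size_t i = 0; i < CACHE_BYTES / sizeof(int32_t); i++)
        sum += (int64_t)a[i] * b[i];   /* a[i] and b[i] share a cache line:
                                          every access can miss            */
    return sum;
}
```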

  16. Cache - Fully Associative • All tags are compared at the same time • Words can use any cache line

  17. Cache - Fully Associative • Associative • Each tag is compared at the same time • Any match ⇒ hit • Avoids ‘unnecessary’ flushing • Replacement • Least Recently Used - LRU • Needs extra status bits • Cycles since last accessed • Hardware cost high • Extra comparators • Wider tags • p-m bits vs p-k-m bits
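A software model of the fully associative lookup with true LRU; in hardware all tags are compared in parallel, here the loop stands in for the comparators. Sizes and names are illustrative:

```c
#include <stdint.h>
#include <stdbool.h>

#define NLINES 8

typedef struct {
    bool     valid;
    uint32_t tag;          /* p-m bits: no index field in the address */
    uint64_t last_used;    /* cycle of last access, for LRU           */
} fa_line_t;

static fa_line_t lines[NLINES];
static uint64_t  now;      /* incremented on every access */

/* Returns the hit line, or the LRU victim to replace on a miss. */
int fa_lookup(uint32_t tag, bool *hit)
{
    int victim = 0;
    now++;
    for (int i = 0; i < NLINES; i++) {
        if (lines[i].valid && lines[i].tag == tag) {
            lines[i].last_used = now;   /* refresh LRU state on a hit */
            *hit = true;
            return i;
        }
        if (lines[i].last_used < lines[victim].last_used)
            victim = i;                 /* track the least recently used */
    }
    *hit = false;
    return victim;
}
```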

  18. Cache - Set Associative [Figure: 2-way set associative cache; each set holds two tagged words, so only two comparators are needed]

  19. Cache - Set Associative • n-way set associative caches • n can be small: 2, 4, 8 • Best performance • Reasonable hardware cost • Most high performance processors • Replacement policy • LRU choice from n • Reasonable LRU approximation • 1 or 2 bits • Set on access • Cleared / decremented by timer • Choose cleared word for replacement
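A sketch of the 1-bit LRU approximation described above, for a 2-way set associative cache: each access sets the line’s reference bit, a periodic timer clears them, and replacement prefers a cleared line (all names and sizes illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

#define SETS 256
#define WAYS 2

typedef struct {
    bool     valid, referenced;
    uint32_t tag;
} sa_line_t;

static sa_line_t sets[SETS][WAYS];

/* Called periodically (e.g. by a timer) to age the reference bits. */
void age_reference_bits(void)
{
    for (int s = 0; s < SETS; s++)
        for (int w = 0; w < WAYS; w++)
            sets[s][w].referenced = false;
}

/* Only WAYS comparators are needed: one per line in the set. */
int sa_lookup(uint32_t set, uint32_t tag, bool *hit)
{
    for (int w = 0; w < WAYS; w++) {
        if (sets[set][w].valid && sets[set][w].tag == tag) {
            sets[set][w].referenced = true;   /* set on access */
            *hit = true;
            return w;
        }
    }
    *hit = false;
    for (int w = 0; w < WAYS; w++)    /* choose a cleared way as the victim */
        if (!sets[set][w].referenced)
            return w;
    return 0;                         /* all recently used: fall back to way 0 */
}
```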

  20. Cache - Locality of Reference • Temporal Locality • Same location will be referenced again soon • Access same data again • Program loops - access same instruction again • Caches described so far exploit temporal locality • Spatial Locality • Nearby locations will be referenced soon • Next element of an array • Next instruction of a program
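Spatial locality is why loop order matters. A standard example (not from the slides): C stores 2-D arrays row-major, so traversing along rows touches consecutive bytes of each fetched cache line, while traversing down columns jumps a whole row between accesses:

```c
#define N 1024
static double m[N][N];

double sum_rows(void)   /* good spatial locality: consecutive addresses */
{
    double s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

double sum_cols(void)   /* poor spatial locality: stride of N*8 bytes */
{
    double s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += m[i][j];
    return s;
}
```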

  21. Cache - Line Length • Spatial Locality • Use very long cache lines • Fetch one datum • Neighbours fetched also • PowerPC 601 (Motorola/Apple/IBM), first of the single-chip Power processors • 64 sets • 8-way set associative • 32 bytes per line • 32 bytes (8 instructions) fetched into instruction buffer in one cycle • 64 x 8 x 32 = 16 kbytes total

  22. Cache - Separate I- and D-caches • Unified cache • Instructions and data in the same cache • Two caches • Instructions • Data • Increases total bandwidth • MIPS R10000 • 32-kbyte instruction; 32-kbyte data • Instruction cache is pre-decoded! (32 → 36 bits) • Data • 8-word (64-byte) line, 2-way set associative • 256 sets • Replacement policy?
