
EE204 Computer Architecture


Presentation Transcript


  1. EE204 Computer Architecture: Memory. Hina Anwar Khan, 2011

  2. Motivation
     • In real life, memory is finite
     • Fast memory is expensive
     • Memory access is critical to overall performance
     • There is a lot of research on the processor-cache interface
     • Main and virtual memory organization is yet another performance bottleneck

  3. Memory-CPU Performance Gap

  4. Memory Hierarchy
     Cache (static RAM):        5-25 ns access time;          highest cost/MByte
     Main memory (dynamic RAM): 60-120 ns access time;        high cost/MByte
     Hard disk:                 10-20 million ns access time; low cost/MByte

  5. Memory Hierarchy

  6. Goals of Hierarchical Memory Design
     • Achieve an average memory access time that is almost as fast as the fastest memory level
     • Achieve an average price per bit that is as low as that of the lowest memory level
     • Use lessons learned from the principle of locality for both data and instructions
     • Achieve overall high and cost-effective performance

  7. Memory Strategies
     • Make the entire memory fast
     • Separate concerns: segregate instruction memory from data memory if dual-port memory is available
     • Make memory ports wide so that larger chunks can be accessed at once
     • De-randomize memory access by clever programming or compilation
     • Encourage prefetching to the largest extent possible
     • Create a memory hierarchy to keep hot items in fast memory
     • Use the principle of locality

  8. Key Idea
     • Make the common case fast
     • Common → principle of locality
     • Fast → smaller is faster

  9. Principle of Locality
     • Temporal locality (locality in time): if a datum has been recently referenced, it is likely to be referenced again
     • Spatial locality (locality in space): when a datum is referenced, neighboring data are likely to be referenced soon
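Both kinds of locality appear in even the simplest loops. A minimal sketch (the function and array here are illustrative, not from the slides):

```python
# Both kinds of locality in one ordinary loop (illustrative example).
def sum_array(a):
    total = 0                # 'total' is touched every iteration: temporal locality
    for i in range(len(a)):
        total += a[i]        # a[0], a[1], a[2], ... are neighbors: spatial locality
    return total
```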

  10. Basic Terminology
     • Cache -- name given to the first level of memory seen from the CPU
     • Miss rate -- fraction of memory accesses that are not found in the cache
     • Miss penalty -- additional clock cycles needed to service a cache miss
     • Hit time -- time to hit in the cache
     • Block -- smallest unit of information that can be present in the cache

  11. Cache Performance Review
     CPU execution time = (CPU clock cycles + Memory stall cycles) x Clock cycle time
     Memory stall cycles = Number of misses x Miss penalty
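The two formulas translate directly into a few lines of code; this is a sketch, and the function name and parameters are illustrative:

```python
def cpu_execution_time(cpu_clock_cycles, num_misses, miss_penalty, clock_cycle_time):
    # Memory stall cycles = Number of misses x Miss penalty
    memory_stall_cycles = num_misses * miss_penalty
    # CPU execution time = (CPU clock cycles + Memory stall cycles) x Clock cycle time
    return (cpu_clock_cycles + memory_stall_cycles) * clock_cycle_time
```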

  12. Example
     • CPI = 1 when all memory accesses hit in the cache
     • Loads and stores are 50% of the instructions
     • Miss penalty is 25 clock cycles
     • Miss rate is 2%
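The slide leaves the arithmetic to the reader. Assuming every instruction requires one fetch and 50% of instructions add one data access (1.5 memory accesses per instruction; that accounting is an assumption, not stated on the slide), a worked sketch:

```python
base_cpi           = 1.0         # CPI when every access hits
accesses_per_instr = 1.0 + 0.5   # 1 instruction fetch + 0.5 data accesses (assumption)
miss_rate          = 0.02
miss_penalty       = 25          # clock cycles

stalls_per_instr = accesses_per_instr * miss_rate * miss_penalty  # 1.5 * 0.02 * 25 = 0.75
effective_cpi    = base_cpi + stalls_per_instr                    # 1.75
print(effective_cpi)  # the machine runs 1.75x slower than the all-hit ideal
```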

  13. The Cache: Main Memory Fragment
     Main memory:
     Address   Data        Address   Data
     1000      5600        1032      976
     1004      3223        1036      77554
     1008      23          1040      433
     1012      1122        1044      7785
     1016      0           1048      2447
     1020      32324       1052      775
     1024      845         1056      433
     1028      43

     Cache (the 4 most recently accessed memory locations; exploits temporal locality):
     Address   Data
     1000      5600
     1016      0
     1048      2447
     1028      43

     Issues: How do we know what's in the cache? What if the cache is full?

  14. Basic Cache Organization

  15. Basic Cache Questions
     • Block placement: where can a block be placed in the cache?
     • Block identification: how is a block found in the cache?
     • Block replacement: which block should be replaced on a miss?
     • Write strategy: what happens on a write?

  16. Q1: Block Placement
     • Fully associative: a block can go in an arbitrary location in the cache
     • Direct mapped: each block has a unique location in the cache
       Block_index = Block_address mod Number_of_blocks
     • Set associative: a block can go in an arbitrary location within a unique set
       Set_index = Block_address mod Number_of_sets
       Number_of_sets = Number_of_blocks / Number_of_blocks_per_set
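The two mod formulas can be checked with a small sketch (function names are illustrative; the example numbers are assumptions):

```python
def direct_mapped_index(block_address, num_blocks):
    # Direct mapped: the block has exactly one possible cache location.
    return block_address % num_blocks

def set_index(block_address, num_blocks, blocks_per_set):
    # Set associative: the block may go anywhere within its set.
    num_sets = num_blocks // blocks_per_set
    return block_address % num_sets

# Example: block address 75, a 64-block cache, 2 blocks per set
print(direct_mapped_index(75, 64))   # 11
print(set_index(75, 64, 2))          # 75 mod 32 = 11
```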

  17. Direct Mapping (6-bit addresses)
     Address format: Tag (2 bits) | Index (2 bits) | Always zero (2-bit word offset)

     Main memory:
     Address    Data        Address    Data
     00 00 00   5600        10 00 00   976
     00 01 00   3223        10 01 00   77554
     00 10 00   23          10 10 00   433
     00 11 00   1122        10 11 00   7785
     01 00 00   0           11 00 00   2447
     01 01 00   32324       11 01 00   775
     01 10 00   845         11 10 00   433
     01 11 00   43          11 11 00   3649

     Cache:
     Index   Valid   Tag   Data
     00      Y       00    5600
     01      Y       11    775
     10      Y       01    845
     11      N       00    33234

     In a direct-mapped cache:
     • Each memory address corresponds to one location in the cache
     • There are many different memory locations for each cache entry (four in this case)

  18. Hits and Misses
     • When the CPU reads from memory: calculate the index and tag
     • Is the data in the cache? Yes -- a hit, you're done!
     • Data not in the cache? This is a miss:
       • Read the word from memory and give it to the CPU
       • Update the cache so we won't miss again: write the data and tag for this memory location to the cache (exploits temporal locality)
     • The hit rate and miss rate are the fractions of memory accesses that are hits and misses
     • Typically, hit rates are around 95%
     • Instructions and data are often considered separately when calculating hit/miss rates
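A minimal read-path sketch of the steps above, assuming a direct-mapped cache with one-word blocks and word addresses (the dictionary-based cache and memory structures are illustrative only):

```python
def read(cache, memory, address):
    num_blocks = len(cache)
    index = address % num_blocks      # calculate the index...
    tag = address // num_blocks       # ...and the tag
    entry = cache[index]
    if entry["valid"] and entry["tag"] == tag:
        return entry["data"]          # hit: done!
    data = memory[address]            # miss: read the word from memory
    cache[index] = {"valid": True, "tag": tag, "data": data}  # update so we won't miss again
    return data

# usage: cache = [{"valid": False, "tag": 0, "data": 0} for _ in range(64)]
```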

  19. A 1024-entry Direct-mapped Cache
     Memory address fields: Tag = bits 31-12 (20 bits), Index = bits 11-2 (10 bits), Byte offset = bits 1-0
     Each of the 1024 entries (index 0 to 1023) holds a valid bit (V), a 20-bit tag, and one 32-bit data block
     The index selects an entry; if the stored tag matches the address tag and V is set: Hit! The data is returned
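A sketch of the address decomposition for this cache (the function name is illustrative):

```python
def split_address(addr):
    """Split a 32-bit byte address for a 1024-entry, one-word-block cache."""
    byte_offset = addr & 0x3           # bits 1-0 (2 bits)
    index = (addr >> 2) & 0x3FF        # bits 11-2 (10 bits -> 1024 entries)
    tag = addr >> 12                   # bits 31-12 (20 bits)
    return tag, index, byte_offset
```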

  20. Direct-mapped Cache Size
     • Cache size for storing 64 KB of data?
     • Cache organized as 1-word blocks; 32-bit addresses assumed
     • 64 KB = 16 KW = 2^14 words
     • Each block has: 32 data bits, 1 valid bit, and 32 - 14 - 2 = 16 tag bits
     • Cache size = 2^14 x 49 bits = 784 Kbits
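The same arithmetic in a few lines, mirroring the slide's numbers:

```python
words          = 64 * 1024 // 4       # 64 KB of data = 2**14 one-word blocks
tag_bits       = 32 - 14 - 2          # address bits - index bits - byte-offset bits = 16
bits_per_entry = 32 + tag_bits + 1    # data + tag + valid bit = 49 bits
total_kbits    = words * bits_per_entry // 1024
print(total_kbits)                    # 784 Kbits
```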

  21. Taking Advantage of Spatial Locality
     • The cache block should be larger than one word
     • On a cache miss, fetch multiple adjacent instructions/data words
     • Mapping an address to a multiword cache of 64 blocks, each of 16 bytes:
       Byte address 1200
       Block address = 1200 / 16 = 75
       Cache block = 75 modulo 64 = 11
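The same mapping written out as code (a sketch of the slide's computation):

```python
byte_address    = 1200
bytes_per_block = 16
num_blocks      = 64

block_address = byte_address // bytes_per_block   # 1200 / 16 = 75
cache_block   = block_address % num_blocks        # 75 mod 64 = 11
```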

  22. 64 KB Cache Using 4-word Blocks

  23. Multi-word Block Read and Write
     • On a read miss, the entire block is read from memory
     • On a write miss, the entire block cannot simply be written: the CPU supplies only one word of the block
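One common way to handle the write-miss case is write-allocate: fetch the missing block first, then write the single word into it. The sketch below assumes that policy (the slides do not name one) and uses illustrative structures; write-back/write-through of the modified block is omitted for brevity:

```python
def write_word(cache_entry, memory, block_address, word_offset, value, words_per_block):
    # On a write miss the CPU supplies only one word, so the rest of the
    # block must first be fetched from memory (write-allocate policy).
    # For simplicity, the full block address is stored as the tag here.
    if not cache_entry["valid"] or cache_entry["tag"] != block_address:
        base = block_address * words_per_block
        cache_entry["data"] = memory[base : base + words_per_block]  # fetch whole block
        cache_entry["tag"], cache_entry["valid"] = block_address, True
    cache_entry["data"][word_offset] = value      # then write just the one word
```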

  24. Miss Rate vs. Block Size
     • Block size can be increased to take advantage of spatial locality
     • The miss rate may actually go up if the block size becomes a significant fraction of the cache size
