
ELEC 5200-001/6200-001 Computer Architecture and Design Spring 2007 Memory Organization (Chapter 7)


Presentation Transcript


  1. ELEC 5200-001/6200-001Computer Architecture and DesignSpring 2007Memory Organization (Chapter 7) Vishwani D. Agrawal James J. Danaher Professor Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849 http://www.eng.auburn.edu/~vagrawal vagrawal@eng.auburn.edu ELEC 5200-001/6200-001 Lecture 13

  2. Types of Computer Memories From the cover of: A. S. Tanenbaum, Structured Computer Organization, Fifth Edition, Upper Saddle River, New Jersey: Pearson Prentice Hall, 2006.

  3. Electronic Memory Devices

  4. Random Access Memory (RAM) [Figure: address bits enter an address decoder that selects a row of the memory cell array; read/write circuits transfer the data bits.]

  5. Static RAM (SRAM) Cell [Figure: SRAM cell storing bit and its complement; a word line selects the cell, which drives the complementary bit lines.]

  6. Dynamic RAM (DRAM) Cell [Figure: one-transistor DRAM cell; the word line gates a storage capacitor onto the bit line.]

  7. Building a Computer with 1GHz Clock and 40GB Memory

  8. Trying to Buy a Laptop Computer? (Two Years Ago) IBM ThinkPad X40 23717GU: 1.20 GHz Low Voltage Intel® Pentium® M, 1MB L2 Cache ~$5; Microsoft® Windows® XP Professional; 512 MB DRAM ~$100; 40 GB Hard Drive ~$40; 2.71 lbs, 12.1" XGA (1024x768); IBM Embedded Security Subsystem 2.0; Intel PRO/Wireless Network Connection 802.11b, Gigabit Ethernet; Integrated graphics Intel Extreme Graphics 2; No CD/DVD drive; PROP, Fixed Bay. Availability**: Within 2 weeks. $2,149.00 IBM web price*, $1,741.65 sale price*

  9. 2006/07 Choose a Lenovo 3000 V Series to customize & buy: Processor Intel Core 2 Duo T5500 (1.66GHz, 2MB L2, 667MHz FSB); Total memory 512MB PC2-5300 DDR2 SDRAM; Hard drive 80GB, 5400rpm Serial ATA; Weight 4.0lbs

  10. Cache • Processor does all memory operations with the cache. • Miss – If the requested word is not in cache, a block of words containing the requested word is brought to cache, and then the processor request is completed. • Hit – If the requested word is in cache, the read or write operation is performed directly in cache, without accessing main memory. • Block – minimum amount of data transferred between cache and main memory. [Figure: processor exchanges words with a small, fast cache; the cache exchanges blocks with a large, inexpensive (slow) main memory.]

  11. Inventor of Cache M. V. Wilkes, “Slave Memories and Dynamic Storage Allocation,” IEEE Transactions on Electronic Computers, vol. EC-14, no. 2, pp. 270-271, April 1965.

  12. Cache Performance • Average access time = T1 × h + Tm × (1 – h) = Tm – (Tm – T1) × h, where • T1 = cache access time (small) • Tm = memory access time (large) • h = hit rate (0 ≤ h ≤ 1) • Hit rate is also known as hit ratio; miss rate = 1 – hit rate. [Figure: processor accesses the small, fast cache with access time T1; the large, inexpensive (slow) main memory has access time Tm.]
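The slide-12 formula can be checked with a short Python sketch (not part of the original slides; the values T1 = 1 and Tm = 20 cycles are illustrative assumptions):

```python
# Average access time of a one-level cache (slide 12):
#   T_avg = T1*h + Tm*(1 - h) = Tm - (Tm - T1)*h

def avg_access_time(t1, tm, h):
    """Average access time given cache time t1, memory time tm, hit rate h."""
    return t1 * h + tm * (1 - h)

# Assumed example values: T1 = 1 cycle, Tm = 20 cycles, hit rate 0.95
print(avg_access_time(1, 20, 0.95))  # ~1.95 cycles, close to the cache speed
```

Note how the result stays near T1 as long as the hit rate is high, which is the point of the acceptable/desirable miss-rate thresholds on the next slide.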

  13. Average Access Time [Figure: access time Tm – (Tm – T1) × h plotted against miss rate 1 – h; it falls from Tm at h = 0 to T1 at h = 1. Desirable miss rate < 5%; acceptable miss rate < 10%.]

  14. Comparing Performance • Processor with 1-cycle memory access, CPI = 1 • Assume memory access time of 10 cycles • Assume 30% of instructions require a memory data access • Processor with cache • Assume hit rate 0.95 for instructions, 0.90 for data • Assume miss penalty (time to read memory into cache and from it) is 17 cycles. Comparing times of 100 instructions: Time without cache / Time with cache = (100×10 + 30×10) / [100(0.95×1 + 0.05×17) + 30(0.9×1 + 0.1×17)] = 1300 / 258 = 5.04
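The slide-14 arithmetic can be reproduced directly; this sketch only restates the numbers already on the slide:

```python
# Slide 14: 100 instructions, 30 of which make a data access.
# Without a cache every access takes 10 cycles; with a cache a hit
# takes 1 cycle and a miss costs the 17-cycle miss penalty.

time_without = 100 * 10 + 30 * 10                    # 1300 cycles
time_with = (100 * (0.95 * 1 + 0.05 * 17)
             + 30 * (0.90 * 1 + 0.10 * 17))          # ~258 cycles

print(time_without / time_with)  # speedup of about 5.04
```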

  15. Controlling Miss Rate • Increase cache size • More blocks can be kept in cache; the chance of a miss is reduced. • A larger cache is slower. • Increase block size • More data available; reduced chance of a miss. • Fewer blocks in the cache increase the chance of a miss. • Larger blocks need more time to swap. [Figure: a cache of blocks backed by a large memory, shown for both options.]

  16. Increasing Hit Rate • Hit rate increases with cache size. • Hit rate mildly depends on block size. [Figure: miss rate (0–10%) vs. block size (16B to 256B) for cache sizes 4KB, 16KB, and 64KB; very small blocks reduce the chances of covering data locality, while very large blocks increase the chances of getting fragmented data.]

  17. The Locality Principle • A program tends to access data that form a physical cluster in the memory – multiple accesses may be made within the same block. • Physical localities are temporal and may shift over longer periods of time – data not used for some time is less likely to be used in the future. Upon a miss, the least recently used (LRU) block can be overwritten by a new block. • P. J. Denning, “The Locality Principle,” Communications of the ACM, vol. 48, no. 7, pp. 19-24, July 2005.

  18. Data Locality, Cache, Blocks [Figure: the data needed by a program forms a locality inside a large memory; increase the block size to match the locality size, and increase the cache size to include most data (Blocks 1 and 2 shown in the cache).]

  19. Types of Caches • Direct-mapped cache • Memory is divided into partitions the size of the cache • Each partition is subdivided into blocks • Set-associative cache

  20. LRU Direct-Mapped Cache [Figure: on a miss, the needed block is swapped in from memory at its fixed cache position, and the block previously there is swapped out – even if it is not the least recently used block in the cache.]

  21. LRU Set-Associative Cache [Figure: on a miss, the needed block can be swapped in at any of several allowed cache positions, so the least recently used block among them is swapped out.]

  22. Direct-Mapped Cache [Figure: a 32-word word-addressable main memory mapped onto a cache of 8 blocks, block size = 1 word; each cache entry holds a 2-bit tag, and the 3-bit index is the local cache address. Cache address: tag | index; e.g., memory address 11 101.]

  23. Direct-Mapped Cache [Figure: the same 32-word word-addressable memory with a cache of 4 blocks, block size = 2 words; cache address: tag (2 bits) | index (2 bits) | block offset (1 bit); e.g., memory address 11 10 1.]

  24. Number of Tag and Index Bits • Cache size = w words; main memory size = W words. • Each word in the cache has a unique index (local address): number of index bits = log2 w. • Index bits are shared with the block offset when a block contains more than one word. • Assume partitions of w words each in the main memory: there are W/w such partitions, each identified by a tag, so the number of tag bits = log2(W/w).
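The slide-24 bit counts can be computed mechanically; this sketch applies them to the caches of slides 22 and 23 (word addresses, no byte offset):

```python
import math

# Address-bit breakdown for a direct-mapped cache (slide 24):
# cache of w words in a memory of W words, block size b words.
#   index bits = log2(#blocks) = log2(w / b)
#   tag bits   = log2(W / w)    (number of cache-sized memory partitions)

def address_bits(W, w, b=1):
    index = int(math.log2(w // b))
    offset = int(math.log2(b))
    tag = int(math.log2(W // w))
    return tag, index, offset

# Slide 22: W = 32 words, cache of 8 one-word blocks
print(address_bits(32, 8, 1))  # (2, 3, 0): 2-bit tag, 3-bit index
# Slide 23: cache of 4 blocks x 2 words = 8 words
print(address_bits(32, 8, 2))  # (2, 2, 1): 2-bit tag, 2-bit index, 1-bit offset
```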

  25. Direct-Mapped Cache (Byte Address) [Figure: the same 32-word memory with byte addresses, cache of 8 blocks, block size = 1 word; memory address: tag (2 bits) | index (3 bits) | byte offset (2 bits); e.g., 11 101 00.]

  26. Finding a Word in Cache [Figure: the 7-bit byte address b6 b5 b4 b3 b2 b1 b0 of a 32-word byte-addressable memory splits into a 2-bit tag, 3-bit index, and 2-bit byte offset; the index selects one of 8 cache entries (valid bit, 2-bit tag, data word); a comparator matches the stored tag against the address tag: 1 = hit, 0 = miss.]

  27. How Many Bits Does the Cache Have? • Consider a main memory: • 32 words; byte address is 7 bits wide: b6 b5 b4 b3 b2 b1 b0 • Each word is 32 bits wide • Assume that the cache block size is 1 word (32 bits of data) and the cache contains 8 blocks. • The cache requires, for each word: • a 2-bit tag and one valid bit • Total storage needed in cache = #blocks in cache × (data bits/block + tag bits + valid bit) = 8 × (32 + 2 + 1) = 280 bits

  28. A More Realistic Cache • Consider a 4 GB, byte-addressable main memory: • 1G words; byte address is 32 bits wide: b31…b16 b15…b2 b1 b0 • Each word is 32 bits wide • Assume that the cache block size is 1 word (32 bits of data) and the cache contains 64 KB of data, or 16K words, i.e., 16K blocks. • Number of cache index bits = 14, because 16K = 2^14 • Tag size = 32 – byte offset – #index bits = 32 – 2 – 14 = 16 bits • The cache requires, for each word: • a 16-bit tag and one valid bit • Total storage needed in cache = #blocks in cache × (data bits/block + tag size + valid bit) = 2^14 × (32 + 16 + 1) = 16×2^10×49 = 784×2^10 bits = 784 Kb = 98 KB. Physical storage/Data storage = 98/64 = 1.53. But the block size needs to be increased to match the size of the locality.

  29. Cache Bits for 4-Word Block • Consider a 4 GB, byte-addressable main memory: • 1G words; byte address is 32 bits wide: b31…b16 b15…b2 b1 b0 • Each word is 32 bits wide • Assume that the cache block size is 4 words (128 bits of data) and the cache contains 64 KB of data, or 16K words, i.e., 4K blocks. • Number of cache index bits = 12, because 4K = 2^12 • Tag size = 32 – byte offset – #block offset bits – #index bits = 32 – 2 – 2 – 12 = 16 bits • The cache requires, for each block: • a 16-bit tag and one valid bit • Total storage needed in cache = #blocks in cache × (data bits/block + tag size + valid bit) = 2^12 × (4×32 + 16 + 1) = 4×2^10×145 = 580×2^10 bits = 580 Kb = 72.5 KB. Physical storage/Data storage = 72.5/64 = 1.13
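The storage formula used on slides 27–29 is the same in all three cases; a small sketch confirms each total:

```python
# Total cache storage (slides 27-29):
#   bits = #blocks x (data bits per block + tag bits + 1 valid bit)

def cache_bits(num_blocks, words_per_block, tag_bits, word_bits=32):
    return num_blocks * (words_per_block * word_bits + tag_bits + 1)

print(cache_bits(8, 1, 2))        # slide 27: 8 x (32+2+1) = 280 bits
print(cache_bits(2**14, 1, 16))   # slide 28: 802816 bits = 98 KB
print(cache_bits(2**12, 4, 16))   # slide 29: 593920 bits = 72.5 KB
```

The four-word block cuts the overhead ratio from 1.53 to 1.13 because one tag and one valid bit are now shared by four data words.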

  30. Using Larger Cache Block (4 Words) [Figure: the 32-bit byte address of a 4GB (1G-word) memory splits into a 16-bit tag, 12-bit index, 2-bit block offset, and 2-bit byte offset; the index selects one of 4K cache entries (valid bit, 16-bit tag, 4-word = 128-bit data block); a tag comparator produces hit/miss (1 = hit, 0 = miss) and a multiplexer controlled by the block offset selects the requested word. Cache size 16K words, block size = 4 words.]

  31. Interleaved Memory • Reduces miss penalty. • Memory designed to read the words of a block simultaneously, in one read operation. • Example: • Cache block size = 4 words • Interleaved memory with 4 banks • Suppose a memory access takes ~15 cycles • Miss penalty = 1 cycle to send address + 15 cycles to read a block + 4 cycles to send data to cache = 20 cycles • Without interleaving, miss penalty = 1 + 4×15 + 4 = 65 cycles [Figure: processor exchanges words with the cache; the cache reads blocks from a main memory organized as four banks, bank 0 through bank 3.]
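The two miss penalties on slide 31 follow from one formula; this sketch assumes, as the slide does, 1 cycle to send the address, 15 cycles per memory access, and 1 cycle per word transferred:

```python
import math

# Miss penalty for fetching a block of `words` words from a memory
# with `banks` interleaved banks (slide 31 assumptions).

def miss_penalty(words, banks, access=15):
    reads = math.ceil(words / banks) * access  # sequential bank rounds
    return 1 + reads + words                   # address + reads + transfers

print(miss_penalty(4, banks=4))  # interleaved:     1 + 15 + 4 = 20 cycles
print(miss_penalty(4, banks=1))  # no interleaving: 1 + 60 + 4 = 65 cycles
```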

  32. Handling a Miss • A miss occurs when data at the required memory address is not found in the cache. • Controller actions: • Stall the pipeline • Freeze contents of all registers • Activate a separate cache controller • If the cache is full • select the least recently used (LRU) block in the cache for overwriting • if the selected block has inconsistent data, take proper action • Copy the block containing the requested address from memory • Restart the instruction

  33. Miss During Instruction Fetch • Send original PC value (PC – 4) to the memory. • Instruct main memory to perform a read and wait for the memory to complete the access. • Write cache entry. • Restart the instruction whose fetch failed.

  34. Writing to Memory • Cache and memory become inconsistent when data is written into the cache but not to memory – the cache coherence problem. • Strategies to handle inconsistent data: • Write-through • Always write to memory and cache simultaneously. • A write to memory is ~100 times slower than a write to cache. • Write buffer • Write to cache and to a buffer for writing to memory. • If the buffer is full, the processor must wait.

  35. Writing to Memory: Write-Back • Write-back (or copy-back) writes only to the cache but sets a “dirty bit” in the block where the write is performed. • When a block with its dirty bit “on” is to be overwritten in the cache, it is first written to memory.
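The write-back policy of slides 34–35 can be sketched with a deliberately tiny model — a single cache block with a dirty bit. This is an illustrative assumption, not code from the slides:

```python
# Minimal write-back sketch: writes go only to the cache and set a dirty
# bit; a dirty block is copied to memory only when it is evicted.

class WriteBackBlock:
    def __init__(self):
        self.tag, self.data, self.dirty = None, None, False
        self.memory_writes = 0          # counts writes that reach memory

    def write(self, tag, data):
        if self.tag is not None and self.tag != tag and self.dirty:
            self.memory_writes += 1     # evicting a dirty block: copy back
        self.tag, self.data, self.dirty = tag, data, True

cache = WriteBackBlock()
cache.write(0, "A")   # fills the block, sets dirty; no memory write
cache.write(0, "B")   # same block rewritten; still no memory write
cache.write(1, "C")   # evicts dirty block 0: exactly one memory write
print(cache.memory_writes)  # 1
```

Three writes reached the cache but only one reached memory, which is why write-back beats write-through when a block is written repeatedly.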

  36. AMD Opteron Microprocessor • L2: 1MB, block 64B, write-back • L1 (split, 64KB each): block 64B, write-back

  37. Cache Hierarchy • Average access time = h1 × T1 + (1 – h1) × [h2 × T2 + (1 – h2) × Tm], where • T1 = L1 cache access time (smallest) • T2 = L2 cache access time (small) • Tm = memory access time (large) • h1, h2 = hit rates (0 ≤ h1, h2 ≤ 1) • Average access time is reduced by adding a cache. [Figure: processor → L1 cache (SRAM, access time T1) → L2 cache (DRAM, access time T2) → large, inexpensive (slow) main memory (access time Tm).]
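The two-level formula on slide 37 can be sketched the same way as the one-level case; the values T1 = 1, T2 = 10, Tm = 100 cycles below are illustrative assumptions, not from the slides:

```python
# Two-level average access time (slide 37):
#   T_avg = h1*T1 + (1 - h1) * (h2*T2 + (1 - h2)*Tm)

def avg_access_2level(t1, t2, tm, h1, h2):
    return h1 * t1 + (1 - h1) * (h2 * t2 + (1 - h2) * tm)

# Assumed example: T1=1, T2=10, Tm=100 cycles, h1=0.95, h2=0.90
print(avg_access_2level(1, 10, 100, 0.95, 0.90))  # ~1.9 cycles
```

Setting h2 = 0 recovers the one-level formula with Tm as the miss cost, matching the h2 = 0 curve on the next slide.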

  38. Average Access Time [Figure: average access time h1 × T1 + (1 – h1) × [h2 × T2 + (1 – h2) × Tm] plotted against L1 miss rate 1 – h1, for h2 = 0, 0.5, and 1 (T1 < T2 < Tm); all curves start at T1 when h1 = 1 and rise toward Tm, (T2 + Tm)/2, or T2, respectively, when h1 = 0.]

  39. Processor Performance Without Cache • 5GHz processor, cycle time = 0.2ns • Memory access time = 100ns = 500 cycles • Ignoring memory access, CPI = 1 • Considering memory access: CPI = 1 + # stall cycles = 1 + 500 = 501

  40. Performance with 1 Level Cache • Assume hit rate, h1 = 0.95 • L1 access time = 0.2ns = 1 cycle • CPI = 1 + # stall cycles = 1 + 0.95×0 + 0.05×500 = 26 • Processor speed increase due to cache = 501/26 = 19.3

  41. Performance with 2 Level Caches • Assume: • L1 hit rate, h1 = 0.95 • L2 hit rate, h2 = 0.90 • L2 access time = 5ns = 25 cycles • CPI = 1 + # stall cycles = 1 + 0.95×0 + 0.05(0.90×25 + 0.10×525) = 1 + 1.125 + 2.625 = 4.75 • Processor speed increase due to caches = 501/4.75 = 105.5 • Speed increase due to L2 cache = 26/4.75 = 5.47
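The CPI figures of slides 39–41 follow from one stall formula. Note the 525-cycle L2 miss cost is the 25-cycle L2 access plus the 500-cycle memory access:

```python
# CPI with memory stalls (slides 39-41): L1 hit = no stall,
# L2 hit = l2_cycles, L2 miss = l2_cycles + mem_cycles.

def cpi_two_level(h1, h2, l2_cycles, mem_cycles):
    stalls = (1 - h1) * (h2 * l2_cycles
                         + (1 - h2) * (l2_cycles + mem_cycles))
    return 1 + stalls

print(cpi_two_level(0.95, 0.90, 25, 500))  # ~4.75 (slide 41)
print(cpi_two_level(0.95, 0.00, 0, 500))   # ~26, the one-level case (slide 40)
```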

  42. Miss Rate of Direct-Mapped Cache [Figure: the 8-block direct-mapped cache of slide 25; the needed block maps onto an index that is already occupied, so the block at that index must be replaced even though the least recently used (LRU) block sits elsewhere in the cache.]

  43. Miss Rate of Direct-Mapped Cache [Figure: memory references to word addresses 0, 8, 0, 6, 8, 16 in the 8-block direct-mapped cache (block size = 1 word); addresses 0, 8, and 16 all map to index 000, so every reference misses: 6 misses out of 6.]
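The slide-43 trace can be simulated in a few lines; the sketch uses the slide's word addresses and an 8-block cache with one-word blocks:

```python
# Direct-mapped cache: each address maps to exactly one index,
# so a conflicting reference always evicts the resident block.

def direct_mapped_misses(trace, num_blocks=8):
    cache = [None] * num_blocks             # stored tag per index
    misses = 0
    for addr in trace:
        index, tag = addr % num_blocks, addr // num_blocks
        if cache[index] != tag:
            misses += 1
            cache[index] = tag              # replace block at this index
    return misses

print(direct_mapped_misses([0, 8, 0, 6, 8, 16]))  # 6: every reference misses
```

Addresses 0, 8, and 16 keep evicting each other at index 000 even though the rest of the cache is empty — the conflict-miss weakness of direct mapping.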

  44. Fully-Associative Cache (8-Way Set Associative) [Figure: the same 8-block cache with no index; each entry stores a full 5-bit tag (e.g., 01010), so a block may be placed anywhere, and the least recently used (LRU) block is replaced on a miss. Memory address: tag (5 bits) | byte offset; e.g., 11101 00.]

  45. Miss Rate: Fully-Associative Cache [Figure: the trace 0, 8, 0, 6, 8, 16 in the fully-associative cache; tags 00000, 01000, 00110, and 10000 occupy separate entries, so references 1 (addr 0), 2 (addr 8), 4 (addr 6), and 6 (addr 16) miss while references 3 (addr 0) and 5 (addr 8) hit: 4 misses out of 6.]
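The fully-associative result on slide 45 can be checked with an LRU cache sketch; with one-word blocks the tag is simply the word address:

```python
from collections import OrderedDict

# Fully-associative cache with LRU replacement (8 entries).

def fully_assoc_misses(trace, capacity=8):
    cache, misses = OrderedDict(), 0       # insertion order = LRU order
    for tag in trace:
        if tag in cache:
            cache.move_to_end(tag)         # mark as most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[tag] = None
    return misses

print(fully_assoc_misses([0, 8, 0, 6, 8, 16]))  # 4 misses (refs 1, 2, 4, 6)
```

Full associativity removes the conflict misses of slide 43 — only the four compulsory misses remain.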

  46. Finding a Word in Associative Cache [Figure: the 7-bit byte address b6 b5 b4 b3 b2 b1 b0 splits into a 5-bit tag and a 2-bit byte offset; there is no index, so the address tag must be compared with all tags in the cache; a match with the valid bit set gives a hit (1 = hit, 0 = miss) and selects the data word. Cache size 8 words, block size = 1 word.]

  47. Eight-Way Set-Associative Cache [Figure: the 8-word cache as one set of 8 entries, each holding a valid bit, 5-bit tag, and data word; eight comparators match the 5-bit address tag against every entry in parallel, and an 8-to-1 multiplexer selects the matching entry's data; 1 = hit, 0 = miss.]

  48. Two-Way Set-Associative Cache [Figure: 8 blocks organized as 4 sets of 2; memory address: tag (3 bits) | index (2 bits) | byte offset; e.g., 111 01 00; the index selects a set, and the needed block replaces the LRU block of that set's two entries.]

  49. Miss Rate: Two-Way Set-Associative Cache [Figure: the trace 0, 8, 0, 6, 8, 16 in the two-way set-associative cache; addresses 0, 8, and 16 map to set 00 (tags 000, 010, 100) and address 6 to set 10 (tag 001); references 1, 2, 4, and 6 miss while references 3 and 5 hit: 4 misses out of 6.]
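The slide-49 trace can be simulated with a set-associative LRU sketch: 8 one-word blocks arranged as 4 sets of 2 ways, matching the slide:

```python
# Two-way set-associative cache with per-set LRU replacement.

def set_assoc_misses(trace, num_sets=4, ways=2):
    sets = [[] for _ in range(num_sets)]   # each set: tags in LRU order
    misses = 0
    for addr in trace:
        index, tag = addr % num_sets, addr // num_sets
        entries = sets[index]
        if tag in entries:
            entries.remove(tag)            # refresh LRU position on a hit
        else:
            misses += 1
            if len(entries) == ways:
                entries.pop(0)             # evict the set's LRU way
        entries.append(tag)                # most recently used at the end
    return misses

print(set_assoc_misses([0, 8, 0, 6, 8, 16]))  # 4 misses (refs 1, 2, 4, 6)
```

For this trace, two ways per set are enough to match the fully-associative miss count of slide 45 at a fraction of the comparator cost.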

  50. Two-Way Set-Associative Cache [Figure: hardware for the two-way cache; the 7-bit byte address b6 b5 b4 b3 b2 b1 b0 splits into a 3-bit tag, 2-bit index, and 2-bit byte offset; the index selects one of 4 sets (each entry: valid bit, tag, data), two comparators match the address tag against both ways, and a 2-to-1 multiplexer selects the hitting way's data; 1 = hit, 0 = miss. Cache size 8 words, block size = 1 word.]
