
CMP 301A Computer Architecture 1 Lecture 2


carina

Presentation Transcript


  1. CMP 301A Computer Architecture 1 Lecture 2

  2. Outline • Direct mapped caches: Reading and writing policies • Measuring cache performance • Improving cache performance • Enhancing main memory performance • Flexible placement of blocks: Associativity • Multilevel caches

  3. Read and Write Policies • Cache read is much easier to handle than cache write: • Instruction cache is much easier to design than data cache • Cache write: • How do we keep data in the cache and memory consistent? • Two write options: • Write Through: write to cache and memory at the same time. • Isn’t memory too slow for this? • Write Back: write to cache only. Write the cache block to memory when that cache block is being replaced on a cache miss. • Needs a “dirty” bit for each cache block • Greatly reduces the memory bandwidth requirement • Control can be complex
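The difference in memory traffic between the two write options can be sketched with a toy model. This is a minimal illustration, not the lecture's design: the single-block cache, the `OneBlockCache` class, and the `count_memory_writes` helper are all hypothetical, and a whole block is treated as one value.

```python
class OneBlockCache:
    """Toy single-block cache used only to contrast the two write policies."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.tag = None          # which memory block is currently cached
        self.data = None
        self.dirty = False       # only meaningful for write back
        self.memory = {}         # stand-in for main memory
        self.memory_writes = 0   # counts writes that reach memory

    def _write_memory(self, tag, data):
        self.memory[tag] = data
        self.memory_writes += 1

    def store(self, tag, data):
        if self.tag != tag:
            # Miss: the old block is replaced; write back flushes it if dirty.
            if self.write_back and self.dirty:
                self._write_memory(self.tag, self.data)
            self.tag = tag
            self.dirty = False
        self.data = data
        if self.write_back:
            self.dirty = True               # defer the memory update
        else:
            self._write_memory(tag, data)   # write through: update memory now


def count_memory_writes(write_back, stores):
    """Run a sequence of (tag, data) stores and return the memory write count."""
    cache = OneBlockCache(write_back)
    for tag, data in stores:
        cache.store(tag, data)
    return cache.memory_writes


# Three stores to block 0, then one store to block 1 (which replaces block 0).
stores = [(0, "a"), (0, "b"), (0, "c"), (1, "d")]
write_through_traffic = count_memory_writes(False, stores)
write_back_traffic = count_memory_writes(True, stores)
```

With this store sequence, write through sends every store to memory, while write back only writes block 0 once, when it is replaced.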

  4. Write Buffer for Write Through Cache • [Figure: the Processor writes to the Cache and to a Write Buffer; the Write Buffer feeds DRAM] • A Write Buffer is needed between the Cache and Memory • Processor: writes data into the cache and the write buffer • Memory controller: writes contents of the buffer to memory • Write buffer is just a FIFO: • Typical number of entries: 4 • Works fine if: Store frequency (w.r.t. time) << 1 / DRAM write cycle • Memory system designer’s nightmare: • Store frequency (w.r.t. time) -> 1 / DRAM write cycle • Write buffer saturation • Problem: the write buffer may hold the updated value of a location needed by a read miss
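A FIFO write buffer of this kind might be modeled as follows. This is a sketch under assumptions: the `WriteBuffer` class and its method names are hypothetical, and the `lookup` check on a read miss is one common remedy for the stale-read problem, not something the slide prescribes.

```python
from collections import deque


class WriteBuffer:
    """Hypothetical FIFO write buffer sitting between the cache and DRAM."""

    def __init__(self, capacity=4):
        self.capacity = capacity     # the slide's typical size is 4 entries
        self.entries = deque()       # oldest pending write at the left

    def push(self, address, value):
        """Processor side: returns False when full (saturation, must stall)."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append((address, value))
        return True

    def drain_one(self, memory):
        """Memory controller side: retire the oldest pending write to memory."""
        if self.entries:
            address, value = self.entries.popleft()
            memory[address] = value

    def lookup(self, address):
        """Read-miss check: newest pending value for address, or None."""
        for pending_address, value in reversed(self.entries):
            if pending_address == address:
                return value
        return None


memory = {0x10: 1}
buffer = WriteBuffer()
buffer.push(0x10, 2)                    # store is buffered, memory is still stale
pending = buffer.lookup(0x10)           # a read miss must see the buffered value
stale = memory[0x10]
buffer.drain_one(memory)                # controller writes the entry to memory
```

Pushing faster than `drain_one` is called fills the buffer, which is the saturation case: `push` starts returning `False` and the processor would have to stall.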

  5. Write Allocate versus Not Allocate • Assume: a 16-bit write to memory location 0x0 causes a miss • Do we read in the rest of the block (Bytes 2, 3, ... 31)? • Yes: Write Allocate • No: Write Not Allocate • [Figure: 1 KB direct-mapped cache with 32-byte blocks. Address fields: Cache Tag (bits 31-10, ex: 0x00), Cache Index (bits 9-5, ex: 0x00), Byte Select (bits 4-0, ex: 0x00). Each of the 32 entries holds a Valid Bit, a Cache Tag, and Cache Data: entry 0 holds Bytes 0-31, entry 1 holds Bytes 32-63, ..., entry 31 holds Bytes 992-1023]
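The address split drawn on this slide can be sketched in a few lines. The constants assume the cache in the figure (32 entries of 32 bytes, so 5 byte-select bits and 5 index bits); the function name `split_address` is hypothetical.

```python
BLOCK_BYTES = 32   # bytes per cache block, giving 5 byte-select bits
NUM_BLOCKS = 32    # entries in the direct-mapped cache, giving 5 index bits
OFFSET_BITS = 5
INDEX_BITS = 5


def split_address(address):
    """Split a byte address into (tag, index, byte_select)."""
    byte_select = address & (BLOCK_BYTES - 1)            # bits 4..0
    index = (address >> OFFSET_BITS) & (NUM_BLOCKS - 1)  # bits 9..5
    tag = address >> (OFFSET_BITS + INDEX_BITS)          # bits 31..10
    return tag, index, byte_select


first_byte = split_address(0x0)     # the write from the slide's example
last_byte = split_address(1023)     # last byte held by entry 31
wrapped = split_address(1024)       # same index as 0x0, different tag
```

Addresses 0 and 1024 map to the same cache entry and differ only in the tag, which is why the stored tag must be compared on every access.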

  6. Measuring cache performance: Impact of cache miss on Performance • Suppose a processor executes at • Clock Rate = 1 GHz (1 ns per cycle), Ideal (no misses) CPI = 1.1 • 50% arith/logic, 30% ld/st, 20% control • Suppose that 10% of memory operations (involving data) get 100 cycle miss penalty • Suppose that 1% of instructions get same miss penalty • CPI = 1.1 + (0.30 x 0.10 x 100) + (0.01 x 100) = 1.1 + 3.0 + 1.0 = 5.1 • Stall cycles are 4.0 of the 5.1 cycles per instruction: about 78% of the time the processor is stalled waiting for memory!
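To experiment with other miss rates or penalties, the same calculation can be wrapped in a small function. This is a sketch; the function and parameter names are hypothetical, and the inputs below are the slide's numbers.

```python
def effective_cpi(ideal_cpi, data_ops_per_instr, data_miss_rate,
                  instr_miss_rate, miss_penalty):
    """Ideal CPI plus average memory stall cycles per instruction."""
    data_stalls = data_ops_per_instr * data_miss_rate * miss_penalty
    instr_stalls = instr_miss_rate * miss_penalty   # one fetch per instruction
    return ideal_cpi + data_stalls + instr_stalls


IDEAL_CPI = 1.1
cpi = effective_cpi(IDEAL_CPI, data_ops_per_instr=0.30, data_miss_rate=0.10,
                    instr_miss_rate=0.01, miss_penalty=100)
stall_fraction = (cpi - IDEAL_CPI) / cpi   # share of cycles spent stalled
print(f"CPI = {cpi:.1f}, stalled {stall_fraction:.0%} of the time")
```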

  7. Improving Cache Performance • Average memory access time (AMAT) = Hit time + Miss rate x Miss penalty • To improve performance: • reduce the hit time • reduce the miss rate • reduce the miss penalty
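The AMAT formula translates directly into code. The numbers below (1-cycle hit, 5% miss rate, 100-cycle penalty) are made up for illustration and do not come from the lecture.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same unit as hit_time and miss_penalty."""
    return hit_time + miss_rate * miss_penalty


baseline = amat(hit_time=1, miss_rate=0.05, miss_penalty=100)
lower_miss_rate = amat(hit_time=1, miss_rate=0.025, miss_penalty=100)
lower_penalty = amat(hit_time=1, miss_rate=0.05, miss_penalty=50)
```

Under these assumed numbers, halving either the miss rate or the miss penalty has the same effect on AMAT, since only their product enters the formula.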

  8. Enhancing main memory performance • Increasing memory and bus width • Transfer more words every clock cycle • Isn’t that too much wiring? • Using interleaved memory organization • Reduce access time with less wiring • Double Data Rate (DDR) DRAMs

  9. Enhancing main memory performance (Cont)

  10. Flexible placement of blocks: Associativity • [Figure: where memory block 12 (of memory blocks 0-31) can be placed in an 8-block cache] • Fully associative: block 12 can be placed anywhere • (2-way) set associative, sets 0-3: anywhere in set 0 (12 mod 4) • Direct mapped, blocks 0-7: only into block 4 (12 mod 8)
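The three placements for block 12 follow from one rule: set index = block number mod number of sets. A minimal sketch, assuming the 8-block cache from the figure and that each set occupies consecutive cache blocks; `candidate_blocks` is a hypothetical helper name.

```python
def candidate_blocks(block_number, num_cache_blocks, ways):
    """Return (set_index, cache blocks where the memory block may be placed)."""
    num_sets = num_cache_blocks // ways
    set_index = block_number % num_sets
    first = set_index * ways            # sets occupy consecutive cache blocks
    return set_index, list(range(first, first + ways))


direct_mapped = candidate_blocks(12, num_cache_blocks=8, ways=1)
two_way = candidate_blocks(12, num_cache_blocks=8, ways=2)
fully_associative = candidate_blocks(12, num_cache_blocks=8, ways=8)
```

Direct mapped is the 1-way case (8 sets of one block) and fully associative is the 8-way case (a single set holding every block).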

  11. Flexible placement of blocks: Associativity

  12. A Two-way Set Associative Cache • N-way set associative: N entries for each Cache Index • N direct mapped caches operate in parallel • Example: Two-way set associative cache • Cache Index selects a “set” from the cache • The two tags in the set are compared in parallel • Data is selected based on the tag result • [Figure: the Cache Index selects one entry (Valid, Cache Tag, Cache Data) from each of the two ways; each stored tag is compared with the Adr Tag; the compare outputs (Sel1, Sel0) drive a Mux that selects the Cache Block and are ORed to produce Hit]
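The lookup steps on this slide can be sketched in software. This is only a model: hardware compares the two tags in parallel, while the loop below checks them one after another, and the data layout (`cache_sets` as a list of sets, each a list of `(valid, tag, data)` ways) is an assumption made for the example.

```python
def lookup(cache_sets, tag, index):
    """Return (hit, data) for an access with the given tag and cache index."""
    selected_set = cache_sets[index]          # the Cache Index selects a set
    for valid, way_tag, data in selected_set:
        if valid and way_tag == tag:          # tag compare, qualified by valid
            return True, data                 # mux selects this way's block
    return False, None                        # no way matched: miss


# Two sets, two ways each; contents are made up for illustration.
cache_sets = [
    [(True, 0x1A, "block A"), (True, 0x2B, "block B")],
    [(True, 0x1A, "block C"), (False, 0x00, None)],
]
hit_way1 = lookup(cache_sets, tag=0x2B, index=0)
miss_wrong_set = lookup(cache_sets, tag=0x2B, index=1)
miss_invalid = lookup(cache_sets, tag=0x00, index=1)
```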

  13. And yet Another Extreme Example: Fully Associative • Fully Associative Cache -- push the set associative idea to its limit! • Forget about the Cache Index • Compare the Cache Tags of all cache entries in parallel • Example: with 32 B blocks, we need N 27-bit comparators (one per cache entry) • By definition: Conflict Miss = 0 for a fully associative cache • [Figure: address fields are Cache Tag (bits 31-5, 27 bits long) and Byte Select (bits 4-0, ex: 0x01); every entry holds a Valid Bit, a Cache Tag, and 32 bytes of Cache Data, and each stored tag is compared (X) with the address tag]

  14. Replacement Policy • In an associative cache, which block from a set should be evicted when the set becomes full? • Random • Least-Recently Used (LRU) • LRU cache state must be updated on every access • True implementation only feasible for small sets (2-way) • First-In, First-Out (FIFO) a.k.a. Round-Robin • Used in highly associative caches • Not-Most-Recently Used (NMRU) • FIFO with exception for most-recently used block or blocks • Replacement only happens on misses
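LRU for a single set can be sketched with Python's `collections.OrderedDict`. This is a software model for illustration, not how hardware tracks recency; the `LRUSet` class is hypothetical and stores tags only.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with LRU replacement; keys are ordered oldest-first."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # tag -> data (data unused in this sketch)

    def access(self, tag):
        """Return True on a hit; on a miss, insert tag, evicting the LRU block if full."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)       # LRU state updated on every access
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)    # evict the least recently used tag
        self.blocks[tag] = None                # replacement happens only on a miss
        return False


two_way_set = LRUSet(ways=2)
hits = [two_way_set.access(tag) for tag in ["A", "B", "A", "C", "B", "A"]]
resident = list(two_way_set.blocks)
```

In this trace the hit on "A" makes "B" the least recently used block, so the miss on "C" evicts "B" rather than "A".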
