
Multilevel Memory Caches






Presentation Transcript


  1. Multilevel Memory Caches. Prof. Sirer, CS 316, Cornell University

  2. Storage Hierarchy

     Technology       Capacity   Cost/GB    Latency
     Tape             1 TB       $0.17      100 s
     Disk             300 GB     $0.34      4 ms
     DRAM             4 GB       $520       20 ns
     SRAM (off chip)  512 KB     $123,000   5 ns
     SRAM (on chip)   16 KB      ???        2 ns

  [Diagram: storage pyramid from SRAM on chip at the top, through SRAM off chip, DRAM, and Disk, down to Tape]
  • Capacity and latency are closely coupled; cost per GB is inversely related to both
  • How do we create the illusion of large and fast memory?

  3. Memory Hierarchy
  • Principle: hide latency using small, fast memories called caches
  • Caches exploit locality
  • Temporal locality: if a memory location is referenced, it is likely to be referenced again in the near future
  • Spatial locality: if a memory location is referenced, locations near it are likely to be referenced in the near future (see the sketch below)
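
Both kinds of locality show up in ordinary code. A minimal sketch in C, assuming a simple array-summing function (the name sum_array is hypothetical, not from the slides):

   /* total is touched on every iteration (temporal locality),
      and a[i] walks consecutive addresses (spatial locality). */
   int sum_array(const int *a, int n) {
       int total = 0;
       for (int i = 0; i < n; i++)
           total += a[i];
       return total;
   }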

  4. Cache Lookups (Read)
  • Look at the address issued by the processor and search the cache tags to see if that block is in the cache
  • Hit: the block is in the cache; return the requested data
  • Miss: the block is not in the cache; read the line from memory, evict an existing line from the cache, place the new line in the cache, and return the requested data

  5. Cache Organization
  • Cache has to be fast and small
  • Gain speed by performing lookups in parallel, which requires die real estate
  • Reduce the hardware required by limiting where in the cache a block might be placed
  • Three common designs:
  • Fully associative: a block can be anywhere in the cache
  • Direct mapped: a block can only be in one line in the cache
  • Set-associative: a block can be in a few (2 to 8) places in the cache

  6. Tags and Offsets
  • Cache block size determines cache organization (a bit-splitting sketch follows below)
  [Diagram: a 32-bit virtual address, bits 31..0, divided into a Tag field in bits 31..5 and an Offset field in bits 4..0 that selects a byte within the Block]
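
A small example of splitting an address along these lines, assuming the 32-byte blocks implied by the five offset bits in the diagram (the macro and function names are hypothetical):

   #include <stdint.h>

   #define OFFSET_BITS 5   /* 32-byte blocks: offset occupies bits 4..0 */

   /* The offset selects a byte within the block; the remaining high bits
      form the tag (a direct-mapped or set-associative cache would also
      carve an index out of the low tag bits). */
   uint32_t block_offset(uint32_t addr) { return addr & ((1u << OFFSET_BITS) - 1); }
   uint32_t block_tag(uint32_t addr)    { return addr >> OFFSET_BITS; }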

  7. Fully Associative Cache
  [Diagram: every line holds a valid bit (V), a Tag, and a data Block; the address tag is compared against all line tags in parallel, the comparator outputs are encoded into a line select and a hit signal, and the offset drives word/byte select within the block]

  8. Direct Mapped Cache
  [Diagram: the address is split into Tag, Index, and Offset; the Index selects exactly one (V, Tag, Block) line, and a single comparator checks that line's tag against the address tag; a sketch follows below]
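
A minimal lookup sketch in C, assuming hypothetical sizes (32-byte blocks, 64 lines, 32-bit addresses):

   #include <stdbool.h>
   #include <stdint.h>

   #define OFFSET_BITS 5
   #define INDEX_BITS  6
   #define NLINES      (1u << INDEX_BITS)

   struct dm_line { bool valid; uint32_t tag; uint8_t block[1u << OFFSET_BITS]; };
   static struct dm_line cache[NLINES];

   /* Hit only if the single line selected by the index is valid
      and its stored tag matches the address tag. */
   bool dm_hit(uint32_t addr) {
       uint32_t index = (addr >> OFFSET_BITS) & (NLINES - 1);
       uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
       return cache[index].valid && cache[index].tag == tag;
   }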

  9. 2-Way Set-Associative Cache
  [Diagram: the Index selects one set holding two (V, Tag, Block) ways; two comparators check both ways' tags against the address tag in parallel, and the Offset selects within the block; a sketch follows below]
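
The same lookup extended to two ways, again with hypothetical sizes; the index now picks a set, and both ways are checked (in parallel in hardware, sequentially in this software sketch):

   #include <stdbool.h>
   #include <stdint.h>

   #define OFFSET_BITS 5
   #define INDEX_BITS  6
   #define NSETS       (1u << INDEX_BITS)
   #define NWAYS       2

   struct sa_way { bool valid; uint32_t tag; };
   static struct sa_way sets[NSETS][NWAYS];

   /* Hit if either way of the selected set holds a valid, matching tag. */
   bool sa_hit(uint32_t addr) {
       uint32_t index = (addr >> OFFSET_BITS) & (NSETS - 1);
       uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
       for (int w = 0; w < NWAYS; w++)
           if (sets[index][w].valid && sets[index][w].tag == tag)
               return true;
       return false;
   }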

  10. Valid Bits
  • Valid bits indicate whether a cache line contains an up-to-date copy of the values in memory
  • Must be 1 for a hit
  • Reset to 0 on power-up
  • An item can be removed from the cache by setting its valid bit to 0

  11. Eviction
  • Which cache line should be evicted from the cache to make room for a new line?
  • Direct-mapped: no choice; the line selected by the index must be evicted
  • Associative caches have options:
  • random: select one of the lines at random
  • round-robin: cycle through the lines in order (similar to random in practice)
  • FIFO: replace the oldest line
  • LRU: replace the line that has not been used for the longest time (see the sketch below)
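
A sketch of LRU victim selection for one set, assuming a per-way timestamp refreshed on every access (a software approximation; real hardware typically uses cheaper schemes such as pseudo-LRU bits):

   #include <stdint.h>

   #define NWAYS 4

   struct way_meta { uint64_t last_used; };  /* updated on each access */

   /* Evict the way whose most recent access is oldest. */
   int lru_victim(const struct way_meta set[NWAYS]) {
       int victim = 0;
       for (int w = 1; w < NWAYS; w++)
           if (set[w].last_used < set[victim].last_used)
               victim = w;
       return victim;
   }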

  12. Cache Writes
  [Diagram: CPU sends addr and data to the cache (SRAM), which sits in front of main memory (DRAM)]
  • No-Write: writes invalidate the cache and go directly to memory
  • Write-Through: writes go to main memory and the cache
  • Write-Back: write the cache; write main memory only when the block is evicted (the two updating policies are contrasted in the sketch below)
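
A sketch contrasting the two updating policies on a write hit; the line structure and memory pointer are hypothetical stand-ins for the hardware paths:

   #include <stdbool.h>
   #include <stdint.h>

   struct line { uint32_t data; bool dirty; };

   /* Write-through: update the cache line and memory on every write. */
   void write_through(struct line *ln, uint32_t *mem_word, uint32_t v) {
       ln->data  = v;
       *mem_word = v;
   }

   /* Write-back: update only the cache line and mark it dirty;
      memory is brought up to date later, when the line is evicted. */
   void write_back(struct line *ln, uint32_t v) {
       ln->data  = v;
       ln->dirty = true;
   }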

  13. Dirty Bits and Write-Back Buffers
  • Dirty bits indicate which lines have been written
  • Dirty bits enable the cache to handle multiple writes to the same cache line without having to go to memory
  • Write-back buffer: a queue where dirty lines are placed (see the sketch below)
  • Items are added to the end as dirty lines are evicted from the cache
  • Items are removed from the front as memory writes are completed
  [Diagram: each line carries a dirty bit (D) and a valid bit (V) alongside its Tag and data (Byte 0, Byte 1, ..., Byte N)]
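
A sketch of the write-back buffer as a fixed-size FIFO ring (sizes hypothetical; full/empty checks omitted for brevity):

   #include <stdint.h>

   #define WB_SLOTS 8

   struct wb_entry { uint32_t addr; uint32_t data; };

   static struct wb_entry wb[WB_SLOTS];
   static unsigned wb_head, wb_tail;  /* remove at head, add at tail */

   /* Evicted dirty lines are appended at the tail... */
   void wb_push(uint32_t addr, uint32_t data) {
       wb[wb_tail++ % WB_SLOTS] = (struct wb_entry){ addr, data };
   }

   /* ...and drained from the head as memory writes complete. */
   struct wb_entry wb_pop(void) {
       return wb[wb_head++ % WB_SLOTS];
   }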

  14. Misses
  • Three types of misses:
  • Cold: the line is being referenced for the first time
  • Capacity: the line was evicted because the cache was not large enough
  • Conflict: the line was evicted because of another access whose index conflicted
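
As a concrete illustration of a conflict miss, take the hypothetical direct-mapped parameters sketched earlier (32-byte blocks, 64 lines): addresses 0x0000 and 0x0800 both map to index 0 but carry different tags, so a loop that alternates between them evicts the other's line on every access and misses every time, even though the rest of the cache sits empty.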

  15. Cache Design
  • Parameters that need to be determined:
  • Block size
  • Number of ways
  • Eviction policy
  • Write policy
  • Separate I-cache from D-cache

  16. Virtual vs. Physical Caches
  • A physical cache works on physical addresses [diagram: CPU → MMU → cache (SRAM) → memory (DRAM)]
  • A virtual cache works on virtual addresses [diagram: CPU → cache (SRAM) → MMU → memory (DRAM)]
  • L1 (on-chip) caches are typically virtual
  • L2 (off-chip) caches are typically physical

  17. Cache Conscious Programming
  • Speed up this program:

     int a[NCOL][NROW];
     int sum = 0;
     for(i = 0; i < NROW; ++i)
         for(j = 0; j < NCOL; ++j)
             sum += a[j][i];

  18. Cache Conscious Programming

     int a[NCOL][NROW];
     int sum = 0;
     for(i = 0; i < NROW; ++i)
         for(j = 0; j < NCOL; ++j)
             sum += a[j][i];

  • Every access is a cache miss! The inner loop varies j, so successive accesses to a[j][i] are a full row (NROW ints) apart in memory

  19. Cache Conscious Programming

     int a[NCOL][NROW];
     int sum = 0;
     for(j = 0; j < NCOL; ++j)
         for(i = 0; i < NROW; ++i)
             sum += a[j][i];

  • Same program, trivial transformation: the loops are interchanged so the inner loop walks consecutive addresses, and 3 out of 4 accesses hit in the cache
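
Where does 3 out of 4 come from? Assuming 4-byte ints and 16-byte cache lines (hypothetical sizes; the exact ratio depends on the line size), each miss fills a line holding four consecutive array elements, so every miss is followed by three hits: a 75% hit rate. In the original loop order, successive accesses are NROW × 4 bytes apart, so whenever that stride exceeds the line size, every access lands in a different line and misses.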
