
CS 147 Cache Memory


Presentation Transcript


  1. CS 147 Cache Memory Lecture 14 Prof. Sin-Min Lee Department of Computer Science

  2. Memory: Capacity • Word size: # of bits in the natural unit of organization • Usually related to the length of an instruction or the number of bits used to represent an integer • Capacity expressed as a number of words or a number of bytes • Usually a power of 2, e.g. 1 KB = 1024 bytes. Why? Because an m-bit address reaches exactly 2^m locations, so natural sizes are powers of 2

  3. Other Memory System Characteristics • Unit of transfer: number of bits read from, or written into, memory at a time • Internal: usually governed by data bus width • External: usually a block of words, e.g. 512 or more • Addressable unit: smallest location which can be uniquely addressed • Internal: word or byte • External: device dependent, e.g. a disk “cluster”

  4. Sequential Access Method • Start at the beginning – read through in order • Access time depends on the location of the data and the previous location • e.g. tape [diagram: start at the first location and read through to the location of interest]

  5. Direct Access • Individual blocks have unique addresses • Access is by jumping to the vicinity plus a sequential search (or waiting! e.g. waiting for the disk to rotate) • Access time depends on the target location and the previous location • e.g. disk [diagram: jump to the vicinity of block i, then read through to it]

  6. PRIMARY MEMORY The memory is that part of the computer where programs and data are stored. Some computer scientists (especially British ones) use the term store or storage rather than memory, although more and more, the term "storage" is used to refer to disk storage.

  7. MEMORY ADDRESSES Memories consist of a number of cells (or locations), each of which can store a piece of information. Each cell has a number, called its address, by which programs can refer to it. If a memory has n cells, they will have addresses 0 to n − 1. All cells in a memory contain the same number of bits. If a cell consists of k bits, it can hold any one of 2^k different bit combinations.

  8. MEMORY ADDRESSES Computers that use the binary number system (including octal and hexadecimal notation for binary numbers) express memory addresses as binary numbers. If an address has m bits, the maximum number of cells addressable is 2^m.

  9. For example, an address used to reference a memory of 12 cells (numbered 0 to 11) needs at least 4 bits in order to express all the numbers from 0 to 11.

  10. For a memory of 8 cells, a 3-bit address is sufficient. The number of bits in the address determines the maximum number of directly addressable cells in the memory and is independent of the number of bits per cell.

  11. A memory with 2^12 cells of 8 bits each and a memory with 2^12 cells of 64 bits each both need 12-bit addresses. The number of bits per cell has varied among commercially sold computers.
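The address arithmetic on slides 7–11 is easy to check in code. A minimal Python sketch (my illustration, not part of the original slides):

```python
import math

def address_bits(num_cells: int) -> int:
    # Minimum address width (in bits) that gives every cell a unique address.
    return math.ceil(math.log2(num_cells))

def max_cells(m: int) -> int:
    # Maximum number of directly addressable cells with an m-bit address.
    return 2 ** m

print(address_bits(12))   # 4  -- slide 9: cells 0..11 need a 4-bit address
print(address_bits(8))    # 3  -- slide 10: a 3-bit address is sufficient
print(max_cells(12))      # 4096 cells (2^12), regardless of bits per cell
```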

  12. Random Access Method • Individual addresses identify specific locations • Access time is independent of location and of previous accesses • e.g. RAM main memory types

  13. Problem: CPU Fast, Memory Slow • After a memory request, the CPU will not get the word for several cycles • Two simple solutions: • Continue execution, but stall CPU if an instruction references the word before it has arrived (hardware) • Require compiler to fetch words before they are needed (software) • May need to insert NOP instructions • Very difficult to write compilers to do this effectively

  14. The Root of the Problem: Economics • Fast memory is possible, but to run at full speed, it needs to be located on the same chip as the CPU • Very expensive • Limits the size of the memory • Do we choose: • A small amount of fast memory? • A large amount of slow memory?

  15. Memory Hierarchy Design (1) • Microprocessor performance improved 35% per year until 1987 and 55% per year since • This picture shows CPU performance against memory access time improvements over the years • Clearly there is a processor–memory performance gap that computer architects must take care of

  17. Memory Hierarchy Design (2) • It is a tradeoff between size, speed and cost, and it exploits the principle of locality • Register • Fastest memory element, but small storage; very expensive • Cache • Fast and small compared to main memory; acts as a buffer between the CPU and main memory: it contains the most recently used memory locations (address and contents are recorded here) • Main memory is the RAM of the system • Disk storage – HDD

  18. Memory Hierarchy Design (3) • Comparison between different types of memory (left to right: larger, slower, cheaper)

              Register      Cache          Memory       HDD
  size:       32 – 256 B    32 KB – 4 MB   128 MB       20 GB
  speed:      2 ns          4 ns           60 ns        8 ms
  $/MByte:    –             $100/MB        $1.50/MB     $0.05/MB

  19. The Best of Both Worlds: Cache Memory • Combine a small amount of fast memory (the cache) with a large amount of slow memory • When a word is referenced, put it and its neighbours into the cache • Programs do not access memory randomly • Temporal locality: recently accessed items are likely to be used again • Spatial locality: the next access is likely to be near the last one

  20. The Cache Hit Ratio • How often is a word found in the cache? • Suppose a word is accessed k times in a short interval • 1 reference to main memory • (k − 1) references to the cache • The cache hit ratio h is then h = (k − 1) / k

  21. Reasons why we use cache • Cache memory is made of STATIC RAM – a transistor-based RAM that has very low access times (fast) • STATIC RAM is, however, bulky and very expensive • Main memory is made of DYNAMIC RAM – a capacitor-based RAM that has much higher access times, in part because it has to be constantly refreshed (slow) • DYNAMIC RAM cells are much smaller and cheaper per bit

  22. Performance (Speed) • Access time • Time between presenting the address and getting the valid data (memory or other storage) • Memory cycle time • Some time may be required for the memory to “recover” before the next access • cycle time = access time + recovery time • Transfer rate • rate at which data can be moved • for random access memory: 1 / cycle time, i.e. (cycle time)^−1

  23. Memory Hierarchy • size? speed? cost? • registers – in CPU: smallest, fastest, most expensive, most frequently accessed • internal memory – may include one or more levels of cache: medium size, quick, price varies • external memory – backing store: largest, slowest, cheapest, least frequently accessed

  24. Memory: Location • Registers: inside the CPU • Fastest – on the CPU chip • Cache: very fast, semiconductor, close to the CPU • Internal or main memory • Typically semiconductor media (transistors) • Fast, random access, on the system bus • External or secondary memory • peripheral storage devices (e.g. disk, tape) • Slower, often magnetic media, maybe on a slower bus

  25. Memory Hierarchy – Diagram [diagram: going down the hierarchy, cost per bit, speed, and access frequency decrease while capacity and access time increase]

  26. Performance & Hierarchy List • Faster, more $/byte → slower, less $/byte: • Registers • Level 1 Cache • Level 2 Cache • Main memory • Disk cache • Disk • Optical • Tape • (more on cache soon – 2 slides!)

  27. Locality of Reference (circa 1968) • During program execution memory references tend to cluster, e.g. loops • Many instructions in localized areas of the program are executed repeatedly during some time period, and the remainder of the program is accessed infrequently (Tanenbaum) • Temporal LOR: a recently executed instruction is likely to be executed again soon • Spatial LOR: instructions with addresses close to a recently executed instruction are likely to be executed soon • The same principles apply to data references; see the sketch below
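As a small illustration of both kinds of locality (my example, not from the slides), consider a summation loop in Python:

```python
# Temporal locality: the loop instructions and the accumulator `total`
# are reused on every iteration.
# Spatial locality: data[i] walks through consecutive addresses, so each
# memory block brought into the cache is (conceptually) fully used.
data = list(range(1_000_000))
total = 0
for i in range(len(data)):
    total += data[i]
print(total)
```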

  28. Cache smaller than main memory • small amount of fast memory • sits between normal main memory and the CPU • may be located on the CPU chip or module • the cache views main memory as organized in “blocks” [diagram: word transfer between CPU and cache; block transfer between cache and main memory]

  30. Mean Access Time • Cache access time = c • Main memory access time = m • Mean access time = c + (1 − h)m • If all address references are satisfied by the cache, the access time approaches c • If no reference is in the cache, the access time approaches c + m
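A short sketch tying the hit ratio from slide 20 to this formula (my code; the numbers are the ones used in the example on slide 32):

```python
def hit_ratio(k: int) -> float:
    # Slide 20: out of k accesses, 1 goes to main memory and (k - 1) hit the cache.
    return (k - 1) / k

def mean_access_time(c: float, m: float, h: float) -> float:
    # Slide 30: every access pays c; a miss (probability 1 - h) also pays m.
    return c + (1 - h) * m

print(hit_ratio(20))                       # 0.95
print(mean_access_time(0.01, 0.1, 0.95))   # 0.015 (microseconds) -- matches slide 32
```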

  31. Cache Design Issues • How big should the cache be? • Bigger means more hits, but more expensive • How big should a cache-line be? • How does the cache keep track of what it contains? • If we change an item in the cache, how do we write it back to main memory? • Separate caches for data and instructions? • Instructions never have to be written back to main memory • How many caches should there be? • Primary (on chip), secondary (off chip), tertiary…

  32. Why does Caching Improve Speed? Example: • Main memory has 100,000 words; access time is 0.1 μs. • Cache has 1,000 words; access time is 0.01 μs. • If a word is • in the cache (hit), it can be accessed directly by the processor. • in memory (miss), it must first be transferred to the cache before access. • Suppose that 95% of access requests are hits (the key proviso). • Average time to access a word: (0.95)(0.01 μs) + (0.05)(0.1 μs + 0.01 μs) = 0.015 μs – close to cache speed.

  33. Cache Read Operation • CPU requests the contents of a memory location • check the cache for the contents of that location • cache hit (present): get the data from the cache (fast) • cache miss (not present): read the required block from main memory into the cache • then deliver the data from the cache to the CPU
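A toy model of that read flow (a sketch only, with made-up sizes; not any particular hardware):

```python
BLOCK_SIZE = 4                     # words per block (toy value)
memory = list(range(1024))         # main memory, one word per address
cache = {}                         # block number -> list of words

def read(addr: int) -> int:
    block = addr // BLOCK_SIZE
    if block not in cache:                                # cache miss
        start = block * BLOCK_SIZE
        cache[block] = memory[start:start + BLOCK_SIZE]   # fetch the whole block
    return cache[block][addr % BLOCK_SIZE]                # deliver from the cache

print(read(42))   # miss: block 10 is fetched, then word 2 of it is returned
print(read(43))   # hit: same block, served from the cache
```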

  34. Cache Design • Size • Mapping Function • Replacement Algorithm • Write Policy • Block Size • Number of Caches

  35. Size • Cost • More cache is expensive • Speed • More cache is faster (up to a point) • Checking cache for data takes time

  36. Mapping Function • how does cache content map to main memory content? • a cache line holds a tag and a data block; main memory holds blocks i … j at its addresses • use the tag (and maybe the line address) to identify a block's memory address

  37. Cache Basics • cache line vs. main memory location • same concept – avoid confusion (?) • a line has an address and contents • the cache line width is bigger than the memory location width! • the contents of a cache line are divided into tag and data fields • fixed width • the fields are used differently! • the data field holds the contents of a block of main memory • the tag field helps identify the start address of the block of memory that is in the data field

  38. Cache (2) • Every address reference goes first to the cache • If the desired address is not there, then we have a cache miss • The contents are fetched from main memory into the indicated CPU register, and the contents are also saved in the cache • If the desired data is in the cache, then we have a cache hit • The desired data is brought from the cache, at very high speed (low access time) • Most software exhibits temporal locality of access, meaning the same address is likely to be used again soon, and if so, it will be found in the cache • Transfers between main memory and cache occur at the granularity of cache lines or cache blocks, around 32 or 64 bytes (rather than bytes or processor words). Burst transfers of this kind receive hardware support and exploit spatial locality of access to the cache (future accesses are often to addresses near the previous one)

  39. Where can a block be placed in Cache? (1) • Our cache has eight block frames and the main memory has 32 blocks

  40. Where can a block be placed in Cache? (2) • Direct mapped cache • Each block has only one place where it can appear in the cache • (Block Address) MOD (Number of blocks in cache) • Fully associative cache • A block can be placed anywhere in the cache • Set associative cache • A block can be placed in a restricted set of places in the cache • A set is a group of blocks in the cache • (Block Address) MOD (Number of sets in the cache) • If there are n blocks in a set, the placement is said to be n-way set associative (see the sketch below)
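The two MOD rules can be written directly; a sketch using slide 39's configuration of 8 block frames (the function names are mine):

```python
def direct_mapped_line(block_addr: int, num_lines: int) -> int:
    # Direct mapped: exactly one legal line for each memory block.
    return block_addr % num_lines

def set_index(block_addr: int, num_sets: int) -> int:
    # Set associative: the block may go into any way of this one set.
    return block_addr % num_sets

print(direct_mapped_line(12, 8))   # block 12 -> line 4 of 8
print(set_index(12, 4))            # 2-way set associative (8 frames / 2) -> set 0
```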

  41. Mapping Function Example • cache of 64 KBytes – holds up to 64 KBytes of main memory contents • 16K (2^14) lines – each line is 5 bytes wide = 40 bits: a tag field of 1 byte plus a data field of 4 bytes (one 4-byte block of main memory) • 16 MBytes of main memory • 24-bit address (2^24 = 16M) • will consider DIRECT and ASSOCIATIVE mappings

  42. Direct Mapping • each block of main memory maps to only one cache line • i.e. if a block is in cache, it must be in one specific place – based on its address! • split the address into two parts • the least significant w bits identify a unique word in the block • the most significant s bits specify one memory block • split the s bits into • a cache line address field of r bits (the line field identifies the line containing the block!) • a tag field of the s − r most significant bits

  43. Direct Mapping: Address Structure for Example • 24-bit address, split as tag | line | word: • tag: s − r = 8 bits • line address: r = 14 bits • word: w = 2 bits (a 2-bit word identifier for 4-byte blocks) • s = 22-bit block identifier • two blocks may have the same r value, but then they always have different tag values!
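That 8/14/2 split is straightforward with shifts and masks (my sketch; the field widths come from the slide):

```python
def split_address(addr: int):
    # 24-bit address layout: | tag: 8 bits | line: 14 bits | word: 2 bits |
    word = addr & 0x3              # low 2 bits select the word within the block
    line = (addr >> 2) & 0x3FFF    # next 14 bits select the cache line
    tag  = (addr >> 16) & 0xFF     # top 8 bits distinguish blocks sharing a line
    return tag, line, word

tag, line, word = split_address(0x16339C)
print(hex(tag), hex(line), word)   # 0x16 0xce7 0
```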

  44. Direct Mapping Cache Line Table (each block = 4 bytes; s = 22, m = 2^14)

  cache line    main memory blocks held
  0             0, m, 2m, 3m, …, 2^s − m
  1             1, m+1, 2m+1, …, 2^s − m + 1
  …             …
  m − 1         m − 1, 2m − 1, 3m − 1, …, 2^s − 1

  But… a line can contain only one of these at a time!
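The table's pattern is the MOD rule again: line i holds every block congruent to i modulo m. A quick check with the slide's parameters (my code):

```python
s, m = 22, 2**14   # 2^s memory blocks, m cache lines

def blocks_for_line(line: int):
    # All memory blocks that map to this cache line: line, line+m, line+2m, ...
    return range(line, 2**s, m)

print(list(blocks_for_line(0))[:4])   # [0, 16384, 32768, 49152] = 0, m, 2m, 3m
```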

  45. Direct Mapping Cache Organization

  46. Direct Mapping pros & cons • Simple • Inexpensive • Fixed location for given block • If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high

  47. Associative Memory • read: specify a tag field value and word select • checks all lines – finds the matching tag • returns the contents of the data field @ the selected word • access time independent of location or previous access • write: to the data field @ tag value + word select • what if no line has a matching tag?

  48. Associative Mapping • a main memory block can load into any line of the cache • the memory address is interpreted as a tag plus a word select within the block • the tag uniquely identifies the block of memory! • every line's tag is examined for a match • cache searching gets expensive • the tag is all s bits – it does not use a line address field!
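A toy fully associative lookup (a sketch; in real hardware all the tag comparisons happen in parallel, which is exactly what makes the search expensive):

```python
# Each cache line holds (tag, data block); there is no line-address field.
lines = [(0x1A2B3, [1, 2, 3, 4]), (0x00FF0, [5, 6, 7, 8])]

def lookup(tag: int, word: int):
    for line_tag, data in lines:   # hardware checks every line at once
        if line_tag == tag:
            return data[word]      # hit: return the selected word
    return None                    # miss: the block is not in the cache

print(lookup(0x00FF0, 2))   # 7 (hit)
print(lookup(0x12345, 0))   # None (miss)
```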

  49. Fully Associative Cache Organization – no line field!
