
Associative Mapping



1. Associative Mapping

The strict mapping restriction enforced by the direct mapping strategy can be relaxed by allowing a memory block to be written to any cache block. The mapping function f is still f(q) ∈ {0, 1, ..., n−1}, but:
- the mapping strategy is much more flexible, and
- the identification problem (telling whether a given memory block is in the cache) becomes more complex.
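
As a point of contrast, here is a minimal Python sketch of the two mapping functions (illustrative only; the function names are ours): direct mapping fixes the target block, while associative mapping merely constrains it to the valid range and leaves the choice to a placement/replacement policy.

```python
def direct_map(q, n):
    # Direct mapping: Mp block q can live in exactly one cache block.
    return q % n

def associative_map(q, n, free_blocks):
    # Associative mapping: any of the n cache blocks is legal.
    # Prefer a free block; otherwise a replacement policy must choose
    # a victim (policies are discussed on slides 13-17).
    return free_blocks[0] if free_blocks else None  # None -> policy decides
```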

2. Mapping Address Partition (Associative Maps)

Suppose we stick with the previous basic model: 16-bit words, 64 KW of memory (128 KB), 16-word blocks, and a 2 KW cache. The lower 4 bits of an address then identify the word-in-block component of a cache address. With associative mapping, however, the remaining 12 bits carry no mapping information; all 12 must be stored by each cache block to identify which memory block is in the cache.

Address partition: [ 12 bits: tag | 4 bits: word in block ]
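
A minimal sketch of this partition, assuming the 16-bit addresses and 16-word blocks above (the constant and function names are ours):

```python
WORD_BITS = 4   # 16 words per block -> 4 word-in-block bits
TAG_BITS  = 12  # the remaining 16 - 4 = 12 address bits are all tag

def split_associative(addr):
    """Split a 16-bit address into (tag, word_in_block)."""
    word = addr & ((1 << WORD_BITS) - 1)  # low 4 bits select the word
    tag  = addr >> WORD_BITS              # upper 12 bits are stored as the tag
    return tag, word

# Example: address 0xABCD -> tag 0xABC, word 0xD
assert split_associative(0xABCD) == (0xABC, 0xD)
```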

3. Associative Map

An Mp block maps to any free cache block; the cache tag is the upper address bits (all bits except the 4 word-in-block bits), here 12 bits. Since any free cache block can be used, a block in the cache is replaced only if the cache is full.

4. Is Mp Block q in the Cache?

Consider the tag. Suppose we have 128 blocks in the cache. We could implement the tag bits of the cache with a 128 x 12 RAM (in general, an N x t RAM, where N = the number of cache blocks and t = the number of bits required for the tag field). We can then search the RAM for a given tag field to determine whether the Mp block with that tag is in the cache. Here, we use log2(N) bits as the RAM address (remember, the tag field is stored at some address in this RAM); the content is the tag value of the Mp block currently loaded in that cache block (you might see a logical problem here).

A better solution might be to use a (2^t x N)-bit RAM; this way we could use the t bits of the tag as the address, whose contents would be the label of the cache block that the Mp block is stored in, if it is present. (You might see the same logical difficulty here.)
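
The N x t organization implies a search over all N entries. A sketch of that search (function name ours; a real cache would compare entries in parallel, or use the CAM described on the next slide):

```python
def find_block(tag_ram, tag):
    """Scan the N-entry tag RAM for 'tag'.

    tag_ram[i] holds the tag of the Mp block loaded in cache block i.
    Returns the cache block index on a hit, or None on a miss.
    The worst case probes all N entries -- the cost the CAM
    organization on the next slide is designed to avoid.
    """
    for block, stored in enumerate(tag_ram):
        if stored == tag:
            return block
    return None
```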

5. CAM (Content Addressable Memory)

To see whether a tag is in the tag RAM, the entire tag RAM must be searched for the tag. An alternative solution is to use a CAM (or associative memory). Unlike a RAM, a CAM functions as follows:
1. a search key is applied to the CAM;
2. the CAM emits those addresses whose contents match the search key (in whole or in part, depending on the CAM design).

[Figure: a CAM with the search key applied to every word; one content-match line per word, set to 1 if the CAM contents at that word match the search key]

6. CAM Operation

When a search key is presented to a CAM:
- all words in the CAM see the search key at exactly the same time;
- all words in the CAM set their "match" lines at exactly the same time.

Thus, in an N-word by m-bit CAM, it takes a constant amount of time to find every occurrence of the search key (T(N) = O(1)). This is significantly different from the best search time in a RAM.

Question: as a function of N, the size (in words) of a RAM, what is the fastest time to find (a) any occurrence of a pattern and (b) all occurrences of a pattern?
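
The following sketch (names ours) models what a CAM computes: one match bit per word, comparing every word against the key. The loop is O(N) in software; the point of CAM hardware is that all comparisons happen simultaneously, so the match lines settle in constant time.

```python
def cam_search(words, key, mask=None):
    """Return one match bit per CAM word.

    'mask' selects which bit positions participate in the comparison
    (for partial-match CAM designs); None compares whole words.
    In hardware every word compares against the key at once; this
    sequential loop only models the resulting match lines.
    """
    if mask is None:
        mask = ~0
    return [1 if (w & mask) == (key & mask) else 0 for w in words]

# Example: find every word equal to 0b1010 in a 4-word CAM.
assert cam_search([0b1010, 0b0011, 0b1010, 0b1111], 0b1010) == [1, 0, 1, 0]
```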

7. A Trivial 1-bit CAM Cell

[Figure: a 1-bit CAM cell with a query bit, a match bit, a data bit, a load line, and initialization bits]

8. Associative Processors

Associative processors are machines whose Mp is dominated by CAM. Because they can perform constant-time pattern matching (partial or whole pattern matches), they have important value in problems dominated by searching, especially time-critical problems in which it is undesirable for the search time to increase merely because there are more entities to be searched. One example is air traffic control (the Goodyear STARAN project).

9. Block-Set (Set-Associative) Mapping

Purely associative cache mapping is flexible, but costly. On the one hand:
- mapping an Mp block to any cache block can improve cache performance, but
- there is a penalty which has to be paid (in cache cost or in search time).

The standard alternative is as follows:
- map Mp blocks to groups of cache blocks in a direct (modulo) fashion, but
- within a group of cache blocks, allow the Mp block to be loaded into any block.

This is called n-way block-set associative mapping (n blocks per group).

10. Set-Associative Mapping

An Mp block maps directly to a specific cache block set. Within the set, the Mp block is associatively mapped to one of the set's blocks.

11. Block-Set Mapping: Address Partitioning

Suppose we have our 2 KW cache as previously, with 128 blocks of 16 words each, but now with 4 blocks per set. There are therefore 32 (2^5) sets, so the address partition is:

[ 7 bits: tag | 5 bits: set (direct map) | 4 bits: word in block ]

- Tag: identifies which Mp block is loaded in a cache block (cache tag bits)
- Set: maps the Mp block directly to a cache set
- Word in block: identifies the word within a block
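
A sketch of this 7/5/4 split, in the same style as the earlier associative split (constant and function names are ours):

```python
WORD_BITS = 4  # 16 words per block
SET_BITS  = 5  # 32 sets of 4 blocks each
TAG_BITS  = 7  # the 16 - 5 - 4 remaining bits

def split_set_associative(addr):
    """Split a 16-bit address into (tag, set_index, word_in_block)."""
    word    = addr & ((1 << WORD_BITS) - 1)
    set_idx = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)  # direct map to a set
    tag     = addr >> (WORD_BITS + SET_BITS)  # stored with the block in the set
    return tag, set_idx, word
```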

12. Valid Bits

A block resident in the cache may become invalid because the corresponding Mp block is updated by a source other than the CPU (see DMA later). To handle this situation, valid bits (not the same as dirty bits) are set to 0 on power-up. When a cache block is loaded from Mp, its valid bit is set; it stays set unless the Mp block is updated by another (non-CPU) device.
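
One way to picture the per-block state (a sketch; the field names are ours):

```python
from dataclasses import dataclass

@dataclass
class CacheBlock:
    valid: bool = False  # cleared on power-up; set when the block is loaded
                         # from Mp; cleared again if a non-CPU device (e.g.
                         # DMA) updates the corresponding Mp block
    dirty: bool = False  # a separate flag: set when the CPU writes the block
    tag: int = 0         # identifies which Mp block this cache block holds
```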

13. Replacement Algorithms

In a purely direct-mapped cache, the replacement algorithm is trivial: if Mp block q maps to cache block p, load q into p, even if p contains another Mp block. However, if q can map to more than one cache block (N in the case of a purely associative cache, or the number of blocks per set in a block-set associative cache), then a replacement policy must be chosen. The following rule is obvious: if the Mp block maps to an empty cache block, use that cache block.

14. Oldest First

Suppose we have an n-way block-set associative cache (n blocks per set). The oldest-first policy is:
- if there is an unused cache block in the set, load the Mp block into it;
- else load the Mp block into the cache block in the set which has been in the cache the longest.

This turns out not to be a particularly good policy because it does not take into consideration how recently a cache block was used. In fact, the oldest-loaded cache block in the set may be the most used cache block in the set.
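
A sketch of oldest-first (i.e., FIFO) bookkeeping for one set (names ours): a queue of load order is all the state required, which is exactly why the policy ignores how recently a block was used.

```python
from collections import deque

class FIFOSet:
    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.order = deque()  # tags in load order, oldest on the left

    def load(self, tag):
        """Place a new Mp block tag in the set, evicting the oldest if full."""
        victim = None
        if len(self.order) == self.nblocks:
            victim = self.order.popleft()  # evicted regardless of recent use
        self.order.append(tag)
        return victim
```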

15. Random Replacement

Suppose we have an n-way block-set associative cache (n blocks per set). The random policy is:
- if there is an unused cache block in the set, load the Mp block into it;
- else load the Mp block into a randomly chosen cache block in the set.

Oddly enough, this policy turns out to be reasonably successful in practice.
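
Random replacement needs no per-block history at all; a sketch (names ours):

```python
import random

def choose_block(valid_bits):
    """Pick the block to load into: a free one if any, else a random victim."""
    free = [i for i, v in enumerate(valid_bits) if not v]
    return free[0] if free else random.randrange(len(valid_bits))
```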

16. LRU (Least Recently Used)

Suppose we have an n-way block-set associative cache (n blocks per set). The LRU policy is:
- if there is an unused cache block in the set, load the Mp block into it;
- else load the Mp block into the cache block in the set which is the least recently used.

LRU is the most effective policy, but it needs ongoing updating computations. A simple algorithm is as follows:

17. LRU Algorithm

Suppose we have a four-block set, with a 2-bit counter associated with each block in the set. The algorithm is:

On a cache hit:
- set the counter of the referenced block to 0;
- increment by 1 the counters of blocks whose counter values were less than the original value of the referenced block's counter;
- leave the other counters alone.

On a cache miss with the set not full:
- set the counter of the new block to 0;
- increment all other counters by 1.

On a cache miss with the set full:
- remove the block with counter value "11" (3);
- load the new block and set its counter to 0;
- increment all other counters in the set by 1.
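
A sketch of this counter scheme for one set (class and method names are ours); it follows the three cases above literally:

```python
class LRUSet:
    """2-bit-counter LRU for one n-block set (n = 4 on this slide)."""

    def __init__(self, nblocks=4):
        self.tags = [None] * nblocks  # None marks an unused block
        self.cnt  = [0] * nblocks     # one 2-bit counter per block

    def access(self, tag):
        if tag in self.tags:                       # cache hit
            i = self.tags.index(tag)
            ref = self.cnt[i]
            for j in range(len(self.cnt)):
                if self.cnt[j] < ref:              # bump only counters below
                    self.cnt[j] += 1               # the referenced block's
            self.cnt[i] = 0                        # original value
            return "hit"
        if None in self.tags:                      # miss, set not full
            i = self.tags.index(None)
        else:                                      # miss, set full: evict the
            i = self.cnt.index(len(self.cnt) - 1)  # block whose counter is 3
        for j in range(len(self.cnt)):
            if j != i:
                self.cnt[j] += 1                   # age all the other blocks
        self.tags[i] = tag
        self.cnt[i] = 0
        return "miss"
```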

18. LRU Sample

Event   b0  b1  b2  b3   Notes
init     0   0   0   0
miss     0   1   1   1   b0 loaded
miss     1   0   2   2   b1 loaded
miss     2   1   0   3   b2 loaded
miss     3   2   1   0   b3 loaded
hit      3   0   2   1   hit on b1; b2 and b3 incremented by 1
hit      3   0   2   1   hit on b1; no blocks had lower counts
miss     0   1   3   2   b0 replaced and loaded; other counters incremented
hit      0   1   3   2   hit on b0
hit      1   2   0   3   hit on b2 (those originally lower incremented)
miss     2   3   1   0   b3 replaced and loaded

Note that once the set is full, the counters are always distinct!
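
The table can be reproduced by driving the LRUSet sketch above with five distinct Mp block tags (the letters are our stand-ins):

```python
s = LRUSet(4)
for tag in ["A", "B", "C", "D", "B", "B", "E", "E", "C", "F"]:
    result = s.access(tag)
    print(f"{result:4}  counters = {s.cnt}")
# The printed counter columns match the table rows above; e.g. the
# final miss (tag F replacing b3) leaves the counters at [2, 3, 1, 0].
```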
