
Review CPSC 321




Presentation Transcript


  1. Review CPSC 321 Andreas Klappenecker

  2. Announcements • Tuesday, November 30, midterm exam

  3. Cache • Placement strategies • direct mapped • fully associative • set-associative • Replacement strategies • random • FIFO • LRU
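The replacement strategies above can be tried out in code. Here is a minimal sketch of LRU replacement (the class and names are illustrative, not from the slides), using an `OrderedDict` to track recency:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU replacement sketch: the most recently used block sits at
    the end of the OrderedDict; on overflow the front entry is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, block):
        """Return True on a hit, False on a miss (inserting the block)."""
        if block in self.entries:
            self.entries.move_to_end(block)   # mark as most recently used
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[block] = True
        return False

cache = LRUCache(2)
hits = [cache.access(b) for b in [1, 2, 1, 3, 2]]
print(hits)  # [False, False, True, False, False]
```

Note that accessing block 1 again keeps it resident, so block 2 (least recently used) is the one evicted when block 3 arrives.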

  4. Direct Mapped Cache • Mapping: address modulo the number of blocks in the cache, x -> x mod B

  5. Set Associative Caches • Each block maps to a unique set, • the block can be placed into any element of that set, • Position is given by (Block number) modulo (# of sets in cache) • If the sets contain n elements, then the cache is called n-way set associative
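Direct mapped, set associative, and fully associative placement are all the same rule with different set counts. A small sketch (the function name and parameters are illustrative):

```python
def placement(block_number, num_blocks, associativity):
    """Return (set index, number of candidate slots) for a block.
    associativity=1 is direct mapped; associativity=num_blocks is
    fully associative."""
    num_sets = num_blocks // associativity
    return block_number % num_sets, associativity

# An 8-block cache, placing block number 13:
print(placement(13, 8, 1))  # direct mapped: set 5, 1 slot
print(placement(13, 8, 2))  # 2-way: set 1, 2 slots
print(placement(13, 8, 8))  # fully associative: set 0, any of 8 slots
```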

  6. Direct Mapped Cache • Cache with 1024 = 2^10 words • The index is determined by address mod 1024; the lowest two address bits form the byte offset • The tag from the cache is compared against the upper portion of the address • If the tag equals the upper 20 bits and the valid bit is set, then we have a cache hit; otherwise it is a cache miss • What kind of locality are we taking advantage of? (Temporal locality: each block holds a single word.)

  7. Direct Mapped Cache • Taking advantage of spatial locality: multiword blocks, with the desired word selected by a block offset field

  8. Address Determination • Reconstruction of the memory address = tag bits || set index bits || block offset || byte offset • Example: 32-bit words, cache capacity 2^12 = 4096 words, blocks of 8 words, direct mapped • byte offset = 2 bits, block offset = 3 bits, set index = 9 bits, tag = 18 bits
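The field decomposition above can be checked mechanically. A sketch using the field widths from the example (the function name is illustrative):

```python
def split_address(addr, index_bits=9, block_bits=3, byte_bits=2):
    """Split a 32-bit address into (tag, set index, block offset, byte offset),
    using the field widths from the direct-mapped example above."""
    byte_off = addr & ((1 << byte_bits) - 1)
    block_off = (addr >> byte_bits) & ((1 << block_bits) - 1)
    index = (addr >> (byte_bits + block_bits)) & ((1 << index_bits) - 1)
    tag = addr >> (byte_bits + block_bits + index_bits)  # remaining 18 bits
    return tag, index, block_off, byte_off

print(split_address(0x00401234))  # (256, 145, 5, 0)
```

Shifting and masking from the low end recovers each field; concatenating the four fields back together reconstructs the original address.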

  9. Example • Suppose you want to realize a cache with a capacity for 8 KB of data (32-bit addresses). Assume that the block size is 4 words and a word consists of 4 bytes. • How many bits are needed to realize a direct mapped cache? • 8 KB = 2K words = 512 blocks = 2^9 blocks • direct mapped => # index bits = log2(2^9) = 9 • 2^9 x (128 + (32 – 9 – 2 – 2) + 1) = 2^9 x 148 bits = number of blocks x (data bits per block + tag bits + valid bit) • How many bits are needed to realize an 8-way set associative cache? • The number of tag bits increases by 3. Why? (There are now 2^9 / 8 = 2^6 sets, so only 6 index bits; the 3 bits saved on the index move into the tag.)
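The bit-count arithmetic in this example generalizes. A sketch (illustrative function, ignoring dirty and LRU bookkeeping bits):

```python
import math

def cache_total_bits(capacity_bytes, block_words, word_bytes=4,
                     addr_bits=32, ways=1):
    """Total storage bits for a cache: data + tag + valid bit per block."""
    block_bytes = block_words * word_bytes
    num_blocks = capacity_bytes // block_bytes
    num_sets = num_blocks // ways
    index_bits = int(math.log2(num_sets))
    offset_bits = int(math.log2(block_bytes))   # block offset + byte offset
    tag_bits = addr_bits - index_bits - offset_bits
    bits_per_block = block_bytes * 8 + tag_bits + 1  # data + tag + valid
    return num_blocks * bits_per_block

print(cache_total_bits(8 * 1024, 4))          # 75776 = 2^9 x 148
print(cache_total_bits(8 * 1024, 4, ways=8))  # 77312 = 2^9 x 151 (tag +3 bits)
```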

  10. Typical Questions • Show the evolution of a cache • Determine the number of bits needed in an implementation of a cache • Know the placement and replacement strategies • Be able to design a cache according to specifications • Determine the number of cache misses • Measure cache performance

  11. Typical Questions • What kind of placement is typically used in virtual memory systems? • What is a translation lookaside buffer? • Why is a TLB used?

  12. Pages: virtual memory blocks • Page faults: if data is not in memory, retrieve it from disk • huge miss penalty, thus pages should be fairly large (e.g., 4 KB) • reducing page faults is important (LRU is worth the price) • can handle the faults in software instead of hardware • using write-through takes too long, so we use write-back • Example: page size 2^12 = 4 KB; 2^18 physical pages; main memory <= 1 GB; virtual memory <= 4 GB
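Plugging in the numbers from the example as a quick sanity check:

```python
page_size = 2 ** 12        # 4 KB pages
virtual_space = 2 ** 32    # 32-bit virtual addresses -> 4 GB
physical_pages = 2 ** 18   # number of physical page frames

virtual_pages = virtual_space // page_size   # entries in a flat page table
main_memory = physical_pages * page_size     # bytes of physical memory

print(virtual_pages)  # 1048576 = 2^20 page-table entries
print(main_memory)    # 1073741824 bytes = 1 GB
```

So a flat page table needs 2^20 entries, each holding an 18-bit physical page number plus status bits.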

  13. Page Faults • Incredibly high penalty for a page fault • Reduce the number of page faults by optimizing page placement • Use fully associative placement • a full search of pages is impractical • pages are located via a table that indexes the memory, called the page table • the page table resides in main memory

  14. Page Tables The page table maps each page to either a page in main memory or to a page stored on disk

  15. Page Tables

  16. Making Memory Access Fast • Page tables slow us down • Memory access will take at least twice as long • access page table in memory • access page • What can we do? Memory accesses are local => use a cache that keeps track of recently used address translations, called the translation lookaside buffer
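The TLB idea can be illustrated with plain dictionaries standing in for the hardware structures (an illustrative sketch; `translate` and the dict-based TLB are not the MIPS hardware format):

```python
def translate(vaddr, tlb, page_table, page_size=4096):
    """Translate a virtual address, consulting the TLB first."""
    vpn, offset = divmod(vaddr, page_size)
    if vpn in tlb:                  # TLB hit: no extra memory access needed
        return tlb[vpn] * page_size + offset, "hit"
    ppn = page_table[vpn]           # TLB miss: walk the page table in memory
    tlb[vpn] = ppn                  # cache the translation for next time
    return ppn * page_size + offset, "miss"

page_table = {0: 7, 1: 3}   # virtual page number -> physical page number
tlb = {}
print(translate(0x1004, tlb, page_table))  # (12292, 'miss')
print(translate(0x1008, tlb, page_table))  # (12296, 'hit')
```

The second access to the same virtual page hits in the TLB, skipping the page-table walk; that is the whole point of the structure.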

  17. Making Address Translation Fast A cache for address translations: translation lookaside buffer

  18. MIPS Processor and Variations

  19. Datapath for MIPS instructions Note the seven control signals!

  20. Single Cycle Datapath

  21. Pipelined Version

  22. Obstacles to Pipelining • Structural Hazards • hardware cannot support the combination of instructions in the same clock cycle • Control Hazards • need to make a decision based on the result of one instruction while others are still executing • Data Hazards • an instruction depends on the result of an instruction still in the pipeline

  23. Control Hazards Resolution (for branch) • Stall pipeline • predict result • delayed branch

  24. Stall on Branch • Assume that all branch computations are done in stage 2 • Delay by one cycle to wait for the result

  25. Branch Prediction • Predict the branch outcome • For example, always predict that the branch is not taken (e.g., reasonable for the exit test of while loops) • if the prediction is correct, then the pipeline runs at full speed • if the prediction is incorrect, then the pipeline stalls

  26. Branch Prediction

  27. Delayed Branch

  28. Data Hazards • A data hazard results if an instruction depends on the result of a previous instruction • add $s0, $t0, $t1 • sub $t2, $s0, $t3 # $s0 not yet written back • These dependencies happen often, so it is not possible to avoid them completely • Use forwarding to get the missing data from internal pipeline registers once it is available
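Detecting these dependencies is mechanical: a read-after-write (RAW) hazard exists when an instruction reads a register written by a recent predecessor. A sketch (the function and instruction encoding are illustrative; the distance-2 window assumes a classic five-stage pipeline where older producers have already written back):

```python
def raw_hazards(instructions, distance=2):
    """Return (producer, consumer) index pairs where the consumer reads a
    register written by one of the previous `distance` instructions."""
    hazards = []
    for i, (dest, srcs) in enumerate(instructions):
        for j in range(max(0, i - distance), i):
            if instructions[j][0] in srcs:   # j's destination is read by i
                hazards.append((j, i))
    return hazards

# (destination, sources) per instruction:
# add $s0, $t0, $t1 ; sub $t2, $s0, $t3
prog = [("$s0", ("$t0", "$t1")), ("$t2", ("$s0", "$t3"))]
print(raw_hazards(prog))  # [(0, 1)]
```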

  29. Forwarding • add $s0, $t0, $t1 • sub $t2, $s0, $t3

  30. Typical Questions • Given a brief specification of the processor and a sequence of instructions, determine all pipeline hazards. • Most typical question: fill in some steps in a timing diagram (almost every exam has such a question; google for examples).

  31. Example
  add $1, $2, $3 _ _ _ _ _
  add $4, $5, $6 _ _ _ _ _
  add $7, $8, $9 _ _ _ _ _
  add $10, $11, $12 _ _ _ _ _
  add $13, $14, $1 _ _ _ _ _ (data arrives early: OK)
  add $15, $16, $7 _ _ _ _ _ (data arrives on time: OK)
  add $17, $18, $13 _ _ _ _ _ (uh, oh)
  add $19, $20, $17 _ _ _ _ _ (uh, oh)
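Under the usual assumption for this kind of exercise (five-stage pipeline, no forwarding, register file written in the first half of WB and read in the second half of ID), the annotations above follow from a simple distance rule, sketched here:

```python
def stall_cycles(distance):
    """Stall cycles needed between a producer and a consumer that are
    `distance` instructions apart, without forwarding: the consumer's ID
    stage must come no earlier than the producer's WB stage, i.e. at
    least 3 issue slots later."""
    return max(0, 3 - distance)

print(stall_cycles(4))  # 0: data arrives early  (e.g., $1 read 4 later)
print(stall_cycles(3))  # 0: data arrives on time (e.g., $7 read 3 later)
print(stall_cycles(2))  # 1 stall  (e.g., $13 read 2 later: "uh, oh")
print(stall_cycles(1))  # 2 stalls (e.g., $17 read 1 later: "uh, oh")
```

This matches the example: only the last two instructions, whose producers are 2 and 1 instructions back, need stalls.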

  32. Verilog

  33. Mixed Questions
