
Using a Victim Buffer in an Application-Specific Memory Hierarchy


Presentation Transcript


  1. Using a Victim Buffer in an Application-Specific Memory Hierarchy
  Chuanjun Zhang*, Frank Vahid**
  *Dept. of Electrical Engineering, University of California, Riverside
  **Dept. of Computer Science and Engineering, University of California, Riverside; also with the Center for Embedded Computer Systems at UC Irvine
  This work was supported by the National Science Foundation and the Semiconductor Research Corporation

  2. Low Power/Energy Techniques are Essential
  • High-performance processors are becoming too hot to operate
  • Low energy dissipation is imperative for battery-driven embedded systems
  • Low-power techniques are therefore essential to both embedded systems and high-performance processors
  [Figure: "Hot enough to cook an egg" (Skadron et al., 30th ISCA)]

  3. Caches Consume Much Power
  • Caches consume about 50% of total processor system power
    • ARM920T and M*CORE (Segars 01, Lee 99)
  • Caches are accessed often
    • Consume much dynamic power
  • Associativity reduces misses
    • Less power spent off-chip, but more power per access
  • A victim buffer helps (Jouppi 90)
    • Added to a direct-mapped cache
    • Keeps recently evicted lines in a small buffer, checked on a miss
    • Acts like higher associativity, but without extra power per access
    • 10% energy savings, 4% performance improvement (Albera 99)
  [Figure: processor with cache, victim buffer, and memory; caches account for >50% of power]

  4. Victim Buffer
  • With a victim buffer
    • One cycle on a cache hit
    • Two cycles on a victim buffer hit
    • Twenty-two cycles on a victim buffer miss
  • Without a victim buffer
    • One cycle on a cache hit
    • Twenty-one cycles on a cache miss
    • More accesses to off-chip memory
  [Figure: processor, L1 cache (hit: one cycle), victim buffer (hit: two cycles), and off-chip memory (miss: 21 or 22 cycles)]
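To make the latency trade-off above concrete, here is a minimal Python sketch (not from the paper) that computes average memory-access cycles with and without a victim buffer, using the cycle counts on this slide; the hit rates in the example are hypothetical placeholders.

# Average memory-access cycles, using the slide's cycle counts:
# 1-cycle cache hit, 2-cycle victim-buffer hit, 21/22-cycle off-chip access.
CACHE_HIT  = 1     # cycles on an L1 cache hit
VB_HIT     = 2     # cycles on a victim-buffer hit (after an L1 miss)
MISS_NO_VB = 21    # cycles to off-chip memory without a victim buffer
MISS_VB    = 22    # cycles to off-chip memory after a victim-buffer miss

def avg_cycles_without_vb(cache_hit_rate):
    miss_rate = 1.0 - cache_hit_rate
    return cache_hit_rate * CACHE_HIT + miss_rate * MISS_NO_VB

def avg_cycles_with_vb(cache_hit_rate, vb_hit_rate):
    # vb_hit_rate: fraction of L1 misses that hit in the victim buffer
    miss_rate = 1.0 - cache_hit_rate
    return (cache_hit_rate * CACHE_HIT
            + miss_rate * vb_hit_rate * VB_HIT
            + miss_rate * (1.0 - vb_hit_rate) * MISS_VB)

# Hypothetical example: 95% L1 hit rate, 40% of misses caught by the buffer.
print(avg_cycles_without_vb(0.95))       # ~2.0 average cycles
print(avg_cycles_with_vb(0.95, 0.40))    # ~1.65 average cycles

If the victim buffer catches few misses, the extra probe cycle can make the with-VB average worse, which is the motivation for being able to shut the buffer off.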

  5. Cache Architecture with a Configurable Victim Buffer
  • Is a victim buffer a useful configurable cache parameter?
    • Helps for some applications
    • For others, not useful: the VB misses, so the extra cycle is wasted
    • Thus, we want the ability to shut off the VB for a given application
  • Hardware overhead
    • A one-bit register
    • A switch
  • A four-line victim buffer is shown
  [Figure: direct-mapped L1 cache (tag and data SRAM) alongside a fully associative victim buffer (CAM tags, 27-bit tag, 16-byte cache lines); a VB on/off register gates Vdd to the buffer, the cache control circuit moves victim lines into the buffer, and a mux returns data to the processor from the cache, the buffer, or next-level memory]
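The behavior of this organization can be sketched in software. The following is a behavioral model, not the paper's hardware: it assumes a FIFO replacement policy in the buffer and the sizes shown on the slide, and the class and method names are ours.

from collections import deque

LINE_SIZE = 16    # bytes per cache line, matching the 16-byte line on the slide

class DirectMappedCacheWithVB:
    def __init__(self, cache_lines=512, vb_lines=4, vb_on=True):
        self.cache_lines = cache_lines           # 512 lines x 16 B = 8 Kbyte cache
        self.tags = [None] * cache_lines         # tag array of the direct-mapped cache
        self.victims = deque(maxlen=vb_lines)    # small fully associative buffer (FIFO)
        self.vb_on = vb_on                       # the one-bit on/off register

    def access(self, addr):
        # Returns 'hit', 'vb_hit', or 'miss' for one memory access.
        line  = addr // LINE_SIZE
        index = line % self.cache_lines
        tag   = line // self.cache_lines

        if self.tags[index] == tag:              # one-cycle cache hit
            return "hit"

        evicted = self.tags[index]               # line displaced from this set
        if self.vb_on and (tag, index) in self.victims:
            # Two-cycle victim-buffer hit: swap the victim with the displaced line.
            self.victims.remove((tag, index))
            if evicted is not None:
                self.victims.append((evicted, index))
            self.tags[index] = tag
            return "vb_hit"

        # Miss: fetch from next-level memory; the displaced line enters the buffer.
        if self.vb_on and evicted is not None:
            self.victims.append((evicted, index))
        self.tags[index] = tag
        return "miss"

Constructing the model with vb_on=False mirrors writing 0 to the one-bit register: misses skip the buffer probe and no victims are retained, which is the shut-off mode the slide argues for.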

  6. Hit Rate of a Victim Buffer
  [Figure: two charts, data cache and instruction cache — hit rate of the victim buffer when added to an 8 Kbyte, 4 Kbyte, or 2 Kbyte direct-mapped cache]
  • Benchmarks from Powerstone, MediaBench, and SPEC 2000

  7. Computing Total Memory-Related Energy
  • Considers CPU stall energy and off-chip memory energy
  • Excludes CPU active energy
  • Thus, represents all memory-related energy
  energy_mem = energy_dynamic + energy_static
  energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
  energy_miss = energy_offchip_access + energy_uP_stall + energy_cache_block_fill
  energy_static = cycles * energy_static_per_cycle
  energy_miss = k_miss_energy * energy_hit
  energy_static_per_cycle = k_static * energy_total_per_cycle
  (we varied the k's to account for different system implementations)
  • Measured quantities (underlined on the original slide):
    • SimpleScalar: cache_hits, cache_misses, cycles
    • Our layout or data sheets: the others
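Read directly off the formulas above, a small Python sketch of the energy model might look as follows; the function name and the default k values are ours (hypothetical), while the measured inputs would come from SimpleScalar and the per-access energies from layout or data sheets.

def memory_energy(cache_hits, cache_misses, cycles,
                  energy_hit, energy_total_per_cycle,
                  k_miss_energy=50.0,    # assumed ratio: energy_miss / energy_hit
                  k_static=0.3):         # assumed static fraction of per-cycle energy
    # cache_hits, cache_misses, cycles: measured with SimpleScalar.
    # energy_hit, energy_total_per_cycle: from layout or data sheets.
    energy_miss = k_miss_energy * energy_hit
    energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
    energy_static_per_cycle = k_static * energy_total_per_cycle
    energy_static = cycles * energy_static_per_cycle
    return energy_dynamic + energy_static

Varying k_miss_energy and k_static, as the slide notes, models different system implementations without changing the rest of the calculation.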

  8. Performance and Energy Benefits of a Victim Buffer with a Direct-Mapped Cache
  [Figure: per-benchmark energy and performance improvements for an 8-line victim buffer with an 8 Kbyte direct-mapped cache (0% = direct-mapped cache without a victim buffer); annotations mark benchmarks with substantial benefit and benchmarks where the VB should be shut off]
  • A configurable victim buffer is clearly useful to avoid the performance penalty for certain applications

  9. Is a Configurable Victim Buffer Useful Even With a Configurable Cache?
  • We showed that a configurable cache can reduce memory access power by half on average
    • (Zhang/Vahid/Najjar ISCA 03, ISVLSI 03)
  • Software-configurable cache
    • Associativity: 1, 2, or 4 ways
    • Size: 2, 4, or 8 Kbytes
  • Does that configurability subsume the usefulness of a configurable victim buffer?

  10. Best Configurable Cache with VB Configurations
  [Table: optimal cache configuration per benchmark when cache associativity, cache size, and the victim buffer are all configurable]
  • I and D stand for the instruction cache and data cache, respectively
  • V indicates that the victim buffer is on
  • nK indicates a cache size of n Kbytes
  • The associativity is given by the last four characters: for benchmark vpr, I2D1 means a two-way instruction cache and a direct-mapped data cache
  • Note that sometimes the victim buffer should be on, sometimes off
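Finding per-benchmark configurations like those in this table amounts to a small exhaustive search. Here is a sketch, assuming a hypothetical evaluate_energy(size_kb, ways, vb_on) callable that simulates one configuration (e.g., SimpleScalar plus the energy model sketched earlier) and returns its memory-related energy.

from itertools import product

SIZES_KB = (2, 4, 8)        # configurable cache sizes (Kbytes)
WAYS     = (1, 2, 4)        # configurable associativities
VB_ON    = (False, True)    # configurable victim-buffer on/off bit

def best_configuration(evaluate_energy):
    # Exhaustively evaluate all 3 x 3 x 2 = 18 configurations and return the
    # (size_kb, ways, vb_on) tuple with the lowest estimated memory energy.
    return min(product(SIZES_KB, WAYS, VB_ON),
               key=lambda cfg: evaluate_energy(*cfg))

Because the victim buffer adds only one more bit to the search space, it can be chosen per application alongside size and associativity, which is what the mix of on and off entries in the table reflects.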

  11. Performance and Energy Benefits of a Victim Buffer Added to a Configurable Cache
  • An 8-line victim buffer with a configurable cache whose associativity, size, and line size are configurable (0% = optimal configuration without the VB)
  • Still surprisingly effective

  12. Conclusion
  • A configurable victim buffer is useful with a direct-mapped cache
    • As much as 60% energy and 4% performance improvement for some applications
    • Can be shut off to avoid the performance penalty on other applications
  • A configurable victim buffer is also useful with a configurable cache
    • As much as 43% energy and 8% performance improvement for some applications
    • Can be shut off to avoid the performance overhead on other applications
  • A configurable victim buffer should be included as a software-configurable parameter for direct-mapped as well as configurable caches in embedded system architectures
