
Fast Configurable-Cache Tuning with a Unified Second-Level Cache


Presentation Transcript


1. Fast Configurable-Cache Tuning with a Unified Second-Level Cache Ann Gordon-Ross and Frank Vahid* Department of Computer Science and Engineering, University of California, Riverside *Also with the Center for Embedded Computer Systems, UC Irvine Nikil Dutt Center for Embedded Computer Systems, School of Information and Computer Science, University of California, Irvine This work was supported by the U.S. National Science Foundation and by the Semiconductor Research Corporation.

2. Cache Hierarchy Optimizations • The cache hierarchy is a good candidate for optimization • Applications require highly diverse cache configurations for optimal energy consumption of the cache subsystem • Over 50% energy savings are possible in the cache subsystem through configuration [Gordon-Ross '04] [Figure: ARM920T power breakdown (Segars '01), in which the caches account for a large fraction of total power]

3. Previous Cache Tuning Methodologies • Previous methods limit configurability to facilitate easier heuristic development • Single-level cache subsystem with separate I$ and D$: fewer than 50 configurations • Multi-level cache subsystem with separate I$ and D$: a few hundred configurations [Figure: microprocessor with tuner, separate I$ and D$, and main memory]

4. Motivation • Unified second-level caches are commonplace in desktop computers and are becoming increasingly popular in embedded microprocessors • Current cache tuning heuristics do not directly apply due to the complexity of tuning in the presence of a unified second level of cache: a change in any cache affects the performance of every other cache in the hierarchy, a circular dependency • The search space explodes to ≈18,000 configurations (a rough count is sketched below) [Figure: L1 I$, L1 D$, and unified L2 cache]
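The search-space size comes from the cross product of the per-cache options. The sketch below is a back-of-the-envelope count only; the option lists and feasibility rules are illustrative assumptions, not the paper's exact parameter ranges, but they show how the product reaches the tens of thousands, the same order of magnitude as the ≈18,000 configurations cited above.

```python
from math import comb

# Back-of-the-envelope count of the joint search space.
# All option counts below are assumptions for illustration.

# L1 (per cache): 2 KB banks give a handful of feasible (size, associativity)
# pairs via way shutdown / concatenation, times a few line sizes.
l1_size_assoc_pairs = 6      # e.g. 2KB/1-way, 4KB/1- or 2-way, 8KB/1-, 2-, 4-way (assumed)
l1_line_sizes = 3            # assumed
l1_configs = l1_size_assoc_pairs * l1_line_sizes      # 18 per L1 cache

# L2: each of 4 interchangeable ways is instruction, data, unified, or off,
# i.e. multisets of 4 labels over 4 ways, times a few line sizes.
l2_way_assignments = comb(4 + 4 - 1, 4)               # 35
l2_line_sizes = 3            # assumed
l2_configs = l2_way_assignments * l2_line_sizes       # 105

# Joint space: separate L1 I$ and D$, plus the unified L2.
total = l1_configs * l1_configs * l2_configs
print(total)   # 34020 -- same order of magnitude as the ~18,000 cited on the slide
```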

5. Motivation • We present an effective and efficient cache tuning heuristic for a highly configurable cache hierarchy including a unified second level of cache [Figure: microprocessor with tuner, L1 I$ and D$, unified L2, and main memory]

6. Level One Configurable Cache • The base cache consists of four 2 KB banks that may individually be shut down for size configuration (way shutdown) • Way concatenation allows configurable associativity • Line size is configurable • For evaluation of energy savings, we used a base cache of 8 KB with a 32-byte line size and 4-way associativity [Figure: way shutdown and way concatenation over the four 2 KB banks]
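A minimal sketch of the level-one configuration space this slide implies, assuming 2 KB banks, associativity limited by the number of active banks, and an illustrative (assumed) set of line sizes:

```python
# Enumerate plausible L1 configurations: way shutdown sets size,
# way concatenation sets associativity, line size is independent.
# LINE_SIZES values are assumptions for illustration.

BANK_KB = 2
LINE_SIZES = [16, 32, 64]   # bytes (assumed)

def l1_configurations():
    configs = []
    for active_banks in (1, 2, 4):              # way shutdown
        size_kb = active_banks * BANK_KB        # 2, 4, or 8 KB
        for assoc in (1, 2, 4):                 # way concatenation
            if assoc > active_banks:            # cannot have more ways than active banks
                continue
            for line in LINE_SIZES:
                configs.append((size_kb, assoc, line))
    return configs

cfgs = l1_configurations()
print(len(cfgs), cfgs[:3])   # 18 configurations per L1 cache under these assumptions
# Base cache used for the energy comparison on the slide: (8, 4, 32)
```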

7. Level Two Configurable Cache • For maximum configurability, the level two cache uses Motorola M*CORE-style way management • Each way can be designated as instruction, data, unified, or off • Line size is configurable • For evaluation of energy savings, we used a base cache of 64 KB with a 64-byte line size and 4 fully unified ways [Figure: example way designations, e.g. I-way, D-way, U-way]
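To make the way-management space concrete, the sketch below enumerates the possible way designations, under the assumption (made here only for illustration) that the four ways are interchangeable, so only the count of each designation matters.

```python
from itertools import combinations_with_replacement

# Each of the four L2 ways is designated instruction, data, unified, or off.
# Treating ways as interchangeable, a configuration is a multiset of roles.
WAY_ROLES = ("I", "D", "U", "off")

def l2_way_assignments():
    return list(combinations_with_replacement(WAY_ROLES, 4))

assignments = l2_way_assignments()
print(len(assignments))      # 35 distinct way assignments under this assumption
print(assignments[0])        # e.g. ('I', 'I', 'I', 'I')
```

Note how capacity and associativity are entangled here: giving a partition another way raises both its size and its associativity at once, which is why the next slide calls the level-two size and associativity steps "synonymous".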

8. Alternating Cache Exploration with Additive Way Tuning (ACE-AWT) • The heuristic alternates between the two levels, tuning the level one sizes, associativities, and line sizes (for both I$ and D$) and the level two size, associativity, and line size • The level two size and associativity steps are difficult because changing size and changing associativity are synonymous in a way-management style cache (see the sketch below) [Figure: flowchart of the alternating level-one and level-two tuning steps]
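The outline below is a hedged sketch of the alternating exploration described on this slide, not the authors' exact algorithm: it tunes one parameter at a time, alternating between the two levels, and keeps whichever value minimizes measured energy. The evaluate_energy callback, the options structure, and the exact schedule are assumptions for illustration.

```python
# Greedy, one-parameter-at-a-time tuning sketch (assumed, for illustration).

def tune_parameter(config, cache, parameter, candidates, evaluate_energy):
    """Pick the candidate value with the lowest energy, holding the rest of
    the configuration fixed."""
    best_value = config[cache][parameter]
    best_energy = evaluate_energy(config)
    for value in candidates:
        config[cache][parameter] = value
        energy = evaluate_energy(config)
        if energy < best_energy:
            best_energy, best_value = energy, value
    config[cache][parameter] = best_value
    return config

def ace_awt(config, options, evaluate_energy):
    # Alternate between level one (separate I$ and D$) and level two (unified),
    # tuning size, then associativity, then line size.
    # For the way-managed L2, "size" and "associativity" are entangled, so in
    # the actual heuristic they are handled by the two ACE-AWT phases rather
    # than tuned as independent parameters as this simplification suggests.
    schedule = [
        ("L1-I", "size"), ("L1-D", "size"), ("L2", "size"),
        ("L1-I", "assoc"), ("L1-D", "assoc"), ("L2", "assoc"),
        ("L1-I", "line"), ("L1-D", "line"), ("L2", "line"),
    ]
    for cache, parameter in schedule:
        config = tune_parameter(config, cache, parameter,
                                options[cache][parameter], evaluate_energy)
    return config
```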

9. ACE-AWT - First Phase • The first phase is applied during size exploration [Figure: flowchart of the first phase]

10. ACE-AWT - Fine Tuning Phase • The fine tuning phase is applied during associativity exploration, starting from the cache that results from the first phase [Figure: flowchart of the fine tuning phase]

11. Results - Energy Savings • The heuristic achieved near-optimal results (when the optimal could be computed) • 62% energy savings compared to the base cache • Yet it searched only 0.2% of the search space (roughly 36 of the ≈18,000 configurations) • Also improved performance by 35% compared to the base cache due to tuned line sizes

12. Conclusions and Future Work • We developed an efficient and effective cache tuning heuristic to tune a two-level cache with a unified second level of cache • ≈18,000 possible configurations • Compared to a reasonable base cache configuration: • 62% energy savings • Explores only 0.2% of the search space • 35% improvement in performance • Future work includes applying the tuning heuristic to different execution phases within an application
