Cache Utilization-Aware Scheduling for Multicore Processors

Cache Utilization-Aware Scheduling for Multicore Processors. Presenter: Chi-Wei Fang, YunTech University, Taiwan. Authors: Edward T.-H. Chu, Wen-wei Lu. 2012 IEEE Asia Pacific Conference on Circuits and Systems.

Outline • Introduction • Contribution • CUAS • Experiment • Conclusion

Introduction Due to the limits of semiconductor process technology, processor speed is not expected to rise significantly. To further improve processor capability, chip multiprocessors (CMPs) have become widespread in today’s computer systems.

Introduction Intel® Core™2 Quad Processor Q8400 architecture. In most multicore processors, the last-level cache is shared among cores to reduce possible resource underutilization. As the figure shows, in the Intel® Core™2 Quad Processor the L2 caches are shared among cores.

Introduction • When tasks running on different cores read and write the shared cache intensively, excessive cache misses may occur and degrade performance • Reducing shared cache contention in multicore systems has therefore become an important design issue • J. Mars designed CiPE [1] to classify tasks according to their anti-interference ability • Anti-interference ability is defined as the performance loss a task suffers when it competes for the shared cache with other tasks

Introduction If tasks have similar anti-interference abilities, it becomes difficult for such methods to generate a proper task assignment. In addition, how much a task interferes with co-scheduled tasks depends on how aggressively it accesses the cache. A task with little anti-interference ability may or may not seriously interfere with co-scheduled applications.

Motivation • The optimal algorithm exhaustively searches all possible task assignments and selects the one with the smallest total execution time • Because of the gap between existing methods and the optimal algorithm, there is an apparent need to design a task scheduling policy that reduces shared cache contention and improves system performance
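The exhaustive search mentioned above can be sketched as follows. This is only an illustration, not the authors' implementation: the function names are hypothetical, and `estimate_total_time` stands in for however the total execution time of an assignment is obtained, which the slides do not specify.

```python
from itertools import product


def exhaustive_assignment(num_tasks, num_cores, estimate_total_time):
    """Try every task-to-core mapping and keep the one with the smallest time.

    estimate_total_time is a caller-supplied (hypothetical) function that
    takes a tuple `assignment`, where assignment[t] is the core of task t,
    and returns the total execution time of that assignment.
    """
    best_assignment, best_time = None, float("inf")
    # There are num_cores ** num_tasks candidates, so the search space
    # grows exponentially with the number of tasks.
    for assignment in product(range(num_cores), repeat=num_tasks):
        total = estimate_total_time(assignment)
        if total < best_time:
            best_assignment, best_time = assignment, total
    return best_assignment, best_time
```

Because every mapping is evaluated, the result is optimal with respect to the supplied cost function, at a cost that quickly becomes prohibitive as the task count grows.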

Contribution Cache utilization-aware scheduling (CUAS) • Goal: maximize the difference in unhealthy level between the cores that share the same cache while balancing workload among cores • We define the unhealthy level of a core as the sum of the unhealthy scores of the tasks running on that core • CUAS includes two parts • Application classifier • Task scheduler • CUAS can reduce cache contention

CUAS classification We designed two micro-benchmarks to measure the anti-interference and interference ability of a task • Attack (ATT) • Strong interference ability • Randomly and intensively pollutes all cache lines • Defend (DEF) • Strong anti-interference ability • Sequentially reads and writes each cache line
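The two access patterns can be illustrated with a small sketch. It assumes a 2 MB cache with 64-byte lines, and the function names are made up for this example. A real micro-benchmark would be written in a low-level language so that every access actually reaches the cache; the code below only shows the order in which lines are touched.

```python
import random

CACHE_BYTES = 2 * 1024 * 1024  # assumed: size of the shared cache
LINE_BYTES = 64                # assumed: cache line size


def att_pass(buf, rng):
    """ATT-style pass: touch every cache line once, in random order."""
    offsets = list(range(0, len(buf), LINE_BYTES))
    rng.shuffle(offsets)
    for off in offsets:
        buf[off] = (buf[off] + 1) % 256  # read the line, then write it
    return len(offsets)


def def_pass(buf):
    """DEF-style pass: read and write each cache line in sequential order."""
    touched = 0
    for off in range(0, len(buf), LINE_BYTES):
        buf[off] = (buf[off] + 1) % 256
        touched += 1
    return touched


buf = bytearray(CACHE_BYTES)
lines_att = att_pass(buf, random.Random(0))
lines_def = def_pass(buf)
```

Both passes touch the same number of lines; only the order differs, which is what separates the polluting behaviour of ATT from the sequential behaviour of DEF.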

CUAS classification • Based on the results of co-scheduling with ATT and DEF, we grade each task’s anti-interference and interference ability. Figure: a task co-scheduled with ATT on a shared L2 cache (anti-interference ability), and a task co-scheduled with DEF on a shared L2 cache (interference ability).

CUAS classification There are three formulas to calculate the unhealthy score of a task. (1) A’d is the execution time of DEF when it is co-scheduled with the task; Ad is the execution time of DEF when it executes alone. (2) A’i is the execution time of ATT when it is co-scheduled with the task; Ai is the execution time of the task when it executes alone. (3) The unhealthy score of a task is the sum of its I and AI. A task with a higher unhealthy score has a more negative impact on system performance.
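A minimal sketch of the score calculation follows. Only formula (3), the sum of I and AI, is taken from the slides. The relative-slowdown form (t' - t) / t used for I and AI is an assumption made for illustration, not the authors' stated formulas, and the function and parameter names are hypothetical.

```python
def relative_slowdown(co_scheduled_time, solo_time):
    """Assumed form of both ability metrics: (t' - t) / t."""
    if solo_time <= 0:
        raise ValueError("solo_time must be positive")
    return (co_scheduled_time - solo_time) / solo_time


def unhealthy_score(a_d_prime, a_d, a_i_prime, a_i):
    """Formula (3): unhealthy score = I + AI.

    a_d_prime, a_d: the slides' A'd and Ad (measurements involving DEF).
    a_i_prime, a_i: the slides' A'i and Ai (measurements involving ATT).
    """
    interference = relative_slowdown(a_d_prime, a_d)       # I, assumed form
    anti_interference = relative_slowdown(a_i_prime, a_i)  # AI, assumed form
    return interference + anti_interference


# Made-up timings in seconds, for illustration only.
score = unhealthy_score(a_d_prime=12.0, a_d=10.0, a_i_prime=30.0, a_i=20.0)
```

With these made-up timings the two terms are 0.2 and 0.5, so the score is about 0.7.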

The goal of the CUAS scheduler • Maximize the gap in total unhealthy score between the cores that share the same cache • Balance the workload among cores
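The two goals can be checked for a given assignment with a short sketch. The helper names and data layout are hypothetical. The sketch assumes two cores per shared cache and measures workload balance by task count, which is an assumption since the slides do not define the balance metric.

```python
def unhealthy_levels(assignment, scores):
    """assignment: {core: [task, ...]}, scores: {task: unhealthy score}.

    Returns {core: unhealthy level}, the sum of the scores on each core.
    """
    return {core: sum(scores[t] for t in tasks)
            for core, tasks in assignment.items()}


def cache_gaps(levels, cores_of_cache):
    """cores_of_cache: {cache: (core_a, core_b)} for cores sharing a cache.

    Returns {cache: absolute difference of the two cores' unhealthy levels}.
    """
    return {cache: abs(levels[a] - levels[b])
            for cache, (a, b) in cores_of_cache.items()}


def workload_imbalance(assignment):
    """Assumed balance metric: largest minus smallest task count per core."""
    counts = [len(tasks) for tasks in assignment.values()]
    return max(counts) - min(counts)


# Made-up scores and assignment, for illustration only.
scores = {"t1": 8, "t2": 6, "t3": 3, "t4": 1}
assignment = {"c0": ["t1"], "c1": ["t4"], "c2": ["t2"], "c3": ["t3"]}
cores_of_cache = {"cache1": ("c0", "c1"), "cache2": ("c2", "c3")}

levels = unhealthy_levels(assignment, scores)
gaps = cache_gaps(levels, cores_of_cache)
imbalance = workload_imbalance(assignment)
```

A scheduler pursuing the stated goals would prefer assignments with large values in `gaps` and a small `imbalance`.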

CUAS steps • Calculate the number of tasks for each core, a • We first assign the a tasks with the largest unhealthy scores to core 0 of the first cache • To keep unhealthy tasks from affecting each other, we assign the a tasks with the next largest unhealthy scores to core 0 of another cache • In the next turn, we assign tasks from cache n to cache 1
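The steps above can be sketched as follows. This is one reading of the slide rather than the authors' code: the function name is hypothetical, caches and cores are numbered from 0, the task count is assumed to divide evenly among the cores, and alternating the direction for more than two cores per cache is an extrapolation.

```python
def cuas_schedule(scores, num_caches, cores_per_cache=2):
    """scores: {task: unhealthy score}. Returns {(cache, core): [tasks]}."""
    num_cores = num_caches * cores_per_cache
    if not scores or len(scores) % num_cores != 0:
        raise ValueError("sketch assumes tasks divide evenly among cores")
    a = len(scores) // num_cores  # number of tasks for each core

    # Rank tasks from most to least unhealthy and cut them into groups of a.
    ranked = sorted(scores, key=scores.get, reverse=True)
    groups = [ranked[i:i + a] for i in range(0, len(ranked), a)]

    schedule = {}
    for core in range(cores_per_cache):
        caches = range(num_caches)
        if core % 2 == 1:
            caches = reversed(caches)  # next turn: last cache back to first
        for cache in caches:
            schedule[(cache, core)] = groups.pop(0)
    return schedule


# Made-up unhealthy scores, for illustration only.
example_scores = {"t1": 8, "t2": 7, "t3": 6, "t4": 5,
                  "t5": 4, "t6": 3, "t7": 2, "t8": 1}
example = cuas_schedule(example_scores, num_caches=2)
```

In this example the most unhealthy tasks (t1, t2) share a cache with the least unhealthy ones (t7, t8), which widens the gap between the two cores of that cache while every core receives the same number of tasks.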

CUAS scheduling Figure: tasks 1 to 6, ordered by the classification result, scheduled onto core 0 and core 1 of cache 1 and cache 2; the two cores of each cache share an L2 cache.

Experiment • We used an Intel Core2 Quad Q8400 CPU for our experiments • Its four cores are arranged into two groups of two cores, and each group shares a 2MB L2 cache • We used the SPEC CPU2006 benchmarks for evaluation

Experiment The classification results of CUAS. Total execution time is reduced by up to 46%.

Conclusion • In this work, we design a novel task scheduling policy, called CUAS, to reduce shared cache contention based on two indexes, intra-core cache contention and task interference ability, that primarily determine the utilization of the shared cache • CUAS first classifies tasks according to their anti-interference ability and interference ability • CUAS then distributes tasks to cores based on the effect of inter-core and intra-core cache contention

Conclusion • Our experimental results show that CUAS can significantly reduce shared cache contention and reduce total execution time by up to 46% compared to existing methods

Thank you for your attention. Embedded Operating System Lab at YunTech University http://eos.yuntech.edu.tw/eoslab/ Supported by NSC 100-2219-E-224-001
