Priority Based Fair Scheduling: A Memory Scheduler Design for Chip-Multiprocessor Systems

Tsinghua University, Tsinghua National Laboratory for Information Science and Technology

Background • “Memory-wall” • High memory access latency • DRAM Structure • Channel, Rank, Bank, Row, Column … • Various timing constraints • Challenges of multi-core • High parallelism • More data contention • Solution • More memory channels • Efficient memory scheduler

Motivation • Thread classification [TCM:Kim:2008] • Latency-sensitive threads • Bandwidth-sensitive threads • A memory scheduler should • Improve system throughput • Avoid starvation • Maintain fairness among different threads

Goals • Requests of latency-sensitive threads • To be issued ASAP • Requests of bandwidth-sensitive threads • Avoid unfairness • Our proposal: PBFS • Prioritize latency-sensitive threads • Avoid starvation of bandwidth-sensitive threads

Basic Idea • Each thread gets a priority • Ranges from -1 to n • Top-priority (n) • latency-sensitive threads • Bottom-priority (0) • bandwidth-sensitive threads • Medium-priority (1 to n-1) • intermediate threads • Idle (-1) • finished threads or compute-intensive threads

Priority Updating Rules • Priorities are updated dynamically • Once a request is issued • The corresponding thread's priority is decremented by 1 • When no thread has top-priority • All threads' priorities are incremented by 1 • When a time threshold is reached • Identify Idle threads • Adjust top-priority • Extremely unbalanced: increase top-priority • Extremely balanced: decrease top-priority • Other cases: unchanged • Upper/lower boundaries are adjusted based on the active threads
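The two per-request rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `PriorityTable`, the floor at 0, the clamp at the top-priority value, and the alphabetical tie-break are all hypothetical choices that the slides do not specify. The time-threshold adjustment is left out because the slides do not define "unbalanced".

```python
from dataclasses import dataclass, field

IDLE = -1  # priority value reserved for finished or compute-intensive threads


@dataclass
class PriorityTable:
    top: int                                   # current top-priority value n
    prio: dict = field(default_factory=dict)   # thread id -> priority

    def add_thread(self, tid):
        # Threads start at top-priority, as in the slide examples.
        self.prio[tid] = self.top

    def on_issue(self, tid):
        # Rule 1: the issuing thread drops one level (floor at 0 is an assumption).
        if self.prio[tid] > 0:
            self.prio[tid] -= 1
        # Rule 2: if no thread holds top-priority, promote every active thread
        # (clamping at top is an assumption).
        if self.top not in self.prio.values():
            for t, p in self.prio.items():
                if p != IDLE:
                    self.prio[t] = min(p + 1, self.top)

    def pick(self, pending):
        # Choose the pending, non-idle thread with the highest priority.
        # Ties are broken by thread id (an assumption).
        ready = [t for t in sorted(pending) if self.prio[t] != IDLE]
        return max(ready, key=lambda t: self.prio[t]) if ready else None
```

With two threads and top-priority 2, three consecutive issues from thread B leave it at priority 0 while thread A keeps priority 2, so A's next request is picked first.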

System throughput • Latency-sensitive threads • Easily obtain top-priority • Requests are issued as soon as possible • Example • 2-core CMP • Thread A, latency-sensitive • Thread B, bandwidth-sensitive • Top-priority = 2 • Initially, both threads' priorities are 2

Example (figure) • Thread A requests: Rq 0, Rq 1 • Thread B requests: Rq 0, Rq 1, Rq 2, Rq 3, Rq 4, Rq 5, Rq 6, Rq 7, Rq 8, Rq 9 • Execution order: Rq 0, Rq 1, Rq 2, Rq 0, Rq 3, Rq 4, Rq 5, Rq 6, Rq 1, Rq 7, Rq 8, Rq 9

Starvation Avoidance • When a thread continuously issues many requests • It is classified as a bandwidth-sensitive thread • Other threads get more chances to raise their priorities • Example • 2-core CMP • Thread A, less bandwidth-sensitive • Thread B, bandwidth-sensitive • Top-priority = 2 • Initially, both threads' priorities are 2

Example (figure) • Thread A requests: Rq 0, Rq 1, Rq 2, Rq 3, Rq 4, Rq 5 • Thread B requests: Rq 0, Rq 1, Rq 2, Rq 3, Rq 4, Rq 5, Rq 6, Rq 7, Rq 8, Rq 9 • Execution order: Rq 0, Rq 1, Rq 2, Rq 0, Rq 1, Rq 3, Rq 2, Rq 4, Rq 3, Rq 5, Rq 4, Rq 6, Rq 5, Rq 7, Rq 8, Rq 9
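To see why the decrement-and-promote rules interleave two busy threads instead of letting one monopolize the memory controller, the following toy simulation may help. It is a sketch under simplifying assumptions: all requests are already queued at time 0, ties go to the alphabetically first thread, priorities are floored at 0 and clamped at top-priority, and the periodic Idle detection is omitted. The function name `simulate` is hypothetical. Because the figure's request arrival times are not known, the output is not meant to reproduce the execution order shown above.

```python
def simulate(pending, top):
    """Return the order in which threads issue requests.

    pending: dict mapping thread name -> number of queued requests
    top:     top-priority value; every thread starts at this priority
    """
    pending = dict(pending)
    prio = {t: top for t in pending}
    order = []
    while any(pending.values()):
        # Threads with work left, in name order so ties go to the first name.
        ready = [t for t in sorted(pending) if pending[t] > 0]
        chosen = max(ready, key=lambda t: prio[t])
        order.append(chosen)
        pending[chosen] -= 1
        # Issuing a request costs the thread one priority level.
        prio[chosen] = max(prio[chosen] - 1, 0)
        # If nobody holds top-priority any more, promote everyone.
        if top not in prio.values():
            prio = {t: min(p + 1, top) for t, p in prio.items()}
    return order


print("".join(simulate({"A": 6, "B": 10}, top=2)))  # -> ABABABABABABBBBB
```

While both threads have queued requests, neither issues twice in a row. Once thread A finishes, thread B drains its queue uninterrupted.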

Hardware overhead • Hardware support is needed to • record the priority of each thread • monitor each thread's behavior (read counts within a time interval) • maintain flags indicating whether a row buffer can be closed • The storage overhead is small and the logic is easy to implement
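A rough storage estimate can be made from the three items listed above. The sketch below assumes one priority register and one read counter per thread plus one close flag per bank. The counter width, the per-bank granularity of the flags, and the function name `pbfs_storage_bits` are illustrative assumptions, not figures from the slides.

```python
def pbfs_storage_bits(num_threads, top_priority, counter_bits, num_banks):
    """Estimate scheduler state in bits under the assumptions stated above."""
    # Priorities range over -1..n, i.e. n + 2 distinct values.
    # (n + 1).bit_length() is the number of bits needed to encode them.
    prio_bits = (top_priority + 1).bit_length()
    per_thread = prio_bits + counter_bits
    row_close_flags = num_banks  # assumed: one flag per bank
    return num_threads * per_thread + row_close_flags


# Hypothetical configuration: 4 threads, n = 8, 16-bit counters, 32 banks.
print(pbfs_storage_bits(4, 8, 16, 32))  # -> 112
```

Even with generous counter widths, the state stays in the range of a few hundred bits for configurations of this size.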

Evaluation • Usimm-1.3 • Memory configurations • 1 channel • 4 channels • Benchmarks • Metrics • Execution time • Maximum slowdown • EDP
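The slides name the three metrics without defining them. The sketch below uses commonly used definitions as an assumption: total execution time is the sum over threads, maximum slowdown is the largest ratio of a thread's shared-run time to its stand-alone time, and EDP is energy multiplied by delay. All function names are hypothetical. The `reduction_percent` helper shows how a "reduction" figure is computed against a baseline, which the slides do not name.

```python
def total_execution_time(shared_times):
    # Sum of per-thread execution times in the multiprogrammed run.
    return sum(shared_times)


def max_slowdown(shared_times, alone_times):
    # Largest per-thread ratio of shared-run time to stand-alone time.
    return max(s / a for s, a in zip(shared_times, alone_times))


def energy_delay_product(energy, delay):
    # EDP: lower is better; it penalizes both energy use and run time.
    return energy * delay


def reduction_percent(baseline, value):
    # Relative improvement over the baseline, in percent.
    return 100.0 * (baseline - value) / baseline


# Made-up numbers for illustration only.
print(max_slowdown([20.0, 30.0], [10.0, 10.0]))   # -> 3.0
print(reduction_percent(200.0, 185.0))            # -> 7.5
```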

Execution Time • Overall • CLOSE: 4.2% reduction • PBFS: 7.5% reduction

Maximum Slowdown • Overall • CLOSE: 4.7% reduction • PBFS: 7.0% reduction

EDP • Overall • CLOSE: 9.1% reduction • PBFS: 13.8% reduction

Summary • We proposed PBFS • Classify threads with priority • Dynamically update threads’ priorities • Guarantee system throughput • Avoid starvation of bandwidth-sensitive threads • Low hardware overhead

Thanks
