This research explores utilizing stream programming properties to enhance memory task scheduling in systems with multiple cores. By restricting the Memory Task Limit (MTL), memory bandwidth contention can be reduced, improving system performance. An analytical model is developed to analyze performance speedup under different MTL constraints. Experimental results validate the approach's effectiveness in reducing bandwidth contention and enhancing real workload performance.
An Analytical Model to Exploit Memory Task Scheduling • Hsiang-Yun Cheng, Jian Li, and Chia-Lin Yang • Dept. of Computer Science & Information Engineering, National Taiwan University • IBM Austin Research Laboratory
Motivation • Off-chip bandwidth on CMPs is a precious resource • If too many cores execute memory operations simultaneously: • Bandwidth contention ↑ → memory access latency ↑
Objective • Software task scheduling to reduce bandwidth contention and improve system performance • Utilize the stream programming property to decouple threads into memory tasks and compute tasks • Avoid too many concurrent memory tasks • Challenge: how many concurrent memory tasks should be allowed to maximize system performance?
Stream Programming Style • Decouple computation and memory access • Gather → Compute → Scatter • Example
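A minimal sketch of the gather/compute/scatter pattern may help make the decoupling concrete. The function names and the squaring kernel are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
def gather(memory, indices):
    """Memory phase: copy scattered elements into a dense local buffer."""
    return [memory[i] for i in indices]


def compute(buffer):
    """Compute phase: touches only the local buffer (here, squares each value)."""
    return [x * x for x in buffer]


def scatter(memory, indices, values):
    """Memory phase: write results back to their original locations."""
    for i, v in zip(indices, values):
        memory[i] = v


def stream_kernel(memory, indices):
    """Run one gather -> compute -> scatter pass over the selected elements."""
    local = gather(memory, indices)    # all off-chip reads happen here
    result = compute(local)            # no off-chip traffic in this phase
    scatter(memory, indices, result)   # all off-chip writes happen here
    return memory
```

Because memory traffic is confined to `gather` and `scatter`, a scheduler can treat those phases as separate tasks from `compute`.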
Exploiting Stream Programming Properties • Task division according to stream programming property • Memory tasks • Fetch data from off-chip memory to on-chip caches • Compute tasks • Directly access data from on-chip cache without cache misses
Memory Task Scheduling • Main Idea • Restrict the Memory Task Limit (MTL) to reduce memory bandwidth contention • MTL: maximum number of memory tasks that can be scheduled simultaneously • MTL ↓ → bandwidth contention ↓ → memory access latency ↓ • MTL ↓ → scheduling constraint ↑ → CPU may unnecessarily stay idle
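One simple way to enforce such a limit in software is a counting semaphore around the memory phase. The sketch below is an illustration under that assumption, not the scheduler used in this work; `run_with_mtl`, `memory_task`, and `compute_task` are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_mtl(work_items, memory_task, compute_task, n_workers, mtl):
    """Process items on n_workers threads with at most mtl memory tasks in flight."""
    if not 1 <= mtl <= n_workers:
        raise ValueError("mtl must be between 1 and n_workers")
    mem_slots = threading.BoundedSemaphore(mtl)  # one slot per allowed memory task

    def worker(item):
        # A worker blocks here while mtl memory tasks are already running.
        with mem_slots:
            data = memory_task(item)
        # The compute phase is unrestricted and runs outside the semaphore.
        return compute_task(data)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, work_items))
```

In this sketch a worker that cannot get a memory slot simply waits, which is exactly the idle time that a small MTL can cause. A fuller scheduler would let that worker pick up a ready compute task instead.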
Memory Task Scheduling • Applications with different characteristics (memory-to-compute ratios) may perform best under different MTLs • Example: MTL=1 performs best
Memory Task Scheduling • Applications with different characteristics (memory-to-compute ratios) may perform best under different MTLs • Example: MTL=2 performs best
Performance Modeling for Different MTLs • Develop an analytical model to analyze performance speedup under different MTL constraints • Given Tmk, Tc, MTL=k, n, t • Tmk: average execution time of memory tasks under MTL=k • Tc: average execution time of compute tasks • n: number of processor cores • t: number of memory tasks • Estimate performance speedup under MTL=k
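Would the CPU Idle under MTL=k? • If [condition not preserved] then CPU always busy • If [condition not preserved] then CPU sometimes idle • Example: n=4 • MTL=1: CPU won’t idle if [condition not preserved] • MTL=2: CPU won’t idle if [condition not preserved] • MTL=3: CPU won’t idle if [condition not preserved]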
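A simple capacity argument suggests what such a condition could look like. It is offered here as an assumption, not as the slide's formula. Suppose each of the n cores alternates one memory task of length Tmk with one compute task of length Tc. If no core ever waits, the cores demand n·Tmk of memory-task time per period of Tmk + Tc, while the limit supplies at most k·(Tmk + Tc). Requiring demand ≤ supply gives Tmk/Tc ≤ k/(n−k). The helper names below are hypothetical.

```python
from fractions import Fraction


def cpu_always_busy(t_mem_k, t_comp, n_cores, mtl):
    """Assumed steady-state test: True if memory-slot demand fits within MTL=mtl."""
    if not 1 <= mtl <= n_cores:
        raise ValueError("mtl must be between 1 and n_cores")
    # n * Tmk <= k * (Tmk + Tc), equivalent to Tmk / Tc <= k / (n - k) when k < n
    return n_cores * t_mem_k <= mtl * (t_mem_k + t_comp)


def max_ratio_without_idle(n_cores, mtl):
    """Largest Tmk/Tc ratio satisfying the assumed condition; None if unrestricted."""
    if not 1 <= mtl <= n_cores:
        raise ValueError("mtl must be between 1 and n_cores")
    if mtl == n_cores:
        return None  # no core can ever be blocked by the limit
    return Fraction(mtl, n_cores - mtl)
```

Under this assumption, n=4 gives thresholds on Tmk/Tc of 1/3 for MTL=1, 1 for MTL=2, and 3 for MTL=3.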
Performance Model • If CPU always busy: [speedup expression not preserved] • If CPU sometimes idle: [speedup expression not preserved]
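Where a closed-form expression is not at hand, speedup under a given MTL can also be estimated with a small event-driven simulation. The sketch below is an illustrative stand-in and not the analytical model from this work. It assumes t independent pairs of (memory task, then compute task), fixed task times, and a greedy scheduler that prefers ready compute tasks. The names `makespan` and `estimated_speedup` are hypothetical.

```python
import heapq


def makespan(t_mem, t_comp, n_cores, mtl, n_pairs):
    """Simulated finish time for n_pairs (memory task -> compute task) pairs."""
    if not 1 <= mtl <= n_cores:
        raise ValueError("mtl must be between 1 and n_cores")
    now = 0.0
    free_cores = n_cores
    mem_left = n_pairs   # memory tasks not yet started
    mem_running = 0      # memory tasks in flight (never exceeds mtl)
    ready = 0            # compute tasks whose data is already on chip
    done = 0             # completed compute tasks
    events = []          # min-heap of (finish_time, kind)
    while done < n_pairs:
        # Greedy dispatch: ready compute tasks first, then memory tasks up to MTL.
        while free_cores and ready:
            heapq.heappush(events, (now + t_comp, "compute"))
            free_cores -= 1
            ready -= 1
        while free_cores and mem_left and mem_running < mtl:
            heapq.heappush(events, (now + t_mem, "memory"))
            free_cores -= 1
            mem_left -= 1
            mem_running += 1
        # Advance time to the next task completion.
        now, kind = heapq.heappop(events)
        free_cores += 1
        if kind == "memory":
            mem_running -= 1
            ready += 1
        else:
            done += 1
    return now


def estimated_speedup(t_mem_k, t_mem_base, t_comp, n_cores, mtl, n_pairs):
    """Speedup of MTL=mtl over the unrestricted case (MTL=n_cores)."""
    base = makespan(t_mem_base, t_comp, n_cores, n_cores, n_pairs)
    limited = makespan(t_mem_k, t_comp, n_cores, mtl, n_pairs)
    return base / limited
```

Here `t_mem_base` is the memory task time measured without any limit, which is typically larger than `t_mem_k` because of contention. Both values have to be supplied by measurement, since the simulation does not model bandwidth itself.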
Performance Trend • Comparing workloads with same Tmk, same optimal MTL, but different Tc • Optimal MTL: MTL that achieves the best speedup
Experimental Setup • Workloads • Experimental environment
Thank You Hsiang-Yun Cheng r96027@csie.ntu.edu.tw