This presentation by Terry Arnold II explores the challenges and solutions for scaling Symmetric Multiprocessing (SMP) technologies. It addresses skepticism regarding SMP scalability due to bandwidth limitations, presenting Coherent Memory Replication (CMR) and Hierarchical Affinity Scheduling (HAS) as innovative approaches. Both methods optimize memory access patterns to improve performance, particularly for Online Transaction Processing (OLTP) workloads. The session discusses implementation details, competitive comparisons, results achieved, and poses critical questions about software dependencies and compatibility with other Distributed Shared Memory (DSM) solutions.
WildFire: A Scalable Path for SMPs Erick Hagersten and Michael Koster Sun Microsystems Inc. Presented by Terry Arnold II
Introduction • What was the goal? • How did they achieve it? • CMR • HAS • Competitive Comparisons • Results • Questions
The Goal • In the past, people have been skeptical that SMPs could continue to scale because of their bandwidth limitations • The trend has therefore been to switch to cc-NUMA • The goal: improve the scalability of SMP technologies
cc-NUMA issues • Great scalability, but less than optimal memory “access patterns” • Requires heavy software optimization to reduce capacity and conflict misses • Non-trivial scheduling, resource management, and memory management
How? • The answer is the same as the answer to all engineering problems: throw new acronyms at it • Coherent Memory Replication (CMR) • Hierarchical Affinity Scheduling (HAS) • Both exploit locality to increase performance (specifically for OLTP workloads)
The Acronyms: CMR • S-COMA with a fixed home location for each address • Shadow physical pages • Coherence maintained in hardware at 64-byte granularity • Pages start out in cc-NUMA mode and are converted to CMR based on hardware counters that monitor memory access patterns • Limitations: memory-resident pages and large physical pages can only be replicated explicitly
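To make the cc-NUMA-to-CMR transition concrete, here is a minimal Python sketch of one node's view of memory. It is an illustration under stated assumptions, not WildFire's actual hardware logic: the page size, the replication threshold, and the round-robin home assignment are hypothetical; every access is treated as a cache miss; and invalidation of replicated lines is not modeled. What it does show is the slide's idea: a page starts in cc-NUMA mode, a per-page counter tracks remote misses, and once the counter crosses a threshold the node allocates a shadow page whose 64-byte lines are filled from the fixed home node on demand.

```python
from dataclasses import dataclass, field

LINE_SIZE = 64           # coherence unit from the slides (64-byte lines)
PAGE_SIZE = 8192         # hypothetical page size for this sketch
REPLICATE_THRESHOLD = 4  # hypothetical counter threshold for switching to CMR


@dataclass
class Node:
    node_id: int
    # Hypothetical stand-in for the hardware counters: remote misses per global page.
    remote_miss_count: dict = field(default_factory=dict)
    # Replicated (shadow) pages: global page number -> per-line valid bits.
    shadow_pages: dict = field(default_factory=dict)


def home_node(page: int, num_nodes: int) -> int:
    """Fixed home location for every page (hypothetical round-robin policy)."""
    return page % num_nodes


def access(node: Node, addr: int, num_nodes: int) -> str:
    """Classify one access (treated as a cache miss) and update CMR state."""
    page, offset = divmod(addr, PAGE_SIZE)
    line = offset // LINE_SIZE
    if home_node(page, num_nodes) == node.node_id:
        return "local-home"
    if page in node.shadow_pages:
        valid = node.shadow_pages[page]
        if valid[line]:
            return "local-replica"  # line already replicated in local memory
        valid[line] = True          # fetch this 64-byte line from the home node
        return "remote-fill"
    # cc-NUMA mode: every miss goes to the home node and bumps the counter.
    count = node.remote_miss_count.get(page, 0) + 1
    node.remote_miss_count[page] = count
    if count >= REPLICATE_THRESHOLD:
        # Convert the page to CMR: allocate a shadow page, all lines invalid.
        node.shadow_pages[page] = [False] * (PAGE_SIZE // LINE_SIZE)
    return "remote-numa"
```

On a two-node system, node 1 touching address 0 (homed on node 0) sees four "remote-numa" misses, then a "remote-fill" for the first shadow-page access, then "local-replica" hits. Replication happens per page, but data still moves one 64-byte line at a time.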
The Acronyms: HAS • Schedules a thread in the following order of preference: • The last processor it ran on • A processor on the same node • A processor on a remote node (only when the load imbalance exceeds a “threshold”)
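The HAS preference order can be sketched as a CPU-selection function. This is a hypothetical Python illustration, not the Solaris scheduler: run-queue length stands in for "load", and the imbalance threshold value is an assumption. The thread stays on its last processor if that processor is idle, otherwise moves to an idle processor on the same node, and crosses to a remote node only when the local/remote load difference exceeds the threshold.

```python
from dataclasses import dataclass

IMBALANCE_THRESHOLD = 2  # hypothetical run-queue difference that justifies a remote move


@dataclass
class Cpu:
    cpu_id: int
    node_id: int
    run_queue_len: int  # stand-in for the processor's load


def pick_cpu(last_cpu: Cpu, cpus: list[Cpu]) -> Cpu:
    """Choose a CPU for a thread that last ran on last_cpu, preferring locality."""
    # 1. The last processor it ran on (its cache is likely still warm).
    if last_cpu.run_queue_len == 0:
        return last_cpu
    # 2. The least-loaded processor on the same node; ties favor last_cpu.
    same_node = [c for c in cpus if c.node_id == last_cpu.node_id]
    best_local = min(same_node, key=lambda c: (c.run_queue_len, c is not last_cpu))
    if best_local.run_queue_len == 0:
        return best_local
    # 3. A remote-node processor, only if the imbalance exceeds the threshold.
    remote = [c for c in cpus if c.node_id != last_cpu.node_id]
    if remote:
        best_remote = min(remote, key=lambda c: c.run_queue_len)
        if best_local.run_queue_len - best_remote.run_queue_len > IMBALANCE_THRESHOLD:
            return best_remote
    # Otherwise stay on the local node, accepting some queueing delay.
    return best_local
```

The threshold is the key design choice: a remote migration loses the thread's warm caches and node-local memory, so the sketch accepts a modest local queue rather than moving off-node for a small load difference.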
Implementation • 2 ASICs: NIAC (coherence) and NIDC (bit-sliced interconnect) • Together these improve on the latency of a switch • NIAC: interface and global-coherence layer • Translators and counters
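With shadow pages, a node's local physical address for a replicated line differs from that line's global address at its home node, so some component must translate between the two. The slides list "translators" in the NIAC without details. The following is a hypothetical Python sketch of such a two-way page translation table; the class name, methods, and page size are illustrative assumptions, not the actual NIAC design.

```python
from typing import Optional


class PageTranslator:
    """Hypothetical two-way table between local shadow pages and global pages."""

    def __init__(self, page_size: int = 8192):
        self.page_size = page_size
        self.local_to_global: dict[int, int] = {}
        self.global_to_local: dict[int, int] = {}

    def map(self, local_page: int, global_page: int) -> None:
        """Record that local_page is this node's shadow copy of global_page."""
        self.local_to_global[local_page] = global_page
        self.global_to_local[global_page] = local_page

    def to_global(self, local_addr: int) -> Optional[int]:
        """Translate a local shadow address to its global address, if mapped."""
        page, offset = divmod(local_addr, self.page_size)
        global_page = self.local_to_global.get(page)
        return None if global_page is None else global_page * self.page_size + offset

    def to_local(self, global_addr: int) -> Optional[int]:
        """Translate a global address to the local shadow address, if replicated."""
        page, offset = divmod(global_addr, self.page_size)
        local_page = self.global_to_local.get(page)
        return None if local_page is None else local_page * self.page_size + offset
```

Outgoing requests for a missing line would use `to_global` to reach the home node; incoming coherence traffic would use `to_local` to find the node's replica, if there is one. The page offset passes through unchanged in both directions.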
Competition • The SGI Origin and Sequent’s NUMA-Q
Questions • Is this “solution” too dependent on software (kernel modifications)? • How compatible are CMR and HAS with other DSM solutions?