Extended Memory Semantics for Thread Synchronization
Explore Extended Memory Semantics to improve synchronization performance in multithreaded systems. Learn about its benefits and challenges compared to other mechanisms.
Extended Memory Semantics for Thread Synchronization Sheng Li, Ying Zhou Operating System Progress Report Nov 1st, 2007
Problems • Hardware multithreading is no longer a privilege of supercomputing; it is already part of the major microprocessors. • E.g., Sun Niagara 2 has 64 threads/chip and 256 threads/server. • Concurrency management is one of the biggest challenges in multithreaded systems • Key requirement: low-overhead, scalable thread synchronization • Synchronization mechanisms • Atomic primitives (Test-and-Set, Compare-and-Swap, LL-SC) • Software routines built on them have poor performance and scalability • Empty/Full bits: an extension bit for each memory location denotes its empty/full state • Better performance [1], but still not enough
Our Goal • Solve the synchronization bottleneck by using Extended Memory Semantics (EMS) • Better performance and scalability • Quantify the performance gain when using EMS, compared to other synchronization mechanisms (e.g., Empty/Full bits)
Extended Memory Semantics • [Figure: a memory word with 64 bits of data/metadata plus an extension bit] • Memory instructions are characterized by their synchronization behavior • Load.ff, Load.fe, Store.xf, Store.ef, Store.xe (f: full, e: empty, x: don't care)
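The instruction names above can be illustrated with a small software model. This is only a sketch under stated assumptions: the slides name the instructions but do not spell out their semantics, so the model assumes the usual full/empty convention in which the first suffix letter is the state required before the access and the second is the state left afterward. The names `EMSWord` and `handoff_demo` are hypothetical, and a condition variable stands in for whatever blocking the hardware and EMS handler provide.

```python
import threading


class EMSWord:
    """Software model of one memory word with a full/empty extension bit.

    Assumption (not stated in the slides): in an instruction suffix, the
    first letter is the state required before the access and the second
    letter is the state left behind afterward.
    """

    def __init__(self, value=0, full=False):
        self._value = value
        self._full = full
        self._cond = threading.Condition()

    def load_ff(self):
        # Wait until full, read, leave the word full.
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            return self._value

    def load_fe(self):
        # Wait until full, read, mark the word empty.
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value

    def store_xf(self, value):
        # Store regardless of state, mark the word full.
        with self._cond:
            self._value = value
            self._full = True
            self._cond.notify_all()

    def store_ef(self, value):
        # Wait until empty, store, mark the word full.
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def store_xe(self, value):
        # Store regardless of state, mark the word empty.
        with self._cond:
            self._value = value
            self._full = False
            self._cond.notify_all()


def handoff_demo(n):
    """Pass n values from a producer to a consumer through one word."""
    word = EMSWord()
    received = []

    def consumer():
        for _ in range(n):
            received.append(word.load_fe())  # take the value, leave empty

    t = threading.Thread(target=consumer)
    t.start()
    for i in range(n):
        word.store_ef(i)  # blocks until the consumer has emptied the word
    t.join()
    return received
```

Under this reading, a `store_ef` / `load_fe` pair behaves like a one-slot producer/consumer channel, and a `load_fe` followed later by a `store_ef` on the same word behaves like acquiring and releasing a lock.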
EMS handler • There is no free lunch: the EMS handler has overhead • Creating the handler threads • Queuing up memory requests and building the data structures
What we have done so far • Built the EMS model, covering both architecture and OS aspects, in the Structural Simulation Toolkit (SST) • SST is a simulation environment for massively lightweight multithreading, developed at Notre Dame and Sandia Lab • Modified glibc to use EMS • Especially the pthread library • Designed benchmarks for different categories • Ran simulations to evaluate EMS performance
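To make the source of that overhead concrete, here is a deliberately simplified, single-threaded sketch of request queuing. Everything in it is an assumption made for illustration: the slides do not describe the handler's data structures, and `QueuedWord`, `pending`, and `queued_total` are hypothetical names. A load that finds the word empty is parked in a queue, and a later store drains the queue.

```python
from collections import deque


class QueuedWord:
    """Toy model of a word whose blocked loads are queued (hypothetical)."""

    def __init__(self):
        self.full = False
        self.value = None
        self.pending = deque()  # parked load.ff requests, oldest first
        self.queued_total = 0   # how many requests had to be queued

    def load_ff(self, on_value):
        # Fast path: the word is full, so the load completes immediately.
        if self.full:
            on_value(self.value)
        else:
            # Slow path: the request is queued, which costs extra work.
            self.pending.append(on_value)
            self.queued_total += 1

    def store_xf(self, value):
        # Fill the word, then service every parked load in arrival order.
        self.value = value
        self.full = True
        while self.pending:
            self.pending.popleft()(value)


def queue_demo():
    word = QueuedWord()
    got = []
    word.load_ff(got.append)  # word is empty: queued
    word.load_ff(got.append)  # word is empty: queued
    word.store_xf(7)          # fills the word and drains the queue
    word.load_ff(got.append)  # word is full: served immediately
    return got, word.queued_total
```

Only the two loads issued before the store pay the queuing cost; the third completes on the fast path.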
Tightly Coupled Parallel • Each thread competes with the others for the only lock before updating the counter • Very high contention, worst case
Loosely Coupled Parallel • Each thread competes with the others for locks before updating the counters • Mild contention
Embarrassingly Parallel • No contention, no locks
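The three benchmark categories above differ only in how counters and locks are shared. The sketch below shows the access patterns, not the actual benchmarks: `run_benchmark` and its parameters are hypothetical, `threading.Lock` stands in for the EMS-based pthread locks, and the rule for choosing a counter in the loosely coupled case is an assumption, since the slides do not give one. Being plain Python, it illustrates the sharing pattern rather than measuring performance.

```python
import threading


def run_benchmark(mode, num_threads=4, iters=1000):
    """Run one access pattern and return the sum of all counters."""
    if mode not in ("tight", "loose", "embarrassing"):
        raise ValueError(f"unknown mode: {mode}")

    # Tightly coupled: one shared counter. Otherwise: one counter per thread.
    num_counters = 1 if mode == "tight" else num_threads
    counters = [0] * num_counters
    locks = [threading.Lock() for _ in range(num_counters)]

    def worker(tid):
        for j in range(iters):
            if mode == "tight":
                # Every thread fights for the only lock (worst case).
                with locks[0]:
                    counters[0] += 1
            elif mode == "loose":
                # Assumed rule: rotate over the counters, so any two
                # threads only occasionally want the same lock.
                k = (tid + j) % num_counters
                with locks[k]:
                    counters[k] += 1
            else:
                # Embarrassingly parallel: private counter, no lock.
                counters[tid] += 1

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counters)
```

In every mode the total should equal `num_threads * iters`; what changes is how often a thread has to wait for a lock held by another thread.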
Embarrassingly Parallel and Loosely Coupled Parallel • Low synchronization overhead, guaranteed by EMS • EMS shows very good scalability • [Figure: synchronization distribution]
Tightly Coupled Parallel • Poor performance for EMS in the worst case • Most threads are used for synchronization, not for real work
The Road Ahead • Build/complete other synchronization mechanisms (e.g., Empty/Full bits) in SST • Modify glibc to support other synchronization mechanisms • Compare performance between EMS and other synchronization mechanisms
Thank you! Questions?
Bibliography [1] Larry Carter, John Feo, Allan Snavely. Performance and Programming Experience on the Tera MTA. PPSC, 1999.
Lightweight Threads • Thread context (frame) is 32 double words (256 bytes) • Two double words are reserved for the thread status; the other 30 are general-purpose registers • No other per-thread state, which makes multithreading easy • Frames are stored in memory (no register file) • Registers are aliases for memory locations
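The frame arithmetic can be checked with a short layout sketch. The slide gives the sizes (32 double words, 2 for status, 30 registers) but not the field order, so placing the status words first is an assumption, and `ThreadFrame` and `reg_address` are hypothetical names.

```python
import ctypes

DOUBLE_WORD_BYTES = 8  # one double word = 64 bits


class ThreadFrame(ctypes.Structure):
    """One thread context: 32 double words held in memory.

    Assumption: the two status words come first, followed by the
    30 general-purpose registers.
    """
    _fields_ = [
        ("status", ctypes.c_uint64 * 2),
        ("regs", ctypes.c_uint64 * 30),
    ]


def reg_address(frame_base, reg_index):
    """Memory address that register reg_index aliases within a frame."""
    if not 0 <= reg_index < 30:
        raise ValueError("register index must be in 0..29")
    return frame_base + ThreadFrame.regs.offset + reg_index * DOUBLE_WORD_BYTES
```

Because a register is just an offset from the frame base, switching threads in this model amounts to changing the base address rather than saving and restoring a register file.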