Parallelizing the GAP Kernel
This project focuses on parallelizing the GAP kernel, an extensive codebase containing 170,000 lines of sequential C. The challenge lies in adapting the system to allow multi-threaded execution by managing interpreter state stored in global variables and utilizing thread-local storage. We explore various solutions for garbage collection, synchronization, and thread management, aiming to develop a robust programming model that enhances performance while maintaining the integrity of the original codebase. Our approach includes building synchronization primitives and utilizing FIFO channels for efficient communication.
Parallelizing the GAP Kernel Reimer Behrends University of St. Andrews
The GAP Kernel • 170,000 lines of sequential C code. • Hundreds of global and static variables. • Custom generational garbage collector. • Goal: Allow multi-threaded execution.
Multiple Interpreter Instances • Interpreter state stored in global variables. • Objectify interpreter state? – or – • Use thread-local storage?
Objectify Interpreter State • Global variable use is pervasive. • The vast majority of functions/macros either need access to the state themselves or have to pass it to functions they call. • Function tables. • Too invasive for the code base overall.
Thread-Local Storage • No portable solution. • Only some systems support a TLS ABI. • __thread in gcc, .tls storage segment • pthread_getspecific() portable, but slow. • Use: SP/FP-relative addressing. • Thread stack is allocated on power-of-2 boundaries. • Mask lower bits to derive base of stack area. • pthread_attr_setstack(), alloca().
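The stack-masking idea on this slide can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the HPC-GAP implementation: `STACK_SIZE`, `ThreadState`, `current_state()` and `run_on_aligned_stack()` are hypothetical names, the stack is assumed to grow downward, and the per-thread state is assumed to sit at the lowest address of the stack area.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed stack size; it must be a power of two for the masking to work. */
#define STACK_SIZE ((size_t)1 << 20)

/* Hypothetical per-thread interpreter state, stored at the stack base. */
typedef struct {
    int thread_no;
} ThreadState;

/* Mask the low bits of a stack address to find the base of the stack area. */
static ThreadState *current_state(void) {
    char marker; /* any local variable lives on the calling thread's stack */
    uintptr_t addr = (uintptr_t)&marker;
    return (ThreadState *)(addr & ~((uintptr_t)STACK_SIZE - 1));
}

/* Thread body: look up this thread's state without any global variable. */
static void *worker(void *out) {
    *(int *)out = current_state()->thread_no;
    return NULL;
}

/* Run one thread on a stack aligned to STACK_SIZE and return the thread
 * number it found through current_state(), or -1 on failure. */
int run_on_aligned_stack(int thread_no) {
    void *stack = NULL;
    pthread_attr_t attr;
    pthread_t tid;
    int seen = -1;
    int rc;

    assert((STACK_SIZE & (STACK_SIZE - 1)) == 0);
    if (posix_memalign(&stack, STACK_SIZE, STACK_SIZE) != 0)
        return -1;
    /* Place the state at the lowest address; the stack grows down toward it. */
    ((ThreadState *)stack)->thread_no = thread_no;

    pthread_attr_init(&attr);
    rc = pthread_attr_setstack(&attr, stack, STACK_SIZE);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, &seen);
    if (rc == 0)
        rc = pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    free(stack);
    return rc == 0 ? seen : -1;
}
```

Calling `run_on_aligned_stack(7)` should hand back 7, since the worker finds its own state purely from its stack address. A real kernel would also need to protect the state from stack overflow, which this sketch does not attempt.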
Garbage Collection • Current “gasman” collector: • Difficult to adapt to multi-threaded environment. • Serialization bottlenecks (CHANGED_BAG). • Interim solution: BDW (Boehm-Demers-Weiser) conservative collector. • Has thread support. • Largely plug-and-play. • Adaptation uses gasman API. • However: Problems with the 64-bit version. • Need finalization.
Synchronization • Programming model still “under construction”. • Build a set of basic thread manipulation and synchronization primitives.
Thread Management • Thread management primitives: • id := CreateThread(func, arg1, …, argn); • WaitThread(id); • Example: x := a; id := CreateThread(function(y) x := x + y; end, b); WaitThread(id);
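For readers more familiar with the C side of the kernel, the CreateThread/WaitThread example above corresponds roughly to the following POSIX threads sketch. It is an analogy rather than a description of how the primitives are implemented; `AddArgs`, `add_to_x()` and `add_in_thread()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>

/* Arguments for the thread body: a shared variable and the value to add. */
typedef struct {
    long *x;
    long y;
} AddArgs;

/* Plays the role of function(y) x := x + y; end in the slide's example. */
static void *add_to_x(void *p) {
    AddArgs *args = p;
    *args->x += args->y;
    return NULL;
}

/* x := a; create a thread that adds b to x; wait for it; return x. */
long add_in_thread(long a, long b) {
    long x = a;
    AddArgs args = { &x, b };
    pthread_t id;
    int rc;

    rc = pthread_create(&id, NULL, add_to_x, &args); /* like CreateThread */
    assert(rc == 0);
    rc = pthread_join(id, NULL);                     /* like WaitThread */
    assert(rc == 0);
    (void)rc;
    return x; /* safe to read: the join orders the thread's write before it */
}
```

The join is what makes reading `x` afterwards safe; without it the two threads would race on the variable.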
Channels • Channels are FIFO queues • SendChannel(channel, object); • object := ReceiveChannel(channel); • Blocking and polling versions. • Both bounded and unbounded channel size. • Multiplexing: • object := ReceiveAnyChannel(ch1, …, chn);
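A bounded, blocking FIFO channel of the kind listed above can be sketched with one mutex and two condition variables. This is a minimal illustration, not the HPC-GAP code: the `Channel` layout, `CHANNEL_CAPACITY`, and the lower-case function names are assumptions, and the polling, unbounded and multiplexing variants are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define CHANNEL_CAPACITY 4 /* assumed bound on queued objects */
#define ITEM_COUNT 10      /* number of objects sent by the demo */

typedef struct {
    void *slots[CHANNEL_CAPACITY]; /* ring buffer of queued objects */
    int head;                      /* index of the oldest object */
    int count;                     /* number of queued objects */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
} Channel;

void channel_init(Channel *ch) {
    ch->head = 0;
    ch->count = 0;
    pthread_mutex_init(&ch->lock, NULL);
    pthread_cond_init(&ch->not_empty, NULL);
    pthread_cond_init(&ch->not_full, NULL);
}

/* Blocks while the channel is full, then appends at the tail. */
void send_channel(Channel *ch, void *object) {
    pthread_mutex_lock(&ch->lock);
    while (ch->count == CHANNEL_CAPACITY)
        pthread_cond_wait(&ch->not_full, &ch->lock);
    ch->slots[(ch->head + ch->count) % CHANNEL_CAPACITY] = object;
    ch->count++;
    pthread_cond_signal(&ch->not_empty);
    pthread_mutex_unlock(&ch->lock);
}

/* Blocks while the channel is empty, then removes the oldest object. */
void *receive_channel(Channel *ch) {
    void *object;
    pthread_mutex_lock(&ch->lock);
    while (ch->count == 0)
        pthread_cond_wait(&ch->not_empty, &ch->lock);
    object = ch->slots[ch->head];
    ch->head = (ch->head + 1) % CHANNEL_CAPACITY;
    ch->count--;
    pthread_cond_signal(&ch->not_full);
    pthread_mutex_unlock(&ch->lock);
    return object;
}

/* Producer thread: sends the integers 1..ITEM_COUNT as pointer-sized values. */
static void *producer(void *arg) {
    Channel *ch = arg;
    for (intptr_t i = 1; i <= ITEM_COUNT; i++)
        send_channel(ch, (void *)i);
    return NULL;
}

/* Returns 1 if the consumer receives every object in sending order. */
int channel_demo(void) {
    Channel ch;
    pthread_t tid;
    int in_order = 1;
    int rc;

    channel_init(&ch);
    rc = pthread_create(&tid, NULL, producer, &ch);
    assert(rc == 0);
    (void)rc;
    for (intptr_t i = 1; i <= ITEM_COUNT; i++)
        if ((intptr_t)receive_channel(&ch) != i)
            in_order = 0;
    pthread_join(tid, NULL);
    return in_order;
}
```

Because the demo sends more objects than the channel can hold, the producer has to block until the consumer makes room, so both the bound and the FIFO ordering are exercised.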
Barriers • StartBarrier(barrier, count); • WaitBarrier(barrier); • WaitBarrier(barrier, function);
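One plausible reading of StartBarrier and the one-argument WaitBarrier is a counting barrier: the first call sets how many threads are expected, and each waiting thread blocks until that many have arrived. The sketch below follows that reading. The `Barrier` layout and function names are assumptions, and the WaitBarrier(barrier, function) form is not modelled because the slide does not spell out its semantics.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NUM_THREADS 4

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t released;
    int expected;        /* threads that must arrive before release */
    int arrived;         /* threads that have arrived so far */
    unsigned generation; /* bumped on each release; guards against spurious wakeups */
} Barrier;

static Barrier barrier = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0
};
static int ready[NUM_THREADS]; /* written before the barrier */
static int seen[NUM_THREADS];  /* what each thread observed after it */

/* Arm the barrier for `count` participating threads. */
void start_barrier(Barrier *b, int count) {
    pthread_mutex_lock(&b->lock);
    b->expected = count;
    b->arrived = 0;
    pthread_mutex_unlock(&b->lock);
}

/* Block until `expected` threads have called wait_barrier(). */
void wait_barrier(Barrier *b) {
    pthread_mutex_lock(&b->lock);
    unsigned my_generation = b->generation;
    if (++b->arrived == b->expected) {
        b->generation++; /* last arrival releases everyone */
        b->arrived = 0;
        pthread_cond_broadcast(&b->released);
    } else {
        while (b->generation == my_generation)
            pthread_cond_wait(&b->released, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

/* Each worker marks itself ready, waits, then counts the ready workers. */
static void *phase_worker(void *arg) {
    int me = (int)(intptr_t)arg;
    int total = 0;
    ready[me] = 1;
    wait_barrier(&barrier);
    for (int i = 0; i < NUM_THREADS; i++)
        total += ready[i];
    seen[me] = total;
    return NULL;
}

/* Returns 1 if every worker saw all workers ready after the barrier. */
int barrier_demo(void) {
    pthread_t tids[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        ready[i] = seen[i] = 0;
    start_barrier(&barrier, NUM_THREADS);
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&tids[i], NULL, phase_worker,
                                (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(tids[i], NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        if (seen[i] != NUM_THREADS)
            return 0;
    return 1;
}
```

The generation counter lets the same barrier be reused for a later phase without a thread from the earlier phase being mistaken for a new arrival.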
Single Assignment Variables • WriteSyncVar(var, value); • Only one write permitted. • Subsequent writes result in errors. • value := ReadSyncVar(var); • Blocks if ‘var’ has not been written yet.
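The write-once, blocking-read behaviour described above can be sketched in C as follows. The `SyncVar` layout and the lower-case function names are assumptions made for illustration; in particular, a second write is reported here through a return code, whereas the slide only says that it results in an error.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t was_written;
    int written; /* 0 until the single permitted write has happened */
    void *value;
} SyncVar;

void sync_var_init(SyncVar *var) {
    pthread_mutex_init(&var->lock, NULL);
    pthread_cond_init(&var->was_written, NULL);
    var->written = 0;
    var->value = NULL;
}

/* Returns 0 on the first write and -1 on any later write attempt. */
int write_sync_var(SyncVar *var, void *value) {
    int status = -1;
    pthread_mutex_lock(&var->lock);
    if (!var->written) {
        var->value = value;
        var->written = 1;
        pthread_cond_broadcast(&var->was_written); /* wake all blocked readers */
        status = 0;
    }
    pthread_mutex_unlock(&var->lock);
    return status;
}

/* Blocks until the variable has been written, then returns its value. */
void *read_sync_var(SyncVar *var) {
    void *value;
    pthread_mutex_lock(&var->lock);
    while (!var->written)
        pthread_cond_wait(&var->was_written, &var->lock);
    value = var->value;
    pthread_mutex_unlock(&var->lock);
    return value;
}

/* Writer thread: performs the one permitted write. */
static void *writer(void *arg) {
    write_sync_var(arg, (void *)(intptr_t)42);
    return NULL;
}

/* Returns the value read (42) if a second write is rejected, else -1. */
int sync_var_demo(void) {
    SyncVar var;
    pthread_t tid;
    int rc, value, second;

    sync_var_init(&var);
    rc = pthread_create(&tid, NULL, writer, &var);
    assert(rc == 0);
    (void)rc;
    value = (int)(intptr_t)read_sync_var(&var); /* blocks until written */
    pthread_join(tid, NULL);
    second = write_sync_var(&var, (void *)(intptr_t)7);
    return second == -1 ? value : -1;
}
```

The reader may start before or after the writer; either way it returns only once the value is in place, which is what makes such variables usable as one-shot futures.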
Build Process • HPC GAP internal builds use SCons. • Automatic and clean dependency tracking for C. • Proper rebuilds for changes in build setup. • E.g., scons gmp=no. • Python easier to write than m4+/bin/sh+make.