Parallel and Concurrent Real-time Garbage Collection
Part II: Memory Allocation and Sweeping
David F. Bacon
T.J. Watson Research Center

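Page Data Synchronization, Take 1
[Diagram: page lists for size classes 16, 64, and 256, plus a free list]

Page Data, Take 2
[Diagram: page lists for size classes 16, 64, and 256, repeated for Thread 1 and Thread 2, plus a free list]

[Diagram: page lists labeled free, 16, 64, 256, full, and temp full, with a "?" annotation]
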
Compare-and-Swap*
boolean CAS(int* addr, int old, int new) {
    atomic {
        if (*addr == old) {
            *addr = new;
            return true;
        } else {
            return false;
        }
    }
}
• IBM 370 since 1970, Intel 80486 since 1989
• Load-Reserved: ARM 6 since 1991, IBM POWER2 since 1993

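
The CAS primitive above maps closely onto C11's `atomic_compare_exchange_strong`. The following is a minimal sketch under that assumption; the helper names `cas_int` and `fetch_add_with_cas` are illustrative and do not come from the slides.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Same contract as the slide's CAS: store new_val only if *addr still
   holds old_val, and report whether the store happened. */
static bool cas_int(atomic_int *addr, int old_val, int new_val) {
    /* The C11 call overwrites its "expected" argument on failure,
       so a local copy is passed instead of the caller's value. */
    int expected = old_val;
    return atomic_compare_exchange_strong(addr, &expected, new_val);
}

/* Typical retry loop built on CAS: atomically add delta and return
   the value that was seen just before the successful update. */
static int fetch_add_with_cas(atomic_int *addr, int delta) {
    int seen;
    do {
        seen = atomic_load(addr);
    } while (!cas_int(addr, seen, seen + delta));
    return seen;
}
```

The retry loop is the same shape used by the stack operations on the following slides: read, compute, and attempt the CAS until no other thread has intervened.
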
Non-locking* Stack
push(Page page) {
    Page t;
    do {
        t = top;
        page->next = t;
    } while (!CAS(&top, t, page));
}
Page pop() {
    Page t, n;
    do {
        t = top;
        if (t == null) return null;
        n = t->next;
    } while (!CAS(&top, t, n));
    return t;
}

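[Diagram: ABA failure of the non-locking stack, starting from top → A → B → C]
1. Thread 1 starts x=pop(): it reads t = A and n = B, then stalls before its CAS.
2. Thread 2 runs a=pop() and b=pop(), removing A and B; top → C.
3. Thread 2 runs push(A); top → A → C.
4. Thread 1 finishes pop(): top is A again, so CAS(&top, A, B) succeeds and top now points to B, which Thread 2 still holds as b.
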
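
The four-step interleaving can be replayed deterministically on a single thread. The sketch below is an illustration only: `replay_aba` and `cas_top` are hypothetical names, and the "threads" are simulated by performing thread 1's reads by hand before running thread 2's operations.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Page { struct Page *next; char name; } Page;

static _Atomic(Page *) top;

static bool cas_top(Page *old_val, Page *new_val) {
    return atomic_compare_exchange_strong(&top, &old_val, new_val);
}

static void push(Page *p) {
    Page *t;
    do { t = atomic_load(&top); p->next = t; } while (!cas_top(t, p));
}

static Page *pop(void) {
    Page *t, *n;
    do {
        t = atomic_load(&top);
        if (t == NULL) return NULL;
        n = t->next;
    } while (!cas_top(t, n));
    return t;
}

/* Replays the four steps and returns the page that top ends up pointing to. */
static Page *replay_aba(Page *a, Page *b, Page *c) {
    atomic_store(&top, NULL);
    push(c); push(b); push(a);        /* stack: A -> B -> C */

    /* Step 1: thread 1 begins pop(), reads t = A and n = B, then stalls. */
    Page *t = atomic_load(&top);
    Page *n = t->next;

    /* Step 2: thread 2 pops A and B and now owns both pages. */
    Page *pa = pop();
    Page *pb = pop();
    (void)pb;

    /* Step 3: thread 2 pushes A back; stack: A -> C. */
    push(pa);

    /* Step 4: thread 1 resumes. top is A again, so the CAS succeeds and
       installs the stale n = B, a page that thread 2 still owns. */
    (void)cas_top(t, n);
    return atomic_load(&top);
}
```

After the replay, `top` points at B even though B was popped and never pushed back, which is the corruption that the version counter on the next slide is meant to prevent.
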
Correct Non-locking Stack
push(PageId page) {
    PageId t; short v;
    do {
        <t, v> = top;
        page->next = t;
    } while (!CAS(&top, <t,v>, <page,(v+1) % 0xffff>));
}
PageId pop() {
    PageId t; short v;
    do {
        <t, v> = top;
        if (t == 0) return 0;
    } while (!CAS(&top, <t,v>, <next(t),(v+1) % 0xffff>));
    return t;
}

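
One way to realize the `<PageId, version>` pair is to pack both halves into a single 32-bit atomic word. The sketch below assumes a 16-bit `PageId` that indexes a side table `next_of` (a hypothetical stand-in for the slide's `next(t)`), with 0 meaning "no page"; none of these layout choices are stated in the slides.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PAGES 64
typedef uint16_t PageId;                 /* 0 means "no page" */

static PageId next_of[MAX_PAGES];        /* next_of[id]: page below id on the stack */
static _Atomic uint32_t top;             /* high 16 bits: version, low 16 bits: PageId */

static uint32_t pack(PageId id, uint16_t ver) { return ((uint32_t)ver << 16) | id; }
static PageId   id_of(uint32_t w)  { return (PageId)(w & 0xffffu); }
static uint16_t ver_of(uint32_t w) { return (uint16_t)(w >> 16); }

static void push(PageId page) {
    uint32_t old;
    do {
        old = atomic_load(&top);
        next_of[page] = id_of(old);
        /* Every successful update bumps the version (wrapping at 16 bits). */
    } while (!atomic_compare_exchange_weak(
                 &top, &old, pack(page, (uint16_t)(ver_of(old) + 1))));
}

static PageId pop(void) {
    uint32_t old;
    PageId t;
    do {
        old = atomic_load(&top);
        t = id_of(old);
        if (t == 0) return 0;
    } while (!atomic_compare_exchange_weak(
                 &top, &old, pack(next_of[t], (uint16_t)(ver_of(old) + 1))));
    return t;
}
```

A stalled thread that still holds an old `<id, version>` pair now fails its CAS even if the same page is back on top, because the version has moved on. The counter is finite, so a thread stalled across exactly 65536 updates could still be fooled, which is presumably why the next slide turns to load-reserved/store-conditional.
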
Really Correct Non-locking Stack
push(Page page) {
    Page t;
    do {
        t = LOAD_AND_RESERVE(&top);
        page->next = t;
    } while (!STORE_CONDITIONAL(&top, page));
}
Page pop() {
    Page t;
    do {
        t = LOAD_AND_RESERVE(&top);
        if (t == null) return null;
    } while (!STORE_CONDITIONAL(&top, t->next));
    return t;
}

Coalescing
• Ideal:
  • getMultiplePages(n) returns best fit
  • Does not cause mutator activities to block
  • Does not cause collection activities to block
  • Does not suffer pathology under contention
• Reality: tradeoffs between
  • Quality of fit
  • Probability of blocking
  • Overhead (number of CAS operations)

Cartesian Tree + Try-lock
[Diagram: tree whose nodes are runs of pages: 44-47, 33-35, 50-51, 21-22, 62-63, 11, 24, 59, 25]
• x.left.size < x.size >= x.right.size
• x.left.address < x.address < x.right.address

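
The two invariants (heap order on size, search-tree order on address) can be checked recursively. This is a minimal sketch with an assumed `Block` layout; the address check is applied to whole subtrees rather than only to direct children, which is slightly stronger than what the slide writes, and addresses are assumed to lie strictly between 0 and `UINTPTR_MAX`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout: one free block per node. */
typedef struct Block {
    uintptr_t address;
    size_t size;
    struct Block *left, *right;
} Block;

/* Checks the subtree rooted at x, whose addresses must lie strictly
   between lo and hi. */
static bool check(const Block *x, uintptr_t lo, uintptr_t hi) {
    if (x == NULL) return true;
    /* Search-tree order on address. */
    if (x->address <= lo || x->address >= hi) return false;
    /* Heap order on size, as on the slide: left < x >= right. */
    if (x->left  && !(x->left->size  <  x->size)) return false;
    if (x->right && !(x->right->size <= x->size)) return false;
    return check(x->left, lo, x->address) &&
           check(x->right, x->address, hi);
}

static bool is_cartesian_tree(const Block *root) {
    return check(root, 0, UINTPTR_MAX);
}
```

Because the largest block sits at the root, a request larger than `root->size` can be rejected without descending, while address order keeps neighbors adjacent for coalescing.
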
Non-locking Rebalancing Tree
• Homework…
• …but I won’t use it.
• The solution would
  • Be too complex to be reliable
  • Require too many CAS operations to be fast

The Fix: Transactional Memory
class AtomicCartesianTree extends CartesianTree {
    atomic Block getBlock() {
        return super.getBlock();
    }
    atomic void releaseBlock(Block b) {
        super.releaseBlock(b);
    }
    atomic Block[] getMultipleBlocks(int n) {
        return super.getMultipleBlocks(n);
    }
}

Hierarchical Non-locking Buddy?
[Diagram: the same runs of pages: 21-22, 24, 25, 11, 33-35, 50-51, 44-47, 59, 62-63]

Handling the Concurrency
Application (“mutator”) Threads
• Load pointer
• Store pointer
• Allocate
GC Threads
• Trace
• Collect
• Format
[Diagram: heap objects T, U, W, X, Y, Z, each with fields a and b; pointers p and r; a "?" annotation]

Yuasa Snapshot Algorithm (1990)
• Logically
  • Mark-and-Sweep Style Algorithm
  • Take a “copy-on-write” heap snapshot
  • Collect the garbage in that snapshot
• Physically
  • Stop all threads
  • Copy their stacks (“roots”)
  • Force them to save over-written pointers
  • Trace roots and over-written pointers
* Dijkstra’75, Steele’76

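
"Force them to save over-written pointers" is usually implemented as a write barrier on every pointer store. The sketch below is one possible shape, not the Metronome code: `MutatorThread`, `write_pointer`, and `trace_writebuf` are hypothetical names (the `writebuf` and barrier flag echo labels from the thread-structure slide), and each object has a single pointer field to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Object Object;
struct Object { bool marked; Object *field; };

#define WRITEBUF_CAP 128
typedef struct {
    bool barrier_on;                  /* true while a collection is tracing */
    Object *writebuf[WRITEBUF_CAP];   /* over-written pointers saved for the GC */
    size_t writebuf_len;
} MutatorThread;

/* Stores new_val into *slot. While the barrier is on, the old value is
   logged first so the snapshot stays reachable. Returns false without
   storing if the buffer is full, so the caller can flush and retry. */
static bool write_pointer(MutatorThread *th, Object **slot, Object *new_val) {
    Object *old = *slot;
    if (th->barrier_on && old != NULL) {
        if (th->writebuf_len == WRITEBUF_CAP) return false;
        th->writebuf[th->writebuf_len++] = old;
    }
    *slot = new_val;
    return true;
}

/* GC side: treat every logged pointer as an extra root of the snapshot. */
static void trace_writebuf(MutatorThread *th) {
    while (th->writebuf_len > 0) {
        Object *o = th->writebuf[--th->writebuf_len];
        while (o != NULL && !o->marked) {   /* follow the single field */
            o->marked = true;
            o = o->field;
        }
    }
}
```

With the barrier off, `write_pointer` is a plain store, which is why turning the barrier on and off appears as an explicit step in the phase list later in the deck.
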
1: Take Logical Snapshot
[Heap diagram: stack with pointers p and r; objects T, U, W, X, Y, Z, each with fields a and b]

2(a): Copy Over-written Pointers
[Heap diagram: same objects; stack pointer p, pointer r]

2(b): Trace
[Heap diagram: same objects]
* Color is per-object mark bit

2(c): Allocate “Black”
[Heap diagram: same objects plus new object V and pointer s]

3(a): Sweep Garbage
[Heap diagram: same objects; a free list appears]

3(b): Allocate “White”
[Heap diagram: same objects plus new object V2 and pointer t; free list]

4: Clear Marks
[Heap diagram: same objects; free list]

Yuasa Algorithm Phases
• Snapshot stack and global roots
• Trace
• Flip
• Sweep
• Flip
• Clear Marks
* Synchronous

Metronome-2 Concurrency
[Diagram: GC Master Thread, GC Worker Threads, and Application Threads (may do GC work)]

Metronome-2 Phases
• Clearing
  • Monitor Table clearing
  • JNI Weak Global clearing
  • Debugger Reference clearing
  • JVMTI Table clearing
  • Phantom Reference clearing
• Re-Trace 2
  • Trace Master
  • (Trace*)
  • (Trace Terminate***)
• Class Unloading
• Completion
  • Finalizer Wakeup
  • Class Unloading Flush
  • Clearable Compaction**
  • Book-keeping
• Initiation
  • Setup
  • turn double barrier on
• Root Scan
  • Active Finalizer scan
  • Class scan
  • Thread scan**
  • switch to single barrier, color to black
  • Debugger, JNI, Class Loader scan
• Trace
  • Trace*
  • Trace Terminate***
• Re-materialization 1
  • Weak/Soft/Phantom Reference List Transfer
  • Weak Reference clearing** (snapshot)
• Re-Trace 1
  • Trace Master
  • (Trace*)
  • (Trace Terminate***)
• Re-materialization 2
  • Finalizable Processing
• Flip
  • Move Available Lists to Full List* (contention)
  • turn write barrier off
  • Flush Per-thread Allocation Pages**
  • switch allocation color to white
  • switch to temp full list
• Sweeping
  • Sweep*
  • Switch to regular Full List**
  • Move Temp Full List to regular Full List* (contention)
* Parallel   ** Callback   *** Single actor symmetric

Part 3: Sweep
[Slide repeats the Metronome-2 Phases list]

Assume Trace Has Finished
[Heap diagram: stack with pointer p, pointer s; objects T, U, V, W, X, Y, Z, each with fields a and b]

JVM Application Thread Structure
[Diagram: a thread list linking Thread 1 and Thread 2; each thread has pthread (pthread_t), writebuf, color, BarrierOn, epoch, dblb, sweep, next, and inVM fields, plus page lists for size classes 16, 64, and 256]

[Diagram: page lists labeled free, 16, 64, 256, available, full, and temp full, with a "?" annotation]

[Diagram: the same page lists, annotated "contention?"]

Generic Structure of Move Phase: Shared Monotonic Work Pool
[Diagram: GC Master Thread, GC Worker Threads, and Application Threads (may do GC work) around a pool of work]

Performing a GC Work Quantum
every (500us) tryToDoSomeWorkForGC();
tryToDoSomeWorkForGC() {
    short phase = enterPhase();
    if (phase > 0) {
        doQuantum(phase);
        leavePhase();
    }
}

Phase Entry
[Diagram: PhaseInfo word with phase = moveavail and workers = 3]
short enterPhase() {
    thread->epoch = Epoch.newest;
    while (true) {
        <short phase, short workers> = PhaseInfo;
        if (workers == phaseTable[phase].maxWorkers) return -1;
        if (CAS(&PhaseInfo, <phase,workers>, <phase,workers+1>)) return phase;
    }
}
ABA?

Phase Exit
[Diagram: PhaseInfo word with phase = moveavail and workers = 3]
short leavePhase() {
    while (true) {
        <short phase, short workers> = PhaseInfo;
        if (workers > 1 || moreWorkForPhase(phase))
            newPhase = <phase,workers-1>;
        else if (phase == TRACE_PHASE)
            newPhase = <phaseTable[phase].nextPhase, 1>;
        else
            newPhase = <phaseTable[phase].nextPhase, 0>;
        if (CAS(&PhaseInfo, <phase,workers>, newPhase)) return newPhase;
    }
}

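
Phase entry and exit both update a `<phase, workers>` pair with one CAS. A minimal sketch, assuming the pair is packed into a 32-bit atomic word: the phase names, the `phase_table` contents, and the `work_left` counters (standing in for `moreWorkForPhase`) are invented for illustration, and the epoch update and the TRACE_PHASE special case are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

enum { IDLE_PHASE = 0, MOVE_PHASE = 1, SWEEP_PHASE = 2, NUM_PHASES = 3 };

typedef struct { uint16_t max_workers; uint16_t next_phase; } PhaseEntry;
static const PhaseEntry phase_table[NUM_PHASES] = {
    [IDLE_PHASE]  = {0, IDLE_PHASE},   /* nobody may enter while idle */
    [MOVE_PHASE]  = {2, SWEEP_PHASE},
    [SWEEP_PHASE] = {2, IDLE_PHASE},
};

static atomic_int work_left[NUM_PHASES];   /* stand-in for moreWorkForPhase() */
static _Atomic uint32_t phase_info;        /* high 16 bits: phase, low 16: workers */

static uint32_t pack(uint16_t phase, uint16_t workers) {
    return ((uint32_t)phase << 16) | workers;
}

/* Returns the phase joined, or -1 if the phase already has its
   maximum number of workers. */
static int enter_phase(void) {
    for (;;) {
        uint32_t old = atomic_load(&phase_info);
        uint16_t phase = (uint16_t)(old >> 16);
        uint16_t workers = (uint16_t)(old & 0xffffu);
        if (workers == phase_table[phase].max_workers) return -1;
        if (atomic_compare_exchange_weak(&phase_info, &old,
                                         pack(phase, (uint16_t)(workers + 1))))
            return phase;
    }
}

/* The last worker to leave a phase with no work left advances the phase. */
static uint32_t leave_phase(void) {
    for (;;) {
        uint32_t old = atomic_load(&phase_info);
        uint16_t phase = (uint16_t)(old >> 16);
        uint16_t workers = (uint16_t)(old & 0xffffu);
        uint32_t next;
        if (workers > 1 || atomic_load(&work_left[phase]) > 0)
            next = pack(phase, (uint16_t)(workers - 1));
        else
            next = pack(phase_table[phase].next_phase, 0);
        if (atomic_compare_exchange_weak(&phase_info, &old, next)) return next;
    }
}
```

Keeping the phase and the worker count in one word means the "am I the last one out?" decision and the phase transition happen in a single atomic step, so no thread can join a phase that is concurrently being closed.
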
Flush Per-Thread Pages
[Diagram: PhaseInfo with phase = flush and workers = 1; thread list with Thread 1 and Thread 2, each showing pthread (pthread_t), writebuf, color, BarrierOn, epoch, dblb, sweep, next, and inVM fields, plus page lists for size classes 16, 64, and 256]

Put Full White Pages on temp-full List
[Diagram: page lists labeled free, 16, 64, 256, available, full, and temp full]

How is this Phase Different? Per-Thread State Update
[Diagram: GC Master Thread, GC Worker Threads, and Application Threads (may do GC work) around a pool of work]

Callback Mechanism
bool doCallbacks(int callbackId) {
    bool alldone = true;
    LOCK(threadlist);
    for each (Thread thread in threadlist) {
        if (hasCallbackWork(thread, callbackId)) {
            if (!thread.inVM && TRY_LOCK(thread)) {
                doCallbackWorkForThread(thread, callbackId);
                UNLOCK(thread);
            } else {
                postCallback(thread, callbackId);
                alldone = false;
            }
        }
    }
    UNLOCK(threadlist);
    return alldone;
}
* Non-locking list with cursor?

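
The callback loop can be sketched in C with an `atomic_flag` standing in for the per-thread try-lock. Everything here is an assumption made for illustration: the `Thread` fields, the bitmask encoding of callbacks, and the omission of the thread-list lock (the sketch walks the list from a single caller).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Thread {
    struct Thread *next;
    atomic_flag lock;     /* stand-in for the per-thread lock */
    bool in_vm;           /* true while the thread is running inside the VM */
    unsigned pending;     /* bit i set: callback i still has work for this thread */
    unsigned posted;      /* bit i set: the thread must run callback i itself */
    unsigned done;        /* bit i set: callback i was performed on its behalf */
} Thread;

/* Returns true once no thread has outstanding work for callback_id. */
static bool do_callbacks(Thread *threadlist, int callback_id) {
    unsigned bit = 1u << callback_id;
    bool alldone = true;
    for (Thread *t = threadlist; t != NULL; t = t->next) {
        if (!(t->pending & bit)) continue;           /* hasCallbackWork */
        /* test_and_set returns the previous value: false means we got the lock. */
        if (!t->in_vm && !atomic_flag_test_and_set(&t->lock)) {
            t->pending &= ~bit;                      /* doCallbackWorkForThread */
            t->done |= bit;
            atomic_flag_clear(&t->lock);             /* UNLOCK(thread) */
        } else {
            t->posted |= bit;                        /* postCallback */
            alldone = false;
        }
    }
    return alldone;
}
```

The caller would invoke `do_callbacks` repeatedly until it returns true: threads outside the VM have the work done for them, while busy threads are asked to do it themselves at their next opportunity.
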
Sweep Phase
[Diagram: page lists labeled free, 16, 64, 256, available, full, and temp full]

Switch Back to Normal Full List
[Diagram: PhaseInfo with phase = switchfull and workers = 1; thread list with Thread 1 and Thread 2, each showing pthread (pthread_t), writebuf, color, BarrierOn, epoch, dblb, sweep, next, and inVM fields, plus page lists for size classes 16, 64, and 256]

Drain Temp Full
[Diagram: page lists labeled free, 16, 64, 256, available, full, and temp full]

Done with Collection!!
• But…
• That was the easy part
• And there are still some problems
• What if threads go out to lunch?

http://www.research.ibm.com/metronome
https://sourceforge.net/projects/tuningforkvp