McRT-Malloc: A Scalable Non-Blocking Transaction Aware Memory Allocator. Ali Adl-Tabatabai Ben Hertzberg Rick Hudson Bratin Saha. Goals of McRT-Malloc. Scalable Performance linear to # of processors then flat as you add more SW threads Preemption safety
 
                
McRT-Malloc: A Scalable Non-Blocking Transaction Aware Memory Allocator
Ali Adl-Tabatabai, Ben Hertzberg, Rick Hudson, Bratin Saha
Goals of McRT-Malloc
• Scalable
  • Performance linear in the number of processors, then flat as more software threads are added
• Preemption safety
  • Implies a lock-free approach to all structures
  • Allows other scalable McRT algorithms to use malloc and remain scalable
• Transactional memory awareness
  • Avoid memory blowup within a transaction
  • Avoid freeing bits needed to validate other transactions
  • Enable object-level conflict detection in STM
• Best of class
Block Data Structure
• Heap divided into aligned 16K blocks
  • 18 significant bits
• Block owned by a single thread during allocation
• Blocks segregated into bins according to object size
• Metadata header
  • Free lists
  • Bump pointer
  • Next/previous block
  • Object size and usage info
• No per-object headers
• Free blocks on a non-blocking LIFO queue
  • 46 bits for update timestamp
Block layout (figure): 0xABCD0000 metadata header; 0xABCD0040 object pointer; . . . ; 0xABCD4000
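Because blocks are aligned to 16K (2^14 bytes), the block that contains an object can be found by clearing the low 14 bits of the object's address, which is presumably why no per-object header is needed. On a 32-bit address that leaves the upper 18 bits to identify the block, which appears to be what "18 significant bits" refers to. The following is only a minimal sketch of that address arithmetic; the names block_base, block_offset and same_block are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stdint.h>

/* 16K aligned blocks, as stated on the slide. */
#define BLOCK_SIZE ((uintptr_t)16 * 1024)

/* The masking trick only works when the block size is a power of two. */
static_assert((BLOCK_SIZE & (BLOCK_SIZE - 1)) == 0,
              "block size must be a power of two");

/* Address of the start of the block (where the metadata header lives):
 * clear the low 14 bits of the object address. Hypothetical helper. */
static inline uintptr_t block_base(uintptr_t addr) {
    return addr & ~(BLOCK_SIZE - 1);
}

/* Offset of an address inside its block. Hypothetical helper. */
static inline uintptr_t block_offset(uintptr_t addr) {
    return addr & (BLOCK_SIZE - 1);
}

/* Two addresses belong to the same block iff their bases match. */
static inline int same_block(uintptr_t a, uintptr_t b) {
    return block_base(a) == block_base(b);
}
```

With the addresses from the figure, block_base(0xABCD0040) yields 0xABCD0000, the address of the metadata header, and 0xABCD4000 is the start of the next block.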
Object Allocation and Freeing
• A thread owns the block it allocates in
• Trick: free uses two linked free lists per block
  • Private free list for the block owner avoids atomic instructions
  • Public list for other threads uses an atomic instruction and a non-blocking algorithm
• Trick: a fresh block uses a frontier pointer to avoid free list initialization
  • Then allocates from the private free list
  • Privatizes the entire public list as needed with an atomic xchg
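The two-list scheme can be illustrated with C11 atomics. This is a sketch under assumptions, not the McRT-Malloc source: the struct layout and the names block, block_init, block_alloc, private_free_push and public_free_push are hypothetical, and the order of the allocation paths (frontier, then private list, then privatizing the public list) is one reading of the slide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* A freed object is reused as a list node (no per-object header). */
typedef struct free_node { struct free_node *next; } free_node;

/* Hypothetical per-block metadata; field names are illustrative. */
typedef struct block {
    char *frontier;                    /* next never-allocated slot */
    char *limit;                       /* end of the block's storage */
    size_t object_size;                /* all objects in a block share one size */
    free_node *private_free;           /* touched only by the owning thread */
    _Atomic(free_node *) public_free;  /* pushed to by non-owner threads */
} block;

static void block_init(block *b, void *storage, size_t bytes, size_t object_size) {
    assert(object_size >= sizeof(free_node)); /* a freed object must hold a link */
    b->frontier = storage;
    b->limit = (char *)storage + bytes;
    b->object_size = object_size;
    b->private_free = NULL;
    atomic_init(&b->public_free, (free_node *)NULL);
}

/* Free by the owner: plain stores, no atomic instructions. */
static void private_free_push(block *b, void *obj) {
    free_node *n = obj;
    n->next = b->private_free;
    b->private_free = n;
}

/* Free by any other thread: lock-free push with compare-and-swap. */
static void public_free_push(block *b, void *obj) {
    free_node *n = obj;
    free_node *head = atomic_load_explicit(&b->public_free, memory_order_relaxed);
    do {
        n->next = head; /* head is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak_explicit(
                 &b->public_free, &head, n,
                 memory_order_release, memory_order_relaxed));
}

/* Allocation by the owner only. Returns NULL when the block is exhausted. */
static void *block_alloc(block *b) {
    /* 1. Fresh space: bump the frontier, so no free list has to be built. */
    if ((size_t)(b->limit - b->frontier) >= b->object_size) {
        void *obj = b->frontier;
        b->frontier += b->object_size;
        return obj;
    }
    /* 2. If the private list is empty, take the whole public list at once. */
    if (b->private_free == NULL) {
        b->private_free = atomic_exchange_explicit(
            &b->public_free, (free_node *)NULL, memory_order_acquire);
    }
    /* 3. Pop from the private list without atomics. */
    if (b->private_free != NULL) {
        free_node *n = b->private_free;
        b->private_free = n->next;
        return n;
    }
    return NULL;
}
```

Because the owner only ever removes the entire public list with a single exchange, and other threads only push, this sketch never pops a single node from the shared list with a compare-and-swap, which is the usual source of ABA trouble in lock-free stacks.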
McRT-Malloc: A Transaction Aware Memory Allocator
• Three problems
  • Speculative memory allocation and deallocation inside transactions can cause space blowup
  • Transactional conflict detection and frees
  • Object-based conflict detection in C/C++
• Garbage collection also solves these issues
Allocation with STM
• Speculatively allocate or free inside a transaction
  • Valid at commit, rolled back on abort
• Balanced: both malloc and free occur within the same transaction
  • The memory is transaction-local and must be reused to prevent memory blowup

    transaction {
        for (i = 0; i < big_number; i++) {
            foo = malloc(size);
            …
            free(foo);
        }
    }
Solution
• Use sequence numbers to track allocation relationships
• The sequence counter is per-thread (thread-local)
• Every transaction (even a nested one) takes a new (incremented) sequence number upon start
• Every allocation in the transaction is tagged with the transaction's sequence number
• The relationship of a freed object to the freeing transaction is determined by sequence number:
  • seq(object) < seq(transaction) → speculative free
  • seq(object) == seq(transaction) → balanced free
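The sequence-number comparison can be written down directly. The sketch below covers only the two cases the slide lists; all names (thread_state, txn_begin, classify_free, the free_kind values) are hypothetical, and the case seq(object) > seq(transaction) is reported as "not covered" rather than guessed at.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of comparing an object's tag with the current transaction. */
typedef enum {
    FREE_SPECULATIVE,  /* object predates the transaction: defer until commit */
    FREE_BALANCED,     /* malloc and free in the same transaction: reuse now */
    FREE_NOT_COVERED   /* seq(object) > seq(transaction): not on the slide */
} free_kind;

/* Per-thread counter; in a real allocator this would be thread-local. */
typedef struct thread_state {
    unsigned long counter;
} thread_state;

/* Every transaction start, nested or not, takes a fresh sequence number. */
static unsigned long txn_begin(thread_state *t) {
    assert(t != NULL);
    return ++t->counter;
}

/* Classify a free of an object tagged object_seq inside transaction txn_seq. */
static free_kind classify_free(unsigned long object_seq, unsigned long txn_seq) {
    if (object_seq < txn_seq)
        return FREE_SPECULATIVE;
    if (object_seq == txn_seq)
        return FREE_BALANCED;
    return FREE_NOT_COVERED;
}
```

For the loop on the previous slide, every malloc is tagged with the running transaction's number, so each matching free is classified as balanced and the memory can be reused on the next iteration instead of accumulating until commit.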
Monitors != Transactions
• STM uses bits in the object to validate at commit
• Pessimistic: monitors (locks) allow only one thread inside a critical section
• Optimistic: transactions allow multiple threads inside a critical section
• This causes problems when freeing an object
Both threads run the same code. Thread 1 is deleting node 2; Thread 2 is deleting node 3.

    nodeDelete(int key) {
        ptr = head of list;
        transaction {
            while (ptr->next->key != key) {
                ptr = ptr->next;
            } /* end while */
            temp = ptr->next;
            ptr->next = ptr->next->next;
        } /* validate & end transaction */
        free(temp); /* Anyone using? */
    }

• At this point you have a read/read (non-)conflict
• Now we have a read/write conflict: Thread 1 commits and Thread 2 will abort
• The STM version information needed for validation is destroyed along with object 2
• Thread 2 wakes up
• The bits Thread 2 is relying on to detect and resolve the conflict by aborting are now garbage
Solution
• Delay the actual free and reuse until the system is in a consistent state
• A global epoch (timestamp) is maintained and incremented periodically
• Each thread locally remembers the global epoch at the last time it entered or exited a top-level transaction
  • Set as part of TransactionBegin and TransactionAbort/Commit
• Each free is noted, together with the current global epoch, in a thread-local buffer
• When the buffer fills, each thread's epoch is queried
• All frees recorded before the minimum epoch are freed "for real"
• Cost is O(number of frees), not O(number of memory accesses)
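The epoch scheme above can be sketched as follows. This is an illustration under assumptions rather than the McRT implementation: the fixed thread count, the tiny buffer, the frees_done counter and every function name (txn_boundary, advance_epoch, deferred_free, reclaim) are hypothetical, and a real allocator would grow or flush the buffer instead of reporting failure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 2  /* assumed fixed set of registered threads */
#define BUF_CAP 4      /* deliberately tiny so the sketch is easy to trace */

static _Atomic unsigned long global_epoch = 1;
/* Epoch each thread saw at its last top-level transaction begin/commit/abort. */
static _Atomic unsigned long thread_epoch[MAX_THREADS];
static size_t frees_done; /* bookkeeping for the example only */

typedef struct { void *ptr; unsigned long epoch; } deferred;
typedef struct { int id; deferred buf[BUF_CAP]; size_t count; } thread_ctx;

/* Incremented periodically by the runtime. */
static void advance_epoch(void) { atomic_fetch_add(&global_epoch, 1); }

/* Called from TransactionBegin and TransactionAbort/Commit. */
static void txn_boundary(thread_ctx *t) {
    atomic_store(&thread_epoch[t->id], atomic_load(&global_epoch));
}

/* Query every thread's epoch and really free entries older than the minimum. */
static void reclaim(thread_ctx *t) {
    unsigned long min = atomic_load(&thread_epoch[0]);
    for (int i = 1; i < MAX_THREADS; i++) {
        unsigned long e = atomic_load(&thread_epoch[i]);
        if (e < min) min = e;
    }
    size_t kept = 0;
    for (size_t i = 0; i < t->count; i++) {
        if (t->buf[i].epoch < min) {
            free(t->buf[i].ptr);      /* no transaction can still observe it */
            frees_done++;
        } else {
            t->buf[kept++] = t->buf[i]; /* still possibly in use: keep deferred */
        }
    }
    t->count = kept;
}

/* Record a free with the current epoch. Returns 0 if the buffer is still
 * full after reclaiming, in which case the caller must retry later. */
static int deferred_free(thread_ctx *t, void *p) {
    if (t->count == BUF_CAP) reclaim(t);
    if (t->count == BUF_CAP) return 0;
    t->buf[t->count].ptr = p;
    t->buf[t->count].epoch = atomic_load(&global_epoch);
    t->count++;
    return 1;
}
```

The work done in reclaim is proportional to the number of buffered frees and the number of threads, which matches the slide's point that the cost scales with frees rather than with memory accesses. In this sketch a thread that never reaches a transaction boundary holds the minimum back, so nothing recorded at or after its remembered epoch is released.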
McRT-Malloc Beats Hoard
• Machias Benchmark
  • Mimics the consumer-producer pattern with a minimal workload
• (Normalized so that the X axis indicates linear scaling)
Conclusion
• Best-of-class scalable malloc implementation
• Non-blocking, to enable other McRT algorithms to be non-blocking and still use malloc
• Solved memory blowup within a transaction
• Solved the premature freeing problem for STM with optimistic concurrency
• Enabled object-granularity conflict detection in C