In C++, achieving both modularity and speed is often a complex challenge. This presentation explores the inherent tension between the two as modularity prioritizes separation and abstraction while speed demands coalescing and specialization. We delve into strategies to optimize memory allocation, including eager computation, segregation of functionality, and the creation of efficient, modular allocators. By employing advanced techniques like mixins and costless refinements, developers can manage complexity while enhancing performance. Join us to discover practical approaches to crafting high-performance memory allocators.
Chromed Metal Safe and Fast C++ Andrei Alexandrescu andrei@metalanguage.com
Agenda • Modularity and speed: a fundamental tension • Example: memory allocation • Policies • Eager Computation • Segregate functionality • Costless refinements • Based on “Composing High-Performance Memory Allocators” by Berger et al: www.heaplayers.org
Modularity: good • Developing systems from small parts is good • Best known way to manage complexity • Abstraction is good • Modularity and abstraction go hand in hand • Separate development is good • Separate testing is good • Confinement of bugs is good
Speed: good • Getting work done is good (?) • Libraries that don’t exact penalties are good • Lossless growth is good • Compounded inefficiency: abstraction’s worst enemy
Modularity and Speed • Fundamental tension: • Modularity asks for separation, hiding, abstraction, and uniform interfaces • Speed asks for coalescing, transparency, specialization, and non-uniformity • How to resolve the tension?
Two Approaches • Defer compilation/optimization • Develop subsystems separately, have the runtime optimize when it sees them all • Various JIT approaches • Expedite computation/exposure • Develop subsystems separately, have the compiler see them all early • Various macro and compilation systems
Example: Memory Allocation • Memory allocation: • Very hard to modularize/componentize • Highly competitive: • General-purpose allocators: 100 cycles/alloc • Specialized allocators: < 12 cycles/alloc • Templates: • Compute things early • Expose modular code early
Idea #1: mixins/policies • Create uncommitted, "for adoption" derived classes
template <class Base>
struct Heap : public Base {
    void* Alloc(size_t);
    void Dealloc(void*);
};
• Exposes modular code early
Top Class • Can't defer forever, so without further ado…
struct MallocHeap {
    void* Alloc(size_t s) { return malloc(s); }
    void Dealloc(void* p) { free(p); }
};
Idea #2: Eager Computation • Avoid redundant and runtime computation safely!
class TopHeap {
    void* AllocImpl(size_t) { ... }
    void Dealloc(void*) { ... }
    friend void* Alloc(TopHeap& h, size_t s) {
        return h.AllocImpl(
            (s + AlignBytes - 1) & ~(AlignBytes - 1));
    }
    friend void Dealloc(TopHeap& h, void* p) {
        return h.Dealloc(p);
    }
};
Idea #3: Segregate Representation
template <class Base>
class SzHeap : public Base {
    void* Alloc(size_t s) {
        size_t* pS = static_cast<size_t*>(
            Base::AllocImpl(s + sizeof(size_t)));
        return *pS = s, pS + 1;
    }
    void Dealloc(void* p) {
        Base::Dealloc(static_cast<size_t*>(p) - 1);
    }
    size_t SizeOf(void* p) {
        return static_cast<size_t*>(p)[-1];
    }
};
Free Lists • Unbeatable specialized allocation method • Put deallocated blocks in a freelist • Consult the freelist when allocating • Disadvantage: fixed size, no coalescing, no reallocation
Free Lists Layer
template <size_t S, class Base>
class FLHeap : public Base {
    void* Alloc(size_t s) {
        if (s != S || !list_) {
            return Base::AllocImpl(s);
        }
        void* p = list_;
        list_ = list_->next_;
        return p;
    }
    ...
(continued)
    ...
    void Dealloc(void* p) {
        if (this->SizeOf(p) != S) return Base::Dealloc(p);
        List* pL = static_cast<List*>(p);
        pL->next_ = list_;
        list_ = pL;
    }
    ~FLHeap() { ... }
private:
    struct List { List* next_; };
    List* list_;
};
Remarks • There is no source-level coupling between the way the size is maintained and computed, and FLHeap • Combinatorial advantage • There is coupling at the object code level • + Optimization • - Separate linking, dynamic loading…
Building a Layered Allocator typedef FLHeap<64, FLHeap<32, SzHeap<MallocHeap> > > MyHeap; • Modular • Easy to understand • Easy to change • Efficient
Idea #4: Costless Refinements
template <class Heap>
struct CanResize { enum { value = 0 }; };
template <class Heap>
bool Resize(Heap&, void*, size_t&) { return false; }
• Refined implementations will "hide" the default and specialize CanResize
• Can test for resizing capability at compile time or runtime
Range Allocators
template <size_t S1, size_t S2, class Base>
class RHeap : public Base {
    void* Alloc(size_t s) {
        static_assert(S1 < S2);
        if (s >= S1 && s < S2) s = S2;
        return Base::AllocImpl(s);
    }
    ...
};
• Improved speed at the cost of slack memory
• User-controlled tradeoff
Idea #2 again: Eager computation
template <size_t S1, size_t S2, size_t S3, class B>
void* RHeap<S1, S2, RHeap<S2, S3, B> >::Alloc(size_t s) {
    static_assert(S1 < S2 && S2 < S3);
    if (s >= S1 && s < S3) {
        s = s < S2 ? S2 : S3;
    }
    return B::AllocImpl(s);
}
• A partial specialization for adjacent ranges collapses two tests into one
Further Building Blocks • Profiling and debug heaps • MT heaps • Locked • Lock-free • Region-based • Alloc bumps a pointer • Dealloc doesn’t do a thing • Destructor deallocates everything
Performance • 1%-8% speed improvement over gcc’s ObStack • 2%-3% speed loss over the Kingsley allocator • 2% faster – 20% slower than Lea’s allocator • Lea: monolithic general-purpose allocator • Optimized for 7 years • Memory consumption similar within 5%
Conclusions • Modularity and efficiency are at odds • Templates offer black-box source, white-box compilation • A few idioms for efficient, safe code: • Policies • Eager Computation • Segregate functionality • Costless refinements
Bibliography • Emery Berger et al., “Composing High-Performance Memory Allocators”, PLDI 2001 • Yours Truly and Emery Berger, “Policy-Based Memory Allocation”, CUJ Dec 2005