Structure Layout Optimizations in the Open64 Compiler: Design, Implementation and Measurements
Gautam Chakrabarti and Fred Chow, PathScale, LLC
Outline
• Motivation
• Types of structure layout optimizations
• Criteria for structure layout optimizations
• Implementation details
• Performance results
• Future work
• Conclusion
Open64 Workshop 2008
Motivation
• Poor data locality in many applications
• High data cache miss rates
• Growing gap between processor and memory speeds
Our aim
• Make applications more cache-friendly
Our approach
• Change the layout of data structures
• Requires whole-program optimization
• Use Inter-Procedural Analysis and Optimizations (IPA)
IPA
• Summarization
• Analysis
• Optimization
Types of Structure Layout Optimizations
• Structure splitting, shown on a self-referential struct:
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
        struct struct_A * next;
    };
• Structure peeling, shown on a struct without a self-reference:
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
    };
Structure Splitting Example
Original:
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
        struct struct_A * next;
    };
After splitting, the new struct keeps the hot fields and points to a sub-struct holding the cold fields:
    struct new_struct_A {
        double d1;
        int i;
        long long l;
        struct new_struct_A * next;
        struct cold_sub_struct_A * p;
    };
    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };
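To make the splitting example concrete, here is a hand-written C sketch of what the transformed code could look like. It assumes, as the struct names suggest, that d1, i, l and next are hot and d2, f and c are cold. The helpers build_split_list, sum_d1, sum_d2 and free_split_list are hypothetical illustrations, not Open64 code.

```c
#include <assert.h>
#include <stdlib.h>

/* Original layout from the slide. */
struct struct_A {
    double d1; double d2; int i; float f; long long l; char c;
    struct struct_A *next;
};

/* Cold part: fields assumed to be rarely accessed. */
struct cold_sub_struct_A {
    double d2; float f; char c;
};

/* Hot part: fields assumed to be accessed in hot loops, plus a link to the cold part. */
struct new_struct_A {
    double d1; int i; long long l;
    struct new_struct_A *next;
    struct cold_sub_struct_A *p;
};

/* Hypothetical helper: build an n-node list in the split layout. */
static struct new_struct_A *build_split_list(int n) {
    struct new_struct_A *head = NULL;
    for (int k = 0; k < n; k++) {
        struct new_struct_A *node = malloc(sizeof *node);
        struct cold_sub_struct_A *cold = malloc(sizeof *cold);
        if (node == NULL || cold == NULL) exit(1);
        node->d1 = k; node->i = k; node->l = k;
        cold->d2 = 2.0 * k; cold->f = 0.5f * k; cold->c = 'x';
        node->p = cold;          /* every hot node owns one cold part */
        node->next = head;
        head = node;
    }
    return head;
}

/* Hot traversal: touches only the smaller hot part of each node. */
static double sum_d1(const struct new_struct_A *n) {
    double s = 0.0;
    for (; n != NULL; n = n->next) s += n->d1;
    return s;
}

/* Cold access pays one extra indirection through p. */
static double sum_d2(const struct new_struct_A *n) {
    double s = 0.0;
    for (; n != NULL; n = n->next) {
        assert(n->p != NULL);
        s += n->p->d2;
    }
    return s;
}

static void free_split_list(struct new_struct_A *n) {
    while (n != NULL) {
        struct new_struct_A *next = n->next;
        free(n->p);
        free(n);
        n = next;
    }
}
```

The trade-off is visible in the two traversals: the hot loop in sum_d1 reads less memory per node, while the rare cold access in sum_d2 pays for an extra pointer dereference.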
Structure Peeling Example
Original:
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
    };
After peeling, the struct becomes two separate structs with no pointer linking them:
    struct new_struct_A {
        double d1;
        int i;
        long long l;
    };
    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };
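A minimal hand-written C sketch of peeling applied to an array of struct struct_A, assuming d1, i and l are the hot fields. Element k of the original array corresponds to hot[k] and cold[k], so no link pointer is needed. The functions peel, sum_d1_orig and sum_d1_peeled are hypothetical illustrations, not compiler code.

```c
#include <assert.h>
#include <stddef.h>

struct struct_A { double d1; double d2; int i; float f; long long l; char c; };

/* Peeled parts: the same index selects both halves of an element. */
struct new_struct_A { double d1; int i; long long l; };
struct cold_sub_struct_A { double d2; float f; char c; };

/* Original: a loop over d1 also drags d2, f and c through the cache. */
static double sum_d1_orig(const struct struct_A *a, size_t n) {
    double s = 0.0;
    for (size_t k = 0; k < n; k++) s += a[k].d1;
    return s;
}

/* Peeled: the same loop streams through the smaller hot array only. */
static double sum_d1_peeled(const struct new_struct_A *hot, size_t n) {
    double s = 0.0;
    for (size_t k = 0; k < n; k++) s += hot[k].d1;
    return s;
}

/* Copy each original element into its hot and cold halves at index k. */
static void peel(const struct struct_A *a, size_t n,
                 struct new_struct_A *hot, struct cold_sub_struct_A *cold) {
    assert(n == 0 || (a != NULL && hot != NULL && cold != NULL));
    for (size_t k = 0; k < n; k++) {
        hot[k]  = (struct new_struct_A){ a[k].d1, a[k].i, a[k].l };
        cold[k] = (struct cold_sub_struct_A){ a[k].d2, a[k].f, a[k].c };
    }
}
```

On x86-64, struct new_struct_A occupies 24 bytes against 40 for struct struct_A, so a loop that reads only d1 touches fewer cache lines.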
Criteria for structure layout optimizations
• Legality analysis
    • Type casts
    • Address of a field is taken
    • Escaped types
    • Parameter types
    • Full visibility to IPA
    • Alignment restrictions
• Profitability analysis
    • Hotness
    • Affinity
    • Field accesses at loop level
    • Size
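The criteria above can be sketched as a simple filter over a per-type summary. The flag names, the struct type_summary record and the threshold parameters are hypothetical. A type with any legality violation is rejected outright, which is presumably the kind of fact the TY_NO_SPLIT flag records; we also assume that "Size" means the struct's size in bytes. In this sketch, affinity and loop-level field accesses are left to the layout step rather than used for candidacy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical legality flags, one per criterion on the slide. */
enum {
    NS_TYPE_CAST   = 1u << 0,  /* a pointer to the struct is cast to/from another type */
    NS_FIELD_ADDR  = 1u << 1,  /* the address of a field is taken */
    NS_ESCAPED     = 1u << 2,  /* the type escapes, e.g. to code outside the program */
    NS_PARAM_TYPE  = 1u << 3,  /* assumed meaning: the struct appears in a parameter type */
    NS_NOT_VISIBLE = 1u << 4,  /* some code using the type is not visible to IPA */
    NS_ALIGNMENT   = 1u << 5   /* an explicit alignment restriction must be preserved */
};

/* Hypothetical per-type summary gathered before the decision. */
struct type_summary {
    unsigned no_split_reasons; /* OR of the flags above */
    double   hotness;          /* frequency-weighted field accesses */
    size_t   size_bytes;       /* sizeof the struct */
};

/* Legality: any single violation disqualifies the type. */
static bool is_legal(const struct type_summary *t) {
    return t->no_split_reasons == 0;
}

/* Profitability: hypothetical thresholds on hotness and size. */
static bool is_profitable(const struct type_summary *t,
                          double min_hotness, size_t min_size) {
    return t->hotness >= min_hotness && t->size_bytes >= min_size;
}

static bool is_candidate(const struct type_summary *t,
                         double min_hotness, size_t min_size) {
    assert(t != NULL);
    return is_legal(t) && is_profitable(t, min_hotness, min_size);
}
```

Checking legality first is the natural order: it is cheap, and no amount of hotness can justify an unsafe rewrite.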
Implementation Details
Step 1: Type information summarization (IPL)
Step 2: Symbol table merging (IPA)
Step 3: Legality and profitability analysis (IPA analysis)
Step 4: Transforming the program (IPA optimization)
Implementation Details: Type information summarization
• Information summarization in IPL
    • Framework for computing static profiles using heuristics
    • New TY flag TY_NO_SPLIT
    • SUMMARY_TY_INFO
    • SUMMARY_LOOP
        • One entry for each DO_LOOP, WHILE_DO and DO_WHILE loop
        • Bit-vector tracking the field accesses of up to N structures in each loop
            • Only field accesses immediately inside the loop are considered
            • These fields are considered affine to each other
        • Execution count of the statements immediately inside the loop
            • From statically estimated profiles or from runtime feedback
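A sketch of the per-loop bit-vector, assuming a hypothetical struct loop_summary as a stand-in for SUMMARY_LOOP, N = 4 tracked structures per loop, and at most 64 fields per structure. Each field access found directly in a loop body sets a bit in that structure's mask, and two fields count as affine when both bits are set in the same loop.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_STRUCTS 4  /* hypothetical value of N */

/* Hypothetical stand-in for one SUMMARY_LOOP entry. */
struct loop_summary {
    int      struct_id[MAX_STRUCTS];   /* which struct types are tracked */
    uint64_t field_mask[MAX_STRUCTS];  /* bit f set => field f accessed in this loop */
    int      num_structs;
    double   exec_count;               /* static estimate or runtime feedback */
};

/* Record an access to field `field` of struct `sid` found directly in the loop
   body. Returns 0 when the N-struct limit is reached and the access is dropped. */
static int record_field_access(struct loop_summary *ls, int sid, unsigned field) {
    assert(field < 64);
    for (int k = 0; k < ls->num_structs; k++) {
        if (ls->struct_id[k] == sid) {
            ls->field_mask[k] |= UINT64_C(1) << field;
            return 1;
        }
    }
    if (ls->num_structs == MAX_STRUCTS) return 0;
    int k = ls->num_structs++;
    ls->struct_id[k] = sid;
    ls->field_mask[k] = UINT64_C(1) << field;
    return 1;
}

/* Fields a and b of struct sid are affine if both were accessed in this loop. */
static int fields_affine(const struct loop_summary *ls, int sid,
                         unsigned a, unsigned b) {
    assert(a < 64 && b < 64);
    for (int k = 0; k < ls->num_structs; k++) {
        if (ls->struct_id[k] == sid) {
            uint64_t m = (UINT64_C(1) << a) | (UINT64_C(1) << b);
            return (ls->field_mask[k] & m) == m;
        }
    }
    return 0;
}
```

A fixed-size bit-vector keeps each summary small and makes the affinity test a single mask comparison, at the cost of ignoring structures beyond the first N seen in a loop.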
Implementation Details: IPA Analysis
• Inter-procedurally update the statically estimated execution counts of PUs (program units)
• Update the statically estimated loop frequencies in SUMMARY_LOOP
• Consider the SUMMARY_LOOP entries from the hottest P PUs
• Determine candidates for structure layout transformation
• Determine the new layout of the structures
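One way to carry out the first three bullets, sketched under stated assumptions: the call graph is acyclic, its PUs are numbered in topological order with the entry PU at index 0, and each call edge carries a hypothetical static estimate of calls per caller invocation. A PU's count is the sum of its callers' counts scaled by those estimates, and a loop's global frequency is its per-invocation estimate times its PU's count. The names call_edge, propagate_pu_counts, global_loop_freq and hottest_pus are illustrative only; a real implementation must also handle recursion.

```c
#include <assert.h>

/* Hypothetical call-graph edge: caller invokes callee `freq` times per caller call. */
struct call_edge { int caller, callee; double freq; };

/* Propagate execution counts from the entry PU (index 0). Requires that every
   edge goes from a lower to a higher PU index (topological numbering), so a
   PU's count is final before it is pushed to its callees. */
static void propagate_pu_counts(const struct call_edge *e, int ne,
                                double *count, int npu) {
    for (int p = 0; p < npu; p++) count[p] = 0.0;
    count[0] = 1.0;
    for (int p = 0; p < npu; p++) {
        for (int k = 0; k < ne; k++) {
            assert(e[k].caller < e[k].callee);
            if (e[k].caller == p)
                count[e[k].callee] += count[p] * e[k].freq;
        }
    }
}

/* Global loop frequency = per-invocation loop estimate x PU execution count. */
static double global_loop_freq(double local_freq, const double *count, int pu) {
    return local_freq * count[pu];
}

/* Write the indices of the P hottest PUs into out[], hottest first. */
static void hottest_pus(const double *count, int npu, int P, int *out) {
    char taken[64] = {0};
    assert(npu <= 64 && P <= npu);
    for (int r = 0; r < P; r++) {
        int best = -1;
        for (int p = 0; p < npu; p++)
            if (!taken[p] && (best < 0 || count[p] > count[best])) best = p;
        taken[best] = 1;
        out[r] = best;
    }
}
```

For example, with hypothetical edges 0→1 (2 calls), 0→2 (1), 1→3 (10) and 2→3 (1), PU 3 gets a count of 21 and becomes the hottest PU, even though no single call site makes it obviously hot.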
Implementation Details: IPA Analysis Example
(Figure legend: Li = loops, Fj = fields in a struct, AGk = affinity groups)
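The example relates loops Li, fields Fj and affinity groups AGk. The greedy grouping below is one plausible way to derive affinity groups from loop summaries, not necessarily the algorithm used in Open64: loops are visited from hottest to coldest, each loop's not-yet-assigned fields form a new group, and fields accessed in no considered loop fall into a final cold group. struct loop_rec and affinity_groups are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical loop record: fields of one struct accessed in the loop, and its frequency. */
struct loop_rec { uint32_t fields; double freq; };

/* Assign each field f (0 <= f < nfields) an affinity-group index in group[f].
   Sorts loops[] in place by descending frequency. Returns the number of groups. */
static int affinity_groups(struct loop_rec *loops, int nl, int nfields, int *group) {
    assert(nfields > 0 && nfields < 32);
    for (int i = 1; i < nl; i++) {             /* insertion sort, hottest first */
        struct loop_rec t = loops[i];
        int j = i - 1;
        while (j >= 0 && loops[j].freq < t.freq) { loops[j + 1] = loops[j]; j--; }
        loops[j + 1] = t;
    }
    for (int f = 0; f < nfields; f++) group[f] = -1;
    uint32_t assigned = 0;
    int ng = 0;
    for (int i = 0; i < nl; i++) {
        assert((loops[i].fields >> nfields) == 0);   /* only valid field bits */
        uint32_t fresh = loops[i].fields & ~assigned;
        if (fresh == 0) continue;                    /* all fields already grouped */
        for (int f = 0; f < nfields; f++)
            if (fresh & (UINT32_C(1) << f)) group[f] = ng;
        assigned |= fresh;
        ng++;
    }
    int cold = -1;                                   /* fields no loop touched */
    for (int f = 0; f < nfields; f++) {
        if (group[f] < 0) {
            if (cold < 0) cold = ng++;
            group[f] = cold;
        }
    }
    return ng;
}
```

With six fields and hypothetical loops accessing {F0, F2} at frequency 1000, {F3} at 500 and {F0, F1} at 10, this yields AG0 = {F0, F2}, AG1 = {F3}, AG2 = {F1} and a cold group AG3 = {F4, F5}. Visiting the hottest loop first means its fields are never split apart by a colder loop.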
Implementation Details: Transforming the program
• New type definitions
• Field table update
• Field access statements
• New symbols
• Assignment statements
Example:
    struct S {
        // N fields
        struct T * p;
        // M fields
    };
    struct T {
        // AG1 fields
        // AG2 fields
    };
    // peel T
    struct S {
        // N fields
        struct T1 * p1;
        struct T2 * p2;
        // M fields
    };
    struct T1 {
        // AG1 fields
    };
    struct T2 {
        // AG2 fields
    };
Implementation Details: Transforming the program (continued)
• Function calls to memory management routines
    Example:
        p = (T *) malloc (N * sizeof (T));
        if (p == NULL) exit (1);
• Detect memory management routine calls involving the transformed type T
• Replicate the call and assignment statements
• Update the size of the memory being allocated
• Handle comparisons involving pointer p
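A sketch of the allocation rewrite for a peeled T, with hypothetical parts T1 and T2 and a hypothetical helper alloc_peeled. The malloc call and its assignment are replicated once per part, and each size is recomputed from that part's type. The slide does not say how the comparison p == NULL is rewritten; here we assume it becomes a test on the replicated pointers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical peeled parts of T. */
struct T1 { int a; int b; };
struct T2 { double x; };

/* Original:  p = (T *) malloc (N * sizeof (T)); if (p == NULL) exit (1);
   Rewritten: one malloc per peeled type, each sized for that type. */
static int alloc_peeled(size_t n, struct T1 **p1, struct T2 **p2) {
    assert(p1 != NULL && p2 != NULL);
    *p1 = malloc(n * sizeof(struct T1));
    *p2 = malloc(n * sizeof(struct T2));
    /* Assumption: "p == NULL" becomes "either replicated pointer is NULL". */
    if (*p1 == NULL || *p2 == NULL) {
        free(*p1);
        free(*p2);
        *p1 = NULL;
        *p2 = NULL;
        return -1;
    }
    return 0;
}
```

A caller would keep the original error handling, e.g. `if (alloc_peeled(N, &p1, &p2) != 0) exit (1);`. Matching calls to free would presumably need the same replication.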
Performance Results
Compilation options: -Ofast with the 32-bit ABI
Chart: Speedup due to structure layout optimizations
Performance Results (continued)
Compilation options: -Ofast with the 64-bit ABI
Chart: Speedup due to structure layout optimizations
Performance Results (continued)
Compilation options: -Ofast with the 64-bit ABI
Multiple copies of 462.libquantum running on a multi-core chip
Platform: quad-core AMD Barcelona (2.0 GHz, 8 GB memory, 512 KB L2 cache per core, 2 MB L3 cache); the 3rd-level cache is shared among the 4 cores
Chart: Speedup from structure layout optimizations
Future Work
• Tune static profile estimation
• Fewer restrictions
• Integrate with field reordering
Conclusion
• A framework for performing structure layout transformations is now available in the Open64 compiler.
• The superior infrastructure in the Open64 compiler helped us implement the optimizations cleanly and with relatively little effort.
• Substantial speedups are possible on some of the SPEC CPU2000 and CPU2006 benchmarks.
• Structure layout optimization is a required feature for a compiler to remain competitive.