This project applies data access profiling and structure field regrouping to improve cache performance in the Pegasus compiler environment. It measures how contemporaneously structure fields are accessed and uses that information to recommend field reorderings. The method tags structure field accesses in the Pegasus IR, builds field affinity graphs during simulation, and applies a matching-based regrouping heuristic. On a small test program, the recommended reordering removed one cache miss per call (745 vs. 750 cycles).
Data Access Profiling & Improved Structure Field Regrouping in Pegasus
Vas Chellappa & Matt Moore
May 2, 2005 / Optimizing Compilers / Project Poster Session
Introduction
• Structure definitions group fields by semantics, not by access contemporaneity
• Data access profiling can be used to improve cache performance by reordering fields for contemporaneity
In this context, contemporaneity is a measure of how close in time two accesses to structure fields occur.
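To make the definition concrete, the sketch below scores contemporaneity for one pair of fields by counting how often their accesses fall within a fixed number of consecutive accesses of each other in a trace. The window-based count, the function name `contemporaneity`, and the trace format (an array of field indices) are assumptions for illustration; the poster does not specify the exact measure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical contemporaneity measure (an assumption, not the poster's
 * definition): the number of access pairs (i, j), i < j, where one access
 * is to field a, the other is to field b, and j - i <= window.
 * `trace` holds the field index of each access in program order. */
unsigned contemporaneity(const int *trace, size_t len, int a, int b,
                         size_t window)
{
    unsigned count = 0;

    assert(a != b);
    for (size_t i = 0; i < len; i++) {
        if (trace[i] != a && trace[i] != b)
            continue;
        /* Look ahead at most `window` accesses for the other field. */
        for (size_t j = i + 1; j < len && j - i <= window; j++) {
            if ((trace[i] == a && trace[j] == b) ||
                (trace[i] == b && trace[j] == a))
                count++;
        }
    }
    return count;
}
```

For the loop in foo() from the Results example, where f1 (field index 0) and f4 (field index 4) are accessed alternately, the trace {0, 4, 0, 4, 0, 4} with a window of 1 yields a count of 5.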
Problem Statement
• Obtaining contemporaneity information for structure fields
• Using this information to improve the ordering of the fields
• Doing both within the CASH/Pegasus environment
Approach
• Pegasus implementation
  • Data access profiling tracks contemporaneous field accesses to build field affinity graphs
  • The simulator interface to SimpleScalar (a third-party simulator, used here for cache simulation) is modified to achieve this
• Regrouping algorithm
  • The field affinity graphs built by the modified simulator are used by a new regrouping algorithm to recommend reorderings
Design Overview
• Build stage: tag structure field accesses in the Pegasus IR
• Simulation stage: propagate tag information through SimpleScalar to the new regroup library
• Final stage: invoke the regrouping algorithm to calculate reordering recommendations
Build Stage: Tagging Accesses
• Objective: identify and tag structure field accesses in the Pegasus IR
• Not trivial, since SUIF/c2dil do not preserve the required type information during translation to the IR
• Patterns that indicate structure field accesses must therefore be identified
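One possible tag representation is sketched below: each tagged load or store carries a 32-bit value naming the structure and field it touches, with 0 reserved for untagged accesses. The 16/16 bit split, `make_tag`, the accessor functions, and `struct mem_access` are hypothetical; the poster does not describe how Pegasus encodes its tags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag format (an assumption for illustration): 0 means
 * "not a structure field access"; otherwise the high 16 bits hold a
 * 1-based structure id and the low 16 bits hold the field's
 * declaration index within that structure. */
typedef uint32_t access_tag;

#define TAG_NONE ((access_tag)0)

static access_tag make_tag(uint16_t struct_id, uint16_t field_index)
{
    assert(struct_id != 0);            /* 0 is reserved for TAG_NONE */
    return ((access_tag)struct_id << 16) | field_index;
}

static uint16_t tag_struct(access_tag t) { return (uint16_t)(t >> 16); }
static uint16_t tag_field(access_tag t)  { return (uint16_t)(t & 0xFFFFu); }

/* A load or store as the simulator might see it: address, kind, tag. */
struct mem_access {
    uintptr_t  addr;
    int        is_store;
    access_tag tag;
};
```

Keeping the tag a plain integer means it can ride along with each memory operation through the simulator without the simulator needing to understand C types.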
Actual Pegasus Illustration
int foo(struct my_t stestfoo) {
    int retval = stestfoo.f2;
    return(retval);
}
Which wire here should have struct type?

int foo(struct my_t* stestfoo) {
    return(stestfoo->f2);
}
Which wire here has struct type?
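In the pointer version, `stestfoo->f2` reaches the IR as a load from the pointer plus a constant byte offset, so recognizing it as a field access amounts to mapping that offset back to a field. The sketch below does this with a table built from `offsetof`; it assumes the `struct my_t` layout from the Results example (the definition used on this slide is not shown), and the table and `field_at_offset` helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Layout assumed from the Results example. */
struct my_t { int f1; int f2; char nu[4096]; int f3; int f4; };

struct field_info { const char *name; size_t offset; size_t size; };

/* Build each entry from the compiler's actual layout. */
#define FIELD(f) { #f, offsetof(struct my_t, f), sizeof(((struct my_t *)0)->f) }

static const struct field_info my_t_fields[] = {
    FIELD(f1), FIELD(f2), FIELD(nu), FIELD(f3), FIELD(f4),
};

/* Given the constant offset added to a struct my_t pointer before a load
 * or store, return the index of the field containing that byte, or -1 if
 * the offset lies in padding or beyond the last field. */
static int field_at_offset(size_t off)
{
    size_t n = sizeof my_t_fields / sizeof my_t_fields[0];

    for (size_t i = 0; i < n; i++)
        if (off >= my_t_fields[i].offset &&
            off < my_t_fields[i].offset + my_t_fields[i].size)
            return (int)i;
    return -1;
}
```

This only works when the pointer's struct type is known, which is exactly the information the slide notes is lost on the way into the IR.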
Simulation Process
• Tag information on loads and stores is propagated through SimpleScalar to the regrouping library, which builds the field affinity graph online, during simulation
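A minimal sketch of building the affinity graph online, assuming a single structure and a sliding window of the last few field accesses: each time a tagged access arrives, every field in the window gains one unit of edge weight with the accessed field. `WINDOW`, `MAX_FIELDS`, and the function names are illustrative assumptions, not the regroup library's actual interface.

```c
#include <assert.h>
#include <string.h>

#define MAX_FIELDS 16   /* hypothetical per-structure limit */
#define WINDOW      4   /* hypothetical contemporaneity window */

/* Field affinity graph for one structure: a symmetric adjacency matrix of
 * edge weights plus a ring buffer of the most recent field accesses. */
struct affinity_graph {
    unsigned weight[MAX_FIELDS][MAX_FIELDS];
    int recent[WINDOW];   /* last WINDOW field indices */
    int count;            /* valid entries in `recent` */
    int next;             /* slot to overwrite next */
};

static void graph_init(struct affinity_graph *g)
{
    memset(g, 0, sizeof *g);
}

/* Called once per tagged load/store during simulation. */
static void graph_record(struct affinity_graph *g, int field)
{
    assert(field >= 0 && field < MAX_FIELDS);
    for (int i = 0; i < g->count; i++) {
        int other = g->recent[i];
        if (other != field) {
            g->weight[field][other]++;
            g->weight[other][field]++;
        }
    }
    g->recent[g->next] = field;
    g->next = (g->next + 1) % WINDOW;
    if (g->count < WINDOW)
        g->count++;
}
```

Because only a fixed-size window is kept, memory use does not grow with the length of the simulated run. With a window of 4, the alternating f1/f4 trace {0, 4, 0, 4, 0, 4} gives an edge weight of 8 between the two fields.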
Regrouping Stage
• After simulation, analyze the collected profiling data to produce a reordering recommendation
• This can be done better than with the greedy approach of previous work
• Optimal regrouping is NP-hard, so a heuristic is needed
• Field Affinity Graph (one per structure):
  • Vertices: fields of the structure
  • Edge weights: degree of contemporaneity of accesses between the two fields
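Comparing candidate layouts requires an objective. The sketch below uses one plausible choice: the total affinity weight between fields whose first bytes fall in the same cache line. It assumes the structure starts on a cache-line boundary and that fields are packed in the given order without padding; `layout_score` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout objective. weight is an n*n symmetric matrix,
 * size[f] is field f's size in bytes, and order is a permutation of
 * 0..n-1 giving the candidate field order. */
static unsigned long layout_score(size_t n, const unsigned *weight,
                                  const size_t *size, const int *order,
                                  size_t line_size)
{
    size_t line_of[64];          /* cache line index of each field */
    size_t offset = 0;
    unsigned long score = 0;

    assert(n <= 64 && line_size > 0);
    for (size_t k = 0; k < n; k++) {
        line_of[order[k]] = offset / line_size;
        offset += size[order[k]];
    }
    /* Sum affinity over each unordered pair sharing a line. */
    for (size_t a = 0; a < n; a++)
        for (size_t b = a + 1; b < n; b++)
            if (line_of[a] == line_of[b])
                score += weight[a * n + b];
    return score;
}
```

Maximizing this score over all orders is the kind of search the NP-hardness result rules out doing exactly for large structures, which motivates a heuristic.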
Matching Heuristic
• Find a maximum weight matching in the field affinity graph
• Pairs of fields that cannot fit into one cache line together are identified and ignored
• Reorder the structure by placing matched fields next to each other
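For small structures, the heuristic can be sketched with an exact maximum weight matching computed by dynamic programming over subsets of fields. This exponential method is a stand-in for a polynomial-time weighted matching code such as the Mathprog package in the references; the size test for pairs that cannot share a line, and the rule of otherwise keeping declaration order, are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MAXF 12   /* subset DP needs 2^MAXF table entries */

/* w is an n*n symmetric weight matrix; mate[i] receives i's partner or -1.
 * A pair whose combined size exceeds line_size is never matched. */
static unsigned long max_matching(int n, const unsigned *w,
                                  const size_t *size, size_t line_size,
                                  int *mate)
{
    static unsigned long best[1u << MAXF]; /* best weight for subset s */
    static int pick[1u << MAXF];           /* partner of lowest field, or -1 */
    unsigned full;

    assert(n > 0 && n <= MAXF);
    full = (1u << n) - 1;
    best[0] = 0;
    for (unsigned s = 1; s <= full; s++) {
        int i = 0;
        while (!(s & (1u << i)))           /* lowest field still in s */
            i++;
        unsigned rest = s & ~(1u << i);
        best[s] = best[rest];              /* option 1: leave i unmatched */
        pick[s] = -1;
        for (int j = i + 1; j < n; j++) {  /* option 2: match i with j */
            if (!(rest & (1u << j)) || size[i] + size[j] > line_size)
                continue;
            unsigned long cand = w[i * n + j] + best[rest & ~(1u << j)];
            if (cand > best[s]) {
                best[s] = cand;
                pick[s] = j;
            }
        }
    }
    for (int k = 0; k < n; k++)
        mate[k] = -1;
    for (unsigned s = full; s != 0; ) {    /* replay the recorded choices */
        int i = 0;
        while (!(s & (1u << i)))
            i++;
        int j = pick[s];
        s &= ~(1u << i);
        if (j >= 0) {
            mate[i] = j;
            mate[j] = i;
            s &= ~(1u << j);
        }
    }
    return best[full];
}

/* Keep declaration order, but place each matched partner immediately
 * after the first field of its pair. */
static void matched_order(int n, const int *mate, int *order)
{
    int placed[MAXF] = {0};
    int k = 0;

    for (int i = 0; i < n; i++) {
        if (placed[i])
            continue;
        order[k++] = i;
        placed[i] = 1;
        if (mate[i] >= 0 && !placed[mate[i]]) {
            order[k++] = mate[i];
            placed[mate[i]] = 1;
        }
    }
}
```

With only an f1/f4 edge, as foo() in the Results example would produce, and a 64-byte line, the order comes out as f1, f4, f2, nu, f3, which matches the modified layout shown there. A 4096-byte field such as nu can never be matched, however heavy its edges.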
NP-Hardness
• NP-hardness is shown by a reduction from graph coloring to the regrouping problem
Results
• The implementation handles structure field accesses made through pointers (ptr->fld)
• So far, only small programs have been tested
• Reordering is applied manually, and the reordered program is simulated again to obtain cycle counts for comparison
Results - Example
Original (750 cycles per call):
struct my_t {
    int f1;
    int f2;
    char nu[4096];
    int f3;
    int f4;
};

Modified (745 cycles per call, one fewer cache miss):
struct my_t {
    int f1;
    int f4;
    int f2;
    char nu[4096];
    int f3;
};

Function foo (identical in both versions):
int foo(struct my_t *elt) {
    int i;
    elt->f1 = 2;
    elt->f4 = 100;
    for (i = 0; i < 50; i++) {
        elt->f1++;
        elt->f4--;
    }
    return elt->f1 + elt->f4;
}
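The effect of the manual reordering can be checked with `offsetof`: the sketch below asks whether f1 and f4 start in the same cache line in each version of the structure. It assumes a 64-byte line and a line-aligned structure (the poster gives neither), and renames the two versions `my_t_orig` and `my_t_mod` so both can be compiled together.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_SIZE 64   /* assumed cache line size; not stated on the poster */

struct my_t_orig { int f1; int f2; char nu[4096]; int f3; int f4; };
struct my_t_mod  { int f1; int f4; int f2; char nu[4096]; int f3; };

/* Nonzero if two byte offsets fall in the same cache line, assuming the
 * structure itself begins on a line boundary. */
static int same_line(size_t off_a, size_t off_b)
{
    return off_a / LINE_SIZE == off_b / LINE_SIZE;
}
```

Under these assumptions, the 4096-byte array separates f1 and f4 in the original layout, so foo() touches two lines, while in the modified layout both fields share one line; this is consistent with the one-miss difference reported in the example.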
Conclusion
• The reordering recommendations yield performance improvements even on simple programs
• Optimizing non-pointer accesses would require SUIF/c2dil to propagate full type information from the source
• Languages that expose memory layout less directly would allow the reordering recommendations to be implemented easily and quickly
References
• Trishul M. Chilimbi, Bob Davidson, and James R. Larus, “Cache-Conscious Structure Definition,” in Proceedings of the ACM SIGPLAN '99 Conference on Programming Language Design and Implementation, pages 13-24, May 1999.
• Mathprog (weighted matching algorithm): http://elib.zib.de/pub/Packages/mathprog/matching/weighted/
• Pegasus: http://www-2.cs.cmu.edu/~phoenix/
• SUIF: http://suif.stanford.edu/
• SimpleScalar Tool Set: http://www.cs.wisc.edu/~mscalar/simplescalar.html