This study explores how copying garbage collectors can improve memory locality in Java programs. It compares root traversal policies and both class-oblivious and class-based object traversal orders. Experiments show that reordering objects during garbage collection can improve locality, and that partial depth-first traversal is the best class-oblivious order. Class-based traversal driven by online profiling is more flexible, up to 50% better, with very low (~0%) overhead.
GC Advantage: Improving Program Locality Xianglong Huang, Zhenlin Wang, Stephen M Blackburn, Kathryn S McKinley, J Eliot B Moss, Perry Cheng
Motivation • Memory gap • How are Java programs affected?
MarkSweep vs. Copying, pseudojbb
Motivation • javac with perfect L1 and L2 caches • 16 KB L1, 256 KB L2 • Appel generational collector, GCTk • Breadth-first copying
Motivation • A copying collector can reorder objects • Goal: exploit the copying collector's ability to reorder objects to improve locality
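A copying collector moves each live object to a new address. The order in which it reaches objects therefore becomes their order in memory. Below is a minimal sketch of that effect. It assumes a toy heap in which an object's index in a list stands in for its new to-space address. `CheneyCopySketch`, `Obj`, `copy` and `forward` are hypothetical names, not Jikes RVM or JMTk code.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a Cheney-style copying collector (hypothetical, not JMTk code).
final class CheneyCopySketch {
    /** A heap object: a name plus outgoing references, in field order. */
    static final class Obj {
        final String name;
        final List<Obj> fields = new ArrayList<>();
        Obj(String name) { this.name = name; }
        Obj points(Obj... targets) { fields.addAll(List.of(targets)); return this; }
    }

    /**
     * "Copies" everything reachable from the roots. An object's index in the
     * returned list stands for its new to-space address, so neighbours in the
     * list are neighbours in memory after the collection.
     */
    static List<Obj> copy(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        Map<Obj, Integer> forwarding = new IdentityHashMap<>(); // old object -> new address
        for (Obj root : roots) forward(root, toSpace, forwarding);
        // Cheney scan: objects at index >= scan are copied but not yet scanned.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            for (Obj child : toSpace.get(scan).fields) forward(child, toSpace, forwarding);
        }
        return toSpace;
    }

    /** Assigns obj the next free address on first visit; later visits reuse it. */
    static int forward(Obj obj, List<Obj> toSpace, Map<Obj, Integer> forwarding) {
        Integer existing = forwarding.get(obj);
        if (existing != null) return existing;
        int addr = toSpace.size();
        toSpace.add(obj);
        forwarding.put(obj, addr);
        return addr;
    }

    /** Root A -> {B, C}; B -> D; C -> D (D is shared and copied once). */
    static List<String> demoOrder() {
        Obj d = new Obj("D");
        Obj b = new Obj("B").points(d);
        Obj c = new Obj("C").points(d);
        Obj a = new Obj("A").points(b, c);
        List<String> names = new ArrayList<>();
        for (Obj o : copy(List.of(a))) names.add(o.name);
        return names;
    }

    public static void main(String[] args) {
        System.out.println(demoOrder()); // [A, B, C, D]
    }
}
```

The scan pointer walks the copied objects in FIFO order, so this baseline places objects breadth first. The shared object D is copied only once, through its forwarding entry. Changing how the scan picks the next object is what changes the resulting layout.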
Exploring The Space • Different policies for traversing roots • Class-oblivious traversal orders • Which traversal order is best? • Class-based traversal orders • How do we find the “important” data structures?
Different Root Traversal Policies • Two types of roots: • Stack and global variables • Remembered sets (for generational collection) • Different traversal orders • Copy all roots before traversing any children • Copy each root and its children (root-by-root) • Split roots • Stack roots first, then children • Remset roots first, then children
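The difference between copying all roots first and copying root by root can be shown on a tiny heap. Below is a sketch that assumes breadth-first (Cheney) scanning within each policy and models the heap as a map from object name to referenced names. The class, method names and heap are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy heap: object name -> names of the objects it references, in field order.
final class RootPolicySketch {
    /** Copy every root first, then scan breadth first (Cheney order). */
    static List<String> allRootsFirst(Map<String, List<String>> heap, List<String> roots) {
        List<String> order = new ArrayList<>();
        Set<String> copied = new HashSet<>();
        Deque<String> gray = new ArrayDeque<>(); // copied but not yet scanned
        for (String root : roots) copy(root, order, copied, gray);
        drain(heap, order, copied, gray);
        return order;
    }

    /** Root-by-root: copy one root and everything it reaches before the next root. */
    static List<String> rootByRoot(Map<String, List<String>> heap, List<String> roots) {
        List<String> order = new ArrayList<>();
        Set<String> copied = new HashSet<>();
        Deque<String> gray = new ArrayDeque<>();
        for (String root : roots) {
            copy(root, order, copied, gray);
            drain(heap, order, copied, gray); // finish this root's closure first
        }
        return order;
    }

    /** Places obj at the next free position unless it was already copied. */
    private static void copy(String obj, List<String> order, Set<String> copied, Deque<String> gray) {
        if (copied.add(obj)) {
            order.add(obj);
            gray.addLast(obj);
        }
    }

    /** Scans gray objects in FIFO order, copying their children. */
    private static void drain(Map<String, List<String>> heap, List<String> order,
                              Set<String> copied, Deque<String> gray) {
        while (!gray.isEmpty()) {
            for (String child : heap.getOrDefault(gray.pollFirst(), List.of())) {
                copy(child, order, copied, gray);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = Map.of("r1", List.of("a"), "r2", List.of("b"), "a", List.of("c"));
        List<String> roots = List.of("r1", "r2");
        System.out.println(allRootsFirst(heap, roots)); // [r1, r2, a, b, c]
        System.out.println(rootByRoot(heap, roots));    // [r1, a, c, r2, b]
    }
}
```

Root-by-root keeps each root next to the objects it reaches. The split-roots policies could be modeled the same way by grouping the roots (stack roots, then remembered-set roots, or the reverse) and draining after each group. That variant is not shown.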
Experiment Setup • JikesRVM, JMTk • Generational copying collector with a bounded 4 MB nursery • PseudoAdaptive compilation, 2nd iteration
Different Root Traversal Policies • Root-by-root (RxR) has the best mutator locality
Different Root Traversal Policies • Total execution time
Different Traversal Orders • Breadth first: 1,2,3,4,5,6,7 • Pure depth first: 1,2,6,3,4,7,5 • Pure depth first, LIFO: 1,5,4,7,3,2,6 • Partial depth first, 2 children: 1,2,6,3,4,5,7 • [Figure: example object graph; 1 points to 2, 3, 4 and 5; 2 points to 6; 4 points to 7]
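The four orders listed above can be reproduced on the seven-object example. Below is a sketch that assumes the graph 1 → {2, 3, 4, 5}, 2 → {6}, 4 → {7}, which is inferred from the listed orders. It reads "partial depth first, 2 children" as: follow the first two children of each object depth first, and queue the remaining children breadth first. That reading is an assumption that happens to yield the listed order. Position in the result list stands in for placement in to-space, and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

final class TraversalOrderSketch {
    // Example graph, inferred from the orders on the slide: 1 -> {2, 3, 4, 5}, 2 -> {6}, 4 -> {7}.
    static final Map<Integer, List<Integer>> GRAPH =
            Map.of(1, List.of(2, 3, 4, 5), 2, List.of(6), 4, List.of(7));

    static List<Integer> children(int n) {
        return GRAPH.getOrDefault(n, List.of());
    }

    /** Breadth first: a FIFO scan over the objects placed so far (Cheney order). */
    static List<Integer> breadthFirst(int root) {
        List<Integer> order = new ArrayList<>(List.of(root));
        for (int scan = 0; scan < order.size(); scan++) {
            for (int c : children(order.get(scan))) {
                if (!order.contains(c)) order.add(c);
            }
        }
        return order;
    }

    /** Pure depth first: recurse into children in field order. */
    static List<Integer> depthFirst(int root) {
        List<Integer> order = new ArrayList<>();
        dfs(root, order);
        return order;
    }

    private static void dfs(int n, List<Integer> order) {
        if (order.contains(n)) return;
        order.add(n);
        for (int c : children(n)) dfs(c, order);
    }

    /** Pure depth first with an explicit LIFO stack: the last child pushed is visited first. */
    static List<Integer> depthFirstLifo(int root) {
        List<Integer> order = new ArrayList<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            int n = stack.pop();
            if (order.contains(n)) continue;
            order.add(n);
            for (int c : children(n)) stack.push(c);
        }
        return order;
    }

    /** Partial depth first: follow the first k children of each object depth first, queue the rest. */
    static List<Integer> partialDepthFirst(int root, int k) {
        List<Integer> order = new ArrayList<>(List.of(root));
        Deque<Integer> deferred = new ArrayDeque<>();
        scanPartial(root, k, order, deferred);
        while (!deferred.isEmpty()) scanPartial(deferred.pollFirst(), k, order, deferred);
        return order;
    }

    private static void scanPartial(int n, int k, List<Integer> order, Deque<Integer> deferred) {
        List<Integer> kids = children(n);
        for (int i = 0; i < kids.size(); i++) {
            int c = kids.get(i);
            if (order.contains(c)) continue;
            order.add(c);
            if (i < k) scanPartial(c, k, order, deferred); // first k children: depth first
            else deferred.addLast(c);                      // remaining children: breadth first, later
        }
    }

    public static void main(String[] args) {
        System.out.println(breadthFirst(1));         // [1, 2, 3, 4, 5, 6, 7]
        System.out.println(depthFirst(1));           // [1, 2, 6, 3, 4, 7, 5]
        System.out.println(depthFirstLifo(1));       // [1, 5, 4, 7, 3, 2, 6]
        System.out.println(partialDepthFirst(1, 2)); // [1, 2, 6, 3, 4, 5, 7]
    }
}
```

Setting k to 0 degenerates to breadth first, and a large k approaches pure depth first. Partial depth first sits between the two.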
Class-Oblivious Traversal Orders • Different traversal policies • Partial depth first (DF) is the best
Class-based Traversal • Class-oblivious traversal orders are inflexible • Class-based object traversal • Static profiling • Dynamic sampling
Static Profiling • Profile object accesses • Find hot pairs with strong correlation • Example • (1,4), (4,7) and (2,6) are strongly correlated • Order: 1,4,7,2,6,3,5 • [Figure: example object graph; 1 points to 2, 3, 4 and 5; 2 points to 6; 4 points to 7]
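Below is one way to turn hot pairs into a copy order. This sketch assumes a depth-first visit that places each object's hot children immediately after it, then its remaining children in field order. That policy is an assumption chosen because it reproduces the example order 1,4,7,2,6,3,5; it is not necessarily the authors' exact algorithm. The graph is the same seven-object example, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HotPairOrderSketch {
    // Example graph: 1 -> {2, 3, 4, 5}, 2 -> {6}, 4 -> {7}.
    static final Map<Integer, List<Integer>> GRAPH =
            Map.of(1, List.of(2, 3, 4, 5), 2, List.of(6), 4, List.of(7));

    // Hypothetical static-profile output: (parent, child) pairs with strong access correlation.
    static final Set<List<Integer>> HOT_PAIRS =
            Set.of(List.of(1, 4), List.of(4, 7), List.of(2, 6));

    static boolean isHot(int parent, int child) {
        return HOT_PAIRS.contains(List.of(parent, child));
    }

    static List<Integer> hotFirstOrder(int root) {
        List<Integer> order = new ArrayList<>();
        visit(root, order);
        return order;
    }

    /** Depth first, but each object's hot children are placed before its other children. */
    private static void visit(int n, List<Integer> order) {
        if (order.contains(n)) return;
        order.add(n);
        List<Integer> kids = GRAPH.getOrDefault(n, List.of());
        for (int c : kids) if (isHot(n, c)) visit(c, order);  // hot children right after the parent
        for (int c : kids) if (!isHot(n, c)) visit(c, order); // remaining children in field order
    }

    public static void main(String[] args) {
        System.out.println(hotFirstOrder(1)); // [1, 4, 7, 2, 6, 3, 5]
    }
}
```

Hot chains such as 1 → 4 → 7 end up contiguous in memory. Objects with no hot pair fall back to plain depth-first order.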
Online Profiling • Use the adaptive compiler's sampling • Hot methods • Hot basic blocks • Use field accesses to identify hot fields • Example: in a hot method, { A a; … a.b = …; … } marks field b of class A, which points to an object of class B, as hot
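Below is a sketch of how sampled field accesses could become class-based advice. It assumes some sampler reports (class, field) accesses seen in hot methods or basic blocks; the actual Jikes RVM adaptive-compiler sampling is not modeled here. It also assumes a field counts as hot once its sample count reaches a threshold. `HotFieldAdviceSketch`, `recordAccess`, `hotField`, `scanOrder` and the threshold are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class HotFieldAdviceSketch {
    // class name -> (field name -> number of sampled accesses)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final int threshold; // hypothetical: minimum samples before a field counts as hot

    HotFieldAdviceSketch(int threshold) {
        this.threshold = threshold;
    }

    /** Records one field access seen while sampling a hot method or basic block. */
    void recordAccess(String className, String fieldName) {
        counts.computeIfAbsent(className, k -> new HashMap<>()).merge(fieldName, 1, Integer::sum);
    }

    /** Class-based advice: the most-sampled field of a class, if it reaches the threshold. */
    Optional<String> hotField(String className) {
        return counts.getOrDefault(className, Map.of()).entrySet().stream()
                .filter(e -> e.getValue() >= threshold)
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    /** Order in which a collector could scan the fields of any instance of the class. */
    List<String> scanOrder(String className, List<String> declaredFields) {
        List<String> order = new ArrayList<>(declaredFields);
        hotField(className).ifPresent(hot -> {
            if (order.remove(hot)) order.add(0, hot); // hot field first, others keep their order
        });
        return order;
    }

    public static void main(String[] args) {
        HotFieldAdviceSketch advice = new HotFieldAdviceSketch(2);
        // e.g. a hot method that repeatedly executes a.b = ... on an object of class A
        for (int i = 0; i < 3; i++) advice.recordAccess("A", "b");
        advice.recordAccess("A", "c");
        System.out.println(advice.hotField("A"));                          // Optional[b]
        System.out.println(advice.scanOrder("A", List.of("c", "x", "b"))); // [b, c, x]
    }
}
```

Because the advice is keyed by class, every instance of A gets the same scan order. The instance-based advice listed under future work would need per-object information instead.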
Online Profiling • Micro benchmark results
Online Profiling • Geometric mean
Reasons • No advice for most copied objects • For jess, db and raytrace, <<1% of the objects are picked as hot • 5% for javac • The hot fields are already within the first 2 pointers • For 90% of the advised objects in javac
Online Profiling • pseudojbb mutator results • Advice is generated for 23% of the copied objects • 75% of the advised objects have hot fields beyond the first 2 pointers
Questions • Have we found all the hot objects? • What about hot objects that are not connected? • Is class-based advice good enough? • Does pseudojbb need instance-based advice? • What about locality of nursery objects?
Future Work • Sampling technique • Catch more hot object accesses • Lower the threshold • Hot objects that are not connected • Dynamically change the advice as program phases change • Nursery locality • Different traversal orders for cold objects • Instance-based advice
Conclusion • Reordering objects during copying collection can improve locality • Among class-oblivious traversal orders, partial depth first is the best • Class-based traversal with online profiling is • more flexible, up to 50% better • very low overhead, ~0% • Still some mysteries
Answers? • Lower the sampling threshold and sample beyond hot methods • For objects with only 1 or 2 pointers, plain depth first may be simpler • Nursery locality may be more important • Instance-based advice
Online Profiling • Execution overhead
Online Profiling • Micro benchmark results for mutator time
Different Root Traversal Policies _227_mtrt
Static Profiling • Results
Answers? • Most objects have only one pointer • Percentage of objects copied by advice (are they really hot?) • ~50% for pseudojbb, <<1% for jess, ~16% for our microbenchmark • Change! Half of the pairs do not form chains longer than 2 • Nursery locality may be more important
Class-Oblivious Orderings • Different traversal policies • Partial DF is better (pseudojbb)
Motivation • MarkSweep vs. copying collector: mutator time of _213_javac
Motivation • Mutator L2 misses, _213_javac