
PREDICTING UNROLL FACTORS USING SUPERVISED LEARNING


Presentation Transcript


  1. PREDICTING UNROLL FACTORS USING SUPERVISED LEARNING Mark Stephenson & Saman Amarasinghe Massachusetts Institute of Technology Computer Science and Artificial Intelligence Lab

  2. INTRODUCTION & MOTIVATION • Compiler heuristics rely on detailed knowledge of the system • Compiler interactions not understood • Architectures are complex

  3. HEURISTIC DESIGN • Current approach to heuristic development is somewhat ad hoc • Can compiler writers learn anything from baseball? • Is it feasible to deal with empirical data? • Can we use statistics and machine learning to build heuristics?

  4. CASE STUDY • Loop unrolling • Code expansion can degrade performance • Increased live ranges, register pressure • A myriad of interactions with other passes • Requires categorization into multiple classes • i.e., what’s the unroll factor?
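For concreteness, here is a small, hypothetical example (not taken from the slides) of what an unroll factor of 4 means for a simple summation loop:

    // Original loop: one element per iteration.
    for (int i = 0; i < n; i++)
      sum += a[i];

    // The same loop unrolled by a factor of 4, assuming n is a multiple of 4.
    // Fewer branches and more instruction-level parallelism, but a larger
    // body, longer live ranges, and more register pressure.
    for (int i = 0; i < n; i += 4) {
      sum += a[i];
      sum += a[i + 1];
      sum += a[i + 2];
      sum += a[i + 3];
    }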

  5. ORC’S HEURISTIC (UNKNOWN TRIPCOUNT)

     if (trip_count_tn == NULL) {
       UINT32 ntimes = MAX(1, OPT_unroll_times-1);
       INT32 body_len = BB_length(head);
       while (ntimes > 1 &&
              ntimes * body_len > CG_LOOP_unrolled_size_max)
         ntimes--;
       Set_unroll_factor(ntimes);
     } else {
       …
     }

  6. ORC’S HEURISTIC (KNOWN TRIPCOUNT)

     } else {
       BOOL const_trip = TN_is_constant(trip_count_tn);
       INT32 const_trip_count = const_trip ? TN_value(trip_count_tn) : 0;
       INT32 body_len = BB_length(head);
       CG_LOOP_unroll_min_trip = MAX(CG_LOOP_unroll_min_trip, 1);
       if (const_trip && CG_LOOP_unroll_fully &&
           (body_len * const_trip_count <= CG_LOOP_unrolled_size_max ||
            CG_LOOP_unrolled_size_max == 0 &&
            CG_LOOP_unroll_times_max >= const_trip_count)) {
         Set_unroll_fully();
         Set_unroll_factor(const_trip_count);
       } else {
         UINT32 ntimes = OPT_unroll_times;
         ntimes = MIN(ntimes, CG_LOOP_unroll_times_max);
         if (!is_power_of_two(ntimes)) {
           ntimes = 1 << log2(ntimes);
         }
         while (ntimes > 1 && ntimes * body_len > CG_LOOP_unrolled_size_max)
           ntimes /= 2;
         if (const_trip) {
           while (ntimes > 1 && const_trip_count < 2 * ntimes)
             ntimes /= 2;
         }
         Set_unroll_factor(ntimes);
       }
     }

  7. SUPERVISED LEARNING • Supervised learning algorithms try to find a function F(X) → Y • X : vector of characteristics that define a loop • Y : empirically found best unroll factor • [Figure: loops mapped by F(X) to unroll factors 1 through 8]
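A minimal C++ sketch of the mapping this slide describes; the feature names and the predictUnrollFactor interface are assumptions for illustration, not the authors' actual feature set:

    // Hypothetical feature vector for one loop (the real features come from ORC).
    struct LoopFeatures {
      int numFPOperations;    // # floating-point operations in the body
      int numBranches;        // # branches in the body
      int numMemOperations;   // # loads and stores in the body
      int estimatedTripCount; // compile-time trip count estimate, if known
    };

    // F(X) -> Y: map a loop's features X to an unroll factor Y in {1, ..., 8}.
    // Near neighbors or an SVM plays this role; the body here is a placeholder.
    int predictUnrollFactor(const LoopFeatures& x) {
      (void)x;
      return 1; // placeholder: the real mapping is learned from labeled loops
    }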

  8. EXTRACTING THE DATA • Extract features • Most features are readily available in ORC • Kitchen-sink approach • Finding the labels (best unroll factors) • Added an instrumentation pass • Assembly instructions inserted to time loops • Calls to a library at all loop exit points • Compile and run at all unroll factors (1..8) • For each loop, choose the best-performing factor as the label
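The labeling step can be summarized by the following sketch (helper name and data layout are assumptions): each loop is timed at every unroll factor from 1 to 8, and the fastest factor becomes that loop's label.

    #include <vector>

    // timings[loop][f - 1] = measured time for the loop compiled with unroll
    // factor f, as gathered by the instrumentation library.
    std::vector<int> labelLoops(const std::vector<std::vector<double>>& timings) {
      std::vector<int> labels;
      for (const auto& loopTimes : timings) {
        int best = 1;
        for (int f = 2; f <= 8; f++)
          if (loopTimes[f - 1] < loopTimes[best - 1])
            best = f;
        labels.push_back(best); // empirically best unroll factor = the label
      }
      return labels;
    }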

  9. LEARNING ALGORITHMS • Prototyped in Matlab • Two learning algorithms classified our data set well • Near neighbors • Support Vector Machine (SVM) • Both algorithms classify quickly • Train at the factory • No increase in compilation time

  10. NEAR NEIGHBORS • [Figure: training loops plotted by # FP operations vs. # branches, labeled “unroll” and “don’t unroll”]
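A minimal nearest-neighbor sketch over a feature space like the one pictured (Euclidean distance, single nearest neighbor); illustrative only, not the authors' Matlab prototype:

    #include <cstddef>
    #include <vector>

    struct Example {
      std::vector<double> features; // e.g. {# FP operations, # branches, ...}
      int unrollFactor;             // label: empirically best factor, 1..8
    };

    // Predict by copying the label of the closest training example.
    // Using k > 1 neighbors with a majority vote is a simple extension.
    int nearestNeighbor(const std::vector<Example>& train,
                        const std::vector<double>& x) {
      int best = train[0].unrollFactor;
      double bestDist = 1e300;
      for (const auto& e : train) {
        double d = 0.0;
        for (std::size_t i = 0; i < x.size(); i++) {
          double diff = e.features[i] - x[i];
          d += diff * diff;
        }
        if (d < bestDist) { bestDist = d; best = e.unrollFactor; }
      }
      return best;
    }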

  11. SUPPORT VECTOR MACHINES • Map the original feature space into a higher-dimensional space (using a kernel) • Find a hyperplane that maximally separates the data
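A sketch of the SVM decision function for the binary “unroll / don’t unroll” view shown on the next slide; the degree-2 polynomial kernel and all parameter names are assumptions, and the multi-class 1..8 case would be handled with a standard one-vs-rest or one-vs-one extension:

    #include <cstddef>
    #include <vector>

    // Degree-2 polynomial kernel: implicitly maps features (e.g. # branches)
    // into a higher-dimensional space without computing the mapping explicitly.
    double polyKernel(const std::vector<double>& a, const std::vector<double>& b) {
      double dot = 0.0;
      for (std::size_t i = 0; i < a.size(); i++) dot += a[i] * b[i];
      return (dot + 1.0) * (dot + 1.0);
    }

    // Decision function: sign of sum_i alpha_i * y_i * K(x_i, x) + b, where the
    // support vectors x_i, coefficients alphaTimesY (alpha_i * y_i with y_i in
    // {+1, -1}), and bias b come from training.
    int svmPredict(const std::vector<std::vector<double>>& supportVectors,
                   const std::vector<double>& alphaTimesY, double b,
                   const std::vector<double>& x) {
      double score = b;
      for (std::size_t i = 0; i < supportVectors.size(); i++)
        score += alphaTimesY[i] * polyKernel(supportVectors[i], x);
      return score >= 0.0 ? +1 : -1; // +1 = unroll, -1 = don't unroll
    }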

  12. SUPPORT VECTOR MACHINES • [Figure: the same loops plotted in the original space (# branches vs. # FP operations) and in the kernel-mapped space (# branches² vs. # FP operations), where the “unroll” and “don’t unroll” classes become linearly separable]

  13. PREDICTION ACCURACY • Leave-one-out cross validation • Filter out ambiguous training examples • Only keep examples whose best unroll factor is clearly better than the alternatives (at least 1.05x) • Throw away obviously noisy examples
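Leave-one-out cross validation holds out each example in turn, trains on the rest, and tests on the held-out example. A sketch, reusing the Example struct and nearestNeighbor classifier from the sketch after slide 10 (the 1.05x filter above would be applied to the data before this loop):

    #include <cstddef>
    #include <vector>

    double leaveOneOutAccuracy(const std::vector<Example>& data) {
      int correct = 0;
      for (std::size_t held = 0; held < data.size(); held++) {
        // Train on every example except the held-out one.
        std::vector<Example> train;
        for (std::size_t i = 0; i < data.size(); i++)
          if (i != held) train.push_back(data[i]);
        if (nearestNeighbor(train, data[held].features) == data[held].unrollFactor)
          correct++;
      }
      return static_cast<double>(correct) / data.size();
    }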

  14. REALIZING SPEEDUPS (SWP DISABLED)

  15. FEATURE SELECTION • Feature selection is a way to identify the best features • Start with loads of features • Small feature sets are better • Learning algorithms run faster • Are less prone to overfitting the training data • Useless features can confuse learning algorithms

  16. FEATURE SELECTION CONT.: MUTUAL INFORMATION SCORE • Measures the reduction in uncertainty about one variable given knowledge of another • Does not tell us how features interact with each other
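For two discrete variables, the mutual information is I(X;Y) = Σ over x,y of p(x,y) · log( p(x,y) / (p(x) · p(y)) ). Below is a sketch of estimating it from paired samples of a discretized feature and the unroll-factor label (a generic estimator, not the authors' exact tooling):

    #include <cmath>
    #include <cstddef>
    #include <map>
    #include <utility>
    #include <vector>

    double mutualInformation(const std::vector<int>& xs, const std::vector<int>& ys) {
      std::map<int, double> px, py;
      std::map<std::pair<int, int>, double> pxy;
      const double n = xs.size();
      // Empirical marginal and joint probabilities.
      for (std::size_t i = 0; i < xs.size(); i++) {
        px[xs[i]] += 1.0 / n;
        py[ys[i]] += 1.0 / n;
        pxy[{xs[i], ys[i]}] += 1.0 / n;
      }
      double mi = 0.0;
      for (const auto& kv : pxy) {
        double joint = kv.second;
        mi += joint * std::log2(joint / (px[kv.first.first] * py[kv.first.second]));
      }
      return mi; // bits of uncertainty about Y removed by knowing X
    }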

  17. FEATURE SELECTION CONT.: GREEDY FEATURE SELECTION • Choose the single best feature • Choose another feature that, together with the best feature, most improves classification accuracy • …
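A sketch of the greedy forward-selection loop described above; the accuracy callback is an assumption standing in for cross-validated accuracy over a candidate feature subset:

    #include <vector>

    // Repeatedly add the single feature that, together with those already
    // chosen, yields the highest classification accuracy.
    std::vector<int> greedySelect(int numFeatures, int maxFeatures,
                                  double (*accuracy)(const std::vector<int>&)) {
      std::vector<int> chosen;
      for (int round = 0; round < maxFeatures; round++) {
        int bestFeature = -1;
        double bestScore = -1.0;
        for (int f = 0; f < numFeatures; f++) {
          bool alreadyChosen = false;
          for (int c : chosen)
            if (c == f) alreadyChosen = true;
          if (alreadyChosen) continue;
          std::vector<int> candidate = chosen;
          candidate.push_back(f);
          double score = accuracy(candidate); // e.g. leave-one-out accuracy
          if (score > bestScore) { bestScore = score; bestFeature = f; }
        }
        chosen.push_back(bestFeature);
      }
      return chosen;
    }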

  18. FEATURE SELECTION: THE BEST FEATURES

  19. RELATED WORK • Monsifrot et al., “A Machine Learning Approach to Automatic Production of Compiler Heuristics.” 2002 • Calder et al., “Evidence-Based Static Branch Prediction Using Machine Learning.” 1997 • Cavazos et al., “Inducing Heuristics to Decide Whether to Schedule.” 2004 • Moss et al., “Learning to Schedule Straight-Line Code.” 1997 • Cooper et al., “Optimizing for Reduced Code Space using Genetic Algorithms.” 1999 • Puppin et al., “Adapting Convergent Scheduling using Machine Learning.” 2003 • Stephenson et al., “Meta Optimization: Improving Compiler Heuristics with Machine Learning.” 2003

  20. CONCLUSION • Supervised classification can effectively find good heuristics • Even for multi-class problems • SVM and near neighbors perform well • Can potentially have a big impact • We spent very little time tuning the learning parameters • Let a machine learning algorithm tell us which features are best

  21. THE END

  22. SOFTWARE PIPELINING • ORC has been tuned with SWP in mind • Every major release of ORC has had a different unrolling heuristic for SWP • Currently 205 lines long • Can we learn a heuristic that outperforms ORC’s SWP unrolling heuristic?

  23. REALIZING SPEEDUPS (SWP ENABLED)

  24. HURDLES • Compiler writer must extract features • Acquiring labels takes time • Instrumentation library • ~2 weeks to collect data • Predictions confined to training labels • Have to tweak learning algorithms • Noise
