
A Practical Method For Quickly Evaluating Program Optimizations

A Practical Method For Quickly Evaluating Program Optimizations. Grigori Fursin, Albert Cohen, Michael O’Boyle and Olivier Temam. ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France; Institute for Computing Systems Architecture, University of Edinburgh, UK. Presented by Shaofeng Liu.


Presentation Transcript


  1. A Practical Method For Quickly Evaluating Program Optimizations. Grigori Fursin, Albert Cohen, Michael O’Boyle and Olivier Temam. ALCHEMY Group, INRIA Futurs and LRI, Paris-Sud University, France; Institute for Computing Systems Architecture, University of Edinburgh, UK. Presented by Shaofeng Liu.

  2. Outline • Short background • What problem do we want to solve? • How do we solve it? • The main challenges • Result • Conclusion

  3. Background • Iterative optimization has great potential for a large range of optimization techniques; • The program is run repeatedly, and a different optimization is tested at each execution; • What is the problem with this evaluation approach?

  4. The problem of iterative evaluation • Problem: • The space of optimization options can be huge. • A naïve iterative search can be very time-consuming. • E.g., the original execution time of the mgrid SpecFP2000 benchmark is 290s; with 32 optimization options, the total evaluation time is 290 x 32 = 9280s. [Note: an optimization option is not necessarily a single optimization technique; it can be a combined set of techniques.] • Little existing work provides a practical approach for applying iterative optimization effectively.
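
The arithmetic on this slide can be written as a small sketch. The function name `naive_search_time` is hypothetical and only restates the slide's assumption that every optimization option costs one full run of the program.

```python
def naive_search_time(run_seconds, num_options):
    """Total time when each optimization option needs one full program run."""
    return run_seconds * num_options


# mgrid example from the slide: 290 s per run, 32 options.
total = naive_search_time(290, 32)
print(total)  # prints 9280
```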

  5. Can we do better? • The idea is: • Can we evaluate multiple optimization options in a single run of the program? • To do this: • The paper studies the behavior of the programs themselves. Unlike the last two iterative-optimization papers, which assume nothing is known about the program, this work takes advantage of an interesting property of many programs. • The interesting property is: • Programs (scientific applications in particular) tend to have stable performance. Several papers have shown that many programs exhibit phases, i.e., program trace intervals of several million instructions in which performance is similar.

  6. What does phase mean? • A phase is a stable, periodic sequence of consecutive executions of the same piece of code (e.g., a time-consuming function call or a big loop). • A phase corresponds to only one piece of code; • But one piece of code may have multiple phases; • For example, if we monitor the subroutines “resid” and “psinv” in mgrid, we can see that their behavior is quite stable and predictable.

  7. The stability of execution time of subroutine resid • We can see that the execution time of resid is quite stable with a period of 7. • Can we take advantage of this stability?

  8. More examples

  9. The Main Idea • Find some time-consuming functions and big loops to optimize (I think this is done by the EKOPath compiler and the users); • Insert versions of the code optimized with different optimization options into the original program, i.e., multi-version code; • Detect the phases of the program; • Apply different optimization options within one phase and measure the execution time. Since these executions are expected to take the same time without optimization, any change in execution time is the effect of the optimization.

  10. Add monitoring code • Compiler instrumentation • Two monitoring routines, timer_start and timer_stop, are added before and after each monitored code section; timer_start: selects one version of the code from the multi-version code and records the starting time; timer_stop: records the completion time, computes the IPC, and detects phases and regularity. • The overhead of the instrumentation is very small (less than 1%).
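
A minimal sketch of how the two monitoring routines could wrap a multi-version code section is given below. It is not the paper's instrumentation: the `Monitor` class, its fields, and the two toy versions are hypothetical, and wall-clock time stands in for IPC, which would need hardware counters.

```python
import time


class Monitor:
    """Hypothetical monitor for one multi-version code section."""

    def __init__(self, versions):
        self.versions = versions  # list of callables; index 0 is the original code
        self.selected = 0         # version to run on the next execution
        self.history = []         # (version index, elapsed seconds) per execution
        self._start = None

    def timer_start(self):
        # Select one version of the code and record the starting time.
        version = self.versions[self.selected]
        self._start = time.perf_counter()
        return version

    def timer_stop(self):
        # Record the completion time. Assumption: elapsed time replaces IPC here.
        elapsed = time.perf_counter() - self._start
        self.history.append((self.selected, elapsed))
        return elapsed


def original(n):
    # Stand-in for the original code section.
    return sum(i * i for i in range(n))


def optimized(n):
    # Stand-in for an optimized version: closed form of the same sum.
    return (n - 1) * n * (2 * n - 1) // 6


monitor = Monitor([original, optimized])
for selected in (0, 1):
    monitor.selected = selected
    code = monitor.timer_start()
    result = code(1000)
    monitor.timer_stop()
```

Each entry in `monitor.history` pairs the version that ran with its measured time, which is the raw material the phase detector needs.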

  11. Detect stability • Each phase is assigned a unique identifier, so phases are evaluated independently. • An optimization may benefit the code in one phase but harm the code in others; • So we only consider a single phase at a time. • We define stability as three consecutive periodic execution instances of a code section with the same IPC. • It is easy to design an algorithm that finds the distance between consecutive periodic executions.
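
One possible way to find the distance between periodic executions is sketched below. The function `find_period`, the 2% tolerance, and the `max_period` bound are assumptions for illustration; the check only looks at the most recent execution and the instances one and two periods before it.

```python
def find_period(ipcs, max_period=16, tolerance=0.02, repeats=3):
    """Return the smallest period p such that the last `repeats` execution
    instances spaced p apart have the same IPC (within `tolerance`),
    or None if no such period is found."""
    n = len(ipcs)
    for p in range(1, max_period + 1):
        if n < (repeats - 1) * p + 1:
            break  # not enough history to test this period or any longer one
        samples = [ipcs[n - 1 - k * p] for k in range(repeats)]
        ref = samples[0]
        if all(abs(s - ref) <= tolerance * abs(ref) for s in samples):
            return p
    return None


# Synthetic IPC trace that repeats every 7 executions, like resid on slide 7.
pattern = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2]
trace = pattern * 3
print(find_period(trace))  # prints 7
```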

  12. Evaluating Optimization Options • To evaluate a single optimization option, we need four consecutive periodic executions of the code: • The first two executions run the code with the optimization option, to double-check the optimized performance; • The next two executions run the original code, to verify that the prediction is correct; if the execution time stays at the baseline performance, the prediction is correct; otherwise we start over and detect the regularity again. [Note: the miss rate is fairly low.] • If we have N optimization options, we can evaluate all of them in roughly 4*N consecutive periodic executions of the code; • The rest of the program is executed using the best code, so we now have a self-tuned program!
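
The four-execution schedule can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: `evaluate_options` and `run_instance` are hypothetical names, the 2% tolerance is invented, the two optimized measurements are simply averaged, and an option whose baseline check fails is skipped instead of restarting phase detection.

```python
def evaluate_options(run_instance, num_options, baseline, tolerance=0.02):
    """Evaluate options 1..num_options inside one stable phase.

    run_instance(version) runs the next periodic execution with the given
    version (0 is the original code) and returns its measured time.
    """
    results = {}
    executions = 0
    for option in range(1, num_options + 1):
        # Two executions with the optimized version.
        opt_times = [run_instance(option), run_instance(option)]
        # Two executions with the original code to verify the prediction.
        check_times = [run_instance(0), run_instance(0)]
        executions += 4
        stable = all(abs(t - baseline) <= tolerance * baseline for t in check_times)
        if not stable:
            # Assumption: skip the option; the described scheme would
            # start over and detect the regularity again.
            continue
        results[option] = sum(opt_times) / 2
    # The original code (version 0) wins ties because it is listed first.
    candidates = {0: baseline, **results}
    best = min(candidates, key=candidates.get)
    return best, results, executions


# Deterministic stand-in for real measurements (seconds per execution).
fake_times = {0: 10.0, 1: 9.0, 2: 7.5, 3: 11.0}
best, results, executions = evaluate_options(lambda v: fake_times[v], 3, baseline=10.0)
print(best, executions)  # prints 2 12
```

With N = 3 options the sketch uses 4*N = 12 periodic executions, matching the estimate on the slide.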

  13. Result • The evaluation process is greatly accelerated; we can evaluate more optimization options in a single run; • The self-tuned program has

  14. Data Structure • The paper uses a Phase Detection and Prediction Table (PDPT) to record the running of the program. It looks like:

  15. Conclusion and Future work • Conclusion • The time required to search the huge space of program transformations is the main obstacle preventing iterative optimization from being widely used; • This paper uses a new approach that speeds up the search by a factor of 32-962 over a set of benchmarks; • The method has another benefit: self-tuned programs across different architectures; • Future work • Analyze large, complex transformation spaces; • Improve the phase detection and prediction scheme;

  16. Questions?
