Performance Tools developed in IBM Haifa

Presentation Transcript


  1. Performance Tools developed in IBM Haifa http://www.haifa.il.ibm.com/dept/svt/code_paot.html Gad Haber (haber@il.ibm.com)

  2. HRL Performance Tools
     • FDPR-Pro
       • Feedback-based optimizer operating on binary executable files
       • Part of AIX 5L
       • Available on Linux on Power via alphaWorks
       • Under development for:
         • Mac OS X (to be available soon via alphaWorks)
         • z/OS
     • CodeAnalyzer
       • Eclipse plugin tool for analyzing executable files
       • Under development
       • To be added as part of the Performance Workbench (PerfWB)
     • BProber
       • Utility for instrumenting binary executable files
       • Under development
     • ESTO
       • Utility for identifying the optimal set of optimization options
       • Under development

  3. FDPR-Pro Feedback Directed Program Restructuring

  4. FDPR-Pro - Feedback Directed Program Restructuring
     • Uses a global view of the entire program
     • Operates on the executable file after linkage
     • These properties enable FDPR-Pro to perform:
       • Global code reordering
       • Optimizations across procedure boundaries
       • Static data rearrangement
       • Constant area rearrangement
       • Data prefetching
     • Examples of additional FDPR-Pro optimizations:
       • Usage of branch tables
       • Usage of TOC load instructions
       • More...
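To make "global code reordering" concrete, here is a minimal sketch of one common profile-guided layout heuristic: greedily chaining basic blocks along their hottest edges so that frequent paths become straight-line fall-throughs. This is an illustrative stand-in, not FDPR-Pro's actual algorithm, which the slides do not describe. The block names, the edge counts, and the function `reorder_blocks` are all hypothetical.

```python
def reorder_blocks(blocks, edge_counts, entry):
    """Greedy chaining of basic blocks along the hottest control-flow edges.

    blocks      -- block names in their original order (hypothetical input)
    edge_counts -- {(src, dst): execution count} taken from a profile
    entry       -- the entry block, which must stay first in the layout
    """
    # Every block starts in its own single-element chain.
    chain_of = {b: [b] for b in blocks}

    # Visit edges from hottest to coldest.
    for (src, dst), _count in sorted(edge_counts.items(), key=lambda kv: -kv[1]):
        head_chain, tail_chain = chain_of[src], chain_of[dst]
        # Merge only if src ends one chain and dst starts a different one,
        # and never place anything in front of the entry block.
        if (head_chain is not tail_chain and head_chain[-1] == src
                and tail_chain[0] == dst and dst != entry):
            head_chain.extend(tail_chain)
            for blk in tail_chain:
                chain_of[blk] = head_chain

    # Emit the entry chain first, then the remaining chains in original order.
    layout, emitted = [], set()
    for blk in [entry] + list(blocks):
        chain = chain_of[blk]
        if id(chain) not in emitted:
            emitted.add(id(chain))
            layout.extend(chain)
    return layout


blocks = ["entry", "cold_err", "hot_a", "hot_b", "exit"]
edge_counts = {
    ("entry", "hot_a"): 100,
    ("entry", "cold_err"): 1,
    ("hot_a", "hot_b"): 100,
    ("hot_b", "exit"): 100,
    ("cold_err", "exit"): 1,
}
layout = reorder_blocks(blocks, edge_counts, "entry")
print(layout)
```

In this toy input the rarely executed `cold_err` block ends up after the hot path, which is the effect code reordering aims for: hot code stays contiguous in the instruction cache.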

  5. Method
     • Phase 1: Code instrumentation
       • Basic block level
     • Phase 2: Profile information gathering
       • Selection of the "right" input set (representative workload)
       • Accumulation over several input sets
     • Phase 3: Global Code & Data Optimizations
       • Complements the compiler
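The profile-gathering phase can be pictured with a small sketch. It sums basic-block execution counts over several input sets and then picks the blocks that account for most of the executions. The data layout (a dictionary of counts per run), the 90% coverage threshold, and the function names are assumptions made for illustration; the slides do not specify FDPR-Pro's profile format.

```python
from collections import Counter


def accumulate_profiles(runs):
    """Sum per-basic-block execution counts over several profiling runs."""
    total = Counter()
    for counts in runs:
        total.update(counts)  # Counter.update adds counts, it does not replace
    return total


def hot_blocks(profile, fraction=0.9):
    """Return the hottest blocks that together cover `fraction` of executions."""
    grand_total = sum(profile.values())
    covered = 0
    hot = []
    for block, count in profile.most_common():
        if grand_total == 0 or covered >= fraction * grand_total:
            break
        hot.append(block)
        covered += count
    return hot


# Two hypothetical input sets exercising the same instrumented program.
run_a = {"entry": 1, "loop_body": 1000, "error_path": 0, "exit": 1}
run_b = {"entry": 1, "loop_body": 10, "error_path": 5, "exit": 1}

profile = accumulate_profiles([run_a, run_b])
print(hot_blocks(profile))
```

Accumulating over several runs matters because a single unrepresentative input (such as `run_b` alone) could make an error path look relatively hotter than it is in typical use.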

  6. FDPR-Pro Optimization Options
     • -RC: Reorder code
     • -bf: Branch folding
     • -bp: Branch prediction bit setting
     • -align: Code alignment
     • -nop: Eliminate nop instructions
     • -uce: Unreachable code elimination
     • -hco_resched: Hot/cold instruction scheduling
     • -RD, -build_dcg: Static data reordering
     • -tocload, -reduce_toc: TOC load optimizations
     • -si, -ipht, -ihf, -isf: Aggressive function inlining options
     • -ptrgl_optimization: Optimize function calls via pointers
     • -dcbt_optimization: Inject data prefetching instructions
     • -link_reg_optimization: Eliminate stores/restores of the link register
     • -volatile_regs: Eliminate stores/restores using available volatile registers
     • -killed_regs: Eliminate stores/restores of killed registers
     • -load_after_store: Separate frequent loads and stores to the same address
     • -loop_unroll: Loop unrolling
     • -stack_opt: Reduce the stack frame size of hot functions
     • -dce: Dead code elimination
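As an illustration of what an option such as unreachable code elimination does conceptually, the sketch below walks a control flow graph from the entry block and drops every block that cannot be reached. It is a simplified model under stated assumptions: the graph is a plain dictionary with hypothetical block names, and indirect branches, which make this analysis much harder on real binaries, are ignored. It does not reproduce FDPR-Pro's implementation.

```python
def reachable_blocks(cfg, entry):
    """Return the set of blocks reachable from `entry` by following successors."""
    seen = set()
    stack = [entry]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        stack.extend(cfg.get(block, ()))
    return seen


def eliminate_unreachable(cfg, entry):
    """Return a copy of the control flow graph without unreachable blocks."""
    live = reachable_blocks(cfg, entry)
    return {block: list(succs) for block, succs in cfg.items() if block in live}


# Hypothetical control flow graph: block name -> list of successor blocks.
cfg = {
    "main": ["work", "done"],
    "work": ["done"],
    "done": [],
    "orphan": ["done"],  # no block branches here, so it can never execute
}
pruned = eliminate_unreachable(cfg, "main")
print(sorted(pruned))
```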

  7. CodeAnalyzer

  8. CodeAnalyzer - Motivation
     • Architectures are becoming more complex
     • Using only hardware simulators to detect information about potential performance bottlenecks in a given program is hard
     • There is a need for performance tools that can statically analyze and visualize programs for a platform design, to be used by:
       • Hardware architects
       • Compiler writers
       • Application developers

  9. CodeAnalyzer
     • CodeAnalyzer is an Eclipse plugin which performs comprehensive static analysis on given executable files and DLLs
     • Relies on the FDPR-Pro tool for the analysis phase
     • CodeAnalyzer displays the analyzed information together with profiling data collected by:
       • tprof
       • FDPR-Pro
     • The code is then colored according to:
       • Frequency counters, gathered by FDPR-Pro
       • Hardware event ticks, gathered by tprof

  10. CodeAnalyzer (continued)
     • Provides several views of the input binary:
       • Assembly instructions
       • Basic blocks
       • Procedures
       • CSECT modules
       • Control flow graph
       • Hot loops
       • Call graph
       • Annotated source code
       • Dispatch group formation
       • Pipeline slots and functional units

  11. CodeAnalyzer – (continued)

  12. CodeAnalyzer – (continued)

  13. CodeAnalyzer – (continued)

  14. CodeAnalyzer – (continued)

  15. CodeAnalyzer – (continued)

  16. CodeAnalyzer – (continued)

  17. CodeAnalyzer – (continued)

  18. CodeAnalyzer – (continued)

  19. CodeAnalyzer: Performance Comments
     • Performance comments displayed by CodeAnalyzer
       • Comments which do not require profiling
         • Pipeline stalls for the Power architecture
         • Unreachable code and unused data
       • Profile-based comments
         • Non-variant instructions within hot loops
         • Hot function calls proceeded by overwriting non-volatile registers
         • Hot saves and restores of registers which could be relocated to cold spill areas
         • Hot instructions that could be scheduled to colder areas in the code
         • Removable hot branches
           • Hot direct unconditional branches
           • Hot direct conditional branches that are taken, which have a colder fall-through
         • Hot call sites that are appropriate candidates for function inlining
         • Hot call sites that are appropriate for function specialization
         • Hot loops that are appropriate for loop unrolling
         • Hot TOC load instructions that can be replaced by immediate add instructions
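One of the profile-based comments above, hot conditional branches that are usually taken while their fall-through is colder, can be detected from per-branch counters alone. The sketch below shows the idea. The `CondBranch` record, its field names, and the hotness threshold are hypothetical; the slides do not describe how CodeAnalyzer represents or thresholds this data.

```python
from dataclasses import dataclass


@dataclass
class CondBranch:
    """Profile data for one direct conditional branch (hypothetical layout)."""
    address: int
    taken_count: int      # times the branch jumped to its target
    fallthru_count: int   # times execution continued to the next instruction


def removable_hot_branches(branches, hot_threshold):
    """Flag hot branches whose taken path is hotter than the fall-through.

    Such a branch is a candidate for inverting its condition and swapping
    the two successor blocks, so that the hot path falls through.
    """
    return [
        b for b in branches
        if b.taken_count >= hot_threshold and b.taken_count > b.fallthru_count
    ]


branches = [
    CondBranch(address=0x1000, taken_count=9000, fallthru_count=12),   # flagged
    CondBranch(address=0x1040, taken_count=15, fallthru_count=8000),   # already well laid out
    CondBranch(address=0x1080, taken_count=3, fallthru_count=1),       # too cold to matter
]
flagged = removable_hot_branches(branches, hot_threshold=1000)
print([hex(b.address) for b in flagged])
```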

  20. Performance Workbench (PerfWB)

  21. PerfWB
     • CodeAnalyzer is part of the Performance Workbench (PerfWB) utility
     • PerfWB is a collection of Eclipse plugins that provide performance monitoring, tuning and analysis
     • PerfWB consists of the following Eclipse plugins:
       • ProcMon: system-level monitoring tool for displaying system state and for monitoring running processes and threads
       • E-Tune: visualizer of feedback information produced by tprof
       • CodeAnalyzer: performance analyzer of executables and DLLs

  22. ProcMon

  23. E-Tune with CodeAnalyzer