
IBM Haifa Tools Update and Directions




Presentation Transcript


  1. IBM Haifa Tools Update and Directions http://www.haifa.il.ibm.com/dept/svt/code_paot.html Gad Haber (haber@il.ibm.com)

  2. IBM Haifa Performance Tools
     • FDPR-Pro
       • Feedback-based optimizer operating on binary executable files
       • Part of AIX 5L
       • Available on Linux on Power via alphaWorks
       • Under development for z/OS
       • To be available in SDK 2.0 for the Cell platform
     • Code Analyzer
       • Eclipse plugin for analyzing executable files and shared libraries
       • Part of the Visual Performance Analyzer (VPA)
       • To be available in the Cell SDK 2.0
     • ESTO
       • Utility for identifying the optimal set of optimization options
       • Embedded into FDPR-Pro
       • Under development for tuning compilers’ options
     • BProber
       • Utility for instrumenting binary executable files
       • Under development
     • PDT – Performance Debugging Tool for the Cell
       • Operates on trace files from the Cell SPEs

  3. FDPR-Pro Feedback Directed Program Restructuring

  4. FDPR-Pro - Feedback Directed Program Restructuring
     • Using a global view of the entire program
     • Operating on the executable file after linkage
     • These properties enable FDPR-Pro to do:
       • Global code reordering
       • Optimizations across procedure boundaries
       • Static data rearrangement
       • Constant area rearrangement
       • Data prefetching
     • Examples of FDPR-Pro additional optimizations:
       • Usage of branch tables
       • Usage of TOC load instructions
       • More...

  5. Method
     • Phase 1: Code instrumentation
       • Basic block level
     • Phase 2: Profile information gathering
       • Selection of the "right" input set (representative workload)
       • Accumulation over several input sets
     • Phase 3: Global code & data optimizations
       • Complements the compiler
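The three phases can be illustrated with a small sketch. This is not FDPR-Pro's actual algorithm; it is a toy greedy chaining pass, assuming the profile is available as per-edge execution counts between basic blocks. The function names, block names, and counts are all hypothetical.

```python
from collections import Counter

def accumulate_profiles(runs):
    """Phase 2 (toy): sum edge counts gathered from several input sets."""
    total = Counter()
    for run in runs:
        total.update(run)  # Counter.update adds counts for matching keys
    return total

def reorder_blocks(blocks, edge_counts):
    """Phase 3 (toy): chain blocks so that hot edges become fall-throughs."""
    chain_of = {b: [b] for b in blocks}  # every block starts in its own chain
    # Visit edges from hottest to coldest.
    for (src, dst), _count in sorted(edge_counts.items(), key=lambda kv: -kv[1]):
        a, b = chain_of[src], chain_of[dst]
        # Merge only if src ends one chain and dst starts a different chain.
        if a is not b and a[-1] == src and b[0] == dst:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a
    # Emit chains in order of first appearance in the original layout.
    layout, seen = [], set()
    for blk in blocks:
        chain = chain_of[blk]
        if id(chain) not in seen:
            seen.add(id(chain))
            layout.extend(chain)
    return layout

# Hypothetical edge counts from two representative workloads.
blocks = ["entry", "error", "loop", "exit"]
run1 = {("entry", "loop"): 100, ("entry", "error"): 1, ("loop", "loop"): 900,
        ("loop", "exit"): 100, ("error", "exit"): 1}
run2 = {("entry", "loop"): 50, ("loop", "exit"): 50}
new_layout = reorder_blocks(blocks, accumulate_profiles([run1, run2]))
```

With these counts the rarely executed `error` block is moved out of the hot path and placed after `exit`.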

  6. Partial list of FDPR-Pro Optimizations
     • -RC: Reorder code
     • -bf: Branch folding
     • -bp: Branch prediction bit setting
     • -align: Code alignment
     • -uce: Unreachable code elimination
     • -i_resched: Instruction rescheduling
     • -RD, -build_dcg: Static data reordering
     • -tocload, -reduce_toc: TOC load optimizations
     • -si, -ipht, -ihf, -isf: Function inlining options
     • -ptrgl_optimization: Optimize function calls via pointers
     • -dp: Data prefetching
     • -link_reg_optimization: Eliminate stores/restores of the link register
     • -volatile_regs: Eliminate stores/restores using available volatile registers
     • -killed_regs: Eliminate stores/restores of killed registers
     • -load_after_store: Separate frequent load and store instructions to the same address
     • -loop_unroll: Loop unrolling
     • -stack_opt: Reduce stack frame size of hot functions
     • -dce: Dead code elimination
     • -cp: Constant propagation

  7. FDPR-Pro Directions
     • New heavyweight analyses to enable more optimizations
       • Under development
       • Value propagation
       • Constant evaluation
       • Stack aliasing
     • FDPR-Pro for multi-core
       • FDPR-Pro for the Cell processor, to be available in SDK 2.0
       • Special options for profile gathering on the Cell
       • New optimizations for SPE code
       • Auto-parallelization optimizations
     • FDPR-Pro for embedded PowerPC is available
       • Special features added to FDPR-Pro
         • Accepting a sampled profile and complementing it
         • Optimizations taking into account pipeline stalls of embedded PowerPC
       • New optimizations for space reduction have been added

  8. Code Analyzer

  9. Why Code Analyzer?
     • Architectures are becoming more complex
       • Now upcoming multi-core platforms
     • Using only hardware simulators to detect information about potential performance bottlenecks in a given program is hard
     • There is a need for performance tools that can statically analyze and visualize programs for a platform design, to be used by:
       • Hardware architects
       • Compiler writers
       • Application developers

  10. What is Code Analyzer?
     • Code Analyzer is an Eclipse plugin which performs comprehensive static analysis on given executable files and DLLs
       • Relies on FDPR-Pro as the engine for the analysis phase
     • Code Analyzer displays the analyzed information together with profiling data collected by:
       • tprof/OProfile (in VPA XML format - ETM files)
       • FDPR-Pro (in binary or XML format)
     • The code is then colored according to:
       • Frequency counters - gathered by FDPR-Pro
       • Hardware event ticks - gathered by tprof/OProfile

  11. Code Analyzer Views
     • Provides several views of the input binary
       • Assembly instructions
       • Basic blocks
       • Procedures
       • CSECT modules
       • Control flow graph
       • Hot loops
       • Call graph
       • Annotated source code
       • Dispatch group formation
       • Pipeline slots and functional units

  12. Grouping, Performance Comments and Pipeline Views

  13. Code Analyzer opened up from Profile Analyzer

  14. Code Analyzer (on the right) synchronized with Profile Analyzer (on the left)

  15. Code Analyzer - Available Performance Comments
     • Comments which do not require profiling
       • Pipeline stalls for the Power architecture
       • Pipeline stalls for the z9 platform
       • Unreachable code and unused data
       • Misaligned targets
     • Profile-based comments
       • Invariant instructions within hot loops
       • Hot function calls preceded by overwriting non-volatile registers
       • Hot saves and restores of registers which could be relocated to cold spill areas
       • Hot instructions that could be scheduled to colder areas in the code
       • Removable hot branches
         • Hot direct unconditional branches
         • Hot direct conditional branches that are taken, which have a colder fall-through
       • Hot call sites that are appropriate candidates for function inlining
       • Hot TOC load instructions that can be replaced by immediate add instructions
       • Hot branch-to-branch instructions
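As an illustration of a profile-based comment, the sketch below flags hot branch-to-branch instructions in a toy instruction list. It does not reflect Code Analyzer's implementation: the `Insn` record, the `hot_threshold` parameter, and the comment wording are assumptions made for this example, and the addresses and counts are invented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Insn:
    addr: int
    op: str                # toy mnemonic; "b" stands for an unconditional branch
    target: Optional[int]  # branch target address, if the instruction has one
    count: int             # execution count taken from the profile

def branch_to_branch_comments(insns, hot_threshold):
    """Flag hot branches whose target is itself an unconditional branch."""
    by_addr = {i.addr: i for i in insns}
    comments = []
    for i in insns:
        if i.target is None or i.count < hot_threshold:
            continue  # not a branch, or too cold to be worth a comment
        dest = by_addr.get(i.target)
        if dest is not None and dest.op == "b" and dest.target is not None:
            comments.append(
                (i.addr, f"hot branch to branch; could target {dest.target:#x} directly")
            )
    return comments

# Hypothetical profile: one hot and one cold conditional branch to 0x200.
insns = [
    Insn(0x100, "bc", 0x200, 5000),
    Insn(0x104, "add", None, 10),
    Insn(0x200, "b", 0x300, 5000),
    Insn(0x300, "blr", None, 5000),
    Insn(0x400, "bc", 0x200, 3),
]
comments = branch_to_branch_comments(insns, hot_threshold=1000)
```

Only the branch at `0x100` is reported; the branch at `0x400` has the same shape but falls below the hotness threshold.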

  16. Code Analyzer Directions
     • Enablement of more comments
       • Under development
       • Using FDPR-Pro added analyses
         • Value propagation
         • Constant evaluation
         • Stack aliasing
     • Code Analyzer for multi-core
       • Code Analyzer for the Cell processor
         • To be available in SDK 2.0
         • Special views for distribution of instructions’ frequency on SPE code
         • New stall comments relevant to the PPE and SPEs

  17. ESTO Expert System for Tuning Optimizations

  18. Why an automatic tool for tuning optimizations?
     • Optimization is controlled by a large number of options
       • The problem is finding the option set that maximizes performance
       • Parameterized (ranged) options complicate and multiply the possibilities
       • Each option performs a rather small change in the object program
     • Typical users do not know which options are best for their programs
       • The default (e.g. -O3) is adequate, but not best for a specific program
     • Optimizer (compiler) developers need to find the optimal option sets for the default combinations (e.g. -O3) and benchmarking (e.g. SPEC)

  19. ESTO - Expert System for Tuning Optimizations
     • Purpose
       • Enable a typical user to utilize the actual optimization potential
       • Automate the search in the very complex option space
       • Produce a ‘close to optimal’ program in a reasonable time
     • Method
       • Trial-and-error search in the multidimensional option space
       • In each step another option set is used to optimize the same program
       • The program runtime is measured and compared to other results
       • The algorithm converges to some ‘close to optimal’ option set
     • Features
       • Flexible configuration for applications and running environments
       • Possibility to extend the components, run parallel processes, etc.
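The trial-and-error method can be sketched as a simple greedy search. The slides do not describe ESTO's actual search algorithm, so this is only an illustration under stated assumptions: options are on/off switches, `measure` is a caller-supplied function returning a runtime (lower is better), and the option names and timings in `fake_runtime` are invented stand-ins for building and timing a real program.

```python
def tune(options, measure, max_rounds=10):
    """Greedy trial-and-error search over on/off options (illustrative only)."""
    best = frozenset()
    best_time = measure(best)
    for _ in range(max_rounds):
        improved = False
        for opt in options:
            candidate = best ^ {opt}  # flip a single option on or off
            t = measure(candidate)
            if t < best_time:         # keep the flip only if the program got faster
                best, best_time, improved = candidate, t, True
        if not improved:              # converged: no single flip helps any more
            break
    return best, best_time

def fake_runtime(enabled):
    """Synthetic runtime model; a real setup would optimize, run, and time the program."""
    t = 10.0
    if "reorder_code" in enabled:
        t -= 2.0
    if "inline" in enabled:
        t -= 1.0
    if "unroll" in enabled:
        t += 0.5
    if {"inline", "unroll"} <= enabled:
        t -= 1.0  # the two options only pay off together
    return t

best_set, best_time = tune(["reorder_code", "inline", "unroll"], fake_runtime)
```

The interaction term shows why options cannot be judged in isolation: `unroll` alone slows the synthetic program down, yet the search still enables it once `inline` is on. A greedy search like this one can stop in a local optimum, which matches the slide's wording of a ‘close to optimal’ result.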

  20. ESTO today
     [Chart: ESTO gain % over FDPR-Pro -O3 on Linux with SPEC2000 train workload, 64 bit. Vertical axis from 0.00 to 16.00; benchmarks shown: art, gap, eon, gcc, apsi, gzip, mcf, twolf, bzip2, applu, mgrid, swim, crafty, mesa, vortex, parser, ammp, equake, perlbmk, wupwise, plus the average.]
     • Embedded into FDPR-Pro
       • By using a command line option --tune
     • Reaches impressive speed-ups on some benchmarks
     • Provides a good average

  21. ESTO directions
     • Enabling ESTO to tune compiler optimizations
       • Under development
       • Requires a configuration file with descriptions of all optimization flags
     • Initial adaptation for GCC
       • Looked at GCC “binary” (on/off) options: ~60 affect performance
       • Runtime speed-up on SPEC benchmarks relative to -O1

  22. BProber Binary Prober

  23. Why is binary probing technology needed?
     • Analysis
       • Each application has its own characteristics
       • Insert tailored instrumentation stubs
     • Simulation
       • New architectures
       • Insert code that simulates new functionality
     • Optimization
       • Performing optimizations locally
       • Function level down to instruction level
       • Insert code to be executed instead of the existing code

  24. BProber Today
     • Based on FDPR-Pro technology
     • Enables insertion of code at
       • Specific address
       • Specific function (entry and exit points)
     • The inserted code is defined as a function in a separate library
       • Can be written in any language
       • Control transfer to the code is done via an inserted call
       • Parameters passed to the function
         • Original address of instrumentation
         • Save area of the registers prior to the call
     • A definition file of user code (libraries and functions) and insertion locations is used
     • Availability
       • IBM internal use (alpha)
       • Supports very large programs including 64-bit applications
       • Both AIX and Linux on Power

  25. PDT Performance Debugging Tool for the Cell

  26. PDT – Performance Debugging Tool
     • PDT enables analysis and visualization of traces from the various SPEs and the interactions between them
