PAT: Briefing on Tool Evaluations • Professor Alan D. George, Principal Investigator • Mr. Hung-Hsun Su, Sr. Research Assistant • Mr. Adam Leko, Sr. Research Assistant • Mr. Bryan Golden, Research Assistant • Mr. Hans Sherburne, Research Assistant • HCS Research Laboratory, University of Florida
Purpose of Evaluations • Investigate performance analysis methods used in existing tools • Determine what features are necessary for tools to be effective • Examine usability of tools • Find out what performance factors existing tools focus on • Create standardized evaluation strategy and apply to popular existing tools • Gather information about tool extensibility • Generate list of reusable components from existing tools • Identify any tools that may serve as basis for our SHMEM and UPC performance tool • Take best candidates for extension and gain experience modifying to support new features
Evaluation Methodology • Generate list of desirable characteristics for performance tools • Categorize based on influence of a tool’s: • Usability/productivity • Portability • Scalability • Miscellaneous • Will present list of characteristics and actual scores in later slide • Assign importance rating to each • Minor (not really important) • Average (nice to have) • Important (should include) • Critical (absolutely needed) • Formulate a scoring strategy for each • Give numerical scores 1-5: 5 best • 0: not applicable • Create objective scoring criteria where possible • Use relative scores for subjective categories
Performance Tool Test Suite • Method used to ensure subjective scores are consistent across tools • Also used to determine effectiveness of performance tool • Includes • Suite of C MPI microbenchmarks that have specific performance problems: PPerfMark [1,2], based on GrindStone [3] • Large-scale program: NAS NPB LU benchmark [4] • “Control” program with good parallel efficiency to test for false positives: CAMEL cryptanalysis C MPI implementation (HCS lab) • For each program in test suite, assign • FAIL: Tool was unable to provide information to identify bottleneck • TOSS-UP: Tool indicated a bottleneck was occurring, but user must be clever to find and fix it • PASS: Tool clearly showed where bottleneck was occurring and gave enough information so a competent user could fix it
Performance Tool Test Suite (2) • What should performance tool tell us? • CAMEL • No communication bottlenecks, CPU-bound code • Performance could be improved by using non-blocking MPI calls • LU • Large number of small messages • Dependence on network bandwidth and latency • Identify which routines take the most time
Performance Tool Test Suite (3) • Big message • Several large messages sent • Dependence on network bandwidth • Intensive server • First node overloaded with work • Ping-pong • Many small messages, overall execution time dependent on network latency • Random barrier • One node holds up barrier • One procedure responsible for slow node behavior • Small messages • One node is bombarded with lots of messages • Wrong way • Point-to-point messages sent in wrong order • System time • Most time spent in system calls • Diffuse procedure • Similar to random barrier • One node holds up barrier • Time for slow procedure “diffused” across several nodes in round-robin fashion
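The contrast between the "big message" and "ping-pong" microbenchmarks can be illustrated with the common linear cost model, in which each message costs a fixed latency plus its size divided by the bandwidth. This is only a sketch under that assumed model; the function names and the network figures (50 µs latency, 100 MB/s bandwidth) are hypothetical and are not taken from the evaluation.

```python
def transfer_time(n_messages, bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Total time under the assumed model: each message costs latency + size / bandwidth."""
    return n_messages * (latency_s + bytes_per_message / bandwidth_bytes_per_s)


def latency_fraction(n_messages, bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Share of the total transfer time that is pure per-message latency."""
    total = transfer_time(n_messages, bytes_per_message, latency_s, bandwidth_bytes_per_s)
    return n_messages * latency_s / total


# Hypothetical network parameters, chosen only for illustration.
LATENCY_S = 50e-6          # 50 microseconds per message
BANDWIDTH_BPS = 100e6      # 100 MB/s

# "Big message": a few large messages, so the bandwidth term dominates.
big = latency_fraction(10, 10_000_000, LATENCY_S, BANDWIDTH_BPS)

# "Ping-pong": many tiny messages, so the latency term dominates.
small = latency_fraction(100_000, 8, LATENCY_S, BANDWIDTH_BPS)

print(f"latency share, big messages: {big:.4f}")
print(f"latency share, ping-pong:    {small:.4f}")
```

Under these assumed numbers, latency is a negligible share of the big-message run and nearly all of the ping-pong run, which is the behavior a tool would be expected to expose for each benchmark.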
List of Tools Evaluated • Profiling tools • TAU (Univ. of Oregon) • mpiP (ORNL, LLNL) • HPCToolkit (Rice Univ.) • SvPablo (Univ. of Illinois, Urbana-Champaign) • DynaProf (Univ. of Tennessee, Knoxville) • Tracing tools • Intel Cluster Tools (Intel) • MPE/Jumpshot (ANL) • Dimemas & Paraver (European Ctr. for Parallelism of Barcelona) • MPICL/ParaGraph (Univ. of Illinois, Univ. of Tennessee, ORNL)
List of Tools Evaluated (2) • Other tools • KOJAK (Forschungszentrum Jülich, ICL @ UTK) • Paradyn (Univ. of Wisconsin, Madison) • Also quickly reviewed • CrayPat/Apprentice2 (Cray) • DynTG (LLNL) • AIMS (NASA) • Eclipse Parallel Tools Platform (LANL) • Open/Speedshop (SGI)
Tuning and Analysis Utilities (TAU) • Developer: University of Oregon • Current versions: • TAU 2.14.4 • Program database toolkit 3.3.1 • Website: • http://www.cs.uoregon.edu/research/paracomp/tau/tautools/ • Contact: • Sameer Shende: sameer@cs.uoregon.edu
TAU Overview • Measurement mechanisms • Source (manual) • Source (automatic via PDToolkit) • Binary (DynInst) • Key features • Supports both profiling and tracing • No built-in trace viewer • Generic export utility for trace files (.vtf, .slog2, .alog) • Many supported architectures • Many supported languages: C, C++, Fortran, Python, Java, SHMEM (TurboSHMEM and Cray SHMEM), OpenMP, MPI, Charm • Hardware counter support via PAPI
mpiP • Developer: ORNL, LLNL • Current version: • mpiP v2.8 • Website: • http://www.llnl.gov/CASC/mpip/ • Contacts: • Jeffrey Vetter: vetterjs@ornl.gov • Chris Chambreau: chcham@llnl.gov
mpiP Overview • Measurement mechanism • Profiling via MPI profiling interface • Key features • Simple, lightweight profiling • Source code correlation (facilitated by mpipview) • Gives profile information for MPI callsites • Uses PMPI interface with extra libraries (libelf, libdwarf, libunwind) to do source correlation
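The idea behind callsite profiling through an interposition layer can be sketched as a Python analog. This is not mpiP's code and does not use the PMPI interface: the `CallsiteProfiler` class and `fake_send` function are hypothetical stand-ins showing only the general pattern of wrapping a call, timing it, and attributing the time to the caller's source location.

```python
import sys
import time
from collections import defaultdict


class CallsiteProfiler:
    """Hypothetical wrapper that accumulates call count and time per call site."""

    def __init__(self):
        # (function name, file, line) -> [call count, total seconds]
        self.stats = defaultdict(lambda: [0, 0.0])

    def wrap(self, func):
        def wrapper(*args, **kwargs):
            # Frame 1 is the code that invoked the wrapper, i.e. the call site.
            caller = sys._getframe(1)
            key = (func.__name__, caller.f_code.co_filename, caller.f_lineno)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                entry = self.stats[key]
                entry[0] += 1
                entry[1] += time.perf_counter() - start
        return wrapper

    def report(self):
        # One line per call site, slowest first.
        for (name, filename, line), (count, total) in sorted(
            self.stats.items(), key=lambda item: item[1][1], reverse=True
        ):
            print(f"{name} at {filename}:{line}: {count} calls, {total:.6f} s")


def fake_send(data):
    """Stand-in for a communication call; it only returns the payload size."""
    return len(data)


profiler = CallsiteProfiler()
send = profiler.wrap(fake_send)

for _ in range(3):
    send(b"payload")       # one call site, hit three times
send(b"another payload")   # a second, distinct call site

profiler.report()
```

The application code calls `send` unchanged while the wrapper records the statistics, which mirrors how a profiling library can intercept calls without modifying the program's source.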
HPCToolkit • Developer: Rice University • Current version: • HPCToolkit v1.1 • Website: • http://www.hipersoft.rice.edu/hpctoolkit/ • Contact: • John Mellor-Crummey: johnmc@cs.rice.edu • Rob Fowler: rjf@cs.rice.edu
HPCToolkit Overview • Measurement Mechanism • Hardware counters (requires PAPI on Linux) • Key Features • Create hardware counter profiles for any executable via sampling • No instrumentation necessary • Relies on PAPI overflow events and program counter values to relate PAPI metrics to source code • Source code correlation of performance data, even for optimized code • Navigation pane in viewer assists in locating resource-consuming functions
SvPablo • Developer: University of Illinois • Current versions: • SvPablo 6.0 • SDDF component 5.5 • Trace Library component 5.1.4 • Website: • http://www.renci.unc.edu/Software/Pablo/pablo.htm • Contact: • ?
SvPablo Overview • Measurement mechanism • Profiling via source code instrumentation • Key features • Single GUI integrates instrumentation and performance data display • Assisted source code instrumentation • Management of multiple instances of instrumented source code and corresponding performance data • Simplified scalability analysis of performance data from multiple runs
Dynaprof • Developer: Philip Mucci (UTK) • Current versions: • Dynaprof CVS as of 2/21/2005 • DynInst API v4.1.1 (dependency) • PAPI v3.0.7 (dependency) • Website: • http://www.cs.utk.edu/~mucci/dynaprof/ • Contact: • Philip Mucci: mucci@cs.utk.edu
Dynaprof Overview • Measurement mechanism • Profiling via PAPI and DynInst • Key features • Simple, gdb-like command line interface • No instrumentation step needed – binary instrumentation at runtime • Produces simple text-based profile output similar to gprof for • PAPI metrics • Wallclock time • CPU time (getrusage)
Intel Trace Collector/Analyzer • Developer: Intel • Current versions: • Intel Trace Collector 5.0.1.0 • Intel Trace Analyzer 4.0.3.1 • Website: • http://www.intel.com/software/products/cluster • Contact: • http://premier.intel.com
Intel Trace Collector/Analyzer Overview • Measurement Mechanism • MPI profiling interface for MPI programs • Static binary instrumentation (proprietary method) • Key Features • Simple, straightforward operation • Comprehensive set of visualizations • Source code correlation pop-up dialogs • Views are linked, allowing analysis of specific portions/phases of execution trace
MPE/Jumpshot • Developer: Argonne National Laboratory • Current versions: • MPE 1.26 • Jumpshot-4 • Website: • http://www-unix.mcs.anl.gov/perfvis/ • Contacts: • Anthony Chan: chan@mcs.anl.gov • David Ashton: ashton@mcs.anl.gov • Rusty Lusk: lusk@mcs.anl.gov • William Gropp: gropp@mcs.anl.gov
MPE/Jumpshot Overview • Measurement Mechanism • MPI profiling interface for MPI programs • Key Features • Distributed with MPICH • Easy to generate traces of MPI programs • Compile with mpicc -mpilog • Scalable logfile format for efficient visualization • Java-based timeline viewer with extensive scrolling and zooming support
CEPBA Tools (Dimemas, Paraver) • Developer: European Center for Parallelism of Barcelona • Current versions: • MPITrace 1.1 • Paraver 3.3 • Dimemas 2.3 • Website: • http://www.cepba.upc.es/tools_i.htm • Contact: • Judit Gimenez: judit@cepba.upc.edu
Dimemas/Paraver Overview • Measurement Mechanism • MPI profiling interface • Key Features • Paraver • Sophisticated trace file viewer, uses “tape” metaphor • Supports displaying hardware counter metrics alongside the trace visualization • Uses modular software architecture, very customizable • Dimemas • Trace-driven simulator • Uses simple models for real hardware • Generates “predictive traces” that can be viewed by Paraver
MPICL/ParaGraph • Developer: • ParaGraph: University of Illinois, University of Tennessee • MPICL: ORNL • Current versions: • ParaGraph (no version number, but last available update 1999) • MPICL 2.0 • Website: • http://www.csar.uiuc.edu/software/paragraph/ • http://www.csm.ornl.gov/picl/ • Contacts: • ParaGraph • Michael Heath: heath@cs.uiuc.edu • Jennifer Finger • MPICL • Patrick Worley: worleyph@ornl.gov
MPICL/ParaGraph Overview • Measurement Mechanism • MPI profiling interface • Other wrapper libraries for obsolete vendor-specific message-passing libraries • Key Features • Large number of different visualizations (about 27) • Several types • Utilization visualizations • Communication visualizations • “Task” visualizations • Other visualizations
KOJAK • Developer: Forschungszentrum Jülich, ICL @ UTK • Current versions: • Stable: KOJAK-v2.0 • Development: KOJAK v2.1b1 • Website: • http://icl.cs.utk.edu/kojak/ • http://www.fz-juelich.de/zam/kojak/ • Contacts: • Felix Wolf: fwolf@cs.utk.edu • Bernd Mohr: b.mohr@fz-juelich.de • Generic email: kojak@cs.utk.edu
KOJAK Overview • Measurement Mechanism • MPI profiling interface • Binary instrumentation on a few platforms • Key Features • Generates and analyzes trace files • Automatic classification of bottlenecks • Simple, scalable profile viewer with source correlation • Exports traces to Vampir format
Paradyn • Developer: University of Wisconsin, Madison • Current versions: • Paradyn: 4.1.1 • DynInst: 4.1.1 • KernInst: 2.0.1 • Website: • http://www.paradyn.org/index.html • Contact: • Matthew Legendre: legendre@cs.wisc.edu
Paradyn Overview • Measurement Mechanism • Dynamic binary instrumentation • Key Features • Dynamic instrumentation at runtime • No instrumentation phase • Visualizes user-selectable metrics while program is running • Automatic performance bottleneck detection via Performance Consultant • Users can define their own metrics using a TCL-like language • All analysis happens while program is running
Scoring System • Scores given for each category • Usability/productivity • Portability • Scalability • Miscellaneous • Scoring formula shown below • Used to generate scores for each category • Weighted sum based on characteristic’s importance • Importance multipliers used • Critical: 1.0 • Important: 0.75 • Average: 0.5 • Minor: 0.25 • Overall score is sum of all category scores
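The slide's own formula is not reproduced in this transcript, so the following is only a sketch of how such a weighted sum might be computed. It assumes a plain, unnormalized sum of multiplier times score; the multipliers are the ones listed above, while the function names and the sample scores are hypothetical.

```python
# Importance multipliers as listed on the slide.
IMPORTANCE_MULTIPLIER = {
    "critical": 1.0,
    "important": 0.75,
    "average": 0.5,
    "minor": 0.25,
}


def category_score(characteristics):
    """Weighted sum over (importance, score) pairs.

    Scores run from 1 to 5 (5 best); 0 means "not applicable" and therefore
    contributes nothing. Assumption: no normalization is applied.
    """
    total = 0.0
    for importance, score in characteristics:
        if not 0 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        total += IMPORTANCE_MULTIPLIER[importance] * score
    return total


def overall_score(categories):
    """Overall score is the sum of all category scores."""
    return sum(category_score(items) for items in categories.values())


# Hypothetical scores for one tool, for illustration only.
example_tool = {
    "usability/productivity": [("critical", 4), ("minor", 2)],
    "portability": [("important", 4)],
    "scalability": [("average", 0)],   # 0: not applicable
    "miscellaneous": [],
}

for name, items in example_tool.items():
    print(name, category_score(items))
print("overall", overall_score(example_tool))
```

If the original formula normalized each category (for example, by the maximum attainable weighted score), the totals would differ, so the numbers this sketch produces should not be compared with the scores reported in the briefing.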