Parallel Performance Analysis with Open|SpeedShop
Half Day Tutorial @ SC 2008, Austin, TX

Presentation Transcript

  1. Parallel Performance Analysis with Open|SpeedShop Half Day Tutorial @ SC 2008, Austin, TX

  2. What is Open|SpeedShop? • Comprehensive Open Source Performance Analysis Framework • Combining Profiling and Tracing • Common workflow for all experiments • Flexible instrumentation • Extensibility through plugins • Partners • DOE/NNSA Tri-Labs (LLNL, LANL, SNLs) • Krell Institute • Universities of Wisconsin and Maryland

  3. Highlights • Open Source Performance Analysis Tool Framework • Most common performance analysis steps all in one tool • Extensible by using plugins for data collection and representation • Several Instrumentation Options • All work on unmodified application binaries • Offline and online data collection / attach to running applications • Flexible and Easy to use • User access through GUI, Command Line, and Python Scripting • Large Range of Platforms • Linux Clusters with x86, IA-64, Opteron, and EM64T CPUs • Easier portability with offline data collection mechanism • Availability • Used at all three ASC labs with lab-size applications • Full source available on sourceforge.net

  4. O|SS Target Audience • Programmers/code teams • Use Open|SpeedShop out of the box • Powerful performance analysis • Ability to integrate O|SS into projects • Tool developers • Single, comprehensive infrastructure • Easy deployment of new tools • Project/product specific customizations • Predefined/custom experiments

  5. Tutorial Goals • Introduce Open|SpeedShop • Basic concepts & terminology • Running first examples • Provide Overview of Features • Sampling & Tracing in O|SS • Performance comparisons • Parallel performance analysis • Overview of advanced techniques • Interactive performance analysis • Scripting & Python integration

  6. “Rules” • Let’s keep this interactive • Feel free to ask as we go along • Online demos as we go along • Feel free to play along • Live CDs with O|SS installed (for PCs) • Ask us if you get stuck • Feedback on O|SS • What is good/missing in the tool? • What should be done differently? • Please report bugs/incompatibilities

  7. Presenters • Martin Schulz, LLNL • Jim Galarowicz, Krell • Don Maghrak, Krell • David Montoya, LANL • Scott Cranford, Sandia Larger Team: • William Hachfeld, Krell • Samuel Gutierrez, LANL • Joseph Kenny, Sandia NLs • Chris Chambreau, LLNL

  8. Outline • Introduction & Overview • Running a First Experiment • O|SS Sampling • Simple Comparisons • Break (30 minutes) • I/O Tracing Experiments • Parallel Performance Analysis • Installation Requirements and Process • Advanced Capabilities

  9. Section 1 Overview & Terminology Parallel Performance Analysis with Open|SpeedShop

  10. Experiment Workflow • An “Experiment” consists of one or more data “Collectors” applied to the application • The run is controlled through the Process Management Panel • Results are stored in an SQL database • Results can be displayed using several “Views”

  11. High-level Architecture • User interfaces: GUI, CLI, pyO|SS • Experiments and Code Instrumentation layers • Built on Open Source Software • Runs on AMD and Intel based clusters/SSI using Linux

  12. Performance Experiments • Concept of an Experiment • What to measure and analyze? • Experiment chosen by user • Any experiment can be applied to any code • Consists of Collectors and Views • Collectors define specific data sources • Hardware counters • Tracing of certain routines • Views specify data aggregation and presentation • Multiple collectors per experiment possible
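The collector/view split above can be sketched in a few lines of Python. This is a toy model of the concept only; the function names and data are illustrative and are not part of the actual Open|SpeedShop API.

```python
# Toy model of an experiment: collectors produce raw measurements,
# views aggregate and present them. All names here are hypothetical.

def time_collector():
    # A collector yields (function, value) measurements.
    return [("solve", 4.0), ("io_write", 1.0), ("solve", 2.0)]

def summary_view(samples):
    # A view aggregates collector output for presentation,
    # here summing per function and sorting hottest-first.
    totals = {}
    for func, value in samples:
        totals[func] = totals.get(func, 0.0) + value
    return dict(sorted(totals.items(), key=lambda kv: -kv[1]))

experiment = {"collectors": [time_collector], "views": [summary_view]}
samples = [s for c in experiment["collectors"] for s in c()]
print(experiment["views"][0](samples))  # {'solve': 6.0, 'io_write': 1.0}
```

The point is the separation: the same collector output can feed multiple views, and an experiment may combine multiple collectors, as the slide notes.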

  13. Experiment Types in O|SS • Sampling Experiments • Periodically interrupt run and record location • Report statistical distribution of these locations • Typically provides good overview • Overhead mostly low and uniform • Tracing Experiments • Gather and store individual application events, e.g., function invocations (MPI, I/O, …) • Provides detailed, low-level information • Higher overhead, potentially bursty
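The sampling/tracing contrast above can be illustrated with a toy event stream (the events and names are made up for illustration; real sampling interrupts on a timer rather than counting events):

```python
import collections

# "Events" stand in for executed code locations over the run.
events = ["solve"] * 90 + ["io_write"] * 10

def sample(events, interval=10):
    # Sampling: look at the program only every `interval` events and
    # record where it currently is -> a statistical overview, low overhead.
    return collections.Counter(events[::interval])

def trace(events):
    # Tracing: record every single event -> exact and detailed,
    # but proportionally more data and overhead.
    return collections.Counter(events)

print(sample(events))  # Counter({'solve': 9, 'io_write': 1})
print(trace(events))   # Counter({'solve': 90, 'io_write': 10})
```

Note the sampled view preserves the 9:1 time distribution while recording a tenth of the data, which is why sampling is the usual first step.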

  14. Sampling Experiments * Updated • PC Sampling (pcsamp) • Record PC in user defined time intervals • Low overhead overview of time distribution • User Time (usertime) • PC Sampling + Call stacks for each sample • Provides inclusive & exclusive timing data • Hardware Counters (hwc, hwctime) • Sample HWC overflow events • Access to data like cache and TLB misses
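The inclusive/exclusive distinction that usertime adds can be sketched from its raw material, sampled call stacks (a minimal sketch; the stacks and names are illustrative):

```python
# Each sample is a call stack, outermost caller first; sample counts
# stand in for time.
stacks = [
    ["main", "solve", "residual"],
    ["main", "solve", "residual"],
    ["main", "solve"],
    ["main", "io_write"],
]

def exclusive(stacks):
    # Exclusive: samples where the function itself was executing (leaf).
    counts = {}
    for stack in stacks:
        counts[stack[-1]] = counts.get(stack[-1], 0) + 1
    return counts

def inclusive(stacks):
    # Inclusive: samples where the function appears anywhere on the
    # stack, i.e. its own time plus everything it called.
    counts = {}
    for stack in stacks:
        for func in set(stack):
            counts[func] = counts.get(func, 0) + 1
    return counts

print(exclusive(stacks))        # {'residual': 2, 'solve': 1, 'io_write': 1}
print(inclusive(stacks)["solve"])  # 3: its own sample plus two in residual
```

Plain pcsamp can only produce the exclusive column, because it records no stacks.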

  15. Tracing Experiments * Updated • I/O Tracing (io, iot) • Record invocation of all POSIX I/O events • Provides aggregate and individual timings • MPI Tracing (mpi, mpit, mpiotf) • Record invocation of all MPI routines • Provides aggregate and individual timings • Floating Point Exception Tracing (fpe) • Triggered by any FPE caused by the code • Helps pinpoint numerical problem areas
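The mechanism behind these experiments, wrapping a routine so each invocation is recorded with its timing, can be sketched as follows (a hypothetical stand-in, not the actual O|SS wrapper, which intercepts the POSIX/MPI calls themselves):

```python
import functools
import time

trace_log = []  # one record per traced call, like an "iot" event list

def traced(func):
    # Wrap a function so every invocation is logged with its duration,
    # in the spirit of how io/iot wrap the POSIX I/O routines.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            trace_log.append((func.__name__, time.perf_counter() - start))
    return wrapper

@traced
def write_chunk(data):
    return len(data)  # stand-in for an actual write()

write_chunk(b"abc")
write_chunk(b"hello")
print(trace_log)  # two ('write_chunk', duration) records
```

Because every call produces a record, a tight loop over a wrapped routine generates bursts of data, which is the overhead pattern the slide warns about.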

  16. Parallel Experiments • O|SS supports MPI and threaded codes • Tested with a variety of MPI implementations • Thread support based on POSIX threads • OpenMP supported only through POSIX threads • Any experiment can be parallel • Automatically applied to all tasks/threads • Default views aggregate across all tasks/threads • Data from individual tasks/threads available • Specific parallel experiments (e.g., MPI)

  17. High-level Architecture • User interfaces: GUI, CLI, pyO|SS • Code Instrumentation layer • Built on Open Source Software • Runs on AMD and Intel based clusters/SSI using Linux

  18. Code Instrumentation in O|SS • Dynamic Instrumentation through DPCL • Data delivered directly to tool online • Ability to attach to running application

  19. Initial Solution: DPCL [Diagram: O|SS attaches to the MPI application through DPCL; noted drawbacks are a communication bottleneck and OS limitations]

  20. Code Instrumentation in O|SS • Dynamic Instrumentation through DPCL • Data delivered directly to tool online • Ability to attach to running application • Offline/External Data Collection • Instrument application at startup • Write data to raw files and convert to O|SS

  21. Offline Data Collection [Diagram: alongside the DPCL approach, the offline mechanism has the MPI application write raw data that O|SS analyzes post-mortem]

  22. Code Instrumentation in O|SS • Dynamic Instrumentation through DPCL • Data delivered directly to tool online • Ability to attach to running application • Offline/External Data Collection • Instrument application at startup • Write data to raw files and convert to O|SS • Scalable Data Collection with MRNet • Similar to DPCL, but scalable transport layer • Ability for interactive online analysis

  23. Hierarchical Online Collection [Diagram: the offline and DPCL approaches shown next to MRNet, where data from the MPI application flows through a hierarchical transport tree to O|SS]

  24. Code Instrumentation in O|SS • Dynamic Instrumentation through DPCL • Data delivered directly to tool online • Ability to attach to running application • Offline/External Data Collection • Instrument application at startup • Write data to raw files and convert to O|SS • Scalable Data Collection with MRNet • Similar to DPCL, but scalable transport layer • Ability for interactive online analysis

  25. High-level Architecture • User interfaces: GUI, CLI, pyO|SS • Code Instrumentation layer • Built on Open Source Software • Runs on AMD and Intel based clusters/SSI using Linux

  26. Three Interfaces • Experiment Commands: expAttach, expCreate, expDetach, expGo, expView • List Commands: listExp, listHosts, listStatus • Session Commands: setBreak, openGui • Python scripting example:
import openss as oss
my_filename = oss.FileList("myprog.a.out")
my_exptype = oss.ExpTypeList("pcsamp")
my_id = oss.expCreate(my_filename, my_exptype)
oss.expGo()
my_metric_list = oss.MetricList("exclusive")
my_viewtype = oss.ViewTypeList("pcsamp")
result = oss.expView(my_id, my_viewtype, my_metric_list)

  27. Summary • Open|SpeedShop provides comprehensive performance analysis options • Important terminology • Experiments: types of performance analysis • Collectors: data sources • Views: data presentation and aggregation • Sampling vs. Tracing • Sampling: overview data at low overhead • Tracing: details, but at higher cost

  28. Section 2 Running your First Experiment Parallel Performance Analysis with Open|SpeedShop

  29. Running your First Experiment • What do we mean by an experiment? • Running a very basic experiment • What does the command syntax look like? • What are the outputs from the experiment? • Viewing and interpreting gathered measurements • Introduce additional command syntax

  30. What is an Experiment? • The concept of an experiment • Identify the application/executable to profile • Identify the type of performance data that is to be gathered • Together they form what we call an experiment • Application/executable • Doesn’t need to be recompiled, but needs a -g type option to associate gathered data with functions and/or statements • Type of performance data (metric) • Sampling based (program counter, call stack, # of events) • Tracing based (wrap functions and record information)

  31. Basic experiment syntax • openss -offline -f “executable” pcsamp • openss is the command to invoke Open|SpeedShop • -offline indicates the user interface to use (immediate command mode); there are a number of user interface options • -f is the option for specifying the executable name; the “executable” can be a sequential or parallel command • pcsamp indicates what type of performance data (metric) you will gather • Here pcsamp indicates that we will periodically take a sample of the address that the program counter is pointing to, and associate that address with a function and/or source line • There are several existing performance metric choices

  32. What are the outputs? Outputs from: openss -offline -f “executable” pcsamp • Normal program output while the executable is running • The sorted list of performance information • A list of the top time-taking functions • The corresponding sample-derived time for each function • A performance information database file • The database file contains all the information needed to view the data at any time in the future, without the executable(s) • Symbol table information from executable(s) and system libraries • Performance data openss gathered • Time stamps for when dso(s) were loaded and unloaded

  33. Example Run with Output * Updated openss –offline –f “orterun -np 128 sweep3d.mpi” pcsamp

  34. Example Run with Output * Updated openss –offline –f “orterun -np 128 sweep3d.mpi” pcsamp

  35. Using the Database file * Updated • The database file is one of the outputs from running: openss -offline -f “executable” pcsamp • Use this file to view the data • How to open the database file with openss: • openss -f <database file name> • openss (then use menus or wizard to open) • openss -cli, then: exprestore -f <database file name> • In this example, we show both: • openss -f X.0.openss (GUI) • openss -cli -f X.0.openss (CLI) • X.0.openss is the file name openss creates by default

  36. Output from Example Run * NEW Loading the database file: openss –cli –f X.0.openss

  37. Process Management Panel * NEW openss –f X.0.openss: Control your job, focus stats panel, create process subsets

  38. GUI view of gathered data * Updated

  39. Associate Source & Data * Updated

  40. Additional experiment syntax • openss -offline -f “executable” pcsamp • -offline indicates the user interface is immediate command mode; uses the offline (LD_PRELOAD) collection mechanism • openss -cli -f “executable” pcsamp • -cli indicates the user interface is interactive command line; uses the online (dynamic instrumentation) collection mechanism • openss -f “executable” pcsamp • No option indicates the graphical user interface; uses the online (dynamic instrumentation) collection mechanism • openss -batch < input.commands.file • Executes from a file of cli commands

  41. Demo of Basic Experiment Run Program Counter Sampling (pcsamp) on a sequential application.

  42. Section 3 Sampling Experiments Parallel Performance Analysis with Open|SpeedShop

  43. Sampling Experiments • PC Sampling • Approximates CPU Time For Line and Function • No Call Stacks • User Time • Inclusive vs. Exclusive Time • Includes Call stacks • HW Counters • Samples Hardware Counter Overflows

  44. Sampling - Considerations • Sampling: Statistical Subset of All Events • Low Overhead • Low Perturbation • Good to Get Overview / Find Hotspots

  45. Example 1 • Offline usertime Experiment – smg2000 • [samuel@yra084 test]$ openss -cli • Welcome to OpenSpeedShop 1.6 • openss>>RunOfflineExp("mpirun -np 16 smg2000 -n 100 100 100","usertime") • The first argument is the application launch command; the second ("usertime") is the experiment type

  46. Example 1- Views • Default View • Values Aggregated Across All Ranks • Manually Include/Exclude Individual Processes

  47. Example 1 - Views Cont. • Load Balance View • Calculates min, max, and average across ranks, processes, or threads
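The load balance calculation is simple enough to sketch directly (a toy model; the per-rank numbers and function names are made up):

```python
# per_rank[func] = exclusive time for that function on each MPI rank.
per_rank = {
    "solve": [4.0, 6.0, 5.0, 9.0],   # rank 3 is a straggler
    "io_write": [1.0, 1.0, 1.0, 1.0],
}

def load_balance(per_rank):
    # For each function, report (min, max, average) across ranks,
    # as the Load Balance view does.
    view = {}
    for func, times in per_rank.items():
        view[func] = (min(times), max(times), sum(times) / len(times))
    return view

balance = load_balance(per_rank)
print(balance["solve"])  # (4.0, 9.0, 6.0)
```

A large gap between max and average, as for "solve" here, is the signature of load imbalance worth drilling into rank by rank.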

  48. Example 1 - Views Cont. * Updated • Butterfly View: shows the callers and callees of hypre_SMGResidual
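Conceptually, the butterfly view pivots the sampled call paths around one function and collects its direct neighbours. A minimal sketch (the call paths are invented for illustration):

```python
# Call paths from usertime samples, outermost caller first.
paths = [
    ["main", "smg_solve", "hypre_SMGResidual", "matvec"],
    ["main", "cycle", "hypre_SMGResidual", "axpy"],
    ["main", "smg_solve", "hypre_SMGResidual", "matvec"],
]

def butterfly(paths, pivot):
    # Collect the direct callers and callees of the pivot function,
    # as in the Butterfly view centred on hypre_SMGResidual.
    callers, callees = set(), set()
    for path in paths:
        for i, func in enumerate(path):
            if func == pivot:
                if i > 0:
                    callers.add(path[i - 1])
                if i + 1 < len(path):
                    callees.add(path[i + 1])
    return callers, callees

callers, callees = butterfly(paths, "hypre_SMGResidual")
print(sorted(callers))  # ['cycle', 'smg_solve']
print(sorted(callees))  # ['axpy', 'matvec']
```

The real view also attributes sampled time to each caller/callee edge, which is why it needs the stack data that usertime collects.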

  49. Example 1 Cont. * Updated • Source Code Mapping

  50. Example 2 – hwc • Offline hwc Experiment – gamess • openss -offline -f "./rungms tools/fmo/samples/PIEDA 01 1" hwc -OR- • openss -cli • Welcome to OpenSpeedShop 1.6 • openss>>RunOfflineExp("./rungms tools/fmo/samples/PIEDA 01 1","hwc")