
Parallel Performance Analysis with Open|SpeedShop Seminar @ NASA

This seminar introduces Open|SpeedShop, covering basic concepts, terminology, running examples, and an overview of features like sampling and tracing. It also discusses parallel performance analysis and the status and roadmap of the tool.


Presentation Transcript


  1. Parallel Performance Analysis with Open|SpeedShop • Seminar @ NASA • NASA Ames Research Center • October 29, 2008

  2. Presenters and Partners • Presenters: Jim Galarowicz, Krell • Don Maghrak, Krell • Larger Team: Martin Schulz, LLNL • David Montoya, LANL • Scott Cranford, Sandia NLs • William Hachfeld, Krell • Samuel Gutierrez, LANL • Joseph Kenny, Sandia NLs • Chris Chambreau, LLNL • Partner institutions: University of Wisconsin • University of Maryland • Rice University

  3. Seminar Goals • Introduce Open|SpeedShop • Basic concepts, terminology, modes of operation • Running first examples • Provide Overview of Features • Sampling & Tracing in O|SS • Performance comparisons • Parallel performance analysis • Status and Roadmap

  4. Highlights • Open Source Performance Analysis Tool Framework • Most common performance analysis steps in one tool • Extensible by using plugins for data collection and representation • Profiling (sampling) and Tracing (wrapping functions) • Multiple Instrumentation Options • All work on unmodified application binaries • Need -g, but can be combined with -O3, -O2, etc., in order to map to source lines • Offline data collection: run program start to end • Online data collection with ability to attach to running applications; start and stop data collection

  5. Highlights • Flexible and Easy to use • User access through: • Graphical User Interface (GUI) • Interactive Command Line • Python Scripting API • Large Range of Platforms • Linux Clusters/SSI with x86, IA-64, Opteron, and EM64T CPUs • New: more portable offline data collection mechanism • Availability • Full source available on sourceforge.net • Release tarballs on sourceforge.net

  6. O|SS Target Audience • Programmers/code teams • Use Open|SpeedShop out of the box • Powerful performance analysis • Ability to integrate O|SS into projects • Tool developers • Single, comprehensive infrastructure • Easy deployment of new tools • Project/product specific customizations • Predefined/custom experiments

  7. Performance Experiments • Concept of an Experiment • What program to analyze • What type of performance data to gather • How often the performance data is gathered • Consists of Collectors and Views • Collectors define specific type of performance data • Hardware counters, program counter samples • Tracing of certain routines (I/O, MPI) • Views specify data aggregation and presentation • Multiple collectors per experiment possible

  8. Experiment Workflow (diagram) • Application • "Experiment": consists of one or more data "Collectors" • Run: Process Management Panel • Results: can be displayed using several "Views"; stored in SQL database

  9. Experiment Types in O|SS • Sampling Experiments • Periodically interrupt run and record location • Report statistical distribution of these locations • Typically provides good overview • Overhead mostly low and uniform • Tracing Experiments • Gather and store individual application events, e.g., function invocations (MPI, I/O, …) • Provides detailed, low-level information • Higher overhead, potentially bursty

  10. Sampling Experiments • PC Sampling (pcsamp) • Record PC in user defined time intervals • Low overhead overview of time distribution • User Time (usertime) • PC Sampling + Call stacks for each sample • Provides inclusive & exclusive timing data • Hardware Counters (hwc, hwctime) • Sample HWC overflow events • Access to data like cache and TLB misses
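
The difference between pcsamp-style exclusive time and usertime-style inclusive time can be illustrated with a small sketch. It is not O|SS code: the call stacks, the sampling interval, and the helper names `exclusive_time` and `inclusive_time` are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical samples: each is a call stack, outermost frame first.
# The last entry is the function the program counter was in.
SAMPLE_INTERVAL_S = 0.01  # assumed sampling period, in seconds
samples = [
    ("main", "work", "f3"),
    ("main", "work", "f3"),
    ("main", "work", "f2"),
    ("main", "work", "f1"),
    ("main", "work"),
]

def exclusive_time(stacks, interval):
    """pcsamp-style: charge each sample to the innermost function only."""
    counts = Counter(stack[-1] for stack in stacks)
    return {fn: n * interval for fn, n in counts.items()}

def inclusive_time(stacks, interval):
    """usertime-style: charge each sample to every function on the stack."""
    counts = Counter()
    for stack in stacks:
        counts.update(set(stack))  # set() avoids double-counting recursion
    return {fn: n * interval for fn, n in counts.items()}

excl = exclusive_time(samples, SAMPLE_INTERVAL_S)
incl = inclusive_time(samples, SAMPLE_INTERVAL_S)
```

Here `main` never appears in `excl` because no sample landed directly in it, yet its inclusive time covers every sample, which is the same pattern the usertime output shows for callers such as `main`.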

  11. Tracing Experiments • I/O Tracing (io, iot) • Record invocation of all POSIX I/O events • Provides aggregate and individual timings • MPI Tracing (mpi, mpit, mpiotf) • Record invocation of all MPI routines • Provides aggregate and individual timings • Floating Point Exception Tracing (fpe) • Triggered by any FPE caused by the code • Helps pinpoint numerical problem areas
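
Tracing by wrapping functions can be sketched in plain Python. This is only an analogy for what a tracing collector does: `traced`, `trace_log`, and `fake_write` are hypothetical names, and the real O|SS collectors wrap POSIX I/O or MPI routines in the application binary rather than Python functions.

```python
import functools
import time

trace_log = []  # one (name, duration_seconds) record per call

def traced(fn):
    """Wrap fn so every invocation is timed and recorded."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            trace_log.append((fn.__name__, time.perf_counter() - start))
    return wrapper

@traced
def fake_write(data):
    # Stand-in for an I/O or MPI routine; not a real O|SS hook.
    return len(data)

for chunk in (b"abc", b"defgh"):
    fake_write(chunk)

# Aggregate view: call count and total time per function.
aggregate = {}
for name, duration in trace_log:
    calls, total = aggregate.get(name, (0, 0.0))
    aggregate[name] = (calls + 1, total + duration)
```

The per-call records in `trace_log` correspond to the individual timings, while `aggregate` corresponds to the aggregate timings. Storing one record per event is also why tracing costs more than sampling.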

  12. Parallel Experiments • O|SS supports MPI and threaded codes • Tested with a variety of MPI implementations • Thread support based on POSIX threads • Any collector can be applied to parallel job • Automatically applied to all tasks/threads • Default views aggregate across all tasks/threads • Data from individual tasks/threads available • Specific parallel experiments (e.g., mpi, mpit)

  13. High-level Architecture (diagram) • GUI • CLI • pyO|SS • CLI • Code Instrumentation • Open Source Software • AMD and Intel based clusters/SSI using Linux

  14. Code Instrumentation in O|SS • Offline/External Data Collection • Instrument application at start-up • Write data to raw files and convert to O|SS • Performance data available at end of execution. • Online Scalable Data Collection via MRNet • Scalable transport layer • Performance data delivered directly to tool online • Ability for interactive online analysis and viewing intermediate results

  15. Offline & Online Data Collection (diagram) • Offline: MPI Application, post-mortem, O|SS • MRNet: MPI Application, O|SS

  16. High-level Architecture (diagram) • GUI • CLI • pyO|SS • CLI • Code Instrumentation • Open Source Software • AMD and Intel based clusters/SSI using Linux

  17. Three Interfaces (GUI, CLI, Python) • Experiment Commands: expAttach, expCreate, expDetach, expGo, expView • List Commands: list -v exp, list -v hosts, list -v status • Session Commands: setBreak, openGui • Python: import openss as oss; my_filename = oss.FileList("myprog.a.out"); my_exptype = oss.ExpTypeList("pcsamp"); my_id = oss.expCreate(my_filename, my_exptype); oss.expGo(); my_metric_list = oss.MetricList("exclusive"); my_viewtype = oss.ViewTypeList("pcsamp"); result = oss.expView(my_id, my_viewtype, my_metric_list)

  18. Running an Experiment • Running a simple example experiment • Examine the command syntax • List the outputs from the experiment • Viewing and interpreting gathered measurements • GUI, CLI via the experiment database file • Show "-offline" example in more detail • Introduce additional command syntax

  19. Basic offline experiment syntax • openss -offline -f "executable" pcsamp • openss is the command to invoke Open|SpeedShop • -offline indicates the user interface to use (immediate command) • There are a number of user interface options • -f is the option for specifying the executable name • The "executable" can be a sequential or parallel command • pcsamp indicates what type of performance data (metric) you will gather • Here pcsamp indicates that we will periodically take a sample of the address that the program counter is pointing to. We will associate that address with a function and/or source line. • There are several existing performance metric choices

  20. What are the outputs? • Outputs from: openss -offline -f "executable" pcsamp • Normal program output while executable is running • The sorted list of performance information: a list of the top time-taking functions and the corresponding sample-derived time for each function • A performance information database file • The database file contains all the information needed to view the data at any time in the future without the executable(s): symbol table information from executable(s) and system libraries, performance data openss gathered, and time stamps for when dso(s) were loaded and unloaded

  21. Example Parallel Run with Output • openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp
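
When such runs are launched from a script, the quoting of the parallel launcher command is easy to get wrong. The sketch below only assembles the command shown on the slide. The helper name `offline_command` is hypothetical, and nothing is executed, so openss does not need to be installed to try it.

```python
import shlex

def offline_command(target, experiment="pcsamp"):
    """Return the argv list for: openss -offline -f "<target>" <experiment>."""
    # The whole launcher command is one argument to -f, as on the slide.
    return ["openss", "-offline", "-f", target, experiment]

argv = offline_command("orterun -np 128 sweep3d.mpi")

# Shell-safe rendering of the same command, e.g. for logging.
command_line = " ".join(shlex.quote(arg) for arg in argv)

# To actually launch it (requires openss on PATH):
#   import subprocess; subprocess.run(argv, check=True)
```

Passing `argv` as a list avoids a second round of shell parsing, so the launcher string stays a single argument to `-f`.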

  22. Output from Example Run • openss -offline -f "orterun -np 128 sweep3d.mpi" pcsamp

  23. Using the Database file • Database file is one of the outputs from running: openss -offline -f "executable" pcsamp • Use this file to view the data • How to open the database file with openss: • openss -f <database file name> • openss (then use menus or wizard to open) • openss -cli (then exprestore -f <database file name>) • In this example, we show both: openss -cli -f X.0.openss (CLI) and openss -f X.0.openss (GUI) • X.0.openss is the file name openss creates by default

  24. Output from Example Run • Loading the database file: openss -cli -f X.0.openss

  25. Process Management Panel • Control your job, focus stats panel, create process subsets

  26. Default Stats Panel View • openss -f X.0.openss: performance statistics by function is the default view

  27. Results map to Source • Split screen mapping of performance data to source line

  28. Min, Max, Average (Load Balance) View • Select "LB" in Toolbar to generate Load Balance View
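
The idea behind a min/max/average view can be sketched in a few lines. The per-rank times below are invented for illustration, and `load_balance` is a hypothetical helper, not how O|SS computes its view.

```python
from statistics import mean

# Hypothetical exclusive time (seconds) of one function on each MPI rank.
time_by_rank = {0: 4.0, 1: 5.0, 2: 9.0, 3: 6.0}

def load_balance(times):
    """Return min, max, and average time plus the ranks holding the extremes."""
    min_rank = min(times, key=times.get)
    max_rank = max(times, key=times.get)
    return {
        "min": times[min_rank], "min_rank": min_rank,
        "max": times[max_rank], "max_rank": max_rank,
        "avg": mean(times.values()),
    }

summary = load_balance(time_by_rank)

# A max far above the average points at an overloaded rank.
imbalance = summary["max"] / summary["avg"]
```

Reporting the rank that holds each extreme, not just the value, is what makes such a summary actionable: it says which process to inspect next.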

  29. Comparative Analysis: Clustering Ranks Select “CA” in Toolbar to generate Comp. Analysis View

  30. Comparative Analysis: Clustering Ranks Select “CA” in Toolbar to generate Comp. Analysis View

  31. Additional experiment syntax • openss -offline -f "executable" pcsamp: -offline indicates the user interface is immediate command mode. Uses offline (LD_PRELOAD) collection mechanism. • openss -cli -f "executable" pcsamp: -cli indicates the user interface is the interactive command line. Uses online (dynamic instrumentation) collection mechanism. • openss -f "executable" pcsamp: no interface option indicates the graphical user interface. Uses online (dynamic instrumentation) collection mechanism. • openss -batch < input.commands.file: executes from a file of CLI commands

  32. Wizard Panel – page 1 • Analyze and/or compare existing data from previous runs • Gather data from new runs • O|SS Command Line Interface

  33. Wizard Panel – Gather new data Select type of data to be gathered by Open|SpeedShop

  34. Compare Wizard • Side by side performance results

  35. Compare Wizard • Side by Side Source for the two versions

  36. Comparing MPI Ranks Rank 0 Rank 1

  37. CLI Language • An interactive Command Line Interface • gdb/dbx-like processing • Several interactive commands • Create Experiments • Provide Process/Thread Control • View Experiment Results • Where possible, commands execute asynchronously • http://www.openspeedshop.org/docs/cli_doc/

  38. CLI Command Overview • Experiment Creation: expcreate, expattach • Experiment Control: expgo, expwait, expdisable, expenable • Experiment Storage: expsave, exprestore • Result Presentation: expview, opengui • Misc. Commands: help, list, log, record, playback, history, quit

  39. User-Time Example • Create experiments and load application, then start application:

         lnx-jeg.americas.sgi.com-17>openss -cli
         openss>>Welcome to OpenSpeedShop 1.9
         openss>>expcreate -f test/executables/fred/fred usertime
         The new focused experiment identifier is:  -x 1
         openss>>expgo
         Start asynchronous execution of experiment:  -x 1
         openss>>Experiment 1 has terminated.

  40. Showing CLI Results • openss>>expview

         Excl CPU time  Inclu CPU time  % of Total Exclusive  Function
         in seconds     in seconds      CPU Time              (defining location)
         5.2571         5.2571          49.7297               f3 (fred: f3.c,2)
         3.3429         3.3429          31.6216               f2 (fred: f2.c,2)
         1.9714         1.9714          18.6486               f1 (fred: f1.c,2)
         0.0000         10.5714         0.0000                __libc_start_main (libc.so.6)
         0.0000         10.5714         0.0000                _start (fred)
         0.0000         10.5429         0.0000                work (fred: work.c,2)
         0.0000         10.5714         0.0000                main (fred: fred.c,5)
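
The relationship between the columns can be checked by hand. The sketch below copies the three non-zero exclusive times from the expview output on slide 40 and recomputes the percentage column; because the displayed times are rounded to four decimals, the recomputed percentages agree with the slide only to about two decimals.

```python
# Exclusive CPU times (seconds) copied from the expview output on slide 40.
exclusive = {"f3": 5.2571, "f2": 3.3429, "f1": 1.9714}

# Total exclusive time across all functions with non-zero samples.
total = sum(exclusive.values())

# Share of total exclusive CPU time per function, in percent.
percent = {fn: 100.0 * t / total for fn, t in exclusive.items()}
```

The total comes out at about 10.5714 seconds, the same value reported as inclusive time for `main`, `_start`, and `__libc_start_main`: every sample was taken somewhere beneath those frames.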

  41. CLI Batch Scripting (1) • Create batch file with CLI commands • Plain text file • Example:

         # Create batch file
         echo expcreate -f fred pcsamp >> input.script
         echo expgo >> input.script
         echo expview pcsamp10 >> input.script

         # Run OpenSpeedShop
         openss -batch < input.script
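
The same batch file can be generated from Python, which helps when the executable or experiment type varies between runs. This is a sketch: `write_batch_script` and its parameters are hypothetical names, and the file is written to a temporary directory so the example has no side effects in the working directory.

```python
import os
import tempfile

def write_batch_script(path, executable, experiment="pcsamp", view="pcsamp10"):
    """Write the three CLI commands from the slide to a plain-text file."""
    commands = [
        f"expcreate -f {executable} {experiment}",
        "expgo",
        f"expview {view}",
    ]
    # "w" truncates, so reruns do not accumulate duplicate commands.
    with open(path, "w") as handle:
        handle.write("\n".join(commands) + "\n")
    return commands

script_path = os.path.join(tempfile.mkdtemp(), "input.script")
commands = write_batch_script(script_path, "fred")

# Then run: openss -batch < input.script
```

Unlike the `echo ... >>` lines, which append, this version overwrites the file each time, so a stale `input.script` cannot leave earlier commands in front of the new ones.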

  42. CLI Batch Scripting (2) • Open|SpeedShop Batch Example Results

         The new focused experiment identifier is:  -x 1
         Start asynchronous execution of experiment:  -x 1
         Experiment 1 has terminated.
         CPU Time   Function (defining location)
         24.2700    f3 (mutatee: mutatee.c,24)
         16.0000    f2 (mutatee: mutatee.c,15)
         8.9400     f1 (mutatee: mutatee.c,6)
         0.0200     work (mutatee: mutatee.c,33)

  43. CLI Batch Scripting (3) • Open|SpeedShop Batch Example: direct

         # Run Open|SpeedShop as a single non-interactive command
         openss -batch -f fred pcsamp
         The new focused experiment identifier is:  -x 1
         Start asynchronous execution of experiment:  -x 1
         Experiment 1 has terminated.
         CPU Time   Function (defining location)
         24.2700    f3 (mutatee: mutatee.c,24)
         16.0000    f2 (mutatee: mutatee.c,15)
         8.9400     f1 (mutatee: mutatee.c,6)
         0.0200     work (mutatee: mutatee.c,33)

  44. Python Scripting • Open|SpeedShop Python API that executes the "same" interactive/batch Open|SpeedShop commands • User can intersperse "normal" Python code with the Open|SpeedShop Python API • Run Open|SpeedShop experiments via the Open|SpeedShop Python API

  45. Python Example (1) • Necessary steps: • Import O|SS Python module • Prepare arguments for target application • Set view and experiment type • Create experiment

         import openss
         my_filename = openss.FileList("usability/phaseII/fred")
         my_viewtype = openss.ViewTypeList()
         my_viewtype += "pcsamp"
         exp1 = openss.expCreate(my_filename, my_viewtype)

  46. Python Example (2) • After experiment creation: • Start target application (asynchronous!) • Wait for completion • Write results

         try:
             openss.expGo()
             openss.wait()
         except openss.error:
             print "expGo(exp1,my_modifer) failed"
         openss.dumpView()

  47. Python Example Output • Two interfaces to dump data • Plain text (similar to CLI) for viewing • As Python objects for post-processing

         >python example.py
         /work/jeg/OpenSpeedShop/usability/phaseII/fred: successfully completed.
         Excl. CPU time  % of CPU Time  Function (def. location)
         4.6700          47.7994        f3 (fred: f3.c,23)
         3.5100          35.9263        f2 (fred: f2.c,2)
         1.5900          16.2743        f1 (fred: f1.c,2)

  48. Extensibility • O|SS is more than a performance tool • All functionality in one toolset with one interface • General infrastructure to create new tools • Plugins to add new functionality • Cover all essential steps of performance analysis • Automatically loaded at O|SS startup • Three types of plugins • Collectors: How to acquire performance data? • Views: How to aggregate and present data? • Panels: How to visualize data in the GUI?
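
The split between collectors and views can be pictured as a small registry. Everything in this sketch is hypothetical (the `PLUGINS` table, the `register` decorator, the stand-in sample data) and it does not reflect the real O|SS plugin interface; it only illustrates how separating acquisition from presentation lets either side be replaced independently.

```python
# Hypothetical registry; names do not correspond to the real O|SS plugin API.
PLUGINS = {"collector": {}, "view": {}}

def register(kind, name):
    """Decorator that files a function under PLUGINS[kind][name]."""
    def decorator(fn):
        PLUGINS[kind][name] = fn
        return fn
    return decorator

@register("collector", "pcsamp")
def collect_pc_samples(program):
    # Stand-in data: a real collector would sample the running program.
    return ["f3", "f3", "f2", "f1"]

@register("view", "pcsamp")
def view_by_function(samples):
    # Aggregate samples per function, most frequently sampled first.
    counts = {}
    for fn in samples:
        counts[fn] = counts.get(fn, 0) + 1
    return sorted(counts.items(), key=lambda item: -item[1])

def run_experiment(program, exp_type):
    """Look up the collector and view by name and chain them."""
    data = PLUGINS["collector"][exp_type](program)
    return PLUGINS["view"][exp_type](data)

report = run_experiment("fred", "pcsamp")
```

Adding a new experiment type in this toy model means registering one more collector and one more view, with no change to `run_experiment`.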

  49. Overview Summary • Two techniques for instrumentation • Online vs. Offline • Different strengths for different target scenarios • Flexible GUI that can be customized • Several compatible scripting options • Command Line Language • Direct batch interface • Integration of O|SS into Python • GUI and scripting interoperable • Plugin concept to extend Open|SpeedShop

  50. Status & Future Plans • Open|SpeedShop 1.9 available shortly • Packages and source from sourceforge.net • Tested on a variety of platforms • Offline version featured in version 1.9 • Online (MRNet) work in progress • Target is version 2.0 in December • Working on some platforms but not all • Focus on Scalability in coming months • Support for capability machines via Office of Science proposal with ASC assistance
