1 / 26

Profiling S3D on Cray XT3 using TAU

Profiling S3D on Cray XT3 using TAU. Sameer Shende tau-team@cs.uoregon.edu. Acknowledgements. Alan Morris [UO] Kevin Huck [UO] Allen D. Malony [UO] Kenneth Roche [ORNL] Bronis R. de Supinski [LLNL]. TAU Parallel Performance System. http://www.cs.uoregon.edu/research/tau/

syshe
Télécharger la présentation

Profiling S3D on Cray XT3 using TAU

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Profiling S3D on Cray XT3 using TAU Sameer Shende tau-team@cs.uoregon.edu

  2. Acknowledgements • Alan Morris [UO] • Kevin Huck [UO] • Allen D. Malony [UO] • Kenneth Roche [ORNL] • Bronis R. de Supinski [LLNL]

  3. TAU Parallel Performance System • http://www.cs.uoregon.edu/research/tau/ • Multi-level performance instrumentation • Multi-language automatic source instrumentation • Flexible and configurable performance measurement • Widely-ported parallel performance profiling system • Computer system architectures and operating systems • Different programming languages and compilers • Support for multiple parallel programming paradigms • Multi-threading, message passing, mixed-mode, hybrid

  4. TAU Performance System Architecture event selection

  5. TAU Performance System Architecture

  6. Program Database Toolkit (PDT) Application / Library C / C++ parser Fortran parser F77/90/95 Program documentation PDBhtml Application component glue IL IL SILOON C / C++ IL analyzer Fortran IL analyzer C++ / F90/95 interoperability CHASM Program Database Files Automatic source instrumentation TAU_instr DUCTAPE

  7. PAPI • Performance Application Programming Interface • The purpose of the PAPI project is to design, standardize and implement a portable and efficient API to access the hardware performance monitor counters found on most modern microprocessors. • Parallel Tools Consortium project • Developed by University of Tennessee, Knoxville • http://icl.cs.utk.edu/papi/

  8. S3D - Building with TAU • Change name of compiler in build/make.XT3 • ftn=> tau_f90.sh • cc => tau_cc.sh • Set compile time environment variables • setenv TAU_MAKEFILE /spin/proj/perc/TOOLS/tau_latest/xt3/lib/ Makefile.tau-callpath-multiplecounters-mpi-papi-pdt-pgi • Choose callpath, PAPI counters, MPI profiling, PDT for source instrumentation • setenv TAU_OPTIONS ‘-optTauSelectFile=select.tau -optPreProcess’ • Selective instrumentation file eliminates instrumentation in lightweight routines • Pre-process Fortran source code using cpp before compiling • Set runtime environment variables for instrumentation control and event PAPI counter selection in job submission script: • export TAU_THROTTLE=1 • export COUNTER1 GET_TIME_OF_DAY • export COUNTER2 PAPI_FP_INS • export COUNTER3 PAPI_L1_DCM • export COUNTER4 PAPI_RES_STL • export COUNTER5 PAPI_L2_DCM

  9. Selective Instrumentation in TAU % cat select.tau BEGIN_EXCLUDE_LIST MCADIF GETRATES TRANSPORT_M::MCAVIS_NEW MCEDIF MCACON CKYTCP THERMCHEM_M::MIXCP THERMCHEM_M::MIXENTH THERMCHEM_M::GIBBSENRG_ALL_DIMT CKRHOY MCEVAL4 THERMCHEM_M::HIS THERMCHEM_M::CPS THERMCHEM_M::ENTROPY END_EXCLUDE_LIST BEGIN_INSTRUMENT_SECTION loops routine="#" END_INSTRUMENT_SECTION

  10. TAU’s ParaProf Profile Browser - Manager Derived Metrics Flops = PAPI_FP_INS/wallclock time

  11. Main Window - 8 cpus (MPI Ranks 0-7) Some routines execute on different sets of processors

  12. Mean Profile Over 8 cpus -- Exclusive Time

  13. Mean Percentage -- Exclusive Time

  14. Loop Level Profile With PAPI Counter Data

  15. ParaProf’s Source Browser

  16. Exclusive MFLOPS

  17. FP Instructions per L1 Data Cache Miss (rank 0)

  18. Level 1 Data Cache Misses

  19. Callpath Profiles

  20. Callpath Profiles: Flops, Resource Stalls

  21. Callpath Thread Relations Window parent routine children

  22. Flat Profile

  23. TAU’s ParaProf Profile Browser - Manager Different sections of code within the same routine execute on odd and even processors!

  24. 3D Window: Rank, Routine, Time, Instructions

  25. 3D Window: Variations in FP/L1 DCM ratios

  26. Getting Access to TAU on Jaguar • set path=(/spin/proj/perc/TOOLS/tau_latest/x86_64/bin $path) • Choose Stub Makefiles (TAU_MAKEFILE env. var.) from /spin/proj/perc/TOOLS/tau_latest/xt3/lib/Makefile.* • Makefile.tau-mpi-pdt-pgi (flat profile) • Makefile.tau-mpi-pdt-pgi-trace (event trace, for use with Vampir) • Makefile.tau-callpath-mpi-pdt-pgi (single metric, callpath profile) • Binaries of S3D can be found in: • ~sameer/scratch/S3D-BINARIES • withtau • papi, multiplecounters, mpi, pdt, pgi options • without_tau

More Related