Performance Analysis, Tools and Optimization

Performance Analysis, Tools and Optimization • Philip J. Mucci, Kevin S. London • University of Tennessee, Knoxville • ARL MSRC Users’ Group Meeting • September 2, 1998

PET, UT and You • Training • Environments • Benchmarking • Evaluation and Reviews • Consulting • Development

Training • Courses on Benchmarking, Performance Optimization, Parallel Tools • Provides good mechanism for technology transfer • Develop needs and direction from the interaction with the user community • Tremendous knowledge base from which to draw

Environments • Use of the MSRC environments provides • Bug reports to the vendor • System tuning • System administrator support • Analysis of software needs • Performance evaluation • Researchers access to advanced hardware

Performance Understanding • In order to optimize we must understand • Why is our code performing a certain way? • What can be done about it? • How well can we do? • Results in confidence, efficiency and better code development • Time spent is an investment in the future

Tool Evaluation: Ptools Consortium • Review of available performance tools, particularly parallel • Regular reports are issued • Tools that we find useful get presented to the developers in training or consultation • Installation, testing and training • Example: VAMPIR for scalability analysis

Optimization Course • Course focuses on compiler options, available tools and single processor performance • Single processor performance is the biggest bottleneck in many codes, especially cache performance • Why? Link speeds have increased to within an order of magnitude of memory bandwidths • Also, MPI and language specific issues

Benchmarks • CacheBench - performance of the memory hierarchy • MPBench - performance of core MPI operations • BLASBench - performance of dense numerical kernels • Intended to provide an orthogonal set of low-level benchmarks with which we can parameterize codes

Cache Performance

Cache Performance • Tuning for caches is difficult without some understanding of computer architecture • No way to really know what’s in the cache at a given point in an application • Factor of 2-4 performance increase is common • Develop a tool to help identify problem regions in the source code, down to a specific reference

Cache Simulator • Profiling the code reveals cache problems • Automated instrumentation of offending routines via a GUI or by hand • Link with simulator library • Make architecture configuration file • Addresses are traced and simulated • Miss locations are recorded and reports are generated
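The trace-and-simulate idea on this slide can be illustrated with a minimal sketch. This is not the simulator described in the talk: it assumes a toy direct-mapped cache, and the class name `DirectMappedCache`, the helper `trace_array_walk`, the cache geometry (1 KB, 32-byte lines) and the site labels are all hypothetical choices made for illustration.

```python
from collections import Counter


class DirectMappedCache:
    """Toy direct-mapped cache (hypothetical model, not the talk's simulator)."""

    def __init__(self, size_bytes=1024, line_bytes=32):
        self.line_bytes = line_bytes
        self.num_lines = size_bytes // line_bytes
        self.tags = [None] * self.num_lines  # one tag per cache line
        self.hits = 0
        self.misses = 0
        self.miss_sites = Counter()  # source-site label -> miss count

    def access(self, address, site):
        """Simulate one memory reference and record where misses occur."""
        block = address // self.line_bytes
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.miss_sites[site] += 1
            self.tags[index] = tag


def trace_array_walk(cache, base, count, stride_elems, site, elem_bytes=8):
    """Feed the addresses of a strided walk over 8-byte elements to the cache."""
    for i in range(count):
        cache.access(base + i * stride_elems * elem_bytes, site)


if __name__ == "__main__":
    unit = DirectMappedCache()
    trace_array_walk(unit, base=0, count=256, stride_elems=1, site="unit-stride loop")
    strided = DirectMappedCache()
    trace_array_walk(strided, base=0, count=256, stride_elems=4, site="stride-4 loop")
    for cache in (unit, strided):
        print(dict(cache.miss_sites), "hits:", cache.hits, "misses:", cache.misses)
```

With these assumed parameters, four 8-byte elements share one line, so the unit-stride walk misses once per line while the stride-4 walk touches a new line on every reference. A real tool would take the geometry from the architecture configuration file and the addresses from instrumented code.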

PerfAPI • A standardized interface to hardware performance counters • Easily usable by application engineers as well as tool developers • Intended for • Performance tools • Evaluation • Modeling • Watch http://www.cs.utk.edu/~mucci/pdsa

High Performance Debugger • Industry wide lack of good debugging support for parallel programs • TotalView is expensive and GUI only • Bandwidth is often not available off-site • Based on dbx and gdb as backends • Uses p2d2 from NASA as a framework • Standardized, familiar command-line interface

MPI Connect • Connects separate MPI jobs with PVM • 3 function calls to enroll • Uses include • Metacomputing with Vendor MPI • Dynamic and Fault Tolerant MPI jobs now

The Future • BYOC Workshops • Regular Training Schedule • Web Based Training • Consulting • Cross-MSRC Information Exchange • Technology Transfer • Tool development

Origin 2000 Performance Prescription • Always use dplace on all codes • Always use -LNO:cache_size2=4096 • For accuracy compile and link with -O2 -IPA -SWP:=ON -LNO -TENV:X=0-5 • or -Ofast=ip27 -OPT:roundoff=0-3 -OPT:IEEE_arithmetic=1-3

Origin 2000 Performance Prescription • In Fortran, the first (leftmost) array index should change fastest, i.e. in the innermost loop • Use functions in -lcomplib.sgimath or -lscs -lfastm -lm • Use MPI_Ixxxx primitives • Always execute IRECV early
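The Fortran index-ordering advice above follows from column-major storage, which a short sketch can make concrete. The code below is in Python purely for illustration; the function names `column_major_offset` and `access_order` are hypothetical, and offsets are 0-based element counts rather than byte addresses.

```python
def column_major_offset(i, j, n_rows):
    """0-based element offset of a(i, j) in Fortran-style column-major storage."""
    return i + j * n_rows


def access_order(n_rows, n_cols, first_index_innermost):
    """List the memory offsets touched by a doubly nested loop over a 2-D array.

    first_index_innermost=True  -> outer loop over j, inner loop over i
    first_index_innermost=False -> outer loop over i, inner loop over j
    """
    if first_index_innermost:
        return [column_major_offset(i, j, n_rows)
                for j in range(n_cols) for i in range(n_rows)]
    return [column_major_offset(i, j, n_rows)
            for i in range(n_rows) for j in range(n_cols)]


if __name__ == "__main__":
    print("first index innermost:", access_order(3, 4, True))
    print("second index innermost:", access_order(3, 4, False))
```

When the first index varies in the inner loop, consecutive references are adjacent in memory; otherwise each reference jumps by `n_rows` elements, which for large arrays means a new cache line on nearly every access.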

Vampir Timeline Display

Vampir Global Activity Chart

Identifying a Message in Vampir

Nupshot Display
