Optimizing Program Development for Extreme-Scale Computing Through Efficient Sampling Techniques
This document outlines the benefits of sampling in trace files for program development in extreme-scale computing. Key topics include instrumentation and sampling methodologies, along with their impacts on the granularity of performance data collected from applications. Techniques like folding are discussed to enhance the quality and detail of performance analysis while managing sampling frequency. It highlights current work and results achieved in detailed performance metrics, emphasizing the need for scalability in tracing and visualization tools.
Presentation Transcript
Harald Servat Program Development for Extreme-Scale Computing May 3rd, 2010 Benefits of sampling in tracefiles
Outline • Instrumentation and sampling • Folding • Summarized traces • Some results • Current work
Instrumentation • Performance tools based on instrumentation • Granularity of the results depends on the application structure • Data gathered includes: • Performance counters, callstack, message size…
Sampling • Sampling reaches any application point at a given interval • Easily tunable frequency • Gathers performance counters and callstack
Main objective • Combine both mechanisms • Deeper performance details • Using PAPI_overflow(..) • ... what about the frequency trade-off? • Not so high that it disrupts the performance data • Not so low that it fails to yield useful information
Work done: Folding • Harald Servat, Germán Llort, Judit Giménez, Jesús Labarta: Detailed performance analysis using coarse grain sampling. PROPER, 2009. • Objective: get detailed metrics with few samples • Benefits from both high and low frequencies! • Take advantage of stationary behavior of scientific applications • Build synthetic region from scattered samples • Reintroduce into the tracefile at chosen ratio
Folding: Moving samples Steps • Main idea: Move samples to the target iteration preserving their original relative time.
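The main idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `fold_samples`, its parameters, and the choice to normalize each sample's position by the length of its own iteration are assumptions made here for the example.

```python
from bisect import bisect_right

def fold_samples(iteration_starts, sample_times, target_index=0):
    """Move every sample into the target iteration, keeping its relative time.

    iteration_starts: sorted start times of consecutive iterations; the last
    entry marks the end of the final iteration (hypothetical input layout).
    """
    target_start = iteration_starts[target_index]
    target_len = iteration_starts[target_index + 1] - target_start
    folded = []
    for t in sample_times:
        # Index of the iteration that contains this sample.
        i = bisect_right(iteration_starts, t) - 1
        if i < 0 or i >= len(iteration_starts) - 1:
            continue  # sample falls outside every delimited iteration
        start, end = iteration_starts[i], iteration_starts[i + 1]
        rel = (t - start) / (end - start)  # relative position in [0, 1)
        folded.append(target_start + rel * target_len)
    return sorted(folded)
```

For instance, with iterations starting at 0, 10 and 20 (ending at 30), samples taken at times 2, 15 and 28 land at relative positions 0.2, 0.5 and 0.8 of the first iteration, so three sparse samples from different iterations together describe one synthetic iteration.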
Folding: Interpolation • Evolution of completed instructions for routine copy_faces of NAS MPI BT class B • No instrumentation points within the routine, but we still get details • Red crosses represent the folded samples and show the instructions completed since the start of the routine • The green line is the curve fitted to the folded samples and is used to reintroduce the values into the tracefile • The blue line is the derivative of the fitted curve
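To make the relation between the folded samples, the fitted curve and its derivative concrete, here is a minimal sketch. The slide does not say which curve-fitting method is used, so this example swaps in a much simpler stand-in: averaging the samples per bin and taking finite differences between bins. The function name and the `bins` parameter are hypothetical.

```python
def fit_and_differentiate(folded, bins=4):
    """folded: list of (relative_time in [0, 1), counter_value) pairs.

    Returns (centers, fitted, rates): bin centers, mean counter value per
    bin, and the finite-difference slope between consecutive bins.
    Bin averaging is a stand-in here, not the fitting used on the slide.
    """
    sums = [0.0] * bins
    counts = [0] * bins
    for x, y in folded:
        b = min(int(x * bins), bins - 1)
        sums[b] += y
        counts[b] += 1
    centers, fitted = [], []
    for b in range(bins):
        if counts[b]:  # skip bins that received no samples
            centers.append((b + 0.5) / bins)
            fitted.append(sums[b] / counts[b])
    # Slope of the fitted curve: counter units per unit of relative time.
    rates = [
        (fitted[i + 1] - fitted[i]) / (centers[i + 1] - centers[i])
        for i in range(len(fitted) - 1)
    ]
    return centers, fitted, rates
```

In this sketch `fitted` plays the role of the green line (accumulated instructions) and `rates` the role of the blue line (instruction rate); the more samples are folded, the more bins can be used before the averages become noisy.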
Folding areas • Folding is applied to delimited regions • Previously instrumented • User function • Iteration • Automatically obtained from the gathered results • Clusters of computation bursts • Juan González, Judit Giménez, Jesús Labarta, Automatic detection of parallel applications computation phases, IPDPS 2009 • Delimited time regions • Marc Casas, Rosa M. Badia, Jesús Labarta, Automatic Structure Extraction from MPI Applications Tracefiles, Euro-Par 2007
Impact of the sampling frequency • The more samples are folded, the more detailed the results • Longer executions • Increase frequency • Reach stability? • Example: • NAS BT class B copy_faces • showing from 10 to 200 iterations • 20 samples per second @ SGI Altix
Impact of the sampling frequency • Choosing a sampling frequency is important • Sampling frequency can couple with application frequency • Choose frequencies based on prime factors
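The coupling effect can be illustrated with a small experiment. The sketch below assumes a fixed iteration length and strictly periodic sampling, both simplifications, and reads "based on prime factors" as picking a sampling period that shares no prime factor with the iteration period; the function name and the numbers are made up for the example.

```python
from math import gcd

def distinct_folded_offsets(iteration_period_us, sampling_period_us, n_samples):
    """Count distinct offsets (in microseconds) that periodic samples hit
    once folded into a single iteration of fixed length."""
    return len({(k * sampling_period_us) % iteration_period_us
                for k in range(n_samples)})

# Coupled: a 50 ms sampling period divides a 100 ms iteration, so every
# sample folds onto one of only two offsets.
coupled = distinct_folded_offsets(100_000, 50_000, 1_000)

# Decoupled: 50 021 us shares no prime factor with 100 000 us, so the
# folded samples keep landing on new offsets.
assert gcd(50_021, 100_000) == 1
decoupled = distinct_folded_offsets(100_000, 50_021, 1_000)
```

When the two periods share large factors the folded samples pile up on a few points of the iteration and the rest of the region stays unobserved, no matter how long the run is.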
Outline • Instrumentation and sampling • Folding • Summarized traces • Some results • Current work
Dealing with large scale traces • Jesús Labarta, Judit Giménez, Eloy Martínez, Pedro González, Harald Servat, Germán Llort, Xavier Aguilar: Scalability of tracing and visualization tools, PARCO 2005. • An application's behavior can be divided into: • Communication phases • Intensive computation phases • Instrumentation library that identifies relevant computation phases
Dealing with large scale traces • Information emitted at phase change • Punctual (callstack) • Aggregated • Hardware Counters • Software Counters • Number of point-to-point and collective operations • Number of bytes transferred • Time in MPI
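A rough sketch of the aggregation described on this slide, covering only the software counters: one record is emitted per phase instead of one per event. The event tuple layout, the `Summary` record and the `summarize` function are hypothetical and do not reflect the real instrumentation library or its trace format.

```python
from dataclasses import dataclass

@dataclass
class Summary:
    phase: str                 # "compute" or "mpi"
    duration: float = 0.0      # for "mpi" phases this is the time in MPI
    p2p_calls: int = 0
    collective_calls: int = 0
    bytes_transferred: int = 0

def summarize(events):
    """events: iterable of (phase, kind, duration, nbytes) tuples, where
    kind is "burst", "p2p" or "collective". Returns one Summary per phase."""
    out, current = [], None
    for phase, kind, duration, nbytes in events:
        if current is None or current.phase != phase:
            current = Summary(phase)   # phase change: start a new record
            out.append(current)
        current.duration += duration
        current.bytes_transferred += nbytes
        if kind == "p2p":
            current.p2p_calls += 1
        elif kind == "collective":
            current.collective_calls += 1
    return out
```

Five input events spanning compute, communication and compute again collapse into three records, which is where the reduction in trace size comes from: the number of records follows the number of phases rather than the number of MPI calls.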
Example • PEPC, 16384 tasks on Jaguar • [Figure panels: duration of the computation bursts; number of MPI collective operations]
Benefits of summarized tracefiles • Important trace size reduction • Gadget2 (128 tasks) – 10 Gbytes down to 428 Mbytes • PEPC (16k tasks) – 19 Gbytes down to 400 Mbytes • PFLOTRAN (16k tasks) – more than 250 Gbytes down to 6 Gbytes • Whole execution analysis
Working with large traces? • We're dealing with large scale executions • Maintain scalability of tracing + sampling • By adding more data? • Use folding to reduce data • Example (Gadget2 using 128 tasks) • 100 iterations, 5 samples/s during 90 minutes ~ 236 MB • Folding on 1 iteration @ 200 samples/s ~ 64 MB
Outline • Instrumentation and sampling • Folding • Summarized traces • Combining mechanisms • Some results • Current work
Gadget2 analysis, 128 tasks • [Figure: four code regions delimited by source locations; percentages shown: 32%, 16%, 13%, 8%] • force_tree.c +75 - gravity_tree.c +167 • predict.c +92 - pm_periodic.c +385 • gravity_tree.c +528 - density.c +167 • force_tree.c +1701 - hydra.c +246
PEPC analysis, 32 tasks • [Figure: four code regions delimited by source locations; percentages shown: 45%, 37%, 5%, 3%] • tree_aswalk.f90 +162 - tree_aswalk.f90 +380 • tree_aswalk.f90 +380 - tree_aswalk.f90 +162 • tree_domains.f90 +548 - tree_branches.f90 +155 • tree_branches.f90 +548 - tree_properties.f90 +328
Current directions • We are working on: • Whether there is an optimal sampling frequency • Quantifying correctness and validating the results • Callstack analysis
Thank you!