
Application Performance Profiling and Prediction in Grid Environment

Presented by: Marlon Bright. 1 August 2008. Advisor: Masoud Sadjadi, Ph.D. REU – Florida International University.


Presentation Transcript


  1. Application Performance Profiling and Prediction in Grid Environment
     Presented by: Marlon Bright
     1 August 2008
     Advisor: Masoud Sadjadi, Ph.D.
     REU – Florida International University

  2. Outline
     • Grid Enablement of Weather Research and Forecasting Code (WRF)
     • Profiling and Prediction Tools
     • Research Goals
     • Project Timeline
     • Project Status
     • Challenges Overcome
     • Remaining Work

  3. Motivation: Weather Research and Forecasting Code (WRF)
     • Goal: Improved Weather Prediction
        • Accurate and Timely Results
        • Precise Location Information
     • WRF Status
        • Over 160,000 lines (mostly FORTRAN and C)
        • Single Machine/Cluster compatible
        • Single Domain
        • Fine Resolution -> Resource Requirements
     • How to Overcome this?
        • Through Grid Enablement
     • Expected Benefits to WRF
        • More available resources, Different Domains
        • Faster results
        • Improved Accuracy

  4. System Overview
     • Web-Based Portal
     • Grid Middleware (Plumbing)
        • Job-Flow Management
        • Meta-Scheduling
        • Performance Prediction
        • Profiling and Benchmarking
     • Development Tools and Environments
        • Transparent Grid Enablement (TGE)
           • TRAP: Static and Dynamic adaptation of programs
           • TRAP/BPEL, TRAP/J, TRAP.NET, etc.
           • GRID superscalar: Programming Paradigm for parallelizing a sequential application dynamically in a Computational Grid

  5. Performance Prediction
     An IMPORTANT part of Meta-Scheduling. Allows for:
     • Optimal usage of grid resources through "smarter" meta-scheduling
        • Many users overestimate job requirements
        • Reduced idle time for compute resources
        • Could save costs and energy
     • Optimal resource selection for most expedient job return time
     Tools: Amon / Aprof and Paraver / Dimemas

  6. Research Goals
     • Extend Amon/Aprof research to a larger number of nodes, a different architecture, and a different version of WRF (Version 2.2.1).
     • Compare and contrast Aprof predictions with Dimemas predictions in terms of accuracy and prediction computation time.
     • Analyze if and how Amon/Aprof could be used in conjunction with Dimemas/Paraver for optimized application performance prediction and, ultimately, meta-scheduling.

  7. Timeline
     • End of June:
        • Get MPItrace linking properly with the WRF version compiled on GCB, then Mind [COMPLETE]
        • a) Install Amon and Aprof on MareNostrum and ensure proper functioning [AMON COMPLETE; APROF FINAL STAGES]
        • b) Run Amon benchmarks on MareNostrum [COMPLETE]
     • Early/Mid July:
        • Use and analyze Aprof predictions within MareNostrum (and possibly between MareNostrum, GCB, and Mind) [MN COMPLETE]
        • Use generated MPI/OpenMP tracefiles (Paraver/Dimemas) to predict within (and possibly between) Mind, GCB, and MareNostrum [IN PROGRESS]
     • Late July/Early August:
        • Experiment with how well Amon and Aprof relate to, or could possibly be combined with, Dimemas [IN PROGRESS]
        • Compose paper presenting significant findings [IN PROGRESS]
        • Analyze how findings relate to the bigger picture. Make optimizations on grid-enablement of WRF.

  8. The Tools: Amon / Aprof and Dimemas / Paraver

  9. Amon / Aprof
     • Amon: monitoring program that runs on each compute node, recording new processes
     • Aprof: regression analysis program running on the head node; receives input from Amon to make execution time predictions (within a cluster and between clusters)
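
To make the idea of regression-based runtime prediction concrete, here is a minimal sketch of an ordinary least-squares fit of elapsed time against the inverse of the CPU count. It is not Aprof's implementation: the slides do not give Aprof's model, so the single explanatory variable, the function names `fit_inverse_cpu_model` and `predict`, and the sample data are all assumptions made for illustration.

```python
def fit_inverse_cpu_model(samples):
    """Least-squares fit of elapsed = a + b * (1 / cpus).

    samples: sequence of (cpus, elapsed_time) pairs.
    The model form is an assumption for illustration, not Aprof's actual model.
    """
    samples = list(samples)
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [1.0 / cpus for cpus, _ in samples]
    ys = [float(elapsed) for _, elapsed in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must cover at least two different CPU counts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope: work that scales with 1/cpus
    a = mean_y - b * mean_x  # intercept: time that does not shrink with more CPUs
    return a, b


def predict(a, b, cpus):
    """Predicted elapsed time for a given CPU count under the fitted model."""
    return a + b / cpus


# Synthetic (hypothetical) measurements that follow elapsed = 10 + 100 / cpus.
a, b = fit_inverse_cpu_model([(1, 110.0), (2, 60.0), (4, 35.0)])
print(round(a, 6), round(b, 6), round(predict(a, b, 5), 6))
```

A fit of this shape can only extrapolate sensibly if the measured runs span the range of CPU counts of interest, which is one reason a regression approach needs several base executions.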

  10. Amon / Aprof Monitoring and Prediction

  11. Amon / Aprof Approach to Modeling Resource Usage
      [Diagram labels: WRF; Network Latency; CPU Speed; Hard Disk I/O; Number of Nodes; Network Bandwidth; FSB Bandwidth; RAM Size; L2 Cache; Application Resource Usage Model]

  12. Previous Findings for Amon / Aprof
      • Experiments were performed on two clusters at FIU: Mind (16 nodes) and GCB (8 nodes)
      • Experiments were run to predict for different numbers of nodes and CPU loads (i.e. 2, 3, …, 14, 15 and 20%, 30%, …, 90%, 100%)
      • Aprof predictions were within 10% error of the actual recorded runtimes, both within Mind and GCB and between Mind and GCB
      • Conclusion: the first-step assumption was valid. -> Move on to extending the research to a higher number of nodes.

  13. How'd they do that?
      • Developed a benchmarking script that edits and submits a job file to the MareNostrum (MN) scheduler
         • Runs for each number of nodes (8, 16, 32, 64, 96, 128)
         • Runs for each CPU percentage (100, 75, 50, 25)
         • Records execution time, average CPU utilization, participating nodes, etc.
      • Job file:
         • Requests the desired number of nodes from MN
         • Starts Amon on each returned node to monitor and return processes
         • Starts cpulimit on each returned node, limiting the effective power given to the WRF process
         • Executes WRF as a parallel job across the returned nodes
      • Developed modification script
         • Combines Amon output into one file
         • Filters processes to solely WRF processes
         • Edits processes to an Aprof-friendly format
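
The benchmarking sweep described on this slide can be sketched as a simple scenario matrix. This is a hedged illustration, not the author's script: the node counts and CPU percentages come from the slide, while the function names and the exact `cpulimit` invocation (using its `-e` executable-name and `-l` limit options) are assumptions, since the slides do not show the job file itself.

```python
from itertools import product

# Values taken from the slide.
NODE_COUNTS = [8, 16, 32, 64, 96, 128]
CPU_PERCENTAGES = [100, 75, 50, 25]


def benchmark_scenarios():
    """Every (nodes, cpu_percent) combination the sweep would visit."""
    return list(product(NODE_COUNTS, CPU_PERCENTAGES))


def cpulimit_command(cpu_percent, exe="wrf.exe"):
    """Hypothetical per-node command capping the CPU share of the WRF process.

    The real job file may call cpulimit differently (for example by PID).
    """
    return ["cpulimit", "-e", exe, "-l", str(cpu_percent)]


scenarios = benchmark_scenarios()
print(len(scenarios), "scenarios, first:", scenarios[0])
print(" ".join(cpulimit_command(75)))
```

Each scenario would correspond to one job submission, so queue waiting time is paid once per combination.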

  14. How'd they do that? (cont'd)
      • Started Aprof, loading the input file as data
      • Executed the Aprof Query Automation script
         • Starts a telnet session querying Aprof for the benchmarked scenarios
         • Compares predicted values to the actual values returned in the run
         • Outputs a text file and a graphing plot file of comparison statistics

  15. Experimental Process

  16. Extreme? Makeover
      Amon Output Process:
          --- (464) ---
          name: wrf.exe
          cpus: 4
          cpu MHz: 1/0.000 [MHz]
          cache size: 1/0 [KB]
          elapsed time: 957952 [msec]
          utime: 956370 [msec] 957810 [msec]
          stime: 570 [msec] 860 [msec]
          intr: 18783
          ctxt switch: 58290
          fork: 95
          storage R: 0 [blocks] 0 [blocks]
          storage W: 0 [blocks]
          network Rx: 19547308 [bytes]
          network Tx: 1434925 [bytes]
      Aprof Input Process:
          --- (464) ---
          name: wrf.exe
          inv #cpu: 1/16
          inv clock: 1/574
          cache size: 1/1024 [KB]
          elapsed time: 1990992 [msec]
          inv clock*#cpu: 1/(36763)
      Why: The version of Linux on MN does not report some characteristics (e.g. cache size). By its initial design, Amon also reports in a different format than the one Aprof reads.
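
The filtering step of the modification script (keeping only WRF processes from combined Amon output) could look roughly like the sketch below. It assumes a layout the slide only hints at: one `key: value` field per line and a `--- (pid) ---` line starting each record. The function names, the second record (pid 465, `bash`), and its values are hypothetical.

```python
import re

# Assumed record separator, e.g. "--- (464) ---".
RECORD_HEADER = re.compile(r"^--- \((\d+)\) ---$")


def parse_records(text):
    """Split Amon-style output into one dict of fields per process record."""
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        header = RECORD_HEADER.match(line)
        if header:
            current = {"pid": header.group(1)}
            records.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")  # split on the first colon only
            current[key.strip()] = value.strip()
    return records


def keep_process(records, name="wrf.exe"):
    """Keep only the records whose process name matches."""
    return [record for record in records if record.get("name") == name]


sample = """\
--- (464) ---
name: wrf.exe
cpus: 4
elapsed time: 957952 [msec]
--- (465) ---
name: bash
cpus: 4
elapsed time: 12 [msec]
"""

wrf_records = keep_process(parse_records(sample))
print(wrf_records)
```

Converting the kept records into the inverse-valued fields Aprof reads is deliberately left out, because the slide does not spell out how those values are derived.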

  17. Aprof Prediction
          name: wrf.exe
          elapsed time: 5.783787e+06
          ===========================================================
          explanatory:       value          parameter      std.dev
          -----------------  -------------  -------------  -------------
          :                  1.000000e+00   5.783787e+06   1.982074e+05
          ===========================================================
          predicted:         value          residue rms    std.dev
          -----------------  -------------  -------------  -------------
          elapsed time:      5.783787e+06   4.246451e+06   1.982074e+05
          ===========================================================

  18. Query Automation Script Output
          adj. cpu speed, processors, actual, predicted, rms, std. dev, actual difference (%)
          3591.363, 1, 5222, 5924.82, 1592.459, 415.3491, 13.4588280352
          3591.363, 2, 2881, 3246.283, 1592.459, 181.5382, 12.6790350573
          3591.363, 3, 2281, 2353.438, 1592.459, 105.334, 3.17571240684
          3591.363, 4, 1860, 1907.015, 1592.459, 69.19778, 2.52768817204
          3591.363, 5, 1681, 1639.161, 1592.459, 49.83672, 2.48893515764
          3591.363, 6, 1440, 1460.592, 1592.459, 39.5442, 1.43
          3591.363, 7, 1380, 1333.043, 1592.459, 34.76459, 3.40268115942
          3591.363, 8, 1200, 1237.381, 1592.459, 33.27651, 3.11508333333
          3591.363, 9, 1200, 1162.977, 1592.459, 33.56231, 3.08525
          3591.363, 10, 1080, 1103.454, 1592.459, 34.68943, 2.17166666667
          3591.363, 11, 1200, 1054.753, 1592.459, 36.15324, 12.1039166667
          3591.363, 12, 1080, 1014.169, 1592.459, 37.70271, 6.09546296296
          3591.363, 13, 1200, 979.8292, 1592.459, 39.22018, 18.3475666667
          3591.363, 14, 1021, 950.3947, 1592.459, 40.65455, 6.91530852106
          3591.363, 15, 1020, 924.8848, 1592.459, 41.9872, 9.32501960784
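
The last column of this output is consistent with the absolute difference between predicted and actual time, expressed as a percentage of the actual time. The sketch below recomputes it for three rows taken from the slide; the function name `percent_error` is an assumption, and the formula is inferred from the numbers rather than from the script's source.

```python
def percent_error(actual, predicted):
    """Absolute prediction error as a percentage of the actual value."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(predicted - actual) / actual * 100.0


# (processors, actual, predicted) copied from the slide's output.
rows = [
    (1, 5222, 5924.82),
    (6, 1440, 1460.592),
    (13, 1200, 979.8292),
]

for processors, actual, predicted in rows:
    error = percent_error(actual, predicted)
    print(f"{processors:>2} processors: {error:.4f}% error")
```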

  19. Paraver / Dimemas
      • Dimemas: simulation tool for the parametric analysis of the behavior of message-passing applications on a configurable parallel platform
      • Paraver: tool that allows for performance visualization and analysis of trace files generated from actual executions and by Dimemas
      • Tracefiles are generated by MPItrace, which is linked into the executed code

  20. Dimemas Simulation Process Overview
      • Link MPItrace into the application source code; it dynamically generates tracefiles for each node the application is running on
      • Identify computation iterations in Paraver; compose a smaller trace file by selecting a few iterations, preserving communications and eliminating initialization phases
      • Convert the new tracefile to Dimemas format (.trf) using the CEPBA-provided 'prv2trf' tool
      • Load the tracefile into the Dimemas simulator, configure the target machine, and with that information generate the Dimemas configuration file
      • Call the simulator with or without the option of generating a Paraver (.prv) tracefile for viewing

  21. Paraver/Dimemas: DiP Environment

  22. How'd they do that?
      • Generated Paraver tracefiles, Dimemas tracefiles, and simulation configuration files for each number of nodes
      • Developed Dimemas simulation script, "simulation_automater.sh"
         • Selects the configuration file for the desired number of nodes
         • Edits the configuration file for the desired CPU percentage
         • Records execution time and average CPU utilization
      • Finalizing development of the prediction validation script. It will:
         • Compare Dimemas predicted values to actual run values
         • Output a text file and a graphing plot file of comparison statistics

  23. Dimemas Prediction
          Execution time: 36.354146
          Speedup: 5.34
          CPU Time: 194.066431

          Id.  Computation  %time  Communication
          1    31.224017    91.21  3.008552
          2    20.089440    78.20  5.599083
          3    19.305673    76.84  5.818317
          4    28.672368    83.27  5.762332
          5    29.058603    85.36  4.982049
          6    19.488003    77.63  5.614155
          7    18.727851    78.57  5.108366
          8    27.500476    84.29  5.123971

          Id.  Mess.sent     Bytes sent    Immediate recv  Waiting recv  Bytes recv    Coll.op.      Block time  Comm. time  Wait link time  Wait buses time  I/O time
          1    7.577000e+03  1.583659e+08  3.539000e+03    4.080000e+03  1.671666e+08  1.475000e+03  0.247092    0.383663    0.319859        0.000000         0.000000
          2    8.948000e+03  2.200029e+08  8.797000e+03    1.440000e+02  2.186629e+08  1.475000e+03  3.710867    0.383663    0.098868        0.000000         0.000000
          3    8.948000e+03  2.176712e+08  6.904000e+03    2.037000e+03  2.163992e+08  1.475000e+03  3.453668    0.383663    0.243052        0.000000         0.000000
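
The summary figures in this output can be cross-checked against the per-task rows. In the sketch below, the relationships used (CPU time as the sum of per-task computation time, speedup as CPU time divided by execution time, and %time as computation over computation plus communication) are inferred from the numbers on the slide, not taken from Dimemas documentation; the function name `summarize` is an assumption.

```python
EXECUTION_TIME = 36.354146

# (task id, computation time, communication time) copied from the slide.
TASKS = [
    (1, 31.224017, 3.008552),
    (2, 20.089440, 5.599083),
    (3, 19.305673, 5.818317),
    (4, 28.672368, 5.762332),
    (5, 29.058603, 4.982049),
    (6, 19.488003, 5.614155),
    (7, 18.727851, 5.108366),
    (8, 27.500476, 5.123971),
]


def summarize(execution_time, tasks):
    """Recompute CPU time, speedup, and per-task computation share (%)."""
    cpu_time = sum(computation for _, computation, _ in tasks)
    speedup = cpu_time / execution_time
    compute_share = {
        task_id: 100.0 * computation / (computation + communication)
        for task_id, computation, communication in tasks
    }
    return cpu_time, speedup, compute_share


cpu_time, speedup, compute_share = summarize(EXECUTION_TIME, TASKS)
print(f"CPU time: {cpu_time:.6f}")
print(f"Speedup:  {speedup:.2f}")
for task_id, share in sorted(compute_share.items()):
    print(f"task {task_id}: {share:.2f}% computation")
```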

  24. Project Status

  25. Amon / Aprof
      • Software installed and tailored to MareNostrum
      • Proficient in executing software
      • Amon benchmarking completed
      • Aprof query automation complete and results generated
      • Lessons learned on extending Amon/Aprof to a different architecture

  26. Dimemas / Paraver
      • Proficient in executing software
      • Paraver and Dimemas tracefiles generated for each number of nodes (8, 16, 32, 64, 96, 128)
      • Benchmarking script complete
      • Simulations generated
      • Comparison script being finalized

  27. Quick Comparison
      Amon / Aprof
      • Pros:
         • Simpler to deploy in comparison
         • Scalability of model is promising with first results
         • Feasible solution for performance prediction purposes
      • Cons:
         • Requires more base executions for accurate performance in comparison
      Dimemas / Paraver
      • Pros:
         • More features; could be more useful to an experienced user (i.e. adjustment of system characteristics)
         • Visualization and analysis of execution for analysis purposes
         • Graphical User Interface
      • Cons:
         • Requires special compilation of applications
         • Requires non-trivial-to-install kernel patch
         • Large tracefiles (sometimes gigabytes)

  28. Aprof Results 100% CPU Utilization

  29. Aprof Results 100% CPU Utilization

  30. Significant Challenges Overcome
      • Amon:
         • Adjustment of source code for proper functioning on MareNostrum (MN)
         • Development of benchmarking script to conform to the system architecture of MareNostrum (i.e. going through its scheduler; one process per node; etc.)
         • Proper functioning of cpulimit for accurate CPU percentage
         • Job termination by the MN scheduler due to execution surpassing the wall clock limit

  31. Significant Challenges Overcome (cont'd)
      • Aprof:
         • Adjustment of source code for less complex, more consistent data input
         • Development of prediction and comparison scripts for MareNostrum
      • Dimemas/Paraver:
         • MPItrace properly linked in with WRF on GCB and Mind
         • Generation of trace and configuration files
      • WRF:
         • Version 2.2 installed and compiled on Mind

  32. Challenges Remaining
      • Lengthy Amon benchmarking runs due to job times spent in queue
      • Complexities in preparing Dimemas tracefiles for simulation purposes
      • Extracting accurate predictions from Dimemas: trace files are reduced in order to speed up the prediction process; therefore, predicted times must be multiplied by a determined factor
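
One plausible way to determine the scaling factor for a reduced trace is the ratio of iterations in the full run to iterations kept in the trace. The slide does not say how the factor was actually chosen, so the sketch below is an assumption throughout, including the function name `scale_prediction` and the iteration counts in the usage line. It also ignores the initialization phase that the reduced trace drops.

```python
def scale_prediction(reduced_prediction, traced_iterations, total_iterations):
    """Scale a prediction made from a reduced trace up to the full run.

    Assumes every iteration costs about the same and that the factor is
    total_iterations / traced_iterations (a hypothetical choice).
    """
    if traced_iterations <= 0:
        raise ValueError("traced_iterations must be positive")
    if total_iterations < traced_iterations:
        raise ValueError("total_iterations cannot be smaller than traced_iterations")
    return reduced_prediction * total_iterations / traced_iterations


# Hypothetical example: 5 of 50 iterations were kept in the reduced trace.
print(scale_prediction(36.354146, traced_iterations=5, total_iterations=50))
```

If iteration cost varies over the run, a single multiplicative factor like this would need to be calibrated against at least one full execution.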

  33. Remaining Work
      • Next Week:
         • Finalize scripting of Dimemas prediction simulations for the same scenarios as those of Amon and Aprof
      • Fall 2008:
         • Experiment with how well Amon and Aprof relate to, or could possibly be combined with, Dimemas
         • Decide if and how to compare results from MareNostrum, GCB, and Mind (i.e. the same version of WRF would have to be running in all three locations)
         • Compose paper presenting significant results and submit it to a conference
      • Future Work:
         • Work with the metascheduling team on implementation of tools

  34. References
      • S. Masoud Sadjadi, Liana Fong, Rosa M. Badia, Javier Figueroa, Javier Delgado, Xabriel J. Collazo-Mojica, Khalid Saleem, Raju Rangaswami, Shu Shimizu, Hector A. Duran Limon, Pat Welsh, Sandeep Pattnaik, Anthony Praino, David Villegas, Selim Kalayci, Gargi Dasgupta, Onyeka Ezenwoye, Juan Carlos Martinez, Ivan Rodero, Shuyi Chen, Javier Muñoz, Diego Lopez, Julita Corbalan, Hugh Willoughby, Michael McFail, Christine Lisetti, and Malek Adjouadi. Transparent grid enablement of weather research and forecasting. In Proceedings of the Mardi Gras Conference 2008 - Workshop on Grid-Enabling Applications, Baton Rouge, Louisiana, USA, January 2008. http://www.cs.fiu.edu/~sadjadi/Presentations/Mardi-Gras-GEA-2008-TGE-WRF.ppt
      • S. Masoud Sadjadi, Shu Shimizu, Javier Figueroa, Raju Rangaswami, Javier Delgado, Hector Duran, and Xabriel Collazo. A modeling approach for estimating execution time of long-running scientific applications. In Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium (IPDPS-2008), the Fifth High-Performance Grid Computing Workshop (HPGC-2008), Miami, Florida, April 2008. http://www.cs.fiu.edu/~sadjadi/Presentations/HPGC-2008-WRF%20Modeling%20Paper%20Presentationl.ppt
      • "Performance/Profiling". Presented by Javier Figueroa in the Special Topics in Grid Enablement of Scientific Applications class, 13 May 2008.

  35. Acknowledgements
      • REU
      • Partnerships for International Research and Education (PIRE)
      • The Barcelona Supercomputing Center (BSC)
      • Masoud Sadjadi, Ph.D. - FIU
      • Rosa Badia, Ph.D. - BSC
      • Javier Delgado - FIU
      • Javier Figueroa - Univ. of Miami
