GHS: A Performance Prediction and Task Scheduling System for Grid Computing

Xian-He Sun, Department of Computer Science, Illinois Institute of Technology, sun@iit.edu. SC/APART, Nov. 22, 2002. Scalable Computing Software (SCS) Laboratory.

Presentation Transcript


  1. GHS: A Performance Prediction and Task Scheduling System for Grid Computing • Xian-He Sun, Department of Computer Science, Illinois Institute of Technology, sun@iit.edu • SC/APART, Nov. 22, 2002

  2. Outline • Introduction: concept and challenge • The Grid Harvest Service (GHS) System • Design methodology • Measurement system • Scheduling algorithms • Experimental testing • Conclusion

  3. Introduction • Parallel Processing • Two or more working entities cooperate toward a common goal to achieve better performance • Grid Computing • Uses distributed resources as a unified compute platform to achieve better performance • New Challenges of Grid Computing • Heterogeneous systems, non-dedicated environments, relatively large data-access delay

  4. Degradations of Parallel Processing • Unbalanced workload • Communication delay • Overhead increases with the ensemble size

  5. Degradations of Grid Computing • Unbalanced computing power and workload • Shared computing and communication resources • Uncertainty, heterogeneity, and overhead increase with the ensemble size

  6. Performance Evaluation (Improving performance is the goal) • Performance Measurement • Metric, Parameter • Performance Prediction • Model, Application-Resource, Scheduling • Performance Diagnosis/Optimization • Post-execution, Algorithm improvement, Architecture improvement, State-of-the-art

  7. Parallel Performance Metrics (Run-time is the dominant metric) • Run-Time (Execution Time) • Speed: mflops, mips, cpi • Efficiency: throughput • Speedup • Parallel Efficiency • Scalability: the ability to maintain performance gain as system and problem size increase • Others: portability, programmability, etc.
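The speedup and parallel-efficiency entries above are usually defined as $S_p = T_1/T_p$ and $E_p = S_p/p$, where $T_1$ and $T_p$ are the run-times on one and on $p$ processors. A minimal Python sketch of these two definitions, using made-up timings for illustration:

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S_p = T_1 / T_p: how many times faster the parallel run is."""
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Parallel efficiency E_p = S_p / p: fraction of ideal linear speedup."""
    return speedup(t_serial, t_parallel) / p


# Hypothetical measurements: 120 s on 1 processor, 20 s on 8 processors.
s = speedup(120.0, 20.0)
e = parallel_efficiency(120.0, 20.0, 8)
print(f"speedup={s:.2f}, efficiency={e:.2f}")  # speedup=6.00, efficiency=0.75
```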

  8. Parallel Performance Models (Predicting run-time is the dominant goal) • PRAM (parallel random-access machine) • EREW, CREW, CRCW • BSP (bulk synchronous parallel) model • Supersteps, phase parallel model • Alpha-Beta model • α: communication startup time; β: data transmission time per byte • Scalable Computing Model • Scalable speedup, scalability • LogP model • L: latency, o: overhead, g: gap, P: the number of processors • Others
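For three of the models above, the basic cost expressions are standard: the Alpha-Beta model charges $\alpha + \beta n$ for an $n$-byte message, LogP charges $o + L + o$ for one small point-to-point message, and a BSP superstep costs $w + g h + l$ ($w$: maximum local work, $h$: maximum number of messages sent or received by any processor, $g$: per-message gap, $l$: barrier latency). The sketch below only evaluates these expressions; the parameter values are hypothetical, not measurements from the talk.

```python
def alpha_beta_time(n_bytes: int, alpha: float, beta: float) -> float:
    """Alpha-Beta model: startup time alpha plus beta seconds per byte."""
    return alpha + beta * n_bytes


def logp_message_time(L: float, o: float) -> float:
    """LogP: one small message = send overhead + latency + receive overhead."""
    return o + L + o


def bsp_superstep_time(w: float, h: int, g: float, l: float) -> float:
    """BSP: local work w, h-relation costing g per message, barrier latency l."""
    return w + g * h + l


# Hypothetical parameters: 50 us startup, 100 MB/s bandwidth (1e-8 s/byte).
print(alpha_beta_time(1_000_000, alpha=50e-6, beta=1e-8))  # about 0.01005 s
print(logp_message_time(L=5.0, o=1.0))
print(bsp_superstep_time(w=100.0, h=10, g=2.0, l=20.0))
```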

  9. Research Projects and Tools • Parallel Processing • Paradyn, W3 (why, when, and where) • TAU (Tuning and Analysis Utilities) • Pablo, Prophesy, SCALEA, SCALA, etc. • For dedicated systems • Instrumentation, post-execution analysis, visualization, prediction, application performance, I/O performance

  10. Research Projects and Tools • Grid Computing • NWS (Network Weather Service) • Monitors and forecasts resource performance • RPS (Resource Prediction System) • Predicts CPU availability of a Unix system • AppLeS (Application-Level Scheduler) • An application-level scheduler extended to non-dedicated environments, based on NWS • Short-term, system-level prediction

  11. Do We Need • A new metric for the computation grid? • ???? • A new model for the computation grid? • Yes • Application-level performance prediction • A new model for other technical advances? • Yes • Data access in hierarchical memory systems

  12. The Grid Harvest Service (GHS) System (Sun/Wu 02) • A long-term, application-level performance prediction and scheduling system for non-dedicated (Grid) environments • A new prediction model derived through probability analysis and simulation • Non-intrusive measurement and scheduling algorithms • Implementation and testing

  13. Performance Model (Gong, Sun, Watson, 02) • [Figure: w_s(k) plotted against time t] • Remote job has low priority • The local job arrival and service-time models are based on extensive monitoring and observation

  14. Prediction Formula • Arrival of local jobs follows a Poisson distribution with rate $\lambda_k$ • Execution time of the owner jobs follows a general distribution with mean $1/\mu_k$ and standard deviation $\sigma_k$ • Simulation shows that the distribution of the local service rate approaches a known distribution: $U_k(S) \mid S_k > 0$ follows a Gamma distribution
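The model on slides 13 and 14 can be checked by simulation. The sketch below assumes that the machine has no local work when a remote task of $w$ CPU-seconds starts, that local jobs arrive as a Poisson stream with rate $\lambda$ and always preempt the remote task, and (for the demo only) that local service times are exponential with rate $\mu$. Under these assumptions, standard M/G/1 busy-period results give $E[T] = w/(1-\rho)$ and $\mathrm{Var}[T] = w\rho(\theta^2+1)/(\mu(1-\rho)^3)$, with $\rho = \lambda/\mu$ and $\theta$ the coefficient of variation of the local service time; the probability that no local job interrupts the task is $e^{-\lambda w}$. The function names are illustrative and not part of GHS.

```python
import math
import random


def busy_period_end(start, lam, draw_service, rng):
    """Time at which the CPU is free of local jobs again, given that a local
    job arrives at `start`; jobs arriving meanwhile extend the busy period."""
    end = start + draw_service(rng)
    arrival = start + rng.expovariate(lam)
    while arrival < end:
        end += draw_service(rng)
        arrival += rng.expovariate(lam)
    return end


def simulate_completion(w, lam, draw_service, rng):
    """One run: completion time of a remote task needing w CPU-seconds that
    only runs while no local job is present (preemptive, low priority)."""
    t, remaining = 0.0, w
    while True:
        gap = rng.expovariate(lam)          # time until the next local arrival
        if gap >= remaining:
            return t + remaining            # remote task finishes first
        t += gap
        remaining -= gap
        t = busy_period_end(t, lam, draw_service, rng)


def analytic_moments(w, lam, mu, theta):
    """E[T] and Var[T] from M/G/1 busy-period results, with rho = lam / mu."""
    rho = lam / mu
    mean = w / (1 - rho)
    var = w * rho * (theta ** 2 + 1) / (mu * (1 - rho) ** 3)
    return mean, var


rng = random.Random(42)
w, lam, mu = 10.0, 0.2, 1.0
runs = [simulate_completion(w, lam, lambda r: r.expovariate(mu), rng)
        for _ in range(20_000)]
sim_mean = sum(runs) / len(runs)
p_idle = sum(1 for x in runs if x == w) / len(runs)
mean, var = analytic_moments(w, lam, mu, theta=1.0)  # exponential: theta = 1
print(f"mean: simulated {sim_mean:.2f}, analytic {mean:.2f}")
print(f"P(no interruption): simulated {p_idle:.3f}, analytic {math.exp(-lam * w):.3f}")
```

With 20,000 runs, the simulated mean and the fraction of uninterrupted runs should land close to the analytic values. The Gamma approximation on slide 14 refines the shape of the distribution, which this sketch does not attempt.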

  15. Prediction Formula • Parallel task completion time • Homogeneous parallel task completion time • Mean time balancing partition
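The formulas on this slide did not survive extraction. As a hedged sketch of the relations the bullet titles point to, not a transcription of the slide: assume the $m$ subtasks finish independently, all machines have equal speed, and subtask $k$ with $w_k$ units of work has mean completion time $E[T_k] = w_k/(1-\rho_k)$ under local utilization $\rho_k = \lambda_k/\mu_k$. Then:

```latex
% parallel completion time: the slowest of m independent subtasks
T = \max_{1 \le k \le m} T_k, \qquad
\Pr(T \le t) = \prod_{k=1}^{m} \Pr(T_k \le t)

% homogeneous case: T_1, \dots, T_m identically distributed
\Pr(T \le t) = \bigl[\Pr(T_1 \le t)\bigr]^{m}

% mean-time balancing: choose w_k so that every E[T_k] = w_k/(1-\rho_k) is equal,
% subject to \sum_k w_k = W
w_k = W \, \frac{1 - \rho_k}{\sum_{j=1}^{m} (1 - \rho_j)}, \qquad
E[T_k] = \frac{W}{\sum_{j=1}^{m} (1 - \rho_j)} \quad \text{for all } k
```

Balancing the means in this way gives lightly loaded machines proportionally more work, and every subtask has the same expected finish time.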

  16. Measurement Methodology • If a parameter's population has a mean and a standard deviation, a confidence interval for the population mean is given • The smallest sample size n that achieves a desired confidence level and a required accuracy r is given
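A sketch of the two measurement formulas, assuming a normal approximation (a t-quantile would be more accurate for small samples) and reading the accuracy $r$ as relative, i.e. the confidence-interval half-width must not exceed $r\mu$. This gives $n \ge (z_{\alpha/2}\,\sigma/(r\mu))^2$. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Normal-approximation CI for the population mean: x_bar +/- z*s/sqrt(n)."""
    n = len(samples)
    x_bar, s = mean(samples), stdev(samples)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half = z * s / math.sqrt(n)
    return x_bar - half, x_bar + half


def min_sample_size(mu, sigma, confidence=0.95, r=0.05):
    """Smallest n whose CI half-width z*sigma/sqrt(n) is at most r*mu."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sigma / (r * mu)) ** 2)


print(min_sample_size(mu=1.0, sigma=0.5))  # 385
```

Because $n$ grows with $(\sigma/\mu)^2$, a steadier parameter needs far fewer samples for the same confidence and accuracy.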

  17. Measurement and Prediction of Parameters • Utilization • Job Arrival • Standard Deviation of Service Rate • Least-Intrusive Measurement
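One way to obtain the first three parameters from a log of local jobs, sketched under assumptions: each record is a hypothetical (arrival_time, service_time) pair observed during a window of known length, the arrival rate is the job count divided by the window, and utilization is $\rho = \lambda \cdot E[\text{service time}]$. The slide speaks of the standard deviation of the service rate; this sketch measures the standard deviation of the service time instead, which matches the execution-time standard deviation on slide 14.

```python
from statistics import mean, stdev


def estimate_parameters(jobs, window):
    """Estimate local-load parameters from (arrival_time, service_time)
    records observed during `window` seconds (needs at least two jobs)."""
    services = [s for _, s in jobs]
    lam = len(jobs) / window            # arrival rate, jobs per second
    mean_service = mean(services)       # estimate of 1 / mu
    sigma = stdev(services)             # std. dev. of local service time
    return {
        "lambda": lam,
        "mu": 1 / mean_service,
        "sigma": sigma,
        "rho": lam * mean_service,      # utilization = lambda / mu
        "theta": sigma / mean_service,  # coefficient of variation
    }


log = [(0.0, 1.0), (2.0, 1.0), (4.0, 2.0)]   # hypothetical 10-second window
print(estimate_parameters(log, window=10.0))
```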

  18. • Select … previous days … in the system measurement history • For each day …: …, where … is the set of … measured during the time interval beginning from … on that day • End For • Select the previous continuous time interval … before …, and calculate …, where … is the set of … measured during … • Output … while … and …

  19. Scheduling Algorithm: Scheduling with a Given Number of Sub-tasks • List a set of lightly loaded machines … • List all possible sets of machines, such as … • For each machine set …: • Use the mean-time balancing partition to partition the task • Use the formula to calculate the mean and the coefficient of variation • If … > …, then … • End For • Assign the parallel task to the machine set …

  20. Optimal Scheduling Algorithm • List a set of lightly loaded machines … • While … do • Scheduling with … sub-tasks • If … > …, then … • End If • End While • Assign the parallel task to the machine set …
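Slides 19 and 20 can be sketched together. All of the following are hypothetical choices: each machine is a tuple (rho, mu, theta) of local utilization, local service rate, and service-time coefficient of variation; "lightly loaded" means rho below a threshold; the task is split with the mean-time balancing rule $w_k \propto 1-\rho_k$; each subtask time is approximated as normal with mean $w_k/(1-\rho_k)$ and variance $w_k\rho_k(\theta_k^2+1)/(\mu_k(1-\rho_k)^3)$, a simplification swapped in for the Gamma-based approximation of slide 14; and because the slide's comparison "If … > …" is garbled, a set is preferred when its predicted mean completion time is lower. An optional per-machine overhead stands in for coordination cost. The mean and coefficient of variation of the parallel time $\max_k T_k$ come from numerically integrating its tail probability.

```python
import itertools
import math
from statistics import NormalDist


def subtask_moments(w, rho, mu, theta):
    """Mean and variance of a low-priority subtask with w units of work."""
    mean = w / (1 - rho)
    var = w * rho * (theta ** 2 + 1) / (mu * (1 - rho) ** 3)
    return mean, var


def predict(W, machines, steps=4000):
    """Balanced partition of W, then mean and CV of max_k T_k, treating each
    T_k as an independent normal (E[X] = integral of Pr(X > t) for X >= 0)."""
    total = sum(1 - rho for rho, _, _ in machines)
    dists = []
    for rho, mu, theta in machines:
        m, v = subtask_moments(W * (1 - rho) / total, rho, mu, theta)
        dists.append(NormalDist(m, math.sqrt(max(v, 1e-12))))
    hi = max(d.mean + 8 * d.stdev for d in dists)
    dt = hi / steps
    e1 = e2 = 0.0
    for i in range(steps + 1):                       # trapezoid rule on [0, hi]
        t = i * dt
        tail = 1.0 - math.prod(d.cdf(t) for d in dists)  # Pr(max T_k > t)
        weight = 0.5 if i in (0, steps) else 1.0
        e1 += weight * tail * dt                     # E[T]
        e2 += weight * 2.0 * t * tail * dt           # E[T^2]
    return e1, math.sqrt(max(e2 - e1 ** 2, 0.0)) / e1


def schedule_given_p(W, machines, p, max_rho=0.8, overhead=0.0):
    """Slide 19 sketch: try every p-subset of lightly loaded machines."""
    light = [m for m in machines if m[0] < max_rho]
    best = None
    for subset in itertools.combinations(light, p):
        mean, cv = predict(W, subset)
        mean += overhead * p                         # hypothetical coordination cost
        if best is None or mean < best[1]:
            best = (subset, mean, cv)
    return best


def schedule_optimal(W, machines, max_rho=0.8, overhead=0.0):
    """Slide 20 sketch: repeat the fixed-p search for every p, keep the best."""
    n_light = sum(1 for m in machines if m[0] < max_rho)
    results = [schedule_given_p(W, machines, p, max_rho, overhead)
               for p in range(1, n_light + 1)]
    return min(results, key=lambda r: r[1])


machines = [(0.1, 1.0, 1.0), (0.3, 1.0, 1.0), (0.5, 1.0, 1.0), (0.9, 1.0, 1.0)]
subset, mean, cv = schedule_optimal(100.0, machines, overhead=2.0)
print(subset, round(mean, 1), round(cv, 3))
```

Exhaustive enumeration keeps the sketch close to the slides, but it grows combinatorially with the number of machines, which is what motivates a heuristic search.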

  21. Heuristic Scheduling Algorithm • List a set of lightly loaded machines … • Sort the machines in decreasing order of … • Use the task ratio to find the upper limit q • Use bisection search to find the p such that … is minimum
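A sketch of the heuristic under two hypothetical choices, since the sorting key and the objective are elided on the slide: machines are sorted by decreasing idle fraction $1-\rho_k$, and the predicted time of using the first $p$ machines is $f(p) = W/\sum_{k\le p}(1-\rho_k) + c\,p$, a balanced mean time plus a coordination overhead $c$ per machine. With that ordering the first term is convex in $p$, so $f$ is convex and a bisection on the sign of $f(p+1) - f(p)$ finds the minimizing $p$ in $O(\log q)$ evaluations. The upper limit $q$ is simply passed in here, because the task-ratio rule that produces it is not recoverable.

```python
def predicted_time(p, loads, W, overhead):
    """Hypothetical cost of using the first p machines of `loads` (sorted):
    balanced mean time W / sum(1 - rho_k) plus overhead * p."""
    return W / sum(1 - rho for rho in loads[:p]) + overhead * p


def heuristic_schedule(loads, W, q, overhead):
    """Sort by decreasing idle fraction 1 - rho, then bisect on p in [1, q],
    relying on predicted_time being convex in p for this ordering."""
    loads = sorted(loads)                  # ascending rho = descending 1 - rho
    lo, hi = 1, min(q, len(loads))
    while lo < hi:
        mid = (lo + hi) // 2
        if predicted_time(mid, loads, W, overhead) <= predicted_time(mid + 1, loads, W, overhead):
            hi = mid                       # minimum is at mid or to its left
        else:
            lo = mid + 1                   # still decreasing: move right
    return lo, loads[:lo]


loads = [0.5, 0.1, 0.3, 0.7, 0.2, 0.6]     # hypothetical utilizations
p, chosen = heuristic_schedule(loads, W=100.0, q=6, overhead=3.0)
print(p, chosen)  # 5 [0.1, 0.2, 0.3, 0.5, 0.6]
```

With these loads, the bisection returns the same p as checking every value from 1 to 6, while evaluating the cost only a few times.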

  22. Embedded in Grid Run-time System

  23. Application-level Prediction: Experimental Testing • [Figure: Remote task completion time on a single machine]

  24. Prediction of Parallel Task Completion Time • [Figure: Prediction on a multi-processor with a local scheduler]

  25. Partition and Scheduling • [Figure: Comparison of three partition approaches]

  26. Performance Gain with Scheduling • [Figure: Execution time with different scheduling strategies]

  27. Cost and Gain • The amount of measurement decreases once the system is steady

  28. The calculation time of the prediction component
      Node number   8      16     32     64     128    256    512    1024
      Time (s)      0.00   0.01   0.02   0.04   0.08   0.16   0.31   0.66

  29. The GHS System • A good example and a success story • Performance modeling • Parameter measurement and prediction schemes • Application-level performance prediction • Partition and scheduling • It has its limitations too • Communication and data-access delay

  30. What We Know, What We Do Not • We know there is no deterministic prediction in a non-deterministic shared environment. We do not know how to reach a fuzzy engineering solution • Rules of thumb • Stochastic methods • Heuristic algorithms • AI • Statistics • Data mining • Innovative methods • etc.

  31. Conclusion • Application-level performance evaluation • Code-machine versus machine, alg., alg.-machine • New requirements under new environments • We know we are making progress. We do not know if we can keep up with the technology improvement.
