Explore a system to predict task running time within confidence intervals, aiding in efficient scheduling and decision-making in heterogeneous host environments. Utilize host load data to compute accurate predictions for various tasks.
 
                
Online Prediction of the Running Time of Tasks • Peter A. Dinda • Department of Computer Science, Northwestern University • http://www.cs.northwestern.edu/~pdinda
Overview • Predict running time of task • Application supplies task size (0.1-10 seconds currently) • Task is compute-bound (current limit) • Prediction is a confidence interval • Expresses prediction error • Statistically valid decision-making in scheduler • Based on host load prediction • Homogeneous Digital Unix hosts (current limit) • System is portable to many operating systems • Everything in the talk is publicly available
Outline • Running time advisor • Host load results • Computing confidence intervals • Performance evaluation • Related work • Conclusions
A Universal Challenge in High Performance Distributed Applications • Highly variable resource availability • Shared resources • No reservations • No globally respected priorities • Competition from other users - “background workload” • Running time can vary drastically • Adaptation • Example goal: soft real-time for interactivity • Example mechanism: server selection • Performance queries
Running Time Advisor (RTA) • App asks: “What will be the running time of this 3 second task if started now?” • RTA answers for a host with background workload: “It will be 5.3 seconds” • Nominal time: running time on empty host, task size • Entirely user-level tool • No reservations or admission control • Query result is a prediction
Variability and Prediction • [Figure: a predictor turns a resource signal with high availability variability over time t into a prediction error signal with low variability, plus a characterization of that variability (ACF)] • Exchange high resource availability variability for low prediction error variability and a characterization of that variability
Running Time Advisor (RTA) • App asks: “With 95% confidence, what will be the running time of this 3 second task if started now?” • RTA answers for a host with background workload: “It will be 4.1 to 6.3 seconds” • CI captures prediction error to the extent the application is interested in it • Independent of prediction techniques
Outline • Running time advisor • Host load results • Computing confidence intervals • Performance evaluation • Related work • Conclusions
Host Load Traces • DEC Unix 5 second exponential average • Full bandwidth captured (1 Hz sample rate) • Long durations • http://www.cs.northwestern.edu/~pdinda/LoadTraces
Host Load Properties • Self-similarity • long-range dependence • Epochal behavior • non-stationarity • Complex correlation structure [LCR ’98, Scientific Programming, 3:4, 1999]
Host Load Prediction • Fully randomized study on traces • MEAN, LAST, AR, MA, ARMA, ARIMA, ARFIMA models • AR(16) models most appropriate • Covariance matrix for prediction errors • Low overhead: <1% CPU [HPDC ’99, Cluster Computing, 3:4, 2000]
RPS Toolkit • Extensible toolkit for implementing resource signal prediction systems • Easy “buy-in” for users • C++ and sockets (no threads) • Prebuilt prediction components • Libraries (sensors, time series, communication) • Users have bought in • Incorporated in CMU Remos, BBN QuO [CMU-CS-99-138] http://www.cs.northwestern.edu/~RPS
Outline • Running time advisor • Host load results • Computing confidence intervals • Performance evaluation • Related work • Conclusions
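A Model of the Unix Scheduler • $t_{act} = f(t_{nom}, \text{background workload})$ • Input: task with nominal running time $t_{nom}$ • Input: background workload, seen as the actual load $\langle z_t \rangle$ • Output of the Unix scheduler: task with actual running time $t_{act}$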
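A Model of the Unix Scheduler • Input: task with nominal running time $t_{nom}$ • Input: predicted load $\langle \hat{z}_t \rangle$ in place of the background workload • Output: task with predicted running time $t_{exp}$ • $t_{exp} = g(t_{nom}, \langle \hat{z}_t \rangle) = t_{act} + \text{Error}$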
Available Time and Average Load • Available time $at(t)$ from 0 to t • Average load $al(t)$ from 0 to t • Load signal: replace with prediction of load signal • $t_{act}$ is the minimum t where $at(t) = t_{nom}$ • Fluid model, Processor Sharing, Idealized Round-Robin, …
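The definition "t_act is the minimum t where at(t) = t_nom" can be illustrated with a small sketch. The slide's formulas did not survive in this transcript, so the form used here, at_i = i·Δ / (1 + al_i) with al_i the mean of the first i load samples, is an assumption in the spirit of a processor-sharing model. The function name, parameter names, and the linear interpolation inside the final step are likewise hypothetical.

```python
def predicted_running_time(t_nom, loads, delta=1.0):
    """Smallest t at which the available time reaches t_nom.

    Assumption (not recovered from the slides): after i samples,
    available time is at_i = i*delta / (1 + al_i), where al_i is the
    average of the first i load samples. `loads` holds measured or
    predicted load values, one per sample interval `delta`.
    """
    load_sum = 0.0
    prev_at = 0.0  # available time at step i-1 (at_0 = 0)
    for i, z in enumerate(loads, start=1):
        load_sum += z
        avg_load = load_sum / i
        at_i = i * delta / (1.0 + avg_load)
        if at_i >= t_nom:
            # First crossing: interpolate linearly within this step.
            frac = (t_nom - prev_at) / (at_i - prev_at)
            return (i - 1 + frac) * delta
        prev_at = at_i
    raise ValueError("load sequence too short to finish the task")


# A 3 second task on a host with constant load 1.0 gets half the CPU.
print(predicted_running_time(3.0, [1.0] * 10))
```

Feeding the function predicted load values instead of measured ones gives a point estimate of the running time.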
Discrete Time • No magic here: this is the obvious discretization • Time advances in steps of one sample interval • $z_{t+j}$ is replaced with its prediction
Confidence Intervals • $z_{t+j}$ is replaced with $\hat{z}_{t+j}$ in prediction, giving $\hat{al}_i$, $\hat{at}_i$, $\hat{at}(t)$ • A confidence interval for $\hat{at}(t)$ is a CI for $\hat{al}_i$, whose error is a sum of prediction errors • Since this is a sum, the central limit theorem applies • A 95% confidence interval then follows from the normal approximation
The Variance of the Sum • Prediction errors $a_{t+j}$ are not independent • Predictor’s covariance matrix captures this • Predictor makes it possible to compute this variance and thus the CI • Important detail: load discounting
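A sketch of why the covariance matrix matters: the variance of a sum of correlated errors is the sum of all entries of their covariance matrix, not just the diagonal. The code below builds a normal-approximation 95% interval for the average predicted load over i steps. The function names and the example numbers are hypothetical, and load discounting is not modeled.

```python
import math


def variance_of_sum(cov):
    """Variance of the sum of errors with covariance matrix `cov`.

    Equals the sum of every entry (1^T C 1); off-diagonal terms carry
    the dependence between prediction errors at different steps.
    """
    return sum(sum(row) for row in cov)


def average_load_ci(predicted_loads, cov, z=1.96):
    """Normal-approximation CI for the average load over the horizon.

    z = 1.96 is the two-sided 95% quantile of the standard normal.
    """
    i = len(predicted_loads)
    mean = sum(predicted_loads) / i
    std = math.sqrt(variance_of_sum(cov)) / i  # std. dev. of the average
    return mean - z * std, mean + z * std


# Two-step horizon with positively correlated errors (made-up numbers).
cov = [[0.04, 0.02],
       [0.02, 0.04]]
print(average_load_ci([1.0, 1.0], cov))
```

Ignoring the off-diagonal terms here would give a variance of 0.08 instead of 0.12, and so an interval that is too narrow.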
Outline • Running time advisor • Host load results • Computing confidence intervals • Performance evaluation • Related work • Conclusions
Experimental Setup • Environment • Alphastation 255s, Digital Unix 4.0 • Workload: host load trace playback [LCR 2000] • Prediction system on each host • AR(16), MEAN, LAST • Tasks • Nominal time ~ U(0.1,10) seconds • Interarrival time ~ U(5,15) seconds • 95% confidence level • Methodology • Predict CIs • Run task and measure • http://www.cs.northwestern.edu/~pdinda/LoadTraces/playload
Metrics • Coverage • Fraction of test cases within confidence interval • Ideally should equal the target 95% • Span • Average length of confidence interval • Ideally as short as possible • R² between $t_{exp}$ and $t_{act}$
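The coverage and span metrics can be computed directly from the predicted intervals and the measured running times. This is a minimal sketch; the function name and the sample data are made up for illustration and are not results from the study.

```python
def coverage_and_span(intervals, actual_times):
    """Return (coverage, span) for a list of (low, high) intervals.

    coverage: fraction of measured running times inside their interval
    span:     average length of the intervals, in seconds
    """
    hits = sum(1 for (low, high), t in zip(intervals, actual_times)
               if low <= t <= high)
    coverage = hits / len(actual_times)
    span = sum(high - low for low, high in intervals) / len(intervals)
    return coverage, span


# Four hypothetical test cases: three fall inside their interval.
intervals = [(4.0, 6.0), (1.0, 2.0), (3.0, 5.0), (0.5, 1.5)]
actual_times = [5.0, 2.5, 4.0, 1.0]
print(coverage_and_span(intervals, actual_times))
```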
General Picture of Results • Five classes of behavior • I’ll show you two • RTA Works • Coverage near 95% in most cases is possible • Predictor quality matters • Better predictors lead to smaller spans on lightly loaded hosts and to correct coverage on heavily loaded hosts • AR(16) >= LAST >= MEAN • Performance is slightly dependent on nominal time
Related Work • Distributed interactive applications • QuakeViz/ Dv, Aeschlimann [PDPTA’99] • Quality of service • QuO, Zinky, Bakken, Schantz [TPOS, April 97] • QRAM, Rajkumar, et al [RTSS’97] • Distributed soft real-time systems • Lawrence, Jensen [assorted] • Workload studies for load balancing • Mutka, et al [PerfEval ‘91] • Harchol-Balter, et al [SIGMETRICS ‘96] • Resource signal measurement systems • Remos [HPDC’98] • Network Weather Service [HPDC‘97, HPDC’99] • Host load prediction • Wolski, et al [HPDC’99] (NWS) • Samadani, et al [PODC’95] • Hailperin [‘93] • Application-level scheduling • Berman, et al [HPDC’96] • Stochastic Scheduling, Schopf [Supercomputing ‘99]
Conclusions • Predict running time of compute-bound task • Based on host load prediction • Prediction is a confidence interval • Confidence interval algorithm • Covariance matrix • Load discounting • Effective for domain • Digital Unix, 0.1-10 second tasks, 5-15 second interarrival • Extensions in progress
For More Information • All software and traces are available • RPS + RTA + RTSA http://www.cs.northwestern.edu/~RPS • Load Traces and playback http://www.cs.northwestern.edu/~pdinda/LoadTraces • Prescience Lab • Peter Dinda, Jason Skicewicz, Dong Lu • http://www.cs.northwestern.edu/~plab
Outline • Running time advisor • Host load results • Computing confidence intervals • Performance evaluation • Related work • Conclusions
A Universal Problem • Which host should the application send the task to so that its running time is appropriate? • Example: Real-time • Known resource requirements • What will the running time be if I...
Running Time Advisor • Application notifies advisor of task’s computational requirements (nominal time) • Advisor predicts running time on each host • Application assigns task to most appropriate host
Real-time Scheduling Advisor • Application specifies task’s computational requirements (nominal time) and its deadline • Advisor acquires predicted task running times for all hosts • Advisor chooses one of the hosts where the deadline can be met
Confidence Intervals to Characterize Variability • “3 to 5 seconds with 95% confidence” • Application specifies confidence level (e.g., 95%) • Running time advisor predicts running times as a confidence interval (CI) • Real-time scheduling advisor chooses host where CI is less than deadline • CI captures variability to the extent the application is interested in it
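The host-selection rule can be sketched as a filter over per-host confidence intervals. The slides only say the advisor picks a host whose CI is less than the deadline. Choosing the feasible host with the smallest upper bound is an assumption made here to break ties, and the function name and host names are hypothetical.

```python
def choose_host(predictions, deadline):
    """Pick a host whose whole running-time CI lies within the deadline.

    `predictions` maps host name -> (low, high) CI in seconds.
    Returns None when no host can meet the deadline at the requested
    confidence level. Tie-break (an assumption): smallest upper bound.
    """
    feasible = {host: ci for host, ci in predictions.items()
                if ci[1] <= deadline}
    if not feasible:
        return None
    return min(feasible, key=lambda host: feasible[host][1])


predictions = {
    "host-a": (3.0, 5.0),
    "host-b": (2.0, 7.0),   # upper bound misses a 6 second deadline
    "host-c": (4.0, 4.5),
}
print(choose_host(predictions, deadline=6.0))
```

Note that host-b has the lowest lower bound but is rejected: only the upper end of the interval says whether the deadline is met with the requested confidence.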
Prototype System This Paper
Load Discounting Motivation • I/O priority boost • Short tasks less affected by load
Load Discounting • Apply before using load predictions • $t_{discount}$ is an estimable machine property