Predicting Queue Waiting Time in Batch Controlled Systems

Predicting Queue Waiting Time in Batch Controlled Systems • Rich Wolski, Dan Nurmi, John Brevik, Graziano Obertelli • Computer Science Department, University of California, Santa Barbara

Problem: Predicting Delay in Batch Queues • Time in queue is experienced as application delay • Sounds like an easy problem, but • Distribution of load from users is a matter of some debate • Scheduling policy is partially hidden • Sites need to change the policies dynamically and without warning • Job execution times are difficult to predict • Much research in this area over the past 20 years, but few solutions • Current commercial systems provide high-variance estimates • Most sites simply disable this feature

Hard Problem

For Scheduling: It’s all about the big Q • Predictions of the form • “What is the maximum time my job will wait with X% certainty?” • “What is the minimum time my job will wait with X% certainty?” • Requires two estimates if certainty is to be quantified • Estimate the (1-X) quantile for the distribution of availability => Qx • Estimate the upper or lower X% confidence bound on the statistic Qx => Q(x,lb) • If the estimates are unbiased and the distribution is stationary, future availability duration will be larger than Q(x,lb) X% of the time, guaranteed
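The two-step argument on this slide can be written out as a short derivation. This is a sketch under assumptions: the symbol A (the availability duration being predicted) is introduced here for illustration, X is read as a fraction between 0 and 1, and the distribution of A is assumed continuous and stationary.

```latex
% Step 1: Q_x is the (1 - X) quantile of A
\Pr[A \le Q_x] = 1 - X
\quad\Longleftrightarrow\quad
\Pr[A > Q_x] = X

% Step 2: Q_{(x,lb)} is a lower confidence bound on Q_x at level X
\Pr\big[\, Q_{(x,lb)} \le Q_x \,\big] \ge X

% Consequence: whenever the bound holds, i.e. Q_{(x,lb)} \le Q_x,
\Pr[A > Q_{(x,lb)}] \;\ge\; \Pr[A > Q_x] \;=\; X
```

For the "maximum wait" question the same reasoning runs in mirror image: take the X quantile of the wait-time distribution and an upper confidence bound on it.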

New Predictive Methodology • New quantile estimator based on the Binomial distribution • Requires a carefully engineered numerical system to deal with large-scale combinatorics • New changepoint detector • Binomial method in a time series context is difficult • Need a system for determining • Stationary regions in the data • Minimum statistically meaningful history in each region • New clustering methodology (coming soon) • More accurate estimates are possible if predictions are made from jobs with similar characteristics • Takes dynamic policy changes into account more effectively
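A minimal sketch of a binomial quantile bound, assuming the standard order-statistic construction rather than the authors' exact implementation. The function names (`binom_cdf`, `min_history`, `upper_bound_on_quantile`) are hypothetical. The binomial terms are summed in log space, which is one simple way to cope with the large combinatorial coefficients the slide mentions.

```python
import math

def binom_cdf(k, n, q):
    """P[Binomial(n, q) <= k], with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_q, log_1mq = math.log(q), math.log1p(-q)
    total = 0.0
    for j in range(k + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(j + 1)
                    - math.lgamma(n - j + 1)
                    + j * log_q + (n - j) * log_1mq)
        total += math.exp(log_term)
    return min(total, 1.0)

def min_history(q, confidence):
    """Smallest n for which the sample maximum can serve as the bound.

    The maximum of n observations is >= the q quantile with
    probability 1 - q**n, so we need 1 - q**n >= confidence.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(q))

def upper_bound_on_quantile(waits, q=0.95, confidence=0.95):
    """Smallest order statistic that bounds the q quantile from above.

    The k-th smallest observation is below the true q quantile only if
    at least k observations fall below it, so
    P[x_(k) >= Q_q] = P[Binomial(n, q) <= k - 1].
    Returns None when the history is too short for any bound to exist.
    """
    xs = sorted(waits)
    n = len(xs)
    for k in range(1, n + 1):
        if binom_cdf(k - 1, n, q) >= confidence:
            return xs[k - 1]
    return None
```

Under these assumptions, a 95% confidence bound on the 0.95 quantile needs at least `min_history(0.95, 0.95)` observations in a stationary region. With fewer, `upper_bound_on_quantile` returns `None` instead of guessing.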

Ten Years of Supercomputing

See it In Action • http://pompone.cs.ucsb.edu/~rgarver/bqindex.php

Predicting Things Upside Down • Deadline scheduling: My job needs to start in the next X seconds for the results to be meaningful. • Amitava Mujumdar, Tharaka Devaditha, Adam Birnbaum (SDSC) • Need to run a 4-minute image reconstruction that completes in the next 8 minutes • Given a • Machine • Queue • Processor count • Run time • Deadline • What is the probability that a job will meet the deadline? • http://pompone.cs.ucsb.edu/~rgarver/invbqueue.php
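One way to read the inverted question: a job meets its deadline if it waits no longer than the deadline minus the run time, so the task is to find the highest quantile whose upper confidence bound still fits in that window. The sketch below assumes this interpretation and is not the code behind the service linked above. All function names are hypothetical, and the history is assumed to hold past waits from the same machine, queue, and processor-count range.

```python
import math

def binom_cdf(k, n, q):
    """P[Binomial(n, q) <= k], with each term computed in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    log_q, log_1mq = math.log(q), math.log1p(-q)
    return min(1.0, sum(
        math.exp(math.lgamma(n + 1) - math.lgamma(j + 1)
                 - math.lgamma(n - j + 1)
                 + j * log_q + (n - j) * log_1mq)
        for j in range(k + 1)))

def quantile_upper_bound(sorted_waits, q, confidence):
    """Smallest order statistic that is >= the q quantile with the
    requested confidence, or None if the history is too short."""
    n = len(sorted_waits)
    for k in range(1, n + 1):
        if binom_cdf(k - 1, n, q) >= confidence:
            return sorted_waits[k - 1]
    return None

def deadline_probability(waits, run_time, deadline, confidence=0.95):
    """Conservative estimate of P[job finishes by the deadline].

    Scans quantiles in 1% steps and returns the largest q whose upper
    confidence bound is no greater than the allowed wait. Returns 0.0
    if even the 1% quantile bound does not fit.
    """
    allowed_wait = deadline - run_time
    if allowed_wait < 0:
        return 0.0
    xs = sorted(waits)
    best = 0.0
    for pct in range(1, 100):
        q = pct / 100.0
        bound = quantile_upper_bound(xs, q, confidence)
        # The bound is non-decreasing in q, so stop at the first miss.
        if bound is None or bound > allowed_wait:
            break
        best = q
    return best
```

For the slide's example, `deadline_probability(history, run_time=240, deadline=480)` asks how likely a 4-minute job is to finish within 8 minutes, with all times in seconds. The answer is capped by the history length: with 100 observations and 95% confidence, no quantile above 0.97 can be bounded.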

How Well Does it Work with an Application? • [Figure: EMAN workflow; labels: Electron Micrograph, Particles, Preliminary 3D Model, Refine, Final 3D Model] • EMAN has been developed at Baylor College of Medicine by the research group of Wah Chiu and Steven Ludtke ({wah,sludtke}@bcm.tmc.edu)

VGrADS EMAN Batch Scheduler • EMAN emulator • Run the EMAN scheduler to determine a job launch sequence • Launch the jobs by submitting them to the queues specified by the scheduler • When an EMAN job acquires the processors, exit and “sleep” the emulator for the predicted execution time • Saves system allocation time • Record the overall makespan • Experiment: • Chicago TeraGrid, SDSC TeraGrid, NCSA TeraGrid and CNSI Dell at UCSB • 57 separate runs • Results: mean observed and mean predicted makespans are not significantly different at alpha = 0.05

95% Upper Bound on Median

Clustering • RMS ratio of Binomial with Clustering to without • Both achieve 95% correctness • Measures “tightness” improvement through clustering
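The comparison on this slide can be sketched as follows. The slide does not say which error the RMS is taken over, so this assumes it is the gap between each predicted bound and the wait the job actually experienced. The function names and the numbers in the usage note are hypothetical and are not results from the talk.

```python
import math

def rms_error(bounds, actual_waits):
    """Root-mean-square gap between predicted bounds and observed waits."""
    n = len(actual_waits)
    return math.sqrt(
        sum((b - a) ** 2 for b, a in zip(bounds, actual_waits)) / n)

def correctness(bounds, actual_waits):
    """Fraction of jobs whose observed wait did not exceed the bound."""
    n = len(actual_waits)
    return sum(a <= b for b, a in zip(bounds, actual_waits)) / n

def rms_ratio(clustered_bounds, plain_bounds, actual_waits):
    """RMS with clustering divided by RMS without it.

    A value below 1 means the clustered bounds sit closer to the
    observed waits, i.e. they are tighter.
    """
    return (rms_error(clustered_bounds, actual_waits)
            / rms_error(plain_bounds, actual_waits))
```

With made-up waits `[10, 20, 30]`, unclustered bounds `[40, 50, 60]`, and clustered bounds `[20, 30, 40]`, both sets of bounds are correct for every job and the ratio is one third. The ratio only indicates a real improvement when the two methods reach the same correctness level, which is why the slide notes that both achieve 95%.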

Batch Queue Prediction for Grid Systems • A good point-valued prediction remains elusive • Grid users certainly can use bounds instead • Early job completion is okay, typically • Bounds give a good intuitive feel for which queue will be quickest • Automatic schedulers are coming • EMAN doesn’t use ranges…it should • VGrADS is developing new schedulers (workflow) • NEESGrid and ISI are in development (workflow) • Large-scale sensor network simulation

What’s Next? • Open questions: • Does the availability of predictions affect load? • Rolling out production tools now and we will be monitoring • Job cancellation does not affect results • If it does, will allocations be stable? • Grid economies • Virtual resource reservations (VGrADS) • Conditional prediction and resubmission • Virtual Cluster?? • Thanks • NSF SCI, VGrADS, SDSC, TACC • Us: rich@cs.ucsb.edu, nurmi@cs.ucsb.edu
