
Predictive Application-Performance Modeling in a Computational Grid Environment (HPDC ‘99)

This presentation summarizes a paper that uses locally-weighted memory-based learning to predict the resource usage of application runs in a computational grid environment. A surprising result of the study is that the simplest prediction approach is often the best. The approach is implemented in PUNCH, a web-based, batch-oriented system for accessing non-interactive tools. The talk uses synthetic datasets to motivate a sophisticated approach and datasets from a real application to argue that a very simple one suffices, and it also covers the optimizations and algorithm improvements used in PUNCH.



Presentation Transcript


  1. Predictive Application-Performance Modeling in a Computational Grid Environment (HPDC ‘99) Nirav Kapadia, José Fortes, Carla Brodley (ECE, Purdue) Presented by Peter Dinda, CMU

  2. Summary • Use locally-weighted memory-based learning (instance-based learning) to predict each application run’s resource usage based on parameters specified by an application expert and measurements of previous application runs. • Surprising result: simplest is best • Implemented in the PUNCH system

  3. Outline • PUNCH • Resource usage and application parameters • Locally-weighted, memory-based learning • Synthetic datasets argue for a sophisticated approach • Algorithm optimizations in PUNCH • Datasets from a real application argue for a mind-numbingly simple approach

  4. PUNCH • “Purdue University Network Computing Hub” • Web-based batch-oriented system for accessing non-interactive tools • Tool-specific forms guide user in setting up a run • command-line parameters, input and output files • PUNCH schedules run on shared resources • Extensively used: 500 users, 135K runs • Mostly students taking ECE classes • Wide range of tools (over 40) • Paper focuses on T-Supreme3 • Simulates silicon fabrication • Really bad ideas: batch-oriented matlab

  5. Resource Usage • PUNCH needs to know resource usage (CPU time) to schedule run • Resource usage depends on application-specific parameters • command-line and input file parameters • Which ones? Specified by app expert • 7 parameters for T-Supreme3 • What is the relationship? Learn it on-line using locally-weighted memory-based learning

  6. Locally-weighted Memory-based Learning • Each time you run the application, record the parameter values and the resource usage in a database • Parameter values x -> resource usage y is the function to be learned • Parameter values x define a point in the domain • Predict resource usage yq of a new run whose parameters are xq, based on database records xi -> yi where the xi are “close” to xq
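
To make this concrete, here is a minimal Python sketch of the recording step (hypothetical names, not the PUNCH implementation): each completed run contributes one (parameter vector, measured CPU time) record to the database.

    # The "database" of previous runs: one (x, y) pair per completed run.
    history = []

    def record_run(x, cpu_seconds):
        """Store the parameter values and the measured resource usage of one run."""
        history.append((list(x), float(cpu_seconds)))

    # Example: a run of a tool described by 7 application-specific parameters.
    record_run([120, 4, 0.5, 2, 1, 300, 0], cpu_seconds=42.7)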

  7. Answering a Query • Compute distance d from query point xq to all points xi in database • Select subset of points within some distance kw (the neighborhood) • Transform distances to neighborhood points into weights using a kernel function K (Gaussian, say) • Fit a local model that tries to minimize the weighted sum of squared errors for the neighborhood • linear regression, ad hoc, mind-numbingly simple, ... • Apply the model to the query
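
A rough sketch of this query procedure, assuming Euclidean distance, a Gaussian kernel of width kw, and a weighted linear local model fit by least squares; the paper's actual distance metric and kernel details may differ.

    import numpy as np

    def predict(history, xq, kw=1.0):
        """Predict resource usage for query point xq from the stored runs."""
        X = np.array([x for x, _ in history], dtype=float)
        y = np.array([v for _, v in history], dtype=float)
        xq = np.asarray(xq, dtype=float)
        d = np.linalg.norm(X - xq, axis=1)                  # distance to every stored run
        near = d <= kw                                      # neighborhood of width kw
        if not near.any():                                  # no neighbors: fall back to the nearest point
            return float(y[np.argmin(d)])
        w = np.exp(-(d[near] / kw) ** 2)                    # Gaussian kernel weights
        A = np.hstack([X[near], np.ones((near.sum(), 1))])  # linear model with intercept
        sw = np.sqrt(w)[:, None]
        coef, *_ = np.linalg.lstsq(A * sw, y[near] * sw.ravel(), rcond=None)  # weighted least squares
        return float(np.append(xq, 1.0) @ coef)             # apply the local model to the query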

  8. PUNCH Approaches • I don’t understand their distance metric • Kernel is 1.0 to nearest neighbor and then Gaussian • 1-Nearest-Neighbor • Return the nearest neighbor • 3-Point Weighted Average • Return weighted average of 3 nearest points • Linear regression • 16 nearest points for T-Supreme3 • Theoretically much better than the others
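
Rough sketches of the three predictors, assuming plain Euclidean distance (the paper's metric is unclear) and that X and y are NumPy arrays holding the stored parameter vectors and measured CPU times.

    import numpy as np

    def nearest_neighbor(X, y, xq):
        """1-Nearest-Neighbor: return the usage of the closest stored run."""
        return float(y[np.argmin(np.linalg.norm(X - xq, axis=1))])

    def weighted_average_3(X, y, xq, kw=1.0):
        """3-Point Weighted Average, reading the kernel as 1.0 at the nearest point, Gaussian beyond."""
        d = np.linalg.norm(X - xq, axis=1)
        idx = np.argsort(d)[:3]
        w = np.exp(-((d[idx] - d[idx].min()) / kw) ** 2)
        return float(np.sum(w * y[idx]) / np.sum(w))

    def linear_regression_16(X, y, xq):
        """Local linear regression over the 16 nearest points (unweighted here for brevity)."""
        d = np.linalg.norm(X - xq, axis=1)
        idx = np.argsort(d)[:16]
        A = np.hstack([X[idx], np.ones((len(idx), 1))])
        coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        return float(np.append(xq, 1.0) @ coef)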

  9. Optimizations • 2-level database • Recent runs are preferred • Not clear how • May help when function is time dependent • when all students are doing the same homework • Significantly reduces query time • Instance editing • Add new runs only if incorrectly predicted • Remove runs that produce incorrect predictions • Shrink database without losing information
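
A sketch of the instance-editing idea, using a hypothetical relative-error tolerance (the paper does not spell out the exact rule): a finished run is stored only if the current model mispredicted it, which keeps the database small without losing information. Removing instances that cause bad predictions would follow the same pattern in reverse.

    def maybe_store(history, x, actual_cpu, predict_fn, tolerance=0.10):
        """Add the run only if it was predicted poorly; otherwise it adds no new information."""
        if not history:
            history.append((list(x), float(actual_cpu)))      # nothing to predict with yet
            return
        predicted = predict_fn(history, x)
        if abs(predicted - actual_cpu) / max(actual_cpu, 1e-9) > tolerance:
            history.append((list(x), float(actual_cpu)))      # mispredicted: keep this run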

  10. Conclusions • LWMBL looks like a promising approach to resource-usage prediction in some cases • Needs a much more thorough study, though, even for this batch-oriented use • “Simplest is best” is difficult to believe • Paper is a reasonable introduction to LWMBL for the grid community
