
Automated Parameter Setting Based on Runtime Prediction: Towards an Instance-Aware Problem Solver


Presentation Transcript


  1. Automated Parameter Setting Based on Runtime Prediction: Towards an Instance-Aware Problem Solver
  Frank Hutter, Univ. of British Columbia, Vancouver, Canada
  Youssef Hamadi, Microsoft Research, Cambridge, UK

  2. Motivation (1): Why automated parameter setting?
  • We want to use the best available heuristic for a problem
    • Strong domain-specific heuristics exist for tree search
    • Domain knowledge helps to pick good heuristics
    • But maybe you don't know the domain ahead of time ...
  • Local search parameters must be tuned
    • Performance depends crucially on the parameter setting
  • New application/algorithm:
    • Restart parameter tuning from scratch
    • A waste of time for both researchers and practitioners
  • Comparability:
    • Is algorithm A faster than algorithm B only because more time was spent tuning it?
  ⇒ Automated parameter setting

  3. Motivation (2): Operational scenario
  • A CP solver has to solve instances from a variety of domains
  • The domains are not known a priori
  • The solver should automatically use the best strategy for each instance
  • We want to learn from the instances we solve

  4. Overview
  • Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. '02 & '04]
  • Part I: Automated parameter setting based on runtime prediction
  • Part II: Incremental learning for runtime prediction in a priori unknown domains
  • Experiments
  • Conclusions

  5. Previous work on runtime prediction for algorithm selection
  • General approach:
    • Portfolio of algorithms
    • For each instance, choose the algorithm that promises to be fastest
  • Examples:
    • [Lobjois and Lemaître, AAAI'98] CSP: mostly propagation algorithms of different complexity
    • [Leyton-Brown et al., CP'02] Combinatorial auctions: CPLEX + 2 other algorithms (which were thought to be uncompetitive)
    • [Nudelman et al., CP'04] SAT: many tree-search algorithms from the last SAT competition
  • On average considerably faster than each single algorithm

  6. Runtime prediction: Basics (1 algorithm) [Leyton-Brown, Nudelman et al. '02 & '04]
  • Training (expensive): Given a set of t instances z1,...,zt
    • For each instance zi:
      • Compute features xi = (xi1,...,xim)
      • Run the algorithm to get its runtime yi
    • Collect the (xi, yi) pairs
    • Learn a function f: X → R (features → runtime) with yi ≈ f(xi)
  • Test (cheap): Given a new instance zt+1
    • Compute features xt+1
    • Predict the runtime yt+1 = f(xt+1)

  7. Runtime prediction: Linear regression [Leyton-Brown, Nudelman et al. '02 & '04]
  • The learned function f has to be linear in the features xi = (xi1,...,xim):
    • yi ≈ f(xi) = Σj=1..m xij * wj = xi · w
  • The learning problem thus reduces to fitting the weights w = (w1,...,wm)
  • To better capture the vast differences in runtime, estimate the logarithm of runtime:
    • e.g. yi = 5 ⇒ runtime is 10^5 sec
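A minimal sketch of this training-and-prediction pipeline, assuming numpy; compute_features and run_algorithm are hypothetical placeholders, and the fit is an ordinary least-squares fit of log10 runtime (not necessarily the exact estimator used in the cited work):

import numpy as np

def train_runtime_model(instances, compute_features, run_algorithm):
    X, y = [], []
    for z in instances:
        X.append(compute_features(z))          # cheap: feature vector (xi1,...,xim)
        y.append(np.log10(run_algorithm(z)))   # expensive: measured runtime, log-transformed
    # fit yi ~ xi . w by least squares
    w, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return w

def predict_runtime(w, features):
    return 10.0 ** float(np.dot(features, w))  # back-transform from log10 to seconds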

  8. Runtime prediction: Feature engineering [Leyton-Brown, Nudelman et al. '02 & '04]
  • Features can be computed quickly (in seconds):
    • Basic properties like #vars, #clauses, and their ratio
    • Estimates of search space size
    • Linear programming bounds
    • Local search probes
  • Linear functions are not very powerful
    • But you can use the same methodology to learn more complex functions
    • Let φ = (φ1,...,φq) be arbitrary combinations of the features x1,...,xm (so-called basis functions)
    • Learn a linear function of the basis functions: f(φ) = φ · w
  • Basis functions used in [Nudelman et al. '04]:
    • Original features: xi
    • Pairwise products of features: xi * xj
    • Only a subset of these (useless basis functions are dropped)
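A small sketch of that quadratic expansion; the bias term and the choice to keep all products are illustrative, whereas the cited work additionally drops uninformative basis functions:

import numpy as np
from itertools import combinations_with_replacement

def quadratic_basis(x):
    x = np.asarray(x, dtype=float)
    pairs = [x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)]
    return np.concatenate(([1.0], x, pairs))   # bias + original features + pairwise products

phi = quadratic_basis([0.5, 4.3, 120.0])       # 3 raw features -> 1 + 3 + 6 = 10 basis functions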

  9. Algorithm selection based on runtime prediction [Leyton-Brown, Nudelman et al. '02 & '04]
  • Given n different algorithms A1,...,An
  • Training (really expensive):
    • Learn n separate functions fj: Φ → R, j = 1...n
  • Test (cheap):
    • Predict the runtime yjt+1 = fj(φt+1) for each of the algorithms
    • Choose the algorithm Aj with minimal yjt+1
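As a sketch, per-instance selection then reduces to an argmin over the per-algorithm predictions; here models is assumed to map an algorithm name to its learned weight vector, and phi is the basis-function vector of the new instance:

import numpy as np

def select_algorithm(models, phi):
    predicted = {name: float(np.dot(phi, w)) for name, w in models.items()}
    return min(predicted, key=predicted.get)   # algorithm with smallest predicted log-runtime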

  10. Overview
  • Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. '02 & '04]
  • Part I: Automated parameter setting based on runtime prediction
  • Part II: Incremental learning for runtime prediction in a priori unknown domains
  • Experiments
  • Conclusions

  11. Parameter setting based on runtime prediction
  • Finding the best default parameter setting for a problem class:
    • Generate special purpose code [Minton '93]
    • Minimize estimated error [Kohavi & John '95]
    • Racing algorithm [Birattari et al. '02]
    • Local search [Hutter '04]
    • Experimental design [Adenso-Díaz & Laguna '05]
    • Decision trees [Srivastava & Mediratta '05]
  • Runtime prediction for algorithm selection on a per-instance basis:
    • Predict the runtime for each algorithm and pick the best [Leyton-Brown, Nudelman et al. '02 & '04]
  • Runtime prediction for setting parameters on a per-instance basis

  12. Naive application of runtime prediction for parameter setting
  • Given one algorithm with n different parameter settings P1,...,Pn
  • Training (too expensive):
    • Learn n separate functions fj: Φ → R, j = 1...n
  • Test (fairly cheap):
    • Predict the runtime yjt+1 = fj(φt+1) for each of the parameter settings
    • Run the algorithm with the setting Pj with minimal yjt+1
  • If there are too many parameter configurations:
    • We cannot run each parameter setting on each instance
    • We need to generalize (cf. human parameter tuning)
    • With separate functions there is no way to generalize

  13. Generalization by parameter sharing
  [Figure: the naive approach learns separate weight vectors w1,...,wn from (X1:t, y1 1:t),...,(X1:t, yn 1:t); our approach learns a single weight vector w from all of these data.]
  • Naive approach: n separate functions
    • Information on the runtime of setting i cannot inform predictions for setting j ≠ i
  • Our approach: 1 single function
    • Information on the runtime of setting i can inform predictions for setting j ≠ i

  14. Application of runtime prediction for parameter setting
  • View the parameters as additional features and learn a single function
  • Training (moderately expensive): Given a set of instances z1,...,zt
    • For each instance zi:
      • Compute features xi
      • Pick some parameter settings p1,...,pn
      • Run the algorithm with settings p1,...,pn to get runtimes y1i,...,yni
      • The basis functions φ1i,...,φni include the parameter settings
      • Collect the pairs (φji, yji) (n data points per instance)
    • Learn only a single function g: Φ → R
  • Test (cheap): Given a new instance zt+1
    • Compute features xt+1
    • Search over parameter settings pj; to evaluate a setting, compute φjt+1 and check g(φjt+1)
    • Run with the best predicted parameter setting p*
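A minimal sketch of that test-time search, assuming a linear model g and a simple concatenation of instance features and parameter values as the basis functions; the helper names and the grid values are illustrative only:

import numpy as np
from itertools import product

def make_phi(x, p):
    # illustrative basis: bias + instance features + parameter values
    return np.concatenate(([1.0], np.asarray(x, float), np.asarray(p, float)))

def best_setting(w, x, candidate_settings):
    preds = [(float(np.dot(make_phi(x, p), w)), p) for p in candidate_settings]
    return min(preds)[1]                       # setting with minimal predicted log-runtime

grid = list(product([1.1, 1.2, 1.3], [0.5, 0.6, 0.7, 0.8]))   # e.g. a small (alpha, rho) grid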

  15. Summary of automated parameter setting based on runtime prediction
  • Learn a single function that maps features and parameter settings to runtime
  • Given a new instance:
    • Compute the features (they are fixed)
    • Search for the parameter setting that minimizes the predicted runtime for these features

  16. Overview
  • Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. '02 & '04]
  • Part I: Automated parameter setting based on runtime prediction
  • Part II: Incremental learning for runtime prediction in a priori unknown domains
  • Experiments
  • Conclusions

  17. Problem setting: Incremental learning for multiple domains

  18. Solution: Sequential Bayesian Linear Regression
  • Update "knowledge" as new data arrives: a probability distribution over the weights w
  • Incremental (one (xi, yi) pair at a time)
    • Seamlessly integrates new data
    • "Optimal": yields the same result as a batch approach
  • Efficient
    • Computation: 1 matrix inversion per update
    • Memory: we can drop data we have already integrated
  • Robust
  • Simple to implement (3 lines of Matlab)
  • Provides estimates of the uncertainty in each prediction
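A minimal sketch of one such update, in Python rather than the Matlab mentioned on the slide; it assumes a Gaussian prior over the weights, a fixed noise variance sigma2, 1-D numpy arrays for mu and phi, and uses explicit matrix inversions for clarity (a careful implementation would keep the precision matrix or use a rank-one update):

import numpy as np

def bayes_update(mu, Sigma, phi, y, sigma2=1.0):
    # fold one (phi, y) pair into the Gaussian "knowledge" about the weights w
    phi_col = phi.reshape(-1, 1)
    Sigma_new = np.linalg.inv(np.linalg.inv(Sigma) + (phi_col @ phi_col.T) / sigma2)
    mu_new = Sigma_new @ (np.linalg.inv(Sigma) @ mu + phi * y / sigma2)
    return mu_new, Sigma_new

d = 10                                          # number of basis functions
mu, Sigma = np.zeros(d), 100.0 * np.eye(d)      # broad prior: very high uncertainty
# mu, Sigma = bayes_update(mu, Sigma, phi_i, y_i)   # called once per new data point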

  19. What are uncertainty estimates?

  20. Sequential Bayesian linear regression: intuition
  • Instead of predicting a single runtime y, use a probability distribution P(Y)
  • The mean of P(Y) is exactly the prediction of the non-Bayesian approach, but we also get uncertainty estimates
  [Figure: a distribution P(Y) over log runtime Y; its mean is the predicted runtime, its spread the uncertainty of the prediction.]

  21. Sequential Bayesian linear regression: technical
  • Standard linear regression:
    • Training: given training data φ1:n, y1:n, fit the weights w such that y1:n ≈ Φ1:n · w
    • Prediction: yn+1 = φn+1 · w
  • Bayesian linear regression:
    • Training: given training data φ1:n, y1:n, infer a probability distribution P(w | φ1:n, y1:n) ∝ P(w) · Πi P(yi | φi, w)
    • Prediction: P(yn+1 | φn+1, φ1:n, y1:n) = ∫ P(yn+1 | w, φn+1) · P(w | φ1:n, y1:n) dw
  • "Knowledge" about the weights: a Gaussian (μw, Σw); prior, (assumed) likelihood, and posterior are all Gaussian
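Under these Gaussian assumptions the predictive distribution is itself Gaussian with a closed form; a small sketch, with mu_w and Sigma_w as maintained by the update above and sigma2 the assumed observation-noise variance:

import numpy as np

def predictive_distribution(mu_w, Sigma_w, phi, sigma2=1.0):
    mean = float(phi @ mu_w)                       # predicted mean log-runtime
    var = float(phi @ Sigma_w @ phi) + sigma2      # predictive variance: model uncertainty + noise
    return mean, var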

  22. Sequential Bayesian linear regression: visualized
  • Start with a prior P(w) with very high uncertainty
  • First data point (φ1, y1): the posterior is P(w | φ1, y1) ∝ P(w) · P(y1 | φ1, w)
  [Figure: the distribution P(wi) over a weight wi before and after seeing (φ1, y1), and the corresponding predictive distributions P(y2 | φ, w) over log runtime y2 under the prior w and under the posterior w | φ1, y1.]

  23. Summary of incremental learning for runtime prediction
  • We maintain a probability distribution over the weights:
    • Start with a Gaussian prior and incrementally update it with more data
  • Given the Gaussian weight distribution, the predictions are also Gaussians
    • We know how uncertain our predictions are
    • For new domains we will be very uncertain, and we only grow more confident after having seen a couple of data points

  24. Overview
  • Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. '02 & '04]
  • Part I: Automated parameter setting based on runtime prediction
  • Part II: Incremental learning for runtime prediction in a priori unknown domains
  • Experiments
  • Conclusions

  25. Domain for our experiments
  • SAT
    • Best studied NP-hard problem
    • Good features already exist [Nudelman et al. '04]
    • Lots of benchmarks
  • Stochastic Local Search (SLS)
    • Runtime prediction has never been done for SLS before
    • Parameter tuning is very important for SLS
    • Parameters are often continuous
  • SAPS algorithm [Hutter, Tompkins, Hoos '02]
    • Still amongst the state of the art
    • Default setting not always best
    • Well, I also know it well ;-)
  • But the approach is applicable to almost anything for which we can compute features!

  26. Stochastic Local Search for SAT: Scaling and Probabilistic Smoothing (SAPS) [Hutter, Tompkins, Hoos '02]
  • Clause weighting algorithm for SAT, was state of the art in 2002
  • Start with all clause weights set to 1
  • Hill-climb until you hit a local minimum
  • In local minima:
    • Scaling: scale the weights of unsatisfied clauses: wc ← α * wc
    • Probabilistic smoothing: with probability Psmooth, smooth all clause weights: wc ← ρ * wc + (1-ρ) * average wc
  • Default parameter setting: (α, ρ, Psmooth) = (1.3, 0.8, 0.05)
  • Psmooth and ρ are very closely related
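A minimal sketch of that local-minimum update; the surrounding hill-climbing loop is omitted, and weights is assumed to be a dict mapping clause index to clause weight:

import random

def saps_local_minimum_update(weights, unsat_clauses, alpha=1.3, rho=0.8, p_smooth=0.05):
    for c in unsat_clauses:                    # scaling: wc <- alpha * wc
        weights[c] *= alpha
    if random.random() < p_smooth:             # probabilistic smoothing, with probability Psmooth
        avg = sum(weights.values()) / len(weights)
        for c in weights:                      # wc <- rho * wc + (1 - rho) * average weight
            weights[c] = rho * weights[c] + (1 - rho) * avg
    return weights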

  27. Benchmark instances • Only satisfiable instances! • SAT04rand: SAT '04 competition instances • mix: mix of lots of different domains from SATLIB: random, graph colouring, blocksworld, inductive inference, logistics, ...

  28. Adaptive parameter setting vs. SAPS default on SAT04rand
  • Trained on mix and used to choose parameters for SAT04rand
  • ρ ∈ {0.5, 0.6, 0.7, 0.8}
  • α ∈ {1.1, 1.2, 1.3}
  • For SAPS: #steps ≈ time
  • The adaptive variant is on average 2.5 times faster than the default
  • But the default is not strong here

  29. Where uncertainty helps in practice: qualitative differences between training and test set
  • Trained on mix, tested on SAT04rand
  [Figure: predictions with estimates of the uncertainty of each prediction; a line marks the optimal prediction.]

  30. Where uncertainty helps in practice (2): zoomed in to predictions with low uncertainty
  [Figure: the same plot restricted to low-uncertainty predictions; a line marks the optimal prediction.]

  31. Overview
  • Previous work on runtime prediction that we build on [Leyton-Brown, Nudelman et al. '02 & '04]
  • Part I: Automated parameter setting based on runtime prediction
  • Part II: Incremental learning for runtime prediction in a priori unknown domains
  • Experiments
  • Conclusions

  32. Conclusions
  • Automated parameter tuning is needed and feasible
    • Algorithm experts currently waste their time on it
    • A solver can automatically choose appropriate heuristics based on instance characteristics
  • Such a solver could be used in practice
    • It learns incrementally from the instances it solves
    • Uncertainty estimates prevent catastrophic errors in the estimates for new domains

  33. Future work along these lines
  • Increase predictive performance
    • Better features
    • More powerful ML algorithms
  • Active learning
    • Run the most informative probes for new domains (needs the uncertainty estimates)
  • Use the uncertainty
    • Pick the algorithm with the maximal probability of success (not the one with minimal expected runtime!)
  • More domains
    • Tree search algorithms
    • CP

  34. Future work along related lines
  • If there are no features:
    • Local search in parameter space to find the best default parameter setting [Hutter '04]
  • If we can change strategies while running the algorithm:
    • Reinforcement learning for algorithm selection [Lagoudakis & Littman '00]
    • Low knowledge algorithm control [Carchrae and Beck '05]

  35. The End
  • Thanks to:
    • Youssef Hamadi
    • Kevin Leyton-Brown
    • Eugene Nudelman
    • You, for your attention
