Optimal Design for Experimental Inputs in Simulation

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall
CHAPTER 17: OPTIMAL DESIGN FOR EXPERIMENTAL INPUTS
Organization of chapter in ISSO*
• Background
• Motivation
• Finite sample and asymptotic (continuous) designs
• Precision matrix and D-optimality
• Linear models
• Connections to D-optimality
• Key equivalence theorem
• Response surface methods
• Nonlinear models
*Note: Appendix to these slides is brief discussion of factorial design (not in ISSO)

Optimal Design in Simulation
• Two roles for experimental design in simulation:
• Building approximation to existing large-scale simulation via “metamodel”
• Building simulation model itself
• Metamodels are “curve fits” that approximate simulation input/output
• Usual form is low-order polynomial in the inputs; linear in parameters $\theta$
• Linear design theory useful
• Building simulation model
• Typically need nonlinear design theory
• Some terminology distinctions:
• “Factors” (statistics term) ↔ “Inputs” (modeling and simulation term)
• “Levels” ↔ “Values”
• “Treatments” ↔ “Runs”

Unique Advantages of Design in Simulation
• Simulation experiments may be considered special case of general experiments
• Some unique benefits occur due to simulation structure
• Can control factors not generally controllable (e.g., arrival rates into network)
• Direct repeatability due to deterministic nature of random number generators
• Variance reduction (CRNs, etc.) may be helpful
• Not necessary to randomize runs to avoid systematic variation due to inherent conditions
• E.g., randomization in run order and input levels in biological experiment to reduce effects of change in ambient humidity in laboratory
• In simulation, systematic effects can be eliminated since analyst controls nature

Design of Computer Experiments in Statistics
• There exists significant activity among statisticians for experimental design based on computer experiments
• T. J. Santner et al. (2003), The Design and Analysis of Computer Experiments, Springer-Verlag
• J. Sacks et al. (1989), “Design and Analysis of Computer Experiments (with discussion),” Statistical Science, 409–435
• Etc.
• Above statistical work differs from experimental design with Monte Carlo simulations
• Above work assumes deterministic function evaluations via computer (e.g., solution to complicated ODE)
• One implication of deterministic function evaluations: no need to replicate experiments for given set of inputs
• Contrasts with Monte Carlo, where replication provides variance reduction

General Optimal Design Formulation (Simulation or Non-Simulation)
• Assume model $z = h(\theta, x) + v$, where $x$ is an input we are trying to pick optimally
• Experimental design $\xi$ consists of $N$ specific input values $x = \chi_i$ and proportions (weights) $w_i$ assigned to these input values: $\xi = \{\chi_1, \chi_2, \ldots, \chi_N;\ w_1, w_2, \ldots, w_N\}$
• Finite-sample design allocates $n \ge N$ available measurements exactly; asymptotic (continuous) design allocates based on $n \to \infty$

D-Optimal Criterion
• Picking optimal design $\xi$ requires criterion for optimization
• Most popular criterion is D-optimal measure
• Let $M(\theta, \xi)$ denote the “precision matrix” for an estimate of $\theta$ based on a design $\xi$
• $M(\theta, \xi)$ is inverse of covariance matrix for estimate and/or
• $M(\theta, \xi)$ is Fisher information matrix for estimate
• D-optimal solution is $\xi^* = \arg\max_{\xi} \det M(\theta, \xi)$
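To make the criterion concrete, here is a minimal Python sketch that scores a few candidate designs by the determinant of the precision matrix. Everything specific in it is an assumption for illustration and not taken from the slides: the simple linear regression model with regressor vector [1, x] on the interval [-1, 1], unit noise variance, the per-measurement (weighted) form of the precision matrix, and the three candidate designs.

```python
def precision_matrix(points, weights):
    """Weighted precision matrix sum_i w_i * h(x_i) h(x_i)^T for h(x) = [1, x]^T."""
    m00 = sum(weights)
    m01 = sum(w * x for x, w in zip(points, weights))
    m11 = sum(w * x * x for x, w in zip(points, weights))
    return [[m00, m01], [m01, m11]]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


# Hypothetical candidate designs: (input values, weights summing to 1).
candidates = {
    "endpoints": ([-1.0, 1.0], [0.5, 0.5]),
    "three equally spaced": ([-1.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3]),
    "clustered near center": ([-0.5, 0.5], [0.5, 0.5]),
}

# D-optimal criterion: larger determinant is better.
scores = {name: det2(precision_matrix(p, w)) for name, (p, w) in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: det M = {score:.4f}")
print("best of these candidates:", best)
```

Among these three candidates, placing half the weight at each end of the interval gives the largest determinant, which matches the usual intuition that spreading inputs apart improves the slope estimate.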

Equivalence Theorem
• Consider linear model $z = h^T \theta + v$, where $h = h(x)$ depends on the input $x$
• Prediction based on parameter estimate $\hat{\theta}_n$ and “future” measurement vector $h^T$ is $\hat{z} = h^T \hat{\theta}_n$
• Kiefer-Wolfowitz equivalence theorem states: D-optimal solution for determining $\xi$ to be used in forming $\hat{\theta}_n$ is the same $\xi$ that minimizes the maximum variance of predictor $\hat{z}$
• Useful in practical determination of optimal $\xi$
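The equivalence can be checked numerically on a small case. The sketch below is an illustration under assumptions that are not in the slides: a simple linear regression with regressor vector [1, x] on [-1, 1], unit noise variance, and three hypothetical candidate designs. It evaluates the normalized prediction variance h(x)^T M^{-1} h(x) over a grid of future inputs and reports the maximum for each design. For the design that maximizes det M among these candidates, the maximum variance equals the number of parameters (2), which is the value the theorem gives for a D-optimal design.

```python
def precision_matrix(points, weights):
    """Weighted precision matrix sum_i w_i * h(x_i) h(x_i)^T for h(x) = [1, x]^T."""
    m00 = sum(weights)
    m01 = sum(w * x for x, w in zip(points, weights))
    m11 = sum(w * x * x for x, w in zip(points, weights))
    return [[m00, m01], [m01, m11]]


def variance_function(x, m):
    """Normalized prediction variance h(x)^T M^{-1} h(x) for a 2x2 matrix M."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    h = [1.0, x]
    return sum(h[i] * inv[i][j] * h[j] for i in range(2) for j in range(2))


# Hypothetical candidate designs: (input values, weights summing to 1).
candidates = {
    "endpoints": ([-1.0, 1.0], [0.5, 0.5]),
    "three equally spaced": ([-1.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3]),
    "clustered near center": ([-0.5, 0.5], [0.5, 0.5]),
}

grid = [i / 100 - 1.0 for i in range(201)]  # future inputs from -1 to 1

max_var = {}
for name, (points, weights) in candidates.items():
    m = precision_matrix(points, weights)
    max_var[name] = max(variance_function(x, m) for x in grid)
    print(f"{name}: max prediction variance = {max_var[name]:.3f}")

minimax_design = min(max_var, key=max_var.get)
print("smallest maximum variance:", minimax_design)
```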

Variance Function as it Depends on Input: Optimal Asymptotic Design for Example 17.6 in ISSO

Orthogonal Designs
• With linear models, usually more than one solution is D-optimal
• Orthogonality is means of reducing number of solutions
• Orthogonality also introduces desirable secondary properties
• Separates effects of input factors (avoids “aliasing”)
• Makes estimates for elements of $\theta$ uncorrelated
• Orthogonal designs are not generally D-optimal; D-optimal designs are not generally orthogonal
• However, some designs are both
• Classical factorial (“cubic”) designs are orthogonal (and often D-optimal)

Example Orthogonal Designs, r = 2 Factors
[Figure: cube ($2^r$ design) and star ($2r$ design), each drawn on axes $x_{k1}$ and $x_{k2}$]

Example Orthogonal Designs, r = 3 Factors
[Figure: cube ($2^r$ design) and star ($2r$ design), each drawn on axes $x_{k1}$, $x_{k2}$, and $x_{k3}$]

Response Surface Methodology (RSM)
• Suppose we want to determine inputs $x$ that minimize the mean response $E(z)$ of some process
• There are also other (nonoptimization) uses for RSM
• RSM can be used to build local models with the aim of finding the optimal $x$
• Based on building a sequence of local models as one moves through factor ($x$) space
• Each response surface is typically a simple regression polynomial
• Experimental design can be used to determine input values for building response surfaces

Steps of RSM for Optimizing x
Step 0 (Initialization) Initial guess at optimal value of x.
Step 1 (Collect data) Collect responses z from several x values in neighborhood of current estimate of best x value (can use experimental design).
Step 2 (Fit model) From the x, z pairs in step 1, fit regression model in region around current best estimate of optimal x.
Step 3 (Identify steepest descent path) Based on response surface in step 2, estimate path of steepest descent in factor space.
Step 4 (Follow steepest descent path) Perform series of experiments at x values along path of steepest descent until no additional improvement in z response is obtained. This x value represents new estimate of best vector of factor levels.
Step 5 (Stop or return) Go to step 1 and repeat process until final best factor level is obtained.
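The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure from ISSO: the noisy quadratic `simulate` function, the two-level factorial used to collect data, the first-order (planar) local model, the fixed step size along the path, and the rule of stopping when a round produces no improvement are all hypothetical choices made to keep the example short.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def simulate(x):
    """Hypothetical noisy response; E(z) is minimized at x = (2, -1)."""
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2 + rng.gauss(0.0, 0.01)


def fit_slopes(center, half_width, reps=4):
    """Steps 1-2: run a 2^2 factorial around center and fit a first-order model.

    In coded units (+1/-1) the design is orthogonal, so each least-squares
    slope is just the average of (coded level * response).
    """
    corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
    n = len(corners) * reps
    slopes = [0.0, 0.0]
    for u in corners:
        x = [center[i] + half_width * u[i] for i in range(2)]
        for _ in range(reps):
            z = simulate(x)
            for i in range(2):
                slopes[i] += u[i] * z / n
    return slopes


def rsm_minimize(x0, half_width=0.25, step=0.25, max_rounds=20):
    x = list(x0)  # Step 0: initial guess
    for _ in range(max_rounds):
        b = fit_slopes(x, half_width)
        norm = (b[0] ** 2 + b[1] ** 2) ** 0.5
        if norm == 0.0:
            break
        direction = [-b[0] / norm, -b[1] / norm]  # Step 3: steepest descent
        best_z = simulate(x)
        moved = False
        while True:  # Step 4: follow the path while the response improves
            trial = [x[i] + step * direction[i] for i in range(2)]
            trial_z = simulate(trial)
            if trial_z >= best_z:
                break
            x, best_z, moved = trial, trial_z, True
        if not moved:  # Step 5: stop when a round yields no improvement
            break
    return x


x_best = rsm_minimize([0.0, 0.0])
print("estimated best x:", x_best)
```

A first-order model cannot resolve curvature, so the final estimate is only accurate to roughly one step length; a practical study would typically switch to a second-order model with a more refined design near the solution.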

Conceptual Illustration of RSM for Two Variables in x; Shows More Refined Experimental Design Near Solution Adapted from: Montgomery (2005), Design and Analysis of Experiments, Fig. 11-3

Nonlinear Design
• Assume model $z = h(\theta, x) + v$, where $\theta$ enters nonlinearly and $x$ is r-dimensional input vector
• D-optimality remains dominant measure
• Maximization of determinant of Fisher information matrix (from Chapter 13 of ISSO: $F_n(\theta, X)$ is Fisher information matrix based on $n$ inputs in $n \times r$ matrix $X$)
• Fundamental distinction from linear case is that D-optimal criterion depends on $\theta$
• Leads to conundrum: Choosing $X$ to best estimate $\theta$, yet need to know $\theta$ to determine $X$

Strategies for Coping with Dependence on $\theta$
• Assume nominal value of $\theta$ and develop an optimal design based on this fixed value
• Sequential design strategy based on an iterated design and model fitting process
• Bayesian strategy where a prior distribution is assigned to $\theta$, reflecting uncertainty in the knowledge of the true value of $\theta$

Sequential Approach for Parameter Estimation and Optimal Design
Step 0 (Initialization) Make initial guess at $\theta$, $\hat{\theta}_0$. Allocate $n_0$ measurements to initial design. Set k = 0 and n = 0.
Step 1 (D-optimal maximization) Given $X_n$, choose the $n_k$ inputs in $X = X_{n_k}$ to maximize $\det[F_n(\hat{\theta}_n, X_n) + F_{n_k}(\hat{\theta}_n, X)]$.
Step 2 (Update $\theta$ estimate) Collect $n_k$ measurements based on inputs from step 1. Use measurements to update from $\hat{\theta}_n$ to $\hat{\theta}_{n+n_k}$.
Step 3 (Stop or return) Stop if the value of $\theta$ in step 2 is satisfactory. Else return to step 1 with the new k set to the former k + 1 and the new n set to the former n + $n_k$ (updated $X_n$ now includes inputs from step 1).
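A minimal Python sketch of this design-then-estimate loop follows. It is an illustration only, and every specific in it is an assumption rather than something from the slides: the scalar-parameter model h(theta, x) = exp(-theta * x), Gaussian noise with known standard deviation `SIGMA`, the simulated "true" value `THETA_TRUE`, the candidate input grid, the greedy choice of inputs, the grid-search least-squares update, and the fixed number of rounds in place of a stopping test. With a scalar parameter the Fisher information is a scalar, so the determinant in step 1 reduces to the information value itself.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable
THETA_TRUE = 2.0        # hypothetical true parameter, used only to simulate data
SIGMA = 0.02            # assumed known noise standard deviation


def h(theta, x):
    """Assumed nonlinear model: theta enters through exp(-theta * x)."""
    return math.exp(-theta * x)


def measure(x):
    return h(THETA_TRUE, x) + rng.gauss(0.0, SIGMA)


def fisher_info(theta, inputs):
    """Scalar Fisher information for Gaussian noise: sum (dh/dtheta)^2 / sigma^2."""
    return sum((x * math.exp(-theta * x)) ** 2 for x in inputs) / SIGMA ** 2


CANDIDATE_X = [0.05 * i for i in range(1, 61)]      # allowed inputs in (0, 3]
THETA_GRID = [0.1 + 0.005 * i for i in range(981)]  # search range 0.1 to 5.0


def choose_inputs(theta_hat, past_inputs, n_k):
    """Step 1: greedily pick n_k inputs maximizing total information at theta_hat."""
    chosen = []
    for _ in range(n_k):
        best = max(CANDIDATE_X,
                   key=lambda x: fisher_info(theta_hat, past_inputs + chosen + [x]))
        chosen.append(best)
    return chosen


def least_squares(inputs, outputs):
    """Step 2: grid-search least-squares estimate of theta from all data so far."""
    return min(THETA_GRID,
               key=lambda t: sum((z - h(t, x)) ** 2 for x, z in zip(inputs, outputs)))


theta_hat = 0.5  # Step 0: deliberately poor initial guess
X, Z = [], []
for k in range(10):  # Step 3 replaced by a fixed number of rounds
    new_x = choose_inputs(theta_hat, X, n_k=2)
    X += new_x
    Z += [measure(x) for x in new_x]
    theta_hat = least_squares(X, Z)
    print(f"round {k}: inputs {new_x}, theta_hat = {theta_hat:.3f}")
```

For this assumed model the information from a single input is largest at x = 1 / theta, so the chosen inputs drift toward 1 / theta_hat as the estimate improves, which illustrates why the design depends on the parameter being estimated.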

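Comments on Sequential Design
• Note two optimization problems being solved: one for $\theta$, one for $\xi$
• Determine next $n_k$ input values (step 1) conditioned on current value of $\theta$
• Each step analogous to nonlinear design with fixed (nominal) value of $\theta$
• “Full sequential” mode ($n_k = 1$) updates $\theta$ based on each new input-output pair $(x_k, z_k)$
• Can use stochastic approximation to update $\theta$: $\hat{\theta}_{k+1} = \hat{\theta}_k - a_k Y_k(\hat{\theta}_k)$, where $Y_k(\theta) = \frac{1}{2} \frac{\partial}{\partial \theta} [z_k - h(\theta, x_k)]^2$ is the gradient of the squared-error loss for the newest pair and $a_k > 0$ is the gain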

Bayesian Design Strategy
• Assume prior distribution (density) for $\theta$, $p(\theta)$, reflecting uncertainty in the knowledge of the true value of $\theta$
• There exist multiple versions of D-optimal criterion
• One possible D-optimal criterion: choose $X$ to maximize $E[\log \det F_n(\theta, X)] = \int \log \det F_n(\theta, X)\, p(\theta)\, d\theta$
• Above criterion related to Shannon information
• While log transform makes no difference with fixed $\theta$, it does affect integral-based solution
• To simplify integral, may be useful to choose discrete prior $p(\theta)$
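The effect of the log transform under a prior can be seen in a small Python sketch. All specifics here are assumptions for illustration, not taken from the slides: the scalar-parameter model h(theta, x) = exp(-theta * x) with unit noise variance, a single design input x, the three-point discrete prior, and the input grid. With a discrete prior the integral becomes a finite sum, and the sketch compares the maximizer of the expected log-information with the maximizer of the expected information without the log.

```python
import math

# Hypothetical discrete prior on theta: value -> probability.
PRIOR = {1.0: 0.25, 2.0: 0.5, 4.0: 0.25}


def info(theta, x):
    """Scalar Fisher information for one input, assuming unit noise variance."""
    return (x * math.exp(-theta * x)) ** 2


def expected_log_info(x):
    """Criterion with the log: sum over the prior of p(theta) * log F(theta, x)."""
    return sum(p * math.log(info(t, x)) for t, p in PRIOR.items())


def expected_info(x):
    """Same criterion without the log, for comparison."""
    return sum(p * info(t, x) for t, p in PRIOR.items())


grid = [0.001 * i for i in range(1, 3001)]  # candidate inputs in (0, 3]
x_log = max(grid, key=expected_log_info)
x_plain = max(grid, key=expected_info)

prior_mean = sum(t * p for t, p in PRIOR.items())
print(f"maximizer with log:    x = {x_log:.3f} (1 / prior mean = {1 / prior_mean:.3f})")
print(f"maximizer without log: x = {x_plain:.3f}")
```

For this assumed model the log criterion reduces to 2 log x minus 2 x times the prior mean of theta, so its maximizer is the reciprocal of the prior mean, while the criterion without the log is pulled toward the input favored by the smallest theta value in the prior. With a single fixed theta both versions would pick the same input.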

Appendix to Slides for Chapter 17: Factorial Design (not in ISSO; see ref. [1] below)
• Classical experimental design deals with linear models
• Factorial design is most popular classical method
• All r inputs (“factors”) changed at one time (note: ref. [1] uses notation m instead of r)
• Factorial design provides two key advantages over one-at-a-time changes:
• Greater efficiency in extracting information from given number of experiments
• Ability to determine if there are interaction effects
• Standard method is $2^r$ factorial; “2” comes about by looking at each input at two levels: low (−) and high (+)
• E.g., if r = 3, then have $2^3 = 8$ input combinations: (− − −), (+ − −), (− + −), (− − +), (+ + −), (+ − +), (− + +), (+ + +)
[1] Spall, J. C. (2010), “Factorial Design for Choosing Input Values in Experimentation: Generating Informative Data for System Identification,” IEEE Control Systems Magazine, vol. 30(5), pp. 38–53.
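As a small illustration, the full set of two-level combinations can be enumerated in Python with `itertools.product`. The function name `full_factorial` and the coding of low and high as -1 and +1 are choices made for this sketch; the order in which the runs are listed differs from the order on the slide.

```python
from itertools import product


def full_factorial(r):
    """Return all 2^r runs of a two-level design, coded as -1 (low) and +1 (high)."""
    return [list(levels) for levels in product((-1, 1), repeat=r)]


design = full_factorial(3)
for run in design:
    print("(" + " ".join("+" if level > 0 else "-" for level in run) + ")")
print("number of runs:", len(design))
```

Each input appears at its low and high level equally often, so every column of the design sums to zero.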

Appendix to Slides (cont’d): Factorial Design with 3 Inputs
• Consider r = 3 linear model $z_k = \beta_0 + \beta_1 x_{k1} + \beta_2 x_{k2} + \beta_3 x_{k3} + \beta_4 x_{k1} x_{k2} + \beta_5 x_{k1} x_{k3} + \beta_6 x_{k2} x_{k3} + \beta_7 x_{k1} x_{k2} x_{k3} + \text{noise}$, where $\theta = [\beta_0, \beta_1, \ldots, \beta_7]^T$ represents vector of (unknown) parameters and $x_{ki}$ represents ith term in input vector $x_k$
• $2^3$ factorial design allows for efficient estimation of all parameters in $\theta$
• In contrast, one-at-a-time provides no information for estimating $\beta_4$ to $\beta_7$
• However, $2^3$ factorial design must be augmented in some way if wish to add quadratic (e.g., $x_{k1}^2$) or other higher-order polynomial terms to model
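The following Python sketch shows why the two-level factorial with three inputs can estimate all eight parameters of a model with main effects and interactions. The parameter values in `theta_true` are hypothetical and the responses are generated without noise so that the recovery is exact; inputs are coded as -1 and +1. Because the eight regressor columns are mutually orthogonal, the least-squares estimate reduces to a column-by-column average.

```python
from itertools import product


def regressors(x):
    """Regressor vector: constant, main effects, and all interaction terms."""
    x1, x2, x3 = x
    return [1, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3]


design = list(product((-1, 1), repeat=3))  # the 8 runs of the 2^3 factorial
H = [regressors(x) for x in design]        # 8 x 8 regression matrix

# Gram matrix H^T H; orthogonal columns make it 8 times the identity.
gram = [[sum(row[i] * row[j] for row in H) for j in range(8)] for i in range(8)]

theta_true = [1.0, 2.0, -1.0, 0.5, 0.3, 0.0, -0.7, 0.1]  # hypothetical values
z = [sum(t * h for t, h in zip(theta_true, row)) for row in H]  # noise-free responses

# Least squares with H^T H = 8 I: theta_hat = H^T z / 8.
theta_hat = [sum(row[i] * zk for row, zk in zip(H, z)) / 8 for i in range(8)]
print("estimates:", [round(t, 6) for t in theta_hat])
```

With noisy responses the same formula applies, and the orthogonality means the eight estimates are uncorrelated.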

Appendix to Slides (cont’d): Illustration of Interaction with 2 Inputs
• Example responses for r = 2: no interaction and interaction between input variables
• Left plot (no interaction) shows that change in $z_k$ with change in $x_{k2}$ does not depend on $x_{k1}$; right plot (interaction) shows change in $z_k$ does depend on $x_{k1}$
[Figure: two panels, “No interaction” and “Interaction”, each plotting $z_k$ against $x_{k2}$ with separate lines for $x_{k1}$ = low (−) and $x_{k1}$ = high (+)]

Appendix to Slides (cont’d): Efficiency of Factorial Design for Main Effects
• Factorial design estimates “main effects” (non-interaction) with greater efficiency than one-at-a-time changes
• Plot below based on same accuracy in estimation for the two methods
[Figure: ratio of number of runs needed (one-at-a-time / factorial) versus input dimension r, for r from 2 to 16]
