
CHAPTER 14: SIMULATION-BASED OPTIMIZATION I: REGENERATION, COMMON RANDOM NUMBERS, AND RELATED METHODS

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall.


Presentation Transcript


  1. Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall. CHAPTER 14: SIMULATION-BASED OPTIMIZATION I: REGENERATION, COMMON RANDOM NUMBERS, AND RELATED METHODS. Organization of chapter in ISSO: • Background • Simulation-based optimization vs. model building • Regenerative processes • Special structure for loss estimation and optimization • FDSA and SPSA in simulation-based optimization • Improved convergence through common random numbers • Discrete optimization via statistical selection

  2. Background: Simulation-Based Optimization • Optimization arises in two ways in simulation: • (A) Building the simulation model (parameter estimation) • (B) Using simulation for optimization of the real system, given that problem (A) has been solved • Focus here is problem (B) • Fundamental goal is to optimize a design vector θ in the real system; simulation is a proxy in the optimization process • Loss function to be minimized, L(θ), represents average system performance at a given θ; simulation runs produce noisy (approximate) values of L(θ) • An appropriate stochastic optimization method yields “intelligent” trial-and-error in the choice of how to run the simulation to find the best θ

  3. Background (cont’d) • Many modern processes are studied by Monte Carlo simulation (manufacturing, defense, epidemiological, transportation, etc.) • Loss functions for such systems typically have the form L(θ) = E[Q(θ, V)], where Q(•) represents a function describing the output of the process based on the Monte Carlo random effects in V • Simulation produces sample replications of Q(θ, V) (typically one simulation run produces one value of Q(•)) • Examples of Q(•) might be the number of defective products in a manufacturing process, accuracy of a weapon system, disease incidence in a particular population, cumulative vehicle wait time at traffic signals, etc.
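A minimal sketch (not from ISSO) of how such a loss is estimated in practice: a hypothetical output function Q(θ, V) is replicated many times and the replications are averaged to approximate L(θ) = E[Q(θ, V)]. The particular Q used here is only an illustration.

```python
import numpy as np

def Q(theta, rng):
    """Hypothetical per-replication output: squared deviation of theta
    from an exponentially distributed random vector V (illustration only)."""
    V = rng.exponential(scale=1.0, size=theta.shape)
    return np.sum((theta - V) ** 2)

def estimate_loss(theta, n_reps, seed=0):
    """Monte Carlo estimate of L(theta) = E[Q(theta, V)]: average of
    n_reps independent replications, one simulation run per replication."""
    rng = np.random.default_rng(seed)
    return np.mean([Q(theta, rng) for _ in range(n_reps)])

theta = np.array([0.5, 1.0, 1.5])
print(estimate_loss(theta, n_reps=1000))
```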

  4. Background (cont’d) • Important assumption is that the simulation is a faithful representation of the true system • Recall that the overall goal is to find the θ that minimizes the mean value of Q(θ, V) • Equivalent to optimizing the average performance of the true system • Simulation-based optimization rests critically on the simulation and the true system being statistically equivalent • As in earlier chapters, need an optimization method that copes with noise in the input information • Noisy measurements of the loss function and/or the gradient of the loss function • Focus in this chapter is simulation-based optimization without direct (noisy or noise-free) gradient information

  5. Comments on Gradient-Based and Gradient-Free Methods • In complex simulations, ∂L/∂θ (for use in deterministic optimization such as steepest descent) or ∂Q/∂θ (for use in stochastic gradient search [Chap. 5]) is often not available • “Automatic differentiation” techniques (e.g., Griewank and Corliss, 1991) are also usually infeasible due to software and storage requirements • Optimize θ by using simulations to produce Q(θ, V) for varying θ and V • Unlike ∂Q/∂θ (and E[Q(θ, V)]), Q(θ, V) is available in even the most complex simulations • Can use gradient-free optimization that allows for noisy loss measurements (since E[Q(θ, V)] = L(θ), i.e., Q(θ, V) = L(θ) + noise) • Appropriate stochastic approximation methods (e.g., FDSA, SPSA) may be used based on measurements of Q(θ, V)

  6. Regenerative Systems • A common issue in simulation of dynamic systems is the choice of the amount of time to be represented • Regeneration is useful for addressing this issue • Regenerative systems have the property of returning periodically to some particular probabilistic state; the system effectively starts anew with each period • Queuing systems are common examples • Day-to-day traffic flow; inventory control; communications networks; etc. • Advantage is that regeneration periods may be treated as i.i.d. random processes • Typical loss has the ratio form L(θ) = E[Q(θ, V)] / E[τ], where Q(θ, V) is the cost accumulated over one regeneration period and τ is the length of the period

  7. Queuing System with Regeneration; Periods Begin with Arrivals 1,3,4,7,11,16 (Example 14.2 in ISSO)
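A rough illustration of the regenerative structure, assuming a hypothetical single-server FIFO queue in which a cycle begins whenever a customer arrives to an empty system (in the spirit of the figure above). The sketch records the per-cycle cost (total waiting time) and per-cycle length (number of customers served), from which a ratio estimate of the long-run average wait per customer can be formed; function names and parameter values are illustrative only.

```python
import numpy as np

def simulate_cycles(service_rate, arrival_rate=1.0, n_cycles=500, seed=0):
    """Simulate an M/M/1-style queue and record per-regeneration-cycle totals.
    A cycle starts each time a customer arrives to an empty system.
    Returns per-cycle total waiting time (Q_i) and per-cycle number of
    customers served (tau_i)."""
    rng = np.random.default_rng(seed)
    Q, tau = [], []
    for _ in range(n_cycles):
        wait_total, n_served = 0.0, 0
        depart = 0.0          # departure time of the previous customer (cycle clock)
        t = 0.0               # arrival time of the current customer
        while True:
            service = rng.exponential(1.0 / service_rate)
            wait = max(0.0, depart - t)        # delay in queue (Lindley recursion)
            depart = t + wait + service
            wait_total += wait
            n_served += 1
            t += rng.exponential(1.0 / arrival_rate)   # next arrival time
            if t >= depart:                    # arrives to an empty system: cycle ends
                break
        Q.append(wait_total)
        tau.append(n_served)
    return np.array(Q), np.array(tau)

# Ratio-of-means estimate of the long-run average wait per customer:
Q, tau = simulate_cycles(service_rate=1.5)
print(Q.sum() / tau.sum())
```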

  8. Care Needed in Loss Estimators for Optimization of Regenerative Systems • Optimization of θ is commonly based on unbiased estimators of L(θ) and/or its gradient • A straightforward estimator of L(θ) is the ratio of the sample mean of the per-period costs to the sample mean of the period lengths • The above estimator is biased in general (i.e., its expectation is not equal to L(θ)) • Biasedness follows from the relationship E[1/X] ≥ 1/E[X] for a positive random variable X • Hence the ratio estimator is not an acceptable estimator of L(θ) in general • Special cases may eliminate or minimize the bias (e.g., when the length of the period is deterministic; see Sect. 14.2 of ISSO) • For such special cases, the ratio estimator is an acceptable estimator for use in optimization
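A quick numerical check (not from ISSO) of the inequality underlying the bias argument, E[1/X] ≥ 1/E[X] for a positive random variable X; the distribution chosen is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.exponential(scale=1.0, size=1_000_000) + 0.5   # positive random variable

print(np.mean(1.0 / X))   # Monte Carlo estimate of E[1/X]
print(1.0 / np.mean(X))   # 1 / E[X] -- strictly smaller (Jensen's inequality)
```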

  9. FDSA and SPSA in Simulation-Based Optimization • Stochastic approximation provides an ideal framework for carrying out simulation-based optimization • Rigorous means for handling the noisy loss information inherent in Monte Carlo simulation: y(θ) = Q(θ, V) = L(θ) + noise • Most other optimization methods (GAs, nonlinear programming, etc.) apply only on an ad hoc basis • “…FDSA, or some variant of it, remains the method of choice for the majority of practitioners” (Fu and Hu, 1997) • No need to know the “inner workings” of the simulation, as in gradient-based methods such as IPA, LR/SF, etc. • FDSA and SPSA-type methods are much easier to use than gradient-based methods, as they only require simulation inputs/outputs
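A minimal SPSA sketch under these assumptions: only noisy loss measurements y(θ) = L(θ) + noise are available, standard Bernoulli ±1 perturbations and polynomially decaying gains are used, and the gain constants and the noisy loss at the end are hypothetical choices for illustration.

```python
import numpy as np

def spsa_minimize(y, theta0, n_iter=1000, a=0.1, c=0.1, A=100,
                  alpha=0.602, gamma=0.101, seed=0):
    """Basic SPSA using only noisy loss measurements y(theta).
    Gain sequences: a_k = a/(k+1+A)^alpha, c_k = c/(k+1)^gamma."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iter):
        ak = a / (k + 1 + A) ** alpha
        ck = c / (k + 1) ** gamma
        delta = rng.choice([-1.0, 1.0], size=theta.shape)   # Bernoulli +/-1 perturbation
        y_plus = y(theta + ck * delta)
        y_minus = y(theta - ck * delta)
        ghat = (y_plus - y_minus) / (2.0 * ck * delta)       # simultaneous perturbation gradient estimate
        theta -= ak * ghat
    return theta

# Hypothetical noisy loss: quadratic plus Gaussian measurement noise.
rng = np.random.default_rng(123)
noisy_loss = lambda th: np.sum((th - 1.0) ** 2) + rng.normal(scale=0.1)
print(spsa_minimize(noisy_loss, theta0=np.zeros(5)))
```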

  10. Common Random Numbers • Common random numbers (CRNs) provide a way of improving simulation-based optimization by reusing the Monte-Carlo-generated random variables • CRNs are based on the well-known formula for two random variables X, Y: var(X − Y) = var(X) + var(Y) − 2cov(X, Y) • Maximizing the covariance minimizes the variance of the difference • The aim of CRNs is to reduce the variability of the gradient estimate • Improves convergence of the algorithm
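A small demonstration of the variance identity, assuming a hypothetical seeded simulation output Q(θ, seed): reusing the same seed for two nearby values of θ maximizes the covariance between the two outputs and so shrinks the variance of their difference relative to independent runs.

```python
import numpy as np

def Q(theta, seed):
    """Hypothetical simulation output: sample mean of 1000 exponential
    draws with mean theta (noise level depends on theta)."""
    rng = np.random.default_rng(seed)
    return rng.exponential(scale=theta, size=1000).mean()

diffs_indep, diffs_crn = [], []
for i in range(2000):
    # Independent streams: different seeds for the two runs.
    diffs_indep.append(Q(1.00, seed=2 * i) - Q(1.01, seed=2 * i + 1))
    # Common random numbers: identical seed, so the noise largely cancels.
    diffs_crn.append(Q(1.00, seed=i) - Q(1.01, seed=i))

print(np.var(diffs_indep), np.var(diffs_crn))
```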

  11. CRNs (cont’d) • For SPSA, the gradient variability is largely driven by the numerator y(θ̂k + ckΔk) − y(θ̂k − ckΔk) • Two effects contribute to variability: (i) difference due to the perturbations (desirable); (ii) difference due to noise effects in the measurements (undesirable) • CRNs are useful for reducing the undesirable variability in (ii) • Using CRNs maximizes the covariance between the two y(θ) values in the numerator • Minimizes the variance of the difference

  12. CRNs (cont’d) • In simulation (vs. most real systems) some form of CRN is often feasible • The essence of CRN is to use the same random numbers in both y(θ̂k + ckΔk) and y(θ̂k − ckΔk) • Achieved by using the same random number seed for both simulations and synchronizing the random numbers • Optimal rate of convergence of the iterate θ̂k to θ* (à la k^(−β/2)) is k^(−1/2) (Kleinman et al., 1999); this rate is the same as for stochastic gradient-based methods • This rate is an improvement on the optimal non-CRN rate of k^(−1/3) • Unfortunately, “pure CRN” may not be feasible in large-scale simulations due to violation of the synchronization requirement • E.g., if θ represents service rates in a queuing system, the difference between θ̂k + ckΔk and θ̂k − ckΔk may allow additional (stochastic) arrivals to be serviced in one case
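A sketch of one SPSA iteration using CRNs under the seeding scheme just described: the two loss evaluations share one seed (any further synchronization is up to the simulation itself, so in practice this gives partial CRN, as discussed on a later slide), and a fresh seed is drawn at each iteration. The seeded simulation y(θ, seed) and the gain constants are hypothetical.

```python
import numpy as np

def spsa_step_crn(y, theta, k, a=0.1, c=0.1, A=100, alpha=0.602, gamma=0.101,
                  master_seed=0):
    """One SPSA iteration with common random numbers: the same run seed is
    passed to both loss evaluations, and a new seed is drawn at each
    iteration k. y(theta, seed) is a hypothetical seeded simulation."""
    rng = np.random.default_rng([master_seed, k])    # reproducible per-iteration stream
    ak = a / (k + 1 + A) ** alpha
    ck = c / (k + 1) ** gamma
    delta = rng.choice([-1.0, 1.0], size=theta.shape)
    run_seed = int(rng.integers(2**31 - 1))          # one seed shared by both runs
    ghat = (y(theta + ck * delta, run_seed)
            - y(theta - ck * delta, run_seed)) / (2.0 * ck * delta)
    return theta - ak * ghat
```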

  13. Numerical Illustration (Example 14.8 in ISSO) • Simulation using exponentially distributed random variables and a loss function with p = dim(θ) = 10 • Goal is to compare CRN and non-CRN • θ* is the minimizing value for L(θ) • Table in ISSO shows the improved accuracy of the solution under CRNs; the plot on the next slide compares rates of convergence

  14. Rates of Convergence for CRN and Non-CRN (Example 14.9 in ISSO) [Convergence plot: mean values of the solution plotted against the number of measurements n on a log scale, roughly 100 to 100,000, with separate curves for the non-CRN and CRN cases.]

  15. Partial CRNs • By using the same random number seed for y(θ̂k + ckΔk) and y(θ̂k − ckΔk), it is possible to achieve partial CRN • Some of the events in the two simulations will be synchronized due to the common seed • Synchronization is likely to break down during the course of the simulation, especially for small k when ck is relatively large • Asymptotic analysis produces a convergence rate identical to pure CRN, since synchronization occurs as ck → 0 • Also require a new seed for the simulations at each iteration (common to both y(•) values) to ensure convergence to min L(θ) = min E[Q(θ, V)] • In partial CRN, the practical finite-sample rate of convergence for SPSA tends to be lower than in the pure CRN setting

  16. Numerical Example: Partial CRNs (Kleinman et al., 1999; see p. 398 of ISSO) • A simulation using exponentially distributed random variables was conducted in Kleinman et al. (1999) for p = 10 • Simulation designed so that it is possible to implement pure CRN (not available in most practical simulations) • Purpose is to evaluate the relative performance of non-CRN, partial CRN, and pure CRN

  17. Numerical Example (cont’d) • Numerical results for 100 replications of SPSA and FDSA (no. of y(•) measurements in SPSA and FDSA are equal, with total iterations of 10,000 and 1000 respectively):

                   SPSA      FDSA
    Non-CRN        0.0190    0.0410
    Partial CRN    0.0071    0.0110
    Pure CRN       0.0065    0.0064

  • Partial CRN offers significant improvement over non-CRN, and SPSA outperforms FDSA (except in the idealized pure CRN case)

  18. Indifference Zone Methods for Choosing Best Option • Consider the use of simulation to determine the best of K possible options, represented θ1, θ2, …, θK • Simulation produces noisy loss measurements yk(θi) • Other methods for discrete optimization (e.g., random search, simulated annealing, genetic algorithms) are generally inappropriate here • Suppose the analyst is willing to accept any θi such that L(θi) is in the indifference zone [L(θ*), L(θ*) + δ) • The analyst can specify δ such that P(correct selection) ≥ 1 − α whenever L(θi) − L(θ*) ≥ δ for all non-best θi • Can use independent sampling or common random numbers (steps for independent sampling on the next slide)

  19. Two-Stage Indifference Zone Selection with Independent Sampling • Step 0 (initialization): Choose δ, α, and the initial sample size n0. • Step 1 (first stage): Run the simulation n0 times at each θi. • Step 2 (variance estimation): Compute the sample variance at each θi. • Step 3 (sample sizes): Using the above variance estimates and a table look-up, compute the total sample size ni at each θi. • Step 4 (second stage): Run the simulation ni − n0 additional times at each θi. • Step 5 (sample means): Compute the sample means of the simulation outputs at each θi over all ni runs. • Step 6 (decision step): Select the θi corresponding to the lowest sample mean from step 5.
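A sketch of the independent-sampling procedure, with the table look-up of Step 3 replaced by a user-supplied constant h (in practice h comes from a Rinott-type table for the chosen confidence level, K, and n0; the value 3.0 below is an arbitrary placeholder). The simulate function and the final example are hypothetical.

```python
import numpy as np

def two_stage_select(simulate, K, n0, delta, h, seed=0):
    """Two-stage indifference-zone selection with independent sampling.
    simulate(i, rng) returns one noisy loss measurement y(theta_i);
    h stands in for the tabled constant."""
    rng = np.random.default_rng(seed)
    # Step 1: first-stage samples.
    first = [np.array([simulate(i, rng) for _ in range(n0)]) for i in range(K)]
    # Step 2: first-stage sample variances.
    s2 = [x.var(ddof=1) for x in first]
    # Step 3: total sample size for each option.
    n = [max(n0, int(np.ceil(h**2 * s2_i / delta**2))) for s2_i in s2]
    # Steps 4-5: second-stage samples and overall sample means.
    means = []
    for i in range(K):
        second = [simulate(i, rng) for _ in range(n[i] - n0)]
        means.append(np.concatenate([first[i], second]).mean())
    # Step 6: pick the option with the lowest overall sample mean.
    return int(np.argmin(means))

# Hypothetical example: 4 options with true losses 1.0, 1.2, 1.5, 2.0 plus noise.
true_loss = [1.0, 1.2, 1.5, 2.0]
sim = lambda i, rng: true_loss[i] + rng.normal(scale=0.5)
print(two_stage_select(sim, K=4, n0=20, delta=0.2, h=3.0))
```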

  20. Two-Stage Indifference Zone Selection with CRN (Dependent) Sampling • Step 0 (initialization): Choose δ, α, and the initial sample size n0. • Step 1 (first stage): Run the simulation n0 times at each θi. The kth simulation runs for the θi are dependent. • Step 2 (variance estimation): Compute the overall sample variance for the Kn0 runs. • Step 3 (sample sizes): Using the above variance estimate and a table look-up, compute the total sample size n; n applies for all θi. • Step 4 (second stage): Run the simulation n − n0 additional times at each θi. • Step 5 (sample means): Compute the sample means of the simulation outputs at each θi over all n runs. • Step 6 (decision step): Select the θi corresponding to the lowest sample mean from step 5.
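An analogous sketch for the CRN (dependent) variant, with the same caveats: the kth run at every θi shares one seed, a pooled first-stage sample variance stands in for the variance statistic of Step 2 (the exact statistic and tabled constant follow the procedure in ISSO), and h is supplied by the user rather than looked up.

```python
import numpy as np

def two_stage_select_crn(simulate, K, n0, delta, h, seed=0):
    """Two-stage indifference-zone selection with CRN (dependent) sampling.
    simulate(i, run_seed) returns one noisy loss measurement y(theta_i);
    passing the same run_seed to every option makes the kth runs dependent."""
    rng = np.random.default_rng(seed)
    # Step 1: first stage -- the kth run uses one seed shared across all options.
    seeds0 = rng.integers(2**31 - 1, size=n0)
    first = np.array([[simulate(i, int(s)) for s in seeds0] for i in range(K)])  # shape (K, n0)
    # Step 2: one overall sample variance from all K*n0 first-stage runs (placeholder statistic).
    s2 = first.var(ddof=1)
    # Step 3: a single total sample size n, common to all options.
    n = max(n0, int(np.ceil(h**2 * s2 / delta**2)))
    # Step 4: second stage, again with per-run seeds shared across options.
    seeds1 = rng.integers(2**31 - 1, size=n - n0)
    second = np.array([[simulate(i, int(s)) for s in seeds1] for i in range(K)])
    # Steps 5-6: overall sample means over all n runs; pick the smallest.
    means = np.hstack([first, second]).mean(axis=1)
    return int(np.argmin(means))
```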
