
Verification and Validation


Presentation Transcript


  1. Verification and Validation

  2. Verification and Validation; Statistical analysis of steady-state simulations; Warm-up and run length; Truncated replications; Batching. What we will do…

  3. In a simulation, the real-world system is abstracted by a conceptual model: a series of mathematical and logical relationships concerning the components and structure of the system. The conceptual model is then coded into a computer-recognizable form (i.e., an operational model), which we hope is an accurate imitation of the real-world system. The accuracy of the simulation must be checked before we can draw valid conclusions from the results of a number of runs. [Slide diagram: Real World → Conceptual Model (quantitative/analytical models) → Operational Model (“Code”), with V & V applied at each step.] Introduction

  4. This checking process consists of two main components. Verification: Is “Code” = Model? (Debugging.) Determine whether the computer implementation of the conceptual model is correct: does the computer code represent the model that has been formulated? Validation: Is Model = System? Determine whether the conceptual model is a reasonable representation of the real-world system. V & V is an iterative process of correcting “code” errors and modifying the conceptual model to better represent the real-world system. The truth: we can probably never completely verify, especially for large models. Introduction Cont…

  5. Incorrect data; mixed units of measure (e.g., hours vs. minutes); blockages and deadlocks (seizing a resource but forgetting to release it); forgetting to dispose of the entity at the end; incorrectly overwriting attributes and variables (names); incorrect indexing (indexing beyond the available queues and resources). Common Errors While Developing Models

  6. Verification is the debugging of the code so that the conceptual model is accurately reflected by the operational model. Several common-sense suggestions can be used in the verification process: write the simulation program in a logical, well-ordered manner; make use of detailed flowcharts when writing the code; make the code as self-documenting as possible; define all variables and state the purpose of each section of the program; have the computer code checked by more than one person. Verification

  7. Check that the values of the input parameters have not been changed inadvertently during the course of a simulation run. For a variety of input-parameter values, examine the output of simulation runs for reasonableness. Use traces to check that the program performs as intended. Break point: stop at a particular block. Watch point: stop when a condition is true, e.g., NQ(1) > 10 (if the queue length exceeds 10, stop and check). Intercept: stop whenever a particular entity moves. Verification Cont…

  8. Some techniques to attempt verification: eliminate error messages (obviously); release a single entity and step through the logic (set Batch Size = 1 in Arrive); replace distributions with a constant; “stress” the model under extreme conditions; estimate performance; look at the generated SIMAN .mod and .exp files (Run > SIMAN > View). Verification Cont…

  9. Validation is the process of developing confidence that inferences drawn from the model tell us something about the real system. Conceptual validity: does the model, as structured, adequately represent the system? (Rationalism.) Operational validity: is the behavior of the model characteristic of the real-world system? (Empiricism.) Believability: do the ultimate users have confidence in this model? Validation

  10. A variety of subjective and objective techniques can be used to validate the conceptual model: Face Validity, Validation of Model Assumptions, and Validating Input-Output Transformations. Validation

  11. A conceptual model must be reasonable “on its face” to those who are knowledgeable about the real-world system. Have experts examine the assumptions or the mathematical relationships of the conceptual model for correctness. Such a critique by experts helps identify deficiencies or errors in the conceptual model (Turing test: compare the simulation vs. the actual system). The credibility of the conceptual model is enhanced as these deficiencies are corrected during the iterative verification and validation process. If the conceptual model is not overly complicated, additional methods can be used to check face validity: conduct a manual trace of the conceptual model, or perform elementary sensitivity analysis by varying selected “critical” input parameters and observing whether the model behaves as expected. Face Validity

  12. We consider two types of model assumptions: structural assumptions (i.e., assumptions concerning the operation of the real-world system) and data assumptions. Structural assumptions can be validated by observing the real-world system and by discussing the system with the appropriate personnel. Validation of Model Assumptions

  13. We could make the following structural assumptions about the queues that form in the customer service area at a bank. Patrons form one long line, with the person at the front of the line receiving service as soon as one of the tellers becomes idle. A customer might leave the line if the others in line are moving too slowly. A customer seeing 10 or more patrons in the system may decide not to join the line. Validation of Model Assumptions – Examples

  14. Assumptions concerning the data that are collected may also be necessary. Consider the interarrival times at the above bank during peak banking periods: we could assume these interarrivals are i.i.d. exponential random variables. To validate these assumptions, we should proceed as follows: consult with bank personnel to determine when peak banking periods occur; collect interarrival data from these periods; conduct a statistical test to check that the assumption of independent interarrivals is reasonable; estimate the parameter of the (supposedly) exponential distribution; and conduct a statistical goodness-of-fit test to check that the assumption of exponential interarrivals is reasonable (a sketch of the last three steps follows below). Validation of Model Assumptions – Examples
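
As a rough illustration of the last three steps, here is a minimal Python sketch. The interarrival values and variable names are hypothetical placeholders, not data from the text: it screens for independence via the lag-1 autocorrelation, estimates the exponential mean, and runs a Kolmogorov-Smirnov goodness-of-fit test.

```python
# Hypothetical interarrival times (minutes) collected during a peak period;
# in practice these would come from observing the real bank.
import numpy as np
from scipy import stats

interarrivals = np.array([0.8, 2.1, 0.3, 1.5, 0.9, 3.2, 0.6, 1.1, 2.4, 0.7,
                          1.8, 0.4, 2.9, 1.2, 0.5, 1.6, 0.2, 2.2, 1.0, 0.9])

# Crude independence screen: lag-1 autocorrelation should be near zero.
lag1 = np.corrcoef(interarrivals[:-1], interarrivals[1:])[0, 1]

# Maximum-likelihood estimate of the exponential mean is the sample mean.
mean_ia = interarrivals.mean()

# Kolmogorov-Smirnov goodness-of-fit test against Exp(scale = sample mean).
# (Estimating the parameter from the same data makes the test approximate.)
ks_stat, p_value = stats.kstest(interarrivals, "expon", args=(0, mean_ia))

print(f"lag-1 autocorrelation: {lag1:.3f}")
print(f"estimated mean interarrival time: {mean_ia:.3f} min")
print(f"KS statistic: {ks_stat:.3f}, p-value: {p_value:.3f}")
```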

  15. We can treat the conceptual model as a function that transforms certain input parameters into output performance measures. In the banking example, input parameters could include the distributional forms of the patron interarrival times and teller service times, the number of tellers present, and the customer queuing discipline. The average customer waiting time and server utilization might be the output performance measures of interest. The basic principle of input-output validation is the comparison of output from the verified operational model to data from the real-world system. Input-output validation requires that the real-world system currently exist. Validating Input – Output Transformations

  16. One method of comparison uses the familiar t test. Suppose we collected data from the bank under study, and the average customer service time during a particular peak banking period was 2.50 minutes. Further suppose that five independent simulation runs of this banking period were conducted (and that the simulations were all initialized under the same conditions). The average customer service times from the five simulations were 1.60, 1.75, 2.12, 1.94, 1.89 minutes. Example

  17. We would expect the simulated average service times to be consistent with the observed average service time. Therefore, the hypothesis to be tested is H0: E[Xi] = 2.50 min versus H1: E[Xi] ≠ 2.50 min, where Xi is the random variable corresponding to the average customer service time from the ith simulation run. Example Cont…

  18. Define μ0 = 2.50 (= E[Xi] under H0) and n = 5 (the number of independent simulation runs). The sample mean of the runs is X̄ = (X1 + X2 + … + Xn)/n, and the sample variance of the runs is S² = Σi (Xi − X̄)² / (n − 1). Example Cont…

  19. By design and by a central limit theorem, the Xi’s are approximately i.i.d. normal random variables. So t0 = (X̄ − μ0) / (S / √n) is approximately a t random variable with n − 1 degrees of freedom if H0 is true. For this example, X̄ = 1.86, S² = 0.0387, and t0 = −7.28. Taking α = 0.05, the t table gives t4,0.025 = 2.78. Since |t0| = 7.28 > 2.78, H0 is rejected. This suggests that our operational model does not produce realistic customer service times; changes in the conceptual model or computer code may be necessary, leading to another iteration of the verification and validation process. Example Cont…
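
The same calculation can be checked with a short Python sketch using scipy; the five averages and μ0 come from the example above, and the calls are standard scipy/numpy functions.

```python
# One-sample t test of H0: E[Xi] = 2.50 using the five simulated averages.
import numpy as np
from scipy import stats

x = np.array([1.60, 1.75, 2.12, 1.94, 1.89])   # average service times (min)
mu0 = 2.50                                      # observed real-system average

t0, p_value = stats.ttest_1samp(x, mu0)         # two-sided test
x_bar, s2 = x.mean(), x.var(ddof=1)             # sample mean and variance

print(f"x_bar = {x_bar:.2f}, S^2 = {s2:.4f}, t0 = {t0:.2f}, p = {p_value:.4f}")
# Reject H0 at alpha = 0.05 if |t0| exceeds the critical value t(4, 0.025):
print("critical value:", stats.t.ppf(0.975, df=len(x) - 1))
```

This reproduces X̄ = 1.86, S² ≈ 0.0387, and t0 ≈ −7.28, so H0 is rejected as in the slide.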

  20. Suppose we have validated the conceptual model (and verified the associated simulation code) of the existing real-world system, so we can say that the simulation adequately mimics the real-world system, and suppose that some non-existing system of interest and our conceptual model have only minor differences. If we wish to compare the real-world system to non-existing systems with alternative designs or with different input parameters, the conceptual model (and associated code) should be robust: we should be able to make small modifications in our operational model and then use this new version of the code to generate valid output performance values for the non-existing system. Such minor changes might involve certain numerical input parameters (e.g., the customer inter-arrival rate) or the form of a certain statistical distribution (e.g., the service time distribution). But it may be difficult to validate the model of a non-existing system if it differs substantially from the conceptual model of the real-world system. Robustness

  21. Instead of running the operational model with artificial input data, we could drive the model with the actual historical record. Then it’s reasonable to expect the simulation to yield output results very close to those observed from the real-world system. Historical Data Validation

  22. Suppose we have collected interarrival and service time data from the bank during n independent peak periods. Let Wj denote the observed average customer waiting time from the jth peak period, j = 1, …, n. For fixed j, we can drive the operational model with the actual interarrival and service times to get the (simulated) average customer waiting time Yj. We hope that Dj ≡ Wj − Yj ≈ 0 for all j. We could do a paired t test of H0: E[Dj] = 0, as sketched below. Example Outline
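
A minimal Python sketch of this paired t test, assuming hypothetical waiting-time data (the Wj and Yj values below are placeholders, not figures from the text):

```python
# Paired t test of H0: E[Dj] = 0, where Dj = Wj - Yj.
import numpy as np
from scipy import stats

w = np.array([4.2, 5.1, 3.8, 6.0, 4.7])  # observed avg waits (min), peak periods
y = np.array([4.0, 5.4, 3.5, 5.8, 4.9])  # simulated avg waits, same input traces

d = w - y
t_stat, p_value = stats.ttest_rel(w, y)  # equivalent to a one-sample t test on d
print(f"mean difference = {d.mean():.3f}, t = {t_stat:.2f}, p = {p_value:.3f}")
# Fail to reject H0 when p_value exceeds the chosen alpha (e.g., 0.05).
```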

  23. Steady State Simulation

  24. Terminating: specific starting and stopping conditions; the run length is well-defined and finite (known starting and stopping conditions). Steady-state: long-run (technically forever); theoretically, initial conditions don’t matter (but practically they usually do); it is not clear how to terminate a simulation run (theoretically infinite); we are interested in the system response over a long period of time. This is really a question of the intent of the study, and it has a major impact on how output analysis is done; sometimes it’s not clear which is appropriate. Time Frame of Simulations

  25. The main difficulty is obtaining independent simulation runs that exclude the transient period. If the model warms up very slowly, truncated replications can be costly, because we have to “pay” the warm-up on each replication. Two techniques commonly used for steady-state simulation are the method of batch means and independent replications. Neither of these two methods is superior to the other in all cases. Techniques for Steady State Simulation

  26. Most models start empty and idle. Empty: no entities are present at time 0. Idle: all resources are idle at time 0. In a terminating simulation this is OK if it is realistic. In a steady-state simulation, though, this can bias the output for a while after startup, usually downward (results are biased low) in queueing-type models that eventually get congested. Depending on the model, its parameters, and the run length, the bias can be very severe. Warm Up and Run Length

  27. Remedies for initialization bias. Better starting state, more typical of steady state: throw some entities around the model; but how do you know how many to throw, and where? This is what you’re trying to estimate in the first place! Make the run so long that the bias is overwhelmed: might work if the initial bias is weak or dissipates quickly. Let the model warm up, still starting empty and idle: Run > Setup > Replication Parameters: Warm-up Period (mind the time units!). This “clears” all statistics at that point, so the summary report, and any output data saved from the Statistic module, reflect post-warm-up results across replications. Warm Up and Run Length (cont’d.)

  28. Method of Independent Replications

  29. Suppose you have n independent replications (batches) of m observations each, where Xij is the jth observation in replication i. The mean of each replication is meani = (Xi1 + Xi2 + … + Xim)/m. The overall estimate is Estimate = (mean1 + mean2 + … + meann)/n. The sample variance of the replication means is S² = Σi (meani − Estimate)² / (n − 1). The 100(1 − α)% confidence interval using the t table is Estimate ± t(n−1, α/2) · S/√n. Method of Independent Replications (cont’d.)
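
A small Python sketch of this confidence-interval calculation (the replication means below are hypothetical placeholders):

```python
# Confidence interval from n replication means, following the formulas above.
import numpy as np
from scipy import stats

def replication_ci(rep_means, alpha=0.05):
    n = len(rep_means)
    estimate = np.mean(rep_means)      # overall estimate
    s = np.std(rep_means, ddof=1)      # sample std. dev. of the replication means
    half_width = stats.t.ppf(1 - alpha / 2, n - 1) * s / np.sqrt(n)
    return estimate, half_width

rep_means = [12.3, 11.8, 13.1, 12.7, 12.0, 11.5, 12.9, 12.4, 13.3, 12.1]
est, hw = replication_ci(rep_means)
print(f"95% CI: {est:.3f} +/- {hw:.3f}")
```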

  30. Warm-up and run-length times? The most practical idea: make preliminary runs and plots, and simply “eyeball” them (a sketch follows below). Be careful about variability: make multiple replications and superimpose the plots. Also, be careful to note “explosions”. Possibility: different warm-up periods for different output processes; to be conservative, take the max, since a single Warm-up Period must be specified for the whole model. Warm Up and Run Length (cont’d.)
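
One way to do the eyeballing is to superimpose the same output process from several replications and look for where the curves level off. Below is a minimal Python sketch with synthetic stand-in data; the curve shape, window size, and candidate warm-up line are illustrative assumptions, not values from the text.

```python
# Superimpose windowed averages of an output process across replications
# and eyeball where they flatten out (candidate warm-up period).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
t = np.arange(0, 7200, 60)                 # minutes, in 60-minute windows
for rep in range(5):
    # synthetic curve: rises from an empty-and-idle start toward steady state
    curve = 8 * (1 - np.exp(-t / 900)) + rng.normal(0, 0.6, size=t.size)
    plt.plot(t, curve, alpha=0.6)

plt.axvline(3000, linestyle="--", label="candidate warm-up")
plt.xlabel("simulation time (min)")
plt.ylabel("windowed avg queue length")
plt.legend()
plt.show()
```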

  31. Example: Lengthen Replications to 5 days (7200 min), do 10 Replications Warm Up and Run Length (cont’d.)

  32. If you can identify appropriate warm-up and run-length times, just make replications as for terminating simulations. The only difference: specify the Warm-up Period in Run > Setup > Replication Parameters. Proceed with confidence intervals, comparisons, and all statistical analysis as in the terminating case. So… what should the length of the warm-up period be? See Abate, J., and W. Whitt, “Transient behavior of regulated Brownian motion,” Advances in Applied Probability, 19, 560-631, 1987. Truncated Replications

  33. Alternative: just one R E A L L Y long run. You only have to “pay” the warm-up once. Problem: you have only one “replication,” and you need more than that to form a variance estimate (the basic quantity needed for statistical analysis). Big no-no: using the individual points within the run as “data” for a variance estimate; they are usually correlated (not independent), so the variance estimate is biased. Batching in a Single Run

  34. Break each output record from the run into a few large batches. Tally (discrete-time) outputs are observation-based; Time-Persistent (continuous-time) outputs are time-based. Take averages over the batches as the “basic” statistics for estimation: batch means. For Tally outputs these are simple arithmetic averages; for Time-Persistent outputs they are continuous-time averages. Treat the batch means as IID. Key: the batch size must be big enough for low correlation between successive batches (a sketch follows below). You still might want to truncate a warm-up (once, time-based). Batching in a Single Run (cont’d.)
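
A minimal Python sketch of forming batch means from a single long run and checking the lag-1 correlation between successive batch means; the synthetic output record, warm-up cut, and number of batches are illustrative assumptions.

```python
# Form batch means from one long run (after truncating a warm-up portion)
# and check the lag-1 correlation between successive batch means.
import numpy as np

rng = np.random.default_rng(2)
series = rng.exponential(2.0, size=100_000)  # stand-in for observed waiting times

warmup = 5_000                               # observations discarded as warm-up
data = series[warmup:]

n_batches = 20
m = len(data) // n_batches                   # batch size (observations per batch)
batch_means = data[:n_batches * m].reshape(n_batches, m).mean(axis=1)

lag1 = np.corrcoef(batch_means[:-1], batch_means[1:])[0, 1]
print(f"batch size m = {m}, lag-1 correlation of batch means = {lag1:.3f}")
# If the correlation is not close to zero, use fewer, larger batches.
```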

  35. Batching in a Single Run (cont’d.)

  36. As with independent replications, suppose you have n equal batches of m observations each, where Xij is the jth observation in batch i. The mean of each batch is meani = (Xi1 + Xi2 + … + Xim)/m. The overall estimate is Estimate = (mean1 + mean2 + … + meann)/n. The sample variance of the batch means is S² = Σi (meani − Estimate)² / (n − 1). The 100(1 − α)% confidence interval using the t table is Estimate ± t(n−1, α/2) · S/√n. Batching in a Single Run (cont’d.)

  37. One replication of 50 days (about the same effort as 10 replications of 5 days each) Batching in a Single Run (cont’d.) How to choose batch size? Equivalently, how to choose the number of batches for a fixed run length? Want batches big enough so that batch means appear uncorrelated.

  38. Arena automatically attempts to form 95% confidence intervals on steady-state output measures via batch means from within each single replication: the “Half Width” column in the reports from one replication (in the Category Overview report if you have just one replication, or in the Category by Replication report if you have multiple replications). Ignore this if you’re doing a terminating simulation. It won’t report anything if your run is not long enough: “(Insufficient)” if you don’t have the minimum amount of data Arena requires even to form a CI, and “(Correlated)” if you don’t have enough data to form nearly-uncorrelated batch means, which is required for the CI to be safe. Batching in a Single Run (cont’d.)
