# Materials for Lecture 11

##### Presentation Transcript

1. Materials for Lecture 11 • Chapters 3 and 6 • Chapter 16, Sections 4.0 and 5.0 • Lecture 11 Pseudo Random LHC.xls • Lecture 11 Validation Tests.xls • The next 4 slides were added because, at about this point, most students are confused about PDF parameters and which functions to use

2. Parameter Estimation • Parameters for a distribution define the shape and position on the number scale • Uniform( Min, Max) • Norm( Mean, Std Dev) • Empirical( Si, P(Si)) • Shape can be skewed right or left, can be tall or squatty (kurtosis) • Parameters reflect amount of variability in the stochastic variable • Must validate random variables against their parameters • We use the parameters to simulate the distributions

3. Review Steps for Parameter Estimation • Step 1: Check for presence of a trend, cycle or structural pattern • If present remove it & work with the residuals (ẽt) • If no trend or structural model, use actual data (X’s) • Step 2: Estimate parameters for several assumed distributions using the X’s or the residuals (ẽt) • Step 3: Simulate the different distributions • Step 4: Pick the best match based on • Mean, Variability -- use validation tests • Minimum and Maximum • Shape of the CDF vs. historical series • Penalty function CDFDEV() to quantify differences
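
Steps 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration, not the Simetar procedure: it assumes a simple linear trend fitted by ordinary least squares, the `history` values are made up, and the helper names `linear_trend` and `residuals` are hypothetical. It does not test whether the trend is statistically significant, which Step 1 would still require.

```python
import statistics

def linear_trend(y):
    """Fit y_t = a + b*t by ordinary least squares, with t = 1..n."""
    n = len(y)
    t = list(range(1, n + 1))
    t_bar = statistics.fmean(t)
    y_bar = statistics.fmean(y)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    b = sxy / sxx
    a = y_bar - b * t_bar
    return a, b

def residuals(y, a, b):
    """Deviations of each observation from the fitted trend line."""
    return [yi - (a + b * ti) for ti, yi in enumerate(y, start=1)]

# Made-up historical series with an obvious upward trend (illustration only).
history = [102.0, 108.0, 111.0, 119.0, 121.0, 130.0, 133.0, 138.0]

# Step 1: remove the trend and keep the residuals.
a, b = linear_trend(history)
resid = residuals(history, a, b)

# Step 2: estimate parameters for several assumed distributions.
normal_params = (statistics.fmean(resid), statistics.stdev(resid))  # Norm(Mean, Std Dev)
uniform_params = (min(resid), max(resid))                           # Uniform(Min, Max)
empirical_sorted = sorted(resid)                                    # Empirical(Si, ...)
```

Steps 3 and 4 would then simulate each candidate distribution and compare the simulated values with the residuals.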

4. Univariate Parameter Estimation • When do you use UPES? • When there is no trend in the data • When you want to use the historical mean as your forecasted y-hat • Test an unknown random variable for its shape

5. Univariate Parameter Estimation • Empirical distribution fits your data best because it lets the data define the shape • Prefer to use the EMP with deviations as a percent or fraction from Y-hat • If there is a trend, then account for it with deviations from trend • Else use deviations from mean • Allows us to model low probability events • Test with the CDFDEV
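
The idea of an empirical distribution built from fractional deviations can be sketched as follows. This is an illustration under stated assumptions, not Simetar's EMP function: the sorted deviates are assumed to sit at equally spaced cumulative probabilities from 0 to 1, draws are interpolated linearly between them, the data are made up, and the helper names are hypothetical. The seed value is the Simetar default quoted in these slides, but Python's generator will not reproduce Simetar's numbers.

```python
import random
import statistics

def fractional_deviates(history, y_hat):
    """Sorted deviations from y_hat, expressed as fractions of y_hat."""
    return sorted((x - y_hat) / y_hat for x in history)

def simulate_empirical(devs, y_hat, u):
    """Inverse-transform draw for a uniform value u in [0, 1].

    Assumes the sorted deviates sit at cumulative probabilities
    0, 1/(n-1), ..., 1 and interpolates linearly between them.
    """
    pos = u * (len(devs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(devs) - 1)
    dev = devs[lo] + (pos - lo) * (devs[hi] - devs[lo])
    return y_hat * (1.0 + dev)

# Made-up history with no trend, so the forecast is the historical mean.
history = [42.0, 55.0, 48.0, 61.0, 39.0, 50.0, 58.0, 47.0]
y_hat = statistics.fmean(history)
devs = fractional_deviates(history, y_hat)

rng = random.Random(31517)
draws = [simulate_empirical(devs, y_hat, rng.random()) for _ in range(500)]

# Bounds the simulated values should respect.
x_min = y_hat * (1.0 + devs[0])
x_max = y_hat * (1.0 + devs[-1])
```

Because every draw is `y_hat` times one plus an interpolated deviate, the simulated values cannot fall outside `x_min` and `x_max`.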

6. Validation • Do the simulated values for the random variables reproduce their parameters? • Does the model accurately forecast the system? • Do the results conform to theoretical expectations? • Do the results conform to expectations of experts? • Turing Test of simulation model results • Show the results to experts, using alternative assumptions about the input values

7. Four P’s for Validation • Planning: in the initial model preparation mode, the developer should plan how to validate the model • Personal: it’s the developer’s responsibility to verify every equation, coefficient, and random variable, and to check that results are theoretically correct • Peers: utilize experts in the field to review model results using the Turing Test; use sensitivity testing of the model • Prospective Clients: do the results conform to their expectations? Are the results useful to the client?

8. Verification • Check all equations for arithmetic accuracy • Use Excel’s “Trace Precedents” and “Trace Dependents” functions • Check model in “Expected Value” and “Stochastic” mode • Check linkage of variables coming into each equation • Ensure that the variables in each equation are theoretically correct • Make sure the model contains all of the necessary equations to calculate the KOVs

9. Validation • Use statistical tests of each random variable to ensure that it: • reproduces the historical distribution • reproduces the historical correlation matrix among random variables

10. Statistical Tests for Validation • Test the means of the random variables against their historical values • Statistically equal at 95% level based on a t-test? • Test the variance against historical values • Statistically equal at 95% level based on an F-test? • Check the historical vs. simulated coefficient of variation • Needs to be constant over time • Check the minimum and maximum • For a Normal distribution are they reasonable? • For an Empirical distribution compare the simulated min and max to the values the model “should” simulate: Xmin = Y-hat * (1 + Minimum Fractional Deviate) and Xmax = Y-hat * (1 + Maximum Fractional Deviate) • Check the correlation matrix for the simulated variables vs. the historical correlation matrix using t-tests
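
The mean and variance comparisons can be sketched with the Python standard library. The slide does not say which form of the t-test is intended, so this sketch assumes Welch's unequal-variance version; the data and function names are hypothetical. Only the test statistics and degrees of freedom are computed, and they still have to be compared with t and F critical values at the chosen level.

```python
import math
import statistics

def welch_t(hist, sim):
    """Welch two-sample t statistic and its approximate degrees of freedom."""
    n1, n2 = len(hist), len(sim)
    v1, v2 = statistics.variance(hist), statistics.variance(sim)
    se2 = v1 / n1 + v2 / n2
    t = (statistics.fmean(hist) - statistics.fmean(sim)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

def variance_ratio(hist, sim):
    """F statistic (ratio of sample variances) with its two degrees of freedom."""
    f = statistics.variance(hist) / statistics.variance(sim)
    return f, len(hist) - 1, len(sim) - 1

def coefficient_of_variation(x):
    """Standard deviation as a fraction of the mean."""
    return statistics.stdev(x) / statistics.fmean(x)

# Made-up history; the simulated series is drawn from a Normal distribution
# using the historical mean and standard deviation as its parameters.
history = [42.0, 55.0, 48.0, 61.0, 39.0, 50.0, 58.0, 47.0]
mu, sigma = statistics.fmean(history), statistics.stdev(history)
simulated = statistics.NormalDist(mu, sigma).samples(500, seed=31517)

t_stat, t_df = welch_t(history, simulated)
f_stat, f_df1, f_df2 = variance_ratio(history, simulated)
cv_hist = coefficient_of_variation(history)
cv_sim = coefficient_of_variation(simulated)
```

A t statistic near zero and an F statistic near one are what a well-behaved simulation should produce; large departures suggest the random variable is not reproducing its parameters.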

11. Validation Tests in Simetar • Verification/Validation tests in Simetar • Ho Hi Icon for hypothesis tests • Compare Two Series Historical Data vs. Simulated Values • 1st Data Series is history • 2nd Data Series is simulated • Tests means and variances for two series, i.e., are they statistically equal • Test works for a pair of variables and for comparing two multivariate distributions (matrices)

12. Statistical Tests for Validation • Compare Two Series Historical Data vs. Simulated Values • 1st Data Series is history • 2nd Data Series is simulated

13. Validation Tests in Simetar • Compare mean and standard deviation of simulated data to the user’s specified values • “Data Series” is the simulated values • Type in the mean or cell • Specify the Std Dev as a value or a cell location • The test is used when • Only mean and std dev are known, i.e., there is no history for the variable • The mean is a projected value different from the history

14. Validation Tests in Simetar • Compare mean and standard deviation of simulated data to the user’s specified values • The test is used when only mean and std dev are known, i.e., there is no history for the variable, or when the mean is a projected value different from history • Note the Given Values are Mean = 10 and Std Dev = 3
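
A comparison against given values can be sketched as follows, using the Mean = 10 and Std Dev = 3 from the slide. The slide does not name the statistics Simetar reports, so this sketch assumes the usual one-sample t statistic for the mean and chi-square statistic for the standard deviation; the function names are hypothetical and the simulated series is generated here purely for illustration.

```python
import math
import statistics

def mean_t_stat(sim, given_mean):
    """One-sample t statistic for H0: mean equals given_mean (n - 1 df)."""
    n = len(sim)
    return (statistics.fmean(sim) - given_mean) / (statistics.stdev(sim) / math.sqrt(n))

def std_chi2_stat(sim, given_std):
    """Chi-square statistic for H0: std dev equals given_std (n - 1 df)."""
    n = len(sim)
    return (n - 1) * statistics.variance(sim) / given_std ** 2

given_mean, given_std = 10.0, 3.0

# Stand-in for the simulated "Data Series".
simulated = statistics.NormalDist(given_mean, given_std).samples(500, seed=31517)

t_stat = mean_t_stat(simulated, given_mean)
chi2_stat = std_chi2_stat(simulated, given_std)
chi2_df = len(simulated) - 1
```

Under the null hypothesis the t statistic should be close to zero and the chi-square statistic close to its degrees of freedom, here 499.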

15. Validation Tests in Simetar • Test simulated values for Multivariate Distributions (MVE and MVN) to test if the historical correlation matrix is reproduced in the simulation • Data Series is the simulated values for all random variables in the MV distribution, a matrix of variables in SimData • The original correlation matrix used to simulate the MVE or MVN distribution • OK, if the majority of correlation coefficients are statistically the same as the historical correlation matrix
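
One way to check whether "the majority of correlation coefficients" are reproduced is sketched below. It swaps in a Fisher z-transform test for each coefficient instead of the t-tests the slides mention, so it is a stand-in and not the Simetar test. The matrices, the number of iterations, and the function names are made up for illustration, and 1.96 is the two-sided 5% critical value of the standard normal.

```python
import math

def fisher_z_stat(r_sim, rho_hist, n):
    """Approximate z statistic for H0: the simulated correlation equals rho_hist."""
    return (math.atanh(r_sim) - math.atanh(rho_hist)) * math.sqrt(n - 3)

def share_reproduced(sim_corr, hist_corr, n, z_crit=1.96):
    """Fraction of off-diagonal coefficients not significantly different."""
    k = len(hist_corr)
    results = []
    for i in range(k):
        for j in range(i + 1, k):  # upper triangle only; the matrix is symmetric
            z = fisher_z_stat(sim_corr[i][j], hist_corr[i][j], n)
            results.append(abs(z) < z_crit)
    return sum(results) / len(results)

# Made-up historical correlation matrix and the matrix recovered from
# 500 simulated iterations of a three-variable multivariate distribution.
hist_corr = [[1.0, 0.6, -0.3],
             [0.6, 1.0, 0.2],
             [-0.3, 0.2, 1.0]]
sim_corr = [[1.0, 0.58, -0.33],
            [0.58, 1.0, 0.21],
            [-0.33, 0.21, 1.0]]

share_ok = share_reproduced(sim_corr, hist_corr, n=500)
```

A share above one half would meet the "majority" rule of thumb stated in the slide.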

16. Charts for Validation • Test simulated values for Multivariate Distributions (MVE and MVN) to test if the historical correlation matrix is reproduced in the simulation

17. Using Charts for Validation • Use a CDF to compare historical series to simulated series, tests the min and max • Use a PDF to compare historical series to simulated series, tests the shape • Use a Box Plot to compare historical series to simulated series, checks the variability • Use a Probability graph to compare historical series to simulated series, P(x) vs. F(x) • Use a Fan graph to show the range of the risk and level of the mean over time, visual test of CV constant over time

18. How Simetar Simulates Random Numbers • A pseudo random number generator is used so we can reproduce the simulation results from day to day with the same inputs • Pseudo random number generator uses a seed to start the sampling sequence • The default seed in Simetar is 31517 • Change the seed if you like • If you do not use a pseudo random number generator then every time you simulate the model you get different answers, even if the input has not changed
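
The role of the seed can be illustrated with Python's own pseudo random number generator. This only demonstrates the general idea: Python's generator is not the one Simetar uses, so the seed 31517 will not reproduce Simetar's numbers, and the `simulate` function is a hypothetical stand-in for a model run.

```python
import random

def simulate(seed, iterations=5):
    """Draw a short sequence of uniform numbers from a generator started at seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(iterations)]

run_a = simulate(31517)   # first run
run_b = simulate(31517)   # same seed: identical sequence
run_c = simulate(12345)   # different seed: different sequence

same_results = (run_a == run_b)
changed_results = (run_a != run_c)
```

Calling `random.Random()` with no seed starts the generator from a system-dependent source, which is the case in which every run gives different answers.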

19. Latin Hypercube vs. Monte Carlo Simulated Numbers • Monte Carlo simulation procedure samples randomly from the full range of the possible values for a random variable • Requires a large number of iterations for adequate coverage over the possible range of a variable • With a small number of iterations, it does not sample the distribution adequately

20. Latin Hypercube vs. Monte Carlo Simulated Numbers • Latin Hypercube (LHC) systematically samples all segments of the distribution for a random variable • If 100 iterations are to be simulated, LHC samples one value randomly from each of 100 intervals of equal length • Ensures all segments of the distribution are sampled, even at small numbers of iterations • With LHC you get “adequate” sampling coverage of a distribution with fewer iterations
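
The difference between the two sampling procedures can be sketched for uniform draws on (0, 1). This is a minimal one-variable illustration of stratified sampling, not Simetar's implementation; the function names are hypothetical.

```python
import random

def monte_carlo_uniform(n, rng):
    """n independent draws from anywhere in [0, 1)."""
    return [rng.random() for _ in range(n)]

def latin_hypercube_uniform(n, rng):
    """One random draw from each of n equal-length intervals, in shuffled order."""
    draws = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(draws)  # so the iteration order carries no pattern
    return draws

def intervals_hit(draws, n):
    """Count how many of the n equal-length intervals contain at least one draw."""
    return len({min(int(u * n), n - 1) for u in draws})

rng = random.Random(31517)
n = 100
mc = monte_carlo_uniform(n, rng)
lhc = latin_hypercube_uniform(n, rng)

mc_coverage = intervals_hit(mc, n)    # usually well below 100
lhc_coverage = intervals_hit(lhc, n)  # every interval sampled once
```

Draws for another distribution can be obtained by passing each uniform value through that distribution's inverse CDF, for example `statistics.NormalDist(10, 3).inv_cdf(u)`.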

21. Latin Hypercube vs. Monte Carlo • The CDF of a Uniform distribution defined as U(0,1) is a straight line at a 45° angle out of the origin • A perfect sample would lie on the straight line • Use the following USDs (uniform standard deviates) • Excel’s =RAND() • Simetar’s =UNIFORM() • Simulate these two USDs • Draw a CDF with the two random variables. Which one lies on the straight line between 0 and 1? • [Figure: CDF chart with F(x) from 0.0 to 1.0 on the vertical axis and X from 0.0 to 1.0 on the horizontal axis]
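
The visual comparison can also be made numerically by measuring how far each empirical CDF strays from the 45° line. In this sketch `random.random()` plays the role of an unstratified generator such as =RAND(), and a stratified draw stands in for a Latin Hypercube sampler; neither reproduces the actual Excel or Simetar generators, and the function name is hypothetical.

```python
import random

def max_cdf_deviation(draws):
    """Largest vertical gap between the empirical CDF and the line F(x) = x."""
    xs = sorted(draws)
    n = len(xs)
    # At each sorted value the empirical CDF steps from i/n up to (i + 1)/n.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))

rng = random.Random(31517)
n = 100

mc = [rng.random() for _ in range(n)]              # unstratified draws
lhc = [(i + rng.random()) / n for i in range(n)]   # one draw per interval

mc_dev = max_cdf_deviation(mc)
lhc_dev = max_cdf_deviation(lhc)
```

With one draw per interval the stratified sample can never be more than 1/n away from the line, whereas the unstratified sample typically wanders several times farther at 100 iterations.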