Lecture 10: Probability and Statistics (part 2)

COMP155 Computer Simulation, October 1, 2008. Lecture 10: Probability and Statistics (part 2)

This Week
• Review of probability and statistics needed to understand simulation
  • follow Appendix C in Arena text
• Outline
  • Monday (C.1, C.2):
    • Probability – basic ideas, terminology
    • Random variables, joint distributions
  • Today (C.3-C.5):
    • Sampling
    • Statistical inference – point estimation, confidence intervals, hypothesis testing

Sampling
• Statistical analysis: purpose is to estimate or infer something about a large population
  • population is a set of data points
  • population is too large to look at completely, so we only look at a sample from the population
  • if the sample is randomly selected from the population, the distribution of the sample should approximate the distribution of the population
  • in practice: determine a PMF or PDF for a sample and assume that distribution holds for the entire population
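
The "in practice" step can be sketched for discrete data as follows. This is a minimal illustration, not Arena's input analyzer: the helper name `empirical_pmf` and the arrival counts are hypothetical.

```python
from collections import Counter

def empirical_pmf(sample):
    """Return {value: relative frequency} for a sample of discrete observations."""
    n = len(sample)
    counts = Counter(sample)
    return {value: count / n for value, count in sorted(counts.items())}

# Hypothetical sample: customers arriving in each of 10 one-minute intervals.
arrivals = [0, 1, 1, 2, 0, 1, 3, 1, 2, 1]
pmf = empirical_pmf(arrivals)

# The relative frequencies are then treated as estimates of P(X = value)
# for the whole population of one-minute intervals.
for value, prob in pmf.items():
    print(f"P(X = {value}) is estimated as {prob:.1f}")
```

With only 10 observations the estimate is rough; a larger random sample gives relative frequencies closer to the population PMF.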

Sampling
• A random sample is a set of independent and identically distributed (IID) observations X1, X2, …, Xn from the population
• Input modeling:
  • observations come from the real world
  • Arena’s input analyzer can be used to determine the distribution function
• Output analysis:
  • observations are the results of multiple runs/replications of the simulation
  • Arena’s output analyzer can be used to characterize the output population from the observations

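Estimating Distribution from Samples
• Samples: X1, X2, …, Xn
• Assuming a normal distribution, compute:
  • sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$
  • sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$
• These statistics have their own sampling distributions; the sampling distribution of the sample mean is normal when the population is normal, and approximately normal for large n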
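
The two statistics can be computed directly from their definitions. The sketch below uses made-up observations and hypothetical function names, and cross-checks the result against Python's standard `statistics` module.

```python
import statistics

def sample_mean(xs):
    """Sample mean: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

observations = [4.0, 7.0, 5.0, 9.0, 5.0]  # hypothetical data
xbar = sample_mean(observations)
s2 = sample_variance(observations)

# The standard library uses the same n - 1 divisor for statistics.variance.
assert abs(s2 - statistics.variance(observations)) < 1e-12
print(xbar, s2)
```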

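Sampling Distributions
• If X1, X2, …, Xn are IID with mean μ and variance σ², then $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2 / n$
• If the underlying distribution of X is normal, then the distribution of $\bar{X}$ is also normal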
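
One way to see that the sample mean has its own distribution is to draw many samples and look at the resulting means. The sketch below assumes a normal population; the parameter values, the seed, and the helper name are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(155)  # arbitrary seed so the run is repeatable

MU, SIGMA = 10.0, 2.0   # assumed population mean and standard deviation
N = 25                  # observations per sample
REPS = 5000             # number of independent samples

def one_sample_mean(n):
    """Draw n IID normal observations and return their sample mean."""
    return statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])

means = [one_sample_mean(N) for _ in range(REPS)]

mean_of_means = statistics.fmean(means)    # should be close to MU
var_of_means = statistics.variance(means)  # should be close to SIGMA**2 / N
theoretical_var = SIGMA ** 2 / N

print(f"mean of sample means:     {mean_of_means:.3f}")
print(f"variance of sample means: {var_of_means:.4f} (theory: {theoretical_var:.4f})")
```

The variance of the sample mean shrinks as N grows, which is why larger samples give more precise estimates.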

Point Estimation
• Point estimates are estimates of population distribution parameters (μ, σ², …)
• Properties of point estimates
  • Unbiased: E(estimate) = parameter
  • Efficient: Var(estimate) is lowest among competing point estimators
  • Consistent: Var(estimate) decreases (usually to 0) as the sample size increases
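
The "unbiased" property can be illustrated by comparing two variance estimators over many small samples: dividing by n - 1 (the usual sample variance) versus dividing by n. This is a sketch under assumed settings; the population variance, sample size, and seed are made up.

```python
import random
import statistics

random.seed(2008)  # arbitrary seed so the run is repeatable

SIGMA2 = 4.0   # assumed true population variance
N = 5          # small sample size makes the bias easy to see
REPS = 20000   # number of independent samples

unbiased, biased = [], []
for _ in range(REPS):
    xs = [random.gauss(0.0, SIGMA2 ** 0.5) for _ in range(N)]
    unbiased.append(statistics.variance(xs))   # divides by n - 1
    biased.append(statistics.pvariance(xs))    # divides by n

avg_unbiased = statistics.fmean(unbiased)  # expected value is SIGMA2
avg_biased = statistics.fmean(biased)      # expected value is SIGMA2 * (N - 1) / N

print(f"average of n-1 estimator: {avg_unbiased:.2f}")
print(f"average of n estimator:   {avg_biased:.2f}")
```

On average the n - 1 estimator lands near the true variance, while the n estimator systematically underestimates it.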

Confidence Intervals
• A confidence interval quantifies the likely imprecision in a point estimator
• An interval that contains (covers) the unknown population parameter with some specified probability
  • Called a 100(1 – α)% confidence interval for the parameter
• Example: 87 < μ < 123 with probability 95%
  • The value of μ is in (87, 123) with 95% confidence
• We’ll leave the computation of confidence intervals to a statistics course … or to Arena’s output analyzer tool.

Confidence Intervals in Simulation
• Run simulation replications, get results
• View each replication of the simulation as a data point
• Form a confidence interval
• The confidence interval tells you how close you are to getting the “true” expected output (what you’d get by averaging an infinite number of replications)
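
As a rough sketch of these steps, the standard t-based interval for a mean can be formed from replication results as shown below. This is the textbook formula, not necessarily what Arena's output analyzer does internally. The replication values and the function name are hypothetical, and the critical value 2.262 is the t quantile for 95% confidence with 9 degrees of freedom, taken from a t table.

```python
import math
import statistics

def t_confidence_interval(replication_results, t_critical):
    """Return (low, high) = xbar -/+ t_critical * s / sqrt(n)."""
    n = len(replication_results)
    xbar = statistics.fmean(replication_results)
    s = statistics.stdev(replication_results)  # sample std dev (n - 1 divisor)
    half_width = t_critical * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical output: average waiting time from each of 10 replications.
results = [12.1, 9.8, 11.4, 10.7, 13.0, 10.2, 11.9, 9.5, 12.4, 11.0]

T_975_DF9 = 2.262  # t table value: 0.975 quantile, 10 - 1 = 9 degrees of freedom
low, high = t_confidence_interval(results, T_975_DF9)
print(f"95% confidence interval for the expected waiting time: ({low:.2f}, {high:.2f})")
```

Running more replications shrinks the half-width roughly in proportion to 1/sqrt(n); the critical value must be looked up again for the new degrees of freedom.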

Hypothesis Tests
• A hypothesis test is used to test some assertion about the population or its parameters
• With sampling, we don’t get a true/false result, only evidence that points one way or another
• Null hypothesis (H0) – what is to be tested
• Alternate hypothesis (H1 or HA) – denial of H0
  • H0: μ = 6 vs. H1: μ ≠ 6
  • H0: σ < 10 vs. H1: σ ≥ 10
  • H0: μ1 = μ2 vs. H1: μ1 ≠ μ2
• Develop a decision rule to decide on H0 or H1 based on sample data
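
One possible decision rule for the first pair of hypotheses (H0: μ = 6) is a two-sided z-test. The sketch below assumes the sample is large enough for the normal approximation to hold; the data are generated artificially from a population whose true mean is 7, and the function name, seed, and α = 0.05 are illustrative choices.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def two_sided_z_test(sample, mu0, alpha=0.05):
    """Large-sample test of H0: mu = mu0 against H1: mu != mu0.

    Returns (z statistic, p-value, True if H0 is rejected at level alpha).
    """
    n = len(sample)
    z = (fmean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

random.seed(10)  # arbitrary seed so the run is repeatable
# Artificial data: 200 observations from a population with true mean 7.
sample = [random.gauss(7.0, 2.0) for _ in range(200)]

z, p_value, reject = two_sided_z_test(sample, mu0=6.0)
print(f"z = {z:.2f}, p-value = {p_value:.4g}, reject H0: {reject}")
```

A small p-value is evidence against H0, not proof that H0 is false; failing to reject likewise does not prove H0 true.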

Errors in Hypothesis Testing
• α is the probability of a Type I error (rejecting H0 when it is true)
• 1 – α is the confidence level of the corresponding confidence interval

Hypothesis Testing in Simulation
• Input side
  • Specify input distributions to drive the simulation
  • Collect real-world data on corresponding processes
  • “Fit” a probability distribution to the observed real-world data
  • Test H0: the data are well represented by the selected distribution
• Output side
  • Have two or more “competing” designs modeled
  • Test H0: all designs perform the same on output, or test H0: one design is better than another
  • Selection of a “best” model scenario
