
Contact Information




Presentation Transcript


  1. Contact Information Dr. Daniel Simons Vancouver Island University Faculty of Management Building 250 - Room 416 Office Hours: MTW 11:30 – 12:30 simonsd@viu.ca

  2. Suggestions for Best Individual Performance • Attend all classes • Take notes: the course covers a lot of material and your notes are essential • Complete all assignments (not for grade) • Read the book • Participate, enrich class discussion, provide feedback and ask questions • Revise materials between classes, integrate concepts, and make sure you understand the tools and their application • Don’t hesitate to contact me if necessary

  3. Evaluation Method • Tests have a mix of problems that evaluate: concepts, problem sets (assignments), class applications, readings, and new applications • Tests are closed-book and time-constrained, to reward knowledge and speed • Each test covers slides, assignments, and required readings • The evaluation system may not be perfect, but it works

  4. Brief Overview of the Course

  5. This course is about using data to measure causal effects.

  6. STATISTICAL PRINCIPLES A review of the basic principles of statistics used in business settings. REVIEW OF QUME 232

  7. Basic Statistical Concepts • It is important that students are comfortable with the following: • The concept of a random variable (whether discrete or continuous) and its associated probability functions • Cumulative, marginal, conditional and joint probability functions • Mathematical expectations and the concept of independence • The Bernoulli, Binomial, Poisson, Uniform, Normal, t, F and χ2 distributions

  8. Summations • The Σ symbol is a shorthand notation for discussing sums of numbers. • It works just like the + sign you learned about in elementary school.

  9. Algebra of Summations

  10. Summations: A Useful Trick

  11. Double Summations • The “Secret” to Double Summations: keep a close eye on the subscripts.
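The slides above on summation algebra are formula slides; as a minimal numeric sketch of the double-summation idea (the example vectors are illustrative, not from the slides), note that when the terms factor, a double sum equals a product of single sums:

```python
# Double summation: sum over i and j of x[i] * y[j].
# Keep a close eye on the subscripts: i indexes x, j indexes y.
x = [1, 2, 3]
y = [10, 20]

double_sum = sum(xi * yj for xi in x for yj in y)

# When the summand factors as x[i] * y[j], the double sum
# equals the product of the two single sums.
product_of_sums = sum(x) * sum(y)

print(double_sum)        # 180
print(product_of_sums)   # 180
```

This factoring identity is one of the standard "tricks" used when manipulating summations.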

  12. Descriptive Statistics • How can we summarize a collection of numbers? • Mean: the arithmetic average. The mean is highly sensitive to a few large values (outliers). • Median: the midpoint of the data. The median is the number above which lie half the observed numbers and below which lie the other half. The median is not sensitive to outliers.

  13. Descriptive Statistics (cont.) • Mode: the most frequently occurring value. • Variance: the mean squared deviation of a number from its own mean. The variance is a measure of the “spread” of the data. • Standard deviation: the square root of the variance. The standard deviation provides a measure of a typical deviation from the mean.
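The descriptive statistics defined above can be sketched with Python's standard library (the data values here are purely illustrative):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # arithmetic average; sensitive to outliers
median = statistics.median(data)  # midpoint of the data; robust to outliers
mode = statistics.mode(data)      # most frequently occurring value
var = statistics.pvariance(data)  # mean squared deviation from the mean
sd = statistics.pstdev(data)      # square root of the variance

print(mean, median, mode, var, sd)  # 5.0 4.5 4 4.0 2.0
```

Note that `pvariance`/`pstdev` are the population versions (divide by n); `variance`/`stdev` are the sample versions (divide by n − 1).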

  14. Descriptive Statistics (cont.) • Covariance: the covariance of two sets of numbers, X and Y, measures how much the two sets tend to “move together.” If Cov(X,Y) > 0, then if X is above its mean, we would expect that Y would also be above its mean.

  15. Descriptive Statistics (cont.) • Correlation Coefficient: the correlation coefficient between X and Y “norms” the covariance by the standard deviations of X and Y. You can think of this adjustment as a unit correction. The correlation coefficient will always fall between -1 and 1.

  16. A Quick Example

  17. A Quick Example (cont.)

  18. A Quick Example (cont.)

  19. Populations and Samples • Two uses for statistics: • Describe a set of numbers • Draw inferences from a set of numbers we observe to a larger population • The population is the underlying structure which we wish to study. Surveyors might want to relate 6000 randomly selected voters to all the voters in the United States. Macroeconomists might want to relate data about unemployment and inflation from 1958–2004 to the underlying process linking unemployment and inflation, to predict future realizations.

  20. Populations and Samples (cont.) • We cannot observe the entire population. • Instead, we observe a sample drawn from the population of interest. • In the Monte Carlo demonstration from last time, an individual dataset was the sample and the Data Generating Process described the population.

  21. Populations and Samples (cont.) • The descriptive statistics we use to describe data can also describe populations. • What is the mean income in the United States? • What is the variance of mortality rates across countries? • What is the covariance between gender and income?

  22. Populations and Samples (cont.) • In a sample, we know exactly the mean, variance, covariance, etc. We can calculate the sample statistics directly. • We must infer the statistics for the underlying population. • Means in populations are also called expectations.

  23. Populations and Samples (cont.) • If the true mean income in the United States is b, then we expect a simple random sample to have sample mean b. • In practice, any given sample will also include some “sampling noise.” We will observe not b, but b + e. • If we have drawn our sample correctly, then on average the sampling error over many samples will be 0. • We write this as E(e) = 0
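The sampling-noise idea above can be illustrated with a small Monte Carlo sketch (the true mean, noise level, and sample size below are arbitrary choices, not from the slides): each sample mean equals the true mean b plus a sampling error e, and the errors average out to roughly zero across many samples.

```python
import random

random.seed(0)
mu = 50.0  # the true population mean ("b" in the slide's notation)

# Draw many simple random samples; each sample mean is b + e,
# where e is the sampling noise for that sample.
errors = []
for _ in range(10_000):
    sample = [random.gauss(mu, 10) for _ in range(25)]
    sample_mean = sum(sample) / len(sample)
    errors.append(sample_mean - mu)

# Average sampling error over many samples: close to 0, i.e. E(e) = 0
avg_error = sum(errors) / len(errors)
print(round(avg_error, 2))
</antml_avoided>```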

  24. Probability • A random variable X is a variable whose numerical value is determined by chance, the outcome of a random phenomenon • A discrete random variable has a countable number of possible values, such as 0, 1, and 2 • A continuous random variable, such as time and distance, can take on any value in an interval • A probability distribution P[Xi] for a discrete random variable X assigns probabilities to the possible values X1, X2, and so on • For example, when a fair six-sided die is rolled, there are six equally likely outcomes, each with a 1/6 probability of occurring

  25. Mean, Variance, and Standard Deviation • The expected value (or mean) of a discrete random variable X is a weighted average of all possible values of X, using the probability of each X value as weights: • μ = E[X] = Σi xi P[xi] (17.1) • The variance of a discrete random variable X is a weighted average, for all possible values of X, of the squared difference between X and its expected value, using the probability of each X value as weights: • σ² = Σi (xi − μ)² P[xi] (17.2) • The standard deviation σ is the square root of the variance
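Applying these definitions to the fair six-sided die from the previous slide gives a quick worked example:

```python
# Fair six-sided die: X takes values 1..6, each with probability 1/6
values = range(1, 7)
p = 1 / 6

mu = sum(x * p for x in values)               # expected value, eq. (17.1)
var = sum((x - mu) ** 2 * p for x in values)  # variance, eq. (17.2)
sd = var ** 0.5                               # standard deviation

print(mu)             # 3.5
print(round(var, 4))  # 2.9167
print(round(sd, 4))   # 1.7078
```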

  26. Continuous Random Variables • Our examples to this point have involved discrete random variables, for which we can count the number of possible outcomes: • The coin can be heads or tails; the die can be 1, 2, 3, 4, 5, or 6 • For continuous random variables, however, the outcome can be any value in a given interval • A continuous probability density curve shows the probability that the outcome is in a specified interval as the corresponding area under the curve

  27. Expectations • Expectations are means over all possible samples (think “super” Monte Carlo). • Means are sums. • Therefore, expectations follow the same algebraic rules as sums. • See the Statistics Appendix for a formal definition of Expectations.

  28. Algebra of Expectations • k is a constant. • E(k) = k • E(kY) = kE(Y) • E(k+Y) = k + E(Y) • E(Y+X) = E(Y) + E(X) • E(ΣYi ) = ΣE(Yi ), where each Yi is a random variable.
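Because expectations follow the same algebraic rules as sums, these identities hold exactly for sample means as well; a quick numeric check (the data are just random draws for illustration):

```python
import random

random.seed(1)
k = 3.0
Y = [random.random() for _ in range(100_000)]
X = [random.random() for _ in range(100_000)]

def mean(v):
    return sum(v) / len(v)

# E(kY) = k E(Y)
assert abs(mean([k * y for y in Y]) - k * mean(Y)) < 1e-6
# E(k + Y) = k + E(Y)
assert abs(mean([k + y for y in Y]) - (k + mean(Y))) < 1e-6
# E(Y + X) = E(Y) + E(X)
assert abs(mean([y + x for y, x in zip(Y, X)]) - (mean(Y) + mean(X))) < 1e-6

print("linearity of expectation holds for these sample means")
```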

  29. Variances • Population variances are also expectations.

  30. Algebra of Variances • One benefit of independent observations is that Cov(Yi ,Yj ) = 0, which kills all the cross-terms in the variance of the sum.

  31. 2 random variables: joint and marginal distributions • The joint probability distribution of X and Y, two discrete random variables, is the probability that the two variables simultaneously take on certain values, say x and y. Example: Weather conditions and commuting times • Let Y = 1 if the commute is short (less than 20 minutes) and Y = 0 otherwise • Let X = 0 if it is raining and X = 1 if it is not • The joint probability is the frequency with which each of the four possible outcomes (X=0,Y=0), (X=1,Y=0), (X=0,Y=1), (X=1,Y=1) occurs over many repeated commutes

  32. Joint Probability Distribution Over many commutes, 15% of the days have rain and a long commute; that is, P(X=0, Y=0) = 0.15. This is a joint probability distribution

  33. Marginal Probability Distribution The marginal probability distribution of a random variable X is just another name for its probability distribution. The marginal distribution of X from above is P(X=0) = 0.30 and P(X=1) = 0.70. Find E(X) and Var(X)

  34. Conditional Probability Distribution The probability distribution of a random variable X conditional on another random variable Y taking on a specific value. The probability of X given Y. P(X=x|Y=y) P(X=x|Y=y) = P(X=x, Y=y)/ P(Y=y) Conditional probability of X given Y = joint probability of x and y divided by marginal probability of Y (the condition)

  35. Conditional Distribution P(Y=0|X=0) = P(X=0,Y=0)/ P(X=0) = 0.15/0.30 =0.5 Conditional Expectation: The conditional expectation of Y given X, that is the conditional mean of Y given X, is the mean of the conditional distribution of Y given X

  36. The conditional expectation of Y (the short-commute indicator) given that it is raining is E(Y|X=0) = (0)*(0.15/0.30) + (1)*(0.15/0.30) = 0.5
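The full joint table for the commute example can be recovered from the numbers on these slides (P(X=0,Y=0) = 0.15, P(X=0) = 0.30, P(Y=0) = 0.22 force the remaining three cells); a sketch of the marginal and conditional calculations:

```python
# Joint distribution implied by the slides' numbers
joint = {
    (0, 0): 0.15,  # rain, long commute
    (0, 1): 0.15,  # rain, short commute
    (1, 0): 0.07,  # no rain, long commute
    (1, 1): 0.63,  # no rain, short commute
}

# Marginal distributions: sum the joint over the other variable
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

# Conditional: P(Y=y | X=x) = P(X=x, Y=y) / P(X=x)
p_y0_given_x0 = joint[(0, 0)] / p_x[0]

print({x: round(v, 2) for x, v in p_x.items()})  # {0: 0.3, 1: 0.7}
print({y: round(v, 2) for y, v in p_y.items()})  # {0: 0.22, 1: 0.78}
print(p_y0_given_x0)                             # 0.5
```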

  37. Law of Iterated Expectations • The expected value of the expected value of Y conditional on X is the expected value of Y. • If we take expectations separately for each subpopulation (each value of X), and then take the expectation of this expectation, we get back the expectation for the whole population.
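The law of iterated expectations can be checked directly on the commute example's joint table (reconstructed from the slides' numbers as above): averaging the subpopulation means E(Y|X=x), weighted by P(X=x), recovers E(Y).

```python
joint = {(0, 0): 0.15, (0, 1): 0.15, (1, 0): 0.07, (1, 1): 0.63}

p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}

# Unconditional expectation of Y (Y is 0/1, so E(Y) = P(Y=1))
e_y = sum(y * p for (_, y), p in joint.items())

# Conditional expectations E(Y | X=x), one per subpopulation
e_y_given_x = {
    x: sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x[x]
    for x in (0, 1)
}

# Law of iterated expectations: E[E(Y|X)] = E(Y)
e_e = sum(e_y_given_x[x] * p_x[x] for x in (0, 1))

print(round(e_y, 2), round(e_e, 2))  # 0.78 0.78
```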

  38. Independence Two random variables X and Y are independently distributed, or independent, if knowing the value of one of the variables provides no information about the other. Specifically, when E(Y|X) = E(Y) Or alternatively, P(Y=y|X=x) = P(Y=y) for all values of x and y Or P(Y=y,X=x) = P(Y=y)*P(X=x) That is, the joint distribution of two independent variables is the product of their marginal distributions

  39. Independence Are commuting times and weather conditions independent? P(Y=0,X=0) = 0.15, while P(Y=0) * P(X=0) = 0.22 * 0.3 = 0.066. Since 0.15 ≠ 0.066, X and Y are NOT independent

  40. Covariance and Correlation Covariance is a measure of the extent to which two random variables move together If X and Y are independent then the covariance is zero but a covariance of zero does not imply independence. A zero covariance implies only linear independence

  41. Covariance and Correlation There is a positive relationship between commuting times and weather conditions

  42. Correlation Correlation solves the units problem of covariance. It is also a measure of dependence. It is unitless and has values between -1 and 1. A value of zero implies that X and Y are uncorrelated.

  43. Standardized Variables • To standardize a random variable X, we subtract its mean and then divide by its standard deviation: Z = (X − μ)/σ (17.3) • No matter what the initial units of X, the standardized random variable Z has a mean of 0 and a standard deviation of 1 • The standardized variable Z measures how many standard deviations X is above or below its mean: • If X is equal to its mean, Z is equal to 0 • If X is one standard deviation above its mean, Z is equal to 1 • If X is two standard deviations below its mean, Z is equal to –2 • Figures 17.4 and 17.5 illustrate this for the case of dice and fair coin flips, respectively
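Standardizing the six die outcomes from the earlier example shows the promised properties (mean 0, standard deviation 1):

```python
import statistics

# Standardize die outcomes: Z = (X - mean) / standard deviation, eq. (17.3)
X = [1, 2, 3, 4, 5, 6]
mu = statistics.mean(X)       # 3.5
sigma = statistics.pstdev(X)  # population sd of a fair die, ~1.7078

Z = [(x - mu) / sigma for x in X]

# The standardized values have mean 0 and standard deviation 1
print(round(statistics.mean(Z), 6))    # 0.0
print(round(statistics.pstdev(Z), 6))  # 1.0
```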

  44. Figure 17.4a Probability Distribution for Six-Sided Dice, Using Standardized Z

  45. Figure 17.4b Probability Distribution for Six-Sided Dice, Using Standardized Z

  46. Figure 17.4c Probability Distribution for Six-Sided Dice, Using Standardized Z

  47. Figure 17.5a Probability Distribution for Fair Coin Flips, Using Standardized Z

  48. Figure 17.5b Probability Distribution for Fair Coin Flips, Using Standardized Z

  49. Figure 17.5c Probability Distribution for Fair Coin Flips, Using Standardized Z

  50. The Normal Distribution • The density curve for the normal distribution is graphed in Figure 17.6 • The probability that the value of Z will be in a specified interval is given by the corresponding area under this curve • These areas can be determined by consulting statistical software or a table, such as Table B-7 in Appendix B • Many things follow the normal distribution (at least approximately): • the weights of humans, dogs, and tomatoes • the lengths of thumbs, widths of shoulders, and breadths of skulls • scores on IQ, SAT, and GRE tests • the number of kernels on ears of corn, ridges on scallop shells, hairs on cats, and leaves on trees
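Instead of a printed table like Table B-7, the same areas under the standard normal curve can be read off with Python's standard library:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# The probability of an interval is the area under the curve over it
print(round(Z.cdf(0), 4))              # 0.5     (half the area lies below 0)
print(round(Z.cdf(1) - Z.cdf(-1), 4))  # 0.6827  (within one sd of the mean)
print(round(Z.cdf(2) - Z.cdf(-2), 4))  # 0.9545  (within two sds of the mean)
```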
