Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01

Professor William Greene, Stern School of Business, IOMS Department, Department of Economics

Part 2 – A Expectations of Random Variables • 2-A Expectations of Random Variables • 2-B Covariance and Correlation • 2-C Limit Results for Sums

Expected Value of a Random Variable Weighted average of the values taken by the variable

Discrete Uniform • X = 1,2,…,J • Prob(X = x) = 1/J • E[X] = 1/J + 2/J + … + J/J = J(J+1)/2 * 1/J = (J+1)/2 • Expected toss of a die = 3.5 (J=6) • Expected sum of two dice = 7. Proof?
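The two dice claims can be checked by brute-force enumeration. The following is a minimal sketch using exact fractions; the function names are chosen here for illustration and are not from the course materials.

```python
from fractions import Fraction
from itertools import product

def discrete_uniform_mean(J):
    """E[X] for X uniform on 1..J, as the probability-weighted sum of values."""
    return sum(Fraction(x, J) for x in range(1, J + 1))

def expected_sum_two_dice(J=6):
    """E[X1 + X2] by enumerating all J*J equally likely outcomes."""
    p = Fraction(1, J * J)
    return sum((a + b) * p for a, b in product(range(1, J + 1), repeat=2))

print(discrete_uniform_mean(6))   # 7/2
print(expected_sum_two_dice())    # 7
```

The enumeration agrees with (J+1)/2 + (J+1)/2, which hints at the proof: the expectation of a sum is the sum of the expectations.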

Poisson (λ)

Poisson (5)

The St. Petersburg Paradox • Coin toss game. If the first heads comes up on the nth toss, you win $2^n • Entry fee to play a game is $C • Expected value of the game = E[Win] - C = -C + (½)2^1 + (½)^2 2^2 + … + (½)^k 2^k + … = ∞ • The game has infinite expected value, yet no one would pay very much to play. Why not?
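A short sketch of why the sum diverges: each term (½)^k 2^k equals 1, so a game truncated at K tosses has expected winnings of exactly K. The truncation at K is an assumption made here purely for illustration.

```python
from fractions import Fraction

def truncated_expected_win(K):
    """Expected winnings if the game is stopped after at most K tosses.

    Term k is Prob(first head on toss k) * payoff = (1/2)**k * 2**k = 1.
    """
    return sum(Fraction(1, 2**k) * 2**k for k in range(1, K + 1))

print(truncated_expected_win(10))    # 10
print(truncated_expected_win(1000))  # 1000
```

The partial sums grow without bound as K increases, so no finite entry fee C makes the expected value of the game negative.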

Continuous Random Variable

Gamma Random Variable

Gamma Function: Γ(1/2) = √π

Expected Value of a Linear Translation • Z = aX + b • E[Z] = aE[X] + b • Proof is trivial using the definition of the expected value and the fact that the density integrates to 1, so that E[b] = b.

Normal(μ, σ) Variable • From the definition of the random variable, μ is the mean. • Proof in Rice (119) uses the linear translation. • If X ~ N[0,1], σX + μ ~ N(μ, σ)

Cauchy Random Variables • f(x) = (1/π) · 1/(1 + x^2) • Mean does not exist. No higher moments exist. • If X ~ N[0,1] and Y ~ N[0,1] are independent, then X/Y has the Cauchy distribution. • Many applications obtain estimates of interesting quantities as ratios of estimators that are normally distributed.
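One way to see that the mean does not exist is to integrate |x| f(x) over [-A, A] and watch the result grow with A. The sketch below uses a simple midpoint rule (a choice made here, not part of the slides) and compares it with the closed form (1/π) ln(1 + A^2), which increases without bound.

```python
import math

def cauchy_pdf(x):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + x * x))

def truncated_abs_mean(A, n=100_000):
    """Midpoint-rule approximation of the integral of |x| f(x) over [-A, A]."""
    h = 2.0 * A / n
    total = 0.0
    for i in range(n):
        x = -A + (i + 0.5) * h
        total += abs(x) * cauchy_pdf(x)
    return total * h

for A in (10, 100, 1000):
    closed_form = math.log(1 + A * A) / math.pi
    print(A, truncated_abs_mean(A), closed_form)
```

Because the truncated integral grows like log(A), E|X| is infinite and E[X] is undefined.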

Cauchy Random Sample

Expected Value of a Function of X • Y = g(X) • One-to-one case • E[Y] = expected value of Y = g(X): find the distribution of the new variable • E[g(X)] = Σ_x g(x)f(x) will equal E[Y] • Many-to-one case: similar argument. Proceed without the transformation of the random variable. • E[g(X)] is generally not equal to g(E[X]) if g(X) is not linear
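As a small illustration of the last point, the sketch below computes E[g(X)] for a fair die directly from the probabilities, without deriving the distribution of Y = g(X). The helper name `expect` and the choice g(x) = x^2 are assumptions made for this example.

```python
from fractions import Fraction

def expect(g, pmf):
    """E[g(X)] = sum over x of g(x) * f(x) for a discrete pmf given as {x: prob}."""
    return sum(g(x) * p for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expect(lambda x: x, die)                 # E[X]
e_square = expect(lambda x: x * x, die)         # E[X^2], nonlinear g
e_linear = expect(lambda x: 2 * x + 3, die)     # E[2X + 3], linear g

print(mean, e_square, mean ** 2)   # 7/2 91/6 49/4
print(e_linear, 2 * mean + 3)      # 10 10
```

For the nonlinear g the two quantities differ (91/6 versus 49/4), while for the linear g they coincide.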

Linear Translation • Z = aX + b • E[Z] = E[aX + b] • E[Z] = aE[X] + b • Proof is trivial using the definition of the expected value and the fact that the density integrates to 1, so that E[b] = b.

Powers of X - Moments • Moment = E[X^k] for positive integer k • Raw moment: E[X^k] • Central moment: E[(X – E[X])^k] • Standard notation • E[X^k] = μ_k' • E[(X – E[X])^k] = μ_k • Mean = μ_1' = μ

Variance as a g(X) • Variance = E[(X – E[X])^2] • Standard deviation = square root of the variance; usually more interesting

Variance of a Translation: Y = a + bX • Var[a] = 0 • Var[bX] = b^2 Var[X] • Standard deviation of Y = |b| S.D.(X)

Shortcut • Var[X] = E[X^2] - {E[X]}^2

Bernoulli • Prob(X=1) = θ; Prob(X=0) = 1 - θ • E[X] = 0(1 - θ) + 1θ = θ • E[X^2] = 0^2(1 - θ) + 1^2 θ = θ • Var[X] = θ - θ^2 = θ(1 - θ)
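The shortcut formula and the Bernoulli variance can be checked together. This is a minimal sketch with exact arithmetic; the success probability 3/10 (named `theta` in the code) is an arbitrary value picked for illustration.

```python
from fractions import Fraction

def variance_by_definition(pmf):
    """Var[X] = E[(X - E[X])^2] for a discrete pmf given as {x: prob}."""
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

def variance_by_shortcut(pmf):
    """Var[X] = E[X^2] - (E[X])^2."""
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x * x * p for x, p in pmf.items())
    return second_moment - mean ** 2

theta = Fraction(3, 10)                  # assumed success probability
bernoulli = {0: 1 - theta, 1: theta}

print(variance_by_definition(bernoulli))  # 21/100
print(variance_by_shortcut(bernoulli))    # 21/100
print(theta * (1 - theta))                # 21/100
```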

Poisson: Factorial Moment

Normal Moments

Gamma Random Variable

Chi Squared [1] • Chi squared [1] = Gamma(½, ½): P = ½, λ = ½ • Mean = P/λ = (½)/(½) = 1 • Variance = P/λ^2 = (½)/[(½)^2] = 2

Higher Moments • Skewness: μ_3. • 0 for all symmetric distributions (not just the normal) • Standardized measure μ_3/σ^3 • Kurtosis: μ_4. • Standardized μ_4/σ^4. • Compare to normal, 3 • Degree of excess = μ_4/σ^4 – 3.

Symmetric and Skewed Distributions

Kurtosis: t[5] vs. Normal • Kurtosis of normal(0,1) = 3, Excess = 0 • Excess kurtosis of t[k] = 6/(k-4) for k > 4; for t[5] = 6/(5-4) = 6.

Approximations for g(X) • g(X) = continuous function • g(μ) exists • Continuous first derivative not equal to zero at μ • Taylor series approximation around μ • g(X) = g(μ) + g'(μ)(X - μ) + ½ g''(μ)(X - μ)^2 (+ higher order terms)

Approximation to the Mean • g(X) ≈ g(μ) + g'(μ)(X - μ) + ½ g''(μ)(X - μ)^2 • E[g(X)] ≈ E[approximation] = g(μ) + 0 + ½ g''(μ) E[(X - μ)^2] = g(μ) + ½ g''(μ)σ^2

Example: N[μ, σ]. g(X) = exp(X). True mean = exp(μ + σ^2/2). Approximation = exp(μ) + ½ exp(μ)σ^2. Example: μ = 0, σ = 1. True mean = exp(.5) = 1.6487. Approximation = exp(0) + .5*exp(0)*1 = 1.5000

Delta method: Var[g(X)] • Use linear approximation • g(X) ≈ g(μ) + g'(μ)(X - μ) • Var[g(X)] ≈ Var[approximation] = [g'(μ)]^2 σ^2 • Example: Var[X^2] ≈ (2μ)^2 σ^2
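To get a feel for the quality of the Var[X^2] approximation, the sketch below compares it with the exact variance under the added assumption that X is normal, in which case Var[X^2] = 4μ^2 σ^2 + 2σ^4. The normality assumption and the function names are introduced here for illustration only.

```python
def var_x_squared_delta(mu, sigma):
    """Delta-method approximation: [g'(mu)]^2 * sigma^2 with g(x) = x^2."""
    return (2 * mu) ** 2 * sigma ** 2

def var_x_squared_exact_normal(mu, sigma):
    """Exact Var[X^2] when X is normal with mean mu and s.d. sigma (assumption)."""
    return 4 * mu ** 2 * sigma ** 2 + 2 * sigma ** 4

for mu, sigma in [(10, 1), (1, 1), (0, 1)]:
    print(mu, sigma, var_x_squared_delta(mu, sigma), var_x_squared_exact_normal(mu, sigma))
```

The approximation is close when μ is large relative to σ (400 versus 402) and breaks down at μ = 0, where the first derivative of g is zero and the linear approximation carries no information.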

Delta Method – x ~ N[μ, σ^2] • y = g(x) = exp(x) ~ lognormal • Exact • E[y] = exp(μ + ½σ^2) • Var[y] = exp(2μ + σ^2)[exp(σ^2) – 1] • Approximate • E*[y] = exp(μ) + ½ exp(μ)σ^2 • V*[y] = [exp(μ)]^2 σ^2 • For N[0,1], the exact mean and variance are exp(.5) = 1.649 and exp(1)(exp(1) - 1) = 4.671. Approximations are 1.5 and 1 (!)
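The lognormal comparison can be reproduced in a few lines. This is a sketch of the exact and approximate formulas as stated on the slide; the function names and the extra small-variance case are choices made here.

```python
import math

def lognormal_exact(mu, sigma):
    """Exact mean and variance of y = exp(x) for x normal with mean mu, s.d. sigma."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    var = math.exp(2 * mu + sigma ** 2) * (math.exp(sigma ** 2) - 1)
    return mean, var

def lognormal_delta(mu, sigma):
    """Second-order mean approximation and first-order (delta method) variance."""
    mean = math.exp(mu) + 0.5 * math.exp(mu) * sigma ** 2
    var = math.exp(mu) ** 2 * sigma ** 2
    return mean, var

for mu, sigma in [(0.0, 1.0), (0.0, 0.1)]:
    print(mu, sigma, lognormal_exact(mu, sigma), lognormal_delta(mu, sigma))
```

With σ = 1 the variance approximation is far off (1 versus about 4.67), while with σ = 0.1 both approximations are close, which is consistent with the Taylor expansion being local to μ.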

Moment Generating Function • Let g(X) = exp(tX) • M(t) = E[exp(tX)] = the moment generating function for random variable X.

MGF Bernoulli • P(x) = (1 - θ) for x = 0 and θ for x = 1 • E[exp(tX)] = (1 - θ)exp(0t) + θ exp(1t) = (1 - θ) + θ exp(t).

MGF Poisson

MGF Gamma

MGF Normal • M_X(t) for X ~ N[0,1] is exp(½ t^2) • M_Y(t) for Y = σX + μ is exp(μt) M_X(σt) = exp[μt + ½ σ^2 t^2] • This is the moment generating function for N[μ, σ^2]

Generating the Moments • The rth derivative of M(t) evaluated at t = 0 gives the rth raw moment, μ_r' • M^(r)(0) = d^r M(t)/dt^r |_{t=0} = μ_r'

Poisson MGF • M(t) = exp(λ(exp(t) – 1)); M(0) = 1 • M'(t) = M(t) · λexp(t); M'(0) = λ • μ = M'(0) = 1 · λ · 1 = λ • μ_2' = E[X^2] = M''(0) = M'(0) λexp(0) + λexp(0) M(0) = λ^2 + λ • Variance = μ_2' - μ^2 = λ
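The derivative calculations can be sanity-checked numerically. The sketch below approximates M'(0) and M''(0) by central finite differences (a numerical shortcut chosen here, not the analytical derivation on the slide) for a Poisson variable with rate 5, called `lam` in the code.

```python
import math

def poisson_mgf(t, lam):
    """Moment generating function of a Poisson variable with rate lam."""
    return math.exp(lam * (math.exp(t) - 1.0))

def first_two_raw_moments(mgf, h=1e-4):
    """Central-difference approximations to M'(0) and M''(0)."""
    m1 = (mgf(h) - mgf(-h)) / (2 * h)
    m2 = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2
    return m1, m2

lam = 5.0  # assumed rate, matching the Poisson (5) example
m1, m2 = first_two_raw_moments(lambda t: poisson_mgf(t, lam))
variance = m2 - m1 ** 2
print(m1, m2, variance)  # approximately 5, 30, 5
```

The step size h trades truncation error against rounding error; the value used here is adequate for a check to a few decimal places.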

Useful Properties • If the MGF of X is M_X(t) and Y = a + bX, then M_Y(t) = exp(at) M_X(bt) • For independent X and Y, M_{X+Y}(t) = M_X(t) M_Y(t) • The sequence of moments does not uniquely define the distribution
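The first two properties can be verified on small discrete distributions by computing each MGF directly from its definition. The following sketch uses two Bernoulli variables with arbitrary probabilities; all names and numeric values are assumptions for illustration.

```python
import math
from itertools import product

def mgf(pmf, t):
    """E[exp(tX)] for a discrete pmf given as {x: prob}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

def affine(pmf, a, b):
    """pmf of Y = a + bX."""
    out = {}
    for x, p in pmf.items():
        y = a + b * x
        out[y] = out.get(y, 0.0) + p
    return out

def sum_independent(pmf_x, pmf_y):
    """pmf of X + Y when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        out[x + y] = out.get(x + y, 0.0) + px * py
    return out

x_pmf = {0: 0.7, 1: 0.3}
y_pmf = {0: 0.4, 1: 0.6}
a, b, t = 2.0, 3.0, 0.7

print(mgf(affine(x_pmf, a, b), t), math.exp(a * t) * mgf(x_pmf, b * t))
print(mgf(sum_independent(x_pmf, y_pmf), t), mgf(x_pmf, t) * mgf(y_pmf, t))
```

Each printed pair should agree up to floating-point rounding.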

Side Results • MGF M_X(t) = E[exp(tX)] does not always exist. • Characteristic function E[exp(itX)] always exists. Used to prove central limit theorems. • Cumulant generating function log M_X(t) is sometimes useful. Cumulants are functions of moments. First cumulant is the mean, second is the variance.

Part 2 – B Covariance and Correlation

Covariance • Random variables X, Y with joint discrete distribution p(x,y) or continuous density f(x,y). • Covariance = E({X – E[X]}{Y – E[Y]}) = E[XY] – E[X]E[Y]. • (Note: the covariance of X with itself is Var[X].) • Connection to joint distribution and covariation

Correlation and Covariance

Correlated Populations

Correlated Variables • X1 and X2 are independent with means 0 and standard deviations 1. • Y = aX1 + bX2. Choose a and b such that • X1 and Y have means 0, standard deviations 1, and correlation ρ. • Var[Y] = a^2 + b^2 = 1 • Cov[X1,Y] = a = ρ, so b = sqrt(1 – ρ^2)
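The construction can be checked exactly on a toy case. The sketch below assumes X1 and X2 each take the values -1 and +1 with probability ½ (so mean 0 and standard deviation 1), a distribution chosen here only because it allows full enumeration; the result does not depend on that choice.

```python
import math
from itertools import product

def check_construction(rho):
    """Return (mean of Y, variance of Y, correlation of X1 and Y) for Y = a*X1 + b*X2."""
    a = rho
    b = math.sqrt(1.0 - rho ** 2)
    # Four equally likely outcomes for the independent pair (X1, X2).
    outcomes = [(x1, x2, 0.25) for x1, x2 in product((-1.0, 1.0), repeat=2)]
    mean_y = sum(p * (a * x1 + b * x2) for x1, x2, p in outcomes)
    var_y = sum(p * (a * x1 + b * x2 - mean_y) ** 2 for x1, x2, p in outcomes)
    cov = sum(p * x1 * (a * x1 + b * x2 - mean_y) for x1, x2, p in outcomes)
    corr = cov / math.sqrt(1.0 * var_y)   # Var[X1] = 1 by assumption
    return mean_y, var_y, corr

print(check_construction(0.6))
```

For a target correlation of 0.6 the weights are a = 0.6 and b = 0.8, and the enumeration returns mean 0, variance 1, and correlation 0.6 up to rounding.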

Conditional Distributions • f(y|x) = f(y,x) / f(x) • Conditional distribution of y given a realization of x • Conditional mean = mean of the conditional random variable = regression function • Conditional variance = variance of conditional random variable = scedastic function
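For a discrete case, the conditional distribution, regression function, and scedastic function can be computed directly from a joint table. The joint pmf below is hypothetical, invented for this sketch, and the function names are likewise illustrative.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y); the six entries sum to 1.
joint = {
    (0, 0): Fraction(2, 10), (0, 1): Fraction(2, 10), (0, 2): Fraction(1, 10),
    (1, 0): Fraction(1, 10), (1, 1): Fraction(1, 10), (1, 2): Fraction(3, 10),
}

def conditional_pmf(joint, x):
    """f(y|x) = f(y, x) / f(x), with f(x) the marginal probability of x."""
    marginal = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / marginal for (xi, y), p in joint.items() if xi == x}

def conditional_mean_var(joint, x):
    """Conditional mean (regression function) and variance (scedastic function) at x."""
    cond = conditional_pmf(joint, x)
    mean = sum(y * p for y, p in cond.items())
    var = sum((y - mean) ** 2 * p for y, p in cond.items())
    return mean, var

for x in (0, 1):
    mean, var = conditional_mean_var(joint, x)
    print(x, mean, var)
```

In this table both the conditional mean and the conditional variance change with x, so the regression function is not flat and the conditional distribution is heteroscedastic.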

Joint Normal Random Variables

Conditional Normal
