Lecture 26: Statistical Inference in Simple Linear Regression; Multiple Linear Regression
Point Estimation 1 Suppose Armand's managers want a point estimate of the mean quarterly sales for all restaurants located near college campuses with 10,000 students (x = 10), i.e., a point estimate for E(Y | x = 10). A point estimate is ŷ = 60 + 5(10) = 110, or 110,000 Yuan.
Point Estimation 2 Suppose Armand's managers want a point estimate of the quarterly sales for an individual restaurant located near a college campus with 10,000 students (x = 10), i.e., a point estimate for Y when x = 10. A point estimate is ŷ = 60 + 5(10) = 110, or 110,000 Yuan.
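The point estimate can be reproduced from the least squares formulas. A minimal sketch, assuming the ten (student population, sales) pairs below are the Armand's sample behind the lecture (the slides show only the fitted results, so the raw data here are an assumption):

```python
# Assumed Armand's sample: student population (1,000s) and
# quarterly sales (1,000s of Yuan) for ten restaurants.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
# least squares slope b1 = Sxy / Sxx and intercept b0 = ybar - b1*xbar
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
y_hat = b0 + b1 * 10          # point estimate at x = 10
print(b0, b1, y_hat)          # 60.0 5.0 110.0
```

The same value 110 serves as both the point estimate of the mean response E(Y | x = 10) and the point prediction for an individual restaurant.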
Interval Estimation 1 Suppose Armand's managers want an interval estimate of the mean quarterly sales for all restaurants located near college campuses with 10,000 students (x = 10), i.e., an interval estimate for E(Y | x = 10). The interval estimate is called a confidence interval.
Interval Estimation 2 Suppose Armand's managers want an interval estimate of the quarterly sales for an individual restaurant located near a college with 10,000 students (x = 10), i.e., an interval estimate for Y when x = 10. The interval estimate is called a prediction interval.
Question • Which interval is wider? Confidence interval or prediction interval?
Prediction Interval • Suppose that we want to predict a new value Y, independent of the observed data (x1, Y1), …, (xn, Yn), when we know the corresponding value x of the predictor. Point prediction: Ŷ = β̂0 + β̂1x.
Prediction Results from Minitab

Predicted Values for New Observations
New Obs   Fit      SE Fit   95% CI            95% PI
1         110.00   4.95     (98.58, 121.42)   (76.13, 143.87)

Values of Predictors for New Observations
New Obs   Student Population
1         10.0
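The Minitab intervals can be recomputed from the standard formulas for simple regression. A sketch, again assuming the ten (x, y) pairs below are the Armand's sample (not listed on the slides); t_crit = 2.306 is the 97.5th percentile of the t distribution with n − 2 = 8 degrees of freedom, taken from a t table:

```python
from math import sqrt

# Assumed Armand's sample (student population, quarterly sales)
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(r ** 2 for r in resid) / (n - 2)     # s^2 = SSE / (n - 2)

x0 = 10
fit = b0 + b1 * x0
se_fit = sqrt(s2 * (1 / n + (x0 - xbar) ** 2 / sxx))  # SE of the mean response
se_pred = sqrt(s2 + se_fit ** 2)              # extra s^2 for a new observation

t_crit = 2.306                                # t_{8, 0.025} from a t table
ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)
pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)
print(round(fit, 2), round(se_fit, 2))        # 110.0 4.95
print([round(v, 2) for v in ci])              # [98.58, 121.42]
print([round(v, 2) for v in pi])              # [76.13, 143.87]
```

The extra s² term in se_pred is why the prediction interval is always wider than the confidence interval at the same x0, answering the question on the previous slide.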
Women's wages and some socioeconomic variables. Source: T. A. Mroz (1987), "The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions," Econometrica 55, 765-799. • lwage: log of wage • exper: actual labor market experience • expersq: exper² • educ: years of schooling • age: woman's age in years • kidslt6: number of kids < 6 years • kidsge6: number of kids 6-18 years
Multiple Linear Regression • E.g. Y = β0 + β1x1 + β2x2 + ε • E.g. Y = β0 + β1x + β2x² + ε • E.g. lwage = β0 + β1·educ + β2·exper + β3·exper² + ε • The term linear refers to the fact that the expectation of Y is a linear function of the unknown parameters β0, β1, …, βk (not necessarily of the x's, as the polynomial examples show).
Classical Assumptions • Linearity: E(Yi) = β0 + β1xi1 + … + βkxik. • Normality: each variable Yi has a normal distribution. • Independence: the variables Y1, …, Yn are independent. • Homoscedasticity: the variables Y1, …, Yn have the same variance σ².
Maximum Likelihood Estimators • The likelihood function of β = (β0, …, βk) and σ²: L(β, σ²) = (2πσ²)^(−n/2) exp{ −(1/(2σ²)) Σi (yi − β0 − β1xi1 − … − βkxik)² }. • The values of β0, …, βk that maximize the likelihood function will be the values that minimize Q(β) = Σi (yi − β0 − β1xi1 − … − βkxik)². So the M.L.E.s of β0, …, βk are the least squares estimators. • Define SSE = Σi (yi − ŷi)². The M.L.E. for σ² is σ̂² = SSE/n.
Explicit Form of the Estimators • The design matrix is the n × (k+1) matrix X whose i-th row is (1, xi1, …, xik). • Also define Y = (Y1, …, Yn)′ and β = (β0, …, βk)′.
The set of k+1 normal equations (obtained by setting ∂Q/∂βj = 0 for j = 0, …, k) can be written as X′Xβ̂ = X′Y. So the least squares estimators (the M.L.E.s) are β̂ = (X′X)⁻¹X′Y. • We can see that the estimators are linear combinations of Y1, …, Yn, so they follow a multivariate normal distribution.
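The matrix formula can be checked numerically. A sketch for the simple-regression case k = 1, where the 2×2 inverse of X′X can be written out by hand; the data are again assumed to be the Armand's sample used earlier in the lecture:

```python
# beta_hat = (X'X)^{-1} X'Y for the design matrix X with rows (1, x_i).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)

# X'X and X'Y assembled from sums
xtx = [[n,      sum(x)],
       [sum(x), sum(xi * xi for xi in x)]]
xty = [sum(y), sum(xi * yi for xi, yi in zip(x, y))]

# explicit 2x2 inverse
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
inv = [[ xtx[1][1] / det, -xtx[0][1] / det],
       [-xtx[1][0] / det,  xtx[0][0] / det]]
beta = [inv[0][0] * xty[0] + inv[0][1] * xty[1],
        inv[1][0] * xty[0] + inv[1][1] * xty[1]]
print([round(b, 6) for b in beta])   # [60.0, 5.0]
```

The result matches the intercept 60 and slope 5 obtained from the one-variable least squares formulas, illustrating that the matrix form is the same estimator.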
Theorem. Suppose that Y is an n-dimensional random vector, for which the mean vector E(Y) and the covariance matrix Cov(Y) exist. Suppose also that A is a p × n matrix whose elements are constants, and that W is a p-dimensional random vector defined by W = AY. Then E(W) = AE(Y) and Cov(W) = A Cov(Y) A′.
The vector β̂ has a multivariate normal distribution with mean vector β and covariance matrix σ²(X′X)⁻¹. • For j = 0, …, k, marginally, β̂j has a normal distribution with mean βj and variance σ²[(X′X)⁻¹]jj. • For σ̂² = SSE/n, the quantity nσ̂²/σ² = SSE/σ² has a χ² distribution with n − k − 1 degrees of freedom, independent of β̂.
Testing Hypotheses • To test H0: βj = 0 against H1: βj ≠ 0, use the statistic T = β̂j / se(β̂j), with se(β̂j) = s·√[(X′X)⁻¹]jj, where s² = SSE/(n − k − 1) is an unbiased estimator of σ². The level α0 test rejects H0 if |T| ≥ t(n − k − 1, α0/2).
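For simple regression this t test takes a familiar form, se(β̂1) = s/√Sxx. A sketch on the assumed Armand's sample (raw data not shown on the slides); the critical value t(8, 0.025) = 2.306 is taken from a t table:

```python
from math import sqrt

# t test of H0: beta1 = 0 in the simple regression y = b0 + b1*x.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)            # unbiased estimator of sigma^2 (k = 1)
se_b1 = sqrt(s2 / sxx)        # standard error of the slope
t_stat = b1 / se_b1
print(round(t_stat, 2), abs(t_stat) >= 2.306)   # 8.62 True
```

Since |T| = 8.62 far exceeds 2.306, H0: β1 = 0 is rejected at level 0.05: student population is significantly related to sales.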
R analysis

# read the data (mroz.csv must be in the current working directory)
mroz <- read.csv('mroz.csv')

# an overview of the data
head(mroz)

# some exploration: simple regressions
lm1 <- lm(lwage ~ educ, data = mroz)
summary(lm1)
lm2 <- lm(lwage ~ exper, data = mroz)
summary(lm2)

# multiple regression
lm3 <- lm(lwage ~ exper + expersq + educ + age + kidslt6 + kidsge6, data = mroz)
summary(lm3)

# prediction for the first four observations
newdata1 <- mroz[1:4, ]
predict(object = lm3, newdata = newdata1, type = "response")
Question • How well does the estimated regression equation fit the data?
Total Sum of Squares • The sum of squared deviations obtained by using the sample mean ȳ to estimate the value of quarterly sales for each restaurant in the sample. • Total sum of squares: SST = Σi (yi − ȳ)².
Sum of Squares Due to Error • Let ŷi denote the estimated value of the dependent variable from the linear regression model. ei = yi − ŷi is called the i-th residual. • Sum of squares due to error: SSE = Σi (yi − ŷi)².
Sum of Squares Due to Regression • Measures how much the fitted values ŷi on the estimated regression line deviate from ȳ. Note that the mean of the ŷi is also ȳ. • Sum of squares due to regression: SSR = Σi (ŷi − ȳ)². • The three sums of squares satisfy the decomposition SST = SSR + SSE.
[Figure: scatter plot of Y against X with the fitted regression line, illustrating the deviations behind the sums of squares.]
R Square • Define the coefficient of determination (R square): R² = SSR/SST. It is the proportion of the variation in Y explained by the regression.
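The decomposition and R² can be verified numerically. A sketch on the assumed Armand's sample (the slides state only the definitions, not the raw data):

```python
# Sums of squares and R^2 for the simple regression.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)                  # total
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error
ssr = sum((fi - ybar) ** 2 for fi in fitted)             # regression
r2 = ssr / sst
print(sst, ssr + sse)   # 15730.0 15730.0 -- SST = SSR + SSE
print(round(r2, 4))     # 0.9027
```

So about 90% of the variation in sales is explained by student population in this fit.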
F Test • The F test can be used to test for an overall significant relationship between the response variable and all of the explanatory variables. H0: β1 = … = βk = 0. H1: at least one βj (j = 1, …, k) is not equal to zero, i.e., at least one of the independent variables x1, …, xk is linearly related to Y. • Test statistic: F = (SSR/k) / (SSE/(n − k − 1)), which under H0 has an F distribution with k and n − k − 1 degrees of freedom; reject H0 for large F.
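A sketch of the overall F test for the simple-regression case k = 1, on the assumed Armand's sample; the critical value F_{0.05}(1, 8) = 5.32 is taken from an F table:

```python
# Overall F test: F = (SSR/k) / (SSE/(n-k-1)).
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ssr = sum((fi - ybar) ** 2 for fi in fitted)

k = 1
F = (ssr / k) / (sse / (n - k - 1))
print(round(F, 2), F >= 5.32)   # 74.25 True
```

With k = 1 the overall F test is equivalent to the t test of β1 = 0 (here F ≈ 74.25 = 8.62², the square of the earlier t statistic), so both reject H0 at level 0.05.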