This comprehensive guide delves into multiple linear regression analysis as a vital tool for quantitative business decision-making. It covers foundational concepts, including the multiple regression model, estimation, and testing the significance of predictors. Key topics include multicollinearity effects, selection of predictors, and the use of diagnostic plots for model validation. Additionally, it highlights the importance of the coefficient of determination (R²) in evaluating model performance and introduces the use of indicator variables to represent qualitative factors.
Quantitative Business Analysis for Decision Making: Multiple Linear Regression Analysis
Outline
• Multiple Regression Model
• Estimation
• Testing Significance of Predictors
• Multicollinearity
• Selection of Predictors
• Diagnostic Plots
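Multiple Regression Model
Multiple linear regression model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
$\beta_1, \beta_2, \dots, \beta_k$ are the slope coefficients of $X_1, X_2, \dots, X_k$.
$\beta_i$ quantifies the amount of change in the mean of the response Y for a unit change in $X_i$ when all other predictors are held fixed.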
Multiple Regression Model (cont'd)
In the model, $\mu_Y = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$ is the mean of Y. The error term $\varepsilon$:
• contributes to the variation in Y values from their mean $\mu_Y$, and
• is assumed normally distributed with mean 0 and standard deviation $\sigma$.
Sampling
A random sample of n units is taken. Then, for each unit, k+1 measurements are made: Y, $X_1, X_2, \dots, X_k$.
Estimated Model
Estimated multiple regression model is: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
Expressions for the $b_i$ are cumbersome to write out. $b_i$ is an estimate of $\beta_i$.
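As a rough illustration of how the estimates $b_0, b_1, \dots, b_k$ can be obtained in practice, here is a minimal sketch using NumPy's least-squares solver. The data values and the helper name `fit_ols` are made up for this example and do not come from the slides.

```python
import numpy as np

# Hypothetical sample: n = 6 units, k = 2 predictors (made-up numbers).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([8.1, 8.9, 16.2, 16.8, 24.1, 24.9])


def fit_ols(X, y):
    """Return the least-squares estimates [b0, b1, ..., bk]."""
    # Prepend a column of ones so that b0 (the intercept) is estimated too.
    design = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b


b = fit_ols(X, y)
# Fitted values from the estimated model: b0 + b1*X1 + b2*X2.
y_hat = np.column_stack([np.ones(len(y)), X]) @ b
print("estimates b0, b1, b2:", b)
```

Using a least-squares solver rather than writing out the formulas by hand is a design choice for brevity; statistical packages report the same estimates along with their standard errors.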
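Standard Error
Sample standard deviation around the mean (estimated regression model) is: $s = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n-k-1}}$, where $\hat{Y}$ is the value given by the estimated model. It is an estimate of $\sigma$, the standard deviation of the error term.
Standard error of $b_i$, the estimate of the slope coefficient $\beta_i$ (for the specified values of the predictors), is denoted by $s_{b_i}$.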
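The sample standard deviation around the estimated model is the square root of the sum of squared residuals divided by n-k-1. A minimal sketch of that computation follows; the helper name `residual_std_error` and the data are assumptions made for illustration.

```python
import numpy as np


def residual_std_error(X, y):
    """s = sqrt(SSE / (n - k - 1)) for a model with an intercept and k predictors."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])      # intercept column + predictors
    b, *_ = np.linalg.lstsq(design, y, rcond=None)  # least-squares estimates
    residuals = y - design @ b                      # observed minus fitted values
    sse = np.sum(residuals ** 2)                    # sum of squared residuals
    return np.sqrt(sse / (n - k - 1))


# Hypothetical sample with n = 6 units and k = 2 predictors (made-up numbers).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([8.1, 8.9, 16.2, 16.8, 24.1, 24.9])
print("s =", residual_std_error(X, y))
```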
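Testing Significance of a Predictor
For comparing $\beta_i$ with a reference value $\beta_{i0}$, the test statistic is: $t = \frac{b_i - \beta_{i0}}{s_{b_i}}$, where $b_i$ is the estimate of $\beta_i$ and $s_{b_i}$ is its standard error. Under the null hypothesis it has a t-distribution with (n-k-1) degrees of freedom.
For estimating $\beta_i$ by a confidence interval, compute $b_i \pm t_{\alpha/2,\,n-k-1}\, s_{b_i}$.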
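To make the test and the interval concrete, the sketch below computes, for each coefficient, the estimate, its standard error, the t statistic against a reference value, and a confidence interval. It assumes the usual least-squares standard errors, taken from the diagonal of $s^2 (X^T X)^{-1}$. The helper name `coef_inference` and the data are hypothetical, and the critical value is passed in by the caller (read from a t-table with n-k-1 degrees of freedom) rather than computed.

```python
import numpy as np


def coef_inference(X, y, t_crit, ref=0.0):
    """Return estimates, standard errors, t statistics and confidence intervals.

    t_crit is the t-table critical value for n - k - 1 degrees of freedom;
    ref is the reference value each coefficient is compared with.
    """
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ b
    s2 = np.sum(residuals ** 2) / (n - k - 1)        # estimate of sigma^2
    cov_b = s2 * np.linalg.inv(design.T @ design)    # covariance of the estimates
    se = np.sqrt(np.diag(cov_b))                     # standard error of each b_i
    t_stats = (b - ref) / se                         # (b_i - reference) / s_{b_i}
    ci = np.column_stack([b - t_crit * se, b + t_crit * se])
    return b, se, t_stats, ci


# Hypothetical sample: n = 6, k = 2, so the t-distribution has 3 degrees of freedom.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([8.1, 8.9, 16.2, 16.8, 24.1, 24.9])
# 3.182 is the two-sided 95% critical value for 3 degrees of freedom (t-table).
b, se, t_stats, ci = coef_inference(X, y, t_crit=3.182)
for i in range(len(b)):
    print(f"b{i} = {b[i]:.3f}, se = {se[i]:.3f}, t = {t_stats[i]:.2f}, CI = {ci[i]}")
```

A predictor whose interval excludes the reference value (0 by default here) would be judged significant at the corresponding level.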
Coefficient of Determination
The coefficient of determination $R^2$ quantifies the percentage of variation in the Y-distribution that is accounted for by the predictors in the model.
• If $R^2 = 80\%$, then 20% of the variation in the Y-distribution is due to factors other than those in the model.
• $R^2$ never decreases as predictors are added to the model, so a higher $R^2$ can come at the cost of complicating the model.
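The behaviour of $R^2$ when predictors are added can be checked numerically. The sketch below computes $R^2$ as 1 - SSE/SST and then refits after adding a predictor that is pure random noise. The helper name `r_squared`, the simulated data and the seed are assumptions made for illustration.

```python
import numpy as np


def r_squared(X, y):
    """Proportion of the variation in y accounted for by the predictors in X."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = np.sum((y - design @ b) ** 2)   # variation left unexplained
    sst = np.sum((y - y.mean()) ** 2)     # total variation around the mean
    return 1.0 - sse / sst


# Simulated data (made up): y depends on x1 only.
rng = np.random.default_rng(0)
x1 = np.arange(1.0, 11.0)
y = 2.0 + 0.5 * x1 + rng.normal(0.0, 1.0, size=10)
noise_predictor = rng.normal(0.0, 1.0, size=10)  # unrelated to y by construction

r2_base = r_squared(x1.reshape(-1, 1), y)
r2_more = r_squared(np.column_stack([x1, noise_predictor]), y)
print("R^2 with x1 only:         ", r2_base)
print("R^2 with x1 and noise term:", r2_more)
```

The second value is never smaller than the first, even though the added predictor carries no information about Y, which is why $R^2$ alone is a poor guide for choosing predictors.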
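Testing the Model for Significance
Null hypothesis: the predictors in the relationship have no predictive power to explain the variation in the Y-distribution, that is, $\beta_1 = \beta_2 = \dots = \beta_k = 0$.
Test statistic: $F = \frac{R^2 / k}{(1 - R^2)/(n-k-1)}$, with $R^2$ expressed as a proportion. Under the null hypothesis it has an F-distribution with k and (n-k-1) degrees of freedom for the numerator and denominator.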
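A minimal sketch of the overall F test follows. It assumes the usual form of the statistic, the regression mean square divided by the residual mean square, which is algebraically the same as $(R^2/k) / ((1-R^2)/(n-k-1))$. The helper name `f_statistic` and the data are hypothetical; the p-value is left to an F-table or a statistics package.

```python
import numpy as np


def f_statistic(X, y):
    """Return (F, numerator df, denominator df) for the overall model test."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_hat = design @ b
    ssr = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the model
    sse = np.sum((y - y_hat) ** 2)         # variation left unexplained
    msr = ssr / k                          # regression mean square
    mse = sse / (n - k - 1)                # residual mean square
    return msr / mse, k, n - k - 1


# Hypothetical sample with n = 6 units and k = 2 predictors (made-up numbers).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([8.1, 8.9, 16.2, 16.8, 24.1, 24.9])
F, df_num, df_den = f_statistic(X, y)
print(f"F = {F:.1f} with {df_num} and {df_den} degrees of freedom")
```

A value of F larger than the tabulated critical value for these degrees of freedom leads to rejecting the null hypothesis.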
Multicollinearity and Selection of Predictors
• Multicollinearity occurs when predictors are highly correlated among themselves. In its presence $R^2$ may be high, but the individual coefficient estimates are less reliable because their standard errors are inflated.
• A screening process (e.g. stepwise regression) can reduce multicollinearity by selecting only those predictors that are not strongly correlated among themselves.
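One way to spot highly correlated predictors before fitting is to inspect their pairwise correlations. The sketch below does that and also computes variance inflation factors (VIF), a common diagnostic that the slides do not mention and that is included here only as an illustration. The helper name `vif` and the data are made up; a frequently quoted rule of thumb treats a VIF above about 10 as a warning sign.

```python
import numpy as np


def vif(X):
    """Variance inflation factor 1 / (1 - R_j^2) for each predictor column."""
    n, k = X.shape
    factors = []
    for j in range(k):
        # Regress predictor j on the remaining predictors (with an intercept).
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
        resid = X[:, j] - design @ coef
        sst = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        r2_j = 1.0 - (resid @ resid) / sst
        factors.append(1.0 / (1.0 - r2_j))
    return np.array(factors)


# Made-up predictors: the second column is roughly twice the first.
X = np.array([[1.0, 2.1, 5.0],
              [2.0, 3.9, 3.0],
              [3.0, 6.2, 6.0],
              [4.0, 7.8, 2.0],
              [5.0, 10.1, 7.0],
              [6.0, 12.0, 1.0]])
print("pairwise correlations:\n", np.corrcoef(X, rowvar=False))
print("VIF per predictor:", vif(X))
```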
Diagnostic Plots
• Residuals are used to diagnose the validity of the model assumptions.
• A scatter plot of the residuals against the predicted values can serve as a diagnostic tool.
• A diagnostic plot can identify outliers, unequal variability, and the need for a transformation to achieve homogeneity of variance.
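The quantities behind a residual plot can be computed as sketched below: the predicted values, the residuals, and a rough outlier flag. The flag divides each residual by the sample standard deviation around the model and marks values beyond a threshold of 2. That scaling and threshold are simplifying assumptions, as are the helper name `residual_diagnostics` and the data. Plotting the returned residuals against the predicted values with any charting tool gives the diagnostic plot itself.

```python
import numpy as np


def residual_diagnostics(X, y, threshold=2.0):
    """Return predicted values, residuals and indices of suspected outliers."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ b
    residuals = y - fitted
    s = np.sqrt(np.sum(residuals ** 2) / (n - k - 1))
    scaled = residuals / s                       # roughly standardized residuals
    flagged = np.flatnonzero(np.abs(scaled) > threshold)
    return fitted, residuals, flagged


# Made-up data: an exact linear relationship with one observation shifted upward.
x1 = np.arange(1.0, 11.0)
x2 = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0])
X = np.column_stack([x1, x2])
y = 1.0 + 2.0 * x1 + 0.5 * x2
y[6] += 10.0                                     # plant an outlier at index 6

fitted, residuals, flagged = residual_diagnostics(X, y)
for f, r in zip(fitted, residuals):
    print(f"predicted = {f:7.2f}   residual = {r:6.2f}")
print("suspected outliers at indices:", flagged)
```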
Indicator Variables
• Indicator variables (also called dummy variables) are numerical codes that are used to represent qualitative variables.
• For example, 0 for men and 1 for women.
• For a qualitative variable with c categories, (c-1) indicator variables need to be defined.
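The (c-1) rule can be sketched in code: one category is chosen as the baseline and gets no column, and each remaining category gets its own 0/1 column. The helper name `indicator_columns` and the region labels are made up for this example.

```python
import numpy as np


def indicator_columns(values, baseline):
    """Code a qualitative variable with c categories as c - 1 indicator columns."""
    levels = sorted(set(values))
    if baseline not in levels:
        raise ValueError("baseline must be one of the observed categories")
    # The baseline category is represented by all indicators being 0.
    kept = [lev for lev in levels if lev != baseline]
    matrix = np.array([[1 if v == lev else 0 for lev in kept] for v in values])
    return kept, matrix


# Hypothetical qualitative variable with c = 3 categories.
regions = ["north", "south", "west", "south", "north"]
kept, D = indicator_columns(regions, baseline="north")
print("indicator columns for:", kept)
print(D)
```

Defining all c columns alongside an intercept would make the columns sum to the intercept column, so the coefficients could not be estimated uniquely; dropping the baseline avoids that.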