Quantitative Methods in Social Sciences (E774). Sudip Ranjan Basu, Ph.D. 20 November 2009. Regressions: Causal relationships. Source: S.R. Basu (2008): A new way to link development to institutions, policies and geography, United Nations, New York and Geneva. Lecture 10 - Sudip R. Basu.
Linear relationship
To analyse how the values of Y tend to change according to the values of X
Y: response variable; X: explanatory variable
The mean of Y is a linear function of X: E(Y) = α + βX
α: y-intercept; β: slope
Models are simple approximations of reality
Assumptions of Statistical Inference
The sample is randomly selected
The mean of Y is related to X by the linear equation E(Y) = α + βX
The conditional standard deviation σ is identical at each X-value (homoscedasticity)
The conditional distribution of Y at each value of X is normal
Least Squares Prediction
Prediction equation: ŷ = a + bx estimates the linear relationship for the sample
a and b are estimates of the coefficients of the prediction equation
Residual: the prediction error e = y − ŷ for an observation (the difference between an observed value and the predicted value of Y)
The least squares estimates a and b are the values that give the prediction equation ŷ = a + bx for which the residual sum of squares, SSE = Σ(y − ŷ)², is a minimum
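The least squares formulas above can be computed directly. A minimal sketch in pure Python, using hypothetical toy data (the helper name least_squares is mine, not from the lecture): b = Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² and a = ȳ − bx̄.

```python
def least_squares(x, y):
    """Fit y = a + b*x by least squares and return (a, b, SSE)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope estimate
    a = ybar - b * xbar        # intercept estimate
    # Residual sum of squares: the quantity least squares minimises
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, sse

# Toy data lying exactly on y = 2x, so the residuals are all zero
a, b, sse = least_squares([1, 2, 3, 4], [2, 4, 6, 8])
```

Because the toy points fall exactly on a line, the fit recovers a = 0, b = 2 with SSE = 0; with real data the residuals would be nonzero.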
Linear Regression Model
The regression function is a mathematical function that describes how the mean of Y changes according to the value of X: E(Y) = α + βX
β: regression coefficient
σ: conditional standard deviation
r² of the prediction equation: r² = (TSS − SSE)/TSS, the proportional reduction in prediction error
Inferences for "Slope"
Test of independence: H0: β = 0 (the variables are statistically independent)
Test statistic: t = b / se
Standard error of b: se = s / √Σ(x − x̄)², where s = √(SSE/(n − 2))
P-value for Ha: β ≠ 0: two-tail probability from the t-distribution
Confidence interval for the slope: b ± t·(se), with degrees of freedom df = n − 2
A small P-value for H0: β = 0 suggests that the regression line has a nonzero slope
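The slope t-statistic can be assembled from these pieces. A sketch with assumed toy data (computing the P-value itself would need a t-distribution CDF, which is omitted here):

```python
import math

# Hypothetical toy data, nearly on a straight line
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

s = math.sqrt(sse / (n - 2))    # estimate of the conditional standard deviation
se_b = s / math.sqrt(sxx)       # standard error of the slope b
t = b / se_b                    # test statistic for H0: beta = 0, df = n - 2
```

A large |t| (here the data are nearly collinear, so t is large) gives a small two-tail P-value, rejecting independence.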
Inferences for "Correlation"
r = 0 for the sample plays the role of b = 0; ρ = 0 for the population plays the role of β = 0
H0: ρ = 0 (the variables are statistically independent)
Test statistic: t = r √((n − 2)/(1 − r²)), with df = n − 2
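A sketch of this test, on the same assumed toy data as above (the helper name corr_t is mine). Note that the resulting t equals the slope t-statistic, as the two tests are equivalent:

```python
import math

def corr_t(x, y):
    """Return (r, t) for the test of H0: rho = 0, with df = n - 2."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)            # sample correlation
    t = r * math.sqrt((n - 2) / (1 - r * r))  # test statistic
    return r, t

r, t = corr_t([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8])
```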
Model Assumptions and Violations
Linear regression equation
Extrapolation is dangerous
Influential observations
Factors influencing correlation
Regression model with error terms
Models and reality
Some concepts
Control variable: used to understand the influences of related variables
Lurking variable: a variable not measured in a model but that influences the association of interest
Statistical interaction: exists between X1 and X2 in their effects on Y when the true effect of one predictor on Y changes as the value of the other predictor changes
Multiple Regressions
Theory of the Multiple Regression Model
Multiple regression function (mrf): E(Y) = α + β1X1 + β2X2 + … + βkXk
A slope in the mrf describes the effect of an explanatory variable while controlling for the effects of the other explanatory variables in the model
β1 and β2 are partial regression coefficients
R², the coefficient of multiple determination, lies between 0 and 1
If R² = 1, the model predicts Y perfectly (SSE = 0)
If R² = 0, the model does not improve on predicting Y by its sample mean
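R² can be computed as 1 − SSE/TSS from the observed and fitted values. A minimal sketch (the helper name r_squared is mine), exercised on two assumed extreme cases, a perfect fit and a mean-only fit:

```python
def r_squared(y, yhat):
    """Coefficient of (multiple) determination: 1 - SSE/TSS."""
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))    # residual SS
    tss = sum((yi - ybar) ** 2 for yi in y)                 # total SS
    return 1 - sse / tss

perfect = r_squared([1, 2, 3, 4], [1, 2, 3, 4])             # fitted = observed
mean_only = r_squared([1, 2, 3, 4], [2.5, 2.5, 2.5, 2.5])   # fitted = ybar
```

The perfect fit gives R² = 1 and the mean-only fit gives R² = 0, matching the two boundary cases on the slide.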
Inference for Multiple Regression Coefficients
Testing the collective influence of the Xi: H0: β1 = β2 = … = βk = 0
Alternative hypothesis: at least one βi ≠ 0
Test statistic (F-distribution): F = (R²/k) / ((1 − R²)/(n − k − 1)), with df1 = k and df2 = n − k − 1
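The F statistic follows directly from R², the sample size n, and the number of predictors k. A sketch with assumed illustrative numbers:

```python
def f_statistic(r2, n, k):
    """F test of H0: beta_1 = ... = beta_k = 0, df1 = k, df2 = n - k - 1."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical example: R^2 = 0.8 from k = 2 predictors and n = 13 observations
f = f_statistic(0.8, 13, 2)
```

Here F = (0.8/2)/(0.2/10) = 20; the P-value would come from the F-distribution with (2, 10) degrees of freedom.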
Model Selection Procedures
Selecting explanatory variables for a model
Maximum R²
Backward elimination: start with all variables and remove them until all remaining coefficients are significant
Forward selection: add variables one at a time
Stepwise regression: like forward selection, but drop variables if they lose their significance as other variables are added
Exploratory vs. explanatory research
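The greedy logic shared by these procedures can be sketched abstractly: forward selection repeatedly adds whichever candidate most improves a model score, stopping when no addition helps. This is a hypothetical skeleton, not the lecture's algorithm; the score function (in practice something like adjusted R² or a significance test) is passed in:

```python
def forward_selection(candidates, score):
    """Greedy forward selection: add variables while the score improves."""
    selected = []
    best = score(selected)
    improved = True
    while improved and candidates:
        improved = False
        # Score every candidate model that adds one more variable
        gains = {v: score(selected + [v]) for v in candidates}
        v, s = max(gains.items(), key=lambda kv: kv[1])
        if s > best:                 # keep the best addition only if it helps
            selected.append(v)
            candidates.remove(v)
            best = s
            improved = True
    return selected

# Toy score: counts how many of the "truly useful" variables are included
chosen = forward_selection(["x1", "x2", "x3"],
                           lambda s: len(set(s) & {"x1", "x3"}))
```

With this toy score the procedure picks x1 and x3 and stops, since adding x2 yields no improvement.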
Regression Diagnostics
Examine the residuals
Plot residuals against the explanatory variables
Detecting Influential Observations
Remove outliers
Leverage: a nonnegative statistic such that the larger its value, the greater the weight that observation receives in determining the fit
DFFIT: the effect on the fit of deleting an observation; the larger its absolute value, the greater the influence of that observation on the fitted values
DFBETA: the effect on the model parameter estimates of removing an observation from the dataset; the larger its absolute value, the greater the influence of that observation on the parameter estimates
Cook's distance: the effect that observation i has on all the predicted values
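For simple regression, the leverage of observation i has the closed form h_i = 1/n + (x_i − x̄)²/Σ(x − x̄)², so extreme X-values get the most weight. A sketch on assumed toy data (the helper name leverages is mine):

```python
def leverages(x):
    """Leverage of each observation in a simple linear regression on x."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

h = leverages([1, 2, 3, 4, 5])
```

The leverages sum to 2 (the number of regression parameters, a and b), and the endpoints x = 1 and x = 5 carry the highest leverage while the centre x = 3 carries the lowest.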
Effects of Multicollinearity
Multicollinearity: the explanatory variables 'overlap' considerably, giving higher R² values
Multicollinearity inflates standard errors
Variance inflation factor: VIFj = 1/(1 − Rj²), the multiplicative increase in the variance (squared standard error) of the estimator due to Xj being correlated with the other predictors
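In the special case of exactly two predictors, Rj² is simply their squared correlation, so the VIF can be computed directly. A sketch with assumed toy data (the helper name vif_two_predictors is mine; it assumes the predictors are not perfectly collinear):

```python
def vif_two_predictors(x1, x2):
    """VIF = 1/(1 - r^2) for a model with exactly two predictors x1, x2."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r2 = s12 * s12 / (s11 * s22)     # squared correlation between x1 and x2
    return 1 / (1 - r2)

vif_low = vif_two_predictors([1, 2, 3, 4], [1, -1, 1, -1])   # weakly related
vif_high = vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5])    # nearly collinear
```

The nearly collinear pair yields a VIF above 20, meaning the slope's variance is inflated more than twentyfold relative to uncorrelated predictors, while the weakly related pair stays close to 1.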