Correlation and Regression
Correlation and Regression It’s the Last Lecture Hooray!
Correlation • Analyze → Correlate → Bivariate… • Click over the variables you wish to correlate • Options… Can select descriptives and pairwise vs. listwise deletion • Pairwise deletion – each correlation uses all cases with data on both variables in that pair (the default) • Listwise deletion – only cases with data on all analyzed variables are included in any correlation
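The difference between the two deletion rules can be sketched in plain Python. This is a minimal illustration with hypothetical data (missing values coded as None), not SPSS's implementation:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for two equal-length lists with no missing values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Three hypothetical variables; None marks a missing value.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 4.0, 5.0, 4.0, None]
c = [1.0, None, 2.0, 3.0, 4.0]

# Pairwise deletion: for the (a, b) correlation, drop only cases missing a or b.
pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
r_pairwise = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])

# Listwise deletion: drop any case missing on ANY analyzed variable (a, b, or c).
complete = [(x, y) for x, y, z in zip(a, b, c)
            if x is not None and y is not None and z is not None]
r_listwise = pearson_r([p[0] for p in complete], [p[1] for p in complete])

print(round(r_pairwise, 3), round(r_listwise, 3))
```

Pairwise deletion keeps more cases per correlation, but different correlations in the same matrix may then be based on different subsets of cases.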
Correlation • Assumptions: • Linear relationship between variables • Inspect scatterplot • Normality • Shapiro-Wilk's W • Other issues: • Range restriction & heterogeneous subgroups • Identified methodologically • Outliers • Inspect scatterplot
Correlation • Partial Correlation – removes variance attributable to a 3rd variable, analogous to ANCOVA • Analyze → Correlate → Partial…
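For one control variable, the partial correlation has a closed form built from the three pairwise correlations. A sketch with hypothetical data, where a third variable z drives both x and y:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z:
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))"""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))

# Hypothetical data: x and y both increase with z.
z = [1, 2, 3, 4, 5, 6]
x = [2, 4, 5, 8, 10, 11]
y = [1, 3, 4, 4, 6, 7]
print(round(partial_r(x, y, z), 3))
```

Because z accounts for most of the shared variance here, the partial correlation is much smaller in magnitude than the raw x–y correlation.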
Regression • Analyze → Regression → Linear… • Use if both the predictor(s) and the criterion variable are continuous • Dependent = Criterion • Independent = Predictor(s) • Statistics… • Regression Coefficients (b & β) • Estimates • Confidence intervals • Covariance matrix
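The unstandardized coefficient b and the standardized coefficient β that SPSS reports can be computed by hand for the one-predictor case. A minimal sketch with hypothetical data:

```python
import statistics

def ols_simple(x, y):
    """Intercept a, unstandardized slope b, and standardized slope (beta)
    for a one-predictor least-squares regression."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx                    # unstandardized slope
    a = my - b * mx                  # intercept
    # beta = b * (sd_x / sd_y); with a single predictor, beta equals Pearson r.
    beta = b * statistics.stdev(x) / statistics.stdev(y)
    return a, b, beta

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b, beta = ols_simple(x, y)
print(round(a, 2), round(b, 2), round(beta, 4))
```

b is in the raw units of y per unit of x; β is what the slope would be if both variables were standardized, which is why it is comparable across predictors.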
Regression • Statistics… • Model fit • R square change • Descriptives • Part and partial correlations • Collinearity diagnostics • Recall that you don't want your predictors to be too highly related to one another • Collinearity/Multicollinearity – when predictors are too highly correlated with one another • Eigenvalues of the scaled and uncentered cross-products matrix, condition indices, and variance-decomposition proportions are displayed along with variance inflation factors (VIF) and tolerances for individual predictors • Tolerances should be > .2; VIF should be < 4
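Tolerance and VIF come from regressing each predictor on the remaining predictors: tolerance = 1 − R², VIF = 1/tolerance. With exactly two predictors that R² is just the squared correlation between them, which makes the rule of thumb easy to sketch (hypothetical data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """With two predictors, R^2 from regressing one on the other is r^2,
    so tolerance = 1 - r^2 and VIF = 1 / tolerance."""
    r2 = pearson_r(x1, x2) ** 2
    tolerance = 1 - r2
    return 1 / tolerance, tolerance

# Hypothetical predictors: x2 is nearly a linear copy of x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 2.0, 3.2, 3.9, 5.1, 6.0]
vif, tol = vif_two_predictors(x1, x2)
# Flag against the rules of thumb above: tolerance > .2, VIF < 4.
print(vif > 4, tol < 0.2)
```

Here the two predictors are nearly redundant, so both rules of thumb flag multicollinearity.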
Regression • Statistics… • Residuals • Durbin-Watson • Tests correlation among residuals (i.e. autocorrelation) - significant correlation implies nonindependent data • Clicking on this will also display a histogram of residuals, a normal probability plot of residuals, and the case numbers and standardized residuals for the 10 cases with the largest standardized residuals • Casewise diagnostics • Identifies outliers according to pre-specified criteria
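The Durbin-Watson statistic itself is a simple ratio over the residual series. A sketch with two hypothetical residual series, one roughly independent and one strongly trending:

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals.
    Values near 2 suggest no autocorrelation; values near 0 or 4 suggest
    positive or negative autocorrelation, respectively."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residual series.
independent = [0.5, 0.9, -1.2, 0.3, -0.4, -1.0, 0.8, 0.1]
trending    = [1.0, 0.9, 0.8, 0.6, 0.5, 0.3, 0.2, 0.1]  # slowly drifting

print(round(durbin_watson(independent), 2), round(durbin_watson(trending), 2))
```

The trending series yields a DW far below 2, the signature of positively autocorrelated (nonindependent) residuals.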
Regression • Plots… • Plot standardized residuals (*ZRESID) on y-axis and standardized predicted values (*ZPRED) on x-axis • Check “Normal probability plot” under “Standardized Residual Plots”
Regression • Assumptions: • Observations are independent • Linearity of regression • Look for residuals that get larger at extreme predicted values • Normality of residuals • Save unstandardized residuals: click Save… and, under "Residuals", check "Unstandardized" when you run your regression • Run a Shapiro-Wilk's W test on the saved variable (RES_1)
Regression • Normality in Arrays • Examine the normal probability plot of the residuals; the residuals should follow the normal distribution curve • [Plots shown: BAD vs. GOOD normal probability plots]
Regression • Homogeneity of Variance in Arrays • Look for residuals getting more spread out as a function of predicted value – i.e. a cone-shaped pattern in the plot of standardized residuals vs. standardized predicted values • [Plots shown: BAD vs. GOOD residual plots]
Logistic Regression • Analyze → Regression → Binary Logistic… • Use if the criterion is dichotomous [no assumptions about predictor(s)] • Use "Multinomial Logistic…" if the criterion is polychotomous (3+ groups) • Don't worry about that though
Logistic Regression • Assumptions: • Observations are independent • Criterion is dichotomous • No stats needed to show either one of these • Important issues: • Outliers • Save… Under "Influence" check "Cook's" and "Leverage values" • Cook's statistic – outlier = any case > 4/(n-k-1), where n = # of cases & k = # of predictors • Leverage values – outlier = anything > .5
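For the one-predictor linear case, leverage and Cook's distance have closed forms, so the 4/(n-k-1) cutoff can be sketched directly. This is a hand-rolled illustration with hypothetical data, not SPSS's saved variables:

```python
def cooks_and_leverage(x, y):
    """Leverage h_ii and Cook's distance D_i for a one-predictor OLS fit:
    h_ii = 1/n + (x_i - mean_x)^2 / Sxx
    D_i  = (e_i^2 / (p * MSE)) * h_ii / (1 - h_ii)^2,  p = # of parameters."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    p = 2                                  # parameters: intercept + slope
    mse = sum(e ** 2 for e in resid) / (n - p)
    lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    cooks = [(e ** 2 / (p * mse)) * (h / (1 - h) ** 2)
             for e, h in zip(resid, lev)]
    return cooks, lev

# Hypothetical data; the last case is an outlier in y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 5.9, 7.1, 3.0]
cooks, lev = cooks_and_leverage(x, y)
k = 1                                      # number of predictors
cutoff = 4 / (len(x) - k - 1)              # the 4/(n-k-1) rule above
flagged = [i for i, d in enumerate(cooks) if d > cutoff]
print(flagged)
```

Only the deliberately planted outlier exceeds the cutoff; the rest of the cases fall well below it.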
Logistic Regression • Multicollinearity • Tolerance and/or VIF statistics aren't easily obtained with SPSS, so you'll just have to let this one go • Options… • Classification plots • Table of the actual # of Ss in each criterion group vs. predicted group membership – shows, in detail, how well the regression predicted the data
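The classification table is just a cross-tabulation of observed group against predicted group at the cut value. A minimal sketch with hypothetical predicted probabilities:

```python
def classification_table(actual, predicted_prob, cut=0.50):
    """2x2 table of actual group vs. predicted group at a given cut value,
    in the spirit of SPSS's classification table. actual holds 0/1 labels."""
    table = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for a, p in zip(actual, predicted_prob):
        table[(a, 1 if p >= cut else 0)] += 1
    correct = table[(0, 0)] + table[(1, 1)]
    return table, 100.0 * correct / len(actual)

# Hypothetical probabilities from a fitted logistic model.
actual = [0, 0, 0, 0, 1, 1, 1, 1]
probs  = [0.1, 0.3, 0.4, 0.6, 0.4, 0.7, 0.8, 0.9]
table, pct_correct = classification_table(actual, probs)
print(table, pct_correct)
```

The diagonal cells count correct classifications; the overall percentage correct is what SPSS prints beneath the table.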
Logistic Regression • Options… • Hosmer-Lemeshow goodness-of-fit • More robust than traditional χ2 goodness-of-fit statistic, particularly for models with continuous covariates and small sample sizes • Casewise listing of residuals • Helps ID cases with large residuals (outliers)
Logistic Regression • Options… • Correlations of estimates • Just what it sounds like, correlations among the parameter estimates • Iteration history • CI for exp(B) • Provides confidence intervals for the exponentiated coefficients, i.e. the odds ratios • Categorical… • If any predictors are discrete, they must be identified here, as well as which group is the reference group (identified as 0 vs. 1)
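The CI for exp(B) is obtained by building the confidence interval on the raw coefficient and then exponentiating the endpoints. A sketch with a hypothetical coefficient and standard error:

```python
import math

def odds_ratio_ci(b, se, z=1.96):
    """95% CI for exp(B): exponentiate the CI of the raw logistic coefficient.
    b is a fitted coefficient and se its standard error (hypothetical here)."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Hypothetical coefficient B = 0.693 (odds ratio about 2) with SE = 0.25.
or_, lo, hi = odds_ratio_ci(0.693, 0.25)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

Because the interval is symmetric on the log-odds scale, it is asymmetric around the odds ratio itself; an interval excluding 1 corresponds to a coefficient significantly different from 0.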
Logistic Regression Output • Step number: 1 • [SPSS classification plot: a histogram of cases by predicted probability of membership in the Non-Attritor group, from 0 to 1, with each case's observed group shown by its symbol] • The cut value is .50 • Symbols: A – Attritor, N – Non-Attritor • Each symbol represents 2 cases