rstats statistics and research camp 2014 n.
Skip this Video
Loading SlideShow in 5 Seconds..
RStats Statistics and Research Camp 2014 PowerPoint Presentation
Download Presentation
RStats Statistics and Research Camp 2014

RStats Statistics and Research Camp 2014

83 Views Download Presentation
Download Presentation

RStats Statistics and Research Camp 2014

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. RStats Statistics and Research Camp 2014 Welcome!

  2. Helen Reid, PhD Dean of the College of Health and Human Services Missouri State University

  3. Welcome • Goal: keeping up with advances in quantitative methods • Best practices: use familiar tools in new situations • Avoid common mistakes Todd Daniel, PhD RStats Institute

  4. Coffee Advice Ronald A. Fisher

  5. Cambridge, England 1920 Box, 1976 Dr. Muriel Bristol

  6. Familiar Tools • Null Hypothesis Significance Testing (NHST) • a.k.a. Null Hypothesis Decision Making • a.k.a. Statistical Hypothesis Inference Testing p < .05 The probability of finding these results given that the null hypothesis is true

  7. Benefits of NHST • All students trained in NHST • Easier to engage researchers • Results are not complex Everybody is doing it

  8. Statistically Significant Difference What does p < .05 really mean? • There is a 95% chance that the alternative hypothesis is true • This finding will replicate 95% of the time • If the study was repeated, the null would be rejected 95% of the time

  9. The Earth is Round, p < .05 What you want… • The probability that an hypothesis is true given the evidence What you get… • The probability of the evidence assuming that the (null) hypothesis is true. Cohen, 1994

  10. Pigs Might Fly

  11. Null: There are no flying pigs H0: P = 0 • Random sample of 30 pigs • One can fly 1/30 = .033 • What kind of test? • Chi-Square? • Fisher exact test? • Binomial? Why do you even need a test?

  12. Daryl Bem and ESP • Assumed random guessing p = .50 • Found subject success of 53%, p < .05 • Too much power? • Everything counts in large amounts • What if Bem set p = 0 ? • One clairvoyant v. group that guesses 53% In the real world, the null is always false.

  13. Problems with NHST • Cohen (1994) reservations • Non-replicable findings • Poor basis for policy decisions • False sense of confidence NHST is “a potent but sterile intellectual rake who leaves in his merry path a long train of ravished maidens but no viable scientific offspring” - Paul Meehl Cohen, 1994

  14. What to do then? • Learn basic methods that improve your research (learn NHST) • Learn advanced techniques and apply them to your research (RStats Camp) • Make professional connections and access professional resources

  15. Agenda 9:30 Best Practices (and Poor Practices) In Data Analysis 11:00 Moderated Regression 12:00 – 12:45 Lunch (with Faculty Writing Retreat) 1:00 Effect Size and Power Analysis 2:00 Meta-Analysis 3:00 Structural Equation Modeling

  16. RStats Statistics and Research Camp 2014 Best Practices and Poor Practices Session 1 R. Paul Thomlinson PhD Burrell

  17. Poor Practices Common Mistakes

  18. Mistake #1 No Blueprint

  19. Mistake #2 Ignoring Assumptions Pre-Checking Data Before Analysis

  20. Assumptions Matter • Data: I call you and you don’t answer. • Conclusion: you are mad at me. • Assumption: you had your phone with you. If my assumptions are wrong, it prevents me from looking at the world accurately

  21. Assumptions for Parametric Tests • "Assumptions behind models are rarely articulated, let alone defended. The problem is exacerbated because journals tend to favor a mild degree of novelty in statistical procedures. Modeling, the search for significance, the preference for novelty, and the lack of interest in assumptions -- these norms are likely to generate a flood of nonreproducible results." • David Freedman, Chance 2008, v. 21 No 1, p. 60

  22. Assumptions for Parametric Tests • "... all models are limited by the validity of the assumptions on which they ride."Collier, Sekhon, and Stark, Preface (p. xi) to Freedman David A., Statistical Models and Causal Inference: A Dialogue with the Social Sciences. • Parametric tests based on the normal distribution assume: • Interval or Ratio Level Data • Independent Scores • NormalDistribution of the Population • Homogeneityof Variance

  23. Assessing the Assumptions • Assumption of Interval or Ratio Data • Look at your data to make sure you are measuring using scale-level data • This is common and easily verified

  24. Independence • Techniques are least likely to be robust to departures from assumptions of independence. • Sometimes a rough idea of whether or not model assumptions might fit can be obtained by either plotting the data or plotting residuals obtained from a tentative use of the model. Unfortunately, these methods are typically better at telling you when the model assumption does not fit than when it does.

  25. Independence • Assumption of Independent Scores • Done during research construction • Each individual in the sample should be independent of the others • The errors in your model should not be related to each other. • If this assumption is violated: • Confidence intervals and significance tests will be invalid.

  26. Assumption of Normality • You want your distribution to not be skewed • You want your distribution to not have kurtosis • At least, not too much of either

  27. Normally Distributed Something or Other • The normal distribution is relevant to: • Parameters • Confidence intervals around a parameter • Null hypothesis significance testing • This assumption tends to get incorrectly translated as ‘your data need to be normally distributed’.

  28. Assumption of Normality • Both skew and kurtosis can be measured with a simple test run for you in SPSS • Values exceeding +3 or -3 indicate very skewed

  29. Assessing Normality with Numbers

  30. Kolmogorov-Smirnov Test Tests if data differ from a normal distribution Significant = non-Normal data Non-Significant = Normal data Non-Significant is the ideal Tests of Normality SPSSExam.sav

  31. The P-P Plot Normal Not Normal

  32. Histograms & Stem-and Leaf Plots Normal-ish Bi-Modal SPSSExam.sav Double-click on Histogram in Output window to add the normal curve

  33. When does the Assumption of Normality Matter? • Normality matters most in small samples • The central limit theorem allows us to forget about this assumption in larger samples. • In practical terms, as long as your sample is fairly large, outliers are a much more pressing concern than normality

  34. Assessing the Assumptions • Assumption of Homogeneity of Variance • Only necessary when comparing groups • Levene’s Test

  35. Assessing Homogeneity of VarianceGraphs Number of hours of ringing in ears after a concert Homogeneous Heterogeneous

  36. Assessing Homogeneity of VarianceNumbers • Levene’sTests • Tests if variances in different groups are the same. • Significant = Variances not equal • Non-Significant = Variances areequal • Non-Significant is ideal • Variance Ratio (VR) • With 2 or more groups • VR = Largest variance/Smallest variance • If VR < 2, homogeneity can be assumed.

  37. Spotting problems with Linearity or Homoscedasticity

  38. Mistake #3 Ignoring Missing Data

  39. Missing Data Missing Data It is the lion you don’t see that eats you

  40. Amount of Missing Data • APA Task Force on Statistical Inference (1999) recommended that researchers report patterns of missing data and the statistical techniques used to address the problems such data create • Report as a percentage of complete data • “Missing data ranged from a low of 4% for attachment anxiety to a high of 12% for depression.” • If calculating total or scale scores, impute the values for the items first, then calculate scale

  41. Pattern of Missing Data • Missing Completely At Random (MCAR) • No pattern; not related to variables • Accidentally skipped one; got distracted • Missing At Random (MAR) • Pattern does not differ between groups • Not Missing At Random (NMAR) • Parents who feel competent are more likely to skip the question about interest in parenting classes

  42. Pattern of Missing Data Distinguish between MCAR and MAR • Create a dummy variable with two values: missing and non-missing • SPSS: recode new variable • Test the relation between dummy variable and the variables of interest • If not related: data are either MCAR or NMAR • If related: data are MAR or NMAR • Little’s (1988) MCAR Test • Missing Values Analysis add-on module in SPSS 20 • If the p value for this test is not significant, indicates data are MCAR

  43. What if my Data are NMAR? • You’re not screwed • Report the pattern and amount of missing data

  44. Deletion Listwise Deletion • Cases with any missing values are deleted from analysis • Default procedure for SPSS • Problems • If the cases are not MCAR remaining cases are a biased subsample of the total sample • Analysis will be biased • Loss of statistical power • Dataset of 302 respondents dropped to 154 cases

  45. Deletion Pairwise Deletion • Cases are excluded only if data are missing on a required variable • Correlating five variables: case that was missing data on one variable would still be used on the other four • Problems • Uses different cases for each correlation (n fluctuates) • Difficult to compare correlations • May mess with multivariate analyses

  46. Imputation Mean Substitution • Missing values are imputed with the mean value of that variable • Problems • Produces biased means with data that are MAR or NMAR • Underestimates variance and correlations • Experts strongly advise against this method

  47. Imputation Regression Substitution • Existing scores are used to predict missing values • Problems • Produces unbiased means under MCAR or MAR • Produces biases in the variances • Experts advise against this method

  48. Imputation Pattern-Matching Imputation • Hot-Deck Imputation • Values are imputed by finding participants who match the case with missing data on other variables • Cold-Deck Imputation • Information from external sources is used to determine the matching variables • Does not require specialized programs • Has been used with survey data • Reduces the amount of variation in the data

  49. Stochastic Imputation Stochastic Imputation Methods • Stochastic = random • Does not systematically change the mean; gives unbiased variance estimates • Maximum Likelihood (ML) Strategies • Observed data are used to estimate parameters, which are then used to estimate the missing scores • Provides “unbiased and efficient” parameters • Useful for exploratory factor analysis and internal consistency calculations

  50. Stochastic Imputation Multiple Imputation (MI) • Create several imputed data sets (3 – 5) • Analyze each data set and save the parameter estimates • Average the parameter estimates to get an unbiased parameter estimate • Most complex procedure • Computer-intensive