
Advanced Statistics for Linguistics Students



Presentation Transcript

  1. Advanced Statistics for Linguistics Students

  2. Syllabus • Data Screening • Data cleaning • Data transformation • Boxplots • Other charts and graphs • Analysis of Variance • Analysis of covariance • Two-way ANOVA • MANOVA

  3. Syllabus • Regression • Simple linear regression • Multiple linear regression • Logistic regression • Data reduction • Exploratory Factor analysis • Structural Equation Modeling • Confirmatory factor analysis • Structural Equation Modeling

  4. Syllabus • Reliability and validity • Reliability • Validity • Qualitative analysis • Item response theory • Classical test theory • Item response theory • Rasch analysis

  5. Syllabus • Assignments and Grading: • Readings • Homework • References: • Relevant papers • Website: http://clal.gdufs.edu.cn/personal/statistics/phd/

  6. Prerequisites • Variables • t-test • One-way ANOVA • Correlation • Excel, SPSS, Statistica

  7. Session 1 Data Screening

  8. Some statistical considerations and precautions • Do the data accurately reflect the responses made by the participants of my study? • Are all the data in place and accounted for, or are some of the data absent or missing? • Is there a pattern to the missing data? • Are there any unusual or extreme responses present in the data set that may distort my understanding of the phenomena under study? • Do these data meet the statistical assumptions that underlie the statistical technique I will be using? • What can I do if some of the statistical assumptions turn out to be violated?

  9. Code and value cleaning • The data cleaning process ensures that once a given data set is in hand, a verification procedure is followed that checks for the appropriateness of numerical codes for the values of each variable under study. • Whether each variable contains only legitimate numerical codes or values • Whether these legitimate codes seem reasonable

  10. Distribution diagnosis • Data screening • Frequency tables • Histograms and bar graphs • Skewness and kurtosis • Close to 0 • Conservative: ±0.5 • Liberal: ±1 • Heuristic (SPSS): 2 x standard error
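The skewness and kurtosis cutoffs on this slide can be checked by hand. The following is a minimal sketch, not the SPSS implementation itself: it assumes the bias-adjusted sample skewness and excess kurtosis together with their usual standard-error formulas, and the function names `skew_kurtosis` and `screen` are hypothetical.

```python
import math

def skew_kurtosis(values):
    """Return (skewness, excess kurtosis, SE of skewness, SE of kurtosis).

    Uses the bias-adjusted sample estimators (assumed here to match what
    statistics packages typically report); needs at least 4 observations.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("variable is constant; skewness is undefined")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5          # moment-based skewness
    g2 = m4 / m2 ** 2 - 3.0      # moment-based excess kurtosis
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2.0 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew, kurt, se_skew, se_kurt

def screen(values, cutoff=1.0):
    """Apply a fixed cutoff (0.5 conservative, 1.0 liberal) and the 2 x SE rule."""
    skew, kurt, se_skew, se_kurt = skew_kurtosis(values)
    return {
        "skewness": skew,
        "kurtosis": kurt,
        "within_cutoff": abs(skew) <= cutoff and abs(kurt) <= cutoff,
        "within_2se": abs(skew) <= 2 * se_skew and abs(kurt) <= 2 * se_kurt,
    }
```

Calling `screen(scores, cutoff=0.5)` applies the conservative criterion. Because the standard errors shrink as n grows, the 2 × SE rule becomes stricter in large samples, while the fixed cutoffs do not depend on sample size.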

  11. Distribution diagnosis • Data screening • Stem-and-leaf plots • Weight cases • Analyze → Explore • Box plots

  12. Distribution diagnosis • Data screening • Scatterplot matrices

  13. Dealing with missing values • Missing value patterns • Random patterns of missing data • Looking for patterns • Variables containing missing data on 5% or fewer of the cases can be ignored.

  14. Dealing with missing values • Missing value patterns • Methods of handling missing data • Listwise deletion • Pairwise deletion • Imputation procedures • Mean substitution • Multiple regression imputation • Expectation maximization imputation (Missing value analysis in SPSS) • E step: calculates expected values of parameters • M step: calculates maximum likelihood estimates • Example
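The two simplest options in this list can be sketched in a few lines: listwise deletion, and mean substitution (replacing each missing value with the observed mean of that variable). The data, the variable names, and the helper functions below are all hypothetical, and `None` stands in for a missing value. Regression and expectation maximization imputation are not shown.

```python
def missing_rate(rows, variable):
    """Proportion of cases with a missing value (None) on one variable."""
    return sum(row[variable] is None for row in rows) / len(rows)

def listwise_delete(rows):
    """Keep only the cases that are complete on every variable."""
    return [row for row in rows if all(v is not None for v in row.values())]

def mean_substitute(rows, variable):
    """Return copies of the cases with missing values replaced by the observed mean."""
    observed = [row[variable] for row in rows if row[variable] is not None]
    if not observed:
        raise ValueError("no observed values to average")
    mean = sum(observed) / len(observed)
    filled = []
    for row in rows:
        copy = dict(row)
        if copy[variable] is None:
            copy[variable] = mean
        filled.append(copy)
    return filled

# Invented example data, for illustration only.
cases = [
    {"score": 70, "satisfaction": 4},
    {"score": None, "satisfaction": 3},
    {"score": 90, "satisfaction": None},
    {"score": 80, "satisfaction": 5},
]
complete_cases = listwise_delete(cases)
imputed_cases = mean_substitute(cases, "score")
```

Listwise deletion costs sample size, while mean substitution keeps every case but shrinks the variable's variance, since every filled-in value sits exactly at the mean.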

  15. Dealing with missing values • Missing value patterns • Methods of handling missing data • Recommendations: • Compare cases with and without missing values on the variables of interest using an independent-samples t test. • Compare the results of your analysis on the imputed data with the results from complete cases only. If no differences emerge between the ‘complete’ and ‘imputed’ data sets, you can have confidence that your missing value interventions reflect statistical reality. • Use listwise case deletion • Use regression imputation procedures • Use SPSS ‘Missing Values Analysis’

  16. Regression Hypothetical data showing the relationship between SAT scores and GPA with a regression line drawn through the data points. The regression line defines a precise, one-to-one relationship between each X value (SAT score) and its corresponding Y value (grade-point average, GPA).
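A regression line of this kind can be obtained by ordinary least squares. The sketch below assumes a single predictor; the SAT and GPA values are invented for illustration and are not the data plotted on the slide, and `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted Y value on the fitted line for a given X value."""
    return intercept + slope * x

# Invented SAT scores and GPAs, for illustration only.
sat = [1000, 1100, 1200, 1300, 1400]
gpa = [2.6, 2.9, 3.1, 3.4, 3.8]
slope, intercept = fit_line(sat, gpa)
predicted_gpa = predict(1250, slope, intercept)
```

The line assigns exactly one predicted GPA to each SAT score, even though the observed points scatter around it.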

  17. Outliers • Causes of outliers • Data entry errors or improper attribute coding • A function of extraordinary events or unusual circumstances. Use this question to judge: “Does this outlier represent my sample?” • No explanation. Good candidate for deletion. • Pattern of combination of values on several variables, e.g., unusual combined patterns of age, gender, and number of arrests.

  18. Outliers • Detection of univariate outliers • Explore → Descriptives • If outliers are few (less than 1% or 2% of n) and not very extreme, they are probably best left alone. • Detection of multivariate outliers • Scatterplots • Mahalanobis distance
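Mahalanobis distance measures how far a case lies from the centroid of the data once the correlation between variables is taken into account. The sketch below is restricted to two variables so that the covariance matrix can be inverted by hand; the function names and the example points are hypothetical, and the chi-square cutoff at alpha = .001 is a commonly used convention that the slide does not state.

```python
import math

def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x, y) case from the centroid."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample variances and covariance (n - 1 in the denominator).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2 x 2 covariance matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

def chi_square_cutoff_2df(alpha=0.001):
    """Chi-square critical value for 2 degrees of freedom (closed form for df = 2 only)."""
    return -2.0 * math.log(alpha)

def flag_outliers(points, alpha=0.001):
    """Indices of cases whose squared distance exceeds the chi-square cutoff."""
    cutoff = chi_square_cutoff_2df(alpha)
    return [i for i, d in enumerate(mahalanobis_sq(points)) if d > cutoff]

# Invented data: the last case breaks the rising pattern of the others.
points = [(1, 2), (2, 3), (3, 4.5), (4, 5), (5, 6.2), (6, 1)]
distances = mahalanobis_sq(points)
```

In this invented sample the last case has the largest distance, although neither of its values is extreme on its own. With so few cases no distance can reach the cutoff, so the flagging step is only informative in larger samples.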

  19. Multivariate Statistical Assumptions • Normality • Statistical approach • Explore • Graphical approach • Linearity • Variables in the analysis are related to each other in a linear manner (MANOVA, factor analysis) • Scatterplots • Regression analysis (residuals) • Homoscedasticity • Equal variance – equal levels of variability • ANOVA – homogeneity of variance – Levene’s test • MANOVA – Box’s M
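To make the homogeneity-of-variance check concrete, the sketch below computes the mean-centered form of Levene's statistic W for two or more groups. It is an illustration under stated assumptions rather than the SPSS routine: the function name `levene_w` and the example groups are hypothetical, and no p-value is computed. W is compared against an F distribution with k - 1 and N - k degrees of freedom.

```python
def levene_w(groups):
    """Levene's W based on absolute deviations from each group's mean."""
    k = len(groups)
    if k < 2:
        raise ValueError("need at least two groups")
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every score from its own group mean.
    z = []
    for g in groups:
        group_mean = sum(g) / len(g)
        z.append([abs(x - group_mean) for x in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    if within == 0:
        raise ValueError("no variability in the absolute deviations")
    return (n_total - k) / (k - 1) * between / within

# Invented groups: same spread in the first pair, different spread in the second.
w_equal = levene_w([[1, 2, 3, 4, 5], [11, 12, 13, 14, 15]])
w_unequal = levene_w([[1, 2, 3, 4, 5], [1, 5, 9, 13, 17]])
```

The statistic is in effect a one-way ANOVA on the absolute deviations, so a large W indicates that the groups differ in variability rather than in level.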

  20. Data transformation • Use Excel and SPSS • Square root • Logarithm • Inverse • Square of X • ‘double-edged sword’ • Can significantly improve the precision of a multivariate analysis • Can pose a formidable data interpretation problem
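The four transformations listed on this slide are easy to apply outside Excel and SPSS as well. The following minimal sketch uses a hypothetical `transform` helper and assumes base-10 logarithms; the slide does not say which base is intended.

```python
import math

def transform(values, kind):
    """Apply one of the slide's transformations to a list of numbers.

    Domain restrictions: "sqrt" needs values >= 0, "log" needs values > 0,
    and "inverse" needs nonzero values.
    """
    if kind == "sqrt":
        return [math.sqrt(x) for x in values]
    if kind == "log":
        return [math.log10(x) for x in values]
    if kind == "inverse":
        return [1.0 / x for x in values]
    if kind == "square":
        return [x ** 2 for x in values]
    raise ValueError("unknown transformation: " + kind)

# Invented, positively skewed values for illustration only.
raw = [1, 10, 100]
logged = transform(raw, "log")
```

Square root, logarithm, and inverse pull in a long right tail with increasing strength, while squaring works in the opposite direction. The interpretation cost noted on the slide comes from the fact that results then refer to the transformed scale, not the original one.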

  21. Data transformation

  22. Homework • Scores, Student Satisfaction, and Type of School • This study was conducted to assess whether scores and student satisfaction differ between public and private schools. • Use the SPSS data file to answer the following questions: • Identify the independent variable. Identify the dependent variable(s). • Are there any missing values for any of the variables? If there are, what do you recommend doing to address this issue? • Are there any outliers in this data set? If outliers are present, what is your recommendation? • Check the independent and dependent variables for violations of statistical assumptions. If there are violations, what do you recommend? • Write a sample results section discussing your data screening activity.
