1 / 38

Laboratory & professional skills for Bioscientists term 3 : Data Analysis in R

Laboratory & professional skills for Bioscientists term 3 : Data Analysis in R. Revision: Overview and Developing understanding of term 2 statistics. Module Learning Outcomes. The successful student will be able to: Explain the purpose of data analysis

perry
Télécharger la présentation

Laboratory & professional skills for Bioscientists term 3 : Data Analysis in R

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Laboratory & professional skills for Bioscientists term 3:Data Analysis in R Revision: Overview and Developing understanding of term 2 statistics

  2. Module Learning Outcomes The successful student will be able to: • Explain the purpose of data analysis • Name, identify and choose classical univariate statistical tests (and some non-parametric equivalents) appropriate to a given scenario and recognise when these are not suitable • Use R to perform these analyses on data in a variety of formats • Interpret, report and graphically present the results of covered tests Meeting the learning outcomes will enable you to: • Write-up your laboratory report • Design and analyse experiments including those for projects in stages 2, 3, and 4 and year-away • Evaluate and interpret the data analysis in papers • Perform well in assessments • Improve your employability!

  3. Plan • What is the biological question • Organising your data • Overview for Choosing tests • Figures – use ggplot • Workshop

  4. Overview: Experiments Something we measure Some things we control, choose or set Dependent variables Response variables The ‘y’ s Independent variables Predictor variables treatments The ‘x’ s inferences made

  5. What is the biological question? • Very many ‘statistics’ problems are biology problems • Think about your questions before you design experiments • What are your null hypotheses? • What is your response variable? • What are your explanatory variables? • Generate dummy data

  6. What is the biological question? Response: blood pressure Explanatory variables you want to understand: drug Explanatory variable for control: sex, age, disease??

  7. Overview: data analysis Some things we control, choose or set Something we measure can be explained by Independent variables Dependent variable can be explained by Response variable can be explained by Predictor variables y can be explained by One or more x y ~ x y ~ x1 + x2 y ~ x1 * x2

  8. What is the biological question? • Tests apply to a particular question • Statistics are just evidence: as references are to your introduction, statistics are to your results. • At this level you should be able to say what your results are before analysis

  9. Collecting data • Design and execute your experiment • Experimental design and data analysis go hand-in-hand • Type of experiment determines statistics • But understanding of statistics informs design • Organise your data in ’tidy’ format • Workshop

  10. Collecting data • Organise your data in ’tidy’ format • Fine to use a spread sheet but save in a plain text format (txt, csv, dat, etc) • Write data straight into a blank tidy format

  11. Response all in one column Additional column for each explanatory i.e., case (individual) per row

  12. Explore your data • Frequency histograms • Scatter plots • Bar charts • Descriptive statistics – often absent from reports. • Sample sizes/number of cases • Variables • Means/medians, s.e • You don’t put all your exploring in your report

  13. Choose a test • Always better to do BEFORE you design experiments and collect data • Consider assumptions • Take appropriate action – transform or chose another test

  14. Execute the test • Do results make sense based on your data exploration? • Chose APPROPRIATE figures and/or tables • Report results scientifically • Appropriate descriptive statistics • Results of test – significance, magnitude, direction • Explaining results – leave for the Discussion

  15. Figures • Should match the test • t-tests, ANOVA tests on means thus figures show means • Wilcoxon, Mann-Whitney tests on ranks thus figures use medians/mean ranks • Correlation should NOT have line of best fit • Regression should have line of best fit • Likewise descriptive statistics

  16. Why chi-squared? • We often count the number of individuals or events in categories and compare the numbers we observe with numbers we expect under a null hypothesis. • H0 might expect numbers to • be the same, or • follow a particular pattern, or • match the pattern in another group • Chi-squared allows us to make the comparison statistically

  17. Two types of scenario thus two types of 2 test • Those with a priori expectations – we know what the proportions should be Goodness of fit • Those without a priori expectations – we know proportions should be same between groups Contingency

  18. 2 Goodness of fit test • The expected values (null hypothesis) are derived from some theory • We test the fit of our data to the theory

  19. 2 Contingency test • We use these when our expected values are derived from the data • Null hypothesis: proportion of foods taken by each breed is the same, i.e., no association between breed and food type • Expecteds are Row total x column total / grand total • d.f. are (r - 1)(c - 1)

  20. Why t-tests? • Continuous response, category explanatory. • Is there a difference between categories in response • Standard formula for all t-tests Size of difference and variability matter

  21. Assumptions of all t-tests • All t-tests assume the ‘data’ are normally distributed • 0ne-sample – the data themselves • Paired sample – the differences between the pairs • Two-sample – the data themselves • (Actually – the residuals in all cases) • Additionally the two-sample t-test assumes equal variances

  22. Figures t-tests • Based on the mean and s.e. thus figures should have Two sample One sample - probably not needed but possibly...

  23. Non-parametric alternatives • Non-parametric tests make fewer assumptions • Based on the ranks rather than the actual data • Null hypotheses are about the mean rank(not the mean) • More conservative (less likely to be significant) • P values maybe estimates (may generate a warning). You can’t usually ‘fix’ that

  24. Non-parametric alternatives • One sample or paired sample : Wilcoxon signed ranked test • Two sample: Wilcoxon rank sum test aka Mann-Whitney

  25. Why ANOVA? • Like a t-test • Continuous response, category explanatory • Is there a difference between categories in response • But more groups in the categorical variable (one way) • But an additional categorical variable (two-way) • Can test for interaction between the two explanatory variables

  26. Assumptions of one-way ANOVA • Like a t-test • Residuals must be normally distributed • Variance in the treatment groups is assumed to be equal • Both of these can be tested in R

  27. Figures ANOVA • Like a t-test. Based on the mean and s.e. thus figures should have

  28. Non-parametric equivalent: Kruskal-Wallis • Residuals not normal • Repeated values • Unequal variance • Small sample size • Unequal sample size, esp when variances look unequal

  29. Two-way ANOVA: Can be with or without replication Without reps With reps You have usually Cannot test for interaction

  30. Assumptions of 2-way ANOVA • Same as for t-tests and one-way ANOVA • Normality and ‘homoscedascity’ of residuals Normality examine residuals after ANOVA Equalvariance examine residuals after ANOVA Use common sense

  31. A figure Understand the interaction Sex – Sig Spp – Sig Int – Sig Effect of sex is greater in F.concocti ‘effect of one factor depends on the level of another’

  32. Possible results Sex – NS Spp – NS Int – Sig But sex does have an effect! It is just reversed If you have a significant interaction, interpret main effects with care.

  33. Type of question Is there an association/relationship between …..? • Correlation and regression • One continuous explanatory variable

  34. Choosing tests Explanatory variables to explain the response variable. The type of test depends on the type of question and the type of data. t-tests ANOVA Correlation Regression Continuous Categorical

  35. Always do a scatterplot first Correlation • Linear association • No cause and effect • Axes could be swapped

  36. Always do a scatterplot first Regression • Linear relationship • Cause and effect • Axes cannot be swapped Dependent variable / effect/ response Independent variable / cause / explanatory / predictor

  37. Assumptions • Correlation • Bivariate normal • Both variables are randomly sampled • Any association is linear • Regression • Normality and homoscedascity of residuals • y values are independent • x is measured without error • Linear regression assumes linear relationship

  38. Introduction to the practical Scenarios for biological phenomena identify an appropriate design and statistical test for the general research question generate the data using the random number functions that would give the effects specified. create figures to accompany the results

More Related