Statistical analysis methods

Hugh Morgan. Statistical analysis methods. Introduction. Role of statistics Current Methods EuroPhenome Numerical Parameters Categorical Parameters MGP Problems with these methods and alternatives Worked Example. Tasks. Role of statistics.

Presentation Transcript


  1. Hugh Morgan Statistical analysis methods

  2. Introduction • Role of statistics • Current Methods • EuroPhenome • Numerical Parameters • Categorical Parameters • MGP • Problems with these methods and alternatives • Worked Example. • Tasks.

  3. Role of statistics • To determine the effect of the genomic alteration on the phenotype of the animal • Distinguish effect from substantial multi-factorial noise • Provide an estimate of the confidence in the veracity of the effect

  4. Current Methods • EuroPhenome • Numerical Parameters - Wilcoxon rank-sum test • Categorical Parameters – Fisher's exact or chi-squared test • p-value threshold: 0.0001 (equivalent to a 4% chance of a false positive in 400 measured parameters) • Sanger Mouse Portal / MGP • Numerical Parameters – Reference Range • Categorical Parameters – Fisher's exact test with absolute change threshold
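
The 4% figure for the EuroPhenome threshold can be checked with a few lines of R. This is only a sketch: the variable names are illustrative, and the second calculation assumes the 400 tests are independent, which the slides do not state.

```r
# Values taken from the slide: 400 measured parameters, threshold 0.0001.
nParams <- 400
alpha <- 0.0001

# Expected number of false positives per line if no parameter is truly affected.
expectedFalsePositives <- nParams * alpha

# Probability of at least one false positive, ASSUMING independent tests.
pAnyFalsePositive <- 1 - (1 - alpha)^nParams

print(expectedFalsePositives)
print(pAnyFalsePositive)
```

The two numbers are close because the threshold is small; the product of 400 and 0.0001 is an upper bound on the probability of any false positive.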

  5. Do them yourself
     • All commands are at:
       • http://mrcmousenetwork.har.mrc.ac.uk/r-commands-mrc-mouse-network-training
     • Get data:
       • Akt2, Fat mass, View Data, Get as CSV, Save Page
     • Install R (if required, google R)
       • akt2Fat=read.csv("akt2Fat.csv")
       • summary(akt2Fat)
     • Wilcoxon rank-sum test
       • wilcox.test(Value~Genotype, data = akt2Fat)
       • W = 1, p-value = 6.252e-06
     • t-test
       • t.test(Value~Genotype, data = akt2Fat)
       • t = -9.5627, df = 23.909, p-value = 1.212e-09

  7. Do them yourself
     • Get data:
       • Abcd4, Touch escape
     • R
       • abcd4Touch=matrix(c(122,9,2,8),2)
     • Fisher's Exact Test
       • fisher.test(abcd4Touch)

  8. abcd4Touch=matrix(c(122,9,2,8),2)

  9. Do them yourself
     • Get data:
       • Abcd4, Touch escape
     • R
       • abcd4Touch=matrix(c(122,9,2,8),2)
     • Fisher's Exact Test
       • fisher.test(abcd4Touch)

         Fisher's Exact Test for Count Data
         data:  abcd4Touch
         p-value = 3.052e-07
         alternative hypothesis: true odds ratio is not equal to 1
         95 percent confidence interval:
            8.491575 550.552750
         sample estimates:
         odds ratio
           50.40908
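
The slides list chi-squared as the alternative to Fisher's exact test for categorical parameters. The sketch below, using the same counts, shows one reason the exact test may be preferable for a table like this one: the chi-squared approximation is usually considered unreliable when an expected cell count falls below about 5. The variable names other than abcd4Touch are illustrative.

```r
# Same 2x2 table of counts as on the slide.
abcd4Touch <- matrix(c(122, 9, 2, 8), 2)

# Exact test, as on the slide.
fisherP <- fisher.test(abcd4Touch)$p.value

# Chi-squared test; R warns that the approximation may be incorrect,
# so the warning is suppressed here and the expected counts inspected instead.
chisqFit <- suppressWarnings(chisq.test(abcd4Touch))
smallestExpected <- min(chisqFit$expected)

print(fisherP)
print(chisqFit$expected)
print(smallestExpected < 5)  # TRUE means the approximation is doubtful
```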

  10. Sanger Mouse Portal / MGP • Numerical Parameters – Reference Range • Calculate the range of values that encompasses 95% of the baseline dataset • Call a line phenodeviant in a parameter if 60% or more of the animals fall outside of that range • Categorical Parameters – Fisher's exact test with absolute change threshold • Fisher's exact test gives p-value < 5% AND • Absolute change of proportion > 60%
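
The two MGP rules above can be sketched in R as follows. This is not the Sanger pipeline itself: the numerical data are simulated, the reference range is taken as the central 95% (2.5th to 97.5th percentile), which is an assumption about how the range is defined, and the row and column meanings assigned to the count table are hypothetical.

```r
set.seed(1)

# --- Numerical parameter: reference range rule ---
baseline <- rnorm(500, mean = 10, sd = 1)  # simulated baseline animals (hypothetical)
mutant <- rnorm(8, mean = 13, sd = 1)      # simulated mutant line (hypothetical)

# Central 95% of the baseline values (assumed definition of the reference range).
refRange <- quantile(baseline, c(0.025, 0.975))
baselineInside <- baseline >= refRange[1] & baseline <= refRange[2]

# Fraction of mutant animals outside the range; call at 60% or more.
fractionOutside <- mean(mutant < refRange[1] | mutant > refRange[2])
phenodeviantNumeric <- fractionOutside >= 0.6

# --- Categorical parameter: Fisher's exact test plus absolute change ---
# Counts from the Abcd4 example; ASSUMED layout: columns = baseline, mutant;
# rows = normal, abnormal.
counts <- matrix(c(122, 9, 2, 8), 2)
propAbnormal <- counts[2, ] / colSums(counts)
absChange <- abs(propAbnormal[2] - propAbnormal[1])
phenodeviantCategorical <- fisher.test(counts)$p.value < 0.05 && absChange > 0.6

print(refRange)
print(fractionOutside)
print(absChange)
print(phenodeviantCategorical)
```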

  13. Problems with these methods and alternatives • Local structure / Lack of independence • Numerical Parameters - Wilcoxon rank-sum test • Categorical Parameters – Fisher's exact or chi-squared test • MGP • Numerical Parameters – Reference Range • Categorical Parameters – Fisher's exact test with absolute change threshold

  14. Problems with these methods and alternatives • Local structure / Lack of independence • Inter-day variance is greater than intra-day variance • Two measurements taken on the same day are likely to be more similar than two measurements taken on different days • Cause • ? • Solution • Model the structure • Linear Mixed Model
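
A small simulation can make the lack-of-independence problem concrete. The sketch below invents data in which each measurement day carries its own shift; all numbers (20 days, 6 animals per day, the two standard deviations) are hypothetical and chosen only for illustration.

```r
set.seed(42)

nDays <- 20
perDay <- 6
day <- factor(rep(seq_len(nDays), each = perDay))

# Each day gets its own random shift (inter-day effect, hypothetical sd = 3),
# and each animal adds its own noise (intra-day effect, hypothetical sd = 1).
dayShift <- rnorm(nDays, sd = 3)
value <- 50 + dayShift[as.integer(day)] + rnorm(nDays * perDay, sd = 1)

# Average variance among animals measured on the same day.
withinDayVar <- mean(tapply(value, day, var))

# Variance of the daily means.
betweenDayVar <- var(tapply(value, day, mean))

print(withinDayVar)
print(betweenDayVar)
```

When the between-day variance dominates, animals measured on the same day are not independent observations, which is the assumption the Wilcoxon and Fisher tests rely on.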

  15. Mixed Model • Model the data as the sum of two normally distributed random effects plus a number of fixed effects • Normally distributed • Inter-animal difference • Inter-day difference • Fixed • Gender • Other parameters such as Weight • Genomic alteration (Genotype) • Gender / Genotype interaction effect • Calculate the p-value under the null hypothesis that the Genotype effect is zero
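
One way to write the model described on this slide is sketched below. The notation is an assumption and does not appear in the slides: y_{ij} is the measurement for animal i on day j, G_i and S_i are indicators for genotype and gender, W_i is an optional covariate such as weight, d_j is the inter-day effect and \varepsilon_{ij} the inter-animal effect.

```latex
y_{ij} = \beta_0 + \beta_G G_i + \beta_S S_i + \beta_{GS} G_i S_i + \beta_W W_i + d_j + \varepsilon_{ij},
\qquad d_j \sim \mathcal{N}(0, \sigma_d^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_a^2)
```

In this notation the p-value for the genotype effect is computed under the null hypothesis \beta_G = 0.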

  16. Do them yourself
     • Get data:
       • Ptk7, Grip-Strength, Forelimb grip strength measurement mean, View Data, Get as CSV, Save File
     • R
       • ptk7GS=read.csv("ptk7GS.csv")
       • summary(ptk7GS)

         Centre    Strain        Genotype      Zygosity   Gender    Parameter
         WTSI:29   129/SvEv:29   Akt2    :14      :15     Male:29   Fat mass:29
                                 baseline:15   Hom:14

  17. Do them yourself
     • Linear Model (no batch effect modeled)
       • ptk7GSLM=lm(Value~Genotype + Gender + Genotype*Gender, ptk7GS, na.action="na.omit")
       • summary(ptk7GSLM)

         Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
         (Intercept)               68.777      2.475  27.794  < 2e-16 ***
         GenotypePtk7             -14.134      5.891  -2.399  0.01777 *
         GenderMale                11.454      4.011   2.855  0.00497 **
         GenotypePtk7:GenderMale    1.987      8.966   0.222  0.82496
         ---
         Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

         Residual standard error: 20.7 on 136 degrees of freedom
         Multiple R-squared: 0.1222, Adjusted R-squared: 0.1028
         F-statistic: 6.311 on 3 and 136 DF, p-value: 0.0004862

  18. Do them yourself • Look at Fit • ptk7GSLMRes<-residuals(ptk7GSLM) • qqnorm(scale(ptk7GSLMRes))

  19. Do them yourself
     • Mixed Model
     • Excel
       • load ptk7GS.csv
       • =LEFT(H2,(SEARCH("_",H2)-2)) in a new column named Litter
       • Save ptk7GSLitter.csv
     • R
       • library(nlme)
       • ptk7GSLitter=read.csv("ptk7GSLitter.csv")
       • ptk7GSMM=lme(Value~Genotype + Gender + Genotype*Gender, random=~1|Litter, ptk7GSLitter, na.action="na.omit")
       • summary(ptk7GSMM)

  20. Do them yourself
     • Mixed Model
     • R
       • library(nlme)
       • ptk7GSLitter=read.csv("ptk7GSLitter.csv")
       • ptk7GSMM=lme(Value~Genotype + Gender + Genotype*Gender, random=~1|Litter, ptk7GSLitter, na.action="na.omit")
       • summary(ptk7GSMM)

         Linear mixed-effects model fit by REML
         Fixed effects: Value ~ Genotype + Gender + Genotype * Gender
                                     Value Std.Error DF   t-value p-value
         (Intercept)              67.02067  3.377184 85 19.845137  0.0000
         GenotypePtk7            -12.05973  7.461470 85 -1.616267  0.1097
         GenderMale               12.59607  4.403984 85  2.860154  0.0053
         GenotypePtk7:GenderMale   1.42342  8.819061 85  0.161403  0.8722
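
In the Ptk7 output the genotype p-value moves from 0.018 in the plain linear model to 0.11 once litter is modelled as a random effect. The sketch below reproduces that kind of shift on simulated data, not on the Ptk7 data: the litter counts, effect sizes and column values are all hypothetical, and there is no true genotype effect in the simulation. It assumes the nlme package, which ships with standard R installations.

```r
library(nlme)

set.seed(7)
nLitters <- 30
perLitter <- 8
litter <- factor(rep(seq_len(nLitters), each = perLitter))

# Genotype is constant within a litter (20 baseline litters, 10 mutant litters),
# so a litter-level shift can be mistaken for a genotype effect.
genotype <- factor(rep(rep(c("baseline", "mutant"), c(20, 10)), each = perLitter))
gender <- factor(rep(c("Female", "Male"), length.out = nLitters * perLitter))

# Hypothetical sizes: litter sd = 10, animal sd = 3, male shift = 10, no genotype effect.
litterShift <- rnorm(nLitters, sd = 10)
value <- 70 + 10 * (gender == "Male") + litterShift[as.integer(litter)] +
  rnorm(nLitters * perLitter, sd = 3)
sim <- data.frame(Value = value, Genotype = genotype, Gender = gender, Litter = litter)

# Linear model ignoring litter, and mixed model with a random intercept per litter.
simLM <- lm(Value ~ Genotype * Gender, data = sim)
simMM <- lme(Value ~ Genotype * Gender, random = ~1 | Litter, data = sim)

# Standard error of the genotype coefficient under each model.
seLM <- summary(simLM)$coefficients["Genotypemutant", "Std. Error"]
seMM <- summary(simMM)$tTable["Genotypemutant", "Std.Error"]

print(c(linear = seLM, mixed = seMM))
```

Ignoring the litter structure treats every animal as an independent observation, so the linear model tends to report a smaller standard error for genotype, and therefore a smaller p-value, than the mixed model.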

  21. Do them yourself • Mixed Model • ptk7GSMMRes<-residuals(ptk7GSMM) • qqnorm(scale(ptk7GSMMRes))
