Gladstone Bioinformatics Core

Statistics in Science: Best Practices
Gladstone Bioinformatics Core
Kirsten E. Eilertson

Our Goal
• “Thoughtful,” “insightful,” “rigorous” statistical analyses
• Meaningful and solid inference which can be the basis of future work

Our Challenges
• Every application is different
• Precedents can be a problem
• P-value-centric publication system
• Reproducibility

Reporting and Reproducibility
• Reproducibility crisis!
• Ioannidis (2005), PLoS Medicine

Our Goal
• “Thoughtful,” “insightful,” “rigorous” statistical analyses
• Meaningful and solid inference which can be the basis of future work

Discussion Today
• Reporting results
• Power and experimental design
• Outliers

Our Challenges
• Every application is different
• Precedents can be a problem
• P-value-centric publication system
• Reproducibility

Guidelines for reporting
• Resources:
• Annals of Internal Medicine: http://www.people.vcu.edu/~albest/Guidance/guidelines_for_statistical_reporting.htm
• American Physiological Society: http://physiolgenomics.physiology.org/content/18/3/249.full?
• ‘Describe the statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results’ (Bailar & Mosteller, 1988, p. 266)

Guidelines for reporting
• ‘The design of an experiment, the analysis of its data, and the communication of the results are intertwined. In fact, design drives analysis and communication.’
• Always report which test statistic was used, its degrees of freedom, its observed value, and the P-value (the probability, under the null hypothesis, of a result at least as extreme as the one observed).
• Report how assumptions were checked (e.g. histograms of residuals, tests of normality).
• Provide a clear description of the design of your study or experiment; state the null and alternative hypotheses.
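As an illustration of the "report the test statistic, degrees of freedom, and P-value" guideline, here is a minimal standard-library sketch of Welch's two-sample t-test. The choice of Welch's test, the function names, and the measurements are assumptions made for the example; the slides do not prescribe a particular test. The P-value is obtained by numerically integrating the t density, which is adequate for illustration; an established statistics package would normally be used instead.

```python
import math
import statistics


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided P-value for Student's t via Simpson integration of the density."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # approximates P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(a, b):
    """Return (t, df, p) for Welch's unequal-variance two-sample t-test."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df, t_two_sided_p(t_stat, df)


# Hypothetical measurements, for illustration only
control = [4.1, 3.8, 4.4, 4.0, 3.9, 4.3]
treated = [4.9, 4.6, 5.2, 4.4, 5.0, 4.7]
t_stat, df, p = welch_t_test(treated, control)
print(f"Welch's t-test: t({df:.1f}) = {t_stat:.2f}, P = {p:.4f}")
```

The printed line carries all three quantities together, so a reader can check the P-value against the statistic and its degrees of freedom.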

Guidelines for reporting
• Control for multiple comparisons.
• Report variability using a standard deviation (not standard error).
• Avoid sole reliance on statistical hypothesis testing, such as the use of P-values, which fails to convey important quantitative information.
• Report uncertainty about scientific importance using a confidence interval.
• Interpret each main result by assessing the numerical bounds of the confidence interval and by considering the precise P-value.
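The guideline on multiple comparisons can be sketched with two common adjustments, Bonferroni and Benjamini-Hochberg. The slides do not say which correction to use, so both the choice of methods and the P-values below are assumptions for illustration.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P-values (controls the family-wise error rate)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values from four tests
raw = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(raw))
print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

Bonferroni is the more conservative of the two; which error rate to control depends on the design and should be stated in the report.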

Experimental Design & Power
• The appropriate analysis depends on the design!
• Can I peek at my data?
• Can I add more samples later?
• Sequential or adaptive design
• Multiple testing / gambler’s ruin
• GraphPad Prism example: http://www.graphpad.com/guides/prism/6/statistics/index.htm?stat_why_you_shouldnt_recompute_p_v.htm
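Why peeking matters can be shown with a small simulation: data are generated under the null hypothesis, and an experiment counts as "significant" if any interim look gives P < 0.05. This is a sketch under simplifying assumptions (a z-test with known standard deviation 1, five equally spaced looks, an arbitrary seed); none of these settings come from the slides.

```python
import math
import random
from statistics import NormalDist


def z_test_p(sample):
    """Two-sided P-value for H0: mean = 0, assuming a known standard deviation of 1."""
    z = sum(sample) / math.sqrt(len(sample))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(peek_at, n_sim=2000, alpha=0.05, seed=1):
    """Fraction of null experiments declared significant at any of the looks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        data = [rng.gauss(0, 1) for _ in range(max(peek_at))]
        if any(z_test_p(data[:n]) < alpha for n in peek_at):
            hits += 1
    return hits / n_sim


single_look = false_positive_rate([50])
five_looks = false_positive_rate([10, 20, 30, 40, 50])
print(f"one look at n = 50:          {single_look:.3f}")
print(f"looks at n = 10, 20, ..., 50: {five_looks:.3f}")
```

Because both runs use the same seed, they see the same simulated data, so the difference between the two rates is due only to the extra looks. Sequential designs handle this by adjusting the threshold used at each look.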

A stochastic process
• Error Statistics Blog discussion
• Not an “argument from intentions” but really the “probative capacity of the test”.

Power analysis
From The American Statistician (2001)
Uses:
• Pilot studies! (Don’t forget to control for multiple comparisons.)
• Detectable effect size (when the result is non-significant)
• Consider confidence intervals instead
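A minimal sketch of the two planning calculations mentioned above: sample size per group for a target power, and the effect size detectable with a given sample size. It assumes a two-sided, two-sample comparison and uses the normal approximation, which slightly underestimates the t-test sample size. The function names and default settings are illustrative.

```python
import math
from statistics import NormalDist

_z = NormalDist().inv_cdf  # standard normal quantile function


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group to detect a standardized effect (Cohen's d)."""
    return math.ceil(2 * ((_z(1 - alpha / 2) + _z(power)) / effect_size) ** 2)


def detectable_effect(n, alpha=0.05, power=0.80):
    """Approximate standardized effect detectable with n samples per group."""
    return (_z(1 - alpha / 2) + _z(power)) * math.sqrt(2 / n)


print("n per group for d = 0.5:", n_per_group(0.5))
print(f"detectable d with n = 20 per group: {detectable_effect(20):.2f}")
```

Both functions describe what a design can detect before the data are seen; after the experiment, a confidence interval for the effect is usually the more informative summary.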

Outliers: measurement or model error?
• Reasons for concern:
• Increases the estimated standard deviation
• May indicate the model (e.g. assumption of normality) is not correct
• May lead to model misspecification
• Biased parameter estimation

Methods: Detection
• Visual inspection
• Grubbs’ test (assumes normality)
• Chauvenet’s criterion (assumes normality)
• Dixon’s Q test (assumes normality)
• Measures based on the interquartile range
• 2 standard deviations
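Of the detection methods listed, the interquartile-range rule is the easiest to sketch. The version below flags values beyond k × IQR from the quartiles, with the conventional k = 1.5. The multiplier, the quartile convention, and the data are assumptions for illustration, and a flagged value is a candidate for inspection, not for automatic deletion.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical measurements with one suspicious value
measurements = [2.1, 2.4, 2.3, 2.2, 2.5, 2.0, 9.7]
print("flagged:", iqr_outliers(measurements))
```

Different quartile conventions give slightly different fences on small samples, so the convention used should be reported alongside the result.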

Methods: Analysis
• Delete the outlier
• Trimmed mean / Winsorized mean
• Weighted regression techniques
• Do nothing
• Report with and without outliers
Arguments for keeping:
• Having a method for identifying outliers does not make the practice of deleting them scientifically or methodologically sound
• Minimal effect on estimates/model (low influence)
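The trimmed and Winsorized means can be sketched as follows. The 10% proportion per tail and the data are assumptions for illustration; the proportion must stay below 0.5 so that some observations remain.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of points from each tail."""
    data = sorted(values)
    k = int(len(data) * proportion)
    return statistics.mean(data[k:len(data) - k])


def winsorized_mean(values, proportion=0.1):
    """Mean after replacing each tail with the nearest retained value."""
    data = sorted(values)
    k = int(len(data) * proportion)
    if k:
        data[:k] = [data[k]] * k
        data[-k:] = [data[-k - 1]] * k
    return statistics.mean(data)


# Hypothetical measurements with one extreme value
measurements = [2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 9.7]
print(f"mean:            {statistics.mean(measurements):.2f}")
print(f"trimmed mean:    {trimmed_mean(measurements):.2f}")
print(f"winsorized mean: {winsorized_mean(measurements):.2f}")
```

Both estimators reduce the pull of extreme values without singling out a specific observation for deletion.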

Influential Points
• Outlier
• Leverage
• Influence
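The three terms can be made concrete for a straight-line fit: the residual measures how far a point lies from the fitted line (outlier), the leverage measures how unusual its x value is, and Cook's distance combines the two into a measure of influence. The sketch below uses ordinary least squares with one predictor; the function name and data are hypothetical.

```python
def influence_diagnostics(x, y):
    """Return (leverage, residual, Cook's distance) per point for a line fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    p = 2  # number of fitted parameters: intercept and slope
    mse = sum(e ** 2 for e in residuals) / (n - p)
    rows = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx  # leverage of this point
        cooks_d = e ** 2 / (p * mse) * h / (1 - h) ** 2
        rows.append((h, e, cooks_d))
    return rows


# Hypothetical data: the last point has an unusual x value
x = [1, 2, 3, 4, 5, 20]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 8.0]
for xi, (h, e, d) in zip(x, influence_diagnostics(x, y)):
    print(f"x = {xi:>2}  leverage = {h:.3f}  residual = {e:+.3f}  Cook's D = {d:.3f}")
```

A point can have high leverage and a small residual, as the last point does here, and still dominate the fit; this is why influence is assessed separately from outlyingness.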

Outliers: decide whether or not deleting data points is warranted
• Do not delete data points just because they do not fit your preconceived model.
• You must have a good, objective reason for deleting data points, e.g. the value is implausible, the measurement was inaccurate, or the observation comes from a different population.
• If you delete any data after you've collected it, justify and describe it in your reports.
• If you are not sure what to do about a data point, analyze the data twice (once with and once without the data point) and report the results of both analyses.
