Can I Believe It? Understanding Statistics in Published Literature

Can I Believe It? Understanding Statistics in Published Literature • Keira Robinson – MOH Biostatistics Trainee • David Schmidt – HETI Rural and Remote Portfolio

Agenda • Welcome • Understanding the context • Data types • Presenting data • Common tests • Tricks and hints • Practice • Wrap up

Understanding statistics • Never consider statistics in isolation • Consider the rest of the article • Who was studied • What was measured • Why was that measure used • Where was the study completed • When was it done • It is the author’s role to convince you that their results can be believed!

Types of Data

Types of data • Numeric • Continuous (height, cholesterol) • Discrete (number of floors in a building) • Categorical • Binary (yes/no, e.g. born in Australia?) • Nominal (cancer type) • Ordinal (cancer stage)

Histograms • Represent continuous variables • Areas of the bars represent the frequency (count) or percent • Indicate the distribution of the data

Continuous Data

Measures of association

Stem and leaf plot - heights
6* 11
6* 2
6* 3333333
6* 44444444444
6* 555555555555
6* 66666666666666666666666
6* 777777777777777777777777777777
6* 8888888888888888
6* 99999999999999999999999999999999
7* 0000000000000000000000000
7* 1111111111111111111
7* 222222222222
7* 333333
7* 44
7* 55

Skewed Data

Salient features - the mean • The average value: the sum of all observations divided by the number of observations

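Salient features - the median • The observation in the middle • Example: newborn birth weights • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650 g • With an even number of observations, average the two middle values: (3300 + 3400)/2 = 3350 g • Not affected by extreme values • Wastes information (only the middle of the ordered data is used)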

Salient features- the mean and median

Mean and Median • Mean is preferable where the distribution allows • Symmetric distributions • Mean ≈ median • Present the mean • Skewed distributions • Mean is pulled toward the ‘tail’ • Present the median
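The pull of the tail on the mean can be sketched in a few lines of Python. This is a minimal illustration using the eight birth weights from the median example; the extra 5500 g observation is a made-up extreme value, added only to create skew.

```python
from statistics import mean, median

# Birth weights (g) from the median example
weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650]

print(mean(weights))    # 3356.25
print(median(weights))  # 3350.0

# ASSUMPTION: one hypothetical extreme value, not from the slides
skewed = weights + [5500]

# How far each summary moves when the extreme value is added
mean_shift = mean(skewed) - mean(weights)
median_shift = median(skewed) - median(weights)
print(round(mean_shift, 1), median_shift)  # 238.2 50.0
```

The mean moves by roughly 238 g while the median moves by 50 g, which is why the median is the safer summary for skewed data.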

Mean and Median

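Variability – Standard deviation and variance • Roughly, the average distance between the observations and the mean • Standard deviation: the square root of the variance • In the original units, e.g. 0.3% • Variance = the sum of squared deviations from the mean divided by n − 1 (for a sample) • In the original units squared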
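As a rough sketch, the sample variance and standard deviation can be computed with Python's standard library. The data are the eight birth weights from the median example; the n − 1 divisor is the usual sample convention and is assumed here.

```python
from statistics import mean, stdev, variance

weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650]  # grams

# Sample variance by hand: squared deviations from the mean, divided by n - 1
m = mean(weights)
var_by_hand = sum((x - m) ** 2 for x in weights) / (len(weights) - 1)

var = variance(weights)  # units: grams squared
sd = stdev(weights)      # units: grams (square root of the variance)

print(round(var, 1), round(sd, 1))  # 46741.1 216.2
```

The standard deviation is in grams, so it can be read directly against the mean; the variance is in grams squared and is harder to interpret on its own.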

Range • Example: infant birth weights • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800 g • Range = 3100 to 3800 grams, a spread of 700 grams • Interquartile range: the range between the first and third quartiles (Q1 and Q3) • 3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800 g • IQR = 3200 to 3600 grams, a spread of 400 grams
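The same numbers can be reproduced in Python. This is a sketch only: several quartile conventions exist, and the "inclusive" method of `statistics.quantiles` is chosen here because it matches the quartiles shown on the slide.

```python
from statistics import quantiles

weights = [3100, 3100, 3200, 3300, 3400, 3500, 3600, 3650, 3800]  # grams

data_range = max(weights) - min(weights)

# "inclusive" reproduces the slide's quartiles; other conventions
# (including this function's default) can give slightly different Q1 and Q3
q1, q2, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

print(data_range, q1, q3, iqr)  # 700 3200.0 3600.0 400.0
```

When a paper reports an IQR, small differences from a hand calculation are often due to the quartile convention rather than an error.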

Presenting variability • Present standard deviation if the mean is used • Present Interquartile range if the median is used

Graphics for Continuous Variables • Boxplot (labels from the figure): outlier • Maximum in Q3 • 75th percentile (Q3) • IQR • Median • 25th percentile (Q1) • Minimum in Q1

Categorical Data

Categorical Variables- table summaries

Bar charts • Relative frequency for a categorical or discrete variable

Bar chart vs Histogram • Histogram • For continuous variables • The area represents the frequency • Bars join together • Bar chart • For categorical variables • The height represents the frequency • The bars don’t join together

Pie chart • Areas of “slices” represent the frequency

Precision

Presenting statistics • Tables should need no further explanation • Means • No more than one decimal place more than the original data • Standard deviations may need an extra decimal place • Percentages • Not more than one decimal place (sometimes no decimal place) • Sample size <100, decimal places are not necessary • If sample size <20, may need to report actual numbers

Example of data presentation

Statistical Inference

Sampling and inference

Sampling, cont’d • A sample statistic is used as an estimate of the population parameter • Example: average parity • The sample mean is used to estimate the population mean

Confidence intervals • A range of values within which we are confident the true mean lies • 95% confidence interval: we are 95% confident that the true mean lies within the range • If a study is repeated numerous times, about 95% of the confidence intervals calculated would contain the true mean • How does the confidence interval change as the sample size increases?
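Both ideas on this slide can be explored with a small simulation. The sketch below is illustrative only: the population mean and standard deviation are hypothetical, and the interval uses the large-sample normal approximation (mean ± 1.96 standard errors) rather than a t distribution.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def ci95(sample):
    """Approximate 95% CI for the mean (normal approximation)."""
    half_width = Z95 * stdev(sample) / sqrt(len(sample))
    m = mean(sample)
    return m - half_width, m + half_width

random.seed(1)
TRUE_MEAN, TRUE_SD = 3500, 500  # ASSUMPTION: hypothetical population values

# 1) Larger samples give narrower intervals
widths = {}
for n in (25, 100, 400):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    low, high = ci95(sample)
    widths[n] = high - low

# 2) Repeat the "study" many times and count how often the CI covers the truth
REPEATS = 2000
covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(100)]
    low, high = ci95(sample)
    covered += low <= TRUE_MEAN <= high
coverage = covered / REPEATS

print(widths, coverage)
```

The widths shrink roughly in proportion to one over the square root of the sample size, and the coverage should land close to 0.95.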

Confidence intervals cont’d

Hypothesis testing • Is our sample of babies consistent with the Australian population, which has a known mean birth weight of 3500 grams? • Sample mean = 3800 grams, 95% CI of 3650 to 3950 grams • 3500 lies outside this confidence interval, indicating our sample mean is higher than the true Australian population mean

Hypothesis testing • State a null hypothesis: • There is no difference between the mean of the population sampled and the known mean: H0: μ = 3500 • Calculate a test statistic from the data: t = 2.65 • Report the p-value: p = 0.012

What is a p-value? • The probability of obtaining data at least as extreme as those observed, i.e. a mean weight of 3800 grams or greater, if the null hypothesis is true • The smaller the p-value, the more evidence against the null hypothesis • < 0.05 – evidence to reject the null hypothesis (statistically significant difference) • > 0.05 – insufficient evidence to reject the null hypothesis (not statistically significant); this is not evidence that the null hypothesis is true
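The test statistic is the distance between the sample mean and the null value, measured in standard errors. The sketch below shows the arithmetic for the birth weight example. The slides do not give the sample size or standard deviation, so the values used here are assumptions chosen for illustration, and the p-value uses a normal approximation rather than the t distribution.

```python
from math import sqrt
from statistics import NormalDist

sample_mean = 3800   # from the slides (grams)
null_mean = 3500     # from the slides (grams)
sample_sd = 620      # ASSUMPTION: not given in the slides
n = 30               # ASSUMPTION: not given in the slides

standard_error = sample_sd / sqrt(n)
t_stat = (sample_mean - null_mean) / standard_error

# Two-sided p-value from the normal approximation; a t distribution
# with n - 1 degrees of freedom would give a somewhat larger value
p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))

print(round(t_stat, 2))  # 2.65
```

A larger sample or a smaller standard deviation shrinks the standard error, which makes the same 300 g difference look more extreme under the null hypothesis.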

Summary – Confidence intervals and p values • P-value: indicates the strength of evidence against the null hypothesis (statistical significance) • Confidence interval: range of values within which we are 95% confident the true value lies • Recommended to present confidence intervals where possible

Analysing Continuous Outcomes

T tests • What are they used for? • Analyse means • Provide estimate of the difference in means between the two groups and the 95% confidence interval of this difference • P-value – a measure of the evidence against the null hypothesis of no difference between the two groups

T tests- paired vs independent • Paired: • Outcome is measured on the same individual • Eg: before and after, cross-over trial • Pairs may be two different individuals who are matched on factors like age, sex etc.

Paired T-tests • Calculate the difference for each of the pairs • The mean weight at baseline was 93 kg and the mean weight at 3 months was 88 kg • On average, weight at 3 months was 5 kg lower than at baseline (95% CI -3 to 12)

Paired T-tests • There was no evidence of a significant change in weight after 3 months (p = 0.19) • Assumptions • The differences follow a bell-shaped (approximately normal) distribution with no outliers • Assess shape by graphing the differences • Use a histogram or stem and leaf plot
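A paired t-test is a one-sample test on the within-pair differences. The following sketch shows the steps on made-up weights for five people; these are not the data behind the slide's results.

```python
from math import sqrt
from statistics import mean, stdev

# ASSUMPTION: hypothetical weights (kg) for five people, not from the slides
baseline = [95, 90, 92, 98, 88]
month_3 = [90, 91, 85, 92, 87]

# Step 1: one difference per pair (baseline minus 3 months)
diffs = [b - a for b, a in zip(baseline, month_3)]

# Step 2: one-sample t statistic on the differences (null: mean difference = 0)
mean_diff = mean(diffs)
se_diff = stdev(diffs) / sqrt(len(diffs))
t_stat = mean_diff / se_diff

print(diffs, mean_diff, round(t_stat, 2))  # [5, -1, 7, 6, 1] 3.6 2.34
```

The list of differences is also what should be graphed when checking the normality assumption.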

Independent T tests • Two groups that are unrelated • Eg: weights of different groups of people

Independent samples t-tests • Same assumption as for paired t tests plus the assumption of independence and equal variance

Interpretation – independent t tests • The mean weight in NW Public was 62 kg and the mean weight in SW Public was 61 kg • The mean difference in weight between the two schools was 1 kg (95% CI -22 to 24) • There was no evidence of a significant difference in weight between the two schools (p=0.92)
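For two unrelated groups, the test statistic divides the difference in means by a standard error built from a pooled variance. The sketch below uses hypothetical weights for two schools (not the slide's data) and relies on the equal-variance assumption.

```python
from math import sqrt
from statistics import mean, variance

# ASSUMPTION: hypothetical weights (kg), not data from the slides
school_a = [60, 62, 64, 58, 66]
school_b = [59, 61, 63, 57, 65]

n_a, n_b = len(school_a), len(school_b)
mean_diff = mean(school_a) - mean(school_b)

# Pooled variance: appropriate only under the equal-variance assumption
pooled_var = (
    (n_a - 1) * variance(school_a) + (n_b - 1) * variance(school_b)
) / (n_a + n_b - 2)

se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_stat = mean_diff / se_diff

print(mean_diff, round(t_stat, 2))
```

Here a 1 kg difference is only half a standard error from zero, so the data give little evidence of a real difference between the groups.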

One-way Analysis of Variance (ANOVA) • What happens when there are more than two groups to compare? • Null hypothesis: the means of all groups are equal • There is no single difference in means to estimate when there are more than two groups, so the variance between the groups is analysed • The variance between groups is compared with the variance within groups
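The comparison of between-group and within-group variance is summarised by the F statistic. The sketch below computes it by hand for three small hypothetical groups; the data are invented for illustration.

```python
from statistics import mean

# ASSUMPTION: hypothetical weights (kg) for three groups, not from the slides
groups = [
    [60, 62, 64],
    [61, 63, 65],
    [66, 68, 70],
]

all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

# F = between-group mean square / within-group mean square
f_stat = (ss_between / df_between) / (ss_within / df_within)
print(round(f_stat, 2))  # 7.75
```

An F statistic well above 1 means the group means differ by more than the scatter within groups would suggest; the p-value comes from comparing F with the F distribution on these degrees of freedom.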

One-way ANOVA • Comparing multiple groups

Interpretations – One-way ANOVA • If p<0.05: there was evidence of a difference in average student weight between the four schools • If p>0.05: there was no evidence of a difference in average student weight between the four schools • Comparing all means against each other is not advised, because the more tests that are done, the greater the chance of finding at least one significant result by chance alone
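The cost of comparing every pair of groups can be quantified. The sketch below assumes a 0.05 significance level and, as a simplification, that the pairwise tests are independent.

```python
from math import comb

alpha = 0.05
n_groups = 4                 # four schools, as in the slide
n_tests = comb(n_groups, 2)  # number of pairwise comparisons

# Chance of at least one false positive if every null hypothesis is true
# and the tests are independent (a simplifying ASSUMPTION)
familywise_error = 1 - (1 - alpha) ** n_tests

print(n_tests, round(familywise_error, 3))  # 6 0.265
```

With four schools there are six pairwise tests, and the chance of at least one spurious "significant" result rises from 5% to roughly 26%.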

Assumptions ANOVA • Normality – observations in each group are normally distributed • Equal variance – the variance is the same in all groups • Independence – all groups are independent of each other

Extensions of one-way ANOVA • Two-way ANOVA: • Multiple factors to be considered, e.g. school and type of school (public/private) • ANCOVA – Analysis of Covariance • Tests group differences while adjusting for continuous variables (e.g. age) as well as categorical variables
