ESTIMATION & HYPOTHESIS TESTING

ESTIMATION & HYPOTHESIS TESTING Dr Liddy Goyder, Dr Stephen Walters

At the end of this session, you should know about: • The process of setting and testing statistical hypotheses At the end of this session, you should be able to: • Explain: • Null hypothesis • P-value, and what different values mean • Type I error • Type II error • Understand what is meant by the term Power • Demonstrate awareness that the p-value does not give the probability of the null hypothesis being true • Demonstrate awareness that p>0.05 does not mean that we accept the null hypothesis • Distinguish between ‘statistical significance’ and ‘clinical significance’

Teenage Pregnancy • Our young doctor has noticed that there are differences between the teenage pregnancy rates in the two general practices that she has worked in. • The two practice populations are very different in terms of deprivation. • She is interested in investigating whether there is a statistically significant relationship between deprivation and teenage pregnancy.

Teenage Pregnancy Example • What is the research question (what is being investigated)? • Is there a relationship between teenage pregnancy rate and deprivation? • What is the outcome variable (how will they measure this)? • Teenage pregnancy rate

Statistical Analysis (1) • Last session we discussed why we take samples rather than study the whole population • We examine the behaviour of a sample as it is often not feasible to look at the entire population • From a sample we want to make inferences about the population from which it is drawn. • We do this by a process of statistical hypothesis testing: formulating a hypothesis and testing it • This session we will look at how you formulate and test a hypothesis. • You are not expected to know about individual tests, but need to understand the concept of setting and testing statistical hypotheses

Statistical Analysis (2): Population and Sample

Statistical Analysis (3) • The main aim of statistical analysis is to use the information gained from a sample of individuals to make inferences about the population of interest • There are two basic approaches to statistical analysis • Estimation (confidence intervals) • Hypothesis testing (p-values)

Hypothesis testing: the main steps • Set null hypothesis • Set study (alternative) hypothesis • Carry out significance test • Obtain test statistic • Compare test statistic to hypothesized critical value • Obtain p-value • Make a decision

State your hypotheses (H0 & H1) • State your null hypothesis (H0) (the statement you are looking for evidence to disprove) • State your study (alternative) hypothesis (H1 or HA) • Often statistical analyses involve comparisons between different treatments (e.g. standard and new) • We assume the treatment effects are equal until proven otherwise • Therefore the null hypothesis is usually the negation of the research hypothesis, e.g. if the research hypothesis is that the new treatment will differ in effect from the standard treatment, the null hypothesis is that the two treatments do not differ in effect • NB: It is easier to disprove things than prove them

Teenage pregnancy example • What is the research question? Is there a relationship between teenage pregnancy rate and deprivation? • What is the outcome variable? Teenage pregnancy rate • What is the null hypothesis? There is no relationship between teenage pregnancy rate and deprivation • What is the alternative hypothesis? There is a relationship (the teenage pregnancy rate differs with deprivation)

Teenage pregnancy example: Ref: www.empho.org.uk/whatsnew/teenage-pregnancy-presentation.ppt

Carry out significance test • Calculate a test statistic using your data (reduce your data down to a single value). The general formula for a test statistic is: test statistic = (observed value − hypothesized value) / (standard error of the observed value) • Compare this test statistic to a critical value from the distribution we expect if the null hypothesis is true (e.g. the Normal distribution) to obtain a p-value
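The general formula above can be sketched in a few lines of Python. This is only an illustration, assuming a Normal reference distribution and a two-sided test; the function name `z_test` and the numbers used are hypothetical and do not come from the teenage pregnancy data.

```python
from math import erfc, sqrt


def z_test(observed, hypothesized, standard_error):
    """Return (test statistic, two-sided p-value) using a Normal reference distribution."""
    # Reduce the data to a single value: how many standard errors the
    # observed value lies from the value stated in the null hypothesis.
    z = (observed - hypothesized) / standard_error
    # Two-sided p-value: probability of a result at least this extreme
    # (in either direction) if the null hypothesis is true.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical numbers: an observed difference of 5.0 with a standard
# error of 2.0, tested against a null value of 0.
z, p = z_test(observed=5.0, hypothesized=0.0, standard_error=2.0)
print(f"test statistic = {z:.2f}, p-value = {p:.4f}")
```

Other tests use a different reference distribution (for example the t distribution), but the logic of "observed minus hypothesized, divided by a standard error" is the same.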

Teenage pregnancy example

Teenage Pregnancy example • We can quantify the relationship using a regression analysis • This measures what the average change in the teenage pregnancy rate is for a given change in the deprivation score • The null hypothesis is that there is no change in the teenage pregnancy rate as the deprivation score changes • The alternative hypothesis is that the teenage pregnancy rate does change as the deprivation score changes

Teenage pregnancy results • Thus as the deprivation score increases by 1 unit there are an additional 0.006 pregnancies per 1,000 women aged 15-17 years. • As the deprivation score varies between about 1,000 and 8,000, the regression coefficient can be rescaled • Thus as the deprivation score increases by 1,000 units there are an additional 6 pregnancies per 1,000 women aged 15-17 years • A significance test for the regression coefficient also gives a p-value of less than 0.001
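The rescaling of the regression coefficient can be illustrated with a small least-squares sketch. The data below are hypothetical, constructed to lie exactly on a line with slope 0.006; they are not the data from the referenced presentation, and the intercept of 20 is an arbitrary choice for illustration.

```python
def least_squares_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical deprivation scores and teenage pregnancy rates
# (pregnancies per 1,000 women aged 15-17), generated as 20 + 0.006 * score.
deprivation = [1000, 2000, 4000, 6000, 8000]
rate = [26.0, 32.0, 44.0, 56.0, 68.0]

slope, intercept = least_squares_line(deprivation, rate)
per_1000_units = slope * 1000  # rescale: change in rate per 1,000-unit increase

print(f"change per 1 unit of deprivation score: {slope:.3f}")
print(f"change per 1,000 units of deprivation score: {per_1000_units:.1f}")
```

Rescaling changes only the units in which the coefficient is reported; it does not change the test statistic or the p-value.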

Making a decision (1) • When making a decision you can either decide to reject the null hypothesis or not reject the null hypothesis. • Whatever you decide, you may have chosen correctly and: • rejected the null hypothesis, when in fact it is false • not rejected the null hypothesis, when in fact it is true • Or you may have chosen incorrectly and: • rejected the null hypothesis, when in fact it is true (false positive) • not rejected the null hypothesis, when in fact it is false (false negative)

Making a decision (2)

Making a decision (3)

Making a decision (4) The probability of rejecting the null hypothesis when it is actually false is called the POWER of the study (Power = 1 − β). It is the probability of concluding that there is a difference, when a difference truly exists


Making a decision (6) A p-value is the probability of obtaining your results or results more extreme, if the null hypothesis is true. The significance level (α) against which the p-value is compared is the probability of committing a false positive (Type I) error, i.e. of rejecting the null hypothesis when in fact it is true
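The false positive rate and the power can be made concrete with a small simulation. This is a sketch under stated assumptions: each simulated study draws 25 observations from a Normal distribution with a known standard deviation of 1 and tests the null hypothesis that the mean is 0; the function name `rejection_rate`, the sample size, the true difference of 0.5 and the seed are all hypothetical choices.

```python
import random
from math import erfc, sqrt


def rejection_rate(true_mean, n=25, alpha=0.05, n_studies=2000, seed=1):
    """Proportion of simulated studies that reject H0: mean = 0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Test statistic: (observed - hypothesized) / standard error,
        # with the standard deviation assumed known and equal to 1.
        z = (sample_mean - 0.0) / (1.0 / sqrt(n))
        p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value
        if p_value < alpha:
            rejections += 1
    return rejections / n_studies


# Null hypothesis true: every rejection is a false positive (Type I error).
type_i_rate = rejection_rate(true_mean=0.0)
# Null hypothesis false (true mean 0.5): the rejection rate estimates the power.
power = rejection_rate(true_mean=0.5)

print(f"estimated Type I error rate: {type_i_rate:.3f}")
print(f"estimated power: {power:.3f}")
```

With these settings the first figure should be close to the significance level of 0.05, and the second close to the theoretical power of about 0.7; one minus the second figure estimates β, the Type II error rate.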

Making a decision (7) • Use your p-value to make a decision about whether to reject, or not reject, your null hypothesis • A p-value can range from 0 to 1 • The smaller the p-value, the stronger the evidence against the null hypothesis • But how small is small? The significance level is usually set at 0.05. Thus if the p-value is less than this value we reject the null hypothesis

Statistical significance (1) We say that our results are statistically significant if the p-value is less than the significance level (α), usually set at 5%. If the p-value is not less than the significance level, we cannot say that the null hypothesis is true, only that there is not enough evidence to reject it

Statistical significance (2) • The significance level is usually set at 5% • The level is conventional rather than fixed • Sometimes, for stronger proof we require a significance level of 1% (or P<0.01)

Misinterpretation of P-values (1) • A common misinterpretation of the P-value is that it is: • The probability of the data having arisen by chance • The probability that the observed effect is not a real one • The distinction between this incorrect definition and the true definition is the absence of the phrase ‘when the null hypothesis is true’

Misinterpretation of P-values (2) • The omission of ‘when the null hypothesis is true’ leads to the incorrect belief that it is possible to evaluate the probability of the observed effect being a real one • The observed effect in the sample is genuine, but we do not know what is true in the population • All we can do with this approach to statistical analysis is to calculate the probability of observing our data (or data more extreme) when the null hypothesis is true

Teenage pregnancy: making a decision • A p-value is the probability of obtaining your results or results more extreme, if the null hypothesis is true • The P-value for the regression coefficient is < 0.001 • Thus we reject the null hypothesis and conclude that there is a statistically significant change in teenage pregnancy rates as the deprivation score changes. • The result is statistically significant at the 5% level

Teenage pregnancy example: making a decision • If however the P-value had been greater than 0.05 we would have concluded that there is insufficient evidence to reject the null hypothesis • The results would not be statistically significant at the 5% level • We do not conclude that the null hypothesis is true, only that there is insufficient evidence to reject it
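The wording of the two possible conclusions can be captured in a small helper. This is a minimal sketch; the function name `interpret` is hypothetical, and the default significance level of 0.05 follows the convention described in these slides.

```python
def interpret(p_value, alpha=0.05):
    """Return the conclusion for a p-value at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return (f"reject the null hypothesis "
                f"(statistically significant at the {alpha:.0%} level)")
    # Note the wording: the null hypothesis is never described as true.
    return (f"insufficient evidence to reject the null hypothesis "
            f"(not statistically significant at the {alpha:.0%} level)")


# The regression coefficient had p < 0.001; 0.0005 is a hypothetical
# value in that range, and 0.20 is a hypothetical non-significant value.
print(interpret(0.0005))
print(interpret(0.20))
```

The second message deliberately avoids any statement that the null hypothesis is true.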

Recap: making a decision • Set study hypothesis • Set null hypothesis • Carry out significance test • Obtain test statistic • Compare test statistic to hypothesized critical value • Obtain p-value • Make a decision

Limitations of a hypothesis test • All that we know from a hypothesis test is how likely the difference we observed (or one more extreme) would be if the null hypothesis were true • The results of a significance test do not tell us what the difference is or how large the difference is • To answer this we need to supplement the hypothesis test with a confidence interval, which gives a range of values within which we are confident the true population mean difference lies

Statistical & Clinical Significance (1) • A clinically significant difference is one that is big enough to make a worthwhile difference • Statistical significance does not necessarily mean the result is clinically significant • Supplementing the hypothesis test with an estimate of the effect with a confidence interval will indicate the magnitude of the result. This will help the investigators to decide whether the difference is of interest clinically

Statistical & Clinical Significance (2)

Statistical & Clinical Significance (3): 95% confidence intervals added

Statistical and clinical significance (4) • With a large enough sample the smallest of changes may be statistically significant but not clinically important. • If the sample size of the study is too small and has low power, a clinically significant result may not be regarded as statistically significant. • Therefore it is important that the size of the sample is adequate to detect the clinically significant result, at the 5% significance level with at least 80% power (something to look for in the methods section when reading the literature).
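As an illustration of how sample size, significance level and power fit together, the sketch below uses the standard Normal-approximation formula for comparing two means with equal group sizes and a two-sided test. The formula, the function name `sample_size_per_group` and the example numbers are assumptions added here for illustration; they are not part of the original slides.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(difference, sd, alpha=0.05, power=0.80):
    """Approximate number per group needed to detect `difference` between two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    n = 2 * ((z_alpha + z_beta) * sd / difference) ** 2
    return ceil(n)


# Hypothetical example: the smallest clinically significant difference is
# 5 units and the outcome has a standard deviation of 10 units.
n_needed = sample_size_per_group(difference=5.0, sd=10.0)
# Halving the difference to be detected roughly quadruples the sample size.
n_smaller_difference = sample_size_per_group(difference=2.5, sd=10.0)

print(f"per group, to detect a difference of 5: {n_needed}")
print(f"per group, to detect a difference of 2.5: {n_smaller_difference}")
```

The same relationship explains the first bullet above: with a very large sample, even a difference far smaller than the clinically significant one can reach statistical significance.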

Relationship between confidence intervals and statistical significance (1) • There is a close relationship between hypothesis testing and confidence intervals • If the 95% CI does not include zero (or more generally the value specified in the null hypothesis) then a hypothesis test will return a statistically significant result • If the 95% CI does include zero then the hypothesis test will return a non-significant result
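The correspondence between a 95% confidence interval and a test at the 5% level can be checked numerically. This is a sketch assuming a Normal reference distribution for both the interval and the test; the function name `ci_and_p` and the estimates are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist


def ci_and_p(estimate, standard_error, null_value=0.0, level=0.95):
    """Return ((lower, upper), two-sided p-value) for an estimate."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = estimate - z_crit * standard_error
    upper = estimate + z_crit * standard_error
    z = (estimate - null_value) / standard_error
    p_value = erfc(abs(z) / sqrt(2))
    return (lower, upper), p_value


# Two hypothetical estimated differences, each with standard error 2.0.
for estimate in (5.0, 3.0):
    (lower, upper), p = ci_and_p(estimate, standard_error=2.0)
    excludes_zero = not (lower <= 0.0 <= upper)
    print(f"estimate {estimate}: 95% CI ({lower:.2f}, {upper:.2f}), "
          f"p = {p:.3f}, CI excludes zero: {excludes_zero}")
```

In both cases the interval excludes zero exactly when the p-value is below 0.05, because the interval and the test are built from the same estimate and standard error.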

Relationship between confidence intervals and statistical significance (2) • A 95% CI is calculated so that, over repeated samples, 95% of such intervals include the true population value • Thus 5% of such intervals will not include the true value • If the 95% CI does not include zero, the p-value for a test of the null hypothesis of zero is less than 0.05 • The significance level represents the probability that you conclude there is a difference when in fact there is no difference • Thus with a significance level of 0.05 there is a 5% probability that we conclude there is a difference when in fact there is no difference; this is not the same as a 5% probability that the true value is zero

Relationship between confidence intervals and statistical significance (3) • The CI shows the most likely size of the difference given the data and the uncertainty or lack of precision around this difference. The p-value alone tells you neither the size of the difference nor its precision. Thus the CI conveys more useful information than the p-value alone • e.g. whether a clinician will use a new treatment that reduces blood pressure will depend on the amount of that reduction and how consistent the effect is • So, the presentation of both the p-value and the confidence interval is desirable

Summary • Research questions need to be turned into a statement for which we can find evidence to disprove - the null hypothesis. • The study data is reduced down to a single probability - the probability of observing our result, or one more extreme, if the null hypothesis is true (P-value). • We use this P-value to decide whether to reject or not reject the null hypothesis. • But we need to remember that ‘statistical significance’ does not necessarily mean ‘clinical significance’. • Confidence intervals should always be quoted with a hypothesis test to give the magnitude and precision of the difference.

You should now know about: • The process of setting and testing statistical hypotheses You should now be able to: • Explain: • Null hypothesis • P-value • Type I error • Type II error • Power • Demonstrate awareness that the p-value does not give the probability of the null hypothesis being true • Demonstrate awareness that p>0.05 does not mean that we accept the null hypothesis • Distinguish between ‘statistical significance’ and ‘clinical significance’

Next week… • In the next Critical Numbers session we are going to look at risk!

One-sided vs two-sided significance testing • Two-sided : does not specify the direction of any effect • There is a difference between treatment A and treatment B • One-sided : specifies the direction of the effect • Treatment A is better than treatment B

One-sided significance testing • One-sided tests are rarely appropriate, even when there is a strong prior belief as to the direction of the effect, as by doing a one-sided test you do not allow for the possibility of finding an effect in the opposite direction to the one you are testing • This is similar to history taking, when it is important not to ask leading questions in case you miss the correct diagnosis • The decision to do one-sided tests must be made before the data are analysed; it must not depend on the outcome of the study • An example of when a one-sided test might be appropriate is in clinical trials looking at non-inferiority
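The difference between the two kinds of test can be seen by computing both p-values from the same test statistic. This is a sketch assuming a Normal reference distribution, with a positive test statistic meaning that treatment A did better than treatment B; the function name `one_and_two_sided` and the value 1.8 are hypothetical.

```python
from statistics import NormalDist


def one_and_two_sided(z):
    """Return (one-sided p-value for 'A better than B', two-sided p-value)."""
    standard_normal = NormalDist()
    # One-sided: only large positive values count as evidence against H0.
    one_sided = 1 - standard_normal.cdf(z)
    # Two-sided: large values in either direction count.
    two_sided = 2 * (1 - standard_normal.cdf(abs(z)))
    return one_sided, two_sided


# The same size of effect, observed in the expected and the opposite direction.
for z in (1.8, -1.8):
    one_sided, two_sided = one_and_two_sided(z)
    print(f"z = {z:+.1f}: one-sided p = {one_sided:.3f}, "
          f"two-sided p = {two_sided:.3f}")
```

When the effect is in the expected direction the one-sided p-value is half the two-sided one, so it can fall below 0.05 when the two-sided value does not. When the effect is in the opposite direction the one-sided p-value is close to 1, so the effect goes undetected, which is the risk described above.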
