Why do we Need Randomised Controlled Trials? David Torgerson Director, York Trials Unit
What works? • In most areas, education, health, criminal justice, etc, we want to know WHAT or WHETHER something works. • Do ‘bootcamps’ reduce criminal behaviour? • Are teaching volunteers effective? • Are computers effective at improving literacy and numeracy? • Of secondary importance is HOW.
The WHAT question • The ONLY way we can find out whether something works or not is by using a RANDOMISED CONTROLLED TRIAL. • All other evaluative methods are INFERIOR ways of answering the WHAT question and some cannot answer it at all (e.g., qualitative research).
Structure of Session • Randomised Controlled Trials ARE the ‘gold-standard’ evaluation method. • What is wrong with other research methods? • Why should we do trials?
Clinical Practice in the 18th Century • "It is incident to physicians, I am afraid, beyond all other men, to mistake subsequence for consequence." Samuel Johnson, 1734
Background • Traditionally most interventions have been evaluated using a pre-test post-test or before and after design. • Participants are tested, treated and then tested again; any improvements are attributed to the intervention. • Currently this is probably the most POPULAR evaluative method in most fields.
Who uses before and after? • Policy makers • Teachers assessing individual children. • Action researchers. • Parents • Lecturers • We all do.
Problems • Problems include: • Temporal changes; • Regression to the mean.
Temporal Change • Self-learning irrespective of teaching occurs. • As children mature they will become better at learning. • The effect of any intervention or treatment is mixed up with these temporal changes and is difficult to disentangle from them.
Changes in Outcomes • If we measure outcomes using public examination results we will see an improvement. Is this because the intervention has worked? Or is it because exams have got easier? Or have children become more intelligent? • Without a control group we CANNOT know.
Regression to the Mean • As well as temporal changes, before and after studies are confounded by a statistical phenomenon known as ‘regression to (or towards) the mean’.
Regression to the mean • This is a GROUP phenomenon and occurs when the group are measured with an inexact measurement tool and then remeasured. Those individuals with ‘extreme’ values will have a high probability of regressing towards the mean on the second measurement.
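The phenomenon can be illustrated with a small simulation. This is only a sketch under assumed, illustrative values (a 'true' score with mean 100 and SD 15, measurement error with SD 10, and selection of the lowest 10% on the first test); none of these numbers come from the slides. No intervention is applied between the two measurements.

```python
import random
import statistics


def simulate_rtm(n_pupils=10_000, noise_sd=10.0, seed=1):
    """Measure a group twice with an inexact test and no intervention."""
    rng = random.Random(seed)
    # Hypothetical 'true' ability for each pupil (illustrative values only).
    true_scores = [rng.gauss(100, 15) for _ in range(n_pupils)]
    # Each observed score is the true score plus independent measurement error.
    test1 = [t + rng.gauss(0, noise_sd) for t in true_scores]
    test2 = [t + rng.gauss(0, noise_sd) for t in true_scores]
    # Pick out the 'extreme' group: the lowest 10% on the first test.
    cutoff = sorted(test1)[n_pupils // 10]
    low = [i for i in range(n_pupils) if test1[i] < cutoff]
    return {
        "population_mean": statistics.mean(test1),
        "low_group_test1": statistics.mean(test1[i] for i in low),
        "low_group_test2": statistics.mean(test2[i] for i in low),
    }


result = simulate_rtm()
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

The low-scoring group's mean on the second test sits closer to the population mean than its mean on the first test, even though nothing was done to the pupils in between.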
History of RTM • Galton’s work from 1869 started to provide the understanding of the phenomenon. • By 1886 Galton had described the phenomenon among the heights of children and their parents (children of tall parents tend to be shorter than their parents, and children of short parents taller: ‘regression to mediocrity’).
Economists and RTM • “I suspect that the regression fallacy is the most common fallacy in the statistical analysis of economic data” Milton Friedman 1992
Marking Exam Scripts • The MSc in Health Sciences uses a system of double marking: markers are blind to student identity and to the other marker’s mark. • Markers tend to disagree about marks at the extremes of the distribution. • Explanation: regression to the mean.
Did the Amnesty work? • Unclear. The year preceding the amnesty had a large, unexpected increase in offences; through regression to the mean we would expect the rate of increase in the following year to ‘regress’ back towards the ‘average’ annual increase.
Education intervention • Wheldall selected 40 pupils whose reading was at least 2 years behind their peers. • Half were exposed to an intervention. Wheldall Educational Review 2000;52:29.
Before and after reading programme • Difference highly statistically significant (p < 0.001)
Before and after reading programme • Differences between groups NOT statistically significant
RTM misunderstanding • “the mean gain scores translated to impressive effect sizes of 0.6.” • “It could be argued that it is asking too much of any program to demonstrate enhanced efficacy on top of such high existing efficacy” • “…control group gains were largely attributable to pre-existing …literacy programme..” • Perhaps, BUT much of the gain will be due to RTM.
RTM and School Exclusions • A qualitative and before and after evaluation of an intervention to reduce school exclusions said • “an RCT would not have been able to adequately address fundamental problems concerning the reliability and validity of quantitative data in relation to exclusions”
Flawed Methods • Selected schools with HIGH exclusion rates on which to intervene. Therefore we would EXPECT exclusions to fall. • They did by 15%. • BUT schools with the fewest exclusions INCREASED exclusions by 55% whilst schools with the highest exclusions had a fall of 32%.
Mentoring • In England, part of the KS3 Strategy • Backed by Government and private funding • ‘Mentoring’ means a lot of different things • Research evidence is • Case studies • Feelings and perceptions of participants • Completely inadequate to infer impact
Neil Appleby’s Experiment • A randomised controlled trial involving 20 underachieving Y8 (12-13 year-old) students • Matched in pairs on ability and gender • Randomly allocated: in each pair, one mentored, the other not • Mentored group had 20 mins individually every two weeks (11 sessions) • ‘It nearly killed me’ • Cost estimated at between £170 and £410 per mentored pupil, which represents between 8% and 19% of the school’s annual per pupil funding for the whole of their education
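The matched-pairs allocation described above can be sketched as follows. This is a hedged illustration, not the procedure actually used in the experiment: the pupil labels, the `allocate_pairs` helper and the seed are all hypothetical, and the pairs are assumed to have been matched on ability and gender beforehand.

```python
import random


def allocate_pairs(pairs, seed=None):
    """Within each matched pair, randomly pick one pupil to be mentored."""
    rng = random.Random(seed)
    mentored, control = [], []
    for first, second in pairs:
        # A fair coin flip decides which member of the pair is mentored.
        if rng.random() < 0.5:
            first, second = second, first
        mentored.append(first)
        control.append(second)
    return mentored, control


# Ten hypothetical matched pairs (20 pupils), labelled P01..P20.
pupils = [f"P{i:02d}" for i in range(1, 21)]
pairs = [(pupils[i], pupils[i + 1]) for i in range(0, 20, 2)]

mentored, control = allocate_pairs(pairs, seed=2024)
print("Mentored:", mentored)
print("Control: ", control)
```

Because chance alone decides who is mentored within each pair, the two groups should differ systematically only in whether they received mentoring.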
What the teachers said about the mentored students … • “**** is a changed person this year she has progressed greatly and is a superb helpful student.” • “Better now, has achieved more, more confident.” • “Generally a great improvement recently.” • “****’s attitude and effort have improved over the year. He is a lot pleasanter and more willing to participate in lessons particularly oral work, he responds well to praise.”
What they said about the control group … • “Has improved overall this term.” • “****’s attitude and effort have improved over the last few months, she is now trying very hard to achieve her target. Great effort.” • “Commended for attitude and progress.” • “**** has settled since the beginning of the year.” • “**** has undergone quite a transformation since September. Her attitude towards the teacher and her learning have improved drastically and she should be congratulated.”
Change in Teachers’ Ratings of progress, effort and attitude (English, maths and science combined)
What this proves • If you identify a group of underachieving pupils at a particular time and then come back to them after a few months, many of them will have improved, whatever you did. • Others (the ‘hard cases’) will not have improved, whether mentored or left alone. • The interpretation of this would have been very different without a ‘control’ group.
RTM and League Tables • RTM is GREAT for Governments: it helps persuade the credulous that what they do works. • In any league table those at the bottom will tend to ‘regress’ upwards to the mean whilst those at the top regress down. This lends apparent support to naming and shaming or extra financial help to those at the bottom.
Dealing with RTM • The only way to reliably deal with the problem is through randomised trials. • This is why before and after data are generally regarded, by the cognoscenti, as almost USELESS.
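A rough simulation shows why a randomised control group handles the problem where a before and after comparison does not. It is a sketch under assumed values (score distribution, measurement error, selection of the lowest 10%, and a true intervention effect set to zero); none of the numbers are taken from any study mentioned here.

```python
import random
import statistics

rng = random.Random(7)
N, NOISE_SD, TRUE_EFFECT = 20_000, 10.0, 0.0  # the intervention does nothing

# Hypothetical true abilities and a noisy pre-test.
true_scores = [rng.gauss(100, 15) for _ in range(N)]
pre = [t + rng.gauss(0, NOISE_SD) for t in true_scores]

# Select the lowest-scoring 10% on the pre-test, as a remedial programme might.
cutoff = sorted(pre)[N // 10]
selected = [i for i in range(N) if pre[i] < cutoff]

# Randomly allocate the selected pupils to intervention or control.
rng.shuffle(selected)
half = len(selected) // 2
treated, control = selected[:half], selected[half:]


def post_score(i, effect):
    """Noisy post-test: true score, plus any real effect, plus new error."""
    return true_scores[i] + effect + rng.gauss(0, NOISE_SD)


post_treated = [post_score(i, TRUE_EFFECT) for i in treated]
post_control = [post_score(i, 0.0) for i in control]

# Before and after comparison within the intervention group only.
before_after_gain = statistics.mean(post_treated) - statistics.mean(
    pre[i] for i in treated
)
# Randomised comparison: intervention group versus control group at post-test.
trial_estimate = statistics.mean(post_treated) - statistics.mean(post_control)

print(f"Before and after 'gain': {before_after_gain:.1f}")
print(f"Randomised trial estimate: {trial_estimate:.1f}")
```

The before and after comparison reports a sizeable 'gain' for an intervention that does nothing, because both groups regress towards the mean; the comparison against the randomised control group stays close to the true effect of zero.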
History of Controlled Trials • Because of temporal and regression to mean effects we MUST have a control group.
Background • Many researchers over the centuries have seen the need for a ‘control’ group to avoid the inherent biases in the before and after study. • Controlled trials have been conducted for several hundred years, probably occasionally using randomisation.
Scurvy • Scurvy was a very prevalent condition among sailors before the 19th Century. • A controlled trial in the middle of the 18th Century of 12 sailors showed that the two sailors allocated to receive oranges and lemons recovered and were able to care for their ship mates allocated to vinegar or salt water.
Lack of Dissemination • An even earlier trial in scurvy prevention used a ‘cluster’ design whereby a whole ship’s crew were allocated citrus fruit and were compared with two ships’ crews who were not. • The treatment worked but the lesson was forgotten. • After the second trial it took the Navy 50 years to implement the results.
Agriculture • Fisher is usually thought of as the originator of randomisation in the 1920s in agricultural experiments. • He was concerned with the statistical properties of ‘randomness’ as well as the formation of unbiased groups.
Cambridge-Somerville • In 1937 a classic experiment, the Cambridge-Somerville trial, was launched. • The aim was to show that social worker intervention among ‘delinquent’ boys would reduce ‘criminality’.
Design • 650 boys were identified by their teachers as having delinquent behaviour that put them at later risk of criminal activity. • 325 pairs were formed and one from each pair was allocated a social worker supported by psychiatrists.
Results: early follow-up • % of boys indulging in crime. Green bar indicates intervention group.
Results: later follow-up • In 1975 the ‘boys’ were followed up again as middle-aged men. • 58% of the intervention group had NOT had a criminal conviction • BUT 68% of the control group had NOT had a conviction. • Had a control group not been used, the intervention would have been judged a success.
Consequences of the Trial • The social work profession largely ABANDONED the RCT as a method of evaluation as it failed to give the RIGHT results.
RCTs and education • Lindquist, writing about experimental methods in 1940, noted that in advanced textbooks “all illustrations given are in the field of agricultural experimentation and are concerned with “plots” “blocks” “yields” “treatments” etc, rather than with “schools” “classes” “scores” “methods” “pupils” etc.” Lindquist, Statistical Analysis in Educational Research, 1940.
The Importance of Design in Educational Experiments (Lindquist) • In his 1940 book on statistics in educational research Lindquist quite clearly describes appropriate RCTs for educational research. • His book is also the first description of the appropriate techniques to be used in analysing pupils’ scores in classes (i.e., cluster analysis), which was an advance on Fisher’s Design of Experiments.
Cluster analysis • In health statistics Lindquist’s statistical methods were largely ignored until the late 1980s, when it became accepted practice to analyse clustered data using the methods he advocated; even now most cluster trials are badly analysed. • But 64 years on, what about his descriptions of how to rigorously evaluate educational interventions?
Educational Trials: UK • Not many trials in education have been undertaken in the UK. • Most educational trials are from the USA. • WHY? (my personal view) • Futility of the ‘paradigm war’; • Failure to understand their importance; • Trials often give the ‘wrong’ answer; • Lack of funding.
Opposition to Trials is widespread • In health care many doctors will refuse to believe the results of a trial and argue the trial was faulty or poorly conducted if the result was ‘wrong’. • Recent example: the WHI study of hormone replacement. Many doctors REFUSE to accept this study’s finding that hormone replacement INCREASES the risk of heart disease.
Opposition to Polio Trial • “I found but one person who rigidly adhered to the idea of a placebo control and he is a bio-statistician who, if he did not adhere to this view, would have had to admit his own purposelessness in life” (Jonas Salk).
1950s to 1970s • The use of trials expanded rapidly within and beyond medicine. • In the social sciences experiments included: • Negative income tax; • Adoption; • Busing; • Public vs private schools; • Prevention of spousal abuse.