Randomised Controlled Trials in the Social Sciences Cluster randomised trials Martin Bland Professor of Health Statistics University of York www-users.york.ac.uk/~mb55/

Cluster randomised trials • Also called group randomised trials. • Research subjects are not sampled independently, but in a group. • For example: • all the patients in a general practice are allocated to the same intervention, the general practice forming a cluster, • all pupils in a school class are allocated to the same intervention, the class forming a cluster.

Members of a cluster will be more like one another than they are like members of other clusters. We need to take this into account in the analysis and design.

Methods of analysis which ignore clustering: • two sample t method, • chi-squared test for a two way table, • difference between two proportions, • relative risk, • analysis of covariance, • logistic regression. • These may mislead, because they assume that all subjects are independent observations.

Methods which ignore clustering may mislead, because they assume that all subjects are independent observations. Observations within the same cluster are correlated. May lead to standard errors which are too small, confidence intervals which are too narrow, P values which are too small.

A little simulation Four cluster means, two in each group, from a Normal distribution with mean 10 and standard deviation 2. Generated 10 members of each cluster by adding a random number from a Normal distribution with mean zero and standard deviation 1. The null hypothesis, that there is no difference between the means in the two populations, is true. Two-sample t test comparing the means, ignoring the clustering.

1000 times: 600 significant differences, with P<0.05, and 502 highly significant, with P<0.01. If the t test ignoring the clustering were valid, we would expect 50 significant differences, 5%, and 10 highly significant ones. The analysis assumes that we have 20 independent observations in each group. This is not true. We have two independent clusters of observations, but the observations in those clusters are really the same thing repeated ten times.

A valid statistical analysis. • Possible analysis: • find the means for the four clusters • carry out a two-sample t test using these four means only. • 1000 simulation runs: • 53 (5.3%) significant at P<0.05 • 14 (1.4%) highly significant at P<0.01
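The simulation described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used for the original slides. The function names, the seed, and the hard-coded 5% points of the t distribution (2.024 for 38 degrees of freedom, 4.303 for 2 degrees of freedom) are assumptions made for the sketch. Exact counts will differ from run to run and from the figures quoted above.

```python
import random
import statistics

# Two-sided 5% points of the t distribution (assumed constants for this sketch).
T_CRIT_DF38 = 2.024  # 20 + 20 - 2 degrees of freedom, clustering ignored
T_CRIT_DF2 = 4.303   # 2 + 2 - 2 degrees of freedom, cluster means only


def pooled_t(a, b):
    """Two-sample t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = (pooled_var * (1 / na + 1 / nb)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulate_group(rng, n_clusters=2, cluster_size=10):
    """Cluster means ~ Normal(10, 2); members add Normal(0, 1) noise."""
    clusters = []
    for _ in range(n_clusters):
        centre = rng.gauss(10, 2)
        clusters.append([centre + rng.gauss(0, 1) for _ in range(cluster_size)])
    return clusters


def run(n_sims=1000, seed=1):
    """Return the proportion of runs significant at 5% for each analysis."""
    rng = random.Random(seed)
    naive_hits = cluster_hits = 0
    for _ in range(n_sims):
        g1, g2 = simulate_group(rng), simulate_group(rng)
        # Analysis 1: ignore clustering, treat all 20 observations as independent.
        flat1 = [x for c in g1 for x in c]
        flat2 = [x for c in g2 for x in c]
        if abs(pooled_t(flat1, flat2)) > T_CRIT_DF38:
            naive_hits += 1
        # Analysis 2: t test on the four cluster means only.
        means1 = [statistics.mean(c) for c in g1]
        means2 = [statistics.mean(c) for c in g2]
        if abs(pooled_t(means1, means2)) > T_CRIT_DF2:
            cluster_hits += 1
    return naive_hits / n_sims, cluster_hits / n_sims


naive_rate, cluster_rate = run()
print(f"ignoring clustering: {naive_rate:.1%} significant at P<0.05")
print(f"cluster means only:  {cluster_rate:.1%} significant at P<0.05")
```

The null hypothesis is true in every run, so a valid test should reject in about 5% of runs. The analysis of cluster means should be close to that figure, while the analysis ignoring clustering should reject far more often.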

Simulation is very extreme. Two groups of two clusters and a very large cluster effect. Have seen a proposed study with two groups of two clusters. Smaller cluster effect would only reduce the shrinking of the P values, it would not remove it. Simulation shows that spurious significant differences can occur if we ignore the clustering.

Example: GP Education Trial Trial of General Practitioner education to improve treatment of asthma. Educate GPs in small groups, or not, and evaluate this education by giving repeated questionnaires to their asthmatic patients. I was asked for my views on the sample size calculations.

Original: ignored the clustering and the GPs, and treated the design as a comparison of two groups of patients. Revised: produced a sample size calculation based primarily on the number of GPs, not patients.

The trial was funded and a research fellow, a GP, appointed. The cluster nature of the study was self-evident to me. It was not self-evident to the research fellow! Many researchers find the importance of clustering very hard to understand.

The study appeared including the following description of the analysis: • ‘For each general practitioner a score was calculated for each questionnaire item. Analysis of variance was then carried out for each questionnaire item to compare the three groups . . . ’

How big is the effect of clustering? The design effect is the factor by which we must multiply the sample size for a trial which is not clustered, to achieve the same power. Alternatively, the power of a cluster randomised trial is the power of an individually randomised trial whose size is the clustered sample size divided by the design effect. Design effect: Deff = 1 + (m − 1) × ICC, where m is the number of observations in a cluster and ICC is the intra-cluster correlation coefficient, the correlation between pairs of subjects chosen at random from the same cluster.

Deff = 1 + (m − 1) × ICC. ICC is usually quite small; 0.04 is a typical figure. If m = 1, cluster size one, no clustering, then Deff = 1. Otherwise, for any positive ICC, Deff will exceed 1.

If we estimate the required sample size ignoring clustering, we must multiply it by the design effect to get the sample size required for the clustered sample. Alternatively, if the sample size is estimated ignoring the clustering, the clustered sample has the same power as a simple random sample whose size is our sample size divided by the design effect.

If we analyse the data as if there were no clusters, the variances of the estimates must be multiplied by Deff, hence the standard error must be multiplied by the square root of Deff.
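The design effect calculations described above can be sketched as small Python helpers. The function names are hypothetical, chosen for this illustration, and the cluster size of 20 in the usage example is an assumed value. The ICC of 0.04 is the typical figure quoted earlier.

```python
import math


def design_effect(m, icc):
    """Deff = 1 + (m - 1) * ICC, for cluster size m."""
    return 1 + (m - 1) * icc


def clustered_sample_size(n_unclustered, m, icc):
    """Sample size needed under clustering (round up in practice)."""
    return n_unclustered * design_effect(m, icc)


def effective_sample_size(n_clustered, m, icc):
    """Size of the individually randomised sample with the same power."""
    return n_clustered / design_effect(m, icc)


def corrected_standard_error(naive_se, m, icc):
    """Inflate a standard error computed as if there were no clusters."""
    return naive_se * math.sqrt(design_effect(m, icc))


# Usage with an assumed cluster size of 20 and ICC = 0.04.
deff = design_effect(m=20, icc=0.04)
print(f"Deff = {deff:.2f}")
print(f"200 unclustered subjects -> {clustered_sample_size(200, 20, 0.04):.0f} clustered")
print(f"naive SE of 1.0 -> {corrected_standard_error(1.0, 20, 0.04):.3f}")
```

These helpers assume equal cluster sizes. With unequal clusters the mean cluster size is often used as an approximation, as in the grant application example in this talk.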

Deff = 1 + (m − 1) × ICC. Clustering may have a large effect if the ICC is large OR if the cluster size is large. E.g., if ICC = 0.001 and cluster size = 500, the design effect will be 1 + (500 − 1) × 0.001 = 1.499, about 1.5, so we need to increase the sample size by 50% to achieve the same power as an unclustered trial. We also need to estimate variances both within and between clusters. If the number of clusters is small, the between clusters variance will have few degrees of freedom and we will be using the t distribution in inference rather than the Normal. This too will cost in terms of power.

Example: a grant application An evaluation of a peer-led health education intervention. A comparison of two groups each of two clusters (counties) of about 750 people each.

Applicants were aware of the problem of cluster randomisation, but did not give any assessment of its likely impact on the power of the study, except to say that the intra-cluster correlation was "small", i.e. 0.005 based on a US study.

Deff = 1 + (m − 1) × ICC. For the proposed design, the mean number of subjects in a cluster was about 750, so Deff = 1 + (750 − 1) × 0.005 = 4.745, about 4.75. Thus the estimated sample size for any given comparison should be multiplied by 4.75.

The estimated sample size for any given comparison should be multiplied by 4.75. We have the same power as an individually randomised sample of 3000/4.75 ≈ 630.

Degrees of freedom In large sample approximation sample size calculations, power 80% and alpha 5% are embodied in the multiplier (0.85 + 1.96)² = 7.90.

For a small sample calculation using the t test, 1.96 must be replaced by the corresponding 5% point of the t distribution with the appropriate degrees of freedom. With four clusters in two groups there are 4 − 2 = 2 degrees of freedom, which gives t = 4.30. Hence the sample size multiplier is (0.85 + 4.30)² = 26.52, which is 3.36 times that for the large sample.

This will reduce the effective sample size even more, down to 630/3.36 = 188. Thus the 3000 men in two groups of two clusters will give the same power to detect the same difference as 188 men randomised individually. This proposal came back with many more clusters.
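The arithmetic for this grant application can be checked with a short script. This is a sketch using the figures quoted above (0.85, 1.96 and 4.30 as given in the slides). The variable names are chosen for illustration only. Because it works with unrounded intermediate values, the result differs slightly from the rounded figures 630 and 3.36, but it leads to the same conclusion.

```python
# Figures taken from the slides.
Z_POWER_80 = 0.85     # Normal deviate used for 80% power
Z_ALPHA_5 = 1.96      # two-sided 5% point of the Normal distribution
T_ALPHA_5_DF2 = 4.30  # two-sided 5% point of t with 2 degrees of freedom

cluster_size = 750
icc = 0.005
total_n = 3000  # four clusters of about 750

# Step 1: design effect and the equivalent individually randomised sample.
deff = 1 + (cluster_size - 1) * icc
n_after_deff = total_n / deff

# Step 2: penalty for having only 2 degrees of freedom between clusters.
large_sample_multiplier = (Z_POWER_80 + Z_ALPHA_5) ** 2
small_sample_multiplier = (Z_POWER_80 + T_ALPHA_5_DF2) ** 2
df_penalty = small_sample_multiplier / large_sample_multiplier

effective_n = n_after_deff / df_penalty

print(f"Deff                  = {deff:.3f}")
print(f"n after design effect = {n_after_deff:.0f}")
print(f"df penalty            = {df_penalty:.2f}")
print(f"effective sample size = {effective_n:.0f}")
```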

Cluster size small, large number of clusters, small ICC: Design effect close to one. Little effect if the clustering is ignored. E.g. randomised controlled trial of the effects of coordinating care for terminally ill cancer patients (Addington-Hall et al., 1992). 554 patients randomised by GP. About 200 GPs, so most clusters had only a few patients. Ignored the clustering.

Several approaches can be used to allow for clustering: • summary statistic for each cluster • adjust standard errors using the design effect • robust variance estimates • generalised estimating equation models (GEEs) • multilevel modelling • Bayesian hierarchical models • others • Any method which takes into account the clustering will be a vast improvement compared to methods which do not.

A refereeing case study Paper sent in 1997 by the BMJ. Study of the impact of a specialist outreach team on the quality of nursing and residential home care.

Intervention carried out at the residential home level. Eligible homes were put into matched pairs and one of each pair randomised to intervention. Thus the randomisation was clustered.

The randomisation was clustered. Intervention was applied to the care staff, not to the patients. The residents in the home were used to monitor the effect of the intervention on the staff.

Clustering was totally ignored in the analysis. Used the patient as the unit of analysis. Carried out a Mann-Whitney test of the scores between the two groups at baseline. This was not significant. Mann-Whitney test at follow-up, completely ignoring the baseline measurements. Wilcoxon matched pairs test for each group separately and found that one was significant and the other not.

Possible approaches Summary statistic for the home, e.g. the mean change in score. These could then be compared using a t method. As the homes were randomised within pairs, I suggested the paired t method. (This may not be right, as the matching variables may not be informative and the loss of degrees of freedom may be a problem.) The results should be given as a difference in mean change, with a confidence interval as recommended in the BMJ’s guidelines to authors, rather than as a P value. Alternative: fit a multi-level model, with homes as one level of variability, subjects another, and variation within subjects a third. A job for a professional statistician.
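The summary statistic approach with a paired t method can be sketched as follows. The data here are made-up numbers for illustration only, not results from the study. The function name and the small table of t values are assumptions for this sketch. Each pair holds the changes in score for residents in the intervention home and in its matched control home.

```python
import statistics

# Two-sided 5% points of the t distribution, by degrees of freedom.
T_CRIT_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def paired_home_analysis(pairs):
    """Paired t analysis of home-level mean changes.

    pairs: list of (intervention_changes, control_changes), one per
    matched pair of homes. Returns the mean difference in mean change,
    its 95% confidence interval, and the t statistic.
    """
    # One summary statistic per home, then one difference per pair.
    diffs = [statistics.mean(i) - statistics.mean(c) for i, c in pairs]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / n ** 0.5
    t_crit = T_CRIT_975[n - 1]  # degrees of freedom = pairs - 1
    ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)
    return mean_diff, ci, mean_diff / se


# Hypothetical data: five matched pairs of homes.
example_pairs = [
    ([3.0, 5.0, 4.0], [2.0, 2.0]),
    ([1.0, 3.0], [1.0, 1.0, 1.0]),
    ([6.0, 4.0], [2.0, 2.0, 2.0]),
    ([1.0, 1.0], [0.0, 2.0]),
    ([5.0, 7.0], [2.0, 2.0]),
]

mean_diff, (low, high), t_stat = paired_home_analysis(example_pairs)
print(f"difference in mean change = {mean_diff:.2f}, "
      f"95% CI {low:.2f} to {high:.2f}, t = {t_stat:.2f}")
```

The unit of analysis is the home, so the number of residents per home does not enter the degrees of freedom. With only five pairs there are 4 degrees of freedom, which illustrates the loss of power noted above.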

What happened next? The paper was rejected. Study reported in the Lancet! Extra author, a well-known medical statistician. ‘The unit of randomisation in the study was the residential home and not the resident. Thus, all data were analysed by use of general estimated equation models to adjust for clustering effects within homes. . . . Clinical data are presented as means with 95% CIs calculated with Huber variance estimates.’.

I looked for the acknowledgement to an unknown referee, in vain.

Reviews of published trials There have been several reviews of published cluster randomised trials in medical applications.
