
The EEF by numbers

Building Evidence in Education: Workshop for EEF evaluators. 2nd June: York; 6th June: London. www.educationendowmentfoundation.org.uk. The EEF by numbers: 34 topics in the Toolkit; 3,000 schools participating in projects; 600,000 pupils involved in EEF projects; 14 members of the EEF team.


Presentation Transcript


  1. Building Evidence in Education: Workshop for EEF evaluators. 2nd June: York; 6th June: London. www.educationendowmentfoundation.org.uk

  2. The EEF by numbers: 34 topics in the Toolkit; 3,000 schools participating in projects; 600,000 pupils involved in EEF projects; 14 members of the EEF team; 6,000 heads presented to since launch; £220m estimated spend over the lifetime of the EEF; 16 independent evaluation teams; 83 evaluations funded to date; 10 reports published.

  3. Session 1: Design. RCT design, power calculations and randomisation: Ben Styles (NFER). Maximising power using the NPD: John Jerrim (Institute of Education).

  4. RCT design: power calculations and randomisation. Ben Styles, Education Endowment Foundation, June 2014.

  5. RCT design • The ideal trial • Methods of randomisation • Power calculations • Syntax exercise!

  6. A statistician’s ideal trial • Randomly select eligible pupils from NPD • No consent! • Simple randomisation of pupils to intervention and control groups • No attrition • No data matching problems • No measurement error

  7. BEFORE YOU START! • 1. Trial registration: specification of primary and secondary outcomes, in addition to sub-group analyses • 2. Recruit participants and explain the method to stakeholders • 3. Select participants according to fixed eligibility criteria • 4. Obtain consent • 5. Baseline outcome measurement (or use existing administrative data) • 6. Randomise eligible participants into groups (the evaluator carries out the randomisation) • 7. Intervention runs in the experimental group; control receives ‘business as usual’ or an alternative activity • 8. Administer follow-up measurement (evaluator) • 9. Intention-to-treat analysis, followed by reporting as per CONSORT guidelines • 10. Control receives the intervention (under what circumstances?)

  8. Why we depart from the ideal • Schools manage pupils! • Nature of the intervention • Contamination – how serious is the risk?

  9. Restricted randomisation? • Use simple randomisation where you can • Timetable considerations in a pupil-randomised trial → stratify by school • Important predictor variable with small and important category → stratify by predictor • Fewer than 20 schools → minimise http://minimpy.sourceforge.net/ • Multiple recruitment tranches → blocked • Pairing → BAD IDEA!
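The “stratify by school” option above can be made concrete with a small Python sketch (a minimal illustration; the function and data names are hypothetical, not from the workshop materials):

```python
import random

def stratified_randomise(pupils_by_school, seed=2014):
    """Simple randomisation within strata (here: schools).

    `pupils_by_school` maps a school ID to a list of pupil IDs; within
    each school, pupils are shuffled and split evenly between the two
    arms, so the timetabling stratum is balanced by construction.
    """
    rng = random.Random(seed)  # fixed seed: the draw is reproducible
    allocation = {}
    for school, pupils in sorted(pupils_by_school.items()):
        shuffled = list(pupils)
        rng.shuffle(shuffled)
        cut = len(shuffled) // 2
        for pid in shuffled[:cut]:
            allocation[pid] = "intervention"
        for pid in shuffled[cut:]:
            allocation[pid] = "control"
    return allocation
```

Because the split happens school by school, every school contributes equally to both arms, which is exactly what the analysis-stage stratum adjustment (slide 11) then has to account for.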

  10. Restricted randomisation [diagram contrasting simple randomisation with restricted randomisation]. Restricted randomisation is more complicated and can go wrong. Take strata into account in the analysis: http://www.bmj.com/content/345/bmj.e5840

  11. To remember! If you have restricted your randomisation using a factor that is associated with the outcome (e.g. school) THEN INCLUDE THE FACTOR AS A COVARIATE IN YOUR ANALYSIS

  12. Chance imbalance at baseline • As distinct from bias induced by measurement attrition • Can be quite large in small trials e.g. on baseline measure • Include covariate in final analysis

  13. Sample size calculations • School or pupil-randomised? • Intra-cluster correlation • Correlation between covariate and outcome • Expected effect size • p(type I error)=0.05; power=0.8 • Attrition

  14. Rule of thumb (Lehr, 1992)
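The slide's formula did not survive transcription, but Lehr's (1992) rule of thumb is standard: for 80% power at a two-sided 5% significance level, a two-group comparison needs roughly 16/d² participants per arm, where d is the standardised effect size. As a one-line sketch:

```python
def lehr_n_per_arm(effect_size):
    """Lehr's (1992) rule of thumb: participants needed per arm for
    80% power at two-sided alpha = 0.05, comparing two means with
    standardised effect size d: n = 16 / d**2."""
    return 16 / effect_size ** 2
```

So detecting d = 0.2 needs about 400 pupils per arm, while d = 0.5 needs only 64.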

  15. Pupil randomised • ICC = 0 • Correlation between baseline and outcome: http://educationendowmentfoundation.org.uk/uploads/pdf/Pre-testing_paper.pdf and your previous work • Effect size: previous evidence; cost-effectiveness; EEF security ratings • Attrition: EEF allow recruitment to be 15% above sample size after attrition

  16. Cluster-randomised • Same as for pupils, aside from the ICC • The ICC is the proportion of total variance that is due to between-cluster variance • The EEF pre-testing paper has some useful guidance • A pre-test also reduces the ICC, e.g. from 0.2 to 0.15 for a KS2 baseline with a GCSE outcome

  17. MDES • Minimum detectable effect size • EEF require this on the basis of real parameters for the security rating • (avoid retrospective power calculation) • How good were my estimates?

  18. Sample size spreadsheet

  19. Running the randomisation SYNTAX EXERCISE • In pairs, explain what each of the steps does • How many schools were randomised in this block?
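The workshop's syntax sheet is not reproduced in this transcript. As a stand-in, a Python version of the kind of steps such an exercise walks through for one recruitment block might look like this (school IDs and seed are illustrative only):

```python
import random

def randomise_block(school_ids, seed):
    """Randomise one recruitment tranche (block) of schools.

    Steps: fix the starting order, seed the generator so the draw can
    be audited, shuffle, then split the block evenly between arms.
    """
    ordered = sorted(school_ids)   # deterministic order before shuffling
    rng = random.Random(seed)      # fixed seed: reproducible draw
    rng.shuffle(ordered)
    cut = len(ordered) // 2
    return {"intervention": ordered[:cut], "control": ordered[cut:]}
```

Running this on a block of 14 recruited schools assigns 7 to each arm, and re-running with the same seed reproduces the allocation, which is what lets the evaluator document the randomisation.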

  20. Conclusions • Always think of any RCT (indeed, any quantitative impact evaluation) as a departure from the ideal trial • The design, power calculations, method of randomisation and analysis all interrelate and need to be consistent

  21. Maximising power using the NPD. John Jerrim (Institute of Education)

  22. Structure • How much power do EEF trials currently have? PISA, power, star ratings and current EEF trials • Exercise: work in groups to design an EEF trial; goal = maximise power at minimal cost • My answers: how might I try to maximise power? • Your answers! / Discussion

  23. Power in context: effect sizes, PISA rankings and EEF padlock ratings

  24. How powerful are EEF trials thus far? EEF secondary school trials as of 01/05/2014: detectable effect size mean = 0.276, median = 0.25. Between 4* and 5* by EEF guidelines…

  25. Power and the PISA reading rankings [chart placing effect sizes against the UK’s current position in the PISA reading rankings: effect size 0.50 = EEF 2*; 0.40 = EEF 3*; 0.30 = EEF 4*; 0.20 = EEF 5*; 0.10 also shown; median EEF trial = 0.25]. IMPLICATION: effect sizes of 0.20 are damn big, particularly given the pretty small doses we are giving.

  26. Do we currently have a power problem? Quite possibly! So trying to get more power into future trials is very important…

  27. Exercise

  28. Exercise. Task: in groups, discuss how you would design the following trial. Intervention = teaching children how to play chess. Maximum number of treatment schools = 20 secondary schools. Year group = Year 7. Level of randomisation = school level. Test = one-to-one non-verbal IQ assessment with a trained educationalist (end of Year 7). Control condition = ‘business as usual’. Study type = ‘efficacy’ study (proof of concept). Objective: maximise power at minimum cost. How would you design this trial to meet these twin objectives? What could you do to increase power in this trial? E.g. would you use a baseline test? If so, what?

  29. My answers: the usual suspects… and the less obvious options

  30. The usual suspects… 1. Use a regression model and include baseline covariates: adding controls explains variance and boosts power. 2. Use Key Stage 2 test scores as the ‘pre-test’: the point of baseline covariates is to explain variance; KS2 maths scores are likely to be reasonably correlated with the outcome (non-verbal IQ); and they are CHEAP, available from the NPD. 3. Stratify the sample prior to randomisation: potentially reduces error variance and thus boosts power, with the additional advantage of balancing baseline characteristics. 4. Really engage with control schools: make sure we minimise loss of sample through attrition.
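The variance-explaining point in options 1 and 2 can be quantified with a standard approximation (not from the slides): a baseline covariate with outcome correlation r leaves residual variance of (1 − r²), so the required sample size shrinks by roughly the same factor.

```python
def n_after_covariate(n_unadjusted, corr):
    """Approximate per-arm sample size once a baseline covariate with
    outcome correlation `corr` enters the analysis model: unexplained
    variance, and hence the n required, shrinks by (1 - corr**2)."""
    return n_unadjusted * (1 - corr ** 2)
```

For example, a pre-test correlating 0.75 with the outcome cuts a 400-pupil requirement to 175, which is why a free covariate from the NPD is such a bargain.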

  31. Less ‘obvious’ options….

  32. Don’t test every child… There are around 200 children per secondary school, and one-to-one testing is expensive, yet testing more than 50 pupils buys you little additional power. RANDOMLY SAMPLE PUPILS WITHIN SCHOOLS! [Power curve assumptions: 20 schools; pre/post correlation of 0.75; 80% power; rho = 0.15.]
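The “little additional power beyond 50 pupils” claim follows from the standard cluster-trial MDES formula. A sketch under simplified assumptions (the 2.8 multiplier approximates 80% power at two-sided alpha = 0.05, and the slide's baseline covariate is ignored for clarity):

```python
import math

def cluster_mdes(schools_per_arm, pupils_per_school, icc, multiplier=2.8):
    """Minimum detectable effect size for a two-arm school-randomised
    trial with no covariates. As pupils_per_school grows, only the
    (1 - icc)/m term shrinks, so the icc term puts a floor on MDES."""
    variance = 2 * (icc + (1 - icc) / pupils_per_school) / schools_per_arm
    return multiplier * math.sqrt(variance)
```

With 10 schools per arm and rho = 0.15, testing 200 pupils per school instead of 50 improves the MDES by only about 0.02, which is why sampling pupils within schools costs so little power.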

  33. …use an unequal sampling fraction • We all know that more clusters (k) means more power • This example is limited to only a small number of treatment schools (20) • …but the control condition was non-intrusive and cheap • So don’t just recruit 20 control schools as well: recruit more! • Nothing about RCTs means we need equal k for treatment and control • The power calculation becomes more complex (anybody know it?!)
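On the “anybody know it?!” question: one standard way to see the gain is that the standard error of the treatment-control difference scales with sqrt(1/k_t + 1/k_c), so extra control schools still help even when treatment schools are capped. A sketch (assuming equal cluster-level variance in both arms, which cancels in the ratio):

```python
import math

def se_ratio_unequal(k_treat, k_control):
    """Standard error of the group difference with k_treat and
    k_control clusters, relative to an equal design with k_treat
    clusters per arm. Values below 1 mean the unequal design is
    more precise."""
    return math.sqrt(1 / k_treat + 1 / k_control) / math.sqrt(2 / k_treat)
```

Capping treatment at 20 schools but recruiting 40 controls shrinks the standard error by about 13% relative to a 20-versus-20 design, with diminishing returns as the control arm grows further.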

  34. Use a more homogeneous selection of schools… [PISA 2009 data: score distributions for all UK schools versus the ‘worst’ 25% of schools only.]

  35. Why does rho decline? The within-school variation barely changes, while the between-school variation declines substantially.

  36. Implications • As the example is an efficacy study, why not restrict attention to low-performing schools only? Boosts power! Fits with the EEF mandate (closing the performance gap). Not worried about generalisability • We implicitly do this anyway (e.g. by doing trials in just one or two LAs)… • …but can we do it in a smarter way? • There is a little-appreciated trade-off between POWER and GENERALISABILITY: long-term implications for the EEF; a trial representative of the England population is very hard to achieve

  37. Conclusions. Do we have a “power problem”? • Quite possibly • The median detectable effect size in EEF secondary school trials is 0.25 • If we were to boost UK PISA reading scores by this amount, we would move above Canada, Taiwan and Finland in the rankings… Ways to potentially increase power • Include baseline covariates (from the NPD where possible) • Stratify the sample prior to randomisation • Engage with control schools! • Do you need to test every child? Practical alternatives? • Could you increase the number of control schools without adding much to cost (an unequal randomisation fraction)? • Could you restrict your focus to a narrower population (e.g. low-performing schools only)?
