A Convergence of Interests?

We are oft to blame in this –

Tis too much proved – that with devotion’s visage

And pious action we do sugar o’er

The devil himself. – Hamlet

There is an unfortunate convergence of interests becoming apparent in the recent research gold rush which raises questions about the quality of the educational debate – or, more accurately, the quality of research input into the debate about where education should be going.

Consider, for example, the Education Endowment Foundation. The government has invested considerable funds via the EEF to run randomized controlled trials. One of the RCT evaluations recently released by the EEF was for a programme called Switch-On Reading. What is not apparent in the headline, but appears later in the report, is that the programme is in fact a repackaging of Reading Recovery, which is now being aimed at students at the transition between Key Stages 2 and 3.

Its provenance does not automatically make Switch-On Reading bad. The key question the study needs to answer is whether the intervention is effective enough to justify both financial cost and opportunity cost. Financially, the authors of the report calculate that it costs £672 per pupil to deliver. That’s expensive by some standards, depending on what we compare it with, but is cheap compared to the cost of a student leaving school unable to read fluently. In terms of opportunity cost, the programme consists of 20-minute tuition sessions daily over ten weeks, a total of 1000 minutes or nearly 17 hours. The 20 minutes a day can probably come out of form/tutor/registration time in the mornings. Ten weeks of that isn’t going to destroy anyone’s educational life chances. If the programme is effective, the costs are bearable.
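For anyone who wants to check the time cost, here is a minimal sketch of the arithmetic. The session length and duration come from the report as described above; the assumption that ‘daily’ means five school days a week is mine, and the variable names are purely illustrative.

```python
# Figures from the post: 20-minute sessions, daily, over ten weeks.
# Assumption (not stated in the post): "daily" means five school days a week.
MINUTES_PER_SESSION = 20
SESSIONS_PER_WEEK = 5
WEEKS = 10

sessions = SESSIONS_PER_WEEK * WEEKS
total_minutes = sessions * MINUTES_PER_SESSION
total_hours = total_minutes / 60

print(sessions, total_minutes, round(total_hours, 1))
```

Under that assumption the total comes to 50 sessions and 1000 minutes, a little under 17 hours of tuition per pupil.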

So how effective is Switch-On Reading? The evaluators decided that it has an effect size of 0.24. What does this mean? The EEF tells us that an effect size of 0.6 can be compared to about a year’s progress. So this programme delivers about four months’ progress, in daily sessions over two and a half months. John Hattie, in his mega-meta-analysis, has suggested that the average effect size of an educational intervention is about 0.4, and argues that we should be focusing on interventions that have an impact above this threshold.
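The conversion from effect size to months of progress can be sketched in a few lines. This assumes that progress scales linearly with effect size, anchored on the rule of thumb quoted above (0.6 is roughly one year); the EEF’s own conversion may not be strictly linear, and the function name is hypothetical.

```python
# Assumption: months of progress scale linearly with effect size,
# anchored on the rule of thumb quoted in the post (0.6 ~ 12 months).
MONTHS_PER_YEAR = 12
EFFECT_SIZE_FOR_ONE_YEAR = 0.6


def months_of_progress(effect_size: float) -> float:
    """Rough linear conversion of an effect size to months of progress."""
    return effect_size / EFFECT_SIZE_FOR_ONE_YEAR * MONTHS_PER_YEAR


switch_on = months_of_progress(0.24)    # the Switch-On Reading evaluation
hattie_hinge = months_of_progress(0.4)  # Hattie's suggested threshold

print(round(switch_on, 1), round(hattie_hinge, 1))
```

On this rough scale, 0.24 works out at a little under five months, in the same region as the ‘about four months’ above, while 0.4 would correspond to about eight. The second figure is for comparison only: Hattie’s effect sizes are not necessarily calculated in the same way as the EEF’s, so putting them on one scale is itself an assumption.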

You might think from this that the intervention is less than mediocre and probably not worth implementing. But for some reason this is not the conclusion the evaluators reached. The EEF tells us that it had a ‘noticeable positive impact’ and ‘can be an effective intervention for weak and disadvantaged readers’. The authors justify this by pointing to the evidence that for some sub-groups, noticeably weaker readers, gains were above the average for the group. Unfortunately, the authors also admit that the sub-group scores are statistically unreliable.

It’s difficult to know what to make of this. Are the evaluators following a different set of criteria to the rest of us? Is there something in the fine print that means an effect size of 0.24 is actually quite pronounced? Or is effect size, as some argue, an educational irrelevance which they are reporting but don’t really care about? Are the sub-group scores to be relied on anyway, even though they are unreliable?

In the absence of any other evidence, one must presume the answer to each of these questions is ‘no’. At which point, one turns to the context to interpret the anomalies. The obvious observation is that Reading Recovery is looking to extend its reach into the secondary school market. Renaming the programme ‘Switch-On Reading’ and emphasizing its phonics component (when Reading Recovery is the mother of all Whole Language programmes) suggests that marketing is an important consideration.

This is not mere cynicism. Robert Slavin, the director of the Institute for Evidence in Education, writes a blog for the Huffington Post. On 17 July 2013, Slavin interviewed a fellow member of the Investing In Innovation network, i3. The guest was Jerry D’Agostino, head of Reading Recovery’s i3-funded project. D’Agostino was invited to contribute his thoughts on sustaining the scale and longevity of the intervention. His comments make interesting reading. You can read the whole interview here, but these points are particularly pertinent:

It is important to ‘keep the brand fresh’.

It is important to ‘recognise that adaptations are needed’.

Despite these adaptations, ‘the framework is always the same’.

So it is reasonably clear that adjusting the programme to new trends in education, and adapting the ‘brand’, are standard strategies for Reading Recovery. Although Slavin says that 30-minute one-to-one lessons are non-negotiable, Switch-On Reading consists of 20-minute lessons. (Perhaps the reason for this ‘adaptation’ is that a 20-minute slot is easier for secondary schools to accommodate.)

But the most persistent feature of the way Reading Recovery is presented by its advocates is the gap between the claims made about impact and the evidence in the reports. In the same interview, D’Agostino refers to the intervention’s ‘strong effect sizes’ and elaborates: ‘…we know that the outcomes are remarkable – most of the lowest-achieving first graders accelerate with Reading Recovery and reach the average of their cohort.’

Leaving aside how one defines ‘their cohort’, these broad claims do not match the evidence well at all. When D’Agostino comments: ‘Schools don’t necessarily hear about government funded initiatives that achieve high evidence standards according to the What Works Clearinghouse’, the reader may understandably infer that Reading Recovery has achieved such high evidence standards. In fact, the website of What Works Clearinghouse, the US government-funded body intended to help make reliable research available to educators, describes the extent of evidence for Reading Recovery as ‘small’ for ‘potentially positive effects’. Of the 202 studies reviewed by WWC, just three met their evidence standards. Three. Once again, the claims and the details do not stack up.

We can look further back. In the TES, on 28 August 2009, an article claimed that ‘Britain’s Reading Recovery initiative is one of the world’s best programmes for struggling readers, an international study has found’. That study, by the Institute for Evidence in Education, supposedly found that ‘Reading Recovery, which involves specially trained teachers working one-to-one with pupils using a mix of methods, is one of three for which there is strong evidence of effectiveness’. (The note on ‘mix of methods’ is a classic warning signal. It means: we have added a phonics component, because we know we have to be credible, but we’re not changing our foundations.) The director of the institute, Professor Robert Slavin, said the two (that’s two) UK studies in the report were very positive. The TES article described the findings as ‘a boost for the literacy scheme, which is at the heart of the government’s Every Child A Reader programme yet has been criticised by some academics who question its value for money’.*

However, in the body of the report, entitled Educator’s Guide: What Works for Struggling Readers, Reading Recovery is deemed to have an effect size of just 0.23 across eight studies. The authors write: ‘although the outcomes for Reading Recovery were positive, they were less so than might have been expected’. Once again, there is a mismatch between the media presentation and the details of the evidence.

Last year, Kevin Wheldall (Emeritus Professor at Macquarie University) wrote this blogpost about a report on Reading Recovery’s long-term effectiveness. The author of the report, Jane Hurry, is based at the Institute of Education in London, which Wheldall describes as ‘the British home of Reading Recovery’. The report is published by the Institute for Evidence in Education. Hurry makes broad claims of a similar nature to those above: ‘the programme is known to have impressive effects in the short term’ and ‘substantial gains’ are continued to the end of primary school. Neither of these claims is substantiated by the report.

Wheldall highlights how the headlines in the study did not match the detail and were open to a quite different interpretation, concluding: ‘even the significant differences between the two groups in the Reading Recovery Schools and the non-Reading Recovery school are accompanied by only small effect sizes, all of which are below Hattie’s hinge value of 0.4’. Once again, the substance is different from the marketing.

Is there a pattern here? It seems obvious to me. Is it a conspiracy of some kind? A conspiracy is entirely unnecessary to bring about a favourable slant in a report on a weak set of data. But I do think there may be a convergence of interests here, where research findings of ‘the right sort’ can be presented in one way in the headlines, while the fine print says something rather different. It is all about who writes the reports and who they are aligned with. For example – and this is not an aspersion on character, but a question about sympathies between organisations – a researcher from the Institute for Evidence in Education was seconded to the Education Endowment Foundation last year – a researcher who described himself at the National Literacy Trust conference in January as ‘Reading Recovery trained’. Is there a connection between this and the mismatch of headline and detail in the EEF report? I can only ask the question, but the patterns I have seen suggest that strong links between research groups can lead to some strange outcomes.

The bottom line remains the same: until teachers have sufficient knowledge, skills and concern to evaluate research rigorously, educational researchers will continue to present findings in ways which may well serve converging interests, but perhaps not the unvarnished truth.


The EEF invested £670,000 in a larger trial of Switch-On Reading’s effectiveness. The report on this larger trial was published in May 2017. The evaluators concluded:

“Participating children in schools delivering either version of Switch-on made no additional progress in reading compared to similarly struggling children in ‘business as usual’ control schools.”

You can read the full report here.


*[Yes, that’s correct – Every Child A Reader, a scheme costing many millions of pounds, had as its core intervention Reading Recovery. ECAR was indeed criticized for cost in February 2009, just six months before the IEE report found Reading Recovery’s results ‘positive’. Policy Exchange expressed concern that the government had committed £144 million to a national roll-out of the scheme before any independent evaluations had been completed of the initial trials. Both Policy Exchange and the independent report eventually published via the DfE in 2011 found that the Reading Recovery element of ECAR did not provide sufficient impact to justify its high ongoing costs. Interestingly, the independent evaluation, most of whose authors were from the National Centre for Social Research, does not contain one effect size measurement. Positive effects on reading are described either in percentage points (a meaningless measure without information on spread) or in assessments of ‘good’ or ‘very good’ by teachers.]



23 Responses to A Convergence of Interests?

  1. Thank you very much for this blog posting.

    I am going to link to it via various threads (below) which all focus on the same or similar issues that you have raised here about a general less-than-transparent state of affairs regarding teaching effectiveness, research findings and the persistence/survival of Reading Recovery in its various guises despite the conclusions of research which warn us about the inherent dangers of multi-cueing reading strategies for at least some learners.

    You will note in one of the links below that Reading Recovery is now being promoted under yet another title of the ‘International Literacy Centre’ based at the Institute of Education and its intervention reach has extended to learners beyond the six year olds:

    UK government unaccountable when Reading Recovery rolled out:


    Literacy Centre launch at the Institute of Education – RR links?


    Will the multi-cueing strategies ever go away?


    New Zealand – Reading Recovery – literacy rates flatline:


    NFER evaluation of the year one phonics check May 2014:


    With all of this accruing information, will it make a jot of difference to the spread and persistence of RR we have to wonder – and will we ever achieve transparent comparisons?

    Kind regards,


  2. Tami Reis-Frankfort says:

    Reblogged this on Phonic Books and commented:
    Thanks for this clear thinking post!

  3. Well, it is outcomes like this that will kill the idea of using RCTs in educational research. In short, the interpretation of the research presented in these reports must be very sound and clearly flow from the research results – or a rigorous counter argument presented. Otherwise most educators will not be able to take it seriously, and the public money is wasted. And, the opportunity to include RCT research in education will also be lost. There is a general concern that education is too complicated to be approached using RCTs. This organisation must not undermine its credibility (and that of RCTs) by being anything less than very rigorous. Good blog.

  4. I agree, RCTs will lose credibility if there are many more studies like this one. But so too will the teaching profession if we are repeatedly duped by inflated claims. Thanks for posting.

  5. Andrew Sabisky says:

    It’s worth pointing out that the Jane Hurry report isn’t on an RCT. There’s no randomization involved. In fact, this is probably the biggest problem with the report & study, because JH attempts to use statistical control to substitute for a lack of experimental control (inappropriately, in my view).

  6. bt0558 says:

    “The authors justify this by pointing to the evidence that for some sub-groups, noticeably weaker readers, gains were above the average for the group. Unfortunately, the authors also admit that the sub-group scores are statistically unreliable.”

    Sorry to appear dense, these things are not familiar to me. Isn’t this simply the nature of the beast? Isn’t it the case that because all kids are different, but that some sub groups may contain kids who are a bit more similar, results overall will tend to be less significant but more reliable? By the time you get to the individual, the effects are less reliable but more significant.

    By this I mean that for individuals it may well be very effective or very ineffective but identifying for which kids each is the case by picking them at random is problematic.

    Will it not always be thus?

    If so why worry about this particular study?

    If not, where can I find a good explanation of the implications of the RCT method, including strengths and weaknesses?

    • The ‘sub-groups’ referred to are categories of interest in UK education, for example, weaker readers, students who are eligible for extra funding (pupil premium) etc. The members of each sub group therefore all share at least one important characteristic, and the study seeks to identify how effective the intervention is for those sub groups. The problem is that the report makes claims in the headlines and then in the details admits that these claims cannot be substantiated statistically – in which case, why were they made? After all, the whole point of an RCT is to develop enough statistical power to confirm or disconfirm a hypothesis. By all means accept that further replications may be needed, but don’t make claims in the headlines that cannot be supported by the data. The fact that this pattern is a regular feature of coverage of Reading Recovery suggests at best an entrenched bias, and at worst a cynical view of education consumers as easy to dupe.
      You might want to start reading about randomised controlled trials with Ben Goldacre’s paper here and also Andrew Old’s response here, as a way in to the topic.

  7. Pingback: More matter, with less art | Horatio Speaks

  8. Pingback: Teaching is technology | Horatio Speaks

  9. Pingback: No stone unturned | Horatio Speaks

  10. Pingback: Are all reading interventions created equal? | thinkingreadingwritings

  11. Stephen Gorard says:

    I am not clear at all what problem this blog has with the Switch On study. Design good. Attrition almost non-existent. Fidelity to treatment good. Individually randomised etc.

    Is there evidence it was effective? Yes. Study now being repeated independently on an even larger scale. Good.

    There seems to be some other agenda involved.

  12. Stephen Gorard says:

    Here is the end of the paper. What exactly is the problem?


    The findings are based on a randomised controlled trial, with individual random allocation to groups and a waiting list for pupils who were initially not selected to receive the intervention. There was low dropout and no sign of post-allocation demoralisation, indicating that the findings are not biased. This was an efficacy trial, set up rapidly in response to a political timetable, to test the impact of Switch-on as delivered with the developer leading the training and overseeing the provision of the intervention. Efficacy trials test evaluations in the best possible conditions to see if they hold promise, but do not demonstrate that the findings hold at scale in all types of schools. The findings do not necessarily indicate the extent to which the intervention will be effective in all schools since the participating schools were selected purposively within one local authority, and training was provided by the programme developers. The intervention was generally well-conducted and most pupils seemed very happy with their reading sessions. Staff needed training and then some monitoring to ensure that they adhere to the protocol in order that the intervention has the largest possible effect. There were indications that the intervention was mis-applied in some settings, even with close oversight and an accompanying evaluation. Therefore, problems could arise in trying to roll out this intervention to other areas and schools. However, this also suggests that the estimated effect size is realistic and not inflated by the artificial situation of an evaluation.

    The overall finding, confirmed in several ways, is that the intervention as conducted was effective with these pupils, with an effect size of +0.24. This is equivalent, in very approximate terms, to around three months extra improvement in reading-age over three months, at an estimated cost of £627 per pupil (for a school to set it up, including staff costs and books). The intervention was as effective with boys as girls, and was especially effective for pupils with recognised special educational needs (although it must be noted that the quality of this indicator varied between schools), and lower attainers. The intervention was effective for FSM-eligible pupils, based on raw-score outcomes.

    The intervention was largely conducted by teaching assistants (TAs). The future funding of TAs in England is unclear, and the evidence so far had been that just having TAs or using them as substitute teachers is rather costly and largely ineffective (Blatchford et al. 2012). Switch-on is an example of one way in which TAs might be deployed in schools to follow a set protocol and make a useful difference to the reading of pupils in transition from primary to secondary.

    The data provide no evidence on what the active elements of the interventions are, and no evidence on any unintended consequences or ‘side-effects’. For example, does it depend on these precise books, on the reading record, on the length, number or frequency of the sessions? Does it depend on the rigid use of four books on each occasion? Or would almost any process of one to one reading with a trusted member of staff be equally effective? Assuming that the overall effectiveness of Switch-on is accepted as promising, a multi-group trial could be designed to address such questions.

    Also, attending around 40 sessions during normal lesson times means that pupils have 40 lessons per term disrupted. The evaluation reported here only picked up the benefits of attending the sessions for reading. But there may also be harm done to progress in other areas of the curriculum, even though this may be ‘scattered’ among many curricular areas. Can this potential damage be measured? Is it possible for all children in a class to have a 20-minute session of a programme tailored to their needs (i.e. not individual attention for all), all at the same time? For some, this could be Switch-on.

    Such questions mean that there is more work to be done with Switch-on to make it more effective, as efficient and low-cost as possible, and presenting the least disruption to the life of a school. Meantime the results can be added to a growing synthesis of evidence of what works, such as that represented by the Pupil Premium Toolkit (EEF 2014). Although relatively small compared to future plans, this trial shows again that RCTs are feasible and useful, and that the EEF approach of filling in the existing gaps in Phases 6 and 7 of the research cycle (Figure 1) is possible. The evaluation itself was inexpensive (around £30k), since the main cost was that of the intervention. The intervention was to happen anyway, as so many interventions do every year, and the phasing-in was needed to ensure individual attention. Therefore, the RCT simply ‘piggy-backed’ on the kind of activity that happens regularly in schools anyway. It generated no specific ethical or practical difficulties of the kind that threatened researchers claim are intrinsic to rigorous evaluations. This work therefore forms part of the belated response to McIntyre and McIntyre (2000) and others.

    • The ‘problem’ is the claim that the approach is ‘effective’, which appears to mean, ‘has an effect’. The report conclusion claims that the effect is broadly equivalent to three months’ gain over three months (I’ve estimated about four months, so the effect is even less than I thought). So the intervention ends. Then what? For a student reading three or more years behind, they could attend an intervention like this every day, and still be behind when they leave secondary school four or five years later. The real problem is that when researchers make claims that something is ‘effective’, they seem to be talking about their own standards. In the real world, where students face frustration, humiliation and failure every day, ‘effective’ means they get to read as well as everybody else. For that, they need an ‘effect’ considerably greater than that reported here. I haven’t recorded any criticisms of the processes followed in the study. What I have criticised is the way the findings are reported. When I asked in a tweet, ‘is it enough?’ your reply was ‘for what?’ That tells me everything about where you are coming from. To answer your question: is it enough for the students who attend this intervention every school day for three months and then come out reading just a quarter of a year better? My view: no, that’s not effective enough, and that’s my ‘problem’ with the report.

      As to effect size: I am quite happy to accept that Hattie’s calculation is different from the EEF’s, and that I didn’t elaborate on that point in the post. (However, Ollie Orange has helpfully done so in the comments). Given that Hattie says that an effect size of 0.6 is large, and the EEF says that an effect size of 0.6 is equivalent to roughly one year’s progress, and that my criticisms are based on the authors’ own conclusion that students make an additional three months over the three months of the programme, it seems a little odd to bang the drum so loudly on this point. As long as the EEF and other researchers claim that such a rate of progress is effective for struggling readers at secondary school, they are doing those students a grave disservice.

      Lastly, claiming that there is an unspecified ‘agenda’ is no way to advance an argument.

      • heatherfblog says:

        It was the same with P4C. A low effect size but endorsed by the EEF. When questioned it was as if there was no opportunity cost to a school investing in a programme that may have no impact. The attitude taken was that there was nothing to lose by trying anything showing any positive effect. It worried me that really the report was making a judgement beyond its remit by deciding there was nothing to lose endorsing low effect size interventions. Do such people have no understanding of schools?

  13. Pingback: Vested Interests? | Horatio Speaks

  14. Pingback: The effect of Reading Recovery | Filling the pail

  15. Pingback: Removing Barriers to Success – The Bridge Over the Reading Gap (Part 6) | thinkingreadingwritings
