Nor do we find him forward to be sounded,
But, with a crafty madness, keeps aloof,
When we would bring him on to some confession
Of his true state. – Hamlet
Just because Reading Recovery has a track record of inflating claims, and a poor record of solving reading problems in the long term, does that mean that everything about Reading Recovery is bad? It makes good sense to identify and give extra help to children who are falling behind, and who could argue with sessions which focus on “phonemic awareness, phonics, vocabulary, fluency, comprehension and composition”?
The problem is that the way they present their programme is not always consistent with its substance. The description of lesson content above is a good example. For years Reading Recovery resisted calls, dating back to 1992, to add a significant phonics component. Once US and UK funding became dependent on ‘evidence-based approaches’ such as phonics, Reading Recovery began to describe its programmes as including phonics – but nowhere will you find a statement that this phonics is explicit, systematic, synthetic or linguistic. Likewise, the term ‘fluency’ can be interpreted in a number of ways. Exactly which skills they build to fluency, their criteria for fluency, and how they build it are not revealed to the uninitiated. Possibly it just means that students read a little faster than they did before.
So I was interested in Greg Ashman’s post Is Reading Recovery Like Stone Soup?, in which he queries a study on the effectiveness of that programme in the United States. Ashman’s point is primarily that the RCT doesn’t tell us whether it is Reading Recovery techniques that cause the effect, or whether the improvement is due to other components of the delivery mechanism such as one-to-one instruction in general. He argues that a more scientific ‘fair test’ would control all the variables and manipulate them to identify which had the most impact. I began to write a comment, but it became too long – hence this post.
The first question I had was – just what is the effect? This is because there is a long history of Reading Recovery headlines reporting much more favourable impacts than the details of the reports actually show. (See for example A Convergence of Interests on this blog and Small Bangs for Big Bucks by Professor Kevin Wheldall).
A link in the comments took me to the second year of the study. Impacts in that year were described in the following way:
1. The experimental group, who received Reading Recovery tuition on top of regular classroom instruction, scored 14 points higher on the Iowa Test of Basic Skills than the control group, who received only regular classroom instruction. This equated to an effect size of 0.42 against the control group and an effect of 0.33 against first-graders nationally. The authors said that these were large gains for this type of study.
2. The results were ‘benchmarked against expected gains’ on the Iowa test battery for first graders, and it was found that the experimental group exceeded ‘expected gains’ by 3.03 points, which equates to about 1.4 months additional gain over the instructional period of five months.
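These headline figures can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below uses only the numbers quoted above; the implied standard deviation and the implied expected monthly gain are my own inferences, not figures the report states.

```python
# Back-of-the-envelope check on the study's headline numbers.
# The raw difference and effect size are quoted in the report;
# the implied SD and monthly gain are inferences, not reported figures.

raw_difference = 14.0   # ITBS points, experimental minus control
effect_size = 0.42      # reported effect size against the control group

# An effect size is the raw difference divided by the (pooled) SD,
# so the SD implied by these two figures is:
implied_sd = raw_difference / effect_size
print(f"Implied pooled SD: {implied_sd:.1f} ITBS points")

excess_points = 3.03    # points above 'expected gains'
extra_months = 1.4      # the study's translation of that excess
implied_monthly_gain = excess_points / extra_months
print(f"Implied expected gain: {implied_monthly_gain:.2f} points per month")
```

Nothing in this arithmetic is suspicious in itself – it simply makes explicit the scale against which a 14-point difference is being judged.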
The effect sizes reported are not poor, though they may be unremarkable. For comparison, the authors discuss weaker effects in ‘typical educational studies’. However, I see no reason to ignore John Hattie’s assertion that interventions with effect sizes below 0.4 probably aren’t worth investing in. Saying that there are lots of studies with worse results is not an argument that this intervention has impact.
Problems arise, though, when we dig a little deeper. First, as one of the commenters on the Stone Soup post pointed out, the gains still do not reach national averages despite the intensive nature of the intervention. Second, although this was not in the headlines of the study, the pre-test used was actually the Clay Observation Survey, a Reading Recovery instrument with six ‘sub-tests’ including letter naming and concepts about print. The post-test was the Iowa Test of Basic Skills. Although the authors justify the use of the Clay survey by describing its correlation with other tests, I am puzzled as to why they would not have used the ITBS as both the pre- and post-test measure. It is harder to trust claims of ‘progress’ on the ITBS when only a post-test measure was taken. (This information only becomes apparent in the appendices – although if I have misread it in some way, I am happy to be corrected.)
A further conundrum is that the subject sample for the RCT is restricted to the lowest-scoring eight students in each school. This tighter range can restrict the standard deviation of the experimental group, which in turn inflates the apparent effect size. So the effect size – as always – needs careful interpretation when considering how well the intervention might generalise across the population.
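The range-restriction point is easy to demonstrate with a toy simulation. All the numbers below are invented purely for illustration and have no connection to the actual study data: the point is simply that the same raw gain yields a much larger standardised effect when divided by the SD of a truncated sample.

```python
import random
import statistics

random.seed(1)

# Hypothetical population of reading scores (mean 100, SD 15) –
# invented values, purely to illustrate the range-restriction effect.
population = [random.gauss(100, 15) for _ in range(10_000)]
sd_full = statistics.stdev(population)

# Keep only the lowest scorers, analogous to selecting the
# lowest-scoring eight students in each school.
restricted = sorted(population)[:800]
sd_restricted = statistics.stdev(restricted)

# The same raw gain, standardised against each SD.
raw_gain = 6.0
d_full = raw_gain / sd_full
d_restricted = raw_gain / sd_restricted

print(f"SD, full range:        {sd_full:.1f}")
print(f"SD, restricted sample: {sd_restricted:.1f}")
print(f"d using full SD:       {d_full:.2f}")
print(f"d using restricted SD: {d_restricted:.2f}")
```

The truncated sample's SD is a fraction of the population's, so the identical raw gain looks several times more impressive in standardised units – which is exactly why effect sizes from restricted samples need careful reading.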
A fourth question is whether the gains will be sustained – a long-standing concern with Reading Recovery programmes. Measuring the longer-term effects is one of the aims of the current study, yet although the study is in its second year, no such evidence has been published. If the study reaches its fourth year and then discovers that the short-term gains have not held up, the investment will have proven futile – and the many children who took part will have had little benefit from the injection of public funds.
Greg Ashman’s question about which variables are effective highlights a general weakness of RCTs, at least in education – it is difficult to control for all variables across large populations. As a result, the statistical power is offset, at least to some extent, by a loss of explanatory power. There is another way: it is possible to have the best of both worlds by complementing or preceding RCTs with single-subject quasi-experimental designs. These allow researchers to control variables much more tightly, and to demonstrate the importance of different variables by using reversal-to-baseline or multiple-baseline designs (see here for examples).
With respect to whether there is evidence for the effectiveness of one-to-one instruction and other variables, there is an interesting study by Camilli, Vargas and Yurecko (2003). They re-analysed the 2000 report of the National Reading Panel and concluded (amongst other things) that systematic phonics teaching, structured language teaching, and one-to-one instruction all had significant effects, and that these were additive – in other words, they could be combined to triple the effect of phonics instruction alone. There is considerable evidence already in existence regarding the instructional variables that can be managed to achieve greater student progress – but this research remains largely unknown or ignored by educators because its provenance as ‘instruction’ immediately makes it suspect.
Should we be excited or sceptical about the latest claims for Reading Recovery? I am sceptical. Digging below the headlines with Reading Recovery studies always seems to yield the same problems: achievement below national averages, narrow selection of subjects, carefully worded descriptions that obfuscate the aspects of context and ‘visual information’ at the heart of the theoretical model underlying the intervention – and modest effect sizes which are talked up to sound more impressive than they are.
I am all for helping children to learn to read. So far, I can’t find any good evidence to suggest that Reading Recovery helps much.