Thank you to Steven Dykstra for his permission to share his critique of the Reading Recovery i3 Study. I hope to provide an electronic link to it shortly:
A Criticism of the Reading Recovery i3 Study
Steven P. Dykstra, PhD
Any study of treatment effectiveness is inherently limited to those conclusions made possible by the design of the study. A comparison of acetaminophen to ibuprofen, for instance, cannot be used to say either is more effective than aspirin, since aspirin was not part of the study. Likewise, the nature of the experimental and control groups also limits the possible conclusions. It is possible to design a study which compares different dosages of the same treatment, where the control group simply receives a lower dosage. Such a study can say whether more of the treatment is more effective, but it is not a comparison to a different treatment. In fact, any number of alternative treatments could be far more effective.
All research (treatment studies in particular) should be considered skeptically, paying attention to details and noting what isn’t in the study as much as what is.
These are the standards for any treatment study. They are not unique to education or to Reading Recovery.
If we apply these standards dispassionately to the i3 study of Reading Recovery, we arrive at certain observations and conclusions, each of which will be discussed and explained separately:
1) The i3 study was not designed to demonstrate that Reading Recovery was better or worse than any other intervention. It cannot be used to argue that Reading Recovery is superior to any other early intervention, only that Reading Recovery was more effective than doing nothing. The control group in the study received services ranging from nothing to some. Children who got a single, unplanned lesson are listed as receiving some intervention. None of the interventions are named or described, and, in fact, control group subjects were on a wait list to receive Reading Recovery once the study was complete, so it is no surprise they got nothing worth describing during the study. Their intervention was waiting for them when the study was over.
Despite efforts to describe the control group as something other than untreated, the study can only be read as a comparison of Reading Recovery to no treatment at all. With 5 million dollars to spend on the study, investigators could have compared Reading Recovery to one or more scientifically based interventions. At a minimum, they could have reported greater detail about control group subjects and compared performance between those who got some intervention and those who got none. They chose not to do any of those things, and those choices greatly limit the conclusions that can be drawn from this study.
2) Because of the way schools were selected for the study, the subjects were limited to those already receiving what is known as Whole Language or Balanced Literacy instruction. Reading Recovery is the pinnacle of such instruction, and schools which choose to use Reading Recovery, and make the vast investment of time and resources it requires, always teach a regular curriculum consistent with Whole Language and Balanced Literacy.
This is not a criticism, just an important observation. No one expected the investigators to mandate the core curriculum or to require schools to implement Reading Recovery simply because they were randomly selected to be in the study. But the reader must be aware of this limitation. The study is an experiment to show whether children failing to read with Whole Language instruction do better if provided intensive Whole Language tutoring in Reading Recovery. It is not a study to show whether students failing in a wide range of core curricula benefit from Reading Recovery, just those failing in Whole Language.
To understand this issue better, it helps to consider a different treatment scenario. Consider a study investigating the impact of the SuperFit program on overall fitness. SuperFit is a more intensive version of GetFit, a group fitness program relying heavily on self-guided diet and exercise. SuperFit adds a personal trainer and more guidance. All the subjects in the study were doing GetFit, and those who were not responding well were assigned to SuperFit.
The study shows that subjects who get SuperFit after failing at GetFit do better than those who continue with GetFit alone. However, when we look closer, we see that GetFit is based on a poor diet and ineffective exercises that are hard to do alone. GetFit is unusually ineffective compared to other programs, and while adding SuperFit improves outcomes, we are left to wonder whether another approach entirely would be a better idea.
Such is the case with the i3 study, where Reading Recovery was effective at remediating the failures of Whole Language instruction, compared to Whole Language instruction alone. Taken together with the absence of any other treatments in the study, this reduces the utility of the i3 study for selecting among possible interventions to nearly zero. It is probably true that if a district is determined to follow a Whole Language approach to reading instruction, it should go to the time and expense of adding Reading Recovery. However, the scientific literature, including major reviews of that literature, strongly suggests a different model of reading instruction as a better alternative.
3) The final criticism is the most severe. In what can only be called a catastrophic breach of standard research design, teachers not only knew who the treatment and control subjects were, but the same Reading Recovery teachers, with a personal investment in the success of the program and the outcome of the study, had access to control group subjects as well as influence over their instruction and intervention. The study protocol began by providing Reading Recovery teachers with the names of students and instructing them who should receive Reading Recovery and who should be delayed by assignment to the control group.
Studies which make no effort to prevent bias invite it. In this case, teachers who had invested their careers in Reading Recovery knew who the control group subjects were and were given unfettered access to influence their instruction. Human beings are so prone to bias that they will engage in it without knowing it. That is why studies must employ measures to prevent bias, particularly when doing so would have been so simple and obvious.
While a fully blind study, where no one knows who is receiving treatment and who is not, is impossible in educational research, the failure in this study is remarkable. It would have been simple to provide schools with a list of students to enroll in Reading Recovery without telling them which students were in the treatment group. Letting teachers know which Reading Recovery students were in the treatment group was a critical failure. When combined with the stunning choice to let teachers both know the identity of control subjects and have influence over their instruction, the error becomes inexcusable and suggests either deliberate bias or blatant ineptitude.
Remarkably, investigators acknowledged the risk of bias during the final assessment and touted their efforts to prevent it. The Iowa Test of Basic Skills was administered by a teacher other than the student's Reading Recovery teacher in order to prevent the possibility of the teacher influencing the test outcome. Investigators took this step despite the fact that the ITBS is a standardized test with little opportunity to introduce bias into the results. They were aware of the risk of bias, took steps to prevent it during the assessment when the risk was very low, but did nothing to prevent it during the course of instruction when the risk was constant and very real.
Reading Recovery teachers with a deep, personal investment in the outcome of the study knew who was in the experimental group and who was in the control group, and had control over the instruction of both groups. Instruction for the control group was not standardized. Teachers were free to offer less, to neglect problems, or even to provide damaging instruction. Students in the treatment group could get more minutes of instruction and extra attention throughout the day from teachers determined to help the study along.
I am not suggesting that teachers consciously manipulated the outcome of the study. I am saying the likelihood of unintended bias was so high that protecting against it was essential. That investigators protected against the minor risk of bias during the ITBS but ignored the profound risk during classroom instruction is incredible. It is a methodological failure that would receive a failing grade in the most basic undergraduate course. In most fields of human subjects research, this study could not pass basic peer review.
Researchers had 5 million dollars to spend investigating the benefits of Reading Recovery. They could have compared Reading Recovery to scientifically based tutoring of equal intensity and duration. They did not. They could have included subjects receiving classroom instruction within a scientific model. They did not, choosing instead to limit the study to schools teaching within a Whole Language framework. Those choices greatly limit the impact of the study. If it were just those limitations, we could accept the study as evidence that Reading Recovery benefits struggling readers in Whole Language classrooms. But the failure to follow the most basic procedures for managing bias in the treatment protocol disqualifies the study entirely and raises the question of what the investigators were trying to accomplish with the 5 million dollars at their disposal.
Any other conclusion requires the reader to abandon not only the required skepticism but also any sense of reason or respect for science.