Chapter 2: The Case for Formative Assessment
We've discussed how increasing the educational achievement of students is a national economic priority, and the only way to do that is to improve teacher quality. We also saw that deselecting existing teachers and improving the quality of entrants into the profession will have, at best, marginal effects, and so securing our economic future boils down to helping current teachers become more effective.
This chapter reviews the research on teacher professional development—specifically focusing on learning styles, educational neuroscience, and content-area knowledge—and shows that while there are many possible ways in which we could seek to develop the practice of serving teachers, attention to minute-by-minute and day-to-day formative assessment is likely to have the biggest impact on student outcomes. It continues by discussing the origins of formative assessment and by defining what, exactly, formative assessment is. The chapter concludes by presenting the strategies of formative assessment, which will be the subjects of each subsequent chapter in this book, and by discussing assessment as the bridge between teaching and learning.
The Importance of Professional Development
Andrew Leigh (2010) analyzed a data set that includes test scores for ninety thousand Australian elementary school students and found that, as in the American research, whether the teacher has a master's degree makes no difference. He did, however, find a statistically significant relationship between how much a student learns and the experience of the teacher, as seen in figure 2.1.
Figure 2.1: Increases in teacher productivity with experience. Source: Adapted from Leigh, 2010.
The value added by a teacher increases particularly quickly in the first five years of teaching, but what is most sobering about figure 2.1 is the vertical axis. If a student's literacy teacher is a twenty-year veteran, the student will learn more than he will if his teacher is a novice, but not much more. In a year with a twenty-year veteran, a student will make an extra half-month's progress—in other words, a twenty-year veteran teacher achieves in thirty-four weeks what a novice teacher will take thirty-six weeks to achieve. Because of the size of the study, this result is statistically significant, and the improvement is worth having, but it is not a large difference. Therefore, it's not surprising that many have argued that the answer is more, and better, professional development for teachers.
Indeed, it would be hard to find anyone who would say that teacher professional development is unnecessary. Professional development for serving teachers is a statutory requirement in most states. However, most of these requirements are so loosely worded as to be almost meaningless. Pennsylvania's Act 48 (Act of Nov. 23, 1999, P.L. 529, No. 48) requires teachers to complete 180 hours of professional development that relates to an educator's certificate type or area of assignment every five years. Note that there is no requirement for teachers to improve their practice or even to learn anything. The only requirement is to endure 180 hours of professional development.
Many states justify these requirements with the need for teachers to "keep up to date" with the latest developments in the field, but such a justification merely encourages teachers to chase the latest fad. One year, it's language across the curriculum; the next year, it's differentiated instruction. Because teachers are bombarded with innovations, none of these innovations has time to take root, so nothing really changes. And worse, not only is there little or no real improvement in what happens in classrooms, but teachers get justifiably cynical about the constant barrage of innovations to which they are subjected. The reason that teachers need professional development has nothing to do with professional updating. Teachers need professional development because the job of teaching is so difficult, so complex, that one lifetime is not enough to master it.
The fact that teaching is so complex is what makes it such a great job. At one time, André Previn was the highest-paid film-score composer in Hollywood, and yet one day, he quit. People asked him why he had given up this amazing job, and he replied, "I wasn't scared anymore." Every day, he was going into his office knowing that his job held no challenges for him. This is not something that any teacher is ever going to have to worry about.
Even the best teachers fail. Talk to these teachers, and no matter how well the lesson went, they can always think of things that didn't go as well as they would have liked, things that they will do differently next time. But things get much, much worse when we collect the students' notebooks and look at what they thought we said. That's why Doug Lemov (2010) says that, for teachers, no amount of success is enough. The only teachers who think they are successful are those who have low expectations of their students. They are the sort of teachers who say, "What can you expect from these kids?" The answer is, of course, a lot more than the students are achieving with those teachers. The best teachers fail all the time because they have such high aspirations for what their students can achieve (generally much higher aspirations than the students themselves have).
People often contact me and ask whether I have any research instruments for evaluating the quality of teaching. I don't, because working out which teachers are good and which teachers are not so good is of far less interest to me than helping teachers improve. No teacher is so good—or so bad—that he or she cannot improve. That is why we need professional development.
Although there is widespread agreement that professional development is valuable, there is much less agreement about what form it should take, and there is little research about what should be the focus of teacher professional development. However, there does seem to be a consensus that one-shot deals—sessions ranging from one to five days held during the summer—are of limited effectiveness, even though they are the most common model (Muijs, Kyriakides, van der Werf, Creemers, Timperley, & Earl, 2014). The following sections highlight some of the more popular areas of focus for professional development.
Learning Styles
Many teachers are attracted to developments such as theories pertaining to students' learning styles. The idea that each learner has a particular preferred style of learning is attractive—intuitive even. It marries up with every teacher's experience that students really are different; it just feels right. However, there is little agreement among psychologists about what learning styles are, let alone how to define them. One review of the research in this area finds seventy-one different models of learning styles (Coffield, Moseley, Hall, & Ecclestone, 2004). Indeed, it is difficult not to get the impression that the proposers of new classifications of learning styles have followed Annette Karmiloff-Smith's advice: "If you want to get ahead, get a theory" (Karmiloff-Smith & Inhelder, 1974/1975). Some of the definitions, and the questionnaires used to measure them, are so flaky that one may classify an individual as having one learning style one day and a different one the next (Boyle, 1995). Others do seem to tap into deep and stable differences between individuals in how they think and learn, but there does not appear to be any way to use this in teaching.
Although many studies have tried to show that taking students' individual learning styles into account improves learning, evidence remains elusive (Coffield et al., 2004). The Association for Psychological Science asked a blue-ribbon panel of America's leading psychologists of education to review the available research evidence to see whether there was evidence that teaching students in their preferred learning style would have an impact on student achievement. They realized that any experiment that showed the benefit of teaching students in their preferred learning style (what they called the meshing hypothesis) would have to satisfy three conditions.
- Following some assessment of the learners' presumed learning styles, teachers would divide the learners into two or more groups (for example, visual, auditory, and kinesthetic learners).
- Teachers would randomly allocate learners within each of the learning-style groups to at least two different methods of instruction (for example, visual- and auditory-based approaches).
- Teachers would give all students in the study the same final test of achievement.
In such an experiment, the meshing hypothesis would be supported if the results showed that the learning method that optimized test performance of one learning-style group (for example, visual learners) was different from the learning method that optimized the test performance of a second learning-style group (for example, auditory learners). In their review, Harold Pashler and colleagues found only one study that gave even partial support to the meshing hypothesis, and two that clearly contradicted it. Their conclusion was stark: "If classification of students' learning styles has practical utility, it remains to be demonstrated" (Pashler, McDaniel, Rohrer, & Bjork, 2008, p. 117).
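The support criterion described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the procedure that Pashler and colleagues used: the function names (`best_method`, `supports_meshing`) and every score below are hypothetical, invented for the example. A real study would also need to test whether the group-by-method interaction is statistically significant rather than simply comparing mean scores.

```python
from typing import Dict

# Outer key: learning-style group; inner key: method of instruction.
# Values are mean scores on the common final test (hypothetical numbers).
MeanScores = Dict[str, Dict[str, float]]


def best_method(scores_by_method: Dict[str, float]) -> str:
    """Return the instruction method with the highest mean test score."""
    return max(scores_by_method, key=lambda method: scores_by_method[method])


def supports_meshing(mean_scores: MeanScores) -> bool:
    """Return True if at least two learning-style groups have different best methods."""
    best = {group: best_method(methods) for group, methods in mean_scores.items()}
    return len(set(best.values())) > 1


# Hypothetical pattern in which each group does best with its "own" method.
crossover = {
    "visual": {"visual_based": 78.0, "auditory_based": 70.0},
    "auditory": {"visual_based": 69.0, "auditory_based": 77.0},
}

# Hypothetical pattern in which one method is best for both groups.
no_crossover = {
    "visual": {"visual_based": 78.0, "auditory_based": 70.0},
    "auditory": {"visual_based": 76.0, "auditory_based": 71.0},
}

print(supports_meshing(crossover))     # True
print(supports_meshing(no_crossover))  # False
```

In the second pattern, the groups differ in how well they score, but the same method is best for both, so knowing a student's learning style would not change the choice of instruction.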
Now, of course, the fact that there is currently no evidence in favor of the meshing hypothesis does not mean that such evidence will not be forthcoming in the future; absence of evidence is not evidence of absence. However, it could be that the whole idea of learning styles research is misguided because its basic assumption—that the purpose of instructional design is to make learning easy—may just be incorrect.
Since the pioneering work of Hugh Carlton Blodgett in the 1920s, psychologists have found that performance on a learning task is a poor predictor of long-term retention (for a summary of this research, see Soderstrom & Bjork, 2015). More precisely, when learners do well on a learning task, they are likely to forget things more quickly than if they do badly on the learning task; good instruction creates "desirable difficulties" (Bjork, 1994, p. 193) for the learner. As Daniel Willingham (2009) says, "memory is the residue of thought" (p. 41). By trying to match our instruction to our students' preferred learning style, we may, in fact, be reducing learning. If students do not have to work hard to make sense of what they are learning, then they are less likely to remember it in six weeks' time. Perhaps the most important takeaway from the research on learning styles is that teachers need to know about learning styles if only to avoid the trap of teaching in the style they believe works best for them. A review of the literature on learning styles and learning strategies (Adey, Fairbrother, Wiliam, Johnson, & Jones, 1999) concludes that:
The only feasible "solution" is that teachers shouldn't try to fit their teaching to each child's style, but rather that they should become aware of different styles (and help students also to become aware of different styles) and then encourage all students to use as wide a variety of styles as possible. Students need to learn both how to make the best of their own learning style and also how to use a variety of styles, and to understand the dangers of taking a limited view of their own capabilities. (p. 36)
As long as teachers vary their teaching style, it is likely that all students will get some experience of being in their comfort zone and some experience of being pushed beyond it. Ultimately, we should remember that teaching is interesting because our students are so different, but only possible because they are so similar.
Educational Neuroscience
Another potential area for teacher professional development—and one that has received a lot of publicity—is applying what we are learning about the brain to the design of effective teaching. Cognitive psychologists work to understand what the brain does and how it does what it does, while neuroscientists try to connect what the brain does to its physiology.
Some of the earliest attempts to relate brain physiology to educational matters concerned the respective roles of the left and right sides of the brain in various kinds of tasks in education and training, despite clear evidence that the conclusions being drawn were unwarranted (see, for example, Hines, 1987). Schools have been inundated with suggestions for how they can use the latest findings from cognitive neuroscience to develop brain-based education, and despite the wealth of evidence that these claims are at best premature and at worst simply disingenuous (for example, Bruer, 1997, 1999; Goswami, 2006; Howard-Jones, 2009), many neuromyths still abound.
- Approximately 50 percent of teachers in China, Greece, the Netherlands, Turkey, and the United Kingdom believe that we only use about 10 percent of our brains, and more than 90 percent of teachers in these countries believe that instruction in students' preferred learning styles is more effective (Howard-Jones, 2014). Neither belief is true.
- People are more likely to believe a psychological report if the explanation claims to be based in neuroscience, even if the explanation is nonsense (Weisberg, Keil, Goodstein, Rawson, & Gray, 2008).
- Over 50 percent of teachers in the Netherlands and the United Kingdom believe that children are less attentive after consuming drinks or snacks that contain a lot of sugar (they're not), and 90 percent believe that differences in whether the left or the right brain is dominant can help explain individual differences among learners (they can't; Dekker, Lee, Howard-Jones, & Jolles, 2012).
- Many believe that people remember 10 percent of what they read, 20 percent of what they hear, 30 percent of what they see, 50 percent of what they hear and see, 70 percent of what they say and write, and 90 percent of what they do, despite the fact that there is absolutely no evidence to support these suspiciously neat percentages (De Bruyckere, Kirschner, & Hulshof, 2015).
Other neuromyths include the idea that the left side of our brain is analytical and the right side is creative, that you can train your brain with activities like Brain Gym (www.braingym.org), that male and female brains are different, that listening to classical music can improve a child's cognitive development (the so-called Mozart effect), or that we can learn when we are asleep. None of these is true as far as we know right now (De Bruyckere, Kirschner, & Hulshof, 2015). In fact, we know a great deal about how the brain works and what kinds of activities help students learn, but these findings come from cognitive science rather than neuroscience. Neuroscience, rather, provides plausible explanatory mechanisms for things we already knew from cognitive science. Two leading experts in the field of neuroscience and education, Sergio Della Sala and Mike Anderson (2011), sum it up thus in their "opinionated introduction" to their book, Neuroscience in Education:
While the use of the term "neuroscience" is attractive for education it seems to us that it is cognitive psychology that does all the useful work or "heavy lifting." The reason for this is straightforward. We believe that for educators, research indicating that one form of learning is more efficient than another is more relevant than knowing where in the brain that learning happens. There is indeed a gap between neuroscience and education. But that gap is not filled by the "interaction" of neuroscientists and teachers (nearly always constituted by the former patronizing the latter) or "bridging" the two fields by training teachers in basic neuroscience and having neuroscientists as active participators in educating children. Rather what will ultimately fill the gap is the development of evidence-based education where that base is cognitive psychology. (p. 3)
Content-Area Knowledge
If training teachers in cognitive neuroscience isn't going to help, what about increasing teachers' knowledge of their subjects? After all, surely the more teachers know about their subjects, the more their students will learn.
There is evidence that teachers in countries that are more successful in international comparisons than the United States appear to have stronger knowledge of the subjects they are teaching (Babcock et al., 2010; Ma, 1999), and this, at least in part, appears to be responsible for a widespread belief that teacher professional development needs to be focused on teachers' knowledge of the subject matter they are teaching.
It is important to note that not all kinds of subject-matter knowledge have the same impact on student progress. A study of German high school mathematics teachers found that students did not make more progress when their teachers had advanced mathematics knowledge (such as knowledge of mathematics studied at university). However, when teachers had a profound understanding of the school-level mathematics they were teaching, then, echoing Heather Hill, Brian Rowan, and Deborah Ball's (2005) study, students did make more progress (Baumert et al., 2010). Thus, it appears that an in-depth understanding of the curriculum may be more beneficial to student progress than advanced study of a subject on the part of the teacher.
Most studies of the relationship between teacher subject-matter knowledge and student progress, including those by Hill et al. (2005) and Jürgen Baumert et al. (2010) discussed previously, are cross-sectional in nature; researchers look to see whether the teachers whose classes make more progress have higher levels of subject knowledge. However, even if a link is found, it is not clear what this means. It could be that what really matters is general intellectual ability: those with higher intellectual ability may find learning their subject easier and may also make more effective teachers. To rule this out, we need experimental studies, where some teachers work on improving their subject knowledge while others work on something else, and then we compare their students' progress. Here, the results are rather disappointing.
Summer professional development workshops do increase teachers' knowledge of their subjects (Hill & Ball, 2004), but most studies that have increased teachers' subject knowledge find little or no knock-on effects on student achievement. For example, an evaluation of professional development designed to improve second-grade teachers' reading instruction found that an eight-day content-focused workshop increased teachers' knowledge of scientifically based reading instruction and also improved the teachers' classroom practice on one of the three instructional practices that the professional development had emphasized (Garet et al., 2008). However, at the end of the following school year, there was no impact on the students' reading test scores. More surprising, even when the workshop was supplemented with in-school coaching, the effects were the same.
A similar story emerges from an evaluation of professional development for middle school mathematics teachers in seventy-seven schools in twelve districts (Garet et al., 2010). The districts implemented the program as intended, which resulted in an average of fifty-five hours of additional professional development for participants (who had been selected by lottery). Although the professional development had been specifically designed to be relevant to the curricula that teachers were using in their classrooms and did have some impact on teachers' classroom practice (specifically the extent to which they engaged in activities that elicited student thinking), there was no impact on student achievement, even in the specific areas on which the intervention focused (ratio, proportion, fractions, percentages, and decimals). A study that attempted to improve mathematics and science learning in early years teaching found that increasing teachers' subject knowledge had no impact on student achievement (Piasta, Logan, Pelatti, Capps, & Petrill, 2015).
These findings are clearly counterintuitive. It seems obvious that teachers need to know about the subjects they are teaching, yet the relationship between teachers' knowledge of the subjects and their students' progress is weak. Attempts to improve student outcomes by increasing teachers' subject knowledge appear to be almost entirely failures.
Of course, these failures could be due to our inability to capture the kinds of subject knowledge that are necessary for good teaching, but they suggest that there is much more to good teaching than just knowing the subject. We know that teachers make a difference, but we know much less about what makes the difference in teachers. However, there is a body of literature that shows a large impact on student achievement across different subjects, across different age groups, and across different countries, and that is the research on formative assessment.
The Origins of Formative Assessment
Polymath and academic philosopher Michael Scriven coined the term formative evaluation in 1967 to describe the role that evaluation could play "in the on-going improvement of the curriculum" (p. 41). He contrasted this with summative evaluation. Summative evaluation's job is:
To enable administrators to decide whether the entire finished curriculum, refined by use of the evaluation process in its first role, represents a sufficiently significant advance on the available alternatives to justify the expense of adoption by a school system. (Scriven, 1967, pp. 41–42)
Two years later, Benjamin Bloom (1969) applied the same distinction to classroom tests:
Quite in contrast is the use of "formative evaluation" to provide feedback and correctives at each stage in the teaching-learning process. By formative evaluation we mean evaluation by brief tests used by teachers and students as aids in the learning process. While such tests may be graded and used as part of the judging and classificatory function of evaluation, we see much more effective use of formative evaluation if it is separated from the grading process and used primarily as an aid to teaching. (p. 48)
Bloom (1969) went on to say, "Evaluation which is directly related to the teaching-learning process as it unfolds can have highly beneficial effects on the learning of students, the instructional process of teachers, and the use of instructional materials by teachers and learners" (p. 50).
Although educators used the term formative infrequently in the twenty years after Bloom's (1969) research, a number of studies investigated ways of integrating assessment with instruction, the best known of which is cognitively guided instruction (CGI).
In the original CGI project, a group of twenty-one elementary school teachers participated, over a period of four years, in a series of workshops that showed teachers extracts of videotapes selected to illustrate critical aspects of children's thinking. The researchers then prompted teachers to reflect on what they had seen, by, for example, challenging them to relate the way a child had solved one problem to how he or she had solved or might solve other problems (for a summary of the whole project, see Fennema et al., 1996). Throughout the project, researchers encouraged the teachers to make use of the evidence they had collected about the achievement of their students to adjust their instruction to better meet their students' learning needs. Students taught by CGI teachers did better in number fact knowledge, understanding, problem solving, and confidence (Carpenter, Fennema, Peterson, Chiang, & Loef, 1989), and four years after the end of the program, the participating teachers were still implementing the principles of the program (Franke, Carpenter, Levi, & Fennema, 2001).
The power of using assessment to adapt instruction was vividly illustrated in a 1991 study of the implementation of the measurement and planning system (MAPS), in which 29 teachers, each with an aide and a site manager, assessed the readiness for learning of 428 kindergarten students (Bergan, Sladeczek, Schwarz, & Smith, 1991). The researchers tested students in mathematics and reading in the fall and again in the spring. Their teachers learned to interpret the test results and to use the classroom activity library—a series of activities typical of early grades instruction but tied specifically to empirically validated developmental progressions—to individualize instruction. The researchers then compared these students' performances with the performances of 410 other students taught by 27 different teachers. At the end of the year, 27 percent of the students in the control group were referred for placement and 20 percent were actually placed into special education programs for the following year. In the MAPS group, only 6 percent were referred, and fewer than 2 percent were placed in special education programs.
In addition to these specific studies, in the late 1980s, a number of research reviews began to highlight the importance of using assessment to inform instruction. A review by Lynn Fuchs and Douglas Fuchs (1986) synthesized findings from twenty-one different research studies on the use of assessment to inform the instruction of students with special needs. They found that regular assessment (two to five times per week) with follow-up action produced a substantial increase in student learning. A couple of other findings were notable. First, some studies required teachers, before assessing their students, to make up systematic evaluation rules that would tell the teachers when or how they were to make changes to the instructional plans they had made for their students. In other studies, teachers made judgments about what instructional changes might be needed only after seeing their students' results. Both strategies increased student achievement, but the benefit was twice as great when the teachers used rules, rather than judgments, to determine what to do next. Second, when teachers tracked their students' progress with graphs of individual students' achievements as a guide and stimulus to action, the effect was almost three times as great as when they didn't track progress. These findings suggest that when teachers rely on evidence to make decisions about what to do next, students learn more.
Over the next two years, two further research reviews, one by Gary Natriello (1987) and the other by Terence Crooks (1988), provided clear evidence that while classroom assessment can improve learning, it can also have a substantial negative impact on student achievement. Natriello (1987) concludes that much of the research he reviewed was difficult to interpret because of a failure to make key distinctions in the design of the research (for example, between the quality and quantity of feedback). As a result, although some studies showed that assessment could be harmful, it was not clear why. He also points out that assessments serve a number of different purposes in schools, and many of the studies showed that assessment that is designed for selecting students (for example, by giving a grade) is not likely to improve student achievement as much as assessment that is specifically designed to support learning. Crooks's (1988) paper focuses specifically on the impact of assessment practices on students and concludes that although classroom assessments do have the power to influence learning, too often the use of assessments for summative purposes—grading, sorting, and ranking students—gets in the way.
In 1998, Paul Black and I sought to update Natriello's and Crooks's reviews. One of the immediate difficulties we encountered was how to define the field of study. Their reviews cited 91 and 241 references respectively, and yet only 9 references were common to both papers. Neither paper cited Fuchs and Fuchs's (1986) review. While many reviews of research use electronic searches to identify relevant studies, we found that the keywords used by authors were just too inconsistent to be of much help. Where one researcher might use the term formative assessment, another might use formative evaluation, and a third might use responsive teaching. In the end, we decided that there was no alternative to actually going into the library and physically examining each issue, from 1987 to 1997, of seventy-six education and psychology journals that we thought most likely to contain relevant research studies. We read the abstracts, and where those looked relevant, read the studies. Through this process, we found just over 600 studies related to the field of classroom assessment and learning, of which approximately 250 were directly relevant.
We did consider at this point conducting a formal meta-analysis of the studies we had identified, but we quickly realized that with such a diverse range of studies, meta-analysis would simply not be appropriate (see Wiliam, 2016, for an extended analysis of the problems with meta-analysis in education). Instead we conducted what we might call a configurative review (Gough, 2015) because our main purpose was to make sense of the field rather than to quantify the effects of classroom assessment processes on students. However, many of the studies that we reviewed provided considerable evidence that attention to classroom assessment processes could substantially increase the rate of student learning, in some cases effectively doubling the speed of student learning. We realized that because of the diversity of the studies, there was no simple recipe to easily apply in every classroom, but we were confident we had identified some fruitful avenues for further exploration:
Despite the existence of some marginal and even negative results, the range of conditions and contexts under which studies have shown that gains can be achieved must indicate that the principles that underlie achievement of substantial improvements in learning are robust. Significant gains can be achieved by many different routes, and initiatives here are not likely to fail through neglect of delicate and subtle features. (Black & Wiliam, 1998a, pp. 61–62)
While we did not conduct a formal meta-analysis, in a subsequent publication (Black & Wiliam, 1998b), we did try to provide some indication for practitioners and policymakers of the likely potential benefits of formative assessment. We suggested that the effective use of formative assessment would increase achievement by between 0.4 and 0.7 standard deviations, which would be equivalent to a 50 to 70 percent increase in the rate of student learning (see Wiliam, 2006, for details).
While we were confident that the research evidence that we had compiled made a compelling case for making classroom formative assessment a priority, we were not sure that these ideas could be implemented in real classrooms, especially where students were regularly subjected to external standardized tests, and where teachers were held accountable for their students' achievement.