January 28, 2025
By Mariann Lemke
The upcoming release of the National Assessment of Educational Progress (NAEP) results will offer policymakers an opportunity to assess whether states’ efforts to improve reading performance through early screening for potential reading difficulty, new curricula, professional development, and other changes are making a difference.
But NAEP scores aren’t what educators use to track progress. In many states, educators rely on commercial screening assessment tools that have set their own benchmarks to identify students at risk of reading difficulty.
One logical approach to measuring change is to examine the percentages of students flagged by a screener over time. A decline in the percentage of students identified as at risk would seem to indicate improvement, and an increase would suggest a problem. Unfortunately, things are not quite so straightforward.
Rethinking “Significant Risk”
In a prior article, we discussed differences in literacy screening assessments, including in how their risk benchmarks line up with state assessments in Colorado and Massachusetts at the end of grade 3. Using recent data, we built on that earlier work to compare those risk benchmarks to Massachusetts’s English language arts (ELA) assessment at the beginning, middle, and end of the year in grades 2 and 3.
This updated analysis shows that students identified as being at significant risk for reading difficulty at the beginning of the year aren’t necessarily performing at the same level as those identified as at risk at other points in the school year. In other words, “significant risk” does not mean the same thing at different times of year or across grade levels, whether you look within a single assessment or compare across assessments.
For example, one screener’s beginning-of-year risk benchmark at grade 2 corresponds to a score of about 481 on the Massachusetts Comprehensive Assessment System (MCAS). But at the end of grade 2, the same benchmark corresponds to a score of 474. That means the benchmark, relative to MCAS, is more difficult to reach at the start of grade 2 than at the end, and that the screener is likely to identify more students as at significant risk at the beginning of the year than in the spring.
These differences don’t necessarily indicate a problem with the screener or where the benchmarks were set. For example, it may be important to identify more students as potentially at risk early in the year so they can get support earlier. It’s also true that any single test score is not the whole story and that the statistical linking of benchmarks to MCAS is not perfect.
But the differences can matter a lot to educators and policymakers who make decisions about extra staff and resources to help struggling readers, because the number of students who appear to need extra help might change due to the benchmarks themselves, not due to changes in student knowledge or skills. That is, there is not necessarily a consistent definition of “risk” over time. So, before celebrating a decrease in the number of students identified as at risk over time (or catastrophizing an increase), it’s worth checking that information against other data.
How Other Data Can Help
National percentiles can help provide additional context, but they have their own limitations.1 Assessment publishers often report these scores alongside their own benchmarks, and they are intended to give consistent information about how students compare with their peers nationally.
The figures below show how the percentages of students identified as at significant risk differ depending on whether one uses the publisher-provided risk benchmarks or the 25th national percentile as the cut point. If the percentage of students flagged as below the 25th percentile decreases over time, student performance is improving compared to a national reference group. If it stays the same, student performance hasn’t changed in relative terms. And if the percentage increases, student performance is declining, or at least not keeping up, compared to peers.
In Figure 1a, for instance, we see that when screener 2’s risk benchmarks are used, the percentage of students identified as at significant risk increases from 19% to 51% between the beginning and end of the year. An educator might interpret that information to mean students have lost significant ground. But the national percentiles for screener 2 (Figure 1b) show that the percentage of students performing below the 25th percentile is the same at the beginning and end of the year.
For screeners 1 and 3, the percentages of students identified as at risk decrease from the beginning to the end of the year under both the publisher-provided benchmarks and the 25th percentile cut point. That provides some evidence of real improvement in student performance over time.
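For district or state analysts who want to run this kind of cross-check on their own screening files, a minimal sketch in Python is below. All column names, scores, and benchmark values are hypothetical placeholders, not any publisher’s actual data format; the point is simply to compute, for each administration window, the share of students flagged by a window-specific benchmark alongside the share falling below the 25th national percentile so the two trends can be compared.

```python
import pandas as pd

# Hypothetical student-level screening records, one row per student per
# administration window. Column names and values are illustrative only.
records = pd.DataFrame({
    "student_id": [1, 2, 3, 4, 5, 6],
    "window": ["BOY", "BOY", "BOY", "EOY", "EOY", "EOY"],  # beginning / end of year
    "scaled_score": [310, 402, 455, 365, 470, 480],
    "national_percentile": [18, 40, 62, 22, 55, 70],
})

# Publisher-style risk benchmarks that differ by window (illustrative values).
risk_benchmark = {"BOY": 350, "EOY": 400}

def percent_flagged(df: pd.DataFrame, window: str) -> dict:
    """Compare the share of students below the publisher benchmark with the
    share below the 25th national percentile for one administration window."""
    w = df[df["window"] == window]
    return {
        "window": window,
        "pct_below_benchmark": (w["scaled_score"] < risk_benchmark[window]).mean() * 100,
        "pct_below_25th_pctile": (w["national_percentile"] < 25).mean() * 100,
    }

summary = pd.DataFrame([percent_flagged(records, w) for w in ["BOY", "EOY"]])
print(summary)
```

If the benchmark-based percentage rises from fall to spring while the percentile-based percentage holds steady, the shift is more likely a property of the benchmarks than of student performance.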
Some assessment publishers also provide their own metrics to assess progress, such as measures of how much student scores have increased from time period to time period compared to national peers. Some even provide separate assessment tasks designed specifically to monitor progress in response to supports that schools provide, which may differ from screening assessments administered less frequently.
As always in measuring student learning, it’s complicated. Educators should ask themselves what evidence they have about student progress and whether different measures are consistent. They, and the states requiring screening, should also ask assessment providers to be clear about how benchmarks that indicate risk of reading difficulty were established for different time periods and what scores can best help measure improvement.
Sometimes, the simplest approach—in this case, just measuring whether the number of students identified as at risk changes over time—is not the most accurate.
Next Steps
- Join the conversation: How have you navigated the complexities of reading risk benchmarks in your district or state? How does your school or district measure change?
- Learn how you can collaborate with WestEd to ensure assessments provide a true picture of student progress.
- Learn more about our early literacy assessment work with the state of Massachusetts.
Mariann Lemke is a Senior Associate at WestEd who has more than 20 years’ experience managing assessment and evaluation projects at the federal, state, and local levels.