
Why Updated Assessment Scores Are Changing How Educators Measure Student Progress


By Sarah Quesen and Mariann Lemke

Recently, we wrote about how normative and criterion-referenced test scores paint different pictures of student progress. Normative growth measures highlight how students are improving relative to their peers, while criterion-referenced growth scores show how students are advancing toward content- or skill-related targets—usually state standards.

With many assessment publishers providing updated normative scores based on postpandemic student data and with some states updating their standards-based performance expectations, it’s worth revisiting this topic to clarify what these changes mean for students, parents, and educators.

The Impact of Updated Norms on Student Assessment

Many commonly used interim assessments and some state assessments report normative data to their users. Test developers often update their norms on a regular cycle or in response to large-scale changes that affect student performance, such as the COVID-19 pandemic. The recent updates show a downward trend: less knowledge and skill is now required to achieve the same level of relative performance. That means students who demonstrate the same knowledge and skills now as their peers did in the past will look better in normative terms. It’s comparable to running a mile in 7 minutes: that time would have earned you 20th place in a race in 2019 but now earns you 15th place because everyone is running slower.

For example, the publisher Renaissance recently updated norms based on 2022/23 data for its Star Early Literacy, Star Reading, and Star Math products. They report, “The new norming analysis found that scores declined overall in Star Early Literacy, Star Reading, and Star Math. Therefore, educators using school and district benchmarks will likely see fewer students classified as needing intervention and more students at/above benchmark level.”

Similarly, a recent study by Curriculum Associates of its i-Ready assessment shows lower scale scores associated with the 50th percentile compared to a 2017 norming study (see Figure 1, where the orange line shows postpandemic scores at the 50th percentile and the blue line shows prepandemic scores at the 50th percentile level). For example, in grade 5, a scale score of about 480 was associated with a student scoring at the 50th percentile in 2024, while students needed to score about 490 to be at the 50th percentile in 2017. Again, this shift doesn’t indicate that students are learning more; rather, it indicates that median performance is lower.

Figure 1. i-Ready Math Spring Scale Scores at the 50th Percentile Pre- and Postpandemic (2017 and 2024)

Note. i-Ready Diagnostic National Norms Tables: Fall 2016–Spring 2024 and i-Ready Diagnostic National Norms Tables: For Use in the 2024–2025 School Year
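To see what this means in practice, here is a minimal Python sketch using the approximate grade 5 values above. Only the 50th-percentile cut scores (roughly 490 in 2017 and 480 in 2024) come from the norming studies cited; the other cut points and the student score are invented for illustration. The same scale score lands in a different percentile band depending on which norm table is applied.

```python
# Illustrative grade 5 spring math norm tables mapping percentiles to the minimum
# scale score needed to reach them. Only the 50th-percentile values (~490 in 2017,
# ~480 in 2024) reflect the figures discussed above; the 25th and 75th are invented.
NORMS_2017 = {25: 465, 50: 490, 75: 512}
NORMS_2024 = {25: 455, 50: 480, 75: 505}

def percentile_band(score: int, norms: dict[int, int]) -> str:
    """Return the highest percentile band whose cut score the student met."""
    met = [pct for pct, cut in norms.items() if score >= cut]
    return f"at or above the {max(met)}th percentile" if met else "below the 25th percentile"

student_score = 485  # hypothetical grade 5 spring scale score, unchanged across years
print("2017 norms:", percentile_band(student_score, NORMS_2017))  # at or above the 25th
print("2024 norms:", percentile_band(student_score, NORMS_2024))  # at or above the 50th
```

The student’s performance is identical in both runs; only the norm table, and therefore the relative standing, has changed.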

Another commonly used interim assessment, the NWEA MAP assessment, last updated its norms in 2020 and plans to release new norms in 2025. However, MAP did update its approach to selecting the test questions that students answer, which results in changes to math scores. NWEA’s documentation notes that they “observe small to moderate changes in math scores within a test season. However, even small test-score changes result in significant normative shifts. For our partners, this likely means notable shifts in student and school achievement and growth percentiles.”

In this case, the changes are not consistent. Unlike in the earlier examples, the size and direction of the shift vary by grade level and test season. For example, NWEA estimates that a grade 2 student who scored at the 50th percentile in fall 2022 would score at the 47th percentile in fall 2023, while a grade 2 student at the 50th percentile in winter 2023 would score at the 53rd percentile in winter 2024, and even higher in spring.

NWEA recommends that users apply an adjustment to scores because “using unadjusted cut scores results in under-identifying students for intervention services and over-identifying students for talented and gifted programs.” With the adjustment, “partners can make decisions about student placement that are more consistent with past decisions, ensuring that students receive the appropriate level of support.” NWEA also recommends adjusting scores used to determine whether students are on track to proficiency on state assessments, as well as scores used in longitudinal analyses.
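As a rough illustration of what such an adjustment involves, here is a minimal Python sketch. The offsets below are invented and expressed in percentile points purely for illustration; they are not NWEA’s published adjustment values, and NWEA defines its actual adjustments in its own documentation. The point is simply that shifting the cut used for placement keeps decisions more comparable to prior years.

```python
# Hypothetical adjustments by (grade, season); these values are invented for
# illustration and are NOT NWEA's published adjustments.
ADJUSTMENT = {
    (2, "fall"): -3,    # fall percentiles read lower than the prior year, so lower the cut
    (2, "winter"): +3,  # winter percentiles read higher, so raise the cut
}

def adjusted_cut(base_cut: int, grade: int, season: str) -> int:
    """Shift a district's percentile cut so placement decisions stay comparable to prior years."""
    return base_cut + ADJUSTMENT.get((grade, season), 0)

def needs_intervention(student_percentile: int, grade: int, season: str, base_cut: int = 25) -> bool:
    """Flag students who fall below the adjusted intervention cut."""
    return student_percentile < adjusted_cut(base_cut, grade, season)

# A grade 2 student at the 26th percentile in winter clears an unadjusted cut of 25
# but is flagged once the cut is raised to 28, matching how they would likely have
# been identified under the prior year's scores.
print(needs_intervention(26, grade=2, season="winter"))  # True
print(needs_intervention(26, grade=2, season="fall"))    # False (cut lowered to 22)
```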

How State Performance Expectations Are Changing

Given changes in normative data, users might prefer to pay attention to criterion-referenced growth that shows how students are advancing toward content- or skill-related targets. These types of scores are commonly used in state assessment reporting, though some interim assessments also provide these data.

A benefit is that proficiency expectations generally do not change over time unless there has been a significant update to the content- or skill-related targets, such as the standards themselves. It’s like deciding that running a 7-minute mile counts as “excellent”: that determination doesn’t change just because more people start running 7-minute miles, unless we revise our idea of what excellent mile running is, so we can track progress toward a 7-minute mile over time. Because these metrics aim at a fixed standard, they can provide more consistent information about student progress.

However, The 74 recently reported that several states have updated their state assessment expectations. In these cases, the changes were not to the content or skills students are expected to learn (the standards themselves) but rather to performance expectations: how well students must demonstrate their learning on a state-required test to be considered “proficient” or “meeting expectations.”

In some cases, lowering performance expectations has led to substantial increases in the percentage of students considered proficient or on grade level. This is analogous to revising the criteria for an “excellent” runner award: one year, a 7-minute mile qualifies, but the next, the threshold is relaxed to 8 minutes. While more individuals may qualify for the award, their actual performance remains unchanged. At the same time, some states may be moving in the opposite direction, making it more difficult to achieve a performance standard. Such adjustments to proficiency standards complicate meaningful year-to-year comparisons of student achievement, potentially raising concerns about the reliability of these measures as indicators of progress.
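The arithmetic behind that shift is straightforward. The short Python sketch below uses invented scale scores and cut scores; it simply shows that relaxing the proficiency cut raises the share of students labeled proficient even though the score distribution itself has not moved.

```python
# Invented scale scores for one classroom; the same distribution is used in both calculations.
scores = [712, 698, 725, 740, 705, 688, 731, 720, 695, 744]

def percent_proficient(scores: list[int], cut: int) -> float:
    """Share of students at or above the proficiency cut score."""
    return 100 * sum(s >= cut for s in scores) / len(scores)

print(percent_proficient(scores, cut=720))  # original expectation: 50.0% proficient
print(percent_proficient(scores, cut=700))  # relaxed expectation: 70.0% proficient
```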

Interpreting Test Score Changes Over Time

Changes in test scores are not necessarily a bad thing. Test developers often make changes to improve the quality and accuracy of scores. But amid these changes, how can educators, families, policymakers, or students themselves make sense of performance and progress over time?

Common advice for someone on a boat in choppy water is to focus on something that isn’t moving to maintain their bearings. In the context of student performance levels, then, a good place to start is identifying what has changed and what hasn’t. When norms or performance levels are updated, the underlying scale scores themselves often have not changed. If your goal is to keep the number of students receiving particular services consistent from year to year, these unchanged scale scores may serve your purposes.

As always, it’s important to use multiple measures. No single test can provide a complete picture of what students know. Putting information from tests together with other data such as assignments or other student work is critical for making accurate decisions about instruction, placement, or other topics. Looking for patterns of change in performance may provide useful insights.

Finally, remember that it’s how we act on the data we have that really matters for student success.


Sarah Quesen (she/her) is an expert in statistics and psychometrics. As Director of ARI, she leverages her understanding of assessment systems to lead rigorous, transformative research and provide evidence-based technical assistance to states, districts, and commercial organizations.

Mariann Lemke is a Senior Associate at WestEd with over 20 years of experience managing federal, state, and local assessment and evaluation projects. 

