What Goodhart’s Law Teaches Us About Testing

There is a moment in many schools where something subtle but important shifts.

The goal stops being student learning, and starts being higher test scores.

It doesn’t happen all at once. It builds gradually through small decisions, well-intentioned policies, and mounting pressure. Then one day, it becomes visible.

I remember pausing, because I knew that wasn’t true. I also knew exactly why someone had told him that. So I did what I could in that moment: I explained that the test was just one way for his school to understand what he had learned so far, and that it wasn’t something to worry about.

But I also knew something else: I am a rare parent in that situation.

I understand how these tests are designed. I understand what they are, and just as importantly, what they are not. Most families don’t have that context. Which means there are countless students sitting in classrooms right now, hearing that these tests are “important,” and quietly wondering what that means for them, their future, and their worth.

That’s the moment where you realize something has gone sideways.

Once you see it, you can’t unsee it.

So how did we get here?

There’s a concept in economics, now known as Goodhart’s Law, that helps explain exactly what’s happening.

The idea comes from economist Charles Goodhart, who observed that once a metric is used for control or accountability, people begin to change their behavior in response to the metric itself, instead of the underlying goal it was meant to represent.

In his original work, he described it this way:

“Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”²

In plain terms, this means that as soon as people are held accountable to a number, they will find ways to improve that number. However, not all of those ways actually improve the thing the number was meant to represent.

People adapt. They optimize and get creative.

Sometimes that creativity leads to real improvement.

Sometimes it leads to what we might call gaming the measure, or improving the number without improving the outcome.

Over time, this idea has proven to apply far beyond economics, because it shows up anywhere measurement meets pressure.

There are a few well-known examples that make this idea easier to see.

In one case, a government tried to reduce the number of venomous cobras in populated areas by offering a reward for each snake head turned in. The number of dead snakes became the measure… and then the target.

People started breeding cobras.

Instead of reducing the snake population, the policy incentivized people to create more of the very problem it was trying to solve.

In another example, a factory measured productivity by the number of nails produced, tying worker pay directly to output. The number of nails became the measure… and then the target.

The workers began producing smaller and smaller nails.

They were able to increase production numbers, but the nails eventually became too small to be used for their intended purpose. In effect, they were producing more and accomplishing nothing.

In both cases, the measure didn’t just fail; it changed behavior in ways no one intended.

This is not a distant, theoretical concept. It is happening every day in K–12 education.

The goal of education, whether formally defined through standards or informally understood, has always been to prepare students for life beyond school: learning to read, write, reason, solve problems, and engage with the world.

Those goals are worthwhile and necessary, but they are complex and challenging to measure directly. So we use academic indicators, most commonly test scores, as a way to understand whether learning is happening.

That’s where the shift begins.

Over time, those indicators start to carry weight. They influence policy, guide funding decisions, and shape public perception.

And eventually… they become the goal.

Once test scores become the target, behavior changes.

We see it in:

  • Months of “test prep” that replace real instruction
  • Students being told the test is the most important moment of their academic life
  • Schools focusing disproportionately on students near performance thresholds (AKA “bubble students”)
  • Narrowing of curriculum to only what is tested
  • High-pressure environments that increase student anxiety

And in the most extreme cases:

  • Adults altering student responses
  • Systems rewarding or restricting student opportunities based on a single score

This is what Goodhart’s Law looks like in education. This is what I call “snake breeding” behavior.

It’s worth remembering why these tests exist in the first place. Statewide assessments were designed to answer a system-level question: Are students being taught the academic standards they are entitled to learn?

These tests were never intended to define a student’s potential, determine access to enrichment opportunities, evaluate parenting, or serve as the sole indicator of success.

They are system indicators, not individual verdicts.

However, when we confuse the measure with the goal, we begin using them in ways they were never designed for. For more on the purpose of assessment, check out this past blog post: Revisiting the Why: The Purpose of Assessment.

Some argue that the solution is to reduce or eliminate testing altogether. Unfortunately, removing the measure would not fix the underlying problem, and it would create new ones.

Without shared measures:

  • systems lose visibility into student learning trends
  • local assessments multiply (increasing overall testing)
  • inequities become harder to detect
  • decisions rely more on anecdote than evidence
  • new and different data collections will be created to provide visibility for policy makers

The issue is not the existence of the tests. The issue is what we’ve turned testing into.

This is where the conversation needs to shift.

The problem is not the students, and it’s not even the tests themselves. It’s the way adults respond to those measurement tools.

When measures are mistaken for goals, when pressure is applied in ways that begin to distort instruction, and when results are used far beyond their intended purpose, the entire system starts to shift around those decisions.

Students feel that pressure, even when it isn’t explicitly stated. Teachers adapt their instruction in response to it, often in ways that prioritize the measure over real learning. Over time, the data itself becomes less meaningful, because it no longer reflects what we actually set out to understand: whether students are being taught what they’re supposed to be learning.

What begins as a tool to illuminate learning slowly turns into something else entirely. Not because of the test, but because of the weight we ask it to carry.

This is exactly where the Compassionate Assessment Framework (CAF) becomes critical.

Goodhart’s Law doesn’t just expose a technical issue. It exposes a human one. Specifically, how adults think about, talk about, and respond to assessment and testing. At the center of this is Adult Attitudes and Beliefs.

When adults begin to treat test scores as the goal rather than a signal, their decisions (often unintentionally) reshape the entire system. Pressure increases. Instruction narrows. Opportunities shift. Over time, those behaviors become normalized.

Students feel it, even when nothing is said explicitly. Teachers respond to it, often in ways that prioritize the measure over real learning. Gradually, the system starts optimizing for the number instead of what the number was meant to represent.

CAF helps us interrupt that pattern by bringing attention back to what is actually within adult control.

It starts with noticing.

  • Where are we optimizing for scores instead of learning?
  • Where are we applying pressure that changes behavior in unintended ways?
  • Where are we using test results beyond what they were designed to tell us?

These are the moments where snake breeding begins. Not because anyone set out to do harm, but because the system quietly rewards the wrong things.

The solution is not to eliminate measurement altogether; the information matters too much. The solution is to reconnect the measure to its original purpose and realign our actions around the goal.

That means keeping the focus on the quality of instruction throughout the year, not just performance during testing. It means using multiple forms of evidence to understand student learning, instead of asking a single score to carry more weight than it was designed to hold. It means being clear, consistent, and transparent about what tests can and cannot tell us.

Most importantly, it means shifting how we, as adults in the educational system, respond.

Because no matter how many times we redesign a test, change the format, or adjust the scoring, if we continue to chase the target instead of the goal, we will simply find new and more creative ways to breed snakes.

Compassionate Assessment doesn’t remove measurement. It restores it to its proper place—so that learning, not the measure, remains the goal.

If we ever forget which one matters more, test scores or student learning, Goodhart’s Law will remind us.

Sources

  1. Goodhart’s Law Wikipedia page: https://en.wikipedia.org/wiki/Goodhart’s_law
  2. Goodhart, Charles. Monetary relationships: a view from Threadneedle Street. University of Warwick, 1976.

Copyright Metimur LLC, 2026

