Are Tests Measures of Test Taking Ability?

In a recent discussion of my book, A Measure of Failure, the typical argument against any critique of standardized testing was issued in response to a favorable review of the book’s main points. In the comments we read: “A math test, such as the math portion of the SAT for instance, most certainly measures a student’s ability to do the math problems on the test. It is impossible to do well on such a test without the underlying skill that is required to do the math.” It seems hard to argue with this.

But the English language does not help the discussion of measurement, as “measure” can signify both a standard and the process of applying a standard for the purpose of measurement, assessment, or comparison. Not all applications of standards produce measurements. Applications of legal standards, for example, do not yield measurements of criminality. So, to say that a test is the best available measure may be true if by measure one means the prediction of some performance. But prediction and measurement are not the same thing. Measurement is a very specific thing: a claim that a mathematical system corresponds with the phenomenon of interest. This is the criterion of being isomorphic. Standardized tests do not meet that criterion. Nor do they identify a precise object of measurement. Thus, claiming that one must have real knowledge of mathematics to perform “well” (high rank performance) on some math test is not the same as claiming that the math test produces a measurement of math ability. Of course one must have some related skills and general intellectual development to engage with the test in a way society deems valuable. But the outcome of that exercise does not constitute a measurement.

In the course of the discussion, it was argued that test scores are at least measures of test taking ability. My claim is that tests currently in use do not meet the criteria of measurement, and that this fact is hidden, covered over, but in reality, known to psychometricians. My claim is that these tests do not produce measurements of any kind (Walt Haney tried to convince me that they are “weak” measures, which created new problems). This is why I go to great lengths to distinguish between assessment and measurement. Standardized tests are obviously tools for making assessments. They’re just not measurements, and my claim is that this distinction is very significant.

I suppose that part of what is troubling about my argument is my strict use of the word measurement. So, for example, I would agree that a score on a standardized test is a “useful indicator” of how proficient a person is at taking standardized tests in general, but I would object to someone calling that score a measurement of test-taking ability. Creating an index, a Likert scale, etc., with the aid of numbers may provide “useful” information, and even allow that information to be treated statistically (e.g., 75% of Americans are opposed to the Iraq war), but the mere assignment of numbers to something in this manner does not in itself constitute measurement. Again, I maintain that the distinction is significant; it is significant that politicians and policy experts routinely call things measurements when the results do not meet the criteria of measurement.
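
To illustrate the point about assigning numbers, here is a minimal sketch in Python. The two groups, their responses, and the alternative coding are all invented for illustration; nothing here comes from an actual survey or test. It compares the average response of two groups on a five-category scale under two codings that both keep the categories in the same order.

```python
from statistics import mean

# Hypothetical responses on a five-category scale, coded 1 to 5
# (invented data, for illustration only).
group_a = [1, 1, 5, 5]
group_b = [3, 3, 3, 4]

# An alternative coding that keeps the five categories in the same order
# but changes the spacing between them (the top category is coded 10).
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}


def recoded(responses):
    """Apply the alternative, order-preserving coding to a list of responses."""
    return [recode[r] for r in responses]


# Means under the original 1-to-5 coding.
raw_means = (mean(group_a), mean(group_b))

# Means under the alternative coding.
recoded_means = (mean(recoded(group_a)), mean(recoded(group_b)))

print("1-to-5 coding (A, B):     ", raw_means)
print("alternative coding (A, B):", recoded_means)
```

Under the 1-to-5 coding, group B has the higher average; under the alternative coding, group A does. Nothing about the responses changed, only the numerals attached to the categories. Which group “scores higher” depends on an arbitrary choice, which is one sign that the numbers classify and rank responses without measuring anything.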

The claim to measurement is made because it enables one to make claims about the origin of social trends. During the rise of intelligence testing, the claim that intelligence was being measured (even though it was known to be a mere classification) enabled reformers to link school performance to what they postulated as variation in intellectual ability (and not to ineffective teaching, instruction in a language not spoken by students, or a vapid curriculum). Today, the claim to measurement is required to argue that “teaching ability” or “teaching effectiveness” is the cause of various social trends. No serious scientist believes that student performance on any academic test constitutes a measurement of teaching effectiveness. And today, even though it is well established that it is “normal” for individuals to vary in their rate and depth of learning any content or skill, the useless slogan “all children can learn” is shouted by reformers as if it represents the noblest aspirations of humanity. Even if social inequality were drastically reduced, individual (not group) performance on any valued task, whether intellectual, social, or physical, would vary widely (and this in and of itself is not a social problem).

Finally, as seems to be common when anyone presents a challenge to standardized testing, the aim of “throwing out the tests” is imputed to critics. My book is quite clear that eliminating standardized testing as we know it, while leaving all else intact, would do little good and produce more harm. But blocking the use of high-stakes tests would be a positive move. And as for being pegged as an anti-tester, I’m the only one (I think) to critique the critics who say standardization is “bad”; again, my aim is to analyze these concepts and structures as they are rooted in definite social and political systems. Standardization in political terms is an advance, and part of the progressive notion of equality. In fact, the tendency now is to undermine, blow off, and ignore standard psychometric procedure (reliability, validity, etc.), and this is destructive and reflective of the larger trend of those in positions of power to act with impunity. As Gene Glass notes, most states don’t even produce the most basic test validation data.

But the actual point is that the standards adopted by a social system change as the system changes; the point is that the fight over standards is a political fight. By political I do not mean to refer narrowly to political parties; rather, I refer to the process by which a society decides who gets what, when, where, and how. Educators can’t wish away this political feature of standards. My argument ultimately says that in order to address the flaws of standardized testing and of policy that relies on testing, you have to address the major flaws of the present social system that are reflected in those tools and policies. The failure of “authentic assessment” is as much a political failure as a technical one.

