One of my all-time favorite professional memories starts with me sitting at the Northeastern Educational Research Association's Annual Conference in 2014. Kurt Geisinger, esteemed Director of the Buros Center for Testing and the Meierhenry Distinguished Professor at the University of Nebraska, was being interviewed in a session that was somewhere between a keynote address and a fireside chat. One of the first questions asked of him was about his career in assessment, and before he addressed the question, Kurt stated (I'm paraphrasing here, but I don't think I'm far off):
Well, "assessment" is a psychological term. It refers to a process whereby someone gives a measure, there is a process of feedback and discussion, and then there are recommendations for interventions, next steps, etc. Most of us here work in "testing," where we focus on the measuring of things. We simply use the term "assessment" because people don't like tests.
Before I go ANY further, I'd like to take this opportunity to say, quite plainly, that Dr. Kurt Geisinger is one of the great people I've met in my life. Despite having more academic and professional accomplishments than I could ever imagine, he's the type of person to always know your name, even when you don't feel that your station warrants someone of his stature spending a brain cell on such a datum. He's the type of person to remember where you live and work, how you were the last time you spoke, and perhaps most importantly, care about how you are doing. I've been fortunate enough to sit down to breakfast several times with Kurt, and each time I walk away, I feel honored to have had the discussion. He's also utterly brilliant, has worked in almost every aspect of educational testing you can think of, and shepherded the growth of countless researchers, both formally and informally. If you take anything away from this post, I hope it's how wonderful Kurt is...
But getting back to his comments: This was a moment of significant reflection for me on two fronts. First, Kurt was right with regard to many of my colleagues in NERA and similar measurement circles. Many people I know who carry the title "psychometrician" are very much concerned with measuring things well, but rarely consider those other aspects of the individual test-taker. How are they interpreting results? How are those results used for feedback and intervention? How might we move to a stage focused on improvement?
While this isn't my cup of tea, I'm not indicting psychometrics as a whole, nor those who choose to pursue it. It's a fundamental, yet often overlooked, aspect of education. After all, you can't focus on feedback and intervention if you haven't measured the thing properly in the first place. However, it did make me realize how much the field of educational measurement was indeed focused on testing and not "assessment," as Dr. Geisinger put it.
The second reflection was introspective. In my first post on this blog, I discussed the moment when I knew I wanted to work in the area of noncognitive skills. Yet, this moment at NERA was a solidification of where I wanted to fit in the assessment puzzle. I certainly appreciate, and at times dabble in, the measurement aspect, but I'm much more interested in the interpretation and use of data. After all, that's what DIA stands for: "data to information to action." It seeks to be the embodiment of the process Kurt had mentioned.
But the real reason for this post is the clarification about this word "assessment" and what it means. It's a word that gets thrown around a lot, whether you're talking about state-level tests in K12, gathering evidence of student learning in higher education, or taking the Myers-Briggs on your latest corporate retreat (my condolences on that one). Thus, the crux of this post is really about breaking down this term, focusing on its particular application in higher education student learning outcomes assessment, and making some recommendations for moving forward.
Measurement, testing, assessment, and evaluation
The technical distinction between these terms is a classic lesson in descriptive linguistics. You can find many texts to cite if you wanted a source, but I don't think anyone prescriptively defined the terms before educators and psychologists started essentially agreeing upon some definitions and distinctions. (And, don't worry, I'll explain the importance of hashing these terms out in a minute...)
First there is measurement, which is simply the act of ascribing a value to something. This happens all the time in daily life. Weighing a cup of sugar is ascribing a value (e.g., "one cup" or "eight ounces" or "about 200 grams") to an amount of sugar that inherently had no numeric value to begin with.
THAT part seems easy. But what about when you want to place a value on something like college readiness? This is why psychometricians exist: we don't have a magic "measuring cup" that can reach inside your brain and ascribe a value to these latent constructs. Thus, psychometrics is the science of how we define something, ascribe measures (e.g., test items) to that definition, and ensure that those things are working in harmony.
(Full disclosure: my description of testing and assessment below is technically my own, based on my read of several sources and years of experience and observation. I'm not claiming intellectual capital, simply saying I don't have a great resource to point you to, and would welcome debate on the topic.)
Testing refers to a dedicated moment of measurement. While measurement happens in lots of fields, testing carries special emphasis in education and psychology. For example, if an economist or econometrician wants to measure market growth, they identify several indicators to represent the growth of the economy. However, they don't have to develop those indicators (i.e., test that economic factor) themselves.
But if you want to measure someone's college readiness, it's a little different. You could, in theory, take an economic approach: simply aggregating high school grades would be an econometric-esque approach to the task. But since college readiness is a latent (i.e., unobserved) construct, with no standard unit of measurement like dollars or grams or inches, testing becomes necessary. (See note 1 below for more information on grades.)
Defining assessment is important because the term has come to take on two meanings. First, as mentioned above, many have used it as a euphemistic blanket term for all testing, when true individual assessment involves a level of testing, feedback, and intervention that few people will genuinely experience in their lives. For most of us, we take a test and either never receive individual results (e.g., the Census, some state-based accountability measures in K12) or receive a very limited amount of score information. Most graded assignments, even with the healthiest dose of red pen, wouldn't achieve the level of information and support that would deem them true assessments in my eyes.
Assessment in the aggregate
The second use of the term "assessment" can create confusion, but also allows a segue into some points about aggregate uses of data. In this case, assessment refers to a process of looking at data to determine effectiveness and guide decision making. The "assessment and measurement" program I attended at James Madison University focuses on this aspect of assessment in higher education, where the use of student learning data has increased dramatically over the last several decades.
In this case, rather than being a process that focuses on feedback and intervention at the individual level, it occurs across a course, program, institution, or other group. What differentiates assessment from testing in this case is the establishment of outcomes, the consideration of data against those outcomes (i.e., did we achieve them or not?), and the decision to maintain or change course as a result of that consideration.
Here, assessment exists alongside two terms that most people know, but often easily confuse with assessment in this aggregate sense. Evaluation looks much like assessment, with outcomes, review, and decision making an inherent part of the process. However, evaluation usually has a summative conclusion attached. A program will be certified or not. Funding will continue or it won't. Because of this, evaluation is typically conducted by an external person, while assessment is generally conducted by or in collaboration with someone internal to the group being measured.
One final note on research. Research, as defined by the Department of Health and Human Services, is "a systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge." What's really important here is the part about generalizable knowledge. Whereas assessment and evaluation are generally intended to inform either the group being measured or those making decisions about them, respectively, research is intended to inform everyone through media such as conference presentations, peer-reviewed articles, etc.
What does this mean?
Ok... so why is all this important? Why have I taken us down this rather lengthy lexical road? Well, I think there are a few important issues that result from people - both within and outside of the worlds of assessment and testing - confusing and confounding some of these terms. This creates a lot of ire and backlash that is not always appropriately focused.
For example, many people raged against "The Common Core" as evil (or worse... see Note 3), without understanding what the Common Core actually is. It's a set of standards, or outcomes, that students should reach. It is not a test - tests are designed to measure if students achieved those standards. Nor is it a dictation of curriculum - curriculum is what you teach students so they can achieve those standards. Most people don't know this simply because they don't have a full understanding of measurement, testing, assessment, and evaluation.
So, implication #1 is that this knowledge of what assessment truly is can be helpful in debates about education, certification, and other areas where education and psychology meet public policy.
On a personal note, it's also helpful for people like me when at dinner parties. When I worked at a large standardized testing company, I would try and explain what I did for a living, and people would say, "Oh, so you write the questions for the SAT?" Well, as a former colleague once put it, there are engineers who design the cars and factory workers who build them. Psychometricians are more like the engineers and assessment developers are more like those front-line folks. It's just a very different type of work, and a process that's far more complex than most people understand.
Implication #2: I would love to see more of my colleagues focused on the broader implications of assessment rather than just "testing." I attended a conference in England a few years ago where I was tasked to learn more about the international "assessment" scene in higher education. I was fascinated to learn that, whereas in the U.S. we talk about "assessment in higher education" as measuring learning across groups of students, in the U.K. it was all about how to give feedback to individual students. This was an entirely different embodiment of the word that my colleagues and I rarely, if ever, considered. Conversely, my newfound colleagues in the U.K. were very rarely considering the measurement issues that we in the states confront regularly.
I know lots of (some would say too many) psychometricians. It's incredibly difficult work. But I don't know many people who focus on score report development, or on examining the effectiveness of interventions on improving student learning tied to testing. If we're truly going to use the word "assessment," I think we should give these aspects of the process more weight.
Notes
1. Simply put, grades fail to meet many of the basic criteria for a standardized measure. First, while many of us think of the A/B/C/D/F scale, many schools operate on different metrics entirely. Second, even if we did have a common metric, teachers can use different criteria or weight the same criteria differently to create a grade. This is why places such as ACT and the College Board recommend combining grades with a standardized measure: you get to balance the "authenticity" of classroom grades with the standardization and reliability of a dedicated measure. For an interesting review of how these two contribute to measuring college readiness, see the Willingham et al. study cited below.
2. I've got a future post coming on the relationship between assessment and research. It's usually a relationship full of misunderstanding and excessive paperwork, but I'd already diatribed enough in this post and didn't want to go further off the rails.
3. I googled "common core evil" and it took me to a dark, very uninformed place. If you've never read a comments section, don't. Whatever you want to find, the internet has it, especially if it's animosity.
References
Erwin, T. D. (1999). Assessment and evaluation: A systems approach for their utilization. In S. Brown & A. Glasner (Eds.), Assessment matters in higher education. Buckingham, United Kingdom: Open University Press.
Erwin, T. D. (1996). Assessment, evaluation, and research. In S. R. Komives, D. B. Woodard, & Associates (Eds.), Student services: A handbook for the profession (3rd ed.). San Francisco: Jossey-Bass.
Willingham, W. W., Pollack, J. M., & Lewis, C. (2002). Grades and test scores: Accounting for observed differences. Journal of Educational Measurement, 39(1), 1-37.