Does "Fair" Have to be Boring?
I’ve been reflecting on the 2020 NCME Presidential Address given by Dr. Steve Sireci, as several of his ideas merge nicely with my recent musings. Large-scale assessment has been so focused on standardization, comparability, and freedom from bias that I believe our tests no longer draw out the best of student abilities. And, in many cases, isn’t that what we want to know? That is, what is the best work this student is capable of?
Steve talked about choice and about eliminating the assumption that groups differ in their capacity to learn. I want to dive a little deeper and push on the ideas of choice and cultural relevance.
A year ago, I was approached by a Black state PTA president, also a mother, who accused me (and my fellow psychometricians) of focusing so much on eliminating bias from testing that the resulting tests were so dry and boring her sons could not engage with them. Although the conversation was friendly, the accusation stuck with me, because I believe she is correct. And isn’t that dryness a source of bias in and of itself? What if, instead, we developed tests that allowed students to choose the theme, context, or passage topic as they went along? Yes, it would require a huge item bank, and my fellow psychometricians would want everything pre-equated, but putting aside those concerns for a moment, consider the possibilities.
Most of our testing of younger students is not high stakes in the sense of admission into a particular school. I do believe that as individual stakes increase, comparability must also increase. However, we also need to create tests that allow students to present their best thinking, and engaging students is a critical element in obtaining that best thinking. If our question is how well students can analyze the theme of a narrative and understand how the author uses specific words and phrases to convey that theme, does it matter what the topic of the story is? Furthermore, does everyone’s topic have to be the same? Ideally, we want the complexity of the narrative and the difficulty of the questions to be equivalent across topics, but psychometric methods such as equating can adjust for residual differences in difficulty. Simply allowing the student to choose the topic before seeing the passage and questions should already increase engagement.
Steve cited the work the College Board did on AP testing, in which students were allowed to review the prompts and choose the one they wished to write to. The findings showed that students who were given a choice performed worse overall than students who were randomly assigned a prompt, and the effect was more pronounced for students from lower socioeconomic backgrounds. Steve suggested we dismiss that research as being 20 years old and drawn from a high-stakes setting; however, there is a kernel of it I think needs to be preserved. When surveyed, students indicated they had selected the prompt they thought would be easiest to respond to, not the one they found most interesting. By asking students to choose only a topic, rather than letting them spend time previewing every passage and question, we can eliminate that strategic motivation and still provide the student with the context they prefer.
A colleague of mine, Laura Kramer, did an interesting study while I was at the University of Kansas that I don’t believe she ever published. In short, she and her staff (including graduate students) wrote three versions of a math test for students in a career and technical education pathway. The first version was essentially context-free: word problems drew on a variety of contexts, and rules were followed for universal design and inclusion. The second version put the problems in an agricultural context, framing every word problem around topics aligned with what students in an agricultural pathway studied. The third version embedded the word problems in a manufacturing context, again aligned with what students in that pathway learned. All other aspects of the test were equivalent, including the numbers and operations in the math problems and the item order. The three forms were then randomly assigned to students in either the agricultural or the manufacturing pathway. The findings were much stronger than expected. Students in the agricultural pathway did much better on the agriculture-context version than on the other two forms. Likewise, students in the manufacturing pathway did better on the manufacturing-context version than on the other two forms. Mathematics should be the subject with the least dependence on context, and yet the findings indicate that context still matters.
What if any test started with a survey of student interests and items were pulled to match those interests? What if students had a choice of what essays to read or what topics they wanted to write about? Would we see some of the group differences disappear? Teachers provide a lot more choice in instruction than assessment professionals do in testing. Maybe it’s time we caught up.