Effective, Fair, and Efficient: Using Multiple Choice in the Humanities

Photo by Pixabay on Pexels.com. Image description: A black and white photo of a room with seven identical white doors. Large black and white patterned wallpaper covers the wall around the doors.

Presented as a workshop at the American Association of Philosophy Teachers (AAPT) “How We Teach” conference, July 14, 2021

Jennifer Szende

In July 2021, I led a virtual workshop session on multiple choice testing in philosophy as part of the AAPT series on “How We Teach”. This is an adapted version of the handout I distributed with the presentation. Multiple choice testing is sometimes dismissed as too easy for students, too open to dishonesty, or too difficult for instructors to design. Here, I give an argument in favour of multiple choice testing, respond to some concerns, and offer some tips and best-practice resources for effective multiple choice testing in philosophy. Much of what I say here will be relevant to other academic disciplines and other testing scenarios in addition to academic philosophy.

Why use multiple choice?

There are many good reasons to include multiple choice within a balanced assessment portfolio. I focus on effectiveness, fairness, and efficiency.

Effectiveness: Multiple Choice questions can be an effective way to assess the learner’s ability to recall, understand, apply, analyze, and evaluate. Standardized tests typically use case studies and sight passages to assess students’ understanding, application of concepts, analysis, and evaluation of novel information. So, to an extent, many of us are familiar with multiple choice testing that is designed to assess skills beyond information recall. One frequent objection to those types of tests is that they assess ‘test-taking ability’ or ‘familiarity with the test format’ rather than assessing analysis or understanding. Keeping this worry in mind, I have tended to design my tests as open book and without a time limit. I make them open book because I am very happy to build formative assessments that force students to look at the course material in a new light, and that give students the opportunity to examine what they find in that new light. If they read a passage for the first time, or re-read it, in order to answer the question, the test has done its job.

Fairness: Sometimes, assessing students can be unavoidably subjective. For essays and presentations, the bias and subjectivity are mostly located at the stage of marking or assessing student work, with some bias and subjectivity located in the design of the assignment. Rubrics can help standardize the subjectivity across students (thereby increasing fairness), but a level of subjectivity remains. Think of cases where TAs and instructors calibrate against each other’s ‘A’ paper, ‘B’ paper, and ‘C’ paper, or cases where students appeal a grade by comparing marks and assignments with other students in the class. For multiple choice, the subjectivity of assessment is located at the stage of writing questions, rather than at the stage of marking questions. As a result, the subjectivity and bias are more fairly distributed across all test takers (Loftis 2019). I have taken Rob Loftis’s advice seriously, and have taken to offering students a space in which to explain their answer. I don’t read the explanations attached to correct answers, but I find I am often able to give partial or full credit to students who misunderstood the question but demonstrate understanding of the material; at other times these responses help me to recognize and rectify (with full credit) questions that were unintentionally ambiguous.

Efficiency: A multiple choice test allows you to assess a large scope of material in a relatively short assessment, and it is easy to mark, even for large and online courses. See David DiBattista’s argument here. In some cases, the Learning Management System (LMS) or scantron system can be used to mark the test automatically, or to mark it pending instructor review and approval. In particular, the cognitive burden of marking is reduced. Reducing the cognitive burden of marking is no small feat, even if much of the cognitive burden shifts to the stage of test design. When I have large (90+ student) classes, these tests allow me to save some of myself for other types of student engagement and assessment. In the case of LMS tests with automatically generated feedback, it is possible to give learners immediate feedback, so that the student gets an explanation of the correct answer right away.
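To make the auto-marking point concrete, here is a minimal sketch, in Python, of what an answer key with per-question feedback lets a system do: mark each response and return an explanation immediately. The question IDs, answers, and feedback strings are hypothetical placeholders, not the interface of any particular LMS.

```python
# A minimal sketch of automated marking with immediate feedback.
# All question data below is a hypothetical placeholder, not any LMS's API.

answer_key = {
    "q1": {"correct": "b",
           "feedback": "Option (b) paraphrases the author's thesis; the others restate objections."},
    "q2": {"correct": "d",
           "feedback": "The example commits an ad hominem: it targets the speaker, not the argument."},
}

def mark_test(responses):
    """Return a score and an explanation for every question, right or wrong."""
    score, feedback = 0, {}
    for qid, info in answer_key.items():
        if responses.get(qid) == info["correct"]:
            score += 1
            feedback[qid] = "Correct. " + info["feedback"]
        else:
            feedback[qid] = "Incorrect. " + info["feedback"]
    return score, feedback

score, feedback = mark_test({"q1": "b", "q2": "a"})
print(f"Score: {score}/{len(answer_key)}")
for qid, note in feedback.items():
    print(qid, note)
```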

So, why would philosophers avoid using MC?

It’s too difficult for the instructor! Constructed-response questions are much easier to produce (DiBattista and Kurzawa 2011). The easiest multiple-choice questions to produce assess information recall, and many teachers aren’t interested in assessing information recall. Genuinely challenging, formative multiple-choice questions, especially those that assess understanding, analysis, application, or evaluation, can be difficult and time consuming to write.

  • Practice writing questions in a variety of styles, for a range of skills.
  • Use some of the question-writing tips offered here or in the further resources linked below.
  • Pace yourself throughout the term. Write 1-3 questions per week, or per lecture. Schedule time to write questions after each lecture, when the material and discussion are fresh in your mind.

Multiple Choice is too easy for my students, or too low on Bloom’s taxonomy (DiBattista and Kurzawa 2011; DiBattista 2008; Loftis 2019). Many instructors worry that students will just use a search function to find the answers. The solution is to design the test and write the questions with this worry in mind.

  • First, ask yourself: “What is the purpose of the test?” Are you assessing whether students have attended lecture/read the material? Whether they have understood the material? Whether they can apply a concept to a novel situation? It might turn out that you want to assess information recall in a particular instance. But, if so, it might be an appropriate occasion on which to set a time limit (with appropriate extensions for students who need it), or it might work best for an in-class test. If, however, you want to assess understanding, analysis, or application, remove the time limit and design questions to be open book. Invite students to take the time to look up the answers. You may wish to use paraphrasing to avoid searchable terms. Alternatively, you may actually choose to have your students look it up, perhaps using a search function. If they haven’t reviewed the material very closely yet, maybe the test is a good way to get them to read key passages.
  • MC can be formative, medium to high on Bloom’s taxonomy, and can provide a valid measure of student achievement.
  • Skills that can be tested with MC: recall, understanding, application, analysis, and perhaps evaluation (Loftis).

Academic dishonesty. Lots of worries arise on teaching forums about students paying someone else to write the test, working together, or copying each other. If that is your worry, design with it in mind. But also, learn a bit more about triggers of academic dishonesty, and try to design your evaluation to avoid these.

  • Again, consider: ‘What is the purpose of the test?’ Choose an appropriate assessment strategy for the thing being tested. Multiple choice tests can be formative, and the purpose of testing might be to familiarize students with key concepts. The process of looking up the answer and reading through the questions might be exactly what you want to test. Consider giving students explicit permission to work on these questions together, such as an unmarked fill-in-the-blank: ‘I worked on this test with the following person/people…’.
  • Use low-stakes multiple choice testing. Frequent (open book?) tests worth 2-5% with the lowest marks dropped are less likely to make students feel under pressure than one-time exams worth 30-40%.
  • Use randomization. Learning management systems such as D2L/Brightspace, Blackboard, or Canvas allow multiple forms of randomization in testing (see the sketch after this list). Build question ‘pools’ or ‘banks’ with a larger number of questions on each topic than will appear on the test. The LMS will randomly generate a set of questions, and will randomize the order in which they appear for each student (within parameters set by the instructor or test designer). The LMS can even randomize the order of the options within each multiple choice question, which encourages closer reading of the question.
  • Consider using untimed and/or open-book tests. Design a test that will require looking up (some? most?) answers, and give students time and permission to do so. If the test is designed to be open book, looking up the answer will not constitute cheating.
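To give a sense of what the LMS is doing behind the scenes, here is a minimal sketch, in Python, of pool-based randomization: a fixed number of questions is drawn from each topic pool, the question order is shuffled, and the options are shuffled within each question (with the correct answer tracked by its text rather than its position). The pools, stems, and options are hypothetical placeholders.

```python
import random

# Hypothetical question pools: each pool holds more questions than will appear
# on any one student's test. The correct answer is stored by its text, since
# its position will change when the options are shuffled.
pools = {
    "topic_1": [
        {"stem": "Which option best paraphrases the author's thesis?",
         "options": ["paraphrase A", "paraphrase B", "paraphrase C", "paraphrase D"],
         "answer": "paraphrase A"},
        {"stem": "Which case illustrates the key concept?",
         "options": ["case A", "case B", "case C", "case D"],
         "answer": "case C"},
    ],
    "topic_2": [
        {"stem": "Which objection would the author accept?",
         "options": ["objection A", "objection B", "objection C", "objection D"],
         "answer": "objection B"},
        {"stem": "How might Author A respond to Author B?",
         "options": ["response A", "response B", "response C", "response D"],
         "answer": "response D"},
    ],
}

def build_test(per_pool=1):
    """Draw questions from each pool, then shuffle question and option order."""
    drawn = []
    for questions in pools.values():
        for q in random.sample(questions, k=per_pool):      # random subset per topic
            shuffled = {**q, "options": random.sample(q["options"], k=len(q["options"]))}
            drawn.append(shuffled)                           # copy with options reshuffled
    random.shuffle(drawn)                                    # randomize question order
    return drawn

for q in build_test(per_pool=1):
    print(q["stem"], q["options"], "->", q["answer"])
```

A real LMS lets you set these parameters per test in its quiz settings; the sketch only shows the underlying idea that each student sees a different but comparable test.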

General best practices for Multiple Choice:

Image of Bloom’s Taxonomy from Vanderbilt University Center for Teaching. Bloom’s taxonomy ranks cognitive skills. From bottom to top, Bloom lists: remember, understand, apply, analyze, evaluate, create.

Some MC question writing strategies:

What follows are a few question-writing strategies that I have used in the past to generate questions. I keep this list handy when I am trying to generate 1-2 questions each week based on the discussion. I review my lecture notes or PowerPoint slides and any examples discussed in class – especially those raised by students – and try to write a question stem and the correct answer before generating distractor responses, which are also drawn from lecture, discussion, or written material.

  1. Paraphrase, and use the paraphrase rather than quotations in the stem or the multiple choice options:
    • Paraphrase the thesis of an article.
    • Paraphrase definitions for key terms.
    • Paraphrase key objections.
  2. Use key terms and key concepts in multiple-choice questions, but try to use them in novel situations/case studies/examples.
  3. Use comparisons/contrasts/lists drawn from course material or discussions.
  4. What does the example show?
    • Example from the reading: What point is Author making when they use X?
    • Example from the news/ film/ popular culture: What would Author say about X?
    • Example from the news/film/popular culture: Which Author would make which of the following claims?
  5. Who (which Author) would agree with [paraphrase]?
  6. How might Author A respond to Author B’s question/quote/example/concern?
  7. Author A and Author B agree about X.
    • True or False?
    • Which reason would each give for X?

Hack your marking and ease the cognitive load

Amongst the tasks involved in teaching, marking can be particularly exhausting and overwhelming. It is tiring and draining, and the quality of marking deteriorates as tiredness sets in and attention spans wane. Worst of all, marking necessarily comes at the end – at the end of the unit, the end of the course, the end of the semester. It comes at the point when you are most thinly stretched. Marking typically spills over into the time allotted for the next task, so that it is not unusual to have to start preparing the next lecture, unit, or syllabus while still snowed under with a huge amount of marking from the last one.

There are, however, some ways to do a portion of the cognitively hard work of marking ahead of time and thereby make your marking fairer, less taxing, and easier to manage. This post suggests a few ways to ease the cognitive load of marking by doing a significant portion of the decision-making before the first student even submits their assignment.

1. Multiple-Choice

Using multiple-choice questions on tests or quizzes for some part of your assessment portfolio is one way to do the hard work before you get to the marking stage. I have been completely persuaded by Rob Loftis’s excellent paper, “Beyond Information Recall: Sophisticated Multiple-Choice Questions in Philosophy”. Amongst his arguments, Loftis points out that most of the subjective and cognitively difficult tasks involved in assessment get front-loaded in multiple-choice assessments: subjective decision-making occurs at the stage of composing the questions, rather than at the stage of marking each individual answer or response. By the time the students take the test, the cognitively difficult and subjective decision-making part of assessment has been completed. Moreover, exactly the same subjective decision-making is shared equally by every student who answers a particular question. That is, students are not subject to the unfairness of repeated and distinct subjective decision-making processes every time a question is marked.

David DiBattista’s paper “Making the Most of Multiple-Choice Questions: Getting Beyond Remembering” points out that multiple-choice tests make it possible to cover a relatively wide scope of material in a relatively short test. He points out that it is furthermore relatively easy – and quick – to mark the tests, even in the context of large or online courses. Learning Management Systems (such as Canvas, D2L, Moodle, and Blackboard) or optical mark readers facilitate the automated marking of multiple-choice tests, which can nonetheless be tailored to focus on higher-order skills such as analysis, understanding, application, and evaluation. I note that I am also persuaded by Loftis to offer students the opportunity to explain their answers. This does add to the cognitive load of marking, but not substantially. It has helped me catch unnoticed ambiguities in my questions in the past, without penalizing students for them.

The trick, of course, is to make time to compose multiple-choice questions early enough, and at regular intervals throughout the term. But even if you end up composing the test or the set of questions at the last minute, you will still have moved the cognitively difficult, subjective decision-making task earlier in the assessment process, and prior to the ‘marking’ stage. In fact, you will have moved it prior to the student submission stage.

2. Rubrics

A rubric generally details and specifies the assessment criteria for an assignment. An analytic rubric provides specific criteria, and describes levels of achievement for each criterion. From the instructor’s perspective, rubrics enable timely feedback, and make marking more consistent and fair. Preparing a detailed rubric in advance eases the cognitive load of marking. At the marking stage, the grader can replace a cognitively complex and holistic assessment such as “What grade does this essay deserve?” with a narrower and cognitively easier set of assessments. Assigning rubric levels according to set descriptions is a narrower and easier task, especially if the task is repeated for dozens or hundreds of assignments. Descriptive feedback is more easily attached to each assignment, and the process of marking is made easier overall for the marker.
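As a concrete illustration of how that front-loading works, an analytic rubric can be thought of as a small table written entirely in advance: criteria down one side, with level descriptors and points attached to each level. At marking time the grader only chooses a level for each criterion. Here is a minimal sketch in Python, with entirely hypothetical criteria and point values:

```python
# A hypothetical analytic rubric for a short essay. All of the subjective
# decisions live in these descriptors and point values, fixed before marking.
rubric = {
    "thesis": {
        "levels": ["no identifiable thesis", "thesis present but vague",
                   "clear, arguable thesis"],
        "points": [0, 2, 4],
    },
    "use of texts": {
        "levels": ["no engagement with course texts", "texts cited but not analyzed",
                   "texts accurately analyzed and integrated"],
        "points": [0, 3, 6],
    },
    "objections": {
        "levels": ["no objection considered", "objection raised but not answered",
                   "objection raised and answered"],
        "points": [0, 2, 4],
    },
}

def score(selected_levels):
    """Sum the points for the level chosen on each criterion."""
    return sum(rubric[criterion]["points"][level]
               for criterion, level in selected_levels.items())

# Marking one essay is now a series of narrow choices rather than one holistic one:
print(score({"thesis": 2, "use of texts": 1, "objections": 2}))  # 4 + 3 + 4 = 11
```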

From the student’s perspective, rubrics clarify expectations and marking criteria, all the more so if distributed in advance of the assignment deadline. According to Jönsson and Panadero’s chapter, “The Use and Design of Rubrics to Support Assessment for Learning”, rubrics are transparent, make it easier for students to use feedback, facilitate peer assessment, and may reduce student anxiety or support self-regulated learning. The feedback provided by rubrics can be easier for students to process and respond to, and the transparency and clarification of criteria can help students with peer- and self-assessment of their work, which in turn can reduce student anxiety surrounding assignment submissions.

3. Self- and Peer-Assessment

Of course, nothing will ease the cognitive burden of assessment quite like getting someone else to do it! But more seriously, students supporting each other’s learning through guided peer-assessment can provide genuinely helpful feedback in a timely manner, which again eases the cognitive burden related to marking. In this case, it eases the cognitive burden of providing specific, timely, and relevant feedback, which is a key part of the assessment process. I am by no means suggesting students give each other heavily weighted final grades or even unguided holistic assessments of each other’s work. They don’t necessarily have the training or the experience to recognize what makes a paper strong (or weak) on its own merits.

Best practices in peer assessment would use an anonymized and guided system to help students offer each other feedback, and might give them credit for doing so. In order for peer-assessment to front-load the cognitive burden of marking, clear assessment criteria – perhaps a rubric, or perhaps a peer-assessment form – should be prepared in advance. In “The Role of Self and Peer Assessment in Higher Education”, Pérez et al. demonstrate that well-guided peer assessments can align with lecturer assessments of the same work, using the same guided criteria. But even peer assessments that only provide content feedback on a draft can give the student useful and valuable information prior to a final submission, and can ease the cognitive load of the instructor marking the subsequent submission.

Peer assessment can be designed to use rubrics, or can be designed to provide feedback at a mid-point in a scaffolded assignment. In a recent course where I was a student, we were given the opportunity to submit an early draft for peer review (anonymized through Canvas), and only if we completed the peer review process were we eligible for detailed feedback on the final submission from the professor. Whether the peer assessment generates a mark, a completion mark, or merely feedback, it can help provide students with feedback in a format they can use, and thereby reduce the feedback burden of marking.

Guided self-assessments as part of a scaffolded assignment can likewise help students generate feedback and recognize where they have strayed from the stated criteria.

Scaling Up

A note of caution about self- and peer-assessment: they do not scale up as easily as multiple-choice and rubrics. On a large scale, self- and peer-assessment can still be used to help students provide feedback to each other, can be part of a scaffolding process, and can allow students the opportunity to learn from each other. But, when used to generate marks, self- and peer-assessment can be more easily manipulated in ways that might violate academic integrity. Accordingly, my suggestion would be to use peer- and self-assessment to facilitate students providing each other with anonymous feedback, and to recognize that work with a small completion mark worthy of the effort.

Multiple-choice and rubrics, on the other hand, scale up very well. They make assessment fairer and less subjective at the level of the individual assignment, and they ease the cognitive load at the final stage of marking.