Multiple choice

Multiple choice (MC, /ˈmʌltɪpl̩.tʃɔɪs/) or, in German, multiple selection (Mehrfachauswahl), also called the answer-choice procedure, is a questioning technique used in examinations, tests, and surveys in which several pre-formulated answers are offered for each question. Note that in English, multiple choice strictly means one valid answer out of several, so the German usage is a false friend: the English sense corresponds to what German calls single choice, while questions with several valid answers are called multiple response in English.

Multiple choice is a forced-choice format, in contrast to a free-response format. Such questions are also called closed questions, in contrast to open questions, for which the test person has to write a free answer. Individual tests or surveys often combine both types of question.

Different formats and terminology

In some disciplines, the term "single choice" (SC) or single selection is distinguished from it: single choice denotes questions for which exactly one answer is to be selected, whereas under this definition more than one answer can be selected in "multiple choice". In many contexts the convention is that only one answer can be correct or chosen at a time. In principle this must be stated in the instructions; in some cases it is considered so self-evident that it is not mentioned explicitly, for example in exams at schools and universities in the USA or Australia.

The following selection formats exist for test items with n possible answers and k ≤ n correct answers, i.e. with n − k inapplicable distractors:

Multiple choice: Select

a known number of k answers apply

Single choice or single selection: Choose

one answer ( k = 1) applies

Binary question or decision question

one of two dichotomous answers ( k = 1, n = 2) applies: true / false, yes / no ...

Multiple choice: check

an unknown number of answers ( k ≥ 0) apply

at most a known number of answers ( k ≤ c ) applies, e.g. in the extreme case one ( k ≤ 1) or all but one ( k ≤ n −1)

at least a known number of answers ( c ≤ k ) applies, e.g. one ( k ≥ 1)

at least and at most known numbers of answers ( c ≤ k ≤ d ) apply, e.g. one to all but one ( c = 1, d = n −1)

more answers apply than not ( k > n/2 ) or vice versa ( k < n/2 )

Forms

Multiple choice
Which answers are correct? ☐ Answer 1 ☑ Answer 2 ☑ Answer 3 ☐ Answer 4

Single choice
Which answer is correct? ☐ Answer 1 ☐ Answer 2 ☒ Answer 3 ☐ Answer 4

Single choice
Which answer is correct? ○ Answer 1 ○ Answer 2 ● Answer 3 ○ Answer 4

Example: single choice, colored

In electronic forms and GUIs, it is common to display single choice with round boxes and multiple choice with square boxes. Instead of a cross, a tick or a similar mark can be used. Alternatively, on touch screens or on television (see quiz programs such as Who Wants to Be a Millionaire?), the selected answers, and during the evaluation the correct and incorrect answers, can be indicated by text and background colors, frames, or other styles.

Unambiguous answer choice
Are the answers correct? Yes No ☐☒ Answer 1 ☒☐ Answer 2 ☒☐ Answer 3 ☐☒ Answer 4

To distinguish unselected answers from unprocessed ones during the evaluation, two boxes are provided for each answer: one for “applies” / “yes” and one for “does not apply” / “no”. The item thus becomes a group of decision questions that share the same question.

Correction
Which answer is correct? ☐ Answer 1 ▣ Answer 2 ☒ Answer 3 ☐ Answer 4

On paper forms, a completely filled box can be counted as a correction and thus as a box that has not been ticked. On the other hand, some automatic evaluation methods expect filled boxes instead of crosses to mark the answer.

Complete answer set
Which blood group is most common in Germany? ☐ 0 ☐ A ☐ B ☐ AB

The given answers can cover all possible answers completely or offer only a selection. Sometimes complete coverage is achieved indirectly by including an answer such as “none of the other answers apply”.

Scales and matrices

Soft scale question: odd number of options
☐ very satisfied ☒ satisfied ☐ undecided ☐ dissatisfied ☐ very dissatisfied

Hard scale question: even number of options
☐ very satisfied ☒ satisfied ☐ dissatisfied ☐ very dissatisfied

If the answer options represent different grades of an evaluation (e.g. "very satisfied" to "very dissatisfied"), of which exactly one must be selected, social research speaks not of multiple choice but of a scaled question procedure.

Since MC questions in social science research opinions rather than test knowledge, the last answer option is often “don't know” or “no answer”, because test persons otherwise often feel obliged to tick some box.

In special applications, crosses must be placed in a matrix, which allows more combinations of answers.

Two-stage test tasks

Until a few years ago, medical degree programs used a multiple-choice format in which several statements are presented first, any number of which can apply. This is followed by the actual question, for which only one answer is correct.

Example
Statement 1 Statement 2 Statement 3 Statement 4 ☐ No statement is correct. ☐ Only statement 4 applies. ☒ Statements 1 and 2 apply. ☐ Statements 1, 3 and 4 apply. ☐ All statements are correct.

With five answer options, the example is slightly more complex than a single-choice question over the four statements, but significantly less complex than free combination: if any subset of the four statements could be marked, including the edge cases that none or all of them apply, there would be 16 possible answer patterns. Even with the restriction that exactly one or two statements apply, there would still be 10 patterns. The reduction in complexity therefore makes correction and evaluation considerably easier. In the example, the answer options are sorted in ascending order by the number of applicable statements, but this is not required.
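The pattern counts quoted above can be checked with a short Python sketch. The function names are chosen here for illustration and do not come from the source; the sketch only assumes that an answer pattern is a subset of the n statements.

```python
from math import comb


def free_patterns(n: int) -> int:
    """Number of answer patterns if any subset of n statements may be marked,
    including the edge cases 'none' and 'all'."""
    return 2 ** n


def restricted_patterns(n: int, allowed_counts) -> int:
    """Number of answer patterns if the number of marked statements
    must be one of the values in allowed_counts."""
    return sum(comb(n, k) for k in allowed_counts)


print(free_patterns(4))                 # 16
print(restricted_patterns(4, [1, 2]))   # 10
```

The five pre-formulated options of the two-stage format replace these 16 (or 10) patterns with a plain single-choice decision.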

Evaluation of test performances

Evaluating MC tasks fairly is not trivial, and careless scoring easily leads to unfair judgments.

This is most evident in a test with only two answer alternatives per question (“applies” or “does not apply”). If a correctly placed cross earns a point but no point is deducted for an incorrectly placed cross, a test person without any knowledge who simply ticks the first box every time achieves on average 50% of the achievable points and is thus certified a “sufficient” or a pass under common grading schemes. Candidates who are examined on similar questions in an exam without MC are clearly at a disadvantage.

Nevertheless, MC tests are sometimes evaluated in this way in practice, and therefore incorrectly. Exam results obtained in this way are one to two grades higher than conventionally obtained results on the German scale from 1 (best) to 6 (worst); a four obtained in this way corresponds, for example, to a six, i.e. no verifiable knowledge at all.

In some cases, where the problem is recognized but the mathematical relationships are not, the pass mark is set at a flat 60% regardless of the number of answer options. This procedure is also incorrect, except for exactly 5 answer boxes per question (see below).

SC rating

If exactly one of the alternatives offered is correct and all the others are incorrect, the easiest route to a fair evaluation is to deduct points (malus) for incorrect crosses: one point per question when two alternatives are offered, half a point for three alternatives, a third of a point for four alternatives, and so on. Unanswered questions and questions in which more than one cross was placed are not scored: no point is given and none is deducted. So that respondents can always avoid a deduction by leaving a question unanswered, at least two alternatives (“applies” and “does not apply”) should always be offered. Instructions such as “mark the correct statements” should generally be avoided.

Accounting for the statistical effect by deducting points (penalty points) for incorrect answers
Alternative answers per question	Deduction for each wrongly placed cross
2	1
3	1/2
4	1/3
5	1/4
n	1/(n − 1)
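The deductions in the table are exactly those that make pure guessing worthless on average. The following sketch illustrates this under the assumption that a guesser picks one of the n alternatives uniformly at random and that a correct cross earns one point; the function names are illustrative only.

```python
from fractions import Fraction


def penalty(n: int) -> Fraction:
    """Deduction per wrong cross for a question with n alternatives."""
    return Fraction(1, n - 1)


def expected_guess_score(n: int, malus: Fraction) -> Fraction:
    """Expected points per question for a uniform random guess:
    +1 with probability 1/n, -malus with probability (n - 1)/n."""
    p_right = Fraction(1, n)
    return p_right * 1 - (1 - p_right) * malus


for n in (2, 3, 4, 5):
    # With the table's deduction the expected score of a guesser is zero.
    print(n, penalty(n), expected_guess_score(n, penalty(n)))
```

Without any deduction (malus 0) the same function returns 1/n, which for two alternatives is the 50% guessing score described earlier.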

Accounting for the statistical effect through this deduction procedure is legally open to challenge. Alternatively, a legally secure assessment can be achieved by using an adapted points key with a higher pass mark instead of deducting points for incorrect answers. In the (most frequent) case that the test person must demonstrate knowledge of 50% of the material to pass, the following corrected points key results:

Accounting for the statistical effect through a corrected points key
Alternative answers per question	Pass mark (percent)	Pass mark (fraction)
2	75%	3/4
3	66.7%	2/3
4	62.5%	5/8
5	60%	3/5
n		(n + 1)/(2n)
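The corrected pass marks follow from a simple model, sketched here in Python. The assumption (not spelled out in the source) is that a candidate answers the known share of questions correctly and guesses uniformly at random on the rest; the function name is hypothetical.

```python
from fractions import Fraction


def corrected_pass_mark(n: int, required_knowledge: Fraction = Fraction(1, 2)) -> Fraction:
    """Expected share of points for a candidate who knows `required_knowledge`
    of the material and guesses among n alternatives on everything else."""
    return required_knowledge + (1 - required_knowledge) * Fraction(1, n)


for n in (2, 3, 4, 5):
    print(n, corrected_pass_mark(n))   # 3/4, 2/3, 5/8, 3/5
```

For a required knowledge of 50% this reduces to (n + 1)/(2n), which equals 60% only for n = 5; this is why a flat 60% pass mark fits only questions with exactly five answer boxes.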

As a result of a court ruling, however, universities in North Rhine-Westphalia, for example, now use a fixed grading scheme that takes into account neither the number of alternatives nor the number of correct answers per question, and that awards every task or correct answer the same number of points (namely one) regardless of difficulty and complexity. The pass mark is usually 60% of the total number of points, but it is lowered if the failure rate of first-time participants in an examination would otherwise be too high, because a high failure rate is considered an indicator of an inappropriately difficult examination. To cover both cases, the grading key is defined in terms of the share of correct answers above the flexible pass mark. Finer grades such as 1.3 and 2.7 are not specified, but are usually interpolated linearly into the grid, which raises the question of whether the threshold for grade x applies to x.0 or to x.3.

Grading key
Grade	Minimum share	Share above pass mark	Sub-grade	Soft interpretation: minimum share	Soft interpretation: share above pass mark	Hard interpretation: minimum share	Hard interpretation: share above pass mark
1	90%	75%	1.0	93⅓%	83⅓%	90%	75%
1	90%	75%	1.3	90%	75%	86⅔%	66⅔%
2	80%	50%	1.7	86⅔%	66⅔%	83⅓%	58⅓%
2	80%	50%	2.0	83⅓%	58⅓%	80%	50%
2	80%	50%	2.3	80%	50%	76⅔%	41⅔%
3	70%	25%	2.7	76⅔%	41⅔%	73⅓%	33⅓%
3	70%	25%	3.0	73⅓%	33⅓%	70%	25%
3	70%	25%	3.3	70%	25%	66⅔%	16⅔%
4	60%	0%	3.7	65%	12½%	63⅓%	8⅓%
4	60%	0%	4.0	60%	0%	60%	0%
5	0%	-	5.0	-	-	-	-
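The two percentage columns of the grading key are linked by a linear rescaling around the pass mark. A minimal sketch, assuming the usual pass mark of 60% as the default and using illustrative function names:

```python
from fractions import Fraction


def share_above_pass(score_share: Fraction, pass_mark: Fraction = Fraction(3, 5)) -> Fraction:
    """Convert a share of all points into the share of the range above the pass mark."""
    return (score_share - pass_mark) / (1 - pass_mark)


def minimum_share(share_above: Fraction, pass_mark: Fraction = Fraction(3, 5)) -> Fraction:
    """Inverse conversion: the share of all points needed for a given share above the pass mark."""
    return pass_mark + share_above * (1 - pass_mark)


print(share_above_pass(Fraction(9, 10)))   # 3/4, i.e. 90% of all points is 75% above the pass mark
print(minimum_share(Fraction(1, 2)))       # 4/5, i.e. grade 2 starts at 80% of all points
```

Because the key is defined relative to the pass mark, a different pass mark changes only the `pass_mark` argument, not the share-above-pass thresholds of the grades.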

MC rating

If several answers to a task can be correct, the task should be treated like several individual questions with two alternatives each (“applies” / “does not apply”), with a penalty of one point for each wrong cross. Answers that are left blank or crossed twice are not scored.

Two boxes should therefore always be provided for each answer alternative. The individual points are then added up; negative sums are scored as 0.
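The scoring rule described above can be sketched as follows. The data layout is an assumption made for illustration: the answer key maps each option to True or False, and a candidate's mark is True (“applies”), False (“does not apply”), or None for options that were left blank or crossed twice. A correct cross is assumed to earn one point, as in the two-alternative case.

```python
from typing import Dict, Optional


def score_mc_task(key: Dict[str, bool], marks: Dict[str, Optional[bool]]) -> int:
    """Score one MC task as a group of yes/no decisions.

    +1 for each correct cross, -1 for each wrong cross, 0 for options that
    were left blank or crossed twice (represented as None or missing).
    A negative sum is scored as 0.
    """
    total = 0
    for option, truth in key.items():
        mark = marks.get(option)
        if mark is None:
            continue  # unanswered or ambiguous: no points, no deduction
        total += 1 if mark == truth else -1
    return max(total, 0)


# Answer key for the Brandt government example used in this article.
key = {"Schiller": True, "Wehner": False, "Barzel": False, "Leber": True, "Mende": False}
marks = {"Schiller": True, "Wehner": True, "Barzel": None, "Leber": True, "Mende": False}
print(score_mc_task(key, marks))   # 2  (three correct, one wrong, one blank)
```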

Well-designed task
Which politicians were federal ministers in the Brandt government? Yes No ☐ ☐ Karl Schiller ☐ ☐ Herbert Wehner ☐ ☐ Rainer Barzel ☐ ☐ Georg Leber ☐ ☐ Erich Mende

Poorly designed task, for which fair evaluation is problematic
Which politicians were federal ministers in the Brandt government? ☐ Karl Schiller ☐ Herbert Wehner ☐ Rainer Barzel ☐ Georg Leber ☐ Erich Mende

To set the weight of the task within the overall examination, the partial points achieved can be converted to the desired number of points for the task. If, as in the example shown, five answer options have to be assessed, the overall task could earn 2 points for 4 or more partial points (i.e. at most one wrong cross), 1 point for 2–3 partial points, and no points otherwise.

Advantages

Many learning objectives (with the exception of creative achievements) can be tested with such tests. In addition, they can usually be evaluated by machine. They are therefore used very often, e.g. in IQ tests, driver's license tests, and various qualification tests. School and university exams are also sometimes held this way. The format is also popular in company selection processes because evaluation requires nothing more than a solution template.

Disadvantages

The ability to work out the correct solution from purely formal cues despite incomplete subject knowledge, or at least to eliminate individual distractors, is discussed in the USA under the term testwiseness (Millman et al. 1965). For badly designed tests, one rule of thumb used to help: if in doubt, tick the longest answer. The New York School Board has published a parodic test that contains no meaningful knowledge but can still be solved by purely formal reasoning.

Kubinger (2005) writes on the often underestimated impact of the guessing effect on the diagnostic validity of MC tests:

The probability that an item in a test [a question in the MC test; editor's note] is answered correctly merely by chance, and in that sense is "solved", is obviously greater the fewer answer options are offered. In the tests available today for psychological diagnostics there are mostly five options, namely the solution plus four “distractors”. For such tests the a priori guessing probability is 1/5 = 20%; in other words, even test persons without any of the required ability would "solve" an average of 1/5 of all items. The problem is exacerbated by the fact that, at least for test persons of lower ability, not all answer options are equally plausible, so that of the five options often one or two, sometimes three, are correctly ruled out by the falsification strategy [roughly equivalent here to a process of elimination, see falsification; editor's note], which can raise the individual guessing probability per item to as much as 50%.
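The figures in the quotation can be reproduced with a small sketch. It assumes that the test person guesses uniformly among the options that remain after correctly eliminating some distractors; the function name is illustrative.

```python
from fractions import Fraction


def guess_probability(options: int, eliminated: int = 0) -> Fraction:
    """Chance of hitting the single correct answer by guessing uniformly
    among the options left after `eliminated` distractors are ruled out."""
    remaining = options - eliminated
    if remaining < 1:
        raise ValueError("at least the correct answer must remain")
    return Fraction(1, remaining)


print(guess_probability(5))      # 1/5, the a priori guessing probability
print(guess_probability(5, 3))   # 1/2, after ruling out three of four distractors
```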

Multiple choice in international comparisons

Multiple choice tasks are also used in international school performance comparisons such as TIMSS , PIRLS or PISA . The US standard format is used, in which four to five answers are given, of which exactly one is rated as correct. In the German-speaking countries, however, where this question format is rarely used, over 10% of the students ticked more than one answer for individual questions in the first rounds of PISA.

A Canadian study shows that the advantage of North American students who are used to MC tests from their school days can also be demonstrated in exams during their studies.

Further problems

Multiple choice tests promote (partial) factual knowledge instead of specialist knowledge. People learn to verify answers instead of solving problems. A person who reliably finds the correct answer among five options may still be unable to solve the task at hand.
Example: The test person works on the problem until their result matches one of the given solutions. If the test person avoids the mistakes that the creators of the MC answers deliberately built in to generate the wrong answers, the test person can find the right solution without being able to solve the task independently.

A third problem is the correct understanding of the tasks, which can fail both because of ambiguities and because of a lack of linguistic skills on the part of the test person. What is tested is always a mixture of specialist knowledge and mastery of the language in which the task is posed, even if the latter would play no role in practice, where a task usually arises from its context rather than from a written task statement.

In SC questions, the distractors can differ greatly in their proximity to the correct answer: some are deliberately and obviously wrong, others wrong only in one easily overlooked detail. All of them are nevertheless rated equally, whereas partial points might be given for some of them when a free-text answer is marked.

Measures against guessing

Multiple choice exams are very common at German and Austrian universities. To discourage guessing, the pass mark is set above the score expected from random guessing, a negative point system is used, or both.

The variants presented here require either a binary decision (“true” or “false”) or a single choice.

Absolute and relative ratio of bonus and malus
Bonus	Malus	Neutral	Absolute ratio	Relative ratio
+1	−1	-	$|B| = |M|$	$|B| = |M|$
+1	−1	±0	$|B| = |M|$	$|B - N| = |M - N|$
+1	−1	−½	$|B| = |M|$	$|B - N| > |M - N|$
+1	−1	−1	$|B| = |M|$	$|B - N| > |M - N|$
+1	−1	+½	$|B| = |M|$	$|B - N| < |M - N|$
+1	−2	-	$|B| < |M|$	$|B| < |M|$
+1	−2	±0	$|B| < |M|$	$|B - N| < |M - N|$
+1	−2	−1	$|B| < |M|$	$|B - N| > |M - N|$
+1	−½	±0	$|B| > |M|$	$|B - N| > |M - N|$
+1	±0	+½	$|B| > |M|$	$|B - N| = |M - N|$
+1	±0	±0	$|B| > |M|$	$|B - N| > |M - N|$

In the simplest and most common procedure, each answer carries the same absolute value, positive if correct and negative if incorrect. This system is legally controversial, however, because under this type of evaluation “points that have been achieved through a correct answer” can be deducted again. The task of an examination is to “obtain statements about the job-related knowledge the examinee has. An assessment procedure in which examination performances rendered without error are counted as not rendered or poorly rendered because other test questions have not been answered correctly is unsuited to this purpose.” (Quotation from the grounds of the judgment, Higher Administrative Court of North Rhine-Westphalia, 14 A 2154/08.)

Alternatives, which are intended to further reduce the influence of testwiseness , give a higher negative rating for incorrect answers than positive for correct answers.

Often, tasks with a malus enter the overall evaluation with zero points in the worst case, even if the sum of points is actually negative. This keeps every entry in the transcript of records non-negative. However, it presupposes that a task consists of several multiple-choice questions. Such tasks are often used in exams that consist predominantly of other task types.

Legal evaluation

In Germany there is now a large number of court rulings that set limits on the use of the answer-choice procedure, as multiple choice is called in the legal environment. Many judgments concern an absolute pass mark, which has led to the number of passing candidates differing considerably from year to year; other judgments concern the evaluation of tasks.

Overall, it is therefore advisable to study the case law before designing multiple choice questions.

Examples

Which politicians were federal ministers in the Brandt government?

Karl Schiller
Herbert Wehner
Rainer Barzel
Georg Leber
Erich Mende

The number of correct answers is not specified. The first and fourth answers are correct; answers 2, 3 and 5 serve as distractors.

Which is the largest inland lake that lies entirely in Germany?

Lake Constance (Bodensee)
the Müritz
the Steinhuder Meer

From the question text it seems clear at first that only one answer can be correct (the second). Despite its name (Meer usually means “sea”), the Steinhuder Meer is an inland lake and is therefore also a candidate, but it is smaller than the Müritz.

Literature

K. D. Kubinger: Objective psychological-diagnostic procedures. In: H. Weber, T. Rammsayer (eds.): Handbook of Personality Psychology and Differential Psychology (Handbook of Psychology). Hogrefe, Göttingen 2005, pp. 158–165.
J. Millman, C. H. Bishop, R. Ebel: An Analysis of Test-Wiseness. In: Educational and Psychological Measurement. Vol. 25, 1965, pp. 707–726.

References

DORSCH Lexicon of Psychology.

Multiple response. In: Writing assessment questions for online delivery: Principles and guidelines. University of Bristol. Retrieved July 23, 2017.

Multiple response, itslearning. Retrieved July 23, 2017.

Article in Die Zeit about such a case: http://www.zeit.de/campus/2014/06/pruefungsverbindungen-klage

Decision of the Higher Administrative Court of North Rhine-Westphalia of December 16, 2008, 14 A 2154/08: http://www.justiz.nrw.de/nrwe/ovgs/ovg_nrw/j2008/14_A_2154_08 Judgment20081216.html

Test Your Testwiseness, accessed on October 1, 2018 (PDF; 52 kB).

Joachim Wuttke: The insignificance of significant differences. In: T. Jahnke, W. Meyerhöfer: PISA & Co - criticism of a program. Second edition. Franzbecker, Hildesheim 2007, p. 171 ff. Also http://www.messen-und-deuten.de/pisa/Wuttke2007b.pdf . Wuttke points out that this distorts the test beyond the directly affected tasks, because it takes much more time to check four or five answer options for true / false than to choose the most plausible one.

A. Mahamed et al.: "Testwiseness" Among International Pharmacy Graduates and Canadian Senior Pharmacy Students. In: American Journal of Pharmaceutical Education. Volume 70, p. 131.

Judgments - Assessment of multiple-choice exams.