Teachers who administer written examinations to students obviously want to know whether the results of these examinations are ‘true’. A student’s score should reflect the extent to which the student has mastered the learning objectives for the course in question. This is, unfortunately, not always the case, as each examination provides only a snapshot, and the information gathered from it can therefore contain ‘noise’ that makes an examination score (e.g. 12 out of 20) an over-evaluation or an under-evaluation of the student’s actual mastery. For an examination to be reliable, the results of a given group of students should correspond to the results of the same students on other examinations on the same topic and at the same level.
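
One conventional way to make this correspondence precise (a standard formalisation from classical test theory, not something this tip itself spells out) is to express reliability as the correlation between students’ scores on two parallel forms of the same examination:

$$r_{XX'} = \frac{\operatorname{Cov}(X, X')}{\sigma_X \, \sigma_{X'}},$$

where $X$ and $X'$ are the score distributions on the two forms. A value close to 1 means the examination ranks students consistently; a value close to 0 means the scores are dominated by noise.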

In this tip, we start by stating a number of criteria that are important to the reliability of a written examination. We then discuss a few practical guidelines for promoting reliability.
 

Criteria for reliability

The following criteria influence the reliability of written examinations:

  • Clarity
    Have the questions been formulated clearly enough to ensure that they do not confuse or mislead students unintentionally? Although challenging questions and questions that call for complex reasoning are obviously allowed, each question should contain a clearly formulated assignment that is not open to ambiguous interpretation by someone familiar with the material. If a question does contain such ambiguity, the multiple interpretations arising from it should not be penalised.
     
  • Difficulty
    To what extent has the level of difficulty of the questions, and of the examination as a whole, been matched to the level of the students for whom the examination is intended? An examination that is either extremely difficult or extremely easy does not provide enough information on the exact knowledge and skills of the students.
     

  • Specificity
    Have the questions been worded in such a manner that only those with sufficient mastery of the learning objectives would be capable of producing a proper solution?
     

  • Differentiation
    To what extent do the questions and the examination as a whole distinguish between students with a better and a poorer mastery of the learning objectives?
     

  • Length of the examination
    Is the number of questions included in the examination sufficient to compensate for guessing? If a student answers four questions on a particular learning objective correctly, we can be more confident in concluding that the student has mastered this objective adequately than we could if only one question on this topic had been included and answered correctly (see the worked example after this list).
     

  • Objectivity
    Have the criteria for evaluating the answers been specified clearly enough in advance to minimise the potential influence of the individual assessor on the student’s score, provided that the assessor adheres to the evaluation criteria? Clear, pre-specified criteria are necessary in order to ensure interrater reliability, particularly if multiple assessors are involved in correcting an examination.
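
As a worked illustration of the ‘length of the examination’ criterion (the true/false format and the independence of the guesses are assumptions made for the sake of the example, not requirements stated in this tip): a student who guesses blindly on a single true/false question still answers correctly with probability 1/2, whereas the probability of guessing four independent questions on the same objective all correctly is

$$P(\text{all four correct by guessing}) = \left(\tfrac{1}{2}\right)^{4} = \tfrac{1}{16} \approx 6\%.$$

Adding questions on the same objective thus sharply reduces the risk of mistaking lucky guessing for mastery.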
     

Construction guidelines

Taking the following guidelines into account when constructing a written examination can enhance its reliability:

  • Construct the examination in consultation with a colleague.
    This guideline is important for gaining insight into both the clarity (i.e. the possibility of misunderstanding concerning the answer to be provided) and the difficulty of the questions. You could work with a colleague to develop the questions, or you could construct the examination on your own and then ask a colleague to complete it and use the feedback to make any necessary adjustments. One advantage of the latter approach is that it also provides an idea of how much time is needed in order to complete the examination.
     

  • Formulate the questions specifically, and take care that the questions in the examination are independent of each other.
    This guideline is intended to prevent situations in which questions can be answered based on general knowledge and/or elements from other questions in the examination. This could have a negative effect on the specificity of the examination. Asking someone who has not completed the programme (or one related to it) to take the examination can help to determine whether all of the various examination questions are sufficiently specific.
     

  • Try to achieve a balance between questions that are relatively easy and those that are more challenging.
    Aim for roughly a 60:40 ratio between easier and more difficult questions (a worked example follows this list). This will produce a balanced examination that allows sufficient differentiation between students who have mastered the learning objectives well and those who have not mastered them as well. Questions from previous examinations (your own or those of a predecessor) can be used as a basis for assessing the difficulty of questions.
     

  • Ensure that the examination consists of a sufficient number of questions.
    This guideline is intended to reduce the influence of answers that are correct or incorrect purely by chance. Try to strike an optimal balance between the time needed to complete the examination and the length of the examination, while ensuring that the core of the module is examined thoroughly. Unless one of the objectives of the examination is to measure working speed, there should be sufficient time to answer all of the questions.
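
As a concrete illustration of the 60:40 guideline above (the total number of questions here is hypothetical): an examination with 20 questions would contain roughly

$$0.60 \times 20 = 12 \ \text{easier questions} \quad\text{and}\quad 0.40 \times 20 = 8 \ \text{more challenging questions.}$$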
     

Correction guidelines

Finally, the following guidelines are important in order to promote the objectivity of scoring when correcting written examinations:

  • Correct the examination based on pre-specified evaluation criteria.
    Use an answer key that includes the various elements of a correct answer. For essay questions, it can be worthwhile to supplement the answer key with assessment aspects that should be considered (e.g. the extent to which the answer is properly structured or constructively critical). The relative weight of the various components or assessment elements of the answer should always be stated. In addition, the weight of each question within the examination as a whole should be clarified (a sketch of such a weighting scheme follows this list).
     

  • Anonymise the examination forms.
    If you know the students by name, anonymising the examination forms can help you to arrive at objective scoring. If correction is not anonymous, it is often difficult to ensure that previous impressions of students do not influence your scoring (either positively or negatively).
     

  • Correct examination forms question by question, and change the order of the forms regularly.
    Correcting question by question (instead of examination by examination) makes it easier to adhere consistently to the pre-specified evaluation criteria for the various questions. It also makes it possible to change the order in which you correct the examination forms regularly (e.g. so that Student Y’s form is not always corrected immediately after Student X’s form), thereby promoting more objective scoring. This matters because everyone has a natural tendency to over-value a solution that follows a poor solution and, conversely, to under-value a student’s answer after having read an excellent response.
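
One way to make the weighting advice in the first correction guideline concrete is a linear weighted scoring scheme (a sketch; the symbols are illustrative and not prescribed by this tip):

$$S = \sum_{i=1}^{n} w_i \, s_i, \qquad \sum_{i=1}^{n} w_i = 1,$$

where $s_i$ is the score awarded for question $i$ according to its answer key (e.g. expressed as a fraction between 0 and 1) and $w_i$ is the pre-specified weight of question $i$ within the examination as a whole. Fixing the weights $w_i$ and the criteria behind each $s_i$ before correction begins keeps the scoring consistent across assessors and students.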

 

Want to know more?

Jessop, T., & Hughes, G. (2018). Beyond winners and losers in assessment and feedback. In J. P. Davies & N. Pachler (Eds.), Teaching and learning in higher education: Perspectives from UCL (pp. 64–83). London: UCL IOE Press.

Norton, L. (2009). Assessing student learning. In H. Fry, S. Ketteridge, & S. Marshall (Eds.), A handbook for teaching and learning in higher education: Enhancing academic practice (3rd ed., pp. 132–149). Abingdon: Routledge.

Wakeford, R. (2003). Principles of student assessment. In H. Fry, S. Ketteridge, & S. Marshall (Eds.), A handbook for learning and teaching in higher education (2nd ed., pp. 42–61). London: Kogan Page.

The Graide Network. (2018, September 10). Importance of validity and reliability in classroom assessments. Retrieved from the Graide Network website (accessed August 23, 2019).

Morgan, C., Dunn, L., Parry, S., & O’Reilly, M. (2004). The student assessment handbook: New directions in traditional and online assessment. London: RoutledgeFalmer.