This paper explores the impact of the rubric rating scale on the evaluation of projects from a first-year engineering design course. A small experiment was conducted in which twenty-one experienced graders scored five technical posters using one of four rating scales. All rating scales tested produced excellent results in terms of inter-rater reliability and validity. However, there were significant differences in performance among the scales. Based on the experiment’s results and past experience, we conclude that increasing the opportunities for raters to deduct points results in greater point deductions and lower overall scores. Increasing the granularity of the scale can reduce this effect. Rating scales that use letter grades are less reliable than other types of scales. Assigning weights to individual criteria can lead to problems with validity if the weights are improperly balanced. Thus, heavily weighted rubrics should be avoided if viable alternatives exist. Placing more responsibility for the final score on the grader instead of the rubric seems to increase validity at the cost of rater satisfaction. Finally, rater discomfort can lead to intentional misuse of a rating scale. This, in turn, increases the need to perform outlier detection on the final scores. Based on these findings, we recommend rating scale rubrics that use simple 3- or 4-point ordinal rating scales (augmented checks) for individual criteria and that assign numerical scores to groups of criteria.
International Journal of Engineering Education, 2013, Vol 29, Issue 6