02.09 – Rater Errors

Evaluating someone’s performance is a particular type of cognitive process, and like any cognitive process it’s often fraught with biases. These biases often lead to rater errors. So I would like for us to discuss some of the most common, prevalent rater errors in performance evaluations, learn to recognize them, and talk about possible countermeasures. Have you ever thought about why so many of us are afraid of flying? If you tell me it’s because flying is dangerous, that’s not a particularly defensible argument, because our odds of dying from a fall or in a car accident are substantially higher.
But even though plane crashes are rare events, they’re so heavily broadcast by the media that they’re made very salient, very vivid in our memory. They really stick in our memory. And this helps us understand the availability error: in making judgements and decisions, we tend to overemphasize information that is readily available, recently encountered, vivid, and easily retrievable from memory. When applied to assigning performance ratings, what we tend to emphasize is the most recent meetings, interactions, projects, and assignments. "Oh, he really was a star in this last meeting with a customer." Or, "she really bombed that last presentation." We’re particularly likely to fall prey to availability errors when we don’t have a systematic record of employees’ performance.
So again, I know I’ve said this many times before, but open that text file on your teammates, on the direct reports you need to evaluate, and keep track of their performance throughout the entire year. You can also see that the change in performance appraisal systems at many companies nowadays, including Accenture, Microsoft, and Netflix, where companies are striving for more continuous, more frequent feedback than once a year, is driven in part by their desire to deal with the availability error.
Leniency error is a situation where some raters have a tendency to inflate ratings. We want to be liked and we are reluctant to deliver bad news. So I inflate everybody’s ratings. I give everybody high scores because, in this case, I’m happy that I don’t have to deliver bad news, you’re happy because you’re receiving only good news, and we often end up with a very happy and very dysfunctional team. Distributionally, you can see that in this case the ratings pile up at the high end of the scale. Another variant of a distributional error is the central tendency error, where we tend to rely on the middle range of the scale and avoid the extremes. We play it safe.
As you can see, our scores cluster in the middle. One of the large companies I work with discovered that in their performance appraisals, 95% of their employees were rated between two and four on a five-point scale. In one of the business units, out of 300 employees, only one person received a score of five. There could be multiple reasons why managers are reluctant to give extreme scores. They might be reluctant to give really high scores because they feel that it communicates implicit promises to the employee with respect to their future career progression. People can be reluctant to give really low scores because, again, that’s delivering bad news.
It also comes with extra paperwork, escalation of those cases, developmental plans, and so on. But you can see that both the central tendency error and the leniency error are quite problematic, because they make comparisons among employees extremely difficult. The consequence is that they create perceptions of inequity, where top performers don’t feel adequately differentiated from low performers.
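A rough sketch can show why compressed ratings impede differentiation. This little simulation is purely illustrative (the 300 employees, the 2–4 band, and the performance distribution are all invented assumptions, not data from the company mentioned above):

```python
import random
import statistics

random.seed(0)

# Hypothetical "true" performance scores on a 1-5 scale for 300 employees.
true_scores = [random.uniform(1, 5) for _ in range(300)]

def central_tendency_rating(score):
    """A rater who plays it safe: every score is pulled into the 2-4 band."""
    return min(max(round(score), 2), 4)

ratings = [central_tendency_rating(s) for s in true_scores]

# The compressed scale leaves far less information to differentiate people:
print(statistics.pstdev(true_scores))  # spread of true performance
print(statistics.pstdev(ratings))      # much smaller spread of assigned ratings
print(len(set(ratings)))               # only three distinct ratings remain
```

A star with a true score of 4.9 and a solid performer at 3.6 both end up with a four, which is exactly the perceived-inequity problem described above.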
By the way, you can also see that a lot of the enthusiasm behind forced curves and forced rankings stems from their ability to deal exactly with these two errors, central tendency and leniency.
Attribution error. You may recall that we talked about this in course one: in evaluating others’ performance, we tend to attribute an employee’s poor performance to his or her internal characteristics. You fail because you’re lazy, incompetent, not skilled. So we underestimate the effect of situational constraints, and that creates at least two problems. One is that we can assign needlessly stringent ratings to employees. And secondly, our intervention efforts can fail because we’re not addressing the root cause of the problem; we’re not addressing the situational constraints: lack of resources, support from a manager, support from teammates. Next is the sample size error, what I like to call faith in small samples.
What I would like for us to recognize is that when we’re trying to compare an employee with three years of experience at our company to somebody with three months of experience, these are not exactly comparable cases. Small samples of performance data can lead us to make mistakes on the side of the extremes. Larger samples, in contrast, enable us to get closer to that true mean. So, as an example, somebody may seem like a star based on their first three months on the job, on just a few data points, but they might be more of an average employee when we look at their performance over the course of the entire year.
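The faith-in-small-samples point can be illustrated with a quick simulation. Everything here is an assumption for illustration: a genuinely average employee (true mean 3.0 on a five-point scale), noisy per-project scores, and a comparison of three-observation versus twelve-observation impressions:

```python
import random
import statistics

random.seed(42)

def project_score(true_mean=3.0, noise=1.0):
    """One noisy observation of an average employee, clipped to the 1-5 scale."""
    return min(max(random.gauss(true_mean, noise), 1.0), 5.0)

# Form many "first impressions" from 3 projects, and many "full-year
# impressions" from 12 projects, of the very same employee.
small_means = [statistics.mean(project_score() for _ in range(3)) for _ in range(10000)]
large_means = [statistics.mean(project_score() for _ in range(12)) for _ in range(10000)]

# Both hover around the true mean of 3.0, but the small samples are far
# more likely to produce extreme (star or disaster) impressions.
print(statistics.pstdev(small_means))
print(statistics.pstdev(large_means))
```

The spread of the three-observation averages is markedly wider, which is exactly how an average employee can look like a star (or a failure) after three months.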
And finally, I’d like for us to talk about halo error, which is the tendency not to distinguish between the different dimensions of performance being evaluated. In other words, we tend to assign ratings based on our overall impression of a given employee. For example, if an employee has a perfect attendance record, the rater will also give her high ratings for productivity and efficiency. Or, if you have really good presentation skills and are quick on your feet, I might also assume that you’re good at developing others. What research tells us is that there are two particular types of rater training that can be helpful in dealing with these errors.
One type of training is called rater error training, which starts with defining and providing examples of common rating errors, which is exactly what we’ve done so far. Then we show video cases or discuss printed cases, vignettes, that are designed to elicit rating errors. These are fictitious performance scenarios. You rate brief examples of these common rating errors, led by an experienced facilitator, and you go through the cycle multiple times, with the facilitator helping trainees understand the differences between their scores and the true scores for these fictitious vignettes. Another type of rater training is frame of reference training. Process-wise, it is very similar to rater error training.
We use written and videotaped examples to practice performance evaluation repeatedly. But the goal of this training is to provide raters with an appropriate, common evaluative standard for the performance dimensions to be rated. Put simply, the key goal of frame of reference training is for us to walk away from the session with a very good understanding of what constitutes excellent, good, average, and poor performance. For example, you might spend some time discussing the particular anchors for the customer service dimension. How do we differentiate ones from fives? A five means exceptional customer service, as measured by customer feedback in post-visit surveys and in the comprehensive end-of-year customer satisfaction survey.
You also need to exhibit evidence of continuous innovation in customer service tools and processes, and receive company or industry awards for exceptional customer service. A one, on the other hand, means substandard customer service feedback in post-visit surveys and in the comprehensive end-of-year customer satisfaction survey, and a lack of innovative activity. I’m giving you here, as a reference tool, a step-by-step process for designing frame of reference training. This approach was designed by Herman Aguinis.
What I would like for you to see here are the results of a meta-analysis that evaluates the effects of rater training on rating accuracy. A meta-analysis, as you may know already, is a type of analysis that aggregates results from a number of empirical studies and reports the average, aggregate effect. You can see here that rater error training was found to be effective in reducing halo error, reducing leniency error, and increasing rating accuracy. Frame of reference training increased overall rating accuracy, and it increased observational accuracy, that is, the accuracy with which raters record behaviors and performance outcomes.
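The basic aggregation step behind a meta-analysis can be sketched as a sample-size-weighted average of per-study effect sizes. The study sample sizes and effects below are invented for illustration; they are not the numbers from the actual meta-analysis discussed here:

```python
# Hypothetical per-study effect sizes (Cohen's d) for a rater-training
# intervention, each with its study sample size. Made-up numbers.
studies = [
    {"n": 40,  "d": 0.55},
    {"n": 120, "d": 0.30},
    {"n": 75,  "d": 0.42},
]

# Weight each study's effect by its sample size, so larger (more precise)
# studies contribute more to the aggregate effect.
total_n = sum(s["n"] for s in studies)
weighted_d = sum(s["n"] * s["d"] for s in studies) / total_n
print(round(weighted_d, 3))
```

Real meta-analyses use more refined weights (typically the inverse of each study's sampling variance) and correct for artifacts, but the sample-size-weighted mean captures the core idea of pooling studies into one average effect.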
So it may be tempting for us to rush to address some of these rater errors with really radical, very invasive procedures, such as installing a forced curve or a forced ranking. But before we go there, consider these much softer, less invasive interventions, which are much less costly and much less risky for organizations.
This article is from the free online course Managing Talent, created by FutureLearn - Learning For Life.
