I’ve been in this situation many times, and more often as the rated than the rater. Either way, rating one person accurately is close to the most impossible thing a human can do to another human. Those in HR departments who believe such ratings will ever yield accurate results are wrong: in most cases they give good people bad scores, simply because they are disconnected from reality and their ratings rest entirely on bad data. And if some of you still think you do a good job of it – whether you’re an HR partner, a team leader, or a manager of whatever rank – then I have some questions for you. Give me a clear answer to the following:
- How much do you think you can know about a person simply by watching him?
- If you work with him every single day, do you think you can figure out what drives him?
- Could you spot enough clues to reveal to you whether he’s competitive, or altruistic, or has a burning need to cross things off his list every day?
- How about his style of thinking? Are you perceptive enough to see his patterns and pinpoint that he is a big-picture, what-if thinker, or a logical, deductive reasoner, or that he values facts over concepts?
- And could you parse how he relates to others, and discern, for instance, that he’s far more empathetic than he appears, and that deep down he really cares about his teammates?
Perhaps you can. Perhaps you are one of those people who instinctively picks up on the threads of others’ behaviors and then weaves these into a detailed picture of who a person is and how he moves through the world.
Certainly, the best team leaders seem able to do this. They pay close attention to the spontaneous actions and reactions of their team members, and figure out that one person likes receiving praise in private, while another values it only when it’s given in front of the entire team; that one responds to clear directives, while another shuts down if you even appear to be telling him what to do. They know that each member of their team is unique, and they spend a huge amount of time trying to attend to and channel this uniqueness into something productive. So then I keep asking:
- How about rating your team, though? Do you think you could accurately give your team members scores on each of their characteristics?
- If you surmise that one of your team is a strategic thinker, could you with confidence choose a number to signify how good at it he actually is?
- Could you do the same for his influencing skills, or his business knowledge, or even his overall performance?
- And if you were asked how much of these things he had in relation to everyone else on the team, do you think you could weigh each person precisely enough to put a number to each person’s relative abilities?
This might sound a bit trickier – you’d have to keep your definition of influencing skills stable, even while judging each unique person against that definition. But if I gave you a scale of 1 to 5, with detailed descriptions of the behaviors associated with each number on the scale, then:
- Do you think you could use that scale fairly, and arrive at a true rating?
- And even if you are confident in your own ability to do this, what do you think about all the other team leaders around you? Do you think they would use the scale in the same way, with the same level of objectivity and discernment as you?
- Or would you worry that they might be more lenient graders, and so wind up with higher marks for everyone, or that they might define “influencing skills” differently from you?
- Do you think it’s possible to teach all of these team leaders how to do this in exactly the same way?
It’s a lot to keep straight – so many different people rating so many other different people on so many different characteristics, producing torrents of data. But keep it all straight we must, because this data represents people, and once collected, it comes to define how people are seen at work.
At least once a year, a number of your more senior colleagues will gather together in a room to discuss you. They will talk about your performance, your potential, and your career aspirations, and decide on such consequential issues as how much bonus you should get, whether you should be selected for a special training program, and when or if you should be promoted. This meeting, as you might know, is called a “talent review”, and virtually every organization conducts some version of it. The organization’s interest is in looking one by one at its people – its talent – and then deciding how to invest differentially in those individuals. The people who display the highest performance and potential – the stars, if you like – will normally get the most money and opportunity, while those further down the scale will get less, and those struggling at the lower end of the scale will more than likely be moved into a euphemistically described Performance Improvement Plan (P.I.P.) and thereby eased out.
These talent reviews are the mechanism that organizations use to manage their people. They want to keep the best people happy and challenged, and simultaneously weed out those who aren’t contributing. Since, in most organizations, the largest costs are people’s wages and benefits, these meetings are taken very seriously, and the most pressing question-a central preoccupation of all senior leaders in all large organizations-is, “How can we make sure that we are seeing our people for who they really are?”
This is a wake-up-in-the-middle-of-the-night sort of question for senior leaders, because they worry that their team leaders might not, in fact, understand the sort of person the organization needs nearly as clearly as the senior leaders do, and further that the team leaders might not be objective raters of their own people. To combat this worry, companies have set up all sorts of systems designed to add rigor to this review process. The one you may be most familiar with is the nine box.
This is a graph showing performance along the x-axis and potential up the y-axis, with each axis divided into thirds – low, medium, and high – to create nine possible regions. Each team leader is asked to think about each person on his or her team and then place them, in advance of the talent review, into one of the nine boxes – to rate them, that is, on both their performance and their potential. This system is designed to allow a team leader to highlight that a particular person might have bags of potential, and yet not have translated that potential into actual performance, whereas another team member might contribute top-notch performance, and yet have very little potential upside – he’s maxed out in his current position. With this data displayed in the talent review, the leadership team can define different courses of action for each person: the former will be given more training and more time, for example, while the latter might just be offered a healthy bonus.
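The nine-box mechanics can be sketched as a simple lookup: each axis is cut into thirds, and the pair of bands picks one of the nine regions. The 1–5 inputs and the cutoffs below are illustrative assumptions, since every organization draws its own boundaries.

```python
# Illustrative sketch of nine-box placement. The 1-5 scale and the
# band cutoffs are assumptions, not any organization's standard.

def band(score):
    """Map a 1-5 rating into a low/medium/high third (cutoffs are arbitrary)."""
    if score <= 2:
        return "low"
    if score <= 3:
        return "medium"
    return "high"

def nine_box(performance, potential):
    """Return the (performance band, potential band) box for one person."""
    return (band(performance), band(potential))

# One person with top-notch performance but little upside, another
# with high potential not yet translated into performance:
print(nine_box(5, 2))  # ('high', 'low')
print(nine_box(2, 5))  # ('low', 'high')
```

The point of the grid is exactly this asymmetry: the two people above land in different boxes even though both have one strong axis, so the leadership team can assign them different courses of action.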
Many companies also give people performance ratings on a scale of 1-5, either in parallel with or as an alternative to the nine-box process.
Again, each team leader is asked to propose a rating for each person on his or her team. Then, before or as part of the talent review, there is a meeting called a “consensus” or “calibration” meeting, which goes something like this: your team leader talks about you and defends why he ended up giving you a 4 rating, and then his colleagues weigh in on why they gave their people 5s, or 4s, or 3s, whereupon debates ensue about what really constitutes a 4, whether a 4 on one team is the same as a 4 on another team, whether you truly deserve a 4 this year, and if you do, whether the organization has enough 4s left over to allow you to have one.
If the organization has run out of 4s -which happens often since many team leaders are reluctant to give a person a 3 or, perish the thought, a 2 – then your team leader may have to give you a 3 and tell you that, though you truly deserved a 4, it wasn’t your turn this year, and that he will look out for you next year. This is called “forcing the curve,” which is the name given to the rather painful process of reconciling the organization’s need to have only a certain percentage of employees show up as super-high performers with the team leaders’ tendency to give high ratings to everyone so as to avoid having unpleasant performance conversations.
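The “forcing” step amounts to capping how many top marks the organization will accept, whatever the team leaders proposed. A minimal sketch, with an assumed quota (at most 20% of people may keep a 4 or 5), might look like this:

```python
# Illustrative sketch of forcing the curve. The 20% quota of top marks
# is an assumption for the example, not a standard figure.

def force_curve(proposed, top_quota=0.2):
    """Demote proposed ratings of 4+ to 3 once the quota of top marks is spent."""
    max_top = int(len(proposed) * top_quota)
    forced, tops_used = [], 0
    for rating in proposed:
        if rating >= 4 and tops_used < max_top:
            forced.append(rating)
            tops_used += 1
        elif rating >= 4:
            forced.append(3)  # "it wasn't your turn this year"
        else:
            forced.append(rating)
    return forced

# Ten team leaders all propose 4s; the quota lets only two survive.
print(force_curve([4] * 10))  # [4, 4, 3, 3, 3, 3, 3, 3, 3, 3]
```

Note what the sketch makes visible: the demotions depend on where you happen to sit in the list, not on anything about you – which is precisely why the process feels painful.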
Forced curves are no one’s idea of fun, but they are felt to be a necessary constraint on team leaders, and a way of ensuring that rewards are appropriately “differentiated,” so that high performers get much more than low performers. Perhaps wanting to add more precision to the words performance and potential, many organizations have created lists of competencies that team members are supposed to possess, and against which they are rated at the end of the year.
I still doubt that these models are true reflections of what performance looks like in the real world. Does anyone really have all of the competencies? Can we really prove that those who acquire the ones they lack outperform those who don’t?
Nevertheless, many organizations still rate each person against such standard checklists. To aid in this, each competency is defined in terms of behaviors, and then the behaviors are tied to a particular point on the rating scale. So, for example, on a competency called organizational savvy and politics, if you see that the person “Provides examples of savvy approaches to successfully solving organizational problems,” then you’d rate her a 3. If you see that she “Recognizes and effectively addresses politically challenging situations,” you would rate her a 4. Using your behaviorally anchored competency ratings as your building blocks, you would then be asked to construct an overall rating of her performance and potential, and this is how she’d be represented during the talent review.
Historically, the talent review has happened only once or twice a year. With the arrival of smartphones it’s now technologically possible for an organization to launch short performance-ratings surveys throughout the year. Each person can be rated by their peers, direct reports, and bosses, and then the scores can be aggregated either at mid-year or at year’s end to produce a final performance rating. This race to real-time ratings appears as inevitable as it is frenzied, and all of it is in service of the organization’s interest, which is to answer the question, “When it comes to our people, what do we really have here?”
Your interest in all this is related, but different. You won’t be too worried about competencies, and calibration sessions, and behavioral anchors, all of which probably sound a bit esoteric. Instead, you’ll be acutely aware of a few real-world practicalities that boil down to the fact that your pay, your promotion possibilities, and possibly even your continued employment are being decided in a meeting to which you are conspicuously not invited. The people who are in the room-some of whom you know, and some of whom know you, and others of whom you’ve never met-are talking about you, and people like you, and they are rating you, deciding which box you go in, and thereby deciding what you will get after a year of hard work, and also where your career will go next. You may not realize this during your first couple of years in the workforce, but once you do, it will preoccupy you.
You’ll think to yourself: Do I really want these people to think well of me? Do I really, really want these people not to think ill of me? But most of all, I want the truth of me in the room where the decisions are made. This is your interest.
You will come to wonder about these rating scales, these peer surveys, and these always-on 360-degree apps, and you will hope that there is enough science in them, enough rigor and process, that you-ideally, the best of you – will be portrayed accurately. After that, let the chips fall where they may. At least, then, you will have been given a fair hearing on your true merits as a person, and as a team member.
It is going to bother you greatly to learn, then, that in the real world, none of this works. None of the mechanisms and meetings – not the models, not the consensus sessions, not the exhaustive competencies, not the carefully calibrated rating scales – none of them will ensure that the truth of you emerges in the room, because all of them are based on the belief that people can reliably rate other people. And they can’t. This, in all its frustrating simplicity, is a lie.
It’s frustrating because it would be so much more convenient if, with enough training and a well-designed tool, a person could become a reliable rater of another person’s skills and performance. Think of all the data on you we could gather, aggregate, and then act on! We could precisely peg your performance and your potential. We could accurately assess your competencies. We could look at all of these and more through the eyes of your bosses, peers, and subordinates. And then we could feed all this into an algorithm, and out would come promotion lists, succession plans, development plans, nominations for the high-potential program, and more. But none of this is possible, despite the fact that many human capital software systems claim to do exactly what’s described above.
Over the last forty years, we have tested and retested people’s ability to rate others, and the inescapable conclusion, reported in research paper after research paper, is that human beings cannot reliably rate other human beings on anything at all.
We could easily confirm this by watching the ice-skating scoring at any recent Winter Olympics – how can the Chinese and the Canadian judges disagree so dramatically on the scoring of the same triple toe loop? The scoring is subjective because of one single factor: the unique personality of the rater. The same happens at work. Each rater – regardless of whether he or she is a boss, a peer, or a direct report – displays his or her own particular rating pattern. Some raters are very lenient, skewing far to the right of the rating scale, while others are tough graders, skewing left. Some have natural range, using the entire scale from 1 to 5, while others seem more comfortable arranging their ratings in a tight cluster. Each person, whether he or she realizes it or not, has an idiosyncratic pattern of ratings, and this powerful effect has come to be called the Idiosyncratic Rater Effect.
Here’s what’s going on. When Lucy rates Charlie on the various subquestions in the competency called strategic thinking, there is a distinct pattern to her ratings, which her organization believes reflects her judgment about how much strategic thinking Charlie has. For this to be true, however, when Lucy then turns her attention to a different team member, Steve, and rates him on the same competency, the pattern of her ratings should change, because she is now looking at a different person with, presumably, a different level of strategic thinking. But that is not what happens: Lucy’s pattern of ratings does not change when she rates two different people. Instead her ratings stay just about the same – her ratings pattern travels with her, regardless of whom she’s rating – so her ratings reveal more about her than they do about her team members.
We like to think of rating tools as windows that allow us to see out to other people, but they’re really just mirrors, with each of us endlessly bouncing our own reflection back at ourselves. And this effect is not, by the way, the same thing as unconscious bias for or against people of a particular gender, race, or age. Those biases do exist, of course, and we should do everything we can to teach people to see past them or to remove them – but the Idiosyncratic Rater Effect applies regardless of the gender, race, or age of both the rater and the person being rated. This is the first hurdle we must face.
The idiosyncrasy of the rating pattern stems from the uniqueness of the rater, and doesn’t appear to have much of anything to do with the person being rated. In fact, it’s pretty much as though that person isn’t there at all. When we rate other people on a list of questions about their abilities, the Idiosyncratic Rater Effect explains more than half of why we choose the ratings we do.
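The effect can be illustrated with a toy simulation; every number below is invented for the example, not drawn from the research. Each rating mixes the ratee’s “true” level with a rater-specific leniency offset, and because the offsets here span a wider range than the true levels, averaging a rater’s scores tells you mostly about the rater:

```python
# Toy simulation of the Idiosyncratic Rater Effect. All parameters are
# invented for illustration: ratees' "true" levels sit in a narrow band,
# while raters carry a personal leniency offset with a much wider range.
import random

random.seed(0)

ratees = {name: random.uniform(2.5, 3.5) for name in ["Charlie", "Steve", "Ana"]}
raters = {name: random.uniform(-1.5, 1.5) for name in ["Lucy", "Raj", "Mei"]}

def rate(rater, ratee):
    """One rating: the ratee's level, plus the rater's offset, plus noise."""
    noise = random.uniform(-0.2, 0.2)
    return ratees[ratee] + raters[rater] + noise

for rater in raters:
    scores = [rate(rater, ratee) for ratee in ratees]
    print(f"{rater}: mean {sum(scores) / len(scores):.2f}")
# With offsets this wide, each rater's mean tends to track his or her
# own leniency far more than any difference between Charlie, Steve, and Ana.
```

The design choice doing the work is the ratio of the two ranges: ratees differ by at most 1 point while raters differ by up to 3, so the rater’s pattern dominates every score he or she gives – which is the Idiosyncratic Rater Effect in miniature.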
Since you’re most concerned that the truth of you be in the room, this should worry you enormously. The rating given to you tells us, in the main, about the rating patterns of your team leader, and yet, in the room, we act as though it tells us about the performance patterns in you. And even if we could in fact correct for our rating idiosyncrasies, we’d still have another hurdle in front of us. The people you work with simply don’t interact with you enough to be able to pinpoint the extent to which you possess, say, influencing skills, or political savvy, or strategic thinking, or frankly any abstract attribute.
People at work are preoccupied (with work, mainly), and paying attention to you closely and continuously enough to be able to rate you on any of these abstractions is a practical impossibility. They simply don’t see you enough. Their data on you is insufficient-hence the name for this second hurdle: data insufficiency.
If Olympic ice-skating judges can’t agree on the quality of each triple toe loop, when the only thing they are doing is sitting watching triple toe loops one after the other, then what hope does a busy peer, direct report, or boss have of accurately rating your “business acumen”?
Even if we changed the world of work, and created a job category of roving raters whose sole responsibility was to wander the hallways and meeting rooms, to watch each person act and react in real time, and then to rate each person on a list of qualities, we still wouldn’t get good data, in part because our definitions are poor.
A triple toe loop is defined as a takeoff from a backward outside (skate) edge, assisted by the toe of the other foot, followed by three rotations, followed by a landing on the same backward outside edge – and this is the only definition of it.
Business acumen, by contrast, is defined as “keenness and speed in understanding and deciding on a business situation … people with business acumen … are able to obtain essential information about a situation, focus on the key objectives, recognize the relevant options available for a solution, [and] select an appropriate course of action.”
And this is just one of many definitions you’ll encounter. Furthermore, there is a world of difference between the specificity of “take-off from a backward outer edge” and the vagueness of “essential information,” “key objectives,” and “appropriate course of action.” But then let me ask you:
- Essential to whom?
- Key objectives as determined by whom?
- Appropriate course of action as determined how?
Of course, each of us reading the definitions thinks, “Well, I could easily define those for myself” – but that’s the point. When we rate people on abstractions, there is even more scope for our ratings to reflect our own idiosyncrasies. And because one person’s understanding of business acumen is meaningfully different from another’s, even when two highly trained and focused raters rate the same person on the same quality, they find it extraordinarily difficult to arrive at the same rating. To all this talk of the Idiosyncratic Rater Effect and data insufficiency, however, some will tell you to calm your fears. The truth of you will indeed emerge in the room, they’ll say, because even though one person might be an unreliable and idiosyncratic rater, many people won’t be.
The problem with almost all data relating to people – including you – is that it isn’t reliable. Goals data that reports your “percent complete”; competency data comparing you to abstractions; ratings data measuring your performance and your potential through the eyes of unreliable witnesses: it all wobbles, and fails to measure what it says it’s measuring. One of the most bizarre implications of this systematic unreliability is that, in what is supposedly the age of big data, no organization can say what drives performance – at least, not knowledge-worker performance. We may be able to say something intelligent about what drives sales, say, or piece-work output, because both of these are inherently and reliably measurable – they can be counted.
But for any other work-which means most work-we have no way of knowing what drives performance, because we have no reliable way of measuring performance. We don’t know:
- whether bigger teams drive performance more than smaller teams.
- whether remote workers perform better than colocated workers.
- whether culturally more diverse teams are higher performing than less diverse ones.
- whether contractors are higher performers than full-time employees, or if it’s the other way around.
We can’t even show that our investments in the training and development of our employees lead to greater performance.
We can’t say anything about any of these things, precisely because we have no reliable way to measure performance.
I began this post by asking you how you can be confident that the truth of you is in the room during the talent review-how you can be confident that decisions about your pay, your next role, your promotion, and your career are being made based on a true understanding of who you are. But actually, you don’t want the truth of you in the room. You don’t want someone to be in any room pretending that they have a reliable measure of who you are.
In the same way that you hated your singular performance rating – you were never just a 3, because you were never just a number – so you will come to despise the newer tools that now claim, ever more loudly, to capture all your essential competencies. They don’t, and they never will: they simply add gasoline to the conflagration of bad data purporting to represent you. Any tool that pretends to reveal who you are is false. What you want in the room is different: not the truth of you, but just the truth. You don’t want to be represented by data that attempts, arrogantly, to divine who you are. Instead, you want to be represented by data that simply, reliably, and humbly captures the reaction of your team leader to you. That’s not you, and it shouldn’t pretend to be you. It’s your leader, and what he feels, and what he would do in the future. And that’s enough. Truly.