Your Annual Review Has Been Broken Since the Roman Empire — On Purpose

By The Clio Method

It's that time of year. You fill out a self-evaluation form that asks you to rate your own performance on a scale from one to five. Your manager fills out an identical form about you. You sit in a room together and agree that you are performing at a level of three-point-seven-five. Nobody's salary changes. The form goes into a system nobody will ever open again. You return to your desk and keep doing your job exactly as you were doing it before.

You probably assume this is a failure of modern management — a bureaucratic ritual that corporations haven't bothered to fix because fixing things is hard and the current system at least generates paperwork. You are half right. It is a failure. But it is not a modern one, and calling it a failure implies it was ever supposed to work the way the org chart says it does.

The performance evaluation is at least two thousand years old. And it has been producing the exact same dysfunction for the entire run.

The Roman Legionary Report Card

The Roman army was, by ancient standards, a documentation machine. Legions kept daily records, such as duty rosters and strength reports, that tracked individual soldiers' duties, absences, equipment status, and conduct. Officers were evaluated through fitness reports that determined promotion and assignment. This was not informal. It was standardized, hierarchical, and tied directly to career advancement.

And the soldiers gamed it immediately.

The historian Sara Elise Phang, in her work on Roman military culture, documents extensive evidence of soldiers cultivating relationships with evaluating officers, not through performance but through social proximity. The legionary who was good at the politics of his contubernium (the eight-man squad that shared a tent or barracks room) advanced. The one who was excellent at actual soldiering but had the wrong friends did not. Sound familiar?

More telling: the officers who were supposed to be doing the evaluating had their own evaluations, conducted by tribunes, who were evaluated by legates, who were evaluated by the Senate. At each level, the incentive was to report upward that everything was fine. Problematic soldiers reflected poorly on their centurions. Problematic centurions reflected poorly on their tribunes. The result was a reporting system that was exquisitely calibrated to produce good-looking numbers and structurally resistant to surfacing actual problems.

The Roman army still functioned, of course. It conquered most of the known world. But it functioned despite its evaluation system, not because of it — because the actual performance information traveled through informal networks of reputation and personal loyalty that existed alongside the official paperwork and bore little relationship to it.

The Egyptian Papyrus Problem

Across the Mediterranean, well inside Rome's sphere of influence, the grain bureaucracy of Roman Egypt was generating its own archive of administrative dysfunction. The Oxyrhynchus papyri, a massive collection of documents recovered from an ancient Egyptian trash heap and now housed largely at Oxford, include hundreds of administrative records from the first through fourth centuries CE, and they are a remarkable window into middle management.

Grain workers and tax collectors were evaluated against output quotas. The quotas were set centrally. The evaluations were conducted locally. The people conducting the evaluations were also responsible for meeting the quotas. This created an obvious incentive structure: if your workers were underperforming, you either reported the underperformance (which reflected on you) or you adjusted the numbers (which did not). The papyri show, in the dry language of ancient bureaucracy, extensive evidence of the second option.

One particularly well-documented case involves a series of grain delivery records from the third century CE in which reported yields from a specific region remain suspiciously consistent across years that, based on climate and agricultural data, should have shown significant variation. Someone was smoothing the numbers. The evaluation system was generating data that looked like performance management and functioned as fiction.
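The reasoning behind "suspiciously consistent" can be sketched in a few lines of Python. This is a minimal illustration, not a reconstruction of any scholar's method: the yield figures are invented, and the `expected_cv` and `ratio` thresholds are assumptions chosen only to make the comparison visible.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return standard deviation divided by mean (a unitless measure of spread)."""
    return pstdev(values) / mean(values)


def looks_smoothed(reported, expected_cv, ratio=0.25):
    """Flag a series whose spread is far below what conditions would predict.

    expected_cv and ratio are illustrative assumptions, not historical values.
    """
    return coefficient_of_variation(reported) < expected_cv * ratio


# Invented numbers for illustration only; not taken from any papyrus.
reported_yields = [1000, 1004, 998, 1001, 1003, 999]   # nearly flat year to year
variable_yields = [1000, 820, 1150, 640, 1090, 900]    # swings with good and bad years

print(looks_smoothed(reported_yields, expected_cv=0.2))
print(looks_smoothed(variable_yields, expected_cv=0.2))
```

The first series barely moves, so it is flagged. The second swings the way harvests in a variable climate plausibly would, so it is not. A flat line in a noisy world is the tell.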

China's Nine-Point Scale and the Invention of the Bell Curve

The Chinese imperial bureaucracy developed one of the most sophisticated performance evaluation systems in the ancient world. Building on the annual assessments of the Han dynasty (roughly contemporary with the height of Roman power), the nine-rank system introduced shortly after the Han's fall, around 220 CE, assessed officials on a nine-rank scale covering moral conduct, administrative competence, and personal loyalty. Evaluations were theoretically tied to promotion, demotion, and dismissal.

The political philosopher Shen Yue, writing in the fifth century CE, documented what had gone wrong: nearly every official was being rated in the middle three ranks. The top ranks were reserved for political favorites. The bottom ranks were used as weapons in factional disputes. The vast middle — where most actual performance variation lived — was compressed into meaninglessness because rating someone too high or too low created enemies, and creating enemies was bad for your evaluation.

This is, with minor variation, the modern forced-distribution performance review problem. Many HR departments in America have at some point tried to solve the "everyone gets a three" problem by mandating a bell curve, requiring managers to rate a certain percentage of employees in each tier. The Chinese imperial bureaucracy tried this too. It didn't work then either. Evaluators simply traded ratings with each other to hit their distributions while protecting their actual relationships.
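To see why a mandated curve invites this kind of trading, here is a minimal Python sketch of forced distribution. The function name, the tier labels, the 20/70/10 split, and the scores are all hypothetical, picked for illustration rather than drawn from any real HR system or historical record.

```python
def forced_distribution(scores, quotas):
    """Assign tiers by rank so that each tier holds its quota share of people.

    scores: dict of name -> raw score (higher is better)
    quotas: list of (tier_label, percent) from top tier to bottom; percents sum to 100
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    tiers = {}
    start = 0
    cumulative = 0
    for label, percent in quotas:
        cumulative += percent
        end = round(cumulative * n / 100)  # cut point for this tier
        for name in ranked[start:end]:
            tiers[name] = label
        start = end
    return tiers


# Hypothetical team of ten, all scoring between 90 and 99 out of 100.
scores = {f"official_{i}": 90 + i for i in range(10)}
quotas = [("top", 20), ("middle", 70), ("bottom", 10)]

tiers = forced_distribution(scores, quotas)
print(tiers["official_9"])  # highest scorer
print(tiers["official_0"])  # lowest scorer, still at 90 out of 100
```

Everyone on this invented team is strong, yet the quota still requires that someone land in the bottom tier. The curve manufactures a loser whether or not one exists, which is exactly the pressure that makes quiet rating swaps attractive.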

Why It Doesn't Get Fixed

Here's the thing that the management consulting industry doesn't love to talk about: the performance review may be working exactly as intended, just not for the purposes written in the employee handbook.

What formal evaluation systems actually do, consistently across cultures and millennia, is legitimize existing hierarchy. They create a paper trail that explains why the people who were going to get promoted anyway got promoted. They give managers a documented reason for decisions they'd already made on the basis of personal loyalty, political alignment, and gut feeling. They protect the organization from legal challenge. They make the power structure look like a meritocracy.

This is not a cynical conspiracy. It's a deeply human tendency. We are social primates who evolved in small groups where personal relationships were the most reliable signal of trustworthiness and competence. The formal system sits on top of that ancient social hardware and largely loses. The Roman centurion who promoted his favorite wasn't being corrupt by his own lights — he was being rational. He knew that soldier. He trusted him. The papyrus form said something different, but the papyrus form didn't know what he knew.

The Lesson That Never Gets Learned

Every decade or so, a new management framework arrives promising to finally fix performance evaluation. Management by Objectives. 360-degree feedback. OKRs. Continuous performance management. Radical transparency. Each one generates a wave of consulting engagements, airport business books, and genuine organizational optimism. Each one, within five to ten years, produces documentation showing that the people being evaluated are gaming the new metrics, the managers are protecting their favorites within the new framework, and the executives are ignoring the results in favor of their own judgment.

This is not because the new frameworks are bad. Some of them are quite good. It's because the frameworks are being implemented by human beings whose social and hierarchical instincts are significantly older and more powerful than any performance management philosophy.

The Romans figured this out — or rather, they lived it — two thousand years ago. They kept the evaluation system anyway, because the appearance of systematic meritocracy was valuable even when the reality didn't match. We are doing the same thing, in the same way, for the same reasons.

Your three-point-seven-five is in excellent historical company.