Back in March it was reported that Obsidian Entertainment missed out on a bonus payment from their publisher Bethesda when their game Fallout: New Vegas narrowly failed to score 85 points on Metacritic, the scoring system that aggregates game ratings from a variety of sources into an overall score. There was some discussion over whether that sort of criterion was a fair one on which to base developer bonuses, especially as Obsidian were making lay-offs at the time the story broke. The implication was that if they’d got the bonus, those job losses might not have happened.

The debate was renewed this week when it emerged that Irrational Games, makers of classics like Bioshock and System Shock 2, had included a requirement in a recent job advert that applicants should have a credit on a game with a Metacritic score of 85 or above. After the initial flurry of criticism online, the requirement has been removed from the ad. Yet it is hard to imagine that the consideration has been forgotten entirely, since it was thought important enough to add in the first place.

Gamasutra asked industry writers for their thoughts, and their responses converge on a common position that is critical of the practice:

  • “Some really smart and talented folks have contributed to games that weren’t outright critical darlings.”
  • “Holding their individual work to a group standard, and a nebulous one at that, is beyond the pale.”
  • “[…] it’s even worse when you’re pinning that badge to an individual whose contribution to a bad game could have been amazing, or to a great game could have been insignificant.”
  • “your Metacritic score is really just an arbitrary number derived from the press, and it doesn’t take much to ruin your chances of receiving a “good” score.”
  • “who would want to work for a company that believes this to be an acceptable requirement for hiring?”

The complaints seem to revolve around a few key issues – that organisations (whether publishers in the case of bonuses, or developers in the case of hiring) shouldn’t be judging whole people and entire products based on these numerical scales, that the Metacritic scores themselves are arbitrary and don’t measure anything useful, and that good people have worked well on games that weren’t critically acclaimed (and vice versa).

The first issue is odd, because the job world is already heavily numbers-based. Even the amended job spec, with the Metacritic requirement taken out, asks for 6 or more years as a designer in the industry, 4 or more years of management experience, and 3 or more games worked on for the full project duration. Requirements like these are common for top-tier jobs – for entry-level positions it’s common to see requirements like “1-2 years proficiency in C++”, “6+ months console development experience”, “Bachelor’s degree”, etc. Of course there will be people lacking one or more of these criteria who are better than some of the candidates who meet them all, but the criteria are still a useful guide. The number of false negatives you suffer by wrongly ruling people out is almost certainly compensated for by the time saved in filtering out inappropriate applicants.

As for the relationship between a publisher and a developer, a publisher will often tie bonus payments to sales figures, and payments during the development period may depend on the quality of milestone builds or on the dev team meeting fixed deadlines. Tying bonuses to sales is an important part of managing the risk a publisher is exposed to when funding a project, and is essentially equivalent to paying less up front but adding royalties, except with a steeper threshold. This lets the publisher minimise the fixed cost while still being able to reward successful developers with money that would generally only ever come from profits. Without the ability to do this, publishers would have to take fewer risks – if that is even possible these days! – and fewer games would get funded. So in the big bad world of high-budget game development, these metrics are a necessary evil. If you want a publisher to throw millions at you to make a game, you’re crossing over from art to commerce, and the people who fund you deserve some assurance in return that you are trying to make the best product with their money. And as a developer you’d usually prefer that the quality of that product be judged on things you can more directly influence, such as how much the reviewers enjoy it, rather than on things you have little control over, such as how many units it sells. Metacritic scores are a step up from sales figures here.
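To make that “less up front plus a steeper threshold” equivalence concrete, here is a minimal sketch in Python. The advance, royalty rate, bonus size and sales threshold are invented numbers chosen purely for illustration, not figures from any real contract; the point is only the shape of the two payout curves.

```python
# Hypothetical numbers purely for illustration; no real contract is being modelled.

def royalty_deal(units_sold, advance=2_000_000, royalty_per_unit=1.0):
    """Developer payout: an advance plus a per-unit royalty (a smooth ramp)."""
    return advance + royalty_per_unit * units_sold

def threshold_bonus_deal(units_sold, advance=1_500_000, bonus=1_500_000, threshold=1_000_000):
    """Developer payout: a smaller advance plus a lump-sum bonus once a sales threshold is hit."""
    return advance + (bonus if units_sold >= threshold else 0)

for units in (250_000, 900_000, 1_000_000, 2_000_000):
    print(f"{units:>9} units: royalty deal = {royalty_deal(units):>12,.0f}, "
          f"threshold deal = {threshold_bonus_deal(units):>12,.0f}")
```

The royalty deal pays out gradually as sales climb; the threshold deal pays less until the cut-off and then jumps all at once, which is exactly the steeper step the paragraph above describes.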

The second issue is about the Metacritic score itself. What does it measure – fun, quality, predicted sales? Does it even make sense to assign a score to a game, which is surely going to be experienced subjectively? And does the aggregate value make any more sense than any given individual one? One answer to all these questions, which is actually quite simple but will not satisfy purists, is to abandon the idea that the score measures anything other than critical opinion. And while critical opinion itself does not equal fun, or quality, or predicted sales, it does correlate strongly with all of those variables when the population is viewed as a whole. Metacritic scores correlate positively with user scores (with a Pearson coefficient of 0.47 in one test I did of 50 randomly selected games), which implies the critics are at least in touch with public opinion, and that what they like is probably what the market will like too; at least one laboratory study supports this, as does empirical data from EEDAR presented at this year’s Game Developers Conference. Of course, everybody can find discrepancies, whether between critical and public opinion on one game (e.g. Mass Effect 3 getting 89 on Metacritic while the user score averages 4.2 out of 10), or in the relative rankings, or in games that scored highly but sold poorly, but on the whole the ordering is far more right than wrong and the scores are meaningful. The value itself may be imprecise, but that doesn’t mean you discard the entire measurement; you just adjust your expectations.
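For what it’s worth, that 0.47 figure came from a quick informal test rather than anything rigorous. A sketch of that kind of calculation might look like the following, where the two score lists are placeholder values for imaginary games, not the actual sample I used:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Placeholder data: critic scores (0-100) and user scores (0-10) for imaginary games,
# included only to show the calculation running end to end.
critic_scores = [89, 72, 85, 60, 93, 78]
user_scores   = [7.9, 6.5, 8.2, 5.1, 8.8, 7.0]

print(f"Pearson r = {pearson(critic_scores, user_scores):.2f}")
```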

Also in the Gamasutra article, Kris Graft suggests, “maybe their HR departments should just cut out the middleman and recruit a couple dozen video game reviewers who will play job applicants’ games, score them independently, then average out the results. Isn’t that essentially what’s going on here?” Not exactly. But is it really absurd to check a designer’s abilities by getting experienced players to actually play the candidate’s games and rate them? Surely not – in fact, that is surely going to be one of the better tests if we care about a game’s actual experience. And most Metacritic scores for reasonably well-known games are formed by aggregating more than two dozen reviews (although not all, admittedly), which makes the scores more valid than an in-house test of that size would be, since the more samples you take, the closer you approximate the ‘actual’ value. Using publicly available scores (rather than a test done privately for HR departments) also means more transparency and a level playing field. The Metacritic score will almost always be a better judge of critical acclaim than any in-house test. Many developers, in games and elsewhere, will have experienced the lottery of in-house testing, which often rejects someone who then goes on to pass a test somewhere that, on paper, is an equal or better employer. A standard, public point of comparison would seem to be an improvement on this situation.
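The “more samples” point is just the ordinary behaviour of an average: the spread of a mean around the underlying value shrinks roughly with the square root of the number of reviews. A rough simulation, using an entirely invented ‘true’ score and reviewer noise level, illustrates the idea:

```python
import random
import statistics

random.seed(1)

TRUE_SCORE = 82     # invented "true" critical opinion of a game
REVIEW_NOISE = 8    # invented spread of individual reviewers around it

def mean_of_reviews(n_reviews):
    """Average of n simulated reviews, each the true score plus random noise."""
    reviews = [random.gauss(TRUE_SCORE, REVIEW_NOISE) for _ in range(n_reviews)]
    return statistics.mean(reviews)

for n in (3, 10, 30, 100):
    # Repeat the aggregation many times and see how far the averages stray.
    means = [mean_of_reviews(n) for _ in range(2000)]
    spread = statistics.stdev(means)
    print(f"{n:>3} reviews: typical distance of the average from {TRUE_SCORE} ≈ {spread:.1f} points")
```

On this toy model, an aggregate of a couple dozen reviews wanders a point or two from the underlying value, whereas a handful of in-house testers can easily be off by several.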

The last major objection was that a game’s score isn’t necessarily a good match for an individual’s contribution. This is the hardest one to argue with, because it is true that great developers sometimes end up on poor games, and perhaps vice versa. But this is where industry experience tends to even things out – over their careers, the best developers will generally gravitate towards the higher-quality companies and make higher-quality games, giving themselves a good chance of earning such a credit. (Note that the controversial advert only asked for “one game with a Metacritic score of 85 or higher” – not an average of 85 over your career.) So rather than being read as “are you good enough that you inspire whoever you work with to create 85+ scoring games”, it should be read as “are you good enough that you were previously hired by a company that has made 85+ scoring games”, or perhaps even “do you have experience of working in the kind of environment, and with the kind of people, that make 85+ scoring games”. These are more useful ways of viewing the requirement, and while there is still plenty of scope for false negatives and rejecting some good developers – as with any measurement made prior to employment – someone who clears this bar is likely to have the kind of experience a top developer needs.

So, while it’s understandable that people don’t like their art being distilled down to subjective ratings and strict thresholds, or the idea of jobs being lost or companies closing because of a metric that is potentially at the mercy of a few rogue journalists, it’s hard to argue that judging developers by Metacritic scores is inherently bad. Gamers, developers, and publishers all want better games, and that means finding ways of deciding what ‘better’ means. Metacritic scores may be far from ideal in that regard, but right now they’re probably the best we have.