What is WAR?

This page is a quick primer on WAR: what it is, what it isn’t, and what questions about it this site can help you answer.

The Fog

Actions in war are based upon factors wrapped in a fog of uncertainty. A sensitive and discriminating judgment is called for; a skilled intelligence to scent out the truth. — Carl von Clausewitz¹

Carl was talking about actual war there, with guns and bayonets and the like, but the concept of “the fog of war” applies just as well to uncertainty in opposing troop movements as it does to the general misunderstanding and misapplication of the Wins Above Replacement statistic in the baseball fan (and even analyst) community.

If you’ve heard of WAR, I’d wager you’ve already heard some of the following statements:

WAR is wrong, because it is impossible to reduce a player’s season down to a single number.

The fact that different WAR implementations arrive at such different answers for the value of Player X indicates that WAR is fundamentally flawed or unreliable.

WAR cannot be used to compare players within ~0.5 win of each other, because that difference is insignificant or within the margin of error.

Although I believe these statements to be false, false, and imprecise respectively, the purpose of this site is not to prove that to you. Instead, this site attempts to provide the tools for you to examine these and other claims for yourself, by breaking down the components and underpinnings of WAR and evaluating the uncertainty of each of them. Specifically, it presents statistics gathered from the publicly available WAR leaderboards of Baseball Reference, FanGraphs, and Baseball Prospectus, which I’ll refer to as the three “WAR providers”.²

So, What is WAR?

WAR is: a unit.

A definition that short makes it seem like I’m trying to be cute, but I assure you I’m not. Wins Above Replacement is a unit of measure, just like miles, meters, gallons, or dog-years.

An analogy: Perhaps you’ve heard of the tallest mountain in the world, Mount Everest. It stands at a towering 29,000 feet above sea level, give or take. But wait! That scratching noise you hear at your window is a horde of trivia geeks and fun-fact addicts, desperate to tell us that the correct answer is arguably Hawaii’s Mauna Kea, because thousands of feet of the mountain’s height are submerged beneath the ocean. It’s simply a question of how you measure elevation: feet above the ocean floor, or feet above sea level?

The same is true of every baseball statistic. Each has two components, the unit and the reference point. Batting Average is technically Batting Average Above .000, and Home Runs are similarly measured relative to zero, though we never explicitly write them out that way because it’s implied. However, one can easily imagine a stat with a different reference point, for example Home Runs Above 30 (HRA30). Barry Bonds, in his 73-homer season, set the single-season record for HRA30, with 43. Mike Trout had -3 HRA30 in 2013, but that doesn’t mean his 27 home runs weren’t valuable! Et cetera. It’s a silly stat, but it’s measuring the same thing, just with a different baseline.

Fun fact: 17 players in history have a positive HRA30 for their career, ranging from Barry Bonds’ +113 to Jeff Bagwell’s +1. It’s a list of hall-of-famers, and also Adam Dunn.
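To make the baseline idea concrete, here is a minimal Python sketch. The function name above_baseline is made up for illustration; the home run totals are the ones quoted above.

```python
def above_baseline(value, baseline=0):
    """Return a stat measured relative to an arbitrary reference point."""
    return value - baseline

# Home run totals quoted in the text above.
bonds_hr = 73
trout_2013_hr = 27

# Traditional home runs: the baseline is implicitly zero.
print(above_baseline(bonds_hr))            # 73

# Home Runs Above 30 (HRA30): same measurement, different baseline.
print(above_baseline(bonds_hr, 30))        # 43
print(above_baseline(trout_2013_hr, 30))   # -3
```

Changing the second argument changes the number that comes out, but not what is being measured.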

Hopefully you see where this is going. WAR is just like every other baseball statistic you know: a measurement (in “Wins”) and a baseline (called “Replacement”).

How Do You Measure Replacement Level?

Most traditional baseball statistics are relative to zero. This is valuable in its simplicity, but it is also subtly counter-intuitive. Aaron Judge’s 62 home runs in 2022 tell us the Yankees benefited from +62 home runs of value from Judge, compared to simply leaving his lineup spot empty every night and accruing 3 to 5 automatic outs each game for failing to send a batter to the plate. But that’s obviously the wrong baseline to compare to. The correct baseline is the number of home runs we’d expect from the assortment of no-name Triple-A outfielders and waiver-wire claims the Yankees would have to call upon to supplement their roster if they lost Judge. If we think these Johnny NoNames would hit 14 home runs given Judge’s playing time, then Judge’s actual home run value added is closer to +48 home runs above replacement. That’s the whole idea of measuring relative to replacement. Simple, right?
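Here is the same arithmetic as a rough Python sketch. The 62 home runs and the 14-homer replacement estimate come from the paragraph above; the 700 plate appearances and the per-plate-appearance rate are hypothetical numbers chosen only so that the replacement total works out to 14.

```python
def hr_above_replacement(player_hr, plate_appearances, replacement_hr_rate):
    """Home runs above what replacement players would hit in the same playing time."""
    replacement_hr = replacement_hr_rate * plate_appearances
    return player_hr - replacement_hr

judge_hr = 62                 # from the text
playing_time = 700            # hypothetical plate appearances
replacement_rate = 14 / 700   # hypothetical: 14 HR over that playing time

# 62 actual home runs minus 14 expected from replacements.
print(round(hr_above_replacement(judge_hr, playing_time, replacement_rate)))  # 48
```

Scaling the replacement rate by playing time is what lets the same baseline apply to an everyday player and a part-timer alike.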

As for the actual implementation of estimating replacement level, the three WAR providers all take a similar approach: Instead of looking at the specific guys that would fill in for an injured Aaron Judge, they opt to estimate the average skill level of all such “replacement-level” players throughout the league. This is neither right nor wrong; it is simply a choice that has to be made. One could easily make a different decision by measuring a team-specific replacement level, and it’d result in an equally valid and defensible WAR statistic. Because WAR is just the unit of measure.

How Do You Measure Wins?

You can measure wins however you want to! WAR is flexible like that. You could argue that every walk-off hit is worth +1 win, and everything else is zero, perhaps relative to a replacement-level rate of 0.1 walk-offs per year. That’s totally valid!

Nick Castellanos is the runaway favorite for 2024 MVP; he leads baseball with 3.9 walk-off wins above replacement!

Even though this formulation of WAR isn’t very useful or interesting, as long as you’re estimating wins relative to some replacement baseline, it’s WAR. Of course, there are better ways of going about it. The smart minds at the three WAR providers all settled on a fairly similar philosophical framework for measuring wins, which is to measure everything’s value in runs first, and then convert to wins. We could just measure in runs and leave it at that (some have argued that we should anyway), except that runs have different value across different eras. Scoring a run in 1968, year of the pitcher, helped your team win a whole lot more than a run scored in 2000, the peak of the steroid era. It’s fairly easy to compute a simple conversion factor, which at the moment stands at roughly ten runs per one win.³
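One rough way to see why the conversion factor moves with the run environment is Bill James’s Pythagorean expectation, which estimates winning percentage from runs scored and runs allowed. This is a stand-in for illustration, not the method any of the three WAR providers uses, and the runs-per-game figures below are approximate league averages assumed for the example.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Bill James's Pythagorean estimate of winning percentage."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

def runs_per_win(league_runs_per_game, games=162):
    """Approximate runs needed to add one win for an otherwise average team."""
    season_runs = league_runs_per_game * games
    base_wins = games * pythagorean_win_pct(season_runs, season_runs)
    # Give the team one extra run over the season and see how wins change.
    bumped_wins = games * pythagorean_win_pct(season_runs + 1, season_runs)
    return 1.0 / (bumped_wins - base_wins)

# Approximate run environments, assumed for illustration.
for label, rpg in [("1968", 3.4), ("2000", 5.1)]:
    print(label, round(runs_per_win(rpg), 1))
```

Under these assumptions, the low-scoring environment needs noticeably fewer runs to buy a win than the high-scoring one, which is why a conversion step is needed at all.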

For hitters, the WAR providers break value into three common-sense buckets: hitting value above average, fielding value above average, and baserunning value above average. Each of these components is in turn the sum of the run values of every hit, catch, stolen base, etc. that the player contributed to. Lastly, they add the value difference between average players and replacement-level players, to set the baseline at replacement level.⁴
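Put together, the bookkeeping looks roughly like the sketch below. It follows the simplified description above rather than any provider’s actual formula, and the function name, parameter names, and run values are all hypothetical.

```python
def position_player_war(batting_runs, fielding_runs, baserunning_runs,
                        replacement_runs, runs_per_win=10.0):
    """Simplified WAR: sum runs above average, shift to replacement, convert to wins."""
    runs_above_average = batting_runs + fielding_runs + baserunning_runs
    runs_above_replacement = runs_above_average + replacement_runs
    return runs_above_replacement / runs_per_win

# Hypothetical season: a strong hitter and a slightly below-average fielder.
war = position_player_war(
    batting_runs=30.0,
    fielding_runs=-5.0,
    baserunning_runs=2.0,
    replacement_runs=20.0,   # assumed gap between average and replacement
)
print(war)  # 4.7
```

Every term is in runs until the final division, so any one component can be swapped for a different estimate without touching the rest.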

WAR also exists for pitchers, but I haven’t implemented it for this site yet so you’ll only see hitters here.

What is WAR not?

There are some things baseball analysts haven’t figured out how to measure very well, or at all. Catcher game calling. Veteran presence. But since these are unmeasurable, it would be silly to expect them to be captured in WAR, a unit of measure. And no serious proponent of WAR claims that they are. If you want to argue that much of the value of soft-skills guys like Jeff Mathis goes uncaptured by WAR, that’s absolutely a valid position to hold, and many of the architects of WAR will likely even agree with you.

WAR never claims to be perfect, only better than the alternative.

Clearing the Fog

Now that we’ve got a solid understanding of what WAR is, let’s take a look at those claims from the top of the page, and how we can go about investigating them.

Claim: WAR is wrong, because it is impossible to reduce a player’s season down to a single number.

As we’ve seen, WAR is not really a single number; it’s a unified scale upon which many thousands of numbers are added together. The value of every hit, catch, stolen base, etc. goes into the calculation of WAR. You can use the Player Comparison tool to see all the components that contribute to WAR.

Claim: The fact that different WAR implementations arrive at such different answers for the value of Player X indicates that WAR is fundamentally flawed or unreliable.

This is actually a great strength of WAR. Multiple independent WAR estimates allow us to gauge uncertainty or tailor each implementation to specific needs. See how the existence of multiple WAR implementations provides a richer picture of player value by perusing the WAR leaderboards. Also check out this piece by Tom Tango, one of WAR’s founding fathers, talking about the benefits of WAR’s flexibility and multiple implementations.

Claim: WAR cannot be used to compare players within ~0.5 win of each other, because that difference is insignificant or within the margin of error.

Although this is true in sentiment, the use of a fixed WAR difference threshold is not accurate. The uncertainty of WAR is highly influenced by the style of the player! Just take a look at the Uncertainty Comparison for Jarren Duran and Shohei Ohtani. While their average WARs are similar, the error bars on Duran’s estimates are massive, close to ±3 WAR, while Ohtani’s are much more tightly clustered around ±0.5 wins.
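One simple way to see why a fixed threshold falls short is to treat the three providers’ numbers as separate estimates and look at their spread. This is only a sketch: the WAR values are invented, the players are placeholders, and the sample standard deviation used here is an assumption, not necessarily how this site computes its error bars.

```python
from statistics import mean, stdev

def summarize(estimates):
    """Return the average and sample standard deviation of several WAR estimates."""
    return mean(estimates), stdev(estimates)

# Invented values for two hypothetical players, one number per provider.
player_a = [3.0, 5.0, 7.0]   # providers disagree a lot
player_b = [4.8, 5.0, 5.2]   # providers nearly agree

for name, estimates in [("Player A", player_a), ("Player B", player_b)]:
    avg, spread = summarize(estimates)
    print(f"{name}: {avg:.1f} WAR, spread {spread:.1f}")
```

Both placeholder players average the same WAR, yet a half-win gap means something very different for each of them.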

Now that you’ve got a handle on WAR, dive into the site features and explore!

Footnotes

¹ I paraphrased Carl’s quote here slightly for clarity.
² These sites have done all the heavy lifting; I am just repackaging their material. Many thanks to them for making their WAR totals, as well as the constituent components, available to the public.
³ Ten runs per win might seem counterintuitive at first. It’s not uncommon to win a game by scoring just 3 or 4 runs! But the ten runs are not all added to the same game; you must imagine them as randomly distributed throughout all games in the season. Most of the time, an extra run will turn a 5-2 loss into a 5-3 loss, or a 7-4 win into an 8-4 win. Only roughly 1 in 10 of those runs will end up contributing to flipping the result of a game from a loss to a win. Hence 10 runs per win, on average.
⁴ This is generally accurate, but simplified. Refer to the three WAR providers for the specifics on how they implement their calculations.