Run Values (RV) is a stat I created, inspired by FanGraphs’ Runs Above Replacement (RAR). My goal with Run Values is to develop a statistic that captures as much of a player’s value as possible and then converts that value into baseball’s currency: runs. For hitters, I ran a regression on the 14 offensive statistics that most influence team run scoring. For pitchers, I ran a similar regression on 9 statistics. The R^2 figures for hitting and pitching were both well above 0.90 (I’ll explain R^2 later, but this is a very good number). In each case, I used league-wide data from 1990 onward.
The 14 stat categories used to measure an offensive player’s output are singles, doubles, triples, home runs, unintentional and intentional walks, hit by pitches, strikeouts, double-play groundouts, stolen bases, times caught stealing, sacrifice hits and flies, groundball rate, and all “other outs.” In a multiple linear regression study of major league teams over the last 20 full seasons, these 14 stats explained about 95% of the variation in team run scoring. That is what R^2 measures: how much of the variation in run scoring these 14 stats can account for.
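For readers who want to see the mechanics, here is a minimal sketch of a multiple linear regression and the R^2 calculation in plain Python. It is a generic illustration of ordinary least squares, not necessarily how the Run Values regressions were actually computed; the function names (fit_ols, r_squared and so on) are hypothetical, and the toy numbers are invented rather than real team data. In the actual study, each row would be one team-season with the 14 stats as columns, and y would be that team’s runs scored.

```python
# Generic ordinary least squares (OLS) regression plus R^2, in plain Python.
# Function names are hypothetical; this only illustrates the technique.
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [intercept, b1, b2, ...] that minimize squared error (normal equations)."""
    X = [[1.0, *r] for r in rows]  # prepend a column of 1s for the intercept
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs: Sequence[float], row: Sequence[float]) -> float:
    """Evaluate the fitted formula for one row of stats."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


def r_squared(coefs: Sequence[float], rows, y: Sequence[float]) -> float:
    """Share of the variation in y explained by the formula: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(rows, y))
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data: two made-up predictors per "team", not real baseball stats
    rows = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
    runs = [3.4, 3.6, 6.2, 7.9, 12.1]
    coefs = fit_ols(rows, runs)
    print("coefficients:", [round(c, 3) for c in coefs])
    print("R^2:", round(r_squared(coefs, rows, runs), 3))
```

Solving the normal equations directly is fine for a small illustration; for real data with many correlated columns, a library least-squares routine such as NumPy’s numpy.linalg.lstsq is numerically safer.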
This way of tracking a player’s value goes beyond any other statistic I know of, because in addition to crediting him for the standard OBP and slugging components, it credits a player who steals efficiently and avoids hitting into double plays. That said, one thing becomes very clear after running many of these regressions: stolen bases just don’t influence run scoring that much. Bill James once argued that in the season Rickey Henderson stole 130 bases, his 42 times caught stealing very nearly negated the run-scoring gain his team got from his thievery. My regression tends to back that up: over the course of a season, stealing makes only a marginal difference in how many runs a team scores.
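Stealing “efficiently” can be made concrete with a break-even success rate: the rate at which the runs gained from steals exactly offset the runs lost to times caught. The sketch below assumes a run value for each stolen base and each caught stealing; the 0.20 and -0.40 weights are placeholders chosen for illustration, not the coefficients from the Run Values regression.

```python
# Placeholder run weights for illustration only; in Run Values these would be
# the regression's coefficients on stolen bases and caught stealing.
SB_WEIGHT = 0.20   # runs gained per stolen base (assumed)
CS_WEIGHT = -0.40  # runs lost per time caught stealing (assumed)


def net_steal_runs(sb: int, cs: int, sb_w: float = SB_WEIGHT, cs_w: float = CS_WEIGHT) -> float:
    """Net runs a runner adds from stolen bases and times caught stealing."""
    return sb * sb_w + cs * cs_w


def break_even_rate(sb_w: float = SB_WEIGHT, cs_w: float = CS_WEIGHT) -> float:
    """Success rate p at which steals add zero runs: p*sb_w + (1 - p)*cs_w = 0."""
    return -cs_w / (sb_w - cs_w)


if __name__ == "__main__":
    print(f"break-even success rate: {break_even_rate():.1%}")  # 66.7% with these weights
    print(f"30 SB, 10 CS: {net_steal_runs(30, 10):+.1f} runs")  # +2.0
    print(f"20 SB, 12 CS: {net_steal_runs(20, 12):+.1f} runs")  # -0.8
```

With these placeholder weights, a runner has to succeed about two times in three just to break even, which shows how quickly a high caught-stealing total eats into the value of the steals.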
So I calculate how many runs per game an offense would score if eight of its players were league average and the player in question were slotted into the ninth spot. If run scoring rises above the league average, the player has a positive Run Value; if it falls below, he gets a negative one. I then add the player’s UZR (Ultimate Zone Rating) defensive stat from FanGraphs to account for the run value of his glove. The combined score is my estimate of how many runs a player is better or worse than league average.
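Because the team formula is linear, the “eight average players plus one” calculation can be sketched in a few lines. This sketch assumes the player’s stats are expressed as team-equivalent per-game rates (what a lineup of nine of him would produce), uses made-up coefficients for a handful of stats rather than the real 14, and assumes the per-game result is scaled to a season by games played before UZR is added. None of these specifics come from the Run Values formula itself.

```python
# Hypothetical per-game run weights from a team-level regression; placeholders,
# NOT the actual Run Values coefficients, and only 5 of the 14 stats.
COEFS = {"1B": 0.50, "2B": 0.75, "3B": 1.05, "HR": 1.40, "BB": 0.33}
INTERCEPT = 0.0


def team_runs_per_game(rates: dict) -> float:
    """Apply the (linear) team formula to per-game team rates."""
    return INTERCEPT + sum(COEFS[k] * rates[k] for k in COEFS)


def lineup_delta_per_game(player: dict, league_avg: dict) -> float:
    """Runs per game gained when the player fills one of nine lineup spots.

    Assumes `player` holds team-equivalent rates (what nine of him would
    produce), so each lineup spot contributes one ninth of the team's totals.
    """
    lineup = {k: (8 * league_avg[k] + player[k]) / 9 for k in COEFS}
    return team_runs_per_game(lineup) - team_runs_per_game(league_avg)


def run_value(player: dict, league_avg: dict, games: int, uzr: float) -> float:
    """Season value: per-game offensive delta times games (assumed scaling) plus UZR."""
    return lineup_delta_per_game(player, league_avg) * games + uzr


if __name__ == "__main__":
    league = {"1B": 6.0, "2B": 1.8, "3B": 0.2, "HR": 1.0, "BB": 3.2}
    slugger = dict(league, HR=2.0)  # league average, but twice the home runs
    print(round(lineup_delta_per_game(slugger, league), 3))  # 0.156
    print(round(run_value(slugger, league, games=150, uzr=-4.0), 1))  # 19.3
```

A league-average player comes out at exactly zero by construction, which is the baseline the Run Value scale is built around.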
Now for pitchers. If you’re questioning my choice of stats for the regression, think about the only outcomes over which a pitcher has nearly complete control: home runs, walks (including intentional walks and hit by pitches), strikeouts and groundball rate. No other defensive player (except sometimes the catcher) has any influence on these, and it has been shown that pitchers can replicate them from year to year (in other words, they are skill stats). Occasionally, though, two pitchers will have almost identical home run, walk, strikeout and groundball rates, yet very different ERAs. The two luck indicators that almost always separate such pitchers are BABIP (batting average on balls in play) and LOB% (the percentage of baserunners left on base).
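BABIP and LOB% are easy to compute from a pitcher’s line. The sketch below uses the definitions FanGraphs publishes; the sample numbers are invented, and the Run Values regression may compute these slightly differently.

```python
def babip(h: int, hr: int, ab: int, so: int, sf: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - so - hr + sf)


def lob_pct(h: int, bb: int, hbp: int, r: int, hr: int) -> float:
    """Left-on-base (strand) rate as FanGraphs defines it:
    (H + BB + HBP - R) / (H + BB + HBP - 1.4 * HR)."""
    return (h + bb + hbp - r) / (h + bb + hbp - 1.4 * hr)


if __name__ == "__main__":
    # Invented season line for one pitcher (opponents' totals)
    print(f"BABIP: {babip(h=180, hr=20, ab=750, so=200, sf=5):.3f}")  # 0.299
    print(f"LOB%:  {lob_pct(h=180, bb=50, hbp=5, r=80, hr=20):.1%}")  # 74.9%
```

Home runs are removed from BABIP because they never become balls in play for the defense, which is why the stat isolates what happens once the ball is put in play.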
So you may still be wondering how I can say that just these few statistics tell me what I need to know. Again, I ran a multiple regression, this time testing how much the pitching stats above affect a pitcher’s runs allowed per game. It covered more than 2,500 pitcher seasons from both leagues over the last 20 years. Multiple regression produces not only a formula for a pitcher’s runs allowed per 9 innings, but also that R^2 figure I mentioned earlier. R^2 measures what share of the variation in the data the formula explains. If you remember, the hitting R^2 was about 0.95. The R^2 for this regression was more than 0.99! In other words, about 99% of the variation in a pitcher’s runs allowed per game can be explained by a formula using just home runs, walks, strikeouts, groundball rate, BABIP and LOB%. The result is an excellent estimate of what a pitcher’s ERA would be on a team with an average defense, in an average ballpark (remember, all ballparks are different!).
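One way to turn such a fitted formula into an average-defense estimate is to feed it the pitcher’s own skill stats but league-average BABIP and LOB%. The sketch below does that with placeholder coefficients; both the coefficients and the substitution step are assumptions for illustration, not the actual Run Values formula.

```python
# Placeholder coefficients for R/9 = intercept + sum(coef * stat).
# These are NOT fitted values; the signs just follow intuition (more HR/BB/BABIP
# means more runs, more K/GB/LOB% means fewer).
PITCH_COEFS = {"HR9": 1.50, "BB9": 0.30, "K9": -0.10, "GB": -1.00, "BABIP": 15.0, "LOB": -10.0}
PITCH_INTERCEPT = 5.5


def predicted_r9(stats: dict) -> float:
    """Apply the linear R/9 formula to a pitcher's stat line."""
    return PITCH_INTERCEPT + sum(PITCH_COEFS[k] * stats[k] for k in PITCH_COEFS)


def neutral_r9(stats: dict, league_babip: float, league_lob: float) -> float:
    """Same formula, with league-average BABIP and LOB% swapped in (assumed approach)."""
    neutral = dict(stats, BABIP=league_babip, LOB=league_lob)
    return predicted_r9(neutral)


if __name__ == "__main__":
    # Invented pitcher whose defense let a lot of balls in play fall for hits
    pitcher = {"HR9": 0.9, "BB9": 2.8, "K9": 8.5, "GB": 0.45, "BABIP": 0.330, "LOB": 0.72}
    print(f"raw estimate:     {predicted_r9(pitcher):.2f} R/9")
    print(f"neutral estimate: {neutral_r9(pitcher, 0.300, 0.72):.2f} R/9")
```

Keeping the skill stats fixed and replacing only the luck indicators means two pitchers with the same home run, walk, strikeout and groundball rates end up with the same neutral estimate.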
The formula produces an estimated runs allowed per 9 innings (R/9). To find a pitcher’s Run Value, I compare his regression R/9 to the league average. If a pitcher allows 3.0 runs per 9 and the league average is 4.5, he is saving his team 1.5 runs every 9 innings, 3 runs every 18, 6 runs every 36, and so on. If he pitches 198 innings this season, he saves his team 1.5 × 198 / 9 = 33 runs, so his Run Value is +33.
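The innings arithmetic fits in a one-line function. The sketch also includes a small hypothetical helper for box-score innings notation, where 198.1 means 198 and one-third innings rather than 198.1 decimal innings.

```python
def pitcher_run_value(r9: float, league_r9: float, innings: float) -> float:
    """Runs saved versus league average: (league R/9 - pitcher R/9) * IP / 9."""
    return (league_r9 - r9) * innings / 9


def ip_to_decimal(ip: str) -> float:
    """Convert box-score innings like '198.1' (198 1/3 IP) to a decimal number."""
    whole, _, outs = ip.partition(".")
    return int(whole) + int(outs or 0) / 3


if __name__ == "__main__":
    print(pitcher_run_value(3.0, 4.5, 198))  # 33.0
    print(round(pitcher_run_value(3.0, 4.5, ip_to_decimal("198.1")), 2))  # 33.06
```

A pitcher worse than league average simply comes out negative, so the same function covers both sides of the scale.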
The process may seem complex, and perhaps it is, but in the end I believe it gives a more accurate measurement of a player’s true value. For example, the better the defense behind a pitcher, the better that pitcher appears to be. This formula, however, credits the fielders with the runs they save and calibrates the pitcher’s Run Value as if he had an average defense behind him, so the runs are assigned to the players who actually earned them. Likewise, good hitters who don’t collect many RBI (perhaps because the leadoff men ahead of them are underachieving) are not penalized for the poor play of those hitters, since an average player in the same spot would likely generate even fewer RBI. All in all, I believe RV is very effective at separating out who is and isn’t responsible for runs, both scored and allowed.
I am improving the Run Value stat all the time, since I know it can never be perfect. I can gather more data and run regressions on more stats. For example, I hope to test someday the theory that aggressive base stealers throw pitchers off, causing more walks and hits. As I research these things, I will add them to Run Values, hopefully improving its representation of a player’s value.