Run Expectancy and Relievers’ ERAs

June 30, 2009

The lowest ERAs in the league every year are turned in by relievers, and in 2009 relievers as a whole have allowed 0.4 fewer runs per nine innings than starters (4.4 vs. 4.8). Most would agree that starters are generally better pitchers, so why is it that runs are scored at a higher rate on their watch? Bill James had the answer years ago.

We can talk about batters not being able to adjust to a relief pitcher – adjusting to the new release, arm angle, movement and velocity is hard – but there’s something much more fundamental that separates starters and relievers: when they enter a game. Starters begin every inning with none out, whereas relievers often come in with one or two outs, and their runs allowed stats are treated as though the bases were empty when they entered.

Taking a hypothetical league-average pitcher, we can see the difference between starters and relievers. The league-average starter will enter every inning with no runners and no outs. His run expectancy (2009 stats) is 0.51 runs for that inning, or about 4.65 runs per nine. If a reliever enters with 1 out, assuming he performs at the league average, his run expectancy is about 0.28 runs for the remaining two-thirds of that inning, equating to 3.8 runs per nine. If he enters with 2 outs, his run expectancy is 0.1 runs for the final third of the inning, or 2.76 runs per nine. So basically, the very same pitcher has a huge advantage if he enters the game with 1 or 2 outs.

Given an average middle reliever who gets 1/3 of his appearances in each situation – 0 outs, 1 out and 2 outs – his expected runs allowed per nine innings is about 4.00, even though his true figure would be 4.65 if he started. By virtue of occasionally entering games mid-inning, relief pitchers are given an advantage in ERA and runs allowed stats.
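For anyone who wants to check the arithmetic, here is a rough Python sketch of the calculation above. It uses the rounded run expectancies quoted in this post and assumes the reliever finishes the inning he enters. The function names are mine, and the blend weights each situation by the outs actually recorded, which is my assumption about how the 4.00 figure comes together. Because the inputs are rounded, the starter's line works out to 4.59 rather than the 4.65 quoted above.

```python
# Expected runs for the rest of the inning, keyed by outs when the pitcher
# enters (bases empty). These are the rounded 2009 figures quoted in the post.
RUN_EXPECTANCY = {0: 0.51, 1: 0.28, 2: 0.10}


def runs_per_nine(expected_runs, outs_on_entry):
    """Scale the runs expected over the remaining outs to a 27-out game."""
    outs_remaining = 3 - outs_on_entry
    return expected_runs * 27 / outs_remaining


def blended_runs_per_nine(run_expectancy):
    """Runs per nine for one appearance in each situation, weighted by outs."""
    total_runs = sum(run_expectancy.values())
    total_outs = sum(3 - outs for outs in run_expectancy)
    return total_runs * 27 / total_outs


for outs, runs in RUN_EXPECTANCY.items():
    print(f"enters with {outs} out(s): {runs_per_nine(runs, outs):.2f} per nine")

print(f"equal mix of all three: {blended_runs_per_nine(RUN_EXPECTANCY):.2f} per nine")
```

Note that simply averaging the three per-nine rates would understate the figure, since the 2-out appearances cover only a third as many outs as the 0-out appearances.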


Pitcher Reports

June 30, 2009

A few weeks ago I posted about BABIP, and its effect on a pitcher’s apparent ability. Pitchers are often subject to luck, whether it be good or bad, and the Nationals’ Jordan Zimmermann appeared to be suffering from the bad kind. On the other hand, White Sox pitcher Gavin Floyd burst onto the scene last season with a 17-8 record and 3.84 ERA, and has put up crazy-good numbers in his last 8 starts this season. Eric Karabell gave him some love in an ESPN article recently, but is it deserved?

With a 0.802 OPS against and 5.37 runs* allowed per nine innings at the time, Zimmermann looked at first glance to be headed for a minors stint. Since then, however, he and his defense have allowed just three runs in nearly 18 innings with an OPS against of 0.555. The difference? His strikeout and walk numbers have remained fairly constant, but his BABIP has plummeted nearly 100 points and he’s giving up fewer home runs per plate appearance. The rest of the way, look for his OPS against to balance out toward 0.700 and his runs allowed to hang around 3.8 per 9 innings.

Karabell notes in his article that Floyd has been stellar in his last 8 starts, with an ERA of only 1.39. But looking deeper, there are some telling numbers that predict his immediate one-way ticket to mediocrity. First of all, his BABIP over those past 8 starts is just 0.220. He’s suddenly allowing more ground balls, and while a pitcher has some control over this, it’s not likely that he’s changed anything midseason. And finally, his homers-per-fly-ball ratio is at about 3%, yet his career average with the Sox is closer to 10%. With Floyd, we expect a more level BABIP, normal groundball/flyball ratios, and an increase in homers per fly ball. I estimate a 0.750 OPS against and approximately 4.6-4.7 runs allowed per game.

Remember, win-loss records and ERA hide so much of the true value of a pitcher. Always pay attention to luck figures such as BABIP and HR/flyball, as well as suspicious spikes in the more controllable stats such as HR/inning, groundball/flyball ratio and strikeout-walk rates.

*Note that these are all runs allowed numbers, not earned runs allowed.

Predicting Wins from Run Differential

June 28, 2009

In a previous post, I discussed the importance of run differential to winning ballgames. Run differential (runs scored minus runs allowed) seems to be the best indicator and predictor of win-loss records, having an extremely high correlation to total wins. The father of modern baseball stats analysis, Bill James, invented a formula based on run differential to predict a team’s winning percentage, and that percentage can help us to predict how certain teams may do down the stretch.

(That formula is simply: (runs scored)^2 / ((runs scored)^2 + (runs allowed)^2). The carets indicate that the run totals are squared.)
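If you would rather compute it yourself, here is a minimal Python sketch of the formula. The function names and the example team (400 runs scored, 350 allowed, 45 wins in 75 games) are hypothetical and only there to show the mechanics. Treating "luck" as actual wins minus formula wins is my assumption about how such a figure is built, and published versions of the formula sometimes use an exponent slightly different from 2.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from run totals (Bill James' formula)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def luck(actual_wins, games_played, runs_scored, runs_allowed):
    """Actual wins minus the wins the formula expects (an assumed definition)."""
    expected_wins = pythagorean_win_pct(runs_scored, runs_allowed) * games_played
    return actual_wins - expected_wins


# Hypothetical team, not real 2009 data.
example_pct = pythagorean_win_pct(400, 350)
example_luck = luck(45, 75, 400, 350)
print(f"expected winning percentage: {example_pct:.3f}")
print(f"luck: {example_luck:+.1f} wins")
```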

While some teams may just be good at winning close games, there is likely a luck factor in play with many ballclubs’ records. As we get into late summer, we can use Bill James’ formula to predict which teams are likely to pick it up in the win column or, well, not.

Don’t worry. You don’t have to do any work yourself! Baseball Reference has already done it for you. Scanning over this Baseball Reference page to the “Luck” column, we can see which teams are outperforming and under-performing their expectations. Then, a brief look over to the “1run” column shows us each team’s record in games decided by one run. It is not surprising that there is a strong correlation between one-run win-loss records and “luck factors.” Teams that win a lot of close games are generally the recipients of some good fortune – likely in combination with poise and skill – that can lead to discrepancies between the formula win percentage and the actual win percentage.

Focusing in on two particular teams, the LA Angels have a luck factor of 2 and a win-loss record of 15-9 in one-run games, while Tampa Bay is at a -5 luck factor and 9-13 in the close ones (as of 6-28-09). There are legitimate arguments that some teams are just made to win, whether by 1 run or 10, and that it’s not luck. Perhaps a good bullpen or solid team poise in the face of defeat wills certain teams to victory…a “clutch rating” if you will.

While I cannot disprove the existence of clutch in baseball, there is little to indicate that the Angels are a clutch team, or any more clutch than the Rays. On the pitching side, LA’s bullpen ERA is higher than the league average, their save and win-loss percentages are at the league average, and their relievers allow inherited runners (runners left on by the previous pitcher) to score at the league average rate. Sense a pattern? Overall, the Angels have a very much mediocre relief staff finishing off games this season.

On the offensive side, the Halos are scoring runs in the later innings at a clip just slightly worse than their own average for all innings, and below the Rays’ average. In addition, LA’s team OPS when the game is within two runs is actually slightly lower than their overall team OPS, and much lower than the Rays’ close-game OPS (.766 vs .793).

Everything points to the Angels being a mediocre clutch team, yet they are 15-9 in one-run games. Why? Likely because 1-run games are like coin flips for most teams. I’m not trying to say the Angels are going to flounder – especially considering that they are getting healthier – but I am simply trying to show another way to measure luck. Since luck does not accumulate, we expect the Angels’ close-game winning percentage to drop.
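To put a rough number on the coin-flip idea, here is a small Python sketch that treats every one-run game as an independent 50/50 proposition. That is a simplifying assumption of mine, not an established fact, and the function names are my own. It asks how often a team would land on a record at least as extreme as the two quoted above.

```python
from math import comb


def prob_at_least(wins, games, p=0.5):
    """Chance of winning at least `wins` of `games` independent coin flips."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins, games + 1)
    )


def prob_at_most(wins, games, p=0.5):
    """Chance of winning no more than `wins` of `games` independent coin flips."""
    return 1 - prob_at_least(wins + 1, games, p)


angels = prob_at_least(15, 24)  # 15-9 or better in 24 one-run games
rays = prob_at_most(9, 22)      # 9-13 or worse in 22 one-run games
print(f"Angels at 15-9 or better: {angels:.1%}")
print(f"Rays at 9-13 or worse: {rays:.1%}")
```

Under that assumption neither record is a rare event, which fits the reading that luck rather than clutch ability explains most of the gap.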

Those same Rays that have a 9-13 record in close games and a luck factor of -5 beat out the Angels in every single clutch stat mentioned above. In fact, the Rays’ expected record is the best in the American League. Considering there’s nothing indicating they lack “clutch,” and nothing that indicates it matters anyway, look for the Rays to nab a playoff spot in the incredibly competitive AL East.

Making Baseball Fun

June 23, 2009

Here’s a visually-pleasing baseball site that was forwarded to me…there is some fun stuff here!

Watch Baseball on Mute

June 22, 2009

There are many controversial issues in baseball stats analysis, ranging from the control a pitcher has over his BABIP to how much effect a catcher has on the performance of his pitcher. Former ESPN baseball analyst Harold Reynolds chose neither of these to attack when he decided the time was ripe for a stats counter-movement. He instead questioned the value of OPS in a less-than-coherent manner.

This from his very own blog:

“If you have a ball club that’s a great offensive team then that changes everything. But if you have a guy like Adrian Gonzalez, for example, his OPS is going to be high – he’s got a lot of home runs and walks a lot…because you’re not going to pitch to him…Big power hitters swing and miss and strikeout. Or they hit home runs and walk. And at the end of the year their OBP is always going to be higher than most of the other guys on the team because they clog the bases.”

Clog the bases? Yeah, I don’t want runners on base when I come up, either.

To be fair, Reynolds’ initial point was a good one. OPS cannot explain EVERYTHING about a player. However, there is substantial evidence – including this study of mine – showing that OPS is extremely important to offensive production. I can only hope that Harold will see the light someday so I don’t have to listen to his incoherent spiels on the MLB Network.

What Correlates to Winning League Championships?

June 19, 2009

In sports, we often hear the phrase, “defense wins championships,” and specifically in baseball, “pitching wins championships.” The other night on an ESPN baseball telecast, analyst Steve Phillips was taking that very argument even further, asserting that low ERAs are more important than high offensive production when it comes to winning championships. While there is no doubt that lower team ERAs help win games and playoff series, I am skeptical of much of what comes out of Phillips’ mouth, so I set out to check his hunch.

First off, to get into the playoffs and have a shot at the League Championship Series, teams have to win in the regular season. Taking a quick look at some regular season stats from the last 20 seasons and how they correlate to regular-season wins, season run differential (average runs scored minus average runs allowed) comes out on top. If you’re concerned about the differences between the AL and the NL, the only noticeable difference is that offensive runs per game tend to be slightly more important in the AL than in the NL.

*Coefficients closer to 1 represent stronger correlations

Stat                          Coefficient   Wins+/Run
Run Differential              0.92          16
ERA                           0.63          13.5
Total runs scored per game    0.56          13

This tells us that, of these three team stats, it’s the run differential that best explains teams winning games and making it to the playoffs. Also, ERA tends to explain total wins slightly better than run scoring. Hey Steve, you might be on to something here. Winning championships, though, is a different story. The unpredictability of five- or seven-game series does not often distinguish teams that have only marginally better ERAs or run differentials.

The logistic correlation between the playoff teams’ season stats and winning league championships showed some interesting results over the last 20 years (not counting 1994, obviously, because of the strike).

In the American League, the best correlation to which team won the ALCS was total season wins, followed closely by run differential and ERA. Each of these was an excellent indicator of which team won, all being significant at the 5% level. However, runs per game did not have any predictive power over which team won the ALCS.

In the National League, none of the correlations were particularly strong, but again season wins came out on top, followed this time by ERA, then run differential. However, no correlation was significant at even the 30% level. Runs per game stayed consistent, showing almost no correlation to NLCS winners.

There are some important things we can take from this study. Teams that make the playoffs and give themselves a shot at the championship are likely to be teams with healthy, positive run differentials. Because run differential implies both high run-scoring and low ERAs, these stats also logically correlate to winning.

Once we get to the playoffs, the best predictors of League Championship Series winners are, aside from total season wins, ERA and run differential, both of which are much more important than offensive production. Basically, teams that win the regular season win in the playoffs, and teams with low ERAs and good run differentials win championships more consistently than teams with greater run production.

When building a team to win a championship, the past has shown that healthy run differentials, when driven by lower ERAs, are the keys to winning the League Championship Series. I would like to give Steve Phillips a gold star for his always on-target gut instinct. Please sense the sarcasm.

The Raul Ibañez “Steroids Scandal”

June 16, 2009

Phillies outfielder and former Mariner Raul Ibañez was the subject of a now-infamous blog post that hinted at the possibility of him using steroids. Indeed, his home run numbers are jumping off the page, and that generally doesn’t sit well with fans in the steroid era. However, when statistics are applied correctly, this is nothing out of the ordinary.

Ibañez is one of 42 players at least 34 years old who have reached 100 plate appearances so far this year. Entering the twilight of their careers, these players are often the target of steroid suspicions. I recorded each of these players’ home run paces this season in terms of how many standard deviations their pace was above their career averages (a standard deviation is a measure of how much a player’s stats are likely to vary). Of these 42 players, only one joins Ibañez in relative home run productivity: Johnny Damon, at about 2.5 standard deviations above average.

Statistically, it’s tempting to crank out the probability that a player will match what Ibañez or Damon has done this year – that likelihood is less than 1% – and then question these players’ morals. But it should be noted that both these players have experienced a change in venue to ballparks that have proven to be more home run friendly. Bombs are flying out of Yankee Stadium at an unprecedented pace, boosting Damon’s power numbers this year by enough to explain his surge. The change from Safeco Field to Citizens Bank may have influenced Ibañez’s home run figures by about 1.1 according to ESPN’s home run park factors, leaving his standard deviation score, known as a Z score, at 2.1.

Assuming that individual home run numbers follow a normal distribution (the bell curve), Ibañez’s theoretical likelihood of hitting homers at this scorching pace is about 1.7%, but the collective story is what is important. Before the season started, I would have been willing to bet (just a gentleman’s bet, mind you) that we would see at least one of these 42 players performing at this high a level in 2009, no steroids attached. That likelihood is about 50%. The fact that we’re not seeing more than one oldie hitting home runs at Ibañez’s rate shows that players are aging as we would expect them to.
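Here is a short Python sketch of that calculation, for anyone who wants to follow along. It assumes, as above, a normal distribution, and it further assumes that the 42 players' seasons are independent of one another; that second assumption and the variable names are mine. With a Z score of exactly 2.1 the single-player probability comes out a touch under 1.8%, close to the figure quoted above, and the at-least-one probability lands slightly above 50%.

```python
from statistics import NormalDist

Z_SCORE = 2.1      # Ibañez's park-adjusted Z score from the post
NUM_PLAYERS = 42   # players aged 34+ with at least 100 plate appearances

# Chance that one given player is at least this far above his career pace.
p_single = 1 - NormalDist().cdf(Z_SCORE)

# Chance that at least one of the 42 players does it, assuming independence.
p_any = 1 - (1 - p_single) ** NUM_PLAYERS

print(f"one given player: {p_single:.2%}")
print(f"at least one of {NUM_PLAYERS}: {p_any:.1%}")
```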

While it is rare that a given player performs at an extremely elevated level of play, the likelihood that some player from a sample of many performs at this level is actually quite high. This year it just so happens to be Raul Ibañez.