Laws of Regression

Boston Globe writers Bob Ryan (whom you may recognize from Around the Horn) and Joe Sullivan discussed Bill James’ 2011 projections for the Red Sox starting rotation. Of course, the only thing they talked about was wins. Anybody who reads this site often enough (or even once) knows that I absolutely despise the “win” stat for pitchers. Ryan and Sullivan were horrified that Boston’s starting staff is projected to go only 60-48, and that Lester in particular is projected to win only 14 games after winning 19 last year. How could Bill James be so pessimistic about his own team?!?

A) Boston’s top six starters last year combined to go 69-49, helped by unusually strong win totals from Lester (19-9) and Buchholz (17-7). That is not far off James’ projection for this season. And, of course, wins are a fickle stat that often jumps all over the place from year to year. Ask Felix Hernandez how that goes: you can win 19 one year and fall to 13 the next, even after pitching better.

B) More importantly, regression (not the statistical analysis I do in class, but the falling-back-to-earth phenomenon known as regression to the mean) is a very real thing. Take the top 10% of players in any stat category after one season, then check in on them again after the next. As a group they will have declined, in some cases by 10 to 20%. It’s natural. Players who perform at the top of the league in a given year are there for two reasons: they are good, and they were probably a little lucky in some way, too. Whether they avoided a DL stint, enjoyed an unusually favorable BABIP, or often had the wind at their backs, some things outside their control had to go right for them to be the best in that one season. Let me demonstrate.

Of the top 100 batters by plate appearances in 2009, here are the top 10% (ten players) in OPS (* bats left-handed, # switch-hitter):

Player              2009 OPS
Albert Pujols          1.101
Joe Mauer*             1.031
Prince Fielder*        1.014
Derrek Lee             0.972
Kevin Youkilis         0.961
Adrian Gonzalez*       0.958
Hanley Ramirez         0.954
Mark Teixeira#         0.948
Ben Zobrist#           0.948
Pablo Sandoval#        0.943
Average                0.983

And here’s what they did the next season — in 2010:

Player              2010 OPS
Albert Pujols          1.011
Joe Mauer*             0.871
Prince Fielder*        0.871
Derrek Lee             0.774
Kevin Youkilis         0.975
Adrian Gonzalez*       0.904
Hanley Ramirez         0.853
Mark Teixeira#         0.846
Ben Zobrist#           0.699
Pablo Sandoval#        0.732
Average                0.854

As a group, these ten studs were still excellent in 2010, but no longer the best. Their average OPS fell from 0.983 to 0.854, a 13% drop from the year before. In fact, only one of the ten improved at all: Kevin Youkilis, whose OPS rose 14 points, a whopping 1.5% increase. If you’re still not convinced, play this game with any statistic across consecutive seasons. The top 10% will almost always decline as a group the next season.
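The group figures in the paragraph above are plain averages and percent changes. As a quick check, this short Python snippet uses only the OPS values from the two tables (each player weighted equally, not by plate appearances) and reproduces the 2009 and 2010 averages, the roughly 13% group decline, and Youkilis’s 14-point gain.

```python
# OPS values copied from the two tables in this post: (2009, 2010).
ops = {
    "Albert Pujols": (1.101, 1.011),
    "Joe Mauer": (1.031, 0.871),
    "Prince Fielder": (1.014, 0.871),
    "Derrek Lee": (0.972, 0.774),
    "Kevin Youkilis": (0.961, 0.975),
    "Adrian Gonzalez": (0.958, 0.904),
    "Hanley Ramirez": (0.954, 0.853),
    "Mark Teixeira": (0.948, 0.846),
    "Ben Zobrist": (0.948, 0.699),
    "Pablo Sandoval": (0.943, 0.732),
}

# Simple (unweighted) group averages for each season.
avg_2009 = sum(y09 for y09, _ in ops.values()) / len(ops)
avg_2010 = sum(y10 for _, y10 in ops.values()) / len(ops)

# Percent of the group's 2009 OPS lost in 2010.
pct_drop = (avg_2009 - avg_2010) / avg_2009 * 100

# Players who improved: change in OPS "points" (thousandths) and percent change.
improved = {
    name: (round((y10 - y09) * 1000), round((y10 - y09) / y09 * 100, 1))
    for name, (y09, y10) in ops.items()
    if y10 > y09
}

print(f"2009 average: {avg_2009:.3f}")    # 2009 average: 0.983
print(f"2010 average: {avg_2010:.3f}")    # 2010 average: 0.854
print(f"Group decline: {pct_drop:.1f}%")  # Group decline: 13.2%
print(improved)                           # {'Kevin Youkilis': (14, 1.5)}
```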

This is why well-constructed projections often look a little weak to the casual fan, as FanGraphs points out. We tend to think a player will repeat a great season because we believe it was all up to him, that he had complete control over his outcomes. When he can’t repeat his career year (there’s a reason it’s called a “career” year), we assume he wasn’t tough enough to get it done again, or something like that. In reality, the Baseball Gods probably just didn’t treat him as well the second time around. Players have a lot of control over what shows up in the stats, but not 100% control, as the top-10% game above shows.
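The skill-plus-luck story, and why a sensible projection looks “weak,” can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions: each hypothetical player’s season OPS is a fixed true talent plus independent random luck, and the parameters (LEAGUE_MEAN, TALENT_SD, LUCK_SD, N_PLAYERS) are made up for illustration, not estimated from real data. The code plays the top-10% game on simulated players, then compares two forecasts for year 2: simply repeating year 1, versus shrinking year 1 toward the league mean in proportion to how much of the spread is talent. That shrinkage rule is a textbook regression-to-the-mean estimator, not Bill James’ method or any particular projection system’s formula.

```python
import random
import statistics

# Hypothetical parameters (assumptions for illustration, not real estimates).
LEAGUE_MEAN = 0.750  # average true-talent OPS
TALENT_SD = 0.060    # spread of true talent across players
LUCK_SD = 0.060      # season-to-season noise (BABIP, health, wind, ...)
N_PLAYERS = 10_000


def simulate_seasons(seed=2011):
    """Return (year1, year2) OPS lists: same talent, independent luck each year."""
    rng = random.Random(seed)
    talent = [rng.gauss(LEAGUE_MEAN, TALENT_SD) for _ in range(N_PLAYERS)]
    year1 = [t + rng.gauss(0, LUCK_SD) for t in talent]
    year2 = [t + rng.gauss(0, LUCK_SD) for t in talent]
    return year1, year2


def top_decile_game(year1, year2):
    """Pick the top 10% by year-1 OPS and return their group average in each year."""
    cutoff = len(year1) // 10
    top = sorted(range(len(year1)), key=lambda i: year1[i], reverse=True)[:cutoff]
    return (statistics.mean(year1[i] for i in top),
            statistics.mean(year2[i] for i in top))


def regressed_projection(observed):
    """Shrink one observed season toward the league mean by the talent share of variance."""
    talent_share = TALENT_SD ** 2 / (TALENT_SD ** 2 + LUCK_SD ** 2)
    return LEAGUE_MEAN + talent_share * (observed - LEAGUE_MEAN)


def mean_squared_error(predictions, actuals):
    return statistics.mean((p - a) ** 2 for p, a in zip(predictions, actuals))


year1, year2 = simulate_seasons()
before, after = top_decile_game(year1, year2)
print(f"Top 10% in year 1: {before:.3f} -> same players in year 2: {after:.3f}")

naive_mse = mean_squared_error(year1, year2)  # "he'll repeat his season" forecast
shrunk_mse = mean_squared_error([regressed_projection(x) for x in year1], year2)
print(f"MSE repeating year 1: {naive_mse:.5f}; MSE regressed: {shrunk_mse:.5f}")
```

Under these assumptions the top group stays well above average in year 2 but gives back part of its year-1 edge, because only the talent carries over and the luck is redrawn. The regressed forecast looks timid next to a career year, yet it has the smaller squared error.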

