Bowl Championship Furies

Once again the theatrics of college football are center stage this time of the year. UCLA sort of nearly made the Rose Bowl after finishing the regular season 6-6, two SEC teams are going to replay a baseball game for the national title, and many lesser teams made BCS bowls because of conference contracts.

College football rankings create chaos, and much of the blame is often directed at the BCS system. Granted, the BCS system is a joke (great article, by the way). However, there are two much larger problems at the root of college football’s chaos: parity, or lack thereof, and sample size, or lack thereof.

If all college football teams competed at roughly the same level, as is the case in most professional sports, strength of schedule rankings wouldn’t be needed. Every team’s opponents would be about equal. But in reality, the quality of Houston’s opponents is in a different stratosphere than the quality of the opponents of, say, LSU or even Oregon. Thus some measurement of a team’s opponents is necessary in ranking that team. Enter, Strength of Schedule.

Additionally, teams only play about 12 games a year. There are 120 teams in the FBS classification, so teams are only playing one-tenth of all schools in their division. And because of the huge differences in ability—even within the FBS division—that 12-game sample is rarely representative of 12 average college football teams. The SEC teams—and other major conference teams—play some of the best teams in college football, while most of Arkansas State’s (7-0, 9-2) opponents in the Sun Belt Conference are significantly worse than even the most inept of the major conference schools. Example: Arkansas State lost to Illinois, the Big 10’s eleventh best team (huh?), by 18 points.

The combination of wildly uneven teams and short seasons makes ranking teams very difficult. Obviously there are upsets; every weekend some team that is noticeably inferior to its opponent wins, and we don’t instantly believe it is the better team. We accept that the inferior team just had a good day. Unfortunately, each of those upsets also makes up about 8% of a 12-game season, and the better team doesn’t have enough games remaining to show its true colors in the win-loss department.

I looked at some computer simulations in an effort to articulate the phenomenon that the best teams don’t always come out on top. To get started, the following are the results of a simulation of the former Pac-10 conference, with the assumption that every team has a 50% chance to beat every other team. In other words, for a second I’m going to assume that all teams are equal and see what happens in 1,000,000 simulations of a Pac-10 season. The chart shows how often each number of wins won the conference.

Champion's wins    % of simulations
9                  30.6%
8                  49.8%
7                  16.9%
6                  2.0%
5                  0.7%

Even though every team was evenly talented and had a 50% chance to beat any other team, the conference winner still ended up 9-0 or 8-1 80% of the time. Since even an 8-1 team from the former Pac-10 was likely to get a BCS bowl, perfect parity among the teams in this hypothetical Pac-10 conference still produced a BCS bowl-worthy team 80% of the time! If the Pac-10 were, say, the 3rd best conference, and all its teams were equal, then it is hard to believe that an 8-1 team would truly be a top-10 team nationally.
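For anyone who wants to try the equal-teams experiment themselves, here is a minimal sketch in Python. It is not the code behind the table above. It assumes a 10-team single round robin (9 conference games per team), treats every game as a coin flip, and records the best win total in each simulated season. The function name `simulate_champion_wins` and its parameters are made up for illustration, and the shares it prints depend on these modeling assumptions and on the random seed, so they need not match the table.

```python
import random
from collections import Counter


def simulate_champion_wins(n_teams=10, n_seasons=20_000, p_win=0.5, seed=0):
    """Return {wins: share of seasons in which the best record had that many wins}.

    Assumption: a single round robin in which every game is an independent
    coin flip with probability p_win for the first-listed team.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_seasons):
        wins = [0] * n_teams
        # Every pair of teams meets exactly once.
        for i in range(n_teams):
            for j in range(i + 1, n_teams):
                if rng.random() < p_win:
                    wins[i] += 1
                else:
                    wins[j] += 1
        # Record the win total of the team (or teams) with the best record.
        counts[max(wins)] += 1
    return {w: c / n_seasons for w, c in sorted(counts.items(), reverse=True)}


if __name__ == "__main__":
    for wins, share in simulate_champion_wins().items():
        print(f"{wins} wins: {share:.1%}")
```

Raising `n_seasons` smooths out the noise at the cost of run time; the post's own run used 1,000,000 seasons.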

In the next example, I used the old Pac-10 template again, except I changed the skill levels of the teams to something a little more realistic. In this system, the best team in the conference has a 94% chance of beating the worst team in the conference, and the other head-to-head probabilities are proportional to the teams’ true abilities. The results here indicate how often the true best team won the conference, followed by the true second best team, and so on. Again upsets happen, and the simulation allowed for that possibility.

Conference champion    % of simulations
Best team              44.1%
2nd best               23.6%
3rd best               10.9%
4th best               7.2%
Other                  14.2%

What we see here is that the best team in the conference had just a 44% chance to actually win the conference. Kind of like how the best golfer in the world doesn’t win every major. When you’re up against the field in such a short season (or tournament), your chances aren’t good. For more context, in this scenario the best team in the conference was given a 63% chance to beat the second best team, and the second best team a 61% chance to beat the third best team. Four teams clumped in the middle, all about even, and then there were three pretty bad teams.

Also interesting to note, the collection of teams 5th best and below still won the conference 14.2% of the time (much of that coming from the 5th and 6th best teams). Changing the theoretical win probabilities will obviously change the results, but the point remains. A 9-game conference season does a poor job of identifying the best teams based solely on their win-loss records.
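Here is a rough sketch of how an uneven-conference simulation like this could be set up in Python. To be clear about the assumptions: the post does not spell out how the head-to-head probabilities were built, so this sketch swaps in a Bradley-Terry style rule, where team A beats team B with probability strength_A / (strength_A + strength_B). The strength values in `STRENGTHS` are hypothetical numbers picked only to roughly echo the probabilities quoted above (about 94% for best vs. worst, 63% for best vs. second, 61% for second vs. third), and ties for the best record are broken at random. The output will not reproduce the table exactly.

```python
import random
from collections import Counter

# Hypothetical strength ratings, best team first (illustrative values only):
# three clear front-runners, four teams clumped in the middle, three weak teams.
STRENGTHS = [15.7, 9.2, 5.9, 4.2, 4.0, 3.9, 3.8, 1.5, 1.2, 1.0]


def win_prob(a, b):
    """Bradley-Terry style probability that a team of strength a beats strength b."""
    return a / (a + b)


def simulate_champion_ranks(strengths=STRENGTHS, n_seasons=20_000, seed=0):
    """Return a list where entry k is the share of seasons won by the (k+1)th best team."""
    rng = random.Random(seed)
    n = len(strengths)
    champs = Counter()
    for _ in range(n_seasons):
        wins = [0] * n
        # Single round robin: every pair of teams meets exactly once.
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() < win_prob(strengths[i], strengths[j]):
                    wins[i] += 1
                else:
                    wins[j] += 1
        # Assumption: ties for the best record are broken by a random draw.
        top = max(wins)
        champs[rng.choice([t for t in range(n) if wins[t] == top])] += 1
    return [champs[t] / n_seasons for t in range(n)]


if __name__ == "__main__":
    for rank, share in enumerate(simulate_champion_ranks(), start=1):
        print(f"Team ranked {rank}: {share:.1%}")
```

Editing `STRENGTHS` is the quickest way to see how sensitive the champion's odds are to the gap between the top team and the rest.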

(In future I hope to put together a simulation of the Pac-12 conference, using two separate divisions and a championship game to pick a winner.)

I hope the results above are convincing enough to show that more information is needed to create accurate rankings. Win-loss records and opponents’ win-loss records alone simply don’t carry enough information. The obvious remedies are increasing the number of games played, and using more data.

Increasing the number of games played is all but impossible. It would be great statistically if every team played every other team in the FBS 46 times each, but realistically we can only fit about 12-14 college football games in per team per season. So the next best thing we can do is use more data, and the best statistic available is margin of victory, or point differential.

The primary objection to using margin of victory revolves around blowouts. Coaches would try running up the score, and it would be hard to determine the relative difference between Oregon beating Colorado by 43, and USC beating Colorado by 25 (for example). While both teams won comfortably, Oregon gets 18 additional margin of victory points over USC for that game, a significant difference. There are many easy fixes for this. A cap at 21, for instance, would give them both 21-point wins in the rankings. Or, using the square root would give Oregon a 6.6-point victory while USC would get a 5-point victory. The 1.6-point difference now only represents 32% of USC’s “5-point victory”, while before the 18-point difference represented 72% of USC’s 25-point win. The square root function, in effect, creates diminishing returns for piling on points. If you want “no returns” for piling on points, the cap is for you.
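The two fixes above are easy to express in code. This is a small Python sketch using the Oregon and USC numbers from the example; the function names `capped_margin` and `sqrt_margin` are made up for illustration, and handling losses as negative margins is an added assumption, not something any actual ranking system is known to do this way.

```python
import math


def capped_margin(margin, cap=21):
    """Clamp a point margin to +/- cap, so extra points beyond the cap earn nothing."""
    return max(-cap, min(cap, margin))


def sqrt_margin(margin):
    """Square root of the margin, keeping the sign so a loss stays negative."""
    return math.copysign(math.sqrt(abs(margin)), margin)


if __name__ == "__main__":
    for team, margin in [("Oregon", 43), ("USC", 25)]:
        print(f"{team}: raw {margin}, capped {capped_margin(margin)}, "
              f"sqrt {sqrt_margin(margin):.1f}")
    # Oregon: raw 43, capped 21, sqrt 6.6
    # USC: raw 25, capped 21, sqrt 5.0
```

The cap treats every blowout the same, while the square root still rewards the bigger win, just far less than the raw score does.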

Bill James, a pioneer in baseball and sports statistics, said this about the BCS computer rankings: “This isn’t a sincere effort to use math to find the answer at all. It’s clearly an effort to use math as a cover for whatever you want to do. I don’t even know if the people who set up the system are aware of that…It’s just nonsense math.”

Kenneth Massey, the creator of one of the six BCS computer rankings, complained, “You’re asked to rank teams that don’t play each other, that don’t play long seasons, and you can’t include margin of victory?… It’s a very challenging problem from a data-analysis standpoint. It does require sacrificing a bit of accuracy. It’s not the best way to do it.” Massey actually provides what he believes are better rankings on his website, http://www.masseyratings.com/rate.php?lg=cf, which, of course, utilize margin of victory.

Citing blowouts and running up the score as an excuse for excluding margin of victory is a cop-out by the BCS, plain and simple. It’s an easy fix, and we’ve seen how using only win-loss records falls apart in many scenarios due to small sample sizes. To me, it doesn’t seem like the BCS is very serious about its computer rankings…
