Oregon Shafted?

March 18, 2013

Joe Lunardi projected Oregon as a 9-seed at the end of the regular season, and then he upgraded the Ducks to a projected 8-seed after a win against UCLA in the Pac-12 tournament championship game. When I saw the 12-seed that Oregon was given Sunday afternoon, my reaction was probably similar to that of many Duck fans. I was mildly upset, but then I was immediately reminded of an article I read some years ago by this guy you might know, Nate Silver.

Oregon was expected by many to get an 8-seed or a 9-seed, but as Silver explains, those are probably the worst seeds to be awarded in the 5-to-12 range. If we assume Oregon is as talented as a typical 8- or 9-seed, then with one of those seeds the Ducks would have had about a 50% chance of winning in round one and a 14% chance of winning in round two. That multiplies out to a 7% chance of making the Sweet 16, which is likely a high estimate given that Oregon’s true talent level is objectively worse than that of a typical 8-seed.*

From the 12-seed, the Ducks will need to beat a 5-seed in Oklahoma State, and then probably the 4-seed in St. Louis (though possibly the 13-seed in New Mexico State). History suggests a 12-seed has about a 34% chance of a round-one upset. Those that pull off the upset go on to beat the 4-seed 40% of the time. It may seem backward that 12-seeds fare better against 4-seeds than against 5-seeds, but remember there’s selection bias: the 12s that get to play 4s are the ones that beat the 5s first, so this is a group of 12-seeds that was underrated. If we take those probabilities from past tournaments at face value, then Oregon would have a 14% chance of making the Sweet 16. That’s almost surely a low estimate, as Oregon’s true talent level is probably better than that of a typical 12-seed.**
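The arithmetic behind the two scenarios is simple enough to check in a few lines of Python, multiplying the historical round-by-round rates quoted above:

```python
# Chance of reaching the Sweet 16 under each seeding scenario,
# multiplying the historical round-by-round rates from past tournaments.

# As an 8/9-seed: ~50% to win round one, ~14% to then beat the 1-seed.
p_as_8_or_9 = 0.50 * 0.14   # = 0.07

# As a 12-seed: ~34% to upset the 5-seed, ~40% to then beat the 4-seed.
p_as_12 = 0.34 * 0.40       # = 0.136

print(f"8/9-seed path: {p_as_8_or_9:.1%}")   # 7.0%
print(f"12-seed path:  {p_as_12:.1%}")       # 13.6%, which rounds to 14%
```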

Oregon’s chances of making the Sweet 16 actually improved from a conservative estimate of 7% to 14%, simply by getting “shafted” by the seeding committee, which had no idea it was actually doing Oregon a favor. Indeed, if you look back at the past six tournaments, you’ll find that five 12-seeds have made it to the Sweet 16 versus just one 8-seed and one 9-seed.

UPDATE: Silver’s projections are out, and he has given Oregon a 17.5% chance at the Sweet 16. The 8 and 9-seeds in that draw, Colorado State and Missouri, have been given a combined 14.5%.


*Ken Pomeroy has Oregon as a true-talent 10-seed, while Jeff Sagarin has them as a true-talent 12-seed. The AP poll projected Oregon as a 9-seed before the Ducks won the Pac-12 tournament. So a simple average of the three would put the Ducks’ true talent level at about that of a typical 10-seed.

**And actually, if we additionally account for the slim chances that New Mexico State pulls it out against St. Louis, Oregon could have something closer to a 20% chance at the Sweet 16. 


Two Wins in One Night?

November 7, 2012

New readers may wonder if this has turned into a political blog, and I wouldn’t blame them. I went ahead and included some sports analogies in this one just to stick to the site’s domain name at least a little!

One single result is rarely enough statistical information to prove a point. Four years ago I went to the Rose Garden to see the Blazers take on the Celtics during Boston’s 62-win season. Brandon Roy—an All-Star in his prime—watched in street clothes as his Blazers pulled out an improbable 5-point win. Despite the outcome, it would have been foolish to think that the Blazers were better starting Rudy Fernandez over Roy. The Blazers’ process for winning games that season revolved around a slow pace and high offensive efficiency. There were games and games of data showing that Roy was one of the most efficient scorers in the game, and that the Blazers were better with him. Just because the Blazers won a single game without Roy didn’t mean a new process was needed.

The case of election polls is no different, and Obama winning the 2012 presidential election doesn’t prove that any one prognosticator was more right than another. There will be approximately 496.23 articles over the next week referring to the vindication of Nate Silver. It’s hard to blame them after comparing his political map Tuesday morning with that of polling rival Dean Chambers. As of this writing, it looks like Silver has picked every single state correctly. Projecting the right color for each of the battleground states is not easy to fluke, especially for a small-in-stature, effeminate man of average intelligence.

But still, Silver shouldn’t be vindicated Wednesday morning. If we point to the 2012 election results as proof of his genius, then we are no better than the Dylan Byers and Dean Chambers of the world, or the Blazer fan that watches one game against Boston in 2008 without Roy, and thinks, “hey, we could win without this guy.”

Silver’s vindication is, in fact, long overdue. The second he started projecting baseball players with a system driven entirely by data, or perhaps when he started projecting elections with an unbiased system that leverages available polls—those are the days Silver should have been vindicated. His process has been right all along. What Silver does is not magic; it is not voodoo; and, in his own words, it is not wizardry or rocket science. What Silver does is called statistics. Good statistical analysis is driven not by emotional bias but by trustworthy data. It turns out the hardest part of statistics for most people is simply accepting the results.

Obama’s win Tuesday night was hopefully a win for the United States of America. Silver’s win Tuesday night was hopefully a win for objective reasoning—a win for statistics. And a win for statistics might just be more important for this country in the long term than any single presidential election.

Election Polls: A Brief Lesson in Probability and Statistics

October 30, 2012

This may be my sports blog, but that won’t stop me from writing about the statistics behind election polls. So I’m taking a break from sports on this one. Enjoy 🙂

As the presidential election nears, much of our attention goes to the polls to figure out what’s going to happen. Whether the incumbent is our guy, or we want someone new, most of us just want to know. We’re humans. We like knowing things.

But the polls that probably matter least are the polls to which Americans are paying most attention. Far too much attention. The national polls are a problem for many reasons, but two stand out.

National polls deceive us. I don’t mean they’re lying, but rather that our brains are poor at interpreting their meaning. Fox News gave Romney the edge 47.9% to 47.1% yesterday. CNN apparently doesn’t like decimals, and put Romney in the lead 48% to 47%. What does this mean? Is there a 48% chance Romney will win? Is there a 47% chance Obama will win? What happened to the other 5%? None of the above. If the election were actually decided by popular vote, we would have to look at the sample sizes behind these polls to interpret what the 48% and 47% actually mean in terms of election probabilities.
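One way to see why a one-point national lead says so little is to attach a margin of error to it. The sketch below assumes a simple random sample of 1,000 respondents; that sample size is my assumption, not one reported by the polls above:

```python
from math import sqrt

# Hypothetical national poll: 48% to 47%, from n respondents.
# n = 1,000 is an assumed, typical sample size, not a reported one.
n = 1000
p_hat = 0.48

# 95% margin of error for a proportion from a simple random sample
moe = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
print(f"48% ± {moe:.1%}")  # roughly ±3 points, dwarfing the 1-point lead
```

Two candidates separated by one point, each with a roughly three-point margin of error, is a statistical coin flip, not a lead.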

But that brings up another point. We don’t elect presidents by popular vote in this country. It’s true that, most of the time, the electoral college results on the first Tuesday of November reflect the same outcome as that of the national popular vote. But not every time. (Bush/Gore, anyone?)

Since the national polls are so close, to predict the outcome we’d need some sort of projection system that takes individual state races into account. You know, like how we actually vote for presidents around here. For that we turn to Nate Silver and the 538 blog. Silver’s projection takes our actual voting system into account—looking at the likelihood each state swings toward each candidate—and then formulates a projection.

Currently, Silver’s system gives Obama a 72.9% chance of getting at least 270 electoral votes, the total required to win the election. If Romney wins, it does not discredit the projection system at all, as Politico’s Dylan Byers would have us believe. Silver’s projection allows for Romney to win; it’s just not as likely.

Said Silver, “We can debate how much of a favorite Obama is; Romney, clearly, could still win.” While Byers argues this is a concession that Silver’s projection system is flawed, Byers is a moron, and that’s not what Silver was saying at all. Silver can’t account for all the voters that haven’t answered official polling surveys, nor can he account for the people that will change their minds. No one can. But like any good statistician, he can calculate expectations, he can calculate uncertainty, and then he can present his findings in terms of probabilities. That’s statistics, not a faulty projection system.

For example, say there are 1 million citizens who will vote in Oregon during this election. And say I go around to all the counties and poll a representative sample of Oregon’s voting population. In the end, I sample 500 Oregonians from all over the state, and 275 report that they would vote for Obama. I haven’t asked a very large proportion of Oregon voters, but if I sampled diligently and correctly, I can still draw some helpful conclusions.

For instance, let’s suppose for a second that exactly half of those 1 million Oregon voters will end up voting for Obama. Our question then becomes: what is the probability that a random sample of 500 from this population would yield at least 275 Obama voters? In other words, given that we somehow knew Oregon was a 50/50 state, what is the probability of getting data like we just got? It turns out there’s a probability distribution for that very question. It’s called the hypergeometric probability distribution.

After crunching some numbers, we calculate that the probability that a 50/50 population would yield at least 275 Obama voters out of 500 sampled is merely 1.4%. At this point, we could choose to go with our made-up 50/50 conclusion, or we could side with the data, which suggests our 50/50 hypothetical is not likely. In such a scenario, we would probably conclude that there is a 1.4% chance that Obama’s true support in the population falls between 0% and 50%, and a much larger likelihood that the interval between 50% and 100% contains his true support (final vote percentage).
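That 1.4% figure can be reproduced with a short computation of the hypergeometric tail. Here is a minimal sketch in Python, using log-gamma to keep the binomial coefficients from overflowing:

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via lgamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def hypergeom_tail(N, K, n, k_min):
    """P(X >= k_min) when drawing n without replacement from a
    population of N containing K successes (hypergeometric)."""
    log_denom = log_comb(N, n)
    return sum(
        exp(log_comb(K, k) + log_comb(N - K, n - k) - log_denom)
        for k in range(k_min, n + 1)
    )

# 1,000,000 voters, exactly half for Obama; sample 500; see 275 or more.
p = hypergeom_tail(1_000_000, 500_000, 500, 275)
print(f"{p:.1%}")  # about 1.4%
```

With a population this large relative to the sample, a plain binomial tail would give nearly the same answer; the hypergeometric is just the exact version for sampling without replacement.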

Though these are fabricated numbers, the concept is likely something close to what Silver is using. Taking samples, we can make probabilistic statements about how a state is likely to turn out. If we do this for every state, we get an idea of how each state is going to sway in terms of probabilities. Now some sort of probability theory or simulation takes us the rest of the way, adding up electoral votes and projecting what is likely to happen on election night.
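The simulation step can be sketched as a Monte Carlo run. Everything below is illustrative: the swing-state win probabilities and the safe-vote count are fabricated numbers, not Silver’s, and a real model would also correlate state-level polling errors rather than treating states as independent.

```python
import random

# Hypothetical inputs: one candidate's win probability and electoral
# votes per swing state, plus votes assumed safe. All numbers fabricated.
SAFE_VOTES = 237
SWING_STATES = {
    "Ohio":          (0.75, 18),
    "Florida":       (0.50, 29),
    "Virginia":      (0.60, 13),
    "Colorado":      (0.65, 9),
    "Iowa":          (0.70, 6),
    "New Hampshire": (0.70, 4),
}

def simulate_election(trials=100_000, seed=538):
    """Fraction of simulated elections reaching 270 electoral votes."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        votes = SAFE_VOTES
        for p_win, ev in SWING_STATES.values():
            if rng.random() < p_win:  # flip each state independently
                votes += ev
        if votes >= 270:
            wins += 1
    return wins / trials

print(f"P(at least 270 electoral votes): {simulate_election():.1%}")
```

Each trial flips every swing state according to its probability, adds up the electoral votes, and checks for 270; the fraction of winning trials is the headline probability.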

Along the path of projection, I’m sure Silver included some additional information that past elections have taught us. But in the end, it comes down to what the data says. In Silver’s words, “this is not wizardry or rocket science. All you have to do is take an average, and count to 270. It’s a pretty simple set of facts.”