Election Polls: A Brief Lesson in Probability and Statistics

This may be my sports blog, but that won’t stop me from writing about the statistics behind election polls. So I’m taking a break from sports on this one. Enjoy 🙂

As the presidential election nears, much of our attention goes to the polls to figure out what’s going to happen. Whether the incumbent is our guy, or we want someone new, most of us just want to know. We’re humans. We like knowing things.

But the polls that probably matter least are the polls to which Americans are paying most attention. Far too much attention. The national polls are a problem for many reasons, but two stand out.

National polls deceive us. I don’t mean they’re lying, but rather, our brains are poor at interpreting their meaning. Fox News gave Romney the edge 47.9% to 47.1% yesterday. CNN doesn’t like decimals apparently, and put Romney in the lead 48% to 47%. What does this mean? Is there a 48% chance Romney will win? Is there a 47% chance Obama will win? What happened to the other 5%? None of the above. Even if the election were actually decided by popular vote, we would have to look at the sample sizes taken for these polls to interpret what the 48% and 47% actually mean in terms of election probabilities.
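To see why sample size matters, here is a rough sketch of how a one-point lead might be turned into something like a probability. It assumes things the polls above don't tell us: that each poll is a simple random sample, that a normal approximation is good enough, and that the sample size is 1,000 (a made-up number). The function name `lead_summary` is mine, purely for illustration.

```python
from math import erf, sqrt

def lead_summary(share_a, share_b, sample_size):
    """Return (standard error of the lead, rough chance that A really leads).

    Assumes a simple random sample and a normal approximation; real polls
    use weighting and other adjustments that this sketch ignores.
    """
    lead = share_a - share_b
    # Variance of the difference between two shares from the same sample.
    std_err = sqrt((share_a + share_b - lead ** 2) / sample_size)
    # Normal CDF evaluated at lead / std_err.
    chance_a_leads = 0.5 * (1 + erf(lead / (std_err * sqrt(2))))
    return std_err, chance_a_leads

# Hypothetical sample size of 1,000 applied to the 48% vs. 47% poll.
std_err, chance = lead_summary(0.48, 0.47, 1000)
print(f"standard error of the lead: {std_err:.3f}")
print(f"rough chance the lead is real: {chance:.2f}")
```

Under those assumptions, a one-point lead is much smaller than its own standard error, so it tells us very little by itself. Feed the same function a bigger sample and the same one-point lead starts to mean more.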

But that brings up another point. We don’t elect presidents by popular vote in this country. It’s true that, most of the time, the electoral college outcome decided on Election Day in early November matches the national popular vote. But not every time. (Bush/Gore, anyone?)

Since the national polls are so close, to predict the outcome we’d need some sort of projection system that takes individual state races into account. You know, like how we actually vote for presidents around here. For that we turn to Nate Silver and the 538 blog. Silver’s projection takes our actual voting system into account—looking at the likelihood each state swings toward each candidate—and then formulates a projection.

Currently, Silver’s system gives Obama a 72.9% chance of getting at least 270 electoral votes, the total required for winning the election. If Romney wins, that does not discredit the projection system at all, despite what Politico’s Dylan Byers would have us believe. Silver’s projection allows for Romney to win, though it’s not as likely.

Said Silver, “We can debate how much of a favorite Obama is; Romney, clearly, could still win.” While Byers argues this is a concession that Silver’s projection system is flawed, Byers is a moron, and that’s not what Silver was saying at all. Silver can’t account for all the voters that haven’t answered official polling surveys, nor can he account for the people that will change their minds. No one can. But like any good statistician, he can calculate expectations, he can calculate uncertainty, and then he can present his findings in terms of probabilities. That’s statistics, not a faulty projection system.

For example, say there are 1 million citizens who will vote in Oregon during this election. And say I go around to all the counties and poll a representative sample of Oregon’s voting population. In the end, I sample 500 Oregonians from all over the state, and 275 report that they would vote for Obama. I haven’t asked a very large proportion of Oregon voters, but if I sampled diligently and correctly, I can still draw some helpful conclusions.

For instance, let’s suppose for a second that exactly half of those 1 million Oregon voters will end up voting for Obama. Our question then becomes, what is the probability that a random sample of 500 from this population would yield at least 275 Obama voters? In other words, supposing we somehow knew that Oregon was a 50/50 state, what is the probability of getting data like we just got? It turns out there’s a probability distribution for that very question. Because we’re drawing voters from a fixed population without asking anyone twice, it’s called the hypergeometric probability distribution.

After crunching some numbers, we calculate that the probability that a 50/50 population would yield at least 275 Obama voters of 500 sampled is merely 1.4%. At this point, we could choose to stick with our made-up 50/50 assumption, or we could side with the data, which suggests our 50/50 hypothetical is not likely. Strictly speaking, the 1.4% is the chance of seeing data like ours if support were exactly 50%, not the chance that Obama’s true support sits between 0% and 50%. Still, a result that rare under a 50/50 split makes it much more plausible that his true support (final vote percentage) falls somewhere between 50% and 100%.
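For anyone who wants to crunch those numbers themselves, here is a minimal sketch of the hypergeometric tail calculation using only Python’s standard library. The function name `hypergeom_tail` is mine, and the inputs are the fabricated Oregon numbers from above (1 million voters, exactly half for Obama, 500 sampled, at least 275 for Obama).

```python
from math import comb

def hypergeom_tail(population, supporters, sample_size, at_least):
    """P(at least `at_least` supporters) in a sample drawn without replacement."""
    # Every possible sample of this size is equally likely.
    total_ways = comb(population, sample_size)
    upper = min(sample_size, supporters)
    # Count the samples with k supporters and (sample_size - k) non-supporters.
    favorable = sum(
        comb(supporters, k) * comb(population - supporters, sample_size - k)
        for k in range(at_least, upper + 1)
    )
    # Python divides arbitrarily large integers exactly before rounding to a float.
    return favorable / total_ways

# Fabricated Oregon example: a 50/50 state, 500 sampled, 275 or more for Obama.
p = hypergeom_tail(1_000_000, 500_000, 500, 275)
print(f"{p:.1%}")
```

The result should land close to the 1.4% quoted above. With a population this large, the answer is nearly identical to what a plain binomial (coin-flip) calculation would give, since removing 500 voters from a million barely changes the mix.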

Though these are fabricated numbers, the concept is likely something close to what Silver is using. Taking samples, we can make probabilistic statements about how a state is likely to turn out. If we do this for every state, we get an idea of how each state is going to sway in terms of probabilities. Now some sort of probability theory or simulation takes us the rest of the way, adding up electoral votes and projecting what is likely to happen on election night.
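To make that last step concrete, here is a toy sketch of the "probability theory" route for adding up electoral votes. It is not Silver’s method. It assumes each state is decided independently (real models account for states moving together), and the three "states", their electoral votes, and their win probabilities are all invented for illustration.

```python
def win_probability(states, votes_needed):
    """Chance of reaching `votes_needed` electoral votes.

    `states` is a list of (electoral_votes, chance_of_winning) pairs,
    each treated as an independent coin flip (a simplifying assumption).
    """
    total_votes = sum(ev for ev, _ in states)
    # dist[v] = probability of holding exactly v electoral votes so far.
    dist = [0.0] * (total_votes + 1)
    dist[0] = 1.0
    for ev, p_win in states:
        updated = [0.0] * (total_votes + 1)
        for votes, prob in enumerate(dist):
            if prob == 0.0:
                continue
            updated[votes] += prob * (1 - p_win)   # candidate loses this state
            updated[votes + ev] += prob * p_win    # candidate wins this state
        dist = updated
    return sum(dist[votes_needed:])

# Made-up map: 20 electoral votes in total, 11 needed to win.
toy_states = [(10, 0.9), (6, 0.5), (4, 0.2)]
print(round(win_probability(toy_states, 11), 2))
```

In this toy map the candidate wins only by carrying the 10-vote state plus at least one of the other two, so the answer works out to 0.9 × (1 − 0.5 × 0.8) = 0.54. The same bookkeeping scales to 50 states and a 270-vote threshold. A simulation would get to roughly the same place by flipping the weighted coins many times and counting how often the total clears the bar.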

Along the path of projection, I’m sure Silver included some additional information that past elections have taught us. But in the end, it comes down to what the data says.  In Silver’s words, “this is not wizardry or rocket science. All you have to do is take an average, and count to 270. It’s a pretty simple set of facts.”

One Response to Election Polls: A Brief Lesson in Probability and Statistics

  1. Enrique says:

    Politico just got pwnd.
