There seems to be a myth floating around that parity is developing in college basketball: that the lower seeds are performing better in the tournament than they did in the first decade of the 64- (or 65-) team field. Inspired by a friend of mine, we delved into the numbers to see whether we could find this parity statistically.
College basketball has gone through many changes in both rules and standards over the last twenty years. Basketball has become much more popular internationally, and consequently there are many more international players in college basketball. This has contributed to a larger pool of good college players, possibly making the gap between the best and the worst players smaller than it was in the past. Further enlarging that pool is the change in rules for players entering the NBA draft. Up until 2006, players were allowed to go straight from high school to the NBA. Now, players are required to be 19 years old and at least one full year removed from their high school class’s graduation. The larger number of good college players could potentially contribute to more parity in college basketball.
Rule changes are also proposed every year, and several were adopted during this period. Recent changes have included the distance of the three-point line and a different shape for the free throw lane. We assumed these rule changes should not alter the differences between teams, because all teams have to deal with the same rules and the rules should not favor better or worse teams. In 2001 the NCAA tournament expanded from 64 teams to 65, with the two lowest-seeded teams playing one another to advance to the first round. This could theoretically make the 16 seeds slightly better than before, since arguably the weakest team gets eliminated, but because no 16 seed has ever won a game we assume this does not have any real effect on our data.
To test the claim that talent levels are evening out, we compiled the last twenty years of tournament data and ran several statistical tests in Microsoft Excel. We looked at the data in two ways: first with linear regression analysis, and then by splitting the sample into two decades and using statistical tests to measure the difference between the 1990s and the 2000s.
The two main parameters that we were interested in were the margin of victory and the sum of seeds by round. As parity increases, we would expect both more upsets and a smaller margin of victory (on average) as the lower-seeded teams do better. For each round, we simply listed the seeds of the teams that advanced to it. A favorite carries a small seed number and an underdog a large one, so if we add up all the seeds in a given round, the higher that number, the more upsets there were. Recording the data this way allowed for quick and accurate sorting to evaluate the different parameters of interest.
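To make the sum-of-seeds measure concrete, here is a minimal sketch in Python. The seed lists are made up for illustration and are not real tournament results, and the function name seed_sum is our own label for this example.

```python
def seed_sum(seeds):
    """Add up the seed numbers of the teams that reached a given round."""
    return sum(seeds)

# The first round always contains seeds 1 through 16 from each of four regions.
first_round = list(range(1, 17)) * 4

# A round of 16 with no upsets: seeds 1 through 4 survive in every region.
chalk_round_of_16 = [1, 2, 3, 4] * 4

# A made-up round of 16 in which a 12 seed advanced in place of a 4 seed
# in one region and a 10 seed advanced in place of a 2 seed in another.
upset_round_of_16 = [1, 2, 3, 4] * 2 + [1, 2, 3, 12] + [1, 10, 3, 4]

print(seed_sum(first_round))        # 544
print(seed_sum(chalk_round_of_16))  # 40
print(seed_sum(upset_round_of_16))  # 56
```

The two hypothetical upsets raise the round-of-16 total from 40 to 56, which is why a larger sum signals more upsets.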
We fit a series of simple linear regressions, with the year as the explanatory variable and the margin of victory as the response variable. For example, we looked at the margin of victory between only the 1 and 16 seeds in the first round, the margin of victory between only the 5 and 12 seeds in the first round, and so on. Every regression produced a slope coefficient indicating a declining margin of victory, meaning the lower-seeded teams are performing better, but none of the slopes was statistically significant at the five percent level. What this means is that while the average margin of victory between the favorites and the underdogs was indeed shrinking in our sample, the decline was too small relative to the year-to-year noise for us to be sure it was a true phenomenon. What we need is more years of data.
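The kind of regression described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Excel workbook itself: the margins below are invented, the helper slope_t_test is a name made up for this example, and significance is judged by comparing the slope's t statistic with 2.101, the two-sided 5% critical value of Student's t for 18 degrees of freedom (20 years minus 2 fitted parameters).

```python
import math

def slope_t_test(x, y):
    """Return (slope, t statistic of the slope) for a least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares; the fit uses up two degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err

# Invented average margins of victory for one seed matchup, one per tournament.
years = list(range(1990, 2010))
margins = [12, 8, 13, 7, 11, 9, 12, 6, 10, 9,
           11, 7, 12, 8, 9, 10, 7, 11, 8, 9]

T_CRIT_5PCT_DF18 = 2.101  # two-sided 5% critical value, 18 degrees of freedom

slope, t_stat = slope_t_test(years, margins)
significant = abs(t_stat) > T_CRIT_5PCT_DF18
print(f"slope = {slope:.3f} points per year, t = {t_stat:.2f}, "
      f"significant at 5%: {significant}")
```

With these invented numbers the slope is negative but its t statistic falls well short of the critical value, which is the same qualitative pattern we saw: a downward trend that cannot be distinguished from noise.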
To evaluate the differences between the 1990s and the 2000s, we statistically compared the average margins of victory and the average sum of seeds between the two decades. We used two-sample tests, which take the data from two distinct samples and tell us whether the difference between their means is larger than chance alone would plausibly produce. Here we focused on two distinct stats: one being margin of victory again, and the other being the total sum of the seeds in the 2nd, 3rd, 4th, 5th and 6th rounds (notice that the sum of seeds in the first round will always be the same, because every seeded team plays in it).
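A decade-versus-decade comparison of this kind can be sketched as follows. The text does not say which two-sample test was run, so the pooled (equal-variance) Student's t-test here is an assumption, the helper pooled_t_statistic is a hypothetical name, and the seed sums are invented. With ten tournaments per decade the test has 18 degrees of freedom, for which the two-sided critical values are 2.101 at the 5% level and 1.734 at the 10% level.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic, assuming both samples share one variance."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err

# Invented round-of-16 seed sums, one value per tournament.
sums_1990s = [68, 75, 80, 62, 71, 77, 66, 73, 84, 70]
sums_2000s = [74, 69, 81, 78, 65, 72, 86, 70, 76, 79]

T_CRIT_5PCT_DF18 = 2.101   # two-sided 5% critical value, 18 degrees of freedom
T_CRIT_10PCT_DF18 = 1.734  # two-sided 10% critical value, 18 degrees of freedom

t_stat = pooled_t_statistic(sums_2000s, sums_1990s)
print(f"t = {t_stat:.2f}")
print("significant at 5%: ", abs(t_stat) > T_CRIT_5PCT_DF18)
print("significant at 10%:", abs(t_stat) > T_CRIT_10PCT_DF18)
```

In this made-up sample the 2000s average is a little higher than the 1990s average, but the t statistic is below both critical values, so the difference would not count as significant at either level.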
For all tests, we were unable to show a statistically significant difference between the 1990s and the 2000s at the 5% level, meaning that there is, once again, no detectable increase in parity in the last ten years over the previous ten. Furthermore, none of the tests was significant even at the ten percent level. This means that even with a looser standard of evidence, we still couldn’t show that parity was increasing. The evidence is pretty clear: there is no statistically significant difference between the decades in terms of margin of victory or of worse teams advancing further in the tournament.
Originally we thought that the only statistics we could use were the score, round and seeds. After we finished the tests we realized that there are differences between teams even when they share the same seed. Since there are four of each seed every year, there can be a large difference between one two-seed and another, for example. Take 2005. The Washington Huskies grabbed the one-seed out west despite being ranked in most polls as 10th or 11th in the country. Generally there is a marked difference between the #1 team in the country and the #11. To control for this we could in the future use a combination of final rankings, as voted on by the press, so that we are able to differentiate between teams with the same seed.
We also considered only the final score (the final margin of victory), which hides several biases. A ten-point win can describe very different games. The game might have been really close until one team pulled away on free throws in the final couple of minutes; or the game might never have been close, and the losing team scored the last 5 or 10 points to narrow the gap. We also did not include any variable for overtime, even though games that go to overtime are clearly closer than games that do not.
Unfortunately the rules and the number of teams eligible for the tournament change frequently, which makes it hard to test differences over the years while holding these other variables constant. There are a couple of years of data from the 1980s with exactly 64 teams that could improve or change our analysis. Before the 1980s there were fewer teams in the tournament, which completely changed its dynamic. If the number of teams remains constant for the next several years, we will have more data to evaluate for any indication that parity is actually changing.