NCAA Tourney: Seeding Noise?

File this post under, "I meant to publish this a month ago but forgot." Anyway ... In March of 2005 I complained about all of the attention given to the last team left out of the NCAA basketball tournament when the real problem lay in the seeding. My suggestion: seed teams 1-4 and hold a random-draw play-in round among 96 teams for the other 48 spots, with seeds randomly assigned to the winners of the play-in round. The NCAA's seeding of teams 5-12 appeared to rely more on conference affiliation or name recognition than on genuine differences in performance ability. Over the past few years, the increasing flow of one-and-done players to the NBA has only reinforced my suspicions.

Data from the highly competitive 2011 Tourney strongly suggest that the differences between, say, 5 and 11 seeds are slight, if not non-existent. In fact, the seeding of teams 4-13 has little informational content. If all of the first-round games are used, the seeding doesn't look too bad. Here are the results of a simple regression of score differential on seed differential:

Score Differential = 4.0 - 1.7xSeed Differential  (Score Diff Explained = 36%)

In fact, the seed differential does a slightly better job (36 percent to 32 percent) than the Vegas line in predicting scores within the sample:

Score Differential = -1.8 + 1.0xVegas Spread (Score Diff Explained = 32%)

In this setup, the Vegas line displays a 1-to-1 relationship with the score differential. Both the seed difference and the Vegas line jump up 10 percent in explanatory ability if non-linear effects are included. The Selection Committee, Vegas oddsmakers, and bettors all appear to find and use meaningful information.
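For anyone who wants to try this kind of one-variable regression on their own game data, here is a minimal sketch. The function name `fit_line`, the sign convention (favorite minus underdog for both columns), and the numbers in the sample lists are all my assumptions for illustration; the numbers are made up and are not the 2011 results behind the equations above.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x.

    Returns (intercept, slope, r_squared), where r_squared is the share
    of the variation in y that the fitted line explains.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical, made-up numbers for illustration only (NOT the 2011 results).
# Assumed convention: favorite's value minus underdog's value in both lists.
seed_diff = [-15, -13, -11, -9, -7, -5, -3, -1]
score_diff = [22, 9, 15, -3, 6, -8, 4, -2]

a, b, r2 = fit_line(seed_diff, score_diff)
print(f"Score Differential = {a:.1f} + ({b:.1f}) x Seed Differential "
      f"(Score Diff Explained = {r2:.0%})")
```

Swapping the Vegas spread in for `seed_diff` gives the second kind of equation. With only one explanatory variable, the "explained" share is just the squared correlation between the two columns, which is what the last line of `fit_line` computes.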

However, cutting the sample to seeds 4-13 produces very different and sketchy results:

Score Differential = -2.8 - 0.05xSeed Differential (Score Diff Explained = 0.2%)

Score Differential = -5.3 - 0.7xVegas Spread (Score Diff Explained = 7%)

While Vegas "explains" more of the score differences, the direction of effect is wrong -- bigger spread, smaller score differential. Both of these models do a little better with non-linear effects, but not much; the share explained by the seed differential rises to only 4%. Beyond score differences, both do a poor job of predicting winners. Of the 20 games, the higher-seeded team lost 7 and the team with a point spread advantage lost 9.
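The tally of wrong picks is easy to reproduce once the games are in a list. In a standard bracket the first-round pairings among seeds 4-13 are 4/13, 5/12, 6/11, 7/10, and 8/9 in each of four regions, which is where the 20 games come from. Below is a rough sketch; the `Game` record, the helper names, and the five sample games are hypothetical and are not the actual 2011 results.

```python
from typing import NamedTuple


class Game(NamedTuple):
    seed_a: int      # seed of team A (a lower number is the "higher" seed)
    seed_b: int      # seed of team B
    spread_a: float  # points team A is favored by (negative if A is the underdog)
    margin_a: int    # team A's score minus team B's score


def in_seed_range(game, low=4, high=13):
    """True when both teams are seeded between low and high, inclusive."""
    return low <= game.seed_a <= high and low <= game.seed_b <= high


def count_misses(games):
    """Return (games the higher seed lost, games the spread favorite lost)."""
    seed_misses = 0
    spread_misses = 0
    for g in games:
        a_won = g.margin_a > 0
        # Seeding "picked" team A if A has the lower seed number.
        if g.seed_a != g.seed_b and (g.seed_a < g.seed_b) != a_won:
            seed_misses += 1
        # The spread "picked" team A if A was favored.
        if g.spread_a != 0 and (g.spread_a > 0) != a_won:
            spread_misses += 1
    return seed_misses, spread_misses


# Hypothetical games for illustration only (NOT the 2011 results).
games = [
    Game(1, 16, 23.5, 30),
    Game(4, 13, 6.0, -3),   # higher seed and spread favorite both lost
    Game(5, 12, 3.5, 8),
    Game(6, 11, -1.0, -4),  # higher seed lost, but the spread favored the 11
    Game(8, 9, 1.5, 2),
]

middle = [g for g in games if in_seed_range(g)]
seed_misses, spread_misses = count_misses(middle)
print(f"{len(middle)} games: higher seed lost {seed_misses}, "
      f"spread favorite lost {spread_misses}")
```

Keeping the seed check and the spread check separate matters because the two can disagree about who the favorite is, as in the 6/11 game in the sample.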

A big gap exists between teams seeded 13 and those seeded 15 and 16, giving the top-seeded teams a huge advantage by effectively removing one round of the tourney. In spite of this one-game "bye," even 1 and 2 seeds are very vulnerable once the truly weak teams are sifted out.

Of course, a single year of data hardly proves anything, but it is suggestive that the NCAA Selection Committee spends countless hours finding ways to interpret data that contains a lot more noise than signal.