On winning & attendance in MLB

My suspicions regarding the Star-Telegram study proved warranted. That data hound and regression maestro, JC Bradbury of Sabernomics, picked up the ball and did some analysis. Here's JC's picture of the bivariate relation between winning % and attendance in MLB.

That's an impressive chart.

JC notes that the Star-Telegram's

model includes winning percentage along with runs and era. This is not much different than including runs scored and runs allowed, which would correlate with the Pythagorean win percentage. This is almost like putting win percentage in the regression twice. This creates a problem known as multicollinearity, which can bias the standard errors upwards and lower t-scores.
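To make the overlap JC describes concrete, here is a minimal Python sketch. It assumes the usual Pythagorean formula with an exponent of 2, and it runs on synthetic team seasons (randomly generated, not real MLB data). The function names, the noise level and the run ranges are all illustrative assumptions. The variance inflation factor is computed with the Pythagorean expectation standing in, as a one-variable proxy, for the runs and ERA terms in the regression.

```python
import random


def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^k / (RS^k + RA^k), with k = 2 assumed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Synthetic seasons (an assumption for illustration, not real data):
# actual win % is the Pythagorean expectation plus a little random "luck".
random.seed(0)
expected, actual = [], []
for _ in range(30):
    runs_scored = random.uniform(600, 900)
    runs_allowed = random.uniform(600, 900)
    e = pythagorean_win_pct(runs_scored, runs_allowed)
    expected.append(e)
    actual.append(e + random.gauss(0, 0.025))

r = pearson(expected, actual)

# Variance inflation factor for win % given the run-based proxy:
# 1 / (1 - R^2), where R^2 comes from regressing win % on the proxy.
vif = 1.0 / (1.0 - r ** 2)

print(f"correlation between actual and Pythagorean win %: {r:.3f}")
print(f"implied variance inflation factor: {vif:.1f}")
```

When the correlation is close to 1, the variance inflation factor is large, which is the sense in which win percentage is "almost" in the regression twice.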

Worse, winning percentage is what we economists call endogenous, hence its coefficient cannot be identified in the Star-Telegram's regression. The correction for this is not immediately obvious. If you share my belief that fans are more entertained when the home team wins 4-3 than when it loses 5-4, you need to estimate a model that includes "superfluous" scoring, not all scoring. If you share the Star-Telegram's professed belief that fans care only about scoring and not about winning, you need to include a variable for the propensity to win that is not predicted by, say, the Pythagorean formula. Perhaps JC (or I) can address that issue with a little free time.
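One way to read "the propensity to win that is not predicted by the Pythagorean formula" is as a simple residual: actual winning percentage minus the Pythagorean expectation. The Python sketch below shows that calculation under that assumption. It is not a fix for the endogeneity problem, only an illustration of how such a variable might be built. The function names and the sample season are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^k / (RS^k + RA^k), with k = 2 assumed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def unexplained_win_pct(wins, losses, runs_scored, runs_allowed):
    """Actual win % minus the Pythagorean expectation (hypothetical helper)."""
    actual = wins / (wins + losses)
    return actual - pythagorean_win_pct(runs_scored, runs_allowed)


# Hypothetical season, made up for illustration: 95-67 with 800 runs
# scored and 700 allowed.
residual = unexplained_win_pct(wins=95, losses=67,
                               runs_scored=800, runs_allowed=700)
print(f"win % not explained by run totals: {residual:+.3f}")
```

A positive residual means the team won more often than its run totals alone would suggest. A regression could then include this residual alongside the scoring variables in place of raw winning percentage.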

In the meantime, JC's chart rules!