The racial bias analysis of ball-strike decisions by MLB umps, by UT-Austin professor Daniel Hamermesh and coauthors (Parsons, Sulaeman, Yates), has generated the expected buzz in the popular press and on baseball blogs (Google Search results; Time article). See Hamermesh’s post on Freakonomics and the link to the AER article.
As one would expect, given Hamermesh’s reputation and standing, this is not a lightweight effort. It’s creative and utilizes data that many sports researchers would covet. Valuable research is as much about the questions and the attempts to answer them as it is about the answers. This piece has generated, and will continue to generate, follow-on work.
The authors make some substantial, albeit qualified, claims:
The results allow us to think about the deeper question of measuring discrimination generally. If, as we show here, the match to the race/ethnicity of their evaluator affects evaluations of workers, then the measured productivity of the worker will depend on the nature of that match. This difficulty has serious implications for measuring discrimination and is another manifestation of the difficulty of identifying discrimination pointed out by Stephen G. Donald and Daniel S. Hamermesh (2006).
So what have the authors found? Umps favor, ever so slightly, pitchers of the same race. The effect holds up to adjustments for pitch count, inning, score, and game attendance. It diminishes with the application of technology for monitoring umps. The same effects don’t arise for ump-batter matches. The result is very slight on a percentage basis. Still, multiplied over thousands of pitches per season, it adds up to about 5% additional wins for a white pitcher when favored by a white ump and an ERA about 0.13 lower relative to baseline values over the 2004-2008 data sample in games without the monitoring technology.
Ok. And the big shoe dropping here would be …??? Let’s suppose these figures are right on the mark. The 0.13 improvement in ERA translates into a few thousand dollar increase in salary when plugged into the typical ERA effect on wins and wins’ impact on revenue equations. That’s a few thousand dollar impact in a league with average salaries over 3 million dollars.
Across various disciplines, many researchers aim to show that subtle biases continue to exist. Yes, my reaction tends to be, and the world is still around. It’s not merely a glass-half-full versus half-empty difference. It’s a glass filled nearly to the brim. Upon very close inspection, we’ve determined that the water is fractionally below the rim, too close to see with the naked eye or even with typical magnification. Only with a high-powered lens have we discovered a measurable distance between the edge and the water level. As economists, we frequently assert and explain how zero is rarely the optimal amount of anything, even undesirable things like pollution. If the standard against which bias is to be measured is zero, then I suppose we haven’t reached it, and this article provides evidence, showing us that, as Jim Buchanan would say, water still runs downhill.
Instead of just accepting these minute results as they are, one can raise questions, and the authors have responded to several. However, I’m interested in an expanded approach: a broader model of umpire decision-making. The authors’ controls are worthwhile, but when such a small effect is found, leaving out other factors that may have some not-so-obvious link to race may matter. Maybe reputation (Cy Young winners) or point-of-delivery matters for umps and isn’t fully randomized across races.
Maybe of more interest, is skin color the only physical characteristic that matters to umps or other people? Rather than a model digging for bias, perhaps one should be looking at a model of affinity. What draws people closer? What are commonalities? When I’m people watching at Disney World, I’m struck by how frequently tall people find tall partners, dark-haired people find dark-haired partners, etc. Do umps “like” pitchers with similar characteristics to themselves? Do they favor tall pitchers? What about heavy-set pitchers? Fast (in terms of time between pitches) pitchers, neatly groomed, …?
What about the pitcher-hitter match? The authors briefly address this and refer to tests where they took account of hitter-ump battles, but this is a very unsatisfying response. If we are going to call favoritism across pitchers racial bias, what are we going to call the implicit within-race favoritism of the pitcher against the same-race hitter?
Racial “biases” and racial “differences”? As I expressed in my NBA referee-bias comments, a curious and frustrating constraint hangs over economics and other disciplines. Topics that involve “biases,” even tiny ones, find quality publication outlets. However, other areas where much more pronounced racial differences exist, such as the considerable shift in the racial composition of NBA All-Defensive players, Olympic sprinters, or NFL cornerbacks and safeties, are entirely taboo. It would be interesting to see economists of the quality of the MLB ump paper take on these race-related questions.
(Thanks to one of my longtime friends for calling my attention to the Freakonomics post.)
Right on.
I read the original paper and my overwhelming reaction was “so little discrimination!!?” I’m old enough to think this is near miraculous progress.
As a retired scientist my next thought was “anytime you think you have found a very tiny effect, even if statistically significant, odds are it isn’t what you think it is.” You expressed a bunch of possibilities.
There are much more dramatic effects in baseball that we don’t understand. For example, what happened to Barry Zito the last few years? Or are there more pitchers like Ryan Vogelsong underemployed somewhere? (If you’re not a Giants fan, substitute whatever unexplained annual change you like.) Why worry about one or two marginal calls a game?
Couldn’t agree more, Brian. It never ceases to amaze me how often this stuff (as solid technically as it is) turns up in AER, JPE, etc. I, too, would like to see more on those racial issues you refer to, though it might be harder for authors to sell the economic content in such work.
Brian,
Well said, but I have a counterpoint. I spoke with Dan Hamermesh about this paper not long before its final acceptance by the AER. I very clearly recall us both agreeing that the key finding in the paper is that a small amount of discrimination responds to various prices/incentives. Water flows downhill, yes, but perhaps in a different way than your post suggests.
Some additional thoughts on this paper here:
http://leastthing.blogspot.com/2011/07/bias-and-accountability-in-context-of.html
If you fire a gun from the same spot at a target 100 yards away it will never hit in exactly the same spot. When you are talking about humans making hundreds of decisions each game you are going to have variances from perfection. Wouldn’t we be more surprised if the variance was zero?