
MLB Umps Not so Color Blind?

2011 July 5
by Brian Goff

The racial bias analysis of ball-strike decisions by MLB umps from UT-Austin professor Daniel Hamermesh and coauthors (Parsons, Sulaeman, Yates) has generated the expected buzz in the popular press and on baseball blogs (Google Search results; Time article). (See Hamermesh’s post on Freakonomics and the link to the AER article.)

As one would expect given Hamermesh’s past reputation and standing, this is not a lightweight effort.  It’s creative and utilizes data that many sports researchers would covet.   Valuable research is as much about the questions and attempts to answer them as it is about the answers.  This piece has and will generate follow-on work.

The authors make some substantial, albeit qualified, claims:

The results allow us to think about the deeper question of measuring discrimination generally. If, as we show here, the match to the race/ethnicity of their evaluator affects evaluations of workers, then the measured productivity of the worker will depend on the nature of that match. This difficulty has serious implications for measuring discrimination and is another manifestation of the difficulty of identifying discrimination pointed out by Stephen G. Donald and Daniel S. Hamermesh (2006).

So what have the authors really found? Umps favor, ever so slightly, pitchers of the same race. The effect holds up to adjustments for pitch count, inning, score, and game attendance. It diminishes with the application of technology for monitoring umps. The same effects don’t arise for ump-batter matches. The effect is very slight on a percentage basis, but multiplied over thousands of pitches per season, it adds up to about 5% additional wins for a white pitcher when favored by a white ump and an ERA about 0.13 lower relative to baseline values over the 2004-2008 data sample in games without the monitoring technology.

OK. And the big shoe dropping here would be …??? Let’s suppose these figures are right on the mark. The 0.13 improvement in ERA translates into a salary increase of a few thousand dollars when plugged into the typical equations for the effect of ERA on wins and of wins on revenue. That’s a few-thousand-dollar impact in a league with average salaries over $3 million.
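To put that comparison in proportion, here is a minimal back-of-the-envelope sketch. The $3 million average comes from the paragraph above; the specific dollar gains are hypothetical readings of "a few thousand dollars," not figures from the paper, and the function name is made up for illustration.

```python
# The post says average salaries are "over $3 million"; use $3 million as a floor.
AVERAGE_SALARY = 3_000_000  # dollars


def share_of_average_salary(salary_gain, average_salary=AVERAGE_SALARY):
    """Return salary_gain as a fraction of average_salary."""
    return salary_gain / average_salary


# Hypothetical readings of "a few thousand dollars" (assumed, not from the paper).
for gain in (2_000, 5_000, 9_000):
    share = share_of_average_salary(gain)
    print(f"${gain:,} is {share:.2%} of the average salary")
```

Even at the top of that assumed range, the gain stays under a third of one percent of the average salary, and because the true average is above $3 million, the actual share would be smaller still.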

Across a variety of disciplines, many researchers aim to show that subtle biases continue to exist. My reaction tends to be, yes, and the world is still round. It’s not merely a glass half-full versus half-empty difference. It’s a glass nearly to the brim, but upon very, very close inspection, we’ve determined that it’s fractionally below the rim, too close to see with the naked eye or even with typical magnification; but with a really high-powered lens we’ve discovered a measurable distance between the rim and the water level. As economists, we frequently assert and explain how zero is rarely the optimal amount of anything, even undesirable things like pollution. If the standard against which bias is to be measured is zero, then I suppose we haven’t reached it, and this article provides evidence, showing us that, as Jim Buchanan would say, water still runs downhill.

Instead of just accepting these minute results as they are, one can raise questions, and the authors have responded to several. However, I’m interested in an expanded approach: a broader model of umpire decision making. The controls the authors employ are worthwhile, but when such a small effect is found, omitting other factors that may have some not-so-obvious link to race may matter. Maybe reputation (Cy Young winners) or point-of-delivery matters for umps and isn’t fully randomized across race.

Maybe of more interest, is skin color the only physical characteristic that matters to umps or other people?  Rather than a model digging for bias, maybe one should be looking at a model of affinity.  What draws people closer, what are commonalities?  When I’m people watching at Disney World, I’m struck by how frequently tall people find tall partners, dark-haired find dark-haired partners, and so on.  Do umps “like” pitchers with similar characteristics to themselves?  Do they favor tall pitchers?  What about heavy-set pitchers?  Fast (in terms of time between pitches) pitchers, neatly groomed, …?

What about the pitcher-hitter match?  The authors briefly address this, and refer to tests where they took account of hitter-ump matches, but this is a very unsatisfying response.  If we are going to call favoritism across pitchers racial bias, what are we going to call the implicit within-race favoritism of the pitcher against the same-race hitter?

Racial “biases” and racial “differences”? As I expressed in my NBA referee-bias comments, there is a curious and frustrating constraint that hangs over economics and other disciplines. Topics that involve “biases,” even very small ones, find quality publication outlets. However, other areas where much more pronounced racial differences exist, such as the huge shift in the racial composition of NBA All-Defensive players, Olympic sprinters, or NFL cornerbacks and safeties, are completely taboo. It would be interesting to see economists of the caliber of the MLB ump paper’s authors take on these race-related questions.

(Thanks to one of my longtime friends for calling my attention to the Freakonomics post.)

5 Responses
  1. Norm
    July 5, 2011

    Right on.

    I read the original paper and my overwhelming reaction was “so little discrimination!!?” I’m old enough to think this is near miraculous progress.

    As a retired scientist my next thought was “anytime you think you have found a very tiny effect, even if statistically significant, odds are it isn’t what you think it is.” You expressed a bunch of possibilities.

    There are much more dramatic effects in baseball that we don’t understand. For example, what happened to Barry Zito the last few years? Or are there more pitchers like Ryan Vogelsong underemployed somewhere? (If you’re not a Giants fan, substitute whatever unexplained annual change you like.) Why worry about one or two marginal calls a game?

  2. Liam Lenten
    July 5, 2011

    Couldn’t agree more, Brian. It never ceases to amaze me how often this stuff (as solid technically as it is) turns up in AER, JPE, etc. I, too, would like to see more on those racial issues you refer to, though it might be harder for authors to sell the economic content in such work.

  3. July 6, 2011


    Well said, but I have a counterpoint. I spoke with Dan Hamermesh about this paper not long before its final acceptance by the AER. I very clearly recall us both agreeing that the key finding in the paper is that a small amount of discrimination responds to various prices/incentives. Water flows downhill, yes, but perhaps in a different way than your post suggests.

  4. Dan
    July 15, 2011

    If you fire a gun from the same spot at a target 100 yards away, it will never hit in exactly the same spot. When you are talking about humans making hundreds of decisions each game, you are going to have variances from perfection. Wouldn’t we be more surprised if the variance were zero?
