Matt Harvey and the Challenge of Choosing the Right Data to Measure


[Image courtesy of snymets]

Yesterday, my friend and fellow baseball fan (and EAC Commissioner) Matt Masterson sent me a Tweet regarding the ongoing controversy over the New York Mets’ Matt Harvey. Harvey is an incredible young pitcher who is back in the game this year after “Tommy John” surgery to replace a damaged ligament in his throwing arm. Because he is returning to the mound after surgery and the lengthy rehabilitation process it requires, he and the team want to be somewhat cautious in how they use him lest he re-injure his arm.

But now the Mets are on the verge of clinching their division [NOTE: As a fan of the division-rival (and preseason favorite) Washington Nationals, I don’t want to talk about why], and the team, the pitcher and his agent must decide how to balance the desire to moderate his workload (and preserve his career) with the adage “championship flags fly forever.”

The dispute now centers on a limit of 180 “innings pitched” (IP) for Harvey this season – and on whether 180 IP is a hard limit, as innings caps have been for other pitchers such as the Nats’ Stephen Strasburg in 2012, or somewhat flexible given the Mets’ sudden good fortune.

But questions have arisen about the usefulness of that stat. IP is calculated by counting the number of outs a pitcher gets and dividing by three – meaning that the limit of 180 in Harvey’s case would give him 540 outs – but it ignores lots of other data that might say more about the toll on a pitcher’s arm, like how many batters he’s faced, how many pitches he’s thrown or (a new concept) the stress level of the situations he’s facing (score, runners on base, etc.). Yesterday, ESPN writer Jayson Stark said in a radio appearance, “with all the information and advanced analytics, why are we counting innings in this day and age?” In other words, while IP might tell us how well a pitcher is throwing in relation to advancing his team’s chances to win, it tells us very little about how Harvey’s workload might affect his long-term health.
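For the numerically inclined, here is a rough sketch of that arithmetic in Python. The function names are my own, and the two outings are made-up numbers chosen only to illustrate the point (they are not Harvey’s actual stats): two starts can count identically toward an innings limit while looking very different by batters faced or pitches thrown.

```python
def innings_pitched(outs: int) -> float:
    """Innings pitched as a plain number: outs recorded divided by three."""
    return outs / 3


def outs_for_limit(innings_limit: int) -> int:
    """Number of outs a pitcher may record under a given innings limit."""
    return innings_limit * 3


def box_score_notation(outs: int) -> str:
    """Box-score style IP, where the digit after the dot counts extra outs
    (so '180.1' means 180 innings plus one out, not 180.1 innings)."""
    whole, extra_outs = divmod(outs, 3)
    return f"{whole}.{extra_outs}"


# Hypothetical outings (invented for illustration, not real game logs).
outings = [
    {"label": "easy night", "outs": 21, "batters_faced": 25, "pitches": 88},
    {"label": "hard night", "outs": 21, "batters_faced": 33, "pitches": 117},
]

print("Outs allowed under a 180 IP limit:", outs_for_limit(180))
for outing in outings:
    print(
        outing["label"],
        "| IP:", box_score_notation(outing["outs"]),
        "| batters faced:", outing["batters_faced"],
        "| pitches:", outing["pitches"],
    )
```

Both invented outings register as the same seven innings against the limit, even though one took 29 more pitches than the other, which is exactly the information an IP cap cannot see.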

The problem, I think, is that IP is a stat that’s easy to calculate and familiar to everyone; while some of the other data is also available, it isn’t as widely understood or (in the case of the “stress level”) isn’t yet fully agreed-upon or accepted. As a result, the Mets and Harvey (and his agent) are fighting about a very important issue – the health of his very valuable pitching arm – using data ill-suited to the task.

I think we often face the same problem in the field of elections. I’ve already written at length about how turnout is overrated and misunderstood as a metric of the health of the election system – in part because it’s (relatively) easy to measure and understand. But that focus on turnout can obscure other important inquiries like the efficiency or cost-effectiveness of a voting system, which require data that’s much harder to collect and analyze if it’s being collected at all.

That’s why it’s so important and exciting that there are efforts underway in academia and the private sector – as well as in election offices nationwide – to gather more real-time data (including cost) on every aspect of the voting experience in order to better assess what’s working and what’s not. These efforts, like the efforts to better understand pitcher fatigue, tackle data-focused problems with huge ramifications for the success of an enterprise (like a baseball team or a local election system), and they offer a window into the best ways to achieve that success.

Bottom line, whether you’re an election official or a baseball GM: if you’re going to consider making a binding decision based on data, make sure it’s the right data – and make sure you’re using it the right way.

Will it happen? Will it work? Stay tuned …
