June 30, 2018 / Nat Anacostia

A brief rant about statistics for suspended games

The May 15 suspended game between the Yankees and Nats was finished on June 18. MLB official statistics record the game as having taken place on the day that it started, which led to a number of statistical anomalies. We’ve seen the articles about how Juan Soto hit a home run five days before his debut.

Each month I prepare an article reviewing the baseball played by the Nats during the month. I’ve decided that the way MLB records suspended games is not just a bit idiosyncratic. Rather, in the era of online databases like baseball-reference and fangraphs, the convention actually distorts baseball history in undesirable ways.

For example, if you look up the standings for June 6 on baseball-reference, they show the Nats moving into first place with a 36–25 record and a half-game lead over the Braves. Except the Nats didn’t have a 36–25 record or a half-game lead after playing on June 6; their actual record was 35–25 and they were essentially tied with the Braves (though the Nats’ .583 winning percentage was slightly better than the Braves’ .581). But with the databases now showing the game as if it were completed on May 15, the standings from May 15 through June 17 have all been revised. And for someone who is trying to write the history of the season as it actually occurred, that’s a problem.
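To make the arithmetic concrete, here is a minimal sketch of the two versions of the June 6 standings. The Braves' 36–26 record is my inference from the .581 percentage and the half-game margin quoted above, not a figure pulled from a database, and the function names are just for illustration.

```python
def win_pct(wins, losses):
    """Winning percentage: wins divided by decisions."""
    return wins / (wins + losses)


def games_behind(leader, trailer):
    """Standard games-behind formula for two (wins, losses) records."""
    (lead_w, lead_l), (trail_w, trail_l) = leader, trailer
    return ((lead_w - trail_w) + (trail_l - lead_l)) / 2


nats_actual = (35, 25)   # record after the games actually played through June 6
nats_revised = (36, 25)  # record once the June 18 win is backdated to May 15
braves = (36, 26)        # assumption: inferred from the .581 figure in the text

print(f"Nats actual pct:  {win_pct(*nats_actual):.3f}")
print(f"Braves pct:       {win_pct(*braves):.3f}")
print(f"Braves GB of revised Nats: {games_behind(nats_revised, braves)}")
print(f"Braves GB of actual Nats:  {games_behind(nats_actual, braves)}")
```

Under those assumptions, the backdated win is the whole difference between a dead heat in games behind and a half-game lead.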

I think I get how MLB originally made the decision. Back in the days of paper records, it was too much trouble to split the records for a game, so you had to file them all on a single date. But for at least the last 30 years, records have been kept in electronic databases, and there’s really no reason we can’t record a game as having been played across two days (as it, in fact, was). The plate appearances from June 18 should be recorded as having taken place on June 18, not on May 15, and the Nationals’ win should also be recorded as having taken place on June 18. Modern databases should be able to accommodate this easily.

There would be a few statistics where we’d have to make some decisions. If a player played both on May 15 and June 18, it should count as one game played, and I guess I’d record that game played on May 15, when the game became official. (If a player only played on June 18, however, I don’t see any reason his game played couldn’t be recorded on June 18.) And if there’s a consecutive-game streak underway, it should be based on the same day that the player’s game played was counted. But most individual statistics, such as plate appearances, hits, and strikeouts, can be recorded on the day that they actually occurred. No problem.
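Here is a rough sketch of how a database could follow those rules. It is only an illustration: the `PlateAppearance` record, the two function names, and the placeholder players are all made up for this post and do not reflect how MLB, baseball-reference, or fangraphs actually store anything.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlateAppearance:
    """One plate appearance, stamped with the day it really happened."""
    player: str
    played_on: date
    is_hit: bool = False
    is_strikeout: bool = False


def daily_totals(events):
    """Total PA, H, and SO per (player, actual date)."""
    totals = defaultdict(lambda: {"PA": 0, "H": 0, "SO": 0})
    for ev in events:
        line = totals[(ev.player, ev.played_on)]
        line["PA"] += 1
        line["H"] += int(ev.is_hit)
        line["SO"] += int(ev.is_strikeout)
    return dict(totals)


def game_played_date(appearance_dates):
    """Credit the single game played to the earliest day the player appeared.

    A player who appeared on both days is credited on the start date;
    a player who appeared only at the resumption is credited on that day.
    """
    return min(appearance_dates)


START, RESUMED = date(2018, 5, 15), date(2018, 6, 18)

# Placeholder events, not the real box score.
events = [
    PlateAppearance("Player A", START, is_hit=True),
    PlateAppearance("Player A", RESUMED, is_strikeout=True),
    PlateAppearance("Player B", RESUMED, is_hit=True),
]

for key, line in sorted(daily_totals(events).items()):
    print(key, line)

print(game_played_date([START, RESUMED]))  # Player A: credited on the start date
print(game_played_date([RESUMED]))         # Player B: credited on the resumption date
```

The point of the design is that the date lives on each event rather than on the game, so daily and monthly splits fall out of a simple group-by, while games played and streaks use the one extra rule.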

Let’s take this aspect of MLB statistical convention into the 21st century and start recording suspended games in a way that’s consistent with when the events actually took place. We’d have better records and a more accurate history.
