Wednesday, July 9, 2008

More Fun with Sports and Statistics

Or, what to do with yourself now that the basketball season is over.

Now that the NBA season has ended and the Boston Celtics have been crowned champions, I find myself with little interest in pro sports. Baseball has long been my least favorite of the three major professional sports, even when my beloved Colorado Rockies came out of nowhere last year to claim the National League pennant. Their amazing win streak last year got me thinking, though - should we have seen this coming? Is there a way that we could have accurately predicted this?

Of course, the answer is yes. Just like I did for the NCAA tournament, I created a metric to gauge an MLB team's strength. And just like the SPI, I used scoring differential as the primary method of determining power. As I argued before, scoring differential is a more reliable tool for determining a team's overall strength than wins and losses. A team winning a majority of its games by extremely slim margins isn't necessarily a good team. More likely, it's a lucky team having a lot of things go its way, and you'd expect that luck to level off. A team with a high scoring margin - even over subpar teams - can be reasonably expected to be a quality team. Scoring differential also exposes two weaknesses that a win-loss record can hide - great offense paired with lousy defense, and vice versa. A team with great offense but no defense could score 12 runs a game, only to give up 11. That gives them a low scoring differential. Likewise, a team that has no offense but a fantastic defense maybe scores one run a game while giving up zero. Again, low scoring differential.

I compiled some data on the 30 MLB teams (home record, road record, and scoring differential) and created a simple formula. (I'm not telling you the coefficients I used. You want a formula, go make your own.) I then added 100 to each score to keep teams from going negative, and also because a score of 102.915 seems cooler than one of 2.915. ESPN.com keeps great records, so compiling all the data was pretty simple. And once I finished it, the results were, unsurprisingly, not what common wisdom would have told you.
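For anyone who does want to go make their own, here is a minimal sketch of what a formula of this shape might look like. To be clear, this is not my formula: the weights W_HOME, W_ROAD, and W_DIFF are made-up placeholders, the function names are hypothetical, and the sample team is invented. The only parts taken from the description above are the three inputs and the +100 baseline.

```python
# Hypothetical weights: the post does not disclose the real coefficients.
W_HOME = 1.0      # assumed weight on home winning percentage
W_ROAD = 1.0      # assumed weight on road winning percentage
W_DIFF = 2.0      # assumed weight on run differential per game
BASELINE = 100.0  # from the post: added so no team's score goes negative


def win_pct(wins, losses):
    """Winning percentage centered on zero, so a .500 team returns 0.0."""
    games = wins + losses
    return wins / games - 0.5 if games else 0.0


def power_score(home, road, run_differential):
    """Combine home record, road record, and run differential into one rating.

    home and road are (wins, losses) tuples; run_differential is runs
    scored minus runs allowed over the same games.
    """
    games = sum(home) + sum(road)
    diff_per_game = run_differential / games if games else 0.0
    raw = (W_HOME * win_pct(*home)
           + W_ROAD * win_pct(*road)
           + W_DIFF * diff_per_game)
    return BASELINE + raw


# Invented team, purely for illustration: 30-15 at home, 25-20 on the road, +70 runs.
print(round(power_score((30, 15), (25, 20), 70), 3))
```

With these placeholder weights, a perfectly average team (a .500 record and a run differential of zero) lands on exactly 100, which is the point of the baseline.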

The entire sports world has been crowing about how amazing the Tampa Bay Rays have been this year, and they're right. They've come off ten consecutive losing seasons to post the current best record in the league. That's pretty impressive. But I think it's wrong to crown them the best team in the league based on that alone. They've scored 70 more runs than they've allowed this year, for an average margin of 0.787 runs per game. That's pretty good - fifth in the league. But since four teams have a better differential, four teams find themselves above Tampa Bay in my rankings - Philadelphia, Boston, and both Chicago teams. In fact, the Chicago Cubs' run differential is 36 runs better than Tampa's, putting them easily in the top spot.
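The per-game margin is just run differential divided by games played. A quick sketch of the arithmetic, with one assumption flagged: the figure of 89 games is inferred from 70 / 0.787, not a number quoted above.

```python
def margin_per_game(run_differential, games_played):
    """Average run margin per game (runs scored minus runs allowed, per game)."""
    return run_differential / games_played


rays_diff = 70               # from the post
rays_games = 89              # assumption: inferred from 70 / 0.787
cubs_diff = rays_diff + 36   # from the post: the Cubs are 36 runs better

print(round(margin_per_game(rays_diff, rays_games), 3))  # 0.787
print(cubs_diff)                                         # 106
```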

There's even more we can learn from this metric. Conventional wisdom holds that the American League is far better than the National League right now. A look at my metric (I hesitate to call this one the SPI as well, but I haven't got a better name for it) shows that seven of the top ten teams in the league are from the AL. Pretty compelling. We can also learn that there are some underachieving teams in the league, and some overachievers. By looking at the SPI, you can tell how many games a team should expect to have won based on its scoring differential. (You can extrapolate that to see how many games it should win over an entire season, but that's not accurate, as it fails to take into account trades and injuries.) Based on that, we see teams like Atlanta (currently 43-48) sitting well beneath their ability. The SPI has them six games higher, with an expected win-loss record of 49-42, cracking the top ten. (ESPN.com's Power Rankings have them currently at 21.) At the same time, you have teams like the L.A. Angels, which ESPN lists at number 3. With a record of 52-36, they look like a strong contender. However, the SPI pegs them at 48-42, with a barely positive run differential of 24.
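If you want to try turning a run differential into an expected record yourself, here is a rough sketch. It is not the SPI calculation (I'm still not sharing that). It swaps in the common rule of thumb that about ten runs of differential are worth one win, so RUNS_PER_WIN and the function name are assumptions, and the results will land near my numbers rather than on them.

```python
RUNS_PER_WIN = 10.0  # assumed rule of thumb, not the SPI's actual conversion


def expected_record(run_differential, games_played, runs_per_win=RUNS_PER_WIN):
    """Estimate a (wins, losses) record from run differential.

    Starts from a .500 record and adds one win for every
    runs_per_win runs of differential.
    """
    expected_wins = games_played / 2 + run_differential / runs_per_win
    # Round to whole games and keep the result within the games played.
    wins = max(0, min(games_played, round(expected_wins)))
    return wins, games_played - wins


# Angels: +24 run differential (from the post) over the 88 games in their 52-36 record.
print(expected_record(24, 88))  # (46, 42)
```

Even this crude version tells the same story: a team sitting at 52-36 with a differential of only 24 runs has won several more games than its scoring would suggest.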

The difference, probably, is that the SPI measures a team's potential, while most commentators measure performance. And true, we ought to be concerned with what a team is actually doing rather than what they could be doing, but with half the season left, I think measuring a team's potential still has some use. And if you're complaining to yourself about my writing another sports statistics article, then too bad, because this is my blog and I can write whatever I want unless the editors fire me.
