If Numbers Could Talk…
Well, it’s been a while since my last post. I did not like my original format of doing game previews, for several reasons: 1) It’s very time consuming. I have nothing but respect for those who can do that on a daily basis and do a good job with it. 2) It’s been done, and done better than I could do it. Finally, 3) it does not really suit my personality, so that’s that.
I wanted to get back into the “blogosphere” by sharing some thoughts I’d recently had about the use of statistics in baseball. There are reasons that numbers geeks love baseball. Probably the most obvious is that, more than any other sport, baseball lends itself to numbers. The season is long enough and events occur frequently enough to permit meaningful statistical analysis. Numbers also mean a lot in baseball for historical reasons, and because of that they are deified in the sport more than in any other I can think of. All I have to say is 73; 56; 2,632; or 4,256 and most baseball fans know what I’m referring to. We talk about Hall of Fame credentials with respect to whether a batter had 3,000 hits, 500 home runs, or a .300 average. We talk about pitchers with respect to 300 wins, 3,000 strikeouts, or a sub-3.00 ERA as the pinnacle of the profession.
Football, by contrast, though a favored American sport, does not hold the same reverence for numbers. Who holds the all-time record for sacks? How many? Who holds the all-time record for touchdowns? How many? Did you know that 19 of the top 22 career completion percentage leaders among quarterbacks are currently active? Do you know who heads that list? If you knew it was Chad Pennington, then you were ahead of me. It’s not that numbers don’t matter in football, but they don’t hold the same place in a fan’s heart that numbers do in baseball.
Sabermetrics has become an increasingly important field for those who study baseball. It started as an academic exercise, but has since become nearly mainstream. Nowadays, you’re not an educated fan unless you know the importance of BABIP, OPS, Win Shares, FIP, and the like. I, admittedly, am a fan of numbers and have developed a decent respect for these types of peripheral statistics. Just the other night I was discussing with a friend a book I was reading on numbers and probabilities. (For those interested, the book was The Drunkard’s Walk: How Randomness Rules Our Lives by Leonard Mlodinow. I highly recommend it for anyone interested in probabilities and statistics.) Her response to me trying to explain some of the things I found interesting was “you really love this stuff.” Probably true, and this comes from a person whose absolute least favorite subject in school was math.
The use of advanced statistics was popularized by Michael Lewis’ book Moneyball which, for those who are somehow baseball fans and simultaneously live in a vacuum, chronicled the success of Billy Beane and the Oakland A’s in the early 2000s in using advanced statistics to “uncover” hidden value in players. My personal opinion is that these statistics are both incredibly useful for understanding baseball and a horrible means by which numbers geeks undercut what is observed on the baseball diamond. When a particular player emerges from obscurity, the modern tendency is to look to these underlying numbers. Generally, the conclusion comes back that “he’s been remarkably lucky” compared to his norms or the norms of baseball. I think there’s value in understanding why a player has been more successful, but reliance on the norms that sabermetrics identifies can also lead us astray.
There’s nothing wrong with assuming a player will regress if he’s “overperforming” or improve if he’s “underperforming” based upon his numbers. There’s nothing wrong with assuming that a player will migrate towards the mean of the normal distribution. However, a key word in all of this is distribution. One point which Professor Mlodinow makes in his book is that there’s a difference between two things: knowing a true probability in the abstract and inferring from it what the results will be given a large enough sample, and attempting to infer a true probability from observed results. We can say that most pitchers will have a BABIP against ranging from .290 to .310 based upon historical figures, but that does not mean that’s the true range of what a given pitcher can do in the long run. Assuming that what has already been observed is in fact the true probability is a fallacy. The truth of the matter is that those numbers are based upon the observed results of all pitchers, from which we then infer, using the normal distribution, what a given pitcher’s potential will be.
Even something as simple as flipping a coin is, to an extent, an observed result. We generally start with the presumption that the chance of flipping heads or tails is 50/50, but observed results may differ. As we increase the sample size, we’d expect the observed results to resemble that 50/50 chance more and more closely, but in all likelihood they won’t match it exactly. Even if you flip a coin a million or a billion times, the chance of it landing heads exactly 50% of the time is pretty small. Sure, the results might be so close that the observed difference is negligible, but you still won’t be able to say for sure that the real chance of heads versus tails is exactly 50/50. With baseball, there are many more variables and a much smaller sample to start from.
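For anyone who wants to check the coin-flip arithmetic, here is a minimal Python sketch. It assumes a perfectly fair coin and independent flips, and the function name `prob_exactly_half_heads` is my own invention for illustration. It computes the chance of landing exactly half heads for a given even number of flips.

```python
import math


def prob_exactly_half_heads(n_flips: int) -> float:
    """Chance of exactly n_flips / 2 heads in n_flips fair, independent flips."""
    if n_flips <= 0 or n_flips % 2 != 0:
        raise ValueError("n_flips must be a positive even number")
    half = n_flips // 2
    # log of C(n, n/2) * 0.5**n, computed with lgamma so that very large
    # flip counts do not require building enormous integers.
    log_prob = (
        math.lgamma(n_flips + 1)
        - 2 * math.lgamma(half + 1)
        - n_flips * math.log(2)
    )
    return math.exp(log_prob)


if __name__ == "__main__":
    for n in (10, 100, 10_000, 1_000_000):
        print(f"{n:>9} flips: {prob_exactly_half_heads(n):.6f}")
```

With 10 flips an exact 5/5 split happens about a quarter of the time, but by a million flips the chance of an exact 500,000/500,000 split is under a tenth of a percent, even though the observed share of heads will almost certainly sit very close to 50%.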
There’s value in observed results because baseball has a long season, meaning that particular situations occur often enough for better guesses about future success to be made. Given a random pitcher of average capability, the safest play is to expect a BABIP somewhere in that range. However, baseball is still a human endeavor subject to randomness. So many variables go into every play that attempting to take even the highly researched sabermetric statistics as “truths” will inevitably lead to disappointment. Because observed results fall along a distribution rather than in line with a law based upon true probability, there will be outliers. This can be true for a season or even a career.
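To get a feel for how wide that distribution can be, here is a rough Python sketch. Everything in it is an assumption made for illustration: a pitcher whose “true” BABIP against is exactly .300, 500 balls in play per season, and every ball in play treated as an independent event (real baseball is messier than that). The function names are hypothetical as well.

```python
import math
import random


def babip_standard_deviation(true_rate: float, balls_in_play: int) -> float:
    """Standard deviation of observed BABIP under a simple binomial model."""
    return math.sqrt(true_rate * (1 - true_rate) / balls_in_play)


def simulate_observed_babip(true_rate, balls_in_play, seasons, seed=0):
    """Simulate observed BABIP over many seasons for one fixed true rate."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    observed = []
    for _ in range(seasons):
        hits = sum(1 for _ in range(balls_in_play) if rng.random() < true_rate)
        observed.append(hits / balls_in_play)
    return observed


def share_outside(values, low, high):
    """Fraction of values falling outside the closed range [low, high]."""
    return sum(1 for v in values if v < low or v > high) / len(values)


if __name__ == "__main__":
    # Assumed inputs: true BABIP of .300 and 500 balls in play per season.
    print(f"one standard deviation: {babip_standard_deviation(0.300, 500):.4f}")
    seasons = simulate_observed_babip(0.300, 500, seasons=1000)
    print(f"share outside .290-.310: {share_outside(seasons, 0.290, 0.310):.2f}")
```

Under these assumptions one standard deviation works out to about 20 points of BABIP, so a pitcher whose true rate never moves from .300 can still post a season well outside the .290 to .310 range on chance alone.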
My point in all of this is to say: If numbers could talk, they’d tell you to pay attention to what they say but also to understand the limits of their predictions. When someone tells you “XXX can’t possibly sustain this because his HR/FB rate is too low,” pay attention to what that person has to say, because they may know something, but don’t go so far as to believe that they can predict the future with certainty.