Tuesday, December 14, 2010

Believing You Can Do It

The word “statistics” brings back memories of dread for quite a few people. While the word simply means “the collection, organization, and interpretation of numerical data”, most people think of the high school or college class that involves sampling, interpolation, and regressions, and honestly, I don’t have much of an idea what some of those words mean (I have never taken a “stats” class). All most people really picture when they hear “stats” is computers doing really complex math, which people don’t like anyway. “Statistics” inspires fear and confusion, and at times it can seem like magic: just plug in some numbers and get the result. But what the hell happened in between?

And here’s where we arrive at the problem. There are all sorts of great primers (Sabermetrics Library, Lookout Landing, and books such as Beyond Batting Average), but there’s still some anxiety toward those statistics. Where do the coefficients 13, 3, and 2 in the equation for FIP come from? (And it’s not just the coefficients; the formulas themselves are just as confusing when you try to decipher how someone arrived at the final result.) Or all the damn coefficients in wOBA? It’s great to give me the equation because I can plug in counting stats such as HR, 2B, K, and BB, but where the hell does the .90 that I have to multiply the number of singles by in the wOBA equation come from? Bring on the intricate math, the computers, and the fear.
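For reference, here is roughly what “plugging in the numbers” looks like. This is a minimal Python sketch, not anyone’s official implementation. The 13, 3, and 2 for FIP and the .90 for singles are the numbers mentioned above; everything else is an assumption made for illustration: the hit-by-pitch term and the additive constant in FIP (the constant is recalculated each season and usually lands a bit above 3), the remaining wOBA weights (taken from one commonly published version; they shift from season to season), and the function names themselves.

```python
# Sketch of the FIP and wOBA formulas. Only the 13/3/2 coefficients and
# the 0.90 weight for singles come from the post; the rest is assumed.

def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching.

    `ip` is innings pitched as a real number (200 1/3 innings is
    200.333, not 200.1). `constant` is a placeholder: the real value is
    set each season so that league FIP matches league ERA.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# One commonly published set of wOBA weights (an assumption; the exact
# values are recalculated every season).
WOBA_WEIGHTS = {
    "BB": 0.72,   # unintentional walks
    "HBP": 0.75,  # hit by pitch
    "1B": 0.90,   # singles: the .90 asked about above
    "ROE": 0.92,  # reached base on an error
    "2B": 1.24,
    "3B": 1.56,
    "HR": 1.95,
}


def woba(counts, pa, weights=WOBA_WEIGHTS):
    """Weighted on-base average: weighted sum of events per plate appearance."""
    if pa <= 0:
        raise ValueError("plate appearances must be positive")
    return sum(weights[event] * n for event, n in counts.items()) / pa


# A made-up pitcher line: 20 HR, 50 BB, 5 HBP, 200 K in 200 innings.
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200.0), 3))  # 3.225
```

Seeing the formulas as code makes the real question obvious: the arithmetic is trivial, and all of the mystery lives in where those constants come from.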

This is going to get a little complicated, but try and bear with me. Let’s say you don’t know much about the newer stats, so you go to one of those great sites to find out more. When you get there, you understand the theory, but the formula is a bit confusing. You ask someone what the numbers in it mean, and they tell you that the numbers are derived from weighting different stats against each other, usually according to run expectancy (which you may also not understand; it’s essentially the number of runs a team can expect to score in the rest of an inning from a given situation). The further you delve into it and the more the person has to explain, the further away you get from what you can actually understand. You aren’t a statistics guru, and when they start talking about regression, you get as confused as I do when the guy at Willis Music starts talking about humbucker pickups. At some point, you both realize that there is a communication barrier between you based on previous knowledge. It’s no one’s fault, but you aren’t going to completely understand each other (in the same way, you go to a doctor and need to sign an “informed consent” for a procedure; there’s no way you’re going to be “informed”, but you essentially have to have faith that the doctors know what they’re doing and that they have your best interests in mind). Once you have reached this stage, you are going to have to take his word for it, and that’s not exactly what people like to do.
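To make “weighting according to run expectancy” a little more concrete, here is a toy Python sketch. The run-expectancy numbers in it are made up and rounded for illustration (real tables are built from seasons of play-by-play data and cover every combination of outs and baserunners), and the names are hypothetical. The idea it shows is the standard one: the run value of a single play is the run expectancy after the play, minus the run expectancy before it, plus any runs that scored on it. Average that over every time an event happened and you get that event’s weight.

```python
# Toy run-expectancy table: (outs, runners on base) -> runs a team can
# expect to score in the rest of the inning. ILLUSTRATIVE VALUES ONLY.
RUN_EXPECTANCY = {
    (0, "---"): 0.50,  # nobody on, nobody out
    (0, "1--"): 0.90,  # runner on first, nobody out
    (1, "---"): 0.27,  # nobody on, one out
}


def run_value(before, after, runs_scored=0):
    """Change in run expectancy caused by one play, plus runs scored on it."""
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs_scored


def average_run_value(plays):
    """Average run value over a list of (before, after, runs_scored) plays."""
    if not plays:
        raise ValueError("need at least one play")
    return sum(run_value(b, a, r) for b, a, r in plays) / len(plays)


# Leadoff single: bases empty with no outs -> runner on first, no outs.
single = run_value((0, "---"), (0, "1--"))
# Leadoff strikeout: bases empty with no outs -> bases empty, one out.
strikeout = run_value((0, "---"), (1, "---"))
# Leadoff home run: the situation is unchanged, but one run scored.
homer = run_value((0, "---"), (0, "---"), runs_scored=1)

print(round(single, 2), round(strikeout, 2), round(homer, 2))
```

Metrics like wOBA then rescale averages of this kind so the final number reads like an on-base percentage, which is why the published weights do not match the raw run values exactly.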

Keeping religion out of it, faith is a tough issue. So this completely scientific guy is telling me to take his word for it? And I’m just supposed to believe he has no bias, whatsoever, in making this suggestion? So what about these other people who have other coefficients and formulas for their statistics? How does that happen if you both used proper math and scientific methods? Now, you’ve reached a climactic moment. You can take the guy’s word for it, knowing that he would have been reamed by others if the math were truly horrible, or you can cling to the accepted statistics that have more understandable and overt measurements. Some people never go this far down the road, but this is the essential question you have to ask yourself (I’m just giving you the benefit of the doubt that you didn’t simply dismiss it). People choose different reactions, of course, and that’s just the nature of being human and being different. I chose to take a leap of faith (something I don’t like doing) because I figured that, if there were major problems with a metric, there would be an outcry over it (e.g., I trust defensive metrics the least of all metrics because of the amount of criticism coming from all sides and because of the actual arguments made). If there isn’t an outcry, I’ll see if all the cool kids are using it.

But I have some trouble with this. I don’t like “taking your word for it”. I don’t like telling other people to do it. Discussion and criticism are essential elements of analysis (you may ask, “Why do I care? I don’t need these stats”, but we’re all amateur analysts using stats to back up our assertions), but our bases of knowledge, meaning what we already know, are different. There really isn’t a way to correct that. But I am asking you to take a leap of faith, and seeing that I am not a statistics expert on the behind-the-scenes stuff, what is my advice on how to handle this? The first thing I do is find sources that I trust. When I first began reading more on the subject, I read Craig Calcaterra (of Shysterball fame, then), and through his posts, I found out about Tom Tango, FanGraphs, Rob Neyer and Keith Law at ESPN, and Baseball Prospectus. After a while, I cyber-met Jason here, Peter Hjort at Capitol Avenue Club, TCM and Bill at the Platoon Advantage (though they had different affiliations then), and Daniel Moroz at Camden Crazies, and whenever I have major questions about metrics, I ask them. I can’t tell you how appreciative I am of their help, and I am privileged to cyber-know them. Hopefully, you trust me at this point, but you can also trust them. I began trusting them because what they wrote made sense, other people cited them, and lots of people began reading them. If something controversial happens, I’ll see what they have to say and weigh the arguments if they conflict. But what if you don’t want to do all that? Well, you’re already here, so go to the contacts section, find my email address or Twitter page (follow me anyway … or at least talk to me?), and send me an email or tweet. If I can’t help you myself, I’ll do the legwork to find someone who can.

We all come from different backgrounds and have different interests, which lead to varying bases of knowledge. When it comes time to look at baseball statistics, the varying levels of knowledge about statistics give the newbies pause. If they’re really interested, maybe they’ll learn more about the math involved, but it’s not practical to expect everyone to do that. Knowing that, the best thing to do is to accept this and investigate the arguments of the statisticians, while believing the underlying math is correct. After all, they are still fighting for credence, and it doesn’t make too much sense for them to be awful at their job. But it’s not an easy thing to do. If you argue in favor of something using these new statistics without knowing everything about what went into making them, you feel like you’re walking a tightrope without a net, and if the statistic is discredited, then your analysis is as well, making you appear stupid and possibly gullible. That’s not an easy approach to take, and not one an agnostic like me wants to take either, but when faced with something you will probably never completely understand, sometimes you have to take that leap. As long as the argument makes sense, that is.
