Effectively Wild: A FanGraphs Baseball Podcast - Effectively Wild Episode 519: The Crack of the Bat and Other Research

Episode Date: August 21, 2014

Ben and Sam talk to Robert Arthur about the analytical significance of the crack of the bat and his other original sabermetric research....

Transcript
Starting point is 00:00:00 You were right. His hands drifted. How did you know that if you can't see him? Because I've been in this business too damn long, that's why. No, it's more than that. Tell me. It's the sound you hear. It's like a ball coming off the bat or exploding into a glove. It's a pure sound. You'll know it when you hear it.
Starting point is 00:00:30 Good morning and welcome to episode 519 of Effectively Wild, the daily podcast from Baseball Prospectus, presented by the BaseballReference.com Play Index. I am Ben Lindbergh of Grantland.com, joined by Sam Miller of Baseball Prospectus. Hi, Ben. Hello. Today we are going to be talking about an article that Baseball Prospectus published on Wednesday, and it's an article that made me miserable for a while. It's about the analytical value of the crack of the bat, and it's some very interesting and original research. It
Starting point is 00:01:03 made me miserable because I had once tried and failed to do something on the same subject. And I harbored this delusion that one day I would revisit it. And then ultimately, it made me happy because it was done better than whatever I ever would have done. And because it was written by a person I helped bring to BP, which means that everything he writes is really a monument to me in a sense. And that person is Rob Arthur, who joins us today. Hello, Rob. Hey. How does it feel to be a monument to me?
Starting point is 00:01:34 A little disappointing, but I'll make it. So we're going to talk about this article, which is, I think, one of my favorite pieces of baseball research that I've seen recently, and also one of Sam's. I guess I'll start off by asking you what inspired it, or what made you want to study the crack of the bat and see whether there was any sort of analytical value to the sound. I don't think that there was any one moment that I decided to do it. I've just seen people talking about it a lot and it was something that had occurred to me just while watching baseball. You hear the sound and it sounds different for different players and it sounds different when it's a line drive versus when it's a home run versus when it's a ground out. And I was just sort of thinking about, would the sound be able
Starting point is 00:02:25 to actually tell you anything about the quality of contact? And so that was the initial inspiration, I guess. Okay, well, this is a podcast. This is audio. So we can bring you those sounds so that you don't have to imagine them. So I'm going to play these sounds now. So Rob in this article embedded a number of different batted ball types, like composites of a number of different batted balls of a certain type, so that you can hear the differences between them. So first I will play the sound of a composite line drive. This is 10 line drives put together, is that right? Is that an accurate description of what this is? Yep, that's right.
Starting point is 00:03:12 Okay, so here are line drives. Okay, and then you can hear the difference between that and ground outs, which I will play now. So here are ground outs. And now I am going to play them back-to-back quickly so that you can compare. Okay, and you can even hear the difference between ground outs and ground ball singles. The ground ball singles are slightly higher pitched, and now I will play those. First the ground outs, then the ground ball singles. Okay, so tell us why these sound different.
Starting point is 00:04:00 So first of all, I didn't know when I started. All I knew is that I recorded a bunch of sounds and they sounded different from each other. So I had to go ask Alan Nathan, who is a former professor of physics from the University of Illinois and an expert on the physics of baseball. And he informed me that basically there are two factors at play that change the sound of the collision. One is the relative speed of the collision. So when the player is swinging the bat faster and the ball is going faster, that will make the sound of the collision higher pitched. And then the second thing is where on the bat the ball is contacted. So if it's right on the sweet spot,
Starting point is 00:04:49 the idea is that it will be higher pitched and it may minimize the amount of vibration of the bat that occurs. The vibration of the bat is sort of bad for the speed of the ball that comes off it, because the vibration is wasted energy that could be going into driving the ball. So the closer it is to the sweet spot, again, the higher pitched it should be, and the fewer off-pitch vibrations
Starting point is 00:05:13 there should be. So can you describe the process of isolating and analyzing these sounds a little bit, and make it sound as complicated as possible so that I feel better about being completely stumped when I tried to do it? Yeah, I thought it was going to be complicated, but it ended up being pretty easy. I just downloaded this software called Audacity, which is an audio recording app. And I put on a baseball game and I just started recording, and Audacity sort of tells you about the amplitude of the sound that it's recording. And initially, I just wanted to see whether I could figure out where there were hits and whether those had a distinct signature in the sound. And it turns out that they have a very obvious signature.
Starting point is 00:05:59 So in the course of the game, the pitcher will throw, the announcers will stop talking. And then if there's a contact, there'll be this huge spike in volume that's very short. And that's what the hit manifests as in the audio data. And so it's very easy to pick out. So once I figured out that I could pick out the hits, the contacts, I went through the condensed games that are on MLB.tv, and I just started recording as many hits as I could and noting what kind of contact each hit was. So whether it was a ground out or a line drive or a home run or whatever.
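The spike-picking step Rob describes here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the workflow from the article (which used Audacity by hand): the function name `find_contact_spikes`, the 10 ms window, the 8x loudness ratio, and the synthetic test audio are all hypothetical choices made for the example.

```python
import numpy as np

def find_contact_spikes(samples, sample_rate, window_s=0.01, ratio=8.0, min_gap_s=0.5):
    """Return approximate times (seconds) of short, loud bursts in mono audio.

    Hypothetical helper: the window, ratio, and gap values are guesses,
    not values taken from the article or the podcast.
    """
    samples = np.asarray(samples, dtype=float)
    win = max(1, int(window_s * sample_rate))
    n_windows = len(samples) // win
    if n_windows == 0:
        return []
    # Split the audio into short windows and measure loudness (RMS) in each.
    frames = samples[: n_windows * win].reshape(n_windows, win)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    # Use the median window as the "quiet" baseline level.
    baseline = np.median(rms) + 1e-12
    loud = np.flatnonzero(rms > ratio * baseline)
    times = []
    for idx in loud:
        t = idx * win / sample_rate
        # Report one time per burst: skip windows too close to the last hit kept.
        if not times or t - times[-1] >= min_gap_s:
            times.append(t)
    return times

# Synthetic stand-in for a broadcast: quiet noise with two 5 ms bursts.
rate = 8000
rng = np.random.default_rng(0)
audio = 0.01 * rng.standard_normal(rate * 3)
audio[rate : rate + 40] += 1.0          # burst near t = 1.0 s
audio[2 * rate : 2 * rate + 40] += 1.0  # burst near t = 2.0 s
spikes = find_contact_spikes(audio, rate)
print(spikes)
```

On real broadcast audio, crowd noise and announcers would also produce loud windows, so the candidate times would presumably still need to be checked by ear, as Rob describes doing.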
Starting point is 00:06:40 And so I ended up getting about five to ten of each different variety, and then I started looking at how the different kinds of hits differ from each other. So for a lot of reasons, we're only talking about ten or so of each kind of hit. So that wouldn't be enough to smooth out all the different variations that might affect this beyond the quality of contact. So what sorts of noise, it's weird to use the term noise here, but what sorts of noise would there be in this data as it is? Well, people have been suggesting things all day. One of the things is the speed of the pitch, the kind of pitch, like whether it's a breaking ball or a fastball, the kind of bat, what the bat is made out of, what sorts of wood, the size of the bat. Somebody suggested that.
Starting point is 00:07:45 The batter himself is probably going to have some impact, although we haven't been able to look into that yet. And also a big one that I did notice is just the ballpark. So since I'm recording directly from the TV audio feed, different ballparks, I don't know if they have different mic setups or different audio feed processing or what, but certainly there's an effect of where the game is being played on the sounds that you hear. And I haven't yet filtered out all those effects. Okay, so presumably to filter out those effects, we would need a lot more of these. And it seems, as you describe it, somewhat time-consuming to do this kind of collection. So, I guess two questions. One
Starting point is 00:08:26 is, do you anticipate finding a way to mass-collect these where you could theoretically get a thousand or more in a sample? And two, if you could, what would you say would be the predictive value of this? How would you use this for analysis beyond a cool observation? What would you use it for? So the first part, how to collect the data in a higher throughput way or basically get more samples. I'm not sure yet. And I've been talking to Ben about maybe getting some help collecting different samples from different particular players. The other thing I thought about was maybe crowdsourcing it somehow to see if I could get people to send in bat cracks from their team or their favorite player or whatever. But those are the only two things I've come up with. It's
Starting point is 00:09:25 gonna be difficult to make it go any faster for a single person. I think this will definitely require just multiple people collecting the data. So the second part of your question was what could we use all this for? I think that's a really interesting question and I don't know all of the possibilities, but the thing that initially sort of suggested itself to me was, because the sound relates directly to the quality of contact, we might be able to see what players are making good quality contact and whether that matches up with, for instance, the kind of BABIPs that they get. So do players that have abnormally high BABIPs, is it because they are making consistently good contact, consistently squaring up on the ball, and vice versa? Are the players with abnormally
Starting point is 00:10:20 low BABIPs, does that arise because they are just sort of deflecting the ball off the side of the bat? So you can imagine using that in a predictive way to see if a player is in line to overperform their current BABIP or underperform it. So that's the one sort of obvious use that I've come up with. But there's also a lot of just sort of frivolous, fun things that you could do with it, like see whether Javier Baez makes a really special sound with his bat, as people like to say all the time. So the way I've been thinking about it is sort of like a homebrew HitFX in a way. Theoretically, if we had bat crack sounds for everyone on every batted ball, we could assess their quality of contact
Starting point is 00:11:12 in a way that teams can currently by just looking at exit speed and exit angle and all those cool things that we wish we had access to, but don't. Is that essentially the case, that it could maybe replicate some of the value of that for those of us who don't have access to that information? Yeah, I think that's possible. So HitFX gives you the speed off the bat, right? And so we'll certainly be able to get at that same kind of question with the sound, and I don't know to what
Starting point is 00:11:51 extent you can tell with HitFX where on the bat the contact occurred. Maybe you can using the angle, but we should also be able to say something about that, whether the contact is consistently at the sweet spot or off the sweet spot, using this kind of data. And we might be able to, I mean, who knows, we might be able to get other things that even HitFX can't see. The sound data is so rich that you might be able to see all sorts of different things in it. And can you imagine, I mean, is it possible that if you were running a scouting department that you would tell your scouts to record audio of a bat crack just so you could, if you see a, you know, a high school prospect or a college prospect one time and you get one good bat crack,
Starting point is 00:12:38 does that tell you something that the scouts' eyes don't see? Because, I mean, presumably Stanton or Baez are these people who we know hit the ball really hard because we see them hit the ball really hard and really far. That will show up in the frequency if we do an analysis of their bat cracks. And that will just be confirming something we already know and maybe presenting it in kind of a cool way that we can look at and listen to. But would there be any value to it?
Starting point is 00:13:12 Maybe in the sense of, if you're seeing someone in a small sample or something and you don't have stats, just to know that someone is capable of producing a certain kind of bat crack. Like that typical, you know, scout story about how I wasn't even looking at the field and I just heard the crack of the bat and it sounds different coming off his bat. I guess if you can hear it, maybe you don't need to see the frequency graph. Or maybe you do, maybe there would be some value to it anyway. Yeah, I suspect there would certainly be some value. I don't know how quickly the characteristics of the frequency spectrum would sort of become stable, like how large of a sample you would need. But I do think that you could use it sort of as verification for hearing that special sound. So, I mean, I think that a lot of prospects get that
Starting point is 00:14:07 label put on them, that so-and-so makes a special sound with their bat, and I'm sure it's not true for all of them. I'm sure at least some of the time it's just kind of like wishful thinking. And so this would be like, I sort of mentioned this in the piece, that this is like an objective way to look at that. And so there should be some utility in that. I know that when you play them side by side, we can tell the difference. But we are talking about, I don't know, it seems to me that we're talking about, from a physics perspective, fairly small differences. Do you think that it's really the case that a person has heard enough baseball in their lives that they can tell the difference between a good home run and a bad home run by the sound of the bat? Because I guess to some degree, we just had a big debate, actually, not me, but there was a big debate in the comments of a BP piece last week about whether the phrase means anything in the way that it's used now. Does
Starting point is 00:15:10 the research that you did convince you that it means something, or that in fact it maybe means very little to our naked ear? Well, the short and boring answer is to say that I just don't know yet because I don't have enough samples from particular players, but I don't want to give the short and boring answer. So I think that there probably is something to the idea that some players really do make some kind of unique sounds in their bat-ball contacts. And I believe that because I hear a lot of very smart scouting type people say that about certain players. And I think we have to keep in mind that even though intuition can be kind of unreliable, fundamentally, what the scouts are doing when they say so-and-so hits the ball with a special sound is they're recording the data, and then they're processing it with their brains, and then they're noticing some characteristic in that data that makes them
Starting point is 00:16:21 think, based on past experience, that that player is going to be particularly special somehow. So I don't think it's outside the realm of possibility that there would be some truth to the idea that certain players hit the ball in certain ways that produce certain sounds. So Ben said that this was his favorite piece of research in a while. It's actually only my second favorite. I am totally in the tank for the things that you wrote earlier this spring
Starting point is 00:16:50 basically using PitchFX to deduce a team's scouting report of a player and then looking at whether that's more predictive than his actual stats. I am completely obsessed with that concept now. And it has some similarities to this one in that you've sort of found new data where nobody knew there was data before. And it seems like there is a lot of talk now about how all the good stuff is proprietary, how we've maxed out what we have. And yet you twice have figured out a way to create a new kind of data. Do you think that the gloomy outlook for public sabermetrics is too pessimistic? I mean,
Starting point is 00:17:33 do you think there are all sorts of these wonderful ways of building databases that people haven't tapped yet? Yeah, I really do. I think that we're sort of at a plateau right now in sabermetrics, where we have a reasonably good grasp of the game and we've come across some important truths about it, and it looks like right now we're not going to be able to get much further. But I think that there's a lot of interesting stuff that people haven't even
Starting point is 00:18:11 really delved into, such as the sound, such as using the PitchFX metrics in different ways. I think, in particular, PitchFX is a good example of this. A lot of the PitchFX work that's being done right now, I think we're only scratching the surface of what's possible with it. It's at a more granular level than at-bat-level stats. And I think that we can use that to figure out all sorts of new things, such as the piece that I wrote earlier about
Starting point is 00:18:48 using the PitchFX stats to see if a player's true talent level has changed. I think part of the reason that works is because there's several thousand pitches per player season. And so you have a lot of power there to see interesting trends develop in a way that you don't if you just have, let's say, 500 at bats. I think if you were to just look at the trend in 500 at bats, more often than not, the trend won't be useful. But when you have 2,000 pitches, suddenly you can start to see trends falling out of the data that are actually interesting and useful.
Starting point is 00:19:27 And so I think that PitchFX, among many other things, should be able to push sabermetrics past that plateau. Can you talk a little bit about that concept of scouting via PitchFX? Because Sam and I discussed it briefly when you wrote, I think, your first article about it, maybe, but you did a follow-up kind of checking in on the guys that you had predicted would exceed their PECOTA projections based on the work you had done, and it looked like they were. So can you tell people a little bit about how that method works? Yeah, so the basic idea is that there's this sort of equilibrium between the pitcher and the batter, where the pitcher wants to get the ball
Starting point is 00:20:13 in the zone as much as possible, but he's afraid that the batter is going to hit it. So he wants to throw it as far away from the zone as he can while still getting the batter out. And this is backed up by correlational analyses where you can see that hitters like Giancarlo Stanton, for instance, almost never see balls anywhere close to the center of their zone. So the idea of then using that to detect changes in ability is that if a player is stuck in a particular scouting report where he's seeing pitches, let's say, a foot from the center of the zone, and then suddenly you start to see that he's seeing pitches 1.2 feet from the center of the zone, there must be a reason for
Starting point is 00:21:07 that. And the reason might be that the pitchers have decided that his true talent level has changed, and he's now more of a threat to hit the ball when it comes into the zone. So they've adjusted to that by giving him fewer pitches in the zone. And so this was just a crazy idea I had, but then I looked in the data and I found initially that this would have predicted, for instance, Chris Davis's breakout, which really nobody had seen coming, because he was seeing pitches at a level that fit his sort of Quad-A status prior to his breakout. And then all of a sudden, in the middle of the year, he started seeing pitches way far away. And it just so happened that that coincided with him
Starting point is 00:22:01 starting to hit a bunch of homers, and then 2013 happened and he had this huge breakout. So that kind of convinced me that there was some merit to this idea, and I applied it to players this year, and I got kind of a weird group of players. When I first came up with this idea, I think we were only about a hundred plate appearances into the season. And I picked out, I believe, Victor Martinez, Chase Utley, Jimmy Rollins, Lucas Duda, a weird heterogeneous collection of players, and also Raul Ibanez. And so when I went back and checked on those players, I guess it was a month ago or so, it turns out that a lot of them are outperforming their projections.
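The comparison Rob describes, pitches about a foot from the center of the zone versus about 1.2 feet, comes down to a difference in average distance. The sketch below is a hypothetical illustration and not his actual method: the function names, the assumed zone center of (0, 2.5) feet, and the made-up pitch locations are all assumptions chosen to reproduce his example numbers.

```python
import numpy as np

def mean_zone_distance(px, pz, center=(0.0, 2.5)):
    """Average distance (feet) of pitch locations from an assumed zone center.

    px is horizontal location and pz is height, both in feet. The center
    (0.0, 2.5) is an assumption for illustration, not a value from the article.
    """
    px = np.asarray(px, dtype=float)
    pz = np.asarray(pz, dtype=float)
    return float(np.hypot(px - center[0], pz - center[1]).mean())

def distance_shift(px, pz, split):
    """Change in mean zone distance between pitches before and after `split`."""
    before = mean_zone_distance(px[:split], pz[:split])
    after = mean_zone_distance(px[split:], pz[split:])
    return after - before

# Made-up pitch locations: four pitches 1.0 ft from center, then four at 1.2 ft.
px = [1.0, -1.0, 0.0, 0.0, 1.2, -1.2, 0.0, 0.0]
pz = [2.5, 2.5, 3.5, 1.5, 2.5, 2.5, 3.7, 1.3]
shift = distance_shift(px, pz, split=4)
print(round(shift, 3))
```

With thousands of real pitches per player season, a shift like this would still need some test against random variation before it could be read as a change in how pitchers view the hitter.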
Starting point is 00:22:51 Some of them in ways not quite as dramatic as Chris Davis, but still fairly dramatic. For instance, Victor Martinez is sort of another case where nobody foresaw him coming back to be as good as he has been this season, but this method would have picked him out. But there are also some pretty spectacular failures. So it's not like a magic bullet that can pick out everybody. It seems like it only applies to some players, and of the players who see the most significant increases in their distance from the center of the zone, some of them just end up falling apart, like Raul Ibanez. You should have stopped when it sounded like it was a magic bullet because then people were lining up to read your stuff
Starting point is 00:23:34 so they could win their fantasy leagues. So you know what you've done here, I think, is when people describe baseball, romanticize baseball, the first thing they always talk about is the moment when they walked into the stadium and they saw the green of the grass and heard the crack of the bat. And now you have taken all the joy out of the bat crack. You've quantified it. You've applied numbers to the romantic notion of the bat crack.
Starting point is 00:24:01 People are going to be so upset. You're sapping all the joy out of baseball. But I'm glad that you did. Yeah, get your ears out of a spreadsheet. Yeah, I think, to me at least, it enhances my joy at baseball, because whereas I was able to tell that certain bat cracks were different than certain other bat cracks, I had no idea why. I wouldn't have been able to tell you that the ground ball singles were higher pitched than ground ball outs. I just would have said, this sounds different.
Starting point is 00:24:34 So now I actually know, and I know what to listen for. So I feel like that's kind of cool. I agree. So this article, The Analytic Value of the Crack of the Bat, is free at Baseball Prospectus. If you're not a subscriber for some reason, you can still read it. I will link to it in the podcast blog post at BP and also in the Facebook group. I don't like the expression, if you're not doing X, you're doing it wrong.
Starting point is 00:25:01 But really, if you're not reading Rob, you're definitely not doing it right, because you should be reading Rob. He's doing some really creative and innovative work. And you can follow him on Twitter at NoLittlePlans. There are underscores between those words. And you can also find some of his mostly non-baseball writing at MakeNoLittlePlans.net, where he applies some of the same sort of methods to other interesting subjects, like the look of Star Trek movies and the criminal topography of Chicago, and a bunch of interesting stuff that I enjoy. So thank you for joining us, Rob. Thank you. All right. And please support our sponsor, Baseball Reference. Go to baseballreference.com. Use the coupon code BP to get the discounted
Starting point is 00:25:53 price of $30 on a one-year subscription. And we'll be back with another show tomorrow.