Rates & Barrels - Stuff+, the latest on bat tracking stats and making your mark Max Bay

Episode Date: May 20, 2024

Eno is joined by co-parent of Stuff+ and former member of the Houston Astros' analytics department Max Bay. The guys dive into the state of analytics in the sport...

Rundown:
4:09 - The past, present ...and future of Stuff+
29:45 - How should we be using the new bat tracking data
41:44 - How can people who want to get into baseball analytics for a team stand out

Follow Eno on Twitter: @enosarris
Follow Trevor on Twitter: @IamTrevorMay
Follow DVR on Twitter: @DerekVanRiper
e-mail: ratesandbarrels@gmail.com
Join our Discord: https://discord.gg/FyBa9f3wFe
Join us on Fridays at 1p ET/10a PT for our livestream episodes!
Subscribe to The Athletic: theathletic.com/ratesandbarrels

Transcript
Starting point is 00:00:00 Hello and welcome to a special edition of Rates and Barrels. Today we have co-parent of stuff plus Max Bay on the line. Max is back with us in the public sphere. Say hello, Max, tell us a little bit about yourself. Hi everybody. Hi, you know, lifelong baseball fan. I did a PhD in neuroscience from 2015 to 2022, which tells you something about how terrible a PhD can be.
Starting point is 00:00:42 But while I was there, I was doing a lot of high-dimensional statistics and modeling, machine learning, et cetera. And my baseball enthusiasm kind of bled over into my analytical work and vice versa. And kind of just, you know, a hobby grew out of that. It's like a tale as old as time at this point. So, you know, a lot of the public work that was out there, I read.
Starting point is 00:01:13 Things that I found particularly interesting and wanted to understand thoroughly, I tried to replicate. And yeah, that sort of just blossomed into a more, I guess, serious pursuit of answering baseball questions and coming up with baseball questions. That's pretty much it. So yeah, that's like the back story. And one day, somebody from a team, from the Astros, reached out to me and we had a really compelling conversation about, you know, what it'd be like working for a team, what it would be like, you know, in the organization, et cetera. And so I did that for a little while, really loved it. But for a number of reasons, I decided to leave and,
Starting point is 00:01:51 and now I'm sort of in a non-baseball role for the first time in a while. So the dust is kind of settling and yeah, that's my life story. What were the particular skills that stood out for the organization when they were looking at what you could do? What was your best foot forward, in other words? For me personally, I think this story will be different for just about everyone. But for me, it was sort of the marriage of really, really loving baseball, being very curious about what makes it tick.
Starting point is 00:02:26 What dials can you turn? What knobs? Wait, dials, can you? Yeah, you turn dials. What can you manipulate to change your chances? And how do you evaluate things in the first place? And so, being curious about all that stuff and then, you know, having like a programming background and technical expertise that,
Starting point is 00:02:52 or at least like a technical familiarity level that was sufficient to start to ask those questions with computers, because you just have so much data and you really have to, you know, process these things on a machine, obviously. Yeah, and I would say the other thing is having
Starting point is 00:03:09 like pretty robust machine learning exposure also. So these tools that can take large, large quantities of data with very complicated interactions and make predictions about the future, given input and all that. So all that stuff, I think, was the thing that stood out. Our baby Stuff Plus is out in the world now, and there's a lot of different versions of it now.
Starting point is 00:03:40 I don't know, have you stayed up on that? Could you sort of quickly analyze some of the differences in the models in terms of structure and maybe just describe to me how much we're fighting over here? Are we fighting over tiny little decimal points? Or do you think the models are distinctly different? I mean, the teams have models like this too. So you can almost expand it to a discussion of how different stuff models are across the league, if you're aware of that, but you also, you know.
Starting point is 00:04:12 No, I've only worked for one team. Yeah, right. But from what you've seen in the public space, how different are these models and what are some of the structural differences? Yeah, so broadly the models are all really similar. I would say that the two different pitch grader implementations, at a very high level, tend to be either regression or classification models.
Starting point is 00:04:35 So the regression model takes all the inputs, you know, how hard you threw the ball, what the movement was, maybe things relative to other pitches in your arsenal, where you threw it from, etc. And the prediction for a regression model is to map that into the expected run value dimension, right? Like this one number from, you know, basically negative infinity to positive infinity. But realistically, numbers per pitch thrown are, you know, somewhere between, I actually don't really know, it depends on the count, but like negative 0.5 and 0.5, say, something like that. When you do that, there's basically a direct mapping of the pitch values onto run value. The classification models are a little bit different. They're super, super similar in their goal, but how they
Starting point is 00:05:20 get there are, they're kind of three steps. So there's the inputs. And then the inputs, instead of going straight to run value, they map onto outcomes. They give you a probability of a swing, or called strike, or however you structure it, probability of a whiff, probability of putting it in play. And then there's a run value assigned to each event. Then the idea is basically to just get the weighted average of the run values
Starting point is 00:05:48 across, where the weights are the probabilities of the event. Probably those seem to be the two kinds of pitch graders that are out there. You get some additional functionality with the classifiers because you can look at things like expected whiff. The probabilities are in there for the individual events. You're not just limited to the run value metric, right? Like you can look at the stuff, so to speak, that's on
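As a rough illustration of the classification-style grader described above, the sketch below takes a set of outcome probabilities for one pitch and computes the probability-weighted average of the outcome run values. The outcome labels, the probabilities, and the run values are all made-up placeholders, not numbers from Stuff Plus or any public model.

```python
# Hypothetical outcome probabilities for a single pitch, as a classifier might emit.
# The labels and numbers are invented for illustration only.
outcome_probs = {
    "ball": 0.36,
    "called_strike": 0.17,
    "whiff": 0.12,
    "foul": 0.18,
    "in_play": 0.17,
}

# Hypothetical average run value of each outcome (negative favors the pitcher).
outcome_run_values = {
    "ball": 0.06,
    "called_strike": -0.06,
    "whiff": -0.08,
    "foul": -0.04,
    "in_play": 0.03,
}


def expected_run_value(probs, run_values):
    """Weighted average of run values, with outcome probabilities as weights."""
    if abs(sum(probs.values()) - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * run_values[outcome] for outcome, p in probs.items())


pitch_xrv = expected_run_value(outcome_probs, outcome_run_values)
print(f"expected run value per pitch: {pitch_xrv:+.4f}")
```

Because the per-outcome probabilities are kept around, something like an expected whiff rate can be read straight off `outcome_probs["whiff"]` without going through run value at all.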
Starting point is 00:06:12 the way to the actual run value. But yeah. And then there's obviously the difference between a pitching plus type model and a stuff type model, right? Where generally how stuff's done is you just take away the location features and maybe the count features. And it's just saying, like, broadly over the
Starting point is 00:06:36 marginal distribution of these things, how would this pitch perform? So those are the basic frameworks. The differences between the different models are really hard to say, because these models are actually very complicated in ways that you wouldn't really understand totally, you know, unless you did this type of modeling. But what I can say is there are things like, for extreme gradient boosting, which is a very common machine learning framework that's
Starting point is 00:07:05 used for these types of models. You can put limits on the number of trees that it will make. You can make it learn faster or slower. You can run because this model is basically iteratively improving. You can run it over some truncated amount of time. You can run it forever. So that's just within extreme gradient boosting.
Starting point is 00:07:23 But then there's like a very similar gradient boosting machine type framework called CatBoost. And there are neural nets. There are all these ways to do that. And what I'd say is that those things I described that you can tune for XGB, for the extreme gradient boosting, there are similar but slightly different versions of that for each one of these models.
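To make the tuning knobs mentioned above concrete, here is a toy, pure-Python gradient booster built from one-split "stumps" on a single feature. It is only a sketch of the general idea and is not how XGBoost or CatBoost are implemented; the velocity and run-value numbers are invented. The two arguments `n_trees` and `learning_rate` play the role of the tree-count limit and the "learn faster or slower" setting (in XGBoost's scikit-learn wrapper these are, to my understanding, `n_estimators` and `learning_rate`).

```python
def fit_stump(xs, residuals):
    """Find the single split on one feature that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, n_trees, learning_rate):
    """Return training predictions after n_trees rounds of boosting."""
    preds = [sum(ys) / len(ys)] * len(ys)  # start from the average outcome
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        # Each round nudges predictions toward the residuals, scaled by the learning rate.
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Invented data: fastball velocity (mph) and run value per pitch.
velo = [88, 90, 92, 94, 96, 98]
run_value = [0.02, 0.015, 0.01, 0.0, -0.01, -0.02]

few_trees = mse(run_value, boost(velo, run_value, n_trees=2, learning_rate=0.1))
many_trees = mse(run_value, boost(velo, run_value, n_trees=50, learning_rate=0.1))
fast_learner = mse(run_value, boost(velo, run_value, n_trees=2, learning_rate=0.5))
print(few_trees, many_trees, fast_learner)
```

On the training data, more rounds or a larger learning rate both drive the error down faster; the reason these are tuned rather than maxed out is that the same settings can overfit on pitches the model has not seen.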
Starting point is 00:07:44 And the other thing is there are different frameworks people have. Sometimes people have a separate model for each component of what's happening. So there'll be a swing model and then a called strike model. If they did swing, then there'll be a different model for basically did they whiff or foul it, or whiff or make contact, yeah. And then a different model for foul or put in play. And then, you know, if you put it in play, there's another model. Does that run up on any sort of computing issues,
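The chain of separate component models described above can be sketched as a small probability tree: each model answers one conditional question, and multiplying along the branches gives the probability of each final outcome. The function name, argument names, and input numbers below are hypothetical; in a real grader each argument would come from its own fitted model.

```python
def outcome_probabilities(p_swing, p_called_strike_given_take,
                          p_contact_given_swing, p_in_play_given_contact):
    """Combine conditional component-model outputs into outcome probabilities."""
    p_take = 1.0 - p_swing
    return {
        "ball": p_take * (1.0 - p_called_strike_given_take),
        "called_strike": p_take * p_called_strike_given_take,
        "whiff": p_swing * (1.0 - p_contact_given_swing),
        "foul": p_swing * p_contact_given_swing * (1.0 - p_in_play_given_contact),
        "in_play": p_swing * p_contact_given_swing * p_in_play_given_contact,
    }


# Invented component predictions for one pitch.
probs = outcome_probabilities(
    p_swing=0.5,
    p_called_strike_given_take=0.4,
    p_contact_given_swing=0.8,
    p_in_play_given_contact=0.5,
)
for outcome, p in probs.items():
    print(f"{outcome:>14}: {p:.2f}")
```

The branches always sum to one by construction, so the result can be fed directly into a weighted average of outcome run values.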
Starting point is 00:08:14 like, you know, computing capabilities? Does it start to get? I would say that these models, and computers in particular, are so advanced now that not really. Like, a person on a laptop can run this sort of stuff pretty efficiently. I mean, the original Stuff Plus I put together on my 2015 MacBook, and, you know, I would run the entire season in like two minutes, so it didn't take very long. Yeah, we haven't run into
Starting point is 00:08:43 anything yet. There have been some decisions that we've made that are not necessarily the kinds of decisions you're talking about. They're a little bit more almost like philosophical questions that we didn't put platoon splits in stuff plus at first, partially thinking because we're going to put it in pitching plus, right? We thought that evaluating the platoon split of a pitch meant more when you were evaluating the entire arsenal
Starting point is 00:09:10 than when maybe you were evaluating just the pitch itself. Was that kind of our thinking, I think? Yeah, and it was sort of, you know, with all this stuff, there are assumptions. And so the assumption really early on was, you know, I was curious how a pitch would perform in general, but that doesn't mean that it will necessarily be deployed to the same amount for each platoon.
Starting point is 00:09:33 What the model is effectively learning when you train it is what the usage is. And so I felt like it would just kind of figure out with the platoon. Cause this is a pitch that doesn't get used against this way, right? Yeah, so it's not in the training set. So it doesn't learn as much about that.
Starting point is 00:09:48 That's ultimately, yeah, kind of a quasi philosophical thing, quasi quantifiable thing, because you could just make the model with the platoons and see if it performs better predictively. Well, that's what we ended up doing. While you were out, we ended up putting it back into Stuff Plus because it improved the model. And I think one way that you see it playing out is,
Starting point is 00:10:12 I hate to read too much into this, but you might have had a bunch of models out there that did not have platoon splits in them that were spitting out how great sweepers were. We had some organizations that just basically taught sweepers to every pitcher that came up. And now we have a fair amount of pitchers out in the big leagues realizing they can't throw their sweeper to the opposite hand and they need to figure something else out on top of it. So
Starting point is 00:10:36 as we go and bumble along in Stuff Plus, so do teams. And I wonder what the future is for us in these terms. I have a little fun thing that you've put together where you're looking at acceleration. And you're gonna have to explain to me acceleration here. But for those who are watching on YouTube, you can see the different acceleration in Luis Castillo's changeup and four-seam.
Starting point is 00:11:04 And so the red is just like the league average population. The yellow is what you would expect Castillo's pitch to be just based on release characteristics. Just based on release point. Just on point. You're not even doing release angles or anything. Just release point. XYZ where it was released in space. Yeah. And then green is observed, which is
Starting point is 00:11:30 something I've been fascinated with for a long time. One question is, do you think this is in Stuff Plus? And if it is, what sort of pathway can this lead us down that might help us improve Stuff Plus? To answer the question about whether this is in Stuff Plus, the short answer is kind of. So Stuff Plus has a number of predictor variables. So these are the features of a pitch that are used to estimate its quality,
Starting point is 00:11:59 or its qualities estimated from the features. And where it's released in space, it's actually three of the features, right? Because it's X, Y, Z where it's released in space. And some of the most important features in the model, actually. Exactly, yeah. If you hold everything else constant
Starting point is 00:12:15 and then you change where it's released, you're gonna get different qualities. So there's a dependency between where you release it and the quality. And mechanistically, I think people have a pretty good understanding of this now. If you have a really low release point but you've got a ton of ride, you have a couple things going for you: you're no longer throwing the pitch down into the zone, it's going straight, you aren't matching the upward angle of the bat, and so the
Starting point is 00:12:39 collision is offline between the bat and the ball. And anyway, there are some mechanistic explanations for why the grader would see the performance dip or change. So when I say that it's kind of in the model, what I mean is that we tell the model that it was released from this point, and it's clearly sensitive to those features, right? And this is, I think, true for basically all the public graders. For me, the interesting thing and the reason I do this work in my free time is so that I understand it.
Starting point is 00:13:14 It's not just to build a predictive model. If I had a predictive model that perfectly described baseball, I'd be proud of it. It'd be interesting. But I would not be satisfied in my understanding of baseball. Machine now understands baseball. Actually, that's one thing that I would like to do more with Stuff Plus: you know, we've got the Shiny that shows some of the interactions, to kind of use it to have some takeaways. I think that's what you were describing here a little bit.
Starting point is 00:13:44 It's frustrated me that it's a black box. It is a black box because it's machine learning, and so all these different features are interacting in all these different ways. But I can't just be like fastball go brr. Although you can. I mean, velo is a really important piece of the model. So as we can make little things like this, we can better understand the takeaways to help people sort of spot it on their own in the wild, I guess. Yeah, and I also think just, you know, these models aren't perfect. They have large residuals, so they make a prediction and then there are errors off that prediction, and that error isn't just noise from sampling. That error is sticky within player between seasons.
Starting point is 00:14:23 So if a player is underpredicted, you know, in whiff or in whatever, you've seen this, like, usually that's important signal. And so the way I guess I have been thinking about this more recently is that, let's say you have a Bayesian understanding of things, so there's a prior, and then you have your data, basically your observations. And then the thought is, well, how likely were these observations, or how likely is my prior given these observations?
Starting point is 00:14:55 And then you have your posterior, which is like your new understanding of things. And the way I think about the pitch graders now is, well, actually what it does is it establishes a really like, informative prior. So instead of the prior being, this is an average pitcher, it'll say, well, look at their pitch features and see how they look.
Starting point is 00:15:14 But their performance is additional information that is useful in understanding that pitcher. One of the reasons I want to understand this stuff is because it's not just, oh, I want to improve Stuff Plus. Stuff Plus tells you something, or Pitching Plus or whatever, they tell you something, but there are other unexplained sources of variance, and it's not all going to be explained. But the satisfying thing about doing this, and why any of us do this in our free time, is getting an understanding of it.
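One way to picture the "pitch grade as an informative prior" idea above is a textbook normal-normal update: the grader's prediction is the prior mean, the pitcher's observed run value per pitch is the data, and the posterior blends them by how much information each carries. This is only a sketch under that assumed model; the function name and every number below are hypothetical.

```python
def posterior_run_value(prior_mean, prior_var, observed_mean,
                        per_pitch_var, n_pitches):
    """Normal-normal conjugate update for a pitcher's true run value per pitch."""
    prior_precision = 1.0 / prior_var
    data_precision = n_pitches / per_pitch_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * observed_mean)
    return post_mean, post_var


# Invented numbers: the grader likes the pitch (-0.010 runs per pitch),
# but the observed results so far have been poor (+0.010 runs per pitch).
small_sample, _ = posterior_run_value(-0.010, 0.0001, 0.010, 0.04, n_pitches=100)
large_sample, _ = posterior_run_value(-0.010, 0.0001, 0.010, 0.04, n_pitches=4000)
print(f"after 100 pitches:  {small_sample:+.4f}")
print(f"after 4000 pitches: {large_sample:+.4f}")
```

With few pitches the estimate stays close to what the pitch features suggest; as the sample grows, the observed performance takes over, which is the sense in which the residual carries signal the grader does not.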
Starting point is 00:15:45 And so for me, part of that was understanding what it means for a pitch to have an expected movement profile. I think we take for granted that what it means is that it's like the angle off the, you know, the arm slot basically. And maybe that's it. The other thing is it's not a point. People don't expect necessarily a point. There's like a range that it
Starting point is 00:16:06 could be. And that's the most likely range. And if you've got a really surprising pitch, it'll be off of that. And so for me, that was part of why I wanted to build this out, because you could now characterize a surprise pitch in probabilistic terms, like this is a really unlikely shape given where it should be based off your slot. Yeah, I can understand that. It's a little bit like my quest for understanding better how pitches within an arsenal interact, you know.
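The notion of a "surprising" shape described above can be sketched by treating expected movement as a distribution rather than a point. The example below assumes, purely for illustration, that expected horizontal and vertical break given the release point follow independent normal distributions; the means, spreads, and observed values are invented. Under that assumption the squared standardized distance is chi-square with two degrees of freedom, so the chance of a shape at least this unusual is exp(-d²/2).

```python
import math


def shape_surprise(observed, expected_mean, expected_sd):
    """Return squared standardized distance and tail probability of a pitch shape.

    observed, expected_mean, expected_sd are (horizontal, vertical) break in inches.
    Assumes independent normal components, which is a simplification.
    """
    z_h = (observed[0] - expected_mean[0]) / expected_sd[0]
    z_v = (observed[1] - expected_mean[1]) / expected_sd[1]
    d2 = z_h ** 2 + z_v ** 2
    tail_prob = math.exp(-d2 / 2.0)  # chi-square survival function, 2 degrees of freedom
    return d2, tail_prob


# Hypothetical expected four-seam shape for some release point.
expected_mean = (8.0, 16.0)
expected_sd = (3.0, 3.0)

typical = shape_surprise((8.0, 16.0), expected_mean, expected_sd)
unusual = shape_surprise((2.0, 22.0), expected_mean, expected_sd)
print(f"typical shape tail probability: {typical[1]:.3f}")
print(f"unusual shape tail probability: {unusual[1]:.3f}")
```

A real version would need the expected distribution to be learned from data and would almost certainly need correlated components, but the output has the same reading: a small tail probability means the shape is unlikely given the slot.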
Starting point is 00:16:38 I did try to look at some of the underperformers and overperformers, and when you do binning you can kind of be like, oh, look, you know, in these bins, you find a lot of people who underperform their Stuff Plus that have fewer pitches, and people who overperform their Stuff Plus. And yet when we try to put, you know, number of pitches into the model,
Starting point is 00:16:59 it doesn't make anything more predictive. So it's something that can help you understand it better. And, you know, from the outside, if you're trying to use the model, you can sort of adjust those shades of gray a little bit and say, you know, hey, this player over here that we're scouting, or this player that I'm thinking about picking up, the Stuff Plus isn't amazing, but he does have like five pitches. And it seems like he can command them. This type of player may not, you know, pop in this model in the same way that his usability might be out there for the big leagues or for my fantasy team.
Starting point is 00:17:31 And then at some point, there may be something about the interaction. What's fun about this too is that there could be an interaction between expected shapes within an arsenal. So then we could better understand, even maybe within the context of Stuff Plus. But one thing that I've always struggled with with Stuff Plus is whether we should put in more sort of derived stats. I mean, one thing that I like about Stuff Plus is it's mostly just raw stats.
Starting point is 00:17:56 It's like raw movement, you know, velo and stuff like that. We haven't put something like vertical approach angle in because we're hoping that, by weighting release point so importantly and knowing the shapes of the pitches, if the model thinks the VAA is good, it'll be in there, the stuff that goes into VAA. Yeah, it's like redundant representation a little bit. Right, so if we put VAA in on top of it, then we risk overloading that aspect. We've got VAA and its components in the model separately, you know. Maybe it's worthwhile to get in there, you know, you just got to toy around with these
Starting point is 00:18:34 things. And you can either do that in an objective or subjective way. Like you could have it kind of build out the same model with different combinations of features and do it. This is a nice thing about computers. They'll just do this stuff for you. So. Right, just iterate it and see if it's worth it.
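To see why vertical approach angle is a "redundant representation," it helps to note that VAA can be computed from quantities already in the pitch tracking data. The sketch below uses the calculation commonly seen in public analysis, as far as I know, with Statcast-style inputs (`vy0`, `vz0`, `ay`, `az`, measured 50 feet from the plate). The specific input numbers are invented, and this is not necessarily how any particular model derives the feature.

```python
import math

Y_MEASURED_FT = 50.0       # distance from the plate where vy0 and vz0 are reported
Y_PLATE_FT = 17.0 / 12.0   # front edge of home plate


def vertical_approach_angle(vy0, vz0, ay, az):
    """Approximate VAA in degrees from initial velocity and constant acceleration.

    vy0 is negative (toward the plate); a more negative result is a steeper pitch.
    """
    # Velocity toward the plate when the ball reaches the front of the plate.
    vy_f = -math.sqrt(vy0 ** 2 - 2.0 * ay * (Y_MEASURED_FT - Y_PLATE_FT))
    flight_time = (vy_f - vy0) / ay
    # Vertical velocity at the plate after accelerating for the whole flight.
    vz_f = vz0 + az * flight_time
    return -math.degrees(math.atan(vz_f / vy_f))


# Invented fastball: same release velocity, two different vertical accelerations.
vaa_base = vertical_approach_angle(vy0=-135.0, vz0=-5.0, ay=28.0, az=-15.0)
vaa_more_ride = vertical_approach_angle(vy0=-135.0, vz0=-5.0, ay=28.0, az=-10.0)
print(f"baseline VAA:  {vaa_base:.2f} degrees")
print(f"more ride VAA: {vaa_more_ride:.2f} degrees")
```

Since everything on the right-hand side is tied to velocity, release, and movement, a flexible model that already has those inputs can in principle learn whatever VAA would tell it, which is the argument for testing the extra feature rather than assuming it helps.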
Starting point is 00:18:50 We do have some derived stats in there. Like, we have spin efficiency in there. And that's actually a stat that's taken out of context and made into something new and put in. Yeah, it's like a feature constructed from the other features. And so there's definitely some redundant representation there too, because like, in order to get
Starting point is 00:19:10 certain types of shape off of a certain arm, I think you'd have to have an inefficient release. Like, a tight four-seam, or a four-seam with a little bit of cut, is gonna be a little inefficient. And so yeah, that information is kind of there, redundantly represented. Because we have spin in there and we have movement in there. So, ergo, on some level, we have spin efficiency in there.
Starting point is 00:19:29 Exactly. So maybe you don't need it. But the other thing is, generally these sorts of things don't really move the needle at all. I think that the things that really move the needle, maybe this is something you will bring up later, but the things that kind of move the needle are bigger than that. It's
Starting point is 00:19:51 not just throw this extra feature in there. It has, I think, more to do with what you do with the values after you've created them. Lastly, on the subject of Stuff Plus and its future, I guess I'm wondering if we're waiting on any metrics that would improve them. I think the reason we have spin efficiency there, on some level, is that it's possible that lower spin efficiency pitches move differently over the course of ball flight than high spin efficiency, or at least high Magnus. You know what I mean? Like, high spin efficiency means that you've got kind of a Magnus force on it, which is usually sort of up or down. And that's what makes curveballs go down and four-seamers go up. Generally, I know.
Starting point is 00:20:41 Yeah, parallel to the direction of movement of the flying ball. Yeah. But when you become less spin efficient, you have some spin that's not turning into movement. But over the course of the flight of the ball, the trajectory of the ball might catch some of that heretofore useless spin. Like, you know what I mean?
Starting point is 00:21:06 Like it starts to catch some of that spin. And in other words, gyro cutters, gyrosliders, some of them might have late movement in a way that may not be captured by the model that just sort of sees broad movement, you know, like movement over time. Like in theory, the movement profile, it's like, let's say you have a gyro ball,
Starting point is 00:21:28 perfect gyro ball, right? Like no vertical. A perfect gyro ball means none of the spin is turning into movement, and it's moving just like a bullet through space, basically. Exactly. And so on a movement plot, it would be zero, zero. It'd be right at the origin, right?
Starting point is 00:21:42 Right in the center. So the model, it could learn a little bit about, you know, what performance on that shape sort of looks like. Because that ball by definition has zero spin efficiency. So you still might not need the spin efficiency value in there. But I do think, separately, something that I had been thinking about a bit, and I posted something about this. It's not new or anything at all, but I was just curious about how spin efficiency is actually
Starting point is 00:22:14 like a dynamic thing over the flight of a ball. If you imagine, just using the classic example of the gyro slider, the spin vector is going actually in the direction of the pitch. It's sort of what the definition of perfectly inefficient spin is, right? So it's like this. And so they're both moving like this and the spin vector stays like this, but the ball starts to fall down, right? And so it becomes kind of spin efficient and then actually gets some Magnus force moving it at the end of its flight. How much?
Starting point is 00:22:46 And that may be different based on release characteristics and release angles, velocity. Yeah, so how hard you throw it changes how much time it has to fall. And then also if you throw it more up, it will also have more time to fall. Yeah, extension matters, velocity matters. Less velocity means more time to change angle.
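A very rough way to put a number on the gyro-ball argument above: if the spin axis stays fixed in space while gravity tilts the velocity vector downward, the angle between the two grows over the flight, and the spin-efficient fraction is the sine of that angle. The toy model below ignores drag and the Magnus force itself, assumes a perfectly horizontal release with the spin axis aligned to the initial velocity, and uses an assumed 55-foot flight. It is an illustration of the geometry, not a measurement.

```python
import math

GRAVITY_FT_S2 = 32.174


def gyro_efficiency_at_plate(velo_mph, flight_distance_ft=55.0):
    """Spin efficiency picked up by a perfect gyro ball by the time it reaches the plate.

    Toy model: gravity only, horizontal release, spin axis fixed along the
    initial velocity direction.
    """
    speed_ft_s = velo_mph * 5280.0 / 3600.0
    flight_time = flight_distance_ft / speed_ft_s
    downward_speed = GRAVITY_FT_S2 * flight_time
    # Angle between the (tilted) velocity vector and the (fixed) spin axis.
    tilt = math.atan2(downward_speed, speed_ft_s)
    return math.sin(tilt)


for velo in (80, 85, 90, 95):
    eff = gyro_efficiency_at_plate(velo)
    print(f"{velo} mph gyro ball: about {eff:.1%} spin efficiency at the plate")
```

Even in this crude setup the slower pitch ends up with more efficiency at the plate than the harder one, which matches the point that more time to fall means more change in angle.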
Starting point is 00:23:12 Like, this hasn't been demonstrated to my knowledge, the actual late movement. And when I say late movement, I mean acceleration that doesn't occur until later, because it can't, because the spin efficiency isn't... I've only seen it sort of discussed in these types of ways, like in theoretical ways. And so are we waiting for... You need to build high-speed cameras and then show the deviation from the null trajectory. And yeah, that's how you show it. So that's like sort of a research-based idea. And it's out there for anybody who has the capabilities. It's not super easy. But there are some labs out there. I mean,
Starting point is 00:23:54 Smith is listening. I'm sure Barton Smith could do this. But are we also waiting for, are there possible Hawk-Eye metrics that they could give us that may come out someday that would tell us more about the movement in space? Or are they mostly just a beginning and end point movement system as well? So when it comes to the spin efficiency stuff, I don't think there's anything right now that's been captured
Starting point is 00:24:19 that's embargoed from the public. And I think the thing that people are waiting on is some of the biomechanics of pitching. The bat tracking stuff, for example, is very derivative. There's this very raw bat, handle and head, and then they have to take the velocity at a particular time, and that's what you get, velocity at a particular time. The biomechanics are like that. It's the tracking of an individual over time. But maybe there are some important features in that
Starting point is 00:24:48 that they would release publicly that would be useful in evaluating pitchers that heretofore have not existed. One thing I've been saying is, you know, changeups seem to lag for us a little bit. And one thing that would be nice is arm speed. I did have a discussion with someone, arm speed as separate from hand speed or even arm angle, something about that forearm angle as it comes through. I think that maybe hitters can spot changeups based on some biomechanical characteristics. Someone then said, well, is that stuff or is that deception? And I
Starting point is 00:25:21 was like, you know, I think at its core, it's really hard to pull deception out from stuff. I think that was a real sort of eye-opening moment for us when we saw how important release point was in features. Like, that might be quote unquote deception, but in the end, we're talking about what makes pitches good. And if deception and stuff are not really separate things, then let's not get bogged down in, like, is that stuff then, you know. I think I'd rather just make a good pitch
Starting point is 00:25:54 grader than worry about what is stuff versus deception. Yeah, it's all about making a good pitch. All right, so the bat tracking stuff came out, and you know, you mentioned that we kind of got the pre-chewed stuff. I think that makes sense because, first of all, the clients of MLBAM are pretty much broadcasters, I think, and they're writers, and there's no way that they would be served well by just releasing a bunch of dimensions and velocities and angular notations and stuff. So like, you know, for their consumers, this made the most sense.
Starting point is 00:26:33 So leaving aside how we can improve them, just given what we have now, you know, do you think you have any sort of best practices for people where they should be looking, what kinds of value they can get out of these metrics as they stand? Oh, I'm going to give a terrible answer to this, which is I have not thought about this enough. All right.
Starting point is 00:26:52 My move out of baseball was recent enough for the dust to not quite have settled. So my exposure to what is out there is like medium level. But I could say this, the things that have been released, which are essentially, to my understanding, bat velocity... Swing length. Yeah, and then this metric they're calling swing length, which is basically the cumulative distance
Starting point is 00:27:21 that the barrel of a bat traveled from time point A to time point B. Because of the way they defined it, those are actually not independent of other important things. So it's not just a, like if you were to explain this in model terms, you wouldn't say that there's just a player effect, right? There's also no effect. We haven't boiled down to things that have been so,
Starting point is 00:27:48 we haven't boiled down to like the rise. Yeah, so here's why, because bat speed is defined as, I may get this wrong, but I'm pretty sure I'm right. Bat speed is defined as on contact, the velocity of the bat in the frame immediately preceding contact. So if you make the same pitch, same location, same shape, same everything, if you make contact at different parts of your swing, which have different velocities because the
Starting point is 00:28:13 velocity changes, right, it starts at zero, at different parts of your swing it will have different velocity. So where you make contact matters. Now if you do not make contact, it's defined as the velocity in the frame where the barrel of your bat is closest to the pitch in space. The pitch is moving over time. So again, if you whiff when you're early or if you whiff when you're late, you'll have different velocity readings even if the shape of your swing was the same but it started at different times.
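The point above, that the reported bat speed depends on where in the swing the measurement frame falls, can be shown with a toy swing whose speed ramps up linearly from zero. The linear ramp, the 75 mph peak, and the 150 ms duration are all invented for illustration; real swings do not accelerate this simply.

```python
def bat_speed_at(time_ms, peak_mph=75.0, swing_ms=150.0):
    """Speed of a toy swing that accelerates linearly from 0 to peak_mph."""
    clamped = min(max(time_ms, 0.0), swing_ms)
    return peak_mph * clamped / swing_ms


# Same hypothetical swing, measured at two different moments:
# the ball is met (or missed) earlier in the swing versus at full speed.
reading_early_in_swing = bat_speed_at(120.0)
reading_at_full_speed = bat_speed_at(150.0)
print(f"measured 120 ms into the swing: {reading_early_in_swing:.1f} mph")
print(f"measured 150 ms into the swing: {reading_at_full_speed:.1f} mph")
```

The two readings differ by 15 mph even though nothing about the swing itself changed, only the moment at which the measurement was taken.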
Starting point is 00:28:51 So what I mean to say is it's not controlled in a way where you're, by looking at just the raw values, able to make super precise claims about a player's swing. Now obviously, you do the average. If you just look at average bat speed, the stratification makes sense. And it's probably really close to their bat speed controlling for some location in their swing or something. But there are important sources of variance
Starting point is 00:29:19 that people have looked at. I think that Stephen Sutton-Brown did, he posted something without even really explaining it very much. And I was like, dude, this is what you can do, which is basically build out a model that attempts to control for where it was in space. Pitches high in the zone get shorter swings, because, exactly, they have to, because to get your bat there, you have to be shorter. And pitches out in front of the plate get higher velos. Like you mentioned, higher velos and higher swing lengths, just because of where they are in your swing. They're out front, yeah. And you could probably use pull percentage and pitch height
Starting point is 00:29:57 to kind of control for a couple of those variables a little. Exactly. And so I think, you know, what we're kind of waiting on for really precise characterizations at the player level are things that control for that stuff, and then basically say, given what pitches you saw and what count they were in and all that stuff, how much faster was your swing than the average? How much slower was your swing than the average? So it's kind of controlling for those variables and then saying, over expected, how much faster were you. And the stratification is probably super similar, but this matters for players with not that many swings, right, where the context, like where it was thrown, matters very much. It reminds me a lot of sort of like Stuff Plus, where, exactly, yeah, at the very
Starting point is 00:30:38 beginning, when you have very little sample and, like, a guy just debuted or something, you can say more about him using these bat tracking metrics than you can about his results. But that is ever shifting towards results, and there will be people that have great, quote-unquote, swings. Jesus Sanchez right now has the same swing length and same bat speed as Gunnar Henderson, and you're just like, I don't think that means Jesus Sanchez is as good as Gunnar Henderson. Curious to see if that holds true once you control for some of these other things. I bet it pretty much does, but, you know, I think there's a reason for this type of precision, and
Starting point is 00:31:24 one of those reasons is if you want to track changes within a season, for example, then you really want to control the noise. Yeah, you don't want to say, this guy's pulling the ball more, oh, his swing just got longer. No, he's just pulling the ball. Or a guy just got back from injury, like, does this look different? We want to control for the stuff that has nothing to do with their injury.
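The "over expected" idea in this stretch of the conversation can be sketched in a few lines. This is a minimal illustration, not the model Stephen Sutton-Brown or anyone on the show built: it assumes each swing has already been tagged with a coarse pitch-height bucket and a contact-depth bucket, uses the league average within each bucket as the "expected" bat speed, and averages each batter's residuals. The batters, buckets, and speeds are all made up.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical swing records: (batter, pitch-height bucket, contact-depth bucket, bat speed in mph).
# Every name and number here is invented for illustration.
SWINGS = [
    ("A", "high", "deep", 68.0),
    ("A", "low", "out_front", 76.0),
    ("B", "high", "deep", 70.0),
    ("B", "high", "deep", 72.0),
    ("B", "low", "out_front", 74.0),
    ("C", "low", "out_front", 78.0),
]


def expected_speed_by_context(swings):
    """Average bat speed across all batters for each (height, depth) context."""
    by_context = defaultdict(list)
    for _, height, depth, speed in swings:
        by_context[(height, depth)].append(speed)
    return {context: mean(speeds) for context, speeds in by_context.items()}


def raw_average(swings):
    """Plain average bat speed per batter, with no control for context."""
    by_batter = defaultdict(list)
    for batter, _, _, speed in swings:
        by_batter[batter].append(speed)
    return {batter: mean(speeds) for batter, speeds in by_batter.items()}


def bat_speed_over_expected(swings):
    """Average of (actual - expected) bat speed per batter, in mph."""
    expected = expected_speed_by_context(swings)
    residuals = defaultdict(list)
    for batter, height, depth, speed in swings:
        residuals[batter].append(speed - expected[(height, depth)])
    return {batter: mean(values) for batter, values in residuals.items()}


if __name__ == "__main__":
    print(raw_average(SWINGS))
    print(bat_speed_over_expected(SWINGS))
```

In this toy data, batters A and B have the same raw average, but A is slower than expected once you account for which pitches each of them swung at. A real version would likely replace the buckets with a regression on continuous pitch location, count, and pull tendency, which is the kind of control discussed above.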
Starting point is 00:31:44 Would it have been better though, if they just picked a fixed place and time and it's hard. It's hard. I will say that this is a challenging problem because not everyone stands the same place. Yes. I mean you could. Okay. So like you could stand like this. This is not going to be useful especially for people only listening. You could stand right at the plate, like sort of with one leg in front, one leg behind basically. You'd like sort of right.
Starting point is 00:32:12 You can stand basically so that your body, your legs are pointing towards the pitcher, or they can be angled a little bit back. And so where you are in your swing may be a different location in actual space for different hitters, and so it's a tough problem. There's also the problem of, like, when you make contact, the bat has to slow down because it's now encountering the momentum of the ball.
Starting point is 00:32:36 You're gonna say that the bat speed is higher on all the whiffs of people. That's like a weird thing to say. Yeah, and then you get some funky correlations, and so it's just, you know, it's a complicated problem. I totally appreciate why the data are represented the way they are. I think, as the consumer, you just want to be aware of those things and be on the lookout for some work that can help clean it up a little bit. One thing that was cool was that at the beginning, Kyle Bland did, um, swing length plus, uh, swing speed into a swing acceleration.
Starting point is 00:33:07 And that's already, I like that. Now, if you can take swing acceleration and then account for pitch location, maybe pitch speed, but definitely pitch location and pull rate, something like that. I think you could maybe get a sense of acceleration over average, like you're saying, or acceleration over some sort of baseline.
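One way to turn swing length and bat speed into a single acceleration number is the constant-acceleration formula from basic kinematics, a = v^2 / (2d). The sketch below assumes the bat starts from rest and accelerates uniformly over the whole swing length, which is a simplification; it is not necessarily the formula Kyle Bland used, and the function name and inputs are hypothetical.

```python
MPH_TO_FTPS = 5280 / 3600  # feet per second in one mile per hour


def swing_acceleration(bat_speed_mph, swing_length_ft):
    """Average acceleration in ft/s^2, assuming constant acceleration from rest.

    Uses v^2 = 2 * a * d, so a = v^2 / (2 * d).
    """
    if swing_length_ft <= 0:
        raise ValueError("swing length must be positive")
    speed_ftps = bat_speed_mph * MPH_TO_FTPS
    return speed_ftps ** 2 / (2 * swing_length_ft)


if __name__ == "__main__":
    # Same bat speed, different swing lengths: the shorter swing implies
    # more acceleration under this assumption.
    print(round(swing_acceleration(75.0, 7.5), 1))
    print(round(swing_acceleration(75.0, 6.5), 1))
```

An "acceleration over average" version would subtract a baseline for the pitch location and pull tendency from this number, in the same spirit as the bat-speed-over-expected idea discussed earlier.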
Starting point is 00:33:29 I also know that down the line, there are plans to put into production, like, contact point or the vertical bat angle, which is basically the angle the bat is moving, like the barrel of the bat is moving, with respect to up, down, stuff like that. But even that is gonna change over the, right, like a bat kind of goes down and up. So the vertical bat angle is going to change too.
Starting point is 00:33:51 So where you do that measurement matters. But as these new metrics kind of trickle out, I think the community is strong and we'll figure out how to squeeze the most information out of what's there. It would be nice to see a contact point in the future. That might be the easiest way to help us along. But it's still exciting. It's still a fun new suite of metrics. And I know, from what I know within organizations, that this is a place where there's still disagreement in organizations. I know organizations that have Bat Path grades and half the people there are like, I don't know if it's any good.
Starting point is 00:34:33 There's a guy, D.K. Willardson, who wrote a book, Quantitative Hitting, about how VBA is everything. You can read that, but I would read it with a little bit of suspicion because it's seized on one metric. Anytime you, I mean, I know I'm a Stuff Plus guy, but like anytime you seize on one metric and you think that's, you know, and you don't seem to really pick at it and find any flaws in it, then I get a little suspicious of you, you know, I don't think VBA is the answer to everything hitting.
Starting point is 00:35:09 You know, and he's- No, there are point-forward things there. I think, you know, I really liked that book, but yeah, the conclusions are very different from the methodology, you know, parts to it. And the methodology is really cool because it like parameterizes the swing or characterizes the swing in a way
Starting point is 00:35:27 that you could numerically represent. And how important this stuff is, I think that's up for debate, but how the characterizations are suggested in terms of lift and loft and stuff like that. I think that it's some intriguing things that will be brought back, like things in that book that were mentioned,
Starting point is 00:35:48 I'm sure that's language that will enter the lexicon. It's true, I mean, we're now entering a space where we're gonna have to talk about things. I talked to a hitting coach and he said his favorite stat in the lexicon that was available to him was one that described how low the bat gets in the back. And it's basically almost like a barrel dumping stat or something.
Starting point is 00:36:11 It's about, like, the lowness. It's a little bit about, it's kind of an angle. It's kind of a vertical bat angle-like stat, but it was called like SBA or something. I don't know, but it was about how low it gets in the back. And I think, yeah, I think some of what Willardson did helps us talk about stuff like that. It helps us talk about how, what, you know
Starting point is 00:36:33 what are the words we should use? What are the ways we should even describe this? I, yeah, I do recommend the book. I just, I just don't think that VBA is necessarily everything. And we don't even have that. We haven't even started arguing about that. Yeah, that's coming down the pipe. Yeah. One thing people always ask me is, you know, how do people get into baseball? And you know, you've been talking about this a little bit back and forth about the different skills that you had
Starting point is 00:37:03 but one way that I would like to think about is putting together the two different themes in this podcast so far of, you know, where different stats are going and how to get into baseball. You know, where do you think there are sort of easy inroads for young analysts? What and what types of approach, even if it comes down to like, even saying like, is it Python or R? You know, like, you know, do you think they should be focusing more on neural networks or machine learning? You know, we've given them some hints,
Starting point is 00:37:32 but where do you think they could take numbers that are out there right now and make an impact and sort of make a name for themselves? So, well, you know, I just have my story. And so maybe I should just sort of tell that story briefly and then get into that. So I was just genuinely interested in these baseball questions that were floating around at the time. Things like trying to understand which players are potentially going to be impacted by a deadened ball, like which hitters will see the biggest drop.
Starting point is 00:38:08 It's not like it's the same effect for everyone. The effect will be concentrated in certain types of players. Things like how do you model a pitch based on physical characteristics alone? And there's a lot of really great work out there. I just kind of read it. Some cases I replicated, some cases I tried to, like, expand on it. And so I just put my stuff out there. By the way, if I can interject really quickly, I think reading is how I got to where I am in my job.
Starting point is 00:38:35 If anybody cares about that, you know, I clicked on, printed out every single link that Rob Neyer put in his linkathons back in the day, and read every single one of those, and would write pieces that just linked to them as well. So reading and linking and just sort of becoming part of the community, I think, is definitely part of the answer. Yeah, and you know, like so often the wheel is reinvented and that's fine, you know, for like gaining your own
Starting point is 00:38:58 understanding of things. I'm sure I've reinvented the wheel myself many times, not to say that what I did was as important as the wheel. More like reinventing the tiny little screw that fits in this little piece of something. But, you know, just kind of, just read stuff, man. You know, your interests are self-guiding, right? Like they will take you where you want to go. It's pretty clear where the pockets are. So anyway, I was trying to make this short,
Starting point is 00:39:26 but of course I go way too long. Would you leave R behind? Oh, like the R and Python thing? Yeah. No. No? People are crazy. I don't know.
Starting point is 00:39:37 Use what language is most suitable for you. So if you don't have familiarity with R or Python necessarily, R is actually a lower entry point for understanding, a little bit easier to use. It's not as general purpose. It's not as flexible. So there are some limitations, but once you understand enough about R you could make the leap to Python. Would there be somebody in the Astros front office that just did it in R? Would there be someone in a theoretical front office
Starting point is 00:40:05 that would just work in R and then be fine? Absolutely. Yeah. Absolutely. And so I don't think that puts like a, yeah, it may for some organizations put a hard constraint on you. Maybe I just answered the question right there.
Starting point is 00:40:16 But yeah, I think between those two things, between R and Python, and there are definitely other languages, both are actually great. R is a little easier. So if you want a lower entry point that's a little easier, R may be great for you. There's some stuff that's available to you. Bill Petti has a baseball with R package.
Starting point is 00:40:33 Yeah, it's called the baseballr package, baseball and then the letter R. Yeah, I mean, it's actually like the linchpin for so much baseball research, particularly because there are scraping functions where you can load all of the data. Pretty much. Yeah, I believe we use it in part of ours.
Starting point is 00:40:52 But there's also Baseball with R, I believe, a book. Let me see, it's Max Marchi, Jim Albert. I think Analyzing Baseball Data with R, I believe that's the Max Marchi book. Yeah. And there are a few authors on that one. There's a version of that, version three, that's available for free online and it is awesome. Nothing's going to teach you everything.
Starting point is 00:41:15 If you learned everything, then you would have no reason to do any research. But it equips you with, I think, the programming and the plotting and statistical analysis tools that you need to pursue your own stuff. OK, so let's end with an interesting question that I was asked by Niv Shah, the founder of Ottoneu. He asked me if I could put the team together, me in the public space, just using publicly available data and whatever we could get off Savant
Starting point is 00:41:46 and whatever we could put together, how would my team do? I wonder if we can talk about that without getting into any specifics of any organization we were just at. But just to think about the quality of stats you saw inside the game versus the quality of the work outside the game. And then also, I guess what I would say
Starting point is 00:42:05 is that what somebody could bring to this theoretical team that maybe not every team is good at is sort of good business tactics in terms of a good organizational workflow, a good org chart, good communication, using benchmarks to inform processes, like good process. I know enough about different teams that I can say not every team has good process
Starting point is 00:42:31 and good communication. So how do you think this team of public analysts armed with Baseball Savant could do in baseball? I think you'd do pretty well. I think you'd really do pretty well. I think so much of what makes a team work or not is, as you say, just good communication. And I don't think that means imposing your view and doing that in a clear way. Communication should be a two-way street where you're learning from the people you're talking with,
Starting point is 00:43:06 you gain their trust, and vice versa. So I think at all levels, you establish credibility by being a good person and by listening and making products that people believe because over time you've shown that they're effective and you've shown that you take input seriously. And so if the organization had really, really, really good quantitative tools, understood the game analytically super, super, super well, and then they impose their ideas in
Starting point is 00:43:37 a draconian way where you don't gain the trust of the coaches, where you don't interact with them, first of all, you're going to miss important stuff because coaches are aware of tons of things that are not like available in the data. And you'll get a lot of pushback where your stuff is not actually implemented because people are just rolling their eyes at you. Exactly, yeah. And so there you go.
Starting point is 00:43:56 So you could say you try to implement it, but then it just won't be. Yeah. So the data matters, it totally matters. And obviously it matters. Yeah. But I think that you have to be The data matters, it totally matters. Obviously, it matters. But I think that you have to be not just sensitive to the recipients, but also really take their input.
Starting point is 00:44:15 I'd say you could do pretty well because there are probably some teams out there that just don't have that communication and that's your competition. Your performance is just relative to your competition. There's also a hidden advantage actually in this theoretical framework that we've created, which is that I'd be building from zero from scratch. And so that's actually, there's some advantage to that where what happens with every team is
Starting point is 00:44:41 they're a little bit of an amalgamation of all the different GMs they've ever had. You know, there are still little ghosts in the framework of, you know, like, you know, Bobby Cox is still in Atlanta somewhere in that system. You know, they're still saying things that, uh, that people who have come before, like, I forget, I can't think of the pitching coach that was the low-and-away guy that I grew up with in Atlanta. But anyway, you know, there are still these sort of vestiges left over. And in our new one, we could just create a new system from scratch
Starting point is 00:45:14 and maybe theoretically, at least, you know, have something that worked better for our modern time than some of the old systems that are still in place. Yeah, the institutional inertia is gone. And so yeah, that's like both a good and a bad thing, probably. But we would lose some institutional knowledge that other organizations had, you know, but yeah, I think what I mean to say, my, you know, kind of curt answer, or terse, whatever,
Starting point is 00:45:45 is that there's, first of all, that's a very rich information resource. And there's a lot of like coachable things on there. There's a lot of decision-making related stats in there, obviously, and you could probably go really far, especially if there's good cohesion, there's good communication with people. But, you know, it's the game of inches.
Starting point is 00:46:02 And often, the reason you have R&D is because there's competitive advantage in having your own information. And when I say information, I don't mean your own data, but your own data processing kind of mechanisms. And on some level, like it would be hard for us to make trades and win trades
Starting point is 00:46:19 because everybody would know how we were evaluating. This is something that's true for, like, fantasy people. It's like, if we all have the same access to the same data and we're all trying to make trades, I find the trade making to be harder than ever these days. Right, like maybe it'd be hard to win first place or whatever in your league, but you'd probably be better, like, if you just stuck to your guns and followed a sound methodology. I mean, my guess is obviously sometimes it can be pretty bad, but on average over however many seasons you play, you're probably pretty good. Yeah, I think we could do something a little bit like what the A's do because, and I think about this too because I feel
Starting point is 00:46:58 like people know exactly what the A's want now in prospects. Right. Oh, you want somebody who's like between triple A and the majors, you know, who's very projectable. Like I could, you know, I could, I could probably call up Steamer, look through my Steamer projected prospects in my organization and figure out pretty quickly, you know, who you want and who I don't necessarily care.
Starting point is 00:47:21 Like it's fine if I part with them, you know. My personal theory was that the Brewers traded for Esteury Ruiz to trade him to the A's. Oh, that's funny. I buy that. But yeah, after a while, it would be hard to win trades because people would know exactly what you're looking at. They could just call up your system and just be like,
Starting point is 00:47:44 let me go on Savant. There should be some variance in, when I say stick to your process, your process does not have to be static. Like your process could be like 10% of the time I do a high variance thing or something like that. You know, like it could be defined stochastically. And just because we're starting with the Savant data
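The point about a process being "defined stochastically" can be made concrete with a small sketch. This is only an illustration of the idea as stated: the 10% rate comes from the conversation, while the strategy labels, function name, and seed are hypothetical.

```python
import random


def choose_strategy(rng, high_variance_rate=0.10):
    """Pick the high-variance play with the given probability, else the baseline play."""
    return "high_variance" if rng.random() < high_variance_rate else "baseline"


def simulate(n_decisions=10_000, seed=2024):
    """Return the share of decisions that came out as the high-variance play."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    decisions = [choose_strategy(rng) for _ in range(n_decisions)]
    return decisions.count("high_variance") / n_decisions


if __name__ == "__main__":
    print(f"high-variance share: {simulate():.3f}")
```

The process itself is fixed and auditable (the rule and the rate never change), but any single decision is harder for a trade partner to predict, which is the advantage being described.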
Starting point is 00:47:59 doesn't mean we haven't post-processed that into our own stuff. Exactly. It's all how you interpret the stuff. It's been fascinating, Max. It's been great to talk to you. We need to do this again. Stuff Plus needs to have a meeting. We're going to have an organizational meeting. Yes, smash the like button, subscribe to our channel. Thank you to Max Bay for coming and sitting with us and talking to us about all things analytics and Stuff Plus and the future of baseball. Thanks for coming on. Thanks, Eno.
Starting point is 00:48:26 And we'll be back with Rates and Barrels on Tuesday. Thanks for listening.
