Rates & Barrels - Stuff+, the latest on bat tracking stats, and making your mark, with Max Bay
Episode Date: May 20, 2024
Eno is joined by co-parent of Stuff+ and former member of the Houston Astros' analytics department Max Bay. The guys dive into the state of analytics in the sport.
Rundown:
4:09 - The past, present and future of Stuff+
29:45 - How should we be using the new bat tracking data
41:44 - How can people who want to get into baseball analytics for a team stand out
Follow Eno on Twitter: @enosarris
Follow Trevor on Twitter: @IamTrevorMay
Follow DVR on Twitter: @DerekVanRiper
e-mail: ratesandbarrels@gmail.com
Join our Discord: https://discord.gg/FyBa9f3wFe
Join us on Fridays at 1p ET/10a PT for our livestream episodes!
Subscribe to The Athletic: theathletic.com/ratesandbarrels
Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
Hello and welcome to a special edition of Rates and Barrels.
Today we have co-parent of stuff plus Max Bay on the line.
Max is back with us in the public sphere.
Say hello, Max, tell us a little bit about yourself.
Hi everybody.
Hi, you know, lifelong baseball fan.
I did a PhD in neuroscience from 2015 to 2022,
which tells you something about how terrible a PhD can be.
But while I was there,
doing a lot of high dimensional statistics
and modeling, machine learning, et cetera.
And my baseball enthusiasm kind of bled over into
my analytical work and vice versa.
And kind of just, you know, a hobby grew out of that. It's like a tale as old as time at this point.
So, you know, a lot of the public work that was out there, I, I read.
Things that I found particularly interesting, that I wanted to understand thoroughly, I tried to replicate. And yeah, that sort of just blossomed into a more, I guess,
serious pursuit of answering baseball questions and coming up with baseball questions.
That's pretty much it. So yeah, that's like the back story. And one day, somebody from a team, from the Astros, reached out to me, and we had a really compelling conversation about, you know, what it'd be like working for a team, what it would be like, you know, in the organization, et cetera. And so I did that for a little while, really loved it.
But for a number of reasons, I decided to leave, and now I'm sort of in a non-baseball role for the first time in a while. So the dust is kind of settling, and yeah, that's my life story.
What were the particular skills that,
that stood out for the organization when they were looking at what you could do?
What was your best foot forward, in other words?
For me personally, I think this story will be different for just about everyone.
But for me, it was sort of the marriage of really, really loving baseball,
being very curious about what makes it tick.
What dials can you turn?
What knobs?
Wait, dials, can you?
Yeah, you turn dials.
What can you manipulate to change your chances?
And how do you evaluate things in the first place?
And so, being curious about all that stuff, and then, you know, having a programming background and technical expertise, or at least technical familiarity, that was sufficient to start to ask those questions with computers, because you just have so much data and you really have to process these things on a machine, obviously.
Yeah, and I would say the other thing is having pretty robust machine learning exposure also.
So these tools that can take large, large quantities
of data with like very complicated interactions
and make predictions about the future, given input and all that.
So all that stuff, I think, was the thing that stood out.
Our baby stuff plus is out in the world now,
and there's a lot of different versions of it now.
I don't know, have you stayed up on that?
Could you sort of quickly analyze some of the differences in the models in terms of structure, and maybe just describe to me how much we're fighting over here? Are we fighting over tiny little decimal points, or do you think the models are distinctly different? I mean, the teams have models like this too. So you can almost expand it to a discussion of how different stuff models are across the league, if you're aware of that, but you also, you know...
Well, I've only worked for one team.
Yeah, right.
But from what you've seen in the public space,
like how different are these models
and what are some of the structural differences?
Yeah, so broadly, the models are all really similar. I would say that the two different pitch grader implementations, at a very high level, tend to be either regression or classification models.
So the regression model takes all the inputs, you know, how hard you threw the ball, what its movement was, maybe things relative to other pitches in your arsenal, where you threw it from, et cetera. And the prediction for a regression model maps that into the expected run value dimension, right? Like this one number from, you know, basically negative infinity to positive infinity. But realistically, the numbers per pitch thrown are, I actually don't really know, it depends on the count, but somewhere between like negative 0.5 and 0.5, say, something like that. When you do that, there's basically a direct mapping of the pitch values onto run value. The classification models are a little bit different; they're super, super similar in their goal, but how they get there is kind of three steps. So there's the inputs. And then the inputs, instead of going straight to run value,
they map onto outcomes.
They give you a probability of a swing, or called strike,
or however you structure it, probability of a whiff,
probability of putting it in play.
And then there's a run value assigned to each event.
Then the idea is basically to just get
the weighted average of the run values
across where the weights are the probabilities of the event.
Probably those seem to be the two kinds
of pitch graders that are out there.
You get some additional functionality with the classifiers because you can look at things like expected whiff. The probabilities are in there for the individual events.
You're not just limited to the run value metric, right?
Like you can look at the stuff, so to speak, that's on the way to the actual run value.
But yeah.
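For illustration, here is a minimal sketch of the classification-style grader described above. The outcome probabilities and run values are made-up placeholders, not the actual Stuff+ internals; the point is just the probability-weighted average.

```python
# Minimal sketch of a classification-style pitch grader. The probabilities and
# run values below are hypothetical placeholders, not real model output.

# Hypothetical model output for one pitch: probability of each outcome.
outcome_probs = {
    "whiff": 0.14,
    "called_strike": 0.10,
    "foul": 0.18,
    "in_play": 0.20,
    "ball": 0.38,
}

# Hypothetical run values per outcome (negative = good for the pitcher).
run_values = {
    "whiff": -0.11,
    "called_strike": -0.07,
    "foul": -0.04,
    "in_play": 0.05,
    "ball": 0.06,
}

# Expected run value = probability-weighted average of the outcome run values.
expected_rv = sum(outcome_probs[o] * run_values[o] for o in outcome_probs)
print(f"Expected run value for this pitch: {expected_rv:+.4f}")
```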
And then there's obviously like the difference between a pitching plus
type model and a stuff type model, right?
Where, generally, how stuff is done is you just take away the location features and maybe the count features.
And it's just saying like broadly over the like
marginal distribution of these things,
how would this pitch perform?
So those are the basic frameworks.
The differences between the different models are really hard to say, because these models are complicated in ways that you wouldn't really understand totally, you know, unless you did this type of modeling. But what I can say is there are things like, for extreme gradient boosting, which is a very common machine learning framework that's used for these types of models.
You can put limits on the number of trees that it will make.
You can make it learn faster or slower.
Because this model is basically iteratively improving, you can run it over some truncated amount of time, or you can run it forever. So that's just within extreme gradient boosting. But then there's a very similar gradient boosting machine type framework called CatBoost.
And there are neural nets. There are all these different ways to do it.
And what I'd say is that those things I described
that you can tune for XGB for the extreme gradient boosting,
like there are similar but slightly different versions
of that for each one of these models.
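To make those tuning knobs concrete, here is a small, hypothetical setup using the xgboost Python package. The fake features, fake target, and specific hyperparameter values are illustrative only, not anyone's production settings.

```python
# Illustrative XGBoost setup with the kinds of knobs mentioned above:
# tree count, learning rate, and early stopping. Features and data are fake.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
# Fake pitch features: velo, vertical break, horizontal break, release x/y/z, extension.
X = rng.normal(size=(5000, 7))
y = rng.normal(scale=0.1, size=5000)  # stand-in for per-pitch run value

X_train, X_val = X[:4000], X[4000:]
y_train, y_val = y[:4000], y[4000:]

model = xgb.XGBRegressor(
    n_estimators=2000,          # cap on the number of trees it will build
    learning_rate=0.03,         # learn faster or slower
    max_depth=6,                # how complicated each individual tree can get
    early_stopping_rounds=50,   # stop iterating once validation error stalls
)
model.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)
print("Best iteration:", model.best_iteration)
```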
And the other thing is there are different frameworks people have. Sometimes people have a
separate model for each component of what's happening. So there'll be a swing model and then
a called strike model. If they did swing, then there'll be a different model for, basically, did they whiff or make contact. And then a different model for foul or put in play. And then, you know, if you put it in play, there's another model.
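A rough sketch of how that chain of conditional models composes into overall outcome probabilities. The individual probabilities below are hypothetical stand-ins for what each fitted classifier would predict on a single pitch.

```python
# Sketch of chaining conditional models: swing -> contact (vs. whiff) -> in play (vs. foul).
# The probabilities are hypothetical stand-ins for each model's prediction on one pitch;
# a real system would call predict_proba on separately fitted classifiers.
p_swing = 0.48                  # from the swing model
p_contact_given_swing = 0.78    # from the whiff/contact model
p_inplay_given_contact = 0.55   # from the foul/in-play model

p_whiff = p_swing * (1 - p_contact_given_swing)
p_foul = p_swing * p_contact_given_swing * (1 - p_inplay_given_contact)
p_inplay = p_swing * p_contact_given_swing * p_inplay_given_contact
p_take = 1 - p_swing            # would be split further into ball / called strike

print(f"P(whiff)={p_whiff:.3f}  P(foul)={p_foul:.3f}  "
      f"P(in play)={p_inplay:.3f}  P(take)={p_take:.3f}")
```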
Does that run up on any sort of computing issues,
like, you know, computing capabilities?
Does it start to get?
I would say that these models, and computers in particular, are so advanced now that, not really; a person on a laptop can run this sort of stuff pretty efficiently. I mean, the original Stuff+ I put together on my 2015 MacBook, and, you know, I would run the entire season in like two minutes, so it didn't take very long.
Yeah, we haven't run into anything yet. There have been some decisions that we've made that are not
necessarily the kinds of decisions you're talking about.
They're a little bit more almost like philosophical questions. Like, we didn't put platoon splits in Stuff+ at first, partially because we were thinking we were going to put it in Pitching+, right?
We thought that evaluating the platoon split of a pitch
meant more when you were evaluating the entire arsenal
than when maybe you were evaluating just the pitch itself.
Was that kind of our thinking, I think?
Yeah, and with all this stuff, there are assumptions.
And so the assumption really early on was, you know, I was curious how a pitch would perform in general, but that doesn't mean that it will necessarily be deployed the same amount against each platoon.
What the model is effectively learning when you train it
is what the usage is.
And so I felt like it would just kind of figure out the platoon stuff.
Because this is a pitch that doesn't get used against that side, right?
Yeah, so it's not in the training set.
So it doesn't learn as much about that.
That's ultimately, yeah, kind of a quasi-philosophical thing, a quasi-quantifiable thing, because you could just make the model with the platoons and see if it predicts better.
Well, that's what we ended up doing. While you were out, we ended up putting it back into Stuff+ because it improved the model. And I think one way that you see it playing out is, I hate to read too much into this, but you might have had a bunch of models out there that did not have platoon splits in them that were spitting out how great sweepers were.
We had some organizations that just basically taught sweepers
to every pitcher that came up.
And now we have a fair amount of pitchers out in the big leagues realizing they can't throw their sweeper to the opposite hand, and they need to figure something else out on top of it. So as we go and bumble along with Stuff+, so do teams. And I wonder what the future is for us in these terms.
I have a little fun thing here that you've put together, where you're looking at acceleration. And you're gonna have to explain acceleration to me here. But for those who are watching on YouTube, you can see the different acceleration in Luis Castillo's changeup and four-seam.
And so the red is just like the league average population. The yellow is what you would expect Castillo's pitch to be, just based on release characteristics.
Just based on release point.
Just release point. You're not even doing release angles or anything. Just release point, XYZ, where it was released in space.
Yeah. And then green is observed, which is
something I've been fascinated with for a long time. One question is, do you think this is in Stuff+? And if it is, what sort of pathway can this lead us down that might help us improve Stuff+?
To answer the question about whether this is in Stuff+, the short answer is kind of.
So Stuff+ has a number of predictor variables. These are the features of a pitch that are used to estimate its quality; its quality is estimated from the features.
And where it's released in space,
it's actually three of the features, right?
Because it's X, Y, Z where it's released in space.
And some of the most important features
in the model, actually.
Exactly, yeah.
If you hold everything else constant
and then you change where it's released,
you're gonna get different qualities.
So there's a dependency on where you release it
and the quality.
And mechanistically, I think people have a pretty good understanding of this now. If you have a really low release point but you've got a ton of ride, you have a couple of things going for you: you're no longer throwing the pitch down into the zone, it's going straighter, you aren't matching the upward angle of the bat, and so the collision is offline between the bat and the ball. And anyway, there are some mechanistic explanations for why the grader would see the performance dip or change. So when I say that it's
kind of in the model, what I mean is that we tell the model
that it was released from this point, and it's clearly
sensitive to those features, right. And this is, I think, true for basically all the public graders.
For me, the interesting thing
and the reason I do this work in my free time is so that I understand it.
It's not just to build a predictive model.
If I had a predictive model that perfectly described baseball, I'd be proud of it. It'd be interesting. But I would not be satisfied in my understanding of baseball.
Machine now understands baseball.
Actually, that's one thing that I would like to do more with Stuff+: you know, we've got the Shiny app that shows some of the interactions, to kind of use it to have some takeaways. I think that's what you were describing here a little bit. It's frustrated me that it's a black box. It is a black box because it's machine learning, and so all these different features are interacting in all these different ways. But I can't just be like, fastball go brr. Although you can. I mean, velo is a really important piece of the model. So as we can make little things like this, we can better understand
the takeaways, to help people sort of spot it on their own in the wild, I guess.
Yeah, and I also think, you know, these models aren't perfect. They have large residuals, so they make a prediction and then there are errors off that prediction, and that error isn't just noise from sampling. That error is sticky within player, between seasons. So if a player is under-predicted, in whatever stat, you've seen this, that's usually important signal.
And so the way I guess I have been thinking about this more recently is, let's say you have a Bayesian understanding of things: you have a prior, and then you have your data, basically your observations. And then the thought is, well, how likely were these observations given my prior? And then you have your posterior, which is your new understanding of things. And the way I think about the pitch graders now is, well, actually what they do is establish a really informative prior.
So instead of the prior being, this is an average pitcher,
it'll say, well, look at their pitch features
and see how they look.
But their performance is additional information
that is useful in understanding that pitcher.
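One simple way to picture that informative-prior idea is a normal-normal update, where the stuff model supplies the prior and observed results supply the likelihood. This is a toy sketch with made-up numbers, not how any particular grader actually does it.

```python
# Toy normal-normal Bayesian update: a stuff model sets an informative prior on a
# pitcher's true run value per pitch; observed results then pull the estimate.
# All numbers are made up for illustration.
prior_mean = -0.010        # stuff model says this pitch is better than average
prior_var = 0.0004         # uncertainty in the stuff-based estimate

obs_mean = 0.005           # observed run value per pitch so far
n_pitches = 300
obs_var_per_pitch = 0.09   # per-pitch noise variance
obs_var = obs_var_per_pitch / n_pitches  # variance of the observed mean

# Precision-weighted combination of prior and observation.
post_var = 1 / (1 / prior_var + 1 / obs_var)
post_mean = post_var * (prior_mean / prior_var + obs_mean / obs_var)

print(f"Posterior: {post_mean:+.4f} run value per pitch "
      f"(prior {prior_mean:+.4f}, observed {obs_mean:+.4f})")
```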
One of the reasons I want to understand this stuff is that it's not just, oh, I want to improve Stuff+. Stuff+ or Pitching+ or whatever, they tell you something, but there are other unexplained sources of variance, and it's not all going to be explained. But the satisfying thing about doing this, and why any of us do this in our free time, is getting an understanding of it.
And so for me, part of that was understanding what it means for a pitch to have an expected
movement profile.
I think what we take for granted that it means is that it's like the angle off the arm slot, basically. And maybe that's it.
The other thing is, it's not a point. People shouldn't necessarily expect a point. There's a range that it could be, and that's the most likely range. And if you've got a really surprising pitch, it'll be off of that. And so for me, that was part of why I wanted to build this out, because you could now characterize a surprise pitch in probabilistic terms, like: this is a really unlikely shape given where it should be,
based off your slot.
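Here is a minimal sketch of one way you could quantify a "surprising" shape: fit a distribution of movement conditional on release, then score a pitch by how far its observed movement sits from that expectation. The linear conditioning on release and the Gaussian residual assumption are illustrative choices, not the actual method behind the plot.

```python
# Illustrative: score how "surprising" a pitch's movement is given its release point.
# Assumptions for the sketch: expected movement is a linear function of release (x, z),
# and the leftover variation around that expectation is roughly multivariate normal.
import numpy as np
from numpy.linalg import inv
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
release = rng.normal(size=(2000, 2))                      # fake release x, z
movement = release @ np.array([[3.0, -1.0], [2.0, 4.0]])  # fake horizontal/vertical break
movement += rng.normal(scale=2.0, size=movement.shape)

# Expected movement given release, plus the covariance of what's left over.
model = LinearRegression().fit(release, movement)
residuals = movement - model.predict(release)
cov = np.cov(residuals, rowvar=False)

def surprise(release_point, observed_movement):
    """Mahalanobis distance of observed movement from its release-conditional expectation."""
    expected = model.predict(np.atleast_2d(release_point))[0]
    diff = observed_movement - expected
    return float(np.sqrt(diff @ inv(cov) @ diff))

print(surprise([0.5, -1.2], np.array([10.0, 2.0])))  # larger = more unusual shape
```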
Yeah, I can understand that. It's a little bit like my quest for understanding better how pitches within an arsenal interact, you know.
I did try to look at some of the underperformers and overperformers, and when you do binning, you can kind of be like, oh look, you know, in these bins you find a lot of people who underperform their Stuff+ that have fewer pitches, and people who overperform their Stuff+.
And yet when we try to put, you know,
number of pitches into the model,
it doesn't make anything more predictive.
So it's something that can help you understand it better.
And, you know, from the outside, if you're trying to use the model, you can sort of adjust those shades of gray a little bit and say, hey, this player over here that we're scouting, or this player that I'm thinking about picking up, the Stuff+ isn't amazing, but he does have like five pitches, and it seems like he can command them. This type of player may not, you know, pop in this model in the same way that his usability might be out there for the big leagues or for my fantasy team.
And then at some point,
there may be something about the interaction.
What's fun about this too is that there could be
an interaction between expected shapes within an arsenal.
So then we could better understand
even maybe within the context of Stuff Plus.
But one thing that I've always struggled with with Stuff+ is whether you think we should put in more sort of derived stats. I mean, one thing that I like about Stuff+ is it's mostly just raw stats. It's like raw movement, you know, velo and stuff like that. We haven't put something like vertical approach angle in because we're hoping that, by weighting release points so importantly and knowing the shapes of the pitches, if the model thinks the VAA is good, it'll be in there, the stuff that goes into VAA.
Yeah, it's like redundant representation a little bit.
Right, so if we put VAA in on top of it, then we risk overloading that aspect. We've got VAA and its components in the model separately, you know.
Maybe it's worthwhile to get in there, you know, you just got to toy around with these
things.
And you can either do that in an objective or subjective way. Like, you could have it kind of build out the same model with different combinations of features and compare them. This is a nice thing about computers. They'll just do this stuff for you.
Right, just iterate it and see if it's worth it.
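A small sketch of that let-the-computer-iterate idea: refit the same model with and without a candidate derived feature (VAA here) and compare held-out error. The data, feature names, and scoring choice are all placeholders.

```python
# Sketch: compare the same model with different feature sets via cross-validation.
# Data and feature names are placeholders; the point is the loop, not the numbers.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 3000
features = {
    "velo": rng.normal(93, 2, n),
    "ivb": rng.normal(14, 4, n),
    "hb": rng.normal(6, 5, n),
    "release_z": rng.normal(5.8, 0.4, n),
}
# Fake "derived" feature and fake target, purely for illustration.
features["vaa"] = -4.5 - 0.05 * features["ivb"] + 0.2 * rng.normal(size=n)
y = -0.01 * features["velo"] + 0.005 * features["ivb"] + rng.normal(scale=0.5, size=n)

feature_sets = {
    "raw_only": ["velo", "ivb", "hb", "release_z"],
    "raw_plus_vaa": ["velo", "ivb", "hb", "release_z", "vaa"],
}

for name, cols in feature_sets.items():
    X = np.column_stack([features[c] for c in cols])
    scores = cross_val_score(GradientBoostingRegressor(), X, y,
                             scoring="neg_mean_squared_error", cv=5)
    print(f"{name}: mean MSE = {-scores.mean():.4f}")
```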
We do have some derived stats in there. Like, for instance, we have spin efficiency in there. And that's actually a stat that's taken from other measurements and made into something new.
Yeah, it's like a feature constructed from the other features. And so there's definitely some redundant representation there too, because, in order to get certain types of shape off of a certain arm slot, I think you'd have to have an inefficient release; like, a tight four-seam, or a four-seam with a little bit of cut, is going to be a little inefficient. And so, yeah, that information is kind of redundantly represented there.
Because we have spin in there and we have movement in there, so on some level, we have spin efficiency in there.
Exactly.
So maybe you don't need it.
But the other thing is like,
generally these sorts of things
don't really move the needle at all.
I think the things that really move the needle, maybe this is something you'll bring up later, but they're bigger than that. It's not just throw this extra feature in there. It has, I think, more to do with what you do with the values after you've created them.
Lastly, on the subject of Stuff+ and its future, I guess I'm wondering
if we're waiting on any metrics that would improve them. I think the reason we have spin efficiency in there on some level is that it's possible that lower spin efficiency pitches move differently over the course of ball flight than high spin efficiency, or at least high-Magnus, pitches. You know what I mean? High spin efficiency means that you've got kind of a Magnus force on it, which is usually sort of up or down. And that's what makes curveballs go down and four-seamers go up, generally, I know.
Yeah, perpendicular to the direction of movement of the flying ball.
Yeah.
But when you become less spin efficient, you have some spin that's not turning into movement. But over the course of the flight of the ball, the trajectory of the ball might catch some of that heretofore useless spin. Like, you know what I mean? It starts to catch some of that spin. In other words, gyro cutters, gyro sliders, some of them might have late movement in a way that may not be captured by a model that just sort of sees broad movement, you know, like movement over the whole flight.
Like in theory, the movement profile,
it's like, let's say you have a gyro ball,
perfect gyro ball, right?
Like no vertical.
A perfect gyro ball means none of the spin
is turning into movement,
and it's moving just like a bullet through space, basically.
Exactly.
And so on a movement plot, it would be zero, zero.
It'd be right at the origin, right?
Right in the center.
So the model could learn a little bit about what performance on that shape sort of looks like.
Because that ball by definition has zero spin efficiency.
So you still might not need the spin efficiency value in there.
But separately, something that I had been thinking about a bit, and I posted something about this, it's not new or anything at all, but I was just curious about it: spin efficiency is actually a dynamic thing over the flight of a ball. If you imagine, just using the classic example of the gyro slider, the spin vector is pointing in the direction of the pitch. That's sort of the definition of perfectly inefficient spin, right? And so they're both moving like this, and the spin vector stays like this, but the ball starts to fall down, right? And so it becomes kind of spin efficient, and actually gets some Magnus force moving it at the end of its flight.
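A toy geometry sketch of that point: keep the spin axis fixed along the initial velocity (a pure gyro ball), let gravity tilt the velocity vector down over the flight, and the transverse, Magnus-producing share of the spin grows late. It ignores drag and uses made-up numbers, so it's an illustration, not a trajectory model.

```python
# Toy illustration: a perfect gyro ball's "active" spin grows as gravity bends the
# trajectory away from the fixed spin axis. Drag is ignored; numbers are made up.
import numpy as np

v0 = 85 * 0.44704        # 85 mph in m/s
g = 9.81
flight_time = 0.42       # rough time to the plate, seconds

# Spin axis fixed along the initial (horizontal) velocity direction = pure gyro.
spin_axis = np.array([1.0, 0.0])   # (toward plate, vertical)

for t in np.linspace(0.0, flight_time, 5):
    velocity = np.array([v0, -g * t])             # gravity tilts the velocity down
    v_hat = velocity / np.linalg.norm(velocity)
    gyro_component = abs(spin_axis @ v_hat)       # spin along the flight direction (useless)
    active_fraction = np.sqrt(1 - gyro_component**2)  # transverse spin that can produce Magnus force
    print(f"t={t:.2f}s  active spin fraction = {active_fraction:.3f}")
```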
How much? And that may be different based on release characteristics and release angles, velocity.
Yeah, so how hard you throw it matters: less velocity means it has more time to fall. And then also if you throw it more up, it will also have more time to fall.
Yeah, extension matters, velocity matters. Less velo means more time to fall, means more time to change angle.
But to actually show it, like, this hasn't been demonstrated to my knowledge, the actual late movement, and when I say late movement, I mean acceleration that doesn't occur until later, because it can't, because the spin efficiency isn't there yet. I've only seen it sort of discussed in these types of ways, like in theoretical ways.
And so are we waiting for...
You need high-speed cameras, and then to show the deviation from the null trajectory. Yeah, that's how you show it. So that's sort of a research-based idea, and it's out there for anybody who has the capabilities. It's not super easy. But there are some labs out there.
I mean, if Barton Smith is listening, I'm sure Barton Smith could do this. But are we also waiting for, are there possible Hawkeye metrics that they could give us, that may come out someday, that would tell us more about the movement in space? Or are they mostly just a beginning-and-end-point movement system as well?
So when it comes to the spin efficiency stuff, I don't think there's anything right now that's been captured and is embargoed from the public. And I think the thing that people are waiting on is some of the biomechanics of pitching. The bat tracking stuff, for example, is very derivative. There's this very raw tracking of the bat handle and head, and then they have to take the velocity at a particular time, and that's what you get, velocity at a particular time. The biomechanics are like that. It's the tracking of an individual over time. But maybe there are some important features in that that they would release publicly that would be useful in evaluating pitchers, that heretofore have not existed.
One thing I've been saying is, you know, changeups seem to lag for us a little bit. And one thing that would be nice is arm speed. I did have a discussion with someone about arm speed as separate from hand speed, or even arm angle, something about the forearm angle as it comes through. I think that maybe hitters can spot changeups based on some biomechanical characteristics. Someone then said, well, is that stuff or is that deception? And I was like, you know, I think at its core, it's really hard to pull deception out from stuff.
I think that's what was a real sort of eye-opening moment for us, when we saw how important release point was in the features. Like, that might be, quote unquote, deception, but in the end, we're talking about what makes pitches good. And if deception and stuff are not really separate things, then let's not get bogged down in, like, is that stuff? You know, I think I'd rather just make a good pitch grader than worry about what is stuff versus deception.
Yeah, it's all about making a good pitch.
All right, so the bat tracking stuff came out, and
you know, you mentioned that we kind of got the pre-chewed stuff. I think that makes sense because, first of all, the clients of MLBAM are pretty much broadcasters, I think, and writers, and there's no way that they would be served well by just releasing a bunch of dimensions and velocities and angular notations and stuff.
So like, you know, for their consumers,
this made the most sense.
So leaving aside how we can improve them,
just given what we have now, you know,
do you think you have any sort of best practices
for people where they should be looking,
what kinds of value they can get out of these metrics as they stand?
Oh, I'm going to give a terrible answer to this, which is I have not thought about this
enough.
All right.
My move out of baseball was recent enough that the dust has not quite settled.
So my exposure to what is out there is like medium level.
But I could say this,
the things that have been released,
which are essentially to my understanding,
that velocity, swing length.
Yeah, and then this metric they're calling swing length,
which is basically the cumulative distance
that the barrel of a bat traveled
from time point A to time point B
Those are
Because of the way they defined it
actually not
independent of other important things
So it's not just a like if you were to explain this in model terms
You wouldn't say that there's just a player effect, right? There's also no effect. We haven't boiled down to things that have been so,
we haven't boiled down to like the rise.
Yeah, so here's why, because bat speed is defined as,
I may get this wrong, but I'm pretty sure I'm right.
Bat speed is defined as on contact,
the velocity of the bat in the frame
immediately preceding contact.
So if you make the same pitch, same location, same shape, same everything, if you make contact at
different parts of your swing, which have different velocities because the
velocity changes, right, it starts at zero, at different parts of your swing it will
have different velocity. So where you make contact matters. Now if you do not
make contact, it's defined as the velocity in the frame where the barrel
of your bat is closest to the pitch in space.
The pitch is moving in time, so moving over time.
So again, if you whiff, it depends: if you whiff early or if you whiff late, you'll have different velocity readings, even if the shape of your swing was the same but it started at different times.
So what I mean to say is it's not controlled in a way where you're, by looking at just
the raw values, able to make super precise claims about a player's swing.
Now obviously, you do the average.
If you just look at average bat speed,
the stratification makes sense.
And it's probably really close to their bat speed
controlling for some location in their swing or something.
But there are important sources of variance
that people have looked at.
I think that Stephen Sutton Brown did; he posted something without even really explaining it very much. And I was like, dude, this is what you can do, which is basically build out a model that attempts to control for where the pitch was in space. Pitches high in the zone get shorter swings, because exactly, they have to; to get your bat there, you have to be shorter. And pitches out in front of the plate get higher velos, like you mentioned, higher velos and higher swing lengths, just because of where they are in your swing.
They're out in front, yeah. And you'd probably use pull percentage and pitch height to kind of control for a couple of those variables a little.
Exactly.
And so I think you know what we're kind of waiting on for like really precise characterizations at the player level are things that control for that stuff.
And then basically say, given what pitches you saw, and what count they were in and all that stuff,
how much faster was your swing than the average? How much slower was your swing than the average?
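A rough sketch of that over-expected idea: regress bat speed on pitch context, then look at each hitter's average residual. The contextual features and the data here are placeholders for whatever controls you would actually use.

```python
# Sketch: bat speed over expected, controlling for pitch context.
# Fake data; the residual-by-hitter step is the point.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
n = 5000
df = pd.DataFrame({
    "hitter_id": rng.integers(0, 50, n),
    "plate_z": rng.normal(2.5, 0.7, n),        # pitch height
    "plate_x": rng.normal(0.0, 0.6, n),        # horizontal location
    "contact_depth": rng.normal(0.0, 0.5, n),  # how far out in front contact happened
})
# Fake bat speed driven by context plus a per-hitter true-talent term.
talent = rng.normal(0, 2, 50)
df["bat_speed"] = (70 - 2.0 * (df["plate_z"] - 2.5) + 1.5 * df["contact_depth"]
                   + talent[df["hitter_id"]] + rng.normal(0, 1.5, n))

context = ["plate_z", "plate_x", "contact_depth"]
model = GradientBoostingRegressor().fit(df[context], df["bat_speed"])
df["bat_speed_over_expected"] = df["bat_speed"] - model.predict(df[context])

# Average residual per hitter: positive = faster swing than the context predicts.
print(df.groupby("hitter_id")["bat_speed_over_expected"].mean()
        .sort_values(ascending=False).head())
```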
So it's kind of controlling for those variables and then saying, over expected, how much faster you are. And the stratification is probably super similar, but this matters for players with not that many swings, right, where the context, like where it was thrown, matters a lot.
It reminds me a lot of Stuff+, where, exactly, at the very beginning, when you have very little sample, like a guy just debuted or something, you can say more about him using these bat tracking metrics than you can about his results. But that is ever shifting towards results. And there will be people that have great, quote unquote, swings. Jesus Sanchez right now has the same swing length and same bat speed as Gunnar Henderson, and you're just like...
I don't think that means Jesus Sanchez is as good as Gunnar Henderson.
I'm curious to see if that holds true once you control for some of these other things. I bet it pretty much does. But, you know, I think there's a reason for this type of precision.
One of those reasons is if you want to track changes within a season, for example,
then you really want to control the noise.
Yeah, you don't want: this guy's pulling the ball more, oh, his swing just got longer. No, he's just pulling the ball. Or a guy just got back from injury, does this look different? We want to control for the stuff that has nothing to do with the injury.
Would it have been better, though, if they just picked a fixed place and time? And it's hard.
It's hard. I will say that this is a challenging problem, because not everyone stands in the same place.
Yes. I mean you could. Okay. So like you could stand like this. This is not going to be useful
especially for people only listening. You could stand right at the plate,
like sort of with one leg in front,
one leg behind basically.
You'd like sort of right.
You can stand basically so that your body,
your legs are pointing towards the pitcher
or they can be angled a little bit back.
And so where you are in your swing may be a different location in actual space for different hitters. And so it's a tough problem.
There's also the problem that when you make contact, the bat has to slow down, because it's now encountering the momentum of the ball. Are you going to say that bat speed is higher on all of people's whiffs? That's a weird thing to say.
Yeah, and then you get some funky correlations. And so it's just, you know, it's a complicated problem.
I totally appreciate why the data are represented the way they are.
I think just for, as the consumer,
you just want to be aware of those things and be on the lookout for some work
that can help clean it up a little bit.
One thing that was cool was that, right at the beginning, Kyle Bland combined swing length and swing speed into a swing acceleration. And I already like that.
Now, if you can take swing acceleration
and then account for pitch location,
maybe pitch speed, but definitely pitch location
and pull rate, something like that.
I think you could maybe get a sense of acceleration
over average, like you're saying,
or acceleration over some sort of baseline.
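I don't know Kyle Bland's exact formula, but a back-of-the-envelope version, assuming roughly constant acceleration from a standstill over the swing, is average acceleration of about bat speed squared divided by twice the swing length:

```python
# Back-of-the-envelope swing acceleration from bat speed and swing length,
# assuming constant acceleration from a standstill. This is an illustrative
# simplification, not necessarily the published metric's formula.
MPH_TO_FTPS = 1.4667  # miles per hour to feet per second

def swing_acceleration(bat_speed_mph: float, swing_length_ft: float) -> float:
    """Average acceleration in ft/s^2 under the constant-acceleration assumption."""
    v = bat_speed_mph * MPH_TO_FTPS
    return v**2 / (2 * swing_length_ft)

# Illustrative numbers in the range the new metrics report.
print(swing_acceleration(bat_speed_mph=72, swing_length_ft=7.5))
```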
I also know that down the line, there are plans to put into production things like contact point, or the vertical bat angle, which is basically the angle the barrel of the bat is moving with respect to up and down, stuff like that. But even that is going to change over the swing, right, like a bat kind of goes down and up. So the vertical bat angle is going to change too. So where you do that measurement matters.
But as as these new metrics kind of trickle out, I think the community is strong and we'll
figure out how to squeeze the most information out of what's there.
It would be nice to see a contact point in the future. That might be the easiest way to help us along. But it's still exciting. It's still a fun new suite of metrics. And I know, from what I know within organizations, that this is a place where there's still disagreement. I know organizations that have bat path grades, and half the people there are like,
I don't know if it's any good.
There's a guy, D.K. Willardson, who wrote a book about quantitative hitting and how VBA is everything. You can read that, but I would read it with a little bit of suspicion, because it seizes on one metric. Anytime you, I mean, I know I'm a Stuff+ guy, but anytime you seize on one metric and you don't seem to really pick at it and find any flaws in it, then I get a little suspicious of you. You know, I don't think VBA is the answer to everything hitting.
You know, and he's...
No, there are important things in there. I think, you know, I really liked that book, but yeah, the conclusions are very different from the methodology parts of it.
And the methodology is really cool, because it parameterizes the swing, or characterizes the swing, in a way that you can numerically represent. And how important this stuff is, I think that's up for debate, but the characterizations it suggests in terms of lift and loft and stuff like that, I think there are some intriguing things there that will be brought back. Like, things that were mentioned in that book, I'm sure that language will enter the lexicon.
It's true. I mean, we're now entering a space where we're gonna have to talk about these things. I talked to a hitting coach, and he said his favorite stat in the lexicon that was available to him was one that described how low the bat gets in the back. It's basically almost like a barrel-dumping stat or something. It's about the lowness. It's kind of an angle, like a kind of vertical bat angle stat, but it was called like SBA or something. I don't know, but it was about how low it gets in the back.
And I think, yeah, I think some of what Willardson did helps us talk about stuff like that. It helps us talk about, you know, what are the words we should use? What are the ways we should even describe this? Yeah, I do recommend the book. I just don't think that VBA is necessarily everything.
And we don't even have that yet. We haven't even started. Are you excited about that?
Yeah, that's coming down the pipe.
One thing people always ask me is, you know, how do people get into baseball? And you've been talking about this a little bit, back and forth, about the different skills that you had. But one way that I would like to think about it is putting together the two different themes of this podcast so far: where different stats are going, and how to get into baseball. Where do you think there are sort of easy inroads for young analysts? And what types of approach, even if it comes down to, like, is it Python or R? Do you think they should be focusing more on neural networks or machine learning? We've given them some hints, but where do you think they could take numbers that are out there right now and make an impact and sort of make a name for themselves?
So, you know, I just have my story, and maybe I should just sort of tell that story briefly and then get into that. I was just genuinely interested in these baseball questions that were floating around at the time. Things like trying to understand which players are potentially going to be impacted by a deadened ball, like which hitters will see the biggest drop. It's not like it's the same effect for everyone; the effect will be concentrated in certain types of players. Things like, how do you model a pitch based on physical characteristics alone? And there's a lot of really great work out there. I just kind of read it, in some cases replicated it, in some cases tried to expand on it. And so I just put my stuff out there.
By the way, if I can interject really quickly, I think reading is how I got to where I am in my job, if anybody cares about that. You know, I clicked on and printed out every single link that Rob Neyer put in his linkathons back in the day, read every single one of those, and would write pieces that just linked to them as well. So reading and linking and just sort of becoming part of the community, I think, is definitely part of the answer.
Yeah, and you know, so often the wheel is reinvented, and that's fine, for gaining your own understanding of things. I'm sure I've reinvented the wheel myself many times. Not to say that what I did was as important as the wheel; more like reinventing the tiny little screw that fits in this little piece of something. But, you know, just read stuff, man. Your interests are self-guiding, right? They will take you where you want to go. It's pretty clear where the pockets are. So anyway, I was trying to make this short, but of course I go way too long.
Would you leave R behind?
Oh, like the R and Python thing?
Yeah.
No.
No?
People are crazy.
I don't know.
Use what language is most suitable for you.
So if you don't have familiarity with R or Python necessarily, R actually has a lower entry point for understanding. It's a little bit easier to use. It's not as general purpose, it's not as flexible, so there are some limitations. But once you understand enough about R, you could make the leap to Python.
Would there be somebody in the Astros front office that just did R? Would there be someone in a theoretical front office that would just work in R and then be fine?
Absolutely.
Yeah.
Absolutely.
And so I don't think that puts, yeah, it may for some organizations put a hard constraint on you. Maybe I just answered the question right there. But yeah, I think between those two things, between R and Python, and there are definitely other languages, both are actually great. R is a little easier. So if you want a lower entry point that's a little easier, R may be great for you.
There's some stuff that's available to you. Bill Petti has a baseball R package.
Yeah, it's called the baseballr package, baseball and then the letter R.
Yeah, I mean, it's actually like the linchpin
for so much baseball research,
particularly because there are scraping functions
where you can load all of the data.
Pretty much.
Yeah, I believe we use it in part of ours.
But there's also, I believe, a book, Analyzing Baseball Data with R. Let me see, is it Max Marchi?
Jim Albert.
I think Analyzing Baseball Data with R, I believe that's the Max Marchi book.
Yeah, and there are a few authors on that one.
There's a free version of that, version three, that's available for free online and it is
awesome.
Nothing's going to teach you everything.
If you learned everything, then you would have no reason to do any research.
But it equips you with, I think, the programming and the plotting statistical analysis tools that you need to pursue
your own stuff.
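On the Python side, the pybaseball package plays a similar role to baseballr's scraping functions. A minimal sketch, assuming pybaseball is installed; the date range is arbitrary and the columns shown are standard Statcast names:

```python
# Minimal example of pulling Statcast pitch-level data in Python with pybaseball,
# roughly the Python counterpart to baseballr's scraping functions.
# Expect a large DataFrame with one row per pitch for the chosen dates.
from pybaseball import statcast

df = statcast(start_dt="2024-05-01", end_dt="2024-05-07")
print(df.shape)
print(df[["pitch_type", "release_speed", "release_pos_x", "release_pos_z"]].head())
```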
OK, so let's end with an interesting question that I was asked by Niv Shah, the founder of Ottoneu. He asked me: if I could put a team together, me in the public space, just using publicly available data, whatever we could get off Savant and whatever we could put together, how would my team do? I wonder if we can talk about that without getting into any specifics of any organization we've been at.
But just to think about the quality of stats you saw inside the game versus the quality of the work outside the game.
And then also, I guess what I would say is that what somebody could bring to this theoretical team, that not every team might be good at, is sort of good business tactics: a good organizational workflow, a good org chart, good communication, good use of benchmarks to inform processes, like, good process. I know enough about different teams that I can say not every team has good process and good communication. So how do you think this team of public analysts armed with Baseball Savant could do in baseball?
I think you'd do pretty well. I think, really, so much of what makes a team work or not is, as you say, just good communication. And I don't think that means imposing your view, even if you do it in a clear way.
Communication should be a two way street where you're learning from the people you're talking with,
you gain their trust, and vice versa.
So I think at all levels, you establish credibility
by being a good person and by listening
and making products that people believe
because over time you've shown that they're effective
and you've shown that you take input seriously.
And so if the organization had really, really, really good quantitative tools, understood
the game analytically super, super, super well, and then they impose their ideas in
a draconian way where you don't gain the trust of the coaches, where you don't interact with
them, first of all, you're going to miss important stuff because coaches are aware of tons of things
that are not like available in the data.
And you'll get a lot of pushback
where your stuff is not actually implemented
because people are just rolling their eyes at you.
Exactly, yeah.
And so there you go.
So you could say you try to implement it,
but then it just won't be.
Yeah.
So the data matters, it totally matters. And obviously it matters.
Yeah. But I think that you have to be not just sensitive to the recipients,
but also really take their input.
I'd say you could do pretty well because there are probably
some teams out there that just don't have that communication and that's your competition.
Your performance is just relative to your competition.
There's also a hidden advantage actually
in this theoretical framework that we've created,
which is that I'd be building from zero from scratch.
And so that's actually, there's some advantage to that
where what happens with every team is they're a little bit of an amalgamation of all the different GMs they've ever had. You know, there are still little ghosts in the framework; you know, Bobby Cox is still in Atlanta somewhere in that system. They're still saying things that people who came before said. I can't think of the name of the pitching coach, the low-and-away guy that I grew up with in Atlanta. But anyway, you know, there are still these sorts of vestiges left over. And in our new one, we could just create a new system from scratch, and maybe, theoretically at least, have something that worked better for our modern time than some of the old systems that are still in place.
Yeah, the institutional inertia is gone. And so, yeah, that's both a good and a bad thing, probably. We would lose some institutional knowledge that other organizations have. But yeah, I think what I mean to say, in my kind of curt answer, or terse, whatever, is that, first of all, that's a very rich information resource.
And there's a lot of like coachable things on there.
There's a lot of decision-making related stats in there,
obviously, and you could probably go really far,
especially if there's good cohesion,
there's good communication with people.
But, you know, it's a game of inches. And often the reason you have an R&D department is because there's competitive advantage in having your own information. And when I say information, I don't mean your own data, but your own data processing mechanisms.
And on some level, like it would be hard for us
to make trades and win trades
because everybody would know how we were evaluating.
This is something that's true for fantasy people. It's like, if we all have the same access to the same data and we're all trying to make trades... I find the trade-making to be harder than ever these days.
Right, like maybe it'd be hard to win first place or whatever in your league, but you'd probably be better off if you just stuck to your guns and followed a sound methodology. I mean, my guess is, obviously sometimes it can be pretty bad, but on average, over however many seasons you play, you'd probably do pretty good.
Yeah, I think we could
do something a little bit like what the A's do. And I think about this too, because I feel like people know exactly what the A's want now in prospects. Right? Oh, you want somebody who's like between Triple-A and the majors, you know, who's very projectable.
Like, I could probably call up Steamer, look through my Steamer-projected prospects in my organization, and figure out pretty quickly, you know, who you want and who I don't necessarily care about, like it's fine if I part with them, you know.
My personal theory was that the Brewers traded for Esteury Ruiz to trade him to the A's.
Oh, that's funny.
I buy that.
But yeah, after a while, it would be hard to win trades
because people would know exactly what you're looking at.
They could just call up your system and just be like,
let me go on Savant.
There should be some variance in,
when I say stick to your process,
your process does not have to be static.
Like your process could be like 10% of the time
I do a high variance thing or something like that.
You know, like it could be defined stochastically.
And just because we're starting with the Savant data
doesn't mean we haven't post processed that
into our own stuff. Exactly.
It's all how you interpret the stuff.
It's been fascinating, Max. It's been great to talk to you. We need to do this again. Stuff+ needs to have a meeting. We're going to have an organizational meeting. Yes, smash the like button, subscribe to our channel. Thank you to Max Bay for coming and sitting with us and talking to us about all things analytics and Stuff+ and the future of baseball. Thanks for coming on.
Thanks, Eno.
And we'll be back with Rates and Barrels on Tuesday.
Thanks for listening.