Effectively Wild: A FanGraphs Baseball Podcast - Effectively Wild Episode 1853: What Are the Odds?
Episode Date: May 26, 2022. Ben Lindbergh and Meg Rowley talk to Kelly Pracht, the CEO and co-founder of predictive analytics startup nVenue, which has provided the real-time probabilities displayed on this season’s MLB Network-produced Friday Night Baseball broadcasts on Apple TV+. They discuss nVenue’s origin story, its sports-betting ambitions, its 100-plus-input machine-learning model, which factors are and aren’t predictive […]
Transcript
Hello and welcome to episode 1853 of Effectively Wild, a baseball podcast from FanGraphs presented by our Patreon supporters. We will be back with a regular episode before the end of this week. But today we are going to give you something
that we've been hoping to bring you for a while. We are devoting all of this episode to the
probabilities that have been appearing on Apple's Friday Night Baseball broadcasts, which are
co-produced or produced maybe entirely by MLB Network for the entirety of this season. Every
Friday since the start of the year, there have
been two games every Friday evening that have appeared on Apple TV Plus. And as many people
have noticed, those broadcasts include real-time probabilities, which are updated from pitch to
pitch on the bottom right of the screen. So sometimes they will show a probability that
the hitter will get a hit. Sometimes that they'll strike out.
Sometimes that they will reach base, etc.
So, of course, we were interested in this from the start.
In fact, I think even before these first appeared, I may have mentioned on the podcast that they were going to appear because I read it in a press release.
And we were interested.
Our interest was piqued, I would say.
Right.
We were kind of into this idea.
Yeah.
We were into the initial concept for sure.
Yeah, I was curious to see what it would look like at least.
And so I tuned in and I still thought the concept was kind of cool.
I don't know if it's essential.
I don't know if it would enhance the enjoyment of the broadcast to everyone, but basically on board with the concept. But there were certain odds that were raising red flags to me or at least not what I would have expected to see.
Let's see. As I would put it, basically, either I fundamentally misunderstood baseball in some way or there was something weird going on with these odds.
It was really either one or the other. And I am open to being
fundamentally wrong about baseball. That is entirely possible. I've been wrong about plenty
of things. And in a way, I would welcome being wrong because it would give me an opportunity to
learn something. But I've been watching these odds. And a lot of our listeners have also been
watching the odds. And we've gotten emails and
we've gotten tweets and we've gotten people talking about them in the Discord group, in the Facebook
group. And so I just kind of wanted to figure out, well, how do these work? And it was reported by
SportTechie that a startup called nVenue has been providing these odds to Apple
and that they've raised more than $4 million from various
investors and they're interested in getting into sports betting, et cetera. So in mid-April,
I reached out to nVenue and asked, we've been getting some questions about some of these odds that are kind of confusing people, and not just the Effectively Wild audience, but baseball Twitter as a whole, I would say, and just asked if we could get some explanation or if we could talk to someone
potentially and have just kind of been corresponding ever since for several weeks with
their chief marketing officer and had kind of an informal conversation with their CEO at a certain
point and conveyed some of my confusion about these odds and
sent numerous examples about ones that I had a hard time following. Specifically, there are cases,
and we will cite some examples later on in this episode, but there are cases where the probabilities
of reaching base, let's say, or making an out, move in the direction that we would not expect
them to move within a
plate appearance. So it's not so much about the number. Sometimes it's about the number,
but also it's just about it went up when I thought it would go down or it went down when I thought
it would go up based on what we think we know about baseball, which is that if the count becomes
more favorable to the batter, if they go up 1-0, 2-0, 3-0, et cetera, that generally they have a
better chance of reaching base,
which is what we see on the whole.
And there are some published odds here that contradict that, which kind of raised my eyebrows,
I suppose.
So that's sort of what I was wondering about, asking about.
nVenue told me that they would maybe be publishing an FAQ or explainer of some sort on their
website.
I don't believe that they've done that yet. But eventually, we arranged an interview with Kelly Pracht, their CEO and co-founder, who will be joining us in just a second. Now, independently, as some of that was going on, FanGraphs writer
Ben Clemens had some of the same questions and curiosities. And I don't know if he heard us talk
about this on the podcast because we had registered some of these concerns in an April episode or two as well. But he decided to
look into this also by actually gathering the data that was published on these broadcasts and
doing a study to see how predictive it was. And so he has now published that study at Fangraphs,
and we will talk to him a little later in this episode as well
to explain some of his methodology. But first, we will talk to Kelly and we'll hear a little bit
about nVenue and how they were founded and how they ended up providing these probabilities. And
then we'll get into the model and some of the specific numbers that we had questions about.
And then we will talk to Ben. Is there anything else you think we need to explain
before we get to this? We like math. We vote yes to probabilistic thinking as a useful lens through which to understand the world, and also baseball, and we hope that a good version of that can find its way into the public forum. I don't know. That sounds so formal, Ben. It was a long interview
and there were a lot of things covered. So yeah, I don't know. I don't have anything else to say.
I think you summed it up well. Let's get to this interview, shall we? All right. Well, we are joined
now by Kelly Pracht. She is the CEO and co-founder of nVenue, the company that has been providing
the odds that you've been seeing all
season on Apple TV+'s Friday Night Baseball broadcasts. Kelly, welcome to the show.
Hey, thank you for having me. Really excited to talk numbers with you.
So to start, could you summarize nVenue's origin story? When, how, why did you co-found the company?
Yeah, no problem. So, you know, I've been leading big tech for
quite some time, my whole career, in fact, almost 20 years. And, you know, I come from this crazy
sports family growing up in West Texas in the world of Friday Night Lights, where sports was
a way of life. And as I became a technologist, I really felt a huge disconnect between how tech is used for sports for fans that
are watching. I saw a huge gap when I was leading supercomputing teams. I saw that live data could
be used so much better. That was around 2017. Probably wrote that code, wrote the first lines
of code that exact day when I had the aha moment. And here we are five years later with the nVenue
algorithms that do live prediction for real-time sports. And how did your relationship with Major
League Baseball and Apple TV Plus for these particular broadcasts start? Yes. So in 2021,
we participated in the Comcast NBC Accelerator. And it was so great because we found this product market fit for these algorithms that we created.
And we spent a lot of time with NBC and with Sky Sports and just numerous people.
And that turned into a relationship where around September of 2021, NBC put our content on air for some A's and some White Sox games because that's where they
have the regional sports network. Out of that, the MLB network caught hold of that, gave us a call
and said, wow, we've been talking to them through Comcast Sports Tech. But when they saw it on air,
they said this could be something pretty powerful. Well, we got a call maybe two weeks before the
start of the season. I think the lockout had just been resolved.
Spring training was just starting.
And we got a call from MLB Network saying, hey, there's an opportunity to put your content on air.
We did everything we could in the course of those two weeks to get ready.
And it landed on April 8th.
We were on air for the first time with Apple TV and MLB Network.
And could you tell me a little bit about your partners and who has helped you develop the model and some of their backgrounds as well?
Well, our algorithms are really homegrown.
Myself and my CTO, Mick Stearns, created this from scratch in the very, very beginning,
based on our knowledge and deep understanding of
how to do compute, how to do machine learning, and how to make things really fast. So when you
talk about partners on creation, it's all Houston grown and our own. And those are obviously useful
backgrounds to have in play here when you're developing an algorithm. Where did the baseball
expertise at the company come from? Well, that's one of the things that my co-founders and I and our CTO, we all have in
common. In fact, the entire executive team, we are, like I said earlier, we're crazy sports fans. So
even developing the algorithm, I had, you know, just love baseball, grew up with baseball, grew
up with football, basketball, we've all played sports. But while I was creating the algorithm, we had season tickets to the Astros games. And I
think I've probably spent well over 100 games in stadium with our algorithm, watching it live,
coming back, working on the code, repeating the next day. So we're just true sports fans
in every sense of the word.
So there's no one, I guess, that you've consulted with who's kind of a baseball person specifically.
I mean, I know you all have some expertise when it comes to baseball and sports, but in terms of someone who has been in the industry before or that kind of background, I don't know that that's necessary, of course, because people from outside the baseball establishment have really revolutionized the game in the past couple of decades and in many cases can pick up on things that experienced baseball people might have missed.
But I was just curious whether that has been a part of the process or even maybe working with
your broadcast partners, whether that's been a part of the process.
Oh, it's so huge that you bring that up because it only takes looking at the products that come
out of tech for sports specifically. It doesn't take us very long to pick out who knows sports
and who doesn't. And so there are sports that are not in our wheelhouse, like, for example, hockey,
just not a sport that I know super well. So as we deliver algorithms for sports that I don't know as well,
we take these methods that we've done for creating the live machine learning and apply them.
But we are bringing on board people with massive sports expertise,
people deep within the league, coaches within the league, sometimes the league itself.
So in baseball, that wasn't really necessary because, across our founders, we have every bit of expertise that's needed. But there are sports coming that we don't know as well. So it's a great call. Even when we watch data feeds come from some of
our providers, we can tell if they're American based or not. Like when they're talking about
football and they say, you know, something about this side of the pitch, it's clear they're more European-based.
So you can always tell.
And you talked about some of the early efforts with NBC and then obviously the version that people are going to be most familiar with this season are the Apple TV broadcasts.
Were there any changes or refinements to your process that were made between those initial
broadcasts and what we're seeing this season?
There's always changes and improvements.
The core algorithms remain the same.
They get better.
Deep learning and machine learning, that's its goal, is to learn from previous.
So it does get better.
I think one of the big differences from what NBC showed to what Apple showed is NBC showed a really large tombstone of
data that took up a bunch of the screen. And as sports fans, we were like, oh my gosh, that's too
much. And quite honestly, the feedback that we saw come in through Twitter was, it's too much,
it's too big. They used multiple decimal points, and it took away from the game, we thought, because it just covered too much of the action.
But in Apple TV, it's still the same algorithms, but they've taken a different approach.
They said, let's go simple.
Let's keep it clean and brief.
Now, there's a downfall to using just one data point, whereas NBC used five or six and chose at will which data points to show. The downfall is now we hear questions about, oh my goodness, well, that didn't make sense, that reach went up when a strike came in. Whereas on the NBC broadcast, you could see how the other
fields would go up and down. And I think that made for an interesting experience,
but they both have pros and cons. And were there any particular game states
or situations that you found that early versions of the algorithm struggled to fully capture,
even just in terms of, you know, it's striking you as someone who knows the sport as like,
that doesn't seem quite right. Oh, goodness. You know, I've always questioned the data from day
one. Like, how could this be? How could we pick up on this? But every time
I had a question and every time I had the question, I would go back to the core data that
was used for the machine learning algorithm, because it's all built on the same thing that
stats generate, right? It's all built on what happened with each and every pitch, like all of
the metrics, like what goes into the model.
And I would come back and say, that's why. It always, always made sense. I have yet to find
a case that doesn't make sense. There are a few that have not made sense to me,
but we can get to that. So before we get there, I did read in the Sports Business Journal that
you had recently hired a lead
betting advisor, right? And I know that you have some ambitions in that area as well. So is nVenue
currently working in that space or are you primarily involved in broadcast now and hoping
to expand in there? What do you see the long-term applications beyond the kind of implementation
that you've had on the Apple broadcast being? Well, our mission, and it's always been our mission, by the way, is to allow fans to
engage with the data. I think just putting data up is one thing. It's not exciting until you can
engage with it. By the way, when you don't agree with it, to be able to say, I don't agree with
that and to make yourself heard or bet against it. So we found out that in the betting world, this live predictive analytics, like with each pitch, it's a pretty rare thing.
Only a handful of companies are doing it.
And it makes perfect sense for us to really further this fan engagement by providing those odds for the real time bets.
Now, we also see a world where betting and watching are going to merge. So right now,
a couple of weeks ago, I was in New York and I was watching the Yankees game and I was betting
on the DraftKings app, and I could not watch and bet at the same time just due to the latencies on
the screen. As those things go away, that betting and watching in real time is going to be really
powerful and we're excited about that. So that's a natural application for nVenue.
And quite honestly, it's our number one plan.
I appreciate that this is your company.
And so you're probably not keen to pull back the curtain too much.
But given that the obvious application, both that you envision and that people might take
away from this, is them putting their actual money at stake either because they disagree with the odds that you're showing or
because they agree with them, what kind of outside validation or testing has gone on to verify these odds? I know that it sounds like your team does that on a fairly regular basis, but I don't know that, if I were sitting at home, the way that I would express my sort of belief in the accuracy
of the odds, if I thought they were wrong, would be to put money on it.
It might be to say, these don't look right to me.
So what kind of validation is going on here, given the stakes for people?
Yeah, no, it's a wonderful question.
It's actually my favorite question, talking about accuracy.
First of all, the accuracy of what a sports prediction is, is not a widely known metric
to communicate.
So even looking at stats, for example, you know, if I asked you, could a stat tell you
what's going to happen?
I think most of us would cross our eyes, pull up 40 stats, all the splits and come back
with the answer of, I don't know.
It really comes down to sports intuition. So we actually had to really dig deep and create the methods to prove regression. So we've run,
at this point in our game, four years of baseball seasons under our belt. We've run millions,
I would say probably on the order of 100 million predictions. So what do we do to
prove the accuracy? It really comes down to, did our models train? Do they pick up on the
unusualities that are happening on the field? Does it pick up when there's a streak or a slump?
That's really how you communicate that. And outside of going into a lot of data science
explanations on calibration curves and how we fit the line over the course of these millions of pitches, that would take a lot longer.
I'd be happy to go into it at some time.
But to answer your question, Meg, we study every bit of the data year over year, check that the calibrations work.
And our accuracy, we believe and we can articulately show, is much better
than any stat could provide. One more question before we move on to some more
odd specific questions. This is more on the business-y side. I know you've put some information
out there about investors and seed rounds and money that you've raised. What, if anything,
can you share about who has invested in the company or how much money you've made or your plans to expand, etc.?
Sure. Yeah, very, very public information. I'm happy to talk about it.
So after our Comcast stint, Comcast NBC accelerator stint, we did raise funds.
We raised with KB Partners, which is a sports tech VC in Chicago, as well as Corazon Capital,
which is also based in Chicago.
And the founder and lead partner of that is Sam Yagan, who has some Texas roots and roots in some of the big tech type applications.
Lots of smaller investors, but those are our two leads.
And what they did was deliver us enough capital so we could take this algorithm that we feel like we've mastered in baseball. And yes, I'm happy to get into the challenges of that. But we're taking that algorithm with those funds to go and deliver more sports, to package it into what we call micro bets, and to really, really expand our entire product and offering.
So as for the model, and don't hold back on our account because our audience is fairly stat savvy, I think,
and would be interested in whatever details you're able to offer, though I know this is obviously a proprietary model
and you can't give the whole game away here, but to the extent that you can share, tell us how the model works,
I guess, and you can go into whatever level of detail you're comfortable with.
So first of all, and y'all are baseball people, what we look at is exactly what you would think
when a batter comes up to the plate, right? What things do you think of, Ben? You think of,
where are we at in the game?
How's the batter doing so far? Is he 0 for 2? Is he 2 for 2? How's he done in the past?
What else comes to your mind? Personally, I probably wouldn't consider how the batter has done in that game. I mean, you know, I wouldn't think that that would be predictive
personally, but I would certainly look at the batter and the pitcher and the umpire and the
catcher and the ballpark and the defense and all of that. I assume that that's all part of the soup here. And I know that you've mentioned,
or it has been mentioned, 120 inputs. I don't know if that's the hard number or whether that's
grown or shrunk over time, but I was kind of curious about examples of inputs other than
some of those obvious ones, because I can think of a lot of factors that could
potentially impact the outcome of a plate appearance, but 120 is a big number. I don't
know if I could come up with a list that long myself. So I was wondering about some of the
subtler or less obvious factors that you might be taking into account.
You got it. So what comes from the field, what comes from the data stream from the live play
is literally thousands of data points, right? So trimming it down to between 100 and 120, you know, it takes a little bit of gumption to say, okay, let's add in all the things that we know about the batter, about how he's doing this year, how he's done this game. And a lot of folks do look at, you know, if a batter is 0 for 2 but he hits .320, probably thinking he might get a hit.
Now, we know that that doesn't really apply always.
It's not a given.
But it is a piece of the puzzle.
We look at the ballpark, the distances to the outfield.
We look at the weather, the temperature, where we're at in the season. We look at, is this American League versus National League?
We look at, are they in the same division? How are they ranking versus each other? Like,
is this team, you know, 20 games behind or half a game behind? We feel that that adds in there.
Now, also, a lot of times when people watch a matchup, they think about the batter only. Like, think of all these
batting stats. And we tend to kind of understand those a little bit better than we do some of the
pitching stats. But honestly, when we start to put in the pitcher, how he's doing, how he's done
the last five batters, when we put in his performance over years, weeks, and months,
then all of a sudden we see a much more accurate version of what's
going to happen. So all of that, it comes out to somewhere between 100 and 120, it depends. I think we're sitting at 110 right now, but all of those things, they just add up. And by the way,
that's already trimmed out so much. And after that, we let the machine pick up which is the most important feature.
And it varies for every single matchup, which is really interesting. We did a study on an at-bat
that didn't make sense to us a few weeks ago. It was Max Muncy batting. And we could not for the
life of us figure out why one of the factors kept such a great importance. And it
turns out in the Max Muncy situation for that particular count, the pitch count was a really
big deal. That's what the data said. And that's something that perhaps we don't always pay
attention to. Yeah. And I know that there may be some circumstances with this type of model where
you can't always necessarily explain why it's saying
something it's saying, which maybe doesn't necessarily mean it's wrong. It's possible
that it's picking up on something that is not obvious. I guess that's kind of the trade-off
when it comes to the complexity of some of the advanced stats that are being brought to bear
these days. It may not be quite as transparent, but it's also maybe a little bit hard to figure
out if it's going wrong
in some way at times, which is why I wondered, you know, when you're talking about these various
factors that seem like they might have some predictive value and then you add them to the
model, is that based on testing every step of the way so that, hey, let's see if this actually
improves the predictions, and if it does, we will keep it. And if it doesn't, we will jettison it.
Or is it just sort of we think this should matter or help and we'll try it?
So our models are per player, right?
There's a lot of players in baseball, right?
There's a lot of pitchers.
There's a lot of batters.
And when we tried to really deep learn on each player, like if it's Altuve, we're going to find these things are important, so let's jettison the rest. We really found out that our methods of machine
learning were solid enough that we didn't have to do that too often. You know, after you get,
you know, past 100 inputs, removing one does not change the scenario very often once you get them
fairly right. Now, one of the pieces, and this is kind of in the
secret sauce of what we do, but I'm happy to share this really important method that we do.
And that's, we've heard other people say, oh, we look back 10 years to understand what's going to
happen. But you know, and I know that Altuve five years ago is not the same as four years,
not the same as two years ago. Heck, this month, he's really not even the same as he was last month because he was injured.
So in our data, we segmented it out.
We do year models, month models, week models.
And we were able to find this really cool thing.
We're able to find trends, streaks, and slumps.
And there's more that you don't know, deeper than what you're seeing on Apple TV with every single outcome. We include in our API things that we want to know as sports fans, which is the why, helping us to understand. So we have little factors that tell us this is really
different than the league. It's much higher. There's a trend going on here. The pitcher over
the last few months has been giving up more home runs. That sort of thing is present in our API.
It's just not always the easiest thing to display when you have 14 seconds between pitches.
Yeah, I was curious about that because I assume there's always some risk of
overfitting, right? If you're looking at some smallish sample, some recent sample, I mean,
just to throw some numbers out there, if someone is 0 for 10, the last 10 plate appearances that
they started out 2-2 or something like that, and you're looking at the last 10 plate appearances
as predictive, then maybe the next time they come up and they start out 2-2, then you'd say, oh,
well, they have no chance to get on base here, right, because they haven't reached the last 10 times. But you could potentially get yourself in trouble there if you're focusing on too small a sample. You want to make sure you're not throwing out recent performance that is useful and predictive, but also that you're not basing your predictions and probabilities too heavily on that thin slice. So that seems like an area where you have to be pretty vigilant and do
testing to make sure that you are looking at the right timeframe. And that's exactly what we do. So within each prediction,
we have up to 72 models that we run.
We run pitcher for years, months, weeks,
batter for years, months, weeks.
We run matchup models for months and weeks.
And we also grab what the league average is
for that particular situation.
So we take all of those together
and then we look and figure out based
on our algorithms, which ones are going to be the most effective, how to fit those and merge all
those back together into the best prediction. Because by the way, we get one shot of telling
you what the prediction is and it's a set of multiple outcomes. Like you may be seeing one
on Apple TV, but behind the curtain, there's actually a lot more of the outcomes. And by the
way, that all has to add up to 100%. So if you see something like a reach go up, that actually
contains elements of single, double, triple, home run, walk, hit by pitch. So we're actually
showing you just a summary of the result, but we're really excited about how we can merge all those together.
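The pipeline Kelly describes, several per-window models merged into one distribution over plate-appearance outcomes, with the on-screen "reach" number derived as a summary of that distribution, could be sketched roughly like this. Everything here (the outcome list, the weighting scheme, the function names) is invented for illustration; nVenue's actual blending method is not public.

```python
# Hypothetical sketch of blending per-window model outputs into one
# outcome distribution. Names and numbers are illustrative only.

OUTCOMES = ["single", "double", "triple", "home_run", "walk",
            "hit_by_pitch", "out", "strikeout"]

def blend(model_outputs, weights):
    """Weighted average of several outcome distributions.

    If every input distribution sums to 1, the blended distribution
    also sums to 1 after dividing by the total weight."""
    combined = {o: 0.0 for o in OUTCOMES}
    total_w = sum(weights)
    for dist, w in zip(model_outputs, weights):
        for o in OUTCOMES:
            combined[o] += w * dist[o]
    return {o: v / total_w for o, v in combined.items()}

def reach_probability(dist):
    """The on-screen 'reach' number is a summary over every way a
    batter can reach base, per Kelly's description."""
    return sum(dist[o] for o in ["single", "double", "triple",
                                 "home_run", "walk", "hit_by_pitch"])
```

A weighted average of distributions that each sum to 1 will itself sum to 1, which matches Kelly's point that behind the single displayed number, all the outcome probabilities have to add up to 100%.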
And again, the regression numbers that we have are truly fantastic. And we show without a doubt
that our models are able to calibrate to what we call a line of perfection. And we're not saying
we're perfect, but we're saying that if this count happened 200,000 times and we said 10% was the predicted probability, were we in fact right 20,000 of those times?
Does that make sense?
That's how we calibrate.
And we're dead on.
It's in good shape for all of the areas where we have plenty of samples.
There's always outliers.
And those actually can be kind of fun to look at.
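The calibration check Kelly outlines, asking whether events predicted at 10% actually happened about 10% of the time over a large sample, has the general shape sketched below. This is a generic calibration routine, not nVenue's code; the bucket width and the example numbers are assumptions.

```python
from collections import defaultdict

def calibration_table(predictions, outcomes, bucket_width=0.1):
    """Bucket pitch-level predictions by predicted probability, then
    compare each bucket's mean prediction to the observed frequency."""
    n_buckets = round(1 / bucket_width)
    buckets = defaultdict(list)
    for p, happened in zip(predictions, outcomes):
        # Clamp so p == 1.0 lands in the top bucket.
        idx = min(int(p / bucket_width), n_buckets - 1)
        buckets[idx].append((p, happened))
    table = {}
    for idx, pairs in sorted(buckets.items()):
        mean_pred = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(h for _, h in pairs) / len(pairs)
        table[idx] = (mean_pred, observed, len(pairs))
    return table

# Kelly's example: a 10% prediction issued 200,000 times should come
# true roughly 20,000 times. A well-calibrated model's (mean_pred,
# observed) pairs hug the diagonal -- her "line of perfection".
```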
So I guess this is a related question that might just be rephrasing Ben's in a slightly different way,
but I guess I'm curious, you know, you're doing a lot of this predictive modeling on the per player basis, but obviously, you know, there are plenty of comps that exist across a player
population. We do studies at Fangraphs where we are trying to understand
how a player is likely to perform in a given moment. And we're not simply going to look at
that player's performance, but other players like that player in similar circumstances facing,
you know, similar kinds of pitchers throwing, say, pitches that are of a similar velocity band in a
similar part of the strike zone. And you've talked a lot about the machine
learning and the algorithm sort of confirming itself, but I'm curious what sort of just
cross baseball work you guys are doing to try to understand where there may be gaps. Because
I know that you've said, you know, your regressions look great and you're managing
to a line of perfection, but as experienced baseball people, I think we still sit here with
moments during this broadcast where we are scratching our heads about how you could be
arriving at that. And maybe that's a result of you not showing enough or the broadcast not showing
enough, but there are, you know, instances here where I don't know if you are overfitting or if
the sample is too small or what have you, but it still doesn't pass the sniff
test. So I'm curious, like what sort of baseball wide studies you're doing to try to say, how does
this jibe with what we know about baseball? Well, no, it's a great time to
actually get in a few numbers. So let's talk about one of the examples that you guys mentioned ahead
of the podcast, which was the Semien and Javier matchup from this past week. The Astros and Rangers were facing off here in Houston, right? Now on the broadcast,
so this was quite a long at-bat. It went, you know, 0-0, 0-1, 0-2. He took a few fouls,
and then he took a couple of balls. And, you know, after six pitches, he finally struck out
swinging. And so, you know, on Apple TV, they showed the reach probability in that case.
And I think one of the things that might have been counterintuitive that would cause you to scratch your head was as he took the first, you know, I think reach was somewhere around 23% at the beginning of the at-bat.
A strike came in and reach went up to 33%. Another strike came in and it went up slightly more. And then it started to go back down as the at-bat progressed. Would I be
right in saying that would cause you to scratch your head? Yes. Yeah. Okay. No, and us too, right?
So we look at it every time we see one that doesn't make sense, we say, okay, why? Well,
I went a little bit further on this particular at-bat, and I love FanGraphs, by the way.
We're huge fans, you know, been on your site many, many times looking at splits and all the wonderful work that you guys do.
And I went ahead and grabbed what the splits are for, I only grabbed Semien in this case. And I said, I wonder if we could learn anything from the splits that would be nonsensical. And as I looked at the batting average for Semien and for this particular count progression, it made me scratch my head.
And that's just the stats and it's built on data. So I would claim that stats always,
if you looked at stats in the same way and the same types of progression, you might
scratch your head. But stats don't have one of the limitations that we have
because stats can stand on their own.
And how often do you really break down what's in a batting average
versus the slugging or OPS?
A lot of times in 15 seconds, you just don't have time to dig into those splits.
We could do it after.
We could do it post-game.
But we can't really do it in the game.
But I claim that in this particular at-bat, you might find some things that would make you scratch your head. However, we said, we took that at-bat, since you guys had mentioned that
ahead of this, and said, I wonder why reach went up when a strike came in. That, yeah, it's like,
goodness, why did that happen? Well, here's the thing. We dug in.
We looked under the hood.
We looked at all the predictions that were coming out.
And something really interesting happened.
The single percentage doubled between the first and second pitch.
Why did it double?
It didn't show up in the Marcus Semien data, but it did show up in the Javier data.
So for that situation, Javier was giving up more singles.
Now that was in his more recent performance, but that was the cause. And it truly was in the data
that in this matchup, that's what happened. And two models, both the pitcher models, as well as
our matchup models confirmed it. So that's why it came in higher. I'd love to
explain that on Apple TV and tell you why. But at this point in time, you just get the number.
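The mechanism Kelly describes here, a doubled single probability pulling the overall reach number up, can be made concrete with a toy sketch. All numbers below are invented for illustration; this is not nVenue's model or data:

```python
# Toy illustration (invented numbers, not nVenue's model): "reach" is the
# sum of the mutually exclusive outcomes that put the batter on base, so a
# doubled single probability can raise reach even after a strike.
def reach_probability(outcomes):
    """Sum the probabilities of every outcome that ends with the batter on base."""
    on_base = ("single", "double", "triple", "home_run", "walk", "hit_by_pitch")
    return sum(outcomes.get(o, 0.0) for o in on_base)

# Hypothetical outcome distributions before and after the first strike.
before = {"single": 0.09, "double": 0.04, "triple": 0.005, "home_run": 0.03,
          "walk": 0.04, "hit_by_pitch": 0.005, "strikeout": 0.25, "other_out": 0.54}
after = dict(before, single=0.18, other_out=0.45)  # single doubles, outs shrink

print(round(reach_probability(before), 3))  # prints 0.21
print(round(reach_probability(after), 3))   # prints 0.3
```

In this made-up example, reach climbs from 21% to 30% purely because the single component doubled, loosely mirroring the roughly nine-point jump discussed in the at-bat above.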
So two questions. One, who controls which type of probability is displayed on the broadcast? Is that
you or is that Apple or is that MLB Network? Because sometimes it'll be reach or out percentage.
Sometimes it'll be hit percentage, strikeout percentage, walk percentage, home run percentage.
Who decides what will be shown at any particular time?
So we are total fans.
And that week we were in New York, my chief product officer and I had the opportunity to go to Secaucus and sit with MLB Network for both broadcasts that Friday night.
And there is a human being that looks at all of the data that's coming in and picks which
ones to show.
Super sports-smart. It's what they're interested in.
Now, under the hood also for the nVenue API, we also have flags and signals that tell you this is the most game-impactful thing to show. This is the most relevant, because we want to help our future customers, you know, be very sports-smart. But in this case, MLB Network, they have it down.
By the way, sitting there with these guys, sitting there with the crew, watching the game,
watching them put up their numbers, they would ask me questions like, Kelly, why did the numbers go up here? Or why did this go down?
And then we would talk about it and be like, oh, you know, because the groundout probability went way up, because he throws, you know, a wicked curve, which pulls more groundouts. You know, like it always made sense and it made for great conversation,
but they're just super sharp and know exactly what they want to show.
But every now and then they'll show something that's a little counterintuitive to me.
You know, like I think they showed extra base hits a few times, you know, a home run a few times because that's an exciting one.
But, you know, the numbers are always a little bit lower. So it's human controlled. It doesn't always have to be.
Got it.
So I see that there are some cases where there could be a counterintuitive movement that might actually make sense.
For example, if a hitter takes a ball, let's say, and the count is more favorable toward
that batter.
There are cases I could imagine where maybe their percentage of a hit actually goes down
because, well, their percentage of a walk went up, let's say, right?
So that might not be clear to everyone because you're only seeing that single number on the screen.
However, for reach percentage or for out percentage, that seems like a case where you wouldn't necessarily need to see the other numbers
or the other numbers wouldn't necessarily change the story, because you're just talking about, is the guy going to get on base, right? They don't always explain exactly what is meant by reach base probability, but I assume that
it means he's going to get on base by any means, right? He's going to just end up not making it
out in this plate appearance. And that, just based on everything I think we know about baseball,
suggests to me that there really should never be a case where I could come up with a
logical explanation for why a batter's odds of reaching base should go down when they have a
strike thrown to them, or two strikes in Semien's case. And, you know, if there were some way in
which their odds of reaching base actually did go down, then it should be consistent, I would think, within the plate appearance.
So that if Semien is starting out with a 22-ish percent chance of reaching base
and then he takes strike one and it goes up over 30
and then he takes strike two and it goes up again
and then there's another ball and it's going down
and then ball two and it's down again,
that seems to me counter to, you know, like if you were to ask Marcus Semien or the pitcher,
Javier, there, like, would they want to start in that count or not?
You know, I'm pretty sure Marcus Semien would not want to take the 0-2 count just because
the probability said that he had a better chance of reaching base somehow there.
That kind of example, which is not an isolated occurrence, that happens quite a few times per
game. That's the one where, to me, there are a lot of probabilities that you display that I don't
bat an eye at, and they look like they are completely reasonable. And sometimes they should surprise me, I think, because if it never surprised me,
if it never told us anything that we didn't think already, it wouldn't be very useful.
But that just seems like a core baseline thing about baseball,
that as the count gets more favorable to the batter,
then the outcome of the plate appearance should get more favorable to the batter. So even if that hasn't been the case very recently for that batter or
for that pitcher, that just seems to me to be outweighed by the larger pattern of how we know
the league as a whole performs in those counts. So here's my question to you. Would you rather see
a prediction for Javier versus Simeon, or would you rather see what happens typically with the league? Because we do have both. And you're right,
the way you described what should or what would happen on average, you're exactly right.
That is what our minds think, because that is the average in baseball, 100%. But what you're talking about from the first pitch received to
0-0 to 0-1, what you're talking about is a difference of 9-10%. And so if the odds of the
single went up 10% and other things held, something has to go, but the reach easily can go up 9%,
10%. You're not talking about it went from 5% to 90%. I would argue if it went from 5% to 90%,
wow, there's something really unusual there that doesn't follow baseball. But you're talking a 9%
shift. And a 9% shift, in our opinion, and over the millions of pitches we've analyzed, 9% shift is not dramatic or drastic.
It actually does explain this matchup, what's happened in their past.
And we think that's pretty darn interesting.
But we've also offered additional content that says it went up 9%.
It went up 9%. It went up because he has the higher odds of getting a hit based on performance over such and such time frame.
Right.
I guess I would certainly rather see probabilities tailored to those particular players, but I would not expect that it would be so dramatically different that it would go against that league-wide trend.
Like if you were to tell me that Semien was particularly well-suited to match up with Javier
I could buy that and maybe the model would be picking up something that I'm not aware of.
I think the sticking point for me is that you have
these movements within the count for these two players where, you know, if you were telling me
this is what has happened in the past, sure, but this is supposed to be a prediction, a probability
of what will happen next, right? So there's some balance there where maybe you are taking into
account what has happened in the past, but are you taking that into account so heavily that it is swamping what we know about what typically
happens between players? And that's the thing that gets me because if there were a higher or
lower than expected probability on, say, an 0-0 count or something, that's okay. But then what
about that matchup between those two players would actually give Semien an upper hand as, you know, the count gets less favorable to him, you know, as he's taking strikes.
If he's down 0-1, he's down 0-2 now.
And we're saying that he has a better chance to get on base.
That is the part that perplexes me just kind of based on, you know, any matchup, really.
Well, Ben, I think it's always going to perplex you for a couple of reasons because you know baseball so well.
You know, you're an esteemed author and have a podcast on it.
So you understand intuitively how the league performs in general.
But what you might not know is Semien versus Javier,
their recent, in the course of 15 seconds, their recent past,
how, I mean, how they've been performing. And quite honestly, I don't see anything in existence
that could tell us that in 15 seconds. Again, now, if we dove into the splits post-game,
they support this. Because, by the way, the splits are built on the same data
that we build our models on. And so it does make sense,
but you're always going to scratch your head because we as fans are not attuned to A,
talking anything outside of stats. We talk stats, but we don't talk probabilities. And that's
a gap that quite honestly, we haven't spoken in that method before when we're describing baseball. So we've got a new method. And we've also got something that's doing pitch by pitch. And in football, we do play by play. And, you know, in basketball, time segment by time segment. It's a new way to experience the game. It's built on data. It makes sense. It's proven to do very well. It's just you're
always going to scratch your head until you're used, if you ever are used to this way of looking
at it. And, you know, I could add, I read a really great FanGraphs blog, I think it was written a
few years ago, in which one gentleman was writing about, he did a very simple predictive modeling algorithm
where he took some past data. He came up with the probabilities, very similar to what we do. I mean,
the methods were different, but he came up with probabilities for the outcome of the at-bat with
each pitch. And then in order to communicate that, he took those same probabilities and he turned
them back into stats to help people
understand. So I think we just have a different method of talking, but I think this is very
helpful to folks who probably don't always understand what slugging percentage is, or OPS.
Sure. So I guess I have a couple of questions about that and we don't need to dwell overly
long on the Semien at-bat, but I guess part of what we're trying to suss out here is how much weight is relatively small-sample recent performance
actually being given in the model here. Because I just listened to your explanation, you're right
that we are people who deal with this stuff every day and are familiar and comfortable thinking
probabilistically. And
I have to tell you, I don't come away super satisfied with that explanation that you just
gave. And so I guess I'm curious about the data piece of it and maybe more broadly sort of how
you guys are thinking about data in the model that might be true, but that might not be sort
of meaningfully true in terms of it being important to the outcome of any given plate appearance. And then, so let's start with that.
And then I have a follow-up question on the idea of this, you know, being for folks who don't
understand baseball particularly well, because especially if we are marrying that to hope that
they will then bet on these odds in some way.
That strikes me as kind of concerning if you have to, you know,
have the kind of experience that Ben and I have to intuitively know,
well, there's something off about that.
So let's start with the data and how much the recency of data
and how large that sample is and the role that that plays
in the model? Because I think that that probably merits some clarification here. So part of that
is in the secret sauce of how we do things. But I will tell you that hundreds of thousands of
pieces of data go into every single prediction, sometimes more. It's not taken lightly. And we
do look at like the past several years, we look at the past several months, and we look at the
past weeks, and we look at them all individually.
And then we marry them together.
That's super important.
It makes sports sense.
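The description above, looking at several lookback windows individually and then marrying them together, could be sketched as a weighted blend. The windows, rates, and weights below are all invented for illustration; nVenue's actual weighting scheme is not public:

```python
# A hedged sketch of one way to "marry" several lookback windows into a
# single rate. All values here are hypothetical placeholders.
def blend_windows(rates, weights):
    """Weighted average of per-window rates (e.g. on-base rate over
    several years, several months, and recent weeks)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(rates[w] * weights[w] for w in rates)

rates = {"years": 0.32, "months": 0.28, "weeks": 0.22}   # hypothetical splits
weights = {"years": 0.5, "months": 0.3, "weeks": 0.2}    # hypothetical weights
print(round(blend_windows(rates, weights), 3))  # prints 0.288
```

The point of a blend like this is that a cold recent stretch (the "weeks" rate) can pull the prediction below the long-run rate without fully overriding it.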
It's not an area that I think can be easily explained over a podcast.
But one of the questions when we first talked to Ben was, my question was, we can explain
data science, the data behind it, the modeling and the methods for a long time.
And we think that's really interesting and happy to do that in lots of settings.
In fact, I said while I was in New York, why don't we I'm happy to meet.
Let's grab a whiteboard and I'll go through all the details.
Sign an NDA. We'll go through all the details until I can convince you.
But I'm going to ask you the same question that I asked Ben right now that that I asked him a few weeks ago. How can I convince you that it works? Like what methods can I use to convince incredibly stat savvy people like yourselves that this works outside of intuition? What would it take?
It might involve a thorough enough explanation of the model such that you wouldn't be comfortable doing that from a trade secret perspective.
But I think that part of what might be helpful here to swaying people that this is actually
describing probability in an accurate way, because we know that not everything you predict
on the screen is going to come to pass.
We understand that.
But this approach doesn't seem to really speak to a very robust and longstanding body of work around what we know
to matter in baseball in terms of the likelihood of a particular player getting a hit versus not.
So I think part of what we are maybe reacting to here is that this seems largely divorced from an established literature
that would maybe have raised some red flags about, say, the probabilities we saw in the
Semien at-bat, for instance.
Yeah, I guess if I could hop in, I know that there are a lot of people who might just object
to the presence of probabilities on a baseball broadcast at all, you know, whether they were
correct and accurate or not. They just might not like that.
We aren't those people.
Yeah, we are not those people. So we're kind of like in your target demographic here. I think
not so much with the wagering maybe, but at least with the concept of real-time probabilities on a
baseball broadcast. When I first heard that these telecasts would be displaying these stats, I was
pretty intrigued and excited to see them. And, you know, I've even kind of been on the other side of it. Briefly, saw the blowback to that from people who questioned the point or the practicality or,
you know, how can you predict anything about baseball or it was wrong in this one specific
instance. And so therefore it must be broken, that kind of thing. So I'm sure that you've gotten
that kind of feedback. And I sympathize because that's just going to happen inevitably, I think.
So the ones that kind of raised the red flags for me are the ones that seem to run contrary to
my understanding of the sport. And I guess when it comes to the question of how I could be
convinced, I guess it kind of comes back to the old Carl Sagan extraordinary claims requiring extraordinary evidence.
To me, this would be extraordinary to suggest that a batter would be less likely to reach base as balls are thrown to that batter, for instance.
So I guess, you know, the only thing that maybe could convince me is if you were to basically open the books, publish all the validation you've done,
all of the previous predictions
and shown that they work.
And I know that you may not be able to do that, but.
Such good news in that we are planning to publish.
Like all of this will come to pass
as we go forward with nVenue.
So we look forward to publishing
because we're really onto something here.
And I'm a huge respecter of the work you guys do, the things that you've written, Ben, in particular.
Huge, huge fan of what you do.
And, you know, another one of the questions that I asked you when we first talked was, in addition to what would it take to convince you?
Then the next one was, how much grace do you give us?
If you saw, like, how many pitches are there in a baseball game, right?
Like 300-ish, give or take, right?
For 300 pitches, if you're watching the entire game and it's up for every single pitch, how many times of scratching your head, you know, like what's the rate of acceptance?
You get what I'm saying?
Yeah. And so we feel that we fall well into not just an okay result.
We feel that our results are solid and sound.
They're based on true data, just like the stats are.
It's just a different way of looking at it.
And, you know, I'm going to pick on the Simeon Javier situation again.
You know, when the at-bat started and we said
somewhere around 22 percent odds of reaching, right? Do you know what the league average is
for reach on an 0-0 count? Oh, well, it depends, I guess, what you're counting in reach, but
a little over 30 percent. Right. It's actually 30 percent. So when we do that, we take all
the 0-0 counts for several years, and we say,
how did baseball perform? And we came up with, I think I rounded, but 30%. So if you're questioning
the shift of moving up, you should question at first, why is it so low? Why is he being predicted
to get on base at a 22? And we do the sum. A reach is the sum of a hit, a walk, hit by pitch,
or anything that gets him on base.
If you're questioning the transition of an 8%, 9% shift later in the
progression of this at bat, you should question this first.
And by the way, if you look at Semien's batting average this year,
it's not so great, right?
I think, at the time, he was at .175.
He's doing terribly, yeah.
So that right there should tell you we're good at an 0-0 count, right?
Like his batting average was .175.
We predicted 8% lower than the league for getting on base because he's doing bad.
So we're starting from a solid place.
Yeah.
And now when the next pitch comes in and we see, oh, the pitcher's giving up some hits and we put some weight on the pitcher, we put some weight on the matchup.
And I realize that that can be questioned. This is fair. All is fair in data and statistics and interpretation.
And so it went up. It's just what the data told us. Now, by the way, the league average went from 30 to 24. So your intuition was right, it did drop six. We went up nine; it dropped six. It's just the way it is. We could tell you what the league average is and you might be happy. But, you know, if we told you the league average of 30 when it started, and he's really doing so poorly, you might say, no, I don't believe that. So you'd have a question the other way too. Yeah, right. But if it went down once he
had taken a strike, that would at least be directionally consistent with what we would imagine there. You know, I think that some of the odds in these moments are striking on their own, but I don't want to be overly fixated on that. I think, I appreciate that this is all
being done very quickly and that sometimes we are surprised by odds and that doesn't necessarily mean anything bad, right? Like it is pleasant to be surprised
by odds sometimes as Ben said, but I think that where we are maybe chafing at this is that
directionally it doesn't make a ton of sense. And I agree with Ben, I would rather have odds that
are tailored to the particular players involved, but I think that I would imagine have odds that are tailored to the particular players involved.
But I think that I would imagine that a balance can be struck between doing that with a sufficiently robust sample versus being so fixated on this particular matchup or the recent results that these particular players have had that we kind of get turned around directionally on some of this stuff, which is where I think.
Let me clarify. We do not use tiny amounts of samples. We look at large samples,
medium samples, and small samples, and we make conclusions from them. But here's the most cool
thing about all of this. And this is why I really appreciate that you guys bring to this whole
table. We're discussing these numbers and we think that's a win for baseball, right? Like,
do people really discuss that Semien went from .175 to .180, then down to .132 for his batting average over the course of this at-bat? No, but now we've got, it's a win for Apple, it's a win
for MLB, it's a win for those of us who love numbers. And by the way, time will tell, right?
Like you've been looking at probabilities and these transitions for, you know,
all of, gosh, not very long, right?
Like since April 8th,
which I think was the first on-air broadcast.
And I think it's going to take time to get used to.
It's going to take publications
and we're really, really excited to lead that charge
because we stand by what we're doing.
It's mathematically and data-wise very, very sound.
But altogether, we can't describe the full science in a podcast, but my offer stands.
Willing to sign an NDA, I'll fly to wherever you're at and we can take a whiteboard. And I
can convince you, Meg, that this is not based on small samples and it is based on science.
Yeah, I think it is definitely more the direction than
the magnitude for me. There are certain examples. I emailed about this before, the example that
you highlighted in the playoffs last year about Jorge Soler in a case where he hit a home run and
the odds had jumped up to 19% prior to that pitch. And so you cited that as an example of, hey, this seemed to be
picking something up about him. It had gone from, I don't know, 1% or 2% to 3% to 19%,
something like that. And that raised my eyebrows because I just have a hard time imagining that
the best hitter against the best pitcher in the most favorable count would ever have a 19% chance
to hit a home run in any particular plate appearance.
That sort of, it strains credulity for me to hear that.
I mean, you know, you could look at Barry Bonds or Mark McGwire in their record-setting
home run years on those counts, and they did not have those kind of odds.
So, you know, that just kind of comes down to, I guess, the burden of proof, right?
That if you have made those kind of predictions, well, did they pan out or not? But it's not even just the extremes so much that kind of caught my eye because I do buy that sometimes they could just be lower than I would think or higher than I would think. And that could be reasonable and it could actually be helpful in perceiving something subtle that maybe we should take note of. But it is those directional changes within the plate appearance that kind of just calls
into question the other numbers, even the ones that to me seem perfectly reasonable
that just, you know, if it is also producing those ones that I just cannot explain those
movements in what seemed to be the direction opposite of what I would expect, that just,
I guess, cost me some confidence.
So, you know, I guess we could go back and forth forever.
No, I fully note your concern.
And by the way, we're going to take that
and we're continuing to make suggestions to the users of our data
and the media on how to take some of these quips
and some of these things that we have that live under the covers
and help explain that so it doesn't leave you feeling so confused. But again, it's an exciting opportunity to drive
fan engagement. It's a new day. There are a lot of things that Apple TV is doing really right
and keeping it crisp and clean, putting it down on the right. We look forward to a day when
we can help influence a little bit more information and tell you the cool things
about the whys of the numbers. This kind of feedback that we're giving about these particular
perplexing ones, are these things that you have heard from other people? I mean, I've seen on
Twitter some people pointing out similar things, and I don't know whether you would care to confirm
or refute this, but I also did hear from people who were involved in the White
Sox broadcast last year that they had had some similar concerns and had communicated those and
maybe had actually cut the trial short because of the concerns that they had about some of these
numbers. So is this something that you've encountered as you have made your way in the
industry here? Sure. And I'd love to talk about that because it gives a real sense of the style of adopting the numbers. So in the Oakland A's
trial, we did three games and we saw Dallas Braden et al really lean into it and they enjoyed it and
they made fun out of it. It was good. It was a very good experience. Now, Jason Benetti,
huge respect for Jason Benetti and
the whole NBC crew. They had some questions about why did an RBI change? And it was a valid question.
And we did a follow-up with the entire NBC crew, both the A's and the White Sox.
And Jason Benetti's feedback, and he was on the call, was, hey, love the numbers. It's good.
We understand that numbers are up and down. He didn't question the numbers, but what he said was, hey,
nVenue, I really want to know a little bit more information about the why. And so what we've done
from that September game to now is we've made sure that we have available those whys, and those are
in our API, and we're super excited to help folks explain
what's going on. But again, our mission is really to engage fans in the numbers.
We feel they're very good. We can clearly show they're good. And we're really excited about it.
But watching Twitter, that can be a sport in itself. We see a lot of great comments, and we see some rather troll-like comments as well.
We see people enjoy them.
We see people hate them.
And I think we're just seeing a new world.
You know, they're not necessarily a fan of all things in the broadcast outside of the nVenue contribution because it's different.
We're watching different.
They have to watch their Apple TV.
So we take all of that with a grain of salt, but we watch every single comment.
We poll, we watch, and we seek to improve.
And that's why when I asked you the question a while back, Meg, I asked you the same question.
What will it take to prove to you?
How can we improve your experience?
And how can we pass that information on to anybody in media that might use our numbers?
And we're always looking to improve.
So I guess the last thing, we don't have access to all of the history of the predictions you have made.
And so we can only go on what's out there.
There is a writer, Ben Clemens, who's done some research just based on the games that have happened thus far and has gathered the probabilities and just tried to compare them to the most simplistic model that he could come up with, which is basically just using the league average outcomes for that count, so not even anything batter- or pitcher-specific. And just comparing those predictions,
sort of the naive model to the ones that have been on the broadcast, and then looking at the outcomes,
he has concluded that the more naive, the simplistic model with just the one factor
has been more accurate thus far through the games that we've seen, not including, I think, the very first week.
And I know that maybe the model has been adjusted as it's gone on, and it seems to have become more
accurate as time has gone on. And I've noticed fewer kind of eyebrow-raising, personally,
probabilities as time has gone on. But he seems to have found that even if you compare to just the
league splits, that that actually compares favorably to what has been published thus far.
And, you know, that's just going based on what is out there,
and we don't have access to all the things that you may have access to.
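The naive baseline described above can be sketched in a few lines. The rates here are placeholders I made up, not the real league numbers Ben Clemens used:

```python
# A minimal sketch of the naive one-factor baseline: predict the
# league-average reach rate for the current count and nothing else.
# These rates are invented placeholders, not actual league data.
LEAGUE_REACH_BY_COUNT = {
    (0, 0): 0.30, (0, 1): 0.27, (0, 2): 0.22,  # (balls, strikes) -> reach rate
    (1, 0): 0.34, (1, 1): 0.30, (2, 0): 0.38,
}

def naive_reach_prediction(balls, strikes):
    """One-factor model: the prediction moves only with the count."""
    return LEAGUE_REACH_BY_COUNT[(balls, strikes)]

# By construction, this baseline always moves in the direction fans expect:
# more strikes, lower reach odds.
print(naive_reach_prediction(0, 0))  # prints 0.3
print(naive_reach_prediction(0, 2))  # prints 0.22
```

A lookup table like this is what makes the baseline "naive": it ignores batter, pitcher, and matchup entirely, which is exactly why it serves as a useful floor for judging a richer model.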
Well, I ask again, and I welcome that conversation.
So, you know, we've offered a number of times
to open the books with you guys and FanGraphs specifically.
We welcome that.
As a technologist and with experience here, I would love to chat with Ben.
Perhaps he's the author of that article that I saw a few years ago.
But my question will always remain, what is accurate?
So as we watch this progression go from 22 to 31, and then it
plummeted back down to 20. And by the way, he didn't reach, he struck out. And if you watch
our strikeout percentage, which you didn't see, it also moved a variety of ways. And so,
yeah, there's not a hardcore answer to this. But again, we stand by our data.
We stand by the math.
We stand by what we're doing.
And we're so excited to share with the world a lot more about what we're doing.
But right now, this is where we're at.
And I would love to continue to chat and share more as we go along.
So I hope you guys take me up on that.
And I hope that we can build you know, build the relationship.
All right.
Well, thank you.
We do appreciate your coming on and answering the questions to the extent that you're able.
And I do support the project of making baseball broadcasts more stat-savvy and stat-rich, as
long as the stats sort of, I guess, have some basis in accuracy.
I guess the question is, you know, I think it can be beneficial to publish these sorts
of probabilities and can teach people things about baseball.
But I guess the question is, is it teaching them the correct thing or not?
And is it potentially turning off people if they're seeing something that to them does not
make sense for some defensible reason where they start to question every number they see on a
baseball broadcast? That would be my worry. But I am in favor of the probabilities being accurate,
and I hope that they are accurate and would enjoy them if I were more confident that they were,
I guess. But I do appreciate your coming on and talking to us today.
Well, thanks so much. Thanks for having me. And thanks for asking the tough questions. You know,
we welcome it. And again, mad respect for what you guys do, your podcast for Fangraphs itself,
and love to talk numbers. And like I said, we'll continue to talk.
All right. Thank you, Kelly. Do you want to recommend anywhere where people could find out
more information about the company or you or anything else before we let you go?
Certainly. Partnerships@nvenue.com. Feel free to shoot an info email over and we'd be happy to respond.
You can also find us on our website, nvenue.com. There's a way to submit your email so that we can reach you.
We're also on the standard social, so
feel free to use any of those methods. All right. Thank you, Kelly.
All right. Thanks, guys. Thanks, Ben. Thanks, Meg.
Thanks.
All right.
So you just heard us mention at the end of that conversation with Kelly,
a study that Ben Clemens of Fangraphs has done
that should be available now on the Fangraphs website as you were hearing this podcast.
So we figured that we should probably bring Ben on for a few minutes here to explain exactly
what it is that he did and make his methodology transparent. So hello, Ben.
Hey, Ben. How's it going? Hey, Meg.
Hello.
All right. And to be clear, when I reached out to nVenue initially and corresponded with them for the first few weeks at least,
I was not aware that Ben was planning to research this.
I don't know whether you were planning to at that time or when you decided to,
but I was just kind of initially, because of the probabilities that I had seen, kind of mystified by some of them
and then later learned that you were intending to do a study. And some Effectively Wild listeners
and I also did some data collection for that too. And you can see all of that data because
Ben is linking to it in his piece. So if you want to look at any of the individual probabilities from those past games, you can. But Ben, do you want to explain what your basic approach to testing these predictions was?
Yeah, the basic approach was pretty straightforward. I and you and some Effectively Wild listeners
just recorded them all. We just wrote down the count and the prediction for every pitch where there was a
prediction. And we did some methodological things about at-bats where the state changes
during the at-bat (they change what they're predicting), but basically we just wrote down
what was on the screen. And then I didn't have anything to compare it to, and, as anyone
who makes predictions will tell you, it's not clear what you should benchmark them to.
Right.
Yeah. So I ran what's called a Brier score on the probabilities as listed on the screen,
and I got a number of, I think, 0.2. And, yeah, okay, cool, a Brier score of 0.2, does that do anything for you guys? No, not for me. It's not like the ice cream, right?
Right, exactly.
So what I decided to do was, for some predictions, I could create a kind of simple dummy model of my own,
which is basically anything that I could model using just the count.
So odds of reaching base, odds of striking out, walking, getting a hit,
something that doesn't take into account
the runners on base.
So nVenue makes predictions for RBIs
and grounding into double plays,
and I'm not confident that I can test those
in a reasonable one-factor way.
So all I did was take the league average
result probabilities after each count
on the day before each game.
So for an April 15th game, I used stats through April 14th.
For an April 22nd game, I took stats through April 21st.
So league production through the day before the game.
So if I were guessing
using only the count on that day,
what data would I use?
And I would just use, you know,
league average through that date.
And not league average for the team
or the player; it doesn't take into account
the batter, the pitcher.
It's exactly one factor.
Incredibly simple model.
Yes.
I mean, I guess you could say it's got several dummy variables,
but anyway, it's a one factor model.
Like all I looked at was what the league did overall.
And I just ran the two of them through the same battery of tests.
Because that doesn't tell me, you know,
which one's better or which one will do better going forward.
It just says, how were these two at predicting what happened? And also, I recorded
what happened. That's something I skipped earlier. Yeah, I just also recorded what happened on
every plate appearance. And so we had two different models, they each made predictions, and I would
say mine made them not particularly well, it's a very simplistic model on purpose. And then we just compared the results.
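To make the benchmark concrete, here's a toy sketch of a count-only model along the lines Ben describes. The function name, data layout, and sample data are invented for illustration; this is not code from his study:

```python
from collections import defaultdict

def count_model(plate_appearances, cutoff_date):
    """League-average reach-base rate conditional only on the count,
    using plate appearances from before cutoff_date (ISO date strings)."""
    reached = defaultdict(int)
    total = defaultdict(int)
    for date, count, reached_base in plate_appearances:
        if date < cutoff_date:  # only league production through the day before
            total[count] += 1
            reached[count] += int(reached_base)
    return {c: reached[c] / total[c] for c in total}

# Made-up plate appearances: (date, count reached during the PA, reached base?)
pas = [
    ("2022-04-10", "2-0", True),
    ("2022-04-11", "2-0", False),
    ("2022-04-12", "0-2", False),
    ("2022-04-13", "0-2", False),
]
model = count_model(pas, "2022-04-15")
print(model["2-0"])  # 0.5
```

The point of the design is exactly what Ben says: there is one factor, the count; batter, pitcher, and base-runners never enter.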
There are, I think, 2,077 pitches that the two both made predictions on.
And in aggregate, the one-factor model did a little bit better.
It marginally outperformed the nVenue model in Brier score over the 12 games we recorded, and it outperformed them pretty significantly in the other test, where one model sets the odds and the other model bets on the outcome based on those odds. That's kind of a simplistic model-versus-model test that I've picked up as a way of saying, basically, if you are making mostly good predictions but a few outlandish ones, that's maybe not great if you're going to let somebody gamble against your odds. And I think the idea was kind of suggested to me because nVenue is, you know, partially a sports gambling thing, and it's just a very natural way to test predictions against each other, to say, oh, you want to bet on it? And it's pretty easy to set that up as well, which appealed to me. Both Brier score and this betting-against-each-other model test are pretty intuitive. Brier score just measures the average squared difference between your prediction and the outcome, and the gambling thing, I mean, I explained the exact mechanics of it in the article, but it's straightforward.
So the one-factor model did very well in the gambling-against-the-other-model thing. It did better, although not overwhelmingly so, in Brier score, and there's some serial correlation there. You know, if I'm predicting the same at-bat eight times because it's an eight-pitch at-bat, then I could just get lucky once. The one-factor model could be wrong, but, you know, we only have one observation that counts 10x in our sample. So I ran a separate subset of just 0-0 counts.
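For readers unfamiliar with it, the Brier score Ben mentions is just the mean squared difference between forecast probabilities and binary outcomes. A minimal sketch (the sample numbers here are made up, not from the study):

```python
def brier_score(predictions, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.
    Lower is better: 0.0 is perfect, 0.25 is what a constant 50% guess earns."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

preds = [0.9, 0.2, 0.6]  # forecast probabilities of the event
outs = [1, 0, 0]         # 1 = the event happened
print(brier_score(preds, outs))  # ≈ 0.137
```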
So that's a zero-factor model, right?
Like, it's only 0-0 counts,
so mine, the dummy control model,
just gives league average every time,
because league average after 0-0 counts is league average by definition.
And on these 0-0 counts,
there was still a slight Brier score advantage for the
league-average model and a slight positive
gambling return to the count-only-slash-no-information model.
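Ben spells out the exact mechanics of the betting test in his article; as a generic illustration of the idea (one model posts the odds, the other wagers against them at those odds), a sketch might look like the following. The function and its conventions are assumptions for illustration, not his implementation:

```python
def betting_return(book_probs, bettor_probs, outcomes, stake=1.0):
    """One model (the 'book') posts probabilities strictly between 0 and 1;
    the other bets `stake` on whichever side it thinks is mispriced, at the
    fair odds implied by the book's probability. Returns net profit: a book
    whose probabilities are well calibrated concedes zero in expectation."""
    profit = 0.0
    for p_book, p_bet, happened in zip(book_probs, bettor_probs, outcomes):
        if p_bet > p_book:  # bettor thinks the event is underpriced: back it
            profit += stake * (1 - p_book) / p_book if happened else -stake
        else:               # otherwise back the event not happening
            profit += stake * p_book / (1 - p_book) if not happened else -stake
    return profit

# Book says 50%, bettor says 90%, event happens: bettor wins even money.
print(betting_return([0.5], [0.9], [True]))  # 1.0
```

A positive return for the bettor over many plays suggests the book's probabilities are exploitable, which is the intuition behind testing the count-only model against the posted odds this way.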
Not as significant as with all the counts, and I think that has to do with some of the
wrong-way stat movements that you guys
talked about in the
interview. But I think there was
a pretty significant... not statistically
significant. I mean, honestly, I did not try to measure the statistical significance of this, because it's some odds on
the screen and they could change next week, and I don't feel like I have any, you know, any
understanding or any ability to predict how their model is going to evolve and change over time. It
does seem like it's gotten better in the six weeks of games that I scored. I should mention
there have been seven weeks of games; we only did six because one set of Apple TV Plus games was a day after the first day of the season.
Right.
And that's kind of weird for season to date stats.
So I just threw it out because I didn't think that I could actually come up with a...
I think it would be a little bit unfair to use any stats that hadn't occurred yet.
Yeah.
For example, like on April 8th or whatever it was, I didn't...
Like we didn't know and their predictive models didn't know the ball would be dead.
And so I just tossed that one out.
I think that's, you know, probably the right thing to do.
And it seems like there were still some kinks with getting the integration going at that point.
Like some of the odds are coming up late at that point.
And that's very understandable.
So I thought it was just easiest to toss it out and kind of go from there. And is that 2000 plus pitches? I mean,
is there any way to say whether that is sufficient to reach any kind of conclusion about those
already published predictions? Like, obviously, we have no idea if they will refine their model and
improve it. But is there any chance that over that span, compared to your extremely simplistic model, the lack of overperformance is not telling, is random, is just by chance?
I mean, how much can you conclude from what has happened so far?
I'm not a statistician.
I should probably mention that from up top.
But basically, no.
That doesn't really say anything about the go forward.
Like you said, we're looking at what comes out of a box, which is very different than the workings of the machine.
It's entirely possible that just through sheer chance, the types of situations that were testable against my one-factor model happen to be the things that they do worst at.
And that if we had looked at a different subset of batters and pitchers or a different subset
of counts that had come up, they would have done better. There's just no way of knowing that. And
I don't want to make any claims about what that means going forward or anything like that.
What I can say is that the predictions that have been on screen so far are, I think,
pretty clearly have not been as accurate as just using count-based predictions.
That, yeah, again, that doesn't say anything about anything other than that.
Just they haven't matched up to the count-based predictions as of yet.
Well, and I think that part of our reaction to all of this, well, I'll speak for myself.
I think part of my reaction to all of this is the use of it. Like if we care about stats being accurate and one of the great sort of
challenges of our collective lives is having to explain probability to people who aren't super
well-versed in it. And I don't mean that in a snarky way. Like I think that part of the power
of what we all do with baseball analysis is that it can illuminate probabilistic thinking in a way
that's useful beyond just
baseball, right? And so if this were less accurate than just a count-based model and it were being
purported to be as accurate as it is being purported to be, like, I think that would still
bother us, right? Because we like things to be more accurate than not. And because we'd like
to stop having to tell people why our playoff odds are fine, actually, for instance.
But I think part of why I find this so flummoxing is that, you know, there are going to be people who potentially make decisions about gambling based on the odds that they're seeing and might be doing that with less complete information than they think they're making those decisions based on, which, you know,
we can debate how much that matters relative to the decision to gamble at all. But it seems like
if you feel like you have information, you might make bolder choices than you would if you didn't
think you had information that was, you know, more accurate than just a count-based model.
So I think that's part of why this rankles for me.
Yeah. I think for me, I mean, the way the odds are currently, I guess for one thing,
I would be sort of surprised if they are used in that way, just because if there were actual money at stake here and given Ben's results, it seems like this would not be the house winning
in this case, at least so far. Yeah, that's a fair point.
I mean, if anything, maybe it could convince people that it is easier to bet on baseball than it actually is in general.
Like I'd be tempted to bet on baseball if I could bet on some of these very perplexing probabilities.
I think also, yes, it is something that could color people's perceptions of just the use of stats and probabilities and sabermetrics in general. Like people who see these things on the screen, something that really clearly doesn't seem to make sense even to us, and then use that to kind of cast a wide net and say, oh, the stat nerds don't know what they're talking about, right?
Because I think this could be an educational tool. It could enlighten people. It could illuminate things about baseball.
But if it is doing the opposite of that, then maybe it makes it harder to get predictions or
probabilities on the screen the next time around if there is a model that comes out that maybe
doesn't produce probabilities that are confusing
in this same way. So I think that's the thing for me. The reason that I started doing this is
because I watched the NBC Bay Area telecast last year and I was like, this is awesome.
I have been waiting my whole life for them to have this in a little box on the side of a baseball broadcast. Not every
pitch, but I love that it pops up and you can see, like, a ball just came in and, wow, his
chances of getting a hit are going down even though he got a ball, oh, but his chances of getting a walk
are going up. I loved it. It was a thing that I had not seen on a broadcast
before that I thought was really cool. And so when Apple started doing it, I thought, oh, wow, that's neat.
I wonder how good these predictions are.
You know, that's kind of hard to tell.
It's one thing when you see it once and I was like, oh, that's cool.
Like, I'm glad they're showing probabilities.
I think in probabilities a lot
and I love that they're putting them on a broadcast.
And then once they were there every pitch,
I thought, ah, you know, I like testing things.
Yeah.
I wonder how good these are.
And it's really hard to make predictions. What is it Niels Bohr
said? It's hard to make predictions, especially about the future. That's very true. I
would never purport to be able to make a model that could do better than my own count-based
model; I think everything I added would just make it worse. And so rather than saying, I wonder
if I can beat this, I just thought, ah, I wonder how good it is. And I don't know, I think that the endeavor
of trying to put odds to what's going to happen, and, like, tell a story of what might happen next,
can be very interesting narratively. I do worry that if the odds are kind of counterintuitive
on their face, people are less likely to say, oh, I like this, and more likely to say, why are
there so many numbers on the screen? Right. And so that was kind of my initial impetus: I think odds are cool, and
I would like us to use them in more places in life. And I wonder if this is a good place to use them.
Yeah, right. I tuned into the first Apple broadcast specifically to see this really,
because I thought it was kind of a cool idea. I understand if people aren't that into the concept regardless of whether they're accurate or not,
just because, you know,
like if you don't need to see the probability on every pitch,
I get that.
Like, I think that it is perfectly fine to say
this does not enhance my enjoyment of the game.
Like maybe I already have a sense
of what the probabilities are
or maybe just seeing them
and then seeing different results happen,
it just, like, cheapens it in some way.
Or it makes you just feel like, oh, this is one trial that I happen to see. I don't
know, maybe it would actually affect people's enjoyment of the game in the other direction. So
I'm not saying that if you are a stat head in general, you will love this concept. I don't
know that you need to. It's just the idea of
things being published that maybe reflect poorly on the endeavor of doing it at all. I guess
that's the thing that gave me some misgivings about this. Yeah. I mean, I think that putting
probabilities to a broadcast is an interesting narrative tool. And Meg, we've talked about this
before about playoff odds. If you give something 5% playoff odds and somebody makes the playoffs, that doesn't mean,
man, these odds were just terrible and bad. I mean, something awesome just happened.
Right.
And that's the way that I kind of approach all these is, hey, if the odds look pretty reasonable
from some kind of outside test, and we ran these tests on our playoff odds to kind of see how often the things that we predicted happened. And they did pretty well.
Yeah. Like I find odds to be more useful as a narrative tool than as a, like, here's what's
going to happen because they're not what's going to happen. Right. Like someone is not going to
reach 20% of a base. There's just no chance. Like, I don't know if his arm gets there or something.
Like either he'll get on base or he won't.
What is on screen will never happen.
But giving you an idea of like, hey, this is an easy situation to get on base.
Hey, this is a hard situation to get on base, et cetera.
I think it's quite cool.
And just to give people a sense, because we really fixated on that Semien plate appearance,
which it just was one that Kelly said she had looked at the numbers for and it was last week. So we focused
on that, but that's just kind of a microcosm. There are many, many cases like that, that same
sort of problem or what seems problematic to us. And you actually calculated how often that has
happened and it has become less frequent as time has gone on, which I assumed based on my watching and seeing it a little less
often, but it's still not uncommon. Just to give people a sense, how many times per game,
roughly, does that kind of quote-unquote wrong-way movement within a single plate appearance happen?
Yeah. So I should mention that this is only for things that are kind of absolute,
as in the
odds of a hit don't need to automatically tick up when you take a ball.
Yeah.
The odds of a plate appearance ending in a hit after a 3-0 count, on just a random day I
picked were 10.3%, and on a 0-0 count, they were 20.5%.
So they actually halved going to 3-0.
So this is only for things like reaching base.
Reaching base is monotonically increasing.
The more balls you take, the more it goes up.
The more strikes you take, the more it goes down.
And that's also the case for strikeouts, walks, and outs.
So for only those four, I looked at times where the count ticked one way
and the odds of success or failure, as it were,
ticked in a way that is counterintuitive.
And there were 18 per game on average, out of, you know, 270 possible pitches on average. And the number of tracked pitches that I was
comparing for these is about 150 per game. So, I don't know, 10% or so. But if you exclude
the first two weeks of games, it's only 14 per game, so it's definitely
sharpening up over time. And it's not like those odds can't be correct; those
aren't necessarily incorrect. But they definitely do kind of raise my let's-think-about-this-one-more radar a little bit.
Yeah. Yeah, I'll give a few examples here, because I think people probably have not been paying
as close attention to these things, I would assume and hope for their sake, but just a few
that stood out to me and that I had shared with nVenue weeks ago in some cases. There was a
White Sox-Rays game, second inning. Eloy Jimenez was batting against Drew Rasmussen, so this is a right-on-right matchup.
And when the plate appearance started, there was a 52.7% reach probability.
Or I guess that was with a 1-1 count maybe.
And I was kind of like, whoa, because Eloy Jimenez's career OBP after 1-1 is under .300.
You know, he's not exactly Ted Williams.
And this is right on right.
Then he takes a ball.
It's 2-1.
His reach base probability falls to 28.8%.
And again, in real life, you know, on 2-1, his OBP is 80 points higher.
It doesn't go down. So that's the kind of thing that drew my
attention. Or, you know, fourth inning of a Cardinals-Reds game, TJ Friedl, a left-handed
hitter, had a 61% chance to reach base against a left-handed pitcher with a 1-1 count. And no
offense to TJ Friedl, but he's not peak Barry Bonds. And I don't think even peak Barry Bonds had a chance
that high against a lefty probably. So then he took a ball and his probability of reaching fell
to 29%. So that same kind of thing, like even in that game, Corey Dickerson was batting for the
Cardinals. He had a 41% chance to reach base with a 3-0 count, okay, sounds possible. But then he takes a strike and his probability of reaching almost doubles to 83%, which is
like, what?
So again, this is going back a bit.
And I think that kind of thing, which is maybe more glaring, they've cut back on that somehow.
But even in more recent games, I'm still seeing that sort of thing.
Like, I think it was a late April game, there was a Yankees game. Giancarlo Stanton
came up in the first inning. He had a 36% reach-base probability before the first pitch. Fine.
But then he takes a ball and that falls to 30%.
He takes another ball and it falls to 28%. So he's up 2-0, but his chances of reaching base have fallen since the plate appearance started. And then he swings and misses and suddenly his reach base probability goes way up to 47%. And then he fouls off the ball and it goes down to 24%. And it's just like, it can't move in both of those directions.
I mean, there's nothing saying it can't.
It tends not to.
Right.
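The kind of wrong-way check Ben ran for the monotone stats (reach base, strikeout, walk, out) is easy to sketch. This is a hypothetical reconstruction for illustration, using the Stanton reach-base numbers just described, not Ben's actual code:

```python
def wrong_way_moves(displayed):
    """displayed: in-order list of (pitch_result, reach_prob) pairs, where
    pitch_result is the pitch that produced this update ('ball'/'strike', or
    None for the pre-pitch number). Reach-base odds should rise after a ball
    and fall after a strike; returns indices where they moved the other way."""
    flagged = []
    for i in range(1, len(displayed)):
        result, prob = displayed[i]
        prev_prob = displayed[i - 1][1]
        if result == "ball" and prob < prev_prob:
            flagged.append(i)   # odds fell after a ball
        elif result == "strike" and prob > prev_prob:
            flagged.append(i)   # odds rose after a strike
    return flagged

# The Stanton plate appearance described above: 36% pre-pitch, then
# ball -> 30%, ball -> 28%, swinging strike -> 47%, foul -> 24%.
stanton = [(None, 0.36), ("ball", 0.30), ("ball", 0.28),
           ("strike", 0.47), ("strike", 0.24)]
print(wrong_way_moves(stanton))  # [1, 2, 3]
```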
But I think that one thing that is important to be careful about in these things is you should do the math.
Yeah.
Yeah. Like, I'll give you an example.
On that Semien at-bat, they actually outperformed very slightly.
Actually, underperformed very slightly.
I just looked at it again, and it was quite close.
They actually did about the same as the naive count-based model in both Brier score and gambling, because they got it
right on 0-0; they were shaded the right way. Then there were some counterintuitive counts,
and they did worse on those, because their odds were going up while he was taking strikes, and so that
didn't do well for the fact that he ended up striking out. And then they, again, shaded him lower than the league average on 2-2 counts.
And that worked out again.
Like you can get an okay overall result, even with some of these like, oh, that's kind of
a strange direction to tick in if your initial kind of prediction was pretty good.
Right.
And now in that game as a whole, the two models did almost exactly the same.
But I guess my point is it's really tough to just look at examples and say, like, oh, this can't be right.
This has to be wrong.
That's one of the things that is so interesting to me about this is it really is tough to evaluate these models.
And, like, you kind of want it to be telling you counterintuitive stuff.
Right.
Yeah, exactly.
It's like the Bill James, I think it's often attributed to him.
Yeah, the 80-20, you know, any useful stat should surprise you 20% of the time or else
what's the point?
So, but if it's surprising you 80% of the time, then maybe something's wrong.
Yeah.
So, you know, there was another case: Darin Ruf was facing Aaron Sanchez in one
of these Apple games, and he had a 39% reach probability, which kind of caught my eye, because in the third inning, same batter-pitcher matchup, Ruf started with a 28% reach probability.
So it went from 28% to 39% in the next plate appearance.
Now, okay, maybe he's facing him again, and so Ruf has a better chance of getting on base. But a 40% increase seemed like a lot to me. But then it goes from 39% to start that plate appearance to 24% after he takes a first-pitch ball. Then he fouls a pitch off and it goes to 32%. He takes a called strike and it goes to 27%. So with a 1-2 count, he had a higher chance of reaching base than he did with a 1-0 count. That kind of thing, you know. One thing is, if you keep making predictions like
that and they aren't borne out by the data, then eventually we'll just see it, because you won't
do as well at predicting those counts in the long run. But it is more important, I feel, to get
the initial batter-pitcher matchup right, always, than the direction things go.
Like, the Semien one is a great example where, you know, yeah, the movements were kind of weird.
But by just shading right in the first place, they more than made up for that.
And I think that's one of the things that is so interesting and promising to me about this kind of idea.
And I mean, I am not a data scientist.
I am not someone who is well-versed in machine learning.
I dabbled in it for some of my data-driven hitter predictions this year.
But like, you know, not fancy models like this.
I'm just using off-the-shelf stuff you can put in Python.
It's a little bit easier.
Like, kind of an unavoidable fact of life with this
is that it's going to be black-box-ish.
If you're running a bunch of different models and having the computer decide which of those is the best and then using its outputs, if it works, we will probably think something weird is happening a lot of the time.
Yeah, right.
That's the thing that I mentioned, where he goes from, like, 1% or 2% to homer to 3% after the first ball, to
19% after the second ball. Now, you know, that confused me, because, A, how does it go up one
percentage point after one ball and then go from 3 to 19 after one additional ball?
But also, just anyone having a 19% chance to homer in a plate appearance, even starting up 2-0. Like, Barry
Bonds, the year he set the record, he homered in fewer than 7% of the plate appearances that he
started up 2-0. And so did Mark McGwire the year he hit 70. Now granted, those guys walked all the
time, but still 19% is astronomical. It's high. I bet you that, well, I can tell you that in all
of the recorded data that we have this year, there have been no 19% home run percentages. So that's kind of, you know, aiming at a past target.
Yeah. I picked that one out only because Kelly cited that one specifically as an example of
a time it worked. Yeah. I believe that press release is actually quoted in my
article. Well, that kind of thing, it's like, if that were right, how cool would that be?
Yeah. I'll give you an example of a very cool one that did work and was right. So Kole Calhoun came up to bat
against Cristian Javier in the Astros-Rangers game, and before the first pitch there was an 8%
home run probability. And that's enormous. Yeah, it's triple the naive
probability of a home run in that at-bat. And Kole Calhoun hit a first-pitch
home run, and he just, you know, hit it out of the park. And that is something that you
might say doesn't make any sense, but it actually did make a lot of sense when you look at
the factors a little bit closer. Kole Calhoun has power, Cristian Javier is home-run prone, they're
in Houston. Like, there was just a lot of stuff that kind of lined up that pushed it higher, and having the ability to show that, I think, is really cool.
Yeah, I think that's a valuable contribution to the discourse about, like, baseball.
I think that knowing, if the announcer said, hey, this is a great spot
for Calhoun, he's a lot more likely to hit a home run than
he is in an average plate appearance, you know, his swing matches up well
against Javier, and Javier is a fly-ball pitcher, and Calhoun hits a lot of
home runs, so getting the ball in the air sounds good for him. Yeah.
Or, I don't know how
granular the data gets, but teams are obviously looking at things like swing plane and pitch movement and comparing
similar pitchers and similar batters. And maybe there's something that wouldn't be immediately
obvious, but would be real. It's just, we can't really know from any individual example
whether, oh, it was picking up on this actual, real, special proclivity to hit a home run in
this plate appearance, or it was just a weird one that happened to be right that time. So you need
many examples of that to see.
Yeah, and hence the large sampling, right. And I mean, honestly, I'm probably not going to keep doing this, because it's a lot of work. I did four... it was
a lot of work. Yeah, it took a couple hours. So, yeah, I'm probably not going to keep doing this.
Not because I'm not interested.
I am.
But just because, I don't know, I don't want to do this every week.
I watch the games live because I like the broadcast crews, actually.
And you can't pause Apple TV.
No, you can't.
You can't record it live.
The picture quality on Apple broadcasts is great.
Yeah, it's really good.
It's beautiful cameras and everything.
I love the design in general. I find the kind of muted color aesthetic really pleasing,
and I like the EDM music that comes in and out of commercial breaks. It really gets me
in the mood that this is an event. Yeah, actually, I'm really enjoying the Apple TV broadcasts
so far, which is, I don't know if that's everyone's experience, but I found them quite good.
Yeah. To clarify one thing I mentioned: I had spoken to a few people with the White Sox broadcast that were
using this very briefly last year until they decided to stop using it because of some concerns.
And I won't quote anyone. No one went on the record because they didn't want to speak for
the company or anything, and they weren't sure whether there could be a future relationship
there. But the comments were not kind, I will say, but I don't
want to just lob anonymous critiques out there. But the one thing that Kelly did mention about
a specific case with an RBI probability, from what I was told, that was a situation in this
White Sox game in September where Yoan Moncada was batting with Yasmani Grandal on second
base and the probability of an RBI was listed as far higher than the probability of a hit,
which confused the crew because it was unlikely, very unlikely, that Grandal would score on anything
other than a hit. And so they said that they communicated that concern and that they were
told that the algorithm was saying that either a hit or a fly out would have resulted in an RBI, which didn't seem right because there was almost no chance that Grandal, a catcher, would score from second on a fly out, which happens extremely rarely in general, and certainly not with a player like Yasmani Grandal.
So that was that specific example she brought up.
But I think there were other examples there. I could play a quick clip of the very first pitch that it was
used on those broadcasts. With Next Play Live, a company powered by nVenue, and so they give you
real-time data of what possibly might happen on the next pitch. So we're going to play around with that a little bit here tonight. So, 100, 200, 300 factors that they
analyze to predict what's coming
next in a game.
So you'll see it in play here in
the second inning.
I know you have your own hundreds of factors that you use, sometimes thousands.
All right.
So, okay, so based on nVenue and what they've got on this at-bat,
what Moncada's done, the very small sample size for Riley O'Brien:
17% chance of a hit, 33% chance of a strikeout, at least right now.
That's 0-0.
First pitch a strike. How about 0-1? I think it's going to get worse for a hit, I have to tell you.
Well, it's better, how about that? Plus nine, 26 percent.
I mean, one of the very difficult things in doing something like this is, if you have something that is, like, I don't know, let's say, let's stipulate that it is better than a baseline at predicting what will happen next.
But it is also dumb.
Like it is just essentially dumb, right?
It doesn't know baseball, right?
Right.
It's looking at a big pile of numbers coming in.
And if you don't specify, like, enough initial things, it's going to produce some things that are just evidently wrong,
even if overall it's a good predictor.
It's hard to put the right guardrails in. And I'm certainly no expert on that.
But, you know, these like little things where it's evidently wrong are not disqualifying.
But if you're not really sharp on everything else, they're certainly going to drag down your predictive accuracy.
And one thing that I think is an issue, separate from anything that I did in terms of studying their accuracy and testing them against other models, is: does it take you out of looking at the odds?
And like that one did, right? It took them out of looking at the odds.
And so that to me seems more important than whether you can test them against my odds.
Honestly, like most people watching Apple TV
don't actually care that I tested the odds.
No, I'm sure they don't pay any attention.
And frankly, the broadcast doesn't seem to pay any attention, right?
They never really mention it or explain it.
I caught exactly one mention of the odds on the broadcast. Yeah, they never, like, say, hey, by the way, we're producing these odds and this is how they work, or this one is interesting. It's just there, kind of.
Which is not to say that it has to stay that way, or that there isn't a version of this that would be really interesting.
I think one of the things that I appreciate about the way that just the aesthetic of the broadcast lays out
is that those odds aren't,
they're pretty unobtrusive down in the bottom corner.
I think that there is a version of this broadcast,
and I agree with you.
I've liked these booths generally
and have thought that they've done a good job.
I think that the potential exists here for folks like us who care about this stuff too,
provided that some of the kinks get worked out of the model
and so we are not driven mad trying to figure out
how it is that Marcus Semien's odds of reaching base
went down after taking a ball,
that we can have what we like
as much as we want to engage with it. And folks who don't care
about that don't have to, you know, like it doesn't have to be the focus of the broadcast.
And maybe, you know, it would be to the benefit of both the people producing those odds and those
who enjoy them to like have the broadcast speak to them a little bit more and use the educational
opportunity in small doses so that they're not,
you know, beating people who don't care about that over the head with it, but are also
engaging the people who do and kind of sparking curiosity. So I think that, like, we have been
very down on this. And I do think that the, like, the potential gambling aspect of it does make me
nervous. Although I think that you're right that, like, it doesn't seem like it's particularly actionable right now, so that seems fine. But there is potential here for something really cool. We just have to, well, not the three of us, but, you know, there needs to be improvement made, whether it's to the inputs to the model, or how those inputs are weighted, or sort of which things have an effect on a plate appearance but not knowing how they really matter.
You know, so there's work to be done here,
but there is still potential.
So that part is good, too.
I feel like we've been very harsh,
so I felt like I should say that.
One thing that is tricky, too, is you don't want to too much just say, give it to the baseball people and let them decide what's fair. Because sometimes we're really annoying.
Well, but also, like, one of the reasons to do this is because if you just let people say, well, that doesn't make sense, get it out, like, you do want to find new insights, right?
Yeah. So it is tricky to say, like, you know,
just iron out the parts I don't like and keep doing it. Which is, I think, why it's more useful to look at it as: look, here it is against this one-factor model, here's how it did. What that tells you is that you would have done a better job predicting this in the games that we watched if you just looked at the count. No broader judgment on the long-term utility of it, or how they should do it, or how they should build it. I don't know how they should build it. I'm not good at this stuff. It sounds hard.
But what I can tell you, yeah, it's just the study. That was my approach to it.
And I think that's actually a good way to look at it. It's hard to be good at machine learning. There's a lot of people that go into that field, you know.
Yeah, and I certainly am not good at it. I barely even know what it is. So the people we know who are good at machine learning and care about baseball work for teams.
Right.
There's a pretty hot market for them, in fact.
So I guess, yeah, I wouldn't claim to know how to do it better by any means.
I just think it's interesting to look at it and compare it to kind of a simple model.
Right.
Because I'm naturally suspicious of complexity in general. And so I
always like to compare things to the lowest possible common denominator.
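The comparison described here can be sketched in a few lines of Python: score a model's per-pitch hit probabilities against a one-factor baseline that only knows the count, using the Brier score (mean squared error against 0/1 outcomes). All numbers below are invented for illustration and are not taken from the actual study.

```python
# Hypothetical sketch: compare a model's hit probabilities to a
# count-only baseline. Lower Brier score = better predictions.

def brier(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Illustrative league-wide hit rates by count (not real figures):
baseline_by_count = {"0-0": 0.33, "0-1": 0.28, "1-0": 0.36, "0-2": 0.20}

# (count, model's hit probability, did a hit actually occur?)
pitches = [("0-0", 0.17, 0), ("0-1", 0.26, 1), ("1-0", 0.40, 0), ("0-2", 0.22, 0)]

model_probs = [p for _, p, _ in pitches]
base_probs = [baseline_by_count[c] for c, _, _ in pitches]
outcomes = [o for _, _, o in pitches]

print(f"model: {brier(model_probs, outcomes):.3f}")    # 0.196
print(f"baseline: {brier(base_probs, outcomes):.3f}")  # 0.199
```

If the baseline's score comes out lower over the games you watched, the count alone out-predicted the model in that sample — exactly the kind of narrow, testable claim being made here, with no verdict on the model's long-term potential.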
Right. Yeah, that's the thing. I think in the grand scheme of things,
whether the odds are perfectly accurate or not on a baseball broadcast, we've outlined some
reasons why it may matter in some ways, and maybe it matters more if there's wagering involved. But just in general, I guess we're kind of, even in non-baseball arenas, we're just
bombarded by data constantly and advanced manipulation of data in ways that are too
complex for laypeople to follow.
And I include myself in that.
And there are a lot of examples where models are flawed in harmful ways, like
societal ways, because people design the models often. And so even if it is a computer actually
spitting out the numbers, like humans have to make some choices about what the model looks like and
how it works and that kind of thing. And there are a lot of cases where in
other industries, in industries where it might actually matter or have some real world effect,
people will come out with very flawed models and maybe they know the models are flawed or maybe
they don't. And there are serious ramifications and consequences that can come from that.
And it's also bad because then it maybe makes people think that they can't trust data in general and just be very skeptical about that. Which, it's always smart to be skeptical, probably, but also not to dismiss out of hand that there might be
some utility there. So I wouldn't want it to turn anyone off and say, oh, there's no useful
application of this and the whole premise and purpose is misguided.
So that's kind of why I am rooting for the probabilities to be right.
Oh, me too.
Yeah.
All right.
Well, thank you, Ben.
We will link to your study, of course, and where people can find you at Fangraphs and
on Twitter and so forth.
But thanks for doing the work and summing up what you did.
Thanks for having me on.
All right. Just for reference here, I thought I would read you the numbers in MLB entering
Wednesday's games here, give you the splits by count that we've been talking about in this
episode in case you're not familiar with what they typically look like. These are generally
roughly the same every season. After the first pitch of the plate appearance, there's really
no such thing as an even count. Sometimes we say that 1-1 is even or 2-2 is even, but not really.
Results-wise, every count after that first pitch favors either the batter or the pitcher on a league-wide basis.
And these are all after the count that I'm going to say, not on the count.
So not a plate appearance that ends on 3-0, but plate appearances that start 3-0 and may end on 3-0, but may also
end on a subsequent pitch. And this will be in the form of tOPS+. So 100 is average. Higher than 100 means better for hitters. Lower than 100 means better for pitchers. So starting with the hitter-friendliest and going to the pitcher-friendliest: after 3-0 is a 256 tOPS+, after 3-1 it's 210, after 2-0 it's 173, after 1-0 it's 129, and after 2-1 it's 126.
So those are the hitter favoring counts.
Then after 1-1 it's 85, after 0-1 it's 67, after 2-2 it's 58, after 1-2 it's 35, and finally after 0-2 it's a 26 tOPS+.
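For reference, tOPS+ figures like these follow Baseball-Reference's split-relative formula: 100 × (split OBP / overall OBP + split SLG / overall SLG − 1). A quick sketch, with made-up numbers rather than the actual league splits:

```python
# tOPS+: how a split's OPS compares to the overall line (100 = identical,
# above 100 favors hitters). Formula per Baseball-Reference; the input
# numbers below are invented for illustration.

def tops_plus(split_obp, split_slg, total_obp, total_slg):
    return 100 * (split_obp / total_obp + split_slg / total_slg - 1)

# A league hitting .400/.550 (OBP/SLG) after 3-0 counts, against an
# overall .320/.410 line:
print(round(tops_plus(0.400, 0.550, 0.320, 0.410)))  # 159
```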
And given a big enough sample, this is about as inviolable a rule in baseball as I can think of.
Just going to Stathead, I looked for hitters who have a higher career on-base percentage
when they are behind in the count than they do overall,
and I set a career minimum of 50 plate appearances when behind in the count,
and there are only six, six since 1988, who have a higher on-base percentage when behind in the count.
Two of them are pitchers, and of the other four, the highest number of plate appearances
is Bubba Crosby.
Bubba Crosby, who played from 2003 to 2006, had a .255 OBP overall and a .258 OBP when behind in the count. And again, 94 plate appearances when behind in the count for him, 269 total. So there is essentially no one who reaches
base more often when the count is less favorable toward them over any non-small sample. Just to
close, after we recorded the interview with Kelly, Meg followed up and asked nVenue if they had any
additional comment on Ben Clemens' study. Their statement is, we all know that in sports, player averages
can't paint the whole story. nVenue believes in going beyond the average to generate predictions
for each and every individual matchup and situation. Our team has run millions of regression
data points outside of the 12 baseball games aired during Friday Night Baseball that have
been included in this study. Our studies validate that our data is more relevant and accurate than an average.
We love talking data, especially around baseball. We look forward to reviewing any studies as we
prepare to release our own in the future. All right, that will do it for today. Thanks,
as always, for listening. You can support Effectively Wild on Patreon by going to
patreon.com slash effectivelywild.
The following five listeners have already signed up
and pledged some monthly or yearly amount
to help keep the podcast going,
help us stay ad-free,
and get themselves access to some perks.
Michael Vespi, Daniel Gonzalez-Stewart,
Look for Overlap, Eric Schropp, and Nick Holcomb.
Thanks to all of you.
Patreon perks include access
to a patrons-only Discord group,
monthly bonus pods hosted by me and Meg,
we'll be recording another one of those this weekend,
and a couple of playoff live streams later in the year, among other extras.
You can also contact me and Meg via email at podcast@fangraphs.com.
You can join our Facebook group at facebook.com slash group slash effectivelywild.
You can rate, review, and subscribe to Effectively Wild on iTunes and Spotify and other podcast platforms.
You can follow Effectively Wild on Twitter at EWPod.
And you can find the Effectively Wild subreddit at r slash Effectively Wild.
Thanks to Dylan Higgins for his editing and production assistance.
We will be back with a probably newsier episode sometime soon.
So talk to you then.
Oh, even headed, yes.
You've got in our numbers.
Oh. I'm your boss.