Effectively Wild: A FanGraphs Baseball Podcast - Effectively Wild Episode 1478: Multisport Sabermetrics Exchange (Soccer and Rugby)

Episode Date: December 29, 2019

In the fourth installment of a special, seven-episode series on the past, present, and future of advanced analysis in non-baseball sports, Ben Lindbergh talks to StatsBomb managing editor Mike L. Goodman about soccer and then professor and soccer/rugby analyst Dr. Bill Gerrard about rugby (27:20), touching on the origins of sabermetrics-style analysis in each sport, […]

Transcript
Starting point is 00:00:00 Hello and welcome to episode 1478 of Effectively Wild, a baseball podcast from FanGraphs presented by our Patreon supporters. I am Ben Lindbergh of The Ringer, and today we are continuing our seven-episode series, the Multisport Sabermetrics Exchange. This is episode four. If you missed the start of the series, we have already covered American football, basketball, hockey, cricket, tennis, and golf. Again, the goal here is to provide a primer on the past, present, and future of advanced analysis in each of these sports. And today we're talking about soccer and rugby, two team sports played on pitches in which one can kick the ball.
Starting point is 00:00:51 I guess one can kick the ball in many sports, but in soccer and rugby, it's something players try to do. So we'll start with soccer, and I am joined now by Mike L. Goodman, who is the managing editor of StatsBomb. He co-hosts the Double Pivot podcast, and he has worked and written for most of the same places that I have worked and written for, including ESPN Insider, and Grantland, and FiveThirtyEight, and The Ringer. Hey, Mike, how's it going? It's going well. We really have overlapped in a whole bunch of different spots. Yeah. I don't know who's following who. I guess you probably got to some of those places before I did, and I got to some of them before you did.
Starting point is 00:01:25 Yeah, it's a solid mix. [...] that lends itself very readily to statistical analysis, and the other end is, say, completely impenetrable and we can't figure out anything, roughly where would soccer be, if that's the one and the 10? An eight, maybe? I mean, I'm hard-pressed to come up with another major sport that would be harder to do this with than soccer. So it's on the low end then? So baseball's a 10? Oh, baseball's a 10, yeah. Soccer is a 2 or something. Yeah, exactly. Okay. It's up there.
Starting point is 00:02:08 Uh-huh. All right. And what, generally speaking, makes it so difficult to analyze? Yeah, I think there's probably two different things that interact. The first is that it's a very low-event sport. There are maybe 10 players in a season who will take 100 shots. And, you know, you compare that to the number of shots that a given basketball player would take, the number of at-bats or plate appearances that a baseball player will have. It's very hard to get enough sample size to do
Starting point is 00:02:36 the sort of robust things that you need to do with the sample size to make firm statements that you're sure are true. So that's one part of it. And the other part of it is there's just not a lot of things that it's easy to count. So you have shots and you have goals and then you move backwards to shots. And then when you move backwards beyond that, it's very, very hard to keep track of and get, you have assists, okay. And then you're moving into things like passes to shots, passes total. Then you're moving to things like tackles, where you're contesting for the ball. And then you're moving to things like interceptions or challenges. And then at every step along that path, there's a lot of subjectivity that gets added in around the margins of what is or is not an action in those moments. So it becomes very,
Starting point is 00:03:20 very hard to build the sets of data reliably that you would then even need to start doing some of the more complex analytic work. Right. And then you've got the continuous motion and you've got a lot of players on the field all the time. That's common to a lot of sports, I guess. But yes. Yeah. I mean, that's the other part of it, right, is everything I was talking about there was on what we call on-ball actions. And I mean, there's very little work, even now, that sort of incorporates a lot of off-ball positioning. And it's necessary, but it's really, really hard to get there. So can you give us a brief history of soccer analysis of the
Starting point is 00:03:57 sabermetric kind: when it started, how it kind of caught on, if it did, what some of the major breakthroughs have been? Charles Reep is sort of the man who's known as the founder of analytics. And there's actually a very good podcast on FiveThirtyEight that you can listen to that traces the history of him. But while he was the first person counting these things, calculating things, and he had influence within the English game for a time, it never caught on in any sort of a mainstream way. And then what you have is sometime around 2008, 2009, a company called Opta begins collecting soccer data, right? So for the first time, we have somebody saying there were this many shots in a systemized way. There was this much possession for this team and
Starting point is 00:04:45 this much possession for the other team. And so we start to see a little bit more then. Then somewhere around 2012, 2013, a bunch of people working in public. And this is probably, there's some work done in-house at Opta in very real ways. And you start to see around that time, some analysts move into clubs a little bit on the margins. There's still not much done publicly. And then around 2012, 2013, a number of people working publicly just sort of rip some hockey approaches directly from hockey and drop them on soccer. And so they take basically Corsi and Fenwick and all the things looking at shots that hockey does, right? Where they look at just like shot ratios, shot totals, and that's what moved to soccer. That gives you some decent information.
Starting point is 00:05:35 And then it turns out, which happened relatively quickly at this point, that a major difference between hockey and soccer is that soccer has a much wider range of shots than hockey does in terms of how likely they are to go in the net. And so what develops then is something called expected goals. And this is sort of what everything in soccer has revolved around since then. And there have been work done on it before. Is that just because the goal is bigger, the field is bigger? Or is it that simple? Or are there other reasons?
Starting point is 00:06:02 It's that. So, well, the big thing is headers versus feet, for starters, as well. So there's a wider variety of how you're taking the shots. But also, because of the way possession works, there's a wider variety of positioning on the field, not only for where the person is taking the shot from, but the factors leading into that shot, the body positioning of everybody else. Are you through on goal, no defenders in front of you? Do you have, like, four players between you and goal? Are you moving fast up the field? Have you had sustained
Starting point is 00:06:34 possession? It turns out all of these things matter to some degree or another in terms of how likely a shot is to go into the net. And so you can build a model, an expected goals model, where you just sort of calculate the chances of any given shot becoming a goal. And what I would say about these models is that they are not particularly precise to a given shot, but fairly quickly over data sets, they become quite good at both predicting themselves and predicting goals. And so that then sort of becomes the anchor that a lot of analytics gets built around going forward from that point. And that's sort of where we are now. I mean, there's stuff beyond that now, but that is the thing that has really finally sort of
Starting point is 00:07:17 trickled into the mainstream to some degree as a concept. And so that initial wave, that Opta data, that's presumably people watching video and charting things. And is that all proprietary? What's publicly available? So the actual stats themselves, and I should say that now StatsBomb is a competitor of Opta's, so we also are a company that collects data to use. There are sort of legal reasons for this, but unlike in American sports, all of these data sets are proprietary to some degree or another. So in effect, these data companies are selling their data to teams. They're selling their data to, say, gambling outfits that want it. They're selling their data to leagues to become the
Starting point is 00:07:56 official data provider of a league, right? So rather than, say, in American sports where, you know, I think it was the NBA that took Stats to court to preserve the right that, like, the basic stats of the NBA were the NBA's, the relationship is flipped. So a stats company can provide a league with official data. And how does this vary by league? Just because, unlike baseball, even more so than baseball, there are so many countries that play, so many leagues, so many different levels. Is there, like, a sabermetric-friendly league? Is there a spectrum there? Yeah, I mean, there's a lot going on. I mean, this is one of the challenges, right, of doing data collection, is that oftentimes you're looking at leagues that
Starting point is 00:08:41 maybe don't have complete video coverage. But yes, so there is a wide spectrum. Hopefully, if you're building models, the inputs account for that, right? They don't always. But I think generally speaking, publicly facing, right? If you're looking at, say, the top five leagues in Europe, that is sort of what I would maybe call like a first division cutoff. And while you can trace within the numbers, different patterns, different sort of what I would maybe call like a first division cutoff. And while you can trace within the numbers different patterns, different sort of tactics, strategies of how teams play, it may be like looking at college football, say, and saying one division – one conference really doesn't pass much and is still sort of a – God, all of my knowledge of NCAA conferences are years old at this point. It's better than mine, I can assure you.
Starting point is 00:09:28 But like, one will have a lot of spread offenses, one might have some very, you know, very specific running game attacks. So you will, but that stuff is fairly easy to see in the existing numbers. What becomes somewhat confusing or somewhat challenging is, okay, so how are individual talent levels going to translate from one league to another? Are you doing league adjustments? Are you hoping to build models that don't need league adjustments? And right, I mean, sort of inside the game, this, I think, holds across sports, where you're sort of marrying scouting to analytics at this point. But obviously what you're hoping to do is find players with skills that translate from one league to another.
Starting point is 00:10:11 So you are sort of building, I guess, like league profiles of being able to average six shots, say, in the Netherlands is very, very different. That makes you a pretty good striker. But if you can average six shots in Germany or in England, that's a really elite level. And it's not going to translate one-to-one. Is there a difference in parks and setting? Do you need to do park factors because of surface or size or climate or any of that? People have looked into this because it is like baseball in that there is not an official park size. Sort of in America, for example, in MLS, NYCFC is notable for playing at Yankee Stadium on a park that is tiny. And then you will sort of have people ask questions about, well, is the officially labeled size the same as the effective size that they're playing on?
Starting point is 00:11:00 But in Germany, and again, it varies league to league, in Germany they have standard park sizes. People have yet to establish a concrete impact of playing on one size versus another. I mean, you'll get sort of common wisdom tales. Like, for a long time Tottenham Hotspur played at White Hart Lane, which was a smaller stadium and a smaller field size. They've since moved. But people would say, well, the reason that they can't win big tournament matches at major parks like Wembley is that they are not used to playing on a pitch that's that big. And nobody's really ever been able to concretely show that actually to be
Starting point is 00:11:46 true. Right. Okay. So I know that Billy Beane obviously is associated with soccer too, but is there a Billy Beane and Oakland A's of soccer that is most associated with embracing these ideas? This is funny, because it's Fenway Sports Group, who own Liverpool. Okay. Uh-huh. They have for a number of years implemented stats at a very, very high level, this guy by the name of Mike Edwards there. And it is a funny story because several years, I mean, we're talking six-ish years ago now, notably, they parted ways with a manager. And on the way out the door, Brendan Rodgers, this manager, talked to some media and media wrote up stories very unfavorably about the degree of control that the manager did not have because some people close to ownership, you know, were involved in decision
Starting point is 00:12:25 making. There was a very famous quip about they weren't real football men. Instead, they were sitting in their air conditioned offices looking at spreadsheets. Sounds familiar. Right. I mean, like, right, exactly. Everything that you hear in these conversations is everything you hear everywhere else. But eventually, after he left, the sort of analytic side ended up getting promoted to director of football. And Liverpool are currently well atop the Premier League and defending Champions League winners. And look, they are also running extremely hot. They have had three or four major transfers in a row that have just been sterling successes, and I don't care how good you are at analytics. But one of the things you understand when you do
Starting point is 00:13:03 this is you're never that good. So obviously running hot always helps. But yeah, they are sort of notable for that. Okay. And are there major misconceptions that have been overturned? Is there like an on-base percentage equivalent of soccer? So the biggest one, again, comes surrounding expected goals. And it would be, for a long time, sort of common wisdom is that the way you are better at scoring goals is by kicking the ball or heading the ball better. And it turns out it is very, very hard to actually be significantly above or below average at finishing. You can do it on the margins, although to detect it takes years of work and a lot of guesswork and all these things because you have such small sample sizes. But the major difference is taking better shots. So teams and players that take better shots or prevent the opposition from taking better shots are the ones that really excel. It is much more likely that you will average, you know, the difference between a 15 goal scorer and a 10 goal scorer is not going to be that both of them have 10 expected goals and one is just good at scoring,
Starting point is 00:14:24 so scores 15. It's much more likely to be that one is going to have 15 expected goals and one will have 10. And I think this is something that is, again, it's slowly sort of working its way into the public consciousness, but is a pretty major change from, I think, the way a lot of people instinctively thought about the game. So has that been applied universally enough that it has changed the game, aesthetically speaking? Either you have like different types of players who are now favored or different shots that will be taken more often than they used to be taken.
Starting point is 00:14:54 It is hard to say whether this is a direct result of analytics or whether there is other stuff going on as, you know, the game changes a lot over the years. Tactics change. The ebb and flow of how teams are choosing to play changes. But what we are seeing notably in the Premier League this year is there are just less long distance shots, which tend to be bad shots. And there is increasingly an emphasis on getting the ball into the best areas to kick it with your feet and score. Like the best shots are close shots that you kick.
Starting point is 00:15:30 And then there are ways that you can create those shots, oftentimes using the width of the field to create cutback opportunities, right? So you get to the end line and pass the ball backwards into the center on the ground. And if you can create those opportunities, those are very, very high value opportunities. And for whatever reason, we have seen this season and the last couple of seasons somewhat decreasing shot numbers, but slowly increasing what I would say average expected goals per shot numbers for the best teams. Okay. And so is that or will that be perceived as a positive or a negative from a spectator perspective? Do people like long shots because, hey, the ball is flying through the air?
Starting point is 00:16:10 Or do they not like them because they don't pay off that much? I'm not sure what the answer to that question is going to be. There are a lot of – in baseball, the sort of aesthetic question of, like, was it better when people were stringing hits together, even though that was a less good way to score, is a very specific question. Like, that is the question about aesthetics. There are a lot of overlapping soccer questions about aesthetics because there's so much that happens without goal scoring. And so for a long time there have been sort of debates over, is it good, bad, just, unjust to be a team that is very defensive and wins, right? Like, should you aspire to play
Starting point is 00:16:53 pretty soccer and pass the ball around and score goals that way, rather than defending and lumping the ball long physically? And the way that the analytics stuff of this plays in is a little bit that it doesn't fall prettily on one side of that divide or the other. You can be a high possession team that plays what would be classically thought of as pretty soccer and create really good opportunities. Or you can be a team that does that and doesn't. Same thing for counterattacking. You can be an efficient counterattacking team. You can be a non-efficient counterattacking team. So I'm not really sure that what drives the sort of aesthetic questions is going [...] pointers and shooting at the rim going to break the game because it's just so much better than everything else? My hunch is there's just enough tactical variation in soccer and enough ways to skin a cat that you don't get there at least for a long time. Okay. And clearly the answer to this is probably it varies and it varies widely, but what's the level of adoption?
Starting point is 00:18:07 Does the typical team have an analyst? Are players contemptuous of this information or interested in this information? So I would say that it has not really gotten yet to a level where players, for the most part, are confronted with it. Inside teams themselves, at the top level, most teams have a thing that they call an analyst that doesn't pay well and that has really variable input into team operations. This is a little more tricky in soccer because there are really two fundamentally different paths here. One is, are you an analyst working with the play of the game? Because there are, of course, analytic applications to style questions, right? Are you working with the coaching staff to say, we should really get our players to stop shooting as much there,
Starting point is 00:18:56 or we should design training drills so even though we can't tell from data in games, we can tell internally who's good at shooting. Those kinds of questions. There's not a ton of adoption there. Some of the smartest teams do that. There's a little more uptake in the, you know, what we call the recruitment department, right? Where you're like identifying possible transfer targets for player acquisition. And because the global market for players is so huge and so varied, there's just a lot of low-hanging fruit. And so there has been somewhat more success just in terms of process, of analysts who go to teams having some ability to suggest approaches, suggest names.
Starting point is 00:19:39 But, I mean, certainly behind baseball, probably behind even basketball in American sports. And I just don't know enough about football and hockey to say comparably, but it's still at a pretty low level. And how is the technology progressing? Are we getting to player tracking, wearables, etc.? That is, I mean, it's not just soccer, right? Like that's the holy grail in being able to do that in basketball. Now, there are different reasons for it, right? In a sport like basketball or the NFL or football in general, it's about not having to have individuals log every action of every game.
Starting point is 00:20:18 In soccer, it's partly about that. But it's also about being able to track all the stuff going on off the ball, which you just can't do by, you know, manual logging. Now, my understanding of where the technology is now is it takes longer to do a game by video technology than it does for human beings to do it, in terms of like quality control and scrubbing the stuff. But, I mean, there's a lot of movement in that direction. And I don't know how close we are, but that is certainly what everybody is chasing. I should say there's one additional hurdle there, which is like, again, whose data is it? Yeah. Soccer just doesn't have the centralized agreements to install cameras in every ballpark
Starting point is 00:20:56 to pool that data and then to distribute that data. So that also is one of those things that once the technology is there, or maybe before the technology is there, has to get sorted out. And what do we think matters off the ball? What can you do? What should you be doing when you're not actually near the ball? Yeah, this is a momentous question, because every step removed from taking a shot, our thinking gets more theoretical and less tied to data. We are pretty good at understanding probabilities once the ball gets close to goal on either side. But my colleague at StatsBomb, Tom Lawrence, gave a talk at our conference,
Starting point is 00:21:43 and he called everything in the middle the valley of meh. And it is just so hard to accurately value actions in that valley. Because what you were talking about is, okay, how can players move without the ball to create openings in the defense that will then lead to better chances, right? You know, how can we identify not necessarily only the value of the pass a player is playing, but the value of the passes that he opted not to play in order to play that pass. And it is, I mean, this is a major reason why I said at the top that this is like a one or a two in terms of the difficulty of analyzing the sport, because it is very, very difficult. You know, you can look at the end result and say, okay, clearly this team is doing something right. But when you then try to pry apart the pieces, it's very resistant towards like fine grained analysis in that way. But clearly there are,
Starting point is 00:22:35 being able to move the ball from certain zones to other zones is very important. Being able to take it from your own part of the field on the sidelines, where you will often have it and get it in the attacking third in the middle of the field is hugely important. The question is, and I think there are many ways to do that, but the question is, you know, prying apart are some ways definitionally better than others and are, you know, some teams, some players, some whatever better at executing those strategies than other teams. And we're pretty far from that still on sort of a rigorous basis. And because there's so much movement and travel over the course of a game, how does the sports science aspect play into this? Is there much of an emphasis on different
Starting point is 00:23:15 types of conditioning or pacing? Yeah, there's a lot of it to a degree that actually when analytics was starting to get even the smallest of toeholds in the sport, people basically assumed you were talking about sports science, right? People assumed you were talking about distance run and stuff like that, which is tracked. And the smartest teams, the best teams do a lot of really interesting stuff with sports science in terms of, you know, because the other thing is the biggest and the richest clubs, they also all have their own development academies, right? The talent pipeline is different than in American sports. So they can get kids from the time they're six, seven, eight, nine years old and do all sorts of training and monitoring and testing of skills and
Starting point is 00:23:55 abilities and stuff to really see what makes good players. And so the smartest teams do a good deal of that. And then there's the monitoring of the running and top speeds and distance run and those things, which, you know, have they been, I am not aware in the public sphere of having anything concrete to say that like, this is what, there aren't really public studies showing a lot of you, like a lot of bottom line usefulness from these things. But at the same time, I don't think you're ever hurt collecting more data in that realm. All right. And just to end, I guess, are there any big unanswered questions or future developments that we haven't touched on?
Starting point is 00:24:37 We've sort of touched on them. The major one is going to be, is there a way to systemically measure all the passing and midfield stuff that happens? And once we have tracking data, will we be able to integrate it into that? And will that reveal to us either easier ways of doing it or more efficient ways of doing it? Will there be obvious things in that data that once sort of the technical hurdles of collecting it and processing it are achieved, reveal themselves? Or are we going to be similar to where we are now, which is like, it takes just so much work to pry, you know, information out of the data that we have, right? So, you know, now we look at a lot of things like progressive passing, which is just like, who moves the ball up the field? Well,
Starting point is 00:25:29 how often do they do it? Do they pass? Do they dribble? All these things. And like, that's useful information, but we have not yet been able to tie it systemically into like a model of creating like the end result of getting wins or goals. And so the question is, is having more data of not on-ball information going to make that process easier or harder? And is it going to get us there eventually? Okay. Well, I guess I don't envy you because this sounds difficult, but on the other hand, I guess you have a lifetime of employment ahead of you because these questions are – One hopes.
Starting point is 00:26:06 I mean the thing that I'll say here is that it is – I'll make a poker comparison actually. Like when I think about soccer analytics, it feels a lot like pot limit Omaha versus like limit or no limit hold'em in other sports, right? That you are finding and executing edges, but they are smaller and more variable than they are in other sports just because there is significantly less information available to you. Got it. Okay. Well, you can follow all of this at StatsBomb.
Starting point is 00:26:36 Mike is the managing editor there. You can hear him on the Double Pivot podcast, which is also on Patreon because again, we are all on all of the same sites, including that one. So thanks for doing this. This was fun. Thanks, Ben.
Starting point is 00:26:47 I appreciate it. Okay, we'll take a quick break now, and we'll be back in just a moment with Dr. Bill Gerrard to talk about rugby. Going here to there How we get lost How we get lost Counting numbers in the air Counting numbers in the air Okay, so I am joined now by Dr. Bill Gerrard. He is the professor of business and sports analytics at Leeds University Business School in the UK,
Starting point is 00:27:28 and he could be qualified to talk about any number of sports today. He's been a pioneering soccer analyst. He has worked with Billy Beane on multiple teams on the statistical analysis of soccer, but today he is here primarily to talk about rugby, which he has also been working on for at least 15 years in rugby union and rugby league, supporting teams and broadcasts. So he's done a little bit of all of it. So I'm happy to have him. Bill, hello. How are you? Thank you, Ben. Great to join you. So, yeah, as you've said, I've had my finger in a number of sports analytic pies over the years. So I will get to rugby in a second, but I guess since you have this relationship with Billy Beane,
Starting point is 00:28:18 can you tell me a little bit about what he was able to bring from baseball to soccer and what transferred over in terms of mindset or analytical principles? Yeah, it was just one of those friendships that grew up by pure chance in some ways. I started working in the area of soccer, initially looking at the financial background of soccer in the UK, looking at the financial performance of soccer teams. And then as data became available on what was actually happening on the field, my attention moved to that. I'm a qualified soccer coach. That was my principal interest, really,
Starting point is 00:28:58 was trying to apply my data analytical skills to what was happening on the field. And I was developing that approach just at the time that Moneyball was published in 2003. And I happened to be at the University of Michigan giving some talks on my work, and the prof who drove me to Detroit airport to catch my flight said, oh, there's a book that's just come out, it's you, but baseball. I picked up Moneyball at Detroit airport that Friday night, I think about six o'clock, and completed my first read by six the following morning. And I was taken by it because here was someone who was actually doing what I was theorizing about and trying to develop. Here was someone who was actually applying data analysis in elite sport. [...] partly because Billy had developed a real passion for soccer,
Starting point is 00:30:07 was a keen follower of the English Premier League and the UEFA Champions League, and had persuaded others within the Oakland organisation to develop an interest and a passion for soccer, and they'd acquired a franchise in the MLS for the San Jose Earthquakes. And so Billy and I got together, that basically the Oakland ownership were interested to see how far they could take what they were doing in sabermetrics and baseball, how that could be applied to soccer. So I worked with them as a technical consultant, not with a team as such, but with
Starting point is 00:30:46 mainly with Billy and the rest of the ownership group, just to look at what was possible given the data that was now available that they could access as owners: what could you do with the data in soccer? So that's how we came to work together, and we've kept up the friendship and the contact over the years. And subsequently, through Billy, I got involved with AZ Alkmaar, the Dutch soccer team whose general manager is a guy called Robert Eenhoorn, who the keen baseball followers amongst you will know. He's a Dutchman who played ball and went to college in the States and got drafted by the Yankees back in the early 90s.
Starting point is 00:31:34 And Robert's now moved back to the Netherlands and in 2014 switched from running the national baseball organization in Holland to soccer, and runs AZ Alkmaar, which is almost an Oakland-type team in that they're a relatively small team within the top division of Dutch soccer. And they're trying to compete with three, by their standards, very large clubs: Ajax Amsterdam, PSV Eindhoven and Feyenoord. And AZ have had a history of trying to do things differently,
Starting point is 00:32:19 working within a budget and trying to take on teams that had much larger budgets. And Robert, through his baseball contacts, got in touch with Billy. Billy became an advisor to the club, and one of his first pieces of advice was to get me involved. So I worked with Robert and with the first-team coaches in developing the use of analytics over a five-year period, from 2014 through until the end of last season, at which point, with a new head coach who's very analytically
Starting point is 00:32:57 orientated, we moved things on in terms of taking on a full-time local analyst based within Alkmaar. And they continue to go from strength to strength, and only yesterday beat Ajax Amsterdam to go level at the top of the table. So, yeah, it was a great opportunity. It's turned out to be a great friendship,
Starting point is 00:33:24 working with and learning from Billy and being able to see, you know, firsthand how the Oakland A's use analytics, and being able to explore how analytics can be used in other sports, particularly soccer. Right. And soccer, of course, is much more resistant to analysis than baseball. It's not structured in quite a way that lends itself to analysis so easily. So were there still things that Billy could bring over from baseball that were applicable to soccer, even if it's just general principles? Yeah, I think the key thing that Billy takes over into soccer is that mindset of teams who are trying to compete with resource-rich rivals and trying to do it through the use of analytics. That's transferable. So in some ways it's no surprise that the teams who've really grasped hold of analytics within soccer haven't necessarily been the teams who, you know, have spent the most on it and have employed the most analysts. The teams who I think got the most out of it are the teams who really
Starting point is 00:34:43 need it to close that resource gap, just as Oakland have over so many years. You know, Oakland, in terms of their budget, shouldn't really have been anywhere near getting to the playoffs for most of the time that Billy's been involved, and their record in terms of getting to the playoffs on a regular basis on their budget is second to none. And that's why it's teams like, as I said, AZ Alkmaar. Initially in soccer in England it was Bolton Wanderers, which was actually the team where I did my coaching badges and one of the first teams where, back in October 2004, I gave a presentation to the coaching staff and the support staff on Moneyball and its applicability to soccer. And it's really been those kind of teams. I think Manchester City, Liverpool and other leading teams in terms of resource have
Starting point is 00:35:42 spent a huge amount in terms of analysis, but the teams I think who've had the real sort of benefit in terms of getting an edge have probably been the Bolton Wanderers at the time, these days Burnley, and, you know, as I said, in Holland I know firsthand that analytics has been a key part in AZ Alkmaar's ability to compete. So I think it's the mindset. It's that approach of using all the available evidence, not just the numbers, using scouting reports, using all the evidence that's available in a systematic way to support coaching decisions, to support recruitment decisions. I think that's what you get from not just Oakland, who obviously have been pioneers,
Starting point is 00:36:31 but I think across a number of teams in the major leagues in North America, that very, very systematic approach to the use of data and evidence in general. So to switch over to rugby, one question I like to ask to start things off is to get a sense of how readily it lends itself to analysis. So on a scale of one to 10, where one is a sport that's completely impenetrable and impossible to analyze, and 10 is baseball, let's say, where would rugby rate roughly? I think rugby probably rates about six or seven in some ways. It's certainly more amenable, I think, to analysis than soccer.
Starting point is 00:37:19 Both are what I would call the invasion territorial sports. So they're the sports, all the forms of football, of hockey, basketball, all those sports where control of territory and positioning is absolutely crucial. And in a way, rugby is closer to, if you like, the baseball end of things because of the amount of structure. What's, in a sense, more challenging in soccer is the much greater freedom of play, the continuity of the play, that when the ball's turned over, the game continues and you get these multiple turnovers.
Starting point is 00:38:07 In rugby, rugby's much more structured with regard to the forms of the turnover, so that it's more towards, if you like, the gridiron end of the scale of the invasion territorial sports rather than soccer. So some of the turnovers, when possession moves from one team to the other, involve a continuity in play, but a large number of the turnovers actually lead to a restart in play, and a very structured restart, either with the ball having gone out of play and being
Starting point is 00:38:39 thrown in from the side, or where some form of penalty has been given away, where the game restarts with what's called a scrum, basically a set piece where the two sets of forwards engage, the ball's rolled into this scrimmage and it comes out and turnover starts. So you've got a very, very structured game, so being able to analyze the tactics associated with those set-piece restarts, that gives you a structure that's missing within soccer because of the free-flowing nature and continuity within soccer. So can you give me a brief history of rugby analysis or sabermetrics, when it started and
Starting point is 00:39:33 how it kind of caught on to the degree that it has, and maybe what some of the major breakthroughs have been? Sabermetrics, that data analysis generally within sport, clearly, you know, its roots lie obviously in baseball, with the pencil-and-paper methods. Almost, you know, from the year dot we've had pretty extensive performance data within baseball because of the focus, the structure around a one-to-one contest between the pitcher and batter. Soccer, the invasion territorial sports in general, lagged way behind, because really up until the
Starting point is 00:40:17 late 90s, the only systematic information that was kept on soccer, on rugby and other territorial games was who played, who scored, and basically who'd been a bad boy or girl in terms of discipline, because that's all the information effectively that the governing bodies needed to run leagues and tournaments: you know, who was playing, to check that the players were eligible to play; who scored, to determine what the outcomes of the games were; and then discipline, to determine, you know, again, which players had to be, if you like, forcibly benched because of bookings and red cards. So it really wasn't until the late 90s that we started to get systematic data for games, as companies developed either systematic video
Starting point is 00:41:15 analysis routines or developed their own tracking systems. So that data really only started to become available in the late 90s within, for example, the English Premier League. But it was commercial data. It was data that was held by the teams, bought and paid for by the teams. So it wasn't data that kind of got out into the open publicly, other than some bits of that data being published in the media. So when I started putting together the data and the analysis on my own, I was fortunate that one company decided to publish a yearbook, and they did it for four seasons for the English Premier League. So I was able to get, you know, the data on about 35, 36 data variables on what all the players did. So it was season totals for about 36
Starting point is 00:42:17 variables over four seasons, so I was able to aggregate those and start to analyze on that fairly limited data set. So we're really talking, you know, I had four seasons, 20 teams each season, so I had 80 rows of data effectively. But I started to put together a model of what determined success in soccer, and started to then develop player rating systems. So this was all in the period really sort of 2002 through to 2005, 2006. At that point the data that was available, as I said, was primarily owned and held by the clubs, and there was relatively little use being made of it. And the breakthrough only came, you know, gradually. Moneyball played a huge part, and particularly the film. I think the book
Starting point is 00:43:15 in 2003 started to get some pickup in soccer and within Europe, but I think the later film also helped. It was easier to spend an hour and a half watching the film, I guess, for busy coaches than it was reading the book. So gradually you found that teams were picking up on it within soccer through 2007-2008. Bolton Wanderers, as I said earlier, played a role in this. And you now find that a number of the staff that worked at Bolton Wanderers on performance analysis under Sam Allardyce in the period 2003-2007, many of them have gone on and now work at Manchester City and Liverpool. So it's really no surprise in a sense that those two clubs probably are the most developed. But in rugby it was the same timeline, the same sort of schedule. In rugby, again, the data was available to clubs, but very little was
Starting point is 00:44:21 being used. And I was fortunate that I had an introduction to Saracens, who were a club that had been around for about 100 years and they'd won one trophy. They'd never won the league. And they were taken over by a South African consortium who installed as the director of rugby Brendan Venter. Now, Brendan had been a player himself, had played for South Africa. In fact, he was a replacement in the World Cup final, the Mandela World Cup final in 1995 that was turned into the Invictus film, and Brendan was a replacement that came on in that game. But he's a qualified medical practitioner. In fact, he's got his own medical practice just outside Cape Town. And he was appointed by Saracens, and he very much took
Starting point is 00:45:20 to coaching rugby the same philosophy as he applied in medicine: evidence-based, and "I deal with people, not diseases. I treat people, and I deal with players as people." And he brought that philosophy into Saracens, and we were introduced through a mutual friend, and I got on board in 2010. And at that point, there was virtually no data analytics being applied within rugby. There was the same sort of data available that had been available to soccer teams. But beyond providing some summary statistics, there was very little data analytics at all within rugby
Starting point is 00:46:07 union. Saracens really broke the mold in that sense. And I worked with them for five years developing an evidence-based approach, initially using data that was the coaches' own data as they analyzed Saracens' own performances, and then gradually we moved on to using the commercial data that was provided in all games, and I started to develop a more data analytical approach to opposition analysis. We also started to apply some data analytics to benchmarking the team, for player recruitment purposes and so on. But that systematic approach to using performance data and analyzing that really just began with Saracens.
Starting point is 00:46:59 And as with Oakland, there's nothing like success to breed imitation. And that's what happened. Saracens went from being a team that had won virtually nothing to becoming the strongest team in European rugby. And as other teams looked to try and emulate that, one of the things they picked up on was the use of data analytics. But I think it also was more generally the culture that Saracens had.
Starting point is 00:47:31 They had a very supportive structure with regard to looking after players and the families of the players and creating a real togetherness within the club. In fact, when I first went there, walking into Saracens was totally different than going into a soccer team, in that going into Saracens had much more of the feel of being involved in a collegiate team. There was that in terms of the atmosphere around the club. It wasn't a club that was filled with superstars. It gave you the feel of guys who'd been, you know, good players at school, who'd been good enough to keep on playing at
Starting point is 00:48:18 college and extended that on. And a lot of the players, I should add, were undertaking, supported by the club, further studies, doing postgraduate studies and so on. And the players also, led by their captain, bought into the use of the analytics. So it became part of the way that things were done. It wasn't just the use of analytics with the coaches to help develop game plans and help recruitment. The analytics fed through into the way that the players were coached, and the players themselves embraced the analytics. The team captain, who has gone on to become, after he retired, a coach himself, was copied into all of the reports at a team level. And so we had buy-in from the players as well as the coaches. So it was
Starting point is 00:49:15 that kind of approach, which now is becoming much more typical across teams. But we're now, you know, something like nine, ten years down the line, and as I said, Saracens have been very, very successful, and that has led other teams to try and emulate the approaches that Saracens developed. So in rugby specifically, are there any major misconceptions that have been overturned by this new data and this new type of analysis? Are there certain players who have become more highly valued or certain strategies that have become in vogue or fallen out of favor as a result of this new way of looking at the sport? Well, I think what it's clarified, there's always been a debate within, well, I think all the territorial invasion sports, you see the
Starting point is 00:50:04 similar sort of debates in soccer, but within rugby, that debate over, if you like, possession versus territory. And Saracens adopted a style of play that was ingrained. I got involved about six months down the line in the new project. So, you know, year zero for Saracens was 2009 and I got involved early 2010. One of the things I did in analyzing the data across teams was to, you know, show that the intuitions and the expertise of the coaches at Saracens
Starting point is 00:50:45 and what they thought was a winning formula was actually supported by the data. But I hasten to add that that style of play, which was very much an emphasis on territorial play, in particular, I guess the easiest way to think about it is that Saracens recognise that you minimise the amount of play in your own half. And so in your own half you kick for territory. You play very little ball-in-hand passing rugby in your own half, but principally you kick for territory, you kick for position and get the ball into the opposition half. And the reason that's so successful, in a sense, is, if you think about it, the more you play ball-in-hand rugby in your own half,
Starting point is 00:51:42 it's incredibly hard work to move the ball through possession into the opposition half. And the more you play in your own half, the more you allow the opposition to defend in your half, and there's more chance that you'll be turned over. And the closer you're turned over to your own defensive line, the more likely you are to concede points. So what Saracens typically did, and the core of their play to this day still emphasizes this, is to minimize the amount of play. That doesn't mean to say you don't pass the ball in your own half, but you limit the amount of play in your own half and then concentrate on the ball-in-hand play in the opposition half, so play it
Starting point is 00:52:27 kicking for territory. And so Saracens have won games convincingly with, you know, as little as 25, 30 percent of the possession. They try to, you know, push teams back. It's what soccer, picking up on basketball, calls a deep press, and that was the way that Saracens play. They'd like to kick the ball long into the opposition territory and then get on top of the opposition and press them deep in their own half, you know, principles that had come out of other territorial sports, basketball, and been embraced in soccer. And the data very much fits with that, you know, some core principles, and also the importance of discipline. Teams that give away relatively few penalties, are very disciplined in the way that they play, again, that came through, and it was something that Saracens put a huge emphasis on,
Starting point is 00:53:28 was trying to be penalized as little as possible. But yeah, the data analytics very much showed up that possession, in its own right, isn't the way to win. It's not so much how much of the ball you have. It's what you do with it. It's quality over quantity. And I think that lesson goes right across the invasion territorial sports
Starting point is 00:53:55 and particularly in soccer where, you know, some teams have made it an art form to be successful on relatively limited possession. And certainly within rugby, that's been the Saracens philosophy, and it's been backed up by the data. And is that Saracens style of play seen as spectator-friendly, an entertaining brand of rugby? Because in sports, sometimes the analytically optimal strategy is not necessarily the most entertaining strategy. And so in baseball, you have a lot of strikeouts now, which is generally seen as a negative thing, I think. Whereas in American football, maybe you have more passing and that's seen as exciting.
Starting point is 00:54:37 So has that made a difference in rugby? Do people like this or not like it? Oh, I think you find similar arguments, going back almost as long as the games have been played, both in soccer and rugby union, where you have those who want to see much more possession-based play, as opposed to the less spectator-friendly territorial play. And those arguments certainly have a long history. And Saracens, certainly as it became more publicly known about the use of analytics, were criticised, and I myself have been criticised, in much the same way as some of the early developers of the use of data analytics within football, within soccer, who were associated with styles of play that minimise the amount of possession and put the emphasis on moving the ball forward and so on.
Starting point is 00:55:49 I, as much as anyone, love to see the skill and excitement involved in teams in soccer such as Barcelona and so on, who keep possession and move the ball around quickly. Similarly, in rugby union, you know, it is exciting to see teams running with the ball from the tape. But you've got to marry that up with, you know, professional sport is not just about entertainment. Part of the entertainment is trying to win. And there's always going to be that tension between, if you like, the style of play and the effectiveness of the play. And, you know, I fully understand that.
Starting point is 00:56:36 My role as a data analyst is to analyze the data and give as good advice as I possibly can for the team to achieve its objectives. And, you know, the objectives for a professional sports team are to win. And so it's not to say that you can't be successful with a possession-based game. Barcelona clearly in soccer over many years have shown, with the quality of the players they've had, particularly Messi, that they've been incredibly successful with a possession-based game. But, as I said, certainly the analysis shows across the board that you don't need to have high levels of possession to be successful. But, of course, there is that trade-off between the exciting, very spectator-friendly styles of playing and
Starting point is 00:57:35 being successful. And often, if you don't have a Lionel Messi on your team, sometimes you have to use a more artisan approach than an artist approach to be successful and to win. And you've worked with teams in both rugby league and rugby union, which, for people who don't know, which included me before I just did some reading about it, are two different variants of rugby that diverged in the 1890s and have some different rules. So are the same principles applicable in both? Are there certain strategies that you've uncovered that are more favored in one or the other? Yeah, rugby league's an interesting one. Rugby league's much more akin in terms of its structure to gridiron. And as you rightly say, the history of the game was really that rugby union split back in the late 19th century. And it split really because the northern teams typically tended to be working-class teams and the players had to be paid. So you had the
Starting point is 00:58:42 professionals employed by northern teams, whereas the southern teams tended to mainly be amateur. And what happened was that basically those tensions, the professional northern teams tended to be successful in national competitions, and eventually the sport split. So rugby league very much was the game domiciled in the northern part of England. These days, you know, I live and work in Leeds, and Leeds Rhinos are a very successful rugby league team. But the structure of it is that effectively rugby league was very much driven in some ways by
Starting point is 00:59:27 making it spectator-friendly. So they took out the set pieces compared to rugby union. They took a couple of players out of the team as well, so there were 13 players a side rather than 15. And basically teams, if you like, in gridiron terms, were allowed six downs and then the ball was handed over. So if you haven't scored by the time you're on your sixth tackle, the ball is then handed over.
Starting point is 00:59:56 So effectively teams get six plays to score. So it's a very fast-moving game. Of all the team sports I've analyzed, it's the game that is statistically the most predictable. The whole structure of the game is such that if you get your KPIs right, in other words, if you make your tackles and don't miss your tackles, if you carry the ball and make metres, and you're able to break the defensive line,
Starting point is 01:00:28 those three elements, then you will win in most games. The basic KPIs, the basic key performance indicators, typically explain 80% to 85% of the variation in scoring in rugby league games. And it's that structure. The game is structured so that, unlike soccer and rugby union, if a team scores, the team that has conceded the score
Starting point is 01:00:57 has to kick the ball back to the opposition, so that the opposition then get another chance to score. And so what you get, particularly in rugby league, is, as the momentum builds up, teams that score and are ahead in terms of the basics of the game, the KPIs, tend to be able to turn that into scores. For the media company that I was working with, I developed what was called the performance gauge, which was an algorithm that, based on about seven KPIs, predicted what the margin should be at any point in the game.
Starting point is 01:01:45 And as one of the commentators who used it said, it was uncanny how often the performance gauge would flash up and suggest that a team should be a score ahead, further ahead than they were. And that team almost inevitably scored within minutes of the performance gauge showing this. I should say the team didn't see the performance gauge. This was a little icon down in the corner of the screen for the TV viewer.
Starting point is 01:02:15 But all I was able to do was develop an algorithm that really picked up the basics of rugby league as a sport. It's one that's, you know, unlike football, soccer, I should say, where you can attack for 89, have possession for 89 minutes, and if you don't turn that into a goal and the opposition go down the other end in the 90th minute and score, you've lost the game,
Starting point is 01:02:40 even though you've completely dominated possession. That typically doesn't happen in rugby league. If you make the meters, make the breaks of the defensive line, and don't miss your tackles, there's a very high probability that you're going to win the game. So last question about the future of analysis in rugby. Is there potential for new technology? Are there player tracking systems being set up? Will
Starting point is 01:03:06 there be wearable technologies? Are teams doing innovative things when it comes to training and conditioning and injury prevention? Rugby union and rugby league are pretty well developed on those latter parts, certainly in terms of, you know, the use of tracking devices in training and trying to minimize injury, or at least limit the probability of injury, and soft tissue injury. That's very well developed. I think where we're at in terms of the development of analytics within rugby is that it's one of the sports, both rugby union and rugby league, along with Australian rules,
Starting point is 01:03:49 that's got a relatively long history of players wearing tracking devices in competitive games. Now, soccer's never allowed that, whereas rugby union, rugby league, Australian rules, they have. And so what we're finding now is that rugby union in particular is at that stage where, with the tracking devices, the GPS monitors that players have been wearing, typically to date the data from that's been used by the sports scientists to evaluate workloads in games. However, what we're now beginning to move to is that, of course, the GPS data is showing where all the players are at any point in time.
Starting point is 01:04:35 So, you know, you're getting something in the order of, what, 24 data points per second in terms of the positioning of players. And so much tactically within invasion territorial sports is about the positioning decisions the players are continuously making. And that's where we're at now, beginning to use this tracking data for tactical analysis rather than just limiting it to physiological analysis. And that's where I'm finding, for example, I think the developments are ongoing in rugby union
Starting point is 01:05:11 and to some extent in soccer, although using tracking based on video, that are beginning to try to do on those bigger fields what's been done in the NBA, in basketball, in terms of the analysis of the positioning of players. And so I think a lot of the tracking data and analysis of territory that's been done and is being done in basketball is now feeding through, certainly into rugby union, and allowing teams to be able to evaluate much more systematically the positioning decisions of players, and helping players develop their ability to pick up, particularly when there's a turnover and teams move from offense to defense, for players to
Starting point is 01:06:08 move to the appropriate, their optimal, defensive position, and being able to track how well players do that. So that, I think, is probably where the leading edge is, certainly in rugby union and increasingly so in soccer. Got it. And some baseball teams, I know, have hired people from rugby teams to staff up their sports science departments, since rugby and other European sports were kind of ahead of baseball when it came to injury prevention and conditioning and that sort of thing. So it's been interesting to see that cross-pollination. So this has been very enlightening. You can find out more about Bill and his work at his website, winningwithanalytics.com. And we thank you very much for your time.
Starting point is 01:06:55 Thank you for the invitation, Ben. I've really enjoyed it. All right. That will do it for today. Thanks for listening. We are more than halfway through the Multisport Sabermetrics Exchange. So our plan is to record a regular episode about baseball and everything. So that will be the next podcast to appear in your feed, and then after that, we will return to wrap up this series by the end of the
Starting point is 01:07:14 week. Next up will be esports and volleyball. And this little endeavor we're doing here on the podcast is something that's happening out there in the world, too. This meeting of minds across sports. After I recorded the soccer segment with Mike, I read an article in The Athletic about how Alex Cora, the Red Sox manager, flew over to England and met with the Fenway Sports Group in Liverpool in early November. As Mike was saying, Liverpool's been a pioneer when it comes to analytics and soccer, and it's also owned by that same group that owns the Red Sox, so they're kind of comparing notes and figuring out what they can learn from each other. I know people who work in non-baseball sports have been reading The MVP Machine,
Starting point is 01:07:49 and people from basketball teams and football teams who found out about Driveline Baseball because of the book went to Driveline to see if they could learn anything from baseball development that could apply to their own sports. Just felt like the right time for this series. You can support Effectively Wild on Patreon by going to patreon.com slash effectivelywild. The following five listeners have already signed up and pledged some small monthly amount
Starting point is 01:08:11 to help keep the podcast going and get themselves access to some perks. Danny P, Michael Sweeney, Terry Spencer, Chris Ruppar, and David Becker. Thanks to all of you. You can rate, review, and subscribe to Effectively Wild on iTunes and other podcast platforms. You can join our Facebook group at facebook.com slash group slash effectively wild. You can keep your questions and comments for me and my regular co-hosts Sam and Meg coming via
Starting point is 01:08:34 email at podcast@fangraphs.com or via the Patreon messaging system if you are already a supporter. Thanks to Dylan Higgins for his editing assistance, and we will be back with another episode very soon. Talk to you then. See you next time.
