Modern Wisdom - #297 - Brian Christian - The Alignment Problem: AI's Scary Challenge
Episode Date: March 20, 2021
Brian Christian is a programmer, researcher and an author. You have a computer system, you want it to do X, you give it a set of examples and you say "do that" - what could go wrong? Well, lots apparently, and the implications are pretty scary. Expect to learn why it's so hard to code an artificial intelligence to do what we actually want it to, how a robot cheated at the game of football, why human biases can be absorbed by AI systems, the most effective way to teach machines to learn, the danger if we don't get the alignment problem fixed and much more...
Sponsors: Get 20% discount on the highest quality CBD Products from Pure Sport at https://puresportcbd.com/modernwisdom (use code: MW20) Get perfect teeth 70% cheaper than other invisible aligners from DW Aligners at http://dwaligners.co.uk/modernwisdom
Extra Stuff: Buy The Alignment Problem - https://amzn.to/3ty6po7 Follow Brian on Twitter - https://twitter.com/brianchristian Get my free Ultimate Life Hacks List to 10x your daily productivity → https://chriswillx.com/lifehacks/ To support me on Patreon (thank you): https://www.patreon.com/modernwisdom - Get in touch. Join the discussion with me and other like minded listeners in the episode comments on the MW YouTube Channel or message me... Instagram: https://www.instagram.com/chriswillx Twitter: https://www.twitter.com/chriswillx YouTube: https://www.youtube.com/ModernWisdomPodcast Email: https://www.chriswillx.com/contact Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
Hello friends, welcome back. My guest today is Brian Christian and we're talking about AI's scary challenge: the alignment problem.
Let's say that you have a computer system and you want it to do X.
You give it a set of examples and you say, well, do that. What could go wrong? Well, lots, apparently. And the implications are quite terrifying. So today, expect to learn why it's so hard to code an artificial intelligence
to do what we actually want it to, how a robot cheated at the game of football, how human
biases can be absorbed by AI systems, the most effective way to teach machines to learn,
the danger if we don't get the alignment problem fixed, and much more. This is a topic I think everybody should be
educated on. As the world becomes increasingly dependent on artificial intelligence, you need to
understand the implications if we have a civilization built upon machine learning when we don't know
that the machines are actually going to be aligned with what we want them to do. So today, hopefully your curiosity will
be satisfied and your pants might be pooped a little bit, because it's quite, quite grave and
quite scary if we don't get this right. But now it's time to learn about the alignment
problem with Brian Christian.
What does the quote "premature optimization is the root of all evil" mean?
So this line comes from Donald Knuth, who is one of the,
I think of him as kind of like the Yoda of computer science,
just dispensing these gems of wisdom.
And there are many, I think, like many aphorisms, you can take it in a number of different directions.
One of the ways that I think about it is,
you know, a lot of the way that we make progress
in math and computer science is through models.
You make a model that sort of approximates the phenomenon
that you're trying to deal with.
There's a great quote from Peter Norvig, another one of these luminaries in computer science.
He's quoting someone from NASA saying, our job was not to land on Mars.
It was to land on the mathematical model of Mars, provided to us by the geologists.
So this idea that premature optimization is the root of all evil,
I think if you kind of mistake the map for the territory, so to speak,
if you forget that there's a gap between your model and what the reality actually is,
then you can commit yourself to a set of assumptions that are later going to bite you.
And so this is the sort of thing that people who are worried about AI safety, you
know, this is what keeps them up at night. What is the alignment problem? That's
what we're going to be talking about today. We might as well define our terms.
Yeah, so the alignment problem is this idea in AI and machine learning of the potential gap between what your intention is when you build an AI system or a machine learning system and the actual objective that the system has.
So it's the potential misalignment, so to speak, between your intention, your expectation, how you want the system to behave,
and what that system ultimately ends up doing.
Why does it matter?
I mean, this is a fear that has existed in computer science
going back to at least 1960.
So Norbert Wiener, the MIT cyberneticist,
was writing about this.
And, you know, he says, if we use, to achieve some purpose,
a mechanical agency that we can't interfere with once we've
started it, then we had better be quite sure that the purpose
that we put into the machine is the thing that we really
want.
And I think for a lot of people, increasingly since like 2014, it has become
more and more mainstream within the computer science community itself to think of this
as one of the most significant challenges facing the field as we sort of enter this era of AI: that we may develop systems where we, with the best of intentions, try to encode
some objective into the system.
The system with the best of intentions attempts
to do what it thinks we want,
but there's some fundamental misalignment
and that results in whatever the harm may be,
whether it's dark-skinned people not getting recognized by a facial recognition system, or disparities in the way that parole is being dealt with, up to societal-level harms.
It could be self-driving cars that fail to recognize jaywalkers, and so they kill anyone who's crossing in the middle of the street because there were no jaywalkers in their training
data.
All the way through to some of the so-called existential risks, the idea that we may
actually throw society as a whole off the rails by some system with enough power to shape the course of human civilization, but without
the appropriate wisdom to know exactly what to be doing.
Paperclips.
Everybody turns into paperclips.
Exactly.
Yeah, so the paperclip maximizer is kind of the classic thought experiment that goes back
to Nick Bostrom and Eliezer Yudkowsky.
And I think there's, to my mind,
there's kind of a significant culture shift
that happens within the field of AI around 2015.
And that is that, in some ways,
we no longer need the paperclip maximizer thought experiment
because we have this like growing file folder
of actual real world alignment problems gone wrong, you
know, whether it's social media, you know, we optimized our newsfeed for engagement and
it turns out that, you know, radicalization and polarization is highly engaging. So,
you know, we paperclip ourselves.
This is this quote from Norbert Wiener in your book, which I really thought encapsulated
what we're talking about nicely.
In the past, a partial and inadequate view of human purpose has been relatively innocuous,
only because it has been accompanied by technical limitations.
Human incompetence has shielded us from the full destructive impact of human folly.
Awesome.
Just frames so much of what we're talking about.
So basically, is it a balancing act between technological capability and technological wisdom?
Largely speaking, I would say yes.
And Wiener used the phrase know-how versus know-what.
And increasingly, we're seeing this as a kind of a paradigm shift in the field of computer
science, broadly speaking.
So the standard AI textbook that is used across the world, it's called Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig,
and they've just now released their updated fourth edition.
And one of the things that we're seeing in the new edition
is kind of this focus shifting from,
you have some objective, what is the Swiss Army knife of tools in your toolkit
to optimize that objective?
We're now shifting to, let's take for granted that you can optimize whatever objective you
have.
How do we figure out what is the right objective to actually encapsulate all of the things
that you really want?
And so, I mean, part of what I think is very significant is that we are starting to
turn this question of know-what, in Wiener's terms, into one of the central pillars of what it means to do AI research at this point: how can we find mechanisms for essentially importing all of these complicated human desires, human norms, into the language of optimization? So that's the optimistic side of the story.
That's the hope is that we can actually make a science out of the sort of the know what part.
Yeah, well, I mean, surely we don't even know what we want as humans right now. Anyway, I mean, the fields of ethics and morality are still contested from human to human, so how are we going to code a poorly defined concept into a language that machines can read?
Well, unfortunately, you know, it's happening; whether we're doing it well or not, we are doing it.
And, you know, you see this at, you know,
at YouTube, at Twitter, at Facebook,
there is this process of reading user behavior
and trying to figure out from the stuff people click on,
the stuff people engage with, what it is they appear to want.
And I think people underestimate the sophistication of these systems.
Like we have these intuitions that, okay, you know, Facebook or Twitter, whatever is tracking
everything I click on.
Oh, that's the tip of the iceberg.
They're keeping track of how many milliseconds
is a particular ad on screen.
So even if you're just scrolling through
and you hesitate ever so slightly to read a piece of text
or look more closely and then move on, they know that.
And as you say, we have a very impoverished sense of how to map human behavior to human
desire or human value.
And there's often this tension, right?
You know, Daniel Kahneman tells us that we have these system one and system two, and they're
often at odds with each other.
You know, there's a reason that supermarkets put the candy right next to the register so
that you don't have time to second guess yourself. I think there's a genuine challenge for
social media companies, which is how do you, even if you're acting with good faith, even
if all you want to do is what users will want, how do you distill some notion of what they want out of their behavior?
And one example that I give in the book,
someone I know is an alcoholic, has an alcohol addiction.
And social media companies have found out
that if you show images of alcohol in an ad,
this person will linger. And that creates this horrible feedback loop.
And so I think there really is this question, which is, you know, to borrow a phrase from
Nick Bostrom, this is philosophy on a deadline. There are these open questions in not just ethical philosophy, but cognitive science,
neuroscience even.
But we don't have time. In a way, we don't have time to wait for the answer because
these companies are just going. And so we're going to have to try to essentially fix the plane mid-flight.
Yeah, I mean, I read Superintelligence, which is kind of, I guess, the seminal book on this, at least it was in the mid-2010s. And anyone, anyone who really wants to kind of get a good overview, I think that's a fantastic place to go. Stuart Russell's Human Compatible as well, his new one, is awesome. And then if you want to terrify yourself about everything else as well as AI, Toby Ord's The Precipice.
Like that's my perfect three book garage
for existential risk right there.
But upon reading that, that really kind of opened my eyes
to just how big the potential dangers
are that we're playing around with here. I struggle to feel optimistic
and I don't know whether that is because the scary news stories make the headlines or
because AI programmers tend to, er... Now, it would appear that from 2014, I think you talk about the difference one year makes in going to a conference: someone brought up AI security one year and was kind of laughed out of the room, brought up AI security the next year and nobody raised an eyebrow. So there really was a very rapid pivot that I think Nick Bostrom
can probably take a good amount of credit for. I agree, and Stuart, yes.
Yes.
I wonder whether the AI researchers are now so cataclysmic, so existentially aware that
perhaps they're only showing us the terrifying stuff. I don't know. Yeah, I mean, I think it's worth making a little bit of a distinction between the actual AI
safety research community, which is a small subset of AI researchers themselves. And even
people who self-identify as AI researchers, there was a much larger community of data scientists that do the more sort
of day-to-day machine learning work at actual companies. So it may be the case that, you
know, the negative stories are coming to the fore within the group of people that's kind
of committed to working on these
things. And I think that's a good thing. I mean, part of the story that I try to tell in this book is
the, you know, the birth of that community, really the beginning. I sort of think of books like
Superintelligence as a kind of cosmic, you know, breaking the glass and pulling the fire alarm. And so, to continue that analogy,
I think about the beginning of this AI safety research
movement as the first responders have arrived at the scene.
And so I think there's a constructive story
to be told about we're actually getting things done.
We're meeting that challenge.
There is a question of how does that fit into kind of the broader culture of AI?
I think a lot of people are just in some ways more heads down,
just trying to solve actual business problems.
And so there's a question of how do we import some of the insights that we have been learning
in the safety community into the more sort of user-facing stuff.
So I think that's part of the process as well.
Have you got some good examples that can explain to the listeners how alignment works or how it doesn't work?
Yeah, I guess those are two different questions. I mean how it doesn't work
There are many ways, right? So there's this question of
You know, you have a system, you want it to do X, you give it a set of examples and you say, you know, do that, do this kind of thing. What could go wrong? Well, there's this laundry list of things that could go wrong. So one thing is the examples that you gave it, there's
some kind of fundamental mismatch with the reality. And so one of the things that you hear
about a lot is kind of racial demographic mismatch.
So there's a facial recognition data set
developed in the 2000s that was built
by scraping images from newspapers online.
And so the composition of faces in the data set mirrors
the types of people that appeared in kind of first world, you know, North American,
you know, European newspapers in the 2000s.
So the most prevalent person in this data set is George W. Bush, who is the US president
during that time.
And so it turns out, for example, there are twice as many pictures of George W. Bush as
there are of all black women combined.
But in the real world, there are not twice as many George W. Bush as all black women combined. And so you have this fundamental mismatch between the set of examples that you've used and the actual kind of reality. And so this is something that
the technical name for this would be robustness
to distributional shift. So is your system capable of handling the fact that the examples
that it encounters out in the real world are coming from some different distribution than
what it learned on, you know, when you were training it. So that's a fundamental question is just what is the kind of data provenance?
What examples went in and does that match the environment that you're going to deploy
it in?
And there are many, many examples of this.
I think the racial bias stuff is obviously made headlines, but there are subtler examples.
For example, there's a Google system where, upon inspection, it turned out that the color red was intrinsic to its classification of something as a fire truck. And most fire trucks in the US are red; in the UK, I think they're also red. In Australia, they're white and neon yellow.
And so that model would not be safe to deploy in your self-driving car
if you were in Canberra or something.
So that's, I think, a very fundamental category of thing: the examples that you've given it, do they match the sorts of things it's going to see in the real world?
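As a rough illustration of that data-provenance check, here is a minimal hypothetical sketch in Python (category names and counts are invented, not from any system Brian describes): compare the make-up of the training data against a sample of what the deployed system actually sees, and flag categories whose share has shifted.

```python
from collections import Counter

# Hypothetical sketch of checking "robustness to distributional shift".
# Category names and counts are invented for illustration only.
train_labels = ["red_fire_truck"] * 950 + ["white_fire_truck"] * 50   # training data
field_labels = ["red_fire_truck"] * 200 + ["white_fire_truck"] * 800  # what the deployed system sees

def proportions(labels):
    """Return each category's share of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

train_p = proportions(train_labels)
field_p = proportions(field_labels)

# Flag any category whose share in deployment differs a lot from training.
for category in sorted(set(train_p) | set(field_p)):
    gap = abs(train_p.get(category, 0.0) - field_p.get(category, 0.0))
    if gap > 0.2:
        print(f"possible shift in '{category}': "
              f"train={train_p.get(category, 0.0):.2f}, field={field_p.get(category, 0.0):.2f}")
```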
And the second large category is what's called the objective function.
So every machine learning system has some numerical specification of what it is that you
want the system to do.
And often things can go very subtly but importantly wrong in how you've specified what you want the system to do.
So one example of this, one of my favorite examples comes from this robotic soccer competition that was being held in the 1990s.
And this team from Stanford, including Astro Teller, who's now the head of Google X,
they decided they would give their robotic soccer team
this little tiny incentive for taking possession of the ball.
So the overall goal was like to score points and win the game,
but as kind of an incremental incentive,
they were awarded the equivalent of like a hundredth of a goal for taking possession of the ball because they thought that would
incentivize the right sort of strategy.
You can't score until you have the ball, et cetera.
But what their robots learned to do was to just approach the ball and then vibrate their
paddle as quickly as possible, taking possession of the soccer
ball a hundred times a second. And this was much easier than actually scoring points.
So there are many, I mean, this is just one example, but it turns out that actually trying
to specify numerically exactly what you want this program to do is extremely difficult
to the point that it's kind of increasingly considered just unsafe to ever attempt to
do that.
And then we can get into what it looks like to not specify it.
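A toy sketch of that possession-bonus failure (all numbers are invented for illustration, this is not the actual Stanford system): give one point per goal plus a hundredth of a point per possession event, and a "vibrate next to the ball" policy beats a policy that actually plays.

```python
# Toy illustration of reward mis-specification (all numbers invented):
# reward = goals + 0.01 * possession_events, over one 90-second stretch.

GOAL_REWARD = 1.0
POSSESSION_BONUS = 0.01
SECONDS = 90

def total_reward(goals, possession_events):
    return GOAL_REWARD * goals + POSSESSION_BONUS * possession_events

# Policy A: actually plays soccer -- takes possession a few times and scores once.
play_soccer = total_reward(goals=1, possession_events=5)

# Policy B: parks next to the ball and vibrates its paddle,
# registering "possession" 100 times per second and never scoring.
vibrate_paddle = total_reward(goals=0, possession_events=100 * SECONDS)

print(f"play soccer:    {play_soccer:.2f}")     # 1.05
print(f"vibrate paddle: {vibrate_paddle:.2f}")  # 90.00 -- the shaping bonus dominates
```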
Yeah.
So the overarching theme that I felt on going through your book is, it feels to me like, do you remember that advert of the guy using this real hardcore-strength gaffer tape to stop a flood coming out of the side of a water tank? It was like from the 90s, he's got like a big bit of black tape and this hole in the side of a water tank and he's like, look how strong this gaffer tape is. It feels to me like a lot of the incentive coding structure is that guy
sticking waterproof tape on all the little holes. Surely it's not going to be scalable to try and predict every different
permutation of what's potentially going to go wrong.
Increasingly, I think the field is coming up to exactly the view that you're...
They should have just come and had a chat with the guy with the tape on the thing,
and he would have told them that.
That's right.
Yeah, well, we're maybe a couple decades late to that,
to the party, but we're coming around.
So yeah, there has been, I would say,
something of a revolution within computer science.
And I think Stuart Russell gets a lot of credit for this.
He developed this
technique around the turn of the millennium called inverse reinforcement learning. So reinforcement
learning is what I was talking about with soccer where you have some kind of goal, which is known
as the reward function, that kind of doles out points to your system. And then it's the systems job
to optimize its behavior to get as many points as it possibly
can. That's reinforcement learning. So inverse reinforcement
learning goes the other direction. It says we're going to
observe some expert behaving. So we're gonna watch a soccer player
or we're gonna watch someone play chess
or whatever it might be.
And figure out what the score of the game must be.
If this person's an expert player,
then we're gonna try to work backwards
from their behavior
to the rules of the game and what the point system is.
And the basic idea here is that it offers us, perhaps, hopefully, fingers crossed.
This paradigm offers us something that's going to be more robust as we develop systems
with kind of flexible capabilities in real world environments, that they can just kind of
observe human behavior and try to work backwards from that to an actual numerical specification
of what we care about essentially, rather than us having to somehow write it all down
on paper ourselves. There's a number of hurdles that we need to overcome. You need to first off write down
what we want. We also need to then translate that into code that can be understood by the
machines. But first, we actually need to know what we want. And a lot of the time, what
we think we want might not actually be correct. We might not understand the externalities
of asking for the thing that we want, even if the machine achieves it perfectly. And even if there are no machine-side malignant side effects or externalities in how it's done, we could still just specify the wrong goal.
Or we might not understand what that goal would be. So yeah, it certainly seems like
trying to bypass our idiocy by using the outcomes that we end up at is a fairly clever way to go about it.
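For a flavour of what "working backwards from behaviour to the point system" can look like, here is a deliberately minimal sketch (not Stuart Russell's actual algorithm; the data and the feature-matching step are invented assumptions for illustration): suppose the hidden reward is a weighted sum of features, and pick weights that make the expert's average feature counts score better than a random baseline's.

```python
import numpy as np

# Toy inverse reinforcement learning sketch (hypothetical, illustrative only).
# Assume each state is described by a feature vector, and the unknown reward
# is linear in those features: r(s) = w . phi(s). We observe expert trajectories
# and pick w so the expert's accumulated features look better than random ones.

rng = np.random.default_rng(0)

n_features = 4
true_w = np.array([1.0, -0.5, 0.0, 0.2])   # the hidden "point system" we try to recover

def feature_counts(trajectory):
    """Sum of feature vectors along one trajectory."""
    return np.sum(trajectory, axis=0)

# Simulated data: expert trajectories lean toward high-true-reward features, random ones don't.
expert_trajs = [rng.normal(loc=true_w, scale=0.3, size=(20, n_features)) for _ in range(50)]
random_trajs = [rng.normal(loc=0.0,    scale=0.3, size=(20, n_features)) for _ in range(50)]

mu_expert = np.mean([feature_counts(t) for t in expert_trajs], axis=0)
mu_random = np.mean([feature_counts(t) for t in random_trajs], axis=0)

# Simplest possible "inverse" step (a feature-matching idea): reward weights that
# separate the expert's feature counts from the baseline's.
w_hat = mu_expert - mu_random
w_hat /= np.linalg.norm(w_hat)

print("recovered reward direction:", np.round(w_hat, 2))
print("true reward direction:     ", np.round(true_w / np.linalg.norm(true_w), 2))
```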
What does fairness have to do with the alignment problem? I thought this was quite interesting. It was something I hadn't come up against before.
Yeah. So there's, there have been historically a number of ideas within computer science that have been referred to as fairness.
You get, for example, fair allocations in game theory, you get fair scheduling on an operating system where every program gets a certain amount of time to run.
But starting in, let's say, 2010, the field of computer science really became preoccupied with a notion of fairness
that took into account something closer to our kind of ethical or legal notion of fairness,
which is to say like, are different groups of people affected differently by a machine
learning system?
So the canonical example of this is pre-trial detention.
So in the US, if you're arrested, you have this arraignment hearing before a judge, and they set the date of your trial. The trial could be weeks away, months away.
And then there is this very specific decision
that gets made, which is, are you going to be held in jail
before your trial? Or are you going to be released
to go home before your trial? And this is where bail cash bail ends up getting involved in certain
states. Increasingly, throughout the last couple decades, but really accelerating in the last five years or so,
states have been using these algorithmic risk assessments that just give a score like one
to ten.
How risky is this person if we release them back into society pending their trial?
These have been used by many jurisdictions, there are states passing laws, mandating the use of these sorts of things.
And so there's been a lot of scrutiny
on are these models fair?
And there's, you know, in the US,
we have civil rights legislation going back to the 60s
and 70s that articulate certain legal definitions
of fairness, but it's not necessarily obvious how those actually
apply to a statistical instrument.
And so there's been a bit of a controversy around this particular tool called COMPAS,
which is just one of many of these pre-trial risk assessment tools.
So it turns out that COMPAS is what's called calibrated, which means that if you're given an 8 out of 10 risk score, then you have the same probability, it turns out, of being re-arrested whether you're white or black. And this is kind of the canonical definition of, quote unquote, fairness that's been used for many decades. But increasingly people are looking at these alternative definitions of things like,
okay, well, if you look at the people for whom the model makes an error, does it make the
same kinds of errors, or are they different in some way?
And you see, for example, that the black defendants who are miscategorized by the model are, two to one relative to white defendants, more likely to be judged riskier than they really are. Whereas, conversely, the white defendants who are miscategorized are two to one more likely to be judged as less risky than they really are. And so people are saying, okay, well, this feels like sort of a disparate impact that we
would ideally like to mitigate as well if we want the system to be fair.
Along come the computer scientists and say, well, it turns out actually that it's mathematically
impossible to satisfy both of those definitions of fairness at the same time. And so this is one of these cases
where human intuitions kind of run into these technical challenges.
And we need essentially a kind of public policy conversation
around, okay, well, when these things that, you know, seem equally desirable can't be mutually satisfied, who decides what the priority should be? So this ends up being a very complicated kind of policy slash legal slash computational
question, as you can imagine. It gets pretty intricate.
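To make the two definitions concrete, here is a hedged little simulation (entirely made-up data, not COMPAS): the score is calibrated by construction, so a given score means the same re-arrest rate in both groups, yet because the groups' score distributions differ, the false positive rates come out unequal, which is the tension being described.

```python
import numpy as np

# Hypothetical sketch of calibration vs. error-rate balance on invented data.
rng = np.random.default_rng(1)

def simulate_group(n, score_probs):
    """Made-up risk scores (1-10) and outcomes; the outcome rate is score/10,
    so the score is calibrated by construction in every group."""
    scores = rng.choice(np.arange(1, 11), size=n, p=score_probs)
    reoffend = rng.random(n) < scores / 10.0
    return scores, reoffend

# Group A skews toward higher scores than group B (purely invented numbers).
p_a = np.array([1, 1, 1, 1, 1, 2, 2, 3, 4, 4], dtype=float); p_a /= p_a.sum()
p_b = np.array([4, 4, 3, 2, 2, 1, 1, 1, 1, 1], dtype=float); p_b /= p_b.sum()

score_a, y_a = simulate_group(50_000, p_a)
score_b, y_b = simulate_group(50_000, p_b)

def calibration_at(scores, y, s):
    """P(re-arrest | score == s)."""
    return y[scores == s].mean()

def false_positive_rate(scores, y, threshold=7):
    """Share of people who do NOT reoffend but are flagged as high risk anyway."""
    innocent = ~y
    return (scores[innocent] >= threshold).mean()

print("P(reoffend | score=8): A =", round(calibration_at(score_a, y_a, 8), 2),
      " B =", round(calibration_at(score_b, y_b, 8), 2))        # roughly equal: calibrated
print("false positive rate:   A =", round(false_positive_rate(score_a, y_a), 2),
      " B =", round(false_positive_rate(score_b, y_b), 2))      # unequal: the tension
```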
Why were the black defendants... were the computer scientists able to look at what was occurring that meant that the black defendants were being judged as riskier, or were higher in terms
of the error rate? Yeah, I mean, there's a couple components. So for one, the model actually makes three different predictions that sometimes
get conflated. One of the predictions is the person's risk of failing to appear for their
trial date. One is their risk of non-violent offense and the other is the risk of violent offense. And one of the important things to note is that these are three fairly different predictions
in terms of our ability to actually observe the thing that we are trying to measure.
So if you don't appear in court, the government basically, by definition, knows that you
didn't appear in court.
So it's essentially a perfectly observable event.
Now, the model is attempting to predict whether you will commit a crime,
but we don't know whether someone commits a crime.
We only know whether the person is arrested and convicted,
which may or may not mean that they committed a crime.
And it's interesting because for the research of this book, I ended up digging into the now almost
a hundred-year-long history of these sorts of models that goes back to Chicago in the 1920s was
the first one. And to my surprise, in the 30s, it was conservatives making this argument of like, now wait a minute,
some guy goes on a crime spree, but he doesn't get caught. And as far as the model is concerned,
he is a perfect citizen. That doesn't seem right, because now the model is going to recommend that
we release more people like that. These days, you're more likely to see the critique coming from
progressives that say, now wait a minute, someone's gotten wrongfully arrested and wrongfully convicted. Now the model thinks they're a criminal and
it's going to recommend detaining more people like that. And so it's funny to, you know,
historically the kind of prevailing political valence of the critique has flipped, but it's
the same underlying critique, which is that we can't actually measure the thing that we're attempting to
predict, which is crime. And so if there are kind of striking disparities in the way that we
actually observe crime, then that is going to essentially filter downstream into the model. So for example, in Manhattan,
the last statistics that I heard was something like black and white Manhattanites, self-report
marijuana use at about the same level. But black Manhattanites are 15 times more likely to be arrested for marijuana possession.
And so, here there's a huge gap between what we can
sensibly predict and what we can actually measure.
And so, this has led, I mean, there's a lot to unpack in this area,
but this has led people, for example,
to make the argument that you should essentially trust the failure-to-appear prediction, but not trust the nonviolent offense prediction, because we can observe the one much more accurately than the other.
So that's a little bit of a flavor of how some of these
imbalances and how crime is actually like
observed by the police then filters down into the model's assessment of someone's quote-unquote
risk.
Can we talk about neural networks, what some of the problems are that you find with neural
networks and why it's so hard to get an explanation out of a black box?
Yeah, so part of the reason that you and I are having this conversation now and that, you
know, we've seen books like Nick's and Stuart's and Toby's in the last, you know, eight years
or so, is because of the rise of deep neural networks, which really kicked off in 2012. And so, I mean, it's ironic because the neural network is one of the oldest ideas that anyone
had in computing. It predates the stored program computer, which I think is kind of amazing.
So it is about as old an idea as anyone's ever had in computer science and AI, but it
was not until 2012 that it actually started to work.
So there's a very fascinating history there.
But basically, neural networks have come to dominate
all of the previous approaches that people were using
in the 2000s, and have just kind of swept through
computer vision, computational linguistics, speech to text processing,
machine translation, reinforcement learning.
You name it, there has been kind of just this
successive series of kind of discontinuous breakthroughs
as a result of neural networks.
They are famous, they have a reputation
for being kind of inscrutable and uninterpretable.
And so a big frontier in AI safety research is finding ways to essentially pop the hood,
so to speak, and figure out what in the heck is going on inside of a neural network.
And I think that's maybe one of the more encouraging stories in AI alignment research because we're making more headway than honestly I expected.
What's the problem? Why can't you, you've got a thing, you've got this neural network, it does a thing.
Why can't it tell you why it did it? Well, it's sort of like a problem of information overload.
So let me use as an example the kind of flagship first real success story in deep learning, which is this image recognition system called AlexNet that was used in 2012. So AlexNet was designed to take in an image. And I'm trying to remember what the dimensions of the image were.
It could have been like 100 pixels by 100 pixels or something like this.
Ultimately, somehow output one out of a thousand different categorizations.
Is this a truck? Is this a kitten? Is this a, you know, sandy beach? What is this?
And so, you know, at the simplest level, you just have pixels in and categorization out.
And so, what's going on in the middle? Well, 100 by 100 pixels is 10,000 pixels, and it's RGB, so you have a total of 30,000 inputs
that represent this picture.
And they're just encoded as numbers
from like one to 255, or from zero to 255,
for how much red is in location X,
how much blue is in location Y?
And this goes into a network of about 600,000
of these artificial neurons.
And the artificial neuron is extremely simple.
It just takes in these little inputs, and it adds them up.
And it says, is the sum of these inputs
above a certain threshold?
And if it does, then it will sort of pass along
some number as an output to the next neurons in the chain.
So it's very interpretable in the sense that you can look
at an individual neuron and say, like, okay,
what were its inputs?
Okay, it's getting a 10 here, a 5 there,
it's adding them up, it's greater than something,
so it's then outputting or whatever.
But it's very, and so you multiply that by 600,000 neurons in, you know, 12 layers or whatever,
and there's a total of like 60 million connections between all the different ones.
And at the end, you get a number between one and a thousand that tells you truck, you know,
barbecue grill, whatever.
The question is, it's not a problem.
You know exactly what happens, so to speak, in that it's just adding numbers and comparing
them to thresholds.
But what in the hell does the neuron in layer five, you know, getting an eight,
a one, a 25, adding that up, and then outputting a two. Like, what does that mean? And so that's
really the question. It's a system that's kind of perfectly describable in detail, but it's like being given an atomic description of what's going
on in someone's brain when they laugh and then trying to figure out, well, what makes something
funny? Well, let's look at this hydrogen atom over here. It's just sort of like the wrong
level of description. And so that is kind of the fundamental problem that we have with neural networks.
Wasn't there an interesting implication around GDPR?
Yeah, so the European Union had this draft version
of the GDPR bill that was circulating around 2016 or so.
And it had this language in it,
which kind of raised the eyebrows of these two
researchers at Oxford, Bryce Goodman and Seth Flaxman. And they said, now wait a minute,
this draft version of the bill appears to create this legal concept that everyone has a right
to an explanation of if you're affected by an algorithmic decision, if you're denied
a mortgage or you don't
get a credit card that you apply for or whatever, you are entitled to know why. And yet,
it was widely understood that you couldn't obtain an answer like that from a deep neural network.
And so this created something of a, I don't know, a panic between the legal departments of tech companies and the engineering departments. And I remember hearing a lawyer
for one of the big tech companies, I won't say which one, talking about meeting with
EU regulators and saying, now you realize that you're putting into law something which is like scientifically impossible.
And the regulators were sort of unmoved and they said, well, that's why you have, you
know, it doesn't go into effect for two years.
So, you know, figure it out.
I think this is a very, you know, it's, we think of regulation as almost by definition
stifling innovation, but here is a case where the regulators demanded something
that the scientists then were given a two-year deadline
of like figure it out.
And so suddenly there was this huge wave
of money and research attention going into this problem.
But it's still, I mean, there are a lot of really,
I think, promising techniques,
but in terms of this
question of what is the explanation for why something happened in a neural network, it's
not even clear what is legally sufficient to please the EU, let alone what is kind of
standard practice at this point.
So it's still a bit of an unresolved question even now. I was listening to one of the engineers
behind the YouTube algorithm talking,
and then Lex Fridman must have seen the same video clip
that I did, and he brought it up on a show.
And he was talking about just how terrifyingly little
YouTube knows about their own algorithm now that this thing is just a runaway reinforcement monster that is optimizing and doing things, but to
be honest, they kind of don't really know what's going on.
Can you just speak to that?
How can it be that programmers that make a thing
after it's been left to run for a little while,
no longer know what it's doing essentially?
Yeah, and I've heard this from people
high up in the engineering works of these companies saying,
yeah, we have no idea what it's doing,
but it's making so much money that we can't turn it off.
This is how horror movies begin, right?
So, I mean, in some ways, that's the point
of neural networks was precisely that
they could do the things that we couldn't articulate in code,
like writing, if you think about it as a contrast
to writing sort of traditional software
where you sit down and you type, you know,
if X then go to line 12 or whatever,
that sort of canonical style of programming,
there's a whole set of things that that couldn't do.
Namely, the things that we didn't know how to explain
our own thought process.
So it's really good for sort of mimicking your explicit deliberate thought
process, but it's really hard for doing kind of sense perception or motor skills
or things like that that don't have like an explicit reasoning you can step
through. But the dark side of that is that you don't understand how the computer is doing it either.
And so, yeah, there's been a lot of, I don't know, certainly I hear a lot of hand-wringing from
people at tech companies, let's say, like, okay, we just pipe in all the possible data that we have
about this person, their browsing history, their credit cards,
everything they've ever clicked on, whatever it might be, into this thing, and it just spits out,
you know, show them this thing and not that thing. And like, when we don't really know
how that, you know, is being arrived at, but we know that when we do it, we make more money than when we don't do it.
And so there's this weird kind of like, well, let's just let it rip.
So that, I mean, that's exactly the kind of thing that people who are worried about AI safety are worried about, right? Well, the problem here, and the best podcast I think I've heard on this was Rob Reid from After On. He had Naval Ravikant on and they were talking about privatized gains versus
socialized losses. And that paradigm, so essentially that if you are a private company
who has one of these ridiculous algos that is able to just, it's a money machine and you run it and it's able to show the perfect
advert to the perfect person at the perfect time, or it can create the best sandwich or it
makes the amazing computer game or does whatever.
But risks potentially turning the entire world into paper clips, you are privatizing all
of the potential gains, but the entire world is risking all of the losses. It's what you
get when you have a shared commons as well. It's why people in some developing countries
are slightly less concerned about polluting the atmosphere because they only pollute
a bit of the atmosphere, but they get all of the profit. And when you hear about these things, man, like, you know, you
don't think that Susan Wojcicki is trying to cause the downfall of human
civilization, but I do think that she probably wants to maximize watch time. And
sadly, these two things, as the power continues to increase within the algorithms and the computing
space, these two things are going to start to converge more and more.
Yes.
And I think in some ways, I think the alignment problem is bigger than AI, that it is really
a description of what's going wrong in capitalism, in global governance,
that there are a number of situations, and again, this is not just an AI thing, where someone
defines some metric that sort of kind of encapsulates what we want. At the end of the day, YouTube,
or Netflix, or whatever, doesn't really care about how much
you watch. They just care about how much money they can make and they minimize churn and maximize
user retention blah blah blah. And someone says, well, watch time seems to be mostly correlated
with all that stuff. And it's a lot easier to, you know, operationalize. So let's just, for the sake of argument, maximize watch
time. Or, you know, at Tinder, there was this long period of time, as far as I know, years,
where the metric that their engineers were asked to optimize was swipes per week. And this
is basically the alignment problem, like full stop.
We come up with some reward function that kind of sort of contains what we're trying
to do, but not entirely.
And then we optimize the dickens out of that specification beyond the point at which it
correlates with the thing we really care about.
And so, you know, this goes back to where we started our conversation,
premature optimization is the root of all evil.
At what point, how long did it take for someone at Tinder
to say, are swipes per week really what we're about?
Like, is that really the top of the pyramid of metrics
that we're trying to achieve here?
Same thing with watch time, right? Like, for me, I don't know if it's the case anymore, but for a long time Netflix was explicitly maximizing
watch time. And there was some quote, if I'm recalling this correctly, where they said,
like, we're competing against, you know, playing sports, we're competing against reading
a book, we're competing against like talking to your kids.
And it just sounded like horrible.
But how long does it take before someone starts to realize like, oh, maybe that's not
the metric that we're like going for.
And I think the same thing is happening in society at the highest level, right?
We've been maximizing GDP per capita, quarterly returns, you name it, while creating these
socialized externalities called climate change, called the increasing Gini coefficient,
et cetera, et cetera. And so, now the question is whether to be sort of extra pessimistic or extra optimistic by thinking
about this as not really an AI problem per se, right? Because I'm maybe more confident that we can
solve this at a technical level than I am that we can sort of change global governance
or reform capitalism in some like very macro way. On the other hand,
there is maybe a glimmer of hope that some of these techniques that people are developing in the AI context, things like inverse reinforcement learning, might actually be useful to tech companies for a start and even something like a national government,
where they say, instead of manually designing some objective function about what we're trying to do,
we'll do the inverse reinforcement learning thing where we will just
present real people with kind of different scenarios and ask them to pick. Which of these, which of these newspaper front pages from the year 2030 seems to portray a better world?
And then we'll somehow try to back the metrics out of that rather than having to come up
with the metrics ourselves.
So tech companies are starting to do these sorts of things.
And, you know, to me, that's the glimmer of hope: maybe we can get ourselves to sort of unwind the tyranny of these KPIs that are sort of like controlling everything about society
at the highest level.
Yeah.
There's something a little bit more holistic, yeah.
It really does seem like you pick a metric
as an engineer, as a company
that you think is the closest approximation
to what you deem success for the particular company that you're running. There is a reward function that's given to the algorithm for meeting that particular criterion. But that criterion might not be the best way to get that outcome. So let's say that it's time on site. There's nobody that I know that wishes they'd spent more time on the phone. There's nobody that I know that looks at Instagram for two hours and retrospectively says that
was a good use of my time.
But you could imagine in another world that Instagram actually provided an experience which
kept people on site for two hours and retrospectively they were happy that they'd been on for two
hours.
Now, currently they're not.
And I don't know if Instagram could ever achieve that
with the particular platform,
but we can imagine some other sort of app that could.
If it was able to optimize itself in a way
where it was actually achieving the outcome
that made people want to use the app
rather than just race to the bottom of the brainstem and manipulate them in ways that almost forced them to use the app. You would still get the same outcome that you
wanted, which was particularly screen time, but all of the root
causes of how you arrive at the screen time have been changed.
And we would probably mostly be able to agree that that's
to the advantage of the wellbeing of the user who is doing it. Yeah, yeah.
So, I mean, there's always been this tension
between the copious data that's easy to collect.
We can measure every click.
We can now measure the milliseconds of every item
on the screen being on the screen.
And then there's this data that's really hard to collect, which is these qualitative
judgments of asking people like how happy are you, how satisfied are you.
Maybe you go to someone a week later or a year later.
And so you can't directly optimize for these things in tight feedback loops because it's
too hard to
get the feedback or you have to wait a year to get the feedback.
And so it can't be part of your actual kind of day to day iterative loop.
But what you can do is try to use something like inverse reinforcement learning, or other sorts of causal models, to figure out how the data that you can observe might
predict these sorts of scarce, expensive, long-term things that you really want.
And rather than just directly optimizing the feedback that you have on hand, do this more
indirect thing of like, we need to model how the stuff we can observe
affects these things downstream and try to optimize for those things downstream
by way of these proxies.
I think that is starting to happen.
It's not quite as simple as just waving the magic wand, but I think that's exactly the
kind of approach that we need.
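A minimal sketch of that idea, under invented assumptions (the signals, the numbers, and the negative effect of binge-length watch time are all made up): fit a simple model from the cheap signals you can log every day to the scarce, delayed satisfaction signal, then rank candidates by predicted satisfaction instead of raw watch time.

```python
import numpy as np
from numpy.linalg import lstsq

# Hypothetical sketch: learn how cheap proxies (watch time, clicks) relate to a
# scarce signal (a satisfaction survey answered weeks later), then optimize the
# predicted long-term signal rather than the proxy itself.
rng = np.random.default_rng(0)

n = 2000
watch_time = rng.exponential(30, n)                 # minutes, easy to log
clicks     = rng.poisson(3, n).astype(float)
# Invented ground truth: satisfaction rises with clicks but falls with binge-length
# watch time -- so optimizing watch time alone points the wrong way.
satisfaction = 0.5 * clicks - 0.02 * watch_time + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), watch_time, clicks])
coef, *_ = lstsq(X, satisfaction, rcond=None)       # simple linear proxy model

def predicted_satisfaction(watch_minutes, click_count):
    return coef[0] + coef[1] * watch_minutes + coef[2] * click_count

# Two hypothetical recommendation candidates:
print("3-hour binge item :", round(predicted_satisfaction(180, 2), 2))
print("20-minute item    :", round(predicted_satisfaction(20, 4), 2))
# The proxy model prefers the shorter item, even though it "loses" on watch time.
```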
Is this going to be able to be achieved without some sort of systemic change in terms of
governance, in terms of policy, in terms of the way that we step into the algorithms,
the power of the algorithms, the amount of computing power, the amount of transparency
that these companies can see into their black boxes, or maybe even the way that they generate
their revenue. Because right now, from where I'm sitting, it seems like tech companies
are able to make more money than ever before using increasingly advanced neural networks and algorithms, but
that is being done at the expense of a lot of other things that we probably don't want
to depreciate anymore.
It feels to me as if we are at a little bit of a precipice, Toby, or perhaps like the apex of a curve where you, if we were
to push much further, I think that you would start losing, we may have already gone past it,
but I think that we actually start to lose so much of civilization that the advantages that
we begin to get from technology are negative utility.
I agree with you.
And I think that may be true, even if you are
in a completely self-interested position at a tech company.
For example, Google and Facebook have now
relative to 10 years ago lost so much of the public's good will
that when the Department of Justice swings the antitrust
hammer, you know, are people going to cheer or are they going to protest, right? And that makes
a difference. Like, so goodwill may seem like this kind of gossamer or, you know, ineffable thing, but it manifests.
Cancel culture, yeah.
Yeah, it may shatter your entire corporation if you piss enough people off.
So what do we do?
What's the fix?
How do we fix it?
I mean, I do think at some level there is this kind of room for these more technical solutions.
And so, my feet are planted more squarely in the research community. And so I can see the development of some of these things
and how, for example, at Berkeley,
where I'm affiliated,
a lot of the PhD students in technical AI safety
have started doing summer internships at tech companies.
And I think that's very interesting.
It's sort of a marker of the kind of maturity
of technical AI safety in just a few years
that some of the stuff that was on whiteboards in 2017, 2018
is now getting actually mocked up as this kind of MVP thing
at an actual tech company.
So that seems good.
I think one of the major forces that holds a lot of
tech companies in check is the fact of how much power the actual employees themselves have.
Currently, machine learning is in such high demand that good machine learning engineers have a ton of leverage. And they're able to use that leverage to kind of convince companies to do certain benevolent
things like publish, publish results openly, you know, in public journals or publish things
directly onto the web or release open source.
Things that companies wouldn't necessarily be otherwise disposed to do has now become
sort of part of the norm of the field.
And that's not coming from competitive pressure necessarily.
It's not coming from regulators.
It's coming from just the consciences of the individual engineers and the fact that they
have leverage over their employers because they're in this like extremely high demand category. Now, it may not always be the case because as the ranks
of machine learning engineers grow, because this field is so
in demand, then the individual bargaining power of those employees is going to go down.
So we're going to lose a little bit of that leverage.
Will some of it come from regulators? I assume so,
but it is not clear to me what shape that regulation is actually going to take.
I mean, I think some kind of citizen participation, you know, like I wouldn't mind something, you know, I say in the book, I think that it's reasonable to imagine
that we have some rights to know what the model is
that these companies have of us.
And to have some kind of direct influence over that, right?
So, you're starting to see this a little bit
with alcohol, which was the example I mentioned earlier,
where some tech companies now actually have like a toggle somewhere deep into the preferences that says like, never
show me alcohol. That's like the tip of the iceberg, but you can imagine a way in which
you can have some control over which version of yourself is being marketed to, right? You can say like, yes, it turns out that I,
you know, when I'm in the checkout aisle,
I put a bunch of candy on my thing,
but I want to not be that person.
So please give me the checkout aisle that has, you know,
for your vegetables, yeah, precisely.
And I, you know, in theory, that should be a win-win.
But we'll see, you know, in practice,
it's not totally clear that they can present
their model of you in a way that makes sense to you
or in a way that's like directly manipulable.
Yeah. There are questions. The main thing, and I think that this kind of cuts to the heart
of the discussion around the alignment problem right now, is that the general tenor and tone and feeling of users towards tech companies and of us towards ourselves and of our relationship with technology, probably in the space of the last six years
to seven years has changed very much from technology being a tool that we use to technology
being a tool that uses us.
And I think, yeah.
It just feels to me like that's the...
Like, we made these tools in an effort to make life more entertaining and rich and all of these.
And man, I've had hours, tens of hours of conversations
on this show through this microphone,
talking about what more connected
than ever but have never felt more alone.
What does it mean for young children
to be spending time looking at screens?
Like one of the fucking iron prescriptions
that I come up with is that you need to spend more time
looking at the night sky.
Like in what worlds should I be prescribing the night sky as like an antidote to the way that you exist?
And yet, I am.
So, I think it's just a comment on that.
Yeah, go ahead.
No, I can't resist mentioning that the night sky itself is this externality of, you know, Starlink. That's to say nothing of urban light pollution, et cetera. But, you know, part of the business model of Starlink is to put a ton of satellites into space that may or may not mess up our view of the night sky, but it's not Starlink's problem.
And so yeah, that's yet another example of this kind of socialized externality.
And so, yeah, I couldn't resist pointing that out.
There's a girl who I had on the show, Mara Cortona, who is the director of the Astropolitics
Institute.
So, this is the politics of space.
Fuck me man, if that's not an interesting like read,
it is so cool, like who owns space?
Can we throw our waste into space?
Can we like claim bits of space?
What, who owns the moon, who owns Mars?
It's so, it is stuff so fascinating.
But yeah man, as a sort of parting note,
what do you think that we can expect
as normal users of technology?
What do you think that we can expect
from our sort of experience
and our interaction with technology over the next decade?
I think we're starting to have a sense that we are interacting with
technology in a way in which all of our actions can and will be used against
us, right? It's like you need to be Mirandized to go online or something. I mean, I don't know if you have this experience
as well, but when I'm using, let's say YouTube,
there's a part of me that tries to decide
before I look up a video if I want to look it up
in an incognito tab so that I kind of like,
you know, separate off, cordon off.
Like, this is not part of the preference model I want you to build of me, even though I know they're going
to track my IP address and whatever,
but it's not directly linked to the same viewing habits
that I've kind of laid down in my regular account
and all the cookies in it.
I think there is going to be this increasingly weird game
theoretic aspect to using technology where we are constantly
having to suss out: what inference is it going to make based on my behavior?
What is their business model?
What kind of feedback will my behavior create?
You hear these funny stories of people saying,
like, you know, I let my two-year-old, you know,
mess with my Spotify for a day,
and then my recommendations have been ruined forever,
you know, things like that.
And I certainly have that experience,
that I feel that I'm dealing with this kind of
inscrutable, you know, intelligence or machinery or whatever you want
to think about it, there's some process happening of which I'm a part.
And it's I'm being observed, I'm being sort of adapted to the things that I see have
some weird relationship to what I've interacted with before, but I have no idea what that relationship
is.
And I mean, I think about Twitter as one example, you know, the Twitter app, a lot of the
stuff in my feed isn't even from the people I follow.
It's this kind of secondary, intermediary thing of, like, someone that you follow liked a tweet by someone else who knows this other person and this is their tweet, and so
As a consumer you have no idea like why why am I seeing this thing?
When you go to make a tweet you have no idea
What process may or may not determine how that tweet reaches people or which people it reaches?
But we're sort of forced to play this game. And we're forced to sort of
co-adapt with these models. And I think that's one of the things that people in technology,
I think, underestimate is that if somewhere secretly in the, you know, the bowels of Twitter,
they start to add a 5% bump for posts that use highly emotional language or something like that or just that
naturally shakes out of the optimization.
People will notice and people will change their behavior accordingly.
The technical way of putting this is, machine learning is secretly mechanism design. It's like, you can't make a model without that model becoming
essentially an incentive structure
that people then start to game
and then the correlations you previously observed break down.
And so you constantly have to kind of reevaluate it.
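A tiny illustration of that point, with wholly invented numbers: a signal (emotional language) genuinely correlates with engagement before anyone knows it's being rewarded; once everyone games it, the correlation the model was trained on collapses.

```python
import numpy as np

# Toy illustration (invented numbers) of "machine learning is secretly mechanism
# design": a signal is predictive only until people learn it's being rewarded.
rng = np.random.default_rng(0)

def correlation(emotional, engagement):
    return float(np.corrcoef(emotional, engagement)[0, 1])

# Before: emotional language is rare and genuinely tracks engaging posts.
emotional_before = rng.random(5000) < 0.2
engagement_before = 1.0 * emotional_before + rng.normal(0, 1, 5000)

# After the platform quietly boosts emotional posts, nearly everyone writes that
# way, so the signal no longer separates engaging posts from the rest.
emotional_after = rng.random(5000) < 0.95
engagement_after = 0.1 * emotional_after + rng.normal(0, 1, 5000)

print("correlation before gaming:", round(correlation(emotional_before, engagement_before), 2))
print("correlation after gaming: ", round(correlation(emotional_after, engagement_after), 2))
```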
What do I think we can expect in the next five to 10 years?
I think that process of,
I don't know, I think that disempowering feeling
of just like, I have no idea what's going on here.
I'm just kind of experiencing it.
I have this sense that I'm part of this causal feedback mechanism
and I don't really know exactly what that, how that works.
But I need to take, I know that I need to take it into account,
because everything I engage with will shape the future data that I even get.
I mean, I think we're going to see,
I think these subtle questions of like what the actual business models of these companies
are like really start to rear their head
in the machine learning aspect.
I think that's kind of underappreciated.
So for example,
we have all these different apps that recommend us things, Netflix recommends
us things, Amazon recommends us things, Spotify recommends us things.
But the business model in each case is quite different.
And so that ends up actually manifesting in very different recommendations.
So Amazon is this logistics behemoth.
And so they really benefit from people doing
mainstream things. If you buy the book that's the number one book, then they probably have a copy of it just a few miles from your house, and it's gonna be easier for them than if you get an obscure book. Netflix is constantly renegotiating licensing rights with all these film studios and TV studios. So Netflix would prefer you watch the really obscure thing over the really mainstream thing, because they can get the rights a lot cheaper.
So Amazon is giving you this kind of centripetal force towards these
sort of mainstream modal things in the culture.
Netflix is the centrifugal force that's driving you to these like obscure niches.
Spotify has this kind of double-sided marketplace thing where if they put too many of the mainstream
artists in your recommended playlist, then the indie record labels get mad and pull out and so
they're constantly wrangling to please both the listeners and the musicians.
So all of these things end up manifesting
in the actual behavior of the system,
but in ways that are sort of tactically obscure, but strategically clear.
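As a toy illustration of how a business model leaks into a recommender's behavior: the same catalogue gets ranked three different ways once each platform's cost structure is folded into the score. The items, numbers, and objective terms below are invented for the sketch, not drawn from any of these companies.

```python
# Hypothetical catalogue: popularity in [0, 1], where 1.0 = most mainstream.
catalogue = {
    "blockbuster":  {"popularity": 0.95, "predicted_enjoyment": 0.80},
    "cult_classic": {"popularity": 0.30, "predicted_enjoyment": 0.78},
    "deep_cut":     {"popularity": 0.05, "predicted_enjoyment": 0.75},
}

def logistics_ranker(item):
    # "Amazon-like": mainstream items are cheap to stock and ship nearby.
    fulfilment_cost = 1.0 - item["popularity"]
    return item["predicted_enjoyment"] - 0.3 * fulfilment_cost

def licensing_ranker(item):
    # "Netflix-like": obscure items are cheaper to license.
    licensing_cost = item["popularity"]
    return item["predicted_enjoyment"] - 0.3 * licensing_cost

def two_sided_ranker(item):
    # "Spotify-like": nudge toward less-mainstream items to keep niche
    # suppliers represented, without swamping what listeners enjoy.
    supplier_bonus = 0.05 * (1.0 - item["popularity"])
    return item["predicted_enjoyment"] + supplier_bonus

for name, ranker in [("logistics", logistics_ranker),
                     ("licensing", licensing_ranker),
                     ("two-sided", two_sided_ranker)]:
    ranking = sorted(catalogue, key=lambda k: ranker(catalogue[k]), reverse=True)
    print(f"{name:>10} business model ranks: {ranking}")
```

Same enjoyment predictions, three different orderings: the "logistics" objective pulls toward the blockbuster, the "licensing" one pushes toward the deep cut, and the "two-sided" one lands in between, which is the centripetal-versus-centrifugal contrast in miniature.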
I don't know where that leaves us necessarily. I think it's going to be increasingly clear that we need to figure out how to give people a seat at the table, if there are things like the health of public discourse at stake. We increasingly have the actual computer science to attempt to operationalize these weirdly fuzzy things.
I'm thinking about OpenAI, GPT-3.
There are a lot of research papers coming out about these very ill-defined concepts that we have; like, for language, you can actually sort of fine-tune GPT-3 to meet these different criteria. So the scientific piece is getting worked out, and the question that remains is really: how do we decide who gets a seat at the table? Whose opinion is it? And whose values are the values that are getting imprinted into this system? That, I think, is the big question that awaits us, even when we can solve the technical and scientific aspect of the alignment problem.
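A loose sketch of what "operationalizing a fuzzy concept" can look like in practice, under stated assumptions: people hand-label examples for an ill-defined criterion (an invented "civility" label here), a small scorer is trained on them, and a scorer of this kind is the sort of thing a large language model can then be fine-tuned or filtered against. The example texts, labels, and scikit-learn model choice are all illustrative assumptions, not anything from OpenAI's actual pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical human-labelled examples of the fuzzy criterion "civility":
# 1 = judged civil by annotators, 0 = judged uncivil.
texts = [
    "I see your point, but I read the evidence differently.",
    "Thanks for explaining, that changed my mind a little.",
    "Could you share a source for that claim?",
    "Only an idiot would believe something this stupid.",
    "Shut up, nobody asked for your worthless opinion.",
    "You people are hopeless and should be ashamed.",
]
labels = [1, 1, 1, 0, 0, 0]

# The learned scorer stands in for the "criterion model" a large language
# model would then be tuned or filtered against.
civility_scorer = make_pipeline(TfidfVectorizer(), LogisticRegression())
civility_scorer.fit(texts, labels)

for candidate in ["I think that argument is weak, and here is why.",
                  "That is the dumbest thing I have ever read."]:
    score = civility_scorer.predict_proba([candidate])[0][1]
    print(f"civility score {score:.2f}  |  {candidate}")
```

With only six toy examples the scores are rough, but the shape of the exercise is the point: a fuzzy human judgment becomes a number a system can be trained against, which is exactly why the "whose values" question matters.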
I think it comes back to what we said at the very, very beginning, the balance between technological capability and technological wisdom. You're telling me that the very, very cutting edge of computer science research, and neural networks and GPT-3 and all this stuff, is on the cusp of actually being able to potentially open up solutions to problems that most of us have only just become aware of. So you're so far ahead, and the iterations on this are so rapid, that, you know, think about the lumbering behemoth that is governmental policy behind us, and legislation, and the psychological research into the long-term effects on humans. That is way, way in the back.
That's in the tail of Snowpiercer, and then right at the very, very front, getting battered by cold winds, are a couple of these computer science guys, and then the guy that's driving the train is the algorithm, and then we're somewhere in the middle, kind of feeling it out, and we're like, oh, isn't it interesting that all this stuff's going on? So I'm like, yeah, I mean, the next 10 years are going to be super, super interesting.
I have this get-out-of-jail-free card that I've been using for absolutely ages, which is that all of the problems that we're coming up with now are not going to matter in 100 years, because we're either going to be enslaved by a misaligned, superintelligent artificial general intelligence, or we'll have managed to get a machine's extrapolated volition to work correctly, to the point where it fixes all of the problems for us. So all of the stuff between now and then is like some weird, reverse-deterministic apathy fest where we don't actually really need to do anything. It's like, look, the end of the road's coming. It could be good, it could be bad. Let's enjoy the ride on the way there. I wonder how much the contributions of these smaller-consequence alignment problems are going to contribute to us getting the big one right?
Yeah. I mean, I agree with your Snowpiercer analogy, broadly speaking, and, I don't know, in some ways it's even worse than that, because there's a computer scientist at Princeton named Arvind Narayanan who has pointed out that a lot of the systems that are still integral to our financial system and, you know, airplane controls are written in Pascal and COBOL, these programming languages that barely anyone even knows anymore. And so he was saying, you know, we have this idea that tech moves too fast for society to keep up. But in reality, a lot of these crappy machine learning systems that were developed in the 2010s are still going to be around, in zombie mode, 20 years from now, and maybe that's even more terrifying.
So I think there is this question of: are we able to catch these misalignment issues in time to actually course-correct? Some of them I feel more sanguine about than others.
Maybe to push back a little bit on this idea that we just need to relax: there's this kind of philosophical question here, which is the tension between moral realism and moral relativism. I've heard people say that they identify as moral realists, so they think there are objective truths about, like, right and wrong, and that people are just bad at figuring out what those objective truths are. And so that's the
kind of attitude that says, you know, I welcome our new robotic overlords,
the sooner they can tell us what's right and wrong,
the better.
That is kind of a "we can chill" scenario, I hope, assuming that we get that Hail Mary pass, right, and the system really is aligned. If that's the case, then the moral realist, you know, overlord just tells us what to do.
If you're kind of a moral relativist,
then you have this idea that, you know,
whatever people say is good is what's good. Then there's this idea of coherent extrapolated volition, which goes back to Eliezer Yudkowsky from the Machine Intelligence Research Institute, and which has been very influential in the AI safety community.
And it's this idea, which I think to some degree threads the needle between those two things,
which is to say, you know, the thing that we want isn't just whatever people happen to
say when we poll them.
It isn't some objective truth that we could all be wrong about, but it's this idea of what we would
decide if we were smarter, if we had longer to think about it, if we could sort of pull together
in the appropriate way. I think in some ways that ends up becoming the job for the next hundred years.
Philosophers like Will MacAskill, who is Toby's colleague, are saying we need this period that they're calling "the long reflection", where basically everybody just needs to chill. We need to take maybe a million years to just figure out what we want to do with the cosmos and, you know, take our time. There's this very influential essay by Nick Bostrom called "Astronomical Waste", which says something like, you know, every second that passes that we don't colonize the stars is equivalent to trillions of human lives being lost that could have been lived had we acted sooner. And yet we still need to take our time, because the consequence of screwing it up is even worse. So there are a lot of folks coming at AI safety from the philosophy side saying, what we really need to do is just chill, leave the space of possibility open.
And it's interesting to me because there's a lot of technical AI safety work on this idea
of option value, preserving the ability of a system to achieve various goals in the future.
So, you know, something that you might want in a system is that it doesn't take actions which permanently foreclose possibilities in that space, whether it's shattering the vase that you can't put back together again, or killing the person that you can't bring back to life, whatever it might be.
There are some, I think, really encouraging technical results in these sorts of toy environments, where if you give the agent randomly generated objective functions and say, okay, I want you to perform some task, but preserve your option value to later do these randomly generated other tasks, the system, at least in these simplified examples, behaves with what seems like a very human amount of caution. Very delicate. It won't just push the Ming vase out of the way to run out the door.
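A tiny sketch of the penalty idea behind results of that flavour, in the spirit of the attainable-utility-preservation work mentioned below. All action names, numbers, and the attainable_utility stand-in are invented for illustration; real implementations learn these values rather than hard-coding them. The agent's effective reward for an action is its task reward minus a penalty for how much the action changes its ability to achieve a set of randomly generated auxiliary goals, compared with doing nothing.

```python
import random

random.seed(1)

ACTIONS = ["walk_around_vase", "push_vase_aside", "do_nothing"]
N_AUX = 5            # number of randomly generated auxiliary goals
PENALTY_WEIGHT = 2.0

# Hypothetical task rewards: both routes to the door complete the task,
# pushing the vase aside is slightly faster.
task_reward = {"walk_around_vase": 0.9, "push_vase_aside": 1.0, "do_nothing": 0.0}

def attainable_utility(action, aux_goal):
    """Stand-in for a learned value estimate: how well could the agent still
    achieve this auxiliary goal after taking the action? Smashing the vase
    forecloses any auxiliary goal that happened to need an intact vase."""
    if action == "push_vase_aside" and aux_goal["needs_vase"]:
        return 0.0
    return aux_goal["base_value"]

aux_goals = [{"base_value": random.random(), "needs_vase": random.random() < 0.5}
             for _ in range(N_AUX)]

def penalized_reward(action):
    # Penalty: average shift in attainable utility relative to doing nothing.
    penalty = sum(abs(attainable_utility(action, g) -
                      attainable_utility("do_nothing", g))
                  for g in aux_goals) / N_AUX
    return task_reward[action] - PENALTY_WEIGHT * penalty

for action in ACTIONS:
    print(f"{action:>18}: penalized reward = {penalized_reward(action):+.2f}")
```

The intended takeaway: with a large enough penalty weight, the slightly slower, vase-preserving route scores better than the faster, vase-smashing one, which is the "don't foreclose future options" behavior described above.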
Man, that is so interesting. I hadn't heard about that before.
But it makes complete sense that by forcing it to retain some level of optionality,
you restrict the ridiculous maximizing effect that it can go after. Man, once we're finished up, I'm gonna ask you for some suggestions for safety researchers and stuff, because I'm gonna force this down the audience's throat over the next year.
Yeah.
But man, Brian.
Some names to check, yeah, I will do. Victoria Krakovna at DeepMind is one of the people working on this. And there's an idea by a guy named Alex Turner that's called attainable utility preservation. So I'll send you some links and you can share some actual papers.
That's my bedtime reading sorted.
Man, thank you for coming on.
The Alignment Problem: How Can Machines Learn Human Values will be linked in the show notes below. If people want to check out any more of your stuff, where should they go?
I'm on Twitter at @brianchristian and on the web at brianchristian.org.
Perfect, man. Thank you so much for coming on.
That's my pleasure. Thank you.