Modern Wisdom - #364 - Stuart Russell - The Terrifying Problem Of AI Control

Episode Date: August 28, 2021

Stuart Russell is a Professor of Computer Science at the University of California and an author. Programming machines to do what we want them to is a challenge. The consequences of getting this wrong ...become very grave if that machine is superintelligent with essentially limitless resources and no regard for humanity's wellbeing. Stuart literally wrote the textbook on Artificial Intelligence which is now used in hundreds of countries, so hopefully he's got an answer to perhaps the most important question of this century. Expect to learn how AI systems have already manipulated your preferences to make you more predictable, why social media companies genuinely don't know what their own algorithms are doing, why our reliance on machines can be a weakness, Stuart's better solution for giving machines goals and much more... Sponsors: Get 20% discount on the highest quality CBD Products from Pure Sport at https://puresportcbd.com/modernwisdom (use code: MW20) Get perfect teeth 70% cheaper than other invisible aligners from DW Aligners at http://dwaligners.co.uk/modernwisdom Extra Stuff: Buy Human Compatible - https://amzn.to/3jh2lX5  Get my free Reading List of 100 books to read before you die → https://chriswillx.com/books/ To support me on Patreon (thank you): https://www.patreon.com/modernwisdom - Get in touch. Join the discussion with me and other like minded listeners in the episode comments on the MW YouTube Channel or message me... Instagram: https://www.instagram.com/chriswillx Twitter: https://www.twitter.com/chriswillx YouTube: https://www.youtube.com/ModernWisdomPodcast Email: https://chriswillx.com/contact/  Learn more about your ad choices. Visit megaphone.fm/adchoices

Transcript
Starting point is 00:00:00 Hi friends, welcome back to the show. My guest today is Stuart Russell, he's a professor of computer science at the University of California and an author. Programming machines to do what we want them to is a challenge. The consequences of getting this wrong become very grave if that machine is super intelligent with essentially limitless resources and no regard for humanity's well-being. Stuart literally wrote the textbook on artificial intelligence, which is now used in hundreds of countries. So hopefully, it's got an answer to perhaps the most important question of this century. Today, expect to learn how AI systems have already manipulated your preferences to make
Starting point is 00:00:43 you more predictable, why social media companies genuinely don't know what their own algorithms are doing, why our reliance on machines can be a weakness, Stuart's better solution for giving machines goals, and much more. I think the main takeaway from this conversation is that programming computers to do what you actually want is really, really hard. And we don't have the luxury of getting this wrong if the computer that we're trying to program is incredibly intelligent and controlling essentially the entire world.
Starting point is 00:01:16 I'm very glad that someone like Stuart is thinking about these questions because he seems to be about as qualified as you can be, but as you'll hear, there are far fewer resources than are needed being dedicated to this area of research. Before we get on to other news, if you haven't already picked up a copy of the Modern Wisdom reading list, 100 books to read before you die,
Starting point is 00:01:37 you can go and get yours right now for free by heading to chriswillx.com slash books. It's completely free, it took me months to write and thousands of people have already downloaded it. So go and get your copy now, chriswillx.com slash books. But now it's time for the wise and wonderful Stuart Russell. Let's do it. Stuart Russell, welcome to the show. Thank you, nice to be with you. What do King Midas and Artificial Intelligence have in common? Good question. So King Midas is famous in two ways, right? So he had the golden touch, so people
Starting point is 00:02:38 think of him as kind of a lodestone for getting rich. But the moral of the story with King Midas is he said, I want everything I touch to turn to gold. And he got exactly what he wanted. So the gods granted his wish, and then he finds out that he can't eat because his food turns to gold and he can't drink, because his wine turns to gold, and then his family turns to gold. So he dies in misery and starvation. And this tale is basically a description of what happens or what might happen with super intelligent AI. Where the super intelligent AI plays the role of the gods and we are King Midas. And we tell the AI, this is what we want, and we make a mistake.
Starting point is 00:03:26 Right? And then the AI is pursuing this objective, and it turns out to be the wrong one. And then we have created a conflict. We basically created a chess match between us and the machines, where the machines are pursuing some objective that turns out to be in conflict with what we really want. chess match between us and the machines where the machines are pursuing some objective
Starting point is 00:03:45 that turns out to be in conflict with what we really want. And that's basically the story of how things go south with super intelligent AI. And if you look at what Alan Turing said, in 1951 he was on the radio, BBC Radio 3, the third program. And he said, basically, we should have to expect the machines to take control. End of story. And I think this is what he had in mind that they would be pursuing Objectives and we would have no way to stop them more and to fear with them Because they are more capable than us so they control the world That's the challenge the fact that it's not just the objective is misaligned
Starting point is 00:04:38 But it's that the power deploying that misalignment is so vast that there's no stopping it once it's set away Yeah, and if you're a gorilla or a chimpanzee or whatever, you thought your ancestors thought that they were the pinnacle of evolution and then they accidentally made humans and then they lost control. They have no control over their own future at all because we're here and we smarter than they are and end of subject. Yet rare that the person that's supposed to be or the agent that's supposed to be in charge is actually less capable or less powerful or less intelligent than the agent that they're commanding to do their bidding. that they're commanding to do their bidding? Yes. Yeah, we don't have any good models for how this relationship would work. So even if we do solve the control problem,
Starting point is 00:05:42 there are various issues that will still have to face. For example, how do we retain anything resembling the kind of intellectual vigor of civilization when our own mental efforts are just puny compared to those of the machines that we're supposed to control. So, you know, and in some science fiction books, for example, Ian Banks' Culture Novels, which I highly recommend to your listeners, he struggles with this because, you know, they've got super powerful AI, and everything is hunky-dory. The AI systems always do stuff that's beneficial for humans,
Starting point is 00:06:29 but in a way they end up treating humans like children. There's always this delicate balance which parents have. When do I stop doing up my kid's shoelaces and make them do their own shoelaces? And it's, except that with parents and children, the children are not supposed to be the ones who are in control of the parents. Sometimes they are, but not supposed to be. And we just don't have a model for that where the children are commanding the parents, but the parents treating the children like children and saying, okay, well, I think it's time for Johnny to, you know, learn to do his own shoe laces.
Starting point is 00:07:16 So I'm going to hold off on helping Johnny today. You know, I just don't exactly know how I just don't exactly know how it's going to work and how humans are going to continue to have the incentive to slog through 20 years of education and so on, to learn things that the machines can already do much better. That's thankfully not a problem that we need to deal with just yet. I suppose the fact that we don't have an imminent, well, I suppose we don't know if it's going to be a hard take off, so it might be imminent, it might be tomorrow. But everything suggests that it's not.
Starting point is 00:07:55 Have you got any conception around how long it will be before we do face a super intelligent AGI? Well, usually I say I will not answer that question. base a super intelligent AGI? Well, usually I say I will not answer that question. And I was at a world economic forum meeting, which was officially off the record under Chatham House rules. And I was somebody asked me that question. So I said, well, you know, off the record, there's a number. What is I do? I said, I said, I said, I off the record, you know, within the lifetime of my children. Okay. Yeah, you know, that's a flexible number because medical advances might
Starting point is 00:08:39 make their lives very long. And then 20 minutes later, it's on the daily telegraph front page. What was this? Probably 2015, I think, January 2015. Professor predicts sociopathic robots will take over the world within a generation. That was what they had in mind. So even though I tried to be So, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I think, I that just scaling up the methods we have is going to create superintelligence. You know, simply put, you make the machines faster, you get the wrong answer more quickly. What's the bottleneck that we're facing at the moment then? Is it hardware? Is it algorithms? It's definitely not hardware. I think we probably have more than enough hardware to create super intelligent AI already.
Starting point is 00:09:50 I think it's, well, algorithms, but it's basic conceptual gaps in how we're approaching the problem. Our current deep learning systems in a real sense don't know anything. And so it's very hard for them to accumulate knowledge over time. What does that mean that they don't know anything. So they can learn, they can learn a sort of an input output function. And in the process of doing that, they can acquire internal representations that facilitate the representation of that function and so on. But you don't learn, let's say Newton's Laws of Physics,
Starting point is 00:10:52 and then become able to apply those laws to other problems. They have to retrain from scratch or almost from scratch on new problems. And if you think about the way science works, which is the best example of human accumulation of knowledge, we know that it wasn't simply the accumulation of raw sensory data and then training a giant network on vast quantities of raw sensory data
Starting point is 00:11:21 because all the people who had that sensory data are dead. the last quantities of raw sensory data, because all the people who had that sensory data are dead. So whatever they learned, whatever they accumulated, had to be expressed in the form of knowledge, which subsequent generations could then take on board and use to do the next phase. And so a cumulative learning approach based on the acquisition of knowledge
Starting point is 00:11:49 which can be used for multiple purposes, we don't really know how to do that, at least in the deep learning framework. In the classical AI framework where we had explicit representations of knowledge using logic or probability theory theory that could be done. And I still think it can be done in those paradigms. I just think we need to put a lot more effort.
Starting point is 00:12:14 And maybe to be a bit more flexible, I think one thing we learned from deep learning is that there is advantage in being as it were sloppy. So that we don't have to think, okay, how can I learn Newton's law as well? I'm gonna put F and M and A together in some order and I'll eventually find the right law, right? That assumes that you have F and M and A
Starting point is 00:12:44 already as precisely defined concepts. But in the learning process, you can be more sloppy. You don't have to have exactly the precise definition of mass and exactly the precise definition of force. You can have a kind of a sloppy definition. It's something going on about how big and heavy the thingy is, right? And there's something going on about how hard am I pushing it? And there's something going on about how is it moving?
Starting point is 00:13:14 And gradually, those things gel. And so you can simultaneously learn the knowledge and the concepts that go into the formulation of the knowledge. And so, I think that idea is something that we could bring back to the classical tradition and improve it. What are some of the challenges around language? So this is a second big area where I think we need breakthroughs. So the language models that we have right now, GPT-3 and so on, which everyone is very excited about, their job is basically to predict the next word,
Starting point is 00:14:01 and they become very, very good at predicting the next word and then they can have them predict to the next word they can then add that next word and that's how they generate text. So you can just keep repeatedly applying it and it will start spitting out things that look like sentences and so on. But what they're really doing is predicting the next word based on all the previous words that were in the text sequence. And it's a little bit like astronomy in the time of tolamy. So tolamic astronomy was what happened before we had any idea that the planets were massive objects moving under the force of gravity So we plotted out the apparent motion of the planets and this through the heavens and
Starting point is 00:15:07 motion of the planets and through the heavens. And we basically describe their shape. And if you look at the shape, if you sort of plot it out over the course of a night, it's this, you know, mostly sort of somewhat circular looking ox, but with wiggles and spirals and so on, added in, you know, over long time scales because of the relative motion of the planets around the sun. over long time scales because of the relative motion of the planets around the sun. And so, tollic-mayak-adastronomy, just consistent of describing those big, cop-aided, wiggly, circled, spirally shapes. And then you could gradually extrapolate them, right? So once you understood the pattern of this big, wiggly, spirally shape, you could then extend it. I got the shape. Now I'm just going to keep drawing it and
Starting point is 00:15:52 say, OK, well, next week, the planet should be here. And you'd be right. And so that's sort of what's going on with these text prediction algorithms, right? That they're taking all the previous words, which is by analogy to the positions of the planets, and then saying, OK, I get the shape. I'm going to predict what it's going to be in two words, three words forwards into the future. But there's no sense of why. Why is the word on the page? The word is not on the page because the previous thousand words were on the page All right, that's not the real cause of why that word is there the real cause of why that word is there because There's a world
Starting point is 00:16:38 Outside this in the text and someone is trying to say something about it, right? the text and someone is trying to say something about it. And that's sort of what I call the physics of text. There's no knowing beyond the simple output. So this is, I guess, is this similar to a philosophical zombie in a way that you're able to output a thing that looks like it's a simulate room of intelligence within a sort of narrow band but there's nothing going on deeper below the surface. That's one way I'm not I'm not here talking about is there real conscious understanding? We haven't got there yet. But just does this causal model of why the text is there? Does that approximate reality in any way?
Starting point is 00:17:35 No. The reality of text is that people are trying to say things and they're trying to say things about a world that they live in and are acting in. So, you know, the simplest model would be they're just trying to say true things, but actually they're trying to get something done in that world, and part of getting something done is saying true things, and sometimes it's saying false things, and sometimes it's asking questions. But this, you know, you can see this connection, you know, why the real world matters. Because statistically, the fact that there's a real world,
Starting point is 00:18:15 and we're all talking about the same real world means that every document in the world is correlated with every other statistically. So if one document says JK Rowling wrote Harry Potter, right? And another document written in Russian, a year later says, the author of Harry Potter is, what do you expect the next word to be? Well, you expect it to be JK Rowling because they're talking about the world. And even if this is a new way of saying it and a new language, it's correlated through this common hidden variable, if you like, which is the real world. And none of that is there in current deep learning models of language. So I think they're fundamentally flawed.
Starting point is 00:19:15 And this is one of the reasons why they take trillions of words of text. I mean, they read about as much as all human beings in history have read. Right? And so, and they're still, they still make stupid mistakes. They still kind of lose the plot. One of the things you see is that because they have a, you know, they're predicting the next word based on a relatively limited memory of the previous text. They kind of lose the plot. So as they go on,
Starting point is 00:19:50 they'll start either repeating themselves or contradicting them what they said earlier on in the text and so on. So, you know, having said that, they do exhibit quite impressive kind of short term recall and question answering. And a certain amount of generalization is going on, right? Because you can see that because you can ask them questions or you can tell them things using a name or a place that they've never ever seen before. And then you can ask them questions about that name or that place, and they'll answer it correctly. So there's some generalization going on, right?
Starting point is 00:20:33 They've learned a general pattern, and then they're able to instantiate the general pattern with particular people or places or whatever it might be. And so, you know, that's a sign that learning has happened. But generally, we don't understand what's going on beyond that. And so we don't know when they're just spouting gibberish, right? You think it's answering your question and actually it's just spouting complete gibberish. You don't know. I suppose the challenge here is that the main way that we communicate is through language.
Starting point is 00:21:06 So if you're not a computer programmer and you wanted to have a conversation with a super intelligent AGI home assistant, you would need to tell it what you mean. It would not only need to be able to understand the words that came out of your mouth, but our language, our use of language is imprecise also. So it also needs to be able to work out what you meant to mean, not just what you said. Then it needs to interpret it. Then it needs to be able to convert that into something that it can do within itself.
Starting point is 00:21:35 And then it needs to enact that. So yeah, I mean. Yeah, so I mean, we built systems that could do that, even in the 60s and 70s. And they sort of work the way you would expect. They understood the structure of the language. They, what's called parsing the sentences. So figuring out what's the subject, what's the object.
Starting point is 00:22:00 And then converting that into an internal logical representation. And then doing the reasoning necessary to answer the question, or to add the fact to these systems internal knowledge base, and then generating answers to questions, and converting the answers back into language. So that process, we've known how to do, is just been very difficult to make it robust because the variety of ways of saying things is enormous. We speak in ways that aren't grammatical,
Starting point is 00:22:34 but still perfectly comprehensible. We do things like lie, right? The last thing you want to do is for the system to believe everything everyone says, because then it's very easy to manipulate. So it has to understand that what's coming out of our mouth is as Wicconstein produces a move in a game. It's not gospel truth. It's an action that we are taking and the action might be to try to fool you or to try to make you do something that you would not otherwise do or whatever.
Starting point is 00:23:09 So that level is completely not there. You can, GPT-3 takes all text as gospel truth or whatever, right? It doesn't distinguish between fiction and fact, between propaganda and truth and so on. It's all just text. What are some of the big ways that we could get artificial intelligence wrong? So I think the current approach to AI,
Starting point is 00:23:41 which has been there really since the beginning and in the book, human compatible, I call it the standard model, which is a word that people use in physics to refer to, you know, all the laws of physics that we pretty much agree on, right? So in AI, the standard model has been to build machines that behave rationally and this notion of rational comes from philosophy and economics that you take actions that can be expected to achieve your objectives. And that goes back to Aristotle and other places. So we took that model and we created AI systems that fit that model.
Starting point is 00:24:27 Now, with people, we have our own objectives, so we can be rational with respect to our objectives. Of course, machine doesn't have objectives intrinsically. So we put the objective in and it seems like a perfectly reasonable plan. I get in my automated taxi, I say take me to the airport, that becomes the taxi's objective and then it takes me to the airport, it figures out a plan to do it and does it. Pretty much all AI systems have been built on this basis that one of the inputs that's required to the algorithm is the objective. If it's a game playing program, the objective is checkmate or whatever it might be.
Starting point is 00:25:15 If it's a root planning algorithm, then it's the destination. If it's a reinforcement learning algorithm, then it's the reward and punishment definition, and so on. So, and this is a pretty common paradigm, not just AI, but the control theory systems that fly are airplanes. They minimize a cost function. So the engineer specifies a cost function, which penalizes deviations from the desired trajectory, and then the algorithms will optimize a given cost function.
Starting point is 00:25:56 And okay, so what's the problem with that? The problem is, as I mentioned earlier, when you brought up King Midus, we don't know how to specify the objective completely and correctly. And so for artificially defined problems, like chess, chess comes with a definition of checkmate. So it's sort of fooling us into thinking that this is an easy problem to specify the objective. But take the automated taxi, the self-driving car, is the destination the objective? Well, that would not be because then it might drive you there at 200 miles an hour and you come home with 50 speeding tickets and if you weren't
Starting point is 00:26:42 dead. So obviously, safety is also part of the objective, right? Okay, well fine, safety, but then how do you trade off safety and getting to the destination? Right. If you prioritize safety above everything else, then you're never going to leave the garage because just going out onto the road and cause some risk. Well, okay, so then we have to put in some tradeoff between safety and making progress. And then you've got, you know, obeying the laws, then you've got not pissing off
Starting point is 00:27:14 all the other drivers on the road. Then you've got not shaking up the passenger right by starting and stopping too much. And the list goes on and on and on. And the self-driving car companies are now facing this problem. And they have whole committees and they have meetings all the time trying to figure out, okay, we get the latest data from our cars in the field and all the mistakes they made and tweak all the objectives to get the behavior better and so on. So even for that problem, it's really hard. And if you had something like curing cancer
Starting point is 00:27:48 or fixing the carbon dioxide levels, you can see how things go wrong. What a cure cancer really fast, it sounds good. OK, great. Well, then we'll induce tumors in the entire human population so that we can run millions of experiments in parallel on different ways of curing them. You just don't want to be in that situation, right? And so the answer, it seems to me, is we have to get rid of the standard model. And so here my, I wrote a textbook based on the standard model.
Starting point is 00:28:39 In fact, it's sort of in many ways, it made the standard model, the standard model, and here am I saying, actually, sorry, Chaps, we go to overall, and we're going to have to rebuild the whole field. And so you're going to get rid of this assumption that the human is going to supply the complete fixed objective. It's too complex. It would be too arduous. I'm going to guess to be able to program in plugging the little holes in the bottom of the boat for each one of the ways that the machine could slightly go off course. So you've got safety, okay, we write the safety algorithm, okay, we've got
Starting point is 00:29:29 speed, we write the speed algorithm. I'm going to guess that the goal would be to get a more robust sort of scalable, general solution to this that would be able to find a problem, a solution to all potential problems that would be able to optimize the outcome across all potential challenges. Yeah, sort of. I mean, it's, if you, so I mean, basically what you have to build machines that know that they don't know what the objective is and act accordingly. So what does act accordingly mean? Well, to the extent that the machine does know the objective, it can take actions, as long as those actions don't mess
Starting point is 00:30:16 with parts of the world that the algorithm isn't sure about your preferences. Right, you know, So if you have a machine that's going to try to restore carbon dioxide levels in the atmosphere to their pre-industrial concentrations, that's a really good objective. Well, it wouldn't be a good objective if the solution was get rid of half the oxygen. Because then we would all have slowly as fixate. So that would be really bad. Don't do that. What if it means turning the oceans into sulfuric acid? Yeah, okay, don't do that. So you need the machine to actually ask permission. Right? So you need the machine to actually ask permission. Right? And that's, and it would have an incentive to do that. So it knows that it doesn't know what the objective is, but
Starting point is 00:31:12 it knows that its mission is to further human objectives, whatever they are. So it has an incentive to ask, to ask permission, to defer. If we say stop, that is what I meant. It has an incentive to obey, because it wants to avoid doing whatever it is that violates our objectives. And so you get these new kinds of behaviors, the system that believes that it has the objective, becomes a kind of religious fanatic, right?
Starting point is 00:31:48 It pays no attention when we say, you know, stop you're destroying the world, it's, I'm sorry, I've got the objective, you know, whatever you're saying is wrong because I got the objective and I'm pursuing it, right? We don't, you know, we don't want machines like that. So in this new model, it seems much more difficult and in a way it is much more difficult to satisfy an objective that you don't know, right? But it produces these behaviors, you know, asking questions, asking permission, deferring, you know, and in the extreme case, allowing yourself to be switched off. If the machine might do something really catastrophic, then we would want to switch it off. Now, a machine that believes that it has the correct objective is going to prevent you from switching it off,
Starting point is 00:32:41 because that would be failing. It wouldn't achieve its objective if it gets switched off. The machine that knows that it doesn't know what the objective is actually wants you to switch it off, right? Because it doesn't want to do anything sufficiently bad that you'd want to switch it off. So it wants, it has a positive incentive to allow itself to be switched off. And so this new model, I don't think it's perfect, but it's a huge step beyond the way we've been thinking about AI
Starting point is 00:33:16 for the last 70 years. And I think it's the core of a solution that will allow us, you know, not to end up liking Midas. What's not perfect about it? I think the biggest problem that I'm wrestling with right now is the fact that human objectives are actually, should we say, plastic or malleable, right? And you can tell that because we don't have them when we born, right? When we're born, we have pretty simple objectives.
Starting point is 00:34:00 And so it's something about our culture, maturation, et cetera, that creates adults who have to some extent fairly definite preferences about the future. So the way I think about it is not asking you to write them down, right? Because in the end, that's really hopeless. them down. That's really hopeless. But if I could show you two movies of the future, future A, future B, and you could watch those and reset yourself and watch the other one
Starting point is 00:34:40 and reset yourself and then say which one do I prefer? I think that's a reasonable back-of-the-envelope description of what we're talking about. Everything you care about in the future. And if the movie, if you couldn't quite tell, whether you liked AOV because there's some detail missing, then you can get some more detail on those parts. And a future where the oceans are turned into sulfuric acid and we all die oxygen deprivation. It's pretty clear that's not the future we prefer. So the issue with plasticity Plasticity and malleability is that although I might say I like future A today,
Starting point is 00:35:32 right? Tomorrow I'm a new person and I might like future B instead, but it's got too late because now you stuck me into future A. And so the first problem there is, well, who do you believe, right? Do you, you know, you're making a decision now, should I respect the preferences of the person now, or should I anticipate how you're going to change in future and respect your future self? And I don't, you know, philosophers haven't really given us the good answer to that question. So that's one part, right? It's a deep philosophical issue.
Starting point is 00:36:14 The more problematic part is that if our preferences can be changed, then the machine could satisfy our preferences by changing them rather than by satisfying them. So it could find out ways to change our preferences so that we'd be happy with whatever it was going to do rather than it figuring out how to make us happy with the preferences that we have. You know, and you could say, well, yeah, politicians do that and advertisers do that, right? We don't think of that as a good thing. And it could be, you know, with machines, it could be a much more extreme version of that.
Starting point is 00:36:59 And so, so I think of what's in the book as kind of version zero of the theory, and version one would have to deal with this aspect. You know, there are other difficult questions to answer. Like, you know, obviously machines are making decisions not on behalf of one person, but on behalf of everybody. And how exactly do you trade off the preferences of individuals who all have different futures that they prefer? And that's not a new question. You know, it's thousands of years old. And I think that that's, I feel that's a manageable question.
Starting point is 00:37:45 I think that that's, I feel that's a manageable question. And crudely speaking, the answer is you add them up. And that's what's called the utilitarian approach. And we associate names like Bentham and Mill with that idea. And more recently, Hassanye, who was a Berkeley economics professor who won the Nobel Prize and put a lot of utilitarianism onto an axiomatic footing. So what it's interesting actually to understand what that means because a lot of people have a emotional dislike of utilitarianism partly because the word utilitarian, that sort of refers to
Starting point is 00:38:35 gray plastic furniture and council flat. The branding problem. Yeah, it's a branding problem exactly. It got sort of mixed up with with the wrong word and You know people complain about it, you know not being sufficiently a egalitarian and not You know people assume that it refers to money like you know maximizing the amount of money in the world, and nothing to do with that. But the kinds of axioms that Harsani proposed, when you actually think about them, they, they probably, you know, most people would accept them as quite reasonable. So for example, you say suppose you've got two futures, future A and future B and future B is exactly the same as future A except one person is happier in future
Starting point is 00:39:36 B than they were in future A. Everyone else is exactly as happy as they were before. He said, well, it seems reasonable that you'd say future B is better than future A. And so he has a couple of axioms like that. And from those axioms, you can derive the utilitarian solution, which is basically add them up. Fine, whichever policy maximizes the sum total of human happiness. And so I think there are various difficulties involved.
Starting point is 00:40:15 So when you say the sum total of human happiness, are you including all the people who haven't yet been born? And if so, what about actions that affect who gets born? Right? And that sounds like, you know, that sounds pretty weird, but actually, you know, the Chinese government with their one child policy, right? They wiped out 500 million people with that policy. So they, quote, killed more people than anyone in history as ever, killed way worse than the Holocaust,
Starting point is 00:40:56 way worse than Stalin. By preventing them from being able to have existed. And so preventing them from being existing, it was huge. Was it the right, was that a moral decision, was that the correct decision? Really hard to say. I mean, the reason they did it was because they were afraid that they would have mass starvation
Starting point is 00:41:19 if they had too much population growth. And they had experienced what mass starvation was like. So, you know, it's arguable that it was a reasonable thing to do, but it did lead to a lot of people, what existing. Presumably going for, right, it's really hard. Presumably going for just raw utilitarianism has a ton of awful externalities as well, though. Like the most happiness for the most people. Okay, well there's two variables we can play about with there. We could just make tons and tons and tons of people. There you go. Okay, well there we go. We've got
Starting point is 00:41:54 everyone's not that happy, but there's a lot of people and it's actually managed to make up for it. Or yeah, yeah, I mean this, yeah, this is Derek Parfett, who's a British philosopher, has a book called Reasons and Persons and this is one of the arguments in the book. And he calls it the repugnant conclusion, which is that we should make basically infinitely many people who have a barely acceptable existence. And if you watch the Avengers, one of the Marvel, the one where Thanos is collecting the stones of power or whatever they call it. He's proposing one side of that philosophical argument,
Starting point is 00:42:44 which is that he should get rid of half the people in the universe, and then the rest will be more than twice as happy. Yeah, dangerous. Dangerous using Thanos as the basis for your philosophical justification, isn't it? Yeah, so you have to get these things right before you give him the big glove. Before you give him the big glove and that's the same question we face with AI. But it's not as if there's an obviously better solution, right? So alternative to utilitarianism is sometimes called the deontological or the right-based approach, where you simply write down a bunch of rules saying, can't do this, can't do this, can't do this, can't do this, can't do this. I have to do that, I have to do that.
Starting point is 00:43:39 The utilitarian can quite easily accommodate a lot of those rules. So if you say you can't kill people, well, the Tillars Air and so well, of course you can't kill people because the person who gets killed, that's not what they want the future to be like. And so the utilitarian solution would avoid And so the utilitarian solution would avoid murder. And mill goes on for pages and pages and pages. About, well, of course, moral rules are, I'm not throwing out moral rules. I'm just saying that if you actually go back to first principles, they follow from utilitarianism, but we don't
Starting point is 00:44:27 have the time and energy to go back to first principles all the time. So we write down a bunch of moral rules. And I think there are more complicated arguments about avoiding strategic complications when we're making decisions in society. It's much easier if there are more rules rather than thinking all the time, okay, well if I do this and then they might do that and I might do this and they might do that and I might do this, right, so sort of playing this complicated chess game with with with eight billion people all the time, it's just easier if there are rules that everyone those exist and will be respected. But the interesting place is what happens in the corner cases, right? Do we say no, the rule, no matter what the utilitarian calculation is, the rule is absolute.
Starting point is 00:45:27 And I think the answer is no. You can start out with some easy rules. The rule says you can't have to eat fish on Friday. Well, is that an absolute rule? Well, I don't know. I mean, if there were no fish and my child is starving and that, you know, the only thing for them to eat is some meat. I'll give them some meat, right? So we clearly see that, you know that rules are an approximation. And when we're in difficult corner cases, we fall back to first principles. And so I don't see that there's the degree of conflict between utilitarian and deontological
Starting point is 00:46:22 approaches that some people see. One of the typical arguments in utilitarianism, against utilitarianism, sorry, would say something like, well, with your organs, I could say five people's lives. Your kidneys, your lungs, maybe your heart. So I'm entitled to just go around ripping the organs out of people to save other people's lives. Well, of course, that's not what utilitarianism would suggest, because if that were how we
Starting point is 00:46:58 behave, life would be intolerable for everybody on Earth. We'd be constantly looking around over our shoulders and grabbing our kidneys. So, you know, so it's just, you know, so the, the utilitarian solution, sometimes called the rule utilitarianism is that it's useful to have these rules about behavior, not just to consider the individual act,
Starting point is 00:47:31 but to consider what if that act were allowed? What if there are a rule that you could always do that act, then it would be terrible. So I think you can reconcile a lot of these debates. But the examples that we've already touched on, the fact that our preference changes, the fact that we have to consider people who don't yet exist or might not exist. These are important unsolved questions no matter what philosophical place you come from. It might sound like very far future predictions, but the user being manipulated to make their preferences easier by the machine is actually something that's already happened. Can you take us through what social media content algorithms have done? Sure, yeah. So the social media content algorithms, right, they decide
Starting point is 00:48:30 what you read and what you watch. And they do that for literally billions of people for hours every day, right. So in that sense, they have more control over human cognitive input than any dictator in history has ever had. More than Stalin, more than Kim Il Sung, more than Hitler, they have massive power over human beings. And they are completely unregulated. And people are reasonably concerned about what effect they're having. And so what they do is basically they set an objective because they're good standard model machine learning algorithms. And so they're set objective, let's say maximize click through, right?
Starting point is 00:49:27 The parability that you're going to click on the next thing. So all the imaginables like this is YouTube, you know, you watch your video and lo and behold, another video pops up, right? And am I going to watch the next video that it sends me to watch or am I going to, you know, close the window? And so click through or, you you know engagement or various other metrics. These are the things that the algorithm is trying to optimize. And I suspect originally the companies thought well this is good because you know it's good for us if they
Starting point is 00:50:04 click on things we make money and it's good for us. If they click on things we make money. And it's good for people because the algorithm will learn to send people stuff they're interested in. If they click on it, it's because they wanted to click on it. Yeah, right. And there's no point sending them stuff that they don't like. They're just cluttering up their input, so to speak. But, you know, I think the algorithms had other ideas. And the way that an algorithm maximizes click-through in the long run is not just by learning what you want, right? Because you are not a fixed thing. And so you can get more long run, click the
Starting point is 00:50:48 roots, if you change the person into someone who's more predictable, right? Who's, for example, you know, addicted to a certain kind of violent pornography, And so YouTube can make you into that person by gradually sending you the gateway drugs and then more and more extreme content, whatever direction. So the algorithm doesn't know that you're human being or you have a brain.
Starting point is 00:51:22 As far as it's concerned, you're just a string of clicks. Content click, content click, content click. But it wants to turn you into a string of clicks that in the long run, there's more clicks and less long clicks. And so it learns to change people So it learns to change people into more extreme, more predictable, mainly, but it turns out probably more extreme versions of themselves. So if you indicate that you're interested in climate science, it might try to turn you into an eco-terrorist, you know, you know, articles full of outrage and so on. If you're interested in cars, it might try to turn you into someone who just watches endless, endless reruns of top gear.
Starting point is 00:52:20 Why is the person that's extreme or predictable? Well, I think this is an empirical hypothesis on my part, right? If you're more extreme, you have a higher emotional response to content that affirms your current views of the world. And so in politics we call it red meat, right? The kind of content that gets the base riled up about whatever it is that riled up about, whether it's the environment or immigrants flooding or shores or whatever it might be, right?
Starting point is 00:53:03 If once you get the sense that someone might be a little bit upset about too many immigrants, then you send them stuff about all the bad things that immigrants do, and videos of people climbing over walls and sneaking into beaches and all the rest of the stuff. And human propagandists have known this forever, but historically human propagandists could only produce one message. Whereas the content algorithms can produce, in theory one propaganda stream for each human being, especially tailored to them. The algorithm knows how you engage with every single piece of content. Your typical
Starting point is 00:53:57 Hitler's propaganda sitting in Berlin had absolutely no idea on a moment-to-moment basis how people were reacting to the stuff that they were broadcasting. They could see it in the aggregate over longer periods of time that certain kinds of content was effective in the aggregate but they don't have anything like the degree of control that these algorithms have.
Starting point is 00:54:24 degree of control that these algorithms have. And one of the strange things is that we actually have very little insight into what the algorithms are actually doing. So what I've described to you seems to be a logical consequence of how the algorithms operate and what they're trying to maximize. But I don't have hard empirical evidence that this is really what's happening to people because the platforms are pretty opaque. But they're opaque to themselves. They're opaque to themselves. So, you know, Facebook's own oversight board doesn't have access to the algorithms and the data to see what's going on.
Starting point is 00:55:08 Who does? I think the engineers, but their job is to maximize click-through, right? So pretty much there isn't anyone who doesn't already have a vested interest in this, who has access to what's happening. And that I think is something that we're trying to fix both at the government level. So there's this new organization called the Global Partnership on AI, which is, you know, it could just be,
Starting point is 00:55:48 you know, yet another do-goody talking shop, but it actually has government representatives sitting on it. So it can make direct policy recommendations to governments. And in some sense, it has the force of governments behind it when it's talking to the Facebooks and Google's of the world. So we're in the process of seeing if we can develop agreements between governments and platforms for a certain type of transparency. So it doesn't mean looking at whatever, looking at what Chris is watching on YouTube. I do not want to do that.
Starting point is 00:56:30 You do not want to do that at all. It means being able to find out how much terrorist content is being pumped out, whereas it coming from, who is it going to? It's slightly more sort of aggregated stuff like typical data scientists do. Yeah, and possibly being able to do some kinds of experiments, like if the recommendation algorithm works this way, what effects do we see on users compared to an algorithm that works in a different way?
Starting point is 00:57:07 So, to me, that's the really interesting question is, how do the recommendation algorithms work and what effect do they have on people? And if we find that they really are manipulating people, that they're sort of a consistent drift that a person who starts in a particular place will get driven in some direction that they might not have wanted to be driven in, then that's really a problem
Starting point is 00:57:40 and we have to think about different algorithms. And so in AI, we often distinguish between reinforcement learning algorithms, which are trying to maximize a long-term sum of rewards. So in this case, the long-term rate of clicks on the content stream is what the algorithm is trying to maximize. Those kinds of algorithms, by definition, will manipulate. Because the action that they can take is to choose a particular piece of content to send you. And then the state of the world that they are trying to change is your brain.
Starting point is 00:58:22 And so they will learn to do it. A supervised learning algorithm is one that's trying to get it right right now. So they are trying to predict whether or not you're going to click on a given piece of content. predict whether or not you're going to click on a given piece of content. So a supervised learning algorithm that learns a good model of what you will and won't click on could be used to decide what to send you in a way that's not based on reinforcement learning and long term maximization. But simply, OK, given a model of what you're likely to click on, we'll send you something that's consistent with that model. In that case, I think you could imagine that it would work in such a way that it wouldn't move you.
Starting point is 00:59:21 It wouldn't cause you to change your preferences. But if it was done right, it could sort of leave you roughly where you are. Are you familiar with the term audience capture? You know what this means from a creator, an online creator's perspective? I can imagine but not as not as a technical term. Yeah, well it's not a technical term, but it's basically when you have a particular creator online who finds a message narrative rhetoric that resonates with the audience.
Starting point is 00:59:52 And what you see is that this particular creator becomes captured and they start to feed their own audience a message that they know is going to be increasingly more well-liked. And for the most part, this actually does look like a slide toward one side of the one particular direction or the other, at least politically it does. But with anything it does, too, that people inevitably sort of niche down and then they bring their audience along with it. So the fascinating thing here, I mean, first off, it's unbelievable that these algorithms that
Starting point is 01:00:25 are simply there to try and maximize time on site or click through or watch time or whatever, that they have managed to find a way, things that we programmed, managed to find a way to program us for it to be able to do its job better. I mean, that, when I read that in your book, it's insane. Like that's one of the most terrifying things that it's happening right. It happened. Like everybody that's listening to this has had something occur with regards to their preferences, their worldview, whatever it might be.
Starting point is 01:00:55 Something has slid in one way or another. You may be right, it may not be toward the extremes. I would say anecdotally based on what I see in the world, increasing levels of partisanship, no matter what it is, whether it be sports, politics, race relations, anything. People are moving toward the extremes. Why is this happening? Oh, well, you know, it's people getting into echo chambers and they're only being shown stuff like that.
Starting point is 01:01:21 And also the fact that the algorithms are actually trying to make them more predictable. But on top of that as well, there's another layer, which is the creation of the content itself that comes in from the creators. And they have their own levels of manipulation, which is occurred from their feed, then they kind of second-order that into what do I want to create, what have I seen that's successful, what does my audience seem to resonate with from me? So you have layers and layers of manipulation going on here. Yeah, and I think in some ways the creators are being manipulated by the system. I think every journalist now is thinking, okay, I have to get something that's clickbait.
Starting point is 01:02:09 I have to write an article that can have a headline that is sufficiently attractive that it'll get clicked on. It's almost the point where the headline and the article are completely divorced from each other. And you can see this now in the comments, right? The people writing the comments at the end of the article will say, oh, I'm really pissed off. This is just clickbait. The article really doesn't say anything about the thing you said you were going to say. So on so, so this, it was not as if this has never been going on. Obviously, you can't ban people from writing interesting articles or I often think about the novel and it says on the back, I couldn't put it down. What should we ban?
Starting point is 01:02:55 Novels, because that's addictive. You can't have that. No, but I think it wasn't too bad before because the feedback loop was very slow. And there wasn't this targeting of individuals by algorithms who are, you know, so you think about the number of learning opportunities for the algorithm, right? I mean, it's billions every day for the YouTube selection algorithm, right? So it's the amount, the consistency, the frequency, and the customization of the learning opportunities for manipulation, so much greater. I mean, it's millions or billions of times greater and more systematic. And that systematic element, so it reminds me, I don't know if it's
Starting point is 01:03:53 apocryphal, but there's a story about the psychology lecturer and he's been teaching the students about subliminal effects. And the students decide to play it, trick on him, which is every time he's on the left-hand side of the room, they pay attention, they're really interested. And every time he walks onto the right-hand side of the room, they all really bored, start checking out the email. And by the end of the lecture, he's glued against the left hand. Right? And he has no idea that he's being manipulated. But because of the
Starting point is 01:04:33 fact that this was like systematic and sufficiently frequent, it has a very, very strong effect. very, very strong effect. And I think that the difference here is that it's because it's algorithmic and it's tied into this very high frequency interaction that people have with social media. It has a huge effect and it has a pretty rapid effect as well. What are some of the concerns you had, you mentioned earlier on about, is it infeabled, becoming too infeabled to the machines? Yeah, so this is, I think, one of two major concerns, if we manage to create superhuman AI and to control it, one concern is the misuse concern.
Starting point is 01:05:30 So I call it the Dr. Evil problem. Dr. Evil doesn't want to use the provably safe control play. He wants to make his own AI that's going to take over the world. And you can imagine that gets out of control and bad things happen. The infealment problem is sort of overuse, that we, because we have available to us AI systems
Starting point is 01:05:57 that can run our civilization for us, we lose the incentive to know how to run it ourselves. And that problem, you know, it's really a really hard problem to figure out how to prevent it. Because inevitably the AI would have to make the human do something that probably in the moment the human didn't want to do. The AI would actually be programming itself to be less useful than it could be in order to give us a sort of Hormesis stressor dose that allows us to stay useful. Yeah, I mean it's so when I say overuse, I literally mean that, right, that we would use AI too much for our own good.
Starting point is 01:06:54 And so EM Forster has a story, you know, so he usually wrote, you know, a late Victorian early Edwardian social, you know, British upper class social issue kinds of novels, but er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, er, internet, email, Moons. One, is it written? 1909. It has iPads, video conferencing,
Starting point is 01:07:33 you know, people are obese because they never get out of their chairs. They're stuck on the computer all the time. They start to astute face-to-face contact because they are basically glued to the screen and lose contact with the physical environment altogether. All the kinds of things that people complain about now he wrote about. And the machine, right? So it's not just the internet, it's a whole, it's a whole, it's called the machine that looks after everyone's physical needs.
Starting point is 01:08:15 And so they just spend their time glued to the screen. They don't know how the machine works anymore. And they don't need to, because the machine runs itself. And then, of course, it stops running itself. Things go south. But, you know, I did a little back-of-the-envelope calculation. So it turns out that about 100 billion humans have lived. And our civilization is passed on by teaching the next generation of humans everything we know
Starting point is 01:08:56 and then maybe they'll learn a bit more. Right, and if that process fails, right, you know, if it goes into reverse and the next generation knows less, then you could imagine things unraveling. So the total amount of time spent just passing on civilization is an enormous number of person-years of effort. And for that to end would be the biggest tragedy that one could possibly imagine, right? But if we have no incentive to learn it all, because finally, right, instead of having to put it into the heads of the next generation of humans, we could just put it into the heads of the machines and then they take care of it. Right?
Starting point is 01:09:47 And if you've seen WALL-E, that's exactly what happens. They even show what happens to the generations over time. They become stupider and fatter and less capable; they can't run their own civilization anymore. So the machines should say, because this is such an important thing, right? It's of value to us and to our descendants that we are capable, that we are knowledgeable, that we know how to do things, that we have autonomy and intellectual vigor. Those are really important values. But so the machines should say, okay, we are going to stand back and let these humans... we have to let the
Starting point is 01:10:33 humans tie their own shoelaces. Otherwise, they'll never learn. But we, right, we are short-sighted, lazy, greedy people, and we might say, no, you have to tie our shoelaces. We keep doing that. And then, oh, we lose that autonomy and that intellectual vigor. So this is a cultural problem, right? It's a problem with us. The technology might be saying, no, no, no, no, but we're overriding it.
Starting point is 01:11:06 All of these problems are problems with us. The problems are the fact that our goals are plastic, the fact that our language is imprecise, the fact that we are sometimes rational, sometimes irrational, that we don't have an omniscient view where we can see what we're going to want and when we're going to want it. Also, these challenges around the fact that sometimes we want something in the moment that we're not going to want in the future. And that we're going to complain at the algorithm and say, well, no, no, machine, you're supposed to be here to do my bidding. And now you're telling me that I've got to walk to the shop to get the milk. I want you to get the milk. Well, sire (I don't know why the machine is a Middle Ages peasant now), well, sire, you must go and get the
Starting point is 01:11:44 milk yourself. You know that it is good for your calves and your bone density. So, I mean, we haven't even touched on how this even begins to be converted into computer code, which is, I imagine, a whole other complete minefield of difficulty, to be able to actually get what we're talking about.
Starting point is 01:12:03 This is purely sort of within the realm of philosophy. What are some of the challenges that we have here when you have an all-powerful superbeing that can do whatever you want? Yeah, I mean, it's inevitable in a sense, because what we've always wanted is technology that's beneficial to us. And a lot of the time we say, oh, well, here's a technological idea, I hope it's going to be beneficial to us, like the motor car, for example.
Starting point is 01:12:41 And then it turns out not to be, or arguably, although it conveyed lots of benefits, it might have ended up being our destruction. It's one of Bostrom's black balls, right, or gray balls, I suppose, out of the urn. Yeah, so it's responsible largely for the destruction of the climate. And so unless we get out of this, it will have been a really bad idea to do it. But almost by definition, with something as powerful as AI, as superintelligent AI, you need to know, right, it's either going to be very beneficial or very
Starting point is 01:13:27 not beneficial, right? It's not going to be like margarine versus butter or something like that, right? And so we have to ask ourselves, okay, what does beneficial mean? If we're going to prove a theorem that developing such technology is actually going to be beneficial, then inevitably it comes back to: what are humans, and what do we want? And how do we work? And so, yeah, that kind of surprised me when I started along this path of trying to solve the control problem. You know, I had ideas for algorithms that would do it and so on, but I didn't realize the extent to which it would push me into these humanistic questions. And that's been fascinating, and it's a little bit of a minefield for a technologist to stray into these areas, because they operate in different
Starting point is 01:14:42 ways, in many ways they're much more vicious than the technological fields, because in technology it's sort of, there's us humans and then it works or it doesn't write or it's true, it's a true scientific theory or isn't. So there's this third party called Nature out there. But in the humanistic areas, there isn't a third party, right? It's just one school or four to another school. It's just debates all the way down.
Starting point is 01:15:14 fighting it out for supremacy, and it takes a while to adjust to that. But the questions are super important and really fascinating. So I've enjoyed it a lot. So coming back to the question of the algorithms. One way to think about it that is perhaps a little bit less daunting is to look back at what's happened in AI with respect to, shall we say, ordinary uncertainty.
Starting point is 01:15:59 So in the early days of AI, we mostly worked on problems where the rules were known and fixed and deterministic, like the rules of chess, or finding a path through a map or something like that, right? We know what the map is, we know that if you turn left, you go left. So we have logical rules; we could use these deterministic symbolic techniques to solve the problem. And then we found that as we moved into the real world,
Starting point is 01:16:33 uncertainty becomes really important. So if you're controlling the Mars rover, and you give it some command to, you know, to go 70 meters in a particular direction because it takes 10 minutes for the commands to go back with the forwards. Is it going to get there? Well, you don't know. It might get stuck or it might, you know, deviate a little bit or, you know, one wheel will start spinning and it won't make any progress. Who knows? So real-world problems, you always have to handle the uncertainty in your knowledge of the physics of the world and in your, even just what your senses are telling you, right, that their
Starting point is 01:17:18 senses themselves are imperfect and noisy and incomplete. So uncertainty became a core consideration in AI around early 1980s, I would say. And so that period from late 80s to early 2000s was really the period where probability was, the dominant paradigm for AI research. But in all that time, it does not seem to have occurred to anyone, except for a few, I think, very bright people,
Starting point is 01:17:55 that there is also uncertainty in the objective. So we have all these problem formulations for decision-making under uncertainty, but they assume that you know the objective exactly imperfectly. And at least looking back at that now, it's like, well, that's bonkers, right? It's just as bonkers as assuming that you know the physics of the world exactly imperfectly, or that your senses give you exact and perfect access to the state of the world at all times. Because we had already seen many examples of objective failure, where we specified the wrong objective and the machine that something complete, you know, that we thought was completely
Starting point is 01:18:39 bunkers, but in fact, it was doing exactly what we told it to do. We just didn't realize what we had told it. In fact, it was doing exactly what we told it to do. We just didn't realize what we had told it. And my favorite example is one from simulated evolution. And so simulated evolution, you define a fitness landscape, which simulated creatures are considered to be more fit or less fit than they have, or they get to reproduce and mutate, and gradually you can evolve creatures that are really, really good at whatever it is you want them to be good at. So, the objective was, well, what they wanted was to evolve creatures that could run really fast. So they specified the objective as the maximum velocity of the center of mass of the creature.
Starting point is 01:19:38 And that sounds reasonable, but what evolved were incredibly tall creatures, like a hundred miles high, that would then fall over. And in falling, they went really, really fast. So they won the competition, right? They turned out to be the solution to that problem. Someone thought they were going to get some supercharged nitro cheetahs or leopards or something. Exactly. Instead, you end up with trees reaching up into the stratosphere and then falling all over the place. Yeah.
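A minimal sketch of that failure mode (not the actual experiment; the two-parameter creature, the 12 m/s gait cap, and the toy physics are invented purely for illustration):

```python
import math
import random

G = 9.81  # gravitational acceleration, m/s^2

def fitness(creature):
    """The objective exactly as specified: peak speed of the centre of mass.

    A creature that runs moves at its gait speed. A creature that simply grows
    tall and topples gets its centre of mass (at roughly height/2) up to about
    free-fall speed, sqrt(2 * G * height / 2). Nothing in the objective says
    'by running', so falling over counts just as much.
    """
    run_speed = creature["gait_speed"]
    fall_speed = math.sqrt(2 * G * creature["height"] / 2)
    return max(run_speed, fall_speed)

def mutate(parent):
    child = dict(parent)
    child["height"] = parent["height"] * random.uniform(0.9, 1.2)
    # Gait speed plateaus (a stand-in for real actuator limits), but nothing
    # in this toy simulator limits how tall a body can grow.
    child["gait_speed"] = min(12.0, parent["gait_speed"] * random.uniform(0.9, 1.2))
    return child

def evolve(generations=200, pop_size=50):
    """Tiny (mu + lambda)-style loop: mutate everyone, keep the fittest."""
    pop = [{"height": 1.0, "gait_speed": 1.0} for _ in range(pop_size)]
    for _ in range(generations):
        pop = sorted(pop + [mutate(p) for p in pop], key=fitness, reverse=True)[:pop_size]
    return pop[0]

if __name__ == "__main__":
    best = evolve()
    print(f"height {best['height']:.0f} m, gait {best['gait_speed']:.1f} m/s, "
          f"'speed' {fitness(best):.1f} m/s")
    # The winner is invariably a very tall creature that goes fast by falling over.
```

The optimizer does exactly what the fitness function says, not what the designers meant, which is the point of the anecdote.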
Starting point is 01:20:14 So, I thought that was a great example. And of course, that's all a simulation in the lab. So people go, ho ho ho, and then they fix the problem. But of course, in the real world, right, in your climate engineering system or your, you know, your economic governor or whatever it might be, right, you can't just go ho ho ho and fix it and press the reset button. Brian Christian told me about a problem with robots playing football, and they'd
Starting point is 01:20:48 put a very small utility in for gaining control of the ball, because possession is an instrumental goal towards scoring: you can't score if you don't have the ball. And what the robot found was that it could actually maximize its utility function by going up to the ball and vibrating its paddle a hundred times a second up against the ball, which was just far easier than actually trying to score. It ended up thinking it had done really great, and the guys just had these sort of seizuring robots all over the pitch vibrating up against the ball. Yeah, exactly. Which is sort of what happens with little kids sometimes, right?
Starting point is 01:21:26 They want to get the ball, they want to get the ball, they want to get better and know what to do with it once they've got it. Yeah, yeah. But anyway, the, yeah, I mean that, some of these problems have technical solutions, that that particular one, there's a there's a theorem for what kind of supplemental rewards you, you can provide that are intended to be helpful.
Starting point is 01:21:49 but will end up not changing the final behavior. So it'll make it easier to learn the final behavior, but it will ensure that the final behavior is still optimal according to the original objective. So you can fix that aspect of the problem. But if you leave something out of the objective, you've left it out, and that's equivalent to saying it's worth zero. Right, so anything you leave out of the objective is like saying it has value zero to humans, and that's, you know, a big problem. So you almost always want to say, well, I told you some of the things I care about, but there's other stuff I haven't told you, and you should be aware of that.
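The theorem being alluded to is presumably the potential-based reward shaping result (Ng, Harada and Russell, 1999). A sketch of the idea, where the potential function phi (say, negative distance to the ball) is an illustrative choice rather than anything from the conversation:

```python
def shaped_reward(base_reward, phi, s, s_next, gamma=0.99, terminal=False):
    """Potential-based shaping: add F(s, s') = gamma * phi(s') - phi(s).

    Supplemental rewards of exactly this form can speed up learning (phi acts
    as a hint, e.g. 'how close am I to the ball') without changing which
    policies are optimal, because the potential terms telescope along any
    trajectory. A raw bonus like '+1 every time you touch the ball' is not of
    this form, which is how you end up with a robot vibrating against the ball
    instead of scoring.
    """
    phi_next = 0.0 if terminal else phi(s_next)  # zero potential at episode end
    return base_reward + gamma * phi_next - phi(s)

# Illustrative usage with a made-up state format (robot and ball x-positions):
# phi = lambda s: -abs(s["robot_x"] - s["ball_x"])
# r = shaped_reward(0.0, phi, {"robot_x": 3.0, "ball_x": 5.0},
#                   {"robot_x": 4.0, "ball_x": 5.0})
```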
Starting point is 01:22:27 but there's other stuff I haven't told you and you should be aware of that. It feels to me like there's sort of two cars in this race. On one side, we have technological development that has lots of facets, hardware, algorithms, so on and so forth. And then on the other side, you have the control problem. You have getting the alignment right. It has to be that the alignment problem gets across the finish line before the technology does or else we are
Starting point is 01:22:57 rolling, essentially rolling the dice and hoping that we've got it right by some sort of fluke. And I imagine that there are far more ways of getting it wrong than there are of got it right by some sort of fluke. And I imagine that there are far more ways of getting it wrong than there are of getting it right. Yeah, well, I mean, getting it wrong is actually the default, right? If we just continue pushing on the AI technology in the standard model, we'll get it wrong. Are you, so is there any chance that if you continue
Starting point is 01:23:23 to the standard model that it could be right, or would you give it such a low chance that it's negligible? I think it's negligible. I think arguably what's happening with social media is an example of getting it wrong. The other people have pointed out that we don't need to wait for AI corporations that are maximizing quarterly discounted profit stream are also examples of machines pursuing incorrectly defined objectives that are destroying the world. If you look at the climate issue from that point of view. I find it sort of enlightening, right? We have been outwitted by this AI system called the fossil fuel industry, right? It happens to have human
Starting point is 01:24:13 components, but the way corporations are designed, right, they are machines with human components. And actually the individual preferences of the humans in those machines don't really matter very much. Because the machine is designed to maximize profit. And they outwitted the human race. They develop more than 50 years. They've been running global propaganda and subversion campaign to enable them to keep selling fossil fuels. And they won. We can all say we're right, or we know that we shouldn't be doing this, and we know that we know the causes of climate change, but we lost.
Starting point is 01:25:04 We know the causes of climate change, but we lost. A lot fewer implications of that than an all-knowing, all-powerful artificial intelligence, though. So although the implications are still grave, if the climate problems get worse, it's not the same. And again, the control problem simply has to get across the line. If you're essentially adamant that currently, if you scale up the competence, not probably not the power, I suppose, of the computation that we have, it's bad situation. But obviously you have.
Starting point is 01:25:47 But the reverse usability, right? I mean, climate change probably isn't gonna make us extinct, unless there's some real chaos theory catastrophe that happens. And eventually we'll all be so fed up that we actually retake control from the fossil fuel industry. And that's sort of happening. But, yeah, with AI, it could be irreversible. There's lots of control.
Starting point is 01:26:16 And, you know, if I'm right, that examples like social media are showing that we are already seeing the negative consequences of incorrectly defined objectives and even relatively weak machine learning algorithms that are pursuing them, then we should pay attention. These are the canaries in the coal mine. We should be saying, okay, we need to slow down, and we need to look at this different paradigm. And the standard model is sort of just one corner.
Starting point is 01:27:02 It's the corner where the objective is completely and perfectly known, at least that's the, that's the corner where it's appropriate to use this data model, right? And there's this all the rest of the building, we haven't even looked at yet, right? Where there's uncertainty about what the objective is and the system can behave accordingly.
Starting point is 01:27:25 And we've just, you know, just in the last few years, have we had any algorithm that can solve this new category of problem? And it does, I mean, so the algorithms exist, right? They're very simple and they work in very restricted and simple instances of the problem, but they show the right properties, that they defer, they ask permission, they understand what the human is trying to teach them about human preferences. And it seems our job in what's called for one of the better words, the AI safety community, our job is to build out that technology
Starting point is 01:28:19 to create all the new algorithms and theoretical frameworks and demonstration systems and so on to convince the rest of the AI community that this is the right way to do AI. Because we can't do the other thing in the race. We can't slow down the technological progress because trying to neuter one particular agent or actor or country or nation state or even one group of nations Doesn't guarantee that some other group is not going to China saying right will stop doesn't mean that America will say Where we're just gonna keep going or vice versa. Yeah, well, it's just I mean the
Starting point is 01:29:01 Potential upside that people are seeing is so huge right, I mean it When I say it will be the biggest event in human history I Mean it why right it's not because You know Our advantages humans are whole civilization is based on a certain level of intelligence
Starting point is 01:29:30 that we have. So, we, our brains, are the source of the intelligence fuel that makes our civilization go around. If we have access to a lot more, all of a sudden, right? That is a step change in our civilization. And on the upside, it might enable us to solve problems that have been very resistant, like disease, poverty, conflict.
Starting point is 01:30:07 And on the downside, it might just be the last thing we do. So... If you could, if you had a God's eye view, would you put a pause on technological development for a hundred years outside of the control problem for a thousand years, for 50,000 years, because we've spoken about the dangers of killing people that haven't yet been born. And when you're talking about civilizational potential, the observable universe, you know, galactic sized sized empires von Neumann probes Making everything is at you know you're talking trillions and trillions of human lives even if you go from the utilitarian approach
Starting point is 01:30:51 You have an unlimited amount of utility and happiness that could be given and because We're unable at the moment to slow down technology Potentially within the next hundred years all of that could be snuffed out Yeah, it's interesting. In many of the works, I think in Bostroms and in Max Tecmark's and others, the argument is based on these quintillions of future humans who might be able to colonize the universe and so on. That's never been a motivation for me. If I have a picture, it's just of a small village with a church and people playing cricket on the village green.
Starting point is 01:31:39 That's what I don't want that to disappear. I don't want the civilization that we have to be gone, because it's the only thing that has value. I try not to think about what I do if I was God, as you say, it's not. Not good for the ego. Well, I just... I mean, obviously, I don't...
Starting point is 01:32:13 No one I think is going to be able to switch off scientific progress. You know, there are precedents, the biologists switched off progress on direct modification of the human genome in ways that are what they call heritable modifications, germline editing. They switched that off. They said, you know, in 1975, or gradually from 1977, I have onwards, they decided that that was not something they wanted to do. Which is interesting because a large part of the history of genetics and that whole branch of biology, the improvement of the human stock was actually one of the major objectives. And eugenics before the Second World War thought of itself as a noble mission.
Starting point is 01:33:22 You could argue about that, but what the biologist is to say, you know what, we could do this, but we're not. That was a big step. And is it possible for that to happen in AI? I think it's much more difficult and AI, I think is much more difficult because in biology, we are continuing to understand the developmental biology, right? So how does a given DNA sequence produce an organism, right? And what goes wrong? And, you know, is it a problem with genes or a problem with the development environment of the organism or what? And if you understand all those questions, then presumably you could, you could then say, okay, now I know how to modify the human genome so we can avoid all those problems. So the scientific knowledge is moving ahead, but the decision is we're not going to use that knowledge for
Starting point is 01:34:39 that kind of thing. And you can draw that boundary pretty easily because you know we're talking about physical you know physical procedures involving actual human beings and so on and that's being regulated for many decades already and so with AI once you understand how to do something, it's pretty much done. Right there, mathematics and code are just two sides of the same coin. And, you know, code, mathematical ideas, you can't go around looking at everyone's whiteboard and saying, okay, I see you've got, you know, sigma for x equals 1 to, okay, stop right there, right? That's one too many Greek symbols.
Starting point is 01:35:36 You've got to stop writing, right? You know, so we, because the question of, you know,-making and learning and so on, these are fundamental questions, we can't stop research on them. So I have to assume that the scientific understanding of how to do it is just going to continue to grow. If it was the case, which some people seem to think that to go from a scientific understanding to a deployed system would require some massive gigawatt installation with billions of GPUs and so on and so forth, then perhaps you could regulate at that point. Because there would be a physical limitation that would be quite easy to enact,
Starting point is 01:36:29 okay, you can't have this much power, this many. I'm going to guess that you feel otherwise that you don't need that much hardware to run something that could be quite dangerous. Correct. Yeah. I think we already have enough power, as I said. Yeah, I think we already have enough power, as I said. And it's very hard to do meaningful calculations, but in just in terms of raw numerical operations per second, a Google TPU pod, which is the tensor processing unit, you know, even three or four years ago, was operating at a higher rate than the possible theoretical maximum capacity of the human brain.
Starting point is 01:37:16 Right, so ballpark figure for the human brain is 10 to 17 operations per second, but I don't think any neuroscientists believe that we're doing anything like that much, right? I mean, they would probably ballpark it at 12 or 10 to 13 or something like that. But if you grant every possibility, it's 10 to the 17, where or, you know, TPU pod, which is, you know, sort of wardrobe sized thing is a 10 to the 17, you know, the biggest supercomputer is a 10 to the 18. So, you know, I think we have way more than enough power to build a super intelligent machine.
Starting point is 01:37:59 So I just don't, I don't think that trying to cut it off at the, you know, large scale hardware installation level is going to be feasible either. Anyway, you know, if you remember the old Apple ads or the G5. So the US had these export, put export controls on anything that was more than one gigaflop, right, which sounds ridiculous now, but they put export because they didn't want those falling into the hands of the Russians or the Chinese. So Apple produced its ad with a little G5, this little cube, and they had all these tanks surrounding this little G5. This little G5 is now under the US government has too hot to handle, yeah, exactly.
Starting point is 01:38:48 Right, so they use it as advertising material. So it's just unlikely that you could prevent the creation of super intelligent AI just by regulating hardware installations. So I do think of it as you say a race. I think we may see good catastrophes that are more obvious and unequivocal than what's happening in social media. And you know, that could happen on a small scale and self-driving cars. You know, I thought when the first Tesla completely failed to see a huge white truck and crash straight into it at full speed. I thought, you know, that kind of accident should at least say, okay, maybe the AI systems are not as good as we thought they were, but didn't seem to have much impact.
Starting point is 01:39:59 And, you know, we've killed several more people pretty much the same way. you know, we've killed several more people pretty much the same way. So it would have to be, I think, something, something pretty major would have to happen. We say that, but I've been thinking this over the last 16 months that COVID should have been the biggest wake up call for sin bio, for natural pandemics, for natural pandemics for engineered pandemics for research into that anything over that side of the aisle for whatever it is BSL three or BSL four labs. They should all be on the moon. They should all be on the bottom of the ocean. We should be airgapped from them. And No one's talking about that. Rob Reed. Rob Reed's talking about it, and's it. Like, there's no one. No one's bothered. They're not. I think I don't know. I have heard some biologists talking about
Starting point is 01:40:52 re-evaluating. After this global pandemic, maybe we should have a meeting, you know, have a cup of tea or something. But I just think humans, because life's so comfortable at the moment for us mostly, and because we have attached our sense of well-being to the progress of technology, I think everybody is praying at the altar of that currently, and the presumption is always more technology is good. There may be some hiccups along the way, but mostly we'll be able to fix it. And if we can't, we'll make a technology that can fix it, that will be enabled by the technology that was wrong.
Starting point is 01:41:28 But there is a, as you say, a step change. You know, it's not just a change in degree, it's a change of kind. When we reach this particular level, recursive self-improvement, blah, blah, blah, game over. Yeah, well, I think arguably the same thing is going on in biology, whether it's germline modification or synthesis of pathogens. And interestingly, for DNA synthesis, there is a response. It's not widely publicized, but all the manufacturers
Starting point is 01:42:10 of DNA synthesis equipment are now building in circuitry, which is non-public, which is a black box circuitry, which is checking the sequences to make sure you're not synthesizing any disease organism. And there, I think there's even a notification requirement. So, if you try, someone will be knocking on your door very quickly. So, that's, I think, a very sensible precaution. And even so, right, that there's a movement within synthetic biology, sort of libertarian movement, the garage movement saying,
Starting point is 01:42:54 we should be able to synthesize whatever we want. It's our right, you know, it's scientific freedom. I'm sorry. I think you're nuts, right? There is no scientific freedom is a value, but it doesn't trump all other values, including continued human existence. So, eventually, I think AI, computer science,
Starting point is 01:43:21 and AI have got to accept the same thing. Right? We've now reached a point where we're able to have a really significant effect on society or good or bad. And so that's time to grow up. It's time to accept that you have responsibilities and That society has a right to control what effect you have That's a whole other conversation there when we think about the current push towards libertarianism on the internet web 3.0 decentralized you can't stop me you can't stop my money
Starting point is 01:44:00 I can be wherever I want do whatever I want with whoever I want I think that there's going to be a serious sort of cultural conflict there. One of the reasons that you wrote human compatible was as a wake up call to people within the sort of AI research safety community. We're approaching what, two years, almost exactly two years since it was published. What has the effect of the book and your subsequent work, press, been? Has it had anything close to the impact that you intended? Has it had any other sorts of impacts? That's a very good question. And honestly, I don't know the answer. There's certainly more academic interest in these topics. You know, the num, right, we have a center at Berkeley and
Starting point is 01:45:01 the number of people coming to the workshop has been increasing rapidly. You know, workshops that we hold at the main conferences are growing really fast, hundreds and hundreds of people. I often, you know, I get emails from all kinds of AI researchers who say, you know, okay, I agree with everything you were saying in the book, how can I redirect my research to be helpful in this? I believe based on various sort of grapevine dribs and drabs that the questions of control have filtered up to the highest levels of government in various countries. That this is now one of the risks that's considered when people take stock of what do we need to pay attention to over the horizon.
Starting point is 01:46:01 This is one of the risks. But on the other hand, if I was just a mid-level AI manager in a technology corporation, would I change what I'm doing? Well, I think my recommendation is always I think my recommendation is always look at the objective that you're specifying for your algorithms, look at the actions that your algorithm could take and ask, could any of the effects of your algorithm go outside the scope that you are thinking of for your objective and have you thought about all the possible consequences and whether they would be desirable. That should be something that we do as a matter of course and I would say the new EU AI regulations
Starting point is 01:47:07 AI regulations actually instantiate some of that. So, that's reasonably good. So, I, but there is no, you know, I can go and buy TensorFlow. I can, you know, I can go download reinforcement learning libraries. There's tons of software in the standard model, but there's almost nothing in the new model. There's just tons of research literally every chapter of the textbook has to be rewritten from scratch. And you can't rewrite it until someone's done all the work to create the new algorithms and the new theorems and even to figure out the right way to formulate the problems. So that's something that I have to do.
Starting point is 01:48:01 And my students have to do. And the other groups around the world that are working on this. And the sooner we get that done, better, right? These will be better AI systems. It's not, you could think of them as well. OK, you're over there doing those traditional AI systems. You're bad, bad, bad, bad, bad. You know, and you need to fix your approach.
Starting point is 01:48:23 But that's never been very effective as a way to get people to change their behavior. You want them to get up in the morning and say, I'm going to build a really good AI system today. But what that means is that is beneficial to human beings. Just like an engineer, a civil engineer who designs bridges, he gets up every morning and say, I'm going to build the best bridge I can. And what does best bridge mean? What's come to mean bridges that don't fall down. And I was reading the memoir of the guy who ran the Russian BioWepens program. And it was clear that he got up every morning and said, I'm going to do the best science I can today.
Starting point is 01:49:07 What did best mean for him? It means, you know, making anthrax more fatal, you know, and making infectious diseases more infectious, right? That's what best meant for him. So you can affect what people do by affecting what they think of as good science or good engineering. And to me, it stands for reason that it's not good engineering if it does things that make you very, very unhappy. So, we need to change the way people think about what is good AI and then give them the tools to make it.
Starting point is 01:49:44 And time's running out. Yep. Stuart Russell, ladies and gentlemen, thank you very much for joining me today. If people want to check out your stuff, where should they go? So the book, Human Compatible, is available, including as an audio book with a very plummy accent, not mine. So it's pretty good.
Starting point is 01:50:05 My webpage, you can Google me, my webpage has all the publications. The Center for Human Compatible AI has a bunch of resources. And then there are other groups, such as the Future of Humanity Institute at Oxford, Center for the Study of Accessistential Risks at Cambridge, Future of Life Institute at MIT in Harvard.
Starting point is 01:50:32 So there's a bunch of groups around the world and we're working as artists we can to train students and get them out into all the universities and then teach the next generation how to go forward. Good luck, Fima. Thank you very much. Very nice talking to you Chris. you
