Modern Wisdom - #364 - Stuart Russell - The Terrifying Problem Of AI Control
Episode Date: August 28, 2021
Stuart Russell is a Professor of Computer Science at the University of California and an author. Programming machines to do what we want them to is a challenge. The consequences of getting this wrong become very grave if that machine is superintelligent with essentially limitless resources and no regard for humanity's wellbeing. Stuart literally wrote the textbook on Artificial Intelligence which is now used in hundreds of countries, so hopefully he's got an answer to perhaps the most important question of this century. Expect to learn how AI systems have already manipulated your preferences to make you more predictable, why social media companies genuinely don't know what their own algorithms are doing, why our reliance on machines can be a weakness, Stuart's better solution for giving machines goals and much more... Sponsors: Get 20% discount on the highest quality CBD Products from Pure Sport at https://puresportcbd.com/modernwisdom (use code: MW20) Get perfect teeth 70% cheaper than other invisible aligners from DW Aligners at http://dwaligners.co.uk/modernwisdom Extra Stuff: Buy Human Compatible - https://amzn.to/3jh2lX5 Get my free Reading List of 100 books to read before you die → https://chriswillx.com/books/ To support me on Patreon (thank you): https://www.patreon.com/modernwisdom - Get in touch. Join the discussion with me and other like minded listeners in the episode comments on the MW YouTube Channel or message me... Instagram: https://www.instagram.com/chriswillx Twitter: https://www.twitter.com/chriswillx YouTube: https://www.youtube.com/ModernWisdomPodcast Email: https://chriswillx.com/contact/ Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
Hi friends, welcome back to the show. My guest today is Stuart Russell, he's a professor
of computer science at the University of California and an author.
Programming machines to do what we want them to is a challenge. The consequences of getting
this wrong become very grave if that machine is super intelligent with essentially limitless
resources and no regard for humanity's well-being. Stuart literally wrote the textbook on artificial intelligence, which is now used in hundreds
of countries.
So hopefully, he's got an answer to perhaps the most important question of this century.
Today, expect to learn how AI systems have already manipulated your preferences to make
you more predictable,
why social media companies genuinely don't know what their own algorithms are doing,
why our reliance on machines can be a weakness, Stuart's better solution for giving machines
goals, and much more.
I think the main takeaway from this conversation is that programming computers to do what you
actually want is really, really hard.
And we don't have the luxury of getting this wrong if the computer that we're trying to program
is incredibly intelligent and controlling essentially the entire world.
I'm very glad that someone like Stuart is thinking about these questions
because he seems to be about as qualified as you can be,
but as you'll hear, there are far fewer resources
than are needed being dedicated to this area of research.
Before we get on to other news,
if you haven't already picked up a copy
of the Modern Wisdom reading list,
100 books to read before you die,
you can go and get yours right now
for free by heading to chriswillx.com slash books.
It's completely free, it took me months to write and thousands of people have already downloaded it.
So go and get your copy now chriswillx.com slash books.
But now it's time for the wise and wonderful Stuart Russell. Let's do it. Stuart Russell, welcome to the show.
Thank you, nice to be with you.
What do King Midas and Artificial Intelligence have in common? Good question.
So King Midas is famous in two ways, right? So he had the golden touch, so people
think of him as kind of a lodestone for getting rich. But the moral of the story with King Midas is
he said, I want everything I
touched to turn to gold. And he got exactly what he wanted. So the gods granted his wish,
and then he finds out that he can't eat because his food turns to gold and he can't drink,
because his wine turns to gold, and then his family turns to gold. So he dies in misery and starvation. And this tale is basically a description of what happens or what might happen with
super intelligent AI.
Where the super intelligent AI plays the role of the gods and we are King Midas.
And we tell the AI, this is what we want, and we make a mistake.
Right?
And then the AI is pursuing this objective,
and it turns out to be the wrong one.
And then we have created a conflict.
We basically created a chess match
between us and the machines,
where the machines are pursuing some objective
that turns out to be in conflict with what we really want.
And that's basically the story of how things go south with super intelligent AI.
And if you look at what Alan Turing said, in 1951 he was on the radio, BBC Radio 3, the third program.
And he said, basically, we should have to expect the machines to take control.
End of story.
And I think this is what he had in mind: that they would be pursuing objectives and we would have no way to stop them or to interfere with them,
because they are more capable than us, so they control the world.
That's the challenge: the fact that it's not just that the objective is misaligned,
but that the power deploying that misalignment is so vast that there's no stopping it once it's set on its way.
Yeah, and if you're a gorilla or a chimpanzee or whatever, your ancestors thought that they were the pinnacle of evolution, and then they accidentally made humans, and then
they lost control.
They have no control over their own future at all, because we're here and we're smarter than they are, and end of subject.
It's rare that the person, or the agent, that's supposed to be in charge is actually less capable or less powerful or less intelligent than the agent that they're commanding to do their bidding?
Yes. Yeah, we don't have any good models for how this relationship would work.
So even if we do solve the control problem,
there are various issues that we will still have to face. For example, how do we retain anything resembling the kind of intellectual vigor of civilization
when our own mental efforts are just puny compared to those of the machines that we're supposed to control. So, you know, and in some science fiction books,
for example, Iain Banks' Culture novels,
which I highly recommend to your listeners,
he struggles with this because, you know,
they've got super powerful AI,
and everything is hunky-dory.
The AI systems always do stuff that's beneficial for humans,
but in a way they end up treating humans like children. There's always this delicate balance
which parents have. When do I stop doing up my kid's shoelaces and make them do their own shoelaces?
And it's, except that with parents and children, the children are not supposed to be the ones
who are in control of the parents.
Sometimes they are, but not supposed to be.
And we just don't have a model for that where the children are
commanding the parents, but the parents treating the children like children and saying,
okay, well, I think it's time for Johnny to, you know, learn to do his own shoe laces.
So I'm going to hold off on helping Johnny today. You know,
I just don't exactly know how it's going to work and how humans are going to continue to have the incentive to slog through 20 years of education and so on, to learn things
that the machines can already do much better.
That's thankfully not a problem that we need to deal with just yet.
I suppose the fact that we don't have an imminent,
well, I suppose we don't know if it's going to be a hard take off,
so it might be imminent, it might be tomorrow.
But everything suggests that it's not.
Have you got any conception around how long it will be
before we do face a super intelligent AGI?
Well, usually I say I will not answer that question.
And I was at a World Economic Forum meeting, which was officially off the record
under Chatham House rules. And somebody asked me that question.
So I said, well, you know, off the record: within the
lifetime of my children. Okay. Yeah, you know, that's a flexible number, because medical advances might
make their lives very long. And then 20 minutes later, it's on the Daily Telegraph front page.
When was this?
Probably 2015, I think, January 2015. Professor predicts sociopathic robots will take over the world
within a generation. That was what they had in mind. So even though I tried to be careful, that's what happened. Anyway, I don't think that just scaling up the methods we have is going to create superintelligence.
You know, simply put, you make the machines faster, you get the wrong answer more quickly.
What's the bottleneck that we're facing at the moment then? Is it hardware? Is it algorithms?
It's definitely not hardware. I think we probably have more than enough hardware
to create super intelligent AI already.
I think it's, well, algorithms,
but it's basic conceptual gaps
in how we're approaching the problem.
Our current deep learning systems
in a real sense don't know anything. And so it's very hard for them to accumulate knowledge over time.
What does that mean, that they don't know anything? So they can learn, they can learn a sort of an input-output function.
And in the process of doing that, they can acquire internal representations that facilitate
the representation of that function and so on. But they don't learn, let's say, Newton's laws of physics,
and then become able to apply those laws to other problems.
They have to retrain from scratch or almost from scratch
on new problems.
And if you think about the way science works,
which is the best example of human accumulation of knowledge,
we know that it wasn't simply the accumulation
of raw sensory data and then training a giant network
on vast quantities of raw sensory data
because all the people who had that sensory data are dead.
So whatever they learned, whatever they accumulated,
had to be expressed in the form of knowledge,
which subsequent generations could then take on board
and use to do the next phase.
And so a cumulative learning approach
based on the acquisition of knowledge
which can be used for multiple purposes,
we don't really know how to do that,
at least in the deep learning framework.
In the classical AI framework where we had
explicit representations of knowledge
using logic or probability theory, that could be done.
And I still think it can be done in those paradigms.
I just think we need to put in a lot more effort.
And maybe to be a bit more flexible,
I think one thing we learned from deep learning
is that there is advantage in being as it were sloppy.
So that we don't have to think,
okay, how can I learn Newton's law? Well,
I'm gonna put F and M and A together in some order
and I'll eventually find the right law, right?
That assumes that you have F and M and A
already as precisely defined concepts.
But in the learning process, you can be more sloppy.
You don't have to have exactly the precise definition of mass and exactly the precise definition of force.
You can have a kind of a sloppy definition. There's something going on about how big and heavy the thingy
is, right?
And there's something going on about how hard am I pushing it?
And there's something going on about how it's moving.
And gradually, those things gel.
And so you can simultaneously learn the knowledge
and the concepts that go into the formulation of the knowledge.
And so, I think that idea is something that we could bring back to the classical tradition and improve it.
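To make the "sloppy learning" idea concrete, here is a minimal Python sketch, not from the conversation: it recovers the structure of F = m·a from noisy measurements of rough proxies for mass, force and acceleration, rather than starting from precisely defined concepts. All variable names and numbers are invented for illustration.

# Toy illustration: discover a law like F = m * a from rough, noisy measurements,
# without assuming precise concepts up front.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
heaviness = rng.uniform(0.5, 10.0, n)                       # sloppy proxy for mass
push = rng.uniform(1.0, 50.0, n)                            # sloppy proxy for force
movement = push / heaviness * rng.lognormal(0.0, 0.05, n)   # noisy acceleration

# Fit log(push) ~ b0 + b1*log(heaviness) + b2*log(movement); the exact law gives b1 = b2 = 1.
X = np.column_stack([np.ones(n), np.log(heaviness), np.log(movement)])
coef, *_ = np.linalg.lstsq(X, np.log(push), rcond=None)
print(coef)  # roughly [0, 1, 1]: the structure F = m * a emerges from sloppy data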
What are some of the challenges around language?
So this is a second big area where I think we need breakthroughs.
So the language models that we have right now, GPT-3 and so on,
which everyone is very excited about, their job is basically to predict the next word,
and they become very, very good at predicting the next word. And
then, once they can predict the next word, they can then add that next word, and that's how
they generate text. So you can just keep repeatedly applying it and it will start spitting out things
that look like sentences and so on. But what they're really doing is predicting the next word
based on all the previous words that were in the text sequence. And it's a little bit like
astronomy in the time of Ptolemy. So Ptolemaic astronomy
was what happened before we had any idea that the planets were massive objects moving under the force of gravity.
So we plotted out the apparent motion of the planets through the heavens,
and we basically described their shape. And if you look at the shape, if you sort of plot it out over the course of a night, it's this, you know,
mostly sort of somewhat circular-looking arc, but with wiggles and spirals and so on added in,
you know, over long time scales, because of the relative motion of the planets around the sun.
And so Ptolemaic astronomy just consisted of describing those big,
complicated, wiggly, circly, spirally shapes.
And then you could gradually extrapolate them, right?
So once you understood the pattern of this big, wiggly, spirally shape,
you could then extend it. I got the shape. Now I'm just going to keep drawing it and
say, OK, well, next week, the planet should be here. And you'd be right. And so that's
sort of what's going on with these text prediction algorithms, right?
That they're taking all the previous words, which is by analogy to the positions of the planets,
and then saying, OK, I get the shape. I'm going to predict what it's going to be in two words,
three words forwards into the future. But there's no sense of why. Why is the word on the page?
The word is not on the page because the previous thousand words were on the page.
All right, that's not the real cause of why that word is there. The real cause of why that word is there is because
there's a world
outside the text, and someone is trying to say something about it, right?
And that's sort of what I call the physics of text.
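As an aside, here is a toy Python sketch of the "predict the next word, add it, repeat" loop described above. A simple bigram count table stands in for the neural network a system like GPT-3 actually uses; the tiny corpus is invented, and nothing in it models the world the text is about, which is exactly the point being made.

# Toy next-word generation: predict the next word from the previous one, append it, repeat.
import random
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the dog slept on the mat".split()

# "Train": count which word follows which.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def generate(start, length=8, seed=0):
    random.seed(seed)
    words = [start]
    for _ in range(length):
        counts = bigrams[words[-1]]
        if not counts:
            break
        nxt = random.choices(list(counts), weights=list(counts.values()))[0]
        words.append(nxt)   # append the prediction and keep going
    return " ".join(words)

print(generate("the"))  # fluent-looking text with no model of the world behind it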
There's no knowing beyond the simple output. So this is, I guess, is this similar to a philosophical zombie, in a way, that
you're able to output a thing that looks like it's a simulacrum of intelligence within a sort of narrow band, but there's nothing
going on deeper below the surface? That's one way of putting it. I'm not here talking
about whether there's real conscious understanding. We haven't got there yet. But just: does this causal model of why the text is there,
does that approximate reality in any way?
No. The reality of text is that people are trying to say things
and they're trying to say things about a world that they live in and are acting in.
So, you know, the simplest model would be they're just trying to say true things,
but actually they're trying to get something done in that world,
and part of getting something done is saying true things,
and sometimes it's saying false things, and sometimes it's asking questions.
But this, you know, you can see this connection,
you know, why the real world matters. Because statistically, the fact that there's a real world,
and we're all talking about the same real world means that every document in the world is correlated with every other statistically. So if one document says
JK Rowling wrote Harry Potter, right? And another document written in Russian,
a year later says, the author of Harry Potter is, what do you expect the next word to be? Well,
you expect it to be
JK Rowling because they're talking about the world. And even if this is a new way of saying it
and a new language, it's correlated through this common hidden variable, if you like, which is the real world. And none of that is there in current deep learning models
of language.
So I think they're fundamentally flawed.
And this is one of the reasons why they take trillions
of words of text.
I mean, they read about as much as all human beings in history have read.
Right?
And so, and they're still, they still make stupid mistakes.
They still kind of lose the plot.
One of the things you see is that because they have a, you know, they're predicting the
next word based on a relatively limited memory of the previous text. They kind of lose the plot. So as they go on,
they'll start either repeating themselves or
contradicting what they said earlier on in the text and so on.
So, you know, having said that, they do exhibit quite
impressive kind of short term recall and question answering.
And a certain amount of generalization is going on, right?
Because you can see that because you can ask them questions or you can tell them things
using a name or a place that they've never ever seen before. And then you can ask them questions about that name or that place, and they'll answer it correctly.
So there's some generalization going on, right?
They've learned a general pattern, and then they're able to instantiate the general pattern
with particular people or places or whatever it might be.
And so, you know, that's a sign that learning has happened.
But generally, we don't understand what's going on beyond that.
And so we don't know when they're just spouting gibberish, right?
You think it's answering your question and actually it's just spouting complete gibberish.
You don't know.
I suppose the challenge here is that the main way that we communicate is through language.
So if you're not a computer programmer and you wanted to have a conversation with a super
intelligent AGI home assistant, you would need to tell it what you mean.
It would not only need to be able to understand the words that came out of your mouth, but
our language, our use of language is imprecise also.
So it also needs to be able to work out what you meant to mean,
not just what you said.
Then it needs to interpret it.
Then it needs to be able to convert that into something that it can do within itself.
And then it needs to enact that.
So yeah, I mean.
Yeah, so I mean, we built systems that could do that,
even in the 60s and 70s.
And they sort of work the way you would expect.
They understood the structure of the language.
They did what's called parsing the sentences.
So figuring out what's the subject, what's the object.
And then converting that into an internal logical representation.
And then doing the reasoning necessary to answer the question,
or to add the fact to these systems internal knowledge base,
and then generating answers to questions,
and converting the answers back into language.
So that process, we've known how to do.
It's just been very difficult to make it robust, because
the variety of ways of saying things is enormous. We speak in ways that aren't grammatical,
but still perfectly comprehensible. We do things like lie, right? The last thing you want
to do is for the system to believe everything everyone says, because then it's very easy to manipulate.
So it has to understand that what's coming out of our mouths is, as Wittgenstein proposed,
a move in a game.
It's not gospel truth.
It's an action that we are taking and the action might be to try to fool you or to try
to make you do something
that you would not otherwise do or whatever.
So that level is completely not there.
You can, GPT-3 takes all text as gospel truth or whatever, right?
It doesn't distinguish between fiction and fact,
between propaganda and truth and so on.
It's all just text.
What are some of the big ways
that we could get artificial intelligence wrong?
So I think the current approach to AI,
which has been there really since the beginning
and in the book, human compatible, I call it the standard model, which is a word that
people use in physics to refer to, you know, all the laws of physics that we pretty much
agree on, right?
So in AI, the standard model has been to build machines that behave rationally and this notion of rational comes from philosophy and economics
that you take actions that can be expected to achieve your objectives.
And that goes back to Aristotle and other places.
So we took that model and we created AI systems that fit that model.
Now, with people, we have our own objectives,
so we can be rational with respect to our objectives.
Of course, a machine doesn't have objectives intrinsically.
So we put the objective in and it seems like a perfectly reasonable plan.
I get in my automated taxi, I say take me to the airport, that becomes the taxi's
objective and then it takes me to the airport, it figures out a plan to do it and does it.
Pretty much all AI systems have been built on this basis that one of the inputs that's required to the algorithm is the objective.
If it's a game playing program, the objective is checkmate or whatever it might be.
If it's a route planning algorithm, then it's the destination.
If it's a reinforcement learning algorithm, then it's the reward and punishment definition, and so on.
So, and this is a pretty common paradigm, not just in AI,
but in the control theory systems that fly our airplanes.
They minimize a cost function.
So the engineer specifies a cost function,
which penalizes deviations from the desired trajectory,
and then the algorithms will optimize a given cost function.
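A minimal Python sketch of that standard model, with invented numbers: the objective (here a cost function penalizing deviation from a desired trajectory) is handed to the machine as an input, and the machine simply optimizes whatever it was given, whether or not it is what we really wanted.

# The objective is an input; the machine optimizes it without question.
import numpy as np

desired = np.linspace(0.0, 10.0, 50)            # the desired trajectory
def cost(actions, x0=0.0):
    x, total = x0, 0.0
    for a, target in zip(actions, desired):
        x = x + a                               # trivial dynamics: position += action
        total += (x - target) ** 2 + 0.01 * a ** 2
    return total

# Crude optimization by random search; a real controller would use LQR or gradients.
rng = np.random.default_rng(1)
best, best_cost = None, float("inf")
for _ in range(2000):
    candidate = rng.normal(0.2, 0.3, size=desired.size)
    c = cost(candidate)
    if c < best_cost:
        best, best_cost = candidate, c
print(round(best_cost, 2))  # the machine pursues whatever cost we wrote down, right or wrong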
And okay, so what's the problem with that?
The problem is, as I mentioned earlier,
when you brought up King Midas, we don't know
how to specify the objective completely and correctly. And so for artificially defined
problems, like chess, chess comes with a definition of checkmate. So it's sort of fooling
us into thinking that this is an easy problem to specify the objective. But take the automated taxi, the self-driving car,
is the destination the objective? Well, that would not be because then it might drive you
there at 200 miles an hour and you come home with 50 speeding tickets, and that's if you aren't
dead. So obviously, safety is also part of the objective,
right? Okay, well fine, safety, but then how do you trade off safety and getting to the destination?
Right. If you prioritize safety above everything else, then you're never going to leave the garage
because just going out onto the road incurs some risk.
Well, okay, so then we have to put in some tradeoff
between safety and making progress.
And then you've got, you know,
obeying the laws, then you've got not pissing off
all the other drivers on the road.
Then you've got not shaking up the passenger
by starting and stopping too much.
And the list goes on and on and on.
And the self-driving car companies are
now facing this problem. And they have whole committees and they have meetings all the
time trying to figure out, okay, we get the latest data from our cars in the field and all
the mistakes they made and tweak all the objectives to get the behavior better and so on. So even for that problem, it's really hard. And if you had something like curing cancer
or fixing the carbon dioxide levels,
you can see how things go wrong.
We want to cure cancer really fast. It sounds good.
OK, great.
Well, then we'll induce tumors in the entire human population so that we can
run millions of experiments in parallel on different ways of curing them. You just don't
want to be in that situation, right? And so the answer, it seems to me, is we have to get rid of the standard model.
And so here, I mean, I wrote a textbook based on the standard model.
In fact, in many ways it made the standard model the standard model, and here am I saying, actually,
sorry, chaps, we got it all wrong, and we're going to have to rebuild the whole field.
And so you're going to get rid of this assumption that the human is going to supply the complete
fixed objective.
It's too complex.
It would be too arduous.
I'm going to guess to be able to program in plugging the little holes in the bottom of
the boat for each one of the ways that the machine could slightly go off course. So you've got safety, okay, we write the safety algorithm, okay, we've got
speed, we write the speed algorithm. I'm going to guess that the goal would be to get a more
robust sort of scalable, general solution to this that would be able to find a problem,
a solution to all potential problems that would be able to optimize the outcome across all potential challenges.
Yeah, sort of. I mean, basically you have to build machines
that know that they don't know what the objective is and act accordingly.
So what does act accordingly mean?
Well, to the extent that the machine does know the objective,
it can take actions, as long as those actions don't mess
with parts of the world where the algorithm
isn't sure about your preferences.
Right, you know, So if you have a machine that's going to try to restore carbon dioxide levels in the atmosphere
to their pre-industrial concentrations, that's a really good objective.
Well, it wouldn't be a good objective if the solution was to get rid of half the oxygen, because then we would all slowly asphyxiate.
So that would be really bad. Don't do that. What if it means turning the oceans into sulfuric acid?
Yeah, okay, don't do that. So you need the machine to actually ask permission.
Right? And it would have an incentive to do that. It knows that it doesn't know what the objective is, but
it knows that its mission is to further human objectives, whatever they are. So it has
an incentive to ask, to ask permission, to defer.
If we say, stop, that's not what I meant,
it has an incentive to obey, because it wants to avoid doing
whatever it is that violates our objectives.
And so you get these new kinds of behaviors,
the system that believes that it has the objective,
becomes a kind of religious fanatic, right?
It pays no attention when we say, you know, stop you're destroying the world, it's, I'm sorry,
I've got the objective, you know, whatever you're saying is wrong because I got the objective and
I'm pursuing it, right? We don't, you know, we don't want machines like that. So in this new model, it seems much more difficult
and in a way it is much more difficult to satisfy an objective that you don't know, right?
But it produces these behaviors, you know, asking questions, asking permission, deferring,
you know, and in the extreme case, allowing itself to be switched off. If the machine might do something really catastrophic,
then we would want to switch it off.
Now, a machine that believes that it has the correct objective is going to prevent you from switching it off,
because that would be failing.
It wouldn't achieve its objective if it gets switched off.
The machine that knows that it doesn't know what the objective is actually wants you to
switch it off, right?
Because it doesn't want to do anything sufficiently bad that you'd want to switch it off.
So it wants, it has a positive incentive to allow itself to be switched off.
And so this new model, I don't think it's perfect,
but it's a huge step beyond the way we've been thinking about AI
for the last 70 years.
And I think it's the core of a solution that will allow us, you know, not to end up like King Midas.
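A toy numerical version of that argument, loosely in the spirit of the off-switch analysis from Stuart's group; the distributions and payoffs here are invented. A robot that is uncertain about the human's utility U for its proposed action prefers to defer and let the human decide, while a robot that is certain of U gains nothing from deferring.

# The robot can act now, switch itself off, or defer and let the human decide.
import numpy as np

def best_choice(belief_over_U):
    """belief_over_U: samples representing the robot's uncertainty about the human's utility U."""
    act_now    = np.mean(belief_over_U)               # take the action regardless
    switch_off = 0.0                                  # do nothing
    # Defer: the human allows the action when U > 0, otherwise switches the robot off.
    defer = np.mean(np.maximum(belief_over_U, 0.0))
    return max([("act", act_now), ("off", switch_off), ("defer", defer)], key=lambda kv: kv[1])

rng = np.random.default_rng(0)
print(best_choice(rng.normal(0.5, 2.0, 100_000)))  # uncertain robot: deferring has the highest value
print(best_choice(np.array([0.5])))                # certain robot: deferring adds nothing over acting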
What's not perfect about it?
I think the biggest problem that I'm wrestling with right now is the fact that human objectives are actually,
should we say, plastic or malleable, right?
And you can tell that because we don't have them
when we're born, right?
When we're born, we have pretty simple objectives.
And so it's something about our culture,
maturation, et cetera, that creates adults
who have to some extent fairly definite preferences
about the future.
So the way I think about it is not asking you
to write them down, right?
Because in the end, that's really hopeless.
But if I could show you two movies of the future, future A, future B, and you could watch those and reset yourself and watch the other one
and reset yourself and then say which one do I prefer? I think that's a reasonable back-of-the-envelope description
of what we're talking about.
Everything you care about in the future.
And if the movie, if you couldn't quite tell,
whether you liked A or B, because there's some detail missing,
then you can get some more detail on those parts.
And a future where the oceans are turned into sulfuric acid and we all die of
oxygen deprivation, it's pretty clear that's not the future we prefer. So the issue with plasticity and malleability is that although I might say I like future A today,
right? Tomorrow I'm a new person and I might like future B instead, but it's too late, because
now you've stuck me into future A. And so the first problem there is, well, who do you believe, right? Do you, you know,
you're making a decision now, should I respect the preferences of the person now, or should I anticipate
how you're going to change in future and respect your future self? And I don't, you know, philosophers
haven't really given us
the good answer to that question.
So that's one part, right?
It's a deep philosophical issue.
The more problematic part is that if our preferences
can be changed, then the machine could satisfy
our preferences by changing them rather than by satisfying them.
So it could find out ways to change our preferences so that we'd be happy with whatever it was
going to do rather than it figuring out how to make us happy with the preferences that
we have. You know, and you could say, well, yeah, politicians do that and advertisers do that, right?
We don't think of that as a good thing.
And it could be, you know, with machines, it could be a much more extreme version of that.
And so, so I think of what's in the book as kind of version zero of the theory,
and version one would have to deal with this aspect.
You know, there are other difficult questions to answer.
Like, you know, obviously machines are making decisions not on behalf of one person,
but on behalf of everybody.
And how exactly do you trade off the preferences
of individuals who all have different futures that they prefer? And that's not a new question.
You know, it's thousands of years old. And I think that's, I feel that's a manageable question.
And crudely speaking, the answer is you add them up.
And that's what's called the utilitarian approach. And we associate names like Bentham and Mill
with that idea.
And more recently, Harsanyi, who was a Berkeley economics professor who
won the Nobel Prize, put a lot of utilitarianism onto an axiomatic footing.
So it's interesting actually to understand what that means, because a lot of people have an
emotional dislike of utilitarianism, partly because the word utilitarian sort of refers to
gray plastic furniture and council flats. The branding problem. Yeah, it's a branding problem exactly. It got sort of mixed up with
the wrong word.
And, you know, people complain about it not being sufficiently
egalitarian.
And people assume that it refers to money, like, you know, maximizing the amount of money in the world, and it's nothing to do with that.
But the kinds of axioms that Harsanyi proposed, when you actually think about them, probably, you know, most people would accept them as quite reasonable. So for example, suppose you've got two futures, future A and
future B and future B is exactly the same as future A except one person is happier in future
B than they were in future A. Everyone else is exactly as happy as they were before. He said, well, it seems reasonable that you'd say
future B is better than future A.
And so he has a couple of axioms like that.
And from those axioms, you can derive the utilitarian solution,
which is basically add them up.
Find whichever policy maximizes the sum total
of human happiness.
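As a tiny sketch of the "add them up" rule with invented utility numbers: future B differs from future A only in that one person is better off, and the sum-of-utilities criterion agrees with the Harsanyi-style axiom that B is the better future.

# Compare futures by the sum of individual utilities (values invented for illustration).
def total_utility(future):
    return sum(future.values())

future_a = {"ann": 5.0, "bob": 3.0, "chris": 4.0}
future_b = {"ann": 5.0, "bob": 3.5, "chris": 4.0}   # Bob happier, everyone else unchanged

better = max([future_a, future_b], key=total_utility)
print(better is future_b)  # True: the axiom and the sum agree here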
And so I think there are various difficulties involved.
So when you say the sum total of human happiness,
are you including all the people who haven't yet been born?
And if so, what about actions that affect who gets born?
Right? And that sounds like, you know, that sounds pretty weird, but actually, you know,
the Chinese government with their one child policy, right? They wiped out 500 million people with that policy.
So they, quote, killed more people
than anyone in history has ever killed,
way worse than the Holocaust,
way worse than Stalin,
by preventing them from being able to have existed.
And so preventing them from existing, it was huge.
Was it the right, was that a moral decision,
was that the correct decision?
Really hard to say.
I mean, the reason they did it was because they were afraid
that they would have mass starvation
if they had too much population growth.
And they had experienced what mass starvation was like. So, you know, it's arguable that it was a
reasonable thing to do, but it did lead to a lot of people not
existing. Right, it's really hard.
Presumably going for just raw utilitarianism has a ton of awful
externalities as well, though. Like the most happiness for the
most people. Okay, well there's two variables we can play about with there. We could just make
tons and tons and tons of people. There you go. Okay, well there we go. We've got
everyone's not that happy, but there's a lot of people, and that actually manages to make up for it.
Or yeah, yeah, I mean, this is Derek Parfit, who's a British philosopher, he has a book called
Reasons and Persons and this is one of the arguments in the book.
And he calls it the repugnant conclusion, which is that we should make basically infinitely
many people who have a barely acceptable existence. And if you watch the Avengers,
one of the Marvel, the one where Thanos
is collecting the stones of power or whatever they call it.
He's proposing one side of that philosophical argument,
which is that he should get rid of half the people in the universe, and then the rest will be more than twice as happy.
Yeah, dangerous. Dangerous using Thanos as the basis for your philosophical justification, isn't it?
Yeah, so you have to get these things right before you give him the big glove.
And that's the same question we face with AI.
But it's not as if there's an obviously better solution, right?
So alternative to utilitarianism is sometimes called the deontological or the right-based approach,
where you simply write down a bunch of rules saying, can't do this, can't do this, can't do this,
can't do this, can't do this. I have to do that, I have to do that.
The utilitarian can quite easily accommodate a lot of those rules. So if you say you can't kill people,
well, the utilitarian says, well, of course you can't kill people, because the person who gets killed,
that's not what they want the future to be like.
And so the utilitarian solution would avoid murder. And Mill goes on for pages and pages and pages.
About, well, of course, moral rules are,
I'm not throwing out moral rules.
I'm just saying that if you actually
go back to first principles, they follow from utilitarianism, but we don't
have the time and energy to go back to first principles all the time. So we write down a bunch
of moral rules. And I think there are more complicated arguments about
avoiding strategic complications when we're making decisions in society. It's much easier if there are more rules
rather than thinking all the time, okay, well if I do this and then they might do that and I might
do this and they might do that and I might do this, right, so sort of playing this complicated
chess game with eight billion people all the time. It's just easier if there are rules that everyone knows exist and will be respected.
But the interesting place is what happens in the corner cases, right?
Do we say no, the rule, no matter what the utilitarian calculation is, the rule is absolute.
And I think the answer is no.
You can start out with some easy rules.
The rule says you have to eat fish on Friday.
Well, is that an absolute rule? Well, I don't know. I mean, if there were
no fish and my child is starving and that, you know, the only thing for them to eat is
some meat. I'll give them some meat, right? So we clearly see that, you know that rules are an approximation.
And when we're in difficult corner cases, we fall back to first principles.
And so I don't see that there's the degree of conflict between utilitarian and deontological
approaches that some people see.
One of the typical arguments in utilitarianism,
against utilitarianism, sorry, would say something like,
well, with your organs, I could save five people's lives.
Your kidneys, your lungs, maybe your heart.
So I'm entitled to just go around ripping the organs out of people to save other people's
lives.
Well, of course, that's not what utilitarianism would suggest, because if that were how we
behave, life would be intolerable for everybody on Earth.
We'd be constantly looking around over our shoulders
and grabbing our kidneys.
So, you know, so it's just, you know,
so the utilitarian solution, sometimes called
rule utilitarianism, is that it's useful
to have these rules about behavior,
not just to consider the individual act,
but to consider: what if that act were allowed? What if there were a rule that you could always do that act? Then it would be terrible. So I think you can reconcile a lot of these debates.
But the examples that we've already touched on, the fact that our preference changes, the
fact that we have to consider people who don't yet exist or might not exist.
These are important unsolved questions no matter what philosophical place you come
from. It might sound like a very far-future prediction, but the user being manipulated by the machine to make their
preferences easier to predict is actually something that's already happened.
Can you take us through what social media content algorithms have done?
Sure, yeah. So the social media content algorithms, right, they decide
what you read and what you watch. And they do that for literally billions of people
for hours every day, right. So in that sense, they have more control over human cognitive input than any dictator in
history has ever had.
More than Stalin, more than Kim Il Sung, more than Hitler, they have massive power over
human beings.
And they are completely unregulated.
And people are reasonably concerned about what effect they're having.
And so what they do is basically they set an objective, because they're good standard model machine learning algorithms. And so they're set an objective, let's say maximize click-through, right?
The probability that you're going to click on the next thing.
So you can imagine, like, this is YouTube, you know, you watch your video and
lo and behold, another video pops up, right?
And am I going to watch the next video that it sends me to watch or am I going to,
you know, close the window?
And so click-through or, you know, engagement or various other metrics. These are the things that
the algorithm is trying to optimize. And I suspect originally the
companies thought, well, this is good, because, you know, it's good for us: if they click on things,
we make money.
And it's good for people, because the algorithm will learn to send people
stuff they're interested in. If they click on it, it's because they wanted to click on
it. Yeah, right. And there's no point sending them stuff that they don't like. They're just
cluttering up their input, so to speak. But, you know, I think the algorithms had other ideas.
And the way that an algorithm maximizes click-through in the long run is not just by learning
what you want, right?
Because you are not a fixed thing. And so you can get more long-run click-through
if you change the person into someone who's more
predictable, right? Who's, for example, you know, addicted to a
certain kind of violent pornography. And so YouTube can make you into that person
by gradually sending you the gateway drugs
and then more and more extreme content,
in whatever direction.
So the algorithm doesn't know that you're a human being
or that you have a brain.
As far as it's concerned, you're just a string of clicks.
Content click, content click, content click. But it wants to turn you into a string of clicks
that in the long run has more clicks and fewer non-clicks. And so it learns to change people
into more predictable, mainly, but it turns out probably more extreme, versions of themselves.
So if you indicate that you're interested in climate science, it might try to turn you
into an eco-terrorist, you know, with articles full of outrage and so on. If you're interested in cars,
it might try to turn you into someone who just watches endless, endless reruns of top
gear.
Why is the person that's more extreme more predictable? Well, I think this is an empirical hypothesis on my part, right?
If you're more extreme, you have a higher emotional response to content that affirms
your current views of the world.
And so in politics we call it red meat, right?
The kind of content that gets the base riled up
about whatever it is they're riled up about,
whether it's the environment or immigrants flooding
our shores or whatever it might be, right?
Once you get the sense that someone might be a little bit upset
about too many immigrants, then you send them stuff
about all the bad things that immigrants do,
and videos of people climbing over walls
and sneaking into beaches and all the rest of the stuff.
And human propagandists have known this forever, but historically human propagandists
could only produce one message. Whereas the content algorithms can produce, in theory, one propaganda stream for each human being, specially tailored to them.
The algorithm knows how you engage with every single piece of content. Your typical
Hitler propagandist sitting in Berlin had absolutely no idea
on a moment-to-moment basis how people were reacting to the stuff
that they were broadcasting.
They could see it in the aggregate
over longer periods of time
that certain kinds of content were effective in the aggregate,
but they didn't have anything like the degree of control
that these algorithms have.
And one of the strange things is that we actually have very little insight into what the algorithms
are actually doing.
So what I've described to you seems to be a logical consequence of how the algorithms
operate and what they're trying to maximize. But I don't have hard empirical evidence that
this is really what's happening to people because the platforms are pretty opaque.
But they're opaque to themselves.
They're opaque to themselves. So, you know, Facebook's own oversight board doesn't have access
to the algorithms and the data to see what's going on.
Who does?
I think the engineers, but their job is to maximize click-through, right?
So pretty much there isn't anyone who doesn't already have
a vested interest in this, who has access to what's happening.
And that I think is something that we're trying to fix
both at the government level.
So there's this new organization called the Global Partnership
on AI, which is, you know, it could just be,
you know, yet another do-goody talking shop, but it actually has government representatives
sitting on it. So it can make direct policy recommendations to governments. And in some
sense, it has the force of governments behind it when it's talking
to the Facebooks and Google's of the world. So we're in the process of seeing if we can
develop agreements between governments and platforms for a certain type of transparency.
So it doesn't mean looking at whatever,
looking at what Chris is watching on YouTube.
I do not want to do that.
You do not want to do that at all.
It means being able to find out how much terrorist content
is being pumped out, where is it coming from,
who is it going to?
It's slightly more sort of aggregated stuff like typical data scientists do.
Yeah, and possibly being able to do some kinds of experiments, like if the recommendation
algorithm works this way, what effects do we see on users compared to an algorithm that works
in a different way?
So, to me, that's the really interesting question is, how do the recommendation algorithms work
and what effect do they have on people?
And if we find that they really are manipulating people,
that they're sort of a consistent drift
that a person who starts in a particular place
will get driven in some direction
that they might not have wanted to be driven in,
then that's really a problem
and we have to think about different algorithms.
And so in AI, we often distinguish between reinforcement learning algorithms, which are trying
to maximize a long-term sum of rewards.
So in this case, the long-term rate of clicks on the content stream is what the algorithm
is trying to maximize.
Those kinds of algorithms, by definition, will manipulate.
Because the action that they can take is to choose a particular piece of content to send you.
And then the state of the world that they are trying to change is your brain.
And so they will learn to do it. A supervised learning algorithm
is one that's trying to get it right right now. So they are trying to
predict whether or not you're going to click on a given piece of content.
So a supervised learning algorithm that learns a good model of what you will and won't click on
could be used to decide what to send you in a way that's not based on reinforcement learning and long term maximization. But simply, OK, given a model of what you're likely to click on,
we'll send you something that's consistent with that model.
In that case, I think you could imagine
that it would work in such a way that it wouldn't move you.
It wouldn't cause you to change your preferences.
But if it was done
right, it could sort of leave you roughly where you are.
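Here is a toy simulation, with made-up dynamics, of the contrast Stuart draws: a hand-coded stand-in for the policy a reinforcement-learning recommender might learn, which nudges a simulated user toward more extreme (and hence more predictable) content, versus a supervised-style policy that only serves what the current user already likes and leaves them roughly where they started.

# Simulated user whose "interest" drifts toward whatever content they click on.
import numpy as np

def click_prob(user_interest, item):
    return 1.0 / (1.0 + np.exp(-4.0 * (1.0 - abs(user_interest - item))))

def simulate(policy, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    user = 0.1                                   # starts near the moderate middle of a 0..1 scale
    for _ in range(steps):
        item = policy(user)
        if rng.random() < click_prob(user, item):
            user = 0.9 * user + 0.1 * item       # preferences drift toward consumed content
    return user

supervised = lambda user: user                   # serve what the current user already likes
rl_like    = lambda user: min(user + 0.2, 1.0)   # nudge toward more extreme, more predictable

print(round(simulate(supervised), 2))  # user ends up roughly where they started (~0.1)
print(round(simulate(rl_like), 2))     # user has been pushed toward the extreme (~1.0)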
Are you familiar with the term audience capture? You know what this means from a creator, an
online creator's perspective? I can imagine, but not as a technical term.
Yeah, well it's not a technical term, but it's basically when you have a particular creator online
who finds a message, a narrative, a rhetoric
that resonates with the audience.
And what you see is that this particular creator becomes captured
and they start to feed their own audience a message
that they know is going to be increasingly more well-liked.
And for the most part, this actually does look like a slide toward
one particular direction or the other, at least politically it does.
But it happens with anything, too, that people inevitably sort of niche down and then they bring their
audience along with them. So the fascinating thing here, I mean, first off, it's
unbelievable that these algorithms that
are simply there to try and maximize time on site or click through or watch time or
whatever, that they have managed to find a way, things that we programmed, managed to
find a way to program us for it to be able to do its job better.
I mean, that, when I read that in your book, it's insane.
Like that's one of the most terrifying things, that it's happening, right?
It happened.
Like everybody that's listening to this has had something occur with regards to their
preferences, their worldview, whatever it might be.
Something has slid in one way or another.
You may be right, it may not be toward the extremes.
I would say anecdotally based on what I see in the world, increasing levels of partisanship,
no matter what it is, whether it be sports, politics, race relations, anything.
People are moving toward the extremes.
Why is this happening?
Oh, well, you know, it's people getting into echo chambers and they're only being shown
stuff like that.
And also the fact that the algorithms are actually trying to make them more predictable.
But on top of that as well, there's another layer, which is the creation of the content itself
that comes in from the creators. And they have their own levels of manipulation, which have
occurred from their feed; then they kind of second-order that into, what do I want to create,
what have I seen that's successful, what does my audience seem to resonate with from me?
So you have layers and layers of manipulation going on here.
Yeah, and I think in some ways the creators are being manipulated by the system.
I think every journalist now is thinking, okay, I have to get something that's clickbait.
I have to write an article that can have a headline that is sufficiently attractive that
it'll get clicked on. It's almost the point where the headline and the article are completely
divorced from each other. And you can see this now in the comments, right? The people
writing the comments at the end of the article will say, oh, I'm really pissed off. This
is just clickbait. The article really doesn't say anything about the thing you said you
were going to say. So it's not as if this has never been going on. Obviously, you can't ban people from writing interesting articles. I often think about
the novel and it says on the back, I couldn't put it down.
What should we ban?
Novels, because that's addictive.
You can't have that. No, but I think it wasn't too bad before because the feedback loop was very slow.
And there wasn't this targeting of individuals by algorithms who are,
you know, so you think about the number of learning opportunities for the algorithm, right? I mean, it's billions
every day for the YouTube selection algorithm, right? So it's the amount, the consistency,
the frequency, and the customization of the learning opportunities for manipulation are
so much greater. I mean, it's millions or billions of times greater
and more systematic. And that systematic element, so it reminds me, I don't know if it's
apocryphal, but there's a story about the psychology lecturer and he's been teaching
the students about subliminal effects. And the students decide to play a trick on him,
which is every time he's on the left-hand side of the room,
they pay attention, they're really interested.
And every time he walks onto the right-hand side of the room,
they all get really bored, start checking their email.
And by the end of the lecture, he's glued
against the left-hand side of the room. Right? And he has no idea that he's being manipulated. But because of the
fact that this was systematic and sufficiently frequent, it has a
very, very strong effect. And I think that the difference here is that it's because it's algorithmic and it's tied into this very high frequency interaction that people have with
social media. It has a huge effect and it has a pretty rapid effect as well.
What are some of the concerns you had, you mentioned earlier on about,
is it enfeeblement, becoming too enfeebled by the machines?
Yeah, so this is, I think, one of two major concerns,
if we manage to create superhuman AI and to control it,
one concern is the misuse concern.
So I call it the Dr. Evil problem.
Dr. Evil doesn't want to use the provably safe, controllable
AI.
He wants to make his own AI that's going to take over the world.
And you can imagine that gets out of control
and bad things happen.
The enfeeblement problem is sort of overuse,
that we, because we have available to us AI systems
that can run our civilization for us,
we lose the incentive to know how to run it ourselves.
And that problem, you know, it's really a really hard problem to figure out how to prevent it.
Because inevitably the AI would have to make the human do something
that probably in the moment the human didn't want to do. The AI would actually be programming
itself to be less useful than it could be in order to give us a sort of Hormesis stressor
dose that allows us to stay useful. Yeah, I mean it's so when I say overuse, I literally mean that, right, that we would
use AI too much for our own good.
And so E.M. Forster has a story. You know, he usually wrote, you know, late Victorian,
early Edwardian, you know, British upper-class social issue
kinds of novels, but he wrote this one story, and it has the internet, email,
MOOCs.
When was it written?
1909.
It has iPads, video conferencing,
you know, people are obese because they never get out of their chairs.
They're stuck on the computer all the time.
They start to eschew face-to-face contact because they are basically glued to the screen and
lose contact with the physical environment altogether. All the kinds of things that people
complain about now he wrote about.
And the machine, right?
So it's not just the internet, it's a whole system,
it's called the Machine, that looks after everyone's physical needs.
And so, so they just spend their time glued to the screen.
They don't know how the machine works anymore.
And they don't need to because the machine runs itself.
And then, of course, it stops running itself. Things go south.
But, you know, I did a little back-of-the-envelope calculations.
So it turns out that about 100 billion humans have lived.
And our civilization is passed on
by teaching the next generation of humans everything we know
and then maybe they'll learn a bit more.
Right, and if that process fails,
right, you know, if it goes into reverse,
that the next generation knows less, then you could imagine
things unraveling. So the total amount of time spent just passing on civilization is years of effort. And for that to end would be the biggest tragedy that one could possibly
imagine, right? But if we have no incentive to learn it all because finally, right, instead
of having to put it into the heads of the next generation of humans, we could just put it into the heads of the machine and then they take care of it.
Right?
And if you've seen War E, that's exactly what happens.
They even show what happens to the generations over time.
They become stupider and fatter and less capable; they can't run their own civilization anymore.
So the machines should say, because this is such an important thing, right?
It's of value to us and to our descendants that we are capable, that we are knowledgeable, that we know how to do things, that we have autonomy and intellectual vigor.
Those are really important values. So the machines should say, okay, we are going to stand back: we have to let the humans tie their own shoelaces, otherwise they'll never learn. But we, right, we are short-sighted, lazy, greedy people, and we might say, no, you have to tie our shoelaces.
We keep doing that.
And then, oh, we lose that autonomy and that intellectual vigor.
So this is a cultural problem, right?
It's a problem with us.
The technology might be saying, no, no, no, no, no, no, but we're overriding it.
All of these problems are problems with us. The problems are the fact that our goals are plastic, the fact that our language is imprecise, the fact that we are sometimes rational, sometimes irrational, that we don't have an omniscient view where we can see what we're going to want and when we're going to want it. Also, these challenges around the fact that sometimes we want something in the moment that we're not going to want in the future. And that we're going to complain at the algorithm and say, well, no, no, machine, you're supposed to be here to do my bidding, and now you're telling me that I've got to walk to the shop to get the milk. I want you to get the milk. 'Well, Sire', and I don't know why it's talking like a Middle Ages peasant now, 'well, Sire, you must go and get the milk yourself. You know that it is good for your calves and your bone density.'
So, I mean, we haven't even touched on how this even begins to be converted into computer code, which is, I imagine, a whole other complete minefield of difficulty, to actually get to what we're talking about. This is purely sort of within the realm of philosophy. What are some of the challenges that we have here, when you have an all-powerful superbeing that can do whatever you want?
Yeah, I mean, it's inevitable in a sense,
because what we've always wanted is technology that's beneficial to us.
And a lot of the time we say, oh, well, here's a technological idea. I hope it's going to be beneficial to us, like the motor car, for example. And then it turns out not to be, or arguably, although it conveyed lots of benefits, it might have ended up being our destruction. It's one of Bostrom's black balls, right, or grey balls, I suppose, out of the urn.
Yeah, so it's responsible largely for the destruction of the climate.
And so unless we get out of this, it will have been a really bad idea to do it.
But almost by definition, with something as powerful as AI, or superintelligent AI, you need to know, right: it's either going to be very beneficial or very not beneficial, right? It's not going to be like margarine versus butter or something like that. And so we have to ask ourselves, okay, what does beneficial mean? If we're going to prove a theorem that developing such technology is actually going to be beneficial, then inevitably it comes back to: what are humans and what do we want?
And how do we work? And so, yeah, so that kind of surprised me when I started along this path of trying to
solve the control problem.
You know, I had ideas for algorithms that would do it and so on, but I didn't realize the extent to which it would push me into
these humanistic questions. And that's been fascinating, and it's a little bit of a
minefield for a technologist to stray into these areas, because they operate in different ways; in many ways they're much more vicious than the technological fields, because in technology there's us humans, and then either it works or it doesn't, right, or it's a true scientific theory or it isn't.
So there's this third party called Nature out there.
But in the humanistic areas,
there isn't a third party, right?
It's just one school opposed to another school.
It's just debates all the way down, fighting it out for supremacy, and it takes a while to adjust to that.
But the questions are super important
and really fascinating.
So I've enjoyed it a lot.
So coming back to the question of the algorithms.
One way to think about it that is perhaps a little bit less daunting is to look back at what's happened in AI with respect to, shall we say, ordinary uncertainty.
So in the early days of AI, we mostly worked on problems where the rules were known, fixed and deterministic, like the rules of chess, or finding a path through a map or something like that, right?
We know what the map is, we know that if you turn left, you go left.
So we have logical rules.
We could use these deterministic symbolic techniques
to solve the problem.
And then we found that as we moved into real world,
uncertainty becomes really important.
So if you're controlling the Mars rover, and you give it some command to, you know, go 70 meters in a particular direction (because it takes 10 minutes for the commands to go back and forth), is it going to get there? Well, you don't know. It might get stuck, or it might, you know, deviate a little bit, or one wheel will start spinning and it won't make any
progress. Who knows? So real-world problems, you always have to handle the uncertainty in your knowledge of the physics
of the world and in your, even just what your senses are telling you, right, that their
senses themselves are imperfect and noisy and incomplete. So uncertainty became a core consideration in AI around the early 1980s, I would say. And so that period from the late 80s to the early 2000s was really the period where probability was the dominant paradigm for AI research.
But in all that time,
it does not seem to have occurred to anyone, except for a few,
I think, very bright people,
that there is also uncertainty in the objective.
So we have all these problem formulations for decision-making under uncertainty, but they assume that you know the objective exactly and perfectly. And at least looking back at that now, it's like, well, that's bonkers, right? It's just as bonkers as assuming that you know the physics of the world exactly and perfectly, or that your senses give you exact and perfect access to the state of the world at all times.
Because we had already seen many examples of objective failure, where we specified the wrong objective and the machine did something that we thought was completely bonkers, but in fact it was doing exactly what we told it to do. We just didn't realize what we had told it.
And my favorite example is one from simulated evolution. So in simulated evolution, you define a fitness landscape by which simulated creatures are considered to be more fit or less fit; the fitter they are, the more they get to reproduce and mutate, and gradually you can evolve creatures that are really, really good at whatever it is you want them to be good at.
So, the objective was, well, what they wanted was to evolve creatures that could run really fast. So they specified the objective as the maximum velocity of the center of mass of the creature. And what evolved were creatures a hundred miles high that would then fall over. And in falling, they went really, really fast. So they won the competition, right? They turned out to be the solution to that problem.
Someone thought they were going to get some supercharged nitro cheetahs or leopards or something.
Exactly.
Instead, you end up with trees reaching up into the stratosphere and then falling all over the place.
Yeah.
So, I thought that was a great example.
And of course that's all a simulation in the lab.
So people go, ho ho ho, and then they fix the problem.
But of course, in the real world, right, in your climate engineering system or your, you know, your economic governor or whatever it might be, you can't just go ho ho ho and fix it and press the reset button.
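To make the simulated-evolution example concrete, here is a minimal toy sketch. The two-gene "creature", the crude physics and the optimizer are all invented simplifications, not the original experiment; the point is only that if fitness is defined as the peak speed of the centre of mass, selection drifts towards ever-taller creatures that topple over rather than towards better runners.

```python
import random

# Toy "creature": two genes, height (m) and leg power (0..1). Invented simplification
# to illustrate the objective failure described above, not the original study.
def fitness(height, leg_power):
    running_speed = 2.0 * leg_power            # peak centre-of-mass speed from running
    falling_speed = (9.8 * height) ** 0.5      # peak centre-of-mass speed from toppling over
    # The specified objective: maximum velocity of the centre of mass, however achieved.
    return max(running_speed, falling_speed)

def evolve(generations=200, pop_size=50):
    pop = [(random.uniform(0.5, 2.0), random.random()) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda genes: fitness(*genes), reverse=True)
        parents = pop[:pop_size // 5]          # keep the fittest 20%
        pop = [(max(0.1, h + random.gauss(0, 0.2)),                    # mutate height
                min(1.0, max(0.0, p + random.gauss(0, 0.05))))         # mutate leg power
               for h, p in random.choices(parents, k=pop_size)]
    best = max(pop, key=lambda genes: fitness(*genes))
    return best, fitness(*best)

if __name__ == "__main__":
    (height, leg_power), speed = evolve()
    print(f"best creature: height={height:.1f} m, leg_power={leg_power:.2f}, "
          f"peak speed={speed:.1f} m/s")
    # The winning "runner" is just a very tall tower that falls over fast,
    # which is the letter-of-the-objective solution the anecdote describes.
```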
Brian Christian told me about a problem with robots playing football. They'd put a very small utility reward in for gaining control of the ball, because possession is an instrumental goal towards scoring: you can't score if you don't have the ball. And what the robot found was that it could actually maximize its utility function by going up to the ball and vibrating its paddle a hundred times a second against it, which was just far easier than actually trying to score. It ended up thinking it had done really great, and the guys just had these sort of seizure-like robots all over the pitch, vibrating up against the ball.
Yeah, exactly. Which is sort of what happens with little kids sometimes, right?
They want to get the ball, they want to get the ball, but they don't know what to do with it once they've got it.
Yeah, yeah.
But anyway, yeah, I mean, some of these problems have technical solutions. For that particular one, there's a theorem about what kind of supplemental rewards you can provide that are intended to be helpful but will end up not changing the final behavior. So it'll make it easier to learn the final behavior, but it will ensure that the final behavior is still optimal according to the original objective.
So you can fix that aspect of the problem.
But if you leave something out of the objective, you've left it out, and that's equivalent to saying it's worth zero.
Right, so anything you leave out of the objective is like saying it has value zero to humans, and that's, you know, a big problem.
So you almost always want to say, well, I told you some of the things I care about, but there's other stuff I haven't told you, and you should be aware of that.
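The theorem alluded to a moment ago is, I believe, the potential-based reward shaping result (Ng, Harada and Russell, 1999): a supplemental reward of the form F = gamma * Phi(s') - Phi(s) can make a task easier to learn without changing which policies are optimal. A minimal sketch on a toy problem, with the environment, potential function and hyperparameters all invented for illustration:

```python
import random

# Tiny 1-D gridworld: start at state 0; reaching state N pays reward 1, everything else 0.
N, GAMMA, ALPHA, EPISODES = 6, 0.9, 0.5, 3000
ACTIONS = (-1, +1)

def phi(s):
    # Potential: higher (less negative) closer to the goal, zero at the goal.
    # The policy-invariance guarantee holds for any potential; only learning speed changes.
    return (s - N) / N

def step(s, a):
    s2 = min(max(s + a, 0), N)
    return s2, (1.0 if s2 == N else 0.0), s2 == N

def q_learn(shaped):
    Q = {(s, a): 0.0 for s in range(N + 1) for a in ACTIONS}
    for _ in range(EPISODES):
        s, done = 0, False
        while not done:
            if random.random() < 0.2:                       # epsilon-greedy exploration
                a = random.choice(ACTIONS)
            else:
                best = max(Q[s, act] for act in ACTIONS)
                a = random.choice([act for act in ACTIONS if Q[s, act] == best])
            s2, r, done = step(s, a)
            if shaped:
                # Supplemental reward F = gamma*phi(s') - phi(s): intended to be helpful,
                # and provably leaves the optimal policy of the original task unchanged.
                r += GAMMA * phi(s2) - phi(s)
            target = r + (0.0 if done else GAMMA * max(Q[s2, act] for act in ACTIONS))
            Q[s, a] += ALPHA * (target - Q[s, a])
            s = s2
    return {s: max(ACTIONS, key=lambda act: Q[s, act]) for s in range(N)}

if __name__ == "__main__":
    print("greedy policy without shaping:", q_learn(False))
    print("greedy policy with shaping:   ", q_learn(True))   # same policy either way
```

What shaping does not protect against is the other failure just described: anything left out of the objective is treated as worth exactly zero.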
It feels to me like there's sort of two cars in this race.
On one side, we have technological development
that has lots of facets, hardware,
algorithms, so on and so forth.
And then on the other side, you have the control problem. You have getting the alignment right. It has to be that the alignment
problem gets across the finish line before the technology does or else we are
rolling, essentially rolling the dice and hoping that we've got it right by some sort of fluke. And I imagine that there are far more ways of getting it wrong than there are of getting it right.
Yeah, well, I mean, getting it wrong
is actually the default, right?
If we just continue pushing on the AI technology
in the standard model, we'll get it wrong.
So is there any chance, if you continue with the standard model, that it could be right, or would you give it such a low chance that it's negligible?
I think it's negligible.
I think arguably what's happening with social media is an example of getting it wrong.
Other people have pointed out that we don't need to wait for AI: corporations that are maximizing quarterly discounted profit streams are also examples of machines pursuing incorrectly defined objectives that are destroying the world.
If you look at the climate issue from that point of view, I find it sort of enlightening, right? We have been
outwitted by this AI system called the fossil fuel industry, right? It happens to have human
components, but the way corporations are designed, right, they are machines with human components.
And actually the individual preferences of the humans in those machines don't really matter very much.
Because the machine is designed to maximize profit.
And they outwitted the human race.
For more than 50 years, they've been running a global propaganda and subversion campaign to enable them to keep selling fossil fuels.
And they won. We can all say we were right, that we knew we shouldn't be doing this, that we know the causes of climate change, but we lost.
There are a lot fewer implications of that, though, than of an all-knowing, all-powerful artificial intelligence.
So although the implications are still grave if the climate problems get worse, it's not the same. And again, the control problem simply has to get across the line, if you're essentially adamant that currently, if you scale up the competence (probably not the power, I suppose) of the computation that we have, it's a bad situation.
But obviously you have the reversibility, right?
I mean, climate change probably isn't gonna make us extinct,
unless there's some real chaos theory catastrophe that happens.
And eventually we'll all be so fed up
that we actually retake control from the fossil fuel industry.
And that's sort of happening.
But, yeah, with AI, it could be irreversible.
There's loss of control.
And, you know, if I'm right that examples like social media are showing that we are already seeing the negative consequences of incorrectly defined objectives, and of even relatively weak machine learning algorithms that are pursuing them,
then we should pay attention.
These are the canaries in the coal mine. We should be saying, okay, we need to slow down,
and we need to look at this different paradigm.
And the standard model is sort of just one corner. It's the corner where the objective is completely and perfectly known; at least, that's the corner where it's appropriate to use the standard model, right? And there's all the rest of the building we haven't even looked at yet, right? Where there's uncertainty about what the objective is, and the system can behave accordingly. And only just in the last few years have we had any algorithms that can solve this new category of problem. I mean, so the algorithms exist, right?
They're very simple and they work in very restricted and simple instances of the problem,
but they show the right properties, that they defer, they ask permission, they understand
what the human is trying to teach them about human preferences. And it seems our job in what's called, for want of a better word, the AI safety community,
our job is to build out that technology
to create all the new algorithms and theoretical frameworks
and demonstration systems and so on to convince
the rest of the AI community that this is the right way to do AI.
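To give a flavour of what these new-model algorithms look like, here is a deliberately tiny sketch, not any particular published system: the machine holds samples representing its uncertainty about how much the human values an action, and deferring (asking first) becomes the rational choice precisely when that uncertainty spans both good and bad outcomes. The numbers and the decision rule are invented for illustration.

```python
from statistics import mean

def decide(utility_samples, cost_of_asking=0.05):
    """Choose among acting, asking the human first, or doing nothing, given
    samples that represent the machine's uncertainty about the human's utility."""
    act_value = mean(utility_samples)  # expected utility if it just acts
    # If it asks first, the human approves only when the true utility is positive,
    # so the machine keeps the upside and avoids the downside, minus a small cost.
    ask_value = mean([max(u, 0.0) for u in utility_samples]) - cost_of_asking
    options = [("act", act_value), ("ask first", ask_value), ("do nothing", 0.0)]
    return max(options, key=lambda kv: kv[1])

# Confident the action is good: just acting wins (narrowly) over asking.
print(decide([0.9, 1.0, 1.1, 0.8]))    # ('act', 0.95)
# Genuinely unsure whether the action helps or harms: asking dominates.
print(decide([2.0, -1.5, 0.5, -2.0]))  # ('ask first', 0.575)
```

An agent that is certain of its objective never has a reason to defer; the incentive to ask, and to allow itself to be corrected, comes directly from the uncertainty.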
Because we can't do the other thing in the race. We can't slow down the technological progress, because trying to neuter one particular agent or actor or country or nation state, or even one group of nations, doesn't guarantee that some other group is not going to carry on. China saying, right, we'll stop, doesn't mean that America won't say, well, we're just going to keep going, or vice versa.
Yeah, well, I mean, the potential upside that people are seeing is so huge, right? I mean, when I say it will be the biggest event in human history, I mean it. Why? Because, you know, our advantage as humans, our whole civilization, is based on a certain level of intelligence that we have.
So, we, our brains, are the source of the intelligence fuel that makes our civilization
go around.
If we have access to a lot more, all of a sudden, right?
That is a step change in our civilization.
And on the upside, it might enable us to solve problems
that have been very resistant, like disease, poverty,
conflict.
And on the downside, it might just be the last thing we do.
So...
If you had a God's eye view, would you put a pause on technological development, outside of the control problem, for a hundred years, for a thousand years, for 50,000 years? Because we've spoken about the dangers of killing people that haven't yet been born. And when you're talking about civilizational potential, the observable universe, you know, galactic-sized empires, von Neumann probes making everything, you're talking trillions and trillions of human lives. Even if you come at it from the utilitarian approach, you have an almost unlimited amount of utility and happiness that could be gained. And because we're unable at the moment to slow down technology, potentially within the next hundred years all of that could be snuffed out.
Yeah, it's interesting. In many of the works, I think in Bostrom's and in Max Tegmark's and others,
the argument is based on these quintillions of future humans who might be able to colonize the universe and so on.
That's never been a motivation for me.
If I have a picture, it's just of a small village with a church and people playing cricket on the village green. I don't want that to disappear.
I don't want the civilization that we have to be gone,
because it's the only thing that has value.
I try not to think about what I'd do if I was God, as you say. It's not...
Not good for the ego.
Well, I just...
I mean, obviously, I don't...
No one, I think, is going to be able to switch off scientific progress. You know, there are precedents: the biologists switched off progress on direct modification of the human genome in ways that are what they call heritable modifications, germline editing. They switched that off. They said, you know, in 1975, or gradually from then onwards, they decided that that was not something they wanted to do.
Which is interesting, because for a large part of the history of genetics and that whole branch of biology, the improvement of the human stock was actually one of the major objectives. And eugenics before the Second World War thought of itself as a noble mission. You could argue about that, but for the biologists to say, you know what, we could do this, but we're not: that was a big step. And is it possible for that to happen in AI? I think it's much more difficult, because in biology, we are continuing to understand the developmental biology, right? So how does a given DNA sequence produce an organism, right?
And what goes wrong? And, you know, is it a problem with genes or a problem with the
development environment of the organism or what? And if you understand all those questions,
then presumably you could, you could then say, okay, now I know how to modify the human genome so we can avoid all those problems. So the scientific
knowledge is moving ahead, but the decision is we're not going to use that knowledge for
that kind of thing. And you can draw that boundary pretty easily, because, you know, we're talking about physical procedures involving actual human beings and so on, and that's been regulated for many decades already. Whereas with AI, once you understand how to do something, it's pretty much done.
Right? There, mathematics and code are just two sides of the same coin. And, you know, with code and mathematical ideas, you can't go around looking at everyone's whiteboard and saying, okay, I see you've got, you know, sigma for x equals 1 to... okay, stop right there, that's one too many Greek symbols, you've got to stop writing, right?
You know, so, because the questions of, you know, decision-making and learning and so on, these are fundamental questions, we can't stop research on them.
So I have to assume that the scientific understanding of how to do it is just going to continue
to grow. If it was the case, which some people seem to think, that to go from a scientific understanding to a deployed system would require some massive gigawatt installation with billions of GPUs and so on and so forth, then perhaps you could regulate at that point, because there would be a physical limitation that would be quite easy to enforce: okay, you can't have this much power, this many... I'm going to guess that you feel otherwise, that you don't need that much hardware to run something that could be quite dangerous.
Correct. Yeah, I think we already have enough power, as I said. And it's very hard to do meaningful calculations, but just in terms of raw numerical operations per second,
a Google TPU pod, which is the tensor processing unit, you know, even three or four years ago, was
operating at a higher rate than the possible theoretical maximum capacity of the human
brain.
Right, so a ballpark figure for the human brain is 10 to the 17 operations per second, but I don't think any neuroscientists believe that we're doing anything like that much, right? I mean, they would probably ballpark it at 10 to the 12 or 10 to the 13 or something like that. But even if you grant every possibility, it's 10 to the 17, whereas, you know, a TPU pod, which is, you know, a sort of wardrobe-sized thing, is at 10 to the 17, and the biggest supercomputer is at 10 to the 18.
So, you know, I think we have way more than enough power to build a super intelligent
machine.
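For what it's worth, the back-of-the-envelope comparison can be written out directly; the figures below are just the ballpark orders of magnitude quoted in the conversation, not measurements.

```python
# Ballpark figures quoted above (orders of magnitude only).
brain_generous_max = 1e17   # generous upper bound on human brain operations per second
brain_plausible    = 1e13   # roughly what neuroscientists might actually ballpark
tpu_pod            = 1e17   # a wardrobe-sized TPU pod, a few years ago
top_supercomputer  = 1e18   # the biggest supercomputer

print(f"TPU pod vs generous brain bound:   {tpu_pod / brain_generous_max:.0f}x")
print(f"TPU pod vs plausible brain figure: {tpu_pod / brain_plausible:,.0f}x")
print(f"Supercomputer vs generous bound:   {top_supercomputer / brain_generous_max:.0f}x")
```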
So I just don't think that trying to cut it off at the, you know, large-scale hardware installation level is going to be feasible either.
Anyway, you know, if you remember the old Apple ads for the G5. The US had put export controls on anything that was more than one gigaflop, right, which sounds ridiculous now, but they did it because they didn't want those falling into the hands of the Russians or the Chinese.
So Apple produced its ad with a little G5, this little cube, and they had all these tanks surrounding this little G5, this little G5 that the US government has decided is too hot to handle. Yeah, exactly.
Right, so they used it as advertising material.
So it's just unlikely that you could prevent the creation of super intelligent AI just by regulating hardware installations.
So I do think of it, as you say, as a race. I think we may see catastrophes that are more obvious and unequivocal than what's happening in social media. And, you know, that could happen on a small scale in self-driving cars.
You know, when the first Tesla completely failed to see a huge white truck and crashed straight into it at full speed, I thought, you know, that kind of accident should at least make people say, okay, maybe the AI systems are not as good as we thought they were. But it didn't seem to have much impact.
And, you know, we've killed several more people in pretty much the same way.
So it would have to be, I think, something pretty major.
We say that, but I've been thinking over the last 16 months that COVID should have been the biggest wake-up call for synbio, for natural pandemics, for engineered pandemics, for research into anything on that side of the aisle, for whatever it is, BSL-3 or BSL-4 labs. They should all be on the moon. They should all be on the bottom of the ocean. We should be air-gapped from them.
And no one's talking about that. Rob Reid's talking about it, and that's it. Like, there's no one. No one's bothered.
I don't know. I have heard some biologists talking about re-evaluating: after this global pandemic, maybe we should have a meeting, you know, have a cup of tea or something.
But I just think humans, because life's so comfortable at the moment
for us mostly, and because we have attached our sense of well-being to the progress of technology,
I think everybody is praying at the altar of that currently, and the presumption is always more
technology is good. There may be some hiccups along the way, but mostly we'll be able to fix it.
And if we can't, we'll make a technology that can fix it, that will be enabled
by the technology that was wrong.
But there is a, as you say, a step change.
You know, it's not just a change in degree,
it's a change of kind.
When we reach this particular level,
recursive self-improvement, blah, blah, blah, game over.
Yeah, well, I think arguably the same thing is going on in biology, whether it's germline
modification or synthesis of pathogens.
And interestingly, for DNA synthesis, there is a response. It's not widely publicized, but all the manufacturers of DNA synthesis equipment are now building in circuitry, which is non-public, black-box circuitry, that checks the sequences to make sure you're not synthesizing any disease organism.
And there, I think there's even a notification requirement.
So, if you try, someone will be knocking on your door very quickly.
So that's, I think, a very sensible precaution. And even so, right, there's a movement within synthetic biology, a sort of libertarian movement, the garage movement, saying we should be able to synthesize whatever we want. It's our right, you know, it's scientific freedom.
I'm sorry, I think you're nuts, right? Scientific freedom is a value, but it doesn't trump all other values, including continued human existence.
So eventually, I think computer science and AI have got to accept the same thing. Right? We've now reached a point where we're able to have a really significant effect on society, for good or bad. And so it's time to grow up. It's time to accept that you have responsibilities, and that society has a right to control what effect you have.
That's a whole other conversation, when we think about the current push towards libertarianism on the internet: Web 3.0, decentralized, you can't stop me, you can't stop my money, I can be wherever I want, do whatever I want, with whoever I want. I think there's going to be a serious sort of cultural conflict there. One of the reasons that you wrote Human Compatible was as a wake-up call to people within the sort of AI research and safety community. We're approaching, what, almost exactly two years since it was published. What has the effect of the book, and your subsequent work and press, been? Has it had anything close to the impact that you intended? Has it had any other sorts of impacts?
That's a very good question. And honestly, I don't know the answer.
There's certainly more academic interest in these topics.
You know, we have a center at Berkeley, and the number of people coming to the workshops has been increasing rapidly. The workshops that we hold at the main conferences are growing really fast, hundreds and hundreds of people. I often get emails from all kinds of AI
researchers who say, you know, okay, I agree with everything you were saying in the book, how can I redirect my research to be helpful in this?
I believe based on various sort of grapevine dribs
and drabs that the questions of control
have filtered up to the highest levels of government in various countries.
That this is now one of the risks that's considered when people take stock of what do we need
to pay attention to over the horizon.
This is one of the risks.
But on the other hand, if I was just a mid-level AI manager in a technology corporation, would I change what I'm doing? Well, my recommendation is always: look at the objective that you're specifying for your algorithms, look at the actions that your algorithm could take, and ask, could any of the effects of your algorithm go outside the scope that you are thinking of for your objective? Have you thought about all the possible consequences and whether they would be desirable? That should be something that we do as a matter of course, and I would say the new EU AI regulations actually instantiate some of that. So that's reasonably good.
But, you know, I can go and get TensorFlow, I can go and download reinforcement learning libraries. There's tons of software in the standard model, but there's almost nothing in the new model. There's just tons of research to do: literally every chapter of the textbook has to be rewritten from scratch. And you can't rewrite it until someone's done all the work
to create the new algorithms and the new theorems
and even to figure out the right way to formulate the problems.
So that's something that I have to do.
And my students have to do.
And the other groups around the world that are working on this.
And the sooner we get that done, the better, right?
These will be better AI systems. It's not a matter of saying, okay, you're over there doing those traditional AI systems, you're bad, bad, bad, and you need to fix your approach.
But that's never been very effective as a way to get people to change their
behavior. You want them to get up in the morning and say, I'm going to build a really good AI
system today, but where what that means is one that is beneficial to human beings. Just like a civil engineer who designs bridges gets up every morning and says, I'm going to build the best bridge I can. And what does best bridge mean? It's come to mean bridges that don't fall down. And I was reading the memoir of the guy who ran the Russian bioweapons
program. And it was clear that he got up every morning and said, I'm going to do the best science I
can today.
What did best mean for him?
It means, you know, making anthrax more fatal, you know, and making infectious diseases
more infectious, right?
That's what best meant for him.
So you can affect what people do by affecting what they think of as good science or good
engineering.
And to me, it stands to reason that it's not good engineering if it does things that make you very, very unhappy.
So, we need to change the way people think about what is good AI and then give them the tools to make it.
And time's running out.
Yep.
Stuart Russell, ladies and gentlemen,
thank you very much for joining me today.
If people want to check out your stuff, where should they go?
So the book, Human Compatible, is available,
including as an audio book with a very plummy accent, not mine.
So it's pretty good.
My webpage, you can Google me,
my webpage has all the publications.
The Center for Human Compatible AI
has a bunch of resources.
And then there are other groups,
such as the Future of Humanity Institute at Oxford,
the Centre for the Study of Existential Risk at Cambridge, and the Future of Life Institute at MIT and Harvard.
So there's a bunch of groups around the world, and we're working as hard as we can to train students and get them out into all the universities, and then teach the next generation how to go forward.
Good luck. Thank you very much.
Very nice talking to you, Chris.