Modern Wisdom - #297 - Brian Christian - The Alignment Problem: AI's Scary Challenge
Episode Date: March 20, 2021
Brian Christian is a programmer, researcher and an author. You have a computer system, you want it to do X, you give it a set of examples and you say "do that" - what could go wrong? Well, lots apparently, and the implications are pretty scary. Expect to learn why it's so hard to code an artificial intelligence to do what we actually want it to, how a robot cheated at the game of football, why human biases can be absorbed by AI systems, the most effective way to teach machines to learn, the danger if we don't get the alignment problem fixed and much more...
Sponsors: Get 20% discount on the highest quality CBD Products from Pure Sport at https://puresportcbd.com/modernwisdom (use code: MW20) Get perfect teeth 70% cheaper than other invisible aligners from DW Aligners at http://dwaligners.co.uk/modernwisdom
Extra Stuff: Buy The Alignment Problem - https://amzn.to/3ty6po7 Follow Brian on Twitter - https://twitter.com/brianchristian Get my free Ultimate Life Hacks List to 10x your daily productivity → https://chriswillx.com/lifehacks/ To support me on Patreon (thank you): https://www.patreon.com/modernwisdom - Get in touch. Join the discussion with me and other like minded listeners in the episode comments on the MW YouTube Channel or message me... Instagram: https://www.instagram.com/chriswillx Twitter: https://www.twitter.com/chriswillx YouTube: https://www.youtube.com/ModernWisdomPodcast Email: https://www.chriswillx.com/contact Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
Hello friends, welcome back. My guest today is Brian Christian and we're talking about AI's scary challenge: the alignment problem.
Let's say that you have a computer system and you want it to do X.
You give it a set of examples and you say, well, do that. What could go wrong? Well, lots, apparently. And the implications are quite terrifying. So today, expect to learn why it's so hard to code an artificial intelligence
to do what we actually want it to, how a robot cheated at the game of football, how human
biases can be absorbed by AI systems, the most effective way to teach machines to learn,
the danger if we don't get the alignment problem fixed, and much more. This is a topic I think everybody should be
educated on. As the world becomes increasingly dependent on artificial intelligence, you need to
understand the implications if we have a civilization built upon machine learning when we don't know
that the machines are actually going to be aligned with what we want them to do. So today, hopefully your curiosity will
be satisfied and your pants might be pooped a little bit, because it's quite, quite grave and
quite scary if we don't get this right. But now it's time to learn about the alignment
problem with Brian Christian.
What does the quote "premature optimization is the root of all evil" mean?
So this line comes from Donald Knuth, who is one of the,
I think of him as kind of like the Yoda of computer science,
just dispensing these gems of wisdom.
And there are many, I think, like many aphorisms, you can take it in a number of different directions.
One of the ways that I think about it is,
you know, a lot of the way that we make progress
in math and computer science is through models.
You make a model that sort of approximates the phenomenon
that you're trying to deal with.
There's a great quote from Peter Norvig, another one of these luminaries in computer science.
He's quoting someone from NASA saying, our job was not to land on Mars.
It was to land on the mathematical model of Mars, provided to us by the geologists.
So this idea that premature optimization is the root of all evil,
I think if you kind of mistake the map for the territory, so to speak,
if you forget that there's a gap between your model and what the reality actually is,
then you can commit yourself to a set of assumptions that are later going to bite you.
And so this is the sort of thing that people who are worried about AI safety, you
know, this is what keeps them up at night. What is the alignment problem? That's
what we're going to be talking about today. We might as well define our terms.
Yeah, so the alignment problem is this idea in AI and machine learning of the potential gap between what your intention is when you build an AI system or a machine learning system and the actual objective that the system has.
So it's the potential misalignment, so to speak, between your intention, your expectation, how you want the system to behave,
and what that system ultimately ends up doing.
Why does it matter?
I mean, this is a fear that has existed in computer science
going back to at least 1960.
So Norbert Wiener, the MIT cyberneticist,
was writing about this.
And, you know, he says, if we use, to achieve some purpose,
a mechanical agency that we can't interfere with once we've
started it, then we had better be quite sure that the purpose
that we put into the machine is the thing that we really
want.
And I think for a lot of people, increasingly since like 2014, it has become
more and more mainstream within the computer science community itself to think of this
as one of the most significant challenges facing the field as we sort of enter this era of AI: that we may develop systems where we, with the best of intentions, try to encode
some objective into the system.
The system with the best of intentions attempts
to do what it thinks we want,
but there's some fundamental misalignment
and that results in whatever the harm may be,
whether it's dark-skinned people not getting recognized by a facial recognition system, or disparities in the way that parole is being dealt with, up to societal-level harms.
It could be self-driving cars that fail to recognize jaywalkers, and so they kill anyone who's crossing in the middle of the street because there were no jaywalkers in their training
data.
All the way through to some of the so-called existential risks, the idea that we may
actually throw society as a whole off the rails by some system with enough power to shape the course of human civilization, but without
the appropriate wisdom to know exactly what to be doing.
Paperclips.
Everybody turns into paperclips.
Exactly.
Yeah, so the paperclip maximizer is kind of the classic thought experiment that goes back
to Nick Bostrom and Eliezer Yudkowsky.
And I think there's, to my mind,
there's kind of a significant culture shift
that happens within the field of AI around 2015.
And that is that, in some ways,
we no longer need the paperclip maximizer thought experiment
because we have this like growing file folder
of actual real world alignment problems gone wrong, you
know, whether it's social media, you know, we optimized our newsfeed for engagement and
it turns out that, you know, radicalization and polarization is highly engaging. So,
you know, we paperclip ourselves.
This is this quote from Norbert Wiener in your book, which I really thought encapsulated
what we're talking about nicely.
In the past, a partial and inadequate view of human purpose has been relatively innocuous,
only because it has been accompanied by technical limitations.
Human incompetence has shielded us from the full destructive impact of human folly.
Awesome.
Just frames so much of what we're talking about.
So basically, is it a balancing act between technological capability and technological wisdom?
Largely speaking, I would say yes.
And Wiener used the phrase know-how versus know-what.
And increasingly, we're seeing this as a kind of a paradigm shift in the field of computer
science, broadly speaking.
So the standard AI textbook that is used across the world, it's called Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig,
and they've just now released their updated fourth edition.
And one of the things that we're seeing in the new edition
is kind of this focus shifting from,
you have some objective, what is the Swiss Army knife of tools in your toolkit
to optimize that objective?
We're now shifting to, let's take for granted that you can optimize whatever objective you
have.
How do we figure out what is the right objective to actually encapsulate all of the things
that you really want?
And so, I mean, part of what I think is very significant is that we are starting to
turn this question of know-what, in Wiener's terms, into one of the central pillars of what it means to do AI research at this point: how can we find mechanisms for essentially importing all of these complicated human desires, human norms, into the language of optimization? So that's the optimistic side of the story.
That's the hope is that we can actually make a science out of the sort of the know what part.
Yeah, well, I mean, surely we don't even know what we want as humans right now. Anyway, I mean, the fields of ethics and morality are still contested from human to human, so how are we going to code a poorly defined concept into a language that machines can read?
Well, unfortunately, you know, it's happening; whether we're doing it well or not, we are doing it.
And, you know, you see this at, you know,
at YouTube, at Twitter, at Facebook,
there is this process of reading user behavior
and trying to figure out from the stuff people click on,
the stuff people engage with, what it is they appear to want.
And I think people underestimate the sophistication of these systems.
Like we have these intuitions that, okay, you know, Facebook or Twitter, whatever is tracking
everything I click on.
Oh, that's the tip of the iceberg.
They're keeping track of how many milliseconds
is a particular ad on screen.
So even if you're just scrolling through
and you hesitate ever so slightly to read a piece of text
or look more closely and then move on, they know that.
And as you say, we have a very impoverished sense of how to map human behavior to human
desire or human value.
And there's often this tension, right?
You know, Daniel Kahneman tells us that we have these system one and system two, and they're
often at odds with each other.
You know, there's a reason that supermarkets put the candy right next to the register so
that you don't have time to second guess yourself. I think there's a genuine challenge for
social media companies, which is how do you, even if you're acting with good faith, even
if all you want to do is what users will want, how do you distill some notion of what they want out of their behavior?
And one example that I give in the book,
someone I know is an alcoholic, has an alcohol addiction.
And social media companies have found out
that if you show images of alcohol in an ad,
this person will linger. And that creates this horrible feedback loop.
And so I think there really is this question, which is, you know, to borrow a phrase from
Nick Bostrom, this is philosophy on a deadline. There are these open questions in not just ethical philosophy, but cognitive science,
neuroscience even.
But we don't have time. In a way, we don't have time to wait for the answer because
these companies are just going. And so we're going to have to try to essentially fix the plane mid-flight.
Yeah, I mean, I read Superintelligence, which is kind of, I guess, the seminal book on this, at least it was in the mid-2010s. And anyone, anyone who really wants to kind of get a good overview, I think that's a fantastic place to go. Stuart Russell's Human Compatible as well, his new one, is awesome. And then if you want to terrify yourself about everything else as well as AI, Toby Ord's The Precipice.
Like that's my perfect three book garage
for existential risk right there.
But upon reading that, that really kind of opened my eyes
to just how big the potential dangers
are that we're playing around with here. I struggle to feel optimistic
and I don't know whether that is because the scary news stories make the headlines or
because AI programmers tend to, er... Now, it would appear that from 2014, I think you talk about the difference one year makes in going to a conference: someone brought up AI security one year and was kind of laughed out of the room, brought up AI security the next year and nobody raised an eyebrow. So there really was a very rapid pivot that I think Nick Bostrom
can probably take a good amount of credit for. I agree, and Stuart, yes.
Yes.
I wonder whether the AI researchers are now so cataclysmic, so existentially aware that
perhaps they're only showing us the terrifying stuff. I don't know. Yeah, I mean, I think it's worth making a little bit of a distinction between the actual AI
safety research community, which is a small subset of AI researchers themselves. And even
people who self-identify as AI researchers, there was a much larger community of data scientists that do the more sort
of day-to-day machine learning work at actual companies. So it may be the case that, you
know, the negative stories are coming to the fore within the group of people that's kind
of committed to working on these
things. And I think that's a good thing. I mean, part of the story that I try to tell in this book is
the, you know, the birth of that community, really the beginning. I sort of think of books like
Superintelligence as a kind of cosmic, you know, breaking the glass and pulling the fire alarm. And so, to continue that analogy,
I think about the beginning of this AI safety research
movement as the first responders have arrived at the scene.
And so I think there's a constructive story
to be told about we're actually getting things done.
We're meeting that challenge.
There is a question of how does that fit into kind of the broader culture of AI?
I think a lot of people are just in some ways more heads down,
just trying to solve actual business problems.
And so there's a question of how do we import some of the insights that we have been learning
in the safety community into the more sort of user-facing stuff.
So I think that's part of the process as well.
Have you got some good examples that can explain to the listeners how alignment works or how it doesn't work?
Yeah, I guess those are two different questions. I mean how it doesn't work
There are many ways, right? So there's this question of
You know, you have a system, you want it to do X, you give it a set of examples and you say, you know, do that, do this kind of thing. What could go wrong? Well, there's this laundry list of things that could go wrong. So one thing is the examples that you gave it, there's
some kind of fundamental mismatch with the reality. And so one of the things that you hear
about a lot is kind of racial demographic mismatch.
So there's a facial recognition data set
developed in the 2000s that was built
by scraping images from newspapers online.
And so the composition of faces in the data set mirrors
the types of people that appeared in kind of first world, you know, North American,
you know, European newspapers in the 2000s.
So the most prevalent person in this data set is George W. Bush, who is the US president
during that time.
And so it turns out, for example, there are twice as many pictures of George W. Bush as
there are of all black women combined.
But in the real world, there are not twice as many George W. Bush as all black women combined. And so you have this fundamental mismatch between the set of examples that you've used and the actual kind of reality. And so this is something that
the technical name for this would be robustness
to distributional shift. So is your system capable of handling the fact that the examples
that it encounters out in the real world are coming from some different distribution than
what it learned on, you know, when you were training it. So that's a fundamental question is just what is the kind of data provenance?
What examples went in and does that match the environment that you're going to deploy
it in?
And there are many, many examples of this.
I think the racial bias stuff is obviously made headlines, but there are subtler examples.
For example, there's a Google system where, upon inspection, it turned out that the color red was intrinsic to its classification of something as a fire truck. And most fire trucks in the US are red; in the UK, I think they're also red. In Australia, they're white and neon yellow.
And so that model would not be safe to deploy in your self-driving car
if you were in Canberra or something.
So that's, I think, a very fundamental category of thing: the examples that you've given it, do they match the sorts of things it's going to see in the real world?
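As a rough illustration of that data-provenance check, here is a minimal hypothetical sketch in Python (category names and counts are invented, not from any system Brian describes): compare the make-up of the training data against a sample of what the deployed system actually sees, and flag categories whose share has shifted.

```python
from collections import Counter

# Hypothetical sketch of checking "robustness to distributional shift".
# Category names and counts are invented for illustration only.
train_labels = ["red_fire_truck"] * 950 + ["white_fire_truck"] * 50   # training data
field_labels = ["red_fire_truck"] * 200 + ["white_fire_truck"] * 800  # what the deployed system sees

def proportions(labels):
    """Return each category's share of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

train_p = proportions(train_labels)
field_p = proportions(field_labels)

# Flag any category whose share in deployment differs a lot from training.
for category in sorted(set(train_p) | set(field_p)):
    gap = abs(train_p.get(category, 0.0) - field_p.get(category, 0.0))
    if gap > 0.2:
        print(f"possible shift in '{category}': "
              f"train={train_p.get(category, 0.0):.2f}, field={field_p.get(category, 0.0):.2f}")
```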
And the second large category is what's called the objective function.
So every machine learning system has some numerical specification of what it is that you
want the system to do.
And often things can go very subtly but importantly wrong in how you've specified what you want the system to do.
So one example of this, one of my favorite examples comes from this robotic soccer competition that was being held in the 1990s.
And this team from Stanford, including Astro Teller, who's now the head of Google X,
they decided they would give their robotic soccer team
this little tiny incentive for taking possession of the ball.
So the overall goal was like to score points and win the game,
but as kind of an incremental incentive,
they were awarded the equivalent of like a hundredth of a goal for taking possession of the ball because they thought that would
incentivize the right sort of strategy.
You can't score until you have the ball, et cetera.
But what their robots learned to do was to just approach the ball and then vibrate their
paddle as quickly as possible, taking possession of the soccer
ball a hundred times a second. And this was much easier than actually scoring points.
So there are many, I mean, this is just one example, but it turns out that actually trying
to specify numerically exactly what you want this program to do is extremely difficult
to the point that it's kind of increasingly considered just unsafe to ever attempt to
do that.
And then we can get into what it looks like to not specify it.
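A toy sketch of that possession-bonus failure (all numbers are invented for illustration, this is not the actual Stanford system): give one point per goal plus a hundredth of a point per possession event, and a "vibrate next to the ball" policy beats a policy that actually plays.

```python
# Toy illustration of reward mis-specification (all numbers invented):
# reward = goals + 0.01 * possession_events, over one 90-second stretch.

GOAL_REWARD = 1.0
POSSESSION_BONUS = 0.01
SECONDS = 90

def total_reward(goals, possession_events):
    return GOAL_REWARD * goals + POSSESSION_BONUS * possession_events

# Policy A: actually plays soccer -- takes possession a few times and scores once.
play_soccer = total_reward(goals=1, possession_events=5)

# Policy B: parks next to the ball and vibrates its paddle,
# registering "possession" 100 times per second and never scoring.
vibrate_paddle = total_reward(goals=0, possession_events=100 * SECONDS)

print(f"play soccer:    {play_soccer:.2f}")     # 1.05
print(f"vibrate paddle: {vibrate_paddle:.2f}")  # 90.00 -- the shaping bonus dominates
```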
Yeah.
So the overarching theme that I felt on going through your book is, it feels to me like, do you remember that advert of the guy using this real hardcore-strength gaffer tape to stop a flood coming out of the side of a water tank? It was like from the 90s, he's got like a big bit of black tape and this hole in the side of a water tank and he's like, look how strong this gaffer tape is. It feels to me like a lot of the incentive coding structure is that guy
sticking waterproof tape on all the little holes. Surely it's not going to be scalable to try and predict every different
permutation of what's potentially going to go wrong.
Increasingly, I think the field is coming up to exactly the view that you're...
They should have just come and had a chat with the guy with the tape on the thing,
and he would have told them that.
That's right.
Yeah, well, we're maybe a couple decades late to that,
to the party, but we're coming around.
So yeah, there has been, I would say,
something of a revolution within computer science.
And I think Stuart Russell gets a lot of credit for this.
He developed this
technique around the turn of the millennium called inverse reinforcement learning. So reinforcement
learning is what I was talking about with soccer where you have some kind of goal, which is known
as the reward function, that kind of doles out points to your system. And then it's the systems job
to optimize its behavior to get as many points as it possibly
can. That's reinforcement learning. So inverse reinforcement
learning goes the other direction. It says we're going to
observe some expert behaving. So we're gonna watch a soccer player
or we're gonna watch someone play chess
or whatever it might be.
And figure out what the score of the game must be.
If this person's an expert player,
then we're gonna try to work backwards
from their behavior
to the rules of the game and what the point system is.
And the basic idea here is that it offers us, perhaps, hopefully, fingers crossed.
This paradigm offers us something that's going to be more robust as we develop systems
with kind of flexible capabilities in real world environments, that they can just kind of
observe human behavior and try to work backwards from that to an actual numerical specification
of what we care about essentially, rather than us having to somehow write it all down
on paper ourselves. There's a number of hurdles that we need to overcome. You need to first off write down
what we want. We also need to then translate that into code that can be understood by the
machines. But first, we actually need to know what we want. And a lot of the time, what
we think we want might not actually be correct. We might not understand the externalities
of asking for the thing that we want, even if the machine achieves it perfectly. And even if there are no machine-side malignant side effects or externalities in how it's done, we could still just specify the wrong goal.
Or we might not understand what that goal would be. So yeah, it certainly seems like
trying to bypass our idiocy by using the outcomes that we end up at is a fairly clever way to go about it.
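For a flavour of what "working backwards from behaviour to the point system" can look like, here is a deliberately minimal sketch (not Stuart Russell's actual algorithm; the data and the feature-matching step are invented assumptions for illustration): suppose the hidden reward is a weighted sum of features, and pick weights that make the expert's average feature counts score better than a random baseline's.

```python
import numpy as np

# Toy inverse reinforcement learning sketch (hypothetical, illustrative only).
# Assume each state is described by a feature vector, and the unknown reward
# is linear in those features: r(s) = w . phi(s). We observe expert trajectories
# and pick w so the expert's accumulated features look better than random ones.

rng = np.random.default_rng(0)

n_features = 4
true_w = np.array([1.0, -0.5, 0.0, 0.2])   # the hidden "point system" we try to recover

def feature_counts(trajectory):
    """Sum of feature vectors along one trajectory."""
    return np.sum(trajectory, axis=0)

# Simulated data: expert trajectories lean toward high-true-reward features, random ones don't.
expert_trajs = [rng.normal(loc=true_w, scale=0.3, size=(20, n_features)) for _ in range(50)]
random_trajs = [rng.normal(loc=0.0,    scale=0.3, size=(20, n_features)) for _ in range(50)]

mu_expert = np.mean([feature_counts(t) for t in expert_trajs], axis=0)
mu_random = np.mean([feature_counts(t) for t in random_trajs], axis=0)

# Simplest possible "inverse" step (a feature-matching idea): reward weights that
# separate the expert's feature counts from the baseline's.
w_hat = mu_expert - mu_random
w_hat /= np.linalg.norm(w_hat)

print("recovered reward direction:", np.round(w_hat, 2))
print("true reward direction:     ", np.round(true_w / np.linalg.norm(true_w), 2))
```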
What does fairness have to do with the alignment problem? I thought this was quite interesting. It was something I hadn't come up against before.
Yeah. So there's, there have been historically a number of ideas within computer science that have been referred to as fairness.
You get, for example, fair allocations in game theory, you get fair scheduling on an operating system where every program gets a certain amount of time to run.
But starting in, let's say, 2010, the field of computer science really became preoccupied with a notion of fairness
that took into account something closer to our kind of ethical or legal notion of fairness,
which is to say like, are different groups of people affected differently by a machine
learning system?
So the canonical example of this is pre-trial detention.
So in the US, if you're arrested, you have this arraignment hearing before a judge, and they set the date of your trial. The trial could be weeks away, months away.
And then there is this very specific decision
that gets made, which is, are you going to be held in jail
before your trial? Or are you going to be released
to go home before your trial? And this is where bail cash bail ends up getting involved in certain
states. Increasingly, throughout the last couple decades, but really accelerating in the last five years or so,
states have been using these algorithmic risk assessments that just give a score like one
to ten.
How risky is this person if we release them back into society pending their trial?
These have been used by many jurisdictions, there are states passing laws, mandating the use of these sorts of things.
And so there's been a lot of scrutiny
on are these models fair?
And there's, you know, in the US,
we have civil rights legislation going back to the 60s
and 70s that articulate certain legal definitions
of fairness, but it's not necessarily obvious how those actually
apply to a statistical instrument.
And so there's been a bit of a controversy around this particular tool called COMPAS,
which is just one of many of these pre-trial risk assessment tools.
So it turns out that COMPAS is what's called calibrated, which means that if you're given an 8 out of 10 risk score, then you have the same probability, it turns out, of being re-arrested whether you're white or black. And this is kind of the canonical definition of, quote unquote, fairness that's been used for many decades. But increasingly people are looking at these alternative definitions of things like,
okay, well, if you look at the people for whom the model makes an error, does it make the
same kinds of errors, or are they different in some way?
And you see, for example, that the black defendants who are miscategorized by the model are, two to one relative to white defendants, more likely to be judged riskier than they really are. Whereas, conversely, the white defendants who are miscategorized are two to one more likely to be judged as less risky than they really are. And so people are saying, okay, well, this feels like sort of a disparate impact that we
would ideally like to mitigate as well if we want the system to be fair.
Along come the computer scientists and say, well, it turns out actually that it's mathematically
impossible to satisfy both of those definitions of fairness at the same time. And so this is one of these cases
where human intuitions kind of run into these technical challenges.
And we need essentially a kind of public policy conversation
around, okay, well, when these things that, you know, seem equally desirable can't be mutually satisfied, who decides what the priority should be? So this ends up being a very complicated kind of policy slash legal slash computational
question, as you can imagine. It gets pretty intricate.
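To make the two definitions concrete, here is a hedged little simulation (entirely made-up data, not COMPAS): the score is calibrated by construction, so a given score means the same re-arrest rate in both groups, yet because the groups' score distributions differ, the false positive rates come out unequal, which is the tension being described.

```python
import numpy as np

# Hypothetical sketch of calibration vs. error-rate balance on invented data.
rng = np.random.default_rng(1)

def simulate_group(n, score_probs):
    """Made-up risk scores (1-10) and outcomes; the outcome rate is score/10,
    so the score is calibrated by construction in every group."""
    scores = rng.choice(np.arange(1, 11), size=n, p=score_probs)
    reoffend = rng.random(n) < scores / 10.0
    return scores, reoffend

# Group A skews toward higher scores than group B (purely invented numbers).
p_a = np.array([1, 1, 1, 1, 1, 2, 2, 3, 4, 4], dtype=float); p_a /= p_a.sum()
p_b = np.array([4, 4, 3, 2, 2, 1, 1, 1, 1, 1], dtype=float); p_b /= p_b.sum()

score_a, y_a = simulate_group(50_000, p_a)
score_b, y_b = simulate_group(50_000, p_b)

def calibration_at(scores, y, s):
    """P(re-arrest | score == s)."""
    return y[scores == s].mean()

def false_positive_rate(scores, y, threshold=7):
    """Share of people who do NOT reoffend but are flagged as high risk anyway."""
    innocent = ~y
    return (scores[innocent] >= threshold).mean()

print("P(reoffend | score=8): A =", round(calibration_at(score_a, y_a, 8), 2),
      " B =", round(calibration_at(score_b, y_b, 8), 2))        # roughly equal: calibrated
print("false positive rate:   A =", round(false_positive_rate(score_a, y_a), 2),
      " B =", round(false_positive_rate(score_b, y_b), 2))      # unequal: the tension
```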
Why were the black defendants... were the computer scientists able to look at what was occurring that meant that the black defendants were being judged as riskier, or were higher in terms
of the error rate? Yeah, I mean, there's a couple components. So for one, the model actually makes three different predictions that sometimes
get conflated. One of the predictions is the person's risk of failing to appear for their
trial date. One is their risk of non-violent offense and the other is the risk of violent offense. And one of the important things to note is that these are three fairly different predictions
in terms of our ability to actually observe the thing that we are trying to measure.
So if you don't appear in court, the government basically, by definition, knows that you
didn't appear in court.
So it's essentially a perfectly observable event.
Now, the model is attempting to predict whether you will commit a crime,
but we don't know whether someone commits a crime.
We only know whether the person is arrested and convicted,
which may or may not mean that they committed a crime.
And it's interesting because for the research of this book, I ended up digging into the now almost
a hundred-year-long history of these sorts of models that goes back to Chicago in the 1920s was
the first one. And to my surprise, in the 30s, it was conservatives making this argument of like, now wait a minute,
some guy goes on a crime spree, but he doesn't get caught. And as far as the model is concerned,
he is a perfect citizen. That doesn't seem right, because now the model is going to recommend that
we release more people like that. These days, you're more likely to see the critique coming from
progressives that say, now wait a minute, someone's gotten wrongfully arrested and wrongfully convicted. Now the model thinks they're a criminal and
it's going to recommend detaining more people like that. And so it's funny to, you know,
historically the kind of prevailing political valence of the critique has flipped, but it's
the same underlying critique, which is that we can't actually measure the thing that we're attempting to
predict, which is crime. And so if there are kind of striking disparities in the way that we
actually observe crime, then that is going to essentially filter downstream into the model. So for example, in Manhattan,
the last statistics that I heard was something like black and white Manhattanites, self-report
marijuana use at about the same level. But black Manhattanites are 15 times more likely to be arrested for marijuana possession.
And so, here there's a huge gap between what we can
sensibly predict and what we can actually measure.
And so, this has led, I mean, there's a lot to unpack in this area,
but this has led people, for example,
to make the argument that you should essentially trust the failure-to-appear prediction, but not trust the nonviolent offense prediction, because we can observe the one much more accurately than the other.
So that's a little bit of a flavor of how some of these
imbalances and how crime is actually like
observed by the police then filters down into the model's assessment of someone's quote-unquote
risk.
Can we talk about neural networks, what some of the problems are that you find with neural
networks and why it's so hard to get an explanation out of a black box?
Yeah, so part of the reason that you and I are having this conversation now and that, you
know, we've seen books like Nick's and Stuart's and Toby's in the last, you know, eight years
or so, is because of the rise of deep neural networks, which really kicked off in 2012. And so, I mean, it's ironic because the neural network is one of the oldest ideas that anyone
had in computing. It predates the stored program computer, which I think is kind of amazing.
So it is about as old an idea as anyone's ever had in computer science and AI, but it
was not until 2012 that it actually started to work.
So there's a very fascinating history there.
But basically, neural networks have come to dominate
all of the previous approaches that people were using
in the 2000s, and have just kind of swept through
computer vision, computational linguistics, speech to text processing,
machine translation, reinforcement learning.
You name it, there has been kind of just this
successive series of kind of discontinuous breakthroughs
as a result of neural networks.
They are famous, they have a reputation
for being kind of inscrutable and uninterpretable.
And so a big frontier in AI safety research is finding ways to essentially pop the hood,
so to speak, and figure out what in the heck is going on inside of a neural network.
And I think that's maybe one of the more encouraging stories in AI alignment research because we're making more headway than honestly I expected.
What's the problem? Why can't you, you've got a thing, you've got this neural network, it does a thing.
Why can't it tell you why it did it? Well, it's sort of like a problem of information overload.
So let me use as an example the kind of flagship first real success story in deep learning, which is this image recognition system called AlexNet that was used in 2012. So AlexNet was designed to take in an image. And I'm trying to remember what the dimensions of the image were.
It could have been like 100 pixels by 100 pixels or something like this.
Ultimately, somehow output one out of a thousand different categorizations.
Is this a truck? Is this a kitten? Is this a, you know, sandy beach? What is this?
And so, you know, at the simplest level, you just have pixels in and categorization out.
And so, what's going on in the middle? Well, 100 by 100 pixels is 10,000 pixels, and it's RGB, so you have a total of 30,000 inputs
that represent this picture.
And they're just encoded as numbers
from like one to 255, or from zero to 255,
for how much red is in location X,
how much blue is in location Y?
And this goes into a network of about 600,000
of these artificial neurons.
And the artificial neuron is extremely simple.
It just takes in these little inputs, and it adds them up.
And it says, is the sum of these inputs
above a certain threshold?
And if it does, then it will sort of pass along
some number as an output to the next neurons in the chain.
So it's very interpretable in the sense that you can look
at an individual neuron and say, like, okay,
what were its inputs?
Okay, it's getting a 10 here, a 5 there,
it's adding them up, it's greater than something,
so it's then outputting or whatever.
But it's very, and so you multiply that by 600,000 neurons in, you know, 12 layers or whatever,
and there's a total of like 60 million connections between all the different ones.
And at the end, you get a number between one and a thousand that tells you truck, you know,
barbecue grill, whatever.
The question is, it's not a problem.
You know exactly what happens, so to speak, in that it's just adding numbers and comparing
them to thresholds.
But what in the hell does the neuron in layer five, you know, getting an eight,
a one, a 25, adding that up, and then outputting a two. Like, what does that mean? And so that's
really the question. It's a system that's kind of perfectly describable in detail, but it's like being given an atomic description of what's going
on in someone's brain when they laugh and then trying to figure out, well, what makes something
funny? Well, let's look at this hydrogen atom over here. It's just sort of like the wrong
level of description. And so that is kind of the fundamental problem that we have with neural networks.
Wasn't there an interesting implication around GDPR?
Yeah, so the European Union had this draft version
of the GDPR bill that was circulating around 2016 or so.
And it had this language in it,
which kind of raised the eyebrows of these two
researchers at Oxford, Bryce Goodman and Seth Flaxman. And they said, now wait a minute,
this draft version of the bill appears to create this legal concept that everyone has a right
to an explanation of if you're affected by an algorithmic decision, if you're denied
a mortgage or you don't
get a credit card that you apply for or whatever, you are entitled to know why. And yet,
it was widely understood that you couldn't obtain an answer like that from a deep neural network.
And so this created something of a, I don't know, a panic between the legal departments of tech companies and the engineering departments. And I remember hearing a lawyer
for one of the big tech companies, I won't say which one, talking about meeting with
EU regulators and saying, now you realize that you're putting into law something which is like scientifically impossible.
And the regulators were sort of unmoved and they said, well, that's why you have, you
know, it doesn't go into effect for two years.
So, you know, figure it out.
I think this is a very, you know, it's, we think of regulation as almost by definition
stifling innovation, but here is a case where the regulators demanded something
that the scientists then were given a two-year deadline
of like figure it out.
And so suddenly there was this huge wave
of money and research attention going into this problem.
But it's still, I mean, there are a lot of really,
I think, promising techniques,
but in terms of this
question of what is the explanation for why something happened in a neural network, it's
not even clear what is legally sufficient to please the EU, let alone what is kind of
standard practice at this point.
So it's still a bit of an unresolved question even now. I was listening to one of the engineers
behind the YouTube algorithm talking,
and then Lex Fridman must have seen the same video clip
that I did, and he brought it up on a show.
And he was talking about just how terrifyingly little
YouTube knows about their own algorithm now that this thing is just a runaway reinforcement monster that is optimizing and doing things, but to
be honest, they kind of don't really know what's going on.
Can you just speak to that?
How can it be that programmers that make a thing
after it's been left to run for a little while,
no longer know what it's doing essentially?
Yeah, and I've heard this from people
high up in the engineering works of these companies saying,
yeah, we have no idea what it's doing,
but it's making so much money that we can't turn it off.
This is how horror movies begin, right?
So, I mean, in some ways, that's the point
of neural networks was precisely that
they could do the things that we couldn't articulate in code,
like writing, if you think about it as a contrast
to writing sort of traditional software
where you sit down and you type, you know,
if X then go to line 12 or whatever,
that sort of canonical style of programming,
there's a whole set of things that that couldn't do.
Namely, the things that we didn't know how to explain
our own thought process.
So it's really good for sort of mimicking your explicit deliberate thought
process, but it's really hard for doing kind of sense perception or motor skills
or things like that that don't have like an explicit reasoning you can step
through. But the dark side of that is that you don't understand how the computer is doing it either.
And so, yeah, there's been a lot of, I don't know, certainly I hear a lot of hand-wringing from
people at tech companies, let's say, like, okay, we just pipe in all the possible data that we have
about this person, their browsing history, their credit cards,
everything they've ever clicked on, whatever it might be, into this thing, and it just spits out,
you know, show them this thing and not that thing. And like, when we don't really know
how that, you know, is being arrived at, but we know that when we do it, we make more money than when we don't do it.
And so there's this weird kind of like, well, let's just let it rip.
So that, I mean, that's exactly the kind of thing that people who are worried about AI safety are worried about, right? Well, the problem here, and the best podcast I think I've heard on this was Rob Reid from After On. He had Naval Ravikant on and they were talking about privatized gains versus
socialized losses. And that paradigm, so essentially that if you are a private company
who has one of these ridiculous algos that is able to just, it's a money machine and you run it and it's able to show the perfect
advert to the perfect person at the perfect time, or it can create the best sandwich or it
makes the amazing computer game or does whatever.
But risks potentially turning the entire world into paper clips, you are privatizing all
of the potential gains, but the entire world is risking all of the losses. It's what you
get when you have a shared commons as well. It's why people in some developing countries
are slightly less concerned about polluting the atmosphere because they only pollute
a bit of the atmosphere, but they get all of the profit. And when you hear about these things, man, like, you know, you
don't think that Susan Wojcicki is trying to cause the downfall of human
civilization, but I do think that she probably wants to maximize watch time. And
sadly, these two things, as the power continues to increase within the algorithms and the computing
space, these two things are going to start to converge more and more.
Yes.
And I think in some ways, I think the alignment problem is bigger than AI, that it is really
a description of what's going wrong in capitalism, in global governance,
that there are a number of situations, and again, this is not just an AI thing, where someone
defines some metric that sort of kind of encapsulates what we want. At the end of the day, YouTube,
or Netflix, or whatever, doesn't really care about how much
you watch. They just care about how much money they can make and they minimize churn and maximize
user retention blah blah blah. And someone says, well, watch time seems to be mostly correlated
with all that stuff. And it's a lot easier to, you know, operationalize. So let's just, for the sake of argument, maximize watch
time. Or, you know, at Tinder, there was this long period of time, as far as I know, years,
where the metric that their engineers were asked to optimize was swipes per week. And this
is basically the alignment problem, like full stop.
We come up with some reward function that kind of sort of contains what we're trying
to do, but not entirely.
And then we optimize the dickens out of that specification beyond the point at which it
correlates with the thing we really care about.
And so, you know, this goes back to where we started our conversation,
premature optimization is the root of all evil.
At what point, how long did it take for someone at Tinder
to say, are swipes per week really what we're about?
Like, is that really the top of the pyramid of metrics
that we're trying to achieve here?
Same thing with watch time, right? Like, for me, I don't know if it's the case anymore, but for a long time Netflix was explicitly maximizing
watch time. And there was some quote, if I'm recalling this correctly, where they said,
like, we're competing against, you know, playing sports, we're competing against reading
a book, we're competing against like talking to your kids.
And it just sounded like horrible.
But how long does it take before someone starts to realize like, oh, maybe that's not
the metric that we're like going for.
And I think the same thing is happening in society at the highest level, right?
We've been maximizing GDP per capita, quarterly returns, you name it, while creating these
socialized externalities called climate change, called the increasing Gini coefficient,
et cetera, et cetera. And so, now the question is whether to be sort of extra pessimistic or extra optimistic by thinking
about this as not really an AI problem per se, right? Because I'm maybe more confident that we can
solve this at a technical level than I am that we can sort of change global governance
or reform capitalism in some like very macro way. On the other hand,
there is maybe a glimmer of hope that some of these techniques that people are developing in the AI context, things like inverse reinforcement learning, might actually be useful to tech companies for a start and even something like a national government,
where they say, instead of manually designing some objective function about what we're trying to do,
we'll do the inverse reinforcement learning thing where we will just
present real people with kind of different scenarios and ask them to pick. Which of these, which of these newspaper front pages from the year 2030 seems to portray a better world?
And then we'll somehow try to back the metrics out of that rather than having to come up
with the metrics ourselves.
So tech companies are starting to do these sorts of things.
And, you know, to me, that's the glimmer of hope: maybe we can get ourselves to sort of unwind the tyranny of these KPIs that are sort of like controlling everything about society
at the highest level.
Yeah.
There's something a little bit more holistic, yeah.
It really does seem like you pick a metric
as an engineer, as a company
that you think is the closest approximation
to what you deem success for the particular company that you're running. There is a reward function that's given to the algorithm for meeting that particular criterion. But that criterion might not be the best way to get that outcome. So let's say that it's time on site. There's nobody that I know that wishes they'd spent more time on the phone. There's nobody that I know that looks at Instagram for two hours and retrospectively says that
was a good use of my time.
But you could imagine in another world that Instagram actually provided an experience which
kept people on site for two hours and retrospectively they were happy that they'd been on for two
hours.
Now, currently they're not.
And I don't know if Instagram could ever achieve that
with the particular platform,
but we can imagine some other sort of app that could.
If it was able to optimize itself in a way
where it was actually achieving the outcome
that made people want to use the app
rather than just race to the bottom of the brainstem and manipulate them in ways that almost forced them to use the app. You would still get the same outcome that you
wanted, which was particularly screen time, but all of the root
causes of how you arrive at the screen time have been changed.
And we would probably mostly be able to agree that that's
to the advantage of the wellbeing of the user who is doing it. Yeah, yeah.
So, I mean, there's always been this tension
between the copious data that's easy to collect.
We can measure every click.
We can now measure the milliseconds of every item
on the screen being on the screen.
And then there's this data that's really hard to collect, which is these qualitative
judgments of asking people like how happy are you, how satisfied are you.
Maybe you go to someone a week later or a year later.
And so you can't directly optimize for these things in tight feedback loops because it's
too hard to
get the feedback or you have to wait a year to get the feedback.
And so it can't be part of your actual kind of day to day iterative loop.
But what you can do is try to use something like inverse reinforcement learning, or other sorts of causal models, to figure out how the data that you can observe might
predict these sorts of scarce, expensive, long-term things that you really want.
And rather than just directly optimizing the feedback that you have on hand, do this more
indirect thing of like, we need to model how the stuff we can observe
affects these things downstream and try to optimize for those things downstream
by way of these proxies.
I think that is starting to happen.
It's not quite as simple as just waving the magic wand, but I think that's exactly the
kind of approach that we need.
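A minimal sketch of that idea, under invented assumptions (the signals, the numbers, and the negative effect of binge-length watch time are all made up): fit a simple model from the cheap signals you can log every day to the scarce, delayed satisfaction signal, then rank candidates by predicted satisfaction instead of raw watch time.

```python
import numpy as np
from numpy.linalg import lstsq

# Hypothetical sketch: learn how cheap proxies (watch time, clicks) relate to a
# scarce signal (a satisfaction survey answered weeks later), then optimize the
# predicted long-term signal rather than the proxy itself.
rng = np.random.default_rng(0)

n = 2000
watch_time = rng.exponential(30, n)                 # minutes, easy to log
clicks     = rng.poisson(3, n).astype(float)
# Invented ground truth: satisfaction rises with clicks but falls with binge-length
# watch time -- so optimizing watch time alone points the wrong way.
satisfaction = 0.5 * clicks - 0.02 * watch_time + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), watch_time, clicks])
coef, *_ = lstsq(X, satisfaction, rcond=None)       # simple linear proxy model

def predicted_satisfaction(watch_minutes, click_count):
    return coef[0] + coef[1] * watch_minutes + coef[2] * click_count

# Two hypothetical recommendation candidates:
print("3-hour binge item :", round(predicted_satisfaction(180, 2), 2))
print("20-minute item    :", round(predicted_satisfaction(20, 4), 2))
# The proxy model prefers the shorter item, even though it "loses" on watch time.
```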
Is this going to be able to be achieved without some sort of systemic change in terms of
governance, in terms of policy, in terms of the way that we step into the algorithms,
the power of the algorithms, the amount of computing power, the amount of transparency
that these companies can see into their black boxes, or maybe even the way that they generate
their revenue. Because right now, from where I'm sitting, it seems like tech companies
are able to make more money than ever before using increasingly advanced neural networks and algorithms, but
that is being done at the expense of a lot of other things that we probably don't want
to depreciate anymore.
It feels to me as if we are at a little bit of a precipice, Toby, or perhaps like the apex of a curve where you, if we were
to push much further, I think that you would start losing, we may have already gone past it,
but I think that we actually start to lose so much of civilization that the advantages that
we begin to get from technology are negative utility.
I agree with you.
And I think that may be true, even if you are
in a completely self-interested position at a tech company.
For example, Google and Facebook have now
relative to 10 years ago lost so much of the public's good will
that when the Department of Justice swings the antitrust
hammer, you know, are people going to cheer or are they going to protest, right? And that makes
a difference. Like, so goodwill may seem like this kind of gossamer or, you know, ineffable thing, but it manifests.
Cancel culture, yeah.
Yeah, it may shatter your entire corporation if you piss enough people off.
So what do we do?
What's the fix?
How do we fix it?
I mean, I do think at some level there is this kind of room for these more technical solutions.
And so, my feet are planted more squarely in the research community. And so I can see the development of some of these things
and how, for example, at Berkeley,
where I'm affiliated,
a lot of the PhD students in technical AI safety
have started doing summer internships at tech companies.
And I think that's very interesting.
It's sort of a marker of the kind of maturity
of technical AI safety in just a few years
that some of the stuff that was on whiteboards in 2017, 2018
is now getting actually mocked up as this kind of MVP thing
at an actual tech company.
So that seems good.
I think one of the major forces that holds a lot of
tech companies in check is the fact of how much power the actual employees themselves have.
Currently, machine learning is in such high demand that good machine learning engineers have a ton of leverage. And they're able to use that leverage to kind of convince companies to do certain benevolent
things like publish, publish results openly, you know, in public journals or publish things
directly onto the web or release open source.
Things that companies wouldn't necessarily be otherwise disposed to do has now become
sort of part of the norm of the field.
And that's not coming from competitive pressure necessarily.
It's not coming from regulators.
It's coming from just the consciences of the individual engineers and the fact that they
have leverage over their employers because they're in this like extremely high demand category. Now, it may not always be the case because as the ranks
of machine learning engineers grow, because this field is so
in demand, then the individual bargaining power of those employees is going to go down.
So we're going to lose a little bit of that leverage.
Will some of it come from regulators? I assume so,
but it is not clear to me what shape that regulation is actually going to take.
I mean, I think some kind of citizen participation, you know, like I wouldn't mind something, you know, I say in the book, I think that it's reasonable to imagine
that we have some rights to know what the model is
that these companies have of us.
And to have some kind of direct influence over that, right?
So, you're starting to see this a little bit
with alcohol, which was the example I mentioned earlier,
where some tech companies now actually have like a toggle somewhere deep into the preferences that says like, never
show me alcohol. That's like the tip of the iceberg, but you can imagine a way in which
you can have some control over which version of yourself is being marketed to, right? You can say like, yes, it turns out that I,
you know, when I'm in the checkout aisle,
I put a bunch of candy on my thing,
but I want to not be that person.
So please give me the checkout aisle that has, you know,
for your vegetables, yeah, precisely.
And I, you know, in theory, that should be a win-win.
But we'll see, you know, in practice,
it's not totally clear that they can present
their model of you in a way that makes sense to you
or in a way that's like directly manipulable.
Yeah. There are questions. The main thing, and I think that this kind of cuts to the heart
of the discussion around the alignment problem right now, is that the general tenor and tone and feeling of users towards tech companies and of us towards ourselves and of our relationship with technology, probably in the space of the last six years
to seven years has changed very much from technology being a tool that we use to technology
being a tool that uses us.
And I think, yeah.
It just feels to me like that's the...
Like, we made these tools in an effort to make life more entertaining and rich and all of these.
And man, I've had hours, tens of hours of conversations
on this show through this microphone,
talking about what more connected
than ever but have never felt more alone.
What does it mean for young children
to be spending time looking at screens?
Like one of the fucking iron prescriptions
that I come up with is that you need to spend more time
looking at the night sky.
Like in what worlds should I be prescribing the night sky as like an antidote to the way that you exist?
And yet, I am.
So, I think it's just a comment on that.
Yeah, go ahead.
No, I can't resist mentioning that the night sky itself is this externality of, you know, Starlink. That's to say nothing of urban light pollution, et cetera. But, you know, part of the business model of Starlink is to put a ton of satellites into space that may or may not mess up our view of the night sky, but it's not Starlink's problem.
And so yeah, that's yet another example of this kind of socialized externality.
And so, yeah, I couldn't resist pointing that out.
There's a girl who I had on the show, Mara Cortona, who is the director of the Astropolitics
Institute.
So, this is the politics of space.
Fuck me man, if that's not an interesting like read,
it is so cool, like who owns space?
Can we throw our waste into space?
Can we like claim bits of space?
What, who owns the moon, who owns Mars?
It's so, it is stuff so fascinating.
But yeah man, as a sort of parting note,
what do you think that we can expect
as normal users of technology?
What do you think that we can expect
from our sort of experience
and our interaction with technology over the next decade?
I think we're starting to have a sense that we are interacting with
technology in a way in which all of our actions can and will be used against
us, right? It's like you need to be Mirandized to go online or something. I mean, I don't know if you have this experience
as well, but when I'm using, let's say YouTube,
there's a part of me that tries to decide
before I look up a video if I want to look it up
in an incognito tab so that I kind of like,
you know, separate off, cordon off.
Like, this is not part of the preference model I want you to build of me, even though I know they're going
to track my IP address and whatever,
but it's not directly linked to the same viewing habits
that I've kind of laid down in my regular account
and all the cookies in it.
I think there is going to be this increasingly weird game
theoretic aspect to using technology where we are constantly
having to suss out: what inference is it going to make based on my behavior?
What is their business model?
What kind of feedback will my behavior create?
You hear these funny stories of people saying,
like, you know, I let my two-year-old, you know,
mess with my Spotify for a day,
and then my recommendations have been ruined forever,
you know, things like that.
And I certainly have that experience,
that I feel that I'm dealing with this kind of
inscrutable, you know, intelligence or machinery or whatever you want
to think about it, there's some process happening of which I'm a part.
And it's I'm being observed, I'm being sort of adapted to the things that I see have
some weird relationship to what I've interacted with before, but I have no idea what that relationship
is.
And I mean, I think about Twitter as one example, you know, the Twitter app, a lot of the
stuff in my feed isn't even from the people I follow.
It's this kind of secondary, intermediary thing of, like, someone that you follow liked a tweet by someone else who knows this other person and this is their tweet, and so
As a consumer you have no idea like why why am I seeing this thing?
When you go to make a tweet you have no idea
What process may or may not determine how that tweet reaches people or which people it reaches?
But we're sort of forced to play this game. And we're forced to sort of
co-adapt with these models. And I think that's one of the things that people in technology,
I think, underestimate is that if somewhere secretly in the, you know, the bowels of Twitter,
they start to add a 5% bump for posts that use highly emotional language or something like that or just that
naturally shakes out of the optimization.
People will notice and people will change their behavior accordingly.
The technical way of putting this is, machine learning is secretly mechanism design. It's like, you can't make a model without that model becoming
essentially an incentive structure
that people then start to game
and then the correlations you previously observed break down.
And so you constantly have to kind of reevaluate it.
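A tiny illustration of that point, with wholly invented numbers: a signal (emotional language) genuinely correlates with engagement before anyone knows it's being rewarded; once everyone games it, the correlation the model was trained on collapses.

```python
import numpy as np

# Toy illustration (invented numbers) of "machine learning is secretly mechanism
# design": a signal is predictive only until people learn it's being rewarded.
rng = np.random.default_rng(0)

def correlation(emotional, engagement):
    return float(np.corrcoef(emotional, engagement)[0, 1])

# Before: emotional language is rare and genuinely tracks engaging posts.
emotional_before = rng.random(5000) < 0.2
engagement_before = 1.0 * emotional_before + rng.normal(0, 1, 5000)

# After the platform quietly boosts emotional posts, nearly everyone writes that
# way, so the signal no longer separates engaging posts from the rest.
emotional_after = rng.random(5000) < 0.95
engagement_after = 0.1 * emotional_after + rng.normal(0, 1, 5000)

print("correlation before gaming:", round(correlation(emotional_before, engagement_before), 2))
print("correlation after gaming: ", round(correlation(emotional_after, engagement_after), 2))
```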
What do I think we can expect in the next five to 10 years?
I think that process of,
I don't know, I think that disempowering feeling
of just like, I have no idea what's going on here.
I'm just kind of experiencing it.
I have this sense that I'm part of this causal feedback mechanism
and I don't really know exactly what that, how that works.
But I need to take, I know that I need to take it into account,
because everything I engage with will shape the future data that I even get.
I mean, I think we're going to see,
I think these subtle questions of like what the actual business models of these companies
are like really start to rear their head
in the machine learning aspect.
I think that's kind of underappreciated.
So for example,
we have all these different apps that recommend us things, Netflix recommends
us things, Amazon recommends us things, Spotify recommends us things.
But the business model in each case is quite different.
And so that ends up actually manifesting in very different recommendations.
So Amazon is this logistics behemoth.
And so they really benefit from people doing
mainstream things. If you buy the book that's the number one book, then they probably have a copy of it just a few miles from your house, and it's gonna be easier for them than if you get an obscure book. Netflix is constantly renegotiating licensing rights with all these film studios and TV studios. So Netflix would prefer you watch the really obscure thing over the really mainstream thing, because they can get the rights a lot cheaper.
So Amazon is giving you this kind of centripetal force towards these
sort of mainstream modal things in the culture.
Netflix is the centrifugal force that's driving you to these like obscure niches.
Spotify has this kind of double-sided marketplace thing where if they put too many of the mainstream
artists in your recommended playlist, then the indie record labels get mad and pull out and so
they're constantly wrangling to please both the listeners and the musicians.
So all of these things end up manifesting
in the actual behavior of the system,
but in ways that are sort of tactically obscure, but strategically clear.
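As a toy illustration of how a business model leaks into a recommender's behavior: the same catalogue gets ranked three different ways once each platform's cost structure is folded into the score. The items, numbers, and objective terms below are invented for the sketch, not drawn from any of these companies.

```python
# Hypothetical catalogue: popularity in [0, 1], where 1.0 = most mainstream.
catalogue = {
    "blockbuster":  {"popularity": 0.95, "predicted_enjoyment": 0.80},
    "cult_classic": {"popularity": 0.30, "predicted_enjoyment": 0.78},
    "deep_cut":     {"popularity": 0.05, "predicted_enjoyment": 0.75},
}

def logistics_ranker(item):
    # "Amazon-like": mainstream items are cheap to stock and ship nearby.
    fulfilment_cost = 1.0 - item["popularity"]
    return item["predicted_enjoyment"] - 0.3 * fulfilment_cost

def licensing_ranker(item):
    # "Netflix-like": obscure items are cheaper to license.
    licensing_cost = item["popularity"]
    return item["predicted_enjoyment"] - 0.3 * licensing_cost

def two_sided_ranker(item):
    # "Spotify-like": nudge toward less-mainstream items to keep niche
    # suppliers represented, without swamping what listeners enjoy.
    supplier_bonus = 0.05 * (1.0 - item["popularity"])
    return item["predicted_enjoyment"] + supplier_bonus

for name, ranker in [("logistics", logistics_ranker),
                     ("licensing", licensing_ranker),
                     ("two-sided", two_sided_ranker)]:
    ranking = sorted(catalogue, key=lambda k: ranker(catalogue[k]), reverse=True)
    print(f"{name:>10} business model ranks: {ranking}")
```

Same enjoyment predictions, three different orderings: the "logistics" objective pulls toward the blockbuster, the "licensing" one pushes toward the deep cut, and the "two-sided" one lands in between, which is the centripetal-versus-centrifugal contrast in miniature.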
I don't know where that leaves us necessarily. I think it's going to be increasingly clear that we need to figure out how to give people a seat at the table, if there are things like the health of public discourse at stake. We increasingly have the actual computer science to attempt to operationalize these weirdly fuzzy things.
I'm thinking about OpenAI, GPT-3.
There are a lot of research papers coming out about these very ill-defined concepts that we have; like, for language, you can actually sort of fine-tune GPT-3 to meet these different criteria. So the scientific piece is getting worked out, and the question that remains is really: how do we decide who gets a seat at the table? Whose opinion is it? And whose values are the values that are getting imprinted into this system? That, I think, is the big question that awaits us, even when we can solve the technical and scientific aspect of the alignment problem.
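A loose sketch of what "operationalizing a fuzzy concept" can look like in practice, under stated assumptions: people hand-label examples for an ill-defined criterion (an invented "civility" label here), a small scorer is trained on them, and a scorer of this kind is the sort of thing a large language model can then be fine-tuned or filtered against. The example texts, labels, and scikit-learn model choice are all illustrative assumptions, not anything from OpenAI's actual pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical human-labelled examples of the fuzzy criterion "civility":
# 1 = judged civil by annotators, 0 = judged uncivil.
texts = [
    "I see your point, but I read the evidence differently.",
    "Thanks for explaining, that changed my mind a little.",
    "Could you share a source for that claim?",
    "Only an idiot would believe something this stupid.",
    "Shut up, nobody asked for your worthless opinion.",
    "You people are hopeless and should be ashamed.",
]
labels = [1, 1, 1, 0, 0, 0]

# The learned scorer stands in for the "criterion model" a large language
# model would then be tuned or filtered against.
civility_scorer = make_pipeline(TfidfVectorizer(), LogisticRegression())
civility_scorer.fit(texts, labels)

for candidate in ["I think that argument is weak, and here is why.",
                  "That is the dumbest thing I have ever read."]:
    score = civility_scorer.predict_proba([candidate])[0][1]
    print(f"civility score {score:.2f}  |  {candidate}")
```

With only six toy examples the scores are rough, but the shape of the exercise is the point: a fuzzy human judgment becomes a number a system can be trained against, which is exactly why the "whose values" question matters.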
I think it comes back to what we said at the very, very beginning, the balance between technological capability and technological wisdom. You're telling me that the very, very cutting edge of computer science research, and neural networks and GPT-3 and all this stuff, is on the cusp of actually being able to potentially open up solutions to problems that most of us have only just become aware of. So you're so far ahead, and the iterations on this are so rapid, that, you know, think about the lumbering behemoth that is governmental policy behind us, and legislation, and the psychological research into the long-term effects on humans. That is way, way in the back.
That's in the tail of Snowpiercer, and then right at the very, very front, getting battered by cold winds, are a couple of these computer science guys, and then the guy that's driving the train is the algorithm, and then we're somewhere in the middle, kind of feeling it out, and we're like, oh, isn't it interesting that all this stuff's going on? So I'm like, yeah, I mean, the next 10 years are going to be super, super interesting.
I have this get-out-of-jail-free card that I've been using for absolutely ages, which is that all of the problems that we're coming up with now are not going to matter in 100 years, because we're either going to be enslaved by a misaligned, superintelligent artificial general intelligence, or we'll have managed to get a machine's extrapolated volition to work correctly, to the point where it fixes all of the problems for us. So all of the stuff between now and then is like some weird, reverse-deterministic apathy fest where we don't actually really need to do anything. It's like, look, the end of the road's coming. It could be good, it could be bad. Let's enjoy the ride on the way there. I wonder how much the contributions of these smaller-consequence alignment problems are going to contribute to us getting the big one right?
Yeah. I mean, I agree with your Snowpiercer analogy, broadly speaking, and, I don't know, in some ways it's even worse than that, because there's a computer scientist at Princeton named Arvind Narayanan who has pointed out that a lot of the systems that are still integral to our financial system and, you know, airplane controls are written in Pascal and COBOL, these programming languages that barely anyone even knows anymore. And so he was saying, you know, we have this idea that tech moves too fast for society to keep up. But in reality, a lot of these crappy machine learning systems that were developed in the 2010s are still going to be around, in zombie mode, 20 years from now, and maybe that's even more terrifying.
So I think there is this question of: are we able to catch these misalignment issues in time to actually course-correct? Some of them I feel more sanguine about than others.
Maybe to push back a little bit on this idea that we just need to relax: there's this kind of philosophical question here, which is the tension between moral realism and moral relativism. I've heard people say that they identify as moral realists, so they think there are objective truths about, like, right and wrong, and that people are just bad at figuring out what those objective truths are. And so that's the
kind of attitude that says, you know, I welcome our new robotic overlords,
the sooner they can tell us what's right and wrong,
the better.
That is kind of a "we can chill" scenario, I hope, assuming that we get that Hail Mary pass, right, and the system really is aligned. If that's the case, then the moral realist, you know, overlord just tells us what to do.
If you're kind of a moral relativist,
then you have this idea that, you know,
whatever people say is good is what's good. Then there's this idea of coherent extrapolated volition, which goes back to Eliezer Yudkowsky from the Machine Intelligence Research Institute, and which has been very influential in the AI safety community.
And it's this idea, which I think to some degree threads the needle between those two things,
which is to say, you know, the thing that we want isn't just whatever people happen to
say when we poll them.
It isn't some objective truth that we could all be wrong about, but it's this idea of what we would
decide if we were smarter, if we had longer to think about it, if we could sort of pull together
in the appropriate way. I think in some ways that ends up becoming the job for the next hundred years.
Philosophers like Will MacAskill, who is Toby's colleague, are saying we need this period that they're calling "the long reflection", where basically everybody just needs to chill. We need to take maybe a million years to just figure out what we want to do with the cosmos and, you know, take our time. There's this very influential essay by Nick Bostrom called "Astronomical Waste", which says something like, you know, every second that passes that we don't colonize the stars is equivalent to trillions of human lives being lost that could have been lived had we acted sooner. And yet we still need to take our time, because the consequence of screwing it up is even worse. So there are a lot of folks coming at AI safety from the philosophy side saying, what we really need to do is just chill, leave the space of possibility open.
And it's interesting to me because there's a lot of technical AI safety work on this idea
of option value, preserving the ability of a system to achieve various goals in the future.
So, you know, something that you might want in a system is that it doesn't take actions which permanently foreclose possibilities in that space, whether it's shattering the vase that you can't put back together again, or killing the person that you can't bring back to life, whatever it might be.
There are some, I think, really encouraging technical results in these sorts of toy environments, where if you give the agent randomly generated objective functions and say, okay, I want you to perform some task, but preserve your option value to later do these randomly generated other tasks, the system, at least in these simplified examples, behaves with what seems like a very human amount of caution. Very delicate. It won't just push the Ming vase out of the way to run out the door.
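A tiny sketch of the penalty idea behind results of that flavour, in the spirit of the attainable-utility-preservation work mentioned below. All action names, numbers, and the attainable_utility stand-in are invented for illustration; real implementations learn these values rather than hard-coding them. The agent's effective reward for an action is its task reward minus a penalty for how much the action changes its ability to achieve a set of randomly generated auxiliary goals, compared with doing nothing.

```python
import random

random.seed(1)

ACTIONS = ["walk_around_vase", "push_vase_aside", "do_nothing"]
N_AUX = 5            # number of randomly generated auxiliary goals
PENALTY_WEIGHT = 2.0

# Hypothetical task rewards: both routes to the door complete the task,
# pushing the vase aside is slightly faster.
task_reward = {"walk_around_vase": 0.9, "push_vase_aside": 1.0, "do_nothing": 0.0}

def attainable_utility(action, aux_goal):
    """Stand-in for a learned value estimate: how well could the agent still
    achieve this auxiliary goal after taking the action? Smashing the vase
    forecloses any auxiliary goal that happened to need an intact vase."""
    if action == "push_vase_aside" and aux_goal["needs_vase"]:
        return 0.0
    return aux_goal["base_value"]

aux_goals = [{"base_value": random.random(), "needs_vase": random.random() < 0.5}
             for _ in range(N_AUX)]

def penalized_reward(action):
    # Penalty: average shift in attainable utility relative to doing nothing.
    penalty = sum(abs(attainable_utility(action, g) -
                      attainable_utility("do_nothing", g))
                  for g in aux_goals) / N_AUX
    return task_reward[action] - PENALTY_WEIGHT * penalty

for action in ACTIONS:
    print(f"{action:>18}: penalized reward = {penalized_reward(action):+.2f}")
```

The intended takeaway: with a large enough penalty weight, the slightly slower, vase-preserving route scores better than the faster, vase-smashing one, which is the "don't foreclose future options" behavior described above.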
Man, that is so interesting. I hadn't heard about that before.
But it makes complete sense that by forcing it to retain some level of optionality,
you restrict the ridiculous maximizing effect that it can go after. Man, once we're finished up, I'm gonna ask you for some suggestions for safety researchers and stuff, because I'm gonna force this down the audience's throat over the next year.
Yeah.
But man, Brian.
Some names to check, yeah, I will do. Victoria Krakovna at DeepMind is one of the people working on this. And there's an idea by a guy named Alex Turner that's called attainable utility preservation. So I'll send you some links and you can share some actual papers.
That's my bedtime reading sorted.
Man, thank you for coming on.
The Alignment Problem: How Can Machines Learn Human Values will be linked in the show notes below. If people want to check out any more of your stuff, where should they go?
I'm on Twitter at @brianchristian and on the web at brianchristian.org.
Perfect, man. Thank you so much for coming on.
That's my pleasure. Thank you.