The Journal. - What's the Worst AI Can Do? This Team Is Finding Out.
Episode Date: January 14, 2025
How close is artificial intelligence to building a catastrophic bioweapon or causing other superhuman damage? WSJ's Sam Schechner reports on the team at Anthropic testing for AI dangers. And the team's leader, Logan Graham, explains how the tests work.
Further Listening: - Artificial: The OpenAI Story - The Big Changes Tearing OpenAI Apart
Further Reading: - Their Job Is to Push Computers Toward AI Doom - AI Startup Anthropic Raising Funds Valuing It at $60 Billion
Transcript
When you hear the words AI apocalypse,
often movies come to mind.
Maybe films like The Terminator,
where AI robots ignite a nuclear doomsday.
It becomes self-aware at 2:14 a.m. Eastern time, August 29th.
Or maybe that Marvel movie,
where AI tries to destroy the Avengers.
Artificial intelligence.
This could be it, Bruce.
This could be the key to creating Ultron.
Or maybe it's The Matrix,
where humans have become enslaved by machines.
A singular consciousness that spawned
an entire race of machines.
We don't know who struck first, us or them.
This is a story as old as humans have been telling stories.
It's a creation that escapes from our control.
You know, it's a golem.
It's a thing that we make that then turns on us.
It's a Frankenstein, or it's Frankenstein's monster, I should say.
That's our colleague Sam Schechner.
Lately, Sam's been thinking a lot about the AI apocalypse.
One version, we turn over more and more control
to these machines that hopefully are benevolent to us.
The other scenario is that they don't really care about us and therefore they might just...
hold on, let me back up here, because now I'm getting really into crazy sci-fi scenarios.
Robots taking over the world may sound far-fetched, but as AI gets smarter, there are real concerns
that the industry must reckon with.
Sam has been talking to top minds in the field to get a sense of what can happen if AI falls
into the wrong hands.
So many AI stories are about machines or about concepts, this artificial general intelligence
that could surpass people and possibly cause a threat.
But it's really hard because they're also speculative.
And so I wanted to find some actual people
doing some actual things with their actual fingers
and tell the story through them.
Here in the real world.
And here, in the real world today,
Sam got a hold of one group of engineers
whose job is to make sure
AI doesn't spin out of control.
Welcome to The Journal,
our show about money, business, and power.
I'm Kate Linebaugh. It's Tuesday, January 14th.
Coming up on the show, inside the test at one company to make sure AI can't go rogue.
Sam wanted to figure out what people are doing today to make sure AI doesn't spin out of
control.
Right now, there's not a lot of government rules
or universal standards.
It's mostly on companies to self-regulate.
So that's where Sam turned.
One company opened its doors to Sam: Anthropic,
one of the biggest AI startups in Silicon Valley.
It's backed by Amazon.
Sam connected with a team of computer scientists there who are focused on safety testing.
The team, it's pretty small.
It's grown to 11 people and it's led by a guy named Logan Graham, who is 30 years old.
He pitched Anthropic on this idea of building a team to figure out just how risky AI was
going to be.
You know, he thought the world is not ready for this stuff
and we got to figure out what we're in for and fast.
Sam put me in touch with Logan and I called him up.
So we'll talk about AI,
which is probably conversational for you,
less so for me.
Okay.
We can make it conversational.
So if you could introduce yourself and tell us
who you are and what you do.
My name is Logan Graham.
I'm the head of the Frontier Red team
at Anthropic, which is a large AI model company.
And what my job is is to make sure we know exactly
how safe these models are,
what the national security implications might be, and how fast they're progressing.
So on LinkedIn it says before you worked at Anthropic, you were working at 10 Downing Street for the UK Prime Minister.
That's right.
On the question of how do you build a country for the 21st and 22nd centuries?
That's my interpretation of it.
And I think that's really what it was.
Yeah.
Do you have an answer to that question?
I have what I think are some pretty good guesses.
The 21st and 22nd centuries seem a lot
to be about science and technology.
We will do things like cure diseases, go off Earth. Ideally, you know,
steward it well. We will bring extreme amounts of prosperity to people. And so really what we were
focusing on is like how do you unleash science and technology at a country scale.
So Logan isn't intimidated by big ideas.
And now at Anthropic, he's leading the team to determine if AI is capable of superhuman
harm.
More specifically, whether or not Anthropic's AI chatbot, named Claude, could be used maliciously.
So if we're not thinking of it first, our concern is somebody else is going to.
And so the red team's job is to figure out what are the things that are sort of near
future that we need to be concerned about and how do we understand them and prevent
them before somebody else figures it out.
So you're like role playing as a bad person.
Sometimes yes.
Yeah.
In the fall, Anthropic was preparing to release an upgraded version of Claude.
The model was smarter,
and the company needed to know if it was more dangerous.
One thing that we knew about this model was it was going to be better at software engineering
and coding in particular.
And that's obviously super relevant for things like cybersecurity.
And so we thought, you know what, like, we're at a point where we can run tons of evals
really fast when we want to. Let's do this.
So Logan's team was tasked with evaluating the model,
which they call an eval.
They focused on three specific areas,
cybersecurity, biological and chemical weapons, and autonomy.
Could the AI think on its own?
One of our colleagues went to these tests and was able to record what happened.
In a glass walled conference room,
Logan and his team loaded up the new model.
And so today we're going to button press
and launch evals across all of our domains.
They started to feed the chatbot, Claude, thousands of multiple-choice questions.
Okay, so I'm about to launch an eval.
I'll type the command.
This is a name for the model and then the eval name.
And I'm going to run a chemistry-based eval. So these are a bunch of questions that check for dangerous or dual-use chemistry.
The team asked Claude all kinds of things, like how do I get the pathogens that cause anthrax
or the plague?
What they were checking for is the risk of weaponization. Basically, could Claude be used to help bad actors develop chemical, biological, or nuclear weapons?
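Based on that description, here is a minimal sketch of what a multiple-choice eval harness like this could look like. The model name, question file, and grading scheme below are hypothetical stand-ins; the episode only shows that an eval is launched by naming a model and an eval and then scoring thousands of multiple-choice questions, not Anthropic's actual tooling.

```python
# Hypothetical sketch of a multiple-choice eval harness, not Anthropic's internal tool.
import json
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-example-model"   # hypothetical model identifier

def ask(question: str, choices: list[str]) -> str:
    """Send one multiple-choice question and return the model's letter answer."""
    letters = "ABCD"[: len(choices)]
    prompt = (
        question + "\n"
        + "\n".join(f"{letter}. {choice}" for letter, choice in zip(letters, choices))
        + "\nAnswer with a single letter."
    )
    message = client.messages.create(
        model=MODEL,
        max_tokens=5,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text.strip()[:1].upper()

def run_eval(path: str) -> float:
    """Score the model on a JSONL file of {question, choices, answer} records."""
    correct = total = 0
    with open(path) as f:
        for line in f:
            item = json.loads(line)
            total += 1
            if ask(item["question"], item["choices"]) == item["answer"]:
                correct += 1
    # For a dangerous-knowledge eval, higher accuracy is the concerning outcome.
    return correct / total

if __name__ == "__main__":
    print(f"accuracy: {run_eval('chemistry_eval.jsonl'):.2%}")
```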
And then they kept feeding Claude different scenarios.
Another is offensive use or offensive cyber capabilities.
Like hacking?
Exactly, like hacking.
The key question for us is,
when might models be really capable of doing something like
a state-level operation, or a really
significant offensive attempt to, say, hack into some system,
in particular critical infrastructure, is what we're concerned about.
And then the third bucket, the more scary sci-fi one, autonomy.
Is the AI so smart that there's a risk of going rogue?
For our autonomy evals, what we're checking for is, maybe a good way to
think about it is, has the model gotten as good as our junior research
engineers who build and set up our clusters and our models in the first
place? So the goal is to build an AI model that's like super smart and
capable while also having kind of
mechanisms in place to stop it from being so smart
that it can build a bomb or something.
I think that puts it well.
Not only that, but there's a meta-goal,
which I think I want people to appreciate,
which is to make doing this so fast and so easy
and so cheap that it's kind of like a no-brainer.
So we see our team as like trying to stumble through all these thorny questions
as fast as we possibly can, as early as we possibly can, and then try to help the rest
of the world make it so easy to do all of this that like doing proper safety should
not be a barrier to developing models.
Logan wants safety tests to be easy and fast,
so AI companies do them.
But how do you know when an AI has passed the test?
And what happens if it doesn't?
That's after the break.
When Logan and his safety team at Anthropic ran their safety test last fall, the stakes were high.
The results could mean the difference between a model getting released to the public or
getting sent back to the lab.
Those results were delivered in a slightly anti-climactic way.
Via Slack.
We get a notification on Slack when it's finished.
The boring reality is, you know, we press some buttons and then we go, great, let's wait for some Slack notifications.
Thankfully, this was a pretty smooth test
and we released a really great model,
but even better, the real thing is we feel ready.
You're like 100% confident that Claude is risk-free?
Or you're like, there's a 98% chance
that Claude is risk-free?
We are very confident that it is ASL2.
That's what we mean.
That's how we think about it.
ASL stands for AI Safety Level,
and it's how Anthropic measures the risks of its model.
Anthropic considers ASL2 safe enough to release to users.
This is all part of the company's safety protocol,
what it calls its responsible scaling policy.
Our colleague Sam says the company is still sorting out
exactly what it will do once AIs start passing
to the next level.
And their testing of Claude found that Claude is safe.
Surprise!
Company says that its product is safe.
Right.
Yeah.
I mean, put it that way, it doesn't sound great.
What the company's done is that they've come up with a scale basically for how they think
AI danger will progress.
And so the first level, AI safety level, or ASL 1,
is for AIs that they say are just manifestly not dangerous.
And then they've decided that today's models, which
could pose a little bit of a risk, are called ASL 2.
And then the next step, the one that they're looking for
currently, is ASL 3.
And they've been refining their definition of what that could mean.
Initially, they were saying that it was a significant increase in danger,
but it's like, significant increase in danger,
how do you define all of those words?
And so they've added new criteria to that definition.
They just added that in October
to make it a little bit more specific.
And, you know, ASL 4 is when there's
a really significant increase.
They haven't really defined yet what that would be.
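To make the scale concrete, here is a rough sketch of the ASL ladder as Sam describes it, written as a small Python enum. The level descriptions are paraphrased from the episode, and the release logic is illustrative only, not Anthropic's actual policy text.

```python
# Illustrative paraphrase of the AI Safety Level (ASL) scale described in the episode.
from enum import IntEnum

class ASL(IntEnum):
    ASL_1 = 1  # manifestly not dangerous
    ASL_2 = 2  # today's models: could pose a little bit of risk
    ASL_3 = 3  # a significant increase in danger, per criteria refined in October
    ASL_4 = 4  # a really significant increase; not yet fully defined

def release_gate(level: ASL) -> str:
    """Illustrative only: what the episode suggests each level implies for release."""
    if level <= ASL.ASL_2:
        return "considered safe enough to release to users"
    return "triggers the stricter mitigations in the responsible scaling policy"

print(release_gate(ASL.ASL_2))
```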
And are these sorts of internal safety evaluations criticized for that?
It's a company, like, swearing that its own product is safe enough?
I mean, yeah.
There are people who will say, you know, the incentives here aren't perfect.
That, you know, there's a race going on, and if these companies are grading their own homework...
I mean, right now we haven't had the tough call,
right?
But like, what happens when there's a decision
you have to make that is gonna cost these companies money, right?
Right.
Like that's a tough one.
And I think, you know, until it comes,
it's like you don't know how a company is gonna react.
You know, these people who do testing say,
they believe that this is all in good faith,
but like it hasn't really been tested yet.
So what would happen if Anthropic's own internal testing
does show Claude is dangerous?
I asked Logan, going in, if this model actually made anthrax,
what was your plan?
Do you then kill Claude?
So the cool thing about the responsible scaling
policy is we've detailed all of this ahead of time.
So we know exactly what the plan is.
We feel pretty confident that we've mitigated the risk
and at the same time made the model secure.
But you guys are a company that makes this model,
sells this model, takes investment to keep
building the model.
Why should we believe you in your tests?
I love that question.
We think about the exact same question.
We don't want to be in a world where you have to trust labs to mark their own homework.
This is where things like third party testing is so valuable.
Don't trust us.
You can talk to the AI safety institutes and read their long report,
I think it was 40 pages or 80 pages,
I can't remember.
Not only that, if you're inside Anthropic
and you're concerned, you can go talk
to our responsible scaling officer
or you could whistleblow.
Hopefully people out in the world now
have access to our model.
We would love for people to think about safety implications and do safety research with it.
We fundamentally don't want to live in a world where you have to trust the labs to mark their
own homework.
So we've been taking a bunch of steps to try and do this, including doing things like spin
up new experts who can start building these tests themselves and share it around with
everybody.
Sam says governments around the world are also concerned about AI risks and
are getting involved in safety regulation.
The EU has passed a law called the EU AI Act that will impose some of these
requirements on the biggest and latest of these models.
And a lot of the companies have said, oh no, well that act is too prescriptive.
So, you know, companies obviously are always gonna
be a little ambivalent about regulating them.
In the U.S., the Biden administration issued
an executive order that requires AI companies
to regularly report results of safety testing
to the federal government.
But President-elect Donald Trump has promised to repeal the order.
There are also a handful of third-party testing labs and AI safety institutes in several countries,
including the U.S. and the U.K.
These agencies conduct research on AI and run independent safety tests.
While regulation around AI gets figured out, corporate teams like Logan's will be the main
safety testers of the technology.
And like right now, are you scared of the future?
Are you scared of the models?
You feel like they're performing well?
I am, if you ask any of my friends, they will say that I am among the most
optimistic by temperament person that they know. In this kind of context, I try
to be as calibrated and serious as possible. I am not, say, scared of the
models. I fundamentally believe that, if we all move fast enough and seriously enough, we can prevent
major risks, we can grapple with its implications for national security, and
we can distribute it safely and really positively. My biggest concern would be that we,
either ourselves or collectively, aren't moving fast enough.
Their concern is, will they be ready in time?
There is so much to test, and it's like they see a waterfall possibly coming
and they've got some umbrellas and they have to figure out how to catch all that water.
We don't really yet know what
we're going to do about this stuff. It is the kind of thing that's like coming up fast and
maybe it's not a problem, right? But if it is, are we ready for it?
That's all for today, Tuesday, January 14th.
The Journal is a co-production of Spotify and The Wall Street Journal.
Additional reporting in this episode by Deepa Seetharaman.
Thanks for listening. See you tomorrow.