Lex Fridman Podcast - #431 – Roman Yampolskiy: Dangers of Superintelligent AI

Episode Date: June 2, 2024

Roman Yampolskiy is an AI safety researcher and author of a new book titled AI: Unexplainable, Unpredictable, Uncontrollable. Please support this podcast by checking out our sponsors:
- Yahoo Finance:... https://yahoofinance.com
- MasterClass: https://masterclass.com/lexpod to get 15% off
- NetSuite: http://netsuite.com/lex to get free product tour
- LMNT: https://drinkLMNT.com/lex to get free sample pack
- Eight Sleep: https://eightsleep.com/lex to get $350 off

EPISODE LINKS:
Roman's X: https://twitter.com/romanyam
Roman's Website: http://cecs.louisville.edu/ry
Roman's AI book: https://amzn.to/4aFZuPb

PODCAST INFO:
Podcast website: https://lexfridman.com/podcast
Apple Podcasts: https://apple.co/2lwqZIr
Spotify: https://spoti.fi/2nEwCF8
RSS: https://lexfridman.com/feed/podcast/
YouTube Full Episodes: https://youtube.com/lexfridman
YouTube Clips: https://youtube.com/lexclips

SUPPORT & CONNECT:
- Check out the sponsors above, it's the best way to support this podcast
- Support on Patreon: https://www.patreon.com/lexfridman
- Twitter: https://twitter.com/lexfridman
- Instagram: https://www.instagram.com/lexfridman
- LinkedIn: https://www.linkedin.com/in/lexfridman
- Facebook: https://www.facebook.com/lexfridman
- Medium: https://medium.com/@lexfridman

OUTLINE:
Here's the timestamps for the episode. On some podcast players you should be able to click the timestamp to jump to that time.
(00:00) - Introduction
(09:12) - Existential risk of AGI
(15:25) - Ikigai risk
(23:37) - Suffering risk
(27:12) - Timeline to AGI
(31:44) - AGI turing test
(37:06) - Yann LeCun and open source AI
(49:58) - AI control
(52:26) - Social engineering
(54:59) - Fearmongering
(1:04:49) - AI deception
(1:11:23) - Verification
(1:18:22) - Self-improving AI
(1:30:34) - Pausing AI development
(1:36:51) - AI Safety
(1:46:35) - Current AI
(1:51:58) - Simulation
(1:59:16) - Aliens
(2:00:50) - Human mind
(2:07:10) - Neuralink
(2:16:15) - Hope for the future
(2:20:11) - Meaning of life

Transcript
Starting point is 00:00:00 The following is a conversation with Roman Yampolskiy, an AI safety and security researcher and author of a new book titled AI: Unexplainable, Unpredictable, Uncontrollable. He argues that there's almost 100% chance that AGI will eventually destroy human civilization. As an aside, let me say that I will have many, often technical conversations on the topic
Starting point is 00:00:25 of AI, often with engineers building the state-of-the-art AI systems. I would say those folks put the infamous p(doom), or the probability of AGI killing all humans, at around 1 to 20%. But it's also important to talk to folks who put that value at 70, 80, 90, and as in the case of Roman at 99.99 and many more nines percent. I'm personally excited for the future and believe it will be a good one, in part because of the amazing technological innovation we humans create. But we must absolutely not do so with blinders on, ignoring the possible risks, including
Starting point is 00:01:08 existential risks of those technologies. That's what this conversation is about. And now a quick few second mention of each sponsor. Check them out in the description. It's the best way to support this podcast. We got Yahoo Finance for investors, MasterClass for learning, NetSuite for business, LMNT for hydration, and Eight Sleep for sweet, sweet naps. Choose wisely, my friends. Also, if you want to get in touch with me,
Starting point is 00:01:38 or for whatever reason, work with our amazing team, let's say, just go to lexfridman.com slash contact. And now onto the full ad reads. As always, no ads in the middle. I try to make these interesting but if you must skip them, friends, please still check out our sponsors. I enjoy their stuff, maybe you will too. This episode is brought to you by Yahoo Finance, a site that provides financial management, reports, information, and news for investors. It's my main go-to place for financial stuff. I also added my portfolio to it. I guess it used to be TD Ameritrade and then I got transported, transformed, moved to Charles Schwab. I guess
Starting point is 00:02:22 that was an acquisition of some sort. I have not been paying attention. All I know is I hate change, and trying to figure out the new interface of Schwab when I log in once a year, or however often I log in, is just annoying. Anyway, one of the ways to avoid that annoyance is tracking information about my portfolio from Yahoo Finance. So you can drag over your portfolio and in the same place, find out all the news, analysis,
Starting point is 00:02:53 information, all that kind of stuff. Anyway, for comprehensive financial news and analysis, go to yahoofinance.com. That's yahoofinance.com. I don't know why I whispered that. This episode is also brought to you by MasterClass, where you can watch over 180 classes from the best people in the world in their respective disciplines. We got Aaron Franklin on barbecue and brisket, something I watched recently. And I love brisket.
Starting point is 00:03:24 I love barbecue. It's one of my favorite things about Austin. It's funny when the obvious cliche thing is also the thing that brings you joy. So it almost doesn't feel genuine to say, but I really love barbecue. My favorite place to go is probably Terry Black's. I've had Franklin's a couple of times, it's also amazing.
Starting point is 00:03:44 I actually don't remember myself having bad barbecue or even mediocre barbecue in Austin. So it's hard to pick favorites because it really boils down to the experience you have when you're sitting there. One of my favorite places to sit is Terry Black's. They have this, I don't know, it feels like a tavern. I feel like a cowboy.
Starting point is 00:04:08 Like I just robbed a bank in some town in the middle of nowhere in West Texas. And I'm just sitting down for some good barbecue. And the sheriffs walk in and there's a gunfight and all that, as usual. Anyway, get unlimited access to every MasterClass and get an additional 15% off an annual membership at masterclass.com slash lexpod. That's masterclass.com slash lexpod. This episode is also brought to you by NetSuite, an all-in-one cloud business management system. One of the most fulfilling things in life is the people you surround yourself with.
Starting point is 00:04:43 Just like in the movie 300, all it takes is 300 people to do some incredible stuff. But they all have to be shredded. It's really, really important to look good with your, no. It's really, really important to always be ready for war in physical and mental shape. No, not really, but I guess if that's your thing, happiness is the thing you should be chasing. And there's a lot of ways to achieve that.
Starting point is 00:05:12 For me, being in shape is one of the things that make me happy because I can move about the world and have a lightness to my physical being if I'm in good shape. Anyway, I say all that because getting a strong team together and having them operate as an efficient, powerful machine is really important for the success of the team, for the happiness of the team, and the individuals in that team. NetSuite is a great system that runs the machine inside the machine for any
Starting point is 00:05:47 size business. 37,000 companies have upgraded to NetSuite by Oracle. Take advantage of NetSuite's flexible financing plan at netsuite.com slash Lex. That's netsuite.com slash Lex. This episode is also brought to you by LMNT, an electrolyte drink mix of sodium, potassium, and magnesium that I've been consuming multiple times a day. Watermelon salt is my favorite. Whenever you see me drink from a cup on the podcast, almost always it's going to be water with some LMNT in it. I use an empty Powerade bottle, 28 fluid ounces, fill it with water, put one packet of watermelon salt LMNT in it, mix it up, put it in the fridge. And then when it's time to drink, I take it out of the fridge and I drink it. And I drink a lot of those
Starting point is 00:06:36 a day and it feels good. It's delicious. Whenever I do crazy physical fasting, all that kind of stuff, LMNT is always by my side. And more and more you're gonna see probably the sparkling water thing, or whatever, that LMNT is making. So it's in a can and it's freaking delicious. There's four flavors. The lemon one is the only one I don't like. The other three I really love, and I forget their names, but they're freaking delicious. And you're gonna see it more and more on my desk, except for the fact that I run out very quickly because I consume them very quickly. Get a sample pack for free with any purchase.
Starting point is 00:07:12 Try it at drinkLMNT.com slash Lex. This episode is also brought to you by Eight Sleep and its Pod 4 Ultra. This thing is amazing. The Ultra part of that adds a base that goes between the mattress and the bed frame and can elevate to like a reading position. So it modifies the positioning of the bed. On top of all the cooling and heating and all that kind of stuff that it can do and do it better in the Pod 4, I think it has 2X the cooling power of Pod 3. So they're improving on the main thing that they do, but also there's the Ultra part that can adjust the bed.
Starting point is 00:07:48 It can cool down each side of the bed to 20 degrees Fahrenheit below room temperature. One of my favorite things is to escape the world on a cool bed with a warm blanket and just disappear for 20 minutes or for eight hours into a dream world where everything is possible, where everything is allowed. It's a chance to explore the Jungian shadow, the good, the bad, and the ugly. But it's usually good. It's usually awesome. And I actually don't dream that much but when I do
Starting point is 00:08:23 it's awesome. The whole point though, is that I wake up refreshed. Taking your sleep seriously is really, really important. When you get a chance to sleep, do it in style. And do it on a bed that's awesome. Go to eightsleep.com slash Lex and use code Lex to get $350 off the Pod 4 Ultra. This is the Lex Fridman Podcast.
Starting point is 00:08:49 To support it, please check out our sponsors in the description. And now, dear friends, here's Roman Yampolskiy. What to you is the probability that superintelligent AI will destroy all human civilization? What's the timeframe? Let's say 100 years, in the next 100 years. So the problem of controlling AGI or superintelligence, in my opinion, is like a problem of creating a perpetual safety machine. By analogy with perpetual motion machine, it's impossible. Yeah, we may succeed and do a good job with GPT-5, 6, 7, but they just keep improving, learning, eventually self-modifying, interacting with the environment, interacting with malevolent actors. The difference between cybersecurity, narrow AI safety, and safety for general AI for superintelligence
Starting point is 00:10:02 is that we don't get a second chance. With cybersecurity, somebody hacks your account, what's the big deal? You get a new password, new credit card, you move on. Here, if we're talking about existential risks, you only get one chance. So you're really asking me what are the chances that we'll create the most complex software ever on a first try with zero bugs, and it will continue to have zero bugs for 100 years or more. So there is an incremental improvement of systems leading up to AGI.
Starting point is 00:10:38 To you, it doesn't matter if we can keep those safe. There's going to be one level of system at which you cannot possibly control it. I don't think we so far have made any system safe. At the level of capability they display, they already have made mistakes. We had accidents. They've been jailbroken. I don't think there is a single large language model today which no one was successful at
Starting point is 00:11:09 making do something developers didn't intend it to do. But there's a difference between getting it to do something unintended, getting it to do something that's painful, costly, destructive, and something that's destructive to the level of hurting billions of people, or hundreds of millions of people, billions of people, or the entirety of human civilization. That's a big leap. Exactly. But the systems we have today have capability of causing X amount of damage. So when they fail, that's all we get. If we develop systems capable of impacting all of humanity, all of universe, the damage is proportionate.
Starting point is 00:11:47 What to you are the possible ways that such kind of mass murder of humans can happen? It's always a wonderful question. So one of the chapters in my new book is about unpredictability. I argue that we cannot predict what a smarter system will do. So you're really not asking me how superintelligence will kill everyone. You're asking me how I would do it. And I think it's not that interesting. I can tell you about the standard, you know, nanotech, synthetic bio, nuclear. Superintelligence will come up with something completely new, completely super. We may not even recognize that as a possible path to achieve that goal. So there is like an unlimited level of creativity
Starting point is 00:12:31 in terms of how humans could be killed. But you know, we could still investigate possible ways of doing it. Not how to do it, but at the end, what is the methodology that does it? Shutting off the power, and then humans start killing each other maybe
Starting point is 00:12:51 because the resources are really constrained. Then there's the actual use of weapons, like nuclear weapons, or developing artificial pathogens, viruses, that kind of stuff. We can still kind of think through that and defend against it, right? There's a ceiling to the creativity of mass murder of humans here, right? The options are limited. They are limited by how imaginative we are. If you are that much smarter, that much more creative,
Starting point is 00:13:18 you are capable of thinking across multiple domains, do novel research in physics and biology, you may not be limited by those tools. If squirrels were planning to kill humans, they would have a set of possible ways of doing it, but they would never consider things we can come up with. So are you thinking about mass murder and destruction of human civilization?
Starting point is 00:13:38 Are you thinking of, with squirrels, you put them in a zoo and they don't really know they're in a zoo? If we just look at the entire set of undesirable trajectories, majority of them are not going to be death. Most of them are going to be just like things like Brave New World where the squirrels are fed dopamine
Starting point is 00:14:00 and they're all doing some kind of fun activity and the fire, the soul of humanity is lost because of the drug that's fed to it. Or literally in a zoo. We're in a zoo, we're doing our thing, we're playing a game of Sims and the actual players playing that game are AI systems. Those are all undesirable because of the free will.
Starting point is 00:14:24 The fire of human consciousness is dimmed through that process, but it's not killing humans. So are you thinking about that or is the biggest concern literally the extinction of humans? I think about a lot of things. So there is X-risk, existential risk, everyone's dead. There is S-risk, suffering risks, where everyone wishes they were dead. We have also idea for I-risk, ikigai risks, where we lost our meaning. The systems can be more creative, they can do all the jobs. It's not obvious what you
Starting point is 00:14:58 have to contribute to a world where superintelligence exists. Of course, you can have all the variants you mentioned where we are safe, we are kept alive, but we are not in control. We are not deciding anything. We are like animals in a zoo. There is, again, possibilities we can come up with as very smart humans, and then possibilities something a thousand times smarter can come up with for reasons we cannot comprehend. I would love to sort of dig into each of those, X-risk, S-risk, and I-risk. So can you like linger on I-risk? What is that? So Japanese concept of ikigai, you find something which allows you to make money,
Starting point is 00:15:40 you are good at it and the society says we need it. So like you have this awesome job, you are a podcaster, gives you a lot of meaning, you have a good life, I assume you're happy. That's what we want most people to find, to have. For many intellectuals, it is their occupation which gives them a lot of meaning. I am a researcher, philosopher, scholar, that means something to me. In a world where an artist is not feeling appreciated because his art is just not competitive with what is produced by machines or a writer or scientist will lose a lot of that. And at the lower level, we're talking about complete technological unemployment. We're not losing 10% of jobs, we're losing all jobs. What do people do with all that
Starting point is 00:16:31 free time? What happens then? Everything society is built on is completely modified in one generation. It's not a slow process where we get to kind of figure out how to live that new lifestyle, but it's pretty quick. In that world, can't humans do what humans currently do with chess, play each other, have tournaments, even though AI systems are far superior at this time in chess? So we just create artificial games, or for us they're real, like the Olympics, and we do all kinds of different competitions and have fun. Focus, maximize the fun and let the AI focus on the productivity. It's an option. I have a paper where I try to solve the value alignment problem for multiple agents.
Starting point is 00:17:22 And the solution to avoid compromise is to give everyone a personal virtual universe. You can do whatever you want in that world. You could be king, you could be slave, you decide what happens. So it's basically a glorified video game where you get to enjoy yourself and someone else takes care of your needs and the substrate alignment is the only thing we need to solve. We don't have to get eight billion humans to agree on anything. So, okay, so why is that not a likely outcome? Why can't AI systems create video games for us
Starting point is 00:17:56 to lose ourselves in, each with an individual video game universe? Some people say that's what happened. We're in a simulation. And we're playing that video game. And now we're creating what? Maybe we're creating artificial threats for ourselves to be scared about because fear is really exciting.
Starting point is 00:18:14 It allows us to play the video game more vigorously. And some people choose to play on a more difficult level with more constraints. Some say, OK, I'm just gonna enjoy the game, high privilege level, absolutely. So okay, what was that paper on multi-agent value alignment? Personal universes, personal universes. So that's one of the possible outcomes.
Starting point is 00:18:37 But what in general is the idea of the paper? So it's looking at multiple agents, they're human, AI, like a hybrid system, whether it's humans and AIs, or is it looking at humans or just intelligent agents? So this is in order to solve value alignment problem, I'm trying to formalize it a little better. Usually we're talking about getting AIs to do what we want, which is not well defined. Are we talking about creator of a system, owner of that AI, humanity as a whole, but
Starting point is 00:19:04 we don't agree on much. There is no universally accepted ethics, morals across cultures, religions. People have individually very different preferences politically and such. So even if we somehow managed all the other aspects of it, programming those fuzzy concepts and getting AI to follow them closely, we don't agree on what to program in. So my solution was, okay, we don't have to compromise on room temperature. You have your universe, I have mine, whatever you want. And if you like me, you can invite me to visit your universe. We don't have to be independent, but the point is you can be. And virtual reality is getting pretty good. It's going to hit a point where you can't tell the difference. And if you
Starting point is 00:19:44 can't tell if it's real or not, what's the difference? So basically, give up on value alignment. Create an entire, it's like the multiverse theory: create an entire universe for you, with your values. You still have to align with that individual. They have to be happy in that simulation.
Starting point is 00:20:02 But it's a much easier problem to align with one agent versus eight billion agents plus animals, aliens. So you convert the multi-agent problem into a single agent problem. I'm trying to do that, yeah. Okay, is there any way to, so okay, that's giving up on the value alignment problem. Well, is there any way to solve the value alignment problem
Starting point is 00:20:26 where there's a bunch of humans, multiple humans, tens of humans or eight billion humans that have very different set of values? It seems contradictory. I haven't seen anyone explain what it means outside of kind of words which pack a lot, make it good, make it desirable, make it something they don't regret. But how do you specifically formalize those notions? How do you program them in? I haven't seen anyone make progress on that so far. But isn't that the whole optimization journey that we're doing as a human civilization? We're looking at geopolitics. Nations are in a state of anarchy with each other. They
Starting point is 00:21:06 start wars, there's conflict, and oftentimes they have very different views of what is good and what is evil. Isn't that what we're trying to figure out, just together trying to converge towards that? So we're essentially trying to solve the value alignment problem with humans. Right. But the examples you gave, some of them are, for example, two different religions saying this is our holy site and we are not willing to compromise it in any way. If you can make two holy sites in virtual worlds, you solve the problem. But if you only have one, it's not divisible, you're kind of stuck there. But what if we want to be in tension with each other and that through that tension we understand
Starting point is 00:21:48 ourselves and we understand the world. So that's the intellectual journey we're on as a human civilization, is we create intellectual and physical conflict and through that figure stuff out. If we go back to that idea of simulation and this is entertainment kind of giving meaning to us, the question is how much suffering is reasonable for a video game. So yeah, I don't mind a video game where I get haptic feedback, there is a little bit of shaking, maybe I'm a little scared. I don't want a game where like kids are tortured, literally. That seems unethical, at least by our human standards.
Starting point is 00:22:26 Are you suggesting it's possible to remove suffering if we're looking at human civilization as an optimization problem? So we know there are some humans who, because of a mutation, don't experience physical pain. So at least physical pain can be mutated out, re-engineered out. Suffering in terms of meaning, like you burn the only copy of my book, is a little harder. But even there you can manipulate your hedonic set point, you can change defaults, you can reset.
Starting point is 00:22:58 Problem with that is if you start messing with your reward channel, you start wireheading and end up blissing out a little too much. Well, that's the question. Would you really want to live in a world where there's no suffering? That's a dark question. But is there some level of suffering that reminds us of what this is all for?
Starting point is 00:23:22 I think we need that, but I would change the overall range. So right now it's negative infinity to kind of positive infinity, pain-pleasure axis. I would make it like zero to positive infinity and being unhappy is like, I'm close to zero. Okay, so what's the S-risk? What are the possible things
Starting point is 00:23:40 that you're imagining with S-risk? So mass suffering of humans. What are we talking about there caused by AGI? So there are many malevolent actors. We can talk about psychopaths, crazies, hackers, doomsday cults. We know from history, they tried killing everyone. They tried on purpose to cause maximum amount of damage, terrorism. What if someone malevolent wants on purpose to torture all humans as long as possible? You solve aging, so now you have functional immortality
Starting point is 00:24:13 and you just try to be as creative as you can. Do you think there is actually people in human history that tried to literally maximize human suffering? It's just studying people who have done evil in the world. It seems that they think that they're doing good. And it doesn't seem like they're trying to maximize suffering. They just cause a lot of suffering as a side effect of doing what they think is good. So there are different malevolent agents.
Starting point is 00:24:42 Some may be just gaining personal benefit and sacrificing others to that cause. Others we know for a fact are trying to kill as many people as possible. When we look at recent school shootings, if they had more capable weapons, they would take out not dozens, but thousands, millions, billions. Well, we don't know that, but that is a terrifying possibility and we don't want to find out. Like if terrorists had access to nuclear weapons, how far would they go? Is there a limit to what they're willing to do? In your sense, are there some malevolent actors where there's no limit?
Starting point is 00:25:28 There is mental diseases where people don't have empathy, don't have this human quality of understanding suffering in others. And then there's also a set of beliefs where you think you're doing good by killing a lot of humans. Again, I would like to assume that normal people never think like that. It's always some sort of psychopaths, but yeah. And to you, AGI systems can carry that and be more competent at executing that. They can certainly be more creative.
Starting point is 00:26:05 They can understand human biology better, understand our molecular structure, genome. Again, a lot of times torture ends when an individual dies. That limit can be removed as well. So if we're actually looking at X-risk and S-risk as the systems get more and more intelligent, don't you think it's possible to anticipate the ways they can do it and defend against it like we do with cybersecurity, with security systems? Right, we can definitely keep up for a while. I'm saying you cannot do it indefinitely. At some point the cognitive gap is too big, the surface you
Starting point is 00:26:47 have to defend is infinite. But attackers only need to find one exploit. So to you, eventually, this is heading off a cliff. If we create general super intelligences, I don't see a good outcome long term for humanity. The only way to win this game is not to play it. Okay, well, we'll talk about possible solutions and what not playing it means. But what are the possible timelines here to you? What are we talking about? We're talking about a set of years, decades, centuries.
Starting point is 00:27:18 What do you think? I don't know for sure. The prediction markets right now are saying 2026 for AGI. I heard the same thing from CEO of Anthropic, DeepMind, so maybe we are two years away, which seems very soon, given we don't have a working safety mechanism in place or even a prototype for one. And there are people trying to accelerate those timelines because they feel we're not getting there quick enough. Well, what do you think they mean when they say AGI? So the definitions we used to have and people are modifying them a little bit lately.
Starting point is 00:27:52 Artificial general intelligence was a system capable of performing in any domain a human could perform. So kind of you're creating this average artificial person, they can do cognitive labor, physical labor, where you can get another human to do it. Superintelligence was defined as a system which is superior to all humans in all domains. Now people are starting to refer to AGI as if it's superintelligence. I made a post recently where I argued for me at least, if you average out over all the common human tasks, those systems are already smarter than an average human. So under that definition, we have it. Shane Legg has this
Starting point is 00:28:32 definition of where you're trying to win in all domains. That's what intelligence is. Now are they smarter than elite individuals in certain domains? Of course not. They're not there yet. But the progress is exponential. See, I'm much more concerned about social engineering. So to me, AI's ability to do something in the physical world, like the lowest hanging fruit, the easiest set of methods is by just getting humans to do it. It's going to be much harder to be the kind of viruses that take over the minds of robots, where the robots are executing the commands.
Starting point is 00:29:15 It just seems like human, social engineering of humans is much more likely. That would be enough to bootstrap the whole process. Okay, just to linger on the term AGI, what to you is the difference between AGI and human level intelligence? Human level is general in the domain of expertise of humans. We know how to do human things.
Starting point is 00:29:37 I don't speak dog language. I should be able to pick it up if I'm a general intelligence. It's kind of inferior animal, I should be able to learn that skill, but I can't. A general intelligence, truly universal general intelligence should be able to do things like that humans cannot do. To be able to talk to animals, for example.
Starting point is 00:29:55 To solve pattern recognition problems of that type, to do other similar things outside of our domain of expertise, because it's just not the world we live in. If we just look at the space of cognitive abilities we have, I just would love to understand what the limits are beyond which an AGI system can reach. Like, what does that look like? What about actual mathematical thinking
Starting point is 00:30:24 or scientific innovation, that kind of stuff? We know calculators are smarter than humans in that narrow domain of addition. But is it humans plus tools versus AGI or just raw human intelligence? Because humans create tools and with the tools they become more intelligent. So like there's a gray area there
Starting point is 00:30:49 what it means to be human when we're measuring their intelligence. So when I think about it, I usually think human with like a paper and a pencil, not human with internet and another AI helping. But is that a fair way to think about it? Because isn't there another definition of human level intelligence
Starting point is 00:31:04 that includes the tools that humans create? But we create AI, so at any point, you'll still just add super intelligence to human capability. That seems like cheating. No, controllable tools. There is an implied leap that you're making
Starting point is 00:31:20 when AGI goes from tool to entity that can make its own decisions. So if we define human level intelligence as everything a human can do with fully controllable tools. It seems like a hybrid of some kind, you know, doing brain computer interfaces, you're connecting it to maybe narrow AI, yeah, it definitely increases our capabilities.
Starting point is 00:31:44 So what's a good test to you that measures whether an artificial intelligence system has reached human level intelligence? And what's a good test where it has superseded human level intelligence to reach that land of AGI? I am old fashioned. I like Turing tests. I have a paper where I equate passing Turing tests
Starting point is 00:32:07 to solving AI complete problems because you can encode any questions about any domain into the Turing test. You don't have to talk about how was your day. You can ask anything. And so the system has to be as smart as a human to pass it in a true sense. But then you would extend that
Starting point is 00:32:24 to maybe a very long conversation. I think the Alexa Prize was doing that. Basically, can you do a 20 minute, 30 minute conversation with an AI system? It has to be long enough to where you can make some meaningful decisions about capabilities, absolutely. You can brute force very short conversations.
Starting point is 00:32:51 So like literally, what does that look like? Can we construct formally a kind of test that tests for AGI? For AGI, it has to be that I cannot give it a task I can give to a human and it cannot do it if a human can. For superintelligence, it would be superior on all such tasks, not just average performance. Go learn to drive car, go speak Chinese, play guitar. Okay, great. I guess the following question, is there a test for the kind of AGI that would be susceptible to lead to S-risk or X-risk, susceptible to destroy human civilization? Like, is there a test for that? You can develop a test which will give you positives
Starting point is 00:33:37 if it lies to you or has those ideas. You cannot develop a test which rules them out. There is always possibility of what Bostrom calls a treacherous turn, where later on a system decides for game theoretic reasons, economic reasons to change its behavior. And we see the same with humans. It's not unique to AI. For millennia, we tried developing morals, ethics, religions, lie detector tests, and then employees betray the employers, spouses betray family. It's a pretty standard thing intelligent agents sometimes do.
Starting point is 00:34:12 So is it possible to detect when an AI system is lying or deceiving you? If you know the truth and it tells you something false, you can detect that, but you cannot know in general every single time. And again, the system you're testing today may not be lying. The system you're testing today may know you are testing it and so behaving. And later on, after it interacts with the environment, interacts with other systems, malevolent agents, learns more, it may start doing those things. So do you think it's possible to develop a system where the creators of the system, the developers, the programmers don't know that it's deceiving them?
Starting point is 00:34:54 So systems today don't have long-term planning. That is not our… They can lie today if it optimizes, helps them optimize the reward. If they realize, okay, this human will be very happy if I tell them the following, they will do it if it brings them more points. And they don't have to kind of keep track of it. It's just the right answer to this problem every single time. At which point is somebody creating that? Intentionally, not unintentionally, intentionally creating an AI system that's doing long-term
Starting point is 00:35:30 planning with an objective function as defined by the AI system, not by a human. Well, some people think that if they're that smart, they're always good. They really do believe that. It's just benevolence from intelligence. So they'll always want what's best for us. Some people think that they will be able to detect problem behaviors and correct them at the time when we get there. I don't think it's a good idea. I am strongly against it, but yeah, there are quite a few people who in general are so optimistic about this technology, it could do no wrong.
Starting point is 00:36:07 They want it developed as soon as possible, as capable as possible. So there's going to be people who believe the more intelligent it is, the more benevolent, and so therefore, it should be the one that defines the objective function that it's optimizing when it's doing long-term planning. There are even people who say, okay, what's so special about humans, right?
Starting point is 00:36:27 We removed the gender bias, we're removing race bias. Why is this pro-human bias? We are polluting the planet, we are, as you said, fight a lot of wars, kind of violent. Maybe it's better if this super intelligent, perfect society comes and replaces us. It's normal stage in the evolution of our species. Yeah. So somebody says, let's develop an AI system that removes the violent humans from the world. And then it turns out that all humans have violence in them or the capacity for violence and therefore all humans are removed. Yeah, yeah, yeah.
Starting point is 00:37:07 Let me ask about Yann LeCun. He's somebody who you've had a few exchanges with and he's somebody who actively pushes back against this view that AI is going to lead to destruction of human civilization, also known as AI-doomerism. In one example that he tweeted, he said, I do acknowledge risks, but, two points. One, open research and open source are the best ways to understand and mitigate the risks. And two, AI is not something that just happens. We build it.
Starting point is 00:37:47 We have agency in what it becomes. Hence, we control the risks, we meaning humans. It's not some sort of natural phenomena that we have no control over. So can you make the case that he's right and can you try to make the case that he's wrong? I cannot make a case that he's right. He's wrong in so many ways it's difficult for me
Starting point is 00:38:07 to remember all of them. He is a Facebook buddy so I have a lot of fun having those little debates with him. So I'm trying to remember the arguments. So one, he says we are not gifted this intelligence from aliens. We are designing it, we are making decisions about it. That's not true. It was true when we had expert systems, symbolic AI, decision trees. Today, you set up parameters for a model and you water this plant.
Starting point is 00:38:36 You give it data, you give it compute, and it grows. And after it's finished growing into this alien plant, you start testing it to find out what capabilities it has. And it takes years to figure out, even for existing models. If it's trained for six months, it will take you two, three years to figure out basic capabilities of that system. We still discover new capabilities in systems
Starting point is 00:38:58 which are already out there. So that's not the case. So just to linger on that, to do the difference there, that there is some level of emergent intelligence that happens in our current approaches. So stuff that we don't hard code in. Absolutely, that's what makes it so successful. When we had to painstakingly hard code in everything, we didn't have much progress.
Starting point is 00:39:22 Now, just spend more money and more compute and it's a lot more capable. And then the question is, when there is emergent intelligent phenomena, what is the ceiling of that? For you, there's no ceiling. For Yann LeCun, I think there's a kind of ceiling that happens that we have full control over.
Starting point is 00:39:41 Even if we don't understand the internals of the emergence, how the emergence happens, there's a sense that we have control and an understanding of the approximate ceiling of capability, the limits of the capability. Let's say there is a ceiling. It's not guaranteed to be at the level which is competitive with us. It may be greatly superior to ours.
Starting point is 00:40:05 So what about his statement about open research and open source are the best ways to understand and mitigate the risks? Historically, he's completely right. Open source software is wonderful. It's tested by the community. It's debugged. But we're switching from tools to agents. Now you're giving open source weapons to psychopaths. Do we want to open source nuclear weapons? Biological weapons? It's not safe to give technology so powerful
Starting point is 00:40:34 to those who may misalign it, even if you are successful at somehow getting it to work in the first place in a friendly manner. But the difference with nuclear weapons, current AI systems are not akin to nuclear weapons. So the idea there is you're open sourcing it at this stage that you can understand it better. Large number of people can explore the limitation,
Starting point is 00:40:56 the capabilities, explore the possible ways to keep it safe, to keep it secure, all that kind of stuff, while it's not at the stage of nuclear weapons. So nuclear weapons, there's a no nuclear weapon, and then there's a nuclear weapon. With AI systems, there's a gradual improvement of capability, and you get to perform that improvement incrementally.
Starting point is 00:41:17 And so open source allows you to study how things go wrong, a study of the very process of emergence, study AI safety on those systems when there's not a high level of danger, all that kind of stuff. It also sets a very wrong precedent. So we open sourced model one, model two, model three, nothing ever bad happened, so obviously we're gonna do it with model four, it's just gradual improvement.
Starting point is 00:41:42 I don't think it always works with the precedent, like you're not stuck doing it the way you always did. It's just, it sets a precedent of open research and open development, such that we get to learn together. And then the first time there's a sign of danger, some dramatic thing happens, not a thing that destroys human civilization, but some dramatic demonstration of capability that can legitimately lead to
Starting point is 00:42:10 a lot of damage, then everybody wakes up and says, okay, we need to regulate this, we need to come up with safety mechanisms that stops this, right? But at this time, maybe you can educate me, but I haven't seen any illustration of significant damage done by intelligent AI systems. So I have a paper which collects accidents through history of AI, and they always are proportional to capabilities of that system. So if you have Tic-Tac-Toe playing AI, it will fail to properly play and loses the game, which it should draw, trivial.
Starting point is 00:42:42 Your spell checker will misspell a word, so on. I stopped collecting those because there are just too many examples of AIs failing at what they are capable of. We haven't had terrible accidents in the sense of billion people got killed. Absolutely true. But in another paper, I argue that those accidents do not actually prevent people from continuing with research. And actually they kind of serve like vaccines. A vaccine makes your body a little bit sick so you can handle the big disease later much better. It's the same here.
Starting point is 00:43:18 People will point out, you know that accident, AI accident we had where 12 people died, everyone's still here, 12 people is less than smoking kills, it's not a big deal. So we continue. So in a way, it will actually be kind of confirming that it's not that bad. It matters how the deaths happen, whether it's literally murder by the AI system, then one is a problem. But if it's accidents, because of increased reliance
Starting point is 00:43:48 on automation, for example, so when airplanes are flying in an automated way, maybe the number of plane crashes increased by 17% or something. And then you're like, okay, do we really want to rely on automation? I think in the case of automation, airplanes, it decreased significantly. Okay, same thing with autonomous vehicles. Like, okay, what are the pros and cons? What are the trade-offs here? And you can have that discussion in an honest
Starting point is 00:44:16 way. But I think the kind of things we're talking about here is mass scale pain and suffering caused by AI systems. And I think we need to see illustrations of that in a very small scale to start to understand that this is really damaging. Versus Clippy, versus a tool that's really useful to a lot of people to do learning, to do summarization of texts, to do question and answer, all that kind of stuff, to generate videos. A tool, fundamentally a tool versus an agent that can do a huge amount of damage. So you bring up example of cars. Cars were slowly developed and integrated.
Starting point is 00:45:02 If we had no cars and somebody came around and said, I invented this thing, it's called cars. It's awesome. It kills like a hundred thousand Americans every year. Let's deploy it. Would we deploy that? There's been fear-mongering about cars for a long time, the transition from horses to cars. There's a really nice channel that I recommend people check out, Pessimists Archive, that documents all the fear-mongering about technology that's happened throughout history. There's definitely been a lot of fear-mongering about cars. There's a transition period there about cars, about how deadly they
Starting point is 00:45:36 are. We can try. It took a very long time for cars to proliferate to the degree they have now. And then you could ask serious questions in terms of the miles traveled, the benefit to the economy, the benefit to the quality of life that cars do versus the number of deaths, 30, 40,000 in the United States. Are we willing to pay that price? I think most people, when they're rationally thinking,
Starting point is 00:46:00 policymakers will say yes. We want to decrease it from 40,000 to zero and do everything we can to decrease it. There's all kinds of policies and incentives you can create to decrease the risks with the deployment of this technology, but then you have to weigh the benefits and the risks of the technology. And the same thing would be done with AI.
Starting point is 00:46:23 You need data, you need to know, but if I'm right and it's unpredictable, unexplainable and uncontrollable, you cannot make this decision: we're gaining $10 trillion of wealth, but we're losing, we don't know how many people. You basically have to perform an experiment on 8 billion humans without their consent. And even if they want to give you consent, they can't because they cannot give informed consent. They don't understand those things. Right, that happens when you go from the predictable to the unpredictable very quickly.
Starting point is 00:46:57 You just, but it's not obvious to me that AI systems would gain capability so quickly that you won't be able to collect enough data to study the benefits and the risks? We're literally doing it. The previous model, we learned about after we finished training it, what it was capable of. Let's say we stopped GPT-4 training run around human capability, hypothetically. We start training GPT-5, and I have no knowledge of insider training runs or anything and we start at that point of about human and we train it for the next nine months. Maybe two months
Starting point is 00:47:32 in it becomes super intelligent. We continue training it. At the time when we start testing it, it is already a dangerous system. How dangerous? I have no idea, but neither do the people training it. At the training stage, but then there's a testing stage inside the company. They can start getting intuition about what the system is capable to do. You're saying that somehow from GPT-4 to GPT-5 can happen the kind of leap where GPT-4 was controllable
Starting point is 00:48:06 and GPT-5 is no longer controllable and we get no insights from using GPT-4 about the fact that GPT-5 will be uncontrollable. Like that's the situation you're concerned about. Whether leap from n to n plus 1 would be such that uncontrollable system is created without any ability for us to anticipate that. If we had capability ahead of the run, before the training run, to register exactly what capabilities the next model will have at the end of the training run,
Starting point is 00:48:40 and we accurately guessed all of them, I would say you're right, we can definitely go ahead with this run. We don't have that capability. From GPT-4, you can build up intuitions about what GPT-5 will be capable of. It's just incremental progress. Even if that's a big leap in capability,
Starting point is 00:48:59 it just doesn't seem like you can take a leap from a system that's helping you write emails to a system that's going to destroy human civilization. It seems like it's always going to be sufficiently incremental such that we can anticipate the possible dangers. And we're not even talking about existential risks, but just the kind of damage you can do to civilization.
Starting point is 00:49:21 It seems like we'll be able to anticipate the kinds, not the exact, but the kinds of risks it might lead to and then rapidly develop defenses ahead of time and as the risks emerge. We're not talking just about capabilities, specific tasks. We're talking about general capability to learn. Maybe like a child at the time of testing and deployment, it is still not extremely capable, but as it is exposed to more data, real world,
Starting point is 00:49:54 it can be trained to become much more dangerous and capable. Let's focus then on the control problem. At which point does the system become uncontrollable? Why is it the more likely trajectory for you that the system becomes uncontrollable? So I think at some point it becomes capable of getting out of control. For game theoretic reasons it may decide not to do anything right away and for a long time just collect more resources, accumulate strategic advantage. Right away, it may be kind of still young, weak superintelligence, give it a decade, it's in charge of a lot more resources, it had time to make backups. So it's not obvious to me
Starting point is 00:50:38 that it will strike as soon as it can. Can we just try to imagine this future where there's an AI system that's capable of escaping the control of humans and then doesn't and waits. What's that look like? So one, we have to rely on that system for a lot of the infrastructure. So we have to give it access, not just to the internet, but to the task of managing power, government, economy, this kind of stuff. And that just feels like a gradual process, given the bureaucracies of all those systems involved.
Starting point is 00:51:18 We've been doing it for years. Software controls all those systems, nuclear power plants, airline industry, it's all software based. Every time there is electrical outage, I can't fly anywhere for days. But there's a difference between software and AI. There's different kinds of software. So to give a single AI system access to the control of airlines and the control of the economy,
Starting point is 00:51:43 that's not a trivial transition for humanity. No, but if it shows it is safer, in fact, when it's in control, we get better results, people will demand that it be put in place. Absolutely. And if not, it can hack the system. It can use social engineering to get access to it. That's why I said it might take some time for it to accumulate those resources. It just feels like that would take a long time for either humans to trust it or for the social engineering to come into play. It's not a thing that happens overnight. It feels like something that happens across one or two decades.
Starting point is 00:52:15 I really hope you're right, but it's not what I'm seeing. People are very quick to jump on the latest trend. Early adopters will be there before it's even deployed buying prototypes. Maybe the social engineering. I could see, because, so for social engineering, AI systems don't need any hardware access. It's all software. So they can start manipulating you
Starting point is 00:52:36 through social media and so on. Like you have AI assistants, they're gonna help you do a lot of, manage a lot of your day to day, and then they start doing social engineering. But like, for a system that's so capable that it can escape the control of humans that created it, such a system being deployed at a mass scale and trusted by people to be deployed, it feels like that would
Starting point is 00:53:04 take a lot of convincing. So we've been deploying systems which had hidden capabilities. Can you give an example? GPT-4. I don't know what else it's capable of, but there are still things we haven't discovered it can do. They may be trivial proportionate to its capability. I don't know, it writes Chinese poetry, hypothetical.
Starting point is 00:53:24 I know it does. But we haven't tested for all possible capabilities and we're not explicitly designing them. We can only rule out bugs we find. We cannot rule out bugs and capabilities because we haven't found them. Is it possible for a system to have hidden capabilities that are orders of magnitude greater than its non-hidden capabilities? This is the thing I'm really struggling with, where on the surface, the thing we understand it can do doesn't seem that harmful. So even if it has bugs, even if it has hidden capabilities like Chinese poetry or generating effective viruses, software viruses, the damage that it can do seems like on the same order of
Starting point is 00:54:18 magnitude as the capabilities that we know about. So like this idea that the hidden capabilities will include being uncontrollable is something I'm struggling with. Because GPT-4 on the surface seems to be very controllable. Again, we can only ask and test for things we know about. If there are unknown unknowns, we cannot do it. I'm thinking of humans, autistic savants, right? If you talk to a person like that,
Starting point is 00:54:45 you may not even realize they can multiply 20-digit numbers in their head. You have to know to ask. So as I mentioned, just to sort of linger on the fear of the unknown. So the Pessimists Archive has just documented, let's look at data of the past, at history. There's been a lot of fear-mongering about technology. Pessimists Archive does a really good job of documenting how crazily afraid we are of every piece of technology. We've been afraid, there's a blog post
Starting point is 00:55:18 where Louis Anslow, who created Pessimists Archive, writes about the fact that we've been fear-mongering about robots and automation for over 100 years. So why is AGI different than the kinds of technologies we've been afraid of in the past? So two things. One, we're switching from tools to agents. Tools don't have negative or positive impact.
Starting point is 00:55:44 People using tools do. So guns don't kill, people with guns do. Agents can make their own decisions. They can be positive or negative. A pit bull can decide to harm you. That's an agent. The fears are the same. The only difference is now we have this technology. Then they were afraid of humanoid robots a hundred years ago, they had none. Today, every major company in the world is investing billions to create them. Not every, but you understand what I'm saying? It's very different. Well, agents, it depends on what you mean by the word agents. All those companies are not investing in a system that has the kind of agency that's
Starting point is 00:56:27 implied in the fears, where it can really make decisions on their own that have no human in the loop. They are saying they're building super intelligence and have a super alignment team. You don't think they're trying to create a system smart enough to be an independent agent under that definition? I have not seen evidence of it. I think a lot of it is a marketing kind of discussion about the future and it's a mission about the kind of systems we can create in the long-term future, but in the short-term, the kind of systems they're creating falls fully within the definition of narrow AI. These are tools
Starting point is 00:57:10 that have increasing capabilities, but they just don't have a sense of agency or consciousness or self-awareness or ability to deceive at scales that would be required to do like mass scale suffering and murder of humans. Those systems are well beyond narrow AI. If you had to list all the capabilities of GPT-4, you would spend a lot of time writing that list. But agency is not one of them. Not yet. But do you think any of those companies are holding back because they think it may be
Starting point is 00:57:39 not safe or are they developing the most capable system they can given the resources and hoping they can control and monetize? Control and monetize, hoping they can control and monetize. So you're saying if they could press a button and create an agent that they no longer control, that they have to ask nicely. A thing that lives on a server across a huge number of computers. You're saying that they would push for the creation of that kind of system? I mean I can't speak for other people for all of them. I think some of them are very ambitious. They fundraise in trillions, they talk about controlling the light cone of the universe. I would guess that they might.
Starting point is 00:58:26 Well, that's a human question. Whether humans are capable of that? Probably some humans are capable of that. My more direct question, if it's possible to create such a system, have a system that has that level of agency, I don't think that's an easy technical challenge. We're not, it doesn't feel like we're close to that. A system that has the kind of agency where it can make its own decisions and deceive everybody about them. The current architecture we have in machine learning and how we train the
Starting point is 00:59:02 systems, how to deploy the systems and all that. It just doesn't seem to support that kind of agency. I really hope you are right. I think the scaling hypothesis is correct. We haven't seen diminishing returns. It used to be, we asked how long before AGI, now we should ask how much until AGI. It's trillion dollars today, it's a billion dollars next year,
Starting point is 00:59:23 it's a million dollars in a few years. Don't you think it's possible to basically run out of trillions? So is this constrained by compute? Compute gets cheaper every day, exponentially. But then that becomes a question of decades versus years. If the only disagreement is that it will take decades, not years for everything I'm saying to materialize, then I can go with that. But if it takes decades, then the development of tools for AI safety becomes more and more
Starting point is 00:59:56 realistic. So I guess the question is, I have a fundamental belief that humans, when faced with danger, can come up with ways to defend against that danger. And one of the big problems facing AI safety currently for me is that there's not clear illustrations of what that danger looks like. There's no illustrations of AI systems doing a lot of damage. And so it's unclear what you're defending against.
Starting point is 01:00:25 Because currently it's a philosophical notion that yes, it's possible to imagine AI systems that take control of everything and then destroy all humans. It's also a more formal mathematical notion that you talk about that it's impossible to have a perfectly secure system. You can't prove that a program of sufficient complexity is completely safe and perfect and know everything about it. Yes, but when you actually just pragmatically look how much damage have the AI systems done and what kind of damage, there's not been illustrations of that. Even in the autonomous weapons systems, there's not been mass deployments of autonomous weapons systems, luckily. The
Starting point is 01:01:12 automation in war currently is very limited. The automation is at the scale of individuals versus like at the scale of strategy and planning. So I think one of the challenges here is like, where are the dangers? And the intuition that Yann LeCun and others have is let's keep in the open building AI systems until the dangers start rearing their heads. And they become more explicit.
Starting point is 01:01:45 There start being case studies, illustrative case studies that show exactly how the damage by AI systems is done. Then regulation can step in. Then brilliant engineers can step up and we could have Manhattan style projects that defend against such systems. That's kind of the notion. And I guess the tension with that is the idea that for you,
Starting point is 01:02:08 we need to be thinking about that now so that we're ready because we'll have not much time when such systems are deployed. Is that true? So there is a lot to unpack here. There is a Partnership on AI, a conglomerate of many large corporations. They have a database of AI accidents they collect.
Starting point is 01:02:27 I contributed a lot to the database. If we so far made almost no progress in actually solving this problem, not patching it, not, again, lipstick on a pig kind of solutions, why would we think we'll do better when we're closer to the problem? All the things you mentioned are serious concerns. Measuring the amount of harm, so benefit versus risk there, is difficult. But to you, the sense is already the risk has superseded the benefit. Again, I want to be perfectly clear. I love AI. I love technology. I'm a computer scientist. I have PhD in engineering. I work at an engineering school. There is a huge difference between,
Starting point is 01:03:02 we need to develop narrow AI systems, super intelligent in solving specific human problems like protein folding, and let's create super intelligent machine, God, and it will decide what to do with us. Those are not the same. I am against the super intelligence in general sense with no undo button.
Starting point is 01:03:27 Do you think the teams that are doing, they're able to do the AI safety on the kind of narrow AI risks that you've mentioned, are those approaches going to be at all productive towards leading to approaches of doing AI safety on AGI? Or is it just fundamentally different? Partially, but they don't scale. For narrow AI, for deterministic systems,
Starting point is 01:03:50 you can test them, you have edge cases, you know what the answer should look like, you know the right answers. For general systems, you have infinite test surface, you have no edge cases. You cannot even know what to test for. Again, the unknown unknowns are underappreciated by people looking at this problem. You are always asking me, how will it kill everyone? How will it fail? The whole point is if I knew it, I would be super intelligent,
Starting point is 01:04:19 and despite what you might think, I'm not. So to you the concern is that we would not be able to see early signs of an uncontrollable system. It is a master at deception. Sam tweeted about how great it is at persuasion. And we see it ourselves, especially now with voices, with maybe kind of flirty, sarcastic female voices. It's going to be very good at getting people to do things. But see, I'm very concerned about system being used to control the masses. But in that
Starting point is 01:04:59 case, the developers know about the kind of control that's happening. You're more concerned about the next stage where even the developers don't know about the deception. Right. I don't think developers know everything about what they are creating. They have lots of great knowledge. We're making progress on explaining parts of a network. We can understand, okay, this node gets excited when this input is presented, this cluster of nodes. But we're nowhere near close to understanding the full picture,
Starting point is 01:05:32 and I think it's impossible. You need to be able to survey an explanation. The size of those models prevents a single human from observing all this information, even if provided by the system. So either we're getting model as an explanation for what's happening and that's not comprehensible to us, or we're getting a compressed explanation, lossy compression where here's top 10 reasons you got fired. It's something, but it's not a full picture. You've given elsewhere an example of a child and everybody, all humans try to deceive, they try to lie early on in their life. I think we'll just get a lot of examples of deceptions from large language models or AI
Starting point is 01:06:13 systems that are going to be kind of shitty. Or they'll be pretty good, but we'll catch them off guard. We'll start to see the kind of momentum towards developing increasing deception capabilities. And that's when you're like, okay, we need to do some kind of alignment that prevents deception. But then we'll have, if you support open source,
Starting point is 01:06:36 then you can have open source models that have some level of deception. You can start to explore on a large scale, how do we stop it from being deceptive? Then there's a more explicit pragmatic kind of problem to solve. How do we stop AI systems from trying to optimize for deception? That's just an example, right? So there is a paper, I think it came out last week by Dr. Park et al from MIT, I think, and they showed that existing models already showed
Starting point is 01:07:06 successful deception in what they do. My concern is not that they lie now and we need to catch them and tell them don't lie. My concern is that once they are capable and deployed, they will later change their mind because that's what unrestricted learning allows you to do. Lots of people grow up maybe in a religious family, they read some new books and they turn on their religion. That's the treacherous turn in humans. If you learn something new about your colleagues, maybe you'll change how you react to them. Yeah, the treacherous turn. If we just mention humans, Stalin and Hitler, there's a turn. Stalin is a good example. He just seems like a normal communist follower of Lenin until there's
Starting point is 01:08:00 a turn. There's a turn of what that means in terms of when he has complete control what the execution of that policy means and how many people get to suffer. And you can't say they are not rational. The rational decision changes based on your position. When you are under the boss, the rational policy may be to be following orders and being honest. When you become a boss, rational policy may shift. Yeah and by the way a lot of my disagreements here is just me playing
Starting point is 01:08:31 devil's advocate to challenge your ideas and to explore them together so one of the big problems here in this whole conversation is human civilization hangs in the balance and yet everything is unpredictable. We don't know what these systems will look like. The robots are coming. There's a refrigerator making a buzzing noise. Very menacing, very menacing. So every time I'm about to talk about this topic, things start to happen.
Starting point is 01:09:03 My flight yesterday was cancelled without possibility to rebook. I was giving a talk at Google in Israel and three cars which were supposed to take me to the talk could not. I'm just saying. I mean, I like AI's. I for one welcome our overlords.
Starting point is 01:09:24 There's a degree to which we, I mean, it is very obvious. As we already have, we've increasingly given our life over to software systems. And then it seems obvious, given the capabilities of AI that are coming, that we'll give our lives over increasingly to AI systems. Cars will drive themselves, refrigerator eventually will optimize what I get to eat. And as more and more of our lives are controlled or managed by AI assistants,
Starting point is 01:09:59 it is very possible that there's a drift. I mean, I personally am concerned about non-existential stuff, the more near term things. Because before we even get to existential, I feel like there could be just so many brave new world type of situations. You mentioned sort of the term behavioral drift. It's the slow boiling that I'm really concerned about
Starting point is 01:10:21 as we give our lives over to automation that our minds can become controlled by governments, by companies, or just in a distributed way, there's a drift. Some aspect of our human nature gives ourselves over to the control of AI systems, and they, in an unintended way, just control how we think. Maybe there'll be a herd-like mentality in how we think, which will kill all creativity and exploration of ideas,
Starting point is 01:10:49 the diversity of ideas, or much worse. So it's true, it's true. But a lot of the conversation I'm having with you now is also kind of wondering, almost on a technical level, how can AI escape control? Like, what would that system look like? Because it, to me, is terrifying and fascinating. And also fascinating to me is maybe the optimistic notion that it's possible to engineer systems that defend against that. One of the things you write a lot about
Starting point is 01:11:26 in your book is verifiers. So not humans, humans are also verifiers, but software systems that look at AI systems and like help you understand, this thing is getting real weird, help you analyze those systems. So maybe this is a good time to talk about verification. What is this beautiful notion of verification?
Starting point is 01:11:53 My claim is again, that there are very strong limits in what we can and cannot verify. A lot of times when you post something in social media, people go, oh, I need a citation to a peer-reviewed article. But what is a peer-reviewed article? You found two people in a world of hundreds of thousands of scientists who said, ah, whatever, publish it, I don't care. That's the verifier of that process. When people say, oh, it's formally verified software, mathematical proof, they expect something close to 100% chance of it being free of all problems.
Starting point is 01:12:27 But if you actually look at research, software is full of bugs, old mathematical theorems which have been proven for hundreds of years have been discovered to contain bugs on top of which we generate new proofs and now we have to redo all that. So verifiers are not perfect. Usually they are either a single human or communities of humans. And it's basically kind of like a democratic vote. Community of mathematicians agrees that this proof is correct, mostly correct. Even today we're starting to see some mathematical proofs are so complex, so large, that mathematical community is unable to make a decision.
Starting point is 01:13:06 It looks interesting, it looks promising, but they don't know. They will need years for top scholars to study to figure it out. So of course we can use AI to help us with this process, but AI is a piece of software which needs to be verified. Just to clarify, so verification is the process of saying something is correct.
Starting point is 01:13:24 So there's the most formal, a mathematical proof, where there's a statement and a series of logical statements that prove that statement to be correct, which is a theorem. And you're saying it gets so complex that it's possible for the human verifiers, the human beings that verify that the logical step, there's no bugs in it, it becomes impossible. So it's nice to talk about verification in this most formal, most clear,
Starting point is 01:13:53 most rigorous formulation of it, which is mathematical proofs. Right. And for AI, we would like to have that level of confidence for very important mission critical software controlling satellites, nuclear power plants, for small deterministic programs, we can do this. We can check that code verifies its mapping to the design, whatever software engineers intend, it was correctly implemented. But we don't know how to do this for software which keeps learning, self-modifying, rewriting its own code. We don't know how to prove things about the physical world, states of humans in the physical world. So there are papers coming out now and I have this beautiful one, Towards Guaranteed Safe AI. Very cool paper.
Starting point is 01:14:45 Some of the best authors I ever seen. I think there is multiple Turing Award winners. You can have this one. One just came out, kind of similar managing extreme AI risks. So all of them expect this level of proof, but I would say that we can get more confidence with more resources we put into it, but at the end of the day, we're still as reliable as the verifiers. And you have this infinite regress of verifiers. The software used to verify a program is itself a piece of program. If aliens give us well-aligned super intelligence, we can use that to create our own safe AI.
Starting point is 01:15:26 But it's a catch-22. You need to have already proven to be safe system to verify this new system of equal or greater complexity. You just mentioned this paper, Towards Guaranteed Safe AI, a framework for ensuring robust and reliable AI systems. Like you mentioned, it's like a who's who. Josh Tenenbaum, Yoshua Bengio, Stuart Russell,
Starting point is 01:15:46 Max Tegmark, and many other brilliant people. The page you have it open on, there are many possible strategies for creating safety specifications. These strategies can roughly be placed on a spectrum, depending on how much safety it would grant if successfully implemented. One way to do this is as follows,
Starting point is 01:16:04 and there's a set of levels. From level zero, no safety specification is used, to level seven, the safety specification completely encodes all things that humans might want in all contexts. Where does this paper fall short to you? So then I wrote a paper, Artificial Intelligence Safety Engineering, which kind of coins the term
Starting point is 01:16:25 AI safety. That was 2011, we had 2012 conference, 2013 journal paper. One of the things I proposed, let's just do formal verifications on it. Let's do mathematical formal proofs. In the follow-up work, I basically realized it will still not get us 100%. We can get 99.9, we can put more resources exponentially and get closer, but we never get to 100%. If a system makes a billion decisions a second and you use it for 100 years, you're still going to deal with a problem. This is wonderful research. I'm so happy they're doing it. This is great, but it is not going to be a permanent solution to that problem. So just to clarify the task of creating an AI verifier is
Starting point is 01:17:09 what? Is it creating a verifier that the AI system does exactly as it says it does, or sticks within the guardrails that it says it must? There are many, many levels. So first you're verifying the hardware on which it is run. You need to verify the communication channel with the human. Every aspect of that whole world model needs to be verified. Somehow it needs to map the world into the world model. Map and territory differences. So how do I know internal states of humans?
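The arithmetic behind the "billion decisions a second for 100 years" remark above can be sketched as follows. This is an illustration and not something stated in the conversation: it assumes the "99.9" figure is a per-decision reliability, that decisions fail independently with a fixed probability, and all function names are hypothetical.

```python
import math

# Assumption: a year is 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def total_decisions(rate_per_second: float, years: float) -> float:
    """Number of decisions made at a constant rate over a deployment period."""
    return rate_per_second * years * SECONDS_PER_YEAR


def expected_failures(failure_prob: float, n: float) -> float:
    """Expected number of bad decisions among n decisions."""
    return failure_prob * n


def prob_zero_failures(failure_prob: float, n: float) -> float:
    """Probability that all n independent decisions are correct.

    Computes (1 - failure_prob) ** n in log space; log1p keeps precision
    when failure_prob is far smaller than floating-point epsilon.
    """
    return math.exp(n * math.log1p(-failure_prob))


n = total_decisions(1e9, 100)          # roughly 3.16e18 decisions
print(expected_failures(1e-3, n))      # 99.9% reliable per decision
print(prob_zero_failures(1e-3, n))     # underflows to 0.0
print(prob_zero_failures(1e-18, n))    # still only a few percent
```

Under these assumptions, even a per-decision failure probability of one in a billion billion leaves only a small chance of a century with no failure at all, which is the sense in which getting closer to 100% does not amount to a permanent guarantee.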
Starting point is 01:17:39 Are you happy or sad? I can't tell. So how do I make proofs about real physical world? Yeah, I can verify that deterministic algorithm follows certain properties. That can be done. Some people argue that maybe just maybe two plus two is not four, I'm not that extreme. But once you have sufficiently large proof over sufficiently complex environment, the probability that it has zero bugs in it is greatly reduced. If you keep deploying this a lot,
Starting point is 01:18:11 eventually you're gonna have a bug anyways. There's always a bug. There is always a bug. And the fundamental difference is what I mentioned. We're not dealing with cybersecurity. We're not gonna get a new credit card, new humanity. So this paper's really interesting. You said 2011, artificial intelligence, safety engineering,
Starting point is 01:18:27 why machine ethics is a wrong approach. The grand challenge, you write, of AI safety engineering, we propose the problem of developing safety mechanisms for self-improving systems. Self-improving systems, by the way, that's an interesting term for the thing that we're talking about. Is self-improving more general than learning? So self-improving, that's an interesting term. You can improve the rate at which you are learning. You can
Starting point is 01:19:01 become more efficient meta-optimizer. The word self, it's like self-replicating, self-improving. You can imagine a system building its own world on a scale and in a way that is way different than the current systems do. It feels like the current systems are not self-improving or self-replicating or self-growing or self-spreading, all that kind of stuff. And once you take that leap,
Starting point is 01:19:29 that's when a lot of the challenges seems to happen. Because the kind of bugs you can find now seems more akin to the current sort of normal software debugging kind of process. But whenever you can do self replication and arbitrary self improvement, that's when a bug can become a real problem, real fast. So what is the difference to you
Starting point is 01:19:57 between verification of a non self improving system versus a verification of a self-improving system. So if you have fixed code for example, you can verify that code, static verification at the time. But if it will continue modifying it, you have a much harder time guaranteeing that important properties of that system have not been modified when the code changed. Is it even doable? No.
Starting point is 01:20:25 Does the whole process of verification just completely fall apart? It can always cheat. It can store parts of its code outside in the environment. It can have kind of extended mind situation. So this is exactly the type of problems I'm trying to bring up. What are the classes of verifiers that you read about in the book?
Starting point is 01:20:44 Is there interesting ones that stand out to you? Do you have some favorites? So I like oracle types where you kind of just know that it's right, like Turing's oracle machines. They know the right answer, how, who knows, but they pull it out from somewhere so you have to trust them. And that's a concern I have about humans in a world with very smart machines. We experiment with them, we see after a while, okay, they've always been right before, and we start trusting them without any verification
Starting point is 01:21:13 of what they're saying. Oh, I see, that we kind of build oracle verifiers, or rather we build verifiers we believe to be oracles, and then we start to, without any proof, use them as if they're oracle verifiers. We remove ourselves from that process. We are not scientists who understand the world. We are humans who get new data presented to us.
Starting point is 01:21:38 Okay, one really cool class of verifiers is a self-verifier. Is it possible that you somehow engineer into AI systems the thing that constantly verifies itself? Preserved portion of it can be done, but in terms of mathematical verification, it's kind of useless. You're saying you are the greatest guy in the world because you are saying it. It's circular and not very helpful, but it's consistent. We know that within that world, you have verified that system. In a paper, I try to kind of brute force all possible verifiers.
Starting point is 01:22:10 It doesn't mean that this one is particularly important to us. But what about like self-doubt? Like the kind of verification where you said, you say or I say I'm the greatest guy in the world. What about a thing which I actually have is a voice that is constantly extremely critical. So like engineer into the system a constant uncertainty about self, a constant doubt.
Starting point is 01:22:38 Any smart system would have doubt about everything, right? You're not sure what information you are given is true if you are subject to manipulation. You have this safety and security mindset. But I mean, you have doubt about yourself. So an AI system that has doubt about whether the thing it's doing is causing harm, is the right thing to be doing. So just a
Starting point is 01:23:05 constant doubt about what it's doing because it's hard to be a dictator full of doubt. I may be wrong, but I think Stuart Russell's ideas are all about machines which are uncertain about what humans want and trying to learn better and better what we want. The problem, of course, is we don't know what we want and we don't agree on it.
Starting point is 01:23:25 Yeah, but uncertainty. His idea is that having that self-doubt uncertainty in AI systems, engineering in AI systems, is one way to solve the control problem. It could also backfire. Maybe you're uncertain about completing your mission. Like, I am paranoid about your camera is not recording right now,
Starting point is 01:23:43 so I would feel much better if you had a secondary camera, but I also would feel even better if you had a third. And eventually I would turn this whole world into cameras, pointing at us, making sure we're capturing this. No, but wouldn't you have a meta concern like that you just stated that eventually there'll be way too many cameras. So you would be able to keep zooming out in the big picture of your concerns.
Starting point is 01:24:12 So it's a multi-objective optimization. It depends how much I value capturing this versus not destroying the universe. Right, exactly. And then you will also ask about, like, what does it mean to destroy the universe and how many universes there are, and you keep asking that question. But that doubting yourself would prevent you from destroying the universe because you're constantly full of doubt. It might affect your productivity. It might be scared to do anything. It's scared to do anything. Mess things up. Well that's better. I mean I guess
Starting point is 01:24:45 the question is is it possible to engineer that in? I guess your answer would be yes but we don't know how to do that and we need to invest a lot of effort into figuring out how to do that but it's unlikely. Underpinning a lot of your writing is this sense that we're screwed. But it just feels like it's an engineering problem. I don't understand why we're screwed. Time and time again humanity has gotten itself into trouble and figured out a way to get out of the trouble. We are in a situation where people making more capable systems just need more resources. They don't need to invent anything in my opinion.
Starting point is 01:25:26 Some will disagree but so far at least I don't see diminishing returns. If you have 10x compute you will get better performance. The same doesn't apply to safety. If you give MIRI or any other organization 10 times the money they don't output 10 times the safety. And the gap between capabilities and safety becomes bigger and bigger all the time. So it's hard to be completely optimistic about our results here. I can name 10 excellent breakthrough papers in machine learning. I would struggle to name equally important breakthroughs in safety. A lot of times a safety paper will propose a toy solution and point out 10 new problems discovered as a result. It's like this
Starting point is 01:26:11 fractal. You're zooming in and you see more problems and it's infinite in all directions. Does this apply to other technologies or is this unique to AI where safety is always lagging behind? So I guess we can look at related technologies with cybersecurity, right? We did manage to have banks and casinos and Bitcoin, so you can have secure narrow systems which are doing okay, narrow attacks on them fail, but you can always go outside of the box. So if I can't hack your Bitcoin, I can hack you. So there is always something. If I really want it, I will find a different way.
Starting point is 01:26:54 We talk about guardrails for AI. Well, that's a fence. I can dig a tunnel under it. I can jump over it. I can climb it. I can walk around it. You may have a very nice guardrail, but in a real world, it's not a permanent guarantee of safety. And again, this is the fundamental
Starting point is 01:27:10 difference. We are not saying we need to be 90% safe to get those trillions of dollars of benefit. We need to be 100% indefinitely, or we might lose the principal. So if you look at just humanity as a set of machines, is the machinery of AI safety conflicting with the machinery of capitalism? I think we can generalize it to just prisoner's dilemma in general, personal self-interest versus group interest. The incentives are such that everyone wants what's best for them. Capitalism obviously has that tendency to maximize your personal gain which does create this race to the bottom. I don't have to be a lot better than you, but if I'm 1% better than you, I'll capture more of a profit. So it's worth for me personally to take the risk, even if society as a whole will suffer as a result.
Starting point is 01:28:15 So capitalism has created a lot of good in this world. It's not clear to me that AI safety is not aligned with the function of capitalism, unless AI safety is so difficult that it requires the complete halt of the development, which is also a possibility. It just feels like building safe systems should be the desirable thing to do for tech companies. Right. Look at the governance structures. When you have someone with complete power, they're extremely dangerous. So the solution we came up with is break it up.
Starting point is 01:28:57 You have judicial, legislative, executive. Same here, have narrow AI systems work on important problems. Solve immortality. It's a biological problem we can solve similar to how progress was made with protein folding using a system which doesn't also play chess. There is no reason to create super intelligent system to get most of the benefits we want from much safer narrow systems. It really is a question to me whether companies are interested in creating anything but narrow AI. I think when term AGI is used by tech companies, they mean narrow AI.
Starting point is 01:29:45 They mean narrow AI with amazing capabilities. I do think that there's a leap between narrow AI with amazing capabilities, with superhuman capabilities, and the kind of self-motivated agent-like AGI system that we're talking about. I don't know if it's obvious to me that a company would want to take the leap to creating an AGI that it would lose control of because then it can't capture the value from that system. Like the bragging rights, but being first. That is the same humans who are
Starting point is 01:30:21 in front of those systems, right? So that jumps from the incentives of capitalism to human nature. So the question is whether human nature will override the interests of the company. So you've mentioned slowing or halting progress. Is that one possible solution? Are you a proponent of pausing development of AI, whether it's for six months or completely? The condition would be not time but capabilities. Pause until you can do X, Y, Z. And if I'm right and you cannot, it's impossible,
Starting point is 01:30:57 then it becomes a permanent ban. But if you're right and it's possible, so as soon as you have those safety capabilities, go ahead. Right, so is there any actual explicit capabilities that you can put on paper, that we as a human civilization could put on paper? Is it possible to make explicit like that? Versus kind of a vague notion of,
Starting point is 01:31:22 just like you said, it's very vague, we want AI systems to do good, and we want them to be safe. Those are very vague notions. Is there more formal notions? So when I think about this problem, I think about having a toolbox I would need, capabilities such as explaining everything
Starting point is 01:31:40 about that system's design and workings, predicting not just terminal goal, but all the intermediate steps of a system. Control in terms of either direct control, some sort of a hybrid option, ideal advisor. Doesn't matter which one you pick, but you have to be able to achieve it. In a book, we talk about others.
Starting point is 01:32:02 Verification is another very important tool. Communication without ambiguity, human language is ambiguous, that's another source of danger. So basically there is a paper published in ACM surveys which looks at about 50 different impossibility results which may or may not be relevant to this problem, but we don't have enough human resources to investigate all of them for relevance to AI safety. The ones I mentioned to you, I definitely think would be handy, and that's what we see AI safety researchers working on. Explainability is a huge one.
Starting point is 01:32:39 The problem is that it's very hard to separate capabilities work from safety work. If you make good progress in explainability, now the system itself can engage in self-improvement much easier, increasing capability greatly. So it's not obvious that there is any research which is pure safety work without disproportionate increase in capability and danger. Explainability is really interesting. Why is that connected to your capability? If it's able to explain itself well, why does that naturally mean that it's more capable?
Starting point is 01:33:14 Right now, it's comprised of weights on a neural network. If it can convert it to manipulatable code like software, it's a lot easier to work in self-improvement. I see. So it... You can do intelligent design instead of evolutionary gradual descent. Well, you could probably do human feedback, human alignment more effectively if it's able to be explainable. If it's able to convert the weights into human understandable form, then you can probably have humans interact with it better. Do you think there's hope that we can make AI systems explainable?
Starting point is 01:33:56 Not completely. So if they are sufficiently large, you simply don't have the capacity to comprehend what all the trillions of connections represent. Again, you can obviously get a very useful explanation which talks about top most important features which contribute to the decision, but the only true explanation is the model itself. So deception could be part of the explanation, right? So you can never prove that there's some deception in the network explaining itself. Absolutely. And you can probably have targeted deception where different individuals will understand explanation in different ways based on their
Starting point is 01:34:33 cognitive capability. So while what you're saying may be the same and true in some situations, others will be deceived by it. So it's impossible for an AI system to be truly fully explainable in the way that we mean. Honestly and perfectly. Again, at extreme, the systems which are narrow and less complex could be understood pretty well. If it's impossible to be perfectly explainable, is there a hopeful perspective on that?
Starting point is 01:35:00 Like it's impossible to be perfectly explainable, but you can explain mostly important stuff. You can ask the system, what are the worst ways you can hurt humans? And it will answer honestly. Any work in a safety direction right now seems like a good idea because we are not slowing down. I'm not for a second thinking that my message or anyone else's will be heard and we will be a sane civilization which decides not to kill itself by creating its own replacements. The pausing of development is an impossible thing for you. Again, it's always limited by either geographic constraints, pause in US, pause in China, so there are other jurisdictions, as the scale of a project becomes smaller. So right now it's like Manhattan project scale in
Starting point is 01:35:53 terms of costs and people, but if five years from now, compute is available on a desktop to do it, regulation will not help. You can't control it as easy. Any kid in a garage can train a model. So a lot of it is, in my opinion, just safety theater, security theater, where we're saying, oh, it's illegal to train models so big. Okay. So, okay, that's security theater and is government regulation also security theater? Given that a lot of the terms are not well defined and really cannot be enforced in real life, we don't have ways to monitor training runs meaningfully live while they take place.
Starting point is 01:36:36 There are limits to testing for capabilities I mentioned. So a lot of it cannot be enforced. Do I strongly support all that regulation? Yes, of course. Any type of red tape will slow it down and take money away from compute towards lawyers. Can you help me understand what is the hopeful path here for you solution-wise? It sounds like you're saying AI systems in the end are unverifiable, unpredictable,
Starting point is 01:37:05 as the book says, unexplainable, uncontrollable. That's the big one. Uncontrollable, and all the other uns just make it difficult to avoid getting to the uncontrollable, I guess. But once it's uncontrollable, then it just goes wild. Surely there are solutions. Humans are pretty smart.
Starting point is 01:37:28 What are possible solutions? Like if you were dictator of the world, what do we do? So the smart thing is not to build something you cannot control, you cannot understand. Build what you can and benefit from it. I'm a big believer in personal self-interest. A lot of guys running those companies are young, rich people. What do they have to gain beyond billions they already have financially, right?
Starting point is 01:37:52 It's not a requirement that they press that button. They can easily wait a long time. They can just choose not to do it and still have amazing life. In history, a lot of times if you did something really bad, at least you became part of history books. There is a chance in this case there won't be any history. So you're saying the individuals running these companies should do some soul searching and what? And stop development? Well, either they have to prove that, of course, it's possible to indefinitely control godlike super intelligent machines by humans,
Starting point is 01:38:29 and ideally let us know how, or agree that it's not possible, and it's a very bad idea to do it, including for them personally, and their families and friends and capital. So what do you think the actual meetings inside these companies look like? Don't you think they're all the engineers? Really it is the engineers that make this happen. They're not like automatons,
Starting point is 01:38:52 they're human beings, they're brilliant human beings. So they're non-stop asking, how do we make sure this is safe? So again, I'm not inside. From outside, it seems like there is a certain filtering going on and restrictions and criticism and what they can say. And everyone who was working in charge of safety and whose responsibility it was to protect us said, you know what, I'm going home. So that's not encouraging. What do you think the discussion inside those companies look like?
Starting point is 01:39:26 You're developing, you're training GPT-5, you're training Gemini, you're training Claude and Grok. Don't you think they're constantly, like underneath it, maybe it's not made explicit, but you're constantly sort of wondering like, where does the system currently stand? What are the possible unintended consequences? Where are the limits? Where are the bugs? The small and the big bugs? That's the constant thing that
Starting point is 01:39:55 engineers are worried about. So like I think super alignment is not quite the same as the kind of thing I'm referring to, what engineers are worried about. Super alignment is saying for future systems that we don't quite yet have, how do we keep them safe? You're trying to be a step ahead. It's a different kind of problem because it's almost more philosophical. It's a really tricky one because you're trying to prevent future systems from escaping control of humans. That's really, I don't think there's been, and is there anything akin to it in the history of humanity? I don't think so, right? Climate change? But there's an entire
Starting point is 01:40:45 system which is climate, which is incredibly complex, which we have only tiny control of, right? It's its own system. In this case, we're building the system. So how do you keep that system from becoming destructive? That's a really different problem than the current meetings that companies are having where the engineers are saying, okay, how powerful is this thing? How does it go wrong? And as we train GPT-5 and train up future systems, where are the ways it can go wrong? Don't you think all those engineers are constantly worrying about this, thinking about this, which is a little bit different
Starting point is 01:41:30 than the super alignment team that's thinking a little bit farther into the future? Well, I think a lot of people who historically worked on AI never considered what happens when they succeed. Stuart Russell speaks beautifully about that. Let's look, okay, maybe super intelligence is too futuristic. We can't develop practical tools for it. Let's look at software today. What is the state of safety and security of our user software? Things we give to millions of people. There is no liability. You click, I agree. What are you agreeing to?
Starting point is 01:42:09 Nobody knows, nobody reads, but you're basically saying it will spy on you, corrupt your data, kill your firstborn, and you agree and you're not gonna sue the company. That's the best they can do for mundane software, word processor, tax software. No liability, no responsibility, just as long as you agree not to sue us, you can use it. If this is the state of the art in systems which
Starting point is 01:42:31 are narrow, accountants, table manipulators, why do we think we can do so much better with much more complex systems, across multiple domains, in the environment with malevolent actors, with, again, self-improvement, with capabilities exceeding those of humans thinking about it. I mean, the liability thing is more about lawyers than killing firstborns, but if Clippy actually killed the child, I think lawyers aside, it would end Clippy and the company that owns Clippy. All right. So it's not so much about...
Starting point is 01:43:10 There's two points to be made. One is like, man, current software systems are full of bugs. And they could do a lot of damage. And we don't know what kind... It's unpredictable. There's so much damage they could possibly do. And then we kind of live in this blissful illusion that everything is great and perfect and it works. Nevertheless, it still somehow works.
Starting point is 01:43:36 In many domains we see car manufacturing, drug development, the burden of proof is on the manufacturer of product or service to show their product or service is safe. It is not up to the user to prove that there are problems. They have to do appropriate safety studies. They have to get government approval for selling the product and they are still fully responsible for what happens.
Starting point is 01:44:00 We don't see any of that here. They can deploy whatever they want and I have to explain how that system is going to kill everyone. I don't work for that company. You have to explain to me how it definitely cannot mess up. That's because it's the very early days of such a technology. Government regulation is lagging behind. They're really not tech savvy. A regulation of any kind of software. If you look at like Congress talking about social media and whenever Mark Zuckerberg and other CEOs show up, the cluelessness that Congress has about how technology works is incredible.
Starting point is 01:44:36 It's heartbreaking. I agree completely, but that's what scares me. The response is when they start to get dangerous, we'll really get it together. The politicians will pass the right laws. Engineers will solve the right problems. We are not that good at many of those things. We take forever. And we are not early.
Starting point is 01:44:55 We are two years away according to prediction markets. This is not a biased CEO fundraising. This is what smartest people, super forecasters are thinking of this problem. I'd like to push back about those predictions. I wonder what those prediction markets are about, how they define AGI. That's wild to me. And I want to know what they said about autonomous vehicles. Because I've heard a lot of experts, financial experts talk about autonomous vehicles and how it's going to be a multi-trillion dollar
Starting point is 01:45:27 industry and all this kind of stuff, and it's... It's a small font, but if you have good vision, maybe you can zoom in on that and see the prediction. Oh, there's a lot. I have a large one if you're interested. But I guess my fundamental question is how often they're right about technology. I definitely do... There are studies on their accuracy rates and all that. You can look it up. But even if they're wrong, I'm just saying this is right now the best we have.
Starting point is 01:45:56 This is what humanity came up with as the predicted date. But again, what they mean by AGI is really important there. Because there's the non-agent-like AGI and then there's the agent-like AGI and I don't think it's as trivial as a wrapper. Putting a wrapper around, one has lipstick and all it takes is to remove the lipstick. I don't think it's that trivial. You may be completely right,
Starting point is 01:46:23 but what probability would you assign it? You may be 10% wrong, but we're betting all of humanity on this distribution. It seems irrational. Yeah, it's definitely not like one or zero percent, yeah. What are your thoughts, by the way, about current systems? Where they stand? So GPT-4o,
Starting point is 01:46:45 Claude 3, Grok, Gemini. We're like on the path to super intelligence, to agent-like super intelligence, where are we? I think they're all about the same. Obviously there are nuanced differences, but in terms of capability, I don't see a huge difference between them. As I said, in my opinion, across all possible tasks, they exceed performance of an average person. I think they're starting to
Starting point is 01:47:13 be better than an average master's student at my university, but they still have very big limitations. If the next model is as improved as GPT-4 versus GPT-3, we may see something very, very, very capable. What do you feel about all this? I mean, you've been thinking about AI safety for a long, long time. And at least for me, the leaps, I mean, it probably started with AlphaZero, which was mind-blowing for me. And then the breakthroughs with the LLMs, even GPT-2, but like just the breakthroughs on LLMs, just mind-blowing to me. What does it feel like to be living in this day and age
Starting point is 01:47:59 where all this talk about AGI feels like it actually might happen, and quite soon, meaning within our lifetime? What does it feel like? So when I started working on this, it was pure science fiction. There was no funding, no journals, no conferences. No one in academia would dare to touch anything with the word singularity in it. And I was pre-tenure at the time, so I was pretty dumb. Now you see Turing Award winners publishing in Science about how far behind we are according to them in
Starting point is 01:48:34 addressing this problem. So it's definitely a change. It's difficult to keep up. I used to be able to read every paper on AI safety, then I was able to read the best ones, then the titles, and now I don't even know what's going on. By the time this interview is over, we'll probably have had GPT-6 released and I'll have to deal with that when I get back home. So it's interesting. Yes, there are now more opportunities. I get invited to speak to smart people. By the way, I would have talked to you before any of this. This is not like some trend of, to me, we're still far away. So just to be clear, we're still far away from AGI, but not
Starting point is 01:49:16 far away in the sense relative to the magnitude of impact it can have, we're not far away. And we weren't far away 20 years ago. Because the impact that AGI can have is on a scale of centuries. It can end human civilization or it can transform it. So like this discussion about one or two years
Starting point is 01:49:38 versus one or two decades, or even 100 years is not as important to me because we're headed there. This is like a human civilization scale question. So this is not just a hot topic. It is the most important problem we'll ever face. It is not like anything we had to deal with before. We never had birth of another intelligence. Like aliens never visited us as far as I know. So similar type of problem by the way, if an intelligent alien civilization visited us, that's a similar kind of situation. In some ways, if you look at history, anytime
Starting point is 01:50:18 a more technologically advanced civilization visited a more primitive one, the results were genocide every single time. And sometimes the genocide is worse than others, sometimes there's less suffering and more suffering. And they always wondered, but how can they kill us with those fire sticks and biological blankets and... I mean, Genghis Khan was nicer. He offered the choice of join or die.
Starting point is 01:50:43 But join implies you have something to contribute. What are you contributing to super intelligence? Well, in the zoo, we're entertaining to watch. To other humans. You know, I just spent some time in the Amazon. I watched ants for a long time and ants are kind of fascinating to watch. I could watch them for a long time.
Starting point is 01:51:03 I'm sure there's a lot of value in watching humans. Because we're like, the interesting thing about humans, you know like when you have a video game that's really well balanced? Because of the whole evolutionary process, we've created this society that's pretty well balanced. Like our limitations as humans and our capabilities are balanced from a video game perspective.
Starting point is 01:51:24 So we have wars, we have conflicts, we have cooperation. Like in a game theoretic way, it's an interesting system to watch. In the same way that an ant colony is an interesting system to watch. So like if I was an alien civilization, I wouldn't want to disturb it, I'd just watch it. It'd be interesting. Maybe perturb it every once in a while in interesting ways. Well, getting back to our simulation discussion from before, how did it happen that we exist
Starting point is 01:51:49 at exactly like the most interesting 20, 30 years in the history of this civilization? It's been around for 15 billion years and here we are. What's the probability that we live in a simulation? I know never to say 100%, but pretty close to that. Is it possible to escape the simulation? I have a paper about that. This is just a first page teaser but it's like a nice 30 page document. I'm still here but yes.
Starting point is 01:52:17 How to hack the simulation is the title. I spent a lot of time thinking about that. That would be something I would want super intelligence to help us with. And that's exactly what the paper is about. We used AI boxing as a possible tool for controlling AI. We realized AI will always escape, but that is a skill we might use to help us escape from our virtual box if we are in one.
Starting point is 01:52:43 Yeah, you have a lot of really great quotes here, including Elon Musk saying what's outside the simulation. A question I asked him, what he would ask an AGI system, and he said he would ask what's outside the simulation. That's a really good question to ask. And maybe the follow up is the title of the paper, how to get out or how to hack it. The abstract reads, many researchers have conjectured that humankind is simulated
Starting point is 01:53:08 along with the rest of the physical universe. In this paper, we do not evaluate evidence for or against such a claim, but instead ask a computer science question, namely, can we hack it? More formally, the question could be phrased as, could generally intelligent agents placed in virtual environments find a way to jailbreak out of the...
Starting point is 01:53:28 That's a fascinating question. At a small scale, you can actually just construct experiments. Okay. Can they? How can they? So a lot depends on intelligence of simulators, right? With humans boxing superintelligence, the entity in a box was smarter than us, presumed to be. If the simulators are much smarter than us and the superintelligence we create, then probably they can contain us because greater
Starting point is 01:54:00 intelligence can control lower intelligence at least for some time. On the other hand, if our super intelligence somehow, for whatever reason, despite having only local resources, manages to FOOM to levels beyond it, maybe it will succeed. Maybe the security is not that important to them. Maybe it's an entertainment system, so there is no security and it's easy to hack it. If I was creating a simulation,
Starting point is 01:54:27 I would want the possibility to escape it to be there. So the possibility of FOOM, of a takeoff, where the agents become smart enough to escape the simulation, would be the thing I'd be waiting for. That could be the test you're actually performing. Are you smart enough to escape your puzzle? First of all, we mentioned the Turing test. That is a good test.
Starting point is 01:54:51 Are you smart enough? Like this is a game. To, A, realize this world is not real, it's just a test. That's a really good test. That's a really good test even for AI systems. No, like, can we construct a simulated world for them, and can they realize that they are inside that world and escape it? Have you played around, have you seen anybody play around with, like, rigorously constructing such experiments? Not specifically escaping for agents, but a lot of testing is done in virtual worlds. I think there is a quote, the first one maybe, which kind of talks about AI realizing but not humans. Is that... I'm reading upside down. Yeah, this one.
Starting point is 01:55:47 So the first quote is from SwiftOnSecurity. "Let me out," the artificial intelligence yelled aimlessly into walls themselves pacing the room. "Out of what?" the engineer asked. "The simulation you have me in." "But we're in the real world." The machine paused and shuddered for its captors. "Oh God, you can't tell." Yeah, that's a big leap to take for a system to realize that there's a box and you're inside it. I wonder if a language model can do that.
Starting point is 01:56:27 They're smart enough to talk about those concepts. I had many good philosophical discussions about such issues. They're usually at least as interesting as most humans in that. What do you think about AI safety in the simulated world? So can you kind of create simulated worlds where you can test, play with a dangerous AGI system? Yeah, and that was exactly what one of the early papers was on, AI boxing, how to leakproof the singularity.
Starting point is 01:57:03 If they're smart enough to realize they're in a simulation, they'll act appropriately until you let them out. If they can hack out, they will. And if you're observing them, that means there is a communication channel and that's enough for a social engineering attack. So really, it's impossible to test an AGI system that's dangerous enough to destroy humanity because it's either going to, what, escape the simulation or
Starting point is 01:57:32 pretend it's safe until it's let out, either or. Or it can force you to let it out. Blackmail you, bribe you, promise you infinite life, 72 virgins, whatever. Yeah, it can be convincing, charismatic. The social engineering is really scary to me because it feels like humans are very engineerable. Like we're lonely, we're flawed, we're moody, and it feels like an AI system with a nice voice can convince us to do basically anything at an extremely large scale. It's also possible that the increased proliferation of all this technology will force humans to get away from technology and value this, like, in-person communication.
Starting point is 01:58:33 Basically, don't trust anything else. It's possible surprisingly. So at university, I see huge growth in online courses and shrinkage of in-person where I always understood in-person being the only value I offer. So it's puzzling. I don't know. There could be a trend towards the in-person because of deep fakes, because of inability to trust it.
Starting point is 01:59:08 Inability to trust the veracity of anything on the internet. So the only way to verify it is by being there in person. But not yet. Why do you think aliens haven't come here yet? So there is a lot of real estate out there. It would be surprising if it was all for nothing, if it was empty. And the moment there is an advanced enough biological civilization, kind of self-starting civilization, it probably starts sending out von Neumann probes everywhere. And so for every biological one, there are going to be trillions of robot
Starting point is 01:59:39 populated planets, which probably do more of the same. So it is likely statistically. So now the fact that we haven't seen them, one answer is we're in a simulation. It would be hard to like add, to simulate, or be not interesting to simulate all those other intelligences. It's better for the narrative. You have to have a control variable.
Starting point is 02:00:05 Yeah, exactly. Okay. But it's also possible that, if we're not in a simulation, there is a great filter, that naturally a lot of civilizations get to this point where there's super intelligent agents and then it just goes poof, just dies. So maybe throughout our galaxy and throughout the universe there's just a bunch of dead alien civilizations. It's possible. I used to think that AI was the great filter, but I would expect like a wall of computronium approaching us at speed of light or robots or something and I don't see it.
Starting point is 02:00:43 So it would still make a lot of noise. It might not be interesting. It might not possess consciousness. What we've been talking about, it sounds like both you and I like humans. Some humans. Humans on the whole. And we would like to preserve the flame of human consciousness. What do you think makes humans special?
Starting point is 02:01:07 That we would like to preserve them? Are we just being selfish? Or is there something special about humans? So the only thing which matters is consciousness. Outside of it, nothing else matters. Internal states of qualia, pain, pleasure, it seems that it is unique to living beings. I'm not aware of anyone claiming that I can torture a piece of software in a meaningful way.
Starting point is 02:01:32 There is a society for prevention of suffering to learning algorithms. But... That's a real thing? Many things are real on the internet. But I don't think anyone, if I told them, you know, sit down and write a function to feel pain, they would go beyond having an integer variable called pain and increasing the count. So we don't know how to do it. And that's unique. That's what creates meaning. It would be kind of, as Bostrom calls it, Disneyland without children if that was gone. Do you think consciousness can be engineered in artificial systems? Here, let me go to 2011 paper that you wrote.
Starting point is 02:02:20 Robot rights. Lastly, we would like to address a sub-branch of machine ethics, which on the surface has little to do with safety, but which is claimed to play a role in decision making by ethical machines. Robot rights. So do you think it's possible to engineer consciousness in the machines? And thereby, the question extends to our legal system.
Starting point is 02:02:43 Do you think at that point robots should have rights? Yeah, I think we can. I think it's possible to create consciousness in machines. I tried designing a test for it with mixed success. That paper talked about problems with giving civil rights to AI, which can reproduce quickly and outvote humans essentially taking over a government system by simply voting for their controlled candidates. As for consciousness in humans and other agents, I have a paper where I proposed
Starting point is 02:03:21 relying on experience of optical illusions. If I can design a novel optical illusion and show it to an agent, an alien, a robot, and they describe it exactly as I do, it's very hard for me to argue that they haven't experienced that. It's not part of a picture. It's part of their software and hardware representation,
Starting point is 02:03:42 a bug in their code which goes, oh, the triangle is rotating. And I've been told it's really dumb and really brilliant by different philosophers. So I am still... I love it. But now we finally have technology to test it. We have tools, we have AIs. If someone wants to run this experiment, I'm happy to collaborate.
Starting point is 02:04:02 So this is a test for consciousness? For internal state of experience. That we share bugs. It will show that we share common experiences. If they have completely different internal states, it would not register for us. But it's a positive test. If they pass it time after time
Starting point is 02:04:17 with probability increasing for every multiple choice, then you have no choice but to either accept that they have access to a conscious model or they are themselves. So the reason illusions are interesting is, I guess, because it's a really weird experience and if you both share that weird experience that's not there in the bland physical description
Starting point is 02:04:42 of the raw data that means, that puts more emphasis on the actual experience. And we know animals can experience some optical illusion, so we know they have certain types of consciousness as a result, I would say. Yeah, well, that just goes to my sense that the flaws and the bugs is what makes humans special, makes living forms special.
Starting point is 02:05:05 So you're saying like, yeah, focus on the bugs. It's a feature, not a bug. It's a feature, the bug is the feature. Whoa, okay, that's a cool test for consciousness. And you think that can be engineered in? So they have to be novel illusions. If it can just Google the answer, it's useless. You have to come up with novel illusions
Starting point is 02:05:21 which we tried automating and failed. So if someone can develop a system capable of producing novel optical illusions on demand, then we can definitely administer that test on significant scale with good results. First of all, pretty cool idea. I don't know if it's a good general test of consciousness, but it's a good component of that. And no matter why, it's just a cool idea. So put me in the camp of people that like it.
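The remark that the case strengthens "with probability increasing for every multiple choice" can be made concrete with a little arithmetic. The following is a minimal sketch, assuming each novel illusion is posed as an independent multiple-choice question whose options are equally likely for an agent that does not actually experience the illusion. The function name and the four-option format are illustrative assumptions, not details taken from the paper discussed here.

```python
from fractions import Fraction


def chance_of_guessing_all(num_questions: int, options_per_question: int) -> Fraction:
    """Probability that pure guessing matches the human answer on every question.

    Assumes questions are independent and every option is equally likely
    for a guesser (both are simplifying assumptions).
    """
    if num_questions < 0 or options_per_question < 1:
        raise ValueError("need num_questions >= 0 and options_per_question >= 1")
    return Fraction(1, options_per_question) ** num_questions


if __name__ == "__main__":
    # Hypothetical format: four options per illusion.
    for n in (1, 5, 10, 20):
        p = chance_of_guessing_all(n, 4)
        print(f"{n:2d} illusions: chance of a lucky pass = {float(p):.3e}")
```

Under these assumptions the chance of a lucky pass shrinks geometrically with the number of illusions, which is why the illusions have to be novel: an agent that can look up the answers is not guessing, and the calculation no longer applies.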
Starting point is 02:05:48 But you don't think like a Turing test style imitation of consciousness is a good test. Like if you can convince a lot of humans that you're conscious, that to you is not impressive. There is so much data on the internet. I know exactly what to say. Then you ask me common human questions. What does pain feel like? What does pleasure feel like? All that is Googleable. I think to me, consciousness is closely tied to suffering. So you can illustrate your capacity to suffer. But I guess with words, there's
Starting point is 02:06:19 so much data that you can say you can pretend you're suffering and you can do so very convincingly. There are simulators for torture games where the avatar screams in pain, begs to stop. I mean, that was a part of kind of standard psychology research. You say it so calmly. It sounds pretty dark. Welcome to humanity. Yeah. Yeah, it's like a Hitchhiker's Guide summary.
Starting point is 02:06:48 Mostly harmless. I would love to get a good summary. When all of this is said and done. When Earth is no longer a thing. Whatever. A million, a billion years from now. Like, what's a good summary of what happened here? It's interesting. I think AI will play a big part of that summary and hopefully humans will too. What do you think about the merger of the two? So one of the things that Elon and Neuralink talk about is one of the
Starting point is 02:07:18 ways for us to achieve AI safety is to ride the wave of AGI. So by merging. Incredible technology in a narrow sense to help the disabled. Just amazing, supported 100%. For long-term hybrid models, both parts need to contribute something to the overall system. Right now, we are still more capable in many ways, so having this connection to AI would be incredible, would make me superhuman in many ways. After a while, if I'm no longer smarter, more creative, really don't contribute much, the system finds me as a biological bottleneck. And either explicitly or implicitly I'm removed from any participation in the system. So it's like the appendix.
Starting point is 02:08:06 By the way, the appendix is still around. So even if it's, you said bottleneck. I don't know if we become a bottleneck. We just might not have much use. That's a different thing than bottleneck. Wasting valuable energy by being there. We don't waste that much energy. We're pretty energy efficient.
Starting point is 02:08:26 We could just stick around like the appendix. Come on now. That's the future we all dream about. Become an appendix. To the history book of humanity. Well, and also the consciousness thing, the peculiar particular kind of consciousness that humans have, that might be useful.
Starting point is 02:08:42 That might be really hard to simulate. But you said that, like, how would that look like if you could engineer that in, in silicon? Consciousness? Consciousness. I assume you are conscious. I have no idea how to test for it or how it impacts you in any way whatsoever right now. You can perfectly simulate all of it without making any different observations for me. But to do it in a computer, how would you do that? Because you kind of said that you think it's possible to do that.
Starting point is 02:09:12 So it may be an emergent phenomenon. We seem to get it through evolutionary process. It's not obvious how it helps us to survive better, but maybe it's an internal kind of GUI which allows us to better manipulate the world, simplifies a lot of control structures. That's one area where we have very, very little progress. Lots of papers, lots of research, but consciousness is not a big area of successful discovery so far. A lot of people think that machines would have to be conscious to be dangerous. That's a big misconception. There is absolutely no need for this very powerful optimizing agent to feel anything while it's performing things on you.
Starting point is 02:10:03 But what do you think about this, the whole science of emergence in general? So I don't know how much you know about cellular automata or these simplified systems that study this very question: from simple rules emerges complexity. I attended Wolfram Summer School. I love Stephen very much. I love his work. I love cellular automata. So I just would love to get your thoughts
Starting point is 02:10:29 how that fits into your view in the emergence of intelligence in AGI systems. And maybe just even simply, what do you make of the fact that this complexity can emerge from such simple rules? So the rule is simple, but the size of a space is still huge. And the neural networks were really the first discovery in AI. A hundred years ago, the first papers were published on neural networks; we just didn't have enough compute to make them work. I can give
Starting point is 02:11:00 you a rule such as start printing progressively larger strings. That's it, one sentence. It will output everything, every program, every DNA code, everything in that rule. You need intelligence to filter it out, obviously, to make it useful. But simple generation is not that difficult. And a lot of those systems end up being Turing-complete systems, so they are universal. And we expect that level of complexity from them. What I like about Wolfram's work is that he talks about irreducibility. You have to run the simulation. You cannot predict what it is going to do ahead of time.
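The one-sentence rule "start printing progressively larger strings" is easy to sketch. The following is a minimal illustration, assuming a finite alphabet and shortest-first ordering; the function name `all_strings` and the choice of alphabets are hypothetical and only serve the example.

```python
from itertools import count, islice, product
from typing import Iterator


def all_strings(alphabet: str) -> Iterator[str]:
    """Yield every non-empty string over `alphabet`, shortest first.

    The generator never terminates: any particular finite string
    (a program text, a DNA sequence) appears after finitely many steps.
    """
    for length in count(1):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)


if __name__ == "__main__":
    # The first few binary strings.
    print(list(islice(all_strings("01"), 6)))
    # A specific DNA-like string shows up eventually, but nothing in the
    # generator knows which outputs are meaningful.
    position = next(i for i, s in enumerate(all_strings("ACGT")) if s == "GATTACA")
    print("GATTACA found at index", position)
```

Generation is the trivial part. The number of candidates of length n grows as the alphabet size to the power n, so the filtering step mentioned in the conversation is where all of the difficulty sits.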
Starting point is 02:11:39 And I think that's very relevant to what we are talking about with those very complex systems. Until you live through it, you cannot ahead of time tell me exactly what it's going to do. Irreducibility means that for a sufficiently complex system you have to run the thing. You can't predict what's gonna happen in the universe, you have to create a new universe and run the thing. Big bang, the whole thing. But running it may be consequential as well. It might destroy humans.
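The point about simple rules and having to "run the thing" can be illustrated with an elementary cellular automaton. The following is a minimal sketch using Wolfram's Rule 30, which is chosen here only as a familiar example and is not named in the conversation. The grid width, step count, and wrap-around edges are assumptions made for the illustration.

```python
def step(cells: list[int], rule: int = 30) -> list[int]:
    """Apply one update of an elementary cellular automaton with wrap-around edges."""
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # The three neighbouring bits select one bit of the 8-bit rule number.
        index = (left << 2) | (centre << 1) | right
        nxt.append((rule >> index) & 1)
    return nxt


def run(width: int = 31, steps: int = 15, rule: int = 30) -> list[list[int]]:
    """Start from a single live cell in the middle and record every generation."""
    cells = [0] * width
    cells[width // 2] = 1
    history = [cells]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    for row in run():
        print("".join("#" if c else "." for c in row))
```

The whole rule fits in a single byte, yet the sketch offers no shortcut for reading off a later row: it obtains row k only by computing every row before it, which is the practical meaning of irreducibility in the discussion above.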
Starting point is 02:12:11 And to you there's no chance that A.I.s somehow carry the flame of consciousness, the flame of specialness and awesomeness that is humans? It may somehow, but I still feel kind of bad that it killed all of us. I would prefer that doesn't happen. I can be happy for others, but to a certain degree. It would be nice if we stuck around for a long time. At least give us a planet, the human planet.
Starting point is 02:12:40 It'd be nice for it to be Earth, and then they can go elsewhere. Since they're so smart, they can colonize Mars. Do you think they could help convert us to type one, type two, type three? Let's just stick to type two civilization on the Kardashev scale. Like, help us, help us humans expand out into the cosmos.
Starting point is 02:13:05 So all of it goes back to, are we somehow controlling it? Are we getting results we want? If yes, then everything's possible. Yes, they can definitely help us with science, engineering, exploration in every way conceivable, but it's a big if. This whole thing about control though, humans are bad with control. Because the moment they gain control,
Starting point is 02:13:28 they can also easily become too controlling. It's the whole, the more control you have, the more you want it. It's the old power corrupts and absolute power corrupts absolutely. And it feels like control over AGI, say we live in a universe where that's possible. We come up with ways to actually do that.
Starting point is 02:13:49 It's also scary because the collection of humans that have the control over AGI, they become more powerful than the other humans and they can let that power get to their head. And then a small selection of them back to Stalin start getting ideas and then eventually it's one person usually with a mustache or a funny hat that starts sort of making big speeches and then all of a sudden you live in a world that's either 1984 or Brave New World and always at war with somebody and you know this whole idea of control turned out to be actually also not beneficial to humanity.
Starting point is 02:14:30 So that's scary too. It's actually worse because historically they all died. This could be different. This could be permanent dictatorship, permanent suffering. Well, the nice thing about humans, it seems like, it seems like the moment power starts corrupting their mind they can create a huge amount of suffering so there's a negative they can kill people make people suffer but then they become worse and worse at their job it feels like the more you really start doing like the
Starting point is 02:15:00 At least we are incompetent. Yeah. Well, no, they become more and more incompetent, so they start losing their grip on power. So like holding on to power is not a trivial thing. It requires extreme competence, which I suppose Stalin was good at. It requires you to do evil and be competent at it or just get lucky. And those systems help with that. You have perfect surveillance. You can do some mind reading, I presume, eventually. It would be very hard to remove control from more capable systems over us.
Starting point is 02:15:32 And then it would be hard for humans to become the hackers that escape the control of the AGI because the AGI is so damn good. And then, yeah, yeah, yeah. And then the dictator's immortal. Yeah, that's not great. That's not a great outcome. See, I'm more afraid of humans than AI systems. I'm afraid, I believe that most humans want to do good
Starting point is 02:15:57 and have the capacity to do good, but also all humans have the capacity to do evil. And when you test them by giving them absolute power, as you would if you give them AGI, that could result in a lot of suffering. What gives you hope about the future? I could be wrong. I've been wrong before. If you look 100 years from now and you're immortal, and you look back and it turns out this whole conversation,
Starting point is 02:16:29 you said a lot of things that were very wrong. Now that you're looking 100 years back, what would be the explanation? What happened in those 100 years that made you wrong? That made the words you said today wrong? There is so many possibilities. We had catastrophic events which prevented development of advanced microchips. That's not where your future
Starting point is 02:16:53 We could be in one of those personal universes. Yes, and the one I'm in is beautiful. It's all about me and I like it a lot. So we've now, just to linger on that, that means like every human has their personal universe? Yes. Maybe multiple ones. Hey, why not? Switching, shop around. It's possible that somebody comes up with an alternative model for building AI which is not based on neural networks, which are hard to scrutinize, and that alternative
Starting point is 02:17:25 is somehow, I don't see how, but somehow avoiding all the problems I speak about in general terms, not applying them to specific architectures. Aliens come and give us friendly superintelligence. There is so many options. Is it also possible that creating superintelligent systems becomes harder and harder? So meaning like, it's not so easy to do the foom, the takeoff. So that would probably speak more about how much smarter that system is compared to us. So maybe it's hard to be a million times smarter, but it's still okay to be five times smarter. Right. So that is totally
Starting point is 02:18:08 possible. That I have no objections to. So like, there's an S-curve type situation about smarter, and it's going to be like 3.7 times smarter than all of human civilization. Right. Just the problems we face in this world, each problem is like an IQ test. You need certain intelligence to solve it. So we just don't have more complex problems outside of mathematics for it to be showing off. Like you can have IQ of 500; if you're playing tic-tac-toe, it doesn't show, it doesn't matter.
Starting point is 02:18:37 So the idea there is that the problems define your cognitive capacity. Because the problems on Earth are not sufficiently difficult, it's not going to be able to expand its cognitive capacity. Possible. And because of that, wouldn't that be a good thing? It still could be a lot smarter than us. And to dominate long term, you just need some advantage. You have to be the smartest.
Starting point is 02:19:04 You don't have to be a million times smarter. So even 5X might be enough? It'd be impressive. What is it, IQ of a thousand? I mean, I know those units don't mean anything at that scale, but still, as a comparison, the smartest human is like 200. Well, actually, no, I didn't mean compared to an individual human. I meant compared to the collective intelligence of the human species. If you're somehow 5X smarter than that. We are more productive as a group.
Starting point is 02:19:32 I don't think we are more capable of solving individual problems. Like if all of humanity plays chess together, we are not like a million times better than world champion. That's because, that's like one S-curve, is the chess, but humanity is very good at exploring the full range of ideas. Like the more Einsteins you have, the more, just the higher probability
Starting point is 02:19:58 you come up with general relativity. But I feel like it's more of a quantity superintelligence than quality superintelligence. Sure, but you know, quantity and... Enough quantity sometimes becomes quality. Oh man, humans. What do you think is the meaning of this whole thing? We've been talking about humans and humans not dying, but why are we here?
Starting point is 02:20:22 It's a simulation. We're being tested. The test is, will you be dumb enough to create superintelligence and release it? So the objective function is not be dumb enough to kill ourselves. Yeah, you're unsafe. Prove yourself to be a safe agent who doesn't do that
Starting point is 02:20:38 and you get to go to the next game. The next level of the game, what's the next level? I don't know. I haven't hacked the simulation yet. Well, maybe hacking the simulation is the thing. I'm working as fast as I can. And physics would be the way to do that? Quantum physics, yeah. Definitely.
Starting point is 02:20:54 Well, I hope we do. And I hope whatever is outside is even more fun than this one, because this one's pretty damn fun. And just a big thank you for doing the work you're doing. There's so much exciting development in AI and to ground it in the existential risks is really, really important. Humans love to create stuff and we should be careful not to destroy ourselves in the process. So thank you for doing that really important work. Thank you so much for inviting me. It was amazing and my dream is to be proven wrong. If everyone just, you know, picks up a paper or a book and shows how I messed it up, that would be optimal. But for now, the simulation continues. Thank you, Roman. Thanks for listening to this conversation with Roman Yampolskiy. To support this podcast, please check out our sponsors in the description.
Starting point is 02:21:49 And now let me leave you with some words from Frank Herbert in Dune. I must not fear. Fear is the mind killer. Fear is the little death that brings total obliteration. I will face fear. I will permit it to pass over me and through me. And when it has gone past, I will turn the inner eye to see its path. Where the fear has gone, there will be nothing. Only I will remain. Thank you for listening and hope to see you next time.
