Lex Fridman Podcast - Yann LeCun: Deep Learning, Convolutional Neural Networks, and Self-Supervised Learning

Starting point is 00:00:00 The following is a conversation with Yann LeCun. He's considered to be one of the fathers of deep learning, which, if you've been hiding under a rock, is the recent revolution in AI that's captivated the world with the possibility of what machines can learn from data. He's a professor at New York University, Vice President and Chief AI Scientist at Facebook, and co-recipient of the Turing Award for his work on deep learning.

Starting point is 00:00:26 He's probably best known as the founding father of convolutional neural networks, in particular their application to optical character recognition and the famed MNIST dataset. He is also an outspoken personality, unafraid to speak his mind in a distinctive French accent and explore provocative ideas both in the rigorous medium of academic research and the somewhat less rigorous medium of Twitter and Facebook. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N.

Starting point is 00:01:06 And now here's my conversation with Yann LeCun. You said that 2001: A Space Odyssey is one of your favorite movies. HAL 9000 decides to get rid of the astronauts, for people who haven't seen the movie, spoiler alert, because he believes that they will interfere with the mission. Do you see HAL as flawed in some fundamental way, or even evil, or did he do the right thing? Neither. There's no notion of evil in that context, other than the fact that people die. But it was an example of what people call value misalignment, right? You give an objective to a machine and

Starting point is 00:02:09 the machine tries to achieve this objective. If you don't put any constraints on this objective, like don't kill people and don't do things like this, the machine, given the power, will do stupid things just to achieve this objective, or damaging things to achieve this objective. It's a little bit like, we are used to this in the context of human society. We put in place laws to prevent people from doing bad things because spontaneously they

Starting point is 00:02:40 would do those bad things. So we have to shape their cost function, their objective function, if you want, through laws to kind of correct, and education, obviously, to sort of correct for those. So maybe just pushing a little further on that point: with HAL, there's a mission, there's a fuzziness, an ambiguity around what the actual mission is, but do you think that there will be a time, from a utilitarian perspective, where an AI system, where it is not misalignment, where it is alignment for the greater good of society,

Starting point is 00:03:19 that an AI system will make decisions that are difficult? Well, that's the trick. I mean, eventually it will have to figure out how to do this. And again, we're not starting from scratch because we've been doing this with humans for millennia. So designing objective functions for people is something that we know how to do. And we don't do it by programming things,

Starting point is 00:03:41 although the legal code is called code. That tells you something. It's actually the design of an objective function. That's really what legal code is. It tells you, here is what you can do, here is what you can't do. If you do it, you pay that much. That's an objective function.

Starting point is 00:03:58 There is this idea somehow that it's a new thing for people to try to design objective functions that are aligned with the common good. But no, we've been writing laws for millennia and that's exactly what it is. So that's where the science of law-making and computer science will come together. Will come together. So there's nothing special about HAL or AI systems; it's just the continuation of tools used to make some of these difficult ethical judgments

Starting point is 00:04:28 that laws make. Yeah, and we have systems like this already that make many decisions for ourselves in society, that need to be designed in a way that they, like rules about things that sometimes have bad side effects, and we have to be flexible enough about those rules so that they can be broken when it's obvious that they shouldn't be applied. So you don't see this on the camera here, but all the decoration in this room is all pictures from

Starting point is 00:04:55 2001: A Space Odyssey. Wow. Is that by accident, or is there a lot? It's not by accident, it's by design. Oh wow. So if you were to build HAL 10,000, so an improvement of HAL 9000, what would you improve? Well, first of all, I wouldn't ask it to hold secrets and tell lies, because that's really what breaks it in the end.

Starting point is 00:05:21 That's the fact that it's asking itself questions about the purpose of the mission. And it, you know, pieces things together that it's heard, you know, all the secrecy of the preparation of the mission and the fact that there was a discovery on the lunar surface that really was kept secret. And one part of HAL's memory knows this and the other part does not know it and is supposed to not tell anyone, and that creates an internal conflict. Do you think there should ever be a set of things that an AI system should not be allowed, like a set of facts that should not be shared with the human operators?

Starting point is 00:05:59 Well, I think, no, I think it should be a bit like, in the design of autonomous AI systems, there should be the equivalent of the Hippocratic Oath that doctors sign up to. So there's certain things, certain rules that you have to abide by. And we can sort of hardwire this into our machines to kind of make sure they don't go. So I'm not, you know, an advocate of the three laws

Starting point is 00:06:31 of robotics, you know, the Asimov kind of thing, because I don't think it's practical, but, you know, some level of limits. But to be clear, these are not questions that are kind of really worth asking today because we just don't have the technology to do this. We don't have autonomous intelligent machines. We have intelligent machines, semi-intelligent machines that are very specialized, but they don't really sort of satisfy an objective. They're just kind of trained to do one thing. So until we have some idea for the design of a full-fledged autonomous intelligent system, asking the question of how we design this objective, I think it's a little too abstract.

Starting point is 00:07:15 It's a little too abstract. There's useful elements to it in that it helps us understand our own ethical codes as humans. So even just as a thought experiment, if you imagine that an AGI system is here today, how would we program it? It's a kind of nice thought experiment of constructing how we should have a law, have a system of laws for us humans.

Starting point is 00:07:41 It's just a nice practical tool. And I think there's echoes of that idea too in the AI systems we have today that don't have to be that intelligent. Yeah, like autonomous vehicles. These things start creeping in, and they're worth thinking about, but certainly they shouldn't be framed as HAL. Yeah. Looking back, what is the most, I'm sorry if it's a silly question, but what is the most beautiful or surprising idea in deep learning or AI in general that you've ever come across? So personally, when you sat back and just had this kind of, that's a pretty cool moment. That's nice.

Starting point is 00:08:22 Well, surprising. I don't know if it's an idea, rather than a sort of empirical fact, the fact that you can build gigantic neural nets, train them on relatively small amounts of data, relatively, with stochastic gradient descent, and that actually works. It breaks everything you read in every textbook, every pre-deep-learning textbook that told you, you need to have fewer parameters than you have data samples.

Starting point is 00:08:54 If you have a non-convex objective function, you have no guarantee of convergence, all those things that you read in textbooks and they tell you to stay away from this. And they're all wrong. Huge number of parameters, non-convex, and somehow, with very little data relative to the number of parameters, it's able to learn anything. Right.

Starting point is 00:09:13 Does that still surprise you today? Well, it was kind of obvious to me before I knew anything that this is a good idea, and then it became surprising that it worked because I started reading those textbooks. Okay. So can you talk through the intuition of why it was obvious to you, if you remember? Well, okay, so the intuition was, it's sort of like those people in the late 19th century

Starting point is 00:09:37 who proved that heavier-than-air flight was impossible. And of course you have birds. They do fly. On the face of it, it's obviously wrong as an empirical question. We have the same thing, in that we know that the brain works. We don't know how, but we know it works. We know it's a large network of neurons in interaction, and that learning takes place by changing the connections.

Starting point is 00:10:03 Getting this level of inspiration without copying the details, but sort of trying to derive basic principles, you know, that kind of gives you a clue as to which direction to go. There's also the idea somehow that I've been convinced of since I was an undergrad, even before that, that intelligence is inseparable from learning.

Starting point is 00:10:24 So the idea somehow that you can create an intelligent machine by basically programming, for me, was a non-starter from the start. Every intelligent entity that we know about arrives at this intelligence through learning. So learning, machine learning, was a completely obvious path. Also because I'm lazy, so you know, kind of, automate basically everything, and learning

Starting point is 00:10:52 is the automation of intelligence. So do you think, so what is learning then, what falls under learning? Because do you think of reasoning as learning? Well, reasoning is certainly a consequence of learning as well, just like other functions of the brain. The big question about reasoning is how do you make reasoning compatible with gradient-based learning?

Starting point is 00:11:20 Do you think neural networks can be made to reason? Yes, there is no question about that. Again, we have a good example, right? The question is, how? So the question is how much prior structure you have to put into the neural net so that something like human reasoning will emerge from it, you know, from learning. Another question is, all of our kind of models of what reasoning is that are based on logic are discrete and are therefore incompatible with gradient-based learning. And I'm a very strong believer in this idea of gradient-based learning. I don't believe that

Starting point is 00:11:55 other types of learning that don't use kind of gradient information, if you want. So you don't like discrete mathematics, you don't like anything discrete? Well, it's not that I don't like it, it's just that it's incompatible with learning and I'm a big fan of learning, right? So in fact that's perhaps one reason why deep learning has been kind of looked at with suspicion by a lot of computer scientists, because the math is very different. The math that you use for deep learning, you know, kind of has more to do with, you know, cybernetics, the kind of math you do in electrical engineering, than the kind of math you do in computer science.

Starting point is 00:12:30 And, you know, nothing in machine learning is exact, right? Computer science is all about sort of, you know, obsessive-compulsive attention to details, like, you know, every index has to be right. And you can prove that an algorithm is correct, right? Machine learning is the science of sloppiness, really. That's beautiful. So, okay, maybe let's feel around in the dark of what is a neural network that reasons, or a system that works with continuous functions

Starting point is 00:13:09 that's able to build knowledge. However we think about reasoning, build on previous knowledge, build on extra knowledge, create new knowledge, generalize outside of any training set ever built, what does that look like? Maybe do you have inklings of thoughts of what that might look like? Yeah, I mean, yes or no. If I had precise ideas about this,

Starting point is 00:13:31 I think, you know, we'd be building it right now. But, and there are people working on this, whose main research interest is actually exactly that, right? So, what you need to have is a working memory. So you need to have some device, if you want, some subsystem that can store a relatively large number of factual, episodic pieces of information for a reasonable amount of time. So in the brain, for example,

Starting point is 00:14:01 there are three main types of memory. One is the sort of memory of the state of your cortex. And that sort of disappears within 20 seconds. You can't remember things for more than about 20 seconds or a minute, if you don't have any other form of memory. The second type of memory, which is longer term, but still short term, is the hippocampus. So you can, you know, you came into this building, you remember where the exit is,

Starting point is 00:14:28 where the elevators are. You have some map of that building that's stored in your hippocampus. You might remember something about what I said, you know, a few minutes ago. I forgot it all already. Of course, it's been erased, but, you know, that would be in your hippocampus. And then the longer term memory is in the synapses. So what you need if you want a system that's capable of reasoning is that you want a hippocampus-like thing.

Starting point is 00:14:57 And that's what people have tried to do with memory networks and neural Turing machines and stuff like that. And now with transformers, which have sort of a memory in their kind of self-attention system, you can think of it this way. So that's one element you need. Another thing you need is some sort of network that can access this memory, get a piece of information back, and then kind of crunch on it and then do this iteratively multiple times, because a chain of reasoning is a process by which you update your knowledge about the

Starting point is 00:15:36 state of the world, about what's going to happen, etc. And that has to be this sort of recurrent operation, basically. And you think that kind of, if you think about a transformer, so that seems to be too small to contain the knowledge, to represent the knowledge that's contained in Wikipedia, for example? Well, a transformer doesn't have this idea of recurrence. It's got a fixed number of layers,

Starting point is 00:16:00 and that's the number of steps that will limit basically its representation. But recurrence would build on the knowledge somehow. I mean, it would evolve the knowledge and expand the amount of information, perhaps, or useful information within that knowledge. But is this something that just can emerge with size? Because it seems like everything we have now is just...

Starting point is 00:16:24 No, it's not clear. I mean, how you access and write into an associative memory in an efficient way? I mean, so the original memory network maybe had something like the right architecture, but if you try to scale up a memory network so that the memory contains all of Wikipedia, it doesn't quite work. Right. So there's a need for new ideas there. But it's not the only form of reasoning.
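To make the memory access discussed here concrete, below is a minimal sketch of a soft, content-based memory read of the general kind used in memory networks and self-attention. It is an illustration only, not the code of any system mentioned in the conversation; the function name `soft_read` and the `sharpness` parameter are assumptions introduced for this example.

```python
import numpy as np

def soft_read(query, keys, values, sharpness=1.0):
    """Differentiable read from a key-value memory (hypothetical helper).

    Every slot is scored against the query, the scores are turned into
    weights with a softmax, and the result is a weighted mix of the values.
    """
    scores = sharpness * (keys @ query)        # one score per memory slot
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights

# Three memory slots with one-hot keys and small value vectors.
keys = np.eye(3)
values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
read, weights = soft_read(keys[1], keys, values, sharpness=20.0)
```

Because the read is a smooth function of the query, it is compatible with gradient-based learning. Note that each read in this sketch scores the query against every slot, so the cost grows with the size of the memory, which gives some intuition for why scaling to a very large memory is not straightforward.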

Starting point is 00:16:47 So there's another form of reasoning, which is very classical also in some types of AI. And it's based on, let's call it energy minimization. So you have some sort of objective, some energy function that represents the quality or the negative quality. Energy goes up when things get bad and it gets low when things get good. So let's say you want to figure out what gestures do I need to do to grab an object or walk out the door. If you have a good model of your own body, a good model of the environment,

Starting point is 00:17:29 using this kind of energy minimization, you can do planning. And in optimal control, it's called model predictive control. You have a model of what's gonna happen in the world as a consequence of your actions. And that allows you, by energy minimization, to figure out a sequence of actions that

Starting point is 00:17:47 optimizes a particular objective function, which minimizes the number of times you're going to hit something, and the energy you're going to spend doing the gesture, et cetera. So that's a form of reasoning. Planning is a form of reasoning. And perhaps what led to the ability of humans to reason is the fact that species that appeared before us had to do some sort of planning to be able to hunt and survive.
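The planning-by-energy-minimization idea described here can be sketched in a few lines. This is a toy illustration under strong assumptions, not a real controller: the world model is a one-dimensional position that simply moves by the chosen action at each step, and the names `energy`, `plan`, and `effort_weight` are hypothetical.

```python
import numpy as np

def energy(x0, actions, goal, effort_weight=0.1):
    """Low energy is good: end near the goal while spending little effort."""
    final = x0 + actions.sum()   # toy world model: position += action
    return (final - goal) ** 2 + effort_weight * np.sum(actions ** 2)

def plan(x0, goal, horizon=5, steps=200, lr=0.05, effort_weight=0.1):
    """Find an action sequence by gradient descent on the energy."""
    actions = np.zeros(horizon)
    for _ in range(steps):
        final = x0 + actions.sum()
        # Analytic gradient of the energy with respect to each action.
        grad = 2.0 * (final - goal) + 2.0 * effort_weight * actions
        actions -= lr * grad
    return actions

actions = plan(x0=0.0, goal=1.0)
```

In model predictive control one would typically execute only the first planned action, observe the new state, and plan again. The sketch stops at the planning step.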

Starting point is 00:18:14 And survive the winter in particular. And so it's the same capacity that you need to have. So in your intuition, if we look at expert systems, encoding knowledge as logic systems, as graphs, in this kind of way, is not a useful way to think about knowledge? Graphs are a little brittle, or logic representation. So basically, you know, variables that have values and then constraints between them that are represented by rules, it's a little too rigid and too brittle, right? So one of the, you know, some of the early efforts in that respect were to put probabilities on them. So a rule, you know, if you have this and that symptom, you know, you have this disease with that probability and you should prescribe that antibiotic with that probability.

Starting point is 00:19:07 That's the MYCIN system from the 70s. And that's what that branch of AI led to, Bayesian networks and graphical models and causal inference and variational methods. So there is, I mean, certainly a lot of interesting work going on in this area. The main issue with this is knowledge acquisition. How do you reduce a bunch of data to a graph of this type? Yeah, it relies on the expert, on the human being, to encode, to add knowledge. And that's

Starting point is 00:19:43 essentially impractical. Yeah. So that's a big question. The second question is, do you want to represent knowledge as symbols? And do you want to manipulate them with logic? And again, that's incompatible with learning. So one suggestion, which Geoff Hinton has been advocating for many decades,

Starting point is 00:20:02 is to replace symbols by vectors, think of it as patterns of activities in a bunch of neurons or units or whatever you want to call them, and replace logic by continuous functions. Okay, and that becomes now compatible. There is a very good set of ideas written in a paper about 10 years ago by Léon Bottou, who is here at Facebook. The title of the paper is From Machine Learning to Machine Reasoning, and his idea is that a learning system should be able to manipulate objects that are in a space and then put the result back in the same space. So it's this idea of working memory, basically.

Starting point is 00:20:45 And it's very enlightening. And in a sense, that might learn something like the simple expert systems. I mean, you can learn basic logic operations there. Yeah, quite possibly. Yeah. There's a big debate on sort of how much prior structure you have to put in for this kind of stuff to emerge.

Starting point is 00:21:06 That's the debate I have with Gary Marcus and people like that. Yeah. Yeah. So, and the other person, so I just talked to Judea Pearl, from the, you mentioned, causal inference world. So his worry is that the current neural networks are not able to learn what causes what, causal inference between things. So I think he's right and wrong about this. If he's talking about the sort of classic type of neural nets, people didn't worry too much about this. But there's a lot of people now working on causal inference, and there's a paper that

Starting point is 00:21:43 just came out last week by Léon Bottou, among others, David Lopez-Paz, and a bunch of other people. Exactly on that problem of how do you kind of, you know, get a neural net to sort of pay attention to real causal relationships, which may also solve issues of bias in data, and things like this. So I'd like to read that paper, because ultimately the challenge there also seems to fall back on the human expert to ultimately decide causality between things. People are not very good at establishing causality, first of all. So first of all, you talk to physicists, and physicists actually don't believe in causality

Starting point is 00:22:25 because, look, all the basic laws of microphysics are time reversible. So there's no causality. The arrow of time is not real. It's as soon as you start looking at macroscopic systems, where there is unpredictable randomness, where there is clearly an arrow of time, but it's a big mystery in physics actually,

Starting point is 00:22:44 well, how that emerges. Is it emergent, or is it part of the fundamental fabric of reality, yeah? Or is it a bias of intelligent systems that, you know, because of the second law of thermodynamics, we perceive a particular arrow of time, but in fact, it's kind of arbitrary, right? So yeah, physicists, mathematicians, they don't care about, I mean, the math doesn't care about the flow of time.

Starting point is 00:23:08 Well, certainly, certainly microphysics doesn't. People themselves are not very good at establishing causal relationships. If you ask, I think it was in one of Seymour Papert's books on, like, children learning, you know, he studied with Jean Piaget, he's the guy who co-authored the book Perceptrons with Marvin Minsky that kind of killed the first wave of neural nets.

Starting point is 00:23:31 But he was actually a learning person, in the sense of studying learning in humans and machines, that's why he got interested in the perceptron. And he wrote that, if you ask a little kid about what is the cause of the wind, a lot of kids will say, they will think for a while, and they'll say, oh, the branches in the trees, they move, and that creates wind.

Starting point is 00:23:57 So they get the causal relationship backwards. And it's because their understanding of the world and intuitive physics is not that great. I mean, these are like four or five year old kids. It gets better, and then you understand that this can't be, right? But there are many things which we can, because of our common sense understanding of things, what people call common sense.

Starting point is 00:24:20 Yeah. And our understanding of physics. We can, there's a lot of stuff where we can figure out causality, even with diseases. We can figure out what's not causing what, often. There's a lot of mystery, of course, but the idea is that you should be able to encode that into systems, because it seems unlikely they'd be able to figure that out themselves. Well, whenever we can do intervention, but all of humanity has been completely deluded for millennia, probably since existence, about a very, very wrong causal relationship

Starting point is 00:24:50 where whatever you can't explain, you attribute to some deity, some divinity, right? And that's a cop-out. That's a way of saying, like, I don't know the cause, so God did it, right? So you mentioned Marvin Minsky and the irony of, you know, maybe causing the first AI winter. You were there in the 90s, you were there in the 80s, of course. In the 90s, why do you think people lost faith in deep learning, in the 90s, and found it again a decade

Starting point is 00:25:21 later, over a decade later? Yeah, it wasn't called deep learning yet, it was just called neural nets. Yeah, they lost interest. I mean, I think I would put that around 1995, at least the machine learning community. There was always a neural net community, but it became kind of disconnected from sort of mainstream machine learning, if you want. It was basically electrical engineering that kept at it, and computer science gave up on neural nets. I don't know. I was too close to it to really

Starting point is 00:26:00 sort of analyze it with sort of an unbiased eye, if you want. But I would make a few guesses. So the first one is, at the time, neural nets were, it was very hard to make them work. In the sense that you would implement backprop in your favorite language. And that favorite language was not Python. It was not MATLAB, it was not

Starting point is 00:26:25 any of those things because they didn't exist, right? You had to write it in Fortran or C or something like this, right? So you would experiment with it, you would probably make some very basic mistakes, like, you know, badly initialize your weights, make the network too small because you read in the textbook, you know, you don't want too many parameters, right? And of course, you know, you would train on XOR because you didn't have any other dataset to train on. And of course, it works half the time. So you would say, I give up.

Starting point is 00:26:52 Also, you would train it with batch gradient, which isn't that efficient. There was a whole bag of tricks that you had to know to make those things work, or you had to reinvent. A lot of people just didn't and they just couldn't make it work. So that's one thing. The investment in software platforms to be able to kind of, you know,

Starting point is 00:27:14 display things, figure out why things don't work, kind of get a good intuition for how to get them to work, have enough flexibility so you can create, you know, network architectures like convolutional nets and stuff like that. It was hard. I mean, you had to write everything from scratch. And again, you didn't have any Python or MATLAB or anything, right? So I read that, sorry to interrupt,

Starting point is 00:27:32 but I read that you wrote in Lisp the first versions of LeNet, with the convolutional networks, which, by the way, is one of my favorite languages. That's how I knew you're legit. Turing Award, whatever; you programmed in Lisp. That's still my favorite language. But it's not that we programmed in Lisp, it's that we had to write our Lisp interpreter. Okay, because it's not like we used one that existed.

Starting point is 00:27:57 So we wrote our Lisp interpreter that we hooked up to a back-end library that we wrote also for sort of neural net computation. And then after a few years, around 1991, we invented this idea of basically having modules that know how to forward propagate and back propagate gradients, and then interconnecting those modules in a graph. Léon Bottou had made proposals about this in the late 80s

Starting point is 00:28:21 and we were able to implement this using our Lisp system. Eventually, we wanted to use that system to build production code for character recognition at Bell Labs. So we actually wrote a compiler for that Lisp interpreter, so that Patrice Simard, who is now at Microsoft, did the bulk of it with Léon and me. And so we could write our system in Lisp and then compile to C, and then we'd have a self-contained complete system that could kind of do the entire thing. Neither PyTorch nor TensorFlow can do this today.
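The idea of modules that know how to forward propagate and back propagate gradients, and that are then connected together, can be sketched as follows. This is a hedged illustration in Python rather than the original Lisp system; the class names `Linear`, `Tanh`, and `Chain` are assumptions made for this example, and a simple chain stands in for a general graph of modules.

```python
import numpy as np

class Linear:
    """Module computing y = W x."""
    def __init__(self, w):
        self.w = np.asarray(w, dtype=float)
    def forward(self, x):
        self.x = x                                 # cache input for backward
        return self.w @ x
    def backward(self, grad_out):
        self.grad_w = np.outer(grad_out, self.x)   # gradient w.r.t. weights
        return self.w.T @ grad_out                 # gradient w.r.t. input

class Tanh:
    """Module applying tanh elementwise."""
    def forward(self, x):
        self.y = np.tanh(x)
        return self.y
    def backward(self, grad_out):
        return grad_out * (1.0 - self.y ** 2)

class Chain:
    """Connects modules in sequence, the simplest kind of graph."""
    def __init__(self, *modules):
        self.modules = modules
    def forward(self, x):
        for m in self.modules:
            x = m.forward(x)
        return x
    def backward(self, grad_out):
        for m in reversed(self.modules):
            grad_out = m.backward(grad_out)
        return grad_out

net = Chain(Linear([[0.5, -0.2], [0.1, 0.3]]), Tanh())
y = net.forward(np.array([1.0, 2.0]))
grad_x = net.backward(np.ones(2))   # gradient of sum(y) w.r.t. the input
```

Each module only needs to know its own local forward and backward rule, and the container takes care of ordering. That separation is what makes it possible to assemble new architectures without rederiving the gradients by hand.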

Starting point is 00:28:53 Yeah, OK. It's coming. Yeah. I mean, there's something like that in PyTorch called TorchScript. And so we had to write our Lisp interpreter, we had to write our Lisp compiler, we had to invest a huge amount of effort to do this.

Starting point is 00:29:08 And not everybody, if you don't completely believe in the concept, you're not going to invest the time to do this. Now, at the time also, what today, this would turn into Torch or PyTorch or TensorFlow, or whatever, we'd put it in open source, everybody would use it and realize it's good. Back before 1995, working at AT&T,

Starting point is 00:29:28 there's no way the lawyers would let you release anything in open source of this nature. And so we could not distribute our code really. And on that point, and sorry to go on a million tangents, but on that point, I also read that there was some, almost like a patent on convolutional neural networks. Yes.

Starting point is 00:29:48 So that, first of all, I mean, there's two, actually. That ran out, thankfully, in 2007. In 2007. Can we just talk about that for a second? I know you're at Facebook, but you're also at NYU, and what does it mean to patent ideas like these software ideas, essentially, or are they mathematical ideas, or what are they? Okay. So, they're not mathematical ideas, they are algorithms. And there was a period

Starting point is 00:30:25 where the US patent office would allow the patent of software as long as it was embodied. The Europeans are very different. They don't quite accept that. They have a different concept. But I don't, I mean, I never actually strongly believed in this, I don't believe in this kind of patent. Facebook basically doesn't believe in this kind of patent. Google files patents because they've been burned with Apple. And so now they do this for defensive purposes, but usually they say, we're not going to sue you if you infringe.

Starting point is 00:31:02 Facebook has a similar policy. They say, we file patents on certain things for defensive purposes; we're not going to sue you if you infringe, unless you sue us. So the industry does not believe in patents. They are there because of the legal landscape and various things, but I don't really believe in patents for this kind of stuff.

Starting point is 00:31:25 So that's a great thing. So I'll tell you a worse story, actually. So what happened was, the first patent about convolutional nets was about the early version of the convolutional net that didn't have separate pooling layers. It had convolutional layers with stride more than one, if you want.

Starting point is 00:31:42 And then there was a second one on convolutional nets with separate pooling layers, trained with backprop. And they were filed in '89 and '90, or something like this. At the time, the life of a patent was 17 years. So here's what happened over the next few years: we started developing character recognition technology around convolutional nets.

Starting point is 00:32:05 And in 1994, a check reading system was deployed in ATM machines. In 1995, it was for large check reading machines in back offices, et cetera. And those systems were developed by an engineering group that we were collaborating with at AT&T, and they were commercialized by NCR, which at the time was a subsidiary of AT&T. Now, AT&T split up in 1996, early 1996.

Starting point is 00:32:35 The lawyers just looked at all the patents and distributed the patents among the various companies. They gave the convolutional net patent to NCR because they were actually selling products that used it. But nobody at NCR had any idea what a convolutional net was. Yeah. Okay. So between 1996 and 2007, there's a whole period until 2002 where I didn't actually work on machine learning or convolutional nets. I resumed working on this around 2002.

Starting point is 00:33:03 And between 2002 and 2007, I was working on them crossing my fingers that nobody at NCR would notice, and nobody noticed. Yeah, and I hope that this kind of, as you said, lawyers aside, relative openness of the community now will continue. It accelerates the entire progress of the industry. And the problems that Facebook and Google and others

Starting point is 00:33:29 are facing today are not whether Facebook or Google or Microsoft or IBM or whoever is ahead of the other. It's that we don't have the technology to build these things we want to build. We want to build intelligent virtual assistants that have common sense. We don't have a monopoly on good ideas for this. We don't believe we do. Maybe others believe they do, but we don't. If a startup tells you they have the secret to human level intelligence and common sense, don't believe them. They don't. And it's going to take the entire work of the world research community for a while to get to the point where you can go

Starting point is 00:34:04 off, and each of those companies can start to build things on this. We're not there yet. Absolutely, and this speaks to the gap between the space of ideas and the rigorous testing of those ideas in practical application that you often speak to.

Starting point is 00:34:20 You've written advice saying, don't get fooled by people who claim to have a solution to artificial general intelligence, who claim to have an AI system that works just like the human brain, or who claim to have figured out how the brain works. Ask them what error rate they get on MNIST or ImageNet. So this is a little dated, by the way. I mean, five years. Yes. Who's counting? OK. But I think your opinion is still, MNIST and ImageNet, yes, may be dated, there may be new benchmarks, right?

Starting point is 00:34:53 But I think that philosophy is one you still in some way hold, that benchmarks and the practical testing, the practical application, is where you really get to test the ideas. Well, it may not be completely practical. Like, for example, you know, it could be a toy dataset, but it has to be some sort of task that the community as a whole has accepted as some sort of standard, you know, kind of benchmark if you want.

Starting point is 00:35:17 It doesn't need to be real. So, for example, many years ago here at FAIR, people, you know, Jason Weston and Antoine Bordes and a few others proposed the bAbI tasks, which were kind of a toy problem to test the ability of machines to reason, actually, to access working memory and things like this. And it was very useful, even though it wasn't a real task. MNIST is kind of halfway a real task. So you know, toy problems can be very useful. It's just that I was really struck by the fact that a lot of people,

Starting point is 00:35:47 particularly a lot of people with money to invest, would be fooled by people telling them, oh, we have, you know, the algorithm of the cortex and you should give us 50 million. Yes, absolutely. So there's a lot of people who try to take advantage of the hype for business reasons and so on. But let me sort of talk to this idea that new ideas, the ideas that push the field forward, may not yet have a benchmark. Or it may be very difficult to establish a benchmark.

Starting point is 00:36:18 I agree. That's part of the process. Establishing benchmarks is part of the process. So what are your thoughts about? So we have these benchmarks around stuff we can do with images, from classification to captioning to just every kind of information you can pull off from images at the surface level. There's audio data sets, there's some video.

Starting point is 00:36:39 Where can we start, natural language? What kind of stuff, what kind of benchmarks do you see that start creeping on to something more like intelligence, like reasoning, like, maybe you don't like the term, but AGI, echoes of that kind of formulation? A lot of people are working on interactive environments in which you can train and test intelligent systems. So there, for example, you know, the classical paradigm of supervised learning is that you

Starting point is 00:37:14 have a data set, you partition it into a training set, validation set, test set, and there's a clear protocol, right? But that assumes that the samples are statistically independent, you can exchange them, the order in which you see them doesn't matter, you know, things like that. But what if the answer you give determines the next sample you see,

Starting point is 00:37:34 which is the case, for example, in robotics, right? Your robot does something and then it gets exposed to a new room, and depending on where it goes, the room will be different. So that creates the exploration problem. That also creates a dependency between samples, right? If you can only move in space, the next sample you're going to see is going to be probably in the same building,

Starting point is 00:38:00 most likely. So all the assumptions about the validity of this training set, test set hypothesis break whenever a machine can take an action that has an influence on the world and on what it is going to see. So people are setting up artificial environments where that takes place, right? The robot runs around a 3D model of a house and can interact with objects and things like this. So you do robotics by simulation, you have those, you know, OpenAI Gym type things or MuJoCo kind of simulated robots, and you have games, you know, things like that.
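
The dependency between samples described here can be illustrated with a toy interactive environment. This is a minimal sketch under stated assumptions: the `Corridor` class, its rooms, and the random policy are hypothetical and do not correspond to OpenAI Gym, MuJoCo, or any other real simulator API.

```python
import random


class Corridor:
    """Hypothetical toy environment: rooms 0..n_rooms-1 in a line.

    The observation is simply the index of the room the agent is in.
    """

    def __init__(self, n_rooms=10, start=5):
        self.n_rooms = n_rooms
        self.room = start

    def step(self, action):
        # action is -1 (move left) or +1 (move right); walls clamp the move.
        self.room = min(self.n_rooms - 1, max(0, self.room + action))
        return self.room


def collect(env, n_steps, seed=0):
    """Gather observations with a random policy; each depends on the last action."""
    rng = random.Random(seed)
    return [env.step(rng.choice((-1, 1))) for _ in range(n_steps)]


trajectory = collect(Corridor(), 20)

# For contrast: independent draws, as the train/validation/test protocol assumes.
iid_rng = random.Random(0)
iid_samples = [iid_rng.randrange(10) for _ in range(20)]

print(trajectory)   # consecutive observations are always neighboring rooms
print(iid_samples)  # no such constraint between consecutive samples
```

In the trajectory, what the agent sees next is determined by where it chose to go, so the samples are neither independent nor exchangeable.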

Starting point is 00:38:38 So that's where the field is going really, this kind of environment. Now back to the question of AGI. Like, I don't like the term AGI, because it implies that human intelligence is general, and human intelligence is nothing like general. It's very, very specialized. We think it's general, we like to think of ourselves as having general intelligence. We don't, we're very specialized. We're only slightly more general than... Why does it feel general? So you kind of, the term general, I think what's impressive about humans

Starting point is 00:39:11 is the ability to learn, as we were talking about learning, to learn in just so many different domains. It's perhaps not arbitrarily general, but just you can learn in many domains and integrate that knowledge somehow. Okay. The knowledge persists. So let me take a very specific example.

Starting point is 00:39:28 Yes. It's not an example. It's more like a quasi-mathematical demonstration. So you have about one million fibers coming out of one of your eyes, okay, two million total. But let's talk about just one of them. It's one million nerve fibers, your optic nerve. Let's imagine that they are binary, so they can be active or inactive. So the input to your visual cortex is 1 million bits.

Starting point is 00:39:54 Now they're connected to your brain in a particular way, and your brain has connections that are kind of a little bit like a convolutional network, they're kind of local, you know, in space and things like this. Now imagine I play a trick on you. It's a pretty nasty trick, I admit. I cut your optic nerve and I put in a device that makes a random permutation of all the nerve fibers.

Starting point is 00:40:18 So now what comes to your brain is a fixed but random permutation of all the pixels. There's no way in hell that your visual cortex, even if I do this to you in infancy, will actually learn vision to the same level of quality that you can. Got it. And you're saying there's no way you'd relearn that? No, because now two pixels that are nearby in the world will end up in very different places in your visual cortex. And your neurons there have no connections with each other because they're only connected locally.
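
The effect of the thought experiment can be sketched numerically: after a fixed random permutation, inputs that were neighbors end up far apart, so purely local connections no longer see related inputs together. This is a minimal one-dimensional illustration; the fiber count of 1,000 (rather than one million) and the helper name are assumptions made to keep the example small.

```python
import random


def adjacent_gap(order):
    """Mean distance, after rewiring, between inputs that were neighbors before.

    order[slot] is the original pixel index now delivered to position `slot`.
    """
    where = {pixel: slot for slot, pixel in enumerate(order)}
    n = len(order)
    return sum(abs(where[i] - where[i + 1]) for i in range(n - 1)) / (n - 1)


n = 1000
identity = list(range(n))           # intact wiring: neighbors stay neighbors
scrambled = identity[:]
random.Random(0).shuffle(scrambled)  # fixed but random permutation

print(adjacent_gap(identity))   # neighboring pixels arrive one slot apart
print(adjacent_gap(scrambled))  # neighboring pixels arrive hundreds of slots apart
```

No information is lost by the permutation, since it can be inverted. What is lost is the match between the structure of the input and the local connectivity of the system receiving it.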

Starting point is 00:40:50 So this whole, our entire, the hardware is built in many ways to support the locality of the real world. Yes, that's specialization. Yeah, but it's still pretty damn impressive. So it's not perfect generalization. It's not even close. No, no, it's not even close.

Starting point is 00:41:07 It's not at all. Yes, not. It's specialized. So how many Boolean functions? So let's imagine you want to train your visual system to recognize particular patterns of those 1 million bits. So that's a Boolean function. Either the pattern is here or not here.

Starting point is 00:41:24 It's a two-way classification with 1 million binary inputs. How many such Boolean functions are there? Okay. You have two to the 1 million combinations of inputs. For each of those, you have an output bit. And so you have two to the two to the 1 million Boolean functions of this type, okay, which is an unimaginably large number. How many of those functions can actually be computed by your visual cortex? And the answer is a tiny tiny tiny tiny tiny tiny tiny sliver,

Starting point is 00:41:58 like an enormously tiny sliver. So we are ridiculously specialized. Okay, but that's an argument against the word general. I think there's a, I agree with your intuition, but I'm not sure, it seems the brain is impressively capable of adjusting to things. So it's because we can't imagine tasks that are outside of our comprehension. So we think we're general, because we're general for all the things that we can comprehend. But there is a huge world out there of things that we have no idea about. We call that heat, by the way. Heat. So physicists call that heat, or they call it entropy, which is...

Starting point is 00:42:49 That's true. You know, you have a thing full of gas, right? A closed system full of gas, right? Closed or not closed. It has, you know, pressure, it has temperature, and you can write equations, PV = nRT, things like that. When you reduce the volume, the temperature goes up, the pressure goes up, things like that, for a perfect gas at least. Those are the

Starting point is 00:43:19 things you can know about that system. And it's a tiny, tiny number of bits compared to the complete information of the state of the entire system, because the state of the entire system will give you the position and momentum of every molecule of the gas. And what you don't know about it is the entropy, and you interpret it as heat. The energy contained in that thing is what we call heat. Now, it's very possible that in fact there is some very strong structure in how those molecules are moving, it's just that they are moving in a way that we are just not wired to perceive. Yeah, we're ignorant to it. And there's an infinite amount of things we're not

Starting point is 00:44:01 wired to perceive. Yeah. And you're right, that's a nice way to put it. We're general to all the things we can imagine, which is a very tiny subset of all things that are possible. It's like Kolmogorov complexity, or the Kolmogorov-Chaitin-Solomonoff complexity. You know, every bit string or every integer is random, except for all the ones that you can actually write down. Yeah, okay, so beautiful, but you know, so we can just call it artificial intelligence. We don't need to have a "general". Or human level. Human-level intelligence is a good one.

Starting point is 00:44:37 You know, any time you touch human, it gets interesting because, you know, it's because we attach ourselves to human and it's difficult to define what human intelligence is. Nevertheless, my definition is maybe a damn impressive intelligence. Okay, damn impressive demonstration of intelligence, whatever. And so on that topic, most successes in deep learning have been in supervised learning. What is your view on unsupervised learning? Is there hope to reduce involvement of human input and still have successful systems that have practical use? Yeah, I mean, there's definitely a hope.

Starting point is 00:45:26 It's more than a hope actually. It's mounting evidence for it. And that's basically all I do, like the only thing I'm interested in at the moment is, I call it self-supervised learning, not unsupervised. Because unsupervised learning is a loaded term. People who know something about machine learning, you know, tell you, so you're doing clustering or PCA, right? Which is not the case. And the wide public, you know,

Starting point is 00:45:49 when you say unsupervised learning, oh my god, you know, machines are going to learn by themselves and without supervision, you know, they see this as, where's the parents? Yeah. So I call it self-supervised learning because in fact, the underlying algorithms that are used are the same algorithms as the supervised learning algorithms, except that what we train them to do is not predict a particular set of variables like the category of an image, and not to predict a set of variables that have been provided by human labelers. But what you're training the machine to do is basically reconstruct a piece of its input that is being masked out, essentially. You can think of it this way.

Starting point is 00:46:32 So show a piece of video to a machine and ask it to predict what's going to happen next. And of course, after a while, you can show what happens and the machine will kind of train itself to do better at that task. All the latest, most successful models in natural language processing use self-supervised learning. You know, sort of BERT-style systems, for example, right? You show it a window of a thousand words on a text corpus.

Starting point is 00:47:00 You take out 15% of the words, and then you train the machine to predict the words that are missing. That's self-supervised learning. It's not predicting the future, it's just predicting things in the middle, but you could have it predict the future, that's what language models do. So you construct, so in an unsupervised way, you construct a model of language. Or video, or the physical world, or whatever, right? How far do you think that can take us? Do you think BERT

Starting point is 00:47:31 understands anything? To some level, it has a shallow understanding of text, but it needs to, I mean, to have kind of true human level intelligence, I think you need to ground language in reality. So some people are attempting to do this, right? Having systems that kind of have some visual representation of what is being talked about, which is one reason you need those interactive environments, actually. But this is like a huge technical problem that is not solved, and that explains why

Starting point is 00:48:04 self-supervised learning works in the context of natural language, but does not work in the context, or at least not well, in the context of image recognition and video, although it's making progress quickly. And the reason is the fact that it's much easier to represent uncertainty in the prediction in the context of natural language than it is in the context of things like video and images. So, for example, if I ask you to predict what words are missing, you know, 15% of the words that are taken out.

Starting point is 00:48:34 The possibilities are small. I mean small, right? There are a hundred thousand words in the lexicon. And what the machine spits out is a big probability vector, right? It's a bunch of numbers between 0 and 1 that sum to 1. And we know how to do this with computers. So there, representing uncertainty in the prediction is relatively easy. And that's in my opinion why those techniques work for NLP.
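
The two ingredients described above, masking roughly 15% of the words and representing the prediction as a probability vector over the lexicon, can be sketched as follows. This is a toy illustration, not BERT itself: the `mask_tokens` helper, the `[MASK]` string, the four-word vocabulary, and the hand-picked scores are all assumptions, and no model is trained.

```python
import math
import random


def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Hide a fraction of tokens; return the masked sequence and the hidden targets."""
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = mask_token
    targets = {p: tokens[p] for p in positions}
    return masked, targets


def softmax(scores):
    """Turn arbitrary scores into numbers between 0 and 1 that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


sentence = ("the cat sat on the mat and looked at the dog "
            "in the yard while it rained all day long")
tokens = sentence.split()
masked, targets = mask_tokens(tokens)

# A model would output one score per vocabulary word for each masked slot.
# These scores are made up purely to show the shape of the output.
vocab = ["cat", "dog", "mat", "yard"]
probs = softmax([2.0, 1.0, 0.5, -1.0])

print(masked)
print(targets)
print(dict(zip(vocab, probs)))
```

Because the set of possible answers is a finite vocabulary, uncertainty is just a list of probabilities. Several plausible words can each receive some mass.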

Starting point is 00:48:59 For images, if you block a piece of an image and you have a system reconstruct that piece of the image, there are many possible answers. They are all perfectly legit. And how do you represent this set of possible answers? You can't train a system to make one prediction. You can't train a neural net to say, here it is. That's the image. Because there's a whole set of things that are compatible with it. So how do you get the machine to represent not a single output, but a whole set of outputs? And, you know, similarly with video prediction, there's a lot of things that can happen in the future of a video. You're looking at me right now. I'm not moving my head very much,

Starting point is 00:49:39 but, you know, I might, you know, turn my head to the left or to the right. Right. If you don't have a system that can predict this, and you train it with least squares to kind of minimize the error between the prediction and what I'm doing, what you get is a blurry image of myself in all possible future positions that I might be in, which is not a good prediction. But so there might be other ways

Starting point is 00:50:00 to do the self-supervision, right? For visual scenes. Like what? If I knew I wouldn't tell you, I'd publish it first. I don't know. I know there might be. So I mean, these are kind of, there might be artificial ways of like self-play

Starting point is 00:50:19 in games, the way you can simulate part of the environment. You can... Oh, that doesn't solve the problem. It's just a way of generating data. But because you have more control, like, maybe you can control, yeah, it's a way to generate data. And that's right.

Starting point is 00:50:34 And because you can do huge amounts of data generation, that doesn't, you're right. Well, it creeps up on the problem from the side of data. And you don't think that's the right way to creep up. It doesn't solve this problem of handling uncertainty in the world, right? So if you have a machine learn a predictive model

Starting point is 00:50:52 of the world in a game that is deterministic or quasi-deterministic, it's easy, right? Just give a few frames of the game to a convnet, put a bunch of layers, and then have the game generate the next few frames. And if the game is deterministic, it works fine. And that includes feeding the system with the action that your little character is going to take. The problem comes from the fact that the real world and most games are not entirely predictable. So there you get those blurry predictions and you can't do planning with blurry predictions.
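
Why least-squares training yields blurry predictions can be seen in a one-number toy version of the head-turning example. Assume, for illustration only, that the future is summarized by a single value: -1.0 if the head turns left, +1.0 if it turns right, each happening half the time. The single prediction that minimizes mean squared error is then the average of the two, which corresponds to neither outcome.

```python
def mse(prediction, outcomes):
    """Mean squared error of one fixed prediction against many observed outcomes."""
    return sum((prediction - y) ** 2 for y in outcomes) / len(outcomes)


# Hypothetical data: the head turns left (-1.0) or right (+1.0), equally often.
outcomes = [-1.0] * 50 + [1.0] * 50

# Search over candidate single-valued predictions from -1.0 to 1.0.
candidates = [i / 10 for i in range(-10, 11)]
best = min(candidates, key=lambda p: mse(p, outcomes))

print(best)                 # the error-minimizing prediction sits in the middle
print(mse(best, outcomes))  # its error
print(mse(1.0, outcomes))   # error of committing to one real outcome
```

Committing to either real outcome is penalized more heavily than predicting the in-between value. In image space that in-between value is the superposition of all plausible futures, which looks like blur.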

Starting point is 00:51:30 So if you have a perfect model of the world, you can in your head run this model with a hypothesis for a sequence of actions, and you're going to predict the outcome of that sequence of actions. But if your model is imperfect, how can you plan? Yeah, it quickly explodes. What are your thoughts on the extension of this, a topic I'm super excited about? It's connected to something you're talking about in terms of robotics, is active learning. So as opposed to sort of completely unsupervised or self-supervised learning, you ask the system for human help for selecting parts you want to annotate next. So if you think about a robot

Starting point is 00:52:16 exploring a space or a baby exploring a space or a system exploring a data set, every once in a while asking for human input. Do you see value in that kind of work? I don't see transformative value. It's going to make things that we can already do more efficient or they will learn slightly more efficiently, but it's not going to make machines significantly more intelligent.

Starting point is 00:52:40 I think, and by the way, there is no opposition, there is no conflict between self-supervised learning, reinforcement learning, and supervised learning, or imitation learning, or active learning. I see self-supervised learning as a preliminary to all of the above. Yes. So the example I use very often is, how is it that, so if you use classical reinforcement learning, deep reinforcement learning if you want, the best methods today, so-called model-free reinforcement learning, to learn to play Atari games take about 80 hours of training to reach the level that any human can reach in about 15 minutes. They get better than humans, but it takes them a long time. AlphaStar, OK?

Starting point is 00:53:36 Oriol Vinyals and his team's system to play StarCraft plays a single map, a single type of player, and can reach better than human level with about the equivalent of 200 years of training playing against itself. It's 200 years, right? It's not something that any human could ever do. I mean, I'm not sure what to take away from that. Okay. Now, take those algorithms, the best RL algorithms we have today, to train a car to drive itself. It would probably have to drive millions of hours. It will have to kill thousands of pedestrians. It will have to run into thousands of trees.

Starting point is 00:54:24 It will have to run off cliffs, and it will have to run off cliffs multiple times before it figures out that it's a bad idea, first of all, and second of all, before it figures out how not to do it. And so, I mean, this type of learning obviously does not reflect the kind of learning that animals and humans do. There is something missing that's really, really important there. And my hypothesis, which I've been advocating for like five years now, is that we have predictive models of the world that include the ability to predict under uncertainty. And that's what allows us to not run off a cliff when we learn to drive. Most of us can learn to drive in about 20 or 30 hours

Starting point is 00:55:05 of training without ever crashing, causing any accident. If we drive next to a cliff, we know that if we turn the wheel to the right, the car is going to run off the cliff and nothing good is going to come out of this. Because we have a pretty good model of intuitive physics that tells us the car is going to fall. We know about gravity. Babies learn this around the age of eight or nine months.

Starting point is 00:55:25 That objects don't float, they fall. And we have a pretty good idea of the effect of turning the wheel on the car, and we know we need to stay on the road. So there is a lot of things that we bring to the table, which is basically our predictive model of the world. And that model allows us to not do stupid things and to basically stay within the context of things we need to do.
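
The role of the predictive model in this passage, checking the outcome of an action in your head before taking it, can be sketched with a deliberately tiny example. Everything here is a hypothetical toy: the road is a row of lateral positions 0 to 4 with a cliff beyond position 4, and `predict_next` stands in for a learned world model, which in reality would be far richer and uncertain.

```python
ROAD_MIN, ROAD_MAX = 0, 4  # lateral positions on the road; past ROAD_MAX is the cliff


def predict_next(position, steering):
    """Hypothetical world model: steering shifts the lateral position by one unit."""
    return position + steering


def safe_actions(position, actions=(-1, 0, 1)):
    """Keep only actions whose predicted outcome stays on the road."""
    return [a for a in actions if ROAD_MIN <= predict_next(position, a) <= ROAD_MAX]


def rollout(position, action_sequence):
    """Imagine a sequence of actions; return the final position, or None if it leaves the road."""
    for a in action_sequence:
        position = predict_next(position, a)
        if not ROAD_MIN <= position <= ROAD_MAX:
            return None
    return position


print(safe_actions(2))      # in the middle of the road
print(safe_actions(4))      # next to the cliff: steering right is ruled out
print(rollout(3, [1, 1]))   # imagined plan that goes over the edge
print(rollout(3, [1, -1]))  # imagined plan that stays on the road
```

The point of the sketch is that bad actions are rejected in imagination, without ever being tried in the real world. A model-free learner only discovers them by actually running off the cliff.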

Starting point is 00:55:48 We still face unpredictable situations, and that's how we learn. But that allows us to learn really, really, really quickly. So that's called model-based reinforcement learning. There's some imitation and supervised learning because we have a driving instructor that tells us occasionally what to do. But most of the learning is learning the model, learning physics that we've done since we were babies. That's where almost all the learning is. And the physics is somewhat transferable, it's transferable from scene to scene. Stupid things are the same everywhere. Yeah, I mean, if you, you know, you have an experience of the world, you don't need to be

Starting point is 00:56:27 from a particularly intelligent species to know that if you spill water from a container, you know, the rest is going to get wet. You might get wet. So you know, cats know this, right? So the main problem we need to solve is how do we learn models of the world? And that's what I'm interested in. That's what self-supervised learning is all about. If you were to try to construct a benchmark for, let's look at MNIST. I love that data set.

Starting point is 00:57:08 Do you think it's useful, interesting slash possible to perform well on MNIST with just one example of each digit, and how would we solve that problem? The answer is probably yes. The question is what other type of learning are you allowed to do? So if what you're allowed to do is train on some gigantic dataset of labeled digits, that's called transfer learning. And we know that works. Okay. We do this at Facebook, like in production, right? We train large convolutional nets to predict hashtags that people type on Instagram, and we train on billions of images, literally billions. And then we chop off the last layer and fine-tune on

Starting point is 00:57:40 whatever task we want. That works really well. You can beat, you know, the ImageNet record with this. We actually open-sourced the whole thing a few weeks ago. Yeah, that's still pretty cool. But yeah, so what would be impressive, what's useful and impressive, what kind of transfer learning would be useful and impressive?

Starting point is 00:57:57 Is it Wikipedia? That kind of thing? No, no. I don't think transfer learning is really where we should focus. We should try to do, you know, have a kind of scenario for a benchmark where you have unlabeled data. And it's a very large amount of unlabeled data. It could be video clips.

Starting point is 00:58:19 It could be where you do, you know, frame prediction. It could be images where you could choose to mask a piece of it, could be whatever, but they're unlabeled and you're not allowed to label them. So you do some training on this, and then you train on a particular supervised task, ImageNet or MNIST, and you measure how your test error or validation error

Starting point is 00:58:48 decreases as you increase the number of labeled training samples. And what you'd like to see is that your error decreases much faster than if you train from scratch, from random weights, so that to reach the same level of performance that a completely supervised, purely supervised system would reach, you would need way fewer samples. So that's the crucial question,

Starting point is 00:59:14 because it will answer the question for people interested in medical image analysis. If I want to get a particular level of error rate for this task, I know I need a million samples. Can I do, you know, self-supervised pre-training to reduce this to about 100 or something? And you think the answer there is self-supervised pre-training? Yeah. Some form of it. I'm telling you active learning, but you disagree. No, it's not useless.

Starting point is 00:59:47 It's just not going to lead to a quantum leap. It's just going to make things that we already do more efficient. So you're way smarter than me, I just disagree with you. But I don't have anything to back that. It's just intuition. So I work with a lot of large scale data sets and there's something that might be magic in active learning. But okay. At least I said it publicly.

Starting point is 01:00:07 At least I'm being an idiot publicly. Okay. It's not being an idiot. It's working with the data you have. I mean, certainly people are doing things like, okay, I have 3,000 hours of imitation learning for a self-driving car, but most of those are incredibly boring. What I'd like is to select, you know, 10% of them that are kind of the most informative. And with just that, I would probably reach the same. So it's a weak form of active learning if you want. Yes, but there might be a much stronger version. Yeah, that's right. And that's an open question, whether it exists. The question is how much stronger you can get. Elon Musk is confident, I talked to him recently.

Starting point is 01:00:47 He's confident that large-scale data and deep learning can solve the autonomous driving problem. What are your thoughts on the limits, possibilities of deep learning in this space? Of course, it's obviously part of the solution. I mean, I don't think we'll ever have a self-driving system, or at least not in the foreseeable future, that does not use deep learning. Let me put it this way. Now, how much of it?

Starting point is 01:01:08 So in the history of engineering, particularly AI-like systems, there's generally a first phase where everything is built by hand, and there is a second phase, and that was the case for autonomous driving, you know, 20, 30 years ago. There's a phase where a little bit of learning is used, but there's a lot of engineering that's involved in kind of, you know, taking care of corner cases and putting limits,

Starting point is 01:01:35 et cetera, because the learning system is not perfect. And then as technology progresses, we end up relying more and more on learning. That's the history of character recognition, it's the history of speech recognition, now computer vision, natural language processing. And I think the same is going to happen with autonomous driving, that currently

Starting point is 01:01:54 the methods that are closest to providing some level of autonomy, some decent level of autonomy, where you don't expect a driver to kind of do anything, is where you constrain the world, so you only run within, you know, 100 square kilometers or square miles in Phoenix, where the weather is nice and the roads are wide, which is what Waymo is doing. You completely over-engineer the car with tons of lidars and sophisticated sensors that are too expensive for consumer cars,

Starting point is 01:02:26 but they're fine if you just run a fleet. And you engineer the hell out of everything else, you map the entire world, so you have a complete 3D model of everything. So the only thing that the perception system has to take care of is moving objects and construction and sort of things that weren't in your map. And you can engineer a good SLAM system and stuff, right?

Starting point is 01:02:50 So that's kind of the current approach that's closest to some level of autonomy. But I think eventually the long-term solution is going to rely more and more on learning and possibly using a combination of self-supervised learning and model-based reinforcement learning or something like that. But ultimately learning will be not just at the core, but really the fundamental part of the system. Yeah, it already is, but it will become more and more. What do you think it takes to build a system with human level intelligence? You talked about the AI system in the movie Her

Starting point is 01:03:24 being way out of reach, our current reach. This might be outdated as well, but is it still way out of reach? What would it take to build Her, do you think? So I can tell you the first two obstacles that we have to clear, but I don't know how many obstacles there are after this. So the image I usually use is that there is a bunch of mountains that we have to climb. And we can see the first one. But we don't know if there are 50 mountains behind it. And this might be a good sort of metaphor

Starting point is 01:03:52 for why AI researchers in the past have been overly optimistic about the results of AI. For example, Newell and Simon wrote the general problem solver, and they called it the general problem solver. Yeah, problem solver. Of course, the first thing you realize is that all the problems you want to solve are exponential. You can't actually use it for anything useful. But all you see is the first peak.

Starting point is 01:04:19 What are the first couple of peaks for Her? The first peak, which is precisely what I'm working on, is self-supervised learning. How do we get machines to learn models of the world by observation, kind of like babies and like young animals? So we've been working with, you know, cognitive scientists. So there's Emmanuel Dupoux, who's at FAIR in Paris

Starting point is 01:04:43 half-time, and is also a researcher at a French university. And he has this chart that shows at how many months of life baby humans can learn different concepts. And you can measure this in various ways. So things like distinguishing animate objects from inanimate objects, you can tell the difference at age two, three months. Whether an object is gonna stay stable,

Starting point is 01:05:16 or is gonna fall, you know, at about four months you can tell. You know, there are various things like this. And then things like gravity, the fact that objects are not supposed to float in the air but are supposed to fall, you learn this around the age of eight or nine months. If you look at a lot of eight-month-old babies,

Starting point is 01:05:33 you give them a bunch of toys on their high chair. First thing they do is they throw them on the ground and they look at them. It's because, you know, they're actively learning about gravity. Gravity, yeah. So they're not trying to annoy you, but they need to do the experiment, right? So how do we get machines to learn like babies, mostly by observation, with a little

Starting point is 01:05:54 bit of interaction, and learning those models of the world? Because I think that's really a crucial piece of an intelligent autonomous system. So if you think about the architecture of an intelligent autonomous system, it needs to have a predictive model of the world. So something that says, here is the state of the world at time T, here is the state of the world at time T plus 1 if I take this action. And it's not a single answer, it can be a distribution. Yeah, well, we don't know how to represent

Starting point is 01:06:20 distributions in high-dimensional continuous spaces. So it's got to be something weaker than that, okay? But with some representation of uncertainty. If you have that, then you can do what optimal control theorists call model predictive control, which means that you can run your model with a hypothesis for a sequence of actions

Starting point is 01:06:37 and then see the result. Now, the other thing you need is some sort of objective that you want to optimize. Am I reaching the goal of grabbing this object? Am I minimizing energy? Am I whatever? Right? So there is some sort of objective that you have to minimize. And so in your head, if you have this model, you can figure out the sequence of actions that will optimize your objective. That objective is something that ultimately is rooted in your basal ganglia, at least in the human brain. That's what the basal ganglia does, it

Starting point is 01:07:06 computes your level of contentment or miscontentment. I don't know if that's a word. Unhappiness, okay? Yeah, discontentment. Discontentment. Discontentment, maybe. And so your entire behavior is driven towards kind of minimizing that objective,

Starting point is 01:07:22 which is maximizing your contentment, computed by your basal ganglia. And what you have is an objective function, which is basically a predictor of what your basal ganglia is gonna tell you. So you're not gonna put your hand in a fire because you know it's gonna burn, and you're gonna get hurt.

Starting point is 01:07:40 And you're predicting this because of your model of the world, and your predictor of this objective. So if you have those three components, well, you have four components. You have the hard-wired contentment objective computer, if you want, calculator. And then you have the three components. One is the objective predictor, which basically predicts your level of contentment. One is the model of the world. And there's a third module I didn't mention, which is a module that will figure out the best course of action to optimize an objective given your model. Okay. Yeah.

Starting point is 01:08:21 Cool. It's a policy network or something like that, right? Now, you need those three components to act autonomously, intelligently. And you can be stupid in three different ways. You can be stupid because your model of the world is wrong. You can be stupid because your objective is not aligned with what you actually want to achieve. Okay.
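The components described above (a world model that predicts the state at time T plus 1 from the state at time T and an action, an objective to minimize, and a module that searches for the best course of action) can be illustrated with a very small sketch of model predictive control. Everything in it is an assumption made for illustration and is not from the conversation: the one-dimensional world, the names `world_model`, `objective`, `plan`, and `run_mpc`, the three-action set, and the use of exhaustive search in place of a learned policy network. The model is also deterministic, so it sidesteps the problem of representing uncertainty mentioned earlier.

```python
from itertools import product

# Hypothetical action set: step left, stay put, step right.
ACTIONS = (-1.0, 0.0, 1.0)


def world_model(state, action):
    """Predict the state at time t+1 from the state at time t and an action."""
    return state + action


def objective(state, action, goal):
    """Cost to minimize: distance to the goal plus a small energy penalty."""
    return abs(goal - state) + 0.1 * abs(action)


def plan(state, goal, horizon=3):
    """Roll the model forward for every candidate action sequence of length
    `horizon` and return the sequence with the lowest predicted total cost."""
    best_cost, best_seq = float("inf"), None
    for seq in product(ACTIONS, repeat=horizon):
        predicted, cost = state, 0.0
        for action in seq:
            predicted = world_model(predicted, action)
            cost += objective(predicted, action, goal)
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq


def run_mpc(state, goal, steps=6):
    """Model predictive control loop: plan, execute only the first action,
    observe the new state, then replan from there."""
    trajectory = [state]
    for _ in range(steps):
        first_action = plan(state, goal)[0]
        # In this toy setting the real world behaves exactly like the model.
        state = world_model(state, first_action)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    print(run_mpc(state=0.0, goal=4.0))
```

Exhaustive enumeration only works here because the action set and horizon are tiny; the number of candidate sequences grows exponentially with the horizon, which is one reason a learned module for proposing actions is attractive.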

Starting point is 01:08:44 In humans, that would be a psychopath. And then the third way you can be stupid is that you have the right model, you have the right objective, but you're unable to figure out a course of action to optimize your objective given your model. OK. Some people who are in charge of big countries

Starting point is 01:09:03 actually have all three that are wrong. All right. Which countries? I don't know. Okay. So, if we think about this agent, if you think about the movie Her, you've criticized the art project that is Sophia the robot.

Starting point is 01:09:22 And what that project essentially does is use our natural inclination to anthropomorphize things that look human and give them more. Do you think that could be used by AI systems like in the movie Her? So do you think that a body is needed to create a feeling of intelligence? Well, if Sophia was just an art piece, I would have no problem with it, but it's presented as something else. Let me add to that comment real quick,

Starting point is 01:09:52 if the creators of Sophia could change something about their marketing or behavior in general, what would it be? Just about everything. I mean, don't you think, here's a tough question. Let me, so I agree with you. So the general public feels that Sophia can do way more than she actually can. That's right.

Starting point is 01:10:17 And the people who created Sophia are not honestly, publicly communicating, trying to teach the public. But here's a tough question. Don't you think that, in the same way, scientists in industry and research are taking advantage of the same misunderstanding in the public when they create AI companies or publish stuff? Some companies, yes. I mean, there is no sense of, there's no desire to delude. There is no

Starting point is 01:10:52 desire to kind of overclaim something. Right. You publish a paper on AI that has this result on ImageNet. It's pretty clear. I mean, it's not even interesting anymore, but I don't think there is that. I mean, the reviewers are generally not very forgiving of unsupported claims of this type. But there are certainly quite a few startups that have had a huge amount of hype around this that I find extremely damaging, and I have been calling it out when I've seen it. So yeah, but to go back to your original question, like the necessity of embodiment, I think embodiment is necessary.

Starting point is 01:11:32 I think grounding is necessary. So I don't think we're gonna get machines that really understand language without some level of grounding in the world. And it's not clear to me that language is a high enough bandwidth medium to communicate how the real world works. I think for this, we should start with what grounding means.

Starting point is 01:11:49 So grounding means that, so there is this classic problem of common sense reasoning, you know, the Winograd schema, right? And so I tell you, the trophy doesn't fit in the suitcase because it's too big, or the trophy doesn't fit in the suitcase because it's too small. And the "it" in the first case refers to the trophy, in the second case to the suitcase. And the reason you can figure this out is because you know what the trophy and the suitcase are, you know one is supposed to fit in the other one, and you know the notion of size, and that a big object doesn't fit in a small object unless it's a TARDIS, you know, things like that, right? So you have this knowledge of how the world works,

Starting point is 01:12:25 of geometry and things like that. I don't believe you can learn everything about the world by just being told in language how the world works. I think you need some low-level perception of the world, you know, be it visual, touch, you know, whatever, but some higher-bandwidth perception of the world. So by reading all the world's text, you still may not have enough information. That's right.

Starting point is 01:12:49 There's a lot of things that just will never appear in text, and that you can't really infer. So I think common sense will emerge from, you know, certainly a lot of language interaction, but also from watching videos, or perhaps even interacting in virtual environments. And possibly robots interacting in the real world. But I don't actually believe that this last one is absolutely necessary. But I think there's a need for some grounding. But the final product doesn't necessarily need to be embodied.

Starting point is 01:13:20 Is that what you're saying? It just needs to have an awareness, a grounding. Right. But it needs to know how the world works to, you know, not be frustrating to talk to. And you talked about emotions being important. That's a whole other topic. Well, so, you know, I talked about the basal ganglia as the thing that calculates your level of discontentment or contentment. And then there is this other module that sort of tries to do a prediction of

Starting point is 01:13:54 whether you're gonna be content or not. That's the source of some emotion. So fear, for example, is an anticipation of bad things that can happen to you, right? You have this inkling that there is some chance that something really bad is going to happen to you and that creates fear. When you know for sure that something bad is going to happen

Starting point is 01:14:11 to you, you kind of give up, right? It's not fear anymore. It's uncertainty that creates fear. So the punchline is, we're not going to have autonomous intelligence without emotions. Whatever the heck emotions are. So you mentioned very practical things like fear, but there's a lot of other mess around it. But they are the results of drives. Yeah,

Starting point is 01:14:34 there's deeper biological stuff going on. And I've talked to a few folks on this. There's fascinating stuff that ultimately connects to our brain. If we create an AGI system, a human-level intelligence system, and you get to ask her one question, what would that question be? You know, I think the first one we'll create will probably not be that smart. They'll be like a four-year-old. Okay. So you would have to ask her a question to know she's not that smart.

Starting point is 01:15:09 Yeah. Well, what's a good question to ask, to be as good as a person? And if she answers, oh, it's because the leaves of the tree are moving and they create a wind, she's onto something. And if she says, that's a stupid question, she's really onto something. No. And then you tell her, actually,

Starting point is 01:15:32 you know, here is the real thing. And she says, oh, yeah, that makes sense. So questions that reveal the ability to do common sense reasoning about the physical world. Yeah, and you'll summon up a little call to any of your friends. Well, it was a huge honor. Congratulations on your Turing Award. Thank you so much for talking today. Thank you.
