Lex Fridman Podcast - Chris Lattner: Compilers, LLVM, Swift, TPU, and ML Accelerators

Starting point is 00:00:00 The following is a conversation with Chris Lattner. Currently, he's a senior director at Google, working on several projects, including CPU, GPU, TPU accelerators for TensorFlow, Swift for TensorFlow, and all kinds of machine learning compiler magic going on behind the scenes. He's one of the top experts in the world on compiler technologies, which means he deeply understands

Starting point is 00:00:22 the intricacies of how hardware and software come together to create efficient code. He created the LLVM compiler infrastructure project and the Clang compiler. He led major engineering efforts at Apple, including the creation of the Swift programming language. He also briefly spent time at Tesla as vice president of Autopilot software during the transition from Autopilot hardware 1 to hardware 2, when Tesla essentially started from scratch to build an in-house software infrastructure for Autopilot. I could have easily talked to Chris for many more hours.

Starting point is 00:00:58 Compiling code down across the levels of abstraction is one of the most fundamental and fascinating aspects of what computers do, and he is one of the world experts in this process. It's rigorous science, and it's messy, beautiful art. This conversation is part of the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D. And now, here's my conversation with Chris Lattner. What was the first program you've ever written?

Starting point is 00:01:49 My first program? Back then, what was it? I think I started as a kid, and my parents got a BASIC programming book. So when I started, it was typing out programs from a book and seeing how they worked, and then typing them in wrong and trying to figure out why they were not working right. That kind of stuff. So BASIC. What was the first language that you remember yourself maybe falling in love with, like really connecting with?

Starting point is 00:02:16 I don't know. I mean, I feel like I've learned a lot along the way, and each of them has a different special thing about it. So I started on BASIC and then went to, like, GW-BASIC, which was the thing back in the DOS days, and then upgraded to QBasic and eventually QuickBASIC, which are all slightly more fancy versions of Microsoft BASIC. Made the jump to Pascal and started doing machine language programming

Starting point is 00:02:40 in assembly in Pascal, which was really cool. Turbo Pascal was amazing for its day. Eventually got into C, C++, and then did lots of other weird things. I feel like you took the dark path, which is the... You could have gone Lisp. Yeah. You could have gone the higher level, sort of functional, philosophical, hippie route.

Starting point is 00:03:01 Instead, you went into, like, the dark arts of C. Straight into the machine. Straight to the machine. So I started with BASIC, Pascal, and then assembly, and then wrote a lot of assembly. And eventually did Smalltalk and other things like that. But that was not the starting point. But so what is this journey to C? Is that in high school? Is that in college? That was in high school. Yeah. And then that was really about trying to be able to do more powerful things than what

Starting point is 00:03:32 Pascal could do, and also to learn a different world. So C was really confusing to me, with pointers and the syntax and everything, and it took a while. But Pascal's much more principled in various ways. C has its historical roots, but it's not as easy to learn. With pointers, there's this memory management thing that you have to become conscious of. Is that the first time you start to understand that there's resources that you're

Starting point is 00:04:02 supposed to manage? Well, so you have that in Pascal as well, but in Pascal it's, like, the caret instead of the star, and there's some small differences like that, but it's not about pointer arithmetic. And in C, you end up thinking about how things get laid out in memory a lot more. And so in Pascal, you have allocating and deallocating and owning the memory, but just the programs are simpler and you don't have to... Well, for example, Pascal has a string type, and so you can think about a string instead of an array of characters

Starting point is 00:04:32 which are consecutive in memory. Yeah, so it's a little bit of a higher level abstraction. So let's get into it. Let's talk about LLVM, C-lang, and compilers. Sure. So can you tell me first what LLVM and C-lang are? And how is it that you find yourself the creator and lead developer of one of the most powerful compiler optimization systems in use today? Sure.

Starting point is 00:04:57 So I guess they're different things. So let's start with what is a compiler? Is that a good place to start? What are the phases of a compiler? What are the parts? Yeah, what is it? So what is even a compiler used for? So the way I look at this is you have a two-sided problem: you have humans that need to write code, and then you have machines that need to run the program that the human wrote. And for lots of reasons, the humans don't want to be writing in binary and don't want to think about every piece of hardware. And so at the same time that you have lots of humans, you also have lots of kinds of hardware.

Starting point is 00:05:31 And so compilers are the art of allowing humans to think at a level of abstraction that they want to think about, and then get that program, get the thing that they wrote, to run on a specific piece of hardware. And the interesting and exciting part of all this is that there's now lots of different kinds of hardware: chips like x86 and PowerPC and ARM and things like that, but also high performance accelerators for machine learning and other things like that, or also just different kinds of hardware. GPUs, these are new kinds of hardware. And at the same time, on the programming side of it, you have BASIC, you have C, you have JavaScript, you have Python, you have Swift,

Starting point is 00:06:07 you have lots of other languages that are all trying to talk to the human in a different way to make them more expressive and capable and powerful. And so compilers are the thing that goes from one to the other. End to end, from the very beginning to the end? End to end. And so you go from what the human wrote, and programming languages end up being about expressing intent, not just for the compiler and the hardware,

Starting point is 00:06:33 but the programming language's job is really to capture an expression of what the programmer wanted that then can be maintained and adapted and evolved by other humans, as well as interpreted by the compiler. So when you look at this problem, you have on the one hand humans, which are complicated, and you have hardware, which is complicated. And so compilers typically work in multiple phases. And so the software engineering challenge that you have here is to try to get maximum reuse out of the amount of code

Starting point is 00:07:03 that you write, because these compilers are very complicated. And so the way it typically works out is that you have something called a front end or a parser that is language specific. And so you have a C parser, and that's what Clang is, or C++ or JavaScript or Python or whatever. That's the front end. Then you'll have a middle part, which is often the optimizer, and then you'll have a late part, which is hardware specific.

Starting point is 00:07:30 And so compilers end up, there's many different layers often, but these three big groups are very common in compilers. And what LLVM is trying to do is trying to standardize that middle and last part. And so one of the cool things about LLVM is that there are a lot of different languages that compile through to it.

Starting point is 00:07:48 And so things like Swift, but also Julia, Rust, Clang for C, C++, Objective-C, like these are all very different languages, and they can all use the same optimization infrastructure, which gets better performance, and the same code generation infrastructure for hardware support. And so LLVM is really that layer that is common, that all these different specific compilers can use.
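
A minimal sketch of the front end / optimizer / back end split described above, assuming a toy language that only supports `+`. Every function name and the tuple-based intermediate form are hypothetical illustrations, not LLVM's actual API or IR. The point is only that two different front ends can feed one shared optimizer and any back end.

```python
# Toy three-phase compiler. All names are hypothetical; this is not LLVM's API.

def frontend_infix(src):
    """Front end 1: parse 'a + b' into the shared intermediate form."""
    lhs, op, rhs = src.split()
    return [("binop", op, lhs, rhs)]

def frontend_prefix(src):
    """Front end 2: parse '(+ a b)' into the same intermediate form."""
    op, lhs, rhs = src.strip("()").split()
    return [("binop", op, lhs, rhs)]

def optimize(ir):
    """Shared middle: fold additions whose operands are both integer literals."""
    out = []
    for kind, op, lhs, rhs in ir:
        if op == "+" and lhs.isdigit() and rhs.isdigit():
            out.append(("const", str(int(lhs) + int(rhs))))
        else:
            out.append((kind, op, lhs, rhs))
    return out

def backend_stack(ir):
    """Back end 1: emit code for an imaginary stack machine."""
    lines = []
    for inst in ir:
        if inst[0] == "const":
            lines.append(f"push {inst[1]}")
        else:
            _, _, lhs, rhs = inst
            lines += [f"push {lhs}", f"push {rhs}", "add"]
    return lines

def backend_register(ir):
    """Back end 2: emit code for an imaginary register machine."""
    lines = []
    for inst in ir:
        if inst[0] == "const":
            lines.append(f"mov r0, {inst[1]}")
        else:
            _, _, lhs, rhs = inst
            lines += [f"mov r0, {lhs}", f"add r0, {rhs}"]
    return lines

# Any front end can be paired with any back end through the shared middle.
print(backend_stack(optimize(frontend_infix("2 + 3"))))
print(backend_register(optimize(frontend_prefix("(+ x 1)"))))
```

With N front ends and M back ends, this layout needs N + M pieces plus one optimizer rather than N × M separate compilers, which is the reuse argument made in the conversation.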

Starting point is 00:08:12 And is it a standard, like a specification, or is it literally an implementation? It's an implementation. And so, I think there's a couple of different ways of looking at it, right? Because it depends on which angle you're looking at it from. LLVM ends up being a bunch of code. Okay, so it's a bunch of code that people reuse and they build a compiler with. We call it a compiler infrastructure because it's kind of the underlying platform that you build a concrete compiler on top of.

Starting point is 00:08:39 But it's also a community. And the LLVM community is hundreds of people that all collaborate, and one of the most fascinating things about LLVM over the course of time is that we've managed somehow to successfully get harsh competitors in the commercial space to collaborate on shared infrastructure. And so you have Google and Apple, you have AMD and Intel, you have Nvidia and AMD on the graphics side, you have Cray and everybody else doing these things. And like all these companies are collaborating together to make that shared infrastructure

Starting point is 00:09:14 really, really great. And they do this not out of the goodness of their heart, but they do it because it's in their commercial interest of having really great infrastructure that they can build on top of, and facing the reality that it's so expensive that no one company, even the big companies, no one company really wants to implement it all themselves. Expensive or difficult? Both. That's a great point, because it's also about the skill sets, right? And the skill sets are very hard to find.

Starting point is 00:09:43 It's a lot bigger with open source projects of that kind. And LLVM is open source? Yes, it's open source. It's about 19 years old now, so it's fairly old. It seems like the magic often happens within a very small circle of people, at least at the early birth and whatever. Yes. So LLVM came from a university project. And so I was at the University of Illinois, and there it was myself, my advisor, and then a team of two or three research students in the research group, and we built many of the core pieces initially.

Starting point is 00:10:18 I then graduated and went to Apple, and at Apple brought it to the products. First in the OpenGL graphics stack, but eventually to the C compiler realm, and eventually built Clang, and eventually built Swift and these things, along the way building a team of people that are really amazing compiler engineers that helped build a lot of that. And so as it was gaining momentum and as Apple was using it,

Starting point is 00:10:40 being open source and public and encouraging contribution, many others, for example at Google, came in and started contributing. And in some cases, Google effectively owns Clang now, because it cares so much about C++ and the evolution of that ecosystem. And so it's investing a lot in the C++ world and the tooling and things like that.

Starting point is 00:11:00 And so likewise, Nvidia cares a lot about CUDA. And so CUDA uses Clang and uses LLVM for graphics and GPU. And so when you first started as a master's project, I guess, did you think it was gonna go as far as it went? Were you crazy ambitious about it? No. It seems like a really difficult undertaking, a brave one. Yeah, no, it was nothing like that.

Starting point is 00:11:28 So, I mean, my goal when I went to the University of Illinois was to get in and out with a non-thesis master's in a year and get back to work. So I was not planning to stay for five years and build this massive infrastructure. I got nerd sniped into staying. And a lot of it was because LLVM was fun. I was building cool stuff and learning really interesting things and facing both software engineering challenges but also learning how to work in a team and things like that.

Starting point is 00:11:57 I had worked at many companies as an intern before that, but it was really a different thing to have a team of people that were working together and trying to collaborate in version control, and it was just a little bit different. Like I said, I just talked to Don Knuth, and he believes that 2% of the world population have something weird with their brain, that they're geeks, they understand computers, they're connected with computers. He put it at exactly 2%. Okay, so he's a specific guy. It's very specific. He says, I can't prove it, but it's very empirically there. Is there something that attracts you to the idea of optimizing code?

Starting point is 00:12:33 And it seems like that's one of the biggest, coolest things about LLVM. Yeah, that's one of the major things it does. So I got into that because of a person, actually. So when I was in my undergraduate, I had an advisor, or a professor, named Steve Vegdahl. And I went to this little tiny private school. There were like seven or nine people in my computer science department, students in my class. So it was a very tiny, very small school. It was kind of a wart on the side of the math department,

Starting point is 00:13:07 kind of a thing at the time. I think it's evolved a lot in the many years since then, but Steve Vegdahl was a compiler guy, and he was super passionate. And his passion rubbed off on me, and one of the things I like about compilers is that they're large, complicated software pieces. And so one of the culminating classes

Starting point is 00:13:30 that many computer science departments, at least at the time, did was to say that you would take algorithms and data structures and all these core classes, but then the compilers class was one of the last classes you take because it pulls everything together. And then you work on one piece of code over the entire semester.

Starting point is 00:13:45 And so you keep building on your own work, which is really interesting. It's also very challenging because in many classes, if you don't get a project done, you just forget about it and move on to the next one and get your B or whatever it is. But here you have to live with the decisions you make and continue to reinvest in it. And I really like that. And so I did an extra study project with him the following semester and he was just really great. And he was also a great mentor in a lot of ways.

Starting point is 00:14:14 And so from him and from his advice, he encouraged me to go to graduate school. I wasn't super excited about going to grad school. I wanted the master's degree, but I didn't want to be an academic. But like I said, I kind of got tricked into staying and was having a lot of fun. And I definitely do not regret it. What aspects of compilers were the things you connected with? So, LLVM, there's also

Starting point is 00:14:37 the other part that's just really interesting, if you're interested in languages, is parsing and just analyzing, like, yeah, analyzing language, breaking it down, parsing and so on. Was that interesting to you, or were you more interested in optimization? For me, it was more... so I'm not really a math person. I could do math. I understand some bits of it when I get into it, but math is never the thing that attracted me. And so a lot of the parser part of the compiler has a lot of good formal theories that Don, for example, knows quite well. Still waiting for his book on that. But I just like building a thing and seeing what it could do and exploring and getting it to do more things and then setting new goals and reaching for them. And

Starting point is 00:15:21 in the case of LLVM, when I started working on that, my research advisor that I was working for was a compiler guy. And so he and I specifically found each other because we were both interested in compilers. And so I started working with him and taking his class. And a lot of LLVM initially was, it's fun implementing all the standard algorithms and all the things that people had been talking about and were well known. And they were in the curricula for advanced studies in compilers. And so just being able to build that was really fun.

Starting point is 00:15:51 And I was learning a lot by, instead of reading about it, just building. And so I enjoyed that. So you said compilers are these complicated systems. Can you even just with language try to describe, you know, how you turn a C++ program into code? Like, what are the hard parts? Why is it so hard? So I'll give you examples of the hard parts along the way. So C++ is a very complicated programming language. It's something like 1,400 pages in the spec. So C++ by itself is crazy complicated. Can you just pause? What makes the language complicated in terms of what's syntactically... So it's what they call syntax, so the actual how the characters are arranged?

Starting point is 00:16:33 Yes, it's also semantics, how it behaves. It's also, in the case of C++, there's a huge amount of history. C++ is built on top of C. You play that forward, and then a bunch of suboptimal, in some cases, decisions were made, and they compound, and then more and more and more things keep getting added to C++, and it will probably never stop. But the language is very complicated from that perspective. And so with the C and C++ compiler that I built, I and many people built, one of the challenges we took on was we looked at GCC. Okay, GCC at the time was like a really good industry standardized compiler that had really consolidated a lot of the other compilers in the world and was a standard. But it wasn't really great for research. The design was very difficult to work with.

Starting point is 00:17:29 And it was full of global variables and other things that made it very difficult to reuse in ways that it wasn't originally designed for. And so with Clang, one of the things that we wanted to do is push forward on a better user interface, so make error messages that are just better than GCC's. And that's actually hard because you have to do a lot of bookkeeping in an efficient way to be able to do that.

Starting point is 00:17:50 We wanted to make compile time better. And so compile time is about making it efficient, which is also really hard when you're keeping track of extra information. We wanted to make new tools available. So refactoring tools and other analysis tools that GCC never supported, also leveraging the extra information we kept, but enabling those new classes of tools that then get built

Starting point is 00:18:11 into IDEs. And so that's been one of the areas that Clang has really helped push the world forward in, is in the tooling for C and C++ and things like that. But C++ and the front-end piece is complicated and you have to build syntax trees and you have to check every rule in the spec and you have to turn that back into an error message to the human that the human can understand when they do something wrong.

Starting point is 00:18:35 But then you start doing what's called lowering. So going from C++ and the way that it represents code down to the machine. And when you do that, there's many different phases you go through. Often there are, I think LLVM has something like 150 different, what are called passes in the compiler, that the code passes through.

Starting point is 00:18:55 And these get organized in very complicated ways, which affect the generated code and performance and compile time and many other things. What are they passing through? So after you do the Clang parsing, what's the... is it a graph? What does it look like? What's the data structure here?

Starting point is 00:19:13 Yeah. So in the parser, it's usually a tree. And it's called an abstract syntax tree. And so the idea is you have a node for the plus that the human wrote in their code, or for the function call, you'll have a node for the call with the function that they call and the arguments they pass, things like that. This then gets lowered into what's called an intermediate representation. And intermediate representations are like, LLVM has one.
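
A minimal sketch of the kind of tree described here, assuming a made-up source expression `f(x + 1, 2)`. The node classes (`Num`, `Var`, `Add`, `Call`) are hypothetical and far simpler than the node types in Clang or any real parser.

```python
from dataclasses import dataclass

# Hypothetical AST node types, for illustration only.
@dataclass
class Num:
    value: int

@dataclass
class Var:
    name: str

@dataclass
class Add:           # the "node for the plus"
    left: object
    right: object

@dataclass
class Call:          # the "node for the call", holding callee and arguments
    callee: str
    args: list

# Tree for the source text: f(x + 1, 2)
tree = Call("f", [Add(Var("x"), Num(1)), Num(2)])

def show(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    pad = "  " * depth
    if isinstance(node, Num):
        return [f"{pad}Num {node.value}"]
    if isinstance(node, Var):
        return [f"{pad}Var {node.name}"]
    if isinstance(node, Add):
        return [f"{pad}Add"] + show(node.left, depth + 1) + show(node.right, depth + 1)
    lines = [f"{pad}Call {node.callee}"]
    for arg in node.args:
        lines += show(arg, depth + 1)
    return lines

print("\n".join(show(tree)))
```

The nesting in the printed output mirrors the nesting in the source: the addition sits underneath the call because it is one of the call's arguments.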

Starting point is 00:19:39 And there it's what's called a control flow graph. And so you represent each operation in the program as a very simple, like, this is going to add two numbers. This is going to multiply two things. This maybe will do a call. But then they get put in what are called blocks. And so you get blocks of these straight line operations, where instead of being nested like in a tree, it's straight line operations.
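
A minimal sketch of blocks of straight-line operations connected by branches, assuming a counting loop like `for (i = 0; i < n; i++) total += i;`. The instruction names and the four-block layout are hypothetical choices for illustration; real compilers, LLVM included, may split the blocks differently.

```python
# Each block is a flat list of simple operations ending in a jump, branch, or ret.
# The operation names are made up for this sketch.
cfg = {
    "entry": [("set", "i", 0), ("set", "total", 0), ("jump", "cond")],
    "cond":  [("lt", "t", "i", "n"), ("branch", "t", "body", "exit")],
    "body":  [("add", "total", "total", "i"), ("addi", "i", "i", 1), ("jump", "cond")],
    "exit":  [("ret", "total")],
}

def run(cfg, env):
    """Interpret the graph: run a block top to bottom, then follow its branch."""
    block = "entry"
    while True:
        for inst in cfg[block]:
            op = inst[0]
            if op == "set":
                env[inst[1]] = inst[2]
            elif op == "add":
                env[inst[1]] = env[inst[2]] + env[inst[3]]
            elif op == "addi":
                env[inst[1]] = env[inst[2]] + inst[3]
            elif op == "lt":
                env[inst[1]] = env[inst[2]] < env[inst[3]]
            elif op == "jump":
                block = inst[1]
            elif op == "branch":
                block = inst[2] if env[inst[1]] else inst[3]
            elif op == "ret":
                return env[inst[1]]

print(run(cfg, {"n": 5}))
```

Nothing in this form is nested: the loop exists only as the edge from `body` back to `cond`, which is what makes the representation closer to assembly than to the source syntax.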

Starting point is 00:20:04 And so there's a sequence and ordering to these operations. Within the block, or outside the block? That's within the block. And so it's a straight line sequence of operations within the block, and then you have branches, like conditional branches, between blocks. And so when you write a loop, for example, in a syntax tree, you would have a for node,

Starting point is 00:20:24 like for a for statement in a C-like language, you'd have a for node, and you have a pointer to the expression for the initializer, a pointer to the expression for the increment, a pointer to the expression for the comparison, a pointer to the body. Okay, and these are all nested underneath it. In a control flow graph, you get a block for the code that runs before the loop, so the initializer code, then you have a block for the body of the loop, and so the body of the loop code goes in there, but also the

Starting point is 00:20:51 increment and other things like that, and then you have a branch that goes back to the top, and a comparison and a branch that goes out. And so it's more of an assembly level kind of representation, but the nice thing about this level of representation is it's much more language independent. And so there's lots of different kinds of languages with different kinds of, you know, JavaScript has a lot of different ideas of what is false, for example, and all that can stay in the front end, but then that middle part can be shared across all those. How close is that intermediate representation to neural networks, for example?

Starting point is 00:21:27 Is everything described as a kind of echoes of a neural network graph? Are they neighbors or what? They're quite different in details, but very similar in idea. So one of the things that neural networks do is they learn representations for data at different levels of abstraction, and then they transform those through layers. So the compiler does very similar things, but one of the things the compiler does is it has relatively few different representations, where a neural network often, as you get deeper, for example,

Starting point is 00:22:00 you get many different representations, and each layer or set of ops is transforming between these different representations. In a compiler, often you get one representation and they do many transformations to it. These transformations are often applied iteratively. For programmers, these are familiar types of things. For example, trying to find expressions inside of a loop and pulling them out of the loop so they execute fewer times. So we find redundant computation, or do constant folding or other simplifications,

Starting point is 00:22:32 turning two times x into x shift left by one, and things like this are all the examples of the things that happen. But compilers end up getting a lot of theorem proving and other kinds of algorithms that try to find higher level properties of the program that then can be used by the optimizer. Cool. So what's like the biggest bang for the buck with optimization? Like today? Yeah. Well, no, not even today. At the very beginning, the 80s, a lot of it was things like register allocation. So the idea of, in a modern, like, a microprocessor, what you'll end up having is you'll end up having memory, which is relatively slow. And then you have registers, relatively fast, but registers, you don't have very many of them. Okay.
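
A minimal sketch of two of the simplifications mentioned in this passage, constant folding and turning two times x into a left shift, assuming integer arithmetic. The tuple-based expression form and the `simplify` function are hypothetical; a production optimizer works on its own IR and handles many more cases.

```python
def simplify(expr):
    """Simplify an expression that is an int, a variable name, or (op, lhs, rhs)."""
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    # Simplify the operands first so folds can cascade upward.
    lhs, rhs = simplify(lhs), simplify(rhs)
    # Constant folding: both operands are known, so compute the result now.
    if isinstance(lhs, int) and isinstance(rhs, int):
        if op == "+":
            return lhs + rhs
        if op == "*":
            return lhs * rhs
    # Strength reduction: for integers, 2 * x equals x << 1.
    if op == "*" and lhs == 2:
        return ("<<", rhs, 1)
    if op == "*" and rhs == 2:
        return ("<<", lhs, 1)
    return (op, lhs, rhs)

print(simplify(("*", 2, "x")))
print(simplify(("+", ("*", 3, 4), "y")))
print(simplify(("*", ("+", 1, 1), "x")))
```

The last call shows why such transformations are applied repeatedly: folding `1 + 1` into `2` is what exposes the multiplication to the shift rewrite.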

Starting point is 00:23:17 And so when you're writing a bunch of code, you're just saying, like, compute this, put it in a temporary variable, compute this, compute this, compute this, put it in a temporary variable, I have a loop, I have some other stuff going on. Well, now you're running on an x86, like a desktop PC or something. Well, it only has, in some cases, some modes, eight registers. Right. And so now the compiler has to choose what values get put in what registers at what points in the program. And this is actually a really big deal. So if you think about it, you have a loop, and a loop that executes millions of times maybe,

Starting point is 00:23:48 if you're doing loads and stores inside that loop, then it's gonna be really slow. But if you can somehow fit all the values inside that loop in registers, now it's really fast. And so getting that right requires a lot of work, because there's many different ways to do that. And often what the compiler ends up doing

Starting point is 00:24:04 is it ends up thinking about things in a different representation than what the human wrote. Right, you wrote int x. Well, the compiler thinks about that as four different values, each of which has different lifetimes across the function that it's in, and each of those could be put in a register or memory or different memory, or maybe in some parts of the code recomputed instead of stored and reloaded. And there are many of these different kinds of techniques that can be used. So it's adding almost like a time dimension.
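
A minimal sketch of the lifetime idea, assuming straight-line code in which every value is defined exactly once. It computes when each value is live and then hands out registers in a simplified linear-scan style, sending a value to memory when no register is free. This is an illustrative toy with hypothetical function names, not the allocator LLVM or any production compiler uses.

```python
def live_intervals(code):
    """code: list of (dest, [sources]). Returns {name: (defined_at, last_used_at)}."""
    intervals = {}
    for i, (dest, srcs) in enumerate(code):
        for s in srcs:
            start, _ = intervals[s]
            intervals[s] = (start, i)       # extend the lifetime to this use
        intervals[dest] = (i, i)            # assumes each name is defined once
    return intervals

def allocate(intervals, num_regs):
    """Assign each value a register name, or 'memory' if none is free."""
    free = [f"r{n}" for n in range(num_regs)]
    active = []                             # (end, name) for values held in registers
    where = {}
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Release registers whose values are never used after this point.
        for old_end, old in list(active):
            if old_end <= start:
                active.remove((old_end, old))
                free.append(where[old])
        if free:
            where[name] = free.pop(0)
            active.append((end, name))
        else:
            where[name] = "memory"
    return where

code = [
    ("a", []),            # 0: a = ...
    ("b", []),            # 1: b = ...
    ("c", ["a", "b"]),    # 2: c = a + b  (last use of a and b)
    ("d", ["c"]),         # 3: d = f(c)   (last use of c)
]
intervals = live_intervals(code)
print(allocate(intervals, num_regs=2))
print(allocate(intervals, num_regs=1))
```

With two registers, `c` can reuse the register that held `a`, because `a` is dead by then. With only one register, `b` has to live in memory, which is the slow path the conversation describes.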

Starting point is 00:24:32 It's trying to optimize across time. So it's considering, when you're programming, you're not thinking in that. Yeah, absolutely. And so the RISC era made things... So RISC chips, R-I-S-C, the RISC chips as opposed to CISC chips, the RISC chips made things more complicated for the compiler, because what they ended up doing is ending up adding pipelines

Starting point is 00:24:58 to the processor, where the processor can do more than one thing at a time. But this means that the order of operations matters a lot. And so one of the classical compiler techniques that you use is called scheduling. And so moving the instructions around so that the processor can, like, keep its pipelines full instead of stalling and getting blocked. And so there's a lot of things like that

Starting point is 00:25:18 that are kind of bread and butter compiler techniques that have been studied a lot over the course of decades now. But the engineering side of making them real is also still quite hard. And you talk about machine learning, this is a huge opportunity for machine learning, because many of these algorithms are full of these, like, hokey hand-rolled heuristics, which work well on specific benchmarks but don't generalize, and full of magic numbers. And, you know, I hear there's some techniques that are good at handling that. So what would be the, if you were to apply machine learning

Starting point is 00:25:49 to this, what's the thing you try to optimize? Is it ultimately the running time? Yeah, yeah. You can pick your metric and there's running time, there's memory use, there's lots of different things that you can optimize for. Code size is another one that some people care about in the embedded space.

Starting point is 00:26:05 Is this like the thinking into the future, or has somebody actually been crazy enough to try to have machine learning based parameter tuning for optimization of compilers? So this is something that is, I would say, research right now. There are a lot of research systems that have been applying search in various forms, and using reinforcement learning is one form, but also brute-force search has been tried for quite a while. And usually these are in small problem spaces.

Starting point is 00:26:35 So find the optimal way to code generate a matrix multiply for a GPU, something like that, where you say, there's a lot of design space of, do you unroll loops a lot? Do you execute multiple things in parallel? And there's many different confounding factors here, because graphics cards have different numbers of threads and registers and execution ports and memory bandwidth, and many different constraints interact in non-linear ways. And so search is very powerful for that. And it gets used in certain ways, but it's not very structured. This is something that we need. We as an industry need to fix.
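
A minimal sketch of brute-force search over a small design space, assuming just two knobs (an unroll factor and a tile size) and a cost function invented purely for this illustration. The numbers in `estimated_cost` are made up and do not model any real GPU; a real tuner would compile each candidate and time it on hardware instead.

```python
from itertools import product

def estimated_cost(unroll, tile, registers=16):
    """Made-up cost model, for illustration only (lower is better)."""
    loop_overhead = 64 / unroll             # more unrolling, fewer branch checks
    cache_misses = 256 / tile               # bigger tiles, better data reuse
    # Past a point, both knobs together need more registers than exist,
    # and the resulting spills are expensive. This is the non-linear interaction.
    spill_penalty = max(0, unroll * tile // 4 - registers) * 10
    return loop_overhead + cache_misses + spill_penalty

def search(unrolls=(1, 2, 4, 8), tiles=(4, 8, 16, 32)):
    """Try every combination and keep the cheapest one."""
    return min(product(unrolls, tiles), key=lambda cfg: estimated_cost(*cfg))

best = search()
print(best, estimated_cost(*best))
```

Neither knob can be tuned in isolation here: the largest unroll factor and the largest tile size are each attractive on their own, but together they blow the register budget, so the best point is found only by looking at combinations.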

Starting point is 00:27:11 So you said 80s, but like, so have there been big jumps in improvement and optimization? Yeah. Yeah, since then, what's... Yeah, so it's largely been driven by hardware. So, well, hardware and software. So in the mid 90s, Java totally changed the world, right? And I'm still amazed by how much change was introduced. In a good way? In a good way. So like reflecting back, Java introduced things like, it all at once introduced things like JIT compilation. None of these were novel, but it pulled it together and made it mainstream and made people invest in it.

Starting point is 00:27:47 JIT compilation, garbage collection, portable code, safe code, like memory safe code, like a very dynamic dispatched execution model. Like many of these things, which had been done in research systems and had been done in small ways in various places, really came to the forefront and really changed how things worked and therefore changed the way people thought about the problem. JavaScript was another major world change based on the way it works.

Starting point is 00:28:16 But also on the hardware side of things, multicore and vector instructions really changed the problem space. They don't remove any of the problems that compilers faced in the past, but they add new kinds of problems of how do you find enough work to keep a four-wide vector busy? Or if you're doing a matrix multiplication, how do you do different columns out of that matrix at the same time, and how do you maximally utilize the arithmetic compute that one core has, and then how do you take it to multiple

Starting point is 00:28:49 cores? How did the whole virtual machine thing change the compilation pipeline? Yeah, so what the Java virtual machine does is it splits just like I was talking about before, where you have a front end that parses the code, and then you have an intermediate representation that gets transformed. What Java did was they said, we will parse the code and then compile to what's known as Java bytecode. That bytecode is now a portable code representation that is industry standard and locked down and can't change.

Starting point is 00:29:19 Then the back part of the compiler, the optimization and code generation, can now be built by different vendors. And Java bytecode can be shipped around across the wire. It's memory safe and relatively trusted. And because of that, it can run in the browser. And that's why it runs in the browser.
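
A minimal sketch of the split described here, assuming a toy stack-based instruction set with only `push`, `add`, and `mul`. This is not real Java bytecode and the function names are hypothetical. The idea it illustrates is that the producer compiles once to a fixed portable format, and the consumer only needs something that understands that format.

```python
def compile_expr(expr):
    """Producer side: turn a nested (op, lhs, rhs) expression into flat bytecode."""
    if isinstance(expr, int):
        return [("push", expr)]
    op, lhs, rhs = expr
    return compile_expr(lhs) + compile_expr(rhs) + [(op, None)]

def run(bytecode):
    """Consumer side: a tiny virtual machine that executes the bytecode."""
    stack = []
    for op, arg in bytecode:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            # The VM refuses anything outside the fixed instruction set.
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()

bytecode = compile_expr(("add", 2, ("mul", 3, 4)))   # 2 + 3 * 4
print(bytecode)
print(run(bytecode))
```

Because the bytecode is just data, it could be written to a file or sent over a network, and a different `run` implementation on another machine could execute or further compile it without ever seeing the original source.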

Starting point is 00:29:37 And so that way, again, back in the day, you would write a Java applet, and you'd use it as a web developer. You'd build this mini app that would run on a web page. Well, a user of that is running a web browser on their computer. You download that Java bytecode, which can be trusted, and then you do all the compiler stuff on your machine. So you know that you trust that. Was that a good idea or a bad idea?

Starting point is 00:30:02 It's a great idea. It's a great idea for certain problems. I very much believe that technology is itself neither good nor bad. It's how you apply it. You know, this would be a very, very bad thing for very low levels of the software stack, but in terms of solving some of these software portability and transparency or portability problems,

Starting point is 00:30:20 I think it's been really good. Now Java ultimately didn't win out on the desktop, and there are good reasons for that, but it's been very successful on servers and in many places, it's been a very successful thing over decades. So what have been LLVM's and Clang's improvements in optimization throughout their history?

Starting point is 00:30:46 What are some moments where you sat back and were really proud of what's been accomplished? Yeah, I think that the interesting thing about LLVM is not the innovations in compiler research. It has very good implementations of very important algorithms, no doubt. And a lot of really smart people have worked on it. But I think that the thing that's most profound about LLVM is that through standardization, it made things possible that otherwise wouldn't have happened.

Starting point is 00:31:13 And so interesting things that have happened with LLVM, for example, Sony has picked up LLVM and used it to do all the graphics compilation in their movie production pipeline. And so now they're able to have better special effects because of LLVM. That's kind of cool. That's not what it was designed for. Right. But that's the sign of good infrastructure, when it can be used in ways it was never designed for because it has good layering and software engineering and it's composable and things like that. That's where, as you said, it differs from GCC. Yes, GCC is also great in various ways,

Starting point is 00:31:45 but it's not as good as an infrastructure technology. It's really a C compiler, or it's a Fortran compiler. It's not infrastructure in the same way. Now, you can tell I don't know what I'm talking about because I keep saying C-lang. You could always tell. When a person is closed, by the way, but not something.

Starting point is 00:32:06 I don't think I've ever used Clang. Entirely possible. Well, you've used code it's generated, probably. So Clang and LLVM are used to compile all the apps on the iPhone, effectively, and the OS's. It compiles Google's production server applications. It's used to build GameCube games and PlayStation 4 and things like that. So as a user I have, but everything I've done that I've experienced with Linux has been, I believe, always GCC. Yeah, I think Linux still defaults to GCC. And is there a reason for that? Or is it, I mean,

Starting point is 00:32:45 is there a combination of technical and social reasons? Many, many Linux developers do use Clang. But the distributions, for lots of reasons, have used GCC historically and they've not switched. Yeah. Because just anecdotally online, it seems that LLVM has either reached the level of GCC or superseded it on different features or whatever. The way I would say it is that they're so close, it doesn't matter. Yeah, exactly. Like they're slightly better in some ways, slightly worse in others, but it doesn't actually really matter anymore at that level. So in terms of optimization breakthroughs, it's just been solid incremental work.

Starting point is 00:33:26 Yeah, yeah, which describes a lot of compilers. The hard thing about compilers, in my experience, is the engineering, the software engineering, making it so that you can have hundreds of people collaborating on really detailed low-level work and scaling that. And that's really hard, and that's one of the things I think LLVM has done well, and that kind of goes back to the original design goals with it to be modular and things like that.

Starting point is 00:33:54 And incidentally I don't want to take all the credit for this, right? I mean, one of the best parts about LLVM is that it was designed to be modular, and when I started I would write, for example, a register allocator, and then somebody much smarter than me would come in and pull it out and replace it with something else that they would come up with. And because it's modular, they were able to do that. And that's one of the challenges with GCC, for example: replacing subsystems is incredibly difficult. It can be done, but it wasn't designed for that. And that's one of the reasons that LLVM has been very successful in the research world as well. But in the community sense, Guido van Rossum, right, from Python, just retired from, what is it, benevolent dictator for life, right? So in managing this community of brilliant

Starting point is 00:34:40 compiler folks, did it for a time at least fall on you to approve things? Oh yeah, so I mean I still have something like an order of magnitude more patches in LLVM than anybody else. And many of those I wrote myself. But you still write, I mean, you're still close to the, I don't know what the expression is, to the metal. You still write code? Not as much as I was able to in grad school, but that's an important part of my identity.

Starting point is 00:35:14 But the way that LLVM has worked over time is that when I was a grad student, I could do all the work and steer everything and review every patch and make sure everything was done exactly the way my opinionated sense felt like it should be done, and that was fine. But as things scale, you can't do that, right? And so what ends up happening is LLVM has a hierarchical system of what's called code owners. These code owners are given the responsibility not to do all the work, not necessarily to review all the

Starting point is 00:35:43 patches, but to make sure that the patches do get reviewed and make sure that the right thing's happening architecturally in their area. And so what you'll see is that, for example, hardware manufacturers end up owning the hardware-specific parts of their hardware. That's very common. Leaders in the community that have done really good work naturally become the de facto owner of something, and then usually somebody else is like, how about we make them the official code owner, and then we'll have somebody to make sure that all patches get reviewed in a timely manner, and then everybody's like,

Starting point is 00:36:18 yes, that's obvious. And then it happens, right? And usually this is a very organic thing, which is great. And so I'm nominally the top of that stack still, but I don't spend a lot of time reviewing patches. What I do is I help negotiate a lot of the technical disagreements that end up happening, and make sure that the community as a whole makes progress and is moving in the right direction. So we also started a nonprofit six years ago, seven years ago, time's gone away. And the nonprofit, the LLVM Foundation, helps oversee all the business sides of things and makes sure that the events that the LLVM community has are funded and set up and run

Starting point is 00:36:58 correctly and stuff like that. But the foundation very much stays out of the technical side of where the project is going. Right. It sounds like a lot of it is just organic. Just, yeah. Well, and LLVM is almost 20 years old, which is hard to believe. Somebody pointed out to me recently that LLVM is now older than GCC was when LLVM started. Right.

Starting point is 00:37:21 So time has a way of getting away from you. But the good thing about that is it has a really robust, really amazing community of people that are in their professional lives spread across lots of different companies, but it's a community of people that are interested in similar kinds of problems and have been working together effectively for years and have a lot of trust and respect for each other, and even if they don't always agree, they're able to find a path forward. So then in a slightly different flavor of effort, you started at Apple in 2005 with the task of making, I guess, LLVM production ready.

Starting point is 00:37:59 And then eventually, 2013 through 2017, leading the entire developer tools department. We're talking about LLVM, Xcode, Objective-C to Swift. So in a quick overview of your time there, what were the challenges? First of all, leading such a huge group of developers. What was the big motivator, dream, mission behind creating Swift, the early birth of it from Objective-C and so on, and Xcode? So these are different questions. I know. But I'll stay on the technical side, then we can talk about the big team pieces.

Starting point is 00:38:40 That's okay. So to really oversimplify many years of hard work: LLVM started, joined Apple, became a thing, became successful and became deployed. But then there's a question about how do we actually parse the source code? So LLVM is that back part, the optimizer and the code generator. And LLVM was really good for Apple as it went through a couple of hardware transitions. I joined right at the time of the Intel transition, for example, and the 64-bit transitions and then the transition to ARM with the iPhone. So LLVM was very useful for some of these kinds of things. But at the same time, there were a lot of questions around developer experience.

Starting point is 00:39:17 So if you're a programmer pounding out, at the time, Objective-C code, the error messages you get, the compile time, the turnaround cycle, the tooling and the IDE were not great, were not as good as they could be. And so, as I occasionally do, I'm like, well, okay, how hard is it to write a C compiler? And so, I'm not going to commit to anybody, I'm not going to tell anybody, I'm just going to do it on nights and weekends and start working on it. And then, you know, I built up, and in C there's a thing called the preprocessor, which people don't like, but it's actually really hard and complicated and includes a bunch of really weird things like trigraphs and other stuff like that that are really nasty, and it's the

Starting point is 00:39:58 crux of a bunch of the performance issues in the compiler. I started working on the parser and kind of got to the point where I'm like, oh, you know what, we could actually do this. Everybody's saying that this is impossible to do, but it's actually just hard, it's not impossible. And eventually I told my manager about it, and he's like, oh, wow, this is great, we do need to solve this problem.

Starting point is 00:40:17 Oh, this is great, we can like get you one other person to work with you on this, you know? And slowly a team is formed and it starts taking off. And C++, for example, a huge complicated language. People always assume that it's impossible to implement. And it's very nearly impossible, but it's just really, really hard. And the way to get there is to build it

Starting point is 00:40:38 one piece at a time, incrementally. And that was only possible because we were lucky to hire some really exceptional engineers that knew various parts of it very well and could do great things. Swift was kind of a similar thing. So Swift came from, we were just finishing off the first version of C++ support in Clang. And C++ is a very formidable and very important language, but it's also ugly in lots of ways. And you can't implement C++ without thinking there has to be a better thing, right? And so I started working on Swift, again

Starting point is 00:41:13 with no hope or ambition that it would go anywhere. Just, let's see what could be done. Let's play around with this thing. It was me in my spare time, not telling anybody about it, kind of a thing, and it made some good progress. I'm like, actually it would make sense to do this. At the same time, I started talking with the senior VP of software at the time, a guy

Starting point is 00:41:33 named Bertrand Serlet, and Bertrand was very encouraging. He was like, well, you know, let's have fun. Let's talk about this. And he was a little bit of a language guy, and so he helped guide some of the early work, and encouraged me and got things off the ground, and eventually I told my manager and told other people and it started making progress. The complicating thing with Swift was that the idea of doing a new language is not obvious to anybody, including myself.

Starting point is 00:42:08 And the tone at the time was that the iPhone was successful because of Objective-C. Right. Oh, that's interesting. Because of Objective-C. Because of Objective-C. Right. And you have to understand that at the time, Apple was hiring software people that loved Objective-C.

Starting point is 00:42:22 Right. And it wasn't that they came despite Objective-C. They loved Objective-C and that's why they got hired. And so, you had a software team where the leadership, in many cases, went all the way back to NeXT, where Objective-C really became real. And so they, quote-unquote, grew up writing Objective-C. And many of the individual engineers were hired because they loved Objective-C. And so this notion of, okay, let's do a new language, it was kind of heretical in many ways, right?

Starting point is 00:42:51 Meanwhile, my sense was that the outside community wasn't really in love with Objective C. Some people were, and some of the most outspoken people were, but other people were hitting challenges because it has very sharp corners and it's difficult to learn. And so one of the challenges of making Swift happen that was totally non-technical is the social part of what do we do?

Starting point is 00:43:14 Like if we do a new language, which at Apple, many things happen that don't ship, right? So if we ship it, what are the metrics of success? Why would we do this? Why wouldn't we make Objective-C better? If Objective-C has problems, let's file off those rough corners and edges. And one of the major things that became the reason

Starting point is 00:43:32 to do this was this notion of safety, memory safety. And the way Objective-C works is that a lot of the object system and everything else is built on top of pointers in C. Objective-C is an extension on top of C. And so pointers are unsafe. And if you get rid of the pointers, it's not Objective-C anymore. And so fundamentally, that was an issue: you could not fix safety or memory safety without fundamentally changing the language. And so once we got through that part of the mental process and the thought process, it became a design process of saying, okay, well, if we're going to do something new,

Starting point is 00:44:12 what is good? Like, how do we think about this? And what do we like? And what are we looking for? And that was a very different phase of it. So what are some design choices early on in Swift? Like, we're talking about braces? Are you making a typed language or not, all those kinds of things? Yeah, so some of those were obvious given the context. So a typed language, for example: Objective-C

Starting point is 00:44:35 is a typed language, and going with an untyped language wasn't really seriously considered. We wanted the performance and we wanted refactoring tools and other things like that that go with typed languages. Quick dumb question. Was it obvious? I think this would be a dumb question. But was it obvious that the language has to be a compiled language? Yes, that's not a dumb question. Earlier, I think late 90s, Apple had seriously considered moving its development experience to Java. But Swift started in 2010, which was several years after the iPhone, and the iPhone was definitely on an upward trajectory, and the iPhone was still extremely, and is still a bit, memory constrained. Right. And so being able to compile the code and then ship it, and then having

Starting point is 00:45:25 standalone code that is not JIT compiled is a very big deal and is very much part of the Apple value system. Now, JavaScript is also a thing, right? I mean, it's not that this is exclusive, and technologies are good depending on how they're applied, right? But in the design of Swift, saying, like, how can we make Objective-C better, right? Objective-C is statically compiled, and that was the contiguous natural thing to do. Just skip ahead a little bit.

Starting point is 00:45:51 We'll go right back. Just as a question, as you think about today in 2019, in your work at Google, TensorFlow, and so on, is, again, compilation, static compilation, still the right thing? Yeah, so the funny thing after working on compilers for a really long time is that, and this is one of the things that LLVM has helped with, I don't look at compilation as being static or dynamic or interpreted or not. This is a spectrum. Okay.

Starting point is 00:46:25 And one of the cool things about Swift is that Swift is not just statically compiled. It's actually dynamically compiled as well. And it can also be interpreted, though nobody's actually done that. And so what ends up happening when you use Swift in a workbook, for example, in Colab or in Jupyter, is it's actually dynamically compiling

Starting point is 00:46:43 the statements as you execute them. And so this gets back to the software engineering problems, where if you layer the stack properly, you can actually completely change how and when things get compiled because you have the right abstractions there. And so the way that a Colab workbook works with Swift is that when you start typing into it,

Starting point is 00:47:04 it creates a process, a Unix process. And then each line of code you type in, it compiles it through the Swift compiler, the front end part, and then sends it through the optimizer, JIT compiles machine code, and then injects it into that process. And so as you're typing new stuff, it's like squirting in new code, and overwriting and replacing and updating code in place. And the fact that it can do this is not an accident. Like, Swift was designed for this.
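The workflow described here (keep one long-lived process, compile each new line on its own, inject the result into the live state) can be mimicked in a few lines of Python. This is only an analogy under stated assumptions: `MiniSession` is a hypothetical name, the "process" is just a dictionary, and Python's built-in `compile` and `exec` produce and run interpreter bytecode rather than JIT-compiled machine code as the Swift toolchain does.

```python
class MiniSession:
    """Hypothetical stand-in for a notebook kernel's long-lived process."""

    def __init__(self):
        # All definitions made so far live here, like state in the process.
        self.namespace = {}

    def run(self, line):
        # Compile only the newly typed line...
        code = compile(line, "<cell>", "exec")
        # ...then inject it into the existing state, replacing any
        # definition with the same name in place.
        exec(code, self.namespace)


session = MiniSession()
session.run("x = 4")
session.run("def double(v): return v * 2")
session.run("y = double(x)")
session.run("def double(v): return v * 20")  # redefine: new code replaces old
session.run("z = double(x)")

print(session.namespace["y"], session.namespace["z"])
```

Earlier results (`y`) keep the value computed with the old definition, while later lines (`z`) see the replacement, which is the "updating code in place" behavior described above.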

Starting point is 00:47:32 But it's an important part of how the language was set up and how it's layered. And this is a non-obvious piece. And one of the things with Swift that was, for me, a very strong design point is to make it so that you can learn it very quickly. And so, from a language design perspective, the thing that I always come back to is this UI principle of progressive disclosure of complexity. And so, in Swift, you can start by saying, print, quote, hello world, quote, right? And there's no slash n, just like Python, one line of code, no main, no header files, no public static class void, blah blah blah string, like Java has, right? One line of code, right. And you can teach that and it works great. Then you can say, well, let's introduce variables. And so you can declare a variable with var. So var x equals four. What is a variable? You can use x, x plus 1, this is what it means, then you

Starting point is 00:48:25 can say, well, how about control flow? Well, this is what an if statement is, this is what a for statement is, this is what a while statement is. And then you can say, let's introduce functions, right? And many languages like Python have had this kind of notion of, let's introduce small things, and then you can add complexity, then you can introduce classes, and then you can add generics, in the case of Swift, and then you can add in modules and build out in terms of the things that you're expressing. But this is not very typical for compiled languages.

Starting point is 00:48:52 And so this was a very strong design point, and one of the reasons that Swift in general is designed with this factoring of complexity in mind, so that the language can express powerful things. You can write firmware in Swift if you want to. But it has a very high-level feel, which is really this perfect blend, because often you have very advanced library writers that want to be able to use the nitty-gritty details. But then other people just want to use the libraries and work at a higher abstraction level. It's kind of cool that

Starting point is 00:49:22 I saw that you can just, interoperability, I don't think I pronounce that word enough, but you can just drag in Python. It's just a strength. You can import, like, I saw this in the demo. Yeah. I'm importing out, but like, how do you make that happen? Yeah, well, what's up with, yeah, so is that as easy as it looks, or is it? Yeah, as easy as it looks, that's not,

Starting point is 00:49:44 that's not a stage magic hack or anything like that. I don't mean from the user perspective, I mean from the implementation perspective, to make it happen. So it's easy once all the pieces are in place. The way it works: if you think about a dynamically typed language like Python, right, you can think about it in two different ways. You can say it has no types, right, which is what most people would say. Or you can say it has one type.

Starting point is 00:50:07 Right, and you can say it has one type and it's like the Python object. Right, and the Python object is passed around, and because there's only one type, it's implicit. Okay, and so what happens with Swift and Python talking to each other: Swift has lots of types. Right, it has arrays and it has strings and all, like, classes and that kind of stuff. But it now has a Python object type. So there is one Python object type. And so when you say import numpy, what you get is a Python object, which is the numpy module.
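The "one type" view can be sketched as a single wrapper class through which every member lookup and every call is resolved at run time. This is a hedged illustration only: it is written in Python rather than Swift, `PyObject` and `import_module` are hypothetical names chosen for the sketch (not the API of Swift's Python module), and it wraps the standard library's `math` module instead of NumPy so that it runs without extra dependencies.

```python
import importlib


class PyObject:
    """Hypothetical single wrapper type: every foreign value is one of these."""

    def __init__(self, value):
        self._value = value

    def __getattr__(self, name):
        # Dynamic member lookup: nothing is known statically, so ask the
        # wrapped object at run time and wrap whatever comes back.
        return PyObject(getattr(self._value, name))

    def __call__(self, *args):
        # Dynamic call: unwrap the arguments, invoke, wrap the result.
        unwrapped = [a._value if isinstance(a, PyObject) else a for a in args]
        return PyObject(self._value(*unwrapped))


def import_module(name):
    """Importing a module also just yields a PyObject."""
    return PyObject(importlib.import_module(name))


math_module = import_module("math")   # a PyObject wrapping the module
sqrt_member = math_module.sqrt        # member lookup returns another PyObject
root = sqrt_member(16.0)              # calling it returns yet another PyObject

print(type(root).__name__, root._value)
```

Because every step returns the same wrapper type, the host side needs no per-library knowledge; all of the real work is delegated to the interpreter that owns the wrapped object.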

Starting point is 00:50:36 And then you say np.array. And it says, okay, hey, Python object, I have no idea what you are. Give me your array member. Okay, cool. It just uses dynamic stuff, talks to the Python interpreter and says, hey, Python, what's the .array member

Starting point is 00:50:50 in that Python object? It gives you back another Python object. And now you say, parentheses for the call and the arguments you're gonna pass. And so then it says, hey, a Python object that is the result of np.array, call with these arguments. Again, calling into the Python interpreter to do that work. Right. Now, this is all really simple. If you dive into the code, what you'll see is that the Python module in Swift is something like 1200 lines of code or something written in

Starting point is 00:51:18 pure Swift. It's super simple, and it's built on top of the C interoperability because it just talks to the Python interpreter. But making that possible required us to add two major language features to Swift to be able to express these dynamic calls and the dynamic member lookups. And so what we've done over the last year is we've proposed, implemented, standardized, and contributed new language features to the Swift language in order to make it so it is really trivial. And this is one of the things about Swift that is critical to the Swift for TensorFlow work,

Starting point is 00:51:52 which is that we can actually add new language features, and the bar for adding those is high, but it's what makes it possible. So you're now at Google doing incredible work on several things, including TensorFlow. So TensorFlow 2.0, or whatever is leading up to 2.0, by default has eager execution, and yet in order to make code optimized for GPU or TPU or some of these systems, computation needs to be converted to a graph.

Starting point is 00:52:23 So what's that process like? What are the challenges there? Yeah, so I'm tangentially involved in this, but the way that it works with AutoGraph is that you mark your function with a decorator, and when Python calls it, that decorator is invoked, and then it says, before I call this function, I can transform it. And so the way AutoGraph works, as far as I understand, is it actually uses the Python parser to go parse that, turn it into a syntax tree, and now apply compiler techniques to, again, transform this down into TensorFlow graphs.
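The parse-then-transform idea can be sketched with Python's standard `ast` module. This is a toy under stated assumptions, not AutoGraph itself: the node names (`"input"`, `"const"`, `"add"`, `"multiply"`) and the functions `to_graph` and `function_to_graph` are invented for illustration, the "graph" is just nested tuples, and the function source is passed in as a string instead of being recovered by a decorator.

```python
import ast

# Made-up node names for this sketch; they are not TensorFlow op names.
OP_NAMES = {ast.Add: "add", ast.Mult: "multiply"}


def to_graph(node):
    """Turn an expression syntax tree into a nested-tuple 'graph'."""
    if isinstance(node, ast.Name):
        return ("input", node.id)
    if isinstance(node, ast.Constant):
        return ("const", node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in OP_NAMES:
        return (OP_NAMES[type(node.op)], to_graph(node.left), to_graph(node.right))
    raise ValueError("unsupported syntax in this toy transformer")


def function_to_graph(source):
    """Parse a one-statement function and convert its return expression."""
    function_def = ast.parse(source).body[0]
    return_stmt = function_def.body[0]
    if not isinstance(return_stmt, ast.Return):
        raise ValueError("expected a single return statement")
    return to_graph(return_stmt.value)


SOURCE = """
def f(x, y):
    return x * y + 1
"""

graph = function_to_graph(SOURCE)
print(graph)
```

Each syntax node maps to one graph node, so the whole thing is a tree-to-tree rewrite; a real system would also have to handle statements, control flow, and side effects.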

Starting point is 00:52:59 And so you can think of it as saying, hey, I have an if statement. I'm going to create an if node in the graph, like you'd say tf.cond. You have a multiply? Well, I'll turn that into a multiply node in the graph, and it becomes this tree transformation. So where does Swift for TensorFlow come in, which is, you know, parallel? For one, Swift is an interface, like Python is an interface, to TensorFlow, but it seems like there's a lot more going on than just a different language interface. There's optimization methodology.

Starting point is 00:53:32 Yeah, so the TensorFlow world has a couple of different, what I'd call front-end technologies. And so Swift and Python and Go and Rust and Julia, all these things share the TensorFlow graphs and all the runtime and everything that's later. And so Swift for TensorFlow is merely another front end for TensorFlow, just like any of these other systems are. There's a major difference between, I would say, three camps of technologies here. There's Python, which is a special case, because the vast majority of the community efforts go into the Python interface.

Starting point is 00:54:08 And Python has its own approaches for automatic differentiation, has its own APIs, and all this kind of stuff. There's Swift, which I'll talk about in a second. And then there's kind of everything else. And so the everything else are effectively language bindings. So they call into the TensorFlow runtime, but they usually don't have automatic differentiation, or they usually don't

Starting point is 00:54:29 provide anything other than APIs that call the C APIs in TensorFlow, and so they're kind of wrappers for that. Swift is really kind of special. And it's a very different approach. Swift for TensorFlow, that is, is a very different approach, because there we're saying, let's look at all the problems that need to be solved in the full stack of the TensorFlow compilation process, if you think about it that way,

Starting point is 00:54:52 because TensorFlow is fundamentally a compiler. It takes models, and then it makes them go fast on hardware. That's what a compiler does. And it has a front end, it has an optimizer, and it has many back ends. And so if you think about it the right way, or if you look at it in a particular way, it is a compiler.

Starting point is 00:55:13 And so Swift is merely another front end. But it's saying in the design principle, it's saying, let's look at all the problems that we face as machine learning practitioners. And what is the best possible way we can do that, given the fact that we can change literally anything listed in our stack. And Python, for example, where the vast majority of the engineering and effort has gone into,

Starting point is 00:55:40 is constrained by being the best possible thing you can do with a Python library. There are no Python language features that are added because of machine learning that I'm aware of. They added a matrix multiplication operator with @, but that's as close as you get. So with Swift, it's hard, but you can add language features to the language, and there's a community process for that. So we look at these things and say, well, what is the right division of labor between the human programmer and the compiler? And Swift has a number of things that shift that balance.

Starting point is 00:56:12 So because it has a type system, for example, it makes certain things possible for analysis of the code, and the compiler can automatically build graphs for you without you thinking about them. That's a big deal for a programmer. You just get free performance, you get clustering and fusion and optimization, things like that, without you as a programmer having to manually do it, because the compiler can do it for you. Automatic differentiation is another big deal. I think one of the key contributions of the Swift for TensorFlow project is that there's

Starting point is 00:56:46 this entire body of work on automatic differentiation that dates back to the Fortran days. People doing a tremendous amount of numerical computing in Fortran used to write these, what they call source-to-source translators, where you take a bunch of code, shove it into a mini compiler, and it would push out more Fortran code, but it would generate the backwards passes for your functions for you, the derivatives. And so, in that work in the 70s,

Starting point is 00:57:14 a tremendous number of optimizations, a tremendous number of techniques for fixing numerical instability and other kinds of problems were developed. But they're very difficult to port into a world where, in eager execution, you get one op at a time. You need to be able to look at an entire function and be able to reason about what's going on. And so when you have language-integrated automatic differentiation, which is one of the things that the Swift project is focusing on, you can open up all these techniques and reuse them in familiar ways. But the language integration piece has a bunch of design in it, and it's also complicated.
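To give a feel for what a source-to-source translator does, here is a very small Python sketch that takes the text of an expression and generates new source text for its derivative. It is an illustration under stated assumptions: it handles only constants, variables, `+` and `*`, it does plain symbolic sum and product rules rather than the backwards-pass generation described above, it performs none of the optimizations or stability fixes mentioned, and the function names `source_of`, `derivative_of`, and `differentiate` are invented for the example.

```python
import ast

SYMBOLS = {ast.Add: "+", ast.Mult: "*"}


def source_of(node):
    """Regenerate source text for the supported subset of expressions."""
    if isinstance(node, ast.Constant):
        return repr(node.value)
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.BinOp) and type(node.op) in SYMBOLS:
        symbol = SYMBOLS[type(node.op)]
        return "(" + source_of(node.left) + " " + symbol + " " + source_of(node.right) + ")"
    raise ValueError("unsupported syntax in this toy translator")


def derivative_of(node, var):
    """Emit source text for d(node)/d(var) using the sum and product rules."""
    if isinstance(node, ast.Constant):
        return "0"
    if isinstance(node, ast.Name):
        return "1" if node.id == var else "0"
    if isinstance(node, ast.BinOp) and type(node.op) in SYMBOLS:
        d_left = derivative_of(node.left, var)
        d_right = derivative_of(node.right, var)
        if isinstance(node.op, ast.Add):
            return "(" + d_left + " + " + d_right + ")"
        # Product rule: (f * g)' = f' * g + f * g'
        left, right = source_of(node.left), source_of(node.right)
        return "(" + d_left + " * " + right + " + " + left + " * " + d_right + ")"
    raise ValueError("unsupported syntax in this toy translator")


def differentiate(expression, var):
    """Source in, derivative source out."""
    return derivative_of(ast.parse(expression, mode="eval").body, var)


derivative_source = differentiate("x * x + 3 * x", "x")
value_at_5 = eval(derivative_source, {"x": 5.0})
print(derivative_source)
print(value_at_5)
```

Because the translator sees the whole expression before emitting anything, it could in principle simplify or reorder the generated code, which is the kind of whole-function reasoning that is hard to do when operations arrive one at a time.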

Starting point is 00:57:50 The other piece of the puzzle here that's kind of interesting is TPUs at Google. So we're in a new world with deep learning. It's constantly changing, and I imagine, without disclosing anything, you're still innovating on the TPU front, too. Indeed. So, how much interplay is there between software and hardware

Starting point is 00:58:10 in trying to figure out how to, together, move towards an optimized solution? There's an incredible amount. So, we're on a third generation of TPUs, which are now 100 petaflops in a very large liquid-cooled box, virtual box with no cover. And as you might imagine, we're not out of ideas yet. The great thing about TPUs is that they're a perfect example of hardware/software co-design.

Starting point is 00:58:34 And so it's about saying, what hardware do we build to solve certain classes of machine learning problems? Well, the algorithms are changing. The hardware takes some years to produce, right? And so you have to make bets and decide what is going to happen, and what is the best way to spend the transistors to get the maximum, you know, performance per watt or area per cost or whatever it is that you're optimizing for. And so one of

Starting point is 00:59:02 the amazing things about TPUs is this numeric format called bfloat16. bfloat16 is a compressed 16-bit floating point format, but it puts the bits in different places. In numeric terms, it has a smaller mantissa and a larger exponent. That means that it's less precise, but it can represent larger ranges of values, which in the machine learning context is really important and useful, because sometimes you have very small gradients you want to accumulate and very, very small numbers that are important to move things as you're learning. But sometimes you have very large magnitude numbers as well.
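The trade-off can be seen by treating bfloat16 as the top 16 bits of a 32-bit float: the sign bit and all 8 exponent bits are kept, and the mantissa is cut from 23 bits to 7. The Python sketch below does exactly that with the standard `struct` module. Two assumptions to note: the helper names are made up for this example, and the conversion simply truncates the low bits, whereas real hardware and libraries typically round to nearest.

```python
import math
import struct


def to_bfloat16_bits(x):
    """Keep the top 16 bits of the float32 encoding of x (truncation)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def from_bfloat16_bits(bits16):
    """Expand 16 bfloat16 bits back to a float by zero-filling the low bits."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits16 << 16))
    return value


def roundtrip(x):
    """Value of x after being squeezed through bfloat16."""
    return from_bfloat16_bits(to_bfloat16_bits(x))


for sample in (math.pi, 1e30, 1e-30):
    approx = roundtrip(sample)
    relative_error = abs(approx - sample) / sample
    print(sample, approx, relative_error)
```

Very large and very small magnitudes such as 1e30 and 1e-30 survive with a relative error under one percent, because the exponent field is the same width as in float32. For comparison, the largest finite value in IEEE half precision (5 exponent bits, 10 mantissa bits) is 65504, so it gives more digits but far less range.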

Starting point is 00:59:40 And bfloat16 is not as precise, the mantissa is small, but it turns out the machine learning algorithms actually want to generalize. And so there are theories that this actually increases the ability for the network to generalize across datasets. And regardless of whether it's good or bad, it's much cheaper at the hardware level to implement, because the area and time of a multiplier is n squared

Starting point is 01:00:06 in the number of bits in the mantissa, but it's linear with the size of the exponent. And you're connected to all the efforts here, both on the hardware and the software side? Yeah, and so that was a breakthrough coming from the research side and people working on optimizing network transport of weights across the network originally and trying to find ways to compress that. But then it got burned into silicon and it's a key part of what makes TPU performance so amazing and

Starting point is 01:00:33 great. Now, TPUs have many different aspects that are important, but the co-design between the low-level compiler bits and the software bits and the algorithms is all super important. And it's an amazing trifecta that only Google can do. Yeah, that's super exciting. So can you tell me about the MLIR project, previously the secretive one? Yeah, so MLIR is a project that we announced at a compiler conference three weeks ago or something, the Compilers for Machine Learning conference.

Starting point is 01:01:07 Basically, again, if you look at TensorFlow as a compiler stack, it has a number of compiler algorithms within it. It also has a number of compilers that get embedded into it, and they're made by different vendors. For example, Google has XLA, which is a great compiler system. Nvidia has TensorRT, Intel has nGraph. There's a number of these different compiler systems and they're very hardware specific

Starting point is 01:01:31 and they're trying to solve different parts of the problems. But they're all kind of similar in the sense that they want to integrate with TensorFlow. Now, TensorFlow has an optimizer and it has these different code generation technologies built in. The idea of MLIR is to build a common infrastructure to support all these different subsystems. Initially, it's to be able to make it so that they all plug in together and

Starting point is 01:01:52 they can share a lot more code and can be reusable. But over time, we hope that the industry will start collaborating and sharing code, and then instead of reinventing the same things over and over again, that we can actually foster some of that working-together-to-solve-common-problems energy that has been useful in the compiler field before. Beyond that, MLIR is, some people have joked that it's kind of LLVM 2. It learns a lot from what LLVM has been good at

Starting point is 01:02:19 and what LLVM has done wrong. And it's a chance to fix that. And also, there are challenges in the LLVM ecosystem as well, where LLVM is very good at the thing it was designed to do, but, you know, 20 years later, the world has changed and people try to solve higher level problems and we need some new technology. And what's the future of open source in this context? Very soon. So it is not yet open source, but it will be, hopefully. You still believe in the value of open source in these kinds of contexts?

Starting point is 01:02:49 Oh, yeah, absolutely. And I think that the TensorFlow community at large fully believes in open source. So I mean, there is a difference between Apple, where you were previously, and Google now, in spirit and culture. And I would say the open sourcing of TensorFlow was a seminal moment in the history of software, because here's this large company releasing a very large codebase as open source. What are your thoughts on that? How happy or not were you to see that kind of degree of open sourcing? So between the two, I prefer the Google approach,

Starting point is 01:03:22 if that's what you're saying. The Apple approach makes sense given the historical context that Apple came from, but that was 35 years ago. And I think that Apple is definitely adapting. And the way I look at it is that there's different kinds of concerns in the space, right? It is very rational for a business to care about making money. That fundamentally is what a business is about. But I think it's also incredibly realistic to say, it's not your string library that's the thing that's going to make you money. It's going to be the amazing UI, product-differentiating features

Starting point is 01:03:59 and other things like that that you build on top of your string library. And so keeping your string library proprietary and secret and things like that is maybe not the important thing anymore. Before, platforms were different, right? And even 15 years ago, things were a little bit different. But the world is changing. So Google strikes a very good balance, I think.

Starting point is 01:04:22 And I think that TensorFlow being open source really changed the entire machine learning field and it caused a revolution in its own right. And so I think it's amazingly forward-looking, because I could have imagined, and I wasn't at Google at the time, but I could imagine a different context in a different world where a company says, machine learning is critical to what we're doing. We're not going to give it to other people. Right. And so that decision is a profoundly brilliant insight that I think

Starting point is 01:04:53 has really led to the world being better, and better for Google as well. And it has all kinds of ripple effects. I think it is really, I mean, you can't understate how profound Google deciding that is for software. It's awesome. Well, and it's been, and again, I can understand the concern about if we release our machine learning software,

Starting point is 01:05:14 our competitors could go faster. On the other hand, I think that open sourcing TensorFlow has been fantastic for Google. And I'm sure that that decision was very non-obvious at the time, but I think it's worked out very well. So let's try this real quick. You were at Tesla for five months as the VP of Autopilot Software. You led the team during the transition from

Starting point is 01:05:37 Hardware 1 to Hardware 2. I have a couple of questions. So one, first of all, to me, that's one of the bravest engineering decisions, undertakings really, ever in the automotive industry, software-wise, starting from scratch. It's a really brave decision. So my one question there is, what was that like? What was the challenge of that? Do you mean the career decision of jumping from a comfortable good job into the unknown? Or that combined, so at the individual level,

Starting point is 01:06:08 you making that decision. And then when you show up, it's a really hard engineering problem. So you could just stay, maybe slow down, stay with Hardware 1, or those kinds of decisions. So just taking it full on, let's do this from scratch. What was that like?

Starting point is 01:06:28 Well, so I mean, I don't think Tesla has a culture of taking things slow and seeing how it goes. And one of the things that attracted me about Tesla is it's very much a gung-ho, let's-change-the-world, let's-figure-it-out kind of place. And so I have a huge amount of respect for that. Tesla has done very smart things with Hardware 1 in particular, and the Hardware 1 design

Starting point is 01:06:48 was originally designed to be very simple automation features in the car, for traffic-aware cruise control and things like that. And the fact that they were able to effectively feature-creep it into lane holding and a very useful driver assistance feature is pretty astounding, particularly given the details of the hardware. Hardware 2 built on that in a lot of ways. And the challenge there was that they were transitioning from a third-party-provided vision stack to an in-house-built vision stack.

Starting point is 01:07:19 And so the first step, which I mostly helped with, was getting onto that new vision stack, and that was very challenging. And it was time-critical for various reasons, and it was a big leap, but it was fortunate that it built on a lot of the knowledge and expertise and the team that had built Hardware 1's driver assistance features. So you spoke in a collected and kind way about your time at Tesla, but it was ultimately not a good fit. Elon Musk, we've talked about him on this podcast with several guests of the course. Elon Musk continues to do some of the most bold and innovative engineering work in the world,

Starting point is 01:07:55 at times at a cost to some of the members of the Tesla team. What did you learn about working in this chaotic world with Elon? Yeah, so I guess I would say that when I was at Tesla, I experienced and saw the highest degree of turnover I'd ever seen in a company, which was a bit of a shock. But one of the things I learned and I came to respect is that Elon's able to attract amazing talent because he has a very clear vision of the future and he can get people to buy into it because they want that future to happen. Right. And the power of vision is something that I have a tremendous amount of respect for. And I think that Elon is fairly singular in the world in terms of

Starting point is 01:08:35 the things he's able to get people to believe in. And there are many people who can stand on a street corner and say, ah, we're going to go to Mars, right? But then there are a few people that can get others to buy into it and believe and build the path and make it happen. And so I respect that. I don't respect all of his methods, but I have a huge amount of respect for that. You've mentioned in a few places, including in this context, working hard. What does it mean to work hard? And when you look back at your life,

Starting point is 01:09:10 what were some of the most brutal periods of having to really sort of put everything you have into something? Yeah, good question. So working hard can be defined in a lot of different ways. So, a lot of hours, and that is true. The thing to me that's the hardest is both being short-term focused on delivering and executing and making a thing happen while also thinking about the longer term and trying to balance that, right? Because if you are myopically focused on solving a task and getting that done, and only think about that

Starting point is 01:09:48 incremental next step, you will miss the next big hill you should jump over to, right? And so I've been really fortunate that I've been able to kind of oscillate between the two. And historically at Apple, for example, that was made possible because I was able to work with some really amazing people and build up teams and leadership structures and allow them to grow in their careers and take on responsibility thereby freeing up me to be a little

Starting point is 01:10:16 bit crazy in thinking about the next thing. And so it's a lot of that, but it's also about, you know, with experience you make connections that other people don't necessarily make. And so I think that's a big part as well. But the bedrock is just a lot of hours. And, you know, that's okay with me. There's different theories on work-life balance. And my theory for myself, which I do not project onto the team, but my theory for myself is that, you

Starting point is 01:10:45 know, I want to love what I'm doing and work really hard. And my purpose, I feel like, and my goal is to change the world and make it a better place. And that's what I'm really motivated to do. So, last question. The LLVM logo is a dragon. Yeah. You explained that this is because dragons have connotations of power, speed, intelligence. It can also be sleek, elegant, and modular, though you removed the modular part. What is your favorite dragon-related character from fiction, video, or movies? So those are all very kind ways of explaining it. Do you want to know the real reason it's a dragon? Yeah.

Starting point is 01:11:22 Is that better? So there's a seminal book on compiler design called the Dragon Book. And so this is a really old book on compilers now. And so the dragon logo for LLVM came about because at Apple, we kept talking about LLVM-related technologies. And there was no logo to put on a slide. Right.

Starting point is 01:11:44 And we're sort of like, what do we do? And somebody's like, well, what kind of logo should a compiler technology have? And I'm like, I don't know. I mean, the dragon is the best thing that we've got. And Apple somehow magically came up with the logo. And it was a great thing and the whole community rallied around it. And then it got better as other graphic designers got involved. But that's originally where it came from. Are there dragons from fiction that you connect with? Game of Thrones, Lord of the Rings, that kind of thing? Lord of the Rings is great. I also like role-playing games and things like that, and in computer role-playing

Starting point is 01:12:19 games, dragons often show up in there. But it really comes back to the book. Oh no, we need a thing. We will do it. And hilariously, one of the funny things about LLVM is that my wife, who's amazing, runs the LLVM Foundation. And she goes to Grace Hopper and is trying to get more women involved. And she's also a compiler engineer,

Starting point is 01:12:41 so she's trying to get other women to get interested in compilers and things like this. And so she hands out the stickers, and people like the LLVM sticker because of Game of Thrones. And so sometimes culture has this helpful effect too, to get the next generation of compiler engineers engaged with the cause. Okay, awesome. Chris, thanks so much for talking to us. It's been great talking to you.
