Lex Fridman Podcast - #104 – David Patterson: Computer Architecture and Data Storage
Episode Date: June 27, 2020

David Patterson is a Turing Award winner and professor of computer science at Berkeley. He is known for pioneering contributions to RISC processor architecture used by 99% of new chips today and for co-creating RAID storage. The impact that these two lines of research and development have had on our world is immeasurable. He is also one of the great educators of computer science in the world. His book with John Hennessy, "Computer Architecture: A Quantitative Approach," is how I first learned about and was humbled by the inner workings of machines at the lowest level.

Support this podcast by supporting these sponsors:
- Jordan Harbinger Show: https://jordanharbinger.com/lex/
- Cash App – use code "LexPodcast" and download:
- Cash App (App Store): https://apple.co/2sPrUHe
- Cash App (Google Play): https://bit.ly/2MlvP5w

This conversation is part of the Artificial Intelligence podcast. If you would like to get more information about this podcast, go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook, Medium, or YouTube, where you can watch the video versions of these conversations. If you enjoy the podcast, please rate it 5 stars on Apple Podcasts, follow on Spotify, or support it on Patreon.

Here's the outline of the episode. On some podcast players you should be able to click the timestamp to jump to that time.

OUTLINE:
00:00 - Introduction
03:28 - How have computers changed?
04:22 - What's inside a computer?
10:02 - Layers of abstraction
13:05 - RISC vs CISC computer architectures
28:18 - Designing a good instruction set is an art
31:46 - Measures of performance
36:02 - RISC instruction set
39:39 - RISC-V open standard instruction set architecture
51:12 - Why do ARM implementations vary?
52:57 - Simple is beautiful in instruction set design
58:09 - How machine learning changed computers
1:08:18 - Machine learning benchmarks
1:16:30 - Quantum computing
1:19:41 - Moore's law
1:28:22 - RAID data storage
1:36:53 - Teaching
1:40:59 - Wrestling
1:45:26 - Meaning of life
Transcript
The following is a conversation with David Patterson, Turing Award winner and professor of computer science at Berkeley.
He is known for pioneering contributions to RISC processor architecture used by 99% of new chips today,
and for co-creating RAID storage.
The impact that these two lines of research and development have had on our world is immeasurable.
He is also one of the great educators of computer
science in the world. His book with John Hennessy is how I first learned about and was humbled by
the inner workings of machines at the lowest level. Quick summary of the ads, two sponsors,
the Jordan Harbinger Show and Cash App. Please consider supporting the podcast by going to Jordan Harbinger.com slash Lex
and downloading Cash App and using code LexPodcast. Click on the links, buy the stuff.
It's the best way to support this podcast and in general the journey I'm on in my research and startup.
This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube,
review it with five stars on Apple Podcasts, support
it on Patreon, or connect with me on Twitter at Lex Fridman, spelled without the E, just
F-R-I-D-M-A-N.
As usual, I'll do a few minutes of ads now, and never any ads in the middle that can break
the flow of the conversation.
This episode is supported by the Jordan Harbinger Show.
Go to JordanHarbinger.com slash Lex.
It's how he knows I sent you.
On that page, there's links to subscribe to it on Apple Podcasts, Spotify, and everywhere
else.
I've been binging on this podcast.
It's amazing.
Jordan is a great human being.
He gets the best out of his guests, dives deep, calls them out when it's needed, and makes
the whole thing fun to listen to.
He's interviewed Kobe Bryant, Mark Cuban, Neil deGrasse Tyson,
Garry Kasparov, and many more. I recently listened to his conversation with Frank Abagnale, author
of Catch Me If You Can and one of the world's most famous con men.
Perfect podcast length and topic for a recent long distance run that I did.
Again, go to JordanHarbinger.com slash Lex to give him my love and to support this podcast.
Subscribe to it also on Apple Podcasts, Spotify,
and everywhere else.
This show is presented by Cash App, the greatest sponsor of this podcast ever and the number
one finance app in the App Store. When you get it, use code LexPodcast.
Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1.
Since cash app allows you to buy Bitcoin let me mention that cryptocurrency in the context of the history of money is fascinating
I recommend The Ascent of Money as a great book on this history.
Also, the audiobook is amazing. Debits and credits on ledgers started around 30,000 years ago.
The US dollar was created over 200 years ago, and the first decentralized cryptocurrency was released
just over 10 years ago. So given that history, cryptocurrency is still very much in its early days
of development, but it's still aiming to and just might redefine the nature of money.
So again, if you get Cash App from the App Store or Google Play and use the code
LexPodcast, you get $10, and Cash App will also donate $10 to FIRST,
an organization that is helping to advance robotics and STEM education for young people around the world.
And now here's my conversation with David Patterson. Let's start with the big historical question.
How have computers changed in the past 50 years at both the fundamental architectural level
and in general in your eyes?
Well, the biggest thing that happened was the invention of the microprocessor.
So computers that used to fill up several rooms could fit inside your cell phone and
Not only did they get smaller, they got a lot faster. So they're
a million times faster than they were 50 years ago, and they're much cheaper, and they're ubiquitous.
You know, there's 7.8 billion people on this planet; probably half of them have
cell phones, so it's just remarkable.
There are probably more microprocessors than there are people.
Sure, I don't know what the ratio is, but I'm sure it's above one. Maybe it's 10 to 1 or some
number like that. What is a microprocessor? So, I'll wait to say what a microprocessor. So a way to save it on microprocessor is to tell you what's inside a computer. So
computer forever has classically had five pieces. There's input and output, which kind of naturally,
as you'd expect, is input is like speech or typing and output is displays.
There's a memory, and like the name sounds, it remembers things.
So it's integrated circuits whose job is you put information in, and then when you ask
for it, it comes back out.
That's memory.
And the third part is the processor, where the term microprocessor comes from, and that
has two pieces as well, and that is the control, which is kind of the brain of the processor.
And what's called the arithmetic unit, it's kind of the brawn of the computer.
So if you think of the, as a human body, the arithmetic unit, the thing that does the
number crunching is the body and the control is the brain.
So those five pieces, input, output, memory, arithmetic unit, and control, have been in
computers since the very dawn, and the last two are considered the processor. So a microprocessor
simply means a processor that fits on a microchip, and that was invented about 40 years ago;
that was the first microprocessor.
It's interesting that you refer to the arithmetic unit as, like, connected to the body,
and the control as the brain. I guess I never thought of it that way, but it's a nice way to think of it, because most of
the actions the microprocessor does are literally sort of computation. The microprocessor does computation; it processes information.
And most of the thing it does is basic arithmetic operations.
What are the operations, by the way?
It's a lot like a calculator.
So there are add instructions, subtract instructions,
multiply and divide.
And kind of the brilliance of the invention
of the computer, or the processor,
is that it performs very trivial operations,
but it just performs billions of them per second.
And what we're capable of doing is writing software
that can take these very trivial instructions
and have them create tasks that can do things
better than human beings can do today.
Just looking back through your career,
did you anticipate how good we would be able
to get at doing these small, basic operations?
How many surprises were there along the way,
where you just kind of sat back and said,
wow, I didn't expect it to go this fast, this good?
Well, the fundamental driving force is what's called Moore's Law, which was named after Gordon
Moore, who's a Berkeley alum, and he made this observation very early on in what are
called semiconductors.
And semiconductors are this idea:
you can build these very simple switches
and you can put them on these microchips.
And he made this observation over 50 years ago.
He looked at a few years and said,
I think what's going to happen is the number
of these little switches called transistors
is going to double every year for the next decade.
And he said this in 1965.
And in 1975, he said, well, maybe it's
going to double every two years.
And what other people have since named Moore's Law
guided the industry.
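As a rough sense of scale (my arithmetic, not a figure from the conversation), doubling every two years compounds into the "million times faster" figure mentioned earlier, since

$$2^{40/2} = 2^{20} \approx 10^{6}, \qquad 2^{50/2} = 2^{25} \approx 3.4\times 10^{7}.$$

So four to five decades of steady doubling is enough to account for improvements of a factor of a million or more.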
And when Gordon Moore made that prediction,
he wrote a paper back in, I think, in the 70s
and said, not only is this going to happen,
he wrote, what would be the implications of that?
And in this article from 1965,
he shows ideas like computers being in cars
and computers being in something that you would buy
in the grocery store and stuff like that.
So he kind of not only called his shot,
he called the implications of it. So if
you were in the computing field, and if you believed Moore's prediction, he kind of said
what would be happening in the future. So in one sense, this is what
was predicted, and you could imagine it was easy to believe that Moore's Law was going to continue, and so this would be the implications.
On the other side there are these shocking events in your life.
Like I remember driving in Marin, across the Bay from San Francisco, and seeing a billboard at a local civic center, and it had a URL on it. And it was like, for the people at the time,
these first URLs, and that's, you know,
the WWW stuff with the HTTP,
people thought it looked like alien writing, right?
They'd see these advertisements and commercials
or billboards that had this alien writing on it.
So for the lay people, it's like, what the hell is going on here?
And for those people in industry, it's, oh my God,
this stuff is getting so popular.
It's actually leaking out of our nerdy world
and into the real world.
So that, I mean, there is events like that.
I think another one was, I remember with the early days
of the personal computer, when we started seeing advertisements
in magazines for personal computers.
Like, it's so popular that it made the newspapers. So on one hand, you know, Gordon Moore
predicted it and you kind of expected it to happen, but when it really hit and you saw it affecting
society, it was shocking. So maybe taking a step back and looking from both an engineering and a philosophical perspective: what do you see
as the layers of abstraction in the computer? Do you see a computer as a set of layers of abstractions?
Yeah, I think that's one of the things that computer science fundamentally is: these things
are really complicated, and the way we cope with complicated software and complicated
hardware is these layers of abstraction.
That simply means that we suspend disbelief and pretend that the only thing you know is
that layer.
You don't know anything about the layer below it.
That's the way we can make very complicated things.
Probably it started with hardware,
but that's the way it was done.
But it's been proven extremely useful.
And I would think in a modern computer today,
there might be 10 or 20 layers of abstraction.
And they're all trying to enforce this contract:
all you know is this interface.
There's a set of commands that you are allowed to use,
and you stick to those commands,
and we will faithfully execute that.
And it's like peeling the layers
of an onion: you get down,
and there's a new set of layers, and so forth.
So for people who want to study computer science,
the exciting part about it is you can keep peeling those layers.
You take your first course and you might learn to program in Python and then you can take
a follow-on course and you can get it down to a lower level language like C and you can
go and then you can, if you want to, you can start getting into the hardware layers and
you keep getting down all the way to that transistor that I talked about that Gordon
Moore predicted, and you can understand all those layers all the way up to the highest
level application software.
So it's a very kind of magnetic field.
If you're interested, you can go into any depth and keep going.
In particular, what's happening right now, or what's happened in software over the
last 20 years and recently in hardware,
there's getting to be open source versions
of all of these things.
So what open source means is what the engineer,
the programmer designs is not a secret
belonging to a company; it's out there
on the worldwide web so you can see it.
So you can look at for lots of pieces of software that you use, you can see exactly what the
programmer does if you want to get involved.
That used to stop at the hardware.
Recently, it's been an effort to make open source hardware and those interfaces open, so
you can see that.
So instead of before you had to stop at the hardware,
you can now start going layer by layer below that and see what's inside there.
So it's a remarkable time that for the interested individual can really see
in great depth what's really going on in the computers that power everything
that we see around us.
Are you thinking also when you say open source at the hardware level,
is this going to the design architecture instruction set level,
or is it going to literally the manufacturer of the actual hardware,
of the actual chips, whether that's ASICs specialized,
a particular domain or the general?
Yeah, so let's talk about that a little bit.
So when you get down to the bottom layer of software,
the way software talks to hardware is in a vocabulary.
And what we call the words of that vocabulary
are instructions, and the technical term for the whole vocabulary is the instruction set.
So those instructions are like what we talked about earlier:
they can be instructions like add, subtract,
and multiply, divide.
There's instructions to put data into memory,
which is called a store instruction
and to get data back, which is called the load instructions.
And those simple instructions
go back to the very dawn of computing; in 1950, the first commercial computer had these instructions. So that's the instruction set that we're talking about. So up until, I'd say, 10 years ago,
these instruction sets are all proprietary. So a very popular one is owned by Intel,
the one that's in the cloud and in all the PCs in the world,
Intel owns that instruction set. It's referred to as the x86. There has been a sequence of
them; the first one was called the 8086, and since then there have been a lot of numbers,
but they all end in 86. So there's been that kind of family of instruction sets.
And that's proprietary.
The other one that's very popular is from ARM.
That kind of powers all the cell phones in the world,
all the iPads in the world,
and a lot of things that are so-called Internet of Things devices.
ARM and that one is also proprietary.
Arm will license it to people for a fee, but they own that.
So the new idea got started at Berkeley kind of unintentionally 10 years ago.
Early in my career, we pioneered a way to do these vocabularies, these instruction sets,
that was very controversial at the time.
At the time, in the 1980s, conventional wisdom was that these vocabularies, these instruction sets,
should have powerful instructions.
Polysyllabic kinds of words, you can think of it that way.
And so, instead of just add, subtract, and multiply, they would have polynomial divide or sort a list.
And the hope was that those powerful vocabularies
would make it easier for software.
We thought that didn't make sense
for microprocessors. There were people at Berkeley
and Stanford and IBM who argued the opposite.
And what we called that was a reduced instruction set computer
and the abbreviation was RISC, and, typical for computer people,
we used the abbreviation and started pronouncing it, so "risk" was the thing.
So we said, for microprocessors, which with Gordon Moore's law are changing really fast,
we think it's better to have a pretty simple set of instructions,
a reduced set of instructions.
That would be a better way to build microprocessors
since they're going to be changing so fast due to Moore's Law,
and then we'll just use standard software
to generate more of those simple instructions.
And one of the pieces of software in that software stack,
going between these layers of abstraction, is called the compiler. And it basically translates; it's a
translator between levels. We said the translator will handle that. So the
technical question was, well, since they're these reduced instructions, you
have to execute more of them. Yeah, that's right. But maybe you can execute them
faster. Yeah, that's right. They're simpler so they could go faster, but you have
to do more of them. So what does that tradeoff look like? And it ended up that we ended up executing maybe 50% more instructions,
maybe a third more instructions, but they ran four times faster. So this controversial RISC
idea proved to be maybe a factor of three or four better. I love that this idea was controversial
and almost kind of like rebellious. So that's in the context of what was more conventional
is the complex instruction set computing. So how do you pronounce that? CISC. CISC.
Right. So it's RISC versus CISC. And believe it or not, this sounds very, you know, "who cares about this,"
right? But it was violently debated at several conferences. It's like, what's the right way to
go? And people thought RISC was, you know, a devolution, where we're going to make software
worse by making those instructions simpler. There were fierce debates at several conferences in the 1980s,
and then later in the 80s it kind of settled toward these benefits.
It's not completely intuitive to me why RISC has, for the most part, won.
Why did that happen?
Yeah, and maybe I can sort of say a bunch of dumb things that could lay the land for
further commentary. So to me, this is a kind of interesting thing.
If you look at C++ versus C,
with modern compilers,
you really could write faster code with C++.
So relying on the compiler to reduce your complicated code
into something simple and fast.
So to me, comparing RISC, maybe this is a dumb question, but why is it
that focusing the design of the instruction set on very few simple instructions
in the long run provides faster execution,
versus coming up with, like you said, a ton of
complicated instructions that over time,
you know, years, maybe decades, you come up with compilers that can reduce those into simple
instructions for you?
Yeah, so let's try and split that into two pieces.
So if the compiler can do that for you, if the compiler can take a complicated program
and produce simpler instructions, then the programmer doesn't care, right?
Programmer, I don't care, just how fast is the computer I'm using, how much does it cost?
And so what happened kind of in the software industry is right around before the 1980s,
critical pieces of software were still written not in languages like C or C++.
They were written in what's called assembly language, where there's this kind of humans writing
exactly at the instructions at the level that a computer understands.
So they were writing, add, subtract, multiply, instructions.
It's very tedious.
But the belief was to write this lowest level of software
that people use, which are called operating systems,
they had to be written in assembly language
because these high level languages were just too inefficient.
They were too slow, or the programs would be too big. So that changed with a famous
operating system called UNIX, which is kind of the grandfather of all the operating systems today.
So UNIX demonstrated that you could write something as complicated as an operating system in a
language like C. So once that was true, then that meant
we could hide the instructions set from the programmer. And so that meant then it didn't
really matter. The programmer didn't have to write lots of these simple instructions.
That was up to the compiler. So that was part of our argument for RISC: if you were
still writing assembly language, there's maybe a better case for CISC instructions. But if the compiler can do that, it's going to be,
that's done once, the computer translates it once, and then every time you run the program,
it runs these potentially simpler instructions. And so that was the debate, right?
Is because people would acknowledge that the simpler instructions
could lead to a faster computer.
You can think of monosyllabic instructions.
You could say them, you know, if you think of reading, you could probably read them faster
or say them faster than long instructions.
The same thing, that analogy works pretty well for hardware.
And as long as you didn't have to read a lot more of those instructions, you could win.
So that's the basic idea for RISC.
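To make that translation step concrete, here is a minimal sketch of my own (not from the conversation): one high-level C statement and the handful of simple RISC-style operations a compiler might emit for it. The mnemonics in the comments are illustrative RISC-V-like assembly, not an actual compiler listing.

```c
#include <stdio.h>

int main(void) {
    int b = 2, c = 3;

    /* One high-level statement...                                     */
    int a = b + c;
    /* ...is translated by the compiler into a few simple RISC-style
       instructions, roughly:
           lw   t0, b        # load b from memory
           lw   t1, c        # load c from memory
           add  t2, t0, t1   # add the two registers
           sw   t2, a        # store the result back to memory
       A CISC-style ISA might instead offer a single memory-to-memory
       add, but that one instruction takes many more clock cycles.     */

    printf("%d\n", a);
    return 0;
}
```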
It's interesting that in that discussion of Unix and C, that there's only one step of
levels of abstraction from the code that's really the closest to the machine to the code
that's written by human.
It's at least to me, again, perhaps a dumb intuition,
but it feels like there might have been more layers,
sort of different kinds of languages stacked
on top of each other.
So it's true and not true, what you said.
There are several layers of software.
So suppose we just talk about two layers:
that would be the operating system, like you get
from Microsoft or from Apple, like iOS or the Windows operating system.
And let's say applications that run on top of it, like Word or Excel.
So both the operating system could be written in C,
and the application could be written in C.
But you could construct those two layers,
and the applications absolutely do call up on the operating
system, and the change was that both of them
could be written in higher level languages.
So it's one step of a translation,
but you can still build many layers of abstraction
of software on top of that.
And that's how things are done today.
So still today, many of the layers that you'll deal with,
you may deal with debuggers, you may deal with linkers.
There's libraries, many of those today will be written in C++, say, even though
that language is pretty ancient, and even the Python interpreter is probably written in C or C++.
So lots of layers there are probably written in these old-fashioned, efficient languages that still take one step to produce these instructions,
produce RISC instructions, but they're composed; each layer of software invokes another through
these interfaces, and you can get 10 layers of software that way. So in general, RISC was developed here at Berkeley.
The three places that were these radicals that advocated for this against the rest of
the community were IBM, Berkeley, and Stanford. You're one of these radicals. And how radical
did you feel? How confident did you feel? How doubtful were you that RISC might be the right approach?
Because you can also intuit
that this is kind of taking a step back
into simplicity, not forward into simplicity.
Yeah, no, it was easy to make the argument against it.
Well, this was my colleague John Hennessy at Stanford
and I, we were both assistant professors. And for me, I just believed in the power of
our ideas. I thought what we were saying made sense, Moore's Law is going to move fast. The
other thing that I didn't mention is one of the surprises of these complex instruction
sets. You could certainly write these complex instructions
if the programmers writing them themselves.
It turned out to be kind of difficult
for the compiler to generate those complex instructions,
kind of ironically, you'd have to find the right
circumstances that just exactly fit this complex instruction.
It was actually easier for the compiler
to generate these simple instructions.
So not only did these complex instructions
make the hardware more difficult to build,
often the compiler wouldn't even use them.
And so it's harder to build.
The compiler doesn't use them that much.
The simple instructions go better with Moore's Law.
The number of transistors is doubling every two years.
So we're going to have, you know,
you want to reduce the time
to design the microprocessor.
That may be more important than the number of instructions.
So I think we believed that we were right,
that this was the best idea.
Then the question became in these debates,
well, yeah, that's a good technical idea.
But in the business world, this doesn't matter.
There are other things that matter.
It's like arguing that if there's a standard
width for the railroad tracks, and you've come up with a better width,
but the whole world is already covered in railroad tracks,
so your ideas have no chance of success.
Commercial success.
It was technically right, but commercially,
it'll be insignificant.
Yeah, it's kind of sad that in this world, the history of human civilization is full of good
ideas that lost because somebody else came along first with a worse idea.
And it's good that in the computing world, at least, some of these have... well, you could
argue, I mean, there are probably still CISC people who would say otherwise. Yeah, what happened was, what was interesting is,
Intel, a bunch of the CISC companies with CISC instruction set vocabularies, they gave
up, but not Intel.
What Intel did, to its credit, because Intel's vocabulary was in the personal computer, and so that was a very valuable vocabulary, because the way we distribute software is in those actual instructions, it's in the instructions of that instruction set, so
you don't get the source code that the programmers wrote; you get it after it's been translated into that lowest level. If you were to get a floppy disk or download software, it's in the instructions of that instruction set. So the x86 instruction set was very valuable.
So what Intel did cleverly and amazingly is they had their chips in hardware do a translation
step. They would take these complex instructions and translate them essentially into RISC
instructions in hardware, on the fly, at gigahertz clock speeds.
And then any good idea that the RISC people had, they could use and they could still be compatible
with this really valuable PC software base, which also had very high volumes, 100 million
personal computers per year. So the SISC architecture in the business world was actually one in this PC era.
So just going back to the time of designing RISC: when you design an instruction set architecture, do you think like a programmer, do you think like a microprocessor engineer, do you think like an artist, a philosopher? Do you think in software and hardware?
I mean, is it art? Is it science?
Yeah, I'd say I think designing a good instruction set is an art. And I think you're trying to balance
the simplicity and speed of execution
with how easy it will be for compilers to use it.
You're trying to create an instruction set
that everything in there can be used by compilers.
There aren't things that are missing
that'll make it difficult for the programs
to run efficiently, but you want it to be easy to build as well. So you're thinking, I'd say
you're thinking about hardware and software, trying to find a hardware-software compromise that'll work well.
It's a matter of taste. It's fun to build instruction sets. It's not that hard to build an instruction set, but to build one that
catches on and people use, you know, you have to be, you know, fortunate to be the right
place in the right time or have a design that people really like. Are you using metrics?
So is it quantifiable? Because you kind of have to anticipate the kind of program
that people write ahead of time.
So is that, can you use numbers, can you use metrics,
can you quantify something ahead of time,
or is this again, that's the art part
where you're kind of anticipating?
A big change, kind of what happened,
I think from Hennessy's and my perspective in the 1980s,
what happened was going from
kind of really, you know, taste and hunches to quantifiable.
And in fact, he and I wrote a textbook at the end of the 1980s called Computer Architecture
a Quantitative Approach.
I heard of that.
And it's the thing, it had a pretty big impact in the field because we
went from textbooks that kind of listed, here's what this computer does, and here's the pros
and cons, and here's what that computer does and its pros and cons, to something where there were formulas
and equations where you could measure things. So specifically for instruction sets,
what we do in some other fields do is we agree upon
a set of programs, which we call benchmarks, and a suite of programs.
And then you develop both the hardware and the compiler, and you get numbers on how well
your computer does, given its instruction set, and how well you implemented it in your microprocessor,
and how good your compilers are.
Computer architecture, we, you know,
using professors' terms, we grade on a curve,
rather than grade on absolute scale.
So when you say, you know,
these programs run this fast,
well, that's kind of interesting,
but how do you know it's better?
Well, you compare it to other computers of the same time.
So the best way we know how to turn it into more of a science, experimental and quantitative, is to compare yourself to other computers of the same era that have access to the same kind of technology, on commonly agreed benchmark programs.
So maybe let me toss up two possible directions we can go. One is, what are the different trade-offs in designing architectures? We've already been talking about CISC and RISC, but maybe
a little bit more detail in terms of specific features that you were thinking about.
And the other side is, what are the metrics that you're thinking about when looking at these
trade-offs?
Yeah, let's talk about the metrics. During these debates, we actually had kind of a hard time
explaining it, convincing people of the ideas, and partly we didn't have a formula to explain it.
And a few years into it, we hit upon a formula that helped explain what was going on.
And let's see how this works orally,
if I can do a formula orally.
So fundamentally, the way you measure performance
is how long does it take a program to run.
A program; if you have 10 programs,
and typically these benchmarks were a suite,
because you'd want to have 10 programs
so they could represent lots of different applications.
So for these 10 programs, how long did it take to run?
Well, now, when you're trying to explain why it took so long,
you could factor how long it takes a program to run
into three factors.
The first one is, how many instructions
did it take to execute?
So that's what we've been talking about, you know,
the instructions of the vocabulary. How many did it take? All right.
The next question is, how long did each instruction take to run, on average?
So you multiply the number of instructions
by how long each one took to run, and that gives you the time. Okay.
But now let's look at this metric
of how long the instructions take to run.
Well, it turns out the way we could build computers today
is they all have a clock and you've seen this
when you, if you buy a microprocessor,
you'll say 3.1 gigahertz or 2.5 gigahertz
and more gigahertz is good.
Well, what that is is the speed of the clock.
So 2.5 gigahertz turns out to be
four-tenths of a billionth of a second, or 0.4 nanoseconds, per clock cycle. So that's the clock cycle time. But there's another
factor, which is, what's the average number of clock cycles it takes per instruction. So it's
the number of instructions, times the average number of clock cycles, times the clock cycle time. So in these RISC versus CISC debates, they would concentrate on the fact that
RISC needs to take more instructions.
And we'd argue, well, maybe the clock cycle is faster,
but the real big difference was the number of clock cycles per instruction.
What about the massive,
beautiful mess of parallelism in the whole picture?
The parallelism, which has to do with say,
how many instructions could execute in parallel,
and things like that.
You could think of that as affecting
the clock cycles per instruction,
because it's the average clock cycles per instruction.
So when you're running a program,
if it took 100 billion instructions
and on average it took two clock cycles per instruction
and they were four nanoseconds,
you could multiply that out and see how long it took to run. And there's all kinds of tricks to
try and reduce the number of clock cycles per instruction. But it turned out that the way they
would do these complex instructions is they would actually build what we would call an interpreter
in a simpler, a very simple hardware interpreter. But it turned out that for the SIS constructions,
if you had to use one of those interpreters,
it would be like 10 clock cycles per instruction
where the risk instructions could be two.
So there'd be this factor of five advantage
in clock cycles per instruction.
We have to execute say 25 or 50% more instructions.
So that's where the win would come.
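Written out, the three-factor formula being described orally, using the illustrative numbers from this part of the conversation (not measured data), is:

$$\text{Time per program} \;=\; \text{instruction count} \times \frac{\text{clock cycles}}{\text{instruction}} \times \text{clock cycle time}$$
$$\approx\; 100\times 10^{9} \times 2 \times 4\,\text{ns} \;=\; 800\ \text{seconds}.$$

The RISC versus CISC comparison then comes down to roughly 1.25 to 1.5 times the instruction count, set against roughly 5 times fewer clock cycles per instruction, which is where the factor of three or four comes from.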
And then you could make an argument
about whether the clock cycle times are the same or not. But the point was that we could divide the benchmark time
per program into three factors, and the biggest difference between RISC and CISC was the clock cycles per instruction. You
execute a few more instructions, but the clock cycles per instruction is much less. And that was what
won this debate. Once we made that argument, then people said, okay, I get it.
And so we went from being outrageously controversial in, you know, 1982, to probably by 1984 people saying, oh yeah,
technically, they've got a good argument.
What are the instructions in the RISC instruction set? Just to get some intuition.
Okay. In 1995, I was asked to predict the future of the microprocessor. And I've seen these predictions,
and usually people predict something outrageous just to be entertaining, right? And so my prediction
for 2020 was, you know, things are going to be pretty much, they're going
to look very familiar to what they are. And they are. And if you were to read the article,
you know, the things I said are pretty much true, the instructions that have been around forever
are kind of the same. And that's the outrageous prediction, actually. Yeah.
Given how fast computers have been going. Well, and you know, Moore's Law was going to go on,
we thought, for 25 more years, who knows.
But kind of the surprising thing, in fact, Hennessy and I won the ACM A.M. Turing Award for
both the RISC instruction set contributions and for that textbook I mentioned.
But we were surprised that here we are, 35, 40 years after we did our work,
and the conventional wisdom of the best way
to do instruction sets is still those RISC instruction sets
that look very similar to what
we did in the 1980s.
So, surprisingly, there hasn't been some radical
new idea even though we have a million times
as many transistors as we had back then.
But what are the basic instructions and how did they change over the years?
So I was talking about addition and subtraction.
So the things that are in a calculator are in a computer;
any of the buttons that are in the calculator are in the computer.
So there's a memory function, and like I said, that turns into
putting something in memory, which is called a store, or bringing something back, which is called a load.
Just a quick tangent when you say memory, what does memory mean?
Well, I told you there were five pieces of a computer, and if you remember in a calculator,
there's a memory key, so you want to have intermediate calculation and bring it back
later.
So you'd hit the memory plus key, M plus maybe, and it would put that into memory, and then
you'd hit an RM, a recall memory key, and it'd bring it back into the display, so you don't
have to retype it.
You don't have to write it down, bring it back again.
So that's exactly what memory is.
You can put things into it as temporary storage and bring it back when you need it later.
So that's memory, and loads and stores. But the big thing, the difference between a computer
and a calculator is that the computer can make decisions.
And amazingly, decisions are as simple as,
is this value less than zero, or is this value bigger
than that value?
So there are these instructions, which
are called conditional branch instructions,
that are what give computers all their power.
If you were in the early days of computing before what's called the general purpose microprocessor, people would write these instructions kind of in hardware,
but it couldn't make decisions. It would do the same thing over and over again. With the power of having branch instructions,
it can look at things that make decisions automatically.
And it can make these decisions,
billions of times per second.
And amazingly enough, we can get, you know,
thanks to advanced machine learning,
we can create programs that can do something smarter
than human beings can do.
But if you go down to that very basic level,
what the instructions are:
the keys on the calculator,
plus the ability to make decisions
with these conditional branch instructions.
And all decisions fundamentally can be reduced
down to these branch instructions.
Yep.
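To make the "computers can make decisions" point concrete, here is a small sketch of my own (not from the conversation) showing a decision in C; the comment shows the kind of conditional branch, in illustrative RISC-V-like mnemonics, that it boils down to.

```c
#include <stdio.h>

int main(void) {
    int balance = -5;

    /* A high-level decision...                                          */
    if (balance < 0) {
        /* ...compiles to roughly a single conditional branch, e.g.
               bge  t0, zero, skip   # skip the body if balance >= 0
           followed by the instructions for the body below.              */
        balance = 0;
    }

    printf("%d\n", balance);  /* prints 0 */
    return 0;
}
```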
So in fact, going way back in the stack:
we did four RISC projects at Berkeley
in the 1980s; they did a couple at Stanford in the 1980s.
In 2010, we decided we wanted to do a new instruction set,
learning from the mistakes of those RISC architectures
in the 1980s, and that was done here at Berkeley
almost exactly 10 years ago.
And the people who did it: I participated,
but Krste Asanović and others drove it.
They called it RISC-V to honor the four RISC projects
of the 1980s.
So what does RISC-V involve?
So RISC-V is another instruction set, a vocabulary.
It learned from the mistakes of the past,
but it still has, if you look at it,
a core set of instructions
that's very similar to the simplest architectures
from the 1980s.
And the big difference about RISC-V is it's open.
So I talked earlier about proprietary versus open
in software.
So this is an instruction set.
So it's a vocabulary.
It's not hardware.
But by having an open instruction set,
we can have open source implementations,
open source processors that people can use.
Where do you see that going?
There are really exciting possibilities here. But just like in the Scientific American article, if
you were to predict 10, 20, 30 years from now, that kind of ability to utilize open source
instruction set architectures like RISC-V,
What kind of possibilities might that unlock?
Yeah, and so just to make it clear
because this is confusing,
the specification of RISC-V is something
that's like in a textbook.
There are books about it.
So that's defining an interface.
There's also the way you build hardware
is you write it in languages that are kind of like C, but they're specialized for hardware that
gets translated into hardware. And so these implementations of this specification are what are
the open source. So they're written in something that's called Verilog or VHDL, but it's put up on the web
just like you can see the C++ code for Linux on the web. So the open instruction set enables
open source implementations of RISC-V. You can literally build a processor using this instruction set.
People are. So what happened to us, the story was this was developed
here for our use to do our research. And we made it, we licensed under the Berkeley software
distribution license, like a lot of things get licensed here. So other academics use it. They
wouldn't be afraid to use it. And then about 2014, we started getting complaints that we were
using it in our research and in our courses.
And we got complaints from people in industry: why did you change your instruction set
between the fall and the spring semester? And, well, why are we getting complaints from industry at all?
Why the hell do you care what we do with our instruction set? And then when we talked to them,
we found out there was this thirst for this idea of an open instruction set architecture. And they had been looking for one, they stumbled upon ours at Berkeley, thought it was,
boy, this looks great. We should use this one. And so once we realized there is this need
for an open instruction set architecture, and we thought, that's a great idea. And then we
started supporting it and tried to make it happen. So this was, we accidentally stumbled into this,
and to this need, and our timing was good.
And so it's really taking off.
There's, you know, universities are good at starting things,
but they're not good at sustaining things.
So like Linux has the Linux Foundation,
there's a RISC-V Foundation that we started.
There are annual conferences,
and the first one was done,
I think, in January of 2015, and it had 50 people at it.
And the one that was just last December had 1,700 people at it,
and companies excited all over the world.
So predicting into the future, if we were doing 25 years,
I would predict that RISC-V will be, you know, possibly the most popular instruction set architecture
out there because it's a pretty good instruction set architecture and it's open and free and there's
no reason lots of people shouldn't use it. And there's benefits just like Linux is so popular today compared to 20
years ago. The fact that you can get access to it for free, you can modify it, you can improve it
for all those same arguments, and so people collaborate to make it a better system for everybody to
use, and that works in software. And I expect this same thing will happen in hardware.
So if you look at ARM, Intel, MIPS, if you look at just the lay of the land, what do you think,
just for me, because I'm not familiar, how difficult would this kind of transition be,
how much challenge would this kind of transition entail?
Do you see... let me ask my dumb question in another way. No, that's fine, I know where we're headed.
Well, there's a bunch. I think the thing to point out is
there are these very popular proprietary instruction sets,
like the x86.
And so how do we move to RISC-V, potentially,
in sort of the span of 5, 10, 20 years,
a kind of unification?
And given that the way we use devices, IoT, mobile devices, and
the cloud, keeps changing.
Well, part of it, a big piece of it is the software stack.
And right now, looking forward,
there seem to be three important markets.
There's the cloud.
And the cloud is simply companies like Alibaba
and Amazon and Google, Microsoft,
having these giant data centers
with tens of thousands of servers
and maybe a hundred of these data centers
all over the world.
And that's what the cloud is.
So the computer that dominates the cloud
is the x86 instruction set.
So the instruction set used in the cloud is the x86;
almost 100% of that today is x86.
The other big thing are cell phones and laptops. Those are the big things
today. I mean, the PC is also dominated by the x86 instruction set, but those sales are
dwindling. You know, there's maybe 200 million PCs a year and there's one and a half billion
phones a year or there's numbers like that. So for the phones, that's dominated by ARM.
And now, and a reason that,
I talked about the software stacks
and the third category is Internet of Things,
which is basically embedded devices,
things in your cars and your microwaves everywhere.
So what's different about those three categories
is for the cloud, the software that runs in the cloud is determined by these companies,
Alibaba, Amazon, Google, Microsoft. So they control that software stack.
For the cell phones, there's both, for Android and Apple, the software they supply, but both of them have marketplaces where anybody in the world can build software. And that software is translated or compiled down and shipped in the vocabulary of ARM.
So that's what's referred to as binary compatible because the actual, it's the instructions are
turned into numbers, binary numbers, and shipped around the world.
And so, just a quick interruption: ARM, what is ARM?
Is ARM an instruction set, like a RISC-based instruction set?
Yeah, it's a RISC-based instruction set, a proprietary one.
ARM stands for Advanced RISC Machine;
ARM is also the name of the company.
So it's a proprietary RISC architecture.
So it's been around for a while, and it's surely the most popular
instruction set in the world right now. Every year, billions of chips are using the ARM design
in this post-PC era. It was one of the early adopters of the RISC idea. The first ARM
goes back to, I don't know, '86 or so; Berkeley and Stanford did their work in the early 80s.
The arm guys needed an instruction set
and they read our papers and it heavily influenced them.
So getting back to my story, what about the Internet of Things?
Well, software's not shipped in Internet of Things.
It's the embedded device, people control that software stack.
So the opportunity for RISC-V, everybody thinks, is in the Internet of Things, embedded
things, because there's no dominant player like there is in the cloud or the smartphones.
And, you know, it doesn't have a lot of license fees associated with it, and you can enhance the instruction
set if you want.
And people have looked at instruction sets and think it's a very good instruction set.
So it appears to be very popular there.
It's possible that in the cloud, those companies control their software stacks,
so it's possible that they would decide to use RISC-V, if we're talking about 10 and 20 years
in the future. The one that would be harder would be the cell phones, since people ship software
in the ARM instruction set; you'd think that would be the more difficult one. But if RISC-V really
catches on, you know, in a period of a decade, you can imagine that changing over too.
Can you give a sense of why RISC-V or ARM has dominated? You mentioned these three categories.
Why did arm dominate?
Why does it dominate the mobile device space?
And maybe my dumb, naive intuition
is that there are some aspects of power efficiency
that are important, that somehow come along with RISC.
Well, part of it is, for these old CISC instruction sets,
like the x86, it was more expensive;
you know, they're older, so they have disadvantages in them because they were
designed 40 years ago, but also they have to translate in hardware from CISC instructions to RISC instructions on the fly.
And that costs both silicon area, the chips are bigger to be able to do that, and it uses more power.
So ARM, which has followed this RISC philosophy, seems to be much more energy efficient.
And in today's computing world, both in the cloud and the cell phone and, you know, IoT things, the limiting
resource isn't the number of transistors you can fit on the chip, it's how much power
you can dissipate for your application.
So by having a reduced instruction set, that's possible to have a simpler hardware which
is more energy efficient.
An energy efficiency is incredibly important in the cloud.
When you have tens of thousands of computers in a data center,
you want to have the most energy efficient ones there as well.
And of course, for embedded things running off of batteries,
you want those to be energy efficient, and the cell phones too.
So I think it's believed that there's an energy disadvantage to using
these more complex instruction set architectures.
So the other aspect of this is, if we look at Apple, Qualcomm, Samsung, Huawei, they all use the
ARM architecture, and yet the performance of the systems varies.
I mean, I don't know whose opinion you take on this, but, you know, Apple for some reason seems to
perform better in terms of their implementation of the architecture.
So where does the magic enter the picture?
How does that happen?
Yeah, so what ARM pioneered was a new business model. They said, well, here's our proprietary
instruction set, and we'll give you two ways to do it.
Either we'll give you one of these implementations, written in things like C, called Verilog, and
you can just use ours;
well, you have to pay money for that,
they will license you to do that. Or you could design your own.
And so we're talking about numbers like tens of millions of dollars to have the right to
design your own, since the instruction set belongs to them.
So Apple got one of those, the right to build their own.
Most of the other people who build like Android phones
just get one of the designs from ARM to do it themselves.
So Apple developed a really good microprocessor design team.
They acquired a very good team that
was building other microprocessors
and brought them into the company to build their designs.
So the instruction sets are the same.
The specifications are the same.
But their hardware design is much more efficient
than I think everybody else is.
And that's given Apple an advantage in the marketplace
in that the iPhones tend to be faster
than most everybody else's phones out there.
It'd be nice to be able to jump around
and kind of explore different little sides of this.
But let me ask one sort of romanticized question.
What do you think is the most beautiful aspect or idea of RISC instruction sets, or instruction
sets in general?
So, what do I think of that?
I was always attracted to the idea that small is beautiful.
The temptation in engineering is, it's kind of easy to make things more complicated.
It's harder, it's a more delightful surprise, to come up with a simple,
elegant solution.
And I think there's a bunch of small features of RISC in general where, you know,
you can see this example of keeping it simpler making it more elegant.
Specifically in RISC-V, which, you know,
I was kind of the mentor on the project,
but it was really driven by Krste Asanović
and two graduate students, Andrew Waterman
and Yunsup Lee.
They hit upon this idea of having
a subset of instructions,
a nice, simple subset of instructions,
like 40-ish instructions,
such that all the software,
the software stack of RISC-V, can run just on those 40 instructions. And then they provide optional features that
could accelerate performance, instructions that, if you needed them, could be very helpful,
but you don't need to have them. And that's really a new idea. So RISC-V has, right now, maybe five optional subsets
that you can pull in, but the software runs without them.
If you just wanna build the core 40 instructions,
that's fine, you can do that.
So this is fantastic, educationally,
is that you can explain computers,
you only have to explain 40 instructions,
and not thousands of them.
Also, if you invent some wild and crazy new technology,
like, you know, biological computing,
you'd like a nice, simple instruction set,
and with RISC-V, if you implement
those core instructions, you can run,
you know, really interesting programs on top of that.
So this idea of a core set of instructions
that the software stack runs on, and then optional features that, if you turn them on, the
compilers will use, but you don't have to, I think is a powerful idea. What's happened in the past
for the proprietary instruction sets is, when they add new instructions, it becomes a required piece,
and so that all microprocessors in the future
have to use those instructions.
So it's kind of like, for a lot of people,
as they get older, they gain weight, right?
That weight and age are correlated.
And so you can see these instruction sets
getting bigger and bigger as they get older.
So RISC-V lets you be as slim as you were as a teenager, and you only have to add these extra
features if you're really going to use them, rather than having no choice;
you have to keep growing with the instruction set.
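A small sketch of what "optional extensions" means in practice (my illustration, not something from the conversation): if a core implements only the base integer instructions and not an optional multiply extension, the compiler or a library can still synthesize multiplication out of the simple core operations, for example with shift-and-add.

```c
#include <stdint.h>
#include <stdio.h>

/* Multiply using only adds, shifts, and conditional branches, the kinds of
   operations found in a minimal base integer instruction set.  A core with
   an optional multiply extension would do this in one instruction instead. */
static uint32_t soft_mul(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1) {     /* conditional branch on the low bit of b */
            result += a; /* add                                    */
        }
        a <<= 1;         /* shift left                             */
        b >>= 1;         /* shift right                            */
    }
    return result;
}

int main(void) {
    printf("%u\n", soft_mul(6, 7));  /* prints 42 */
    return 0;
}
```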
I don't know if the analogy holds up, but that's a beautiful notion that it's almost
like a nudge towards: here's the simple core, that's the essential.
Yeah, I think the surprising thing is still,
if we brought back the pioneers from the 1950s
and showed them the instruction set architectures,
they'd understand them.
They'd say, wow, that doesn't look that different.
Well, yeah, I'm surprised.
And, to talk about philosophical things,
I mean, there may be something powerful about those, you know, 40 or 50 instructions:
all you need are these commands, these instructions that we talked about,
and that is sufficient to build, to bring about, you know, artificial intelligence. And so it's remarkable,
and surprising to me, that as complicated as it is to build these things,
microprocessors where the line widths are narrower
than the wavelength of light, you know,
this amazing technology,
at some fundamental level
the commands that the software executes are really pretty straightforward and haven't changed
that much in decades. What a surprising outcome.
So underlying all computation, all Turing machines, all artificial intelligence systems,
perhaps, might be a very simple instruction set, like RISC-V, or...
Yeah, I mean, that's kind
of what I said. I was interested to see; I had another, more senior faculty
colleague, and he had written something in Scientific American, you know,
his 25 years in the future, and his turned out to be about when I was a young
professor, and he said, yep, I checked it. So I was interested to see how that
was going to turn out for me.
And it's held up pretty well.
But yeah, so there's probably, there's something,
you know, there must be something fundamental
about those instructions, that we're capable of creating
intelligence from pretty primitive operations
and just doing them really fast.
You kind of mentioned a different, maybe radical, computational medium, like biological computing,
and there are other ideas. There's a lot of space in ASICs, domain-specific ones.
And then there could be quantum computers. So we can think of all those different mediums and types of computation. What's the connection between swapping out different hardware systems and the instruction set?
Do you see those as disjoints or are they fundamentally coupled?
Yeah. So, kind of, if we go back to the history:
you know, when Moore's Law is in full effect and you're getting twice as many transistors every couple of years,
you know, kind of the challenge for computer designers is, how can we take advantage of that?
How can we turn those transistors into better computers, faster, typically? And so there was an era,
I guess in the 80s and 90s where computers were doubling performance every 18 months.
And if you weren't around then, what would happen is you had your computer
and your friends computer, which was like a year or a year and a half newer,
and it was much faster than your computer.
And he or she could get their work done much faster than your typical
viewers. So people took their computers perfectly good computers and threw them away to buy a newer computer because the computer one or two
years later was so much faster. So that's what the world was like in the 80s and 90s. Well
Well, with the slowing down of Moore's Law, that's no longer true, right? Now, with desktop computers and laptops,
I only get a new laptop when it breaks, right?
Oh damn, the display broke, I've got to buy a new computer.
But before, you would throw them away
because they were just so sluggish
compared to the latest computers.
So that's a huge change in what's gone on.
But since this lasted for decades,
programmers, and maybe all of society, got used to computers getting faster regularly.
Those of us who are in computer design, it's called computer architecture,
now believe the path forward is instead
to add accelerators that only work well for certain applications.
Since Moore's Law is slowing down, we don't think general-purpose computers are
going to get a lot faster.
So the Intel processors of the world are not going to get a lot faster.
They've been barely improving, like a few percent a year. It used to be doubling every
18 months, and now it's doubling every 20 years, so it's just shocking. So to be able to
deliver on what Moore's Law used to do, we think what's going to happen, what is happening right now,
is people adding accelerators to their microprocessors that only work well for some domains.
And by sheer coincidence, at the same time
that this is happening, there has been this revolution
in artificial intelligence called machine learning.
So, as I'm sure your other guests have said, AI had these two competing schools of thought:
that we could figure out artificial intelligence
by just writing the rules top-down,
or that that was wrong, you had to look at data
and infer what the rules are, the machine learning approach.
And what's happened in the last decade or eight years is that
machine learning has won.
And it turns out that the hardware you build for machine learning
is pretty much matrix multiply.
The matrix multiply is a key feature
of the way machine learning is done.
So that's a godsend for computer design.
We know how to make matrix multiply run really fast.
So general-purpose microprocessors are slowing down,
and we're adding accelerators for machine learning that fundamentally are doing matrix multiplies
much more efficiently than general-purpose computers have done.
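To make the "it's pretty much matrix multiply" point concrete, here is a minimal sketch, purely my illustration and not anything from the conversation, of the kernel that ML accelerators are built to speed up: three nested loops of multiply-accumulate, which hardware can parallelize and pipeline very aggressively.

```python
# Minimal sketch of the matrix-multiply kernel at the heart of most
# neural-network workloads (illustrative only; real accelerators do this
# with systolic arrays or wide vector units, not Python loops).

def matmul(A, B):
    """Multiply an n x k matrix A by a k x m matrix B."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):            # rows of the output
        for j in range(m):        # columns of the output
            acc = 0.0
            for p in range(k):    # the multiply-accumulate inner loop
                acc += A[i][p] * B[p][j]
            C[i][j] = acc
    return C

# A fully connected neural-network layer is just one of these:
# outputs = activations (1 x k) times weights (k x m).
activations = [[1.0, 2.0, 3.0]]
weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
print(matmul(activations, weights))  # [[2.2, 2.8]]
```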
So we have to come up with a new way to accelerate things.
The danger of only accelerating one application is how important is that application.
Turns out machine learning gets used for all kinds of things.
So serendipitously, we found something to accelerate
that's widely applicable.
And we're in the middle of this revolution of machine learning.
We're not sure what the limits of machine learning are.
So this has been kind of a godsend.
So we're going to be able to deliver on improved performance: as long as people
are moving their programs to embrace more machine learning, we know how to give
them more performance even as Moore's Law is slowing down.
And counterintuitively, the machine learning mechanism, you can say, is domain-specific, but because it's leveraging data,
it actually could be very broad in terms of the domains it could be applied in.
Yeah, that's exactly right.
So it's almost,
sort of people sometimes talk about the idea of software 2.0,
we're almost taking another step up in the abstraction layer
in designing machine learning systems because now your programming in the space of data
in the space of hyperparameters is changing fundamentally the nature of programming.
And so the specialized devices that accelerate the performance, especially neural network-based
machine learning systems might become the new general. Yeah. So the thing that's interesting point out, these are not
coral, these are not tied together. The enthusiasm about machine learning about
creating programs driven from data that we should figure out the answers from data,
rather than kind of top down, which is classically the way most programming is done and the way artificial intelligence
used to be done, that's a movement that's going on at the same time. Coincidentally, and
the first word machine learning is machines, right? So that's going to increase the demand
for computing. Because instead of programmers being smart writing those things down, we're
going to instead use computers to examine a lot of data to kind of create the programs.
That's the idea. And remarkably, this gets used for all kinds of things very successfully:
image recognition, language translation, game playing, and, you know, it gets into
pieces of the software stack like databases and stuff like that.
We're not quite sure how general purpose it is, but that's going on independent of this hardware stuff.
What's happening on the hardware side is Moore's Law is slowing down right when we need
a lot more cycles.
It's failing us right when we need it, because there's going to be a greater
increase in computing demand.
And then there's this idea that we're going to do so-called domain-specific accelerators. Here's a domain;
your greatest fear is you'll make this one thing work and that'll help 5% of the
people in the world. Well, this looks like it's a very general-purpose thing.
So the timing is fortuitous: if we can keep building hardware
that will accelerate machine learning, the neural networks, the timing
will be right. That neural network revolution will transform software, the so-called software
2.0, and the software in the future will be very different from the software in the
past. And just as in our microprocessors,
even though we're still going to have that same basic RISC instruction set to run big pieces of the software stack like user interfaces and stuff like that,
we can accelerate the kind of small piece that's computationally intensive.
It's not lots of lines of code, but it takes a lot of cycles to run that code; that's going
to be the accelerator piece.
So that's what makes this, from a computer designer's perspective, a really interesting
decade.
What Hennessy and I talked about in the title of our Turing Award speech is a new golden age.
We see this as a very exciting decade, much like when we were assistant professors and the
RISC stuff was going on.
That was a very exciting time, where we were changing what was going on.
We see this happening again: tremendous opportunities for people, because we're fundamentally
changing how software is built and how we're running it.
Which layer of the abstraction do you think most of the acceleration might be happening in,
if you look at the next 10 years? Google is working on a lot of exciting stuff with the TPU.
There's stuff closer to the hardware; there could be optimizations around the
instruction set; there could be optimizations at the compiler level; it could be even at
the higher-level software stack.
Yeah, it's going to be, I mean, if you think about the old
RISC-CISC debate, it was both. It was software and hardware.
It was the compilers improving as well as the architecture
improving, and that's likely to be the way things are now.
With machine learning, they're using domain-specific
languages. Languages like TensorFlow and PyTorch
are very popular with machine learning people.
Those are raising the level of abstraction.
It's easier for people to write machine learning
in these domain-specific languages,
like PyTorch and TensorFlow.
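As a hedged illustration of what "raising the level of abstraction" looks like in practice, here is a tiny sketch in PyTorch-style Python, assuming the torch package is installed; the layer sizes and the model itself are made up for the example. The programmer writes a few high-level lines, and the framework's compiler and runtime map them onto whatever matrix-multiply hardware is underneath.

```python
# Sketch of the domain-specific-language view of machine learning (PyTorch here).
# The programmer describes the model; the framework decides how to run the
# matrix multiplies on CPU, GPU, or another accelerator.
import torch
import torch.nn as nn

model = nn.Sequential(          # a small, hypothetical two-layer network
    nn.Linear(784, 128),        # 784-d input -> 128 hidden units (one big matmul)
    nn.ReLU(),
    nn.Linear(128, 10),         # 128 hidden -> 10 outputs (another matmul)
)

x = torch.randn(32, 784)        # a batch of 32 fake inputs
logits = model(x)               # all the matrix multiplies happen under the hood
print(logits.shape)             # torch.Size([32, 10])
```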
So where might the most optimization happen?
Yeah, and so there'll be both the compiler piece and the hardware piece underneath it.
The fatal flaw for hardware people is to create really great hardware
but not have brought along the compilers. What we're seeing right now in the marketplace,
because of this enthusiasm around hardware for machine learning, is probably
billions of dollars invested in startup companies,
and we're seeing startup companies go belly up because they focused on the hardware but didn't bring the software stack along.
We talked about benchmarks earlier. So I participated: machine learning
didn't really have a set of benchmarks. I think just two years ago, they didn't have a set of benchmarks.
And we've created something called MLPerf,
which is a machine learning benchmark suite,
and pretty much the companies who didn't invest in the software stack
couldn't run MLPerf very well,
and the ones who did invest in the software stack did.
And we're seeing, you know, it's kind of like in computer architecture.
This is what happens.
You have these arguments about RISC versus CISC,
and people spend billions of dollars in the marketplace to see who wins.
It's not a perfect comparison, but it kind of sorts things out, and we're seeing companies
go out of business. And then there's a company in Israel called Habana.
They came up with machine learning accelerators. They had good MLPerf
scores. Intel had acquired a company earlier, a couple of years ago, called Nervana. They didn't
reveal MLPerf scores, which was suspicious. But a month ago, Intel announced that they're
canceling the Nervana product line and they've bought Habana for $2 billion, and Intel's going to be shipping
Habana chips, which have hardware and software and run the MLPerf programs pretty well, and
that's going to be their product line in the future.
Brilliant.
So maybe just to linger on it briefly:
MLPerf. I love metrics, I love standards that everyone can gather around.
What are some interesting aspects of that portfolio of metrics?
Well, one of the interesting things is how it got started. I was involved at the start.
Peter Mattson is leading the effort from Google.
Google got it off the ground, but we had to reach out to competitors and say,
there are no benchmarks here.
We think this is bad for the field.
It'll be much better if we look at examples
like in the RISC days, when there was an effort where
the people in the RISC community, competitors
building RISC-based processors, got together to agree on a set of benchmarks
that were called SPEC, and that was good for the industry.
Whereas before, the different RISC architectures were arguing,
well, you can believe my performance numbers, but those other guys are liars.
And that didn't do any good.
So we agreed on a set of benchmarks, and then we could figure out who was faster between
the various RISC architectures.
Maybe one was a little bit faster, but that grew the market, rather than, you know, people being
afraid to buy anything.
So we argued the same thing would happen with MLPerf. You know, companies like Nvidia were maybe worried that it was some kind of trap, but eventually we all
got together to create a set of benchmarks and do the right thing, right? And we agree on
the results. And so we can see whether TPUs or GPUs or CPUs are really faster, and how much
faster. And I think, from an engineer's perspective, as long as the results are fair, you can live with it.
You can tip your hat to your colleagues at another institution:
boy, they did a better job than us.
What you hate is if it's false, right?
They're making claims and it's just marketing bullshit,
and that's affecting sales.
From an engineer's perspective,
as long as it's a fair comparison,
and we don't come in first place,
that's too bad, but it's fair.
So we wanted to create that environment for MLPerf.
And so now there are, I mean, 10 universities and 50 companies involved.
So pretty much MLPerf is the way you measure machine learning performance,
and it didn't exist even two years ago.
One of the cool things that I enjoy about the internet,
it has a few downsides, but one of the nice things
is people can see through BS a little better
with the presence of this kind of metric.
So it's really nice.
Companies like Google and Facebook and Twitter,
now it's the cool thing to do,
to put your engineers forward
and to actually show off how well you do on these metrics.
There's less of a desire to do marketing,
less so.
Or am I being naive?
No, I think, well, I was trying to understand
what's changed from the 80s to this era. I think,
because of things like social networking, Twitter and stuff like that,
if you put up, you know, bullshit stuff, right, that's just purposely misleading,
you can get a violent reaction in social media pointing out the flaws in your arguments, right? And so, from a marketing perspective, you have to be careful today
in a way that you didn't have to be careful before, because there will be people
who point out the flaws.
You can get the word out about the flaws in what somebody is saying
much more easily today than in the past.
It used to be easier to get away with it.
And the other thing that's been happening in terms of
showing off engineers is that,
on the software side, people have largely embraced
open source software.
Twenty years ago, it was a dirty word at Microsoft,
and today Microsoft is one of the big proponents
of open source software.
That's kind of the standard way most software gets built,
which really shows off your engineers,
because if you look at the source code, you can see who is making the commits,
who the engineers are at all these companies, who are really great programmers and engineers
making really solid contributions, which enhances their
reputations and the reputation of the companies.
But of course that's not everywhere. The space I work in more is autonomous vehicles, and there
the machinery of hype and marketing is still very strong, and there's less willingness to be open in this kind of open-source way and to benchmark.
So MLPerf represents the machine learning world being much better
at being open source and holding itself to standards.
The number of benchmarks across the different computer vision
and natural language processing tasks is incredible.
Historically, it wasn't always that way.
I had a graduate student working with me, David Martin. In some fields, benchmarking has been around forever. In
computer architecture, databases, maybe operating systems, benchmarks are the way you measure progress.
But he was working with me and then started working
with Jitendra Malik in the computer vision space; I guess you've interviewed Jitendra.
And Dave Martin told me, they don't have benchmarks.
Everybody has their own vision algorithm,
and it's, here's my image, look at how well I do, and everybody had their own images.
So David Martin, back when he did his
dissertation, figured out a way to do benchmarks. He had a bunch of graduate students identify images
and then ran benchmarks to see which algorithms did well. And that was, as far as I know, kind of
the first time people did benchmarks in computer vision, which predated all the things that
eventually led to ImageNet and stuff like that.
But then the vision community got religion, and once we got as far as ImageNet, that let the guys in Toronto
win the ImageNet competition, and then, you know, that changed the whole world.
It's a scary step, actually, because when you enter the world of benchmarks,
you actually have to be good to participate, as opposed to just
believing you're the best in the world. I think the people, I think they weren't purposely misleading.
But if you don't have benchmarks, I mean, how do you know?
Your intuition is kind of like the way we used to do computer architecture: your intuition is that this is the right
instruction set to do this job. I believe it, in my experience; my hunch is that it's true.
We had to make things more quantitative to make progress. And so I just don't know how, in fields that don't have benchmarks,
I don't understand how they figure out whether they're making progress.
We're kind of in the vacuum tube days of quantum computing. What are your thoughts on this wholly different kind of space of architectures?
You know, quantum computing, the idea has been around for a while, and I actually
thought, boy, I sure hope I retire before I have to start teaching
this. I say that because I give these talks about
the slowing of Moore's Law and, you know, the need to change
by doing domain-specific accelerators, and a common question
is, what about quantum computing? The reason it comes up
is it's in the news all the time. So I think the thing to keep in mind is that quantum computing is not right around the corner. There
have been two national reports, one by the National Academies and
another by the computing community consortium, where they did a frank assessment of
quantum computing. And both of those reports said, you know, as far as we can
tell, before you get error-corrected quantum computing,
it's a decade away.
So I think of it like nuclear fusion, right?
There have been people who've been excited about nuclear fusion for a long time.
If we ever get nuclear fusion, it's going to be fantastic
for the world. I'm glad people are working on it,
but, you know, it's not right around the corner.
Those two reports, to me, say it'll probably be 2030 before
quantum computing is something that could happen. And when it does happen, you know, this
is going to be big-science stuff. This is, you know, microkelvin, almost absolute zero,
things that won't work if they vibrate, if a truck goes by. So this will be
data center stuff.
We're not going to have a quantum cell phone.
And it's probably a 2030 kind of thing.
So I'm happy that other people are working on it, but with all
the news about it, it's hard not to think that it's right around the corner.
And that's why we need to do something as Moore's Law is slowing down, to
keep computing getting better for this next decade. And we shouldn't be betting on quantum computing,
expecting quantum computing to deliver in the next few years; it's probably further off.
I'd be happy to be wrong; it'd be great if quantum computing becomes commercially viable.
But it will be a set of applications; it's not general-purpose computation.
So it's going to do some amazing things, but there'll be a lot of things that, probably,
you know, the old-fashioned computers are going to keep doing better for quite a while.
And there will be a teenager 50 years from now watching this video saying, look how silly
David Patterson was.
No, I just said, I said 2030.
I didn't say never.
We're not going to have quantum cell phones.
So he's going to be watching on a quantum cell phone.
Well, I mean, I think, given that we've had Moore's Law,
I just feel comfortable trying to do projects that are thinking about the next
decade.
I admire people who are trying to do things that are 30 years out, but it's such a fast-moving
field.
I just don't know how to.
I'm not good enough to figure out what's going to be the problem in 30 years.
Ten years is hard enough for me.
So maybe, if it's possible, to untangle your intuition a little bit: I spoke with Jim Keller.
I don't know if you're familiar with Jim. And he is trying to sort of be a little bit rebellious, and he
quotes you as being wrong. Yeah. So, just for the record, Jim
talks about having an intuition that Moore's Law is not in fact dead yet, and that it may continue for some time to come.
What are your thoughts about Jim's ideas in this space?
Yeah, this is just marketing.
What Gordon Moore said is a quantitative prediction.
We can check the facts, right? Which is doubling the number of transistors every two years.
So we can look back at Intel, or let's look at DRAM chips six years ago.
That would be three two-year periods,
so DRAM chips should have eight times as many transistors as they did six
years ago.
We can look at Intel microprocessors from six years ago.
If Moore's Law is continuing,
they should have eight times as many transistors
as six years ago.
The answer in both those cases is no.
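The arithmetic behind that check is simple enough to write down. Here is a minimal sketch, with a made-up baseline count rather than real Intel or DRAM figures, of the "doubling every two years" prediction Patterson is testing against.

```python
# Moore's Law as a quantitative prediction: transistor count should double
# every two years, i.e. grow by 2**(years / 2).
# The baseline count below is a made-up placeholder, not real chip data.

def predicted_transistors(baseline, years, doubling_period=2.0):
    """Transistor count Moore's Law predicts after `years` from a baseline."""
    return baseline * 2 ** (years / doubling_period)

baseline = 1_000_000_000  # hypothetical chip with one billion transistors
print(predicted_transistors(baseline, 6) / baseline)                      # 8.0 -> 8x in six years
print(predicted_transistors(baseline, 6, doubling_period=3) / baseline)   # 4.0 if doubling slowed to every 3 years
```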
The problem has been that Moore's Law was kind of genuinely
embraced by the semiconductor industry: they would make
investments in equipment to make Moore's Law come true.
Semiconductor improvement and Moore's Law, in many people's minds, are the same thing.
So when I say, and I'm factually correct, that Moore's Law no longer holds, we are not
doubling transistors every two years,
the downside for a company like Intel is people think
that means it's stopped,
that the technology is no longer improving.
And so Jim is trying to
counteract the impression that semiconductors
are frozen in 2019
and are never going to get better.
I never said that.
All I said was, Moore's Law is no more.
And I'm strictly looking at the number of transistors.
That's what Moore's Law is.
There's been this aura
associated with Moore's Law that the field has enjoyed for 50 years:
look at the field we're in, we're doubling transistors every two years, what an amazing
field. Which is an amazing thing that they were able to pull off.
But even as Gordon Moore said, no exponential can last forever.
It lasted for 50 years, which is amazing.
And this has a huge impact on the industry because of these changes that
we've been talking about. So Jim claims, because he's trying to act on it, he claims, you know,
Patterson says Moore's Law is no more, and look at it, it's still going. And TSMC
says something similar. But there's quantitative evidence that Moore's Law is not continuing. So,
because I understand the perception problem when I say Moore's Law has stopped,
now I say Moore's Law is slowing down.
Which is another way of saying: if it's predicting doubling every two years and I say it's
slowing down, that's another way of saying it doesn't hold anymore.
I think Jim wouldn't disagree that
it's slowing down, because that sounds like things are still getting better, just not as fast,
which is another way of saying Moore's Law isn't working anymore. It's still good for marketing.
But you don't like expanding the definition of Moore's Law?
Naturally. As an educator, you know,
it's just like in politics:
does everybody get their own facts, or do we have facts?
Moore's Law was crisp. It was Carver Mead
who looked at Moore's observations,
drew them on a log-log scale, and got a straight line.
And that's what the definition of Moore's Law is.
There's this other thing Intel did for a while, interestingly,
before Jim joined them. They said,
oh, no, Moore's Law isn't really
doubling transistors every two years.
Moore's Law is the cost of the individual transistor going down,
cutting in half every two years.
Now, that's not what Moore said, but they reinterpreted it
because they believed that the cost of transistors
was continuing to drop even if they couldn't get
twice as many transistors per chip.
Many people in industry have told me
that's not true anymore.
Basically, the more recent technologies
got more complicated,
and the actual cost per transistor went up.
So even that corollary might not be true,
but certainly, that was the beauty of Moore's Law.
It was very simple; it's like E equals MC squared, right?
It was like, wow, what an amazing prediction.
It's so easy to understand, the implications are amazing.
And that's why it was so famous as a prediction. And this reinterpretation of what it meant, and changing it,
is, you know, revisionist history. And they're not claiming there's a new
Moore's Law. They're not saying, by the way, instead of every two years, it's every three years.
I don't think they want to say that.
I think what's going to happen is each new technology generation
is going to get a little bit slower.
So it is slowing down, the improvements won't be as great,
and that's why we need to do new things.
Yeah, I don't like that the idea of Moore's Law is tied up with marketing.
It would be nice if...
Whether it's marketing or not,
well, it could be affecting business, but it could also be affecting the imagination
of engineers.
If Intel employees actually believe that we're frozen in 2019, well, that would be bad
for Intel.
Not just Intel, but everybody. Yeah, it's inspiring; Moore's Law is inspiring.
Yeah, well, everybody. But what's happening right now, talking to people who are
working in national offices and stuff like that, a lot of the computer science community is
unaware that this is going on: that we are in an era that's going to need radical change
at lower levels that could affect the whole software stack. If you're using cloud stuff
and the servers that you get next year are basically only a little bit faster than the servers
you got this year, you need to know that, and we need to start innovating to start delivering
on it.
If you're counting on your software adding a lot more features, assuming
the computers are going to get faster, that's not true.
So are you going to have to start making your software stack
more efficient?
Are you going to have to start learning about machine learning?
So it's kind of a warning, or a call to arms,
that the world is changing right now.
And a lot of people, a lot of computer science PhDs, are unaware of that.
So a way to try and get their attention is to say that Moore's Law is slowing down,
and that's going to affect your assumptions.
And, you know, we're trying to get the word out.
And when companies like TSMC and Intel say, oh, no, no, Moore's Law is fine,
then people think, okay, I don't have to change my behavior.
I'll just get the next servers.
But if they start doing measurements, they'll realize what's going on.
It might be nice to have some transparency and metrics for the lay person,
to be able to know if computers are getting faster.
Yeah, there are a bunch. Most people kind of use clock rate as a measure of performance.
It's not a perfect one.
But if you've noticed, clock rates are more or less the same as they were five years ago.
Computers are a little better than they were.
They haven't made zero progress, but they've made small progress.
So there are some indications out there, and in behavior, right? Nobody buys the next laptop
because it's so much faster than the laptop from the past.
For cell phones, I think, I don't know why people
buy new cell phones, you know;
a new one is announced, the cameras are better,
but that's kind of domain-specific, right?
They're putting special-purpose hardware in
to make the processing of images go much better.
So that's the way they're doing it.
It's not that the ARM processor
in there is twice as fast; it's that they've added
accelerators to help the experience of the phone.
Can we talk a little bit about one other exciting space,
arguably with the same level of impact as your work on RISC: RAID.
In 1988, you co-authored the paper "A Case for Redundant Arrays of Inexpensive Disks," hence RAID.
So that's where you introduced the idea of RAID. Incredible. That paper kind of had this ripple effect
and had a really revolutionary impact. So first, what is RAID?
What is RAID? So this is what I did with my colleague Randy Katz and a star
graduate student, Garth Gibson. We had just done the fourth-generation RISC project,
and Randy Katz had an early Apple Macintosh computer. At this time, everything was done with
floppy disks, which are an old technology that could store things but didn't have much capacity,
and to get any
work done you were always sticking your little floppy disk in and out, because they didn't
have much capacity.
They had started building what are called hard disk drives, which are magnetic material that
can remember information, as storage for the Mac.
Randy asked a question when he saw this disk next to his Mac:
gee, these are brand-new small things; before that, for the big computers, the disks
would be the size of washing machines.
And here's something kind of the size of a book or so.
So I wonder what we could do with that?
Well, Randy was involved in the fourth-generation RISC project here at
Berkeley in the 80s.
So we had figured out a way to make the computation part, the processor part, go a lot faster.
But what about the storage part?
Can we do something to make it faster?
So we hit upon the idea of taking a lot of these disks developed for personal computers,
for Macintoshes, and putting many of them together instead of one of these washing-machine-sized
things. And so we wrote the first draft of the paper, and we'd have 40 of these
little PC disks instead of one of these washing-machine-sized things, and they
would be much cheaper because they're made for PCs, and they could actually
be faster because there were 40 of them rather than one of them. And so we wrote a paper like that and sent it to one of our former Berkeley students at IBM,
and he said, well, this is all great and good, but what about the reliability of these things?
Now you have 40 of these devices, each of which is kind of PC quality, so they're not as good as these
IBM washing machines; IBM dominated the storage industry.
So the reliability is going to be awful.
And when we calculated it out, instead of
breaking on average once a year,
it would break every two weeks.
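That back-of-the-envelope reliability math is easy to reproduce. Here is a minimal sketch with my own illustrative numbers, assuming, hypothetically, that each small disk fails about as often as one big one and that failures are independent; it shows why 40 disks with no redundancy fail on the order of every week or two instead of once a year.

```python
# Rough reliability arithmetic for an array with no redundancy:
# if failures are independent, the array's mean time to failure is roughly
# the single-disk mean time to failure divided by the number of disks.
# (Numbers below are illustrative, not from the original paper.)

disk_mttf_days = 365.0      # assume one disk fails about once a year
num_disks = 40

array_mttf_days = disk_mttf_days / num_disks
print(f"Array fails about every {array_mttf_days:.1f} days")  # ~9 days, i.e. every week or two
```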
So we thought about the idea and said,
well, we've got to address the reliability.
So we did it originally for performance,
but we had to do reliability too.
So the name, redundant array of inexpensive disks:
it's an array of these disks, inexpensive, like for PCs,
but we have extra copies.
So if one breaks, we won't lose all the information.
We'll have enough redundancy that we can let some break
and still preserve the information.
So the name is an array of inexpensive disks,
this collection of PC disks, and the R part of the name
is the redundancy, so they'd be reliable.
And it turns out that if you put a modest number of extra disks
in one of these arrays, it can actually not only be
faster and cheaper than one of these washing-machine disks,
it can actually be more reliable,
because you can tolerate a couple of breaks even with these
cheap disks, whereas one failure
with the washing-machine thing would knock it out.
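To illustrate the redundancy idea, here is a minimal sketch, my own toy example rather than the exact scheme from the original paper, which describes several RAID levels, of parity-style redundancy: one extra disk stores the XOR of the data disks, so the contents of any single failed disk can be rebuilt from the survivors.

```python
# Toy parity redundancy in the spirit of RAID: one extra "disk" holds the
# byte-wise XOR of the data disks, so any one lost disk can be reconstructed.
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

data_disks = [b"disk0 data..", b"disk1 data..", b"disk2 data.."]
parity_disk = xor_blocks(data_disks)          # the redundant extra disk

# Suppose disk 1 fails: rebuild it from the remaining disks plus parity.
survivors = [data_disks[0], data_disks[2], parity_disk]
rebuilt = xor_blocks(survivors)
assert rebuilt == data_disks[1]
print("Rebuilt disk 1:", rebuilt)
```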
Did you have a sense, just like with RISC, that in the 30 years that followed, RAID would take
over as a mechanism for storage?
I'd say, I think I'm naturally an optimist, but I thought our ideas were right.
I thought, kind of like Moore's Law, it seemed to me, if you looked at the history of
disk drives, they went from washing-machine-sized things and they were getting smaller and smaller,
and the volumes were with the smaller disk drives, because that's where the PCs were.
So we thought that was a technological trend,
that the volume of disk drives was going to be
in smaller and smaller devices, which was true.
They went from, I don't know,
eight inches in diameter, to five inches, to three inches
in diameter.
And so it made sense to figure out
how to build things with an array of disks.
So I think it was one of those things where, logically, we thought the technological forces were on our side, so it made sense. So we expected
it to catch on, but there was that same kind of business question. You know, IBM was the big
pusher of these disk drives; in the real world, would the technical advantage get turned into a business advantage or not?
It proved to be true.
And so we thought we were sound technically,
and it was unclear how the business side would go,
but as academics we believed
that technology should win, and it did.
And if you look at those 30 years,
just from your perspective,
are there interesting developments in the space of storage that have happened in that time?
Yeah, a couple of things happened. What we did
had a modest amount of redundancy; as people built bigger and bigger storage
systems, they added more redundancy so they could tolerate more failures.
And the biggest thing that happened in storage is that, for decades,
it was based on things physically spinning,
called hard disk drives.
You used to turn on your computer and it would make a noise.
That noise was the disk drives spinning,
and they were rotating at like 60 revolutions per second.
And it's like, if you remember vinyl records,
if you've ever seen those, that's what it looked like.
And there was a needle, like on a vinyl record,
that was reading it.
So the big change is switching that over
to a semiconductor technology called flash.
So within the last, I'd say, decade,
an increasing fraction of all the computers in the world
are using
semiconductors for storage, the flash drive: instead of being magnetic,
they're, well, semiconductor, writing information very densely.
That's a huge difference.
All the cell phones in the world use flash.
Most of the laptops use flash.
All the embedded devices use flash for storage.
Still, in the cloud,
magnetic disks are more economical than flash,
but they use both in the cloud.
So it's been a huge change in the storage industry,
this switch from primarily disk
to primarily semiconductor.
For the individual disks. But the RAID mechanism still applies to those different kinds
of disks?
Yes.
People still use RAID ideas, because, and this is kind of interesting,
kind of psychological, if you think about it:
people have always worried about the reliability of computing since the earliest days.
But if we're talking about computation, if your computer makes a mistake, and the computer
has ways to check and say, we screwed up, we made a mistake, what
happens is that the program that was running has to be redone, which is a hassle. For storage, if you've sent important information away
and it loses that information, you go nuts.
This is the worst, oh my God.
So if you have a laptop and you're not backing it up
on the cloud or something like that,
and your disk drive breaks, which it can do,
you'll lose all that information and you just go crazy, right?
So the importance of
reliability for storage is tremendously higher than the importance of reliability for computation,
because of the consequences of it. So yes, RAID ideas are still very popular, even with the
switch of technology. Although, you know, flash drives are more reliable, if you're not
doing anything like backing up to get some redundancy to handle failures,
you're taking great risks.
You said that for you, and possibly for many others, teaching and research don't conflict
with each other as one might suspect, and in fact they kind of complement each other.
So maybe a question I have is, how has teaching helped you in your research,
or just in your entirety as a person who both teaches and does research and just thinks
and creates new ideas in this world?
Yes. I think what happens is, when you're a college student, you know there's this kind
of tenure system for doing research. So this model that is popular in America, and I think America really made it happen, is that
we can attract these really great faculty to research universities because they get to
do research as well as teach.
And, especially in fast-moving fields, this means people are up to date and they're teaching
those things.
But when you run into a really bad professor, a really bad teacher,
I think the students think, well, this guy must be a great researcher,
why else could he be here? Well, after 40 years at Berkeley, we had a retirement party and I got a chance to reflect, and I looked back at some things. That is not my experience.
I saw a photograph of five of us in the department who had won the Distinguished Teaching Award from campus, a very high honor.
You know, I've got one of those, one of the highest honors. So there are five of us in that picture: Manuel Blum,
Richard Karp, me, Randy Katz, and John Ousterhout,
contemporaries of mine. I mentioned Randy already. All of us are in the National Academy of Engineering.
We've all won the Distinguished Teaching Award.
Blum, Karp, and I all have Turing Awards, right,
the highest award in computing.
So that's the opposite, right?
What's happened is they're highly correlated.
So probably the other way to think of it is,
if you're very successful people, maybe you're successful at everything you do. It's not an either-or.
But it's an interesting question whether, specifically, that's probably true in general,
but specifically for teaching, is there something in teaching itself? It's the Richard Feynman thing, right?
Is there something about teaching that actually makes your research better,
makes you think deeper and more outside the box?
Absolutely.
I was going to bring up Feynman.
I mean, he criticized the Institute for Advanced Study.
The Institute for Advanced Study was this thing that was created near Princeton,
where Einstein and all these smart people went.
And when he was invited, he thought it was a terrible idea.
It was supposed to be heaven, right? A university without any teaching.
But he thought it was a mistake. Getting up in the classroom and having to explain
things to students and having them ask questions, like, well, why is that true? makes you stop
and think. So he thought, and I agree, that the interaction between a great university
and having bright young students asking hard questions the whole time is synergistic.
And a university without teaching wouldn't be as vital and exciting a place.
And I think it helps stimulate the research.
Another romanticized question, but what's your favorite
concept or idea to teach? What inspires you, or what do you see
inspires the students? Is there something that pops to mind, or
puts the fear of God in them, I don't know, whichever is
most effective.
I mean, in general, I think people are
surprised. I've seen a lot of people who don't think they like teaching
come give guest lectures or teach a course
and get hooked on seeing the lights turn on, right?
You can explain something to people
that they don't understand, and suddenly they get something,
you know, that's important and difficult,
and just seeing the lights turn on, there's
a real satisfaction there. I don't think there's any specific example of that;
it's just the general joy of seeing them understand. I have to
talk about this because I wrestled too. You did? Oh, you wrestled? Yeah. Of course I love
wrestling, I'm Russian. I had Dan Gable as a guest on the
podcast; Dan Gable is my hero kind of guy. So you wrestled at UCLA, among the many other things you've
done in your life, competitively in sports and science and so on, you've wrestled. Maybe, again, continuing
with the romanticized questions, but what have you learned about
life, and maybe even science, from wrestling?
Yeah. In fact, I wrestled at UCLA, but also at El Camino Community College,
and we were state champions of California at El Camino.
And in fact, I was talking to my mom: I got into UCLA, but I decided to go to the
community college, and it's much harder to get into UCLA than into the community college.
And I asked her, why did I make that decision?
Because I thought it was because of my girlfriend.
She said, well, it was the girlfriend, and you thought the wrestling team was really good.
And we were right.
We had a great wrestling team.
We actually wrestled against UCLA at a tournament, and we beat UCLA as a community college, which
is just freshmen and sophomores.
And part of the reason I brought this up is they've invited me back to El Camino to
give a lecture next month.
And so my friend who was on the wrestling team, we're still together, and we're right
now reaching out to other members of the wrestling team to see if we can get together for a reunion.
But in terms of me, it made a huge difference.
I was right at the age cutoff, December 1st, so I was almost always
the youngest person in my class, and I matured later; our family
matured later. So I was almost always the smallest guy. So, you know, I took kind
of nerdy courses, but I was wrestling, and wrestling was huge for my self-confidence
in high school. And then I kind of got bigger at El Camino and in college, and so I had this kind of physical self-confidence,
and it's translated into research self-confidence.
And also, I've had this feeling, even today
in my 70s, that if something is going on in the street
that is bad physically, I'm not going to ignore it.
I'm going to stand up and try and straighten it out.
And that kind of confidence just carries through the entirety of your life.
Yeah, and the same thing happens intellectually. If there's something going on where people are saying something that's not true,
I feel it's my job to stand up, just like I would in the street if something were going on,
somebody attacking some woman or something. I'm not standing by and letting them get away with it.
So I feel it's my job to stand up. So it kind of ironically translates.
The other thing that worked out: I had really great college and high school coaches.
And they believed, even though wrestling is an individual sport,
that we would be even more successful as a team if we bonded together
and supported each other,
rather than everybody being on their own.
In wrestling it's one-on-one,
and everybody could be on their own,
but they felt if we bonded as a team, we'd succeed.
So I kind of picked up those skills
of how to form successful teams from wrestling.
And so I think, most people would say,
one of my strengths is that I can create teams of faculty,
large teams of faculty and grad students,
pull them all together for a common goal,
and, you know, often be successful at it.
I got both of those things from wrestling.
Also, I've heard this line about people who are in
collision sports, with physical contact,
like wrestling or football and stuff like that:
people are a little bit more assertive or something.
So I think that also comes through.
I didn't shy away from the RISC debates.
I enjoyed taking on the arguments and stuff like that. So I'm really
glad I did wrestling. I think it was really good for my self-image, and I learned a lot from it. So
I think that's, you know, sports done well. There are lots of positives you can take
from it about leadership, how to form teams, and how to be successful.
So we've talked about metrics a lot.
There's a really cool metric you developed, in terms of bench press and weightlifting,
the pound-years metric, that we don't have time to talk about,
but it's really cool and people should look into it.
It rethinks the way we think about metrics in weightlifting.
But let me ask about metrics more broadly, since that appeals to you in all forms.
Let's look at the most ridiculous, the biggest question: the
meaning of life. If you were to try to put metrics on a life well lived, what would those metrics be?
Yeah, a friend of mine, Randy Katz, said this. He said, when it's time to sign off, the measure isn't
the number of zeros in your bank account.
It's the number of inches in the obituary in the New York Times.
Let's see.
I think, and this is a cliché, that people don't die wishing they'd
spent more time in the office, right?
As I reflected upon my career, there have been, you know, half a dozen things I've been proud of.
A lot of them aren't papers or scientific results.
Certainly my family: my wife, we've been married more than 50 years, kids and grandkids.
That's really precious.
Education things I've done, I'm very proud of, books and courses. I did some work with
underrepresented groups that was effective. So it was interesting to see what the things were
that I reflected on. I had hundreds of papers, and some of them, like the RISC and
RAID stuff, I'm proud of, but a lot of them were not those things. So people who just spend
their lives
going after the dollars, or going after all the papers
in the world, you know, those are probably not the things
that afterwards you're going to care about.
Just when I got the offer from Berkeley, before I showed up,
I read a book where they interviewed a lot of people
in all walks of life.
And what I got out of that book was that the people
who felt good about what they did were the people who affected other people,
as opposed to things that were more transitory.
So I came into this job assuming that it wasn't going
to be the papers, it was going to be the relationships
with the people over time that I would value.
And that was a correct assessment.
It's the people you work with, the people you can influence,
the people you can help;
those are the things that you feel good about toward the end of your career. It's not the stuff that's more
transitory.
I don't think there's a better way to end it than talking about your family,
the over 50 years of being married to your childhood sweetheart.
Oh, what I can add is how. When you tell people you've been married 50 years,
they want to know how.
I can tell you the nine magic words that you need to say to your partner
to keep a good relationship. And the nine magic words are: I was wrong. You were right. I love you.
And you've got to say all nine. You can't say, I was wrong, you were right, you're a jerk. You know, you can't say that. So, yeah, freely acknowledging that you made a mistake,
that the other person was right, and that you love them
really gets you over a lot of bumps in the road.
So that's what I pass along.
Beautifully put. David, it's been a huge honor.
Thank you so much for the book you've written,
for the research you've done, for changing the world.
Thank you for talking today.
Oh, thanks for the interview.
Thanks for listening to this conversation with David Patterson, and thank you to our sponsors,
the Jordan Harbinger Show and Cash App. Please consider supporting this podcast by going to jordanharbinger.com/lex and downloading Cash App and using code LexPodcast. Click the links,
buy the stuff. It's the best way
to support this podcast and the journey I'm on.
If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, support it
on Patreon, or connect with me on Twitter at lexfridman, spelled without the e, just F-R-I-D-M-A-N.
And now let me leave you
with some words from Henry David Thoreau: Our life is frittered away by detail.
Simplify, simplify. Thank you.