Lex Fridman Podcast - #73 – Andrew Ng: Deep Learning, Education, and Real-World AI
Episode Date: February 20, 2020

Andrew Ng is one of the most impactful educators, researchers, innovators, and leaders in artificial intelligence and the technology space in general. He co-founded Coursera and Google Brain, launched deeplearning.ai, Landing AI, and the AI Fund, and was the Chief Scientist at Baidu. As a Stanford professor, and with Coursera and deeplearning.ai, he has helped educate and inspire millions of students including me.

EPISODE LINKS:
Andrew Twitter: https://twitter.com/AndrewYNg
Andrew Facebook: https://www.facebook.com/andrew.ng.96
Andrew LinkedIn: https://www.linkedin.com/in/andrewyng/
deeplearning.ai: https://www.deeplearning.ai
landing.ai: https://landing.ai
AI Fund: https://aifund.ai/
AI for Everyone: https://www.coursera.org/learn/ai-for-everyone
The Batch newsletter: https://www.deeplearning.ai/thebatch/

This conversation is part of the Artificial Intelligence podcast. If you would like to get more information about this podcast, go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook, Medium, or YouTube, where you can watch the video versions of these conversations. If you enjoy the podcast, please rate it 5 stars on Apple Podcasts, follow on Spotify, or support it on Patreon.

This episode is presented by Cash App. Download it (App Store, Google Play), use code "LexPodcast". This episode is also supported by the Techmeme Ride Home podcast. Get it on Apple Podcasts, on its website, or find it by searching "Ride Home" in your podcast app.

Here's the outline of the episode. On some podcast players you should be able to click the timestamp to jump to that time.

OUTLINE:
00:00 - Introduction
02:23 - First few steps in AI
05:05 - Early days of online education
16:07 - Teaching on a whiteboard
17:46 - Pieter Abbeel and early research at Stanford
23:17 - Early days of deep learning
32:55 - Quick preview: deeplearning.ai, landing.ai, and AI fund
33:23 - deeplearning.ai: how to get started in deep learning
45:55 - Unsupervised learning
49:40 - deeplearning.ai (continued)
56:12 - Career in deep learning
58:56 - Should you get a PhD?
1:03:28 - AI fund - building startups
1:11:14 - Landing.ai - growing AI efforts in established companies
1:20:44 - Artificial general intelligence
Transcript
The following is a conversation with Andrew Ng, one of the most impactful educators, researchers, innovators, and leaders in artificial intelligence and the technology space in general. He co-founded Coursera and Google Brain, launched deeplearning.ai, Landing AI, and the AI Fund, and was the Chief Scientist at Baidu. As a Stanford professor, and with Coursera and deeplearning.ai,
he has helped educate and inspire millions of students, including me. This is the Artificial Intelligence
Podcast. If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. As usual, I'll do one or two minutes of ads now and never any ads
in the middle that can break the flow of the conversation. I hope that works for
you and doesn't hurt the listening experience. This show is presented by Cash App, the number one finance app in the App Store. When you get it, use code LexPodcast. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as $1. Brokerage services are provided by Cash App Investing, a subsidiary of Square, and member SIPC.
Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency in the context
of the history of money is fascinating.
I recommend The Ascent of Money as a great book on this history.
Debits and credits on ledgers started over 30,000 years ago.
The US dollar was created over 200 years ago, and Bitcoin, the first decentralized cryptocurrency, released
just over 10 years ago.
So given that history, cryptocurrency is still very much in its early days of development,
but is still aiming to, and just might redefine the nature of money.
So again, if you get Cash App from the App Store or Google Play and use the code LexPodcast, you'll get $10, and Cash App will also donate $10 to FIRST, one of my favorite organizations that is helping to advance robotics and STEM education for young people around the world.
And now, here's my conversation with Andrew Ng.
The courses you taught on machine learning at Stanford, and later on Coursera which you co-founded, have educated and inspired millions of people.
So let me ask you, what people or ideas inspired you to get into computer science and machine
learning when you were young?
When did you first fall in love with the field?
That's another way to put it.
Growing up in Hong Kong and Singapore, I started learning to code when I was five or six years
old.
At that time, I was learning the BASIC programming language, and I would take these books and they'd tell you: type this program into your computer. So I typed those programs into my computer.
And as a result of all that typing, I would get to play these very simple shoot-'em-up games that I had implemented on my little computer. So I thought it was fascinating as a
young kid that I could write this code that's really just copying code from a
book into my computer to then play these cool video games. Another moment for
me was when I was a teenager, and my father, who is a doctor, was reading about expert systems and about neural networks.
So he got me some of these books and I thought it was really cool.
The idea that you could make a computer start to exhibit intelligence.
Then I remember doing an internship while I was in high school.
This was in Singapore, where I remember doing a lot of photocopying and office assistant work.
And the highlight of my job was when I got to use this shredder.
So the teenager in me remembers thinking, boy, this is a lot of photocopying.
If only we could write software, build a robot, something to automate this.
Maybe I could do something else.
So I think a lot of my work since then has centered on the theme of automation.
Even the way I think about machine learning today, we're very good at writing learning
algorithms that can automate things that people can do.
Or even launching the first MOOCs, massive open online courses, that later led to Coursera,
I was trying to automate what could be automatable in how I was teaching on campus.
The process of education, you tried to automate parts of that, to have more impact from a single teacher or a single educator.
Yeah, I felt, you know, teaching at Stanford, I was teaching machine learning to about 400 students a year at the time.
And I found myself filming the exact same video every year, telling the same jokes in the same room.
And I thought, why am I doing this?
Why don't we just take last year's video
and then I can spend my time building a deeper relationship
with students.
So that process of thinking through how to do that,
that led to the first moves that we launched.
And then you have more time to write new jokes.
Are there favorite memories from your early days
at Stanford, teaching thousands of people in person
and then millions of people online?
You know, teaching online,
what not many people know was that a lot of those videos
were shot between the hours of 10 p.m. and 3 a.m.
A lot of times, when we launched the first MOOCs at Stanford, we had already announced the course, and about 100,000 people had signed up. We had just started to write the code, and we had not yet actually filmed the videos.
So we had a lot of pressure, 100,000 people
waiting for us to produce the content.
So many Fridays and Saturdays, I would go out,
have dinner with my friends, and then I would
think, okay, do you want to go home now or do you want to go to the office to film videos?
And the thought of being able to help 100,000 people potentially learn machine learning,
fortunately, that made me think, okay, I'm going to go to my office, go to my tiny little recording studio. I would adjust my Logitech webcam, adjust my Wacom tablet, make sure my lapel mic was on, and then I would start recording, often until 2 a.m. or 3 a.m. I think fortunately it doesn't show that it was recorded that late at night, but it was really inspiring, the thought that we could create content to help so many people learn about machine learning.
How did that feel? The fact that you're probably somewhat alone,
maybe a couple of friends recording with a Logitech webcam
and kind of going home alone at 1 a.m. at night
and knowing that that's going to reach
sort of thousands of people, eventually millions of people,
What's that feeling like?
I mean, is there a feeling of just satisfaction of pushing through?
I think it's humbling and I wasn't thinking about what I was feeling.
I think one thing I'm proud to say we got right from the early days was,
I told my whole team back then that the number one priority is to do what's best for learners, do what's best for students. And so when I went to
the recording studio, the only thing on my mind was, what can I say, how can I design my slides, how do I need to draw, to make these concepts as clear as possible for learners. I think, you know, I've seen that sometimes for instructors it's tempting to say, hey, let's talk about my work. Maybe if I teach you about my research, someone will cite my papers a couple more times. And I think one thing we got right, launching the first few MOOCs and later building Coursera, was putting in place that bedrock principle of:
let's just do what's best for learners and forget about everything else. And I think that
that as a guiding principle turned out to be really important to the rise of the movement. And the kind of learner you're imagining in your mind is as broad as possible, as global
as possible, so really try to reach as many people interested in machine learning and AI as
possible.
I really want to help anyone that had an interest in machine learning to break into the field.
And I think sometimes, eventually, people ask me, hey, why are you spending so much time explaining gradient descent?
And my answer was, if I look at what I think the learner needs, and what they'd benefit from, I felt that having a good understanding of the foundations, kind of going back to the basics, would put them in better stead to then build a long-term career. So I've tried to consistently make decisions on that principle.
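(For readers who want to see the foundation Andrew keeps coming back to, here is a minimal sketch of gradient descent on a toy one-parameter least-squares problem. The data, learning rate, and step count are made-up illustrative choices, not anything from the course.)

```python
import numpy as np

# Minimal gradient descent sketch: fit y = w * x by least squares.
# Data, learning rate, and step count are illustrative choices only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])   # roughly y = 2x

w = 0.0      # initial parameter
lr = 0.01    # learning rate
for step in range(500):
    pred = w * x
    grad = 2 * np.mean((pred - y) * x)   # d/dw of mean squared error
    w -= lr * grad                       # step against the gradient

print(w)  # converges near 2.0
```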
So one of the things you actually revealed to the narrow AI community at the time, and to the world, is that the amount of people who are actually interested in AI is much larger than we imagined. By you teaching the class, and how popular it became, it showed that, wow, this isn't just a small community of people who go to NeurIPS. It's much bigger.
It's developers, it's people from all over the world from
I'm Russian, so everybody in Russia is really interested.
This is a huge number of programmers who are interested in machine learning,
India, China, South America, everywhere. There's just millions of people who are interested in machine
learning. So how big do you get a sense the number of people that are interested is, from your perspective?
I think the numbers have grown over time. I think it's one of those things that maybe feels like it came out of nowhere, but for those of us who were inside of it, it took years. It's one of those overnight successes that took years to get there.
My first foray into this type of online education was when we were filming my Stanford class and sticking the videos on YouTube. And then some of the things we had uploaded were available to the whole world, and so on. But, you know, basically it was hour-and-fifteen-minute videos that we put on YouTube. And then we had four or five other versions of websites that I had built, most of which you would never have heard of because they reached small audiences, but that allowed me to iterate, allowed my team and me to iterate, to learn what ideas work and what doesn't. For example, one of the features I was really
excited about and really proud of was building this website where multiple people could be logged into the website at the same time.
So today, if you go to a website, you know, if you are logged in and then I want to log in, you need to log out if it's the same browser, the same computer.
But I thought, well, what if two people say you and me were watching a video together in
front of a computer?
What if a website could have you type your name and password, have me type my name and password. And then now the computer knows both of us are
watching together, and it gives both of us credit for anything we do as a group. I built this feature, rolled it out in a school in San Francisco. We had about twenty-something users. The teacher there, at Sacred Heart Cathedral Prep, was great. And guess what? Zero people used this feature.
Turns out people studying online,
they want to watch the videos by themselves.
You can play back, pause at your own speed,
rather than in groups.
So that was one example of a tiny lesson learned.
One of the many that allowed us to hone in on the right set of features.
And it sounds like a brilliant feature.
So I guess the lesson to take from that is there's something that looks amazing on paper
and then nobody uses it.
It doesn't actually have the impact that you think it might have.
So yeah, I saw that you really went through a lot of different features and a lot of ideas to arrive at, with Coursera, the final kind of powerful thing that showed the world that MOOCs can educate millions.
And I think with the whole machine learning movement as well, I think it didn't come out of nowhere. Instead, what happened was, as more people learned about machine learning, they would tell their friends, and their friends would see how it's applicable to their work.
And then the community kept on growing.
And I think we're still growing.
I don't know in the future what percentage of all
developers will be AI developers.
I could easily see it being north of 50%, because AI developers, broadly construed, that's not just people doing the machine learning modeling, but the people building infrastructure, data pipelines, you know, all the software surrounding the core machine learning model.
Maybe it's even bigger.
I feel like today almost every software engineer has some understanding of the cloud. Not all, you know, maybe an embedded microcontroller developer doesn't need to know the cloud. But I feel like the vast majority of software engineers today do have an appreciation of the cloud.
I think in the future, maybe we'll approach nearly 100%
of all developers being in some way an AI developer
or at least having an appreciation of machine learning.
And my hope is that there's this kind of effect, that there's people who are not really interested in being programmers or being in software engineering, like biologists, chemists, and physicists, even mechanical engineers, all these disciplines that are now more and more sitting on large datasets. They didn't think they were interested in programming
until they have this data set,
and they realize there's this set of machine learning tools
that allow you to use the data set.
So they actually learn to program, and they become new programmers. So not just, as you mentioned, a larger percentage of developers becoming machine learning people; it seems like more and more, the kinds of people who are becoming developers is also growing significantly.
Yeah, I think once upon a time,
only a small part of humanity was literate.
You know, could read and write.
And maybe you thought maybe not everyone
needs to learn to read and write.
You know, you just go listen to a few monks, right, read to you, and maybe that was enough.
Or maybe you just need a handful of authors to write the bestsellers, and then no one else needs to write.
But what we found was that by giving as many people, you know, in some countries, almost
everyone, basic literacy, it dramatically enhanced human-to-human communications, and we can now write for an audience of one, such as if I send you an email or you send me an email.
I think in computing, we're still in that phase where so few people know how to code that the coders mostly have to code for relatively large audiences. But if everyone, well, most people, became developers at some level, similar to how most people in developed economies are somewhat literate, I would love to see the owners of a mom-and-pop store be able to write a little bit of code to customize the TV display for their specials this week.
I think it will enhance human to computer communications, which is becoming more and
more important today as well.
So you think it's possible that machine learning becomes kind of similar to literacy, where, yeah, like you said, the owners of a mom-and-pop shop, basically everybody in all walks of life, would have some degree of programming capability?
I could see society getting there. There's one other interesting thing. Yeah, if I go talk to the mom-and-pop store, if I talk to a lot of people in their daily professions,
I previously didn't have a good story for why they should learn to code.
You know, we could give them some reasons.
But what I found with the rise of machine learning and data science is that I think the number of people with a concrete use for data science
in their daily lives, in their jobs, maybe even larger than the number of people with concrete use for software engineering.
For example, if you run a small mom and pop store,
I think if you can analyze the data about your sales,
your customers, I think there's actually real value there,
maybe even more than traditional software engineering.
So I find that for a lot of my friends
in various professions, be it recruiters or accountants
or people that work in factories,
which I work with more and more these days, I feel that if they were
data scientists at some level, they could immediately
use that in their work.
So I think that data science and machine learning
may be an even easier entree into the developer world
for a lot of people than software engineering.
That's interesting.
And I agree with that. That's beautifully put.
We live in a world where most courses and talks have slides, PowerPoint, Keynote. And yet, you famously often still use a marker and a whiteboard. The simplicity of that is compelling
and for me at least fun to watch. So let me ask, why do you like using a marker and whiteboard,
even on the biggest of stages?
I think it depends on the concepts you want to explain.
For mathematical concepts, it's nice to build up the equation one piece at a time. And the whiteboard marker, or the pen and stylus, is a very easy way to build up the equation, to build up a complex concept, one piece at a time while you're
talking about it and sometimes that enhances understandability.
The downside of writing is that it's slow, and so if you want to write a long sentence, it's very hard. I think there are pros and cons, and sometimes I use slides and sometimes I use a whiteboard or a stylus.
The slowness of a whiteboard is also its upside, because it forces you to reduce everything to the basics. So some of your talks involve the whiteboard. I mean, it's really nice; you go very slowly and you really focus on the most simple principles. And that's a beautiful thing: it enforces a kind of minimalism of ideas that, surprisingly to me, is great for education.
Like a great talk, I think, is not one that has a lot of content.
A great talk is one that just clearly says a few simple ideas.
And I think you're right, the whiteboard somehow enforces that.
Pieter Abbeel, who's now one of the top roboticists and reinforcement learning experts in the world, was your first PhD student. So I bring him up just because I kind of imagine this must have been an interesting time in your life. Do you have any favorite memories of working with Pieter, your first student, in those uncertain times, especially before deep learning really sort of blew up? Any favorite memories from those times?
Yeah, I was really fortunate to have Pieter Abbeel as my first PhD student, and I think even my long-term professional success builds on early foundations, or early work, that Pieter was so critical to. So I'm really grateful to him for working with me.
You know, what not a lot of people know is just how hard research was, and still is. Pieter's PhD thesis was using reinforcement learning to fly helicopters. And so, you know, even today, at the website heli.stanford.edu, H-E-L-I dot stanford dot edu, you can still watch videos of us using reinforcement learning to make a helicopter fly upside down, fly loops and rolls.
It's cool. That's one of the most incredible robotics videos ever. People should watch it.
Oh yeah, thank you.
It's inspiring. That's from, like, 2008 or 2006? That range?
Something like that, yeah. It's over ten years old.
That was really inspiring to a lot of people.
Yeah.
And what not many people see is how hard it was.
So Pieter and Adam Coates and Morgan Quigley and I were working on various versions of the helicopter.
And a lot of things did not work.
For example, it turns out one of the hardest problems we had was, when the helicopter is flying around upside down, doing stunts, how do you figure out the position, how do you localize the helicopter?
So we tried all sorts of things.
Having one GPS unit doesn't work because when you're flying upside down, the GPS unit is facing down, so you can't see the satellites. So we experimented with having two GPS units, one facing up, one facing down, so if you flip over... That didn't work, because the downward-facing one couldn't synchronize if you're flipping quickly. Morgan Quigley was exploring this crazy, complicated configuration of specialized hardware to interpret GPS signals, looking at FPGAs, completely insane. We spent about a year working on that; it didn't work.
So I remember Pieter, great guy, him and me, you know, sitting down in my office, looking at some of the latest things we had tried that didn't work, and saying, you know, darn it, like, what now? Because we had tried so many things and it just didn't work.
In the end, what we did, and Adam Coates was crucial to this, was put cameras on the ground and use cameras on the ground to localize the helicopter. And that solved the localization problem, so that we could then focus on the reinforcement learning, and invent reinforcement learning techniques that did actually make the helicopter fly.
And you know, I'm reminded when I was doing
this work at Stanford around that time,
there were a lot of theoretical reinforcement learning papers, but not a lot of practical applications.
So the autonomous helicopter work for flying helicopters
was one of the few practical applications
of reinforcement learning at the time,
which caused it to become pretty well known.
I feel like we might have almost come full circle with today. There's so much hype, so much excitement about reinforcement learning, but again we're hunting for more applications of all of these great ideas that the community has come up with.
What was the drive, in the face of the fact that most people were doing theoretical work? What motivated you, amid the uncertainty and the challenges, to do the applied work, to get the actual helicopter system to work, in the face of fear, uncertainty, the setbacks that you mentioned with localization?
I like stuff that works.
I love the physical world.
So it's back to the shredder.
You know, I like theory, but when I work on theory myself, and this is personal taste, I'm not saying anyone else should do what I do, I personally enjoy it more if I feel that the work I do will influence people, have positive impact, or help someone.
I remember, many years ago, I was speaking with a mathematics professor, and I kind of just asked, hey, why do you do what you do? And this mathematician, not from Stanford, from a different university, he said: I do what I do because it helps me to discover truth and beauty in the universe. He had stars in his eyes when he said that, and I thought, that's great.
I don't want to do that.
I think it's great that someone does that; I'm fully supportive of people that do it, a lot of respect for them. But I am more motivated when I can see a line to how the work that my team is doing helps people.
The world needs all sorts of people, I'm just one type, I don't think everyone should do things the same way as I do.
But when I delve into either theory or practice, if I can have a conviction that here's a path for it to help people, I find that more satisfying.
That's your path.
You were a proponent of deep learning before it gained widespread acceptance.
What did you see in this field that gave you confidence?
What was your thinking process like in that first decade of the, I don't know what that decade is called, the 2000s, the aughts?
Yeah, I can tell you the thing we got wrong and the thing we got right.
The thing we really got wrong was the early importance of unsupervised learning. So in the early days of Google Brain, we put a lot of effort into unsupervised learning rather than supervised learning. And there was this argument. I think it was around 2005, after NeurIPS, at that time called NIPS, had ended, and Geoff Hinton and I were sitting in the cafeteria outside the conference. We had lunch and were just chatting, and Geoff pulled up this napkin. He started sketching this argument on a napkin. It was very compelling; I'll repeat it.
The human brain has about 100 trillion, so that's 10 to the 14, synaptic connections. You will live for about 10 to the 9 seconds, that's 30 years. You actually live for two times 10 to the 9, maybe three times 10 to the 9 seconds, so let's just say 10 to the 9.
So if each synaptic connection,
each weight in your brain's neural network
has just a one bit parameter,
that's 10 to the 14 bits you need to learn
in up to 10 to the nine seconds of your life.
So via this simple argument, which has a lot of problems, it's very simplified, that's 10 to the 5 bits per second you need to learn in your life.
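(Spelled out, the napkin arithmetic he describes is:

```latex
\frac{10^{14}\ \text{synaptic connections} \times 1\ \text{bit each}}
     {10^{9}\ \text{seconds of life}}
\approx 10^{5}\ \text{bits per second}
```

that is, roughly 10^5 bits of supervision would be needed every second of your life.)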
And I have a one-year-old daughter.
I am not providing 10 to the 5 bits per second of labeled information to her.
So, and I think I'm a very loving parent,
but I'm just not going to do that.
So from this very crude, definitely problematic argument, there's just no way that most of what we know comes from supervised learning. The way you get so much information is from sucking in images, audio, just experiences in the world. And so that argument, and there are a lot of known flaws with the argument, really convinced me that there's a lot of power to unsupervised learning. So that was the part that we actually maybe got wrong. I still think unsupervised learning is really important, but in the early days, you know, 10, 15 years ago, a lot of us
thought that was the path forward.
Oh, so you're saying that perhaps was the wrong intuition for the time?
For the time, that was the part we got wrong.
The part we got right was the importance of scale.
So Adam Coates, another wonderful person,
fortunate to have worked with him.
He was in my group at Stanford at the time,
and Adam had run these experiments showing that the bigger we trained a learning algorithm, the better its performance. It was a graph that Adam generated, where the x-axis is the amount of data, accuracy is on the y-axis, and the line goes up and to the right: the bigger you make the thing, the better its performance. So it was really based on that chart Adam generated that gave me the conviction that we could scale these models way bigger than what we could on the few CPUs we had at Stanford, and that we could get even better results.
And it was really based on that one figure that Adam generated that gave me the conviction to go with Sebastian Thrun to pitch, you know, starting a project at Google, which became the Google Brain project.
Google Brain; you co-founded Google Brain.
And there the intuition was: scale will bring performance for the system, so we should chase larger and larger scale.
And I think people don't realize how groundbreaking that is. It's simple, but it's a groundbreaking idea: that bigger datasets will result in better performance.
It was controversial at the time.
Some of my well-meaning friends, you know, senior people in the machine learning community, I won't name names, but people, some of whom we know, came and were trying to give me friendly advice, like, hey, Andrew, why are you doing this? This is crazy. The innovation is in the neural network architecture; look at these architectures we're building. You just want to go for scale? This is a bad career move. So my well-meaning friends, some of them, were trying to talk me out of it. But I find that if you want to make a breakthrough,
you sometimes have to have conviction and do something before it's popular, since that lets you have a bigger impact.
Let me ask you, just a small tangent on that topic: I find myself arguing with people
saying that greater scale, especially in the context of active learning, so very carefully
selecting the dataset, but growing the scale of the dataset is going to lead to even further
breakthroughs in deep learning.
And there's currently pushback against that idea, that larger datasets are no longer the way, that you want to increase the efficiency of learning, you want to make better learning mechanisms.
And I personally believe that bigger datasets
will still with the same learning methods
we have now will result in better performance.
What's your intuition at this time on this dual side? Do we need to come up with better architectures for learning, or can we just get bigger, better datasets that will improve performance?
I think both are important, and it's also problem dependent. So for a few datasets, we may be approaching, you know, the Bayes error rate, or approaching or surpassing human-level performance. And then there is that theoretical ceiling: we will never surpass the Bayes error rate. But then I think there are plenty of problems where we're still quite far from either human-level performance or from the Bayes error rate, and bigger datasets with neural networks, without further algorithmic innovation, will be sufficient to take us further.
But on the flip side, if we look at the recent breakthroughs
using Transformer networks for language models,
it was a combination of novel architecture,
but also scale had a lot to do with it.
Look at what happened with GPT-2 and BERT; I think scale was a large part of the story.
Yeah, that's not often talked about: the scale of the dataset it was trained on, and the quality of the dataset. Because there's some, it was like Reddit threads that were upvoted highly. So there's already some weak supervision on a very large dataset that people don't often talk about, right?
I find that today we have mature processes for managing code, things like Git, right, version control. It took us a long time to evolve those good processes. I remember when my friends and I were emailing each other C++ files over email. But then we had, was it CVS, Subversion, Git, maybe something else in the future. We're very immature in terms of tools for managing data, and thinking about clean data and how to sort out very messy data problems. I think there's a lot of innovation there to be had still.
I love the idea that you were versioning through email.
I'll give you one example. When we work with manufacturing companies, it's not at all uncommon for there to be multiple labelers that disagree with each other.
And so, doing the work of visual inspection, we will take, say, a plastic part and show it to one inspector. And the inspector, sometimes very opinionated, will go, clearly that's a defect, this scratch is unacceptable, got to reject this part. Take the same part to a different inspector, different, very opinionated: clearly the scratch is small, it's fine, don't throw it away, you're going to make us lose money. And then sometimes you take the same plastic part and show it to the same inspector in the afternoon, as opposed to in the morning, and, very opinionated, in the morning they say clearly it's okay, and in the afternoon, equally confident, clearly this is a defect. And so what is the AI team supposed to do if sometimes even one person doesn't agree with himself or herself in the span of a day?
So I think these are the types of very practical, very messy data problems that my teams wrestle with.
In the case of large consumer internet companies, where you have a billion users, you have a lot of data, so you don't worry about it; just take the average, it kind of works. But in the case of other industry settings, we don't have big data. We have small data, very small datasets, maybe a hundred defective parts or a hundred examples of a defect. And if you have only 100 examples, these little labeling errors matter: if 10 of your 100 labels are wrong, that's 10% of your dataset, and that has a big impact. So how do you clean this up? What are you supposed to do? This is an example of the types of things that my teams, this is a Landing AI example, are wrestling with, to deal with small data, which comes up all the time once you're outside consumer internet.
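(As a concrete illustration of the kind of cleanup Andrew is describing, and not his team's actual tooling, here is a hedged sketch: take multiple inspectors' labels per part, majority-vote them, and flag disagreements for re-review. All names and data are made up.)

```python
from collections import Counter

# Hypothetical labels from three inspectors for five parts:
# 1 = defect, 0 = OK. Data is made up for illustration.
labels = {
    "part_01": [1, 1, 1],
    "part_02": [1, 0, 1],
    "part_03": [0, 0, 0],
    "part_04": [1, 0, 0],
    "part_05": [0, 1, 0],
}

consensus, needs_review = {}, []
for part, votes in labels.items():
    label, n = Counter(votes).most_common(1)[0]
    consensus[part] = label
    if n < len(votes):            # any disagreement among inspectors
        needs_review.append(part)

print(consensus)     # majority-vote label per part
print(needs_review)  # parts to send back for adjudication
```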
Yeah, that's fascinating.
So then you invest more effort and time in thinking about the actual labeling process. What are the labels? How are the disagreements resolved? And all those kinds of pragmatic, real-world problems.
That's a fascinating space.
Yeah. I find that actually, when I'm teaching at Stanford, I increasingly encourage students at Stanford to try to find their own project for the end-of-term project. Rather than just downloading someone else's nicely cleaned dataset, it's actually much harder if you need to go and define your own problem and define your own dataset, rather than going to one of the several very good websites with clean, scoped datasets that you could just work on.
You're now running three efforts: the AI Fund, Landing AI, and deeplearning.ai. As you've said, the AI Fund is involved in creating new companies from scratch, Landing AI is involved in helping already established companies do AI, and deeplearning.ai is for the education of everyone else, or of individuals interested in getting into the field and excelling in it.
So let's perhaps talk about each of these areas. First, deeplearning.ai: the basic question, how does a person interested in deep learning get started in the field?
deeplearning.ai is working to create courses to help people break into AI. So my machine learning course that I taught through Stanford remains one of the most popular courses on Coursera.
To this day, it's probably one of the top courses.
So if I ask somebody, how did you get into machine learning, or how did you fall in love with machine learning, or what got you interested, it always goes back to Andrew Ng at some point.
The amount of people you've influenced is ridiculous.
So for that, I'm sure I speak for a lot of people when I say: a big thank you.
No, yeah, thank you.
You know, I was once reading a news article, I think it was Tech Review, and I'm going to mess up the statistic, but I remember reading an article that said something like a third of all programmers are self-taught. I may have the number wrong; maybe it was two thirds. But when I read that article, I thought, this doesn't make sense. Everyone is self-taught, because you teach yourself. I don't teach people.
Yeah, that's well put. So how does one get started in deep learning, and where does deeplearning.ai fit into that?
So the Deep Learning Specialization offered by deeplearning.ai was, I think, Coursera's top specialization. It might still be.
So it's a very popular way for people to take that specialization, to learn about everything from neural networks, to how to tune a neural network, to what is a ConvNet, to what is an RNN or a sequence model, or what is an attention model.
And so the Deep Learning Specialization steps everyone through those algorithms, so you deeply understand them and can implement them and use them for whatever.
From the very beginning? So what would you say are the prerequisites for somebody to take the Deep Learning Specialization, in terms of maybe math or programming background?
Yeah, you need to understand basic programming, since there are programming exercises in Python. And the math prereq is quite basic.
So no calculus is needed.
If you know calculus, that's great; you get better intuitions.
But we really tried to teach that specialization without requiring calculus.
So I think high school math would be sufficient. If you know how to multiply two matrices, I think that's great.
So a little basic linear algebra?
Yes, basic linear algebra, even very, very basic linear algebra, and some programming.
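(For reference, the level of linear algebra he means is roughly this, shown as a NumPy sketch with made-up numbers:

```python
import numpy as np

# Multiplying two matrices: entry (i, j) of A @ B is the dot product
# of row i of A with column j of B.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A @ B)
# [[1*5 + 2*7, 1*6 + 2*8]    ->  [[19, 22],
#  [3*5 + 4*7, 3*6 + 4*8]]        [43, 50]]
```

If that computation makes sense, the math prerequisite is covered.)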
I think people that have done the machine learning course will find the Deep Learning Specialization a bit easier, but it's also possible to jump into the Deep Learning Specialization directly. It'll be a little bit harder, since we tend to go faster over concepts like how gradient descent works and what is an objective function, which are covered more slowly in the machine learning course.
Could you briefly mention some of the key concepts in deep learning that students should learn, that you envision them learning, in the first few months or the first year?
So if you take the Deep Learning Specialization, you learn the foundations of what is a neural network: how you build up a neural network from a single logistic unit, to stacks of layers, to different activation functions. You learn how to train the neural networks.
One thing I'm very proud of in that specialization
is we go through a lot of practical know-how
of how to actually make these things work.
So what are the differences between different optimization algorithms? What do you do if the algorithm is overfitting? How do you tell if the algorithm is overfitting? When do you collect more data? When should you not bother to collect more data? I find that even today, unfortunately, there are engineers that will spend six months trying to pursue a particular direction, such as collecting more data, because we heard more data is valuable. But sometimes you could run some tests and could have figured out six months earlier that, for this particular problem, collecting more data isn't going to cut it. So just don't spend six months collecting more data; spend your time modifying the architecture or trying something else. So we go through a lot of the practical know-how, so that when you take the Deep Learning Specialization, you have those skills to be very efficient in how you build these networks.
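(One standard version of the know-how he's describing, sketched as hedged Python; the thresholds are illustrative only, and the specialization's actual recipes are more nuanced. The idea: compare training error and dev-set error before deciding whether more data is even likely to help.)

```python
def diagnose(train_err, dev_err, target_err=0.05):
    """Crude bias/variance check; thresholds are illustrative only."""
    if train_err > target_err:
        # High bias: can't even fit the training set well.
        return "bigger model / train longer; more data won't fix this"
    if dev_err - train_err > 0.05:
        # High variance: fits training data, fails to generalize.
        return "overfitting: regularize or collect more data"
    return "near target on both sets; gains need new ideas"

print(diagnose(train_err=0.15, dev_err=0.16))  # high-bias case
print(diagnose(train_err=0.02, dev_err=0.12))  # overfitting case
```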
So you dive right in to play with the network, to train it, to do inference on a particular dataset, to build an intuition about it, without building it up too big, to where you spend, like you said, six months building up your big project without building an intuition of a small aspect of the data that could already tell you everything you need to know about that data.
Yes, and also the systematic frameworks of thinking for how to go about building practical
machine learning.
Maybe to make an analogy: when we learn to code, we have to learn the syntax of some programming language, right, be it Python or C++ or Octave or whatever. But the equally important, or maybe even more important, part of coding is to understand how to string together these lines of code into coherent things. So, you know, when should you put something into a function, and when should you not? How do you think about abstraction?
So those frameworks are what makes the programmer efficient, even more than understanding the syntax.
I remember when I was an undergrad at Carnegie Mellon, one of my friends would debug their code by first trying to compile it; it was C++ code. And for every line with a syntax error, they wanted to get rid of the syntax errors as quickly as possible. So how do you do that? Well, they would delete every single line of code with a syntax error. Really efficient for getting rid of syntax errors; a horrible debugging strategy. So we learn how to debug.
And I think in machine learning, the way you debug a machine learning program is very
different than the way you do binary search or whatever.
Use a debugger, trace through the code, as in traditional software engineering. It's an evolving discipline, but I find that the people that are really good at debugging machine learning algorithms are easily 10x, maybe 100x, faster at getting something to work.
And the basic process of debugging is, the bug in this case is: why isn't this thing learning, improving? Sort of going into the questions of overfitting and all those kinds of things. That's the logical space where the debugging happens with neural networks.
Yeah, often the question is, why doesn't it work yet, or can I expect it to eventually work? And what are the things I could try? Change the architecture? More data? More regularization? A different optimization algorithm? Different types of data? So, to answer those questions systematically, so that you don't spend six months heading down a blind alley before someone comes and says, why did you spend six months doing this?
What concepts in deep learning do you think students struggle the most with? Or, what's the biggest challenge for them, where once they get over that hill, it hooks them and inspires them and they really get it?
there are a lot of concepts that build on top of each other.
If you ask me, what's hard about mathematics?
I have a hard time pinpointing one thing.
Is it addition, subtraction, is it carrying, is it multiplication? There's a lot of stuff.
I think one of the challenges of learning math
and of learning certain technical fields
is that there's a lot of concepts.
And if you miss a concept, then you're kind of missing
the prerequisite for something that comes later.
So in the Deep Learning Specialization, we try to break down the concepts to maximize the odds of each component being understandable. So when you move on to the more advanced thing, when we learn ConvNets, hopefully you have enough intuitions from the earlier sections to then understand why we structure ConvNets in a certain way, and eventually why we build RNNs and LSTMs or attention models in a certain way, building on top of the earlier concepts. Actually, you know, you do a lot of teaching as well. Do you have a favorite this-is-the-hard-concept moment in your teaching?
Well, I don't think anyone's ever turned the interview around on me. I'm glad to be the first.
I think that's a really good question.
Yeah, it's really hard to capture the moment when they struggle.
I think you put it really eloquently.
I do think there's moments that are like aha moments that really inspire people.
I think for some reason reinforcement learning, especially deep reinforcement learning, is a really great way to really inspire people and get them to see the power of what neural networks can do. Even though neural networks really are just a part
of the deep RL framework, but it's a really nice way to paint the entirety of the picture of
a neural network being able to learn from scratch, knowing nothing and explore the world
and pick up lessons.
I find that a lot of the aha moments happen when you use deep RL to teach people about neural networks, which is counterintuitive. I find that a lot of the inspiration, sort of the fire in people's passion, in people's eyes, comes from the RL world.
Do you find reinforcement learning
to be a useful part of the teaching
process or not?
I still teach reinforcement learning in one of my Stanford classes and my PhD thesis
was on reinforcement learning. I find that if I'm trying to teach students the most useful
techniques for them to use today, I end up shrinking the amount of time I talk about reinforcement
learning.
It's not what's working today. Now, the world changes so fast; maybe it'll be totally different in a couple of years.
But I think we need a couple more things for reinforcement learning to get there.
One of my teams is looking at reinforcement learning for some robotic control tasks.
So I see the applications.
But if you look at it as a percentage of all of the impact of the types of things we do, it's small, at least today, outside of playing video games and a few other games. Actually, at NeurIPS, a bunch of us were standing around saying, hey, what's your best example of an actual deployed reinforcement learning application? And this was among, you know, senior machine learning researchers, right? And again, there are some emerging ones, but there are not that many great examples.
Well, I think you're absolutely right.
The sad thing is there hasn't been a big real-world application impact from reinforcement learning. I think its biggest impact, to me, has been in the toy domain, in the game domain, in small examples. But that's what I mean for educational purposes: it seems to be a fun thing to explore neural networks with. But I think from your perspective,
and I think that might be the best perspective, is that if you're trying to educate with a simple example in order to illustrate how this can actually be grown to scale and have a real-world impact, then perhaps focusing on the fundamentals of supervised learning, in the context of a simple dataset, even like an MNIST dataset, is the right path to take. I just, the amount of fun I've seen people have with reinforcement learning has been great, but not the applied impact on real-world settings. So it's a trade-off: how much impact do you want to have versus how much fun you want to have.
That's really cool. I feel like the world actually needs all sorts. Even within machine learning, I feel like deep learning is so exciting, but an AI team shouldn't just use deep learning.
I find that my teams use a portfolio of tools.
And maybe that's not the exciting thing to say,
but some days we use a neural net,
some days we use a PCA,
actually the other day I was sitting down with my team,
looking at PCA residuals, trying to figure out
what's going on with PCA applied to a manufacturing problem.
And some days we use a probabilistic graphical model.
Some days we use a knowledge graph, which is one of the technologies that has had tremendous industry impact, but the amount of chatter about knowledge graphs in academia is really thin compared to the actual real-world impact. So I think reinforcement learning should be in that portfolio, and then it's about balancing how much we teach all of these things. And the world should have diverse skills; it'd be sad if everyone just learned one narrow thing.
Yeah, diverse skills help you discover the right tool for the job.
What is the most beautiful, surprising, or inspiring idea in deep learning to you?
Something that captivated your imagination.
Is it the scale, the performance that could be achieved with scale, or are there other ideas?
I think that if my only job was being an academic researcher, and I had an unlimited budget and didn't have to worry about short-term impact but could focus only on long-term impact, I'd probably spend all my time doing research on unsupervised learning. I still think unsupervised learning is a beautiful idea. At both this past NeurIPS and ICML, I was attending workshops and listening to various talks about self-supervised learning, which is one vertical segment, maybe, of a sort of unsupervised learning that I'm excited about.
Maybe just to summarize the idea? I guess you know the idea; could you describe it?
Sure. So here's an example of self-supervised learning. Let's say
we grab a lot of unlabeled images off the internet, so we have infinite amounts of this kind of data. I'm going to take each image and rotate it by a random multiple of 90 degrees, and then I'm going to train a supervised neural network to predict what was the original orientation. So was it rotated 90 degrees, 180 degrees, 270 degrees, or 0 degrees? So you can generate an infinite amount of labeled data, because you rotated the image, so you know what the ground truth label is.
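(A minimal sketch of the rotation pretext task as described, using NumPy; in practice you'd feed these pairs into a CNN, but the point here is that the labels come for free.)

```python
import numpy as np

def make_rotation_example(image, rng):
    """Turn one unlabeled image into a labeled self-supervised example.

    Label k means the image was rotated by k * 90 degrees:
    0 -> 0, 1 -> 90, 2 -> 180, 3 -> 270.
    """
    k = int(rng.integers(4))         # random multiple of 90 degrees
    rotated = np.rot90(image, k=k)   # no human labeling required
    return rotated, k

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))      # stand-in for an unlabeled web image
x, y = make_rotation_example(image, rng)
# Train a supervised classifier on (x, y) pairs; the hidden-layer
# representation it learns can then transfer to a downstream task.
```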
And so various researchers have found that by taking unlabeled data, making up labeled datasets, and training a large neural network on these tasks, you can then take the hidden-layer representation and transfer it to a different task very powerfully. This is how we learn, or one of the ways we learn, word embeddings, which is another example. And I think there's now this portfolio of techniques for generating these made-up tasks.
Another one, called jigsaw, would be if you take an image and cut it up into a three-by-three grid, like a nine-piece, three-by-three puzzle. You jumble up the nine pieces and have a neural network predict which of the nine-factorial possible permutations it came from. So many groups, including OpenAI, Pieter Abbeel has been doing some work on this too, Facebook, Google Brain, I think DeepMind, oh, and actually Aaron van den Oord has great work on the CPC objective, so many teams are doing exciting work, and I think this is a way to generate infinite labeled data. I find this a very exciting piece of unsupervised learning.
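(And a companion sketch for the jigsaw task: cut the image into a three-by-three grid, shuffle the tiles, and the permutation is the label. Published versions typically restrict training to a fixed subset of the 9! = 362,880 permutations; here one is sampled at random purely for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((96, 96, 3))   # stand-in for an unlabeled image
tile = 32                         # 96 / 3

# Cut the image into a 3x3 grid of tiles.
tiles = [image[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
         for r in range(3) for c in range(3)]

perm = rng.permutation(9)              # ground-truth label
shuffled = [tiles[i] for i in perm]    # jumbled puzzle pieces
# Train a network to look at `shuffled` and predict `perm`; as with
# rotation, the learned features transfer to downstream vision tasks.
```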
So long-term, you think that's going to unlock a lot of power in machine learning systems, this kind of unsupervised learning?
I don't think it's the whole enchilada.
I think it's just a piece of it.
And I think this one piece, self-supervised learning,
is starting to get traction.
We're very close to it being useful.
Well, word embeddings are already really useful. I think we're getting closer and closer to it having a significant real-world impact,
maybe in computer vision and video.
But I think this concept,
and I think there will be other concepts around it. There are other unsupervised learning things I've worked on that I've been excited about. I was really excited about sparse coding and ICA, slow feature analysis. I think all of these are ideas that various of us were working on about a decade ago, before we all got distracted by how well supervised learning was doing. But maybe one day we'll return to the fundamentals of representation learning that really started this movement of deep learning. I think there's a lot more work that one could explore around this theme of ideas, and other ideas, to come up with better algorithms.
So if we could return to maybe talk quickly about the specifics of deeplearning.ai, the Deep Learning Specialization perhaps: how long does it take to complete the course, would you say?
The official length of the Deep Learning Specialization is, I think, 16 weeks, so about four months, but it's go-at-your-own-pace. So if you subscribe to the Deep Learning Specialization, there are people that have finished it in less than a month by working more intensely and studying more intensely. So it really depends on the individual.
When we created the Deep Learning Specialization, we wanted to make it very accessible and very affordable. And with Coursera and deeplearning.ai's education mission, one of the things that's really important to me is that if there's someone for whom paying anything is a financial hardship, then they can just apply for financial aid and get it for free.
If you were to recommend a daily schedule for people learning, whether it's through the Deep Learning Specialization or just learning in the world of deep learning, what would you recommend? How would they go about it day to day? Sort of specific advice about learning, about their journey in the world of deep learning and machine learning.
I think getting into the habit of learning is key, and that means regularity.
For example, we send out our weekly newsletter, The Batch, every Wednesday, so people know it's coming on Wednesday; you can spend some of your time on Wednesday catching up on the latest news through The Batch.
And for myself, I've picked up a habit of spending some time every Saturday and every Sunday
reading or studying.
And so I don't wake up on a Saturday and have to make a decision: do I feel like reading or studying today or not? It's just what I do. And the fact that it's a habit makes it easier. So I think if someone can get into that habit, it's like, you know, just like we brush our teeth every morning; I don't think about it. If I thought about it, it's a little bit annoying to have to spend two minutes doing that, but it's a habit that takes no cognitive load.
But this would be so much harder if we have to make a decision every morning.
And actually, that's the reason why I wear the same thing every day as well. It's just one less decision; I just get up and wear my blue shirt.
So I think if you can get that habit, that consistency of studying, then it actually feels easier.
So yeah, it's kind of amazing.
And in my own life, like, I play guitar every day. I force myself to, at least for five minutes, play guitar. It's a ridiculously short period of time, but because I've gotten into that habit, it's incredible what you can accomplish in a period of a year or two years.
You can become, you know, exceptionally good at certain aspects of a thing by just doing it every day for a very short period of time. It's kind of a miracle that that's how it works. It adds up over time.
Yeah, and I think it's often not about the bursts of effort and the all-nighters, because you can only do that a limited number of times. It's the
sustained effort over a long time. I think, you know, reading two research papers is a nice thing to do, but the power is not in reading two research papers; it's in reading two research papers a week for a year. Then you've read a hundred papers, and you actually learn a lot when you read a hundred papers. So regularity, and making learning a habit.
Do you have other general study tips, particularly for deep learning, that people should have in their process of learning? Any kind of recommendations or tips?
One thing I still do, when I'm trying to study something really deeply, is take handwritten notes. It varies; I know there are a lot of people that take the deep learning courses during their commutes or something, where it may be more awkward to take notes, so I know it may not work for everyone. But when I'm taking courses on Coursera, and I still take some every now and then, the most recent one I took was a course on clinical trials, because I was interested in that, I got out my little Moleskine notebook and I was sitting at my desk, just taking down notes on what the instructor was saying.
And we know that the act of taking notes, preferably handwritten notes, increases retention. As you're watching the video, just kind of pausing, maybe, and then taking the basic insights down on paper.
Yeah, so there have been a few studies; if you search online, you'll find some of these studies. Taking handwritten notes, because handwriting is slower, as we were saying just now, right, causes you to recode the knowledge in your own words more, and that process of recoding promotes long-term retention. This is as opposed to typing, which is fine.
Okay, typing is better than nothing, right? And taking a class and not taking notes is better than not taking a class at all. But comparing handwritten notes and typing: a lot of people can usually type faster than they can handwrite notes. And so when people type, they're more likely to just transcribe verbatim what they heard, and that reduces the amount of recoding, and that actually results in less long-term retention.
I don't know what the psychological effect there is, but it's so true. There's something fundamentally different about handwriting. I wonder what that is. I wonder if it is as simple as just the time it takes to write; it's slower.
Yeah, and because you can't write as many words, you have to take
what they said and summarize it into fewer words. And that summarization process requires
deeper processing of the meaning, which then results in better attention. That's fascinating.
And I've spent, I think because of Coursera, so much time studying pedagogy. It's actually one of my passions. I really love learning how to more efficiently help others learn. One of the things I do, both when creating videos and when we write The Batch, is I try to think: is one minute spent with us going to be a more efficient learning experience than one minute spent anywhere else? And we really try to make the time efficient for the learners, because, you know, everyone's busy. So when we're editing, I often tell my teams, every word needs to fight for its life. If you can delete a word, delete it; don't waste the learner's time.
Ah, it's so amazing that you think that way, because there are millions of people impacted by your teaching, and that one minute spent has a ripple effect right through years of time, which is fascinating.
How does one make a career out of an interest in deep learning? Do you have advice for people? We just talked about the early steps at the beginning, but if you want to make it an entire life journey, or at least a journey of a decade or two, how do you do it?
So, to get started, I think in the early parts of a career, coursework, like the deep learning specialization, is a very efficient way to master this material. Because the instructors, be it me or someone else, or Laurence Moroney, who teaches our TensorFlow specialization, and all the other things we're working on, spend effort to try to make it time-efficient for you to learn new concepts. So coursework is actually a very efficient way for people to learn concepts in the beginning parts of breaking into new fields.
In fact, one thing I see at Stanford,
some of my PhD students want to jump
into the research right away,
and I actually tend to say, look,
in your first couple of years as a PhD student,
spend time taking courses,
because it lays a foundation,
it's fine if you're less productive
in your first couple of years,
you'd be better off in the long term.
Beyond a certain point, there's material that doesn't exist in courses, because it's too cutting-edge; the courses haven't been created yet. And there's some practical experience that we're not yet that good at teaching in a course. I think after exhausting the efficient coursework, most people then need to go on to either, ideally, working on projects, and maybe also continuing their learning by reading blog posts and research papers and things like that. Doing projects is really important. And again, I think it's important to start small and just do something. Today you read about deep learning, and you say, oh, all these people are doing such exciting things. If I'm not building a neural network that changes the world, what's the point? Well, the point is that sometimes building that tiny neural network, be it MNIST, or an upgrade to Fashion-MNIST, or whatever, doing your own fun hobby project, that's how you gain the skills that let you do bigger and bigger projects. I find this to be true at the individual level and also at the organizational level. For a company to become good at machine learning, sometimes the right thing to do is not to tackle the giant project, but instead to do the small project that lets the organization learn, and then build out from there. This is true both for individuals and for companies. Just taking the first step, and then taking small steps, is the key.
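To make the "start small" advice concrete, here is a minimal sketch of the kind of tiny hobby project described above: a small fully connected network trained on MNIST with Keras. The architecture and hyperparameters are illustrative choices, not anything prescribed in the conversation.

```python
# A tiny starter project: classify MNIST digits with a small dense network.
# Swap tf.keras.datasets.mnist for tf.keras.datasets.fashion_mnist to get
# the slightly harder Fashion-MNIST upgrade mentioned above.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28x28 image -> 784 vector
    tf.keras.layers.Dense(128, activation="relu"),    # one small hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # one output per digit class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=5, validation_split=0.1)
print(model.evaluate(x_test, y_test))  # [test loss, test accuracy]
```

A project at this scale trains in minutes on a laptop, which is exactly the point: it is small enough to finish, and finishing it teaches the full loop of data, model, training, and evaluation.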
Should students pursue a PhD? You can do so much, and that's one of the fascinating things in machine learning: you can have so much impact without ever getting a PhD. So what are your thoughts? Should people go to grad school? Should people get a PhD?
Should people get a PhD? I think that there are multiple good options of which doing a PhD
could be one of them. I think that if someone's admitted to a top PhD program,
you know, MIT, Stanford, top schools, I think that's a very good experience. Or if someone gets a job
at a top organization, at a top AI team, I think that's also a very good experience. There are some
things you still need a PhD to do. If someone's aspiration is to be a professor at a top academic university, you just need a PhD to do that. But if the goal is to start a company, build a company, or do great technical work, then while a PhD can be a good experience, I would look at the different options available to someone. Where are the places where you can get a job? Where are the places you can get into a PhD program? And kind of weigh the pros and cons of those.
So just to linger on that for a little bit longer, what final dreams and goals do you think people should have? What options should they explore? You can work in industry for a large company, like Google, Facebook, Baidu, all these large companies that already have huge teams of machine learning engineers. You can also go within industry to the more research-oriented groups, like Google Research or Google Brain. Then you can also, like we said, be a professor in academia. And what else? Oh, you can also build your own company; you can do a startup. Is there anything that stands out between those options, or are they all beautiful, different journeys that people should consider?
I think the thing that affects your experience is less whether you're at this company versus that company, or academia versus industry. I think the thing that affects your experience most is who are the people you're interacting with on a daily
basis. So even if you look at some of the large companies,
the experience of individuals in different teams
is very different.
And what matters most is not the logo above the door
when you walk into the giant building every day.
What matters most is who are the 10 people,
who are the 30 people you interact with every day.
So I actually tend to advise people: if you get a job offer from a company, ask who your manager is, who your peers are, who you're actually going to talk to.
We're all social creatures.
We tend to become more like the people around us.
And if you're working with great people, you will learn faster.
Or if you get admitted, if you get a job at a great company or a great university, maybe the logo you walk in under, you know, is great, but if you're actually stuck on some team doing work that doesn't excite you, that's actually a really bad experience. This is true both for universities and for large companies. For small companies, you can kind of figure out who you will work with quite quickly. And I tend to advise people: if a company refuses to tell you who you'll work with, if they say, oh, join us, and the rotation system will figure it out, I think that's a worrying answer, because it means you may not actually get sent to a team with great peers and great people to work with.
It's actually really profound advice that we sometimes sweep aside; we don't consider it rigorously or carefully. The people around you really matter, especially when you accomplish great things. It seems the great things are accomplished because of the people around you. So it's not about whether you learn this thing or that thing, or, like you said, the logo that hangs up top; it's the people. That's fascinating. And it's such a hard search process, just like finding the right friends, or somebody to get married to, that kind of thing. It's a very hard search; it's a people search problem.
Yeah, but I think when someone interviews at a university or a research lab or a large corporation, it's good to insist on just asking: who are the people? Who is my manager? If they refuse to tell you, I'm going to think, well, maybe that's because they don't have a good answer, and it may not be people I like.
And if you don't particularly connect, or something feels off with the people, then don't stick with it. That's a really important signal to consider.
Yeah, yeah. And actually, in my Stanford class, CS230, as well as in an ACM talk, I gave about an hour-long talk on career advice, including the job search process and some of these issues, so you can find those videos online.
Awesome. I'll point people to them. Beautiful.
So the AI Fund helps AI startups get off the ground, or perhaps you can elaborate on all the fun things it's involved with. What's your advice on how one builds a successful AI startup?
You know, in Silicon Valley, a lot of startup failures come from building products that no one wanted. So, cool technology, but who's going to use it? So I tend to be very outcome-driven and customer-obsessed. Ultimately, we don't get to vote on whether we succeed or fail; it's only the customer. They're the only one that gets a thumbs-up or thumbs-down vote in the long term. In the short term, you know, various people get various votes, but in the long term, that's what really matters. So as you build a startup, you have to constantly ask the question: will the customer give a thumbs-up on this? I think startups that are very customer-focused, customer-obsessed, that deeply understand the customer and are oriented to serve the customer, are more likely to succeed.
With the proviso that I think all of us should only do things that we think create social good and move the world forward. So I personally don't want to build addictive digital products just to sell ads. There are things that could be lucrative that I won't do. But if we can find ways to serve people in meaningful ways, I think those can be great things to do, whether in an academic setting, a corporate setting, or a startup setting.
So can you give me a sense of why you started the AI Fund?
I remember when I was leading the AI group at Baidu, I had two parts to my job. One was to build an AI engine to support the existing businesses, and that was running; it just performed by itself. The second part of my job at the time was to try to systematically initiate new lines of business using the company's AI capabilities. So, you know, the self-driving car team came out of my group, and the smart speaker team, similar to what Amazon Echo and Alexa are in the US, but we actually announced it before Amazon did, so Baidu wasn't following Amazon; that came out of my group too. And I found that to be actually the most fun part of my job. So what I wanted to do was to build the AI Fund as a startup studio to systematically create new startups from scratch. With all of the things we can now do with AI, I think the ability to build new teams to go after this rich space of opportunities is a very important mechanism to get done the projects that I think will move the world forward.
I've been fortunate to have built a few teams that had a meaningful positive impact, and I felt that we might be able to do this in a more systematic, repeatable way. A startup studio is a relatively new concept. There are maybe dozens of startup studios right now, but I feel like all of us, many teams, are still trying to figure out how to systematically build companies with a high success rate. I think even a lot of my venture capital friends seem to be more and more building companies rather than just investing in companies. But I find it a fascinating thing to do: to figure out the mechanisms by which we can systematically build successful teams and successful businesses in areas that we find meaningful.
So a startup studio is a place and a mechanism for startups to go from zero to success, and you try to develop a blueprint for that.
It's actually a place for us to build startups from scratch. So we often bring in founders and work with them, or maybe even have existing ideas that we match founders with, and then this launches, hopefully, into successful companies.
So how close are you to figuring out a way to automate the process of starting from scratch and building a successful AI startup?
Yeah. I think we've been constantly improving and iterating on our processes for how we do that.
So, things like: how many customer calls do we need to make in order to get customer validation? How do you make sure the technology can be built? Quite a lot of our businesses need cutting-edge machine learning algorithms, the kinds of algorithms that were developed in the last one or two years. And even if something works in a research paper, it turns out that taking it to production is really hard. There are a lot of issues with making these things work in real life that are not widely addressed in academia. So how do you validate that this is actually doable? How do you build a team with the specialized domain knowledge, be it in education or healthcare or whatever sector we're focusing on?
So I think we've been getting much better at giving entrepreneurs a high success rate, but I think we're still, I think the whole world is still, in the early phases of figuring this out.
But do you think there are some aspects of that process that are transferable from one startup to another, and another, and another?
Yeah, very much so. You know, starting a company, for most entrepreneurs, is a really lonely thing. I've seen so many entrepreneurs not know how to make certain decisions. How do you do B2B sales? If you don't know that, it's really hard. How do you market efficiently, other than just buying ads, which is really expensive? Are there more efficient tactics for that?
Or, for a machine learning project, which basic decisions can change the course of whether the machine learning product works or not? There are so many hundreds of decisions that entrepreneurs need to make, and making a mistake in a couple of key decisions can have a huge impact on the fate of the company.
So I think a startup studio provides a support structure that makes starting a company much less of a lonely experience. And also, when founders are facing these key decisions, like trying to hire your first VP of engineering (what's a good selection criterion? how do you source? should I hire this person or not?), by having an ecosystem around the entrepreneurs, the founders, to help, I think we help them at the key moments, and hopefully we make it significantly more enjoyable and raise the success rate.
So there's somebody to brainstorm with at these very difficult decision points.
And also to help them recognize what they may not even realize is a key decision point.
Right. That's the first and probably the most important part, yeah.
I can say one other thing. I think building companies is one thing, but I feel like it's really important that we build companies that move the world forward. For example, within the AI Fund team, there was once an idea for a new company that, if it had succeeded, would have resulted in people watching a lot more videos in a certain narrow vertical type of video. I looked at it; the business case was fine, the revenue case was fine, but I just said, I don't want to do this. I don't actually want a lot more people to watch this type of video; it wasn't an educational video. And so I killed the idea on the basis that I didn't think it would actually help people. Whether we're building companies, or working at enterprises, or doing personal projects, I think it's up to each of us to figure out what's the difference we want to make in the world.
With Landing AI, you help already-established companies grow their AI and machine learning efforts. How does a large company integrate machine learning into its efforts?
AI is a general-purpose technology, and I think it will transform every industry. Our community has already largely transformed the software internet sector. Most software internet companies, outside maybe the top five or six, or the top three or four, already have reasonable machine learning capabilities or are getting there, with room for improvement. But when I look outside the software internet sector, everything from manufacturing, agriculture, healthcare, logistics, transportation, there are so many opportunities that very few people are working on.
So I think the next wave for AI is to also transform all of those other industries. There was a McKinsey study estimating $13 trillion of global economic growth from AI; US GDP is $19 trillion, so $13 trillion is a big number. PwC estimates $16 trillion. So whatever the number is, it's large. But the interesting thing to me was that a lot of that impact would be outside the software internet sector. So we need more teams to work with these companies to help them adopt AI, and I think this is one thing that will, you know, help drive global economic growth and make humanity more powerful.
And like you said, the impact is there. So what are the best industries, the biggest industries, where AI can help, perhaps outside the software tech sector?
Frankly, I think it's all of them. Some of the ones I'm spending a lot of time on are manufacturing and agriculture, and I'm looking into healthcare.
For example, in manufacturing, we do a lot of work in visual inspection, where today there are people standing around using the human eye to check if this plastic part or this smartphone or this thing has a scratch or a dent or something in it. We can use a camera to take a picture, and use an algorithm, deep learning and other things, to check if it's defective or not, and thus help factories improve yield and improve quality and improve throughput.
It turns out the practical problems we run into are very different from the ones you might read about in most research papers. The data sets are really small, so we face small-data problems. The factories keep changing the environment, so the model works well on your test set, but guess what? Something changes in the factory: the lights go on or off. Recently, there was a factory in which a bird flew through and pooped on something, and that changed stuff. So increasing the algorithm's robustness to all the changes that happen in a factory; we run into a lot of practical problems like that, which are not as widely discussed in academia. It's really fun being on the cutting edge, solving these problems maybe before many people are even aware that there is a problem there.
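As a concrete illustration of the visual inspection setup described above (and only an illustration: the image size, architecture, and data layout are assumptions for this sketch, not Landing AI's actual system), here is a minimal camera-image defect classifier in Keras:

```python
# Toy visual inspection: classify product photos as OK vs. defective.
# Assumes a hypothetical folder layout: inspection_data/{ok,defective}/*.jpg
import tensorflow as tf

IMG_SIZE = (128, 128)

train_ds = tf.keras.utils.image_dataset_from_directory(
    "inspection_data", image_size=IMG_SIZE, batch_size=32)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=IMG_SIZE + (3,)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # outputs P(defective)
])

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```

The small-data and changing-lighting problems mentioned here are exactly what breaks a sketch like this in a real factory, which is why the deployment issues discussed below matter so much.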
And that's such a fascinating space. You're absolutely right. But what is the first step that a company should take? It's a scary leap into this new world, going from the human eye inspecting things to digitizing that process, having a camera, having an algorithm. What's the first step? What's the early journey that you recommend, that you see these companies taking?
I published a document called the AI Transformation Playbook. It's online, and it's taught briefly in the AI for Everyone course on Coursera, about the long-term journey that companies should take. But the first step is actually to start small. I've seen a lot more companies fail by starting too big than by starting too small. Take even Google. Most people don't realize how hard it was, and how controversial it was, in the early days.
So when I started Google Brain, it was controversial.
People thought deep learning, neural nets, had been tried and didn't work.
Why would you want to do deep learning?
So my first internal customer at Google was the Google speech team, which is not the most lucrative project at Google, and not the most important; it's not web search or advertising. But by starting small, my team helped the speech team build a more accurate speech recognition system, and this caused their peers, other teams, to start to view deep learning more favorably. My second internal customer was the Google Maps team, where we used computer vision to read house numbers from Street View images to more accurately locate houses in Google Maps, and so improve the quality of the geodata.
And it was only after those two successes that I then started the more serious conversation with the Google Ads team.
And so there's a ripple effect that you showed
that it works in these cases,
and it just propagates through the entire company
that this thing has a lot of value and use for us.
I think the early small-scale projects help the teams gain faith, and also help the teams learn what these technologies can do. I still remember our first GPU server; it was a server under some guy's desk. And, you know, that taught us early, important lessons about how you have multiple users share a set of GPUs, which was really non-obvious at the time. But those early lessons were important.
We learned a lot from that first GPU server that later
helped the teams think through how to scale it up
to much larger deployments.
Are there concrete challenges that companies face that you see as important for them to solve?
I think building and deploying machine learning systems is hard. There's a huge gap between something that works in a Jupyter notebook on your laptop and something that runs in a production deployment setting in a factory or an agricultural plant or whatever. So I see a lot of people get something to work on their laptop and say, wow, look what I've done. And that's great; that's hard. That's a very important first step.
But a lot of teams underestimate the rest of the steps needed. So, for example, I've heard this exact same conversation between a lot of machine learning people and business people. The machine learning person says, look, my algorithm does well on the test set, and it's a clean test set, I didn't peek. And the business person says, thank you very much, but your algorithm sucks; it doesn't work. And the machine learning person says, no, wait, I did well on the test set. And I think there is a gulf between what it takes to do well on a test set sitting on your hard drive and what it takes to work well in a deployment setting.
Some common problems: robustness and generalization. You deploy something in a factory; maybe they chop down a tree outside the factory, so the tree no longer covers a window, the lighting is different, and the test set changes. And in machine learning, especially in academia, we don't know how to deal with test set distributions that are dramatically different from the training set distribution. There's research on this, stuff like domain adaptation and transfer learning, you know, people are working on it, but we're really not good at it yet. So how do you actually get this to work, given that your test set distribution is going to change?
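One common partial response to this, sketched below purely as an illustration (the brightness feature, the threshold, and the data are assumptions for this example, not a description of what any particular team does), is to monitor production inputs for distribution shift so that a human is alerted when conditions change:

```python
# Monitor incoming images for distribution shift relative to training data.
# Mean brightness is a crude proxy for lighting changes; real systems
# track many richer statistics.
import numpy as np
from scipy.stats import ks_2samp

def brightness(images: np.ndarray) -> np.ndarray:
    """Mean pixel intensity per image."""
    return images.reshape(len(images), -1).mean(axis=1)

def drift_alarm(train_images, prod_images, alpha=0.01) -> bool:
    """True if production brightness differs significantly from training."""
    _, p_value = ks_2samp(brightness(train_images), brightness(prod_images))
    return p_value < alpha

# Simulate 'the lights changed in the factory': production images are darker.
rng = np.random.default_rng(0)
train = rng.uniform(0.0, 1.0, size=(500, 64, 64))
prod = rng.uniform(0.0, 1.0, size=(500, 64, 64)) * 0.6
print(drift_alarm(train, prod))  # True: shift detected, model may need retraining
```

Detecting the shift does not fix it, of course; it just turns a silent accuracy drop into an actionable alert, which connects to the maintenance issues discussed later in the conversation.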
And I think also, if you look at the number of lines of code in the software system, the machine learning model is maybe 5%, or even less, of the entire software system you need to build. So how do you get all that other work done and make it reliable and systematic?
So good software engineering work is fundamental here to building a successful machine learning system?
Yes, and the software system needs to interface with people's workflows. So machine learning is automation on steroids. Take one task out of the many tasks that are done in a factory; a factory does lots of things, and one task is visual inspection. If we automate that one task, it can be really valuable, but you may need to redesign a lot of other tasks around that one task. For example, say the machine learning algorithm says this part is defective. What are you supposed to do with it? Do you throw it away? Do you get a human to double-check? Do you rework it or fix it? So you need to redesign a lot of tasks around the thing you've now automated.
So you have to plan for the change management, make sure the software you write is consistent with the new workflow, and take the time to explain to people the changes that are happening.
So I think what Landing AI has become good at, and I think we learned this by making missteps and, you know, through painful experiences, is working with our partners to think through all the things beyond just the machine learning model in the Jupyter notebook: building the entire system, managing the change process, and figuring out how to deploy it in a way that has an actual impact.
The processes that the large software tech companies use for deploying don't work for a lot of other scenarios. For example, when I was leading large speech teams, if the speech recognition system went down, what happened? An alarm goes off, and then someone like me would say, hey, you 20 engineers, please fix this. But if you have a system go down in a factory, there are not 20 machine learning engineers sitting around that you can page and have fix it. So how do you deal with the maintenance, or the DevOps, or the MLOps, all the other aspects of this? These are concepts that I think Landing AI and a few other teams are on the cutting edge of, but we don't even have systematic terminology yet to describe some of the stuff we do, because I think we're inventing it on the fly.
So you mentioned that some people are interested in discovering mathematical beauty and truth in the universe, and you're interested in having a big positive impact in the world. So let me ask; the two are not inconsistent.
No, they go together.
I'm only half joking, because you're probably interested a little bit in both. But let me ask a romanticized question. So much of the work, your work and our discussion today, has been on applied AI. Maybe you can even call it narrow AI, where the goal is to
create systems that automate some specific process that adds a lot of value to the world.
But there's another branch of AI starting with Alan Turing, that kind of dreams of creating
human level or superhuman level intelligence. Is this something you dream of as well? Do you think we human beings
will ever build a human level intelligence or super human level intelligence system?
I would love to get to AGI and I think humanity will, but whether it takes
a hundred years or five hundred or five thousand, I find it hard to estimate.
Some folks have worries about the different trajectories that path would take, even existential threats of an AGI system. Do you have such concerns, whether in the short term or the long term?
I do worry about the long-term fate of humanity. I do wonder as well. I do worry about overpopulation on the planet Mars, just not today. I think there will be a day, maybe someday in the future, when Mars is polluted and there are children dying there, and someone will look back at this video and say, how was Andrew so heartless? He didn't care about all these children dying on the planet Mars. And I apologize to that future viewer: I do care about the children, but I just don't know how to productively work on that today.
Your picture will be in the dictionary for the people who were ignorant about the overpopulation of Mars.
Yes. So it's a long-term problem. But is there something in the short term we should be thinking about, in terms of aligning the values of our AI systems with the values of us humans? Sort of the things that Stuart Russell and other folks are thinking about: as these systems develop more and more, we want to make sure they represent the better angels of our nature, the ethics, the values of our society.
If you take self-driving cars, the biggest problem with self-driving cars is not that there's some trolley dilemma. You know, how many times, when you were driving your car, did you actually face that moral dilemma of choosing who to crash into? I think self-driving cars will run into that problem roughly as often as we do when we drive our cars. The biggest problem with self-driving cars is when there's a big white truck across the road, and what you should do is brake and not crash into it, and the self-driving car fails and crashes into it. So I think we need to solve that problem first. I think the problem with some of these discussions about AGI, you know, alignment, the paperclip problem, is that they're a huge distraction from the much harder problems that we actually need to address today. I think bias is a huge issue.
I worry about wealth inequality. AI and the internet are causing an acceleration of the concentration of power, because we can now centralize data and use AI to process it, and AI will affect industry after industry. The internet industry has a lot of winner-take-most or winner-take-all dynamics, and AI will infect all these other industries, giving them winner-take-most or winner-take-all flavors as well.
So look at what Uber and Lyft did to the taxi industry. AI will do this type of thing to a lot of industries, creating tremendous wealth. But how do we make sure that the wealth is fairly shared? And then, how do we help people whose jobs are displaced? I think education is part of it, but there may be even more that we need to do than education.
I think bias is a serious issue. There are nefarious uses of AI, like deepfakes being used for various bad purposes. So I worry about some teams, maybe accidentally, and I hope not deliberately, making a lot of noise about problems in the distant future rather than focusing on some of these much harder present problems.
Yeah, they overshadow the problems we already have today, which are exceptionally challenging, like the ones you mentioned, and even the seemingly silly ones that have a huge impact, like the lighting variation outside your factory window. That ultimately is what makes the difference between, like you said, the Jupyter notebook and something that actually transforms an entire industry, potentially.
Yeah, and just to add to that: for some companies, when a regulator comes to you and says, look, your product is messing things up, fixing it may have a revenue impact. Well, it's much more fun to talk about how you promise not to wipe out humanity than to face the actual, really hard problems we face.
So your life has been a great journey, from teaching to research to entrepreneurship. Two questions. One: are there regrets, moments that, if you went back, you would do differently? And two: are there moments you're especially proud of, moments that made you truly happy?
You know, I've made so many mistakes. It feels like every time I discover something, I go, why didn't I think of this five years earlier, or even ten years earlier? And sometimes I read a book and I go, I wish I'd read this book ten years ago; my life would have been so different. Although that happened recently, and then I was thinking, if only I'd read this book when we were starting up Coursera, it could have been so much better. But then I discovered that the book hadn't yet been written when we were starting Coursera, so that made me feel better. But I find that with the process of discovery, we keep on finding out things that seem so obvious in hindsight, but it always takes us so much longer than I wish to figure them out.
So, on the second question: are there moments in your life that, when you look back, you're especially proud of, or especially happy about, that filled you with happiness and fulfillment?
Well, two answers. One is my daughter, Nova.
Yes, of course.
No matter how much time I spend with her, I just can't spend enough time with her.
Congratulations, by the way.
Thank you. And then second is helping other people. I think, to me, the meaning of life is helping others achieve whatever their dreams are, and then also trying to move the world forward by making humanity more powerful as a whole. The times that I felt most happy and most proud were when I felt someone else allowed me the good fortune of helping them a little bit on the path to their dreams.
I think there's no better way to end it than talking about happiness and the meaning of life. So it's a huge honor. Me and millions of people thank you for all the work you've done. Thank you for talking to me.
Thank you so much. Thanks.
Thanks for listening to this conversation with Andrew Ng, and thank you to our presenting sponsor, Cash App. Download it and use code LexPodcast. You'll get $10, and $10 will go to FIRST, an organization that inspires and educates young minds to become science and technology innovators of tomorrow. If you enjoy this podcast, subscribe on YouTube, give it five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman.
And now, let me leave you with some words of wisdom from Andrew Ng. Ask yourself: if what you're working on succeeds beyond your wildest dreams, will you have significantly helped other people? If not, then keep searching for something else to work on. Otherwise, you're not living up to your full potential.
Thank you.