Andrew Ng: Deep Learning, Education, and Real-World AI | Lex Fridman Podcast #73
Chapters
0:00 Introduction
2:23 First few steps in AI
5:05 Early days of online education
16:07 Teaching on a whiteboard
17:46 Pieter Abbeel and early research at Stanford
23:17 Early days of deep learning
32:55 Quick preview: deeplearning.ai, landing.ai, and AI fund
33:23 deeplearning.ai: how to get started in deep learning
45:55 Unsupervised learning
49:40 deeplearning.ai (continued)
56:12 Career in deep learning
58:56 Should you get a PhD?
1:03:28 AI fund - building startups
1:11:14 Landing.ai - growing AI efforts in established companies
1:20:44 Artificial general intelligence
The following is a conversation with Andrew Ng, 00:00:03.640 |
one of the most impactful educators, researchers, 00:00:06.480 |
innovators, and leaders in artificial intelligence 00:00:15.320 |
launched Deep Learning AI, Landing AI, and the AI Fund, 00:00:28.680 |
and inspire millions of students, including me. 00:00:40.520 |
support it on Patreon, or simply connect with me on Twitter 00:00:48.400 |
As usual, I'll do one or two minutes of ads now, 00:01:13.760 |
Brokerage services are provided by Cash App Investing, 00:01:25.400 |
in the context of the history of money, is fascinating. 00:01:33.820 |
Debits and credits on ledgers started over 30,000 years ago. 00:01:38.540 |
The US dollar was created over 200 years ago, 00:01:42.280 |
and Bitcoin, the first decentralized cryptocurrency, 00:01:48.220 |
So given that history, cryptocurrency's still very much 00:02:12.200 |
that is helping to advance robotics and STEM education 00:02:17.680 |
And now, here's my conversation with Andrew Ng. 00:02:23.200 |
The courses you taught on machine learning at Stanford, 00:02:29.480 |
have educated and inspired millions of people. 00:02:31.880 |
So let me ask you, what people or ideas inspired you 00:02:35.080 |
to get into computer science and machine learning 00:02:38.160 |
When did you first fall in love with the field, 00:02:45.440 |
I started learning to code when I was five or six years old. 00:02:50.160 |
At that time, I was learning the BASIC programming language, 00:02:53.720 |
and I would take these books and they'll tell you, 00:03:09.920 |
So I thought it was fascinating as a young kid 00:03:15.000 |
that was really just copying code from a book 00:03:17.020 |
into my computer to then play these cool little video games. 00:03:21.040 |
Another moment for me was when I was a teenager 00:03:27.840 |
was reading about expert systems and about neural networks. 00:03:41.680 |
while I was in high school, this was in Singapore, 00:03:57.820 |
If only we could write software, build a robot, 00:04:07.660 |
Even the way I think about machine learning today, 00:04:10.060 |
we're very good at writing learning algorithms 00:04:16.960 |
Massive Open Online Courses, that later led to Coursera, 00:04:20.120 |
I was trying to automate what could be automatable 00:04:26.480 |
try to automate parts of that to make it more, 00:04:30.360 |
sort of to have more impact from a single teacher, 00:04:41.320 |
And I found myself filming the exact same video every year, 00:04:53.080 |
building a deeper relationship with students. 00:04:55.340 |
So that process of thinking through how to do that, 00:04:58.020 |
that led to the first MOOCs that we launched. 00:05:00.800 |
- And then you have more time to write new jokes. 00:05:05.420 |
Are there favorite memories from your early days 00:05:07.540 |
at Stanford teaching thousands of people in person 00:05:26.380 |
A lot of times, launching the first MOOCs at Stanford, 00:05:36.520 |
and we had not yet actually filmed the videos. 00:05:40.620 |
100,000 people waiting for us to produce the content. 00:05:49.260 |
and then I would think, okay, do you want to go home now? 00:05:51.580 |
Or do you want to go to the office to film videos? 00:05:54.740 |
And the thought of being able to help 100,000 people 00:06:25.580 |
to help so many people learn about machine learning. 00:06:29.480 |
The fact that you're probably somewhat alone, 00:06:36.180 |
and kind of going home alone at 1 or 2 AM at night, 00:06:48.980 |
I mean, is there a feeling of just satisfaction 00:06:55.180 |
And I wasn't thinking about what I was feeling. 00:07:16.660 |
to make these concepts as clear as possible for learners? 00:07:21.580 |
I've seen sometimes for instructors it's tempting to, 00:07:27.340 |
"someone will cite my papers a couple more times." 00:07:46.940 |
- And the kind of learner you imagined in your mind 00:07:56.340 |
interested in machine learning and AI as possible. 00:08:26.820 |
So we've tried to consistently make decisions 00:09:23.940 |
that maybe it feels like it came out of nowhere, 00:09:33.220 |
My first foray into this type of online education 00:09:59.060 |
to learn what ideas work and what doesn't. 00:10:03.780 |
I was really excited about and really proud of 00:10:08.460 |
could be logged into the website at the same time. 00:10:13.580 |
if you're logged in and then I want to log in, 00:10:15.940 |
you need to log out if it's the same browser, 00:10:20.340 |
say you and me were watching a video together 00:10:24.260 |
What if a website could have you type your name 00:10:27.220 |
and password, have me type my name and password? 00:10:44.020 |
Sacred Heart Cathedral Prep, the teacher was great. 00:10:54.060 |
So you can play back, pause at your own speed 00:10:57.780 |
So that was one example of a tiny lesson learned 00:11:09.380 |
there's something that looks amazing on paper 00:11:20.460 |
through a lot of different features and a lot of ideas 00:11:27.700 |
that showed the world that MOOCs can educate millions. 00:11:32.220 |
- And I think with the whole machine learning movement 00:11:34.860 |
as well, I think it didn't come out of nowhere. 00:11:38.340 |
Instead, what happened was as more people learn 00:11:41.460 |
about machine learning, they will tell their friends 00:11:43.460 |
and their friends will see how it's applicable 00:11:56.980 |
I could easily see it being north of 50%, right? 00:11:59.740 |
Because so many AI developers broadly construed, 00:12:04.660 |
not just people doing the machine learning modeling, 00:12:06.540 |
but the people building infrastructure, data pipelines, 00:12:15.300 |
I feel like today, almost every software engineer 00:12:22.540 |
maybe a microcontroller developer doesn't need to deal with the cloud. 00:12:25.500 |
But I feel like the vast majority of software engineers today 00:12:28.940 |
are sort of having an appreciation of the cloud. 00:12:31.980 |
I think in the future, maybe we'll approach nearly 100% 00:12:35.060 |
of all developers being, you know, in some way, 00:12:38.020 |
an AI developer, at least having an appreciation 00:12:41.980 |
- And my hope is that there's this kind of effect 00:12:45.060 |
that there's people who are not really interested 00:12:47.660 |
in being a programmer or being into software engineering, 00:12:54.500 |
even mechanical engineers, all these disciplines 00:12:57.180 |
that are now more and more sitting on large datasets. 00:13:01.580 |
And here, they didn't think they're interested 00:13:05.700 |
and they realized there's this set of machine learning tools 00:13:09.420 |
So they actually become, they learn to program 00:13:13.580 |
So like the, not just, 'cause you've mentioned 00:13:21.900 |
the kinds of people who are becoming developers 00:13:33.860 |
And maybe you thought, maybe not everyone needs 00:13:37.620 |
You know, you just go listen to a few monks, right? 00:13:44.220 |
Or maybe we just need a handful of authors 00:13:50.340 |
But what we found was that by giving as many people, 00:13:53.180 |
you know, in some countries, almost everyone, 00:14:01.380 |
such as if I send you an email or you send me an email. 00:14:04.980 |
I think in computing, we're still in that phase 00:14:20.460 |
similar to how most people in developed economies 00:14:24.420 |
I would love to see the owners of a mom and pop store 00:14:29.100 |
to customize the TV display for their special this week. 00:14:32.460 |
And I think it'll enhance human to computer communications, 00:14:36.340 |
which is becoming more and more important in today's world. 00:14:38.780 |
- So you think it's possible that machine learning 00:14:45.900 |
yeah, like you said, the owners of a mom and pop shop, 00:14:52.180 |
would have some degree of programming capability? 00:14:58.580 |
There's one other interesting thing, you know, 00:15:02.860 |
if I talk to a lot of people in their daily professions, 00:15:11.300 |
But what I found with the rise of machine learning 00:15:13.300 |
and data science is that I think the number of people 00:15:15.980 |
with a concrete use for data science in their daily lives, 00:15:19.460 |
in their jobs, may be even larger than the number of people 00:15:22.860 |
with a concrete use for software engineering. 00:15:25.460 |
For example, actually, if you run a small mom and pop store, 00:15:28.180 |
I think if you can analyze the data about your sales, 00:15:30.900 |
your customers, I think there's actually real value there, 00:15:34.220 |
maybe even more than traditional software engineering. 00:15:39.380 |
in various professions, be it recruiters or accountants 00:15:42.940 |
or, you know, people that work in the factories, 00:15:47.420 |
I feel if they were data scientists at some level, 00:15:51.340 |
they could immediately use that in their work. 00:15:54.540 |
So I think that data science and machine learning 00:15:56.900 |
may be an even easier entree into the developer world 00:16:00.460 |
for a lot of people than the software engineering. 00:16:04.420 |
And I agree with that, but that's beautifully put. 00:16:07.860 |
We live in a world where most courses and talks have slides, 00:16:22.100 |
So let me ask, why do you like using a marker and whiteboard 00:16:27.660 |
- I think it depends on the concepts you want to explain. 00:16:32.380 |
For mathematical concepts, it's nice to build up 00:16:37.060 |
And the whiteboard marker or the pen and stylus 00:16:43.980 |
build up a complex concept one piece at a time 00:16:48.580 |
And sometimes that enhances understandability. 00:17:00.420 |
and sometimes I use a whiteboard or a stylus. 00:17:03.220 |
- The slowness of a whiteboard is also its upside, 00:17:06.340 |
'cause it forces you to reduce everything to the basics. 00:17:11.340 |
So some of your talks involve the whiteboard. 00:17:14.900 |
I mean, there's really not, you go very slowly, 00:17:17.860 |
and you really focus on the most simple principles. 00:17:21.620 |
that enforces a kind of a minimalism of ideas 00:17:26.540 |
that I think is, surprisingly at least for me, 00:17:46.380 |
Pieter Abbeel, who's now one of the top roboticists 00:17:49.500 |
and reinforcement learning experts in the world, 00:17:53.140 |
So I bring him up just because I kind of imagine 00:17:56.940 |
this must have been an interesting time in your life. 00:18:02.340 |
Do you have any favorite memories of working with Pieter, 00:18:04.980 |
since he was your first student in those uncertain times, 00:18:08.380 |
especially before deep learning really sort of blew up? 00:18:17.820 |
- Yeah, I was really fortunate to have had Pieter Abbeel 00:18:22.740 |
And I think even my long-term professional success 00:18:29.980 |
So I was really grateful to him for working with me. 00:18:41.100 |
Pieter's PhD thesis was using reinforcement learning 00:18:53.460 |
You can watch videos of us using reinforcement learning 00:19:00.060 |
- It's one of the most incredible robotics videos ever. 00:19:05.140 |
That's from like 2008 or seven or six, like that range. 00:19:10.140 |
- Something like that, yeah, so it's over 10 years old. 00:19:13.020 |
- That was really inspiring to a lot of people, yeah. 00:19:15.420 |
- What not many people see is how hard it was. 00:19:18.900 |
So Pieter and Adam Coates and Morgan Quigley and I 00:19:22.780 |
were working on various versions of the helicopter, 00:19:27.460 |
For example, turns out one of the hardest problems we had 00:19:29.860 |
was when the helicopter is flying around upside down, 00:19:32.380 |
doing stunts, how do you figure out the position? 00:19:42.300 |
GPS unit is facing down, so you can't see the satellites. 00:19:44.860 |
So we experimented trying to have two GPS units, 00:19:51.900 |
'cause the downward facing one couldn't synchronize 00:19:59.500 |
complicated configuration of specialized hardware 00:20:06.820 |
Spent about a year working on that, didn't work. 00:20:15.980 |
looking at some of the latest things we had tried 00:20:18.740 |
that didn't work and saying, "Darn it, what now?" 00:20:23.260 |
Because we tried so many things and it just didn't work. 00:20:26.940 |
In the end, what we did, and Adam Coates was crucial to this, 00:20:34.260 |
and use cameras on the ground to localize the helicopter. 00:20:39.820 |
so that we could then focus on the reinforcement learning 00:20:42.380 |
and inverse reinforcement learning techniques 00:20:44.420 |
so it didn't actually make the helicopter fly. 00:20:46.700 |
And I'm reminded, when I was doing this work at Stanford, 00:20:51.780 |
around that time, there was a lot of reinforcement learning 00:20:55.220 |
theoretical papers, but not a lot of practical applications. 00:20:59.540 |
So the autonomous helicopter work for flying helicopters 00:21:11.580 |
I feel like we might've almost come full circle with today. 00:21:14.740 |
There's so much buzz, so much hype, so much excitement 00:21:19.020 |
But again, we're hunting for more applications 00:21:24.740 |
- What was the drive, sort of in the face of the fact 00:21:30.140 |
what motivated you in the uncertainty and the challenges 00:21:33.020 |
to get the helicopter, sort of to do the applied work, 00:21:41.860 |
sort of the setbacks that you mentioned for localization. 00:21:55.460 |
but when I work on theory myself, and this is personal taste, 00:21:58.700 |
I'm not saying anyone else should do what I do, 00:22:00.820 |
but when I work on theory, I personally enjoy it more 00:22:04.100 |
if I feel that the work I do will influence people, 00:22:17.780 |
and it kind of just said, "Hey, why do you do what you do?" 00:22:32.900 |
to discover truth and beauty in the universe." 00:22:47.060 |
but I am more motivated when I can see a line 00:22:50.780 |
to how the work that my teams and I are doing helps people. 00:23:02.540 |
but when I delve into either theory or practice, 00:23:23.260 |
What did you see in this field that gave you confidence? 00:23:33.820 |
- Yeah, I can tell you the thing we got wrong 00:23:36.980 |
The thing we really got wrong was the importance of, 00:23:39.740 |
the early importance of unsupervised learning. 00:23:46.740 |
we put a lot of effort into unsupervised learning 00:23:58.980 |
And Geoff Hinton and I were sitting in the cafeteria 00:24:06.180 |
he started sketching this argument on a napkin. 00:24:14.180 |
so there's 10 to the 14 synaptic connections. 00:24:16.940 |
You will live for about 10 to the nine seconds. 00:24:37.980 |
in up to 10 to the nine seconds of your life. 00:24:43.780 |
which is a lot of problems, it's very simplified, 00:24:52.580 |
I am not pointing out 10 to the five bits per second 00:25:08.900 |
There's just no way that most of what we know 00:25:13.460 |
But where you get so many bits of information 00:25:21.500 |
and there are a lot of known flaws with the argument, 00:25:24.780 |
really convinced me that there's a lot of power 00:25:28.220 |
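Writing out the napkin arithmetic (filling in only the division the conversation already implies):

```latex
\frac{10^{14}\ \text{synaptic connections}}{10^{9}\ \text{seconds of life}}
\approx 10^{5}\ \text{bits per second}
```

No teacher supplies labeled examples at anywhere near $10^5$ bits per second, so the argument concludes that most of those connection strengths must be set from unlabeled experience.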
So that was the part that we actually maybe got wrong. 00:25:32.540 |
I still think unsupervised learning is really important, 00:25:34.860 |
but in the early days, you know, 10, 15 years ago, 00:25:38.900 |
a lot of us thought that was the path forward. 00:25:45.660 |
- For the time, that was the part we got wrong. 00:25:48.540 |
The part we got right was the importance of scale. 00:26:00.020 |
and Adam had run these experiments at Stanford 00:26:02.380 |
showing that the bigger we train a learning algorithm, 00:26:10.140 |
there was a graph that Adam generated, you know, 00:26:12.900 |
with the X-axis and Y-axis, lines going up and to the right. 00:26:17.540 |
the better its performance, accuracy is the vertical axis. 00:26:20.340 |
So it was really based on that chart that Adam generated 00:26:33.420 |
that Adam generated that gave me the conviction 00:26:37.100 |
to go with Sebastian Thrun to pitch, you know, 00:26:52.300 |
so we should chase a larger and larger scale. 00:26:55.460 |
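That "up and to the right" learning curve can be reproduced even with toy data. The sketch below is a hypothetical illustration on synthetic Gaussian blobs, not Adam Coates's actual experiments: accuracy climbs with training-set size, then plateaus near the Bayes error rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_per_class):
    """Two overlapping Gaussian blobs in 10 dimensions, labeled 0/1."""
    x0 = rng.normal(loc=-0.3, size=(n_per_class, 10))
    x1 = rng.normal(loc=+0.3, size=(n_per_class, 10))
    return np.vstack([x0, x1]), np.array([0] * n_per_class + [1] * n_per_class)

def accuracy_at_scale(n_train, n_test=2000):
    """Fit a nearest-centroid classifier on n_train points per class."""
    xtr, ytr = make_data(n_train)
    xte, yte = make_data(n_test)
    c0, c1 = xtr[ytr == 0].mean(axis=0), xtr[ytr == 1].mean(axis=0)
    pred = np.linalg.norm(xte - c1, axis=1) < np.linalg.norm(xte - c0, axis=1)
    return (pred.astype(int) == yte).mean()

# Training-set size on the x-axis, accuracy on the y-axis:
# the line goes up and to the right, then flattens out.
for n in [5, 50, 500, 5000]:
    print(n, round(accuracy_at_scale(n), 3))
```

The plateau is why the later discussion distinguishes problems near the Bayes error rate from problems with plenty of headroom.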
And I think people don't realize how groundbreaking an idea it was, 00:27:02.340 |
that bigger datasets will result in better performance. 00:27:10.060 |
you know, senior people in the machine learning community, 00:27:29.900 |
some of them were trying to talk me out of it. 00:27:32.500 |
But I find that if you want to make a breakthrough, 00:27:43.020 |
- Let me ask you just in a small tangent on that topic. 00:27:46.100 |
I find myself arguing with people saying that greater scale, 00:27:51.100 |
especially in the context of active learning, 00:27:59.220 |
is going to lead to even further breakthroughs 00:28:09.140 |
So you want to increase the efficiency of learning. 00:28:14.020 |
And I personally believe that just bigger datasets 00:28:16.820 |
will still, with the same learning methods we have now, 00:28:21.820 |
What's your intuition at this time on this dual side? 00:28:27.980 |
Is do we need to come up with better architectures 00:28:30.740 |
for learning, or can we just get bigger, better datasets 00:28:44.740 |
your Bayes error rate, or approaching or surpassing 00:28:51.300 |
that we will never surpass a Bayes error rate. 00:28:53.660 |
But then I think there are plenty of problems 00:28:58.740 |
human level performance or from Bayes error rate. 00:29:09.460 |
But on the flip side, if we look at the recent breakthroughs 00:29:12.900 |
using transforming networks or language models, 00:29:20.660 |
If we look at what happened with GPT-2 and BERT, 00:29:28.300 |
is the scale of the dataset it was trained on 00:29:32.500 |
because there's some, so it was like Reddit threads 00:29:47.340 |
- I find that today we have maturing processes 00:29:54.820 |
It took a long time to evolve the good processes. 00:29:57.500 |
I remember when my friends and I were emailing 00:30:07.580 |
We're very immature in terms of tools for managing data 00:30:12.100 |
and how to solve the very hard, messy data problems. 00:30:15.380 |
I think there's a lot of innovation there to be had still. 00:30:18.980 |
I love the idea that you were versioning through email. 00:30:28.940 |
it's not at all uncommon for there to be multiple labelers 00:30:36.380 |
And so we would, doing the work in visual inspection, 00:30:40.580 |
we will take, say, a plastic part and show it to one inspector 00:30:44.820 |
and the inspector, sometimes very opinionated, 00:30:59.380 |
And then sometimes you take the same plastic part, 00:31:01.820 |
show it to the same inspector in the afternoon, 00:31:16.900 |
doesn't agree with himself or herself in the span of a day? 00:31:20.420 |
So I think these are the types of very practical, 00:31:23.740 |
very messy data problems that my teams wrestle with. 00:31:28.740 |
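A first diagnostic for that kind of label noise is simply measuring how often two labeling passes agree. This is a minimal sketch with made-up labels, not Landing AI's actual tooling:

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two labeling passes agree."""
    assert len(labels_a) == len(labels_b)
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

def inconsistent_items(labels_a, labels_b):
    """Indices where the label flipped -- candidates for clearer labeling
    instructions or a tie-breaking third opinion."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]

# The same inspector labeling the same ten parts, morning vs. afternoon:
morning   = ["ok", "defect", "ok", "ok", "defect", "ok", "ok", "defect", "ok", "ok"]
afternoon = ["ok", "ok",     "ok", "ok", "defect", "ok", "defect", "defect", "ok", "ok"]

print(agreement_rate(morning, afternoon))      # 0.8
print(inconsistent_items(morning, afternoon))  # [1, 6]
```

On a small dataset, the handful of flagged items can be exactly the 10% whose relabeling has an outsized impact.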
In the case of large consumer internet companies, 00:31:32.980 |
where you have a billion users, you have a lot of data, 00:31:42.500 |
If you have just a small data, very small data sets, 00:31:55.860 |
that actually is 10% of your data set has a big impact. 00:32:01.020 |
This is an example of the types of things that my teams, 00:32:15.340 |
in thinking about the actual labeling process. 00:32:22.580 |
and all those kinds of like pragmatic real world problems. 00:32:27.340 |
- Yeah, I find that actually when I'm teaching at Stanford, 00:32:29.700 |
I increasingly encourage students at Stanford 00:32:32.740 |
to try to find their own project for the end of term project 00:32:43.460 |
and define your own problem and find your own data set 00:32:45.620 |
rather than go to one of the several good websites, 00:32:48.740 |
very good websites with clean scoped data sets 00:32:57.020 |
the AI Fund, Landing AI, and DeepLearning.ai. 00:33:10.540 |
and DeepLearning.ai is for education of everyone else 00:33:14.700 |
or of individuals interested of getting into the field 00:33:19.500 |
So let's perhaps talk about each of these areas. 00:33:22.340 |
First, DeepLearning.ai, how, the basic question, 00:33:27.340 |
how does a person interested in deep learning 00:33:31.580 |
- DeepLearning.ai is working to create courses 00:33:41.340 |
through Stanford remains one of the most popular courses 00:33:45.500 |
- To this day, it's probably one of the courses, 00:33:52.340 |
or how did you fall in love with machine learning 00:33:55.660 |
it always goes back to Andrew Ng at some point. 00:33:59.180 |
The amount of people you've influenced is ridiculous. 00:34:03.260 |
So for that, I'm sure I speak for a lot of people 00:34:20.180 |
something like one third of our programmers are self-taught. 00:34:29.420 |
So, 'cause you teach yourself, I don't teach people. 00:34:34.540 |
So yeah, so how does one get started in deep learning 00:34:38.100 |
and where does deeplearning.ai fit into that? 00:34:44.180 |
I think it was Coursera's top specialization, 00:34:54.540 |
to learn about everything from neural networks 00:35:11.020 |
So you deeply understand it and can implement it 00:35:19.580 |
for somebody to take the deep learning specialization 00:35:22.180 |
in terms of maybe math or programming background? 00:35:35.980 |
If you know calculus is great, you get better intuitions, 00:35:38.740 |
but deliberately try to teach that specialization 00:35:42.740 |
So I think high school math would be sufficient. 00:35:56.020 |
even very, very basic linear algebra in some programming. 00:36:02.260 |
will find the deep learning specialization a bit easier, 00:36:06.500 |
into the deep learning specialization directly, 00:36:17.540 |
which is covered more slowly in the machine learning course. 00:36:20.300 |
- Could you briefly mention some of the key concepts 00:36:25.140 |
that you envision them learning in the first few months, 00:36:29.380 |
- So if you take the deep learning specialization, 00:36:31.940 |
you learn the foundations of what is a neural network, 00:36:44.940 |
One thing I'm very proud of in that specialization 00:36:57.380 |
So how do you tell if the algorithm is overfitting? 00:37:00.300 |
When should you not bother to collect more data? 00:37:06.260 |
there are engineers that will spend six months 00:37:18.380 |
and could have figured out six months earlier 00:37:23.980 |
So just don't spend six months collecting more data, 00:37:30.300 |
So go through a lot of the practical know-how 00:37:35.460 |
when you take the deep learning specialization, 00:37:44.340 |
to train it, to do the inference on a particular dataset, 00:37:52.220 |
to where you spend, like you said, six months learning, 00:38:00.220 |
a small aspect of the data that could already tell you 00:38:05.700 |
- Yes, and also the systematic frameworks of thinking 00:38:09.380 |
for how to go about building practical machine learning. 00:38:12.460 |
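The "should you spend six months collecting more data" judgment above can be caricatured as a tiny decision rule. This is a hypothetical sketch with made-up thresholds, in the spirit of the bias/variance framework, not the course's literal recipe:

```python
def more_data_worth_it(human_err, train_err, dev_err, gap=0.02):
    """Crude bias/variance diagnostic: compare training error to
    human-level error (avoidable bias) and dev error to training
    error (variance) before spending months collecting data."""
    avoidable_bias = train_err - human_err
    variance = dev_err - train_err
    if avoidable_bias > max(variance, gap):
        return "high bias: bigger model or longer training; more data won't fix this"
    if variance > gap:
        return "high variance: more data or regularization should help"
    return "near human level: more data is unlikely to move the needle"

# 8% train / 10% dev error against 1% human error is bias-limited,
# so six months of data collection would likely be wasted:
print(more_data_worth_it(human_err=0.01, train_err=0.08, dev_err=0.10))
```

The point is that a few error numbers, examined up front, can answer the question that might otherwise cost six months.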
Maybe to make an analogy, when we learn to code, 00:38:15.460 |
we have to learn the syntax of some programming language, 00:38:17.900 |
right, be it Python or C++ or Octave or whatever. 00:38:34.700 |
So those frameworks are what makes a programmer efficient, 00:38:41.740 |
I remember when I was an undergrad at Carnegie Mellon, 00:38:56.620 |
Well, they would delete every single line of code 00:38:59.700 |
So really efficient for getting rid of syntax errors, 00:39:09.420 |
is very different than the way you do binary search 00:39:19.020 |
but I find that the people that are really good 00:39:36.420 |
sort of going into the questions of overfitting 00:39:40.740 |
That's the logical space that the debugging is happening in 00:39:46.540 |
- Yeah, often the question is, why doesn't it work yet? 00:39:54.900 |
Change the architecture, more data, more regularization, 00:40:05.860 |
so you don't spend six months heading down the blind alley 00:40:14.060 |
do you think students struggle the most with? 00:40:21.740 |
It hooks them and it inspires them and they really get it. 00:40:30.300 |
I think one of the challenges of deep learning 00:40:45.940 |
I think one of the challenges of learning math 00:41:01.900 |
try to break down the concepts to maximize the odds 00:41:07.020 |
So when you move on to the more advanced thing, 00:41:18.700 |
And then eventually why we build RNNs and LSTMs 00:41:27.700 |
Actually, I'm curious, you do a lot of teaching as well. 00:41:33.180 |
this is the hard concept moment in your teaching? 00:41:53.420 |
I do think there's moments that are like aha moments 00:41:59.500 |
I think for some reason, reinforcement learning, 00:42:05.660 |
is a really great way to really inspire people 00:42:09.680 |
and get what the use of neural networks can do. 00:42:13.620 |
Even though neural networks really are just a part 00:42:18.640 |
but it's a really nice way to paint the entirety 00:42:21.420 |
of the picture of a neural network being able to learn 00:42:24.920 |
from scratch, knowing nothing, and explore the world 00:42:32.080 |
when you use deep RL to teach people about neural networks, 00:42:37.960 |
I find like a lot of the inspired sort of fire 00:42:44.880 |
Do you find reinforcement learning to be a useful part 00:42:55.640 |
And my PhD thesis was on reinforcement learning. 00:43:01.600 |
the most useful techniques for them to use today, 00:43:12.440 |
Maybe it'll be totally different in a couple of years. 00:43:20.600 |
- One of my teams is looking to reinforcement learning 00:43:25.280 |
but if you look at it as a percentage of all of the impact 00:43:30.160 |
is at least today, outside of playing video games 00:43:38.500 |
Actually at NeurIPS, a bunch of us were standing around 00:43:42.900 |
"of an actual deployed reinforcement learning application?" 00:43:45.340 |
And among senior machine learning researchers. 00:43:58.300 |
The sad thing is there hasn't been a big application 00:44:02.020 |
impact for real-world application reinforcement learning. 00:44:04.900 |
I think its biggest impact to me has been in the toy domain, 00:44:13.660 |
It seems to be a fun thing to explore neural networks with. 00:44:19.160 |
and I think that might be the best perspective, 00:44:21.980 |
is if you're trying to educate with a simple example 00:44:24.820 |
in order to illustrate how this can actually be grown 00:44:33.780 |
of supervised learning in the context of a simple data set, 00:44:38.780 |
even like an MNIST data set is the right way, 00:44:43.980 |
I just, the amount of fun I've seen people have 00:44:48.540 |
but not in the applied impact on the real-world setting. 00:44:52.900 |
So it's a trade-off, how much impact you want to have 00:44:58.260 |
And I feel like the world actually needs all sorts. 00:45:05.900 |
but the AI team shouldn't just use deep learning. 00:45:08.500 |
I find that my teams use a portfolio of tools. 00:45:11.780 |
And maybe that's not the exciting thing to say, 00:45:20.100 |
Actually, the other day I was sitting down with my team 00:45:25.860 |
And some days we use a probabilistic graphical model, 00:45:33.100 |
but the amount of chatter about knowledge graphs 00:45:47.940 |
It'd be sad if everyone just learned one narrow thing. 00:45:52.460 |
help you discover the right tool for the job. 00:46:07.220 |
the performance that could be achieved with scale, 00:46:18.180 |
and didn't have to worry about short-term impact 00:46:27.580 |
I still think unsupervised learning is a beautiful idea. 00:46:49.420 |
- So here's the example of self-supervised learning. 00:46:55.820 |
We have infinite amounts of this type of data. 00:46:59.380 |
and rotate it by a random multiple of 90 degrees. 00:47:03.100 |
And then I'm going to train a supervised neural network 00:47:06.340 |
to predict what was the original orientation. 00:47:14.380 |
So you can generate an infinite amount of labeled data 00:47:28.460 |
and training a large neural network on these tasks, 00:47:31.580 |
you can then take the hidden layer representation 00:47:33.820 |
and transfer it to a different task very powerfully. 00:47:41.380 |
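The rotation pretext task just described fits in a few lines. This is a toy data-generation sketch, with random arrays standing in for real images:

```python
import numpy as np

rng = np.random.default_rng(42)

def rotation_pretext_batch(images):
    """Rotate each image by a random multiple of 90 degrees and use
    which multiple it was (0-3) as a free, automatically generated label."""
    rotated, labels = [], []
    for img in images:
        k = int(rng.integers(0, 4))   # number of quarter turns
        rotated.append(np.rot90(img, k))
        labels.append(k)
    return np.stack(rotated), np.array(labels)

# Any pile of unlabeled square images becomes a 4-way classification set;
# a network trained on this task learns transferable visual features.
images = rng.normal(size=(8, 32, 32))
x, y = rotation_pretext_batch(images)
```

Because the labels come from the transformation itself, the supply of supervised examples is limited only by the supply of unlabeled images.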
predict the missing word, which is how we learn. 00:47:47.780 |
And I think there's now this portfolio of techniques 00:47:53.740 |
Another one called Jigsaw would be if you take an image, 00:48:02.100 |
chop it up into nine pieces and have a neural network predict 00:48:05.140 |
which of the nine factorial possible permutations 00:48:13.700 |
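The jigsaw task can be sketched the same way. This is a hypothetical illustration; since 9! is large, it classifies among a fixed subset of 100 permutations, a common simplification rather than the transcript's exact setup:

```python
import numpy as np

rng = np.random.default_rng(7)
# 9! is about 363,000, so fix a small subset of permutations to classify among.
PERMS = [tuple(rng.permutation(9)) for _ in range(100)]

def jigsaw_example(img):
    """Chop a square image into a 3x3 grid of patches, shuffle them by a
    randomly chosen permutation, and use that permutation's index as the
    free classification label."""
    h = img.shape[0] // 3
    patches = [img[r*h:(r+1)*h, c*h:(c+1)*h] for r in range(3) for c in range(3)]
    label = int(rng.integers(0, len(PERMS)))
    shuffled = [patches[i] for i in PERMS[label]]
    return shuffled, label

shuffled, label = jigsaw_example(np.arange(81.0).reshape(9, 9))
```

As with rotation, the network can only predict the permutation by learning what coherent images look like, which is what makes the hidden representations transferable.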
Peter Abbeel's been doing some work on this too, 00:48:20.220 |
oh, actually Aäron van den Oord has great work 00:48:26.220 |
and I think this is a way to generate infinite label data. 00:48:34.180 |
- So long-term you think that's going to unlock 00:48:45.180 |
And I think this one piece, self-supervised learning 00:48:57.180 |
to just having a significant real world impact 00:49:05.140 |
and I think there'll be other concepts around it. 00:49:08.180 |
Other unsupervised learning things that I worked on 00:49:17.620 |
I think all of these are ideas that various of us 00:49:30.860 |
that really started this movement of deep learning. 00:49:33.900 |
- I think there's a lot more work that one could explore 00:49:40.300 |
- So if we could return to maybe talk quickly 00:49:49.540 |
How long does it take to complete the course, 00:49:52.740 |
- The official length of the deep learning specialization 00:50:00.700 |
So if you subscribe to the deep learning specialization, 00:50:03.660 |
there are people that finish that in less than a month 00:50:05.820 |
by working more intensely and studying more intensely. 00:50:10.740 |
Yeah, when we created the deep learning specialization, 00:50:13.460 |
we wanted to make it very accessible and very affordable. 00:50:18.100 |
And with Coursera and deeplearning.ai's education mission, 00:50:21.820 |
one of the things that's really important to me 00:50:23.500 |
is that if there's someone for whom paying anything 00:50:29.460 |
then just apply for financial aid and get it for free. 00:50:32.740 |
- If you were to recommend a daily schedule for people 00:50:41.380 |
or just learning in the world of deep learning, 00:50:46.900 |
How would they go about day-to-day sort of specific advice 00:50:52.700 |
in the world of deep learning, machine learning? 00:50:54.900 |
- I think getting the habit of learning is key, 00:51:01.220 |
So for example, we send out our weekly newsletter, 00:51:09.660 |
you can spend a little bit of time on Wednesday, 00:51:11.660 |
catching up on the latest news through The Batch 00:51:27.780 |
Do I feel like reading or studying today or not? 00:51:34.260 |
So I think if someone can get into that habit, 00:51:37.580 |
it's like, you know, just like we brush our teeth 00:51:42.140 |
If I thought about it, it's a little bit annoying 00:51:46.060 |
but it's a habit that takes no cognitive load, 00:52:08.420 |
In my own life, like I play guitar every day for, 00:52:12.780 |
I force myself to at least for five minutes play guitar. 00:52:28.380 |
at certain aspects of a thing by just doing it every day 00:52:32.140 |
It's kind of a miracle that that's how it works. 00:52:36.300 |
- Yeah, and I think it's often not about the burst 00:52:41.980 |
because you can only do that a limited number of times. 00:52:47.340 |
I think, you know, reading two research papers 00:52:52.060 |
but the power is not reading two research papers. 00:52:54.340 |
It's reading two research papers a week for a year. 00:52:58.980 |
and you actually learn a lot when you read a hundred papers. 00:53:10.300 |
for particularly deep learning that people should, 00:53:22.220 |
when I'm trying to study something really deeply 00:53:28.340 |
that take the deep learning courses during a commute 00:53:32.100 |
or something where it may be more awkward to take notes. 00:53:52.260 |
And that act, we know that that act of taking notes, 00:53:55.500 |
preferably handwritten notes, increases retention. 00:54:03.900 |
and then taking the basic insights down on paper? 00:54:15.220 |
because handwriting is slower, as we were saying just now, 00:54:18.140 |
it causes you to recode the knowledge in your own words more 00:54:23.220 |
and that process of recoding promotes long-term retention. 00:54:30.820 |
or in taking a class and not taking notes is better 00:54:39.660 |
For a lot of people, you can handwrite notes. 00:54:43.060 |
they're more likely to just transcribe verbatim 00:54:49.180 |
And that actually results in less long-term retention. 00:54:52.540 |
- I don't know what the psychological effect there is, 00:55:01.700 |
as just the time it takes to write is slower. 00:55:04.420 |
- Yeah, and because you can't write as many words, 00:55:19.020 |
- Oh, and I've spent, I think, because of Coursera, 00:55:25.340 |
I really love learning how to more efficiently 00:55:30.260 |
- Yeah, one of the things I do both when creating videos 00:55:39.220 |
going to be a more efficient learning experience 00:55:50.140 |
So when we're editing, I often tell my teams, 00:55:55.300 |
And if we can delete a word, let's just delete it 00:55:56.980 |
and not waste the learners' time. 00:56:00.220 |
- Wow, that's so, it's so amazing that you think that way 00:56:02.260 |
'cause there are millions of people that are impacted 00:56:04.300 |
by your teaching and sort of that one minute spent 00:56:08.420 |
Through years of time, which is fascinating to think about. 00:56:18.780 |
We just talked about sort of the beginning, early steps, 00:56:21.460 |
but if you want to make it an entire life's journey 00:56:24.420 |
or at least a journey of a decade or two, how do you do it? 00:56:32.060 |
- And I think in the early parts of a career, 00:56:35.500 |
coursework, like the deep learning specialization, 00:56:38.820 |
is a very efficient way to master this material. 00:56:49.340 |
Lawrence Moroney teaches our TensorFlow specialization 00:56:53.420 |
spend effort to try to make it time efficient 00:56:58.140 |
So coursework is actually a very efficient way 00:57:02.780 |
in the beginning parts of breaking into a new field. 00:57:13.620 |
in your first couple of years as a PhD student, 00:57:15.460 |
spend time taking courses, 'cause it lays a foundation. 00:57:24.980 |
there are materials that don't exist in courses 00:57:24.980 |
that we're not yet that good at teaching in a course. 00:57:31.460 |
And I think after exhausting the efficient coursework, 00:57:53.020 |
And again, I think it's important to start small 00:58:08.780 |
be it MNIST, or upgrade to Fashion-MNIST, or whatever. 00:58:18.940 |
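The "start small" advice above can be made concrete with a tiny exercise. Below is a minimal sketch — not from the conversation — that trains a softmax classifier from scratch in plain NumPy; synthetic blob data stands in for MNIST so the snippet is self-contained, and you'd swap in a real MNIST or Fashion-MNIST loader when ready.

```python
# "Start small": train a softmax classifier with plain NumPy.
# Synthetic data stands in for MNIST (illustrative sketch only --
# load real MNIST/Fashion-MNIST with your favorite library later).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "digits": 3 classes, 64-dim features around class means.
n_per_class, dim, n_classes = 100, 64, 3
means = rng.normal(0, 3, size=(n_classes, dim))
X = np.vstack([means[c] + rng.normal(0, 1, (n_per_class, dim))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), n_per_class)

W = np.zeros((dim, n_classes))
b = np.zeros(n_classes)

def accuracy():
    return np.mean((X @ W + b).argmax(axis=1) == y)

lr = 0.1
for _ in range(200):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1                  # softmax cross-entropy gradient
    W -= lr * X.T @ p / len(y)
    b -= lr * p.mean(axis=0)

print(f"train accuracy: {accuracy():.2f}")
```

The point of the exercise isn't the model; it's building the train/evaluate loop yourself once before reaching for a framework.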
I find this to be true at the individual level 00:58:23.780 |
For a company to become good at machine learning, 00:58:35.340 |
But this is true both for individuals and for companies. 00:58:49.300 |
That's one of the fascinating things in machine learning. 00:58:51.540 |
You can have so much impact without ever getting a PhD. 00:58:59.780 |
- I think that there are multiple good options 00:59:05.380 |
I think that if someone's admitted to a top PhD program, 00:59:15.700 |
Or if someone gets a job at a top organization, 00:59:24.060 |
There are some things you still need a PhD to do. 00:59:25.980 |
If someone's aspiration is to be a professor, 00:59:31.140 |
But if the goal is to, you know, start a company, 00:59:41.540 |
where are the places where you can get a job? 00:59:43.100 |
Where are the places you can get in a PhD program? 00:59:45.060 |
And kind of weigh the pros and cons of those. 00:59:47.660 |
- So just to linger on that for a little bit longer, 00:59:50.060 |
what final dreams and goals do you think people should have? 00:59:57.420 |
So you can work in industry, so for a large company, 01:00:06.140 |
that already have huge teams of machine learning engineers. 01:00:12.340 |
that kind of like Google Research, Google Brain. 01:00:25.180 |
Is there anything that stands out between those options 01:00:32.780 |
- I think the thing that affects your experience more 01:00:34.820 |
is less are you in this company versus that company 01:00:40.140 |
I think the thing that affects your experience most 01:00:41.660 |
is who are the people you're interacting with 01:00:45.500 |
So even if you look at some of the large companies, 01:00:49.540 |
the experience of individuals in different teams 01:00:53.060 |
And what matters most is not the logo above the door 01:00:56.260 |
when you walk into the giant building every day. 01:00:58.460 |
What matters the most is who are the 10 people, 01:01:00.620 |
who are the 30 people you interact with every day. 01:01:12.580 |
We tend to become more like the people around us. 01:01:20.660 |
if you get a job at a great company or a great university, 01:01:31.300 |
And then that's actually a really bad experience. 01:01:38.140 |
For small companies, you can kind of figure out 01:01:43.860 |
if a company refuses to tell you who you work with, 01:01:58.340 |
with great peers and great people to work with. 01:02:05.140 |
We don't consider too rigorously or carefully. 01:02:16.780 |
So it's not about whether you learn this thing or that thing 01:02:21.780 |
or like you said, the logo that hangs up top, 01:02:29.460 |
of finding, just like finding the right friends 01:02:34.220 |
and somebody to get married with and that kind of thing. 01:02:37.540 |
It's a very hard search, it's a people search problem. 01:02:46.980 |
it's good to insist on just asking who are the people? 01:02:51.420 |
And if you refuse to tell me, I'm gonna think, 01:02:53.900 |
well, maybe that's 'cause you don't have a good answer. 01:03:06.380 |
That's a really important signal to consider. 01:03:14.620 |
I think I gave like a hour long talk on career advice, 01:03:18.300 |
including on the job search process and then some of these. 01:03:27.180 |
So the AI fund helps AI startups get off the ground, 01:03:37.860 |
on how does one build a successful AI startup? 01:03:42.380 |
- In Silicon Valley, a lot of startup failures 01:03:44.980 |
come from building a product that no one wanted. 01:03:48.500 |
So, you know, cool technology, but who's gonna use it? 01:03:59.500 |
Ultimately, we don't get to vote if we succeed or fail. 01:04:04.140 |
It's only the customer, they're the only one 01:04:07.020 |
that gets a thumbs up or thumbs down vote in the long term. 01:04:13.100 |
but in the long term, that's what really matters. 01:04:25.100 |
I think startups that are very customer focused, 01:04:27.460 |
customer obsessed, deeply understand the customer 01:04:44.500 |
addictive digital products just to sell a lot of ads. 01:04:47.900 |
There are things that could be lucrative that I won't do, 01:04:50.740 |
but if we can find ways to serve people in meaningful ways, 01:04:59.060 |
either in the academic setting or in a corporate setting 01:05:03.100 |
- So can you give me the idea of why you started the AI fund? 01:05:08.740 |
- I remember when I was leading the AI group at Baidu, 01:05:19.140 |
And that was running, just ran, just performed by itself. 01:05:31.140 |
So the self-driving car team came out of my group, 01:05:37.140 |
similar to what is Amazon Echo or Alexa in the US, 01:05:41.020 |
but we actually announced it before Amazon did. 01:05:48.780 |
and I found that to be actually the most fun part of my job. 01:05:58.300 |
to systematically create new startups from scratch. 01:06:12.340 |
very important mechanism to get these projects done 01:06:16.580 |
So I've been fortunate to have built a few teams 01:06:27.980 |
So a startup studio is a relatively new concept. 01:06:31.540 |
There are maybe dozens of startup studios right now, 01:06:45.460 |
So I think even a lot of my venture capital friends 01:06:49.140 |
seem to be more and more building companies 01:06:56.700 |
by which we could systematically build successful teams, 01:06:59.700 |
successful businesses in areas that we find meaningful. 01:07:16.540 |
So we often bring in founders and work with them, 01:07:34.580 |
to automate the process of starting from scratch 01:07:40.540 |
- Yeah, I think we've been constantly improving 01:07:44.460 |
and iterating on our processes, how we do that. 01:07:48.460 |
So things like, how many customer calls do we need to make 01:07:53.500 |
How do we make sure this technology can be built? 01:07:55.260 |
Quite a lot of our businesses need cutting edge 01:08:04.980 |
it turns out taking it to production is really hard. 01:08:07.020 |
There are a lot of issues with making these things work 01:08:09.500 |
in real life that are not widely addressed in academia. 01:08:14.100 |
So how do we validate that this is actually doable? 01:08:24.460 |
we've been getting much better at giving the entrepreneurs 01:08:29.180 |
a high success rate, but I think we're still, 01:08:35.500 |
- But do you think there is some aspects of that process 01:08:39.180 |
that are transferable from one startup to another, 01:08:45.020 |
You know, starting a company to most entrepreneurs 01:08:56.300 |
Like, when do you need to, how do you do B2B sales? 01:09:05.540 |
other than buying ads, which is really expensive? 01:09:15.300 |
of whether a machine learning product works or not. 01:09:18.460 |
And so there are so many hundreds of decisions 01:09:22.700 |
and making a mistake in a couple of key decisions 01:09:25.780 |
can have a huge impact on the fate of the company. 01:09:30.260 |
So I think a startup studio provides a support structure 01:09:36.300 |
And also when faced with these key decisions, 01:09:40.020 |
like trying to hire your first VP of engineering, 01:09:48.820 |
By having an ecosystem around the entrepreneurs, 01:09:57.460 |
and hopefully significantly make them more enjoyable 01:10:10.980 |
what they may not even realize is a key decision point. 01:10:15.180 |
That's the first and probably the most important part. 01:10:26.460 |
that we build companies that move the world forward. 01:10:37.460 |
would have resulted in people watching a lot more videos 01:10:43.340 |
And I looked at it, the business case was fine, 01:10:50.020 |
I don't actually just want to have a lot more people 01:10:59.980 |
that I didn't think it would actually help people. 01:11:01.980 |
So whether building companies or working with enterprises, 01:11:11.020 |
what's the difference we want to make in the world. 01:11:20.220 |
How does a large company integrate machine learning 01:11:27.700 |
and I think it would transform every industry. 01:11:30.540 |
Our community has already transformed to a large extent 01:11:40.100 |
already have reasonable machine learning capabilities 01:11:46.380 |
But when I look outside the software internet sector, 01:11:59.780 |
is for us to also transform all of those other industries. 01:12:04.620 |
estimating $13 trillion of global economic growth. 01:12:18.420 |
But the interesting thing to me was a lot of that impact 01:12:20.780 |
would be outside the software internet sector. 01:12:23.740 |
So we need more teams to work with these companies 01:12:46.300 |
Some of the ones I'm spending a lot of time on 01:12:49.940 |
are manufacturing, agriculture, looking into healthcare. 01:13:02.940 |
to check if this plastic part or the smartphone 01:13:05.820 |
or this thing has a scratch or a dent or something in it. 01:13:12.500 |
use an algorithm, deep learning and other things 01:13:23.660 |
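As a deliberately tiny stand-in for the visual inspection task described above, the sketch below flags a part whose image differs too much from a "golden" reference. This pixel-difference heuristic is a hypothetical classical baseline for illustration only — Landing AI's actual systems use learned models, as the conversation makes clear.

```python
# Toy automated visual inspection: compare each part image against a
# "golden" reference and flag large deviations as defects.
# (Hypothetical baseline; real systems use trained deep learning models.)
import numpy as np

rng = np.random.default_rng(2)

golden = np.full((32, 32), 0.5)                      # reference image of a good part
good = golden + rng.normal(0, 0.01, golden.shape)    # normal sensor noise
scratched = good.copy()
scratched[10:12, 4:28] = 1.0                         # bright scratch across the part

def is_defective(image, reference, threshold=0.02):
    """Flag a defect when the mean absolute pixel difference is large."""
    return np.mean(np.abs(image - reference)) > threshold

print("good part defective? ", is_defective(good, golden))
print("scratched defective? ", is_defective(scratched, golden))
```

A baseline like this also illustrates the fragility Ng goes on to describe: change the lighting or the camera position and the fixed threshold breaks, which is exactly where robustness work comes in.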
It turns out the practical problems we run into 01:13:25.820 |
are very different than the ones you might read about 01:13:32.500 |
You know, the factories keep on changing the environment, 01:13:50.860 |
And so increasing our algorithmic robustness, 01:13:57.100 |
I find that we run into a lot of practical problems 01:13:59.300 |
that are not as widely discussed in academia, 01:14:02.660 |
and it's really fun kind of being on the cutting edge, 01:14:13.300 |
but what is the first step that a company should take? 01:14:28.340 |
Like, what's the early journey that you recommend 01:14:41.620 |
about the long-term journey that companies should take, 01:14:44.780 |
but the first step is actually to start small. 01:14:49.500 |
by starting too big than by starting too small. 01:14:54.700 |
You know, most people don't realize how hard it was 01:14:57.620 |
and how controversial it was in the early days. 01:15:00.620 |
So when I started Google Brain, it was controversial. 01:15:14.060 |
which is not the most lucrative project in Google, 01:15:25.980 |
build a more accurate speech recognition system. 01:15:30.820 |
to start to have more faith in deep learning. 01:15:33.060 |
My second internal customer was the Google Maps team, 01:15:36.500 |
where we used computer vision to read house numbers 01:15:41.140 |
to more accurately locate houses within Google Maps, 01:15:48.380 |
that I then started a more serious conversation 01:15:54.220 |
that you showed that it works in these cases, 01:15:56.900 |
and then it just propagates through the entire company, 01:15:59.300 |
that this thing has a lot of value and use for us. 01:16:07.420 |
but also helps the teams learn what these technologies do. 01:16:20.580 |
about how do you have multiple users share a set of GPUs, 01:16:34.100 |
how to scale it up to much larger deployments. 01:16:37.540 |
- Are there concrete challenges that companies face 01:16:47.340 |
There's a huge gulf between something that works 01:16:54.620 |
in a factory or agriculture plant or whatever. 01:16:58.420 |
So I see a lot of people get something to work 01:17:05.860 |
but a lot of teams underestimate the rest of the steps needed. 01:17:09.580 |
So for example, I've heard this exact same conversation 01:17:12.500 |
between a lot of machine learning people and business people. 01:17:16.860 |
"Look, my algorithm does well on the test set. 01:17:23.500 |
And the business person says, "Thank you very much, 01:17:33.660 |
And I think there is a gulf between what it takes 01:17:40.660 |
versus what it takes to work well in a deployment setting. 01:17:44.220 |
Some common problems, robustness and generalization. 01:17:50.620 |
maybe they chop down a tree outside the factory 01:17:57.820 |
And in machine learning, and especially in academia, 01:18:01.300 |
we don't know how to deal with test set distributions 01:18:08.660 |
there's stuff like domain adaptation, transfer learning. 01:18:18.180 |
Because your test set distribution is going to change. 01:18:21.700 |
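The shift Ng describes — production inputs drifting away from the training distribution — can be monitored with even a very simple check. The sketch below is a hedged toy illustration (not any specific Landing AI or academic method; real systems use richer drift tests): it compares the feature means of an incoming batch against training-set statistics and raises a flag when they diverge.

```python
# Toy distribution-shift monitor: flag production batches whose feature
# means drift away from the training distribution.
# (Illustrative sketch; real deployments use more rigorous drift tests.)
import numpy as np

rng = np.random.default_rng(1)

train = rng.normal(loc=0.0, scale=1.0, size=(5000, 8))      # training features
prod_ok = rng.normal(loc=0.0, scale=1.0, size=(500, 8))     # same distribution
prod_drift = rng.normal(loc=1.5, scale=1.0, size=(500, 8))  # e.g. lighting changed

def drift_score(reference, batch):
    """Mean absolute z-score of the batch means under the reference stats."""
    mu, sigma = reference.mean(axis=0), reference.std(axis=0)
    z = (batch.mean(axis=0) - mu) / (sigma / np.sqrt(len(batch)))
    return np.abs(z).mean()

THRESHOLD = 5.0  # alert threshold; in practice tuned on held-out data
print("ok batch flagged?     ", drift_score(train, prod_ok) > THRESHOLD)
print("drifted batch flagged?", drift_score(train, prod_drift) > THRESHOLD)
```

Catching drift early is what turns "the tree outside the factory got chopped down" from a silent accuracy collapse into an actionable alert.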
And I think also, if you look at the number of lines of code 01:18:25.780 |
in the software system, the machine learning model 01:18:32.740 |
to the entire software system you need to build. 01:18:38.940 |
- So good software engineering work is fundamental here 01:18:42.700 |
to building a successful small machine learning system. 01:18:46.380 |
- Yes, and the software system needs to interface 01:18:50.620 |
So machine learning is automation on steroids. 01:19:00.820 |
If we automate that one task, it can be really valuable, 01:19:03.980 |
but you may need to redesign a lot of other tasks 01:19:07.380 |
For example, say the machine learning algorithm 01:19:29.820 |
So I think what Landing AI has become good at, 01:19:39.100 |
What we've become good at is working with our partners 01:19:51.580 |
manage the change process and figure out how to deploy this 01:19:56.860 |
The processes that the large software tech companies use 01:19:59.980 |
for deploying don't work for a lot of other scenarios. 01:20:03.020 |
For example, when I was leading large speech teams, 01:20:07.060 |
if the speech recognition system goes down, what happens? 01:20:09.820 |
Well, alarms go off and then someone like me would say, 01:20:16.740 |
But if you have a system go down in the factory, 01:20:21.460 |
there may be no one sitting around you can page and have them fix it. 01:20:23.940 |
So how do you deal with the maintenance or the DevOps 01:20:30.340 |
So these are concepts that I think Landing AI 01:20:30.340 |
and a few other teams are on the cutting edge of, 01:20:36.580 |
but we don't even have systematic terminology yet 01:20:41.060 |
because I think we're inventing it on the fly. 01:20:43.380 |
- So you mentioned some people are interested 01:20:49.700 |
and you're interested in having a big positive impact 01:21:00.980 |
'cause you're probably interested a little bit in both. 01:21:06.140 |
So much of the work, your work and our discussion today 01:21:40.980 |
but whether it takes a hundred years or 500 or 5,000, 01:21:49.560 |
so some folks have worries about the different trajectories 01:22:02.380 |
- I do worry about the long-term fate of humanity. 01:22:09.400 |
I do worry about overpopulation on the planet Mars, 01:22:21.680 |
and someone will look back at this video and say, 01:22:32.040 |
but I just don't know how to productively work on that today. 01:22:47.700 |
in terms of aligning the values of our AI systems 01:23:19.380 |
how many times when you are driving your car, 01:23:25.900 |
So I think self-driving cars will run into that problem 01:23:28.380 |
roughly as often as we do when we drive our cars. 01:23:33.460 |
is when there's a big white truck across the road, 01:23:35.860 |
and what you should do is brake and not crash into it, 01:23:38.300 |
and the self-driving car fails and it crashes into it. 01:23:41.260 |
So I think we need to solve that problem first. 01:23:43.620 |
I think the problem with some of these discussions 01:23:45.740 |
about AGI, you know, alignment, the paperclip problem, 01:23:50.740 |
is that it's a huge distraction from the much harder problems 01:23:50.740 |
Some of the hard problems we need to address today, 01:24:10.820 |
because we can now centralize data, use AI to process it. 01:24:17.620 |
So the internet industry has a lot of winner-take-most dynamics, 01:24:22.140 |
but we've infected all these other industries with 01:24:26.260 |
winner-take-most dynamics or winner-take-all flavors. 01:24:28.740 |
So look at what Uber and Lyft did to the taxi industry. 01:24:32.580 |
So we're doing this type of thing to a lot of... 01:24:36.500 |
but how do we make sure that the wealth is fairly shared? 01:24:41.220 |
And then how do we help people whose jobs are displaced? 01:24:47.020 |
There may be even more that we need to do than education. 01:24:57.180 |
like deepfakes being used for various nefarious purposes. 01:25:00.580 |
So I worry about some teams, maybe accidentally, 01:25:13.180 |
rather than focusing on some of the much harder problems. 01:25:18.820 |
They're exceptionally challenging, like those you said, 01:25:40.060 |
some companies, when a regulator comes to you and says, 01:25:49.460 |
about how you promised not to wipe out humanity 01:25:51.860 |
than to face the actually really hard problems we face. 01:25:57.620 |
from teaching to research to entrepreneurship. 01:26:04.140 |
moments that if you went back, you would do differently? 01:26:07.100 |
And two, are there moments you're especially proud of, 01:26:15.740 |
It feels like every time I discover something, 01:26:20.260 |
I go, "Why didn't I think of this five years earlier 01:26:38.860 |
"If only I read this book when we're starting up Coursera, 01:26:43.900 |
But I discovered the book had not yet been written 01:27:09.300 |
if you look back, that you're especially proud of 01:27:14.180 |
that filled you with happiness and fulfillment? 01:27:22.380 |
- She's like, "No matter how much time I spend with her, 01:27:32.260 |
is helping others achieve whatever are their dreams. 01:27:36.860 |
And then also to try to move the world forward 01:27:43.700 |
So the times that I felt most happy and most proud 01:27:46.540 |
was when I felt someone else allowed me the good fortune 01:27:51.540 |
of helping them a little bit on the path to their dreams. 01:27:58.660 |
than talking about happiness and the meaning of life. 01:28:08.820 |
- Thanks for listening to this conversation with Andrew Ng. 01:28:11.740 |
And thank you to our presenting sponsor, Cash App. 01:28:20.220 |
an organization that inspires and educates young minds 01:28:23.340 |
to become science and technology innovators of tomorrow. 01:28:26.580 |
If you enjoy this podcast, subscribe on YouTube, 01:28:31.580 |
support on Patreon, or simply connect with me on Twitter 01:28:36.180 |
And now let me leave you with some words of wisdom 01:28:46.340 |
would you have significantly helped other people? 01:28:48.840 |
If not, then keep searching for something else to work on. 01:28:53.220 |
Otherwise, you're not living up to your full potential. 01:28:57.900 |
Thank you for listening and hope to see you next time.