
Andrew Ng: Advice on Getting Started in Deep Learning | AI Podcast Clips


Chapters

0:00
1:17 How Does One Get Started in Deep Learning
2:00 Prerequisites for Somebody To Take the Deep Learning Specialization
6:54 Concepts in Deep Learning Do You Think Students Struggle the Most with
7:12 Challenges of Deep Learning
12:45 How Long Does It Take To Complete the Course
19:08 How Does One Make a Career out of an Interest in Deep Learning
21:41 Should Students Pursue a PhD

Whisper Transcript

00:00:00.000 | So let's perhaps talk about each of these areas.
00:00:04.640 | First, deeplearning.ai.
00:00:06.880 | How, the basic question,
00:00:10.080 | how does a person interested in deep learning
00:00:12.440 | get started in the field?
00:00:14.680 | - Deeplearning.ai is working to create courses
00:00:18.000 | to help people break into AI.
00:00:19.800 | So my machine learning course
00:00:23.120 | that I taught through Stanford
00:00:24.560 | is one of the most popular courses on Coursera.
00:00:27.800 | To this day, it's probably one of the courses,
00:00:30.840 | sort of, if I ask somebody,
00:00:32.160 | how did you get into machine learning
00:00:34.640 | or how did you fall in love with machine learning
00:00:36.480 | or what gets you interested,
00:00:37.960 | it always goes back to Andrew Ng at some point.
00:00:41.480 | The amount of people you've influenced is ridiculous.
00:00:45.560 | So for that, I'm sure I speak for a lot of people
00:00:48.120 | saying a big thank you.
00:00:49.440 | - No, yeah, thank you.
00:00:50.480 | You know, I was once reading a news article,
00:00:54.360 | I think it was Tech Review
00:00:57.440 | and I'm gonna mess up the statistic,
00:00:59.960 | but I remember reading an article that said
00:01:02.440 | something like one third of all programmers are self-taught.
00:01:06.000 | I may have the number one third wrong,
00:01:07.360 | maybe it was two thirds.
00:01:08.200 | But when I read that article, I thought,
00:01:09.560 | this doesn't make sense.
00:01:10.440 | Everyone is self-taught.
00:01:12.120 | 'Cause you teach yourself, I don't teach people.
00:01:14.720 | I just- - That's well put.
00:01:16.800 | So yeah, so how does one get started in deep learning
00:01:20.360 | and where does deeplearning.ai fit into that?
00:01:22.920 | - So the deep learning specialization
00:01:24.720 | offered by deeplearning.ai is,
00:01:26.480 | I think it was Coursera's top specialization.
00:01:32.320 | It might still be.
00:01:33.160 | So it's a very popular way for people
00:01:35.240 | to take that specialization,
00:01:36.800 | to learn about everything from neural networks
00:01:40.120 | to how to tune a neural network.
00:01:42.440 | So what does a ConvNet do?
00:01:43.920 | What is an RNN or a sequence model
00:01:46.440 | or what is an attention model?
00:01:48.120 | And so the deep learning specialization
00:01:50.440 | steps everyone through those algorithms.
00:01:53.280 | So you deeply understand it and can implement it
00:01:55.520 | and use it for whatever applications.
00:01:57.560 | - From the very beginning.
00:01:58.800 | So what would you say are the prerequisites
00:02:01.840 | for somebody to take the deep learning specialization
00:02:04.440 | in terms of maybe math or programming background?
00:02:07.960 | - Yeah, you need to understand basic programming
00:02:10.360 | since there are programming exercises in Python.
00:02:12.760 | And the math prereq is quite basic.
00:02:16.680 | So no calculus is needed.
00:02:18.280 | If you know calculus, it's great.
00:02:19.440 | You get better intuitions.
00:02:21.000 | But we deliberately try to teach that specialization
00:02:23.600 | without requiring calculus.
00:02:25.000 | So I think high school math would be sufficient.
00:02:29.600 | If you know how to multiply two matrices,
00:02:31.360 | I think that's great.
00:02:34.560 | - So a little basic linear algebra is great.
00:02:37.120 | - Basic linear algebra,
00:02:38.280 | even very, very basic linear algebra,
00:02:40.600 | and some programming.
00:02:42.440 | I think that people that have done
00:02:43.560 | the machine learning course
00:02:44.520 | will find a deep learning specialization a bit easier.
00:02:47.440 | But it's also possible to jump
00:02:48.760 | into the deep learning specialization directly,
00:02:50.680 | but it'll be a little bit harder
00:02:52.280 | since we tend to go faster over concepts
00:02:56.840 | like how does gradient descent work
00:02:58.520 | and what is the objective function,
00:02:59.760 | which is covered more slowly
00:03:01.280 | in the machine learning course.
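
For readers who want a concrete picture of the "gradient descent" and "objective function" ideas mentioned here, below is a minimal NumPy sketch; it is an illustration only, not code from either course, and the synthetic data, learning rate, and step count are arbitrary choices.

```python
# Gradient descent minimizing a mean-squared-error objective for
# one-variable linear regression (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 2.0 + 0.1 * rng.normal(size=100)  # noisy data from y = 3x + 2

w, b = 0.0, 0.0        # parameters to learn
learning_rate = 0.1

for step in range(500):
    predictions = w * x + b
    error = predictions - y
    loss = np.mean(error ** 2)          # the objective function J(w, b)
    grad_w = 2.0 * np.mean(error * x)   # dJ/dw
    grad_b = 2.0 * np.mean(error)       # dJ/db
    w -= learning_rate * grad_w         # take a small step downhill
    b -= learning_rate * grad_b

print(f"w ~ {w:.2f}, b ~ {b:.2f}, final loss ~ {loss:.4f}")  # recovers roughly 3 and 2
```
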
00:03:02.560 | - Could you briefly mention some of the key concepts
00:03:05.200 | in deep learning that students should learn
00:03:07.360 | that you envision them learning
00:03:08.680 | in the first few months, in the first year or so?
00:03:11.640 | - So if you take the deep learning specialization,
00:03:14.200 | you learn the foundations of what is a neural network?
00:03:17.200 | How do you build up a neural network
00:03:19.200 | from a single logistic unit,
00:03:21.120 | to a stack of layers,
00:03:22.960 | to different activation functions?
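
As an illustrative aside, not taken from the specialization itself, that progression from a single logistic unit to a small stack of layers with different activation functions can be sketched in a few lines of NumPy; all weights, sizes, and names below are invented for the example.

```python
# Forward pass only: one logistic unit, then a tiny two-layer stack.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))                        # one input example with 4 features

# Single logistic unit: weighted sum pushed through a sigmoid.
w, b = rng.normal(size=(4,)), 0.0
single_unit_output = sigmoid(w @ x + b)

# Stack of layers: each layer is weights, a bias, and an activation function.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)    # hidden layer, ReLU activation
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)    # output layer, sigmoid activation
hidden = relu(W1 @ x + b1)
network_output = sigmoid(W2 @ hidden + b2)

print(single_unit_output, network_output)
```
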
00:03:25.480 | You learn how to train the neural networks.
00:03:27.200 | One thing I'm very proud of in that specialization
00:03:30.120 | is we go through a lot of practical know-how
00:03:32.640 | of how to actually make these things work.
00:03:34.600 | So what are the differences
00:03:35.680 | between different optimization algorithms?
00:03:38.120 | What do you do if the algorithm overfits?
00:03:39.640 | So how do you tell if the algorithm is overfitting?
00:03:41.400 | When do you collect more data?
00:03:42.600 | When should you not bother to collect more data?
00:03:45.600 | I find that even today, unfortunately,
00:03:48.560 | there are engineers that will spend six months
00:03:52.360 | trying to pursue a particular direction,
00:03:54.960 | such as collect more data,
00:03:56.280 | because we heard more data is valuable.
00:03:58.240 | But sometimes you could run some tests
00:04:00.680 | and could have figured out six months earlier
00:04:02.800 | that for this particular problem,
00:04:04.360 | collecting more data isn't going to cut it.
00:04:06.280 | So just don't spend six months collecting more data.
00:04:08.600 | Spend your time modifying the architecture
00:04:11.680 | or trying something else.
00:04:12.600 | So we go through a lot of the practical know-how
00:04:14.960 | so that when someone,
00:04:17.760 | when you take the deep learning specialization,
00:04:19.600 | you have those skills to be very efficient
00:04:22.120 | in how you build these networks.
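
To make that kind of practical check concrete, here is a rough heuristic sketch, my own illustration rather than course material, of comparing training and validation error before deciding whether collecting more data is worth it; the thresholds and the function name are arbitrary.

```python
def diagnose(train_error, val_error, target_error):
    """Suggest a rough next step from training/validation error rates."""
    if train_error > target_error:
        # Can't even fit the training set: collecting more data won't fix this.
        return "High bias: try a bigger network, train longer, or change the architecture."
    if val_error - train_error > target_error:
        # Fits the training data but not held-out data.
        return "High variance: more data or more regularization is likely to help."
    return "Near target: further gains may need a different approach or better data."

print(diagnose(train_error=0.15, val_error=0.16, target_error=0.05))  # bias-limited
print(diagnose(train_error=0.02, val_error=0.20, target_error=0.05))  # variance-limited
```
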
00:04:24.320 | - So dive right in to play with the network,
00:04:26.640 | to train it, to do the inference on a particular dataset,
00:04:29.600 | to build the intuition about it
00:04:30.880 | without building it up too big
00:04:34.520 | to where you spend, like you said, six months learning,
00:04:38.000 | building up your big project
00:04:39.800 | without building any intuition from a small
00:04:42.520 | aspect of the data that could already tell you
00:04:45.800 | everything you need to know about that data.
00:04:47.960 | - Yes, and also the systematic frameworks of thinking
00:04:51.640 | for how to go about building practical machine learning.
00:04:54.720 | Maybe to make an analogy,
00:04:56.600 | when we learn to code,
00:04:57.720 | we have to learn the syntax of some programming language,
00:05:00.160 | right, be it Python or C++ or Octave or whatever.
00:05:03.840 | But the equally important
00:05:05.280 | or maybe even more important part of coding
00:05:07.280 | is to understand how to string together these lines of code
00:05:10.000 | into coherent things.
00:05:11.120 | So, you know, when should you put something
00:05:13.440 | in a function?
00:05:14.280 | When should you not?
00:05:15.120 | How do you think about abstraction?
00:05:16.960 | So those frameworks are what makes a programmer efficient,
00:05:21.400 | even more than understanding the syntax.
00:05:24.000 | I remember when I was an undergrad at Carnegie Mellon,
00:05:27.200 | one of my friends would debug their code
00:05:29.880 | by first trying to compile it,
00:05:31.600 | and then it was C++ code.
00:05:33.240 | And then every line that has syntax error,
00:05:35.680 | they want to get rid of the syntax errors
00:05:37.040 | as quickly as possible.
00:05:38.040 | So how do you do that?
00:05:38.920 | Well, they would delete every single line of code
00:05:40.680 | with a syntax error.
00:05:42.000 | So really efficient for getting rid of syntax errors,
00:05:44.040 | but a horrible way to debug.
00:05:45.280 | So I think, so we learn how to debug.
00:05:47.840 | And I think in machine learning,
00:05:49.320 | the way you debug a machine learning program
00:05:51.720 | is very different than the way you, you know,
00:05:53.720 | like do binary search or whatever,
00:05:55.440 | or use a debugger, like trace through the code
00:05:57.440 | in traditional software engineering.
00:05:59.320 | So it's an evolving discipline,
00:06:01.320 | but I find that the people that are really good
00:06:03.120 | at debugging machine learning algorithms
00:06:05.200 | are easily 10X, maybe 100X faster
00:06:08.440 | at getting something to work.
00:06:10.800 | - And the basic process of debugging is,
00:06:12.760 | so the bug in this case,
00:06:14.960 | why isn't this thing learning, improving,
00:06:18.760 | sort of going into the questions of overfitting
00:06:21.680 | and all those kinds of things.
00:06:23.080 | That's the logical space that the debugging is happening in
00:06:27.680 | with neural networks.
00:06:28.880 | - Yeah, often the question is, why doesn't it work yet?
00:06:32.760 | Or can I expect this to eventually work?
00:06:35.400 | And what are the things I could try?
00:06:37.200 | Change the architecture, more data, more regularization,
00:06:39.880 | different optimization algorithm, you know,
00:06:42.600 | different types of data.
00:06:44.320 | So to answer those questions systematically,
00:06:46.560 | so that you don't
00:06:48.160 | spend six months heading down a blind alley
00:06:50.400 | before someone comes and says,
00:06:52.120 | why did you spend six months doing this?
00:06:54.480 | - What concepts in deep learning
00:06:56.360 | do you think students struggle the most with?
00:06:58.840 | Or sort of what is the biggest challenge for them,
00:07:01.400 | where once they get over that hill,
00:07:04.000 | it hooks them and it inspires them
00:07:07.800 | and they really get it.
00:07:10.160 | - Similar to learning mathematics,
00:07:12.600 | I think one of the challenges of deep learning
00:07:14.760 | is that there are a lot of concepts
00:07:16.400 | that build on top of each other.
00:07:18.000 | If you ask me what's hard about mathematics,
00:07:21.160 | I have a hard time pinpointing one thing.
00:07:23.320 | Is it addition, subtraction?
00:07:24.680 | Is it a carry?
00:07:25.520 | Is it multiplication?
00:07:26.800 | There's just a lot of stuff.
00:07:28.200 | I think one of the challenges of learning math
00:07:30.200 | and of learning certain technical fields
00:07:32.200 | is that there are a lot of concepts.
00:07:33.920 | And if you miss a concept,
00:07:35.400 | then you're kind of missing the prerequisite
00:07:37.840 | for something that comes later.
00:07:40.360 | So in the deep learning specialization,
00:07:44.200 | try to break down the concepts
00:07:45.880 | to maximize the odds of each component being understandable.
00:07:49.320 | So when you move on to the more advanced thing,
00:07:51.640 | when we learn ConvNets,
00:07:53.160 | hopefully you have enough intuitions
00:07:54.680 | from the earlier sections
00:07:56.320 | to then understand why we structure ConvNets
00:07:59.160 | in a certain way.
00:08:00.960 | And then eventually why we build RNNs and LSTMs
00:08:05.400 | or attention model in a certain way,
00:08:07.120 | building on top of the earlier concepts.
00:08:10.040 | - Actually, I'm curious,
00:08:11.000 | you do a lot of teaching as well.
00:08:13.320 | Do you have a favorite,
00:08:15.520 | this is the hard concept moment in your teaching?
00:08:18.680 | - Well, I don't think anyone's ever turned
00:08:23.520 | the interview on me.
00:08:24.680 | - I'm glad to be first.
00:08:27.120 | - I think that's a really good question.
00:08:31.320 | Yeah, it's really hard to capture the moment
00:08:33.640 | when they struggle.
00:08:34.480 | I think you put it really eloquently.
00:08:35.720 | I do think there's moments
00:08:37.520 | that are like aha moments that really inspire people.
00:08:41.800 | I think for some reason, reinforcement learning,
00:08:45.720 | especially deep reinforcement learning
00:08:47.960 | is a really great way to really inspire people
00:08:51.960 | and get across what neural networks can do.
00:08:55.920 | Even though neural networks really are just a part
00:08:59.000 | of the deep RL framework,
00:09:00.960 | but it's a really nice way to paint the entirety
00:09:03.720 | of the picture of a neural network
00:09:06.360 | being able to learn from scratch,
00:09:08.320 | knowing nothing and explore the world and pick up lessons.
00:09:11.480 | I find that a lot of the aha moments happen
00:09:14.360 | when you use deep RL to teach people
00:09:17.840 | about neural networks, which is counterintuitive.
00:09:20.240 | I find a lot of the inspiration, sort of the fire
00:09:23.120 | in people's passion, in people's eyes,
00:09:24.680 | comes from the RL world.
00:09:27.200 | Do you find reinforcement learning
00:09:29.400 | to be a useful part of the teaching process or no?
00:09:33.000 | - I still teach reinforcement learning
00:09:35.840 | in one of my Stanford classes
00:09:37.960 | and my PhD thesis was on reinforcement learning.
00:09:39.880 | So I clearly love the field.
00:09:41.720 | I find that if I'm trying to teach students
00:09:43.880 | the most useful techniques for them to use today,
00:09:46.960 | I end up shrinking the amount of time
00:09:49.480 | I talk about reinforcement learning.
00:09:51.120 | It's not what's working today.
00:09:53.200 | Now our world changes so fast.
00:09:54.720 | Maybe it'll be totally different in a couple of years,
00:09:57.440 | but I think we need a couple more things
00:10:00.160 | for reinforcement learning to get there.
00:10:02.000 | - To actually get there, yeah.
00:10:02.920 | - One of my teams is looking at reinforcement learning
00:10:05.040 | for some robotic control tasks.
00:10:06.240 | So I see the applications,
00:10:07.600 | but if you look at it as a percentage
00:10:10.040 | of all of the impact of the types of things we do,
00:10:12.480 | it is, at least today, outside of playing video games
00:10:17.480 | and a few other games, quite small in scope.
00:10:20.800 | Actually at NeurIPS, a bunch of us were standing around
00:10:23.280 | saying, "Hey, what's your best example
00:10:25.200 | "of an actual deploy reinforcement learning application?"
00:10:27.640 | And among senior machine learning researchers.
00:10:31.400 | And again, there are some emerging ones,
00:10:33.800 | but there are not that many great examples.
00:10:37.640 | - I think you're absolutely right.
00:10:40.560 | The sad thing is there hasn't been a big application,
00:10:44.280 | impactful real-world application reinforcement learning.
00:10:47.200 | I think its biggest impact to me has been in the toy domain,
00:10:51.720 | in the game domain, in the small example.
00:10:53.680 | That's what I mean for educational purpose,
00:10:55.920 | it seems to be a fun thing to explore neural networks with.
00:10:59.160 | But I think from your perspective,
00:11:01.440 | and I think that might be the best perspective,
00:11:04.280 | is if you're trying to educate with a simple example
00:11:07.120 | in order to illustrate how this can actually be grown
00:11:10.720 | to scale and have a real world impact,
00:11:14.040 | then perhaps focusing on the fundamentals
00:11:16.080 | of supervised learning in the context of a simple dataset,
00:11:21.080 | even like an MNIST dataset is the right way,
00:11:24.200 | is the right path to take.
00:11:26.280 | I just, the amount of fun I've seen people have
00:11:29.240 | with reinforcement learning has been great,
00:11:30.840 | but not in the applied impact on the real world setting.
00:11:35.200 | So it's a trade-off, how much impact you want to have
00:11:37.800 | versus how much fun you want to have.
00:11:39.720 | - Yeah, that's really cool.
00:11:40.560 | And I feel like the world actually needs all sorts,
00:11:43.640 | even within machine learning,
00:11:44.960 | I feel like deep learning is so exciting,
00:11:48.200 | but the AI team shouldn't just use deep learning.
00:11:50.800 | I find that my teams use a portfolio of tools,
00:11:54.080 | and maybe that's not the exciting thing to say,
00:11:55.840 | but some days we use a neural net,
00:11:58.040 | some days we use PCA,
00:12:02.400 | actually the other day I was sitting down with my team
00:12:03.920 | looking at PCA residuals,
00:12:05.160 | trying to figure out what's going on
00:12:06.240 | with PCA applied to a manufacturing problem.
00:12:08.160 | And some days we use a probabilistic graphical model,
00:12:10.640 | some days we use a knowledge graph,
00:12:12.240 | which is one of the things
00:12:13.080 | that has tremendous industry impact,
00:12:15.440 | but the amount of chatter about knowledge graphs
00:12:18.080 | in academia is really thin
00:12:19.720 | compared to the actual real world impact.
00:12:22.040 | So I think reinforcement learning
00:12:23.840 | should be in that portfolio,
00:12:25.000 | and then it's about balancing
00:12:26.120 | how much we teach all of these things.
00:12:27.680 | And the world should have diverse skills,
00:12:30.240 | it'd be sad if everyone just learned one narrow thing.
00:12:33.880 | - Yeah, the diverse skills
00:12:34.800 | help you discover the right tool for the job.
00:12:37.400 | So if we could return to maybe talk quickly
00:12:41.160 | about the specifics of deeplearning.ai,
00:12:43.880 | the deep learning specialization, perhaps.
00:12:46.680 | How long does it take to complete the course,
00:12:48.480 | would you say?
00:12:49.880 | - The official length of the deep learning specialization
00:12:52.480 | is I think 16 weeks, so about four months,
00:12:56.160 | but it's go at your own pace.
00:12:57.840 | So if you subscribe to the deep learning specialization,
00:13:00.800 | there are people that finish it in less than a month
00:13:02.920 | by working more intensely and studying more intensely.
00:13:05.200 | So it really depends on the individual.
00:13:07.840 | When we created the deep learning specialization,
00:13:10.600 | we wanted to make it very accessible and very affordable.
00:13:15.200 | And with Coursera and deeplearning.ai's education mission,
00:13:18.920 | one of the things that's really important to me
00:13:20.600 | is that if there's someone for whom paying anything
00:13:24.280 | is a financial hardship,
00:13:26.600 | then just apply for financial aid and get it for free.
00:13:29.880 | - If you were to recommend a daily schedule for people
00:13:35.240 | in learning, whether it's through the deeplearning.ai
00:13:37.800 | specialization or just learning
00:13:39.920 | in the world of deep learning, what would you recommend?
00:13:44.040 | How do they go about day-to-day sort of specific advice
00:13:47.400 | about learning, about their journey
00:13:49.800 | in the world of deep learning, machine learning?
00:13:52.000 | - I think getting the habit of learning is key,
00:13:56.280 | and that means regularity.
00:13:58.360 | So for example, we send out our weekly newsletter,
00:14:03.800 | The Batch, every Wednesday.
00:14:05.240 | So people know it's coming Wednesday,
00:14:06.800 | you can spend a little bit of time on Wednesday
00:14:08.800 | catching up on the latest news through The Batch
00:14:11.640 | on Wednesday.
00:14:14.640 | And for myself, I've picked up a habit
00:14:17.160 | of spending some time every Saturday and every Sunday
00:14:20.600 | reading or studying.
00:14:21.800 | And so I don't wake up on a Saturday
00:14:23.840 | and have to make a decision.
00:14:24.920 | Do I feel like reading or studying today or not?
00:14:27.360 | It's just what I do.
00:14:28.880 | And the fact is a habit makes it easier.
00:14:31.440 | So I think if someone can get into that habit,
00:14:34.720 | it's like, you know,
00:14:36.040 | just like we brush our teeth every morning.
00:14:38.320 | I don't think about it.
00:14:39.280 | If I thought about it, it's a little bit annoying
00:14:40.760 | to have to spend two minutes doing that,
00:14:43.200 | but it's a habit that it takes no cognitive load,
00:14:46.400 | but this would be so much harder
00:14:47.640 | if we have to make a decision every morning.
00:14:50.320 | So, and actually that's the reason why
00:14:52.120 | I wear the same thing every day as well.
00:14:53.280 | It's just one less decision.
00:14:54.440 | I just get up and wear my blue shirt.
00:14:56.800 | So, but I think if you can get that habit,
00:14:58.440 | that consistency of studying,
00:15:00.120 | then it actually feels easier.
00:15:02.880 | - So yeah, it's kind of amazing.
00:15:05.560 | In my own life, like I play guitar every day for,
00:15:09.920 | I force myself to at least for five minutes play guitar.
00:15:12.760 | It's a ridiculously short period of time,
00:15:15.400 | but because I've gotten into that habit,
00:15:17.320 | it's incredible what you can accomplish
00:15:19.000 | in a period of a year or two years.
00:15:21.680 | You can become, you know,
00:15:24.560 | exceptionally good at certain aspects of a thing
00:15:26.960 | by just doing it every day for a very short period of time.
00:15:29.280 | It's kind of a miracle that that's how it works.
00:15:31.880 | It adds up over time.
00:15:33.440 | - Yeah, and I think it's often not about
00:15:36.200 | the bursts of effort and the all-nighters,
00:15:39.120 | because you can only do that a limited number of times.
00:15:41.480 | It's the sustained effort over a long time.
00:15:44.480 | I think, you know, reading two research papers
00:15:47.640 | is a nice thing to do,
00:15:49.200 | but the power is not reading two research papers.
00:15:51.480 | It's reading two research papers a week for a year.
00:15:54.760 | Then you've read a hundred papers
00:15:56.120 | and you actually learn a lot when you read a hundred papers.
00:15:59.280 | - So regularity and making learning a habit.
00:16:03.680 | Do you have general other study tips
00:16:07.440 | for particularly deep learning that people should,
00:16:11.240 | in their process of learning,
00:16:12.840 | is there some kind of recommendations
00:16:14.480 | or tips you have as they learn?
00:16:17.600 | - One thing I still do
00:16:19.400 | when I'm trying to study something really deeply
00:16:21.120 | is take handwritten notes.
00:16:23.640 | It varies.
00:16:24.480 | I know there are a lot of people
00:16:25.480 | that take the deep learning courses
00:16:27.160 | during a commute or something
00:16:29.800 | where it may be more awkward to take notes.
00:16:31.600 | So I know it may not work for everyone,
00:16:34.560 | but when I'm taking courses on Coursera,
00:16:37.280 | you know, and I still take some every now and then,
00:16:39.520 | the most recent one I took was a course on clinical trials,
00:16:42.200 | because I was interested about that.
00:16:43.480 | I got out my little Moleskine notebook
00:16:45.720 | and I was sitting at my desk.
00:16:46.720 | I was just taking down notes
00:16:48.240 | of what the instructor was saying.
00:16:49.400 | And that act, we know that that act of taking notes,
00:16:52.640 | preferably handwritten notes, increases retention.
00:16:56.560 | - So as you're sort of watching the video,
00:16:59.560 | just kind of pausing maybe,
00:17:01.040 | and then taking the basic insights down on paper?
00:17:05.040 | - Yeah, so there've been a few studies.
00:17:07.240 | If you search online, you find some of these studies
00:17:09.920 | that taking handwritten notes,
00:17:12.360 | because handwriting is slower, as we were saying just now,
00:17:16.240 | it causes you to recode the knowledge in your own words more
00:17:20.360 | and that process of recoding promotes long-term retention.
00:17:23.960 | This is as opposed to typing, which is fine.
00:17:26.160 | Again, typing is better than nothing,
00:17:27.960 | or in taking a class and not taking notes
00:17:29.680 | is better than not taking any class at all.
00:17:31.560 | But comparing handwritten notes and typing,
00:17:35.240 | you can usually type faster,
00:17:36.760 | for a lot of people, than they can handwrite notes.
00:17:38.640 | And so when people type,
00:17:40.160 | they're more likely to just transcribe verbatim
00:17:42.600 | what they heard,
00:17:43.480 | and that reduces the amount of recoding,
00:17:46.280 | and that actually results in less long-term retention.
00:17:49.640 | - I don't know what the psychological effect there is,
00:17:51.520 | but it's so true.
00:17:52.520 | There's something fundamentally different
00:17:54.080 | about writing, handwriting.
00:17:56.760 | I wonder what that is.
00:17:57.600 | I wonder if it is as simple
00:17:58.800 | as just the time it takes to write is slower.
00:18:01.560 | - Yeah, and because you can't write as many words,
00:18:05.320 | you have to take whatever they said
00:18:07.440 | and summarize it into fewer words.
00:18:09.160 | And that summarization process
00:18:10.640 | requires deeper processing of the meaning,
00:18:13.080 | which then results in better retention.
00:18:15.080 | - That's fascinating.
00:18:16.120 | - Oh, and I've spent,
00:18:18.160 | I think because of Coursera,
00:18:19.640 | I've spent so much time studying pedagogy.
00:18:21.360 | It's actually one of my passions.
00:18:22.480 | I really love learning how to more efficiently
00:18:25.240 | help others learn.
00:18:26.400 | Yeah, one of the things I do both when creating videos
00:18:30.800 | or when we write the batch is,
00:18:32.520 | I try to think, is one minute spent with us
00:18:36.400 | going to be a more efficient learning experience
00:18:39.200 | than one minute spent anywhere else?
00:18:41.120 | And we really try to make it time efficient
00:18:44.920 | for the learners, 'cause everyone's busy.
00:18:47.360 | So when we're editing, I often tell my teams,
00:18:50.680 | every word needs to fight for its life.
00:18:52.480 | And if you can delete a word,
00:18:53.400 | let's just delete it and not wait.
00:18:54.880 | Let's not waste the learners' time.
00:18:57.160 | - Oh, it's so amazing that you think that way,
00:18:59.360 | 'cause there is millions of people
00:19:00.800 | that are impacted by your teaching.
00:19:01.960 | And sort of that one minute spent
00:19:03.920 | has a ripple effect, right?
00:19:05.520 | Through years of time,
00:19:06.800 | which is just fascinating to think about.
00:19:09.760 | How does one make a career
00:19:11.440 | out of an interest in deep learning?
00:19:13.120 | Do you have advice for people?
00:19:15.880 | We just talked about sort of the beginning, early steps,
00:19:18.560 | but if you want to make it an entire life's journey,
00:19:21.520 | or at least a journey of a decade or two,
00:19:23.960 | how do you do it?
00:19:25.880 | - So most important thing is to get started.
00:19:28.320 | - Right, of course.
00:19:29.160 | - And I think in the early parts of a career,
00:19:32.640 | coursework, like the deep learning specialization,
00:19:36.880 | is a very efficient way to master this material.
00:19:41.000 | So, because instructors, be it me or someone else,
00:19:46.000 | or Laurence Moroney, who teaches our TensorFlow specialization,
00:19:48.960 | and other things we're working on,
00:19:50.520 | spend effort to try to make it time efficient
00:19:53.320 | for you to learn a new concept.
00:19:55.200 | So coursework is actually a very efficient way
00:19:58.200 | for people to learn concepts
00:19:59.880 | at the beginning parts of breaking into a new field.
00:20:02.560 | In fact, one thing I see at Stanford,
00:20:06.080 | some of my PhD students want to jump
00:20:07.880 | into research right away,
00:20:09.000 | and I actually tend to say,
00:20:10.120 | look, in your first couple of years as a PhD student,
00:20:12.600 | spend time taking courses,
00:20:14.320 | because it lays the foundation.
00:20:15.520 | It's fine if you're less productive
00:20:17.240 | in your first couple of years.
00:20:18.240 | You'll be better off in the long term.
00:20:20.960 | Beyond a certain point,
00:20:22.120 | there's material that doesn't exist in courses,
00:20:24.960 | because it's too cutting edge,
00:20:26.080 | the course hasn't been created yet,
00:20:27.360 | there's some practical experience
00:20:28.600 | that we're not yet that good at teaching in a course.
00:20:31.800 | And I think after exhausting the efficient coursework,
00:20:34.920 | then most people need to go on
00:20:37.560 | to either ideally work on projects,
00:20:41.760 | and then maybe also continue their learning
00:20:44.360 | by reading blog posts and research papers
00:20:46.840 | and things like that.
00:20:48.280 | Doing projects is really important.
00:20:50.160 | And again, I think it's important
00:20:52.880 | to start small and just do something.
00:20:55.480 | Today you read about deep learning,
00:20:56.640 | if you say, oh, all these people
00:20:57.560 | are doing such exciting things,
00:20:58.920 | what if I'm not building a neural network
00:21:00.720 | that changes the world,
00:21:01.560 | then what's the point?
00:21:02.400 | Well, the point is sometimes building
00:21:04.200 | that tiny neural network,
00:21:05.880 | be it MNIST or upgrade to a fashion MNIST,
00:21:09.000 | to whatever, doing your own fun hobby project.
00:21:12.480 | That's how you gain the skills
00:21:13.800 | to let you do bigger and bigger projects.
00:21:16.040 | I find this to be true at the individual level
00:21:18.360 | and also at the organizational level.
00:21:20.880 | For a company to become good at machine learning,
00:21:22.640 | sometimes the right thing to do
00:21:23.960 | is not to tackle the giant project,
00:21:26.960 | but instead to do the small project
00:21:29.040 | that lets the organization learn
00:21:31.120 | and then build up from there.
00:21:32.400 | But this is true both for individuals
00:21:33.840 | and for companies.
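
For anyone starting that kind of tiny hobby project, here is a minimal sketch of an MNIST starter network; it assumes TensorFlow with Keras is installed (the dataset downloads on first use), the layer sizes and epoch count are arbitrary starting points, and fashion-MNIST can be swapped in via tf.keras.datasets.fashion_mnist.

```python
# A tiny starter project: train a small neural network on MNIST.
import tensorflow as tf

# Load MNIST (or swap in fashion_mnist for a slightly harder variant).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                        # 28x28 image -> 784 vector
    tf.keras.layers.Dense(128, activation="relu"),    # one hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 digit classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=3, validation_split=0.1)
print(model.evaluate(x_test, y_test, verbose=0))      # [test loss, test accuracy]
```
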
00:21:37.040 | - Taking the first step
00:21:38.520 | and then taking small steps is the key.
00:21:42.360 | Should students pursue a PhD, do you think?
00:21:44.960 | You can do so much.
00:21:46.400 | That's one of the fascinating things
00:21:47.800 | in machine learning.
00:21:48.640 | You can have so much impact
00:21:49.800 | without ever getting a PhD.
00:21:51.960 | So what are your thoughts?
00:21:53.520 | Should people go to grad school?
00:21:54.880 | Should people get a PhD?
00:21:56.920 | - I think that there are multiple good options
00:21:59.200 | of which doing a PhD could be one of them.
00:22:02.520 | I think that if someone's admitted
00:22:04.400 | to a top PhD program,
00:22:06.480 | at MIT, Stanford, top schools,
00:22:09.440 | I think that's a very good experience.
00:22:12.840 | Or if someone gets a job at a top organization,
00:22:16.320 | at a top AI team,
00:22:17.680 | I think that's also a very good experience.
00:22:20.360 | There are some things you still need a PhD to do.
00:22:23.120 | If someone's aspiration is to be a professor
00:22:25.120 | at a top academic university,
00:22:26.320 | you just need a PhD to do that.
00:22:28.280 | But if the goal is to start a company,
00:22:30.520 | build a company, do great technical work,
00:22:32.560 | I think a PhD is a good experience.
00:22:34.880 | But I would look at the different options
00:22:37.480 | available to someone.
00:22:38.680 | Where are the places where you can get a job?
00:22:40.240 | Where are the places where you can get in a PhD program?
00:22:42.200 | And kind of weigh the pros and cons of those.
00:22:44.800 | - So just to linger on that for a little bit longer,
00:22:47.200 | what final dreams and goals
00:22:48.960 | do you think people should have?
00:22:50.160 | So what options should they explore?
00:22:54.560 | So you can work in industry.
00:22:56.960 | So for a large company,
00:22:59.240 | like Google, Facebook, Baidu,
00:23:00.720 | all these large companies
00:23:03.280 | that already have huge teams of machine learning engineers.
00:23:06.440 | You can also do within industry,
00:23:08.160 | sort of more research groups
00:23:09.480 | that kind of like Google Research, Google Brain.
00:23:12.360 | Then you can also do, like we said,
00:23:14.520 | a professor in academia.
00:23:17.520 | And what else?
00:23:19.080 | Oh, you can build your own company.
00:23:21.080 | You can do a startup.
00:23:22.280 | Is there anything that stands out between those options
00:23:25.720 | or are they all beautiful different journeys
00:23:28.000 | that people should consider?
00:23:29.880 | - I think the thing that affects your experience more
00:23:31.960 | is less are you in this company versus that company
00:23:35.280 | or academia versus industry.
00:23:37.280 | I think the thing that affects your experience most
00:23:38.800 | is who are the people you're interacting with
00:23:40.880 | in a daily basis.
00:23:42.640 | So even if you look at some of the large companies,
00:23:46.680 | the experience of individuals in different teams
00:23:49.040 | is very different.
00:23:50.200 | And what matters most is not the logo above the door
00:23:53.360 | when you walk into the giant building every day.
00:23:55.600 | What matters the most is who are the 10 people,
00:23:57.760 | the 30 people you interact with every day.
00:24:00.320 | So I actually tend to advise people,
00:24:02.040 | if you get a job from a company,
00:24:04.600 | ask who is your manager, who are your peers,
00:24:07.360 | who are you actually going to talk to?
00:24:08.520 | We're all social creatures.
00:24:09.600 | We tend to become more like the people around us.
00:24:12.520 | And if you're working with great people,
00:24:14.640 | you will learn faster.
00:24:16.240 | Or if you get admitted,
00:24:17.680 | if you get a job at a great company or a great university,
00:24:21.280 | maybe the logo you walk in is great,
00:24:23.880 | but you're actually stuck on some team
00:24:25.360 | doing work that doesn't really excite you.
00:24:28.320 | And then that's actually a really bad experience.
00:24:30.840 | So this is true both for universities
00:24:33.400 | and for large companies.
00:24:35.160 | For small companies, you can kind of figure out
00:24:36.880 | who you'll be working with quite quickly.
00:24:39.000 | And I tend to advise people,
00:24:40.880 | if a company refuses to tell you who you will work with,
00:24:43.800 | someone will say, "Oh, join us.
00:24:44.840 | "There's a rotation system.
00:24:45.680 | "We'll figure it out."
00:24:46.680 | I think that that's a worrying answer
00:24:48.720 | because it means
00:24:53.160 | you may not actually get to a team
00:24:55.360 | with great peers and great people to work with.
00:24:57.920 | - It's actually a really profound advice
00:24:59.680 | that we kind of sometimes sweep.
00:25:02.160 | We don't consider too rigorously or carefully.
00:25:05.720 | The people around you are really often,
00:25:08.520 | especially when you accomplish great things,
00:25:10.160 | it seems the great things are accomplished
00:25:11.720 | because of the people around you.
00:25:13.840 | So it's not about whether you learn this thing or that thing,
00:25:18.840 | or like you said, the logo that hangs up top,
00:25:22.160 | it's the people.
00:25:23.000 | That's a fascinating,
00:25:24.560 | and it's such a hard search process
00:25:26.480 | of finding, just like finding the right friends
00:25:31.240 | and somebody to get married with and that kind of thing.
00:25:34.560 | It's a very hard search, it's a people search problem.
00:25:38.000 | - Yeah, I think when someone interviews,
00:25:40.760 | at a university or the research lab
00:25:42.400 | or the large corporation,
00:25:44.000 | it's good to insist on just asking,
00:25:46.440 | "Who are the people?
00:25:47.280 | "Who is my manager?"
00:25:48.440 | And if you refuse to tell me,
00:25:49.640 | I'm gonna think,
00:25:50.880 | "Well, maybe that's 'cause you don't have a good answer."
00:25:52.760 | "It may not be someone I like."
00:25:54.480 | - And if you don't particularly connect,
00:25:56.640 | if something feels off with the people,
00:25:59.560 | then don't stick to it.
00:26:03.480 | That's a really important signal to consider.
00:26:05.800 | - Yeah, yeah.
00:26:06.720 | And actually, in my Stanford class, CS230,
00:26:10.560 | as well as an ACM talk,
00:24:11.760 | I think I gave like an hour-long talk on career advice,
00:26:15.440 | including on the job search process and some of these.
00:26:18.200 | So you can find those videos online.
00:26:20.440 | - Awesome, and I'll point them.
00:26:22.280 | I'll point people to them.
00:26:23.720 | Beautiful.