
Andrew Ng: Advice on Getting Started in Deep Learning | AI Podcast Clips


Chapters

0:00
1:17 How Does One Get Started in Deep Learning
2:00 Prerequisites for Somebody To Take the Deep Learning Specialization
6:54 Concepts in Deep Learning Do You Think Students Struggle the Most with
7:12 Challenges of Deep Learning
12:45 How Long Does It Take To Complete the Course
19:08 How Does One Make a Career out of an Interest in Deep Learning
21:41 Should Students Pursue a PhD

Whisper Transcript

00:00:00.000 | So let's perhaps talk about each of these areas.
00:00:04.640 | First, deeplearning.ai.
00:00:06.880 | How, the basic question,
00:00:10.080 | how does a person interested in deep learning
00:00:12.440 | get started in the field?
00:00:14.680 | - Deeplearning.ai is working to create courses
00:00:18.000 | to help people break into AI.
00:00:19.800 | So my machine learning course
00:00:23.120 | that I taught through Stanford
00:00:24.560 | is one of the most popular courses on Coursera.
00:00:27.800 | To this day, it's probably one of the courses,
00:00:30.840 | sort of, if I ask somebody,
00:00:32.160 | how did you get into machine learning
00:00:34.640 | or how did you fall in love with machine learning
00:00:36.480 | or what gets you interested,
00:00:37.960 | it always goes back to Andrew Ng at some point.
00:00:41.480 | The amount of people you've influenced is ridiculous.
00:00:45.560 | So for that, I'm sure I speak for a lot of people
00:00:48.120 | saying a big thank you.
00:00:49.440 | - No, yeah, thank you.
00:00:50.480 | You know, I was once reading a news article,
00:00:54.360 | I think it was Tech Review
00:00:57.440 | and I'm gonna mess up the statistic,
00:00:59.960 | but I remember reading an article that said
00:01:02.440 | something like one third of all programmers are self-taught.
00:01:06.000 | I may have the number one third wrong,
00:01:07.360 | maybe it was two thirds.
00:01:08.200 | But when I read that article, I thought,
00:01:09.560 | this doesn't make sense.
00:01:10.440 | Everyone is self-taught.
00:01:12.120 | 'Cause you teach yourself, I don't teach people.
00:01:14.720 | I just- - That's well put.
00:01:16.800 | So yeah, so how does one get started in deep learning
00:01:20.360 | and where does deeplearning.ai fit into that?
00:01:22.920 | - So the deep learning specialization
00:01:24.720 | offered by deeplearning.ai is,
00:01:26.480 | I think it was Coursera's top specialization.
00:01:32.320 | It might still be.
00:01:33.160 | So it's a very popular way for people
00:01:35.240 | to take that specialization,
00:01:36.800 | to learn about everything from neural networks
00:01:40.120 | to how to tune a neural network.
00:01:42.440 | So what does a ConvNet do?
00:01:43.920 | What is an RNN or a sequence model
00:01:46.440 | or what is an attention model?
00:01:48.120 | And so the deep learning specialization
00:01:50.440 | steps everyone through those algorithms.
00:01:53.280 | So you deeply understand it and can implement it
00:01:55.520 | and use it for whatever applications.
00:01:57.560 | - From the very beginning.
00:01:58.800 | So what would you say are the prerequisites
00:02:01.840 | for somebody to take the deep learning specialization
00:02:04.440 | in terms of maybe math or programming background?
00:02:07.960 | - Yeah, you need to understand basic programming
00:02:10.360 | since there are programming exercises in Python.
00:02:12.760 | And the math prereq is quite basic.
00:02:16.680 | So no calculus is needed.
00:02:18.280 | If you know calculus, it's great.
00:02:19.440 | You get better intuitions.
00:02:21.000 | But we deliberately try to teach that specialization
00:02:23.600 | without requiring calculus.
00:02:25.000 | So I think high school math would be sufficient.
00:02:29.600 | If you know how to multiply two matrices,
00:02:31.360 | I think that's great.
00:02:34.560 | - So a little basic linear algebra is great.
00:02:37.120 | - Basic linear algebra,
00:02:38.280 | even very, very basic linear algebra,
00:02:40.600 | and some programming.
00:02:42.440 | I think that people that have done
00:02:43.560 | the machine learning course
00:02:44.520 | will find a deep learning specialization a bit easier.
00:02:47.440 | But it's also possible to jump
00:02:48.760 | into the deep learning specialization directly,
00:02:50.680 | but it'll be a little bit harder
00:02:52.280 | since we tend to go faster over concepts
00:02:56.840 | like how does gradient descent work
00:02:58.520 | and what is the objective function,
00:02:59.760 | which is covered more slowly
00:03:01.280 | in the machine learning course.
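
For readers who want a concrete picture of the "gradient descent" and "objective function" ideas mentioned here, below is a minimal NumPy sketch; it is an illustration only, not code from either course, and the synthetic data, learning rate, and step count are arbitrary choices.

```python
# Gradient descent minimizing a mean-squared-error objective for
# one-variable linear regression (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100)
y = 3.0 * x + 2.0 + 0.1 * rng.normal(size=100)  # noisy data from y = 3x + 2

w, b = 0.0, 0.0        # parameters to learn
learning_rate = 0.1

for step in range(500):
    predictions = w * x + b
    error = predictions - y
    loss = np.mean(error ** 2)          # the objective function J(w, b)
    grad_w = 2.0 * np.mean(error * x)   # dJ/dw
    grad_b = 2.0 * np.mean(error)       # dJ/db
    w -= learning_rate * grad_w         # take a small step downhill
    b -= learning_rate * grad_b

print(f"w ~ {w:.2f}, b ~ {b:.2f}, final loss ~ {loss:.4f}")  # recovers roughly 3 and 2
```
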
00:03:02.560 | - Could you briefly mention some of the key concepts
00:03:05.200 | in deep learning that students should learn
00:03:07.360 | that you envision them learning
00:03:08.680 | in the first few months, in the first year or so?
00:03:11.640 | - So if you take the deep learning specialization,
00:03:14.200 | you learn the foundations of what is a neural network?
00:03:17.200 | How do you build up a neural network
00:03:19.200 | from a single logistic unit,
00:03:21.120 | to a stack of layers,
00:03:22.960 | to different activation functions?
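
As an illustrative aside, not taken from the specialization itself, that progression from a single logistic unit to a small stack of layers with different activation functions can be sketched in a few lines of NumPy; all weights, sizes, and names below are invented for the example.

```python
# Forward pass only: one logistic unit, then a tiny two-layer stack.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))                        # one input example with 4 features

# Single logistic unit: weighted sum pushed through a sigmoid.
w, b = rng.normal(size=(4,)), 0.0
single_unit_output = sigmoid(w @ x + b)

# Stack of layers: each layer is weights, a bias, and an activation function.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)    # hidden layer, ReLU activation
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)    # output layer, sigmoid activation
hidden = relu(W1 @ x + b1)
network_output = sigmoid(W2 @ hidden + b2)

print(single_unit_output, network_output)
```
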
00:03:25.480 | You learn how to train the neural networks.
00:03:27.200 | One thing I'm very proud of in that specialization
00:03:30.120 | is we go through a lot of practical know-how
00:03:32.640 | of how to actually make these things work.
00:03:34.600 | So what are the differences
00:03:35.680 | between different optimization algorithms?
00:03:38.120 | What do you do if the algorithm overfits?
00:03:39.640 | So how do you tell if the algorithm is overfitting?
00:03:41.400 | When do you collect more data?
00:03:42.600 | When should you not bother to collect more data?
00:03:45.600 | I find that even today, unfortunately,
00:03:48.560 | there are engineers that will spend six months
00:03:52.360 | trying to pursue a particular direction,
00:03:54.960 | such as collect more data,
00:03:56.280 | because we heard more data is valuable.
00:03:58.240 | But sometimes you could run some tests
00:04:00.680 | and could have figured out six months earlier
00:04:02.800 | that for this particular problem,
00:04:04.360 | collecting more data isn't going to cut it.
00:04:06.280 | So just don't spend six months collecting more data.
00:04:08.600 | Spend your time modifying the architecture
00:04:11.680 | or trying something else.
00:04:12.600 | So we go through a lot of the practical know-how
00:04:14.960 | so that when someone,
00:04:17.760 | when you take the deep learning specialization,
00:04:19.600 | you have those skills to be very efficient
00:04:22.120 | in how you build these networks.
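
To make that kind of practical check concrete, here is a rough heuristic sketch, my own illustration rather than course material, of comparing training and validation error before deciding whether collecting more data is worth it; the thresholds and the function name are arbitrary.

```python
def diagnose(train_error, val_error, target_error):
    """Suggest a rough next step from training/validation error rates."""
    if train_error > target_error:
        # Can't even fit the training set: collecting more data won't fix this.
        return "High bias: try a bigger network, train longer, or change the architecture."
    if val_error - train_error > target_error:
        # Fits the training data but not held-out data.
        return "High variance: more data or more regularization is likely to help."
    return "Near target: further gains may need a different approach or better data."

print(diagnose(train_error=0.15, val_error=0.16, target_error=0.05))  # bias-limited
print(diagnose(train_error=0.02, val_error=0.20, target_error=0.05))  # variance-limited
```
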
00:04:24.320 | - So dive right in to play with the network,
00:04:26.640 | to train it, to do the inference on a particular dataset,
00:04:29.600 | to build the intuition about it
00:04:30.880 | without building it up too big
00:04:34.520 | to where you spend, like you said, six months learning,
00:04:38.000 | building up your big project
00:04:39.800 | without building any intuition from a small
00:04:42.520 | aspect of the data that could already tell you
00:04:45.800 | everything you need to know about that data.
00:04:47.960 | - Yes, and also the systematic frameworks of thinking
00:04:51.640 | for how to go about building practical machine learning.
00:04:54.720 | Maybe to make an analogy,
00:04:56.600 | when we learn to code,
00:04:57.720 | we have to learn the syntax of some programming language,
00:05:00.160 | right, be it Python or C++ or Octave or whatever.
00:05:03.840 | But the equally important
00:05:05.280 | or maybe even more important part of coding
00:05:07.280 | is to understand how to string together these lines of code
00:05:10.000 | into coherent things.
00:05:11.120 | So, you know, when should you put something
00:05:13.440 | in a function?
00:05:14.280 | When should you not?
00:05:15.120 | How do you think about abstraction?
00:05:16.960 | So those frameworks are what makes a programmer efficient,
00:05:21.400 | even more than understanding the syntax.
00:05:24.000 | I remember when I was an undergrad at Carnegie Mellon,
00:05:27.200 | one of my friends would debug their code
00:05:29.880 | by first trying to compile it,
00:05:31.600 | and then it was C++ code.
00:05:33.240 | And then every line that has syntax error,
00:05:35.680 | they want to get rid of the syntax errors
00:05:37.040 | as quickly as possible.
00:05:38.040 | So how do you do that?
00:05:38.920 | Well, they would delete every single line of code
00:05:40.680 | with a syntax error.
00:05:42.000 | So really efficient for getting rid of syntax errors,
00:05:44.040 | but a horrible way to debug.
00:05:45.280 | So I think, so we learn how to debug.
00:05:47.840 | And I think in machine learning,
00:05:49.320 | the way you debug a machine learning program
00:05:51.720 | is very different than the way you, you know,
00:05:53.720 | like do binary search or whatever,
00:05:55.440 | or use a debugger, like trace through the code
00:05:57.440 | in traditional software engineering.
00:05:59.320 | So it's an evolving discipline,
00:06:01.320 | but I find that the people that are really good
00:06:03.120 | at debugging machine learning algorithms
00:06:05.200 | are easily 10X, maybe 100X faster
00:06:08.440 | at getting something to work.
00:06:10.800 | - And the basic process of debugging is,
00:06:12.760 | so the bug in this case,
00:06:14.960 | why isn't this thing learning, improving,
00:06:18.760 | sort of going into the questions of overfitting
00:06:21.680 | and all those kinds of things.
00:06:23.080 | That's the logical space that the debugging is happening in
00:06:27.680 | with neural networks.
00:06:28.880 | - Yeah, often the question is, why doesn't it work yet?
00:06:32.760 | Or can I expect this to eventually work?
00:06:35.400 | And what are the things I could try?
00:06:37.200 | Change the architecture, more data, more regularization,
00:06:39.880 | different optimization algorithm, you know,
00:06:42.600 | different types of data.
00:06:44.320 | So to answer those questions systematically,
00:06:46.560 | so that you don't
00:06:48.160 | spend six months heading down a blind alley
00:06:50.400 | before someone comes and says,
00:06:52.120 | why did you spend six months doing this?
00:06:54.480 | - What concepts in deep learning
00:06:56.360 | do you think students struggle the most with?
00:06:58.840 | Or sort of what is the biggest challenge for them,
00:07:01.400 | where once they get over that hill,
00:07:04.000 | it hooks them and it inspires them
00:07:07.800 | and they really get it.
00:07:10.160 | - Similar to learning mathematics,
00:07:12.600 | I think one of the challenges of deep learning
00:07:14.760 | is that there are a lot of concepts
00:07:16.400 | that build on top of each other.
00:07:18.000 | If you ask me what's hard about mathematics,
00:07:21.160 | I have a hard time pinpointing one thing.
00:07:23.320 | Is it addition, subtraction?
00:07:24.680 | Is it a carry?
00:07:25.520 | Is it multiplication?
00:07:26.800 | There's just a lot of stuff.
00:07:28.200 | I think one of the challenges of learning math
00:07:30.200 | and of learning certain technical fields
00:07:32.200 | is that there are a lot of concepts.
00:07:33.920 | And if you miss a concept,
00:07:35.400 | then you're kind of missing the prerequisite
00:07:37.840 | for something that comes later.
00:07:40.360 | So in the deep learning specialization,
00:07:44.200 | try to break down the concepts
00:07:45.880 | to maximize the odds of each component being understandable.
00:07:49.320 | So when you move on to the more advanced thing,
00:07:51.640 | when we learn ConvNets,
00:07:53.160 | hopefully you have enough intuitions
00:07:54.680 | from the earlier sections
00:07:56.320 | to then understand why we structure ConvNets
00:07:59.160 | in a certain way.
00:08:00.960 | And then eventually why we build RNNs and LSTMs
00:08:05.400 | or attention model in a certain way,
00:08:07.120 | building on top of the earlier concepts.
00:08:10.040 | - Actually, I'm curious,
00:08:11.000 | you do a lot of teaching as well.
00:08:13.320 | Do you have a favorite,
00:08:15.520 | this is the hard concept moment in your teaching?
00:08:18.680 | - Well, I don't think anyone's ever turned
00:08:23.520 | the interview on me.
00:08:24.680 | - I'm glad to be first.
00:08:27.120 | - I think that's a really good question.
00:08:31.320 | Yeah, it's really hard to capture the moment
00:08:33.640 | when they struggle.
00:08:34.480 | I think you put it really eloquently.
00:08:35.720 | I do think there's moments
00:08:37.520 | that are like aha moments that really inspire people.
00:08:41.800 | I think for some reason, reinforcement learning,
00:08:45.720 | especially deep reinforcement learning
00:08:47.960 | is a really great way to really inspire people
00:08:51.960 | and get across what neural networks can do.
00:08:55.920 | Even though neural networks really are just a part
00:08:59.000 | of the deep RL framework,
00:09:00.960 | but it's a really nice way to paint the entirety
00:09:03.720 | of the picture of a neural network
00:09:06.360 | being able to learn from scratch,
00:09:08.320 | knowing nothing and explore the world and pick up lessons.
00:09:11.480 | I find that a lot of the aha moments happen
00:09:14.360 | when you use deep RL to teach people
00:09:17.840 | about neural networks, which is counterintuitive.
00:09:20.240 | I find a lot of the inspiration, sort of the fire
00:09:23.120 | in people's passion, in people's eyes,
00:09:24.680 | comes from the RL world.
00:09:27.200 | Do you find reinforcement learning
00:09:29.400 | to be a useful part of the teaching process or no?
00:09:33.000 | - I still teach reinforcement learning
00:09:35.840 | in one of my Stanford classes
00:09:37.960 | and my PhD thesis was on reinforcement learning.
00:09:39.880 | So I clearly love the field.
00:09:41.720 | I find that if I'm trying to teach students
00:09:43.880 | the most useful techniques for them to use today,
00:09:46.960 | I end up shrinking the amount of time
00:09:49.480 | I talk about reinforcement learning.
00:09:51.120 | It's not what's working today.
00:09:53.200 | Now our world changes so fast.
00:09:54.720 | Maybe it'll be totally different in a couple of years,
00:09:57.440 | but I think we need a couple more things
00:10:00.160 | for reinforcement learning to get there.
00:10:02.000 | - To actually get there, yeah.
00:10:02.920 | - One of my teams is looking at reinforcement learning
00:10:05.040 | for some robotic control tasks.
00:10:06.240 | So I see the applications,
00:10:07.600 | but if you look at it as a percentage
00:10:10.040 | of all of the impact of the types of things we do,
00:10:12.480 | it is, at least today, outside of playing video games
00:10:17.480 | and a few other games, quite small in scope.
00:10:20.800 | Actually at NeurIPS, a bunch of us were standing around
00:10:23.280 | saying, "Hey, what's your best example
00:10:25.200 | "of an actual deploy reinforcement learning application?"
00:10:27.640 | And among senior machine learning researchers.
00:10:31.400 | And again, there are some emerging ones,
00:10:33.800 | but there are not that many great examples.
00:10:37.640 | - I think you're absolutely right.
00:10:40.560 | The sad thing is there hasn't been a big application,
00:10:44.280 | impactful real-world application reinforcement learning.
00:10:47.200 | I think its biggest impact to me has been in the toy domain,
00:10:51.720 | in the game domain, in the small example.
00:10:53.680 | That's what I mean for educational purpose,
00:10:55.920 | it seems to be a fun thing to explore neural networks with.
00:10:59.160 | But I think from your perspective,
00:11:01.440 | and I think that might be the best perspective,
00:11:04.280 | is if you're trying to educate with a simple example
00:11:07.120 | in order to illustrate how this can actually be grown
00:11:10.720 | to scale and have a real world impact,
00:11:14.040 | then perhaps focusing on the fundamentals
00:11:16.080 | of supervised learning in the context of a simple dataset,
00:11:21.080 | even like an MNIST dataset is the right way,
00:11:24.200 | is the right path to take.
00:11:26.280 | I just, the amount of fun I've seen people have
00:11:29.240 | with reinforcement learning has been great,
00:11:30.840 | but not in the applied impact on the real world setting.
00:11:35.200 | So it's a trade-off, how much impact you want to have
00:11:37.800 | versus how much fun you want to have.
00:11:39.720 | - Yeah, that's really cool.
00:11:40.560 | And I feel like the world actually needs all sorts,
00:11:43.640 | even within machine learning,
00:11:44.960 | I feel like deep learning is so exciting,
00:11:48.200 | but the AI team shouldn't just use deep learning.
00:11:50.800 | I find that my teams use a portfolio of tools,
00:11:54.080 | and maybe that's not the exciting thing to say,
00:11:55.840 | but some days we use a neural net,
00:11:58.040 | some days we use PCA,
00:12:02.400 | actually the other day I was sitting down with my team
00:12:03.920 | looking at PCA residuals,
00:12:05.160 | trying to figure out what's going on
00:12:06.240 | with PCA applied to a manufacturing problem.
00:12:08.160 | And some days we use a probabilistic graphical model,
00:12:10.640 | some days we use a knowledge graph,
00:12:12.240 | which is one of the things
00:12:13.080 | that has tremendous industry impact,
00:12:15.440 | but the amount of chatter about knowledge graphs
00:12:18.080 | in academia is really thin
00:12:19.720 | compared to the actual real world impact.
00:12:22.040 | So I think reinforcement learning
00:12:23.840 | should be in that portfolio,
00:12:25.000 | and then it's about balancing
00:12:26.120 | how much we teach all of these things.
00:12:27.680 | And the world should have diverse skills,
00:12:30.240 | it'd be sad if everyone just learned one narrow thing.
00:12:33.880 | - Yeah, the diverse skills
00:12:34.800 | help you discover the right tool for the job.
00:12:37.400 | So if we could return to maybe talk quickly
00:12:41.160 | about the specifics of deeplearning.ai,
00:12:43.880 | the deep learning specialization, perhaps.
00:12:46.680 | How long does it take to complete the course,
00:12:48.480 | would you say?
00:12:49.880 | - The official length of the deep learning specialization
00:12:52.480 | is I think 16 weeks, so about four months,
00:12:56.160 | but it's go at your own pace.
00:12:57.840 | So if you subscribe to the deep learning specialization,
00:13:00.800 | there are people that finish it in less than a month
00:13:02.920 | by working more intensely and studying more intensely.
00:13:05.200 | So it really depends on the individual.
00:13:07.840 | When we created the deep learning specialization,
00:13:10.600 | we wanted to make it very accessible and very affordable.
00:13:15.200 | And with Coursera and deeplearning.ai's education mission,
00:13:18.920 | one of the things that's really important to me
00:13:20.600 | is that if there's someone for whom paying anything
00:13:24.280 | is a financial hardship,
00:13:26.600 | then just apply for financial aid and get it for free.
00:13:29.880 | - If you were to recommend a daily schedule for people
00:13:35.240 | in learning, whether it's through the deeplearning.ai
00:13:37.800 | specialization or just learning
00:13:39.920 | in the world of deep learning, what would you recommend?
00:13:44.040 | How do they go about day-to-day sort of specific advice
00:13:47.400 | about learning, about their journey
00:13:49.800 | in the world of deep learning, machine learning?
00:13:52.000 | - I think getting the habit of learning is key,
00:13:56.280 | and that means regularity.
00:13:58.360 | So for example, we send out our weekly newsletter,
00:14:03.800 | The Batch, every Wednesday.
00:14:05.240 | So people know it's coming Wednesday,
00:14:06.800 | you can spend a little bit of time on Wednesday
00:14:08.800 | catching up on the latest news through The Batch
00:14:11.640 | on Wednesday.
00:14:14.640 | And for myself, I've picked up a habit
00:14:17.160 | of spending some time every Saturday and every Sunday
00:14:20.600 | reading or studying.
00:14:21.800 | And so I don't wake up on a Saturday
00:14:23.840 | and have to make a decision.
00:14:24.920 | Do I feel like reading or studying today or not?
00:14:27.360 | It's just what I do.
00:14:28.880 | And the fact is a habit makes it easier.
00:14:31.440 | So I think if someone can get into that habit,
00:14:34.720 | it's like, you know,
00:14:36.040 | just like we brush our teeth every morning.
00:14:38.320 | I don't think about it.
00:14:39.280 | If I thought about it, it's a little bit annoying
00:14:40.760 | to have to spend two minutes doing that,
00:14:43.200 | but it's a habit that it takes no cognitive load,
00:14:46.400 | but this would be so much harder
00:14:47.640 | if we have to make a decision every morning.
00:14:50.320 | So, and actually that's the reason why
00:14:52.120 | I wear the same thing every day as well.
00:14:53.280 | It's just one less decision.
00:14:54.440 | I just get up and wear my blue shirt.
00:14:56.800 | So, but I think if you can get that habit,
00:14:58.440 | that consistency of studying,
00:15:00.120 | then it actually feels easier.
00:15:02.880 | - So yeah, it's kind of amazing.
00:15:05.560 | In my own life, like I play guitar every day for,
00:15:09.920 | I force myself to at least for five minutes play guitar.
00:15:12.760 | It's a ridiculously short period of time,
00:15:15.400 | but because I've gotten into that habit,
00:15:17.320 | it's incredible what you can accomplish
00:15:19.000 | in a period of a year or two years.
00:15:21.680 | You can become, you know,
00:15:24.560 | exceptionally good at certain aspects of a thing
00:15:26.960 | by just doing it every day for a very short period of time.
00:15:29.280 | It's kind of a miracle that that's how it works.
00:15:31.880 | It adds up over time.
00:15:33.440 | - Yeah, and I think it's often not about
00:15:36.200 | the bursts of effort and the all-nighters,
00:15:39.120 | because you can only do that a limited number of times.
00:15:41.480 | It's the sustained effort over a long time.
00:15:44.480 | I think, you know, reading two research papers
00:15:47.640 | is a nice thing to do,
00:15:49.200 | but the power is not reading two research papers.
00:15:51.480 | It's reading two research papers a week for a year.
00:15:54.760 | Then you've read a hundred papers
00:15:56.120 | and you actually learn a lot when you read a hundred papers.
00:15:59.280 | - So regularity and making learning a habit.
00:16:03.680 | Do you have general other study tips
00:16:07.440 | for particularly deep learning that people should,
00:16:11.240 | in their process of learning,
00:16:12.840 | is there some kind of recommendations
00:16:14.480 | or tips you have as they learn?
00:16:17.600 | - One thing I still do
00:16:19.400 | when I'm trying to study something really deeply
00:16:21.120 | is take handwritten notes.
00:16:23.640 | It varies.
00:16:24.480 | I know there are a lot of people
00:16:25.480 | that take the deep learning courses
00:16:27.160 | during a commute or something
00:16:29.800 | where it may be more awkward to take notes.
00:16:31.600 | So I know it may not work for everyone,
00:16:34.560 | but when I'm taking courses on Coursera,
00:16:37.280 | you know, and I still take some every now and then,
00:16:39.520 | the most recent one I took was a course on clinical trials,
00:16:42.200 | because I was interested about that.
00:16:43.480 | I got out my little Moleskine notebook
00:16:45.720 | and I was sitting at my desk.
00:16:46.720 | I was just taking down notes
00:16:48.240 | of what the instructor was saying.
00:16:49.400 | And that act, we know that that act of taking notes,
00:16:52.640 | preferably handwritten notes, increases retention.
00:16:56.560 | - So as you're sort of watching the video,
00:16:59.560 | just kind of pausing maybe,
00:17:01.040 | and then taking the basic insights down on paper?
00:17:05.040 | - Yeah, so there've been a few studies.
00:17:07.240 | If you search online, you find some of these studies
00:17:09.920 | that taking handwritten notes,
00:17:12.360 | because handwriting is slower, as we were saying just now,
00:17:16.240 | it causes you to recode the knowledge in your own words more
00:17:20.360 | and that process of recoding promotes long-term retention.
00:17:23.960 | This is as opposed to typing, which is fine.
00:17:26.160 | Again, typing is better than nothing,
00:17:27.960 | or in taking a class and not taking notes
00:17:29.680 | is better than not taking any class at all.
00:17:31.560 | But comparing handwritten notes and typing,
00:17:35.240 | you can usually type faster,
00:17:36.760 | for a lot of people, than they can handwrite notes.
00:17:38.640 | And so when people type,
00:17:40.160 | they're more likely to just transcribe verbatim
00:17:42.600 | what they heard,
00:17:43.480 | and that reduces the amount of recoding,
00:17:46.280 | and that actually results in less long-term retention.
00:17:49.640 | - I don't know what the psychological effect there is,
00:17:51.520 | but it's so true.
00:17:52.520 | There's something fundamentally different
00:17:54.080 | about writing, handwriting.
00:17:56.760 | I wonder what that is.
00:17:57.600 | I wonder if it is as simple
00:17:58.800 | as just the time it takes to write is slower.
00:18:01.560 | - Yeah, and because you can't write as many words,
00:18:05.320 | you have to take whatever they said
00:18:07.440 | and summarize it into fewer words.
00:18:09.160 | And that summarization process
00:18:10.640 | requires deeper processing of the meaning,
00:18:13.080 | which then results in better retention.
00:18:15.080 | - That's fascinating.
00:18:16.120 | - Oh, and I've spent,
00:18:18.160 | I think because of Coursera,
00:18:19.640 | I've spent so much time studying pedagogy.
00:18:21.360 | It's actually one of my passions.
00:18:22.480 | I really love learning how to more efficiently
00:18:25.240 | help others learn.
00:18:26.400 | Yeah, one of the things I do both when creating videos
00:18:30.800 | or when we write the batch is,
00:18:32.520 | I try to think, is one minute spent with us
00:18:36.400 | going to be a more efficient learning experience
00:18:39.200 | than one minute spent anywhere else?
00:18:41.120 | And we really try to make it time efficient
00:18:44.920 | for the learners, 'cause everyone's busy.
00:18:47.360 | So when we're editing, I often tell my teams,
00:18:50.680 | every word needs to fight for its life.
00:18:52.480 | And if you can delete a word,
00:18:53.400 | let's just delete it and not wait.
00:18:54.880 | Let's not waste the learners' time.
00:18:57.160 | - Oh, it's so amazing that you think that way,
00:18:59.360 | 'cause there is millions of people
00:19:00.800 | that are impacted by your teaching.
00:19:01.960 | And sort of that one minute spent
00:19:03.920 | has a ripple effect, right?
00:19:05.520 | Through years of time,
00:19:06.800 | which is just fascinating to think about.
00:19:09.760 | How does one make a career
00:19:11.440 | out of an interest in deep learning?
00:19:13.120 | Do you have advice for people?
00:19:15.880 | We just talked about sort of the beginning, early steps,
00:19:18.560 | but if you want to make it an entire life's journey,
00:19:21.520 | or at least a journey of a decade or two,
00:19:23.960 | how do you do it?
00:19:25.880 | - So most important thing is to get started.
00:19:28.320 | - Right, of course.
00:19:29.160 | - And I think in the early parts of a career,
00:19:32.640 | coursework, like the deep learning specialization,
00:19:36.880 | is a very efficient way to master this material.
00:19:41.000 | So, because instructors, be it me or someone else,
00:19:46.000 | or Laurence Moroney, who teaches our TensorFlow specialization,
00:19:48.960 | and other things we're working on,
00:19:50.520 | spend effort to try to make it time efficient
00:19:53.320 | for you to learn a new concept.
00:19:55.200 | So coursework is actually a very efficient way
00:19:58.200 | for people to learn concepts
00:19:59.880 | at the beginning parts of breaking into a new field.
00:20:02.560 | In fact, one thing I see at Stanford,
00:20:06.080 | some of my PhD students want to jump
00:20:07.880 | into research right away,
00:20:09.000 | and I actually tend to say,
00:20:10.120 | look, in your first couple of years as a PhD student,
00:20:12.600 | spend time taking courses,
00:20:14.320 | because it lays the foundation.
00:20:15.520 | It's fine if you're less productive
00:20:17.240 | in your first couple of years.
00:20:18.240 | You'll be better off in the long term.
00:20:20.960 | Beyond a certain point,
00:20:22.120 | there's material that doesn't exist in courses,
00:20:24.960 | because it's too cutting edge,
00:20:26.080 | the course hasn't been created yet,
00:20:27.360 | there's some practical experience
00:20:28.600 | that we're not yet that good at teaching in a course.
00:20:31.800 | And I think after exhausting the efficient coursework,
00:20:34.920 | then most people need to go on
00:20:37.560 | to either ideally work on projects,
00:20:41.760 | and then maybe also continue their learning
00:20:44.360 | by reading blog posts and research papers
00:20:46.840 | and things like that.
00:20:48.280 | Doing projects is really important.
00:20:50.160 | And again, I think it's important
00:20:52.880 | to start small and just do something.
00:20:55.480 | Today you read about deep learning,
00:20:56.640 | if you say, oh, all these people
00:20:57.560 | are doing such exciting things,
00:20:58.920 | what if I'm not building a neural network
00:21:00.720 | that changes the world,
00:21:01.560 | then what's the point?
00:21:02.400 | Well, the point is sometimes building
00:21:04.200 | that tiny neural network,
00:21:05.880 | be it MNIST or upgrade to a fashion MNIST,
00:21:09.000 | to whatever, doing your own fun hobby project.
00:21:12.480 | That's how you gain the skills
00:21:13.800 | to let you do bigger and bigger projects.
00:21:16.040 | I find this to be true at the individual level
00:21:18.360 | and also at the organizational level.
00:21:20.880 | For a company to become good at machine learning,
00:21:22.640 | sometimes the right thing to do
00:21:23.960 | is not to tackle the giant project,
00:21:26.960 | but instead to do the small project
00:21:29.040 | that lets the organization learn
00:21:31.120 | and then build up from there.
00:21:32.400 | But this is true both for individuals
00:21:33.840 | and for companies.
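
For anyone starting that kind of tiny hobby project, here is a minimal sketch of an MNIST starter network; it assumes TensorFlow with Keras is installed (the dataset downloads on first use), the layer sizes and epoch count are arbitrary starting points, and fashion-MNIST can be swapped in via tf.keras.datasets.fashion_mnist.

```python
# A tiny starter project: train a small neural network on MNIST.
import tensorflow as tf

# Load MNIST (or swap in fashion_mnist for a slightly harder variant).
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),                        # 28x28 image -> 784 vector
    tf.keras.layers.Dense(128, activation="relu"),    # one hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 digit classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=3, validation_split=0.1)
print(model.evaluate(x_test, y_test, verbose=0))      # [test loss, test accuracy]
```
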
00:21:37.040 | - Taking the first step
00:21:38.520 | and then taking small steps is the key.
00:21:42.360 | Should students pursue a PhD, do you think?
00:21:44.960 | You can do so much.
00:21:46.400 | That's one of the fascinating things
00:21:47.800 | in machine learning.
00:21:48.640 | You can have so much impact
00:21:49.800 | without ever getting a PhD.
00:21:51.960 | So what are your thoughts?
00:21:53.520 | Should people go to grad school?
00:21:54.880 | Should people get a PhD?
00:21:56.920 | - I think that there are multiple good options
00:21:59.200 | of which doing a PhD could be one of them.
00:22:02.520 | I think that if someone's admitted
00:22:04.400 | to a top PhD program,
00:22:06.480 | at MIT, Stanford, top schools,
00:22:09.440 | I think that's a very good experience.
00:22:12.840 | Or if someone gets a job at a top organization,
00:22:16.320 | at a top AI team,
00:22:17.680 | I think that's also a very good experience.
00:22:20.360 | There are some things you still need a PhD to do.
00:22:23.120 | If someone's aspiration is to be a professor
00:22:25.120 | at a top academic university,
00:22:26.320 | you just need a PhD to do that.
00:22:28.280 | But if the goal is to start a company,
00:22:30.520 | build a company, do great technical work,
00:22:32.560 | I think a PhD is a good experience.
00:22:34.880 | But I would look at the different options
00:22:37.480 | available to someone.
00:22:38.680 | Where are the places where you can get a job?
00:22:40.240 | Where are the places where you can get in a PhD program?
00:22:42.200 | And kind of weigh the pros and cons of those.
00:22:44.800 | - So just to linger on that for a little bit longer,
00:22:47.200 | what final dreams and goals
00:22:48.960 | do you think people should have?
00:22:50.160 | So what options should they explore?
00:22:54.560 | So you can work in industry.
00:22:56.960 | So for a large company,
00:22:59.240 | like Google, Facebook, Baidu,
00:23:00.720 | all these large companies
00:23:03.280 | that already have huge teams of machine learning engineers.
00:23:06.440 | You can also do within industry,
00:23:08.160 | sort of more research groups
00:23:09.480 | that kind of like Google Research, Google Brain.
00:23:12.360 | Then you can also do, like we said,
00:23:14.520 | a professor in academia.
00:23:17.520 | And what else?
00:23:19.080 | Oh, you can build your own company.
00:23:21.080 | You can do a startup.
00:23:22.280 | Is there anything that stands out between those options
00:23:25.720 | or are they all beautiful different journeys
00:23:28.000 | that people should consider?
00:23:29.880 | - I think the thing that affects your experience more
00:23:31.960 | is less are you in this company versus that company
00:23:35.280 | or academia versus industry.
00:23:37.280 | I think the thing that affects your experience most
00:23:38.800 | is who are the people you're interacting with
00:23:40.880 | in a daily basis.
00:23:42.640 | So even if you look at some of the large companies,
00:23:46.680 | the experience of individuals in different teams
00:23:49.040 | is very different.
00:23:50.200 | And what matters most is not the logo above the door
00:23:53.360 | when you walk into the giant building every day.
00:23:55.600 | What matters the most is who are the 10 people,
00:23:57.760 | the 30 people you interact with every day.
00:24:00.320 | So I actually tend to advise people,
00:24:02.040 | if you get a job from a company,
00:24:04.600 | ask who is your manager, who are your peers,
00:24:07.360 | who are you actually going to talk to?
00:24:08.520 | We're all social creatures.
00:24:09.600 | We tend to become more like the people around us.
00:24:12.520 | And if you're working with great people,
00:24:14.640 | you will learn faster.
00:24:16.240 | Or if you get admitted,
00:24:17.680 | if you get a job at a great company or a great university,
00:24:21.280 | maybe the logo you walk in is great,
00:24:23.880 | but you're actually stuck on some team
00:24:25.360 | doing work that doesn't really excite you.
00:24:28.320 | And then that's actually a really bad experience.
00:24:30.840 | So this is true both for universities
00:24:33.400 | and for large companies.
00:24:35.160 | For small companies, you can kind of figure out
00:24:36.880 | who you'll be working with quite quickly.
00:24:39.000 | And I tend to advise people,
00:24:40.880 | if a company refuses to tell you who you will work with,
00:24:43.800 | someone will say, "Oh, join us.
00:24:44.840 | "There's a rotation system.
00:24:45.680 | "We'll figure it out."
00:24:46.680 | I think that that's a worrying answer
00:24:48.720 | because it means
00:24:53.160 | you may not actually get to a team
00:24:55.360 | with great peers and great people to work with.
00:24:57.920 | - It's actually a really profound advice
00:24:59.680 | that we kind of sometimes sweep.
00:25:02.160 | We don't consider too rigorously or carefully.
00:25:05.720 | The people around you are really often,
00:25:08.520 | especially when you accomplish great things,
00:25:10.160 | it seems the great things are accomplished
00:25:11.720 | because of the people around you.
00:25:13.840 | So it's not about whether you learn this thing or that thing,
00:25:18.840 | or like you said, the logo that hangs up top,
00:25:22.160 | it's the people.
00:25:23.000 | That's a fascinating,
00:25:24.560 | and it's such a hard search process
00:25:26.480 | of finding, just like finding the right friends
00:25:31.240 | and somebody to get married with and that kind of thing.
00:25:34.560 | It's a very hard search, it's a people search problem.
00:25:38.000 | - Yeah, I think when someone interviews,
00:25:40.760 | at a university or the research lab
00:25:42.400 | or the large corporation,
00:25:44.000 | it's good to insist on just asking,
00:25:46.440 | "Who are the people?
00:25:47.280 | "Who is my manager?"
00:25:48.440 | And if you refuse to tell me,
00:25:49.640 | I'm gonna think,
00:25:50.880 | "Well, maybe that's 'cause you don't have a good answer."
00:25:52.760 | "It may not be someone I like."
00:25:54.480 | - And if you don't particularly connect,
00:25:56.640 | if something feels off with the people,
00:25:59.560 | then don't stick to it.
00:26:03.480 | That's a really important signal to consider.
00:26:05.800 | - Yeah, yeah.
00:26:06.720 | And actually, in my Stanford class, CS230,
00:26:10.560 | as well as an ACM talk,
00:24:11.760 | I think I gave like an hour-long talk on career advice,
00:26:15.440 | including on the job search process and some of these.
00:26:18.200 | So you can find those videos online.
00:26:20.440 | - Awesome, and I'll point them.
00:26:22.280 | I'll point people to them.
00:26:23.720 | Beautiful.