
Andrew Ng: Advice on Getting Started in Deep Learning | AI Podcast Clips


Chapters

0:00
1:17 How Does One Get Started in Deep Learning
2:00 Prerequisites for Somebody To Take the Deep Learning Specialization
6:54 What Concepts in Deep Learning Do You Think Students Struggle the Most With
7:12 Challenges of Deep Learning
12:45 How Long Does It Take To Complete the Course
19:08 How Does One Make a Career out of an Interest in Deep Learning
21:41 Should Students Pursue a PhD

Transcript

So let's perhaps talk about each of these areas. First, deeplearning.ai. How, the basic question, how does a person interested in deep learning get started in the field? - Deeplearning.ai is working to create courses to help people break into AI. So my machine learning course that I taught through Stanford is one of the most popular courses on Coursera.

To this day, it's probably one of the courses, sort of, if I ask somebody, how did you get into machine learning or how did you fall in love with machine learning or what got you interested, it always goes back to Andrew Ng at some point. The amount of people you've influenced is ridiculous.

So for that, I'm sure I speak for a lot of people when I say a big thank you. - No, yeah, thank you. You know, I was once reading a news article, I think it was Tech Review, and I'm gonna mess up the statistic, but I remember reading an article that said something like one third of all programmers are self-taught.

I may have the number one third wrong, maybe it was two thirds. But when I read that article, I thought, this doesn't make sense. Everyone is self-taught. 'Cause you teach yourself, I don't teach people. I just- - That's well put. So yeah, so how does one get started in deep learning, and where does deeplearning.ai fit into that?

- So the deep learning specialization offered by deeplearning.ai is, I think it was, Coursera's top specialization. It might still be. So it's a very popular way for people to take that specialization, to learn about everything from neural networks to how to tune a neural network. So what does a ConvNet do?

What is an RNN or a sequence model, or what is an attention model? And so the deep learning specialization steps everyone through those algorithms, so you deeply understand them and can implement them and use them for whatever applications. - From the very beginning. So what would you say are the prerequisites for somebody to take the deep learning specialization, in terms of maybe math or programming background?

- Yeah, you need to understand basic programming, since there are programming exercises in Python. And the math prereq is quite basic. So no calculus is needed. If you know calculus, it's great, you get better intuitions. But we deliberately try to teach that specialization without requiring calculus. So I think high school math would be sufficient.

If you know how to multiply two matrices, I think that's great. - So a little basic linear algebra is great. - Basic linear algebra, even very, very basic linear algebra, and some programming. I think that people that have done the machine learning course will find the deep learning specialization a bit easier.

But it's also possible to jump into the deep learning specialization directly, but it'll be a little bit harder, since we tend to go over concepts faster, like how gradient descent works and what the objective function is, which are covered more slowly in the machine learning course. - Could you briefly mention some of the key concepts in deep learning that students should learn, that you envision them learning in the first few months, in the first year or so?
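The gradient descent and objective function ideas mentioned above can be sketched in a few lines. This is a minimal illustration, not material from the course: the quadratic objective, starting point, and learning rate are arbitrary choices for the sake of the example.

```python
# Minimize the objective J(w) = (w - 3)^2 with plain gradient descent.
# Its gradient is dJ/dw = 2 * (w - 3), so repeatedly stepping against
# the gradient moves w toward the minimum at w = 3.
w = 0.0      # initial parameter guess
lr = 0.1     # learning rate (step size)
for _ in range(100):
    grad = 2 * (w - 3.0)  # gradient of the objective at the current w
    w -= lr * grad        # step downhill
print(round(w, 4))        # converges to the minimum at w = 3
```

The same loop, with the gradient over millions of parameters computed by backpropagation, is essentially what training a neural network amounts to.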

- So if you take the deep learning specialization, you learn the foundations of what is a neural network? How do you build up a neural network from a single logistic unit, to a stack of layers, to different activation functions? You learn how to train the neural networks. One thing I'm very proud of in that specialization is we go through a lot of practical know-how of how to actually make these things work.
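The progression described here, from a single logistic unit to a stack of layers with different activation functions, can be sketched as a bare forward pass. This is purely illustrative: the weights are random and untrained, and the shapes are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A single "logistic unit": a weighted sum of inputs passed through a sigmoid.
def logistic_unit(x, w, b):
    return sigmoid(x @ w + b)

# A stack of layers: each layer applies an affine map, then an activation,
# to the previous layer's output.
def forward(x, layers):
    a = x
    for w, b, act in layers:
        a = act(a @ w + b)
    return a

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))  # 4 examples, 3 features
layers = [
    (rng.normal(size=(3, 5)), np.zeros(5), lambda z: np.maximum(z, 0)),  # ReLU hidden layer
    (rng.normal(size=(5, 1)), np.zeros(1), sigmoid),                     # sigmoid output layer
]
out = forward(x, layers)
print(out.shape)  # (4, 1): one probability-like output per example
```

Swapping the activation functions, or adding more `(w, b, act)` triples, is all it takes to change the architecture in this toy setup.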

So what are the differences between different optimization algorithms? What do you do if the algorithm overfits? So how do you tell if the algorithm is overfitting? When do you collect more data? When should you not bother to collect more data? I find that even today, unfortunately, there are engineers that will spend six months trying to pursue a particular direction, such as collect more data, because we heard more data is valuable.

But sometimes you could run some tests and could have figured out six months earlier that, for this particular problem, collecting more data isn't going to cut it. So just don't spend six months collecting more data; spend your time modifying the architecture or trying something else. So we go through a lot of the practical know-how, so that when you take the deep learning specialization, you have those skills to be very efficient in how you build these networks.
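The "should I collect more data?" test alluded to here is often framed as a bias/variance check on training versus validation error. A minimal sketch, where the function name, the error values, and the gap threshold are all illustrative assumptions rather than anything from the course:

```python
def diagnose(train_err, val_err, target_err, gap=0.05):
    """Rough triage for a learning problem (thresholds are illustrative)."""
    if train_err > target_err:
        # High bias: the model can't even fit the data it already has, so
        # more data alone won't help; change the model or train longer.
        return "more data unlikely to help; change the model"
    if val_err - train_err > gap:
        # High variance: fits training data but not held-out data; more
        # data or regularization is likely to help.
        return "collect more data or regularize"
    return "close to target; do error analysis and iterate"

# Low training error but a large train/validation gap: classic overfitting.
print(diagnose(train_err=0.02, val_err=0.15, target_err=0.05))
```

Running a quick check like this before committing to a six-month data-collection effort is exactly the kind of practical know-how being described.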

- So dive right in to play with the network, to train it, to do the inference on a particular dataset, to build the intuition about it, without building it up too big, to where you spend, like you said, six months building up your big project without building any intuition from a small aspect of the data that could already tell you everything you need to know about that data.

- Yes, and also the systematic frameworks of thinking for how to go about building practical machine learning. Maybe to make an analogy, when we learn to code, we have to learn the syntax of some programming language, right, be it Python or C++ or Octave or whatever. But the equally important or maybe even more important part of coding is to understand how to string together these lines of code into coherent things.

So, you know, when should you put something in a function? When should you not? How do you think about abstraction? So those frameworks are what make a programmer efficient, even more than understanding the syntax. I remember when I was an undergrad at Carnegie Mellon, one of my friends would debug their code by first trying to compile it, and it was C++ code.

And then, for every line that has a syntax error, they want to get rid of the syntax errors as quickly as possible. So how do you do that? Well, they would delete every single line of code with a syntax error. So, really efficient for getting rid of syntax errors, but a horrible way to debug.

So I think, so we learn how to debug. And I think in machine learning, the way you debug a machine learning program is very different than the way you, you know, like do binary search or whatever, or use a debugger, like trace through the code in traditional software engineering.

So it's an evolving discipline, but I find that the people that are really good at debugging machine learning algorithms are easily 10X, maybe 100X faster at getting something to work. So- - And the basic process of debugging is, so the bug in this case, why isn't this thing learning, improving, sort of going into the questions of overfitting and all those kinds of things.

That's the logical space that the debugging is happening in with neural networks. - Yeah, often the question is, why doesn't it work yet? Or can I expect this to eventually work? And what are the things I could try? Change the architecture, more data, more regularization, different optimization algorithm, you know, different types of data.

So to answer those questions systematically, so that you don't spend six months heading down a blind alley before someone comes and says, why did you spend six months doing this? - What concepts in deep learning do you think students struggle the most with?

Or sort of is the biggest challenge for them once they get over that hill? It's, it hooks them and it inspires them and they really get it. - Similar to learning mathematics, I think one of the challenges of deep learning is that there are a lot of concepts that build on top of each other.

If you ask me what's hard about mathematics, I have a hard time pinpointing one thing. Is it addition, subtraction? Is it a carry? Is it multiplication? There's just a lot of stuff. I think one of the challenges of learning math and of learning certain technical fields is that there are a lot of concepts.

And if you miss a concept, then you're kind of missing the prerequisite for something that comes later. So in the deep learning specialization, we try to break down the concepts to maximize the odds of each component being understandable. So when you move on to the more advanced thing, when we learn ConvNets, hopefully you have enough intuitions from the earlier sections to then understand why we structure ConvNets in a certain way.

And then eventually why we build RNNs and LSTMs or attention models in a certain way, building on top of the earlier concepts. - Actually, I'm curious, you do a lot of teaching as well. Do you have a favorite this-is-the-hard-concept moment in your teaching? - Well, I don't think anyone's ever turned the interview on me.

- I'm glad to be first. - I think that's a really good question. Yeah, it's really hard to capture the moment when they struggle. I think you put it really eloquently. I do think there are moments, like aha moments, that really inspire people. I think for some reason, reinforcement learning, especially deep reinforcement learning, is a really great way to really inspire people and show what neural networks can do.

Even though neural networks really are just a part of the deep RL framework, it's a really nice way to paint the entirety of the picture of a neural network being able to learn from scratch, knowing nothing, and explore the world and pick up lessons. I find that a lot of the aha moments happen when you use deep RL to teach people about neural networks, which is counterintuitive.

I find a lot of that inspired sort of fire in people's passion, in people's eyes, comes from the RL world. Do you find reinforcement learning to be a useful part of the teaching process, or no? - I still teach reinforcement learning in one of my Stanford classes, and my PhD thesis was on reinforcement learning.

So I clearly love the field. I find that if I'm trying to teach students the most useful techniques for them to use today, I end up shrinking the amount of time I talk about reinforcement learning. It's not what's working today. Now our world changes so fast. Maybe it'll be totally different in a couple of years, but I think we need a couple more things for reinforcement learning to get there.

- To actually get there, yeah. - One of my teams is looking at reinforcement learning for some robotic control tasks, so I see the applications. But if you look at it as a percentage of all of the impact of the types of things we do, it is, at least today, outside of playing video games and a few other games, quite small in scope.

Actually at NeurIPS, a bunch of us were standing around saying, "Hey, what's your best example of an actual deployed reinforcement learning application?" And this was among senior machine learning researchers. And again, there are some emerging ones, but there are not that many great examples. - I think you're absolutely right. The sad thing is there hasn't been a big, impactful real-world application of reinforcement learning.

I think its biggest impact to me has been in the toy domain, in the game domain, in small examples. That's why, for educational purposes, it seems to be a fun thing to explore neural networks with. But I think from your perspective, and I think that might be the best perspective, is if you're trying to educate with a simple example in order to illustrate how this can actually be grown to scale and have a real-world impact, then perhaps focusing on the fundamentals of supervised learning in the context of a simple dataset, even like an MNIST dataset, is the right path to take.

I just, the amount of fun I've seen people have with reinforcement learning has been great, but not in the applied impact on the real world setting. So it's a trade-off, how much impact you want to have versus how much fun you want to have. - Yeah, that's really cool.

And I feel like the world actually needs all sorts. Even within machine learning, I feel like deep learning is so exciting, but the AI team shouldn't just use deep learning. I find that my teams use a portfolio of tools, and maybe that's not the exciting thing to say, but some days we use a neural net, some days we use PCA. Actually, the other day I was sitting down with my team looking at PCA residuals, trying to figure out what's going on with PCA applied to a manufacturing problem.

And some days we use a probabilistic graphical model, some days we use a knowledge graph, which is one of the things that has tremendous industry impact, but the amount of chatter about knowledge graphs in academia is really thin compared to the actual real-world impact. So I think reinforcement learning should be in that portfolio, and then it's about balancing how much we teach all of these things.

And the world should have diverse skills; it'd be sad if everyone just learned one narrow thing. - Yeah, diverse skills help you discover the right tool for the job. So if we could return to maybe talk quickly about the specifics of deeplearning.ai, the deep learning specialization, perhaps. How long does it take to complete the course, would you say?

- The official length of the deep learning specialization is I think 16 weeks, so about four months, but it's go at your own pace. So if you subscribe to the deep learning specialization, there are people that finish it in less than a month by working more intensely and studying more intensely.

So it really depends on the individual. When we created the deep learning specialization, we wanted to make it very accessible and very affordable. And with Coursera and deeplearning.ai's education mission, one of the things that's really important to me is that if there's someone for whom paying anything is a financial hardship, then just apply for financial aid and get it for free.

- If you were to recommend a daily schedule for people in learning, whether it's through the deeplearning.ai specialization or just learning in the world of deep learning, what would you recommend? How do they go about day-to-day sort of specific advice about learning, about their journey in the world of deep learning, machine learning?

- I think getting the habit of learning is key, and that means regularity. So for example, we send out our weekly newsletter, The Batch, every Wednesday. So people know it's coming Wednesday, you can spend a little bit of time on Wednesday catching up on the latest news through The Batch on Wednesday.

And for myself, I've picked up a habit of spending some time every Saturday and every Sunday reading or studying. And so I don't wake up on a Saturday and have to make a decision. Do I feel like reading or studying today or not? It's just what I do. And the fact is a habit makes it easier.

So I think if someone can get into that habit, it's like, you know, just like we brush our teeth every morning. I don't think about it. If I thought about it, it's a little bit annoying to have to spend two minutes doing that, but it's a habit that takes no cognitive load, and this would be so much harder if we had to make a decision every morning.

And actually, that's the reason why I wear the same thing every day as well. It's just one less decision; I just get up and wear my blue shirt. But I think if you can get that habit, that consistency of studying, then it actually feels easier. - So yeah, it's kind of amazing.

In my own life, like I play guitar every day for, I force myself to at least for five minutes play guitar. It's a ridiculously short period of time, but because I've gotten into that habit, it's incredible what you can accomplish in a period of a year or two years.

You can become, you know, exceptionally good at certain aspects of a thing by just doing it every day for a very short period of time. It's kind of a miracle that that's how it works. It adds up over time. - Yeah, and I think it's often not about the bursts of effort and the all-nighters, because you can only do that a limited number of times.

It's the sustained effort over a long time. I think, you know, reading two research papers is a nice thing to do, but the power is not reading two research papers. It's reading two research papers a week for a year. Then you've read a hundred papers and you actually learn a lot when you read a hundred papers.

- So regularity and making learning a habit. Do you have general other study tips for particularly deep learning that people should, in their process of learning, is there some kind of recommendations or tips you have as they learn? - One thing I still do when I'm trying to study something really deeply is take handwritten notes.

It varies. I know there are a lot of people that take the deep learning courses during a commute or something, where it may be more awkward to take notes. So I know it may not work for everyone, but when I'm taking courses on Coursera, you know, and I still take some every now and then, the most recent one I took was a course on clinical trials, because I was interested in that.

I got out my little Moleskine notebook, and I was sitting at my desk, just taking down notes of what the instructor was saying. And we know that that act of taking notes, preferably handwritten notes, increases retention. - So as you're sort of watching the video, just kind of pausing maybe, and then taking the basic insights down on paper?

- Yeah, so there've been a few studies. If you search online, you'll find some of these studies saying that taking handwritten notes, because handwriting is slower, as we were saying just now, causes you to recode the knowledge in your own words more, and that process of recoding promotes long-term retention.

This is as opposed to typing, which is fine. Again, typing is better than nothing, and taking a class and not taking notes is better than not taking any class at all. But comparing handwritten notes and typing: for a lot of people, you can usually type faster than you can handwrite notes.

And so when people type, they're more likely to just transcribe verbatim what they heard, and that reduces the amount of recoding, and that actually results in less long-term retention. - I don't know what the psychological effect there is, but it's so true. There's something fundamentally different about writing, handwriting.

I wonder what that is. I wonder if it is as simple as just the time it takes to write is slower. - Yeah, and because you can't write as many words, you have to take whatever they said and summarize it into fewer words. And that summarization process requires deeper processing of the meaning, which then results in better retention.

- That's fascinating. - Oh, and I've spent, I think because of Coursera, I've spent so much time studying pedagogy. It's actually one of my passions. I really love learning how to more efficiently help others learn. Yeah, one of the things I do, both when creating videos or when we write The Batch, is I try to think, is one minute spent with us going to be a more efficient learning experience than one minute spent anywhere else?

And we really try to make it time efficient for the learners, 'cause everyone's busy. So when we're editing, I often tell my teams, every word needs to fight for its life. And if you can delete a word, let's just delete it and not wait. Let's not waste the learners' time.

- Oh, it's so amazing that you think that way, 'cause there is millions of people that are impacted by your teaching. And sort of that one minute spent has a ripple effect, right? Through years of time, which is just fascinating to think about. How does one make a career out of an interest in deep learning?

Do you have advice for people? We just talked about sort of the beginning, early steps, but if you want to make it an entire life's journey, or at least a journey of a decade or two, how do you do it? - So most important thing is to get started.

- Right, of course. - And I think in the early parts of a career, coursework, like the deep learning specialization, is a very efficient way to master this material. Because instructors, be it me or someone else, or Laurence Moroney, who teaches our TensorFlow specialization, and other things we're working on, spend effort to try to make it time-efficient for you to learn a new concept.

So coursework is actually a very efficient way for people to learn concepts at the beginning parts of breaking into a new field. In fact, one thing I see at Stanford, some of my PhD students want to jump into research right away, and I actually tend to say, look, in your first couple of years as a PhD student, spend time taking courses, because it lays the foundation.

It's fine if you're less productive in your first couple of years. You'll be better off in the long term. Beyond a certain point, there's material that doesn't exist in courses, because it's too cutting-edge, the course hasn't been created yet, and there's some practical experience that we're not yet that good at teaching in a course.

And I think after exhausting the efficient coursework, then most people need to go on to either ideally work on projects, and then maybe also continue their learning by reading blog posts and research papers and things like that. Doing projects is really important. And again, I think it's important to start small and just do something.

Today you read about deep learning, and you might say, oh, all these people are doing such exciting things; what if I'm not building a neural network that changes the world, then what's the point? Well, the point is, sometimes building that tiny neural network, be it on MNIST or an upgrade to Fashion-MNIST or whatever, doing your own fun hobby project.

That's how you gain the skills to let you do bigger and bigger projects. I find this to be true at the individual level and also at the organizational level. For a company to become good at machine learning, sometimes the right thing to do is not to tackle the giant project, but instead to do the small project that lets the organization learn, and then build up from there.

But this is true both for individuals and for companies. - Taking the first step and then taking small steps is the key. Should students pursue a PhD, do you think? You can do so much. That's one of the fascinating things in machine learning. You can have so much impact without ever getting a PhD.

So what are your thoughts? Should people go to grad school? Should people get a PhD? - I think that there are multiple good options of which doing a PhD could be one of them. I think that if someone's admitted to a top PhD program, at MIT, Stanford, top schools, I think that's a very good experience.

Or if someone gets a job at a top organization, at a top AI team, I think that's also a very good experience. There are some things you still need a PhD to do. If someone's aspiration is to be a professor at a top academic university, you just need a PhD to do that.

But if your goal is to start a company, build a company, do great technical work, I think a PhD is a good experience. But I would look at the different options available to someone. Where are the places where you can get a job? Where are the places where you can get into a PhD program?

And kind of weigh the pros and cons of those. - So just to linger on that for a little bit longer, what final dreams and goals do you think people should have? So what options should they explore? You can work in industry, for a large company, like Google, Facebook, Baidu, all these large companies that already have huge teams of machine learning engineers.

You can also go, within industry, to more research-oriented groups, kind of like Google Research, Google Brain. Then you can also, like we said, become a professor in academia. And what else? Oh, you can build your own company, you can do a startup. Is there anything that stands out between those options, or are they all beautiful different journeys that people should consider?

- I think the thing that affects your experience is less whether you're in this company versus that company, or academia versus industry. I think the thing that affects your experience most is who are the people you're interacting with on a daily basis. So even if you look at some of the large companies, the experience of individuals in different teams is very different.

And what matters most is not the logo above the door when you walk into the giant building every day. What matters the most is who are the 10 people, the 30 people you interact with every day. So I actually tend to advise people, if you get a job from a company, ask who is your manager, who are your peers, who are you actually going to talk to?

We're all social creatures. We tend to become more like the people around us. And if you're working with great people, you will learn faster. Or if you get admitted, if you get a job at a great company or a great university, maybe the logo you walk in under is great, but you're actually stuck on some team doing work that doesn't really excite you.

And then that's actually a really bad experience. So this is true both for universities and for large companies. For small companies, you can kind of figure out who you'll be working with quite quickly. And I tend to advise people, if a company refuses to tell you who you will work with, someone will say, "Oh, join us.

"There's a rotation system. We'll figure it out." I think that that's a worrying answer, because it means you may not actually end up on a team with great peers and great people to work with. - It's actually really profound advice that we kind of sometimes sweep aside.

We don't consider it too rigorously or carefully. The people around you matter; often, especially when you accomplish great things, it seems the great things are accomplished because of the people around you. So it's not about whether you learn this thing or that thing, or, like you said, the logo that hangs up top. It's the people.

That's fascinating, and it's such a hard search process, just like finding the right friends and somebody to get married with, that kind of thing. It's a very hard search; it's a people search problem. - Yeah, I think when someone interviews at a university or a research lab or a large corporation, it's good to insist on just asking, "Who are the people?

Who is my manager?" And if you refuse to tell me, I'm gonna think, "Well, maybe that's 'cause you don't have a good answer. It may not be someone I like." - And if you don't particularly connect, if something feels off with the people, then don't stick to it. That's a really important signal to consider.

- Yeah, yeah. And actually, in my Stanford class, CS230, as well as in an ACM talk, I gave like an hour-long talk on career advice, including on the job search process and some of these things. So you can find those videos online. - Awesome, and I'll point people to them.

Beautiful.