
Andrew Ng: Deep Learning, Education, and Real-World AI | Lex Fridman Podcast #73


Chapters

0:00 Introduction
2:23 First few steps in AI
5:05 Early days of online education
16:07 Teaching on a whiteboard
17:46 Pieter Abbeel and early research at Stanford
23:17 Early days of deep learning
32:55 Quick preview: deeplearning.ai, Landing AI, and AI Fund
33:23 deeplearning.ai: how to get started in deep learning
45:55 Unsupervised learning
49:40 deeplearning.ai (continued)
56:12 Career in deep learning
58:56 Should you get a PhD?
1:03:28 AI Fund - building startups
1:11:14 Landing AI - growing AI efforts in established companies
1:20:44 Artificial general intelligence

Whisper Transcript

00:00:00.000 | The following is a conversation with Andrew Ng,
00:00:03.640 | one of the most impactful educators, researchers,
00:00:06.480 | innovators, and leaders in artificial intelligence
00:00:09.600 | and the technology space in general.
00:00:11.960 | He co-founded Coursera and Google Brain,
00:00:15.320 | launched Deep Learning AI, Landing AI, and the AI Fund,
00:00:19.640 | and was the chief scientist at Baidu.
00:00:23.120 | As a Stanford professor and with Coursera
00:00:25.920 | and Deep Learning AI, he has helped educate
00:00:28.680 | and inspire millions of students, including me.
00:00:32.840 | This is the Artificial Intelligence Podcast.
00:00:36.360 | If you enjoy it, subscribe on YouTube,
00:00:38.560 | give it five stars on Apple Podcasts,
00:00:40.520 | support it on Patreon, or simply connect with me on Twitter
00:00:43.840 | at Lex Fridman, spelled F-R-I-D-M-A-N.
00:00:48.400 | As usual, I'll do one or two minutes of ads now,
00:00:51.320 | and never any ads in the middle
00:00:52.760 | that can break the flow of the conversation.
00:00:54.980 | I hope that works for you
00:00:56.400 | and doesn't hurt the listening experience.
00:00:59.040 | This show is presented by Cash App,
00:01:01.200 | the number one finance app in the App Store.
00:01:03.600 | When you get it, use code LEXPODCAST.
00:01:07.080 | Cash App lets you send money to friends,
00:01:09.360 | buy Bitcoin, and invest in the stock market
00:01:11.860 | with as little as $1.
00:01:13.760 | Brokerage services are provided by Cash App Investing,
00:01:16.760 | a subsidiary of Square, and member SIPC.
00:01:20.800 | Since Cash App allows you to buy Bitcoin,
00:01:23.120 | let me mention that cryptocurrency,
00:01:25.400 | in the context of the history of money, is fascinating.
00:01:28.840 | I recommend "A Scent of Money"
00:01:31.200 | as a great book on this history.
00:01:33.820 | Debits and credits on ledgers started over 30,000 years ago.
00:01:38.540 | The US dollar was created over 200 years ago,
00:01:42.280 | and Bitcoin, the first decentralized cryptocurrency,
00:01:45.480 | released just over 10 years ago.
00:01:48.220 | So given that history, cryptocurrency's still very much
00:01:51.560 | in its early days of development,
00:01:53.680 | but it's still aiming to, and just might,
00:01:56.700 | redefine the nature of money.
00:01:59.800 | So again, if you get Cash App
00:02:01.680 | from the App Store or Google Play,
00:02:03.480 | and use the code LEXPODCAST, you'll get $10,
00:02:07.320 | and Cash App will also donate $10 to FIRST,
00:02:10.180 | one of my favorite organizations
00:02:12.200 | that is helping to advance robotics and STEM education
00:02:15.460 | for young people around the world.
00:02:17.680 | And now, here's my conversation with Andrew Ng.
00:02:23.200 | The courses you taught on machine learning at Stanford,
00:02:25.920 | and later on Coursera that you co-founded,
00:02:29.480 | have educated and inspired millions of people.
00:02:31.880 | So let me ask you, what people or ideas inspired you
00:02:35.080 | to get into computer science and machine learning
00:02:37.160 | when you were young?
00:02:38.160 | When did you first fall in love with the field,
00:02:41.600 | is another way to put it?
00:02:42.840 | - Growing up in Hong Kong and Singapore,
00:02:45.440 | I started learning to code when I was five or six years old.
00:02:50.160 | At that time, I was learning the BASIC programming language,
00:02:53.720 | and I would take these books and they would tell you,
00:02:56.200 | type this program into your computer.
00:02:57.960 | So I typed those programs into my computer.
00:03:00.120 | And as a result of all that typing,
00:03:02.520 | I would get to play these very simple,
00:03:04.680 | shoot-'em-up games that I had implemented
00:03:07.880 | on my little computer.
00:03:09.920 | So I thought it was fascinating as a young kid
00:03:13.000 | that I could write this code,
00:03:15.000 | that was really just copying code from a book
00:03:17.020 | into my computer to then play these cool little video games.
00:03:21.040 | Another moment for me was when I was a teenager
00:03:25.200 | and my father, 'cause he's a doctor,
00:03:27.840 | was reading about expert systems and about neural networks.
00:03:31.360 | So he got me to read some of these books
00:03:32.880 | and I thought it was really cool
00:03:34.800 | that you could write a computer program
00:03:36.240 | that started to exhibit intelligence.
00:03:39.200 | Then I remember doing an internship
00:03:41.680 | while I was in high school, this was in Singapore,
00:03:44.440 | where I remember doing a lot of photocopying
00:03:47.660 | and I was an office assistant.
00:03:50.380 | And the highlight of my job
00:03:51.580 | was when I got to use the Shredder.
00:03:53.820 | So the teenager in me remembers thinking,
00:03:55.980 | boy, this is a lot of photocopying.
00:03:57.820 | If only we could write software, build a robot,
00:03:59.660 | something to automate this,
00:04:01.100 | maybe I could do something else.
00:04:03.020 | So I think a lot of my work since then
00:04:05.480 | has centered on the theme of automation.
00:04:07.660 | Even the way I think about machine learning today,
00:04:10.060 | we're very good at writing learning algorithms
00:04:12.060 | that can automate things that people can do.
00:04:15.080 | Or even launching the first MOOCs,
00:04:16.960 | Massive Open Online Courses, that later led to Coursera,
00:04:20.120 | I was trying to automate what could be automatable
00:04:23.320 | in how I was teaching on campus.
00:04:25.400 | - The process of education,
00:04:26.480 | trying to automate parts of that to,
00:04:30.360 | sort of, have more impact from a single teacher,
00:04:33.400 | a single educator.
00:04:34.760 | - Yeah, I felt, teaching at Stanford,
00:04:37.920 | I was teaching machine learning
00:04:39.160 | to about 400 students a year at the time.
00:04:41.320 | And I found myself filming the exact same video every year,
00:04:46.140 | telling the same jokes in the same room.
00:04:48.840 | And I thought, why am I doing this?
00:04:50.420 | Why don't we just take last year's video
00:04:51.860 | and then I can spend my time
00:04:53.080 | building a deeper relationship with students.
00:04:55.340 | So that process of thinking through how to do that,
00:04:58.020 | that led to the first MOOCs that we launched.
00:05:00.800 | - And then you have more time to write new jokes.
00:05:05.420 | Are there favorite memories from your early days
00:05:07.540 | at Stanford teaching thousands of people in person
00:05:10.100 | and then millions of people online?
00:05:13.220 | - You know, teaching online,
00:05:17.020 | what not many people know was that
00:05:20.340 | a lot of those videos were shot
00:05:22.500 | between the hours of 10 PM and 3 AM.
00:05:26.380 | A lot of times, launching the first MOOCs at Stanford,
00:05:31.100 | we'd already announced the course,
00:05:32.260 | about 100,000 people had signed up.
00:05:34.380 | We just started to write the code
00:05:36.520 | and we had not yet actually filmed the videos.
00:05:39.260 | So we had a lot of pressure,
00:05:40.620 | 100,000 people waiting for us to produce the content.
00:05:43.400 | So many Fridays, Saturdays,
00:05:46.220 | I would go out, have dinner with my friends,
00:05:49.260 | and then I would think, okay, do you want to go home now?
00:05:51.580 | Or do you want to go to the office to film videos?
00:05:54.740 | And the thought of being able to help 100,000 people
00:05:57.820 | potentially learn machine learning,
00:05:59.500 | fortunately that made me think,
00:06:01.940 | okay, I'm going to go to my office,
00:06:03.180 | go to my tiny little recording studio.
00:06:05.460 | I would adjust my Logitech webcam,
00:06:07.940 | adjust my Wacom tablet,
00:06:10.620 | make sure my lapel mic was on,
00:06:12.580 | and then I would start recording,
00:06:13.940 | often until 2 AM or 3 AM.
00:06:16.060 | I think unfortunately that doesn't show
00:06:18.460 | that it was recorded that late at night,
00:06:20.500 | but it was really inspiring,
00:06:22.900 | the thought that we could create content
00:06:25.580 | to help so many people learn about machine learning.
00:06:27.940 | - How did that feel?
00:06:29.480 | The fact that you're probably somewhat alone,
00:06:31.540 | maybe a couple of friends,
00:06:33.740 | recording with a Logitech webcam
00:06:36.180 | and kind of going home alone at 1 or 2 AM at night,
00:06:41.060 | and knowing that that's going to reach
00:06:43.260 | sort of thousands of people,
00:06:45.300 | eventually millions of people.
00:06:47.020 | What's that feeling like?
00:06:48.980 | I mean, is there a feeling of just satisfaction
00:06:51.700 | of pushing through?
00:06:54.180 | - I think it's humbling.
00:06:55.180 | And I wasn't thinking about what I was feeling.
00:06:58.020 | I think one thing that I'm proud to say
00:07:00.740 | we got right from the early days
00:07:02.540 | was I told my whole team back then
00:07:05.260 | that the number one priority
00:07:06.500 | is to do what's best for learners,
00:07:08.060 | do what's best for students.
00:07:09.260 | And so when I went to the recording studio,
00:07:11.580 | the only thing on my mind was,
00:07:13.340 | what can I say?
00:07:14.180 | How can I design my slides?
00:07:15.260 | What do I need to draw right
00:07:16.660 | to make these concepts as clear as possible for learners?
00:07:20.620 | I think, you know,
00:07:21.580 | I've seen that sometimes for instructors it's tempting to say,
00:07:24.140 | "Hey, let's talk about my work.
00:07:25.500 | "Maybe if I teach you about my research,
00:07:27.340 | "someone will cite my papers a couple more times."
00:07:30.020 | And I think one of the things we got right,
00:07:31.940 | launching the first few MOOCs
00:07:33.060 | and later building Coursera,
00:07:34.220 | was putting in place that bedrock principle
00:07:37.060 | of let's just do what's best for learners
00:07:38.940 | and forget about everything else.
00:07:40.260 | And I think that that as a guiding principle
00:07:43.260 | turned out to be really important
00:07:44.820 | to the rise of the MOOC movement.
00:07:46.940 | - And the kind of learner you imagined in your mind
00:07:49.340 | is as broad as possible,
00:07:52.700 | as global as possible.
00:07:53.980 | So really try to reach as many people
00:07:56.340 | interested in machine learning and AI as possible.
00:07:59.860 | - I really want to help anyone
00:08:01.060 | that had an interest in machine learning
00:08:03.300 | to break into the field.
00:08:04.660 | And I think sometimes,
00:08:07.060 | I've actually had people ask me,
00:08:08.260 | "Hey, why are you spending so much time
00:08:09.940 | "explaining gradient descent?"
00:08:11.660 | And my answer was,
00:08:13.820 | "If I look at what I think the learner needs
00:08:15.580 | "and what benefit from,
00:08:16.660 | "I felt that having that,
00:08:18.980 | "a good understanding of the foundations,
00:08:20.620 | "kind of back to the basics,
00:08:21.940 | "would put them in a better state
00:08:23.860 | "to then build on a long-term career."
00:08:26.820 | So we've tried to consistently make decisions
00:08:29.620 | on that principle.
00:08:30.620 | - So one of the things you actually revealed
00:08:32.700 | to the narrow AI community at the time
00:08:36.620 | and to the world,
00:08:38.140 | is that the amount of people
00:08:39.660 | who are actually interested in AI
00:08:41.100 | is much larger than we imagined.
00:08:43.540 | By you teaching the class
00:08:44.900 | and how popular it became,
00:08:47.060 | it showed that,
00:08:48.500 | wow, this isn't just a small community
00:08:50.980 | of sort of people who go to NeurIPS
00:08:54.540 | and it's much bigger.
00:08:56.740 | It's developers,
00:08:57.860 | it's people from all over the world.
00:08:59.860 | I mean, I'm Russian,
00:09:00.820 | so everybody in Russia is really interested.
00:09:03.420 | There's a huge number of programmers
00:09:04.860 | who are interested in machine learning.
00:09:06.580 | India, China, South America, everywhere.
00:09:10.820 | There's just millions of people
00:09:12.300 | who are interested in machine learning.
00:09:13.660 | So how big do you get a sense
00:09:15.140 | that the number of people is
00:09:17.060 | that are interested from your perspective?
00:09:20.460 | - I think the number's grown over time.
00:09:22.820 | I think it's one of those things
00:09:23.940 | that maybe it feels like it came out of nowhere,
00:09:26.540 | but for those of us on the inside building it,
00:09:27.980 | it took years.
00:09:28.820 | It's one of those overnight successes
00:09:30.380 | that took years to get there.
00:09:33.220 | My first foray into this type of online education
00:09:36.060 | was when we were filming my Stanford class
00:09:37.940 | and sticking the videos on YouTube
00:09:39.500 | and then some other things.
00:09:40.500 | We had uploaded the whole works and so on,
00:09:42.140 | but it was basically the one-hour,
00:09:44.340 | 15-minute videos that we put on YouTube.
00:09:46.940 | And then we had four or five other versions
00:09:50.020 | of websites that I had built,
00:09:52.140 | most of which you would never have heard of
00:09:53.900 | because they reached small audiences,
00:09:55.900 | but that allowed me to iterate,
00:09:57.580 | allowed my team and me to iterate,
00:09:59.060 | to learn what ideas work and what doesn't.
00:10:02.420 | For example, one of the features
00:10:03.780 | I was really excited about and really proud of
00:10:05.860 | was building this website where multiple people
00:10:08.460 | could be logged into the website at the same time.
00:10:11.340 | So today, if you go to a website,
00:10:13.580 | if you're logged in and then I want to log in,
00:10:15.940 | you need to log out if it's the same browser,
00:10:17.620 | same computer.
00:10:18.940 | But I thought, well, what if two people,
00:10:20.340 | say you and me were watching a video together
00:10:23.260 | in front of a computer?
00:10:24.260 | What if a website could have you type your name
00:10:27.220 | and password, have me type my name and password?
00:10:29.100 | And then now the computer knows
00:10:30.260 | both of us are watching together
00:10:31.860 | and it gives both of us credit
00:10:33.220 | for anything we do as a group.
00:10:35.420 | So we launched this feature, rolled it out
00:10:37.020 | in a school in San Francisco.
00:10:39.740 | We had about 20 something users.
00:10:41.900 | With the teacher there,
00:10:44.020 | at Sacred Heart Cathedral Prep, the teacher was great.
00:10:46.340 | I mean, guess what?
00:10:47.380 | Zero people use this feature.
00:10:49.940 | It turns out people studying online,
00:10:51.940 | they want to watch the videos by themselves.
00:10:54.060 | So you can play back, pause at your own speed
00:10:56.700 | rather than in groups.
00:10:57.780 | So that was one example of a tiny lesson learned
00:11:00.580 | out of many that allows us to hone in
00:11:03.260 | to the set of features.
00:11:04.820 | - And it sounds like a brilliant feature.
00:11:06.460 | So I guess the lesson to take from that is
00:11:09.380 | there's something that looks amazing on paper
00:11:13.940 | and then nobody uses it.
00:11:15.260 | It doesn't actually have the impact
00:11:17.140 | that you think it might have.
00:11:18.380 | I'd say, yeah. I saw that you really went
00:11:20.460 | through a lot of different features and a lot of ideas
00:11:23.380 | and to arrive at the final,
00:11:24.460 | at Coursera's final kind of powerful thing
00:11:27.700 | that showed the world that MOOCs can educate millions.
00:11:32.220 | - And I think with the whole machine learning movement
00:11:34.860 | as well, I think it didn't come out of nowhere.
00:11:38.340 | Instead, what happened was as more people learn
00:11:41.460 | about machine learning, they will tell their friends
00:11:43.460 | and their friends will see how it's applicable
00:11:44.900 | to their work.
00:11:46.020 | And then the community kept on growing.
00:11:48.620 | And I think we're still growing.
00:11:50.980 | I don't know in the future what percentage
00:11:53.700 | of our developers will be AI developers.
00:11:56.980 | I could easily see it being north of 50%, right?
00:11:59.740 | Because there are so many AI developers, broadly construed,
00:12:04.660 | not just people doing the machine learning modeling,
00:12:06.540 | but the people building infrastructure, data pipelines,
00:12:09.620 | you know, all the software surrounding
00:12:11.580 | the core machine learning model.
00:12:13.820 | Maybe it's even bigger.
00:12:15.300 | I feel like today, almost every software engineer
00:12:18.300 | has some understanding of the cloud.
00:12:20.460 | Not all, you know, but maybe
00:12:22.540 | a microcontroller developer doesn't need to deal with the cloud.
00:12:25.500 | But I feel like the vast majority of software engineers today
00:12:28.940 | are sort of having an appreciation of the cloud.
00:12:31.980 | I think in the future, maybe we'll approach nearly 100%
00:12:35.060 | of all developers being, you know, in some way,
00:12:38.020 | an AI developer, at least having an appreciation
00:12:40.620 | of machine learning.
00:12:41.980 | - And my hope is that there's this kind of effect
00:12:45.060 | that there's people who are not really interested
00:12:47.660 | in being a programmer or being into software engineering,
00:12:50.980 | like biologists, chemists, and physicists,
00:12:54.500 | even mechanical engineers, all these disciplines
00:12:57.180 | that are now more and more sitting on large datasets.
00:13:01.580 | And here, they didn't think they were interested
00:13:03.700 | in programming until they have this dataset
00:13:05.700 | and they realized there's this set of machine learning tools
00:13:07.820 | that allow you to use the dataset.
00:13:09.420 | So they actually become, they learn to program
00:13:12.180 | and they become new programmers.
00:13:13.580 | So, not just, 'cause you've mentioned
00:13:16.220 | a larger percentage of developers
00:13:17.780 | becoming machine learning people,
00:13:19.660 | it seems like more and more,
00:13:21.900 | the kinds of people who are becoming developers
00:13:25.420 | is also growing significantly.
00:13:27.380 | - Yeah, I think once upon a time,
00:13:29.900 | only a small part of humanity was literate,
00:13:32.780 | you know, could read and write.
00:13:33.860 | And maybe you thought, maybe not everyone needs
00:13:36.620 | to learn to read and write.
00:13:37.620 | You know, you just go listen to a few monks, right,
00:13:42.100 | read to you, and maybe that was enough.
00:13:44.220 | Or maybe we just need a few handful of authors
00:13:46.420 | to write the best sellers,
00:13:47.780 | and then no one else needs to write.
00:13:50.340 | But what we found was that by giving as many people,
00:13:53.180 | you know, in some countries, almost everyone,
00:13:55.380 | basic literacy, it dramatically enhanced
00:13:58.060 | human to human communications.
00:13:59.460 | And we can now write for an audience of one,
00:14:01.380 | such as if I send you an email or you send me an email.
00:14:04.980 | I think in computing, we're still in that phase
00:14:07.780 | where so few people know how to code
00:14:09.660 | that the coders mostly have to code
00:14:12.140 | for relatively large audiences.
00:14:14.500 | But if everyone, or most people,
00:14:17.140 | became developers at some level,
00:14:20.460 | similar to how most people in developed economies
00:14:22.940 | are somewhat literate,
00:14:24.420 | I would love to see the owners of a mom and pop store
00:14:27.940 | be able to write a little bit of code
00:14:29.100 | to customize the TV display for their special this week.
00:14:32.460 | And I think it'll enhance human to computer communications,
00:14:36.340 | which is becoming more and more important in today's world.
00:14:38.780 | - So you think it's possible that machine learning
00:14:41.740 | becomes kind of similar to literacy where,
00:14:45.900 | yeah, like you said, the owners of a mom and pop shop,
00:14:49.980 | that is, basically everybody in all walks of life
00:14:52.180 | would have some degree of programming capability?
00:14:55.620 | - I could see society getting there.
00:14:58.580 | There's one other interesting thing, you know,
00:15:00.700 | if I go talk to the mom and pop store,
00:15:02.860 | if I talk to a lot of people in their daily professions,
00:15:05.380 | I previously didn't have a good story
00:15:07.300 | for why they should learn to code.
00:15:09.340 | You know, we could give them some reasons.
00:15:11.300 | But what I found with the rise of machine learning
00:15:13.300 | and data science is that I think the number of people
00:15:15.980 | with a concrete use for data science in their daily lives,
00:15:19.460 | in their jobs, may be even larger than the number of people
00:15:22.860 | with a concrete use for software engineering.
00:15:25.460 | For example, actually, if you run a small mom and pop store,
00:15:28.180 | I think if you can analyze the data about your sales,
00:15:30.900 | your customers, I think there's actually real value there,
00:15:34.220 | maybe even more than traditional software engineering.
00:15:37.260 | So I find that for a lot of my friends
00:15:39.380 | in various professions, be it recruiters or accountants
00:15:42.940 | or, you know, people that work in the factories,
00:15:45.260 | which I deal with more and more these days,
00:15:47.420 | I feel if they were data scientists at some level,
00:15:51.340 | they could immediately use that in their work.
00:15:54.540 | So I think that data science and machine learning
00:15:56.900 | may be an even easier entree into the developer world
00:16:00.460 | for a lot of people than software engineering.
00:16:03.420 | - That's interesting.
00:16:04.420 | And I agree with that, but that's beautifully put.
00:16:07.860 | We live in a world where most courses and talks have slides,
00:16:11.260 | PowerPoint, keynote, and yet you famously
00:16:14.620 | often still use a marker and a whiteboard.
00:16:17.300 | The simplicity of that is compelling,
00:16:19.380 | and for me, at least, fun to watch.
00:16:22.100 | So let me ask, why do you like using a marker and whiteboard
00:16:25.860 | even on the biggest of stages?
00:16:27.660 | - I think it depends on the concepts you want to explain.
00:16:32.380 | For mathematical concepts, it's nice to build up
00:16:34.900 | the equation one piece at a time.
00:16:37.060 | And the whiteboard marker or the pen and stylus
00:16:41.340 | is a very easy way to build up the equation,
00:16:43.980 | build up a complex concept one piece at a time
00:16:47.420 | while you're talking about it.
00:16:48.580 | And sometimes that enhances understandability.
00:16:51.660 | The downside of writing is that it's slow.
00:16:54.820 | And so if you want a long sentence,
00:16:56.380 | it's very hard to write that.
00:16:57.380 | So I think there are pros and cons.
00:16:58.420 | And sometimes I use slides,
00:17:00.420 | and sometimes I use a whiteboard or a stylus.
00:17:03.220 | - The slowness of a whiteboard is also its upside,
00:17:06.340 | 'cause it forces you to reduce everything to the basics.
00:17:11.340 | So some of your talks involve the whiteboard.
00:17:14.900 | I mean, there's really not, you go very slowly,
00:17:17.860 | and you really focus on the most simple principles.
00:17:20.180 | And that's a beautiful,
00:17:21.620 | that enforces a kind of a minimalism of ideas
00:17:26.540 | that I think is, surprisingly at least for me,
00:17:29.420 | is great for education.
00:17:31.660 | Like a great talk, I think,
00:17:34.260 | is not one that has a lot of content.
00:17:37.100 | A great talk is one that just clearly says
00:17:40.340 | a few simple ideas.
00:17:41.980 | And I think you,
00:17:42.820 | the whiteboard somehow enforces that.
00:17:46.380 | Pieter Abbeel, who's now one of the top roboticists
00:17:49.500 | and reinforcement learning experts in the world,
00:17:51.500 | was your first PhD student.
00:17:53.140 | So I bring him up just because I kind of imagine
00:17:56.940 | this must have been an interesting time in your life.
00:18:02.340 | Do you have any favorite memories of working with Pieter,
00:18:04.980 | since he was your first student, in those uncertain times,
00:18:08.380 | especially before deep learning really sort of blew up?
00:18:13.380 | Any favorite memories from those times?
00:18:17.820 | - Yeah, I was really fortunate to have had Pieter Abbeel
00:18:20.740 | as my first PhD student.
00:18:22.740 | And I think even my long-term professional success
00:18:25.620 | builds on early foundations or early work
00:18:27.740 | that Peter was so critical to.
00:18:29.980 | So I was really grateful to him for working with me.
00:18:33.260 | What not a lot of people know
00:18:36.700 | is just how hard research was and still is.
00:18:36.700 | Pieter's PhD thesis was using reinforcement learning
00:18:44.940 | to fly helicopters.
00:18:47.180 | And so, actually even today,
00:18:49.380 | the website heli.stanford.edu,
00:18:51.700 | H-E-L-I.stanford.edu is still up.
00:18:53.460 | You can watch videos of us using reinforcement learning
00:18:56.340 | to make a helicopter fly upside down,
00:18:58.060 | fly loops, so it's cool.
00:19:00.060 | - It's one of the most incredible robotics videos ever.
00:19:02.460 | So people should watch it.
00:19:03.700 | - Oh yeah, thank you. - It's inspiring.
00:19:05.140 | That's from like 2008 or seven or six, like that range.
00:19:10.140 | - Something like that, yeah, so it's been over 10 years.
00:19:13.020 | - That was really inspiring to a lot of people, yeah.
00:19:15.420 | - What not many people see is how hard it was.
00:19:18.900 | So Pieter and Adam Coates and Morgan Quigley and I
00:19:22.780 | were working on various versions of the helicopter,
00:19:25.500 | and a lot of things did not work.
00:19:27.460 | For example, turns out one of the hardest problems we had
00:19:29.860 | was when the helicopter is flying around upside down,
00:19:32.380 | doing stunts, how do you figure out the position?
00:19:34.900 | How do you localize a helicopter?
00:19:36.860 | So we wanted to try all sorts of things.
00:19:38.900 | Having one GPS unit doesn't work
00:19:41.220 | 'cause you're flying upside down,
00:19:42.300 | the GPS unit is facing down, so it can't see the satellites.
00:19:44.860 | So we experimented trying to have two GPS units,
00:19:48.620 | one facing up, one facing down.
00:19:49.980 | So if you flip over, that didn't work
00:19:51.900 | 'cause the downward facing one couldn't synchronize
00:19:54.340 | if you're flipping quickly.
00:19:56.740 | Morgan Quigley was exploring this crazy,
00:19:59.500 | complicated configuration of specialized hardware
00:20:02.460 | to interpret GPS signals.
00:20:04.580 | Looking into FPGAs, completely insane.
00:20:06.820 | Spent about a year working on that, didn't work.
00:20:10.300 | So I remember Pieter, great guy, him and me,
00:20:14.420 | sitting down in my office,
00:20:15.980 | looking at some of the latest things we had tried
00:20:18.740 | that didn't work and saying, "Done it, what now?"
00:20:23.260 | Because we tried so many things and it just didn't work.
00:20:26.940 | In the end, what we did, and Adam Coates was crucial to this,
00:20:32.260 | was put cameras on the ground
00:20:34.260 | and use cameras on the ground to localize the helicopter.
00:20:36.980 | And that solved the localization problem
00:20:39.820 | so that we could then focus on the reinforcement learning
00:20:42.380 | and inverse reinforcement learning techniques
00:20:44.420 | to actually make the helicopter fly.
00:20:46.700 | And I'm reminded, when I was doing this work at Stanford,
00:20:51.780 | around that time, there was a lot of reinforcement learning
00:20:55.220 | theoretical papers, but not a lot of practical applications.
00:20:59.540 | So the autonomous helicopter work for flying helicopters
00:21:03.380 | was one of the few practical applications
00:21:06.620 | of reinforcement learning at the time,
00:21:08.020 | which caused it to become pretty well-known.
00:21:11.580 | I feel like we might've almost come full circle with today.
00:21:14.740 | There's so much buzz, so much hype, so much excitement
00:21:17.580 | about reinforcement learning.
00:21:19.020 | But again, we're hunting for more applications
00:21:21.820 | of all of these great ideas
00:21:23.140 | that the communities come up with.
00:21:24.740 | - What was the drive, sort of in the face of the fact
00:21:28.260 | that most people were doing theoretical work,
00:21:30.140 | what motivated you, in the uncertainty and the challenges,
00:21:33.020 | to get the helicopter, sort of to do the applied work,
00:21:36.500 | to get the actual system to work?
00:21:38.540 | Yeah, in the face of fear, uncertainty,
00:21:41.860 | sort of the setbacks that you mentioned for localization.
00:21:46.020 | - I like stuff that works.
00:21:48.100 | - In the physical world.
00:21:48.940 | So like, it's back to the shredder.
00:21:51.260 | - You know, I like theory,
00:21:55.460 | but when I work on theory myself, and this is personal taste,
00:21:58.700 | I'm not saying anyone else should do what I do,
00:22:00.820 | but when I work on theory, I personally enjoy it more
00:22:04.100 | if I feel that the work I do will influence people,
00:22:08.740 | have positive impact, or help someone.
00:22:10.820 | I remember when, many years ago,
00:22:14.940 | I was speaking with a mathematics professor,
00:22:17.780 | and it kind of just said, "Hey, why do you do what you do?"
00:22:21.300 | And then he said, he actually,
00:22:23.740 | he had stars in his eyes when he answered,
00:22:25.780 | and this mathematician,
00:22:27.980 | not from Stanford, different university,
00:22:29.580 | he said, "I do what I do because it helps me
00:22:32.900 | to discover truth and beauty in the universe."
00:22:36.740 | He had stars in his eyes when he said that.
00:22:38.580 | And I thought, that's great.
00:22:39.980 | I don't want to do that.
00:22:42.660 | I think it's great that someone does that,
00:22:44.180 | fully support the people that do it,
00:22:45.500 | a lot of respect for people that do that,
00:22:47.060 | but I am more motivated when I can see a line
00:22:50.780 | to how the work that my teams and I are doing helps people.
00:22:55.660 | The world needs all sorts of people.
00:22:58.580 | I'm just one type.
00:22:59.500 | I don't think everyone should do things
00:23:01.380 | the same way as I do,
00:23:02.540 | but when I delve into either theory or practice,
00:23:06.100 | if I personally have conviction
00:23:08.580 | that here's a pathway to help people,
00:23:10.460 | I find that more satisfying.
00:23:14.380 | To have that conviction.
00:23:15.860 | - That's your path.
00:23:17.620 | You were a proponent of deep learning
00:23:20.020 | before it gained widespread acceptance.
00:23:23.260 | What did you see in this field that gave you confidence?
00:23:25.940 | What was your thinking process like
00:23:27.740 | in that first decade of the,
00:23:29.700 | I don't know what that's called,
00:23:31.460 | 2000s, the aughts?
00:23:33.820 | - Yeah, I can tell you the thing we got wrong
00:23:35.660 | and the thing we got right.
00:23:36.980 | The thing we really got wrong was the importance of,
00:23:39.740 | the early importance of unsupervised learning.
00:23:43.580 | So early days of Google Brain,
00:23:46.740 | we put a lot of effort into unsupervised learning
00:23:48.740 | rather than supervised learning.
00:23:50.300 | And there was this argument,
00:23:51.660 | I think it was around 2005, after NeurIPS,
00:23:56.140 | at that time called NIPS,
00:23:57.100 | now NeurIPS, had ended.
00:23:58.980 | And Geoff Hinton and I were sitting in the cafeteria
00:24:01.980 | outside the conference,
00:24:03.380 | we had lunch, we were just chatting.
00:24:04.900 | And Geoff pulled out this napkin,
00:24:06.180 | he started sketching this argument on a napkin.
00:24:08.660 | It was very compelling, I'll repeat it.
00:24:11.860 | Human brain has about a hundred trillion,
00:24:14.180 | so there's 10 to the 14 synaptic connections.
00:24:16.940 | You will live for about 10 to the nine seconds.
00:24:21.220 | That's 30 years.
00:24:22.100 | You actually live for two by 10 to the nine,
00:24:24.460 | maybe three by 10 to the nine seconds.
00:24:25.860 | So just let's say 10 to the nine.
00:24:27.980 | So if each synaptic connection,
00:24:30.660 | each weight in your brain's neural network
00:24:32.740 | has just a one bit parameter,
00:24:35.020 | that's 10 to the 14 bits you need to learn
00:24:37.980 | in up to 10 to the nine seconds of your life.
00:24:41.820 | So via this simple argument,
00:24:43.780 | which has a lot of problems, it's very simplified,
00:24:46.100 | that's 10 to the five bits per second
00:24:47.620 | you need to learn in your life.
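
As a quick check of the napkin arithmetic being described here (using the rough numbers from the conversation, which Ng himself flags as very simplified):

```latex
% ~10^{14} synaptic connections, assume ~1 bit learned per connection,
% acquired over ~10^{9} seconds of life:
\[
  \frac{10^{14}\,\text{bits}}{10^{9}\,\text{s}} \;=\; 10^{5}\ \text{bits per second}
\]
```
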
00:24:49.780 | And I have a one-year-old daughter,
00:24:52.580 | I am not pointing out 10 to the five bits per second
00:24:56.580 | of labels to her.
00:24:58.900 | So, and I think I'm a very loving parent,
00:25:02.100 | but I'm just not gonna do that.
00:25:03.660 | So from this, you know, very crude,
00:25:07.260 | definitely problematic argument.
00:25:08.900 | There's just no way that most of what we know
00:25:11.340 | is through supervised learning.
00:25:13.460 | But where you get so many bits of information
00:25:15.340 | is from sucking in images, audio,
00:25:16.980 | just experiences in the world.
00:25:18.540 | And so that argument,
00:25:21.500 | and there are a lot of known flaws with that argument,
00:25:23.220 | you know, we won't go into them,
00:25:24.780 | really convinced me that there's a lot of power
00:25:26.940 | in unsupervised learning.
00:25:28.220 | So that was the part that we actually maybe got wrong.
00:25:32.540 | I still think unsupervised learning is really important,
00:25:34.860 | but in the early days, you know, 10, 15 years ago,
00:25:38.900 | a lot of us thought that was the path forward.
00:25:41.220 | - Oh, so you're saying that that perhaps
00:25:43.460 | was the wrong intuition for the time.
00:25:45.660 | - For the time, that was the part we got wrong.
00:25:48.540 | The part we got right was the importance of scale.
00:25:51.580 | So Adam Coates, another wonderful person,
00:25:55.900 | fortunate to have worked with him,
00:25:58.140 | he was in my group at Stanford at the time,
00:26:00.020 | and Adam had run these experiments at Stanford
00:26:02.380 | showing that the bigger we train a learning algorithm,
00:26:06.060 | the better its performance.
00:26:07.900 | And it was based on that,
00:26:10.140 | there was a graph that Adam generated, you know,
00:26:12.900 | where the X-axis, Y-axis lines going up into the right.
00:26:15.780 | So the bigger you make this thing,
00:26:17.540 | the better its performance, accuracy is the vertical axis.
00:26:20.340 | So it was really based on that chart that Adam generated
00:26:22.780 | that he gave me the conviction
00:26:23.980 | that we could scale these models way bigger
00:26:26.260 | than what we could on a few CPUs,
00:26:27.900 | which is what we had at Stanford,
00:26:29.420 | that we could get even better results.
00:26:31.580 | And it was really based on that one figure
00:26:33.420 | that Adam generated that gave me the conviction
00:26:37.100 | to go with Sebastian Thrun to pitch, you know,
00:26:40.180 | starting a project at Google,
00:26:42.820 | which became the Google Brain Project.
00:26:44.140 | - Google Brain, you co-founded Google Brain.
00:26:45.740 | And there the intuition was scale
00:26:49.100 | will bring performance for the system,
00:26:52.300 | so we should chase a larger and larger scale.
00:26:55.460 | And I think people don't realize how groundbreaking it is,
00:27:00.140 | it's simple, but it's a groundbreaking idea
00:27:02.340 | that bigger datasets will result in better performance.
00:27:06.180 | - It was controversial at the time.
00:27:08.740 | Some of my well-meaning friends,
00:27:10.060 | you know, senior people in the machine learning community,
00:27:11.580 | I won't name, but who's people,
00:27:13.780 | some of whom we know,
00:27:16.100 | my well-meaning friends came
00:27:17.900 | and were trying to give me friendly advice,
00:27:19.460 | like, "Hey, Andrew, why are you doing this?
00:27:20.980 | This is crazy.
00:27:21.900 | It's in the neural network architecture.
00:27:23.300 | Look at these architectures people are building.
00:27:24.900 | You just want to go for scale?
00:27:26.060 | Like, this is a bad career move."
00:27:27.420 | So my well-meaning friends, you know,
00:27:29.900 | some of them were trying to talk me out of it.
00:27:32.500 | But I find that if you want to make a breakthrough,
00:27:36.660 | you sometimes have to have conviction
00:27:38.980 | and do something before it's popular,
00:27:41.020 | since that lets you have a bigger impact.
00:27:43.020 | - Let me ask you just in a small tangent on that topic.
00:27:46.100 | I find myself arguing with people saying that greater scale,
00:27:51.100 | especially in the context of active learning,
00:27:53.500 | so very carefully selecting the dataset,
00:27:56.940 | but growing the scale of the dataset,
00:27:59.220 | is going to lead to even further breakthroughs
00:28:01.620 | in deep learning.
00:28:02.780 | And there's currently pushback at that idea,
00:28:05.900 | that larger datasets are no longer...
00:28:09.140 | So you want to increase the efficiency of learning.
00:28:11.860 | You want to make better learning mechanisms.
00:28:14.020 | And I personally believe that just bigger datasets
00:28:16.820 | will still, with the same learning methods we have now,
00:28:20.020 | will result in better performance.
00:28:21.820 | What's your intuition at this time on this dual side?
00:28:27.980 | Is do we need to come up with better architectures
00:28:30.740 | for learning, or can we just get bigger, better datasets
00:28:35.700 | that will improve performance?
00:28:37.740 | - I think both are important.
00:28:39.260 | And it's also problem dependent.
00:28:40.940 | So for a few datasets, we may be approaching
00:28:44.740 | your Bayes error rate, or approaching or surpassing
00:28:47.940 | human level performance.
00:28:49.260 | And then there's that theoretical ceiling
00:28:51.300 | that we will never surpass a Bayes error rate.
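
For reference, the Bayes error rate mentioned here is the error of the best possible classifier on a given distribution. A standard way to write it (this is textbook background, not something derived in the conversation) is:

```latex
\[
  \varepsilon_{\text{Bayes}} \;=\; \mathbb{E}_{x}\!\left[\, 1 - \max_{y} P(y \mid x) \,\right]
\]
```

No model can achieve lower error than this on the same data distribution, which is the theoretical ceiling being referred to.
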
00:28:53.660 | But then I think there are plenty of problems
00:28:56.260 | where we're still quite far from either
00:28:58.740 | human level performance or from Bayes error rate.
00:29:00.980 | And bigger datasets with neural networks,
00:29:05.420 | without further algorithmic innovation,
00:29:07.140 | will be sufficient to take us further.
00:29:09.460 | But on the flip side, if we look at the recent breakthroughs
00:29:12.900 | using transformer networks or language models,
00:29:15.540 | it was a combination of novel architecture,
00:29:18.380 | but also scale had a lot to do with it.
00:29:20.660 | If we look at what happened with GPT-2 and BERT,
00:29:23.060 | I think scale was a large part of the story.
00:29:26.380 | - Yeah, that's not often talked about,
00:29:28.300 | is the scale of the dataset it was trained on
00:29:31.020 | and the quality of the dataset,
00:29:32.500 | because there's some, so it was like Reddit threads
00:29:37.180 | that had, they were upvoted highly.
00:29:39.980 | So there's already some weak supervision
00:29:42.980 | on a very large dataset
00:29:44.860 | that people don't often talk about, right?
00:29:47.340 | - I find that today we have maturing processes
00:29:50.580 | and managing code, things like Git, right?
00:29:53.540 | Version control.
00:29:54.820 | It took a long time to evolve the good processes.
00:29:57.500 | I remember when my friends and I were emailing
00:30:00.380 | each other C++ files in email,
00:30:02.380 | but then we had, was it, CVS or Subversion, then Git?
00:30:05.260 | Maybe something else in the future.
00:30:07.580 | We're very immature in terms of tools for managing data
00:30:10.780 | and think about the clean data
00:30:12.100 | and how to solve the very hot, messy data problems.
00:30:15.380 | I think there's a lot of innovation there to be had still.
00:30:18.980 | I love the idea that you were versioning through email.
00:30:22.220 | - I'll give you one example.
00:30:23.940 | When we work with manufacturing companies,
00:30:28.940 | it's not at all uncommon for there to be multiple labelers
00:30:34.220 | that disagree with each other, right?
00:30:36.380 | And so we would, doing the work in visual inspection,
00:30:40.580 | we will take, say, a plastic part and show it to one inspector
00:30:44.820 | and the inspector, sometimes very opinionated,
00:30:47.300 | they'll go, "Clearly that's a defect.
00:30:48.660 | This scratch, unacceptable.
00:30:49.780 | Gotta reject this pot."
00:30:51.340 | Take the same part to a different inspector,
00:30:53.460 | different, very opinionated,
00:30:54.940 | "Clearly the scratch is small.
00:30:56.260 | It's fine. Don't throw it away.
00:30:57.620 | You're going to make us use."
00:30:59.380 | And then sometimes you take the same plastic part,
00:31:01.820 | show it to the same inspector in the afternoon,
00:31:04.300 | as opposed to in the morning,
00:31:05.500 | and, very opinionated, in the morning
00:31:07.540 | they say, "Clearly this is okay."
00:31:08.820 | In the afternoon, equally confident,
00:31:10.540 | "Clearly this is a defect."
00:31:12.380 | And so what is an AI team supposed to do
00:31:14.820 | if sometimes even one person
00:31:16.900 | doesn't agree with himself or herself in the span of a day?
00:31:20.420 | So I think these are the types of very practical,
00:31:23.740 | very messy data problems that my teams wrestle with.
00:31:28.740 | In the case of large consumer internet companies,
00:31:32.980 | where you have a billion users, you have a lot of data,
00:31:35.660 | you don't worry about it.
00:31:36.500 | Just take the average, it kind of works.
00:31:38.420 | But in a case of other industry settings,
00:31:40.820 | we don't have big data.
00:31:42.500 | If you have just a small data, very small data sets,
00:31:44.580 | maybe you have a hundred defective parts,
00:31:47.620 | or a hundred examples of a defect.
00:31:49.860 | If you have only a hundred examples,
00:31:51.420 | these little labeling errors,
00:31:53.340 | if 10 of your hundred labels are wrong,
00:31:55.860 | that's 10% of your dataset, and that has a big impact.
00:31:58.660 | So how do you clean this up?
00:31:59.740 | What are you supposed to do?
00:32:01.020 | This is an example of the types of things that my teams,
00:32:05.060 | this is a Landing AI example,
00:32:06.740 | are wrestling with to deal with small data,
00:32:09.180 | which comes up all the time
00:32:10.220 | once you're outside consumer internet.
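
A minimal sketch of one way to handle the labeler disagreement described above: resolve labels by majority vote and send ambiguous examples back for review. The data format, label names, and agreement threshold are illustrative assumptions, not Landing AI's actual process.

```python
from collections import Counter

def resolve_labels(labels_per_example, min_agreement=0.6):
    """labels_per_example: dict mapping example id -> list of labels
    given by different inspectors (e.g. 'defect' or 'ok').
    Returns (resolved labels, ids that need another look)."""
    resolved, needs_review = {}, []
    for example_id, labels in labels_per_example.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            resolved[example_id] = label      # clear enough majority
        else:
            needs_review.append(example_id)   # inspectors disagree too much
    return resolved, needs_review

# Toy example: inspectors labeled the same two parts.
labels = {"part_001": ["defect", "defect", "ok"],
          "part_002": ["ok", "defect"]}
print(resolve_labels(labels))
```

With only a hundred examples, even a handful of wrong or unresolved labels is a large fraction of the dataset, which is why this kind of cleanup step matters so much in the small-data regime.
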
00:32:12.220 | - Yeah, that's fascinating.
00:32:13.060 | So then you invest more effort and time
00:32:15.340 | in thinking about the actual labeling process.
00:32:18.140 | What are the labels?
00:32:19.660 | What are the, how are disagreements resolved
00:32:22.580 | and all those kinds of like pragmatic real world problems.
00:32:25.860 | That's a fascinating space.
00:32:27.340 | - Yeah, I find that actually when I'm teaching at Stanford,
00:32:29.700 | I increasingly encourage students at Stanford
00:32:32.740 | to try to find their own project for the end of term project
00:32:37.740 | rather than just downloading someone else's
00:32:40.460 | nicely clean data set.
00:32:42.060 | It's actually much harder if you need to go
00:32:43.460 | and define your own problem and find your own data set
00:32:45.620 | rather than go to one of the several good websites,
00:32:48.740 | very good websites with clean scoped data sets
00:32:52.900 | that you could just work on.
00:32:54.300 | - You're now running three efforts,
00:32:57.020 | the AI Fund, Landing AI, and DeepLearning.ai.
00:33:02.020 | As you've said, the AI Fund is involved
00:33:04.660 | in creating new companies from scratch.
00:33:06.700 | Landing AI is involved in helping
00:33:08.660 | already established companies do AI
00:33:10.540 | and DeepLearning.ai is for education of everyone else
00:33:14.700 | or of individuals interested of getting into the field
00:33:18.180 | and excelling in it.
00:33:19.500 | So let's perhaps talk about each of these areas.
00:33:22.340 | First, DeepLearning.ai, how, the basic question,
00:33:27.340 | how does a person interested in deep learning
00:33:30.140 | get started in the field?
00:33:31.580 | - DeepLearning.ai is working to create courses
00:33:35.700 | to help people break into AI.
00:33:37.540 | So my machine learning course that I taught
00:33:44.300 | through Stanford remains one of the most popular courses
00:33:44.300 | on Coursera.
00:33:45.500 | - To this day, it's probably one of the courses,
00:33:48.540 | sort of, if I ask somebody,
00:33:49.820 | how did you get into machine learning
00:33:52.340 | or how did you fall in love with machine learning
00:33:54.180 | or what gets you interested,
00:33:55.660 | it always goes back to Andrew Ng at some point.
00:33:59.180 | The amount of people you've influenced is ridiculous.
00:34:03.260 | So for that, I'm sure I speak for a lot of people
00:34:05.820 | say big thank you.
00:34:07.140 | - No, yeah, thank you.
00:34:08.140 | You know, I was once reading a news article,
00:34:12.100 | I think it was tech review
00:34:15.180 | and I'm gonna mess up the statistic,
00:34:17.700 | but I remember reading an article that said
00:34:20.180 | something like one third of all programmers are self-taught.
00:34:23.780 | I may have the number one third wrong,
00:34:25.140 | maybe it was two thirds.
00:34:25.980 | But when I read that article, I thought,
00:34:27.300 | this doesn't make sense.
00:34:28.180 | Everyone is self-taught.
00:34:29.420 | So, 'cause you teach yourself, I don't teach people.
00:34:32.460 | I just- - That's well put.
00:34:34.540 | So yeah, so how does one get started in deep learning
00:34:38.100 | and where does deeplearning.ai fit into that?
00:34:40.660 | - So the deep learning specialization
00:34:42.420 | offered by deeplearning.ai is,
00:34:44.180 | I think it was Coursera's top specialization,
00:34:50.060 | it might still be.
00:34:50.900 | So it's a very popular way for people
00:34:52.980 | to take that specialization,
00:34:54.540 | to learn about everything from neural networks
00:34:57.860 | to how to tune your neural network.
00:35:00.140 | So what does a conf net do?
00:35:01.660 | What is a RNN or a sequence model
00:35:04.180 | or what is an attention model?
00:35:05.860 | And so the deep learning specialization
00:35:08.140 | steps everyone through those algorithms.
00:35:11.020 | So you deeply understand it and can implement it
00:35:13.220 | and use it for whatever application.
00:35:15.300 | - From the very beginning?
00:35:16.540 | So what would you say are the prerequisites
00:35:19.580 | for somebody to take the deep learning specialization
00:35:22.180 | in terms of maybe math or programming background?
00:35:25.660 | - Yeah, you need to understand basic programming
00:35:28.100 | since there are programming exercises in Python.
00:35:31.340 | And the math prereq is quite basic.
00:35:34.420 | So no calculus is needed.
00:35:35.980 | If you know calculus, that's great, you get better intuitions,
00:35:38.740 | but we deliberately try to teach that specialization
00:35:41.340 | without requiring calculus.
00:35:42.740 | So I think high school math would be sufficient.
00:35:47.340 | If you know how to multiply two matrices,
00:35:49.100 | I think that's great.
00:35:52.300 | - So a little basic linear algebra is great.
00:35:54.860 | - Basic linear algebra,
00:35:56.020 | even very, very basic linear algebra in some programming.
00:36:00.180 | I think that people that have done
00:36:01.300 | the machine learning course
00:36:02.260 | will find the deep learning specialization a bit easier,
00:36:05.180 | but it's also possible to jump
00:36:06.500 | into the deep learning specialization directly,
00:36:08.420 | but it'll be a little bit harder
00:36:10.020 | since we tend to go faster over concepts
00:36:14.580 | like how gradient descent works
00:36:16.300 | and what is the objective function,
00:36:17.540 | which is covered more slowly in the machine learning course.
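
For anyone who wants a concrete picture of the gradient descent step referred to here, a minimal sketch on a one-variable objective (the function and learning rate are made up for illustration, not taken from the course):

```python
# Minimize J(w) = (w - 3)^2 with plain gradient descent.
def grad(w):
    return 2 * (w - 3)             # dJ/dw

w = 0.0                            # initial guess
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)   # step against the gradient

print(round(w, 4))                 # converges toward the minimum at w = 3
```
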
00:36:20.300 | - Could you briefly mention some of the key concepts
00:36:22.980 | in deep learning that students should learn
00:36:25.140 | that you envision them learning in the first few months,
00:36:27.820 | in the first year or so?
00:36:29.380 | - So if you take the deep learning specialization,
00:36:31.940 | you learn the foundations of what is a neural network,
00:36:34.940 | how do you build up a neural network
00:36:36.940 | from a single logistic unit,
00:36:38.860 | to a stack of layers,
00:36:40.700 | to different activation functions.
00:36:43.220 | You learn how to train the neural networks.
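
A minimal NumPy sketch of that progression, from a single logistic unit to a small stack of layers with different activation functions. This is an illustrative toy, not the specialization's actual assignments, and the shapes and values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

# A single logistic unit: a linear combination followed by a sigmoid.
def logistic_unit(x, w, b):
    return sigmoid(np.dot(w, x) + b)

# Stacking layers: a hidden ReLU layer feeding a sigmoid output unit.
def two_layer_forward(x, W1, b1, W2, b2):
    hidden = relu(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])                    # toy input with 3 features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)     # hidden layer parameters
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)     # output layer parameters
print(logistic_unit(x, rng.normal(size=3), 0.0))  # one unit
print(two_layer_forward(x, W1, b1, W2, b2))       # small stacked network
```
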
00:36:44.940 | One thing I'm very proud of in that specialization
00:36:47.860 | is we go through a lot of practical know-how
00:36:50.380 | of how to actually make these things work.
00:36:52.340 | So what are the differences
00:36:53.420 | between different optimization algorithms?
00:36:55.860 | What do you do if the algorithm overfits?
00:36:57.380 | So how do you tell if the algorithm is overfitting?
00:36:59.140 | When do you collect more data?
00:37:00.300 | When should you not bother to collect more data?
00:37:03.300 | I find that even today, unfortunately,
00:37:06.260 | there are engineers that will spend six months
00:37:10.060 | trying to pursue a particular direction,
00:37:12.660 | such as collect more data,
00:37:13.980 | because we heard more data is valuable.
00:37:15.940 | But sometimes you could run some tests
00:37:18.380 | and could have figured out six months earlier
00:37:20.500 | that for this particular problem,
00:37:22.060 | collecting more data isn't going to cut it.
00:37:23.980 | So just don't spend six months collecting more data,
00:37:26.300 | spend your time modifying the architecture
00:37:29.380 | or trying something else.
00:37:30.300 | So we go through a lot of the practical know-how
00:37:32.660 | so that when someone,
00:37:35.460 | when you take the deep learning specialization,
00:37:37.300 | you have those skills to be very efficient
00:37:39.820 | in how you build these networks.
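
A minimal sketch of the kind of quick check being described, comparing training and validation error before committing months to collecting more data. The thresholds are illustrative assumptions rather than rules from the course.

```python
def quick_diagnosis(train_error, val_error, target_error, gap_threshold=0.05):
    """Crude bias/variance check on error rates in [0, 1]."""
    if train_error > target_error:
        # High bias: the model cannot even fit the training set well,
        # so collecting more data alone is unlikely to cut it.
        return "try a bigger model, train longer, or change the architecture"
    if val_error - train_error > gap_threshold:
        # High variance: a large train/validation gap suggests overfitting,
        # where more data or regularization usually helps.
        return "collect more data or add regularization"
    return "close to target; do error analysis before changing anything"

print(quick_diagnosis(train_error=0.02, val_error=0.15, target_error=0.05))
```
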
00:37:42.020 | - So dive right in to play with the network,
00:37:44.340 | to train it, to do the inference on a particular dataset,
00:37:47.300 | to build intuition about it
00:37:48.580 | without building it up too big
00:37:52.220 | to where you spend, like you said, six months learning,
00:37:55.740 | building up your big project
00:37:57.540 | without building any intuition of a small,
00:38:00.220 | a small aspect of the data that could already tell you
00:38:03.540 | everything you need to know about that data.
00:38:05.700 | - Yes, and also the systematic frameworks of thinking
00:38:09.380 | for how to go about building practical machine learning.
00:38:12.460 | Maybe to make an analogy, when we learn to code,
00:38:15.460 | we have to learn the syntax of some programming language,
00:38:17.900 | right, be it Python or C++ or Octave or whatever.
00:38:21.580 | But the equally important
00:38:23.020 | or maybe even more important part of coding
00:38:25.020 | is to understand how to string together
00:38:26.940 | these lines of code into coherent things.
00:38:28.900 | So, you know, when should you put something
00:38:31.180 | in a function call and when should you not?
00:38:32.940 | How do you think about abstraction?
00:38:34.700 | So those frameworks are what makes a programmer efficient,
00:38:39.140 | even more than understanding the syntax.
00:38:41.740 | I remember when I was an undergrad at Carnegie Mellon,
00:38:44.940 | one of my friends would debug their code
00:38:47.620 | by first trying to compile it,
00:38:49.340 | and then it was C++ code.
00:38:50.980 | And then for every line that had a syntax error,
00:38:53.420 | they wanted to get rid of the syntax errors
00:38:54.740 | as quickly as possible.
00:38:55.740 | So how do you do that?
00:38:56.620 | Well, they would delete every single line of code
00:38:58.420 | with a syntax error.
00:38:59.700 | So really efficient for getting rid of syntax errors,
00:39:01.740 | but horrible for debugging.
00:39:02.980 | So I think, so we learn how to debug.
00:39:05.540 | And I think in machine learning,
00:39:07.020 | the way you debug a machine learning program
00:39:09.420 | is very different than the way you do binary search
00:39:12.540 | or whatever, or use a debugger,
00:39:14.140 | like trace through the code
00:39:15.140 | in the traditional software engineering.
00:39:17.020 | So it's an evolving discipline,
00:39:19.020 | but I find that the people that are really good
00:39:20.820 | at debugging machine learning algorithms
00:39:22.900 | are easily 10X, maybe 100X faster
00:39:26.140 | at getting something to work.
00:39:28.500 | - And the basic process of debugging is,
00:39:30.460 | so the bug in this case,
00:39:32.620 | why isn't this thing learning, improving,
00:39:36.420 | sort of going into the questions of overfitting
00:39:39.340 | and all those kinds of things.
00:39:40.740 | That's the logical space that the debugging is happening in
00:39:45.340 | with neural networks.
00:39:46.540 | - Yeah, often the question is, why doesn't it work yet?
00:39:50.420 | Or can I expect this to eventually work?
00:39:53.060 | And what are the things I could try?
00:39:54.900 | Change the architecture, more data, more regularization,
00:39:57.580 | different optimization algorithm,
00:39:59.180 | different types of data.
00:40:02.020 | So to answer those questions systematically,
00:40:04.260 | so that you don't head down the,
00:40:05.860 | so you don't spend six months heading down a blind alley
00:40:08.100 | before someone comes and says,
00:40:09.820 | why did you spend six months doing this?
00:40:12.180 | - What concepts in deep learning
00:40:14.060 | do you think students struggle the most with?
00:40:16.540 | Or sort of is the biggest challenge for them
00:40:19.100 | once they get over that hill?
00:40:21.740 | It hooks them and it inspires them and they really get it.
00:40:26.660 | - Similar to learning mathematics,
00:40:30.300 | I think one of the challenges of deep learning
00:40:32.500 | is that there are a lot of concepts
00:40:34.140 | that build on top of each other.
00:40:35.740 | If you ask me what's hard about mathematics,
00:40:38.900 | I have a hard time pinpointing one thing.
00:40:41.060 | Is it addition, subtraction?
00:40:42.420 | Is it a carry?
00:40:43.260 | Is it multiplication?
00:40:44.540 | There's just a lot of stuff.
00:40:45.940 | I think one of the challenges of learning math
00:40:47.940 | and of learning certain technical fields
00:40:49.900 | is that there are a lot of concepts.
00:40:51.660 | And if you miss a concept,
00:40:53.140 | then you're kind of missing the prerequisite
00:40:55.540 | for something that comes later.
00:40:57.180 | So in the deep learning specialization,
00:41:01.900 | we try to break down the concepts to maximize the odds
00:41:04.780 | of each component being understandable.
00:41:07.020 | So when you move on to the more advanced thing,
00:41:09.340 | we learn ConvNets.
00:41:10.860 | Hopefully you have enough intuitions
00:41:12.380 | from the earlier sections to then understand
00:41:14.980 | why we structure ConvNets in a certain way.
00:41:18.700 | And then eventually why we build RNNs and LSTMs
00:41:23.140 | or attention models in a certain way,
00:41:24.860 | building on top of the earlier concepts.
00:41:27.700 | Actually, I'm curious, you do a lot of teaching as well.
00:41:30.980 | Do you have a favorite,
00:41:33.180 | this is the hard concept moment in your teaching?
00:41:36.380 | - Well, I don't think anyone's ever turned
00:41:41.220 | the interview on me.
00:41:43.660 | Maybe you're the first.
00:41:44.820 | (laughing)
00:41:46.580 | - I think that's a really good question.
00:41:49.020 | Yeah, it's really hard to capture the moment
00:41:51.340 | when they struggle.
00:41:52.180 | I think you put it really eloquently.
00:41:53.420 | I do think there's moments that are like aha moments
00:41:57.380 | that really inspire people.
00:41:59.500 | I think for some reason, reinforcement learning,
00:42:03.420 | especially deep reinforcement learning
00:42:05.660 | is a really great way to really inspire people
00:42:09.680 | and get what the use of neural networks can do.
00:42:13.620 | Even though neural networks really are just a part
00:42:16.700 | of the deep RL framework,
00:42:18.640 | but it's a really nice way to paint the entirety
00:42:21.420 | of the picture of a neural network being able to learn
00:42:24.920 | from scratch, knowing nothing, and explore the world
00:42:27.900 | and pick up lessons.
00:42:29.200 | I find that a lot of the aha moments happen
00:42:32.080 | when you use deep RL to teach people about neural networks,
00:42:36.400 | which is counterintuitive.
00:42:37.960 | I find like a lot of the inspired sort of fire
00:42:40.800 | in people's passion, people's eyes,
00:42:42.400 | comes from the RL world.
00:42:44.880 | Do you find reinforcement learning to be a useful part
00:42:48.640 | of the teaching process or no?
00:42:50.680 | - I still teach reinforcement learning
00:42:53.520 | in one of my Stanford classes.
00:42:55.640 | And my PhD thesis was on reinforcement learning.
00:42:57.520 | So I certainly love the field.
00:42:59.440 | I find that if I'm trying to teach students
00:43:01.600 | the most useful techniques for them to use today,
00:43:04.640 | I end up shrinking the amount of time
00:43:07.180 | I talk about reinforcement learning.
00:43:08.840 | It's not what's working today.
00:43:10.920 | Now our world changes so fast.
00:43:12.440 | Maybe it'll be totally different in a couple of years.
00:43:16.040 | But I think we need a couple more things
00:43:17.840 | for reinforcement learning to get there.
00:43:19.680 | - To actually get there, yeah.
00:43:20.600 | - One of my teams is looking into reinforcement learning
00:43:22.760 | for some robotic control tasks.
00:43:23.960 | So I see the applications,
00:43:25.280 | but if you look at it as a percentage of all of the impact
00:43:28.680 | of the types of things we do,
00:43:30.160 | it is, at least today, outside of playing video games
00:43:35.280 | and a few other games, limited in scope.
00:43:38.500 | Actually at NeurIPS, a bunch of us were standing around
00:43:40.980 | saying, "Hey, what's your best example
00:43:42.900 | "of an actual deployed reinforcement learning application?"
00:43:45.340 | And among senior machine learning researchers.
00:43:49.100 | And again, there are some emerging ones,
00:43:51.500 | but there are not that many great examples.
00:43:55.340 | - I think you're absolutely right.
00:43:58.300 | The sad thing is there hasn't been a big
00:44:02.020 | real-world application impact from reinforcement learning.
00:44:04.900 | I think its biggest impact to me has been in the toy domain,
00:44:09.460 | in the game domain, in the small example.
00:44:11.400 | That's what I mean: for educational purposes,
00:44:13.660 | It seems to be a fun thing to explore neural networks with.
00:44:16.920 | But I think from your perspective,
00:44:19.160 | and I think that might be the best perspective,
00:44:21.980 | is if you're trying to educate with a simple example
00:44:24.820 | in order to illustrate how this can actually be grown
00:44:28.440 | to scale and have a real-world impact,
00:44:31.760 | then perhaps focusing on the fundamentals
00:44:33.780 | of supervised learning in the context of a simple data set,
00:44:38.780 | even like an MNIST data set is the right way,
00:44:41.940 | is the right path to take.
00:44:43.980 | I just, the amount of fun I've seen people have
00:44:46.920 | with reinforcement learning has been great,
00:44:48.540 | but not in the applied impact on the real-world setting.
00:44:52.900 | So it's a trade-off, how much impact you want to have
00:44:55.500 | versus how much fun you want to have.
00:44:57.420 | - Yeah, that's really cool.
00:44:58.260 | And I feel like the world actually needs all sorts.
00:45:01.340 | Even within machine learning,
00:45:02.660 | I feel like deep learning is so exciting,
00:45:05.900 | but the AI team shouldn't just use deep learning.
00:45:08.500 | I find that my teams use a portfolio of tools.
00:45:11.780 | And maybe that's not the exciting thing to say,
00:45:13.540 | but some days we use a neural net,
00:45:15.740 | some days we use PCA.
00:45:20.100 | Actually, the other day I was sitting down with my team
00:45:21.620 | looking at PCA residuals,
00:45:22.860 | trying to figure out what's going on
00:45:23.940 | with PCA applied to a manufacturing problem.
00:45:25.860 | And some days we use a probabilistic graphical model,
00:45:28.340 | some days we use a knowledge graph,
00:45:29.940 | which is one of the things
00:45:30.780 | that has tremendous industry impact,
00:45:33.100 | but the amount of chatter about knowledge graphs
00:45:35.740 | in academia is really thin
00:45:37.380 | compared to the actual real-world impact.
00:45:39.700 | So I think reinforcement learning
00:45:41.460 | should be in that portfolio,
00:45:42.660 | and it's about balancing how much we teach
00:45:44.460 | all of these things.
00:45:45.340 | And the world should have diverse skills.
00:45:47.940 | It'd be sad if everyone just learned one narrow thing.
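As a side note on the PCA-residual remark above, a minimal sketch of using PCA reconstruction error as a simple anomaly signal on manufacturing measurements might look like this. The stand-in data, the number of components, and the threshold are all assumptions made purely for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for real manufacturing measurements: rows = parts, columns = sensor readings.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))

pca = PCA(n_components=5).fit(X)                 # keep the main modes of variation
X_hat = pca.inverse_transform(pca.transform(X))  # reconstruct from those components
residuals = np.linalg.norm(X - X_hat, axis=1)    # what PCA could not explain

# Flag parts whose residual is unusually large (threshold is an illustrative choice).
threshold = residuals.mean() + 3 * residuals.std()
suspicious = np.where(residuals > threshold)[0]
print(f"{len(suspicious)} parts flagged for a closer look")
```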
00:45:51.580 | - Yeah, the diverse skills
00:45:52.460 | help you discover the right tool for the job.
00:45:55.300 | What is the most beautiful, surprising,
00:45:57.380 | or inspiring idea in deep learning to you?
00:46:00.780 | Something that captivated your imagination.
00:46:04.660 | Is it the scale that could be,
00:46:07.220 | the performance that could be achieved with scale,
00:46:09.060 | or is there other ideas?
00:46:10.420 | - I think that if my only job
00:46:14.420 | was being an academic researcher,
00:46:18.180 | and had an unlimited budget,
00:46:21.900 | and didn't have to worry about short-term impact
00:46:23.860 | and only focus on long-term impact,
00:46:25.460 | I'd probably spend all my time doing research
00:46:25.460 | on unsupervised learning.
00:46:27.580 | I still think unsupervised learning is a beautiful idea.
00:46:30.380 | At both this past NeurIPS and ICML,
00:46:34.500 | I was attending workshops
00:46:36.140 | or listening to various talks
00:46:37.540 | about self-supervised learning,
00:46:39.380 | which is one vertical segment,
00:46:41.620 | maybe, of sort of unsupervised learning
00:46:43.380 | that I'm excited about.
00:46:45.260 | Maybe just to summarize the idea,
00:46:46.540 | I guess you know the idea,
00:46:47.500 | but I'll describe briefly.
00:46:48.580 | - No, please.
00:46:49.420 | - So here's the example of self-supervised learning.
00:46:52.100 | Let's say we grab a lot of unlabeled images
00:46:54.980 | off the internet.
00:46:55.820 | We have infinite amounts of this type of data.
00:46:58.260 | I'm gonna take each image
00:46:59.380 | and rotate it by a random multiple of 90 degrees.
00:47:03.100 | And then I'm going to train a supervised neural network
00:47:06.340 | to predict what was the original orientation.
00:47:09.060 | So has this been rotated 90 degrees,
00:47:10.980 | 180 degrees, 270 degrees, or zero degrees?
00:47:14.380 | So you can generate an infinite amount of labeled data
00:47:17.700 | because you rotated the image,
00:47:19.060 | so you know what's the ground truth label.
00:47:21.420 | And so various researchers have found
00:47:24.380 | that by taking unlabeled data
00:47:26.540 | and making up labeled datasets
00:47:28.460 | and training a large neural network on these tasks,
00:47:31.580 | you can then take the hidden layer representation
00:47:33.820 | and transfer it to a different task very powerfully.
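To make the rotation idea concrete, here is a minimal Keras sketch of that pretext task: rotate unlabeled images by a random multiple of 90 degrees, train a network to predict the rotation, then reuse the hidden representation. The architecture, sizes, and the stand-in random `images` array are placeholder assumptions, not the setup any particular group used.

```python
import numpy as np
import tensorflow as tf

def make_rotation_dataset(images):
    """Turn unlabeled images into a 4-way supervised task: predict the rotation."""
    ks = np.random.randint(0, 4, size=len(images))            # 0, 90, 180, 270 degrees
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, ks)])
    return rotated, ks                                          # labels come for free

# 'images' is a stand-in for unlabeled data scraped from anywhere.
images = np.random.rand(1000, 32, 32, 3).astype("float32")
x, y = make_rotation_dataset(images)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(name="features"),   # representation to transfer
    tf.keras.layers.Dense(4, activation="softmax"),            # which of 4 rotations?
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=1, batch_size=64)

# The hidden representation can then be reused for a downstream task.
encoder = tf.keras.Model(model.input, model.get_layer("features").output)
```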
00:47:37.300 | Learning word embeddings,
00:47:39.660 | where we take a sentence, delete a word,
00:47:41.380 | and predict the missing word, which is
00:47:44.100 | one of the ways we learn word embeddings,
00:47:46.140 | is another example.
00:47:47.780 | And I think there's now this portfolio of techniques
00:47:51.100 | for generating these made-up tasks.
00:47:53.740 | Another one called Jigsaw would be if you take an image,
00:47:57.380 | cut it up into a three by three grid,
00:47:59.820 | so like a nine-piece, three by three puzzle,
00:48:02.100 | jumble up the nine pieces and have a neural network predict
00:48:05.140 | which of the nine factorial possible permutations
00:48:08.460 | it came from.
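And, with the same caveat, a short sketch of the data side of that jigsaw task: cut the image into a 3x3 grid, scramble the tiles with one of a fixed pool of permutations, and have a network predict which permutation was used. Using a small fixed pool rather than all nine-factorial orderings is an assumption here, though it mirrors common practice.

```python
import numpy as np

# Illustrative assumption: a small fixed pool of permutations rather than all 9!.
PERMUTATIONS = [np.random.permutation(9) for _ in range(64)]

def make_jigsaw_example(image):
    """Cut an (H, W, C) image into a 3x3 grid, scramble it, return (tiles, label)."""
    h, w = image.shape[0] // 3, image.shape[1] // 3
    tiles = [image[i * h:(i + 1) * h, j * w:(j + 1) * w]
             for i in range(3) for j in range(3)]
    label = np.random.randint(len(PERMUTATIONS))               # which scramble was used
    scrambled = [tiles[k] for k in PERMUTATIONS[label]]
    return np.stack(scrambled), label                          # a network would predict 'label'

tiles, label = make_jigsaw_example(np.random.rand(96, 96, 3))
print(tiles.shape, label)                                      # (9, 32, 32, 3) and a class id
```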
00:48:09.940 | So many groups, including OpenAI,
00:48:13.700 | Pieter Abbeel's been doing some work on this too,
00:48:16.860 | Facebook, Google Brain, I think DeepMind,
00:48:20.220 | oh, actually Aäron van den Oord has great work
00:48:23.060 | on the CPC objective.
00:48:24.460 | So many teams are doing exciting work
00:48:26.220 | and I think this is a way to generate infinite labeled data.
00:48:30.500 | And I find this a very exciting piece
00:48:33.340 | of unsupervised learning.
00:48:34.180 | - So long-term you think that's going to unlock
00:48:37.260 | a lot of power in machine learning systems
00:48:40.060 | is this kind of unsupervised learning?
00:48:42.380 | - I don't think it's the whole enchilada.
00:48:44.020 | I think it's just a piece of it.
00:48:45.180 | And I think this one piece, self-supervised learning
00:48:48.860 | is starting to get traction.
00:48:50.300 | We're very close to it being useful.
00:48:53.300 | Well, word embeddings is really useful.
00:48:55.540 | I think we're getting closer and closer
00:48:57.180 | to just having a significant real world impact
00:49:00.380 | maybe in computer vision and video.
00:49:02.340 | But I think this concept,
00:49:05.140 | and I think there'll be other concepts around it.
00:49:08.180 | There are other unsupervised learning things that I've worked on
00:49:10.660 | that I've been excited about.
00:49:12.220 | I was really excited about sparse coding
00:49:14.700 | and ICA, slow feature analysis.
00:49:17.620 | I think all of these are ideas that various of us
00:49:20.180 | were working on about a decade ago
00:49:21.820 | before we all got distracted
00:49:23.300 | by how well supervised learning was doing.
00:49:25.820 | - Yeah.
00:49:26.660 | So we would return to the fundamentals
00:49:29.540 | of representation learning
00:49:30.860 | that really started this movement of deep learning.
00:49:33.900 | - I think there's a lot more work that one could explore
00:49:35.940 | around this theme of ideas and other ideas
00:49:38.340 | to come up with better algorithms.
00:49:40.300 | - So if we could return to maybe talk quickly
00:49:44.020 | about the specifics of deeplearning.ai,
00:49:46.740 | the deep learning specialization perhaps.
00:49:49.540 | How long does it take to complete the course,
00:49:51.340 | would you say?
00:49:52.740 | - The official length of the deep learning specialization
00:49:55.300 | is I think 16 weeks, so about four months,
00:49:59.020 | but it's go at your own pace.
00:50:00.700 | So if you subscribe to the deep learning specialization,
00:50:03.660 | there are people that finish that in less than a month
00:50:05.820 | by working more intensely and studying more intensely.
00:50:08.100 | So it really depends on the individual.
00:50:10.740 | Yeah, when we created the deep learning specialization,
00:50:13.460 | we wanted to make it very accessible and very affordable.
00:50:18.100 | And with Coursera and deeplearning.ai's education mission,
00:50:21.820 | one of the things that's really important to me
00:50:23.500 | is that if there's someone for whom paying anything
00:50:27.180 | is a financial hardship,
00:50:29.460 | then just apply for financial aid and get it for free.
00:50:32.740 | - If you were to recommend a daily schedule for people
00:50:38.100 | in learning, whether it's through
00:50:39.540 | the deeplearning.ai specialization
00:50:41.380 | or just learning in the world of deep learning,
00:50:44.180 | what would you recommend?
00:50:46.900 | How would they go about day-to-day sort of specific advice
00:50:50.300 | about learning, about their journey
00:50:52.700 | in the world of deep learning, machine learning?
00:50:54.900 | - I think getting the habit of learning is key,
00:50:59.140 | and that means regularity.
00:51:01.220 | So for example, we send out our weekly newsletter,
00:51:06.660 | The Batch, every Wednesday.
00:51:08.100 | So people know it's coming Wednesday,
00:51:09.660 | you can spend a little bit of time on Wednesday,
00:51:11.660 | catching up on the latest news through The Batch
00:51:14.500 | on Wednesday.
00:51:17.500 | And for myself, I've picked up a habit
00:51:20.020 | of spending some time every Saturday
00:51:22.620 | and every Sunday reading or studying.
00:51:24.660 | And so I don't wake up on a Saturday
00:51:26.700 | and have to make a decision.
00:51:27.780 | Do I feel like reading or studying today or not?
00:51:30.220 | It's just what I do.
00:51:31.740 | And the fact that it's a habit makes it easier.
00:51:34.260 | So I think if someone can get into that habit,
00:51:37.580 | it's like, you know, just like we brush our teeth
00:51:40.180 | every morning, I don't think about it.
00:51:42.140 | If I thought about it, it's a little bit annoying
00:51:43.620 | to have to spend two minutes doing that,
00:51:46.060 | but it's a habit that takes no cognitive load,
00:51:49.260 | but this would be so much harder
00:51:50.500 | if we had to make a decision every morning.
00:51:53.180 | So, and actually that's the reason
00:51:54.780 | why I wear the same thing every day as well.
00:51:56.140 | It's just one less decision.
00:51:57.300 | I just get up and wear my blue shirt.
00:51:59.660 | So, but I think if you can get that habit,
00:52:01.300 | that consistency of studying,
00:52:02.980 | then it actually feels easier.
00:52:05.740 | - So yeah, it's kind of amazing.
00:52:08.420 | In my own life, like I play guitar every day for,
00:52:12.780 | I force myself to at least for five minutes play guitar.
00:52:15.660 | It's a ridiculously short period of time,
00:52:18.260 | but because I've gotten into that habit,
00:52:20.220 | it's incredible what you can accomplish
00:52:21.860 | in a period of a year or two years.
00:52:24.540 | You can become, you know, exceptionally good
00:52:28.380 | at certain aspects of a thing by just doing it every day
00:52:30.980 | for a very short period of time.
00:52:32.140 | It's kind of a miracle that that's how it works.
00:52:34.740 | It adds up over time.
00:52:36.300 | - Yeah, and I think it's often not about the bursts
00:52:39.700 | of effort and the all-nighters,
00:52:41.980 | because you can only do that a limited number of times.
00:52:44.340 | It's the sustained effort over a long time.
00:52:47.340 | I think, you know, reading two research papers
00:52:50.500 | is a nice thing to do,
00:52:52.060 | but the power is not reading two research papers.
00:52:54.340 | It's reading two research papers a week for a year.
00:52:57.620 | Then you've read a hundred papers
00:52:58.980 | and you actually learn a lot when you read a hundred papers.
00:53:02.100 | - So regularity and making learning a habit.
00:53:06.540 | Do you have general other study tips
00:53:10.300 | for particularly deep learning that people should,
00:53:14.100 | in their process of learning,
00:53:15.700 | is there some kind of recommendations
00:53:17.340 | or tips you have as they learn?
00:53:20.460 | - One thing I still do
00:53:22.220 | when I'm trying to study something really deeply
00:53:23.980 | is take handwritten notes.
00:53:26.500 | It varies.
00:53:27.340 | I know there are a lot of people
00:53:28.340 | that take the deep learning courses during a commute
00:53:32.100 | or something where it may be more awkward to take notes.
00:53:34.460 | So I know it may not work for everyone,
00:53:37.420 | but when I'm taking courses on Coursera,
00:53:40.420 | and I still take some every now and then,
00:53:42.380 | the most recent one I took
00:53:43.220 | was a course on clinical trials
00:53:45.060 | because I was interested about that.
00:53:46.340 | I got out my little Moleskine notebook
00:53:48.580 | and I was sitting at my desk,
00:53:49.580 | I was just taking down notes
00:53:51.100 | of what the instructor was saying.
00:53:52.260 | And that act, we know that that act of taking notes,
00:53:55.500 | preferably handwritten notes, increases retention.
00:53:59.420 | - So as you're sort of watching the video,
00:54:02.420 | just kind of pausing maybe
00:54:03.900 | and then taking the basic insights down on paper?
00:54:07.900 | - Yeah, so there've been a few studies.
00:54:10.100 | If you search online,
00:54:11.220 | you find some of these studies
00:54:12.780 | that taking handwritten notes,
00:54:15.220 | because handwriting is slower, as we were saying just now,
00:54:18.140 | it causes you to recode the knowledge in your own words more
00:54:23.220 | and that process of recoding promotes long-term retention.
00:54:26.820 | This is as opposed to typing, which is fine.
00:54:29.020 | Again, typing is better than nothing,
00:54:30.820 | or taking a class and not taking notes is better
00:54:32.860 | than not taking any class at all.
00:54:34.460 | But comparing handwritten notes and typing,
00:54:38.140 | you can usually type faster,
00:54:39.660 | for a lot of people, than you can handwrite notes.
00:54:41.540 | And so when people type,
00:54:43.060 | they're more likely to just transcribe verbatim
00:54:45.500 | what they heard,
00:54:46.380 | and that reduces the amount of recoding.
00:54:49.180 | And that actually results in less long-term retention.
00:54:52.540 | - I don't know what the psychological effect there is,
00:54:54.420 | but it's so true.
00:54:55.420 | There's something fundamentally different
00:54:57.020 | about writing, handwriting.
00:54:59.660 | I wonder what that is.
00:55:00.500 | I wonder if it is as simple
00:55:01.700 | as just the time it takes to write is slower.
00:55:04.420 | - Yeah, and because you can't write as many words,
00:55:08.180 | you have to take whatever they said
00:55:10.300 | and summarize it into fewer words.
00:55:12.060 | And that summarization process
00:55:13.500 | requires deeper processing of the meaning,
00:55:15.980 | which then results in better retention.
00:55:17.980 | - That's fascinating.
00:55:19.020 | - Oh, and I've spent, I think, because of Coursera,
00:55:22.500 | I've spent so much time studying pedagogy.
00:55:24.220 | It's actually one of my passions.
00:55:25.340 | I really love learning how to more efficiently
00:55:28.140 | help others learn.
00:55:30.260 | - Yeah, one of the things I do both when creating videos
00:55:33.620 | or when we write the batch is,
00:55:35.340 | I try to think, is one minute spent with us
00:55:39.220 | going to be a more efficient learning experience
00:55:42.020 | than one minute spent anywhere else?
00:55:43.900 | And we really try to, you know,
00:55:46.460 | make it time efficient for the learners
00:55:48.420 | 'cause, you know, everyone's busy.
00:55:50.140 | So when we're editing, I often tell my teams,
00:55:53.500 | every word needs to fight for its life.
00:55:55.300 | And if we can delete a word, let's just delete it
00:55:56.980 | and not wait, let's not waste the learners' time.
00:56:00.220 | - Wow, that's so, it's so amazing that you think that way
00:56:02.260 | 'cause there is millions of people that are impacted
00:56:04.300 | by your teaching and sort of that one minute spent
00:56:06.860 | has a ripple effect, right?
00:56:08.420 | Through years of time, which is fascinating to think about.
00:56:11.740 | How does one make a career
00:56:14.340 | out of an interest in deep learning?
00:56:16.020 | Do you have advice for people?
00:56:18.780 | We just talked about sort of the beginning, early steps,
00:56:21.460 | but if you want to make it an entire life's journey
00:56:24.420 | or at least a journey of a decade or two, how do you do it?
00:56:28.780 | - So most important thing is to get started.
00:56:31.220 | - Right, of course.
00:56:32.060 | - And I think in the early parts of a career,
00:56:35.500 | coursework, like the deep learning specialization,
00:56:38.820 | is a very efficient way to master this material.
00:56:43.900 | So, because, you know, instructors,
00:56:47.420 | be it me or someone else, or, you know,
00:56:49.340 | Lawrence Moroney teaches our TensorFlow specialization
00:56:51.860 | and other things we're working on,
00:56:53.420 | spend effort to try to make it time efficient
00:56:56.220 | for you to learn a new concept.
00:56:58.140 | So coursework is actually a very efficient way
00:57:01.100 | for people to learn concepts
00:57:02.780 | in the beginning parts of breaking into a new field.
00:57:05.460 | In fact, one thing I see at Stanford,
00:57:08.980 | some of my PhD students want to jump
00:57:10.780 | into research right away.
00:57:11.860 | And I actually tend to say, look,
00:57:13.620 | in your first couple of years as a PhD student,
00:57:15.460 | spend time taking courses, 'cause it lays a foundation.
00:57:18.420 | It's fine if you're less productive
00:57:20.100 | in your first couple of years.
00:57:21.140 | You'll be better off in the long term.
00:57:23.860 | Beyond a certain point,
00:57:24.980 | there's material that doesn't exist in courses
00:57:27.820 | because it's too cutting edge.
00:57:28.940 | The course hasn't been created yet.
00:57:30.220 | There's some practical experience
00:57:31.460 | that we're not yet that good at teaching in a course.
00:57:34.660 | And I think after exhausting the efficient coursework,
00:57:37.780 | then most people need to go on
00:57:40.420 | to either ideally work on projects,
00:57:44.620 | and then maybe also continue their learning
00:57:47.220 | by reading blog posts and research papers
00:57:49.700 | and things like that.
00:57:51.140 | Doing projects is really important.
00:57:53.020 | And again, I think it's important to start small
00:57:56.700 | and to just do something.
00:57:58.380 | Today, you read about deep learning,
00:57:59.540 | it feels like, oh, all these people
00:58:00.380 | are doing such exciting things.
00:58:01.820 | What if I'm not building a neural network
00:58:03.620 | that changes the world?
00:58:04.460 | Then what's the point?
00:58:05.300 | Well, the point is sometimes building
00:58:07.100 | that tiny neural network,
00:58:08.780 | be it MNIST or an upgrade to Fashion MNIST, or whatever.
00:58:13.060 | So doing your own fun hobby project.
00:58:15.380 | That's how you gain the skills
00:58:16.740 | to let you do bigger and bigger projects.
00:58:18.940 | I find this to be true at the individual level
00:58:21.260 | and also at the organizational level.
00:58:23.780 | For a company to become good at machine learning,
00:58:25.540 | sometimes the right thing to do
00:58:26.860 | is not to tackle the giant project,
00:58:29.900 | is instead to do the small project
00:58:31.940 | that lets the organization learn
00:58:34.060 | and then build out from there.
00:58:35.340 | But this is true both for individuals and for companies.
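For anyone taking that advice literally, a tiny first project along those lines might look like the sketch below, using the MNIST loader built into Keras; every choice here (layer sizes, epochs) is just an illustrative starting point, and swapping in Fashion-MNIST is a one-line change.

```python
import tensorflow as tf

# A deliberately tiny "first project" network on MNIST; every choice here is illustrative.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))
# Swap in tf.keras.datasets.fashion_mnist for the "upgrade" mentioned above.
```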
00:58:39.500 | - So taking the first step
00:58:41.420 | and then taking small steps is the key.
00:58:45.260 | Should students pursue a PhD, do you think?
00:58:47.860 | You can do so much.
00:58:49.300 | That's one of the fascinating things in machine learning.
00:58:51.540 | You can have so much impact without ever getting a PhD.
00:58:54.860 | So what are your thoughts?
00:58:56.420 | Should people go to grad school?
00:58:57.740 | Should people get a PhD?
00:58:59.780 | - I think that there are multiple good options
00:59:02.100 | of which doing a PhD could be one of them.
00:59:05.380 | I think that if someone's admitted to a top PhD program,
00:59:08.860 | you know, at MIT, Stanford, top schools,
00:59:12.140 | I think that's a very good experience.
00:59:15.700 | Or if someone gets a job at a top organization,
00:59:19.180 | at the top AI team,
00:59:20.540 | I think that's also a very good experience.
00:59:24.060 | There are some things you still need a PhD to do.
00:59:25.980 | If someone's aspiration is to be a professor,
00:59:27.780 | you know, at the top academic university,
00:59:29.180 | you just need a PhD to do that.
00:59:31.140 | But if your goal is to, you know, start a company,
00:59:33.380 | build a company, do great technical work,
00:59:35.420 | I think a PhD is a good experience.
00:59:37.740 | But I would look at the different options
00:59:40.340 | available to someone, you know,
00:59:41.540 | where are the places where you can get a job?
00:59:43.100 | Where are the places you can get in a PhD program?
00:59:45.060 | And kind of weigh the pros and cons of those.
00:59:47.660 | - So just to linger on that for a little bit longer,
00:59:50.060 | what final dreams and goals do you think people should have?
00:59:53.020 | So what options should they explore?
00:59:57.420 | So you can work in industry, so for a large company,
01:00:01.180 | like Google, Facebook, Baidu,
01:00:03.580 | all these large sort of companies
01:00:06.140 | that already have huge teams of machine learning engineers.
01:00:09.300 | You can also do within industry,
01:00:11.020 | sort of more research groups
01:00:12.340 | that kind of like Google Research, Google Brain.
01:00:15.180 | Then you can also do, like we said,
01:00:17.380 | a professor in academia.
01:00:20.420 | And what else?
01:00:21.940 | Oh, you can build your own company.
01:00:23.980 | You can do a startup.
01:00:25.180 | Is there anything that stands out between those options
01:00:28.620 | or are they all beautiful different journeys
01:00:30.900 | that people should consider?
01:00:32.780 | - I think the thing that affects your experience more
01:00:34.820 | is less are you in this company versus that company
01:00:38.180 | or academia versus industry.
01:00:40.140 | I think the thing that affects your experience most
01:00:41.660 | is who are the people you're interacting with
01:00:43.740 | on a daily basis.
01:00:45.500 | So even if you look at some of the large companies,
01:00:49.540 | the experience of individuals in different teams
01:00:51.900 | is very different.
01:00:53.060 | And what matters most is not the logo above the door
01:00:56.260 | when you walk into the giant building every day.
01:00:58.460 | What matters the most is who are the 10 people,
01:01:00.620 | who are the 30 people you interact with every day.
01:01:03.260 | So I actually tend to advise people,
01:01:04.980 | if you get a job from a company,
01:01:07.540 | ask who is your manager, who are your peers,
01:01:10.300 | who are you actually gonna talk to?
01:01:11.460 | We're all social creatures.
01:01:12.580 | We tend to become more like the people around us.
01:01:15.500 | And if you're working with great people,
01:01:17.620 | you will learn faster.
01:01:19.180 | Or if you get admitted,
01:01:20.660 | if you get a job at a great company or a great university,
01:01:24.260 | maybe the logo you walk in is great,
01:01:26.860 | but you're actually stuck on some team
01:01:28.340 | doing work that really doesn't excite you.
01:01:31.300 | And then that's actually a really bad experience.
01:01:33.780 | So this is true both for universities
01:01:36.340 | and for large companies.
01:01:38.140 | For small companies, you can kind of figure out
01:01:39.860 | who you'd be working with quite quickly.
01:01:41.980 | And I tend to advise people,
01:01:43.860 | if a company refuses to tell you who you work with,
01:01:47.820 | and someone says, "Oh, join us.
01:01:49.660 | "The rotation system, we'll figure it out,"
01:01:49.660 | I think that's a worrying answer
01:01:51.700 | because it means you may not get sent to,
01:01:56.140 | you may not actually get to a team
01:01:58.340 | with great peers and great people to work with.
01:02:00.940 | - It's actually really profound advice
01:02:02.660 | that we kind of sometimes sweep aside.
01:02:05.140 | We don't consider it too rigorously or carefully.
01:02:08.660 | The people around you are really often,
01:02:11.500 | especially when you accomplish great things,
01:02:13.140 | it seems the great things are accomplished
01:02:14.700 | because of the people around you.
01:02:16.780 | So it's not about whether you learn this thing or that thing
01:02:21.780 | or like you said, the logo that hangs up top,
01:02:25.140 | it's the people.
01:02:25.980 | That's a fascinating,
01:02:27.540 | and it's such a hard search process
01:02:29.460 | of finding, just like finding the right friends
01:02:34.220 | and somebody to get married with and that kind of thing.
01:02:37.540 | It's a very hard search, it's a people search problem.
01:02:40.980 | - Yeah, but I think when someone interviews
01:02:43.740 | at a university or the research lab
01:02:45.340 | or the large corporation,
01:02:46.980 | it's good to insist on just asking who are the people?
01:02:50.300 | Who is my manager?
01:02:51.420 | And if you refuse to tell me, I'm gonna think,
01:02:53.900 | well, maybe that's 'cause you don't have a good answer.
01:02:55.700 | It may not be someone I like.
01:02:57.340 | - And if you don't particularly connect,
01:02:59.500 | if something feels off with the people,
01:03:02.420 | then don't stick to it.
01:03:06.380 | That's a really important signal to consider.
01:03:08.700 | - Yeah, yeah.
01:03:09.580 | And actually in my Stanford class, CS230,
01:03:13.420 | as well as an ACM talk,
01:03:14.620 | I think I gave like a hour long talk on career advice,
01:03:18.300 | including on the job search process and then some of these.
01:03:21.260 | So you can find those videos online.
01:03:23.300 | - Awesome, and I'll point them.
01:03:25.140 | I'll point people to them, beautiful.
01:03:27.180 | So the AI fund helps AI startups get off the ground,
01:03:33.420 | or perhaps you can elaborate
01:03:34.820 | on all the fun things it's involved with.
01:03:37.020 | What's your advice
01:03:37.860 | on how does one build a successful AI startup?
01:03:42.380 | - In Silicon Valley, a lot of startup failures
01:03:44.980 | come from building a product that no one wanted.
01:03:48.500 | So, you know, cool technology, but who's gonna use it?
01:03:53.460 | So I think I tend to be very outcome driven
01:03:57.700 | and customer obsessed.
01:03:59.500 | Ultimately, we don't get to vote if we succeed or fail.
01:04:04.140 | It's only the customer, they're the only one
01:04:07.020 | that gets a thumbs up or thumbs down vote in the long term.
01:04:09.620 | In the short term, there are various people
01:04:12.140 | that get various votes,
01:04:13.100 | but in the long term, that's what really matters.
01:04:16.340 | - So as you build a startup,
01:04:17.500 | you have to constantly ask the question,
01:04:19.820 | will the customer give a thumbs up on this?
01:04:24.260 | - I think so.
01:04:25.100 | I think startups that are very customer focused,
01:04:27.460 | customer obsessed, deeply understand the customer
01:04:30.420 | and are oriented to serve the customer
01:04:33.340 | are more likely to succeed.
01:04:36.580 | With the proviso that
01:04:37.420 | I think all of us should only do things
01:04:39.100 | that we think create social good
01:04:40.940 | and moves the world forward.
01:04:42.340 | So I personally don't wanna build
01:04:44.500 | addictive digital products just to sell a lot of ads.
01:04:47.900 | There are things that could be lucrative that I won't do,
01:04:50.740 | but if we can find ways to serve people in meaningful ways,
01:04:55.340 | I think those can be great things to do,
01:04:59.060 | either in the academic setting or in a corporate setting
01:05:01.580 | or a startup setting.
01:05:03.100 | - So can you give me the idea of why you started the AI fund?
01:05:08.740 | - I remember when I was leading the AI group at Baidu,
01:05:13.340 | I had two jobs, two parts of my job.
01:05:15.980 | One was to build an AI engine
01:05:17.460 | to support the existing businesses.
01:05:19.140 | And that was running, just ran, just performed by itself.
01:05:23.340 | The second part of my job at the time
01:05:24.860 | was to try to systematically initiate
01:05:27.340 | new lines of business
01:05:29.100 | using the company's AI capabilities.
01:05:31.140 | So the self-driving car team came out of my group,
01:05:34.500 | the smart speaker team,
01:05:37.140 | similar to what is Amazon Echo or Alexa in the US,
01:05:41.020 | but we actually announced it before Amazon did.
01:05:42.860 | So Baidu wasn't following Amazon.
01:05:46.980 | That came out of my group,
01:05:48.780 | and I found that to be actually the most fun part of my job.
01:05:53.380 | So what I wanted to do was to build AI fund
01:05:56.500 | as a startup studio
01:05:58.300 | to systematically create new startups from scratch.
01:06:02.700 | With all the things we can now do with AI,
01:06:04.940 | I think the ability to build new teams,
01:06:07.380 | to go after this rich space of opportunities
01:06:10.100 | is a very important way to,
01:06:12.340 | very important mechanism to get these projects done
01:06:14.980 | that I think will move the world forward.
01:06:16.580 | So I've been fortunate to have built a few teams
01:06:19.340 | that had a meaningful, positive impact.
01:06:21.660 | And I felt that we might be able to do this
01:06:25.140 | in a more systematic, repeatable way.
01:06:27.980 | So a startup studio is a relatively new concept.
01:06:31.540 | There are maybe dozens of startup studios right now,
01:06:35.780 | but I feel like all of us,
01:06:38.740 | many teams are still trying to figure out
01:06:40.940 | how do you systematically build companies
01:06:43.780 | with a high success rate?
01:06:45.460 | So I think even a lot of my venture capital friends
01:06:49.140 | seem to be more and more building companies
01:06:51.740 | rather than investing in companies.
01:06:53.140 | But I find it a fascinating thing to do
01:06:55.340 | to figure out the mechanisms
01:06:56.700 | by which we could systematically build successful teams,
01:06:59.700 | successful businesses in areas that we find meaningful.
01:07:03.420 | - So a startup studio is something,
01:07:05.820 | is a place and a mechanism for startups
01:07:09.260 | to go from zero to success,
01:07:11.220 | to try to develop a blueprint.
01:07:13.740 | - It's actually a place for us
01:07:14.820 | to build startups from scratch.
01:07:16.540 | So we often bring in founders and work with them,
01:07:21.300 | or maybe even have existing ideas
01:07:23.900 | that we match founders with.
01:07:26.620 | And then this launches, hopefully,
01:07:29.660 | into successful companies.
01:07:31.140 | - So how close are you to figuring out a way
01:07:34.580 | to automate the process of starting from scratch
01:07:38.460 | and building successful AI startup?
01:07:40.540 | - Yeah, I think we've been constantly improving
01:07:44.460 | and iterating on our processes, how we do that.
01:07:48.460 | So things like, how many customer calls do we need to make
01:07:51.340 | in order to get customer validation?
01:07:53.500 | How do we make sure this technology can be built?
01:07:55.260 | Quite a lot of our businesses need cutting edge
01:07:58.060 | machine learning algorithms.
01:07:59.140 | So kind of algorithms that are developed
01:08:00.660 | in the last one or two years.
01:08:02.580 | And even if it works in a research paper,
01:08:04.980 | it turns out taking it to production is really hard.
01:08:07.020 | There are a lot of issues for making these things work
01:08:09.500 | in the real life that are not widely addressed in academia.
01:08:14.100 | So how do we validate that this is actually doable?
01:08:17.220 | How do you build a team,
01:08:18.300 | get the specialized domain knowledge,
01:08:19.940 | be it in education or healthcare,
01:08:21.500 | or whatever sector we're focusing on?
01:08:23.140 | So I think we've actually
01:08:24.460 | been getting much better at giving the entrepreneurs
01:08:29.180 | a high success rate, but I think we're still,
01:08:32.140 | I think the whole world is still
01:08:33.820 | in the early phases of figuring this out.
01:08:35.500 | - But do you think there are some aspects of that process
01:08:39.180 | that are transferable from one startup to another,
01:08:41.620 | to another, to another?
01:08:43.220 | - Yeah, very much so.
01:08:45.020 | You know, starting a company to most entrepreneurs
01:08:47.700 | is a really lonely thing.
01:08:50.700 | And I've seen so many entrepreneurs not know
01:08:54.500 | how to make certain decisions.
01:08:56.300 | Like, when do you need to, how do you do B2B sales?
01:09:00.100 | If you don't know that, it's really hard.
01:09:02.380 | Or how do you market this efficiently
01:09:05.540 | other than buying ads, which is really expensive?
01:09:08.420 | Are there more efficient tactics for that?
01:09:10.060 | Or for a machine learning project,
01:09:13.060 | basic decisions can change the course
01:09:15.300 | of whether a machine learning product works or not.
01:09:18.460 | And so there are so many hundreds of decisions
01:09:21.100 | that entrepreneurs need to make,
01:09:22.700 | and making a mistake in a couple of key decisions
01:09:25.780 | can have a huge impact on the fate of the company.
01:09:30.260 | So I think a startup studio provides a support structure
01:09:33.100 | that makes starting a company
01:09:34.420 | much less of a lonely experience.
01:09:36.300 | And also when facing with these key decisions,
01:09:40.020 | like trying to hire your first VP of engineering,
01:09:44.820 | what's a good selection criteria?
01:09:46.460 | How do you source?
01:09:47.300 | Should I hire this person or not?
01:09:48.820 | By having an ecosystem around the entrepreneurs,
01:09:53.140 | the founders, to help,
01:09:54.700 | I think we help them at the key moments
01:09:57.460 | and hopefully make the process significantly more enjoyable
01:10:01.020 | and with a higher success rate.
01:10:02.540 | - So they have somebody to brainstorm with
01:10:04.700 | in these very difficult decision points.
01:10:08.020 | - And also to help them recognize
01:10:10.980 | what they may not even realize is a key decision point.
01:10:14.340 | Right?
01:10:15.180 | That's the first and probably the most important part.
01:10:17.300 | Yeah.
01:10:18.220 | - Actually, I can say one other thing.
01:10:20.100 | I think building companies is one thing,
01:10:23.900 | but I feel like it's really important
01:10:26.460 | that we build companies that move the world forward.
01:10:30.100 | For example, within the AI fund team,
01:10:32.580 | there was once an idea for a new company
01:10:35.620 | that if it had succeeded,
01:10:37.460 | would have resulted in people watching a lot more videos
01:10:40.220 | in a certain narrow vertical type of video.
01:10:43.340 | And I looked at it, the business case was fine,
01:10:45.700 | the revenue case was fine,
01:10:46.820 | but I looked at it and just said,
01:10:48.300 | "I don't want to do this.
01:10:50.020 | I don't actually just want to have a lot more people
01:10:52.500 | watch this type of video."
01:10:53.860 | Wasn't educational.
01:10:54.740 | It was educational, maybe.
01:10:56.340 | And so I cut the idea on the basis
01:10:59.980 | that I didn't think it would actually help people.
01:11:01.980 | So whether building companies or work of enterprises
01:11:05.460 | or doing personal projects,
01:11:06.700 | I think it's up to each of us to figure out
01:11:11.020 | what's the difference we want to make in the world.
01:11:14.140 | - With Landing AI,
01:11:15.340 | you help already established companies
01:11:17.100 | grow their AI and machine learning efforts.
01:11:20.220 | How does a large company integrate machine learning
01:11:22.980 | into their efforts?
01:11:24.020 | - AI is a general purpose technology,
01:11:27.700 | and I think it would transform every industry.
01:11:30.540 | Our community has already transformed, to a large extent,
01:11:33.820 | the software internet sector.
01:11:35.500 | Most software internet companies,
01:11:36.860 | even outside the top, right,
01:11:38.180 | five or six, or three or four,
01:11:40.100 | already have reasonable machine learning capabilities
01:11:43.340 | or are getting there.
01:11:44.220 | There's still room for improvement.
01:11:46.380 | But when I look outside the software internet sector,
01:11:49.220 | everything from manufacturing, agriculture,
01:11:51.620 | healthcare, logistics, transportation,
01:11:53.940 | there's so many opportunities
01:11:55.540 | that very few people are working on.
01:11:57.980 | So I think the next wave for AI
01:11:59.780 | is for us to also transform all of those other industries.
01:12:03.420 | There was a McKinsey study
01:12:04.620 | estimating $13 trillion of global economic growth.
01:12:09.700 | US GDP is $19 trillion,
01:12:11.780 | so 13 trillion is a big number,
01:12:13.340 | or PwC estimates $16 trillion.
01:12:16.060 | So whatever the number is, it's large.
01:12:18.420 | But the interesting thing to me was a lot of that impact
01:12:20.780 | would be outside the software internet sector.
01:12:23.740 | So we need more teams to work with these companies
01:12:28.060 | to help them adopt AI.
01:12:29.780 | And I think this is one of the things
01:12:30.900 | that'll help drive global economic growth
01:12:33.700 | and make humanity more powerful.
01:12:35.980 | - And like you said, the impact is there.
01:12:37.900 | So what are the best industries,
01:12:39.580 | the biggest industries where AI can help,
01:12:41.780 | perhaps outside the software tech sector?
01:12:44.500 | - Frankly, I think it's all of them.
01:12:46.300 | Some of the ones I'm spending a lot of time on
01:12:49.940 | are manufacturing, agriculture, looking into healthcare.
01:12:54.620 | For example, in manufacturing,
01:12:56.580 | we do a lot of work in visual inspection,
01:12:58.740 | where today there are people standing around
01:13:01.460 | using the eye, human eye,
01:13:02.940 | to check if this plastic part or the smartphone
01:13:05.820 | or this thing has a scratch or a dent or something in it.
01:13:09.460 | We can use a camera to take a picture,
01:13:12.500 | use an algorithm, deep learning and other things
01:13:15.500 | to check if it's defective or not,
01:13:17.900 | and thus help factories improve yield
01:13:20.540 | and improve quality and improve throughput.
01:13:23.660 | It turns out the practical problems we run into
01:13:25.820 | are very different than the ones you might read about
01:13:28.180 | in most research papers.
01:13:29.620 | The data sets are really small,
01:13:30.820 | so we face small data problems.
01:13:32.500 | You know, the factories keep on changing the environment,
01:13:35.860 | so it works well on your test set,
01:13:38.380 | but guess what?
01:13:39.420 | You know, something changes in the factory.
01:13:42.060 | The lights go on or off.
01:13:43.580 | Recently, there was a factory
01:13:45.180 | in which a bird flew through the factory
01:13:47.900 | and pooped on something,
01:13:48.940 | and so that changed stuff.
01:13:50.860 | And so increasing our algorithmic robustness
01:13:54.300 | to all the changes that happen in the factory,
01:13:57.100 | I find that we run into a lot of practical problems
01:13:59.300 | that are not as widely discussed in academia,
01:14:02.660 | and it's really fun kind of being on the cutting edge,
01:14:05.180 | solving these problems before, you know,
01:14:07.500 | maybe before many people are even aware
01:14:09.380 | that there is a problem there.
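To sketch what that visual inspection setup can look like in code: one common way to cope with the small-data problem mentioned above is to fine-tune a pretrained backbone rather than train from scratch. The directory name, image sizes, and model choice below are assumptions for illustration, not Landing AI's actual pipeline.

```python
import tensorflow as tf

# Illustrative small-data defect classifier: fine-tune a pretrained backbone.
# Assumes a hypothetical folder 'inspection_images/' with 'ok/' and 'defect/' subfolders.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "inspection_images/", image_size=(224, 224), batch_size=16, label_mode="binary")

base = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                         input_shape=(224, 224, 3), pooling="avg")
base.trainable = False                      # with tiny datasets, freeze the backbone first

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),   # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.Dense(1, activation="sigmoid"),      # defective or not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```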
01:14:10.460 | - And that's such a fascinating space.
01:14:12.340 | You're absolutely right,
01:14:13.300 | but what is the first step that a company should take?
01:14:16.660 | It's just a scary leap into this new world
01:14:19.460 | of going from the human eye inspecting
01:14:22.580 | to digitizing that process,
01:14:24.780 | having a camera, having an algorithm.
01:14:27.340 | What's the first step?
01:14:28.340 | Like, what's the early journey that you recommend
01:14:31.260 | that you see these companies taking?
01:14:33.580 | - I published a document
01:14:34.620 | called the "AI Transformation Playbook."
01:14:39.260 | It's available online and taught briefly
01:14:39.260 | in the AI for Everyone course on Coursera
01:14:41.620 | about the long-term journey that companies should take,
01:14:44.780 | but the first step is actually to start small.
01:14:47.580 | I've seen a lot more companies fail
01:14:49.500 | by starting too big than by starting too small.
01:14:53.340 | Take even Google.
01:14:54.700 | You know, most people don't realize how hard it was
01:14:57.620 | and how controversial it was in the early days.
01:15:00.620 | So when I started Google Brain, it was controversial.
01:15:04.460 | People thought deep learning,
01:15:06.180 | they'd tried it, it didn't work.
01:15:07.460 | Why would you want to do deep learning?
01:15:09.300 | So my first internal customer within Google
01:15:12.500 | was the Google speech team,
01:15:14.060 | which is not the most lucrative project in Google,
01:15:17.220 | not the most important.
01:15:18.420 | It's not web search or advertising,
01:15:20.740 | but by starting small,
01:15:22.860 | my team helped the speech team
01:15:25.980 | build a more accurate speech recognition system.
01:15:28.380 | And this caused their peers, other teams,
01:15:30.820 | to start to have more faith in deep learning.
01:15:33.060 | My second internal customer was the Google Maps team,
01:15:36.500 | where we used computer vision to read house numbers
01:15:39.660 | from basic street view images
01:15:41.140 | to more accurately locate houses within Google Maps,
01:15:43.740 | so improve the quality of the geodata.
01:15:45.940 | And it was only after those two successes
01:15:48.380 | that I then started a more serious conversation
01:15:50.660 | with the Google Ads team.
01:15:52.780 | - And so there's a ripple effect
01:15:54.220 | that you showed that it works in these cases,
01:15:56.900 | and then it just propagates through the entire company,
01:15:59.300 | that this thing has a lot of value and use for us.
01:16:02.980 | - I think the early small-scale projects,
01:16:05.300 | it helps the teams gain faith,
01:16:07.420 | but also helps the teams learn what these technologies do.
01:16:11.740 | I still remember when our first GPU server,
01:16:14.540 | it was a server under some guy's desk.
01:16:17.020 | And that taught us early important lessons
01:16:20.580 | about how do you have multiple users share a set of GPUs,
01:16:25.220 | which was really not obvious at the time,
01:16:27.180 | but those early lessons were important.
01:16:29.420 | We learned a lot from that first GPU server
01:16:32.140 | that later helped the teams think through
01:16:34.100 | how to scale it up to much larger deployments.
01:16:37.540 | - Are there concrete challenges that companies face
01:16:40.260 | that you see as important for them to solve?
01:16:43.900 | - I think building and deploying
01:16:45.300 | machine learning systems is hard.
01:16:47.340 | There's a huge gulf between something that works
01:16:49.820 | in a Jupyter notebook on your laptop
01:16:51.820 | versus something that runs
01:16:53.060 | in a production deployment setting
01:16:54.620 | in a factory or agriculture plant or whatever.
01:16:58.420 | So I see a lot of people get something to work
01:17:00.660 | on your laptop and say,
01:17:01.500 | "Oh, look what I've done."
01:17:02.340 | And that's great, that's hard.
01:17:03.980 | That's a very important first step,
01:17:05.860 | but a lot of teams underestimate the rest of the steps needed.
01:17:09.580 | So for example, I've heard this exact same conversation
01:17:12.500 | between a lot of machine learning people and business people.
01:17:15.220 | The machine learning person says,
01:17:16.860 | "Look, my algorithm does well on the test set.
01:17:20.900 | It's a clean test set, I didn't peek."
01:17:23.500 | And the business person says, "Thank you very much,
01:17:26.700 | but your algorithm sucks, it doesn't work."
01:17:29.380 | And the machine learning person says,
01:17:30.900 | "No, wait, I did well on the test set."
01:17:33.660 | And I think there is a gulf between what it takes
01:17:38.420 | to do well on a test set on your hard drive
01:17:40.660 | versus what it takes to work well in a deployment setting.
01:17:44.220 | Some common problems, robustness and generalization.
01:17:48.780 | You deploy something in the factory,
01:17:50.620 | maybe they chop down a tree outside the factory
01:17:52.660 | so the tree no longer covers a window
01:17:55.460 | and the lighting is different.
01:17:56.500 | So the test set changes.
01:17:57.820 | And in machine learning, and especially in academia,
01:18:01.300 | we don't know how to deal with test set distributions
01:18:04.060 | that are dramatically different
01:18:05.580 | than the training set distribution.
01:18:07.460 | You know, there's research,
01:18:08.660 | there's stuff like domain adaptation, transfer learning.
01:18:12.220 | You know, there are people working on it,
01:18:13.820 | but we're really not good at this.
01:18:15.580 | So how do you actually get this to work?
01:18:18.180 | Because your test set distribution is going to change.
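One pragmatic, admittedly crude, way teams cope with this is to monitor whether production inputs are drifting away from the training distribution, so a change like different lighting is at least flagged. The brightness statistic, the function name, and the threshold below are assumptions chosen only to illustrate the idea.

```python
import numpy as np

def brightness_drift(train_images, prod_images, z_thresh=3.0):
    """Crude drift check: has mean image brightness shifted from training to production?"""
    train_b = train_images.mean(axis=(1, 2))     # per-image brightness at training time
    prod_b = prod_images.mean(axis=(1, 2))       # per-image brightness in production
    z = abs(prod_b.mean() - train_b.mean()) / (train_b.std() + 1e-8)
    return z > z_thresh, z

# Stand-in data: production images are systematically brighter (e.g., lighting changed).
rng = np.random.default_rng(1)
train = rng.uniform(0.2, 0.6, size=(1000, 64, 64))
prod = rng.uniform(0.5, 0.9, size=(200, 64, 64))
drifted, score = brightness_drift(train, prod)
print(f"drift detected: {drifted} (z = {score:.1f})")
```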
01:18:21.700 | And I think also, if you look at the number of lines of code
01:18:25.780 | in the software system, the machine learning model
01:18:28.660 | is maybe 5% or even fewer relative
01:18:32.740 | to the entire software system you need to build.
01:18:35.540 | So how do you get all that work done
01:18:37.260 | and make it reliable and systematic?
01:18:38.940 | - So good software engineering work is fundamental here
01:18:42.700 | to building a successful small machine learning system.
01:18:46.380 | - Yes, and the software system needs to interface
01:18:49.340 | with people's workflows.
01:18:50.620 | So machine learning is automation on steroids.
01:18:54.060 | If we take one task out of many tasks
01:18:56.420 | that are done in the factory,
01:18:57.260 | so the factory does lots of things.
01:18:58.900 | One task is visual inspection.
01:19:00.820 | If we automate that one task, it can be really valuable,
01:19:03.980 | but you may need to redesign a lot of other tasks
01:19:06.180 | around that one task.
01:19:07.380 | For example, say the machine learning algorithm
01:19:09.860 | says this is defective.
01:19:11.020 | What are you supposed to do?
01:19:11.940 | Do you throw it away?
01:19:12.780 | Do you get a human to double check?
01:19:14.220 | Do you want to rework it or fix it?
01:19:17.020 | So you need to redesign a lot of tasks
01:19:18.620 | around that thing you've now automated.
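A toy sketch of that kind of routing decision, where the model's verdict feeds a downstream workflow rather than standing alone; the confidence thresholds and dispositions here are made-up illustrations of the design question, not a recommended policy.

```python
def route_part(defect_probability, low=0.2, high=0.9):
    """Decide what happens to a part after the model scores it (illustrative thresholds)."""
    if defect_probability >= high:
        return "scrap_or_rework"        # confident defect: pull it off the line
    if defect_probability <= low:
        return "ship"                   # confident pass: continue as normal
    return "human_review"               # uncertain: route to a person to double-check

for p in (0.05, 0.5, 0.97):
    print(p, "->", route_part(p))
```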
01:19:20.780 | So planning for the change management
01:19:23.340 | and making sure that the software you write
01:19:25.620 | is consistent with the new workflow.
01:19:27.460 | And you take the time to explain to people
01:19:28.980 | what needs to happen.
01:19:29.820 | So I think what Landing AI has become good at,
01:19:34.820 | and I think we learned this by making missteps
01:19:37.300 | and through painful experiences,
01:19:39.100 | what we've become good at is working with our partners
01:19:43.740 | to think through all the things beyond
01:19:46.500 | just the machine learning model,
01:19:48.220 | that you put in the notebook,
01:19:49.100 | but to build the entire system,
01:19:51.580 | manage the change process and figure out how to deploy this
01:19:54.380 | in a way that has an actual impact.
01:19:56.860 | The processes that the large software tech companies use
01:19:59.980 | for deploying don't work for a lot of other scenarios.
01:20:03.020 | For example, when I was leading large speech teams,
01:20:07.060 | if the speech recognition system goes down, what happens?
01:20:09.820 | Well, alarms go off and then someone like me would say,
01:20:12.220 | "Hey, you 20 engineers, please fix this."
01:20:15.060 | Right?
01:20:15.900 | And then we're good.
01:20:16.740 | But if you have a system go down in the factory,
01:20:19.380 | there are not 20 machine learning engineers
01:20:21.460 | sitting around that you can page and have them fix it.
01:20:23.940 | So how do you deal with the maintenance or the DevOps
01:20:27.460 | or the MLOps or the other aspects of this?
01:20:30.340 | So these are concepts that I think Landing AI
01:20:34.060 | and a few other teams are on the cutting edge of,
01:20:36.580 | but we don't even have systematic terminology yet
01:20:39.620 | to describe some of the stuff we do
01:20:41.060 | because I think we're inventing it on the fly.
01:20:43.380 | - So you mentioned some people are interested
01:20:46.700 | in discovering mathematical beauty and truth
01:20:48.820 | in the universe,
01:20:49.700 | and you're interested in having a big positive impact
01:20:53.900 | in the world.
01:20:55.140 | So let me ask--
01:20:55.980 | - The two are not inconsistent.
01:20:57.340 | - No, they're all together.
01:20:58.820 | I'm only half joking
01:21:00.980 | 'cause you're probably interested a little bit in both.
01:21:03.580 | But let me ask a romanticized question.
01:21:06.140 | So much of the work, your work and our discussion today
01:21:09.580 | has been on applied AI.
01:21:12.100 | Maybe you can even call narrow AI,
01:21:14.620 | where the goal is to create systems
01:21:15.900 | that automate some specific process
01:21:17.560 | that adds a lot of value to the world.
01:21:19.820 | But there's another branch of AI,
01:21:21.260 | starting with Alan Turing,
01:21:22.940 | that kind of dreams of creating human level
01:21:25.660 | or superhuman level intelligence.
01:21:27.500 | Is this something you dream of as well?
01:21:30.500 | Do you think we human beings will ever build
01:21:33.300 | a human level intelligence
01:21:34.620 | or superhuman level intelligence system?
01:21:37.300 | - I would love to get to AGI,
01:21:38.820 | and I think humanity will,
01:21:40.980 | but whether it takes a hundred years or 500 or 5,000,
01:21:45.140 | I find hard to estimate.
01:21:48.040 | - Do you have,
01:21:49.560 | so some folks have worries about the different trajectories
01:21:53.260 | that path would take,
01:21:54.480 | even existential threats of an AGI system.
01:21:57.560 | Do you have such concerns,
01:21:59.740 | whether in the short term or the long term?
01:22:02.380 | - I do worry about the long-term fate of humanity.
01:22:06.920 | I do wonder as well,
01:22:09.400 | I do worry about overpopulation on the planet Mars,
01:22:13.320 | just not today.
01:22:14.360 | I think there will be a day
01:22:16.040 | when maybe someday in the future,
01:22:18.620 | Mars will be polluted,
01:22:20.180 | there are all these children dying,
01:22:21.680 | and someone will look back at this video and say,
01:22:23.260 | "Andrew, how was Andrew so heartless?
01:22:25.060 | He didn't care about all these children
01:22:26.660 | dying on the planet Mars."
01:22:28.260 | And I apologize to the future viewer,
01:22:30.380 | I do care about the children,
01:22:32.040 | but I just don't know how to productively work on that today.
01:22:35.700 | - Your picture will be in the dictionary
01:22:37.980 | for the people who are ignorant
01:22:39.300 | about the overpopulation on Mars.
01:22:41.700 | Yes, so it's a long-term problem.
01:22:44.460 | Is there something in the short term
01:22:45.940 | we should be thinking about
01:22:47.700 | in terms of aligning the values of our AI systems
01:22:50.620 | with the values of us humans?
01:22:54.420 | Sort of something that Stuart Russell
01:22:56.580 | and other folks are thinking about
01:22:58.220 | as this system develops more and more,
01:23:00.660 | we want to make sure that it represents
01:23:03.380 | the better angels of our nature,
01:23:05.140 | the ethics, the values of our society.
01:23:08.720 | - You know, if you take self-driving cars,
01:23:12.940 | the biggest problem with self-driving cars
01:23:14.340 | is not that there's some trolley dilemma,
01:23:17.940 | and you teach this, so you know,
01:23:19.380 | how many times, when you were driving your car,
01:23:21.900 | did you face this moral dilemma?
01:23:23.500 | Who do I crash into?
01:23:25.900 | So I think self-driving cars will run into that problem
01:23:28.380 | roughly as often as we do when we drive our cars.
01:23:31.220 | The biggest problem with self-driving cars
01:23:33.460 | is when there's a big white truck across the road,
01:23:35.860 | and what you should do is brake and not crash into it,
01:23:38.300 | and the self-driving car fails and it crashes into it.
01:23:41.260 | So I think we need to solve that problem first.
01:23:43.620 | I think the problem with some of these discussions
01:23:45.740 | about AGI, you know, alignment, the paperclip problem,
01:23:50.740 | is that it's a huge distraction from the much harder problems
01:23:56.340 | that we actually need to address today.
01:23:58.940 | Some of the hard problems we need to address today,
01:24:00.620 | I think bias is a huge issue.
01:24:03.780 | I worry about wealth inequality.
01:24:05.620 | AI and internet are causing an acceleration
01:24:09.340 | of concentration of power,
01:24:10.820 | because we can now centralize data, use AI to process it.
01:24:14.420 | And so industry after industry,
01:24:15.900 | we've affected every industry.
01:24:17.620 | So the internet industry has a lot of winner-take-most
01:24:20.460 | or winner-take-all dynamics,
01:24:22.140 | but we've infected all these other industries.
01:24:24.380 | So we're also giving these other industries
01:24:26.260 | winner-take-most or winner-take-all flavors.
01:24:28.740 | So look at what Uber and Lyft did to the taxi industry.
01:24:32.580 | So we're doing this type of thing to a lot of...
01:24:34.180 | So we're creating tremendous wealth,
01:24:36.500 | but how do we make sure that the wealth is fairly shared?
01:24:39.980 | I think that...
01:24:41.220 | And then how do we help people whose jobs are displaced?
01:24:45.300 | I think education is part of it.
01:24:47.020 | There may be even more that we need to do than education.
01:24:50.620 | I think bias is a serious issue.
01:24:55.100 | There are adverse uses of AI,
01:24:57.180 | like deepfakes being used for various nefarious purposes.
01:25:00.580 | So I worry about some teams, maybe accidentally,
01:25:05.580 | and I hope not deliberately,
01:25:07.860 | making a lot of noise about things,
01:25:11.060 | problems in the distant future,
01:25:13.180 | rather than focusing on some of the much harder problems.
01:25:15.860 | - Yeah, they overshadow the problems
01:25:17.700 | that we have already today.
01:25:18.820 | They're exceptionally challenging, like those you said,
01:25:21.300 | and even the silly ones,
01:25:22.620 | but the ones that have a huge impact,
01:25:24.620 | which is the lighting variation
01:25:26.020 | outside of your factory window.
01:25:28.020 | That ultimately is what makes the difference
01:25:31.380 | between, like you said, the Jupyter notebook
01:25:33.220 | and something that actually transforms
01:25:35.500 | an entire industry, potentially.
01:25:37.500 | - Yeah, and I think, for
01:25:40.060 | some companies, when a regulator comes to you and says,
01:25:43.220 | "Look, your product is messing things up.
01:25:45.860 | "Fixing it may have a revenue impact."
01:25:47.860 | Well, it's much more fun to talk to them
01:25:49.460 | about how you promised not to wipe out humanity
01:25:51.860 | than to face the actually really hard problems we face.
01:25:54.820 | - So your life has been a great journey,
01:25:57.620 | from teaching to research to entrepreneurship.
01:26:00.700 | Two questions.
01:26:01.940 | One, are there regrets,
01:26:04.140 | moments that if you went back, you would do differently?
01:26:07.100 | And two, are there moments you're especially proud of,
01:26:10.820 | moments that made you truly happy?
01:26:13.260 | - You know, I've made so many mistakes.
01:26:15.740 | It feels like every time I discover something,
01:26:20.260 | I go, "Why didn't I think of this five years earlier
01:26:25.620 | "or even 10 years earlier?"
01:26:27.300 | And sometimes I read a book and I go,
01:26:32.940 | "I wish I read this book 10 years ago.
01:26:34.900 | "My life would have been so different."
01:26:36.500 | Although that happened recently.
01:26:37.700 | And then I was thinking,
01:26:38.860 | "If only I had read this book when we were starting up Coursera,
01:26:41.660 | "it could have been so much better."
01:26:43.900 | But I discovered the book had not yet been written
01:26:45.820 | when we were starting Coursera,
01:26:46.740 | so that made me feel better.
01:26:48.140 | But I find that the process of discovery,
01:26:53.220 | we keep on finding out things
01:26:54.580 | that seem so obvious in hindsight,
01:26:56.740 | but it always takes us so much longer
01:26:59.460 | than I wish to figure it out.
01:27:02.180 | - So on the second question,
01:27:06.380 | are there moments in your life that,
01:27:09.300 | if you look back, that you're especially proud of
01:27:12.460 | or you're especially happy,
01:27:14.180 | that filled you with happiness and fulfillment?
01:27:18.100 | - Well, two answers.
01:27:19.940 | One is my daughter Nova.
01:27:21.420 | - Yes, of course.
01:27:22.380 | - It's like, "No matter how much time I spend with her,
01:27:23.860 | "I just can't spend enough time with her."
01:27:25.420 | - Congratulations, by the way.
01:27:26.540 | - Thank you.
01:27:27.540 | And then second is helping other people.
01:27:29.580 | I think, to me, I think the meaning of life
01:27:32.260 | is helping others achieve whatever are their dreams.
01:27:36.860 | And then also to try to move the world forward
01:27:40.140 | by making humanity more powerful as a whole.
01:27:43.700 | So the times that I felt most happy and most proud
01:27:46.540 | was when I felt someone else allowed me the good fortune
01:27:51.540 | of helping them a little bit on the path to their dreams.
01:27:56.140 | - I think there's no better way to end it
01:27:58.660 | than talking about happiness and the meaning of life.
01:28:00.860 | So Andrew, it's a huge honor.
01:28:03.100 | Me and millions of people thank you
01:28:04.580 | for all the work you've done.
01:28:05.820 | Thank you for talking today.
01:28:07.020 | - Thank you so much, thanks.
01:28:08.820 | - Thanks for listening to this conversation with Andrew Ng.
01:28:11.740 | And thank you to our presenting sponsor, Cash App.
01:28:14.740 | Download it, use code LEXPODCAST.
01:28:17.300 | You'll get $10 and $10 will go to FIRST,
01:28:20.220 | an organization that inspires and educates young minds
01:28:23.340 | to become science and technology innovators of tomorrow.
01:28:26.580 | If you enjoy this podcast, subscribe on YouTube,
01:28:29.540 | give it five stars on Apple Podcasts,
01:28:31.580 | support on Patreon, or simply connect with me on Twitter
01:28:35.060 | at Lex Friedman.
01:28:36.180 | And now let me leave you with some words of wisdom
01:28:39.260 | from Andrew Ng.
01:28:40.300 | Ask yourself if what you're working on
01:28:43.900 | succeeds beyond your wildest dreams,
01:28:46.340 | would you have significantly helped other people?
01:28:48.840 | If not, then keep searching for something else to work on.
01:28:53.220 | Otherwise, you're not living up to your full potential.
01:28:57.900 | Thank you for listening and hope to see you next time.
01:29:01.020 | (upbeat music)
01:29:03.600 | (upbeat music)