Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35
Chapters
0:01 Jeremy Howard
1:17 What's the First Program You've Ever Written
3:09 Programming Languages
4:36 The Connection between Excel and Access
9:24 Array-Oriented Languages
23:36 The Origin Story of fast.ai
40:57 The Difference between Theory and Practice of Deep Learning
41:51 Transfer Learning
59:28 Super Convergence
62:08 The Future of Learning Rate Magic
66:16 Different Cloud Options for Training
69:13 Deep Learning Frameworks
92:52 What Is Spaced Repetition
93:56 Spaced Repetition Learning
97:59 Advice for People Learning New Things
100:06 Next Big Breakthrough in Artificial Intelligence
The following is a conversation with Jeremy Howard. 00:00:03.160 |
He's the founder of Fast AI, a research institute 00:00:06.480 |
dedicated to making deep learning more accessible. 00:00:18.800 |
And in general, he's a successful entrepreneur, 00:00:21.720 |
educator, researcher, and an inspiring personality 00:00:27.040 |
When someone asks me, how do I get started with deep learning? 00:00:30.240 |
Fast AI is one of the top places I point them to. 00:00:41.000 |
They can sometimes dilute the value of educational content 00:00:46.760 |
Fast AI has a focus on practical application of deep learning 00:00:52.840 |
that is both incredibly accessible to beginners 00:01:03.840 |
give it five stars on iTunes, support it on Patreon, 00:01:13.360 |
And now, here's my conversation with Jeremy Howard. 00:01:26.720 |
I did an assignment where I decided to try to find out 00:01:40.640 |
So I wrote a program on my Commodore 64 in BASIC 00:01:53.560 |
- Like you want an actual exactly three to two ratio, 00:02:01.520 |
So that's well-tempered, as they say in the-- 00:02:10.480 |
- I did music all my life, so I played saxophone 00:02:14.080 |
and clarinet and piano and guitar and drums and whatever, so. 00:02:26.160 |
For various reasons, couldn't really keep it going, 00:02:30.200 |
particularly 'cause I had a lot of problems with RSI, 00:02:32.600 |
with my fingers, and so I had to kind of like 00:02:34.760 |
cut back anything that used hands and fingers. 00:02:38.320 |
I hope one day I'll be able to get back to it health-wise. 00:02:43.920 |
- So there's a love for music underlying it all? 00:02:52.880 |
Well, probably bass saxophone, but they're awkward. 00:03:01.720 |
There's something about a brain that utilizes those 00:03:07.560 |
So you've used and studied quite a few programming languages. 00:03:11.240 |
Can you give an overview of what you've used? 00:03:30.720 |
but the programming environment was fantastic. 00:03:33.080 |
It's like the ability to create user interfaces 00:03:38.080 |
and tie data and actions to them and create reports 00:03:42.520 |
and all that, I've never seen anything as good. 00:03:56.160 |
but unfortunately nobody's ever achieved anything like that. 00:04:06.280 |
- It was a database program that Microsoft produced, 00:04:09.640 |
part of Office, and it kind of withered, you know, 00:04:13.440 |
but basically it lets you in a totally graphical way 00:04:31.480 |
but for like useful little applications that I loved. 00:04:36.360 |
- So what's the connection between Excel and Access? 00:04:42.120 |
So Access kind of was the relational database equivalent, 00:04:55.360 |
So, but it's just not as rich a programming model 00:05:04.640 |
And so I've always loved relational databases, 00:05:07.320 |
but today programming on top of a relational database 00:05:13.520 |
You know, you generally either need to kind of, 00:05:19.920 |
unless you use SQLite, which has its own issues. 00:05:27.600 |
you'll need to, like, add an ORM on top. 00:05:34.360 |
and it's just a lot more awkward than it should be. 00:05:36.960 |
There are people that are trying to make it easier. 00:05:39.200 |
So in particular, I think of F#, you know, Don Syme, 00:05:51.600 |
So you actually get like tab completion for fields 00:06:01.480 |
I guess was a starting point, which I still miss. 00:06:07.800 |
- That's interesting just to pause on that for a second. 00:06:09.880 |
It's interesting that you're connecting programming languages 00:06:20.560 |
you always had a love and a connection with data. 00:06:24.840 |
- I've always been interested in doing useful things 00:06:31.840 |
and doing something with it and putting it out there again. 00:06:38.360 |
So I also did a lot of stuff with AppleScript 00:06:42.960 |
So it's kind of nice being able to get the computer 00:06:57.840 |
then would have been Delphi, which was Object Pascal, 00:07:11.040 |
Delphi was amazing 'cause it was like a compiled, 00:07:14.840 |
fast language that was as easy to use as Visual Basic. 00:07:19.840 |
- Delphi, what is it similar to in more modern languages? 00:07:32.280 |
So I'm not sure there's anything quite like it anymore. 00:07:59.320 |
is that we'll hopefully get back to where Delphi was. 00:08:02.800 |
There is actually a free Pascal project nowadays 00:08:09.320 |
which is also attempting to kind of recreate Delphi. 00:08:18.520 |
that's one of your favorite programming languages. 00:09:06.120 |
which I guess kind of Lisp and Scheme and whatever, 00:09:13.360 |
The second was the kind of imperative slash OO, 00:09:26.880 |
which started with a paper by a guy called Ken Iverson, 00:09:37.480 |
It was called "Notation as a Tool for Thought." 00:09:41.480 |
And it was the development of a new type of math notation. 00:09:51.320 |
and also well-defined than traditional math notation, 00:09:57.680 |
And so he actually turned that into a programming language. 00:10:04.120 |
or the, sorry, late '50s, all the names were available. 00:10:06.720 |
So he called his language a programming language, or APL. 00:10:15.320 |
as a tool for thought, by which he means math notation. 00:10:18.280 |
And Ken and his son went on to do many things, 00:10:29.600 |
And J is the most expressive, composable, 00:10:35.560 |
beautifully designed language I've ever seen. 00:10:44.520 |
Does it have that kind of thing, or is it more like-- 00:10:45.360 |
- Not really, it's an array-oriented language. 00:10:57.560 |
don't use any loops, but the whole thing is done 00:11:01.000 |
with kind of an extreme version of broadcasting, 00:11:06.000 |
if you're familiar with that NumPy/Python concept. 00:11:22.920 |
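Broadcasting, the NumPy concept referenced here, is what lets array-oriented code replace explicit loops with whole-array expressions. A minimal illustrative sketch (the data is made up):

```python
import numpy as np

# Array-oriented style: no explicit loops. A (2, 1) column of per-row
# factors broadcasts across the columns of a (2, 2) matrix, so one
# expression does what nested loops would do in scalar code.
prices = np.array([[10.0, 20.0],
                   [30.0, 40.0]])
factors = np.array([[0.9],
                    [0.5]])
scaled = prices * factors      # -> [[9, 18], [15, 20]]
print(scaled)
```

Languages like APL, J, and K push this style much further, so that most programs need no explicit loops at all.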
because you can do so much with one line of code, 00:11:27.760 |
you very rarely need more than that to express your program. 00:11:31.120 |
And so you can kind of keep it all in your head, 00:11:36.080 |
It's interesting, APL created two main branches, 00:11:41.640 |
J is this kind of like open-source niche community 00:11:52.160 |
It's an astonishingly expensive programming language, 00:12:06.680 |
it sits inside level three cache on your CPU, 00:12:09.360 |
and it easily wins every benchmark I've ever seen 00:12:22.720 |
But it's like this path of programming languages 00:12:30.360 |
than the ones that almost anybody uses every day. 00:12:38.360 |
- It's pretty heavily focused on computation. 00:12:45.640 |
So there's a lot of things you can do with it. 00:12:51.400 |
on making like user interface toolkits or whatever. 00:12:59.280 |
- At the same time, you've done a lot of stuff 00:13:21.200 |
I just wanna get stuff done and solve problems. 00:13:29.640 |
and Perl was great 'cause back in the late '90s, 00:13:32.800 |
early 2000s, it just had a lot of stuff it could do. 00:13:37.800 |
I still had to write my own monitoring system 00:13:45.720 |
but it was a super flexible language to do that in. 00:13:50.240 |
- And you used Perl for Fastmail, you used it as a backend? 00:14:04.840 |
where Python really takes over a lot of the same tasks? 00:14:22.240 |
And no project can be successful if there isn't, 00:14:30.560 |
with a strong leader that loses that strong leadership. 00:14:37.880 |
You know, Python is a lot less elegant language 00:14:42.880 |
in nearly every way, but it has the data science libraries 00:14:51.320 |
So I kind of use it 'cause it's the best we have, 00:15:01.840 |
- But what do you think the future of programming looks like? 00:15:04.080 |
What do you hope the future of programming looks like 00:15:11.880 |
- I hope Swift is successful because the goal of Swift, 00:15:21.040 |
is to be infinitely hackable, and that's what I want. 00:15:23.360 |
I want something where me and the people I do research with 00:15:26.960 |
and my students can look at and change everything 00:15:32.040 |
There's nothing mysterious and magical and inaccessible. 00:15:36.240 |
Unfortunately with Python, it's the opposite of that 00:15:38.600 |
because Python's so slow, it's extremely unhackable. 00:15:45.360 |
So your debugger doesn't work in the same way, 00:15:48.960 |
your build system doesn't work in the same way. 00:15:55.640 |
Is it for the objective of optimizing training 00:16:00.160 |
of neural networks, inference of neural networks? 00:16:04.360 |
or is there some non-performance related just-- 00:16:09.040 |
I mean, in the end, I wanna be productive as a practitioner. 00:16:16.320 |
our understanding of deep learning is incredibly primitive. 00:16:23.240 |
even though it works better than anything else out there. 00:16:26.160 |
There's so many opportunities to make it better. 00:16:28.640 |
So you look at any domain area, like, I don't know, 00:16:35.680 |
or natural language processing classification 00:16:39.400 |
Every time I look at an area with deep learning, 00:16:44.440 |
There's lots and lots of obviously stupid ways to do things 00:16:54.840 |
- You think the programming language has a role in that? 00:17:09.280 |
particularly around recurrent neural networks 00:17:16.840 |
The actual loop where we actually loop through words, 00:17:23.760 |
So we actually can't innovate with the kernel, 00:17:40.080 |
Another example, convolutional neural networks, 00:17:42.640 |
which are actually the most popular architecture 00:17:44.720 |
for lots of things, maybe most things in deep learning. 00:17:59.920 |
And yeah, just researchers and practitioners don't. 00:18:13.240 |
- So you think it's just too difficult to write in CUDA C 00:18:18.240 |
that a programming, like a higher level programming language 00:18:33.120 |
or with sparse convolutional neural networks? 00:18:48.440 |
is that we ignored that whole APL kind of direction, 00:18:52.640 |
almost nearly everybody did for 60 years, 50 years. 00:19:03.560 |
and kind of create some interesting new directions 00:19:07.280 |
So the place where that's particularly happening right now 00:19:18.040 |
And yeah, 'cause it's actually not gonna be Swift 00:19:22.120 |
because the problem is that currently writing 00:19:29.960 |
is too complicated regardless of what language you use. 00:19:33.800 |
And that's just because if you have to deal with the fact 00:19:47.000 |
it's just so much boilerplate that to do that well, 00:20:10.000 |
let's let people create like domain specific languages 00:20:16.840 |
These are the kinds of things we do generally on the GPU 00:20:27.800 |
A lot of this work is actually sitting on top 00:20:35.960 |
where they came up with such a domain specific language. 00:20:38.800 |
In fact, two, one domain specific language for expressing 00:20:43.760 |
And another domain specific language for expressing 00:20:46.280 |
this is the kind of the way I want you to structure 00:20:50.280 |
the compilation of that and like do it block by block 00:20:54.920 |
And they were able to show how you can compress 00:20:57.720 |
the amount of code by 10X compared to optimized GPU code 00:21:07.560 |
that kind of sitting on top of that kind of research 00:21:10.520 |
and MLIR is pulling a lot of those best practices together. 00:21:15.120 |
And now we're starting to see work done on making 00:21:18.040 |
all of that directly accessible through Swift 00:21:25.880 |
And hopefully we'll get then Swift CUDA kernels 00:21:56.760 |
is that they or others could write MLIR backends 00:22:01.760 |
for other GPUs or other tensor computation devices 00:22:16.640 |
So yeah, being able to target lots of backends 00:22:26.720 |
'cause at the moment NVIDIA is massively overcharging 00:22:36.760 |
'cause nobody else is doing the software properly. 00:22:39.320 |
- In the cloud there is some competition, right? 00:22:45.080 |
But TPUs are almost unprogrammable at the moment. 00:22:48.240 |
- So you can't, the TPUs have the same problem, that you can't- 00:22:52.040 |
So TPUs, Google actually made an explicit decision 00:22:57.240 |
because they felt that there was too much IP in there. 00:23:00.000 |
And if they gave people direct access to program them, 00:23:04.360 |
So you can't actually directly program the memory in a TPU. 00:23:09.960 |
You can't even directly create code that runs on 00:23:13.960 |
and that you look at on the machine that has the GPU. 00:23:18.520 |
So all you can really do is this kind of cookie cutter thing 00:23:39.080 |
What is the motivation, its mission, its dream? 00:23:44.400 |
- So I guess the founding story is heavily tied 00:23:50.240 |
to my previous startup, which is a company called Enlitic, 00:23:53.560 |
which was the first company to focus on deep learning 00:23:58.240 |
And I created that because I saw there was a huge 00:24:16.080 |
But I guess that maybe if we used deep learning 00:24:21.080 |
for some of the analytics, we could maybe make it 00:24:29.800 |
- Where's the biggest benefit, just before we get 00:24:32.520 |
to fast AI, where's the biggest benefit of AI in medicine 00:24:37.960 |
- Not much happening today in terms of like stuff 00:24:43.080 |
but in terms of the opportunity, it's to take markets 00:24:54.160 |
small numbers of doctors, and provide diagnostic, 00:24:59.160 |
particularly treatment planning and triage kind of on device 00:25:05.120 |
so that if you do a test for malaria or tuberculosis 00:25:12.960 |
that even a healthcare worker that's had a month 00:25:15.240 |
of training can get a very high quality assessment 00:25:20.240 |
of whether the patient might be at risk and tell, 00:25:27.400 |
So for example, in Africa, outside of South Africa, 00:25:34.000 |
for the entire continent, so most countries don't have any. 00:25:37.120 |
So if your kid is sick and they need something diagnosed 00:25:39.720 |
through medical imaging, the person, even if you're able 00:25:42.880 |
to get medical imaging done, the person that looks at it 00:25:45.040 |
will be a nurse at best, but actually in India, for example, 00:25:50.040 |
and China, almost no x-rays are read by anybody, 00:25:54.760 |
by any trained professional because they don't have enough. 00:25:59.240 |
So if instead we had an algorithm that could take 00:26:03.920 |
the most likely high risk 5% and say, triage basically, 00:26:13.240 |
it would massively change the kind of way that 00:26:17.120 |
what's possible with medicine in the developing world. 00:26:23.720 |
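The triage scheme described here — surface only the highest-risk studies for a scarce expert's review — can be sketched in a few lines. The risk scores below are invented for illustration; a real system would get them from a trained model:

```python
# Triage sketch: rank cases by a model's risk score and route only the
# top fraction (here 5%) to a human expert, riskiest first.
def triage(scores, fraction=0.05):
    k = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]

# Hypothetical per-study risk scores from a classifier.
scores = [0.02, 0.91, 0.10, 0.88, 0.05, 0.40, 0.97, 0.15, 0.33, 0.01,
          0.20, 0.07, 0.55, 0.03, 0.12, 0.61, 0.09, 0.28, 0.04, 0.44]
print(triage(scores))          # -> [6], the single riskiest of 20 studies
```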
They're the developing world, they're not the poor world, 00:26:25.560 |
they're the developing world, so they have the money, 00:26:38.520 |
- Shortage of expertise, okay, and that's where 00:26:43.360 |
and magnify the expertise they do have, essentially. 00:26:47.800 |
- So you do see, just to linger it a little bit longer, 00:26:52.800 |
the interaction, do you still see the human experts 00:27:00.720 |
- Or is there something in medicine that could be automated 00:27:03.760 |
- I don't see the point of even thinking about that, 00:27:08.480 |
why would we want to find a way not to use them? 00:27:21.920 |
your unit economics at all, and it totally ignores the fact 00:27:25.520 |
that there are things people do better than machines. 00:27:36.640 |
there may be some problems where you can avoid 00:27:40.280 |
even going to the expert ever, sort of maybe preventative 00:27:43.880 |
care or some basic stuff, allowing the expert to focus 00:27:48.320 |
on the things that are really that, you know. 00:27:51.360 |
- Well, that's what the triage would do, right? 00:27:53.000 |
So the triage would say, okay, this 99% triage, 00:28:05.920 |
So the experts are being used to look at the stuff 00:28:12.280 |
which most things is not, you know, it's fine. 00:28:16.360 |
- Why do you think we haven't quite made progress 00:28:19.360 |
on that yet, in terms of the scale of how much AI 00:28:29.680 |
I only started Enlitic in like 2014, and before that, 00:28:36.720 |
the medical world was not aware of the opportunities here. 00:28:40.680 |
So I went to RSNA, which is the world's largest 00:28:44.920 |
radiology conference, and I told everybody I could, 00:28:49.240 |
you know, like, I'm doing this thing with deep learning, 00:28:53.360 |
And no one had any idea what I was talking about, 00:28:58.560 |
So like, we've come from absolute zero, which is hard, 00:29:04.680 |
and then the whole regulatory framework, education system, 00:29:09.920 |
everything is just set up to think of doctoring 00:29:17.120 |
who are deep learning practitioners and doctors 00:29:24.000 |
the first ones come out of their PhD programs, 00:29:31.600 |
has a number of students now who are data science experts, 00:29:38.960 |
deep learning experts, and actual medical doctors. 00:29:46.120 |
Quite a few doctors have completed our fast AI course now 00:29:50.040 |
and are publishing papers and creating journal reading 00:30:02.920 |
The regulators have to learn how to regulate this, 00:30:08.760 |
and then the lawyers at hospitals have to develop 00:30:13.320 |
a new way of understanding that sometimes it makes sense 00:30:18.240 |
for data to be, you know, looked at in raw form 00:30:30.080 |
it sounds, well, it's probably the hardest problem, 00:30:33.840 |
but sounds reminiscent of autonomous vehicles as well. 00:30:43.640 |
and more the interpretation of that regulation 00:30:55.000 |
the P in HIPAA does not stand for privacy, it stands for portability. 00:30:57.640 |
It's actually meant to be a way that data can be used. 00:31:04.360 |
because the idea is that would be more practical 00:31:06.520 |
and it would help people to use this legislation 00:31:10.440 |
to actually share data in a more thoughtful way. 00:31:17.760 |
they say, oh, if we don't know, we won't get sued, 00:31:29.160 |
hospital lawyers are not incented to make bold decisions 00:31:36.480 |
- Or even to embrace technology that saves lives. 00:31:44.160 |
- Also, it saves lives in a very abstract way, 00:31:47.800 |
which is like, oh, we've been able to release 00:31:55.280 |
I can say like, oh, we ended up with this paper, 00:31:57.720 |
which found this result, which diagnosed a thousand 00:32:09.360 |
you may be able to point to a life that was taken 00:32:14.280 |
- Yeah, or a person whose privacy was violated. 00:32:28.280 |
We'll get back to fast AI, but on the question of privacy, 00:32:32.520 |
data is the fuel for so much innovation in deep learning. 00:32:39.760 |
whether we're talking about Twitter, Facebook, YouTube, 00:32:44.000 |
just the technologies like in the medical field 00:32:48.640 |
that rely on people's data in order to create impact. 00:32:53.360 |
How do we get that right, respecting people's privacy 00:32:58.360 |
and yet creating technology that is learned from data? 00:33:03.320 |
- One of my areas of focus is on doing more with less data, 00:33:27.720 |
- So Google and IBM both strongly push the idea 00:33:33.080 |
that they have more data and more computation 00:33:35.440 |
and more intelligent people than anybody else. 00:33:45.400 |
Like Jeff Dean has gone out there and given talks 00:33:50.520 |
"a thousand times more computation, but less people." 00:33:55.160 |
Our goal is to use the people that you have better 00:34:03.000 |
So one of the things that we've discovered is, 00:34:06.040 |
or at least highlighted, is that you very, very, 00:34:13.360 |
And so the data you already have in your organization 00:34:16.160 |
will be enough to get state-of-the-art results. 00:34:19.240 |
So like my starting point would be to kind of say 00:34:21.320 |
around privacy is a lot of people are looking for ways 00:34:29.960 |
They assume that they need more data than they do 00:34:34.160 |
of transfer learning, which is this critical technique 00:34:42.000 |
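The transfer-learning recipe he refers to — reuse a model trained on lots of data and fit only a small piece on your own small dataset — can be illustrated with a toy stand-in. Everything here is synthetic: the "pretrained backbone" is just a fixed random projection, not a real network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed random feature map standing in for
# pretrained weights. It is never updated.
W_frozen = 0.1 * rng.normal(size=(10, 32))

def features(x):
    return np.tanh(x @ W_frozen)

# A small labeled dataset (200 points); the label depends on two inputs.
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Fit only the head: a linear classifier on the frozen features,
# here by least squares on +/-1 targets.
F = features(X)
w, *_ = np.linalg.lstsq(F, 2 * y - 1, rcond=None)

acc = ((F @ w > 0) == (y > 0.5)).mean()
print(f"accuracy training only the head: {acc:.2f}")
```

With a real pretrained backbone (say, an ImageNet model), the same freeze-the-backbone, fit-a-small-head pattern is what lets small datasets reach strong results.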
- Is your sense, one reason you might wanna collect data 00:34:44.680 |
from everyone is like in the recommender system context, 00:34:49.680 |
where your individual, Jeremy Howard's individual data 00:35:06.360 |
Is your sense we can build with a small amount of data, 00:35:11.680 |
general models that will have a huge impact for most people 00:35:16.000 |
that we don't need to have data from each individual? 00:35:23.400 |
you know, recommender systems have this cold start problem 00:35:31.960 |
So we can't recommend him things based on what else 00:35:54.760 |
'cause they think, oh, we don't wanna bother the user. 00:36:00.960 |
where you get my marketing record from Axiom or whatever 00:36:11.600 |
of saving me five minutes on answering some questions 00:36:26.160 |
the places where people are invading our privacy 00:36:32.800 |
is really about just trying to make them more money 00:36:41.080 |
to places that they don't have to pay for them. 00:37:00.360 |
yeah, I mean, the hospital has my medical imaging, 00:37:16.920 |
One of the things doc.ai does is that it has an app 00:37:19.720 |
you can connect to Sutter Health and LabCorp and Walgreens 00:37:49.760 |
So that has a beautiful, interesting tangent, 00:37:53.120 |
but to return back to the origin story of Fast.ai. 00:38:06.360 |
where are the biggest opportunities for deep learning? 00:38:10.400 |
'Cause I knew from my time at Kaggle in particular 00:38:14.080 |
that deep learning had kind of hit this threshold point 00:38:16.920 |
where it was rapidly becoming the state-of-the-art approach 00:38:21.600 |
And I'd been working with neural nets for over 20 years. 00:38:25.400 |
I knew that from a theoretical point of view, 00:38:36.280 |
the biggest low-hanging fruit in the shortest time period. 00:38:39.440 |
I picked medicine, but there were so many I could have picked 00:38:43.960 |
and so there was a kind of level of frustration for me 00:38:46.280 |
of like, okay, I'm really glad we've opened up 00:39:00.440 |
it took me a really long time to even get a sense 00:39:02.320 |
of like what kind of problems do medical practitioners solve? 00:39:07.480 |
So I kind of felt like I need to approach this differently 00:39:12.520 |
if I wanna maximize the positive impact of deep learning. 00:39:19.280 |
and trying to become good at it and building something, 00:39:21.800 |
I should let people who are already domain experts 00:39:36.800 |
how to get deep learning into the hands of people 00:39:40.160 |
who could benefit from it and help them to do so 00:39:43.280 |
in as quick and easy and effective a way as possible. 00:39:47.120 |
- Got it, so sort of empower the domain experts. 00:39:56.360 |
my background is very applied and industrial. 00:40:00.000 |
Like my first job was at McKinsey and Company. 00:40:10.560 |
so I kind of respect them and appreciate them 00:40:12.840 |
and I know that's where the value generation in society is. 00:40:16.560 |
And so I also know how most of them can't code 00:40:21.560 |
and most of them don't have the time to invest, 00:40:26.080 |
you know, three years in a graduate degree or whatever. 00:40:29.440 |
So it's like, how do I upskill those domain experts? 00:40:33.640 |
I think that would be a super powerful thing, 00:40:36.200 |
you know, biggest societal impact I could have. 00:40:41.800 |
- So, so much of Fast.ai students and researchers 00:40:45.800 |
and the things you teach are pragmatically minded, 00:40:55.880 |
So from your experience, what's the difference 00:40:58.200 |
between theory and practice of deep learning? 00:41:01.260 |
- Well, most of the research in the deep learning world 00:41:11.080 |
- Yeah, it's a problem in science in general. 00:41:26.240 |
So that means that they all need to work on the same thing. 00:41:29.080 |
And so it really, and the thing they work on, 00:41:33.040 |
there's nothing to encourage them to work on things 00:41:49.340 |
Whereas the things that really make a difference, 00:41:52.800 |
like if we can do better at transfer learning, 00:41:59.800 |
can do world-class work with less resources and less data. 00:42:11.920 |
how do we get more out of the human beings in the loop? 00:42:21.220 |
because it's just not a trendy thing right now. 00:42:23.840 |
- You know what, somebody started to interrupt. 00:42:27.080 |
He was saying that nobody is publishing on active learning, 00:42:36.840 |
they're going to innovate on active learning. 00:42:39.680 |
- Yeah, everybody kind of reinvents active learning 00:42:43.800 |
because they start labeling things and they think, 00:42:46.420 |
gosh, this is taking a long time and it's very expensive. 00:42:56.920 |
Maybe I'll just start labeling those two classes. 00:43:14.160 |
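What he's describing is uncertainty sampling, the simplest active-learning loop: spend labeling effort where the model is least sure. A toy sketch with made-up probabilities:

```python
import numpy as np

# Uncertainty sampling: for a binary classifier, the model is least sure
# where its predicted probability is closest to 0.5, so label those next.
def most_uncertain(probs, k=3):
    probs = np.asarray(probs)
    return np.argsort(np.abs(probs - 0.5))[:k]

# Hypothetical predicted probabilities on unlabeled examples.
probs = [0.99, 0.48, 0.03, 0.55, 0.91, 0.50, 0.10]
print(most_uncertain(probs))   # -> [5 1 3]: send these to the labeler
```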
just has no reason to care about practical results. 00:43:17.500 |
The funny thing is, I've only really ever written one paper. 00:43:20.000 |
I hate writing papers and I didn't even write it. 00:43:22.800 |
It was my colleague, Sebastian Ruder, who actually wrote it. 00:43:27.960 |
but it was basically introducing transfer learning, 00:43:30.640 |
successful transfer learning to NLP for the first time. 00:43:45.340 |
and I thought I only wanna teach people practical stuff. 00:43:47.500 |
And I think the only practical stuff is transfer learning. 00:43:50.540 |
And I couldn't find any examples of transfer learning in NLP, 00:43:54.540 |
And I was shocked to find that as soon as I did it, 00:43:57.300 |
which the basic prototype took a couple of days, 00:44:06.720 |
And I just thought, well, this is ridiculous. 00:44:13.800 |
and he kindly offered to write it up, the results. 00:44:21.360 |
which is the top computational linguistics conference. 00:44:25.560 |
So like people do actually care once you do it, 00:44:28.880 |
but I guess it's difficult for maybe like junior researchers 00:44:32.780 |
or like, I don't care whether I get citations 00:44:37.740 |
There's nothing in my life that makes that important, 00:44:43.980 |
I guess they have to pick the kind of safe option, 00:44:48.980 |
which is like, yeah, make a slight improvement 00:44:52.280 |
on something that everybody's already working on. 00:45:01.180 |
- Although, I mean, the nice thing is nowadays, 00:45:02.940 |
everybody is now working on NLP transfer learning 00:45:05.300 |
because since that time we've had GPT and GPT-2 and BERT 00:45:22.140 |
I think transfer learning and active learning 00:45:27.360 |
- I actually helped start a startup called platform.ai, 00:45:31.760 |
And yeah, it's been interesting trying to kind of 00:45:34.640 |
see what research is out there and make the most of it. 00:45:43.000 |
Can you tell the story of the Stanford competition, 00:45:54.280 |
is that I basically teach two courses a year, 00:46:02.080 |
and then cutting edge deep learning for coders, 00:46:19.760 |
And I invite anybody, any student who wants to come 00:46:22.120 |
and hang out with me while I build the course. 00:46:26.600 |
And so we have 20 or 30 people in a big office 00:46:46.400 |
Seems kind of not exactly relevant to what we're doing, 00:46:53.320 |
"Oh crap, there's only 10 days till it's over. 00:46:58.080 |
And we're kind of busy trying to teach this course. 00:47:00.960 |
But we're like, "Oh, it would make an interesting 00:47:17.560 |
And we focused on this small one called CIFAR-10, 00:47:32.480 |
- That's also another one for as cheap as possible. 00:47:34.280 |
And there's a couple of categories, ImageNet and CIFAR-10. 00:47:38.120 |
So ImageNet is this big 1.3 million image thing 00:47:50.180 |
I remember he told me how he trained ImageNet 00:47:53.240 |
a few years ago, and he basically like had this 00:47:59.760 |
that he turned into his ImageNet training center. 00:48:01.880 |
And he figured, you know, after like a year of work, 00:48:03.760 |
he figured out how to train it in like 10 days or something. 00:48:08.480 |
Well, CIFAR-10 at that time, you could train in a few hours. 00:48:22.280 |
Like I'd never really, like things like using more 00:48:25.800 |
than one GPU at a time was something I tried to avoid. 00:48:29.760 |
'Cause to me, it's like very against the whole idea 00:48:32.080 |
of accessibility is you should be able to do things 00:48:42.440 |
- Oh, always, but it's always, for me, it's always, 00:48:47.640 |
that a normal person could afford in their day-to-day life? 00:48:50.360 |
It's not, how could I do it faster by, you know, 00:48:57.200 |
as many people should be able to use something as possible 00:49:06.000 |
we can use eight GPUs just by renting an AWS machine. 00:49:11.840 |
And yeah, basically using the stuff we were already doing, 00:49:20.120 |
you know, within a few days, we had the speed down to, 00:49:22.880 |
I don't know, a very small number of minutes. 00:49:26.000 |
I can't remember exactly how many minutes it was, 00:49:28.760 |
but it might've been like 10 minutes or something. 00:49:33.200 |
of the leaderboard easily for both time and money, 00:49:59.280 |
And, but we didn't put anything up on the leaderboard, 00:50:23.240 |
But we kind of like had a bunch of ideas to try, 00:50:41.360 |
that your train model is supposed to achieve. 00:50:51.120 |
- Yeah, 93%, like they picked a good threshold. 00:50:56.920 |
than what the most commonly used ResNet-50 model 00:51:03.360 |
So yeah, so it's quite a difficult problem to solve. 00:51:30.160 |
and then you say, here's a really clear picture of a dog, 00:51:39.840 |
and we ended up winning parts of that competition. 00:51:44.840 |
We actually ended up doing a distributed version 00:51:49.600 |
over multiple machines a couple of months later 00:51:57.960 |
and people have just kept on blasting through 00:52:19.440 |
Multi GPUs is less clunky than it used to be. 00:52:25.320 |
But to me, anything that slows down your iteration speed 00:52:33.840 |
you know, perfecting of the model on multi GPUs 00:52:41.040 |
I think doing stuff on ImageNet is generally a waste of time. 00:52:59.160 |
So from a research point of view, why waste that time? 00:53:02.080 |
So actually I released a couple of new datasets recently. 00:53:07.720 |
one is called Imagenette, which is a small subset of ImageNet, 00:53:21.320 |
- Yeah, and then another one called Imagewoof, 00:53:24.720 |
which is a subset of ImageNet that only contains dog breeds. 00:53:34.920 |
you can train things on a single GPU in 10 minutes 00:53:39.120 |
and the results you get directly transferable 00:53:44.320 |
And so now I'm starting to see some researchers 00:53:51.160 |
because I think you might've written a blog post saying 00:54:00.160 |
is encouraging people to not think creatively. 00:54:14.000 |
And then you start, so like somehow you kill the creativity. 00:54:25.440 |
to people outside of Google to do useful work." 00:54:28.520 |
So like I see a lot of people make an explicit decision 00:54:42.440 |
And I just find that so disappointing and it's so wrong. 00:54:45.360 |
- And I think all of the major breakthroughs in AI 00:54:49.160 |
in the next 20 years will be doable on a single GPU. 00:54:53.240 |
Like I would say my sense is all the big sort of- 00:54:58.240 |
None of the big breakthroughs of the last 20 years 00:55:05.960 |
- To demonstrate that there's something to that. 00:55:08.080 |
- Every one of them, none of them has required multiple GPUs. 00:55:11.960 |
- GANs, the original GANs didn't require multiple GPUs. 00:55:19.640 |
So we've developed GAN level outcomes without needing GANs. 00:55:27.960 |
we can do it in a couple of hours on a single GPU. 00:55:35.680 |
that work super well without the adversarial part. 00:55:38.640 |
And then one of our students, a guy called Jason Antic, 00:55:52.840 |
And one of the things that Jason and I did together 00:55:56.040 |
was we figured out how to add a little bit of GAN 00:56:00.440 |
at the very end, which it turns out for colorization 00:56:19.160 |
that sounds like something a huge studio would have to do, 00:56:29.040 |
It's such a pain in the ass to have these microphones 00:56:34.360 |
And I tried to see if it's possible to plop down 00:56:47.440 |
automatically combining audio from multiple sources 00:57:01.000 |
I felt the same way about computational photography 00:57:09.800 |
plus actually a little bit of intentional movement? 00:57:16.640 |
gives you enough information to get excellent sub-pixel 00:57:19.800 |
resolution, which particularly with deep learning, 00:57:22.440 |
you would know exactly what you're meant to be looking at. 00:57:28.160 |
I think it's madness that it hasn't been done yet. 00:57:30.680 |
- Has there been progress on the photography front? 00:57:33.240 |
- Yeah, computational photography is basically standard now. 00:57:53.400 |
to the background blurring done computationally. 00:57:58.560 |
- Yeah, basically everybody now is doing most 00:58:07.080 |
And also increasingly people are putting more than one lens 00:58:14.280 |
- And there's applications in the audio side. 00:58:18.400 |
most people I've seen, especially I worked at Google before, 00:58:25.880 |
you don't think of multiple sources of audio. 00:58:28.760 |
You don't play with that as much as I would hope people would. 00:58:31.840 |
- But I mean, you can still do it even with one. 00:58:33.560 |
Like again, it's not much work's been done in this area. 00:58:36.040 |
So we're actually gonna be releasing an audio library soon, 00:58:38.960 |
which hopefully will encourage development of this 00:58:43.120 |
The basic approach we used for our super resolution 00:58:50.920 |
the exact same approach would work for audio. 00:58:57.080 |
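The degrade-then-restore setup described above can be sketched in a few lines. This is an illustrative toy on a 1-D list rather than real image or audio tensors, and the `degrade`/`make_training_pair` names are my own, not fast.ai's API; the point is only the pairing idea: programmatically damage good data, then train a model to map the damaged version back to the original.

```python
def degrade(signal, factor=2):
    """Downsample by keeping every `factor`-th sample (the deliberately 'crappified' input)."""
    return signal[::factor]

def make_training_pair(signal, factor=2):
    """The degraded version becomes the model input; the pristine original is the target."""
    return degrade(signal, factor), signal

# A self-supervised training pair: no human labels needed.
x, y = make_training_pair([0, 1, 2, 3, 4, 5])
```

The same recipe works for images (blur, JPEG artifacts, downscaling) or audio (lowpass filtering), which is why Jeremy says the approach transfers directly between domains.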
- Okay, also learning rate in terms of DawnBench. 00:59:09.280 |
Leslie's a researcher who like us cares a lot 00:59:13.200 |
about just the practicalities of training neural networks 00:59:20.280 |
which you would think is what everybody should care about, 00:59:23.680 |
And he discovered something very interesting, 00:59:31.160 |
that with certain settings of hyperparameters 00:59:43.560 |
because it's not an area of kind of active research 00:59:59.800 |
So unlike in physics where you could say like, 01:00:07.200 |
you could publish that without an explanation. 01:00:11.840 |
people can try to work out how to explain it. 01:00:14.080 |
We don't allow this in the deep learning world. 01:00:23.520 |
This thing trained 10 times faster than it should have. 01:00:28.480 |
well, you can't publish that 'cause you don't know why. 01:00:36.120 |
- Every other scientific field I know of works that way. 01:00:39.200 |
I don't know why ours is uniquely disinterested 01:00:43.480 |
in publishing unexplained experimental results, 01:00:52.480 |
I read a lot more unpublished papers than published papers 01:00:56.800 |
'cause that's where you find the interesting insights. 01:01:04.440 |
this is astonishingly mind-blowing and weird and awesome. 01:01:09.440 |
And like, why isn't everybody only talking about this? 01:01:12.320 |
Because like, if you can train these things 10 times faster, 01:01:21.360 |
So I've been kind of studying that ever since. 01:01:42.040 |
and you gradually make them bigger and bigger 01:01:44.000 |
until eventually you're taking much bigger steps 01:01:48.120 |
There's a few other little tricks to make it work, 01:01:51.040 |
but basically we can reliably get super convergence. 01:01:56.560 |
we were using just much higher learning rates 01:02:05.160 |
to be a critical hyperparameter learning rate that you vary. 01:02:20.160 |
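The schedule Jeremy describes, small learning-rate steps at first, growing to a large peak, then annealing back down, can be sketched as a simple function of the training step. This is an illustrative linear ramp, an assumption on my part; real implementations (e.g. fastai's 1cycle) use cosine or other curve shapes and also vary momentum.

```python
def one_cycle_lr(step, total_steps, max_lr=1.0, start_lr=0.1, end_lr=0.01):
    """Return the learning rate for a given training step: warm up, then anneal."""
    peak = total_steps // 2
    if step <= peak:
        # Warm-up phase: start with small steps, gradually grow to max_lr.
        frac = step / peak
        return start_lr + frac * (max_lr - start_lr)
    # Annealing phase: decay from max_lr down below the starting rate.
    frac = (step - peak) / (total_steps - peak)
    return max_lr + frac * (end_lr - max_lr)

# Learning rate at each of 101 steps: rises to 1.0 at step 50, ends at 0.01.
schedule = [one_cycle_lr(s, 100) for s in range(101)]
```

The large mid-training rates are what allow the much faster convergence he's describing; the final low rates let the model settle.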
like we just have no idea really how optimizers work. 01:02:29.160 |
and then other things like the epsilon we use 01:02:38.520 |
this is another thing we've done a lot of work on 01:02:40.440 |
is research into how different parts of the model 01:02:43.480 |
should be trained at different rates in different ways. 01:02:46.600 |
So we do something we call discriminative learning rates, 01:03:10.800 |
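The idea of discriminative learning rates can be sketched as spreading rates geometrically across layer groups, so that early, general-purpose layers move slowly and the task-specific head moves fast. The function below is a minimal sketch; the factor of 100 mirrors a common fastai convention of passing a `slice(lr/100, lr)`, but the exact spacing is an assumption here, not the library's code.

```python
def discriminative_lrs(base_lr, n_groups, factor=100.0):
    """Spread learning rates geometrically from base_lr/factor (earliest
    layer group) up to base_lr (final layer group / head)."""
    lowest = base_lr / factor
    if n_groups == 1:
        return [base_lr]
    ratio = (base_lr / lowest) ** (1 / (n_groups - 1))
    return [lowest * ratio ** i for i in range(n_groups)]

# Three layer groups, e.g. early body, late body, head:
lrs = discriminative_lrs(1e-3, 3)
```

In PyTorch this maps naturally onto optimizer parameter groups, one group per chunk of the model, each with its own `lr`.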
it almost already has disappeared in the latest research. 01:03:15.720 |
we know enough about how to interpret the gradients 01:03:30.800 |
where really, where's the input of a human expert needed? 01:03:34.520 |
- Well, hopefully the input of a human expert 01:03:43.440 |
is to try and use thousands of times more compute 01:03:45.960 |
to run lots and lots of models at the same time 01:03:51.840 |
- Yeah, AutoML kind of stuff, which I think is insane. 01:04:01.640 |
you don't have to try a thousand different models 01:04:14.880 |
it means you don't need deep learning experts 01:04:19.320 |
which means that domain experts can do more of the work, 01:04:22.240 |
which means that now you can focus the human time 01:04:24.960 |
on the kind of interpretation, the data gathering, 01:04:28.280 |
identifying model errors and stuff like that. 01:04:49.360 |
- Yeah, I mean, it's a key part of our course 01:04:51.320 |
is like before we train a model in the course, 01:04:57.920 |
which we fine tune an ImageNet model for five minutes. 01:05:00.520 |
And then the thing we immediately do after that 01:05:02.200 |
is we learn how to analyze the results of the model 01:05:05.800 |
by looking at examples of misclassified images 01:05:15.080 |
to learn about the kinds of things that it's misclassifying. 01:05:29.360 |
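The error-analysis step described here, look hardest at the examples the model got most confidently wrong, can be sketched without any framework. fastai's `ClassificationInterpretation.plot_top_losses` does this for images; the pure-Python `top_losses` below is a hypothetical stand-in using (label, confidence) pairs.

```python
def top_losses(preds, labels, k=2):
    """Return indices of the k most confidently wrong predictions.
    preds: list of (predicted_label, confidence); labels: true labels."""
    wrong = [(conf, i)
             for i, ((pred, conf), true) in enumerate(zip(preds, labels))
             if pred != true]
    wrong.sort(reverse=True)  # most confident mistakes first
    return [i for _, i in wrong[:k]]

# The model is 95% sure example 1 is a dog, but it's labelled cat,
# so that's the first image a human should inspect:
worst = top_losses([("cat", 0.9), ("dog", 0.95), ("cat", 0.6)],
                   ["cat", "cat", "dog"])
```

Inspecting these worst offenders quickly reveals whether the problem is the model, the labels, or your own understanding of the domain.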
And they help you become a domain expert more quickly 01:05:38.680 |
So it lets you deal with things like data leakage, 01:05:41.560 |
"Oh, the main feature I'm looking at is customer ID." 01:05:50.640 |
that manage customer IDs and they'll tell you like, 01:05:53.160 |
"Oh yes, as soon as a customer's application is accepted, 01:05:57.480 |
we add a one on the end of their customer ID or something." 01:06:03.720 |
particularly from the lens of which parts of the data 01:06:06.000 |
the model says is important is super important. 01:06:09.360 |
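The customer-ID leakage pattern Jeremy describes can be made concrete with a toy check. Everything here is invented for illustration: suppose accepted applications get a 1 appended to the ID, so the ID's last digit alone predicts the outcome, exactly the kind of spuriously perfect feature that importance plots surface and that only a domain expert can explain.

```python
def leaks_target(ids, labels):
    """True if the ID's last digit alone perfectly predicts the binary label,
    a red flag that the identifier encodes the outcome (data leakage)."""
    return all((int(str(i)[-1]) == 1) == bool(y) for i, y in zip(ids, labels))

# Hypothetical data: IDs of accepted applications end in 1.
ids = [10031, 20040, 55011, 90210]
labels = [1, 0, 1, 0]
suspicious = leaks_target(ids, labels)
```

When a check like this fires, the fix isn't a better model; it's dropping the leaky column and asking the people who manage the IDs what happened.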
- Yeah, and using the model to almost debug the data 01:06:39.440 |
Google's TPUs and the best Nvidia GPUs are similar. 01:06:49.920 |
There isn't a clear leader in terms of hardware right now, 01:06:59.560 |
They've got much more written for all of them. 01:07:17.040 |
and AWS that you can access a GPU pretty quickly and easily. 01:07:28.080 |
Like you have to find an AMI and get the instance running 01:07:33.080 |
and then install the software you want and blah, blah, blah. 01:07:37.080 |
GCP is currently the best way to get started 01:08:13.560 |
and it pops up a Jupyter Notebook straight away 01:08:17.200 |
without any kind of installation or anything. 01:08:22.200 |
And all the course notebooks are all pre-installed. 01:08:28.560 |
we spent a lot of time kind of curating and working on. 01:08:35.960 |
the biggest problem was people dropped out of lesson one 01:08:39.600 |
'cause they couldn't get an AWS instance running. 01:08:44.880 |
And like we actually have, if you go to course.fast.ai, 01:08:56.280 |
I have to confess, I've never used the Google GCP. 01:08:58.800 |
- Yeah, GCP gives you $300 of compute for free, 01:09:10.960 |
So from the perspective of deep learning frameworks, 01:09:15.120 |
you work with Fast.ai, if you go to this framework, 01:09:25.800 |
- So in terms of what we've done our research on 01:09:34.360 |
And then we switched to TensorFlow and Keras. 01:09:42.960 |
And that kind of reflects a growth and development 01:10:01.680 |
because they define what's called a computational graph 01:10:07.400 |
here are all the things that I'm going to eventually do 01:10:23.680 |
but PyTorch was certainly the strongest entrant 01:10:34.080 |
And we'll figure out how to make that run on the GPU 01:10:44.640 |
in terms of what we could do with our research 01:10:57.840 |
and practitioner when you have to do everything up front 01:11:08.880 |
because you have to write your own training loop 01:11:19.360 |
dealing with all this boilerplate and overhead 01:11:23.880 |
So we ended up writing this very multi-layered API 01:11:29.040 |
you can train a state-of-the-art neural network 01:11:35.040 |
which talks to an API, which talks to an API, 01:11:47.400 |
That's been critical for us and for our students 01:12:02.920 |
and particularly this problem with things like 01:12:06.400 |
recurrent neural nets say where you just can't change things 01:12:11.400 |
unless you accept it going so slowly that it's impractical. 01:12:18.320 |
and with some of the research we're now starting to do, 01:12:31.040 |
I'm very happy to invest the time to get there. 01:12:34.240 |
But with that, we actually already have a nascent version 01:12:44.720 |
'Cause Python for TensorFlow is not gonna cut it. 01:12:52.960 |
the bits that people were saying they like about PyTorch, 01:13:10.880 |
but it's 10 times slower than PyTorch to actually do a step. 01:13:21.080 |
'cause their code base is so horribly complex. 01:13:26.360 |
- Yeah, well, particularly the way TensorFlow was written, 01:13:28.600 |
it was written by a lot of people very quickly 01:13:33.320 |
So like when you actually look in the code, as I do often, 01:13:35.960 |
I'm always just like, oh God, what were they thinking? 01:13:53.720 |
It can be like, it can basically be a layer on top of MLIR 01:13:57.520 |
that takes advantage of all the great compiler stuff 01:14:11.840 |
I haven't truly felt the pain of TensorFlow 2.0 Python. 01:14:34.720 |
okay, I want to write something from scratch. 01:14:37.320 |
And you're like, I just keep finding it's like, 01:14:38.840 |
oh, it's running 10 times slower than PyTorch. 01:14:50.920 |
Thanks to TensorFlow Eager, that's not too different. 01:14:54.000 |
But because so many things take so long to run, 01:15:00.240 |
Like you just go like, oh, this is taking too long. 01:15:05.760 |
like tf.data, which is the way data processing works 01:15:14.720 |
because of the TPU problems I described earlier. 01:15:22.120 |
I just feel like they've got this huge technical debt, 01:15:37.440 |
- Well, I mean, we obviously recommend Fast.ai and PyTorch 01:15:46.040 |
because it will let you get on top of the concepts 01:15:53.080 |
and you'll also learn the actual state of the art techniques, 01:15:56.120 |
you know, so you actually get world-class results. 01:15:59.160 |
Honestly, it doesn't much matter what library you learn 01:16:08.280 |
to TensorFlow to PyTorch is gonna be a couple of days work 01:16:11.960 |
as long as you understand the foundation as well. 01:16:24.320 |
particularly because like Swift has no data science community, 01:16:29.320 |
libraries, tooling. - So code bases are out there. 01:16:33.360 |
- And the Swift community has a total lack of appreciation 01:16:40.840 |
So like they keep on making stupid decisions, 01:16:43.280 |
you know, for years they've just done dumb things 01:16:53.440 |
because the developer of Swift, Chris Lattner, 01:16:58.000 |
is working at Google on Swift for TensorFlow. 01:17:04.160 |
It'll be interesting to see what happens with Apple 01:17:05.800 |
because like Apple hasn't shown any sign of caring 01:17:13.800 |
So I mean, hopefully they'll get off their ass 01:17:18.800 |
'cause currently all of their low level libraries 01:17:27.360 |
stuff like core ML, they're really pretty rubbish. 01:17:33.680 |
but at least one nice thing is that Swift for TensorFlow 01:17:36.080 |
can actually directly use Python code and Python libraries 01:17:40.760 |
in literally the entire lesson one notebook of fast AI 01:18:04.320 |
Somewhere between two months and two years generally. 01:18:15.320 |
- So like somebody who is a very competent coder 01:18:35.680 |
to study fast AI full time and say at the end of the year, 01:18:43.440 |
'Cause generally there's a lot of other things you do. 01:18:45.560 |
Like generally they'll be entering Kaggle competitions. 01:18:51.440 |
They might, you know, they'll be doing a bunch of stuff. 01:19:01.760 |
So part of it's just like doing a lot more writing. 01:19:04.760 |
- What do you find is the bottleneck for people usually, 01:19:14.840 |
the people who are strong coders pick it up the best. 01:19:17.880 |
Although another bottleneck is people who have a lot 01:19:21.640 |
of experience of classic statistics can really struggle 01:19:30.840 |
They're very used to like trying to reduce the number 01:19:36.920 |
at individual coefficients and stuff like that. 01:19:39.400 |
So I find people who have a lot of coding background 01:19:42.920 |
and know nothing about statistics are generally 01:19:47.440 |
- So you taught several courses on deep learning 01:19:52.920 |
"The best way to understand something is to teach it." 01:19:55.600 |
What have you learned about deep learning from teaching it? 01:20:00.600 |
It's a key reason for me to teach the courses. 01:20:04.920 |
to achieve our goal of getting domain experts 01:20:09.320 |
but it was also necessary for me to achieve my goal 01:20:28.800 |
but convinced me something that I liked to believe 01:20:34.880 |
So there's a lot of kind of snobbishness out there 01:20:40.200 |
only certain people are gonna be smart enough to do AI. 01:20:47.240 |
from so many different backgrounds get state-of-the-art 01:20:52.480 |
It's definitely taught me that the key differentiator 01:20:57.120 |
between people that succeed and people that fail 01:21:00.680 |
That seems to be basically the only thing that matters. 01:21:15.000 |
Even if at first I'm just kind of like thinking like, 01:21:17.840 |
wow, they really aren't quite getting it yet, are they? 01:21:20.520 |
But eventually people get it and they succeed. 01:21:26.400 |
I think they're both things I've liked to believe was true, 01:21:28.720 |
but I don't feel like I really had strong evidence 01:21:31.760 |
but now I can say I've seen it again and again. 01:21:47.040 |
So like, so I would, you know, I think, it's not just me. 01:21:53.320 |
but also lots of people independently have said 01:21:55.800 |
It recently won the CogX award for AI courses 01:21:55.800 |
And the thing I keep on harping on in my lessons 01:22:02.960 |
is train models, print out the inputs to the models, 01:22:11.000 |
like study, you know, change the inputs a bit, 01:22:17.320 |
just run lots of experiments to get a, you know, 01:22:20.360 |
an intuitive understanding of what's going on. 01:22:24.480 |
- To get hooked, do you think, you mentioned training, 01:22:29.080 |
do you think just running the models inference? 01:22:43.240 |
So there's no point running somebody else's model 01:22:47.880 |
Like, so it only takes five minutes to fine tune a model 01:22:53.520 |
we teach you how to create your own dataset from scratch 01:23:02.840 |
So I create one in the course that differentiates 01:23:05.280 |
between a teddy bear, a grizzly bear, and a brown bear. 01:23:08.320 |
And it does it with basically a hundred percent accuracy. 01:23:11.040 |
Took me about four minutes to scrape the images 01:23:15.080 |
There's a little graphical widgets we have in the notebook 01:23:21.400 |
There's other widgets that help you study the results 01:23:29.280 |
in our share your work here thread of students saying, 01:23:39.000 |
at Devanagari characters and I couldn't believe it. 01:23:39.000 |
than the best academic paper after lesson one. 01:23:46.640 |
And then there's others which are just more kind of fun. 01:23:48.560 |
Like somebody who's doing Trinidad and Tobago hummingbirds. 01:23:53.080 |
She said, that's kind of their national bird. 01:23:54.880 |
And she's got something that can now classify a Trinidad 01:23:58.800 |
So yeah, train models, fine tune models with your dataset 01:24:33.120 |
Specifically train lots of models in your domain area. 01:25:13.240 |
So like become the expert in your passion area. 01:25:22.920 |
than other people, particularly by combining it 01:25:28.400 |
Even if you do wanna innovate on transfer learning 01:25:36.200 |
is you also need to find a domain or a dataset 01:25:42.520 |
If you're not working on a real problem that you understand, 01:25:49.320 |
How do you know if you're getting bad results? 01:25:53.600 |
How do you know you're doing anything useful? 01:25:57.400 |
Yeah, to me, the only really interesting research 01:26:04.720 |
and solve an actual problem and solve it really well. 01:26:09.440 |
on the deep learning side and becoming a domain expert 01:26:13.720 |
in a particular domain are really things within reach 01:26:23.480 |
having never looked at a car or been in a car 01:26:26.560 |
or turned a car on, which is like the way it is 01:26:32.880 |
where they literally have no idea about that. 01:26:37.680 |
with autonomous vehicles, but that is literally, 01:26:40.880 |
you describe a large percentage of robotics folks 01:26:48.680 |
They haven't actually looked at what driving looks like. 01:26:51.440 |
- Right, and it's a problem because you know, 01:26:54.400 |
like these are the things that happened to me 01:26:57.080 |
- There's nothing that beats the real world examples 01:27:04.880 |
What does it take to create a successful startup? 01:27:11.520 |
deep learning practitioner, which is not giving up. 01:27:15.000 |
So you can run out of money or run out of time 01:27:34.000 |
then just sticking with it is one important thing. 01:27:38.080 |
Doing something you understand and care about is important. 01:27:44.040 |
the biggest problem I see with deep learning people 01:27:50.160 |
and then they try and commercialize their PhD, 01:27:55.880 |
You picked your PhD topic 'cause it was an interesting 01:27:59.280 |
kind of engineering or math or research exercise. 01:28:02.520 |
But yeah, if you've actually spent time as a recruiter 01:28:12.880 |
you're just looking for certain kinds of things 01:28:14.720 |
and you can try doing that with a model for a few minutes 01:28:19.720 |
and see whether that's something which the model 01:28:23.760 |
then you're on the right track to creating a startup. 01:28:35.720 |
from venture capital money as long as possible, 01:29:00.680 |
who do this all the time and who have done it for years 01:29:07.200 |
they only care if you don't grow fast enough. 01:29:09.480 |
So that's scary, whereas doing the ones myself, 01:29:17.760 |
it's nice 'cause we just went along at a pace 01:29:21.120 |
that made sense and we were able to build it to something 01:29:23.760 |
which was big enough that we never had to work again, 01:29:41.920 |
but how do you make money during that process? 01:29:47.440 |
- So yeah, so I started Fastmail and Optimal Decisions 01:29:50.640 |
at the same time in 1999 with two different friends. 01:29:54.560 |
And for Fastmail, I guess I spent $70 a month on the server. 01:30:09.400 |
and said, "If you want more than 10 megs of space, 01:30:29.440 |
we were making money and I was profitable from then. 01:30:42.160 |
But what we did was we would sell scoping projects. 01:31:13.280 |
And I guess I was comparing it to the scarediness of VC. 01:31:18.120 |
I felt like with VC stuff, it was more scary, 01:31:27.880 |
I also found it very difficult with VC-backed startups 01:31:30.560 |
to actually do the thing which I thought was important 01:31:42.400 |
But then if you don't do the thing that makes them happy, 01:31:53.080 |
- I mean, it can be, but not at the VC level, 01:31:54.920 |
'cause the VC exit needs to be, you know, a thousand X. 01:32:11.200 |
you're kind of happy to do forever, then fine. 01:32:18.440 |
I mean, they're both perfectly good outcomes. 01:32:26.760 |
- And I read that you use, at least in some cases, 01:32:31.160 |
spaced repetition as a mechanism for learning new things. 01:32:37.240 |
- I actually never talked to anybody about it. 01:33:03.440 |
I don't know, must be a couple of hundred years ago 01:33:08.040 |
He did something which sounds pretty damn tedious. 01:33:10.720 |
He wrote down random sequences of letters on cards 01:33:23.000 |
He discovered that there was this kind of a curve 01:33:26.120 |
where his probability of remembering one of them 01:33:33.520 |
What he discovered is that if he revised those cards 01:33:36.880 |
after a day, the probabilities would decrease 01:33:42.880 |
And then if he revised them again a week later, 01:33:47.040 |
And so he basically figured out a roughly optimal equation 01:33:51.800 |
for when you should revise something you wanna remember. 01:34:02.080 |
revise something after a day and then three days 01:34:04.480 |
and then a week and then three weeks and so forth. 01:34:07.680 |
And so if you use a program like Anki, as you know, 01:34:14.520 |
And if you say no, it will reschedule it back 01:34:21.960 |
It's a kind of a way of being guaranteed to learn something 01:34:27.880 |
because by definition, if you're not learning it, 01:34:30.200 |
it will be rescheduled to be revised more quickly. 01:34:39.480 |
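The rescheduling rule described here, grow the interval after each successful review, reset it after a failure, can be sketched in a few lines. This is a deliberately simplified toy; real systems like Anki use the SuperMemo SM-2 family of algorithms with per-card ease factors, so the fixed multiplier below is an assumption for clarity.

```python
def next_interval(current_days, remembered, multiplier=2.5):
    """Days until the next review: grow the gap if remembered, reset if not."""
    if not remembered:
        return 1  # forgot the card: revise again tomorrow
    return max(1, int(current_days * multiplier))

# A card remembered three times in a row gets pushed further out each time:
intervals = []
days = 1
for _ in range(3):
    days = next_interval(days, remembered=True)
    intervals.append(days)
```

This is exactly the "guaranteed to learn" property: cards you know drift toward ever-rarer reviews, while cards you keep forgetting snap back to tomorrow.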
you know like your revisions will just get more and more. 01:34:44.040 |
So you have to find ways to learn things productively 01:34:59.720 |
It's like learning how to learn is something which 01:35:02.560 |
everybody should learn before they actually learn anything, 01:35:07.920 |
- So what have you, so it certainly works well 01:35:16.400 |
but do you, you know, I started using it for, 01:35:19.800 |
I forget who wrote a blog post about this inspired me. 01:35:33.640 |
- Yeah, so Michael started doing this recently 01:35:47.720 |
And he's been basically trying to become like 01:35:55.920 |
He's basically lived his life with space repetition, 01:36:07.440 |
but he started really getting excited about doing it 01:36:20.680 |
is specifically a thing I made a conscious decision 01:36:26.680 |
even if I don't get much of a chance to exercise it, 01:36:30.120 |
'cause like I'm not often in China, so I don't. 01:36:33.040 |
Or else something like programming languages or papers, 01:36:39.640 |
which is I try not to learn anything from them, 01:36:43.040 |
but instead I try to identify the important concepts 01:36:49.000 |
So like really understand that concept deeply 01:37:06.760 |
So I find I then remember the things that I care about 01:37:20.160 |
I've committed to spending at least half of every day 01:37:28.760 |
because it always looks like I'm not working on 01:37:36.920 |
So I kind of give myself a lot of opportunity 01:37:56.160 |
but speaking Chinese, you can't look it up on Google. 01:37:59.720 |
- Do you have advice for people learning new things? 01:38:01.560 |
So if you, what have you learned as a process? 01:38:04.840 |
I mean, it all starts with just making the hours 01:38:13.680 |
So the people I started learning Chinese with, 01:38:15.880 |
none of them were still doing it 12 months later. 01:38:33.760 |
and that story is specifically designed to be memorable. 01:38:41.360 |
or related to people that we know or care about. 01:38:44.240 |
So I try to make sure all the stories that are in my head 01:39:00.640 |
whether it be just part of your day-to-day life 01:39:30.480 |
all right, well, I can either stop and give up everything 01:39:36.560 |
for the next two years until I get back to it. 01:39:39.000 |
The amazing thing has been that even after three years, 01:39:53.120 |
I have the same with guitar, with music and so on. 01:39:56.520 |
It's sad because the work sometimes takes away, 01:40:01.160 |
But really, if you then just get back to it every day, 01:40:06.000 |
What do you think is the next big breakthrough 01:40:09.400 |
What are your hopes in deep learning or beyond 01:40:23.680 |
to solve lots of societally important problems 01:40:33.360 |
I don't think we need a lot of new technological breakthroughs 01:40:38.600 |
- And when do you think we're going to create 01:40:51.760 |
I don't know why people make predictions about this 01:41:00.360 |
there's so many societally important problems 01:41:04.440 |
I just don't find it a really interesting question 01:41:10.280 |
- So in terms of societally important problems, 01:41:26.840 |
and people keep making this frivolous econometric argument 01:41:30.920 |
of being like, oh, there's been other things that aren't AI 01:41:34.960 |
and haven't created massive labor force displacement, 01:41:47.360 |
And you see already that the changing workplace 01:41:52.360 |
has led to a hollowing out of the middle class. 01:41:55.760 |
You're seeing that students coming out of school today 01:41:59.040 |
have a less rosy financial future ahead of them 01:42:04.040 |
which has never happened in the last few hundred years. 01:42:10.960 |
And you see this turning into anxiety and despair 01:42:23.440 |
- You've written quite a bit about ethics too. 01:42:29.640 |
working with deep learning needs to recognize 01:42:35.640 |
that they're using that can influence society 01:42:40.320 |
that that research is gonna be used by people 01:42:44.440 |
And they have a responsibility to consider the consequences 01:42:56.520 |
How do we ensure an appeals process for humans 01:43:01.720 |
How do I ensure that the constraints of my algorithm 01:43:11.880 |
which only data scientists are actually in the right place 01:43:17.960 |
But data scientists tend to think of themselves 01:43:26.680 |
- Well, you're in a perfect position to educate them better, 01:43:30.280 |
to read literature, to read history, to learn from history. 01:43:33.760 |
Well, Jeremy, thank you so much for everything you do,