Most Research in Deep Learning is a Total Waste of Time - Jeremy Howard | AI Podcast Clips


00:00:00.000 | (gentle music)
00:00:02.580 | - So much of fast.ai, the students and researchers
00:00:10.960 | and the things you teach, is pragmatically minded,
00:00:15.440 | practically minded, figuring out
00:00:18.160 | how to solve real problems, and fast.
00:00:21.080 | So from your experience, what's the difference
00:00:23.440 | between theory and practice of deep learning?
00:00:28.960 | - Well, most of the research in the deep learning world
00:00:32.800 | is a total waste of time.
00:00:35.080 | - Right, that's what I was getting at.
00:00:36.320 | - Yeah.
00:00:37.160 | (laughing)
00:00:37.980 | It's a problem in science in general.
00:00:41.520 | Scientists need to be published,
00:00:44.880 | which means they need to work on things
00:00:46.760 | that their peers are extremely familiar with
00:00:49.340 | and can recognize as an advance in that area.
00:00:51.480 | So that means that they all need to work on the same thing.
00:00:54.280 | And with the thing they work on,
00:00:58.320 | there's nothing to encourage them to work on things
00:01:00.960 | that are practically useful.
00:01:04.160 | So you get just a whole lot of research,
00:01:06.440 | which is minor advances on stuff
00:01:08.540 | that's been very highly studied
00:01:09.960 | and has no significant practical impact.
00:01:14.640 | Whereas the things that really make a difference,
00:01:16.200 | like I mentioned, transfer learning:
00:01:18.080 | if we can do better at transfer learning,
00:01:20.920 | then it's this world-changing thing
00:01:23.520 | where suddenly lots more people
00:01:25.080 | can do world-class work with fewer resources and less data.
00:01:30.080 | But almost nobody works on that.
00:01:33.800 | Or another example, active learning,
00:01:36.080 | which is the study of
00:01:37.160 | how to get more out of the human beings in the loop.
00:01:41.160 | - That's my favorite topic.
00:01:42.440 | - Yeah, so active learning is great,
00:01:43.840 | but there's almost nobody working on it
00:01:46.480 | because it's just not a trendy thing right now.
00:01:49.080 | - You know what, sorry to interrupt.
00:01:52.320 | You were saying that nobody is publishing on active learning,
00:01:56.800 | but there are people inside companies,
00:01:58.720 | anybody who actually has to solve a problem,
00:02:02.080 | they're going to innovate on active learning.
00:02:04.920 | - Yeah, everybody kind of reinvents active learning
00:02:07.360 | when they actually have to make it work in practice
00:02:09.040 | because they start labeling things and they think,
00:02:11.640 | gosh, this is taking a long time and it's very expensive.
00:02:14.560 | And then they start thinking,
00:02:16.520 | well, why am I labeling everything?
00:02:17.920 | The machine's only making mistakes
00:02:20.120 | on those two classes; they're the hard ones.
00:02:22.160 | Maybe I'll just start labeling those two classes
00:02:24.120 | and then you start thinking,
00:02:25.640 | well, why did I do that manually?
00:02:26.840 | Why can't I just get the system to tell me
00:02:28.280 | which things are going to be hardest?
00:02:30.040 | It's an obvious thing to do, but yeah,
00:02:33.600 | it's just like transfer learning,
00:02:36.680 | it's understudied and the academic world
00:02:39.400 | just has no reason to care about practical results.
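
The loop described here is usually called uncertainty sampling. Below is a minimal sketch in Python with scikit-learn; the synthetic dataset, the logistic-regression model, and the budget of 20 labels per round are all illustrative assumptions, not details from the conversation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative setup: a small labeled seed set plus a large "unlabeled" pool.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
labeled = np.arange(50)        # indices we have labels for
pool = np.arange(50, 2000)     # indices still waiting to be labeled

for round_num in range(5):
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])

    # Ask the model which pool examples are hardest: the smallest gap
    # between its top two class probabilities (margin sampling).
    probs = model.predict_proba(X[pool])
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    hardest = pool[np.argsort(margin)[:20]]

    # "Label" the hardest examples (here we simply reveal y) and repeat.
    labeled = np.concatenate([labeled, hardest])
    pool = np.setdiff1d(pool, hardest)
    print(f"round {round_num}: {len(labeled)} labeled examples")
```

Margin sampling is only one heuristic; predictive entropy or ensemble disagreement slot into the same loop. The point is the one made above: the model itself can tell you where labeling effort pays off.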
00:02:42.720 | The funny thing is,
00:02:43.560 | like I've only really ever written one paper.
00:02:45.240 | I hate writing papers and I didn't even write it.
00:02:48.040 | It was my colleague, Sebastian Ruder, who actually wrote it.
00:02:50.760 | I just did the research for it,
00:02:53.320 | but it was basically introducing
00:02:55.840 | successful transfer learning to NLP for the first time.
00:02:59.160 | And the algorithm is called ULMFiT.
00:03:01.280 | And I actually wrote it for the course,
00:03:07.200 | for the fast.ai course.
00:03:08.920 | I wanted to teach people NLP
00:03:10.560 | and I thought I only want to teach people practical stuff.
00:03:12.720 | And I think the only practical stuff is transfer learning.
00:03:15.760 | And I couldn't find any examples of transfer learning in NLP.
00:03:18.560 | So I just did it.
00:03:19.760 | And I was shocked to find that as soon as I did it,
00:03:22.520 | the basic prototype, which took a couple of days,
00:03:26.280 | smashed the state of the art
00:03:27.720 | on one of the most important data sets
00:03:29.480 | in a field that I knew nothing about.
00:03:31.920 | And I just thought, well, this is ridiculous.
00:03:35.600 | And so I spoke to Sebastian about it
00:03:39.000 | and he kindly offered to write up the results.
00:03:42.880 | And so it ended up being published in ACL,
00:03:46.560 | which is the top computational linguistics conference.
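
For reference, the ULMFiT recipe is: start from a language model pretrained on a large general corpus, fine-tune that language model on the target-domain text, then fine-tune a classifier on top of the resulting encoder. A minimal sketch using the fastai library's v2 API, with the IMDb sentiment dataset standing in as the target task; the dataset choice, epoch counts, and learning rates are illustrative, not the paper's exact settings.

```python
from fastai.text.all import *

path = untar_data(URLs.IMDB)

# Stages 1-2: fine-tune a Wikipedia-pretrained AWD-LSTM language model
# on the target corpus (labels are ignored; the text itself is the signal).
dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1)
lm = language_model_learner(dls_lm, AWD_LSTM, metrics=accuracy)
lm.fine_tune(1, 1e-2)
lm.save_encoder('finetuned_enc')

# Stage 3: reuse the fine-tuned encoder inside a text classifier,
# sharing the language model's vocabulary.
dls_clf = TextDataLoaders.from_folder(path, valid='test',
                                      text_vocab=dls_lm.vocab)
clf = text_classifier_learner(dls_clf, AWD_LSTM, metrics=accuracy)
clf.load_encoder('finetuned_enc')
clf.fine_tune(4, 1e-2)
```

The published recipe also uses discriminative learning rates and gradual unfreezing in the classifier stage; `fine_tune` here is a simplified stand-in for that schedule.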
00:03:50.760 | So people do actually care once you do it,
00:03:54.080 | but I guess it's difficult, maybe, for junior researchers.
00:03:58.000 | Me, I don't care whether I get citations
00:04:01.800 | or papers or whatever.
00:04:02.960 | There's nothing in my life that makes that important,
00:04:04.840 | which is why I've never actually bothered
00:04:06.720 | to write a paper myself.
00:04:08.240 | But for people who do,
00:04:09.200 | I guess they have to pick the kind of safe option,
00:04:14.800 | which is to make a slight improvement
00:04:17.520 | on something that everybody's already working on.
00:04:22.520 | (upbeat music)