USF Deep Learning Info 2020
Chapters
0:0
0:18 Course logistics
9:26 Why top-down learning?
20:50 Research examples
25:21 The fastai library
25:30 PyTorch is taking over the research world
28:33 Helping rescue misguided tensorflow
32:49 fastai's layered API
36:36 Image classification
37:34 Custom labelling
37:46 Segmentation
38:14 Object detection
00:00:00.000 |
I'm Jeremy. This is Sylvain, who helped me develop this course and I'm a trainer. 00:00:10.000 |
So hopefully you're hearing a lot about learning that. 00:00:18.000 |
Don't ask me much about logistics because I don't think that would be a careless job. 00:00:30.000 |
I wanted to say a couple of things about prerequisites. 00:00:40.000 |
So if you're not a strong coder, you can do it. You just have to work hard. 00:00:52.000 |
We're not even going to tell you how to code in Python. 00:01:00.000 |
If you've never coded in Python but you've done two or three other languages, you'll pick it up super fast. 00:01:08.000 |
If you've done a little bit of MATLAB scripting and that's about it, you're going to have to put in more hours. 00:01:18.000 |
Have a look at the recordings of some of the previous classes to get a sense of the amount of coding 00:01:22.000 |
chops that are expected. But really it's about, you'll get out of it what you put into it. 00:01:28.000 |
The location is here, PG&E, so here's 101 Howard where we are. 00:01:35.000 |
So it's just a couple of blocks away, but don't turn up here for the class because it's not here. 00:01:40.000 |
It's a much bigger auditorium because it's quite a popular course. 00:01:46.000 |
However, there is a study group here every day throughout the course. 00:01:53.000 |
It's not office hours. It's not tutoring. It's a study group, so you can come and hang out at USF. 00:02:05.000 |
We kind of particularly set it up because a lot of people fly in from overseas and they're just here the whole time, 00:02:11.000 |
so it's kind of a nice place for them to work from. But I'll show you some examples of student projects 00:02:17.000 |
that have come out of the course, and most of them happened in the study group. 00:02:21.000 |
So that's a particularly good thing to do if a lot of people put their jobs on hiatus during the course 00:02:28.000 |
so they can focus on it full time, and if that's you, you probably want to come along to the study group and get involved. 00:02:34.000 |
So it's really like doing projects and hanging out with other like-minded students. 00:02:44.000 |
Okay, so why learn deep learning? Well, deep learning is quite good at quite a lot of things. 00:02:53.000 |
These are all things that deep learning is the best in the world at right now, 00:02:58.000 |
and for many of these things, superhuman as well. I won't go through all of them, 00:03:04.000 |
but basically for kind of complex problems, particularly those involving some amount of kind of pattern recognition and analogy making, 00:03:16.000 |
deep learning tends to work very well and it's used very widely in industry and scientific research and so forth. 00:03:28.000 |
So I kind of saw this coming a while ago and got pretty excited about it and also kind of nervous about it 00:03:38.000 |
because to me when a big technology comes along, which kind of changes what's possible, 00:03:43.000 |
which changes how much people can do, it gives big opportunities, 00:03:47.000 |
but it can also be a bit of a threat if it all ends up in the hands of a small homogenous group of people. 00:03:54.000 |
So kind of our mission is to get this tool into the hands of as many people as possible. 00:04:00.000 |
One of the things that stops people from getting into deep learning is a view that they can't do it 00:04:07.000 |
or that deep learning is not right for them. This is a list of reasons people tell me that they're not doing deep learning. 00:04:18.000 |
None of them are true. As you'll see in the course, you can get great results with 30 or 40 images. 00:04:28.000 |
We do nearly all of our work on a single laptop computer. You can use deep learning for a really wide range of applications. 00:04:39.000 |
We're not going to be building a brain. We're not claiming this is artificial general intelligence 00:04:45.000 |
and it's not something we're going to talk about at all. We're just talking about deep learning as a tool to get stuff done. 00:04:55.000 |
So there's a strong connection between the University of San Francisco and fast.ai. 00:05:03.000 |
The University of San Francisco is the oldest university in San Francisco. 00:05:09.000 |
The main campus is on the other side of town. This downtown campus is where the Data Institute lives, 00:05:18.000 |
which is where all the data science stuff happens. How many of you are familiar with fast.ai? 00:05:29.000 |
The courses you see on fast.ai all got recorded here at USF or at PG&E. 00:05:37.000 |
They're all USF courses that get turned into MOOCs. 00:05:43.000 |
One obvious question then would be why do the in-person course when it'll be online in July? 00:05:51.000 |
The obvious answer, well, A, you're going to be the first to see the material by quite a bit. 00:05:57.000 |
You're going to be doing it with a bunch of like-minded people. 00:06:01.000 |
The difference between being in an in-person group of people who are all studying the same thing at the same time, 00:06:08.000 |
it's pretty different and it's interesting to see how many of the best students 00:06:14.000 |
who go on to do the most high-impact work were in the in-person course, 00:06:19.000 |
which when you think there's like two or three hundred people that do the in-person course 00:06:24.000 |
versus two or three hundred thousand that do the online course, 00:06:40.000 |
But all of the things that fast.ai does are about making deep learning accessible, as I described. 00:06:45.000 |
So as well as the education, there's a community, online community that we build and help develop. 00:06:57.000 |
All of these things are very connected to the course that I want to talk about. 00:07:00.000 |
So I'm going to bring all these together because the stuff we do at USF is all deeply connected to the mission of fast.ai. 00:07:06.000 |
The community all happens here on the forums, forums.fast.ai. 00:07:14.000 |
And one of the really interesting things about this is that during the period that the live course is going on, 00:07:22.000 |
there's a whole other level of activity on the forums. 00:07:26.000 |
One of the reasons for that is that we actually invite this year the top 800 participants from the forums 00:07:35.000 |
to participate live in the course with you through a live stream. 00:07:40.000 |
So that's an invite-only thing where the best people from the community get to participate. 00:07:47.000 |
And so most of those folks are expert practitioners. 00:07:53.000 |
Many of them have published papers or PhDs or whatever. 00:07:58.000 |
And the quid pro quo is that they help during the course answering questions 00:08:06.000 |
and expanding on things that people are interested in and so forth. 00:08:10.000 |
So there's this kind of huge uptick in activity on the forums that goes on during the in-person course. 00:08:18.000 |
And the actual category where that's happening is a private category just for the people that are in the course, 00:08:25.000 |
the invited live stream or the in-person folks until the online version comes out in July. 00:08:31.000 |
So it's kind of like your private study group of a thousand people around the world. 00:08:40.000 |
So as I mentioned, the courses that get recorded here at USF get remixed into an online course. 00:08:50.000 |
The online courses that we've developed have been super popular, 00:08:54.000 |
so nearly a million hours of time spent around the world watching this material, and over three million views. 00:09:09.000 |
One of the reasons that the course has been so popular is it's kind of upside down compared to most technical teaching, 00:09:18.000 |
which is that you learn how to do things before you learn why those things work. 00:09:25.000 |
So we describe that as the difference between a bottom-up teaching approach and a top-down teaching approach. 00:09:30.000 |
So bottom-up is what most university technical material looks like. 00:09:35.000 |
It kind of starts with, like, addition and then subtraction, and you gradually build up until in your PhD program you learn to do something useful. 00:09:44.000 |
And it's a lot of people who like study math, for example, kind of say like, 00:09:48.000 |
"I didn't actually get to appreciate the beauty of this subject until I got to my PhD." 00:09:55.000 |
There's another way to learn, which is top-down, which is how we learn music or baseball, 00:10:02.000 |
which is to put an instrument in somebody's hands and get them playing music 00:10:08.000 |
and then gradually over the next ten years you learn about harmonies and theory and history and whatever. 00:10:16.000 |
So we teach deep learning more like how people teach music or sports. 00:10:22.000 |
So you get started using stuff right away and this means that you avoid these problems of the bottom-up approach. 00:10:30.000 |
You have motivation from the start because you're building useful stuff. 00:10:34.000 |
You have the context, human brains love context, so you know why things are being done. 00:10:39.000 |
And you understand which bits are important because we don't teach the lower-level pieces until we need them. 00:10:48.000 |
So one misunderstanding of the top-down approach is some people think it's kind of dumbed down 00:10:55.000 |
or has less theory and foundations and that couldn't be further from the truth 00:10:59.000 |
because what happens is with the top-down approach as we peel the layers away, 00:11:04.000 |
we do eventually get to that core and so we actually end up seeing the math and the theory and so forth. 00:11:13.000 |
Having said that, the math is not taught with Greek letters and math notation. 00:11:22.000 |
The math is taught with code because our view is, well there's a couple of things. 00:11:29.000 |
The first is all the math ends up being turned into code anyway to actually get the computer to do something. 00:11:38.000 |
So you may as well see it in the form it's going to be used. 00:11:41.000 |
When it's shown in the form of code you can experiment with it. 00:11:44.000 |
You can put inputs into it and see the outputs come out of it. 00:11:49.000 |
And also, why learn two whole separate languages and notations? 00:11:58.000 |
So we teach the math that's necessary to actually understand the foundations. 00:12:09.000 |
We introduce a bit of notation in order to, because sometimes you just have to read a paper to see how something works, 00:12:16.000 |
so we try to show you how to make sense of papers. 00:12:20.000 |
But the vast majority of the explanation is as code. 00:12:27.000 |
Now let's look at the code of the math and see how it maps. 00:12:32.000 |
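As a small illustration of what "math as code" can look like (this example is an assumption chosen for illustration, not taken from the course material): the mean squared error is usually written in notation as (1/n) times the sum over i of (y_i - yhat_i)^2. The same definition in Python is something you can feed inputs into and inspect.

```python
def mean_squared_error(targets, predictions):
    """(1/n) * sum over i of (targets[i] - predictions[i]) ** 2."""
    n = len(targets)
    # One squared difference per (target, prediction) pair: the terms of the sum.
    squared_errors = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return sum(squared_errors) / n


# Made-up numbers, just to have something to experiment with.
targets = [3.0, 5.0, 2.0]
predictions = [2.0, 5.0, 4.0]
mse = mean_squared_error(targets, predictions)
print(mse)
```

Changing a single prediction and re-running shows directly how each term contributes to the result, which is harder to see from the notation alone.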
One thing that you might find interesting is to look at some student work. 00:12:39.000 |
So something that I did a while ago was to put up a post on the forum saying like, 00:12:47.000 |
"Oh, after lesson one, if you've made something interesting, let us know." 00:12:55.000 |
I did this when there was a thousand replies. 00:12:59.000 |
Lots of people have posted and said like, "Oh, here's something I built." 00:13:04.000 |
And it's been really cool because there are people all around the world that do the course, 00:13:09.000 |
even the live course, because of the 800 I mentioned, who are also live streaming it. 00:13:14.000 |
You get all these interesting projects going on. 00:13:17.000 |
So one person talked about a recognition program to see different types of Trinidad and Tobago masqueraders versus regular islanders. 00:13:29.000 |
And one of the interesting things here is it's like 47 cucumbers and 39 zucchinis. 00:13:37.000 |
So we don't need lots of data, even for things that are pretty subtle, like cucumbers versus zucchinis. 00:13:44.000 |
This was an interesting one, which is predicting what city a satellite photo is of, 00:13:52.000 |
which is kind of something that I doubt any of us humans could do. 00:13:56.000 |
But a computer was doing it to 85% accuracy for 110 cities, which is amazing. 00:14:04.000 |
Looking at Panamanian buses, Batik cloth patterns, Tanzanian building conditions. 00:14:14.000 |
And it turns out that it's not at all uncommon, like every course, 00:14:19.000 |
there's lots of examples of people who discover that they have a new state of the art. 00:14:24.000 |
Because there are still more things that deep learning has not been applied to than things it has been applied to. 00:14:29.000 |
So whatever you come in here with an interest in as a hobby or as your vocation or whatever, 00:14:35.000 |
hopefully you can try these techniques on that thing. 00:14:38.000 |
So it turned out that Sivash got a state of the art result on Devanagari character recognition, 00:14:46.000 |
literally using the lesson one course material. 00:14:50.000 |
Ethan got a state of the art result on environmental sound classification. 00:14:55.000 |
One of the interesting things here is lesson one is all about image classification, 00:15:02.000 |
but you can turn a lot of things into images. 00:15:05.000 |
So in this case he converted sounds into images representing those sounds 00:15:14.000 |
And then he compared to his paper to see what the state of the art was and got the best result. 00:15:26.000 |
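One common way to turn a sound into an image is a spectrogram: slice the audio into short frames and record how much energy each frequency has in each frame. The sketch below is only an assumption about the general technique, not the student's actual code; the function name `spectrogram`, the frame sizes, and the synthetic test tone are all hypothetical, and a real project would use an FFT library rather than this slow direct transform.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one row per time frame; each row holds a magnitude per frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Hann window to soften the edges of each frame.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1)))
            for i, s in enumerate(frame)
        ]
        magnitudes = []
        for k in range(frame_size // 2):
            # Direct discrete Fourier transform for frequency bin k.
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(coeff))
        frames.append(magnitudes)
    return frames


# Synthetic input: a pure 128 Hz tone sampled at 1024 Hz (made-up values).
sample_rate = 1024
tone_hz = 128
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(512)]

image = spectrogram(signal)
# Each bin covers sample_rate / frame_size = 16 Hz, so find the strongest one.
peak_bin = max(range(len(image[0])), key=lambda k: image[0][k])
print(len(image), "frames x", len(image[0]), "bins; strongest bin:", peak_bin)
```

The resulting grid of numbers can be saved as a picture (time on one axis, frequency on the other) and then handed to an ordinary image classifier.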
Elena is an example of somebody who took it to a whole other level. 00:15:30.000 |
She actually, during the course, did three posts all in the area of genomics. 00:15:35.000 |
She's one of the top scientists at Human Longevity International. 00:15:39.000 |
And in every case she showed a significant advance of the state of the art in different genomics areas. 00:15:45.000 |
So we actually, like you'll see there's a lot of writing here. 00:15:48.000 |
You don't have to write, but a lot of students do. 00:15:52.000 |
We encourage it because it's a great way to develop your understanding of the material, 00:16:01.000 |
And so we do talk a bit about writing, and a lot of students try their hand 00:16:07.000 |
at writing something about, particularly about, like in Elena's case, 00:16:11.000 |
a combination of deep learning and something you know about. 00:16:22.000 |
A good example is Jason Antic, who, during the course, created this project he called DeOldify. 00:16:28.000 |
So in lesson seven last year we showed a new approach to generative image models, 00:16:33.000 |
where we said, like, oh, you can take a picture as an input and create a new picture as an output. 00:16:39.000 |
And Jason thought, oh, I wonder what would happen if I did that to create black and white input images 00:16:50.000 |
And as of last week he just announced that he's now quit his job. 00:16:57.000 |
He's just sold it to the world's largest ancestry online site. 00:17:03.000 |
And in the first week they had over, I think they said, 00:17:06.000 |
a million people use his system to colorize photos of their relatives. 00:17:15.000 |
It's even kind of created new communities of practitioners. 00:17:23.000 |
Folks like Alex, who went on to get together with a bunch of other folks and create a well-regarded paper. 00:17:31.000 |
He also won a Kaggle competition on pneumonia detection, a very significant Kaggle competition. 00:17:37.000 |
He's now widely regarded as one of the experts in the field of deep learning and medical imaging, 00:17:45.000 |
even though I think he's still a resident or recently just finished being a resident. 00:17:53.000 |
It's been cool to see there's lots of radiologists now who, particularly younger folks like Alex, 00:17:59.000 |
who are expert practitioners in deep learning and also deeply understand their field of radiology 00:18:05.000 |
and are bringing the two together to do some really powerful stuff. 00:18:10.000 |
Melissa Fabros actually did some super exciting work. 00:18:15.000 |
She kind of really pioneered the study of facial recognition algorithms on people of color. 00:18:23.000 |
But it turned out that they didn't work real well. 00:18:26.000 |
And this is important because she was helping Kiva that does microlending, 00:18:31.000 |
mainly in markets where there's not that many white folks. 00:18:36.000 |
So they tried to kind of use the algorithms that are out there. They didn't work very well. 00:18:40.000 |
And she won this million-dollar AI for Everyone Challenge round. 00:18:45.000 |
And a lot of these people don't have traditional machine learning backgrounds. 00:18:51.000 |
So at the time, I think she was doing a PhD in English literature, as I mentioned, Alex, radiology. 00:18:59.000 |
Another alumni saying he was a lawyer, and he built this super impressive system for GitHub 00:19:09.000 |
showing how you can search for code in English and get back code snippets. 00:19:20.000 |
Christine Payne wrote a neural net that created this. 00:19:35.000 |
So after doing the fast.ai course, and actually she was in the in-person USF course, 00:19:45.000 |
she went to OpenAI and became a resident there, and she wrote this, 00:19:50.000 |
which went on to be produced by the BBC Philharmonic. 00:19:56.000 |
And so her background is... that's her as a pianist. 00:20:02.000 |
So here's a great example of people bringing their domain expertise together with their deep learning skills. 00:20:08.000 |
It's important to realize, folks like Christine, it's not like she just does a single course 00:20:16.000 |
She worked very hard over a significant period of time and is also a genius, which helps. 00:20:25.000 |
But the point here is that you absolutely can bring together domain expertise 00:20:32.000 |
and a deep understanding of deep learning that you can get through this USF course 00:20:39.000 |
and the other fast.ai online courses and so forth and pull it together into something super cool. 00:20:49.000 |
So a lot of our research ends up in the course, 00:20:54.000 |
and actually a lot of that research happens during the course, particularly in the study group. 00:21:00.000 |
So, for example, MIT Technology Review wrote about how a small team of student coders beat Google. 00:21:11.000 |
They didn't just beat Google but also Intel to create the fastest ever training of ImageNet and CIFAR-10, 00:21:19.000 |
two of the most important computer vision benchmarks in the world. 00:21:23.000 |
This was a competition called DAWNBench. So, yeah, this was exciting in the study group. 00:21:30.000 |
A few of us decided, hey, let's see if we can take a crack at this competition. 00:21:36.000 |
We spent a couple of weeks giving it our best shot. 00:21:39.000 |
And really the trick was we didn't have the resources, like Intel entered, 00:21:44.000 |
and they entered by combining a thousand servers into a mega cluster and running it on this mega cluster. 00:21:51.000 |
We had one computer. So we had to think, how do we beat a thousand computers with one computer? 00:22:02.000 |
And so we've had lots of examples of this, both from projects that we've done in our research and our students, 00:22:08.000 |
again and again, get state-of-the-art results with far fewer resources than the big guys. 00:22:15.000 |
Another interesting example of research that actually came out of a course was a couple of years ago, 00:22:20.000 |
maybe three years ago now. I thought it would be great to show the students how transferable ideas are. 00:22:28.000 |
And so the first two lectures were mainly about computer vision images. 00:22:35.000 |
And so I wanted to show what would happen if you took those same ideas and applied them to text. 00:22:39.000 |
And it turned out that nobody had really done that before. 00:22:43.000 |
And I didn't know much about text, natural language processing, but I thought I'd give it a go. 00:22:48.000 |
And within a few hours of trying it, it turned out I had a new state-of-the-art result for one of the most widely studied text classification datasets. 00:22:57.000 |
So somebody who saw that course, who was then doing his PhD in this area, got in touch and offered to write it up into a paper. 00:23:07.000 |
And we ended up publishing a paper together that got published in the top journal for computational linguistics. 00:23:13.000 |
And actually kind of went on to help kick-start a new era in NLP, even got written up in the New York Times. 00:23:22.000 |
And today this idea of using these computer vision-based transfer learning techniques in NLP is probably the most important NLP development going on at the moment. 00:23:36.000 |
Another area that we've really focused on in trying to -- this is all trying to make deep learning more accessible, right? 00:23:44.000 |
So less computers, less data, less specialist knowledge. 00:23:50.000 |
So one of the things that's been holding back people from using deep learning is there's a lot of parameters to tweak, settings to get just right, like learning rates and regularization and optimization parameters and whatever. 00:24:05.000 |
So one of the things we do is we just look out there to find the research that's already been done but has been overlooked to solve these problems. 00:24:14.000 |
So, for example, the most important parameter to set is something called a learning rate. 00:24:20.000 |
And we discovered there was already a paper showing how to set it really well in like a minute, whereas previously people were using vast compute clusters to try out lots of different values. 00:24:34.000 |
So we kind of popularize that and put it into the software and make it easy to use. 00:24:41.000 |
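The idea behind that quick learning-rate search can be sketched on a toy problem: train for a few steps while multiplying the learning rate each step, record the loss, and stop once the loss blows up. Everything below is an assumption for illustration (a one-weight linear model, the growth factor, the "diverged" threshold, and the divide-by-ten rule of thumb); it is not the fastai implementation, which exposes this idea as `lr_find` on a `Learner`.

```python
def lr_range_test(xs, ys, start_lr=1e-4, growth=1.3, max_steps=100):
    """Fit y = w * x while raising the learning rate every step; return (lr, loss) pairs."""
    w = 0.0
    lr = start_lr
    history = []
    best_loss = float("inf")
    for _ in range(max_steps):
        preds = [w * x for x in xs]
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
        history.append((lr, loss))
        best_loss = min(best_loss, loss)
        if loss > 4 * best_loss:
            break  # the loss is clearly diverging, so stop the sweep
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        w -= lr * grad
        lr *= growth  # exponentially increase the learning rate
    return history


# Made-up data that follows y = 3x exactly.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [3 * x for x in xs]

history = lr_range_test(xs, ys)
best_lr, lowest_loss = min(history, key=lambda pair: pair[1])
# Rule of thumb (an assumption here): pick a rate well below the point of divergence.
suggested_lr = best_lr / 10
print(f"swept {len(history)} steps, suggested learning rate ~ {suggested_lr:.3f}")
```

The whole sweep is a single short training run, which is why it takes about a minute on real models instead of a cluster's worth of trial runs.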
The other thing we do is we tried lots of different -- these settings that we tweaked, they're called hyperparameters. 00:24:47.000 |
We tried lots of different hyperparameters across lots of different data sets and found a set of hyperparameters that just work nearly all the time. 00:24:57.000 |
So one of the things we show in the course is how to like not waste your time and money doing stuff that, you know, you just don't have to fiddle with. 00:25:08.000 |
There's already well-known good settings or there's easy ways to figure them out pretty quickly. 00:25:21.000 |
So one of the really interesting things about it is the use of something called the fastai library. 00:25:30.000 |
So the fastai library is a library that sits on top of another library called PyTorch. 00:25:36.000 |
In deep learning, there are two libraries that pretty much everyone uses, PyTorch and TensorFlow. TensorFlow came out of Google, the advanced resources a few years ago. 00:25:58.000 |
But we got to a point where it wasn't flexible enough to handle what we wanted to show students how to do with it and flexible enough for the research we wanted to do. 00:26:08.000 |
And luckily, at that time, a couple of years ago, a new library came out called PyTorch, which was really just a couple of people wrote it. 00:26:19.000 |
But interestingly, it kind of had a bit of a fast.ai feel to it in that because they didn't have the resources, they had to be super careful to curate the best approach to each thing, to make sure they did each thing just once and to be careful. 00:26:33.000 |
And we thought it was amazingly great and we switched everything over to PyTorch. 00:26:38.000 |
And what's happened since then is PyTorch is kind of taking over the world. 00:26:43.000 |
The first place you take over the world is in research because when the researchers flock to your software, then all of the new developments come out with your software and anybody who wants to use that have to do it with your software. 00:26:58.000 |
So if we look at the last few academic conferences, in each case, the percentage of papers using PyTorch is over 50%. 00:27:08.000 |
And you can see that's happened basically in one year. 00:27:12.000 |
So you can clearly see that PyTorch is going to take over research in the industry in the next year or two. 00:27:23.000 |
So there's no point in learning what people used to use. 00:27:27.000 |
And there's a reason that this is happening, it's just much better. 00:27:31.000 |
So we focus on that. The only issue is that PyTorch is kind of a lower level plumbing library. 00:27:43.000 |
So that's no good really for like particularly a top-down approach to getting stuff done. 00:27:49.000 |
So we wrote our own library on top of that called fastai, which just makes a lot of things much easier. 00:28:00.000 |
And so fastai is now super popular in its own right. 00:28:05.000 |
It's available on every major cloud platform. 00:28:09.000 |
Lots and lots of research is coming out of it. 00:28:17.000 |
We often get messages like this from people saying, "Oh, I just started using fastai. 00:28:22.000 |
I used to use TensorFlow and the first thing I tried, everything's so much better. 00:28:28.000 |
How could it be this much better? I thought deep learning was deep learning." 00:28:32.000 |
And then somebody else replied, founder of this company, saying, "Yep, that's what we found. 00:28:36.000 |
We used to use TensorFlow. We spent months tweaking it. 00:28:39.000 |
We used fastai and immediately got better results." 00:28:43.000 |
So I mean, the main thing we teach is the concepts, the understanding of what we're doing. 00:28:52.000 |
So in a sense, having great software doesn't matter as much. 00:28:58.000 |
But it sure is nice that when you do something correctly, 00:29:03.000 |
it would be nice to get a world-class result rather than have to spend months fiddling around with things. 00:29:11.000 |
So for example, with the previous version of fastai, when we compared it to Keras, 00:29:17.000 |
which is the main equivalent kind of API on top of TensorFlow, 00:29:23.000 |
comparing it to the code that Keras makes available for a particular problem, 00:29:28.000 |
we found our error was about half of the Keras error. 00:29:35.000 |
The amount of lines of code was about one-sixth of the Keras lines of code. 00:29:40.000 |
Lines of code is important. It's really important because these extra 26 lines of code 00:29:46.000 |
is like 26 lines of things you have to tweak and change and choose, 00:29:53.000 |
which is cognitive overhead that you have to deal with. 00:29:56.000 |
And if it doesn't work, which one of those extra lines was the problem? 00:30:02.000 |
So when we used to use Keras, we used to teach it in the course, this would happen all the time. 00:30:07.000 |
I would keep finding that things didn't work as well as I hoped they would, 00:30:13.000 |
and there would be one of those lines of boilerplate where something was true rather than false. 00:30:18.000 |
So our view is that you shouldn't have to type more lines of code than necessary, 00:30:27.000 |
but by the same token, everything should be infinitely customizable. 00:30:34.000 |
So to get to that point, this course will be the first-ever course to show how to use fastai version 2, 00:30:44.000 |
which is officially coming out in July, so we'll be using the pre-release version of it. 00:30:49.000 |
And fastai version 2 is a rewrite from scratch that is described fully in this peer-reviewed paper 00:30:55.000 |
in the journal Information and will be in this O'Reilly book, 00:31:01.000 |
which you'll all be able to access for free during the course. 00:31:07.000 |
And it's a huge advance over anything that's come before. 00:31:13.000 |
and it's very much designed to be what we described in the paper as a layered API. 00:31:20.000 |
It's all about what happens when you take a coder mentality to a deep learning library 00:31:28.000 |
and you think hard about things like refactoring and separation of concerns 00:31:33.000 |
and stuff that software engineers really care about. 00:31:38.000 |
So with fastai version 2, there's a lot of interesting stuff 00:31:42.000 |
that you'll be the first ones to learn about and experiment with. 00:31:48.000 |
So for example, there are diagnostics that print out after training that create a picture. 00:31:55.000 |
This is a unique, new kind of picture showing you what's going on inside a network as it trains. 00:32:01.000 |
I won't describe it fully here, but basically what this is showing is the first one, 00:32:05.000 |
two, three, four layers of a deep neural network, 00:32:08.000 |
and it's showing that the numbers, they're called activations, in that neural network, 00:32:12.000 |
are growing exponentially and crashing, and growing exponentially and crashing. 00:32:18.000 |
And so there are these pictures that you can get out of the training process 00:32:23.000 |
to actually look inside and see what's going on. 00:32:25.000 |
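To give a feel for what "looking inside" means, here is a toy sketch that records the average activation at each layer of a small stack of ReLU layers. It is not the fastai diagnostic described above: the function name, the layer width, and the two weight scales are assumptions for illustration. With unscaled random weights the activations grow roughly geometrically from layer to layer; with weights scaled by sqrt(2 / width) (commonly called Kaiming or He initialization) they stay in a similar range.

```python
import math
import random


def layer_activation_means(width, n_layers, weight_std, batch, rng):
    """Push a random batch through ReLU layers; return the mean activation per layer."""
    acts = [[rng.gauss(0, 1) for _ in range(width)] for _ in range(batch)]
    means = []
    for _ in range(n_layers):
        # Fresh random weight matrix for this layer (width x width).
        weights = [[rng.gauss(0, weight_std) for _ in range(width)] for _ in range(width)]
        # Linear layer followed by ReLU, applied to every sample in the batch.
        acts = [
            [max(0.0, sum(w * a for w, a in zip(row, sample))) for row in weights]
            for sample in acts
        ]
        means.append(sum(a for sample in acts for a in sample) / (batch * width))
    return means


rng = random.Random(0)
width = 32
unscaled = layer_activation_means(width, 4, 1.0, 8, rng)
scaled = layer_activation_means(width, 4, math.sqrt(2 / width), 8, rng)

print("unscaled weights:", [round(m, 2) for m in unscaled])
print("scaled weights:  ", [round(m, 2) for m in scaled])
```

Collecting statistics like these at every training step, rather than once, is what lets a plot reveal patterns such as activations growing and then collapsing.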
And one of the nice things about this picture is it was actually developed 00:32:32.000 |
So one of the international visitors from Italy, Stefano, 00:32:35.000 |
actually helped draw out all the different ways that we could build this visualization, 00:32:54.000 |
and the course will focus initially on the top layer, which we call the applications. 00:33:00.000 |
And these are the four things which are kind of pretty well established 00:33:06.000 |
as being things deep learning is very good at, and we know how to do them, 00:33:09.000 |
and we kind of really know how to do them properly, 00:33:15.000 |
So we start there, and then we gradually delve into the mid-layer API, 00:33:20.000 |
which is the components, the flexible components that the applications are built on. 00:33:25.000 |
And then eventually we get to the foundation, 00:33:28.000 |
which is the lowest level plumbing that everything is built on top of. 00:33:40.000 |
is all the code you need with fastai to create a world-class classifier 00:33:46.000 |
for recognizing pet breeds, and it takes, as you see, 35 seconds to run. 00:33:51.000 |
This is on a single computer, and it's one, two, three, four lines of code. 00:34:01.000 |
The lines of code don't matter so much other than to point out 00:34:04.000 |
that if you wanted to switch from an image classifier 00:34:07.000 |
to something that can handle segmentation, this is segmentation. 00:34:13.000 |
and you want to color code every pixel to say what is it a pixel of. 00:34:25.000 |
So you can see again, it's one, two, three, four lines of code, 00:34:30.000 |
nearly the same four lines of code to do segmentation. 00:34:36.000 |
And this show_batch, which visualizes the contents, 00:34:43.000 |
>> For the color select for those, is it random by the machine? 00:34:46.000 |
>> The color selection is coming from something called a color map. 00:34:50.000 |
So the plotting library we use is something called Matplotlib, 00:34:57.000 |
and we just have a default color map that tries to select colors 00:35:08.000 |
So this is how to get world class results on sentiment analysis, 00:35:18.000 |
and again, show_batch will now tell us here's the text and here's the labels. 00:35:27.000 |
Tabular data, stuff that's in spreadsheets or database tables, 00:35:31.000 |
is something that a lot of people don't realize actually works great with deep learning. 00:35:36.000 |
And again, it's basically the same lines of code, 00:35:39.000 |
so here's one to predict who's going to be a high-income earner 00:35:43.000 |
versus a low-income earner based on socioeconomic data. 00:35:47.000 |
There's one extra step, which is you have to say which ones are categorical 00:35:50.000 |
and which ones are continuous, which we'll learn all about, 00:35:57.000 |
And then very related is collaborative filtering. 00:36:01.000 |
Collaborative filtering is a really important technique for recommendation systems, 00:36:05.000 |
so figuring out who's going to be interested in buying this product 00:36:08.000 |
or looking at this movie based on the past behavior of similar customers, 00:36:13.000 |
and again, it's the same basic lines of code. 00:36:22.000 |
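A common way to implement collaborative filtering is to learn a small vector of "latent factors" for every user and every item, and predict a rating as their dot product. The sketch below assumes that approach, with made-up ratings and hypothetical names; it is not the fastai implementation and uses plain Python lists rather than tensors.

```python
import random


def train_factors(ratings, n_users, n_items, n_factors=2, lr=0.02, epochs=200, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    user_f = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    item_f = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_items)]

    def predict(u, i):
        # Predicted rating = dot product of the user's and the item's factors.
        return sum(a * b for a, b in zip(user_f[u], item_f[i]))

    def mse():
        return sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)

    initial_loss = mse()
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            for f in range(n_factors):
                pu, qi = user_f[u][f], item_f[i][f]
                # Nudge both vectors in the direction that reduces the error.
                user_f[u][f] += lr * err * qi
                item_f[i][f] += lr * err * pu
    return predict, initial_loss, mse()


# Made-up data: users 0 and 1 like items 0 and 1; users 2 and 3 like items 2 and 3.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 5), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5),
]
predict, initial_loss, final_loss = train_factors(ratings, n_users=4, n_items=4)
print(f"loss went from {initial_loss:.2f} to {final_loss:.2f}")
print(f"predicted rating for user 3 on unseen item 0: {predict(3, 0):.2f}")
```

Because users with similar past ratings end up with similar factor vectors, the model can score items a user has never rated.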
so we'll study all of those applications, we'll learn how to use them in practice 00:36:28.000 |
But they're built on this mid-tier API that you can kind of mix and match. 00:36:32.000 |
So I won't show you the whole thing, obviously, now, 00:36:34.000 |
but an example of this is something called the DataBlock API, 00:36:38.000 |
where, for example, if you want to do a digit recognition, 00:36:41.000 |
so you've probably heard, like for machine learning, for deep learning, 00:36:46.000 |
it's getting the data into a form where you can model it. 00:36:48.000 |
We've realized there's basically four things you have to do to make that happen, 00:36:52.000 |
and so we created this DataBlock API where you list the four things separately. 00:36:59.000 |
we say the input type is a black and white image, as you can see. 00:37:04.000 |
The output type is a category. Which digit is it? 00:37:08.000 |
They are image files. How do you split into training and validation? 00:37:15.000 |
So you just say each of the things, and these are all plain Python functions. 00:37:19.000 |
So you can write your own Python code to replace any of these things. 00:37:23.000 |
And so once you've done that, you now have something called a DataLoader, 00:37:27.000 |
which is a PyTorch concept, which is basically a thing you can train a model from. 00:37:32.000 |
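The "list each thing separately, as plain Python functions" idea can be sketched without fastai at all. This is not the DataBlock API itself: `MiniDataBlock`, its field names, and the file paths are hypothetical, and the input and output types from the talk are folded into `get_x` and `get_y` to keep it short.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MiniDataBlock:
    """Hypothetical stand-in: each step is a separate, swappable function."""
    get_items: Callable   # where do the items come from?
    splitter: Callable    # which items are training, which are validation?
    get_x: Callable       # how is the input built from an item?
    get_y: Callable       # how is the label built from an item?

    def datasets(self, source):
        items = self.get_items(source)
        train_idx, valid_idx = self.splitter(items)

        def build(idxs):
            return [(self.get_x(items[i]), self.get_y(items[i])) for i in idxs]

        return build(train_idx), build(valid_idx)


def parent_folder(path):
    """Label = name of the folder that directly contains the file."""
    return path.split("/")[-2]


def split_by_grandparent(items):
    """Split on the folder two levels up ('training' vs 'testing')."""
    train = [i for i, p in enumerate(items) if p.split("/")[-3] == "training"]
    valid = [i for i, p in enumerate(items) if p.split("/")[-3] == "testing"]
    return train, valid


# Made-up paths laid out like a digit dataset: <split>/<digit>/<file>.
files = ["mnist/training/3/a.png", "mnist/training/7/b.png", "mnist/testing/3/c.png"]

block = MiniDataBlock(
    get_items=lambda source: list(source),
    splitter=split_by_grandparent,
    get_x=lambda path: path,  # stand-in: a real pipeline would open the image
    get_y=parent_folder,
)
train, valid = block.datasets(files)
```

Because every step is an ordinary function, changing the task means replacing one argument, for example a different `get_y` for custom labelling, while the rest stays the same.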
So you can use a very similar-looking DataBlock to do custom labeling for a pet's example, 00:37:40.000 |
to label with multiple labels, for example, for satellite classification, segmentation, 00:37:47.000 |
it looks almost the same. Instead of now having an image input and a category output, 00:38:01.000 |
If you're doing key points, so in this case we're looking for the center of people's faces, 00:38:06.000 |
it's almost the same thing, but now we have an image input and a point output. 00:38:12.000 |
So this is an example of how it's really software engineering kind of basic principles 00:38:20.000 |
of building these APIs with a nice decoupled separation of concerns, 00:38:24.000 |
and the user ends up in a situation where they can build what they need, 00:38:29.000 |
in a kind of fast and customized and easy way. 00:38:37.000 |
So I'll show you one more example of this mid-tier API, which is something called optimizers. 00:38:45.000 |
Optimizers are the things that actually train your model, and you'll be learning all about them. 00:38:49.000 |
It turns out that optimizers are a current kind of big area of research interest, 00:38:56.000 |
and in the last 12 months people have built some much better optimizers. 00:39:02.000 |
They work much better than the basic approach called SGD. 00:39:06.000 |
And so when they do that, they release papers and they release code. 00:39:10.000 |
So one of the important recent optimizers is something called AdamW. 00:39:15.000 |
It's actually not that recent, it was a couple of years ago, 00:39:18.000 |
but it took about two years for AdamW to get implemented in PyTorch. 00:39:23.000 |
And in PyTorch it takes all this code because the software engineering work of refactoring had never happened. 00:39:33.000 |
So we realized when we looked at lots and lots of papers that you could refactor all of them 00:39:39.000 |
into a small basic framework using callbacks. 00:39:45.000 |
All this code gets turned into these three lines and this little gray bit. 00:39:53.000 |
So for us, we had AdamW implemented the day after the paper came out. 00:39:57.000 |
So one of the cool things about working with fastai is you often get to be the first 00:40:01.000 |
to try out new research techniques because they're so easy to implement. 00:40:05.000 |
Either us or somebody in the community will implement them. 00:40:08.000 |
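A rough sketch of what "refactor the optimizer into small callbacks" can look like, in plain Python on scalar weights. This is not fastai's or PyTorch's actual optimizer code: `MiniOptimizer`, `weight_decay`, and `momentum_step` are hypothetical names, and the example composes decoupled weight decay (the idea behind AdamW) with a simple momentum step rather than full Adam.

```python
def weight_decay(p, grad, state, lr, wd, **_):
    """Decoupled weight decay: shrink the weight directly, not via the gradient."""
    return p * (1 - lr * wd)


def momentum_step(p, grad, state, lr, mom, **_):
    """Keep a running average of gradients in per-parameter state, then step."""
    state["avg"] = mom * state.get("avg", 0.0) + grad
    return p - lr * state["avg"]


class MiniOptimizer:
    """Hypothetical optimizer: the update rule is just a list of small functions."""

    def __init__(self, params, steppers, **hypers):
        self.params = list(params)
        self.steppers = steppers
        self.hypers = hypers
        self.states = [{} for _ in self.params]

    def step(self, grads):
        for i, grad in enumerate(grads):
            for stepper in self.steppers:
                self.params[i] = stepper(
                    self.params[i], grad, self.states[i], **self.hypers
                )


opt = MiniOptimizer([1.0], [weight_decay, momentum_step], lr=0.1, wd=0.01, mom=0.9)
opt.step([0.5])
```

With this shape, trying a new optimizer from a paper mostly means writing one new small function and adding it to the list, which is the kind of reuse being described.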
So a really cool example of this is Google actually implemented a new optimizer 00:40:17.000 |
to reduce the amount of time it took to train a very important NLP, 00:40:22.000 |
language model, from three days to 76 minutes. 00:40:25.000 |
And they created this thing called the Lamb Optimizer. 00:40:40.000 |
And one of the nice things, if you're not a math person like me, I am not a math person, 00:40:44.000 |
being able to see how the code matched to the math helps me get more comfortable with the math. 00:40:51.000 |
So actually I presented this to Google a few weeks ago, fastai v2. 00:40:57.000 |
And it turned out one of the people in the room was one of the people on the paper. 00:41:03.000 |
And he was just sort of happy to see his ideas so nicely expressed in code. 00:41:09.000 |
So a lot of people are kind of funny about the idea of having a small amount of code 00:41:17.000 |
But a small amount of code means you're expressing the actual thing you want to express. 00:41:22.000 |
So when it says, you know, compute this thing, there should be a line of code, 00:41:27.000 |
If it's 50 lines of code computing that thing, my brain can't cope. 00:41:32.000 |
So, you know, fastai v2 is kind of designed for people with brains like mine 00:41:57.000 |
So I've watched several of the courses of Battler, yeah, before. 00:42:05.000 |
But now one of the parts that I've struggled is basically the math. 00:42:12.000 |
The math is sort of like trying to understand that sometimes it's really hard. 00:42:16.000 |
But is there anything that you would recommend for the math? 00:42:21.000 |
So for understanding the math, the best is actually part 2, 00:42:28.000 |
which I assume will be in October, but we already have a part 2. 00:42:32.000 |
So in the part 2 we already have online on fastAI that was recorded as the USF part 2. 00:42:39.000 |
We implement, I don't know how many, Sylvain, like a dozen or so papers from scratch. 00:42:46.000 |
So if you do that, like for me, as I say, I'm not a math person. 00:42:52.000 |
I studied philosophy, and even then not in any great depth. 00:42:56.000 |
So my understanding of the math is basically coming from this process of implementing papers. 00:43:01.000 |
And so I find when I read somebody else's code and I read their paper and I compare them 00:43:06.000 |
and then I implement it myself, you know, I kind of get there. 00:43:11.000 |
And there is a kind of a language to it that takes a while to pick up, just like programming. 00:43:23.000 |
And the other thing is it's kind of less, it's less well-defined. 00:43:28.000 |
People change the way it's described, so sometimes you just have to stare at it for a while 00:43:32.000 |
and go and ask people, but it's showing the same ideas that's in the code. 00:43:37.000 |
So it takes a certain amount of tenacity as well. 00:43:43.000 |
So, how is this year's course different from last year's course? 00:43:47.000 |
And if you do the online course, would you get a head start if so? 00:43:53.000 |
So each year the course is 100% new material. 00:43:57.000 |
But each year it's trying to teach you the same stuff, which is how to be a good deep learning practitioner 00:44:03.000 |
and also prepare you to do the Part 2 course, which is to become a world-class researcher, 00:44:11.000 |
But each year the world changes enough that we think we can do a much better job 00:44:16.000 |
with the things that have happened in the last 12 months. 00:44:20.000 |
So doing the previous year's course helps a lot. 00:44:24.000 |
So like of the 800 people that are doing it live, probably most of them have done all of our previous courses. 00:44:33.000 |
partly because you get a different angle on the material. 00:44:38.000 |
So yeah, it's super helpful, particularly if you haven't got lots of Python experience 00:44:44.000 |
or if you haven't played around much with like NumPy or the kind of scientific programming libraries. 00:44:52.000 |
The thing I'll say is this course is kind of unusual in the variety of people that do it. 00:45:02.000 |
There are plenty of people that turn up two hours a week and don't do any of the assignments and that's it. 00:45:12.000 |
They probably can't train a model much themselves and make it work, 00:45:16.000 |
or if it breaks they wouldn't know how to fix it because they haven't practiced. 00:45:19.000 |
But they have a good sense like if they're a product manager or a CTO or whatever, 00:45:23.000 |
like what are the capabilities, what does the kind of approach look like, 00:45:27.000 |
where do people get stuck, you know, where are the constraints. 00:45:31.000 |
But a lot of people study fastai parts one and two full time for a year. 00:45:37.000 |
Like a lot of people take a year off to just do that and you certainly can get into that depth as well. 00:45:48.000 |
And if you are interested in going deep then studying previous courses is certainly useful. 00:45:56.000 |
One of the things we're doing differently this year is we're also incorporating the key material 00:46:01.000 |
from the introduction to machine learning course. 00:46:04.000 |
So we'll be learning in more detail about things like training versus validation sets. 00:46:09.000 |
We'll be learning about random forests, we'll be learning about feature importance, stuff like that. 00:46:16.000 |
So in the past there were two separate courses. 00:46:23.000 |
Do you have any recommended readings that we might want to look at prior to the course? 00:46:28.000 |
The best recommended reading would be the previous videos. 00:46:35.000 |
And there isn't much, I'm just trying to think, do you think so? 00:46:43.000 |
I mean there's actually, people have taken great, really great notes about previous courses. 00:46:50.000 |
So like the other thing would be, and they're linked to the courses. 00:46:54.000 |
So there's people who have gone to a lot of effort to turn lessons into prose. 00:47:02.000 |
But yeah, I mean there's not a lot of great material out there on kind of top-down practical deep learning elsewhere that we found. 00:47:19.000 |
Part two, there will be, we do, so we have a medical research initiative here, which I chair, called WAMRI. 00:47:31.000 |
It turns out that the vast majority of the time the best thing to do is what we call 2.5D, 00:47:36.000 |
which is where you basically treat the images largely separately and then at the very end you combine them. 00:47:41.000 |
The basic techniques to do that, we'll learn them all in the part one course. 00:47:47.000 |
But to actually put them together that would be more of a project you could do, I guess. 00:47:54.000 |
I actually have two questions. The first one, I really love what you did with the abstraction in fastai. 00:48:02.000 |
I think that's brilliant. However, generally when you do something like that, you have to give up something. 00:48:09.000 |
And what I'm thinking is you give up fine-grained control, maybe. Is that fair to say? 00:48:15.000 |
No, it's not. Just like we have layers of abstractions in all the software we use, 00:48:26.000 |
just because you create another layer of abstraction doesn't mean the other ones go away. 00:48:31.000 |
So in the previous version of fastai, we didn't have this mid-layer. 00:48:40.000 |
And that was a big problem because the applications were written in these low-level foundations. 00:48:46.000 |
So if you wanted to create a new application for audio or 3D medical imaging or whatever, 00:48:56.000 |
So by adding this extra tier, we've kind of had this pre-release version available for a few months now. 00:49:06.000 |
It's been amazing to see how the community is building a lot more stuff already. 00:49:12.000 |
The other thing is that when you provide the applications tier that makes it more concise but also more expressive, 00:49:23.000 |
you can get involved more quickly. You can understand what's going on by saying, 00:49:27.000 |
"Oh, these are the things you actually have to change from application to application, data set to data set." 00:49:33.000 |
So then you can customize it a little bit at a time. 00:49:36.000 |
So it makes it much more gradually learnable, gradually extensible. 00:49:42.000 |
You mentioned the code versus mathematics and how there's almost one-to-one correspondence between the two. 00:49:52.000 |
But I'm thinking you can't prove stuff using code. You can with math. 00:49:59.000 |
We don't do any proofs, and I don't find that's a problem. 00:50:07.000 |
I've published very highly cited research papers in top journals, and I don't have proofs. 00:50:16.000 |
I say, "Hey, here's this thing in computer vision that works, and here's an understanding of why it works, 00:50:25.000 |
and here's the similar ideas in natural language processing, and so we would expect the same thing to happen, 00:50:32.000 |
So proofs are a controversial topic in deep learning, and for many years they were absolutely demanded 00:50:42.000 |
for pretty much every conference paper that got accepted to the top conferences. 00:50:46.000 |
Geoffrey Hinton, who's one of the fathers of deep learning, complained regularly about the fact that we built something 00:50:56.000 |
that creates these billion parameter networks with all these layers. 00:51:02.000 |
Any proof requires simplifying the math down to a point that it's not accurate anymore. 00:51:09.000 |
And there's still a bit of that that happens, honestly. There's a lot of papers that are still published today 00:51:16.000 |
where they end up in top journals because they have proofs in them, but they require--it's kind of like economics. 00:51:22.000 |
It's like, "Oh, let's set up some premises that have nothing to do with the reality of training a real neural network," 00:51:28.000 |
but now we've simplified it so much, the main thing people try to do is prove convergence bounds. 00:51:34.000 |
It basically says, "Oh, regardless of how you initialize this thing, you'll always end up with the error getting better and better." 00:51:44.000 |
When I started out in this area 20 plus years ago, everything was always about proving convergence, 00:51:52.000 |
but a lot of people in operations research only looked at algorithms where you could prove that it would be optimal. 00:52:01.000 |
And it really set the field back because it meant that all the techniques that worked in practice that we couldn't prove it for got ignored. 00:52:11.000 |
So I'm kind of very cautious about proofs in general, and it's certainly not something that we look at in the course. 00:52:20.000 |
I'm wondering, is there anything that we should be doing in terms of expectations that you may have for projects or whatever? 00:52:27.000 |
Is there anything we should be preparing between now and when it begins? 00:52:35.000 |
Your greatest impact will be if you can combine what you learn in this course with stuff that you're deeply passionate about right now. 00:52:47.000 |
So that might be stuff you do at work, or it might be stuff you do outside of work. 00:52:51.000 |
So if you can come with ideas about problems you want to solve, data sets you want to explore, that's super helpful. 00:53:00.000 |
If you can start curating the data set that you might be interested in learning more about, that would be super helpful. 00:53:09.000 |
So I think there's always kind of a delicate balance between following the material that's in each lesson versus exploring your project. 00:53:19.000 |
And there's mistakes to be made on either side. 00:53:21.000 |
If you only look at your project, you're going to miss out on actually understanding the stuff that's being presented each week. 00:53:29.000 |
But if you just do nothing but read and listen to what's being presented and not experimenting with your own stuff, you don't get to find out where the hard edges are. 00:53:40.000 |
So then you can jump on the forums and say like, "Hey, Jeremy thought this would work if we did this thing, and I tried it on this data set, and it's totally not working at all." 00:53:49.000 |
And have that conversation and try to figure it out. 00:53:54.000 |
More generally for code, write as much Python as possible. 00:54:00.000 |
And then the Python we use for deep learning is a particular kind of Python. 00:54:08.000 |
It's where we almost never use loops, but instead we do things on whole arrays, or as we call them, tensors, at once. 00:54:17.000 |
So learning to use the NumPy library, N-U-M-P-Y, would be super helpful because PyTorch is nearly identical. 00:54:26.000 |
And trying to get some experience of working with matrices and vectors and adding them together and multiplying them and stuff like that is useful. 00:54:36.000 |
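A small illustration of the loop-free style described here, using NumPy. The array values are arbitrary; the point is only that each operation applies to the whole array at once.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Loop style: one element at a time.
looped = [x + y for x, y in zip(a, b)]

# Array style: the same result with no explicit loop.
summed = a + b          # elementwise addition
product = a * b         # elementwise multiplication
scaled = a * 2          # the scalar is applied to every element

# Matrix-vector product, the building block of most neural network layers.
m = np.array([[1.0, 2.0],
              [3.0, 4.0]])
v = np.array([1.0, 1.0])
mv = m @ v
```

The PyTorch versions of these operations look nearly the same, with tensors taking the place of NumPy arrays.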
You don't need to study linear programming. Sorry, linear algebra. 00:54:40.000 |
We have a linear algebra course, but you don't need to know. 00:54:44.000 |
A lot of people come in and spend too much time on this math stuff. 00:54:49.000 |
But yeah, I would come in as good at coding as you can make yourself, because that's the language you'll be talking to the computer in. 00:55:00.000 |
Do we need to create our own compute environment like an AWS or a cloud before this class starts? 00:55:13.000 |
If you go to course.fast.ai and then click on Server Setup, you can see there's all these different options. 00:55:29.000 |
And if you scroll down here, you'll find a description of each one. 00:55:34.000 |
So if you pick, let's say, Gradient, then you'll see it basically tells you create an account, click on the fastai button, decide which machine you want, click Create Notebook, and you start it. 00:55:50.000 |
So all the major cloud companies have fastai built into their environments. 00:55:58.000 |
There's always a group of people that are tempted to set up a new computer and put in a GPU and all this stuff. 00:56:04.000 |
Those are the people that spend the entire course installing Linux drivers, so don't be that person. 00:56:13.000 |
Google Cloud comes with $300 of free credits, which is more than enough to get you through the course many times over. 00:56:20.000 |
There's also something called Colab, which is free, and Kaggle Kernels, which are free. 00:56:27.000 |
If you're interested in exploring a little bit of basic Linux setup stuff, Google Cloud is your best option. 00:56:36.000 |
If you don't want to touch that at all, PaperSpace Gradient has free one-click Jupyter notebooks that you can get started with straight away, and all the data sets and notebooks are ready to go.