Lesson 1: Deep Learning 2019 - Image classification
00:00:06.200 |
Practical Deep Learning for Coders, lesson one. 00:00:11.560 |
There's a lesson zero, and lesson zero is: why do you need a GPU, and how do you get it set up? If you haven't done that yet, 00:00:20.080 |
then go back and do that, and make sure that you can access a Jupyter notebook. 00:00:28.040 |
And then you're ready to start the real lesson one. So if you're ready, 00:00:32.220 |
you will be able to see something like this and 00:00:37.080 |
In particular, hopefully you have gone to the notebook tutorial. It's at the top, 00:00:42.120 |
right, with a 00 here. As this grows, you'll see more and more files, but we'll keep the notebook tutorial at the top. And 00:00:50.920 |
hopefully you've been able to use your Jupyter notebook to add one and one together and get the expected result. 00:01:01.800 |
And hopefully you've learned these four keyboard shortcuts 00:01:11.840 |
A notebook can have prose in it. It can have pictures in it. It can have 00:01:23.120 |
And most importantly it can have code in it. Okay, so the code is in Python 00:01:29.520 |
How many people have used Python before? So, nearly all of you. That's great. 00:01:35.920 |
If you haven't used Python, that's totally okay. All right? 00:01:40.480 |
It's a pretty easy language to pick up. But if you haven't used Python, 00:01:44.680 |
This will feel a little bit more intimidating because the code that you're seeing will be unfamiliar to you. Yes, Rachel 00:02:00.000 |
I'll edit this bit out. 00:02:04.240 |
So, as I say, there are things like this where there's a difference between the people in the room in person and the MOOC audience. 00:02:10.240 |
This is one of those bits that is really for the MOOC audience, 00:02:13.520 |
not for you. I think this will be the only time like this in the lesson where we've assumed that. 00:02:24.880 |
All right, so yeah, for those of you in the room, or on the fast.ai live stream: 00:02:29.740 |
you can go back after this and make sure that you can get this running using the information on course-v3.fast.ai. 00:02:53.440 |
A Jupyter notebook is a really interesting device for a data scientist, because it kind of lets you 00:02:57.000 |
run interactive experiments, and it gives you not just a 00:03:03.180 |
static piece of information, but something that you can actually interact with. 00:03:17.840 |
Now, here's what we think works well for using these notebooks and this material, and this is based on the last three years of experience 00:03:24.140 |
We've had with the students who have gone through this course 00:03:27.020 |
First of all, it works pretty well just to watch a lesson end-to-end 00:03:33.960 |
Don't try and follow along because it's not really designed to go at a speed where you can follow along 00:03:40.840 |
It's designed to be something where you just take in the information, get a general sense of all of the pieces, how it all fits together. 00:03:55.280 |
Then you can go back through it more slowly, trying things out, making sure that you can do the things that I'm doing, 00:03:59.660 |
and that you can try and extend them to do things in your own way. Okay? So don't worry if things go 00:04:10.280 |
faster than you can do them; that's normal. Also, don't try and stop and understand everything the first time. If you do understand everything the first time, fine; 00:04:20.240 |
but most people don't, particularly as the lessons go on; they get faster and they get more difficult. Okay. 00:04:27.260 |
So at this point we've got our notebooks going we're ready to start doing deep learning 00:04:35.680 |
And so the main thing that hopefully you're going to agree with at the end of this is that you 00:04:40.740 |
can do deep learning, regardless of who you are. And we don't just mean 'do'; we mean 'do' at a very 00:04:47.000 |
high level. I mean world-class, practitioner-level deep learning. 00:04:54.840 |
Your main place to be looking for things is course-v3.fast.ai, 00:05:04.280 |
where you'll find the notebooks and other information. 00:05:09.280 |
You can also access our forums, and on our forums you'll find things like: how do you build a 00:05:18.400 |
deep learning box yourself? And that's something that you can do, you know, later on, once you've kind of got going. 00:05:29.760 |
So why should you listen to me? Well, maybe you shouldn't but I'll try and justify why you should listen to me 00:05:36.320 |
I've been doing stuff with machine learning for over 25 years. I 00:05:42.160 |
started out in management consulting, where actually, initially, I was, I think, McKinsey & Company's first analytical specialist, and went into general consulting. 00:05:54.320 |
Eventually became the president of Kaggle, but actually the thing I'm probably most proud of in my life 00:06:00.160 |
is that I got to be the number one ranked contestant in Kaggle competitions globally. 00:06:08.640 |
Kaggle competitions are pretty practical: can you actually train a predictive model that predicts things? A pretty important aspect of data science. 00:06:16.040 |
I then founded a company called Enlitic, which was the first kind of medical deep learning company. 00:06:24.240 |
nowadays, I'm on the faculty at University of San Francisco and also co-founder with Rachel of fast AI 00:06:33.480 |
I've been doing machine learning throughout that time, and I guess, although I am at USF, at the university, 00:06:40.280 |
I'm not really an academic type. I'm much more interested in using this tool to do useful things. 00:06:48.360 |
Specifically, through fast.ai we are trying to help people use deep learning to do useful things: through software, 00:06:56.800 |
to make deep learning easier to use at a very high level; through education, such as the thing you're watching now; 00:07:03.360 |
through research, which is where we spend a very large amount of our time, which is researching to figure out 00:07:08.980 |
how you can make deep learning easier to use at a very high level, 00:07:12.720 |
which ends up, as you'll see, in the software and the education; and by helping to build a community, 00:07:18.200 |
which is mainly through the forums, so that practitioners can find each other and work together. 00:07:26.080 |
So this lesson practical deep learning for coders is kind of the starting point in this journey 00:07:31.140 |
It contains seven lessons each one's about two hours long 00:07:35.120 |
We're then expecting you to do about eight to ten hours of homework during the week 00:07:39.480 |
So it'll end up being something around 70 or 80 hours of work 00:07:43.840 |
I will say, there is a lot of variation in how much people put into this. 00:07:48.000 |
I know a lot of people who work full-time on fast.ai; 00:07:52.120 |
some folks who do the two parts can spend a whole year doing it really intensively. I know some folks who 00:07:59.700 |
watch the videos on double speed and never do any homework, and come out the end of it with, you know, 00:08:04.440 |
a general sense of what's going on. So there are lots of different ways you can do this. 00:08:10.400 |
But if you follow along with this ten-hours-a-week-or-so approach for the seven weeks, by the end you will be able to build an image classification 00:08:16.080 |
Model on pictures that you choose that will work at a world-class level 00:08:21.640 |
You'll be able to classify text again using whatever data sets you're interested in 00:08:28.080 |
You'll be able to make predictions for kinds of commercial applications, like sales. 00:08:33.340 |
You'll be able to build recommendation systems such as the one used by Netflix 00:08:38.640 |
Not toy examples of any of these, but actually things that can 00:08:42.360 |
come top ten in Kaggle competitions, that can beat everything that's in the academic community; 00:08:47.480 |
very, very high-level versions of these things. So that might surprise you, because, like, you know, the prerequisite here is 00:08:54.340 |
Literally one year of coding and high school math 00:09:01.320 |
But we have thousands of students now who have done this and shown it to be true. 00:09:08.960 |
You will hear a lot of naysayers (less now than a couple of years ago, when we started) telling you that you can't do it, 00:09:14.400 |
or that you shouldn't be doing it, or that deep learning's got all these problems: 00:09:18.720 |
it's not perfect. But these are all things that people claim about 00:09:22.840 |
deep learning which are either pointless or untrue. 00:09:27.640 |
It's not a black box; as you'll see, it's really great for interpreting what's going on. 00:09:34.200 |
It does not need much data for most practical applications. You certainly don't need a PhD 00:09:39.960 |
Rachel has one so it doesn't actually stop you from doing deep learning if you have a PhD 00:09:44.440 |
I certainly don't; I have a philosophy degree and nothing else. 00:09:47.800 |
It can be used very widely, for lots of different applications, not just for vision, which is where it's most well known. 00:09:58.100 |
You don't need lots of hardware: a thirty-six-cents-an-hour server is more than enough to get world-class results for most problems. 00:10:03.920 |
It's true that maybe this is not going to help you to build a sentient brain, but 00:10:12.480 |
for all the people who say deep learning is not interesting because it's not really AI: 00:10:16.560 |
that's not really a conversation that I'm interested in; we're focused on solving actual problems. 00:10:23.960 |
What are you going to be able to do by the end of lesson one? 00:10:26.840 |
Well, this was an example from Nikhil, who's actually in the audience now, because he was in last year's course as well. 00:10:33.880 |
This is an example of something he did, which is: he downloaded 30 images of 00:10:37.840 |
people playing cricket and people playing baseball, ran the code you'll see today, and built a classifier. 00:10:47.600 |
So this is the kind of stuff that you can build: some fun hobby examples like this, 00:10:52.760 |
or you can try stuff, as we'll see, in the workplace that could be of direct commercial value. 00:10:57.920 |
So this is the idea of where we're going to get to by the end of lesson one 00:11:09.200 |
The way we teach is very different to many of the academic courses. So, for those of you who have kind of an engineering or math or computer science background: 00:11:15.800 |
this is very different to the approach where you start with lots and lots of theory, and eventually you get to a postgraduate degree, 00:11:22.280 |
and you finally are at the point where you can build something useful. We're going to learn to build the useful thing today. 00:11:30.040 |
You won't know all the theory. Okay, there will be lots of aspects of what we do that 00:11:34.920 |
You don't know why or how it works. That's okay. You will learn why and how it works over the next seven weeks 00:11:42.480 |
But for now we found that what works really well is to actually get your hands dirty coding 00:11:55.880 |
There is still a lot of artisanship in deep learning. Unfortunately, it's still a situation where people who are good practitioners 00:12:01.560 |
have a really good feel for how to work with the code and how to work with the data, and you can only get that through experience. 00:12:09.840 |
And so the best way to get that feel of how to get good models is to create lots of models, 00:12:18.640 |
study them carefully, and the Jupyter notebook provides a really great way to study them. So, 00:12:27.240 |
let's try getting started. So, to get started, you will open your 00:12:36.400 |
lesson1-pets notebook, and it will pop open looking something like this. And so here it is. So you can 00:12:43.680 |
run a cell in a Jupyter notebook by clicking on it and pressing Run, 00:12:50.880 |
but if you do so everybody will know that you're not a real deep learning practitioner because real deep learning practitioners know the keyboard shortcuts and 00:12:57.440 |
The keyboard shortcut is shift enter given how often you have to run a cell 00:13:04.960 |
rather than going all the way up here, finding it, and clicking it: just Shift+Enter. Okay? So: type, type, type, Shift+Enter; type, 00:13:09.400 |
type, Shift+Enter. Up and down to move around, to pick something to run; Shift+Enter to run it. 00:13:15.840 |
So we're going to go through this quickly and then later on we're going to go back over it more carefully 00:13:22.600 |
So here's the quick version to get a sense of what's going on 00:13:25.040 |
So here we are in lesson one, and these three lines are what we start every notebook with. 00:13:32.660 |
These things starting with percent are special directives to Jupyter notebook itself. They're not Python code. They're called 'magics', 00:13:40.360 |
which is just kind of a cool name. The details of these three directives aren't very important, 00:13:45.000 |
but basically it says: hey, if somebody changes the underlying library code while I'm running this, 00:13:49.960 |
please reload it automatically; and if somebody asks to plot something, then please plot it here in this Jupyter notebook. 00:13:56.640 |
So just put those three lines at the top of everything. 00:13:59.440 |
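For reference, the three lines at the top of the lesson notebooks look like this (these are Jupyter magics, not Python):

    %reload_ext autoreload
    %autoreload 2
    %matplotlib inline

The first two reload changed library code automatically; the last makes plots appear right there in the notebook.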
The next two lines load up the fast AI library 00:14:08.000 |
So it's a little bit confusing: fastai, with no dot, is the name of our software; and fast.ai, with the dot, is the name of our organization. So when you see fastai in the code, that's our 00:14:22.240 |
library. Okay, we'll learn more about it in a moment. 00:14:25.600 |
But for now just realize everything we are going to do is going to be using basically either fast AI 00:14:31.760 |
Or the thing that fast AI sits on top of which is pytorch 00:14:43.480 |
It's a bit newer than TensorFlow. So in a lot of ways, it's more modern than TensorFlow 00:14:53.640 |
It's extremely fast-growing, extremely popular, and we use it because we used to use TensorFlow a couple of years ago, 00:14:59.480 |
and we found we can just do a lot more, a lot more quickly, with PyTorch. 00:15:04.080 |
And then we have this software that sits on top of PyTorch and lets you do 00:15:09.880 |
far, far more things, far, far more easily than you can with PyTorch alone. 00:15:14.200 |
So it's a good combination. We'll be talking a lot about it. But for now, just know that you can use fast AI by doing two things 00:15:21.040 |
importing star from fastai, and then importing star from fastai dot 00:15:27.400 |
something, where something is the application you want. And currently fastai supports four applications: computer vision, 00:15:37.600 |
natural language text, tabular data, and collaborative filtering. And we're going to see lots of examples of all of those during the seven weeks. 00:15:43.680 |
So we're going to be doing some computer vision 00:15:45.680 |
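As a sketch, for a computer vision notebook in fastai v1 those two import lines look like this:

    from fastai import *          # import everything from the core fastai library
    from fastai.vision import *   # import everything for the computer vision application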
At this point if you are a Python software engineer, you are probably 00:15:50.600 |
Feeling sick because you've seen me go import star, which is something that you've all been told to never ever do 00:15:58.320 |
Okay, and there's very good reasons to not use import star in standard production code with most libraries 00:16:06.320 |
But you might have also seen for those of you that have used something like Matlab 00:16:09.680 |
It's kind of the opposite everything's there for you all the time. You don't even have to import things a lot of the time 00:16:14.900 |
It's kind of funny: we've got these two extremes of, like, 'how do I code'. You've got a scientific 00:16:20.760 |
Programming community that has one way and then you've got the software engineering community that has the other 00:16:26.280 |
Both have really good reasons for doing things, and with the fastai library we actually support both approaches. 00:16:33.440 |
Indeed, in a Jupyter notebook, where you want to be able to quickly, interactively try stuff out, 00:16:38.160 |
You don't want to be constantly going back up to the top and importing more stuff and trying to figure out where things are 00:16:43.200 |
You want to be able to use lots of tab complete be you know, very experimental. So import star is great 00:16:49.240 |
Then when you're building stuff in production 00:16:51.880 |
you can do the normal PEP 8 style, you know, proper software engineering practices. So, 00:17:01.200 |
when you see me doing stuff which at your workplace is frowned upon: okay, this is a different style of coding. 00:17:09.240 |
It's not that there are no rules in data science programming; 00:17:11.840 |
it's that the rules are different. Right? When you're training models, 00:17:15.120 |
The most important thing is to be able to interactively experiment quickly. Okay, so you'll see we use a lot of very different 00:17:22.880 |
processes, styles, and stuff to what you're used to, but they're there for a reason, 00:17:28.280 |
And you'll learn about them over time. You can choose to use a similar approach or not. It's entirely up to you 00:17:34.120 |
The other thing to mention is that the fastai library is 00:17:38.080 |
actually designed in a very interesting, modular way, and you'll find over time that when you do use import star, 00:17:45.080 |
There's far less clobbering of things than you might expect 00:17:48.280 |
It's all explicitly designed to allow you to pull in things and use them quickly without having problems 00:17:55.320 |
Okay, so we're going to look at some data and 00:17:59.480 |
There are two main places that we'll be tending to get data from for the course. One is from academic data sets. 00:18:06.880 |
Academic data sets are really important. They're really interesting 00:18:10.960 |
They're things where academics spend a lot of time 00:18:13.440 |
Curating and gathering a data set so that they can show how well different kinds of approaches work with that data 00:18:19.240 |
The idea is they try to design data sets that are 00:18:23.000 |
Challenging in some way and require some kind of breakthrough to do them. Well 00:18:26.640 |
So we're going to be starting with an academic data set called the pet data set 00:18:30.840 |
The other kind of data set will be using during the course is data sets from the Kaggle competitions platform 00:18:37.040 |
Both academic data sets and Kaggle data sets are interesting for us 00:18:41.680 |
Particularly because they provide strong baselines that is to say you want to know if you're doing a good job 00:18:48.120 |
so with Kaggle data sets that have come from a competition you can actually submit your results to Kaggle and see how well would 00:18:55.480 |
you have gone in that competition. And if you can get in about the top 10%, then I'd say you're doing very well. 00:19:04.800 |
Academics write down in papers what the state-of-the-art is, so: how well did they go, using models, on that data set? 00:19:11.160 |
So this is what we're going to do. We're going to try and train 00:19:15.880 |
models that get right up towards the top of Kaggle competitions (preferably actually in the top ten, not just the top 10%), 00:19:22.180 |
or that meet or exceed academic state-of-the-art published results. 00:19:36.840 |
So when you use an academic data set here, there's a link to the paper that it's from. You definitely don't need to read that paper right now, 00:19:41.040 |
But if you're interested in learning more about it, and why it was created, and how it was created all the details are there 00:19:47.240 |
So, in this case, this is a pretty difficult challenge: the pet data set is going to ask us to distinguish between 00:19:54.000 |
37 different categories of dog breed and cat breed. So that's really hard. In fact, 00:20:03.680 |
every course until this one, we've used a different data set, which is one where you just have to decide: is something a dog, or is it a cat? 00:20:09.680 |
So you've got a 50/50 chance right away, right? And dogs and cats look really different. 00:20:14.200 |
Whereas there are lots of dog breeds and cat breeds that look pretty much the same. So why have we changed the data set? 00:20:19.700 |
We've got to the point now where deep learning is so fast and so easy that the dogs versus cats problem 00:20:25.780 |
Which a few years ago was considered extremely difficult 00:20:29.360 |
80% accuracy was the state-of-the-art. It's now too easy 00:20:32.800 |
Our models were basically getting everything right all the time without any tuning 00:20:38.840 |
and so there weren't, you know, really a lot of opportunities for me to show you how to do more sophisticated stuff. 00:20:46.560 |
So this is the first class where we're going to be learning how to do this difficult problem and this kind of thing where you 00:20:52.780 |
Have to distinguish between similar categories is called in the academic context. It's called fine-grained classification 00:20:59.800 |
so we're going to do the fine-grained classification tasks of 00:21:02.400 |
Figuring out a particular kind of pet and so the first thing we have to do is download and extract 00:21:08.620 |
the data that we want. We're going to be using this function called 00:21:12.780 |
untar_data, which will download it automatically, and will untar it automatically. 00:21:18.380 |
AWS has been kind enough to give us lots of space and bandwidth for these data sets so they'll download super quickly for you 00:21:25.960 |
And so the first question then would be: how do I know what untar_data does? 00:21:33.540 |
So you can just type help and you will find out 00:21:36.860 |
what module it came from (because, since we imported star, we don't necessarily know that), 00:21:44.300 |
something you might not have seen before even if you're an experienced programmer is 00:21:47.600 |
what exactly do you pass to it? You're probably used to seeing the names: url, fname, 00:21:54.980 |
dest. What you might not be used to seeing are 00:21:58.060 |
these bits. These bits are types, and if you've used a typed programming language, you'll be used to seeing them, 00:22:04.860 |
but Python programmers are less used to it. But if you think about it, 00:22:08.540 |
you don't actually know how to use a function unless you know what type each thing is that you're providing it. 00:22:15.140 |
So we make sure that we give you that type information 00:22:17.580 |
directly here in the help. So in this case, the url is a string, and the fname is (Union means 'either') 00:22:25.780 |
either a path or a string, and it defaults to nothing; and 00:22:31.580 |
the dest is either a path or a string that defaults to nothing. 00:22:35.580 |
So we'll learn more shortly about how to get more documentation about the details of this 00:22:40.740 |
But for now we can see we don't have to pass in a file name or a destination 00:22:44.780 |
It'll figure them out for us from the URL. And for all the data sets 00:22:49.660 |
we'll be using in the course, we already have constants defined 00:22:52.720 |
for all of them, right, in this URLs class. 00:22:58.740 |
You can see that's where it's going to grab it from. Okay, so it's going to download that to some 00:23:04.580 |
convenient path, and untar it for us, and will then return the path. 00:23:12.020 |
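Roughly, this step looks like the following sketch (URLs.PETS is the constant for this data set in fastai v1):

    help(untar_data)              # prints the signature, its types, and the module it came from
    path = untar_data(URLs.PETS)  # downloads and untars the data set, then returns the path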
Okay. And then, in a Jupyter notebook, it's kind of handy: 00:23:16.060 |
you can just write a variable on its own, right (a semicolon is just an end-of-statement marker in Python, 00:23:23.180 |
so it's the same as doing this): you can write it on its own, and it prints it. You can also say print, 00:23:28.260 |
right, but again, we're trying to do everything fast and interactively, so just write it, and here is the path. 00:23:39.780 |
Since you've already downloaded it, it won't download it again; since you've already untarred it, it won't untar it again. 00:23:44.960 |
So everything's kind of designed to be pretty automatic, pretty easy. There are some things in 00:23:52.940 |
Python that are less convenient for interactive use than they should be. For example, when you do have a path object, 00:23:58.540 |
seeing what's in it actually takes a lot more typing than I would like. So sometimes we add 00:24:03.960 |
functionality into existing Python stuff. One of the things we do is add an ls() method to paths. So if you go path.ls(), 00:24:14.420 |
you can see what's inside this path. So that's what we just downloaded. So when you try this yourself, 00:24:19.220 |
you wait a couple of minutes for it to download and unzip, and then you can see what's in there. 00:24:28.620 |
You may not be familiar with this approach of using a slash like this. Now, this is really convenient 00:24:34.560 |
functionality that's part of Python 3, from something called pathlib. These are path objects, and path objects are much better to use than strings: 00:24:42.280 |
they let you basically create sub-paths like this, and it doesn't matter if you're on Windows, Linux, or Mac; 00:24:49.020 |
It's always going to work exactly the same way 00:24:51.020 |
So here's a path to the images in that data set 00:24:55.460 |
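A minimal sketch of those two steps, using fastai's ls() and pathlib's slash operator:

    path.ls()                 # list what's inside the downloaded path
    path_img = path/'images'  # build a sub-path; works the same on Windows, Linux, and Mac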
All right. So if you're starting with a brand new data set, trying to do some deep learning on it, 00:25:02.860 |
what do you do? Well, the first thing you would want to do is probably see what's in there. So we found that these are the 00:25:09.300 |
directories that are in there. So what's in this images directory? 00:25:15.140 |
There are a lot of functions in fastai for you; there's one called get_image_files, that will just grab an 00:25:21.100 |
array of all of the image files, based on extension, in a path. 00:25:35.300 |
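So grabbing the file names looks roughly like this:

    fnames = get_image_files(path_img)  # an array of all the image files in that folder
    fnames[:5]                          # peek at the first few file names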
The most common way for image computer vision data sets to get passed around is just one folder with a whole bunch of files in it. 00:25:44.580 |
So how do we get the labels? In machine learning, the labels refer to the thing 00:25:50.340 |
we're trying to predict. And if we just eyeball this, we can immediately see that the labels are 00:25:56.340 |
actually part of the file name. You see that, right? It's kind of like: path, slash, label, underscore, number, dot, 00:26:05.140 |
extension. So we need to somehow get a list of 00:26:09.740 |
These bits of each file name and that will give us our labels 00:26:14.260 |
Because that's all you need to build a deep learning model. You need some pictures so files containing the images and you need some labels 00:26:21.140 |
So in fastai, this is made really easy. There's an 00:26:26.700 |
object called an ImageDataBunch, and an ImageDataBunch represents all of the data you need to build a model. And 00:26:33.780 |
there are basically some factory methods which try to make it really easy for you to create that data bunch: 00:26:41.500 |
we'll talk more about this shortly, but it creates a training set and a validation set, with images and labels, for you. 00:26:47.260 |
Now, in this case, we can see we need to extract the labels from the file names. 00:26:53.500 |
Okay, so we're going to use from_name_re. So, for those of you that use Python: 00:26:58.220 |
you know re is the module in Python that does regular expression things; that's really useful for extracting 00:27:07.460 |
text. Here is the regular expression that will extract the label from this text. Okay. So for those of you who are 00:27:14.440 |
not familiar with regular expressions: super useful tool. 00:27:19.060 |
It'd be very useful to spend some time figuring out how and why that particular regular expression is going to extract the label 00:27:27.660 |
from this text. Okay. So with this factory method, we can basically say: okay, I've got this path containing images; 00:27:34.900 |
this is a list of file names (remember, I got them back here); 00:27:37.500 |
this is the regular expression pattern that is going to be used to extract the label from the file name; 00:27:47.620 |
And then you also need to say what size images do you want to work with? 00:27:52.180 |
So that might seem weird. Why do I need to say what size images? 00:27:56.260 |
I want to work with? Because the images have a size; we can see what size the images are. And I guess, honestly, this is a 00:28:03.860 |
Shortcoming of current deep learning technology, which is that a GPU 00:28:08.680 |
Has to apply the exact same instruction to a whole bunch of things at the same time in order to be fast and 00:28:16.660 |
So if the images are different shapes and sizes, it can't do that 00:28:21.320 |
All right, so we actually have to make all of the images the same shape and size 00:28:27.020 |
In part one of the course, we're always going to be making 00:28:32.140 |
images square shapes. In part two, we'll learn how to use rectangles as well. 00:28:39.180 |
But pretty much everybody, in pretty much all computer vision modeling, nearly all of it, uses this approach of square images. 00:28:46.620 |
And 224 by 224, for reasons we'll learn about, is an extremely common size that most models tend to use, 00:28:57.820 |
so if you use that, you're probably going to get pretty good results most of the time. And this is kind of 00:29:01.980 |
the little bits of artisanship that I want to teach you folks, which is: what generally just works. 00:29:08.680 |
Okay? So if you just use size equals 224, that'll generally just work for most things, most of the time. 00:29:18.500 |
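Putting that together, the data bunch creation looks roughly like this sketch (the regex pattern is the one used for this data set in the lesson notebook):

    np.random.seed(2)             # fix the random validation split so results are repeatable
    pat = r'/([^/]+)_\d+.jpg$'    # capture the label: the text between the last '/' and '_<number>.jpg'
    data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                       ds_tfms=get_transforms(), size=224)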
So this is going to return a DataBunch object, and in fastai, everything you model with is going to be a DataBunch object. 00:29:24.420 |
We're going to learn all about them and what's in them and how do we look at them and so forth? 00:29:35.180 |
We'll learn about this shortly. It'll contain your training data, your validation data, and, optionally, your test data; and for each of those, it contains 00:29:43.700 |
your images and your labels, or your texts and your labels, or your tabular data and your labels, and so forth. 00:29:55.020 |
Something we'll learn more about in a little bit is 00:29:58.140 |
normalization. But generally, in nearly all machine learning tasks, you have to make all of your data 00:30:04.260 |
about the same 'size': specifically, about the same mean and about the same standard deviation. 00:30:09.500 |
So there's a normalize function that we can use to normalize our data bunch in that way. 00:30:22.380 |
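In code, that's one extra call on the data bunch (imagenet_stats holds the channel statistics used with the pre-trained models):

    data = data.normalize(imagenet_stats)  # shift and scale each channel to standard statistics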
What does the function do if the image size is not 224? 00:30:32.540 |
This is what we're going to learn about shortly 00:30:34.540 |
Basically, this thing called transforms is used to do a number of things, and one of the things it does is to make something a fixed size. 00:30:43.100 |
Let's take a look at a few pictures. Here are a few pictures of 00:30:46.580 |
things from my data bunch. So you can see data.show_batch 00:30:50.700 |
can be used to show me some of the contents of my data bunch. 00:30:58.940 |
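For example:

    data.show_batch(rows=3, figsize=(7, 6))  # display a grid of sample images with their labels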
And you can see roughly what's happened: they all seem to have been kind of 00:31:04.100 |
zoomed and cropped in a reasonably nice way. So basically, what it'll do by default is something called center cropping, 00:31:12.540 |
which means it'll kind of grab the middle bit, and it will also 00:31:16.500 |
Resize it so we'll talk more about the detail of this because it turns out to actually be quite important 00:31:21.180 |
But basically a combination of cropping and resizing is used 00:31:25.780 |
Something else we'll learn about is we also use this to do something called data augmentation 00:31:31.060 |
So there's actually some randomization in how much and where it crops and stuff like that 00:31:35.820 |
Okay, but that's the basic idea is some cropping and some resizing 00:31:40.900 |
But often we also do some padding. So there are all kinds of different ways, 00:31:46.180 |
And it depends on data augmentation, which we're going to learn about shortly 00:31:49.460 |
And what does it mean to normalize the images? 00:31:54.100 |
So, normalizing the images: we're going to be learning more about that later in the course. 00:31:59.980 |
But in short, it means that the pixel values (and we're going to be learning more about pixel values) start out 00:32:06.820 |
in the range from 0 to 255. And some pixel values, 00:32:16.060 |
or I should say some channels, because there's red, green, and blue: some channels might tend to be 00:32:20.460 |
really bright, and some might tend to be really not bright at all, and some might vary a lot, and some might not vary much. 00:32:26.900 |
It really helps train a deep learning model if each one of those red green and blue channels has a mean of zero 00:32:33.740 |
And a standard deviation of one. Okay, we'll learn more about that if you 00:32:37.860 |
Haven't studied or don't remember means and standard deviations. We'll get back to some of that later, but that's the basic idea 00:32:44.340 |
That's what normalization does if your data and again, we'll learn much more about the details 00:32:49.380 |
But if your data is not normalized it can be quite difficult for your model to train well 00:32:54.900 |
So if you do have trouble training a model one thing to check is that you've normalized it 00:33:00.260 |
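As a minimal sketch of the arithmetic (the library does this for you), here's what normalizing one image would look like in plain numpy:

    import numpy as np
    img = np.random.rand(3, 224, 224) * 255       # a fake 3-channel image with values 0-255
    mean = img.mean(axis=(1, 2), keepdims=True)   # per-channel mean
    std = img.std(axis=(1, 2), keepdims=True)     # per-channel standard deviation
    normalized = (img - mean) / std               # each channel now has mean 0 and std 1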
Since GPU memory works in powers of two, wouldn't a size of 256 be more practical, considering GPU utilization? 00:33:08.600 |
So, we're going to be getting into that shortly, but the brief answer is that the 00:33:14.820 |
models are designed so that the final layer is of size 7 by 7, 00:33:19.580 |
so we actually want something where if you multiply 7 by 2 a bunch of times, then you end up with that number; that makes it a good size. 00:33:27.220 |
Yeah, all of these details we are going to get to, but the key thing is I wanted to get you started as quickly as possible. 00:33:34.320 |
But you know one of the most important things to be a really good practitioner is to be able to look at your data 00:33:40.380 |
Okay, so it's really important to remember to go data.show_batch and take a look. 00:33:45.420 |
It's surprising how often, when you actually look at the data set you've been given, you realize it's got 00:33:49.960 |
weird black borders on it, or some of the things have text covering up part of them, or some of it's rotated in odd ways. 00:33:57.820 |
And then the other thing we're going to do is not just look at the pictures, 00:34:03.660 |
but also look at the labels. All of the possible 00:34:10.220 |
labels are called your classes. With the data bunch, you can print out your data.classes, 00:34:15.580 |
and so here they are. That's all of the possible labels that we found by using that regular expression on the file names. 00:34:22.540 |
And we learned earlier on, in that prose right at the top, that there are 37 00:34:27.200 |
possible categories. And so, just checking len(data.classes): it is indeed 37. 00:34:32.660 |
A data bunch will always have a property called C 00:34:36.480 |
And that property called C the technical details will kind of get to later 00:34:41.920 |
But for now you can kind of think of it as being the number of classes 00:34:45.100 |
For things like regression problems and multi-label classification and stuff, that's not exactly accurate; 00:34:53.980 |
but it's important to know that data.c is a really 00:34:57.940 |
important piece of information, and it is something like, or at least for classification problems it is, the number of classes. 00:35:04.860 |
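So inspecting the labels looks like this:

    data.classes        # the 37 possible labels the regular expression found
    len(data.classes)   # 37
    data.c              # also 37: for classification, think of it as the number of classes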
Right. Believe it or not, we're now ready to train a model. And 00:35:11.260 |
So a model is trained in fast AI using something called a learner 00:35:19.380 |
And just like a data bunch is a general fast AI concept for your data 00:35:24.520 |
And from there there are subclasses for particular applications 00:35:32.140 |
a Learner is a general concept for things that can learn 00:35:36.140 |
to fit a model. And from that, there are various subclasses to make things easier, and in particular 00:35:41.720 |
there's one called a conv learner, which is something that will create a convolutional neural network for you. 00:35:47.380 |
We'll be learning a lot about that over the next few lessons 00:35:50.460 |
But for now, just know that to create a learner for a convolutional neural network, you just have to tell it two things. The first is: 00:35:59.440 |
what's your data? And, not surprisingly, it takes a data bunch. And the second thing you need to tell it is: what's your architecture? 00:36:10.820 |
So, as we'll learn, there are lots of different ways of constructing a convolutional neural network, 00:36:17.020 |
but for now, the most important thing for you to know is that there's a particular kind of model called a ResNet, 00:36:25.780 |
which works well nearly all the time. And so, for a while at least, you really only need to be 00:36:31.940 |
choosing between two things, which is: what size ResNet do you want? 00:36:36.580 |
That's just basically how big is it and we'll learn all about the details of what that means 00:36:41.380 |
But there's one called ResNet34, and there's one called ResNet50. And so when we're getting started with something, 00:36:47.660 |
I'll pick a smaller one because it'll train faster 00:36:51.700 |
That's kind of it. That's as much as you need to know to be a pretty good practitioner about architectures for now, which is that there are two 00:36:58.420 |
Architectures or two variants of one architecture that work pretty well 00:37:02.180 |
Resnet 34 and resnet 50 start with a smaller one and see if it's good enough 00:37:07.180 |
So that is all the information we need to create a convolutional neural network learner 00:37:12.100 |
There's one other thing I'm going to give it though, which is a list of metrics 00:37:16.660 |
Metrics are literally just things that get printed out as it's training 00:37:20.280 |
So I'm saying I would like you to print out the error rate, please 00:37:24.780 |
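So creating the learner is one line; a sketch in fastai v1 syntax (where this factory function was called create_cnn; it was later renamed cnn_learner):

    learn = create_cnn(data, models.resnet34, metrics=error_rate)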
Now, you can see the first time I ran this on a newly installed box, it downloaded something. 00:37:39.980 |
Now what this means is that this particular model has actually already been trained 00:37:45.020 |
For a particular task and that particular task is that it was trained on looking at about one and a half million 00:37:51.260 |
Pictures of all kinds of different things a thousand different categories of things 00:37:55.380 |
using an image data set called ImageNet. And 00:37:59.580 |
So we can download those pre-trained weights so that we don't start with a model that knows nothing about anything 00:38:05.840 |
but we actually start with a model that knows how to recognize the thousand categories of things in ImageNet. 00:38:12.380 |
Now, I'm not sure, but I don't think all of these 37 categories of pet are in ImageNet, 00:38:19.580 |
but there were certainly some kinds of dog, and certainly some kinds of cat. 00:38:23.860 |
So this pre-trained model already knows quite a little bit about what pets look like, and it certainly knows quite a lot about 00:38:31.060 |
what animals look like, and what photos look like. So the idea is that we don't start with a model that knows nothing, 00:38:38.700 |
but we start by downloading a model that knows something about recognizing images already. 00:38:44.420 |
So it downloads for us, automatically, the first time we use it, a pre-trained model; and then from now on, it won't need to download it again. 00:38:54.100 |
This is really important. We're going to learn a lot about this 00:38:57.260 |
It's kind of the focus of the whole course, which is how to do this is called transfer learning 00:39:02.260 |
how to take a model that already knows how to do something pretty well and 00:39:06.860 |
Make it so that it can do your thing really well 00:39:10.080 |
We take a pre-trained model, and then we fit it so that instead of predicting the thousand categories of ImageNet, with the ImageNet 00:39:18.320 |
data, it predicts the 37 categories of pets, using your pet data. 00:39:23.500 |
And it turns out that by doing this you can train models in 00:39:34.020 |
one hundredth or less of the data of regular model training; in fact, potentially many thousands of times less. 00:39:40.820 |
Remember, I showed you the slide of Nikhil's lesson-one project from last year? He used 30 images. 00:39:47.220 |
And there aren't cricket and baseball images in ImageNet. 00:39:51.100 |
Right? But it just turns out that ImageNet's already so good at recognizing things in the world 00:39:55.920 |
that just 30 examples of people playing baseball and cricket was enough to build a nearly perfect classifier. 00:40:11.380 |
How do you know that it can actually recognize pictures of people playing cricket versus baseball in general? 00:40:22.780 |
Maybe it's just cheating, right? And that's called overfitting. We'll be talking a lot about that during this course, right? 00:40:29.340 |
But overfitting is where you don't learn to recognize pictures of say cricket versus baseball 00:40:34.620 |
But just these particular cricketers and these particular photos and these particular baseball players and these particular photos 00:40:40.940 |
We have to make sure that we don't overfit. 00:40:43.660 |
And so the way we do that is using something called a validation set a validation set is a set of images 00:40:50.360 |
That your model does not get to look at and so these metrics 00:40:54.760 |
like, in this case, error rate, get printed out automatically using the validation set: a set of images that our model never got to see. 00:41:05.920 |
When we created our data bunch, it automatically created a validation set for us. 00:41:09.840 |
Okay, and we'll learn lots of ways of creating and using validation sets 00:41:15.240 |
But because we try to bake in all of the best practices we actually make it nearly impossible 00:41:20.840 |
For you not to use a validation set because if you're not using a validation set, you don't know if you're overfitting 00:41:26.240 |
Okay, so we always print out the metrics on a validation set. We always hold it out 00:41:30.640 |
We always make sure that the model doesn't touch it. That's all done for you 00:41:34.560 |
Okay, and that's all built into this data bunch object 00:41:47.540 |
But in practice, you should nearly always use a method called fit_one_cycle. 00:41:52.220 |
We'll learn more about this during the course, but in short, one-cycle learning comes from a paper that was released, 00:42:00.280 |
I'm trying to think, a few months ago? Less than a year ago, 00:42:06.360 |
and it turned out to be dramatically better, both more accurate and faster, than any previous approach. 00:42:12.080 |
So again, I don't want to teach you how to do 2017 deep learning, right? In 2018, 00:42:19.660 |
the best way to fit models is to use something called one cycle. We'll learn all about it; 00:42:24.020 |
but for now, just know you should probably type learn.fit_one_cycle, right? 00:42:28.720 |
If you forget how to type it you can start typing a few letters and hit tab 00:42:34.620 |
Okay, and you'll get a list of potential options 00:42:38.420 |
All right, and then if you forget what to pass it you can press shift tab 00:42:44.280 |
and it'll show you exactly what to pass, so you don't actually have to type help. And again, 00:42:49.060 |
this is kind of nice, that we have all the types here, because we can see: cycle length 00:42:52.720 |
(we'll learn more about what that is shortly) is an integer; and then max learning rate: 00:42:56.800 |
it could either be a float or a collection or whatever; and you can see that the momentums will default to this couple of numbers; and so forth. 00:43:07.600 |
For now just know that this number four basically decides how many times do we go through the entire data set? 00:43:15.840 |
How many times do we show the data set to the model so that it can learn from it? Each time 00:43:20.360 |
it sees a picture, it's going to get a little bit better. 00:43:25.280 |
But it also means it could overfit: if it sees the same picture too many times, 00:43:29.400 |
it'll just learn to recognize that picture, not pets in general. 00:43:37.160 |
We'll learn all about how to tune this number during the next couple of lessons, 00:43:43.200 |
but starting out with four is a pretty good start, just to see how it goes. And you can actually see, after four epochs, we've got an error rate of about six percent. 00:43:57.840 |
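So the whole training step is one line, roughly:

    learn.fit_one_cycle(4)  # go through the entire data set four times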
So a natural question is: how long did that take? That took a minute and 56 seconds. 00:44:11.780 |
I mean, we actually pay for the whole time that it's on and running, so call it two minutes of compute time, 00:44:19.200 |
and 94 percent of the time we correctly picked the exact right one 00:44:23.700 |
of those 37 dog and cat breeds, which feels pretty good to me. 00:44:29.060 |
But to get a sense of how good it is maybe we should go back and look at the paper 00:44:34.100 |
Just remember I said the nice thing about using academic papers or Kaggle data sets is we can compare 00:44:40.120 |
our solution to whatever the best people in Kaggle did or whatever the 00:44:45.380 |
academics did so this particular data set of pet breeds is from 2012 and 00:44:51.060 |
If I scroll through the paper you'll generally find in any academic paper 00:44:56.300 |
There'll be a section called experiments about two-thirds of the way through and if you find the section on experiments 00:45:02.060 |
Then you can find the section on accuracy, and they've got lots of different models 00:45:07.500 |
And their models as you'll read about in the paper are extremely kind of pet specific 00:45:13.380 |
They learn something about how pet heads look and how pet bodies look and pet images in general look 00:45:18.900 |
They combine them all together and once they use all of this 00:45:21.860 |
complex code and math, they got an accuracy of 59%. 00:45:31.420 |
So this highly pet-specific analysis got an accuracy of 59%, and these were the top researchers from Oxford University. 00:45:44.620 |
Basically, if you go back and look at actually how much code we just wrote it's about three lines of code 00:45:49.700 |
The other stuff is just printing out things to see what we're doing we got 00:45:53.580 |
94%, so 6% error. So that gives you a sense of 00:46:00.020 |
You know how far we've come with deep learning and particularly with pytorch and fast AI how easy things are 00:46:09.460 |
Before we take a break, I just want to check to see if we've got any questions. 00:46:12.860 |
And just remember if you're in the audience and you see a question that you want asked 00:46:18.580 |
Please click the love heart next to it so that Rachel knows that you want to hear about it also 00:46:23.820 |
if there is something with six likes and Rachel didn't notice it (which is quite possible), just quote it in a reply and say: 00:46:31.900 |
hey, @Rachel, this one's got six likes. Okay, so what we're going to do is we're going to take an 00:46:38.260 |
eight-minute break, so we'll come back at five past eight. 00:46:42.900 |
So where we got to was we just we just trained a model 00:46:48.620 |
We don't exactly know what that involved or how it happened 00:46:51.340 |
but we do know that with three or four lines of code we built something which 00:46:56.540 |
smashed the accuracy of the state-of-the-art of 00:46:59.780 |
2012. 6% error certainly sounds pretty impressive for something that can recognize different dog breeds and cat breeds. 00:47:06.500 |
But we don't really know why it works; but we will. That's okay, right? And 00:47:15.300 |
In terms of getting the most out of this course 00:47:21.460 |
we very, very regularly hear, after the course is finished, the same basic feedback, 00:47:26.820 |
and this is literally copied and pasted from the forum: 00:47:30.500 |
I fell into the habit of watching the lectures too much and googling too much about concepts without running the code 00:47:37.660 |
At first I thought I should just read it and then research the theory. 00:47:41.540 |
And we keep hearing people saying: my number one regret is, I just spent 00:47:47.980 |
70 hours doing that, and at the very end I started running the code, and, oh, it turned out I 00:47:59.900 |
should have spent the majority of my time on the actual code in the notebooks: running it, seeing what goes in, and 00:48:06.220 |
seeing what comes out. So your most important skills to practice are (and we're going to show you how to do this, 00:48:13.700 |
in a lot more detail) understanding what goes in, and what comes out. 00:48:19.700 |
So we've already seen an example of looking at what goes in 00:48:23.260 |
which is data.show_batch, and that's going to show you examples of the images and labels. 00:48:31.540 |
So next we're going to be seeing how to look at what came out 00:48:35.060 |
All right, so that's the most important thing to study 00:48:40.740 |
The reason we've been able to do this so quickly is heavily because of the fast AI library now fast AI library is pretty new 00:48:47.620 |
but it's already getting an extraordinary amount of traction. As you've seen, all of the major cloud 00:48:53.260 |
providers either support it or are about to support it, and a lot of researchers are starting to use it. It's 00:49:00.260 |
making a lot of things a lot easier, but it's also making new things possible. And so, 00:49:09.140 |
Really understanding the fast AI software is something which is going to take you a long way 00:49:13.340 |
And the best way to really understand the fastai software well is by using the fastai 00:49:18.340 |
Documentation and we'll be learning more about the fast AI documentation shortly 00:49:23.380 |
So how does it compare I mean there's really only one major other piece of software like fast AI 00:49:31.140 |
That is something that tries to make deep learning 00:49:34.380 |
easy to use, and that's Keras. Keras is a really terrific piece of software; 00:49:39.020 |
We actually used it for the previous courses until we switched to fast AI 00:49:46.820 |
It was kind of the gold standard for making deep learning easy to use before, but life is much easier with fastai. 00:49:53.100 |
So if you look, for example, at the last year's course: 00:50:01.700 |
fastai lets you get much more accurate results, less than half the error, on a validation set; 00:50:13.660 |
the lines of code are about a sixth of the lines of code. And the lines of code are 00:50:19.060 |
More important than you might realize because those 31 lines of Keras code involve you making a lot of decisions 00:50:27.580 |
Setting lots of parameters doing lots of configuration. So that's all stuff where you have to know 00:50:32.700 |
how to set those things to get kind of best-practice results. Whereas with these five lines of code, 00:50:38.160 |
any time we know what to do for you, we do it for you; any time we can pick a good default, we pick it for you. 00:50:48.260 |
So hopefully you'll find it a really great library, not just for learning deep learning, but for taking it a very long way. How far can you take it? 00:50:54.980 |
Well, as you'll see all of the research that we do at fast AI 00:50:58.460 |
uses the library. And an example of the research we did, which was recently featured in Wired, is a 00:51:07.380 |
breakthrough in natural language processing, which people are calling the ImageNet moment of NLP, 00:51:12.980 |
which is basically: we broke the state-of-the-art result in text classification, 00:51:18.420 |
which OpenAI then built on top of our paper, with more compute and more data and some different tasks, to take it even further. 00:51:27.380 |
So this is an example of something that we've done in the last six months, in conjunction actually with my colleague Sebastian Ruder; an 00:51:34.180 |
example of something that's built into the fastai library. And you're going to learn how to use this brand-new model in 00:51:43.420 |
Three lessons time and you're actually going to get this exact result from this exact paper yourself 00:51:56.620 |
Another example: one of our alumni, who you'll come across on the forum plenty because he's a great guy, very active, built a new system for natural language semantic code search, 00:52:06.180 |
where you can actually type in English sentences and find snippets of code that do the thing you asked for. And again, 00:52:13.300 |
it's being built with the fastai library, using the techniques you'll be learning in the next seven weeks. Is it in production? 00:52:18.780 |
Yeah, well, I think at this stage it's part of their experiments platform, so it's kind of pre-production, I guess. 00:52:26.220 |
And so the best place to learn about these things and get involved in these things is on the forums 00:52:34.260 |
where, as well as categories for each part of the course, there's also a general category for deep learning, where people talk about 00:52:41.700 |
Research papers applications, so on and so forth 00:52:49.620 |
We're kind of going to focus on a small number of lines of code to do a particular thing, which is image classification 00:52:55.940 |
and we're not learning much math or theory or whatever. Over these seven weeks, and then in part two, another seven weeks, 00:53:03.420 |
We're going to go deeper and deeper and deeper. And so where can that take you? I want to give you some examples 00:53:08.700 |
This is Sara Hooker. She did our first course a couple of years ago. 00:53:16.780 |
Her background was in economics; she didn't have a background in coding, math, or computer science. I think she started learning to code two years before she took our course. 00:53:24.540 |
She helped develop something at a nonprofit she started, called Delta Analytics: 00:53:32.820 |
They helped build this amazing system where they attached old mobile phones to trees in the Kenyan rainforests and 00:53:43.700 |
and then they used deep learning to figure out when there was a chainsaw being used, and then they had a system set up to 00:53:49.380 |
alert rangers to go out and stop illegal deforestation in the rainforest. 00:53:54.620 |
So that was something that she was doing while she was in the course as part of her kind of class projects 00:54:05.380 |
She then became a Google Brain researcher, which I guess is one of the top, if not the top, places to do deep learning research. 00:54:14.260 |
And now she is going to Africa to set up Google Brain's first deep learning AI research center in Africa. Now, 00:54:22.340 |
I will say, like, she worked her ass off; you know, she really, really invested in this course, 00:54:28.700 |
Not just doing all of the assignments but also going out and reading Ian Goodfellow's book and doing lots of other things 00:54:36.900 |
But this is an example of where somebody who has no computer science or math background at all 00:54:41.620 |
can now be one of the world's top deep learning researchers, doing very valuable work. 00:54:47.900 |
Another example from our most recent course Christine Payne 00:54:58.780 |
And you can find her post, and actually listen to her music samples: she actually built something to 00:55:07.360 |
automatically create chamber music compositions, which you can play and listen to online. 00:55:24.620 |
Now I will say she's not your average classical pianist 00:55:27.520 |
She's a classical pianist who also has a master's in medical research from Stanford, and studied neuroscience, and was a high-performance computing 00:55:34.380 |
expert at D. E. Shaw, and was valedictorian at Princeton. Anyway, she, you know, very annoying person, good at everything she does, 00:55:41.420 |
but, you know, I think it's really cool to see how a kind of a domain expert, in this case the domain of playing piano, 00:55:52.660 |
can come out the other end at, I guess, OpenAI, which would be, 00:55:55.440 |
you know, one of the three top research institutes: Google Brain or OpenAI would be two of them, probably along with DeepMind. 00:56:01.500 |
And interestingly, actually, one of our other students, or I should say alumni, of the course recently interviewed 00:56:09.340 |
her for a blog-post series he's doing on top AI researchers, 00:56:13.940 |
and she said one of the most important pieces of advice she got was from me. And she said the piece of advice was: pick one project, do it really well, and make it fantastic. That was the advice 00:56:27.700 |
she found the most useful, and we're going to be talking a lot about you doing projects and making them fantastic during this course. 00:56:35.560 |
Having said that, I don't really want you to go to OpenAI or Google Brain. 00:56:40.500 |
What I really want you to do is go back to your workplace or your passion project and apply these skills. For example, MIT 00:56:53.180 |
released a deep learning course, and they highlighted in their announcement for this deep learning course this medical imaging example. 00:57:09.140 |
And one of our alumni, Alex, who is a radiologist, said: you guys just showed a model overfitting. I can tell, because I'm a radiologist, and this is not 00:57:21.380 |
what this would look like; this is what it should look like; and, as a deep learning practitioner, this is how I know 00:57:26.180 |
this is what happened in your model. So Alex is combining his knowledge of radiology and his knowledge of deep learning: 00:57:35.900 |
he was able to diagnose the problems in MIT's model from just two images, very accurately. 00:57:40.300 |
And so this is actually what I want most of you to be doing is to take your domain 00:57:44.820 |
expertise and combine it with the deep learning 00:57:47.500 |
Practical aspects that you'll learn in this course and bring them together 00:57:51.140 |
like Alex is doing here and so a lot of radiologists have actually gone through this course now and 00:57:56.620 |
have built journal clubs and American College of Radiology practice groups; 00:58:02.660 |
there's a data science institute at the ACR now, and so forth. And Alex is one of the people who's providing a lot of leadership in this area. 00:58:10.060 |
I would love for you to do the same kind of thing that Alex is doing, which is to really bring 00:58:14.660 |
deep learning leadership into your industry, or into your social impact project, whatever it is that you're trying to do. 00:58:21.940 |
So another great example: this was Melissa Fabros, who was an English literature PhD, 00:58:27.900 |
who had studied, like, gendered language in English literature or something. 00:58:35.220 |
Rachel, in a previous job, taught her to code, I think, and then she came into the fast.ai course, and she helped 00:58:42.780 |
Kiva a micro lending social impact organization to build a system that can recognize 00:58:48.500 |
faces. Why is that necessary? Well, we're going to be talking a lot about this, but because most facial recognition systems 00:59:01.420 |
can only recognize white male faces effectively. In fact, 00:59:06.220 |
I think it was IBM's system that is, like, ninety-nine point eight percent accurate on white men, but 00:59:16.420 |
only 60% accurate, 65% accurate, on dark-skinned women. 00:59:22.060 |
So it's like, what is that, like 30 or 40 times worse 00:59:26.140 |
for black women versus white men? And this is really important, because for Kiva, 00:59:31.260 |
black women are, you know, perhaps the most common user base for their micro-lending platform. 00:59:38.940 |
So Melissa after taking our course and again working her ass off and being super intense in her study and her work 00:59:46.340 |
Won this one million dollar AI challenge for her work for Kiva 00:59:51.180 |
Karthik did our course and realized the thing he wanted to do wasn't at his company 00:59:58.940 |
It was something else which is to help blind people to understand the world around them. So he started a new startup 01:00:03.900 |
You can find it now; it's called Envision. You can download the app, you can point your phone at things, and it will tell you what it sees 01:00:11.420 |
And I actually talked to a blind lady about these kinds of apps the other day, and she confirmed to me 01:00:24.900 |
that this is a genuinely useful one. It's not a toy: the level that you can get to, with 01:00:30.380 |
the content that you're going to get over these seven weeks and with this software, 01:00:33.780 |
can get you right to the cutting edge in areas you might find surprising 01:00:38.220 |
For example, I helped a team of some of our students and some collaborators 01:00:45.060 |
on actually breaking the world record for training ImageNet. Remember, 01:00:50.300 |
I mentioned the ImageNet dataset; lots of people want to train on the ImageNet dataset 01:00:54.020 |
We smashed the world record for how quickly you can train it, using standard 01:00:59.660 |
cloud infrastructure, with a cost of about $40 of compute to train this model, 01:01:04.580 |
using, again, the fast.ai library and the techniques that we learn in this course 01:01:08.620 |
So it can really take you a long way. So don't be kind of put off by 01:01:13.700 |
what might seem pretty simple at first; we're going to get deeper and deeper 01:01:17.380 |
You can also use it for other kinds of passion projects 01:01:21.260 |
So, Helena Sarin: you should definitely check out her Twitter account, @glagolista 01:01:27.180 |
This art is basically a new style of art that she's developed 01:01:32.180 |
which combines her painting and drawing with generative adversarial models to create these extraordinary 01:01:40.340 |
results. And so I think this is super cool. She's not a professional artist; she is a professional software developer 01:01:47.780 |
But she just keeps on producing these beautiful results. Until now, 01:01:56.620 |
her art had not really been shown anywhere or discussed anywhere; now, 01:02:01.380 |
there have recently been some quite high-profile articles describing how she is creating a new form of art. Again, this has come out of the 01:02:11.500 |
course, where she developed these skills. Or, equally important, Brad Kenstler, who figured out how to make a picture of Kanye out of pictures 01:02:19.260 |
of Patrick Stewart's head. Also something you will learn to do if you wish to 01:02:25.100 |
This particular style this particular type of what's called style transfer was a really interesting tweak that allowed him to do some things 01:02:33.780 |
And this particular picture helped him to get a job as a deep learning specialist at AWS. So there you go 01:02:41.500 |
Another interesting example: another alumnus actually worked at Splunk as a software engineer 01:02:50.940 |
He designed an algorithm after, like, lesson 3, which basically turned out at Splunk to be fantastically good at identifying fraud 01:03:01.660 |
If you've seen Silicon Valley the HBO series the the hot dog not hot dog app 01:03:06.180 |
That's actually a real app you can download, and it was actually built by Tim Anglade as a fast.ai student project 01:03:12.680 |
So there's a lot of cool stuff that you can do 01:03:17.420 |
I'm like: yes, it was Emmy nominated. So I think we only have one Emmy-nominated fast.ai alumnus at this stage 01:03:31.340 |
The other thing, you know is is the forum threads can kind of turn into these really cool things 01:03:36.940 |
So Francisco, who's actually here in the audience: he's a really 01:03:39.940 |
boring McKinsey consultant like me; Francisco and I both have this shameful past that we were McKinsey consultants 01:03:49.180 |
He started this thread saying, like: oh, this stuff we've just been learning about, 01:03:53.900 |
building NLP in different languages; let's try and do lots of different languages 01:03:59.100 |
We started this thing called the language model zoo, and out of that there's now been an academic 01:04:05.620 |
competition won in Polish that led to an academic paper; there are 01:04:12.940 |
German state-of-the-art results; basically, students have been coming up with new state-of-the-art results across lots of different languages 01:04:19.500 |
And this is all entirely being done by students working together through the forum. So please, 01:04:29.340 |
don't be intimidated, because remember, a lot of, you know, 01:04:32.780 |
everybody you see on the forum: the vast majority of the posting comes from people who post all the damn time, right? 01:04:38.460 |
They've been doing this a lot and they do it a lot of the time, and so at first it can feel intimidating 01:04:43.580 |
Because it can feel like you're the only new person there 01:04:46.140 |
But you're not: all of you people in the audience, everybody who's watching, everybody who's listening, you're all new people 01:04:52.300 |
too. And so when you just get out there and say, like: 01:04:55.860 |
okay, all you people getting new state-of-the-art results in German language modeling, I 01:05:00.860 |
can't start my server, I try to click the notebook and I get an error, 01:05:08.780 |
okay, just make sure you provide all the information. This is the, you know: I'm using Paperspace, 01:05:13.740 |
this was the particular instance I tried to use, here's a screenshot of my error 01:05:18.020 |
People will help you. Okay, or if you've got something to add so if people were talking about 01:05:23.860 |
Crop yield analysis and you're a farmer and you think you know, oh I've got something to add 01:05:29.540 |
Please mention it, even if you're not sure it's exactly relevant. It's fine, you know, just get involved 01:05:36.840 |
And because remember everybody else from the forum started out 01:05:39.840 |
Also intimidated. All right, we all start out 01:05:43.560 |
Not knowing things and so just get out there and try it 01:05:58.560 |
There's a question from earlier about why you're using resnet as opposed to inception 01:06:09.360 |
So there are lots of architectures to choose from 01:06:26.160 |
For ImageNet classification, you'll see in first place, in second place, in third place and in fourth place fast.ai, 01:06:34.480 |
Jeremy Howard; fast.ai, Jeremy Howard; fast.ai with collaborators from the Department of Defense; 01:06:40.160 |
Google. ResNet, ResNet, ResNet, ResNet, ResNet. It's good enough 01:06:47.800 |
There are other architectures the main reason you might want a different architecture is if you want to do edge computing 01:06:57.880 |
So if you want to create a model that's going to sit on somebody's mobile phone 01:07:04.360 |
I reckon the best way to get a model onto somebody's mobile phone is to run it on your server, and 01:07:11.040 |
have the phone talk to it. It really makes life a lot easier and you get a lot more flexibility 01:07:14.300 |
But if you really do need to run something on a low-powered device, then there are some special architectures for that 01:07:19.560 |
So the particular question was about inception 01:07:24.200 |
That's another particular architecture which tends to be pretty memory intensive, and 01:07:33.160 |
so inception tends to be pretty memory intensive, but it's okay. It's also, like, 01:07:37.180 |
it's not terribly resilient. One of the things we try to show you is stuff which just tends to always work, 01:07:43.580 |
even if you don't quite tune everything perfectly 01:07:46.600 |
So resnet tends to work pretty well across a wide range of different 01:07:51.080 |
kinds of details around choices that you might make. So I think it's pretty good 01:07:58.720 |
So we've got this trained model, and what's actually happened, as we'll learn, is it's basically 01:08:03.120 |
Creating a set of weights if you've ever done anything like linear regression 01:08:07.840 |
Or logistic regression you'll be familiar with coefficients. We basically found some coefficients and parameters that work pretty well 01:08:16.640 |
So if we want to start doing some more playing around and come back later 01:08:20.240 |
We probably should save those weights so we can save that minute and 56 seconds 01:08:24.200 |
So you can just go learn.save and give it a name. It's going to put it 01:08:28.760 |
in a models subdirectory in the same place the data came from, so if you save different models or different data bunches from different 01:08:36.800 |
datasets, they'll all be kept separate. So don't worry about it 01:08:40.060 |
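In code, that looks something like the following (a minimal sketch following the lesson notebook; the name 'stage-1' is just an arbitrary label I'm assuming here):

    # Save the current weights; fastai writes them to a 'models' subdirectory
    # next to the data, e.g. .../models/stage-1.pth
    learn.save('stage-1')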
All right. So we've talked about how the most important things are what goes into your model and what comes out 01:08:47.440 |
We've seen one way of seeing what goes in now. Let's see what comes out 01:08:51.920 |
As this is the other thing you need to get really good at 01:08:54.200 |
so to see what comes out, we can use this class called ClassificationInterpretation 01:09:01.520 |
We're going to use this factory method from_learner: we pass in a learn object. So remember, a learn object knows two things: 01:09:11.280 |
what is your data, and what is your model. It's now not just an architecture; it's actually a trained model in there 01:09:16.440 |
and that's all the information we need to interpret that model. So we pass in the learner, 01:09:21.560 |
and we now have a ClassificationInterpretation object, and 01:09:25.160 |
so one of the things we can do, and perhaps the most useful thing to do, is called plot_top_losses 01:09:32.560 |
So we're going to be learning a lot about this idea of loss functions shortly 01:09:38.720 |
But in short, a loss function is something that tells you how good was your prediction. And so specifically that means: if you predicted 01:09:51.480 |
one class with great confidence, you said 'I am very, very sure that this is a particular breed', but 01:09:59.680 |
actually you were wrong, then that's going to have a high loss, because you were very confident about the wrong answer 01:10:06.720 |
Okay, so that's what it basically means to have a high loss. So by plotting the top losses, we're going to find out 01:10:12.420 |
what were the things that we were the most wrong about, or the most confident about yet got wrong 01:10:35.880 |
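In code, the whole interpretation step is just a couple of lines (a sketch assuming the fastai v1 API used in this course and the learn object trained above):

    # Build an interpretation object from the trained learner
    interp = ClassificationInterpretation.from_learner(learn)
    # Show the nine images with the highest loss, i.e. the most confidently wrong
    interp.plot_top_losses(9, figsize=(15, 11))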
So we've already seen help, right? And help just prints out a quick little summary, 01:10:41.800 |
but if you want to really see how to do something, use doc, and 01:10:47.120 |
doc tells you the same information as help, but it has this very important thing, which is 01:10:55.720 |
It pops up the documentation for that method or class or function or whatever 01:11:02.400 |
It starts out by showing us the same information about what the parameters are that it takes, 01:11:07.400 |
Along with the doc string, but then tells you more information 01:11:12.400 |
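For example (assuming the interp object from above; doc comes in with fastai's standard wildcard import):

    help(interp.plot_top_losses)   # quick plain-text summary in the notebook
    doc(interp.plot_top_losses)    # same summary, plus a link to the full docs page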
So in this case, it tells me that the title of each image shows the prediction, the actual, the loss, and the probability of the actual class 01:11:24.480 |
So for example, and you can see there's actually some code you can run 01:11:28.480 |
so the documentation always has working code and so in this case it was trying things with handwritten digits and 01:11:34.560 |
So the first one was predicted to be a 7; it was actually a 3. The loss was 01:11:42.000 |
5.44, and the probability of the actual class was low; 01:11:48.600 |
you know, we did not have a high probability associated with the actual class 01:11:52.840 |
I can see why it thought this was a 7; nonetheless, it was wrong. So this is the documentation 01:11:58.320 |
okay, and so this is your friend when you're trying to figure out how to use these things. The other thing I'll mention is, if you're a 01:12:06.080 |
somewhat experienced Python programmer, you'll find the source code of fast.ai really easy to read 01:12:11.160 |
We try to write everything in just a small number of lines, you know, 01:12:14.780 |
much less than half a screen of code, generally four or five lines of code. If you click source, 01:12:19.240 |
You can jump straight to the source code. Alright, so here is 01:12:23.640 |
The plot top losses and this is also a great way to find out 01:12:27.720 |
how to use the fast.ai library, because nearly every line of code here is calling stuff in the fast.ai library 01:12:36.040 |
Okay, so don't be afraid to look at the source code 01:12:42.240 |
I've got another really cool trick about the documentation that you're going to see a little bit later 01:12:48.400 |
So that's how we can look at these top losses and these are perhaps the most important image classification 01:12:54.900 |
Interpretation tool that we have because it lets us see 01:12:58.960 |
what are we getting wrong. And quite often, like in this case, 01:13:03.500 |
if you're a dog and cat expert, you'll realize that the things it's getting wrong are 01:13:09.240 |
breeds that are actually very difficult to tell apart, and you'd be able to look at these and say: oh, I can see why 01:13:21.040 |
Another useful tool, kind of, is to use something called a confusion matrix, which basically shows you, for every actual type of dog or cat, 01:13:31.080 |
how many times it was predicted to be each dog or cat. But unfortunately, in this case, because it's so accurate, 01:13:36.960 |
this diagonal basically says: oh, it's pretty much right all the time. And you can see there are some slightly darker ones, like a five here, 01:13:43.420 |
but it's really hard to read exactly what that combination is 01:13:46.280 |
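A minimal sketch of that call, under the same assumptions as above:

    # Rows are the actual classes, columns the predictions; with lots of classes
    # the diagonal dominates and the off-diagonal cells are hard to read
    interp.plot_confusion_matrix(figsize=(12, 12), dpi=60)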
So what I suggest you use instead, if you've got lots of classes: don't use a confusion matrix, 01:13:52.520 |
but use this, my favorite named function in fast.ai; I'm very proud of it. You can call most_confused 01:13:59.080 |
and most_confused will simply grab, out of the confusion matrix, the particular 01:14:05.600 |
combinations of predicted and actual that it got wrong the most often 01:14:09.440 |
So in this case, the Staffordshire bull terrier was what it should have predicted, and instead it predicted an American pit bull terrier, 01:14:17.680 |
and so forth: it should have predicted a Siamese and actually predicted a Birman; that happened four times 01:14:21.840 |
This particular combination happened six times 01:14:24.040 |
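A sketch of that call (min_val=2 is an assumed threshold; it skips combinations that only happened once):

    # (actual, predicted, count) pairs the model got wrong most often
    interp.most_confused(min_val=2)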
So this is again a very useful thing because you can look and you can say like with my domain expertise 01:14:29.280 |
Does it make sense that that would be something that was confused about? 01:14:33.280 |
So these are some of the kinds of tools you can use to look at the output 01:14:40.280 |
So how do we make the model better? We can make it better using fine-tuning 01:14:45.340 |
So far we've fitted for 4 epochs, and it ran pretty quickly 01:14:50.760 |
And the reason it ran pretty quickly is that there was a little trick we use these deep learning models these convolutional networks 01:14:57.320 |
They have many layers; we'll learn a lot about exactly what layers are, but for now just know it goes through a lot of computation 01:15:05.280 |
What we did was we added a few extra layers to the end 01:15:10.060 |
And we only trained those we basically left most of the model exactly as it was so that's really fast 01:15:16.200 |
And if we try to build a model of something that's similar to the original 01:15:21.080 |
Pre-trained model so in this case similar to the image net data that works pretty well 01:15:26.800 |
But what we really want to do is actually go back and train the whole model 01:15:31.280 |
So this is why we pretty much always use this two-stage process. So by default, 01:15:36.520 |
when we call fit or fit_one_cycle on a CNN learner, 01:15:42.480 |
it'll just fine-tune these few extra layers added to the end, and it will run very fast; it'll basically never overfit 01:15:53.040 |
But to get a really good model, you then have to call unfreeze, and unfreeze is the thing that says: please train the whole model 01:16:12.160 |
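As a sketch, that second stage is just this (and, as we're about to see, doing it naively can make things worse):

    learn.unfreeze()        # allow every layer to train, not just the added head
    learn.fit_one_cycle(1)  # naively trains the whole model at one rate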
We're actually going to have to learn more about exactly what's going on behind the scenes 01:16:18.280 |
So let's start out by trying to get an intuitive understanding of what's going on behind the scenes and again 01:16:26.640 |
We're going to start with this picture. These pictures come from a fantastic paper by Matt Zeiler, who nowadays is CEO of Clarifai, 01:16:35.160 |
which is a very successful computer vision start-up, and his supervisor Rob Fergus 01:16:42.160 |
And they created a paper showing how you can visualize the layers of a convolutional neural network 01:16:48.600 |
So, a convolutional neural network: we'll learn mathematically what the layers do shortly 01:16:53.120 |
But the basic idea is that your red green and blue pixel values that are numbers from 0 to 255 go into a simple computation 01:17:00.240 |
The first layer and something comes out of that and then the result of that goes into a second layer 01:17:08.480 |
There can be up to a thousand layers of a neural network 01:17:15.760 |
ResNet 34 has 34 layers ResNet 50 has 50 layers 01:17:19.880 |
But let's look at layer one. There's this very simple computation. It's it's a convolution if you know what they are 01:17:29.000 |
What comes out of this first layer? Well, we can actually visualize these specific coefficients the specific parameters by drawing them as a picture 01:17:37.320 |
There's actually a few dozen of them in the first layer, so we won't draw all of them 01:17:42.480 |
But let's just look at nine at random. So here are nine examples of the actual 01:17:47.280 |
coefficients from the first layer and so these operate on 01:17:51.240 |
groups of pixels that are next to each other and 01:17:54.160 |
So this first one basically finds groups of pixels that have a little diagonal line in this direction 01:17:59.720 |
This one finds diagonal lines in the other direction; this one finds gradients that go from yellow to blue in this direction 01:18:06.360 |
This one finds gradients that go from pink to green in this direction and so forth 01:18:15.480 |
That's layer one of an ImageNet pre-trained convolutional neural net. Layer two 01:18:23.000 |
takes the results of those filters and does a second layer of computation, and it allows it to create... so here are nine examples of 01:18:31.160 |
a way of visualizing one of the second layer features, and you can see it's basically learned to create something that looks for corners 01:18:42.840 |
This one has learned to find things that find right-hand curves 01:18:46.800 |
This one is going to find things that find little circles 01:18:49.840 |
So you can see how layer two works; this is the easiest way to see it: in layer one, 01:18:54.840 |
we have things that can find just one line; in layer two, 01:18:57.760 |
we can find things that have two lines joined up, or one line repeated 01:19:04.040 |
These nine show you nine examples of actual bits of actual photos that activated this filter a lot 01:19:12.760 |
In other words, this math function here was good at finding these kinds of window corners and stuff like that 01:19:19.200 |
This little circular one was very good at finding bits of photos that had circles 01:19:24.200 |
Okay, so this is the kind of stuff you've got to get a really good intuitive understanding for. It's like: 01:19:29.120 |
the start of my neural net is going to find simple, very simple, gradients and lines; 01:19:34.000 |
The second layer can find very simple shapes the third layer can find combinations of those 01:19:42.120 |
repeating patterns of two-dimensional objects, or we can find kinds of things where lines join together 01:19:47.760 |
Or we can find well, what are these things? Well, let's find out. What is this? 01:19:53.560 |
Let's go and have a look at some bits of picture that activated this one highly. Oh 01:19:59.040 |
mainly they're bits of text, although sometimes windows. So it seems to be able to find kind of repeated 01:20:05.280 |
horizontal patterns. And this one here seems to find kind of... 01:20:13.200 |
This one here is kind of finding geometric patterns 01:20:16.360 |
So layer 3 was able to take all the stuff from layer 2 and combine them together 01:20:21.080 |
Layer 4 can take all the stuff from layer 3 and combine them together by layer 4 01:20:27.320 |
we've got something that can find dog faces and 01:20:33.680 |
Yeah various kinds of oh here we are bird legs 01:20:38.400 |
So you kind of get the idea. And so by layer 5, we've got something that can find the eyeballs of birds and lizards, or 01:20:45.880 |
faces of particular breeds of dogs, and so forth. So you can see how, by the time you get to layer 34, you can recognize 01:20:56.800 |
specific dog breeds and cat breeds, right? This is kind of how it works. So 01:21:03.280 |
when we first trained, when we first fine-tuned that pre-trained model, 01:21:06.640 |
We kept all of these layers that you've seen so far and we just trained a few more layers on top of all of those 01:21:13.160 |
Sophisticated features that are already being created. Alright, and so now we're fine-tuning 01:21:17.320 |
we're going back and saying: let's change all of these; we'll start with them where they are, 01:21:22.960 |
Right, but let's see if we can make them better 01:21:25.560 |
Now it seems very unlikely that we can make these layer one features 01:21:32.080 |
better. Like, it's very unlikely that the kind of definition of a diagonal line 01:21:36.560 |
is going to be different when we look at dog and cat breeds versus the ImageNet data that this was originally trained on 01:21:42.720 |
So we don't really want to change layer one very much if at all 01:21:46.720 |
Whereas the last layers, you know, this thing of like types of dog face: 01:21:52.880 |
Seems very likely that we do want to change that, right? 01:21:55.920 |
So you kind of want this intuition, this understanding, that the different layers of a neural network represent different levels of semantic complexity 01:22:08.840 |
So the reason our attempt to fine-tune this model didn't work so well is because, 01:22:13.560 |
by default, it trains all the layers at the same speed 01:22:18.600 |
Right, which is to say it'll update those things representing diagonal lines and gradients 01:22:23.000 |
just as much as it tries to update the things that represent the exact specifics of what an eyeball looks like. And that doesn't make sense, so 01:22:32.480 |
to change it, we first of all need to go back to where we were before. Okay, we just broke this model, right? 01:22:40.200 |
So if we just go learn.load, this brings back the model that we saved earlier. Remember, we saved it? So we'll 01:22:52.440 |
load that back up. So that's now our model back to where it was before we killed it. And now let's run the 01:23:00.120 |
learning rate finder; we'll learn about what that is next week, 01:23:03.000 |
but for now, just know this is the thing that figures out what is the fastest I can train this neural network at, without 01:23:11.640 |
making it zip off the rails and get blown apart. Okay, so we can call learn.lr_find, and 01:23:17.560 |
then we can go learn.recorder.plot, and that will plot the result of our LR finder 01:23:23.120 |
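A sketch of those steps (assuming 'stage-1' is the name we saved under earlier):

    learn.load('stage-1')   # back to the weights we saved before unfreezing
    learn.lr_find()         # sweep increasing learning rates over mini-batches
    learn.recorder.plot()   # plot loss vs. learning rate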
and what this basically shows you is this key parameter that we're going to learn all about: the learning rate. And the 01:23:29.240 |
Learning rate basically says how quickly am I updating the parameters in my model? 01:23:34.280 |
and you can see what happens: I think this bottom one here shows me what happens as I increase the learning rate, and 01:23:43.200 |
this one here shows, you know, what's the result, what's the loss 01:23:46.640 |
And so you can see, once the learning rate gets past ten to the negative four, my loss gets worse 01:23:54.680 |
It actually so happens, in fact I can check this if I press shift-tab here, that my learning rate defaults to 01:24:02.120 |
0.003. So my default learning rate is about here 01:24:06.760 |
So you can see why our loss got worse, right? Because we're trying to fine-tune things now 01:24:13.440 |
So, based on the learning rate finder, I tried to pick something, you know, well before it started getting worse 01:24:21.720 |
So I decided to pick 1e-6. So I decided I'm going to train at that rate 01:24:28.040 |
But there's no point training all the layers at that rate, because we know that the later layers worked just fine 01:24:35.640 |
before, when we were training much more quickly, at the default, which was about 1e-3 01:24:44.480 |
So what we can actually do is we can pass a range of learning rates to learn.fit, and we do it like this 01:24:51.400 |
You use this keyword, which in Python you may have come across before, called slice, and that can take a 01:24:58.640 |
start value and a stop value, and basically what this says is: train the very first layers at a rate of 1e-6, and 01:25:07.920 |
the very last layers at a rate of 1e-4, and then kind of distribute all the other layers 01:25:14.480 |
across that range, you know, between those two values, equally 01:25:19.200 |
So we're going to see that in a lot more detail. Basically for now 01:25:23.440 |
this is kind of a good rule of thumb to say: after you unfreeze, 01:25:30.640 |
So this is the thing that's going to train the whole thing 01:25:33.960 |
pass a max learning rate parameter, pass it a slice, 01:25:38.120 |
Make the second part of that slice about ten times smaller than your first stage 01:25:44.600 |
So our first stage defaulted to about 1e-3, 01:25:47.300 |
so let's use about 1e-4; and then this one should be a value from your learning rate finder 01:25:53.520 |
which is well before things started getting worse. And you can see things are starting to get worse 01:25:58.000 |
maybe about here, so I picked something that's at least ten times smaller than that 01:26:13.600 |
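Putting that rule of thumb into code (a sketch; the exact numbers come from reading the LR finder plot as just described):

    learn.unfreeze()
    # Discriminative learning rates: earliest layers at 1e-6, last layers at 1e-4,
    # with the layers in between spread across that range
    learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))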
Yeah, a bit better, right? So we've gone down from a 6.1 percent to a 5.7 percent error rate 01:26:19.360 |
So that's about a 10 percent relative improvement with another 58 seconds of training. So I 01:26:26.200 |
Would perhaps say for most people most of the time these two stages are enough to get 01:26:36.000 |
a pretty good result. You won't win a Kaggle competition, particularly because now a lot of fast.ai alumni are competing on Kaggle, and this is the first thing they do 01:26:45.360 |
But, you know, in practice you'll get something that's about as good as the vast majority of practitioners can achieve 01:26:53.280 |
We can improve it by using more layers, and we'll do this next week, by basically doing a ResNet50 instead of a ResNet34 01:27:04.040 |
And you can try running this during the week if you want to you'll see it's exactly the same as before 01:27:11.880 |
What you'll find is it's very likely if you try to do this, you will get an error 01:27:17.600 |
And the error will be your GPU is run out of memory 01:27:21.240 |
and the reason for that is that resnet 50 is bigger than resnet 34 and 01:27:26.000 |
Therefore it has more parameters and therefore it uses more of your graphics cards memory 01:27:30.800 |
Just totally separate to your normal computer RAM. This is GPU RAM 01:27:34.640 |
if you're using the kind of default Salamander, AWS, 01:27:40.360 |
and so forth suggestion, then you'll be having 16 gig of 01:27:46.120 |
GPU memory. The card I use most of the time has 11 gig of GPU memory; 01:27:55.440 |
that's kind of the main range you tend to get. If yours has less than 8 gig of GPU memory, 01:28:04.960 |
it's very likely that when you try to run this you'll get an out-of-memory error 01:28:09.160 |
And that's because it's just trying to do too much too many parameter updates for the amount of RAM you have 01:28:22.560 |
There's a parameter called bs, for batch size, and this basically says: how many images do you train at one time? 01:28:22.560 |
If you run out of memory, just make it smaller. Okay, so this worked for me on an 11 gig card 01:28:34.960 |
It probably won't work for you if you've got an 8 gig card if you do just make that 32 01:28:39.640 |
It's fine to use a smaller batch size it just it might take a little bit longer 01:28:45.680 |
That's all okay. If you've got a bigger like a 16 gig you might be able to get away with 64 01:28:51.360 |
Okay, so that's just one number you'll need to try during the week 01:28:58.600 |
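As a sketch, the ResNet50 version looks like this (path_img, fnames and pat are assumed to be defined earlier in the notebook; bs=48 is an assumed value for a roughly 11 gig card, so halve it if you hit an out-of-memory error):

    data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                       ds_tfms=get_transforms(), size=299, bs=48
                                      ).normalize(imagenet_stats)
    learn = create_cnn(data, models.resnet50, metrics=error_rate)
    learn.fit_one_cycle(8)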
We get down to a 4.4 percent error rate 01:29:01.800 |
So this is pretty extraordinary. You know I was pretty surprised because I mean 01:29:07.400 |
When we did it in the first course, just cats versus dogs, we were kind of getting 01:29:13.960 |
Somewhere around a three percent error for something where you've got a 50% chance of being right and the two things look totally different 01:29:21.000 |
so the fact that we can get a 4.4 percent error for something with such fine-grained distinctions is quite amazing 01:29:29.080 |
In this case I unfroze and fitted a little bit more, and went from 4.4 to 4.35 01:29:37.280 |
Basically, ResNet50 is already a pretty good model 01:29:39.480 |
It's interesting because, again, you can call most_confused here, and you can see the kinds of things that it's 01:29:49.200 |
getting wrong. And depending on when you run it, you're going to get slightly different numbers, but you'll get roughly the same kinds of things 01:29:58.040 |
So quite often I find that Ragdoll and Birman are things that it gets confused about 01:30:02.360 |
And I actually have never heard of either of those things so I actually looked them up on the internet 01:30:10.560 |
found a page on a cat site called 'Is this a Birman or Ragdoll?', and there was a long thread of cat experts, like, 01:30:19.680 |
Arguing intensely about which it is so I feel fine that my computer had problems 01:30:29.400 |
I found something similar, I think it was pit bull versus Staffordshire bull terrier; apparently the main difference is the particular kennel club 01:30:36.640 |
guidelines as to how they are assessed, but some people think that one of them might have a slightly redder nose 01:30:42.040 |
So this is the kind of stuff where actually even if you're not a domain expert 01:30:47.200 |
It helps you become one right because I now know 01:30:50.260 |
More about which kinds of pet breeds are hard to identify than I used to 01:30:56.120 |
So model interpretation works both ways. So what I want you to do this week is to run 01:31:02.080 |
This notebook, you know, make sure you can get through it 01:31:06.280 |
but then what I really want you to do is to get your own image dataset. And actually, 01:31:13.060 |
Francisco, who I mentioned earlier, he started the language model zoo thread, and he's, you know, 01:31:18.600 |
now helping to TA the course. He's actually putting together a guide that will show you how to download data 01:31:25.760 |
from Google Images so you can create your own dataset to play with. But before that, I want to show you 01:31:37.880 |
how to create labels in lots of different ways, because your dataset, wherever you get it from, won't necessarily 01:31:44.680 |
Be that kind of regex based approach. It could be in lots of different formats 01:31:50.600 |
So, just to show you how to do this, I'm going to use the MNIST sample. MNIST is pictures of hand-drawn numbers, 01:31:57.220 |
just because I want to show you different ways of labeling things. The MNIST sample 01:32:10.160 |
basically looks like this, so I can go path.ls 01:32:16.280 |
And you can see it's got a training set and a validation set already 01:32:20.000 |
So basically the people that put together this data set have already decided what they want you to use as a validation set 01:32:31.000 |
If you look inside the training set, you'll see there's a folder called three and a folder called seven 01:32:35.560 |
That's just a really, really common way to give things labels. It's basically to say: everything that's a three, 01:32:41.840 |
I'll put in a folder called three; everything that's a seven, I'll put in a folder called seven 01:32:45.880 |
This is often called an ImageNet-style dataset, because this is how ImageNet is distributed 01:32:52.240 |
So if you have something in this format, where the labels are just whatever the folders are called, you can say from_folder 01:33:00.200 |
Okay, and that will create an ImageDataBunch for you. And as you can see, three, seven: 01:33:06.360 |
it's created the labels just by using the folder names 01:33:12.040 |
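As a sketch (following the lesson notebook; MNIST_SAMPLE ships with fastai's datasets):

    from fastai.vision import *

    path = untar_data(URLs.MNIST_SAMPLE)    # has train/3, train/7, valid/3, valid/7
    tfms = get_transforms(do_flip=False)    # don't flip digits: a mirrored 3 isn't a 3
    data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=26)
    data.classes                            # ['3', '7'], taken from the folder names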
And as you can see, we can train on that and get 99.55 percent accuracy, blah blah blah 01:33:16.960 |
Another possibility, and for this MNIST sample I've got both: it might come with a CSV file 01:33:22.520 |
that would look something like this: for each file name, what's its label. Now, in this case the labels are not three or seven; 01:33:29.580 |
they're zero or one, which is basically: is it a seven or not? 01:33:33.120 |
All right, so that's another possibility. So if this is how your labels are, you can use from_csv 01:33:39.400 |
And if it's called labels.csv, you don't even have to pass in a file name; if it's called anything else, you can pass in the name of the CSV 01:33:47.480 |
Okay, so that's how you can use a CSV. Okay, there it is: this is now 'is it a seven or not?' 01:33:53.640 |
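A sketch of the CSV version, under the same assumptions:

    # labels.csv maps each file name to a 0/1 label ('is it a seven?')
    data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)
    # for a differently named file, pass e.g. csv_labels='my_labels.csv'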
And then you can call data.classes to see what it found. Another possibility, as we've seen, is you've got 01:34:04.680 |
labels in the file names or paths. And so in this case, this is the same thing; these are the file paths, and we can pull out 01:34:10.400 |
the label by using a regular expression. And so here's the regular expression 01:34:15.840 |
So we've already seen that approach, and again, you can see data.classes has found it 01:34:20.280 |
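A sketch of the regex version (df is assumed to be the dataframe loaded from the CSV above):

    fn_paths = [path/name for name in df['name']]  # build the list of file paths
    pat = r"/(\d)/\d+\.png$"                       # the captured group is the label
    data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)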
So what if it's something that's in the file name or path, but it's not just a regular expression; it's more complex? 01:34:27.440 |
You can create an arbitrary function that extracts a label from the file name or path 01:34:33.680 |
And in that case, you would say from_name_func 01:34:42.600 |
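A sketch, reusing the fn_paths list from above:

    # Any function from file path to label works here
    data = ImageDataBunch.from_name_func(path, fn_paths, ds_tfms=tfms, size=24,
            label_func=lambda x: '3' if '/3/' in str(x) else '7')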
What if you need something even more flexible than that? 01:34:46.440 |
Then you're going to write some code to create an array of labels, and in that case you can just use 01:34:52.360 |
from_lists. So here I've created an array of labels, and my labels go into from_lists 01:34:58.240 |
Okay, and then I just pass in that array. So you can see there are lots of different ways of creating labels 01:35:07.880 |
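A sketch of that last approach:

    # Build the label array yourself, then hand it over directly
    labels = [('3' if '/3/' in str(x) else '7') for x in fn_paths]
    data = ImageDataBunch.from_lists(path, fn_paths, labels=labels, ds_tfms=tfms, size=24)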
So during the week, try a few of these. How would you know to do all these things? Like, where am I going to find 01:35:11.160 |
this kind of information, right? How do you possibly know to do all this stuff? 01:35:16.920 |
So I'll show you something incredibly cool. Let's grab this function and 01:35:21.840 |
Do you remember to get documentation we type doc? 01:35:25.960 |
And here is the documentation for the function, and I can click 'Show in docs' and 01:35:39.560 |
it takes me straight to the documentation. And every single line of code I just showed you, I took it this morning and copied and pasted it from the documentation 01:35:51.000 |
So you can see the exact code that I just used. So the documentation for fast.ai doesn't just tell you 01:35:56.600 |
what to do, but step by step how to do it 01:36:01.320 |
Here is perhaps the coolest bit: if you go to the fastai 01:36:07.760 |
fastai_docs repository and click on docs_src, 01:36:15.720 |
it turns out that all of our documentation is actually just Jupyter notebooks. So in this case, I was looking at vision.data 01:36:25.120 |
So here is the vision.data notebook. You can download this repo, you can git clone it, and 01:36:37.040 |
run every single line of the documentation yourself 01:36:40.160 |
Okay, so all of our docs is also code. And so, like, this is kind of the ultimate example to me of this: 01:36:54.480 |
You can now experiment, and you'll see that in GitHub 01:36:59.600 |
It doesn't quite render properly because github doesn't quite know how to render notebooks properly 01:37:04.120 |
But if you git clone this and open it up in Jupyter, 01:37:09.240 |
Anything that you read about in the documentation 01:37:11.560 |
Really everything in the documentation has actual working examples in it with actual data sets that are already sitting in there in the repo 01:37:17.920 |
For you and so you can actually try every single function in your browser 01:37:23.200 |
Try seeing what goes in and try seeing what comes out 01:37:30.240 |
Will the library use multi GPU and parallel by default? 01:37:33.920 |
The library will use multiple CPUs by default, but just one GPU by default 01:37:40.160 |
We probably won't be looking at multi-GPU until part two. It's easy to do and you'll find it on the forum, but 01:37:49.760 |
the second question is whether the library can use 3D data, such as MRI scans 01:38:00.880 |
And there is actually a forum thread about that already 01:38:03.960 |
Although that's not as developed as 2d yet, but maybe by the time the MOOC is out it will be 01:38:08.960 |
So before I wrap up, I'll just show you an example of the kind of interesting stuff that you can do by thinking creatively 01:38:20.160 |
Remember earlier I mentioned that one of our alums, who works at Splunk, built a fraud detector? 01:38:31.800 |
This is actually how he created it as part of a fast AI part one class project 01:38:38.120 |
He took the telemetry of users who had Splunk analytics installed and watched their mouse movements 01:38:45.840 |
And he created pictures of the mouse movements. He converted speed into 01:38:50.120 |
color, and right and left clicks into splodges 01:38:55.000 |
he then took the exact code that we saw with an earlier version of the software and 01:39:00.000 |
Trained a CNN in exactly the way we saw and used that to train his fraud model 01:39:06.240 |
So he basically took something which is not obviously a picture and he turned it into a picture 01:39:11.480 |
And got these fantastically good results for a piece of fraud analysis software. So it pays to think 01:39:20.160 |
creatively. So if you're wanting to study sounds, a lot of people that study sounds do it by actually creating a spectrogram image and 01:39:27.280 |
then sticking that into a convnet. So there's a lot of cool stuff you can do with this 01:39:33.120 |
Get your GPU going, try and use your first notebook 01:39:36.520 |
Make sure that you can use lesson one and work through it and then see if you can repeat the process on your own data 01:39:43.640 |
Set get on the forum and tell us any little success you had. It's like, oh, I spent three days trying to get my GPU running 01:39:55.640 |
You know try it for an hour or two, but if you get stuck, please ask 01:39:59.320 |
And if you're able to successfully build a model with a new data set