
Lesson 1: Deep Learning 2019 - Image classification


Transcript

Okay, so welcome to Practical Deep Learning for Coders, lesson one. It's kind of lesson two, because there's a lesson zero, and lesson zero is: why do you need a GPU, and how do you get it set up? So if you haven't got a GPU running yet, then go back and do that. Make sure that you can access a Jupyter notebook, and then you're ready to start the real lesson one.

So if you're ready, you will be able to see something like this. In particular, hopefully you have gone to notebook tutorial. It's at the top, with a 00 here. As this grows you'll see more and more files, but we'll keep notebook tutorial at the top. You will have used your Jupyter notebook to add one and one together, getting the expected result. Let's make that a bit bigger. And hopefully you've learned these four keyboard shortcuts. So the basic idea is that your Jupyter notebook has prose in it.

It can have pictures in it. It can have charts in it. And most importantly, it can have code in it. Okay, so the code is in Python. How many people have used Python before? So nearly all of you, that's great. If you haven't used Python, that's totally okay.

All right. It's a pretty easy language to pick up. But if you haven't used Python, this will feel a little bit more intimidating, because the code that you're seeing will be unfamiliar to you. Yes, Rachel? Oh. No, cuz I try to keep the most sacred. Yeah. Yeah. Okay. We're not the way here.

I'll edit this bit out. So as I say, there are things like this where, for people in the room in person, this is one of those bits that's really for the MOOC audience, not for you. I think this will be the only time like this in the lesson where we've assumed you've got this set up. Thanks to the mother.

Okay. All right, so yeah, for those of you in the room or on fast.ai live, you can go back after this and make sure that you can get this running using the information in course-v3.fast.ai. Okay. So a Jupyter notebook is a really interesting device for a data scientist, because it lets you run interactive experiments, and it lets us give you not just a static piece of information, but something that you can actually interactively experiment with. So let me explain how we think it works well to use these notebooks and this material. This is based on the last three years of experience we've had with the students who have gone through this course. First of all, it works pretty well just to watch a lesson end to end. Don't try to follow along, because it's not really designed to go at a speed where you can follow along. It's designed to be something where you just take in the information and get a general sense of all of the pieces and how it all fits together, right?

And then you can go back and go through it more slowly, pausing the video, trying things out, making sure that you can do the things that I'm doing, and that you can try to extend them to do things in your own way. Okay, so don't worry if things are zipping along faster than you can do them.

That's normal. Also, don't try to stop and understand everything the first time. If you do understand everything the first time, good for you, but most people don't, particularly as the lessons go on: they get faster and they get more difficult. Okay. So at this point we've got our notebooks going and we're ready to start doing deep learning. And the main thing that hopefully you're going to agree with at the end of this is that you can do deep learning, regardless of who you are. And we don't just mean do; we mean do at a very high level. I mean world-class, practitioner-level deep learning.

So your main place to be looking for things is course-v3.fast.ai, where you can find out how to get a GPU and other information, and you can also access our forums. On our forums you'll find things like how to build a deep learning box yourself, and that's something that you can do later on, once you've got going. Who am I?

So why should you listen to me? Well, maybe you shouldn't, but I'll try to justify why you should. I've been doing stuff with machine learning for over 25 years. I started out in management consulting, where actually, initially, I was I think McKinsey & Company's first analytical specialist, and went into general consulting. I ran a number of startups for a long time and eventually became the president of Kaggle. But actually the thing I'm probably most proud of in my life is that I got to be the number-one-ranked contestant in Kaggle competitions globally.

So I think that's a good practical test: can you actually train a predictive model that predicts things? That's a pretty important aspect of data science. I then founded a company called Enlitic, which was the first kind of medical deep learning company. Nowadays I'm on the faculty at the University of San Francisco, and also co-founder, with Rachel, of fast.ai. So I've used machine learning throughout that time, and although I am at USF, at the university, I'm not really an academic type.

I'm much more interested in using this tool to do useful things. Specifically, through fast.ai we are trying to help people use deep learning to do useful things: through creating software to make deep learning easier to use at a very high level; through education, such as the thing you're watching now; through research, which is where we spend a very large amount of our time, researching how you can make deep learning easier to use at a very high level, which ends up, as you'll see, in the software and the education; and by helping to build a community, which is mainly through the forums, so that practitioners can find each other and work together. So that's what we're doing.

So this course, Practical Deep Learning for Coders, is kind of the starting point in this journey. It contains seven lessons, each about two hours long. We're then expecting you to do about eight to ten hours of homework during the week, so it'll end up being something around 70 or 80 hours of work. I will say there is a lot of variation in how much people put into this. I know a lot of people who work full-time on fast.ai. Some folks who do the two parts can spend a whole year doing it really intensively.

I know some folks watch the videos on double speed, never do any homework, and come out at the end of it with a general sense of what's going on. So there are lots of different ways you can do this. But if you follow along with this ten-hours-a-week-or-so approach for the seven weeks, by the end you will be able to build an image classification model on pictures that you choose that will work at a world-class level. You'll be able to classify text, again using whatever datasets you're interested in. You'll be able to make predictions for commercial applications like sales. You'll be able to build recommendation systems such as the one used by Netflix. Not toy examples of any of these, but actually things that can come top ten in Kaggle competitions, that can beat everything that's in the academic community. Very, very high-level versions of these things.

So that might surprise you, because the prerequisite here is literally one year of coding and high school math. But we have thousands of students now who have done this and shown it to be true. You will probably hear a lot of naysayers, fewer now than a couple of years ago when we started, telling you that you can't do it, or that you shouldn't be doing it, or that deep learning's got all these problems. It's not perfect.

But these are all things that people claim about deep learning which are either pointless or untrue. It's not a black box; as you'll see, it's really great for interpreting what's going on. It does not need much data for most practical applications. You certainly don't need a PhD. Rachel has one, so it doesn't actually stop you from doing deep learning if you have a PhD. I certainly don't; I have a philosophy degree and nothing else. It can be used very widely for lots of different applications, not just for vision, which is where it's most well known. You don't need lots of hardware; a 36-cents-an-hour server is more than enough to get world-class results for most problems. It's true that maybe this is not going to help you build a sentient brain, but that's not our focus.

Okay, so for all the people who say deep learning is not interesting because it's not really AI: that's not really a conversation that I'm interested in. We're focused on solving interesting real-world problems. What are you going to be able to do by the end of lesson one? Well, this was an example from Nikhil, who's actually in the audience now, because he was in last year's course as well. This is an example of something he did: he downloaded 30 images of people playing cricket and people playing baseball, ran the code you'll see today, and built a nearly perfect classifier of which is which. So this is the kind of stuff that you can build, with some fun hobby examples like this, or you can try stuff, as we'll see, in the workplace that could be of direct commercial value. So this is the idea of where we're going to get to by the end of lesson one.

We're going to start by looking at code, which is very different to many of the academic courses. So for those of you who have an engineering or math or computer science background, this is very different to the approach where you start with lots and lots of theory, and eventually you get to a postgraduate degree, and you finally are at the point where you can build something useful.

We're going to learn to build the useful thing today. Now, that means that at the end of today you won't know all the theory. There will be lots of aspects of what we do where you don't know why or how it works. That's okay. You will learn why and how it works over the next seven weeks. But for now, we've found that what works really well is to actually get your hands dirty coding, not focusing on theory, because there's still a lot of artisanship in deep learning.

Unfortunately, it's still a situation where people who are good practitioners have a really good feel for how to work with the code and how to work with the data, and you can only get that through experience. So the best way to get that feel for how to get good models is to create lots of models through lots of coding and study them carefully, and Jupyter Notebook provides a really great way to study them.

So let's try that. Let's try getting started. To get started, you will open your Jupyter notebook and click on lesson one, lesson1-pets, and it will pop open looking something like this. And so here it is. You can run a cell in a Jupyter notebook by clicking on it and pressing Run, but if you do so, everybody will know that you're not a real deep learning practitioner, because real deep learning practitioners know the keyboard shortcuts, and the keyboard shortcut is Shift+Enter. Given how often you have to run a cell, don't be going all the way up here, finding it, and clicking it. Just Shift+Enter.

Okay, so type, type, type, Shift+Enter; type, type, Shift+Enter. Up and down to move around to pick something to run, Shift+Enter to run it. So we're going to go through this quickly, and then later on we're going to go back over it more carefully. Here's the quick version to get a sense of what's going on. So here we are in lesson one, and these three lines are what we start every notebook with. These things starting with percent are special directives to Jupyter Notebook itself.

They're not Python code. They're called magics, which is just kind of a cool name. The details of these three directives aren't very important, but basically they say: hey, if somebody changes the underlying library code while I'm running this, please reload it automatically; and if somebody asks to plot something, then please plot it here in this Jupyter notebook. So just put those three lines at the top of everything. The next two lines load up the fastai library. What is the fastai library?

So it's a little bit confusing: fastai with no dot is the name of our software, and fast.ai with the dot is the name of our organization. So if you go to docs.fast.ai, this is the fastai library. We'll learn more about it in a moment, but for now just realize that everything we are going to do is going to be using either fastai or the thing that fastai sits on top of, which is PyTorch. PyTorch is one of the most popular libraries for deep learning in the world. It's a bit newer than TensorFlow.

So in a lot of ways it's more modern than TensorFlow. It's extremely fast-growing and extremely popular, and we use it because we used to use TensorFlow a couple of years ago, and we found we can just do a lot more, a lot more quickly, with PyTorch. And then we have this software that sits on top of PyTorch and lets you do far, far more things, far, far more easily, than you can with PyTorch alone. So it's a good combination.

We'll be talking a lot about it. But for now, just know that you can use fastai by doing two things: importing star from fastai, and then importing star from fastai dot something, where something is the application you want. Currently fastai supports four applications: computer vision, natural language text, tabular data, and collaborative filtering. We're going to see lots of examples of all of those during the seven weeks. So we're going to be doing some computer vision. At this point, if you are a Python software engineer, you are probably feeling sick, because you've seen me go import star, which is something that you've all been told to never, ever do. And there are very good reasons not to use import star in standard production code with most libraries. But those of you who have used something like MATLAB might also have seen that it's kind of the opposite: everything's there for you all the time.
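
For reference, the notebook opening described above (three magics, then two imports) looks roughly like the cell below. This is a sketch that assumes a Jupyter or IPython session with fastai v1 installed; the lines starting with a percent sign are IPython magics, so the cell will not run as a plain Python script.

```python
# Reload library code automatically if it changes while the notebook is running.
%reload_ext autoreload
%autoreload 2
# Draw plots inside the notebook itself.
%matplotlib inline

# Pull in the core library, then the application you want (computer vision here).
from fastai import *
from fastai.vision import *
```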

You don't even have to import things a lot of the time. It's kind of funny. We've got these two extremes of how to code: you've got a scientific programming community that has one way, and then you've got the software engineering community that has the other. Both have really good reasons for doing things, and with the fastai library we actually support both approaches. In a Jupyter notebook, where you want to be able to quickly, interactively try stuff out, you don't want to be constantly going back up to the top and importing more stuff and trying to figure out where things are. You want to be able to use lots of tab completion and be very experimental.

So import star is great. Then when you're building stuff in production, you can do the normal PEP 8 style, proper software engineering practices. So don't worry when you see me doing stuff which at your workplace is frowned upon. This is a different style of coding. It's not that there are no rules in data science programming; it's that the rules are different. When you're training models, the most important thing is to be able to interactively experiment quickly.

Okay, so you'll see we use a lot of very different processes, styles, and so on to what you're used to, but they're there for a reason, and you'll learn about them over time. You can choose to use a similar approach or not; it's entirely up to you. The other thing to mention is that the fastai library is designed in a very interesting modular way, and you'll find over time that when you do use import star, there's far less clobbering of things than you might expect. It's all explicitly designed to allow you to pull in things and use them quickly without having problems. Okay, so we're going to look at some data, and there are two main places that we'll be tending to get data from for the course. One is academic datasets. Academic datasets are really important.

They're really interesting. They're things where academics spend a lot of time curating and gathering a dataset so that they can show how well different kinds of approaches work with that data. The idea is that they try to design datasets that are challenging in some way and require some kind of breakthrough to do them well.

So we're going to be starting with an academic dataset called the pet dataset. The other kind of dataset we'll be using during the course is datasets from the Kaggle competitions platform. Both academic datasets and Kaggle datasets are interesting for us, particularly because they provide strong baselines. That is to say, you want to know if you're doing a good job. With Kaggle datasets that have come from a competition, you can actually submit your results to Kaggle and see how well you would have gone in that competition, and if you can get in about the top 10%, then I'd say you're doing pretty well. For academic datasets, academics write down in papers what the state of the art is, so how well did they go using models on that dataset?

So this is what we're going to do. We're going to try to create models that get right up towards the top of Kaggle competitions, preferably actually in the top 10, not just the top 10%, or that meet or exceed academic state-of-the-art published results. When you use an academic dataset, it's important to cite it, so you'll see here there's a link to the paper that it's from. You definitely don't need to read that paper right now, but if you're interested in learning more about it, why it was created and how it was created, all the details are there. In this case this is a pretty difficult challenge. The pet dataset is going to ask us to distinguish between 37 different categories of dog breed and cat breed, so that's really hard. In fact, in every course until this one we've used a different dataset, which is one where you just have to decide: is something a dog or is it a cat?

So you've got a 50/50 chance right away, and dogs and cats look really different, whereas lots of dog breeds and cat breeds look pretty much the same. So why have we changed that dataset? We've got to the point now where deep learning is so fast and so easy that the dogs-versus-cats problem, which a few years ago was considered extremely difficult: 80% accuracy was the state of the art.

It's now too easy. Our models were basically getting everything right all the time without any tuning, so there weren't really a lot of opportunities for me to show you how to do more sophisticated stuff. So we've picked a harder problem this year. This is the first class where we're going to be learning how to do this difficult problem, and this kind of thing, where you have to distinguish between similar categories, is called, in the academic context, fine-grained classification.

So we're going to do the fine-grained classification task of figuring out a particular kind of pet. The first thing we have to do is download and extract the data that we want. We're going to be using this function called untar_data, which will download it automatically and untar it automatically. AWS has been kind enough to give us lots of space and bandwidth for these datasets, so they'll download super quickly for you. And so the first question then would be: how do I know what untar_data does?

So you can just type help and you will find out what module it came from (because since we imported star, we don't necessarily know that), what it does, and, something you might not have seen before even if you're an experienced programmer, what exactly you pass to it.

You're probably used to seeing the names: URL, file name, destination. What you might not be used to seeing are these bits, the types. If you've used a typed programming language, you'll be used to seeing them, but Python programmers are less used to it. If you think about it, though, you don't actually know how to use a function unless you know what type each thing is that you're providing it. So we make sure that we give you that type information directly here in the help. In this case the URL is a string; the file name is either a path or a string (Union means "either") and it defaults to nothing; and the destination is either a path or a string that defaults to nothing. We'll learn more shortly about how to get more documentation about the details of this, but for now we can see we don't have to pass in a file name or a destination. It'll figure them out for us from the URL.

For all the datasets we'll be using in the course, we already have constants defined, in this URLs module (class, actually), and you can see that's where it's going to grab it from. So it's going to download that to some convenient path, untar it for us, and then return the value of path.

It's kind of handy in Jupyter Notebook: you can just write a variable on its own (the semicolon is just an end-of-statement marker in Python, so it's the same as doing this), and it prints it. You can also say print, but again, we're trying to do everything fast and interactively, so just write it, and here is the path where it's given us our data. Next time you run this, since you've already downloaded it, it won't download it again, and since you've already untarred it, it won't untar it again. So everything's designed to be pretty automatic, pretty easy. There are some things in Python that are less convenient for interactive use than they should be. For example, when you have a path object, seeing what's in it takes a lot more typing than I would like. So sometimes we add functionality to existing Python stuff, and one of the things we do is add an ls method to paths. So if you go path.ls(), here is what's inside this path. That's what we just downloaded.

So when you try this yourself, you wait a couple of minutes for it to download and unzip, and then you can see what's in there. If you're an experienced Python programmer, you may not be familiar with this approach of using a slash like this.

Now, this is a really convenient feature that's part of Python 3. It's functionality from something called pathlib. These are path objects, and path objects are much better to use than strings. They let you create sub-paths like this, and it doesn't matter if you're on Windows, Linux, or Mac; it's always going to work exactly the same way. So here's a path to the images in that dataset. All right, so if you're starting with a brand-new dataset and trying to do some deep learning on it, what do you do?
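
The slash syntax and directory listing described above can be tried with the standard library alone. In this sketch the temporary directory and the names `annotations` and `beagle_1.jpg` are made up to stand in for a downloaded dataset, and `iterdir` is the plain pathlib way of listing a folder.

```python
import tempfile
from pathlib import Path

# A throwaway directory standing in for the downloaded dataset folder.
path = Path(tempfile.mkdtemp())
(path / "images").mkdir()
(path / "annotations").mkdir()
(path / "images" / "beagle_1.jpg").touch()

# The `/` operator joins path components using the right separator for the OS.
path_img = path / "images"

# Plain-pathlib way of seeing what is inside a directory.
contents = sorted(p.name for p in path.iterdir())
image_names = sorted(p.name for p in path_img.iterdir())
print(contents, image_names)
```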

Well, the first thing you would want to do is probably see what's in there. So we found that these are the directories that are in there. So what's in this images directory? There are a lot of functions in fastai for you. There's one called get_image_files that will just grab an array of all of the image files, based on extension, in a path. And so here you can see we've got lots of different files. So this is a pretty common way for computer vision datasets to get passed around: just one folder with a whole bunch of files in it. The interesting bit then is: how do we get the labels? In machine learning, the labels refer to the thing we're trying to predict.

If we just eyeball this, we can immediately see that the labels are actually part of the file name. You see that, right? It's kind of like path/label_number.extension. So we need to somehow get a list of these bits of each file name, and that will give us our labels, because that's all you need to build a deep learning model.

You need some pictures, so files containing the images, and you need some labels. In fastai this is made really easy. There's an object called ImageDataBunch, and an ImageDataBunch represents all of the data you need to build a model. There are some factory methods which try to make it really easy for you to create that data bunch (we'll talk more about this shortly): a training set and a validation set, with images and labels, for you. Now, in this case, we can see we need to extract the labels from the names, so we're going to use from_name_re. For those of you who use Python, you know re is the module in Python that does regular expressions, which are really useful for extracting text.

I just went ahead and created the regular expression that would extract the label from this text. For those of you who aren't familiar with regular expressions: they're a super useful tool, and it'd be very useful to spend some time figuring out how and why that particular regular expression is going to extract the label from this text.

Okay, so with this factory method we can basically say: okay, I've got this path containing images; this is a list of file names (remember, I got them back here); and this is the regular expression pattern that is going to be used to extract the label from the file name. We'll talk about transforms later. And then you also need to say what size images you want to work with.

So that might seem weird. Why do I need to say what size images I want to work with? The images have a size; we can see what size the images are. I guess, honestly, this is a shortcoming of current deep learning technology, which is that a GPU has to apply the exact same instruction to a whole bunch of things at the same time in order to be fast. So if the images are different shapes and sizes, it can't do that. We actually have to make all of the images the same shape and size. In part one of the course we're always going to be making images square; in part two we'll learn how to use rectangles as well, which turns out to be surprisingly nuanced. But pretty much everybody, in pretty much all computer vision modeling, uses this approach of square images, and 224 by 224, for reasons we'll learn about, is an extremely common size that most models tend to use. So if you just use size=224, you're probably going to get pretty good results most of the time. And this is the kind of little bit of artisanship that I want to teach you folks: what generally just works. So if you just use size=224, that'll generally just work for most things, most of the time. This is going to return a DataBunch object, and in fastai everything you model with is going to be a DataBunch object. We're going to learn all about them: what's in them, how we look at them, and so forth.

So basically a DataBunch object contains two or three datasets. It contains your training data (we'll learn about this shortly), it contains your validation data, and optionally it contains your test data. For each of those it contains your images and your labels, or your texts and your labels, or your tabular data and your labels, and so forth, and that all sits there in this one place. Something we'll learn more about in a little bit is normalization. Generally, in nearly all machine learning tasks, you have to make all of your data about the same size, specifically about the same mean and about the same standard deviation. So there's a normalize function that we can use to normalize our data bunch in that way. Okay, Rachel, come and ask the question. "What does the function do if the image size is not 224?"

Great. So this is what we're going to learn about shortly. Basically, this thing called transforms is used to do a number of things, and one of the things it does is to make something size 224. Let's take a look at a few pictures. Here are a few pictures of things from my data bunch. You can see data.show_batch can be used to show me some of the contents of my data bunch. This is going to be three by three, and you can see roughly what's happened is that they all seem to have been zoomed and cropped in a reasonably nice way.
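
One deterministic way to turn an image of any shape into a 224 by 224 square is to take the largest centered square and then scale it. The sketch below only does the arithmetic for that idea; the helper names are hypothetical, the crop-then-resize order is an assumption, and the real transforms also involve randomization that is not modeled here.

```python
def center_crop_box(width: int, height: int) -> tuple:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def resize_factor(side: int, target: int = 224) -> float:
    """Scale factor that maps a square of `side` pixels to `target` pixels."""
    return target / side


box = center_crop_box(500, 375)          # a landscape photo, 500 wide by 375 high
scale = resize_factor(box[2] - box[0])   # shrink the cropped square down to 224
print(box, scale)
```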

So basically, what it'll do by default is something called center cropping, which means it'll grab the middle bit and also resize it. We'll talk more about the detail of this, because it turns out to actually be quite important, but basically a combination of cropping and resizing is used. Something else we'll learn about is that we also use this to do something called data augmentation, so there's actually some randomization in how much and where it crops and stuff like that. But that's the basic idea: some cropping and some resizing. Often we also do some padding, so there are all kinds of different ways, and it depends on data augmentation, which we're going to learn about shortly. "And what does it mean to normalize the images?"

So normalizing the images is something we're going to be learning more about later in the course, but in short it means that the pixel values (and we're going to be learning more about pixel values) start out from 0 to 255, and some pixel values might tend to be really... I should say some channels, because there's red, green, and blue. So some channels might tend to be really bright, some might tend to be really not bright at all, some might vary a lot, and some might not vary much at all. It really helps to train a deep learning model if each one of those red, green, and blue channels has a mean of zero and a standard deviation of one.
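
The idea of giving each channel a mean of zero and a standard deviation of one can be sketched in plain Python. The pixel values below are made up, and for simplicity the statistics come from this one tiny image; in practice they would usually be computed over a whole dataset.

```python
from statistics import mean, pstdev


def normalize_channel(values):
    """Shift and scale one channel so it has mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical pixel values (0 to 255) for one tiny image, one list per channel.
image = {
    "red":   [250, 240, 255, 245],   # bright, little variation
    "green": [10, 200, 30, 180],     # varies a lot
    "blue":  [5, 0, 10, 15],         # dark
}

normalized = {name: normalize_channel(vals) for name, vals in image.items()}
```

After this step the three channels are on the same scale, even though they started out with very different brightness and spread.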

Okay, we'll learn more about that. If you haven't studied or don't remember means and standard deviations, we'll get back to some of that later, but that's the basic idea. That's what normalization does. And again, we'll learn much more about the details, but if your data is not normalized, it can be quite difficult for your model to train well. So if you do have trouble training a model, one thing to check is that you've normalized it. "As GPU memory will be in powers of two, doesn't size 256 sound more practical considering GPU utilization?" So we're going to be getting into that shortly, but the brief answer is that the models are designed so that the final layer is of size 7 by 7. So we actually want something where, if you go 7 times 2 a bunch of times, you end up with something that's a good size.
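
The "7 times 2 a bunch of times" remark can be checked with a line of arithmetic. The sketch assumes the network halves the image resolution five times before its final layer, which is the case for standard ResNets but is not stated in the lesson; `final_grid` is a hypothetical helper.

```python
def final_grid(size: int, halvings: int = 5) -> float:
    """Spatial size left after halving the input resolution `halvings` times."""
    return size / 2 ** halvings


sizes = {size: final_grid(size) for size in (224, 256, 299)}
# 224 -> 7.0 and 256 -> 8.0 divide evenly; 299 does not.
print(sizes)
```

Under that assumption, 224 is exactly the input size that leaves a 7 by 7 grid at the end.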

Yeah, all of these details we are going to get to, but the key thing is I wanted to get you training a model as quickly as possible. But one of the most important things to be a really good practitioner is to be able to look at your data. So it's really important to remember to go data.show_batch and take a look. It's surprising how often, when you actually look at the dataset you've been given, you realize it's got weird black borders on it, or some of the things have text covering up some of it, or some of it's rotated in odd ways. So make sure you take a look. And then the other thing we're going to do is not just look at the pictures but also look at the labels. All of the possible label names are called your classes.

With a data bunch, you can print out your data.classes, and so here they are. That's all of the possible labels that we found by using that regular expression on the file names. We learned earlier, in that prose right at the top, that there are 37 possible categories, so let's just check len(data.classes).

It is indeed 37. A DataBunch will always have a property called c. We'll get to the technical details of that property later, but for now you can think of it as being the number of classes. For things like regression problems and multi-label classification, that's not exactly accurate.
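
For plain classification, the relationship between the labels, the classes, and the class count can be sketched as follows. The labels are made up, and `classes` and `c` are ordinary variables that mimic the idea rather than the library's own attributes.

```python
# Hypothetical labels, one per image, as extracted from the file names.
labels = ["beagle", "Bombay", "beagle", "pug", "Bombay", "beagle"]

classes = sorted(set(labels))   # every distinct label, in a stable order
c = len(classes)                # number of classes, for plain classification
print(classes, c)
```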

That's not exactly accurate But it'll do for now It's it's important to know that data dot C is a really Important piece of information that is something like or at least for classification problems. It is the number of classes Right believe it or not. We're now ready to train a model and So a model is trained in fast AI using something called a learner And just like a data bunch is a general fast AI concept for your data And from there there are subclasses for particular applications like image data bunch Alona is a general concept for things that can learn To fit the model and from that there are various subclasses to make things easier and in particular There's one called conf loner, which is something that will create a convolutional neural network for you We'll be learning a lot about that over the next few lessons But for now just know that to create a learner for a convolutional neural network.

You just have to tell it two things the first is What's your data and not surprisingly it takes a data bunch and the second thing you need to tell it is? What's your model? Or what's your architecture? So as we learned there are lots of different ways of constructing a convolutional neural network But for now the most important thing for you to know is that there's a particular kind of model called a resnet Which works extremely?

Well nearly all the time and so for a while at least you really only need to be doing Choosing between two things which is what size resnet do you want? That's just basically how big is it and we'll learn all about the details of what that means But there's one quarter resnet 34 and there's one quarter resnet 50 and so when we're getting started with something I'll pick a smaller one because it'll train faster so That's kind of it.

That's as much as you need to know to be a pretty good practitioner about architectures for now, which is that there's two Architectures or two variants of one architecture that work pretty well Resnet 34 and resnet 50 start with a smaller one and see if it's good enough So that is all the information we need to create a convolutional neural network learner There's one other thing I'm going to give it though, which is a list of metrics Metrics are literally just things that get printed out as it's training So I'm saying I would like you to print out the error rate, please Now you can see the first time I ran this on a newly installed box It downloaded something What's it downloading?
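For a sense of what an error rate metric reports, here is a minimal sketch in plain Python: the fraction of predictions that do not match the true label. The function name matches the metric mentioned above, but the body and the sample labels are assumptions for illustration, not the library's implementation.

```python
def error_rate(predicted, actual):
    """Fraction of predictions that do not match the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one example")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Hypothetical predictions and labels for five images.
predicted = ["beagle", "Birman", "pug", "Siamese", "pug"]
actual = ["beagle", "Siamese", "pug", "Siamese", "pug"]

rate = error_rate(predicted, actual)
print(f"error rate: {rate:.2f}, accuracy: {1 - rate:.2f}")
```

Accuracy is simply one minus this number, which is why a 6% error rate and 94% accuracy are the same statement.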

It's downloading the ResNet34 pre-trained weights. What this means is that this particular model has actually already been trained for a particular task, and that particular task is that it was trained on looking at about one and a half million pictures of all kinds of different things, a thousand different categories of things, using an image dataset called ImageNet. So we can download those pre-trained weights so that we don't start with a model that knows nothing about anything, but we actually start with a model that knows how to recognize the thousand categories of things in ImageNet. Now, I'm not sure, but I don't think all of these 37 categories of pet are in ImageNet. But there were certainly some kinds of dog.

There were certainly some kinds of cat. So this pre-trained model already knows quite a bit about what pets look like, and it certainly knows quite a lot about what animals look like and what photos look like. So the idea is that we don't start with a model that knows nothing at all, but we start by downloading a model that knows something about recognizing images already. It downloads a pre-trained model for us automatically the first time we use it, and from then on it won't need to download it again; it'll just use the one we've got. This is really important.

We're going to learn a lot about this. It's kind of the focus of the whole course: how to do what is called transfer learning, how to take a model that already knows how to do something pretty well and make it so that it can do your thing really well. We take a pre-trained model and then we fit it so that instead of predicting the thousand categories of ImageNet with the ImageNet data, it predicts the 37 categories of pets using your pet data. And it turns out that by doing this you can train models in 1/100 or less of the time of regular model training, with 1/100 or less of the data of regular model training, in fact potentially many thousands of times less. Remember I showed you the slide of Nikhil's lesson one project from last year.

He used 30 images. And there are no cricket and baseball images in ImageNet, but it just turns out that ImageNet is already so good at recognizing things in the world that just 30 examples of people playing baseball and cricket was enough to build a nearly perfect classifier. Now, you would naturally be potentially saying: well, wait a minute, how do you know that it can actually recognize pictures of people playing cricket versus baseball in general?

Maybe it just learned to recognize those 30. Maybe it's just cheating. That's called overfitting, and we'll be talking a lot about that during this course. Overfitting is where you don't learn to recognize pictures of, say, cricket versus baseball, but just these particular cricketers in these particular photos and these particular baseball players in these particular photos. We have to make sure that we don't overfit. The way we do that is using something called a validation set. A validation set is a set of images that your model does not get to look at, and so these metrics, like in this case error rate, get printed out automatically using the validation set, a set of images that our model never got to see. When we created our DataBunch, it automatically created a validation set for us. We'll learn lots of ways of creating and using validation sets, but because we try to bake in all of the best practices, we actually make it nearly impossible for you not to use a validation set, because if you're not using a validation set, you don't know if you're overfitting. So we always print out the metrics on a validation set.
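The hold-out idea can be sketched in a few lines of plain Python. This is only an illustration of a random split under assumed settings (the 20% fraction, the seed, and the file names are all hypothetical), not a description of how the DataBunch builds its validation set.

```python
import random


def split_train_valid(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of items and hold out valid_pct of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is repeatable
    n_valid = int(len(shuffled) * valid_pct)
    # The first n_valid items are held out; the model trains only on the rest.
    return shuffled[n_valid:], shuffled[:n_valid]


images = [f"img_{i}.jpg" for i in range(100)]  # hypothetical file names
train, valid = split_train_valid(images)

# No image may appear on both sides, otherwise the metric would be flattered.
no_overlap = set(train).isdisjoint(valid)
print(len(train), len(valid), no_overlap)
```

Metrics computed on `valid` then say something about images the model never trained on, which is the whole point of holding them out.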

We always hold it out. We always make sure that the model doesn't touch it. That's all done for you, and that's all built into this DataBunch object. So now that we have a ConvLearner, we can fit it. You can just use a method called fit, but in practice you should nearly always use a method called fit_one_cycle. We'll learn more about this during the course.

But in short, one cycle learning is a paper that was released, I'm trying to think, a few months ago? Less than a year ago. Yeah, so a few months ago. And it turned out to be dramatically better, both more accurate and faster, than any previous approach. So again, I don't want to teach you how to do 2017 deep learning. In 2018, the best way to fit models is to use something called one cycle.

We'll learn all about it, but for now just know you should probably type learn.fit_one_cycle. If you forget how to type it, you can start typing a few letters and hit Tab, and you'll get a list of potential options. And then if you forget what to pass it, you can press Shift+Tab and it'll show you exactly what to pass, so you don't actually have to type help. And again, this is kind of nice that we have all the types here, because we can see cycle length (we'll learn more about what that is shortly) is an integer, and then max learning rate could either be a float or a collection or whatever, and so forth, and you can see that the momentums will default to this tuple. So for now, just know that this number four basically decides: how many times do we go through the entire dataset?

How many times do we show the dataset to the model so that it can learn from it? Each time it sees a picture, it's going to get a little bit better, but it's going to take time, and it means it could overfit. If it sees the same picture too many times, it'll just learn to recognize that picture, not pets in general. So we'll learn all about how to tune this number during the next couple of lessons, but starting out with four is a pretty good start just to see how it goes. And you can actually see after four epochs, or four cycles, we've got an error rate of six percent. So a natural question is: how long did that take? That took a minute and 56 seconds.
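What "four times through the dataset" means can be shown with a toy loop in plain Python. The dataset size and batch size below are hypothetical, and the loop only counts how often each image is shown; it is not how fit_one_cycle is implemented.

```python
import math

n_images = 1000   # hypothetical dataset size
batch_size = 64   # hypothetical batch size
epochs = 4        # the number passed when fitting

batches_per_epoch = math.ceil(n_images / batch_size)
times_seen = {i: 0 for i in range(n_images)}
total_steps = 0

for epoch in range(epochs):
    for start in range(0, n_images, batch_size):
        batch = range(start, min(start + batch_size, n_images))
        for idx in batch:
            times_seen[idx] += 1  # a real training loop would update weights here
        total_steps += 1

print(batches_per_epoch, total_steps, set(times_seen.values()))
```

Every image is shown once per epoch, so more epochs means more chances to learn and also more chances to memorize individual pictures.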

Yeah, so we're paying, you know, 60 cents an hour, and we just paid for two minutes of time. I mean, we actually pay for the whole time that it's on and running, but we used two minutes of compute time and we got an error rate of six percent. So 94 percent of the time we correctly picked the exact right one of those 37 dog and cat breeds, which feels pretty good to me. But to get a sense of how good it is, maybe we should go back and look at the paper. Remember I said the nice thing about using academic papers or Kaggle datasets is we can compare our solution to whatever the best people in Kaggle did or whatever the academics did.

This particular dataset of pet breeds is from 2012, and if I scroll through the paper (you'll generally find in any academic paper there'll be a section called experiments about two-thirds of the way through, and if you find the section on experiments then you can find the section on accuracy) they've got lots of different models. Their models, as you'll read about in the paper, are extremely pet specific. They learn something about how pet heads look and how pet bodies look and how pet images in general look, and they combine them all together. And once they use all of this complex code and math, they got an accuracy of 59%. So in 2012 this highly pet-specific analysis got an accuracy of 59%, and that's from top researchers at Oxford University. Today in 2018, with basically, if you go back and look at how much code we just wrote, about three lines of code (the other stuff is just printing out things to see what we're doing), we got 94%, so 6% error. That gives you a sense of how far we've come with deep learning, and particularly with PyTorch and fastai, how easy things are.

Yes, so before we take a break, I just want to check to see if we've got any questions. And just remember, if you're in the audience and you see a question that you want asked, please click the love heart next to it so that Rachel knows that you want to hear about it. Also, if there is something with six likes and Rachel didn't notice it, which is quite possible, just quote it in a reply and say: hey @Rachel, this one's got six likes.

Okay, so what we're going to do is we're going to take an eight minute break, so we'll come back at five past eight. So where we got to was we just trained a model. We don't exactly know what that involved or how it happened, but we do know that with three or four lines of code we built something which smashed the accuracy of the state of the art of 2012. 6% error certainly sounds pretty impressive for something that can recognize different dog breeds and cat breeds. We don't really know why it works yet, but we will, and that's okay. In terms of getting the most out of this course, we very, very regularly hear after the course is finished the same basic feedback, and this is literally copied and pasted from the forum: "I fell into the habit of watching the lectures too much and googling too much about concepts without running the code. At first I thought I should just read it and then research the theory." And we keep hearing people saying: my number one regret is I just spent 70 hours doing that, and at the very end I started running the code and, oh, it turned out I learned a lot more. So please run the code. Really run the code. "I should have spent the majority of my time on the actual code in the notebooks, running it, seeing what goes in, and seeing what comes out." So your most important skill to practice, and we're going to show you how to do this in a lot more detail, is understanding what goes in and what comes out.

So we've already seen an example of looking at what goes in, which is data.show_batch, and that's going to show you examples of labels and images. Next we're going to be seeing how to look at what came out. So that's the most important thing to study. As I said, the reason we've been able to do this so quickly is heavily because of the fastai library. Now, the fastai library is pretty new, but it's already getting an extraordinary amount of traction. As you've seen, all of the major cloud providers either support it or are about to support it, and a lot of researchers are starting to use it.

It's making a lot of things a lot easier, but it's also making new things possible, and so really understanding the fastai software is something which is going to take you a long way. And the best way to really understand the fastai software well is by using the fastai documentation, and we'll be learning more about the fastai documentation shortly. So how does it compare? There's really only one other major piece of software like fastai, that is, something that tries to make deep learning easy to use, and that's Keras. Keras is a really terrific piece of software. We actually used it for the previous courses until we switched to fastai. It runs on top of TensorFlow. It was kind of the gold standard for making deep learning easy to use before, but life is much easier with fastai. So if you look, for example, at last year's course exercise, which is dogs versus cats, fastai lets you get much more accurate (less than half the error on a validation set), training time is less than half the time, and lines of code is about a sixth of the lines of code. And the lines of code are more important than you might realize, because those 31 lines of Keras code involve you making a lot of decisions, setting lots of parameters, doing lots of configuration.

So that's all stuff where you have to know how to set those things to get best practice results. Or else there are these five lines of code: any time we know what to do for you, we do it for you; any time we can pick a good default, we pick it for you. So hopefully you'll find this a really useful library, not just for learning deep learning but for taking it a very long way.

How far can you take it? Well, as you'll see, all of the research that we do at fast.ai uses the library. An example of the research we did, which was recently featured in Wired, describes a new breakthrough in natural language processing which people are calling the ImageNet moment, which is basically that we broke a new state-of-the-art result in text classification, which OpenAI then built on top of our paper, with more compute and more data and some different tasks, to take it even further. This is an example of something that we've done in the last six months, in conjunction actually with my colleague Sebastian Ruder, an example of something that's been built in the fastai library, and you're going to learn how to use this brand-new model in three lessons' time. You're actually going to get this exact result from this exact paper yourself.

Another example: one of our alums, Hamel Husain, who you'll come across on the forum plenty because he's a great guy, very active, built a new system for natural language semantic code search. You can find it on GitHub, where you can actually type in English sentences and find snippets of code that do the thing you asked for. And again, it's been built with the fastai library, using the techniques you'll be learning in the next seven weeks. In production? Yeah, well, I think at this stage it's a part of their experiments platform, so it's kind of pre-production, I guess.

And so the best place to learn about these things and get involved in these things is on the forums, where as well as categories for each part of the course there's also a general category for deep learning where people talk about research papers, applications, and so on and so forth. So even though today we're going to focus on a small number of lines of code to do a particular thing, which is image classification, and we're not learning much math or theory or whatever, over these seven weeks, and then part two, another seven weeks, we're going to go deeper and deeper and deeper.

And so where can that take you? I want to give you some examples. This is Sarah Hooker. She did our first course a couple of years ago. Her background was economics; she didn't have a background in coding, math, or computer science. I think she started learning to code two years before she took our course. She started a nonprofit called Delta Analytics. They helped build this amazing system where they attached old mobile phones to trees in the Kenyan rainforests and used them to listen for chainsaw noises, and then they used deep learning to figure out when there was a chainsaw being used, and then they had a system set up to alert rangers to go out and stop illegal deforestation in the rainforests. So that was something that she was doing while she was in the course, as part of her class projects. What's she doing now?

She is now a Google Brain researcher, which I guess is one of the top, if not the top, places to do deep learning. She's just been publishing some papers. Now she is going to Africa to set up Google Brain's first deep learning AI research center in Africa. Now, I'll say she worked her ass off. She really, really invested in this course, not just doing all of the assignments but also going out and reading Ian Goodfellow's book and doing lots of other things. But it really shows how somebody who has no computer science or math background at all can now be one of the world's top deep learning researchers, doing very valuable work.

Another example from our most recent course: Christine Payne. She is now at OpenAI, and you can find her post and actually listen to her music samples. She built something to automatically create chamber music compositions that you can play and listen to online. And so again, is her background math and computer science? Actually, that's her there: classical pianist. Now, I will say she's not your average classical pianist. She's a classical pianist who also has a master's in medical research from Stanford, and studied neuroscience, and was a high-performance computing expert at D. E. Shaw, and was valedictorian at Princeton. Anyway, she's, you know, a very annoying person, good at everything she does. But I think it's really cool to see how a domain expert, in this case the domain of playing piano, can go through the fastai course and come out the other end at, I guess, OpenAI. You know, of the three top research institutes, Google Brain and OpenAI would be two of them, probably along with DeepMind.

And interestingly, one of our other students, or I should say alumni of the course, recently interviewed her for a blog post series he's doing on top AI researchers, and she said one of the most important pieces of advice she got was from me, and the piece of advice was: pick one project, do it really well, make it fantastic. So that was the piece of advice she found the most useful, and we're going to be talking a lot about you doing projects and making them fantastic during this course. Having said that, I don't really want you to go to OpenAI or Google Brain. What I really want you to do is go back to your workplace or your passion project and apply these skills there.

Let me give you an example. MIT released a deep learning course, and they highlighted in their announcement for this deep learning course this medical imaging example. One of our students, Alex, who is a radiologist, said: you guys just showed a model overfitting. I can tell because I'm a radiologist, and this is not what this would look like on a chest film. This is what it should look like, and as a deep learning practitioner, this is how I know this is what happened in your model. So Alex is combining his knowledge of radiology and his knowledge of deep learning to assess MIT's model from just two images very accurately.

And so this is actually what I want most of you to be doing: take your domain expertise and combine it with the deep learning practical aspects that you'll learn in this course, and bring them together like Alex is doing here. A lot of radiologists have actually gone through this course now and have built journal clubs and American College of Radiology practice groups. There's a Data Science Institute at the ACR now, and so forth, and Alex is one of the people who's providing a lot of leadership in this area. I would love for you to do the same kind of thing that Alex is doing, which is to really bring deep learning leadership into your industry or your social impact project, whatever it is that you're trying to do.

Another great example was Melissa Fabros, who was an English literature PhD. She studied gendered language in English literature or something, and actually Rachel, at a previous job, taught her to code, I think. And then she came into the fastai course, and she helped Kiva, a micro-lending social impact organization, to build a system that can recognize faces. Why is that necessary?

Well, we're going to be talking a lot about this, but because most AI researchers are white men, most computer vision software can only recognize white male faces effectively. In fact, I think it was IBM's system that is like 99.8 percent accurate on white-faced men versus 60 or 65 percent accurate on dark-skinned women. So it's like, what is that, 30 or 40 times worse for black women versus white men. And this is really important because for Kiva, black women are perhaps the most common user base for their micro-lending platform. So Melissa, after taking our course, and again working her ass off and being super intense in her study and her work, won this one million dollar AI challenge for her work for Kiva.

Karthik did our course and realized the thing he wanted to do wasn't at his company. It was something else, which is to help blind people to understand the world around them.

So he started a new startup. You can find it now; it's called Envision. You can download the app, point your phone at things, and it will tell you what it sees. I actually talked to a blind lady about these kinds of apps the other day, and she confirmed to me this is a super useful thing for visually disabled users.

The level that you can get to with the content that you're going to get over these seven weeks, and with this software, can get you right to the cutting edge in areas you might find surprising. For example, I helped a team of some of our students and some collaborators on actually breaking the world record for training. Remember I mentioned the ImageNet dataset? Lots of people want to train on the ImageNet dataset. We smashed the world record for how quickly you can train it. We used standard AWS cloud infrastructure, at a cost of $40 of compute to train this model, using again the fastai library and the techniques that we learn in this course. So it can really take you a long way.

So don't be put off by what might seem pretty simple at first. We're going to get deeper and deeper. You can also use it for other kinds of passion projects. So Helena Sarin: you should definitely check out her Twitter account, @glagolista. This art is basically a new style of art that she's developed, which combines her painting and drawing with generative adversarial models to create these extraordinary results, and so I think this is super cool. She's not a professional artist; she is a professional software developer. But she just keeps on producing these beautiful results. When she started, her art had not really been shown or discussed anywhere. Now there have recently been some quite high-profile articles describing how she is creating a new form of art. Again, this has come out of the fastai course that she developed these skills in.

Or, equally important, Brad Kenstler, who figured out how to make a picture of Kanye out of pictures of Patrick Stewart's head, also something you will learn to do if you wish to. This particular type of what's called style transfer was a really interesting tweak that allowed him to do some things that hadn't quite been done before, and this particular picture helped him to get a job as a deep learning specialist at AWS.

So there you go. Another interesting example: another alumnus actually worked at Splunk as a software engineer, and he designed an algorithm after, like, lesson 3, which basically turned out at Splunk to be fantastically good at identifying fraud. We'll talk more about it shortly. If you've seen Silicon Valley, the HBO series, the hot dog / not hot dog app: that's actually a real app you can download, and it was actually built by Tim Anglade as a fastai student project. So there's a lot of cool stuff that you can do. And yes, it was Emmy nominated, so I think we only have one Emmy-nominated fast.ai alumnus at this stage. So please help change that.

The other thing is that the forum threads can turn into these really cool things. So Francisco, who's actually here in the audience, he's a really boring McKinsey consultant like me. Francisco and I both have this shameful past that we were McKinsey consultants, but we left and we're okay now. And he started this thread saying, like: oh, this stuff we've just been learning about, building NLP in different languages, let's try and do lots of different languages. We started this thing called the language model zoo, and out of that there's now been an academic competition that was won in Polish, which led to an academic paper; Thai state of the art; German state of the art. Basically, students have been coming up with new state-of-the-art results across lots of different languages, and this is entirely being done by students working together through the forum.

So please get on the forum. But don't be intimidated, because remember, of everybody you see on the forum, the vast majority of those posting post all the damn time. They've been doing this a lot and they do it a lot of the time, and so at first it can feel intimidating, because it can feel like you're the only new person there. But you're not. All of you people in the audience, everybody who's watching, everybody who's listening, you're all new people. And so when you just get out there and say, like: okay, all you people getting new state-of-the-art results in German language modeling, I can't start my server, I try to click the notebook and I get an error, what do I do? People will help you. Just make sure you provide all the information: I'm using Paperspace, this was the particular instance I tried to use, here's a screenshot of my error. People will help you.

Or if you've got something to add: if people are talking about crop yield analysis and you're a farmer and you think, oh, I've got something to add, please mention it. Even if you're not sure it's exactly relevant, it's fine. Just get involved. Because remember, everybody else on the forum started out also intimidated.

We all start out not knowing things, so just get out there and try it. Okay, so let's get back and do some more coding. Yes, Rachel, we have some questions? There's a question from earlier about why you're using ResNet as opposed to Inception. So the question is about this architecture. There are lots of architectures to choose from, and it would be fair to say there isn't one best one. But if you look at things like the Stanford DAWNBench benchmark for ImageNet classification, you'll see in first place, in second place, in third place, and in fourth place: fast.ai, Jeremy Howard; fast.ai, Jeremy Howard; fast.ai, Yaroslav Bulatov from the Department of Defense innovation team; Google. ResNet, ResNet, ResNet, ResNet.

It's good enough. So it's fine. There are other architectures. The main reason you might want a different architecture is if you want to do edge computing, so if you want to create a model that's going to sit on somebody's mobile phone. Having said that, even there, most of the time I reckon the best way to get a model onto somebody's mobile phone is to run it on your server and then have your mobile phone app talk to it. It really makes life a lot easier and you get a lot more flexibility. But if you really do need to run something on a low-powered device, then there are some special architectures for that. So the particular question was about Inception. That's another architecture, which tends to be pretty memory intensive, but it's okay.

It's also not terribly resilient. One of the things we try to show you is stuff which just tends to always work, even if you don't quite tune everything perfectly. ResNet tends to work pretty well across a wide range of different details around choices that you might make, so I think it's pretty good. So we've got this trained model, and what's actually happened, as we'll learn, is it's basically created a set of weights. If you've ever done anything like linear regression or logistic regression, you'll be familiar with coefficients.

We basically found some coefficients and parameters that work pretty well, and it took us a minute and 56 seconds. So if we want to start doing some more playing around and come back later, we probably should save those weights, so we can save that minute and 56 seconds. You can just go learn.save and give it a name.

It's going to put it in a models subdirectory in the same place the data came from, so if you save different models or different DataBunches from different datasets, they'll all be kept separate. So don't worry about it. All right, so we talked about how the most important things to learn are what goes into your model and what comes out. We've now seen one way of seeing what goes in.
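The "saved next to the data" layout can be pictured with a small sketch in plain Python. Everything here is an assumption for illustration: the helper name, the dataset folder names, the checkpoint name, and the use of JSON as a stand-in for a real weights file. It only mirrors the directory idea described above, not the library's save code.

```python
import json
import tempfile
from pathlib import Path


def save_weights(data_path, name, weights):
    """Write weights under <data_path>/models/<name>.json and return that path."""
    models_dir = Path(data_path) / "models"
    models_dir.mkdir(parents=True, exist_ok=True)
    target = models_dir / f"{name}.json"  # JSON is a stand-in for a real weights format
    target.write_text(json.dumps(weights))
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Two hypothetical datasets, each with its own folder.
    pets_file = save_weights(Path(tmp) / "pets", "stage-1", {"w": [0.1, 0.2]})
    digits_file = save_weights(Path(tmp) / "digits", "stage-1", {"w": [0.3]})

    # Same checkpoint name, but the files live under different datasets.
    saved_ok = pets_file.exists() and digits_file.exists() and pets_file != digits_file
    relative = pets_file.relative_to(tmp).as_posix()
    print(relative, saved_ok)
```

Because the models folder hangs off each dataset's own path, reusing a name across datasets does not overwrite anything.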

Let's see what comes out, as this is the other thing you need to get really good at. To see what comes out we can use this class, ClassificationInterpretation, and we're going to use this factory method, from_learner. So we pass in a learn object. Remember, a learn object knows two things: what's your data, and what is your model? It's now not just an architecture, but actually a trained model inside there, and that's all the information we need to interpret that model. So we just pass in the learner, and we now have a ClassificationInterpretation object.

One of the things we can do, and perhaps the most useful thing to do, is called plot_top_losses. We're going to be learning a lot about this idea of loss functions shortly, but in short a loss function is something that tells you how good your prediction was. Specifically, that means if you predicted one class of cat with great confidence, if you said I am very, very sure that this is a Birman, but actually you were wrong, then that's going to have a high loss, because you were very confident about the wrong answer. So that's what it basically means to have a high loss. By plotting the top losses, we're going to find out what were the things that we were the most wrong on, or the most confident about what we got wrong.
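A small numeric sketch of "confident and wrong means high loss", in plain Python. It assumes cross-entropy, a common loss for classification; the lecture has not yet said which loss is in use, and the probabilities below are invented for illustration.

```python
import math


def cross_entropy(prob_of_actual_class):
    """Loss for one example: negative log of the probability given to the true class."""
    if not 0.0 < prob_of_actual_class <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(prob_of_actual_class)


# Hypothetical probabilities the model assigned to the *true* breed.
confident_right = cross_entropy(0.95)  # nearly all the probability on the right answer
unsure = cross_entropy(0.40)           # hedged between several breeds
confident_wrong = cross_entropy(0.01)  # almost everything on some other breed

print(f"{confident_right:.3f} {unsure:.3f} {confident_wrong:.3f}")
```

The less probability the model leaves for the true class, the larger the loss, so the examples at the top of a loss ranking are the confidently wrong ones.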

So you can see here it prints out four things: German Shorthaired, Beagle, 7.04, 0.92. Well, what do they mean? Perhaps we should look at the documentation. We've already seen help, and help just prints out a quick little summary, but if you want to really see how to do something, use doc. doc tells you the same information as help, but it has this very important thing, which is Show in docs. When you click on Show in docs, it pops up the documentation for that method or class or function or whatever. It starts out by showing us the same information about what parameters it takes, along with the docstring, but then tells you more information. So in this case it tells me that the title of each shows the prediction, the actual, the loss, and the probability that was predicted. And you can see there's actually some code you can run, so the documentation always has working code. In this case it was trying things with handwritten digits, and the first one was predicted to be a 7.

It was actually a 3, the loss is 5.44, and the probability of the actual class was 0.07. So we did not have a high probability associated with the actual class. I can see why it thought this was a 7; nonetheless it was wrong. So this is the documentation, and this is your friend when you're trying to figure out how to use these things. The other thing I'll mention is, if you're a somewhat experienced Python programmer, you'll find the source code of fastai really easy to read. We try to write everything in just a small number of lines, much less than half a screen of code, generally four or five lines of code. If you click Source, you can jump straight to the source code.
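The ranking behind a top-losses view can be sketched in plain Python: sort by loss, keep the first few, and build a title from the four fields just described. The first record uses the numbers read out above; the other records, the helper names, and the exact title layout are assumptions for illustration rather than the library's code.

```python
records = [
    {"predicted": "4", "actual": "9", "loss": 1.20, "prob": 0.30},  # hypothetical
    {"predicted": "7", "actual": "3", "loss": 5.44, "prob": 0.07},  # numbers from the lecture
    {"predicted": "1", "actual": "1", "loss": 0.02, "prob": 0.98},  # hypothetical, correct
    {"predicted": "5", "actual": "6", "loss": 3.10, "prob": 0.05},  # hypothetical
]


def top_losses(items, k):
    """Return the k items with the largest loss, worst first."""
    return sorted(items, key=lambda r: r["loss"], reverse=True)[:k]


def title(r):
    """Prediction / actual / loss / probability of the actual class."""
    return f'{r["predicted"]}/{r["actual"]} / {r["loss"]:.2f} / {r["prob"]:.2f}'


titles = [title(r) for r in top_losses(records, k=2)]
for t in titles:
    print(t)
```

The correctly classified example never shows up near the top, because its loss is tiny; the view is dominated by confident mistakes.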

All right, so here is plot_top_losses, and this is also a great way to find out how to use the fastai library, because every line of code here, nearly every line of code, is calling stuff in the fastai library. Okay, so don't be afraid to look at the source code. I've got another really cool trick about the documentation that you're going to see a little bit later. Okay. So that's how we can look at these top losses, and these are perhaps the most important image classification interpretation tool that we have, because it lets us see what we are getting wrong. And quite often, like in this case, if you're a dog and cat expert, you'll realize that the things it's getting wrong are breeds that are actually very difficult to tell apart, and you'd be able to look at these and say, oh, I can see why they've got this one wrong. So this is a really useful tool.

Another useful tool, kind of, is to use something called a confusion matrix, which basically shows you, for every actual type of dog or cat, how many times it was predicted to be that dog or cat. But unfortunately, in this case, because it's so accurate, this diagonal basically says, oh, it's pretty much right all the time. And you can see there are some slightly darker ones, like a five here, but it's really hard to read exactly what that combination is. So what I suggest is, if you've got lots of classes, don't use a confusion matrix. Instead, and this is my favorite named function in fastai,

I'm very proud of this: you can call most_confused. And most_confused will simply grab out of the confusion matrix the particular combinations of predicted and actual that it got wrong the most often. So in this case, the Staffordshire Bull Terrier was what it should have predicted, and instead it predicted an American Pit Bull Terrier, and so forth. It should have predicted a Siamese and actually predicted a Birman; that happened four times. This particular combination happened six times. So this is again a very useful thing, because you can look and say, with my domain expertise, does it make sense that that would be something it was confused about?
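The idea behind pulling the worst pairs out of a confusion matrix can be sketched in a few lines of plain Python. This is not the fastai implementation; the function below only borrows the name, and the breed labels and counts are made up for illustration.

```python
from collections import Counter

def most_confused(actuals, predictions, min_val=1):
    """Return (actual, predicted, count) for wrong predictions, most frequent first."""
    mistakes = Counter(
        (actual, predicted)
        for actual, predicted in zip(actuals, predictions)
        if actual != predicted
    )
    return [(a, p, n) for (a, p), n in mistakes.most_common() if n >= min_val]

# Made-up validation results, purely for illustration.
actuals = ["staffy"] * 4 + ["siamese"] * 3 + ["ragdoll"] * 2
predictions = (["pit_bull"] * 3 + ["staffy"]      # three staffies called pit bulls
               + ["birman"] * 2 + ["siamese"]     # two siamese called birmans
               + ["ragdoll", "birman"])           # one ragdoll called a birman

print(most_confused(actuals, predictions, min_val=2))
```

With many classes, a short sorted list like this is much easier to read than a large grid of mostly-zero cells.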

So these are some of the kinds of tools you can use to look at the output. Let's make our model better. So how do we make the model better? We can make it better using fine-tuning. So far we fitted four epochs and it ran pretty quickly. And the reason it ran pretty quickly is that there was a little trick we used. These deep learning models, these convolutional networks, have many layers. We'll learn a lot about exactly what layers are, but for now just know it goes through computation after computation after computation. What we did was we added a few extra layers to the end, and we only trained those. We basically left most of the model exactly as it was, so that's really fast. And if we're trying to build a model of something that's similar to the original pre-trained model, so in this case similar to the ImageNet data, that works pretty well. But what we really want to do is actually go back and train the whole model. So this is why we pretty much always use this two-stage process. By default, when we call fit or fit_one_cycle on a ConvLearner, it'll just fine-tune these few extra layers added to the end, and it will run very fast.

It'll basically never overfit. But to really get it good, you have to call unfreeze, and unfreeze is the thing that says, please train the whole model. And then I can call fit_one_cycle again, and, oh, the error got much worse. Okay, why? In order to understand why, we're actually going to have to learn more about exactly what's going on behind the scenes.

So let's start out by trying to get an intuitive understanding of what's going on behind the scenes, and again, we're going to do it by looking at pictures. We're going to start with this picture. These pictures come from a fantastic paper by Matt Zeiler, who nowadays is CEO of Clarifai, which is a very successful computer vision startup, and his PhD supervisor, Rob Fergus. They created a paper showing how you can visualize the layers of a convolutional neural network. We'll learn mathematically about what the layers are shortly, but the basic idea is that your red, green, and blue pixel values, which are numbers from 0 to 255, go into a simple computation, the first layer, and something comes out of that. Then the result of that goes into a second layer, that goes into a third layer, and so forth. There can be up to a thousand layers in a neural network. ResNet-34 has 34 layers; ResNet-50 has 50 layers. But let's look at layer one.

There's this very simple computation. It's a convolution, if you know what they are; we'll learn more about them shortly. What comes out of this first layer? Well, we can actually visualize these specific coefficients, the specific parameters, by drawing them as a picture. There are actually a few dozen of them in the first layer, so we won't draw all of them, but let's just look at nine at random.

So here are nine examples of the actual coefficients from the first layer. These operate on groups of pixels that are next to each other. So this first one basically finds groups of pixels that have a little diagonal line in this direction. This one finds diagonal lines in the other direction. This one finds gradients that go from yellow to blue in this direction. This one finds gradients that go from pink to green in this direction, and so forth. They're very, very simple little filters. That's layer one of an ImageNet pre-trained convolutional neural net.

Layer two takes the results of those filters and does a second layer of computation. So here are nine examples of a way of visualizing one of the second-layer features, and you can see it's basically learned to create something that looks for corners, top-left corners. This one has learned to find right-hand curves. This one has learned to find little circles. So this is the easiest way to see it: in layer one we have things that can find just one line; in layer two we can find things that have two lines joined up, or one line repeated. If you then look over here, these nine show you nine examples of actual bits of actual photos that activated this filter a lot. In other words, this little bit of math function here was good at finding these kinds of window corners and stuff like that. This little circly one was very good at finding bits of photos that had circles.

Okay, so this is the kind of stuff you've got to get a really good intuitive understanding for. The start of my neural net is going to find very simple gradients and lines. The second layer can find very simple shapes. The third layer can find combinations of those. So now we can find repeating patterns of two-dimensional objects, or we can find things like lines that join together, or we can find... well, what are these things?

Well, let's find out. What is this? Let's go and have a look at some bits of picture that activated this one highly. Oh, mainly they're bits of text, although sometimes windows, so it seems to be able to find repeated horizontal patterns. And this one here seems to find edges of fluffy or flowery things. This one here is kind of finding geometric patterns. So layer 3 was able to take all the stuff from layer 2 and combine it together. Layer 4 can take all the stuff from layer 3 and combine it together. By layer 4 we've got something that can find dog faces, and let's see what else we've got here... yeah, various kinds of, oh, here we are, bird legs. So you kind of get the idea. And so by layer 5 we've got something that can find the eyeballs of birds and lizards, or faces of particular breeds of dogs, and so forth.
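To make the layer-one idea concrete, here is a minimal sketch of a single hand-written filter that responds to diagonal lines in one direction. The 3x3 kernel values and the 5x5 test images are made up for illustration (a trained network learns its own coefficients), and, as is usual in deep learning, the "convolution" here is the sliding multiply-and-sum without flipping the kernel.

```python
def correlate2d(image, kernel):
    """Slide the kernel over the image and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def max_response(image, kernel):
    """Strongest activation of the filter anywhere in the image."""
    return max(max(row) for row in correlate2d(image, kernel))

# Hypothetical filter: positive along the top-left to bottom-right diagonal.
diagonal_filter = [
    [ 2, -1, -1],
    [-1,  2, -1],
    [-1, -1,  2],
]

size = 5
down_right = [[1 if r == c else 0 for c in range(size)] for r in range(size)]
down_left = [[1 if r + c == size - 1 else 0 for c in range(size)] for r in range(size)]

print(max_response(down_right, diagonal_filter))  # line matches the filter's direction
print(max_response(down_left, diagonal_filter))   # line runs the other way
```

The filter fires strongly on the line that matches its direction and only weakly on the opposite diagonal, which is the behaviour the first-layer pictures are showing.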

So you can see how, by the time you get to layer 34, you can find specific dog breeds and cat breeds, right? This is kind of how it works. So when we first fine-tuned that pre-trained model, we kept all of these layers that you've seen so far, and we just trained a few more layers on top of all of those sophisticated features that are already being created.

All right, and so now we're fine-tuning; we're going back and saying, let's change all of these. We'll start with them where they are, right, but let's see if we can make them better. Now, it seems very unlikely that we can make these layer-one features better. It's very unlikely that the definition of a diagonal line is going to be different when we look at dog and cat breeds versus the ImageNet data that this was originally trained on. So we don't really want to change layer one very much, if at all. Whereas the last layers, you know, this thing of types of dog face, it seems very likely that we do want to change that, right?

So you kind of want this intuition, this understanding, that the different layers of a neural network represent different levels of semantic complexity. So this is why our attempt to fine-tune this model didn't work: by default, it trains all the layers at the same speed, which is to say it'll update those things representing diagonal lines or gradients just as much as it tries to update the things that represent the exact specifics of what an eyeball looks like. So we have to change that. And to change it, we first of all need to go back to where we were before.

Okay, we just broke this model, right? It's much worse than it started out. So if we just go load, this brings back the model that we saved earlier. Remember, we saved it as stage-1. Okay, so let's go ahead and load that back up. So now our model's back to where it was before we killed it, and let's run the learning rate finder.

We'll learn about what that is next week, but for now just know this is the thing that figures out what is the fastest I can train this neural network at without making it zip off the rails and get blown apart. Okay, so we can call learn.lr_find, and then we can go learn.recorder.plot, and that will plot the result of our LR finder. And what this basically shows you is this key parameter that we're going to learn all about, called the learning rate, and the learning rate basically says, how quickly am I updating the parameters in my model?

And you can see what happens: this bottom axis here shows me what happens as I increase the learning rate, and this one here shows what the result is, what the loss is. And so you can see, once the learning rate gets past ten to the negative four, my loss gets worse.
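The general idea of a learning rate range test can be sketched on a toy problem. This is an assumption-laden illustration, not the library's implementation: it minimizes a one-parameter quadratic with plain gradient descent, multiplies the learning rate by a constant factor every step, records the loss, and stops once the loss has clearly blown up. The function name and defaults are made up.

```python
def lr_range_test(start_lr=1e-3, end_lr=10.0, steps=100):
    """Toy range test on loss(w) = 0.5 * w**2, whose gradient is simply w."""
    w = 5.0
    mult = (end_lr / start_lr) ** (1 / (steps - 1))  # constant growth factor per step
    lr = start_lr
    lrs, losses = [], []
    best = float("inf")
    for _ in range(steps):
        loss = 0.5 * w * w
        lrs.append(lr)
        losses.append(loss)
        if loss > 4 * best:      # loss has clearly diverged, stop early
            break
        best = min(best, loss)
        w -= lr * w              # one gradient descent step
        lr *= mult               # raise the learning rate for the next step
    return lrs, losses

lrs, losses = lr_range_test()
lowest = losses.index(min(losses))
print(f"loss bottomed out near lr={lrs[lowest]:.3g}, blew up by lr={lrs[-1]:.3g}")
```

Plotting the recorded losses against the recorded rates (on a log axis) gives the kind of curve described above; the rule of thumb from the lesson is to pick a rate well before the point where the curve turns upward.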

As it happens, I can check this: if I press Shift+Tab here, my learning rate defaults to 0.003. So my default learning rate is about here, so you can see why our loss got worse, right? Because we're trying to fine-tune things now, we can't use such a high learning rate. So based on the learning rate finder, I tried to pick something well before it started getting worse. So I decided to pick 1e-6.

So I decided I'm going to train at that rate. But there's no point training all the layers at that rate, because we know that the later layers worked just fine before, when we were training much more quickly, again at the default, which was, to remind us, 0.003. So what we can actually do is pass a range of learning rates to learn.fit, and we do it like this. You use this keyword, which in fact in Python you may have come across before; it's called slice, and that can take a start value and a stop value. And basically what this says is, train the very first layers at a learning rate of 1e-6, and the very last layers at a rate of 1e-4, and then distribute all the other layers across that, you know, between those two values equally. So we're going to see that in a lot more detail.

Basically, for now, this is a good rule of thumb: after you unfreeze (so this is the thing that's going to train the whole thing), pass a max learning rate parameter, and pass it a slice. Make the second part of that slice about ten times smaller than your first stage. So our first stage defaulted to about 1e-3, so let's use about 1e-4. And then this one should be a value from your learning rate finder which is well before things started getting worse. And you can see things are starting to get worse maybe about here.

So I picked something that's at least ten times smaller than that. So if I do that, then I get 0.05788. I don't quite remember what we got before... yeah, a bit better, right? So we've gone down from a 6.1 percent error to a 5.7 percent error. So that's a relative improvement of roughly five to seven percent, with another 58 seconds of training.

So I would perhaps say, for most people, most of the time, these two stages are enough to get pretty much a world-class model. You won't win a Kaggle competition, particularly because now a lot of fast.ai alumni are competing on Kaggle and this is the first thing that they do. But in practice, you'll get something that's about as good as the vast majority of practitioners can do. We can improve it by using more layers, and we'll do this next week, by basically using a ResNet-50 instead of a ResNet-34. And you can try running this during the week if you want to. You'll see it's exactly the same as before, but I'm using ResNet-50 instead of ResNet-34. What you'll find is, it's very likely if you try to do this, you will get an error, and the error will be that your GPU has run out of memory. The reason for that is that ResNet-50 is bigger than ResNet-34, and therefore it has more parameters, and therefore it uses more of your graphics card's memory. That's totally separate to your normal computer RAM.

This is GPU RAM. If you're using the kind of default Salamander, AWS, and so forth suggestion, then you'll have 16 gig of GPU memory. The card I use most of the time has 11 gig of GPU memory. The cheaper ones have 8 gig of GPU memory. That's kind of the main range you tend to get. If yours has less than 8 gig of GPU memory, it's going to be frustrating for you. Anyway, you'll be somewhere around there, and it's very likely that when you try to run this, you'll get an out-of-memory error. And that's because it's just trying to do too much, too many parameter updates, for the amount of RAM you have. And that's easily fixed. This ImageDataBunch constructor has a parameter at the end, batch size, bs for batch size, and this basically says how many images you train at one time.

If you run out of memory, just make it smaller. Okay, so this worked for me on an 11 gig card. It probably won't work for you if you've got an 8 gig card; if it doesn't, just make that 32. It's fine to use a smaller batch size; it just might take a little bit longer. That's all.
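The "just make it smaller" advice can be written down as a small retry loop. This is only a sketch: the training function is a hypothetical stand-in that pretends any batch size above 32 does not fit, and it raises Python's built-in MemoryError for simplicity (on a real GPU the out-of-memory condition is reported by the deep learning framework, typically as a runtime error, so the except clause would need adjusting).

```python
def fit_with_fallback(train_one_run, bs=64, min_bs=8):
    """Try training; on an out-of-memory error, halve the batch size and retry."""
    while bs >= min_bs:
        try:
            return bs, train_one_run(bs)
        except MemoryError:
            bs //= 2  # smaller batches need less memory, at the cost of some speed
    raise MemoryError(f"still out of memory at batch size {min_bs}")

def fake_training_run(bs):
    """Hypothetical stand-in for real training: anything above 32 'does not fit'."""
    if bs > 32:
        raise MemoryError("pretend GPU out of memory")
    return f"trained with bs={bs}"

used_bs, result = fit_with_fallback(fake_training_run, bs=64)
print(used_bs, result)
```

In a notebook you would normally just edit the bs value by hand and rerun the cell; the loop only makes the trial-and-error explicit.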

If you've got a bigger one, like a 16 gig, you might be able to get away with 64. Okay, so that's just one number you'll need to try during the week. And again, we fit it for a while, and we get down to a 4.4 percent error rate. So this is pretty extraordinary.

You know, I was pretty surprised, because when we first did, in the first course, just cats versus dogs, we were getting somewhere around a three percent error for something where you've got a 50 percent chance of being right and the two things look totally different. So the fact that we can get a 4.4 percent error for such a fine-grained thing is quite extraordinary. In this case I unfroze and fitted a little bit more, and went from 4.4 to 4.35: a tiny improvement. Basically, ResNet-50 is already a pretty good model. It's interesting, because again you can call most_confused here, and you can see the kinds of things that it's getting wrong. And depending on when you run it, you're going to get slightly different numbers, but you'll get roughly the same kinds of things.

So quite often I find that Ragdoll and Birman are things that it gets confused. And I actually had never heard of either of those things, so I looked them up on the internet, and I found a page on a cat site called Is this a Birman or Ragdoll, and there is a long thread of cat experts arguing intensely about which it is. So I feel fine that my computer had problems.

I found something similar, I think, with this pit bull versus Staffordshire bull terrier. Apparently the main difference is the particular kennel club guidelines as to how they are assessed, but some people think that one of them might have a slightly redder nose. So this is the kind of stuff where, actually, even if you're not a domain expert, it helps you become one, right? Because I now know more about which kinds of pet breeds are hard to identify than I used to. So model interpretation works both ways.

So what I want you to do this week is to run this notebook, you know, make sure you can get through it. But then what I really want you to do is to get your own image dataset. And actually Francisco, who I mentioned earlier, he started the language model thread and he's now helping to TA the course, he's actually putting together a guide that will show you how to download data from Google Images, so you can create your own dataset to play with. But before I do, I want to... okay, I'll come back to that in a moment. Before I do, I want to show you how to create labels in lots of different ways, because your dataset, wherever you get it from, won't necessarily suit that kind of regex-based approach.

It could be in lots of different formats. So, just to show you how to do this, I'm going to use the MNIST sample. MNIST is pictures of hand-drawn numbers. That's just because I want to show you different ways of creating these datasets. The MNIST sample basically looks like this. So I can go path.ls, and you can see it's got a training set and a validation set already. So basically the people that put together this dataset have already decided what they want you to use as a validation set. Okay, so if we go path slash train, you'll see there's a folder called 3 and a folder called 7. That's a really, really common way to give people labels.

It's basically to say, oh, everything that's a three, I'll put in a folder called 3; everything that's a seven, I'll put in a folder called 7. This is often called an ImageNet-style dataset, because this is how ImageNet is distributed. So if you have something in this format, where the labels are just whatever the folder's called, you can say from_folder, okay, and that will create an ImageDataBunch for you.
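The folder-name convention is simple enough to sketch with the standard library alone. This is not what the library does internally, just the labelling rule itself; the file names below are made up.

```python
from pathlib import PurePosixPath

def label_from_folder(path):
    """The label is the name of the folder that directly contains the file."""
    return PurePosixPath(path).parent.name

# Hypothetical file names laid out ImageNet-style: <split>/<label>/<file>.
paths = ["train/3/7463.png", "train/7/21102.png", "valid/3/9932.png"]

labels = [label_from_folder(p) for p in paths]
classes = sorted(set(labels))
print(labels, classes)
```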

And as you can see, 3, 7: it's created the labels just by using the folder names. And as you can see, we can train that, get 99.55 percent accuracy, blah blah blah. Another possibility, and for this MNIST sample I've got both, is that it might come with a CSV file that would look something like this: for each file name, what's its label. Now in this case the labels aren't three or seven; they're zero or one, which is basically, is it a seven or not? All right, so that's another possibility. So if this is how your labels are, you can use from_csv. And if it's called labels.csv, you don't even have to pass in a file name. If it's called anything else, then you can pass in the csv_labels argument. Okay, so that's how you can use a CSV.
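A labels file of this kind is just a two-column table, so reading it by hand is a useful sanity check before handing it to a library. The sketch below uses only the standard library; the column names and rows are made up, and the real file may be laid out differently.

```python
import csv
import io

# Hypothetical contents of a labels file: one row per image, 1 means "is a seven".
csv_text = """name,label
train/3/7463.png,0
train/7/21102.png,1
valid/7/5071.png,1
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
labels = {row["name"]: int(row["label"]) for row in rows}
classes = sorted(set(labels.values()))
print(labels, classes)
```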

Okay, there it is. This is now, is it a seven or not? And then you can call data.classes to see what it found. Another possibility is, as we've seen, you've got paths that look like this. And so in this case, this is the same thing.

These are the folders, right, but I could actually grab the label by using a regular expression, and so here's the regular expression. So we've already seen that approach, and again you can see data.classes has found it. So what if your label is something that's in the file name or path, but it's not just a regular expression?
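As a sketch of the regular-expression approach on paths like these, the pattern below captures the single-digit folder name that sits just before the file name. The exact pattern used in the lesson notebook is not shown in the transcript, so this one is an assumption, as are the example paths.

```python
import re

# Assumed pattern: a one-digit folder, then a numeric file name ending in .png.
pattern = re.compile(r"/(\d)/\d+\.png$")

def label_from_regex(path):
    """Return the captured group, or fail loudly if the path does not match."""
    match = pattern.search(path)
    if match is None:
        raise ValueError(f"no label found in {path!r}")
    return match.group(1)

paths = ["mnist_sample/train/3/7463.png", "mnist_sample/train/7/21102.png"]
labels = [label_from_regex(p) for p in paths]
print(labels)
```

Raising on a non-match is a deliberate choice here: a silently missing label is much harder to track down later than an early error.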

If it's more complex, you can create an arbitrary function that extracts a label from the file name or path, and in that case you would say from_name_func. Another possibility is that you need something even more flexible than that, and so you're going to write some code to create an array of labels. In that case you can just pass it in with from_lists.
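Both of these options come down to ordinary Python: a function that maps a path to a label, and a list built by applying it. A minimal sketch, with a made-up labelling rule and made-up paths:

```python
def label_func(path):
    """Hypothetical rule: anything under a /3/ folder is a three, otherwise a seven."""
    return "3" if "/3/" in path else "7"

paths = [
    "mnist_sample/train/3/7463.png",
    "mnist_sample/train/7/21102.png",
    "mnist_sample/valid/3/9932.png",
]

# The "array of labels" option: one label per file, in the same order as the paths.
labels = [label_func(p) for p in paths]
print(list(zip(paths, labels)))
```

The list form is the most flexible, since the labels can come from anywhere (a database, another file, hand annotation) as long as they line up with the file names.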

So here I've created an array of labels, here are my labels, it's from_lists, okay, and then I just pass in that array. So you can see there's lots of different ways of creating labels. So during the week, try this out. Now, you might be wondering, how would you know to do all these things?

Like, where am I going to find this kind of information, right? How do you possibly know how to do all this stuff? So I'll show you something incredibly cool. Let's grab this function, and do you remember, to get documentation we type doc. And here is the documentation for the function, and I can click Show in docs, and it pops up the documentation. So here's the thing: every single line of code I just showed you, I took it this morning and I copied and pasted it from the documentation. So you can see here the exact code that I just used. So the documentation for fastai doesn't just tell you what to do, but step by step how to do it. And here is perhaps the coolest bit: if you go to fastai/fastai_docs and click on docs_src, it turns out that all of our documentation is actually just Jupyter notebooks.

So in this case I was looking at vision.data, so here is the vision.data notebook. You can download this repo, you can git clone it, and if you run it, you can actually run every single line of the documentation yourself. Okay, so all of our docs is also code, and so this is kind of the ultimate example to me of experimenting, right? You'll see that in GitHub it doesn't quite render properly, because GitHub doesn't quite know how to render notebooks properly, but if you git clone this and open it up in Jupyter, you can see it. And so really everything in the documentation has actual working examples in it, with actual datasets that are already sitting there in the repo for you. And so you can actually try every single function in your browser, try seeing what goes in, and try seeing what comes out. There's a question: will the library use multi-GPU and parallelism by default?

The library will use multiple CPUs by default, but just one GPU by default. We probably won't be looking at multi-GPU until part two. It's easy to do and you'll find it on the forum, but most people won't be needing to use that now. And the second question is whether the library can use 3D data such as MRI or... Yes, it can, and there is actually a forum thread about that already, although that's not as developed as 2D yet, but maybe by the time the MOOC is out it will be.

So before I wrap up, I'll just show you an example of the kind of interesting stuff that you can do by doing this kind of exercise. Remember earlier I mentioned that one of our alums, who works at Splunk, which is a NASDAQ-listed, big, successful company, created this new anti-fraud software. This is actually how he created it, as part of a fast.ai part one class project. He took the telemetry of users who had Splunk analytics installed and watched their mouse movements, and he created pictures of the mouse movements.

He converted speed into color, and right and left clicks into splodges. He then took the exact code that we saw, with an earlier version of the software, and trained a CNN in exactly the way we saw, and used that to train his fraud model. So he basically took something which is not obviously a picture and he turned it into a picture, and got these fantastically good results for a piece of fraud analysis software.

So it pays to think creatively. If you're wanting to study sounds, a lot of people that study sounds do it by actually creating a spectrogram image and then sticking that into a convnet. So there's a lot of cool stuff you can do with this.

So during the week, yeah, get your GPU going, try and use your first notebook, make sure that you can use lesson one and work through it, and then see if you can repeat the process on your own dataset. Get on the forum and tell us any little success you had, like, oh, I spent three days trying to get my GPU running and I finally did. And any constraints that you hit: you know, try it for an hour or two, but if you get stuck, please ask. And if you're able to successfully build a model with a new dataset, let us know. And I will see you next week. *Laughter*