Lesson 1: Deep Learning 2019 - Image classification
00:00:06.200 |
Practical Deep Learning for Coders, lesson one. 00:00:11.560 |
There's a lesson zero, and lesson zero is: why do you need a GPU, and how do you get it set up? If you haven't done that yet, 00:00:20.080 |
then go back and do that, and make sure that you can access a Jupyter notebook. 00:00:28.040 |
And then you're ready to start the real lesson one. So if you're ready, 00:00:32.220 |
you will be able to see something like this and 00:00:37.080 |
In particular, hopefully you have gone to the notebook tutorial. It's at the top, 00:00:42.120 |
right, with a 00 here. As this grows, you'll see more and more files, but we'll keep the notebook tutorial at the top. And 00:00:50.920 |
hopefully you've been able to use your Jupyter notebook to add one and one together and get the expected result. 00:01:01.800 |
And hopefully you've learned these four keyboard shortcuts 00:01:11.840 |
A notebook can have prose in it. It can have pictures in it. It can have 00:01:23.120 |
And most importantly it can have code in it. Okay, so the code is in Python 00:01:29.520 |
How many people have used Python before? So, nearly all of you. That's great. 00:01:35.920 |
If you haven't used Python, that's totally okay. All right? 00:01:40.480 |
It's a pretty easy language to pick up. But if you haven't used Python, 00:01:44.680 |
This will feel a little bit more intimidating because the code that you're seeing will be unfamiliar to you. Yes, Rachel 00:02:00.000 |
I'll edit this bit out. 00:02:04.240 |
So, as I say, there are things like this where there's a difference between the people in the room in person and the MOOC audience. 00:02:10.240 |
This is one of those bits that is really for the MOOC audience, 00:02:13.520 |
not for you. I think this will be the only time like this in the lesson where we've assumed that. 00:02:24.880 |
All right, so yeah, for those of you in the room, or on the fast.ai live stream: 00:02:29.740 |
you can go back after this and make sure that you can get this running using the information on course-v3.fast.ai. 00:02:53.440 |
A Jupyter notebook is a really interesting device for a data scientist, because it kind of lets you 00:02:57.000 |
run interactive experiments, and it gives you not just a 00:03:03.180 |
static piece of information, but something that you can actually interact with. 00:03:17.840 |
Now, here's what we think works well for using these notebooks and this material, and this is based on the last three years of experience 00:03:24.140 |
We've had with the students who have gone through this course 00:03:27.020 |
First of all, it works pretty well just to watch a lesson end-to-end 00:03:33.960 |
Don't try and follow along because it's not really designed to go at a speed where you can follow along 00:03:40.840 |
It's designed to be something where you just take in the information, get a general sense of all of the pieces, how it all fits together. 00:03:55.280 |
Then you can go back through it more slowly, trying things out, making sure that you can do the things that I'm doing, 00:03:59.660 |
and that you can try and extend them to do things in your own way. Okay? So don't worry if things go 00:04:10.280 |
faster than you can do them; that's normal. Also, don't try and stop and understand everything the first time. If you do understand everything the first time, fine; 00:04:20.240 |
but most people don't, particularly as the lessons go on; they get faster and they get more difficult. Okay. 00:04:27.260 |
So at this point we've got our notebooks going we're ready to start doing deep learning 00:04:35.680 |
And so the main thing that hopefully you're going to agree with at the end of this is that you 00:04:40.740 |
can do deep learning, regardless of who you are. And we don't just mean 'do'; we mean 'do' at a very 00:04:47.000 |
high level. I mean world-class, practitioner-level deep learning. 00:04:54.840 |
Your main place to be looking for things is course-v3.fast.ai, 00:05:04.280 |
where you'll find the notebooks and other information. 00:05:09.280 |
You can also access our forums, and on our forums you'll find things like: how do you build a 00:05:18.400 |
deep learning box yourself? And that's something that you can do, you know, later on, once you've kind of got going. 00:05:29.760 |
So why should you listen to me? Well, maybe you shouldn't but I'll try and justify why you should listen to me 00:05:36.320 |
I've been doing stuff with machine learning for over 25 years. I 00:05:42.160 |
started out in management consulting, where actually, initially, I was, I think, McKinsey & Company's first analytical specialist, and went into general consulting. 00:05:54.320 |
Eventually became the president of Kaggle, but actually the thing I'm probably most proud of in my life 00:06:00.160 |
is that I got to be the number one ranked contestant in Kaggle competitions globally. 00:06:08.640 |
Kaggle competitions are pretty practical: can you actually train a predictive model that predicts things? A pretty important aspect of data science. 00:06:16.040 |
I then founded a company called Enlitic, which was the first kind of medical deep learning company. 00:06:24.240 |
nowadays, I'm on the faculty at University of San Francisco and also co-founder with Rachel of fast AI 00:06:33.480 |
I've been doing machine learning throughout that time, and I guess, although I am at USF, at the university, 00:06:40.280 |
I'm not really an academic type. I'm much more interested in using this tool to do useful things. 00:06:48.360 |
Specifically, through fast.ai we are trying to help people use deep learning to do useful things: through software, 00:06:56.800 |
to make deep learning easier to use at a very high level; through education, such as the thing you're watching now; 00:07:03.360 |
through research, which is where we spend a very large amount of our time, which is researching to figure out 00:07:08.980 |
how you can make deep learning easier to use at a very high level, 00:07:12.720 |
which ends up, as you'll see, in the software and the education; and by helping to build a community, 00:07:18.200 |
which is mainly through the forums, so that practitioners can find each other and work together. 00:07:26.080 |
So this lesson practical deep learning for coders is kind of the starting point in this journey 00:07:31.140 |
It contains seven lessons each one's about two hours long 00:07:35.120 |
We're then expecting you to do about eight to ten hours of homework during the week 00:07:39.480 |
So it'll end up being something around 70 or 80 hours of work 00:07:43.840 |
I will say, there is a lot of variation in how much people put into this. 00:07:48.000 |
I know a lot of people who work full-time on fast.ai; 00:07:52.120 |
some folks who do the two parts can spend a whole year doing it really intensively. I know some folks who 00:07:59.700 |
watch the videos on double speed and never do any homework, and come out the end of it with, you know, 00:08:04.440 |
a general sense of what's going on. So there are lots of different ways you can do this. 00:08:10.400 |
But if you follow along with this ten-hours-a-week-or-so approach for the seven weeks, by the end you will be able to build an image classification 00:08:16.080 |
Model on pictures that you choose that will work at a world-class level 00:08:21.640 |
You'll be able to classify text again using whatever data sets you're interested in 00:08:28.080 |
You'll be able to make predictions for kinds of commercial applications, like sales. 00:08:33.340 |
You'll be able to build recommendation systems such as the one used by Netflix 00:08:38.640 |
Not toy examples of any of these, but actually things that can 00:08:42.360 |
come top ten in Kaggle competitions, that can beat everything that's in the academic community; 00:08:47.480 |
very, very high-level versions of these things. So that might surprise you, because, like, you know, the prerequisite here is 00:08:54.340 |
Literally one year of coding and high school math 00:09:01.320 |
But we have thousands of students now who have done this and shown it to be true. 00:09:08.960 |
You will hear a lot of naysayers (less now than a couple of years ago, when we started) telling you that you can't do it, 00:09:14.400 |
or that you shouldn't be doing it, or that deep learning's got all these problems: 00:09:18.720 |
it's not perfect. But these are all things that people claim about 00:09:22.840 |
deep learning which are either pointless or untrue. 00:09:27.640 |
It's not a black box; as you'll see, it's really great for interpreting what's going on. 00:09:34.200 |
It does not need much data for most practical applications. You certainly don't need a PhD 00:09:39.960 |
Rachel has one so it doesn't actually stop you from doing deep learning if you have a PhD 00:09:44.440 |
I certainly don't; I have a philosophy degree and nothing else. 00:09:47.800 |
It can be used very widely, for lots of different applications, not just for vision, which is where it's most well known. 00:09:58.100 |
You don't need lots of hardware: a thirty-six-cents-an-hour server is more than enough to get world-class results for most problems. 00:10:03.920 |
It's true that maybe this is not going to help you to build a sentient brain, but 00:10:12.480 |
for all the people who say deep learning is not interesting because it's not really AI: 00:10:16.560 |
that's not really a conversation that I'm interested in; we're focused on solving actual problems. 00:10:23.960 |
What are you going to be able to do by the end of lesson one? 00:10:26.840 |
Well, this was an example from Nikhil, who's actually in the audience now, because he was in last year's course as well. 00:10:33.880 |
This is an example of something he did, which is: he downloaded 30 images of 00:10:37.840 |
people playing cricket and people playing baseball, ran the code you'll see today, and built a classifier. 00:10:47.600 |
So this is the kind of stuff that you can build: some fun hobby examples like this, 00:10:52.760 |
or you can try stuff, as we'll see, in the workplace that could be of direct commercial value. 00:10:57.920 |
So this is the idea of where we're going to get to by the end of lesson one 00:11:09.200 |
The way we teach is very different to many of the academic courses. So, for those of you who have kind of an engineering or math or computer science background: 00:11:15.800 |
this is very different to the approach where you start with lots and lots of theory, and eventually you get to a postgraduate degree, 00:11:22.280 |
and you finally are at the point where you can build something useful. We're going to learn to build the useful thing today. 00:11:30.040 |
You won't know all the theory. Okay, there will be lots of aspects of what we do that 00:11:34.920 |
You don't know why or how it works. That's okay. You will learn why and how it works over the next seven weeks 00:11:42.480 |
But for now we found that what works really well is to actually get your hands dirty coding 00:11:55.880 |
There is still a lot of artisanship in deep learning. Unfortunately, it's still a situation where people who are good practitioners 00:12:01.560 |
have a really good feel for how to work with the code and how to work with the data, and you can only get that through experience. 00:12:09.840 |
And so the best way to get that feel of how to get good models is to create lots of models, 00:12:18.640 |
study them carefully, and the Jupyter notebook provides a really great way to study them. So, 00:12:27.240 |
let's try getting started. So, to get started, you will open your 00:12:36.400 |
lesson1-pets notebook, and it will pop open looking something like this. And so here it is. So you can 00:12:43.680 |
run a cell in a Jupyter notebook by clicking on it and pressing Run, 00:12:50.880 |
but if you do so everybody will know that you're not a real deep learning practitioner because real deep learning practitioners know the keyboard shortcuts and 00:12:57.440 |
The keyboard shortcut is shift enter given how often you have to run a cell 00:13:04.960 |
rather than going all the way up here, finding it, and clicking it: just Shift+Enter. Okay? So: type, type, type, Shift+Enter; type, 00:13:09.400 |
type, Shift+Enter. Up and down to move around, to pick something to run; Shift+Enter to run it. 00:13:15.840 |
So we're going to go through this quickly and then later on we're going to go back over it more carefully 00:13:22.600 |
So here's the quick version to get a sense of what's going on 00:13:25.040 |
So here we are in lesson one, and these three lines are what we start every notebook with. 00:13:32.660 |
These things starting with percent are special directives to Jupyter notebook itself. They're not Python code. They're called 'magics', 00:13:40.360 |
which is just kind of a cool name. The details of these three directives aren't very important, 00:13:45.000 |
but basically it says: hey, if somebody changes the underlying library code while I'm running this, 00:13:49.960 |
please reload it automatically; and if somebody asks to plot something, then please plot it here in this Jupyter notebook. 00:13:56.640 |
So just put those three lines at the top of everything. 00:13:59.440 |
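For reference, the three lines at the top of the lesson notebooks look like this (these are Jupyter magics, not Python):

    %reload_ext autoreload
    %autoreload 2
    %matplotlib inline

The first two reload changed library code automatically; the last makes plots appear right there in the notebook.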
The next two lines load up the fast AI library 00:14:08.000 |
So it's a little bit confusing: fastai, with no dot, is the name of our software; and fast.ai, with the dot, is the name of our organization. So when you see fastai in the code, that's our 00:14:22.240 |
library. Okay, we'll learn more about it in a moment. 00:14:25.600 |
But for now just realize everything we are going to do is going to be using basically either fast AI 00:14:31.760 |
Or the thing that fast AI sits on top of which is pytorch 00:14:43.480 |
It's a bit newer than TensorFlow. So in a lot of ways, it's more modern than TensorFlow 00:14:53.640 |
It's extremely fast-growing, extremely popular, and we use it because we used to use TensorFlow a couple of years ago, 00:14:59.480 |
and we found we can just do a lot more, a lot more quickly, with PyTorch. 00:15:04.080 |
And then we have this software that sits on top of PyTorch and lets you do 00:15:09.880 |
far, far more things, far, far more easily than you can with PyTorch alone. 00:15:14.200 |
So it's a good combination. We'll be talking a lot about it. But for now, just know that you can use fast AI by doing two things 00:15:21.040 |
importing star from fastai, and then importing star from fastai dot 00:15:27.400 |
something, where something is the application you want. And currently fastai supports four applications: computer vision, 00:15:37.600 |
natural language text, tabular data, and collaborative filtering. And we're going to see lots of examples of all of those during the seven weeks. 00:15:43.680 |
So we're going to be doing some computer vision 00:15:45.680 |
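As a sketch, for a computer vision notebook in fastai v1 those two import lines look like this:

    from fastai import *          # import everything from the core fastai library
    from fastai.vision import *   # import everything for the computer vision application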
At this point if you are a Python software engineer, you are probably 00:15:50.600 |
Feeling sick because you've seen me go import star, which is something that you've all been told to never ever do 00:15:58.320 |
Okay, and there's very good reasons to not use import star in standard production code with most libraries 00:16:06.320 |
But you might have also seen for those of you that have used something like Matlab 00:16:09.680 |
It's kind of the opposite everything's there for you all the time. You don't even have to import things a lot of the time 00:16:14.900 |
It's kind of funny: we've got these two extremes of, like, 'how do I code'. You've got a scientific 00:16:20.760 |
Programming community that has one way and then you've got the software engineering community that has the other 00:16:26.280 |
Both have really good reasons for doing things, and with the fastai library we actually support both approaches. 00:16:33.440 |
Indeed, in a Jupyter notebook, where you want to be able to quickly, interactively try stuff out, 00:16:38.160 |
You don't want to be constantly going back up to the top and importing more stuff and trying to figure out where things are 00:16:43.200 |
You want to be able to use lots of tab complete be you know, very experimental. So import star is great 00:16:49.240 |
Then when you're building stuff in production 00:16:51.880 |
you can do the normal PEP 8 style, you know, proper software engineering practices. So, 00:17:01.200 |
when you see me doing stuff which at your workplace is frowned upon: okay, this is a different style of coding. 00:17:09.240 |
It's not that there are no rules in data science programming; 00:17:11.840 |
it's that the rules are different. Right? When you're training models, 00:17:15.120 |
The most important thing is to be able to interactively experiment quickly. Okay, so you'll see we use a lot of very different 00:17:22.880 |
processes, styles, and stuff to what you're used to, but they're there for a reason, 00:17:28.280 |
And you'll learn about them over time. You can choose to use a similar approach or not. It's entirely up to you 00:17:34.120 |
The other thing to mention is that the fastai library is 00:17:38.080 |
actually designed in a very interesting, modular way, and you'll find over time that when you do use import star, 00:17:45.080 |
There's far less clobbering of things than you might expect 00:17:48.280 |
It's all explicitly designed to allow you to pull in things and use them quickly without having problems 00:17:55.320 |
Okay, so we're going to look at some data and 00:17:59.480 |
There are two main places that we'll be tending to get data from for the course. One is from academic data sets. 00:18:06.880 |
Academic data sets are really important. They're really interesting 00:18:10.960 |
They're things where academics spend a lot of time 00:18:13.440 |
Curating and gathering a data set so that they can show how well different kinds of approaches work with that data 00:18:19.240 |
The idea is they try to design data sets that are 00:18:23.000 |
Challenging in some way and require some kind of breakthrough to do them. Well 00:18:26.640 |
So we're going to be starting with an academic data set called the pet data set 00:18:30.840 |
The other kind of data set will be using during the course is data sets from the Kaggle competitions platform 00:18:37.040 |
Both academic data sets and Kaggle data sets are interesting for us 00:18:41.680 |
Particularly because they provide strong baselines that is to say you want to know if you're doing a good job 00:18:48.120 |
so with Kaggle data sets that have come from a competition you can actually submit your results to Kaggle and see how well would 00:18:55.480 |
you have gone in that competition. And if you can get in about the top 10%, then I'd say you're doing very well. 00:19:04.800 |
Academics write down in papers what the state-of-the-art is, so: how well did they go, using models, on that data set? 00:19:11.160 |
So this is what we're going to do. We're going to try and train 00:19:15.880 |
models that get right up towards the top of Kaggle competitions (preferably actually in the top ten, not just the top 10%), 00:19:22.180 |
or that meet or exceed academic state-of-the-art published results. 00:19:36.840 |
So when you use an academic data set here, there's a link to the paper that it's from. You definitely don't need to read that paper right now, 00:19:41.040 |
But if you're interested in learning more about it, and why it was created, and how it was created all the details are there 00:19:47.240 |
So, in this case, this is a pretty difficult challenge: the pet data set is going to ask us to distinguish between 00:19:54.000 |
37 different categories of dog breed and cat breed. So that's really hard. In fact, 00:20:03.680 |
every course until this one, we've used a different data set, which is one where you just have to decide: is something a dog, or is it a cat? 00:20:09.680 |
So you've got a 50/50 chance right away, right? And dogs and cats look really different. 00:20:14.200 |
Whereas there are lots of dog breeds and cat breeds that look pretty much the same. So why have we changed the data set? 00:20:19.700 |
We've got to the point now where deep learning is so fast and so easy that the dogs versus cats problem 00:20:25.780 |
Which a few years ago was considered extremely difficult 00:20:29.360 |
80% accuracy was the state-of-the-art. It's now too easy 00:20:32.800 |
Our models were basically getting everything right all the time without any tuning 00:20:38.840 |
and so there weren't, you know, really a lot of opportunities for me to show you how to do more sophisticated stuff. 00:20:46.560 |
So this is the first class where we're going to be learning how to do this difficult problem and this kind of thing where you 00:20:52.780 |
Have to distinguish between similar categories is called in the academic context. It's called fine-grained classification 00:20:59.800 |
so we're going to do the fine-grained classification tasks of 00:21:02.400 |
Figuring out a particular kind of pet and so the first thing we have to do is download and extract 00:21:08.620 |
the data that we want. We're going to be using this function called 00:21:12.780 |
untar_data, which will download it automatically, and will untar it automatically. 00:21:18.380 |
AWS has been kind enough to give us lots of space and bandwidth for these data sets so they'll download super quickly for you 00:21:25.960 |
And so the first question then would be: how do I know what untar_data does? 00:21:33.540 |
So you can just type help and you will find out 00:21:36.860 |
what module it came from (because, since we imported star, we don't necessarily know that), 00:21:44.300 |
something you might not have seen before even if you're an experienced programmer is 00:21:47.600 |
what exactly do you pass to it? You're probably used to seeing the names: url, fname, 00:21:54.980 |
dest. What you might not be used to seeing are 00:21:58.060 |
these bits. These bits are types, and if you've used a typed programming language, you'll be used to seeing them, 00:22:04.860 |
but Python programmers are less used to it. But if you think about it, 00:22:08.540 |
you don't actually know how to use a function unless you know what type each thing is that you're providing it. 00:22:15.140 |
So we make sure that we give you that type information 00:22:17.580 |
directly here in the help. So in this case, the url is a string, and the fname is (Union means 'either') 00:22:25.780 |
either a path or a string, and it defaults to nothing; and 00:22:31.580 |
the dest is either a path or a string that defaults to nothing. 00:22:35.580 |
So we'll learn more shortly about how to get more documentation about the details of this 00:22:40.740 |
But for now we can see we don't have to pass in a file name or a destination 00:22:44.780 |
It'll figure them out for us from the URL. And for all the data sets 00:22:49.660 |
we'll be using in the course, we already have constants defined 00:22:52.720 |
for all of them, right, in this URLs class. 00:22:58.740 |
You can see that's where it's going to grab it from. Okay, so it's going to download that to some 00:23:04.580 |
convenient path, and untar it for us, and will then return the path. 00:23:12.020 |
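Roughly, this step looks like the following sketch (URLs.PETS is the constant for this data set in fastai v1):

    help(untar_data)              # prints the signature, its types, and the module it came from
    path = untar_data(URLs.PETS)  # downloads and untars the data set, then returns the path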
Okay. And then, in a Jupyter notebook, it's kind of handy: 00:23:16.060 |
you can just write a variable on its own, right (a semicolon is just an end-of-statement marker in Python, 00:23:23.180 |
so it's the same as doing this): you can write it on its own, and it prints it. You can also say print, 00:23:28.260 |
right, but again, we're trying to do everything fast and interactively, so just write it, and here is the path. 00:23:39.780 |
Since you've already downloaded it, it won't download it again; since you've already untarred it, it won't untar it again. 00:23:44.960 |
So everything's kind of designed to be pretty automatic, pretty easy. There are some things in 00:23:52.940 |
Python that are less convenient for interactive use than they should be. For example, when you do have a path object, 00:23:58.540 |
seeing what's in it actually takes a lot more typing than I would like. So sometimes we add 00:24:03.960 |
functionality into existing Python stuff. One of the things we do is add an ls() method to paths. So if you go path.ls(), 00:24:14.420 |
you can see what's inside this path. So that's what we just downloaded. So when you try this yourself, 00:24:19.220 |
you wait a couple of minutes for it to download and unzip, and then you can see what's in there. 00:24:28.620 |
You may not be familiar with this approach of using a slash like this. Now, this is really convenient 00:24:34.560 |
functionality that's part of Python 3, from something called pathlib. These are path objects, and path objects are much better to use than strings: 00:24:42.280 |
they let you basically create sub-paths like this, and it doesn't matter if you're on Windows, Linux, or Mac; 00:24:49.020 |
It's always going to work exactly the same way 00:24:51.020 |
So here's a path to the images in that data set 00:24:55.460 |
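A minimal sketch of those two steps, using fastai's ls() and pathlib's slash operator:

    path.ls()                 # list what's inside the downloaded path
    path_img = path/'images'  # build a sub-path; works the same on Windows, Linux, and Mac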
All right. So if you're starting with a brand new data set, trying to do some deep learning on it, 00:25:02.860 |
what do you do? Well, the first thing you would want to do is probably see what's in there. So we found that these are the 00:25:09.300 |
directories that are in there. So what's in this images directory? 00:25:15.140 |
There are a lot of functions in fastai for you; there's one called get_image_files, that will just grab an 00:25:21.100 |
array of all of the image files, based on extension, in a path. 00:25:35.300 |
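So grabbing the file names looks roughly like this:

    fnames = get_image_files(path_img)  # an array of all the image files in that folder
    fnames[:5]                          # peek at the first few file names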
The most common way for image computer vision data sets to get passed around is just one folder with a whole bunch of files in it. 00:25:44.580 |
So how do we get the labels? In machine learning, the labels refer to the thing 00:25:50.340 |
we're trying to predict. And if we just eyeball this, we can immediately see that the labels are 00:25:56.340 |
actually part of the file name. You see that, right? It's kind of like: path, slash, label, underscore, number, dot, 00:26:05.140 |
extension. So we need to somehow get a list of 00:26:09.740 |
These bits of each file name and that will give us our labels 00:26:14.260 |
Because that's all you need to build a deep learning model. You need some pictures so files containing the images and you need some labels 00:26:21.140 |
So in fastai, this is made really easy. There's an 00:26:26.700 |
object called an ImageDataBunch, and an ImageDataBunch represents all of the data you need to build a model. And 00:26:33.780 |
there are basically some factory methods which try to make it really easy for you to create that data bunch: 00:26:41.500 |
we'll talk more about this shortly, but it creates a training set and a validation set, with images and labels, for you. 00:26:47.260 |
Now, in this case, we can see we need to extract the labels from the file names. 00:26:53.500 |
Okay, so we're going to use from_name_re. So, for those of you that use Python: 00:26:58.220 |
you know re is the module in Python that does regular expression things; that's really useful for extracting 00:27:07.460 |
text. Here is the regular expression that will extract the label from this text. Okay. So for those of you who are 00:27:14.440 |
not familiar with regular expressions: super useful tool. 00:27:19.060 |
It'd be very useful to spend some time figuring out how and why that particular regular expression is going to extract the label 00:27:27.660 |
from this text. Okay. So with this factory method, we can basically say: okay, I've got this path containing images; 00:27:34.900 |
this is a list of file names (remember, I got them back here); 00:27:37.500 |
this is the regular expression pattern that is going to be used to extract the label from the file name; 00:27:47.620 |
And then you also need to say what size images do you want to work with? 00:27:52.180 |
So that might seem weird. Why do I need to say what size images? 00:27:56.260 |
I want to work with? Because the images have a size; we can see what size the images are. And I guess, honestly, this is a 00:28:03.860 |
Shortcoming of current deep learning technology, which is that a GPU 00:28:08.680 |
Has to apply the exact same instruction to a whole bunch of things at the same time in order to be fast and 00:28:16.660 |
So if the images are different shapes and sizes, it can't do that 00:28:21.320 |
All right, so we actually have to make all of the images the same shape and size 00:28:27.020 |
In part one of the course, we're always going to be making 00:28:32.140 |
images square shapes. In part two, we'll learn how to use rectangles as well. 00:28:39.180 |
But pretty much everybody, in pretty much all computer vision modeling, nearly all of it, uses this approach of square images. 00:28:46.620 |
And 224 by 224, for reasons we'll learn about, is an extremely common size that most models tend to use, 00:28:57.820 |
so if you use that, you're probably going to get pretty good results most of the time. And this is kind of 00:29:01.980 |
the little bits of artisanship that I want to teach you folks, which is: what generally just works. 00:29:08.680 |
Okay? So if you just use size equals 224, that'll generally just work for most things, most of the time. 00:29:18.500 |
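Putting that together, the data bunch creation looks roughly like this sketch (the regex pattern is the one used for this data set in the lesson notebook):

    np.random.seed(2)             # fix the random validation split so results are repeatable
    pat = r'/([^/]+)_\d+.jpg$'    # capture the label: the text between the last '/' and '_<number>.jpg'
    data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                       ds_tfms=get_transforms(), size=224)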
So this is going to return a DataBunch object, and in fastai, everything you model with is going to be a DataBunch object. 00:29:24.420 |
We're going to learn all about them and what's in them and how do we look at them and so forth? 00:29:35.180 |
We'll learn about this shortly. It'll contain your training data, your validation data, and, optionally, your test data; and for each of those, it contains 00:29:43.700 |
your images and your labels, or your texts and your labels, or your tabular data and your labels, and so forth. 00:29:55.020 |
Something we'll learn more about in a little bit is 00:29:58.140 |
normalization. But generally, in nearly all machine learning tasks, you have to make all of your data 00:30:04.260 |
about the same 'size': specifically, about the same mean and about the same standard deviation. 00:30:09.500 |
So there's a normalize function that we can use to normalize our data bunch in that way. 00:30:22.380 |
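In code, that's one extra call on the data bunch (imagenet_stats holds the channel statistics used with the pre-trained models):

    data = data.normalize(imagenet_stats)  # shift and scale each channel to standard statistics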
What does the function do if the image size is not 224? 00:30:32.540 |
This is what we're going to learn about shortly 00:30:34.540 |
Basically, this thing called transforms is used to do a number of things, and one of the things it does is to make something a fixed size. 00:30:43.100 |
Let's take a look at a few pictures. Here are a few pictures of 00:30:46.580 |
things from my data bunch. So you can see data.show_batch 00:30:50.700 |
can be used to show me some of the contents of my data bunch. 00:30:58.940 |
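For example:

    data.show_batch(rows=3, figsize=(7, 6))  # display a grid of sample images with their labels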
And you can see roughly what's happened: they all seem to have been kind of 00:31:04.100 |
zoomed and cropped in a reasonably nice way. So basically, what it'll do by default is something called center cropping, 00:31:12.540 |
which means it'll kind of grab the middle bit, and it will also 00:31:16.500 |
Resize it so we'll talk more about the detail of this because it turns out to actually be quite important 00:31:21.180 |
But basically a combination of cropping and resizing is used 00:31:25.780 |
Something else we'll learn about is we also use this to do something called data augmentation 00:31:31.060 |
So there's actually some randomization in how much and where it crops and stuff like that 00:31:35.820 |
Okay, but that's the basic idea is some cropping and some resizing 00:31:40.900 |
But often we also do some padding. So there are all kinds of different ways, 00:31:46.180 |
And it depends on data augmentation, which we're going to learn about shortly 00:31:49.460 |
And what does it mean to normalize the images? 00:31:54.100 |
So, normalizing the images: we're going to be learning more about that later in the course. 00:31:59.980 |
But in short, it means that the pixel values (and we're going to be learning more about pixel values) start out 00:32:06.820 |
in the range from 0 to 255. And some pixel values, 00:32:16.060 |
or I should say some channels, because there's red, green, and blue: some channels might tend to be 00:32:20.460 |
really bright, and some might tend to be really not bright at all, and some might vary a lot, and some might not vary much. 00:32:26.900 |
It really helps train a deep learning model if each one of those red green and blue channels has a mean of zero 00:32:33.740 |
And a standard deviation of one. Okay, we'll learn more about that if you 00:32:37.860 |
Haven't studied or don't remember means and standard deviations. We'll get back to some of that later, but that's the basic idea 00:32:44.340 |
That's what normalization does if your data and again, we'll learn much more about the details 00:32:49.380 |
But if your data is not normalized it can be quite difficult for your model to train well 00:32:54.900 |
So if you do have trouble training a model one thing to check is that you've normalized it 00:33:00.260 |
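As a minimal sketch of the arithmetic (the library does this for you), here's what normalizing one image would look like in plain numpy:

    import numpy as np
    img = np.random.rand(3, 224, 224) * 255       # a fake 3-channel image with values 0-255
    mean = img.mean(axis=(1, 2), keepdims=True)   # per-channel mean
    std = img.std(axis=(1, 2), keepdims=True)     # per-channel standard deviation
    normalized = (img - mean) / std               # each channel now has mean 0 and std 1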
Since GPU memory works in powers of two, wouldn't a size of 256 be more practical, considering GPU utilization? 00:33:08.600 |
So, we're going to be getting into that shortly, but the brief answer is that the 00:33:14.820 |
models are designed so that the final layer is of size 7 by 7, 00:33:19.580 |
so we actually want something where if you multiply 7 by 2 a bunch of times, then you end up with that number; that makes it a good size. 00:33:27.220 |
Yeah, all of these details we are going to get to, but the key thing is I wanted to get you started as quickly as possible. 00:33:34.320 |
But you know one of the most important things to be a really good practitioner is to be able to look at your data 00:33:40.380 |
Okay, so it's really important to remember to go data.show_batch and take a look. 00:33:45.420 |
It's surprising how often, when you actually look at the data set you've been given, you realize it's got 00:33:49.960 |
weird black borders on it, or some of the things have text covering up part of them, or some of it's rotated in odd ways. 00:33:57.820 |
And then the other thing we're going to do is not just look at the pictures, 00:34:03.660 |
but also look at the labels. All of the possible 00:34:10.220 |
labels are called your classes. With the data bunch, you can print out your data.classes, 00:34:15.580 |
and so here they are. That's all of the possible labels that we found by using that regular expression on the file names. 00:34:22.540 |
And we learned earlier on, in that prose right at the top, that there are 37 00:34:27.200 |
possible categories. And so, just checking len(data.classes): it is indeed 37. 00:34:32.660 |
A data bunch will always have a property called C 00:34:36.480 |
And that property called C the technical details will kind of get to later 00:34:41.920 |
But for now you can kind of think of it as being the number of classes 00:34:45.100 |
For things like regression problems and multi-label classification and stuff, that's not exactly accurate; 00:34:53.980 |
but it's important to know that data.c is a really 00:34:57.940 |
important piece of information, and it is something like, or at least for classification problems it is, the number of classes. 00:35:04.860 |
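So inspecting the labels looks like this:

    data.classes        # the 37 possible labels the regular expression found
    len(data.classes)   # 37
    data.c              # also 37: for classification, think of it as the number of classes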
Right. Believe it or not, we're now ready to train a model. And 00:35:11.260 |
So a model is trained in fast AI using something called a learner 00:35:19.380 |
And just like a data bunch is a general fast AI concept for your data 00:35:24.520 |
And from there there are subclasses for particular applications 00:35:32.140 |
a Learner is a general concept for things that can learn 00:35:36.140 |
to fit a model. And from that, there are various subclasses to make things easier, and in particular 00:35:41.720 |
there's one called a conv learner, which is something that will create a convolutional neural network for you. 00:35:47.380 |
We'll be learning a lot about that over the next few lessons 00:35:50.460 |
But for now, just know that to create a learner for a convolutional neural network, you just have to tell it two things. The first is: 00:35:59.440 |
what's your data? And, not surprisingly, it takes a data bunch. And the second thing you need to tell it is: what's your architecture? 00:36:10.820 |
So, as we'll learn, there are lots of different ways of constructing a convolutional neural network, 00:36:17.020 |
but for now, the most important thing for you to know is that there's a particular kind of model called a ResNet, 00:36:25.780 |
which works well nearly all the time. And so, for a while at least, you really only need to be 00:36:31.940 |
choosing between two things, which is: what size ResNet do you want? 00:36:36.580 |
That's just basically how big is it and we'll learn all about the details of what that means 00:36:41.380 |
But there's one called ResNet34, and there's one called ResNet50. And so when we're getting started with something, 00:36:47.660 |
I'll pick a smaller one because it'll train faster 00:36:51.700 |
That's kind of it. That's as much as you need to know to be a pretty good practitioner about architectures for now, which is that there are two 00:36:58.420 |
Architectures or two variants of one architecture that work pretty well 00:37:02.180 |
Resnet 34 and resnet 50 start with a smaller one and see if it's good enough 00:37:07.180 |
So that is all the information we need to create a convolutional neural network learner 00:37:12.100 |
There's one other thing I'm going to give it though, which is a list of metrics 00:37:16.660 |
Metrics are literally just things that get printed out as it's training 00:37:20.280 |
So I'm saying I would like you to print out the error rate, please 00:37:24.780 |
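So creating the learner is one line; a sketch in fastai v1 syntax (where this factory function was called create_cnn; it was later renamed cnn_learner):

    learn = create_cnn(data, models.resnet34, metrics=error_rate)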
Now, you can see the first time I ran this on a newly installed box, it downloaded something. 00:37:39.980 |
Now what this means is that this particular model has actually already been trained 00:37:45.020 |
For a particular task and that particular task is that it was trained on looking at about one and a half million 00:37:51.260 |
Pictures of all kinds of different things a thousand different categories of things 00:37:55.380 |
using an image data set called ImageNet. And 00:37:59.580 |
So we can download those pre-trained weights so that we don't start with a model that knows nothing about anything 00:38:05.840 |
but we actually start with a model that knows how to recognize the thousand categories of things in ImageNet. 00:38:12.380 |
Now, I'm not sure, but I don't think all of these 37 categories of pet are in ImageNet, 00:38:19.580 |
but there were certainly some kinds of dog, and certainly some kinds of cat. 00:38:23.860 |
So this pre-trained model already knows quite a little bit about what pets look like, and it certainly knows quite a lot about 00:38:31.060 |
what animals look like, and what photos look like. So the idea is that we don't start with a model that knows nothing, 00:38:38.700 |
but we start by downloading a model that knows something about recognizing images already. 00:38:44.420 |
So it downloads for us, automatically, the first time we use it, a pre-trained model; and then from now on, it won't need to download it again. 00:38:54.100 |
This is really important. We're going to learn a lot about this 00:38:57.260 |
It's kind of the focus of the whole course, which is how to do this is called transfer learning 00:39:02.260 |
how to take a model that already knows how to do something pretty well and 00:39:06.860 |
Make it so that it can do your thing really well 00:39:10.080 |
We take a pre-trained model, and then we fit it so that instead of predicting the thousand categories of ImageNet, with the ImageNet 00:39:18.320 |
data, it predicts the 37 categories of pets, using your pet data. 00:39:23.500 |
And it turns out that by doing this you can train models in 00:39:34.020 |
one hundredth or less of the data of regular model training; in fact, potentially many thousands of times less. 00:39:40.820 |
Remember, I showed you the slide of Nikhil's lesson-one project from last year? He used 30 images. 00:39:47.220 |
And there aren't cricket and baseball images in ImageNet. 00:39:51.100 |
Right? But it just turns out that ImageNet's already so good at recognizing things in the world 00:39:55.920 |
that just 30 examples of people playing baseball and cricket was enough to build a nearly perfect classifier. 00:40:11.380 |
How do you know that it can actually recognize pictures of people playing cricket versus baseball in general? 00:40:22.780 |
Maybe it's just cheating, right? And that's called overfitting. We'll be talking a lot about that during this course, right? 00:40:29.340 |
But overfitting is where you don't learn to recognize pictures of say cricket versus baseball 00:40:34.620 |
But just these particular cricketers and these particular photos and these particular baseball players and these particular photos 00:40:40.940 |
We have to make sure that we don't overfit. 00:40:43.660 |
And so the way we do that is using something called a validation set a validation set is a set of images 00:40:50.360 |
That your model does not get to look at and so these metrics 00:40:54.760 |
like, in this case, error rate, get printed out automatically using the validation set: a set of images that our model never got to see. 00:41:05.920 |
When we created our data bunch, it automatically created a validation set for us. 00:41:09.840 |
Okay, and we'll learn lots of ways of creating and using validation sets 00:41:15.240 |
But because we try to bake in all of the best practices we actually make it nearly impossible 00:41:20.840 |
For you not to use a validation set because if you're not using a validation set, you don't know if you're overfitting 00:41:26.240 |
Okay, so we always print out the metrics on a validation set. We always hold it out 00:41:30.640 |
We always make sure that the model doesn't touch it. That's all done for you 00:41:34.560 |
Okay, and that's all built into this data bunch object 00:41:47.540 |
But in practice, you should nearly always use a method called fit_one_cycle. 00:41:52.220 |
We'll learn more about this during the course, but in short, one-cycle learning comes from a paper that was released, 00:42:00.280 |
I'm trying to think, a few months ago? Less than a year ago, 00:42:06.360 |
and it turned out to be dramatically better, both more accurate and faster, than any previous approach. 00:42:12.080 |
So again, I don't want to teach you how to do 2017 deep learning, right? In 2018, 00:42:19.660 |
the best way to fit models is to use something called one cycle. We'll learn all about it; 00:42:24.020 |
but for now, just know you should probably type learn.fit_one_cycle, right? 00:42:28.720 |
If you forget how to type it you can start typing a few letters and hit tab 00:42:34.620 |
Okay, and you'll get a list of potential options 00:42:38.420 |
All right, and then if you forget what to pass it you can press shift tab 00:42:44.280 |
and it'll show you exactly what to pass, so you don't actually have to type help. And again, 00:42:49.060 |
this is kind of nice, that we have all the types here, because we can see: cycle length 00:42:52.720 |
(we'll learn more about what that is shortly) is an integer; and then max learning rate: 00:42:56.800 |
it could either be a float or a collection or whatever; and you can see that the momentums will default to this couple of numbers; and so forth. 00:43:07.600 |
For now just know that this number four basically decides how many times do we go through the entire data set? 00:43:15.840 |
How many times do we show the data set to the model so that it can learn from it? Each time 00:43:20.360 |
it sees a picture, it's going to get a little bit better. 00:43:25.280 |
But it also means it could overfit: if it sees the same picture too many times, 00:43:29.400 |
it'll just learn to recognize that picture, not pets in general. 00:43:37.160 |
We'll learn all about how to tune this number during the next couple of lessons, 00:43:43.200 |
but starting out with four is a pretty good start, just to see how it goes. And you can actually see, after four epochs, we've got an error rate of about six percent. 00:43:57.840 |
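So the whole training step is one line, roughly:

    learn.fit_one_cycle(4)  # go through the entire data set four times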
So a natural question is: how long did that take? That took a minute and 56 seconds. 00:44:11.780 |
I mean, we actually pay for the whole time that it's on and running, so call it two minutes of compute time, 00:44:19.200 |
and 94 percent of the time we correctly picked the exact right one 00:44:23.700 |
of those 37 dog and cat breeds, which feels pretty good to me. 00:44:29.060 |
But to get a sense of how good it is maybe we should go back and look at the paper 00:44:34.100 |
Just remember I said the nice thing about using academic papers or Kaggle data sets is we can compare 00:44:40.120 |
our solution to whatever the best people in Kaggle did or whatever the 00:44:45.380 |
academics did so this particular data set of pet breeds is from 2012 and 00:44:51.060 |
If I scroll through the paper you'll generally find in any academic paper 00:44:56.300 |
There'll be a section called experiments about two-thirds of the way through and if you find the section on experiments 00:45:02.060 |
Then you can find the section on accuracy, and they've got lots of different models 00:45:07.500 |
And their models as you'll read about in the paper are extremely kind of pet specific 00:45:13.380 |
They learn something about how pet heads look and how pet bodies look and pet images in general look 00:45:18.900 |
They combine them all together and once they use all of this 00:45:21.860 |
complex code and math, they got an accuracy of 59%. 00:45:31.420 |
So this highly pet-specific analysis got an accuracy of 59%, and these were the top researchers from Oxford University. 00:45:44.620 |
Basically, if you go back and look at actually how much code we just wrote it's about three lines of code 00:45:49.700 |
The other stuff is just printing out things to see what we're doing we got 00:45:53.580 |
94%, so 6% error. So that gives you a sense of 00:46:00.020 |
You know how far we've come with deep learning and particularly with pytorch and fast AI how easy things are 00:46:09.460 |
Before we take a break, I just want to check to see if we've got any questions. 00:46:12.860 |
And just remember if you're in the audience and you see a question that you want asked 00:46:18.580 |
Please click the love heart next to it so that Rachel knows that you want to hear about it also 00:46:23.820 |
if there is something with six likes and Rachel didn't notice it (which is quite possible), just quote it in a reply and say: 00:46:31.900 |
hey, @Rachel, this one's got six likes. Okay, so what we're going to do is we're going to take an 00:46:38.260 |
eight-minute break, so we'll come back at five past eight. 00:46:42.900 |
So where we got to was we just we just trained a model 00:46:48.620 |
We don't exactly know what that involved or how it happened 00:46:51.340 |
but we do know that with three or four lines of code we built something which 00:46:56.540 |
smashed the accuracy of the state-of-the-art of 00:46:59.780 |
2012. 6% error certainly sounds pretty impressive for something that can recognize different dog breeds and cat breeds. 00:47:06.500 |
But we don't really know why it works; but we will. That's okay, right? And 00:47:15.300 |
In terms of getting the most out of this course 00:47:21.460 |
we very, very regularly hear, after the course is finished, the same basic feedback, 00:47:26.820 |
and this is literally copied and pasted from the forum: 00:47:30.500 |
I fell into the habit of watching the lectures too much and googling too much about concepts without running the code 00:47:37.660 |
At first I thought I should just read it and then research the theory. 00:47:41.540 |
And we keep hearing people saying: my number one regret is, I just spent 00:47:47.980 |
70 hours doing that, and at the very end I started running the code, and, oh, it turned out I 00:47:59.900 |
should have spent the majority of my time on the actual code in the notebooks: running it, seeing what goes in, and 00:48:06.220 |
seeing what comes out. So your most important skills to practice are (and we're going to show you how to do this, 00:48:13.700 |
in a lot more detail) understanding what goes in, and what comes out. 00:48:19.700 |
So we've already seen an example of looking at what goes in 00:48:23.260 |
which is data.show_batch, and that's going to show you examples of the images and labels. 00:48:31.540 |
So next we're going to be seeing how to look at what came out 00:48:35.060 |
All right, so that's the most important thing to study 00:48:40.740 |
The reason we've been able to do this so quickly is heavily because of the fast AI library now fast AI library is pretty new 00:48:47.620 |
but it's already getting an extraordinary amount of traction. As you've seen, all of the major cloud 00:48:53.260 |
providers either support it or are about to support it, and a lot of researchers are starting to use it. It's 00:49:00.260 |
making a lot of things a lot easier, but it's also making new things possible. And so, 00:49:09.140 |
Really understanding the fast AI software is something which is going to take you a long way 00:49:13.340 |
And the best way to really understand the fastai software well is by using the fastai 00:49:18.340 |
Documentation and we'll be learning more about the fast AI documentation shortly 00:49:23.380 |
So how does it compare I mean there's really only one major other piece of software like fast AI 00:49:31.140 |
That is something that tries to make deep learning 00:49:34.380 |
easy to use, and that's Keras. Keras is a really terrific piece of software; 00:49:39.020 |
We actually used it for the previous courses until we switched to fast AI 00:49:46.820 |
It was kind of the gold standard for making deep learning easy to use before, but life is much easier with fastai. 00:49:53.100 |
So if you look, for example, at the last year's course: 00:50:01.700 |
fastai lets you get much more accurate results, less than half the error, on a validation set; 00:50:13.660 |
the lines of code are about a sixth of the lines of code. And the lines of code are 00:50:19.060 |
More important than you might realize because those 31 lines of Keras code involve you making a lot of decisions 00:50:27.580 |
Setting lots of parameters doing lots of configuration. So that's all stuff where you have to know 00:50:32.700 |
how to set those things to get kind of best-practice results. Whereas with these five lines of code, 00:50:38.160 |
any time we know what to do for you, we do it for you; any time we can pick a good default, we pick it for you. 00:50:48.260 |
So hopefully you'll find it a really great library, not just for learning deep learning, but for taking it a very long way. How far can you take it? 00:50:54.980 |
Well, as you'll see all of the research that we do at fast AI 00:50:58.460 |
uses the library. And an example of the research we did, which was recently featured in Wired, is a 00:51:07.380 |
breakthrough in natural language processing, which people are calling the ImageNet moment of NLP, 00:51:12.980 |
which is basically: we broke the state-of-the-art result in text classification, 00:51:18.420 |
which OpenAI then built on top of our paper, with more compute and more data and some different tasks, to take it even further. 00:51:27.380 |
So this is an example of something that we've done in the last six months, in conjunction actually with my colleague Sebastian Ruder; an 00:51:34.180 |
example of something that's built into the fastai library. And you're going to learn how to use this brand-new model in 00:51:43.420 |
Three lessons time and you're actually going to get this exact result from this exact paper yourself 00:51:56.620 |
Another example: one of our alumni, who you'll come across on the forum plenty because he's a great guy, very active, built a new system for natural language semantic code search, 00:52:06.180 |
where you can actually type in English sentences and find snippets of code that do the thing you asked for. And again, 00:52:13.300 |
it's being built with the fastai library, using the techniques you'll be learning in the next seven weeks. Is it in production? 00:52:18.780 |
Yeah, well, I think at this stage it's part of their experiments platform, so it's kind of pre-production, I guess. 00:52:26.220 |
And so the best place to learn about these things and get involved in these things is on the forums 00:52:34.260 |
where, as well as categories for each part of the course, there's also a general category for deep learning, where people talk about 00:52:41.700 |
Research papers applications, so on and so forth 00:52:49.620 |
We're kind of going to focus on a small number of lines of code to do a particular thing, which is image classification 00:52:55.940 |
and we're not learning much math or theory or whatever. Over these seven weeks, and then in part two, another seven weeks, 00:53:03.420 |
We're going to go deeper and deeper and deeper. And so where can that take you? I want to give you some examples 00:53:08.700 |
This is Sara Hooker. She did our first course a couple of years ago. 00:53:16.780 |
Her background was in economics; she didn't have a background in coding, math, or computer science. I think she started learning to code two years before she took our course. 00:53:24.540 |
She helped develop something at a nonprofit she started, called Delta Analytics: 00:53:32.820 |
They helped build this amazing system where they attached old mobile phones to trees in the Kenyan rainforests and 00:53:43.700 |
and then they used deep learning to figure out when there was a chainsaw being used, and then they had a system set up to 00:53:49.380 |
alert rangers to go out and stop illegal deforestation in the rainforest. 00:53:54.620 |
So that was something that she was doing while she was in the course as part of her kind of class projects 00:54:05.380 |
She then became a Google Brain researcher, which I guess is one of the top, if not the top, places to do deep learning research. 00:54:14.260 |
And now she is going to Africa to set up Google Brain's first deep learning AI research center in Africa. Now, 00:54:22.340 |
I will say, like, she worked her ass off; you know, she really, really invested in this course, 00:54:28.700 |
Not just doing all of the assignments but also going out and reading Ian Goodfellow's book and doing lots of other things 00:54:36.900 |
But this is an example of where somebody who has no computer science or math background at all 00:54:41.620 |
can now be one of the world's top deep learning researchers, doing very valuable work. 00:54:47.900 |
Another example from our most recent course Christine Payne 00:54:58.780 |
And you can find her post, and actually listen to her music samples: she actually built something to 00:55:07.360 |
automatically create chamber music compositions, which you can play and listen to online. 00:55:24.620 |
Now I will say she's not your average classical pianist 00:55:27.520 |
She's a classical pianist who also has a master's in medical research from Stanford, and studied neuroscience, and was a high-performance computing 00:55:34.380 |
expert at D. E. Shaw, and was valedictorian at Princeton. Anyway, she, you know, very annoying person, good at everything she does, 00:55:41.420 |
but, you know, I think it's really cool to see how a kind of a domain expert, in this case the domain of playing piano, 00:55:52.660 |
can come out the other end at, I guess, OpenAI, which would be, 00:55:55.440 |
you know, one of the three top research institutes: Google Brain or OpenAI would be two of them, probably along with DeepMind. 00:56:01.500 |
And interestingly, actually, one of our other students, or I should say alumni, of the course recently interviewed 00:56:09.340 |
her for a blog-post series he's doing on top AI researchers, 00:56:13.940 |
and she said one of the most important pieces of advice she got was from me. And she said the piece of advice was: pick one project, do it really well, and make it fantastic. That was the advice 00:56:27.700 |
she found the most useful, and we're going to be talking a lot about you doing projects and making them fantastic during this course. 00:56:35.560 |
Having said that, I don't really want you to go to OpenAI or Google Brain. 00:56:40.500 |
What I really want you to do is go back to your workplace or your passion project and apply these skills. For example, MIT 00:56:53.180 |
released a deep learning course, and they highlighted in their announcement for this deep learning course this medical imaging example. 00:57:09.140 |
And one of our alumni, Alex, who is a radiologist, said: you guys just showed a model overfitting. I can tell, because I'm a radiologist, and this is not 00:57:21.380 |
what this would look like; this is what it should look like; and, as a deep learning practitioner, this is how I know 00:57:26.180 |
this is what happened in your model. So Alex is combining his knowledge of radiology and his knowledge of deep learning: 00:57:35.900 |
he was able to diagnose the problems in MIT's model from just two images, very accurately. 00:57:40.300 |
And so this is actually what I want most of you to be doing is to take your domain 00:57:44.820 |
expertise and combine it with the deep learning 00:57:47.500 |
Practical aspects that you'll learn in this course and bring them together 00:57:51.140 |
like Alex is doing here and so a lot of radiologists have actually gone through this course now and 00:57:56.620 |
have built journal clubs and American College of Radiology practice groups; 00:58:02.660 |
there's a data science institute at the ACR now, and so forth. And Alex is one of the people who's providing a lot of leadership in this area. 00:58:10.060 |
I would love for you to do the same kind of thing that Alex is doing, which is to really bring 00:58:14.660 |
deep learning leadership into your industry, or into your social impact project, whatever it is that you're trying to do. 00:58:21.940 |
So another great example: this was Melissa Fabros, who was an English literature PhD, 00:58:27.900 |
who had studied, like, gendered language in English literature or something. 00:58:35.220 |
Rachel, in a previous job, taught her to code, I think, and then she came into the fast.ai course, and she helped 00:58:42.780 |
Kiva a micro lending social impact organization to build a system that can recognize 00:58:48.500 |
faces. Why is that necessary? Well, we're going to be talking a lot about this, but because most facial recognition systems 00:59:01.420 |
can only recognize white male faces effectively. In fact, 00:59:06.220 |
I think it was IBM's system that is, like, ninety-nine point eight percent accurate on white men, but 00:59:16.420 |
only 60% accurate, 65% accurate, on dark-skinned women. 00:59:22.060 |
So it's like, what is that, like 30 or 40 times worse 00:59:26.140 |
for black women versus white men? And this is really important, because for Kiva, 00:59:31.260 |
black women are, you know, perhaps the most common user base for their micro-lending platform. 00:59:38.940 |
So Melissa after taking our course and again working her ass off and being super intense in her study and her work 00:59:46.340 |
Won this one million dollar AI challenge for her work for Kiva 00:59:51.180 |
Karthik did our course and realized the thing he wanted to do wasn't at his company 00:59:58.940 |
It was something else which is to help blind people to understand the world around them. So he started a new startup 01:00:03.900 |
You can find it now; it's called Envision. You can download the app, you can point your phone at things, and it will tell you what it sees 01:00:11.420 |
And I actually talked to a blind lady about these kinds of apps the other day, and she confirmed to me 01:00:24.900 |
that this is a genuinely useful one. It's not a toy: the level that you can get to, with 01:00:30.380 |
the content that you're going to get over these seven weeks and with this software, 01:00:33.780 |
can get you right to the cutting edge in areas you might find surprising 01:00:38.220 |
For example, I helped a team of some of our students and some collaborators 01:00:45.060 |
on actually breaking the world record for training ImageNet. Remember, 01:00:50.300 |
I mentioned the ImageNet dataset; lots of people want to train on the ImageNet dataset 01:00:54.020 |
We smashed the world record for how quickly you can train it, using standard 01:00:59.660 |
cloud infrastructure, with a cost of about $40 of compute to train this model, 01:01:04.580 |
using, again, the fast.ai library and the techniques that we learn in this course 01:01:08.620 |
So it can really take you a long way. So don't be kind of put off by 01:01:13.700 |
what might seem pretty simple at first; we're going to get deeper and deeper 01:01:17.380 |
You can also use it for other kinds of passion projects 01:01:21.260 |
So, Helena Sarin: you should definitely check out her Twitter account, @glagolista 01:01:27.180 |
This art is basically a new style of art that she's developed 01:01:32.180 |
which combines her painting and drawing with generative adversarial models to create these extraordinary 01:01:40.340 |
results. And so I think this is super cool. She's not a professional artist; she is a professional software developer 01:01:47.780 |
But she just keeps on producing these beautiful results. Until now, 01:01:56.620 |
her art had not really been shown anywhere or discussed anywhere; now, 01:02:01.380 |
there have recently been some quite high-profile articles describing how she is creating a new form of art. Again, this has come out of the 01:02:11.500 |
course, where she developed these skills. Or, equally important, Brad Kenstler, who figured out how to make a picture of Kanye out of pictures 01:02:19.260 |
of Patrick Stewart's head. Also something you will learn to do if you wish to 01:02:25.100 |
This particular style this particular type of what's called style transfer was a really interesting tweak that allowed him to do some things 01:02:33.780 |
And this particular picture helped him to get a job as a deep learning specialist at AWS. So there you go 01:02:41.500 |
Another interesting example: another alumnus actually worked at Splunk as a software engineer 01:02:50.940 |
He designed an algorithm after, like, lesson 3, which basically turned out at Splunk to be fantastically good at identifying fraud 01:03:01.660 |
If you've seen Silicon Valley the HBO series the the hot dog not hot dog app 01:03:06.180 |
That's actually a real app you can download, and it was actually built by Tim Anglade as a fast.ai student project 01:03:12.680 |
So there's a lot of cool stuff that you can do 01:03:17.420 |
I'm like: yes, it was Emmy nominated. So I think we only have one Emmy-nominated fast.ai alumnus at this stage 01:03:31.340 |
The other thing, you know is is the forum threads can kind of turn into these really cool things 01:03:36.940 |
So Francisco, who's actually here in the audience: he's a really 01:03:39.940 |
boring McKinsey consultant like me; Francisco and I both have this shameful past that we were McKinsey consultants 01:03:49.180 |
He started this thread saying, like: oh, this stuff we've just been learning about, 01:03:53.900 |
building NLP in different languages; let's try and do lots of different languages 01:03:59.100 |
We started this thing called the language model zoo, and out of that there's now been an academic 01:04:05.620 |
competition won in Polish that led to an academic paper; there are 01:04:12.940 |
German state-of-the-art results; basically, students have been coming up with new state-of-the-art results across lots of different languages 01:04:19.500 |
And this is all entirely being done by students working together through the forum. So please, 01:04:29.340 |
don't be intimidated, because remember, a lot of, you know, 01:04:32.780 |
everybody you see on the forum: the vast majority of the posting comes from people who post all the damn time, right? 01:04:38.460 |
They've been doing this a lot and they do it a lot of the time, and so at first it can feel intimidating 01:04:43.580 |
Because it can feel like you're the only new person there 01:04:46.140 |
But you're not: all of you people in the audience, everybody who's watching, everybody who's listening, you're all new people 01:04:52.300 |
too. And so when you just get out there and say, like: 01:04:55.860 |
okay, all you people getting new state-of-the-art results in German language modeling, I 01:05:00.860 |
can't start my server, I try to click the notebook and I get an error, 01:05:08.780 |
okay, just make sure you provide all the information. This is the, you know: I'm using Paperspace, 01:05:13.740 |
this was the particular instance I tried to use, here's a screenshot of my error 01:05:18.020 |
People will help you. Okay, or if you've got something to add so if people were talking about 01:05:23.860 |
Crop yield analysis and you're a farmer and you think you know, oh I've got something to add 01:05:29.540 |
Please mention it, even if you're not sure it's exactly relevant. It's fine, you know, just get involved 01:05:36.840 |
And because remember everybody else from the forum started out 01:05:39.840 |
Also intimidated. All right, we all start out 01:05:43.560 |
Not knowing things and so just get out there and try it 01:05:58.560 |
There's a question from earlier about why you're using resnet as opposed to inception 01:06:09.360 |
So there are lots of architectures to choose from 01:06:26.160 |
For ImageNet classification, you'll see in first place, in second place, in third place and in fourth place fast.ai, 01:06:34.480 |
Jeremy Howard; fast.ai, Jeremy Howard; fast.ai with collaborators from the Department of Defense; 01:06:40.160 |
Google. ResNet, ResNet, ResNet, ResNet, ResNet. It's good enough 01:06:47.800 |
There are other architectures the main reason you might want a different architecture is if you want to do edge computing 01:06:57.880 |
So if you want to create a model that's going to sit on somebody's mobile phone 01:07:04.360 |
I reckon the best way to get a model onto somebody's mobile phone is to run it on your server, and 01:07:11.040 |
have the phone talk to it. It really makes life a lot easier and you get a lot more flexibility 01:07:14.300 |
But if you really do need to run something on a low-powered device, then there are some special architectures for that 01:07:19.560 |
So the particular question was about inception 01:07:24.200 |
That's another particular architecture which tends to be pretty memory intensive, and 01:07:33.160 |
so inception tends to be pretty memory intensive, but it's okay. It's also, like, 01:07:37.180 |
it's not terribly resilient. One of the things we try to show you is stuff which just tends to always work, 01:07:43.580 |
even if you don't quite tune everything perfectly 01:07:46.600 |
So resnet tends to work pretty well across a wide range of different 01:07:51.080 |
kinds of details around choices that you might make. So I think it's pretty good 01:07:58.720 |
So we've got this trained model, and what's actually happened, as we'll learn, is it's basically 01:08:03.120 |
Creating a set of weights if you've ever done anything like linear regression 01:08:07.840 |
Or logistic regression you'll be familiar with coefficients. We basically found some coefficients and parameters that work pretty well 01:08:16.640 |
So if we want to start doing some more playing around and come back later 01:08:20.240 |
We probably should save those weights so we can save that minute and 56 seconds 01:08:24.200 |
So you can just go learn.save and give it a name. It's going to put it 01:08:28.760 |
in a models subdirectory in the same place the data came from, so if you save different models or different data bunches from different 01:08:36.800 |
datasets, they'll all be kept separate. So don't worry about it 01:08:40.060 |
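In code, that looks something like the following (a minimal sketch following the lesson notebook; the name 'stage-1' is just an arbitrary label I'm assuming here):

    # Save the current weights; fastai writes them to a 'models' subdirectory
    # next to the data, e.g. .../models/stage-1.pth
    learn.save('stage-1')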
All right. So we've talked about how the most important things are what goes into your model and what comes out 01:08:47.440 |
We've seen one way of seeing what goes in now. Let's see what comes out 01:08:51.920 |
As this is the other thing you need to get really good at 01:08:54.200 |
so to see what comes out, we can use this class called ClassificationInterpretation 01:09:01.520 |
We're going to use this factory method from_learner: we pass in a learn object. So remember, a learn object knows two things: 01:09:11.280 |
what is your data, and what is your model. It's now not just an architecture; it's actually a trained model in there 01:09:16.440 |
and that's all the information we need to interpret that model. So we pass in the learner, 01:09:21.560 |
and we now have a ClassificationInterpretation object, and 01:09:25.160 |
so one of the things we can do, and perhaps the most useful thing to do, is called plot_top_losses 01:09:32.560 |
So we're going to be learning a lot about this idea of loss functions shortly 01:09:38.720 |
But in short, a loss function is something that tells you how good was your prediction. And so specifically that means: if you predicted 01:09:51.480 |
one class with great confidence, you said 'I am very, very sure that this is a particular breed', but 01:09:59.680 |
actually you were wrong, then that's going to have a high loss, because you were very confident about the wrong answer 01:10:06.720 |
Okay, so that's what it basically means to have a high loss. So by plotting the top losses, we're going to find out 01:10:12.420 |
what were the things that we were the most wrong about, or the most confident about yet got wrong 01:10:35.880 |
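In code, the whole interpretation step is just a couple of lines (a sketch assuming the fastai v1 API used in this course and the learn object trained above):

    # Build an interpretation object from the trained learner
    interp = ClassificationInterpretation.from_learner(learn)
    # Show the nine images with the highest loss, i.e. the most confidently wrong
    interp.plot_top_losses(9, figsize=(15, 11))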
So we've already seen help, right? And help just prints out a quick little summary, 01:10:41.800 |
but if you want to really see how to do something, use doc, and 01:10:47.120 |
doc tells you the same information as help, but it has this very important thing, which is 01:10:55.720 |
It pops up the documentation for that method or class or function or whatever 01:11:02.400 |
It starts out by showing us the same information about what the parameters are that it takes, 01:11:07.400 |
Along with the doc string, but then tells you more information 01:11:12.400 |
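For example (assuming the interp object from above; doc comes in with fastai's standard wildcard import):

    help(interp.plot_top_losses)   # quick plain-text summary in the notebook
    doc(interp.plot_top_losses)    # same summary, plus a link to the full docs page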
So in this case, it tells me that the title of each image shows the prediction, the actual, the loss, and the probability of the actual class 01:11:24.480 |
So for example, and you can see there's actually some code you can run 01:11:28.480 |
so the documentation always has working code and so in this case it was trying things with handwritten digits and 01:11:34.560 |
So the first one was predicted to be a 7; it was actually a 3. The loss was 01:11:42.000 |
5.44, and the probability of the actual class was low; 01:11:48.600 |
you know, we did not have a high probability associated with the actual class 01:11:52.840 |
I can see why it thought this was a 7; nonetheless, it was wrong. So this is the documentation 01:11:58.320 |
okay, and so this is your friend when you're trying to figure out how to use these things. The other thing I'll mention is, if you're a 01:12:06.080 |
somewhat experienced Python programmer, you'll find the source code of fast.ai really easy to read 01:12:11.160 |
We try to write everything in just a small number of lines, you know, 01:12:14.780 |
much less than half a screen of code, generally four or five lines of code. If you click source, 01:12:19.240 |
You can jump straight to the source code. Alright, so here is 01:12:23.640 |
The plot top losses and this is also a great way to find out 01:12:27.720 |
how to use the fast.ai library, because nearly every line of code here is calling stuff in the fast.ai library 01:12:36.040 |
Okay, so don't be afraid to look at the source code 01:12:42.240 |
I've got another really cool trick about the documentation that you're going to see a little bit later 01:12:48.400 |
So that's how we can look at these top losses and these are perhaps the most important image classification 01:12:54.900 |
Interpretation tool that we have because it lets us see 01:12:58.960 |
what are we getting wrong. And quite often, like in this case, 01:13:03.500 |
if you're a dog and cat expert, you'll realize that the things it's getting wrong are 01:13:09.240 |
breeds that are actually very difficult to tell apart, and you'd be able to look at these and say: oh, I can see why 01:13:21.040 |
Another useful tool, kind of, is to use something called a confusion matrix, which basically shows you, for every actual type of dog or cat, 01:13:31.080 |
how many times it was predicted to be each dog or cat. But unfortunately, in this case, because it's so accurate, 01:13:36.960 |
this diagonal basically says: oh, it's pretty much right all the time. And you can see there are some slightly darker ones, like a five here, 01:13:43.420 |
but it's really hard to read exactly what that combination is 01:13:46.280 |
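A minimal sketch of that call, under the same assumptions as above:

    # Rows are the actual classes, columns the predictions; with lots of classes
    # the diagonal dominates and the off-diagonal cells are hard to read
    interp.plot_confusion_matrix(figsize=(12, 12), dpi=60)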
So what I suggest you use instead, if you've got lots of classes: don't use a confusion matrix, 01:13:52.520 |
but use this, my favorite named function in fast.ai; I'm very proud of it. You can call most_confused 01:13:59.080 |
and most_confused will simply grab, out of the confusion matrix, the particular 01:14:05.600 |
combinations of predicted and actual that it got wrong the most often 01:14:09.440 |
So in this case, the Staffordshire bull terrier was what it should have predicted, and instead it predicted an American pit bull terrier, 01:14:17.680 |
and so forth: it should have predicted a Siamese and actually predicted a Birman; that happened four times 01:14:21.840 |
This particular combination happened six times 01:14:24.040 |
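A sketch of that call (min_val=2 is an assumed threshold; it skips combinations that only happened once):

    # (actual, predicted, count) pairs the model got wrong most often
    interp.most_confused(min_val=2)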
So this is again a very useful thing because you can look and you can say like with my domain expertise 01:14:29.280 |
Does it make sense that that would be something that was confused about? 01:14:33.280 |
So these are some of the kinds of tools you can use to look at the output 01:14:40.280 |
So how do we make the model better? We can make it better using fine-tuning 01:14:45.340 |
So far we've fitted for 4 epochs, and it ran pretty quickly 01:14:50.760 |
And the reason it ran pretty quickly is that there was a little trick we use these deep learning models these convolutional networks 01:14:57.320 |
They have many layers; we'll learn a lot about exactly what layers are, but for now just know it goes through a lot of computation 01:15:05.280 |
What we did was we added a few extra layers to the end 01:15:10.060 |
And we only trained those we basically left most of the model exactly as it was so that's really fast 01:15:16.200 |
And if we try to build a model of something that's similar to the original 01:15:21.080 |
Pre-trained model so in this case similar to the image net data that works pretty well 01:15:26.800 |
But what we really want to do is actually go back and train the whole model 01:15:31.280 |
So this is why we pretty much always use this two-stage process. So by default, 01:15:36.520 |
when we call fit or fit_one_cycle on a CNN learner, 01:15:42.480 |
it'll just fine-tune these few extra layers added to the end, and it will run very fast; it'll basically never overfit 01:15:53.040 |
But to get a really good model, you then have to call unfreeze, and unfreeze is the thing that says: please train the whole model 01:16:12.160 |
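As a sketch, that second stage is just this (and, as we're about to see, doing it naively can make things worse):

    learn.unfreeze()        # allow every layer to train, not just the added head
    learn.fit_one_cycle(1)  # naively trains the whole model at one rate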
We're actually going to have to learn more about exactly what's going on behind the scenes 01:16:18.280 |
So let's start out by trying to get an intuitive understanding of what's going on behind the scenes and again 01:16:26.640 |
We're going to start with this picture. These pictures come from a fantastic paper by Matt Zeiler, who nowadays is CEO of Clarifai, 01:16:35.160 |
which is a very successful computer vision start-up, and his supervisor Rob Fergus 01:16:42.160 |
And they created a paper showing how you can visualize the layers of a convolutional neural network 01:16:48.600 |
So, a convolutional neural network: we'll learn mathematically what the layers do shortly 01:16:53.120 |
But the basic idea is that your red green and blue pixel values that are numbers from 0 to 255 go into a simple computation 01:17:00.240 |
The first layer and something comes out of that and then the result of that goes into a second layer 01:17:08.480 |
There can be up to a thousand layers of a neural network 01:17:15.760 |
ResNet 34 has 34 layers ResNet 50 has 50 layers 01:17:19.880 |
But let's look at layer one. There's this very simple computation. It's it's a convolution if you know what they are 01:17:29.000 |
What comes out of this first layer? Well, we can actually visualize these specific coefficients the specific parameters by drawing them as a picture 01:17:37.320 |
There's actually a few dozen of them in the first layer, so we won't draw all of them 01:17:42.480 |
But let's just look at nine at random. So here are nine examples of the actual 01:17:47.280 |
coefficients from the first layer and so these operate on 01:17:51.240 |
groups of pixels that are next to each other and 01:17:54.160 |
So this first one basically finds groups of pixels that have a little diagonal line in this direction 01:17:59.720 |
This one finds diagonal lines in the other direction; this one finds gradients that go from yellow to blue in this direction 01:18:06.360 |
This one finds gradients that go from pink to green in this direction and so forth 01:18:15.480 |
That's layer one of an ImageNet pre-trained convolutional neural net. Layer two 01:18:23.000 |
takes the results of those filters and does a second layer of computation, and it allows it to create... so here are nine examples of 01:18:31.160 |
a way of visualizing one of the second layer features, and you can see it's basically learned to create something that looks for corners 01:18:42.840 |
This one has learned to find things that find right-hand curves 01:18:46.800 |
This one is going to find things that find little circles 01:18:49.840 |
So you can see how layer two works; this is the easiest way to see it: in layer one, 01:18:54.840 |
we have things that can find just one line; in layer two, 01:18:57.760 |
we can find things that have two lines joined up, or one line repeated 01:19:04.040 |
These nine show you nine examples of actual bits of actual photos that activated this filter a lot 01:19:12.760 |
In other words, this math function here was good at finding these kinds of window corners and stuff like that 01:19:19.200 |
This little circular one was very good at finding bits of photos that had circles 01:19:24.200 |
Okay, so this is the kind of stuff you've got to get a really good intuitive understanding for. It's like: 01:19:29.120 |
the start of my neural net is going to find simple, very simple, gradients and lines; 01:19:34.000 |
The second layer can find very simple shapes the third layer can find combinations of those 01:19:42.120 |
repeating patterns of two-dimensional objects, or we can find kinds of things where lines join together 01:19:47.760 |
Or we can find well, what are these things? Well, let's find out. What is this? 01:19:53.560 |
Let's go and have a look at some bits of picture that activated this one highly. Oh 01:19:59.040 |
mainly they're bits of text, although sometimes windows. So it seems to be able to find kind of repeated 01:20:05.280 |
horizontal patterns. And this one here seems to find kind of... 01:20:13.200 |
This one here is kind of finding geometric patterns 01:20:16.360 |
So layer 3 was able to take all the stuff from layer 2 and combine them together 01:20:21.080 |
Layer 4 can take all the stuff from layer 3 and combine them together by layer 4 01:20:27.320 |
we've got something that can find dog faces and 01:20:33.680 |
Yeah various kinds of oh here we are bird legs 01:20:38.400 |
So you kind of get the idea. And so by layer 5, we've got something that can find the eyeballs of birds and lizards, or 01:20:45.880 |
faces of particular breeds of dogs, and so forth. So you can see how, by the time you get to layer 34, you can recognize 01:20:56.800 |
specific dog breeds and cat breeds, right? This is kind of how it works. So 01:21:03.280 |
when we first trained, when we first fine-tuned that pre-trained model, 01:21:06.640 |
We kept all of these layers that you've seen so far and we just trained a few more layers on top of all of those 01:21:13.160 |
Sophisticated features that are already being created. Alright, and so now we're fine-tuning 01:21:17.320 |
we're going back and saying: let's change all of these; we'll start with them where they are, 01:21:22.960 |
Right, but let's see if we can make them better 01:21:25.560 |
Now it seems very unlikely that we can make these layer one features 01:21:32.080 |
better. Like, it's very unlikely that the kind of definition of a diagonal line 01:21:36.560 |
is going to be different when we look at dog and cat breeds versus the ImageNet data that this was originally trained on 01:21:42.720 |
So we don't really want to change layer one very much if at all 01:21:46.720 |
Whereas the last layers, you know, this thing of like types of dog face: 01:21:52.880 |
Seems very likely that we do want to change that, right? 01:21:55.920 |
So you kind of want this intuition, this understanding, that the different layers of a neural network represent different levels of semantic complexity 01:22:08.840 |
So the reason our attempt to fine-tune this model didn't work so well is because, 01:22:13.560 |
by default, it trains all the layers at the same speed 01:22:18.600 |
Right, which is to say it'll update those things representing diagonal lines and gradients 01:22:23.000 |
just as much as it tries to update the things that represent the exact specifics of what an eyeball looks like. And that doesn't make sense, so 01:22:32.480 |
to change it, we first of all need to go back to where we were before. Okay, we just broke this model, right? 01:22:40.200 |
So if we just go learn.load, this brings back the model that we saved earlier. Remember, we saved it? So we'll 01:22:52.440 |
load that back up. So that's now our model back to where it was before we killed it. And now let's run the 01:23:00.120 |
learning rate finder; we'll learn about what that is next week, 01:23:03.000 |
but for now, just know this is the thing that figures out what is the fastest I can train this neural network at, without 01:23:11.640 |
making it zip off the rails and get blown apart. Okay, so we can call learn.lr_find, and 01:23:17.560 |
then we can go learn.recorder.plot, and that will plot the result of our LR finder 01:23:23.120 |
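A sketch of those steps (assuming 'stage-1' is the name we saved under earlier):

    learn.load('stage-1')   # back to the weights we saved before unfreezing
    learn.lr_find()         # sweep increasing learning rates over mini-batches
    learn.recorder.plot()   # plot loss vs. learning rate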
and what this basically shows you is this key parameter that we're going to learn all about: the learning rate. And the 01:23:29.240 |
Learning rate basically says how quickly am I updating the parameters in my model? 01:23:34.280 |
and you can see what happens: I think this bottom one here shows me what happens as I increase the learning rate, and 01:23:43.200 |
this one here shows, you know, what's the result, what's the loss 01:23:46.640 |
And so you can see, once the learning rate gets past ten to the negative four, my loss gets worse 01:23:54.680 |
It actually so happens, in fact I can check this if I press shift-tab here, that my learning rate defaults to 01:24:02.120 |
0.003. So my default learning rate is about here 01:24:06.760 |
So you can see why our loss got worse, right? Because we're trying to fine-tune things now 01:24:13.440 |
So, based on the learning rate finder, I tried to pick something, you know, well before it started getting worse 01:24:21.720 |
So I decided to pick 1e-6. So I decided I'm going to train at that rate 01:24:28.040 |
But there's no point training all the layers at that rate, because we know that the later layers worked just fine 01:24:35.640 |
before, when we were training much more quickly, at the default, which was about 1e-3 01:24:44.480 |
So what we can actually do is we can pass a range of learning rates to learn.fit, and we do it like this 01:24:51.400 |
You use this keyword, which in Python you may have come across before, called slice, and that can take a 01:24:58.640 |
start value and a stop value, and basically what this says is: train the very first layers at a rate of 1e-6, and 01:25:07.920 |
the very last layers at a rate of 1e-4, and then kind of distribute all the other layers 01:25:14.480 |
across that range, you know, between those two values, equally 01:25:19.200 |
So we're going to see that in a lot more detail. Basically for now 01:25:23.440 |
this is kind of a good rule of thumb to say: after you unfreeze, 01:25:30.640 |
So this is the thing that's going to train the whole thing 01:25:33.960 |
pass a max learning rate parameter, pass it a slice, 01:25:38.120 |
Make the second part of that slice about ten times smaller than your first stage 01:25:44.600 |
So our first stage defaulted to about 1e-3, 01:25:47.300 |
so let's use about 1e-4; and then this one should be a value from your learning rate finder 01:25:53.520 |
which is well before things started getting worse. And you can see things are starting to get worse 01:25:58.000 |
maybe about here, so I picked something that's at least ten times smaller than that 01:26:13.600 |
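Putting that rule of thumb into code (a sketch; the exact numbers come from reading the LR finder plot as just described):

    learn.unfreeze()
    # Discriminative learning rates: earliest layers at 1e-6, last layers at 1e-4,
    # with the layers in between spread across that range
    learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))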
Yeah, a bit better, right? So we've gone down from a 6.1 percent to a 5.7 percent error rate 01:26:19.360 |
So that's about a 10 percent relative improvement with another 58 seconds of training. So I 01:26:26.200 |
Would perhaps say for most people most of the time these two stages are enough to get 01:26:36.000 |
a pretty good result. You won't win a Kaggle competition, particularly because now a lot of fast.ai alumni are competing on Kaggle, and this is the first thing they do 01:26:45.360 |
But, you know, in practice you'll get something that's about as good as the vast majority of practitioners can achieve 01:26:53.280 |
We can improve it by using more layers, and we'll do this next week, by basically doing a ResNet50 instead of a ResNet34 01:27:04.040 |
And you can try running this during the week if you want to you'll see it's exactly the same as before 01:27:11.880 |
What you'll find is it's very likely if you try to do this, you will get an error 01:27:17.600 |
And the error will be your GPU is run out of memory 01:27:21.240 |
and the reason for that is that resnet 50 is bigger than resnet 34 and 01:27:26.000 |
Therefore it has more parameters and therefore it uses more of your graphics cards memory 01:27:30.800 |
Just totally separate to your normal computer RAM. This is GPU RAM 01:27:34.640 |
if you're using the kind of default Salamander, AWS, 01:27:40.360 |
and so forth suggestion, then you'll be having 16 gig of 01:27:46.120 |
GPU memory. The card I use most of the time has 11 gig of GPU memory; 01:27:55.440 |
that's kind of the main range you tend to get. If yours has less than 8 gig of GPU memory, 01:28:04.960 |
it's very likely that when you try to run this you'll get an out-of-memory error 01:28:09.160 |
And that's because it's just trying to do too much too many parameter updates for the amount of RAM you have 01:28:22.560 |
There's a parameter called bs, for batch size, and this basically says: how many images do you train at one time? 01:28:22.560 |
If you run out of memory, just make it smaller. Okay, so this worked for me on an 11 gig card 01:28:34.960 |
It probably won't work for you if you've got an 8 gig card if you do just make that 32 01:28:39.640 |
It's fine to use a smaller batch size it just it might take a little bit longer 01:28:45.680 |
That's all okay. If you've got a bigger like a 16 gig you might be able to get away with 64 01:28:51.360 |
Okay, so that's just one number you'll need to try during the week 01:28:58.600 |
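As a sketch, the ResNet50 version looks like this (path_img, fnames and pat are assumed to be defined earlier in the notebook; bs=48 is an assumed value for a roughly 11 gig card, so halve it if you hit an out-of-memory error):

    data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                       ds_tfms=get_transforms(), size=299, bs=48
                                      ).normalize(imagenet_stats)
    learn = create_cnn(data, models.resnet50, metrics=error_rate)
    learn.fit_one_cycle(8)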
We get down to a 4.4 percent error rate 01:29:01.800 |
So this is pretty extraordinary. You know I was pretty surprised because I mean 01:29:07.400 |
When we did it in the first course, just cats versus dogs, we were kind of getting 01:29:13.960 |
Somewhere around a three percent error for something where you've got a 50% chance of being right and the two things look totally different 01:29:21.000 |
so the fact that we can get a 4.4 percent error for something with such fine-grained distinctions is quite amazing 01:29:29.080 |
In this case I unfroze and fitted a little bit more, and went from 4.4 to 4.35 01:29:37.280 |
Basically, ResNet50 is already a pretty good model 01:29:39.480 |
It's interesting because, again, you can call most_confused here, and you can see the kinds of things that it's 01:29:49.200 |
getting wrong. And depending on when you run it, you're going to get slightly different numbers, but you'll get roughly the same kinds of things 01:29:58.040 |
So quite often I find that Ragdoll and Birman are things that it gets confused about 01:30:02.360 |
And I actually have never heard of either of those things so I actually looked them up on the internet 01:30:10.560 |
found a page on a cat site called 'Is this a Birman or Ragdoll?', and there was a long thread of cat experts, like, 01:30:19.680 |
Arguing intensely about which it is so I feel fine that my computer had problems 01:30:29.400 |
I found something similar, I think it was pit bull versus Staffordshire bull terrier; apparently the main difference is the particular kennel club 01:30:36.640 |
guidelines as to how they are assessed, but some people think that one of them might have a slightly redder nose 01:30:42.040 |
So this is the kind of stuff where actually even if you're not a domain expert 01:30:47.200 |
It helps you become one right because I now know 01:30:50.260 |
More about which kinds of pet breeds are hard to identify than I used to 01:30:56.120 |
So model interpretation works both ways. So what I want you to do this week is to run 01:31:02.080 |
This notebook, you know, make sure you can get through it 01:31:06.280 |
but then what I really want you to do is to get your own image dataset. And actually, 01:31:13.060 |
Francisco, who I mentioned earlier, he started the language model zoo thread, and he's, you know, 01:31:18.600 |
now helping to TA the course. He's actually putting together a guide that will show you how to download data 01:31:25.760 |
from Google Images so you can create your own dataset to play with. But before that, I want to show you 01:31:37.880 |
how to create labels in lots of different ways, because your dataset, wherever you get it from, won't necessarily 01:31:44.680 |
Be that kind of regex based approach. It could be in lots of different formats 01:31:50.600 |
So, just to show you how to do this, I'm going to use the MNIST sample. MNIST is pictures of hand-drawn numbers, 01:31:57.220 |
just because I want to show you different ways of labeling things. The MNIST sample 01:32:10.160 |
basically looks like this, so I can go path.ls 01:32:16.280 |
And you can see it's got a training set and a validation set already 01:32:20.000 |
So basically the people that put together this data set have already decided what they want you to use as a validation set 01:32:31.000 |
If you look inside the training set, you'll see there's a folder called three and a folder called seven 01:32:35.560 |
That's just a really, really common way to give things labels. It's basically to say: everything that's a three, 01:32:41.840 |
I'll put in a folder called three; everything that's a seven, I'll put in a folder called seven 01:32:45.880 |
This is often called an ImageNet-style dataset, because this is how ImageNet is distributed 01:32:52.240 |
So if you have something in this format, where the labels are just whatever the folders are called, you can say from_folder 01:33:00.200 |
Okay, and that will create an ImageDataBunch for you. And as you can see, three, seven: 01:33:06.360 |
it's created the labels just by using the folder names 01:33:12.040 |
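As a sketch (following the lesson notebook; MNIST_SAMPLE ships with fastai's datasets):

    from fastai.vision import *

    path = untar_data(URLs.MNIST_SAMPLE)    # has train/3, train/7, valid/3, valid/7
    tfms = get_transforms(do_flip=False)    # don't flip digits: a mirrored 3 isn't a 3
    data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=26)
    data.classes                            # ['3', '7'], taken from the folder names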
And as you can see, we can train on that and get 99.55 percent accuracy, blah blah blah 01:33:16.960 |
Another possibility, and for this MNIST sample I've got both: it might come with a CSV file 01:33:22.520 |
that would look something like this: for each file name, what's its label. Now, in this case the labels are not three or seven; 01:33:29.580 |
they're zero or one, which is basically: is it a seven or not? 01:33:33.120 |
All right, so that's another possibility. So if this is how your labels are, you can use from_csv 01:33:39.400 |
And if it's called labels.csv, you don't even have to pass in a file name; if it's called anything else, you can pass in the name of the CSV 01:33:47.480 |
Okay, so that's how you can use a CSV. Okay, there it is: this is now 'is it a seven or not?' 01:33:53.640 |
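A sketch of the CSV version, under the same assumptions:

    # labels.csv maps each file name to a 0/1 label ('is it a seven?')
    data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)
    # for a differently named file, pass e.g. csv_labels='my_labels.csv'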
And then you can call data.classes to see what it found. Another possibility, as we've seen, is you've got 01:34:04.680 |
labels in the file names or paths. And so in this case, this is the same thing; these are the file paths, and we can pull out 01:34:10.400 |
the label by using a regular expression. And so here's the regular expression 01:34:15.840 |
So we've already seen that approach, and again, you can see data.classes has found it 01:34:20.280 |
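A sketch of the regex version (df is assumed to be the dataframe loaded from the CSV above):

    fn_paths = [path/name for name in df['name']]  # build the list of file paths
    pat = r"/(\d)/\d+\.png$"                       # the captured group is the label
    data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)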
So what if it's something that's in the file name or path, but it's not just a regular expression; it's more complex? 01:34:27.440 |
You can create an arbitrary function that extracts a label from the file name or path 01:34:33.680 |
And in that case, you would say from_name_func 01:34:42.600 |
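A sketch, reusing the fn_paths list from above:

    # Any function from file path to label works here
    data = ImageDataBunch.from_name_func(path, fn_paths, ds_tfms=tfms, size=24,
            label_func=lambda x: '3' if '/3/' in str(x) else '7')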
What if you need something even more flexible than that? 01:34:46.440 |
Then you're going to write some code to create an array of labels, and in that case you can just use 01:34:52.360 |
from_lists. So here I've created an array of labels, and my labels go into from_lists 01:34:58.240 |
Okay, and then I just pass in that array. So you can see there are lots of different ways of creating labels 01:35:07.880 |
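A sketch of that last approach:

    # Build the label array yourself, then hand it over directly
    labels = [('3' if '/3/' in str(x) else '7') for x in fn_paths]
    data = ImageDataBunch.from_lists(path, fn_paths, labels=labels, ds_tfms=tfms, size=24)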
So during the week, try a few of these. How would you know to do all these things? Like, where am I going to find 01:35:11.160 |
this kind of information, right? How do you possibly know to do all this stuff? 01:35:16.920 |
So I'll show you something incredibly cool. Let's grab this function and 01:35:21.840 |
Do you remember to get documentation we type doc? 01:35:25.960 |
And here is the documentation for the function, and I can click 'Show in docs' and 01:35:39.560 |
it takes me straight to the documentation. And every single line of code I just showed you, I took it this morning and copied and pasted it from the documentation 01:35:51.000 |
So you can see the exact code that I just used. So the documentation for fast.ai doesn't just tell you 01:35:56.600 |
what to do, but step by step how to do it 01:36:01.320 |
Here is perhaps the coolest bit: if you go to the fastai 01:36:07.760 |
fastai_docs repository and click on docs_src, 01:36:15.720 |
it turns out that all of our documentation is actually just Jupyter notebooks. So in this case, I was looking at vision.data 01:36:25.120 |
So here is the vision.data notebook. You can download this repo, you can git clone it, and 01:36:37.040 |
run every single line of the documentation yourself 01:36:40.160 |
Okay, so all of our docs is also code. And so, like, this is kind of the ultimate example to me of this: 01:36:54.480 |
You can now experiment, and you'll see that in GitHub 01:36:59.600 |
It doesn't quite render properly because github doesn't quite know how to render notebooks properly 01:37:04.120 |
But if you git clone this and open it up in Jupyter, 01:37:09.240 |
Anything that you read about in the documentation 01:37:11.560 |
Really everything in the documentation has actual working examples in it with actual data sets that are already sitting in there in the repo 01:37:17.920 |
For you and so you can actually try every single function in your browser 01:37:23.200 |
Try seeing what goes in and try seeing what comes out 01:37:30.240 |
Will the library use multi GPU and parallel by default? 01:37:33.920 |
The library will use multiple CPUs by default, but just one GPU by default 01:37:40.160 |
We probably won't be looking at multi-GPU until part two. It's easy to do and you'll find it on the forum, but 01:37:49.760 |
the second question is whether the library can use 3D data, such as MRI scans 01:38:00.880 |
And there is actually a forum thread about that already 01:38:03.960 |
Although that's not as developed as 2d yet, but maybe by the time the MOOC is out it will be 01:38:08.960 |
So before I wrap up, I'll just show you an example of the kind of interesting stuff that you can do by thinking creatively 01:38:20.160 |
Remember earlier I mentioned that one of our alums, who works at Splunk, built a fraud detector? 01:38:31.800 |
This is actually how he created it as part of a fast AI part one class project 01:38:38.120 |
He took the telemetry of users who had Splunk analytics installed and watched their mouse movements 01:38:45.840 |
And he created pictures of the mouse movements. He converted speed into 01:38:50.120 |
color, and right and left clicks into splodges 01:38:55.000 |
he then took the exact code that we saw with an earlier version of the software and 01:39:00.000 |
Trained a CNN in exactly the way we saw and used that to train his fraud model 01:39:06.240 |
So he basically took something which is not obviously a picture and he turned it into a picture 01:39:11.480 |
And got these fantastically good results for a piece of fraud analysis software. So it pays to think 01:39:20.160 |
creatively. So if you're wanting to study sounds, a lot of people that study sounds do it by actually creating a spectrogram image and 01:39:27.280 |
then sticking that into a convnet. So there's a lot of cool stuff you can do with this 01:39:33.120 |
Get your GPU going, try and use your first notebook 01:39:36.520 |
Make sure that you can use lesson one and work through it and then see if you can repeat the process on your own data 01:39:43.640 |
Set get on the forum and tell us any little success you had. It's like, oh, I spent three days trying to get my GPU running 01:39:55.640 |
You know try it for an hour or two, but if you get stuck, please ask 01:39:59.320 |
And if you're able to successfully build a model with a new data set