Lesson 3: Deep Learning 2019 - Data blocks; Multi-label classification; Segmentation
Chapters
0:00
0:54 Machine Learning Course
2:27 Deployment
3:36 Examples of Web Apps
4:37 Guitar Classifier
5:49 American Sign Language
9:54 Satellite Images
11:01 Download the Data
13:53 Conda Installation
14:18 Unzip a 7-Zip File
17:09 The Data Block API
18:32 Data Set
21:35 Data Loader
24:10 Examples of Using the Data Block API
27:51 Object Detection Data Set
28:54 Data Block API Examples
31:25 Perspective Warping
32:21 Data Augmentation
33:46 Metrics
40:11 Partial Function Application
43:1 Fine-Tuning
47:38 What Resources Do You Recommend for Getting Started with Video
51:44 Transfer Learning
57:59 Build a Segmentation Model
60:43 Reading the Learning Rate Finder Graph
65:53 Create a DataBunch
66:10 Split into Training and Validation
67:33 Transformations
72:34 What Does Accuracy Mean for Pixel-Wise Segmentation
76:25 Segmentation
77:9 Convolutional Neural Network
79:3 Plot Losses
82:36 Decreasing the Learning Rate during Training
91:10 Mixed Precision Training
95:35 Image Points
96:59 Regression Model
97:12 Create an Image Regression Model
99:22 Cross-Entropy Loss
99:37 Mean Squared Error
102:48 Tokenization
104:05 Vocab
105:27 Language Model
110:35 Activation Function
112:32 Michael Nielsen
113:27 The Universal Approximation Theorem
117:27 N-grams
123:00 The Universal Approximation Theorem
00:00:03.160 |
So we're going to start with a quick correction, which is to let you know that 00:00:09.160 |
when we referred to this chart as coming from Quora last week, we were correct. 00:00:13.080 |
It did come from Quora, but actually, we realized originally it came from Andrew 00:00:16.920 |
Ng's excellent machine learning course on Coursera. 00:00:22.960 |
But in exchange, let's talk about Andrew Ng's excellent machine learning course. 00:00:28.440 |
It's really great, as you can see, people gave it 4.9 out of 5 stars. 00:00:33.400 |
In some ways, it's a little dated, but a lot of the content really is 00:00:39.040 |
as appropriate as ever, and taught in a more bottom-up style. 00:00:43.880 |
So it can be quite nice to combine Andrew's bottom-up style and 00:00:47.520 |
our top-down style and meet somewhere in the middle. 00:00:49.840 |
Also, if you're interested in more machine learning foundations, 00:00:54.240 |
you should check out our machine learning course as well. 00:00:56.800 |
If you go to course.fast.ai and click on the machine learning button, 00:01:00.160 |
that will take you to our course, which is about twice as long as this deep 00:01:05.080 |
learning course, and kind of takes you much more gradually through some of 00:01:08.320 |
the foundational stuff around validation sets and model interpretation and 00:01:13.000 |
how PyTorch tensors work and stuff like that. 00:01:16.280 |
So I think all these courses together, if you want to really dig deeply into 00:01:24.120 |
the material, work well in combination. I know a lot of people here have done that, and ended up saying they got more out of each one 00:01:28.080 |
by doing the whole lot. Or you can skip backwards and forwards between them to see which suits you. 00:01:31.520 |
So we started talking about deploying your web app last week. 00:01:41.040 |
One thing that's gonna make life a lot easier for 00:01:44.160 |
you is that on the course V3 website, there's a production section, 00:01:49.720 |
where right now we have one platform, but more will be added by the time this 00:01:54.560 |
video comes out, showing you how to deploy your web app really, really easily. 00:02:00.320 |
And when I say easily, for example, here's the how to deploy on 00:02:05.480 |
Zeit guide created by San Francisco study group member, Navjot. 00:02:16.160 |
It's not gonna serve 10,000 simultaneous requests, but 00:02:22.080 |
it'll certainly get you started, and I found it works really well. 00:02:25.440 |
It's fast, and so deploying a model doesn't have to be slow or complicated. 00:02:32.960 |
And the nice thing is, you can kind of use this for an MVP. 00:02:35.760 |
And if you do find you're starting to get 1,000 simultaneous requests, 00:02:39.340 |
then you know that things are working out, and you can start to upgrade your instance 00:02:43.680 |
types or add to a more traditional big engineering approach. 00:02:53.200 |
And if you follow the guide as written, it will actually create my teddy bear finder for you. 00:02:57.040 |
And this is an example of my teddy bear finder. 00:02:59.360 |
So the idea is it's as simple as possible, this template. 00:03:03.320 |
So you can fill in your own style sheets, your own custom logic, and so forth. 00:03:08.200 |
This is kind of designed to be a minimal thing, so you can see how everything fits together. 00:03:12.680 |
The back end is a simple kind of REST-style interface that sends back JSON. 00:03:20.600 |
And the front end is a super simple little JavaScript thing. 00:03:23.760 |
So yeah, it should be a good way to get a sense of how to build a web app of your own. 00:03:33.360 |
So, examples of web apps people have built during the week: one was a what car is that app. 00:03:45.000 |
Or more specifically, the what Australian car is that app. 00:03:48.040 |
I thought it was kind of interesting that Edward said on the forum that the building 00:03:52.760 |
of the app was actually a great experience in terms of understanding how the model works in practice. 00:03:59.640 |
And it's interesting that he's describing trying it out on his phone. 00:04:09.040 |
A lot of people think like, if I want something on my phone, I have to create 00:04:11.960 |
some kind of mobile TensorFlow, ONNX, whatever, tricky mobile app. 00:04:18.120 |
You can run it all in the cloud and make it just a web app or 00:04:21.760 |
use some kind of simple little GUI front end that talks to a REST back end. 00:04:28.040 |
It's not that often that you'll need to actually run stuff on the phone, so this can be a nice easy approach. 00:04:39.720 |
Here's another one: you can decide whether your food is healthy or not. 00:04:46.240 |
I would have thought a hamburger is more what we're looking for, but there you go. 00:04:49.560 |
Apparently, Trinidad and Tobago is the home of the hummingbird. 00:04:54.560 |
So if you're visiting, you can find out what kind of hummingbird you're looking at. 00:04:57.560 |
You can decide whether or not to eat a mushroom. 00:05:02.960 |
If you happen to be one of the cousins of Charlie Harrington, this app will tell you which one you are. 00:05:08.320 |
I believe this was actually designed for his fiance. 00:05:11.160 |
It will even tell you about the interests of this particular cousin. 00:05:14.960 |
So, a fairly niche application, but apparently, 00:05:18.680 |
there are 36 people who will appreciate this at least. 00:05:25.560 |
This is an example of an app which actually takes a video feed and runs the model on the frames in real time. 00:05:44.880 |
Here's a similar one for American Sign Language. 00:05:52.320 |
And so it's not a big step from taking a single image model to taking a video model. 00:06:01.600 |
You can just grab the occasional frame, put it through your model, and 00:06:05.720 |
update the UI as the kind of model results come in. 00:06:11.860 |
So it's really cool that you can do this kind of stuff either client side or server side. 00:06:28.520 |
Another project guesses where a satellite photo was taken, which its creator describes as creepy, given how accurate it is. 00:06:32.840 |
So here is where I live, which it figured out was in the United States. 00:06:35.960 |
It's interesting, he describes here how he actually had to be very thoughtful about 00:06:40.800 |
the validation set he built, to make sure that the satellite tiles were not overlapping with the training set. 00:06:47.320 |
In doing so, he realized he had to download more data. 00:06:49.920 |
But once he did, he got this amazingly effective model that can look at satellite 00:06:54.640 |
imagery and figure out what country it's from. 00:06:58.160 |
I thought this one was pretty interesting, which was doing univariate time series 00:07:02.880 |
analysis by converting it into a picture, using a technique I'd never heard of. 00:07:11.600 |
But he says he's getting close to state of the art results for 00:07:14.440 |
univariate time series modeling by turning it into a picture. 00:07:17.840 |
And so I like this idea of turning stuff that's not a picture into a picture. 00:07:25.880 |
So something really interesting about this project, which was looking at 00:07:30.000 |
emotion classification from faces, was that he was specifically asking the question, 00:07:35.160 |
how well does it go without changing anything, just using the default settings? 00:07:39.000 |
Which I think is a really interesting experiment because we're all told it's 00:07:42.120 |
really hard to train models and it takes a lot of specific knowledge. 00:07:47.440 |
And actually we're finding that that's often not the case. 00:07:50.120 |
And he looked at this facial expression recognition data set. 00:07:54.200 |
There was a 2017 paper that he compared his results to, and he got equal or 00:08:00.120 |
slightly better results than the state-of-the-art paper on facial 00:08:05.080 |
emotion recognition, without doing any custom hyperparameter tuning at all. 00:08:10.520 |
And then Elena Harley, whose work I featured last week, 00:08:17.080 |
has done another really cool work in the genomics space, which is 00:08:27.480 |
looking at false positives in these kinds of pictures. 00:08:33.040 |
And she found she was able to decrease the number of false positives coming out of 00:08:37.600 |
the kind of industry-standard software she was using by 500%. 00:08:46.440 |
I think this is a nice example of something where if you're 00:08:50.480 |
spending hours every day looking at something, in this case 00:08:54.320 |
manually weeding out the false positives, 00:08:57.360 |
maybe you can make that a lot faster by using deep learning to do a lot of the work for you. 00:09:02.680 |
And again, this is an example of a computer vision based approach applied to something that isn't an obvious computer vision problem. 00:09:12.400 |
So really nice to see what people have been building, in terms of both web apps and models. 00:09:22.440 |
What we're gonna do today is look at a whole lot more different types of model 00:09:27.560 |
And we're gonna kind of zip through them pretty quickly. 00:09:29.800 |
And then we're gonna go back and say, like, how did all these things work? 00:09:33.960 |
But all of these things, you can create web apps from these as well. 00:09:40.360 |
But you'll have to think about how to slightly change that template to make it work for these different applications. 00:09:47.280 |
I think that'll be a really good exercise in making sure you understand the material. 00:09:51.680 |
So the first one we're gonna look at is a dataset of satellite images. 00:09:57.080 |
And satellite imaging is a really fertile area for deep learning. 00:10:04.880 |
There are certainly a lot of people already using deep learning in 00:10:07.560 |
satellite imaging, but they're only scratching the surface. 00:10:10.720 |
And the dataset that we're gonna look at looks like this. 00:10:15.240 |
It has satellite tiles, and for each one, as you can see, 00:10:20.160 |
there's a number of different labels for each tile. 00:10:23.160 |
One of the labels always represents the weather that's shown. 00:10:31.160 |
And then all of the other labels tell you any interesting features that are seen there. 00:10:40.800 |
Agriculture means there's some farming, road means there's a road, and so forth. 00:10:45.200 |
And so, as I'm sure you can tell, this is a little different to all the classifiers 00:10:50.080 |
we've seen so far, cuz there's not just one label, there's potentially multiple labels. 00:10:55.320 |
So multi-label classification can be done in a very similar way. 00:10:59.840 |
But the first thing we're gonna need to do is to download the data. 00:11:05.640 |
The data comes from Kaggle, which is mainly known for being a competitions website. 00:11:09.240 |
And it's really great to download data from Kaggle when you're learning, 00:11:12.880 |
because you can see, how would I have gone in that competition? 00:11:16.000 |
And it's a good way to see whether you kind of know what you're doing. 00:11:18.720 |
I tend to think the goal is to try and get in the top 10%. 00:11:23.640 |
And in my experience, all the people in the top 10% of a competition really know what they're doing. 00:11:29.560 |
So if you can get in the top 10%, then that's a really good sign. 00:11:32.720 |
Pretty much every Kaggle data set is not available for 00:11:37.440 |
download outside of Kaggle, at least the competition data sets. 00:11:43.200 |
And the good news is that Kaggle provides a Python-based downloader tool that makes this easy. 00:11:49.800 |
So we've got a quick description here of how to download stuff from Kaggle. 00:11:53.400 |
So, to download stuff from Kaggle, 00:11:59.200 |
you first have to install the Kaggle download tool. 00:12:04.720 |
And so you can see what we tend to do when there's one off things to do, 00:12:08.400 |
is we show you the commented out version in the notebook. 00:12:13.920 |
If you select a few lines and then hit Ctrl + slash, it uncomments them all. 00:12:19.800 |
And then when you're done, select them again, Ctrl + slash again, and it comments them back out. 00:12:25.000 |
So if you run this line, it'll install Kaggle for you. 00:12:28.800 |
Depending on your platform, you may need sudo, 00:12:33.880 |
you may need the full path to pip, you may need source activate. 00:12:40.040 |
So have a look on the setup instructions, actually the returning to work 00:12:44.960 |
instructions on the course website to see where we do conda install; 00:12:50.600 |
you have to do the same basic steps for your pip install. 00:12:53.640 |
So once you've got that module installed, you can then go ahead and download the data. 00:13:02.960 |
And basically it's as simple as saying Kaggle competitions download, 00:13:07.720 |
the competition name, and then the files that you want. 00:13:12.120 |
The only other step before you do that is that you have to authenticate yourself. 00:13:17.740 |
And you'll see there's a little bit of information here on exactly how you can 00:13:21.160 |
go about downloading from Kaggle the file containing your API credentials. 00:13:28.280 |
So I won't bother going through it here, but just follow these steps. 00:13:33.600 |
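As a rough sketch of what those steps look like in a notebook (the competition and file names are the ones for this Planet competition; treat the exact flags as something to check against the Kaggle docs):

    # One-off setup, left commented out in the notebook:
    # ! pip install kaggle --upgrade
    #
    # With your kaggle.json API token placed in ~/.kaggle/, download the files:
    # ! kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p {path}
    # ! kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv -p {path}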
Sometimes stuff on Kaggle is not just zipped or tarred, but 00:13:38.520 |
it's compressed with a program called 7-zip, which will have a 7Z extension. 00:13:45.760 |
If that's the case, you'll need to either apt install p7zip, or use conda. 00:13:52.640 |
Some kind person has actually created a conda installation of 7-zip that works on every platform. 00:13:57.720 |
So you can always just run this conda install, 00:14:00.040 |
doesn't even require sudo or anything like that. 00:14:02.960 |
And this is actually a good example of where conda is super handy, 00:14:06.360 |
is that you can actually install binaries, and libraries, and 00:14:09.760 |
stuff like that, and it's nicely cross platform. 00:14:12.280 |
So if you don't have 7-zip installed, that's a good way to get it. 00:14:22.600 |
In this case, it's tarred and 7-zipped, so you can do this all in one step. 00:14:29.920 |
So 7-za is the name of the 7-zip archiver program that you would run. 00:14:34.160 |
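For reference, a minimal sketch of the install and the one-step extraction (the conda channel and package name here are assumptions based on what the course notebook used at the time):

    # ! conda install -y -c haasad eidl7zip     # 7-zip via conda, no sudo needed
    #
    # 7za writes to stdout with -so, so we can pipe straight into tar:
    # ! 7za -bd -y -so x {path}/train-jpg.tar.7z | tar xf - -C {path}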
Okay, so that's all basic stuff, which if you're not so familiar with the command 00:14:38.960 |
line and stuff, it might take you a little bit of experimenting to get it working. 00:14:44.400 |
Make sure you search the forum first to get started, okay. 00:14:50.720 |
So once you've got the data downloaded and unzipped, you can take a look at it. 00:14:56.160 |
So in this case, because we have multiple labels for 00:15:04.960 |
each tile, we clearly can't have a different folder for each class to tell us the labels. 00:15:14.680 |
And so the way that Kaggle did it was they provided a CSV file that had 00:15:19.400 |
each file name along with a list of all of the labels. 00:15:25.640 |
In order to just take a look at that CSV file, we can read it with pandas. 00:15:32.960 |
Pandas is kind of the standard way of dealing with tabular data in Python. 00:15:40.280 |
It pretty much always appears as the pd namespace. 00:15:43.440 |
In this case, we're not really doing anything with it, 00:15:45.600 |
other than just showing you the contents of this file. 00:15:48.680 |
So we can read it, we can take a look at the first few lines, and there it is. 00:15:53.320 |
So we want to turn this into something we can use for modeling. 00:15:59.360 |
So the kind of object that we use for modeling is an object of the DataBunch 00:16:05.440 |
class, so we have to somehow create a data bunch out of this. 00:16:08.920 |
Once we have a data bunch, we'll be able to go .show batch to take a look at it. 00:16:15.000 |
And then we'll be able to go create_cnn with it, and start training. 00:16:19.600 |
So really, the trickiest step previously in deep learning has often been 00:16:26.480 |
getting your data into a form that you can get it into a model. 00:16:29.000 |
So far, we've been showing you how to do that using various factory methods. 00:16:36.000 |
So methods where you basically say, I want to create this kind of data from 00:16:39.800 |
this kind of source with these kinds of options. 00:16:42.640 |
The problem is, I mean, that works fine sometimes, and 00:16:45.560 |
we showed you a few ways of doing it over the last couple of weeks. But factory methods can't cover everything. 00:16:53.200 |
Because there's so many choices that you have to make about where do the files live, 00:16:58.160 |
and what's the structure they're in, and how do the labels appear, and 00:17:01.080 |
how do you split out the validation set, and how do you transform it, and so forth. 00:17:05.120 |
So we've got this unique API that I'm really proud of called the DataBlock API. 00:17:11.680 |
And the DataBlock API makes each one of those decisions a separate decision 00:17:16.160 |
that you make, there's separate methods and with their own parameters for 00:17:19.240 |
every choice that you make around how do I create and set up my data. 00:17:28.720 |
So for this Planet dataset, we would say: we've got a list of image files that are in a folder, and 00:17:33.760 |
they're labeled based on a CSV with this name. 00:17:39.040 |
Remember, I showed you back here that there's a space between them. 00:17:42.000 |
So by passing in separator, it's going to create multiple labels. 00:17:44.920 |
The images are in this folder, they have this suffix. 00:17:48.080 |
We're going to randomly split out a validation set with 20% of the data. 00:17:55.080 |
We'll create datasets from that, which we're then going to transform with these transformations. 00:17:58.840 |
And then we're going to create a data bunch out of that, 00:18:00.880 |
which we'll then normalize using these statistics. 00:18:06.080 |
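Written out in full, that chain looks roughly like this (a sketch using the fastai v1 data block API as it existed when this lesson was recorded; names like ImageFileList and random_split_by_pct changed in later versions, and tfms and path are the notebook's own variables):

    np.random.seed(42)
    src = (ImageFileList.from_folder(path)
           .label_from_csv('train_v2.csv', sep=' ',          # space-separated multi-labels
                           folder='train-jpg', suffix='.jpg')
           .random_split_by_pct(0.2))                        # 20% validation set
    data = (src.datasets()
            .transform(tfms, size=128)
            .databunch()
            .normalize(imagenet_stats))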
So to give you a sense of what that looks like, 00:18:11.280 |
the first thing I'm going to do is go back and explain what are all of 00:18:15.640 |
the PyTorch and FastAI classes that you need to know about that are going to 00:18:20.600 |
appear in this process, because you're going to see them all the time in 00:18:27.440 |
So the first one you need to know about is a class called a dataset. 00:18:33.840 |
And the dataset class is part of PyTorch, and 00:18:38.560 |
this is the source code for the dataset class. 00:18:42.640 |
As you can see, it actually does nothing at all. 00:18:46.080 |
So the dataset class in PyTorch defines two things: __getitem__ and __len__. 00:18:59.040 |
In Python, these special things that are underscore, underscore something, 00:19:02.680 |
underscore, underscore, Pythonistas call them dunder something. 00:19:09.440 |
And they're basically special magical methods that do some special behavior. 00:19:16.000 |
This particular method, you can look them up in the Python docs. 00:19:19.320 |
This particular method means that your object, if you had an object called o, 00:19:23.600 |
can be indexed with square brackets, like o[3], right? 00:19:28.200 |
So that would call __getitem__ with three as the index. 00:19:32.760 |
And then this one called __len__ means that you can go len(o) and get back how many items there are. 00:19:40.240 |
And you can see in this case, they're both not implemented. 00:19:43.120 |
So that is to say, although PyTorch says to use a dataset to tell it about your data, 00:19:52.400 |
it doesn't really do anything to help you create the dataset. 00:19:55.520 |
It just defines what the dataset needs to do. 00:19:57.800 |
So in other words, your data, the starting point for your data, 00:20:01.600 |
is something where you can say, what is the third item of data in my dataset? 00:20:07.080 |
So that's what __getitem__ does, and how big is my dataset? 00:20:11.040 |
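To make that concrete, here's a minimal sketch of a dataset that just wraps a pair of lists (a toy class, not part of fastai or PyTorch):

    from torch.utils.data import Dataset

    class PairDataset(Dataset):
        """A toy dataset: item i is the pair (xs[i], ys[i])."""
        def __init__(self, xs, ys):
            self.xs, self.ys = xs, ys
        def __getitem__(self, i):    # makes ds[3] work
            return self.xs[i], self.ys[i]
        def __len__(self):           # makes len(ds) work
            return len(self.xs)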
So fastai has lots of dataset subclasses that do that for all the different kinds of data you might have. 00:20:20.600 |
And so, so far, you've been seeing image classification datasets. 00:20:25.520 |
And so they're datasets where __getitem__ will return an image and its label. 00:20:37.280 |
Now, a dataset is not enough to train a model. 00:20:40.760 |
The first thing we know we have to do, if you think back to the gradient descent 00:20:45.800 |
tutorial last week, is we have to have a few images or 00:20:50.720 |
a few items at a time so that our GPU can work in parallel. 00:20:55.120 |
Remember, we do this thing called a mini-batch. 00:20:56.960 |
A mini-batch is a few items that we present to the model at a time, so that it can train on them in parallel. 00:21:02.200 |
So to create a mini-batch, we use another PyTorch class called a data loader. 00:21:12.760 |
And so a data loader takes a dataset in its constructor. 00:21:19.280 |
So it's now saying, here is something I can grab items from; it will 00:21:27.240 |
create a batch of whatever size you ask for, pop it on the GPU, and send it off to the model. 00:21:35.840 |
So a data loader is something that grabs individual items, 00:21:39.800 |
combines them into a mini-batch, pops them on the GPU for modeling. 00:21:43.160 |
So that's called a data loader and it comes from a dataset. 00:21:46.720 |
So you can see already there's kind of choices you have to make. 00:21:53.120 |
What kind of dataset is it, and where is its data gonna come from? 00:21:55.600 |
And then when I create my data loader, what batch size do I wanna use, right? 00:21:59.680 |
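In raw PyTorch that looks something like this (a sketch; fastai normally builds these for you, and its version also moves each batch to the GPU):

    from torch.utils.data import DataLoader

    # Wrap the dataset; shuffle the training set so each epoch sees a new order.
    train_dl = DataLoader(train_ds, batch_size=64, shuffle=True)
    xb, yb = next(iter(train_dl))    # one mini-batch of 64 items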
This still isn't enough to train a model, not really, 00:22:03.520 |
because we've got no way to validate the model. 00:22:05.920 |
If all we have is a training set, then we have no way to know how we're doing, 00:22:10.040 |
because we need a separate set of held out data. 00:22:13.000 |
A validation set to see how we're getting along. 00:22:15.800 |
So for that, we use a fast AI class called a data bunch. 00:22:21.200 |
And a data bunch is something which, as it says here, 00:22:23.800 |
binds together a training data loader, and a valid data loader. 00:22:36.360 |
When you see these monospaced names in the docs, they're always referring to some symbol you can look up elsewhere. 00:22:38.920 |
So in this case, you can see train_dl is here. 00:22:42.920 |
And there's no point knowing that there's an argument with a certain name unless you know what that argument has to be. 00:22:49.800 |
So you should always look after the colon to find out; here, it's a DataLoader. 00:22:56.280 |
you're basically giving it a training set data loader and a validation set data loader. 00:23:01.880 |
And that's now an object that you can send off to a learner and start fitting. 00:23:16.200 |
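Conceptually, then, it's just pairing the two loaders up, something like this (a sketch, with argument order assumed from the v1 docs being shown here):

    data = DataBunch(train_dl, valid_dl)       # binds the train and valid loaders together
    learn = create_cnn(data, models.resnet34)  # ...and off to a learner for fitting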
This stuff plus this line is all the stuff which is creating the data set. 00:23:25.720 |
So it's saying where did the images come from? 00:23:27.720 |
Cuz the data set, the index returns two things. 00:23:30.760 |
It returns the image and the labels, assuming it's an image data set. 00:23:38.480 |
And then I'm gonna create two separate data sets, the training and validation versions. 00:23:42.640 |
This is the thing that actually turns them into PyTorch data sets. 00:23:45.840 |
This is the thing that transforms them, okay? 00:23:49.280 |
And then this is actually gonna create the data loaders and the data bunch. 00:23:55.840 |
So let's look at some examples of this data block API. 00:24:00.280 |
Because once you understand the data block API, you'll never be lost for 00:24:04.520 |
how to convert your data set into something you can start modeling with. 00:24:08.000 |
So here's some examples of using the data block API. 00:24:13.520 |
So for example, if you're looking at MNIST, which remember is the pictures and 00:24:19.080 |
classes of handwritten numerals, you can do something like this. 00:24:26.920 |
This, what kind of data set is this gonna be? 00:24:30.120 |
It's gonna come from a list of image files, which are in some folder. 00:24:36.120 |
And they're labeled according to the folder name that they're in. 00:24:42.160 |
And then we're gonna split it into train and validation, 00:24:46.680 |
according to the folder that they're in, train and validation. 00:24:53.440 |
You can optionally add a test set too; we're gonna be talking more about test sets later in the course. 00:24:56.120 |
Okay, we'll convert those into PyTorch data sets, now that that's all set up. 00:25:01.280 |
We'll then transform them using this set of transforms. 00:25:07.200 |
And we're gonna transform into something of this size. 00:25:12.360 |
And then we're gonna convert them into a data bunch. 00:25:14.240 |
So each of those stages inside these parentheses are various parameters you 00:25:19.640 |
can pass to customize how that all works, right? 00:25:22.560 |
But in the case of something like this MNIST data set, 00:25:25.560 |
all the defaults pretty much work, so this is all fine. 00:25:33.120 |
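Roughly, that MNIST version of the chain reads like this (same caveat about the v1-era API names):

    data = (ImageFileList.from_folder(path)
            .label_from_folder()         # label = name of the containing folder
            .split_by_folder()           # 'train'/'valid' folders define the split
            .datasets()
            .transform(tfms, size=64)
            .databunch())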
So data.train_ds is the data set, not the data loader. 00:25:37.440 |
So I can actually index into it with a particular number. 00:25:40.160 |
So here is the zero indexed item in the training data set. 00:25:47.320 |
We can use show_batch to see an example of the pictures in it. 00:25:51.600 |
Here are the classes that are in that data set. 00:25:55.600 |
And this little cut down sample of MNIST just has threes and sevens. 00:26:05.040 |
Here's another example with Planet. This is actually, again, a little subset of Planet we use to make it easy to experiment with. 00:26:10.640 |
So in this case, again, it's an image file list. 00:26:16.040 |
This time we're labeling it based on a CSV file. 00:26:25.760 |
We're gonna use a smaller size and then create a data bunch. 00:26:32.600 |
And so data bunches know how to draw themselves, amongst other things. 00:26:38.680 |
So here's some more examples we're gonna be seeing later today. 00:26:42.800 |
What if we look at this data set called CamVid? 00:26:48.720 |
It contains pictures, and every pixel in the picture is color coded, right? 00:26:54.600 |
So in this case, we have a list of files in a folder, and 00:26:59.400 |
we're gonna label them, in this case, using a function. 00:27:03.480 |
And so this function is basically the thing, we're gonna see it later, 00:27:06.800 |
which tells it where to find the color-coded mask for each image. 00:27:10.760 |
Randomly split it in some way, create some data sets in some way. 00:27:17.960 |
We can tell it for our particular list of classes. 00:27:21.960 |
How do we know what pixel value one versus pixel value two is? 00:27:26.160 |
And that was something that we can basically read in, like so. 00:27:34.440 |
You can optionally pass in things like what batch size do you want. 00:27:38.120 |
And again, it knows how to draw itself, and you can start learning with that. 00:27:41.960 |
For one more example, what if we wanted to create something like this? 00:27:46.480 |
It has bounding boxes around things like the chair, and remote control, and book. 00:27:53.680 |
So again, we've got a little minimal CoCo data set. 00:27:57.120 |
CoCo is kind of the most famous academic data set for object detection. 00:28:03.720 |
Grab a list of files from a folder, label them according to this little function. 00:28:09.120 |
Randomly split them, create an object detection data set, create a data bunch. 00:28:14.400 |
In this case, as you'll learn when we get to object detection, 00:28:16.920 |
you have to use generally smaller batch sizes, or you'll run out of memory. 00:28:20.520 |
And as you'll also learn, you have to use something called a collation function. 00:28:24.760 |
And once that's all done, we can again show it, and start training with it. 00:28:31.880 |
So here's a really convenient notebook, where will you find this? 00:28:38.920 |
Remember how I told you that all of the documentation comes from notebooks? 00:28:42.640 |
You'll find them in your fast AI repo in docs_source. 00:28:47.160 |
So this notebook, which you can play with to experiment with inputs and outputs and 00:28:51.400 |
try all the different parameters, is where you will find the data block API examples of use. 00:28:56.360 |
If you go to the documentation, here it is, the data block API examples of use. 00:29:01.440 |
All right, so remember, everything that you wanna use in fastai is documented this way. 00:29:22.840 |
And so once you find some documentation that you actually wanna try playing with, 00:29:29.920 |
you can open up a notebook with the same name in the fastai repo and experiment with it. 00:29:36.360 |
So that's a quick overview of this really nice data block API. 00:29:42.520 |
And there's lots of documentation for all of the different ways you can 00:29:46.760 |
label inputs, and split data, and create data sets, and so forth. 00:29:50.000 |
And so that's what we're using for Planet, okay? 00:29:57.560 |
You'll see in the documentation these two steps we had all joined up together. 00:30:04.680 |
We can certainly do that here too, but you'll learn in a moment why it is that 00:30:10.240 |
we're actually splitting these up into two separate steps, which is fine as well. 00:30:14.360 |
So, a few interesting points about this. First, transforms. 00:30:26.640 |
Remember, you can hit Shift + Tab to get all the information, right? 00:30:29.720 |
Transforms by default will randomly flip each image, right? 00:30:37.000 |
But they'll actually randomly only flip them horizontally, which makes sense, right? 00:30:42.120 |
If you're trying to tell if something's a cat or a dog, 00:30:44.200 |
doesn't matter whether it's pointing left or right, but you wouldn't expect it to be upside down. 00:30:48.400 |
On the other hand, satellite imagery, whether something's cloudy or hazy, or 00:30:52.480 |
whether there's a road there or not, could absolutely be flipped upside down. 00:30:55.920 |
There's no such thing as a right way up in space. 00:30:58.440 |
So flip_vert, which defaults to false, we're going to set to true. 00:31:03.320 |
To say like, yeah, randomly, you should actually do that. 00:31:07.480 |
It actually tries also each possible 90 degree rotation. 00:31:10.680 |
So there are eight possible kind of symmetries that it tries out. 00:31:17.880 |
I've found that these particular settings work pretty well for Planet. 00:31:27.560 |
Perspective warping is something which very few libraries provide, and 00:31:31.280 |
those that do provide it, it tends to be really slow. 00:31:33.560 |
I think fast AI is the first one to provide really fast perspective warping. 00:31:38.600 |
And basically the reason this is interesting is if I kind of look at you 00:31:41.880 |
from below versus look at you from above, your shape changes, right? 00:31:49.280 |
And so when you're taking a photo of a cat or a dog, sometimes you'll be higher, 00:31:54.080 |
sometimes you'll be lower, then that kind of change of shape is certainly something 00:31:58.920 |
that you would want to include as you're creating your training batches. 00:32:03.680 |
You want to modify it a little bit each time. 00:32:09.360 |
A satellite always points straight down at the planet. 00:32:16.160 |
So if you added perspective warping here, you would be making changes that aren't going to be there in real life. 00:32:21.120 |
So this is all something called data augmentation. 00:32:23.720 |
We'll be talking a lot more about it later in the course. 00:32:28.800 |
But this gives you a sense of the kinds of things that you can do to augment your data. 00:32:33.040 |
And in general, maybe the most important one is if you're looking at astronomical 00:32:37.440 |
data or kind of pathology, digital slide data or satellite data. 00:32:43.920 |
Data where there isn't really an up or a down, turning on flipvert equals true 00:32:48.840 |
is generally going to make your models generalize better. 00:32:52.800 |
Okay, so here's the steps necessary to create our data bunch. 00:33:03.840 |
And now we can create a satellite imagery classifier, a multi-label classifier, that's going to figure out, for 00:33:08.280 |
each satellite tile what's the weather and what else, what can I see in it. 00:33:14.960 |
Everything else that you've already learnt is going to be exactly nearly the same. 00:33:20.720 |
Here it is: learn = create_cnn(data, arch), right? 00:33:27.440 |
And in this case, when I first built this notebook, I used ResNet-34 as usual. 00:33:33.480 |
Then I tried ResNet-50, as I always like to do. 00:33:36.720 |
I found ResNet-50 helped a little bit, and I had some time to run it, so I used it here. 00:33:43.680 |
There's one more change I make, which is metrics. 00:33:49.080 |
Now to remind you, a metric has got nothing to do with how the model trains. 00:33:55.000 |
Changing your metrics will not change your resulting model at all. 00:33:59.520 |
The only thing that we use metrics for is we print them out during training. 00:34:05.920 |
So here, as well as accuracy, it's printing out this other metric called fbeta. 00:34:08.240 |
So if you're trying to figure out how to do a better job with your model, 00:34:13.160 |
changing the metrics will never be something that you need to do there. 00:34:25.400 |
You can pass in a list of multiple metrics to be printed out as your model's training. 00:34:32.640 |
The first thing I wanna know is the accuracy. 00:34:35.880 |
And the second thing I wanna know is how would I go on Kaggle? 00:34:40.240 |
And Kaggle told me that I'm gonna be judged on a particular metric called the F score. 00:34:46.600 |
So I'm not gonna bother telling you about the F score. 00:34:48.760 |
It's not really interesting enough to be worth spending your time on. 00:34:54.840 |
But the basic idea is this: when you have a classifier, you're gonna have some false positives and some false negatives. 00:35:01.280 |
How do you weigh up those two things to kind of create a single number? 00:35:05.360 |
There's lots of different ways of doing that. 00:35:07.040 |
And something called the F score is basically a nice way of combining them into a single number. 00:35:14.040 |
And there are various kinds of F scores, F1, F2, and so forth. 00:35:18.880 |
And Kaggle said, in the competition rules, we're gonna use a metric called F2. 00:35:31.920 |
So fastai has a metric called fbeta; in other words, it's F with 1 or 2 or whatever, depending on the value of beta. 00:35:39.840 |
And you can see that it's got a threshold and a beta. 00:35:47.600 |
And Kaggle said that they're gonna use F2, so I don't have to change that. 00:35:51.760 |
But there's one other thing that I need to set, which is a threshold. 00:36:01.360 |
Do you remember we had a little look the other day at the source code for the accuracy metric? 00:36:08.440 |
So if you put two question marks, you get the source code. 00:36:11.520 |
And we found that it used this thing called argmax. 00:36:14.360 |
And the reason for that, if you remember, was we kind of 00:36:21.320 |
had this input image that came in, and it went through our model. 00:36:25.880 |
And at the end, it came out with a table of ten numbers, right? 00:36:32.920 |
This is like if we're doing MNIST digit recognition. 00:36:35.040 |
The ten numbers were like the probability of each of the possible digits. 00:36:42.200 |
And so then we had to look through all of those and find the biggest one. 00:36:51.560 |
Argmax is just math notation for the thing that finds the biggest value and returns its index. 00:37:02.880 |
So the accuracy function used argmax to find out, behind the scenes, 00:37:08.080 |
which class ID pet was the one that we were looking at. 00:37:13.000 |
And then it compared that to the actual and then took the average. 00:37:23.400 |
We can't do that for satellite recognition in this case, 00:37:28.200 |
because there isn't one label we're looking for. 00:37:32.960 |
So instead, what we do is look at each of the probabilities. 00:37:37.280 |
So I don't know if you remember, but a data bunch has a special attribute called c. 00:37:49.960 |
And c is gonna be basically how many outputs do we want our model to create? 00:37:55.440 |
And so for any kind of classifier, we want one probability for 00:38:00.800 |
So in other words, data.c for classifiers is always gonna be equal to the number of classes; here, that's 17. 00:38:14.520 |
So we're gonna have one probability for each of those. 00:38:18.640 |
But then we're not just gonna pick out one of those 17. 00:38:24.240 |
And so what we do is we compare each probability to some threshold. 00:38:29.000 |
And then we say anything that's higher than that threshold, 00:38:31.800 |
we're gonna assume that the model's saying it does have that feature. 00:38:38.000 |
I found that for this particular data set, a threshold of 0.2 seems to generally work pretty well. 00:38:46.760 |
This is the kind of thing you can easily just experiment to find a good threshold. 00:38:50.320 |
So I decided I wanted to print out the accuracy at a threshold of 0.2. 00:38:55.880 |
So the normal accuracy function doesn't work that way. 00:39:01.040 |
We have to use a different accuracy function called accuracy_thresh. 00:39:05.560 |
And that's the one that's gonna compare every probability to a threshold and 00:39:09.040 |
return all the things higher than that threshold and compare accuracy that way. 00:39:13.200 |
And so one of the things we would pass in is thresh. 00:39:16.960 |
Now of course, our metric is gonna be calling our function for us. 00:39:23.320 |
So we don't get to tell it every time it calls back what threshold do we want. 00:39:28.520 |
So we really wanna create a special version of this function 00:39:32.360 |
that always uses a threshold of 0.2. 00:39:36.520 |
So one way to do that would be to go define something called accuracy_02 that 00:39:45.760 |
returns accuracy_thresh with that threshold filled in. 00:39:58.760 |
But it's so common that you wanna kind of say, create a new function that's just like an existing one, except 00:40:06.200 |
we're always gonna call it with a particular parameter, that Python has a standard way to do it. 00:40:10.680 |
It's called a partial, it's called a partial function application. 00:40:13.480 |
And so Python 3 has something called partial that takes some function and 00:40:20.720 |
some list of keywords and values and creates a new function. 00:40:25.920 |
That is exactly the same as this function, but 00:40:28.280 |
is always gonna call it with that keyword argument. 00:40:31.680 |
So here, this is exactly the same thing as the thing I just typed in. 00:40:35.560 |
acc_02 is now a new function that calls accuracy_thresh with a threshold of 0.2. 00:40:40.760 |
And so this is a really common thing to do, particularly with the fastAI library, 00:40:45.120 |
cuz there's lots of places where you have to pass in functions. 00:40:49.600 |
And you very often wanna pass in a slightly customized version of a function. 00:41:03.280 |
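Here's that pattern as a sketch (the metric names match the ones used in this lesson's notebook; fbeta's beta defaults to 2 in fastai v1, which is what Kaggle wants):

    from functools import partial

    acc_02  = partial(accuracy_thresh, thresh=0.2)   # accuracy at a fixed threshold of 0.2
    f_score = partial(fbeta, thresh=0.2)             # F2 score, thresholded the same way
    learn = create_cnn(data, models.resnet50, metrics=[acc_02, f_score])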
And I can then go ahead and do all the normal stuff. 00:41:07.320 |
lr_find, recorder.plot, find the thing with the steepest slope. 00:41:13.520 |
So I don't know, somewhere around 1e-2, so we'll make that our learning rate. 00:41:18.880 |
And then fit for a while, 5 epochs at that lr, and see how we go, okay? 00:41:25.000 |
And so we've got an accuracy of about 96% and an fbeta of about 0.926. 00:41:31.680 |
And so you could then go and have a look at the Planet competition leaderboard to see how that compares. 00:41:45.640 |
So we can say we're on the right track, okay, we're doing fine. 00:41:51.520 |
So as you can see, once you get to a point that the data's there, 00:41:56.560 |
there's very little extra to do most of the time. 00:42:00.000 |
>> So when your model makes an incorrect prediction in a deployed app, 00:42:08.760 |
is there a way to record that mistake and use that learning to improve the model in a more targeted way? 00:42:16.080 |
So the first bit, is there a way to record that? 00:42:18.240 |
Of course there is, you record it, that's up to you, right? 00:42:22.280 |
You'll need to have your user tell you, you were wrong. 00:42:28.440 |
This Australian car, you said it was a Holden, and actually it's a Falcon. 00:42:32.920 |
So first of all, you'll need to collect that feedback. 00:42:35.320 |
And the only way to do that is to ask the user to tell you when it's wrong. 00:42:39.280 |
So you now need to record in some log somewhere, 00:42:41.520 |
something saying: this was the file, I've stored it here, this was my prediction, and this is what the user said it should have been. 00:42:51.080 |
And then at the end of the day or at the end of the week, 00:42:54.360 |
you could set up a little job to run something, or you can manually run something, to fine-tune your model. 00:43:07.680 |
So let's pretend here's your saved model, right? You'd load it up and fit a bit more. 00:43:18.160 |
Now in this case, I'm fitting with my original data set. 00:43:21.560 |
But you could create a new data bunch with just the misclassified instances and fit on that. 00:43:29.600 |
And the misclassified ones are likely to be particularly interesting. 00:43:34.320 |
So you might want to fit at a slightly higher learning rate, 00:43:36.880 |
in order to make them kind of really mean more. 00:43:39.360 |
Or you might want to run them through a few more epochs. 00:43:43.640 |
You just call fit with your misclassified examples. 00:43:50.640 |
And that should really help your model quite a lot. 00:43:53.920 |
There are various other tweaks you can do to this, but that's the basic idea. 00:43:59.040 |
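As a sketch of that feedback loop ('stage-2' and feedback_data are hypothetical: a checkpoint saved before deployment, and a data bunch built from the logged misclassified images):

    learn = create_cnn(data, models.resnet50)
    learn.load('stage-2')            # load the deployed model's weights
    learn.data = feedback_data       # swap in just the misclassified examples
    learn.fit_one_cycle(2)           # perhaps at a slightly higher learning rate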
>> Next question, could someone talk a bit more about the data block ideology? 00:44:04.840 |
I'm not quite sure how the blocks are meant to be used. 00:44:09.720 |
Is there any other library that uses this type of programming that I could look at? 00:44:13.640 |
>> Yes, they do have to be in a certain order. 00:44:25.600 |
And it's basically the order that you see in the example of use, right? 00:44:43.800 |
What kind of data do you have, where does it come from, how do you label it, how do you split it, and then how do you create a data bunch from it? 00:44:55.200 |
I don't know if other people have independently invented it. 00:44:58.840 |
The basic idea of kind of a pipeline of things that dot into each other is quite common. 00:45:11.240 |
Not so much in Python, but you see it more in JavaScript. 00:45:15.160 |
Although this kind of approach, where each stage produces something slightly different, 00:45:21.080 |
You tend to see it more in ETL software, like Extraction Transformation and 00:45:26.960 |
Loading Software, where there's kind of particular stages in a pipeline. 00:45:30.000 |
So yeah, I mean, it's been inspired by a bunch of things. 00:45:32.600 |
But yeah, all you need to know is kind of use this example to guide you, and 00:45:41.240 |
then look up the documentation to see which particular kind of thing you want. 00:45:46.200 |
And in this case, the image file list, you're actually not going to find 00:45:51.760 |
the documentation for ImageFileList in the data blocks documentation, 00:45:54.840 |
because this is specific to the vision application. 00:45:58.040 |
So to then go and actually find out how to do something for 00:46:01.160 |
your particular application, you would then go to look at text and vision and 00:46:06.080 |
so forth, and that's where you can find out what are the data block API pieces 00:46:12.880 |
And of course, you can then look at the source code. 00:46:17.040 |
If you wanted to, you could create your own version of any of these stages. 00:46:21.640 |
Pretty much all of these functions are, you know, very few lines of code. 00:46:27.680 |
Maybe we could look at an example of one, image list from folder. 00:46:36.560 |
So let's just put that somewhere temporary and take a look at it. 00:46:44.280 |
Then you can look at the documentation to see exactly what that does, and experiment with it. 00:46:55.960 |
If you wanted to create a data frame, a pandas data frame from something 00:47:00.320 |
other than the CSV, you now know that you could actually just call 00:47:03.600 |
label from data frame, and you can look up to find what that does. 00:47:07.520 |
And as you can see, most fast AI functions are no more than a few lines of code. 00:47:14.160 |
They're normally pretty straightforward, so you can see what all the pieces are and how they fit together. 00:47:20.920 |
And it's probably one of these things that as you play around with it, 00:47:25.320 |
you'll get a good sense of how it all gets put together. 00:47:28.560 |
But if during the week there are particular things where you're thinking, 00:47:31.800 |
I don't understand how to do this, please let us know and we'll try to help you. 00:47:37.260 |
>> What resources do you recommend for getting started with video, for 00:47:43.200 |
example, being able to pull frames and submit them to your model? 00:47:46.240 |
>> I guess, I mean, the answer is it depends. 00:47:57.240 |
If you're using the web, which I guess probably most of you will be, 00:48:03.800 |
then there's web APIs that basically do that for you. 00:48:08.400 |
So you can grab the frames with the web API and 00:48:12.840 |
then they're just images which you can pass along. 00:48:15.920 |
If you're doing it client side, I guess most people tend to use OpenCV for that. 00:48:21.600 |
But maybe people during the week who are doing these video apps can tell us what 00:48:26.840 |
have you used and found useful, and we can start to prepare something in the lesson 00:48:30.540 |
wiki with a list of video resources, since it sounds like some people are interested. 00:48:35.160 |
Okay, so just like usual, we unfreeze our model and 00:48:44.360 |
then we fit some more, and we get to 0.929-ish. 00:48:51.480 |
So one thing to notice here is that before we unfreeze, 00:48:58.740 |
you'll tend to get this shape pretty much all the time. 00:49:01.040 |
If you do your learning rate finder before you unfreeze, it's pretty easy. 00:49:04.040 |
You know, find the steepest slope, not the bottom, right? 00:49:07.320 |
Remember we're trying to find the bit where we can like slide down it quickly. 00:49:10.800 |
So if you start at the bottom, it's just gonna send you straight off to the end 00:49:14.000 |
here, so somewhere around here, and then we can call it again after you unfreeze. 00:49:23.120 |
And you'll generally get a very different shape, right? 00:49:25.840 |
And this is a little bit harder to say what to look for, because it tends to be 00:49:30.120 |
this kind of shape where you get a little bit of upward and 00:49:32.200 |
then a kind of very gradual downward and then up here. 00:49:35.120 |
So, you know, I tend to kind of look for just before it shoots up and 00:49:40.480 |
go back about 10x, right, as a kind of a rule of thumb, so 1e-5, right? 00:49:45.960 |
And that is what I do for the first half of my slice. 00:49:50.440 |
And then for the second half of my slice, I normally do whatever learning rate I 00:49:55.320 |
used for the frozen part, so Lr, which was 0.01, 00:50:02.880 |
kind of divided by 5, or divided by 10, somewhere around that. 00:50:08.960 |
Look for the bit kind of at the bottom, find about 10x smaller. 00:50:12.840 |
That's the number that I put here, and then lr/5 or lr/10 is kind of what I use for the second part of the slice. 00:50:21.880 |
We'll be talking more about exactly what's going on here. 00:50:24.120 |
This is called discriminative learning rates as the course continues. 00:50:33.720 |
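In code, that rule of thumb comes out as something like this (a sketch, with lr = 0.01 from the frozen stage):

    learn.unfreeze()
    # First half of the slice: about 10x before the point where the plot shoots up.
    # Second half: the frozen-stage learning rate divided by 5 or 10.
    learn.fit_one_cycle(5, slice(1e-5, lr/5))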
So how are we doing compared to everyone else in this competition? 00:51:04.200 |
I don't know if you remember, but when I created my data set, 00:51:08.960 |
I put size equals 128, and actually the images that Kaggle gave us are 256. 00:51:16.360 |
So I used the size of 128 partially cuz I wanted to experiment quickly. 00:51:21.800 |
It's much quicker and easier to use small images to experiment. 00:51:27.720 |
I now have a model that's pretty good at recognizing 128 by 128 satellite images. 00:51:35.920 |
So what am I gonna do if I now wanna create a model that's pretty good at 256 by 256 satellite images? 00:51:46.640 |
Why don't I start with a model that's good at 128 by 128 images and fine-tune that, rather than starting again? 00:51:55.160 |
And that's actually gonna be really interesting because if I'm trained 00:52:00.240 |
quite a lot, if I'm on the verge of overfitting, which I don't wanna do, right? 00:52:05.000 |
Then I'm basically creating a whole new data set effectively, 00:52:08.800 |
one where my images are twice the size on each axis, right? 00:52:14.520 |
So it's really a totally different data set as far as my convolutional neural 00:52:18.960 |
So I kind of get to lose all that overfitting; I get to start again. 00:52:27.740 |
Well, let's keep our same learner, but use a new data bunch where the images are 256 by 256. 00:52:34.840 |
So that's why I actually stopped here, right, before I created my data sets. 00:52:45.440 |
I'm gonna create a new data bunch with 256 instead. 00:52:52.760 |
So here it is, take that source, right, take that source, 00:52:58.600 |
transform it with the same transforms as before, but this time use size 256. 00:53:04.480 |
Now that should be better anyway, because this is gonna be higher resolution images. 00:53:09.040 |
But also I'm gonna start with, I haven't got rid of my learner, 00:53:12.240 |
it's the same learner I had before, so I'm gonna start with this kind of pre-trained 00:53:15.680 |
model, and so I'm gonna replace the data inside my learner with this new data bunch. 00:53:25.200 |
Then I'll freeze again, which means I'm going back to just training the last few layers. 00:53:29.840 |
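A sketch of that whole swap, reusing the src split from earlier:

    data = (src.datasets()
            .transform(tfms, size=256)   # same transforms, bigger images
            .databunch()
            .normalize(imagenet_stats))
    learn.data = data                    # new data inside the existing learner
    learn.freeze()                       # back to training just the last layers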
And I will do a new LR find, and because I actually now have a pretty good model, 00:53:35.560 |
like it's pretty good for 128 by 128, so it's probably gonna be like at least okay 00:53:41.800 |
for 256 by 256, I don't get that same sharp shape that I did before. 00:53:46.960 |
But I can certainly see where it's way too high, right? 00:53:51.240 |
So, I'm gonna pick something well before where it's way too high. 00:53:57.560 |
So here I'm gonna go 1e-2 divided by 2, that seems well before it shoots up. 00:54:06.900 |
So we've frozen again, so we're just training the last few layers and 00:54:11.160 |
And as you can see, very quickly, remember, 0.928 was where we got to before. 00:54:17.680 |
We're straight up there, and suddenly we've passed 0.93, all right? 00:54:22.280 |
So we're now already kind of into the top 10%, so we've hit our first goal, right? 00:54:31.160 |
We're doing, we're at the very least pretty confident 00:54:34.960 |
at the problem of just recognizing satellite imagery. 00:54:37.440 |
But of course now, we can do the same thing before. 00:54:40.280 |
We can unfreeze and train a little more, okay? 00:54:44.840 |
Again, using the same kind of approach I described before, 00:54:47.880 |
we use lr/5 here, and an even smaller one here, train a little bit more, and get 0.9314. 00:55:10.840 |
So you can see actually when my friend Brendan and 00:55:13.360 |
I entered this competition, we came 22nd with 0.9315. 00:55:17.200 |
And we spent, this was a year or two ago, months trying to get here. 00:55:22.600 |
So using pretty much the defaults, with the minor tweaks I described, 00:55:30.960 |
you can kind of get right up into the top of the leaderboard of this very competitive competition. 00:55:36.480 |
Now, I should say we don't really know where we'd be. 00:55:41.520 |
We'd actually have to check it on the test set that Kaggle gave us and 00:55:44.600 |
actually submit to the competition, which you can do. 00:55:48.360 |
And so later on in the course, we'll learn how to do that. 00:56:00.400 |
And so you can see also as I kind of go along, I tend to save things. 00:56:06.000 |
I just, you can name your models whatever you like. 00:56:08.840 |
But I just want to basically know, was it kind of before or after the unfreeze? 00:56:12.960 |
So I call it stage-1 or stage-2, plus what size I was training on. 00:56:18.520 |
So that way, I can kind of always go back and experiment pretty easily. 00:56:23.280 |
So that's Planet, multi-label classification. 00:56:33.480 |
The next example we're going to look at is this data set called CamVid. 00:56:40.040 |
And it's going to be doing something called Segmentation. 00:56:42.280 |
We're going to start with a picture like this. 00:56:45.200 |
And we're going to try and create a color coded picture like this. 00:56:49.000 |
Where all of the bicycle pixels are the same color. 00:56:52.280 |
All of the road line pixels are the same color. 00:56:57.200 |
All of the building pixels are the same color. 00:56:58.840 |
The sky is the same color, and so forth, okay? 00:57:01.600 |
Now, we're not actually going to make them colors. 00:57:04.600 |
We're actually going to do it where each of those pixels has a unique number. 00:57:14.960 |
So, for example, the top right is trees, and tree is 26, and so forth, all right? 00:57:19.720 |
So in other words, for this single top left pixel, like I mentioned, 00:57:27.440 |
we're going to do a classification problem, just like the pets classification, for that one pixel. 00:57:32.640 |
We're going to say, what is that top left pixel? 00:57:35.720 |
Is it bicycle, road lines, sidewalk, building? 00:57:46.080 |
So we're going to do a little classification problem for every single pixel in every single image. 00:57:59.380 |
In order to build a segmentation model, you actually need to download or create a dataset 00:58:08.440 |
where someone has actually labeled every pixel. 00:58:13.400 |
So as you can imagine, that's a lot of work, okay? 00:58:18.920 |
You're probably not going to create your own segmentation datasets, but you're probably 00:58:23.480 |
going to download or find them from somewhere else. 00:58:25.780 |
This is very common in medicine, life sciences. 00:58:30.040 |
You know, if you're looking through slides at nuclei, it's very likely you already have 00:58:35.560 |
a whole bunch of segmented cells and segmented nuclei. 00:58:40.800 |
If you're in radiology, you probably already have lots of examples of segmented lesions 00:58:46.520 |
There are a lot of different domain areas where there are domain-specific segmentation datasets available. 00:58:57.680 |
As you could guess from this example, it's also very common in kind of self-driving cars 00:59:03.520 |
and stuff like that where you need to see, you know, what objects are around and where 00:59:09.960 |
In this case, there's a nice dataset called CanvaD, which we can download, and they have 00:59:17.920 |
already got a whole bunch of images and segment masks prepared for us, which is pretty cool. 00:59:26.240 |
And remember, pretty much all of the datasets that we have provided kind of inbuilt URLs 00:59:33.480 |
for, you can see their details at course.fast.ai/datasets, and nearly all of them are academic datasets 00:59:45.440 |
where some very kind people have gone to all of this trouble for us so that we can use 00:59:50.600 |
this dataset and made it available for us to use. 00:59:54.260 |
So if you do use one of these datasets for any kind of project, it would be very, 00:59:58.960 |
very nice if you were to go and find the citation and say, you know, thanks to these people 01:00:05.440 |
for this dataset, okay, because they've provided it, and all they're asking in return is for credit. 01:00:12.880 |
Okay, so here is the CamVid dataset, here is the citation, and on our datasets page 01:00:17.760 |
that will link to the academic paper where it came from. 01:00:21.240 |
Okay, Rachel, now is a good time for a question. 01:00:26.440 |
>> Is there a way to use learn.lr/find and have it return a suggested number directly 01:00:34.720 |
rather than having to plot it as a graph and then pick a learning rate by visually inspecting it? 01:00:40.240 |
And then there are a few other questions, I think, around more guidance on reading the learning rate finder graph. 01:00:51.240 |
>> The short answer is no, and the reason the answer is no is because this is still a bit more artisanal than I would like. 01:00:58.440 |
As you can kind of see, I've been kind of saying how I read this learning rate graph 01:01:01.960 |
depends a bit on what stage I'm at and kind of what the shape of it is. 01:01:08.720 |
I guess, like, when you're just training the head, so before you unfreeze, it pretty much always looks like this. 01:01:17.760 |
And you could certainly create something that, you know, creates 01:01:21.040 |
a smoothed version of this, finds the sharpest negative slope, and picks that. 01:01:26.560 |
You would probably be fine nearly all the time. 01:01:31.160 |
But then for, you know, these kinds of ones, it requires a certain amount of experimentation. 01:01:38.200 |
But the good news is you can experiment, right? 01:01:43.240 |
Obviously, if the line's going up, you don't want it. 01:01:47.320 |
And certainly, at the very bottom point, you don't want it, right, because you need it to still be sloping downwards. 01:01:53.160 |
But if you kind of start with somewhere around 10x smaller than that, and then also you could 01:01:57.920 |
try another 10x smaller than that, try a few numbers and find out which ones work best. 01:02:03.600 |
And within a small number of weeks, you will find that you're picking a good learning rate most of the time. 01:02:13.520 |
It's kind of -- so at this stage, it still requires a bit of playing around to get a 01:02:17.280 |
sense of the different kinds of shapes that you see and how to respond to them. 01:02:22.640 |
Maybe by the time this video comes out, someone will have built a pretty reliable automatic learning rate finder. 01:02:30.680 |
It's probably not a massively difficult job to do; it'd be an interesting project: collect 01:02:37.760 |
a whole bunch of different datasets, maybe grab all the datasets from our datasets page, 01:02:42.980 |
try and come up with some simple heuristic, and compare it to all the different lessons I've shown; that would be a great project. 01:03:13.160 |
Okay, so back to CamVid. Basically we're going to start with some path which has got some information in it. 01:03:19.560 |
So I always start by, you know, untarring my data, doing ls, and seeing what I was given. 01:03:25.160 |
In this case, there's a folder called labels and a folder called images. 01:03:36.480 |
And you know, at this point, like, you can see there's some kind of coded file names 01:03:41.400 |
for the images and some kind of coded file names for the segment masks. 01:03:46.960 |
And then you kind of have to figure out how to map from one to the other. 01:03:50.280 |
You know, normally these kind of datasets will come with a readme you can look at, or some documentation. 01:03:56.680 |
In this case, I can see, like, these ones always have this kind of particular format. 01:04:02.160 |
These ones always have exactly the same format with an underscore P. 01:04:05.660 |
So I kind of -- when I did this, honestly, I just guessed. 01:04:08.440 |
I thought, oh, it's probably the same thing, underscore P. 01:04:11.800 |
And so I created a little function that basically took the file name and added the underscore 01:04:20.680 |
And I tried opening it and I noticed it worked. 01:04:23.400 |
So I've created this little function that converts from the image file names to the corresponding label file names. 01:04:35.920 |
Normally we use open image to open a file and then you can go .show to take a look at 01:04:43.640 |
But this -- as we described, this is not a usual image file. 01:04:51.400 |
So you have to use open_mask rather than open_image, because we want to return integers, not floats. 01:04:57.960 |
And fast AI knows how to deal with masks, so if you go mask.show, it will automatically 01:05:04.400 |
color code it for you in some appropriate way. 01:05:09.080 |
So we can kind of have a look inside, look at the data, see what the size is. 01:05:16.440 |
We can take a look at the data inside and so forth. 01:05:22.200 |
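To make that concrete, here's a minimal sketch of those steps, assuming fastai v1 and that path_img and path_lbl point at the images and labels folders (those two names are assumptions for illustration, not necessarily what the actual notebook uses):

```python
from fastai.vision import *

# The filename mapping guessed above: image 'foo.png' maps to mask 'foo_P.png'.
get_y_fn = lambda x: path_lbl/f'{x.stem}_P{x.suffix}'

img_f = get_image_files(path_img)[0]
mask = open_mask(get_y_fn(img_f))      # open_mask keeps integer codes, not floats
mask.show(figsize=(5, 5), alpha=1)     # fastai color-codes the mask automatically
src_size = np.array(mask.shape[1:])    # keep the source size around for later
print(src_size, mask.data)             # inspect the size and the raw pixel codes
```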
The other thing you might have noticed is that they gave us a file called codes.txt. 01:05:28.840 |
So codes.txt, we can load it up and have a look inside. 01:05:32.600 |
And not surprisingly, it's got a list telling us what each number means -- for example, number 4 (counting 0, 1, 2, 3, 4) is building. 01:05:43.760 |
So just like we had, you know, grizzlies, black bears and teddies, here we've got the 01:05:48.040 |
coding for what each one of these pixels means. 01:05:56.200 |
So to create a data bunch, we can go through the data block API and say okay, we've got 01:06:03.960 |
a list of files in a folder. We need to create labels, which we can do with that get-y-filename function we just 01:06:10.560 |
created. We then need to split into training and validation. In this case, I don't do that randomly. Why not? 01:06:17.040 |
Because actually the pictures they've given us are frames from videos. 01:06:21.320 |
So if I did them randomly, I would be having like two frames next to each other, one in the validation set and one in the training set, and that would be far too easy. 01:06:30.360 |
So the people that created this data set actually gave us a data set saying here is the list 01:06:35.720 |
of file names that are meant to be in your validation set. 01:06:38.920 |
And they're non-contiguous parts of the video. 01:06:42.260 |
So here's how you can split your validation and training using a file name file. 01:06:53.680 |
And so I actually have a list of class names. 01:06:58.000 |
So like often with stuff like the planet data set or the pets data set, we actually have 01:07:02.880 |
a string saying this is a pug or this is a ragdoll or this is a Birman or this is cloudy 01:07:11.120 |
In this case, you don't have every single pixel labeled with an entire string. 01:07:17.680 |
They're each labeled with just a number, and then there's a separate file telling you what those numbers mean. 01:07:22.800 |
So here's where we get to tell the data block API: this is the list of what the codes mean. 01:07:29.000 |
So these are the kind of parameters that the data block API gives you. 01:07:37.480 |
Remember I told you that, for example, sometimes we randomly flip an image, right? 01:07:43.640 |
What if we randomly flip the independent variable image but we don't also randomly flip the target mask? They wouldn't match anymore. 01:07:55.120 |
So we need to tell fast.ai that I want to transform the Y. 01:08:00.600 |
So X is our independent variable, Y is our dependent variable. 01:08:05.300 |
So whatever you do to the X, I also want you to do to the Y. 01:08:08.620 |
So there's all these little parameters that we can play with and I can create a data bunch. 01:08:14.640 |
I'm using a smaller batch size because as you can imagine, because I'm creating a classifier 01:08:19.080 |
for every pixel, that's going to take a lot more GPU. 01:08:22.200 |
So I found a batch size of 8 is all I could handle and then normalize in the usual way. 01:08:30.440 |
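Here's a sketch of that whole pipeline in the fastai v1 data block API of the time; valid.txt and codes.txt are the files from the dataset described above, and the half-resolution starting size is an assumption explained later in the lesson:

```python
codes = np.loadtxt(path/'codes.txt', dtype=str)  # the class names for each code
size = src_size // 2                             # start at half resolution

src = (SegmentationItemList.from_folder(path_img)
       .split_by_fname_file('../valid.txt')        # non-contiguous video frames
       .label_from_func(get_y_fn, classes=codes))  # pixel codes -> class names

data = (src.transform(get_transforms(), size=size, tfm_y=True)  # transform y too
        .databunch(bs=8)                  # small batch: we classify every pixel
        .normalize(imagenet_stats))
data.show_batch(2, figsize=(10, 7))       # color-codes the masks over the photos
```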
Fast.ai, because it knows that you've given it a segmentation problem, when you call show 01:08:36.140 |
batch, it actually combines the two pieces for you and it will color code the photo. 01:08:42.600 |
So you can see here the green on the trees and the red on the lines and this kind of color coding. 01:08:52.080 |
So you can see here, here are the pedestrians, this is the pedestrian's backpack. 01:08:56.560 |
So this is what the ground truth data looks like. 01:09:00.200 |
So once we've got that, we can go ahead and create a learner, I'll show you some more details 01:09:11.320 |
in a moment, call lr_find, find the sharpest bit which looks about 1e-2, call fit, 01:09:11.320 |
passing in slice(lr), and see the accuracy, and save the model, and unfreeze, and train a little more. 01:09:19.640 |
And so we're going to have a break and when we come back, I'm going to show you some little 01:09:37.720 |
tweaks that we can do and I'm also going to explain this custom metric that we've created 01:09:43.340 |
and then we'll be able to go on and look at some other cool things. 01:09:46.720 |
So let's all come back at 8 o'clock, in 6 minutes. 01:09:53.200 |
Okay, welcome back everybody and we're going to start off with a question we got during 01:10:00.200 |
>> Could you use unsupervised learning here, pixel classification with the bike example 01:10:08.760 |
to avoid needing a human to label a heap of images? 01:10:13.120 |
>> Well, not exactly unsupervised learning, but you can certainly get a sense of where 01:10:19.560 |
things are without needing these kind of labels. 01:10:25.080 |
And time permitting, we'll try and see some examples of how to do that. 01:10:30.000 |
But you're certainly not going to get such a quality and such a specific output as what 01:10:38.480 |
If you want to get this level of segmentation mask, you need a pretty good labeled dataset to train on. 01:10:52.640 |
>> Is there a reason we shouldn't deliberately make a lot of smaller data sets to step up 01:10:57.700 |
from in tuning, let's say 64 by 64, 128 by 128, 256 by 256, and so on? 01:11:12.120 |
This idea is something that I first came up with in the course a couple of years 01:11:20.680 |
ago and I kind of thought it seemed obvious and just presented it as a good idea and then 01:11:26.200 |
I later discovered that nobody had really published this before and then we started 01:11:29.480 |
experimenting with it and it was basically the main trick that we used to win the ImageNet 01:11:36.320 |
competition, the DAWNBench ImageNet training competition, and we were like, wow, people, 01:11:42.960 |
this wasn't -- not only was this not standard, nobody had heard of it before. 01:11:48.320 |
There's been now a few papers that use this trick for various specific purposes, but it's 01:11:53.960 |
still largely unknown and it means that you can train much faster, it generalizes better. 01:11:59.760 |
There's still a lot of unknowns about exactly like how small and how big and how much at 01:12:06.640 |
each level and so forth, but I guess, in as much as it has a name now, 01:12:06.640 |
we'd call it progressive resizing. 01:12:17.800 |
I found that going much under 64 by 64 tends not to help very much, but yeah, it's a great 01:12:28.240 |
technique and I definitely try a few different sizes. 01:12:32.320 |
>> What does accuracy mean for pixel-wise segmentation? 01:12:40.200 |
Is it correctly classified pixels divided by the total number of pixels? 01:12:47.240 |
So if you imagine each pixel was a separate object you're classifying, it's exactly the same. 01:12:55.560 |
And so you actually can just pass in accuracy as your metric, but in this case, we actually use a slightly different one. 01:13:06.640 |
We've created a new metric called acc_camvid, and the reason for that is that when 01:13:13.680 |
they labeled the images, sometimes they labeled a pixel as a void. 01:13:19.320 |
I'm not quite sure why -- maybe there were some pixels they couldn't identify, or somebody felt they'd made 01:13:26.120 |
a mistake or whatever -- but some of the pixels are void, and in the CamVid paper, they say 01:13:31.600 |
when you're reporting accuracy, you should remove the void pixels. 01:13:38.380 |
So we've created acc_camvid. All metrics take the actual output of the neural 01:13:46.340 |
net -- this is what they call the input, the input to the metric -- 01:13:51.000 |
and the target, i.e. the labels we're trying to predict. 01:13:54.720 |
So we then basically create a mask -- we look for the places where the target is not equal to the void code. 01:14:04.040 |
And then we just take the input, do the argmax as per usual, just the standard accuracy argmax, 01:14:11.680 |
but then we just grab those that are not equal to the void code, and we do the same for the target. 01:14:18.840 |
So it's just a standard accuracy, it's almost exactly the same as the accuracy source code 01:14:23.400 |
we saw before with the addition of this mask. 01:14:27.340 |
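The metric looks roughly like this -- a sketch based on the description above, assuming void_code is the index of 'Void' in the codes list:

```python
name2id = {v: k for k, v in enumerate(codes)}
void_code = name2id['Void']

def acc_camvid(input, target):
    target = target.squeeze(1)    # drop the mask's channel dimension
    mask = target != void_code    # keep only the non-void pixels
    return (input.argmax(dim=1)[mask] == target[mask]).float().mean()
```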
So this quite often happens, that the particular Kaggle competition metric you're using, or 01:14:36.440 |
the particular way your organization scores things or whatever, there's often little tweaks you have to make. 01:14:46.320 |
And so as you'll see, to do this stuff, the main thing you need to know pretty well is 01:14:51.320 |
how to do basic mathematical operations in PyTorch. 01:14:57.800 |
So that's just something you kind of need to practice. 01:15:01.200 |
>> I've noticed that most of the examples in most of my models result in a training loss 01:15:12.560 |
greater than the validation loss. I should add that this still happens after trying many variations on the number of epochs. 01:15:21.040 |
So remember from last week, if your training loss is higher than your validation loss, 01:15:27.480 |
It definitely means that you're underfitting; you want your training loss to be lower than your validation loss. 01:15:35.860 |
If you're underfitting, you can train for longer, you can train the last bit at a lower 01:15:44.400 |
learning rate, but if you're still underfitting, then you're gonna have to decrease regularization, which we haven't talked about yet. 01:15:55.360 |
So in the second half of this part of the course, we're gonna be talking quite a lot 01:16:00.240 |
about regularization and specifically how to avoid overfitting or underfitting by using regularization. 01:16:09.400 |
If you wanna skip ahead: weight decay, dropout, and data augmentation 01:16:15.960 |
will be the key things that we're talking about. 01:16:22.560 |
Okay, for segmentation, we don't just create a convolutional neural network. 01:16:32.000 |
We can, but actually, an architecture called U-Net turns out to be better. 01:16:46.360 |
Okay, so this is what a U-Net looks like, and this is from the university website where 01:16:57.200 |
they talk about the U-Net, and so we'll be learning about this both in this part of the course and in part two. 01:17:04.320 |
But basically, this bit down on the left-hand side is what a normal convolutional neural network does. 01:17:12.080 |
It's something which starts with a big image and gradually makes it smaller and smaller 01:17:15.520 |
and smaller and smaller until eventually you just have one prediction. 01:17:19.200 |
What a UNET does is it then takes that and makes it bigger and bigger and bigger again, 01:17:24.520 |
and then it takes every stage of the downward path and kind of copies it across, and that's what creates this U shape. 01:17:31.160 |
It was originally actually created or published as a biomedical image segmentation method, 01:17:38.600 |
but it turns out to be useful for far more than just biomedical image segmentation. 01:17:43.320 |
So it was presented at MICCAI, which is the main medical imaging conference, and as of 01:17:50.640 |
just yesterday, it actually just became the most cited paper of all time from that conference. 01:17:57.640 |
So it's been incredibly useful, over 3,000 citations. 01:18:01.360 |
You don't really need to know any of the details at this stage. 01:18:04.160 |
All you need to know is if you want to create a segmentation model, you want to be saying learner.create_unet rather than create_cnn. 01:18:15.920 |
But you pass it the normal stuff, your data bunch, an architecture, and some metrics. 01:18:24.400 |
So having done that, everything else works the same. 01:18:27.320 |
You can do the LR finder, find the slope, train it for a while, watch the accuracy go 01:18:34.040 |
up, save it from time to time, unfreeze, probably want to go about 10x less, so it's still going down nicely. 01:18:44.200 |
So slice(1e-5, lr/5), train a bit more, and there we go. 01:18:56.900 |
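Put together, that training loop looks something like this; the constructor is written here as unet_learner, which is what the transcript's create_unet became in later fastai v1 releases, so treat the exact name as an assumption about your library version:

```python
learn = unet_learner(data, models.resnet34, metrics=acc_camvid)
learn.lr_find()
learn.recorder.plot()       # pick lr near the steepest downward slope, ~1e-2 here
lr = 1e-2
learn.fit_one_cycle(10, slice(lr))
learn.save('stage-1')
learn.unfreeze()
learn.fit_one_cycle(12, slice(1e-5, lr/5))  # earlier layers get much smaller rates
learn.save('stage-2')
```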
learn.recorder is where we keep track of what's going on during training, and it's 01:19:01.800 |
got a number of nice methods, one of which is plot_losses, and this plots your training and validation losses. 01:19:10.720 |
And you'll see quite often they actually go up a bit before they go down. 01:19:18.920 |
Why is that? Well, you can also plot your learning rate over time, and you'll see that your learning 01:19:30.540 |
rate goes up and then down. Why? Because we said fit one cycle, and that's what fit one cycle does. 01:19:34.920 |
It actually makes the learning rate start low, go up, and then go down again. 01:19:41.960 |
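Those two plots are one line each, assuming a learner that's just been trained:

```python
learn.recorder.plot_losses()  # training and validation loss over the run
learn.recorder.plot_lr()      # the one-cycle schedule: low -> high -> low
```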
Well, to find out why that's a good idea, let's first of all look at a really cool project 01:19:50.480 |
done by Jose Fernandez-Portal during the week. 01:19:54.720 |
He took our gradient descent demo notebook and actually plotted the weights over time, 01:20:04.800 |
not just the ground truth and model over time. 01:20:08.800 |
And he did it for a few different learning rates. 01:20:14.120 |
We were doing basically y equals ax plus b, or in his nomenclature here, y equals w0 plus w1 times x. 01:20:22.960 |
And so we can actually look and see over time what happens to those weights. 01:20:28.040 |
And we know this is the correct answer here, right? 01:20:31.480 |
So at a learning rate of 0.1, it kind of slides on in here, and you can see that it takes 01:20:36.580 |
a little bit of time to get to the right point, and you can see the loss improving. 01:20:43.280 |
At a higher learning rate of 0.7, you can see that the model jumps to the ground truth much more quickly. 01:20:50.960 |
And you can see that the weights jump straight to the right place really quickly. 01:20:56.040 |
What if we have a learning rate that's really too high? 01:21:00.680 |
You can see it bounces around, and it takes a very, very, very long time to get to the right point. 01:21:09.520 |
So you can see here why getting the right learning rate is important. 01:21:13.920 |
When you get the right learning rate, it really zooms into the best spot very quickly. 01:21:20.480 |
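You can replay this effect yourself with a toy version of that demo; all the numbers here are made up for illustration, not the actual settings from Jose's notebook:

```python
import numpy as np

x = np.linspace(0, 1, 50)
y = 2.0 + 3.0 * x                        # ground truth: w0=2, w1=3

def fit(lr, steps=100):
    w0, w1 = 0.0, 0.0                    # start from a bad guess
    for _ in range(steps):
        err = (w0 + w1 * x) - y          # prediction minus target
        w0 -= lr * 2 * err.mean()        # gradient of MSE w.r.t. w0
        w1 -= lr * 2 * (err * x).mean()  # gradient of MSE w.r.t. w1
    return w0, w1

print(fit(0.02))  # too small: still far from (2, 3) after 100 steps
print(fit(0.3))   # about right: zooms almost exactly to (2, 3)
print(fit(0.78))  # too high: bounces back and forth, converges very slowly
```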
Now as you get closer to the final spot, something interesting happens, which is that you really 01:21:32.000 |
want your learning rate to decrease, because you're getting close to the right spot. 01:21:38.640 |
And what actually happens is -- I can only draw 2D, sorry. 01:21:51.500 |
You don't generally actually have some kind of loss function surface that looks like that. 01:21:56.520 |
Remember, there's lots of dimensions, but it actually tends to look bumpy, like that. 01:22:04.320 |
And so you kind of want a learning rate that's high enough to jump over the bumps. 01:22:13.040 |
But then once you get close to the middle, once you get close to the best answer, you 01:22:18.600 |
don't want to be just jumping backwards and forwards between bumps. 01:22:21.400 |
So you really want your learning rate to go down so that as you get closer, you take smaller 01:22:28.720 |
So that's why it is that we want our learning rate to go down at the end. 01:22:36.440 |
Now this idea of decreasing the learning rate during training has been around forever, and it's called learning rate annealing. 01:22:45.160 |
But the idea of gradually increasing it at the start is much more recent, and it mainly comes from Leslie Smith. 01:22:52.520 |
If you're in San Francisco next week, actually, you can come and join me and Leslie Smith. 01:22:57.400 |
We're having a meetup where we'll be talking about this stuff, so come along to that. 01:23:03.160 |
What Leslie discovered is that if you gradually increase your learning rate, what tends to 01:23:10.520 |
happen is that loss function surfaces tend 01:23:20.040 |
to kind of look something like this: bumpy, bumpy, bumpy, then flat, then bumpy, bumpy, 01:23:25.640 |
something like this, right? 01:23:31.560 |
And if you end up in the bottom of a bumpy area, that solution will tend not to generalize 01:23:38.280 |
very well because you found a solution that's -- it's good in that one place, but it's not 01:23:43.320 |
very good in other places, whereas if you found one in the flat area, it probably will 01:23:48.800 |
generalize well because it's not only good in that one spot, but it's good kind of around 01:23:54.980 |
If you have a really small learning rate, it will tend to kind of plod down and stick in one of these bumpy areas. 01:24:05.340 |
But if you gradually increase the learning rate, then it will kind of like jump down 01:24:10.320 |
and then as the learning rate goes up, it's going to start kind of going up again like 01:24:15.680 |
this, right? And then the learning rate is now going to be up here. It's going to be 01:24:19.440 |
bumping backwards and forwards, and eventually the learning rate starts to come down again, 01:24:25.440 |
and so it will tend to find its way to these flat areas. 01:24:29.080 |
So it turns out that gradually increasing the learning rate is a really good way of 01:24:34.120 |
helping the model to explore the whole function surface and try and find areas where both 01:24:41.240 |
the loss is low and also it's not bumpy, because if it was bumpy, it would get kicked out again. 01:24:49.720 |
And so this allows us to train at really high learning rates, so it tends to mean that we 01:24:54.400 |
solve our problem much more quickly, and we tend to end up with much more generalizable solutions. 01:25:01.280 |
So if you call plot losses and find that it's just getting a little bit worse and then it 01:25:07.960 |
gets a lot better, you've found a really good maximum learning rate. So when you actually 01:25:12.720 |
call fit one cycle, you're not actually passing in a learning rate, you're actually passing in a maximum learning rate. 01:25:22.160 |
And if it's kind of always going down, particularly after you unfreeze, that suggests you could 01:25:28.120 |
probably bump your learning rates up a little bit, because you really want to see this kind 01:25:33.560 |
of shape. It's going to train faster and generalize better, just a little bit. And you'll tend 01:25:39.520 |
to particularly see it in the validation set, the orange is the validation set. 01:25:43.960 |
And again, the difference between knowing the theory and being able to do it is looking 01:25:50.680 |
at lots of these pictures. So after you train stuff, type learn.recorder. and hit tab and 01:25:59.520 |
see what's in there, right? And particularly the things that start with plot and start 01:26:03.520 |
getting a sense of, like, what are these pictures looking like when you're getting good results? 01:26:08.600 |
And then try making the learning rate much higher, try making it much lower, more epochs, 01:26:13.080 |
less epochs and get a sense for what these look like. 01:26:17.320 |
So in this case, we used a size in our transforms of the original image size over 2. These two 01:26:31.320 |
slashes in Python mean integer divide, okay? Because obviously we can't have half-pixel 01:26:37.520 |
amounts in our sizes. So it's the size integer-divided by 2. And we used a batch size of 8. And I found 01:26:43.400 |
that fits on my GPU. It might not fit on yours. If it doesn't, you can just decrease the batch 01:26:48.580 |
size down to 4. And this isn't really solving the problem, because the problem is to segment 01:26:55.720 |
all of the pixels, not half of the pixels. So I'm going to use the same trick that I 01:26:59.760 |
did last time, which is I'm now going to put the size up to the full size of the source 01:27:06.160 |
images, which means I now have to halve my batch size, otherwise I run out of GPU memory. 01:27:12.800 |
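Here's a sketch of that restart-at-full-size step, reusing src and the helpers from before and assuming the half-size weights were saved as 'stage-2':

```python
size = src_size   # full resolution now, instead of src_size // 2
bs = 4            # halved from 8 so the bigger images still fit in GPU memory

data = (src.transform(get_transforms(), size=size, tfm_y=True)
        .databunch(bs=bs)
        .normalize(imagenet_stats))

learn = unet_learner(data, models.resnet34, metrics=acc_camvid)
learn.load('stage-2')   # same weights as before, now fed full-size images
```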
And I'm then going to set my learner. I can either say learn.data equals my new data, 01:27:20.200 |
or I actually found I've had a lot of trouble with kind of GPU memory, so I generally restarted 01:27:24.360 |
my kernel, came back here, created a new learner, and loaded up the weights that I saved last 01:27:31.080 |
time. But the key thing here being that this learner now has the same weights that I had 01:27:36.520 |
here, but the data is now the full image size. So I can now do an LR find again, find an 01:27:43.960 |
area where it's kind of, you know, well before it goes up. So I'm going to use 1e-3 and fit 01:27:49.800 |
some more. And then unfreeze and fit some more. And you can go to learn.show_results 01:28:00.440 |
to see how your predictions compare to the ground truth. And you've got to say they really 01:28:05.920 |
look pretty good. Not bad, huh? So, how good is pretty good? An accuracy of 92.15. The best 01:28:20.720 |
paper I know of for segmentation was a paper called the 100 layers tiramisu, which developed 01:28:27.680 |
a convolutional dense net, came out about two years ago. So after I trained this today, 01:28:34.720 |
I went back and looked at the paper to find their state of the art accuracy. Here it is. 01:28:46.860 |
And I looked it up. And their best was 91.5. And we got 92.1. So I've got to say, when 01:28:58.240 |
this happened today, I was like, wow. I don't know if better results have come out since 01:29:04.080 |
this paper. But I remember when this paper came out, and it was a really big deal. And 01:29:08.360 |
I was like, wow. This is an exceptionally good segmentation result. Like when you compare 01:29:13.240 |
it to the previous bests that they compared it to, it was a big step up. And so like in 01:29:18.800 |
last year's course, we spent a lot of time in the course re-implementing the 100 layers 01:29:23.800 |
tiramisu. And now, with our totally default fast AI class, I'm easily beating this. And 01:29:34.640 |
I also remember this I had to train for hours and hours and hours, whereas today's I trained 01:29:40.080 |
in minutes. So this is a super strong architecture for segmentation. So yeah, I'm not going to 01:29:50.120 |
promise that this is the definite state of the art today because I haven't done a complete 01:29:54.320 |
literature search to see what's happened in the last two years. But it's certainly beating 01:30:01.160 |
the world's best approach the last time I looked into this, which was in last year's 01:30:06.560 |
course, basically. And so these are kind of just all the little tricks I guess we've picked 01:30:12.160 |
up along the way in terms of like how to train things well. Things like using the pre-trained 01:30:17.520 |
model and things like using the one cycle convergence and all these little tricks. They 01:30:23.160 |
work extraordinarily well. And it's really nice to be able to like show something in 01:30:29.320 |
class where we can say, we actually haven't published the paper on the exact details of 01:30:34.840 |
how this variation of the U-Net works. There's a few little tweaks we do. But if you come 01:30:41.080 |
back for part two, we'll be going into all of the details about how we make this work 01:30:46.400 |
so well. But for you, all you have to know at this stage is that you can say learner.create_unet 01:30:52.600 |
and you should get great results also. There's another trick you can use if you're running 01:31:04.180 |
out of memory a lot, which is you can actually do something called mixed precision training. 01:31:13.780 |
And mixed precision training means that instead of using, for those of you that have done 01:31:18.160 |
a little bit of computer science, instead of using single precision floating point numbers, 01:31:23.000 |
you can do all the--most of the calculations in your model with half precision floating 01:31:27.080 |
point numbers. So 16 bits instead of 32 bits. Tradition--I mean, the very idea of this has 01:31:33.720 |
only been around really for the last couple of years in terms of like hardware that actually 01:31:39.280 |
does this reasonably quickly. And then fast AI library I think is the first and probably 01:31:45.720 |
still the only one that makes it actually easy to use this. If you add to_fp16() on the end 01:31:52.000 |
of any learner call, you're actually going to get a model that trains in 16-bit precision. 01:32:00.560 |
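So the whole trick is one method call on the end; a minimal sketch:

```python
# Same learner as before, but training in mixed (16-bit) precision.
learn = unet_learner(data, models.resnet34, metrics=acc_camvid).to_fp16()
```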
Because it's so new, you'll need to have kind of the most recent CUDA drivers and all that 01:32:06.200 |
stuff for this even to work. I tried it this morning on some of the platforms. 01:32:10.880 |
It just killed the kernel. So you need to make sure you've got the most recent drivers. 01:32:17.240 |
But if you've got a really recent GPU, like a 2080 Ti, not only will it work, but it will 01:32:24.800 |
work about twice as fast as otherwise. Now, the reason I'm mentioning it is that it's 01:32:30.120 |
going to use less GPU RAM. So even if you don't have like a 2080 Ti, you might find--or you'll 01:32:38.280 |
probably find that things that didn't fit into your GPU without this then do fit in 01:32:44.240 |
with this. Now, I actually have never seen people use 16-bit precision floating point 01:32:52.240 |
for segmentation before. Just for a bit of a laugh, I tried it and actually discovered 01:32:58.880 |
that I got an even better result. So I only found this this morning so I don't have anything 01:33:06.720 |
more to add here other than quite often when you make things a little bit less precise 01:33:12.120 |
in deep learning, it generalizes a little bit better. And I've never seen a 92.5 accuracy 01:33:20.440 |
on CamVid before. So yeah, not only will this be faster, you'll be able to use bigger batch 01:33:26.760 |
sizes, but you might even find like I did that you get an even better result. So that's 01:33:33.000 |
a cool little trick. You just need to make sure that every time you create a learner, 01:33:36.720 |
you add this to FP16. If your kernel dies, it probably means you have slightly out of 01:33:41.920 |
date CUDA drivers or maybe even an old--too old graphics card. I'm not sure exactly which 01:33:49.680 |
cards support FP16. Okay, so one more before we kind of rewind. Sorry, two more. The first 01:34:04.400 |
one I'm going to show you is an interesting data set called the BIWI head pose data set. 01:34:12.520 |
And Gabriele Fanelli was kind enough to give us permission to use this in the class. His 01:34:17.600 |
team created this cool data set. Here's what the data set looks like. It's pictures. It's 01:34:24.540 |
actually got a few things in it. We're just going to do a simplified version. And one 01:34:27.200 |
of the things they do is they have a dot saying this is the center of the face. And so we're 01:34:35.160 |
going to try and create a model that can find the center of a face. So for this data set, 01:34:44.000 |
there's a few data set specific things we have to do which I don't really even understand 01:34:49.480 |
but I just know from the read me that you have to. They use some kind of depth sensing 01:34:53.880 |
camera. I think they actually used a Kinect, you know, Xbox Kinect. There's some kind 01:34:58.120 |
of calibration numbers that they provide in a little file which I had to read in. And 01:35:02.320 |
then they provided a little function that you have to use to take their coordinates 01:35:07.120 |
to change it from this depth sensor calibration thing to end up with actual coordinates. 01:35:14.440 |
So when you open this and you see these little conversion routines, that's just, you know, 01:35:20.640 |
I'm just doing what they told us to do basically. It's got nothing particularly to do with deep 01:35:24.440 |
learning to end up with this dot. The interesting bit really is where we create something which 01:35:33.000 |
is not an image or an image segmentation mask, but image points. And we'll mainly learn about 01:35:39.720 |
this later in the course. But basically, image points use this idea of kind of the coordinates, 01:35:48.760 |
right? They're not pixel values, they're XY coordinates. There's just two numbers. As 01:35:54.760 |
you can see--let me see. Okay. So here's an example for a particular image file name, 01:36:12.240 |
this particular image file, and here it is. The coordinates of the center of the face 01:36:18.000 |
are 263, 428. And here it is. So there's just two numbers which represent whereabouts on 01:36:26.400 |
this picture is the center of the face. So if we're going to create a model that can 01:36:30.720 |
find the center of a face, we need a neural network that spits out two numbers. But note, 01:36:37.240 |
this is not a classification model. These are not two numbers that you look up in a 01:36:41.680 |
list to find out that they're road or building or ragdoll, cat or whatever. They're actual 01:36:48.320 |
locations. So, so far everything we've done has been a classification model, something 01:36:55.560 |
that's created labels or classes. This, for the first time, is what we call a regression 01:37:00.680 |
model. A lot of people think regression means linear regression. It doesn't. Regression 01:37:05.480 |
just means any kind of model where your output is some continuous number or set of numbers. 01:37:11.600 |
So, this is, we need to create an image regression model, something that can predict these two 01:37:16.720 |
numbers. So how do you do that? Same way as always, right? So we can actually just say 01:37:25.560 |
I've got a list of image files, it's in a folder, and I want to label them using this 01:37:32.320 |
function that we wrote that basically does the stuff that the README says to grab the 01:37:37.040 |
coordinates out of their text files. So that's going to give me the two numbers for everyone, 01:37:42.360 |
and then I'm going to split it according to some function. And so in this case, the files 01:37:49.360 |
they gave us, again, they're from videos, and so I picked just one folder to be my validation 01:37:56.520 |
set, in other words, a different person. So again, I was trying to think about, like, 01:38:00.080 |
how do I validate this fairly? So I said, well, the fair validation would be to make 01:38:04.780 |
sure that it works well on a person that it's never seen before. So my validation set is 01:38:09.740 |
all going to be a particular person. Create a data set, and so this data set, I just tell 01:38:15.460 |
it what kind of data set is it. Well, they're going to be a set of points. So points means, 01:38:19.440 |
you know, specific coordinates. Do some transforms. Again, I have to say transform Y equals true, 01:38:26.680 |
because that red dot needs to move if I flip or rotate or what, right? Pick some size, 01:38:32.960 |
I just picked a size that's going to work pretty quickly. Create a data bunch, normalize 01:38:36.280 |
it, and again, show batch, there it is. Okay? And notice that their red dots don't always 01:38:43.240 |
seem to be quite in the middle of the face. I don't know exactly what their kind of internal 01:38:48.520 |
algorithm for putting dots on. It kind of sometimes looks like it's meant to be the 01:38:53.160 |
nose, but sometimes it's not quite the nose. Anyway, you get the -- it's somewhere around 01:38:57.360 |
the center of the face, or the nose. So how do we create a model? We create a CNN. But 01:39:07.840 |
we're going to be learning a lot about loss functions in the next few lessons. But generally, 01:39:13.760 |
basically the loss function is that number that says how good is the model. And so for 01:39:19.560 |
classification, we use this loss function called cross-entropy loss, which says basically 01:39:25.600 |
-- you remember this from earlier lessons? Did you predict the correct class, and were 01:39:31.680 |
you confident of that prediction? Now, we can't use that for regression. So instead, 01:39:37.040 |
we use something called mean-squared error. And if you remember from last lesson, we actually 01:39:42.960 |
implemented mean-squared error from scratch. It's just the difference between the two squared 01:39:47.800 |
and added up together. Okay. So we need to tell it this is not classification, so we 01:39:53.440 |
use mean-squared error. And 01:40:07.800 |
then once we've created the learner, we've told it what loss function to use, we can 01:40:11.200 |
go ahead and do lr_find. We can then fit. And you can see here, within a minute and a half, 01:40:18.520 |
our mean-squared error is 0.0004. Now, the nice thing is about, like, mean-squared error, 01:40:25.120 |
that's very easy to interpret, right? So we're trying to predict something, which is somewhere 01:40:30.560 |
around a few hundred. And we're getting a squared error on average of 0.0004. So we 01:40:38.560 |
can feel pretty confident that this is a really good model. And then we can look at the results 01:40:42.240 |
by learn.show_results, and we can see predictions, ground truth. It's doing a nearly perfect 01:40:50.760 |
job. Okay? So that's how you can do image regression models. So any time you've got 01:40:57.440 |
something you're trying to predict, which is some continuous value, you use an approach 01:41:01.320 |
that's something like this. So last example, before we look at some kind of more foundational 01:41:10.560 |
theory stuff, NLP. And next week we're going to be looking at a lot more NLP. But let's 01:41:18.240 |
now do the same thing, but rather than creating a classification of pictures, let's try and 01:41:24.280 |
classify documents. And so we're going to go through this in a lot more detail next 01:41:31.000 |
week, but let's do the quick version. Rather than importing from fastai.vision, I now 01:41:36.200 |
import for the first time from fastai.text. That's where you'll find all the application-specific 01:41:41.200 |
stuff for analyzing text documents. And in this case, we're going to use a dataset called 01:41:46.460 |
imdb. And imdb has lots of movie reviews. They're generally about a couple of thousand 01:41:54.720 |
words. And each movie review has been classified as either negative or positive. So it's just 01:42:04.160 |
in a CSV file, so we can use pandas to read it, we can take a little look, we can take 01:42:08.200 |
a look at a review. And basically, as per usual, we can either use factory methods or 01:42:17.920 |
the data block API to create a data bunch. So here's the quick way to create a data bunch 01:42:23.120 |
from a CSV of texts, data bunch from CSV, and that's that. And yeah, at this point, 01:42:34.680 |
I could create a learner and start training it. But we're going to show you a little bit 01:42:39.280 |
more detail, which we're mainly going to look at next week. The steps that actually happen 01:42:44.320 |
when you create these data bunches is there's a few steps. The first is it does something 01:42:48.600 |
called tokenization, which is it takes those words, and it converts them into a standard 01:42:55.560 |
form of tokens, where there's basically each token represents a word. But it does things 01:43:02.040 |
like see here, see how "didn't" has been turned here into two separate tokens? And you see 01:43:07.480 |
how everything's been lowercased? See how "you're" has been turned into two separate tokens? 01:43:13.400 |
So tokenization is trying to make sure that each token, each thing that we've got with 01:43:20.600 |
spaces around it here, represents a single linguistic concept. Also, it finds words that 01:43:31.920 |
are really rare, like really rare names and stuff like that and replaces them with a special 01:43:37.320 |
token called unknown. So anything starting with xx in fastai is some special token. 01:43:45.080 |
So that's just tokenization. So we end up with something where we've got a list of tokenized 01:43:49.960 |
words. You'll also see that things like punctuation end up with spaces around them to make sure 01:43:55.160 |
that they're separate tokens. The next thing we do is we take a complete unique list of 01:44:03.800 |
all of the possible tokens, that's called the vocab, and that gets created for us. And 01:44:09.800 |
so here's the first ten items of the vocab. So here is every possible token, the first 01:44:15.160 |
ten of them that appear in all of the movie reviews. And we then replace every movie review 01:44:22.360 |
with a list of numbers. And the list of numbers simply says what numbered thing in the vocab 01:44:29.520 |
is in this place. So here, 6 is -- counting zero, one, two, three, four, five, six -- the 01:44:36.080 |
word "a". And this 3 is -- zero, one, two, three -- a comma, and so forth. 01:44:43.320 |
So through tokenization and numericalization, this is the standard way in NLP of turning a document 01:44:49.680 |
into a list of numbers. We can do that with the data block API, right? So this time it's 01:44:57.180 |
not an image file list, it's a text list; split the data from a CSV, convert them to datasets, tokenize 01:45:06.600 |
them, numericalize them, create a data bunch, and at that point we can start to create a 01:45:17.560 |
model. As we learn about next week, when we do NLP classification, we actually create 01:45:25.080 |
two models. The first model is something called a language model, which, as you can see, we 01:45:33.800 |
train in a kind of a usual way. We say we want to create a language model learner, we 01:45:37.960 |
train it, we can save it, we unfreeze, we train some more, and then after we've created 01:45:44.480 |
a language model, we fine tune it to create the classifier. So here's the thing where 01:45:49.000 |
we create the data bunch for the classifier, we create a learner, we train it, and we end 01:46:00.740 |
up with some accuracy. So that's the really quick version. We're going to go through it 01:46:05.040 |
in more detail next week. But you can see the basic idea of training an NLP classifier 01:46:10.520 |
is very, very, very similar to creating every other model we've seen so far. And this accuracy, 01:46:18.640 |
so the current state of the art for IMDB classification is actually the algorithm that we built and 01:46:26.360 |
published with a colleague named Sebastian Ruder, and this basically, what I just showed 01:46:33.000 |
you is pretty much the state of the art algorithm with some minor tweaks. You can get this up 01:46:37.240 |
to about 95% if you try really hard. So this is very close to the state of the art accuracy 01:46:43.200 |
that we developed. There's a question. Okay, now's a great time for a question. 01:46:53.880 |
>> For a dataset very different than ImageNet, like the satellite images or genomic images 01:46:58.480 |
shown in lesson two, should we use our own stats? Jeremy once said if you're using a 01:47:03.560 |
pre-trained model, you need to use the same stats it was trained with. Why is that? Isn't 01:47:08.920 |
it that normalized data with its own stats will have roughly the same distribution like 01:47:13.200 |
ImageNet? The only thing I can think of which may differ is skewness. Is it the possibility 01:47:18.720 |
of skewness or something else the reason of your statement? And does that mean you don't 01:47:23.240 |
recommend using pre-trained models with very different datasets like the one-point mutation 01:47:28.080 |
that you mentioned in lesson two? >> No. As you can see, I've used pre-trained 01:47:36.400 |
models for all of those things. Every time I've used an ImageNet-trained model, and every 01:47:41.280 |
time I've used ImageNet stats. Why is that? Because that model was trained with those 01:47:47.400 |
stats. So for example, imagine you're trying to classify different types of green frogs. 01:47:56.100 |
So if you were to use your own per-channel means from your dataset, you would end up 01:48:01.080 |
converting them to a mean of zero, standard deviation of one for each of your red, green 01:48:06.320 |
and blue channels, which means they don't look like green frogs anymore. They now look 01:48:11.120 |
like gray frogs, right? But ImageNet expects frogs to be green, okay? So you need to normalize 01:48:18.240 |
with the same stats that the ImageNet training people normalized with, otherwise the unique 01:48:23.240 |
characteristics of your dataset won't appear anymore. You actually normalize them out in 01:48:27.480 |
terms of the per-channel statistics. So you should always use the same stats that the 01:48:32.080 |
model was trained with. Okay. So in every case, what we're doing here 01:48:44.920 |
is we're using gradient descent with mini-batches, so stochastic gradient descent, to fit some 01:48:51.040 |
parameters of a model. And those parameters are parameters to basically matrix multiplications. 01:48:59.400 |
In the second half of this part, we're actually going to learn about a little tweak called 01:49:03.000 |
convolutions, but it's basically a type of matrix multiplication. The thing is, though, 01:49:10.120 |
no amount of matrix multiplications is possibly going to create something that can read IMDB 01:49:18.120 |
reviews and decide if it's positive or negative, or look at satellite imagery and decide whether 01:49:23.920 |
it's got a road in it. That's far more than a linear classifier can do. Now, we know these 01:49:29.080 |
are deep neural networks, and deep neural networks contain lots of these matrix multiplications. 01:49:35.520 |
But every matrix multiplication is just a linear model, and a linear function on top 01:49:42.520 |
of a linear function is just another linear function. If you remember back to your high 01:49:50.360 |
school math, you might remember that if you have a Y equals AX plus B, and then you stick 01:49:55.960 |
another CY plus D on top of that, it's still just another slope and another intercept. 01:50:05.600 |
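In symbols, that's just:

```latex
y = ax + b,\qquad z = cy + d \;\Longrightarrow\; z = c(ax + b) + d = (ca)\,x + (cb + d)
```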
So no amount of stacking matrix multiplications is going to help in the slightest. 01:50:11.800 |
So what are these models actually -- what are we actually doing? And here's the interesting 01:50:19.040 |
thing. All we're actually doing is we literally do have a matrix multiplication or a slight 01:50:27.040 |
variation like a convolution that we'll learn about. But after each one, we do something 01:50:32.720 |
called a non-linearity or an activation function. An activation function is something that takes 01:50:39.800 |
the result of that matrix multiplication and sticks it through some function. And these 01:50:47.920 |
are some of the functions that we use. In the old days, the most common function that 01:50:56.800 |
we used to use was basically this shape. These shapes are called sigmoid. And they have, 01:51:09.720 |
you know, particular mathematical definitions. Nowadays, we almost never use those for these 01:51:17.400 |
-- between each matrix multiply. Nowadays, we nearly always use this one. It's called 01:51:25.400 |
a rectified linear unit. It's very important when you're doing deep learning to use big 01:51:30.680 |
long words that sound impressive, otherwise normal people might think they can do it too. 01:51:35.480 |
But just between you and me, a rectified linear unit is defined using the following function: max(x, 0). 01:51:47.560 |
That's it. Okay. So -- and if you want to be really exclusive, of course, you then shorten 01:51:54.440 |
the long version and you call it a ReLU to show that you're really in the exclusive team. 01:51:59.760 |
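In code, the whole impressive-sounding thing is one line:

```python
import torch

relu = lambda x: torch.clamp(x, min=0)  # i.e. max(x, 0), applied elementwise
```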
So this is a ReLU activation. So here's the crazy thing. If you take your red, green, 01:52:08.440 |
blue pixel inputs and you chuck them through a matrix multiplication and then you replace 01:52:15.160 |
the negatives with zero and you put it through another matrix multiplication, place the negatives 01:52:19.160 |
with zero and you keep doing that again and again and again, you have a deep learning 01:52:23.760 |
neural network. That's it. All right. So how the hell does that work? So an extremely cool 01:52:32.360 |
guy called Michael Nielsen showed how this works. He has a very nice website. There's 01:52:39.320 |
actually more than a website. It's a book: neuralnetworksanddeeplearning.com. 01:52:44.680 |
And he has these beautiful little JavaScript things where you can get to play around -- because 01:52:50.360 |
this was back in the old days, this was back when we used to use sigmoids, right? And what 01:52:54.320 |
he shows is that if you have enough of these little matrix multiplications 01:53:00.840 |
followed by sigmoids, and exactly the same 01:53:05.720 |
thing works for a matrix multiplication followed by a ReLU, you can actually create arbitrary 01:53:11.440 |
shapes, right? And so this idea that these combinations of linear functions and nonlinearities 01:53:24.160 |
can create arbitrary shapes actually has a name. And this name is the universal approximation 01:53:30.040 |
theorem. And what it says is that if you have stacks of linear functions and nonlinearities, 01:53:38.040 |
the thing you end up with can approximate any function arbitrarily closely. So you just 01:53:46.280 |
need to make sure that you have a big enough matrix to multiply by or enough of them. So 01:53:52.680 |
if you have, you know, now this function, which is just a sequence of matrix multipliers 01:53:58.980 |
and nonlinearities, where the nonlinearities can be, you know, basically any of these things. 01:54:04.200 |
And we normally use this one. If that can approximate anything, then all you need is 01:54:08.880 |
some way to find the particular values of the weight matrices in your matrix multipliers 01:54:14.840 |
that solve the problem you want to solve. And we already know how to find the values 01:54:19.360 |
of parameters. We can use gradient descent. And so that's actually it, right? And this 01:54:25.400 |
is the bit I find the hardest thing normally to explain to students is that we're actually 01:54:33.200 |
done now. People often come up to me after this lesson and they say, what's the rest? 01:54:40.080 |
Please explain to me the rest of deep learning. But, like, no, there's no rest. Like, we have 01:54:45.200 |
a function where we take our input pixels or whatever, we multiply them by some weight 01:54:49.320 |
matrix, we replace the negatives with zeros, we multiply it by another weight matrix, replace 01:54:53.760 |
the negatives with zeros, we do that a few times, we see how close it is to our target, 01:54:59.640 |
and then we use gradient descent to update our weight matrices using the derivatives. 01:55:03.760 |
And we do that a few times. And eventually we end up with something that can classify 01:55:09.160 |
movie reviews or can recognize pictures of ragdoll cats. That's actually it. Okay? So 01:55:19.380 |
the reason it's hard to understand intuitively is because we're talking about weight matrices 01:55:27.880 |
that have, you know, once you wrap them all up, something like 100 million parameters. 01:55:33.040 |
They're very big weight matrices, right? So your intuition about what multiplying something 01:55:39.880 |
by a linear model and replacing the negatives with zeros a bunch of times can do, your intuition 01:55:45.680 |
doesn't hold, right? You just have to accept empirically the truth is doing that works 01:55:52.920 |
really well. So in part two of the course, we're actually going to build these from scratch, 01:56:00.680 |
right? But I mean, just to skip ahead, you'll basically will find that, you know, it's going 01:56:06.480 |
to be kind of five lines of code, right? It's going to be a little for loop that goes, you 01:56:11.640 |
know, t = x @ w1, then t2 = max(t, 0) -- stick 01:56:23.840 |
that in a for loop that goes through each weight matrix and at the end calculate my 01:56:29.480 |
loss function. And of course we're not going to calculate the gradients ourselves because 01:56:33.680 |
PyTorch does that for us. And that's about it. 01:56:45.680 |
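Roughly, that sketch looks like this -- made-up shapes, random weights, no training loop, just the stacked matmul-plus-ReLU idea:

```python
import torch

x = torch.randn(64, 784)              # a mini-batch of flattened inputs
w1 = torch.randn(784, 50)             # first weight matrix
w2 = torch.randn(50, 10)              # second weight matrix

t = torch.clamp(x @ w1, min=0)        # matrix multiply, negatives -> zero
out = t @ w2                          # final matrix multiply
targets = torch.randn(64, 10)         # made-up targets, just to get a loss
loss = ((out - targets) ** 2).mean()  # e.g. mean squared error
```

So, okay, question.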
>> There's a question about tokenization. I'm curious about how tokenizing words works when 01:56:50.040 |
they depend on each other such as San Francisco. >> Yeah, okay. Okay. Tokenization, how do you 01:57:05.400 |
tokenize something like San Francisco? San Francisco contains two tokens: "San" and "Francisco". 01:57:15.680 |
That's it. That's how you tokenize San Francisco. The question may be coming from people who 01:57:22.000 |
have done, like, traditional NLP, where you often need to kind of use these things called ngrams. 01:57:28.840 |
And ngrams are kind of this idea of, like, a lot of NLP in the old days was all built 01:57:34.640 |
on top of linear models where you basically counted how many times particular strings 01:57:40.040 |
of text appeared, like the phrase San Francisco. That would be a bigram or an ngram with an 01:57:48.040 |
n of two. The cool thing is that with deep learning we don't have to worry about that. 01:57:52.480 |
Like with many things, a lot of the complex feature engineering disappears when you do 01:57:56.920 |
deep learning. So with deep learning, each token is literally just a word or in the case 01:58:04.280 |
that the word really consists of two words, like "you're", you split it into two tokens. And 01:58:10.520 |
then what we're going to do is we're going to then let the deep learning model figure 01:58:16.680 |
out how best to combine words together. Now when we say, like, let the deep learning model 01:58:22.040 |
figure it out, of course, all we really mean is find the weight matrices using gradient 01:58:28.040 |
descent to give the right answer. Like, there's not really much more to it than that. Again, 01:58:34.280 |
there's some minor tweaks, right? In the second half of the course we're going to be learning 01:58:38.280 |
about the particular tweak for image models, which is using a convolution. There'll be 01:58:42.920 |
a CNN. For language, there's a particular tweak we do called using recurrent models 01:58:49.200 |
or an RNN. But they're very minor tweaks on what we've just described. So basically it 01:58:55.000 |
turns out with an RNN that it can learn that San plus Francisco has a different meaning 01:59:05.000 |
when those two things are together. >> Some satellite images have four channels. 01:59:12.280 |
How can we deal with data that has four channels or two channels when using pre-trained models? 01:59:17.640 |
>> Yeah, that's a good question. I think that's something that we're going to try and incorporate 01:59:26.480 |
into fast AI. So hopefully by the time you watch this video there will be easier ways 01:59:30.800 |
to do this. But the basic idea is a pre-trained image net model expects red, green, and blue 01:59:39.280 |
pixels. So if you've only got two channels there's a few things you can do. But basically 01:59:48.120 |
you want to create a third channel. And so you can create the third channel as either 01:59:54.080 |
being all zeros or it could be the average of the other two channels. And so you can 01:59:59.600 |
just use normal PyTorch arithmetic to create that third channel. You could either do that 02:00:07.760 |
ahead of time in a little loop and save your three channel versions or you could create 02:00:13.080 |
a custom dataset class that does that on demand. For a fourth channel, you probably don't want 02:00:23.160 |
to get rid of the fourth channel. So instead what you'd have to do is to actually modify 02:00:29.240 |
the model itself. So to know how to do that we'll only know how to do that in a couple 02:00:33.640 |
more lessons time. But basically the idea is that the initial weight matrix, weight matrix 02:00:41.480 |
is really the wrong term. They're not weight matrices, they're weight tensors so they can 02:00:45.960 |
have more than just two dimensions. So that initial weight matrix in the neural net it's 02:00:51.680 |
going to have, it's actually a tensor and one of its axes is going to have three slices 02:01:00.720 |
in it. So you would just have to change that to add an extra slice which I would generally 02:01:06.480 |
just initialize to zero or to some random numbers. 02:01:12.900 |
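Here's a hypothetical sketch of both tricks, written against a plain torchvision ResNet rather than fastai's wrappers:

```python
import torch
from torchvision import models

# 2-channel data: synthesize a third channel with ordinary tensor arithmetic.
def two_to_three(x):                    # x: a (2, H, W) tensor
    third = x.mean(dim=0, keepdim=True) # or torch.zeros_like(x[:1])
    return torch.cat([x, third], dim=0)

# 4-channel data: widen the first conv's weight tensor with an extra slice.
m = models.resnet34(pretrained=True)
old = m.conv1.weight                    # shape (64, 3, 7, 7)
conv = torch.nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
conv.weight.data[:, :3] = old.data      # keep the pretrained RGB slices
conv.weight.data[:, 3:] = 0.            # initialize the extra slice to zero
m.conv1 = conv
```

So that's the short version. But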
really to answer this, to understand exactly what I meant by that, we're going to need 02:01:16.400 |
a couple more lessons to get there. Okay, so wrapping up, what have we looked 02:01:24.360 |
at today? Basically we started out by saying, hey, it's really easy now to create web apps. 02:01:40.840 |
We've got starter kits for you that show you how to create web apps and people have created 02:01:46.600 |
some really cool web apps using what we've learned so far which is single label classification. 02:01:54.220 |
But the cool thing is the exact same steps we use to do single label classification, 02:02:00.760 |
you can also use to do multi-label classification such as in the planet dataset, or you could use to 02:02:13.560 |
do segmentation, or you could use to do any kind of image regression, 02:02:28.160 |
or -- this is probably a bit early, so maybe don't try this just yet -- you could use for NLP classification, 02:02:34.920 |
and a lot more. So in each case, all we're actually doing is we're doing gradient 02:02:43.420 |
descent on not just two parameters but on maybe 100 million parameters, but still just 02:02:50.920 |
plain gradient descent along with a non-linearity, which is normally this one, the ReLU, which it turns 02:03:01.160 |
out the universal approximation theorem tells us, lets us arbitrarily accurately approximate 02:03:07.800 |
any given function including functions such as converting a spoken waveform into the thing 02:03:15.200 |
the person was saying or converting a sentence in Japanese to a sentence in English or converting 02:03:20.680 |
a picture of a dog into the word dog. These are all mathematical functions that we can 02:03:26.500 |
learn using this approach. So this week, see if you can come up with an interesting idea 02:03:34.880 |
of a problem that you would like to solve which is either multi-label classification 02:03:40.760 |
or image regression or image segmentation, something like that and see if you can try 02:03:49.520 |
to solve that problem. You will probably find the hardest part of solving that problem is 02:03:56.160 |
creating the data bunch, and so then you'll need to dig into the data block API 02:04:01.600 |
to try to figure out how to create the data bunch from the data you have. And with some 02:04:07.800 |
practice you will start to get pretty good at that. It's not a huge API, there's a small 02:04:12.280 |
number of pieces, it's also very easy to add your own but for now, you know, ask on the 02:04:18.200 |
forum if you try something and you get stuck. Okay, great. So next week we're going to come 02:04:28.080 |
back and we're going to look at some more NLP. We're going to learn some more about some 02:04:34.120 |
details about how we actually train with SGD quickly. We're going to learn about things 02:04:38.360 |
like Adam and RMSProp and so forth. And hopefully we're also going to show off lots of really 02:04:44.720 |
cool web apps and models that you've all built during the week. So I'll see you then. Thanks.