Live coding 9
Chapters
0:00
1:00 Installing timm persistently in Paperspace
4:00 Fixing broken symlink
6:30 Navigating around files in vim
16:40 Improving a model for a Kaggle competition
24:00 Saving a trained model
34:30 Test time augmentation
39:00 Prepare file for Kaggle submission
45:00 Compare new predictions with previous
46:00 Submit new predictions
49:00 How to create an ensemble model
54:30 Change probability of affine transforms
57:00 Discussion about improving accuracy
00:00:01.280 | So I know there were some questions on the forum.
00:00:13.760 | - Oh yeah, or you can express them in your own terms,
00:00:51.080 | And, but the one thing that I would be really interested
00:01:15.800 | I'd prefer to have it persistently there, ready to go.
00:01:26.280 | - Yeah, so pip packages are actually the only ones
00:01:29.560 | we've actually got the persistence working for.
00:01:51.840 | is, when you install timm, to do it with --user.
00:02:03.320 | would be to edit our /storage/.bash.local,
00:02:13.920 | Let's do pi for pip install: pi="pip install".
00:02:28.320 | That should work even if it's not installed already.
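For reference, a minimal sketch of the persistent install being described, assuming this Paperspace setup where pip's --user directory is symlinked into /storage (run in a notebook or IPython cell):

    # installs timm into the --user site-packages, which survives restarts
    # here because that directory lives under /storage in this setup
    !pip install --user timm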
00:02:40.840 | or I could just type source and then the name of the script,
00:02:44.040 | which of course in this case is !$.
00:02:46.840 | Oopsie daisy, !vim will rerun this.
00:03:05.000 | And by the way, if you wanna know what something is,
00:03:11.440 | if you type which pi, it won't tell you anything useful,
00:03:22.600 | but type pi, it will tell you exactly what it is in this case.
00:03:34.720 | is that's gonna put it in my ~/.local directory.
00:04:03.360 | Ah, this here is telling us we've got a broken symlink.
00:04:09.040 | Yeah, .config is symlinked to /storage.
00:04:22.160 | So I might've maybe forgotten to move that or something.
00:04:35.800 | it'll tell us and we'll know to fix that then.
00:04:38.440 | To create a file that's empty, you just use touch.
00:04:46.000 | So I'm just gonna go ahead and create an empty file.
00:05:20.040 | So my guess is that there's a bug in our pre-run script.
00:05:59.240 | So now, yeah, so now, since it's been installed
00:06:04.120 | into something that's symlinked back to /storage,
00:06:08.160 | And if I run ipython, we can confirm it did install.
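A quick sketch of that confirmation step in ipython:

    import timm
    timm.__version__  # shows the installed version if the install worked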
00:06:13.160 | Does that answer that part of the question, Matt?
00:06:17.800 | So then the second one, yeah, it was not a question,
00:06:23.200 | So yeah, when I get back to using Kaggle on this machine,
00:06:31.600 | And you also had a question about jumping around to,
00:06:44.440 | which, let's grab fastai's repo, for example.
00:08:10.900 | how to set up vim to automatically create that
00:08:15.740 | But let's have a look at, I don't know, layers, for example.
00:08:29.780 | The first is something which sounds very obscure,
00:08:43.820 | /init will search for the next thing called init.
00:08:52.340 | regardless of whether it was a tag or a search or anything,
00:08:56.960 | And right next to Control + O is the letter I,
00:09:03.640 | Okay, so Control + O and Control + I go kind of,
00:09:11.740 | but it just finds a single character, which is f.
00:09:15.800 | and it does the search on the current line,
00:09:17.300 | it will search on this line for the next thing I type.
00:09:22.520 | actually maybe more interesting would be f full stop.
00:09:33.560 | well, what about jumping to the end of a string?
00:09:36.500 | the end of a string is the last character of the line.
00:09:41.020 | which is to start inserting at the end of the line,
00:10:28.920 | But yeah, so let's say there was some stuff at the end,
00:10:42.980 | I can just type f double quote, and it takes me there.
00:10:48.000 | And then Shift + F does the opposite, it searches backwards.
00:10:56.840 | So for example, if I wanted to delete everything up
00:11:00.400 | to the next quote, I can press d f double quote, right?
00:11:12.080 | Or maybe delete everything up to the next comment
00:11:18.200 | So yeah, those are a couple of useful things.
00:11:32.600 | of a set of paired parentheses or braces or brackets.
00:11:40.640 | it goes to the start or end of the next parentheses,
00:11:44.680 | and you can see it jumps between the two, right?
00:11:49.600 | you can see it jumps to the end of this one, right?
00:11:54.600 | If I do it at the very end, it'll jump to this one.
00:12:00.560 | to the end of the parenthesized expression,
00:12:19.440 | Although there's actually something even better for that,
00:12:36.640 | So even when I'm in the middle of these parentheses,
00:12:46.920 | So if I want to delete everything inside those parentheses,
00:12:53.920 | di open round parenthesis, and it deletes the contents,
00:13:00.560 | So let's say I wanted to replace all my parameters,
00:13:06.960 | then I would use c, for change, inside parentheses.
00:13:25.640 | You can kind of really crush it with these tricks.
00:13:36.920 | and you don't have to know them all, you know?
00:13:38.480 | It's like you can learn one thing each day or something,
00:13:41.040 | but yeah, I'm not using any plugins or anything, you know?
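A recap of the vim motions covered in this stretch, as described above:

    Control+O / Control+I   jump back / forward through previous locations
    /init                   search forward for "init"
    f" or f.                jump to the next " or . on the current line
    F"                      the same, but searching backwards
    df"                     delete up to and including the next "
    %                       jump between matching parentheses, braces, brackets
    di(                     delete everything inside the enclosing parentheses
    ci(                     change (delete, then insert) inside the parentheses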
00:13:45.480 | Okay, so we're gonna save a model in a moment.
00:13:59.880 | - I want to make one comment about the timm installation.
00:14:04.880 | I don't know if maybe you discussed this yesterday,
00:14:11.560 | sometimes it might be better to install from master,
00:14:14.280 | because there are some changes that Ross has made,
00:14:27.720 | because that's like something that's more stable,
00:14:40.120 | sometimes like here he went six months without updating.
00:15:04.400 | Great, yeah, thanks for the reminder, Tanishq.
00:15:34.040 | And so pip has this convention that if you say,
00:15:39.040 | I want to install something that is at least as recent
00:15:44.200 | as 0.6.2dev, then that's a way of signaling to pip
00:15:48.400 | that you're happy to include pre-release options.
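A sketch of that pip convention; the exact dev version string is illustrative:

    # a pre-release lower bound signals to pip that pre-releases are acceptable
    !pip install --user "timm>=0.6.2.dev0"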
00:15:52.340 | - Is there any reason that when you do the installation
00:15:57.240 | of timm and then you try to use the learner,
00:16:08.800 | - Right, that's because you have to restart the kernel.
00:16:21.120 | So you wouldn't have to worry about that again.
00:16:23.120 | Okay, so this was our notebook from yesterday.
00:17:22.880 | be as good as this helpful fastai-out-of-the-box person.
00:17:33.880 | You know, which is quite a bit better than ours, right?
00:17:45.760 | I hope you don't mind, but then you'll know how to beat us,
00:17:48.160 | because, well, at least you know how to match us.
00:17:58.800 | was you trained for longer, which makes sense.
00:18:03.140 | And you also used some data augmentation, which makes sense.
00:18:34.880 | Okay, so if we're going to train as long as Gerardo did,
00:18:38.360 | then, you know, if you train more than about five epochs,
00:18:45.360 | And certainly 10, I feel like you're in significant danger
00:18:48.480 | of overfitting, because your model's going to have seen
00:19:02.200 | And this is discussed in the book in some detail.
00:19:07.200 | But basically, if you pass in batch transforms,
00:19:17.160 | these are things that are going to be applied
00:19:31.320 | aug_transforms, so these are transforms for data augmentation.
00:19:37.340 | what something's going to do is to check its help.
00:19:46.500 | Okay, so it's going to do things like flip our images,
00:19:51.500 | rotate them, zoom them, change their brightness,
00:20:02.880 | Okay, and here's some examples of a very cute puppy.
00:20:12.100 | So this is all the same puppy, it's all the same picture.
00:20:16.020 | And as you can see, each time the model sees it,
00:20:18.660 | it sees a somewhat skewed or rotated or brightened
00:20:22.540 | or darkened or whatever version of that picture.
00:20:43.940 | And so aug_transforms actually returns a list, right?
00:21:13.780 | such that it has at least 75% of the height/width.
00:21:18.500 | And it will basically pick a smaller zoomed-in section.
00:21:34.540 | And so here you can see four versions of the same picture:
00:21:38.580 | sometimes it's moved a little bit up and down,
00:21:40.980 | sometimes it's a little bit darker or less dark,
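A minimal sketch of wiring this up, assuming the fastai calls used in the session; trn_path and the exact argument values are illustrative:

    from fastai.vision.all import *

    # item_tfms runs per image on the CPU; batch_tfms runs on whole
    # mini-batches on the GPU. aug_transforms supplies the flips, rotations,
    # zooms and brightness changes described above; min_scale=0.75 keeps at
    # least 75% of the height/width when it picks a zoomed-in section.
    dls = ImageDataLoaders.from_folder(
        trn_path, valid_pct=0.2, seed=42,
        item_tfms=Resize(480, method='squish'),
        batch_tfms=aug_transforms(size=224, min_scale=0.75))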
00:21:52.260 | Then the second thing I figured we should do is,
00:22:15.540 | The default learning rate used by fastai is one where,
00:22:20.020 | I would say I picked it on the conservative side,
00:22:23.180 | which means it's a little bit lower than you probably need,
00:22:27.140 | because I wanted things to always be able to train.
00:22:34.780 | a couple of downsides to using a lower learning rate.
00:22:37.560 | One is that given fixed resources, a fixed amount of time,
00:22:50.420 | The second is, it turns out a high learning rate
00:22:54.100 | helps the optimizer to explore the space of options,
00:22:59.700 | by jumping further to see if there's better places to go.
00:23:05.420 | So the learning rate finder is suggesting things
00:23:10.420 | around about 0.002, which is indeed the default.
00:23:15.260 | But you can see that all the way up to like 10,
00:23:25.100 | as we saw after answering Nick's question yesterday,
00:23:31.500 | which means we're gradually increasing the learning rate.
00:23:38.260 | So I would also say that even these recommendations,
00:23:43.880 | So what I did just before I started this call
00:23:46.100 | was I tried training at a learning rate of 0.01.
00:24:00.440 | And I did find actually that that did give us
00:24:09.540 | I mean, obviously we've got different training sets,
00:24:14.060 | that we're gonna get a better result than our target.
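A sketch of that step, assuming the learner setup from this session; the model name (the in22k ConvNeXt mentioned later) and the epoch count are illustrative:

    learn = vision_learner(dls, 'convnext_small_in22k', metrics=error_rate)
    learn.lr_find()            # suggests around 2e-3, the fastai default
    learn.fine_tune(12, 0.01)  # the higher learning rate tried before the call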
So then since this took six minutes to train, 00:24:32.400 |
So there's a couple of different things we can save with. 00:24:56.100 |
So the learner, self means that this learner, 00:25:07.860 |
So basically what that means is if you call this 00:25:16.620 |
learner.export, it's gonna save it into learner.path. 00:25:21.620 |
So let's find out learner.path is what train images. 00:25:27.120 |
And so this is actually whatever we passed in here. 00:25:48.860 |
So an absolute path is something that starts with slash. 00:25:52.100 |
And so if I were to save it somewhere in storage, 00:25:55.140 |
for example, then I can type slash storage slash whatever. 00:26:00.140 |
Or maybe I wanna put it in slash notebooks somewhere. 00:26:22.820 |
I might even just put it into the current directory. 00:26:27.540 |
Well, actually, where are we current directory? 00:26:53.100 |
So learn.save doesn't save the whole learner. 00:26:59.260 |
It just saves the model and the optimizer state. 00:27:10.800 |
it also contains information about the data looters 00:27:17.580 |
and specifically what transformations are applied. 00:27:34.660 |
and we've already have stuff this in fast.ai. 00:27:37.820 |
In fast.ai, we have something that's a callback 00:27:40.540 |
that can save the model at the end of each e-park 00:27:57.920 |
I wanna be able to just load that exact thing 00:28:00.720 |
with all the same details next time, .export's the way to go. 00:28:16.360 |
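A sketch of the two options being contrasted, under the session's setup; file names are illustrative:

    # export saves the whole Learner: model, data loader config, transforms.
    # The path is relative to learn.path unless it's absolute; passed as a
    # Path, not a str, because of the bug discussed a bit further down.
    learn.export(Path('/notebooks/paddy.pkl'))
    learn2 = load_learner('/notebooks/paddy.pkl')

    # save just writes the model weights and optimizer state under
    # learn.path/models; the per-epoch callback mentioned above is
    # fastai's SaveModelCallback
    learn.save('stage1')
    learn.load('stage1')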
00:28:16.360 | Normally we actually try to make these things
00:28:22.800 | do that for you, but this is less friendly than I would like.
00:28:35.840 | Okay, and it looks like we need to give it a .pkl or,
00:28:49.120 | it'll randomly pick a cropped subset of the image,
00:28:56.060 | And the validation set, it will pick out the center,
00:29:05.120 | or all the height it can, without changing the aspect ratio.
00:29:12.200 | it will grab the whole thing and change the aspect ratio
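A sketch of the resize behaviours being contrasted here, using the fastai transforms described:

    item_tfms=Resize(480, method='squish')  # whole image, distorted aspect ratio
    item_tfms=Resize(480)                   # default: crop; centered on validation
    item_tfms=RandomResizedCrop(480, min_scale=0.75)  # random crop in training,
                                            # center crop on the validation set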
00:29:36.960 | - I can't hear you, but you guys can hear me.
00:30:05.660 | All right, did you guys, were you guys saying anything?
00:30:27.180 | and I thought, oh, aim for something around there,
00:30:35.260 | Like, it seems to, like, for fine-tuning things,
00:30:43.700 | it often seems to get pretty close to, you know,
00:30:54.220 | And that's a reasonable amount of time, too, I'd say.
00:31:06.740 | is because the size of the images was 460.
00:31:14.980 | because when you showed them with the different,
00:31:35.940 | and I think this is around what we always pre-size to.
00:31:39.660 | So actually, maybe 480 would have been better,
00:31:44.220 | one of the dimensions, 'cause it was 640 by 480.
00:31:46.780 | And then your size, you picked, I actually changed it.
00:31:54.720 | most of these models that are trained on ImageNet,
00:32:09.540 | Oh, and then the other thing is, the model I picked,
00:32:21.180 | and the 22K refers to the version of ImageNet,
00:32:26.340 | as opposed to the version that's normally used,
00:32:33.940 | but it's trained on ImageNet with a 22,000-category version.
00:32:46.660 | rice paddy disease than the one with 1,000 categories,
00:32:51.780 | and it's just seen a lot more different pics, you know.
00:33:18.280 | and then the error came, that's when it cut off.
00:33:20.540 | So I don't think you explained what you did to,
00:33:23.100 | we didn't catch the part where you explained the fix.
00:33:28.700 | - Well, 'cause the export had an error, right?
00:33:40.120 | because this was a string and it actually needs to be a Path.
00:33:49.260 | So I would consider that a bug that ought to be fixed.
00:33:52.260 | So hopefully by the time people watch this video,
00:33:55.960 | But yes, at the moment, I had to change this to a Path.
00:34:00.660 | All right, so there's a few things we could do here, right?
00:34:13.340 | is that particularly if you don't have method=squish,
00:34:27.920 | And then another thing is that we've been training it
00:34:33.620 | but the validation set isn't given any of those augmentations.
00:34:41.780 | which you should particularly use if you don't use squish,
00:34:44.860 | and it's effectively cropping into the center,
00:34:47.620 | which is something called test time augmentation.
00:35:04.660 | four different randomly augmented versions of each image.
00:35:19.560 | And as I said, it's gonna work particularly well
00:35:23.060 | but it ought to work well even with the squish.
00:35:28.060 | let's first of all make sure we can replicate
00:35:37.220 | probabilities, targets = learn.get_preds.
00:36:39.900 | We get the predictions n times, by default four.
00:37:01.380 | is it's always good to look at the source code.
00:37:24.620 | With this, you can pretty much skip progress bars.
00:37:36.260 | We're gonna call self.get_preds, passing in a data loader.
00:37:43.800 | And then it takes either the maximum or the mean,
00:37:47.020 | depending on whether you asked for the max or not.
00:37:50.100 | And it also grabs it for the validation set data loader.
00:38:14.420 | So you can see here it's running at four times
00:38:31.020 | Okay, and it beat it, but just by a little bit.
00:38:58.180 | Yeah, I kind of wish I didn't have the squish in now,
00:39:00.100 | but I don't want you guys to have to wait 10 minutes
00:39:02.060 | for it to retrain, 'cause then it's much more clearly
00:39:11.380 | So I generally find that when not using squish,
00:39:24.900 | that using TTA with use_max=True is best.
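A sketch of test-time augmentation with fastai, as described; four augmented passes is the default:

    # average predictions over n=4 randomly augmented versions of each image
    probs, targs = learn.tta()
    error_rate(probs, targs)

    # when not training with squish, taking the max often works better here
    probs, targs = learn.tta(use_max=True)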
00:39:38.260 | So we can just repeat basically what we had yesterday.
00:39:50.860 | Now there's no with_decoded, I don't think, for TTA.
00:40:00.340 | So we're gonna have to do a bit of extra work here.
00:40:03.260 | So this is gonna give us the probabilities and the targets.
00:40:07.780 | And so the probabilities, each row is gonna contain
00:40:50.500 | so for each of the 3,469 things in the test set,
00:40:58.620 | which presumably means the length of the vocab is 10,
00:41:07.260 | Great, so what we wanna do is find out,
00:41:16.660 | is whatever thing has the highest probability.
00:41:24.340 | So in math, and in PyTorch and NumPy, that's called argmax.
00:41:28.940 | So argmax is the index of the thing with the highest value.
00:41:35.740 | And so what do we want to take the maximum over?
00:42:22.700 | Now, I realized actually this thing we did yesterday,
00:42:25.500 | where we went {k: v for k, v in enumerate(...)},
00:43:29.860 | and pass in each pair of these as an argument to it.
00:43:34.860 | And so Python actually has syntax to do exactly that,
00:43:58.860 | That's gotta be a mapping, which enumerate already is.
00:44:38.540 | All right, I'm gonna have to try to think of a better way.
00:44:47.540 | - I think you don't need the star-star in that case.
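A sketch of the tidier mapping this discussion arrives at; vocab, probs and the sample-submission dataframe ss are the session's objects, used here illustratively:

    import pandas as pd

    # dict() accepts an iterable of (key, value) pairs, which is exactly what
    # enumerate() yields - no {k: v for k, v in enumerate(...)} needed
    mapping = dict(enumerate(learn.dls.vocab))
    idxs = probs.argmax(dim=1)      # index of the most probable class per row
    ss['label'] = pd.Series(idxs.numpy()).map(mapping)
    ss.to_csv('subm.csv', index=False)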
00:45:08.980 | Okay, so what I'm gonna do is I'm gonna make a copy
00:45:17.540 | of the last time we did a head of the submission.
00:45:20.500 | And one reason I'd like to do that for my new submission
00:45:23.940 | is to confirm that our new one looks somewhat similar.
00:45:26.940 | So previously we went hispa, normal, downy, blast, blast.
00:45:32.500 | And so this makes me feel comfortable that, okay,
00:45:36.740 | it's still giving largely the same results as before.
00:45:41.820 | And so that's just something I like to do, okay.
00:45:45.140 | And then another thing I like to do is kind of keep track,
00:45:55.660 | and just pop it into a different notebook or a comment.
00:45:57.700 | So down here, I'm just gonna have the non-TTA version.
00:46:03.420 | All right, so we should be able to submit that now.
00:47:17.940 | Oh, it, oh, I see, I've got to, how did that happen?
00:47:29.080 | I've got two desktops going, I didn't notice that.
00:48:14.620 | - Wait, I thought yours was better than that.
00:48:36.100 | Actually, 34th out of, I mean, it's just a fun competition.
00:48:48.580 | Okay, so this person's still way ahead, right?
00:49:18.600 | To create an ensemble, I would be inclined to maybe,
00:49:24.620 | we could create an ensemble with an unsquished version.
00:49:37.600 | So what I would do is I'd kind of like copy all the stuff,
00:49:45.180 | And then I would kind of paste them down here,
00:49:50.380 | go through and remove the stuff that isn't quite needed.
00:50:21.100 | And so then to merge cells, it's Shift+M, M for merge.
00:50:47.380 | of probabilities and a second set of targets.
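A sketch of the ensembling idea being set up here; probs_squish and probs_crop are hypothetical names for the two models' TTA probabilities:

    # average the two models' probabilities, then argmax as before
    avg_probs = (probs_squish + probs_crop) / 2
    idxs = avg_probs.argmax(dim=1)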
00:51:05.660 | And then another model I would be inclined to try,
00:51:31.220 | and multiply that by the smaller side we want.
00:51:40.060 | Nice to find something that works a bit more evenly,
00:51:51.820 | So we could create 168 by 224 images, for instance.
00:52:21.860 | And I never quite remember which way round it is,
00:52:50.100 | yeah, all of our input images are the same aspect ratio.
00:52:53.340 | So there's no particular reason to make them square.
00:53:05.940 | as the thing that everything gets changed to.
00:53:09.900 | But when everything's wider than they are tall,
00:53:12.500 | especially when they're all the same aspect ratio,
00:53:18.220 | And another thing I guess we should consider doing,
00:53:26.240 | you can change their resolution more gracefully.
00:53:38.340 | So we could do 320 by 240 instead of 640 by 480.
00:53:43.340 | So that would be another one I'd be inclined to try.
00:54:02.360 | and we know how to check it, which is to go show_batch.
00:54:06.680 | Okay, so you can see I've got it the wrong way around.
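A sketch of training at a rectangular size and eyeballing the result, reusing the earlier fastai calls; the size tuple order is easy to get backwards, as just happened, which is exactly what show_batch catches:

    dls = ImageDataLoaders.from_folder(
        trn_path, valid_pct=0.2, seed=42,
        item_tfms=Resize((480, 640)),
        batch_tfms=aug_transforms(size=(168, 224), min_scale=0.75))
    dls.show_batch(max_n=6)   # check the orientation looks right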
00:54:41.060 | And like given that we're gonna have such nice clear images,
00:54:48.140 | or the ones where we're zooming and rotating and stuff.
00:54:54.900 | we can change the probability of affine transforms.
00:55:04.780 | So in theory, I feel like this one feels the most correct,
00:55:11.820 | given that the data that we have is a fixed input size.
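A sketch of dialing down the affine transforms via aug_transforms' probability arguments; 0.5 is an illustrative value:

    # p_affine controls how often the rotate/zoom/warp group is applied;
    # p_lighting does the same for the brightness/contrast changes
    batch_tfms=aug_transforms(size=(168, 224), min_scale=0.75, p_affine=0.5)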
00:56:00.000 | And this one's gonna be rectangular, rectangular.
00:56:14.040 | I guess I'm a little worried that Paperspace
00:56:25.200 | into my notebooks directory just to be a bit paranoid.
00:57:00.480 | I like the fact that I've got Paperspace so well set up
00:57:03.880 | now that I don't even remember I'm using Paperspace.
00:57:16.080 | All right, I'm gonna not have you guys watch that run,
00:57:31.040 | on like the data transformations and augmentations.
00:57:34.360 | When would you focus on that versus, you know,
00:57:40.080 | - Given that this is an image classification task,
00:57:54.760 | it will almost certainly have exactly the same
00:57:59.040 | characteristics as ImageNet in terms of accuracy,
00:58:19.080 | showing which timm models are better than others.
00:58:25.040 | So I would, once everything else is working really well,
00:58:30.040 | you know, I would then try it on a couple of models,
00:58:36.760 | like base or large or whatever I can get away with.
00:58:48.520 | which has the kind of pictures that aren't in ImageNet,
00:58:58.000 | I would, let's say it was a segmentation problem,
00:59:01.560 | which is about recognizing what each pixel is,
00:59:11.920 | Instead, I would go and look at something like
00:59:16.520 | which techniques have the best results on segmentation.
00:59:32.680 | they always say, oh, we made an ensemble, which is fine.
00:59:36.800 | But the important thing isn't that they did an ensemble,
00:59:45.040 | And I would use this kind of like smallest version of X,
00:59:48.560 | And yeah, generally fiddling with architectures,
01:00:03.080 | which almost any computer vision problem is of that type.
01:00:06.640 | I guess the only interesting question for this one would be,
01:00:17.760 | but I'm fairly sure that using that information,
01:00:51.440 | and don't feel like you can only ask a question,
01:00:55.360 | like it's totally fine to ask a question about a video.
01:01:07.000 | Oh, we covered this in this video, here's where you go,
01:01:09.720 | I will tell you that, and that's totally fine.
01:01:11.560 | But, and if it's like, okay, you said this thing
01:01:15.080 | in this other video, but I don't get it, say it again,
01:01:23.140 | because people can go back and rewatch the videos
01:01:27.600 | and ask questions about things that aren't clear.
01:01:30.220 | So yeah, it definitely does rely on people turning up
01:01:34.160 | and saying, I'm not clear on this or whatever.
01:01:38.280 | - Yeah, well, I sort of started from ground zero,
01:01:45.400 | I'm starting to feel a little bit more comfy with it.
01:01:49.040 | - And I just wanna take the time to work through,
01:01:52.360 | my way through, and absorb what you've been talking about.
01:01:59.000 | there's a couple more lessons to come.
01:02:07.920 | So there'll be a couple of weeks there to catch up.
01:02:16.080 | or not, join in any time and ask questions about any video,
01:02:19.720 | or even about things that aren't covered in a video,
01:02:21.960 | but you feel like would be something useful to know.