Live coding 12
Chapters
0:0 Review of best vision models for fine tuning
10:50 Learn export file pth format
12:30 Multi-head deep learning model setup
16:0 Getting a sense of the error rate
20:0 Looking inside the model
22:30 Shape of the model
23:40 Last layer at the end of the head
26:0 Changing the last layer
29:0 Creating a DiseaseAndTypeClassifier subclass
38:0 Debugging the plumbing of the new subclass
46:0 Testing the new learner
49:0 Create a new loss function
52:0 Getting predictions for two different targets
56:0 Create new error function
00:00:00.000 |
Yeah, so I guess the thing I've kind of learned... I guess I should share my screen and say what I feel like I've learned from these sweeps. 00:00:44.000 |
Pets. This is the top 15 after running a few more sweeps to see if there were some better options. But basically, I think what's interesting is that in the top 15 for Pets, you have a bit of everything, right? You've got ResNet-RS and ResNetV2 distilled. 00:01:12.000 |
You know, the vision transformer ones: ViT, Swin, MobileViT. 00:01:25.000 |
Still in there. Actually the fastest of all, which maybe reflects that it's the most well optimized, because it's been around a while and NVIDIA has probably worked hard on that. 00:01:37.000 |
ResNet GPU memory, like, yeah, it's kind of interesting the way there are all these very different approaches, but they all kind of end up in there. 00:01:53.000 |
I think, you know, one interesting thing when you look at the graph is these green ones, which are ViT, 00:02:07.000 |
which kind of cut off here at this fit time of 150. And I think that's because the larger vision transformers only work on larger images and I only did 224 pixel images. 00:02:23.000 |
So if I included larger images, we might see ViT at the very best in terms of error rate. 00:02:33.000 |
I was pleasantly surprised to see that some of the ViTs, you know, didn't actually use much memory. 00:02:46.000 |
And they were pretty fast. And it's interesting because that was quite a while ago, right, the vision transformers paper came out quite a while ago, and 00:02:58.000 |
it was kind of like the way I remember it was like just tacking together the first and most obvious thing people thought of when it came to kind of making transformers work on vision and 00:03:09.000 |
yeah, the fact that it still works so well, it seems that people haven't improved on it much other than perhaps Swin. 00:03:18.000 |
So I guess those are my takeaways on the small and fast models. Yeah, I guess I hadn't really looked much at ResNet-RS before, so it's interesting to see that one's right up there. 00:03:36.000 |
And then on Planet, it doesn't look that different except, you know, it really is just ViT, Swin and ConvNeXt. In fact, it entirely is ViT, Swin and ConvNeXt. 00:03:59.000 |
Yeah, so I guess that's some of the things I noticed. 00:04:14.000 |
This was able to try so many different kinds of models. Yeah, ran in less than 24 hours on my three GPUs. 00:04:31.000 |
Yeah, I thought this was interesting. Look at this one: ViT small patch 32, the memory usage is amazing. 00:04:46.000 |
And, yeah, third from the top of the small, faster ones. I have a question. Yeah. Can you hear me. Sure. Yes, hi. 00:04:57.000 |
Just a small question. So you mentioned small and large. Does large mean the size of the image? 00:05:06.000 |
Sometimes. When I was talking about ViT, yes, it does. 00:05:12.000 |
So I was specifically just doing 224 by 224 pixel images, and pretty much all the transformer-based ones are fixed: they can only do one size. 00:05:24.000 |
And so the ViT models, I don't think they have. 00:05:30.000 |
So, there are two meanings of larger here. There's larger as in bigger models: more layers, wider layers. 00:05:38.000 |
And I think all the ViT models which are larger, higher capacity, slower, probably more accurate, only run on bigger images, which doesn't have to be that way; that's just what they happen to do. 00:05:57.000 |
There are bigger ViT models that should be more accurate, but they don't work on 224 by 224 pixel images. Is there a good threshold to know when is a good time to use the large versus the small one, or is it all just experimental? Larger images? 00:06:14.000 |
You basically do everything on the smallest images you can, as long as it gets you reasonable results, because you want to iterate quickly. 00:06:24.000 |
And then when you're finished, like in the case of Kaggle, you want the best accuracy you can get. So you try bigger and bigger images to see what you can get away with, and keep doing that as long as the accuracy improves. 00:06:40.000 |
In a production environment, it's kind of similar, but you make them bigger and bigger until the latency of your model is too high. You know, you find the right trade-off between model latency and accuracy. 00:06:54.000 |
Generally speaking, larger images will give you more accurate results. 00:07:02.000 |
Correct. I mean, a lot slower, right? Because if you increase it from 224 on one side to 360, then you're going from 224 squared to 360 squared. 00:07:19.000 |
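As a quick sketch of the arithmetic behind that slowdown: the snippet below only counts pixels for the two side lengths mentioned above. Treating compute cost as roughly proportional to pixel count is a simplifying assumption, not a measurement.

```python
# Pixel counts for square images at the two side lengths mentioned above.
small_side, large_side = 224, 360

small_pixels = small_side ** 2
large_pixels = large_side ** 2

# How many times more pixels the model has to process per image.
ratio = large_pixels / small_pixels
print(f"{small_pixels} -> {large_pixels} pixels, {ratio:.2f}x as many")
```

So a modest-looking increase in side length more than doubles the number of pixels per image.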
So, for example, an application would be object detection for live video. Then even if it's a larger size, it would still be good to go small because it's faster. 00:07:36.000 |
Certainly while you're iterating. There's no need to have a really accurate model while you're iterating, because you're trying to find out what data preprocessing works best or what architecture works best or whatever. So yeah, there's no point using large models 00:07:51.000 |
and large images generally, as long as they're big enough to get the job done reasonably well. 00:08:01.000 |
If you were going to do this in a business context, let's say, and someone said, hey Jeremy, I want to, you know, have a vision model, would you kind of just pick a reasonable one and go with that, and if the results were good, you 00:08:17.000 |
would just use it? I would do exactly what I'm doing here, you know, which is to try a few things on small, fast models, on small images, on a small subset of the data. 00:08:30.000 |
I would look at what data pre processing to use and what architecture to use. 00:08:36.000 |
And then I would look at what are the constraints in terms of operationalizing this. 00:08:42.000 |
How much RAM do we have? How much latency can we get away with? How expensive is it going to be to then scale it up to the point at which, you know, we're getting acceptable results using acceptable resources? 00:08:59.000 |
It wouldn't look very different at all to, you know, a Kaggle competition in terms of the modeling, but then there'd be a whole piece of analysis around user requirements and costs and stuff like that. 00:09:19.000 |
I tried doing what you're doing was going from smaller to larger models and mine somehow started out with much lower accuracy. 00:09:32.000 |
Is it just a fluke or I mean I had several issues happen that and then 00:09:42.000 |
you press the wrong buttons somehow. So I will. 00:09:51.000 |
I think I already have, haven't I shared my notebooks. So if I haven't, I'll certainly share them today. 00:10:05.000 |
Maybe I haven't yet. So I'll share my notebooks today. So what I suggest you do is, you know, start from mine, make sure you can rerun them, and then look at yours and see how they're different and figure out where you went wrong. 00:10:18.000 |
But also, yeah, I always tell people when debugging to look at the inputs and the outputs. So what predictions are you making? Are you always predicting zero, for example? Did you run lr_find to find what learning rate works well? 00:10:44.000 |
Jeremy, a question on production. After your walkthrough last week, you mentioned learn.export, and that later on we can load that. Now, I found that maybe there's a bug there, because when I load, it's actually looking for the 00:11:05.000 |
.pth suffix. But when we save, it saves to the models folder, and you can keep whatever name you want. 00:11:17.000 |
But when you want to load them, they actually have the suffix at the end. So I'm not sure there's some. 00:11:24.000 |
Yeah. So just make sure you save it with a .pth suffix. But yeah, it certainly would make sense for us to do that automatically. 00:11:32.000 |
But in the documentation, it says it's saved in pickle. 00:11:40.000 |
The extension... but this is just PyTorch. So PyTorch uses, you know, a variant of the pickle format, and they normally use .pth as their extension. 00:11:51.000 |
So, yes, it is pickle, it just uses the .pth extension. 00:11:57.000 |
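A minimal sketch of that point in plain PyTorch (this is not the fastai code path discussed above; the tiny model and the file name are placeholders for illustration). `torch.save` writes a pickle-based file, and `.pth` is only a naming convention that you supply yourself:

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)  # tiny stand-in model, purely for illustration

with tempfile.TemporaryDirectory() as tmp:
    # The extension is part of the name we choose; PyTorch does not add it.
    path = os.path.join(tmp, "model.pth")
    torch.save(model.state_dict(), path)  # pickle-based serialization
    state = torch.load(path)

# Load the saved weights into a fresh model of the same shape.
restored = nn.Linear(4, 2)
restored.load_state_dict(state)
```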
Jeremy, when you were opening this window... Yeah? You typed something in the URL bar to get there. 00:12:08.000 |
Oh, I just typed my port number, because I know that's the only thing that has 8888 in it. 00:12:16.000 |
I thought you had some magic going on. Yeah, nothing like that. 00:12:34.000 |
I did have one more idea about this competition. 00:13:47.000 |
So there's 10,000 rows and 7,000 of them are one variety. 00:13:54.000 |
But there are 3,000 rows that contain other varieties. 00:14:00.000 |
So the only, you know, idea I had for this was something which is a bit counterintuitive, but those of you that did, I can't remember, the 2017 or 2018 fast.ai course might remember. 00:14:15.000 |
Sometimes if there are two different things, in this case what kind of rice is it and what kind of disease is it, trying to get your model to predict both of those things makes it better at both. 00:14:28.000 |
So if we try to get our model to predict what kind of disease is it, and what kind of rice is it, 00:14:35.000 |
it might actually get better at predicting the kind of disease, which might sound counterintuitive, right? I find it counterintuitive because it sounds like it's got more work to do. 00:14:46.000 |
But you're also giving it more signal, like there are more things you're teaching it to look for, and so maybe if it knows how to recognize different types of rice, 00:14:56.000 |
it can use that information to also recognize how different kinds of rice are impacted by different diseases. 00:15:02.000 |
So, I have no idea if that's going to be useful or not but I thought it would be an interesting exercise to try to, to try to do that. 00:15:12.000 |
So that's what I thought we might have a go at today. That sounds of interest, 00:15:18.000 |
which also is like frankly like a good exercise in 00:15:26.000 |
delving into models in a way we've never done before. 00:15:36.000 |
yeah, much more sophisticated than anything we've done with deep learning before, 00:15:44.000 |
which means it's very much up to you folks to stop me anytime something slightly confusing. 00:15:53.000 |
Because I actually want everybody to understand this. And it's a really good 00:16:00.000 |
test of how well you understand what's going on inside a neural network. So if you're not understanding it, it's a sign I haven't explained it very well. 00:16:10.000 |
So, let me try. Let's have a look. Okay. So, one thing I just did yesterday afternoon was I just trained a model three times 00:16:25.000 |
to see what the error rate was, because I wanted to get a sense of like how much variation is there. 00:16:31.000 |
And I found that if I use a learning rate of 0.02 and just train for three epochs, I seem to pretty consistently get reasonable results. 00:16:39.000 |
So, that's something I can now do in two minutes. 00:16:43.000 |
See how I'm going. So I thought that would be good. So, this is one thing I really like doing. People are often very into doing reproducible training, where they set the seed for their training and run the same thing every time. 00:16:56.000 |
And I think that's normally a bad idea, because I actually want to see what the natural variation is, so that if I make a change, I know whether 00:17:07.000 |
the difference I see in the result might be just due to natural variation, or whether it's actually something significant. So that's why I did that. If the natural variation is really large, 00:17:18.000 |
yeah, that's going to be tough to see: did I improve things? But then if the natural variation is so large that improvements are invisible, then trying to improve it seems pointless, right, because it sounds like you haven't really found a way to 00:17:36.000 |
And normally that happens because my learning rate's too big. 00:17:42.000 |
So if you try this yourself and bump the learning rate up to 0.04, you'll see, at least for me, I got like 5%, 6%, five and a half percent. You know, it's all over the place. 00:17:53.000 |
So, yeah, training for more epochs at a lower learning rate will generally give you more stable results. 00:18:00.000 |
And there's a compromise, because doing more epochs is slow. So that's why I was trying to find a learning rate and a number of epochs which is fast and stable. 00:18:09.000 |
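One way to put a number on "natural variation" is to summarize the error rates from a few unseeded runs. The sketch below uses only the three figures quoted above for the higher learning rate (5%, 6%, 5.5%); using the sample standard deviation as the yardstick is an assumption of this sketch, not something stated in the session.

```python
import statistics

# Error rates quoted above for three runs at the higher learning rate.
error_rates = [0.05, 0.06, 0.055]

mean_err = statistics.mean(error_rates)
spread = statistics.stdev(error_rates)  # sample standard deviation

print(f"mean={mean_err:.4f}  stdev={spread:.4f}")
```

As a rough rule of thumb, a change whose effect is smaller than this run-to-run spread cannot be told apart from noise without more runs.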
You could also try using a smaller subset of the data or, I don't know, like, in the end sometimes things just will be slow and such as life but most of the time I find I can get a compromise and I certainly did here, I think, 00:18:23.000 |
with six epochs at half the learning rate I certainly can do better, I can get to 4%, you know, rather than five, but that's okay, I just want something for the testing. 00:18:35.000 |
I think that was always counterintuitive to me that I think you talk about is like these improvements that you make on a small scale, like, show up on the larger scale, like always. 00:18:48.000 |
Oh yeah, absolutely. Basically, they pretty much always will. Yeah, because they're the same models with just more layers or wider activations. 00:18:58.000 |
Yeah, if you find some preprocessing step that works well on a ConvNeXt tiny, it's going to work well on a ConvNeXt large as well, 99.9% of the time. 00:19:14.000 |
Most people act as if that's not true, I find, but you know like in academia and stuff. 00:19:20.000 |
I feel like you have to do a full suite of everything. Yeah, which, like most people just never think to try, but like intuitively. 00:19:33.000 |
Of course it's the same, you know, why wouldn't it be the same like it's, it is the same thing just scaled up a bit, they behave very similarly. 00:19:44.000 |
I mean it's hard to argue with you because it works. So, I mean, 00:19:49.000 |
Yeah, you can argue that it's not intuitive, that's fine, but like, I feel like the only reason it would be not intuitive is because everybody's told you for years that it doesn't work that way. 00:19:57.000 |
Do you know what I mean? If nobody had told you that, I think you'd be like, yeah, of course it works that way. 00:20:04.000 |
Okay, so, okay, let's do something crazy. Let's actually look at a model. 00:20:10.000 |
So inside our learner, there are basically two main things. There's the data loaders, learn.dls, and there's the model, learn.model. 00:20:17.000 |
Okay, and we've seen these before. And if you've forgotten, then yeah, go back and have a look at the older videos from the, from the course. 00:20:30.000 |
Yeah, it's got things in it. And in this case, the first thing in it is called a TimmBody. 00:20:36.000 |
And the TimmBody has things in it. The first thing in it is called model, so that's TimmBody.model. 00:20:42.000 |
It has things in it. The first thing is called the stem and the next thing is called the stages and so forth, right? So you can see how it's this kind of tree. 00:20:49.000 |
And we actually want to go all the way to the bottom. 00:20:52.000 |
So at the very top level, there are two things in it. There's a TimmBody. 00:20:59.000 |
And there's a thing here, which doesn't actually have a name, but we always call it the head. 00:21:05.000 |
And so the body is the bit that basically does all the hard work of looking at the pixels and trying to find features and stuff like that. 00:21:13.000 |
That's something we call a convolutional neural network. 00:21:16.000 |
And at the very end of that, it spits out a whole bunch of information about those pixels. 00:21:26.000 |
And the head is the thing that then tries to make sense of that and make some predictions about what we're looking at. 00:21:32.000 |
And so this is the head. And as you can see, the head is pretty simple, whereas the body, which goes from here all the way to here, is not so simple. 00:21:47.000 |
And we want to predict two things, what kind of rice it is and what disease it has. 00:21:56.000 |
Now look at the very, very, very last layer. It's a linear layer. So a linear layer, if you remember, is just something that does a matrix product. 00:22:08.000 |
And the matrix is one which takes as input 512 features and spits out 10 features. So it's a 512 by 10 matrix. 00:22:22.000 |
So let's do a few things. Let's grab the head, right? So the head is the index one thing in the model. 00:22:36.000 |
You know, I've seen these model, sort of, whatever you want to call it x rays, a lot. Have you ever wanted to, like, is there, is there a way that maybe I don't know about to see the shape of the tensors as a flow, the shape of the data as it flows through the model? 00:23:06.000 |
You should try watching some fast AI lectures. 00:23:11.000 |
Yeah, so this will tell you how many parameters there are. 00:23:15.000 |
And yeah, the shape as it goes through. And so the key thing is since we're predicting 10 probabilities, one probability for each of the 10 possible diseases, we end up with a shape of 64 by 10. 00:23:27.000 |
The 64 is because we're using a batch size of 64. And for each image, we're predicting 10 probabilities. 00:23:34.000 |
It's very thorough and shows the callbacks. Wow. I don't remember this. Yeah, we don't muck around, man. Here in fast.ai, we're thorough. 00:23:42.000 |
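For anyone who wants to see shapes flow through a model outside fastai, here is a minimal sketch using plain PyTorch forward hooks. It is not how the summary shown in the session is produced; the three-layer `head` is a toy stand-in whose sizes are chosen to resemble the head discussed in this session.

```python
import torch
from torch import nn

# Toy stand-in for a model head, for illustration only.
head = nn.Sequential(nn.Linear(1536, 512), nn.ReLU(), nn.Linear(512, 10))

shapes = []

def record_shape(module, inputs, output):
    # PyTorch calls this after each hooked module's forward pass.
    shapes.append((type(module).__name__, tuple(output.shape)))

handles = [layer.register_forward_hook(record_shape) for layer in head]

with torch.no_grad():
    head(torch.randn(64, 1536))  # a batch of 64 feature vectors

for handle in handles:
    handle.remove()  # clean up the hooks once the shapes are recorded

for name, shape in shapes:
    print(name, shape)
```

The final entry has shape 64 by 10: one row per image in the batch, one column per class.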
So, yeah, so I'm back to the question because that's that's a great thing for us to look at. 00:23:50.000 |
So, yeah, so in the head, let's create something called the last layer, which is going to be the very end of the head. 00:24:04.000 |
So our last layer is this linear thing, right? 00:24:09.000 |
And so this is so we could actually see the the parameters themselves. 00:24:17.000 |
Oh, I hope it does that. A lot of these things are generated lazily, right? 00:24:22.000 |
So when you see this thing saying generator object, it's literally, the word is lazy. 00:24:28.000 |
It's too lazy to actually bother calculating what it is. So it doesn't bother until you force it to. 00:24:34.000 |
So if you turn it into a list, it actually forces it to generate it. OK, so it's a list of one thing, which is not surprising. 00:24:42.000 |
Right. There it is. And so the last layer parameters 00:24:52.000 |
is a matrix, which is, there we go, 10 by 512. So it's transposed to what I said, but that's OK. 00:25:02.000 |
So we're getting 512 inputs. And when we multiply this by this matrix, we end up with 10 outputs. 00:25:41.000 |
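The same inspection can be sketched on a standalone layer. Here `last_layer` is a freshly created `nn.Linear`, not the trained layer from the session, and `bias=False` is an assumption chosen so that the parameter list has exactly one entry, as described above.

```python
import torch
from torch import nn

# Fresh layer for illustration; bias=False is an assumption (see lead-in).
last_layer = nn.Linear(in_features=512, out_features=10, bias=False)

params = last_layer.parameters()
print(params)            # a lazy generator object, nothing listed yet
params = list(params)    # forcing the generator gives the actual tensors
weight = params[0]
print(len(params), tuple(weight.shape))

# The weight is stored as (out_features, in_features), so the layer
# multiplies by its transpose: (64, 512) @ (512, 10) -> (64, 10).
x = torch.randn(64, 512)
out = x @ weight.T
print(tuple(out.shape))
```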
Home schooling transitions always require some input. 00:26:03.000 |
If we got rid of this, right, then our last 00:26:13.000 |
linear layer here would be taking in 1,536 features and spitting out 512 features. 00:26:24.000 |
So what we could do would be to delete this layer. 00:26:34.000 |
And instead, take those 1,536. 00:26:39.000 |
So these 512 features, and create two linear layers, one with 10 outputs as before and one with however many. 00:27:07.000 |
So I was just contemplating whether, back in that linear layer where the output was 10 by 512... 00:27:16.000 |
Not the output. That's the matrix. So the output was 10. The output is 64 by 10. 00:27:21.000 |
Yes. So when you want to mix diseases with rice in the output, I was wondering whether that might be, like, I don't know how many rice types there are. 00:27:33.000 |
OK, so that 10 might be a 10 by 10 matrix output? No, 2 by 10. So you want one probability of what type of rice is it and one probability of what disease does it have. 00:27:50.000 |
So let's let's go ahead and do the easy thing first, which is to delete the layer we don't want. 00:27:56.000 |
So this says sequential. So sequential means like literally PyTorch is going to go through and calculate this and take the output of that and pass it to this and take the output of that and pass it to this and so forth. 00:28:08.000 |
Right. So if we delete the last layer, that's no problem. It just won't ever call it. 00:28:15.000 |
So I can't quite remember if we can do this in sequential, but let's assume it works like normal Python. 00:28:33.000 |
Yeah, we can. OK, so it's got normal Python list semantics. 00:28:40.000 |
So this model will now be returning 512 outputs. 00:28:53.000 |
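A minimal sketch of that list-style deletion on a toy `nn.Sequential` (the layers are stand-ins, not the real head from the session):

```python
import torch
from torch import nn

# Toy stand-in for the head; only the sizes mirror the discussion.
head = nn.Sequential(nn.Linear(1536, 512), nn.ReLU(), nn.Linear(512, 10))

last_layer = head[-1]  # keep a reference before removing it
del head[-1]           # same syntax as deleting from a Python list

out = head(torch.randn(64, 1536))
print(len(head), tuple(out.shape))
```

After the deletion, the container simply stops one layer earlier, so its output is the 512 features that used to feed the final linear layer.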
So we want to basically wrap it in a model which instead has two linear layers. 00:29:06.000 |
So there's a couple of ways we can do this, but let's do it the most step-by-step way. 00:29:15.000 |
That's the second look. So we're going to create a class. Right. So in PyTorch, modules are classes. 00:29:21.000 |
Right. So we're going to take a class which includes this model. Right. So let's call this class. 00:29:42.000 |
PyTorch calls all things that it basically uses as layers in a neural net a module. So this is a neural net module. 00:29:51.000 |
Now, if you haven't done any programming in Python before, it would be very helpful to read a tutorial about basic Python programming, 00:30:00.000 |
because PyTorch assumes that you are pretty familiar with it. 00:30:05.000 |
If you've done any kind of OO programming before, I'm going to work on the assumption you have. 00:30:11.000 |
Then the constructor. There are a lot of weird things in Python. The constructor is called dunder init, that is, __init__. Dunder means double underscores on each side. 00:30:22.000 |
And it always passes in the object being constructed or the object we're calling it on first. So we'll give that a name. 00:30:30.000 |
And so we're basically going to create two linear layers. 00:30:36.000 |
And one easy way to create the correct kind of layer would be. 00:30:47.000 |
So one question is, like, I understand this subclassing thing. Yeah. Is there some other way that you could push two additional layers onto the existing thing, or does that make any sense? 00:31:02.000 |
Yeah, we could try that. Let's see if we get this one working and then we'll try the other way. How about that? 00:31:07.000 |
And then we could also try using fast.ai has a create head function as well. 00:31:15.000 |
So we'll see how we go. So here's here's linear layer number one. 00:31:21.000 |
And as you can see, I literally just copied and pasted. It's inside the nn submodule, so I just had to add that. But the representation of it is nice and convenient in that I can just copy and paste it. 00:31:33.000 |
In real life, we never normally write the in_features and out_features. Everybody kind of knows that the first two things are in and out features. So let's make it look more normal. 00:31:45.000 |
So then the second layer and, you know, then maybe we'll like just give ourselves a note here. So we'll use this one for rice type and we'll use this one for disease. 00:31:59.000 |
OK, so at this point, once we create this, these things are going to be in it. And then we also need to wrap the actual model, right? So we'll just call that m and we'll just store that away. 00:32:15.000 |
self.m equals m. So basically modules act exactly like functions. In Python terms, they're called callables, and they act exactly like functions. 00:32:30.000 |
But the way PyTorch sets it up is when you call this function, which is actually a module, it will always call a specially named method in your class and the name of that is forward. 00:32:43.000 |
So you have to create something called forward and it will pass the current set of features to it, which I always call x. I think most people call it x, if I remember correctly. 00:33:00.000 |
So this is going to contain a 64 by 512 tensor. 00:33:18.000 |
No, it's not going to contain a 64 by 512 tensor. It's going to contain an input tensor. This is going to be a model. So we need to create the 64 by 512 tensor from it by calling the model like so. 00:33:35.000 |
So results. In fact, what we often do is we'll go X equals because we're kind of making it like a sequential model going X equals. 00:33:43.000 |
Oh, you know, another idea is we something else we can try is we can make this whole thing as a sequential model. Let's do that next. 00:33:52.000 |
So this is probably going to be the least easy way, what I'm doing here, the most manual way. 00:33:58.000 |
So we first of all call the original model and then basically we're going to create two separate outputs. 00:34:24.000 |
So that's so what I would then do is I would say let's create a new model. 00:34:33.000 |
So DiseaseAndTypeClassifier. So we'd create it like this. 00:34:40.000 |
And we need to pass in the existing model, which is this thing here. Right. 00:34:53.000 |
Oh, yes. And you always have to call the superclass's dunder init to construct the object before you do anything else. 00:35:19.000 |
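Putting the pieces described so far together, a sketch of such a class might look like the following. It is reconstructed from the spoken description rather than copied from the notebook: the constructor's default arguments (in particular the number of rice varieties, which is not stated here) and `bias=False` are assumptions, and `body` is a toy stand-in for the real pretrained model with its last layer removed.

```python
import torch
from torch import nn

class DiseaseAndTypeClassifier(nn.Module):
    # The default sizes are illustrative assumptions, not values from the session.
    def __init__(self, m, n_features=512, n_varieties=10, n_diseases=10):
        super().__init__()  # must run before any submodule is assigned
        self.l1 = nn.Linear(n_features, n_varieties, bias=False)  # rice type
        self.l2 = nn.Linear(n_features, n_diseases, bias=False)   # disease
        self.m = m  # wrapped model that outputs n_features per image

    def forward(self, x):
        x = self.m(x)                  # input batch -> (batch, n_features)
        return self.l1(x), self.l2(x)  # two separate sets of predictions

# Toy body standing in for the pretrained model minus its last layer.
body = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 512), nn.ReLU())
model = DiseaseAndTypeClassifier(body)

preds = model(torch.randn(64, 3, 8, 8))
rice_preds, disease_preds = preds
print(tuple(rice_preds.shape), tuple(disease_preds.shape))
```

Calling `model(...)` works because `nn.Module` routes the call to `forward`, which is why only `forward` needs to be written.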
Okay, I just wanted to point out how cool it is that you created the model without the last layer by doing del on it. 00:35:32.000 |
Yeah, that is so cool. I didn't know this existed. Yeah, I had to do something quite difficult because how can you do stuff like that it's not a list and has only functionality to support this. 00:35:44.000 |
Yeah, yeah, I mean, yeah, I kind of like it's nice. 00:35:47.000 |
I generally find I can work on the assumption that PyTorch classes are well designed, because it turns out they generally are. And so to me, you know, a well-designed collection class would have the exact same behavior as Python. 00:36:01.000 |
For example, fastcore's L collection class has the exact same behavior as Python. So, yeah. 00:36:13.000 |
So that thing where you deleted the thing, that's a PyTorch thing, that's not fastai? 00:36:18.000 |
Yeah, it's a PyTorch thing. This is just regular... this is just part of Sequential. Yep. 00:36:31.000 |
By the way, I asked if I would explode the model into layers and then reconstruct it using sequential without the layers that I need. But hey, you can actually do this. This is so nice. 00:37:05.000 |
So we've now got a learner that contains our new model. 00:37:31.000 |
I guess we should get some predictions, right? 00:37:39.000 |
So the main thing I'm waiting for is the loss function. Like, yeah, yeah, yeah, yeah, yeah. That's that's we're going to get there. Let's, let's do this first. 00:37:55.000 |
I suppose you're doing the predictions just to verify the plumbing is working. Exactly. Okay. 00:38:11.000 |
It's stack trace input type. Oh, right, right, right. Floating point 16. Okay. 00:38:30.000 |
I think to simplify things we're going to remove 00:38:44.000 |
Is that some kind of mixed precision? Exactly mixed precision. Yes. 00:38:49.000 |
So let's just pretend that doesn't exist for a moment. 00:39:24.000 |
It's useful to like, if you talk about what's going through your mind when you see this error. 00:39:30.000 |
Yeah, I want to create like a minimum reproducible example. 00:39:42.000 |
let's just like create a learner and then copy it, and then not change it at all. 00:39:51.000 |
I can't even do that. Alright, so this. So then I would be like, okay, let's not even copy it, but instead let's just call it directly. 00:40:09.000 |
Okay, that works. So doing a copy apparently doesn't work. 00:40:13.000 |
Oh, though, generally speaking, I would be inclined to change copy to deepcopy at this point. 00:40:23.000 |
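The difference between the two kinds of copy can be sketched in plain Python. `ToyLearner` is a hypothetical stand-in that just owns a list of layer names; whether this sharing is what actually caused the problem in the session is not established here, it is only one plausible way a shallow copy can bite.

```python
from copy import copy, deepcopy

class ToyLearner:
    """Hypothetical stand-in for a learner that owns a list of layers."""
    def __init__(self, layers):
        self.layers = layers

original = ToyLearner(["body", "linear_512", "linear_10"])
shallow = copy(original)   # new wrapper object, same underlying list
del shallow.layers[-1]
print(original.layers)     # the original lost its last layer too

original = ToyLearner(["body", "linear_512", "linear_10"])
deep = deepcopy(original)  # the list itself is duplicated
del deep.layers[-1]
print(original.layers)     # the original is untouched this time
```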
Oh, why are we still getting a half precision somewhere. 00:40:40.000 |
Oh, it's probably because our data loaders got changed somehow let's recreate the data loaders as well. 00:40:51.000 |
Yeah, it looked like it was working. And then, at the very end, there we go. 00:40:59.000 |
How are we getting half precision? What on earth is making it half precision? 00:41:15.000 |
Do you think resetting your kernel...? I think so. 00:41:19.000 |
I don't see how this would help, but there's never any harm right. 00:41:55.000 |
Yep, that's exactly what happened. Okay, well, that shouldn't happen. So do you know what happened? I don't know what happened. Something has some state. 00:42:07.000 |
State that's keeping things in half precision, which. 00:42:12.000 |
Yeah, shouldn't be happening. And so, at some point, we can try to figure out what that is, but not now. 00:42:34.000 |
Actually, before we do, we're just going to use the copy directly. 00:42:39.000 |
So we just make as few changes each time as possible. 00:43:03.000 |
What are you looking for in this saying like I'm trying to see why it's returning. 00:43:09.000 |
I thought that was a decoded thing so I was just wondering why it's being returned. 00:43:22.000 |
By the targets. That's why. That's why that's why. So it actually returns preds, targets. That's what it's returning. 00:43:36.000 |
Alright, so now it's working quite nicely. And so I would now be inclined to create a really minimal model, which I'm going to call DummyClassifier. 00:44:00.000 |
And all it does is call the original model. 00:44:11.000 |
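A sketch of such a pass-through wrapper, assuming it takes the wrapped model as its only constructor argument (the `inner` layer is a toy stand-in for the real model):

```python
import torch
from torch import nn

class DummyClassifier(nn.Module):
    """Wraps a model and does nothing except call it."""
    def __init__(self, m):
        super().__init__()
        self.m = m

    def forward(self, x):
        return self.m(x)

inner = nn.Linear(512, 10)  # toy stand-in for the original model
wrapped = DummyClassifier(inner)

x = torch.randn(64, 512)
same = torch.allclose(wrapped(x), inner(x))
print(same)
```

If the wrapper trains and predicts exactly like the original, any later failure can be blamed on the new layers rather than on the wrapping itself.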
Because if this works, then we're at a point where we can then try out new models right. 00:44:18.000 |
It's interesting, I would have just gone straight back to the full model and tried that next, but you're slowly working your way. Yeah, I probably should have done it that way, you know, done it more slowly in the first place, but I got a bit overenthusiastic. 00:45:13.000 |
This is all pretty hacky but we're just trying to get something working. 00:45:18.000 |
So the head is the number one thing in the model, the last layer is the end of the head. 00:45:40.000 |
It's simple. Okay, so we delete the head and store it away. 00:46:11.000 |
This time we call the disease, etc classifier. 00:46:43.000 |
Cool. Okay, so we're now at the point where it's trying to calculate loss, and it has no way to do that. 00:46:59.000 |
I'm slightly surprised it's trying to calculate the loss at all. 00:47:12.000 |
So, okay, so the loss function to remind you is the thing which is like a number, which says how good is this model. 00:47:28.000 |
The loss function that we were using was designed for something that only returned a single tensor, and we're returning a couple of tensors. 00:47:46.000 |
And so that's why when it tries to call the loss function, it gets confused, which is fair enough. 00:47:54.000 |
So the loss function is another thing that is stored inside the learner. 00:48:29.000 |
One thing would be we could look at the source code for vision learner and see how that creates the loss function. 00:49:05.000 |
Okay, so it's trying to get it from the training data set. So the training data set knows what loss function to use, which is pretty nifty. 00:49:17.000 |
So to start with, we could let's create a loss function. So let's create a really simple loss function. 00:49:25.000 |
So DiseaseAndTypeClassifier loss, which is going to be passed some... 00:49:38.000 |
We're going to be passed predictions and actuals. 00:49:45.000 |
And what we could do is just say, for now, the current loss function is whatever loss function we had before. 00:50:06.000 |
And let's just try to get it so it's working just on the disease prediction, which is this bit here. 00:50:30.000 |
That'll be what's in our preds. And so just to start with, let's just get this so it works on disease predictions. 00:50:38.000 |
So we'll just return whatever the current loss function was, and we'll call it on the disease predictions. 00:50:50.000 |
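A minimal sketch of the wrapper being described here, written in plain Python so it runs without fastai. The names `make_disease_only_loss`, `dtc_loss`, and `toy_loss` are illustrative, and the output order (rice predictions first, disease predictions second) is an assumption about how the new classifier returns its tuple.

```python
def make_disease_only_loss(base_loss):
    """Wrap a single-output loss so it accepts the two-output model's tuple."""
    def dtc_loss(preds, targs):
        # Assumed output order: (rice predictions, disease predictions).
        rice_preds, disease_preds = preds
        # The rice-type predictions are deliberately ignored for now.
        return base_loss(disease_preds, targs)
    return dtc_loss


# Toy stand-in for "whatever loss function we had before" (not a real fastai loss):
# the fraction of predicted labels that differ from the targets.
def toy_loss(preds, targs):
    return sum(p != t for p, t in zip(preds, targs)) / len(targs)


dtc_loss = make_disease_only_loss(toy_loss)
# The first tuple element (rice predictions) never reaches the wrapped loss.
loss_value = dtc_loss((["ignored", "ignored", "ignored"], [1, 0, 2]), [1, 1, 2])
```

In the session itself the wrapped loss would be the learner's existing loss function rather than `toy_loss`, and the new function would then be assigned to the learner.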
Okay, so now we need to say learn2.loss_func is that function we just created. 00:51:21.000 |
Sorry, when you did that, to, like, set the loss function in the learner, 00:51:30.000 |
wouldn't this mess up your code? I guess, like, you need to create the learner again. 00:51:42.000 |
Jeremy, do you want to go up a little bit back to your loss function? 00:51:47.000 |
Do you actually want to pass the prediction and the target for the disease? 00:51:55.000 |
So I'm just going to ignore the rice type prediction for now. 00:52:01.000 |
And just try to get our new thing working, to continue to do exactly what it did before, but with this new structure around it. 00:52:19.000 |
No, because at the moment, our targets, we haven't included anything other than just the diseases in the targets. So yeah, we're going to have to change our data loading as well to include the rice type as well. 00:52:53.000 |
Ah, yes. Okay. So then we've got metrics. So metrics are the things that just get printed out as you go, and we don't yet have a metric that works on this. 00:53:02.000 |
So a very easy way to fix that is just to remove metrics for now. 00:53:19.000 |
Good, it doesn't, because now we've got two sets of predictions. 00:53:25.000 |
Right, we've got a tuple, because the predictions are just whatever the model creates, and the model is creating two things, not one. 00:53:34.000 |
So we've now got rice predictions and disease predictions. 00:53:52.000 |
That's actually pretty good progress, I think. 00:53:57.000 |
But you know, for those of you who are involved in fastai development, it's pretty clear to me, in trying to do this, that this is far harder than it should be, and it feels like something that should be easy to do. 00:54:15.000 |
I used to see Andrew using the magic, the percentage and then have a patch, and then just add some little thing on top of it. 00:54:27.000 |
Yeah, it's not so much about patching. I feel like there might even be some multi-loss thing. 00:54:37.000 |
If there's not, I feel like this is something we should add to fastai to make it easier. 00:54:43.000 |
Can you explain a little bit about why the loss is stored in the data loader, like how that is a good thing? Yeah, sure. So, generally speaking, what is the appropriate loss function to use, or at least a reasonable default, depends on what kind of data you have. 00:55:03.000 |
So if your data is, you know, a single continuous output, you probably have a regression problem, so you probably want mean squared error. 00:55:11.000 |
If it's a single categorical variable, you probably want cross entropy loss. 00:55:16.000 |
If you have a multi-categorical variable, you know, you probably want that log loss without the softmax, and so forth. So yeah, basically, by having it come from 00:55:32.000 |
the data set means that you can get sensible defaults that ought to work for that data set. 00:55:43.000 |
So that's why we generally most of the time don't have to specify what loss function to use. 00:55:49.000 |
Unless we're doing something kind of non standard. 00:56:01.000 |
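As a rough illustration of the defaults just described, here is a small lookup in plain Python. The table and the `default_loss_for` helper are hypothetical: fastai attaches the default to the data's target type rather than using a table like this. The values are the names of the fastai loss classes that correspond to each case.

```python
# Hypothetical lookup for illustration only; the keys are made-up labels.
DEFAULT_LOSS = {
    "continuous": "MSELossFlat",                # regression: mean squared error
    "category": "CrossEntropyLossFlat",         # single label: cross entropy
    "multi_category": "BCEWithLogitsLossFlat",  # multi-label: log loss on sigmoids, no softmax
}


def default_loss_for(target_kind, override=None):
    """Return the explicit override if one is given, else the default for the data."""
    if override is not None:
        return override  # the "non-standard" case: the caller picks the loss
    return DEFAULT_LOSS[target_kind]


chosen = default_loss_for("category")
custom = default_loss_for("category", override="dtc_loss")
```

The two-output classifier in this session is the non-standard case, which is why a loss function has to be supplied by hand.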
Alright, so we're about to wrap up. The last thing I think I might do is just try to put this back, and we can do it exactly the same way, which is to say DTC 00:56:37.000 |
Oh no, it's okay, we're done here. So, we'll just return error rate on the disease predictions. 00:57:07.000 |
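The metric can be wrapped the same way as the loss. The sketch below uses a toy `error_rate` written in plain Python as a stand-in for the real single-output metric; `dtc_error` is an illustrative name, and the tuple order (rice first, disease second) is again an assumption.

```python
def error_rate(scores, targs):
    """Toy stand-in for a single-output error rate: fraction of wrong argmax picks."""
    wrong = 0
    for row, targ in zip(scores, targs):
        pred = max(range(len(row)), key=row.__getitem__)  # index of the highest score
        wrong += pred != targ
    return wrong / len(targs)


def dtc_error(preds, targs):
    # Assumed output order: (rice predictions, disease predictions).
    rice_preds, disease_preds = preds
    # Only the disease predictions are scored for now.
    return error_rate(disease_preds, targs)


disease_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
rice_scores = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
err = dtc_error((rice_scores, disease_scores), [1, 0, 0])
```

With the wrapped metric in place, the metrics that were removed earlier can be put back on the learner.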
Cool. So I guess we should now be able to do things like learn2.lr_find, for example. 00:57:22.000 |
And we should be able to just replicate our disease model at this point. 00:57:32.000 |
So we're not doing anything with this extra rice type thing yet. And fine-tune. 00:58:01.000 |
Let's see if I search like fastai multiple loss function or something. 00:58:33.000 |
I gotta go, but yeah, thanks a lot. No worries. 00:58:40.000 |
It looks like this person did something pretty similar. 00:58:46.000 |
They created their own little multitask loss wrapper. 00:58:52.000 |
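One way such a multitask loss wrapper could look, sketched in plain Python. This is not the code from the search result, and it is not what the session has built yet: it assumes the targets arrive as a tuple in the same order as the predictions, which would require the data-loading change mentioned earlier. The weights and all names are illustrative.

```python
def make_multitask_loss(loss_a, loss_b, weight_a=1.0, weight_b=1.0):
    """Combine two single-task losses into one loss over tuple predictions and targets."""
    def multitask_loss(preds, targs):
        preds_a, preds_b = preds  # one prediction per task
        targs_a, targs_b = targs  # assumed: targets come as a matching tuple
        return weight_a * loss_a(preds_a, targs_a) + weight_b * loss_b(preds_b, targs_b)
    return multitask_loss


# Toy per-task loss so the sketch runs on its own.
def mean_abs_error(preds, targs):
    return sum(abs(p - t) for p, t in zip(preds, targs)) / len(targs)


combined = make_multitask_loss(mean_abs_error, mean_abs_error, weight_b=0.5)
total = combined(([1.0, 2.0], [0.0, 4.0]), ([1.0, 4.0], [0.0, 0.0]))
```

A weighted sum is only one choice; how the two tasks should be balanced is left open here.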
All right, cool. Well, I think that's a good place to stop. 00:58:58.000 |
We've got it back so it's not totally broken. So that's good. 00:59:05.000 |
And next time we will try and plug this stuff in. 00:59:11.000 |
Anybody have any questions or anything before we wrap up? 00:59:16.000 |
Just a quick question, Jeremy: it says the valley is 0.001, but you used 0.01 for the fine-tune. 00:59:25.000 |
Yeah, I'm not sure. See, it's picked out something pretty early in the curve. I thought something down here seems more reasonable. 00:59:34.000 |
You know, it tends to recommend rather conservative values. 00:59:42.000 |
So, yeah, I tend to look for the bit that's as far to the right as possible but still has a pretty steep gradient.