- Okay, we're recording. So I know there were some questions on the forum. Matt, I think it was you, right? A couple of questions. - Yep, that's correct. I can read them out for you if you like. - Oh yeah, or you can express them in your own terms, whatever, yeah, tell me.
- So I guess, first of all, they weren't really questions. There were just some differences between working on Paperspace and working on your local GPU. And I found symlinking the .kaggle folder with my API keys into storage was something I did to make it a little bit easier to restart.
Just wanting to verify that that's a good thing to do. But the one thing that I would be really keen to know about is: when I did the pip install of timm, it was fine, it installed, but I had to restart the kernel.
And I'm wondering, that might be a bit of a pain going forward. I'd prefer to have it persistently there, ready to go. But it's not a conda package, so I wasn't sure how to... - Yeah, so actually pip packages are the only ones we've got the persistence working for.
So let's do that one first. So the key thing when you install timm, let's see, do I already have it installed? import timm. Okay, great, I don't, so let's do it. So the key thing to remember when you install timm is to do it with dash dash user.
Now, in order to make that easier, I think what I would be inclined to do would be to edit our /storage/.bash.local and add an alias to it. Let's do pi, for pip install, equals pip install. Let's add -U, for upgrade; that should work even if it's not installed already.
And minus minus user. Okay, now, I could close and reopen my terminal, or I could just type source and then the name of the script, which of course in this case is exclamation mark dollar, the last argument of the previous command. Oopsie daisy, exclamation mark vi will rerun the previous vim command. This whole thing needs to be in quotes.
Because it's a single thing, it's my alias. Okay, let's try that again. Okay, so now I can just type pi. And by the way, if you wanna know what something is, if you type which pi, it won't tell you anything useful because it's an alias, not a binary. But if you type type pi, it will tell you exactly what it is,
which in this case is an alias. So I can type pi timm. And the key thing about minus minus user is that it's gonna put it in my, sorry, my .local directory. So there it is, timm. So then all you need to make sure is that your .local directory is symlinked into /storage.
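A minimal sketch of that setup, assuming the paths from this session's Paperspace machine:

```bash
# In /storage/.bash.local, so it survives machine restarts:
alias pi="pip install -U --user"

# Reload it in the current shell, then install timm; --user puts the
# package under ~/.local, which is symlinked into /storage:
source /storage/.bash.local
pi timm
type pi    # "pi is aliased to `pip install -U --user'"
```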
Now that's interesting. This here is telling us we've got a broken symlink. So that's what that means. Yeah, .gitconfig is symlinked to /storage, but there is no .gitconfig there. So maybe I forgot to move that or something. So that's okay. Next time we try to commit, it'll tell us, and we'll know to fix it then.
To create a file that's empty, you just use touch. So I'm just gonna go ahead and create an empty file, so at least it exists and then things won't get horribly confused. Did I not touch it correctly? /storage/.gitconfig. Oh, there's a slash at the end.
Oh, that's why. That's why it's confused. That form would be a directory, and this is not a directory. So my guess is that there's a bug in our pre-run script for .gitconfig. Yes, I've got a slash at the end. So that's why that didn't work. So if I source that.
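The fix, roughly, as shell commands:

```bash
# touch /storage/.gitconfig/    # fails: a trailing slash denotes a directory
touch /storage/.gitconfig       # creates the empty placeholder file
```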
Now it's happy. Great. So now, yeah, since it's been installed into something that's symlinked back to /storage, timm will be available. And if I run ipython, we can confirm it did install. That should be all good. Does that answer that part of the question, Matt?
- Yes, thank you. - Great. So then the second one, yeah, it was not a question but a comment, which was about Kaggle. So yeah, when I get back to using Kaggle on this machine, we will do that for sure, which will probably be next time. And you also had a question about jumping around to, you know, the end of a string, for example. So let's grab fastai's repo, for example.
(upbeat music) Oh, and you also had a question about loading and saving models, right? So, I mean, one thing obviously is it'd be nice to have a tags file. At some point, we could even talk about how to set up Vim to automatically create that for us from time to time.
But let's have a look at, I don't know, layers, for example. So a few things to mention. The first is something which sounds very obscure but actually isn't, which is f in Vim. f in Vim is like slash. Now slash searches, we've seen it before: /init will search for the next thing called init.
Okay. Oh, maybe something we haven't discussed is how to go back to where we were, regardless of whether it was a tag or a search or anything: it's Control + O. And right next to O is the letter I, which goes forward again. So Control + O and Control + I are like pressing the back button and the forward button on your browser.
There's something a lot like slash, but it just finds a single letter, which is f. If I type f, it will search on the current line for the next occurrence of whatever character I type next. So if I type f double quote... actually, maybe more interesting would be f full stop.
So if I type f full stop, it's gonna jump to the full stop: f dot. So you see it jumps to the full stop, right? And so your question was, well, what about jumping to the end of a string? Now, in this case, the end of the string is the last character of the line.
So there's a better answer for that, which is to start inserting at the end of the line: it's Shift + A. Just one moment. (mouse clicking) My daughter got kicked off her Zoom call, always technical problems. Okay, so I can undo that, and Control + O to go back to where I was.
But yeah, so let's say there was some stuff at the end, hash some comment, right? And we wanted to go to the next double quote. I can just type f double quote, and it takes me there. And then Shift + F does the opposite: it searches backwards.
And the reason it's interesting, mainly, is that it's a motion, and therefore I can combine things with it. So for example, if I wanted to delete everything up to the next quote, I can press d f double quote, right? And then I could press slash double quote to search for the next one and press dot, and it'll do the same thing again, right?
Or maybe delete everything up to the next comment would be d f hash. So yeah, those are a couple of useful things. Another really useful one is percent. Percent jumps between the start and the end of a set of paired parentheses or braces or brackets. So if I press percent here, it goes to the start or end of the nearest parentheses, and if I press it again, you can see it jumps between the two, right?
And so if I do it from here, you can see it jumps to the end of this one, right? If I do it at the very end, it'll jump back to this one. So if I want to delete from here to the end of the parenthesized expression, like let's say to delete this bit, I could press d f percent.
Sorry, not d f percent, just d percent. There you go, d percent, you see? Although there's actually something even better for that, which is i. i refers to an area: the whole area that is surrounded by some kind of parentheses. So even when I'm in the middle of these parentheses, the enclosing parentheses go from here to here, and i stands for inside.
So if I want to delete everything inside those parentheses, I can type d i open round paren. Do that again, d i open round paren, and it deletes the contents, which is really nice. So let's say I wanted to replace all my parameters with something else, like a comma b; then I would use c, for change, inside parentheses.
So I type my change, like a comma b, right? And then I can come down here and type dot, and it'll do the same thing. So yeah, this is the kind of menial work you can really crush with these tricks. - Great, it's fantastic. - Yeah, it's cool. There's a lot of them, and you don't have to know them all, you know?
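For reference, here's a recap of the motions covered above (all stock Vim, no plugins needed):

```
f"      jump forward to the next " on the current line
F"      the same, but searching backwards
A       start inserting at the end of the line (Shift+A)
Ctrl-O / Ctrl-I   jump back / forward through your jump history
df"     delete up to and including the next "
%       jump between matching ( ) [ ] { }
d%      delete through the matching parenthesis
di(     delete everything inside the enclosing parentheses
ci(     change (delete, then insert) inside the enclosing parentheses
.       repeat the last change
```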
It's like you can learn one thing each day or something, but yeah, I'm not using any plugins or anything, you know? Okay, so we're gonna save a model in a moment. Any other questions or comments before I go back to our notebook? - I want to make one comment about the timm installation.
I don't know if maybe you discussed this yesterday 'cause I came a little late, but with the timm installation, sometimes it might be better to install from master, because there are some changes that Ross has made that you might not receive otherwise- - Yeah, we did mention that yesterday.
Actually, I think the conclusion we came to was to install the latest pre-release, because that's more stable than installing from master, but, you know, more up to date than the last release; sometimes, like here, he went six months without updating. So yeah, I agree. In fact, let's do that. So this is 0.6.2dev.
So I think we decided we'd go, let's use our new pi thing: timm greater than or equal to 0.6.2dev. Great, yeah, thanks for the reminder, Tanishq. All right, great. It's kind of this thing in Python modules and quite a lot of other things: if there's an extra dev at the end, that means it's a pre-release, basically.
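So the install looks something like this (the version spec is the one from this session, using the pi alias from earlier):

```bash
# Mentioning a dev version in the spec tells pip pre-releases are OK:
pi "timm>=0.6.2dev"
```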
And so pip has this convention that if you say "I want to install something that is at least as recent as 0.6.2dev", then that's a way of signaling to pip that you're happy to include pre-release options. - Is there any reason that, when you do the installation of timm and then you try to use the learner, it says that timm doesn't exist when you try to load the model?
- Right, that's because you have to restart the kernel after installing it. And now that it's installed in .local, every time I start a machine it's going to be there anyway, so you won't have to worry about that again. Okay, so this was our notebook from yesterday. And I wanted to try and improve the model.
And one of the reasons I wanted to try to improve the model is because our result was worse than the top 50%. There you go, top 56%. I didn't know it showed that, that's handy. And so I want to aim to at least be as good as this helpful "fastai out of the box" person.
So they got 0.97385; how far off are we? Which is quite a bit better than ours, right? - That was me, that was my number. - Fantastic, I like it. It's a good notebook. So we're going to try to beat you. I hope you don't mind, but then you'll know how to beat us, because, well, at least you'll know how to match us.
- My only... - So yeah, I saw that what you did here was you trained for longer, which makes sense. And you also used some data augmentation, which also makes sense. So let's talk about this. So if we're going to train for... so, what's your name, Gerardo? How do you pronounce it?
- Either way, that's fine. - Which is right? I want to be accurate. - Well, my name is Gerardo in Spanish. - I see, so both of them, no worries. Thank you, Gerardo. Okay, so if we're going to train as long as Gerardo did, then, you know, if you train more than about five epochs, you're in danger of overfitting.
And certainly 10, I feel like you're in significant danger of overfitting because your model's going to have seen every image, you know, 10 times. So in order to avoid overfitting to the specific images it's seeing, we should make it so that it sees a slightly different image each time.
And this is discussed in the book in some detail. But basically, if you pass in batch transforms, these are things that are going to be applied to each mini batch, so to each bunch of, however many, 32 or 64 or whatever images. And this is basically a bunch of functions that are going to be applied.
So what does this function do? aug_transforms: these are transforms for data augmentation. So we know that the best way to find out what something's going to do is to check its help. So let's start there. Not help, doc. Okay, so it's going to do things like flip our images, rotate them, zoom them, change their brightness, their warp. Let's see, show in docs.
Okay, and here's some examples with a very cute puppy that Sylvain found, I think. So this is all the same puppy, it's all the same picture. And as you can see, each time the model sees it, it sees a somewhat skewed or rotated or brightened or darkened or whatever version of that picture.
And so this is called data augmentation. So let's try then running that. And aug_transforms actually returns a list, right? It returns a list of transformations. So here's the flip transformation: with a probability of 0.5, it'll flip. It's got a brightness transformation: with a probability of one, it will change the lighting by up to 0.2.
And then RandomResizedCrop is perhaps the most interesting one: it will zoom in such that it keeps at least 75% of the height/width, and it will basically pick a smaller, zoomed-in section, randomly chosen each time. So what we can do is, when we say show_batch, if you say unique equals true, it'll show the same picture each time.
And so here you can see four versions of the same picture. And you can see sometimes it's flipped, sometimes it's moved a little bit up and down, sometimes it's a little bit darker or less dark, and it's also a little bit rotated. So that's what data augmentation is. And that really helps us if we wanna train a few more epochs.
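Pulling that together, the dataloaders with augmentation look roughly like this (trn_path and the exact values follow this session's notebook):

```python
from fastai.vision.all import *

dls = ImageDataLoaders.from_folder(
    trn_path, valid_pct=0.2, seed=42,
    item_tfms=Resize(480, method='squish'),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))

# unique=True shows the *same* image several times, so the random
# flips, lighting changes and crops are easy to see
dls.show_batch(max_n=4, unique=True)
```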
Then the second thing I figured we should do is, ResNet's actually great, but there are things which are greater. And as we talked about, timm has a bunch of them, and in particular ConvNeXt is pretty good. And the other thing we could do is think about learning rates. The default learning rate used by fastai is one where, I would say, I picked it on the conservative side, which means it's a little bit lower than you probably need, because I wanted things to always be able to train.
But there's actually a downside to using, a couple of downsides to using a lower learning rate than you need. One is that given fixed resources, fixed amount of time, you're gonna have less epochs, not less epochs, sorry, less distance that the weights can move. The second is it turns out a high learning rate helps the optimizer to explore the space of options by jumping further to see if there's better places to go.
So the learning rate finder is suggesting things around about 0.002, which is indeed the default. But you can see that all the way up to like 10 to the negative two, it still looks like a pretty nice slope. And the other thing to remember is, as we saw after answering Nick's question yesterday, we're using one cycle training schedule, which means we're gradually increasing the learning rate.
And my claim was that by doing that, we can reach higher learning rates. So I would also say that even these recommendations are gonna be a bit on the conservative side. So what I did just before I started this call was I tried training at a learning rate of 0.01, which is five times higher than the default.
And so that's up here. And I did find that it gave us quite a bit better result, with a 2% error. So let's see. I mean, obviously we've got different training sets, but this is hopeful, right? That we're gonna get a better result than our target. It's nice to have a target to aim for.
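Roughly, the training setup just described (the model name is a timm identifier, as used in this session; 0.01 is the five-times-the-default rate tried here):

```python
learn = vision_learner(dls, 'convnext_small_in22k', metrics=error_rate)
learn.lr_find()            # suggests ~2e-3, fastai's conservative default
learn.fine_tune(12, 0.01)  # the one-cycle schedule lets a higher lr work
```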
Okay, so that was the next thing. So then, since this took six minutes to train, it's probably a good idea to save it. So there's a couple of different things we can save with. One is .save and the other is .export. So Learner.export "saves the contents of self" — that's just not very well written.
So self means this learner, and it saves it to self.path/fname. So learner.path/fname, using pickle. So basically what that means is, if you call learn.export, it's gonna save it into learn.path. So let's find out what learn.path is: train_images. And so this is actually whatever we passed in here.
So if we were to save things somewhere else, we've got a couple of options. One is to change learn.path by setting it equal to some other path. Or we can just use an absolute path; an absolute path is something that starts with slash. And so if I were to save it somewhere in storage, for example, then I can type /storage/whatever.
Or maybe I wanna put it in /notebooks somewhere. So these are some ways you can change where it's gonna save. I might even just put it into the current directory; I think that seems fine to me. Well, actually, where are we, what's the current directory? Yeah, the paddy directory.
That sounds fine. Or maybe, to be a bit more sure just in case the directory ever changes, be more specific. So then the other option is learn.save. learn.save doesn't save the whole Learner; it just saves the model and the optimizer state. The difference is that, remember, a Learner doesn't just contain the model; it also contains information about the data loaders, and specifically what transformations are applied.
So I don't really often, if ever, use .save. The only reason I would use .save is like if I was writing something to... and we already have stuff like this in fastai. Well, let's give an example: in fastai, we have a callback that can save the model at the end of each epoch, or each time it gets a better result than its previous best, or whatever.
In those cases, we might use .save, 'cause then you can recreate a Learner and .load into it. But yeah, for exporting something where I wanna be able to just load that exact thing with all the same details next time, .export's the way to go. So I'm gonna call .export.
I'm gonna use... it's a ConvNeXt, it's small, and I did 12 epochs. Oh, and this needs to be an actual Path. Normally we actually try to make these things do that for you, but this is less friendly than I would like. Sorry about that. There we go. Okay, so we should now be able to see it.
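In code, the distinction sketched above looks something like this (the filename is illustrative; note that export wanted an actual Path rather than a str at the time of this session):

```python
from fastai.vision.all import *

learn.path = Path('.')                    # or pass an absolute path
learn.export(Path('conv_small_12.pkl'))   # whole Learner: model + dls transforms

# later, e.g. in a fresh process:
learn = load_learner('conv_small_12.pkl')

# .save/.load only store model + optimizer state, for use with a
# Learner you've already re-created:
learn.save('checkpoint')
learn.load('checkpoint')
```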
There it is. Okay, and it looks like we need to give it a .pkl extension or whatever. Okay. By default, with aug_transforms, which uses RandomResizedCrop, it'll randomly pick a zoomed-in subset of the image, of this size or bigger.
And for the validation set, it will pick out the center: it'll take, you know, all the width it can or all the height it can, without changing the aspect ratio. If you say squish instead, it will grab the whole thing and change the aspect ratio to squish it into a square.
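The two resize behaviours side by side, as a sketch:

```python
item_tfms = Resize(480)                   # default: crop (center crop at validation)
item_tfms = Resize(480, method='squish')  # keep everything, distort aspect ratio
```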
Matt, you don't have to raise your hand. Just talk to me, mate. What's up? - Can you hear me? - I can't hear you. Does that mean you can't hear me? - I can hear you, but I don't think you can hear yourself. You can't hear anybody.
- So you do need to raise your hand. - I can't hear you, but you guys can hear me. Okay. - Yes. - Yes, we can hear you, Jeremy. - We can hear you. - I see why. Okay, say something. - Can you hear me now? - Yeah, yeah, I can.
All right, ah, okay. All right, were you guys saying anything I was meant to be hearing? Did I miss anything? - Yeah, why did you choose 12 epochs? - Oh, no particular reason. I just saw that this one was using 14, and I thought, oh, aim for something around there, but maybe just do a little bit less.
I guess I often do around 12-ish epochs. Like, for fine-tuning things which are somewhat similar to the original, it often seems to get pretty close to getting all the information it can, just as a rule of thumb. And that's a reasonable amount of time, too, I'd say.
- My assumptions were that the number 460 is because the size of the images was 460. And then another assumption was 224, because when you showed the timm comparison of the different ConvNeXt models, the image size was 224; that's the reason I selected that. Is that a correct assumption?
- Well, they were 640 by 480. So actually, we do this, look it up in the book, it's under the section called presizing, and I think this is around what we always pre-size to. So actually, maybe 480 would have been better, because then it wouldn't have had to change one of the dimensions, 'cause they were 640 by 480.
And then the size you picked, I actually changed it. So Gerardo picked 230, but actually most of these models that are trained on ImageNet are generally trained at 224. So I wanted them to be the same size as what they trained on; that's why I picked 224. Yeah, so then squish I've talked about.
Oh, and then the other thing is, the model I picked is one with the suffix in22k. The "in" here refers to ImageNet, and the "22k" refers to the version of ImageNet with 22,000 categories, as opposed to the version that's normally used, which only has 1,000 categories. So this is a ConvNeXt small, but trained on the 22,000-category version of ImageNet.
The 22,000-category version just has a lot more images, covering a lot more different things. So there's a much higher chance that it's gonna have seen something like rice paddy illness than the 1,000-category one, and it's just seen a lot more different pics, you know. So yeah, I would recommend always using the in22k pre-trained models.
So those are, I think, the key differences at the training stage. - Yeah, I think when you had put the export and then the error came, that's when it cut off. So I don't think you explained what you did; we didn't catch the part where you explained the fix.
- The fix. - Well, 'cause the export had an error, right? And then I guess you've now added the path. - I don't think it had an error, but I just... oh, I see, yes, yes, yes. Okay, yeah, the export had an error because this was a string, and it actually requires a Path, which I'd say is an oversight on my part.
I try to make it so that everything can accept a path or a string. So I would consider that a bug that ought to be fixed. So hopefully by the time people watch this video, that might've been fixed. But yes, at the moment, I had to change this to a path.
Thank you. All right, so there's a few things we could do here, right? But one key issue is that, particularly if you don't have method equals squish, when we do validation it's only selecting the center of the image, and that's a problem, right? We would like it to see all of the image.
And then another thing is that we've been training it with various different augmentations, but the validation set doesn't get any of those augmentations. So there's a trick you can use, which you should particularly use if you don't use squish, where it's effectively cropping into the center, which is something called test time augmentation.
And in test time augmentation, we basically get multiple versions of each image; by default, we get four different randomly augmented versions of each image, plus the unaugmented version. We get the prediction on every one and then take their average. And that's called test time augmentation. And as I said, it's gonna work particularly well without the squish, but it ought to work well even with the squish.
So to get those predictions, let's first of all make sure we can replicate this error rate manually, right? So we go probs, targs equals learn.get_preds, and we pass in the validation set. And then we should find that if we now ask for the error rate, Shift+Tab, so the inputs are the probabilities and the targets are the targets, there we go.
Okay, so that's our 2.02% error rate. So we've replicated that. Okay, so now that we've got that 2.02, I would then try out TTA. And of course, before we use a new function, we would always read its documentation. Here we are, .tta: return the predictions on some dataset or some data loader.
We get the predictions n times, by default four, using the training set transformations. Great. Oh, and instead of getting the average of the predictions, we could also get the max. Cool. And the other thing, which I definitely encourage you to do, is it's always good to look at the source code, 'cause my claim is that fastai functions are generally not very big.
And also quite a bit of it is stuff you can kind of skip over, right? This kind of, oh, what if it's None? What if it's None? This is just setting defaults; you can kind of skip it. The try/finally you can skip, 'cause it's just error handling.
The with blocks are progress bar stuff; you can pretty much skip those too. So the actual work starts happening here: we're gonna call self.get_preds, passing in a data loader, and then it concatenates that all together. And then it takes either the maximum or the mean, depending on whether you asked for the max or not.
And it also grabs the predictions for the validation set data loader. Yeah, so you kind of get the idea. So let's run it, see if we can beat 2.02%. So you can see here it's running four times, once for each of the four augmented versions, and then it will run one more time for the non-augmented version.
Okay, and it beat it, but just by a little bit. And then, you know, another thing is, well, what if we used the maximum instead of the average: use_max equals true. Yeah, I kind of wish I didn't have the squish in now, but I don't want you guys to have to wait 10 minutes for it to retrain, 'cause then we'd much more clearly see the benefit of using TTA.
That's interesting, that one's worse. So I generally find that, when not using squish, using TTA with use_max equals true is best. Okay, so now we've done all that, we can try and submit this one to Kaggle. So we can just repeat basically what we had yesterday, but instead of get_preds, we'll do tta.
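A rough recap of that experiment in code (the numbers are the ones seen in the session):

```python
# Replicate the validation error rate manually, then compare TTA:
probs, targs = learn.get_preds(dl=learn.dls.valid)
error_rate(probs, targs)                 # 2.02% in the session

# TTA: predictions averaged over 4 augmented copies + 1 plain one
tta_probs, tta_targs = learn.tta(dl=learn.dls.valid)
error_rate(tta_probs, tta_targs)         # beat it, but just by a little

# or take the max across the copies instead of the mean:
tta_probs, tta_targs = learn.tta(dl=learn.dls.valid, use_max=True)
error_rate(tta_probs, tta_targs)
```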
Now, there's no with_decoded, I don't think, for tta. So we're gonna have to do a bit of extra work here. So this is gonna give us the probabilities and the targets. And so for the probabilities, each row is gonna contain a probability for each element of the vocab. (laughs) So we can take a look.
And so, for each of the 3,469 things in the test set, there are 10 probabilities, which presumably means the length of the vocab is 10, which it is. Great. So what we wanna do is find out, well, what's it actually predicting? And the thing it's predicting is whatever thing has the highest probability.
So we're gonna go through each row and find the index of the thing with the highest probability. In math, and in PyTorch and NumPy, that's called argmax. So argmax is the index of the thing with the highest value. So probs.argmax. And so which dimension do we want to take the maximum over?
So we wanna do it over rows, which I think we say dimension equals one. There we go. So that's the correct shape. So now we should be able to do the same thing we did yesterday, which is to convert that into a series. And then we should be able to run this mapping.
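In code, that step looks roughly like this (tst_dl stands in for the test DataLoader built yesterday):

```python
import pandas as pd

# One row per test image, one column per vocab entry, so argmax over
# dim=1 gives the predicted class index for each row.
probs, _ = learn.tta(dl=tst_dl)
idxs = probs.argmax(dim=1)               # shape: (3469,)
idxs = pd.Series(idxs.numpy(), name='idxs')
```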
Now, I realized actually this thing we did yesterday, where we went k colon v for k comma v in enumerate, is actually a really long way of just saying: create a dictionary from those tuples. So when you create a dictionary, you can do it like this. Right? Or you could do this.
Here's a tuple of tuples. Okay, sorry, here's a tuple of tuples. And ideally what we'd like is to call dict and pass in each pair of these as an argument to it. And Python actually has syntax to do exactly that for any function, not just dict, which is star star.
And star star means take a mapping and pass it in as keyword pairs. So that's what this does, right? It's gotta be a mapping, which enumerate... let's just pop this here. Why is this not working? I expected this to work. How annoying. Well, so much for that discussion.
Annoying. All right, I'm gonna have to try to think of a better way to make this work, 'cause so far it's a similar problem to what we had yesterday. - I think you don't need the star star in that case. - Wow, that's nice, isn't it? Even better. Thanks for the trick.
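A sketch of the spellings discussed, and why the star-star version fails:

```python
vocab = learn.dls.vocab

# Yesterday's version: an explicit dict comprehension
mapping = {k: v for k, v in enumerate(vocab)}

# Simpler, as suggested: dict() accepts any iterable of (key, value)
# pairs directly, which is exactly what enumerate produces
mapping = dict(enumerate(vocab))

# dict(**enumerate(vocab)) raises a TypeError: ** unpacks an existing
# *mapping* into keyword arguments (string keys only), and enumerate
# is an iterator of pairs, not a mapping.

results = idxs.map(mapping)              # idxs: the Series built above
```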
Okay, I didn't quite get to show you how cool star star is, but never mind. Okay, so what I'm gonna do is make a copy of the last time we did a head of the submission. And one reason I'd like to do that for my new submission is to confirm that the new one looks somewhat similar.
So previously we went hispa, normal, downy, blast. Now we go hispa, normal, blast, blast. And so this makes me feel comfortable that, okay, we haven't totally broken things; it's still giving largely the same results as before, with a few changes. And so that's just something I like to do, okay.
And then another thing I like to do is keep track of stuff I've done before. I try not to delete things I've used before; I just pop them into a different notebook or a comment. So down here, I'm just gonna have the non-TTA version, just in case I want that again later.
All right, so we should be able to submit that now. Okay, so I use Control + R and then start typing "competitions". Okay, so this is now: squish, ConvNeXt small, 12 epochs, fine-tune, TTA. Oh, what on earth did it do to my window?
How do I get it back? Oh, I see... how did that happen? I've got two desktops going, I didn't notice that. All right, let's go and check out Kaggle. My submissions. Oh, look at that. Gerardo's still beating us, I think, but at least we've beaten our previous one.
(laughs) - That's amazing. - That's great. Jump to our leaderboard position; we're gonna have a good battle on. 34. - No, I think you beat me. - Wait, I thought yours was better than that. - I think I'm a little bit lower. - Code. Oh, you were 0.97385.
Okay, 0.979, oh, okay. That's not bad, right? Actually, 34th out of... I mean, it's just a fun competition; nobody's really trying too hard, but still, it's nice to feel like you're in the mix. How far off are we? Okay, so this person's still way ahead, right? They've got an error of 1.3% and we've got an error of 2.1%.
You know, something else that would be fun would be, you know, like you can kind of super easily create an ensemble. So maybe I'll show you how I would go about creating an ensemble. To create an ensemble, I would be inclined to maybe, we could create an ensemble with an unsquished version, for instance.
So what I would do is copy all the stuff that we used to get our predictions, right? And then I would paste it down here, and go through and remove the stuff that isn't needed. Like so. This one's gonna be no squish, and we'll do
use_max equals true. And so then to merge cells, it's Shift + M, M for merge. And we don't need the error rate anymore. And so this is gonna be a second set of probabilities and a second set of targets. Yeah, so we could just run that and take the average of these two models.
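The ensemble step itself is just averaging the probabilities; a sketch, with probs2 as a hypothetical name for the second model's TTA predictions:

```python
# Average per-class probabilities from the two models, then score:
avg_probs = (probs + probs2) / 2
error_rate(avg_probs, targs)             # targets are the same either way
```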
I'll remove squish here. Okay, so that might be our third model. And then another model I would be inclined to try is one that doesn't use square images. So we've got 640 by 480 images, right? So the aspect ratio is 4:3. So I would be inclined to take that ratio and multiply it by the smaller side we want.
Okay, that gives us 298.66. It'd be nice to find something that works out more evenly, wouldn't it? What if we did it the other way around? So we could create 224 by 168 images, for instance. Or 336 maybe: 336 by 252 images. Yeah, let's do that.
So let's change their size. And I never quite remember which way round it is, but that's okay, we'll check it. So 336 by 252 images. And the reason I'm doing rectangular images is that all of our input images are the same aspect ratio, so there's no particular reason to make them square.
When some of your images are wider than tall and some are taller than wide, then it makes perfect sense to use square as the thing that everything gets changed to. But when everything's wider than they are tall, especially when they're all the same aspect ratio, it makes more sense to keep them at that same aspect ratio.
And another thing I guess we should consider for 640 by 480 is that you can change the resolution more gracefully, without weird interpolation fuzziness, by doing it by a factor of two. So we could do 320 by 240 instead of 640 by 480. So that would be another one I'd be inclined to try.
Yeah, in fact, let's just do that. Let's make that the aspect ratio. There we go. And so obviously we should check it, and we know how to check it, which is show_batch. Okay, so you can see I've got it the wrong way around. There we go, that's better.
Cool. Oops. And given that we're gonna have such nice clear images, I would probably do fewer of the affine transforms, the ones where we're zooming and rotating and stuff. To say don't do those so often, we can change the probability of affine transforms from 0.75 to 0.5: p_affine equals 0.5.
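Putting those choices together, the rectangular dataloaders might look roughly like this (fastai sizes here are assumed to be (height, width), which is worth confirming with show_batch, as done above):

```python
# 640x480 inputs kept at 4:3 rather than squished into a square;
# p_affine lowered from the 0.75 default since the images are clean.
dls = ImageDataLoaders.from_folder(
    trn_path, valid_pct=0.2, seed=42,
    item_tfms=Resize((480, 640)),
    batch_tfms=aug_transforms(size=(252, 336), min_scale=0.75,
                              p_affine=0.5))
dls.show_batch(max_n=4)   # confirm nothing is the wrong way around
```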
So in theory, this one feels the most correct, given that the data we have is a fixed input size of that type. So I would be inclined to... well, we'll take a look afterwards. Oh, what did I just do? Copy. We'll save it as a different set.
And so we can easily then check the accuracy of each of them. And this one's gonna be rectangular... rectangular. There we go. Now that we're saving a few, I guess I'm a little worried that Paperspace might disappear, and so I'm actually inclined to save these into my notebooks directory, just to be a bit paranoid.
Copy, paste. And so let's move this one into /notebooks. Oh, that's right, I'm not using Paperspace, so I don't have to. I forgot, never mind. I'm on my own machine. I like the fact that I've got Paperspace so well set up now that I don't even remember whether I'm using it.
Okay. Great. I think that's that. All right, I'm not gonna have you guys watch that run for 20 minutes. Any questions or comments before we wrap up? - So I see that you're focusing a lot on the data transformations and augmentations. When would you focus on that, versus, you know, playing around with different models and things like that instead?
- Given that this is an image classification task for natural photos, it will almost certainly have exactly the same characteristics as ImageNet in terms of accuracy, or at least as fine-tuning from ImageNet. So I'm working on the assumption, which we can test later, but I'm pretty sure it's gonna be true, that the things in that notebook showing which timm models are better than others will apply to this dataset.
So, once everything else is working really well, you know, I would then try it on a couple of models, or at least run it on a bigger one, like base or large or whatever I can get away with. If it was a segmentation problem, or an object detection problem, or a medical imaging dataset, which has the kinds of pictures that aren't in ImageNet, you know, for all of those things I would try more different architectures.
But then, for those cases, let's say it was a segmentation problem, which is about recognizing what each pixel is a pixel of. Even there, I would not try to replicate the research of others. Instead, I would go and look at something like paperswithcode.com to find out which techniques have the best results on segmentation.
And better still, I would go and find two or three previous Kaggle competitions that have a similar problem type and see who won and what they did. Now, when you look at who won, they always say, oh, we made an ensemble, which is fine. But the important thing isn't that they did an ensemble; it's that they'll pretty much always say, the best model in our ensemble was X.
And so I would just use X, and I would use the smallest version of X I can get away with. And yeah, generally fiddling with architectures tends not to be very useful nowadays for any kind of problem that people have fairly regularly studied, which almost any computer vision problem is.
I guess the only interesting question for this one would be that there is a field saying what kind of rice is in this paddy, which is like a category. But I'm fairly sure that using that information is not gonna be helpful in this case, because the model can perfectly well see what kind of rice it is.
So I very much doubt we have to tell it, 'cause it's got pictures. - Jeremy, it's gonna take me a while to work my way through all of the videos. - Yeah. - Are they gonna be actually available? - Yes. - Cool, thank you. - And don't feel like you can only join if you've watched all the previous videos, and don't feel like you can only ask a question if you've watched all the previous videos. Like, it's totally fine to ask a question about a video we did a week ago, or about something that we just covered yesterday, or whatever.
If the answer to your question is, oh, we covered this in this video, here's where you go, I will tell you that, and that's totally fine. And if it's like, okay, you said this thing in this other video, but I don't get it, say it again, that's totally fine too.
Like we're moving at quite a fast pace because people can go back and rewatch the videos and because people can come back later and ask questions about things that aren't clear. So yeah, it definitely does rely on people turning up and saying, I'm not clear on this or whatever.
- Yeah, well, I sort of started from ground zero with this whole environment, but it is starting to make sense now. I'm starting to feel a little bit more comfy with it. - Nice, well done. - And I just wanna take the time to work my way through and absorb what you've been talking about.
- Well, also, Daniel, I will say, there's a couple more lessons to come, like, what is it, next week or the week after? I suspect during those two weeks I'll probably stop the walkthroughs, so there'll be a couple of weeks there to catch up. But yeah, feel free to join in any time, or not join in any time, and ask questions about any video, or even about things that aren't covered in a video but that you feel would be useful to know in order to understand.
- Okay, I'm really looking forward to the tabular data actually. - Oh, cool. - Yeah, okay, thank you. - Thanks all, see you next time. - Bye. - Take care.