
Live coding 11


Chapters

0:00 Recap on Paddy Competition
4:30 Tips on getting votes for Kaggle notebooks
7:30 Gist uploading question
10:30 Weights and Biases Sweep
14:40 Tracking GPU metrics
16:40 fastgpu
20:00 Using .git/config
21:00 Analysis notebook
26:00 Parallel coordinates chart on wandb
31:30 Brute force hyperparameter optimisation vs human approach
37:30 Learning rate finder
40:00 Debugging port issues with ps
42:00 Background sessions in tmux
46:20 Strategy for iterating between notebooks
49:00 Cell All Output toggle for overview
50:50 Final transform for vit models
52:05 swinv2 fixed resolution models
53:00 Building an ensemble - appending predictions
55:50 Model stacking
57:00 Keeping track of submission notebooks


00:00:00.000 | I followed you up to lesson nine and I got to number 10. It's never been so close before, so amazing, thank you.
00:00:09.000 | Ah, okay. Yes, I saw you on the leaderboard, Serana, you were tenth in the Paddy competition, that's very cool.
00:00:18.000 | So to catch people up, the most recent news on the Paddy competition is I did two more entries.
00:00:36.000 | And I don't remember, I think I might have shown you one or both, but yeah, so I ensembled the models that we had, and that improved the submission from 0.9876 to 0.988.
00:00:56.000 | And then the other thing I did was, you know, since the ViT models are actually definitely better than the rest, I kind of doubled their weights, and that got it from 0.9881 to 0.9884.
00:01:14.000 | And let's see, Serana. Serana, you're down to 11th, you're going to have to put in another effort to get back in the top 10, my friend.
00:01:26.000 | Anybody else here on this leaderboard somewhere?
00:01:32.000 | I'm down at, I don't know, was it 37 last time I checked. 37, that's not bad. What's your username?
00:01:39.000 | I think this is not you.
00:01:42.000 | Matt.
00:01:52.000 | I think it's 45.
00:01:54.000 | I've dropped further.
00:01:56.000 | You just can't stop for a moment with these things, or somebody will jump in ahead.
00:02:06.000 | Yeah, 60s is pretty good.
00:02:10.000 | I've had problems with Paperspace, so I couldn't train again.
00:02:15.000 | I haven't been successful.
00:02:18.000 | Like, still just nothing after login.
00:02:21.000 | Just an error. And I subscribed to the paid version, still.
00:02:27.000 | Sure, maybe they restructured and there's some error.
00:02:31.000 | Oh, well, feel free to share it on the forum if it's an error that we might be able to help you with. I think it's just a generic one: when you try to set up a machine it just says 'error'. It was a Paperspace error.
00:02:45.000 | That's annoying.
00:02:46.000 | They're quite receptive if you use their support email.
00:02:52.000 | I know I had an issue and they got right back to me.
00:02:57.000 | Another thing is, if the error is your fault, if you put something in pre-run.sh that breaks things, then just fire up a PyTorch instance rather than a fastai instance, because that doesn't run pre-run...
00:03:14.000 | that doesn't run pre-run.sh. And so then you can fix it.
00:03:26.000 | I have to say thank you to Radek for setting up the competition to help us get started.
00:03:37.000 | Radek. He also shared in the forum how to set it up locally for us.
00:03:46.000 | Oh, yeah. Yeah, so thank you to him for getting me back on Kaggle.
00:03:56.000 | Awesome. So now, Radek's next job will be to become a Kaggle notebooks grandmaster. That's what I'm going to be watching out for. I think he's got what it takes, personally.
00:04:12.000 | Nice.
00:04:14.000 | Right.
00:04:16.000 | Have you had a gold on Kaggle for notebooks?
00:04:19.000 | No... I'm not sure what I have for notebooks.
00:04:24.000 | I haven't done that many notebooks ever. I think I have... What's your username on Kaggle? Let's find you.
00:04:30.000 | Radek1. Radek, with the number one, not the word. Yeah, that's me.
00:04:40.000 | Two silver. Okay.
00:04:43.000 | This one actually is on the way to being a gold; it's got so close.
00:04:49.000 | You need 50 votes from regulars, I guess; I don't know what counts as a regular.
00:05:00.000 | Oh, is that how it works? So it's not in relative terms?
00:05:04.000 | No, it's just 50 votes, full stop.
00:05:08.000 | And, you know, I definitely noticed, like,
00:05:15.000 | it makes a big difference to
00:05:20.000 | Yeah, so it makes a big difference to put notebooks in popular competitions, because that's where people are looking. So, like, this one got 400 votes, right, and I'm not sure it's necessarily my best notebook, but it was part of the Patent competition, which
00:05:39.000 | had a lot of people working on it. So that's one trick.
00:05:44.000 | Yeah, so things which are not actually attached to any competition it's much harder to get votes for.
00:05:52.000 | Yeah, I'm getting pretty close to notebooks grandmaster, actually, so I'm excited about that. What's yours? Something to do with loving science, I'm guessing. What's your... It's actually... well, yeah, the link is slightly different, actually:
00:06:06.000 | t-a-n-l-i-k-e-s-m-a-t-h.
00:06:18.000 | Oh, math, not science. Okay.
00:06:23.000 | Let's take a look. Oh, look at you, 74. Very nice. And you need two more golds; you've got these nine silvers. Well, that's the stuff. Right, now, let's see.
00:06:35.000 | I'm gonna go upvote. Oh, there we go. Yeah, that's... let's do it: channel our enthusiasm into getting Tanishq to notebooks grandmaster.
00:06:48.000 | That would be cool.
00:06:54.000 | Yeah, so just have to get those silver ones over the line.
00:07:01.000 | So, I've
00:07:12.000 | somebody asked about where the
gist uploading thing is. So let me take that up.
00:07:27.000 | Oh, and actually when I do, what I might do here is I'm going to connect to my server.
00:07:41.000 | Someone asked about the gist uploading... the question was asked in the forum somewhere? Yeah, yeah, yeah.
00:07:47.000 | forum. Exactly.
00:07:59.000 | And specifically, this is what it looks like when you're busy training a model using weights and biases. So you can see I've got three windows here.
00:08:16.000 | Can you get rid of the dots? Oh, that just means that I've got another tmux session running on a different computer, which has a smaller screen than this one.
00:08:28.000 | And there is a way to get rid of it, by disconnecting the other sessions: detach other clients.
00:08:42.000 | So, this is the one I just created. So if I hit this... there we go. So Ctrl-B, Shift-D, and then select the one to disconnect.
00:09:00.000 | Oh, nice. Okay, learn something new.
00:09:11.000 | Oh, we've got another new face today. Hello, Sophie. I don't think you've joined us before, is that right?
00:09:17.000 | I've been here, just quietly in the background sometimes. Okay, thank you for joining. Whereabouts are you visiting us from? Brisbane.
00:09:24.000 | Oh, good on you.
00:09:26.000 | And do you work with AI stuff, or are you just getting started?
00:09:31.000 | No, my background's in psychology; I'm doing a postdoc in psych, and sort of, yeah, kind of moving over into data science. Okay, cool. Have you done a lot of the statistical side of psychology?
00:09:43.000 | Yeah, yeah, quite a bit, and quite a bit of coding in R, but I'm pretty new to Python. Okay, great: big learning curve. Well, you know what, you're the target market, right? So if you have any questions along the way,
00:09:55.000 | just jump in, even with things that you feel like everybody else must know. I guarantee not everybody else does. So, yeah, definitely. These have been really helpful and really great.
00:10:05.000 | Awesome. Thanks for joining.
00:10:08.000 | Okay. So, you're training three models in parallel right now? Yeah, so I've got three GPUs in this machine.
00:10:17.000 | And so, yeah, one nice thing with with weights and biases is you basically, let me show you.
00:10:32.000 | Okay, so here's weights and biases.
00:10:39.000 | I don't use my Mac very much, because nothing's logged in.
00:10:50.000 | Alright, and so you can see it's running this thing called a sweep, right.
00:10:55.000 | There's going to be 477 runs.
00:11:00.000 | Don't know why it says created 31 seconds ago, because that's certainly not true.
00:11:08.000 | It's going to be running.
00:11:12.000 | And so it's coming from this git repo.
00:11:19.000 | I feel like there's a sweep view... because this is a particular run. This is a particular run, that's right.
00:11:31.000 | I'm terrible with their GUI, to be honest. Okay, so let's go to the project. Yes.
00:11:37.000 | And then a project has sweeps.
00:11:41.000 | And then, okay, this one here I can kill out.
00:11:49.000 | Okay.
00:11:51.000 | So basically you kind of say, on the Linux side, 'wandb sweep create' or something like that.
00:11:59.000 | And then,
00:12:02.000 | these things are all grouped under this thing. Okay, so then, yeah, so then basically it runs
00:12:11.000 | lots of copies of your program, feeding it different configurations.
00:12:19.000 | And, yeah, you can run the client as many times as you like so I've run it three times.
00:12:24.000 | And each time I've set it to a different CUDA device. You turned your models into Python scripts to be able to do this? Exactly. So,
00:12:37.000 | so this is fine_tune.py, so it's just calling...
00:12:42.000 | So it calls parse_args, so that's going to just go through and check what batch size, etc., etc., you asked for, right, and sticks them all into this args thing.
00:12:55.000 | And then it calls train passing in those arguments.
00:13:00.000 | And so then train is going to initialize Weights and Biases for this particular project, for this particular entity, which is fastai, using the configuration that you requested.
00:13:17.000 | And so then you can say, for example, okay, there's some particular dataset, some particular batch size and image size, etc.
00:13:24.000 | And then it creates a learner for some particular model name, some particular pooling type.
00:13:31.000 | Fine tunes it.
00:13:33.000 | And then at the end it logs how much GPU memory was used, what model it was, and how long it took.
00:13:40.000 | And you don't have to log much, because the fastai Weights and Biases integration automatically tracks everything in the learner.
00:13:50.000 | So you can see here there's all this like
00:13:55.000 | learner dot architecture, learner dot loss function, etc., etc.
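Put together, a sweep-driven training script in this style looks roughly like the sketch below. This is not the actual fine_tune.py from the repo (Thomas Capelle wrote that); the dataset, argument names, and defaults here are illustrative assumptions.

```python
# Minimal sketch of a sweep-driven training script (illustrative, not the
# real fine_tune.py): the sweep agent invokes it with hyperparameters as
# command-line flags; we parse them, train, and log the results.
import argparse, time
import torch, wandb
from fastai.vision.all import *
from fastai.callback.wandb import WandbCallback

def parse_args():
    # Collect whatever hyperparameters the sweep passes in
    p = argparse.ArgumentParser()
    p.add_argument('--model_name', default='convnext_tiny')
    p.add_argument('--batch_size', type=int, default=64)
    p.add_argument('--learning_rate', type=float, default=0.008)
    return p.parse_args()

def train(args):
    # One wandb run per invocation, grouped under the sweep's project/entity
    wandb.init(project='fine-tune', entity='fastai', config=vars(args))
    cfg = wandb.config
    path = untar_data(URLs.PETS)
    dls = ImageDataLoaders.from_name_re(
        path, get_image_files(path/'images'), pat=r'^(.*)_\d+\.jpg$',
        bs=cfg.batch_size, item_tfms=Resize(224))
    # WandbCallback is fastai's wandb integration: it auto-tracks the learner
    learn = vision_learner(dls, cfg.model_name, metrics=error_rate,
                           cbs=WandbCallback())
    start = time.time()
    learn.fine_tune(5, cfg.learning_rate)
    # Log the extras mentioned above: peak GPU memory, model, wall time
    wandb.log({'GPU_mem': torch.cuda.max_memory_allocated() / 2**30,
               'model_name': cfg.model_name,
               'fit_time': time.time() - start})

if __name__ == '__main__':
    train(parse_args())
```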
00:14:01.000 | So, out of curiosity: was this process of refactoring into a script painful?
00:14:08.000 | Actually... I should probably tell you, I didn't do this, Thomas Capelle did this. If I had done it, I would have used fastcore's script stuff instead of this, I guess.
00:14:23.000 | But no, it wouldn't have been painful; I would have just chucked an nbdev export on the cell that I had in my notebook, and that would have become...
00:14:31.000 | Yeah, my script.
00:14:33.000 | So no, it wouldn't have been painful. Hi, Jeremy.
00:14:40.000 | I have a question: would it be interesting to track power consumption, for example?
00:14:48.000 | I mean, for some people it might be; not for me. As to how you would track power consumption, I have no idea; you'd have to have some kind of sensor connected to your power supply, I guess. They track a lot of system metrics in the runs.
00:15:03.000 | So like, if you look on a run.
00:15:06.000 | They will track like GPU memory CPU memory. Yeah.
00:15:10.000 | That's enough.
00:15:12.000 | Like, yeah, if you click on the thing on the left.
00:15:16.000 | It looks like a CPU chip, that thing yeah there's a lot of.
00:15:20.000 | So maybe there's power in here... I don't see how it can be, right, because... well, unless Nvidia...
00:15:28.000 | There you go: GPU power. So Nvidia tells you the GPU power usage, apparently.
00:15:37.000 | It won't tell you about your CPU, etc., power.
00:15:41.000 | The thing that's useful about this I think is the memory.
00:15:44.000 | The graph.
00:15:46.000 | Yeah, well, I mean, the key thing is the maximum memory used, so we actually track that here in the script.
00:15:54.000 | Yeah, we put it into GPU_mem.
00:15:58.000 | Okay. That's a GPU memory.
00:16:09.000 | So Thomas did that as well. I don't know why it's...
00:16:14.000 | to the power of negative three.
00:16:17.000 | What's that about?
00:16:20.000 | Okay, curious.
00:16:24.000 | I'll have to ask him what that's
00:16:29.000 | doing.
00:16:32.000 | Thomas works at Weights and Biases, right? Is that right? Correct, correct. Yeah, so he...
00:16:37.000 | I had never used it before. So,
00:16:43.000 | probably most people have never heard of this, but fastai actually has a thing called fastgpu, which is what I've previously used for doing this kind of thing. So in general, when you've got more than one GPU, or even if you've just got one GPU and you've got a bunch of things you want to run,
00:16:59.000 | It's helpful to have some way to say like okay here's the things to run, and then set a script off to go and run them all and check the results.
00:17:06.000 | So, fastgpu is the thing I built to do that. And the way fastgpu works is that you have a whole directory of scripts in a folder, and it runs each script one at a time, and as each one runs it puts them into a separate directory,
00:17:25.000 | you know, to say this is completed, and it tracks the results, and you can do it on as many or as few GPUs as you like, and it'll just go ahead and run them.
00:17:36.000 | And this is fine but it's very basic.
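For reference, the mechanism being described can be sketched in a few lines of Python. This is only an illustration of the idea, with made-up directory names; fastgpu's actual implementation lives in the fastai/fastgpu repo.

```python
# Illustrative sketch of the fastgpu idea (not its actual code): poll a
# directory of scripts, run each one on a chosen GPU, move finished ones aside.
import os, subprocess
from pathlib import Path

def poll_scripts(path=Path('.'), gpu=0):
    to_run, complete = path/'to_run', path/'complete'
    complete.mkdir(exist_ok=True)
    for script in sorted(to_run.iterdir()):
        # Pin the job to one GPU so several pollers can share a machine
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        with open(complete/f'{script.name}.out', 'w') as out:
            subprocess.run(['bash', str(script)], env=env,
                           stdout=out, stderr=subprocess.STDOUT)
        # Move the script over to mark it as completed
        script.rename(complete/script.name)
```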
00:17:41.000 | And I'd kind of been planning to make it a bit more sophisticated. And, yeah, Weights and Biases takes it a lot further, you know.
00:17:52.000 | And I kind of want to rewrite, or add something on top of, fastgpu so it is fairly compatible with Weights and Biases, but you could do everything locally.
00:18:02.000 | So the key thing.
00:18:05.000 | So, the thing it's actually doing with that config file is it goes through basically the Cartesian product of all the values in this YAML. So it's going to do each of these two datasets, planet and pets,
00:18:18.000 | for this one learning rate, 0.008, for every one of these models,
00:18:24.000 | For every one of these poolings
00:18:28.000 | for okay this is just the one resize method, and for every one of these experiment numbers.
00:18:34.000 | So, yeah.
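In wandb terms, that YAML is a grid sweep. Roughly equivalent, in Python form, is something like the following; the specific datasets, models, and poolings here are illustrative, not the repo's actual config.

```python
# A grid sweep takes the Cartesian product of every `values` list below
import wandb

sweep_config = {
    'method': 'grid',
    'parameters': {
        'dataset':       {'values': ['planet', 'pets']},
        'learning_rate': {'value': 0.008},    # single fixed value
        'model_name':    {'values': ['convnext_tiny', 'vit_small_patch16_224']},
        'pool':          {'values': ['avg', 'catavgmax']},
        'resize_method': {'value': 'crop'},   # just the one resize method
        'experiment':    {'values': [0, 1, 2]},
    },
}
sweep_id = wandb.sweep(sweep_config, project='fine-tune', entity='fastai')
# Then start one agent per GPU, e.g.:
#   CUDA_VISIBLE_DEVICES=0 wandb agent fastai/fine-tune/<sweep_id>
```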
00:18:38.000 | So that's a little bit of a project at some point. The sweep allows you to run arbitrary programs; it doesn't have to be a script.
00:18:45.000 | So, potentially you could just stay in the notebook and use tiny kernel or, sorry, the
00:18:54.000 | ipynb thing or whatever it's called. Yeah, exactly. Yeah.
00:18:58.000 | Yeah yeah no it'd be fun to work on this to make the whole thing, you know, run with notebooks and
00:19:07.000 | stick stuff in a local SQLite database, because, like, all this web GUI stuff, honestly I don't like it at all.
00:19:14.000 | But the nice thing is it actually doesn't matter because I don't have to use it because they provide an API. So before I realized they have a nice API.
00:19:23.000 | I kept on, like, sending Thomas these messages saying how do I do this, how do I do that, why isn't this working, and he'd have to send me these, like, pages of screenshots: click here, click there, turn this off, then you have to redo this three times,
00:19:36.000 | and I'm just like, Oh, I hate this.
00:19:40.000 | Yeah, then I found that they do have an API, and I was like... I looked at the API, and it is so well documented, it's got examples.
00:19:48.000 | Yeah, it's, it's really nice.
00:19:54.000 | So, I've put all the stuff I'm working on into this git repo. And so here's a tip, by the way: if you're in a git repo, the cloned directory, the information about your git repo all lives in a file called .git/config.
00:20:15.000 | So you can see here.
00:20:18.000 | This is the git repo.
00:20:24.000 | So if we now go to GitHub.
00:20:28.000 | One cool thing about these runs is it tracks your git commit, so from the run you can get back to the code version. Yeah, that is very cool, isn't it? Yeah.
00:20:39.000 | I mean, I do think we could pretty easily create a local-only version of this without all the fancy GUI, you know, which would also have benefits. And people who want the fancy GUI and run stuff from multiple sites and stuff like that would use Weights and Biases, but, you know,
00:20:56.000 | you could also do stuff without Weights and Biases. Anyway, here's our... yeah, so here's our repo. And this analysis.ipynb is the thing that I showed yesterday, if you want to check it out.
00:21:13.000 | I'll put that in the chat.
00:21:19.000 | Oh, by the way, you know, I think something else which would be good is we should start keeping a really good list for every walkthrough of, like, all the key resources, key links, key commands, examples we wrote, and stuff like that.
00:21:40.000 | So I think to do that, what we should do is we should turn all of the walkthrough topics into wikis.
00:21:50.000 | I don't know if you folks have used wiki topics before, but basically a wiki topic simply means that everybody will end up with an edit button.
00:21:59.000 | So if I just click... okay, this one already is a wiki, right. So everybody should find on walkthrough one that you can click edit, right. And so one thing we'd put in an edit, for example, would probably be... like, often Daniel has these really nice full walkthrough listings,
00:22:22.000 | we should have like a link to his reply, which you can get, by the way, by I think you click on this little date here. Yes. And that gives you a link directly to the post, which is handy.
00:22:40.000 | What about this one?
00:22:43.000 | Okay, make that a wiki. Sorry, this is going to be a little bit boring for you guys to watch, but better that we do it while I'm here.
00:22:53.000 | And if anybody else has any questions or comments while I do that... Yeah, Jeremy, you did the fastgpu. Is it possible to expand it to high-performance computing, to use it on the nodes?
00:23:07.000 | Sorry, to do what? Ah, wandb in high-performance computing, so in a distributed environment: is it possible to track it as well?
00:23:19.000 | I mean, I don't know. I mean, yeah, anything that's running in Python on a Linux computer should be fine.
00:23:36.000 | I think some HPC things, like, use their own weird job scheduling systems and stuff.
00:23:44.000 | But yeah, as long as it's running a normal
00:23:49.000 | Nvidia.
00:23:51.000 | It doesn't even have to be Nvidia honestly. But yeah, as long as it's running a normal Linux environment, it should be fine.
00:24:00.000 | It's pretty generic, you know, pretty general. Okay, so they are now all.
00:24:05.000 | wikis. And so something I did the other day, for example, was in walkthrough 4,
00:24:10.000 | I added something saying, like, oh, this is the one where we actually had a bug and you need to add cd at the end, you know, and I tried to create a little list of what was covered.
00:24:18.000 | So for example, maybe.
00:24:22.000 | Matt's fantastic timestamps we could copy and paste his list items into here, for instance.
00:24:33.000 | Some of Radek's examples, maybe, or even just a link to them.
00:24:42.000 | Yeah, so for this walkthrough, we should certainly include this link to the analysis.ipynb. Anyway, so you could see, yeah, with the API it was just so easy: you just go api.sweep, and .runs comes in as a dictionary,
00:24:57.000 | which we can then chuck as a list of dictionaries into a DataFrame.
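In code, that amounts to something like this; the sweep path below is a made-up example, and the exact summary accessor can vary a little between wandb versions.

```python
# Pull a sweep's runs into pandas via the wandb public API
import pandas as pd
import wandb

api = wandb.Api()
sweep = api.sweep('fastai/fine-tune/abc123')  # 'entity/project/sweep_id'
# Merge each run's config (hyperparameters) and summary (logged results)
# into one row per run
df = pd.DataFrame([{**run.config, **dict(run.summary)} for run in sweep.runs])
df.groupby('model_name').error_rate.mean()  # then slice it however you like
```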
00:25:06.000 | Okay, I'm rerunning the whole lot, by the way, because it turns out I made a mistake. At some point I thought that Thomas had told me that squish was always better than crop for resizing, and he told me I had it exactly wrong: it's actually that crop is
00:25:23.000 | always better than squish resizing. So I'm rerunning the whole lot.
00:25:28.000 | It's annoying, but shouldn't take too long.
00:25:35.000 | You found that analyzing the sweep results like this was useful relative to, like, what you can see in the UI? You know, you can make... So much better, Hamel. Yes, so much. I mean, they've done a good job with that.
00:25:57.000 | That UI, like, it's very sophisticated and clever and stuff, but I just never got to be friends with it. And as soon as I turned it into a DataFrame, it's just like, okay, now I can get exactly what I want straight away. It was an absolute breath of fresh air, frankly.
00:26:12.000 | I really like their parallel coordinates chart.
00:26:15.000 | And I find it very difficult to reproduce that in, like, any visualization library. Do you like... in a way, I don't like the parallel coordinates chart, but, yeah, I mean, there must be parallel coordinates charts for Python out there.
00:26:31.000 | There is, there's like a plotly one, but it's not that nice. Okay, because... you can hover over it and stuff and see, you know. What did they do, did they write their own?
00:26:43.000 | I think so. Yeah, that's impressive.
00:26:48.000 | And they kind of wrote their own data frame kind of language, their own visualization library, and, like, in a sense it's like those Weights and Biases reports, and they have their own syntax.
00:27:01.000 | Okay.
00:27:16.000 | There isn't one in plotly or something.
00:27:19.000 | Yeah, there's one in plotly for sure. Plotly things are normally interactive. So, have you tried that.
00:27:25.000 | Do you know if it's.
00:27:27.000 | Yeah, it works.
00:27:30.000 | It's not as nice, but yeah it works like when you hover over you, like, there's a, there's at least a version one doesn't.
00:27:38.000 | Yeah, that one, it's like, it's very fiddly, you might have to draw a box around it.
00:27:45.000 | to highlight it.
00:27:50.000 | Oh, yeah. Okay, so you just drag over it. That's not terrible.
00:27:55.000 | Yeah, I mean it's okay. It's not the best UI.
00:28:00.000 | But, you know.
00:28:04.000 | Okay, this is, thanks for telling me about this it's cool.
00:28:08.000 | You don't... you don't like this that much, it's not that useful for you? I mean, I haven't managed to get use out of it. I mean, I know other people like it, so I don't doubt that it's useful for something; it's just apparently not useful for the things I've tried to use it for yet, somehow.
00:28:22.000 | I mean, how do you... do you kind of, like, drag over the end bit to see where they come from or something? Yeah. I mean, it might be useful, if you want to look at the Weights and Biases one.
00:28:32.000 | Because I think it renders one by default for you for the runs. Yeah, yeah, it does. It's easier to, like... let's check it out.
00:28:43.000 | I think it could be in the sweeps thing.
00:29:06.000 | Most likely. Okay.
00:29:09.000 | And then, yeah, pick a sweep.
00:29:13.000 | That one has zero runs but
00:29:17.000 | maybe that one. Okay, and then.
00:29:20.000 | Yeah, okay so here we go.
00:29:23.000 | And then when you just hover over a section.
00:29:27.000 | See, I don't see how this is helping me.
00:29:31.000 | I guess, like, I'm saying there's not that much variance in the... well, I guess, what is the metric we're trying to optimize? It doesn't really seem like it's even on this chart.
00:29:43.000 | Like, you know what, you probably have to tell it what your metric is, and we probably didn't. So the far right-hand thing is resize method, rather than the metric.
00:29:53.000 | So let's.
00:29:55.000 | Is there some way to tell it what we care about?
00:29:58.000 | There's an edit, there's like a little pencil. Let's see.
00:30:02.000 | Okay, add the column for add
00:30:09.000 | loss or something.
00:30:12.000 | That's the error.
00:30:14.000 | Wait, this is... now, let's do accuracy_multi.
00:30:18.000 | Okay.
00:30:21.000 | Okay, now we're talking.
00:30:24.000 | You probably want to get rid of pool and resize method since they don't have any variance. They're not adding any information.
00:30:41.000 | All right. There we go. Now you can, like, hover. I actually want to do the thing... oh, here we go. Can I do this drag area?
00:30:52.000 | Yeah... that said, this is definitely not going to tell me much, and the number of experiments is not either.
00:31:02.000 | That's true, because there's just some arbitrary thing. Anyway, there's a thing.
00:31:12.000 | Yeah, yeah, sometimes I learn something from that visualization, sometimes I don't, you know. Not always.
00:31:23.000 | So let's Ctrl-B D to detach.
00:31:34.000 | Do you generally like to do the grid search thing or the Bayesian exploration?
00:31:41.000 | I sort of... like, I'm not very into all this, right, so... but, like, in general, I don't do Bayesian hyperparameter stuff ever.
00:31:54.000 | And that's kind of funny, because I was actually the one that taught Weights and Biases about the method they use for hyperparameter optimization... which actually tells you that's not quite true: I have used it once.
00:32:06.000 | I used it specifically for finding a good set of dropouts for an AWD-LSTM, because there's like five of them. And I told Lucas about how I'd created a random forest that actually tries to, you know, predict how accurate something's
00:32:23.000 | going to be and then use that random forest to actually target better sets of hyper parameters. And then, yeah, that's what they ended up using for weights and biases, which is really cool.
00:32:34.000 | But I kind of like to really
00:32:38.000 | use a much more human-driven approach of, like, well, what's the hypothesis I'm trying to test, and how can I test that as fast as possible? Like, most hyperparameters are independent of most other hyperparameters.
00:32:51.000 | So, you know, you don't have to do a huge grid search or whatever, and you can figure things out. So, for example, in this case it's like, okay, well, a learning rate of 0.008 was basically always the best.
00:33:01.000 | Rather than try every learning rate for every model for every resize type, etc., just use that learning rate.
00:33:09.000 | Same thing for resize method, you know: crop was always better for the few things we tried it on, so we don't have to try every combination.
00:33:17.000 | And also, like, I feel like I learn a lot more about deep learning when I, you know, ask, well, what do I want to know about this thing? Is that thing independent of that other thing, or are they connected or not?
00:33:31.000 | You know, and so in the end I kind of come away feeling like, okay, well, I now know that for every model we tried the optimal learning rate's basically the same, and for every model we tried the optimal resize method's basically the same, so I
00:33:46.000 | kind of come away knowing that I don't have to try all these different things every time.
00:33:52.000 | And so now, next time I do another project, I can leverage my knowledge of what I've learned, rather than do yet another huge hyper parameter sweep.
00:34:04.000 | I see: you are the Bayesian optimization. Yeah, my brain is the thing that's learning. Exactly. And I find, like, people at big companies that spend all their time doing these big, you know, hyperparameter optimizations, like...
00:34:20.000 | I always feel, in talking to them, that they don't seem to know much about the practice of deep learning. Like, they don't seem to know what generally works and what generally doesn't work, because they never bother trying to figure out the answers to those questions.
00:34:33.000 | But instead they just chuck a huge hyperparameter optimization thing onto, you know, 1,000 TPUs.
00:34:46.000 | Yeah, that's something that's really interesting. I mean, like, do you feel like these hyperparameters generalize across different architectures, different models? Totally.
00:34:59.000 | Yeah, totally.
00:35:03.000 | In fact, yeah, that was a piece of analysis we did, gosh, I don't know, four or five years ago, along with the fellowship.ai folks and the platform.ai folks, which was trying lots of different sets of hyperparameters across as different a set of datasets as possible.
00:35:16.000 | And the same sets of hyper parameters were the best or close enough to the best for everything we tried.
00:35:25.000 | Yeah.
00:35:32.000 | With different architectures, like, I can somewhat imagine that, you know... maybe it's not that super important, but, you know, between transformers and CNNs... I mean, I'm not questioning this, because I don't have any experience to say that this is not correct. I think this is wonderful, and it is...
00:35:51.000 | It is. It's amazing. So yeah, the fact that across 90 different models that we were testing, that couldn't be more different,
00:36:02.000 | They all had basically the same best learning rate or close enough, you know,
00:36:09.000 | A very interesting aspect here is that tuning the learning rate is something that you dump a lot of time into. Usually when you start working on a project or in a competition, you would be naturally inclined to say, hey, you know, I'm using a different architecture,
00:36:26.000 | let me try to experiment with learning rates. But it's nice that you can skip this.
00:36:34.000 | Well, I should mention, this is true of computer vision.
00:36:42.000 | But not necessarily for tabular, I suspect. Like, all computer vision problems do look pretty similar, you know; the data for them looks pretty similar.
00:36:57.000 | I suspect it's also true, like, specifically of object recognition so like.
00:37:05.000 | Yeah, for.
00:37:07.000 | I don't know. I mean, these are things nobody seems to bother testing, which I find a bit crazy, but we should do similar tests for segmentation and, you know,
00:37:18.000 | bounding boxes and so forth.
00:37:24.000 | I'm pretty sure it's the same thing... You have the learning rate finder. Does it suggest maybe some different learning rates are good in different places? Well, the learning rate finder I built before I had done any of this research, right?
00:37:40.000 | Okay.
00:37:42.000 | Like you might have noticed that I hardly ever use it nowadays in the course.
00:37:48.000 | I don't even know if we've mentioned it yet in this course. Maybe we have, in the last lesson, I can't remember.
00:37:54.000 | Does anybody remember if we've done the learning rate finder yet in course 22? Yeah, I think we did. You think we did? Yeah.
00:38:03.000 | I understand that. Well, really, you can sit there and play with parameters all you like, and spin your wheels and get nowhere. And that's... it's one of the things I'm really taking away from the course, the fact that you're talking about strategy,
00:38:21.000 | which goes back to Renato Copi: in his 2002 paper he had a term called 'strategy of analysis', and that's something that really stuck with me. And so that sort of transcends that idea of just mucking around with parameters.
00:38:41.000 | Yep. Exactly.
00:38:48.000 | So I suppose the magic parameters, these are the defaults in fastai?
00:38:55.000 | Yeah, pretty much, although with learning rate.
00:39:05.000 | Oh, that's weird. With learning rate.
00:39:12.000 | the default's a bit lower than the optimal.
00:39:18.000 | Just because I didn't want to like, push it, you know, I'd rather it always worked pretty well, rather than be pretty much the best, you know.
00:39:28.000 | Yeah.
00:39:32.000 | So I'm just going to go and disconnect my other computer, because it's connected to port 8888, which is going to mess things up. I'll be back in one tick.
00:39:59.000 | Okay.
00:40:28.000 | Okay.
00:40:52.000 | Actually, now I think about it, I don't quite know why this is connecting on port 8889. But part of this is to learn how to debug problems, right? So normally the Jupyter server uses port 8888,
00:41:10.000 | and I've only got my SSH set up to forward port 8888, so it's currently not working.
00:41:16.000 | So the fact that it's using a different port suggests it's already running somewhere. So to find out where it's running, you can use ps, which lists all the processes running on your computer.
00:41:28.000 | And generally speaking, I find I get used to some standard set of options that I nearly always want, and then I forget what they mean. So I have no idea what w, a, u, or x mean; I just know that they're the set of options I always use.
00:41:42.000 | So that basically lists all your processes, which obviously is a bit too many, so we now want to filter out the ones that contain 'jupyter' or 'notebook'.
00:41:52.000 | So a pipe is how you do that in Linux: it's going to send the output of this into the input of another program, and a program that just prints out a list of matching lines is called grep.
00:42:05.000 | So we can grep for jupyter.
00:42:08.000 | Okay, there it is. So, I'm kind of wondering
00:42:16.000 | where that... how that's running. I wonder if we've got, like, multiple sessions of tmux running.
00:42:24.000 | We don't. So tmux ls lists all your tmux sessions.
00:42:32.000 | Oh, I've got a stopped version in the background. Okay, that's why. So I just have to foreground it. There we go.
00:42:38.000 | That was a bit weird.
00:42:41.000 | Okay, so now that should work.
00:42:47.000 | How do you foreground it?
00:42:48.000 | Okay, Ctrl-Z to put it in the background, fg to put it in the foreground. And when you do that, somebody asked, it actually stops it, right? Can you put it in the background and have it keep running? Actually, I'll show you.
00:43:05.000 | So if I press Ctrl-Z and type jobs, that's stopped, right? So if I now try to refresh this window,
00:43:13.000 | It's going to sit there waiting forever and never going to finish. Okay.
00:43:19.000 | because it's stopped in the background. If you type bg, optionally followed by a job number, which would be number one, and it defaults to the last thing that you put in the background, it will start running it in the background.
00:43:35.000 | Even after you stopped it? Yeah. So it's now running in the background: if I type jobs, it's now running.
00:43:44.000 | And it's still attached to this console. So if I open up this, you'll see it's still printing out things right but I'm, but I can also do other things.
00:43:55.000 | And I don't do this very much, because normally if I want something running at the same time, I would just chuck it in another tmux pane.
00:44:04.000 | I don't know. It's kind of nice to know this exists.
00:44:07.000 | Something else to point out is, once I said bg, it added this ampersand after the job. That's because if you run something with an ampersand at the end,
00:44:17.000 | It always runs it in the background.
00:44:20.000 | So if you want to, like, fire off six processes to run in parallel, just put an ampersand at the end of each one, and they'll all run.
00:44:28.000 | They'll run in the background.
00:44:34.000 | So for example, here
00:44:46.000 | is a script that runs ls six times.
00:44:52.000 | And so if I run it.
00:45:03.000 | You can see they're all interspersed with each other, because it ran all six at the same time.
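The script itself is just six ls commands each ending in an ampersand. For reference (this wasn't shown in the session), a Python equivalent of the same fire-and-forget pattern would be:

```python
# Python equivalent of ending each shell command with '&': start all six
# processes without waiting for any of them, then wait at the end.
import subprocess

procs = [subprocess.Popen(['ls']) for _ in range(6)]  # all launched at once
for p in procs:
    p.wait()  # like the shell's `wait`; their output will be interspersed
```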
00:45:09.000 | I see. And let's say, like, you create a process like this in the background, without tmux,
00:45:16.000 | and you want to kill it? You could type fg to foreground it,
00:45:22.000 | and then press Ctrl-C.
00:45:29.000 | Yeah, something like that would be fine, or you can kill a single job.
00:45:40.000 | So in general, you'd probably want to search for 'bash job control' to learn how to do these things.
00:45:46.000 | And as I said, one of the key things to know, as it mentions here, is that a job number has a percent at the start.
00:45:56.000 | So this one is actually %1; that's how you'd refer to it.
00:46:02.000 | Knowing what to Google is definitely... Yes, knowing what to Google is the key thing.
00:46:13.000 | Although often you can just put in a few examples. So, I'm guessing, if I type 'ctrl-z bg fg jobs', which are the things we just learned about...
00:46:27.000 | There we go. It kind of gets us pretty close. Now we know they're called job control commands.
00:46:27.000 | All right. Now, so when I kind of iterate through notebooks, what I tend to do is like, once I've got something vaguely working, I generally duplicate it, and then I try to get something else vaguely working and once that starts vaguely
00:47:01.000 | working, I then rename it to say what it actually is. So then from time to time I just clean up the duplicated versions that I didn't end up using, and I can tell which they are because I haven't renamed them yet.
00:47:10.000 | And so this is kind of how you can do it: you make copies of it. So you can just click File,
00:47:17.000 | Make a Copy. Yep. Or in here you can click it and click Duplicate.
00:47:29.000 | And so, I mean, what do you do after you duplicate it? I'll open up that duplicate and I'll try something else: some different type of parameter, a different method, or whatever.
00:47:39.000 | So in this case, I started out here in paddy, and kind of just experimented.
00:47:39.000 | Okay.
00:47:42.000 | And show_batch and fine-tune, and try to get something running.
00:47:44.000 | And then, you know, after that I was like, okay, I've got something working, how do I make it better.
00:47:51.000 | And so I created paddy-small, but what I'd actually done was make a copy, and it would have been called 'paddy copy.ipynb'.
00:48:01.000 | And I was like, oh, I wonder about different architectures.
00:48:06.000 | So I created this, like, I was like, okay, well, basically I want to try different item transforms, different batch transforms and different architectures.
00:48:14.000 | So I created a train function which takes those three things.
00:48:18.000 | And so it creates a set of ImageDataLoaders with those item transforms and those batch transforms, uses a fixed seed to get the same validation set each time, trains it with that architecture,
00:48:32.000 | and then returns the TTA error rate.
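A rough sketch of a train helper like the one described follows; the paths, epoch count, and learning rate here are assumptions rather than the notebook's exact code.

```python
# Illustrative sketch of the train() helper: same validation split every
# time (fixed seed), train the given architecture, return the TTA error.
from fastai.vision.all import *

trn_path = Path('train_images')   # assumed competition data layout

def train(arch, item, batch, epochs=12):
    dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, seed=42,
                                       item_tfms=item, batch_tfms=batch)
    learn = vision_learner(dls, arch, metrics=error_rate).to_fp16()
    learn.fine_tune(epochs, 0.01)
    preds, targs = learn.tta(dl=dls.valid)   # test-time augmentation
    return error_rate(preds, targs)

# e.g. train('convnext_small_in22k',
#            item=Resize(480, method='squish'),
#            batch=aug_transforms(size=224, min_scale=0.75))
```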
00:48:35.000 | And so then... this is kind of like your Weights and Biases; like, this is how you keep your different experiment ideas.
00:48:44.000 | So, yeah, so now you can see I've kind of gone through and tried a few different sets of item and batch transforms for this architecture.
00:48:57.000 | And this is like some just small architectures so they'll run reasonably quickly so these ran in about six minutes or so.
00:49:07.000 | And this is very handy, right? If you go Cell, All Output, Toggle, you can quickly get an overview of what you're doing.
00:49:16.000 | And so from that I kind of got a sense of which things seem to work pretty well for this one and then I replicated that for a different architecture and found those things which, you know, these are very, very different.
00:49:28.000 | One's transformer-based, one's convnet-based, you know. Find the things which work pretty well consistently across very different architectures, and for those, then try them on other ones: Swin v2 and Swin.
00:49:44.000 | And, yeah, then find, you know, so then let's toggle the results back on.
00:49:52.000 | So I'm kind of looking at two things. The first is what's the error rate at the end of training. The other is what's the TTA error rate.
00:49:59.000 | So my squish worked pretty well for both.
00:50:03.000 | Crop worked pretty well for both. This is all ConvNeXt.
00:50:10.000 | This 640x480, 288x224 didn't work so well. I mean, it's not terrible, but it's definitely worse.
00:50:19.000 | And 320x240 instead, you know.
00:50:25.000 | Can you talk a little bit about what you're looking for in the TTA versus the final?
00:50:30.000 | I just want to say, like, I mean, the main thing I care about is TTA because that's what I'm going to end up using.
00:50:36.000 | Yeah, that's the main one, but, like, let's see: in this case, this one's not really any better or worse than our best ConvNeXt.
00:50:48.000 | The TTA is way better. So that's very encouraging, which is interesting. So this is now for ViT, right?
00:50:58.000 | Now, ViT, we can't do the rectangular ones, because ViT has a fixed input size.
00:51:03.000 | So the final transformation has to be 224x224. So if you pass an int instead of a tuple, it's going to create square final images.
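In fastai terms, that final transform is the Resize step; a few illustrative variants of what's being described:

```python
from fastai.vision.all import *

Resize(224)                    # int -> square 224x224 final images (ViT needs this)
Resize((288, 224))             # (h, w) tuple -> rectangular final images
Resize(224, method='squish')   # squish rather than the default centre crop
Resize(224, method=ResizeMethod.Pad, pad_mode=PadMode.Zeros)  # pad to square
```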
00:51:14.000 | And, you know, on the other hand, this one looks crappy, right?
00:51:20.000 | So I definitely want to use squish for ViT.
00:51:27.000 | And then this one looks pretty good, you know. So this was using padding.
00:51:34.000 | So, like, for ViT, I probably wouldn't use crop.
00:51:41.000 | Last time I looked, TTA was not really a thing in other modeling frameworks that is given to you. Is that still the case?
00:51:49.000 | No, as far as I know, that's true. Yeah.
00:51:55.000 | You know, so there are a lot of people, well, one group in particular has been copying without credit everything they can.
00:52:02.000 | And they might have done it. I won't mention their name. But yeah. So Swin v2, apparently, Tanishq told me, is what all the cool kids on Kaggle use nowadays.
00:52:18.000 | That's a fixed resolution. And I found that for the larger sizes, there was no 224. You had the choice of 192 or 256.
00:52:29.000 | And 256 got so slow I couldn't bear it. But interestingly, even going down to 192, Swin's TTA is actually nearly as good as the best ViT.
00:52:40.000 | So I thought that was pretty encouraging.
00:52:46.000 | This one, interestingly, like ViT, didn't do nearly as well for the crop.
00:52:53.000 | And again, like ViT, it did pretty well on the pad.
00:52:58.000 | And then this is Swin v1, which does have a 224.
00:53:03.000 | And so here, this TTA is OK, but the final result's not great. And so to me, I'm like, no, it's not fantastic.
00:53:13.000 | This one's... again, you know, it's interesting: on the crop, none of them are doing well, except for ConvNeXt.
00:53:23.000 | This one's not great either, right? So Swin v1, a little unimpressive.
00:53:31.000 | So basically, that's what I did next. And then I was like, OK, let's pick the ones that look good.
00:53:36.000 | And I made a duplicate of paddy-small,
00:53:41.000 | and I just did a search and replace of 'small' with 'large'. So we've now got ConvNeXt large.
00:53:47.000 | And the other things I did differently was I got rid of the fixed random seed.
00:53:51.000 | So there's no seed equals 42 here. And so that means we're going to have a different training set each time.
00:53:57.000 | And so these are now not comparable, which is fine. You'll see if one of them is like totally crap, right?
00:54:02.000 | But they're not totally comparable. But the point is now, once I train each of these, they're training on a different architecture,
00:54:09.000 | a different resizing method.
00:54:15.000 | And I append to a list. So I start off with an empty list and I append the TTA predictions.
00:54:26.000 | And so I deleted the cells from the duplicate that weren't very good in paddy-small.
00:54:35.000 | So you'll see there's no crop anymore, just squish and pad for ViT.
00:54:41.000 | And the Swin v2.
00:54:46.000 | I probably shouldn't have kept both of the Swin v1s, actually; they weren't so good.
00:54:53.000 | And then what I did in the very last Kaggle entry was I took the two ViT ones, because they were the clear best,
00:55:05.000 | And I appended them to the list. So they were there twice. So it's just a slightly clunky way of doing a weighted average if you like.
00:55:16.000 | Yes: stack them all together, take the mean of their predictions,
00:55:22.000 | find the argmax across the mean of their predictions to get the predicted classes, and then submit in the same way as before.
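Put together, the ensembling step looks roughly like this; tta_res and the doubled entries are sketched to match the description, not copied from the notebook.

```python
import torch

# tta_res is the list built up earlier: learn.tta(...) returns a
# (predictions, targets) tuple, and one tuple was appended per trained model.
# Double-count the two ViT entries to weight them higher (the indices are
# illustrative; use whichever positions hold the ViT runs).
tta_res += [tta_res[0], tta_res[1]]

all_preds = torch.stack([preds for preds, _ in tta_res])  # (models, items, classes)
avg = all_preds.mean(0)     # mean of the predictions across the ensemble
idxs = avg.argmax(dim=1)    # argmax of the mean -> predicted class per image
```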
00:55:30.000 | So yeah, that was basically my process. It's not particularly thoughtful.
00:55:39.000 | It's pretty mechanical, which is what I like about it. In fact, you could probably automate this whole thing.
00:55:44.000 | Sorry, did somebody want to say something?
00:55:47.000 | No, no. I was going to say, how critical is this model stacking in Kaggle?
00:55:54.000 | Just curious how you think about that.
00:56:00.000 | I mean, we should try, right? We should probably submit. In fact, we're kind of out of time.
00:56:08.000 | How about next time? Let's submit just the VIT, the best VIT, and we'll see how it goes.
00:56:17.000 | And that will give us a sense of how much the ensembling matters.
00:56:23.000 | We kind of know ahead of time it's not going to matter hugely.
00:56:32.000 | I mean, you specifically said on Kaggle. On Kaggle, it definitely matters because in Kaggle you want to win.
00:56:38.000 | But in real life, my small ConvNeXt got 97... well, rounded up, that's 98%, and my ensemble got 98.8%.
00:56:59.000 | Now that's, in terms of error rate, that's nearly halving the error, so I guess that's actually pretty good.
00:57:07.000 | Really important question. How do you keep track of what submissions are tied to which notebook?
00:57:13.000 | Oh, I just put a description to remind me, but a better approach would actually be to write the notebook name there, which is what I normally do.
00:57:22.000 | But in this case, I wasn't taking it particularly seriously, I guess. So I was only planning to do these ones, and that was it.
00:57:29.000 | So I was basically like, okay, do one with a single small model, then do one with an ensemble of small models, and then do one with an ensemble of big models.
00:57:37.000 | And then it was after I submitted that that I thought, oh, I should probably weight the ViTs a bit higher, so I ended up with the fourth one.
00:57:44.000 | So it's pretty easy for me, though: I only did four significant submissions, so it's easy to track.
00:57:51.000 | But yeah, now that I know I'm actually doing a little bit more, because I actually did want to try one more thing,
00:57:59.000 | I think what I'll probably do is I'll go back and I'm going to, you can edit these, I'm going to go and I'll put in the notebook name in each one.
00:58:06.000 | And then I wouldn't go back and change those notebooks later... unless there was, like... no, I probably never would. I would just duplicate them, make changes in the duplicate, and rename them to something sensible.
00:58:20.000 | And of course, this all ends up back in GitHub.
00:58:24.000 | So I always see, yeah, see what's going on.
00:58:30.000 | So this is like MLOps, Hamel, without...
00:58:36.000 | It's like every, quote, 'run' is a notebook, in a way, as a device to kind of keep track. Yeah, yeah.
00:58:48.000 | Exactly. But I mean, the only reason I can kind of do this is because I had already done, like, lots of runs of models to find out which ones I can focus on right so I didn't have to try 100 architectures.
00:59:01.000 | I mean, in a way, it forces you to really look at it closely. Yeah, rather than just, you know, having this dashboard. My view is that with this approach, you will actually become a better deep learning practitioner.
00:59:17.000 | And I also believe almost nobody takes this approach, and I feel like there are very few people I come across who are actually good deep learning practitioners; not many people seem to know what works and what doesn't.
00:59:32.000 | So, yeah.
00:59:35.000 | All right. Well, that's it, I think.
00:59:39.000 | Thanks for joining again, and yeah.
00:59:43.000 | See you all next time. Bye.
00:59:46.000 | Thank you.
00:59:47.000 | Take care, everybody.