
Live coding 16


Chapters

0:00 Start
1:04 About Weighting (WeightedDL)
1:50 Curriculum Learning / Top Losses
3:08 Distribution of the test set vs training set
3:35 Is Curriculum Learning related to Boosting?
4:25 Focusing on examples that the model is getting wrong
4:38 Are the labels ever wrong? By accident, or intentionally?
6:40 Image annotation issues: Paddy Kaggle discussion 4
8:23 UNIFESP X-ray Body Part Classifier Competition 4
10:20 Medical images / DICOM Images
10:57 fastai for medical imaging
11:40 JPEG 2000 Compression
12:40 ConvNet Paper
13:50 On Research Field
15:30 When a paper is worth reading?
17:14 Quoc V. Le
17:50 When to stop iterating on a model? - Using the right data.
20:10 Taking advantage of Semi-Supervised Learning, Transfer Learning
21:33 Not enough data on certain category. Binary Sigmoid instead of SoftMax
23:50 Question about submitting to Kaggle
25:33 Public and private leaderboard on Kaggle
29:30 Where did we get to in the last lesson?
31:20 GradientAccumulation on Jeremy’s Road to the Top, Part 3
37:20 “Save & Run” a Kaggle notebook
38:55 Next: How outputs and inputs to a model looks like
40:55 Next: How the “middle” (convnet) of a model looks like
41:32 Part 2: Outputs of a hidden layer
42:53 The Ethical Side
44:30 fastai1/courses/dl1/excel

Whisper Transcript

00:00:00.000 | I am recording now, but please keep talking. Don't be shy, Sarada. It's okay.
00:00:08.000 | Oh, getting a bit noisy out here with street cleaning something.
00:00:18.000 | I think your headset is very good.
00:00:23.000 | Oh, good. I didn't hear anything so.
00:00:26.000 | I mean, I say street cleaning. It's more like footpath cleaning. We have a walking path along the front of our house.
00:00:33.000 | Oh, come on. As soon as I press record, everybody stops talking. Well, you know.
00:00:44.000 | People don't want to just hear my voice all the time on these recordings, guys.
00:00:52.000 | There's things I wanted to cover in today's session, but then the responsible part of me says I probably ought to create a lesson before Tuesday's class, so maybe we'll do that.
00:01:03.000 | I've got a question, Jeremy. I had to leave before you finished that code change yesterday. Was that actually... do you want to recap on where we got to with weighting with data loaders?
00:01:20.000 | Probably not because you can just watch the video and so, like, otherwise, I guess we're just doing it.
00:01:24.000 | So, is it working now? Yeah, yeah, it's all good. You know, I mean, it's,
00:01:30.000 | the concept is working correctly in terms of the code.
00:01:35.000 | We didn't, like, get a better score, but I didn't particularly expect to either.
00:01:44.000 | You know, maybe after next Tuesday's lesson, we will revisit it because I actually think the main thing it might be useful for is what's called curriculum learning, which is basically focusing on the hard bits.
00:01:55.000 | Looks like Nick's internet still isn't working, but Nick was saying the other day that he looked at which ones we're having the errors on, which is like what we look at in the book.
00:02:08.000 | Like, looking at the classification interpretation and looking at, like, plot_top_losses and stuff, and he said, like, yeah, all the ones that we're getting wrong are basically from the same one or two classes.
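What Nick did can be reproduced with fastai's standard interpretation tools; a minimal sketch, assuming `learn` is the trained paddy Learner from the earlier notebooks:

```python
from fastai.vision.all import *

# assumes `learn` is an already-trained vision Learner on the paddy data
interp = ClassificationInterpretation.from_learner(learn)

# the confusion matrix shows whether the errors cluster in one or two classes
interp.plot_confusion_matrix(figsize=(8, 8))

# the highest-loss items: confidently-wrong predictions worth eyeballing
interp.plot_top_losses(9, figsize=(12, 9))
```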
00:02:22.000 | I haven't done much with curriculum learning in practice; like, all it means in theory is that we use our weighted data loader to weight the ones that we're getting wrong higher.
00:02:38.000 | Whether that will actually give us a better result or not, I'm not sure, but I think that's more likely to be a
00:02:46.000 | useful path than simply reweighting things to be more balanced, because we don't want things to be more balanced: the ones that we observe the most often in the test set are actually the ones we want to
00:03:04.000 | be the best at, you know.
00:03:06.000 | I will say I didn't check whether the distribution of the test set is the same as the training set.
00:03:13.000 | If it's randomly selected, then it will be, and if it's not, then that would be a reason to use a weighted data loader as well.
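Either use of a weighted data loader mentioned here (up-weighting error-prone classes, or matching a skewed test distribution) could be wired up roughly like this. This is a sketch: the `class_err` numbers are invented, and the exact `weighted_dataloaders` call comes from `fastai.callback.data` and its signature may differ between fastai versions.

```python
import numpy as np
from fastai.vision.all import *
from fastai.callback.data import *  # provides WeightedDL / weighted_dataloaders

path = Path('paddy-disease-classification')   # placeholder data directory

# hypothetical per-class error rates taken from the interpretation step above
class_err = {'tungro': 0.30, 'hispa': 0.15, 'normal': 0.02}

# one weight per training item: higher weight for classes we get wrong more often
items = get_image_files(path/'train_images')
wgts = np.array([class_err.get(parent_label(o), 0.05) for o in items])

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(seed=42),
    get_y=parent_label,
    item_tfms=Resize(480),
    batch_tfms=aug_transforms(size=224))

# assumed signature: (source, weights, ...) - check your fastai version's docs
dls = dblock.weighted_dataloaders(path/'train_images', wgts, bs=64)
```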
00:03:21.000 | Yeah.
00:03:29.000 | Okay, so what's the difference? I mean, I guess, is curriculum learning kind of related to boosting, conceptually?
00:03:44.000 | Not really. I mean, maybe. So boosting is where you calculate the difference between the actuals and your predictions to get residuals, and then you create a model that tries to predict the residuals.
00:03:58.000 | And then you can add those two predictions together, which is, if not done carefully, is a recipe for overfitting.
00:04:06.000 | But if done carefully can be very effective.
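As a toy illustration of the residual-fitting idea just described (the data and the tree models here are made up purely for the example):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# toy regression data
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x[:, 0]) + rng.normal(0, 0.1, size=200)

# model 1 fits the targets directly
m1 = DecisionTreeRegressor(max_depth=2).fit(x, y)

# model 2 fits the residuals: actuals minus model 1's predictions
resid = y - m1.predict(x)
m2 = DecisionTreeRegressor(max_depth=2).fit(x, resid)

# boosted prediction: add the two models' predictions together
pred = m1.predict(x) + m2.predict(x)
print(((y - pred) ** 2).mean())  # training error drops; this is where the overfitting risk comes in
```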
00:04:10.000 | Yeah, we're talking about something which is conceptually very different, which is saying like, oh, we're like really bad at recognizing this category, so let's show that category more often during training.
00:04:26.000 | That's a good question.
00:04:27.000 | Of kind of focusing on examples you're getting wrong; like, conceptually doing something similar.
00:04:36.000 | I was just gonna ask, are the labels ever wrong, like by accident, or intentionally in Kaggle?
00:04:43.000 | Of course, absolutely.
00:04:45.000 | So, or both intentionally as well?
00:04:47.000 | No, not intentionally. I mean, not normally, like sometimes there might be a competition where they say like, oh, this is a synthetically generated data set, and some of the data is wrong.
00:04:55.000 | Because we're trying to do something like what happens in practice, but we can't share the real data.
00:05:01.000 | So is there any advantage in trying something like uncertainty values from something like MC Dropout, to try to find a threshold of things that are too difficult, and then potentially they're wrongly labeled?
00:05:13.000 | I'm not sure you would need that. The thing we use in the book and the course is simply to find the things that we are confident of but that turn out to be wrong, and then just look at the pictures.
00:05:29.000 | Are the max values enough, you think, to basically know whether or not it fits? I do, yeah. I mean, that seems to work pretty well. I mean, the only thing is you would need to be able to recognize these things in photos.
00:05:39.000 | But I'm sure if you spend an hour reading on the internet about what these different diseases are and how they look, you would be able to pick it up soon enough.
00:05:48.000 | And then, you know, just like we did in chapter two for recognizing the things that aren't black and brown and teddy bears.
00:05:55.000 | So plausibly, even just knocking out some of the extremely difficult examples might get you higher on the leaderboard purely by virtue of them misleading the model.
00:06:02.000 | Not by knocking out the hard ones, but by knocking out the wrong ones, yes.
00:06:08.000 | Unless the test set is mislabeled consistently with the training set, in which case you would not want to knock them out because you would want to be able to correctly predict the things which people are incorrectly recognizing as the wrong disease.
00:06:22.000 | Something to try, though.
00:06:24.000 | Yeah. Yeah. So I would do exactly what we did in chapter two. You know, you can use exactly the same widget.
00:06:31.000 | But as I say, you'd have to probably spend an hour learning about rice disease, which probably be a reasonably interesting thing to do anyway.
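The "same widget" from chapter two is `ImageClassifierCleaner`; a minimal sketch, again assuming a trained `learn`:

```python
from fastai.vision.widgets import ImageClassifierCleaner

# shows the highest-loss images per class so you can relabel or delete them by hand
cleaner = ImageClassifierCleaner(learn)
cleaner

# after reviewing in the notebook, apply the decisions (same pattern as the book):
# for idx in cleaner.delete(): cleaner.fns[idx].unlink()
# for idx, cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)
```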
00:06:38.000 | I just posted a link. There's a discussion in the Paddy competition. Some people have identified there's some mislabeling, at least over 20 already.
00:06:52.000 | Yeah. Yeah. So it definitely happens.
00:06:54.000 | It says we have manually annotated every image with the help of agricultural experts, but there could be errors.
00:07:01.000 | Well, this person knows more about rice than I do: "I think the images in the tungro class have a chance of issues. The symptoms can easily be confused with potassium deficiency."
00:07:15.000 | Fair enough.
00:07:19.000 | Is that an example of what you're talking about, where if a layman, or sorry, a semi-expert gets confused, then the labeling in the test set is probably the same?
00:07:32.000 | Yeah, exactly. So fixing these would probably screw up your model, assuming that the test set was labeled by the same people in the same way.
00:07:43.000 | I mean, sometimes the test set is more of a gold standard. They'll make more effort to talk to, like, a larger number of high quality experts and have them vote or something.
00:07:53.000 | Honestly, this competition seems like it doesn't even have any prize money attached. So like, I think it's really low, low investment, probably.
00:08:04.000 | And so I doubt they did that. But that can happen. I mean, it makes sense to invest in getting really good labels for the test set.
00:08:24.000 | I was looking at one of the other competitions, the UNIFESP x-rays. And I think there was one where somebody had identified that a wrist was wrongly labeled, or something like that.
00:08:36.000 | Yeah, it's not, there's no money again but it's been running for a while. What's it called?
00:08:42.000 | UNIFESP, U-N-I-F-E-S-P.
00:08:51.000 | It's another community competition. Gosh, it's not very popular. Why are there only 70 teams?
00:08:57.000 | Yeah, sorry, go on.
00:08:59.000 | Yeah, I don't know. I was just looking around and it looked interesting. So I'm number 15 at the moment.
00:09:04.000 | But it is a slightly weird one because, well, it's interesting because some of the x-rays have multiple labels, but the labels are just concatenated.
00:09:18.000 | It's an interesting discussion on how you'd analyze that. Would you treat a combination as a distinct classification, whether it was like a neck and a chest or something?
00:09:28.000 | Or do you look at each of them individually and then try and label a multiple one from the different things?
00:09:35.000 | Okay, so I'm just having a look at this competition. So when does it close? This is a month to go, but I don't know.
00:09:46.000 | Exactly when that is.
00:09:49.000 | Normally, there's July 31st.
00:09:55.000 | Okay, where do you see that?
00:09:56.000 | When you go to the bottom of on the overview and it says there's a whole timeline. So then you just hover over that.
00:10:05.000 | Oh my God, I see. It says closes in a month, but you actually have to get a tooltip by hovering. Okay, thanks, Tanishk.
00:10:11.000 | That's strange. It works. Okay, so we've actually got more than a month, so maybe next week we could have a look at this one because it would be a good opportunity to play around with medical image stuff because they're using DICOM, I think.
00:10:24.000 | Yeah, somebody has also supplied a library of PNGs, which I used; it made it easier, but I don't know what you'd lose in using that rather than the DICOM images.
00:10:39.000 | It rather depends. So DICOM is a very generic file format that can contain lots of different things in it, but one of the things DICOM contains is higher bit depth images than a PNG allows.
00:10:52.000 | So if they've, yes, they might have gotten rid of that.
00:10:58.000 | fastai has a nice medical imaging library, pretty small but with some useful stuff, which I think is fastai.vision.medical, and which can handle DICOM directly.
00:11:12.000 | And I see there's a FastAI entry as well.
00:11:19.000 | That'd be fun. We should try this next week.
00:11:27.000 | I see there's the PNGs.
00:11:28.000 | I think the DICOMs come to about 27 gigabytes.
00:11:35.000 | Oh my God.
00:11:36.000 | So the PNG was quite attractive from that point of view.
00:11:40.000 | So one thing that you can do with DICOM is to compress them, particularly using JPEG 2000, which is a really good compression. But yeah, people often don't for some reason.
00:11:53.000 | So probably the first thing I'd look at in that competition is to look at DICOM and see is it storing 16 bit data or not.
00:12:01.000 | And if it is, I would try to find a way to
00:12:04.000 | resave that without losing the extra information.
00:12:09.000 | Which I think we've got examples of in our medical imaging tutorial.
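A rough sketch of that first check, whether the DICOMs store more than 8 bits of real signal, using pydicom directly (which is what fastai's medical imaging module builds on); the file path is a placeholder:

```python
import pydicom

dcm = pydicom.dcmread('sample.dcm')        # placeholder path to one competition file

# standard DICOM attributes describing pixel depth
print(dcm.BitsAllocated, dcm.BitsStored)   # e.g. 16 / 12 means more than 8 bits of real signal

px = dcm.pixel_array
print(px.dtype, px.min(), px.max())        # if max > 255, a plain 8-bit PNG export loses information
```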
00:12:17.000 | I'll take a look at that.
00:12:22.000 | All right.
00:12:23.000 | I'm going to share my screen, even though I don't know what I'm doing.
00:12:26.000 | I'm going to have to drop in a few minutes, but I'll catch the rest on the recording.
00:12:30.000 | Thanks for this.
00:12:32.000 | Nice to see you.
00:12:33.000 | By the way, I was looking at this
00:12:40.000 | ConvNeXt paper.
00:12:43.000 | And gosh, everybody congratulates transformers on everything.
00:12:51.000 | "Vision transformers bring new ideas like the AdamW optimizer."
00:12:55.000 | Guess who actually wrote the first thing saying we should always use the AdamW optimizer.
00:13:00.000 | Sylvain.
00:13:02.000 | On fast.ai. I think that was years before vision transformers.
00:13:07.000 | AdamW.
00:13:10.000 | There we go.
00:13:12.000 | Mid 2018.
00:13:18.000 | I read that paper last night and I was just thinking like they kind of talk about how all of these things were already there, right?
00:13:26.000 | They just rediscovered them, like slightly larger kernel sizes and things like that, which begs the question: why has no one just done experiments to tweak these things together?
00:13:39.000 | I mean, we do, but nobody takes any notice because they're not written in PDFs, you know?
00:13:47.000 | Is it, I mean, these benchmarks, they're like...
00:13:50.000 | The thing is that like a lot of researchers aren't good practitioners.
00:13:55.000 | So they just, they're not very good at training accurate neural networks and they don't know these tricks, you know, and they don't hang out on Kaggle and learn about what actually works.
00:14:06.000 | So, but then the thing is, it's not always easy to publish. Even if you did stick it into a PDF and submit it to NeurIPS, there's no particularly high likelihood that they're going to accept it, because
00:14:18.000 | the field, research-wise, is very focused on theory results and, you know, things with lots of Greek letters in them.
00:14:26.000 | Does that mean that the part of the problem is that the data sets, the benchmarks are just too inaccessible to the average person?
00:14:33.000 | No, I wouldn't say that.
00:14:35.000 | No, I wouldn't say that. The issue is I think the culture of research is not particularly interested in experimental results, you know?
00:14:50.000 | In my limited experience, I will say it's very hard to find reviewers as well, especially if you have a very strong domain focus, not just running all the sample data sets you can find in open source.
00:15:06.000 | When it's deep in the domain, a lot of peer reviewers just aren't picking it up to review. Even though we pay the reviewers we use, so people can get it for free, it takes a few months just to find reviewers.
00:15:27.000 | Jeremy, so on the topic of papers, how do you know when a paper is worth reading, given the situation?
00:15:43.000 | You don't? I mean, I'm very fond of papers that describe things which did very well in an actual competition, you know, that then we know this is something that actually predicts things accurately.
00:15:58.000 | You know, you can get similar results if they've got a good, you know, just table of results. So generally speaking, I like things that actually have good results, particularly if they show like how long it took to check.
00:16:12.000 | Like how long it took to train and how much data they trained on.
00:16:16.000 | And yeah, so are they getting good results using less data and less time than you might expect from the same thing?
00:16:25.000 | And yeah, I certainly wouldn't focus only on those that get good results on really big data sets. That's not necessarily more interesting. I'm very interested in things that show good results using transfer learning.
00:16:41.000 | Training from random isn't that practically useful; I don't train that much from random. So I'm very interested in things that do well with transfer learning.
00:16:48.000 | Also, like, look for people who you've liked their work before, you know, and in particular, that doesn't mean like always reading the latest papers.
00:17:01.000 | You know, if you come across a paper from somebody that you find useful, go back and look at their Google Scholar and read their older papers.
00:17:11.000 | See who they collaborate with and read their papers. So, for example, I really like Quoc Le at Google Brain.
00:17:19.000 | He and his team do a lot of good work. It tends to be, you know, very practical and high quality results.
00:17:29.000 | And so when his team releases a paper, I also know, like, he seems to have similar interests to mine; he tends to do stuff involving transfer learning and getting good results in fewer epochs and stuff like that.
00:17:42.000 | So if I see he's got a new paper out, I'm pretty likely to read it.
00:17:49.000 | I have a question. I mean, for Kaggle competitions, and in a lab type of environment, the question that I have is when to stop iterating on a model that you have.
00:18:12.000 | Someone asked me: when is enough enough for the training on the data that you have? When is enough?
00:18:19.000 | So that question, I mean.
00:18:24.000 | There's some reason you're doing this work, right? So like you hopefully know when it does what you want it to do.
00:18:37.000 | I mean, the thing that happens to me all the time is that we train the model and it works perfectly fine in the lab when we're doing it.
00:18:50.000 | And then as soon as we throw in a couple of images that are not part of the set, I mean, it goes nuts.
00:18:56.000 | OK. So that's like light, more or less light, or the temperature, or stuff like that.
00:19:04.000 | Well, that's a different problem, right? So that means your problem is that you're not using
00:19:13.000 | the, you know, the right data to train on.
00:19:20.000 | So, like, you need to be thinking about how you're going to deploy this thing.
00:19:27.000 | When you train it and if you train it with data that's different to how you're going to deploy it, it's not going to work.
00:19:36.000 | Yeah, so that's what that means.
00:19:46.000 | It might be difficult to get data, enough data of the kind you're going to deploy it on. But like at some point, you're going to be deploying this thing, which means by definition, you've got some way of getting that data you're going to deploy it with.
00:19:57.000 | So like do the exact thing you're going to use to deploy it, but don't deploy it. Just capture that data until you've got some actual data from the actual environment you want to deploy the model in.
00:20:12.000 | You can also take advantage of semi supervised learning techniques to then, you know, and transfer learning to maximize the amount of juice you get from that data that you've collected.
00:20:30.000 | And finally, I'd say, like, let's say for medical imaging, like, okay, you want to deploy a model to like a new hospital, they've got a different brand of MRI machine you haven't seen before.
00:20:43.000 | I would take advantage of fine-tuning, you know. Each time I deployed it to some different environment where things are a bit different, I would expect to have to go through a fine-tuning process to train it to recognize that particular MRI machine's images.
00:20:59.000 | But you know, each time you do that fine-tuning, it shouldn't take very much data or very much time, because your model's already learnt the key features, and you're just asking it to learn to recognize slightly different ways of seeing those features.
00:21:18.000 | Yeah, I don't think you'll solve this by training for longer, you know; you'll solve it by figuring out your data pipeline, your data labeling, and your rollout strategy.
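A minimal sketch of that per-site fine-tuning recipe, assuming you have an exported fastai model and a small labelled sample captured from the new environment; `export.pkl` and the folder name are placeholders:

```python
from fastai.vision.all import *

# small labelled sample captured from the new scanner / deployment environment
dls = ImageDataLoaders.from_folder('new_site_sample', valid_pct=0.2,
                                   item_tfms=Resize(224))

# reuse the model you already trained elsewhere rather than starting from scratch
learn = load_learner('export.pkl')
learn.dls = dls                   # point the learner at the new site's data

# a short fine-tune is usually enough, since the key features are already learnt
learn.fine_tune(3, base_lr=1e-3)
```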
00:21:33.000 | Usually the issue that we're having is that we don't have enough data for a certain category. But, I mean, the thing that you did yesterday resolves a little bit of that problem; I think we're going to start using it.
00:21:48.000 | Yeah, well also like, if you don't have enough data of some category, don't use the model for that category, you know, so like
00:22:00.000 | you know, rather than using softmax, use a binary sigmoid, you know, as your last layer, and so then you've kind of got like a
00:22:09.000 | probability that x appears in this image, and so then you can recognize when none of the things that you can predict well appear in the image. And so,
00:22:21.000 | then have a, you know, you always want a human in the loop anyway.
00:22:26.000 | So when you didn't find any of the categories of things you've got enough data to be able to find then
00:22:33.000 | triage those to human review.
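One way to get that behaviour in fastai is to set the problem up as multi-label, so each class gets an independent sigmoid probability instead of competing under softmax; a sketch with a hypothetical folder layout and threshold:

```python
from fastai.vision.all import *

dblock = DataBlock(
    blocks=(ImageBlock, MultiCategoryBlock),   # multi-label => sigmoid outputs + BCE loss
    get_items=get_image_files,
    get_y=lambda o: [parent_label(o)],         # wrap the single folder label in a list
    splitter=RandomSplitter(seed=42),
    item_tfms=Resize(224))
dls = dblock.dataloaders('train_images')       # placeholder path

learn = vision_learner(dls, resnet34, metrics=accuracy_multi)
learn.fine_tune(3)

# at inference, every class probability can be low at the same time
probs = learn.predict('some_image.jpg')[2]     # per-class sigmoid outputs
if probs.max() < 0.5:                          # 0.5 is an arbitrary threshold for the sketch
    print('no known category detected: route to human review')
```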
00:22:39.000 | One thing that we did is, I mean we have like 50 something categories, just one moment, hang on.
00:22:46.000 | Sorry about that.
00:23:09.000 | So, we had like 50 categories, and some of them, like 10 of them, have a lot of items. So we ended up doing it in a three-step kind of process: the ones with a lot, the ones with a medium number,
00:23:24.000 | and the ones with a smaller number, and it looks like it resolved the problem a little bit. Cool. But this was to classify metadata coming from other systems, classified for legal purposes, for legal retention.
00:23:45.000 | I see.
00:23:47.000 | Got it.
00:23:53.000 | I had a question, actually. You tried the weighted data loaders, right, and I think you submitted that Kaggle notebook. So, did you do any validation locally first before submitting to Kaggle, something like that?
00:24:08.000 | No, I mean you saw what I did right. And when I did it, so I just.
00:24:14.000 | I just like I was intentionally using a very mechanistic approach.
00:24:24.000 | Because it was part of like just showing like here's the basic steps of pretty much any computer vision model which is entirely mechanical and doesn't require any domain expertise.
00:24:38.000 | So yeah, my question more was like, should we always treat the public leaderboard as the guide, or should we take a hold-out local data set first to validate?
00:24:49.000 | Yeah, so I mean, I always have a validation set. Yeah,
00:24:56.000 | which we saw in this, and this I just used a random splitter, because as far as I know the test set in the Kaggle competition is a randomly split validation set.
00:25:09.000 | Yeah, so like, whether it be for Kaggle or anything. I think creating a validation set that as closely as possible represents the, the data you expect to get in deployment or in your test set is really important.
00:25:25.000 | And, yeah, I actually didn't spend the time doing that on this patty competition.
00:25:34.000 | Normally on Kaggle, if somebody notices there's a difference between the private leaderboard and the public leaderboard, or the test set and the training set, normally it'll appear, you know, in discussions or in a Kaggle kernel or something,
00:25:48.000 | which is partly why I didn't look into it. But yeah, I mean, you should probably check: does it have the same distribution of disease types, you know, judging from the predictions that you create?
00:26:03.000 | Do the images look similar? Do they have the same sizes?
00:26:08.000 | And for me, as soon as I see any difference between the test set and the training set, that sets my alarm bells off, right, because now I know it's not randomly selected.
00:26:20.000 | And if you know it's not randomly selected then you immediately have to think okay they're trying to trick us. So, I would then look everything I could for differences.
00:26:31.000 | Because it takes effort to not randomly select a test set, so they must be doing it very intentionally for some reason.
00:26:43.000 | I think so, like I don't think a Kaggle competition should ever silently give you a systematically different test set.
00:26:52.000 | I think there's great reasons to create a systematically different test set, but there's never a reason not to tell people. So if it's like medical imaging is a different hospital you should say this is a different hospital or if it's fishing you should say these are different boats, or, you know, because like
00:27:05.000 | you want people to do well on your data, so if you tell them, then they can use that information to give you better models.
00:27:18.000 | So, Korean, like, going back to what you asked about, there's this validation in training, then there's this, whether your local validation maps to what's happening on the leaderboard, the score on the hidden test set.
00:27:31.000 | But there's one other scenario that I encountered recently, and maybe it would be interesting to someone. When you're working on a competition, sometimes you might miss something in your code or the prediction.
00:27:42.000 | You know your model is doing something useful, but you're failing to output a correctly formatted submission file; not in the sense that the submission fails on Kaggle, but some predictions are not aligned where they should be or, you know, they're for a different
00:27:58.000 | customer ID or stuff like that. So, once you have one good submission file, relatively good, you can just store it locally and then, you know, run a
00:28:12.000 | check of the correlation between your new submission and the one that you know is okay. You know the correlation should be upwards of 0.9, and then you know, yeah, okay, so I didn't mess up anything with the technical aspect of outputting the prediction.
00:28:26.000 | I mean, it's not a great trick but, you know, I was pulling my hair out over why this wasn't working with a better model. So this was like a sanity check that might help at some point.
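The sanity check described here is a few lines of pandas; the file and column names below are hypothetical:

```python
import pandas as pd

known_good = pd.read_csv('good_submission.csv')   # a previous submission you trust
new_sub    = pd.read_csv('new_submission.csv')

# align on the id column rather than trusting row order
merged = known_good.merge(new_sub, on='image_id', suffixes=('_old', '_new'))

# for categorical labels, an agreement rate plays the role of the correlation check;
# for numeric predictions you could use merged['label_old'].corr(merged['label_new']) instead
agreement = (merged['label_old'] == merged['label_new']).mean()
print(f'agreement with known-good submission: {agreement:.3f}')
```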
00:28:43.000 | Thanks. Cool.
00:28:45.000 | Thanks.
00:28:50.000 | All right. So, let me share my screen.
00:29:02.000 | Let's find zoom.
00:29:05.000 | Here's my screen.
00:29:19.000 | Oh, that's not the right button.
00:29:23.000 | Control up shift H. Okay.
00:29:31.000 | Where did we get to.
00:29:36.000 | In the last lesson.
00:29:42.000 | We finished random forests right.
00:29:51.000 | Oh that's right and I haven't posted that video yet.
00:29:58.000 | That's okay we can check live.
00:30:05.000 | Last year live.
00:30:27.000 | Okay, so we could small models.
00:30:31.000 | Until we get to the end of this.
00:30:45.000 | Okay, so that basically.
00:30:49.000 | So we basically finished the second one of our Kaggle things.
00:31:00.000 | So next week.
00:31:08.000 | See what's in part three.
00:31:18.000 | Right, gradient accumulation. I think that's worth covering. So one thing that somebody pointed out on Kaggle is that I've actually been using gradient accumulation wrong.
00:31:30.000 | I was passing in two here to mean, like, do two batches before you accumulate.
00:31:39.000 | But actually what I meant to be putting in here is the kind of target batch size I want.
00:31:44.000 | So that would be actually I should be putting 64 here.
00:31:50.000 | So I feel a bit stupid.
00:31:52.000 | So what I've been doing is I've been
00:31:56.000 | actually not using gradient accumulation at all, I guess; it's been doing a batch and saying that's it.
00:32:02.000 | I'm saying my maximum batch size should be two.
00:32:06.000 | Okay, so this has actually been not working at all. That's interesting.
00:32:10.000 | Oops.
00:32:14.000 | So it's been using a batch size of 32 and not accumulating.
00:32:20.000 | Okay.
00:32:23.000 | So that's one thing to note. So when I get Kaggle GPU time again, we'll need to rerun this.
00:32:31.000 | Actually, it only took 4000 seconds.
00:32:38.000 | So I guess we should, we could just get it running right now, couldn't we?
00:32:56.000 | So that should be 64.
00:33:15.000 | How many paths defines how large the effective batch size you want is.
00:33:33.000 | Over batches.
00:33:38.000 | We can just remove this sentence entirely.
00:33:38.000 | We divide the batch size by some number based on how small we need it to be
00:34:10.000 | for our GPUs.
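In other words, the callback's argument is the effective batch size (how many samples to accumulate before an optimizer step), while the real batch size gets divided down. A sketch of the corrected setup; the paths are placeholders and the timm architecture name follows the Road to the Top notebooks, so treat the details as assumptions:

```python
from fastai.vision.all import *

accum = 2                                         # how much to shrink the real batch size
dls = ImageDataLoaders.from_folder('train_images', valid_pct=0.2,
                                   item_tfms=Resize(480),
                                   batch_tfms=aug_transforms(size=224),
                                   bs=64 // accum)

# accumulate gradients until 64 samples have been seen, then take one optimizer step
learn = vision_learner(dls, 'convnext_large_in22k', metrics=error_rate,
                       cbs=GradientAccumulation(64)).to_fp16()
learn.fine_tune(1)
```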
00:34:19.000 | Okay.
00:34:34.000 | And on Kaggle I think these were all smaller I don't know why but the Kaggle GPUs use less memory than my GPU for some reason.
00:34:50.000 | Okay.
00:34:52.000 | So right now.
00:35:01.000 | Let's try running it.
00:35:22.000 | Jeremy, you would increase the accum number until you no longer get out of memory? Yeah, and you can pretty much guess it by looking at, like, I mean, you can just...
00:35:37.000 | Once you found a batch size that fits, you know, so the default batch size I believe is 32.
00:35:45.000 | Once you find a batch size that fits (sorry, 64 is the default batch size), it's like, okay, well, if it fits at 32 then I just need to set it to two, because 64 divided by two is enough.
00:35:58.000 | And the key thing I do here is, you know, so I've got this report GPU function. So what I did at home was I just, you know, changed this until it got less than 16 gig.
00:36:13.000 | And as you can see, I'm just doing like a single epoch on small images, so this ran in,
00:36:18.000 | I don't know 15 seconds or something.
00:36:33.000 | Yeah, batch size 64 by default.
00:36:48.000 | Yeah, so then I just went through checking the memory use of convnext_large for different image sizes, again just keeping on using just one epoch.
00:37:00.000 | And that's how I figured out what I needed to set accum to to get it to work.
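The memory-probing helper being referred to is roughly this shape (a sketch; the exact function in the notebook may differ, and the `train(...)` call in the comment is a hypothetical wrapper):

```python
import gc
import torch

def report_gpu():
    """Print per-process GPU memory use, then release cached memory."""
    print(torch.cuda.list_gpu_processes())
    gc.collect()
    torch.cuda.empty_cache()

# probe one quick epoch at a given architecture / image size / accum setting,
# then check the report stays under the 16GB Kaggle limit:
# train('convnext_large_in22k', size=224, accum=2, epochs=1)   # hypothetical helper
report_gpu()
```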
00:37:16.000 | Alright, so that should be right to save and run.
00:37:23.000 | And then
00:37:32.000 | turn off this one.
00:37:35.000 | So when you're running something, like, it creates a version, and when you click run, you'll then see it down here.
00:37:44.000 | And that runs in the background you don't have to leave this open.
00:37:48.000 | And so you can go back to it later. So if I just copy that, I can close it.
00:37:54.000 | And if I go to my
00:37:57.000 | notebook in Kaggle
00:38:03.000 | this shows me version three, not four, because version four hasn't finished running yet.
00:38:07.000 | So if I click here, I can go to version four, and it says, oh, it's still running,
00:38:17.000 | and I can see here it's been running for about a minute.
00:38:21.000 | And it shows me anything that you print out will appear, including warnings.
00:38:29.000 | So that's, yeah, that's what happens in Kaggle.
00:38:36.000 | So if we also do the multi objective loss function thing.
00:38:43.000 | That would be cool.
00:38:46.000 | So I thought like next time in our next lesson, broadly speaking.
00:38:53.000 | This is taking a long time.
00:38:56.000 | I kind of want to cover like what the inputs to a model look like and what the outputs to a model look like.
00:39:04.000 | So like in terms of inputs, really the key thing is embeddings.
00:39:11.000 | That's the key thing we haven't seen yet in terms of what model inputs look like.
00:39:18.000 | For the model outputs, I think we need to look at softmax,
00:39:32.000 | softmax and cross-entropy loss.
00:39:44.000 | And then, you know, our multi-target loss, which we could do first as kind of a segue.
00:39:55.000 | So maybe in terms of the ordering, the segue would be like doing multi target loss first.
00:40:06.000 | And we could talk about softmax and cross entropy, which would then lead us potentially to like looking at the bear classifier.
00:40:18.000 | What if there's no bears?
00:40:23.000 | So we can just use the binary sigmoid.
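For reference, those two output-side pieces written out directly in PyTorch; the numbers are made up for the worked example:

```python
import torch
import torch.nn.functional as F

# raw model outputs (logits) for one item over three classes, plus the true class index
logits = torch.tensor([[2.0, 0.5, -1.0]])
target = torch.tensor([0])

# softmax turns logits into probabilities that sum to 1
probs = logits.exp() / logits.exp().sum(dim=1, keepdim=True)
print(probs, probs.sum())

# cross-entropy is the negative log probability assigned to the correct class
manual_ce = -probs[0, target[0]].log()
print(manual_ce, F.cross_entropy(logits, target))   # the two values match
```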
00:40:29.000 | So then for embeddings, I guess that's where we'd cover collaborative filtering, because that's like a really nice version of embeddings.
00:40:45.000 | So I guess the question is, for those who have done the course before, are there any other topics?
00:40:53.000 | I guess like time permitting, it would be nice to look at like the conf net, what a conf net is.
00:41:00.000 | Just kind of say that. So then we've got like the outputs, the inputs, and then the middle.
00:41:13.000 | What about more NLP stuff? I know people like what?
00:41:18.000 | Well, I've heard that.
00:41:21.000 | Hugging Face is going to integrate it with fastai; maybe looking at that, how it works?
00:41:26.000 | Well, it's not done yet. So you can't do that yet, but definitely in part two.
00:41:33.000 | I got a question. I don't know if it's helpful, but there's a lot of emphasis on outputs and inputs. But like in the middle, just understanding like the outputs of a hidden layer, whether they're going awry or not, how do you kind of debug that?
00:41:47.000 | How do you understand when to kind of look at that?
00:41:51.000 | Yeah, very helpful.
00:41:55.000 | Last time we did a part two, we did a very deep dive into that. And I think we should do that again in a part two, because like most people won't have to debug that because if you're using an off-the-shelf model, you know, like it's, you know, with off-the-shelf initializations, that shouldn't happen.
00:42:18.000 | So it's probably more of an advanced debugging technique, I would say.
00:42:23.000 | But yeah, if you're interested in looking at it now, definitely check out our previous part two, because we did a very deep dive into that and developed the so-called colorful dimension plot, which is absolutely great for that.
00:42:42.000 | Yeah.
00:42:46.000 | So that would exactly, so collaborative filtering would lead us exactly into that. Thank you.
00:42:52.000 | Yeah, sorry, Sarada.
00:42:55.000 | Would you like to spend some time finally talking about the importance of the ethical side? At least point to the resources Rachel prepared before, because it's so easy to build a model, but how to apply it is getting scarier now.
00:43:15.000 | Yes, I mentioned in lesson one, the data ethics course, but you're right, it would be nice to kind of like touch on something there, wouldn't it?
00:43:27.000 | There was a lecture by Rachel from part one before. That was a great lecture.
00:43:33.000 | Yeah, I mean, okay, that actually would be a great thing just to talk about, you know; that lecture is not at all out of date.
00:43:53.000 | So maybe touch on it in this one.
00:43:56.000 | So I could link to, you know, for varying levels of interest: the two-hour version would be Rachel's talk in the 2020 lectures, and then for deeper interest there would be, yes, the full ethics course.
00:44:11.000 | That's a great point. Thank you.
00:44:22.000 | So then, for, for actually pretty much all of these things,
00:44:32.000 | we have Excel spreadsheets,
00:44:41.000 | which is fun.
00:44:49.000 | So there's, let's have a look collaborative filtering.
00:44:59.000 | Oh, looks like I've already downloaded that.
00:45:06.000 | Thank you, Jeremy; I would encourage you to continue teaching in Excel. Yesterday I was on a panel at a data science conference, and when I mentioned I start with Excel, it actually inspired a lot of people; they want to have a go at data science and learning it.
00:45:24.000 | So please do feedback.
00:45:27.000 | Because there are certainly some people who don't find it useful at all, and they tend to be quite loud about it, so it's certainly nice to hear that feedback.
00:45:44.000 | Does that bother you, Jeremy? Sorry.
00:45:48.000 | So I thought you didn't let those people get to you.
00:45:51.000 | Oh, I only pretend that nobody gets to me.
00:45:58.000 | I was gonna back that up; that was really great to see. I've only seen it done once before.
00:46:05.000 | And that was a physicist in Belgium who explained radiative transfer modeling using Excel, and it was just so nice to see the clarity.
00:46:17.000 | Great. Okay, thank you. I will.
00:46:23.000 | Let's see, so we've got.
00:46:30.000 | So I think these are actually from the 2019 course, fastai1/courses/dl1.
00:46:39.000 | So I'm just going to grab them all.
00:46:44.000 | So one thing I don't think we're going to cover this year
00:46:48.000 | in this part one, but that we will cover in part two, is different optimizers like momentum and Adam and stuff. But I think that's okay, because I feel like nowadays
00:47:00.000 | you just use the default AdamW and it works.
00:47:03.000 | So I don't. I think it's fine.
00:47:06.000 | Not to know too much more than that.
00:47:11.000 | It's, it's a little bit of a technicality nowadays. Yeah.
00:47:19.000 | It used to be something we did in one of the first lessons you know but that was when you kind of had to know it right because you always fiddled around with momentum and blah blah blah.
00:47:31.000 | To me, always the biggest thing when starting on something is how to read in the data; once I figure out how to read in the data, then... So I'm really grateful that there's such an emphasis in this edition of the course on reading in the data, and, you know,
00:47:55.000 | that is something that we should also stay on the lookout for, just understanding the data better.
00:48:05.000 | Great. I don't think we do this one anymore, because we kind of have better versions
00:48:12.000 | in Jupyter with ipywidgets. So we've got this fun convolutions example,
00:48:26.000 | Which I think is still valuable.
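The spreadsheet version boils down to a sliding dot product; the same idea in a tiny PyTorch sketch, with a made-up image and kernel:

```python
import torch
import torch.nn.functional as F

# a 6x6 'image' whose bottom four rows are bright, and a 3x3 horizontal-edge kernel
img = torch.zeros(1, 1, 6, 6)
img[..., 2:, :] = 1.0
kernel = torch.tensor([[[[-1., -1., -1.],
                         [ 0.,  0.,  0.],
                         [ 1.,  1.,  1.]]]])

# convolution slides the kernel over the image, taking a dot product at each position
out = F.conv2d(img, kernel)
print(out.squeeze())   # strong responses along the horizontal edge, zeros elsewhere
```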
00:48:31.000 | Okay, we've got soft max
00:48:36.000 | and cross entropy examples.
00:48:41.000 | And we've got collaborative filtering
00:48:48.000 | interesting wonder what that is.
00:48:52.000 | And then,
00:48:55.000 | also we've got word embeddings.
00:49:05.000 | Embeddings are such a cool and important subject, and it's something that we haven't discussed that much in this course. No, we haven't touched them at all.
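Since embeddings are flagged here as the next big input-side topic, a minimal collaborative filtering sketch using the MovieLens sample that ships with fastai; the hyperparameters are just illustrative:

```python
import pandas as pd
from fastai.tabular.all import *
from fastai.collab import *

# the small MovieLens sample bundled with fastai
path = untar_data(URLs.ML_SAMPLE)
ratings = pd.read_csv(path/'ratings.csv')

# columns are userId, movieId, rating; the defaults pick up the first three columns
dls = CollabDataLoaders.from_df(ratings, bs=64)

# each user and each movie gets its own learned embedding vector (50 dimensions here)
learn = collab_learner(dls, n_factors=50, y_range=(0.5, 5.5))
learn.fit_one_cycle(3, 5e-3)
```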
00:49:19.000 | Great.
00:49:20.000 | All right.
00:49:23.000 | It feels like a lot to cover.
00:49:27.000 | That we will.
00:49:30.000 | We will do our best.
00:49:33.000 | Okay, I think we're up to our hour so thanks everybody. Nice chat today, and I will
00:49:41.000 | get to work on putting this together.
00:49:45.000 | Have a nice weekend.
00:49:47.000 | Thank you so much.
00:49:49.000 | Thanks.
00:49:51.000 | Everyone.
00:49:53.000 | Weights & Biases video today.
00:49:57.000 | I think six o'clock Brisbane time, for anyone interested.
00:50:03.000 | The guy, Thomas, mentioned you're going to have another US session as well that you can join. Yes, I think there's details on the forum.
00:50:11.000 | Yeah.
00:50:12.000 | Thanks.
00:50:13.000 | See ya.