
Lesson 14: Deep Learning Foundations to Stable Diffusion


Chapters

0:00 Introduction
0:30 Review of code and math from Lesson 13
7:40 f-Strings
10:00 Re-running the Notebook - Run All Above
11:0 Starting code refactoring: torch.nn
12:48 Generator Object
13:26 Class MLP: Inheriting from nn.Module
17:03 Checking the more flexible refactored MLP
17:53 Creating our own nn.Module
21:38 Using PyTorch’s nn.Module
23:51 Using PyTorch’s nn.ModuleList
24:59 reduce()
26:49 PyTorch's nn.Sequential
27:35 Optimizer
29:37 PyTorch's optim and get_model()
30:04 Dataset
33:29 DataLoader
35:53 Random sampling, batch size, collation
40:59 What does collate do?
45:17 fastcore’s store_attr()
46:07 Multiprocessing DataLoader
50:36 PyTorch’s Multiprocessing DataLoader
53:55 Validation set
56:11 Hugging Face Datasets, Fashion-MNIST
61:55 collate function
64:41 transforms function
66:47 decorators
69:42 itemgetter
71:55 PyTorch’s default_collate
75:38 Creating a Python library with nbdev
78:53 Plotting images
81:14 kwargs and fastcore’s delegates
88:03 Computer Science concepts with Python: callbacks
93:40 Lambdas and partials
96:26 Callbacks as callable classes
97:58 Multiple callback funcs; *args and **kwargs
103:15 __dunder__ thingies
107:33 Wrap-up

Whisper Transcript

00:00:00.400 | Okay. Hi everybody. And welcome to lesson 14. The numbers are getting up pretty high now, huh?
00:00:06.320 | We had a lesson last time talking about calculus and how we implement the chain rule
00:00:17.120 | in neural network training in an efficient way called backpropagation.
00:00:24.640 | I just wanted to point out that one excellent student, Kaushik Sinha, has produced a very nice
00:00:34.880 | explanation of the code that we looked at last time, and I've linked to it.
00:00:39.760 | So it's got the math and then the code.
00:00:43.520 | The code's slightly different to what I had, but it's basically the same thing, with minor changes.
00:00:52.320 | And it might be helpful to kind of link between the math and the code to see what's going on.
00:00:58.160 | So you'll find that in the lesson 13 resources.
00:01:03.680 | But I thought I'd just quickly try to explain it as well.
00:01:08.240 | So maybe I could try to copy this and just explain what's going on here.
00:01:20.240 | With this code. So the basic idea is that we have a neural network that is calculating,
00:01:31.200 | well, a neural network and a loss function that together calculate the loss.
00:01:35.440 | So let's imagine that, well, let's just call that the loss function, we'll call it L.
00:01:41.200 | And the loss function is being applied to the output of the neural network.
00:01:47.680 | So the neural network function, we'll call n and that takes two things,
00:01:51.520 | a bunch of weights and a bunch of inputs.
00:01:54.960 | The loss function also requires the targets, but I'm just going to ignore that for now
00:02:02.320 | because it's not really part of what we actually care about.
00:02:05.040 | And what we're interested in knowing is if we want to be able to update the weights.
00:02:11.040 | Let's say this is just a single-layer thing, to keep it simple.
00:02:14.240 | If we want to be able to update the weights, we need to know how does the loss change
00:02:23.520 | if we change the weights, if we change one weight at a time, if you like.
00:02:29.760 | So how would we calculate that?
00:02:32.080 | Well, what we could do is we could rewrite our loss function by saying, well, let's call
00:02:41.040 | capital N the result of the neural network applied to the weights and the inputs.
00:02:48.720 | And that way we can now rewrite the loss function to say L equals, big L equals
00:02:55.920 | little L, the loss function applied to the output of the neural network.
00:03:02.240 | And so maybe you can see where this is going.
00:03:06.240 | We can now say, okay, the derivative of the loss with respect to the weights
00:03:11.280 | is going to be equal to the derivative of the loss
00:03:16.240 | with respect to the outputs of that neural network layer
00:03:20.960 | times, this is the chain rule, the derivative of the outputs of that neural network layer.
00:03:32.880 | I'm going to get my notation consistent since these are not scalar with respect to the weights.
00:03:45.360 | Right, so you can see we can get rid of those and we end up with the change in loss with respect
00:03:53.840 | to the weights. And so we can just say this is a chain rule. That's what the chain rule is.
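To write out the chain rule being described here explicitly (my notation, reconstructing the spoken math): with $N = n(w, x)$ and $L = \ell(N)$,

$$\frac{\partial L}{\partial w} = \frac{\partial \ell}{\partial N}\,\frac{\partial N}{\partial w}, \qquad \frac{\partial L}{\partial x} = \frac{\partial \ell}{\partial N}\,\frac{\partial N}{\partial x}.$$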
00:04:02.000 | So the change in the loss with respect to the output of the neural network.
00:04:08.240 | Well, we did the forward pass here and then we took here, this here is where we calculated
00:04:17.280 | the derivative of the loss with respect to the output of the neural network,
00:04:26.640 | which came out from here and ended up in diff. So there it is there. So out.g contains
00:04:36.640 | this derivative. So then to calculate, let's actually do one more. We could also say
00:04:46.880 | the change in the loss with respect to the inputs, we can do the same thing.
00:04:55.040 | With the chain rule times.
00:05:00.880 | And so this time we have the inputs. So here you can see that is this line of code.
00:05:19.120 | So that is the change in the loss with respect to the inputs. That's what input.g means. And it's
00:05:31.200 | equal to the change in the loss with respect to the output. So that's what out.g means.
00:05:43.680 | Times, it's actually matrix times because we're doing matrix calculus, times this derivative
00:05:50.800 | and since this is a linear layer we were looking at this derivative is simply the weights themselves.
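For reference, the lesson 13 backward-pass code being described maps onto this math roughly as follows (a sketch from memory; the names inp, out, w and b follow that notebook's conventions and may differ slightly from the actual code):

```python
def lin_grad(inp, out, w, b):
    # out.g already holds dL/d(out), the derivative of the loss w.r.t. this layer's output
    inp.g = out.g @ w.t()       # dL/d(inp): out.g times the weights (the linear layer's derivative)
    w.g   = inp.t() @ out.g     # dL/dw: the matrix-product-with-a-transpose trick from last lesson
    b.g   = out.g.sum(0)        # dL/db
```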
00:05:55.120 | And then we have exactly the same thing
00:05:58.240 | for w.g, which is the change in the loss, the derivative of the loss with respect to the weights.
00:06:10.640 | And so again you've got the same thing. You've got your out.g and remember we actually showed
00:06:14.320 | how we can simplify this into also a matrix product with a transpose as well. So that's how
00:06:20.000 | what's happening in our code is mapping to the math. So hopefully that's useful but as I say do
00:06:30.800 | check out this really nice resource which has a lot more detail if you're interested in digging
00:06:37.280 | deeper. The other thing I'd say is some people have mentioned that they actually
00:06:44.160 | didn't study this at high school which is fine. We've provided resources on the forum
00:06:52.320 | for recommending how to learn the basics of derivatives and the chain rule. And so in
00:07:01.280 | particular I would recommend 3Blue1Brown's Essence of Calculus series and also Khan Academy.
00:07:07.200 | It's not particularly difficult to learn it'll only take you a few hours and then you can
00:07:12.560 | this will make a lot more sense. Or if you did it at high school but you've forgotten it same deal.
00:07:18.720 | So don't worry if you found this difficult because you had forgotten, or had never learned,
00:07:27.440 | the basic derivative and chain rule stuff. That's something that you can pick up now
00:07:33.440 | and I would recommend doing so. Okay so what we then did last time which is actually pretty
00:07:44.080 | exciting is we got to a point where we had successfully created a training loop which
00:07:50.880 | did these four steps. So and the nice thing is that every single thing here is something
00:07:59.040 | that we have implemented from scratch. Now we didn't always use our implemented from scratch
00:08:03.920 | versions. There's no particular reason to. When we've re-implemented something that already exists
00:08:08.240 | let's use the version that exists. But every single thing here I guess not argmax but that's
00:08:14.240 | trivially easy to implement. Every single thing here we have implemented ourselves. And we
00:08:23.200 | successfully trained an MNIST model to 96 percent accuracy at recognizing handwritten
00:08:32.080 | digits. So I think that's super neat. Mind you, this is not a great metric:
00:08:42.240 | it's only looking at the training set, and in particular it's only looking at one batch of
00:08:45.600 | the training set. Since last time I've just refactored a little bit I've pulled out this
00:08:50.080 | report function which is now just running at the end of each epoch. And it's just printing out
00:08:59.200 | the loss and the accuracy. Just something I wanted to mention here is hopefully you've seen
00:09:05.040 | f-strings before. They're a really helpful part of Python that lets you pop a variable or an
00:09:12.400 | expression inside curly braces in a string and it'll evaluate it. You might not have seen this
00:09:18.720 | colon thing. This is called a format specifier. And with a format specifier you can change how
00:09:26.560 | things are printed in an f-string. So this is how I'm printing it to two decimal places.
00:09:31.280 | This says: a two-decimal-place floating point number called loss, printed out here, followed
00:09:38.000 | by a comma. So I'm not going to show you how to use those other than to say yeah
00:09:44.320 | Python f-strings and format specifiers are really helpful. And so if you haven't used them before
00:09:50.720 | do go look up a tutorial or the documentation, because they're definitely something that you'll
00:09:56.640 | probably find useful to know about. Okay so let's just rerun all those lines of code.
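For reference, that format-specifier syntax in isolation looks something like this (a made-up illustration, not the notebook's actual line):

```python
loss, acc = 0.1234, 0.9678
print(f'loss: {loss:.2f}, accuracy: {acc:.2f}')   # -> loss: 0.12, accuracy: 0.97
```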
00:10:02.480 | If you're wondering how I just reran all the cells above where I was there's a
00:10:10.000 | cell here there's run all above. And it's so helpful that I always make sure there's a
00:10:19.920 | keyboard shortcut for that. So you can see here I've added a keyboard shortcut QA. So if I type
00:10:27.520 | QA it runs all cells above. If I type QB it runs all cells below. And so yeah stuff that you do a
00:10:36.000 | lot make sure you've got keyboard shortcuts for them. You don't want to be fiddling around moving
00:10:39.600 | around your mouse everywhere. You want it to be as easy as thinking. So this is really exciting.
00:10:46.080 | We've successfully built and trained a neural network model from scratch and it works okay.
00:10:51.280 | It's a bit clunky there's a lot of code there's features we're missing. So let's start refactoring
00:10:57.520 | it. And so refactoring is all about making it so we have to write less code to do the same work.
00:11:11.360 | And so we're now going to I'm going to show you something that's part of PyTorch and then I'm
00:11:17.440 | going to show you how to build it. And then you'll see why this is really useful. So PyTorch has a
00:11:23.680 | submodule called nn, torch.nn, and in there there's something called the Module class. Now we
00:11:29.760 | don't normally use it this way but I just want to show you how it works. We can create an instance
00:11:33.520 | of it in the usual way where we create instances of classes. And then we can assign things to
00:11:39.600 | attributes of that module. So for example let's assign a linear layer to it. And if we now print out
00:11:48.080 | that you'll see it says oh this is a module containing something called foo which is a
00:11:55.520 | linear layer. But here's something quite tricky. This module we can say show me all of the named
00:12:04.240 | children of that module. And it says oh there's one called foo and it's a linear layer.
00:12:09.760 | And we can say oh show me all of the parameters of this module. And it says oh okay sure there's two
00:12:20.560 | of them. There's this four by three tensor that's the weights. And there's this four long vector
00:12:30.160 | that's the biases. And so somehow just by creating this module and assigning this to it it's
00:12:38.720 | automatically tracked what's in this module and what are its parameters. That's pretty neat. So
00:12:45.200 | we're going to see both how and why it does that. I'm just going to point out by the way why did I
00:12:50.640 | add list here. If I just said m1.named_children() it just prints out generator object which is not very
00:12:58.880 | helpful. And that's because this is a kind of iterator called a generator. And it's something
00:13:07.920 | which is going to only produce the contents of this when I actually do something with it
00:13:14.000 | such as list them out. So just popping a list around a generator is one way to like run the
00:13:18.880 | generator and get its output. So that's a little trick when you want to look inside a generator.
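Putting those few cells together, roughly (a sketch of what's on screen; the layer sizes here are just example values):

```python
import torch
from torch import nn

m1 = nn.Module()
m1.foo = nn.Linear(3, 4)                    # assigning a layer as an attribute registers it

print(m1)                                   # a Module containing (foo): Linear(...)
print(list(m1.named_children()))            # [('foo', Linear(in_features=3, out_features=4, ...))]
print([p.shape for p in m1.parameters()])   # [torch.Size([4, 3]), torch.Size([4])]
```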
00:13:26.800 | Okay so now as I said we don't normally use it this way. What we normally do
00:13:33.600 | is we create our own class. So for example we create our own multi-layer perceptron
00:13:37.600 | and we inherit it. We inherit from nn.Module. And so then dunder init, this is the thing that
00:13:43.760 | constructs an object of the class. This is the special magic method that does that. We'll say
00:13:50.640 | okay well how many inputs are there to this multi-layer perceptron. How many hidden activations.
00:13:56.240 | And how many output activations are there. So just be one hidden layer. And then here we can do just
00:14:01.680 | like we did up here where we assigned things as attributes. We can do that in this constructor.
00:14:08.880 | So we'll create an l1 attribute which is a linear layer from number in to number hidden.
00:14:13.760 | L2 is a linear layer from number hidden to number out. And we'll also create a ReLU. And so when we
00:14:25.840 | call that module we can take the input that we get and run the linear layer and then run the
00:14:38.400 | ReLU and then run the l2. And so I can create one of these as you see. And I can have a look and see
00:14:50.800 | like oh here's the attribute l1. And there it is like I had. And I can say print out the model
00:14:58.560 | and the model knows all the stuff that's in it. And I can go through each of the named children
00:15:05.840 | and print out the name and the layer. Now of course if you remember although you can use
00:15:14.720 | dunder call, we actually showed how we can refactor things
00:15:21.440 | using forward such that it would automatically kind of do the things necessary to make
00:15:29.680 | all the you know automatic gradient stuff work correctly. And so in practice
00:15:39.680 | we're actually not going to do dunder call, we would do forward. So this is an example of creating
00:15:47.920 | a custom PyTorch module. And the key thing to recognize is that it knows what are all the
00:15:54.720 | attributes you added to it. And it also knows what are all the parameters. So if I go through
00:16:01.680 | the parameters and print out their shapes you can see I've got my linear layer's weights.
00:16:06.080 | First linear layer sorry second linear layer. My first linear layer's weights. My first linear
00:16:12.080 | layer's biases. Second linear layer's weights. Second linear layer's biases. And this 50 is
00:16:17.280 | because we set nh the number of hidden to 50. So why is that interesting? Well because now
00:16:30.080 | I don't have to write all this anymore going through layers and having to make sure that
00:16:38.800 | they've all been put into a list. We've just been able to add them as attributes and they're
00:16:44.720 | automatically going to appear as parameters. So we can just say go through each parameter
00:16:49.360 | and update it based on the gradient and the learning rate. And furthermore you can actually
00:16:57.280 | just call model.zero_grad() and it'll zero out all of the gradients. So that's really made our code
00:17:05.520 | quite a lot nicer and quite a lot more flexible which is cool.
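Roughly, the refactored pieces look like this (a sketch; the notebook's exact sizes and the learning rate lr are assumed to be defined earlier):

```python
class MLP(nn.Module):
    def __init__(self, n_in, nh, n_out):
        super().__init__()
        self.l1 = nn.Linear(n_in, nh)
        self.l2 = nn.Linear(nh, n_out)
        self.relu = nn.ReLU()
    def forward(self, x):
        return self.l2(self.relu(self.l1(x)))

model = MLP(28*28, 50, 10)

# after loss.backward(), the update step can just iterate over model.parameters()
with torch.no_grad():
    for p in model.parameters():
        p -= p.grad * lr        # lr assumed defined earlier in the notebook
    model.zero_grad()
```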
00:17:09.840 | So let's check that this still works.
00:17:14.240 | There we go. So just to clarify with if I called report on this before I ran it
00:17:25.920 | as you would expect the accuracy is about 8% or about 10% bit less and the loss is pretty high.
00:17:32.400 | And so after I run this fit this model the accuracy goes up and the loss goes down.
00:17:41.360 | So basically it's all of this exactly the same as before. The only thing I've changed are these
00:17:47.680 | two lines of code. So that's a really useful refactoring. So how on earth did this happen?
00:17:52.960 | How did it know what the parameters and layers are automatically?
00:17:58.320 | It used a trick called dunder setattr, and we're going to create our own nn.Module now.
00:18:07.680 | So if there was no such thing as nn.module here's how we'd build it.
00:18:16.000 | And so let's actually build it and also add some things to it. So in dunder init
00:18:20.800 | we would have to create a dictionary for our named children. This is going to contain
00:18:25.840 | a dictionary of all of the layers. Okay and then just like before we'll create a couple of
00:18:30.880 | linear layers right and then what we're going to do is we're going to define this special
00:18:36.720 | magic thing that python has called dunder setattr and this is called automatically by python if you
00:18:42.160 | have it every time you set an attribute such as here or here and it's going to be passed the name
00:18:49.360 | of the attribute the key and the value is the actual thing on the right hand side of the equal
00:18:54.400 | sign. Now generally speaking things that start with an underscore we use for private
00:19:02.160 | stuff so we check that it doesn't start with an underscore and if it doesn't start with an
00:19:08.480 | underscore, setattr will put this value into the module's dictionary with this key and then
00:19:21.440 | call the normal Python setattr to make sure it actually does the attribute setting.
00:19:30.640 | So super is how you call whatever is in the super class, the base class. So another useful
00:19:38.960 | thing to know about is how does it do this nifty thing where you can just type the
00:19:43.840 | name and it kind of lists out all this information about it. That's a special thing called dunder
00:19:48.960 | repr. So here dunder repr will just have it return a stringified version of the module's dictionary
00:19:57.680 | and then here we've got parameters. How did parameters work? So how did this thing work?
00:20:02.880 | Well we can go through each of those modules go through each value so the values of the modules
00:20:11.440 | is all the actual layers and then go through each of the parameters in each module and
00:20:17.840 | yield p, so that's going to create an iterator, if you remember when we
00:20:22.880 | looked at iterators for all the parameters. So let's try it so we can create one of these modules
00:20:27.840 | and if we just like before loop through its parameters there they are.
00:20:32.480 | Now I'll just mention something that's optional kind of like advanced python that a lot of people
00:20:40.400 | don't know about which is there's no need to loop through a list or a generator or I guess say loop
00:20:47.680 | through an iterator and yield there's actually a shortcut which is you can just say
00:20:53.280 | yield from and then give it the iterator and so with that we can get this all down
00:21:04.720 | to one line of code and it will do exactly the same thing. So that's basically saying
00:21:10.880 | yield one at a time everything in here that's what yield from does.
00:21:16.640 | So there's a cool little advanced python thing totally optional but if you're interested I think
00:21:22.000 | it can be kind of neat. So we've now learned how to create our own implementation of nn.module
00:21:29.120 | and therefore we are now allowed to use PyTorch's nn.module so that's good news.
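Pulling that together, the from-scratch version looks something like this (a sketch following the description above; the real notebook may differ in details):

```python
class Module():
    def __init__(self):
        self._modules = {}                    # our own registry of child modules

    def __setattr__(self, k, v):
        if not k.startswith('_'):
            self._modules[k] = v              # record non-private attributes
        super().__setattr__(k, v)             # and still actually set the attribute

    def __repr__(self):
        return f'{self._modules}'

    def parameters(self):
        for m in self._modules.values():
            yield from m.parameters()         # the yield-from shortcut just discussed


class MLP(Module):
    def __init__(self, n_in, nh, n_out):
        super().__init__()
        self.l1, self.l2, self.relu = nn.Linear(n_in, nh), nn.Linear(nh, n_out), nn.ReLU()
    def __call__(self, x):
        return self.l2(self.relu(self.l1(x)))
```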
00:21:37.840 | So how would we do using the PyTorch nn.module how would we create the model
00:21:48.400 | that we started with which is where we had this self.layers because we want to somehow
00:21:56.960 | register all of these all at once that's not going to happen based on the code we just wrote.
00:22:04.800 | So to do that let's have a look: let's make a list of the layers we want
00:22:15.280 | and so we'll create again a subclass of nn.Module, make sure you call the superclass's init first,
00:22:23.760 | and we'll just store the list of layers and then to tell PyTorch about all those layers
00:22:34.000 | we basically have to loop through them and call add-module and say what the name of the module is
00:22:40.480 | and what the module is and again you probably should have used forward here in the first place
00:22:49.520 | and you can see this is now done exactly the same thing okay so if you've used a sequential model
00:22:57.840 | before you'll see or you can see that we're on the path to creating a sequential model.
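Something like this, using nn.Module's add_module (a sketch of the idea just described; the layer sizes are example values):

```python
class Model(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.layers = layers
        # a plain list isn't registered automatically, so register each layer by name
        for i, l in enumerate(layers):
            self.add_module(f'layer_{i}', l)

    def forward(self, x):
        for l in self.layers:
            x = l(x)
        return x

layers = [nn.Linear(28*28, 50), nn.ReLU(), nn.Linear(50, 10)]
model = Model(layers)
```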
00:23:03.040 | Okay so ganache has asked an interesting question which is what on earth is super
00:23:09.120 | calling because we actually in fact we don't even need the parentheses here we actually don't have
00:23:15.360 | a base class that's because if you don't put any parentheses or if you put empty parentheses
00:23:22.320 | it's actually a shortcut for writing that and so python has stuff in object which does you know
00:23:31.440 | all the normal objecty things like storing your attributes so that you can get them back later
00:23:37.200 | so that's what's happening there.
00:23:39.680 | Okay so this is a little bit awkward is to have to store the list and then enumerate and call
00:23:50.160 | add-module so now that we've implemented that from scratch we can use PyTorch's version which is
00:23:55.840 | they've just got something called module list that just does that for you okay so if you use
00:24:00.640 | module list and pass it a list of layers it will just go ahead and register them all those modules
00:24:06.320 | for you so here's something called sequential model so this is just like an n not sequential now
00:24:10.480 | so if i create it passing in the layers there you go you can see there's my model containing my
00:24:17.360 | module list with my layers and so i don't know why i never used forward for these things it's silly
00:24:27.600 | um i guess it doesn't matter terribly in this stage but anyhow okay so
00:24:31.760 | call fit and there we go okay so um so in forward here i just go through each layer
00:24:45.040 | and i set the result of that equal to calling that layer on the previous result and then pass
00:24:51.120 | and return it at the end now there's a little um another way of doing this which i think is
00:24:55.680 | kind of fun it's not like shorter or anything at this stage i just wanted to show an example of
00:25:01.040 | something that you see quite a lot in machine learning code which is the use of reduce
00:25:05.280 | this implementation here is exactly the same as this thing here
00:25:12.160 | so let me explain how it works what reduce does so reduce is a very common kind of like fundamental
00:25:22.800 | computer science concept reductions this is something that does a reduction
00:25:26.880 | and what a reduction is is it something that says
00:25:28.960 | start with the third parameter some initial value so we're going to start with x the thing we're
00:25:37.600 | being passed, and then loop through a sequence, so loop through each of our layers, and then for each
00:25:44.800 | layer call some function here is our function and the function is going to get passed first
00:25:54.560 | time around it'll be passed the initial value and the first thing in your list so your first layer
00:26:00.640 | and x, so it's just going to call the layer function on x. The second time around it takes
00:26:08.240 | the output of that and passes it in as the first parameter and passes in the second
00:26:13.680 | layer so then the second time this goes through it's going to be calling the second layer on the
00:26:18.960 | result of the first layer and so forth and that's what a reduction is and so when you might see
00:26:25.440 | reduce you'll certainly see it talked about quite a lot in in papers and books and you might sometimes
00:26:32.000 | also see it in code it's a very general concept and so here's how you can implement a sequential
00:26:40.560 | model using reduce so there's no explicit loop there although the loop's still happening internally
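The reduce-based version looks something like this (a sketch; functionally the same as the explicit loop above):

```python
from functools import reduce

class SequentialModel(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)   # registers every layer for us

    def forward(self, x):
        # reduce threads x through each layer in turn: layer2(layer1(x)), and so on
        return reduce(lambda val, layer: layer(val), self.layers, x)
```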
00:26:48.320 | all right so now that we've reimplemented sequential we can just go ahead and use
00:26:53.040 | pytorch's version so there's nn.sequential we can pass in our layers
00:26:58.160 | and we can fit not surprisingly we can see the model so yeah looks very similar to the one we
00:27:04.960 | built ourselves all right so this thing of looping three parameters and updating our parameters
00:27:20.160 | based on gradients and a learning rate and then zeroing them is very common so common that there
00:27:31.760 | is something that does that all for us and that's called an optimizer it's the stuff in optim so
00:27:38.320 | let's create our own optimizer and as you can see it's just going to do the two things we just saw
00:27:43.840 | it's going to go through each of the parameters and update them using the gradient and the
00:27:50.800 | learning rate and there's also zero grad which will go through each parameter and set their
00:27:58.720 | gradients to zero if you use dot data it's like it's just a way of avoiding having to say torch
00:28:05.520 | dot no grad basically okay so in optimizer we're going to pass it the parameters that we want to
00:28:10.640 | optimize and we're going to pass it the learning rate and we're just going to store them away
00:28:15.520 | and since the parameters might be a generator we'll call list to turn them into a list
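That optimizer is roughly the following (a sketch; the .data trick is the no_grad workaround just mentioned, and the default learning rate here is an arbitrary placeholder):

```python
class Optimizer():
    def __init__(self, params, lr=0.5):
        self.params, self.lr = list(params), lr   # params may be a generator

    def step(self):
        for p in self.params:
            p.data -= p.grad.data * self.lr       # update in place, outside autograd

    def zero_grad(self):
        for p in self.params:
            p.grad.data.zero_()
```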
00:28:24.400 | so we are going to create our optimizer pass it in the model dot parameters which have been
00:28:29.840 | automatically constructed for us by nn dot module and so here's our new loop now we don't have to
00:28:35.120 | do any of the stuff manually we can just say opt dot step so that's going to call this
00:28:42.400 | and opt dot zero grad and that's going to call this there it is so we've now built our own sgd
00:28:53.840 | optimizer from scratch so i think this is really interesting right like these things which seem
00:28:59.760 | like they must be big and complicated once we have this nice structure in place you know an sgd
00:29:06.320 | optimizer doesn't take much code at all and so it's all very transparent simple clear if you're
00:29:13.680 | having trouble using complex library code that you've found elsewhere you know this can be a
00:29:20.720 | really good approach is to actually just go all the way back remove as you know as many
00:29:25.920 | of these abstractions as you can and like run everything by hand to see exactly what's going
00:29:32.480 | on it can be really freeing to see that you can do all this anyways since pytorch has this for us
00:29:40.480 | in torch.optim it's got an optim.SGD and just like our version you pass in the parameters
00:29:47.520 | and you pass in the learning rate so you really see it is just the same so let's define something
00:29:52.640 | called get model that's going to return the model the sequential model and the optimizer for it
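Something along these lines (a sketch; m, nh and lr are assumed to be defined earlier in the notebook):

```python
from torch import nn, optim

def get_model():
    model = nn.Sequential(nn.Linear(m, nh), nn.ReLU(), nn.Linear(nh, 10))
    return model, optim.SGD(model.parameters(), lr=lr)
```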
00:30:02.000 | so if we go model comma opt equals get model and then we can call the loss function to see where
00:30:08.400 | it's starting and so then we can write our training loop again go through each epoch
00:30:18.720 | go through each starting point for our for our batches grab the slice slice into our x and y in
00:30:28.880 | the training set calculate our predictions calculate our loss do the backward pass do the
00:30:34.480 | optimizer step do the zero gradient and print out how you're going at the end of each one and there
00:30:40.080 | we go all right so let's keep making this simpler there's still too much code so one thing we could
00:30:49.440 | do is we could replace these lines of code with one line of code by using something we'll call
00:30:56.880 | the Dataset class. So the Dataset class is just something that we're going to pass in our
00:31:02.480 | independent and dependent variable we'll store them away as self dot x and self dot y
00:31:08.960 | we'll have something so if you define dunder len then that's the thing that allows the
00:31:16.800 | len function to work so the length of the data set will just be the length of the independent
00:31:21.040 | variables and then dunder getitem is the thing that will be called automatically anytime you use
00:31:27.680 | square brackets in python so that just is going to call this function passing in
00:31:32.800 | the indices that you want so when we grab some items from our data set we're going to return a
00:31:39.040 | tuple of the x values and the y values so then we'll be able to do this so let's create a data
00:31:47.760 | set using this tiny little three line class it's going to be a data set containing the x and y
00:31:54.640 | training and they'll create another data set containing the x and y valid and those two
00:31:59.920 | data sets will call train ds and valid ds so let's check the length of those data sets
00:32:07.040 | should be the same as the length of the x's and they are and so now we can do exactly what we
00:32:15.200 | hope we could do we can say xb comma yb equals train ds and pass in some slice
00:32:21.280 | so that's going to give us back our... let's check the shapes are correct: it should be
00:32:32.400 | five by 28 times 28, and the y's should just be five, and so here they are, the x's and the y's
00:32:41.840 | so that's nice we've created a data set from scratch and again it's not complicated at all
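Concretely, that tiny Dataset class is roughly (a sketch; x_train, y_train, x_valid and y_valid are assumed from earlier in the notebook):

```python
class Dataset():
    def __init__(self, x, y): self.x, self.y = x, y
    def __len__(self): return len(self.x)
    def __getitem__(self, i): return self.x[i], self.y[i]

train_ds, valid_ds = Dataset(x_train, y_train), Dataset(x_valid, y_valid)
```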
00:32:48.160 | and if you look at the actual pytorch source code this is basically all data sets do so let's try
00:32:54.240 | it we call get model and so now we've replaced our data set line with this one and as per usual
00:33:02.080 | it still runs and so this is what i do when i'm writing code is i try to like always make sure
00:33:09.920 | that my starting code works as i refactor and so you can see all the steps and so somebody reading
00:33:15.360 | my code can then see exactly like why am i building everything i'm building how does it all fit in see
00:33:19.920 | that it still works and i can also keep it clear in my own head so i think this is a really nice way
00:33:24.800 | of implementing libraries as well all right so now we're going to replace these two lines of code
00:33:35.440 | with this one line of code so we're going to create something called a data loader and a data
00:33:40.000 | loader is something that's just going to do this okay so we need to create an iterator
00:33:45.520 | so an iterator is a class that has a dunder iter method. When you use a for loop in Python, behind the
00:33:55.760 | scenes it's actually calling dunder iter to get a special object which it can then
00:34:04.400 | loop through using yield so it's basically getting this thing that you can iterate through using
00:34:09.280 | yield so a data loader is something that's going to have a data set and a batch size because we're
00:34:16.400 | going to go through the batches and grab one batch at a time so we have to store away the
00:34:23.360 | data set and the batch size and so when you when we call the for loop it's going to call dunder
00:34:28.240 | iter we're going to want to do exactly what we saw before go through the range just like we did
00:34:34.000 | before and then yield that bit of the data set and that's all so that's a data loader
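In code, that first slice-based data loader is roughly (a sketch of what's just been described):

```python
class DataLoader():
    def __init__(self, ds, bs): self.ds, self.bs = ds, bs
    def __iter__(self):
        # yield one contiguous slice of the dataset per batch
        for i in range(0, len(self.ds), self.bs):
            yield self.ds[i:i+self.bs]
```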
00:34:43.600 | so we can now create a train data loader and a valid data loader from our train data set and
00:34:48.480 | valid data set and so now we can if you remember the way you can get one thing out of an iterator
00:34:57.440 | so you don't need to use a for loop you can just say iter and that will also call dunder iter
00:35:03.120 | next we'll just grab one value from it so here we will run this and you can see we've now just
00:35:08.640 | confirmed we've xb is a 50 by 784 and yb there it is and then we can check what it looks like so
00:35:20.240 | let's grab the first element of our x batch make it 28 by 28 and there it is so now that we've got
00:35:30.720 | a data loader again we can grab our model and we can simplify our fit function to just go for xb yb
00:35:37.920 | and train_dl. So this is getting nice and small, don't you think, and it still works the same way
00:35:44.480 | okay so this is really cool and now that it's nice and concise we can start adding features to it
00:35:52.560 | so one feature i think we should add is that our training set each time we go through it
00:35:59.600 | it should be in a different order it should be randomized the order so instead of always
00:36:08.800 | just going through these indexes in order we want some way to say go use random indexes
00:36:15.680 | so the way we can do that is creates a class called sampler and what sampler is going to do
00:36:22.400 | i'll show you is if we create a sampler without shuffle without randomizing it
00:36:32.960 | it's going to simply return all the numbers from zero up to n in order and it'll be an
00:36:40.880 | iterator, see this is dunder iter, but if i do want it shuffled then it will randomly shuffle them
00:36:47.600 | so here you can see i've created a sampler without shuffle so if i then make an iterator from that
00:36:54.320 | and print a few things from the iterator you can see it's just printing out the indexes it's going
00:37:00.800 | to want or i can do exactly the same thing as we learned earlier in the course using i slice
00:37:07.280 | we can grab the first five so here's the first five things from a sampler when it's not shuffled
00:37:12.000 | so as you can see these are just indexes so we could add shuffle equals true and now that's
00:37:19.920 | going to call random dot shuffle which just randomly permutes them and now if i do the same thing
00:37:26.240 | i've got random indexes of my source data
00:37:29.600 | so why is that useful well what we could now do is create something called a batch sampler and
00:37:38.880 | what the batch sampler is going to do is it's going to basically do this i slice thing for us
00:37:44.320 | so we're going to say okay pass in a sampler so that's something that generates indices
00:37:48.480 | and pass in a batch size and remember we've looked at chunking before it's going to chunk
00:37:54.960 | that iterator by that batch size
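Sketched out, the two classes just described might look like this (the chunking is written out by hand here; the notebook itself may use a helper such as fastcore's chunked):

```python
import random
from itertools import islice

class Sampler():
    def __init__(self, ds, shuffle=False): self.n, self.shuffle = len(ds), shuffle
    def __iter__(self):
        res = list(range(self.n))
        if self.shuffle: random.shuffle(res)   # randomly permute the indices
        return iter(res)

class BatchSampler():
    def __init__(self, sampler, bs, drop_last=False):
        self.sampler, self.bs, self.drop_last = sampler, bs, drop_last
    def __iter__(self):
        it = iter(self.sampler)
        # chunk the sampler's indices into groups of bs
        while batch := list(islice(it, self.bs)):
            if self.drop_last and len(batch) < self.bs: return
            yield batch
```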
00:37:57.760 | and so if i now say all right please take our sampler and create batches of four
00:38:07.920 | as you can see here it's creating batches of four indices at a time so rather than
00:38:18.400 | just looping through them in order i can now loop through this batch sampler
00:38:25.600 | so we're going to change our data loader so that now it's going to
00:38:34.560 | take some batch sampler and it's going to loop through the batch sampler that's going to give
00:38:46.240 | us indices and then we're going to get that data set item from that batch for everything in that
00:38:53.120 | batch so that's going to give us a list and then we have to stack all of the x's and all of the
00:39:02.720 | y's together into tensors so i've created something here called collate function and we're going to
00:39:12.000 | default that to this little function here which is going to grab our batch pull out the x's and y
00:39:22.640 | separately and then stack them up into tensors so this is called our collate function okay so if
00:39:31.840 | we put all that together we can create a training sampler which is a batch sampler over the training
00:39:38.000 | set with shuffle true a validation sampler will be a batch sampler over the validation set with
00:39:46.240 | shuffle false and so then we can pass that into this data loader class the training data set
00:39:56.320 | and the training sampler and the collate function which we don't really need because
00:40:01.840 | it's we're just using the default one so i guess we can just get rid of that
00:40:04.960 | and so now there we go we can do exactly the same thing as before x b y b is next iter
00:40:15.440 | and this time we use the valid data loader check the shapes and this is how
00:40:24.720 | PyTorch's actual data loaders work this is the this is all the pieces they have they have samplers
00:40:33.520 | they have batch samplers they have a collation function and they have data loaders
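Putting the pieces together, roughly (a sketch using the Sampler and BatchSampler sketched above; the parameter name batchs and the batch size bs are assumptions, not necessarily the notebook's names):

```python
import torch

def collate(b):
    xs, ys = zip(*b)                          # transpose a list of (x, y) pairs
    return torch.stack(xs), torch.stack(ys)

class DataLoader():
    def __init__(self, ds, batchs, collate_fn=collate):
        self.ds, self.batchs, self.collate_fn = ds, batchs, collate_fn
    def __iter__(self):
        for b in self.batchs:                 # b is one batch of indices
            yield self.collate_fn([self.ds[i] for i in b])

train_samp = BatchSampler(Sampler(train_ds, shuffle=True),  bs)
valid_samp = BatchSampler(Sampler(valid_ds, shuffle=False), bs)
train_dl, valid_dl = DataLoader(train_ds, train_samp), DataLoader(valid_ds, valid_samp)
```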
00:40:39.920 | so remember that
00:40:43.920 | what i want you to be doing for your homework is experimenting with these carefully to see exactly
00:40:57.280 | what each thing's taking in. Okay, so someone is asking in the chat what is this collate thing
00:41:03.840 | doing okay so collate function it defaults to collate what does it do well let's see let's go
00:41:13.840 | through each of these steps okay so we need so when we've got a batch sampler so let's do
00:41:20.320 | just the valid sampler okay so the batch sampler here it is so we're going to go through each
00:41:31.120 | thing in the batch sampler so let's just grab one thing from the batch sampler okay so the
00:41:36.640 | output of the batch sampler will be next it uh okay so here's what the batch sampler contains
00:41:47.040 | all right just the first 50 digits not surprisingly because this is our validation sampler
00:41:51.680 | if we did a training sampler that would be randomized there they are okay so then what
00:41:57.680 | we then do is we go self.dataset[i] for i in b, so let's copy that, copy,
00:42:07.200 | paste, and so rather than self.dataset[i] we'll just say
00:42:15.760 | valid_ds[i]
00:42:18.880 | oh, and it's not i in b, it's i in o, that's what we called it
00:42:28.560 | um oh we did it for training sorry training okay so what it's created here is
00:42:43.520 | a list of tuples of tensors i think let's have a look so let's have a look so we'll call this
00:42:50.240 | um, p, whatever. So p zero, okay, is a tuple, it's got the x and the y, the independent and dependent variables,
00:43:09.680 | so that's not what we want what we want is something that we can loop through we want
00:43:16.000 | to get batches so what the collation model is going to do sorry not the collation model the
00:43:23.440 | collate function is going to do is it's going to take all of our x's and all of our y's and
00:43:32.240 | collect them into two tensors one tensor of x's and one tensor of y's so the way it does that is
00:43:37.760 | it first of all calls zip
00:43:40.640 | so zip is a very very commonly used python function it's got nothing to do with the compression
00:43:53.120 | program zip but instead what it does is it effectively allows us to like transpose things
00:43:58.000 | so that now as you can see we've got all of the second elements or index one elements
00:44:07.200 | all together and all of the index zero elements together
00:44:10.240 | and so then we can stack those all up together
00:44:13.600 | and that gives us our y's for our batch so that's what collate does so the collate function is used
00:44:25.600 | an awful lot in pytorch, and increasingly nowadays hugging face stuff uses it a lot
00:44:38.320 | and so we'll be using it a lot as well um and basically it's a thing that allows us to customize
00:44:43.840 | how the data that we get back from our data set once it's been kind of generating a list of of
00:44:52.000 | things from the data set how do we put it together into some into a bunch of things
00:44:57.760 | that our model can take as inputs because that's really what we want here so that's
00:45:02.160 | what the collation function does oh this is the wrong way around
00:45:14.720 | like so um this is um something that i do so often that fast core has a quick little shortcut
00:45:22.400 | for it just called store_attr, store attributes, and so if you just put that in your dunder init
00:45:28.480 | then you just need one line of code and it does exactly the same thing so there's a little
00:45:33.680 | shortcut as you see and so you'll see that quite a bit all right let's have a um seven minute break
00:45:42.800 | and uh see you back here very soon and we're going to look at a multi-processing data loader
00:45:49.840 | and then we all have nearly finished this notebook all right see you soon
00:45:54.000 | all right let's keep going um
00:46:06.880 | so we've seen how to create a data loader um and uh sampling from it um
00:46:14.480 | the pytorch data loader works exactly like this but um it uses a lot more code because
00:46:26.480 | it implements um multi-processing and so multi-processing means that the actual this thing here
00:46:35.440 | that code can be run uh in multiple processes they can be run in parallel for multiple items
00:46:42.800 | so this code for example might be opening up a jpeg rotating it flipping it etc right so because
00:46:52.240 | remember this is just calling the dunder get item uh for a data set so that could be doing a lot of
00:46:57.840 | work for each item and we're doing it for every item in the batch so we'd love to do those all in
00:47:01.840 | parallel so i'll show you a very quick and dirty way that basically does the job um so um python
00:47:12.640 | has a multiprocessing library, um, it doesn't work particularly well with pytorch tensors, so pytorch
00:47:20.160 | has created an exact reimplementation of it, so it's identical api-wise but it does work well with
00:47:25.920 | tensors, so this is basically, we'll just grab the multiprocessing one. So this is not quite cheating
00:47:30.640 | because multiprocessing is in the standard library and this is api equivalent so i'm going to say
00:47:36.320 | we're allowed to do that um so as we've discussed you know when we call square brackets on a class
00:47:47.040 | it's actually identical to calling the dunder get item function on on the object so you can see here
00:47:56.800 | if we say give me items three six eight and one it's the same as calling dunder get item passing in
00:48:04.240 | three six eight and one now why does this matter well i'll show you why it matters because we're
00:48:13.360 | going to be able to use map and i'll explain why we want to use map in a moment map is a really
00:48:17.600 | important concept you might have heard of map reduce so we've already talked about reductions
00:48:21.520 | and what those are um maps are kind of the other key piece map is something which takes a sequence
00:48:28.160 | and calls a function on every element of that sequence so imagine we had a couple of batches
00:48:35.680 | of indices three and six and eight and one then we're going to call dunder get item
00:48:42.560 | on each of those batches so that's what map does map calls this function on every element
00:48:50.080 | of this sequence and so that's going to give us the same stuff but now this same as this but now
00:48:58.880 | batched into two batches now why do we want to do that because multi-processing has something
00:49:06.960 | called pool where you can tell it how many workers you want to run how many processes you want to run
00:49:13.280 | and it then has a map which works just like the python normal python map but it runs this
00:49:20.240 | function in parallel over the items from this iterator so this is how we can create a
00:49:28.000 | multi-processing data loader so here we're creating our data loader and again we don't
00:49:36.240 | actually need to pass in the collate function because we're using the default one so if we
00:49:39.280 | say nworkers equals two and then create that if we say next see how it's taking a moment
00:49:46.800 | and it took a moment because it was firing off those two workers in the background so the first
00:49:51.440 | batch actually comes out more slowly but the reason that we would use a multi-processing data loader
00:49:57.520 | is if this is doing a lot of work we want it to run in parallel and even though the first
00:50:04.400 | the first item might come out a bit slower once those processes are fired up it's going to be
00:50:09.600 | faster to run so this is yeah this is a really simplified multi-processing data loader because
00:50:16.080 | this needs to be super super efficient PyTorch has lots more code than this to make it much more
00:50:23.440 | efficient but the idea is this and this is actually a perfectly good way of experimenting
00:50:30.960 | or building your own data loader to make things work exactly how you want.
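A quick-and-dirty sketch of that idea (the class and parameter names here are mine, not necessarily the notebook's; this leans on the fact that our Dataset's dunder getitem accepts a whole list of indices at once):

```python
import torch.multiprocessing as mp      # API-compatible, tensor-friendly multiprocessing

class MultiprocDataLoader():
    def __init__(self, ds, batchs, n_workers=2):
        self.ds, self.batchs, self.n_workers = ds, batchs, n_workers
    def __iter__(self):
        with mp.Pool(self.n_workers) as pool:
            # map __getitem__ over the batches of indices, fetching batches in parallel processes
            yield from pool.map(self.ds.__getitem__, iter(self.batchs))
```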
00:50:37.120 | So now that we've re-implemented all this from PyTorch, let's just grab the PyTorch versions, and as you can see it's exactly
00:50:42.560 | the same data loader they don't have one thing called sampler that you pass shuffle to they have
00:50:47.840 | two separate classes called SequentialSampler and RandomSampler, i don't know why they do it that
00:50:51.680 | way, it's a little bit more work to me, but same idea, and they've got BatchSampler, and so it's exactly
00:50:57.760 | the same idea the training sampler is a batch sampler with a random sampler the validation
00:51:04.160 | sampler is a batch sampler with a sequential sampler passing batch sizes and so we can now
00:51:10.640 | pass those samplers to the data loader this is now the PyTorch data loader and just like ours it
00:51:17.920 | also takes a collate function okay and it works cool so that's as you can see it's it's doing
00:51:34.000 | exactly the same stuff that ours is doing with exactly the same API and it's got some shortcuts
00:51:40.000 | as I'm sure you've noticed when you've used data loaders. So for example, calling batch sampler is
00:51:48.400 | going to be very, very common, so you can actually just pass the batch size directly
00:51:53.360 | to a data loader and it will then auto create the batch samplers for you so you don't have to pass
00:51:58.960 | in batch sampler at all instead you can just say sampler and it will automatically wrap that in
00:52:05.200 | a batch sampler for you that does exactly the same thing and in fact because it's so common to create
00:52:11.520 | a random sampler or a sequential sampler for a data set you don't have to do that manually you
00:52:16.400 | can just pass in shuffle equals true or shuffle equals false to the data loader and that does
00:52:20.960 | again exactly the same thing there it is now something that is very interesting is that
00:52:33.200 | when you think about it the batch sampler and the collation function are things which are taking
00:52:40.320 | the result of the sampler looping through them and then collating them together but what we could do
00:52:48.080 | is actually because our data sets know how to grab multiple indices at once
00:52:59.440 | we can actually just use the batch sampler as a sampler we don't actually have to loop through
00:53:12.080 | them and collate them because they're basically instantly they come pre-collated so this is a
00:53:20.320 | trick which actually hugging face stuff can use as well and we'll be seeing it again so this is
00:53:25.040 | an important thing to understand is how come we can pass a batch sampler to sampler and what's it
00:53:30.320 | doing and so rather than trying to look through the pytorch code i suggest going back to our
00:53:35.520 | non-multiprocessing pure python code to see exactly how that would work
00:53:41.360 | because it's a really nifty trick for things that you can grab multiple things from at once
00:53:50.160 | and it can save a whole lot of time it can make your code a lot faster
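With the pure-Python pieces above, the trick looks roughly like this (a sketch, not the exact notebook code):

```python
# Our Dataset indexes tensors, so passing a whole list of indices returns a
# pre-collated (x_batch, y_batch) pair in one go.
batch_samp = BatchSampler(Sampler(train_ds, shuffle=True), bs)
idxs = next(iter(batch_samp))
xb, yb = train_ds[idxs]      # fancy indexing: the "item" is already a collated batch
```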
00:53:53.600 | okay so now that we've got all that nicely implemented we should now add a validation
00:54:00.480 | set and there's not really too much to talk about here we'll just take our fit function
00:54:05.040 | and this is exactly the same code that we had before and then we're just going to add something
00:54:13.200 | which goes through the validation set and gets the predictions and sums up the losses and accuracies
00:54:22.480 | and from time to time prints out the loss and accuracy
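The shape of that fit function is roughly as follows (a sketch from the description; the notebook's exact signature and reporting details may differ):

```python
def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_dl:
            loss = loss_func(model(xb), yb)
            loss.backward()
            opt.step()
            opt.zero_grad()

        model.eval()
        with torch.no_grad():
            tot_loss, tot_acc, count = 0., 0., 0
            for xb, yb in valid_dl:
                pred = model(xb)
                n = len(xb)
                count    += n
                tot_loss += loss_func(pred, yb).item() * n
                tot_acc  += (pred.argmax(dim=1) == yb).float().sum().item()
        print(epoch, tot_loss/count, tot_acc/count)   # whole-validation-set loss and accuracy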
00:54:31.760 | and so get_dls we will implement by using the pytorch data loader now and so now our whole
00:54:41.440 | process will be get_dls passing in the training and validation data sets. Notice that for our
00:54:48.320 | validation data loader i'm doubling the batch size because it doesn't have to do back propagation so
00:54:54.160 | it should use about half as much memory so i can use a bigger batch size get our model and then
00:55:00.400 | call this fit and now it's printing out the loss and accuracy on the validation set so finally we
00:55:10.800 | actually know how we're doing which is that we're getting 97 accuracy on the validation set
00:55:18.000 | and that's on the whole thing not just on the last batch so that's cool we've now implemented
00:55:23.440 | a proper working sensible training loop it's still you know a bit more code than i would
00:55:33.120 | like but it's not bad and every line of code in there and every line of code it's calling
00:55:38.240 | is all stuff that we have built ourselves reimplemented ourselves so we know exactly
00:55:45.680 | what's going on and that means it's going to be much easier for us to create anything we
00:55:50.480 | can think of we don't have to rely on other people's code
00:55:53.520 | so hopefully you're as excited about that as i am because it really opens up a whole world for us
00:56:06.480 | so one thing that we're going to want to be able to do now that we've got a training loop
00:56:15.280 | is to grab data and there's a really fantastic library of data sets available on hugging face
00:56:26.560 | nowadays and so let's look at how we use those data sets now that we know how to bring things
00:56:34.080 | into data loaders and stuff so that now we can use the entire world of hugging face data sets
00:56:41.680 | with our code so we're going to so you need to pip install data sets
00:56:50.080 | and once you've pip installed datasets you'll be able to say from datasets import
00:56:56.560 | and you can import a few things, i just import these two things for now, load_dataset and load_dataset_builder,
00:57:01.600 | and we're going to look at a data set called fashion mnist and so the way things tend to work
00:57:10.160 | with hugging face is there's something called the hugging face hub which has models and it has data
00:57:14.720 | sets amongst other things and generally you'll give them a name and you can then say in this case
00:57:23.200 | load a data set builder for fashion mnist now a data set builder is just basically something
00:57:30.720 | which has some metadata about this data set so the data set builder has a dot info
00:57:39.920 | and the dot info has a dot description and here's a description of this and as you can see again
00:57:45.920 | we've got 28 by 28 grayscale so it's going to be very familiar to us because it's just like mnist
00:57:51.600 | and again we've got 10 categories and again we've got 60 000 training examples and again we've got
00:57:57.040 | 10 000 test examples so this is cool, so as it says it's a direct drop-in replacement for mnist
00:58:03.040 | and so the data set builder also will tell us what are what's in this data set and so
00:58:16.240 | hugging face stuff generally uses dictionaries rather than tuples so there's going to be an image
00:58:21.920 | of type image there's going to be a label of type class label there's 10 classes and these are the
00:58:28.400 | names of the classes so it's quite nice that in hugging face data sets you know we can kind of
00:58:33.840 | get this information directly it also tells us if there are some recommended training test splits
00:58:40.880 | we can find out those as well so this is the size of the training split and the number of examples
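For reference, that exploration looks roughly like this (assuming the datasets library is installed; the output comments are paraphrased from what's on screen):

```python
from datasets import load_dataset_builder

dsb = load_dataset_builder("fashion_mnist")
print(dsb.info.description)   # 28x28 grayscale images, a drop-in replacement for MNIST
dsb.info.features             # {'image': Image(...), 'label': ClassLabel(num_classes=10, ...)}
dsb.info.splits               # recommended train/test splits and their sizes
```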
00:58:50.880 | so now that we're ready to start playing it with it we can load the data set okay so this is a
00:58:56.560 | different string load data set builder versus load data set so this will actually download it
00:59:00.800 | cache it and here it is and it creates a data set dictionary so a data set dictionary if you've
00:59:10.080 | used fastai is basically just like what we call the data sets class they call the data set dict
00:59:15.280 | class so it's a dictionary that contains in this case a train and a test item and those are data
00:59:22.560 | sets these data sets are very much like the data sets that we created in the previous notebook
00:59:28.080 | so we can now grab the training and test items from that dictionary and just pop them into
00:59:38.320 | variables and so we can now have a look at the zero index thing in training and just like we
00:59:45.280 | were promised it contains an image and a label so as you can see we're not getting tuples anymore
00:59:51.680 | we're getting dictionaries containing the x and the y in this case image and label so i'm going
00:59:57.440 | to get i'm pretty bored writing image and label and strings all the time so i'm just going to
01:00:02.320 | store them as x and y so x is going to be the string image and y will be the string label
01:00:07.200 | um i guess the other way i could have done that would have been to say x comma y equals that
01:00:22.400 | that would probably be a bit neater um because it's coming straight from the features and if you
01:00:29.840 | iterate through a dictionary you get back its keys, that's why that works, so anyway i've
01:00:36.320 | done it manually here which is a bit sad but there you go okay so we can now grab the from train zero
01:00:44.320 | which we've already seen we can grab the x i.e the image and there it is there's the image
01:00:59.520 | we could grab the first five images and the first five labels for example and there they are now
01:01:07.920 | we already know what the names of the classes are so we could now see what these map to
01:01:16.000 | by grabbing those features so there they are so um this is a special hugging face class
01:01:27.440 | which most libraries have something including fastai that works like this there's something
01:01:32.480 | called int to string which is going to take these and convert them to these so if i call it
01:01:39.600 | on our y batch you'll see we've got first is ankle boot and there that is indeed an ankle boot
01:01:46.400 | then we'll have a couple of t-shirts and a dress
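That lookup, roughly (a sketch; x and y are assumed to hold the strings 'image' and 'label' as described above):

```python
x, y = 'image', 'label'                  # as set up earlier
xb, yb = train[:5][x], train[:5][y]      # first five images and labels
featrs = train.features[y]               # the ClassLabel feature for the labels
featrs.int2str(yb)                       # e.g. ['Ankle boot', 'T-shirt/top', ...]
```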
01:01:54.160 | okay so um how do we use this to train a model well we're going to need a data loader
01:02:01.760 | and we want a data loader that for now we're going to do just like we've done it before it's going to
01:02:07.440 | return um uh well actually we're going to do something a bit different we're going to have
01:02:15.200 | our collate function is actually going to return a dictionary actually this is pretty common for
01:02:21.040 | um hugging face stuff um and pytorch doesn't mind if you it's happy for you to return a dictionary
01:02:28.160 | from a collation function so rather than returning a tuple of the stacked up hopefully this looks very
01:02:34.960 | familiar this looks a lot like the thing that goes through the data set for each one and stacks them
01:02:40.720 | up just like we did um in the previous notebook so that's what we're doing we're doing all in one
01:02:45.680 | step here in our collate function and then again exactly the same thing go through our batch grab
01:02:52.480 | the y and this is just stacking them up with the integers so we don't have to call stack
01:02:57.440 | and so we're now going to have the image and label bits in our dictionary so if we
01:03:05.280 | create our data loader using that collation function grab one batch so we can go batch x
01:03:14.000 | dot shape is a 16 by 1 by 28 by 28 and our y if the batch here here it is so the thing to notice
01:03:22.960 | here is that we haven't done any transforms or anything or written our own data set class or
01:03:32.880 | anything we're actually putting all the work directly in the collation function so this is
01:03:37.360 | like a really nice way to skip all of the kind of abstractions of your framework if you want to
01:03:45.680 | is you can just do all of your work and collate functions so it's going to pass you
01:03:49.760 | each item so it's going to you're going to get the batch directly you just go through each
01:03:56.640 | item and so here we're saying okay grab the x key from that dictionary convert it to a tensor
01:04:08.400 | and then do that for everything in the batch and then stack them all together so this is yeah this
01:04:14.160 | is like can be quite a nice way to do things if you want to do things just very manually without
01:04:22.080 | having to think too much about you know a framework particularly if you're doing really custom stuff
01:04:26.000 | this can be quite helpful having said that um hugging face data sets absolutely lets you
01:04:32.080 | avoid doing everything in collate function which if we want to create really simple applications
01:04:37.760 | that's where we're going to eventually want to head so we can um do this using a transform
01:04:47.280 | instead and so the way we do that is we create a function you're going to take our batch
01:04:55.040 | it's going to replace the x in our batch with the tensor version of each of those PIL images
01:05:01.760 | and i'm not even stacking them or anything and then we're going to return that batch
01:05:06.400 | and so uh hugging face data sets has something called with transform and that's going to take
01:05:12.240 | your data set your hugging face data set and it's going to apply this function to every element
01:05:20.080 | and it doesn't run at all now it's going to basically when when it behind the scenes when
01:05:26.000 | it calls dunder getitem it will call this function on the fly, so in other words this could
01:05:32.320 | have data augmentation which can be random or whatever because it's going to be rerun every
01:05:37.040 | time you grab an item it's not cached or anything like that so other than that this data set has
01:05:43.600 | exactly the same api as any other data set: it has a length, it has a dunder getitem, so you can
01:05:48.800 | pass it to a data loader. and pytorch already knows how to collate dictionaries of
01:06:00.960 | tensors, and we've got a dictionary of tensors now, so that means we don't need a collate function anymore
01:06:06.800 | i can create a data loader from this without a collate function as you can see and so this is
01:06:14.160 | giving exactly the same thing as before but without having to create a custom collate function
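As a sketch of the two approaches just described (assuming torchvision for to_tensor and the `train` split from the sketch above; the exact names are illustrative rather than the notebook's):

```python
import torch
from torch.utils.data import DataLoader
from torchvision.transforms.functional import to_tensor

# approach 1: do all the work in a collate function that returns a dictionary
def collate_fn(batch):
    return {
        "image": torch.stack([to_tensor(item["image"]) for item in batch]),
        "label": torch.tensor([item["label"] for item in batch]),
    }

dl = DataLoader(train, batch_size=16, collate_fn=collate_fn)

# approach 2: a with_transform function applied on the fly to each batch of items;
# PyTorch's default collation then handles the resulting dict of tensors
def transforms(batch):
    batch["image"] = [to_tensor(img) for img in batch["image"]]
    return batch

tds = train.with_transform(transforms)
dl2 = DataLoader(tds, batch_size=16)
b = next(iter(dl2))
print(b["image"].shape)   # e.g. torch.Size([16, 1, 28, 28])
```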
01:06:19.120 | now even this is a bit more code than i want having to return this seems a bit silly
01:06:24.240 | but the reason i had to do this is because hugging face data sets expects the with transform
01:06:31.120 | function to return the new version of the data. so i wanted to be able to write it
01:06:40.080 | like this transform in place and just say the change i want to make and have it automatically
01:06:45.920 | return that so if i call if i create this function that's exactly the same as the previous one
01:06:52.240 | that doesn't have return how would i turn this into something which does return the result
01:06:58.720 | so here's an interesting trick we could take that function
01:07:05.840 | pass it to another function to create a new function which is the version of this in place
01:07:12.960 | function that returns the result and the way i do that is by creating a function called inplace
01:07:17.600 | it takes a function it returns a function the function it returns is one that calls
01:07:26.320 | my original function and then returns the result. so this is a function-generating
01:07:33.760 | function, and it's modifying an in-place function to become a function that returns
01:07:43.840 | the new version of that data. and so this function is passed to this
01:07:52.000 | function, which returns a function, and here it is. so here's the version that hugging face will be
01:07:56.880 | able to use so i can now pass that to with transform and it does exactly the same thing
01:08:03.200 | so this is very very common in python it's so common that this line of code can be entirely
01:08:12.320 | removed and replaced with this little token: you take this function's name and put an @ at the start,
01:08:22.960 | and you can then put that before a function, and what it says is take this whole function,
01:08:26.880 | pass it to this function and replace it with the result so this is exactly the same as the
01:08:36.960 | combination of this and this and when we do it this way this kind of little syntax sugar
01:08:43.680 | is called a decorator okay so there's nothing nothing magic about decorators it's literally
01:08:50.080 | identical to this oh i guess the only difference is we don't end up with this unnecessary intermediate
01:08:56.880 | underscore version but the result is exactly the same and therefore i can create a transformed
01:09:04.240 | data set by using this and there we go it's all working fine
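A minimal sketch of that decorator pattern; `inplace` mirrors the idea described, and `transformi` here is just a toy in-place function:

```python
import functools

def inplace(f):
    "Turn a function that mutates its argument into one that also returns it."
    @functools.wraps(f)      # keep the wrapped function's name and docstring
    def _inner(x):
        f(x)                 # call the original, in-place function...
        return x             # ...then hand back the (modified) argument
    return _inner

@inplace
def transformi(batch):
    batch["x"] = [v * 2 for v in batch["x"]]   # mutate in place, no return needed

print(transformi({"x": [1, 2, 3]}))            # {'x': [2, 4, 6]}
```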
01:09:15.760 | um yeah so i mean none of this is particularly um necessary but what we're doing is we're just
01:09:26.240 | kind of like seeing you know the pieces that we can we can put in place um to make this stuff as
01:09:35.840 | easy as possible and we don't have to think about things too much um all right now with all this
01:09:45.200 | we can basically make things pretty automatic um and the way we can make things pretty automatic
01:09:51.600 | is we're going to use a cool thing in python called item getter and item getter is a function
01:09:57.040 | that returns a function so hopefully you're getting used to this idea now
01:10:01.200 | this creates a function that gets the a and c
01:10:08.960 | items from a dictionary or something that looks like a dictionary so here's a dictionary it
01:10:17.040 | contains keys a b and c so this function will take a dictionary and return the a and c values
01:10:27.520 | and as you can see it has done exactly that. i'll explain why this is useful in a moment; i just
01:10:36.080 | wanted to briefly mention what did i mean when i said something that looks like a dictionary i
01:10:40.720 | mean this is a dictionary okay that looks like a dictionary but python doesn't care about what type
01:10:47.760 | things actually are it only cares about what they look like and remember that when we call
01:10:54.160 | something with square brackets, when we index into something, behind the scenes it's just
01:10:58.240 | calling dunder getitem. so we could create our own class, and its dunder getitem gets the key
01:11:05.920 | and it's just going to manually return one if k equals a or two if k equals b or three otherwise
01:11:11.200 | and look that class also works just fine with an item getter um the reason this is interesting
01:11:19.520 | is because like a lot of people write python as if it's like c plus plus or java or something they
01:11:27.200 | write as if it's this kind of statically typed thing, but i really wanted to point out
01:11:32.320 | that it's an extremely dynamic language and there's a lot more flexibility than you might
01:11:37.120 | have realized. anyway, that's a little aside.
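Here's a small sketch of both points, itemgetter and duck typing via dunder getitem:

```python
from operator import itemgetter

get_ac = itemgetter("a", "c")            # a function that grabs the 'a' and 'c' items
d = {"a": 1, "b": 2, "c": 3}
print(get_ac(d))                         # (1, 3)

# Python only cares that indexing works, i.e. that __getitem__ is defined,
# so itemgetter is happy with anything that looks enough like a dict
class FakeDict:
    def __getitem__(self, k):
        return 1 if k == "a" else 2 if k == "b" else 3

print(get_ac(FakeDict()))                # (1, 3)
```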
01:11:46.720 | so what we can do is think about a batch, for example, where we've got these two dictionaries
01:11:53.200 | okay so um pytorch comes with a default collation function called not surprisingly
01:12:01.760 | default collate so that's part of um pytorch and what default collate does with dictionaries
01:12:08.560 | is it simply takes the matching keys and then grabs their values and stacks them together
01:12:14.960 | and so that's why if i call default collate a is now one three b is now two four that's actually
01:12:22.000 | what happened before when we created this data loader is it used the default collation function
01:12:29.200 | which does that it also works on things that are tuples not dictionaries which is what most of you
01:12:34.560 | would have used before. and what we can do therefore is create something called collate_dict,
01:12:40.240 | which is going to take a dataset, and it's going to create an itemgetter
01:12:52.880 | function for the features in that data set which in this case is image and label so this is a
01:12:58.720 | function which will get the image and label items and so we're now going to return a function and
01:13:06.000 | that function is simply going to call our item getter on default collate and what this is going
01:13:12.400 | to do is it's going to take a dictionary and collate it into a tuple um just like we did up here
01:13:20.560 | so if we run that so we're now going to call data loader on our transform data set passing
01:13:28.400 | in and remember this is a function that returns a function so it's a collation function for this
01:13:34.800 | data set and there it is so now this looks a lot like what we had in our previous notebook this
01:13:41.040 | is not returning a dictionary but it's returning a tuple so this is um a really important idea
01:13:50.160 | for particularly for working with hugging face data sets is that they tend to do things
01:13:55.040 | with dictionaries and most other things in the pytorch world tend to work with tuples
01:14:01.600 | so you can just use this now to convert anything that returns dictionaries into something
01:14:09.440 | that provides tuples, by passing it as a collation function to your data loader.
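A sketch of that collate_dict idea (assuming a recent PyTorch where default_collate is exposed in torch.utils.data, and `tds` being the transformed Hugging Face dataset from the earlier sketch; this is the idea rather than necessarily the exact miniai code):

```python
from operator import itemgetter
from torch.utils.data import DataLoader, default_collate

def collate_dict(ds):
    "Return a collate function that turns dicts from `ds` into tuples."
    get = itemgetter(*ds.features)          # e.g. grabs the 'image' and 'label' values
    def _f(batch):
        return get(default_collate(batch))  # dict of stacked tensors -> tuple
    return _f

dl = DataLoader(tds, batch_size=16, collate_fn=collate_dict(tds))
xb, yb = next(iter(dl))                     # now a tuple, not a dictionary
```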
01:14:16.160 | so remember, the thing you want to be doing this week is doing things like import pdb; pdb.set_trace(),
01:14:23.200 | right, put breakpoints, step through, see exactly what's happening, not just here but
01:14:30.720 | also, even more importantly, doing it inside the innermost function. so then you can
01:14:41.440 | see... what did i do wrong there? oh, it's set_trace, with an underscore
01:14:49.760 | um so then we can see exactly what's going on print out b
01:14:59.040 | list the code and i could step into it and look i'm now inside the default collate function
01:15:11.520 | which is inside pytorch and so i can now see exactly how that works
01:15:16.000 | there it all is so it's going to go through and this code is going to look very familiar because
01:15:23.440 | we've implemented all this ourselves except it's being careful that like it works for lots of
01:15:28.400 | different types of things dictionaries numpy arrays so on and so forth um
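For that debugging workflow, a sketch of where the breakpoint might go (pdb is in the standard library; the collate function here is just an example):

```python
import pdb
from torch.utils.data import default_collate

def collate_debug(batch):
    pdb.set_trace()   # drops into the debugger with `batch` in scope;
                      # then: p batch, l(ist), s(tep) into default_collate, c(ontinue)
    return default_collate(batch)
```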
01:15:38.160 | so the first thing i wanted to do oh actually something i do want to mention here
01:15:42.080 | this is so useful we want to be able to use it in all of our notebooks
01:15:46.000 | so rather than copying and pasting this every time it would be really nice to create a python module
01:15:53.760 | that contains this definition so we've created um a library called nbdev um it's really a whole
01:16:02.400 | system called nbdev which does exactly that it creates um modules you can use from your notebooks
01:16:08.480 | and the way you do it is you use this special uh thing we call comment directives which is
01:16:15.600 | hash pipe and then hash pipe export so you put this at the top of a cell and it says do something
01:16:22.400 | special for this cell what this does is it says put this into a python module for me please
01:16:26.480 | export it to a python module what python module is it going to put it in well if you go all the
01:16:33.120 | way to the top you tell it what default export module to create so it's going to create a module
01:16:40.000 | called datasets so what i do at the very end of this module is i've got this line that says
01:16:49.440 | import nbdev; nbdev.nbdev_export(), and what that's going to do for me is create
01:17:02.960 | a library a python library it's going to have a datasets.py in it and we'll see everything that
01:17:11.680 | we exported, here it is, collate_dict will appear in this for me. and so what that means is now, in the
01:17:18.720 | future, in my notebooks i will be able to import collate_dict from my datasets module. now you
01:17:26.400 | might wonder well how does it know to call it mini AI what's mini AI well in nbdev you create a
01:17:32.880 | settings.ini file where you say what the name of your library is so we're going to be using this
01:17:39.840 | quite a lot now because we're getting to the point where we're starting to implement stuff
01:17:46.880 | that didn't exist before so previously most of the stuff or pretty much all the stuff we've created
01:17:52.880 | i've said like oh that already exists in pytorch so we don't need it we just use pytorches
01:17:59.040 | but we're now getting to a point where we're starting to create stuff that doesn't exist
01:18:03.760 | anywhere we've created it ourselves and so therefore we want to be able to use it again
01:18:10.320 | so during the rest of this course we're going to be building together a library called mini AI
01:18:17.600 | that's going to be our framework our version of something like fastai maybe it's something like
01:18:23.840 | what fastai3 will end up being, we'll see. so that's what's going on here. so we're going to be using
01:18:35.200 | once i start using mini AI i'll show you exactly how to install this but that's what this export
01:18:39.840 | is and so you might have noticed i also had an export on this in place thing and i also had it
01:18:48.160 | on my necessary import statements
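In notebook form, those nbdev pieces look roughly like this (each chunk below would be its own cell, and settings.ini would set the library name to miniai; treat this as a sketch rather than the exact notebook):

```python
# first cell of the notebook: choose the module that exported cells go into
#|default_exp datasets

# any cell marked like this gets copied into miniai/datasets.py
#|export
from operator import itemgetter
from torch.utils.data import default_collate

#|export
def collate_dict(ds):
    get = itemgetter(*ds.features)
    return lambda b: get(default_collate(b))

# last cell: write the library out to disk
import nbdev; nbdev.nbdev_export()
```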
01:18:51.120 | okay um we want to be able to see what this data set looks like so i thought it now's a good time
01:19:00.400 | to talk a bit about plotting because knowing how to visualize things well is really important
01:19:06.240 | and again the idea is we we're not allowed to use fastai's plotting library so we've got to learn
01:19:12.960 | how to do everything ourselves. so here's the basic way to plot an image using matplotlib:
01:19:20.960 | so we can create a batch grab the x part of it um grab the very first thing in that
01:19:30.320 | and imshow means show an image, and here it is, there is our ankle boot
01:19:38.560 | so let's start to think about what stuff we might create which we can export to make this a bit
01:19:45.360 | easier. so let's create something called show image, which basically does
01:19:54.880 | imshow, but we're going to do a few extra things: we will make sure that it's in the correct
01:20:04.800 | axis order, we will make sure it's not on cuda, that it's on the cpu,
01:20:10.400 | if it's not a numpy array we'll convert it to a numpy array
01:20:15.120 | we'll be able to pass in an existing axis, which we'll talk about soon, if we want to,
01:20:24.240 | we'll be able to set a title if we want to and also this thing here removes all this ugly 0 5
01:20:31.200 | blah blah blah axis because we're showing an image we don't want any of that
01:20:34.320 | so if we try that, you can see, there we go, we've also been able to say what size we want the image, and there it all is.
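A minimal sketch of a show_image helper along those lines (an illustration of the idea, not the exact miniai implementation):

```python
import numpy as np
import matplotlib.pyplot as plt

def show_image(im, ax=None, figsize=None, title=None, **kwargs):
    "Show a tensor, array or PIL image, hiding the axis ticks."
    if hasattr(im, "permute") and im.ndim == 3 and im.shape[0] < 5:
        im = im.permute(1, 2, 0)           # CHW tensor -> HWC, which matplotlib expects
    if hasattr(im, "cpu"):
        im = im.cpu()                      # make sure it's not on cuda
    im = np.asarray(im)                    # convert to a numpy array if it isn't one
    if ax is None:
        _, ax = plt.subplots(figsize=figsize)
    ax.imshow(np.squeeze(im), **kwargs)    # extra keyword arguments go straight to imshow
    if title is not None:
        ax.set_title(title)
    ax.axis("off")                         # remove the ugly tick labels
    return ax
```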
01:20:43.200 | now here's something interesting: when i say help,
01:20:50.640 | the help shows the things that i implemented
01:20:54.720 | but it also shows a whole lot more things how did that magic thing happen and you can see they work
01:21:02.960 | because here's figsize, which i didn't add... oh sorry, i did add that one, okay, that's a bad example,
01:21:09.200 | anyway these other ones all work as well um so how did that happen well the trick is
01:21:16.080 | that i added star star kwargs here, and star star kwargs says you can pass
01:21:23.760 | as many other arguments as you like that aren't listed, and they'll all be put into a
01:21:29.200 | dictionary with this name. and then when i call imshow i pass that entire dictionary; star star
01:21:38.640 | here means pass them as separate arguments, and that's how come it works. and then how does
01:21:44.800 | it know what help to provide? the reason why is that fastcore has a special thing called
01:21:51.200 | delegates which is a decorator so now you know what a decorator is and you tell it what is it
01:21:59.120 | that you're going to be passing kwargs to; i'm going to be passing it to imshow, and then it
01:22:04.960 | automatically creates the documentation correctly to show you what kwargs can do. so this is a really
01:22:14.160 | helpful way of being able to kind of extend existing functions like imshow and still
01:22:21.600 | get all of their functionality and all of their documentation and add your own so delegates is
01:22:25.920 | one of the most useful things we have in fast core in my opinion so we're going to export that
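A sketch of how **kwargs and fastcore's delegates fit together (delegates lives in fastcore.meta; this is a simplified stand-in for the real show_image above):

```python
import matplotlib.pyplot as plt
from fastcore.meta import delegates

@delegates(plt.Axes.imshow)        # copy imshow's keyword arguments into our signature and help
def show_image(im, title=None, **kwargs):
    "Show `im`, passing any extra keyword arguments straight through to imshow."
    _, ax = plt.subplots()
    ax.imshow(im, **kwargs)        # ** unpacks the kwargs dict as separate named arguments
    if title is not None:
        ax.set_title(title)
    ax.axis("off")

help(show_image)                   # the help now lists imshow's arguments (cmap, alpha, ...) too
```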
01:22:31.600 | so now we can use show image anytime you want which is nice um something that's really helpful
01:22:38.400 | to know about matplotlib is how to create subplots so for example what happens if you want to plot
01:22:46.240 | two images next to each other so in matplotlib subplots creates multiple plots and you pass it
01:22:54.720 | number of rows and the number of columns so this here has as you see one row and two columns
01:23:03.920 | and it returns axes now what it calls axes is what it refers to as the individual plots
01:23:11.120 | so if we now call show image on the first image passing in axes zero it's going to get that here
01:23:20.640 | right, then we call ax.imshow, that means put the image on this subplot, they don't call it
01:23:29.040 | a subplot unfortunately they call it an axis put it on this axis so that's how come we're able to
01:23:34.320 | show an image one image on the first axis and then show a second image on the second axis by
01:23:40.800 | which we mean subplot, and there's our two images. so that's pretty handy.
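In plain matplotlib, that side-by-side plot looks something like this (img0 and img1 stand for two images from the batch):

```python
import matplotlib.pyplot as plt

# one row, two columns; axs is an array of the two "axes" (i.e. subplots)
fig, axs = plt.subplots(1, 2, figsize=(6, 3))

axs[0].imshow(img0)    # put the first image on the first axis
axs[1].imshow(img1)    # and the second image on the second axis
for ax in axs:
    ax.axis("off")
plt.show()
```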
01:23:50.320 | so i've decided to add some additional functionality to subplots, and therefore i use delegates on subplots, because
01:23:55.040 | i'm adding functionality to it and i'm going to be taking kwargs and passing it through to subplots,
01:24:02.320 | and the main thing i wanted to do is to automatically create an appropriate figure size
01:24:07.360 | by just finding out you tell us what image size you want and i also want to be able to add a
01:24:14.320 | title for the whole set of subplots, and so there it is. and then i also want to show you that nbdev
01:24:25.280 | will automatically create documentation for us as well for our library,
01:24:29.680 | and here is the documentation so as you can see here for the stuff i've added it's telling me
01:24:38.160 | exactly what each of these parameters are their type their defaults and information about each one
01:24:45.600 | and that information is automatically coming from these little comments; we call these docments.
01:24:52.240 | this is all automatic stuff done by fast core and nbdev and so you might have noticed when you look
01:24:59.600 | at fastai library documentation, it always has all this info; that's why you don't
01:25:05.520 | actually have to call show_doc, it's automatically added to your documentation for you. i'm just
01:25:10.400 | showing you here what it's going to end up looking like and you can see that it's worked
01:25:14.480 | with delegates it's put all the extra stuff from delegates in here as well
01:25:18.320 | and here they all listed out here as well so anyway subplots so let's create a three by three
01:25:27.200 | set of plots and we'll grab the first two images and so now we can go through each of the subplots
01:25:35.520 | now it returns it as a three by three basically a list of three lists of three items so i flatten
01:25:42.640 | them all out into a single list so we'll go through each of those subplots and go through each image
01:25:49.520 | and show each image on each axis and so here's a quick way to quickly show them all as you can see
01:25:57.440 | it's a little bit ugly here so we'll keep on adding more useful plotting functionality
01:26:04.480 | so here's something that, again, calls our subplots and delegates to it,
01:26:08.960 | but we're going to be able to say for example how many subplots do we want
01:26:15.440 | and it'll automatically calculate the rows and the columns
01:26:18.080 | and it's going to remove the axes for any ones that we're not actually using
01:26:25.360 | and so here we got that so that's what get grid's going to let us do so we're getting quite close
01:26:33.360 | and so finally why don't we just create a single thing called show images that's going to get our
01:26:40.800 | grid and it's going to go through our images optionally with a list of titles and show each one
01:26:47.920 | and we can use that here you can see
01:26:54.720 | we have successfully got all of our labeled images
01:27:01.520 | and so we yeah I think all this stuff for the plotting is pretty useful so as you might have
01:27:14.560 | noticed they were all exported so in our datasets.py we've got our get grid we've
01:27:19.840 | got our subplots we've got our show images so that's going to make life easier for us now
01:27:24.800 | since we have to create everything from scratch we have created all of those things
01:27:29.760 | so as I mentioned at the very end we have this one line of code to run
01:27:38.400 | and so just to show you, if I remove
01:27:43.840 | the contents of miniai/datasets.py, so it's all empty,
01:27:53.120 | and then I run this line of code and now it's back as you can see and it tells you it's auto generated
01:28:01.200 | all right so
01:28:06.160 | we are nearly at the point where we can build our learner and once we've built our learner
01:28:14.880 | we're going to be able to really dive deep into training and studying models so we've kind of got
01:28:21.120 | nearly got all of our infrastructure in place before we do there's some pieces of
01:28:29.520 | python which not everybody knows and I want to kind of talk about and kind of computer
01:28:36.640 | science concepts I want to talk about, so that's what 06 foundations is about.
01:28:40.560 | so this whole section is just going to talk about some stuff in python
01:28:49.120 | that you may not have come across before
01:28:50.880 | or you know maybe it's a review for some of you as well and it's all stuff we're going to be using
01:28:58.640 | basically in the next notebook so that's why I wanted to to cover it so we're going to be
01:29:04.560 | creating a learner class so a learner class is going to be a very general purpose training loop
01:29:11.920 | which we can get to to do anything that we wanted to do and we're going to be creating things called
01:29:18.000 | callbacks to make that happen and so therefore we're going to just spend a few moments talking
01:29:23.120 | about what are callbacks how are they used in in computer science how are they implemented
01:29:29.360 | look at some examples they come up a lot perhaps the most common place that you see callbacks in
01:29:36.720 | software is for GUI events so for events from some graphical user interface so the main graphical
01:29:44.800 | user interface library in jupyter notebooks is called ipy widgets and we can create a widget
01:29:53.520 | like a button like so and when we display it it shows me a button and at the moment it doesn't
01:30:00.640 | do anything if I click on it what we can do though is we can add an onclick callback to it
01:30:14.560 | which is a function we're going to pass it,
01:30:17.840 | which is called when you click it. so let's define that function, so I'm going to say
01:30:25.440 | w.on_click(f), which is going to assign the f function to the on_click callback. now if I click this,
01:30:33.680 | there you go, it's doing it. now what does that mean? well, a callback is simply a callable
01:30:43.920 | that you've provided. so remember, a callable is a more general version of a function; so in this
01:30:48.400 | case it is a function that you've provided that will be called back to when something happens.
01:30:54.560 | so in this case, the thing that's happening is that they're clicking a button,
01:30:58.080 | so this is how we are defining and using a callback as a GUI event so basically everything
01:31:07.200 | in ipy widgets if you want to create your own graphical user interfaces for jupyter
01:31:12.880 | you can do it with ipywidgets and by using these callbacks.
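The button example looks roughly like this (assuming ipywidgets in a notebook):

```python
import ipywidgets as widgets
from IPython.display import display

w = widgets.Button(description="Click me")

def f(btn):
    # the callback receives the button that was clicked
    print("hi")

w.on_click(f)   # register f as the on-click callback
display(w)      # clicking the button in the notebook now prints "hi"
```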
01:31:21.280 | these particular kinds of callbacks are called events, but it's just a callback. all right, so that's somebody else's callback;
01:31:28.800 | let's create our own callback so let's say we've got some very slow calculation
01:31:38.080 | and so it takes a very long time to add up the squares of the numbers 0 to 4, because we sleep for a
01:31:44.720 | second after each one so let's run our slow calculation still running oh how's it going
01:31:50.960 | come on finish our calculation there we go the answer is 30 now for a slow calculation like that
01:31:56.400 | such as training a model it's a slow calculation it would be nice to do things like i don't know
01:32:02.640 | print you know print out the loss from time to time or show a progress bar or whatever so
01:32:08.480 | generally for those kinds of things we would like to define a callback that is called at the end of
01:32:15.920 | each epoch or batch or every few seconds or something like that so here's how we can modify
01:32:23.120 | our slow calculation routine such that you can optionally pass it a callback, and so all of this
01:32:29.360 | code's the same except we've added this one line of code that says if there's a callback then
01:32:35.680 | call it and pass in where we're up to. so then we could create our callback function,
01:32:44.400 | so just like we created our callback function f earlier, let's create a show progress callback
01:32:48.960 | function that's going to tell us how far we've got. so now if we call our slow calculation, passing
01:32:56.880 | in our callback you can see it's going to call this function at the end of each step so here we've
01:33:06.960 | created our own callback so there's nothing special about a callback like it doesn't require its own
01:33:14.160 | like syntax it's not a new concept it's just an idea really which is the idea of passing in a
01:33:22.160 | function which some other function will call at particular times such as at the end of a step
01:33:29.040 | or such as when you click a button so that's what we mean by callbacks
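The routine being described is essentially this:

```python
from time import sleep

def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        res += i * i
        sleep(1)        # pretend each step is expensive
        if cb:          # if a callback was provided, call it with where we're up to
            cb(i)
    return res

def show_progress(epoch):
    print(f"Awesome! We've finished epoch {epoch}!")

slow_calculation(show_progress)   # prints a line after each step, then returns 30
```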
01:33:33.840 | we don't have to define the function ahead of time we could define the function
01:33:42.160 | at the same time that we call the slow calculation by using lambda so as we've discussed before
01:33:50.720 | lambda just defines a function but it doesn't give it a name so here's a function it takes
01:33:55.520 | one parameter and prints out exactly the same thing as before. so here's the same thing as before,
01:33:59.760 | but using a lambda
01:34:01.360 | we could make it more sophisticated now, and rather than always saying awesome, we've finished epoch
01:34:11.360 | whatever we could have let you pass in an exclamation and we print that out and so in
01:34:18.640 | this case we could now have our lambda call that function
01:34:23.200 | and so one of the things that we can do now is to again we can create a function that returns a
01:34:33.040 | function and so we could create a make show progress function where you pass in the exclamation
01:34:39.920 | we could then create, and there's no need to give it a name actually, we can just return it directly,
01:34:47.680 | a function that uses that exclamation. so here we are passing in nice
01:35:00.080 | and that's exactly the same as doing something like what we've done before
01:35:07.520 | we could say instead of using a lambda we can create an inner function like this
01:35:17.280 | so here is now a function that returns a function this does exactly the same thing
01:35:20.960 | okay, so one way with a lambda, one way without a lambda.
01:35:26.800 | and one of the reasons I wanted to show you that
01:35:30.800 | is that we can do exactly the same thing using partial. so with partial,
01:35:45.680 | it's going to do exactly the same thing as this kind of make show progress: it's going to call
01:35:51.600 | show progress and pass in okay I guess. so this is again an example of a function returning a
01:35:57.360 | function and so this is a function that calls show progress passing in this as the first parameter
01:36:03.600 | and again it does exactly the same thing
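All three variants side by side, reusing slow_calculation from the sketch above (the exclamation strings are just examples):

```python
from functools import partial

def show_progress(exclamation, epoch):
    print(f"{exclamation}! We've finished epoch {epoch}!")

# 1. a lambda that fixes the exclamation
slow_calculation(lambda epoch: show_progress("OK I guess", epoch))

# 2. an inner function returned by a function-making function
def make_show_progress(exclamation):
    def _inner(epoch):
        show_progress(exclamation, epoch)
    return _inner

slow_calculation(make_show_progress("Nice"))

# 3. partial: fix show_progress's first argument
slow_calculation(partial(show_progress, "OK I guess"))
```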
01:36:11.760 | okay, so we tend to use partial a lot, so that's certainly something worth spending time
01:36:19.200 | practicing now as we've discussed python doesn't care about types in particular
01:36:29.440 | and there's nothing about any of this that requires cb to be a function
01:36:36.480 | it just has to be a callable; a callable is something that you
01:36:42.720 | can call. and so as we've discussed, another way of creating a callable is defining dunder call.
01:36:48.320 | so here's a class and this is going to work exactly the same as our make show progress
01:36:54.960 | thing but now as a class: so there's a dunder init which stores the exclamation, and a dunder call
01:37:01.520 | that prints. and so now we're creating an object which is callable and does exactly the same thing.
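As a class, that looks something like this (again reusing slow_calculation from above):

```python
class ProgressShowingCallback:
    def __init__(self, exclamation="Awesome"):
        self.exclamation = exclamation        # store the exclamation
    def __call__(self, epoch):                # makes instances callable like a function
        print(f"{self.exclamation}! We've finished epoch {epoch}!")

slow_calculation(ProgressShowingCallback("Just super"))
```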
01:37:10.560 | okay so these are all like um fundamental ideas that I want you to get really comfortable with
01:37:21.360 | the idea of dunder call dunder things in general partials classes because they come up all the time
01:37:31.840 | um in pytorch code and um and in the code we'll be writing and in fact pretty much all frameworks
01:37:39.760 | so it's really important to feel comfortable with them and remember you don't have to rely on
01:37:45.520 | the resources we're providing you know if there's certain things here that are very new to you
01:37:51.520 | you know google around for some tutorials or ask for help on the forums finding things and so forth
01:37:58.240 | and then I'm just going to briefly recover something I've mentioned before which is star
01:38:03.040 | args and star star kwargs, because again they come up a lot, and I just wanted to show you how they
01:38:08.400 | work. so if we create a function that has star args and star star kwargs, nothing else, and I'm just
01:38:16.160 | going to have this function just print them now I'm going to call the function I'm going to pass
01:38:22.160 | three I'm going to pass a and I'm going to pass thing one equals hello now these are passed what
01:38:29.440 | we would say by position we haven't got a blah equals they're just stuck there things that are
01:38:35.440 | passed by position are placed in star args if you have one it doesn't have to be called args you can
01:38:41.360 | call this anything you like but in the star bit and so you can see here that args is a tuple
01:38:49.200 | containing the positionally passed arguments, and then kwargs is a dictionary containing
01:38:57.120 | the named arguments. so that is all that star args and star star kwargs do, and as I say there's
01:39:04.160 | nothing special about these names I'll call this a I'll call this b
01:39:11.760 | okay and it'll do exactly the same thing okay so um this comes up a lot um and so it's it's
01:39:23.120 | important to remember this is literally all that they're doing and then um on the other hand
01:39:33.200 | let's say we had a function which takes a couple of arguments, let's say
01:39:44.160 | a, b, c, and just prints them directly. okay, we can also, rather than just using them as parameters,
01:39:58.400 | we can also use them when calling something so let's say I create something called args again
01:40:03.920 | doesn't have to be called args, which contains one comma two, and I create something
01:40:10.560 | called kwargs that contains a dictionary containing c: three. I can then call g and I can pass in
01:40:23.120 | star args comma star star kwargs, and that's going to take this one, two and pass them as individual
01:40:32.320 | arguments positionally, and it's going to take the c: three and pass that as a named argument c
01:40:38.800 | equals three, and there it is. okay, so they're kind of two linked but different ways that use star and star star.
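Both directions in one small sketch:

```python
def f(*args, **kwargs):
    # positional arguments land in the `args` tuple, named ones in the `kwargs` dict
    print(f"args: {args}; kwargs: {kwargs}")

f(3, "a", thing1="hello")   # args: (3, 'a'); kwargs: {'thing1': 'hello'}

def g(a, b, c=0):
    print(a, b, c)

args = [1, 2]
kwargs = {"c": 3}
g(*args, **kwargs)          # same as calling g(1, 2, c=3)
```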
01:40:48.640 | okay, now here's a slightly different way of doing callbacks which I really
01:40:59.280 | like. in this case I'm now passing in a callback that's not callable, but instead it's going to
01:41:07.360 | have a method called before calc and another method called after calc. and so now my callback
01:41:17.760 | is going to be a class containing a before calc and an after calc method and so if I run that
01:41:28.480 | you can see it's that there it goes okay and so this is printing before and after every step
01:41:39.200 | by calling before calc and after calc. so a callback actually doesn't have to be a callable,
01:41:44.160 | doesn't have to be a function a callback could be something that contains methods
01:41:48.240 | so we could have a version of this which actually as you can see here it's going to pass in to after
01:42:00.560 | calc both the epoch number and the value it's up to, but by using star args and star star kwargs I
01:42:07.520 | can just safely ignore them if I don't want them right so it's just going to chew them up and not
01:42:13.120 | complain. if I didn't have those here it won't work, see, because it got passed in val equals
01:42:24.640 | and there's nothing here looking for val equals; it doesn't like that. so this is one good use of
01:42:31.760 | star args and star star kwargs: to eat up arguments you don't want. or we could use the arguments, so
01:42:39.920 | let's actually use epoch and val and print them out and there it is
01:42:49.280 | so this is a more sophisticated callback that's giving us status as we go
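A sketch of that style of callback (the method names mirror the ones described; the calculation is renamed here to avoid clashing with the earlier sketch):

```python
def slow_calculation_with_cbs(cb=None):
    res = 0
    for i in range(5):
        if cb:
            cb.before_calc(i)
        res += i * i
        if cb:
            cb.after_calc(i, val=res)     # pass both the step number and the running value
    return res

class PrintStatusCallback:
    def before_calc(self, *args, **kwargs):
        print("About to start")           # *args/**kwargs quietly eat anything we don't need
    def after_calc(self, epoch, val, **kwargs):
        print(f"After {epoch}: {val}")

slow_calculation_with_cbs(PrintStatusCallback())
```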
01:43:06.800 | I'm going to skip this bit because we don't really care about that
01:43:12.960 | okay so finally let's just review this idea of dunder which we've mentioned before
01:43:20.640 | but just to to really nail this home anything that looks like this underscore underscore
01:43:26.560 | something underscore underscore something is special and basically it could be that python
01:43:32.240 | has to find that special thing or pytorch has to find that special thing or numpy has to find
01:43:36.960 | that special thing but they're special these are called dunder methods um and some of them
01:43:45.520 | are defined as part of the python data model and so if you go to the python documentation
01:43:52.640 | it'll tell you about these various different ones. here's repr, which we used earlier,
01:43:59.040 | here's init that we used earlier so they're all here pytorch has some of its own numpy has some
01:44:05.600 | of its own so for example if python sees plus what it actually does is it calls dunder add
01:44:13.760 | so if we want to create something that's not very good at adding things,
01:44:19.040 | it always also adds 0.01 to it, then i can say SloppyAdder one plus SloppyAdder
01:44:29.280 | two equals 3.01. so plus here is actually calling dunder add.
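The adder being described is essentially:

```python
class SloppyAdder:
    def __init__(self, o):
        self.o = o
    def __add__(self, b):
        # `+` on two SloppyAdders calls this, and we sneak in an extra 0.01
        return SloppyAdder(self.o + b.o + 0.01)
    def __repr__(self):
        return str(self.o)

a = SloppyAdder(1)
b = SloppyAdder(2)
print(a + b)   # 3.01
```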
01:44:40.320 | so if you're not familiar with these, click on the data model link and read about these specific one two three four five six seven eight
01:44:45.760 | nine ten eleven methods because we'll be using all of these in the course so
01:44:51.840 | i'll try to revise them when we can but i'm generally going to assume that you know these
01:44:56.960 | a particularly interesting one is getattr. we've seen setattr already; getattr is just the opposite.
01:45:08.880 | take a look at this here's a class it just contains two attributes a and b that are set to
01:45:15.040 | one and two so i'll create that an object of that class a dot b equals two because i set
01:45:20.240 | b to two okay now when you say a dot b that's just syntax sugar basically in python what it's
01:45:28.720 | actually calling behind the scenes is getattr; it calls getattr on the object. and so this one here
01:45:37.840 | is the same as getattr(a, 'b'), so it calls
01:45:48.640 | getattr(a, 'b'). and this can kind of be fun, because you could call getattr with a and then
01:45:54.160 | either b or a randomly how's that for crazy so if i run this two one one one two as you can see it's
01:46:04.080 | random um so yeah python's such a dynamic language you can even set it up so you literally don't know
01:46:12.080 | what attribute is going to be called. now getattr behind the scenes is actually calling something
01:46:21.280 | called dunder getattr, and by default it'll use the version in the object base class. so here's
01:46:28.160 | something just like A: i've got a and b defined, but i've also got dunder getattr defined,
01:46:34.160 | and dunder getattr is only called for stuff that hasn't been defined yet,
01:46:38.640 | and it'll pass in the key, or the name, of the attribute. so generally speaking, if the first
01:46:47.600 | character is an underscore it's going to be private or special, so i'm just going to raise
01:46:52.880 | an AttributeError; otherwise i'm going to steal it and return hello from k. so if i go
01:47:02.160 | b dot a, that's defined, so it gives me one; if i go b dot foo, that's not defined, so it calls getattr
01:47:10.880 | and i get back hello from foo and so um uh this gets used a lot in both fastai code and also
01:47:17.040 | hugging face code um to you know often make it more convenient to access things um so that's
01:47:26.640 | yeah, that's how the getattr function and the dunder getattr method work.
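The two classes being described look roughly like this:

```python
class A:
    a, b = 1, 2

class B:
    a, b = 1, 2
    def __getattr__(self, k):
        # only called for attributes that aren't otherwise defined
        if k[0] == "_":
            raise AttributeError(k)
        return f"Hello from {k}"

a = A()
print(getattr(a, "b"))   # 2 -- exactly what a.b does behind the scenes
b = B()
print(b.a)               # 1: defined normally, so __getattr__ isn't involved
print(b.foo)             # 'Hello from foo', via __getattr__
```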
01:47:32.480 | um okay so i went over that pretty quickly um since i know for quite a few folks this will be
01:47:40.080 | all review but i know for folks who haven't seen any of this this is a lot to cover so i'm hoping
01:47:45.680 | that you'll kind of go back over this revise it slowly experiment with it and look up some
01:47:50.480 | additional resources and ask on the forum and stuff for anything that's not clear remember um
01:47:56.560 | everybody has parts of the course that are really easy for them and parts of the course that are
01:48:04.400 | completely unfamiliar for them and so if this particular part of the course is completely
01:48:08.320 | unfamiliar to you it's not because this is harder um or going to be more difficult or whatever
01:48:15.760 | it's just so happens that this is a bit that you're less familiar with or maybe the stuff about
01:48:22.000 | calculus in the last lesson was a bit that you're less familiar with um there isn't really anything
01:48:27.280 | particularly in the course that's more difficult than other parts it's just that you know based on
01:48:33.120 | whether you happen to have that background and so yeah if you spend a few hours studying and
01:48:39.280 | practicing you know you'll be able to pick up these things and um yeah so don't stress if there
01:48:46.080 | are things that you don't get right away just take the time and if you yeah if you do get lost please
01:48:52.000 | ask because people are very keen to help if you've tried asking on the forum hopefully you've noticed
01:48:57.440 | that people are really keen to help all right so um i think this has been a pretty successful lesson
01:49:05.040 | we've got to a point where we've got a pretty nicely optimized training loop, we
01:49:09.280 | understand exactly what data loaders and data sets do we've got an optimizer we've been playing with
01:49:15.600 | hugging face data sets and we've got those working really smoothly um so we really feel like we're
01:49:20.720 | in a pretty good position to to write our generic learner training loop and then we can start
01:49:26.880 | building and experimenting with lots of models so look forward to seeing you next time to doing
01:49:32.080 | that together. Okay, bye!