Back to Index

Lesson 14: Deep Learning Foundations to Stable Diffusion


Chapters

0:00 Introduction
0:30 Review of code and math from Lesson 13
7:40 f-Strings
10:00 Re-running the Notebook - Run All Above
11:0 Starting code refactoring: torch.nn
12:48 Generator Object
13:26 Class MLP: Inheriting from nn.Module
17:03 Checking the more flexible refactored MLP
17:53 Creating our own nn.Module
21:38 Using PyTorch’s nn.Module
23:51 Using PyTorch’s nn.ModuleList
24:59 reduce()
26:49 PyTorch's nn.Sequential
27:35 Optimizer
29:37 PyTorch’ optim and get_model()
30:04 Dataset
33:29 DataLoader
35:53 Random sampling, batch size, collation
40:59 What does collate do?
45:17 fastcore’s store_attr()
46:7 Multiprocessing DataLoader
50:36 PyTorch’s Multiprocessing DataLoader
53:55 Validation set
56:11 Hugging Face Datasets, Fashion-MNIST
61:55 collate function
64:41 transforms function
66:47 decorators
69:42 itemgetter
71:55 PyTorch’s default_collate
75:38 Creating a Python library with nbdev
78:53 Plotting images
81:14 kwargs and fastcore’s delegates
88:03 Computer Science concepts with Python: callbacks
93:40 Lambdas and partials
96:26 Callbacks as callable classes
97:58 Multiple callback funcs; *args and **kwargs
103:15 __dunder__ thingies
107:33 Wrap-up

Transcript

Okay. Hi everybody. And welcome to lesson 14. The numbers are getting up pretty high now, huh? We had a lesson last time talking about calculus and how we implement the chain rule in neural network training in an efficient way called backpropagation. I just wanted to point out that one excellent student, Kaushik Sinha, has produced a very nice explanation of the code that we looked at last time, and I've linked to it.

So it's got the math and then the code. The code's slightly different to what I had, but it's basically the same thing with minor changes. And it might be helpful to link between the math and the code to see what's going on. So you'll find that in the lesson 13 resources.

But I thought I'd just quickly try to explain it as well. So maybe I could try to copy this and just explain what's going on here with this code. The basic idea is that we have a neural network and a loss function that together calculate the loss.

So let's just call the loss function L. And the loss function is being applied to the output of the neural network. So the neural network function we'll call n, and that takes two things: a bunch of weights and a bunch of inputs.

The loss function also requires the targets, but I'm just going to ignore that for now because it's not really part of what we actually care about. What we're interested in knowing is how to update the weights. Let's say this is just a single layer, to keep it simple.

If we want to be able to update the weights, we need to know how the loss changes if we change the weights, one weight at a time, if you like. So how would we calculate that? Well, what we could do is rewrite our loss function by saying: let's call capital N the result of the neural network applied to the weights and the inputs.

And that way we can now rewrite the loss function to say big L equals little l, the loss function, applied to the output of the neural network. And so maybe you can see where this is going. We can now say: the derivative of the loss with respect to the weights is going to be equal to the derivative of the loss with respect to the outputs of that neural network layer times (this is the chain rule) the derivative of the outputs of that neural network layer with respect to the weights.

(I'll keep the notation consistent, since these are not scalars.) Right, so you can see we can cancel those terms, and we end up with the change in loss with respect to the weights. And so we can just say: this is the chain rule.
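Written out (a sketch of the notation just described, with $N = n(w, x)$ the layer output and $\ell$ the loss function):

$$L = \ell(N), \qquad N = n(w, x)$$

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial N}\,\frac{\partial N}{\partial w}, \qquad \frac{\partial L}{\partial x} = \frac{\partial L}{\partial N}\,\frac{\partial N}{\partial x}$$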

That's what the chain rule is. So, the change in the loss with respect to the output of the neural network: well, we did the forward pass here, and then this here is where we calculated the derivative of the loss with respect to the output of the neural network, which came out from here and ended up in diff.

So there it is; out.g contains this derivative. So then, let's actually do one more: we could also say the change in the loss with respect to the inputs, and we can do the same thing with the chain rule. And so this time we have the inputs.

So here you can see that is this line of code. So that is the change in the loss with respect to the inputs; that's what input.g means. And it's equal to the change in the loss with respect to the output (that's what out.g means) times this derivative; and it's actually a matrix product, because we're doing matrix calculus. Since this is a linear layer, the derivative we're looking at is simply the weights themselves.

And then we have exactly the same thing for w.g, which is the derivative of the loss with respect to the weights. And so again you've got your out.g, and remember, we actually showed how we can simplify this into a matrix product with a transpose as well.
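As a sketch, this is roughly what the backward-pass code being described looks like, assuming the .g gradient convention from lesson 13 and a linear layer out = inp @ w + b:

```python
def lin_grad(inp, out, w, b):
    # dL/d(inp): out.g times the derivative of out w.r.t. inp,
    # which for a linear layer is just the weights (transposed)
    inp.g = out.g @ w.t()
    # dL/dw: the same chain rule, rearranged as a matrix product with a transpose
    w.g = inp.t() @ out.g
    # dL/db: sum the output gradients over the batch dimension
    b.g = out.g.sum(0)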

So that's how what's happening in our code maps to the math. Hopefully that's useful, but as I say, do check out this really nice resource, which has a lot more detail if you're interested in digging deeper. The other thing I'd say is that some people have mentioned they didn't study this at high school, which is fine.

We've provided resources on the forum recommending how to learn the basics of derivatives and the chain rule. In particular, I would recommend 3Blue1Brown's Essence of Calculus series and also Khan Academy. It's not particularly difficult to learn; it'll only take you a few hours, and then this will make a lot more sense.

Or if you did it at high school but you've forgotten it, same deal. So don't worry if you found this difficult because you had forgotten, or had never learned, the basic derivative and chain rule stuff. That's something you can pick up now, and I would recommend doing so.

Okay, so what we then did last time, which is actually pretty exciting, is we got to a point where we had successfully created a training loop which did these four steps. And the nice thing is that every single thing here is something that we have implemented from scratch.

Now, we didn't always use our implemented-from-scratch versions. There's no particular reason to: when we've re-implemented something that already exists, let's use the version that exists. But every single thing here (well, I guess not argmax, but that's trivially easy to implement) we have implemented ourselves.

And we successfully trained an MNIST model to 96% accuracy at recognizing handwritten digits. So I think that's super neat. Mind you, this is not a great metric: it's only looking at the training set, and in particular it's only looking at one batch of the training set.

Since last time I've just refactored a little bit: I've pulled out this report function, which is now just running at the end of each epoch, and it's just printing out the loss and the accuracy. Something I wanted to mention here: hopefully you've seen f-strings before. They're a really helpful part of Python that lets you pop a variable or an expression inside curly braces in a string, and it'll evaluate it.

You might not have seen this colon thing. This is called a format specifier, and with a format specifier you can change how things are printed in an f-string. So this is how I'm printing to two decimal places: this says print a floating point number called loss to two decimal places, followed by a comma.

I'm not going to show you every detail of how to use those, other than to say: yeah, Python f-strings and format specifiers are really helpful. So if you haven't used them before, do go look up a tutorial or the documentation, because they're definitely something you'll find useful to know about.
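For example (hypothetical values, just to show the syntax):

```python
loss, acc = 0.1234567, 0.98765
# The part after the colon is the format specifier:
# .2f means fixed-point notation with two decimal places.
print(f'loss: {loss:.2f}, accuracy: {acc:.2f}')  # -> loss: 0.12, accuracy: 0.99
```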

Okay, so let's just rerun all those lines of code. If you're wondering how I just reran all the cells above where I was: there's a menu item here, "run all above". It's so helpful that I always make sure there's a keyboard shortcut for it; you can see here I've added a keyboard shortcut, QA.

So if I type QA it runs all cells above, and if I type QB it runs all cells below. And so, yeah: stuff that you do a lot, make sure you've got keyboard shortcuts for it. You don't want to be fiddling around, moving your mouse everywhere. You want it to be as easy as thinking.

So this is really exciting. We've successfully built and trained a neural network model from scratch, and it works okay. It's a bit clunky: there's a lot of code, and there are features we're missing. So let's start refactoring it. Refactoring is all about making it so we have to write less code to do the same work.

And so now I'm going to show you something that's part of PyTorch, and then I'm going to show you how to build it, and then you'll see why this is really useful. So PyTorch has a submodule called nn, torch.nn, and in there there's something called the Module class.

Now, we don't normally use it this way, but I just want to show you how it works. We can create an instance of it in the usual way we create instances of classes, and then we can assign things to attributes of that module. So for example, let's assign a linear layer to it.

And if we now print that out, you'll see it says: oh, this is a module containing something called foo, which is a linear layer. But here's something quite tricky. We can say: show me all of the named children of that module. And it says: oh, there's one called foo, and it's a linear layer.

And we can say: show me all of the parameters of this module. And it says: okay, sure, there are two of them. There's this 4 by 3 tensor, which is the weights, and this length-4 vector, which is the biases. And so somehow, just by creating this module and assigning this to it, it has automatically tracked what's in this module and what its parameters are.

That's pretty neat. So we're going to see both how and why it does that. I'll just point out, by the way, why I added list here: if I just said m1.named_children(), it would just print out "generator object", which is not very helpful. And that's because this is a kind of iterator called a generator.

And it's something which is only going to produce its contents when I actually do something with it, such as listing them out. So popping a list around a generator is one way to run the generator and get its output. That's a little trick for when you want to look inside a generator.
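Here's a minimal sketch of that demonstration (the layer sizes are assumed to match the 4 by 3 example above):

```python
import torch
from torch import nn

m1 = nn.Module()
m1.foo = nn.Linear(3, 4)   # assigning an attribute registers the submodule

print(m1)                            # shows the foo (Linear) child
print(list(m1.named_children()))     # list() runs the generator so we can see it
print([p.shape for p in m1.parameters()])
# [torch.Size([4, 3]), torch.Size([4])]: the weights and the biases
```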

Okay, so now, as I said, we don't normally use it this way. What we normally do is create our own class. So for example, we create our own multi-layer perceptron, and we inherit from nn.Module. And so then, dunder init, __init__, is the thing that constructs an object of the class.

This is the special magic method that does that. We'll say: okay, how many inputs are there to this multi-layer perceptron, how many hidden activations, and how many output activations? So there'll just be one hidden layer. And then here we can do just like we did up above, where we assigned things as attributes.

We can do that in this constructor. So we'll create an l1 attribute, which is a linear layer from the number of inputs to the number of hidden activations; l2 is a linear layer from the number of hidden to the number of outputs; and we'll also create a ReLU. And so when we call that module, we take the input that we get, run the first linear layer, then run the ReLU, then run l2.

And so I can create one of these, as you see. And I can have a look and see: here's the attribute l1, and there it is, just as I assigned it. And I can print out the model, and the model knows all the stuff that's in it. And I can go through each of the named children and print out the name and the layer.

Now, of course, if you remember: although you can use dunder call, __call__, we actually showed how we can refactor things using forward, such that it would automatically do the things necessary to make all the automatic gradient stuff work correctly. And so in practice we're not going to define __call__; we would define forward.

So this is an example of creating a custom PyTorch module. And the key thing to recognize is that it knows all the attributes you added to it, and it also knows all the parameters. So if I go through the parameters and print out their shapes, you can see:

my first linear layer's weights, my first linear layer's biases, my second linear layer's weights, and my second linear layer's biases. And this 50 is because we set nh, the number of hidden activations, to 50.
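A sketch of that MLP, using forward as just recommended (the parameter names are assumed):

```python
class MLP(nn.Module):
    def __init__(self, n_in, nh, n_out):
        super().__init__()
        self.l1 = nn.Linear(n_in, nh)
        self.l2 = nn.Linear(nh, n_out)
        self.relu = nn.ReLU()

    # Define forward (not __call__) so nn.Module's machinery runs around it.
    def forward(self, x): return self.l2(self.relu(self.l1(x)))

model = MLP(784, 50, 10)
print([p.shape for p in model.parameters()])
# [torch.Size([50, 784]), torch.Size([50]), torch.Size([10, 50]), torch.Size([10])]
```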

So why is that interesting? Well, because now I don't have to write all this any more, going through layers and making sure they've all been put into a list. We've just been able to add them as attributes, and they automatically appear as parameters. So we can just say: go through each parameter and update it based on the gradient and the learning rate. And furthermore, you can just call model.zero_grad() and it'll zero out all of the gradients.

So that's really made our code quite a lot nicer and quite a lot more flexible, which is cool. So let's check that this still works. There we go. Just to clarify: if I call report on this before I fit, then as you would expect the accuracy is about 10% or a bit less, and the loss is pretty high.

And after I fit this model, the accuracy goes up and the loss goes down. So basically all of this is exactly the same as before; the only thing I've changed are these two lines of code. So that's a really useful refactoring. So how on earth did this happen?

How did it know what the parameters and layers are, automatically? It used a trick called dunder setattr, __setattr__, and we're going to create our own nn.Module now. So if there were no such thing as nn.Module, here's how we'd build it. And let's actually build it, and also add some things to it.

So in dunder init we would have to create a dictionary for our named children; this is going to contain a dictionary of all of the layers. Then, just like before, we'll create a couple of linear layers. And then we're going to define this special magic method that Python has, called dunder setattr, __setattr__. If you've defined it, this is called automatically by Python every time you set an attribute, such as here or here, and it's going to be passed the name of the attribute (the key), and the value, which is the actual thing on the right-hand side of the equals sign.

Now, generally speaking, things that start with an underscore we use for private stuff, so we check that the name doesn't start with an underscore; and if it doesn't, __setattr__ will put the value into the module's dictionary under that key, and then call Python's normal __setattr__ to make sure it actually does the attribute setting.

super() is how you call whatever is in the superclass, the base class. So another useful thing to know about is: how does it do this nifty thing where you can just type the name and it lists out all this information about itself?

That's a special method called dunder repr, __repr__. So here, __repr__ will just return a stringified version of the modules dictionary. And then here we've got parameters. How did parameters work? How did this thing work? Well, we can go through each of those modules (the values of the modules dictionary are the actual layers), then go through each of the parameters in each module, and yield p. That's going to create an iterator, if you remember when we looked at iterators, over all the parameters.

So let's try it: we can create one of these modules and, just like before, loop through its parameters, and there they are. Now, I'll just mention something optional, kind of advanced Python, that a lot of people don't know about: there's no need to loop through a list or a generator (or, I should say, an iterator) and yield. There's actually a shortcut, which is that you can just say yield from and give it the iterator. And with that, we can get this all down to one line of code, and it will do exactly the same thing.

That's basically saying: yield, one at a time, everything in here; that's what yield from does. So there's a cool little advanced Python thing. Totally optional, but if you're interested, I think it can be kind of neat. So we've now learned how to create our own implementation of nn.Module, and therefore we are now allowed to use PyTorch's nn.Module, so that's good news.
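Putting that together, a from-scratch sketch of what was just described:

```python
class Module:
    def __init__(self):
        self._modules = {}                 # dictionary of named children

    def __setattr__(self, k, v):
        # Names starting with an underscore are private, so skip those;
        # everything else gets registered as a child module.
        if not k.startswith('_'): self._modules[k] = v
        super().__setattr__(k, v)          # still actually set the attribute

    def __repr__(self): return f'{self._modules}'

    def parameters(self):
        for l in self._modules.values():
            yield from l.parameters()      # the one-line yield from shortcut
```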

So how, using PyTorch's nn.Module, would we create the model that we started with, which had this self.layers? We want to somehow register all of these layers at once, and that's not going to happen based on the code we just wrote. So to do that, we'll create again a subclass of nn.Module (make sure you call the superclass's init first) and we'll just store the list of layers. Then, to tell PyTorch about all those layers, we basically have to loop through them and call add_module, saying what the name of the module is and what the module is. (And again, I probably should have used forward here in the first place.) And you can see this has now done exactly the same thing. Okay, so if you've used a sequential model before, you can see that we're on the path to creating a sequential model.
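A sketch of that registration step (the layers list and naming scheme are assumptions):

```python
class Model(nn.Module):
    def __init__(self, layers):
        super().__init__()                 # call the superclass init first
        self.layers = layers
        # Tell PyTorch about each layer by registering it under a name.
        for i, l in enumerate(self.layers): self.add_module(f'layer_{i}', l)

    def forward(self, x):
        for l in self.layers: x = l(x)
        return x
```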

Okay, so Ganache has asked an interesting question, which is: what on earth is super() calling, given that (in fact, we don't even need the parentheses here) we don't have a base class? That's because if you don't put any parentheses, or if you put empty parentheses, it's actually a shortcut for inheriting from object. And Python has stuff in object which does all the normal objecty things, like storing your attributes so that you can get them back later. So that's what's happening there.

Okay, so this is a little bit awkward: having to store the list and then enumerate and call add_module. So now that we've implemented that from scratch, we can use PyTorch's version. They've just got something called ModuleList that does that for you: if you use ModuleList and pass it a list of layers, it will just go ahead and register all those modules for you. So here's something called SequentialModel; this is just like nn.Sequential now. If I create it, passing in the layers, there you go: you can see there's my model, containing my ModuleList with my layers. (I don't know why I never used forward for these things; it's silly. I guess it doesn't matter terribly at this stage.) Okay, so call fit, and there we go.

So in forward here, I just go through each layer, set the result equal to calling that layer on the previous result, and return it at the end. Now, there's another way of doing this which I think is kind of fun. It's not shorter or anything at this stage; I just wanted to show an example of something you see quite a lot in machine learning code, which is the use of reduce. This implementation here is exactly the same as this thing here, so let me explain how reduce works. reduce is a very common, kind of fundamental, computer science concept: a reduction. A reduction is something that says: start with the third parameter, some initial value (so we start with x, the thing we're being passed), then loop through a sequence (so loop through each of our layers), and for each layer call some function; here is our function. The first time around, the function is passed the initial value and the first thing in your list, your first layer, and x; so it just calls the layer on x. The second time around, it takes the output of that and passes it in as the first parameter, and passes in the second layer; so the second time through, it's calling the second layer on the result of the first layer, and so forth. That's what a reduction is. You'll certainly see reduce talked about quite a lot in papers and books, and you might sometimes also see it in code; it's a very general concept. And so here's how you can implement a sequential model using reduce: there's no explicit loop there, although the loop's still happening internally.

All right, so now that we've re-implemented Sequential, we can just go ahead and use PyTorch's version. So there's nn.Sequential: we can pass in our layers, and we can fit. Not surprisingly, we can see the model, and it looks very similar to the one we built ourselves.

All right. So this business of looping through our parameters, updating them based on the gradients and a learning rate, and then zeroing the gradients, is very common. So common that there is something which does all that for us, and that's called an optimizer; it's the stuff in optim. So let's create our own optimizer. As you can see, it's just going to do the two things we just saw: step goes through each of the parameters and updates them using the gradient and the learning rate, and there's also zero_grad, which goes through each parameter and sets its gradients to zero. (If you use .data, it's just a way of avoiding having to say torch.no_grad, basically.)
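Here's a sketch of both pieces just described: the reduce-based sequential forward pass, and the from-scratch SGD optimizer (the learning rate default is an assumption):

```python
from functools import reduce

class SequentialModel(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)    # registers every layer for us

    def forward(self, x):
        # reduce: call f(x, layers[0]), then f(result, layers[1]), and so on.
        return reduce(lambda val, layer: layer(val), self.layers, x)

class Optimizer:
    def __init__(self, params, lr=0.5):
        self.params, self.lr = list(params), lr   # list() in case it's a generator

    def step(self):
        with torch.no_grad():
            for p in self.params: p -= p.grad * self.lr

    def zero_grad(self):
        for p in self.params: p.grad.data.zero_()
```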
So in the optimizer, we're going to pass in the parameters that we want to optimize and the learning rate, and we're just going to store them away; and since the parameters might be a generator, we call list to turn them into a list. So we create our optimizer, passing in model.parameters(), which has been automatically constructed for us by nn.Module. And here's our new loop: now we don't have to do any of that stuff manually, we can just say opt.step() (that's going to call this) and opt.zero_grad() (that's going to call this). There it is: we've now built our own SGD optimizer from scratch.

So I think this is really interesting: these things which seem like they must be big and complicated, once we have this nice structure in place, don't take much code at all. An SGD optimizer is all very transparent, simple, clear. If you're having trouble using complex library code that you've found elsewhere, this can be a really good approach: go all the way back, remove as many of these abstractions as you can, and run everything by hand to see exactly what's going on. It can be really freeing to see that you can do all this. Anyway, since PyTorch has this for us in torch.optim, it's got optim.SGD, and just like our version you pass in the parameters and you pass in the learning rate; so you can really see it is just the same. So let's define something called get_model, which is going to return the model (the sequential model) and the optimizer for it. So we go model, opt = get_model(), and then we can call the loss function to see where it's starting. And then we can write our training loop again: go through each epoch; go through each starting point for our batches; grab the slice; slice into our x and y in the training set; calculate our predictions; calculate our loss; do the backward pass; do the optimizer step; do the zero grad; and print out how you're going at the end of each epoch. And there we go.

All right, so let's keep making this simpler; there's still too much code. One thing we could do is replace these lines of code with one line of code by using something we'll call the Dataset class. The Dataset class is just something that we're going to pass our independent and dependent variables, and we'll store them away as self.x and self.y. If you define dunder len, __len__, that's the thing that allows the len function to work, so the length of the dataset will just be the length of the independent variables. And dunder getitem, __getitem__, is the thing that will be called automatically any time you use square brackets in Python; so that's just going to call this function, passing in the indices that you want. So when we grab some items from our dataset, we're going to return a tuple of the x values and the y values, and then we'll be able to do this.

So let's create a dataset using this tiny little three-line class: one dataset containing the x and y training data, and another dataset containing the x and y validation data, and we'll call those two datasets train_ds and valid_ds. Let's check the lengths of those datasets; they should be the same as the lengths of the x's, and they are. And so now we can do exactly what we hoped we could do: we can say xb, yb = train_ds[...] and pass in some slice. Let's check the shapes are correct: the x's should be 5 by 784 (that is, 5 by 28 times 28), and the y's should just be 5. And here they are, the x's and the y's.
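That three-line Dataset class, as a sketch (x_train and friends are assumed from earlier in the notebook):

```python
class Dataset():
    def __init__(self, x, y): self.x, self.y = x, y
    def __len__(self): return len(self.x)                    # enables len(ds)
    def __getitem__(self, i): return self.x[i], self.y[i]    # enables ds[i]

train_ds, valid_ds = Dataset(x_train, y_train), Dataset(x_valid, y_valid)
xb, yb = train_ds[0:5]
print(xb.shape, yb.shape)    # torch.Size([5, 784]) torch.Size([5])
```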
So that's nice: we've created a Dataset from scratch, and again, it's not complicated at all. If you look at the actual PyTorch source code, this is basically all datasets do. So let's try it: we call get_model, and now we've replaced our dataset lines with this one, and as per usual it still runs. And this is what I do when I'm writing code: I try to always make sure that my starting code works as I refactor. So you can see all the steps, and somebody reading my code can then see exactly why I'm building everything I'm building, how it all fits in, and that it still works; and I can also keep it clear in my own head. So I think this is a really nice way of implementing libraries as well.

All right, so now we're going to replace these two lines of code with this one line of code. We're going to create something called a DataLoader, and a DataLoader is something that's just going to do this. Okay, so we need to create an iterator. An iterator is a class that has a dunder iter, __iter__, method. When you say "for ... in ..." in Python, behind the scenes it's actually calling __iter__ to get a special object which it can then loop through, using yield. So a DataLoader is something that's going to have a dataset and a batch size, because we're going to go through the batches and grab one batch at a time. So we store away the dataset and the batch size, and when the for loop calls __iter__, we do exactly what we saw before: go through the range, just like we did before, and yield that bit of the dataset. And that's all; that's a DataLoader. So we can now create a train data loader and a valid data loader from our train dataset and valid dataset. And if you remember the way you can get one thing out of an iterator, you don't need a for loop: you can just say iter (that will also call __iter__) and then next to grab one value from it. So we'll run this, and you can see we've now confirmed that xb is 50 by 784, and there's yb. And then we can check what it looks like: let's grab the first element of our x batch, make it 28 by 28, and there it is.

So now that we've got a DataLoader, again we can grab our model, and we can simplify our fit function to just go "for xb, yb in train_dl". So this is getting nice and small, don't you think? And it still works the same way. Okay, so this is really cool, and now that it's nice and concise, we can start adding features to it. One feature I think we should add is that each time we go through the training set, it should be in a different order; the order should be randomized. So instead of always just going through these indexes in order, we want some way to use random indexes. The way we can do that is to create a class called Sampler. What Sampler is going to do, I'll show you: if we create a Sampler without shuffle, without randomizing, it's simply going to return all the numbers from zero up to n, in order, and it'll be an iterator (see, this is dunder iter). But if I do want it shuffled, then it will randomly shuffle them. So here you can see I've created a Sampler without shuffle; if I make an iterator from that and print a few things from it, you can see it's just printing out the indexes it's going to want. Or I can do exactly the same thing, as we learned earlier in the course, using islice: we can grab the first five. So here are the first five things from a sampler when it's not shuffled, and as you can see, they're just the indexes, in order.
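A sketch of that simple DataLoader and Sampler (variable names are assumptions):

```python
import random
from itertools import islice

class DataLoader():
    def __init__(self, ds, bs): self.ds, self.bs = ds, bs
    def __iter__(self):
        # Yield one contiguous slice of the dataset per batch.
        for i in range(0, len(self.ds), self.bs): yield self.ds[i:i+self.bs]

class Sampler():
    def __init__(self, ds, shuffle=False): self.n, self.shuffle = len(ds), shuffle
    def __iter__(self):
        res = list(range(self.n))
        if self.shuffle: random.shuffle(res)
        return iter(res)

ss = Sampler(train_ds)
print(list(islice(iter(ss), 5)))    # [0, 1, 2, 3, 4]
```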
So we could add shuffle=True, and now that's going to call random.shuffle, which just randomly permutes them; and now if I do the same thing, I've got random indexes into my source data. So why is that useful? Well, what we can now do is create something called a BatchSampler, and what the BatchSampler is going to do is basically this islice thing for us. We say: pass in a sampler (that's something that generates indices) and pass in a batch size, and (remember, we've looked at chunking before) it's going to chunk that iterator by that batch size. So if I say: please take our sampler and create batches of four, as you can see, it's creating batches of four indices at a time. So rather than just looping through the indexes in order, I can now loop through this batch sampler.

So we're going to change our DataLoader so that it now takes a batch sampler. It's going to loop through the batch sampler, which gives us indices, and then we get the dataset item for each index in the batch. That gives us a list, and then we have to stack all of the x's and all of the y's together into tensors. So I've created something here called a collate function, and we default it to this little function here, which grabs our batch, pulls out the x's and y's separately, and then stacks them up into tensors.

So if we put all that together: we can create a training sampler, which is a batch sampler over the training set with shuffle=True, and a validation sampler, which is a batch sampler over the validation set with shuffle=False. Then we can pass those into this DataLoader class, along with the training dataset and the collate function (which we don't really need, because we're just using the default one, so I guess we can get rid of that). And so now, there we go: we can do exactly the same thing as before. xb, yb = next(iter(...)), and this time we use the valid data loader and check the shapes. And this is how PyTorch's actual data loaders work; these are all the pieces they have: they have samplers, they have batch samplers, they have a collation function, and they have data loaders. So remember, what I want you to be doing for your homework is experimenting with these carefully, to see exactly what each thing's taking in.
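Sketched out, and building on the Sampler and Dataset sketches above, those pieces look something like this (chunking is done here with plain itertools, standing in for the course's helper):

```python
from itertools import islice

class BatchSampler():
    def __init__(self, sampler, bs): self.sampler, self.bs = sampler, bs
    def __iter__(self):
        it = iter(self.sampler)
        # Chunk the stream of indices into lists of bs at a time.
        while batch := list(islice(it, self.bs)): yield batch

def collate(b):
    xs, ys = zip(*b)                          # transpose the [(x, y), ...] pairs
    return torch.stack(xs), torch.stack(ys)   # one tensor of x's, one of y's

class DataLoader():
    def __init__(self, ds, batches, collate_fn=collate):
        self.ds, self.batches, self.collate_fn = ds, batches, collate_fn
    def __iter__(self):
        for b in self.batches:
            yield self.collate_fn([self.ds[i] for i in b])

train_samp = BatchSampler(Sampler(train_ds, shuffle=True),  50)
valid_samp = BatchSampler(Sampler(valid_ds, shuffle=False), 50)
train_dl = DataLoader(train_ds, train_samp)
valid_dl = DataLoader(valid_ds, valid_samp)
```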
Okay, so someone is asking in the chat what this collate thing is doing. So: the collate function defaults to collate; what does it do? Well, let's go through each of the steps. When we've got a batch sampler (let's use just the valid sampler), we're going to go through each thing in the batch sampler. So let's grab one thing from it with next(iter(...)). Okay, so here's what the batch sampler contains: just the first 50 indices, not surprisingly, because this is our validation sampler; if we did the training sampler, it would be randomized. There they are. So then what we do is self.dataset[i] for i in b. Let's copy and paste that; and rather than self.dataset[i], we'll just say valid_ds[i]. Oh, and it's not "i in b", it's "i in o", that's what we called it. Oh, and we did it for training; sorry, training. Okay, so what it's created here is a list of tuples of tensors, I think. Let's have a look: we'll call it p, whatever. So p[0] is a tuple; it's got the x and the y, the independent and dependent variables. That's not what we want. What we want is something we can loop through: we want batches.

So what the collate function is going to do is take all of our x's and all of our y's and collect them into two tensors: one tensor of x's and one tensor of y's. The way it does that is it first of all calls zip. zip is a very, very commonly used Python function; it's got nothing to do with the compression program zip. What it does is effectively allow us to transpose things, so that now, as you can see, we've got all of the index-one elements together and all of the index-zero elements together. And then we can stack those all up, and that gives us our x's and y's for our batch. So that's what collate does. The collate function is used an awful lot in PyTorch; increasingly nowadays, Hugging Face stuff uses it a lot, and so we'll be using it a lot as well. Basically, it's the thing that lets us customize how the data we get back from our dataset, once it's generated a list of things, gets put together into a bunch of things that our model can take as inputs, because that's really what we want here. So that's what the collation function does.

Oh, this is the wrong way around; like so. Storing attributes in the constructor is something I do so often that fastcore has a quick little shortcut for it, called store_attr (store attributes). If you just put that in your dunder init, then you only need one line of code and it does exactly the same thing. So there's a little shortcut, as you see, and you'll see that quite a bit.

All right, let's have a seven-minute break, and see you back here very soon. We're going to look at a multiprocessing data loader, and then we'll have nearly finished this notebook. All right, see you soon.

All right, let's keep going. So, we've seen how to create a data loader and how to sample from it. The PyTorch DataLoader works exactly like this, but it uses a lot more code, because it implements multiprocessing. Multiprocessing means that this thing here, that code, can be run in multiple processes; it can be run in parallel for multiple items. This code, for example, might be opening up a JPEG, rotating it, flipping it, etc., because remember, this is just calling dunder getitem for a dataset; that could be doing a lot of work for each item, and we're doing it for every item in the batch. So we'd love to do those all in parallel. I'll show you a very quick and dirty way that basically does the job.

Python has a multiprocessing library. It doesn't work particularly well with PyTorch tensors, so PyTorch has created an exact re-implementation of it: it's identical API-wise, but it does work well with tensors. So we'll just grab that multiprocessing. This is not quite cheating, because multiprocessing is in the standard library and this is API-equivalent, so I'm going to say we're allowed to do that. As we've discussed, when we use square brackets on a class, it's actually identical to calling the dunder getitem function on the object. So you can see here: if we say give me items 3, 6, 8 and 1, it's the same as calling __getitem__, passing in 3, 6, 8 and 1.
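A tiny demonstration of that equivalence (using the train_ds sketch from earlier, whose tensors support indexing with a list):

```python
xb1, yb1 = train_ds[[3, 6, 8, 1]]              # square brackets...
xb2, yb2 = train_ds.__getitem__([3, 6, 8, 1])  # ...are sugar for __getitem__
assert (xb1 == xb2).all() and (yb1 == yb2).all()
```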
Now why does this matter? Well, I'll show you why it matters: because we're going to be able to use map. Map is a really important concept; you might have heard of MapReduce. We've already talked about reductions and what those are, and maps are kind of the other key piece. map is something which takes a sequence and calls a function on every element of that sequence. So imagine we had a couple of batches of indices, [3, 6] and [8, 1]: then we can call __getitem__ on each of those batches. That's what map does; map calls this function on every element of this sequence. And so that gives us the same stuff as before, but now batched into two batches.

Now why do we want to do that? Because multiprocessing has something called Pool, where you can tell it how many workers, how many processes, you want to run, and it then has a map which works just like the normal Python map, but runs the function in parallel over the items from the iterator. So this is how we can create a multiprocessing data loader. Here we're creating our data loader (and again, we don't actually need to pass in the collate function, because we're using the default one). So if we say n_workers=2 and create that, then when we say next, see how it takes a moment? It took a moment because it was firing off those two workers in the background. So the first batch actually comes out more slowly; but the reason we would use a multiprocessing data loader is that if this is doing a lot of work, we want it to run in parallel, and even though the first item might come out a bit slower, once those processes are fired up it's going to be faster. So this is a really simplified multiprocessing data loader. Because this needs to be super efficient, PyTorch has lots more code than this to make it much more efficient, but the idea is the same, and this is actually a perfectly good way of experimenting with, or building, your own data loader to make things work exactly how you want.
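A quick-and-dirty sketch of that multiprocessing DataLoader. Note that because the Dataset sketch above can take a whole list of indices at once, each mapped call already comes back as a collated batch:

```python
import torch.multiprocessing as mp   # API-equivalent to stdlib multiprocessing

class DataLoader():
    def __init__(self, ds, batches, n_workers=1):
        self.ds, self.batches, self.n_workers = ds, batches, n_workers
    def __iter__(self):
        with mp.Pool(self.n_workers) as ex:
            # Run ds.__getitem__ on each batch of indices, in parallel.
            yield from ex.map(self.ds.__getitem__, iter(self.batches))

dl = DataLoader(train_ds, train_samp, n_workers=2)
xb, yb = next(iter(dl))   # first batch is slower while the workers spin up
```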
So now that we've re-implemented all this from PyTorch, let's just grab the PyTorch versions, and as you can see, they're exactly the same. Their data loader doesn't have one Sampler class that you pass shuffle to; they have two separate classes, called SequentialSampler and RandomSampler. I don't know why they do it that way; it seems a little more work to me, but it's the same idea. And they've got BatchSampler, so it's exactly the same idea: the training sampler is a BatchSampler with a RandomSampler, and the validation sampler is a BatchSampler with a SequentialSampler, passing in batch sizes. And we can now pass those samplers to the DataLoader (this is now the PyTorch DataLoader), and just like ours, it also takes a collate function. And it works. Cool. So, as you can see, it's doing exactly the same stuff that ours is doing, with exactly the same API.

And it's got some shortcuts, as I'm sure you've noticed when you've used data loaders. For example, creating a batch sampler is going to be very, very common, so you can actually just pass the batch size directly to a DataLoader and it will auto-create the batch sampler for you; you don't have to pass in a batch sampler at all. Instead, you can just say sampler, and it will automatically wrap that in a batch sampler for you, which does exactly the same thing. And in fact, because it's so common to create a random sampler or a sequential sampler for a dataset, you don't have to do that manually either: you can just pass shuffle=True or shuffle=False to the DataLoader, and that, again, does exactly the same thing. There it is.

Now, something that is very interesting: when you think about it, the batch sampler and the collation function are things which take the result of the sampler, loop through it, and then collate the results together. But because our datasets know how to grab multiple indices at once, we can actually just use the batch sampler as a sampler; we don't have to loop through the indices and collate them, because they basically come pre-collated. This is a trick which Hugging Face stuff can use as well, and we'll be seeing it again. So this is an important thing to understand: how come we can pass a batch sampler as the sampler, and what's it doing? Rather than trying to look through the PyTorch code, I suggest going back to our non-multiprocessing, pure Python code to see exactly how that would work, because it's a really nifty trick for things you can grab multiple items from at once, and it can make your code a lot faster.

Okay, so now that we've got all that nicely implemented, we should add a validation set. There's not really too much to talk about here: we'll just take our fit function (this is exactly the same code that we had before) and add something which goes through the validation set, gets the predictions, sums up the losses and accuracies, and from time to time prints out the loss and accuracy. And get_dls we will implement using the PyTorch DataLoader now. So our whole process will be: get_dls, passing in the training and validation datasets. Notice that for our validation data loader I'm doubling the batch size: it doesn't have to do backpropagation, so it should use about half as much memory, and I can therefore use a bigger batch size. Then get our model, and call fit. And now it's printing out the loss and accuracy on the validation set, so finally we actually know how we're doing, which is that we're getting 97% accuracy on the validation set; and that's on the whole thing, not just on the last batch.

So that's cool: we've now implemented a proper, working, sensible training loop. It's still a bit more code than I would like, but it's not bad, and every line of code in there, and every line of code it's calling, is stuff that we have built or reimplemented ourselves. So we know exactly what's going on, and that means it's going to be much easier for us to create anything we can think of; we don't have to rely on other people's code. So hopefully you're as excited about that as I am, because it really opens up a whole world for us.
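A sketch of get_dls using the PyTorch DataLoader (the doubled validation batch size is the trick just mentioned):

```python
from torch.utils.data import DataLoader

def get_dls(train_ds, valid_ds, bs, **kwargs):
    # Validation doesn't need gradients, so it can afford twice the batch size.
    return (DataLoader(train_ds, batch_size=bs,   shuffle=True,  **kwargs),
            DataLoader(valid_ds, batch_size=bs*2, shuffle=False, **kwargs))

train_dl, valid_dl = get_dls(train_ds, valid_ds, bs=50)
```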
So one thing we're going to want to be able to do, now that we've got a training loop, is to grab data, and there's a really fantastic library of datasets available on Hugging Face nowadays. So let's look at how we use those datasets, now that we know how to bring things into data loaders and such, so that we can use the entire world of Hugging Face datasets with our code. You need to pip install datasets, and once you've installed datasets you'll be able to say "from datasets import ..." and import a few things; just these two for now: load_dataset and load_dataset_builder. And we're going to look at a dataset called Fashion-MNIST. The way things tend to work with Hugging Face is there's something called the Hugging Face Hub, which has models, and it has datasets, amongst other things. Generally you'll give them a name, and you can then say, in this case, load a dataset builder for fashion_mnist.

Now, a dataset builder is just basically something which has some metadata about this dataset. The dataset builder has a .info, and the .info has a .description, and here's the description. As you can see, we've again got 28 by 28 grayscale images, so it's going to be very familiar to us, because it's just like MNIST; and again we've got 10 categories, 60,000 training examples, and 10,000 test examples. So this is cool; as it says, it's a direct drop-in replacement for MNIST. The dataset builder will also tell us what's in this dataset. Hugging Face stuff generally uses dictionaries rather than tuples, so there's going to be an 'image' of type Image, and a 'label' of type ClassLabel; there are 10 classes, and these are the names of the classes. So it's quite nice that in Hugging Face datasets we can get this information directly. It also tells us if there are some recommended training/test splits, so we can find those out as well: this is the size of the training split, and the number of examples.

So now that we're ready to start playing with it, we can load the dataset. Okay, so this is a different string: load_dataset versus load_dataset_builder. This will actually download it, cache it, and here it is. It creates a dataset dictionary. A DatasetDict, if you've used fastai, is basically just like what we call the Datasets class: it's a dictionary that contains, in this case, a train and a test item, and those are datasets, very much like the datasets that we created in the previous notebook. So we can grab the training and test items from that dictionary and just pop them into variables, and then have a look at the index-zero thing in train. And just like we were promised, it contains an image and a label. So as you can see, we're not getting tuples any more; we're getting dictionaries containing the x and the y, in this case 'image' and 'label'.

So, I'm going to get pretty bored writing 'image' and 'label' as strings all the time, so I'm just going to store them as x and y: x is going to be the string 'image', and y will be the string 'label'. I guess the other way I could have done that would have been to say "x, y = train.features"; that would probably be a bit neater, because it's coming straight from the features. (If you iterate over a dictionary, you get back its keys; that's why that works.) Anyway, I've done it manually here, which is a bit sad, but there you go.

Okay, so from train[0], which we've already seen, we can grab the x, i.e. the image, and there it is. We could grab the first five images and the first five labels, for example, and there they are. Now, we already know the names of the classes, so we can see what these labels map to by grabbing those features. There's a special Hugging Face class here; most libraries have something like this, including fastai. It has something called int2str, which is going to take these integers and convert them to these strings. So if I call it on our y batch, you'll see the first is 'Ankle boot' (and that is indeed an ankle boot), then we'll have a couple of T-shirts and a dress.
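A sketch of that exploration, using the datasets library calls just described (printed values are illustrative):

```python
from datasets import load_dataset, load_dataset_builder

name = "fashion_mnist"
builder = load_dataset_builder(name)
print(builder.info.description)       # 28x28 grayscale, drop-in MNIST replacement
print(builder.info.features)          # {'image': Image(...), 'label': ClassLabel(...)}

dsd = load_dataset(name)              # downloads, caches, returns a DatasetDict
train, test = dsd['train'], dsd['test']
print(train[0])                       # {'image': <PIL image>, 'label': 9}
print(train.features['label'].int2str(train[0]['label']))   # e.g. 'Ankle boot'
```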
So how do we use this to train a model? Well, we're going to need a data loader, and for now we're going to do it just like we've done before. Well, actually, we're going to do something a bit different: our collate function is going to return a dictionary. This is pretty common for Hugging Face stuff, and PyTorch doesn't mind; it's happy for you to return a dictionary from a collation function. So rather than returning a tuple of the stacked-up tensors (hopefully this looks very familiar; it looks a lot like the thing that goes through the dataset for each item and stacks them up, just like we did in the previous notebook), we're doing it all in one step here, in our collate function. And then, again, exactly the same thing for the y's: go through our batch and grab the y's. This is just stacking up integers, so we don't have to call stack. So we now have the image and label bits in our dictionary. If we create our data loader using that collate function and grab one batch, batch[x].shape is 16 by 1 by 28 by 28, and there's our batch[y].

The thing to notice here is that we haven't done any transforms, or written our own dataset class, or anything: we're putting all the work directly into the collate function. So this is a really nice way to skip all of the abstractions of your framework, if you want to: you can just do all of your work in collate functions. It's going to pass you each item, so you get the batch directly; you just go through each item, and here we're saying: grab the x key from that dictionary, convert it to a tensor, do that for everything in the batch, and then stack them all together. So this can be quite a nice way to do things very manually, without having to think too much about a framework, particularly if you're doing really custom stuff.

Having said that, Hugging Face datasets absolutely lets you avoid doing everything in the collate function, and if we want to create really simple applications, that's where we're going to eventually want to head. We can do this using a transform instead. The way we do that is we create a function which takes our batch and replaces the x in the batch with the tensor version of each of those PIL images (I'm not even stacking them or anything), and then returns the batch. Hugging Face datasets has something called with_transform, which takes your Hugging Face dataset and applies this function to every element. And it doesn't run it now: behind the scenes, when __getitem__ is called, it will call this function on the fly. In other words, this could include data augmentation, which can be random or whatever, because it's going to be re-run every time you grab an item; it's not cached or anything like that. Other than that, this dataset has exactly the same API as any other dataset: it has a length, it has a __getitem__, so you can pass it to a data loader. And PyTorch already knows how to collate dictionaries of tensors, which is what we've now got, so we don't need a collate function any more. I can create a data loader from this without a collate function, as you can see, and it gives exactly the same thing as before, but without having to create a custom collate function.
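A sketch of that transform-based approach (torchvision's to_tensor is assumed here as the PIL-to-tensor conversion):

```python
import torchvision.transforms.functional as TF
from torch.utils.data import DataLoader

x, y = 'image', 'label'    # the feature names in this dataset

def transforms(b):
    # Runs lazily, on the fly, each time items are fetched (so it could
    # include random augmentation); note that it must return the batch.
    b[x] = [TF.to_tensor(o) for o in b[x]]
    return b

tds = train.with_transform(transforms)
dl = DataLoader(tds, batch_size=16)   # default collation handles dicts of tensors
b = next(iter(dl))
print(b[x].shape, b[y].shape)         # torch.Size([16, 1, 28, 28]) torch.Size([16])
```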
Now, even this is a bit more code than I want: having to return the batch seems a bit silly. But the reason I had to do this is because Hugging Face datasets expects the with_transform function to return the new version of the data. I wanted to be able to write it like this, transforming in place, just saying the change I want to make, and have it automatically return the result. So if I create this function, exactly the same as the previous one but without the return, how would I turn it into something which does return the result? Here's an interesting trick: we could take that function and pass it to another function, to create a new function which is the version of the in-place function that returns the result. The way I do that is by creating a function called inplace. It takes a function and returns a function; the function it returns is one that calls my original function and then returns the result. So this is a function-generating function, and it's modifying an in-place function to become a function that returns the new version of the data. This function is passed to this function, which returns a function; and here it is, the version that Hugging Face will be able to use. So I can now pass that to with_transform, and it does exactly the same thing.

This is very, very common in Python; so common that this line of code can be entirely removed and replaced with this little token. If you have a function, you can put @ followed by it before another function, and what that says is: take this whole function, pass it to this function, and replace it with the result. So this is exactly the same as the combination of this and this; and when we do it this way, this little bit of syntax sugar is called a decorator. There's nothing magic about decorators; it's literally identical to the above. I guess the only difference is we don't end up with the unnecessary intermediate underscore version, but the result is exactly the same. And therefore I can create a transformed dataset by using this, and there we go, it's all working fine. Yeah, so none of this is particularly necessary, but what we're doing is seeing the pieces we can put in place to make this stuff as easy as possible, so we don't have to think about things too much.
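A sketch of that inplace trick and its decorator form, building on the previous block (the function name transformi is an assumption):

```python
def inplace(f):
    def _f(b):
        f(b)           # run the original, which mutates the batch in place...
        return b       # ...then return it, as with_transform requires
    return _f

@inplace               # same as: transformi = inplace(transformi)
def transformi(b): b[x] = [TF.to_tensor(o) for o in b[x]]

tdsf = train.with_transform(transformi)   # behaves exactly like before
```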
All right. Now, with all this, we can basically make things pretty automatic, and the way we do that is with a cool thing in Python called itemgetter. itemgetter is a function that returns a function (hopefully you're getting used to this idea now). This creates a function that gets the 'a' and 'c' items from a dictionary, or something that looks like a dictionary. So here's a dictionary containing keys a, b and c; this function will take a dictionary and return the 'a' and 'c' values, and as you can see, it has done exactly that. I'll explain why this is useful in a moment. I just wanted to briefly mention what I meant when I said "something that looks like a dictionary". This here is a dictionary; but Python doesn't care what type things actually are, it only cares what they look like. And remember that when we call something with square brackets, when we index into something, behind the scenes it's just calling dunder getitem. So we could create our own class whose __getitem__ just manually returns 1 if k equals 'a', 2 if k equals 'b', or 3 otherwise; and look, that class also works just fine with an itemgetter. The reason this is interesting is that a lot of people write Python as if it's C++ or Java or something, as if it's this kind of statically typed thing; but I really wanted to point out that it's an extremely dynamic language, and there's a lot more flexibility than you might have realized. Anyway, that's a little aside.

So what we can do is think about a batch, for example, where we've got these two dictionaries. PyTorch comes with a default collation function called, not surprisingly, default_collate; that's part of PyTorch. What default_collate does with dictionaries is simply take the matching keys, grab their values, and stack them together. So that's why, if I call default_collate, 'a' is now [1, 3] and 'b' is now [2, 4]. That's actually what happened before, when we created this data loader: it used the default collation function, which does that. It also works on things that are tuples rather than dictionaries, which is what most of you will have used before.

What we can do, therefore, is create something called collate_dict. It's going to take a dataset, and it's going to create an itemgetter function for the features in that dataset, which in this case is image and label; so this is a function which will get the image and label items. And we're going to return a function, and that function is simply going to call our itemgetter on default_collate. What this is going to do is take a dictionary and collate it into a tuple, just like we did up here. So if we run that, calling DataLoader on our transformed dataset and passing in (remember, this is a function that returns a function) a collation function for this dataset: there it is. So now this looks a lot like what we had in our previous notebook: it's not returning a dictionary, it's returning a tuple. This is a really important idea, particularly for working with Hugging Face datasets: they tend to do things with dictionaries, and most other things in the PyTorch world tend to work with tuples. So you can use this to convert anything that returns dictionaries into something that provides tuples, by passing it as the collation function to your data loader.

So remember, the thing you want to be doing this week is things like import pdb; pdb.set_trace(). Put breakpoints in, step through, see exactly what's happening; not just here, but, even more importantly, inside the innermost function. So then you can see... what did I do wrong there? Ah, it's set_trace. So then we can see exactly what's going on: print out b, list the code, and I can step into it, and look, I'm now inside the default_collate function, which is inside PyTorch, and I can now see exactly how it works. There it all is. It's going to go through, and this code is going to look very familiar, because we've implemented all this ourselves; except it's being careful to work for lots of different types of things: dictionaries, numpy arrays, and so on.
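Putting those pieces together as a sketch (the D class is a hypothetical dict-alike, and collate_dict is the helper just described):

```python
from operator import itemgetter
from torch.utils.data.dataloader import default_collate

d = dict(a=1, b=2, c=3)
ig = itemgetter('a', 'c')
print(ig(d))               # (1, 3)

class D:
    # Anything with __getitem__ "looks like" a dictionary to itemgetter.
    def __getitem__(self, k): return 1 if k == 'a' else 2 if k == 'b' else 3
print(ig(D()))             # (1, 3)

def collate_dict(ds):
    get = itemgetter(*ds.features)             # e.g. gets ('image', 'label')
    def _f(b): return get(default_collate(b))  # dict batch -> tuple batch
    return _f

dlf = DataLoader(tdsf, batch_size=16, collate_fn=collate_dict(tdsf))
xb, yb = next(iter(dlf))   # a tuple now, not a dictionary
```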
We've created a library called nbdev — really a whole system — which does exactly that: it creates modules you can use from your notebooks. The way it works is through special comment directives, which start with hash-pipe. You put #| export at the top of a cell, and it says "do something special with this cell" — specifically, "put this cell into a Python module for me, please". Which Python module? If you go all the way to the top of the notebook, you tell it what default export module to create; here it's going to create a module called datasets. Then at the very end of the notebook I've got a line that imports nbdev and calls its export, and what that does for me is create a Python library. It'll have a datasets.py in it, and everything we exported — here it is, collate_dict — will appear in that file. And that means that in future notebooks I'll be able to import collate_dict from my datasets module.

Now you might wonder: how does it know to call the library miniai? What's miniai? In nbdev you create a settings.ini file where you say what the name of your library is. We're going to be using this quite a lot now, because we're getting to the point where we're starting to implement things that didn't exist before. Previously, nearly everything we created I could say "that already exists in PyTorch, so we don't need ours — we'll just use PyTorch's". But now we're starting to create things that don't exist anywhere. We've created them ourselves, and we want to be able to use them again. So during the rest of this course we're going to be building together a library called miniai: our framework, our version of something like fastai — maybe something like what fastai 3 will end up being; we'll see. Once I start using miniai I'll show you exactly how to install it. So that's what this export directive is for — and you might have noticed I also had an export on the inplace function, and on the necessary import statements.

Okay. We want to be able to see what this dataset looks like, so I thought now's a good time to talk a bit about plotting, because knowing how to visualize things well is really important. And again, the rule is that we're not allowed to use fastai's plotting library, so we've got to learn to do everything ourselves. Here's the basic way to plot an image using matplotlib: create a batch, grab the x part of it, grab the very first thing in that, and call imshow, which shows an image. And here it is — there's our ankle boot.

So let's think about what we could create and export to make this a bit easier. Let's create something called show_image, which basically does imshow, plus a few extra things: we make sure the axes are in the correct order; we make sure the data is not on CUDA but on the CPU; if it's not a NumPy array, we convert it to one; we can pass in an existing axis (which we'll talk about soon) if we want to; we can set a title if we want to; and one last bit removes all those ugly 0, 5, 10... tick labels — we're showing an image, so we don't want any of that. If we try it, there we go — and we've also been able to say what size we want the image.
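Here's a rough sketch of what a show_image like that might look like — the exact argument handling in the notebook may differ, and the commented usage line at the end is an assumption:

```python
import numpy as np
import matplotlib.pyplot as plt

def show_image(im, ax=None, figsize=None, title=None, **kwargs):
    "Show a tensor or array as an image, without the axis clutter. A sketch."
    if hasattr(im, 'cpu'): im = im.cpu()          # move off the GPU if needed
    if hasattr(im, 'permute') and im.ndim == 3 and im.shape[0] < 5:
        im = im.permute(1, 2, 0)                  # CHW -> HWC, the order imshow wants
    if not isinstance(im, np.ndarray): im = np.array(im)
    if ax is None: _, ax = plt.subplots(figsize=figsize)
    ax.imshow(im, **kwargs)
    if title is not None: ax.set_title(title)
    ax.axis('off')                                # hide the 0/5/10... tick labels
    return ax

# show_image(xb[0].view(28, 28), figsize=(2, 2), cmap='gray')  # assumed usage
```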
Now here's something interesting: when I say help(show_image), the help shows the parameters I implemented — but it also shows a whole lot more. How did that magic happen? And you can see they work, because here's figsize, which I didn't add... oh, sorry, I did add that one — bad example — but these other ones all work as well. The trick is that I added **kwargs here. **kwargs says: you can pass any other named arguments you like that aren't listed, and they'll all be put into a dictionary with this name. Then when I call imshow, I pass that entire dictionary along — ** at the call site means "pass these as separate named arguments" — and that's why it works. And how does it know what help to provide? fastcore has a special thing called delegates, which is a decorator — so now you know what a decorator is — and you tell it what you're going to be passing kwargs through to; here, imshow. It then automatically creates the documentation correctly, showing you exactly what kwargs can do. This is a really helpful way to extend existing functions like imshow, keeping all of their functionality and all of their documentation while adding your own. delegates is one of the most useful things we have in fastcore, in my opinion. So we export show_image, and now we can use it any time we want, which is nice.

Something really helpful to know about matplotlib is how to create subplots — for example, when you want to plot two images next to each other. In matplotlib, subplots creates multiple plots; you pass it the number of rows and the number of columns. This one, as you can see, has one row and two columns, and it returns axes. What matplotlib calls "axes" are the individual plots. So we call show_image on the first image, passing in axs[0], and inside it, ax.imshow puts the image on that subplot — they don't call it a subplot, unfortunately; they call it an axis. That's how we show one image on the first axis and a second image on the second axis — by which we mean subplot — and there are our two images. Pretty handy.

So I've decided to add some extra functionality to subplots, and therefore I use delegates on it, since I'm extending it: I take kwargs and pass them through to matplotlib's subplots. The main thing I wanted was for it to automatically create an appropriate figure size — you just tell it what image size you want — and I also wanted to be able to add a title for the whole set of subplots. And there it is. I also want to show you that nbdev will automatically create documentation for our library. Here it is: for the stuff I've added, it tells me exactly what each parameter is — its type, its default, and information about each one. That information comes automatically from these little inline comments, which we call docments. This is all done for us by fastcore and nbdev, and it's why, when you look at the fastai library documentation, it always has all this information. You don't actually have to call show_doc; it's automatically added to your documentation for you.
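To make that concrete, here's a minimal sketch of a delegates-decorated subplots wrapper. The figsize calculation and parameter names are assumptions in the spirit of what's described, and the inline parameter comments are in the docments style just mentioned:

```python
import matplotlib.pyplot as plt
from fastcore.meta import delegates

@delegates(plt.subplots)
def subplots(nrows=1,       # number of rows
             ncols=1,       # number of columns
             figsize=None,  # overall figure size; derived from imsize if None
             imsize=3,      # size of each subplot, in inches (assumed default)
             suptitle=None, # title for the whole figure
             **kwargs):
    "Like plt.subplots, with a figsize default and a figure-level title. A sketch."
    if figsize is None: figsize = (ncols * imsize, nrows * imsize)
    fig, ax = plt.subplots(nrows, ncols, figsize=figsize, **kwargs)
    if suptitle is not None: fig.suptitle(suptitle)
    return fig, ax

# fig, axs = subplots(3, 3, imsize=2, suptitle='Fashion-MNIST')
```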
I'm just showing you here what it's going to end up looking like, and you can see that it's worked with delegates: all the extra arguments delegates brought in are in here too, each listed out.

So anyway: subplots. Let's create a three-by-three set of plots and grab the first few images. subplots returns the axes as, basically, a list of three lists of three items, so I flatten them out into a single list; then we go through each of those subplots and each image, and show each image on each axis. That's a quick way to show them all — though as you can see it's a little bit ugly here, so we'll keep adding more useful plotting functionality.

Here's something that again delegates to our subplots: with it we can say, for example, how many subplots we want, and it will automatically calculate the rows and columns, and it removes the axes for any subplots we're not actually using. That's what get_grid is going to let us do. So we're getting quite close. And finally, why don't we create a single thing called show_images: it gets our grid, goes through our images — optionally with a list of titles — and shows each one. Using it here, you can see we have successfully got all of our labelled images.

I think all this plotting stuff is pretty useful, and as you may have noticed, it was all exported: in our datasets.py we've got get_grid, subplots, and show_images. That's going to make life easier for us, since we have to create everything from scratch — and now we have created all of those things. As I mentioned, at the very end we have that one line of code to run. Just to show you: if I delete miniai/datasets.py, so it's gone, and then run that line of code — now it's back, as you can see, and it tells you it's auto-generated.

All right. We are nearly at the point where we can build our Learner, and once we've built our Learner, we'll be able to really dive deep into training and studying models. We've nearly got all of our infrastructure in place. Before we do, though, there are some pieces of Python that not everybody knows, and some computer science concepts, that I want to talk about — that's what notebook 06, "foundations", is about. This whole section just covers some Python that you may not have come across before — or maybe it's review for some of you — and it's all stuff we'll be using in the next notebook, which is why I want to cover it now.

We're going to be creating a Learner class: a very general-purpose training loop which we can get to do anything we want it to do, and we're going to be creating things called callbacks to make that happen. So let's spend a few moments on what callbacks are, how they're used in computer science, and how they're implemented, and look at some examples, because they come up a lot. Perhaps the most common place you see callbacks in software is for GUI events — events from a graphical user interface. The main GUI library in Jupyter notebooks is called ipywidgets, and we can create a widget, like a button, like so — and when we display it, it shows me a button.
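Here's a minimal sketch of that button, together with the click callback we're about to attach to it (the description text is an assumption):

```python
import ipywidgets as widgets

w = widgets.Button(description='Click me')

def f(o):
    # `o` is the button instance that was clicked
    print('hi')

w.on_click(f)  # register f as the callback for the click event
# display(w)   # in a notebook cell, this shows the (now live) button
```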
At the moment the button doesn't do anything when I click it. What we can do, though, is add an on-click callback to it: we pass it a function which is called when you click. So let's define that function, f; then w.on_click(f) assigns f as the on-click callback. Now if I click the button — there you go, it's working. So what does that mean? A callback is simply a callable that you've provided — and remember, a callable is a more general version of a function. In this case it's a function you've provided that will be called back to when something happens; here, the something is clicking the button. That's how we define and use a callback for a GUI event. Basically everything in ipywidgets works this way: if you want to build your own graphical user interfaces for Jupyter, you can do it with ipywidgets and these callbacks. These particular kinds of callbacks are called events, but an event is just a callback.

All right, that's somebody else's callback — let's create our own. Say we've got some very slow calculation: it takes a long time to add up the squares of the numbers from 0 up to 5, because we sleep for a second after each one. Let's run our slow calculation... still running... come on, finish... there we go: the answer is 30. Now, for a slow calculation like that — and training a model is a slow calculation — it would be nice to do things like print out the loss from time to time, or show a progress bar, or whatever. Generally, for those kinds of things, we'd like to define a callback that is called at the end of each epoch, or each batch, or every few seconds, or something like that. Here's how we can modify our slow_calculation routine so you can optionally pass it a callback: all the code is the same, except for one added line that says "if there's a callback, call it, passing in where we're up to". Then we create our callback function — just like we created f — a show_progress function that tells us how far we've got. Now if we call slow_calculation passing in our callback, you can see it calls the function at the end of each step. So here we've created our own callback. And there's nothing special about a callback: it doesn't require its own syntax, it's not a new concept — it's just an idea, the idea of passing in a function which some other function will call at particular times, such as at the end of a step, or when you click a button.

We don't even have to define the function ahead of time; we can define it at the same moment we call slow_calculation, using a lambda. As we've discussed before, a lambda defines a function but doesn't give it a name. Here's one that takes one parameter and prints exactly the same message as before — the same thing, but done with a lambda. We can make it more sophisticated now: rather than always saying "we've finished epoch whatever", we can let you pass in an exclamation and print that instead, and then have our lambda call that function. And one of the things we can do now is — again — create a function that returns a function: we can create a make_show_progress function.
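Here's a sketch of that pattern — slow_calculation with an optional callback, called first with a named function and then with an equivalent lambda; the exact message text is an assumption:

```python
import time

def slow_calculation(cb=None):
    "Sum of squares of 0..4, slowly; call `cb` (if given) after each step."
    res = 0
    for i in range(5):
        res += i * i
        time.sleep(1)
        if cb: cb(i)  # the one extra line: invoke the callback with our progress
    return res

def show_progress(epoch):
    print(f"Awesome! We've finished epoch {epoch}!")

slow_calculation(show_progress)                                           # named callback
slow_calculation(lambda o: print(f"Awesome! We've finished epoch {o}!"))  # same, inline
```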
You pass make_show_progress the exclamation, and it creates and returns — there's actually no need to give the inner function a name first; we can return it directly — a function that uses that exclamation. Here we pass in "Nice", and it's exactly the same as what we did before. And instead of a lambda we can write a named inner function, like this; that's a function returning a function too, and it does exactly the same thing. So that's one way with a lambda and one way without. One of the reasons I wanted to show you both is that we can do exactly the same thing using partial. partial does exactly what our make_show_progress does: it's going to call show_progress, passing in "OK I guess" as the first parameter. So this is again an example of a function that returns a function — a function that calls show_progress with that first argument filled in — and again it behaves identically. We use partial a lot, so it's certainly something worth spending time practicing.

Now, as we've discussed, Python doesn't much care about types, and nothing here requires cb to be a function. It just has to be a callable — something you can call — and, as we've discussed, another way of creating a callable is defining __call__. Here's a class that works exactly like make_show_progress, but as a class: there's an __init__ that stores the exclamation, and a __call__ that prints. So now we're creating an object which is callable and does exactly the same thing. These are all fundamental ideas I want you to get really comfortable with — __call__, dunder methods in general, partials, classes — because they come up all the time in PyTorch code, in the code we'll be writing, and in pretty much every framework, so it's really important to feel comfortable with them. And remember, you don't have to rely only on the resources we're providing: if some of this is very new to you, Google around for tutorials, or ask for help on the forums.

Then I'm just going to briefly re-cover something I've mentioned before, because it comes up a lot: *args and **kwargs. I just want to show you how they work. If we create a function that takes *args and **kwargs and nothing else, and have it just print them, then call it passing 3, 'a', and thing1='hello': the first two are passed by position — there's no name= in front of them — and positionally passed arguments are collected into *args, if you have one. It doesn't have to be called args; you can use any name after the star. As you can see, args is a tuple containing the positional arguments, and kwargs is a dictionary containing the named arguments. That's all *args and **kwargs do. And as I say, there's nothing special about those names — I can call them a and b and it does exactly the same thing. This comes up a lot, so it's important to remember that this is literally everything they're doing. Then, on the other hand, suppose we have a function g that takes a few ordinary parameters.
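Here's a sketch of the three equivalent callback-makers just described — a closure, a partial, and a callable class. The exclamation strings are assumptions, and the commented lines assume slow_calculation from the earlier sketch:

```python
from functools import partial

def show_progress(exclamation, epoch):
    print(f"{exclamation}! We've finished epoch {epoch}!")

# 1) A function that returns a function (a closure over `exclamation`)
def make_show_progress(exclamation):
    def _inner(epoch):
        print(f"{exclamation}! We've finished epoch {epoch}!")
    return _inner

# 2) The same thing via partial: fix the first argument of show_progress
cb2 = partial(show_progress, "OK I guess")

# 3) The same thing as a callable class, via __call__
class ProgressShowingCallback:
    def __init__(self, exclamation="Awesome"):
        self.exclamation = exclamation
    def __call__(self, epoch):
        print(f"{self.exclamation}! We've finished epoch {epoch}!")

# All three work identically with slow_calculation from the earlier sketch:
# slow_calculation(make_show_progress("Nice"))
# slow_calculation(cb2)
# slow_calculation(ProgressShowingCallback("Just super"))
```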
Let's have g just print its parameters directly: a, b, c. We can use the stars not only when defining a function, but also when calling one. Say I create something called args — again, it doesn't have to be called args — containing (1, 2), and something called kwargs containing the dictionary {'c': 3}. I can then call g(*args, **kwargs): it takes the 1 and 2 and passes them as individual positional arguments, and takes the c: 3 and passes it as the named argument c=3. And there it is. So those are two linked but different uses of * and **.

Okay, now here's a slightly different way of doing callbacks, which I really like. This time I'm passing in a callback that's not callable at all; instead it has a method called before_calc and another called after_calc. So my callback is now a class containing before_calc and after_calc methods, and if I run it, you can see it prints before and after every step, by calling those two methods. So a callback doesn't actually have to be a callable — a callback can be an object that contains methods. We could then have a version which, as you can see here, passes after_calc both the epoch number and the value it's up to — and by using *args and **kwargs in the method signature, I can just safely ignore them if I don't want them; they get swallowed up without complaint. Without them it won't work, see, because the caller passed val= and there's nothing in the signature looking for val=, so Python complains. That's one good use of *args and **kwargs: to eat up arguments you don't want. Or we can actually use the arguments — let's use epoch and val and print them out, and there it is: a more sophisticated callback that gives us status as we go. I'm going to skip this next bit, because we don't really need it.

So finally, let's review this idea of dunder, which we've mentioned before, just to really nail it home. Anything that looks like __something__ is special: it might be Python that looks for that special name, or PyTorch, or NumPy, but it's special. These are called dunder methods, and a number of them are defined as part of the Python data model. If you go to the Python documentation, it tells you about the various different ones: here's __repr__, which we used earlier; here's __init__, which we used earlier; they're all here. PyTorch has some of its own; NumPy has some of its own. For example, when Python sees +, what it actually does is call __add__. So if we want to create something that's not very good at adding things — it always adds an extra 0.01 — then SloppyAdder(1) + SloppyAdder(2) equals 3.01, because + is actually calling __add__. If you're not familiar with these, click on this data model link and read about the specific methods listed there, because we'll be using all of them in the course. I'll try to revise them as we go, but I'm generally going to assume that you know them. A particularly interesting one is getattr — we've seen setattr already.
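Here's a sketch of the method-based callback and the SloppyAdder example. The printed messages and the exact placement of the before/after hooks are assumptions consistent with the description above:

```python
class PrintStepCallback:
    "A callback as a class with methods, rather than as a callable."
    def before_calc(self, *args, **kwargs): print('About to start')
    def after_calc(self, epoch, val, **kwargs): print(f'After {epoch}: {val}')

def slow_calculation(cb=None):
    "Variant of the earlier sketch, with before/after hooks instead of one cb call."
    res = 0
    for i in range(5):
        if cb: cb.before_calc(i)
        res += i * i
        if cb: cb.after_calc(i, val=res)  # val= is swallowed by **kwargs if unused
    return res

slow_calculation(PrintStepCallback())

class SloppyAdder:
    "A number-ish thing whose + is off by 0.01, showing that + calls __add__."
    def __init__(self, o): self.o = o
    def __add__(self, b): return SloppyAdder(self.o + b.o + 0.01)
    def __repr__(self): return str(self.o)

print(SloppyAdder(1) + SloppyAdder(2))  # 3.01
```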
getattr is just the opposite. Take a look at this: here's a class that just contains two attributes, a and b, set to 1 and 2. I create an object of that class; a.b equals 2, because I set b to 2. Now, when you write a.b, that's basically just syntax sugar: what Python actually calls behind the scenes is getattr. It calls getattr on the object, so a.b is the same as getattr(a, 'b'). And this can be kind of fun, because you could call getattr on a with either 'a' or 'b' chosen at random — how's that for crazy? If I run this: 2, 1, 1, 1, 2 — as you can see, it's random. Python is such a dynamic language that you can literally set things up so you don't know which attribute is going to be accessed.

Behind the scenes, getattr actually uses something called __getattr__, and by default it uses the version in the object base class. Here's something just like A — it's got a and b defined — but it also defines __getattr__. __getattr__ is only called for attributes that haven't been defined yet, and it's passed the name of the attribute being looked up. Generally speaking, if the first character of the name is an underscore, it's going to be private or special, so in that case I just raise AttributeError; otherwise I hijack it and return "Hello from" plus the name. So b.a is defined, and gives me 1; b.foo is not defined, so Python calls __getattr__, and I get back "Hello from foo". This gets used a lot in both fastai code and Hugging Face code, often to make it more convenient to access things. So that's how the getattr function and the __getattr__ method work.
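Here's a sketch of all of that — dot access as getattr, the random-attribute trick, and a __getattr__ that handles undefined names; class and attribute names follow the description above:

```python
from random import choice

class A:
    a, b = 1, 2

a = A()
print(a.b)                             # 2
print(getattr(a, 'b'))                 # 2 -- dot access is sugar for getattr
print(getattr(a, choice(('a', 'b'))))  # randomly 1 or 2

class B:
    a, b = 1, 2
    def __getattr__(self, k):
        # Only called for attributes that aren't otherwise found on the object
        if k[0] == '_': raise AttributeError(k)
        return f'Hello from {k}'

b = B()
print(b.a)    # 1 -- defined, so __getattr__ never runs
print(b.foo)  # 'Hello from foo' -- undefined, so __getattr__ handles it
```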
Okay, I went over that pretty quickly, since I know for quite a few folks this will all be review — but if you haven't seen any of it before, it's a lot to cover. I'm hoping you'll go back over it, revise it slowly, experiment with it, look up some additional resources, and ask on the forum about anything that's not clear. Remember, everybody finds some parts of the course really easy and other parts completely unfamiliar. If this particular part is completely unfamiliar to you, it's not because it's harder; it just happens to be a bit you're less familiar with — maybe the calculus in the last lesson was that bit for you instead. There isn't really anything in the course that's intrinsically more difficult than the other parts; it just depends on the background you happen to have. If you spend a few hours studying and practicing, you'll be able to pick these things up. So don't stress if there are things you don't get right away — just take the time, and if you do get lost, please ask. As you've hopefully noticed if you've tried asking on the forum, people are really keen to help.

All right. I think this has been a pretty successful lesson. We've got to a point where we have a pretty nicely optimized training loop; we understand exactly what DataLoaders and Datasets do; we've got an optimizer; and we've been playing with Hugging Face datasets and got those working really smoothly. So we really feel like we're in a pretty good position to write our generic Learner training loop, and then we can start building and experimenting with lots of models. I look forward to seeing you next time, and to doing that together.

Okay, bye!