Lesson 14: Deep Learning Foundations to Stable Diffusion
Chapters
0:00 Introduction
0:30 Review of code and math from Lesson 13
7:40 f-Strings
10:00 Re-running the Notebook - Run All Above
11:0 Starting code refactoring: torch.nn
12:48 Generator Object
13:26 Class MLP: Inheriting from nn.Module
17:03 Checking the more flexible refactored MLP
17:53 Creating our own nn.Module
21:38 Using PyTorch’s nn.Module
23:51 Using PyTorch’s nn.ModuleList
24:59 reduce()
26:49 PyTorch's nn.Sequential
27:35 Optimizer
29:37 PyTorch’ optim and get_model()
30:4 Dataset
33:29 DataLoader
35:53 Random sampling, batch size, collation
40:59 What does collate do?
45:17 fastcore’s store_attr()
46:07 Multiprocessing DataLoader
50:36 PyTorch’s Multiprocessing DataLoader
53:55 Validation set
56:11 Hugging Face Datasets, Fashion-MNIST
61:55 collate function
64:41 transforms function
66:47 decorators
69:42 itemgetter
71:55 PyTorch’s default_collate
75:38 Creating a Python library with nbdev
78:53 Plotting images
81:14 kwargs and fastcore’s delegates
88:03 Computer Science concepts with Python: callbacks
93:40 Lambdas and partials
96:26 Callbacks as callable classes
97:58 Multiple callback funcs; *args and **kwargs
103:15 __dunder__ thingies
107:33 Wrap-up
00:00:00.400 |
Okay. Hi everybody. And welcome to lesson 14. The numbers are getting up pretty high now, huh? 00:00:06.320 |
We had a lesson last time talking about calculus and how we implement the chain rule 00:00:17.120 |
in neural network training in an efficient way called backpropagation. 00:00:24.640 |
I just wanted to point out that one excellent student, Kaushik Sinha, has produced a very nice 00:00:34.880 |
explanation of the code that we looked at last time, and I've linked to it. 00:00:43.520 |
The code's slightly different to what I had, but it's basically the same things and minor changes. 00:00:52.320 |
And it might be helpful to kind of link between the math and the code to see what's going on. 00:00:58.160 |
So you'll find that in the lesson 13 resources. 00:01:03.680 |
But I thought I'd just quickly try to explain it as well. 00:01:08.240 |
So maybe I could try to copy this and just explain what's going on here. 00:01:20.240 |
With this code. So the basic idea is that we have a neural network that is calculating, 00:01:31.200 |
well, a neural network and a loss function that together calculate the loss. 00:01:35.440 |
So let's imagine that, well, let's just call that the loss function, we'll call it L. 00:01:41.200 |
And the loss function is being applied to the output of the neural network. 00:01:47.680 |
So the neural network function, we'll call n, and that takes two things: the inputs and the weights. 00:01:54.960 |
The loss function also requires the targets, but I'm just going to ignore that for now 00:02:02.320 |
because it's not really part of what we actually care about. 00:02:05.040 |
And what we're interested in knowing is if we want to be able to update the weights. 00:02:11.040 |
Let's say this is just a single layer, to keep it simple. 00:02:14.240 |
If we want to be able to update the weights, we need to know how does the loss change 00:02:23.520 |
if we change the weights, if we change one weight at a time, if you like. 00:02:32.080 |
Well, what we could do is we could rewrite our loss function by saying, well, let's call 00:02:41.040 |
capital N the result of the neural network applied to the weights and the inputs. 00:02:48.720 |
And that way we can now rewrite the loss function to say L equals, big L equals 00:02:55.920 |
little L, the loss function applied to the output of the neural network. 00:03:02.240 |
And so maybe you can see where this is going. 00:03:06.240 |
We can now say, okay, the derivative of the loss with respect to the weights 00:03:11.280 |
is going to be equal to the derivative of the loss 00:03:16.240 |
with respect to the outputs of that neural network layer 00:03:20.960 |
times, this is the chain rule, the derivative of the outputs of that neural network layer 00:03:32.880 |
with respect to the weights. (I should really make my notation consistent, since these are not scalars.) 00:03:45.360 |
Right, so you can see we can get rid of those and we end up with the change in loss with respect 00:03:53.840 |
to the weights. And so we can just say this is a chain rule. That's what the chain rule is. 00:04:02.000 |
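Written out (loosely, since these are really tensor rather than scalar derivatives), the relationship being described is just the chain rule, with N = n(x, w) the layer output and L = l(N) the loss:

$$\frac{\partial L}{\partial w} \;=\; \frac{\partial L}{\partial N}\,\frac{\partial N}{\partial w},
\qquad
\frac{\partial L}{\partial x} \;=\; \frac{\partial L}{\partial N}\,\frac{\partial N}{\partial x}$$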
So the change in the loss with respect to the output of the neural network. 00:04:08.240 |
Well, we did the forward pass here and then we took here, this here is where we calculated 00:04:17.280 |
the derivative of the loss with respect to the output of the neural network, 00:04:26.640 |
which came out from here and ended up in diff. So there it is there. So out.g contains 00:04:36.640 |
this derivative. So then to calculate, let's actually do one more. We could also say 00:04:46.880 |
the change in the loss with respect to the inputs, we can do the same thing. 00:05:00.880 |
And so this time we have the inputs. So here you can see that is this line of code. 00:05:19.120 |
So that is the change in the loss with respect to the inputs. That's what input.g means. And it's 00:05:31.200 |
equal to the change in the loss with respect to the output. So that's what out.g means. 00:05:43.680 |
Times, and it's actually a matrix product because we're doing matrix calculus, times this derivative. 00:05:50.800 |
And since this is a linear layer we were looking at, this derivative is simply the weights themselves. 00:05:58.240 |
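As a rough sketch, the backward-pass code being referred to looks something like this, using the lesson 13 convention that a .g attribute holds the gradient of the loss with respect to that tensor (the exact notebook code may differ slightly):

```python
def lin_grad(inp, out, w, b):
    # backward pass of a linear layer, where out = inp @ w + b
    inp.g = out.g @ w.t()        # dL/dinp: out.g times the (transposed) weights
    w.g   = inp.t() @ out.g      # dL/dw:   inputs (transposed) times out.g
    b.g   = out.g.sum(0)         # dL/db:   sum out.g over the batch dimension
```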
The same goes for w.g, which is the derivative of the loss with respect to the weights. 00:06:10.640 |
And so again you've got the same thing. You've got your out.g and remember we actually showed 00:06:14.320 |
how we can simplify this into also a matrix product with a transpose as well. So that's how 00:06:20.000 |
what's happening in our code is mapping to the math. So hopefully that's useful but as I say do 00:06:30.800 |
check out this really nice resource which has a lot more detail if you're interested in digging 00:06:37.280 |
deeper. The other thing I'd say is some people have mentioned that they actually 00:06:44.160 |
didn't study this at high school, which is fine. We've provided resources on the forum 00:06:52.320 |
recommending how to learn the basics of derivatives and the chain rule. And so in 00:07:01.280 |
particular I would recommend 3Blue1Brown's Essence of Calculus series and also Khan Academy. 00:07:07.200 |
It's not particularly difficult to learn, it'll only take you a few hours, and then 00:07:12.560 |
this will make a lot more sense. Or if you did it at high school but you've forgotten it, same deal. 00:07:18.720 |
So don't worry if you found this difficult because you had forgotten, or had never learned, 00:07:27.440 |
the basic derivative and chain rule stuff. That's something that you can pick up now 00:07:33.440 |
and I would recommend doing so. Okay so what we then did last time which is actually pretty 00:07:44.080 |
exciting is we got to a point where we had successfully created a training loop which 00:07:50.880 |
did these four steps. So and the nice thing is that every single thing here is something 00:07:59.040 |
that we have implemented from scratch. Now we didn't always use our implemented from scratch 00:08:03.920 |
versions. There's no particular reason to. When we've re-implemented something that already exists 00:08:08.240 |
let's use the version that exists. But every single thing here I guess not argmax but that's 00:08:14.240 |
trivially easy to implement. Every single thing here we have implemented ourselves. And we 00:08:23.200 |
successfully trained an MNIST model to about 96 percent accuracy at recognizing handwritten 00:08:32.080 |
digits. So I think that's super neat. Mind you, this is not a great metric. 00:08:42.240 |
It's only looking at the training set, and in particular it's only looking at one batch of 00:08:45.600 |
the training set. Since last time I've just refactored a little bit I've pulled out this 00:08:50.080 |
report function which is now just running at the end of each epoch. And it's just printing out 00:08:59.200 |
the loss and the accuracy. Just something I wanted to mention here is hopefully you've seen 00:09:05.040 |
f-strings before. They're a really helpful part of Python that lets you pop a variable or an 00:09:12.400 |
expression inside curly braces in a string and it'll evaluate it. You might not have seen this 00:09:18.720 |
colon thing. This is called a format specifier. And with a format specifier you can change how 00:09:26.560 |
things are printed in an f-string. So this is how I'm printing it to two decimal places. 00:09:31.280 |
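For example, a minimal sketch of a format specifier (the variable names here are just for illustration):

```python
loss, acc = 0.123456, 0.98765
print(f"loss: {loss:.2f}, acc: {acc:.2f}")  # :.2f prints a float to two decimal places
# -> loss: 0.12, acc: 0.99
```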
This says a two decimal places floating point number called loss printed out here followed 00:09:38.000 |
by a comma. So I'm not going to show you how to use those other than to say yeah 00:09:44.320 |
Python fstrings and format specifiers are really helpful. And so if you haven't used them before 00:09:50.720 |
do go look them up in a tutorial or the documentation because they're definitely something that you'll 00:09:56.640 |
probably find useful to know about. Okay so let's just rerun all those lines of code. 00:10:02.480 |
If you're wondering how I just reran all the cells above where I was, there's a 00:10:10.000 |
Cell menu here with Run All Above. And it's so helpful that I always make sure there's a 00:10:19.920 |
keyboard shortcut for that. So you can see here I've added a keyboard shortcut QA. So if I type 00:10:27.520 |
QA it runs all cells above. If I type QB it runs all cells below. And so yeah stuff that you do a 00:10:36.000 |
lot make sure you've got keyboard shortcuts for them. You don't want to be fiddling around moving 00:10:39.600 |
around your mouse everywhere. You want it to be as easy as thinking. So this is really exciting. 00:10:46.080 |
We've successfully built and trained a neural network model from scratch and it works okay. 00:10:51.280 |
It's a bit clunky there's a lot of code there's features we're missing. So let's start refactoring 00:10:57.520 |
it. And so refactoring is all about making it so we have to write less code to do the same work. 00:11:11.360 |
And so we're now going to I'm going to show you something that's part of PyTorch and then I'm 00:11:17.440 |
going to show you how to build it. And then you'll see why this is really useful. So PyTorch has a 00:11:23.680 |
sub module called nn, torch.nn, and in there there's something called the Module class. Now, we 00:11:29.760 |
don't normally use it this way but I just want to show you how it works. We can create an instance 00:11:33.520 |
of it in the usual way where we create instances of classes. And then we can assign things to 00:11:39.600 |
attributes of that module. So for example let's assign a linear layer to it. And if we now print out 00:11:48.080 |
that you'll see it says oh this is a module containing something called foo which is a 00:11:55.520 |
linear layer. But here's something quite tricky. This module we can say show me all of the named 00:12:04.240 |
children of that module. And it says oh there's one called foo and it's a linear layer. 00:12:09.760 |
And we can say oh show me all of the parameters of this module. And it says oh okay sure there's two 00:12:20.560 |
of them. There's this four by three tensor that's the weights. And there's this four long vector 00:12:30.160 |
that's the biases. And so somehow just by creating this module and assigning this to it it's 00:12:38.720 |
automatically tracked what's in this module and what are its parameters. That's pretty neat. So 00:12:45.200 |
we're going to see both how and why it does that. I'm just going to point out by the way why did I 00:12:50.640 |
add list here. If I just said m1.named_children() it just prints out generator object which is not very 00:12:58.880 |
helpful. And that's because this is a kind of iterator called a generator. And it's something 00:13:07.920 |
which is going to only produce the contents of this when I actually do something with it 00:13:14.000 |
such as list them out. So just popping a list around a generator is one way to like run the 00:13:18.880 |
generator and get its output. So that's a little trick when you want to look inside a generator. 00:13:26.800 |
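Putting that little demo together, roughly (assuming the standard torch.nn import):

```python
import torch
from torch import nn

m1 = nn.Module()
m1.foo = nn.Linear(3, 4)                   # assigning a layer as an attribute registers it
print(m1)                                  # shows a Module containing (foo): Linear(...)
print(list(m1.named_children()))           # list() forces the generator to produce its items
print([p.shape for p in m1.parameters()])  # the 4x3 weight and the 4-long bias
```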
Okay so now as I said we don't normally use it this way. What we normally do 00:13:33.600 |
is we create our own class. So for example we create our own multi-layer perceptron 00:13:37.600 |
and we inherit it. We inherit from nn.Module. And so then dunder init, this is the thing that 00:13:43.760 |
constructs an object of the class. This is the special magic method that does that. We'll say 00:13:50.640 |
okay well how many inputs are there to this multi-layer perceptron. How many hidden activations. 00:13:56.240 |
And how many output activations are there. So just be one hidden layer. And then here we can do just 00:14:01.680 |
like we did up here where we assigned things as attributes. We can do that in this constructor. 00:14:08.880 |
So we'll create an l1 attribute which is a linear layer from number in to number hidden. 00:14:13.760 |
L2 is a linear layer from number hidden to number out. And we'll also create a ReLU. And so when we 00:14:25.840 |
call that module we can take the input that we get and run the linear layer and then run the 00:14:38.400 |
ReLU and then run the l2. And so I can create one of these as you see. And I can have a look and see 00:14:50.800 |
like oh here's the attribute l1. And there it is like I had. And I can say print out the model 00:14:58.560 |
and the model knows all the stuff that's in it. And I can go through each of the named children 00:15:05.840 |
and print out the name and the layer. Now of course if you remember although you can use 00:15:14.720 |
dundercall we actually showed how we can refactor things 00:15:21.440 |
using forward such that it would automatically kind of do the things necessary to make 00:15:29.680 |
all the you know automatic gradient stuff work correctly. And so in practice 00:15:39.680 |
we're actually not going to do dundercall we would do forward. So this is an example of creating 00:15:47.920 |
a custom PyTorch module. And the key thing to recognize is that it knows what are all the 00:15:54.720 |
attributes you added to it. And it also knows what are all the parameters. So if I go through 00:16:01.680 |
the parameters and print out their shapes you can see I've got my linear layer's weights. 00:16:06.080 |
First linear layer sorry second linear layer. My first linear layer's weights. My first linear 00:16:12.080 |
layer's biases. Second linear layer's weights. Second linear layer's biases. And this 50 is 00:16:17.280 |
because we set nh the number of hidden to 50. So why is that interesting? Well because now 00:16:30.080 |
I don't have to write all this anymore going through layers and having to make sure that 00:16:38.800 |
they've all been put into a list. We've just been able to add them as attributes and they're 00:16:44.720 |
automatically going to appear as parameters. So we can just say go through each parameter 00:16:49.360 |
and update it based on the gradient and the learning rate. And furthermore you can actually 00:16:57.280 |
just go model.zero grad and it'll zero out all of the gradients. So that's really made our code 00:17:05.520 |
quite a lot nicer and quite a lot more flexible which is cool. 00:17:14.240 |
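Here's a sketch of what that MLP module and the simplified update step look like (the dummy loss at the end is just to create some gradients; the real notebook uses the MNIST training loop):

```python
import torch
from torch import nn

class MLP(nn.Module):
    def __init__(self, n_in, nh, n_out):
        super().__init__()                        # must call the base class init first
        self.l1 = nn.Linear(n_in, nh)
        self.l2 = nn.Linear(nh, n_out)
        self.relu = nn.ReLU()

    def forward(self, x):                         # forward, not __call__, so autograd hooks work
        return self.l2(self.relu(self.l1(x)))

model = MLP(784, 50, 10)
loss = model(torch.randn(16, 784)).pow(2).mean()  # dummy loss just to populate .grad
loss.backward()
lr = 0.5
with torch.no_grad():
    for p in model.parameters():                  # parameters were registered automatically
        p -= p.grad * lr
    model.zero_grad()                             # zero every gradient in one call
```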
There we go. So just to clarify, if I call report on this before I run the fit, 00:17:25.920 |
as you would expect the accuracy is about 8%, so about 10% or a bit less, and the loss is pretty high. 00:17:32.400 |
And so after I run this fit this model the accuracy goes up and the loss goes down. 00:17:41.360 |
So basically it's all of this exactly the same as before. The only thing I've changed are these 00:17:47.680 |
two lines of code. So that's a really useful refactoring. So how on earth did this happen? 00:17:52.960 |
How did it know what the parameters and layers are automatically? 00:17:58.320 |
It used a trick called dunder setattr, and we're going to create our own nn.Module now. 00:18:07.680 |
So if there was no such thing as nn.module here's how we'd build it. 00:18:16.000 |
And so let's actually build it and also add some things to it. So in dunder init 00:18:20.800 |
we would have to create a dictionary for our named children. This is going to contain 00:18:25.840 |
a dictionary of all of the layers. Okay and then just like before we'll create a couple of 00:18:30.880 |
linear layers right and then what we're going to do is we're going to define this special 00:18:36.720 |
magic thing that python has called dunder setattr, and this is called automatically by python, if you 00:18:42.160 |
have it, every time you set an attribute, such as here or here, and it's going to be passed the name 00:18:49.360 |
of the attribute, the key, and the value, the actual thing on the right hand side of the equals 00:18:54.400 |
sign. Now generally speaking, things that start with an underscore we use for private 00:19:02.160 |
stuff, so we check that it doesn't start with an underscore, and if it doesn't start with an 00:19:08.480 |
underscore, setattr will put this value into the module's dictionary with this key, and then 00:19:21.440 |
call the normal python setattr to make sure it actually does the attribute setting. 00:19:30.640 |
So super is how you call whatever is in the the super class the base class. So another useful 00:19:38.960 |
thing to know about is how do we how does how does it do this nifty thing where you can just type the 00:19:43.840 |
name and it kind of lists out all this information about it. That's a special thing called dunder 00:19:48.960 |
repr. So here dunder repr will just have it return a stringified version of the modules dictionary 00:19:57.680 |
and then here we've got parameters. How did parameters work? So how did this thing work? 00:20:02.880 |
Well we can go through each of those modules go through each value so the values of the modules 00:20:11.440 |
is all the actual layers and then go through each of the parameters in each module and 00:20:17.840 |
yield p so that's going to that's going to create an iterator if you remember when we 00:20:22.880 |
looked at iterators for all the parameters. So let's try it so we can create one of these modules 00:20:27.840 |
and if we just like before loop through its parameters there they are. 00:20:32.480 |
Now I'll just mention something that's optional kind of like advanced python that a lot of people 00:20:40.400 |
don't know about which is there's no need to loop through a list or a generator or I guess say loop 00:20:47.680 |
through an iterator and yield there's actually a shortcut which is you can just say 00:20:53.280 |
yield from and then give it the iterator and so with that we can get this all down 00:21:04.720 |
to one line of code and it will do exactly the same thing. So that's basically saying 00:21:10.880 |
yield one at a time everything in here that's what yield from does. 00:21:16.640 |
So there's a cool little advanced python thing totally optional but if you're interested I think 00:21:22.000 |
it can be kind of neat. So we've now learned how to create our own implementation of nn.module 00:21:29.120 |
and therefore we are now allowed to use PyTorch's nn.module so that's good news. 00:21:37.840 |
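Collecting the pieces, the home-grown Module looks roughly like this (a sketch; the children are assumed to be PyTorch layers so that m.parameters() exists):

```python
class Module:
    def __init__(self):
        self._modules = {}                  # registry of child layers

    def __setattr__(self, k, v):
        if not k.startswith('_'):           # skip "private" attributes like _modules itself
            self._modules[k] = v
        super().__setattr__(k, v)           # still actually set the attribute

    def __repr__(self):
        return f'{self._modules}'

    def parameters(self):
        for m in self._modules.values():
            yield from m.parameters()       # yield each child's parameters in turn
```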
So, using PyTorch's nn.Module, how would we create the model 00:21:48.400 |
that we started with which is where we had this self.layers because we want to somehow 00:21:56.960 |
register all of these all at once that's not going to happen based on the code we just wrote. 00:22:04.800 |
So to do that let's have a look we can so let's make a list of the layers we want 00:22:15.280 |
and so we'll create again a subclass of nn.module make sure you call the super classes in it first 00:22:23.760 |
and we'll just store the list of layers and then to tell PyTorch about all those layers 00:22:34.000 |
we basically have to loop through them and call add_module and say what the name of the module is 00:22:40.480 |
and what the module is and again you probably should have used forward here in the first place 00:22:49.520 |
and you can see this is now done exactly the same thing okay so if you've used a sequential model 00:22:57.840 |
before you'll see or you can see that we're on the path to creating a sequential model. 00:23:03.040 |
Okay so ganache has asked an interesting question which is what on earth is super 00:23:09.120 |
calling because we actually in fact we don't even need the parentheses here we actually don't have 00:23:15.360 |
a base class that's because if you don't put any parentheses or if you put empty parentheses 00:23:22.320 |
it's actually a shortcut for writing that and so python has stuff in object which does you know 00:23:31.440 |
all the normal objecty things like storing your attributes so that you can get them back later 00:23:39.680 |
Okay so this is a little bit awkward, to have to store the list and then enumerate and call 00:23:50.160 |
add_module, so now that we've implemented that from scratch we can use PyTorch's version which is 00:23:55.840 |
they've just got something called module list that just does that for you okay so if you use 00:24:00.640 |
module list and pass it a list of layers it will just go ahead and register them all those modules 00:24:06.320 |
for you so here's something called sequential model so this is just like nn.Sequential now 00:24:10.480 |
so if i create it passing in the layers there you go you can see there's my model containing my 00:24:17.360 |
module list with my layers and so i don't know why i never used forward for these things it's silly 00:24:27.600 |
um i guess it doesn't matter terribly in this stage but anyhow okay so 00:24:31.760 |
call fit and there we go okay so um so in forward here i just go through each layer 00:24:45.040 |
and i set the result of that equal to calling that layer on the previous result and then pass 00:24:51.120 |
and return it at the end now there's a little um another way of doing this which i think is 00:24:55.680 |
kind of fun it's not like shorter or anything at this stage i just wanted to show an example of 00:25:01.040 |
something that you see quite a lot in machine learning code which is the use of reduce 00:25:05.280 |
this implementation here is exactly the same as this thing here 00:25:12.160 |
so let me explain how it works what reduce does so reduce is a very common kind of like fundamental 00:25:22.800 |
computer science concept reductions this is something that does a reduction 00:25:26.880 |
and what a reduction is is it something that says 00:25:28.960 |
start with the third parameter some initial value so we're going to start with x the thing we're 00:25:37.600 |
being passed and then loop through a sequence so look through each of our layers and then for each 00:25:44.800 |
layer call some function here is our function and the function is going to get passed first 00:25:54.560 |
time around it'll be past the initial value and the first thing in your list so your first layer 00:26:00.640 |
and x so it's just going to call the layer function on x the second time around it takes 00:26:08.240 |
the output of that and passes it in as the first parameter and passes in the second 00:26:13.680 |
layer so then the second time this goes through it's going to be calling the second layer on the 00:26:18.960 |
result of the first layer and so forth and that's what a reduction is and so when you might see 00:26:25.440 |
reduce you'll certainly see it talked about quite a lot in in papers and books and you might sometimes 00:26:32.000 |
also see it in code it's a very general concept and so here's how you can implement a sequential 00:26:40.560 |
model using reduce so there's no explicit loop there although the loop's still happening internally 00:26:48.320 |
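A sketch of that reduce-based sequential model, combined with the nn.ModuleList registration just described:

```python
from functools import reduce
from torch import nn

class SequentialModel(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.layers = nn.ModuleList(layers)   # registers every layer for us

    def forward(self, x):
        # reduce: start from x, then feed each layer the previous layer's output
        return reduce(lambda val, layer: layer(val), self.layers, x)
```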
all right so now that we've reimplemented sequential we can just go ahead and use 00:26:53.040 |
pytorch's version so there's nn.sequential we can pass in our layers 00:26:58.160 |
and we can fit not surprisingly we can see the model so yeah looks very similar to the one we 00:27:04.960 |
built ourselves all right so this thing of looping through our parameters and updating them 00:27:20.160 |
based on gradients and a learning rate and then zeroing them is very common so common that there 00:27:31.760 |
is something that does that all for us and that's called an optimizer it's the stuff in optim so 00:27:38.320 |
let's create our own optimizer and as you can see it's just going to do the two things we just saw 00:27:43.840 |
it's going to go through each of the parameters and update them using the gradient and the 00:27:50.800 |
learning rate and there's also zero grad which will go through each parameter and set their 00:27:58.720 |
gradients to zero if you use dot data it's like it's just a way of avoiding having to say torch 00:28:05.520 |
dot no grad basically okay so in optimizer we're going to pass it the parameters that we want to 00:28:10.640 |
optimize and we're going to pass it the learning rate and we're just going to store them away 00:28:15.520 |
and since the parameters might be a generator we'll call list to to turn them into a list 00:28:24.400 |
so we are going to create our optimizer pass it in the model dot parameters which have been 00:28:29.840 |
automatically constructed for us by nn dot module and so here's our new loop now we don't have to 00:28:35.120 |
do any of the stuff manually we can just say opt dot step so that's going to call this 00:28:42.400 |
and opt dot zero grad and that's going to call this there it is so we've now built our own sgd 00:28:53.840 |
optimizer from scratch so i think this is really interesting right like these things which seem 00:28:59.760 |
like they must be big and complicated once we have this nice structure in place you know an sgd 00:29:06.320 |
optimizer doesn't take much code at all and so it's all very transparent simple clear if you're 00:29:13.680 |
having trouble using complex library code that you've found elsewhere you know this can be a 00:29:20.720 |
really good approach is to actually just go all the way back remove as you know as many 00:29:25.920 |
of these abstractions as you can and like run everything by hand to see exactly what's going 00:29:32.480 |
on it can be really freeing to see that you can do all this anyways since pytorch has this for us 00:29:40.480 |
in torch.optim it's got an optim.SGD and just like our version you pass in the parameters 00:29:47.520 |
and you pass in the learning rate so you really see it is just the same so let's define something 00:29:52.640 |
called get model that's going to return the model the sequential model and the optimizer for it 00:30:02.000 |
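A sketch of that minimal optimizer and of get_model (the layer sizes and learning rate here are just plausible placeholders):

```python
import torch
from torch import nn, optim

class Optimizer:
    def __init__(self, params, lr=0.5):
        self.params, self.lr = list(params), lr   # list() in case params is a generator

    def step(self):
        with torch.no_grad():
            for p in self.params:
                p -= p.grad * self.lr             # plain SGD update

    def zero_grad(self):
        for p in self.params:
            p.grad.zero_()

def get_model(lr=0.5, nh=50):
    model = nn.Sequential(nn.Linear(784, nh), nn.ReLU(), nn.Linear(nh, 10))
    return model, optim.SGD(model.parameters(), lr=lr)
```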
so if we go model comma opt equals get model and then we can call the loss function to see where 00:30:08.400 |
it's starting and so then we can write our training loop again go through each epoch 00:30:18.720 |
go through each starting point for our for our batches grab the slice slice into our x and y in 00:30:28.880 |
the training set calculate our predictions calculate our loss do the backward pass do the 00:30:34.480 |
optimizer step do the zero gradient and print out how you're going at the end of each one and there 00:30:40.080 |
we go all right so let's keep making this simpler there's still too much code so one thing we could 00:30:49.440 |
do is we could replace these lines of code with one line of code by using something we'll call 00:30:56.880 |
the Dataset class, so the Dataset class is just something that we're going to pass in our 00:31:02.480 |
independent and dependent variable we'll store them away as self dot x and self dot y 00:31:08.960 |
we'll have something so if you define dunder len then that's the thing that allows the 00:31:16.800 |
len function to work so the length of the data set will just be the length of the independent 00:31:21.040 |
variables and then dunder getitem is the thing that will be called automatically anytime you use 00:31:27.680 |
square brackets in python so that just is going to call this function passing in 00:31:32.800 |
the indices that you want so when we grab some items from our data set we're going to return a 00:31:39.040 |
tuple of the x values and the y values so then we'll be able to do this so let's create a data 00:31:47.760 |
set using this tiny little three line class it's going to be a data set containing the x and y 00:31:54.640 |
training and they'll create another data set containing the x and y valid and those two 00:31:59.920 |
data sets will call train ds and valid ds so let's check the length of those data sets 00:32:07.040 |
should be the same as the length of the x's and they are and so now we can do exactly what we 00:32:15.200 |
hope we could do we can say xb comma yb equals train ds and pass in some slice 00:32:21.280 |
so that's going to give us back our x's and y's, check the shapes are correct, it should be 00:32:32.400 |
five by 28 times 28 and the y's should just be five and so here they are the x's and the y's 00:32:41.840 |
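The whole Dataset class is tiny; roughly (x_train, y_train, x_valid and y_valid are assumed to be the MNIST tensors from earlier in the notebook):

```python
class Dataset():
    def __init__(self, x, y): self.x, self.y = x, y
    def __len__(self): return len(self.x)
    def __getitem__(self, i): return self.x[i], self.y[i]

train_ds, valid_ds = Dataset(x_train, y_train), Dataset(x_valid, y_valid)
xb, yb = train_ds[0:5]     # slicing works because __getitem__ just passes the slice through
```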
so that's nice we've created a data set from scratch and again it's not complicated at all 00:32:48.160 |
and if you look at the actual pytorch source code this is basically all data sets do so let's try 00:32:54.240 |
it we call get model and so now we've replaced our data set line with this one and as per usual 00:33:02.080 |
it still runs and so this is what i do when i'm writing code is i try to like always make sure 00:33:09.920 |
that my starting code works as i refactor and so you can see all the steps and so somebody reading 00:33:15.360 |
my code can then see exactly like why am i building everything i'm building how does it all fit in see 00:33:19.920 |
that it still works and i can also keep it clear in my own head so i think this is a really nice way 00:33:24.800 |
of implementing libraries as well all right so now we're going to replace these two lines of code 00:33:35.440 |
with this one line of code so we're going to create something called a data loader and a data 00:33:40.000 |
loader is something that's just going to do this okay so we need to create an iterator 00:33:45.520 |
so an iterator is a class that has a dunder iter method; when you write for ... in, python behind the 00:33:55.760 |
scenes it's actually calling dunder iter to get a special object which it can then 00:34:04.400 |
loop through using yield so it's basically getting this thing that you can iterate through using 00:34:09.280 |
yield so a data loader is something that's going to have a data set and a batch size because we're 00:34:16.400 |
going to go through the batches and grab one batch at a time so we have to store away the 00:34:23.360 |
data set and the batch size and so when you when we call the for loop it's going to call dunder 00:34:28.240 |
iter we're going to want to do exactly what we saw before go through the range just like we did 00:34:34.000 |
before and then yield that bit of the data set and that's all so that's a data loader 00:34:43.600 |
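So the first, simplest data loader is roughly:

```python
class DataLoader():
    def __init__(self, ds, bs): self.ds, self.bs = ds, bs
    def __iter__(self):
        # yield one mini-batch at a time by slicing the dataset
        for i in range(0, len(self.ds), self.bs):
            yield self.ds[i:i+self.bs]
```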
so we can now create a train data loader and a valid data loader from our train data set and 00:34:48.480 |
valid data set and so now we can if you remember the way you can get one thing out of an iterator 00:34:57.440 |
so you don't need to use a for loop you can just say iter and that will also call dunder iter 00:35:03.120 |
next we'll just grab one value from it so here we will run this and you can see we've now just 00:35:08.640 |
confirmed we've xb is a 50 by 784 and yb there it is and then we can check what it looks like so 00:35:20.240 |
let's grab the first element of our x batch make it 28 by 28 and there it is so now that we've got 00:35:30.720 |
a data loader again we can grab our model and we can simplify our fit function to just go for xb yb 00:35:37.920 |
and train deal so this is getting nice and small don't you think and it still works the same way 00:35:44.480 |
okay so this is really cool and now that it's nice and concise we can start adding features to it 00:35:52.560 |
so one feature i think we should add is that our training set each time we go through it 00:35:59.600 |
it should be in a different order it should be randomized the order so instead of always 00:36:08.800 |
just going through these indexes in order we want some way to say go use random indexes 00:36:15.680 |
so the way we can do that is creates a class called sampler and what sampler is going to do 00:36:22.400 |
i'll show you is if we create a sampler without shuffle without randomizing it 00:36:32.960 |
it's going to simply return all the numbers from zero up to n in order and it'll be an 00:36:40.880 |
iterator, see this is dunder iter, but if i do want it shuffled then it will randomly shuffle them 00:36:47.600 |
so here you can see i've created a sampler without shuffle so if i then make an iterator from that 00:36:54.320 |
and print a few things from the iterator you can see it's just printing out the indexes it's going 00:37:00.800 |
to want or i can do exactly the same thing as we learned earlier in the course using i slice 00:37:07.280 |
we can grab the first five so here's the first five things from a sampler when it's not shuffled 00:37:12.000 |
so as you can see these are just indexes so we could add shuffle equals true and now that's 00:37:19.920 |
going to call random dot shuffle which just randomly permutes them and now if i do the same thing 00:37:29.600 |
so why is that useful well what we could now do is create something called a batch sampler and 00:37:38.880 |
what the batch sampler is going to do is it's going to basically do this i slice thing for us 00:37:44.320 |
so we're going to say okay pass in a sampler so that's something that generates indices 00:37:48.480 |
and pass in a batch size and remember we've looked at chunking before it's going to chunk 00:37:57.760 |
and so if i now say all right please take our sampler and create batches of four 00:38:07.920 |
as you can see here it's creating batches of four indices at a time so rather than 00:38:18.400 |
just looping through them in order i can now loop through this batch sampler 00:38:25.600 |
so we're going to change our data loader so that now it's going to 00:38:34.560 |
take some batch sampler and it's going to loop through the batch sampler that's going to give 00:38:46.240 |
us indices and then we're going to get that data set item from that batch for everything in that 00:38:53.120 |
batch so that's going to give us a list and then we have to stack all of the x's and all of the 00:39:02.720 |
y's together into tensors so i've created something here called collate function and we're going to 00:39:12.000 |
default that to this little function here which is going to grab our batch pull out the x's and y 00:39:22.640 |
separately and then stack them up into tensors so this is called our collate function okay so if 00:39:31.840 |
we put all that together we can create a training sampler which is a batch sampler over the training 00:39:38.000 |
set with shuffle true a validation sampler will be a batch sampler over the validation set with 00:39:46.240 |
shuffle false and so then we can pass that into this data loader class the training data set 00:39:56.320 |
and the training sampler and the collate function which we don't really need because 00:40:01.840 |
it's we're just using the default one so i guess we can just get rid of that 00:40:04.960 |
and so now there we go we can do exactly the same thing as before x b y b is next iter 00:40:15.440 |
and this time we use the valid data loader check the shapes and this is how 00:40:24.720 |
PyTorch's actual data loaders work this is the this is all the pieces they have they have samplers 00:40:33.520 |
they have batch samplers they have a collation function and they have data loaders 00:40:43.920 |
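Roughly, those pieces look like this (a sketch of the Sampler, BatchSampler, collate function and the reworked DataLoader; details may differ slightly from the notebook):

```python
import random
import torch
from itertools import islice

class Sampler():
    def __init__(self, ds, shuffle=False): self.n, self.shuffle = len(ds), shuffle
    def __iter__(self):
        res = list(range(self.n))
        if self.shuffle: random.shuffle(res)
        return iter(res)

class BatchSampler():
    def __init__(self, sampler, bs, drop_last=False):
        self.sampler, self.bs, self.drop_last = sampler, bs, drop_last
    def __iter__(self):
        it = iter(self.sampler)
        # chunk the stream of indices into lists of length bs
        while batch := list(islice(it, self.bs)):
            if self.drop_last and len(batch) < self.bs: break
            yield batch

def collate(b):
    xs, ys = zip(*b)                      # transpose a list of (x, y) pairs
    return torch.stack(xs), torch.stack(ys)

class DataLoader():
    def __init__(self, ds, batchs, collate_fn=collate):
        self.ds, self.batchs, self.collate_fn = ds, batchs, collate_fn
    def __iter__(self):
        for b in self.batchs:
            yield self.collate_fn(self.ds[i] for i in b)
```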
what i want you to be doing for your homework is experimenting with these carefully to see exactly 00:40:57.280 |
what each thing's taking in okay so someone is asking on the chat what is this collate thing 00:41:03.840 |
doing okay so collate function it defaults to collate what does it do well let's see let's go 00:41:13.840 |
through each of these steps okay so we need so when we've got a batch sampler so let's do 00:41:20.320 |
just the valid sampler okay so the batch sampler here it is so we're going to go through each 00:41:31.120 |
thing in the batch sampler so let's just grab one thing from the batch sampler okay so the 00:41:36.640 |
output of the batch sampler will be next it uh okay so here's what the batch sampler contains 00:41:47.040 |
all right just the first 50 digits not surprisingly because this is our validation sampler 00:41:51.680 |
if we did a training sampler that would be randomized there they are okay so then what 00:41:57.680 |
we then do is we go self.dataset i for i and b so let's copy that copy 00:42:07.200 |
paste and so rather than self.dataset i we'll just say 00:42:18.880 |
oh and it's not i and b it's i and o that's what we called it 00:42:28.560 |
um oh we did it for training sorry training okay so what it's created here is 00:42:43.520 |
a list of tuples of tensors i think let's have a look so let's have a look so we'll call this 00:42:50.240 |
um p whatever so p zero okay is a tuple it's got the x and the y, the independent and dependent variable 00:43:09.680 |
so that's not what we want what we want is something that we can loop through we want 00:43:16.000 |
to get batches so what the collation model is going to do sorry not the collation model the 00:43:23.440 |
collate function is going to do is it's going to take all of our x's and all of our y's and 00:43:32.240 |
collect them into two tensors one tensor of x's and one tensor of y's so the way it does that is 00:43:40.640 |
so zip is a very very commonly used python function it's got nothing to do with the compression 00:43:53.120 |
program zip but instead what it does is it effectively allows us to like transpose things 00:43:58.000 |
so that now as you can see we've got all of the second elements or index one elements 00:44:07.200 |
all together and all of the index zero elements together 00:44:10.240 |
and so then we can stack those all up together 00:44:13.600 |
and that gives us our y's for our batch so that's what collate does so the collate function is used 00:44:25.600 |
an awful lot um uh in in pytorch increasingly nowadays where hugging face stuff uses it a lot 00:44:38.320 |
and so we'll be using it a lot as well um and basically it's a thing that allows us to customize 00:44:43.840 |
how the data that we get back from our data set once it's been kind of generating a list of of 00:44:52.000 |
things from the data set how do we put it together into some into a bunch of things 00:44:57.760 |
that our model can take as inputs because that's really what we want here so that's 00:45:02.160 |
what the collation function does oh this is the wrong way around 00:45:14.720 |
like so um this is um something that i do so often that fastcore has a quick little shortcut 00:45:22.400 |
for it just called store_attr, store attributes, and so if you just put that in your dunder init 00:45:28.480 |
then you just need one line of code and it does exactly the same thing so there's a little 00:45:33.680 |
shortcut as you see and so you'll see that quite a bit all right let's have a um seven minute break 00:45:42.800 |
and uh see you back here very soon and we're going to look at a multi-processing data loader 00:45:49.840 |
and then we'll have nearly finished this notebook all right see you soon 00:46:06.880 |
so we've seen how to create a data loader um and uh sampling from it um 00:46:14.480 |
the pytorch data loader works exactly like this but um it uses a lot more code because 00:46:26.480 |
it implements um multi-processing and so multi-processing means that the actual this thing here 00:46:35.440 |
that code can be run uh in multiple processes they can be run in parallel for multiple items 00:46:42.800 |
so this code for example might be opening up a jpeg rotating it flipping it etc right so because 00:46:52.240 |
remember this is just calling the dunder get item uh for a data set so that could be doing a lot of 00:46:57.840 |
work for each item and we're doing it for every item in the batch so we'd love to do those all in 00:47:01.840 |
parallel so i'll show you a very quick and dirty way that basically does the job um so um python 00:47:12.640 |
has a multi-processing library um it doesn't work particularly well with pytorch tensors so pytorch 00:47:20.160 |
has created an exact reimplementation of it so it's identical api wise but it does work well with 00:47:25.920 |
tensors so this is basically we'll just grab that multi-processing so this is not quite cheating 00:47:30.640 |
because multi-processing is in the standard library and this is api equivalent so i'm going to say 00:47:36.320 |
we're allowed to do that um so as we've discussed you know when we call square brackets on a class 00:47:47.040 |
it's actually identical to calling the dunder get item function on on the object so you can see here 00:47:56.800 |
if we say give me items three six eight and one it's the same as calling dunder get item passing in 00:48:04.240 |
three six eight and one now why does this matter well i'll show you why it matters because we're 00:48:13.360 |
going to be able to use map and i'll explain why we want to use map in a moment map is a really 00:48:17.600 |
important concept you might have heard of map reduce so we've already talked about reductions 00:48:21.520 |
and what those are um maps are kind of the other key piece map is something which takes a sequence 00:48:28.160 |
and calls a function on every element of that sequence so imagine we had a couple of batches 00:48:35.680 |
of indices three and six and eight and one then we're going to call dunder get item 00:48:42.560 |
on each of those batches so that's what map does map calls this function on every element 00:48:50.080 |
of this sequence and so that's going to give us the same stuff but now this same as this but now 00:48:58.880 |
batched into two batches now why do we want to do that because multi-processing has something 00:49:06.960 |
called pool where you can tell it how many workers you want to run how many processes you want to run 00:49:13.280 |
and it then has a map which works just like the python normal python map but it runs this 00:49:20.240 |
function in parallel over the items from this iterator so this is how we can create a 00:49:28.000 |
multi-processing data loader so here we're creating our data loader and again we don't 00:49:36.240 |
actually need to pass in the collate function because we're using the default one so if we 00:49:39.280 |
say nworkers equals two and then create that if we say next see how it's taking a moment 00:49:46.800 |
and it took a moment because it was firing off those two workers in the background so the first 00:49:51.440 |
batch actually comes out more slowly but the reason that we would use a multi-processing data loader 00:49:57.520 |
is if this is doing a lot of work we want it to run in parallel and even though the first 00:50:04.400 |
the first item might come out a bit slower once those processes are fired up it's going to be 00:50:09.600 |
faster to run so this is yeah this is a really simplified multi-processing data loader because 00:50:16.080 |
this needs to be super super efficient PyTorch has lots more code than this to make it much more 00:50:23.440 |
efficient but the idea is this and this is actually a perfectly good way of experimenting 00:50:30.960 |
or building your own data loader to make things work exactly how you want. 00:50:37.120 |
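A very rough sketch of that multiprocessing version (it leans on the fact that our Dataset's __getitem__ accepts a whole list of indices when x and y are tensors, so each worker returns an already-collated batch; Pool start-up details vary by platform):

```python
import torch.multiprocessing as mp

class DataLoader():
    def __init__(self, ds, batchs, n_workers=1):
        self.ds, self.batchs, self.n_workers = ds, batchs, n_workers
    def __iter__(self):
        with mp.Pool(self.n_workers) as ex:
            # run ds.__getitem__ on each batch of indices in a worker process
            yield from ex.map(self.ds.__getitem__, iter(self.batchs))
```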
So now that we've re-implemented all this from PyTorch, let's just grab the PyTorch versions, and as you can see it's exactly 00:50:42.560 |
the same data loader. They don't have one thing called sampler that you pass shuffle to, they have 00:50:47.840 |
two separate classes called sequential sampler and random sampler, I don't know why they do it that 00:50:51.680 |
way, it's a little bit more work to me, but same idea, and they've got batch sampler, and so it's exactly 00:50:57.760 |
the same idea the training sampler is a batch sampler with a random sampler the validation 00:51:04.160 |
sampler is a batch sampler with a sequential sampler passing batch sizes and so we can now 00:51:10.640 |
pass those samplers to the data loader this is now the PyTorch data loader and just like ours it 00:51:17.920 |
also takes a collate function okay and it works cool so that's as you can see it's it's doing 00:51:34.000 |
exactly the same stuff that ours is doing with exactly the same API and it's got some shortcuts 00:51:40.000 |
as I'm sure you've noticed when you've used data loaders so for example calling batch sampler is 00:51:48.400 |
very going to be very very common so you can actually just pass the batch size directly 00:51:53.360 |
to a data loader and it will then auto create the batch samplers for you so you don't have to pass 00:51:58.960 |
in batch sampler at all instead you can just say sampler and it will automatically wrap that in 00:52:05.200 |
a batch sampler for you that does exactly the same thing and in fact because it's so common to create 00:52:11.520 |
a random sampler or a sequential sampler for a data set you don't have to do that manually you 00:52:16.400 |
can just pass in shuffle equals true or shuffle equals false to the data loader and that does 00:52:20.960 |
again exactly the same thing there it is now something that is very interesting is that 00:52:33.200 |
when you think about it the batch sampler and the collation function are things which are taking 00:52:40.320 |
the result of the sampler looping through them and then collating them together but what we could do 00:52:48.080 |
is actually because our data sets know how to grab multiple indices at once 00:52:59.440 |
we can actually just use the batch sampler as a sampler we don't actually have to loop through 00:53:12.080 |
them and collate them because they're basically instantly they come pre-collated so this is a 00:53:20.320 |
trick which actually hugging face stuff can use as well and we'll be seeing it again so this is 00:53:25.040 |
an important thing to understand is how come we can pass a batch sampler to sampler and what's it 00:53:30.320 |
doing and so rather than trying to look through the pytorch code i suggest going back to our 00:53:35.520 |
non-multiprocessing pure python code to see exactly how that would work 00:53:41.360 |
because it's a really nifty trick for things that you can grab multiple things from at once 00:53:50.160 |
and it can save a whole lot of time it can make your code a lot faster 00:53:53.600 |
okay so now that we've got all that nicely implemented we should now add a validation 00:54:00.480 |
set and there's not really too much to talk about here we'll just take our fit function 00:54:05.040 |
and this is exactly the same code that we had before and then we're just going to add something 00:54:13.200 |
which goes through the validation set and gets the predictions and sums up the losses and accuracies 00:54:22.480 |
and from time to time prints out the loss and accuracy 00:54:31.760 |
and so get-dls we will implement by using the pytorch data loader now and so now our whole 00:54:41.440 |
process will be get-dls passing in the training and validation data set notice that for our 00:54:48.320 |
validation data loader i'm doubling the batch size because it doesn't have to do back propagation so 00:54:54.160 |
it should use about half as much memory so i can use a bigger batch size get our model and then 00:55:00.400 |
call this fit and now it's printing out the loss and accuracy on the validation set so finally we 00:55:10.800 |
actually know how we're doing which is that we're getting 97% accuracy on the validation set 00:55:18.000 |
and that's on the whole thing not just on the last batch so that's cool we've now implemented 00:55:23.440 |
a proper working sensible training loop it's still you know a bit more code than i would 00:55:33.120 |
like but it's not bad and every line of code in there and every line of code it's calling 00:55:38.240 |
is all stuff that we have built ourselves reimplemented ourselves so we know exactly 00:55:45.680 |
what's going on and that means it's going to be much easier for us to create anything we 00:55:50.480 |
can think of we don't have to rely on other people's code 00:55:53.520 |
so hopefully you're as excited about that as i am because it really opens up a whole world for us 00:56:06.480 |
so one thing that we're going to want to be able to do now that we've got a training loop 00:56:15.280 |
is to grab data and there's a really fantastic library of data sets available on hugging face 00:56:26.560 |
nowadays and so let's look at how we use those data sets now that we know how to bring things 00:56:34.080 |
into data loaders and stuff so that now we can use the entire world of hugging face data sets 00:56:41.680 |
with our code. So you need to pip install datasets 00:56:50.080 |
and once you've pip installed datasets you'll be able to say from datasets import 00:56:56.560 |
and you can import a few things i just these two things now load data set load data set builder 00:57:01.600 |
and we're going to look at a a data set called fashion mnist and so the way things tend to work 00:57:10.160 |
with hugging face is there's something called the hugging face hub which has models and it has data 00:57:14.720 |
sets amongst other things and generally you'll give them a name and you can then say in this case 00:57:23.200 |
load a data set builder for fashion mnist now a data set builder is just basically something 00:57:30.720 |
which has some metadata about about this data set so the data set builder has a dot info 00:57:39.920 |
and the dot info has a dot description and here's a description of this and as you can see again 00:57:45.920 |
we've got 28 by 28 grayscale so it's going to be very familiar to us because it's just like mnist 00:57:51.600 |
and again we've got 10 categories and again we've got 60 000 training examples and again we've got 00:57:57.040 |
10 000 test examples so this is cool, so as it says it's a direct drop-in replacement for mnist 00:58:03.040 |
and so the data set builder also will tell us what are what's in this data set and so 00:58:16.240 |
hugging face stuff generally uses dictionaries rather than tuples so there's going to be an image 00:58:21.920 |
of type image there's going to be a label of type class label there's 10 classes and these are the 00:58:28.400 |
names of the classes so it's quite nice that in hugging face data sets you know we can kind of 00:58:33.840 |
get this information directly it also tells us if there are some recommended training test splits 00:58:40.880 |
we can find out those as well so this is the size of the training split and the number of examples 00:58:50.880 |
so now that we're ready to start playing it with it we can load the data set okay so this is a 00:58:56.560 |
different string load data set builder versus load data set so this will actually download it 00:59:00.800 |
cache it and here it is and it creates a data set dictionary so a data set dictionary if you've 00:59:10.080 |
used fastai is basically just like what we call the data sets class they call the data set dict 00:59:15.280 |
class so it's a dictionary that contains in this case a train and a test item and those are data 00:59:22.560 |
sets these data sets are very much like the data sets that we created in the previous notebook 00:59:28.080 |
so we can now grab the training and test items from that dictionary and just pop them into 00:59:38.320 |
variables and so we can now have a look at the zero index thing in training and just like we 00:59:45.280 |
were promised it contains an image and a label so as you can see we're not getting tuples anymore 00:59:51.680 |
we're getting dictionaries containing the x and the y in this case image and label so i'm going 00:59:57.440 |
to get i'm pretty bored writing image and label and strings all the time so i'm just going to 01:00:02.320 |
store them as x and y so x is going to be the string image and y will be the string label 01:00:07.200 |
um i guess the other way i could have done that would have been to say x comma y equals that 01:00:22.400 |
that would probably be a bit neater um because it's coming straight from the features and if you 01:00:29.840 |
if you iterate into a dictionary you get back it's it's keys that's why that works so anyway i've 01:00:36.320 |
done it manually here which is a bit sad but there you go okay so we can now grab, from train zero 01:00:44.320 |
which we've already seen we can grab the x i.e the image and there it is there's the image 01:00:59.520 |
we could grab the first five images and the first five labels for example and there they are now 01:01:07.920 |
we already know what the names of the classes are so we could now see what these map to 01:01:16.000 |
by grabbing those features so there they are so um this is a special hugging face class 01:01:27.440 |
which most libraries have something including fastai that works like this there's something 01:01:32.480 |
called int to string which is going to take these and convert them to these so if i call it 01:01:39.600 |
on our y batch you'll see we've got first is ankle boot and there that is indeed an ankle boot 01:01:46.400 |
then we'll have a couple of t-shirts and a dress 01:01:54.160 |
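Put together, the Hugging Face side of this looks roughly like the following (a sketch; it assumes the datasets library is installed and downloads fashion_mnist on first use):

```python
from datasets import load_dataset, load_dataset_builder

name = "fashion_mnist"
ds_builder = load_dataset_builder(name)     # metadata only, nothing downloaded yet
print(ds_builder.info.description)
print(ds_builder.info.features)             # {'image': Image(...), 'label': ClassLabel(...)}

dsd = load_dataset(name)                    # downloads and caches; returns a DatasetDict
train, test = dsd['train'], dsd['test']
x, y = 'image', 'label'

lbl = train[0][y]
print(lbl, train.features[y].int2str(lbl))  # e.g. 9 -> 'Ankle boot'
```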
okay so um how do we use this to train a model well we're going to need a data loader 01:02:01.760 |
and we want a data loader that for now we're going to do just like we've done it before it's going to 01:02:07.440 |
return um uh well actually we're going to do something a bit different we're going to have 01:02:15.200 |
our collate function is actually going to return a dictionary actually this is pretty common for 01:02:21.040 |
um hugging face stuff um and pytorch doesn't mind if you it's happy for you to return a dictionary 01:02:28.160 |
from a collation function so rather than returning a tuple of the stacked up hopefully this looks very 01:02:34.960 |
familiar this looks a lot like the thing that goes through the data set for each one and stacks them 01:02:40.720 |
up just like we did um in the previous notebook so that's what we're doing we're doing all in one 01:02:45.680 |
step here in our collate function and then again exactly the same thing go through our batch grab 01:02:52.480 |
the y and this is just stacking them up with the integers so we don't have to call stack 01:02:57.440 |
and so we're now going to have the image and label bits in our dictionary so if we 01:03:05.280 |
create our data loader using that collation function grab one batch so we can go batch x 01:03:14.000 |
dot shape is a 16 by 1 by 28 by 28 and our y of the batch, here it is, so the thing to notice 01:03:22.960 |
here is that we haven't done any transforms or anything or written our own data set class or 01:03:32.880 |
anything we're actually putting all the work directly in the collation function so this is 01:03:37.360 |
like a really nice way to skip all of the kind of abstractions of your framework if you want to 01:03:45.680 |
is you can just do all of your work and collate functions so it's going to pass you 01:03:49.760 |
each item so it's going to you're going to get the batch directly you just go through each 01:03:56.640 |
item and so here we're saying okay grab the x key from that dictionary convert it to a tensor 01:04:08.400 |
and then do that for everything in the batch and then stack them all together so this is yeah this 01:04:14.160 |
is like can be quite a nice way to do things if you want to do things just very manually without 01:04:22.080 |
having to think too much about you know a framework particularly if you're doing really custom stuff 01:04:26.000 |
this can be quite helpful having said that um hugging face data sets absolutely lets you 01:04:32.080 |
avoid doing everything in collate function which if we want to create really simple applications 01:04:37.760 |
that's where we're going to eventually want to head so we can um do this using a transform 01:04:47.280 |
instead and so the way we do that is we create a function you're going to take our batch 01:04:55.040 |
it's going to replace the x in our batch with the tensor version of each of those PIL images 01:05:01.760 |
and i'm not even stacking them or anything and then we're going to return that batch 01:05:06.400 |
and so uh hugging face data sets has something called with transform and that's going to take 01:05:12.240 |
your data set your hugging face data set and it's going to apply this function to every element 01:05:20.080 |
and it doesn't run anything at all right now; basically, behind the scenes, when 01:05:26.000 |
it calls __getitem__ it will call this function on the fly. So, in other words, this could 01:05:32.320 |
include data augmentation, which can be random or whatever, because it's going to be rerun every 01:05:37.040 |
time you grab an item; it's not cached or anything like that. Other than that, this dataset has 01:05:43.600 |
exactly the same API as any other dataset: it has a length, it has a __getitem__, so you can 01:05:48.800 |
pass it to a DataLoader. And PyTorch already knows how to collate dictionaries of 01:06:00.960 |
tensors, and we've got a dictionary of tensors now, so that means we don't need a collate function anymore; 01:06:06.800 |
i can create a data loader from this without a collate function as you can see and so this is 01:06:14.160 |
giving exactly the same thing as before but without having to create a custom collate function 01:06:19.120 |
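Put together, the transform-based version looks roughly like this (a sketch; `dsd` and `transforms` are illustrative names, not necessarily what the notebook uses):

```python
import torchvision.transforms.functional as TF
from torch.utils.data import DataLoader

def transforms(b):
    # b is a batch-shaped dict, e.g. {'image': [PIL, PIL, ...], 'label': [...]}
    b["image"] = [TF.to_tensor(o) for o in b["image"]]
    return b                                  # with_transform expects the batch back

tds = dsd["train"].with_transform(transforms)
dl = DataLoader(tds, batch_size=16)           # default collation handles dicts of tensors
```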
Now, even this is a bit more code than I want: having to return the batch seems a bit silly, 01:06:24.240 |
but the reason I had to do it is because Hugging Face Datasets expects the with_transform 01:06:31.120 |
function to return the new version of the data. I wanted to be able to write it 01:06:40.080 |
like this instead: transform in place, just saying the change I want to make, and have it automatically 01:06:45.920 |
return the result. So if I create this function, which is exactly the same as the previous one 01:06:52.240 |
except that it doesn't have a return, how would I turn it into something which does return the result? 01:06:58.720 |
so here's an interesting trick we could take that function 01:07:05.840 |
pass it to another function to create a new function which is the version of this in place 01:07:12.960 |
function that returns the result and the way i do that is by creating a function called inplace 01:07:17.600 |
it takes a function it returns a function the function it returns is one that calls 01:07:26.320 |
my original function and then returns the result so this is the function this is a function 01:07:33.760 |
generating function and it's modifying an in place function to become a function that returns 01:07:43.840 |
the new version of that data and so this is a function this function is passed to this 01:07:52.000 |
function which returns a function and here it is so here's the version that hugging face will be 01:07:56.880 |
able to use so i can now pass that to with transform and it does exactly the same thing 01:08:03.200 |
So this is very, very common in Python; it's so common that this line of code can be entirely 01:08:12.320 |
removed and replaced with a little token: if you take the wrapping function's name and put an @ at the start, 01:08:22.960 |
you can then put that before a function definition, and what it says is: take this whole function, 01:08:26.880 |
pass it to this function and replace it with the result so this is exactly the same as the 01:08:36.960 |
combination of this and this and when we do it this way this kind of little syntax sugar 01:08:43.680 |
is called a decorator okay so there's nothing nothing magic about decorators it's literally 01:08:50.080 |
identical to this oh i guess the only difference is we don't end up with this unnecessary intermediate 01:08:56.880 |
underscore version but the result is exactly the same and therefore i can create a transformed 01:09:04.240 |
data set by using this and there we go it's all working fine 01:09:15.760 |
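Here's a minimal sketch of that decorator trick, mirroring the idea just described (names are illustrative rather than copied from the notebook):

```python
import torchvision.transforms.functional as TF

def inplace(f):
    # wrap a function that mutates its argument so that it also returns it
    def _f(b):
        f(b)
        return b
    return _f

@inplace
def transformi(b):
    b["image"] = [TF.to_tensor(o) for o in b["image"]]

tds = dsd["train"].with_transform(transformi)   # works: transformi now returns the batch
```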
um yeah so i mean none of this is particularly um necessary but what we're doing is we're just 01:09:26.240 |
kind of like seeing you know the pieces that we can we can put in place um to make this stuff as 01:09:35.840 |
easy as possible and we don't have to think about things too much um all right now with all this 01:09:45.200 |
we can basically make things pretty automatic um and the way we can make things pretty automatic 01:09:51.600 |
is we're going to use a handy thing in Python called itemgetter, and itemgetter is a function 01:09:57.040 |
that returns a function, so hopefully you're getting used to this idea by now: 01:10:01.200 |
this creates a function that gets the 'a' and 'c' 01:10:08.960 |
items from a dictionary, or from something that looks like a dictionary. So here's a dictionary; it 01:10:17.040 |
contains keys a, b and c, so this function will take a dictionary and return the a and c values, 01:10:27.520 |
and as you can see it has done exactly that. I'll explain why this is useful in a moment; I just 01:10:36.080 |
wanted to briefly mention what did i mean when i said something that looks like a dictionary i 01:10:40.720 |
I mean, this here is an actual dict, but Python doesn't care what type 01:10:47.760 |
things actually are; it only cares about what they look like. And remember that when we index into 01:10:54.160 |
something with square brackets, behind the scenes that's just 01:10:58.240 |
calling __getitem__. So we could create our own class whose __getitem__ gets the key 01:11:05.920 |
and just manually returns one if k equals 'a', two if k equals 'b', or three otherwise, 01:11:11.200 |
and look, that class also works just fine with an itemgetter. The reason this is interesting 01:11:19.520 |
is because a lot of people write Python as if it's C++ or Java or something; they 01:11:27.200 |
write as if it's this kind of statically typed thing, but I really wanted to point out 01:11:32.320 |
that it's an extremely dynamic language and there's a lot more flexibility than you might 01:11:37.120 |
have realized. Anyway, that's a little aside. So what we can do is 01:11:46.720 |
think about a batch for example where we've got these two dictionaries 01:11:53.200 |
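A quick sketch of the itemgetter and duck-typing examples just described, plus the two-dictionary batch used below (the exact values are assumed from the description):

```python
from operator import itemgetter

d = {"a": 1, "b": 2, "c": 3}
ig = itemgetter("a", "c")      # a function that pulls out the 'a' and 'c' items
ig(d)                          # (1, 3)

class D:
    # anything with __getitem__ "looks like" a dictionary as far as itemgetter cares
    def __getitem__(self, k): return 1 if k == "a" else 2 if k == "b" else 3

ig(D())                        # (1, 3)

batch = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]   # the batch of two dictionaries
```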
okay so um pytorch comes with a default collation function called not surprisingly 01:12:01.760 |
default collate so that's part of um pytorch and what default collate does with dictionaries 01:12:08.560 |
is it simply takes the matching keys and then grabs their values and stacks them together 01:12:14.960 |
and so that's why if i call default collate a is now one three b is now two four that's actually 01:12:22.000 |
what happened before when we created this data loader is it used the default collation function 01:12:29.200 |
which does that it also works on things that are tuples not dictionaries which is what most of you 01:12:34.560 |
would have used before. What we can do, therefore, is create something called collate_dict, 01:12:40.240 |
which is going to take a dataset and create an itemgetter 01:12:52.880 |
function for the features in that dataset, which in this case are image and label. So this is a 01:12:58.720 |
function which will get the image and label items, and we then return a function, and 01:13:06.000 |
that function is simply going to call our itemgetter on the result of default_collate. What this is going 01:13:12.400 |
to do is take dictionaries and collate them into a tuple, just like we did up here. 01:13:20.560 |
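Here's a sketch of both pieces: PyTorch's default_collate applied to dictionaries, and a collate_dict along the lines just described (the real version is exported to miniai; details here are a best-effort reconstruction):

```python
from operator import itemgetter
from torch.utils.data import DataLoader, default_collate

default_collate([{"a": 1, "b": 2}, {"a": 3, "b": 4}])
# -> {'a': tensor([1, 3]), 'b': tensor([2, 4])}

def collate_dict(ds):
    get = itemgetter(*ds.features)               # e.g. ('image', 'label')
    def _f(b): return get(default_collate(b))    # dict of stacked tensors -> tuple
    return _f

dl = DataLoader(tds, batch_size=16, collate_fn=collate_dict(tds))
```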
so if we run that so we're now going to call data loader on our transform data set passing 01:13:28.400 |
in and remember this is a function that returns a function so it's a collation function for this 01:13:34.800 |
data set and there it is so now this looks a lot like what we had in our previous notebook this 01:13:41.040 |
is not returning a dictionary but it's returning a tuple so this is um a really important idea 01:13:50.160 |
for particularly for working with hugging face data sets is that they tend to do things 01:13:55.040 |
with dictionaries and most other things in the pytorch world tend to work with tuples 01:14:01.600 |
so you can just use this now to convert anything that takes that returns dictionaries into something 01:14:09.440 |
that provides tuples, by passing it as a collation function to your DataLoader. So remember, 01:14:16.160 |
the thing you want to be doing this week is things like import pdb; pdb.set_trace(): 01:14:23.200 |
put breakpoints in, step through, and see exactly what's happening, not just here but, 01:14:30.720 |
even more importantly, inside the innermost inner function. 01:14:41.440 |
(Oh, what did I do wrong there? It's set_trace with an underscore.) 01:14:49.760 |
So then we can see exactly what's going on and print out b. 01:14:59.040 |
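For instance, a breakpoint dropped inside the inner collate function (a sketch, reusing the collate_dict idea from above) lets you inspect exactly what the DataLoader hands you:

```python
import pdb
from operator import itemgetter
from torch.utils.data import default_collate

def collate_dict(ds):
    get = itemgetter(*ds.features)
    def _f(b):
        pdb.set_trace()        # or breakpoint(); at the prompt try `p b`, `l`, `s`, `c`
        return get(default_collate(b))
    return _f
```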
I can list the code, and I can step into it, and look: I'm now inside the default_collate function, 01:15:11.520 |
which is inside PyTorch, and so I can now see exactly how that works. 01:15:16.000 |
there it all is so it's going to go through and this code is going to look very familiar because 01:15:23.440 |
we've implemented all this ourselves except it's being careful that like it works for lots of 01:15:28.400 |
different types of things dictionaries numpy arrays so on and so forth um 01:15:38.160 |
so the first thing i wanted to do oh actually something i do want to mention here 01:15:42.080 |
this is so useful we want to be able to use it in all of our notebooks 01:15:46.000 |
so rather than copying and pasting this every time it would be really nice to create a python module 01:15:53.760 |
that contains this definition so we've created um a library called nbdev um it's really a whole 01:16:02.400 |
system called nbdev which does exactly that it creates um modules you can use from your notebooks 01:16:08.480 |
and the way you do it is you use this special uh thing we call comment directives which is 01:16:15.600 |
hash pipe and then hash pipe export so you put this at the top of a cell and it says do something 01:16:22.400 |
special for this cell what this does is it says put this into a python module for me please 01:16:26.480 |
export it to a python module what python module is it going to put it in well if you go all the 01:16:33.120 |
way to the top you tell it what default export module to create so it's going to create a module 01:16:40.000 |
called datasets. What I do at the very end of this notebook is I've got a line that says 01:16:40.000 |
import nbdev; nbdev.nbdev_export(), and what that's going to do for me is create 01:16:49.440 |
a Python library; it's going to have a datasets.py in it, and we'll see everything that 01:17:02.960 |
we exported, here it is, collate_dict will appear in it for me. And what that means is that, in 01:17:11.680 |
future notebooks, I will be able to import collate_dict from my datasets module. Now you 01:17:18.720 |
might wonder well how does it know to call it mini AI what's mini AI well in nbdev you create a 01:17:32.880 |
settings.ini file where you say what the name of your library is so we're going to be using this 01:17:39.840 |
quite a lot now because we're getting to the point where we're starting to implement stuff 01:17:46.880 |
that didn't exist before so previously most of the stuff or pretty much all the stuff we've created 01:17:52.880 |
i've said like oh that already exists in pytorch so we don't need it we just use pytorches 01:17:59.040 |
but we're now getting to a point where we're starting to create stuff that doesn't exist 01:18:03.760 |
anywhere we've created it ourselves and so therefore we want to be able to use it again 01:18:10.320 |
so during the rest of this course we're going to be building together a library called mini AI 01:18:17.600 |
that's going to be our framework our version of something like fastai maybe it's something like 01:18:23.840 |
what fastai3 will end up being; we'll see. So that's what's going on here, and we're going to be using it a lot. 01:18:35.200 |
Once I start using miniai I'll show you exactly how to install it, but that's what this export directive 01:18:39.840 |
is. And you might have noticed I also had an export directive on that inplace function as well. 01:18:51.120 |
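As a sketch of the nbdev workflow being described (directives shown as they'd appear in notebook cells; the library name comes from settings.ini, where lib_name is set to miniai):

```python
# first cell of the notebook: which module within the library to create
#| default_exp datasets

# any cell to include in that module gets an export directive at the top
#| export
def collate_dict(ds): ...

# last cell: write all exported cells out to miniai/datasets.py
import nbdev; nbdev.nbdev_export()
```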
okay um we want to be able to see what this data set looks like so i thought it now's a good time 01:19:00.400 |
to talk a bit about plotting because knowing how to visualize things well is really important 01:19:06.240 |
and again the idea is we we're not allowed to use fastai's plotting library so we've got to learn 01:19:12.960 |
how to do everything ourselves. So here's the basic way to plot an image using matplotlib: 01:19:20.960 |
we can create a batch, grab the x part of it, grab the very first thing in that, 01:19:30.320 |
and call imshow, which means 'show an image', and here it is, there is our ankle boot. 01:19:38.560 |
so let's start to think about what stuff we might create which we can export to make this a bit 01:19:45.360 |
easier. So let's create something called show_image, which basically does 01:19:54.880 |
imshow, but we're going to do a few extra things: we'll make sure the image is in the correct 01:20:04.800 |
axis order; we'll make sure it's not on CUDA but on the CPU; 01:20:10.400 |
if it's not a numpy array we'll convert it to a numpy array; 01:20:15.120 |
we'll be able to pass in an existing axes object, which we'll talk about soon, if we want to; 01:20:24.240 |
we'll be able to set a title if we want to; and also this bit here removes all the ugly 0, 5, 01:20:31.200 |
and so on axis ticks, because we're showing an image and we don't want any of that. 01:20:34.320 |
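A simplified sketch of a show_image along those lines (the real one is exported to miniai.datasets and handles more cases; this is just the shape of the idea):

```python
import numpy as np
import matplotlib.pyplot as plt

def show_image(im, ax=None, figsize=None, title=None, **kwargs):
    if hasattr(im, "detach"):                   # a tensor: off the GPU, over to numpy
        im = im.detach().cpu().numpy()
    im = np.asarray(im)
    if im.ndim == 3 and im.shape[0] in (1, 3):  # CHW -> HWC
        im = im.transpose(1, 2, 0)
    if im.ndim == 3 and im.shape[-1] == 1:      # drop a trailing channel dimension of 1
        im = im[..., 0]
    if ax is None: _, ax = plt.subplots(figsize=figsize)
    ax.imshow(im, **kwargs)
    if title is not None: ax.set_title(title)
    ax.axis("off")                              # no ticks or labels on an image
    return ax
```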
so if we try that you can see there we go we've also been able to say what size we want the image 01:20:43.200 |
There it all is. Now here's something interesting: when I say help, 01:20:54.720 |
it also shows a whole lot more things. How did that magic happen? And you can see they work, 01:21:02.960 |
because here's figsize, which I didn't add... oh sorry, I did add that one, okay, that's a bad example; 01:21:09.200 |
anyway, these other ones all work as well. So how did that happen? Well, the trick is 01:21:16.080 |
that I added **kwargs here, and **kwargs says: you can pass 01:21:23.760 |
as many other arguments as you like that aren't listed, and they'll all be put into a 01:21:29.200 |
dictionary with this name. And then when I call imshow I pass in that entire dictionary; ** 01:21:38.640 |
here means 'as separate arguments', and that's how come it works. And then how does it know 01:21:44.800 |
what help to provide? The reason is that fastcore has a special thing called 01:21:51.200 |
delegates, which is a decorator (so now you know what a decorator is), and you tell it what it is 01:21:59.120 |
that you're going to be passing kwargs to; I'm going to be passing them to imshow, and then it 01:22:04.960 |
automatically creates the documentation correctly to show you what kwargs can do. So this is a really 01:22:14.160 |
helpful way of being able to extend existing functions like imshow and still 01:22:21.600 |
get all of their functionality and all of their documentation, and add your own. So delegates is 01:22:25.920 |
one of the most useful things we have in fast core in my opinion so we're going to export that 01:22:31.600 |
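The decorated version looks roughly like this (a sketch: delegates is real fastcore, but the exact target it's pointed at here is my assumption):

```python
from fastcore.meta import delegates
import matplotlib.pyplot as plt

@delegates(plt.Axes.imshow)     # merge imshow's keyword arguments into the signature
def show_image(im, ax=None, figsize=None, title=None, **kwargs):
    ...

help(show_image)                # now lists cmap, vmin, vmax, ... from imshow too
```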
so now we can use show image anytime you want which is nice um something that's really helpful 01:22:38.400 |
to know about matplotlib is how to create subplots so for example what happens if you want to plot 01:22:46.240 |
two images next to each other so in matplotlib subplots creates multiple plots and you pass it 01:22:54.720 |
number of rows and the number of columns so this here has as you see one row and two columns 01:23:03.920 |
and it returns axes now what it calls axes is what it refers to as the individual plots 01:23:11.120 |
so if we now call show image on the first image passing in axes zero it's going to get that here 01:23:20.640 |
right? Then we call ax.imshow, which means 'put the image on this subplot'; they don't call it 01:23:29.040 |
a subplot, unfortunately, they call it an axis. Put it on this axis. So that's how come we're able to 01:23:34.320 |
show one image on the first axis and then a second image on the second axis, by 01:23:40.800 |
which we mean subplot, and there are our two images. That's pretty handy. 01:23:50.320 |
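In matplotlib terms, that side-by-side plot is roughly this (here `xb` is assumed to be the batch of images from earlier):

```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(1, 2)     # one row, two columns; axs holds the two "axes"
axs[0].imshow(xb[0, 0])           # first image on the first subplot
axs[1].imshow(xb[1, 0])           # second image on the second subplot
```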
So I've decided to add some additional functionality to subplots, and therefore I use delegates on subplots, because 01:23:55.040 |
I'm adding functionality to it and I'm going to be taking kwargs and passing them through to subplots. 01:24:02.320 |
The main thing I wanted to do is automatically create an appropriate figure size, 01:24:07.360 |
where you just tell it what image size you want, and I also want to be able to add a 01:24:14.320 |
title for the whole set of subplots; and so there it is. And then I also want to show you that nbdev 01:24:25.280 |
will automatically, if we want, create documentation for our library as well, 01:24:29.680 |
and here is the documentation. As you can see, for the stuff I've added it's telling me 01:24:38.160 |
exactly what each of these parameters are their type their defaults and information about each one 01:24:45.600 |
and that information is automatically coming from these little comments; we call these 'docments'. 01:24:52.240 |
This is all automatic stuff done by fastcore and nbdev, and so you might have noticed, when you look 01:24:59.600 |
at fastai library documentation it always has all this info that's that's that's why you don't 01:25:05.520 |
actually have to call showdoc it's automatically added to your documentation for you i'm just 01:25:10.400 |
showing you here what it's going to end up looking like and you can see that it's worked 01:25:14.480 |
with delegates it's put all the extra stuff from delegates in here as well 01:25:18.320 |
and here they all listed out here as well so anyway subplots so let's create a three by three 01:25:27.200 |
set of plots and we'll grab the first two images and so now we can go through each of the subplots 01:25:35.520 |
now it returns it as a three by three basically a list of three lists of three items so i flatten 01:25:42.640 |
them all out into a single list so we'll go through each of those subplots and go through each image 01:25:49.520 |
and show each image on each axis and so here's a quick way to quickly show them all as you can see 01:25:57.440 |
it's a little bit ugly here so we'll keep on adding more useful plotting functionality 01:26:04.480 |
so here's something that again it calls our subplots delegates to it 01:26:08.960 |
but we're going to be able to say for example how many subplots do we want 01:26:15.440 |
and it'll automatically calculate the rows and the columns 01:26:18.080 |
and it's going to remove the axes for any ones that we're not actually using 01:26:25.360 |
and so here we got that so that's what get grid's going to let us do so we're getting quite close 01:26:33.360 |
and so finally why don't we just create a single thing called show images that's going to get our 01:26:40.800 |
grid and it's going to go through our images optionally with a list of titles and show each one 01:26:54.720 |
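The idea behind get_grid and show_images, in simplified sketch form (the miniai versions have more options; this reuses the show_image sketch from above):

```python
import math
import matplotlib.pyplot as plt

def show_images(ims, nrows=None, ncols=None, titles=None, figsize=None):
    n = len(ims)
    if ncols is None: ncols = math.ceil(math.sqrt(n))     # pick a roughly square grid
    if nrows is None: nrows = math.ceil(n / ncols)
    fig, axs = plt.subplots(nrows, ncols, figsize=figsize)
    axs = list(axs.flat) if hasattr(axs, "flat") else [axs]
    for i, ax in enumerate(axs):
        if i < n: show_image(ims[i], ax=ax, title=None if titles is None else titles[i])
        else: ax.axis("off")                              # hide any unused subplots
```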
we have successfully got all of our labeled images 01:27:01.520 |
and so we yeah I think all this stuff for the plotting is pretty useful so as you might have 01:27:14.560 |
noticed they were all exported so in our datasets.py we've got our get grid we've 01:27:19.840 |
got our subplots we've got our show images so that's going to make life easier for us now 01:27:24.800 |
since we have to create everything from scratch we have created all of those things 01:27:29.760 |
So, as I mentioned, at the very end we have this one line of code to run, 01:27:43.840 |
which creates miniai/datasets.py. If I clear out miniai/datasets.py so it's all empty 01:27:53.120 |
and then I run this line of code, now it's back, as you can see, and it tells you it's auto-generated. 01:28:06.160 |
we are nearly at the point where we can build our learner and once we've built our learner 01:28:14.880 |
we're going to be able to really dive deep into training and studying models so we've kind of got 01:28:21.120 |
nearly got all of our infrastructure in place. Before we do, there are some pieces of 01:28:29.520 |
Python which not everybody knows, and some computer 01:28:36.640 |
science concepts, that I want to talk about; that's what the 06 foundations notebook is about. 01:28:40.560 |
So this whole section is just going to talk about some stuff in Python, 01:28:50.880 |
or you know maybe it's a review for some of you as well and it's all stuff we're going to be using 01:28:58.640 |
basically in the next notebook so that's why I wanted to to cover it so we're going to be 01:29:04.560 |
creating a learner class so a learner class is going to be a very general purpose training loop 01:29:11.920 |
which we can get to to do anything that we wanted to do and we're going to be creating things called 01:29:18.000 |
callbacks to make that happen and so therefore we're going to just spend a few moments talking 01:29:23.120 |
about what are callbacks how are they used in in computer science how are they implemented 01:29:29.360 |
look at some examples they come up a lot perhaps the most common place that you see callbacks in 01:29:36.720 |
software is for GUI events so for events from some graphical user interface so the main graphical 01:29:44.800 |
user interface library in jupyter notebooks is called ipy widgets and we can create a widget 01:29:53.520 |
like a button like so and when we display it it shows me a button and at the moment it doesn't 01:30:00.640 |
do anything if I click on it. What we can do, though, is add an on_click callback to it, 01:30:14.560 |
which means we're going to pass it a function 01:30:17.840 |
which is called when you click it. So let's define that function, and then I'm going to say 01:30:25.440 |
w.on_click(f), which assigns the f function as the on_click callback. Now if I click this, 01:30:33.680 |
there you go, it's doing it. Now what does that mean? Well, a callback is simply a callable 01:30:43.920 |
that you've provided (remember, a callable is a more general version of a function); so in this 01:30:48.400 |
place it is a function that you've provided that will be called back to when something happens 01:30:54.560 |
so in this case there's something that's happening is that they're clicking a button 01:30:58.080 |
so this is how we are defining and using a callback as a GUI event so basically everything 01:31:07.200 |
in ipy widgets if you want to create your own graphical user interfaces for jupyter 01:31:12.880 |
you can do it with ipy widgets and by using these callbacks so these particular kinds of callbacks 01:31:21.280 |
are called events but it's just a callback all right so that's somebody else's callback 01:31:28.800 |
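A sketch of that button example in ipywidgets (names are illustrative):

```python
import ipywidgets as widgets

w = widgets.Button(description="Click me")

def f(o): print("hi")        # ipywidgets calls this back, passing the button itself

w.on_click(f)                # register the callback
w                            # display the button (in a notebook); clicking now prints "hi"
```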
let's create our own callback so let's say we've got some very slow calculation 01:31:38.080 |
and so it takes a very long time to add up the numbers 0 to 5 squared because we sleep for a 01:31:44.720 |
second after each one so let's run our slow calculation still running oh how's it going 01:31:50.960 |
come on finish our calculation there we go the answer is 30 now for a slow calculation like that 01:31:56.400 |
such as training a model it's a slow calculation it would be nice to do things like i don't know 01:32:02.640 |
print you know print out the loss from time to time or show a progress bar or whatever so 01:32:08.480 |
generally for those kinds of things we would like to define a callback that is called at the end of 01:32:15.920 |
each epoch or batch or every few seconds or something like that so here's how we can modify 01:32:23.120 |
our slow calculation routine such that you can optionally pass at a callback and so all of this 01:32:29.360 |
code's the same except we've added this one line of code that says if there's a callback then 01:32:35.680 |
call it and pass in what what we're where we're up to so then we could create our callback function 01:32:44.400 |
so this is just like we created a full callback function f let's create a show progress callback 01:32:48.960 |
function that's going to tell us how far we've got so now if we call show slow calculation passing 01:32:56.880 |
in our callback you can see it's going to call this function at the end of each step so here we've 01:33:06.960 |
created our own callback so there's nothing special about a callback like it doesn't require its own 01:33:14.160 |
like syntax it's not a new concept it's just an idea really which is the idea of passing in a 01:33:22.160 |
function which some other function will call at particular times such as at the end of a step 01:33:29.040 |
or such as when you click a button so that's what we mean by callbacks 01:33:33.840 |
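A sketch of that slow calculation with an optional callback, matching the description above (0² + 1² + 2² + 3² + 4² = 30):

```python
import time

def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        res += i * i
        time.sleep(1)              # pretend each step is expensive
        if cb: cb(i)               # call back with where we're up to
    return res

def show_progress(epoch): print(f"Awesome! We've finished epoch {epoch}!")

slow_calculation(show_progress)    # prints after each step, then returns 30
```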
we don't have to define the function ahead of time we could define the function 01:33:42.160 |
at the same time that we call the slow calculation by using lambda so as we've discussed before 01:33:50.720 |
lambda just defines a function but it doesn't give it a name so here's a function it takes 01:33:55.520 |
one parameter and prints out exactly the same thing as before so here's the same way as doing it 01:34:01.360 |
We could make it more sophisticated now: rather than always saying 'Awesome! We've finished epoch 01:34:11.360 |
whatever', we could let you pass in an exclamation and print that out, and so in 01:34:18.640 |
this case we could now have our lambda call that function. 01:34:23.200 |
and so one of the things that we can do now is to again we can create a function that returns a 01:34:33.040 |
function. So we could create a make_show_progress function where you pass in the exclamation; 01:34:39.920 |
we could then create, with no need to give it a name actually, and just return directly, 01:34:47.680 |
a function that prints that exclamation. So here we are passing in 'Nice', 01:35:00.080 |
and that's exactly the same as doing something like what we've done before 01:35:07.520 |
we could say instead of using a lambda we can create an inner function like this 01:35:17.280 |
so here is now a function that returns a function this does exactly the same thing 01:35:20.960 |
Okay, so that's one way with a lambda and one way without a lambda. 01:35:26.800 |
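Side by side, those variants look roughly like this (building on the slow_calculation sketch above):

```python
# lambda defined at the call site
slow_calculation(lambda o: print(f"Awesome! We've finished epoch {o}!"))

# a more flexible progress function, plus a lambda that fills in the exclamation
def show_progress(exclamation, epoch): print(f"{exclamation}! We've finished epoch {epoch}!")
slow_calculation(lambda o: show_progress("Nice", o))

# the same thing without a lambda: a function that returns an inner function
def make_show_progress(exclamation):
    def _inner(epoch): print(f"{exclamation}! We've finished epoch {epoch}!")
    return _inner

slow_calculation(make_show_progress("Nice"))
```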
And one of the reasons I wanted to show you that is 01:35:30.800 |
that we can do exactly the same thing using partial. With partial 01:35:45.680 |
it's going to do exactly the same thing as this make_show_progress: it's going to call 01:35:51.600 |
show_progress and pass in 'OK I guess'. So this is again an example of a function returning a 01:35:57.360 |
function; it's a function that calls show_progress, passing in this as the first parameter. 01:36:11.760 |
Okay, so we tend to use partial a lot, so that's certainly something worth spending time practicing. 01:36:19.200 |
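As a sketch (using the show_progress from the example above):

```python
from functools import partial

# partial builds a new function with the first argument pre-filled,
# doing the same job as make_show_progress
slow_calculation(partial(show_progress, "OK I guess"))

f2 = partial(show_progress, "OK I guess")
f2(3)    # "OK I guess! We've finished epoch 3!"
```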
Now, as we've discussed, Python doesn't care about types in particular, 01:36:29.440 |
and there's nothing about any of this that requires cb to be a function 01:36:36.480 |
it just has to be it just has to be a callable a callable is something that that you can that you 01:36:42.720 |
can call. And so, as we've discussed, another way of creating a callable is defining __call__. 01:36:48.320 |
So here's a class, and this is going to work exactly the same as our make_show_progress 01:36:54.960 |
thing, but now as a class: there's an __init__ which stores the exclamation, and a __call__ 01:37:01.520 |
that prints. And so now we're creating an object which is callable and does exactly the same thing. 01:37:10.560 |
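A sketch of the callable-class version (the class and argument names are illustrative):

```python
class ProgressShowingCallback:
    def __init__(self, exclamation="Awesome"): self.exclamation = exclamation
    def __call__(self, epoch): print(f"{self.exclamation}! We've finished epoch {epoch}!")

cb = ProgressShowingCallback("Just super")
slow_calculation(cb)     # cb(i) is called at the end of each step
```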
okay so these are all like um fundamental ideas that I want you to get really comfortable with 01:37:21.360 |
the idea of dunder call dunder things in general partials classes because they come up all the time 01:37:31.840 |
um in pytorch code and um and in the code we'll be writing and in fact pretty much all frameworks 01:37:39.760 |
so it's really important to feel comfortable with them and remember you don't have to rely on 01:37:45.520 |
the resources we're providing you know if there's certain things here that are very new to you 01:37:51.520 |
you know google around for some tutorials or ask for help on the forums finding things and so forth 01:37:58.240 |
And then I'm just going to briefly go over something I've mentioned before, which is 01:38:03.040 |
*args and **kwargs, because again they come up a lot and I just wanted to show you how they 01:38:08.400 |
work. So we create a function that has *args and **kwargs and nothing else, and I'm just 01:38:16.160 |
going to have this function print them. Now I'm going to call the function: I'm going to pass 01:38:22.160 |
3, I'm going to pass 'a', and I'm going to pass thing1='hello'. Now the first two are passed what 01:38:29.440 |
we would call by position: we haven't got a 'blah equals', they're just stuck there. Things that are 01:38:35.440 |
passed by position are placed in *args, if you have one; it doesn't have to be called args, you can 01:38:41.360 |
call it anything you like, it's the star bit that matters. And so you can see here that args is a tuple 01:38:49.200 |
containing the positionally passed arguments, and then kwargs is a dictionary containing 01:38:57.120 |
the named arguments. That is all that *args and **kwargs do, and as I say, there's 01:39:04.160 |
nothing special about these names: I'll call this one a and this one b, 01:39:11.760 |
okay, and it'll do exactly the same thing. So this comes up a lot, and it's 01:39:23.120 |
important to remember that this is literally all they're doing. 01:39:33.200 |
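A sketch of that experiment:

```python
def f(*args, **kwargs): print(f"args: {args}; kwargs: {kwargs}")

f(3, "a", thing1="hello")
# args: (3, 'a'); kwargs: {'thing1': 'hello'}

def f2(*a, **b): print(f"a: {a}; b: {b}")   # the names really don't matter
```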
On the other hand, let's say we had a function g which takes a few parameters, 01:39:44.160 |
a, b and c, and just prints them directly. Rather than just using star and double-star in a function's parameters, 01:39:58.400 |
we can also use them when calling something. So let's say I create something called args (again, it 01:40:03.920 |
doesn't have to be called args) which contains 1, 2, and I create something 01:40:10.560 |
called kwargs that contains a dictionary {'c': 3}. I can then call g and pass in 01:40:23.120 |
*args, **kwargs, and that's going to take the 1 and 2 and pass them as individual 01:40:32.320 |
arguments, positionally, and it's going to take the c: 3 and pass it as a named argument c 01:40:38.800 |
equals 3, and there it is. So they're two linked but different ways of using star and double-star. 01:40:48.640 |
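A sketch of unpacking at the call site:

```python
def g(a, b, c=0): print(f"a: {a}; b: {b}; c: {c}")

args = [1, 2]
kwargs = {"c": 3}
g(*args, **kwargs)    # exactly the same as calling g(1, 2, c=3)
```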
Okay, now here's a slightly different way of doing callbacks, which I really like: 01:40:59.280 |
in this case I'm now passing in a callback that's not callable, but instead it's going to 01:41:07.360 |
have a method called before calc and another method called after calc and I'm so now my callback 01:41:17.760 |
is going to be a class containing a before calc and an after calc method and so if I run that 01:41:28.480 |
you can see it's that there it goes okay and so this is printing before and after every step 01:41:39.200 |
by call calling before calc and after calc so callback actually doesn't have to be a callable 01:41:44.160 |
doesn't have to be a function a callback could be something that contains methods 01:41:48.240 |
So we could have a version of this which, as you can see here, is going to pass in to after_calc 01:42:00.560 |
both the epoch number and the value it's up to, but by using *args and **kwargs I 01:42:07.520 |
can just safely ignore them if I don't want them; it's just going to eat them up and not 01:42:13.120 |
complain. If I didn't have those here it won't work, see, because it got passed val= 01:42:24.640 |
and there's nothing here looking for val=, so it doesn't like that. So this is one good use of 01:42:31.760 |
*args and **kwargs: to eat up arguments you don't want. Or we could use the arguments: so 01:42:39.920 |
let's actually use epoch and val and print them out, and there it is. 01:42:49.280 |
so this is a more sophisticated callback that's giving us status as we go 01:43:06.800 |
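A sketch of that method-based callback (building on the earlier slow_calculation sketch; the class name is illustrative):

```python
import time

def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        if cb: cb.before_calc(i)
        res += i * i
        time.sleep(1)
        if cb: cb.after_calc(i, val=res)     # pass both the epoch and the value so far
    return res

class PrintStatusCallback:
    def before_calc(self, *args, **kwargs): print("About to start")
    def after_calc(self, epoch, val, **kwargs): print(f"After {epoch}: {val}")

slow_calculation(PrintStatusCallback())
```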
I'm going to skip this bit because we don't really care about that 01:43:12.960 |
okay so finally let's just review this idea of dunder which we've mentioned before 01:43:20.640 |
but just to to really nail this home anything that looks like this underscore underscore 01:43:26.560 |
something underscore underscore something is special and basically it could be that python 01:43:32.240 |
has to find that special thing or pytorch has to find that special thing or numpy has to find 01:43:36.960 |
that special thing but they're special these are called dunder methods um and some of them 01:43:45.520 |
are defined as part of the python data model and so if you go to the python documentation 01:43:52.640 |
it'll tell you about these various different here's repra which we used earlier 01:43:59.040 |
here's init that we used earlier so they're all here pytorch has some of its own numpy has some 01:44:05.600 |
of its own so for example if python sees plus what it actually does is it calls dunder add 01:44:13.760 |
so if we want to create something that's not very good at adding things 01:44:19.040 |
it actually already also always adds 0.01 to it then i can say sloppy adder one plus sloppy adder 01:44:29.280 |
two equals 3.01 so plus here is actually calling dunder add so if you're not familiar with these 01:44:40.320 |
click on this data model link and read about these specific one two three four five six seven eight 01:44:45.760 |
nine, ten, eleven methods, because we'll be using all of these in the course, so 01:44:51.840 |
I'll try to review them when they come up, but I'm generally going to assume that you know these. 01:44:56.960 |
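A sketch of that SloppyAdder example:

```python
class SloppyAdder:
    def __init__(self, o): self.o = o
    # Python turns `a + b` into a.__add__(b)
    def __add__(self, b): return SloppyAdder(self.o + b.o + 0.01)
    def __repr__(self): return str(self.o)

SloppyAdder(1) + SloppyAdder(2)    # 3.01
```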
A particularly interesting one is getattr. We've seen setattr already; getattr is just the opposite. 01:45:08.880 |
Take a look at this: here's a class, it just contains two attributes, a and b, that are set to 01:45:15.040 |
one and two. So I'll create an object of that class; a.b equals two, because I set 01:45:20.240 |
b to two. Okay, now when you say a.b, that's just syntax sugar; basically, in Python, what it's 01:45:28.720 |
actually calling behind the scenes is getattr. It calls getattr on the object, and so this one here 01:45:37.840 |
is the same as getattr(a, 'b'), which hopefully... yeah, so it calls 01:45:48.640 |
getattr(a, 'b'). And this can kind of be fun, because you could call getattr with a and then 01:45:54.160 |
either 'b' or 'a', randomly; how's that for crazy? So if I run this: two, one, one, one, two; as you can see, it's 01:46:04.080 |
random. So yeah, Python's such a dynamic language that you can even set it up so you literally don't know 01:46:12.080 |
which attribute is going to be accessed. Now getattr, behind the scenes, is actually calling something 01:46:21.280 |
called __getattr__, and by default it'll use the version in the object base class. So here's 01:46:28.160 |
something just like A: it's got a and b defined, but I've also got __getattr__ defined, 01:46:34.160 |
and __getattr__ is only called for stuff that hasn't been defined yet, 01:46:38.640 |
and it'll be passed the key, the name of the attribute. So, generally speaking, if the first 01:46:47.600 |
character is an underscore it's going to be private or special, so I'm just going to raise 01:46:52.880 |
an AttributeError; otherwise I'm going to steal it and return 'Hello from k'. So if I go 01:47:02.160 |
b.a, that's defined, so it gives me one; if I go b.foo, that's not defined, so it calls __getattr__ 01:47:10.880 |
and I get back 'Hello from foo'. And so this gets used a lot in both fastai code and also 01:47:17.040 |
Hugging Face code, often to make it more convenient to access things. So that's 01:47:26.640 |
how the getattr function and the __getattr__ method work. 01:47:32.480 |
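A sketch of those examples:

```python
import random

class A:
    a, b = 1, 2

o = A()
o.b                                # 2 -- syntax sugar for...
getattr(o, "b")                    # ...this
getattr(o, random.choice("ab"))    # pick the attribute at random each time

class B:
    a, b = 1, 2
    def __getattr__(self, k):
        # only called for attributes that aren't otherwise defined
        if k.startswith("_"): raise AttributeError(k)
        return f"Hello from {k}"

b = B()
b.a      # 1
b.foo    # 'Hello from foo'
```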
um okay so i went over that pretty quickly um since i know for quite a few folks this will be 01:47:40.080 |
all review but i know for folks who haven't seen any of this this is a lot to cover so i'm hoping 01:47:45.680 |
that you'll kind of go back over this revise it slowly experiment with it and look up some 01:47:50.480 |
additional resources and ask on the forum and stuff for anything that's not clear remember um 01:47:56.560 |
everybody has parts of the course that's really easy for them and parts of the course that are 01:48:04.400 |
completely unfamiliar for them and so if this particular part of the course is completely 01:48:08.320 |
unfamiliar to you it's not because this is harder um or going to be more difficult or whatever 01:48:15.760 |
it's just so happens that this is a bit that you're less familiar with or maybe the stuff about 01:48:22.000 |
calculus in the last lesson was a bit that you're less familiar with um there isn't really anything 01:48:27.280 |
particularly in the course that's more difficult than other parts it's just that you know based on 01:48:33.120 |
whether you happen to have that background and so yeah if you spend a few hours studying and 01:48:39.280 |
practicing you know you'll be able to pick up these things and um yeah so don't stress if there 01:48:46.080 |
are things that you don't get right away just take the time and if you yeah if you do get lost please 01:48:52.000 |
ask because people are very keen to help if you've tried asking on the forum hopefully you've noticed 01:48:57.440 |
that people are really keen to help all right so um i think this has been a pretty successful lesson 01:49:05.040 |
we've we've got to a point where we've got a pretty nicely optimized training loop we 01:49:09.280 |
understand exactly what data loaders and data sets do we've got an optimizer we've been playing with 01:49:15.600 |
hugging face data sets and we've got those working really smoothly um so we really feel like we're 01:49:20.720 |
in a pretty good position to to write our generic learner training loop and then we can start 01:49:26.880 |
building and experimenting with lots of models, so I look forward to seeing you next time.