Lesson 10: Deep Learning Foundations to Stable Diffusion, 2022


Chapters

0:00 Introduction
0:35 Showing students' work over the past week
6:04 Recap Lesson 9
12:55 Explaining “Progressive Distillation for Fast Sampling of Diffusion Models” & “On Distillation of Guided Diffusion Models”
26:53 Explaining “Imagic: Text-Based Real Image Editing with Diffusion Models”
33:53 Stable diffusion pipeline code walkthrough
41:19 Scaling random noise to ensure variance
50:21 Recommended homework for the week
53:42 What are the foundations of stable diffusion? Notebook deep dive
66:30 NumPy arrays and PyTorch tensors from scratch
88:28 History of tensor programming
97:00 Random numbers from scratch
102:41 Important tip on random numbers via process forking

Whisper Transcript

00:00:00.000 | Hi, everybody, and welcome back. This is lesson 10 of practical deep learning for coders.
00:00:09.920 | It's the second lesson in part two, which is where we're going from deep learning foundations
00:00:14.880 | to stable diffusion. So before we dive back into our notebook, I think first of all, let's
00:00:20.700 | take a look at some of the interesting work that students in the course have done over
00:00:24.640 | the last week. I'm just going to show a small sample of what's on the forum. So check out
00:00:29.760 | the share your work here thread on the forum for many, many, many more examples. So Puro
00:00:38.400 | did something interesting, which is to create a bunch of images doing a linear
00:00:46.500 | interpolation. I mean, technically it's actually spherical linear interpolation, but it doesn't matter.
00:00:51.160 | Doing a linear interpolation between two different, you know, latent noise
00:00:56.280 | starting points for an auto picture and then showed all the intermediate results that came
00:01:03.720 | out pretty nice and they did something similar starting with an old car prompt and going
00:01:08.640 | to a modern Ferrari prompt. I can't remember exactly what the prompts were, but you can
00:01:12.720 | see how as it kind of goes through that latent space, it actually is changing the image that's
00:01:20.380 | coming out. I think that's really cool. And then I love the way Namrata took that and
00:01:26.000 | took it to another level in a way, which is starting with a dinosaur and turning into
00:01:30.240 | a bird. And this is a very cool intermediate picture of one of the steps along the way.
00:01:37.480 | The dino bird. I love it. Dino chick. Fantastic. So much creativity on the forums. I loved
00:01:48.480 | this. John Richmond took his daughter's dog and turned it gradually into a unicorn. And
00:01:57.640 | I thought this one along the way actually came out very, very nicely. I think this is
00:02:02.240 | adorable. And I suspect that John has won the dad of the year or dad of the week, maybe
00:02:08.080 | award this week for this fantastic project. And Maureen did something very interesting,
00:02:18.920 | which is she took Jono's parrot image from his lesson and tried bringing it across to
00:02:28.840 | various different painter's styles. And so her question was, anyone want to guess the
00:02:33.040 | artists in the prompts? So I'm just going to let you pause it before I move on. If you
00:02:40.000 | want to try to guess. And there they are. Most of them pretty obvious, I guess. I think
00:02:51.320 | it's so funny that Frida Kahlo appears in all of her paintings. So the parrot actually
00:02:56.840 | turned into Frida Kahlo. All right. Not all of her paintings, but all of her famous ones.
00:03:01.960 | So the very idea of Frida Kahlo painting without her in it is so unheard of that the parrots
00:03:05.960 | turned into Frida Kahlo. And I like this Jackson Pollock. It's still got the parrot going on
00:03:11.340 | there. So that's a really lovely one, Maureen. Thank you. And this is a good reminder to
00:03:19.760 | make sure that you check out the other two lesson videos. So she was working with
00:03:27.280 | Jono's stable diffusion lesson. So be sure to check that out if you haven't yet. It is
00:03:35.200 | available on the course web page and on the forums and has lots of cool stuff that you
00:03:41.000 | can work with, including this parrot. And then the other one to remind you about is
00:03:49.120 | the video that Wasim and Tanishq did on the math of diffusion. And I do want to read out
00:03:56.880 | what Alex said about this because I'm sure a number of you feel the same way. My first
00:04:02.080 | reaction on seeing something with the title math of diffusion was to assume that, oh,
00:04:05.960 | that's just something for all the smart people who have PhDs in maths on the course. And
00:04:09.640 | it'll probably be completely incomprehensible. But of course, it's not that at all. So be
00:04:16.400 | sure to check this out. Even if you don't think of yourself as a math person, I think
00:04:20.840 | it's some nice background that you may find useful. It's certainly not necessary. But you
00:04:28.480 | might. Yeah, I think it's kind of useful to start to dig in some of the math at this point.
00:04:39.280 | One particularly interesting project that's been happening during the week is from Jason
00:04:43.040 | Antic, who is a bit of a legend around here. Many of you will remember him as being the
00:04:49.480 | guy that created DeOldify and actually worked closely with us on our research, which together
00:04:57.440 | turned into NoGAN and DeCrappify and other things, created lots of papers. And Jason has kindly
00:05:07.680 | joined our little research team working on the stuff for these lessons and for developing
00:05:13.680 | a kind of fast AI approach to stable diffusion. And he took the idea that I prompted last
00:05:21.080 | week, which is maybe we should be using classic optimizers rather than differential equation
00:05:28.000 | solvers. And he actually made it work incredibly well already within a week. These faces were
00:05:33.600 | generated on a single GPU in a few hours from scratch by using classic deep learning optimizers,
00:05:45.120 | which is like an unheard of speed to get this quality of image. And we think that this research
00:05:52.160 | direction is looking extremely promising. So really great news there. And thank you,
00:05:58.640 | Jason, for this fantastic progress. Yeah, so maybe we'll do a quick reminder of what we
00:06:09.640 | looked at last week. So last week, I used a bit of a mega OneNote hand-drawn thing.
00:06:17.000 | I thought this week I might just turn it into some slides that we can use. So the basic
00:06:24.760 | idea, if you remember, is that we started with, if we're doing handwritten digits, for example,
00:06:30.400 | we'd start with a number seven. This would be one of the ones with a stroke through it
00:06:35.040 | that some countries use. And then we add to it some noise. And the seven plus the noise
00:06:43.120 | together would equal this noisy seven. And so what we then do is we present this noisy
00:06:51.440 | seven as an input to a U-Net. And we have it try to predict which pixels are noise, basically,
00:07:02.320 | or predict the noise. And so the U-Net tries to predict the noise from the number. It then
00:07:09.800 | compares its prediction to the actual noise. And it's going to then get a loss, which it
00:07:18.680 | can use to update the weights in the U-Net. And that's basically how stable diffusion,
00:07:26.240 | the main bit, if you like, the U-Net, is created. To make it easier for the U-Net, we can also
00:07:34.220 | pass in an embedding of the actual digit, the actual number seven. So for example, a
00:07:39.600 | one hot encoded vector, which goes through an embedding layer. And the nice thing about
00:07:45.720 | that to remind you is that if we do this, then we also have the benefit that then later
00:07:50.080 | on we can actually generate specific digits by saying I want a number seven or I want
00:07:54.960 | a number five and it knows what they look like. I've skipped over here the VAE latents
00:08:01.080 | piece, which we talked about last week. And to remind you, that's just a computational
00:08:06.640 | shortcut. It makes it faster. And so we don't need to include that in this
00:08:14.600 | picture, because it's just a computational shortcut: we can pre-process things into
00:08:19.920 | that latent space with the VAE first, if we wish. So that's what the U-Net does.
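To make that recap concrete, here's a minimal sketch of one training step of that noise-prediction setup. The `unet`, `digit_emb`, and optimizer objects are hypothetical stand-ins, and the real thing mixes the clean image and noise according to a noise schedule rather than simply adding them, but the idea is the same:

```python
import torch
import torch.nn.functional as F

def training_step(unet, digit_emb, x, y, opt):
    noise = torch.randn_like(x)          # the noise we add to the clean image
    noisy_x = x + noise                  # the "noisy seven"
    pred = unet(noisy_x, digit_emb(y))   # predict the noise, conditioned on the digit
    loss = F.mse_loss(pred, noise)       # compare the prediction with the actual noise
    loss.backward()                      # gradients flow back to the U-Net's weights
    opt.step()
    opt.zero_grad()
    return loss.item()
```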
00:08:28.120 | Now then, to remind you, we want to handle things that are more interesting than just
00:08:31.680 | the number seven. We want to actually handle things where we can say, for example, a graceful
00:08:40.720 | swan or a scene from Hitchcock. And the way we do that is we turn these sentences into
00:08:48.120 | embeddings as well. And we turn them into embeddings by trying to create embeddings
00:08:53.120 | of these sentences, which are as similar as possible to embeddings of the photos or images
00:08:58.280 | that they are connected with. And remind you, the way we did that or the way that was done
00:09:02.760 | originally as part of this thing called CLIP was to basically download from the internet
00:09:10.120 | lots of examples of lots of images, find their alt tags. And then for each one, we then have
00:09:17.800 | their image and its alt tag. So here's the graceful swan and its alt tag. And then we
00:09:24.240 | build two models, an image encoder that turns each image into some feature vector. And then
00:09:33.320 | we have a text encoder that turns each piece of text into a bunch of features. And then
00:09:38.680 | we create a loss function that says that the features for a graceful swan, the text, should
00:09:44.960 | be as close as possible to the features for the picture of a graceful swan. And specifically,
00:09:51.000 | we take the dot product and then we add up all the green ones because these are the ones
00:09:55.320 | that we want to match and we subtract all the red ones because those are the ones we
00:09:58.880 | don't want to match. Those are where the text doesn't match the image. And so that's the
00:10:04.160 | contrastive loss, which gives us the CL in CLIP.
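As a rough sketch of that idea in code: the real CLIP loss uses a learned temperature and a symmetric cross-entropy over these similarities, but the add-the-greens, subtract-the-reds intuition is the same:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_feats, txt_feats):
    # Normalize so each dot product measures similarity
    img_feats = F.normalize(img_feats, dim=-1)
    txt_feats = F.normalize(txt_feats, dim=-1)
    sims = img_feats @ txt_feats.T   # all pairwise dot products
    eye = torch.eye(len(sims), dtype=torch.bool, device=sims.device)
    # The diagonal holds the "green" matching pairs; everything else is "red".
    # Matches are negated because the optimizer minimizes the loss.
    return sims[~eye].sum() - sims[eye].sum()
```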
00:10:11.600 | So that's a review of some stuff we did last week. And so with this, we now have a text encoder, which we can now
00:10:17.760 | say a graceful swan, and it will spit out some embeddings. And those are the embeddings
00:10:23.160 | that we can feed into our U-Net during training. And so then we haven't been doing any of that
00:10:34.320 | training ourselves, except for some fine tuning, because it takes a very long time on a lot
00:10:38.120 | of computers. But instead, we take pre-trained models and do inference. And the way we do
00:10:44.120 | inference is we put in an example of the thing that we want, that we have an embedding for.
00:10:49.480 | So let's say we're doing handwritten digits, and we put in some random noise into the U-Net.
00:10:56.280 | And then it spits out a prediction of which bits of noise you could remove to leave behind
00:11:02.000 | a picture of the number three. Initially, it's going to do quite a bad job of that.
00:11:07.040 | So we subtract just a little bit of that noise from the image to make it a little bit less
00:11:11.760 | noisy, and we do it again, and we do it a bunch of times. So here's what that looks like, creating
00:11:23.200 | a-- I think somebody here did a smiling picture of Jeremy Howard or something, if I remember
00:11:27.320 | correctly. And if we print out the noise at kind of step zero, and it's step six, and it's
00:11:36.240 | step 12, you can see the first signs of a face starting to appear. Definitely a face
00:11:41.640 | appearing here, 18, 24. By step 30, it's looking much more like a face. By 42, it's getting
00:11:50.800 | there. It's just got a few little blemishes to fix up. And here we are. I think I've slightly
00:11:55.940 | messed up my indexes here because it should finish at 60, not 54, but such is life. So
00:12:03.140 | rather rosy red lips, too, I would have to say. So remember, in the early days, this
00:12:12.200 | took 1,000 steps, and now there are some shortcuts to make it take 60 steps. And this is what
00:12:20.720 | the process looks like. And the reason this doesn't look like normal noise is because
00:12:24.200 | now we are actually doing the VAE latents thing. And so noisy latents don't look like
00:12:32.600 | a Gaussian noise. They look like, well, they look like this. This is what happens when
00:12:36.520 | you decode those noisy Latents. Now, you might remember last week I complained that things
00:12:44.200 | are moving too quickly. And there was a couple of papers that had come out the day before
00:12:49.480 | and made everything entirely out of date. So John and I and the team have actually had
00:12:58.120 | time to read those papers. And I thought now would be a good time to start going through
00:13:08.240 | some papers for the first time. So what we're actually going to do is show how these papers
00:13:16.680 | have taken the required number of steps to go through this process down from 60 steps
00:13:23.160 | to 4 steps, which is pretty amazing. So let's talk about that. And the paper is specifically
00:13:33.440 | is this one, Progressive Distillation for Fast Sampling of Diffusion Models. So it's only
00:13:45.680 | been a week, so I haven't had much of a chance to try to explain this before. So apologies
00:13:49.040 | in advance if this is awkward, but hopefully it's going to make some sense. What we're
00:13:55.160 | going to start with is so we're going to start with this process, which is gradually denoising
00:14:07.680 | images and actually I wonder if we can copy it. Okay, so how are we going to get this down
00:14:17.760 | from 60 steps to 4 steps? The basic idea is that we're going to do a process. We're going
00:14:32.520 | to do a process called distillation, which I have no idea how to spell, but hopefully
00:14:38.560 | that's close enough that you get the idea. Distillation is a process which is pretty
00:14:43.640 | common in deep learning. And the basic idea of distillation is that you take something
00:14:48.760 | called a teacher network, which is some neural network that already knows how to do something,
00:14:55.720 | but it might be slow and big. And the teacher network is then used by a student network,
00:15:03.400 | which tries to learn how to do the same thing, but faster or with less memory. And in this
00:15:10.640 | case, we want ours to be faster. We want to do less steps. And the way we can do this
00:15:17.000 | conceptually, it's actually, in my opinion, reasonably straightforward. Like, when
00:15:29.840 | I look at this, I think, wow, you know, neural nets are really amazing. So given
00:15:34.920 | that, why is it taking like 18 steps to go from there to there? Like
00:15:47.640 | that seems like something that you should be able to do in one step. The fact that it's
00:15:53.440 | taking 18 steps and originally, of course, that was hundreds and hundreds of steps is
00:16:00.480 | because it's kind of that's just a kind of a side effect of the math of how this thing
00:16:08.000 | was originally developed, you know, this idea of this diffusion process. But the idea in
00:16:15.600 | this paper is something that actually we've, I think I might have even mentioned in the
00:16:20.840 | last lesson, it's something we were thinking of doing ourselves before this paper beat
00:16:24.240 | us to it, which is to say, well, what if we train a new model where the model takes as
00:16:32.760 | input this image, right, and puts it through some other U-Net, U-Net B. Okay. And then that
00:16:52.360 | spits out some result. And what we do is we take that result and we compare it to this
00:17:03.200 | image, the thing we actually want. Because the nice thing is now, which we've never really
00:17:08.980 | had before is we have for each intermediate output, like the desired goal where we're
00:17:13.720 | trying to get to. And so we could compare those two just using, you know, whatever, mean
00:17:20.080 | squared error. Keep on forgetting to change my pen. Mean squared error. And so then if
00:17:29.280 | we keep doing this for lots and lots of images and lots of lots of pairs and exactly this
00:17:33.240 | way, this U-Net is going to hopefully learn to take these incomplete images and turn them
00:17:41.580 | into complete images. And that is exactly what this paper does. It just says, okay,
00:17:50.960 | now that we've got all these examples of showing what step 36 should turn into at step 54,
00:17:58.720 | let's just feed those examples into a model. And that works. And you'd kind of expect it
00:18:05.540 | to work because you can see that like a human would be able to look at this. And if they
00:18:09.520 | were a competent artist, they could turn that into a, you know, a well finished product.
00:18:15.220 | So you would expect that a computer could as well. There are some little tweaks around
00:18:21.240 | how it makes this work, which I will briefly describe because we need to be able to go
00:18:26.140 | from kind of step one through to step 10 through to step 20 and so forth. And so the way that
00:18:39.540 | it does this, it's actually quite clever. What they do is they initially, so they take their
00:18:45.020 | teacher model. So remember the teacher model is one that has already been trained. Okay.
00:18:50.100 | So the teacher model already is a complete stable diffusion model. That's finished. We
00:18:54.580 | take that as a given and we put in our image. Well, actually it's noise. We put in our noise,
00:19:03.420 | right? And we put it through two time steps. Okay. And then we train our U-Net B or whatever
00:19:13.780 | you want to call it to try to go directly from the noise to time step number two. And
00:19:20.940 | it's pretty easy for it to do. And so then what they do is they take this. Okay. And
00:19:26.660 | so this thing here, remember is called the student model. They then say, okay, let's
00:19:33.100 | now take that student model and treat that as the new teacher. So they now take their
00:19:40.940 | noise and they run it through the student model twice, once and twice, and they get
00:19:49.900 | out something at the end. And so then they try to create a new student, which is a copy
00:19:57.700 | of the previous student, and it learns to go directly from the noise to two goes of
00:20:02.940 | the student model. And you won't be surprised to hear they now take that new student model
00:20:08.060 | and use that to go two goes. And then they learn, they use that. Then they copy that
00:20:15.940 | to become the next student model. And so they're doing it again and again and again. And each
00:20:21.100 | time they're basically doubling the amount of work. So it goes one to two, effectively
00:20:25.380 | it's then going two to four and then four to eight. And that's basically what they're
00:20:31.260 | what they're doing. And they're doing it for multiple different time steps. So the single
00:20:35.700 | student model is learning to both do these initial steps, trying to jump multiple steps
00:20:47.220 | at a time. And it's also learning to do these later steps, multiple steps at a time. And
00:20:56.620 | that's it, believe it or not. So this is this neat paper that came out last week, and that's how it works.
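In very rough pseudocode, one round of that might look like this. The `sample_step` and `make_batches` helpers are hypothetical, and the actual paper parameterizes the student's step so that one student step spans two teacher steps; this only shows the shape of the idea:

```python
import copy
import torch
import torch.nn.functional as F

def distillation_round(teacher, make_batches, n_sampling_steps):
    """Train a student to match two teacher steps with one of its own."""
    student = copy.deepcopy(teacher)                   # the student starts as a copy
    opt = torch.optim.Adam(student.parameters())
    for latents, t in make_batches(n_sampling_steps):  # noisy latents + timesteps
        with torch.no_grad():                          # the teacher takes two small steps
            target = teacher.sample_step(teacher.sample_step(latents, t), t - 1)
        pred = student.sample_step(latents, t)         # the student tries one big step
        loss = F.mse_loss(pred, target)
        loss.backward(); opt.step(); opt.zero_grad()
    return student                                     # becomes the teacher next round

# Each round halves the number of sampling steps needed: e.g. 60 -> 30 -> 15 -> ...
```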
00:21:04.140 | Now I mentioned that there were actually two papers. The second one is called
00:21:13.720 | On Distillation of Guided Diffusion Models. And the trick now is this second paper, these
00:21:25.440 | came out at basically the same time, if I remember correctly, even though they build
00:21:28.940 | on each other from the same teams, is that they say, okay, this is all very well, but
00:21:37.100 | we don't just want to create random pictures. We want to be able to do guidance, right?
00:21:46.100 | And you might remember, I hope you remember from last week that we used something called
00:21:49.540 | classifier-free guided diffusion, which, because I'm lazy, we will just shorten to the acronym
00:21:58.340 | CFGD. And with this one, you may recall, we take, let's say
00:22:07.540 | we want a cute puppy. We put in the prompt cute puppy into our CLIP text encoder, and
00:22:16.620 | that spits out an embedding. And we put that, let's ignore the VAE latents business. We
00:22:29.020 | put that into our U-Net, but we also put the empty prompt into our CLIP text encoder. We
00:22:43.100 | concatenate these two things together so that then out the other side, we get back two things.
00:22:49.820 | We get back the image of the cute puppy, and we get back the image of some arbitrary thing.
00:22:58.420 | Could be anything. And then we effectively do something very much like taking the weighted
00:23:03.020 | average of these two things together, combine them. And then we use that for the next stage
00:23:11.340 | of our diffusion process. Now, what this paper does is it says, this is all pretty awkward.
00:23:18.260 | We end up having to generate two predictions instead of one. And for different levels
00:23:23.740 | of guided diffusion, we have to do it multiple different times. It's all pretty annoying.
00:23:28.620 | How do we skip it? And based on the description of how we did it before, you may be able to
00:23:34.260 | guess. What we do is we do exactly the same student-teacher distillation we did before,
00:23:44.020 | but this time we pass in, in addition, the guidance. And so again, we've got the entire
00:23:56.300 | stable diffusion model, the teacher model available for us. And we are doing actual CFGD,
00:24:07.980 | Classifier Free Guided Diffusion, to create our guided diffusion cute puppy pictures.
00:24:13.780 | And we're doing it for a range of different guidance scales. So you might be doing 2
00:24:17.940 | and 7.5 and 12 and whatever, right? And those now are becoming inputs to our student model.
00:24:29.140 | So the student model now has additional inputs. It's getting the noise as always. It's getting
00:24:36.980 | the caption or the prompt, I guess I should say, as always, but it's now also getting
00:24:43.500 | the guidance scale. And so it's learning to find out how all of these things are handled
00:24:53.820 | by the teacher model. Like what does it do after a few steps each time? So it's exactly
00:25:00.420 | the same thing as before, but now it's learning to use the Classifier Free Guided Diffusion
00:25:06.860 | as well. Okay. So that's got quite a lot going on there. And if it's a bit confusing, that's
00:25:17.060 | okay. It is a bit confusing. And what I would recommend is you check out the extra information
00:25:27.700 | from Jono who has a whole video on this. And one of the cool things actually about this
00:25:31.620 | video is it's actually a paper walkthrough. And so part of this course is hopefully we're
00:25:36.620 | going to start reading papers together. Reading papers is extremely intimidating and overwhelming
00:25:43.820 | for all of us, all of the time, at least for me, it never gets any better. There's a lot
00:25:48.540 | of math. And by watching somebody like Jono, who's an expert at this stuff, read through
00:25:54.940 | a paper, you'll kind of get a sense of how he is skipping over lots of the math, right?
00:26:01.340 | To focus on, in this case, the really important thing, which is the actual algorithm. And
00:26:06.180 | when you actually look at the algorithm, you start to realize it's basically all stuff,
00:26:11.420 | nearly all stuff, maybe all stuff that you did in primary school or secondary school.
00:26:15.940 | So we've got division, okay, sampling from a normal distribution, so high school, subtraction,
00:26:23.420 | division, division, multiplication, right? Oh, okay, we've got a log there. But basically,
00:26:30.020 | you know, there's not too much going on. And then when you look at the code, you'll find
00:26:36.060 | that once you turn this into code, of course, it becomes even more understandable if you're
00:26:40.260 | somebody who's more familiar with code, like me. So yeah, definitely check out Jono's video
00:26:49.980 | on this. So another paper came out about three hours ago. And I just had to show you it to
00:27:02.900 | you because I think it's amazing. And so this is definitely the first video about this paper
00:27:12.100 | because yeah, only came out a few hours ago. But check this out. This is a paper called
00:27:16.180 | Imagic. And with this algorithm, you can pass in an input image. This is just a, you know,
00:27:23.100 | a photo you've taken or downloaded off the internet. And then you pass in some text saying
00:27:28.340 | a bird spreading wings. And what it's going to try to do is it's going to try to take
00:27:32.060 | this exact bird in this exact pose and leave everything as similar as possible, but adjust
00:27:36.880 | it just enough so that the prompt is now matched. So here we take this, this little guy here,
00:27:45.100 | and we say, oh, this is actually what we want this to be a person giving the thumbs up.
00:27:49.460 | And this is what it produces. And you can see everything else is very, very similar
00:27:52.460 | to the previous picture. So this dog is not sitting. But if we put in the prompt, a sitting
00:27:59.980 | dog, it turns it into a sitting dog, leaving everything else as similar as possible. So
00:28:08.180 | here's an example of a waterfall. And then you say it's a children's drawing of a waterfall
00:28:11.740 | and now it's become a children's drawing. So lots of people in the YouTube chat going,
00:28:16.340 | oh my God, this is amazing, which it absolutely is. And that's why we're going to show you
00:28:19.580 | how it works. And one of the really amazing things is you're going to realize that you
00:28:23.540 | understand how it works already. Just to show you some more examples, here's the dog image.
00:28:29.940 | Here's the sitting dog, the jumping dog, dog playing with the toy, jumping dog holding
00:28:36.880 | a frisbee. Okay. And here's this guy again, giving the thumbs up, crossed arms in a greeting
00:28:45.020 | pose to namaste hands, holding a cup. So that's pretty amazing. So I had to show you how this
00:28:56.380 | works. And I'm not going to go into too much detail, but I think we can get the idea actually
00:29:06.640 | pretty well. So what we do is again, we start with a fully pre-trained, ready to go generative
00:29:20.860 | model like a stable diffusion model. And this is what this is talking about here, pre-trained
00:29:27.380 | diffusion model. In the paper, they actually use a model called Imagen, but none of the
00:29:31.500 | details as far as I can see in any way depend on what the model is. It should work just
00:29:35.900 | fine for stable diffusion. And we take a photo of a bird spreading wings. Okay. So that's
00:29:41.420 | our, that's our target. And we create an embedding from that using, for example, our CLIP encoder
00:29:52.220 | as usual. And we then pass it through our pre-trained diffusion model. And we then see
00:30:05.220 | what it creates. And it doesn't create something that's actually like our bird. So then what
00:30:18.020 | they do is they fine-tune this embedding. So this is kind of like textual inversion.
00:30:23.820 | They fine-tune the embedding to try to make the diffusion model output something that's
00:30:30.420 | as similar as possible to the, to the input image. And so you can see here, they're saying,
00:30:37.260 | oh, we're moving our embedding a little bit. They don't do this for very long. They just
00:30:41.820 | want to move it a little bit in the right direction. And then now they lock that in
00:30:46.900 | place and they say, okay, now let's fine-tune the entire diffusion model end to end, including
00:30:53.140 | the VAE, or actually with Imagen, they have a super-resolution model, but same idea. So
00:30:58.900 | we fine-tune the entire model end to end. And now the embedding, this optimized embedding
00:31:04.020 | we created, we store in place. We don't change that at all. That's now frozen.
00:31:11.580 | And we try to make it so that the diffusion model now spits out our bird as close as possible.
00:31:22.140 | So you fine-tune that for a few epochs. And so you've now got something that takes this,
00:31:26.300 | embedding that we fine-tuned, goes through a fine-tuned model and spits out our bird.
00:31:30.460 | And then finally, the original target embedding we actually wanted is a photo of a bird spreading
00:31:37.060 | its wings. We ended up with this slightly different embedding and we take the weighted
00:31:42.620 | average of the two. That's called the interpolate step, the weighted average of the two. And we
00:31:47.820 | pass that through this fine-tuned diffusion model and we're done. And so that's pretty amazing.
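In outline, the three stages might look something like this. This is purely illustrative pseudocode: `reconstruction_loss`, `generate`, the step counts, and the optimizers are hypothetical stand-ins for the diffusion training loss and sampling loop:

```python
# 1. Optimize the text embedding (model frozen) so it reconstructs the input image
e_tgt = text_encoder("a photo of a bird spreading wings")
e_opt = e_tgt.clone().requires_grad_(True)
for _ in range(embedding_steps):
    loss = reconstruction_loss(model, e_opt, input_image)  # hypothetical helper
    loss.backward(); emb_opt.step(); emb_opt.zero_grad()

# 2. Freeze the optimized embedding; fine-tune the whole model end to end
for _ in range(finetune_steps):
    loss = reconstruction_loss(model, e_opt.detach(), input_image)
    loss.backward(); model_opt.step(); model_opt.zero_grad()

# 3. Interpolate between the target and optimized embeddings, then generate
eta = 0.7                                    # interpolation weight
edited = generate(model, eta * e_tgt + (1 - eta) * e_opt)
```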
00:31:56.700 | This would not take, I don't think, a particularly long time or require any
00:32:03.020 | particular special hardware. It's the kind of thing I expect people will be doing, yeah,
00:32:09.740 | in the coming days and weeks. But it's very interesting because, yeah, I mean, the ability
00:32:14.860 | to take any photo of a person or whatever and change it, like literally change what the person's
00:32:26.380 | doing is, you know, societally very important and really means that anybody I guess now can generate
00:32:40.380 | believable photos that never actually existed. I see Jono in the chat saying it took
00:32:47.020 | about eight minutes to do it for Imagen on TPUs. Although Imagen is quite a slow,
00:32:54.620 | big model, the TPUs they used were the very latest TPUs. So might be, you know,
00:33:03.260 | maybe it's an hour or something for stable diffusion on GPUs.
00:33:07.740 | All right. So that is a lot of fun.
00:33:16.220 | All right. So with that, let's go back to our notebook where we left it last time. We had kind
00:33:30.700 | of looked at some applications that we can play with in this diffusion-nbs repo, in the stable
00:33:36.140 | diffusion notebook. And what we've got now, and to remind you, when I say we, it's mainly actually
00:33:45.020 | Pedro, Patrick and Suraj, just a little bit of help from me. So hugging face folks. What we/they
00:33:52.700 | have done is they now dig into the pipeline to pull it all apart step by step. So you can see
00:33:59.740 | exactly what happens. The first thing I was just going to mention is this is how you can create
00:34:05.740 | those gradual denoising pictures. And this is thanks to something called the callback.
00:34:13.980 | So you can say here, when you go through the pipeline, every 12 steps, call this function.
00:34:21.980 | And as you can see, it's going to call it with I and T and the latents.
00:34:29.180 | And so then we can just make an image and stick it on the end of an array. And that's all that's
00:34:37.260 | happening here. So this is how you can start to interact with a pipeline without rewriting it
00:34:45.420 | yourself from scratch. But now what we're going to do is we're actually going to write it, we're
00:34:49.580 | going to build it from scratch. So you don't actually have to use a callback because you'll
00:34:54.620 | be able to change it yourself. So let's take a look. So looking inside the pipeline, what exactly is
00:35:03.500 | going on? So what's going to be going on in the pipeline is seeing all of the steps that we saw
00:35:13.900 | in last week's OneNote notes that I drew. And it's going to be all the code. And we're not going to
00:35:20.860 | show the code of how each step is implemented. So for example, the CLIP text model we talked about,
00:35:27.340 | the thing that takes as input a prompt and creates an embedding, we just take that as a given. So we
00:35:34.860 | download it; OpenAI has trained one called clip-vit-large-patch14. So we just say
00:35:41.980 | from_pretrained, and Hugging Face Transformers will download and create that model for us.
00:35:49.100 | Ditto for the tokenizer. And so ditto for the autoencoder, and ditto for the U-Net.
00:35:56.460 | So there they all are, we can just grab them. So we just take that all as a given.
00:36:01.500 | These are the three models that we've talked about: the text encoder (the CLIP encoder),
00:36:07.020 | the VAE, and the U-Net. So there they are.
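Roughly, following the notebook (the checkpoint names may have moved since; check the current diffusion-nbs notebook):

```python
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")
unet = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet")

# Move everything onto the GPU for inference
for m in (text_encoder, vae, unet): m.to("cuda")
```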
00:36:16.540 | So given that we now have those, the next thing we need is the thing that converts time steps into the amount of noise. Remember that
00:36:22.540 | graph we drew. And so we can basically, again, use something that hugging face,
00:36:31.340 | well, actually, in this case, Katherine Crowson has already provided, which is a scheduler;
00:36:37.580 | it's basically something that shows us that connection. So we've got that. So we use that
00:36:44.700 | scheduler, and it tells us how much noise we're using and when. And so we have to make sure that that
00:36:51.580 | matches, and so we just use these numbers that were given.
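The scheduler setup looks roughly like this in the notebook; the beta values here are "these numbers that were given", i.e. what stable diffusion was trained with:

```python
from diffusers import LMSDiscreteScheduler

scheduler = LMSDiscreteScheduler(
    beta_start=0.00085, beta_end=0.012,
    beta_schedule="scaled_linear", num_train_timesteps=1000)
```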
00:36:59.260 | Okay, so now to create our photograph of an astronaut riding a horse again, in 70 steps with a 7.5 guidance scale, batch size of one,
00:37:08.380 | step number one is to take our prompt and tokenize it. Okay, so we looked at that in part one of the
00:37:16.060 | course. So check that out if you can't remember what tokenizing does, but it's just splitting it
00:37:20.620 | basically into words, or subword units if they're long and unusual words. So here
00:37:26.860 | are: this will be the start-of-sentence token, and this will be a photograph
00:37:32.620 | of an astronaut, etc. And then you can see the same token is repeated again and again at the end.
00:37:38.380 | That's just the padding to say we're all done. And the reason for that is that GPUs and TPUs
00:37:45.340 | really like to do lots of things at once. So we kind of have everything be the same length by
00:37:51.820 | padding them. That may sound like a lot of wasted work, which it kind of is. But a GPU would rather
00:37:58.620 | do lots of things at the same time on exactly the same sized input. So this is why we have all this
00:38:03.660 | padding. So you can see here if we decode that number, it's the end of text marker, just just
00:38:11.340 | padding really in this case. As well as getting the input IDs, so these are just lookups into a
00:38:18.620 | vocabulary. There's also a mask, which is just telling it which ones are actual words as opposed
00:38:25.500 | to padding, which is not very interesting. So we can now take those input IDs, we can put them on
00:38:35.180 | the GPU, and we can run them through the clip encoder. And so for a batch size of one, so you've
00:38:43.180 | got one image that gives us back a 77 by 768, because we've got 77 here. And each one of those
00:38:54.620 | creates a 768 long vector. So we've got a 77 by 768 tensor. So these are the embeddings
00:39:01.820 | for a photograph of an astronaut riding a horse that come from clip. So remember,
00:39:08.540 | everything's pre-trained. So that's all done for us. We're just doing inference. And so remember,
00:39:15.340 | for the classifier free guidance, we also need the embeddings for the empty string.
00:39:23.500 | So we do exactly the same thing.
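In code, that looks roughly like this (following the notebook):

```python
import torch

# Embeddings for the prompt: shape [1, 77, 768]
text_embeddings = text_encoder(text_input.input_ids.to("cuda"))[0]

# Same again for the empty string, for classifier-free guidance
uncond_input = tokenizer(
    [""], padding="max_length", max_length=text_input.input_ids.shape[-1],
    return_tensors="pt")
uncond_embeddings = text_encoder(uncond_input.input_ids.to("cuda"))[0]

# Concatenate so one batch handles both passes at once
text_embeddings = torch.cat([uncond_embeddings, text_embeddings])
```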
00:39:36.540 | So now we just concatenate those two together, because this is just a trick to get the GPU to
00:39:42.380 | do both at the same time, because we like the GPU to do as many things at once as possible.
00:39:46.220 | And so now we create our noise. And because we're doing it with a VAE, we can call it
00:39:56.380 | latents, but it's just noise, really. I wonder if you'd still call it that without the VAE.
00:40:03.340 | Maybe. You would have to think about that. So that's just randomly
00:40:09.180 | generated, normally distributed random numbers of size one, that's our batch size.
00:40:14.140 | And the reason that we've got this divided by eight here is because that's what the VAE does.
00:40:20.220 | It allows us to create things that are eight times smaller by height and width.
00:40:24.700 | And then it's going to expand it up again for us later. That's why this is so much faster.
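The latents creation is roughly:

```python
height, width = 512, 512
latents = torch.randn((1, unet.in_channels, height // 8, width // 8))
latents = latents.to("cuda").half()  # half precision (FP16): half the memory, much faster
```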
00:40:30.700 | You'll see a lot of this after we put it on the GPU, you'll see a lot of this dot half.
00:40:34.620 | This is converting things into what's called half precision or FP16.
00:40:39.580 | Details don't matter too much. It's just making it half as big in memory by using less precision.
00:40:45.740 | Modern GPUs are much, much, much, much faster if we do that. So you'll see that a lot.
00:40:51.500 | If you use something like fast AI, you don't have to worry about it, but all this stuff is done for
00:40:56.780 | you. And we'll see that later as we rebuild this with much, much less code later in the course.
00:41:02.060 | So we'll be building our own kind of framework from scratch,
00:41:07.740 | which you'll then be able to maintain and work with yourself.
00:41:12.780 | Okay, so we have to say we want to do 70 steps.
00:41:16.060 | Something that's very important, we won't worry too much about the details right now,
00:41:23.740 | but what you see here is that we take our random noise and we scale it. And that's because
00:41:29.740 | depending on what stage you're up to, you need to make sure that you have the right amount of
00:41:36.860 | variance, basically. Otherwise you're going to get activations and gradients that go out of control.
00:41:42.940 | This is something we're going to be talking about a huge amount
00:41:45.580 | during this course, and we'll show you lots of tricks to handle that kind of thing automatically.
00:41:50.860 | Unfortunately at the moment in the stable diffusion world, this is all done in rather,
00:41:56.940 | in my opinion, kind of ways that are too tied to the details of the model. I think we will be able
00:42:04.620 | to improve it as the course goes on, but for now we'll stick with how everybody else is doing it.
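For reference, that setup is roughly:

```python
num_inference_steps = 70
scheduler.set_timesteps(num_inference_steps)

# Scale the starting noise to the scheduler's expected initial variance.
# Older versions of the notebook write this as latents * scheduler.sigmas[0];
# newer diffusers exposes it as init_noise_sigma.
latents = latents * scheduler.init_noise_sigma
```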
00:42:10.220 | This is how they do it. So we're going to be jumping through. So normally it would take a
00:42:15.740 | thousand time steps, but because we're using a fancy scheduler, we get to skip from 999 to
00:42:21.660 | 984, 984 to 970, and so forth. So we're going down about 14 time steps. And remember, this is a very,
00:42:28.460 | very, very unfortunate word. They're not time steps at all. In fact, they're not even integers.
00:42:33.420 | It's just a measure of how much noise are we adding at each time, and you find out how much noise by
00:42:41.500 | looking it up on this graph. Okay. That's all time step means. It's not a step of time, and it's a
00:42:47.740 | real shame that that word is used because it's incredibly confusing. This is much more helpful.
00:42:53.660 | This is the actual amount of noise at each one of those iterations. And so here you can see the
00:43:02.140 | amount of noise for each of those time steps, and we're going to be going backwards. As you can see,
00:43:11.100 | we start at 999, so we'll start with lots of noise, and then we'll be using less and less and less and
00:43:15.900 | less noise. So we go through the 70 time steps in a for loop, concatenating our two noise bits
00:43:30.380 | together, because we've got the classifier free and the prompt versions, do our scaling,
00:43:39.500 | calculate our predictions from the unit, and notice here we're passing in the time step,
00:43:44.460 | as well as our prompt. That's going to return two things, the unconditional prediction,
00:43:51.980 | so that's the one for the empty string. Remember, we passed in one of the two things we passed in
00:43:57.180 | was the empty string. So we concatenated them together, and so after they come out of the unit,
00:44:04.220 | we can pull them apart again. So dot chunk just means pull them apart into two separate variables,
00:44:09.580 | and then we can do the guide and scale that we talked about last week.
00:44:14.140 | And so now we can do that update where we take a little bit of the noise
00:44:23.660 | and remove it to give us our new latents. So that's the loop. And so at the end of all that,
00:44:33.740 | we decode it in the VAE. The paper that created this VAE tells us that we have to divide it by
00:44:41.180 | this number to scale it correctly. And once we've done that, that gives us a number which is between
00:44:48.620 | negative one and one. Python imaging library expects something between zero and one,
00:44:56.860 | so that's what we do here to make it between zero and one and enforce that to be true.
00:45:02.460 | Put that back on the CPU, make sure it's that the order of the dimensions is the same as what Python
00:45:08.780 | imaging library expects, and then finally convert it up to between zero and 255 as an int,
00:45:16.220 | which is actually what PIL really wants. And there's our picture. So there's all the steps.
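And the decode stage, roughly:

```python
from PIL import Image

with torch.no_grad():
    image = vae.decode(latents / 0.18215).sample  # the scaling factor from the VAE's paper

image = (image / 2 + 0.5).clamp(0, 1)             # map from [-1, 1] to [0, 1]
image = image[0].permute(1, 2, 0).cpu().numpy()   # channels last, back on the CPU
Image.fromarray((image * 255).round().astype("uint8"))
```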
00:45:26.620 | So what I then did, this is kind of like, so the way I normally build code, I use notebooks for
00:45:34.860 | everything, is I kind of do things step by step by step, and then I tend to kind of copy them,
00:45:40.460 | and I use shift M. I don't know if you've seen that, but what shift M does, it takes two cells
00:45:46.460 | and combines them like that. And so I basically combined some of the cells together, and I removed
00:45:53.820 | a bunch of the the pros, so you can see the entire thing on one screen. And what I was trying to do
00:46:04.620 | here is I'd like to get to the point where I've got something which I can very quickly do experiments
00:46:09.420 | with. So maybe I want to try some different approach to guidance tree classification,
00:46:13.820 | maybe I want to add some callbacks, so on and so forth. So I kind of like to have everything,
00:46:22.540 | you know, I like to have all of my important code be able to fit into my screen at once.
00:46:28.300 | And so you can see now I do, I've got the whole thing on my screen, so I can keep it all in my
00:46:32.380 | head. One thing I was playing around with was I was trying to understand the actual classifier-free
00:46:41.340 | guidance equation in terms of like how it works. Computer scientists tend to write things
00:46:50.220 | and software engineers with kind of long words as variable names. Mathematicians tend to use
00:46:55.820 | short just letters normally. For me, when I want to play around with stuff like that, I turn stuff
00:47:01.580 | back into letters. And that's because I actually kind of pulled out one note and I started jutting
00:47:06.540 | down this equation and playing around with it to understand how it behaves. So this is just like,
00:47:13.900 | it's not better or worse, it's just depending on what you're doing. So actually here I said, okay,
00:47:18.860 | g is guidance scale. And then rather than having the unconditional and text embeddings,
00:47:24.940 | I just call them u and t. And now I've got this all down into an equation which I can
00:47:29.420 | write down in a notebook and play with and understand exactly how it works.
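For reference, in that compact form the update is the standard classifier-free guidance equation:

```python
pred = u + g * (t - u)  # g=0 gives the unconditional prediction; g=1 the pure prompt one
```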
00:47:32.860 | So that's something I find really helpful for working with this kind of code is to, yeah,
00:47:40.700 | turn it into a form that I can manipulate algebraically more easily. I also try to make
00:47:46.140 | it look as much like the paper that I'm implementing as possible. Anyways, that's that code. So then I
00:47:52.860 | copied all this again and I basically, oh, I actually did it for two prompts this time.
00:47:59.420 | I thought this was fun: an oil painting of an astronaut riding a horse in the style of
00:48:03.340 | Grant Wood, just to remind you, Grant Wood looks like this. Not obviously astronaut material,
00:48:15.500 | which I thought would make it actually kind of particularly interesting. Although he does have
00:48:19.500 | horses. I can't see one here. Some of his pictures have horses. So because I did two
00:48:27.820 | prompts, I got back two pictures I could do. So here's the Grant Wood one. I don't know what's
00:48:33.340 | going on in his back here, but I think it's quite nice. So yeah, I then copied that whole thing again
00:48:39.980 | and merged them all together and then just put it into a function. So I took the little bit which
00:48:48.540 | creates an image and put that into a function. I took the bit which does the tokenizing and
00:48:55.020 | text encoding and put that into a function. And so now all of the code necessary to do
00:49:01.340 | the whole thing from top to bottom fits in these two cells, which makes it for me much easier to
00:49:10.300 | see exactly what's going on. So you can see I've got the text embeddings. I've got the
00:49:16.860 | unconditional embeddings. I've got the embeddings which can catenate the two together,
00:49:20.220 | optional random seed, my latents, and then the loop itself. And you'll also see something I do
00:49:32.700 | which is a bit different to a lot of software engineering is I often create things which are
00:49:36.780 | kind of like longer lines because I try to have each line be kind of like mathematically one thing
00:49:44.620 | that I want to be able to think about as a whole. So yeah, these are some differences between kind of
00:49:50.300 | the way I find numerical programming works well compared to the way I would write a more
00:49:56.060 | traditional software engineering approach. And again, this is partly a personal preference,
00:50:01.100 | but it's something I find works well for me. So we're now at a point where we've got, yeah,
00:50:06.300 | two fun, three functions that easily fit on the screen and do everything. So I can now just say
00:50:11.500 | make samples and display each image. And so this is something for you to experiment with.
00:50:21.660 | And what I specifically suggest as homework is to try picking one of the
00:50:32.860 | extra tricks we learned about like image to image or negative prompts. Negative prompts
00:50:39.340 | would be a nice easy one like see if you can implement negative prompt in your version of this.
00:50:48.780 | Or yeah, try doing image to image. That wouldn't be too hard either. Another one you can add is
00:51:00.220 | try adding callbacks. And the nice thing is then, you know, you've got code which you fully understand
00:51:07.900 | because you know what all the lines do. And you then don't need to wait for the diffusers folks
00:51:16.460 | to update it. The library to do is, for example, the callbacks are only added like a week ago.
00:51:22.140 | So until then you couldn't do callbacks. Well, now you don't have to wait for the diffusers team
00:51:26.140 | to add something. The code's all here for you to play with. So that's my recommendation as a bit
00:51:32.380 | of homework for this week. Okay, so that brings us to the end of our rapid overview of stable
00:51:47.420 | diffusion and some very recent papers that very significantly developed stable diffusion. I hope
00:51:53.180 | that's given you a good sense of the kind of very high level slightly hand wavy version of all this
00:52:00.940 | and you can actually get started playing with some fun code. What we're going to be doing next
00:52:06.540 | is going right back to the start, learning how to multiply two matrices together effectively
00:52:14.860 | and then gradually building from there until we've got to the point that we've rebuilt all
00:52:19.180 | this from scratch and we understand why things work the way they do, understand how to debug
00:52:24.300 | problems, improve performance and implement new research papers as well. So that's going to be
00:52:32.380 | very exciting. And so we're going to have a break and I will see you back here in 10 minutes.
00:52:45.660 | Okay, welcome back everybody. I'm really excited about the next part of this. It's going to require
00:52:53.100 | some serious tenacity and a certain amount of patience. But I think you're going to learn a lot.
00:53:02.620 | A lot of folks I've spoken to have said that previous iterations of this part of the course
00:53:10.060 | is like the best course they've ever done. And this one's going to be dramatically better than
00:53:14.620 | any previous version we've done of this. So hopefully you'll find that the hard work and patience
00:53:22.620 | pays off. We're working now through the course22p2 repo. So: 2022 course, part two.
00:53:33.500 | And notebooks are ordered. So we'll start with notebook number one.
00:53:40.940 | And okay, so the goal is to get to stable diffusion from the foundations, which means we
00:53:49.020 | have to define what are the foundations. So I've decided to define them as follows. We're allowed
00:53:56.060 | to use Python. We're allowed to use the Python standard library. So that's all the stuff that
00:54:01.500 | comes with Python by default. We're allowed to use matplotlib because I couldn't be bothered
00:54:07.100 | creating my own plotting library. And we're allowed to use Jupyter notebooks and nbdev,
00:54:13.420 | which is something that creates modules from notebooks. So basically what we're going to
00:54:19.340 | try to do is to rebuild everything starting from this foundation. Now, to be clear,
00:54:27.980 | what we are allowed to use are the libraries once we have reimplemented them correctly.
00:54:36.060 | And so if we reimplement something from NumPy or PyTorch or whatever, we're then allowed to use
00:54:43.020 | the NumPy or PyTorch or whatever version. Sometimes we'll be creating things that haven't
00:54:50.700 | been created before. And that's then going to be becoming our own library. And we're going to be
00:54:56.860 | calling that library mini AI. So we're going to be building our own little framework as we go.
00:55:03.740 | So, for example, here are some imports. And these imports all come
00:55:08.540 | from the Python standard library, except for these two.
00:55:12.300 | Now, to be clear, one challenge we have is that the models we use in stable diffusion were trained
00:55:24.300 | on millions of dollars worth of equipment for months, which we don't have the time or money.
00:55:31.740 | So another trick we're going to do is we're going to create smaller identical but smaller versions
00:55:38.860 | of them. And so once we've got them working, we'll then be allowed to use the big pre-trained
00:55:44.540 | versions. So that's the basic idea. So we're going to have to end up with our own VAE,
00:55:50.380 | our own U-Net, our own CLIP encoder, and so forth.
00:55:59.020 | To some degree, I am assuming that you've completed part one of the course. To some degree,
00:56:05.500 | I will cover everything at least briefly. But if I cover something about deep learning too fast for
00:56:13.100 | you to know what's going on and you get lost, go back and watch part one, or go and Google for
00:56:21.740 | that term. For stuff that we haven't covered in part one, I will go over it very thoroughly and
00:56:27.340 | carefully. All right. So I'm going to assume that you know the basic idea, which is that we're going
00:56:37.420 | to need to be doing some matrix multiplication. So we're going to try to take a deep dive into
00:56:43.820 | matrix multiplication today. And we're going to need some input data. And I quite like working
00:56:49.900 | with MNIST data. MNIST is handwritten digits. It's a classic data set. They're 28 by 28 pixel
00:57:04.380 | grayscale images. And so we can download them from this URL.
00:57:15.980 | So we use pathlib's Path object a lot. It's part of Python. And it basically takes a string
00:57:23.100 | and turns it into something that you can treat as a path. For example, you can use slash to mean
00:57:28.460 | this file inside this subdirectory. So this is how we create a path object.
00:57:34.140 | Path objects have, for example, an mkdir method.
00:57:43.340 | So I like to get everything set up. But I want to be able to rerun this cell lots of times
00:57:48.940 | and not give me errors if I run it more than once. If I run it a second time, it still works.
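The setup in question, roughly as in the notebook:

```python
from pathlib import Path

path_data = Path("data")
path_data.mkdir(exist_ok=True)        # safe to rerun: no error if it already exists
path_gz = path_data / "mnist.pkl.gz"  # slash builds a path inside the directory
```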
00:57:55.100 | And in that case, that's because I put this exist_ok=True. How did I know that I can say that?
00:58:00.460 | Because otherwise, it would try to make the directory, which would already exist, and give an
00:58:03.820 | error. How do I know what parameters I can pass to mkdir? I just press Shift-Tab. And so when I
00:58:10.540 | hit Shift-Tab, it tells me what options there are. If I press it a few times, it'll actually
00:58:18.460 | pop it down to the bottom of the screen to remind me. I can press escape to get rid of it.
00:58:25.020 | Or you can just-- or else you can just hit tab inside. And it'll list all the things you could
00:58:37.900 | type here, as you can see. All right. So we need to grab this URL. And so Python comes
00:58:52.620 | with something for doing that, which is the urllib library that's part of Python
00:58:59.340 | that has something called urlretrieve. And something which I'm always a bit surprised
00:59:05.020 | is not widely used as people reading the Python documentation. So you should do that a lot.
00:59:11.580 | So if I click on that, here is the documentation
00:59:19.340 | for URL retrieve. And so I can find exactly what it can take. And I can learn about exactly what
00:59:33.980 | it does. And so I read the documentation from the Python docs for every single method I use.
00:59:48.460 | And I look at every single option that it takes. And then I practice with it.
00:59:53.420 | And to practice with it, I practice inside Jupyter. So if I want this import on its own,
01:00:07.660 | I can hit Ctrl-Shift-Minus and it's going to split it into two cells. And then I'll hit Alt-Enter
01:00:16.300 | or Option-Enter so I can create something underneath. And I can type urlretrieve, Shift-Tab.
01:00:22.700 | And so there it all is. If I'm like way down somewhere in the notebook and I have no idea
01:00:32.540 | where urlretrieve comes from, I can just hit Shift-Enter. And it actually tells me exactly
01:00:38.140 | where it comes from. And if I want to know more about it, I can just hit Question Mark,
01:00:43.340 | Shift-Enter, and it's going to give me the documentation. And most cool of all,
01:00:49.900 | second question mark, and it gives me the full source code. And you can see it's not a lot.
01:00:58.460 | You know, reading the source code of Python standard library stuff is often quite revealing.
01:01:03.260 | And you can see exactly how they do it. And that's a great way to learn more about
01:01:12.940 | more about this. So in this case, I'm just going to use a very simple functionality,
01:01:24.300 | which is I'm going to say the URL to retrieve and the file name to save it as.
01:01:28.540 | And again, I made it so I can run this multiple times. So it's only going to do the urlretrieve
01:01:36.460 | if the path doesn't exist. If I've already downloaded it, I don't want to download it again.
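That cell looks roughly like this; the URL here is indicative, so check the notebook for the exact one:

```python
from urllib.request import urlretrieve

# The pickled MNIST data used in the course notebook
MNIST_URL = "https://github.com/mnielsen/neural-networks-and-deep-learning/raw/master/data/mnist.pkl.gz"
if not path_gz.exists():
    urlretrieve(MNIST_URL, path_gz)
```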
01:01:41.580 | So I run that cell. And notice that I can put exclamation mark, followed by a line of bash.
01:01:49.980 | And it actually runs this using bash.
01:01:54.220 | If you're using Windows, this won't work. And I would very, very strongly suggest if you're using
01:02:06.780 | Windows, use WSL. And if you use WSL, all of these notebooks will work perfectly. So yeah, do that.
01:02:14.860 | Or write it on paper space or Lambda Labs or something like that, Colab, et cetera.
01:02:19.980 | Okay, so this is a gzip file. So thankfully, Python comes with a gzip module. Python comes
01:02:31.580 | with quite a lot, actually. And so we can open a gzip file using gzip.open. And we can pass in
01:02:37.980 | the path. And we'll say we're going to read it as binary as opposed to text.
01:02:43.580 | Okay, so this is called a context manager. It's a with clause. And what it's going to do is it's
01:02:51.340 | going to open up this gzip file. The gzip object will be called f. And then it runs everything
01:02:59.100 | inside the block. And when it's done, it will close the file. So with blocks can do all kinds
01:03:06.540 | of different things. But in general, with blocks that involve files are going to close the file
01:03:11.660 | automatically for you. So we can now do that. And so you can see it's opened up the gzip file.
01:03:18.860 | And the gzip file contains what are called pickle objects. Pickled objects are basically
01:03:26.060 | Python objects that have been saved to disk. It's the main way that people in pure Python
01:03:32.220 | save stuff. And it's part of the standard library. So this is how we load in from that file.
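The load, as in the notebook:

```python
import gzip, pickle

with gzip.open(path_gz, "rb") as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding="latin-1")
```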
01:03:39.260 | Now, the file contains a tuple of tuples. So when you put a tuple on the left hand side of an equal
01:03:46.460 | sign, it's quite neat. It allows us to put the first tuple into two variables called x_train
01:03:51.660 | and y_train and the second into x_valid and y_valid. This trick here where you put stuff like this on
01:03:59.180 | the left is called destructuring. And it's a super handy way to make your code kind of
01:04:07.020 | clear and concise. And lots of languages support that, including Python.
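So the loading cell looks roughly like this (a sketch; depending on the file, there may also be a third, test, tuple to discard):

```python
import gzip, pickle

# The with-block closes the gzip file automatically when we're done.
with gzip.open(path_gz, 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid)) = pickle.load(f)
```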
01:04:15.340 | Okay. So we've now got some data. And so we can have a look at it. Now, it's a bit tricky
01:04:22.380 | because we're not allowed to use NumPy according to our rules. But unfortunately, this actually
01:04:26.540 | comes as NumPy arrays. So I've turned it into a list. Specifically, I've taken the first
01:04:34.140 | image and turned it into a list. And so we can look at a few example values in that list.
01:04:44.380 | And here they are. So it looks like they're numbers between zero and one.
01:04:47.420 | And this is what I do when I learn about a new dataset. So when I started writing this notebook,
01:04:56.620 | what you see here, other than the pros here, is what I actually did when I was working
01:05:04.540 | with this data. I wanted to know what it was. So I just grab a little bit of it and look at it.
01:05:14.140 | So I kind of got a sense now of what it is. Now, interestingly,
01:05:20.060 | it's 784. This image is a 784-long list. Oh, dear. People freaking out in the comments. No NumPy.
01:05:39.980 | Yeah. No NumPy. Do you see NumPy? No NumPy. Why 784? What is that? Well, that's because these are
01:05:47.500 | 28 by 28 images. So it's just a flat list here, 784 long. So how do I turn this 784-long thing
01:05:57.340 | into 28 by 28? So I want a list of 28 lists of 28, basically, because we don't have matrices.
01:06:05.820 | So how do we do that? And so we're going to be learning a lot of cool stuff in Python here.
01:06:11.100 | Sorry, I can't stop laughing at all the stuff in our chat.
01:06:15.580 | Oh, dear. People are quite reasonably freaking out. That's okay. We'll get there.
01:06:24.300 | I promise. I hope. Otherwise, I'll embarrass myself. All right. So how do I convert a 784
01:06:33.580 | long list into a 28-long list of 28-long lists? I'm going to use something called
01:06:43.740 | chunks. And first of all, I'll show you what this thing does. And then I'll show you how it works.
01:06:47.340 | So vals is currently a list of 10 things. Now, if I take vals and I pass it to chunks
01:06:56.780 | with five, it creates two lists of five. Here's list number one of five elements. And here's list
01:07:03.740 | number two of five elements. Hopefully, you can see what it's doing. It's chunkifying this list.
01:07:12.860 | And this is the length of each chunk. Now, how did it do that? The way I did it is using a very,
01:07:20.620 | very useful thing in Python that far too many people don't know about, which is called yield.
01:07:26.060 | And what yield does is, you can see here, as we run the loop, it's going to go through from zero
01:07:33.020 | up to the length of my list. And it's going to jump by five at a time. It's going to go,
01:07:40.060 | in this case, zero, then five. And then, thinking of yield as being like return for now,
01:07:47.420 | it's going to return the list from zero up to five. So it returns the first bit of the list.
01:07:55.180 | But yield doesn't just return. It kind of like returns a bit and then it continues.
01:08:02.780 | And it returns a bit more. And so specifically, what yield does is it creates an iterator.
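So the chunks function itself is presumably something like this sketch:

```python
def chunks(x, sz):
    # Walk from 0 to len(x) in steps of sz, yielding one sz-long slice each time.
    for i in range(0, len(x), sz):
        yield x[i:i+sz]
```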
01:08:13.420 | An iterator is basically something that you can call next on
01:08:21.660 | a bunch of times. So let's try it. So we can say iterator equals.
01:08:29.260 | Okay. Oh, got to run it. So what is iterator? Well, iterator is something that I can basically,
01:08:38.940 | I can call next on. And next basically says yield the next thing. So this should yield
01:08:45.660 | vals[0:5]. There it is. It did, right? There's vals[0:5].
01:08:56.060 | Now, if I run that again, it's going to give me a different answer
01:09:00.220 | because it's now up to the second part of this loop. Now it returns the last five. Okay. So
01:09:10.540 | this is what an iterator does. Now, if you pass an iterator to Python's list,
01:09:21.260 | it runs through the entire iterator until it's finished and creates a list of the results.
01:09:27.020 | And what does finished look like? This is what finished looks like. If you call next
01:09:31.580 | and get StopIteration, that means you've run out. And that makes sense, because my loop,
01:09:38.380 | there's nothing left in it. So all of that is to say, we now have a way of taking a list and
01:09:46.540 | chunkifying it. So what if I now take my full image, image number one, chunkify it into chunks
01:09:56.620 | of 28 long and turn that into a list and plot it. We have successfully created an image.
01:10:06.780 | So that's good.
01:10:10.380 | Now, we are done. But there are other ways to create this iterator. And because iterators
01:10:29.660 | and generators, which are closely related, are so important, I wanted to show you more about how
01:10:37.100 | to do them in Python. It's one of these things that if you understand this, you'll often find
01:10:44.700 | that you can throw away huge pieces of enterprise software and basically replace it with an iterator.
01:10:52.700 | It lets you stream things one bit at a time. It doesn't store it all in memory.
01:10:58.300 | It's this really powerful thing that once I show it to people, they suddenly go like, oh, wow,
01:11:05.980 | we've been using all this third party software and we could have just created a Python iterator.
01:11:12.300 | Python comes with a whole standard library module called itertools just to make it easier to work
01:11:19.900 | with iterators. I'll show you one example of something from itertools, which is islice.
01:11:26.540 | So let's grab our values again, these 10 values.
01:11:32.540 | Okay. So let's take these 10 values and we can take any list and turn it into an iterator
01:11:45.100 | by passing it to iter, and I should call the result it, so I don't shadow this Python built-in.
01:11:55.340 | iter is not a keyword, but it is a built-in I don't want to override.
01:11:57.500 | So this is now basically something that I can call. Actually, let's do this.
01:12:03.180 | I'll show you that I can call next on it. So if I now go next(it),
01:12:08.620 | you can see it's giving me each item one at a time.
01:12:18.300 | Okay. So that's what converting it into an iterator does.
01:12:21.580 | islice converts it into a different kind of iterator. Let's call this one maybe the islice iterator.
01:12:42.940 | And so you can see here what it did: it jumped ahead.
01:12:53.340 | Let me stop here. That would have been better. So I should
01:13:03.180 | create the iterator and then call next a few times. Sorry, this is what I meant to do.
01:13:11.340 | It's now only returning the first five before it raises
01:13:17.100 | StopIteration. So what islice does is it grabs the first N things from an iterable,
01:13:24.620 | something that you can iterate. Why is that interesting?
01:13:33.340 | Because I can pass it to list for example. Right. And now if I pass it to list again,
01:13:42.540 | this iterator has now grabbed the first five things. So it's now up to thing number six.
01:13:48.620 | So if I call it again, it's the next five things. And if I call it again, then there's nothing left.
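Here's that behavior as a sketch (vals holding 10 values, as before):

```python
from itertools import islice

it = iter(vals)
list(islice(it, 5))   # the first five values
list(islice(it, 5))   # the next five
list(islice(it, 5))   # []: the iterator is exhausted
```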
01:13:59.180 | And maybe you can see: we've actually already got chunks defined, but we can also do it with islice.
01:14:06.940 | And here's how we can do it. It's actually pretty tricky.
01:14:10.460 | iter in Python: you can pass it something like a list to create an iterator,
01:14:19.180 | or you can pass it, and now this is a really important word, a callable. What's a callable?
01:14:25.260 | A callable is generally speaking, it's a function. It's something that you can put parentheses after.
01:14:31.660 | Could even be a class. Anything you can put parentheses after,
01:14:38.780 | you can just think of it for now as a function. So we're going to pass it a function.
01:14:42.300 | And in the second form, it's going to be called until the function returns
01:14:50.620 | this value here, which in this case is the empty list. And we just saw that islice will return an empty
01:14:55.500 | list when it's done. So this here is going to keep calling this function again and again and again.
01:15:07.260 | And we've seen exactly what happens because we've called it ourselves before.
01:15:12.300 | There it is. Until it gets an empty list. So if we do it with 28, then we're going to get
01:15:22.940 | our image again. So we've now got two different ways of creating exactly the same thing.
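And the islice version of chunks, as a sketch:

```python
it = iter(vals)
# iter keeps calling the lambda until it returns the sentinel value, [].
list(iter(lambda: list(islice(it, 5)), []))   # two 5-long chunks
```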
01:15:34.460 | And if you've never used iterators before, now's a good time to pause the video and play with them,
01:15:43.340 | right? So for example, you could take this here, right? And if you've not seen lambdas before,
01:15:49.500 | they're exactly the same as functions, but you can define them in line. So let's replace that
01:15:53.900 | with a function. Okay, so now I've turned it into a function and then you can experiment with it.
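The next few cells, roughly (f matching the lambda above; the names are my reading of the notebook):

```python
it = iter(lst)   # lst: the flat 784-long list

def f(): return list(islice(it, 28))   # same as: f = lambda: list(islice(it, 28))

f()   # the first 28 values
f()   # the next 28, and so on, until the iterator is exhausted
```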
01:16:03.900 | So let's create our iterator
01:16:07.900 | and call f on it. Well, not on it, call f. And you can see there's the first 28.
01:16:19.900 | And each time I do it, I'm getting another 28. Now the first two rows are all empty, but finally,
01:16:27.180 | look, now I've got some values. Call it again. See how each time I'm getting something else.
01:16:32.380 | Just calling it again and again. And that is the values in our iterator. So that gives you a sense
01:16:39.420 | of like how you can use Jupyter to experiment. So what you should do is as soon as you hit
01:16:48.220 | something in my code that doesn't look familiar to you, I recommend pausing the video and experimenting
01:16:57.100 | with that in Jupyter. And for example, iter: most people probably have not used iter at all,
01:17:05.660 | and certainly very few people have used this two-argument form. So hit Shift-Tab a few times,
01:17:10.540 | and now you've got at the bottom, there's a description of what it is. Or find out more.
01:17:18.380 | Python iter. Here we are, go to the docs. Well, that's not the right bit of the docs.
01:17:31.020 | The C API? Wow, crazy. That's terrible. Let's try searching here. There we go. That's more like it.
01:17:52.060 | So now you've got links. So if it's like, okay, it returns an iterator object, what's that?
01:17:56.620 | Well, click on it. Find out. Now this is really important to know. And here's that exception
01:18:01.020 | that we saw, the StopIteration exception. We saw next already. We can find out what iterable is.
01:18:07.900 | And here's an example. And as you can see, it's using exactly the same approach that we did,
01:18:15.740 | but here it's being used to read from a file. This is really cool. Here's how to read from a file.
01:18:22.060 | 64 bytes at a time until you get nothing, processing each block as you go, right? So the docs of Python are
01:18:29.420 | quite fantastic. As long as you use them, if you don't use them, they're not very useful at all.
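For reference, the docs example is along these lines (process_block and the filename are placeholders):

```python
from functools import partial

# Keep reading 64 bytes at a time until read returns b'', the sentinel.
with open('mydata.db', 'rb') as f:
    for block in iter(partial(f.read, 64), b''):
        process_block(block)   # process_block is a placeholder
```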
01:18:37.500 | And I see Seifer in the comments, our local Haskell programmer,
01:18:47.900 | appreciating this Haskellness in Python. That's good. It's not quite Haskell, I'm afraid,
01:18:53.660 | but it's the closest we're going to come. All right. How are we going for time? Pretty good.
01:19:01.980 | Okay. So now that we've got image, which is a list of lists and each list is 28 long,
01:19:15.500 | we can index into it. So we can say image 20. Well, let's do it. Image 20.
01:19:22.380 | Okay. Is a list of 28 numbers. And then we could index into that.
01:19:32.860 | Okay. So we can index into it. Now, normally, we don't like to do that for matrices. We would
01:19:43.340 | normally rather write it like this. Okay. So that means we're going to have to create our own class
01:19:51.100 | to make that work. So to create a class in Python, you write class, and then you write the name of it.
01:19:59.900 | And then you write some really weird things. The weird things you write have two underscores,
01:20:08.380 | a special word, and then two underscores. These things with two underscores on each side are
01:20:13.980 | called dunder methods, and they're all the special magically named methods which have particular
01:20:20.700 | meanings to Python. And you're just going to learn them, but they're all documented in the Python
01:20:27.340 | object model. Let me search: object model... yay, finally. Okay. So it's called the data model, not the object model.
01:20:42.140 | And so this is basically where all the documentation is about absolutely everything,
01:20:48.620 | and I can click on dunder init, and it tells you basically this is the thing that constructs
01:20:53.580 | objects. So any time you want to create a class that, when you construct it, is going to store
01:21:02.300 | some stuff. So in this case, it's going to store our image. You have to define dunder init.
01:21:07.900 | Python's slightly weird in that every method, you have to put self here. For reasons we probably
01:21:17.660 | don't really need to get into right now. And then any parameters. So we're going to be creating an
01:21:22.860 | image, passing in the thing to store, the xs. And so here we're
01:21:29.580 | just going to store it inside the self. So once I've got this line of code, I've now got something
01:21:35.340 | that knows how to store stuff, the Xs inside itself. So now I want to be able to call square bracket
01:21:43.420 | 20 comma 15. So how do we do that? Well, basically part of the data model is that there's a special
01:21:53.660 | thing called dunder getitem. And when you call square brackets on your object, that's what Python
01:21:59.820 | uses. And it's going to pass across the 20 comma 15 here as indices. So we're now basically just
01:22:11.340 | going to return this: self.xs with the first index and then the second index. So let's create that
01:22:20.620 | matrix class and run that. And you can now see m[20, 15] is the same.
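A sketch of the class as just described (the notebook's exact code may differ slightly):

```python
class Matrix:
    # Dunder init is the constructor: store the list of lists on self.
    def __init__(self, xs): self.xs = xs
    # Dunder getitem is what Python calls for m[20, 15];
    # the indices arrive as a single tuple.
    def __getitem__(self, idxs): return self.xs[idxs[0]][idxs[1]]

m = Matrix(img)
m[20, 15]   # same as img[20][15]
```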
01:22:30.060 | Oh, a quick note on ways in which my code is different to everybody else's, which it is. It's somewhat unusual to put
01:22:37.100 | definitions of methods on the same line as the signature like this. I do it quite a lot for
01:22:47.580 | one-liners. As I kind of mentioned before, I find it really helps me to be able to see all the code
01:22:53.580 | I'm working with on the screen at once. A lot of the world's best programmers actually have
01:23:00.060 | had that approach as well. It seems to work quite well for some people that are extremely productive.
01:23:05.180 | It's not common in Python. Some people are quite against it. So if you're at work, and your
01:23:13.260 | colleagues don't write Python this way, you probably shouldn't either. But if you can get away with it,
01:23:18.940 | I think it works quite well. Anyway, okay, so now that we've created something that lets us index
01:23:23.420 | into things like this, we're allowed to use PyTorch because we're allowed to use this one feature in
01:23:29.660 | PyTorch. Okay, so we can now do that. And so now to create a tensor, which is basically a lot like
01:23:42.700 | our matrix, we can now pass a list into tensor to get back a tensor version of that list, or perhaps
01:23:53.180 | more interestingly, we could pass in a list of lists. Maybe let's give this a name.
01:24:04.540 | Whoopsie dozy. That needs to be a list of lists, just like we had before for our image.
01:24:18.380 | In fact, let's do it for our image. Let's just pass in our image. There we go. And so now we
01:24:27.980 | should be able to say tens[20, 15]. And there we go. Okay, so we've successfully reinvented that.
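That cell, roughly (tens being the name used here; treat the exact spelling as my guess):

```python
from torch import tensor

tens = tensor(img)   # img: our 28x28 list of lists
tens[20, 15]         # indexes just like our Matrix class
```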
01:24:43.020 | All right. So now we can convert all of our lists into tensors. There's a convenient
01:24:55.980 | way to do this, which is to use Python's built-in map function.
01:25:02.220 | So it takes a function and then some iterables. In this case, one iterable. And it's going to
01:25:14.060 | apply this function to each of these four things and return those four things. And so then I can
01:25:19.900 | put four things on the left to receive those four things. So this is going to call tensor x_train
01:25:26.140 | and put it in x_train. Tensor y_train, put it in y_train, and so forth. So this is converting
01:25:30.780 | all of these lists to tensors and storing them back in the same name. So you can see that
01:25:37.580 | x_train now is a tensor. So that means it has a shape property. It has 50,000 images in it,
01:25:46.860 | which are each 784 long. And you can find out what kind of stuff it contains by calling its .type.
01:25:57.180 | So it contains floats. So this is the tensor class. We'll be using a lot of it.
01:26:02.620 | So of course, you should read its documentation.
01:26:05.420 | I don't love the PyTorch documentation. Some of it's good. Some of it's not good.
01:26:12.860 | It's a bit all over the place. So here's tensor. But it's well worth scrolling through to get a
01:26:18.060 | sense of like, this is actually not bad, right? It tells you how you can construct it. This is how I
01:26:21.820 | constructed one before, passing it lists of lists. You can also pass it NumPy arrays.
01:26:27.660 | You can change types. So on and so forth. So, you know, it's well worth reading through. And like,
01:26:39.980 | you're not going to look at every single method it takes, but you're kind of, if you browse through
01:26:44.060 | it, you'll get a general sense, right? That tensors do just about everything you could think of
01:26:51.340 | for numeric programming. At some point, you will want to know every single one of these,
01:26:57.820 | or at least be aware roughly what exists. So you know what to search for in the docs.
01:27:03.980 | Otherwise you will end up recreating stuff from scratch, which is much, much slower
01:27:09.180 | than simply reading the documentation to find out it's there.
01:27:14.300 | All right. So instead of calling chunks or islice, the thing that is roughly
01:27:21.340 | equivalent in a tensor is the reshape method. So reshape, so to reshape our 50,000 by 784 thing,
01:27:30.380 | we can simply, we want to turn it into 50,000 28 by 28 tensors. So I could write here,
01:27:40.620 | reshape to 50,000 by 28 by 28. But I kind of don't need to, because I could just put minus one here
01:27:51.260 | and it can figure out that that must be 50,000, because it knows that I have 50,000
01:27:58.620 | by 784 items. So it can figure out, so minus one means just fill this with all the rest.
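A sketch of the reshape:

```python
x_imgs = x_train.reshape(-1, 28, 28)   # -1: infer 50,000 from 50,000 x 784
x_imgs.shape                           # torch.Size([50000, 28, 28])
```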
01:28:04.140 | Okay. Now, what does the word tensor mean?
01:28:15.900 | So there's some very interesting history here.
01:28:21.020 | And I'll try not to get too far into it because I'm a bit over enthusiastic about this stuff,
01:28:27.740 | I must admit. I'm very, very interested in the history of tensor programming and array programming.
01:28:33.980 | And it basically goes back to a language called APL. APL is basically originally a mathematical
01:28:43.340 | notation that was developed in the mid to late 50s, 1950s. And at first it was used as a notation
01:28:53.260 | for defining how certain new IBM systems would work. So it was all written out in this notation.
01:29:02.140 | It's kind of like a replacement for mathematical notation that was designed to be more consistent
01:29:08.860 | and kind of more expressive. The guy who wrote and created it was
01:29:18.060 | named Ken Iverson. In the early 60s, some implementations that actually allowed this notation
01:29:24.620 | to be executed on a computer appeared. Both the notation and the executable implementations
01:29:31.020 | slightly confusingly, are called APL. APL's been in constant development ever since that time.
01:29:37.740 | And today is one of the world's most powerful programming languages. And you can try it by
01:29:42.700 | going to try APL. And why am I mentioning it here? Because one of the things Ken Iverson did, well,
01:29:51.420 | he studied an area of physics called tensor analysis. And as he developed APL, he basically
01:30:00.860 | said like, oh, what if we took these ideas from tensor analysis and put them into a programming
01:30:05.180 | language? So in APL, you can, and have been able to for some time, basically define a
01:30:16.220 | variable. And rather than saying equals, which is a terrible way to define things really
01:30:24.140 | mathematically, because that has a very different meaning most of the time in math. Instead,
01:30:28.860 | we use arrow to define things. We can say, okay, that's going to be a tensor
01:30:35.260 | like so. And then we can look at their contents of A and we can do things like, oh, what if we do A
01:30:44.460 | times three or A minus two and so forth. And as you can see, what it's doing is it's taking
01:30:58.860 | all the contents of this tensor and it's multiplying them all by three or subtracting
01:31:04.300 | two from all of them. Or perhaps more fun, we could put into B a different tensor.
01:31:11.580 | And we can now do things like A divided by B. And you can see it's taking each
01:31:22.860 | of A and dividing by each of B. Now, this is very interesting because now we don't have to write
01:31:34.780 | loops anymore. We can just express things directly. We can multiply things by scalars,
01:31:41.260 | even if they're, this is called a rank one tensor. That is to say it's basically in math,
01:31:47.500 | we'd call it a vector. We can take two vectors and can divide one by the other and so forth.
01:31:53.900 | It's a really powerful idea. Funnily enough, APL didn't call them tensors, even though
01:32:01.020 | Ken Iverson said he got this idea from tensor analysis, APL calls them arrays.
01:32:09.340 | NumPy, which was heavily influenced by APL, also calls them arrays. For some reason, PyTorch,
01:32:18.140 | which is very heavily influenced by APL, sorry, by NumPy, doesn't call them arrays,
01:32:23.820 | it calls them tensors. They're all the same thing. They are rectangular blocks of numbers.
01:32:37.020 | They can be one-dimensional, like a vector. They can be two-dimensional, like a matrix.
01:32:41.900 | They can be three-dimensional, which is like a bunch of stacked matrices,
01:32:46.380 | like a batch of matrices and so forth. If you are interested in APL, which I hope you are,
01:33:01.420 | we have a whole APL and array programming section on our forums, and also we've prepared
01:33:08.220 | a whole set of notes on every single glyph in APL, which also covers all kinds of interesting
01:33:23.420 | mathematical concepts, like complex direction and magnitude, and all kinds of fun stuff like that.
01:33:36.780 | That's all totally optional, but a lot of people who do APL say that they feel like they've become
01:33:42.540 | a much better programmer in the process, and also you'll find here at the forums a set of 17 study
01:33:51.180 | sessions of an hour or two each, covering the entirety of the language, every single glyph.
01:33:57.020 | That's all where this stuff comes from. This batch of 50,000 images, 50,000 28 by 28 images,
01:34:10.540 | is what we call a rank 3 tensor in PyTorch. In NumPy, we would call it an array with three
01:34:20.220 | dimensions. Those are the same thing. What is the rank? The rank is just the number of dimensions.
01:34:30.220 | It's 50,000 images of 28 high by 28 wide, so there are three dimensions that is the rank
01:34:38.220 | of the tensor. If we then pick out a particular image, then we look at its shape,
01:34:46.380 | we could call this a matrix. It's a 28 by 28 tensor, or we could call it a rank 2 tensor.
01:34:56.860 | A vector is a rank 1 tensor. In APL, a scalar is a rank 0 tensor, and that's the way it should be.
01:35:06.140 | A lot of languages and libraries don't, unfortunately, think of it that way.
01:35:11.420 | So what is a scalar? It's a bit dependent on the language. Okay, so we can index into
01:35:17.180 | the zeroth image, 20th row, 15th column to get back this same number.
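To make the rank idea concrete (x_imgs being the reshaped tensor from earlier; the name is mine):

```python
x_imgs.shape          # torch.Size([50000, 28, 28]): rank 3
x_imgs[0].shape       # torch.Size([28, 28]): rank 2, a matrix
x_imgs[0, 20].shape   # torch.Size([28]): rank 1, a vector
x_imgs[0, 20, 15]     # a scalar tensor: rank 0
```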
01:35:33.660 | Okay, so we can take x_train.shape, which is 50,000 by 784,
01:35:44.380 | and you can destructure it into n, which is the number of images, and c, which is the number of
01:35:54.620 | columns, for example. And we can also, well, this is actually part of the standard
01:36:03.980 | library, so we're allowed to use min, so we can find out in y_train what's the smallest number,
01:36:09.180 | and what's the maximum number, so they go from 0 to 9. So you see here, it's not just the number 0,
01:36:18.780 | it's a scalar tensor, 0. They act almost the same, most of the time. So here's some example of a bit
01:36:27.580 | of the y_train, so you can see these are basically, this is going to be the labels,
01:36:33.500 | right, these are our digits, and this is its shape, so there's just 50,000 of these labels.
01:36:48.380 | Okay, and so since we're allowed to use this in the standard library, well it also exists in PyTorch,
01:36:53.500 | so that means we're also allowed to use the .min and .max methods.
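Pulling those last few cells together, as a sketch:

```python
n, c = x_train.shape           # destructure: n = 50,000 images, c = 784 columns
y_train.min(), y_train.max()   # (tensor(0), tensor(9)): scalar tensors
```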
01:36:57.740 | All right, so before we wrap up, we're going to do one more thing, and, I don't know what
01:37:06.140 | we would call it, kind of anti-cheating, but according to our rules, we're allowed to use random numbers
01:37:14.460 | because there is a random number generator in the Python standard library, but we're going to do
01:37:20.860 | random numbers from scratch ourselves, and the reason we're going to do that is even though
01:37:26.700 | according to the rules we could be allowed to use the standard library one, it's actually extremely
01:37:31.420 | instructive to build our own random number generator from scratch, well at least I think so.
01:37:38.060 | Let's see what you think. So there is no way normally in software to create a random number,
01:37:52.540 | unfortunately. Computers, you know, add, subtract, multiply, run logic gates, stuff like that.
01:38:04.940 | So how does one create random numbers? Well you could go to the Australian National University
01:38:09.580 | Quantum Random Number Generator, and this looks at the quantum fluctuations of the vacuum
01:38:17.500 | and provides an API which will actually hook you in and return quantum random fluctuations of
01:38:29.500 | the vacuum. So that's about, that's the most random thing I'm aware of, so that would be one way to get
01:38:35.180 | random numbers. And there's actually an API for that, so there's a bit of fun. You could do what
01:38:44.700 | Cloudflare does. Cloudflare has a huge wall full of lava lamps, and it uses the pixels of a camera
01:39:04.140 | looking at those lava lamps to generate random numbers. Intel nowadays actually has something
01:39:13.020 | in its chips, an instruction called RDRAND, which will return random numbers on certain
01:39:25.420 | Intel chips from 2012. All of these things are kind of slow, they can kind of get you one random
01:39:34.700 | number from time to time. We want some way of getting lots and lots of random numbers,
01:39:40.460 | and so what we do is we use something called a pseudorandom number generator.
01:39:46.140 | A pseudorandom number generator is a mathematical function that you can call lots of times,
01:39:54.220 | and each time you call it, it will give you a number that looks random.
01:40:02.540 | To show you what I mean by that, I'm going to run some code.
01:40:07.340 | I've created a function which we'll look at in a moment called rand, and if I call rand 50 times
01:40:17.900 | and plot it, there's no obvious relationship between one call and the next. That's one thing
01:40:25.420 | that I would expect to see from my random numbers. I would expect that each time I call rand,
01:40:31.500 | the numbers would look quite different to each other. The second thing is, rand is meant to
01:40:36.940 | be returning uniformly distributed random numbers, and therefore if I call it lots and lots and lots
01:40:43.180 | of times and plot its histogram, I would expect to see exactly this, which is each from 0 to 0.1,
01:40:51.980 | there's a few, from 0.1 to 0.2, there's a few, from 0.2 to 0.3, there's a few. It's a fairly evenly
01:40:58.140 | spread thing. These are the two key things I would expect to see, an even distribution of random
01:41:03.660 | numbers and that there's no correlation or no obvious correlation from one to the other.
01:41:08.620 | We're going to try and create a function that has these properties. We're not going to derive it
01:41:15.580 | from scratch. I'm just going to tell you that we have a function here implementing the Wichmann-Hill
01:41:19.660 | algorithm. This is actually what Python used to use before Python 2.3, and the key reason
01:41:26.140 | we need to know about this is to understand really well the idea of random state. Random state is a
01:41:33.580 | global variable. It's something which is, or at least it can be, most of the time when we use it,
01:41:39.740 | we use it as a random variable, and it's just basically one or more numbers. So we're going
01:41:44.780 | to start with no random state at all, and we're going to create a function called seed that we're
01:41:50.220 | going to pass something to, and I just meshed the keyboard to create this number. Okay, so this is
01:41:55.500 | my random number. You could get this from the ANU quantum vacuum generator or from cloud fairs lava
01:42:02.700 | lamps or from your Intel chips ID rand, or you know in Python land we'd pretty much always use
01:42:08.380 | the number 42. Any of those are fine. So you pass in some number or you can pass in the current tick
01:42:14.300 | count in nanoseconds. There's various ways of getting some random starting point, and if we
01:42:19.900 | pass it into seed it's going to do a bunch of modular divisions and create a tuple of three things,
01:42:31.580 | and it's going to store them in this global state. So rand state now contains three numbers.
01:42:39.740 | Okay, so why did we do that? The reason we did that is because now this function,
01:42:47.660 | which takes our random state, unpacks it into three things, and does again a bunch of
01:42:54.780 | multiplications and moduli, and then sticks them together with various kinds of weights. Modulo one,
01:43:02.220 | so this is how you can pull out the decimal part. This returns random numbers, but the key thing I
01:43:09.980 | want you to understand is that we pull out the random state at the start. We do some math thingies
01:43:17.020 | to it, and then we store new random state, and so that means that each time I call this I'm going to
01:43:26.460 | get a different number. Okay, so this is a random number generator, and this is really important
01:43:34.860 | because lots of people in the deep learning world screw this up, including me sometimes,
01:43:41.100 | which is to remember that random number generators rely on this state.
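A sketch of the generator as described; this is the standard Wichmann-Hill recurrence, so the notebook's code should be close to this (the state variable name is my choice):

```python
rnd_state = None

def seed(a):
    # Split the seed into three small numbers via repeated modular division.
    global rnd_state
    a, x = divmod(a, 30268)
    a, y = divmod(a, 30306)
    a, z = divmod(a, 30322)
    rnd_state = int(x) + 1, int(y) + 1, int(z) + 1

def rand():
    # Pull out the global state, update it, store it back, and combine
    # the three parts; % 1.0 keeps just the decimal part.
    global rnd_state
    x, y, z = rnd_state
    x = (171 * x) % 30269
    y = (172 * y) % 30307
    z = (170 * z) % 30323
    rnd_state = x, y, z
    return (x / 30269 + y / 30307 + z / 30323) % 1.0

seed(457428938475)   # any starting number will do
rand()               # a pseudorandom float in [0, 1)
```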
01:43:49.180 | So let me show you where that will get you if you're not careful.
01:43:54.060 | If we use this special thing called fork, that creates a whole separate copy of this Python
01:44:02.060 | process. In one copy, os.fork returns true. In the other copy, it returns false,
01:44:11.180 | roughly speaking. So this copy here, the version where it returns true,
01:44:19.500 | is the original, non-copied process; it's called the parent. So this first branch will
01:44:25.180 | only be run by the parent, and my else here will only be run by the copy, which is called the child.
01:44:29.980 | And in each one I'm calling rand. These are two different random numbers, right?
01:44:34.540 | Wrong. They're the same number. Now, why is that? That's because this process here and this process
01:44:46.380 | here are copies of each other, and therefore they each contain the same numbers in random state.
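In code, the demo looks roughly like this (my reconstruction; Unix only, since Windows has no os.fork):

```python
import os

if os.fork():
    # Parent process: fork returns the child's PID, which is truthy.
    print(f'In parent: {rand()}')
else:
    # Child process: fork returns 0. The copied rnd_state means this
    # prints the *same* number as the parent.
    print(f'In child: {rand()}')
    os._exit(os.EX_OK)   # end the child here so it doesn't run the rest of the notebook
```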
01:44:55.900 | So this is something that comes up in deep learning all the time, because in deep learning,
01:45:03.340 | we often do parallel processing, for example, to generate lots of augmented images at the same time
01:45:12.620 | using multiple processes. Fast AI used to have a bug, in fact, where we failed to correctly
01:45:19.900 | initialize the random number generator separately in each process. And in fact, to this day,
01:45:26.940 | at least as of October 2022, torch.rand itself, by default, fails to reinitialize the random number
01:45:37.340 | generator after a fork. See, that's the same number. Okay, so you've got to be careful. Now, I have a feeling
01:45:47.340 | NumPy gets it right. Let's check.
01:45:52.140 | Is that how you do it? I don't quite remember. We'll try.
01:46:02.220 | Nope. Okay, NumPy also doesn't. How interesting. What about Python?
01:46:14.220 | Oh, look at that.
01:46:32.380 | So Python does actually remember to reinitialize the random stream in each fork.
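A quick way to reproduce those checks, as a sketch (torch and NumPy repeat the parent's numbers; Python's random module reseeds itself in the child):

```python
import os, random
import numpy as np
import torch

if os.fork():
    print('parent:', torch.rand(1).item(), np.random.rand(), random.random())
else:
    # torch and numpy print the parent's numbers again; random.random differs.
    print('child: ', torch.rand(1).item(), np.random.rand(), random.random())
    os._exit(os.EX_OK)
```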
01:46:41.820 | So, you know, this is something that, like, even if you've experimented in Python and you think
01:46:46.060 | everything's working well in your data loader or whatever, and then you switch to PyTorch or NumPy
01:46:50.700 | and now suddenly everything's broken. So this is why we've spent some time re-implementing
01:46:56.860 | the random number generator from scratch, partly because it's fun and interesting and partly
01:47:03.020 | because it's important that you now understand that when you're calling rand or any random number
01:47:08.380 | generator, kind of the default versions in NumPy and PyTorch, this global state is going to be
01:47:14.940 | copied. So you've got to be a bit careful. Now, I will mention our random number generator. Okay,
01:47:26.060 | so this is cool. %timeit: the percent sign marks a special Jupyter, or really IPython, magic.
01:47:34.700 | And %timeit runs a piece of Python code this many times. So I ask it to do 10 loops;
01:47:41.740 | actually, it'll do seven runs of that, and it'll take the mean and
01:47:46.060 | standard deviation. So here I am going to generate random numbers 7,840 times and put them into 10-
01:48:01.180 | long chunks. And if I run that, it takes me three milliseconds per loop. If I run
01:48:10.700 | the exact same thing in PyTorch, it's going to take me 73 microseconds
01:48:20.860 | per loop. So as you can see, although we could use our version, we're not going to, because the
01:48:27.180 | PyTorch version is much, much faster.
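Roughly, the two timing cells (the exact %timeit arguments are my assumption; %timeit is an IPython magic, so this only runs in a notebook):

```python
# Pure-Python Wichmann-Hill: 7,840 numbers, chunked into rows of 10.
%timeit -n 10 list(chunks([rand() for _ in range(28*28*10)], 10))

# The PyTorch equivalent: a 784 by 10 tensor of random numbers.
%timeit -n 10 torch.randn(784, 10)
```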
01:48:33.420 | This is how we can create a 784 by 10 tensor. And why would we want this? Because this is the final
01:48:39.260 | layer of our neural net, or, if we're doing a linear classifier, our linear weights. It needs to be
01:48:48.060 | 784 because that's 28 by 28, and 10 because that's the number of possible outputs, the possible digits. All right. That is it. So quite the
01:48:58.940 | intense lesson. I think we can all agree. Should keep you busy for a week. And thanks very much
01:49:06.060 | for joining. And see you next time. Bye, everybody.