Back to Index

Lesson 3: Practical Deep Learning for Coders 2022


Chapters

0:0 Introduction and survey
1:36 "Lesson 0" How to fast.ai
2:25 How to do a fastai lesson
4:28 How to not self-study
5:28 Highest voted student work
7:56 Pets breeds detector
8:52 Paperspace
10:16 JupyterLab
12:11 Make a better pet detector
13:47 Comparison of all (image) models
15:49 Try out new models
19:22 Get the categories of a model
20:40 What’s in the model
21:23 What does model architecture look like
22:15 Parameters of a model
23:36 Create a general quadratic function
27:20 Fit a function by good hands and eyes
30:58 Loss functions
33:39 Automate the search of parameters for better loss
42:45 The mathematical functions
43:18 ReLu: Rectified linear function
45:17 Infinitely complex function
49:21 A chart of all image models compared
52:11 Do I have enough data?
54:56 Interpret gradients in unit?
56:23 Learning rate
60:14 Matrix multiplication
64:22 Build a regression model in spreadsheet
76:18 Build a neuralnet by adding two regression models
78:31 Matrix multiplication makes training faster
81:1 Watch out! it’s chapter 4
82:31 Create dummy variables of 3 classes
83:34 Taste NLP
87:29 fastai NLP library vs Hugging Face library
88:54 Homework to prepare you for the next lesson

Transcript

Hi everybody and welcome to lesson three of practical deep learning for coders we did a quick survey this week to see how people feel that the course is tracking and Over half a few think it's about right-paced and of the rest who aren't Some of you think it's a bit slow and some of you think it's a bit to sorry I'm gonna think it's a bit slow and some of you think it's a bit fast So hopefully we're that's about the best we can do Generally speaking the first two lessons are a little more easy pacing for anybody who's already familiar with the kind of basic technology pieces and then the later lessons get you know more into kind of some of the foundations and today we're going to be talking about You know things like the matrix multiplications and gradients and capitalists and stuff like that So for those of you who are more mathy and less computer II you might find this one more comfortable and vice-a-versa so Remember that there is a official course updates thread where you can see all the up-to-date info about Everything you need to know and of course the course website As well so by the time you know you watch the video of the lesson It's pretty likely that if you come across a question or an issue somebody else will have so definitely search the forum and check the facts First and then of course feel free to ask a question yourself on the forum if you can't find your answer One thing I did want to point out which you'll see in the lessons thread and the course website is There is also a lesson zero lesson zero is based heavily on Radix book meta learning which internally is based heavily on all the things that I've said over the years about how to learn fast AI It's we try to make the course full of Tickbits about the science of learning itself and put them into the course It's a different course to probably any other you've taken and it's I strongly recommend Watching lesson zero as well.

The last bit of lesson zero is about how to set up a Linux box from scratch which you can happily skip over unless that's of interest, but the rest of it is Full of juicy information that I think you'll find useful So the basic idea of What to do to do a faster AI lesson is?

Watch the lecture And I generally you know on the video recommend watching it all the way through without stopping once and then go back and Watch it with lots of pauses running the notebook as you go because otherwise you're kind of like running the notebook Without really knowing where it's heading if that makes sense And the idea of running the notebook is is you you know, there's a few notebooks you could go through So obviously there's the book so going through chapter one of the book going through chapter two of the book as notebooks running every code cell and Experimenting with inputs and outputs to try and understand what's going on And then trying to reproduce those results And then trying to repeat the whole thing with a different data set and if you can do that last step, you know, that's Quite a stretch goal particularly at the start of the course because there's so many new concepts But that really shows that you you've got it sorted now first third bit reproduce results I recommend using you'll find in the fastbook repo.

So the repository for the book. There is a special folder called Clean and clean contains all of the same chapters of the book But with all of the text removed except for headings and all the outputs removed And this is a great way for you to test your understanding of the chapter is before you run each cell Charter say to yourself.

Okay, what's this for? And what's it going to output if anything and if you kind of work through that slowly That's a great way at any time. You're not sure you can jump back to the Version of the notebook with a text to remind yourself and then head back over to the clean version So there's an idea for something which a lot of people find really useful for self-study I Say self-study, but of course as we've mentioned before The best kind of study is Study done to some extent with others for most people You know the research shows that you're more likely to stick with things if you're doing it That's kind of a bit of a social activity there.

The forums are a great place to find and create Study groups and you'll also find on the forums a link to our discord server So yes our discord server where there are some study groups there as well so I'd you know in person study groups virtual study groups are a great way to You know really make good progress and find other people at a similar level to you if there's not a Study group going at your level in your area in your time zone Create one.

So just post something saying hey, let's create a study group So this week there's been a lot of fantastic activity. I can't show all of it. So what I did was I used the summary functionality in the forums to grab all of the things with the highest votes and so I was quickly show a few of those we have a Marvel detector created this week Identify your favorite Marvel character.

I Love this a rock-paper-scissors game where you actually use pictures of the rock-paper-scissors symbols and apparently The computer always loses. That's my favorite kind of game There is a lot of Elon around so very handy to have an Elon detector to you know Either find more of him if that's what you need or maybe less of him I thought this one is very interesting.

I love these kind of really interesting ideas. It's like gee I wonder if this would work. Can you predict the average? temperature of an area based on a aerial photograph And the eye and apparently the answer is yeah, actually you can predict it pretty well here in Brisbane It was predicted.

I believed it was in one and a half Celsius I think this student is actually a genuine meteorologist if I remember correctly he built a cloud detector So then building on top of the what's your favorite Marvel character? There's now also an is it a Marvel character My daughter loves this one.

What dinosaur is this and I'm not as good about dinosaurs as I should be I feel like there's Ten times more dinosaurs than there was when I was a kid, so I'd never know their names. This is very handy This is cool. Choose your own adventure where you choose your path using facial expressions And I think this music genre classification Is also really cool Brian Smith created a Microsoft power app Application that actually runs on a mobile phone.

That's pretty cool I wouldn't be surprised to hear that Brian actually works at Microsoft so also an opportunity to promote his own stuff there I thought this art movement classifier was interesting in that like there's a really interesting discussion on the forum about What it actually shows about similarities between different art movements And I thought this reduction detector project was really was really cool As well, and there's a whole tweet thread and blog post and everything about this one particularly great piece of work Okay, so I'm going to Quickly show you a couple of little tips before we kind of jump into the mechanics of what's behind a neural network Which is I was playing a little bit with how do you make your neural network more accurate During the week and so I created this pet detector and this pet detector is not just predicting Predicting dogs or cats, but what breed is it?

That's obviously a much more difficult exercise Now because I put this out on hugging face spaces you can Download and look at my code because if you just click files and versions on the space which you can find a link on the Forum and the course website You can see them all here and you can download it to your own computer So I'll show you What I've got here now One thing I mentioned is today I'm using a different platform So in the past I've shown you Colab and I've shown you Kaggle And we've also looked at doing stuff on your own computer Not so much training models on your computer, but using the models you've trained to create applications Paper space is a another website a bit like Kaggle and Google But in particular they have a product called gradient notebooks Which is at least as I speak and things change all the time to check the course website but as I speak in my opinion is is by far the best platform for Running this course and for you know doing experimentation I'll explain why as we go so why haven't I been using the past two weeks?

Because I've been waiting for them to build some stuff for us to make it particularly good and they just they just finished So I've been using it all week, and it's totally amazing This is what it looks like so you've got a machine running in the cloud, but the thing that Was very special about it is it's a it's a real it's a real computer.

You're using It's not like that kind of weird virtual version of things that Kaggle or Colab has So if you whack on this button down here, you'll get a full version of JupyterLab Or you can switch over to a full version of plastic Jupyter notebooks And I'm actually going to do stuff in JupyterLab today because it's a pretty good environment for beginners who are not Familiar with the terminal which I know a lot of people in the course are in that situation.

You can do really everything Kind of graphically there's a file browser so here you can see I've got my pets repo It's got a git repository thing you can pull and push to git and then you can also Open up a terminal create new notebooks And so forth so what I tend to do with this is I tend to go into a full screen It's kind of like its own whole IDE And So you can see I've got here my my terminal Here's my notebook They have free GPUs and most importantly there's two good features one is that you can pay I think it's eight or nine dollars a month to get better GPUs and basically as many as you you know as many hours as you want And they have persistent storage so with Colab if you've played with it You might have noticed it's annoying you have to muck around with saving things to Google Drive and stuff on Kaggle There isn't really a way of Kind of having a persistent environment Where else on paper space you have you know whatever you save in your storage.

It's going to be there the next time you come come back so I'm going to be adding Walkthroughs of all of this functionality so look at so if you're interested in really taking advantage of this check those out Okay, so I think the main thing that I wanted you to take away from lesson 2 isn't necessarily all the details of how do you use a particular platform to train models and Deploy them into applications through through JavaScript or online platforms But the key thing I wanted you to understand was the concept.

There's really two pieces There's the training piece and at the end of the training piece you end up with this bottle pickle file, right? And once you've got that That's now a thing where you feed it inputs, and it spits out outputs Based on that model that you trained and then so you don't need You know because that happens pretty fast you generally don't need a GPU once you've got that trained And so then there's a separate step, which is deploying so I'll show you how I trained my pet classifier So you can see I've got two I Python notebooks One is app, which is the one that's going to be doing the inference and production one is the one where I train the model So this first bit I'm going to skip over because you've seen it before I create my image data loaders Check that my data looks okay with show batch train a resnet 34 and I get 7% accuracy So that's pretty good but Check this out.

There's a link here To a notebook I created actually most of the work was done by Ross Whiteman Where we can try to improve this by finding a better architecture There are I think at the moment in the PyTorch image models libraries over 500 Architectures and we'll be learning over the course You know what what they are how they differ, but you know broadly speaking they're all mathematical functions, you know, which are basically matrix multiplications and and these these nonlinearities such as Relus that we're talking about today But most of the time those details don't matter what we care about is three things how fast are they?

How much memory do they use and how accurate are they and so what I've done here with Ross is we've grabbed all of the Models from PyTorch image image models and you can see all the code. We've got is very very little code To create this this plot Now my screen resolutions a bit there we go.

Let's do that and so on this plot The next axis we've got seconds per sample. So how fast is it? So to the left is better who's faster and on the right is how accurate is it? So how how accurate was it on ImageNet in particular and so generally speaking you want things that are up towards the top and left Now we've been mainly working with ResNet and you can see down here Here's ResNet 18 now ResNet 18 is is a particularly small and fast version for prototyping We often use ResNet 34, which is this one here and you can see this kind of like classic model That's very widely used actually nowadays isn't the state-of-the-art anymore So we can start to look up at these ones up here and find out some of these better models the ones that seem to be the most accurate and fast for these levet models So I tried them out on my pets and I found that they didn't work particularly well.

So I thought okay Let's try something else out. So next up. I tried these conv-next models and This one in here was particularly interesting. It's kind of like super high accuracy. It's the you know, if you want 0.001 seconds inference time. It's the most accurate. So I tried that.

So how do we try that? All we do is I can say So the PyTorch image models is in the TIM module. So the very start I imported that And we can say list models and pass in a glob, a match and so this is going to show all the conv-next models and Here I can find the ones that I just saw and all I need to do is when I create the vision learner I just put the name of the model in as a string Okay, so you'll see earlier This one is not a string.

That's because it's a model that fast AI provides the library Fast AI only provides a pretty small number So if you install TIM, so you need to pip install TIM or condor install TIM You'll get hundreds more and you put that in a string So if I now train that, the time for these epochs goes from 20 seconds to 27 seconds.

So it is a little bit slower But the accuracy goes from 7.2 percent down to 5.5 percent. So, you know, that's a pretty big relative difference 7.2 divided by 5.5. Yeah, so about a 30 percent improvement. So that's pretty fantastic and you know, it's It's been a few years, honestly Since we've seen anything Really big ResNet that's widely available and usable on regular GPUs So this is this is a big step.

And so this is a you know, there's a few architectures nowadays that really are Probably better choices a lot of the time and these con so if you are not sure what to use Try these conv-next architectures You might wonder what the names are about. Obviously Tiny's more large etc.

Is how big is the model? So that'll be how much memory is it going to take up? How fast is it? and Then these ones here that say in 22 FT 1k These ones have been trained on more data. So image net there's two different image net data sets There's one that's got a thousand categories of pictures and there's another one.

It's about 22,000 categories of pictures So this is trained on the one with 22,000 categories pictures So these are generally going to be more accurate on kind of standard photos of natural objects Okay, so from there I exported my model and that's the end okay, so now I've trained my model and I'm all done You know other things you could do obviously is add more epochs for example Add image augmentation.

There's various things you can do. But you know, I found this this is actually pretty Pretty hard to beat this by much If any of you find you can do better, I'd love to hear about it So then I'd turn that into an application. I just did the same thing that we saw last week, which was to load the learner As is something I did want to show you The learner once we load it and call predict spits out a list of 37 numbers That's because there are 37 breeds of dog and cat.

So these are the probability of each of those breeds What order they are they in? That's an important question The answer is that fast AI always stores this information about categories This is a category in this case of dog or cat breed in something called the vocab object and it's inside the data loaders So we can grab those categories and that's just a list of strings just tells us the order So if we now zip together the categories and the probabilities we'll get back a dictionary that tells you well like so so here's that list of categories and here's the probability of each one and This was a basset hound so there you can see yep almost certainly a basset hounder So from there just like last week we can go and create our interface and then and then launch it And there we go, okay, so What did we just do really?

What is this magic? model pickle file So we can take a look at the model pickle file. It's an object type called a learner and A learner has two main things in it. The first is the list of pre-processing steps that you did to turn your images into things of the model and that's basically This information here So it's your data blocks or your image data loaders or whatever and then the second thing most importantly is the trained model and So you can actually grab the trained model by just grabbing the dot model attribute So I'm just going to call that m and then if I type m I can look at the model and so here it is Lots of stuff.

So what is this stuff? Well, we'll learn about it all over time, but basically what you'll find is It contains lots of layers because this is a deep learning model and you can see it's kind of like a tree That's because lots of the layers themselves consist of layers so there's a whole layer called the Tim body which is most of it and then right at the end there's a second layer called sequential and then the Tim body contains something called model and It can then it contains something called stem and something called stages and then stages can contain zero one two, etc So what is all this stuff?

Well, let's take a look at one of them So to take a look at one of them, there's a really convenient Method in pytorch called get sub module where we can pass in a kind of a dotted string Navigating through this hierarchy. So zero model stem one goes zero model stem one So this is going to return this layer norm 2d thing.

So what is this layer norm 2d thing? well, the key thing is It's got some code is with the mathematical function. We talked about and then the other thing that we learned about is it has Parameters and so we can list its parameters and look at this. It's just lots and lots and lots of numbers Let's grab another example.

We could have a look at zero dot model dot stages dot zero blocks dot one dot MLP dot FC one and parameters another big bunch of numbers So what's going on here? What are these numbers and where at earth did they come from and how come these? Numbers can figure out whether something is a basset hound or not Okay, so To answer that question we're going to have a look at a Kaggle notebook How does a neural network really work But a local version of it here which I'm going to take you through and the basic idea is Machine learning models are things that fit Functions to data.

So we start out with a very very flexible in fact an infinitely flexible as we've discussed function and your network and We get it to do a particular thing Which is to recognize the patterns in the data examples we give it so Let's do a much simpler example Than a neural network.

Let's do a quadratic So let's create a function f which is 3x squared plus 2x Plus one. Okay, so it's a quadratic with coefficients 3 2 and 1 So we can plot that function f and give it a title If you haven't seen this before things between dollar signs is what's called latex It's basically how we can create kind of typeset mathematical equations Okay, so let's run that And so here you can see the function here you can see the title I passed it and here is quadratic, okay, so what we're going to do is we're going to Imagine that we don't know that's the true Mathematical function we're trying to find as it's obviously much simpler than the function that figures out whether an image is a Basset hound or not that we're just going to start super simple So this is the real function and we're going to try to to recreate it from some data Now it's going to be very helpful if we have an easier way of creating different quadratics So I have to find a kind of a general form of a quadratic here If the with coefficients a b and c and at some particular point x it's going to be ax squared plus bx plus c And so let's test that Okay, so that's for x equals 1.5.

That's 3x squared plus 2x plus 1 which is the quadratic we were did before Now we're going to want to create lots of different quadratics to test them out and find out which one's best so this is a Somewhat advanced but very very helpful feature of Python that's worth learning if you're not familiar with it And it's used in a lot of programming languages.

It's called a partial application of a function. Basically. I want this exact function but I want to fix the values of a b and c to pick a particular quadratic and the way you fix the values of the function is you call this thing in Python called partial and you pass in the function and Then you pass in the values that you want to fix so for example If I now say make quadratic 3 2 1 that's going to create a quadratic equation with coefficients 3 2 and 1 And you can see if I then pass in so that's now f if I pass in 1.5.

I get the exact same value I did before Okay, so we've now got an ability to create any quadratic Equation we want by passing in the parameters of the coefficients of the quadratic That gives us a function that we can then just call as just like any normal function So that only needs one thing now, which is the value of x because the other three a b and c are now fixed So if we plot that function We'll get exactly the same shape because it's the same coefficients Okay so Now I'm going to show an example of of some data some data that Matches the shape of this function, but in real life data is never exactly going to match the shape of a function It's going to have some noise.

So here's a couple of functions to add some noise So you can see I've still got the basic functional form here, but this data is a bit dotted around it The level to which you look at how I implemented these is entirely up to you It's not like super necessary, but it's all stuff which you know the kind of things we use quite a lot So this is to create normally distributed random numbers This is how we set the seed so that each time I run this I've got to get the same random numbers This one is actually particularly helpful this creates a tensor so in this case a vector that goes from negative to to two in Equal steps and there's 20 of them.

That's why there's 20 steps along here So then my y values is just f of x With this amount of noise added Okay, so as I say the details of that don't matter too much. The main thing to know is we've got some Random data now and so this is the idea is now we're going to try to reconstruct the original Quadratic equation find one which matches this data So how would we do that?

Well what we can do is we can create a function called plot quadratic That first of all plots our data as a scatter plot and then it plots a function which is a quadratic a quadratic we pass in Now that's a very helpful thing for experimenting In Jupiter notebooks, which is the at interact?

Function if you add it on top of a function Then it gives you these nice little sliders So here's an example of a quadratic with coefficients 1.5 1.5 1.5 And it doesn't fit particularly well So how would we try to make this fit better? Well, I think what I'd do is I'd take the first slider and I would try moving it to the left and see if it looks better or worse That looks worse to me I think it needs to be more curvy so let's try the other way Yeah, that doesn't look bad let's do the same thing for the next slider have it this way No, I think that's worse.

Let's try the other way Okay final slider Try this way It's worse this way So you can see what we can do we can basically pick each of the coefficients One at a time try increasing a middle bit see if that improves it try decreasing it a little bit See if that improves it find the direction that improves it and then slide it in that direction a little bit and then when we're done we can go back to the first one and see if We can make it any better Now we've done that And actually you can see that's not bad because I know the answer is meant to be 3 2 1 so they're pretty close And I wasn't shooting I promise That's basically What we're going to do that's basically how those parameters Created but we obviously don't have time because the you know big fancy models have Often hundreds of millions of parameters.

We don't have time to try a hundred hundred million sliders, so we did something better Well the first step is we need a better idea of like when I move it is it getting better or is it getting worse? So if you remember back to after Samuel's Description of machine learning that we learned about chapter one of the book and in lesson one We need some Something we can measure which is a number that tells us how good is our model and if we had that then as we move The sliders we could check to see whether it's getting better or worse So this is called a loss function So there's lots of different loss functions you can pick but perhaps the most simple and common is Mean squared error which is going to be so it's going to get in our predictions And it's got the actuals and we're going to go predictions minus actuals squared and take the mean So that's mean squared so If I now rerun the exact same thing I had before but this time I'm going to calculate the loss the MSE between The values that we predict f of x Remember where f is the quadratic we created and the actuals y and this time I'm going to add a title to our function Which is the loss?

So now Let's do this more rigorously We're starting at a mean squared error of eleven point four six. So let's try moving this to the left and see if it gets better No, wait, so move it to the right All right, so around there, okay now let's try this one Okay best when I go to the right Okay, what about C 3.91?

It's getting worse So I keep going So we're about there and so now we can repeat that process, right? So we've we've had each of a B and C move a little bit. Let's go back to a Can I get any better than 3.28? Let's try moving left Yeah, that was a bit better and for B.

Let's try moving left worse right was better and Have it finally see move to the right Oh Definitely better There we go Okay, so That's a more rigorous approach It's still manual But at least we can like we don't have to rely on us to kind of recognize does it look better or worse?

So finally we're going to automate this So the key thing we need to know is for each parameter When we move it up Does the loss get better or when we move it down? Does the loss get better? One approach would be to try right? We could manually increase the parameter a bit and see if the loss improves and vice versa But there's a much faster way And the much faster way is to calculate its derivative So if you've forgotten what a derivative is, no problem.

There's lots of tutorials out there You could go to Khan Academy or something like that But in short the derivative is what I just said the derivative is a function that tells you If you increase the input does the output increase or decrease and by how much so that's called the slope or the gradient now the good news is Pytorch can automatically calculate that for you.

So if you went through Horrifying months of learning derivative rules in year 11 and worried you're going to have to remember them all again. Don't worry you don't You don't have to calculate any of this yourself. It's all done for you. Watch this So the first thing to do is we need a function that takes the coefficients of the quadratic a b and c as inputs I Put them all on the list.

You'll see why in a moment. I kind of call them parameters We create a quadratic passing in those parameters a b and c This star on the front is a very very common thing in Python Basically, it takes these parameters and spreads them out to turn them into a b and c and pass each of them to the function So we've now got a quadratic with those coefficients And then we return the mean squared error of our predictions against our actions So this is a function that's going to take the coefficients of a quadratic and return the loss So let's try it Okay, so if we start with a b and c at 1.5 we get a mean squared error of 11.46 It looks a bit weird it says it's a tensor So don't worry about that too much in short in Pytorch Everything is a tensor a tensor just means that you don't it doesn't just work with numbers It also works with lists or vectors of numbers.

That's got a 1d tensor Rectangles of numbers so tables of numbers. It's got a 2d tensor Layers of tables of numbers that's got a 3d tensor and so forth. So in this case, this is a single number But it's still a tensor. That means it's just wrapped up in the Pytorch Machinery that allows it to do things like calculate derivatives, but it's still just the number 11.46 All right, so what I'm going to do is I'm going to create my parameters a b and c and I'm going to put them all in A single 1d tensor a 1d tensor is also known as a rank 1 tensor So this is a rank 1 tensor and it contains the list of numbers 1.5 1.5 1.5 And then I'm going to tell Pytorch That I want you to calculate the gradient For these numbers whenever we use them in a calculation and the way we do that is we just say requires credit So here is our Tensor it contains 1.5 3 times and it also tells us it's we flagged it to say please calculate gradients for this particular tensor when we use it in calculations So let's now use it in the calculation.

We're going to pass it to that quad MSC. That's the function We just created that gets the MSC a mean squared error for a set of coefficients And not surprisingly, it's the same number we saw before 11.46. Okay Not very exciting But there is one thing that's very exciting which is added an extra thing to the end called grad function And this is the thing that tells us that if we wanted to Pytorch knows how to create calculate the gradients For our inputs and to tell Pytorch just please go ahead and do that calculation You call backward on the result of your loss function.

Now when I run it nothing happens It doesn't look like anything happens. But what does happen is it's just added an attribute called grad Which is the gradient to our inputs ABC. So if we run this cell This tells me that if I increase a the loss will go down If I increase B, the loss will go down a bit less You know if I increase C, the loss will go down Now we want the loss to go down Right.

So that means we should increase a B and C Well, how much by well given that a is says if you increase a even a little bit the loss Improves a lot that suggests we're a long way away from the right answer. So we should probably increase this one a lot This one the second most and this one the third most So this is saying when I increase This parameter the loss decreases.

So in other words, we want to adjust our parameters a B and C By the negative of these we want to increase increase increase So we can do that By saying, okay, let's take our ABC Minus equals so that means equals ABC minus the gradient But we're just going to like decrease it a bit.

We don't want to jump too far. Okay, so just we're just going to go A small distance. So we're going to we're just going to somewhat arbitrarily pick point. Oh one So that is now going to create a new set of parameters Which are going to be a little bit bigger than before because we subtracted negative numbers And we can now calculate the loss again so remember before It was eleven point four six So hopefully it's going to get better Yes, it did ten point one one There's one extra line of code which we didn't mention which is with torch dot no grad Remember earlier on we said that the parameter ABC requires grad and that means pytorch will automatically calculate Its derivative when it's used in a in a function Here it's being used in a function, but we don't want the derivative of this.

This is not our loss Right. This is us updating the gradients. So this is basically the standard in a part of a pytorch loop and every neural net deep learning machine pretty much every machine learning model At least of this style that your build basically looks like this If you look deep inside fast.io source code, you'll see something that basically looks like this So we could automate that right?

So let's just take those steps which is we're going to Calculate let's go back to here. We're going to calculate the mean squared error for our quadratic Call backward and then subtract the gradient times a small number from the gradient Let's do it five times So so far we're up to a loss of ten point one So we're going to calculate our loss call dot backward to calculate the gradients and then with no grad subtract the gradients times a small number and print how we're going and There we go.

The loss keeps improving So we now have Some coefficients And There they are three point two one point nine two point. Oh, so they're definitely heading in the right direction So That's basically how we do it's called optimization Okay, so you'll hear a lot in deep learning about optimizers.

This is the most basic kind of Optimizer, but they're all built on this principle of course It's called gradient descent and you can see why it's called gradient descent. We calculate the gradients and Then do a descent which is we're trying to decrease the loss so Believe it or not.

That's that's The entire foundations of how we create those parameters. So we need one more piece Which is what is the mathematical function that we're finding parameters for? We can't just use quadratics, right because it's pretty unlikely that the relationship between parameters and Whether a pixel is part of a basset hound is a quadratic.

It's going to be something much more complicated No problem It turns out that We can create an infinitely flexible function from this one tiny thing This is called a rectified linear unit The first piece I'm sure you will recognize It's a linear function. We've got our output Y our input X and coefficients M and B.

This is even simpler than our quadratic and This is a line And torch.clip is a function that takes the output Y and if it's greater than that number It turns it into that number. So in other words, this is going to take anything that's negative and make it zero So this function is going to do two things Calculate the output of a line and if it is bigger than or smaller than zero, it'll make it zero So that's rectified linear So let's use partial To take that function and set the M and B to one and one.

So this is now going to be this function here will be Y equals X plus one followed by this torch.clip And here's the shape okay as we'd expect it's a line Until it gets under zero When it comes to the line, it becomes a horizontal line So we can now do the same thing we can take this plot function and make it interactive using interact and We can see what happens when we change these two parameters M and B.

So we're now plotting the rectified linear and fixing M and B So M is the slope And B is the intercept for the shift up and down Okay so that's how those Work now, why is this interesting? Well, it's not interesting of itself but what we could do is we could take this rectified linear function and create a double value Which adds up to rectified linear functions together So there's some slope M1B1, some second slope N2B2.

We're going to calculate it at some point X and So let's take a look at what that function looks like if we plot it and You can see what happens is we get this downward slope and then a hook and then an upward slope So if I change M1, it's going to change the slope of that first bit and B1 is going to change its position Okay, and I'm sure you won't be surprised to hear that M2 changes the slope of the second bit and B2 changes that location Now this is interesting.

Why? Because we don't just have to do a double value We could add as many values together as we want And if we add as many values together as we want, then we can have an arbitrarily squiggly function and with enough values We can match it as close as we want right, so you could imagine incredibly squiggly like I don't know like an audio waveform of me speaking and If I gave you a hundred million values together, you could almost exactly match that Now we want functions that are not just That we've put in 2D We want things that can have more than one input but you can add these together across as many dimensions as you like and so exactly the same thing will give you a value over surfaces or a value over 3D, 4D, 5D and so forth and it's the same idea with this incredibly simple foundation You can construct an arbitrarily accurate precise model Problem is you need some numbers for them, you need parameters.

Oh, no problem. We know how to get parameters We use gradient descent So believe it or not We have just derived big money everything from now on is Tweaks to make it faster and make it need less data You know, this is this is it Now I remember a few years ago when I said something like this in a class Somebody on the forum was like this reminds me of that thing about how to draw an owl Jeremy is basically saying okay step one draw two circles step two, draw the rest of the owl The thing I find I have a lot of trouble explaining to students is when it comes to deep learning there's nothing between these two steps.

When you have values getting added together and gradient descent to optimize the parameters and samples of inputs and outputs that you want The computer draws the owl, right? That's it So we're going to learn about all these other tweaks and they're all very important But when you come down to like trying to understand something in deep learning, just try to keep coming back to remind yourself of what it's doing Which it's using gradient descent to set some parameters to make a wiggly function Which is basically the addition of lots of rectified linear units or something very similar to that match your data Okay, so we've got some questions on the forum Okay, so question from Zakiya with six upvotes so for those of you watching the video what we do in the lesson is we want to make sure that the Questions that you hear answered are the ones that people really care about So we pick the ones which get the most upvotes.

This question is Is there perhaps a way to try out all the different models and automatically find the best performing one? Yes, absolutely you can do that so If we go back to our training script remember there's this thing called list models and It's a list of strings. So you can easily add a for loop around this that basically goes you know for Architecture in Tim dot list models and you could do the whole lot which would be like that and then you could Do that and away you go It's going to take a long time for 500 and something models So generally speaking like I've I've never done anything like that myself I would rather look at a picture like this and say like okay.

Where am I in? the vast majority of the time this is something this would be the biggest I reckon number one mistake of Beginners I see is that they jump to these models From the start of a new project at the start of a new project. I pretty much only use ResNet 18 Because I want to spend all of my time Trying things out and I try different data augmentation.

I'm going to try different ways of cleaning the data I'm going to try you know Different external data I can bring in and so I want to be trying lots of things now I want to be able to try it as fast as possible, right? So Trying better architectures is the very last thing that I do and What I do is once I've spent all this time, and I've got to the point where I've got okay I've got my ResNet 18 or maybe you know ResNet 34 because it's nearly as fast And I'm like okay.

Well. How accurate is it? How fast is it? Do I need it more accurate for what I'm doing do I need it faster for what I'm doing? Could I accept some trade-off to make it a bit slower to make more accurate and so then I'll have a look and I'll say Okay, well I kind of need to be somewhere around 0.001 seconds, and so I try a few of these So that would be how I would think about that Okay next question from the forum is around how do I know if I have enough data?

What are some signs that indicate my problem needs more data? I think it's pretty similar to the architecture question, so you've got something out of data Presumably you've you know you've started using all the data that you have access to you built your model You've done your best Is it good enough?

Do you have the accuracy that you need for whatever it is you're doing? You can't know until you've trained the model, but as you've seen it only takes a few minutes to train a quick model so my very strong opinion is that the vast majority of Projects I see in industry wait far too long before they train their first model You know my opinion you want to train your first model on day one with whatever CSV files or whatever that you can hack together And you might be surprised that none of the fancy stuff You're thinking of doing is necessary because you already have a good enough accuracy for what you need Or you might find quite the opposite you might find that oh my god with we're basically getting no accuracy at all Maybe it's impossible These are things you want to know at the start Not at the end We'll learn lots of techniques both in this part of the course and in part two About ways to really get the most out of your data In particular there's a reasonably recent technique called semi-supervised learning Which actually lets you get dramatically more out of your data And we've also started talking already about data augmentation, which is a classic technique you can use So you generally speaking it depends how expensive is it going to be to get more data?

But also what do you mean when you say get more data? Do you mean more labeled data? Often it's easy to get lots of inputs and hard to get lots of outputs For example in medical imaging where I've spent a lot of time It's generally super easy to jump into the radiology archive and grab more CT scans But it's maybe very difficult and expensive to You know draw segmentation masks and and pixel boundaries and so forth on them So often you can get more You know in this case images Or text or whatever and maybe it's harder to get labels And again, there's a lot of stuff you can do using stuff things like we'll discuss semi-supervised learning to actually take advantage of unlabeled data as well Okay Final question here in the quadratic example where we calculated the initial derivatives for A B and C We got values of minus 10.8 minus 2.4, etc.

What unit are these expressed in? Why don't we adjust our parameters by these values themselves? So I guess the question here is why are we multiplying it by a small number? Which in this case is 0.01? Okay, let's take those two parts of the question What's the unit here the unit is for each increase in x of 1 how much does what sorry in for each increase in in a of 1 so if I increase a from this case We have 1.5.

So if we increase from 1.5 to 2.5 What would happen to the loss? And the answer is it would go down by 10.9 887 now, that's not exactly right because it's kind of like It's kind of like in an infinitely small space right because actually it's going to be curved Right, but if it if it stays its data, that's like that's what would happen so if we increased B by 1 The loss would decrease if it stayed constant You know if the slope stayed the same the loss would decrease by minus 2.1 to 2 Okay, so why would we not just Change it directly by these numbers.

Well, the reason is The reason is that if we have some function that we're fitting And there's some kind of interesting theory that says that once you get close enough to the Optimal value all functions look like quadratics anyway, right? So we can kind of safely draw it in this kind of shape Because this is what they end up looking like if you get close enough And we're like, let's say we're way out Over here.

Okay, so we were measuring I Used my daughter's favorite pens and I sparkly ones. So we're measuring the slope here There's a very steep slope So that seems to suggest we should jump a really long way. So we jump a really long way And what happened? Well, we jumped way too far.

And the reason is that that slope decreased as We moved a lot and so that's generally what's going to happen, right? Particularly as you approach the optimal is generally the slopes going to decrease So that's why we multiply the gradient by a small number And that small number is a very very very important number.

It has a special name It's called the learning rate And this is an example of a Hyper parameter, it's not a parameter. It's not one of the actual coefficients of your function But it's a parameter you use to calculate the parameters Pretty better, right? It's a hyper parameter. And so it's something you have to pick now.

We haven't picked any yet In any of the stuff we've done that I remember and that's because fast AI generally picks reasonable defaults For most things but later in the course we will learn about how to try and find really good Learning rates and you will find sometimes you need to actually spend some time finding a good learning rate You could probably understand the intuition here if you pick a learning rate, that's too big You'll jump too far And so you'll end up way over here and then you will try to Then jump back again and you'll jump too far the other way and you'll actually Diverge and so if you ever see when your model is training, it's getting worse and worse Probably means your learning rates too big What would happen on the other hand if you pick a learning rate that's too small?

Then you're going to Take tiny steps and of course the flatter it gets the smaller the steps are going to get and So you're going to get very very bored So finding the right learning rate is a compromise Between the speed at which you find the answer and the possibility that you're actually going to shoot past it and get worse and worse Okay, so one of the bits of feedback I got quite a lot in the survey is that people want a break halfway through Which I think is a good idea.

So I think now is a good time to have a break So let's come back in 10 minutes at 25 past 7 Okay, hope you had a good rest have a good break I should say So I want to now show you a really really important mathematical computational trick Which is we want to do a whole bunch of?

values All right, so we're going to be wanting to do a whole lot of MX plus base and we want don't just want to do MX plus B. We're going to want to have like lots of Variables so for example every single pixel of an image would be a separate variable so we're going to multiply every single one of those times some coefficient and then add them all together and then do the The crop the the ReLU and then we're going to do it a second time with a second bunch of parameters And then a third time and a fourth time and fifth time It's going to be pretty inconvenient to write out a hundred million ReLU's But so happens there's a mathematical single mathematical operation that does all of those things for us except for the final replace negatives with zeros and it's called matrix multiplication I Expect everybody at some point did matrix multiplication at high school.

I suspect also a lot of you have forgotten works when people talk about linear algebra in deep learning They give the impression you need years of graduate school study to learn all this linear algebra You don't actually all you need almost all the time is matrix multiplication and it couldn't be simpler I'm going to show you a couple of different ways The first is there's a really cool site called matrix multiplication dot XYZ you can put in any matrix you want So I'm going to put in This one So this matrix is saying I've got three rows of data with three variables So maybe they're tiny to the tiny images with three pixels and the value of the first one is 1 2 1 The second is 0 1 1 and the third is 2 3 1 So those are our three rows of data These are our three sets of coefficients.

So we've got a B and C in our data. So so I guess you'd call it x1 x2 and x3 and then here's our first set of coefficients a B and C 2 6 and 1 And then our second set is 5 7 and 8 So here's what happens when we do matrix multiplication that second this matrix here of coefficients gets flipped around And we do This is the multiplications and additions that I mentioned right?

So multiply and multiply add multiply add so that's going to give you the first number because that is the left hand column of the Second matrix times the first row so that gives you the top left result So the next one is going to give us two results, right?

So we've got now the right hand one with the top row and the left hand one with the second row Keep going down Go down And that's it that's what matrix multiplication is it's multiplying things together and adding them up So there'd be one more step to do to make this a layer of a neural network Which is if this had any negatives we replace them with zeros that's my matrix multiplication is the critical Foundation or mathematical operation and basically all of deep learning so the GPUs that we use the thing that they are good at is this matrix multiplication They have special cores called tensor cores Which we can basically only do one thing which is to multiply together two four by four matrices And then they do that lots of times with bigger matrices so I'm going to show you an example of this we're actually going to build a complete machine learning model on real data in the spreadsheet So Fast AI has become kind of famous for a number of things and one of them is using spreadsheets To create deep learning models.

We haven't done it for a couple of years. I'm pretty pumped to show this to you What I've done is I went over to cable Where there's a competition I actually helped create many years ago called Titanic And it's like an ongoing competition. So 14,000 people have entered it.

So 12 teams have entered it so far It's just a competition for a bit of fun There's no end date and the data for it is the data about Who Who survived and who didn't from the real Titanic disaster And so I clicked here on the download button to grab it on my computer that gave me a CSV Which I opened up in Excel The first thing I did then was I just removed a few columns that Clearly were not going to be important things like the name of the passengers the passenger ID just try to make it a bit simpler and so I've ended up with Each row of this is one passenger.

The first column is the dependent variable. The dependent variable is the thing we're trying to predict did they survive and The remaining are some information such as what class of the boat first second or third class for sex their age How many siblings in the family? So you should always look for a data dictionary to find out what's what number of parents and children, okay What was their fare and which of the three cities did they embark on?

Okay, so there's that data Now when I first grabbed it I noticed that There were some people with no age now There's all kinds of things we could do for that. But for this purpose, I just decided to remove them and I found the same thing for embarked. I removed the blanks as well But that left me with nearly all of the data, okay, so then I've put that over here Here's our data with those rows removed Okay, that's the so this these are the columns that came directly from Kaggle So basically what we now want to do is we want to multiply each of these by a coefficient How do you multiply the word male?

by a coefficient and How do you multiply s? coefficient You can't so I converted all of these two numbers male and female are very easy I created a column called is male and as you can see, there's just an if statement that says if sex is male That's one.

Otherwise, it's zero And we can do something very similar for them, but we can have one column called did they embark in Southampton? Same deal and another column for today. What's the court shows boom? Sure, but did they embark in Cherberg? And their P class is one two or three which is a number but it's not really It's not really a continuous measurement of something.

There isn't one or two or three things that different Levels, so I decided to turn those into similar things into these binary. They quote. These are called binary categorical variables So are they first class and? Are they second class? Okay, so that's all that The other thing that I was thinking well, you know that I kind of tried it and checked out what happened and what happened was the people with So I created some random numbers.

So to create the random numbers I just went equals Rand Right and I copied those to the right and then I just went copy and I went paste values So that gave me some random numbers and that's my like so just because like I was like before I said all a B and C Let's just start them at 1.5 1.5 1.5 what we do in real life is we start our parameters at random numbers That are a bit more or a bit less than 0 So these are random numbers Actually, sorry, I slightly lied.

I didn't use Rand. I used Rand minus 0.5 And that way I've got small numbers that were on either side of 0 So then when I took each of these and I multiplied them by Fairs and ages and so forth what happened was that these numbers here Way bigger than You know these numbers here and so in the end all that mattered was what was their fair?

That because they were just bigger than everything else So I wanted everything to basically go from 0 to 1 these numbers were too big So what I did up here is I just grabbed the maximum of this column the maximum of all the fairs is 512 and so then Actually, I do age first.

I did a maximum of age because a similar thing, right? There's 80 year olds and there's two year olds and So then I'm over here. I just did okay. Well, what's their age? Divided by the maximum and so that way all of these are between 0 and 1 Just like all of these are between 0 and 1 So that's how I fix this is called normalizing the data Now we haven't done any of these things when we've done stuff with fast AI That's because fast AI does all of these things for you And we'll learn about how right?

But it's all these things are being done behind the scenes The fair I did something a bit more which is I noticed there's some lots of very small fairs and There's also some a few very big fairs so like $70 and then $7 $7 Generally speaking when you have lots of really big numbers and a few small ones So generally speaking when you've got a few really big numbers and lots of really small numbers.

This is really common with with With money, you know because money kind of follows this relationship where a few people have lots of it And they spend huge amounts of it and most people don't have heaps If you take the log of something that's like that has that kind of extreme distribution You end up with something that's much more evenly distributed.

So I've added this here called log fair as you can see And these are all around one which isn't bad. I could have normalized that as well But I was too lazy. I didn't bother because it seemed okay So at this point you can now see that if we start from here All of these are all around the same kind of level, right?

So none of these columns are going to saturate the others So now I've got my coefficients which are just as I said, they're just random And so now I need to basically calculate Ax1 plus Bx2 plus Cx3 plus blah blah blah blah blah blah blah. Okay, and so to do that You Can use some product in Excel?

I could have typed it out by hand. It'd be very boring But some product is just going to multiply each of these This one will be multiplied by There is it subset by this one This one will be multiplied by this one so forth and then they get all added together Now one thing if you're eagle-eyed you might be wondering is in a linear equation We have y equals mx plus B at the end There's this constant term and I do not have any constant term I've got something here called const, but I don't have any plus at the end How do we how's that working?

Well, there's a nice trick that we pretty much always use in machine learning Which is to add a column of data just containing the number one every time If you have a column of data containing the number one every time and that parameter becomes your constant term So you don't have to have a special Constant term and so it makes out Code a little bit simpler when you do it that way.

It's just a trick but everybody does it Okay, so this is now the result of our linear model So this is not I'm not even going to do value right? I'm just going to do the plane regression, right Now if you've done regression before you might have learned about it as something you kind of solve with various matrix things But in fact, you can solve a regression using gradient descent So I've just kind of had and created a loss for each row.

And so the loss is going to be equal to Our prediction minus whether they survived squared so this is going to be our squared error And there they all are squared errors. And so here I've just Sum them up. I could have taken the mean. I guess that would have been a bit easier to think about But some is going to be give us the same result.

So here's our loss And so now we need to optimize that using gradient descent So Microsoft Excel has a gradient descent optimizer in it called solver So I'll click solver and it'll say okay, what are you trying to optimize? It's this one here and I'm going to do it by changing These cells here And I'm trying to minimize it.

And so we're starting a loss of 55.78 Actually, let's change it to mean as well The word mean or average average All right, so start at 1.03 So optimize that And there we go, so it's gone from 1.03 to 0.1 and so we can check the predictions so the first one It predicted exactly correctly It was they didn't survive and we predict wouldn't survive Ditto for this one It's very close and you can start to see So this one you can start to see a few issues here, which is like sometimes it's predicting less than one So it's less than zero and sometimes it's predicting more than one wouldn't it be cool if we had some way of Wouldn't it be cool if we had some way of constraining it to between zero and one and that's an example of some of the things We're going to learn about that make this stuff work a little bit better, right?

But you can see it's doing an okay job. So this is not deep learning This is not a neural net yet. This is just a regression So to make it into a neural net We need to do it multiple times So I'm just going to do it twice. So now rather than one set of coefficients I've got two sets and again, I just put in random numbers Other than that all the data is the same And so now I'm going to have My sum product again So the first sum product is with my first set of coefficients And my second sum product is with my second set of coefficients So I'm just calling them linear one and linear two Now there's no point adding those up together because if you add up two linear functions together you get another linear function We want to get all those wiggles, right?

So that's why we have to do our value So in microsoft excel value looks like this if the number is less than zero Use zero, otherwise use the number. So that's how we're going to replace the negatives with zeros Um, and then finally If you remember from our spreadsheet We have to add them together.

So we add the values together So that's going to be our prediction and then our loss is the same as the other sheet. It's just survived minus prediction squared And let's change that to mean Not mean average Okay, so let's try solving that Optimize a h1 and this time we're changing all of those So So this is using gradient descent Excel solvers not the fastest in the world, but it gets the job done Okay, let's see how we went 0.08 for our deep learning model versus 0.1 for our regression.

So it's a bit better So there you go. So we've now created our first deep learning neural network from scratch And we did it in microsoft excel everybody's favorite artificial intelligence tool So that was a bit, um slow and painful Be a bit faster and easier if we used matrix multiplication.

So let's finally do that So this next one is going to be exactly the same as the last one but with matrix multiplication So all that data looks the same You'll notice the key difference now is our parameters have been transposed So before I had the parameters Matching the data in terms of being in columns For matrix multiplication the The expectation is the way matrix multiplication works works is that you have to transpose this so it goes The x and y is kind of the opposite way around the rows and columns the opposite way around Other than that it's the same i've got the same I just copied and pasted the random numbers So we had exactly the same starting point and so now Our entire This entire thing here is a single function which which which is matrix multiply all of this by all of this And so when I run that it fills in exactly the same numbers Make this average And so now we can optimize that Okay, like that a minimum By changing these Solve Should get the same number point.

Oh wait, wasn't it? And we do Okay, so that's just another way of doing the same thing so you can see that um um Matrix multiplication it takes like a surprisingly long time at least for me To get an intuitive feel For matrix multiplication is like a single mathematical operation.

So I still find it helpful To kind of remind myself It's just doing these sum products Um and additions Okay, so that is um That is a deep learning neural network in microsoft excel And the titanic Kaggle competition by the way Um is a pretty fun Learning competition if you haven't done much machine learning before Then it's certainly worth like trying out just to kind of get the feel for these how these all get put together so this is um Um, so the the chapter of the book That this lesson goes with is chapter four and chapter four of the book Is the chapter where we lose the most people because it's um, to be honest, it's hard um, but part of the reason it's hard Is I couldn't put this into a book right, so We're teaching it a very different way in the course to what's in the book Um, and you know, you can use the two together, but if you've tried to read the book and been a bit disheartened Um, yeah, try, you know, try following through through the spreadsheet instead Maybe try trading like if you use numbers or google sheets or something you could try to create your own kind of version of it And whatever spreadsheet platform you prefer Or you could try to do it yourself from scratch in python You know if you want to really test yourself um So there's some suggestions Okay Okay question from victor guerra in the excel exercise And germany is doing some feature engineering.

He comes up with two new columns p class one and p class two That is true P class one and p class two Why is there no p class three column Um, is it because p class one if p class one is zero when p class two is zero then p class three must be one So in a way two columns are enough to encode the input with the original column.

Yes That's exactly the reason so there's um, no need to Tell the computer about things that can kind of figure out for itself. Um, so when you create these are called dummy variables so when you create dummy variables for a categorical variable with Three levels like this one you need two dummy variables.

So in general a categorical variable with n levels needs n minus one columns Thanks for the good question So what we're going to be doing in our next lesson is looking at natural language processing So so far we've looked at some computer vision and just now we've looked at some what we call tabular data So so kind of spreadsheet type data Next up where we're going to be looking at natural language processing.

So I'll give you a taste of it So you might want to open up the getting start with getting started with nlp for absolute beginners notebook So here's the getting started with nlp absolute beginners notebook, I will say as a notebook author I Um may sound a bit lame, but I always see when people have uploaded it.

It always makes me really happy So and it also helps other people find it So remember to upvote these notebooks or any other notebooks you you like I also always read all the comments So if you want to ask any questions or make any comments, I enjoy those as well Um So natural language processing Is about rather than taking for example image data and making predictions we take text data That text data most of the time is in the form of pros So like plain english text, uh, so, you know english is the most common language used for nlp But there's nlp models in dozens of different languages nowadays And if you're a non-english speaker You'll find that for many languages There's less resources in non-english languages and there's a great opportunity to provide nlp resources in your language This has actually been one of the things that the fastai community has been fantastic at and the global community Is building nlp Resources for example the first fasti Nlp resource was created by a student from the very first fastai course the indic languages Some of the best resources have come out of fastai alumni and so forth So that's a particularly valuable thing you could look at.

So if your language is not well represented, that's an opportunity Not a problem So some examples of things you could use nlp for well, perhaps the most common and practically useful in my opinion is classification Classification means you take a document now when I say a document that could best be one or two words It could be a book Could be a wikipedia page.

So it could be any length. We use the word document It sounds like that's a specific kind of length But it can be a very short thing a very long thing. We take a document and we try to figure out a category for it Now that can cover many many different kinds of applications.

So One common one that we'll look at a bit is sentiment analysis Um, so for example, is this movie review positive or negative sentiment analysis is very helpful in things like marketing and product development, you know in big companies, there's lots and lots of You know information coming in about your product.

It's very nice to get a quickly sorted out and kind of track metrics from week to week Something like figuring out what author wrote the document would be an example of Classification exercise because you're trying to put it a category in this case is which author I think there's a lot of opportunity in legal discovery.

There's already some products in this area where in this case the category is Is this a legal document in scope or out of scope in the court case? I'm just organizing documents triaging inbound emails so like Which part of the organization should it be sent to is it an urgent or not?

Stuff like that. So these are examples of categories of classification Um, what you'll find is when we look at Classification tasks in nlp is it's going to look very very similar to images But what we're going to do is we're going to use a different library The library we're going to use is called hugging face transformers rather than fastai And there's two reasons for that The main reason why is because I think it's really helpful to see how things are done in more than one library and hugging face transformers, you know, so fastai has A very layered architecture So you can do things at a very high level with very little code or you can dig deeper and deeper and deeper getting more and more fine grip Hacking face transformers doesn't have the same high level api at all that Fastai has so you have to do more stuff manually And so at this point of the course, you know, we're going to actually intentionally use a library, which is a little bit less user-friendly In order to see kind of what it's just depths you have to go through to use other Libraries having said that the reason I picked This particular library is it is particularly good It has really good models in it.

It has a lot of really good techniques in it Not at all surprising because they have hired lots and lots of fastai alumni. So they have very high quality people working on it So Before the next lesson, um, yeah, if you've got time Take it take a look at this notebook and take a look at the data the data we're going to be working with It's quite interesting It's from a kaggle competition Which is trying to figure out in patterns whether two concepts Are referring to the same thing or not where those concepts are represented as english text And when you think about it, that is a classification task because the document is You know basically Text one blah text two blah, and then the category is similar or not similar And in fact in this case they actually have scores It's either going to be basically zero zero point two five point five point seven five or one of like how similar is it?

But it's basically a classification task when you think of it that way um So yeah, you can have a look at the data and um Next week we're going to go through step by step through this notebook And we're going to take advantage of that as an opportunity also to talk about the really important Topics of validation sets and metrics which are two of the most important topics in Not just deep learning but machine learning more generally All right.

Thanks, everybody. I'll see you next week. Bye