
Training Parameters - TensorFlow Essentials #3


Chapters

0:00 Introduction
1:59 Learning Rate
4:42 Loss

Transcript

Hi, and welcome to the video on training parameters. These are essentially settings that we can use to fine-tune the learning process. And there are three parts to remember. We have the optimizer, the loss function, and our performance metrics. Now, the optimizer is used to minimize the model loss.

And popular optimizers that you may have heard about are Adam or Stochastic Gradient Descent, or SGD for short. And then we have the loss function. This is essentially saying, how do we calculate the difference between our predicted values and the true or real values? And for this, we use a loss function.

And the most popular of those that you will see is categorical cross-entropy. But there are a lot of different ones that you see used in different places. And it would not be weird to use another one of those functions. And then, finally, we have the performance metrics. And this is what we use to measure our model's actual performance.

There are a lot of ways to do this. But the simplest and often most useful metric is to just use accuracy. So let's start with the optimizer. So we'll use Adam, as it is the most popular. And all we want to do is create our optimizer variable. And then into that, we pass our optimizer, which is Adam.

And then we also need to pass a learning rate. So this is essentially a number which tells TensorFlow, or tells Adam, how quickly or how slowly to update the model parameters. So a larger number means larger updates, whereas a smaller number means smaller updates. This learning rate is actually one of the most important hyperparameters in our model, because we need to find a balance between not too high and not too low.
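To make that concrete, here is a minimal sketch of what creating the optimizer might look like in code; the learning rate of 0.001 is just an illustrative value (it happens to be Adam's default), not something taken from the video.

```python
import tensorflow as tf

# Create the Adam optimizer with an explicit learning rate.
# 0.001 is only an example value here; in practice you would tune it.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
```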

And I have an image over here to try and explain why. So you can see here, we have three different plots. The first of those is where we have a very low learning rate. And what we have is a step, which is quite small, which moves downwards along our slope.

So further down, down here is where we want to be, because this is a lower level of loss and therefore a higher accuracy. So we want to get down to here, which is the global minimum. But because our steps are too small, we get to here. And our model thinks, OK, we need to go back, because the gradient now points in the other direction.

So it goes back. And then, because the steps are too small, it just gets stuck in this small, little local minimum. So where there's a dip, but it's not the lowest dip, that's called a local minimum. And even if the learning rate is perfect, you can still get stuck in a local minimum.

It's completely normal. But obviously, the lower the learning rate, the more likely that is to happen. So obviously, you don't want to be too low. Alternatively, we also don't want to be too high, because if our learning rate is too high, then yes, we will skip over more of the local minima.

But once we get to the point where we are at the global minimum and we want to go down into this sort of descent, our steps are simply too large to actually do that. They can't go down, because we need a smaller learning rate in order to actually get down there.

Otherwise, what you will see during training is that your loss value is just bouncing up and down. It's all over the place, but it's not really going anywhere. And that is generally because your learning rate is too high. However, if we do get a learning rate that is in the middle of these two, and it's just right, then we will hopefully get a graph that looks something like this.

So we don't get stuck in the local minimum, because the step is large enough to skip past it. But it's not so large that it can't converge into the global minimum, which is what you can see here. And that is the ideal learning rate that we want to try and get.

A lot of the time, it's just a case of trying things out, trying a few different learning rates, and seeing what works. But this is the logic behind not going too high and not going too low. So that is our optimizer, which lives in the optimizers module. Now we need to move on to our loss function.
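Before we do, here is a rough sketch of that trial-and-error process over a few learning rates. Note that build_model, x_train, and y_train are hypothetical placeholders that are not part of this video, and the candidate learning rates are just examples.

```python
import tensorflow as tf

# build_model, x_train, and y_train are hypothetical placeholders.
# Try a few learning rates and compare the final loss of each run.
for lr in [1e-1, 1e-2, 1e-3, 1e-4]:
    model = build_model()  # returns a freshly initialized Keras model
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss=tf.keras.losses.CategoricalCrossentropy(),
        metrics=["accuracy"],
    )
    history = model.fit(x_train, y_train, epochs=5, verbose=0)
    print(f"lr={lr}: final loss = {history.history['loss'][-1]:.4f}")
```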

So the loss is the y-axis in the plot here. So the lower the loss, the higher the model performance, or the higher the accuracy. But obviously, we need to actually calculate that. So we will create a loss variable. And we do tf.keras.losses. And then we want the categorical cross-entropy loss.

There are no other hyperparameters or anything that we need to add into this. So that's all we need for that. And then we also want to initialize an accuracy metric. So we'll put that into a variable called accuracy, or acc. Now we do tf.keras.metrics, then categorical accuracy. The reason we are using categorical accuracy here is because in most cases, you probably have a set number of outputs rather than just a single output.

So it will most likely look something like this, rather than like this. So when we have different categories, we need to use categorical accuracy. Then we just pass in the string accuracy there as well. So we've now set up all of our training parameters. And we just need to compile our model now.
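In code, the loss and the accuracy metric described here might look roughly like this; the variable names loss and acc follow the ones mentioned above.

```python
import tensorflow as tf

# Categorical cross-entropy loss; no extra hyperparameters are needed here.
loss = tf.keras.losses.CategoricalCrossentropy()

# Categorical accuracy metric, named "accuracy" so it appears under that
# name in the training logs.
acc = tf.keras.metrics.CategoricalAccuracy("accuracy")
```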

So we just call model.compile. And then we pass the optimizer, the loss, and the accuracy metric. And for metrics, we always put these within a list, because typically you're probably going to want more than one metric. But for now, we're just sticking with accuracy. We don't need to add anything else.

But sometimes, you may have other things in here, like an F1 score or some other metrics as well. And you'd want to pass all of these in as a list. But we're just going to go with accuracy. And now that is our model. It is completely ready for us to start training with it.
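Putting that together, the compile call described here would look roughly like the following, assuming the optimizer, loss, and acc variables from earlier and a Keras model called model.

```python
# Compile the model with the optimizer, loss, and metric(s) defined above.
# metrics takes a list, because you can pass several metrics at once.
model.compile(
    optimizer=optimizer,
    loss=loss,
    metrics=[acc],
)
```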

But for now, that's everything on the training parameters. So I hope you've enjoyed. And thank you for watching. I will see you again in the next one.