AI Decision Making — Optimizing Routes
Today we're going to be taking a look at how we can improve the accuracy of our routes. So what we've added to the library is the ability to modify your thresholds on a route. Previously the threshold applied to the whole layer; now you have individual thresholds per route. And of course, if you have many routes, modifying each one of those thresholds one by one can quickly become tedious. So we've also added training methods so that you can optimize your route threshold values with some training data.
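As a rough illustration of the per-route setting (not a copy of the library's exact API), a route with its own threshold might look like the sketch below; the `score_threshold` field name and the value here are my assumptions, so check the docs if they don't match.

```python
from semantic_router import Route

# Hypothetical per-route threshold. The `score_threshold` field name is an
# assumption about how the per-route setting is exposed; check the docs.
politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
    ],
    score_threshold=0.7,  # this route uses 0.7 rather than the layer default
)
```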
I'm going to go over to the Semantic Router library, I'm going to go to the docs, and then I'm going to go down to this threshold optimization notebook. We just need the local encoder in this example, because we're using static routes only. But that does mean we can also speed things up by using a GPU. So in Colab, I'm going to switch across to a T4 GPU. That will speed things up, although it is very fast anyway, literally a couple of seconds. But nonetheless, if you're doing this for many routes, and you have a fairly large training set, then you might want to turn the GPU on.
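If you're following along, the setup looks roughly like the sketch below. The `[local]` extra is my assumption about how the local Hugging Face encoder dependencies get installed, so check the notebook for the exact install line.

```python
# Rough notebook setup. The "[local]" extra is an assumption about how the
# local Hugging Face encoder dependencies are installed; check the notebook
# for the exact command.
!pip install -qU "semantic-router[local]"

import torch

# With the Colab runtime switched to a T4 GPU, this should report "cuda".
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device}")
```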
And we can come down to here, where we're defining our route layer. We have a few routes in here to give us more to optimize with.
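The routes themselves look something like the sketch below; the utterances are illustrative stand-ins rather than the exact ones from the notebook.

```python
from semantic_router import Route

# A cut-down set of routes along the lines of what the notebook uses; the
# utterances are illustrative stand-ins, not copied from the docs.
politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "they're going to destroy this country!",
    ],
)
chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "lovely weather we're having",
    ],
)
mathematics = Route(
    name="mathematics",
    utterances=[
        "can you explain the Pythagorean theorem?",
        "how do I solve a quadratic equation?",
    ],
)
biology = Route(
    name="biology",
    utterances=[
        "what is the process of photosynthesis?",
        "how do cells divide?",
    ],
)

routes = [politics, chitchat, mathematics, biology]
```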
One thing that I would recommend: you can optimize using just the method that we provide here, but really, what you also want to be doing is adding utterances. Where you see a query that doesn't trigger the route you would expect it to, you should add that utterance to the route. And you should probably be adding more utterances in general, maybe breaking apart routes into different routes if you're seeing that they don't work so well. And the other thing, as you'll see in a moment, is that we have a training dataset, and we can modify that to improve the performance as well.
So with our encoder, we are going to initialize our encoder, and I'm going to use a different model to the default. The default encoder that we use is MiniLM, which is a pretty old model, to be honest. But it's very small and efficient, so we just have it as the default. At some point that will probably change, maybe to this model. This one is more recent, still very small, but generally you should be able to get better performance out of it. So we're going to switch across to that. It's the E5 base v2 model, and it will just download quickly.
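Switching encoders is roughly a one-liner, as in the sketch below; the exact Hugging Face model ID is my assumption of which E5 checkpoint is meant.

```python
from semantic_router.encoders import HuggingFaceEncoder

# Swap the default MiniLM sentence transformer for E5 base v2. The exact
# Hugging Face model ID below is an assumption about which checkpoint is
# meant in the video.
encoder = HuggingFaceEncoder(name="intfloat/e5-base-v2")
```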
OK, now once that has downloaded, we can come down and initialize our route layer. To initialize a route layer, we just need our routes and our encoder, both of which we have already defined. Then we test it with a few queries: this one should go to politics, this one to chitchat, this one to, I think we had a biology question, and this one should just return none.
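Roughly, initializing and querying the layer looks like the sketch below, assuming the RouteLayer API from the docs notebook; the example queries are stand-ins for the ones on screen.

```python
from semantic_router import RouteLayer

# Build the layer from the routes and encoder defined above.
rl = RouteLayer(encoder=encoder, routes=routes)

# Each call returns a route choice; `.name` is the route that triggered,
# or None when nothing passes its threshold.
print(rl("don't you love politics?").name)                  # expect "politics"
print(rl("how's the weather today?").name)                  # expect "chitchat"
print(rl("how does a cell divide?").name)                   # expect "biology"
print(rl("I'm interested in learning about llama 2").name)  # expect None
```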
OK, and you can see, actually, we get three out of four. This one actually goes to chitchat, where it shouldn't. So OK, let's take that, and let's try to improve what we have.
So first, I'm just going to show you the evaluate method. This is the format that we use for both the evaluate and the fit methods. These are the utterances, the input data for our fit method, and these are the labels, the intended routes that each utterance should trigger. I just like to keep it like this when I'm going through it, so it feels a bit easier to read. So I create this test dataset, which is just a list of tuples, and then I unpack that. So we have our utterances here, our labels here, and then we're going to evaluate to get our accuracy.
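That format, the unpacking, and the evaluate call look roughly like the sketch below; the evaluate signature is assumed from the threshold optimization notebook, and the example tuples are mine.

```python
# The (utterance, intended_route) tuple format used for both evaluate() and
# fit(); a None label means no route should trigger.
test_data = [
    ("don't you love politics?", "politics"),
    ("how's the weather today?", "chitchat"),
    ("how do cells divide?", "biology"),
    ("I'm interested in learning about llama 2", None),
]

# Unpack into utterances (X) and target labels (y).
X, y = zip(*test_data)

accuracy = rl.evaluate(X=list(X), y=list(y))
print(f"Accuracy: {accuracy * 100:.2f}%")
```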
We can improve that. Actually, I think it's with MiniLM that we get perfect accuracy here, but I think it's a worse showing with this one.
Now what we need to do is create a test dataset. When you're creating this, one thing that you can do, obviously, is just use an LLM to generate utterances for you. So we have politics, chit-chat, mathematics, biology, and we have these as well, and we should probably add some more of those. These are the utterances that shouldn't be classified as anything, and we add those in there because, if we only have utterances for the named routes, the optimization is always going to favour them; the similarity thresholds can just increase, or decrease, sorry, to capture more area. And that means it might work on this test dataset, but maybe not when we have queries that genuinely shouldn't match any route. So one thing that I would also recommend doing is this: OK, we have mathematics, politics, chit-chat. Let's add some more test data that is kind of similar to those routes, but that we don't actually want to be classified as those routes.
That will basically just make it harder for the model, or really the training function, to get a good score. And that's a good thing, because we're pushing the model more; it needs to try a bit harder to get something good.
These two are similar to the mathematics tests that we have. This one is kind of similar to the chit-chat one, and this one, obviously, to politics. But at least for me, they don't quite fit into those routes. They're very similar, just not quite there. So that should be good enough, I think, to get some reasonable performance. It won't be anything incredible, I don't think, but we should get something.
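To illustrate the kind of additions being described, out-of-scope utterances plus a few near misses, here is a hedged sketch; the specific examples are mine, not the ones shown on screen.

```python
# Extend the test data with out-of-scope utterances (label None) and a few
# "near misses" that sit close to an existing route without belonging to it.
# These examples are illustrative, not the ones from the video.
test_data += [
    # clearly out of scope
    ("what's a good recipe for banana bread?", None),
    ("can you recommend a good book?", None),
    # near mathematics, but not something we want routed there
    ("I always hated maths at school", None),
    # near chit-chat, but more of a personal statement
    ("my neighbour never says hello to me", None),
    # near politics, but really a history question
    ("who was the first president of the United States?", None),
]

# Re-unpack after extending.
X, y = zip(*test_data)
```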
Now let's try to calculate the accuracy using the default thresholds that we have. You can see with MiniLM that was actually 34.85, which is pretty low. And what you find is that different models just have different similarity scales, where something counts as similar if the score is slightly higher, or not similar if it's slightly lower, and that boundary sits at a different point for each model. So with this one, we're probably not in too bad a place.
Now let's come down and see what we have for the default route thresholds. That value is coming from the Hugging Face encoder; it's just the default score threshold that we set for it. Now we have our training data, the X, and we have our labels, the y, and we pass those to the fit method. By default, it will go over 500 iterations of training, or steps. So nothing special, but I tend to find it does the job with these smaller open-source models. And then we can see the updated route thresholds are these.
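Put together, checking the defaults, fitting, and checking the thresholds again looks roughly like the sketch below; the fit and get_thresholds calls are assumed from the notebook, and I'm not certain of the keyword for changing the iteration count, so it isn't shown.

```python
# Default per-route thresholds before training; these come from the
# encoder's default score threshold.
print(rl.get_thresholds())

# Optimize the per-route thresholds against the labelled data. The optimizer
# is described as running 500 iterations by default; the keyword for changing
# that isn't shown here since I'm not certain of its name.
rl.fit(X=list(X), y=list(y))

# Updated per-route thresholds after training.
print(rl.get_thresholds())
```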
So interestingly, mathematics is incredibly low here, which would make me think maybe that route needs some more utterances, or maybe splitting apart. I'm not sure, but that's something that I would probably consider trying. So probably around 0.75 down to around 0.60 is where we have that boundary between not similar and similar for this model. And as I mentioned, that will vary depending on which model you're using.
We can just have a look at the evaluation again with the new thresholds.
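That check is just a re-run of the earlier evaluate call, something like this.

```python
# Re-run the evaluation with the trained thresholds to compare against the
# pre-training accuracy.
accuracy = rl.evaluate(X=list(X), y=list(y))
print(f"Trained accuracy: {accuracy * 100:.2f}%")
```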
So that is it for this very quick introduction to the optimization function here. Which encoder you use does make a difference: for example, Ada-002, which is not quite the latest OpenAI embedding model anymore, and also Cohere, I think, would both go up to about 92% after training on similar data. Maybe there are some differences between them, but not a huge amount.
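For reference, swapping in one of those hosted encoders would look roughly like the sketch below; the model names are my assumptions, and the API keys are expected to be set in the environment.

```python
from semantic_router import RouteLayer
from semantic_router.encoders import CohereEncoder, OpenAIEncoder

# Swapping the encoder is the only change; the evaluate/fit workflow stays
# the same. Model names are assumptions, and the encoders read their API
# keys from the environment (OPENAI_API_KEY / COHERE_API_KEY).
openai_rl = RouteLayer(
    encoder=OpenAIEncoder(name="text-embedding-ada-002"), routes=routes
)
cohere_rl = RouteLayer(
    encoder=CohereEncoder(name="embed-english-v3.0"), routes=routes
)
```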
So the model does matter a lot here, but so does how we're optimizing. We can obviously evaluate and fit, but realistically, we should also be adding new utterances to our routes, and we should be adding more data to our test set as we go. We really want to be iterating on this and improving it over time, rather than just doing it once and leaving it there. So thank you very much for watching, and I will see you again in the next one.