Today we're going to be taking a look at the idea of semantic routing for AI agents using OpenAI's new third generation embedding models and specifically we're going to see if it still works with the tiny embedding size that OpenAI have come up with for their large model. So essentially their large model by default it's I think 30-72 dimensional embeddings but they say that you can decrease that to just 256 dimensions and still get better than Arda002 performance which is a bit crazy and I mean that's cool if it works, that is all I would say.
So we'll be trying that out and we're going to be trying it out with the semantic router libraries. So the idea behind semantic router is rather than waiting for LM to make all the decisions we can use essentially semantic similarity to make the same decisions. So let's just jump straight in to using these two things together.
So I'm going to go to the docs, we're going to go to encoders and we're going to go to OpenAI Embed 3. Okay so I'm going to open this in Colab and we should come to here. So we introduce the or at least a dimension feature for the OpenAI encoder in 0.19 and OpenAI introduce it to our APIs in OpenAI 1.10.
So you will need these versions to use all of this and that will come with this. Okay so we're just going to run this so 0.20 now actually and the first thing I'm going to do is just set some routes. These are more like well this is a it's more of a protective route it is essentially you know you don't want your AI agents or LMs to be talking about a certain thing so you're putting their protective guardrail that is exactly what we're doing here.
So I'm going to run that and these what you see these utterances here they just define a few examples of queries or messages or interactions that a user might provide which is you know things that we wouldn't want to answer and how this will work is it will not be restricted to just these utterances but it will look for similar utterances as well.
So we'll have that sort of protective route and then we have another one let's say this one if we hit this route we want the agent to respond in a more conversational manner. So let's try with OpenAI's so the text embedding three large model and I'm going to set the dimensions parameter to 256 and just see what happens all right I'm very curious.
So yes we run that I will need an OpenAI API key for this so to get that you go to platform.openai.com and if you're a missing colab it's going to come with this little nice little text box so you can just enter your API key in there and it will work.
So we are okay we're now going to define our our route layer so a route layer well it needs two things it needs an encoder which we just defined and it needs some routes which we defined here okay which is just our our list of routes. So yep we pass them into there we initialize our route layer cool and then we can check the dimensionality of the vectors that have been created by this route layer by looking at this and yes indeed we can see that we have 256 dimensional vectors pretty cool now let's see if it works with a few example questions okay very simple it's not like they're hard but nonetheless I think it's if it passes all which I think it might do that's pretty good okay so we have don't use politics and how's the weather today they both hit the correct routes okay cool and then the other one so this one is not really either of those and you can see that the route that it hits is none right it doesn't hit any route and yeah that's exactly what we would want it to do so that is actually not too bad especially considering it is that 256 dimensional vector so very impressive and you know I haven't even optimized the model whatsoever here so we could probably get even better performance and I mean let's just go ahead and do that let's see how we can optimize this further and just test a larger data set so I'm going to take this a little bit of code here I'm just going to copy this across and I'm just going to go back to this semantic router library here I'm going to go to my docs and I'm going to go to threshold optimization so this is the notebook that shows us how to do this sort of optimization it has like a test data set in there as well so I'm going to run this all pretty quickly so I'm going to okay pip install but we actually don't need the local version because that's when you're running like local models we don't need it here because we're using the openai api so I'm going to run that I'm going to define a few different routes so we again we do have that politics and chitchat routes but we also have two others mathematics and biology so let's add those and then here is where things are going to change a little bit so I'm going to rather than using the open source hugging face encoder I'm going to use openai's encoder and let's just see okay let's see how it will perform okay let's try okay we initialize our route layer and there's going to be a few utterances here that I'm going to test so we can see it gets okay so it gets the politics one it gets the weather one but it doesn't get the I think this one's biology and it doesn't get this oh no this one's correct okay so this should hit none so it just misses the biology one here okay that's fine because we can actually optimize these right we can improve them so I'm just going to show you this quickly I'm just evaluating the performance here yeah fine now this on small data set what about when we add a big data set okay so we have a we have a few more examples here I'm going to add a few more very quickly to make it a little bit harder for the model okay so I've added just four more here which are kind of similar to the other the other routes but they're actually not you know I don't want them to be defined as those other routes so you know these two are very similar to mathematics this one similar to biology and this one kind of similar to I suppose biology and also the chit chat route so that will make it a little bit that will make it a little bit harder for the model so let's see let's see how it does on this again see the accuracy pretty bad right but that's not a good measurement because I'm using the default thresholds for r002 here which as I understand the sort of what is similar and what is not similar for the new third generation models is a lot different in terms of like that value that sliding scale so that's probably not very fair fortunately we can just basically automatically optimize that to and we'll be able to see what the new models do define as being as being optimal okay so let's see we're going to fit this it's going to run over 500 iterations and you know we'll see what the performance is at the end so it looks like about 89 percent let's see what those new thresholds look like so you can see that far far lower which is interesting all right so it seems like the thresholds for the new model at least the 256 dimensions of the embed 3 large model that threshold is is more like around what was this like close to three between like 2.5 to 3 here for biology even lower which that's pretty interesting um it's a it's a lot different so yeah we can we can run this as well so the accuracy I get here is 88.57 that's interesting so let's remember that number and let's try with ardor and just see see what the performance difference is like if any again it's just one test so maybe we don't you know we can't really decide based on this whether you know this means the model is you know bad or good I don't think so we're going to switch this across to ardor 002 the score threshold it doesn't really matter I think the default that we set by default for ardor is like 82 so I'm actually let's start on both in the same place you know let's assume we don't know anything about ardor either so I'm going to run that then so we are going to reinitialize our route layer and we'll just see how they this one does as well here so you see actually you know it's actually does it 75 again but results are you know it does different ones as you can see there so let's redefine all of this let's see what the accuracy is here so 80 so it's starting off fairly strong in comparison but let's see what we can what we can improve on okay actually so if you look at these uh they did not refresh maybe I should have uh maybe I should have refreshed something yeah it's fine I don't think it's a big deal so let's uh let's just run this again and see what happens okay so yeah we can see the accuracy is it was increasing just then let's see what the updated thresholds are okay so they have moved around quite a bit I wonder if we can let's try running it again we can do max it is uh is it I think it's this we can increase this just to see give it more of a chance to optimize okay but it seems to be getting stuck around that those thresholds anyway so yeah still the same so that's what we have for those and then the accuracy was so it was uh 87.14 so slightly worse although I have actually seen this go up to 92% before maybe it's because I added these slightly harder um utterances in there so clearly this actually does so this 256 dimensional third generation embedding model from OpenAI in this case did outperform Arda which is pretty impressive I should put emphasis on the fact that this was just one test they I mean I yeah to to be sure I really do need to just be using the model over quite a long time with a lot of different things to kind of form a good opinion on this but it seemed to work pretty well here so that's I mean better than I would have expected given the tiny embedding sizes it's pretty cool so yeah that's it quick test I'm gonna leave it there so I hope this has been interesting and useful so thank you very much for watching and I will see you again in the next one.
Bye! you