OpenAI's NEW 256-d Embeddings vs. Ada 002
Chapters
0:00 OpenAI Embed 3 256 dimensions
1:05 Code setup
4:34 Optimizing text-embedding-3-large
8:45 Embed 3 vs Ada 002
11:00 Embed 3 with 256-d beats Ada 002
00:00:00.000 |
Today we're going to be taking a look at the idea of semantic routing for AI agents using OpenAI's 00:00:07.440 |
new third generation embedding models and specifically we're going to see if it still 00:00:13.040 |
works with the tiny embedding size that OpenAI have come up with for their large model. So 00:00:19.200 |
essentially their large model by default produces, I think, 3072-dimensional embeddings, but they say 00:00:26.720 |
that you can decrease that to just 256 dimensions and still get better than Ada 002 performance, 00:00:36.880 |
which is a bit crazy and I mean that's cool if it works, that is all I would say. So we'll be 00:00:44.880 |
trying that out, and we're going to be trying it out with the semantic router library. So the 00:00:48.400 |
idea behind semantic router is that rather than waiting for an LLM to make all the decisions, we can use 00:00:54.640 |
essentially semantic similarity to make the same decisions. So let's jump straight into using 00:01:02.480 |
these two things together. So I'm going to go to the docs, we're going to go to encoders and we're 00:01:10.240 |
going to go to OpenAI Embed 3. Okay so I'm going to open this in Colab and we should come to here. 00:01:17.120 |
So we introduced the dimensions feature for the OpenAI encoder in semantic-router 0.0.19, and 00:01:26.800 |
OpenAI introduced it to their API in openai 1.10. So you will need at least these versions to use all of this, 00:01:36.240 |
and they will come with this install. Okay so we're just going to run this; it's 0.0.20 now actually, and the install looks something like the cell below. 00:01:44.720 |
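A rough equivalent of the install cell (the exact pins may differ from the notebook's; semantic-router pulls in the openai client itself, but pinning both makes the version requirements explicit):

```python
!pip install -qU "semantic-router>=0.0.20" "openai>=1.10.0"
```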
And the first thing I'm going to do is just set some routes. The first of these is 00:01:52.560 |
more of a protective route: essentially, you don't want your AI agents or LLMs to 00:01:58.160 |
be talking about a certain thing, so you put in a protective guardrail, and that is exactly what 00:02:02.880 |
we're doing here. So I'm going to run that, and the utterances you see here 00:02:09.680 |
just define a few examples of queries or messages or interactions that a user might provide which are 00:02:18.080 |
things that we wouldn't want to answer, and the way this works is it will not be restricted 00:02:25.520 |
to just these utterances but will look for similar utterances as well. So we'll have that 00:02:31.600 |
sort of protective route, and then we have another one: if we hit this route, we 00:02:37.120 |
want the agent to respond in a more conversational manner. The routes look roughly like the sketch below. 00:02:46.400 |
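A minimal sketch of those two routes (the utterances here are illustrative stand-ins, not the notebook's exact ones):

```python
from semantic_router import Route

# protective route: example utterances for a topic the agent should not engage with
politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "they're going to destroy this country!",
    ],
)

# conversational route: if a query lands here, we want a more casual response
chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
    ],
)

routes = [politics, chitchat]
```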
So let's try OpenAI's text-embedding-3-large model; I'm going to set the dimensions parameter to 256 and just see what 00:02:52.720 |
happens, all right, I'm very curious. So yes, we run that. I will need an OpenAI API key for this, 00:02:58.960 |
so to get that you go to platform.openai.com, and if you're following along in Colab it's going to come with 00:03:05.200 |
this nice little text box so you can just enter your API key in there and it will work; something like the snippet below. 00:03:10.080 |
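A sketch of that setup, assuming the dimensions keyword behaves as described in the semantic-router docs:

```python
import os
from getpass import getpass

from semantic_router.encoders import OpenAIEncoder

# grab the key from the environment, or prompt for it (e.g. in Colab)
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass(
    "Enter OpenAI API key: "
)

# third-generation large model, truncated down to 256 dimensions
encoder = OpenAIEncoder(name="text-embedding-3-large", dimensions=256)
```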
Okay, so we're now going to define our route layer. A route layer needs two 00:03:18.400 |
things: it needs an encoder, which we just defined, and it needs some routes, which we defined here, 00:03:23.680 |
which is just our list of routes. So yep, we pass them into there and we initialize our route 00:03:32.320 |
layer, cool, and then we can check the dimensionality of the vectors that have been created by this 00:03:38.000 |
route layer, and yes indeed we can see that we have 256-dimensional vectors, as in the sketch below. 00:03:44.240 |
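Roughly like this; the notebook inspects the vectors stored on the route layer itself, but since the attribute names vary across versions, this sketch just encodes a test utterance directly:

```python
from semantic_router.layer import RouteLayer

# the route layer needs the encoder and the list of routes
rl = RouteLayer(encoder=encoder, routes=routes)

# sanity-check the embedding width by encoding a test utterance directly
vector = encoder(["how's the weather today?"])[0]
print(len(vector))  # -> 256
```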
Pretty cool. Now let's see if it works with a few example questions. They're very simple, it's not like 00:03:52.880 |
they're hard, but nonetheless if it passes all of them, which I think it might do, that's pretty 00:03:59.760 |
good. Okay, so we have one about politics and "how's the weather today", and they both hit the correct 00:04:06.480 |
routes, okay, cool. And then the other one is not really either of those, and you can see 00:04:13.600 |
that the route it hits is none, right, it doesn't hit any route, which is exactly what we'd want; the calls look roughly like this. 00:04:19.280 |
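Something along these lines (the query strings are approximations of the ones in the notebook):

```python
print(rl("don't you love politics?").name)   # -> 'politics'
print(rl("how's the weather today?").name)   # -> 'chitchat'
# a query that matches neither route should return no route at all
print(rl("I'm interested in learning about llama 2").name)  # -> None
```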
So that is actually not too bad, especially considering it is that 256- 00:04:26.960 |
dimensional vector, so very impressive. And I haven't even optimized the model whatsoever 00:04:34.320 |
here, so we could probably get even better performance. I mean, let's just go ahead 00:04:40.000 |
and do that: let's see how we can optimize this further and test a larger data set. So I'm 00:04:44.960 |
going to take this little bit of code here and copy it across, and I'm going 00:04:51.840 |
to go back to the semantic router library, go to the docs, and go 00:04:57.040 |
to threshold optimization. This is the notebook that shows us how to do this sort of optimization, 00:05:02.400 |
and it has a test data set in there as well. So I'm going to run through this all pretty quickly. First the 00:05:10.960 |
pip install, but we actually don't need the local extra, because that's for when you're 00:05:16.240 |
running local models; we don't need it here because we're using the OpenAI API. So I'm going 00:05:20.480 |
to run that. Then I'm going to define a few different routes: again we have the politics and 00:05:26.720 |
chitchat routes, but we also have two others, mathematics and biology, so let's add those. 00:05:31.200 |
And then here is where things are going to change a little bit: rather than using the 00:05:36.000 |
open-source Hugging Face encoder, I'm going to use OpenAI's encoder, and let's just see 00:05:42.560 |
how it will perform. Okay, let's try. 00:05:48.000 |
We initialize our route layer, roughly as sketched below. 00:05:53.360 |
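A sketch of the change, assuming the two extra routes follow the same pattern as before (these mathematics and biology utterances are illustrative, not the notebook's exact ones):

```python
mathematics = Route(
    name="mathematics",
    utterances=[
        "can you explain how to solve a quadratic equation?",
        "what is the derivative of x squared?",
    ],
)
biology = Route(
    name="biology",
    utterances=[
        "what is the process of photosynthesis?",
        "how do cells divide during mitosis?",
    ],
)
routes = [politics, chitchat, mathematics, biology]

# the docs notebook uses the open-source HuggingFaceEncoder here;
# we swap in the OpenAI encoder instead
encoder = OpenAIEncoder(name="text-embedding-3-large", dimensions=256)
rl = RouteLayer(encoder=encoder, routes=routes)
```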
There are a few utterances here that I'm going to test, and we can see it gets the politics one, it 00:06:02.320 |
gets the weather one, but it doesn't get the one I think is biology, and it doesn't get this... 00:06:06.720 |
oh no, this one's correct, this one should hit none. So it just misses the biology one here. 00:06:12.720 |
Okay, that's fine, because we can actually optimize these, right, we can improve them. So 00:06:17.840 |
I'm just going to show you this quickly: I'm just evaluating the performance here, along the lines of the snippet below. 00:06:26.000 |
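A minimal sketch, assuming the evaluate helper works as in the threshold-optimization docs, with test_data standing in for the notebook's labeled examples:

```python
# test_data: list of (utterance, expected_route_name) pairs,
# where the expected name is None for out-of-scope queries
X, y = zip(*test_data)

# accuracy under the current (default) score thresholds
accuracy = rl.evaluate(X=X, y=y)
print(f"Accuracy: {accuracy * 100:.2f}%")
```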
That's on a small data set; what about when we add a big data set? So we have a few more 00:06:33.280 |
examples here, and I'm going to add a few more very quickly to make it a little bit harder for the 00:06:37.280 |
model. Okay, so I've added just four more here which are kind of similar to the other routes, 00:06:43.680 |
but they're actually not; I don't want them to be classified as those other routes. These 00:06:48.160 |
two are very similar to mathematics, this one is similar to biology, and this one is kind of 00:06:53.200 |
similar to, I suppose, biology and also the chitchat route. So that 00:07:00.320 |
will make it a little bit harder for the model, so let's see how it does on this. 00:07:06.240 |
Again we see the accuracy: pretty bad, right? But that's not a good measurement, because I'm using the 00:07:14.960 |
default thresholds for Ada 002 here, and as I understand it, where the line between what is similar and what 00:07:22.720 |
is not similar falls for the new third-generation models is a lot different in terms of that value, 00:07:29.360 |
that sliding scale, so that's probably not very fair. Fortunately we can basically automatically 00:07:38.080 |
optimize that too, and we'll be able to see what the new models define as being optimal. 00:07:47.600 |
Okay, so let's see: we're going to fit this, it's going to run over 500 iterations, and 00:07:55.600 |
we'll see what the performance is at the end. It looks like about 89 percent. Let's see what those 00:08:03.120 |
new thresholds look like, with something like the snippet below. 00:08:09.440 |
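A sketch, assuming fit and get_thresholds behave as in the threshold-optimization notebook:

```python
# optimize the per-route score thresholds against the labeled data;
# by default this runs for 500 iterations
rl.fit(X=X, y=y)

# inspect the optimized per-route thresholds
print(rl.get_thresholds())
```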
You can see they're far, far lower, which is interesting. It seems like the thresholds for the new model, at least the 256 dimensions of the embed-3-large 00:08:22.320 |
model, are more like, what was this, close to 0.3, between about 0.25 and 00:08:28.800 |
0.3 here, and for biology even lower, which is pretty interesting; it's a lot different. So 00:08:35.520 |
yeah, we can run this evaluation as well, and the accuracy I get here is 88.57%, that's interesting. 00:08:44.640 |
So let's remember that number and try with Ada, and just see what the performance 00:08:50.960 |
difference is like, if any. Again, it's just one test, so we can't really 00:08:58.000 |
decide based on this whether the model is bad or good, I don't think. 00:09:03.360 |
So we're going to switch this across to Ada 002. The score threshold doesn't really matter; I 00:09:12.480 |
think the default that we set for Ada is like 0.82, but let's start 00:09:20.720 |
both in the same place, you know, let's assume we don't know anything about Ada either. So I'm 00:09:25.680 |
going to run that and reinitialize our route layer, roughly as in the sketch below. 00:09:32.960 |
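The swap is just the encoder name; a sketch (Ada 002 doesn't take the dimensions parameter, so it's dropped here):

```python
# previous-generation model: fixed 1536-dimensional embeddings
encoder = OpenAIEncoder(name="text-embedding-ada-002")
rl = RouteLayer(encoder=encoder, routes=routes)
```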
Then we'll see how this one does as well. You see it actually does 75 again, 00:09:40.000 |
but the results are different; it misses different ones, as you can see there. So let's redefine all of this and 00:09:48.240 |
see what the accuracy is here: 80, so it's starting off fairly strong in comparison, 00:09:54.400 |
but let's see what we can improve on. Okay, actually, if you look at these, they 00:10:02.960 |
did not refresh; maybe I should have refreshed something. 00:10:10.400 |
It's fine, I don't think it's a big deal, so let's just run this again and see what 00:10:17.920 |
happens. Okay, so we can see the accuracy was increasing just then. Let's see what the 00:10:31.920 |
updated thresholds are. Okay, so they have moved around quite a bit. I wonder if we can do better; 00:10:40.240 |
let's try running it again. We can increase the max iterations (I think it's this parameter) 00:10:49.840 |
just to give it more of a chance to optimize, along the lines of the snippet below. 00:10:56.000 |
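A sketch; max_iter is my best guess at the parameter name for the iteration budget here:

```python
# re-run the threshold optimization with a larger iteration budget
rl.fit(X=X, y=y, max_iter=1000)
print(rl.get_thresholds())
```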
But it seems to be getting stuck around those thresholds anyway, so still the same; that's what we have for those. And then the 00:11:02.800 |
accuracy was 87.14, so slightly worse, although I have actually seen this go up to 92% 00:11:11.520 |
before; maybe it's because I added these slightly harder utterances in there. So clearly 00:11:19.440 |
this 256-dimensional third-generation embedding model from OpenAI in this 00:11:26.480 |
case did outperform Ada 002, which is pretty impressive. I should put emphasis on the fact that this was 00:11:35.920 |
just one test; to be sure, I really do need to be using the model over 00:11:44.720 |
quite a long time with a lot of different things to form a good opinion on this, 00:11:49.680 |
but it seemed to work pretty well here, better than I would have expected given the 00:11:55.360 |
tiny embedding size. It's pretty cool. So yeah, that's it, a quick test; I'm going to leave it there. 00:12:03.600 |
so I hope this has been interesting and useful so thank you very much for watching 00:12:08.880 |
and I will see you again in the next one. Bye!