
OpenAI's NEW 256-d Embeddings vs. Ada 002


Chapters

0:00 OpenAI Embed 3 256 dimensions
1:05 Code setup
4:34 Optimizing text-embedding-3-large
8:45 Embed 3 vs Ada 002
11:00 Embed 3 with 256-d beats Ada 002

Whisper Transcript

00:00:00.000 | Today we're going to be taking a look at the idea of semantic routing for AI agents using OpenAI's
00:00:07.440 | new third generation embedding models and specifically we're going to see if it still
00:00:13.040 | works with the tiny embedding size that OpenAI have come up with for their large model. So
00:00:19.200 | essentially their large model by default produces, I think, 3072-dimensional embeddings, but they say
00:00:26.720 | that you can decrease that to just 256 dimensions and still get better than Ada 002 performance,
00:00:36.880 | which is a bit crazy, and that's cool if it works; that is all I would say. So we'll be
00:00:44.880 | trying that out, and we're going to be trying it out with the Semantic Router library. So the
00:00:48.400 | idea behind Semantic Router is that, rather than waiting for an LLM to make all the decisions, we can use
00:00:54.640 | essentially semantic similarity to make the same decisions. So let's just jump straight in to using
00:01:02.480 | these two things together. So I'm going to go to the docs, we're going to go to encoders and we're
00:01:10.240 | going to go to OpenAI Embed 3. Okay so I'm going to open this in Colab and we should come to here.
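
The routing idea described above (use semantic similarity instead of an LLM call to make the decision) can be sketched in plain Python. This is a simplified stand-in, not Semantic Router's actual implementation: the 3-d vectors are made-up placeholders for real embeddings, and `cosine`, `route`, and the single shared threshold are names and choices assumed here purely for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def route(query_vec, route_vecs, threshold):
    """Return the best-matching route name, or None if nothing clears the threshold."""
    best_name, best_score = None, threshold
    for name, vecs in route_vecs.items():
        # Score a route by its most similar example utterance.
        score = max(cosine(query_vec, v) for v in vecs)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical 3-d "embeddings" standing in for real utterance vectors.
route_vecs = {
    "politics": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "chitchat": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.3]],
}

politics_hit = route([0.85, 0.15, 0.05], route_vecs, threshold=0.8)  # close to the politics examples
no_hit = route([0.1, 0.1, 0.95], route_vecs, threshold=0.8)          # close to neither route
```

A query that resembles a route's example utterances returns that route's name; a query that resembles none of them returns `None`, which is the "no route" outcome shown later in the video.
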
00:01:17.120 | So we introduced the dimensions feature for the OpenAI encoder in 0.19, and
00:01:26.800 | OpenAI introduced it to their API in OpenAI 1.10. So you will need these versions to use all of this,
00:01:36.240 | and that will come with this. Okay, so we're just going to run this; it's 0.20 now, actually,
00:01:44.720 | and the first thing I'm going to do is just set some routes. These are more like, well, this one
00:01:52.560 | is more of a protective route. Essentially, you don't want your AI agents or LLMs to
00:01:58.160 | be talking about a certain thing, so you're putting in a protective guardrail. That is exactly what
00:02:02.880 | we're doing here. So I'm going to run that, and these utterances that you see here
00:02:09.680 | just define a few examples of queries or messages or interactions that a user might provide, which are
00:02:18.080 | things that we wouldn't want to answer. How this works is that it will not be restricted
00:02:25.520 | to just these utterances but it will look for similar utterances as well. So we'll have that
00:02:31.600 | sort of protective route, and then we have another one. Let's say with this one, if we hit this route, we
00:02:37.120 | want the agent to respond in a more conversational manner. So let's try with OpenAI's
00:02:46.400 | text-embedding-3-large model, and I'm going to set the dimensions parameter to 256 and just see what
00:02:52.720 | happens. I'm very curious. So yes, we run that. I will need an OpenAI API key for this,
00:02:58.960 | so to get that you go to platform.openai.com, and if you're using Colab it's going to come up with
00:03:05.200 | this nice little text box, so you can just enter your API key in there and it will work.
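
As a rough intuition for what asking for 256 dimensions means: OpenAI's embeddings guide describes shortening an embedding by hand as keeping the leading components and re-normalising the result to unit length. The sketch below shows only that manual step on a toy vector. The 8-d vector and the `shorten` helper are hypothetical, and it is an assumption here that the API's `dimensions` parameter behaves similarly.

```python
import math

def shorten(vec, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in head]

# Hypothetical 8-d unit vector standing in for a 3072-d embedding.
full = [0.5, 0.5, 0.5, 0.3, 0.2, 0.2, 0.2, 0.2]

# Shorten to 4 dimensions, the toy analogue of going from 3072 to 256.
short = shorten(full, 4)
short_norm = math.sqrt(sum(x * x for x in short))
```

Re-normalising matters because cosine-style similarity thresholds assume vectors of comparable length; a truncated but unnormalised vector would no longer have length 1.
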
00:03:10.080 | So, okay, we're now going to define our route layer. A route layer needs two
00:03:18.400 | things: it needs an encoder, which we just defined, and it needs some routes, which we defined here,
00:03:23.680 | which is just our list of routes. So yep, we pass them in there, we initialize our route
00:03:32.320 | layer, cool, and then we can check the dimensionality of the vectors that have been created by this
00:03:38.000 | route layer by looking at this, and yes indeed, we can see that we have 256-dimensional vectors.
00:03:44.240 | Pretty cool. Now let's see if it works with a few example questions. Very simple, it's not like
00:03:52.880 | they're hard, but nonetheless, if it passes them all, which I think it might do, that's pretty
00:03:59.760 | good. Okay, so we have "don't you love politics" and "how's the weather today"; they both hit the correct
00:04:06.480 | routes, cool. And then the other one, this one is not really either of those, and you can see
00:04:13.600 | that the route that it hits is None, right, it doesn't hit any route, and that's exactly
00:04:19.280 | what we would want it to do. So that is actually not too bad, especially considering it is that 256
00:04:26.960 | dimensional vector. So, very impressive, and I haven't even optimized the model whatsoever
00:04:34.320 | here, so we could probably get even better performance. Let's just go ahead
00:04:40.000 | and do that. Let's see how we can optimize this further and test on a larger dataset. So I'm
00:04:44.960 | going to take this little bit of code here, I'm just going to copy this across, and I'm just going
00:04:51.840 | to go back to the Semantic Router library here. I'm going to go to the docs, and I'm going to go
00:04:57.040 | to threshold optimization. This is the notebook that shows us how to do this sort of optimization;
00:05:02.400 | it has a test dataset in there as well. I'm going to run this all pretty quickly. So I'm
00:05:10.960 | going to pip install, but we actually don't need the local version, because that's for when you're
00:05:16.240 | running local models; we don't need it here because we're using the OpenAI API. So I'm going
00:05:20.480 | to run that. I'm going to define a few different routes. Again, we do have the politics and
00:05:26.720 | chitchat routes, but we also have two others, mathematics and biology, so let's add those.
00:05:31.200 | And then here is where things are going to change a little bit: rather than using the
00:05:36.000 | open-source Hugging Face encoder, I'm going to use OpenAI's encoder, and let's just see
00:05:42.560 | how it will perform. Okay, let's try.
00:05:48.000 | Okay, we initialize our route layer, and there are going to be a few
00:05:53.360 | utterances here that I'm going to test. So we can see it gets the politics one, it
00:06:02.320 | gets the weather one, but it doesn't get, I think this one's biology, and it doesn't get this,
00:06:06.720 | oh no, this one's correct. Okay, so this should hit None. So it just misses the biology one here.
00:06:12.720 | That's fine, because we can actually optimize these, we can improve them. So
00:06:17.840 | I'm just going to show you this quickly. I'm just evaluating the performance here. Yeah, fine. Now, this is
00:06:26.000 | on a small dataset; what about when we use a bigger dataset? Okay, so we have a few more
00:06:33.280 | examples here. I'm going to add a few more very quickly to make it a little bit harder for the
00:06:37.280 | model. Okay, so I've added just four more here, which are kind of similar to the other routes,
00:06:43.680 | but I don't want them to be classified as those other routes. So
00:06:48.160 | these two are very similar to mathematics, this one is similar to biology, and this one is kind of
00:06:53.200 | similar to, I suppose, biology and also the chitchat route. So that will make it a little bit
00:07:00.320 | harder for the model. So let's see how it does on this.
00:07:06.240 | Again, see the accuracy: pretty bad, right? But that's not a good measurement, because I'm using the
00:07:14.960 | default thresholds for Ada 002 here, and as I understand it, what counts as similar and what
00:07:22.720 | counts as not similar for the new third-generation models is a lot different in terms of that value,
00:07:29.360 | that sliding scale. So that's probably not very fair. Fortunately, we can just automatically
00:07:38.080 | optimize that too, and we'll be able to see which thresholds turn out to be optimal for the new models.
00:07:47.600 | Okay, so let's see. We're going to fit this; it's going to run over 500 iterations, and
00:07:55.600 | we'll see what the performance is at the end. So it looks like about 89 percent. Let's see what those
00:08:03.120 | new thresholds look like. You can see they're far, far lower, which is interesting. All right, so it
00:08:09.440 | seems like the thresholds for the new model, at least for the 256-dimensional Embed 3 large
00:08:22.320 | model, are more like, what was this, close to 0.3, between about 0.25 and
00:08:28.800 | 0.3 here, and for biology even lower, which is pretty interesting. It's a lot different. So
00:08:35.520 | yeah, we can run this as well. So the accuracy I get here is 88.57%. That's interesting.
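
The fit-then-evaluate step can be pictured as a search over per-route thresholds that keeps whatever scores best on the labelled test utterances. The video does not show how the library's fit works internally, so the random search below is an assumed stand-in, and the similarity scores, the route names in `data`, and the helper names `predict`, `accuracy`, and `fit` are all hypothetical.

```python
import random

def predict(scores, thresholds):
    """Pick the highest-scoring route that clears its own threshold, else None."""
    passing = {r: s for r, s in scores.items() if s >= thresholds[r]}
    return max(passing, key=passing.get) if passing else None

def accuracy(data, thresholds):
    """Fraction of (scores, expected_route) pairs predicted correctly."""
    hits = sum(predict(scores, thresholds) == expected for scores, expected in data)
    return hits / len(data)

def fit(data, thresholds, iterations=500, seed=0):
    """Random search: nudge thresholds, keep a candidate if accuracy does not drop."""
    rng = random.Random(seed)
    best, best_acc = dict(thresholds), accuracy(data, thresholds)
    for _ in range(iterations):
        candidate = {
            r: min(1.0, max(0.0, t + rng.uniform(-0.1, 0.1)))
            for r, t in best.items()
        }
        acc = accuracy(data, candidate)
        if acc >= best_acc:
            best, best_acc = candidate, acc
    return best, best_acc

# Hypothetical per-route similarity scores for six test utterances.
# An expected route of None means "should not match any route".
data = [
    ({"politics": 0.45, "chitchat": 0.12}, "politics"),
    ({"politics": 0.38, "chitchat": 0.20}, "politics"),
    ({"politics": 0.10, "chitchat": 0.41}, "chitchat"),
    ({"politics": 0.15, "chitchat": 0.33}, "chitchat"),
    ({"politics": 0.22, "chitchat": 0.18}, None),
    ({"politics": 0.08, "chitchat": 0.25}, None),
]

start = {"politics": 0.82, "chitchat": 0.82}  # thresholds tuned for a different model
before = accuracy(data, start)
fitted, after = fit(data, start)
```

With thresholds this far above every score, only the two `None` cases are right at the start. Because the search never accepts a worse candidate, the fitted accuracy cannot end up below the starting one, which mirrors why the default thresholds looked unfairly bad for the new model.
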
00:08:44.640 | So let's remember that number, and let's try with Ada and just see what the performance
00:08:50.960 | difference is like, if any. Again, it's just one test, so we can't really
00:08:58.000 | decide based on this whether the model is bad or good, I don't think.
00:09:03.360 | So we're going to switch this across to Ada 002. The score threshold doesn't really matter; I
00:09:12.480 | think the default that we set for Ada is 0.82. Actually, let's start
00:09:20.720 | both in the same place; let's assume we don't know anything about Ada either. So I'm
00:09:25.680 | going to run that. Then we are going to reinitialize our route layer, and we'll just see
00:09:32.960 | how this one does as well. So you see, it actually gets 75% again,
00:09:40.000 | but it gets different ones right, as you can see there. So let's redefine all of this.
00:09:48.240 | Let's see what the accuracy is here: 80%, so it's starting off fairly strong in comparison,
00:09:54.400 | but let's see what we can improve on. Okay, actually, if you look at these, they
00:10:02.960 | did not refresh. Maybe I should have refreshed something.
00:10:10.400 | Yeah, it's fine, I don't think it's a big deal. So let's just run this again and see what
00:10:17.920 | happens. Okay, so we can see the accuracy was increasing just then. Let's see what the
00:10:31.920 | updated thresholds are. Okay, so they have moved around quite a bit. I wonder if we can,
00:10:40.240 | let's try running it again. We can set the max, is it, I think it's this one. We can increase this
00:10:49.840 | just to give it more of a chance to optimize. Okay, but it seems to be getting stuck around
00:10:56.000 | those thresholds anyway. So yeah, still the same. That's what we have for those, and then the
00:11:02.800 | accuracy was 87.14%, so slightly worse, although I have actually seen this go up to 92%
00:11:11.520 | before. Maybe it's because I added these slightly harder utterances in there. So clearly
00:11:19.440 | this 256-dimensional third-generation embedding model from OpenAI, in this
00:11:26.480 | case, did outperform Ada, which is pretty impressive. I should put emphasis on the fact that this was
00:11:35.920 | just one test. To be sure, I really do need to be using the model over
00:11:44.720 | quite a long time, with a lot of different things, to form a good opinion on this.
00:11:49.680 | But it seemed to work pretty well here, so that's better than I would have expected given the
00:11:55.360 | tiny embedding size. It's pretty cool. So yeah, that's it, a quick test. I'm going to leave it there.
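
To put the "just one test" caveat in numbers: the two reported accuracies, 88.57% and 87.14%, are what you would get from 62 and 61 correct answers out of 70. The test-set size of 70 is an inference from those percentages, not something stated in the video, so treat the arithmetic below as a sketch under that assumption.

```python
# Assumption: a 70-utterance test set, inferred from the reported percentages.
TEST_SET_SIZE = 70

def as_percent(correct, total):
    """Accuracy as a percentage rounded to two decimals."""
    return round(100 * correct / total, 2)

embed3_256 = as_percent(62, TEST_SET_SIZE)  # matches the reported 88.57
ada_002 = as_percent(61, TEST_SET_SIZE)     # matches the reported 87.14

# Under this assumption the whole gap between the models is one utterance.
gap_in_utterances = 62 - 61
```

If that reading is right, the 256-dimensional model's lead comes down to a single test utterance, which is consistent with treating the result as encouraging rather than conclusive.
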
00:12:03.600 | I hope this has been interesting and useful. Thank you very much for watching,
00:12:08.880 | and I will see you again in the next one. Bye!