OpenAI's NEW Embedding Models
Chapters
0:00 OpenAI Ada 002
1:25 New OpenAI Embedding Models
3:50 OpenAI Embedding Dimension Parameter
5:04 Using OpenAI Embedding 3
10:08 Comparing Ada 002 to Embed 3
00:00:00.000 |
Way back in December 2022, we had the biggest shift in how we approach AI ever. 00:00:08.480 |
That was thanks to OpenAI releasing ChatGPT at the very end of November. 00:00:14.880 |
ChatGPT quickly caught a lot of people's attention and it was in the month of December 00:00:21.200 |
that the interest in ChatGPT and AI really exploded. 00:00:26.560 |
But right in the middle of December, OpenAI released another model that was also a big deal. 00:00:35.040 |
It didn't get as much notice as ChatGPT, and that model was text-embedding-ada-002. 00:00:45.760 |
Very creative naming, but behind that name is a model that just completely 00:00:52.960 |
changed the way that we do information retrieval for natural language. 00:00:57.760 |
Which covers RAG, and basically any use case where you're retrieving text information. 00:01:06.240 |
Now since then, despite a huge explosion in the number of people using RAG and the really 00:01:12.640 |
cool things that you can do with RAG, OpenAI remained pretty quiet on the embedding model front. 00:01:20.400 |
And there had been no new models since December 2022, until now. 00:01:25.760 |
OpenAI has just released two new embedding models and a ton of other things as well. 00:01:32.960 |
Those two embedding models are called text-embedding-3-small and text-embedding-3-large. 00:01:40.400 |
And when we look at the results that OpenAI is sharing right now, we can see a fairly 00:01:46.960 |
decent improvement on English language embeddings with the MTEB benchmark. 00:01:52.720 |
But perhaps more impressively, we see a massive improvement in the quality of multilingual 00:02:00.320 |
embeddings, which are measured using the MIRACL benchmark. 00:02:03.200 |
Now ada-002, state of the art when it was released and for a very long time afterwards, 00:02:09.440 |
and still a top-performing embedding model, had an average score of 31.4 on MIRACL. 00:02:16.080 |
The new text-embedding-3-large has an average score of 54.9 on MIRACL. 00:02:26.320 |
Now, one of the other things you'll notice looking at these new models is that they have 00:02:32.720 |
not increased the max context window, so the maximum number of tokens that you can feed in. 00:02:38.080 |
That makes a lot of sense with embedding models because what you're trying to do with embeddings 00:02:42.240 |
is trying to compress the meaning of some text into a single point. 00:02:45.680 |
And if you have a larger chunk of text, there's usually many meanings within that text. 00:02:53.360 |
So going large and trying to compress into a single point, you know, those two 00:02:58.880 |
things don't really go together, because that large text can have many meanings. 00:03:03.200 |
So it always makes sense to use smaller chunks and clearly OpenAI are aware of that. 00:03:08.800 |
They're not increasing the maximum number of tokens that you can embed with these models. 00:03:13.600 |
Now, the other thing, which is maybe not as clear to me, is that they have not trained 00:03:20.400 |
these models on more recent data. The knowledge cutoff date is still September 2021, which is a fair while ago now. 00:03:26.560 |
And okay, for embedding models, maybe that isn't quite as important as it is for LLMs, 00:03:33.920 |
but it's still good to have some context of recent events when you're trying to embed meaning. 00:03:38.400 |
So things like COVID: you ask a COVID question, and these models, I imagine, are probably not 00:03:44.480 |
going to perform as well as, say, Cohere's embedding models, which have been trained on more recent data. 00:03:52.560 |
And one thing which I think is probably the most impressive thing that I've seen so far 00:03:58.640 |
is we're now able to decide how many dimensions we'd like in our vectors. 00:04:07.680 |
If you reduce the number of dimensions, you're going to get reduced-quality embeddings. 00:04:12.320 |
But what is incredibly interesting, and I almost don't quite believe it yet, I still 00:04:19.440 |
need to test this, is that they're saying that the large model, text-embedding-3-large, 00:04:25.840 |
you can cut it down from 3,072 dimensions, which is larger than the previous models. 00:04:32.240 |
You can cut that down to 256 dimensions and still outperform ada-002, which is a 1,536-dimension model. 00:04:43.760 |
Compressing all of that performance into 256 floating point numbers is insane. 00:04:56.160 |
Not right now, but I'm going to test that and just prove to myself that that is possible. 00:05:01.200 |
I'm a little bit skeptical, but if so, incredible. 00:05:04.800 |
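(For reference, here's a minimal sketch of how that works with the official `openai` Python client, v1+. The query string is made up, but `dimensions` is the real request parameter for the text-embedding-3 models.)

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Request a 256-dimension vector instead of the model's default 3,072
res = client.embeddings.create(
    model="text-embedding-3-large",
    input="why should I use llama 2?",  # example query, made up
    dimensions=256,
)
vector = res.data[0].embedding
print(len(vector))  # -> 256
```

(OpenAI's announcement also notes you can get the same effect by truncating a full-size vector yourself and re-normalizing it.)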
OK, so with that out of the way, let's jump into how we might use this new model. 00:05:10.480 |
OK, so jumping right into it, we have this notebook. 00:05:13.280 |
I'm going to share a link with you in the description, 00:05:17.440 |
and I will try and get a link added to the video as well. 00:05:20.400 |
And the first thing I'm going to do is download the dataset. 00:05:27.840 |
OK, so I'm using this AI arXiv one I've used a million times before. 00:05:33.280 |
I'm going to remove all of the columns I don't care about. 00:05:35.840 |
I'm going to keep just ID, text, and metadata; typical format. 00:05:40.000 |
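(A sketch of that dataset step, assuming the Hugging Face `datasets` library; the dataset ID `jamescalam/ai-arxiv` is my guess at the "AI arXiv" dataset mentioned, so treat it as a placeholder.)

```python
from datasets import load_dataset

# Placeholder dataset ID; swap in the AI arXiv dataset you're using
data = load_dataset("jamescalam/ai-arxiv", split="train")

# Keep just the id, text, and metadata columns
data = data.remove_columns(
    [c for c in data.column_names if c not in {"id", "text", "metadata"}]
)
```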
Then I'm going to take my OpenAI API key. 00:05:44.960 |
OK, so that's platform.openai.com if you need one. 00:05:49.360 |
And then this is how you create your new embeddings. 00:06:04.000 |
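(Roughly, the call looks like this; the `embed` helper is my own wrapper rather than the notebook's exact code.)

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # key from platform.openai.com

def embed(texts: list[str], model: str = "text-embedding-3-small") -> list[list[float]]:
    # A single API call can embed a whole batch of inputs
    res = client.embeddings.create(input=texts, model=model)
    return [record.embedding for record in res.data]
```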
We're going to initialize a connection to Pinecone serverless. 00:06:09.760 |
And you can create multiple indices, which is what we need. 00:06:15.360 |
Because I want to test multiple models here with different dimensionalities. 00:06:19.120 |
So that's why I'm using serverless alongside all the other benefits that you get from it as well. 00:06:25.840 |
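(A sketch of that setup with the current Pinecone Python client; the index name, cloud, and region are placeholders, and the dimension must match whichever embedding model the index is for.)

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="...")  # your Pinecone API key

# One index per model under test
pc.create_index(
    name="ada-002",  # placeholder name
    dimension=1536,  # must match the embedding model's output size
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # placeholder
)
index = pc.Index("ada-002")
```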
Now taking a look at this, these are the models we're going to take a look at. 00:06:31.600 |
Using the default dimensions for now, we will try the others pretty soon. 00:06:39.920 |
First, ada-002: well, kind of the original, the v2 of embeddings from OpenAI. 00:06:44.800 |
So this is the one they released in December 2022, with 1,536 dimensions. 00:06:51.280 |
Most of us will be very familiar with that number by now. 00:06:55.120 |
Now the small model uses the same dimensionality, 1,536. 00:07:06.080 |
The other embedding model, the large one, the one with the insane performance gains, 00:07:15.760 |
has 3,072 dimensions. That means it can pack more meaning into that single vector. 00:07:21.040 |
So it makes sense that this one is more performant. 00:07:23.600 |
But what is very cool is that you can compress this down to 256 dimensions. 00:07:31.040 |
And apparently it will still outperform this model here, ada-002. 00:07:35.440 |
And I mean, that is 100% unheard of within vector embeddings. 00:07:41.920 |
Like, 256 dimensions and getting this level of performance is insane. 00:08:03.540 |
And then what I'm going to do is just index everything. 00:08:07.680 |
Now it takes a little bit of time to index everything. 00:08:14.480 |
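(The indexing loop is roughly this shape, reusing the `embed` helper and column names sketched above; again, a reconstruction rather than the exact notebook code.)

```python
from tqdm.auto import tqdm

batch_size = 100  # embed and upsert in batches

for i in tqdm(range(0, len(data), batch_size)):
    batch = data[i : i + batch_size]  # a dict of column -> list slices
    embeds = embed(batch["text"], model="text-embedding-3-small")
    # Pinecone accepts (id, vector, metadata) tuples
    index.upsert(vectors=list(zip(batch["id"], embeds, batch["metadata"])))
```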
we can have a quick look at how long this is taking. 00:08:16.880 |
Because this is also something to consider when you're choosing embedding models. 00:08:22.960 |
So straight away, one, the APIs right now are, I think, pretty slow. 00:08:31.520 |
So I expect during normal times, this number will probably be smaller. 00:08:39.760 |
So for ada-002, I'm getting 15 and 1/2 minutes to embed everything. 00:08:45.200 |
OK, that's to embed and throw everything into Pinecone. 00:08:51.280 |
The small model, OK, probably maybe hasn't been as optimized as ada-002. 00:08:57.120 |
And also maybe more people are using this right now. 00:09:00.160 |
But generally, it's, I mean, pretty comparable speed there. 00:09:04.720 |
As we might expect, embedding with the large model is definitely slower. 00:09:09.600 |
OK, so right now, we're on track for about 24 minutes for that whole thing to embed. 00:09:20.160 |
That also means your embedding latency is going to be higher. 00:09:26.000 |
This is including, like, your network latency and everything. 00:09:29.360 |
And also, you know, going to Pinecone as well. 00:09:37.520 |
But then this one is almost two seconds slower. 00:09:41.520 |
Maybe more like 1.5 seconds slower for a single iteration. 00:09:49.120 |
That will clearly matter if you're using RAG or something like that. 00:09:53.440 |
It's going to slow down that process a little bit. 00:09:55.360 |
Probably not that much compared to, you know, the LLM generation component. 00:10:02.800 |
So I'm going to wait for this to finish and skip ahead to when it has. 00:10:10.640 |
And we now have, OK, it's like 20, just about 24 minutes for that final model. 00:10:19.360 |
It's just going to go through and basically return documents for us. 00:10:24.160 |
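(Something along these lines; the `docs` helper name and the assumption that each record's metadata carries a `text` field are mine.)

```python
def docs(query: str, model: str, top_k: int = 5) -> list[str]:
    # Embed the query with the same model used to build the index,
    # then return the text of the top matches
    xq = embed([query], model=model)[0]
    res = index.query(vector=xq, top_k=top_k, include_metadata=True)
    return [m.metadata["text"] for m in res.matches]

docs("why should I use llama 2?", "text-embedding-ada-002")
```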
So let's try it with ada-002 and see what we get. 00:10:30.320 |
So we're asking about red teaming for Llama 2. 00:11:09.440 |
OK, so it's talking about red team exercises, this and this. 00:11:22.080 |
So, OK, maybe that question is too hard for any model, apparently. 00:11:29.520 |
All right, let's just go with: can you tell me why I'd want to use Llama 2? 00:11:41.120 |
Now, the models usually can get relevant results here. 00:11:46.480 |
So, yeah, straight away with this one, you can see Llama 2 scales up to this. 00:11:57.280 |
Perform better than existing open source models. 00:12:00.320 |
Good, that is, you know, I would hope they can get this one, as Llama 2 can. 00:12:11.200 |
I think it's probably the most relevant or one of the most relevant. 00:12:21.120 |
And then here we get, so this is the large model, excuse me. Is it the same? 00:12:32.960 |
OK, so let's try one where we're comparing LLaMA to GPT-4, and just see how many of these manage it. 00:12:45.600 |
OK, you know, that's like four of the five results that seem relevant. 00:12:50.960 |
Are they actually talking about GPT-4 as well? 00:13:04.960 |
So effectiveness of instruction tuning using GPT-4, but not necessarily comparing to GPT-4. 00:13:12.880 |
OK, this one, I don't see them talking about LLaMA at all. 00:13:20.160 |
This one compares their chatbots' instruction tuning with LLaMA, which LLaMA-GPT4 outperforms. 00:13:31.540 |
Here, OK, so that's a LLaMA fine-tuned on GPT-4 instructions or outputs, but there is... 00:13:44.000 |
So there's like three results that are compared. 00:13:50.160 |
We compare these, OK, relevant, I would say this one. 00:13:59.200 |
All chatbots against GPT-4 comparisons run by a reward model indicator. 00:14:12.880 |
Here, I don't see anything where it's comparing to GPT-4. 00:14:22.800 |
OK, and then here there's, you know, talking kind of like about the comparisons. 00:14:29.200 |
But then the other model was slightly, oh, it's the same. 00:14:37.840 |
We would expect to see more LLaMA, and I think I do. 00:14:41.600 |
So this one has LLaMA in four of those answers. 00:14:58.640 |
And then this final one here, we have, OK, do we have GPT-4? 00:15:11.280 |
And then they have some, I mean, this is a table. 00:15:16.160 |
But it seems like, OK, that is actually a comparison as well. 00:15:25.760 |
That correlates with what we would expect. 00:15:31.280 |
OK, those are the new embedding models from OpenAI. 00:15:34.480 |
I think it's kind of hard to see the performance difference there. 00:15:38.240 |
I mean, you can see a little bit maybe with the large model. 00:15:42.400 |
But given the performance differences we sort of saw in that table, 00:15:45.600 |
at least on multilingual, there's a massive leap up, which is insane. 00:15:50.560 |
I'm looking forward to trying the very small dimensionality vectors. 00:16:00.240 |
I definitely want to try the other models as well that OpenAI have released. 00:16:07.680 |
I hope all this has been interesting and useful. 00:16:10.160 |
So thank you very much for watching and I'll see you again in the next one.