OpenAI's NEW Embedding Models
Chapters
0:00 OpenAI Ada 002
1:25 New OpenAI Embedding Models
3:50 OpenAI Embedding Dimension Parameter
5:04 Using OpenAI Embedding 3
10:08 Comparing Ada 002 to Embed 3
00:00:00.000 |
Way back in December 2022, we had the biggest shift in how we approach AI ever. 00:00:08.480 |
That was thanks to OpenAI releasing ChatGPT at the very end of November. 00:00:14.880 |
ChatGPT quickly caught a lot of people's attention and it was in the month of December 00:00:21.200 |
that the interest in ChatGPT and AI really exploded. 00:00:26.560 |
But right in the middle of December, OpenAI released another model that was also a big deal. 00:00:35.040 |
It didn't get as much notice as ChatGPT, and that model was text-embedding-ada-002. 00:00:45.760 |
Very creative naming, but behind that name is a model that just completely 00:00:52.960 |
changed the way that we do information retrieval for natural language. 00:00:57.760 |
Which covers RAG, and basically any use case where you're retrieving text information. 00:01:06.240 |
Now since then, despite a huge explosion in the number of people using RAG and the really 00:01:12.640 |
cool things that you can do with RAG, OpenAI remained pretty quiet on the embedding model front. 00:01:20.400 |
And there had been no new models since December 2022, until now. 00:01:25.760 |
OpenAI has just released two new embedding models and a ton of other things as well. 00:01:32.960 |
Those two embedding models are called text-embedding-3-small and text-embedding-3-large. 00:01:40.400 |
And when we look at the results that OpenAI is sharing right now, we can see a fairly 00:01:46.960 |
decent improvement on English language embeddings with the MTEB benchmark. 00:01:52.720 |
But perhaps more impressively, we see a massive improvement in the quality of multilingual 00:02:00.320 |
embeddings, which are measured using the MIRACL benchmark. 00:02:03.200 |
Now ada-002, state of the art when it was released and for a very long time afterwards, 00:02:09.440 |
and still a top-performing embedding model, had an average score of 31.4 on MIRACL. 00:02:16.080 |
The new text-embedding-3-large has an average score of 54.9 on MIRACL. 00:02:26.320 |
Now, one of the other things you'll notice looking at these new models is that they have 00:02:32.720 |
not increased the max context window, so the maximum number of tokens that you can feed in. 00:02:38.080 |
That makes a lot of sense with embedding models because what you're trying to do with embeddings 00:02:42.240 |
is trying to compress the meaning of some text into a single point. 00:02:45.680 |
And if you have a larger chunk of text, there's usually many meanings within that text. 00:02:53.360 |
So going large and trying to compress into a single point, you know, those two 00:02:58.880 |
things don't really go together, because that large text can have many meanings. 00:03:03.200 |
So it always makes sense to use smaller chunks and clearly OpenAI are aware of that. 00:03:08.800 |
They're not increasing the maximum number of tokens that you can embed with these models. 00:03:13.600 |
Now, the other thing, which is maybe not as clear to me, is that they have not trained 00:03:20.400 |
these models on more recent data. The knowledge cutoff date is still September 2021, which is a fair while ago now. 00:03:26.560 |
And okay, for embedding models, maybe that isn't quite as important as it is for LLMs, 00:03:33.920 |
but it's still good to have some context of recent events when you're trying to embed meaning. 00:03:38.400 |
So things like COVID: you ask a COVID question, and these models, I imagine, are probably not 00:03:44.480 |
going to perform as well as, say, Cohere's embedding models, which have been trained on more recent data. 00:03:52.560 |
And one thing which I think is probably the most impressive thing that I've seen so far 00:03:58.640 |
is we're now able to decide how many dimensions we'd like in our vectors. 00:04:07.680 |
If you reduce the number of dimensions, you're going to get reduced-quality embeddings. 00:04:12.320 |
But what is incredibly interesting, and I almost don't quite believe it yet, I still 00:04:19.440 |
need to test this, is that they're saying that the large model, text-embedding-3-large, 00:04:25.840 |
you can cut it down from 3,072 dimensions, which is larger than the previous models. 00:04:32.240 |
You can cut that down to 256 dimensions and still outperform ada-002, which is a 1,536-dimension model. 00:04:43.760 |
Compressing all of that performance into 256 floating point numbers is insane. 00:04:56.160 |
Not right now, but I'm going to test that and just prove to myself that that is possible. 00:05:01.200 |
I'm a little bit skeptical, but if so, incredible. 00:05:04.800 |
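(For reference, here's a minimal sketch of how that works with the official `openai` Python client, v1+. The query string is made up, but `dimensions` is the real request parameter for the text-embedding-3 models.)

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Request a 256-dimension vector instead of the model's default 3,072
res = client.embeddings.create(
    model="text-embedding-3-large",
    input="why should I use llama 2?",  # example query, made up
    dimensions=256,
)
vector = res.data[0].embedding
print(len(vector))  # -> 256
```

(OpenAI's announcement also notes you can get the same effect by truncating a full-size vector yourself and re-normalizing it.)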
OK, so with that out of the way, let's jump into how we might use this new model. 00:05:10.480 |
OK, so jumping right into it, we have this notebook. 00:05:13.280 |
I'm going to share a link with you in the description, 00:05:17.440 |
and I will try and get a link added to the video as well. 00:05:20.400 |
And the first thing I'm going to do is download the dataset. 00:05:27.840 |
OK, so I'm using this AI arXiv one I've used a million times before. 00:05:33.280 |
I'm going to remove all of the columns I don't care about. 00:05:35.840 |
I'm going to keep just ID, text, and metadata; typical format. 00:05:40.000 |
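(A sketch of that dataset step, assuming the Hugging Face `datasets` library; the dataset ID `jamescalam/ai-arxiv` is my guess at the "AI arXiv" dataset mentioned, so treat it as a placeholder.)

```python
from datasets import load_dataset

# Placeholder dataset ID; swap in the AI arXiv dataset you're using
data = load_dataset("jamescalam/ai-arxiv", split="train")

# Keep just the id, text, and metadata columns
data = data.remove_columns(
    [c for c in data.column_names if c not in {"id", "text", "metadata"}]
)
```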
Then I'm going to take my OpenAI API key. 00:05:44.960 |
OK, so that's platform.openai.com if you need one. 00:05:49.360 |
And then this is how you create your new embeddings. 00:06:04.000 |
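(Roughly, the call looks like this; the `embed` helper is my own wrapper rather than the notebook's exact code.)

```python
from openai import OpenAI

client = OpenAI(api_key="sk-...")  # key from platform.openai.com

def embed(texts: list[str], model: str = "text-embedding-3-small") -> list[list[float]]:
    # A single API call can embed a whole batch of inputs
    res = client.embeddings.create(input=texts, model=model)
    return [record.embedding for record in res.data]
```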
We're going to initialize a connection to Pinecone serverless. 00:06:09.760 |
And you can create multiple indices, which is what we need. 00:06:15.360 |
Because I want to test multiple models here with different dimensionalities. 00:06:19.120 |
So that's why I'm using serverless alongside all the other benefits that you get from it as well. 00:06:25.840 |
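(A sketch of that setup with the current Pinecone Python client; the index name, cloud, and region are placeholders, and the dimension must match whichever embedding model the index is for.)

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="...")  # your Pinecone API key

# One index per model under test
pc.create_index(
    name="ada-002",  # placeholder name
    dimension=1536,  # must match the embedding model's output size
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # placeholder
)
index = pc.Index("ada-002")
```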
Now taking a look at this, these are the models we're going to take a look at. 00:06:31.600 |
Using the default dimensions for now, we will try the others pretty soon. 00:06:39.920 |
First, ada-002: well, kind of the original, the v2 of embeddings from OpenAI. 00:06:44.800 |
So this is the one they released in December 2022, with 1,536 dimensions. 00:06:51.280 |
Most of us will be very familiar with that number by now. 00:06:55.120 |
Now the small model uses the same dimensionality, 1,536. 00:07:06.080 |
The other embedding model, the large one, the one with the insane performance gains, 00:07:15.760 |
has 3,072 dimensions. That means it can pack more meaning into that single vector. 00:07:21.040 |
So it makes sense that this one is more performant. 00:07:23.600 |
But what is very cool is that you can compress this down to 256 dimensions. 00:07:31.040 |
And apparently it will still outperform this model here, ada-002. 00:07:35.440 |
And I mean, that is 100% unheard of within vector embeddings. 00:07:41.920 |
Like, 256 dimensions and getting this level of performance is insane. 00:08:03.540 |
And then what I'm going to do is just index everything. 00:08:07.680 |
Now it takes a little bit of time to index everything. 00:08:14.480 |
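(The indexing loop is roughly this shape, reusing the `embed` helper and column names sketched above; again, a reconstruction rather than the exact notebook code.)

```python
from tqdm.auto import tqdm

batch_size = 100  # embed and upsert in batches

for i in tqdm(range(0, len(data), batch_size)):
    batch = data[i : i + batch_size]  # a dict of column -> list slices
    embeds = embed(batch["text"], model="text-embedding-3-small")
    # Pinecone accepts (id, vector, metadata) tuples
    index.upsert(vectors=list(zip(batch["id"], embeds, batch["metadata"])))
```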
we can have a quick look at how long this is taking. 00:08:16.880 |
Because this is also something to consider when you're choosing embedding models. 00:08:22.960 |
So straight away, one, the APIs right now are, I think, pretty slow. 00:08:31.520 |
So I expect during normal times, this number will probably be smaller. 00:08:39.760 |
So for ada-002, I'm getting 15 and 1/2 minutes to embed everything. 00:08:45.200 |
OK, that's to embed and throw everything into Pinecone. 00:08:51.280 |
The small model, OK, probably maybe hasn't been as optimized as ada-002. 00:08:57.120 |
And also maybe more people are using this right now. 00:09:00.160 |
But generally, it's, I mean, pretty comparable speed there. 00:09:04.720 |
As we might expect, embedding with the large model is definitely slower. 00:09:09.600 |
OK, so right now, we're on track for about 24 minutes for that whole thing to embed. 00:09:20.160 |
That also means your embedding latency is going to be higher. 00:09:26.000 |
This is including, like, your network latency and everything. 00:09:29.360 |
And also, you know, going to Pinecone as well. 00:09:37.520 |
But then this one is almost two seconds slower. 00:09:41.520 |
Maybe more like 1.5 seconds slower for a single iteration. 00:09:49.120 |
That will clearly matter if you're using RAG or something like that. 00:09:53.440 |
It's going to slow down that process a little bit. 00:09:55.360 |
Probably not that much compared to, you know, the LLM generation component. 00:10:02.800 |
So I'm going to wait for this to finish and skip ahead to when it has. 00:10:10.640 |
And we now have, OK, it's like 20, just about 24 minutes for that final model. 00:10:19.360 |
It's just going to go through and basically return documents for us. 00:10:24.160 |
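(Something along these lines; the `docs` helper name and the assumption that each record's metadata carries a `text` field are mine.)

```python
def docs(query: str, model: str, top_k: int = 5) -> list[str]:
    # Embed the query with the same model used to build the index,
    # then return the text of the top matches
    xq = embed([query], model=model)[0]
    res = index.query(vector=xq, top_k=top_k, include_metadata=True)
    return [m.metadata["text"] for m in res.matches]

docs("why should I use llama 2?", "text-embedding-ada-002")
```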
So let's try it with ada-002 and see what we get. 00:10:30.320 |
So we're asking about red teaming for Llama 2. 00:11:09.440 |
OK, so it's talking about red team exercises, this and this. 00:11:22.080 |
So, OK, maybe that question is too hard for any model, apparently. 00:11:29.520 |
All right, let's just go with: can you tell me why I'd want to use Llama 2? 00:11:41.120 |
Now, the models usually can get relevant results here. 00:11:46.480 |
So, yeah, straight away with this one, you can see Llama 2 scales up to this. 00:11:57.280 |
Perform better than existing open source models. 00:12:00.320 |
Good, that is, you know, I would hope they can get this one, as Llama 2 can. 00:12:11.200 |
I think it's probably the most relevant or one of the most relevant. 00:12:21.120 |
And then here we get, so this is the large model, excuse me. Is it the same? 00:12:32.960 |
OK, so let's try one where we're comparing LLaMA to GPT-4, and just see how many of these manage it. 00:12:45.600 |
OK, you know, that's like four of the five results that seem relevant. 00:12:50.960 |
Are they actually talking about GPT-4 as well? 00:13:04.960 |
So effectiveness of instruction tuning using GPT-4, but not necessarily comparing to GPT-4. 00:13:12.880 |
OK, this one, I don't see them talking about LLaMA at all. 00:13:20.160 |
This one compares their chatbots' instruction tuning with LLaMA, which LLaMA-GPT4 outperforms. 00:13:31.540 |
Here, OK, so that's a LLaMA fine-tuned on GPT-4 instructions or outputs, but there is... 00:13:44.000 |
So there's like three results that are compared. 00:13:50.160 |
We compare these, OK, relevant, I would say this one. 00:13:59.200 |
All chatbots against GPT-4 comparisons run by a reward model indicator. 00:14:12.880 |
Here, I don't see anything where it's comparing to GPT-4. 00:14:22.800 |
OK, and then here there's, you know, talking kind of like about the comparisons. 00:14:29.200 |
But then the other model was slightly, oh, it's the same. 00:14:37.840 |
We would expect to see more LLaMA, and I think I do. 00:14:41.600 |
So this one has LLaMA in four of those answers. 00:14:58.640 |
And then this final one here, we have, OK, do we have GPT-4? 00:15:11.280 |
And then they have some, I mean, this is a table. 00:15:16.160 |
But it seems like, OK, that is actually a comparison as well. 00:15:25.760 |
That correlates with what we would expect. 00:15:31.280 |
OK, those are the new embedding models from OpenAI. 00:15:34.480 |
I think it's kind of hard to see the performance difference there. 00:15:38.240 |
I mean, you can see a little bit maybe with the large model. 00:15:42.400 |
But given the performance differences we sort of saw in that table, 00:15:45.600 |
at least on multilingual, there's a massive leap up, which is insane. 00:15:50.560 |
I'm looking forward to trying the very small dimensionality vectors. 00:16:00.240 |
I definitely want to try the other models as well that OpenAI have released. 00:16:07.680 |
I hope all this has been interesting and useful. 00:16:10.160 |
So thank you very much for watching and I'll see you again in the next one.