Cohere vs. OpenAI embeddings — multilingual search
Chapters
0:00 What are Cohere embeddings
0:46 Cohere v OpenAI on cost
4:37 Cohere v OpenAI on performance
6:37 Implementing Cohere multilingual model
7:55 Data prep and embedding
10:45 Creating a vector index with Pinecone
14:07 Embedding and indexing everything
17:24 Making multilingual queries
21:55 Final thoughts on Cohere and OpenAI
00:00:00.000 |
Today, we're going to take a look at Cohere's multilingual embedding model. 00:00:04.560 |
For those of you that are not aware of Cohere, 00:00:07.600 |
they are kind of similar to OpenAI in that they are essentially a service provider 00:00:13.360 |
of large language models and all of the services that come with that. 00:00:18.080 |
Now, right now they are not as well known as OpenAI, 00:00:22.840 |
which is understandable, OpenAI has been around for a bit longer, 00:00:26.120 |
but Cohere is actually a really good company that offers a lot of really good tooling 00:00:31.600 |
that is actually very much comparable to what OpenAI offers. 00:00:36.400 |
And that's actually the first thing I want to look at here. 00:00:38.600 |
I just want to show you a few comparison points 00:00:41.600 |
between Cohere and OpenAI in terms of embedding models. 00:00:46.200 |
Okay, so we're going to first take a look at the cost between these two. 00:00:49.640 |
OpenAI's sort of premier embedding model right now is Ada-002, 00:00:58.120 |
Cohere doesn't have a per-1,000-tokens cost, 00:01:03.280 |
it actually goes with $1 per 1,000 embeddings. 00:01:09.360 |
Well, basically every call, or every chunk of text that you ask Cohere to embed, counts as one embedding. 00:01:16.480 |
So one embedding, the maximum size of that is actually just over 4,000 tokens. 00:01:26.080 |
So if you max that out, as in you are sending 4,000 tokens with every embedding call, 00:01:31.720 |
then that means you would be getting this comparable price here, 00:01:36.400 |
which is actually half price, which is pretty good. 00:01:41.320 |
Now, if we kind of translate this into something that's a bit more understandable, 00:01:45.520 |
we have like 13 paragraphs is roughly about 1,000 tokens. 00:01:51.440 |
So with Ada, with OpenAI, it's $1 per 32,500 paragraphs. 00:02:02.720 |
With Cohere, if you max out each embedding call, you cover far more paragraphs per dollar, which is really good, but there is obviously a catch. 00:02:15.360 |
The chances are you're probably not going to send 4,000 tokens with every embedding call. 00:02:22.400 |
So 2,000 tokens, well, that's probably like 26 paragraphs. 00:02:29.120 |
realistically, you're probably going to do much less, right? 00:02:32.800 |
So if, let's say, you're going for more like 1,000 tokens per embedding, 00:02:41.600 |
then Cohere is actually double the price of OpenAI in this instance. 00:02:48.200 |
So it kind of depends on what you're doing there, 00:02:51.200 |
as to whether you are throwing a load of text into each embedding call or not. 00:02:59.200 |
Cohere can be cheaper, but it can also be more expensive, 00:03:13.280 |
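To sanity-check that cost math, here is a quick back-of-envelope calculation, a sketch that assumes the prices as quoted in the video ($1 per 32,500 paragraphs for Ada-002, roughly 13 paragraphs per 1,000 tokens, and $1 per 1,000 embed calls for Cohere); check the current pricing pages before relying on these numbers:

```python
# Prices as quoted in the video; both may have changed since recording.
OPENAI_PER_1K_TOKENS = 1 / 2500   # $1 / 32,500 paragraphs ~= $0.0004 per 1k tokens
COHERE_PER_EMBEDDING = 1 / 1000   # $1 per 1,000 embedding calls

def cost_per_1k_tokens_cohere(tokens_per_call: int) -> float:
    """Effective Cohere cost per 1,000 tokens when each embed call
    carries `tokens_per_call` tokens of text."""
    return COHERE_PER_EMBEDDING * 1000 / tokens_per_call

for n in (4000, 2000, 1000):
    print(n, "tokens/call ->", cost_per_1k_tokens_cohere(n), "per 1k tokens")
# At ~4,000 tokens per call Cohere works out cheaper than Ada-002;
# at ~1,000 tokens per call it works out more expensive.
```

The crossover point is simply wherever the tokens-per-call figure makes the two per-1,000-token costs equal, which is why packing more text into each call tilts the comparison toward Cohere.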
Essentially, you can run your own AWS instance. 00:03:22.720 |
in the time that it would take you to encode 1 billion paragraphs, 00:03:32.120 |
It's also a lot quicker, and there are the other benefits as well. 00:03:46.200 |
this is a good indicator of how much it's going to cost you. 00:03:53.760 |
The larger your vectors, the more storage you need to store all of your embeddings. 00:04:03.840 |
So Cohere is half the size of OpenAI in this case. 00:04:10.440 |
you would probably actually be saving money with Cohere 00:04:14.200 |
with this embedding size if you're storing a lot of vectors. 00:04:18.920 |
So, you know, that's definitely something to consider. 00:04:22.120 |
Like if you consider this with the embedding cost initially, 00:04:25.720 |
you know, maybe you're actually saving money with Cohere, 00:04:28.440 |
even if you're just embedding like 1,000 tokens at a time. 00:04:34.320 |
Long-term, you're probably going to end up saving money. 00:04:50.120 |
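As a rough sketch of that storage saving, assuming Cohere's multilingual embeddings are 768-dimensional, Ada-002's are 1,536-dimensional, and both are stored as 32-bit floats (index overhead excluded):

```python
BYTES_PER_FLOAT32 = 4

def storage_gb(num_vectors: int, dims: int) -> float:
    """Raw storage for `num_vectors` dense float32 vectors, in gigabytes."""
    return num_vectors * dims * BYTES_PER_FLOAT32 / 1e9

n = 10_000_000  # ten million vectors
print("Cohere (768d): ", storage_gb(n, 768), "GB")
print("Ada-002 (1536d):", storage_gb(n, 1536), "GB")
# The 768-dim vectors need exactly half the raw storage of the 1,536-dim ones.
```

Since vector-database pricing scales with how many vectors fit on a pod, halving the dimensionality roughly halves the storage bill, which is the saving being described here.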
And, okay, I mean, Cohere for sure is coming out on top here, 00:04:56.640 |
though it's hard to say whether this is representative across the board or not. 00:05:01.640 |
But nonetheless, the two models that are comparable here are Cohere's multilingual model 00:05:08.040 |
and OpenAI's Ada-002 model, which is English. 00:05:15.320 |
So it's pretty interesting that OpenAI's best English language model 00:05:19.080 |
is comparable to Cohere's multilingual model. 00:05:28.320 |
It's like imagine you retrieve all of your items 00:05:34.320 |
and you feed them into like a transformer model, 00:05:40.640 |
but generally speaking, it will be more accurate. 00:05:43.960 |
So I think they are pretty interesting results. 00:05:54.600 |
but it seems like Cohere, at least from what I've seen here, 00:05:58.560 |
is slightly ahead of OpenAI in terms of performance, 00:06:05.960 |
which is not the best comparison, in all fairness, 00:06:19.120 |
and it's going to depend a lot on your particular use case. 00:06:23.720 |
So it's not that Cohere is better than OpenAI. 00:06:26.280 |
It's just that in some cases, they probably are better. 00:06:30.480 |
And in some cases, they're probably cheaper as well. 00:06:37.040 |
Now, how do we actually use Cohere for embeddings? 00:06:41.160 |
So we're going to be focusing on the Cohere multilingual model. 00:06:44.360 |
And this example we're going to be running through is 00:06:53.600 |
based on a webinar that we are doing together, 00:06:59.440 |
and I've just kind of reformatted it in a way 00:07:05.160 |
that lets me show you, and kind of focus on, the multilingual search component of Cohere. 00:07:15.840 |
Right, so the first thing we need to do is our pip installs. 00:07:22.240 |
We need the Hugging Face datasets library, and the Cohere and Pinecone clients. 00:07:25.200 |
We're using the gRPC client so that we can upsert things faster. 00:07:35.720 |
So a couple of things to point out with Cohere's multilingual model: 00:07:42.960 |
it supports around 100 languages, but I think the benchmarks that they've tested it on 00:07:47.120 |
cover 16 of those languages or something around there. 00:07:51.520 |
And of course, you can create embeddings for longer chunks of text. 00:07:55.360 |
And this is the dataset we're going to be using. 00:07:58.280 |
It's some scraped data from Wikipedia that Nils put together, I believe. 00:08:04.120 |
And it's just hosted under Cohere on Hugging Face datasets. 00:08:11.560 |
For now, we're just going to look at the English and Italian subsets, 00:08:13.800 |
and we're going to see how we would embed those and create a search with them. 00:08:17.800 |
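As a sketch of the loading step, assuming the dataset in question is the Wikipedia dump Cohere hosts on Hugging Face (the dataset name `Cohere/wikipedia-22-12` here is my assumption; check the notebook used in the video), streaming one language at a time might look like this, with a tiny made-up offline sample as a fallback:

```python
import os

def load_wiki(lang: str):
    """Stream Wikipedia passages for one language. Falls back to a
    hard-coded illustrative sample when streaming is not enabled, so
    the rest of the pipeline can be tried offline."""
    if os.environ.get("STREAM_HF_DATA"):
        from datasets import load_dataset  # pip install datasets
        return load_dataset("Cohere/wikipedia-22-12", lang,
                            split="train", streaming=True)
    # Illustrative offline sample; these are not real dataset rows.
    return [
        {"id": 0, "title": "Sample", "text": f"A sample {lang} passage.",
         "url": "https://example.org"},
    ]

for record in load_wiki("en"):
    print(record["text"])
```

Streaming mode avoids downloading the full dump up front, which matters for a dataset of this size.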
And then what I'm going to do is switch across to an example 00:08:38.040 |
So if we're embedding these chunks one at a time, 00:08:41.000 |
maybe it would be more expensive using Cohere. 00:08:43.680 |
But I think, in reality, we could put a lot more of these together. 00:08:48.280 |
So we could put together like five of these chunks or more 00:09:18.880 |
From there, you take your API key and you just put it in here. 00:09:29.920 |
Cool. Then this is how you would embed something, right? 00:09:33.520 |
So we have a list of texts that we would like to embed 00:09:43.000 |
So co is just the client that we've initialized up here. 00:09:46.040 |
So co.embed takes your texts, and then you have your model. 00:09:50.120 |
This is the only multilingual model that Cohere offers at the moment. 00:09:54.920 |
But, I mean, if you compare that to OpenAI right now, they don't offer a multilingual embedding model at all. 00:10:01.200 |
So I think Cohere have taken the lead with that, which is pretty cool. 00:10:10.440 |
It gives us a response and it has a lot of information in there. 00:10:17.560 |
And then we see the dimensionality of those embeddings, which is 768, 00:10:25.400 |
And then we have two of those vector embeddings there, right? 00:10:34.720 |
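Putting that together, a minimal sketch of the embed call, assuming the model name `multilingual-22-12` (the multilingual model Cohere offered at the time of recording) and falling back to a deterministic stub when no `COHERE_API_KEY` is set, so the shape logic can be checked offline:

```python
import os

def embed(texts):
    """Embed with Cohere's multilingual model; uses a zero-vector stub
    when no API key is set so the pipeline can be exercised offline."""
    api_key = os.environ.get("COHERE_API_KEY")
    if api_key:
        import cohere  # pip install cohere
        co = cohere.Client(api_key)
        # Model name as used at the time of recording; check current docs.
        return co.embed(texts=texts, model="multilingual-22-12").embeddings
    # Offline stub: 768 dimensions, matching the model's output size.
    return [[0.0] * 768 for _ in texts]

vecs = embed(["Hello world", "Ciao mondo"])
print(len(vecs), "vectors of dimension", len(vecs[0]))
```

The response object also carries other information, but `.embeddings` is the part we need downstream.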
All right, now that's how we would use Cohere's embedding model. 00:10:40.080 |
But before we move on to actually creating our index, 00:10:43.120 |
where we're going to store all of those embeddings, 00:10:47.320 |
So we're going to be using a vector database called Pinecone for this. 00:10:52.000 |
Now, for Pinecone, again, we need an API key, which we can get from over here. 00:11:06.920 |
So come over here, I can already see I have a couple of indexes in here. 00:11:11.480 |
If this is your first time using Pinecone, it will be empty, 00:11:14.400 |
and that's fine because we're going to create the index in the code. 00:11:23.920 |
You would copy your API key, take it over into your notebook, and you would paste it here. 00:11:23.920 |
Now, your environment is next to the API key in the console, right? 00:11:40.480 |
Your environment is not necessarily going to be the same as mine. 00:11:45.720 |
Okay, great. So that has initialized, and then we come down here, 00:11:49.920 |
and what we're going to do here is initialize an index, 00:11:53.080 |
which is where we're going to store all of these embeddings. 00:11:56.160 |
Now, you give your index a name. It doesn't matter what you call it. 00:12:02.240 |
But there are a few things that are important here 00:12:09.080 |
Dimension is the dimensionality of your embedding, which is 768 for this model. 00:12:25.000 |
The smaller that is, the cheaper it's going to be to store all of your vectors. 00:12:30.400 |
So we need that, and our index needs to know this value. 00:12:34.520 |
So it needs to know the expected dimensionality 00:12:39.200 |
Then we have our metric, which is dot product. 00:12:42.200 |
This is needed by Cohere's multilingual model. 00:12:47.520 |
If you look on the, I think, the about page for the multilingual model, it tells you to use dot product. 00:12:54.080 |
And then these parameters here, you can actually leave them empty. 00:13:03.040 |
So S1 is basically the storage-optimized pod for Pinecone, 00:13:08.560 |
which means you can put in about 5 million vectors in here 00:13:14.480 |
And then there's also P1, which is like the speed-optimized version, 00:13:19.280 |
which enables you to put in around 1 million vectors for free. 00:13:25.240 |
And then pods is the number of those pods you need. 00:13:28.040 |
So if you needed 10 million vectors, we'd say, "Okay, we need two pods here." 00:13:33.000 |
Cool, but we just need one. We're not putting that much in there. 00:13:39.640 |
We use this GRPCIndex; we could also use the standard Index, 00:13:44.040 |
but the gRPC index is just more stable and a bit faster. 00:13:51.040 |
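A sketch of the index setup described above, assuming the `pinecone-client` 2.x API in use at the time of recording (newer client versions differ); the real calls only run when a `PINECONE_API_KEY` is present, and the environment value here is just a placeholder:

```python
import os

index_name = "cohere-wiki"   # the name doesn't matter
dimension = 768              # must match the embedding model's output size
metric = "dotproduct"        # recommended for Cohere's multilingual model

if os.environ.get("PINECONE_API_KEY"):
    import pinecone  # pip install "pinecone-client[grpc]"
    pinecone.init(
        api_key=os.environ["PINECONE_API_KEY"],
        # Your environment sits next to the API key in the console;
        # "us-east1-gcp" is only a placeholder.
        environment=os.environ.get("PINECONE_ENV", "us-east1-gcp"),
    )
    if index_name not in pinecone.list_indexes():
        pinecone.create_index(index_name, dimension=dimension,
                              metric=metric, pod_type="s1", pods=1)
    index = pinecone.GRPCIndex(index_name)
    # On a freshly created index this reports a vector count of zero.
    print(index.describe_index_stats())
```

The key detail is that `dimension` and `metric` are fixed at creation time, so they have to agree with the embedding model before anything is upserted.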
And then we're going to describe the index stats. 00:13:56.760 |
So for you, when you're running through this the first time, the total vector count here will be zero. 00:14:07.640 |
Now, with the embedding model and vector index itself, 00:14:11.920 |
we can move on to actually indexing everything. 00:14:14.360 |
So basically, we're just going to loop through our dataset, 00:14:20.760 |
So we're going to embed things with Cohere, 00:14:23.760 |
and then what we're going to do is with those embeddings, 00:14:28.840 |
Actually, I don't think I showed you how we do that, 00:14:47.680 |
The line limit, so this is the number of records 00:14:50.480 |
from each language that we would like to include, 00:14:55.320 |
We have our data here, so I'm just formatting this 00:15:02.240 |
And errors, and this is just so we can store a few errors, 00:15:05.440 |
because every now and again, we might hit one, 00:15:09.960 |
It's not necessary, but there are ways to avoid it, basically, 00:15:13.120 |
and they're not that hard, but for simplicity's sake, we'll just log them. 00:15:16.960 |
So here, I'm just saying, don't go over the line limit, 00:15:21.520 |
and then we're going through English and Italian one at a time. 00:15:29.360 |
So it's actually just the iterable of the data, 00:15:49.200 |
ID, and also including text in there, as well. 00:15:53.800 |
Then what we do is we create this metadata list of dictionaries. 00:15:58.320 |
Now, each dictionary is going to contain some text, 00:16:03.920 |
the URL of the record, and also the language, 00:16:08.520 |
Then what we do is we add everything like this. 00:16:15.160 |
There's nothing too complicated going on there. 00:16:19.160 |
The one thing that I have added in there is that occasionally some of these text chunks are very long. 00:16:30.600 |
For some reason, not all of them are split evenly, 00:16:37.960 |
and they actually exceed the metadata limit in Pinecone. 00:16:45.080 |
So basically, we can add up to around 10 kilobytes of text to each record's metadata, 00:16:53.440 |
but some of them go over that, and they will throw an error. 00:16:56.680 |
So I'm actually, for now, I'm just skipping those. 00:17:00.840 |
The better way of handling it is you would chunk those larger chunks of text 00:17:03.920 |
into smaller chunks, and then just add them individually. 00:17:24.440 |
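The indexing loop described above can be sketched like this, with a stub in place of the real Cohere embed call (labeled as such), batched upserts, and a skip for any record whose metadata would exceed roughly 10 kB; the batch size and exact limit are assumptions:

```python
import json

METADATA_LIMIT_BYTES = 10_000   # approximate Pinecone per-record metadata limit
BATCH_SIZE = 64

def build_batches(records, lang):
    """Yield batches of (id, metadata) pairs, skipping any record whose
    metadata would exceed the limit (better: split it into smaller chunks)."""
    batch, skipped = [], []
    for rec in records:
        meta = {"text": rec["text"], "url": rec["url"], "language": lang}
        if len(json.dumps(meta).encode("utf-8")) > METADATA_LIMIT_BYTES:
            skipped.append(rec["id"])   # logged, not upserted
            continue
        batch.append((f"{lang}-{rec['id']}", meta))
        if len(batch) == BATCH_SIZE:
            yield batch
            batch = []
    if batch:
        yield batch

def embed_stub(texts):
    """Stand-in for co.embed(texts=..., model=...).embeddings."""
    return [[0.0] * 768 for _ in texts]

records = [{"id": i, "text": f"passage {i}", "url": "https://example.org"}
           for i in range(3)]
for batch in build_batches(records, "en"):
    vectors = embed_stub([meta["text"] for _, meta in batch])
    upserts = [(rec_id, vec, meta)
               for (rec_id, meta), vec in zip(batch, vectors)]
    # index.upsert(vectors=upserts)   # the real call, needs the Pinecone index
    print("would upsert", len(upserts), "records")
```

Prefixing the IDs with the language code keeps English and Italian records distinct even when the source IDs overlap.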
Now, what we're going to do, and this is the more interesting part, is querying. 00:17:30.760 |
So to search through, what we do is we take a query, 00:17:39.760 |
So embed is exactly the same as what we did before 00:17:43.280 |
with cohere, and then we query with that embedding, xq here. 00:17:48.800 |
And we return the top three most similar items. 00:18:01.160 |
And then we return it in this kind of format. 00:18:27.840 |
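The query side can be sketched like this; the fake response below is illustrative data in the shape Pinecone returns (in the real notebook you would embed the query with Cohere and call `index.query` instead), and `format_results` is just a hypothetical helper that flattens it into readable rows:

```python
def format_results(response):
    """Flatten a Pinecone-style query response into (score, language, text) rows."""
    return [
        (round(m["score"], 2), m["metadata"]["language"], m["metadata"]["text"])
        for m in response["matches"]
    ]

# In the real notebook:
#   xq = co.embed(texts=[query], model="multilingual-22-12").embeddings
#   response = index.query(xq, top_k=3, include_metadata=True)
fake_response = {
    "matches": [
        {"score": 0.91,
         "metadata": {"language": "it", "text": "L'arancino e una specialita siciliana..."}},
        {"score": 0.87,
         "metadata": {"language": "en", "text": "Arancini are Sicilian rice balls..."}},
    ]
}
for row in format_results(fake_response):
    print(row)
```

Because both the query and the stored passages go through the same multilingual model, an English query can surface Italian passages, which is the behavior demonstrated next.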
because we don't have that much data in here, 00:19:02.320 |
but if you look on Wikipedia for him in English, 00:19:08.440 |
but there isn't really that much information there. 00:19:12.720 |
we're just getting like Italian results here. 00:19:17.760 |
So this is another one that I think in the English Wikipedia, 00:19:35.480 |
So let's switch across to the larger data set, 00:19:38.240 |
and I'll show you what the results look like there, 00:20:18.680 |
Right, we get this, literally three paragraphs. 00:20:29.200 |
or why being able to search the Italian stuff is useful, 00:20:39.520 |
So I'm going to ask what is Arancino, but I'm going to spell it wrong 00:20:43.720 |
just to point out the fact that it can actually handle that. 00:20:59.920 |
I kind of half expected to get it wrong anyway. 00:21:02.760 |
All right, so it's, we can go on here, see what it says. 00:21:07.160 |
So Arancino is a speciality of Sicilian cuisine. 00:21:22.840 |
So Arancino, pizza, and fiori di zucca, it's amazing. 00:21:55.760 |
Okay, so that's it for this introduction to Cohere. 00:21:59.680 |
I feel like it was a bit longer than I had intended it to be, 00:22:03.560 |
but that's fine, I'm hoping that it was at least useful 00:22:06.400 |
and we kind of went through a lot of things there. 00:22:19.120 |
I think that is very much going to depend on your use case, 00:22:22.120 |
what you're doing, and many other factors, right? 00:22:29.400 |
then you're probably going to get some pretty good performance as well. 00:22:36.600 |
is actually the multilingual aspect of this model. 00:22:40.320 |
At the moment, OpenAI doesn't have any multilingual embedding models, 00:22:46.160 |
although some of their models, I think, can handle multilingual queries relatively well, 00:22:56.760 |
especially when you're dealing with multinational companies 00:22:59.880 |
or just companies that are not American or British or Australian. 00:23:07.560 |
The rest of the world speaks different languages, 00:23:10.640 |
so having this multilingual model is pretty good. 00:23:16.120 |
So, yeah, I mean, this is still very early days for Cohere. 00:23:21.520 |
I'm pretty excited. I know they have a lot planned 00:23:31.840 |
I hope all this has been useful and interesting.