
Cohere vs. OpenAI embeddings — multilingual search


Chapters

0:00 What are Cohere embeddings
0:46 Cohere v OpenAI on cost
4:37 Cohere v OpenAI on performance
6:37 Implementing Cohere multilingual model
7:55 Data prep and embedding
10:45 Creating a vector index with Pinecone
14:07 Embedding and indexing everything
17:24 Making multilingual queries
21:55 Final thoughts on Cohere and OpenAI


00:00:00.000 | Today, we're going to take a look at Cohere's multilingual embedding model.
00:00:04.560 | For those of you that are not aware of Cohere,
00:00:07.600 | they are kind of similar to OpenAI in that they are essentially a service provider
00:00:13.360 | of large language models and all of the services that come with that.
00:00:18.080 | Now, right now they are not as well known as OpenAI,
00:00:22.840 | which is understandable, OpenAI has been around for a bit longer,
00:00:26.120 | but Cohere is actually a really good company that offers a lot of really good tooling
00:00:31.600 | that is actually very much comparable to what OpenAI offers.
00:00:36.400 | And that's actually the first thing I want to look at here.
00:00:38.600 | I just want to show you a few comparison points
00:00:41.600 | between Cohere and OpenAI in terms of embedding models.
00:00:46.200 | Okay, so we're going to first take a look at the cost between these two.
00:00:49.640 | OpenAI's sort of premier embedding model right now is Ada-002,
00:00:54.720 | it comes out to this much per 1,000 tokens.
00:00:58.120 | Cohere doesn't have a per-1,000-tokens cost,
00:01:03.280 | it actually goes with $1 per 1,000 embeddings.
00:01:07.560 | What does one embedding mean?
00:01:09.360 | Well, basically every call or every chunk of text that you ask Cohere to embed,
00:01:15.160 | that is one embedding.
00:01:16.480 | So one embedding, the maximum size of that is actually just over 4,000 tokens.
00:01:23.000 | So if you're maxing out every embedding,
00:01:26.080 | as in you are sending 4,000 tokens to every embedding call,
00:01:31.720 | then that means you would be getting this comparable price here,
00:01:36.400 | which is actually half price, which is pretty good.
00:01:41.320 | Now, if we kind of translate this into something that's a bit more understandable,
00:01:45.520 | we have that roughly 13 paragraphs is about 1,000 tokens.
00:01:50.200 | These are the prices, right?
00:01:51.440 | So with Ada, with OpenAI, it's $1 per 32,500 paragraphs.
00:01:57.280 | Cohere is actually $1 per 65,000 paragraphs,
00:02:02.720 | which is really good, but there is obviously a catch,
00:02:06.680 | which is this thing up here, or this.
00:02:11.160 | $1 per 1,000 embeddings, right?
00:02:15.360 | The chances are you're probably not going to use 4,000 tokens
00:02:19.760 | with every embedding call to Cohere.
00:02:22.400 | So 2,000 tokens, well, that's probably like 26 paragraphs.
00:02:26.560 | If you're embedding 26 paragraphs at a time,
00:02:29.120 | realistically, you're probably going to do much less, right?
00:02:32.800 | So if, let's say, you're going for more like 1,000 tokens,
00:02:37.120 | which I think is more realistic,
00:02:39.120 | then obviously the price of Cohere
00:02:41.600 | is actually double the price of OpenAI in this instance.
00:02:48.200 | So it kind of depends on what you're doing there,
00:02:51.200 | as to whether you are throwing a load of text
00:02:53.880 | into your embeddings or not.
00:02:56.560 | So I think the costs are pretty comparable.
00:02:59.200 | Cohere can be cheaper, but it can also be more expensive,
00:03:03.040 | according to this logic anyway.
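To make that arithmetic concrete, here is a minimal sketch in Python. All prices are recording-time assumptions derived from the figures quoted above ($1 per 32,500 paragraphs for Ada, $1 per 1,000 embeddings for Cohere), not current pricing:

```python
# Rough cost arithmetic from the figures quoted above; treat all prices
# as recording-time assumptions, not current pricing.
ada_per_1k_tokens = 13 / 32_500  # $1 per 32,500 paragraphs, ~13 paragraphs per 1k tokens
cohere_per_call = 1 / 1_000      # $1 per 1,000 embeddings (one chunk per call)

for tokens_per_call in (4_000, 2_000, 1_000, 500):
    # The effective per-token price depends on how full you pack each call.
    cohere_per_1k_tokens = cohere_per_call * 1_000 / tokens_per_call
    print(f"{tokens_per_call} tokens/call: Cohere ${cohere_per_1k_tokens:.5f} "
          f"vs Ada ${ada_per_1k_tokens:.5f} per 1k tokens")

# At 4,000 tokens/call Cohere comes out ~40% cheaper than Ada (the video
# rounds this to "half price"); at 1,000 tokens/call it comes out ~2.5x
# Ada's price (which the video rounds to "double").
```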
00:03:05.520 | Okay, so one thing I missed very quickly
00:03:08.280 | is the on-prem solution that Cohere offers.
00:03:11.760 | So we have it here.
00:03:13.280 | Essentially, you can run your own AWS instance.
00:03:18.280 | And in the time that it would take you,
00:03:20.400 | this is assuming you're running at 100%,
00:03:22.720 | in the time that it would take you to encode 1 billion paragraphs,
00:03:26.160 | if you use Cohere's on-prem solution,
00:03:28.680 | you would end up paying $2,500.
00:03:32.120 | It's also a lot quicker, and there are other benefits as well.
00:03:36.840 | But I thought when we're talking about cost,
00:03:38.960 | we should definitely include that in there.
00:03:41.480 | So, you know, it depends, essentially.
00:03:44.120 | Embedding size, actually, you know,
00:03:46.200 | this is a good indicator of how much it's going to cost you.
00:03:49.000 | So I've actually put it under cost here.
00:03:51.560 | The higher your embedding size,
00:03:53.760 | the more storage you need to store all of your embeddings
00:03:57.240 | after you've created them, right?
00:03:59.280 | So the embedding size, smaller, is cheaper.
00:04:03.840 | So Cohere is half the size of OpenAI in this case.
00:04:08.000 | So, you know, long-term,
00:04:10.440 | you would probably actually be saving money with Cohere
00:04:14.200 | with this embedding size if you're storing a lot of vectors.
00:04:18.920 | So, you know, that's definitely something to consider.
00:04:22.120 | Like if you consider this with the embedding cost initially,
00:04:25.720 | you know, maybe you're actually saving money with Cohere,
00:04:28.440 | even if you're just embedding like 1,000 tokens
00:04:31.280 | or even 500 tokens at a time.
00:04:34.320 | Long-term, you're probably going to end up saving money.
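As a back-of-the-envelope sketch of that storage argument, assuming float32 vectors (4 bytes per dimension) and ignoring index overhead:

```python
BYTES_PER_FLOAT32 = 4

def storage_gb(n_vectors: int, dims: int) -> float:
    # Raw vector storage only; a real index adds metadata and overhead.
    return n_vectors * dims * BYTES_PER_FLOAT32 / 1e9

n = 10_000_000  # hypothetical corpus size
print(storage_gb(n, 768))   # Cohere multilingual (768-d):  ~30.7 GB
print(storage_gb(n, 1536))  # OpenAI Ada-002 (1536-d):      ~61.4 GB
```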
00:04:37.840 | Now, performance.
00:04:39.640 | So this is kind of hard to judge
00:04:44.000 | because this is a single benchmark
00:04:47.600 | that Nils Reimers has put together.
00:04:50.120 | And, okay, I mean, Cohere for sure is coming out on top here.
00:04:54.880 | It's kind of hard to say, again,
00:04:56.640 | like whether this is representative across the board or not.
00:05:01.640 | But nonetheless, the two models that are comparable here
00:05:06.000 | are Cohere's multilingual model
00:05:08.040 | and OpenAI's Ada-002 model, which is English.
00:05:12.200 | And this is an English search task.
00:05:15.320 | So it's pretty interesting that OpenAI's best English language model
00:05:19.080 | is comparable to Cohere's multilingual model.
00:05:22.920 | Cohere's English model is better.
00:05:25.160 | And then there's the Cohere Reranker.
00:05:27.120 | This is an embedding model.
00:05:28.320 | It's like imagine you retrieve all of your items
00:05:32.400 | or you get two chunks of text
00:05:34.320 | and you feed them into like a transformer model
00:05:37.000 | and compare them directly.
00:05:38.640 | It is basically a lot slower,
00:05:40.640 | but generally speaking, it will be more accurate.
00:05:43.960 | So I think they are pretty interesting results.
00:05:48.800 | It seems like they're kind of on par,
00:05:51.600 | like OpenAI and Cohere are very on par,
00:05:54.600 | but it seems like Cohere, at least from what I've seen here,
00:05:58.560 | is slightly ahead of OpenAI in terms of performance
00:06:04.800 | on that single benchmark,
00:06:05.960 | which is not the best comparison, in all fairness,
00:06:11.080 | but also slightly cheaper in the long run
00:06:13.960 | because of the embedding size.
00:06:16.080 | But again, everything here is so close
00:06:19.120 | that it's going to depend a lot on your particular use case.
00:06:23.720 | So it's not that Cohere is better than OpenAI.
00:06:26.280 | It's just that in some cases, they probably are better.
00:06:30.480 | And in some cases, they're probably cheaper as well.
00:06:33.280 | So that's definitely something to consider.
00:06:37.040 | Now, how do we actually use Cohere for embeddings?
00:06:41.160 | So we're going to be focusing on the Cohere multilingual model.
00:06:44.360 | And this example we're going to be running through
00:06:46.920 | is not really my example.
00:06:50.240 | I've taken this example from Nils Reimers
00:06:53.600 | based on a webinar that we are doing together.
00:06:57.240 | He's basically put all this together
00:06:59.440 | and I've just kind of reformatted it in a way
00:07:02.600 | so that I can show you how it works
00:07:05.160 | and also show you, kind of focus on the multilingual search component of Cohere
00:07:10.120 | and show you how it works.
00:07:12.160 | So let's just jump straight into it.
00:07:15.840 | Right, so the first thing we need to do is our pip installs.
00:07:19.120 | So we have Hugging Face datasets here.
00:07:22.240 | We're getting our data from that, plus the Cohere and Pinecone clients.
00:07:25.200 | We're using the gRPC client so that we can upsert things faster.
00:07:30.040 | We'll see how to use that soon.
00:07:33.160 | Now, I actually have a couple of notes here.
00:07:35.720 | So a couple of things to point out with Cohere's multilingual model
00:07:40.720 | is that it supports more than 100 languages.
00:07:42.960 | I think the benchmarks that they've tested it on
00:07:47.120 | cover 16 of those languages or something around there.
00:07:51.520 | And of course, you can create embeddings for longer chunks of text.
00:07:55.360 | And this is the dataset we're going to be using.
00:07:58.280 | It's some scraped data from Wikipedia that Nils put together, I believe.
00:08:04.120 | And it's just hosted under Cohere on Hugging Face datasets.
00:08:08.840 | So let's have a look at these.
00:08:11.560 | For now, we're just going to look at the English and Italian
00:08:13.800 | and we're going to see how we would embed those and create a search with them.
00:08:17.800 | And then what I'm going to do is switch across to an example
00:08:22.040 | where we have way more data in the database
00:08:25.480 | and that covers, I think, nine languages.
00:08:28.560 | But it is pretty interesting.
00:08:30.240 | So this is what the data looks like.
00:08:32.680 | We just have some text in the middle.
00:08:34.800 | That's what we're going to be encoding.
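A minimal sketch of loading those streams, assuming the dataset ID is Cohere/wikipedia-22-12 on Hugging Face with per-language configs and fields like id, title, text, and url:

```python
from datasets import load_dataset

# Stream the splits so we don't download the full dumps up front.
en = load_dataset("Cohere/wikipedia-22-12", "en", split="train", streaming=True)
it = load_dataset("Cohere/wikipedia-22-12", "it", split="train", streaming=True)

# Peek at one record to see the structure described above.
print(next(iter(en)))
```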
00:08:38.040 | So if we're embedding these chunks one at a time,
00:08:41.000 | maybe it would be more expensive using Cohere.
00:08:43.680 | But I think, in reality, we could put a lot more of these together.
00:08:48.280 | So we could put together like five of these chunks or more
00:08:51.840 | and it should work pretty well.
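For example, a tiny sketch of packing several short paragraphs into one embed call (the group size and separator are arbitrary choices):

```python
def pack_paragraphs(paragraphs: list[str], per_call: int = 5) -> list[str]:
    # Concatenate small chunks so each call carries more tokens, which
    # improves the effective per-token price discussed earlier.
    return ["\n".join(paragraphs[i:i + per_call])
            for i in range(0, len(paragraphs), per_call)]
```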
00:08:54.640 | So, okay, let's go down.
00:08:57.840 | Here, you need a Cohere API key.
00:09:00.800 | So to get that, you would go to here.
00:09:05.360 | So you type in dashboard.cohere.ai.
00:09:10.000 | Okay, and you'll probably have to log in
00:09:11.640 | if you haven't already logged in to Cohere.
00:09:13.760 | And then you go over to the left here
00:09:16.280 | and you will find some API keys.
00:09:18.880 | From there, you take your API key and you just put it in here.
00:09:23.400 | Okay, I have my API key stored already
00:09:27.040 | in a variable called Cohere API key.
00:09:29.920 | Cool. Then this is how you would embed something, right?
00:09:33.520 | So we have a list of texts that we would like to embed
00:09:39.120 | and we just pass them to this co.embed.
00:09:43.000 | So co is just a client that we've initialized up here.
00:09:46.040 | So co.embed text and then you have your model.
00:09:50.120 | This is the only multilingual model that Cohere offers at the moment.
00:09:54.920 | But, I mean, if you compare that to OpenAI right now,
00:09:58.960 | they just offer English models.
00:10:01.200 | So I think they've taken the lead with that, which is pretty cool.
00:10:06.160 | Pull embeddings from response.
00:10:08.600 | So, okay, we create our embeddings.
00:10:10.440 | It gives us a response and it has a lot of information in there.
00:10:13.680 | But all we need are the embeddings, right?
00:10:15.440 | So we're just pulling those out.
00:10:17.560 | And then we see dimensionality of those embeddings,
00:10:21.040 | which is going to be 768.
00:10:23.800 | So that's the dimensionality.
00:10:25.400 | And then we have two of those vector embeddings there, right?
00:10:28.800 | So we have two 768 dimensional vectors
00:10:31.920 | because we have two sentences.
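In code, that embedding step looks roughly like the following; "multilingual-22-12" is the multilingual model name as of recording, so check Cohere's current model list before reusing it, and cohere_api_key is the variable mentioned above:

```python
import cohere

co = cohere.Client(cohere_api_key)  # API key stored earlier

texts = [
    "The quick brown fox jumps over the lazy dog",
    "La volpe veloce salta sopra il cane pigro",
]
res = co.embed(texts=texts, model="multilingual-22-12")

# Pull embeddings from the response; everything else in it is ignored here.
embeds = res.embeddings
print(len(embeds), len(embeds[0]))  # 2 vectors, 768 dimensions each
```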
00:10:34.720 | All right, now that's how we would use Cohere's embedding model.
00:10:40.080 | But before we move on to actually creating our index,
00:10:43.120 | where we're going to store all of those embeddings,
00:10:45.120 | we need to initialize an index.
00:10:47.320 | So we're going to be using a vector database called Pinecone for this.
00:10:52.000 | Now, Pinecone, again, we need API key, which we can get from over here.
00:10:57.560 | Again, it's free. So app.pinecone.io.
00:11:01.440 | I'll just copy and paste that.
00:11:04.520 | Okay, cool.
00:11:06.920 | So come over here, I can already see I have a couple of indexes in here.
00:11:11.480 | If this is your first time using Pinecone, it will be empty,
00:11:14.400 | and that's fine because we're going to create the index in the code.
00:11:18.040 | But what you do need is your API key, right?
00:11:21.080 | So your API key is here. You copy that,
00:11:23.920 | take it over into your notebook, and you would paste it here.
00:11:26.920 | Now, again, I've stored mine in a variable.
00:11:30.600 | Then you also have your environment.
00:11:32.600 | Now, your environment is next to the API key in the console, right?
00:11:36.360 | So here, us-east1-gcp.
00:11:40.480 | Your environment is not necessarily going to be the same as mine.
00:11:43.360 | So you should check that.
00:11:45.720 | Okay, great. So that has initialized, and then we come down here,
00:11:49.920 | and what we're going to do here is initialize an index,
00:11:53.080 | which is where we're going to store all of these embeddings.
00:11:56.160 | Now, you give your index a name. It doesn't matter what you call it.
00:11:59.800 | Okay, you can call it whatever you want.
00:12:02.240 | But there are a few things that are important here
00:12:04.240 | that we should not change.
00:12:06.880 | So dimension.
00:12:09.080 | Dimension is the dimensionality of your embedding.
00:12:12.000 | So it's coming from Cohere, right?
00:12:15.000 | This is where I mentioned before,
00:12:16.440 | there's the price advantage of using Cohere.
00:12:21.440 | When dimensionality is lower, like 768,
00:12:25.000 | it's going to be cheaper to store all of your vectors
00:12:27.680 | if you are needing to pay for that storage.
00:12:30.400 | So we need that, and our index needs to know this value.
00:12:34.520 | So it needs to know the expected dimensionality
00:12:37.320 | of the vectors we're putting into it.
00:12:39.200 | Then we have our metric, which is dot product.
00:12:42.200 | This is needed by Cohere's multilingual model.
00:12:47.520 | If you look on the, I think, the about page for the multilingual model,
00:12:51.560 | it will say you need to use dot product.
00:12:54.080 | And then these here, you can actually leave them empty.
00:12:57.040 | The default values for these are also okay,
00:13:01.040 | but I thought I'd put them in there.
00:13:03.040 | So S1 is basically the storage-optimized pod for Pinecone,
00:13:08.560 | which means you can put in about 5 million vectors in here
00:13:12.400 | for free without paying anything.
00:13:14.480 | And then there's also P1, which is like the speed-optimized version,
00:13:19.280 | which enables you to put in around 1 million vectors for free.
00:13:25.240 | And then pods is the number of those pods you need.
00:13:28.040 | So if you needed 10 million vectors, we'd say, "Okay, we need two pods here."
00:13:33.000 | Cool, but we just need one. We're not putting that much in there.
00:13:35.560 | So we'd run that.
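Put together, index creation looks something like this sketch (the index name is arbitrary; dimension and metric must match the model as described above):

```python
index_name = "cohere-multilingual-wiki"  # any name works

# Only create the index if it doesn't already exist.
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=768,        # must match Cohere's embedding size
        metric="dotproduct",  # required by the multilingual model
        pod_type="s1",        # storage-optimized pod
        pods=1,
    )
```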
00:13:38.160 | Then we'd connect to the index.
00:13:39.640 | We use this GRPCIndex; we could also use the standard Index here,
00:13:44.040 | but GRPCIndex is just more stable,
00:13:48.040 | and it's also faster, so we're doing that.
00:13:51.040 | And then we're going to describe the index stats.
00:13:52.720 | So we're going to see what is in there.
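Then connecting and checking what's in there:

```python
# GRPCIndex behaves like pinecone.Index but upserts faster.
index = pinecone.GRPCIndex(index_name)
index.describe_index_stats()  # total_vector_count is 0 for a fresh index
```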
00:13:54.240 | Now, I already created the index before.
00:13:56.760 | So for you, when you're running through this first time,
00:13:59.720 | this will actually say zero.
00:14:01.680 | For me, I've already added things in there,
00:14:03.400 | and that's why it's at 200,100.
00:14:07.640 | Now, with the embedding model and vector index itself,
00:14:11.920 | we can move on to actually indexing everything.
00:14:14.360 | So basically, we're just going to loop through our dataset,
00:14:18.720 | and we're going to do what we just did.
00:14:20.760 | So we're going to embed things with Cohere,
00:14:23.760 | and then what we're going to do is with those embeddings,
00:14:26.160 | we're going to add them into Pinecone.
00:14:28.840 | Actually, I don't think I showed you how we do that,
00:14:32.360 | but it's really simple.
00:14:33.240 | It's actually just this line here.
00:14:36.640 | But let me explain what we have here.
00:14:39.760 | So batch size is the number of items
00:14:43.040 | that we're going to send to Cohere
00:14:44.240 | and then upsert into Pinecone at any one time.
00:14:47.680 | The line limit, so this is the number of records
00:14:50.480 | from each language that we would like to include,
00:14:52.920 | that we'd like to embed and add to Pinecone.
00:14:55.320 | We have our data here, so I'm just formatting this
00:14:58.280 | so that it's a bit easier later on
00:15:00.160 | when we get to this bit here.
00:15:02.240 | And errors, and this is just so we can store a few errors,
00:15:05.440 | because every now and again, we might hit one,
00:15:07.880 | and I'll explain why.
00:15:09.960 | It's not necessary, but there are ways to avoid it, basically.
00:15:13.120 | They're not that hard, but for simplicity's sake,
00:15:15.600 | I haven't included them in here.
00:15:16.960 | So here, I'm just saying, don't go over the line limit,
00:15:21.520 | and then we're going through English and Italian one at a time.
00:15:25.920 | We get the relevant batch from our data,
00:15:28.080 | which we've created here.
00:15:29.360 | So it's actually just the iterable of the data,
00:15:32.360 | first English and Italian.
00:15:34.080 | We extract the text from that.
00:15:36.720 | We create our embeddings using that text.
00:15:39.240 | Then we just create some IDs.
00:15:41.200 | This is just an ID variable
00:15:44.040 | that was in the data up at the top here.
00:15:49.200 | We use that ID, and we also include the text in there as well.
00:15:53.800 | Then what we do is we create this metadata list of dictionaries.
00:15:58.320 | Now, each dictionary is going to contain some text,
00:16:02.040 | a title from the record,
00:16:03.920 | the URL of the record, and also the language,
00:16:06.160 | so English or Italian.
00:16:08.520 | Then what we do is we add everything like this.
00:16:13.040 | So it's pretty straightforward.
00:16:15.160 | There's nothing too complicated going on there.
00:16:19.160 | The one thing that I have added in there is occasionally...
00:16:23.800 | So we saw the text earlier on.
00:16:27.680 | They were pretty short chunks of text.
00:16:30.600 | But for some reason, not all of them are like this.
00:16:32.480 | It's kind of like a messy data set.
00:16:34.520 | So some of them are actually quite long,
00:16:37.960 | and they actually exceed the metadata limit in Pinecone,
00:16:42.240 | which is 10 kilobytes per vector.
00:16:45.080 | So basically, we can add up to around 10 kilobytes of text
00:16:51.040 | per vector in Pinecone,
00:16:53.440 | but some of them go over that, and they will throw an error.
00:16:56.680 | So I'm actually, for now, I'm just skipping those.
00:16:58.640 | But in reality, what you do
00:17:00.840 | is you would chunk those larger chunks of text
00:17:03.920 | into smaller chunks, and then just add them individually,
00:17:07.920 | or just store your text somewhere else.
00:17:09.920 | It doesn't have to go into Pinecone.
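The whole loop, with the skip-on-error simplification just described, looks roughly like this sketch; it reuses the en/it streams and model name from the earlier sketches, the field names follow the dataset records shown above, and batch_size and limit are illustrative values:

```python
from tqdm.auto import tqdm

batch_size = 64     # records sent to Cohere and Pinecone per step
limit = 100_000     # max records to index per language
errors = []

for lang, stream in [("en", iter(en)), ("it", iter(it))]:
    for _ in tqdm(range(0, limit, batch_size)):
        batch = [next(stream) for _ in range(batch_size)]
        texts = [r["text"] for r in batch]
        # Embed the batch with the multilingual model.
        embeds = co.embed(texts=texts, model="multilingual-22-12").embeddings
        ids = [f"{lang}-{r['id']}" for r in batch]
        metadata = [
            {"text": r["text"], "title": r["title"],
             "url": r["url"], "lang": lang}
            for r in batch
        ]
        try:
            index.upsert(vectors=zip(ids, embeds, metadata))
        except Exception as e:
            # A few records exceed Pinecone's ~10KB metadata limit; a real
            # pipeline would chunk the text rather than skip the batch.
            errors.append(e)
```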
00:17:11.600 | Right. Now, I've already run this.
00:17:13.640 | I'm not going to run it again.
00:17:16.000 | And yeah, I can just come down to here.
00:17:18.160 | I can run this.
00:17:19.240 | We have our describe index.
00:17:20.600 | It looks the same as it did before for me.
00:17:22.720 | Okay, cool.
00:17:24.440 | Now, what we're going to do, and this is,
00:17:26.240 | I think, the more interesting part, is searching.
00:17:30.760 | So to search through, what we do is we take a query,
00:17:36.240 | we embed it, and then we...
00:17:39.760 | So embed is exactly the same as what we did before
00:17:43.280 | with cohere, and then we query with that embedding, xq here.
00:17:48.800 | And we return the top three most similar items.
00:17:52.880 | And then we want to include metadata,
00:17:54.400 | which is going to contain our text, title,
00:17:57.280 | and a couple of other things.
00:17:58.880 | The URL is pretty important.
00:18:01.160 | And then we return it in this kind of format.
00:18:03.360 | We include this.
00:18:05.360 | This is a pretty good idea from Nils.
00:18:08.080 | We include the translate URL.
00:18:10.480 | That will just allow us,
00:18:11.920 | so when we're getting Italian results
00:18:13.960 | or any other language results,
00:18:15.680 | we just click on this.
00:18:16.920 | It will take us to Google Translate,
00:18:18.960 | and we can see what it actually says.
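A sketch of that search helper, including the Google Translate link trick credited to Nils (the URL pattern and the metadata field names are assumptions carried over from the indexing sketch):

```python
import urllib.parse

def search(query: str, top_k: int = 3):
    # Embed the query with the same multilingual model used for the docs.
    xq = co.embed(texts=[query], model="multilingual-22-12").embeddings[0]
    res = index.query(vector=xq, top_k=top_k, include_metadata=True)
    for match in res["matches"]:
        meta = match["metadata"]
        # Link to Google Translate so non-English results are readable.
        translate_url = ("https://translate.google.com/translate?sl=auto&tl=en&u="
                         + urllib.parse.quote(meta["url"]))
        print(round(match["score"], 3), f"({meta['lang']})", meta["title"])
        print(meta["text"][:200])
        print(translate_url, "\n")
```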
00:18:21.720 | So let's run this.
00:18:23.480 | And we can try both of these.
00:18:25.600 | I'm not even sure if they work that well
00:18:27.840 | because we don't have that much data in here,
00:18:30.600 | but we can try.
00:18:32.280 | Okay.
00:18:33.880 | I don't know any...
00:18:35.920 | Okay, yeah, so number three here.
00:18:38.560 | So this is, you know, he's famous in Italy,
00:18:42.200 | but I think less famous outside of Italy.
00:18:45.640 | So if we go to here,
00:18:48.680 | you see translation,
00:18:50.560 | and you can see, okay,
00:18:51.840 | he's one of the most important
00:18:52.960 | and prestigious personalities
00:18:54.360 | in the fight against the mafia.
00:18:55.680 | He was killed by Cosa Nostra
00:18:57.680 | together with his wife and so on and so on.
00:19:00.280 | Right, so he's super famous in Italy,
00:19:02.320 | but if you look on Wikipedia for him in English,
00:19:06.120 | I think it mentions a little bit about him,
00:19:08.440 | but there isn't really that much information there.
00:19:10.640 | So that's why we're getting, you know,
00:19:12.720 | we're just getting like Italian results here.
00:19:15.000 | And then if we go for this one as well.
00:19:17.760 | So this is another one that I think in the English Wikipedia,
00:19:20.720 | there's like a paragraph about this.
00:19:23.440 | But then if you go to the Italian Wikipedia,
00:19:27.720 | there is a ton of these.
00:19:29.680 | Now, in this, I don't have...
00:19:32.760 | Yeah, I don't have enough data in here.
00:19:35.480 | So let's switch across to the larger data set,
00:19:38.240 | and I'll show you what the results look like there,
00:19:41.000 | which are much better.
00:19:42.200 | Okay, I can ask about this one here.
00:19:46.120 | So what is the Mafia Capital case?
00:19:48.640 | Okay, and we get Mafia Capitale here.
00:19:53.040 | And if you go to translate,
00:19:55.000 | you can see, yes, that is, you know,
00:19:57.800 | that is the thing that I was talking about.
00:20:00.520 | And then if we go to Wikipedia here,
00:20:03.120 | I'm going to point out, okay,
00:20:04.480 | so you get all of this text, which is tons.
00:20:09.240 | If we go to the English version,
00:20:11.800 | okay, so I'm searching in Google here,
00:20:14.880 | Mafia Capitale, what do we get?
00:20:18.680 | Right, we get this, literally three paragraphs.
00:20:22.720 | So, you know, basically nothing.
00:20:25.160 | So you can see why it would be bringing
00:20:26.720 | the Italian stuff here rather than the,
00:20:29.200 | or why being able to search the Italian stuff is useful,
00:20:34.000 | even if you're speaking English.
00:20:36.440 | Now, another one we're going to ask,
00:20:39.520 | what is Arancino, but I'm going to spell it wrong
00:20:43.720 | just to point out the fact that it can actually handle that.
00:20:48.800 | Maybe Arancino, oh, I spot it.
00:20:52.240 | No, no, I did get it right, okay.
00:20:55.520 | So this is wrong.
00:20:57.360 | The one I did before was actually correct.
00:20:59.920 | I kind of half expected to get it wrong anyway.
00:21:02.760 | All right, so it's, we can go on here, see what it says.
00:21:07.160 | So Arancino is a speciality of Sicilian cuisine.
00:21:13.640 | Arancini di riso, it's very nice,
00:21:17.200 | if you ever have the chance to try it.
00:21:19.680 | You should have this with a pizza.
00:21:22.840 | So Arancino, pizza, and furi di zucca, it's amazing.
00:21:27.920 | It's like my favorite meal.
00:21:30.800 | Okay, so let's try one more.
00:21:34.760 | Who is Emma Maroni?
00:21:39.560 | Is that right? Yes.
00:21:41.480 | Okay, so go to here.
00:21:46.040 | And I don't actually know who this is,
00:21:49.840 | so I hope this is correct.
00:21:52.360 | It's apparently this person.
00:21:55.760 | Okay, so that's it for this introduction to Cohere.
00:21:59.680 | I feel like it was a bit longer than I had intended it to be,
00:22:03.560 | but that's fine, I'm hoping that it was at least useful
00:22:06.400 | and we kind of went through a lot of things there.
00:22:09.160 | So, yeah, I just wanted to share this.
00:22:12.520 | It's an alternative to OpenAI.
00:22:15.360 | I'm not saying it's necessarily better,
00:22:17.240 | I'm not saying it's necessarily cheaper.
00:22:19.120 | I think that is very much going to depend on your use case,
00:22:22.120 | what you're doing, and many other factors, right?
00:22:25.440 | You can train these models, for example,
00:22:27.360 | if you're able to train them,
00:22:29.400 | then you're probably going to get some pretty good performance as well.
00:22:33.800 | And I suppose one big factor here
00:22:36.600 | is actually the multilingual aspect of this model.
00:22:40.320 | At the moment, OpenAI doesn't have any multilingual models,
00:22:44.320 | or at least none that are actually trained for that.
00:22:46.160 | Some of them, I think, can handle multilingual queries relatively well,
00:22:50.200 | but they haven't been trained for that.
00:22:52.640 | And this can be relatively problematic,
00:22:56.760 | especially when you're dealing with multinational companies
00:22:59.880 | or just companies that are not American or English or Australian as well.
00:23:05.800 | I'm not going to forget you.
00:23:07.560 | The rest of the world speaks different languages,
00:23:10.640 | so having this multilingual model is pretty good.
00:23:16.120 | So, yeah, I mean, this is still very early days for Cohere.
00:23:21.520 | I'm pretty excited. I know they have a lot planned
00:23:24.640 | and that will be really interesting to see.
00:23:28.520 | But for now, I think we'll leave it there.
00:23:31.840 | I hope all this has been useful and interesting.
00:23:35.080 | So, thank you very much for watching
00:23:37.080 | and I will see you again in the next one.
00:23:40.480 | (MUSIC)