Cohere for VPs of AI: Vivek Muppalla

00:00:00.040 | Hey folks, this is Vivek. I'm super excited to chat with all of y'all. We'll give a quick talk

00:00:19.120 | about what Cohere is all about and we'll make sure we have enough time to chat about your

00:00:24.760 | production challenges with these models or anything you want to chat about. Quick intro,

00:00:30.880 | so we are a leading data security focused enterprise AI company. Our focus is building

00:00:38.380 | trustworthy enterprise AI models with our partners for real-world business use cases and we work with

00:00:47.440 | a lot of strategics across various clouds and have a bit of a Switzerland play when it

00:00:54.640 | comes to where we can ship and deploy and I'll talk to that a little later. So we have a crack

00:01:00.400 | team of ML researchers and seasoned enterprise operators. Aidan, who's our co-founder and CEO,

00:01:08.140 | was one of the authors on the seminal Transformer paper. We have Phil, who's our chief scientist. He

00:01:15.420 | was an NLP lead at DeepMind and a professor at Oxford. Nils, who leads a lot of our retrieval efforts, was built as

00:01:24.520 | and then Patrick, who was a co-author on the RAG paper, which is a big focus for us at Cohere. Fantastic team that's helping us build lots of amazing things.

00:01:37.360 | Here's a quick overview of our product line. We have two main arcs, the generative side and then the advanced retrieval models. On the generative side, we have two flagship models, which are Command R and R+. R is our workhorse.

00:01:54.400 | Super low cost model, that's great for most enterprise use cases. And then we have R+, which is your more powerful, larger model for more complex reasoning, tool use, and RAG use cases.

00:02:09.240 | And on the retrieval side, most people have used some form of an embedding model. And I'll get into that a little later in the talk. But the re-ranker is something special that I haven't seen quite often in the market. But we think it adds a lot of value, especially to RAG pipelines. So we'll get into that too.

00:02:29.240 | So when we build at Cohere, we have five guiding principles for how we want to go about things. The first is, obviously, everybody's testing their models on all sorts of academic benchmarks. But for us, what is really important is the performance on enterprise use cases.

00:02:48.240 | So we've worked with a lot of our partners to ensure that we have an eval suite that is highly customized to enterprise use cases across, let's say, health, HR, finance.

00:03:00.400 | And that's a bit of our goalpost as we ship each of these model versions. And we constantly benchmark how we're performing in each of these industries and use cases that our customers care about.

00:03:13.400 | And then the next thing is all about efficiency and scalability, right? We're not particularly chasing the race for having the largest model out there. But what we really care about is the practical use of these models, right?

00:03:27.400 | How do these models get used? And how cheap is it? And how easy is it for you to run it as a customer?

00:03:34.400 | The next big thing is, obviously, customization. As much as we'd like for all of these models to work out of the box, there's always a certain niche that customers want to customize this for.

00:03:48.400 | And we offer a variety of things, some of which are pretty intrusive. We've helped our customers with taking our base model and retraining that with their enterprise-specific data for domain adaptation.

00:04:03.400 | We can do a full retraining of the model for you with your data. And then, obviously, the pretty typical last few layers retraining, which is self-serve on our platform.

00:04:13.400 | Data provenance and privacy is another big focus for us. We've worked quite a bit to ensure that all of the data that we've collected for building our models meets enterprise standards.

00:04:28.400 | And we offer indemnification for any IP claims that you might run into as a customer. And then, obviously, we don't ever use any of your data to train our model. So that's a guarantee from us.

00:04:41.400 | And deployment flexibility. As I mentioned, we're available on pretty much every major cloud provider. And then, we also allow you to deploy on-prem or in your own VPC wherever your compute and your data is. That's where we'll meet you at.

00:04:57.400 | Cool. So just a quick look at your typical performance metrics. As I mentioned, something that enterprises repeatedly tell us is they care about multilingual, they care about RAG, and tool use for upcoming agentic use cases.

00:05:16.400 | So a lot of our focus has been in these areas. These are some benchmarks, HotpotQA, Bamboogle, and Berkeley function calling, that our models are quite good at.

00:05:29.400 | And another example of how we try to innovate is we try to make sure that as we are building these stacks, we're incorporating all of the features that people care about out of the box and they don't have to do extra work.

00:05:46.400 | Citations on RAG is a great example of this. For most people, you have to do a lot of work to actually build this functionality with other APIs.

00:05:55.400 | But this really comes out of the box with Cohere's models and APIs. You don't have to do anything additional as a developer to get citations, which is very important for any RAG-based application.

00:06:10.400 | On the multilingual front, we have some of the best performance when it comes to the FLORES multilingual evaluation. And we also have a bit of a secret sauce with our tokenizer, which helps keep costs really low.

00:06:24.400 | Right. And again, TCO is a very big thing for enterprises, and it allows our customers to take that same model and deploy it across the globe with their customers, which is very important.

00:06:41.400 | So, switching gears towards the embeddings models: Nils and his team have been innovators in this space for a while now, and we've done quite a bit of work to make sure our performance is great on noisy data and at a super low cost in this particular space.

00:07:02.400 | So, we're actually pretty excited about what our embeddings models can do, and this is almost always one of the top things that our customers are excited about.

00:07:13.400 | But embeddings is a pretty complex space and not without its challenges. So, we built this fun demo where we took all of the arXiv papers and we asked it a question.

00:07:26.400 | When was the attention paper published by Aidan Gomez, who's our founder? And we tried this across a bunch of embeddings models.

00:07:35.400 | So, a common pattern that we see: arXiv is a great example of where you have different kinds of data, right?

00:07:43.400 | You have the title, when the paper was published, the dates, you have the various authors, you have the actual paper itself, and in many ways this represents the kind of data you might see in enterprises, right?

00:07:55.400 | So, when you actually build the embeddings for this, you get a fairly complex vector space, and your search queries might not actually map neatly to this, right?

00:08:07.400 | And this is where our re-ranker comes in. So, what our re-ranker does is, once you have all of these retrieved documents, it's a cross-encoder that helps you re-rank the retrieved set and make sure that the top of that set is what you send into your context with the generative model.
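
A minimal sketch of the retrieve-then-re-rank flow described here. This is not Cohere's implementation or API: `first_stage_retrieve`, `toy_pair_score`, and `rerank` are hypothetical names, the documents are made up, and a simple token-overlap score stands in for a real cross-encoder, which would score each (query, document) pair jointly with a trained model.

```python
from typing import Callable, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization; a deliberately crude stand-in."""
    return set(text.lower().split())


def first_stage_retrieve(query: str, docs: List[str], k: int) -> List[str]:
    """Cheap first stage: keep the k docs sharing the most tokens with the query.

    Stands in for lexical or embedding-based search.
    """
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def toy_pair_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: Jaccard overlap of the pair."""
    q, d = tokenize(query), tokenize(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[float, str]]:
    """Score every (query, candidate) pair and keep the top_n by score."""
    scored = [(score_fn(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


# Made-up corpus for illustration only.
docs = [
    "attention is all you need published 2017 authors vaswani gomez",
    "a survey of attention mechanisms in vision published 2021 with many authors and more words here",
    "deep residual learning for image recognition published 2015",
]
query = "attention paper published gomez"

candidates = first_stage_retrieve(query, docs, k=2)
top = rerank(query, candidates, toy_pair_score, top_n=1)
print(top[0][1])
```

Only the re-ranked top results would then be placed into the generative model's context; the first stage can afford to be broad because the second stage is the one that decides the final order.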

00:08:26.400 | And here's the re-ranker in action. So, what this demo is showing you is we search for the transformer paper by Aidan.

00:08:37.400 | So, you have three different types of search patterns over here: a lexical search, an embeddings-based search, and the Cohere Rerank-based search, right?

00:08:47.400 | And the various forms of these retrievals obviously give you the responses, but they are stack-ranked at different places in the retrieval set,

00:08:55.400 | which means the overall accuracy of your RAG system might be low.

00:09:00.400 | And re-rank is what's helping you to make sure that isn't the case.
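
One way to make the point about rank position concrete is to measure where the relevant document lands in each retriever's list. The sketch below uses standard retrieval measures (hit@k and reciprocal rank); the three rankings are invented placeholders, not the results shown in the demo, and the retriever names are hypothetical.

```python
from typing import Dict, List, Optional


def rank_of(relevant_id: str, ranked_ids: List[str]) -> Optional[int]:
    """Return the 1-based position of the relevant doc, or None if absent."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return position
    return None


def hit_at_k(relevant_id: str, ranked_ids: List[str], k: int) -> bool:
    """True if the relevant doc is within the first k results."""
    rank = rank_of(relevant_id, ranked_ids)
    return rank is not None and rank <= k


def reciprocal_rank(relevant_id: str, ranked_ids: List[str]) -> float:
    """1/rank of the relevant doc, or 0.0 if it was not retrieved."""
    rank = rank_of(relevant_id, ranked_ids)
    return 0.0 if rank is None else 1.0 / rank


# Invented rankings for illustration only; not taken from the demo.
rankings: Dict[str, List[str]] = {
    "retriever_1": ["doc_c", "doc_d", "doc_e", "doc_a"],
    "retriever_2": ["doc_b", "doc_c", "doc_a", "doc_d"],
    "retriever_3": ["doc_a", "doc_b", "doc_c", "doc_d"],
}
relevant = "doc_a"

for name, ranked in rankings.items():
    print(name, hit_at_k(relevant, ranked, k=2), reciprocal_rank(relevant, ranked))
```

If only the top two results are passed to the generative model, a retriever that places the relevant document third or fourth never gets it into the context, even though it technically retrieved it.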

00:09:04.400 | This builds on top of other things that people care about, like chunking strategies, while making those more optimal.

00:09:11.400 | Another impact of this re-ranker is again total cost of operation because most of the expense for your models is coming in from the input tokens, right?

00:09:21.400 | And if you are able to narrow in on the right context and do that quickly, you can pass in a very minimal amount of context to your large language model,

00:09:33.400 | which drives down your overall cost of operation. And that is again very important in the enterprise setting.
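
The cost argument is simple arithmetic, sketched below. Every number here is an assumption chosen for round figures (the price per million tokens, the request volume, the chunk size, and the chunk counts); none of them is a Cohere price or a figure from the talk.

```python
def input_tokens_per_request(
    num_chunks: int, tokens_per_chunk: int, prompt_overhead_tokens: int = 0
) -> int:
    """Input tokens sent to the model for one request."""
    return num_chunks * tokens_per_chunk + prompt_overhead_tokens


def input_cost(
    requests: int, tokens_per_request: int, price_per_million_tokens: float
) -> float:
    """Total input-token cost for a number of requests."""
    return requests * tokens_per_request * price_per_million_tokens / 1_000_000


# All values below are hypothetical, for illustration only.
PRICE_PER_MILLION = 1.0   # currency units per million input tokens
REQUESTS = 1_000_000
TOKENS_PER_CHUNK = 500

wide = input_cost(
    REQUESTS, input_tokens_per_request(20, TOKENS_PER_CHUNK), PRICE_PER_MILLION
)
narrow = input_cost(
    REQUESTS, input_tokens_per_request(3, TOKENS_PER_CHUNK), PRICE_PER_MILLION
)
print(f"20 chunks per request: {wide:,.2f}")
print(f" 3 chunks per request: {narrow:,.2f}")
```

Under these assumptions, input cost scales linearly with the number of chunks passed in, so trimming the context from 20 chunks to 3 cuts the input bill by the same ratio.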

00:09:40.400 | Yeah, and when it comes to deployment options, like I said, we have our SaaS API, where we can help you manage and run your workloads.

00:09:52.400 | But then we're also on all of the major cloud AI services, SageMaker, Bedrock, OCI, and private deployment across all of these cloud providers and also on premise deployments if that's something you care about.

00:10:08.400 | So security and privacy obviously is pretty top of mind for us. So we make sure we're compliant with the standards that our customers are often asking us for.

00:10:20.400 | And then the last bit is just enabling developers. So we obviously have a pretty tight integration with things like LangChain and LlamaIndex, but we also have an open source toolkit that comes out of the box with various forms of connectors.

00:10:37.400 | And that lets you ingest data pretty easily into your systems and lets you have full control over the things you're building, so you don't have to look for a ton of different options as you're developing your enterprise applications.

00:10:55.400 | So that's it from me and I'd love to take any questions or chat. And we also have Sandra here from Cohere, so she'll be happy to help.

00:11:06.400 | Thank you very much. Yeah. Maybe two questions. So on the first or second slide, you show your investors and the selected partners, right?

00:11:13.400 | Yeah. So Accenture, so McKinsey. Yeah. Can you explain a little bit how that partnership works, right?

00:11:20.400 | Yeah. And then the second question, which you can skip if someone else has another one, is, can you maybe, without disclosing customers and all, give us a few samples of where your clients picked up Cohere versus other solutions and kind of work?

00:11:35.400 | So explain where you win in the enterprise world.

00:11:39.400 | Yeah, absolutely. Happy to chat about that. So I think the typical challenge with all of these enterprise generative AI models is the last mile challenge, right?

00:11:51.400 | So there are so many different arcs of customization that are needed with the enterprises.

00:11:56.400 | And there are a lot of traditional players who've been around for a while and have great relationships with the enterprises, and have a deep understanding of the various business domains.

00:12:10.400 | That's where McKinsey and Accenture and all of these companies come in. So they've been really helpful for us to develop the product and make sure that we're able to effectively bridge that last mile gap with them.

00:12:24.400 | And yeah, hopefully that helps. And then onto your second question, I would say in terms of winnability, I think the main aspect that has been resonating a lot with our customers is this control over the data and control over the compute, right?

00:12:42.400 | Given we're available pretty much everywhere, a lot of the customers care about that private cloud deployment, the ability to fine-tune in that private cloud environment, which is pretty big for a lot of people, and making sure that their enterprise data does not leave their own ecosystem.

00:13:00.400 | And specific examples for that have been companies in, let's say, HR or healthcare, or even folks who are trying to take their in-house code and build a custom model that's working with their code base.

00:13:15.400 | Those are the styles of applications where we've seen a lot of impact and success.

00:13:27.400 | So I'm curious for text classification. Well, what is the latest best practice?

00:13:33.400 | I see on your website, you have a Classify endpoint where you build a classifier. Is that still the recommendation?

00:13:39.400 | Yeah, that's a great question. The way I like to think about these things is there's always the arc of, and I think Jerry in his earlier talk spoke to this, right, what phase of development you're in.

00:13:54.400 | Whether you're trying to prototype or you're trying to productionize, and what your scale is in a production setting, which is important to consider for these things.

00:14:04.400 | If you're just trying to get off the ground quickly, I would say just using the generative model off the shelf is obviously always great.

00:14:13.400 | It gets you off the ground really quickly. When you're trying to productionize something, that's when I start thinking, hey, do I need a bespoke model?

00:14:21.400 | Or is the general model good enough? And what are the cost of operation differences and the cost of maintenance differences, and also the scale, right?

00:14:31.400 | I mean, if you're going to try to do something that's, you know, tens of thousands of TPS, then having a purpose-built model for that is the route to go, versus, you know, a heavier general model, which might serve other needs.

00:14:47.400 | So yeah, both of those are good options depending on what you're trying to accomplish.

00:14:52.400 | Just a quick follow up. If we actually train a classifier with you, is that also a transformer model, just more specialized?

00:14:59.400 | Yeah, exactly.

00:15:00.400 | Yeah, got it. Thank you.

00:15:02.400 | No, I think we're done. Oh, no, we've got one more question. Here we go.

00:15:09.400 | Just a quick question. When we were looking at Cohere, you know, probably early or middle of last year, one of the challenges we found with the models was that the input context size limits were quite small.

00:15:21.400 | How has that evolved as you guys have sort of created the next, you know, sets of models?

00:15:31.400 | Yeah, that's a great question. So our latest generation models are fairly competitive, with 128K context input windows. And we're constantly looking to figure out how to up them. So context window, I would say, should not be a problem for most applications at the moment. Yeah.

00:15:49.400 | cool, we're actually slightly early

00:15:55.580 | but yeah, I'd like to thank you

00:15:57.680 | for giving the talk, thank you for that

00:15:58.740 | absolutely, thank you so much

00:15:59.760 | thank you

00:16:00.620 | yeah, thank you

00:16:01.760 | thank you