
Open sourcing the AI ecosystem ft. Arthur Mensch of Mistral AI and Matt Miller


Transcript

I'm excited to introduce our first speaker, Arthur from Mistral. Arthur is the founder and CEO of Mistral AI. Despite the company being just nine months old and having far fewer resources than some of the large foundation model companies so far, I think they've really shocked everybody by putting out incredibly high-quality models, approaching GPT-4 in caliber, out into the open.

So we're thrilled to have Arthur with us today, all the way from France, to share more about the opportunity behind building in open source. Interviewing Arthur will be my partner, Matt Miller, who is dressed in his best French wear to honor Arthur today and who helps lead our efforts in Europe.

So please welcome Matt and Arthur. With all the efficiency of a French train, right? Right on time. We were sweating a little bit back there because you just walked in the door. But good to see you. Thanks for coming all this way. Thanks for being with us here at AI Ascent today.

Thank you for hosting us. Yeah, absolutely. We'd love to maybe start with the background story of why you chose to start Mistral. Maybe just take us to the beginning. We all know about your successful career at DeepMind, your work on the Chinchilla paper. Share with us-- we always love to hear this at Sequoia, and I know our founder community loves to hear it too-- that spark that gave you the idea to break out and start your own company.

Yeah, sure. So we started the company in April, but I guess the idea was out there for a couple of months before. Timothée and I were in the same master's program. Guillaume and I were in school together. So we knew each other from before. And we had been in the field for 10 years, doing research.

And so we loved the way AI progressed because of the open exchanges that occurred between academic labs and industrial labs, and how everybody was able to build on top of one another's work. And it was still the case, I guess, even in the beginning of the LLM era, when OpenAI and DeepMind were actually contributing to one another's roadmaps.

And this kind of stopped in 2022. Basically, one of the last papers making important changes to the way we train models was Chinchilla, and that was the last important model in the field that Google published. And so for us, it was a bit of a shame that the field stopped making open contributions that early in the AI journey, because we were very far away from the end of it.

And so when we saw ChatGPT at the end of the year, I think we reflected on the fact that there was some opportunity for doing things differently, for doing things from France. Because in France, as it turned out, there were a lot of talented people who were a bit bored in big tech companies.

And so that's how we figured out that there was an opportunity for building very strong open source models, going very fast with a lean team of experienced people, and trying to correct the direction that the field was taking. So we wanted to push open source models much more.

And I think we did a good job at that, because various companies have followed our trajectory. Wonderful. And so the open source movement was really a lot of the drive behind starting the company. Yeah, that was one of the drivers. Our intention, and the mission that we gave ourselves, is really to bring AI into the hands of every developer.

And the way it was done, and the way it is still done by our competitors, is very closed. And so we want to push a much more open platform. And we want to spread the adoption and accelerate the adoption through that strategy. So that's very much at the core of the reason why we started the company.

Wonderful. And just recently, fast-forwarding to today, you released Mistral Large. You've been on this tear of amazing partnership announcements with Microsoft, Snowflake, Databricks. So how do you balance what you're going to do open source with what you're going to do commercially? And how do you think about the trade-off?

Because that's something that many open source companies contend with. How do they keep their community thriving? But then how do they also build a successful business to contribute to their community? Yeah, it's a hard question. And the way we've addressed it is currently through two families of models. But this might evolve with time.

We intend to stay the leader in open source. So that kind of puts pressure on the open source family, because there are obviously some contenders out there. Compared to how various software providers that played this strategy developed, I think we need to go faster. Because AI actually develops faster than software.

It develops faster than databases did. MongoDB played a very good game at that, and this is a good example of what we could do. But we need to adapt faster. So yeah, there's obviously this tension. And we are constantly thinking about how we should contribute to the community, but also how we should start getting some commercial adoption, enterprise deals, et cetera.

And there's obviously a tension. For now, I think we've done a good job at it. But it's a very dynamic thing to think through. So basically, every week we think about what we should release next in both families. And you have been the fastest in developing models, the fastest in reaching different benchmark levels, and one of the leanest in expenditure to reach these benchmarks out of any of the foundation model companies.

What do you think is giving you that advantage to move quicker than your predecessors and more efficiently? I think we like to get our hands dirty. Machine learning has always been about crunching numbers, looking at your data, doing a lot of extract, transform, and load, and things that are oftentimes not fascinating.

And so we hired people who were willing to do that stuff. And I think that has been critical to our speed, and it's something that we want to keep up. Awesome. And in addition to the large model, you also have several small models that are extremely popular. When would you tell people that they should spend their time working with you on the small models?

When would you tell them to work with the large models? And where do you think the economic opportunity for Mistral lies? Is it in doing more of the big or more of the small? I think this is an observation that every LLM provider has made: one size does not fit all.

When you make an application, you typically have different large language model calls. Some should be low latency, because they don't require a lot of intelligence. But some can be higher latency and require more intelligence. And an efficient application should leverage both, potentially using the large model as an orchestrator for the small ones.

And I think the challenge here is, how do you make sure that everything works? You end up with a system that is not only a model: it's really two models plus an outer loop calling your models, calling systems, calling functions. And some of the developer challenges that we also want to address are, how do you make sure that this works, and that you can evaluate it properly?

How do you make sure that you can do continuous integration? How do you move from one version of a model to another and make sure that your application has actually improved and not deteriorated? All of these things are addressed by various companies, but these are also things that we think should be core to our value proposition.
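[Editor's note: a minimal sketch, in Python, of the routing pattern described above, with a small model handling cheap, low-latency calls and a large model acting as the orchestrator. The model names and the `complete` helper are hypothetical placeholders, not Mistral's actual API. Pinning a fixed evaluation set around a function like `answer` is one way to catch the regressions mentioned here when either model moves to a new version.]

```python
# Hypothetical sketch: a large model orchestrates cheap calls to a small model.

def complete(model: str, prompt: str) -> str:
    # Placeholder so the sketch runs; wire this to your LLM client of choice.
    return f"<{model} completion>"

SMALL = "small-model"   # low latency: extraction, classification, summarization
LARGE = "large-model"   # higher latency: planning, reasoning, synthesis

def answer(question: str, documents: list[str]) -> str:
    # Fan out low-latency calls: condense each document with the small model.
    notes = [
        complete(SMALL, f"Summarize what matters for {question!r}:\n{doc}")
        for doc in documents
    ]
    # One higher-latency call: the large model reasons over the small model's notes.
    return complete(LARGE, f"Question: {question}\nNotes:\n" + "\n".join(notes))

print(answer("What changed in Q4?", ["report A ...", "report B ..."]))
```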

And what are some of the most exciting things you see being built on Mistral? What are the things that you get really excited about, that you see the community or customers doing? I think pretty much every young startup in the Bay Area has been using it for fine-tuning purposes, for building applications fast.

So really, I think part of the value of Mistral is that it's very fast, and so you can make applications that are more involved. We've seen web search companies using us. We've seen all of the standard enterprise stuff as well, like knowledge management and marketing. The fact that you have access to the weights means that you can pour your editorial tone in much more.

So yeah, we see the typical use cases. But the value of the open source part is that developers have control, so they can deploy it everywhere. They can have a very high quality of service, because they can use dedicated instances, for instance. And they can modify the weights to suit their needs and bump the performance to a level close to that of the largest models, while being much cheaper.

And what's the next big thing that we're going to get to see from you guys? Can you give us a sneak peek of what might be coming soon, or what we should be expecting from Mistral? Yeah, for sure. So Mistral Large was good, but not good enough.

So we are working on improving it quite heavily. We have interesting open source models on various vertical domains that we'll be announcing very soon. The platform is currently just serverless APIs, so we are working on making customization part of it-- the fine-tuning part.

And obviously, like many other companies, we're heavily betting on multilingual data and multilingual models, because as a European company we're well-positioned there. This is a demand from our customers that I think is higher there than here. And then eventually, in the months to come, we will also release some multimodal models.

OK, exciting. We'll look forward to that. As you mentioned, many of the people in this room are using Mistral models. Many of the companies we work with every day here in the Silicon Valley ecosystem are already working with Mistral. How should they work with you and with the company?

And what's the best way for them to work with you? Well, they can reach out. We have developer relations people who are really pushing the community forward, making guides, and also gathering use cases to showcase what you can build with Mistral models. So we're investing a lot in the community.

Something that basically makes the models better, and that we are trying to set up, is ways for us to get evaluations, benchmarks, actual use cases on which we can evaluate our models. Having a mapping of what people are building with our models is also a way for us to make a better generation of new open source models.

And so please engage with us to discuss your use cases and how we can help. We can showcase them, and we can also gather some insight into the new evaluations that we should add to our evaluation suite to verify that our models are getting better over time. And on the commercial side, our models are available on our platform.

The commercial models are actually working better than the open source ones. They're also available on various cloud providers, which facilitates adoption for enterprises. And customization capabilities like fine-tuning, which really made the value of the open source models, are coming very soon. Wonderful. And you talked a little bit about the benefits of being in Europe.

You touched on it briefly. You're already this global example of the great innovation that can come from Europe. Talk a little bit more about the advantages of building this business from France and from Europe. The advantages and drawbacks, I guess.

Yeah, both, both. I guess one advantage is that you have a very strong junior pool of talent. There are a lot of people coming out of master's programs in France, in Poland, in the UK that we can train in like three months, get them up to speed, and get them basically producing as much as a million-dollar engineer in the Bay Area for a tenth of the cost.

So that's kind of efficient. Shh, don't tell them all that. They're going to all hire people in France. I'm sure. Like the workforce is very good, engineers and machine learning engineers. Generally speaking, we have a lot of support from the state, which is actually more important in Europe than in the US.

They tend to over-regulate a bit too fast. So we've been telling them not to, but they don't always listen. And then generally, European companies like to work with us because we're European and we are better in European languages, as it turns out. Mistral Large is actually probably the strongest French-language model out there.

So yeah, I guess that's not an advantage per se, but at least there are a lot of geographical opportunities that we're leveraging. Wonderful. And paint the picture for us five years from now. I know this world's moving so fast. I mean, just think of all the things you've gone through in two years.

It's not even two years old as a company-- almost two years. But five years from now, where does Mistral sit? What do you think you will have achieved? What does this landscape look like? So our bet is that basically the platform and the infrastructure of artificial intelligence will be open.

And based on that, we'll be able to create assistants and then potentially autonomous agents. And we believe that we can become this platform by being the most open platform out there, by being independent from cloud providers, et cetera. Now, five years from now, I have literally no idea what this is going to look like.

If you had looked at the field in 2019, I don't think you could have bet on where we would be today. But we are evolving toward more and more autonomous agents that can do more and more tasks. I think the way we work is going to change profoundly, and making such agents and assistants is going to become easier and easier.

So right now, we're focusing on the developer world. But AI technology is, in itself, so easily controllable through human language that potentially, at some point, the developer becomes the user. And so we are evolving toward any user being able to create their own assistant or their own autonomous agent.

I'm pretty sure that five years from now, this will be something that you learn to do at school. Awesome. Well, we have about five minutes left. Just want to open it up in case there are any questions from the audience. Don't be shy. Sonia's got a question. How do you see the future of open source versus commercial models playing out for your company?

I think you made a huge splash with open source at first. As you mentioned, some of the commercial models are even better now. How do you imagine that plays out over the next handful of years? Well, I guess the one thing we optimize for is to be able to continuously produce open models with a sustainable business model to actually fuel the development of the next generation.

And so I think that, as I've said, this is going to evolve with time. But in order to stay relevant, we need to stay the best at producing open source models, at least on some part of the spectrum. So that can be the small models, that can be the very big models.

And so that's very much something that sets the constraints of whatever we can do. Staying relevant in the open source world, staying the best solution for developers is really our mission, and we'll keep doing it. David? There's got to be questions from more than just the Sequoia partners, guys.

Come on. Can you talk to us a little bit about Llama 3 and Facebook, and how you think about competition with them? Well, with Llama 3, they're working on, I guess, making models. I'm not sure they will be open source. I have no idea what's going on there. So far, I think we've been delivering faster and smaller models.

So we expect to continue doing that. But generally, the good thing about open source is that it's never too much of a competition, because if you have several actors, normally that should actually benefit everybody. So if they turn out to be very strong, there will be some cross-pollination, and we'll welcome it.

One thing that's made you guys different from other proprietary model providers is the partnerships with Snowflake and Databricks, for example, and running natively in their clouds, as opposed to just having API connectivity. Curious if you can talk about why you did those deals, and also what you see as the future of, say, Databricks or Snowflake in the brave new LLM world.

I guess you should ask them. But generally speaking, AI models become very strong if they are connected to data and grounding information. As it turns out, enterprise data is oftentimes on Snowflake or Databricks, or sometimes on AWS. And so customers being able to deploy the technology exactly where their data is is, I think, quite important.

I expect that this will continue being the case, especially as, I believe, we'll move on to more stateful AI deployment. So today, we deploy serverless APIs with not much state. It's really like lambda functions. But as we go forward and as we make models more and more specialized, as we make them more tuned to use cases, and as we make them self-improving, you will have to manage state.

And those could actually be part of the data cloud. So there's an open question of where you put the AI state. My understanding is that Snowflake and Databricks would like it to be on their data cloud.
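[Editor's note: a rough sketch, in Python, of the stateless-versus-stateful distinction above. `StateStore`, `TenantState`, and the adapter field are invented for illustration; the idea is that a serverless call remembers nothing between requests, while a stateful deployment persists per-tenant state, such as fine-tuned adapters and accumulated feedback, in something like a data cloud.]

```python
from dataclasses import dataclass, field

# Stateless, "lambda-like" serving: everything arrives in the request,
# and nothing is remembered between calls.
def stateless_call(prompt: str) -> str:
    return f"<completion for {prompt!r}>"         # placeholder completion

@dataclass
class TenantState:
    adapter_id: str = "base"                      # e.g. a fine-tuned adapter
    feedback: list = field(default_factory=list)  # material for self-improvement

class StateStore:
    """Stand-in for a table in a data cloud, keyed by tenant."""
    def __init__(self) -> None:
        self._rows: dict[str, TenantState] = {}

    def load(self, tenant: str) -> TenantState:
        return self._rows.setdefault(tenant, TenantState())

# Stateful serving: each call reads and updates the tenant's persisted state.
def stateful_call(store: StateStore, tenant: str, prompt: str) -> str:
    state = store.load(tenant)
    output = f"<completion via adapter {state.adapter_id}>"
    state.feedback.append((prompt, output))       # state accumulates across calls
    return output
```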

- And I think there was a question right behind him. - I'm curious where you draw the line between openness and proprietary. So you're releasing the weights. Would you also be comfortable sharing more about how you train the models-- the recipe for how you collect the data, how you do mixture-of-experts training? Or do you draw the line at, we release the weights and the rest is proprietary?

- That's where we draw the line. And I think the reason for that is that it's a very competitive landscape. It's similar to the tension between openness and having some form of revenue to sustain the next generation: there's a tension between what you actually disclose and staying ahead of the curve, not giving your recipe to your competitors.

And so, again, this is a moving line. There's also some game theory at stake: if everybody started doing it, then we could do it. But for now, we are not taking this risk, indeed. - I'm curious, when another company releases weights for a model like Grok, for example, and you only see the weights, what kinds of practices do you have internally to see what you can learn from it?

- You can't learn a lot of things from weights. We didn't even look at it. It's actually too big for us to deploy. Grok is quite big. - Or were there any architecture learnings? - I guess they are using a mixture of experts, a pretty standard setting, with a couple of tricks that I knew about, actually.

Yeah, there's not a lot to learn about the recipe itself by looking at the weights. You can try to infer things, but reverse engineering is not that easy. Training is basically compressing information, and it compresses information sufficiently that you can't really find out what went into it.

- The cube is coming. - It's OK. Yeah, I'm just curious what you guys are going to focus on in terms of model sizes-- your opinions on that. Are you going to keep going small, or go with the larger ones? - So model sizes are kind of set by scaling laws.

So it depends on the compute you have. Based on the compute you have and the infrastructure you want to go to, you make some choices. You optimize for training cost and for inference cost. And then there's a weighting in between: it depends on the weight that you put on amortizing the training cost.

The more you amortize it, the more you can compress models. But basically, our goal is to be low latency and to be relevant on the reasoning front. So that means having a family of models that goes from the small ones to the very large ones.
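[Editor's note: the scaling laws referred to here are compute-optimal training rules of the kind established by the Chinchilla paper mentioned earlier. A rough sketch under the usual approximations: training compute C is about 6 * N * D for N parameters and D training tokens, with D/N around 20 at the compute-optimal point; the budget figure below is hypothetical. Raising the tokens-per-parameter ratio past that optimum, i.e. amortizing more training cost, is what lets you compress into a smaller, cheaper-to-serve model.]

```python
import math

def compute_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Rough compute-optimal sizing: C ~ 6 * N * D with D = tokens_per_param * N,
    so N = sqrt(C / (6 * tokens_per_param)). Ratios are approximate."""
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Hypothetical 1e23 FLOP training budget.
n, d = compute_optimal(1e23)
print(f"~{n / 1e9:.0f}B params on ~{d / 1e12:.2f}T tokens")  # ~29B params, ~0.58T tokens
```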

- Hi. Are there any plans for Mistral to expand into the application stack? So, for example, when OpenAI released custom GPTs and the Assistants API. Is that a direction you think Mistral will take in the future? - Yeah. So as I've said, we're really focusing on the developer first. But the frontier is pretty thin between developers and users for this technology.

So that's the reason why we released an assistant demonstrator called Le Chat, which means "the cat" in English. And the point here is to expose it to enterprises as well, and enable them to connect their data, connect their context. I think that answers a need from our customers: many of the people we've been talking to are willing to adopt the technology, but they need an entry point.

So if you just give them APIs, they're going to say, OK, but I need an integrator. And then if you don't have an integrator at hand, and oftentimes this is the case, it's good if you have an off-the-shelf solution at least to get them into the technology and show them what they could build for their core business.

So that's the reason why we now have two product offerings: the first one is the platform, and then we have Le Chat, which should evolve into an enterprise off-the-shelf solution. - One more over there. Just wondering, where would you draw the line between when to stop doing prompt engineering and start doing fine-tuning?

Because a lot of my friends and our customers struggle with when they should stop doing more prompt engineering. - I think that's the number one pain point, and it's hard to solve from a product standpoint. Normally, your workflow should start from what you should evaluate on.

And based on that, have your model find a way of solving your task. Right now, this is still a bit manual: you go and you try several versions of prompting. But this is something that AI can actually help solve, and I expect that this is going to become more and more automatic over time. And this is something that we'd love to try and enable.
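[Editor's note: a minimal sketch, in Python, of the evaluation-first workflow described above. The model call, the metric, and the 0.9 bar are hypothetical placeholders; the shape is to fix an evaluation set, try prompt variants against it, and reach for fine-tuning only when the best prompt plateaus below your bar.]

```python
# Hypothetical evaluation-first loop: prompts first, fine-tuning as a fallback.

def run_model(prompt_template: str, example: str) -> str:
    return f"<output for {example!r}>"            # stub model call

def score(output: str, expected: str) -> float:
    return 1.0 if expected in output else 0.0     # swap in a real task metric

def evaluate(prompt_template: str, eval_set: list[tuple[str, str]]) -> float:
    # Average the metric over a fixed evaluation set.
    total = sum(score(run_model(prompt_template, x), y) for x, y in eval_set)
    return total / len(eval_set)

def choose_strategy(prompt_variants: list[str], eval_set, bar: float = 0.9):
    # Try several prompt versions; fine-tune only if the best plateaus below the bar.
    scores = {p: evaluate(p, eval_set) for p in prompt_variants}
    best = max(scores, key=scores.get)
    strategy = "prompt-engineering" if scores[best] >= bar else "fine-tuning"
    return strategy, best
```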

- I wanted to ask a bit more of a personal question. As a founder at the cutting edge of AI, how do you balance your time between explore and exploit? How do you yourself stay on top of a field that's rapidly evolving and becoming larger and deeper every day?

How do you stay on top? - So we explore on the science part, on the product part, and on the business part. And balancing it is effectively hard. As a startup, you do have to exploit a lot, because you need to ship fast.

But on the science part, for instance, you have two or three people working on the next generation of models. And sometimes they lose time. But if you don't do that, you are at risk of becoming irrelevant. And this is very true for the product side as well.

So right now, we have a very simple product. But being able to try out new features and see how they pick up is something that we need to do. And on the business part, you never know who is actually mature enough to use your technology. So yeah, the balance between exploitation and exploration is something that we master well at the science level, because we've been doing it for years.

And somehow, that carries over into the product and the business. But I guess we are still learning to do it properly. - So one more question from me, and then I think we'll be done. We're out of time. But in the scope of two years: models big and small that have taken the world by storm, killer go-to-market partnerships, just tremendous momentum at the center of the AI ecosystem.

What advice would you give to founders here? What you have achieved, and the pace at which you have achieved it, is truly extraordinary. What advice would you give to people here who are at different stages of starting, running, and building their own businesses around the AI opportunity? - I would say it's always day one.

So I guess, yeah, we got some mind share. But there are still many proof points that we need to establish. And so being a founder is basically waking up every day and figuring out that you need to build everything from scratch, all the time.

So it's, I guess, a bit exhausting. But it's also exhilarating. And I would recommend being quite ambitious. Ambition can get you very far. So yeah, you should dream big. That would be my advice. - Awesome. Thank you, Arthur. Thanks for being with us today.

(applause)