Open sourcing the AI ecosystem ft. Arthur Mensch of Mistral AI and Matt Miller
I'm excited to introduce our first speaker, Arthur. Despite just being nine months old as a company, Mistral is already one of the large foundation model companies, and I think they've really shocked everybody by putting out incredibly high quality models approaching GPT-4 in caliber. So we're thrilled to have Arthur with us today to tell us more about the opportunity behind building an open source AI company. And please welcome Arthur-- interviewing him will be my partner, Matt Miller, who is dressed in his best French wear to honor Arthur today and who helps lead our efforts in Europe.
- With all the efficiency of a French train, right? I'd love to start with the founding story: your time at DeepMind, your work on the Chinchilla paper. It's the kind of story we love to hear at Sequoia, and I know that our founder community would love to hear how you got the idea to launch, to break out and start the company.
- I guess the idea was out there for a couple of months before we started. We had been in the field for 10 years doing research, and we had seen how open collaboration occurred between academic labs and industrial labs, and how everybody was able to build on top of one another.
Then some actors started making important changes to the way we train models without publishing them, and the field stopped doing open contributions. We thought it was a shame to stop that early in the AI journey, because we were still very much at the beginning. And so when we saw ChatGPT at the end of the year, we realized that there was some opportunity for doing things differently. There was an opportunity for building very strong open source models, going very fast with a lean team of experienced people, and trying to correct the direction that the field was taking. So we wanted to push the open source models much more.
And it has worked, because we've been followed by various companies releasing open models. So the open source movement was a lot of the drive behind starting the company. Our intention, and the mission that we gave ourselves, is really to bring AI into the hands of every developer. The way it is still done by our competitors is very closed, and so we want to push a much more open platform and accelerate adoption through that strategy.
- And just recently, I mean, fast forward to today, you've been on this tear of amazing partnerships with Microsoft, Snowflake, Databricks, and others. How are you going to think about the trade-off between open source and commercial? Because that's something that many open source companies wrestle with: how do they give to the community, but then how do they also build a successful business?
- So that kind of puts pressure on the open source family, because there's obviously some contenders out there. Compared to how various software providers playing this strategy developed, we need to go faster, because AI actually develops faster than software. And this is a good example of what we could do. We are constantly thinking about how we should contribute to the community, but also how we should start getting some commercial adoption, enterprise deals. For now, I think we've done a good job at it, but it's a very dynamic thing to think through, and we're constantly reassessing what we should release next in both families.
- And you have been the fastest in developing models, the fastest in reaching different benchmark levels, and one of the leanest in expenditure to reach those benchmarks out of any of the foundation model companies. What do you think is giving you that advantage?
- Machine learning has always been about crunching numbers, looking at your data, doing a lot of extract, transform, and load, and things that are oftentimes not fascinating. And so we hired people who were willing to do that stuff, and I think that has been critical to our speed. That's something that we want to keep up.
- When would you tell people that they should spend their time working on the small models, and when would you tell them to work on the large models? Where do you think the economic opportunity lies? Is it in doing more of the big or more of the small?
- Within an application, some tasks need the strongest models, and some should be low latency because they don't require that much capability. An efficient application should leverage both of them, potentially using the large models as an orchestrator for the smaller ones. So you end up with a system that is not only a model, but really two models plus an outer loop of calling your model, calling systems, calling functions.
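As a rough illustration of that two-model pattern, here is a minimal sketch of such an outer loop, assuming a generic `call_llm` helper and hypothetical model names (neither is Mistral's actual API):

```python
import json

# Hypothetical helper: send a prompt to a hosted model and return its
# text reply. Wire this to whichever provider or client you use.
def call_llm(model: str, prompt: str) -> str:
    raise NotImplementedError("connect to your LLM provider here")

# Functions the outer loop may call on the models' behalf.
def search_docs(query: str) -> str:
    return f"(top documents for {query!r})"

TOOLS = {"search_docs": search_docs}

def answer(question: str) -> str:
    # 1. The large model acts as orchestrator: it picks a tool and
    #    defines the subtask to delegate.
    plan = json.loads(call_llm(
        "large-orchestrator",  # hypothetical large model
        'Return JSON {"tool": ..., "tool_arg": ..., "subtask": ...} '
        f"describing how to answer: {question}",
    ))

    # 2. The outer loop executes the requested function call.
    evidence = TOOLS[plan["tool"]](plan["tool_arg"])

    # 3. A small, low-latency model handles the routine final step.
    return call_llm(
        "small-fast",  # hypothetical small model
        f"Using this evidence: {evidence}\nComplete the subtask: {plan['subtask']}",
    )
```

In this shape the expensive model is called once to plan, while the cheap model absorbs the high-volume work, which is the latency and cost trade-off being described.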
Then the questions become: how do you make sure that this works, and that you can evaluate it properly? How do you make sure that you can do continuous integration? How do you move from one version of a model to another and make sure that your application has actually improved? All of these things are addressed by various companies, and they're things that we think should be core to our value proposition.
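One way to picture that continuous-integration step is a regression gate: pin a small evaluation set and promote a new model version only if it scores at least as well as the current one. This is a generic sketch, not Mistral's tooling; it reuses the hypothetical `call_llm` helper from the previous example:

```python
# Tiny pinned evaluation set: each prompt is paired with a substring
# the reply must contain. Real suites would use rubrics or judges.
EVAL_SET = [
    ("What is the capital of France?", "paris"),
    ("What is 2 + 2?", "4"),
]

def score(model: str) -> float:
    hits = sum(
        expected in call_llm(model, prompt).lower()
        for prompt, expected in EVAL_SET
    )
    return hits / len(EVAL_SET)

def can_promote(current: str, candidate: str) -> bool:
    # Swap model versions only when the candidate does not regress.
    return score(candidate) >= score(current)
```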
- And what are some of the most exciting things being built on your models? What are the things that you get really excited about, that you see the community or customers doing?
- I think pretty much every young startup in the Bay Area is building with these models. Part of the value of Mistral, for instance, is that you get more control, and so you can make applications that are more involved. We've seen web search companies using us. We've seen all of the standard enterprise stuff as well, like knowledge management and marketing, where having control of the model means that you can pour in your editorial tone much more. So that's-- yeah, we see the typical use cases.
The benefit of the open source part is that developers have control, because they can use their dedicated instances, and they can modify the weights to suit their needs, with performance that is close to the largest models.
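For concreteness, "modifying the weights" often looks like a parameter-efficient LoRA fine-tune of an open-weights checkpoint. The sketch below uses the Hugging Face `transformers` and `peft` libraries and is a generic recipe, not a workflow described in the talk:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Any open-weights checkpoint works; Mistral 7B is one example.
base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapters instead of all the weights,
# which is a common way to tune an open model to a specific need.
config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,                        # adapter scaling
    target_modules=["q_proj", "v_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of weights
```

From here, the adapted model trains with any standard fine-tuning loop over domain data, while the base weights stay frozen.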
- What's next? What do you think we're going to get to see from you guys? Can you give us a sneak peek of what might be coming soon?
- So Mistral-Large was good, but not good enough, so we are working on improving it quite heavily. We're also working on models specialized on various vertical domains that we'll be announcing very soon. The platform is currently just APIs, and so we are working on making customization part of it. And obviously, as many other companies are, we're working on models that are strong across languages, because as a European company, we're also well-positioned there. And then, yeah, eventually, in the months to come--
- As you mentioned, many of the people in this room are building with your models already. What's the best way for them to work with you?
- We love the projects that are really pushing the community forward and that showcase what you can build with Mistral models. Something that basically makes the models better, and that we are trying to set up, is ways for us to get evaluations, benchmarks, and actual use cases. Knowing what people are building with our models is also a way for us to make a better generation of new open source models. So please engage with us to discuss how we can help. We can also gather some insight from new evaluations to verify that our models are getting better over time.
Our models are also available on various cloud providers, which facilitates adoption for enterprises. And customization capabilities like fine-tuning, which is really where the value of the open source models comes from, are coming to the platform as well.
- You talked a little bit about the benefits of being a European company and the great innovations that can come from Europe. Talk a little bit more about the advantages of building in Europe.
- There's a pool of very strong people that we can train in like three months and get up to speed, and get them basically producing as much as a million-dollar engineer in the Bay Area, at a tenth of the cost. The workforce is very good, engineers and machine learning scientists. Generally speaking, we have a lot of support from the state, which is actually more important in Europe than in the US. And our model is actually probably the strongest French model out there. So there are advantages that are geographical, and we're leveraging them.
- And paint the picture for us five years from now. I know that this world's moving so fast, but five years from now, where does Mistral sit?
- We want to be the platform and the infrastructure of artificial intelligence, and based on that, we'll be able to create assistants and then autonomous agents. We believe that we can become this platform by being independent from cloud providers, et cetera. Five years from now-- I have literally no idea where we will be; I don't think you could have bet on where we are today. But we are evolving toward more and more autonomous agents, and I think the way we work is going to be changed profoundly. Right now, we're focusing on the developer world, but AI technology is, in itself, so easily controllable through human language that potentially, at some point, everybody will be building with it-- this will be something that you learn to do at school.
- Just want to open it up in case there's any questions from the audience.
- How do you see the tension between open source versus commercial models playing out for your company? I think you made a huge splash with open source at first, but as you mentioned, some of the commercial models--
- We need to balance open models with a sustainable business model that can actually fuel the development of the next generation. We need to stay the best at producing open source models, at least on some part of the spectrum, and that sets the constraints on whatever we can do. But bringing AI to every developer is really our mission, and we'll keep doing it.
- There's got to be questions from more than just--
- Can you talk to us a little bit about Llama 3 and Facebook, and how you think about competition with them?
- Well, the Llama team is working on, I guess, making models. But generally, the good thing about open source is that it's never too much of a competition, because once you put a model out there, normally that should actually benefit everybody.
- One thing I'm curious about is the partnerships with Snowflake and Databricks, for example, and running natively in their clouds. Curious if you can talk about why you did those deals, and then also what you see as the future of, say, Databricks or Snowflake in the brave new LLM world.
- I think, generally speaking, AI models become very strong if they are connected to data and grounding information. Enterprise data is oftentimes either on Snowflake or on Databricks, so through these partnerships customers are able to deploy the technology exactly where their data is. I expect that this will continue to be the case, especially as, I believe, we'll move on to more stateful deployments. Today, we deploy serverless APIs with not much state, but as we go forward, as we make models more and more specialized and more tuned to use cases, there will be more state, and that state could actually be part of the data cloud. So there's an open question of where you put the AI state, and my understanding is that Snowflake and Databricks--
- And I think there was a question right behind him.
- I'm curious where you draw the line between openness and secrecy. Will you share details about how you train the models, the recipe for how you collect the data, how you do mixture of experts training? Or do you draw the line at, like, we release the weights and that's it?
- There is a trade-off in between having some form of revenue and being fully open. And there's also a tension between what you actually disclose and not giving your recipe to your competitors. Like, if everybody starts doing it, then we could do it. But for now, we are not taking this risk, indeed.
- I'm curious, when another company releases weights and you only see the weights, what kinds of practices do you use internally to see what you can learn from them?
- You can't learn a lot of things from weights. I guess they are using, like, a mixture of experts, a pretty standard setting, with a couple of tricks. But a trained model compresses information so much that you can't really find out what's going on.
- Are you guys going to still go on the small models? Or are you going to go with the larger ones, basically?
- So model sizes are kind of set by scaling laws, so it depends on the compute you have. Based on the compute you have and the learning infrastructure you want to go to, you make some choices, and you optimize for training cost and for inference cost. The optimal size depends on the weight that you put on the training cost versus the inference cost: the more you amortize training, the more you can compress models. So we intend to keep a family of models that goes from the small ones to the very large ones.
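To make the amortization point concrete, here is a back-of-the-envelope sketch using two common approximations: training costs about 6·N·D FLOPs and inference about 2·N FLOPs per token, where N is parameter count and D is training tokens. The model sizes and token counts below are illustrative assumptions, not Mistral figures:

```python
def total_flops(n_params: float, train_tokens: float, served_tokens: float) -> float:
    train = 6 * n_params * train_tokens    # standard training-cost approximation
    serve = 2 * n_params * served_tokens   # forward-pass cost per token served
    return train + serve

# Two hypothetical models tuned to similar quality: a large one at a
# compute-optimal data ratio, and a small one over-trained on more data.
big   = dict(n_params=70e9, train_tokens=1.4e12)
small = dict(n_params=13e9, train_tokens=5.0e12)

for served in (0.0, 1e12, 10e12):  # lifetime inference tokens to amortize over
    ratio = total_flops(**small, served_tokens=served) / total_flops(**big, served_tokens=served)
    print(f"{served:.0e} tokens served -> small/big total-cost ratio {ratio:.2f}")
# The ratio falls from ~0.66 toward ~0.33: the more inference you amortize
# the training over, the more the smaller, "compressed" model wins.
```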
- So, for example, when OpenAI released the custom GPTs and the Assistants API, is that the direction that you think Mistral will take in the future?
- The frontier in between developers and users is pretty thin. That's the reason why we released an assistant demonstrator called Le Chat, which is "the cat" in English. The point here is to expose it to enterprises as well, and I think that answers some need from our customers. Many of the people we've been talking to need help getting started, and if you don't have an integrator at hand, it's a way for us to show them what they could build for their core business. So that's the reason why we now have two product offerings: the platform, and then Le Chat, which should evolve into an enterprise offering.
- Just wondering, where would you be drawing the line-- because a lot of my friends and our customers run into things that are hard to solve from a product standpoint.
- So right now, this is still a bit manual: you go and you have several versions of prompting. But this is something that AI can actually help solve, and I expect that this is going to grow more and more automated. This is something that we'd love to try and enable.
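A minimal sketch of automating that manual loop, under the same assumptions as the earlier examples (a generic `call_llm` helper and a hypothetical small model): score several prompt versions against a tiny labeled set and keep the best one.

```python
# Candidate prompt templates; {q} is replaced with the user question.
PROMPTS = [
    "Answer concisely: {q}",
    "You are a careful expert. Think step by step, then answer: {q}",
]

# Tiny labeled set for the sketch; real selection needs far more data.
LABELED = [
    ("What is 12 * 12?", "144"),
    ("What is the capital of Japan?", "tokyo"),
]

def accuracy(template: str) -> float:
    hits = sum(
        expected in call_llm("small-fast", template.format(q=q)).lower()
        for q, expected in LABELED
    )
    return hits / len(LABELED)

# Keep whichever prompt version scores best on the labeled set.
best_prompt = max(PROMPTS, key=accuracy)
```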
- I wanted to ask a bit more of a personal question: how do you balance your time between explore and exploit? How do you yourself stay on top of a field that's rapidly evolving and becoming larger and deeper every day?
- I mean, we explore on the science part and on the product part, and the way you balance it is effectively hard. Part of the team is working on the next generation of models, and this is very true for the product side as well: trying things and seeing how they pick up is something that we need to do. So yeah, the balance between exploitation and exploration is something that we master well at the science level, and somehow it carries over into the product and the business.
- So one more question from me, and then I think we'll be done. You've built small models that have taken the world by storm, and you have just tremendous momentum at the center of the AI ecosystem. The speed at which you have achieved this is truly extraordinary. What advice would you give to the founders here who are at different levels of starting and running their companies?
- Part of the journey is basically waking up every day and figuring out that you need to build everything from scratch. And so, usually, I would recommend being quite ambitious.