Best of 2024: Open Models [LS LIVE! at NeurIPS 2024]

00:00:00.000 | [Music]

00:00:06.000 | All right.

00:00:07.000 | All right.

00:00:08.000 | Cool.

00:00:09.000 | Yeah, thanks for having me over.

00:00:12.000 | I'm Luca.

00:00:13.000 | I'm a research scientist at the Allen Institute for AI.

00:00:17.000 | I threw together a few slides on sort of like a recap

00:00:21.000 | of like interesting themes in Open Models for 2024.

00:00:26.000 | I have about maybe 20, 25 minutes of slides,

00:00:30.000 | and then we can chat if there are any questions.

00:00:33.000 | If I can advance to the next slide.

00:00:36.000 | Okay, cool.

00:00:37.000 | So I did the quick check of like to sort of get a sense

00:00:42.000 | of like how much 2024 was different from 2023.

00:00:46.000 | So I went out hugging face and sort of tried to get a picture

00:00:49.000 | of what kind of models were released in 2023

00:00:52.000 | and like what do we get in 2024.

00:00:55.000 | 2023, we got things like both LAMA 1 and 2.

00:00:59.000 | We got Mistral, we got NPT, Falcon models.

00:01:03.000 | I think the Yi model came at the tail end of the year.

00:01:05.000 | It was a pretty good year.

00:01:07.000 | But then I did the same for 2024,

00:01:10.000 | and it's actually quite stark difference.

00:01:15.000 | You have models that are, you know,

00:01:18.000 | reveling frontier level performance of what you can get

00:01:22.000 | from closed models from like Quen, from DeepSeek.

00:01:26.000 | We got LAMA 3, we got all sorts of different models.

00:01:30.000 | I added our own Olmo at the bottom.

00:01:34.000 | There's this growing group of like fully open models

00:01:37.000 | that I'm going to touch on a little bit later.

00:01:40.000 | But, you know, just looking at the slides,

00:01:42.000 | it feels like 2024 was just smooth sailing,

00:01:47.000 | happy news, much better than previous year.

00:01:50.000 | And, you know, you can plot,

00:01:52.000 | you can pick your favorite benchmark,

00:01:56.000 | or least favorite, I don't know,

00:01:58.000 | depending on what point you're trying to make,

00:02:00.000 | and plot, you know, your closed model, your open model,

00:02:05.000 | and sort of spin it in ways that show that,

00:02:08.000 | oh, you know, open models are much closer

00:02:11.000 | to where closed models are today

00:02:14.000 | versus last year where the gap was fairly significant.

00:02:20.000 | So one thing that I think,

00:02:24.000 | I don't know if I have to convince people in this room,

00:02:27.000 | but usually when I give this talks about like open models,

00:02:31.000 | there is always like this background question

00:02:33.000 | in people's mind of like, why should we use open models?

00:02:37.000 | Is it just use model APIs argument?

00:02:40.000 | You know, it's just an HTTP request

00:02:43.000 | to get output from one of the best model out there.

00:02:46.000 | Why do I have to set up infra and use local models?

00:02:50.000 | And there are really like two answer.

00:02:53.000 | There is the more researchy answer for this,

00:02:57.000 | which is where my background lays,

00:02:59.000 | which is just research.

00:03:02.000 | If you wanna do research on language models,

00:03:05.000 | research thrives on open models.

00:03:07.000 | There is like large worth of research on modeling,

00:03:11.000 | on how these models behave,

00:03:12.000 | on evaluation, on inference,

00:03:14.000 | on mechanistic interpretability

00:03:17.000 | that could not happen at all

00:03:20.000 | if you didn't have open models.

00:03:22.000 | There are also for AI builders,

00:03:26.000 | there are also like good use cases for using local models.

00:03:31.000 | You know, you have some,

00:03:33.000 | this is like a very not comprehensive slides,

00:03:36.000 | but you have things like there are some application

00:03:38.000 | where local models just blow closed models out of the water.

00:03:43.000 | So like retrieval, it's a very clear example.

00:03:46.000 | You might have like constraints like edge AI applications

00:03:50.000 | where it makes sense.

00:03:52.000 | But even just like in terms of like stability,

00:03:54.000 | being able to say this model is not changing under the hood,

00:03:57.000 | there's plenty of good cases for open models.

00:04:03.000 | And the community is just not models.

00:04:08.000 | As I stole this slide

00:04:09.000 | from one of the Quantum announcement blog posts,

00:04:14.000 | but it's super cool to see like how much tech exists

00:04:19.000 | around open models and serving them

00:04:22.000 | or making them efficient and hosting them.

00:04:24.000 | It's pretty cool.

00:04:25.000 | And it's, if you think about like

00:04:33.000 | where the term opens come from,

00:04:34.000 | comes from like the open source,

00:04:36.000 | really open models meet the core tenants of open source,

00:04:44.000 | specifically when it comes around collaboration.

00:04:47.000 | There is truly a spirit like through these open models,

00:04:50.000 | you can build on top of other people innovation.

00:04:54.000 | We see a lot of these, even in our own work of like,

00:04:58.000 | as we iterate in the various version of Alma,

00:05:01.000 | it's not just like every time we collect from scratch,

00:05:05.000 | all the data.

00:05:06.000 | Now the first step is like, okay,

00:05:08.000 | what are the cool data sources

00:05:10.000 | and datasets people have put together

00:05:12.000 | for language model for training?

00:05:14.000 | Or when it comes to like our post-training pipeline,

00:05:18.000 | one of the steps is you wanna do some DPO.

00:05:26.000 | I use a lot of outputs of other models

00:05:29.000 | to improve your preference model.

00:05:31.000 | So it's really having like an open sort of ecosystem benefits

00:05:36.000 | and accelerates the development of open models.

00:05:39.000 | One thing that we got in 2024,

00:05:44.000 | which is not a specific model,

00:05:46.000 | but I thought it was really significant

00:05:48.000 | is we got our first open source AI definition.

00:05:53.000 | So this is from the open source initiative.

00:05:56.000 | They've been generally the steward

00:05:59.000 | of a lot of the open source licenses

00:06:02.000 | when it comes to software.

00:06:04.000 | And so they embarked on this journey

00:06:06.000 | and trying to figure out, okay,

00:06:08.000 | how does a license, an open source license

00:06:10.000 | for a model look like?

00:06:12.000 | Majority of the work is very dry

00:06:16.000 | because licenses are dry.

00:06:18.000 | So I'm not gonna walk through the license step-by-step,

00:06:21.000 | but I'm just gonna pick out one aspect that is very good.

00:06:27.000 | And then one aspect that personally

00:06:29.000 | feels like it needs improvement.

00:06:31.000 | On the good side, this open source AI license,

00:06:36.000 | actually, this is very intuitive.

00:06:39.000 | If you have a build open source software

00:06:41.000 | and you have some expectation around

00:06:43.000 | like what open source looks like for software,

00:06:48.000 | for AI, sort of matches your intuition.

00:06:52.000 | So the weights need to be fairly available,

00:06:56.000 | the code must be released with an open source license,

00:07:00.000 | and there shouldn't be like license clauses

00:07:03.000 | that block specific use cases.

00:07:06.000 | So under this definition, for example,

00:07:09.000 | LAMA or some of the QUEN models

00:07:11.000 | are not open source because the license says

00:07:14.000 | you can't use this model for this,

00:07:17.000 | or it says if you use this model,

00:07:19.000 | you have to name the output this way

00:07:22.000 | or derivative needs to be named that way.

00:07:24.000 | Those clauses don't meet open source definition.

00:07:27.000 | And so they will not be covered.

00:07:29.000 | The LAMA license will not be covered

00:07:31.000 | under the open source definition.

00:07:33.000 | It's not perfect.

00:07:38.000 | One of the thing that internally,

00:07:43.000 | in discussion with OSI, we were sort of disappointed,

00:07:47.000 | is around the language for data.

00:07:53.000 | So you might imagine that an open source AI model

00:07:57.000 | means a model where the data is freely available.

00:08:00.000 | There were discussion around that,

00:08:02.000 | but at the end of the day,

00:08:03.000 | they decided to go with a softened stance

00:08:05.000 | where they say a model is open source

00:08:09.000 | if you provide sufficient detail information

00:08:12.000 | on how to sort of replicate the data pipeline

00:08:16.000 | so you have an equivalent system.

00:08:18.000 | Sufficiently detailed?

00:08:21.000 | It's very fuzzy.

00:08:23.000 | Don't like that.

00:08:24.000 | An equivalent system is also very fuzzy.

00:08:27.000 | And this doesn't take into account

00:08:31.000 | the accessibility of the process, right?

00:08:33.000 | It might be that you provide enough information,

00:08:36.000 | but this process costs, I don't know,

00:08:38.000 | $10 million to do.

00:08:40.000 | Now the open source definition,

00:08:42.000 | like any open source license,

00:08:44.000 | has never been about accessibility,

00:08:46.000 | so that's never a factor in open source software,

00:08:49.000 | how accessible software is.

00:08:51.000 | I can make a piece of open source,

00:08:54.000 | put it on my hard drive and never access it.

00:08:56.000 | That software is still open source.

00:08:58.000 | The fact that it's not widely distributed

00:09:00.000 | doesn't change the license,

00:09:01.000 | but practically the right expectation

00:09:04.000 | of what we want good open sources to be,

00:09:07.000 | so it's kind of sad to see that

00:09:09.000 | the data component in this license

00:09:13.000 | is not as open as some of us would like it to be.

00:09:19.000 | And I linked a blog post that Nathan wrote

00:09:21.000 | on the topic that it's less rambly

00:09:24.000 | and easier to follow through.

00:09:27.000 | One thing that in general,

00:09:30.000 | I think it's fair to say

00:09:32.000 | about the state of open models in 2024

00:09:36.000 | is that we know a lot more

00:09:38.000 | than what we knew in 2023.

00:09:41.000 | Both on the training data,

00:09:44.000 | the pre-training data you curate,

00:09:47.000 | on how to do all the post-training,

00:09:50.000 | especially on the RL side.

00:09:53.000 | 2023 was a lot of throwing random darts at the board.

00:09:57.000 | I think 2024, we have clear recipes

00:10:01.000 | that don't get the same results as a closed lab

00:10:04.000 | because there is a cost

00:10:05.000 | in actually matching what they do,

00:10:07.000 | but at least we have a good sense of,

00:10:10.000 | okay, this is the path to get state-of-the-art language model.

00:10:16.000 | I think that one thing that is a downside of 2024

00:10:20.000 | is that I think we are more research constrained than 2023.

00:10:25.000 | It feels like the barrier for compute

00:10:28.000 | that you need to move innovation along

00:10:32.000 | has just been rising and rising.

00:10:36.000 | So if you go back to this slide,

00:10:38.000 | there is now this cluster of models

00:10:41.000 | that are released by the Compute Rich Club.

00:10:46.000 | Membership is hotly debated.

00:10:49.000 | Some people don't want to be called rich

00:10:52.000 | because it comes to expectations.

00:10:53.000 | Some people want to be called rich,

00:10:55.000 | but I don't know, there's debate.

00:10:57.000 | These are players that have 10,000, 50,000 GPUs at minimum,

00:11:03.000 | and so they can do a lot of work

00:11:05.000 | and a lot of exploration in improving models

00:11:08.000 | that is not very accessible.

00:11:11.000 | To give you a sense of how I personally

00:11:14.000 | think about research budgets

00:11:17.000 | for each part of the language model pipeline

00:11:22.000 | is on the pre-training side,

00:11:24.000 | you can maybe do something with 1,000 GPUs.

00:11:27.000 | Really, you want 10,000.

00:11:29.000 | And if you want real estate of the art,

00:11:31.000 | your deep-seek minimum is like 50,000.

00:11:35.000 | You can scale to infinity.

00:11:36.000 | The more you have, the better it gets.

00:11:38.000 | Everyone on that side still complains

00:11:40.000 | that they don't have enough GPUs.

00:11:43.000 | Post-training is a super wide sort of spectrum.

00:11:48.000 | You can do as little with like eight GPUs.

00:11:52.000 | As long as you're able to run a good version,

00:11:59.000 | I'll say a Lama model, you can do a lot of work there.

00:12:03.000 | You can scale.

00:12:04.000 | A lot of the methodology just scales with compute.

00:12:07.000 | If you're interested in your open replication

00:12:13.000 | of what OpenAI's O1 is,

00:12:16.000 | you're going to be on the 10K spectrum of GPUs.

00:12:20.000 | Inference, you can do a lot with very few resources.

00:12:23.000 | Evaluation, you can do a lot with,

00:12:25.000 | well, I should say at least one GPUs

00:12:28.000 | if you want to evaluate open models.

00:12:32.000 | But in general, like if you care a lot about intervention

00:12:37.000 | to do on this model,

00:12:38.000 | which is my preferred area of research,

00:12:42.000 | then the resources that you need are quite significant.

00:12:48.000 | One of the trends that has emerged in 2024

00:12:54.000 | is this cluster of fully open models.

00:12:58.000 | So OMO, the model that we build, AI2 being one of them.

00:13:04.000 | And it's nice that it's not just us.

00:13:06.000 | There's like a cluster of other mostly research efforts

00:13:11.000 | who are working on this.

00:13:15.000 | And so it's good to give you a primer

00:13:19.000 | of what like fully open means.

00:13:22.000 | So fully open, the easy way to think about it is

00:13:26.000 | instead of just releasing a model checkpoint that you run,

00:13:29.000 | you release a full recipe

00:13:31.000 | so that other people working on that space

00:13:36.000 | can pick and choose whatever they want from your recipe

00:13:40.000 | and create their own model

00:13:41.000 | or improve on top of your model.

00:13:43.000 | You're giving out the full pipeline

00:13:45.000 | and all the details there

00:13:47.000 | instead of just like the end output.

00:13:51.000 | So I pull up the screenshot from our recent MOE model.

00:13:56.000 | And like for this model, for example,

00:13:58.000 | we released the model itself, data that was trained on,

00:14:01.000 | the code, both for training and inference,

00:14:04.000 | all the logs that we got through the training run,

00:14:08.000 | as well as every intermediate checkpoint.

00:14:13.000 | And like the fact that you release

00:14:15.000 | different part of the pipeline

00:14:17.000 | allows others to do really cool things.

00:14:20.000 | So for example, this tweet from early this year

00:14:23.000 | from Fox and News Research,

00:14:25.000 | they use our pre-training data

00:14:28.000 | to do a replication of the BitNet paper in the open.

00:14:32.000 | So they took just a really,

00:14:35.000 | like the initial part of a pipeline

00:14:37.000 | and then did the thing on top of it.

00:14:41.000 | It goes both ways.

00:14:42.000 | So for example, for the old MOE2 model,

00:14:46.000 | a lot of our pre-trained data

00:14:48.000 | for the first stage of pre-training

00:14:51.000 | was from this DCLM initiative

00:14:55.000 | that was led by folks, a variety of institutions.

00:15:00.000 | It was a really nice group effort.

00:15:02.000 | But for when it was nice to be able to say, okay,

00:15:07.000 | the state-of-the-art in terms of like

00:15:09.000 | what is done in the open has improved.

00:15:11.000 | We don't have to like do all this work from scratch

00:15:14.000 | to catch up the state-of-the-art.

00:15:16.000 | We can just take it directly and integrate it

00:15:19.000 | and do our own improvements on top of that.

00:15:24.000 | I'm going to spend a few minutes doing like a shameless plug

00:15:27.000 | for some of our fully open recipes.

00:15:33.000 | So indulge me in this.

00:15:35.000 | So a few things that we released this year was,

00:15:37.000 | as I was mentioning, this old MOE model,

00:15:41.000 | which is, I think still is state-of-the-art MOE model

00:15:46.000 | in its size class.

00:15:48.000 | And it's also fully open.

00:15:50.000 | So every component of this model are available.

00:15:54.000 | We released a multimodal model called Molmo.

00:15:57.000 | Molmo is not just a model, but it's a full recipe

00:16:00.000 | of how you go from a text-only model

00:16:03.000 | to a multimodal model.

00:16:05.000 | And we apply this recipe on top of Quent checkpoints,

00:16:08.000 | on top of Olmo checkpoints,

00:16:10.000 | as well as on top of Olmo E.

00:16:12.000 | And I think there's been replication doing that

00:16:14.000 | on top of Mistral as well.

00:16:21.000 | On the post-training side, we recently released Tulu 3.

00:16:25.000 | Same story.

00:16:26.000 | This is a recipe on how you go from a base model

00:16:29.000 | to a state-of-the-art post-training model.

00:16:33.000 | We used the Tulu recipe on top of Olmo, on top of Llama,

00:16:37.000 | and there has been open replication effort

00:16:40.000 | to do that on top of Quent as well.

00:16:42.000 | It's really nice to see when your recipe is kind of turnkey.

00:16:47.000 | You can apply it to different models,

00:16:48.000 | and it kind of just works.

00:16:50.000 | And finally, the last thing we released this year

00:16:53.000 | was Olmo 2, which so far is the best state-of-the-art

00:16:58.000 | fully open language model.

00:17:00.000 | It sort of combines aspects from all three

00:17:02.000 | of these previous models.

00:17:04.000 | What we learned on the data side from Olmo E,

00:17:07.000 | and what we learned on making models

00:17:09.000 | that are easy to adapt from the Olmo project

00:17:11.000 | and the Tulu project.

00:17:12.000 | I will close with a little bit of reflection

00:17:17.000 | on ways this ecosystem of open models--

00:17:22.000 | it's not all roses.

00:17:24.000 | It's not all happy.

00:17:25.000 | It feels like day to day, it's always in peril.

00:17:30.000 | And I talked a little bit about the compute issues

00:17:33.000 | over there, but it's really not just compute.

00:17:37.000 | One thing that is on top of my mind

00:17:39.000 | is due to the environment and how

00:17:44.000 | growing feelings about how AI is treated,

00:17:48.000 | it's actually harder to get access

00:17:50.000 | to a lot of the data that was used to train a lot of the models

00:17:54.000 | up to last year.

00:17:55.000 | So this is a screenshot from really fabulous work

00:17:58.000 | from Shane Longpere, who I think is in Europe, about just access

00:18:05.000 | of--

00:18:06.000 | like diminishing access to data for language model pre-training.

00:18:10.000 | So what they did is they went through every snapshot

00:18:15.000 | of Common Crawl.

00:18:17.000 | Common Crawl is this publicly available scrape

00:18:20.000 | of a subset of the internet.

00:18:22.000 | And they looked at how, for any given website,

00:18:27.000 | whether a website that was accessible in, say, 2017,

00:18:31.000 | whether it was accessible or not in 2024.

00:18:34.000 | And what they found is as a reaction

00:18:37.000 | to the existence of closed models,

00:18:42.000 | like OpenAI or Cloud, GPT or Cloud,

00:18:47.000 | a lot of content owners have blanket blocked any type

00:18:51.000 | of crawling to their website.

00:18:54.000 | And this is something that we see also internally at AI2.

00:18:57.000 | Like one project that we started this year

00:18:59.000 | is we wanted to understand, like,

00:19:03.000 | if you're a good citizen of the internet

00:19:06.000 | and you crawl following norms and policy that

00:19:11.000 | have been established in the last 25 years, what can you crawl?

00:19:15.000 | And we found that there's a lot of websites

00:19:18.000 | where the norms of how you express preference

00:19:22.000 | of whether to crawl or not are broken.

00:19:24.000 | A lot of people would block a lot of crawling

00:19:27.000 | but do not advertise that in robots.txt.

00:19:30.000 | You can only tell that they're crawling,

00:19:32.000 | that they're blocking you in crawling when you try doing it.

00:19:35.000 | Sometimes you can't even crawl the robots.txt

00:19:38.000 | to check whether you're allowed or not.

00:19:41.000 | And then a lot of websites, there's

00:19:45.000 | like all these technologies that historically have existed

00:19:49.000 | to make website serving easier, such as Cloudflare or DNS.

00:19:55.000 | They're now being repurposed for blocking AI or any type

00:20:00.000 | of crawling in a way that is very opaque to the content

00:20:05.000 | owners themselves.

00:20:07.000 | So you go to these websites, you try to access them,

00:20:11.000 | and they're not available.

00:20:13.000 | And you get a feeling it's like, oh, something changed

00:20:16.000 | on the DNS side that it's blocking this.

00:20:20.000 | And likely the content owner has no idea.

00:20:22.000 | They're just using Cloudflare for better load balancing.

00:20:27.000 | And this is something that was sort of sprung on them

00:20:30.000 | with very little notice.

00:20:31.000 | I think the problem is this blocking really impacts people

00:20:40.000 | in different ways.

00:20:43.000 | It disproportionately helps companies

00:20:46.000 | that have a head start, which are usually the closed labs.

00:20:50.000 | And it hurts incoming newcomer players,

00:20:55.000 | where you either have to do things in a sketchy way,

00:20:59.000 | or you're never going to get that content

00:21:02.000 | that the closed lab might have.

00:21:04.000 | So there was a lot of coverage.

00:21:06.000 | I'm going to plug Nathan's blog post again.

00:21:11.000 | I think the title of this one is very succinct,

00:21:14.000 | which is we're actually not--

00:21:16.000 | before thinking about running out of training data,

00:21:19.000 | we're actually running out of open training data.

00:21:22.000 | And so if we want better open models,

00:21:24.000 | they should be on top of our mind.

00:21:28.000 | The other thing that has emerged is

00:21:31.000 | that there is strong lobbying efforts

00:21:34.000 | on trying to define any kind of open source

00:21:38.000 | AI as a new, extremely risky danger.

00:21:45.000 | And I want to be precise here.

00:21:47.000 | The problem is not not considering

00:21:52.000 | the risk of this technology.

00:21:53.000 | Every technology has risks that should always be considered.

00:21:56.000 | The thing that is, to me, is--

00:21:59.000 | sorry, is ingenious-- is just putting this AI

00:22:03.000 | on a pedestal and calling it an unknown alien technology that

00:22:09.000 | has new and undiscovered potentials to destroy humanity.

00:22:16.000 | When in reality, all the dangers, I think,

00:22:18.000 | are rooted in dangers that we know

00:22:21.000 | from existing software industry or existing issues that

00:22:27.000 | come when using software on a lot of sensitive domains,

00:22:32.000 | like medical areas.

00:22:35.000 | And I also noticed a lot of efforts

00:22:37.000 | that have actually been going on in trying

00:22:39.000 | to make these open models safe.

00:22:42.000 | I pasted one here from AI2.

00:22:45.000 | But there's actually a lot of work

00:22:47.000 | that has been going on on like, OK, how do you make--

00:22:50.000 | if you're distributing this model openly,

00:22:53.000 | how do you make it safe?

00:22:55.000 | What's the right balance between accessibility

00:22:57.000 | on open models and safety?

00:23:00.000 | And then also, there's annoying brushing

00:23:03.000 | of concerns that are then proved to be unfounded under the rug.

00:23:09.000 | If you remember at the beginning of this year,

00:23:11.000 | it was all about bio-risk of these open models.

00:23:15.000 | The whole thing fizzled out because there's been--

00:23:19.000 | finally, there's been rigorous research,

00:23:22.000 | not just this paper from Cohere folks,

00:23:25.000 | but there's been rigorous research showing

00:23:27.000 | that this is really not a concern that we

00:23:30.000 | should be worried about.

00:23:31.000 | Again, there is a lot of dangerous use of AI application.

00:23:34.000 | But this one was just like a lobbying ploy

00:23:38.000 | to just make things sound scarier than they actually are.

00:23:43.000 | So I've got to preface this part.

00:23:45.000 | It says this is my personal opinion.

00:23:47.000 | It's not my employer.

00:23:48.000 | But I look at things like the SP1047 from California.

00:23:53.000 | And I think we kind of dodged a bullet on this legislation.

00:23:59.000 | The open source community, a lot of the community

00:24:02.000 | came together at sort of the last minute

00:24:06.000 | and did a very good effort trying

00:24:08.000 | to explain all the negative impact of this bill.

00:24:13.000 | But I feel like there's a lot of excitement

00:24:17.000 | on building these open models or researching

00:24:20.000 | on these open models.

00:24:22.000 | And lobbying is not sexy.

00:24:24.000 | It's kind of boring, but it's sort of

00:24:28.000 | necessary to make sure that this ecosystem can really thrive.

00:24:34.000 | This end of presentation, I have some links, emails,

00:24:38.000 | sort of standard thing in case anyone wants to reach out.

00:24:41.000 | And if folks have questions or anything

00:24:46.000 | they wanted to discuss, sort of open the floor.

00:24:50.000 | I'm very curious how we should build incentives

00:24:53.000 | to build open models, things like Francois Chollet's Arc

00:24:57.000 | Prize and other initiatives like that.

00:24:59.000 | What is your opinion on how we should

00:25:01.000 | better align incentives in the community

00:25:03.000 | so that open models stay open?

00:25:05.000 | The incentive bit is like really hard.

00:25:07.000 | Like even-- it's something that actually even we

00:25:10.000 | think a lot about it internally.

00:25:13.000 | Because like building open models is risky.

00:25:16.000 | It's very expensive.

00:25:19.000 | And so people don't want to take risky bets.

00:25:22.000 | I think definitely like the challenges, like our challenge,

00:25:28.000 | I think those are like very valid approaches for it.

00:25:32.000 | And then I think in general, promoting, building,

00:25:38.000 | any kind of effort to participate in this challenge,

00:25:41.000 | in those challenges, if we can promote

00:25:43.000 | doing that on top of open models.

00:25:46.000 | And sort of really lean into this multiplier effect,

00:25:51.000 | I think that is a good way to go.

00:25:55.000 | If there were more money for efforts, like research

00:26:01.000 | efforts around open models, there's a lot of--

00:26:04.000 | I think there's a lot of investments in companies

00:26:06.000 | that at the moment are releasing their model in the open, which

00:26:10.000 | is really cool.

00:26:11.000 | But it's usually more because of commercial interest

00:26:15.000 | and not wanting to support this open models in the long term.

00:26:21.000 | It's a really hard problem because I think everyone

00:26:24.000 | is operating sort of in what--

00:26:27.000 | everyone is at their local maximum, right?

00:26:29.000 | In ways that really optimize their position on the market,

00:26:33.000 | the global maximum is harder to achieve.

00:26:38.000 | Yeah, I'm super excited to be here

00:26:40.000 | to talk to you guys about Mistral.

00:26:43.000 | A really short and quick recap of what we have done,

00:26:47.000 | what kind of models and products we have released

00:26:50.000 | in the past a year and a half.

00:26:53.000 | So most of you have already known

00:26:56.000 | that we are a small startup founded about a year

00:27:00.000 | and a half ago in Paris.

00:27:02.000 | In May, 2003, it was founded by three of our co-founders.

00:27:06.000 | And in September, 2003, we released our first open source

00:27:10.000 | model, Mistral 7B.

00:27:13.000 | Yeah, how many of you have used or heard about Mistral 7B?

00:27:17.000 | Hey, pretty much everyone.

00:27:19.000 | Thank you.

00:27:20.000 | Yeah, it's pretty popular.

00:27:23.000 | And our community really loved this model.

00:27:27.000 | And in December, 2003, we released another popular model

00:27:32.000 | with the MLE architecture, Mistral 8x7B.

00:27:36.000 | And going into this year, you can

00:27:40.000 | see we have released a lot of things this year.

00:27:43.000 | First of all, in February, 2004, we

00:27:45.000 | released Mistral Small, Mistral Large, Le Chat,

00:27:49.000 | which is our chat interface.

00:27:51.000 | I will show you in a little bit.

00:27:53.000 | We released an embedding model for converting your text

00:27:59.000 | into embedding vectors.

00:28:01.000 | And all of our models are available on the cloud

00:28:06.000 | resources.

00:28:07.000 | So you can use our model on Google Cloud, AWS, Azure,

00:28:11.000 | Snowflake, IBM.

00:28:13.000 | So very useful for enterprise who wants

00:28:16.000 | to use our model through cloud.

00:28:18.000 | And in April and May this year, we

00:28:21.000 | released another powerful open source MLE model, 8x22B.

00:28:27.000 | And we also released our first code model,

00:28:30.000 | Coastral, which is amazing at 80-plus languages.

00:28:34.000 | And then we provided another fine-tuning service

00:28:37.000 | for customization.

00:28:39.000 | So because we know the community love to fine-tune our models,

00:28:42.000 | so we provide you a very nice and easy option

00:28:45.000 | for you to fine-tune our model on our platform.

00:28:48.000 | And also, we released our fine-tuning code base

00:28:51.000 | called Mistral Fine-Tune.

00:28:52.000 | It's open source.

00:28:53.000 | So feel free to take a look.

00:28:56.000 | And more models.

00:28:59.000 | On July to November this year, we

00:29:02.000 | released many, many other models.

00:29:06.000 | First of all is the two new best small models.

00:29:10.000 | We have Mistral 3B, great for deploying on edge devices.

00:29:16.000 | We have Mistral 8B.

00:29:18.000 | If you used to use Mistral 7B, Mistral 8B

00:29:21.000 | is a great replacement with much stronger performance

00:29:24.000 | than Mistral 7B.

00:29:26.000 | We also collaborated with NVIDIA

00:29:28.000 | and open sourced another model, Nemo 12B, another great model.

00:29:33.000 | And just a few weeks ago, we updated Mistral Large

00:29:37.000 | with version 2 with the updated state-of-the-art features

00:29:42.000 | and really great function calling capabilities.

00:29:45.000 | It's supporting function calling latently.

00:29:48.000 | And we released two multi-modal models.

00:29:51.000 | Mistral 12B, it's open source.

00:29:54.000 | And Mistral Large, just amazing models

00:29:58.000 | for not understanding images, but also

00:30:01.000 | great at text understanding.

00:30:03.000 | So yeah, a lot of the image models

00:30:05.000 | are not so good at text understanding.

00:30:07.000 | But Mistral Large and Mistral 12B

00:30:10.000 | are good at both image understanding and text

00:30:13.000 | understanding.

00:30:14.000 | And of course, we have models for research.

00:30:17.000 | Coastal Mamba is built on Mamba architecture and Mithril,

00:30:22.000 | great with working with math problems.

00:30:26.000 | So yeah, that's another model.

00:30:28.000 | Here's another view of our model reference.

00:30:37.000 | We have several premier models, which

00:30:39.000 | means these models are mostly available through our API.

00:30:44.000 | I mean, all of the models are available throughout our API

00:30:48.000 | except for Mistral 3B.

00:30:51.000 | But for the premier model, they have a special license,

00:30:55.000 | Mistral Research License.

00:30:57.000 | You can use it for free for exploration.

00:30:59.000 | But if you want to use it for enterprise, for production use,

00:31:03.000 | you will need to purchase a license from us.

00:31:06.000 | So on the top row here, we have Mistral 3B and 8B

00:31:10.000 | as our premier model.

00:31:12.000 | Mistral Small for best low latency use cases.

00:31:16.000 | Mistral Large is great for your most sophisticated use cases.

00:31:20.000 | Pixel Large is the frontier class multimodal model.

00:31:24.000 | And we have Coastal, great for coding.

00:31:26.000 | And then again, Mistral Embedding Model.

00:31:29.000 | And the bottom of the slides here,

00:31:32.000 | we have several Apache 2.0 licensed open way models

00:31:37.000 | free for the community to use.

00:31:38.000 | And also, if you want to fine-tune it,

00:31:40.000 | use it for customization, production, feel free to do so.

00:31:44.000 | The latest, we have Pixtel 12B.

00:31:47.000 | We also have Mistral Nemo, Coastal Mamba, and Mistral,

00:31:53.000 | as I mentioned.

00:31:55.000 | And we have three legacy models that we don't update anymore.

00:31:59.000 | So we recommend you to move to our newer models

00:32:03.000 | if you are still using them.

00:32:06.000 | And then just a few weeks ago,

00:32:09.000 | we did a lot of improvements to our code interface, Lachet.

00:32:16.000 | How many of you have used Lachet?

00:32:19.000 | Oh, no, only a few.

00:32:21.000 | Okay, I highly recommend Lachet.

00:32:23.000 | It's chat.mistral.ai.

00:32:26.000 | It's free to use.

00:32:27.000 | It has all the amazing capabilities

00:32:29.000 | I'm going to show you right now.

00:32:31.000 | But before that, Lachet in French means cat.

00:32:34.000 | So this is actually a cat logo.

00:32:38.000 | Yeah, if you can tell, this is cat eyes.

00:32:44.000 | Yeah, so first of all, I want to show you something.

00:32:48.000 | Maybe let's take a look at image understanding.

00:33:00.000 | So here I have a receipt, and I want to ask--

00:33:07.000 | sorry, just going to get the prompts.

00:33:14.000 | Going back.

00:33:24.000 | What's going on?

00:33:31.000 | Yeah, I had an issue with Wi-Fi here,

00:33:32.000 | so hopefully it would work.

00:33:38.000 | Cool, so basically, I have a receipt,

00:33:41.000 | and I said I ordered coffee and a sausage.

00:33:45.000 | How much do I owe at an 18% tip?

00:33:49.000 | So hopefully it was able to get the cost of the coffee

00:33:52.000 | and the sausage and ignore the other things.

00:33:56.000 | And, yeah, I don't really understand this,

00:33:58.000 | but I think this is coffee.

00:34:01.000 | It's, yeah, nine.

00:34:04.000 | Yeah, and then cost of the sausage.

00:34:06.000 | We have 22 here.

00:34:09.000 | Yep, and then it was able to add the cost,

00:34:12.000 | calculate the tip, and all that.

00:34:15.000 | Great, so it's great at image understanding.

00:34:18.000 | It's great at OCR tasks.

00:34:20.000 | So if you have OCR tasks, please use it.

00:34:23.000 | It's free on Lachat.

00:34:25.000 | It's also available through our API.

00:34:28.000 | And also I want to show you a Canvas example.

00:34:31.000 | A lot of you may have used Canvas with other tools before,

00:34:38.000 | but with Lachat, it's completely free again.

00:34:43.000 | Here I'm asking it to create a Canvas

00:34:45.000 | use PyScript to execute Python in my browser.

00:34:50.000 | So, ooh, what's going on?

00:34:55.000 | Okay, let's see if it works.

00:34:58.000 | Import this.

00:35:00.000 | Oh.

00:35:04.000 | Yep, okay.

00:35:06.000 | So, yeah, so basically it's executing Python here,

00:35:10.000 | exactly what we wanted.

00:35:13.000 | And the other day I was trying to ask Lachat

00:35:16.000 | to create a game for me.

00:35:19.000 | Let's see if we can make it work.

00:35:24.000 | Yeah, the Tetris game.

00:35:27.000 | Yep.

00:35:31.000 | Let's just get one row, maybe.

00:35:44.000 | Ah!

00:35:45.000 | Oh, no.

00:35:51.000 | Okay, never mind.

00:35:52.000 | You get the idea.

00:35:53.000 | I failed my mission.

00:35:58.000 | Okay, here we go.

00:36:01.000 | Yay!

00:36:04.000 | Cool.

00:36:05.000 | Yeah, so as you can see, Lachat can write, like,

00:36:09.000 | a code about a simple game pretty easily,

00:36:12.000 | and you can ask Lachat to explain the code,

00:36:15.000 | make updates, however you like.

00:36:19.000 | Another example.

00:36:21.000 | There is a bar here I want to move.

00:36:24.000 | Okay.

00:36:26.000 | Right, okay.

00:36:27.000 | And let's go back.

00:36:33.000 | Another one.

00:36:36.000 | Yeah, we also have web search capabilities,

00:36:38.000 | like you can ask what's the latest AI news.

00:36:42.000 | Image generation is pretty cool.

00:36:44.000 | Generate an image about researchers in Vancouver.

00:36:56.000 | Yeah, Black Forest Labs, FlexPro.

00:36:59.000 | Again, this is free, so.

00:37:03.000 | Oh, cool.

00:37:05.000 | I guess researchers here are mostly from University of British Columbia.

00:37:10.000 | That's smart.

00:37:11.000 | Yeah, so this is Lachat.

00:37:14.000 | Please feel free to use it and let me know if you have any feedback.

00:37:19.000 | We're always looking for improvement,

00:37:21.000 | and we're going to release a lot more powerful features in the coming years.

00:37:25.000 | Thank you.

00:37:26.000 | [APPLAUSE]