Best of 2024: Open Models [LS LIVE! at NeurIPS 2024]


00:00:00.000 | [Music]
00:00:06.000 | All right.
00:00:07.000 | All right.
00:00:08.000 | Cool.
00:00:09.000 | Yeah, thanks for having me over.
00:00:12.000 | I'm Luca.
00:00:13.000 | I'm a research scientist at the Allen Institute for AI.
00:00:17.000 | I threw together a few slides on sort of like a recap
00:00:21.000 | of like interesting themes in Open Models for 2024.
00:00:26.000 | I have about maybe 20, 25 minutes of slides,
00:00:30.000 | and then we can chat if there are any questions.
00:00:33.000 | If I can advance to the next slide.
00:00:36.000 | Okay, cool.
00:00:37.000 | So I did a quick check to sort of get a sense
00:00:42.000 | of how much 2024 was different from 2023.
00:00:46.000 | So I went on Hugging Face and sort of tried to get a picture
00:00:49.000 | of what kind of models were released in 2023
00:00:52.000 | and what we got in 2024.
00:00:55.000 | In 2023, we got things like both Llama 1 and 2.
00:00:59.000 | We got Mistral, we got MPT, Falcon models.
00:01:03.000 | I think the Yi model came at the tail end of the year.
00:01:05.000 | It was a pretty good year.
00:01:07.000 | But then I did the same for 2024,
00:01:10.000 | and it's actually quite a stark difference.
00:01:15.000 | You have models that are, you know,
00:01:18.000 | rivaling the frontier-level performance of what you can get
00:01:22.000 | from closed models, from like Qwen, from DeepSeek.
00:01:26.000 | We got Llama 3, we got all sorts of different models.
00:01:30.000 | I added our own OLMo at the bottom.
00:01:34.000 | There's this growing group of like fully open models
00:01:37.000 | that I'm going to touch on a little bit later.
00:01:40.000 | But, you know, just looking at the slides,
00:01:42.000 | it feels like 2024 was just smooth sailing,
00:01:47.000 | happy news, much better than the previous year.
00:01:50.000 | And, you know, you can plot,
00:01:52.000 | you can pick your favorite benchmark,
00:01:56.000 | or least favorite, I don't know,
00:01:58.000 | depending on what point you're trying to make,
00:02:00.000 | and plot, you know, your closed model, your open model,
00:02:05.000 | and sort of spin it in ways that show that,
00:02:08.000 | oh, you know, open models are much closer
00:02:11.000 | to where closed models are today
00:02:14.000 | versus last year where the gap was fairly significant.
00:02:20.000 | So one thing that I think,
00:02:24.000 | I don't know if I have to convince people in this room,
00:02:27.000 | but usually when I give these talks about open models,
00:02:31.000 | there is always like this background question
00:02:33.000 | in people's mind of like, why should we use open models?
00:02:37.000 | There's the "just use model APIs" argument.
00:02:40.000 | You know, it's just an HTTP request
00:02:43.000 | to get output from one of the best models out there.
00:02:46.000 | Why do I have to set up infra and use local models?
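To make the "just an HTTP request" point concrete, here is a minimal sketch of what calling a hosted model API typically looks like; the endpoint, model name, and key handling follow the common OpenAI-style chat completions convention and are placeholders, not something specified in the talk.

```python
# Minimal sketch of the "just an HTTP request" workflow against a hosted model API.
# Endpoint, model name, and API key are placeholders (OpenAI-style convention).
import os
import requests

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
    json={
        "model": "gpt-4o",  # whichever hosted model you pay for
        "messages": [{"role": "user", "content": "Summarize open models in 2024."}],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```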
00:02:50.000 | And there are really two answers.
00:02:53.000 | There is the more researchy answer for this,
00:02:57.000 | which is where my background lies,
00:02:59.000 | which is just research.
00:03:02.000 | If you wanna do research on language models,
00:03:05.000 | research thrives on open models.
00:03:07.000 | There is a large body of research on modeling,
00:03:11.000 | on how these models behave,
00:03:12.000 | on evaluation, on inference,
00:03:14.000 | on mechanistic interpretability
00:03:17.000 | that could not happen at all
00:03:20.000 | if you didn't have open models.
00:03:22.000 | For AI builders,
00:03:26.000 | there are also good use cases for using local models.
00:03:31.000 | You know, you have some,
00:03:33.000 | this is a very non-comprehensive slide,
00:03:36.000 | but you have things like, there are some applications
00:03:38.000 | where local models just blow closed models out of the water.
00:03:43.000 | So retrieval, it's a very clear example.
00:03:46.000 | You might have like constraints like edge AI applications
00:03:50.000 | where it makes sense.
00:03:52.000 | But even just like in terms of like stability,
00:03:54.000 | being able to say this model is not changing under the hood,
00:03:57.000 | there's plenty of good cases for open models.
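As a counterpart to the hosted-API route, here is a minimal sketch of running an open model locally with Hugging Face Transformers; the checkpoint name is just an illustrative example, and pinning a revision is how you get the "not changing under the hood" stability mentioned above.

```python
# Minimal sketch of local inference with an open checkpoint (Hugging Face Transformers).
# The model name and revision are illustrative placeholders.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # example open checkpoint; swap in your own
    revision="main",                             # pin a specific revision for stability
    device_map="auto",
)

print(generator("Why might a builder prefer a local model?", max_new_tokens=64)[0]["generated_text"])
```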
00:04:03.000 | And the community is not just models.
00:04:08.000 | I stole this slide
00:04:09.000 | from one of the Qwen announcement blog posts,
00:04:14.000 | but it's super cool to see like how much tech exists
00:04:19.000 | around open models and serving them
00:04:22.000 | or making them efficient and hosting them.
00:04:24.000 | It's pretty cool.
00:04:25.000 | And it's, if you think about
00:04:33.000 | where the term "open" comes from,
00:04:34.000 | it comes from open source,
00:04:36.000 | really open models meet the core tenets of open source,
00:04:44.000 | specifically when it comes to collaboration.
00:04:47.000 | There is truly a spirit that, through these open models,
00:04:50.000 | you can build on top of other people's innovation.
00:04:54.000 | We see a lot of this, even in our own work:
00:04:58.000 | as we iterate on the various versions of OLMo,
00:05:01.000 | it's not like every time we collect
00:05:05.000 | all the data from scratch.
00:05:06.000 | Now the first step is like, okay,
00:05:08.000 | what are the cool data sources
00:05:10.000 | and datasets people have put together
00:05:12.000 | for language model training?
00:05:14.000 | Or when it comes to our post-training pipeline,
00:05:18.000 | one of the steps is you wanna do some DPO,
00:05:26.000 | and you use a lot of outputs of other models
00:05:29.000 | to improve your preference model.
00:05:31.000 | So really, having an open ecosystem benefits
00:05:36.000 | and accelerates the development of open models.
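Since DPO comes up as one of those post-training steps, here is a minimal sketch of the core DPO objective in plain PyTorch; it assumes you have already computed summed log-probabilities of the chosen and rejected responses under both the policy being trained and a frozen reference model, and the beta value is just a typical placeholder rather than anything from the OLMo recipe.

```python
# Minimal sketch of the DPO (Direct Preference Optimization) objective.
# Inputs are summed log-probs of chosen/rejected responses under the policy
# being trained and under a frozen reference model; beta is a placeholder.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # How much more the policy prefers each response than the reference does.
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    # Push the margin between chosen and rejected rewards to be large.
    return -F.logsigmoid(beta * (chosen_rewards - rejected_rewards)).mean()
```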
00:05:39.000 | One thing that we got in 2024,
00:05:44.000 | which is not a specific model,
00:05:46.000 | but I thought it was really significant
00:05:48.000 | is we got our first open source AI definition.
00:05:53.000 | So this is from the open source initiative.
00:05:56.000 | They've been generally the steward
00:05:59.000 | of a lot of the open source licenses
00:06:02.000 | when it comes to software.
00:06:04.000 | And so they embarked on this journey
00:06:06.000 | and trying to figure out, okay,
00:06:08.000 | what does an open source license
00:06:10.000 | for a model look like?
00:06:12.000 | The majority of the work is very dry
00:06:16.000 | because licenses are dry.
00:06:18.000 | So I'm not gonna walk through the license step-by-step,
00:06:21.000 | but I'm just gonna pick out one aspect that is very good.
00:06:27.000 | And then one aspect that personally
00:06:29.000 | feels like it needs improvement.
00:06:31.000 | On the good side, this open source AI definition
00:06:36.000 | is actually very intuitive.
00:06:39.000 | If you have built open source software
00:06:41.000 | and you have some expectations around
00:06:43.000 | what open source looks like for software,
00:06:48.000 | the definition for AI sort of matches your intuition.
00:06:52.000 | So the weights need to be freely available,
00:06:56.000 | the code must be released with an open source license,
00:07:00.000 | and there shouldn't be license clauses
00:07:03.000 | that block specific use cases.
00:07:06.000 | So under this definition, for example,
00:07:09.000 | Llama or some of the Qwen models
00:07:11.000 | are not open source, because the license says
00:07:14.000 | you can't use this model for this,
00:07:17.000 | or it says if you use this model,
00:07:19.000 | you have to name the output this way
00:07:22.000 | or derivatives need to be named that way.
00:07:24.000 | Those clauses don't meet the open source definition,
00:07:27.000 | and so they will not be covered.
00:07:29.000 | The Llama license will not be covered
00:07:31.000 | under the open source definition.
00:07:33.000 | It's not perfect.
00:07:38.000 | One of the things that, internally
00:07:43.000 | and in discussions with OSI, we were sort of disappointed by,
00:07:47.000 | is the language around data.
00:07:53.000 | So you might imagine that an open source AI model
00:07:57.000 | means a model where the data is freely available.
00:08:00.000 | There were discussions around that,
00:08:02.000 | but at the end of the day,
00:08:03.000 | they decided to go with a softened stance
00:08:05.000 | where they say a model is open source
00:08:09.000 | if you provide sufficiently detailed information
00:08:12.000 | on how to sort of replicate the data pipeline
00:08:16.000 | so you have an equivalent system.
00:08:18.000 | Sufficiently detailed?
00:08:21.000 | It's very fuzzy.
00:08:23.000 | Don't like that.
00:08:24.000 | An equivalent system is also very fuzzy.
00:08:27.000 | And this doesn't take into account
00:08:31.000 | the accessibility of the process, right?
00:08:33.000 | It might be that you provide enough information,
00:08:36.000 | but this process costs, I don't know,
00:08:38.000 | $10 million to do.
00:08:40.000 | Now, the open source definition,
00:08:42.000 | like any open source license,
00:08:44.000 | has never been about accessibility;
00:08:46.000 | how accessible software is has never been
00:08:49.000 | a factor in open source software.
00:08:51.000 | I can make a piece of open source software,
00:08:54.000 | put it on my hard drive and never access it.
00:08:56.000 | That software is still open source.
00:08:58.000 | The fact that it's not widely distributed
00:09:00.000 | doesn't change the license,
00:09:01.000 | but practically, there is a right expectation
00:09:04.000 | of what we want good open source to be,
00:09:07.000 | so it's kind of sad to see that
00:09:09.000 | the data component in this definition
00:09:13.000 | is not as open as some of us would like it to be.
00:09:19.000 | And I linked a blog post that Nathan wrote
00:09:21.000 | on the topic that is less rambly
00:09:24.000 | and easier to follow through.
00:09:27.000 | One thing that in general,
00:09:30.000 | I think it's fair to say
00:09:32.000 | about the state of open models in 2024
00:09:36.000 | is that we know a lot more
00:09:38.000 | than what we knew in 2023.
00:09:41.000 | Both on the training data,
00:09:44.000 | the pre-training data you curate,
00:09:47.000 | on how to do all the post-training,
00:09:50.000 | especially on the RL side.
00:09:53.000 | 2023 was a lot of throwing random darts at the board.
00:09:57.000 | I think 2024, we have clear recipes
00:10:01.000 | that don't get the same results as a closed lab
00:10:04.000 | because there is a cost
00:10:05.000 | in actually matching what they do,
00:10:07.000 | but at least we have a good sense of,
00:10:10.000 | okay, this is the path to get state-of-the-art language model.
00:10:16.000 | I think that one thing that is a downside of 2024
00:10:20.000 | is that I think research is more constrained than in 2023.
00:10:25.000 | It feels like the barrier for compute
00:10:28.000 | that you need to move innovation along
00:10:32.000 | has just been rising and rising.
00:10:36.000 | So if you go back to this slide,
00:10:38.000 | there is now this cluster of models
00:10:41.000 | that are released by the Compute Rich Club.
00:10:46.000 | Membership is hotly debated.
00:10:49.000 | Some people don't want to be called rich
00:10:52.000 | because it comes with expectations.
00:10:53.000 | Some people want to be called rich,
00:10:55.000 | but I don't know, there's debate.
00:10:57.000 | These are players that have 10,000, 50,000 GPUs at minimum,
00:11:03.000 | and so they can do a lot of work
00:11:05.000 | and a lot of exploration in improving models
00:11:08.000 | that is not very accessible.
00:11:11.000 | To give you a sense of how I personally
00:11:14.000 | think about research budgets
00:11:17.000 | for each part of the language model pipeline:
00:11:22.000 | on the pre-training side,
00:11:24.000 | you can maybe do something with 1,000 GPUs.
00:11:27.000 | Really, you want 10,000.
00:11:29.000 | And if you want real state of the art,
00:11:31.000 | your DeepSeek minimum is like 50,000.
00:11:35.000 | You can scale to infinity.
00:11:36.000 | The more you have, the better it gets.
00:11:38.000 | Everyone on that side still complains
00:11:40.000 | that they don't have enough GPUs.
00:11:43.000 | Post-training is a super wide sort of spectrum.
00:11:48.000 | You can do it with as little as, like, eight GPUs.
00:11:52.000 | As long as you're able to run a good version of,
00:11:59.000 | I'll say, a Llama model, you can do a lot of work there.
00:12:03.000 | You can scale.
00:12:04.000 | A lot of the methodology just scales with compute.
00:12:07.000 | If you're interested in an open replication
00:12:13.000 | of what OpenAI's o1 is,
00:12:16.000 | you're going to be on the 10K end of the GPU spectrum.
00:12:20.000 | Inference, you can do a lot with very few resources.
00:12:23.000 | Evaluation, you can do a lot with,
00:12:25.000 | well, I should say, at least one GPU
00:12:28.000 | if you want to evaluate open models.
00:12:32.000 | But in general, if you care a lot about interventions
00:12:37.000 | to do on these models,
00:12:38.000 | which is my preferred area of research,
00:12:42.000 | then the resources that you need are quite significant.
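On the evaluation point, a one-GPU workflow is roughly what EleutherAI's lm-evaluation-harness gives you; this is a minimal sketch, and the model name, task names, and exact simple_evaluate arguments are assumptions to check against the lm_eval version you install.

```python
# Minimal sketch of single-GPU evaluation of an open model with
# EleutherAI's lm-evaluation-harness (pip install lm_eval).
# Model, tasks, and keyword arguments are assumptions to verify locally.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face backend
    model_args="pretrained=mistralai/Mistral-7B-Instruct-v0.2",
    tasks=["hellaswag", "arc_easy"],
    batch_size=8,
    device="cuda:0",
)
print(results["results"])
```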
00:12:48.000 | One of the trends that has emerged in 2024
00:12:54.000 | is this cluster of fully open models.
00:12:58.000 | So OLMo, the model that we build at AI2, being one of them.
00:13:04.000 | And it's nice that it's not just us.
00:13:06.000 | There's like a cluster of other mostly research efforts
00:13:11.000 | who are working on this.
00:13:15.000 | And so it's good to give you a primer
00:13:19.000 | of what like fully open means.
00:13:22.000 | So fully open, the easy way to think about it is
00:13:26.000 | instead of just releasing a model checkpoint that you run,
00:13:29.000 | you release a full recipe
00:13:31.000 | so that other people working on that space
00:13:36.000 | can pick and choose whatever they want from your recipe
00:13:40.000 | and create their own model
00:13:41.000 | or improve on top of your model.
00:13:43.000 | You're giving out the full pipeline
00:13:45.000 | and all the details there
00:13:47.000 | instead of just like the end output.
00:13:51.000 | So I pulled up a screenshot from our recent MoE model.
00:13:56.000 | And for this model, for example,
00:13:58.000 | we released the model itself, the data it was trained on,
00:14:01.000 | the code, both for training and inference,
00:14:04.000 | all the logs that we got through the training run,
00:14:08.000 | as well as every intermediate checkpoint.
00:14:13.000 | And like the fact that you release
00:14:15.000 | different part of the pipeline
00:14:17.000 | allows others to do really cool things.
00:14:20.000 | So for example, this tweet from early this year
00:14:23.000 | from folks at Nous Research,
00:14:25.000 | they used our pre-training data
00:14:28.000 | to do a replication of the BitNet paper in the open.
00:14:32.000 | So they took just, really,
00:14:35.000 | the initial part of our pipeline
00:14:37.000 | and then did their thing on top of it.
00:14:41.000 | It goes both ways.
00:14:42.000 | So for example, for the OLMo 2 model,
00:14:46.000 | a lot of our pre-training data
00:14:48.000 | for the first stage of pre-training
00:14:51.000 | was from this DCLM initiative
00:14:55.000 | that was led by folks at a variety of institutions.
00:15:00.000 | It was a really nice group effort.
00:15:02.000 | And for us, it was nice to be able to say, okay,
00:15:07.000 | the state of the art in terms of
00:15:09.000 | what is done in the open has improved.
00:15:11.000 | We don't have to do all this work from scratch
00:15:14.000 | to catch up to the state of the art.
00:15:16.000 | We can just take it directly and integrate it
00:15:19.000 | and do our own improvements on top of that.
00:15:24.000 | I'm going to spend a few minutes doing like a shameless plug
00:15:27.000 | for some of our fully open recipes.
00:15:33.000 | So indulge me in this.
00:15:35.000 | So a few things that we released this year were,
00:15:37.000 | as I was mentioning, this OLMoE model,
00:15:41.000 | which is, I think, still the state-of-the-art MoE model
00:15:46.000 | in its size class.
00:15:48.000 | And it's also fully open.
00:15:50.000 | So every component of this model is available.
00:15:54.000 | We released a multimodal model called Molmo.
00:15:57.000 | Molmo is not just a model, but it's a full recipe
00:16:00.000 | of how you go from a text-only model
00:16:03.000 | to a multimodal model.
00:16:05.000 | And we applied this recipe on top of Qwen checkpoints,
00:16:08.000 | on top of OLMo checkpoints,
00:16:10.000 | as well as on top of OLMoE.
00:16:12.000 | And I think there's been replication doing that
00:16:14.000 | on top of Mistral as well.
00:16:21.000 | On the post-training side, we recently released Tulu 3.
00:16:25.000 | Same story.
00:16:26.000 | This is a recipe on how you go from a base model
00:16:29.000 | to a state-of-the-art post-training model.
00:16:33.000 | We used the Tulu recipe on top of OLMo, on top of Llama,
00:16:37.000 | and there have been open replication efforts
00:16:40.000 | to do that on top of Qwen as well.
00:16:42.000 | It's really nice to see when your recipe is kind of turnkey.
00:16:47.000 | You can apply it to different models,
00:16:48.000 | and it kind of just works.
00:16:50.000 | And finally, the last thing we released this year
00:16:53.000 | was OLMo 2, which so far is the best state-of-the-art
00:16:58.000 | fully open language model.
00:17:00.000 | It sort of combines aspects from all three
00:17:02.000 | of these previous models:
00:17:04.000 | what we learned on the data side from OLMoE,
00:17:07.000 | and what we learned on making models
00:17:09.000 | that are easy to adapt from the Molmo project
00:17:11.000 | and the Tulu project.
00:17:12.000 | I will close with a little bit of reflection
00:17:17.000 | on ways this ecosystem of open models--
00:17:22.000 | it's not all roses.
00:17:24.000 | It's not all happy.
00:17:25.000 | It feels like day to day, it's always in peril.
00:17:30.000 | And I talked a little bit about the compute issues
00:17:33.000 | over there, but it's really not just compute.
00:17:37.000 | One thing that is on top of my mind
00:17:39.000 | is that, due to the environment and
00:17:44.000 | growing feelings about how AI is treated,
00:17:48.000 | it's actually harder to get access
00:17:50.000 | to a lot of the data that was used to train a lot of the models
00:17:54.000 | up to last year.
00:17:55.000 | So this is a screenshot from really fabulous work
00:17:58.000 | from Shayne Longpre, who I think is in Europe, about
00:18:06.000 | the diminishing access to data for language model pre-training.
00:18:10.000 | So what they did is they went through every snapshot
00:18:15.000 | of Common Crawl.
00:18:17.000 | Common Crawl is this publicly available scrape
00:18:20.000 | of a subset of the internet.
00:18:22.000 | And they looked at, for any given website
00:18:27.000 | that was accessible in, say, 2017,
00:18:31.000 | whether it was still accessible or not in 2024.
00:18:34.000 | And what they found is that, as a reaction
00:18:37.000 | to the existence of closed models
00:18:42.000 | like GPT or Claude,
00:18:47.000 | a lot of content owners have blanket blocked any type
00:18:51.000 | of crawling to their website.
00:18:54.000 | And this is something that we see also internally at AI2.
00:18:57.000 | Like one project that we started this year
00:18:59.000 | is we wanted to understand, like,
00:19:03.000 | if you're a good citizen of the internet
00:19:06.000 | and you crawl following norms and policy that
00:19:11.000 | have been established in the last 25 years, what can you crawl?
00:19:15.000 | And we found that there's a lot of websites
00:19:18.000 | where the norms of how you express preference
00:19:22.000 | of whether to crawl or not are broken.
00:19:24.000 | A lot of people will block a lot of crawling
00:19:27.000 | but do not advertise that in robots.txt.
00:19:30.000 | You can only tell that they're blocking you
00:19:32.000 | from crawling when you try doing it.
00:19:35.000 | Sometimes you can't even crawl the robots.txt
00:19:38.000 | to check whether you're allowed or not.
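For reference, checking a site's stated crawling preferences the "good citizen" way is a few lines with Python's standard library; this is a minimal sketch with a placeholder user agent and URL, and as noted above it cannot detect blocking that never makes it into robots.txt.

```python
# Minimal sketch of checking crawl permission via robots.txt
# (Python standard library). User agent and URL are placeholders.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://example.com/robots.txt")
try:
    rp.read()  # even this fetch can fail or be blocked, as described in the talk
except OSError as err:
    print(f"Could not fetch robots.txt: {err}")
else:
    allowed = rp.can_fetch("MyResearchCrawler", "https://example.com/some/page")
    print("Allowed to crawl:", allowed)
```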
00:19:41.000 | And then a lot of websites, there's
00:19:45.000 | like all these technologies that historically have existed
00:19:49.000 | to make website serving easier, such as Cloudflare or DNS.
00:19:55.000 | They're now being repurposed for blocking AI or any type
00:20:00.000 | of crawling in a way that is very opaque to the content
00:20:05.000 | owners themselves.
00:20:07.000 | So you go to these websites, you try to access them,
00:20:11.000 | and they're not available.
00:20:13.000 | And you get a feeling it's like, oh, something changed
00:20:16.000 | on the DNS side that is blocking this.
00:20:20.000 | And likely the content owner has no idea.
00:20:22.000 | They're just using Cloudflare for better load balancing.
00:20:27.000 | And this is something that was sort of sprung on them
00:20:30.000 | with very little notice.
00:20:31.000 | I think the problem is this blocking really impacts people
00:20:40.000 | in different ways.
00:20:43.000 | It disproportionately helps companies
00:20:46.000 | that have a head start, which are usually the closed labs.
00:20:50.000 | And it hurts incoming newcomer players,
00:20:55.000 | where you either have to do things in a sketchy way,
00:20:59.000 | or you're never going to get that content
00:21:02.000 | that the closed lab might have.
00:21:04.000 | So there was a lot of coverage.
00:21:06.000 | I'm going to plug Nathan's blog post again.
00:21:11.000 | I think the title of this one is very succinct,
00:21:14.000 | which is we're actually not--
00:21:16.000 | before thinking about running out of training data,
00:21:19.000 | we're actually running out of open training data.
00:21:22.000 | And so if we want better open models,
00:21:24.000 | this should be top of mind.
00:21:28.000 | The other thing that has emerged is
00:21:31.000 | that there are strong lobbying efforts
00:21:34.000 | trying to define any kind of open source
00:21:38.000 | AI as a new, extremely risky danger.
00:21:45.000 | And I want to be precise here.
00:21:47.000 | The problem is not with considering
00:21:52.000 | the risks of this technology.
00:21:53.000 | Every technology has risks that should always be considered.
00:21:56.000 | The thing that, to me, is --
00:21:59.000 | sorry, is disingenuous -- is just putting this AI
00:22:03.000 | on a pedestal and calling it an unknown alien technology that
00:22:09.000 | has new and undiscovered potentials to destroy humanity.
00:22:16.000 | When in reality, all the dangers, I think,
00:22:18.000 | are rooted in dangers that we know
00:22:21.000 | from existing software industry or existing issues that
00:22:27.000 | come when using software on a lot of sensitive domains,
00:22:32.000 | like medical areas.
00:22:35.000 | And I also noticed a lot of efforts
00:22:37.000 | that have actually been going on in trying
00:22:39.000 | to make these open models safe.
00:22:42.000 | I pasted one here from AI2.
00:22:45.000 | But there's actually a lot of work
00:22:47.000 | that has been going on on like, OK, how do you make--
00:22:50.000 | if you're distributing this model openly,
00:22:53.000 | how do you make it safe?
00:22:55.000 | What's the right balance between accessibility
00:22:57.000 | on open models and safety?
00:23:00.000 | And then also, there's this annoying brushing under the rug
00:23:03.000 | of concerns that are then proved to be unfounded.
00:23:09.000 | If you remember at the beginning of this year,
00:23:11.000 | it was all about bio-risk of these open models.
00:23:15.000 | The whole thing fizzled out because there's been--
00:23:19.000 | finally, there's been rigorous research,
00:23:22.000 | not just this paper from Cohere folks,
00:23:25.000 | but there's been rigorous research showing
00:23:27.000 | that this is really not a concern that we
00:23:30.000 | should be worried about.
00:23:31.000 | Again, there is a lot of dangerous use of AI application.
00:23:34.000 | But this one was just like a lobbying ploy
00:23:38.000 | to just make things sound scarier than they actually are.
00:23:43.000 | So I've got to preface this part
00:23:45.000 | by saying this is my personal opinion,
00:23:47.000 | not my employer's.
00:23:48.000 | But I look at things like SB 1047 from California,
00:23:53.000 | and I think we kind of dodged a bullet on this legislation.
00:23:59.000 | The open source community, a lot of the community
00:24:02.000 | came together at sort of the last minute
00:24:06.000 | and did a very good effort trying
00:24:08.000 | to explain all the negative impact of this bill.
00:24:13.000 | But I feel like there's a lot of excitement
00:24:17.000 | on building these open models or researching
00:24:20.000 | on these open models.
00:24:22.000 | And lobbying is not sexy.
00:24:24.000 | It's kind of boring, but it's sort of
00:24:28.000 | necessary to make sure that this ecosystem can really thrive.
00:24:34.000 | This end of presentation, I have some links, emails,
00:24:38.000 | sort of standard thing in case anyone wants to reach out.
00:24:41.000 | And if folks have questions or anything
00:24:46.000 | they wanted to discuss, sort of open the floor.
00:24:50.000 | I'm very curious how we should build incentives
00:24:53.000 | to build open models, things like François Chollet's ARC
00:24:57.000 | Prize and other initiatives like that.
00:24:59.000 | What is your opinion on how we should
00:25:01.000 | better align incentives in the community
00:25:03.000 | so that open models stay open?
00:25:05.000 | The incentive bit is like really hard.
00:25:07.000 | It's something that actually even we
00:25:10.000 | think a lot about internally.
00:25:13.000 | Because building open models is risky.
00:25:16.000 | It's very expensive.
00:25:19.000 | And so people don't want to take risky bets.
00:25:22.000 | I think definitely challenges like the ARC Prize challenge,
00:25:28.000 | I think those are very valid approaches for it.
00:25:32.000 | And then I think, in general, promoting and building
00:25:38.000 | any kind of effort to participate
00:25:41.000 | in those challenges, if we can promote
00:25:43.000 | doing that on top of open models
00:25:46.000 | and really lean into this multiplier effect,
00:25:51.000 | I think that is a good way to go.
00:25:55.000 | It would also help if there were more money for research
00:26:01.000 | efforts around open models.
00:26:04.000 | I think there's a lot of investment in companies
00:26:06.000 | that at the moment are releasing their models in the open, which
00:26:10.000 | is really cool.
00:26:11.000 | But it's usually more because of commercial interest
00:26:15.000 | than a desire to support these open models in the long term.
00:26:21.000 | It's a really hard problem because I think everyone
00:26:24.000 | is operating sort of at --
00:26:27.000 | everyone is at their local maximum, right?
00:26:29.000 | In ways that really optimize their position in the market,
00:26:33.000 | the global maximum is harder to achieve.
00:26:38.000 | Yeah, I'm super excited to be here
00:26:40.000 | to talk to you guys about Mistral.
00:26:43.000 | A really short and quick recap of what we have done,
00:26:47.000 | what kind of models and products we have released
00:26:50.000 | in the past year and a half.
00:26:53.000 | So most of you already know
00:26:56.000 | that we are a small startup founded about a year
00:27:00.000 | and a half ago in Paris.
00:27:02.000 | It was founded in May 2023 by our three co-founders.
00:27:06.000 | And in September 2023, we released our first open source
00:27:10.000 | model, Mistral 7B.
00:27:13.000 | Yeah, how many of you have used or heard about Mistral 7B?
00:27:17.000 | Hey, pretty much everyone.
00:27:19.000 | Thank you.
00:27:20.000 | Yeah, it's pretty popular.
00:27:23.000 | And our community really loved this model.
00:27:27.000 | And in December 2023, we released another popular model
00:27:32.000 | with the MoE architecture, Mixtral 8x7B.
00:27:36.000 | And going into this year, you can
00:27:40.000 | see we have released a lot of things this year.
00:27:43.000 | First of all, in February 2024, we
00:27:45.000 | released Mistral Small, Mistral Large, and Le Chat,
00:27:49.000 | which is our chat interface.
00:27:51.000 | I will show you in a little bit.
00:27:53.000 | We released an embedding model for converting your text
00:27:59.000 | into embedding vectors.
00:28:01.000 | And all of our models are available through cloud
00:28:06.000 | providers.
00:28:07.000 | So you can use our model on Google Cloud, AWS, Azure,
00:28:11.000 | Snowflake, IBM.
00:28:13.000 | So it's very useful for enterprises who want
00:28:16.000 | to use our models through the cloud.
00:28:18.000 | And in April and May this year, we
00:28:21.000 | released another powerful open source MoE model, Mixtral 8x22B.
00:28:27.000 | And we also released our first code model,
00:28:30.000 | Codestral, which is amazing at 80-plus programming languages.
00:28:34.000 | And then we provided another fine-tuning service
00:28:37.000 | for customization.
00:28:39.000 | Because we know the community loves to fine-tune our models,
00:28:42.000 | we provide a very nice and easy option
00:28:45.000 | for you to fine-tune our models on our platform.
00:28:48.000 | And also, we released our fine-tuning code base
00:28:51.000 | called Mistral Fine-Tune.
00:28:52.000 | It's open source.
00:28:53.000 | So feel free to take a look.
00:28:56.000 | And more models.
00:28:59.000 | From July to November this year, we
00:29:02.000 | released many, many other models.
00:29:06.000 | First of all, the two new best small models.
00:29:10.000 | We have Ministral 3B, great for deploying on edge devices.
00:29:16.000 | We have Ministral 8B.
00:29:18.000 | If you used to use Mistral 7B, Ministral 8B
00:29:21.000 | is a great replacement with much stronger performance
00:29:24.000 | than Mistral 7B.
00:29:26.000 | We also collaborated with NVIDIA
00:29:28.000 | and open sourced another model, Mistral NeMo 12B, another great model.
00:29:33.000 | And just a few weeks ago, we updated Mistral Large
00:29:37.000 | to version 2, with updated state-of-the-art features
00:29:42.000 | and really great function calling capabilities.
00:29:45.000 | It supports function calling natively.
00:29:48.000 | And we released two multimodal models.
00:29:51.000 | Pixtral 12B, it's open source.
00:29:54.000 | And Pixtral Large, just amazing models
00:29:58.000 | for not only understanding images, but also
00:30:01.000 | great at text understanding.
00:30:03.000 | So yeah, a lot of the image models
00:30:05.000 | are not so good at text understanding.
00:30:07.000 | But Pixtral Large and Pixtral 12B
00:30:10.000 | are good at both image understanding and text
00:30:13.000 | understanding.
00:30:14.000 | And of course, we have models for research.
00:30:17.000 | Codestral Mamba is built on the Mamba architecture, and Mathstral is
00:30:22.000 | great at working with math problems.
00:30:26.000 | So yeah, that's another model.
00:30:28.000 | Here's another view of our model reference.
00:30:37.000 | We have several premier models, which
00:30:39.000 | means these models are mostly available through our API.
00:30:44.000 | I mean, all of the models are available through our API
00:30:48.000 | except for Ministral 3B.
00:30:51.000 | But for the premier model, they have a special license,
00:30:55.000 | Mistral Research License.
00:30:57.000 | You can use it for free for exploration.
00:30:59.000 | But if you want to use it for enterprise, for production use,
00:31:03.000 | you will need to purchase a license from us.
00:31:06.000 | So on the top row here, we have Ministral 3B and 8B
00:31:10.000 | as our premier models.
00:31:12.000 | Mistral Small is best for low-latency use cases.
00:31:16.000 | Mistral Large is great for your most sophisticated use cases.
00:31:20.000 | Pixtral Large is the frontier-class multimodal model.
00:31:24.000 | And we have Codestral, great for coding.
00:31:26.000 | And then again, the Mistral embedding model.
00:31:29.000 | And at the bottom of the slide here,
00:31:32.000 | we have several Apache 2.0 licensed open-weight models,
00:31:37.000 | free for the community to use.
00:31:38.000 | And also, if you want to fine-tune them,
00:31:40.000 | use them for customization or production, feel free to do so.
00:31:44.000 | The latest, we have Pixtral 12B.
00:31:47.000 | We also have Mistral NeMo, Codestral Mamba, and Mathstral,
00:31:53.000 | as I mentioned.
00:31:55.000 | And we have three legacy models that we don't update anymore.
00:31:59.000 | So we recommend you to move to our newer models
00:32:03.000 | if you are still using them.
00:32:06.000 | And then just a few weeks ago,
00:32:09.000 | we made a lot of improvements to our chat interface, Le Chat.
00:32:16.000 | How many of you have used Le Chat?
00:32:19.000 | Oh, no, only a few.
00:32:21.000 | Okay, I highly recommend Le Chat.
00:32:23.000 | It's chat.mistral.ai.
00:32:26.000 | It's free to use.
00:32:27.000 | It has all the amazing capabilities
00:32:29.000 | I'm going to show you right now.
00:32:31.000 | But before that, Le Chat in French means "the cat."
00:32:34.000 | So this is actually a cat logo.
00:32:38.000 | Yeah, if you can tell, this is cat eyes.
00:32:44.000 | Yeah, so first of all, I want to show you something.
00:32:48.000 | Maybe let's take a look at image understanding.
00:33:00.000 | So here I have a receipt, and I want to ask--
00:33:07.000 | sorry, just going to get the prompts.
00:33:14.000 | Going back.
00:33:24.000 | What's going on?
00:33:31.000 | Yeah, I had an issue with Wi-Fi here,
00:33:32.000 | so hopefully it would work.
00:33:38.000 | Cool, so basically, I have a receipt,
00:33:41.000 | and I said I ordered coffee and a sausage.
00:33:45.000 | How much do I owe at an 18% tip?
00:33:49.000 | So hopefully it was able to get the cost of the coffee
00:33:52.000 | and the sausage and ignore the other things.
00:33:56.000 | And, yeah, I don't really understand this,
00:33:58.000 | but I think this is coffee.
00:34:01.000 | It's, yeah, nine.
00:34:04.000 | Yeah, and then cost of the sausage.
00:34:06.000 | We have 22 here.
00:34:09.000 | Yep, and then it was able to add the cost,
00:34:12.000 | calculate the tip, and all that.
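For reference, taking the figures read off the receipt in the demo (9 for the coffee, 22 for the sausage, currency not specified), the arithmetic the model is being asked to do is just this:

```python
# The arithmetic behind the receipt demo: coffee + sausage plus an 18% tip.
# Item prices are the figures read off the receipt in the demo.
coffee = 9
sausage = 22
subtotal = coffee + sausage       # 31
tip = round(subtotal * 0.18, 2)   # 5.58
total = subtotal + tip            # 36.58
print(f"Subtotal: {subtotal}, Tip (18%): {tip}, Total: {total}")
```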
00:34:15.000 | Great, so it's great at image understanding.
00:34:18.000 | It's great at OCR tasks.
00:34:20.000 | So if you have OCR tasks, please use it.
00:34:23.000 | It's free on Le Chat.
00:34:25.000 | It's also available through our API.
00:34:28.000 | And also I want to show you a Canvas example.
00:34:31.000 | A lot of you may have used Canvas with other tools before,
00:34:38.000 | but with Le Chat, it's completely free again.
00:34:43.000 | Here I'm asking it to create a canvas that
00:34:45.000 | uses PyScript to execute Python in my browser.
00:34:50.000 | So, ooh, what's going on?
00:34:55.000 | Okay, let's see if it works.
00:34:58.000 | Import this.
00:35:04.000 | Yep, okay.
00:35:06.000 | So, yeah, so basically it's executing Python here,
00:35:10.000 | exactly what we wanted.
00:35:13.000 | And the other day I was trying to ask Le Chat
00:35:16.000 | to create a game for me.
00:35:19.000 | Let's see if we can make it work.
00:35:24.000 | Yeah, the Tetris game.
00:35:31.000 | Let's just get one row, maybe.
00:35:45.000 | Oh, no.
00:35:51.000 | Okay, never mind.
00:35:52.000 | You get the idea.
00:35:53.000 | I failed my mission.
00:35:58.000 | Okay, here we go.
00:36:04.000 | Cool.
00:36:05.000 | Yeah, so as you can see, Le Chat can write
00:36:09.000 | code for a simple game pretty easily,
00:36:12.000 | and you can ask Le Chat to explain the code,
00:36:15.000 | make updates, however you like.
00:36:19.000 | Another example.
00:36:21.000 | There is a bar here I want to move.
00:36:24.000 | Okay.
00:36:26.000 | Right, okay.
00:36:27.000 | And let's go back.
00:36:33.000 | Another one.
00:36:36.000 | Yeah, we also have web search capabilities,
00:36:38.000 | like you can ask what's the latest AI news.
00:36:42.000 | Image generation is pretty cool.
00:36:44.000 | Generate an image about researchers in Vancouver.
00:36:56.000 | Yeah, Black Forest Labs, Flux Pro.
00:36:59.000 | Again, this is free, so.
00:37:03.000 | Oh, cool.
00:37:05.000 | I guess researchers here are mostly from University of British Columbia.
00:37:10.000 | That's smart.
00:37:11.000 | Yeah, so this is Le Chat.
00:37:14.000 | Please feel free to use it and let me know if you have any feedback.
00:37:19.000 | We're always looking for improvement,
00:37:21.000 | and we're going to release a lot more powerful features in the coming years.
00:37:25.000 | Thank you.
00:37:26.000 | [APPLAUSE]