Building State of the Art Open Weights Tool Use: The Command R Family: Sandra Kublik

Chapters
0:00 Introduction
2:11 Community Response
4:18 The Journey
6:32 Post Training
8:20 Open Source UI
10:22 Optimizing Tool Use
11:01 Single-Step vs Multi-Step
12:14 Multi-Step API
It's competitive against GPT-4 Turbo, Claude Opus, 00:01:11.660 |
to push the boundaries of what's possible with LLMs, 00:02:11.460 |
Your response, as a community using the model, 00:02:27.600 |
a hundred and fifty thousand times from Hugging Face, 00:02:34.200 |
Folks at Hugging Face actually liked the model so much, 00:03:37.640 |
and a new generative search demo by Perplexity. 00:03:52.340 |
is two Discord server bots that are powering. 00:04:17.060 |
So I want to share the journey of building the R models, 00:04:28.180 |
and to show you that we've committed ourselves 00:05:00.280 |
and when you want to use the model in the RAG context, 00:05:04.600 |
you need to prompt it to not only look for the information, 00:05:12.160 |
and know how to differentiate between the conversation history 00:05:21.160 |
Another problem is overcoming models' natural bias 00:05:27.060 |
results towards focusing on the beginning of the document. 00:05:39.060 |
that are really showing the problem of models 00:05:54.000 |
Another challenge is steering an ongoing battle 00:06:09.520 |
you want the model to be able to tap into the knowledge 00:06:22.620 |
about who is the current president of the United States. 00:06:36.040 |
we've been able to optimize the model behavior 00:06:41.940 |
and to decide when the external information is needed 00:06:49.040 |
Sometimes the pre-trained knowledge is enough. 00:06:56.700 |
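The routing behavior described above can be sketched in a few lines. Everything here is a toy illustration of the idea, not the model's actual mechanism: `mock_model` stands in for Command R deciding between answering from pre-trained knowledge and emitting a search query, and the time-sensitivity keyword rule is an invented stand-in for the model's learned judgment.

```python
# Toy sketch of "decide when external information is needed":
# the (mocked) model either answers directly or asks for a search.
def mock_model(query: str) -> dict:
    """Pretend model: time-sensitive questions trigger a search."""
    time_sensitive = ("current", "today", "latest")
    if any(word in query.lower() for word in time_sensitive):
        return {"action": "search", "query": query}
    return {"action": "answer", "text": "answered from pre-trained knowledge"}

def respond(query: str, search_fn) -> str:
    decision = mock_model(query)
    if decision["action"] == "search":
        docs = search_fn(decision["query"])
        return f"answered using {len(docs)} retrieved documents"
    return decision["text"]
```

On the talk's own example, a question about the current US president routes to search, while stable facts are answered from parametric knowledge.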
to be able to run search queries successfully, 00:07:17.560 |
so that you don't really have to think about it. 00:08:04.620 |
but powerful enough to cover a lot of your use cases. 00:08:09.820 |
They have a great balance of token efficiency, 00:08:16.580 |
normally you would have to line up a big pipeline of LLMs. 00:08:20.280 |
We've also heard from you that creating a UX and UI 00:08:31.620 |
It's not a small feat, and we know it first-hand 00:08:35.380 |
because we've spent a considerable amount of time working on it ourselves. 00:08:49.660 |
So you're able to have a conversation history. 00:08:59.380 |
You're able to plug it into different types of tools. 00:09:05.880 |
and knowing how much you're struggling either way, 00:09:08.840 |
we decided that it's going to be a good idea to open source the UI, 00:09:21.380 |
and you can now load it and start building with it. 00:09:29.760 |
It has plug-and-play components and source code 00:09:32.880 |
for an interface app that we've built with Next.js. 00:09:36.720 |
It has a small SQL database for conversation history. 00:09:43.720 |
which lets you customize how you're accessing Command R models. 00:09:58.720 |
and here you can customize access to tools and data sources. 00:10:10.720 |
It has document upload, and it's using web search, 00:10:15.720 |
but honestly, you can add support for any tools 00:10:18.720 |
and any data sources that you're interested in. 00:10:20.720 |
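To make "add support for any tools and any data sources" concrete, here is a hypothetical sketch of what a pluggable tool registry in such a UI backend could look like. The `Tool` dataclass, `register` function, and `fake_web_search` stub are all illustrative assumptions, not the open-sourced toolkit's actual interface; consult its source for the real extension points.

```python
# Hypothetical plug-and-play tool registry for a chat UI backend.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    call: Callable[[str], list[dict]]  # query -> list of result documents

registry: dict[str, Tool] = {}

def register(tool: Tool) -> None:
    """Make a tool available to the chat interface by name."""
    registry[tool.name] = tool

def fake_web_search(query: str) -> list[dict]:
    # Stub data source; swap in a real search client here.
    return [{"title": "stub result", "snippet": f"results for {query!r}"}]

register(Tool("web_search", "Search the web for a query", fake_web_search))
```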
Lately, we've been focused on optimizing tool use, 00:10:32.720 |
It's kind of an extension of this RAG formula I mentioned earlier, 00:10:36.720 |
where we began by training the models to be really good 00:10:43.720 |
and then it naturally progressed into broader tool use. 00:10:55.720 |
that's kind of our ideal scenario that we're working towards. 00:11:06.720 |
It's really useful for situations where you have a single action to be performed, 00:11:15.720 |
It could be searching for documents or sending out an email. 00:11:19.720 |
Multistep, on the other hand, it's really good for scenarios where you have to carry out a sequence of actions, 00:11:31.720 |
with each action building on top of the previous ones. 00:11:34.720 |
So, in the same example, it would be searching for that document, 00:11:40.720 |
being able to compare it against another document, 00:11:56.720 |
you want the system to be able to reflect and correct errors, 00:12:02.720 |
And we are teaching the models to retrieve the information 00:12:06.720 |
many times over from these different data sources. 00:12:16.720 |
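The multi-step pattern described above, where each action builds on the previous ones and the model retrieves many times over from different sources, can be sketched as a simple loop. The two stub sources and the way the next query is composed are illustrative assumptions, not how the model actually plans.

```python
# Minimal sketch of multi-step retrieval: each step's query
# incorporates what the previous step returned.
def search_docs(query):
    return [f"doc about {query}"]

def search_wiki(query):
    return [f"wiki entry on {query}"]

def multistep_retrieve(question, sources):
    context = []
    query = question
    for source in sources:
        results = source(query)
        context.extend(results)
        # The next query builds on top of what was just retrieved.
        query = f"{question} given {results[0]}"
    return context

context = multistep_retrieve("compare report A with report B",
                             [search_docs, search_wiki])
```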
Most of the time when people use the terms agents and multistep, 00:12:22.720 |
It's essentially a scenario where software is performing a sequence of actions, 00:12:27.720 |
with each action building on the previous steps. 00:12:30.720 |
Last week, we released the multistep API; we're super hyped about it. 00:12:40.720 |
And so, all you need to do is describe the tools that the model has at its disposal, 00:12:45.720 |
what these tools do, and then some parameters. 00:12:50.720 |
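A tool description of the kind just described has three parts: a name, what the tool does, and its parameters. The dict shape below is a sketch modeled on Cohere's tool-use schema around the time of this talk; the field names (`parameter_definitions`, etc.) should be checked against the current API docs, and the two tools match the talk's own examples (searching documents, sending an email).

```python
# Sketch of tool descriptions: name, what the tool does, parameters.
tools = [
    {
        "name": "search_documents",
        "description": "Search the internal document store for a query.",
        "parameter_definitions": {
            "query": {"description": "The search query.",
                      "type": "str", "required": True},
        },
    },
    {
        "name": "send_email",
        "description": "Send an email to a recipient.",
        "parameter_definitions": {
            "to": {"description": "Recipient address.",
                   "type": "str", "required": True},
            "body": {"description": "Email body.",
                     "type": "str", "required": True},
        },
    },
]
```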
After a user request is made, the model is going to create a plan. 00:12:55.720 |
And it's going to figure out how to use these tools to fulfill the user request. 00:13:01.720 |
And once it calls each tool, it's going to reflect on the contents, 00:13:06.720 |
and it's going to adapt the initial plan if it's necessary. 00:13:09.720 |
So, for example, if the model is calling an API and it returns an error, 00:13:14.720 |
it's going to automatically retry the call and come up with a new plan. 00:13:20.720 |
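The reflect-and-retry behavior just described can be sketched as follows. The flaky tool and the string-based "plan revision" are stand-ins I made up for illustration; in the real system the model itself reflects on the tool output and rewrites its plan.

```python
# Sketch of reflect-and-retry: if a tool call errors, revise the
# plan and try again instead of giving up.
class FlakyAPI:
    """Stub tool that fails on the first call, then succeeds."""
    def __init__(self):
        self.calls = 0
    def __call__(self, query):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("rate limited")
        return {"result": f"data for {query!r}"}

def run_with_retry(tool, query, max_attempts=3):
    plan = query
    for _ in range(max_attempts):
        try:
            return tool(plan)
        except RuntimeError as err:
            # "Reflect" on the failure and revise the plan.
            plan = f"{query} (retry after: {err})"
    raise RuntimeError("all attempts failed")

out = run_with_retry(FlakyAPI(), "quarterly sales")
```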
We've outlined this behavior in this huge multistep preamble. 00:13:28.720 |
Essentially, it's a massive prompt that explains to the model what it needs to do in order to get the job done. 00:13:41.720 |
We've trained command R and R+ to generate claims that are verifiable through citations. 00:13:48.720 |
And again, big on citations, we really believe that when you can explain which tool has been used by the model for each response, 00:13:59.720 |
it's going to make a difference and it's going to make the system better. 00:14:04.720 |
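The idea of claims that are verifiable through citations can be sketched with a small checker. The response structure here (spans with `start`/`end` offsets and `document_ids`) is an illustrative assumption loosely inspired by how grounded-generation APIs report citations, not Cohere's exact format.

```python
# Sketch of verifiable citations: each claimed span in the response
# points at the source documents that back it, and we can check it.
text = "Command R models are open weights."
documents = {"doc_0": text}
response = {
    "text": text,
    "citations": [
        {"start": 0, "end": len(text), "document_ids": ["doc_0"]},
    ],
}

def verify(response, documents):
    """Every cited span must actually appear in a cited document."""
    for cite in response["citations"]:
        span = response["text"][cite["start"]:cite["end"]]
        if not any(span in documents.get(d, "") for d in cite["document_ids"]):
            return False
    return True
```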
Command R+ has competitive performance with Claude Opus and GPT-4 Turbo, but it is three to five times cheaper. 00:14:14.720 |
So that's a massive difference when it comes to scalability and being able to use it in production. 00:14:20.720 |
We test the R family on standard complex reasoning benchmarks and command R+ is close to or on par with GPT-4 Turbo. 00:14:36.720 |
We're going to keep hammering on the multistep.