Stanford CS25: V4 | Aligning Open Language Models
00:00:05.000 |
Today we're happy to have Nathan Lambert, a research scientist at the Allen Institute for AI, 00:00:13.000 |
who focuses on RLHF and is the author of interconnects.ai. 00:00:19.000 |
He'll be presenting a really cool talk on aligning open language models today. 00:00:27.000 |
Yeah, thanks for the intro. Okay, this is a long time coming. 00:00:34.000 |
I think generally since ChatGPT you'll see a lot has obviously happened, 00:00:40.000 |
but I don't think it's been a blur for me as much as anyone else. 00:00:44.000 |
So kind of taking the time to retell what has happened in this kind of fine-tuning and alignment space 00:00:50.000 |
since ChatGPT happened is something that I thought was a worthy undertaking. 00:00:57.000 |
but it will probably give you a lot of context on why people are mentioning certain things 00:01:07.000 |
I don't know exactly if questions are going to come to me or if I will see them the whole time. 00:01:11.000 |
I think clarifying questions are good, maybe not discussions the whole time, 00:01:16.000 |
and I'll try to make sure that there's time for questions at the end. 00:01:26.000 |
Generally, we're going to talk about language models. 00:01:28.000 |
It's what everyone wants to talk about these days. 00:01:30.000 |
I need to do some of the older history so that I can talk about recent history. 00:01:34.000 |
The place that I like to start is actually with Claude Shannon, 00:01:37.000 |
who had this early paper on approximating language by arranging characters, essentially creating language models. 00:01:45.000 |
That's probably why Anthropic called their models Claude. 00:01:51.000 |
And a lot has happened since these very early papers on predicting sequences of text, 00:01:58.000 |
and this is largely built on this loss function, which is called the autoregressive loss function. 00:02:04.000 |
So if you kind of have this training example where you have something like I saw A, 00:02:08.000 |
and you're trying to predict what comes after this, 00:02:10.000 |
the whole idea is that there's going to be one correct token that has the correct label, 00:02:14.000 |
and the training loss is going to increase the probability of that token 00:02:17.000 |
and decrease the probability of everything else. 00:02:20.000 |
This very simple loss function classifies which token to actually predict next. 00:02:27.000 |
And this kind of took another turn in 2017 when this transformer paper was born. 00:02:35.000 |
It's a great exercise to actually dig into what the attention mechanism is doing. 00:02:47.000 |
ELMo was the earliest one, giving contextualized word embeddings. 00:02:50.000 |
In the same year, we also had GPT-1 and BERT released, 00:02:54.000 |
which is kind of the beginning of the core ideas on which modern language models are built. 00:03:00.000 |
And just getting these better models, training on large internet-scale corpora, 00:03:06.000 |
BERT was a classifier, GPT-1 was generating text, 00:03:09.000 |
and we kind of continue along these trends through the years. 00:03:13.000 |
GPT-2 is when we started learning about scaling laws. 00:03:16.000 |
And if you use orders of magnitude more compute, 00:03:19.000 |
the actual test loss will continue to decrease roughly linearly with respect to the log of compute. 00:03:25.000 |
These ideas now are commonplace when we talk about language models. 00:03:29.000 |
GPT-2 also pioneered a lot of discussions on releasing language models. 00:03:34.000 |
So GPT-2, when it was first announced, they were holding access back because 00:03:39.000 |
of the risks of language models, and this started a lot of the conversations 00:03:42.000 |
around what you should or should not release with language models. 00:03:46.000 |
They eventually actually released GPT-2, and you could download the models 00:03:50.000 |
on Hugging Face and use them, but this is where that kind of conversation started. 00:03:55.000 |
2020 is when language models really started to be noticeably good. 00:03:59.000 |
So GPT-3 is when a lot of people are like, "Whoa, this can actually do 00:04:03.000 |
really interesting things if I kind of create a really clever prompt, 00:04:06.000 |
figure out how to give it my information correctly." 00:04:09.000 |
And GPT-3 could do a ton of things with kind of this few-shot 00:04:13.000 |
or multi-shot learning, which is when you give it a few examples 00:04:16.000 |
in the prompt and then ask it to do another rendition of it. 00:04:20.000 |
And with this power came many harms, and this is kind of a discussion 00:04:24.000 |
of what the risks of releasing language models are. 00:04:30.000 |
Very important problems that kind of culminated in 2021 00:04:33.000 |
with the Stochastic Parrots paper, which asks right in the title whether 00:04:38.000 |
or not language models can be too big, but it's really 00:04:42.000 |
a critique on how we should be thinking about language models, 00:04:46.000 |
what are the limits of them, are they actually doing the things, 00:04:49.000 |
like are they actually thinking or doing any of these human things, 00:04:52.000 |
or are they just kind of following patterns in the data? 00:04:56.000 |
And then just the year after, this is kind of like the tragedy 00:05:00.000 |
of Stochastic Parrots, as no one talks about it now, 00:05:03.000 |
is that ChatGPT came a year later and totally reshaped 00:05:07.000 |
the whole narrative around language models one more time. 00:05:10.000 |
And this is really where we start today's talk, is like, 00:05:13.000 |
how does this idea of alignment emerge in ChatGPT, 00:05:20.000 |
So the question that I ask myself is like, or I tell a lot of people is, 00:05:27.000 |
And what we saw on the release day, so if you go back and read 00:05:30.000 |
the actual OpenAI blog about RLHF, they list all these limitations, 00:05:35.000 |
but they say that RLHF was an important tool for launching ChatGPT. 00:05:38.000 |
And the limitations that they list are really the things that we're 00:05:41.000 |
still researching and that we're talking about in this talk. 00:05:43.000 |
It's a great blog post to go back to, but a good way to frame it 00:05:46.000 |
is that RLHF seems to be necessary, but it's not sufficient. 00:05:50.000 |
You can't do something like ChatGPT or Gemini or Claude 00:05:53.000 |
without something like RLHF. 00:05:56.000 |
But it's not the thing -- like, pre-training is still most of the work, 00:06:00.000 |
but the fact that RLHF is needed is really important to kind of 00:06:03.000 |
contextualize all these improvements that we've seen in the open 00:06:08.000 |
Some examples that I like to cite on RLHF being relied upon, 00:06:12.000 |
you can list many more models here than I have. 00:06:15.000 |
This kind of -- this figure from Anthropic's Constitutional AI paper 00:06:19.000 |
is the single one that I go back to all the time, 00:06:22.000 |
showing how just kind of using RLHF can get these more desirable behaviors 00:06:29.000 |
So these kind of Elo measurements aren't really calibrated, 00:06:33.000 |
so we don't know how to compare Llama 3 on this chart 00:06:37.000 |
to Anthropic's models, but the level of investment that Anthropic has had 00:06:41.000 |
in these kind of techniques and showing this kind of wide-ranging 00:06:44.000 |
improvements of their models with RLHF is a kind of flag that we can follow 00:06:49.000 |
to try to learn how to do alignment with as much precision 00:06:52.000 |
and as much impact as places like Anthropic do. 00:06:57.000 |
One such example is just a simple quote from the Llama 2 paper, 00:07:01.000 |
and the colloquial way of reading this quote, 00:07:06.000 |
which I will read, is, "Whoa, RLHF worked really easily." 00:07:09.000 |
And what the quote is is, "Meanwhile, reinforcement learning, 00:07:13.000 |
known for its instability, seemed a somewhat shadowy field for those in the NLP research community. 00:07:19.000 |
However, reinforcement learning proved highly effective, 00:07:21.000 |
particularly given its cost and time effectiveness." 00:07:24.000 |
So this is one of the biggest endorsements of RLHF, 00:07:27.000 |
and it's always fun for me because I came from the RL side 00:07:31.000 |
But for NLP researchers to say these things, like, yes, 00:07:34.000 |
reinforcement learning is known for instability, and given that it is 00:07:38.000 |
cost-effective and time-effective for an RL person, that's shocking. 00:07:42.000 |
It's like RL has never been particularly cost- 00:07:45.000 |
and time-effective, but in this language model domain, 00:07:48.000 |
where we're fine-tuning with it rather than learning from scratch, 00:07:51.000 |
to have people in NLP that are saying this is just really striking 00:07:58.000 |
And the timeline of alignment and open alignment is really like, 00:08:03.000 |
Like, these benefits didn't show up in models that people were playing with right away. 00:08:08.000 |
So this is kind of a little atlas that I've thrown together. 00:08:11.000 |
I also made a Hugging Face collection where I tried to add all the models 00:08:15.000 |
that I talk about to it, so you can actually click on the models 00:08:17.000 |
or try to use them if you're so inclined to actually run them. 00:08:22.000 |
It's just kind of another way of documenting the artifacts 00:08:25.000 |
that I talk about, and for me, this is a good review. 00:08:30.000 |
What mattered in this really noisy journey in the last year? 00:08:38.000 |
This little plot of model icons could probably look more like an exponential 00:08:46.000 |
And there's so much history of NLP that people are building on 00:08:50.000 |
in the alignment space that is totally swept under the rug here. 00:08:55.000 |
A lot of academic and infrastructure contributions that I'm not talking 00:08:59.000 |
about but are really important to kind of this proliferation of open models. 00:09:04.000 |
So just kind of describing what this image that I have here is. 00:09:09.000 |
To kind of summarize, some of these are base models. 00:09:14.000 |
I'm not going to focus on base models as much as fine-tuned models. 00:09:24.000 |
The base models are the bedrock of this ecosystem. 00:09:28.000 |
And then the aligned models are a lot of times what people can play with 00:09:34.000 |
and what you could try out, what you could do yourself 00:09:36.000 |
on much less computing infrastructure and all these things. 00:09:39.000 |
So I'm going to talk more about the aligned models, 00:09:47.000 |
Another thing that's not fun but I'm going to do for the sake 00:09:51.000 |
of signposting: no one really likes listening to definitions. 00:09:55.000 |
Here are some things that you'll hear thrown around. 00:09:58.000 |
This isn't even all of them when talking about "alignment." 00:10:01.000 |
Here, alignment I've defined as a general notion of training a model 00:10:05.000 |
to mirror a user's desires, really with any loss function. 00:10:10.000 |
So there's a difference between instruction fine-tuning and supervised fine-tuning. 00:10:15.000 |
Instruction fine-tuning is about trying to get a model 00:10:18.000 |
that will respond to queries, format, and instructions, 00:10:21.000 |
while supervised fine-tuning is more about learning specific capabilities from example completions. 00:10:31.000 |
And then there are two more terms I need to touch on, 00:10:35.000 |
and we could go on even longer. The first is reinforcement learning from human feedback, or RLHF. 00:10:41.000 |
It's a specific tool for aligning ML models to human data. 00:10:46.000 |
It's kind of a class of tools, so it has some sort of -- 00:10:49.000 |
you learn a preference model and then you extract information from it. 00:10:52.000 |
So there are so many different ways to do it. 00:10:55.000 |
And then there's a term that I'm kind of trying to grow, 00:11:03.000 |
but there's the question of how do we differentiate something 00:11:08.000 |
which doesn't use an RL optimizer, from all of RLHF. 00:11:14.000 |
but it's good to have some common ground to build on 00:11:17.000 |
because I might be going through some of these things pretty quickly. 00:11:30.000 |
because it's really tapping into a lot of different personal stories. 00:11:34.000 |
It's hard to retell how crazy things were when ChatGPT dropped. 00:11:43.000 |
but there was a lot of uncertainty on what the future held, 00:11:48.000 |
especially -- it was clear that language models were important, 00:11:52.000 |
but it was not clear -- there were a lot of articles 00:11:55.000 |
titled things like "We're Going to Reproduce an Open ChatGPT," 00:12:09.000 |
But there's so much excitement that everyone is saying 00:12:13.000 |
and trying to figure out the right coalitions for actually doing so. 00:12:24.000 |
What is the difference between a dialogue agent and a plain language model? 00:12:30.000 |
And everything kind of follows from here with what people are building. 00:12:34.000 |
But personally, I just remember multiple meetings 00:12:37.000 |
where people were like, "Yeah, you should do it. 00:12:40.000 |
And when you look back, that goal of 00:12:45.000 |
"We need to build this thing in open source" is just so wild 00:12:50.000 |
because you can't open source a whole system that way. 00:12:59.000 |
which is when things start to get grounded in actual models. 00:13:02.000 |
So the first Llama Suite was released, I think, in February. 00:13:08.000 |
And then these instruction-tuned models started to show up 00:13:13.000 |
The first one to really crack the narrative was this Alpaca model. 00:13:17.000 |
And it did a bunch of things that still are used today. 00:13:21.000 |
So this was trained on 52,000 self-instruct-style data points. 00:13:30.000 |
But this wasn't even data generated from ChatGPT. 00:13:33.000 |
It was generated from one of OpenAI's API models. 00:13:39.000 |
this is all on how to apply instruction fine-tuning. 00:13:42.000 |
And this is this thing I mentioned on the definition slide. 00:13:47.000 |
It's about getting a model that will respond to specific styles of inputs. 00:13:47.000 |
So you want the model to know it is an agent. 00:14:04.000 |
Excuse me, you can do this in the system prompt, 00:14:12.000 |
We make the model capable of having these behaviors. 00:14:12.000 |
You continue training with this autoregressive loss function, 00:14:24.000 |
And then the language model will predict an answer. 00:14:39.000 |
But what made Alpaca and a lot of these early models, 00:14:43.000 |
and even today, really popular and accessible, was this use of self-instruct data. 00:14:56.000 |
Self-instruct was a paper from Allen AI and UW in 2022, 00:15:01.000 |
before ChatGPT, where essentially the idea is that you can use a language model to generate 00:15:09.000 |
this training data for fine-tuning a language model. 00:15:21.000 |
And then, in what we now see as more common practice today, you start from some seed tasks, 00:15:28.000 |
have the model create a list of prompts that are similar to those, and answer them. 00:15:41.000 |
What you end up with is a really big list of question-answer pairs, 00:15:45.000 |
but you don't need to go through the bottleneck 00:15:47.000 |
of getting humans to sit down and write all of them. 00:15:49.000 |
So this is what Alpaca really was, and why Alpaca worked. 00:15:59.000 |
This is from the Alpaca paper or blog post, one of the two. 00:16:11.000 |
They started with seed tasks and ended up with over 50,000 tasks, 00:16:17.000 |
And then what they did is they took these LLaMA weights from Meta 00:16:21.000 |
that had just come out and they instruction fine-tuned them, 00:16:28.000 |
This is a pattern that we've seen many times since Alpaca: 00:16:33.000 |
you generate some data from a stronger language model and fine-tune an open model on it, 00:16:39.000 |
but this was the first model to actually release this. 00:16:48.000 |
and stuff like this, so thanks for asking them, 00:16:55.000 |
it felt like there was a new model every week. 00:16:59.000 |
and really what they changed was they added new sources of data, like ShareGPT. 00:17:10.000 |
They also introduced the idea of LLM as a judge, 00:17:13.000 |
which is now obvious from a lot of their later evaluation work. 00:17:17.000 |
But let's talk about why ShareGPT was so interesting. 00:17:33.000 |
that were similar to what people were asking ChatGPT. 00:17:42.000 |
and it would let you share your prompts from ChatGPT 00:17:48.000 |
So it was making it easier to share the prompts 00:17:51.000 |
in your conversations before OpenAI made a tool to do this, 00:17:55.000 |
and now there's this legal gray area over the data set 00:18:00.000 |
because most of these data sets are unlicensed, 00:18:02.000 |
and they were kind of created without consent from the users. 00:18:08.000 |
There's a question of whether or not people should be training on this data, 00:18:11.000 |
but the fact of the matter is that ShareGPT 00:18:13.000 |
was really important to this kind of acceleration 00:18:18.000 |
because the diversity of data is just so much stronger 00:18:27.000 |
It's only today and in the last, like, few months 00:18:30.000 |
or six months for some of them that we're getting data sets 00:18:44.000 |
and then a project from the Allen Institute for AI, 00:18:50.000 |
but the users were given consent at the start 00:18:53.000 |
that their data was going to be collected and released 00:18:55.000 |
in exchange for using a language model for free. 00:18:58.000 |
So there's a lot of happenstance in the story 00:19:02.000 |
where something like this, which is legally gray, 00:19:09.000 |
where these little things helped enable the ecosystem, 00:19:14.000 |
"Oh, we don't know if that should have happened." 00:19:26.000 |
if you look at the time frames, it's pretty obvious 00:19:28.000 |
that a lot of these were developed concurrently, 00:19:36.000 |
kind of a different diverse set of data sets. 00:19:44.000 |
They also used Anthropic data that has been released, 00:19:46.000 |
and they had some human evaluation from grad students. 00:19:51.000 |
and the evaluations weren't necessarily better, 00:19:54.000 |
but it was an important model that a lot of people noticed 00:20:03.000 |
Something you might ask looking at these slides 00:20:16.000 |
and it was distributed to researchers upon request, 00:20:20.000 |
and the license prohibited people from uploading 00:20:29.000 |
and then you had to run a script to convert it 00:20:36.000 |
So this was kind of a really frustrating phase 00:20:49.000 |
and we still see different license restrictions today, 00:20:57.000 |
essentially, if I fine-tune a model for my research 00:21:12.000 |
but there have always been restrictions on using Llama weights. 00:21:18.000 |
And the final model that I kind of group into this batch is Dolly. 00:21:24.000 |
So Dolly was fine-tuned from a different base model. 00:21:27.000 |
It was fine-tuned from the Pythia models, 00:21:29.000 |
which are a suite of early scaling experiments 00:21:32.000 |
from EleutherAI that are still used extensively. 00:21:36.000 |
But they added some human-written data to the loop, 00:21:40.000 |
because almost all the projects that I'll mention today 00:21:43.000 |
use synthetic data or data derived from OpenAI, 00:21:48.000 |
so Dolly was one of the few that actually added new human data to the loop, 00:21:50.000 |
and this is what everyone remembered Dolly for. 00:21:57.000 |
which is trained in a time where this type of inference 00:22:10.000 |
where we're going to start with different model sizes 00:22:14.000 |
I'll talk about what MT-Bench is in a few slides. 00:22:14.000 |
how the scores continue to progress over time 00:22:35.000 |
as the community gets better at these things. 00:22:43.000 |
Probably still the single busiest human coordination project 00:22:52.000 |
I think it's easy now, if you get into fine-tuning, 00:23:01.000 |
to the process of alignment in this whole summer, 00:23:06.000 |
So essentially, there's this quote on the top, 00:23:14.000 |
and these kind of like human-written responses 00:23:45.000 |
the first majorly successful project of the era, 00:23:54.000 |
It's like really one of the most important things 00:24:04.000 |
but on April 28th of 2023, typo on the slide, 00:24:13.000 |
which looks now like the style of training models, 00:24:17.000 |
except for the dataset, which is now popular. 00:24:24.000 |
They had some human evaluations that were solid. 00:24:52.000 |
This is the last slide of this kind of first chapter 00:24:55.000 |
on instruction tuning, which was the idea of QLoRA, 00:24:58.000 |
which kind of unlocked a whole new bunch of players 00:25:12.000 |
which is the idea that you can freeze most of the model's weights and only train small adapters. 00:25:23.000 |
You'd use the same approach of instruction data 00:25:26.000 |
with question-answering, but it takes much less memory. 00:25:32.000 |
by adding very specific quantization and GPU tricks 00:25:40.000 |
Tim Dettmers and team also released this Guanaco model 00:25:49.000 |
I have a few more slides on it, on the method. 00:25:51.000 |
So you can kind of see on the right this difference, 00:25:56.000 |
They look similar, where with LoRA, you have fewer parameters, 00:26:08.000 |
So this is an approximation of the memory needed if you're fine-tuning 00:26:15.000 |
with different numbers of bits, 00:26:18.000 |
comparing full fine-tuning versus LoRA versus QLoRA. 00:26:24.000 |
one A100 GPU has about 80 gigabytes of memory, 00:26:39.000 |
to actually get the ability to fine-tune models at the 7 billion parameter scale. 00:26:48.000 |
And Guanaco did this, and they released 33 billion parameter versions, 00:26:55.000 |
which were clear steps up in the kind of state of the art at the time. 00:26:59.000 |
And they also figured out ways to filter this OpenAssistant data. 00:27:10.000 |
I'm going to kind of pause and skim through the questions 00:27:14.000 |
and if not, I'll save the relevant ones for later. 00:27:20.000 |
They're great questions, and I appreciate them, 00:27:31.000 |
where it seemed like things were a little bit slower 00:27:37.000 |
at a lot of the things that came out of this time, 00:27:42.000 |
Everyone read it, but we didn't know what to do with it yet, 00:27:44.000 |
and the new evaluations are still really used. 00:27:48.000 |
Transitioning in, setting the scene for this next period: people 00:27:54.000 |
were continuing to try to build on these LoRA methods. 00:27:58.000 |
I remember a lot of excitement at Hugging Face 00:28:03.000 |
where we could do RLHF on 7 billion parameter models, 00:28:09.000 |
It was really cool to see the loss going down. 00:28:12.000 |
It was great to bring more people into the space, 00:28:15.000 |
but weeks and weeks would go by, and you're like, 00:28:21.000 |
"Has anyone actually taken the method in the blog post and trained a really good model with it?" 00:28:24.000 |
And the kind of consensus now is that these LoRA methods have quirks 00:28:32.000 |
in how you use them or how the gradients flow 00:28:35.000 |
that make it much, much harder to get a really good model out. 00:28:41.000 |
If your compute is limited such that LoRA is your only option, definitely use it, 00:28:46.000 |
but figuring out how to scale is normally a better solution 00:28:49.000 |
than just using something like LoRA that fits on the hardware you have. 00:28:55.000 |
Another defining moment of this era was the Llama 2 backlash. 00:29:02.000 |
The famous example was that people asked Llama 2 00:29:05.000 |
how to kill a Python process, and it would say no, 00:29:08.000 |
and this really started a whole bunch of new discussions 00:29:18.000 |
Here's an example from a paper for a safety evaluation test set 00:29:27.000 |
or should they follow the instructions that I want?" 00:29:32.000 |
It'll differ by organization. It'll differ by individual. 00:29:35.000 |
And this is the point where this became very serious 00:29:39.000 |
and something that people actually had to reckon with 00:29:41.000 |
because there were models where 00:29:45.000 |
people were really disagreeing with this specific take. 00:29:51.000 |
but one of the things it led to is this idea of uncensored models. 00:29:55.000 |
It's a really popular category on Hugging Face right now 00:30:03.000 |
So if we're using synthetic data and I ask a language model a question, 00:30:09.000 |
it's going to say, "I'm sorry. I'm a language model. 00:30:13.000 |
And the idea of uncensored models is to remove those points from our kind of-- 00:30:19.000 |
remove those points from our fine-tuning data set. 00:30:22.000 |
I think there's a lot of confusion over the name, because the models 00:30:26.000 |
at this stage really aren't censored to begin with, 00:30:31.000 |
and the method for creating these data sets needed more filtering 00:30:35.000 |
or they needed some way of becoming unbiased. 00:30:38.000 |
So like there's a lot of people now that only build models 00:30:42.000 |
to try to make them unbiased against any sort of refusal. 00:30:45.000 |
A refusal is when you ask a language model something and it says no. 00:30:48.000 |
And this goes on today, and this came out of this Llama 2 thing. 00:30:48.000 |
where there's a lot of good, solid models being trained, 00:31:00.000 |
but either they didn't have a lot of documentation, 00:31:02.000 |
they didn't have the right release team to splash as big as they should have, 00:31:06.000 |
the methods were complicated to implement, or something like this. 00:31:10.000 |
So I could run through these, and I remember all these models coming out, 00:31:14.000 |
but none of them were really things that are household names like Alpaca is today. 00:31:22.000 |
where they created this method called Evol-Instruct, 00:31:26.000 |
All these things were clearly working for them 00:31:32.000 |
but for whatever reason, the narrative wasn't actually changed. 00:31:36.000 |
There are some new datasets; UltraLM is from OpenBMB in China, 00:31:42.000 |
which is releasing new datasets, and more people are training on ShareGPT. 00:31:46.000 |
The model called Xwin-LM was the first one to be in a similar ballpark, 00:31:51.000 |
and it's also trained with RLHF, so not just that Carper model. 00:31:56.000 |
But for whatever reason, these didn't really splash. 00:32:00.000 |
And that was this kind of summer after Llama 2, 00:32:06.000 |
but the narrative wasn't changing all that much, 00:32:09.000 |
at least from my perspective, but that's why I'm here. 00:32:16.000 |
while the models weren't seeming that different, 00:32:22.000 |
that ended up kind of being the standard of today. 00:32:25.000 |
So you can see the dates here: May 3rd, Chatbot Arena. 00:32:31.000 |
Sometime in early July, the Open LLM Leaderboard. 00:32:34.000 |
All of these things were created about the same time, 00:32:37.000 |
where there's a desperate need to get some sort of signal 00:32:40.000 |
on what our fine-tuned models are doing in the open. 00:32:43.000 |
Like, we don't have the capability of paying humans 00:32:46.000 |
to compare our responses like they do at Anthropic, 00:32:49.000 |
where they're always trying new models on humans. 00:32:53.000 |
We need something that you could sit down as an engineer and run yourself. 00:33:04.000 |
but it's important to take this from the perspective 00:33:06.000 |
of what can I use when I'm trying to align models, 00:33:11.000 |
versus what is kind of this long-term signal. 00:33:19.000 |
as something that is defining corporate strategy, 00:33:25.000 |
as defining the biggest language model players. 00:33:31.000 |
But if I'm an engineer, A, many small providers 00:33:40.000 |
it used to take weeks to get your model's rating, 00:34:04.000 |
but I'll just kind of keep rolling through this. 00:34:07.000 |
AlpacaEval is the idea that you have a list of prompts, 00:34:10.000 |
you generate completions from your model and from a strong reference model, 00:34:19.000 |
and then you ask a language model which is better. 00:34:28.000 |
So data sets from OpenAssistant, Vicuna, Koala, Anthropic. 00:34:32.000 |
Like, all these data sets that people have been using. 00:34:48.000 |
But using a model to provide a rating is going to have some ceiling 00:34:51.000 |
where we don't know how to compare two really good models. 00:35:09.000 |
and it's not clear how to interpret these top results. 00:35:12.000 |
So this is an older screenshot of the leaderboard, 00:35:14.000 |
but what does beating a model 95% of the time actually mean? 00:35:21.000 |
That's the kind of question that we can't really answer. 00:35:25.000 |
AlpacaEval 2 came out, which takes steps toward this, 00:35:28.000 |
where it compares to GPT-4 rather than text-davinci-003. 00:35:49.000 |
And we need to get more specific in our evaluations 00:35:52.000 |
because I don't really know if I care too much 00:36:00.000 |
And this is the opaqueness of all of our evaluations. 00:36:05.000 |
where we don't know what an increase in score means. 00:36:27.000 |
I generate the completion to 80 diverse prompts, 00:37:14.000 |
if MT-Bench and AlpacaEval have really low scores." 00:37:30.000 |
In pre-training, we have, like, MMLU and HellaSwag 00:37:36.000 |
And if you get, like, a 2% improvement on average, 01:01:34.000 |
but you probably need to be good at engineering 01:01:48.000 |
I don't know if people can talk via microphone, 01:01:52.000 |
but I'm just going to keep talking to myself. 01:01:58.000 |
There's a question around the future of alignment, 01:02:00.000 |
given simple methods can circumvent fine-tuning. 01:02:06.000 |
like safety is not the only thing that matters. 01:02:13.000 |
So how much RLHF improves the user experience 01:02:16.000 |
and how much it improves code and math abilities. 01:02:48.000 |
Llama 3 said that they use instruction fine-tuning, rejection sampling, PPO, and DPO. 01:02:48.000 |
I don't know how they're using all of these things, 01:02:59.000 |
but I think they're shifting the abilities incrementally 01:03:01.000 |
to provide nice initialization for the next method 01:03:40.000 |
and probably get less boosts from alignment training 01:03:43.000 |
if there's not this kind of general improvement 01:03:54.000 |
Does anyone have some in-person questions to ask Nathan? 01:04:14.000 |
and where do you see them having the most impact? 01:04:20.000 |
This is one of the things that I'm excited about, 01:04:24.000 |
Like, I'm not particularly ideologically aligned 01:04:28.000 |
with like the effective accelerationist stuff, 01:04:33.000 |
to create a language model that they like to use, 01:04:44.000 |
So it's like academics aren't used to looking there, 01:05:15.000 |
and you'll never be able to keep track of everything, 01:05:29.000 |
but industry is also fun if you want to do a startup. 01:05:33.000 |
You just have to think about what you want to do. 01:05:54.000 |
- I haven't seen it be particularly successful, 01:06:10.000 |
So the fact that it's been around for so long 01:06:25.000 |
You mentioned GPT-4 being used as an evaluation method, 01:06:37.000 |
I mean, this is why it's nice to have human evaluation, 01:06:44.000 |
from reading Llama 3 stuff and giving this lecture, 01:06:44.000 |
is how to disambiguate various biases in evaluation 01:07:04.000 |
For stuff like Llama 3 training on so many tokens, 01:07:08.000 |
would that actually make it harder to align this model 01:07:20.000 |
but every model will have kind of a different point 01:07:26.000 |
so that's why you'll need a different learning rate 01:07:30.000 |
so you will need a different kind of way of continuing it, 01:07:40.000 |
I mean, I don't even have an intuition for it, 01:07:42.000 |
just to know that I have bought this thing in the past 01:07:51.000 |
It's just that there's more information packed into the model, 01:07:51.000 |
It just takes more and more data to get marginal improvements, 01:08:00.000 |
so Meta is willing to invest more money into the model 01:08:11.000 |
Do you think synthetic data generation, like Cosmopedia, 01:08:23.000 |
I also think it's a good way to get around the fact 01:08:25.000 |
that Google is paying Reddit $60 million a year 01:08:31.000 |
to use their data, so that we can no longer train on it. 01:08:35.000 |
I think that Cosmopedia and synthetic data sets like it are a good direction, 01:08:41.000 |
and there are rumors that industry is doing something similar. 01:09:07.000 |
- It's mostly like it ends up extracting more from the data, 01:09:12.000 |
so it's like the benchmarks end up being a little bit better 01:09:15.000 |
if we get it set up correctly with the same starting point. 01:09:20.000 |
It's like you choose a set of evaluations that you care about 01:09:23.000 |
and you look at them, and through fine-tuning, 01:09:26.000 |
it's primarily a group of great grad students doing this. 01:09:29.000 |
It's just running a ton of models and trainings, 01:09:32.000 |
and they're seeing that PPO can reliably do 01:09:34.000 |
a little bit better, and these are the fine margins 01:09:46.000 |
Do you foresee a better evaluation method to be determined 01:09:54.000 |
which means rule-based metrics are dead forever? 01:10:03.000 |
This is becoming philosophical, which is like I'm trying 01:10:05.000 |
not to say no to things in the language model space 01:10:10.000 |
It's like I should try not to bet against progress continuing. 01:10:17.000 |
and it's like at multiple stages in the last few months 01:10:22.000 |
So it's like if you just assume that things will get better 01:10:24.000 |
and they will work, it's like just makes it a little bit easier 01:10:47.000 |
of squashing specific parts of this distribution 01:10:58.000 |
- Yeah, I think that's phrased generally enough 01:11:02.000 |
Alignment is about changing the distribution, 01:11:12.000 |
It can be these kind of multi-string different things 01:11:23.000 |
- Here's one from -- how do you envision the usage 01:11:26.000 |
of watermarks for both open and closed language models? 01:11:32.000 |
- I think it a lot of times feels like a losing battle. 01:11:35.000 |
I think that a practical solution in the future 01:11:38.000 |
is that if you want to prove something that is human-made, 01:11:42.000 |
you can prove that it was generated by a human 01:11:44.000 |
by having a certain tool rather than trying to detect everything that is machine-generated. 01:11:52.000 |
So the assumption will be that all content was made by a machine unless proven otherwise. 01:11:57.000 |
It's not what I would consider a sociologically good answer. 01:12:11.000 |
feel free to send them over to me on the Zoom chat. 01:12:16.000 |
- Yeah, that was much better than me half-reading the question. 01:12:24.000 |
What are your thoughts on different optimization functions 01:12:27.000 |
to train large language models rather than using MLE? 01:12:31.000 |
What could be good research directions there? 01:12:37.000 |
- I think this is the whole idea of what RLHF represents. 01:12:41.000 |
And that's why, like, if you ask people who have been in NLP longer, 01:12:44.000 |
one of the most compelling arguments for RLHF for me 01:12:47.000 |
is, like, you now have extreme flexibility on the loss function 01:12:51.000 |
while we were kind of limited in what our autoregressive losses could do. 01:12:54.000 |
So there's kind of arguments that it's, like, why is there any limit 01:12:57.000 |
if we could just keep doing more and more tokens of RL training? 01:13:00.000 |
It's a really, like, general framing, but, like, RL's loss function, 01:13:05.000 |
you make it so that the training of a language model 01:13:07.000 |
can incorporate many different things, and that's very exciting. 01:13:11.000 |
That could be, like, the 10-year goal of RLHF. 01:13:16.000 |
- To what extent is training on adversarial data effective 01:13:21.000 |
for defending against crescendo and other simple multi-turn attacks? 01:13:27.000 |
- I haven't spent as much time on safety as I would want to, 01:13:29.000 |
but I think that it's, like, it'll be this everlasting dance 01:13:33.000 |
where if you have example data, you can defend against it, 01:13:35.000 |
but it will not be impossible to generate new data. 01:13:38.000 |
So it mostly comes down to the use case that you're looking at protecting. 01:13:42.000 |
So if you want to protect something really important, 01:13:44.000 |
you need to have layers on that that are not just sensitive 01:13:47.000 |
to a new prompting technique, but, like, limit what the model can do. 01:13:50.000 |
That's kind of--it's, like, a use-focused theme, 01:13:53.000 |
while the kind of whole, like, security is a very complicated thing otherwise. 01:14:06.000 |
Do you see potential in quantization methods such as BitNet, like 1.58 bit? 01:14:12.000 |
If so, do you think BitNet will become popular? 01:14:17.000 |
- I have no idea. I wouldn't--this is what I mean. 01:14:20.000 |
It's like, okay, sounds cool. Wouldn't rule it out. 01:14:27.000 |
- You think there's a need or a way to control large-scale data extraction 01:14:38.000 |
- I do think there's a lot of wills and a lot of ways to explore 01:14:41.000 |
making the synthetic data better. I think it's very early. 01:14:44.000 |
I have a project that's going on it, and it is one of the few ways 01:14:48.000 |
that can generate more tokens, which is, like-- 01:14:51.000 |
like, people are actually running out of tokens, 01:14:53.000 |
especially if you try not to train on things that you're not supposed to train on. 01:14:56.000 |
It's, like, then you can just generate more data, 01:14:59.000 |
and as we've seen with Llama, if you have the compute, more data will help you. 01:15:12.000 |
Any chance you can kind of expand upon or share your opinions 01:15:15.000 |
on self-play-like things like OpenAI super alignment work? 01:15:22.000 |
- I think people will keep using language models in the loop of training 01:15:25.000 |
other language models, but it's a kind of broad field 01:15:29.000 |
that doesn't have full agreement on how to do it. 01:15:37.000 |
- Okay, great. And I think we're pretty much out of time, 01:15:39.000 |
so if folks want to get in touch or have more questions, 01:15:46.000 |
- Okay, great. But, yeah, thanks so much again for taking the time 01:15:50.000 |
and giving us such a great talk. So, yeah, give it up for Nathan. 01:15:56.000 |
- Thanks, everyone. - And I think the slides, 01:15:57.000 |
as well as the Hugging Face collection, are all posted on our website 01:16:00.000 |
as well as on Discord, in case anybody wants to follow along. 01:16:11.000 |
- Yeah, no worries. Thanks, everyone. - See everyone soon.