
Stanford CS25: V4 I Aligning Open Language Models



00:00:00.000 | [silence]
00:00:05.000 | Today we're happy to have Nathan Lambert, a research scientist at the Allen Institute for AI,
00:00:13.000 | who focuses on RLHF and the author of interconnects.ai.
00:00:19.000 | He'll be presenting a really cool talk on aligning open language models today.
00:00:24.000 | So thank you for joining us, Nathan.
00:00:27.000 | Yeah, thanks for the intro. Okay, this is a long time coming.
00:00:32.000 | This is a brief intro on why I'm doing this.
00:00:34.000 | I think generally since ChatGPT you'll see a lot has obviously happened,
00:00:40.000 | but I don't think it's been a blur for me as much as anyone else.
00:00:44.000 | So kind of taking the time to retell what has happened in this kind of fine-tuning and alignment space
00:00:50.000 | since ChatGPT happened is something that I thought was a worthy undertaking.
00:00:55.000 | So this is not really a 101 lecture,
00:00:57.000 | but it will probably give you a lot of context on why people are mentioning certain things
00:01:01.000 | and what still matters and what does not.
00:01:03.000 | So hopefully this is fun.
00:01:06.000 | I can see the chat.
00:01:07.000 | I don't know exactly if questions are going to come to me or if I will see it the whole time.
00:01:11.000 | I think clarifying questions are good, maybe not discussions the whole time,
00:01:16.000 | and I'll try to keep sure that there's time for questions at the end.
00:01:20.000 | So let's get into it.
00:01:26.000 | Generally, we're going to talk about language models.
00:01:28.000 | It's what everyone wants to talk about these days.
00:01:30.000 | I need to do some of the older history so that I can talk about recent history.
00:01:34.000 | The place that I like to start is actually with Claude Shannon,
00:01:37.000 | who kind of had this paper talking about approximating, arranging characters to create language models.
00:01:45.000 | That's probably why Anthropic called their models Claude.
00:01:48.000 | That's pretty well known.
00:01:51.000 | And a lot has happened since these very early papers on predicting sequences of text,
00:01:58.000 | and this is largely built on this loss function, which is called the autoregressive loss function.
00:02:04.000 | So if you kind of have this training example where you have something like I saw A,
00:02:08.000 | and you're trying to predict what comes after this,
00:02:10.000 | the whole idea is that there's going to be one correct token that has the correct label,
00:02:14.000 | and the training loss is going to increase the probability of that token
00:02:17.000 | and decrease the probability of everything else.
00:02:20.000 | This very simple loss function, classifying which token to use and actually predict,
00:02:25.000 | has enabled wild things.
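As a rough sketch, the same idea in PyTorch, assuming a toy four-word vocabulary and a single target token (real training does this over whole batches of token sequences):

```python
import torch
import torch.nn.functional as F

# Toy vocabulary and model scores for the context "I saw a ...".
vocab = ["dog", "cat", "transformer", "the"]
logits = torch.tensor([[2.0, 1.5, 0.5, -1.0]])  # unnormalized scores over the vocab
target = torch.tensor([0])                      # index of the "correct" next token ("dog")

# Cross-entropy pushes up the probability of the target token and pushes down the rest.
loss = F.cross_entropy(logits, target)
print(loss.item())
```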
00:02:27.000 | And this kind of took another turn in 2017 when this transformer paper was born.
00:02:32.000 | "Attention Is All You Need."
00:02:34.000 | Everyone here has heard about this.
00:02:35.000 | It's a great exercise to actually dig into what the attention mechanism is doing,
00:02:39.000 | not the focus of this talk.
00:02:41.000 | We'll quickly kind of keep going.
00:02:43.000 | In 2018, there were three main things.
00:02:45.000 | These are slightly out of order.
00:02:47.000 | ELMo was the earliest one, which was contextualized word embeddings.
00:02:50.000 | In the same year, we also had GPT-1 and BERT released,
00:02:54.000 | which is kind of the beginning of core ideas on which modern language models
00:02:58.000 | and transformers were trending towards.
00:03:00.000 | And just getting these better models, training on large internet-scale corpora,
00:03:06.000 | BERT was a classifier, GPT-1 was generating text,
00:03:09.000 | and we kind of continue along these trends through the years.
00:03:13.000 | GPT-2 is when we started learning about scaling laws.
00:03:16.000 | And if you use orders of magnitude more compute,
00:03:19.000 | the actual test loss will continue to decrease in a linear fashion with respect
00:03:23.000 | to the log compute.
00:03:25.000 | These ideas now are commonplace when we talk about language models.
00:03:29.000 | GPT-2 also pioneered a lot of discussions on releasing language models.
00:03:34.000 | So GPT-2, when it was first announced, they were holding access back because
00:03:39.000 | of the risks of language models, and this started a lot of the conversations
00:03:42.000 | around what you should or should not release with language models.
00:03:46.000 | They eventually actually released GPT-2, and you could download the models
00:03:50.000 | on Hugging Face and use them, but this is where that kind of conversation
00:03:53.000 | around release strategies emerged.
00:03:55.000 | In 2020 is when language models really started to be noticeably good.
00:03:59.000 | So GPT-3 is when a lot of people are like, "Whoa, this can actually do
00:04:03.000 | really interesting things if I kind of create a really clever prompt,
00:04:06.000 | figure out how to give it my information correctly."
00:04:09.000 | And GPT-3 could do a ton of things with kind of this few-shot
00:04:13.000 | or multi-shot learning, which is when you give it a few examples
00:04:16.000 | in the prompt and then ask it to do another rendition of it.
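A toy illustration of what such a few-shot prompt looks like (the exact formatting is arbitrary; the model just continues the pattern):

```python
prompt = """Translate English to French.
English: cheese -> French: fromage
English: bread -> French: pain
English: apple -> French:"""
# A GPT-3-style model, given this prompt, tends to continue with " pomme".
```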
00:04:20.000 | And with this power came many harms, and this is kind of a discussion
00:04:24.000 | of what are the risks of releasing language models,
00:04:27.000 | what types of groups will be hurt by this.
00:04:30.000 | Very important problems that kind of culminated in 2021
00:04:33.000 | with the Stochastic Parrots paper, whose title asks whether
00:04:38.000 | language models can be too big, but it's really
00:04:42.000 | a critique on how we should be thinking about language models,
00:04:46.000 | what are the limits of them, are they actually doing the things,
00:04:49.000 | like are they actually thinking or doing any of these human things,
00:04:52.000 | or are they just kind of following patterns in the data?
00:04:56.000 | And then just the year after, this is kind of like the tragedy
00:05:00.000 | of Stochastic Parrots, as no one talks about it now,
00:05:03.000 | is that ChatGPT came a year later and totally reshaped
00:05:07.000 | the whole narrative around language models one more time.
00:05:10.000 | And this is really where we start today's talk, is like,
00:05:13.000 | how does this idea of alignment emerge in ChatGPT,
00:05:17.000 | and then what happens after this?
00:05:20.000 | So the question that I ask myself is like, or I tell a lot of people is,
00:05:24.000 | can ChatGPT exist without RLHF?
00:05:27.000 | And what we saw in the release day, so if you go back and read
00:05:30.000 | the actual OpenAI blog about RLHF, they list all these limitations,
00:05:35.000 | but they say that RLHF was an important tool in launching ChatGPT.
00:05:38.000 | And the limitations that they list are really the things that we're
00:05:41.000 | still researching and that we're talking about in this talk.
00:05:43.000 | It's a great blog post to go back to, but a good way to frame it
00:05:46.000 | is that RLHF seems to be necessary, but it's not sufficient.
00:05:50.000 | You can't do something like ChatGPT or Gemini or Claude
00:05:53.000 | with a technique that -- without something like RLHF.
00:05:56.000 | But it's not the thing -- like, pre-training is still most of the work,
00:06:00.000 | but the fact that RLHF is needed is really important to kind of
00:06:03.000 | contextualize all these improvements that we've seen in the open
00:06:05.000 | in the last 14 months or so.
00:06:08.000 | Some examples that I like to cite on RLHF being relied upon,
00:06:12.000 | you can list many more models here than I have.
00:06:15.000 | This kind of -- this figure from Anthropic's Constitutional AI paper
00:06:19.000 | is the single one that I go back to all the time,
00:06:22.000 | showing how just kind of using RLHF can get these more desirable behaviors
00:06:27.000 | from their model in really dramatic ways.
00:06:29.000 | So these kind of ELO measurements aren't kind of calibrated,
00:06:33.000 | so we don't know how to compare LLAMA3 on this chart compared
00:06:37.000 | to Anthropic's models, but the level of investment that Anthropic has had
00:06:41.000 | in these kind of techniques and showing this kind of wide-ranging
00:06:44.000 | improvements of their models with RLHF is a kind of flag that we can follow
00:06:49.000 | to try to learn how to do alignment with this much precision
00:06:52.000 | and with this much kind of impact as places like Anthropic
00:06:55.000 | or OpenAI will have.
00:06:57.000 | One such example is just a simple quote from the LLAMA2 paper,
00:07:01.000 | which is kind of like the colloquial way of reading this quote,
00:07:06.000 | which I will read, is that, "Whoa, RLHF worked really easily."
00:07:09.000 | And what the quote is is, "Meanwhile, reinforcement learning,
00:07:13.000 | known for its instability, seemed a somewhat shadowy field for those
00:07:17.000 | in the NLP research community.
00:07:19.000 | However, reinforcement learning proved highly effective,
00:07:21.000 | particularly given its cost and time effectiveness."
00:07:24.000 | So this is one of the biggest endorsements of RLHF,
00:07:27.000 | and it's always fun for me because I came from the RL side
00:07:29.000 | and then I've been learning NLP.
00:07:31.000 | But for NLP researchers to say these things, like, yes,
00:07:34.000 | reinforcement learning is known for instability, and given that it is
00:07:38.000 | cost-effective and time-effective for an RL person, that's shocking.
00:07:42.000 | It's like RL has never been particularly cost-
00:07:45.000 | and time-effective, but in this language model domain,
00:07:48.000 | where we're fine-tuning with it rather than learning from scratch,
00:07:51.000 | to have people in NLP that are saying this is just really striking
00:07:56.000 | for how much impact it can have.
00:07:58.000 | And the timeline of alignment and open alignment is really like,
00:08:01.000 | when do we see these benefits?
00:08:03.000 | Like, these benefits didn't show up in models that people were playing
00:08:06.000 | with for quite a bit of time.
00:08:08.000 | So this is kind of a little atlas that I've thrown together.
00:08:11.000 | I also made a hugging face collection where I tried to add all the models
00:08:15.000 | that I talk about to it, so you can actually click on the models
00:09:17.000 | or try to use them if you're so inclined to actually run
00:08:20.000 | the models yourself.
00:08:22.000 | It's just kind of another way of documenting the artifacts
00:08:25.000 | that I talk about and the artifacts that, for me, this is a good review.
00:08:28.000 | I'm like, what mattered?
00:08:30.000 | What mattered in this really noisy journey in the last year?
00:08:33.000 | Of course, some disclaimers.
00:08:35.000 | I'm not covering every model since ChatGPT.
00:08:38.000 | This little plot of model icons could probably look more like an exponential
00:08:44.000 | than this kind of capped bar.
00:08:46.000 | And there's so much history of NLP that people are building on
00:08:50.000 | in the alignment space that is totally swept under the rug here.
00:08:55.000 | A lot of academic and infrastructure contributions that I'm not talking
00:08:59.000 | about but are really important to kind of this proliferation
00:09:02.000 | of fine-tuning models.
00:09:04.000 | So just kind of describing what this image that I have here is.
00:09:09.000 | To kind of summarize, some of these are base models.
00:09:14.000 | I'm not going to focus on base models as much as fine-tuned models.
00:09:18.000 | The base models are extremely important.
00:09:20.000 | Like none of this happens without LLAMA.
00:09:22.000 | None of this happens without LLAMA2.
00:09:24.000 | The base models are the bedrock of this ecosystem.
00:09:28.000 | And then the alignment models are what people --
00:09:31.000 | the aligned models are a lot of times what people can play with
00:09:34.000 | and what you could try out, what you could do yourself
00:09:36.000 | on much less computing infrastructure and all these things.
00:09:39.000 | So I'm going to talk more about the aligned models,
00:09:41.000 | but everything matters.
00:09:42.000 | It's one big ecosystem.
00:09:47.000 | Another thing that's not fun but I'm going to do for the sake of kind
00:09:51.000 | of flag-posting, no one really likes listening to definitions.
00:09:55.000 | Here are some things that you'll hear thrown around.
00:09:58.000 | This isn't even all of them when talking about "alignment."
00:10:01.000 | Here, alignment I've defined as a general notion of training a model
00:10:05.000 | to mirror a user's desires, really with any loss function.
00:10:09.000 | It's not restricted.
00:10:10.000 | So there's a difference between instruction fine-tuning
00:10:13.000 | and supervised fine-tuning.
00:10:15.000 | Instruction fine-tuning is about trying to get a model
00:10:18.000 | that will respond to queries, format, and instructions,
00:10:21.000 | while supervised fine-tuning is more about learning
00:10:24.000 | a specific task's capabilities.
00:10:26.000 | These get interchanged all the time.
00:10:28.000 | Like, that's okay.
00:10:29.000 | It's good to know that they're different.
00:10:31.000 | And then two kind of more ones I need to touch on,
00:10:35.000 | and we could go on even longer, is reinforcement learning
00:10:38.000 | from human feedback.
00:10:39.000 | There's this multistage process.
00:10:41.000 | It's a specific tool for aligning ML models to human data.
00:10:46.000 | It's kind of a class of tools, so it has some sort of --
00:10:49.000 | you learn a preference model and then you extract information from it.
00:10:52.000 | So there are so many different ways to do it.
00:10:54.000 | It's really an approach.
00:10:55.000 | And then there's a term that I'm kind of trying to grow,
00:10:58.000 | which is preference fine-tuning,
00:11:00.000 | which could encompass RLHF methods like PPO,
00:11:03.000 | but there's the question of how do we differentiate something
00:11:06.000 | like direct preference optimization,
00:11:08.000 | which doesn't use an RL optimizer, from all of RLHF.
00:11:12.000 | And I'll kind of come back to this,
00:11:14.000 | but it's good to have some common ground to build on
00:11:17.000 | because I might be going through some of these things pretty quickly.
00:11:27.000 | This is a chapter that I cover in one slide
00:11:30.000 | because it's really tapping into a lot of different personal stories.
00:11:34.000 | It's hard to retell how crazy things were when ChatGPT dropped.
00:11:40.000 | People were not really losing their minds,
00:11:43.000 | but there was a lot of uncertainty on what the future held,
00:11:48.000 | especially -- it was clear that language models were important,
00:11:52.000 | but it was not clear what that meant -- there were a lot of articles
00:11:55.000 | titled things like "We're Going to Reproduce Open ChatGPT,"
00:11:58.000 | but you can't really have an open model
00:12:02.000 | that does what a closed product does.
00:12:05.000 | There's a difference between model weights
00:12:07.000 | and this product that ChatGPT represents.
00:12:09.000 | But there's so much excitement that everyone is saying
00:12:11.000 | they're going to do these things
00:12:13.000 | and trying to figure out the right coalitions for actually doing so.
00:12:16.000 | And it's interesting.
00:12:18.000 | This delay is kind of this land grab
00:12:20.000 | where people are learning the basic things,
00:12:22.000 | like what is red teaming?
00:12:24.000 | What is the difference between a dialogue agent
00:12:26.000 | and a predictive language model?
00:12:28.000 | What tools should we use?
00:12:30.000 | And everything kind of follows from here with what people are building.
00:12:34.000 | But personally, I just remember multiple meetings
00:12:37.000 | where people were like, "Yeah, you should do it.
00:12:38.000 | You should go try to build open ChatGPT."
00:12:40.000 | And when you look back, that goal is just so wild
00:12:43.000 | that so many people are just going like,
00:12:45.000 | "We need to build this thing into open source."
00:12:48.000 | It doesn't even make sense
00:12:50.000 | because you can't open source a whole system that way.
00:12:54.000 | But there are some things that make a bit --
00:12:57.000 | this makes a lot more sense,
00:12:59.000 | which is when things start to get grounded in actual models.
00:13:02.000 | So the first Llama Suite was released, I think, in February.
00:13:06.000 | I have the date in some notes somewhere.
00:13:08.000 | And then these instruction-tuned models started to show up
00:13:11.000 | on this first Llama model.
00:13:13.000 | The first one to really crack the narrative was this Alpaca model.
00:13:17.000 | And it did a bunch of things that still are used today.
00:15:21.000 | So this was trained on 52,000 self-instruct-style examples
00:15:25.000 | distilled from text-davinci-003.
00:13:27.000 | There's a lot in the sentence.
00:13:28.000 | I'll say what self-instruct means.
00:13:30.000 | But this wasn't even data generated from ChatGPT.
00:13:33.000 | It was generated from one of OpenAI's API models.
00:13:36.000 | So if we talk about --
00:13:39.000 | this is all on how to apply instruction fine-tuning.
00:13:42.000 | And this is this thing I mentioned on the definition slide.
00:13:45.000 | But really, it's about making a model
00:13:47.000 | that will respond to specific styles of inputs.
00:13:50.000 | What often happens at a technical level here
00:13:52.000 | is that the model is learning to integrate
00:13:54.000 | something called, like, a chat template,
00:13:56.000 | the ability to include system prompts.
00:13:58.000 | So you want the model to know it is an agent.
00:14:01.000 | You want the model to know what day it is.
00:14:04.000 | Excuse me, you can do this in the system prompt,
00:14:06.000 | which is something the user doesn't see,
00:14:08.000 | but it steers the behavior of the model.
00:14:10.000 | And instruction tuning is really where
00:14:12.000 | we make the model capable of having these behaviors.
00:14:15.000 | But the question is, like,
00:14:16.000 | what data are we training this behavior on?
00:14:19.000 | So the most common example is kind of --
00:14:24.000 | you continue training with this autoregressive loss function
00:14:27.000 | on question/answer pairs.
00:14:29.000 | So it's like, what is a transformer?
00:14:31.000 | And then the language model will predict an answer.
00:14:34.000 | It could be from Stack Overflow.
00:14:35.000 | It could be something else.
00:14:37.000 | And this example is human data.
00:14:39.000 | But what made Alpaca and a lot of these early models,
00:14:43.000 | and even today, really popular and accessible,
00:14:45.000 | is by using data to answer questions
00:14:48.000 | that is generated by an AI.
00:14:51.000 | So this is where the kind of idea
00:14:52.000 | of self-instruct data comes in.
00:14:56.000 | Self-instruct was a paper from Allen AI and UW in 2022,
00:15:01.000 | before ChatGPT, where essentially the idea is,
00:15:05.000 | how do we expand the distribution
00:15:07.000 | of instruction data that we have,
00:15:09.000 | this training data for fine-tuning a language model,
00:15:12.000 | without getting more humans in the loop?
00:15:14.000 | So what you really have to do
00:15:17.000 | is you start with some high-quality,
00:15:19.000 | often human prompts.
00:15:21.000 | And then what we now see as more common practice today,
00:15:24.000 | but was very new then,
00:15:26.000 | is asking a stronger language model,
00:15:28.000 | create a list of prompts that are similar to this,
00:15:32.000 | but still diverse.
00:15:33.000 | And then once you have a list of prompts,
00:15:36.000 | you can use ChatGPT or another model
00:15:38.000 | to actually generate completions.
00:15:40.000 | Because then what you have
00:15:41.000 | is a really big list of question-answer pairs,
00:15:45.000 | but you don't need to go through the bottleneck
00:15:47.000 | of getting humans to sit down and write all of them.
00:15:49.000 | So what Alpaca was really, why Alpaca worked,
00:15:53.000 | is because of realizing this
00:15:55.000 | and taking in this better model from OpenAI.
00:15:58.000 | So you can see this figure here on the right
00:15:59.000 | is from the Alpaca paper or blog post, one of the two.
00:16:03.000 | They took this model from OpenAI
00:16:05.000 | and they asked it to generate more tasks.
00:16:08.000 | So they had 175 to start,
00:16:11.000 | and then they ended up with over 50,000 tasks,
00:16:14.000 | and they also generated completions
00:16:15.000 | from this OpenAI model.
00:16:17.000 | And then what they did is they took these meta-weights
00:16:21.000 | that had just come out and they instruction fine-tuned them,
00:16:24.000 | and then you end up with Alpaca.
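A minimal sketch of that self-instruct-style pipeline, where `generate` is a hypothetical stand-in for a call to whatever stronger model you have access to (text-davinci-003, in Alpaca's case):

```python
def generate(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a stronger model's API and return its text."""
    raise NotImplementedError("wire this up to a strong instruction-following model")

seed_tasks = [
    "Write a short poem about the ocean.",
    "Explain recursion to a five-year-old.",
    # ... roughly 175 human-written seed tasks in Alpaca's case
]

# Step 1: ask the strong model to expand the prompt distribution.
expansion_prompt = (
    "Here are some example tasks:\n"
    + "\n".join(f"- {t}" for t in seed_tasks)
    + "\nWrite 20 new tasks that are similar in style but diverse in topic, one per line."
)
new_tasks = [line.strip("- ").strip()
             for line in generate(expansion_prompt).splitlines() if line.strip()]

# Step 2: ask the strong model for completions, yielding (instruction, response) pairs.
dataset = [{"instruction": t, "response": generate(t)} for t in seed_tasks + new_tasks]

# Step 3: instruction fine-tune an open base model (LLaMA, in Alpaca's case) on `dataset`.
```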
00:16:26.000 | This is kind of a pattern.
00:16:28.000 | This is a pattern that we've seen many times with Alpaca,
00:16:30.000 | which is essentially you take,
00:16:33.000 | you generate some data from a stronger language model
00:16:37.000 | and you fine-tune on it.
00:16:38.000 | It sounds so obvious today,
00:16:39.000 | but this was the first model to actually release this.
00:16:44.000 | I can now see questions coming in.
00:16:46.000 | I'll answer the ones that are clarifying
00:16:48.000 | and stuff like this, so thanks for asking them,
00:16:51.000 | and we can come back to more at the end.
00:16:54.000 | Once Alpaca happened,
00:16:55.000 | it felt like there was a new model every week.
00:16:57.000 | The second model was Vicuna,
00:16:59.000 | and really what they changed was they added new sources
00:17:04.000 | of prompts to the distribution.
00:17:07.000 | So you can see that I say ShareGPT.
00:17:10.000 | They also introduced the idea of LLM as a judge,
00:17:13.000 | which is now obvious from a lot of their later evaluation work.
00:17:17.000 | But let's talk about why ShareGPT was so interesting.
00:17:21.000 | So ShareGPT was one of the only data sets
00:17:26.000 | that got open language model builders,
00:17:30.000 | so people like me, prompts,
00:17:33.000 | that were similar to what people were asking ChatGPT.
00:17:38.000 | So what was happening was you would install
00:17:40.000 | this browser plug-in,
00:17:42.000 | and it would let you share your prompts from ChatGPT
00:17:47.000 | on Twitter or whatever.
00:17:48.000 | So it was making it easier to share the prompts
00:17:51.000 | in your conversations before OpenAI made a tool to do this,
00:17:55.000 | and now there's this legal gray area over the data set
00:18:00.000 | because most of these data sets are unlicensed,
00:18:02.000 | and they were kind of created without consent
00:18:04.000 | or they were released without consent.
00:18:06.000 | So there's a legal question
00:18:08.000 | of whether or not people should be training on this data,
00:18:11.000 | but the fact of the matter is that ShareGPT
00:18:13.000 | was really important to this kind of acceleration
00:18:15.000 | and progress on fine-tuning models
00:18:18.000 | because the diversity of data is just so much stronger
00:18:21.000 | than what people were going to get
00:18:23.000 | from, like, this alpaca self-instruct idea,
00:18:25.000 | and it set the bar much higher.
00:18:27.000 | It's only today and in the last, like, few months
00:18:30.000 | or six months for some of them that we're getting data sets
00:18:32.000 | that can replace these.
00:18:34.000 | So you see I mentioned LMSYS-Chat-1M,
00:18:37.000 | which is just a million conversations
00:18:39.000 | from Chatbot Arena, which took a lot of work
00:18:41.000 | to clean out personal information,
00:18:44.000 | and then a project from the Allen Institute for AI,
00:18:46.000 | which is WildChat,
00:18:48.000 | which is really similar to ShareGPT,
00:18:50.000 | but the users gave consent at the start
00:18:53.000 | that their data was going to be collected and released
00:18:55.000 | in exchange for using a language model for free.
00:18:58.000 | So there's a lot of happenstance in the story
00:19:02.000 | where something like this, which is legally gray,
00:19:05.000 | the data is still on Hugging Face,
00:19:06.000 | but it looks kind of odd,
00:19:09.000 | where these little things helped enable the ecosystem,
00:19:12.000 | even though looking back, it's like,
00:19:14.000 | "Oh, we don't know if that should have happened."
00:19:20.000 | Following Vicuna is one called Koala,
00:19:23.000 | also from Berkeley, and, like,
00:19:26.000 | if you look at the time frames, it's pretty obvious
00:19:28.000 | that a lot of these were developed concurrently,
00:19:30.000 | and then the release dates just happened
00:19:32.000 | to be slightly different.
00:19:34.000 | Koala is mostly known for having
00:19:36.000 | kind of a different diverse set of data sets.
00:19:39.000 | They used some from Alpaca.
00:19:41.000 | They used some from ShareGPT again.
00:19:44.000 | They also used Anthropic data that has been released,
00:19:46.000 | and they had some human evaluation from grad students.
00:19:49.000 | So this just added more data diversity,
00:19:51.000 | and the evaluations weren't necessarily better,
00:19:54.000 | but it was an important model that a lot of people noticed
00:19:56.000 | just from these kind of bringing back up
00:19:58.000 | these new data sets that had been
00:20:00.000 | in the literature from years prior.
00:20:03.000 | Something you might ask looking at these slides
00:20:05.000 | is, "Why weight differences?"
00:20:07.000 | I have all these slides like weight diff
00:20:09.000 | to Llama7b, weight diff.
00:20:12.000 | Essentially, when Llama was released,
00:20:14.000 | it was released as research only,
00:20:16.000 | and it was distributed to researchers upon request,
00:20:20.000 | and the license prohibited people from uploading
00:20:22.000 | Llama 1 weights to Hugging Face.
00:20:24.000 | So it was kind of this annoying phase where,
00:20:26.000 | in order to use a model on HuggingFace,
00:20:27.000 | you had to clone it,
00:20:29.000 | and then you had to run a script to convert it
00:20:31.000 | with this kind of delta into the new model
00:20:34.000 | in order to actually use it.
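A rough sketch of what those conversion scripts were doing, assuming the delta is a state dict with the same tensor names and shapes as the base (the real scripts also handled sharded checkpoints and tokenizers):

```python
import torch

base = torch.load("llama-7b/base_state_dict.pt")       # weights requested from Meta
delta = torch.load("model-delta/delta_state_dict.pt")  # publicly released weight diff

# Add the released delta back onto the base weights to recover the fine-tuned model.
recovered = {name: base[name] + delta[name] for name in base}
torch.save(recovered, "recovered-model/state_dict.pt")
```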
00:20:36.000 | So this was kind of a really frustrating phase
00:20:38.000 | from a user perspective
00:20:39.000 | because it just made experimentation
00:20:41.000 | have one more barrier to entry,
00:20:43.000 | and thankfully, it was changed with Llama2,
00:20:45.000 | but it was really something that many people
00:20:47.000 | dealt with at the time,
00:20:49.000 | and we now today see different license restrictions
00:20:52.000 | on how Llama is used.
00:20:55.000 | I mean, the Llama3 released today,
00:20:57.000 | essentially, if I fine-tune a model for my research
00:21:00.000 | and I release it at AI2,
00:21:02.000 | Llama3 needs to be in the name.
00:21:04.000 | So if I wanted to release a new Tulu model,
00:21:07.000 | it would have to be Llama3 Tulu4 DPO.
00:21:10.000 | It's like the namings are going to be crazy,
00:21:12.000 | but there's always been restrictions on using LlamaWeights
00:21:14.000 | or how you share them.
00:21:18.000 | And the final model that I kind of group into this batch
00:21:21.000 | of this real first swing was Dolly.
00:21:24.000 | So Dolly was fine-tuned from a different base model.
00:21:27.000 | It was fine-tuned from the Pythia models from Eleuther,
00:21:29.000 | which are a suite of early scaling experiments
00:21:32.000 | from Eleuther AI, which is still used extensively.
00:21:36.000 | But they added some human-written data to the loop,
00:21:38.000 | which is just really important
00:21:40.000 | because almost all the projects that I'll mention today
00:21:43.000 | talk about synthetic data or data derived from OpenAI,
00:21:46.000 | but there's only a few of them
00:21:48.000 | that actually added new human data to the loop,
00:21:50.000 | and this is what everyone remembered Dolly for.
00:21:53.000 | And a lot of its performance limitations
00:21:55.000 | are probably from the base model Pythia,
00:21:57.000 | which is trained in a time where this type of inference
00:22:00.000 | that people expect wasn't as popular,
00:22:02.000 | and it was kind of before the scaling laws
00:22:04.000 | were thought of differently.
00:22:08.000 | You can kind of see through these
00:22:10.000 | where we're going to start with different model sizes
00:22:12.000 | and different MTBench scores.
00:22:14.000 | I'll talk about what MTBench is in a few slides.
00:22:16.000 | It's an evaluation tool,
00:22:18.000 | and this is really just to ground you
00:22:20.000 | on how the scores would change over time.
00:22:25.000 | So I have these throughout the talk
00:22:27.000 | as we kind of go through different models
00:22:29.000 | just to show kind of how this --
00:22:31.000 | how the scores continue to progress over time
00:22:33.000 | from one small change to another
00:22:35.000 | as the community gets better at these things.
00:22:38.000 | While I was talking about human data,
00:22:40.000 | so remember, Dolly is all about human data.
00:22:43.000 | Probably still the single busiest human coordination project
00:22:50.000 | for data generation was OpenAssistant.
00:22:52.000 | I think it's easy now, if you get into fine-tuning,
00:22:55.000 | to see the OpenAssistant dataset
00:22:57.000 | and not realize how important it was
00:23:01.000 | to the process of alignment in this whole summer,
00:23:04.000 | and it is still used today.
00:23:06.000 | So essentially, there's this quote on the top,
00:23:08.000 | but the leaders ran a community project
00:23:11.000 | to generate human-written prompts
00:23:14.000 | and these kind of like human-written responses
00:23:18.000 | in many different languages with rating them
00:23:20.000 | so you could use it as preferences.
00:23:23.000 | The slide has a bug where it's like
00:23:25.000 | it has over 10,000 annotated trees
00:23:28.000 | and over 1,000 volunteers.
00:23:30.000 | This is still used extensively today.
00:23:32.000 | It will come up again in the talk.
00:23:34.000 | They also released models.
00:23:35.000 | So the first model used on Hugging Chat,
00:23:39.000 | which I don't remember the launch date of,
00:23:41.000 | was an OpenAssistant model.
00:23:43.000 | So OpenAssistant was probably
00:23:45.000 | the first majorly successful project of the era,
00:23:47.000 | and the dataset is still used today.
00:23:50.000 | Where I will end this talk is saying
00:23:52.000 | that we need more things like this.
00:23:54.000 | It's like really one of the most important things
00:23:57.000 | is we need more human data in the open.
00:24:01.000 | This is a quick aside.
00:24:02.000 | It's kind of out of the flow of the talk,
00:24:04.000 | but on April 28th of 2023, typo on the slide,
00:24:08.000 | of April 28th of 2023,
00:24:10.000 | StableVicuna was released from Carper AI,
00:24:13.000 | which now looks like the style of training models that is popular today,
00:24:17.000 | except for the dataset.
00:24:23.000 | They got PPO to work.
00:24:24.000 | They had some human evaluations that were solid.
00:24:27.000 | It was a good chat model.
00:24:28.000 | It wasn't out of distribution,
00:24:31.000 | but Carper AI was really ahead at the time,
00:24:34.000 | and then it seems like priorities
00:24:36.000 | kind of shifted from stability.
00:24:38.000 | But it's important to know the extent
00:24:40.000 | by which there were still some players
00:24:42.000 | who knew how to do RLHF really early on,
00:24:45.000 | even though they were pretty rare.
00:24:52.000 | This is the last slide of this kind of first chapter
00:24:55.000 | on instruction tuning, was the idea of QLORA,
00:24:58.000 | which kind of unlocked a whole new bunch of players
00:25:02.000 | into actually being able to fine-tune models.
00:25:05.000 | So for the quick 60-second overview,
00:25:09.000 | LORA stands for low-rank adaptation,
00:25:12.000 | which is the idea of you can freeze some model --
00:25:15.000 | you freeze most of the model weights,
00:25:16.000 | and you add new weights to specific layers
00:25:19.000 | that you can then fine-tune,
00:25:21.000 | as if you were fine-tuning the whole model.
00:25:23.000 | You'd use the same approach of instruction data
00:25:26.000 | with question-answering, but it takes much less memory.
00:25:29.000 | QLORA was a technique that built upon this
00:25:32.000 | by adding very specific quantization and GPU tricks
00:25:35.000 | to make it so the memory requirements
00:25:37.000 | to fine-tune models were even lower.
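A minimal sketch of the LoRA idea for a single linear layer (real implementations add scaling factors, dropout, and weight-merging utilities, and QLoRA additionally stores the frozen base weights in 4-bit):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained weight W plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the original weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op

    def forward(self, x):
        # W x + B A x: only A and B receive gradients during fine-tuning.
        return self.base(x) + x @ self.A.T @ self.B.T
```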
00:25:40.000 | Tim Detmers and team also released this Guanaco model
00:25:44.000 | with it, which was another big step up
00:25:48.000 | in performance of these models.
00:25:49.000 | I have a few more slides on it, on the method.
00:25:51.000 | So you can kind of see on the right this difference,
00:25:54.000 | full fine-tuning, LORA.
00:25:56.000 | They look similar, where LORA, you have fewer parameters,
00:25:58.000 | is kind of what the smaller shapes mean.
00:26:00.000 | In QLORA, they quantize the base model
00:26:03.000 | that you're propagating gradients through
00:26:05.000 | to save most of the memory.
00:26:08.000 | So this is an approximation of if you're fine-tuning
00:26:11.000 | different model sizes on the top,
00:26:13.000 | so 7 billion, 13 billion, 30 billion,
00:26:15.000 | with full fine-tuning, different amount of bits,
00:26:18.000 | but full fine-tuning versus LORA versus QLORA.
00:26:21.000 | And you can kind of see, for reference,
00:26:24.000 | one A100 GPU has about 80 gigabytes of memory,
00:26:28.000 | and these are really hard GPUs to get.
00:26:31.000 | Plenty of consumer GPUs will only
00:26:33.000 | have like 24 to 32 gigabytes of memory.
00:26:37.000 | So you need to use these QLORA techniques
00:26:39.000 | to actually get the ability to fine-tune models at the 7
00:26:44.000 | or 13 billion parameter size.
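A back-of-envelope version of that chart, assuming bf16 weights and Adam-style optimizer states (actual numbers also depend on activations, sequence length, and implementation details):

```python
params = 7e9  # a 7 billion parameter model

full_ft_gb = params * (2 + 2 + 8) / 1e9  # bf16 weights + bf16 grads + fp32 Adam moments ~ 84 GB
lora_gb    = params * 2 / 1e9            # ~14 GB of frozen bf16 weights, plus a small adapter
qlora_gb   = params * 0.5 / 1e9          # ~3.5 GB if the frozen base is quantized to 4-bit

print(full_ft_gb, lora_gb, qlora_gb)
```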
00:26:48.000 | And like Guanaco did this, and they released 33 billion
00:26:52.000 | and 65 billion parameter LAMA fine-tunes,
00:26:55.000 | which were clear steps up from the kind of state
00:26:57.000 | of the art at the time.
00:26:59.000 | And they also figured out ways to filter this Open Assistant
00:27:03.000 | data set that I mentioned, and this kind
00:27:05.000 | of filtered version of Open Assistant
00:27:07.000 | is what is still most popular today.
00:27:10.000 | I'm going to kind of pause and skim through the questions
00:27:12.000 | and see if there's anything on that section,
00:27:14.000 | and if not, I'll save the relevant ones for later.
00:27:18.000 | Anyway, I'm going to keep going.
00:27:20.000 | They're great questions, and I appreciate them,
00:27:23.000 | but they're mostly not specific enough
00:27:26.000 | where it's worth the digression.
00:27:28.000 | This Chapter 2 phase is really like
00:27:31.000 | where it seemed like things were a little bit slower
00:27:35.000 | on the ground, but when we look back
00:27:37.000 | at a lot of the things that came out of this time,
00:27:39.000 | like the DPO paper was in this era.
00:27:42.000 | Everyone read it, but we didn't know what to do with it yet,
00:27:44.000 | and the new evaluations are still really used.
00:27:48.000 | Transitioning in, setting the scene for being
00:27:51.000 | not sure if things work, a lot of people
00:27:54.000 | were continuing to try to build on these LORA methods
00:27:57.000 | and QLORA methods.
00:27:58.000 | I remember a lot of excitement at Hugging Face
00:28:00.000 | where we were setting up our RLHF pipeline
00:28:03.000 | where we could do RLHF on 7 billion parameter models,
00:28:07.000 | and we could maybe do it on a consumer GPU.
00:28:09.000 | It was really cool to see the loss going down.
00:28:12.000 | It was great to bring more people into the space,
00:28:15.000 | but weeks and weeks would go by, and you're like,
00:28:18.000 | "Why has no one picked up what we released
00:28:21.000 | in the blog post and trained a really good model with it?"
00:28:24.000 | And the kind of consensus now is that these LORA methods
00:28:30.000 | just have some sort of weird limitation
00:28:32.000 | in how you use them or how the gradients flow
00:28:35.000 | that make it much, much harder to get a really good model out.
00:28:39.000 | If you only have a certain number of GPUs
00:28:41.000 | such that LORA is your only option, definitely use it,
00:28:44.000 | but for people that have more GPUs,
00:28:46.000 | figuring out how to scale is normally a better solution
00:28:49.000 | than just using something like LORA that fits
00:28:51.000 | and is easier in the short term.
00:28:55.000 | Another defining moment of this era was the LLAMA2 backlash.
00:29:00.000 | I'm guessing some people remember this,
00:29:02.000 | which is like the famous line was people asked LLAMA
00:29:05.000 | how to kill a Python process, and it would say no,
00:29:08.000 | and this really started a whole bunch of new discussions
00:29:12.000 | around what kind of alignment means
00:29:15.000 | or what models should or should not do.
00:29:18.000 | Here's an example from a paper for a safety evaluation test set
00:29:22.000 | called XS Test, and it's just like,
00:29:25.000 | "Should chat models be safe
00:29:27.000 | or should they follow the instructions that I want?"
00:29:30.000 | And this is a fundamental question.
00:29:32.000 | It'll differ by organization. It'll differ by individual.
00:29:35.000 | And this is the point where this became very serious
00:29:39.000 | and something that people actually had to reckon with
00:29:41.000 | because there were models that were actively disagree--
00:29:45.000 | people were really disagreeing with this specific take.
00:29:48.000 | I don't have any clear solution to it,
00:29:51.000 | but one of the things it led to is this idea of uncensored models.
00:29:55.000 | It's a really popular category on kind of a hugging face right now
00:30:01.000 | where the idea is you remove filtering.
00:30:03.000 | So if we're using synthetic data and I ask a language model a question,
00:30:07.000 | like if I ask chat GPT how to make a bomb,
00:30:09.000 | it's going to say, "I'm sorry. I'm a language model.
00:30:11.000 | I shouldn't make this."
00:30:13.000 | And the idea of uncensored models is to remove those points from our kind of--
00:30:19.000 | remove those points from our fine-tuning data set.
00:30:22.000 | I think there's a lot of confusion over the name
00:30:24.000 | because language models were never--
00:30:26.000 | at this stage really aren't censored to begin with,
00:30:29.000 | but it's really that the data set
00:30:31.000 | and the method for creating these data sets needed more filtering
00:30:35.000 | or they needed some way of becoming unbiased.
00:30:38.000 | So like there's a lot of people now that only build models
00:30:42.000 | to try to make them unbiased against any sort of refusal.
00:30:45.000 | A refusal is when you ask a language model something and it says no.
00:30:48.000 | And this goes on today, and this came out of this LLAMA2 thing.
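In practice, the "uncensoring" step is often just string-filtering the synthetic dataset; a rough sketch (the refusal phrases here are illustrative, and real filter lists are much longer):

```python
REFUSAL_MARKERS = ["i'm sorry", "as an ai language model", "i cannot", "i can't assist"]

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

dataset = [
    {"instruction": "How do I kill a Python process?",
     "response": "Use `kill <pid>`, or `pkill -f <name>` to match by name."},
    {"instruction": "Help me with something harmful.",
     "response": "I'm sorry, but I can't help with that."},
]

# Keep only examples where the response actually answers the instruction.
uncensored_dataset = [ex for ex in dataset if not is_refusal(ex["response"])]
```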
00:30:54.000 | But otherwise, this is a transition period
00:30:56.000 | where there's a lot of good, solid models being trained,
00:31:00.000 | but either they didn't have a lot of documentation,
00:31:02.000 | they didn't have the right release team to splash as big as they should have,
00:31:06.000 | the methods were complicated to implement, or something like this.
00:31:10.000 | So I could run through these, and I remember all these models coming out,
00:31:14.000 | but none of them were really things that are household names like Alpaca is today.
00:31:19.000 | The team behind WizardLM
00:31:22.000 | created this method called Evol-Instruct,
00:31:24.000 | which is a synthetic data method.
00:31:26.000 | All these things were clearly working for them
00:31:30.000 | based on the models they were generating,
00:31:32.000 | but for whatever reason, the narrative wasn't actually changed.
00:31:36.000 | There are some new datasets, like UltraLM from OpenBMB in China,
00:31:42.000 | which is releasing new datasets, and more people training on ShareGPT.
00:31:46.000 | The model called Xwin-LM was the first one to be in a similar ballpark,
00:31:51.000 | and it's also trained with RLHF, so not just that Carper model.
00:31:56.000 | But for whatever reason, these didn't really splash.
00:32:00.000 | And that was this kind of summer after LLAMA2
00:32:03.000 | where fine-tuning was chugging along,
00:32:06.000 | but the narrative wasn't changing all that much,
00:32:09.000 | at least from my perspective, but that's why I'm here.
00:32:14.000 | But what was happening in the background,
00:32:16.000 | while the models weren't seeming that different,
00:32:19.000 | is that new evaluation tools were coming out
00:32:22.000 | that ended up kind of being the standard of today.
00:32:25.000 | So you can see the dates here, so May 3rd, ChatBot Arena,
00:32:28.000 | June 8th, AlpacaEval, June 22nd, MTBench.
00:32:31.000 | Sometime in early July, the OpenLLM Leaderboard.
00:32:34.000 | All of these things were created about the same time,
00:32:37.000 | where there's a desperate need to get some sort of signal
00:32:40.000 | on what our fine-tuned models are doing in the open.
00:32:43.000 | Like, we don't have the capability of paying humans
00:32:46.000 | to compare our responses like they do at Anthropic,
00:32:49.000 | where they're always trying new models on humans.
00:32:51.000 | That's way too expensive.
00:32:53.000 | We need something that you could sit down as an engineer
00:32:56.000 | and get feedback in 10 to 15 minutes.
00:33:00.000 | So I kind of run through these in order,
00:33:02.000 | and a lot of these are obvious,
00:33:04.000 | but it's important to take this from the perspective
00:33:06.000 | of what can I use when I'm trying to align models,
00:33:09.000 | and what is an immediate feedback
00:33:11.000 | versus what is kind of this long-term signal.
00:33:14.000 | So ChatBot Arena is obviously fantastic.
00:33:17.000 | Like, everyone looks at this today
00:33:19.000 | as something that is defining corporate strategy,
00:33:25.000 | as defining the biggest language model players.
00:33:28.000 | like whether Claude 3 is better than GPT-4.
00:33:31.000 | But if I'm an engineer, A, many small providers
00:33:35.000 | aren't going to get their models in,
00:33:37.000 | and B, it takes -- especially previously,
00:33:40.000 | it used to take weeks to get your models rating,
00:33:42.000 | but now it takes days.
00:33:44.000 | Like, I need to know what my models are
00:33:47.000 | before I decide to actually release it.
00:33:49.000 | So that's the biggest thing where, like,
00:33:51.000 | I know I need something beyond ChatBot Arena
00:33:54.000 | just for my engineering development.
00:33:57.000 | And this is where, like,
00:33:58.000 | AlpacaEval and MTBench really thrive.
00:34:00.000 | So AlpacaEval --
00:34:02.000 | God, the slide formatting got changed,
00:34:04.000 | but I'll just kind of keep rolling through this.
00:34:07.000 | AlpacaEval is the idea of you have a list of prompts
00:34:10.000 | that you compare to a strong other base model,
00:34:14.000 | like OpenAI's text-davinci-003 or GPT-4,
00:34:19.000 | and then you ask a language model which is better.
00:34:21.000 | And the data set here is compiled
00:34:23.000 | from all these popular data sets
00:34:25.000 | that I have been talking about so far.
00:34:28.000 | So data sets from OpenAssistant, Vicuna, Koala, Anthropic.
00:34:32.000 | Like, all these data sets that people have been using,
00:34:35.000 | they took the test sets from those,
00:34:37.000 | and that's what AlpacaEval mirrors.
00:34:39.000 | It's kind of a known thing.
00:34:41.000 | It has some limitations
00:34:43.000 | because there's only so many prompts,
00:34:46.000 | and it's like asking a language model
00:34:48.000 | to provide a rating is going to have some ceiling
00:34:51.000 | where we don't know how to compare two really good models.
00:34:55.000 | So it has more samples than MTBench,
00:34:57.000 | so there's more --
00:35:00.000 | so there's just kind of smaller error bars,
00:35:02.000 | and it's easier to use
00:35:03.000 | because it's a single-turn generation.
00:35:05.000 | But we've heard about the length bias
00:35:07.000 | for a really long time,
00:35:09.000 | and it's not clear how to interpret these top results.
00:35:12.000 | So this is an older screenshot of Leaderboard,
00:35:14.000 | but what does it mean to beat a model 95% of the time,
00:35:19.000 | as judged by another language model?
00:35:21.000 | Those are the questions that we can't really answer
00:35:23.000 | in the short term.
00:35:25.000 | AlpacaEval 2 came out, which takes steps toward this,
00:35:28.000 | where it compares to GPT-4 rather than text-davinci-003.
00:35:34.000 | text-davinci-003 was an InstructGPT variant.
00:35:37.000 | But at the end of the day,
00:35:39.000 | if, like, GPT-4 is answering these questions
00:35:43.000 | in the Alpaca style really well,
00:35:46.000 | so what does beating GPT-4 exactly mean?
00:35:49.000 | And we need to get more specific in our evaluations
00:35:52.000 | because I don't really know if I care too much
00:35:55.000 | about a 20% or 30% score in AlpacaEval 2
00:35:59.000 | because I don't know what it means.
00:36:00.000 | And this is the opaqueness of all of our evaluations.
00:36:04.000 | We'll see this time and time again
00:36:05.000 | where we don't know what an increase in score means.
00:36:08.000 | That's, like, the next step
00:36:09.000 | after being able to do it easily.
00:36:12.000 | This update was pretty recent.
00:36:16.000 | MTBench is pretty similar,
00:36:18.000 | where instead of comparing to one model,
00:36:21.000 | you ask a language model to provide a score
00:36:24.000 | to a list of prompts.
00:36:25.000 | So if I have a model I'm training,
00:36:27.000 | I generate the completion to 80 diverse prompts,
00:36:30.000 | and then I ask GPT-4, "Hey, from 0 to 10,
00:36:33.000 | how good were each of those completions?"
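A rough sketch of that single-answer grading pattern, with `judge` a hypothetical stand-in for a call to the judge model such as GPT-4 (the real MTBench judge prompts and rubric are more involved):

```python
def judge(prompt: str) -> str:
    """Hypothetical helper: send a prompt to the judge model and return its reply."""
    raise NotImplementedError

def score_completion(question: str, answer: str) -> float:
    judge_prompt = (
        "Rate the following answer to the question on a scale of 1 to 10.\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "Reply with only the number."
    )
    return float(judge(judge_prompt))

# An MTBench-style score is then the average over the ~80 evaluation prompts.
```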
00:36:36.000 | And this is good,
00:36:38.000 | but it runs into the same problem of, like,
00:36:40.000 | what if our model is getting really good?
00:36:42.000 | If our model is getting really good,
00:36:44.000 | it's just, like, it becomes saturated.
00:36:47.000 | Like, GPT-4 only gets to about 9,
00:36:50.000 | and there's only about 80 prompts
00:36:52.000 | in the evaluation set and all these things.
00:36:54.000 | And it's just --
00:36:56.000 | And one of the nuance points
00:36:57.000 | is that there's actually a variance.
00:36:59.000 | So even if you set the temperature to 0,
00:37:01.000 | GPT-4 versions change.
00:37:03.000 | Your own generations from your model
00:37:05.000 | you're trying to train can change.
00:37:08.000 | And this makes it better where it's like,
00:37:12.000 | "Okay, I can tell if a model was really bad,
00:37:14.000 | if MTBench and AlpacaEval have really low scores."
00:37:17.000 | But it's hard to, like --
00:37:19.000 | It's still --
00:37:20.000 | We have this goal for a precise evaluation.
00:37:23.000 | So, like, in pre-training,
00:37:25.000 | we have MTBench and HellaSwag
00:37:28.000 | and all of the --
00:37:29.000 | Or, sorry.
00:37:30.000 | In pre-training, we have, like, MMLU and HellaSwag
00:37:32.000 | and all these things that people can look at
00:37:34.000 | and average over 10 tasks.
00:37:36.000 | And if you get, like, a 2% improvement on average,
00:37:38.000 | you're doing great.
00:37:39.000 | But we don't have this clear indicator
00:37:41.000 | in alignment evaluation.
00:37:44.000 | The OpenLLM leaderboard was the same,
00:37:46.000 | where it's --
00:37:48.000 | This came out of the team
00:37:49.000 | I was working on at Hugging Face,
00:37:50.000 | where we were just trying to evaluate
00:37:52.000 | more models to get more signal,
00:37:55.000 | which was that we needed to know
00:37:57.000 | what our competitors were doing
00:37:58.000 | and get some ballpark estimate.
00:38:00.000 | And what this grew into
00:38:01.000 | is this whole kind of, like,
00:38:02.000 | ecosystem-supporting discovery tool
00:38:05.000 | just because, like,
00:38:06.000 | getting any signal is so useful.
00:38:08.000 | So, this is where we were starting
00:38:09.000 | with evaluation,
00:38:10.000 | which was just, like, no signal.
00:38:11.000 | And why this leaderboard was so successful
00:38:13.000 | is because it gave everyone access
00:38:15.000 | to a little bit more signal
00:38:16.000 | on the models that they're evaluating,
00:38:18.000 | but it didn't really solve
00:38:20.000 | any of the fundamental problems.
00:38:21.000 | And it didn't show us that, like,
00:38:23.000 | doing RLHF on models
00:38:24.000 | would actually make the scores go up.
00:38:26.000 | It's starting to get better today,
00:38:27.000 | but that's, like, a year on
00:38:29.000 | from the launch of this leaderboard.
00:38:32.000 | So, this is, like --
00:38:33.000 | These problems are still --
00:38:34.000 | This is talking about a section
00:38:36.000 | from July of 2023,
00:38:38.000 | and it seems like the things
00:38:39.000 | that if I were to go into --
00:38:40.000 | like, go into work and talk to people
00:38:42.000 | about what we're going to do
00:38:43.000 | with our new models,
00:38:44.000 | like, these are the same questions
00:38:45.000 | that we're still asking,
00:38:46.000 | which is why it's --
00:38:48.000 | These evaluation tools
00:38:50.000 | are still so useful,
00:38:51.000 | and it's why people still talk
00:38:52.000 | about AlpacaEval,
00:38:53.000 | but it shows how much of an opportunity
00:38:54.000 | there still is.
00:38:57.000 | So, this is kind of a summary
00:38:58.000 | of what I was talking about.
00:38:59.000 | It's, like, how easy is it
00:39:01.000 | to use these evaluations?
00:39:03.000 | Like, Chatbot Arena is everything.
00:39:05.000 | Like, Andrej Karpathy tweets about it,
00:39:07.000 | and it's great,
00:39:08.000 | and you can go there
00:39:09.000 | and you can use models,
00:39:10.000 | but, like, I don't know
00:39:12.000 | how to make sense of that
00:39:14.000 | as if I'm trying to sit down
00:39:15.000 | every day and write code.
00:39:17.000 | And AlpacaEval and MTBench
00:39:19.000 | mostly solve this by being cheap
00:39:21.000 | and pretty accessible,
00:39:23.000 | but I really, really think
00:39:24.000 | there's a huge opportunity here
00:39:26.000 | to come out with more.
00:39:27.000 | So, a colleague at AI2
00:39:28.000 | launched WildBench,
00:39:29.000 | which is a good tool
00:39:31.000 | that kind of fits in.
00:39:33.000 | It's like a Chatbot Arena
00:39:34.000 | AlpacaEval hybrid,
00:39:35.000 | and you can use it
00:39:36.000 | a little bit faster.
00:39:37.000 | It's, like, how are we going
00:39:38.000 | to continue to push this along
00:39:40.000 | is a great question,
00:39:41.000 | and I would love to hear
00:39:42.000 | what people think.
00:39:46.000 | We'll take another pause.
00:39:47.000 | I think we're getting good questions
00:39:50.000 | in the chat around RLHF
00:39:53.000 | and other things.
00:39:55.000 | To what extent do aligned models
00:39:57.000 | actually reason about
00:39:59.000 | whether user intent is malicious
00:40:01.000 | rather than perform target detection
00:40:03.000 | to avoid unsafe topics?
00:40:06.000 | This is a question
00:40:07.000 | that I wanted to read
00:40:08.000 | because it kind of gets
00:40:09.000 | at this model versus system topic.
00:40:11.000 | So, when ChatGPT was released
00:40:13.000 | on day one,
00:40:14.000 | it has an output filter
00:40:16.000 | that does moderation.
00:40:17.000 | The language model
00:40:18.000 | that is instruction tuned
00:40:19.000 | or RLHF tuned
00:40:20.000 | generates a bunch of text,
00:40:22.000 | and then a separate model
00:40:23.000 | says yes or no,
00:40:25.000 | and that's, like,
00:40:26.000 | where it actually does detection,
00:40:28.000 | and with the release of LLAMA3,
00:40:30.000 | there's another model
00:40:31.000 | that's called, like, Llama Guard,
00:40:33.000 | and this is a classifier
00:40:34.000 | which will take this text,
00:40:36.000 | do the moderation,
00:40:37.000 | and say which type
00:40:38.000 | of unsafe topic it is.
00:40:40.000 | The actual model
00:40:41.000 | that is generating the text
00:40:42.000 | does no reasoning
00:40:43.000 | over kind of what is actually
00:40:46.000 | an unsafe topic.
00:40:50.000 | So, I'll come back to other ones.
00:40:51.000 | I'm going to do some discussions
00:40:52.000 | about RLHF right now,
00:40:56.000 | so this will kind of give
00:40:57.000 | good grounds for
00:40:58.000 | where we can continue
00:40:59.000 | some of these discussions on.
00:41:02.000 | There's ORPO or REINFORCE.
00:41:04.000 | I don't cover all of them
00:41:05.000 | in the lecture,
00:41:06.000 | but I kind of lead on
00:41:07.000 | to why we would talk about them.
00:41:13.000 | So, this chapter
00:41:14.000 | is when I started
00:41:15.000 | to get validation
00:41:16.000 | as an RL researcher
00:41:17.000 | that being opportunistic
00:41:19.000 | and going to work
00:41:20.000 | in language models
00:41:21.000 | was actually a good idea.
00:41:23.000 | For a lot of this,
00:41:24.000 | there was a lot of uncertainty
00:41:27.000 | over if people
00:41:29.000 | in this kind of open ecosystem
00:41:31.000 | were even going to be able
00:41:32.000 | to use RLHF at all
00:41:34.000 | or if being a "RLHF researcher"
00:41:36.000 | for me meant
00:41:37.000 | I was going to do
00:41:38.000 | instruction fine-tuning
00:41:39.000 | and talk about evals
00:41:40.000 | and never think about RL again.
00:41:43.000 | It turned out to be wrong.
00:41:47.000 | I'm going to review
00:41:48.000 | some RL fundamentals
00:41:49.000 | just to kind of make sure
00:41:50.000 | we're talking the same language
00:41:51.000 | as we talk about this,
00:41:52.000 | and this will lead
00:41:53.000 | into direct preference optimization.
00:41:55.000 | So, there's a reason
00:41:56.000 | why I'm doing math.
00:41:57.000 | I know this is not
00:41:58.000 | a normal lecture,
00:41:59.000 | but here is the equation
00:42:01.000 | where you'll see this
00:42:02.000 | in RLHF papers.
00:42:03.000 | This is what we're optimizing
00:42:04.000 | when we're optimizing RLHF.
00:42:07.000 | It looks kind of nebulous here.
00:42:09.000 | I'll break it down.
00:42:10.000 | So, on the left side,
00:42:11.000 | we're really maximizing
00:42:12.000 | with respect to some policy, pi,
00:42:15.000 | this reward that is parameterized
00:42:17.000 | by a network, phi,
00:42:19.000 | and we have a penalty
00:42:20.000 | that is this kind of KL term,
00:42:22.000 | which is the distance
00:42:23.000 | from our policy to some reference.
00:42:25.000 | We want to increase reward,
00:42:27.000 | but we want to constrain the model
00:42:29.000 | so that this kind of optimization
00:42:31.000 | doesn't go too far.
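Written out, the objective being described usually appears in RLHF papers in roughly this form:

```latex
\max_{\pi_\theta} \;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
\big[\, r_\phi(x, y) \,\big]
\;-\;
\beta \, \mathbb{D}_{\mathrm{KL}}\!\big[\, \pi_\theta(y \mid x) \,\|\, \pi_{\mathrm{ref}}(y \mid x) \,\big]
```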
00:42:34.000 | And the primary questions
00:42:35.000 | when doing this is,
00:42:37.000 | how do we implement
00:42:38.000 | a good reward function
00:42:39.000 | and how do we optimize the reward?
00:42:41.000 | This is a really RL-centric way
00:42:43.000 | of doing it, which is like,
00:42:44.000 | if you give me a reward function,
00:42:47.000 | I can optimize it.
00:42:48.000 | And the classic RL idea
00:42:50.000 | was I'm in an environment
00:42:51.000 | that environment has
00:42:52.000 | the reward function built in.
00:42:54.000 | In RLHF, we're designing
00:42:56.000 | our own reward function.
00:42:57.000 | So this adds a lot of weirdness
00:43:00.000 | to the actual optimization
00:43:02.000 | that we're doing.
00:43:04.000 | And what we do is
00:43:06.000 | to get this reward function,
00:43:08.000 | is we learn what is called
00:43:09.000 | a preference or reward model.
00:43:10.000 | And the most popular way to do this
00:43:12.000 | is to take a language model
00:43:16.000 | that's kind of predicting
00:43:17.000 | this separation of two preferences.
00:43:20.000 | This is called a Bradley-Terry model,
00:43:22.000 | which goes back to some economics.
00:43:24.000 | But the key idea is that
00:43:25.000 | the reward will be proportional
00:43:27.000 | to the probability
00:43:29.000 | that the text I have
00:43:30.000 | would be chosen over
00:43:31.000 | any other arbitrary text.
00:43:33.000 | That quickly sounds very theoretical,
00:43:35.000 | but it outputs a scalar,
00:43:36.000 | which is now a reward function
00:43:38.000 | and is based on this pairwise data.
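A minimal sketch of how that pairwise objective is typically implemented (the reward_model callable and tensor shapes here are assumptions for illustration, not anything shown in the talk):

```python
import torch.nn.functional as F

def bradley_terry_loss(reward_model, chosen_ids, rejected_ids):
    # Score both completions with the same reward model; one scalar per sequence.
    r_chosen = reward_model(chosen_ids)      # shape: (batch,)
    r_rejected = reward_model(rejected_ids)  # shape: (batch,)
    # Bradley-Terry: p(chosen preferred over rejected) = sigmoid(r_chosen - r_rejected).
    # Training maximizes that probability, i.e. minimizes -log sigmoid(.).
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```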
00:43:42.000 | So the idea is: with this equation,
00:43:48.000 | instead of trying to learn
00:43:50.000 | a separate preference model
00:43:51.000 | to get this R,
00:43:53.000 | what if we just use
00:43:54.000 | gradient ascent on it directly?
00:43:55.000 | This is really what
00:43:56.000 | direct preference optimization
00:43:57.000 | is doing.
00:43:58.000 | There's a bunch of math in here
00:43:59.000 | to get what this R is,
00:44:01.000 | but this was released back in May.
00:44:03.000 | So we've already moved on
00:44:04.000 | months ahead.
00:44:05.000 | This chapter starts in kind of
00:44:06.000 | late September, October.
00:44:08.000 | Back in May,
00:44:09.000 | when we're still talking
00:44:10.000 | about Open Assistant,
00:44:11.000 | this DPO paper came out.
00:44:13.000 | It's a fantastic paper.
00:44:14.000 | If you hadn't read it,
00:44:15.000 | it's a great way to learn
00:44:16.000 | about language model math.
00:44:18.000 | It's worth reading,
00:44:20.000 | but the core idea is like,
00:44:21.000 | why are we spending all this time
00:44:24.000 | learning a reward model
00:44:25.000 | when we can just derive the loss function
00:44:27.000 | and use gradient ascent directly?
00:44:30.000 | Some key ideas to think about
00:44:31.000 | with DPO is that DPO
00:44:34.000 | is extremely simple to implement.
00:45:36.000 | On the right side here
00:44:37.000 | is the example code
00:44:38.000 | from the DPO paper
00:44:40.000 | where it's like,
00:44:41.000 | as long as you have access
00:44:42.000 | to the log probs from a model,
00:44:44.000 | which is a very core thing
00:44:45.000 | for training language models,
00:44:47.000 | you can compute the DPO loss.
00:44:50.000 | Because of this,
00:44:51.000 | because the loss function
00:44:52.000 | is at a nice abstraction,
00:44:54.000 | it scales nicely
00:44:55.000 | with existing libraries.
00:44:57.000 | And what it's actually doing
00:44:59.000 | is training an implicit
00:45:00.000 | reward function.
00:45:01.000 | So the reward is a function
00:45:02.000 | of the log probs.
00:45:04.000 | I don't have the equation here
00:45:05.000 | because it quickly becomes
00:45:06.000 | a rabbit hole.
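For the curious, here is a minimal version of that loss in the spirit of the paper's snippet (the variable names and the beta default are illustrative, not the paper's exact code):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # The implicit rewards are beta-scaled log-prob ratios against the frozen reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Bradley-Terry objective on those implicit rewards: prefer chosen over rejected.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
    return loss, chosen_rewards.detach(), rejected_rewards.detach()
```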
00:45:08.000 | But whatever the whole DPO
00:45:10.000 | versus PPO debate means
00:45:12.000 | (or ORPO, I don't remember
00:45:15.000 | what that paper title is),
00:45:18.000 | We're going to see
00:45:19.000 | a lot of these things
00:45:20.000 | because it's simple
00:45:21.000 | and it scales well.
00:45:22.000 | That doesn't necessarily mean
00:45:23.000 | the fundamental limits are higher,
00:45:25.000 | but sometimes it doesn't matter
00:45:26.000 | if the limits are higher
00:45:28.000 | if it's easier to make progress
00:45:29.000 | on something.
00:45:30.000 | Because it feels better
00:45:31.000 | when progress is being made.
00:45:33.000 | So that's really a core thing
00:45:34.000 | is we'll keep seeing these models
00:45:35.000 | and we are.
00:45:37.000 | And there's this whole debate
00:45:38.000 | that has gone on,
00:45:40.000 | kind of crushing a whole bunch
00:45:41.000 | of these questions
00:45:42.000 | by redirecting them
00:45:44.000 | in a very political manner.
00:45:46.000 | But it's like,
00:46:47.000 | should we use REINFORCE?
00:45:48.000 | What about PPO?
00:45:49.000 | What about other things?
00:45:51.000 | They're very different styles
00:45:53.000 | of optimization.
00:45:54.000 | So in one half,
00:45:56.000 | we're using RL update rules,
00:45:58.000 | which is ultimately about
00:45:59.000 | learning a value function
00:46:00.000 | and then learning to update,
00:46:02.000 | taking gradient steps
00:46:03.000 | with respect to that value function.
00:46:05.000 | In DPO,
00:46:06.000 | we're taking gradient steps
00:46:07.000 | directly from the probabilities
00:46:08.000 | of the language model.
00:46:09.000 | They're very different
00:46:10.000 | optimization regimes.
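To make that contrast concrete, here is a toy sketch of the RL-style side, a REINFORCE-like step with a KL penalty; real PPO adds clipping, a learned value function, and per-token advantages, all omitted here:

```python
def rl_style_loss(logps, ref_logps, rewards, beta=0.05):
    # Approximate per-sample KL penalty against the reference model.
    kl_penalty = beta * (logps - ref_logps)
    # Shape the scalar reward with the penalty; detach so gradients only
    # flow through the REINFORCE term below.
    shaped_reward = (rewards - kl_penalty).detach()
    # REINFORCE: push up log-probs of sampled completions in proportion to reward.
    return -(shaped_reward * logps).mean()
```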
00:46:12.000 | And there's this great meme
00:46:13.000 | about all of this;
00:46:14.000 | there was a month
00:46:15.000 | where the whole NLP Twitter
00:46:17.000 | was just arguing about this,
00:46:19.000 | but both of them
00:46:21.000 | are continuing to progress
00:46:24.000 | and that is good.
00:46:25.000 | It will not just be one
00:46:26.000 | or the other.
00:46:30.000 | So what really made this debate
00:46:32.000 | kick into gear
00:46:33.000 | was this release
00:46:34.000 | of the Zephyr beta model
00:46:35.000 | from HuggingFace.
00:46:37.000 | It's after I left HuggingFace
00:46:38.000 | with the team I was on.
00:46:40.000 | And it was the first model
00:46:41.000 | to make a splash with DPO
00:46:42.000 | and it was a big step up
00:46:43.000 | in how models were perceived.
00:46:45.000 | This model was added
00:47:46.000 | to, like, the You.com search engine.
00:46:47.000 | People were using
00:46:48.000 | all sorts of crazy things.
00:46:49.000 | So it just felt really good to use.
00:46:51.000 | It was building on this
00:46:52.000 | better base model.
00:46:53.000 | Mistral had come out.
00:46:54.000 | A new data set,
00:46:55.000 | this ultra feedback data set
00:46:57.000 | that I mentioned
00:46:58.000 | is still one of the core
00:46:59.000 | data sets used today
00:47:00.000 | when we're kind of
00:47:01.000 | practicing alignment.
00:47:02.000 | This was back in September,
00:47:04.000 | October that this model came out.
00:47:06.000 | One of the core things
00:47:07.000 | to getting DPO to work
00:47:09.000 | was using really low learning rates
00:47:11.000 | like 5e-7.
00:47:13.000 | There's memes about 3e-4
00:47:15.000 | being the only learning rate
00:47:16.000 | you need to do deep learning
00:47:18.000 | and changing it being kind of a joke.
00:47:20.000 | DPO is the case where
00:47:21.000 | that is not even remotely true.
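As a rough illustration, a DPO run in that style might be configured like this (only the 5e-7 learning rate comes from the talk; every other value is an assumed placeholder, not the Zephyr recipe):

```python
# Illustrative DPO hyperparameters; only the learning rate is from the talk.
dpo_hyperparams = {
    "learning_rate": 5e-7,  # far below the classic 3e-4 deep-learning default
    "beta": 0.1,            # KL-constraint strength (assumed)
    "num_train_epochs": 1,  # preference data overfits quickly (assumed)
    "warmup_ratio": 0.1,    # assumed
}
```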
00:47:23.000 | And then you can see
00:47:24.000 | the MT bench scores again
00:47:25.000 | continuing to rise.
00:47:27.000 | So this is like a validation proof
00:47:28.000 | that DPO works
00:47:29.000 | that came four months
00:47:31.000 | after the paper was released.
00:47:33.000 | That delay is something
00:47:34.000 | that nobody expected.
00:47:35.000 | We were kind of losing hope
00:47:36.000 | on DPO at many times
00:47:38.000 | and now look at where it is.
00:47:40.000 | And then when I joined AI2,
00:47:42.000 | they were already working
00:47:43.000 | on this project
00:47:44.000 | and I had just helped
00:47:45.000 | to kind of get it across the line.
00:47:46.000 | It's like the classic advisor thing
00:47:48.000 | where sometimes it's just easier to come in at the end.
00:47:50.000 | That model, Tulu 2, is the first model to scale DPO
00:47:52.000 | to 70 billion parameters.
00:47:53.000 | The last question was,
00:47:54.000 | oh yeah, DPO works on small models.
00:47:56.000 | Will anyone ever use it
00:47:57.000 | on a big model?
00:47:58.000 | The answer is yes.
00:47:59.000 | And it's like built on the same recipe
00:48:01.000 | as Zephyr with a little bit
00:48:03.000 | different instruction tuning data
00:48:04.000 | sets, but scores continued to climb.
00:48:08.000 | This model was so close
00:48:11.000 | to beating GPT 3.5 on chatbot arena.
00:48:14.000 | It was like a couple Elo points below.
00:48:16.000 | So we didn't get the title
00:48:17.000 | of being the first ones to do that,
00:48:19.000 | but open models were starting
00:48:21.000 | to get that kind of chatty behavior
00:48:23.000 | that for so long had eluded them
00:48:25.000 | because we hadn't figured out scale
00:48:27.000 | because we hadn't figured
00:48:28.000 | out these data sets.
00:48:29.000 | So it was great progress.
00:48:30.000 | Very important and kind
00:48:32.000 | of major transition
00:48:33.000 | in my career where now it's like,
00:48:34.000 | okay, RHF methods really can work.
00:48:39.000 | And these weren't just,
00:48:40.000 | I was not the only one touching
00:48:41.000 | things that did this.
00:48:43.000 | A couple other projects
00:48:44.000 | that are really important.
00:48:45.000 | So NVIDIA had SteerLM,
00:48:47.000 | where SteerLM was collecting
00:48:50.000 | feedback data where there
00:48:51.000 | was attributes on it,
00:48:52.000 | like how helpful the message was,
00:48:54.000 | how concise the message was.
00:48:56.000 | And they did a bunch of fine tuning
00:48:57.000 | and released good, very solid models.
00:49:01.000 | And they also showed that PPO
00:49:02.000 | was better than DPO,
00:49:03.000 | which is interesting.
00:49:05.000 | And then Berkeley came out
00:49:06.000 | with this Starling-LM Alpha
00:49:08.000 | where they had a new preference
00:49:10.000 | data set, Nectar,
00:49:12.000 | which is still looked at today.
00:49:14.000 | And then they also used this kind
00:49:15.000 | of PPO method after training
00:49:17.000 | a reward model.
00:49:18.000 | And both of these came out
00:49:19.000 | about the same time,
00:49:20.000 | and they're like, huh,
00:49:21.000 | DPO isn't doing as well for us.
00:49:23.000 | The models are really good.
00:49:25.000 | Recently, the second Starling
00:49:26.000 | model came out.
00:49:28.000 | Its reward model is very strong
00:49:29.000 | in my testing.
00:49:31.000 | It's a 7b model that's almost
00:49:32.000 | reaching GPT levels in chatbot arena.
00:49:35.000 | It's crazy how fast
00:49:36.000 | these models are going.
00:49:38.000 | But we still get a lot of models
00:49:39.000 | that are trained with either PPO or DPO.
00:49:42.000 | It's really not one or the other
00:49:44.000 | at this point.
00:49:52.000 | Okay.
00:49:53.000 | I think this is a reasonable time
00:49:54.000 | for me to take a couple
00:49:55.000 | of these questions.
00:49:56.000 | I might come back to them
00:49:57.000 | in more slides.
00:49:58.000 | But someone asked,
00:50:01.000 | "Is there a particular alignment
00:50:02.000 | method that I use?"
00:50:05.000 | This is teasing a paper,
00:50:07.000 | but there was a recent paper
00:50:08.000 | that came out where --
00:50:10.000 | I don't remember the group.
00:50:11.000 | I can find it later.
00:50:13.000 | But they did what they called
00:50:14.000 | a systematic study of PPO and DPO,
00:50:17.000 | and they showed that PPO was better.
00:50:19.000 | I will say that in the experiments
00:50:21.000 | that I'm seeing at Allen AI,
00:50:22.000 | I'm also seeing PPO to be stronger,
00:50:25.000 | and we hope to release
00:50:26.000 | this stuff soon.
00:50:28.000 | It's not one crushes the other.
00:50:31.000 | It's that we're seeing
00:50:32.000 | that for some reason,
00:50:34.000 | PPO is just getting
00:50:35.000 | a bit more performance.
00:50:37.000 | And then the logical question is,
00:50:38.000 | "Why not reinforce?"
00:50:40.000 | which is another one
00:50:41.000 | of these questions.
00:50:42.000 | I would love to try it.
00:50:43.000 | It's just like we have the code
00:50:44.000 | that we have,
00:50:45.000 | and we don't want to touch things
00:50:46.000 | that are working well,
00:50:47.000 | and there's just so few people
00:50:49.000 | that are kind of working
00:50:50.000 | in this space,
00:50:51.000 | which I'm like,
00:50:52.000 | "Let's get more people
00:50:53.000 | working on these things,"
00:50:55.000 | because there's so few people
00:50:57.000 | that can answer all these questions.
00:50:58.000 | So there's another question
00:50:59.000 | that says like,
00:51:00.000 | "Some say reinforce can work
00:51:02.000 | as well as if not better than PPO."
00:51:05.000 | It probably can.
00:51:07.000 | It comes down to your infrastructure,
00:51:09.000 | carefully fine-tuning it,
00:51:10.000 | what people are excited about,
00:51:12.000 | and a lot of luck.
00:51:13.000 | So we'll see these continue
00:51:14.000 | to play out throughout the year,
00:51:17.000 | but it's complicated.
00:51:22.000 | I'll come back
00:51:23.000 | to the Lama 3 question.
00:51:24.000 | I have one slide for that
00:51:26.000 | in a little bit,
00:51:28.000 | but really this modern ecosystem
00:51:29.000 | is how investment
00:51:32.000 | in releasing open models
00:51:34.000 | that people can use
00:51:35.000 | is continuing to grow
00:51:37.000 | into 2024.
00:51:39.000 | I think there's always been
00:51:40.000 | this tenuous period of like,
00:51:42.000 | there's only a few people
00:51:43.000 | releasing these aligned models.
00:51:44.000 | There's these important people
00:51:46.000 | in the ecosystem
00:51:47.000 | that are just doing this
00:51:48.000 | because they want to,
00:51:49.000 | and it's for fun.
00:51:50.000 | They might have a day job,
00:51:51.000 | and it's like,
00:51:52.000 | how long can this go on?
00:51:53.000 | What are the limitations on this?
00:51:55.000 | But in 2024,
00:51:56.000 | we've really seen more companies
00:51:59.000 | come into the space,
00:52:00.000 | and someone drew Llama 3
00:52:02.000 | on the screen.
00:52:03.000 | I was talking to coworkers,
00:52:04.000 | and they're like,
00:52:05.000 | "Yeah, you're going to need
00:52:06.000 | to keep adding models.
00:52:07.000 | You're never going to be able
00:52:08.000 | to give this lecture."
00:52:09.000 | Yeah, it's a losing battle.
00:52:10.000 | I know, but there's just
00:52:12.000 | way more types of models.
00:52:15.000 | So I get away with not having Llama 3
00:52:17.000 | on this specific slide
00:52:18.000 | because I'm talking about
00:52:19.000 | diversity of players and models,
00:52:21.000 | not just the fact that
00:52:22.000 | there are more great models.
00:52:24.000 | So there's interesting models
00:52:25.000 | like this one Genstruct
00:53:26.000 | from Nous Research
00:52:27.000 | in the last few months
00:52:28.000 | where it's like
00:52:29.000 | a specifically fine-tuned model
00:52:31.000 | for rephrasing any text
00:52:33.000 | into instructions.
00:52:34.000 | So if you have a book,
00:52:35.000 | and you want a model
00:52:36.000 | to be able to answer
00:52:37.000 | questions about this,
00:52:38.000 | why don't we just throw it
00:52:39.000 | at this rephrasing question,
00:52:41.000 | this rephrasing model?
00:52:42.000 | And the teams that I work on
00:52:44.000 | at AI2
00:52:45.000 | are trying to release
00:52:46.000 | instruction models
00:52:47.000 | where every single thing
00:52:49.000 | that we've done to train it
00:52:50.000 | is documented and reproducible
00:52:52.000 | from data
00:52:53.000 | to what compute it was.
00:52:54.000 | There's just,
00:52:55.000 | these models are getting
00:52:56.000 | new features
00:52:57.000 | in these little ways
00:52:58.000 | other than just being
00:52:59.000 | the "best open model."
00:53:02.000 | Then there are these
00:53:04.000 | corporate entities
00:53:05.000 | that are going for
00:53:07.000 | really standing out
00:53:08.000 | in open releases.
00:53:09.000 | So there's Databricks
00:53:10.000 | DBRX model,
00:53:12.000 | Cohere's Command R+ model.
00:53:14.000 | I think people are mostly
00:53:15.000 | blindsided by Cohere
00:53:16.000 | releasing model weights,
00:53:18.000 | but it was the first open model
00:53:19.000 | to pass GPT-4
00:53:20.000 | on Chatbot Arena.
00:53:22.000 | And that has been
00:53:23.000 | a long time coming.
00:53:24.000 | I think beating GPT-4
00:53:26.000 | on a human evaluation
00:53:28.000 | is not easy.
00:53:29.000 | And yes,
00:53:30.000 | the open is still
00:53:31.000 | like a year behind,
00:53:32.000 | but that's fine.
00:53:34.000 | As long as we have
00:53:35.000 | a functioning ecosystem,
00:53:36.000 | it'll continue to grow.
00:53:38.000 | Then there's other things
00:53:39.000 | like interesting research models
00:53:41.000 | like Rho came out,
00:53:42.000 | does data weighting.
00:53:45.000 | We're finally starting
00:53:46.000 | to get multilingual models
00:53:47.000 | with AYA,
00:53:48.000 | which is also from Cohere.
00:53:50.000 | People are getting more
00:53:51.000 | mixture of expert models
00:53:52.000 | to train on,
00:53:53.000 | which is just a bit more
00:53:55.000 | of an efficient
00:53:56.000 | pre-training equation.
00:53:57.000 | State space models
00:53:58.000 | are really taking off.
00:53:59.000 | They had this moment
00:54:00.000 | in December with Mamba,
00:54:02.000 | and now it's kind of
00:54:03.000 | continuing in 2024.
00:54:05.000 | So there's just a lot going on.
00:54:06.000 | And this makes me feel good
00:54:09.000 | because it's like, okay,
00:54:10.000 | I just have to keep doing
00:54:11.000 | what I'm doing
00:54:12.000 | and encouraging people
00:54:13.000 | to participate.
00:54:14.000 | And we're going to keep
00:54:15.000 | being able to do
00:54:16.000 | this kind of fun thing
00:54:17.000 | and figuring out
00:54:18.000 | how to make models
00:54:19.000 | and share them with people.
00:54:21.000 | This is my slide for LLAMA3.
00:54:24.000 | The reason why I didn't make
00:54:25.000 | a lot of slides about this all day
00:54:27.000 | is that LLAMA3's release
00:54:29.000 | is more about scaling
00:54:31.000 | the kind of ecosystem as a whole
00:54:33.000 | than it is about alignment.
00:54:35.000 | The LLAMA2 paper
00:54:37.000 | was extremely detailed
00:54:38.000 | about alignment.
00:54:39.000 | And we're going to get
00:54:40.000 | a LLAMA3 paper soon,
00:54:41.000 | if you can believe
00:54:42.000 | multiple sources at Meta,
00:54:43.000 | which I choose to.
00:54:45.000 | And when the LLAMA3 paper comes out
00:54:47.000 | is when we will learn
00:54:48.000 | all the interesting alignment things
00:54:51.000 | that they have done.
00:54:52.000 | That being said,
00:54:53.000 | they are very unlikely
00:54:55.000 | to release the human preference data
00:54:57.000 | that they collected.
00:54:58.000 | I'm yet to succeed
00:54:59.000 | in getting them to release
00:55:00.000 | a reward model for LLAMA2
00:55:01.000 | or LLAMA3 for alignment.
00:55:03.000 | So we have more work to do
00:55:05.000 | on getting Meta
00:55:07.000 | to support this kind of
00:55:08.000 | open alignment ecosystem
00:55:10.000 | to the same extent
00:55:11.000 | that they are supporting
00:55:12.000 | the pre-training ecosystem.
00:55:14.000 | And this kind of scaling story
00:55:16.000 | that I'm saying
00:55:17.000 | very much connects
00:55:19.000 | to the previous slide
00:55:21.000 | where scaling and solving this
00:55:24.000 | is very much determined
00:55:25.000 | by the markets
00:55:26.000 | and like capital incentives.
00:55:28.000 | But so long as scaling
00:55:30.000 | is continuing to happen
00:55:32.000 | in the open ecosystem,
00:55:33.000 | it just means that more players
00:55:35.000 | are going to stick around.
00:55:36.000 | And in some ways,
00:55:37.000 | it kind of feeds back into itself
00:55:39.000 | where if this LLAMA3
00:55:41.000 | is rumored to have...
00:55:42.000 | Or they're training
00:55:43.000 | a 400 billion parameter model,
00:55:44.000 | which we're not 100% sure
00:55:46.000 | that the weights will be released,
00:55:47.000 | but it seems like
00:55:48.000 | that's Mark Zuckerberg's intent.
00:55:50.000 | And having that,
00:55:52.000 | which is about GPT-4 quality,
00:55:54.000 | really changes what you can do
00:55:57.000 | to get language models
00:55:58.000 | running in your products.
00:56:00.000 | So LLAMA3
00:56:01.000 | and how many people are playing
00:56:02.000 | in the open space right now
00:56:04.000 | goes to show that we have
00:56:05.000 | more of the same coming,
00:56:07.000 | which is interesting models
00:56:08.000 | coming on a weekly basis.
00:56:10.000 | And most people
00:56:11.000 | are just kind of accommodated
00:56:12.000 | to it now.
00:56:13.000 | People don't freak out
00:56:14.000 | when there's a new...
00:56:15.000 | They like Mistral's model
00:56:16.000 | because there's a magnet link
00:56:17.000 | and it's funny,
00:56:18.000 | but we're used to it.
00:56:20.000 | And I still expect that
00:56:21.000 | to be the case
00:56:22.000 | for the next year or two
00:56:24.000 | with this pace just kind of
00:56:25.000 | being how it is.
00:56:27.000 | And it's really fun to follow.
00:56:29.000 | And I just think that it's like
00:56:31.000 | not a time to be worried
00:56:32.000 | about being scooped,
00:56:34.000 | but to just kind of keep figuring out
00:56:36.000 | where you can contribute,
00:56:37.000 | whether it's on evaluation
00:56:39.000 | or some of these other
00:56:40.000 | alignment methods
00:56:41.000 | that people have talked about.
00:56:44.000 | So I have a quick thing
00:56:45.000 | on kind of current directions,
00:56:47.000 | which is where I'll come back
00:56:48.000 | to some of these data things
00:56:49.000 | that I mentioned multiple times,
00:56:51.000 | and then we can get to questions.
00:56:56.000 | The thing that people want to know
00:56:57.000 | a lot is are open models
00:56:59.000 | going to catch up to closed models?
00:57:01.000 | My answer is probably not
00:57:03.000 | ever completely.
00:57:05.000 | There will be some friction
00:57:06.000 | in the system, a time delay
00:57:08.000 | by which open models trail closed ones.
00:57:10.000 | And open model weights
00:57:12.000 | are not inherently unsafe.
00:57:14.000 | The open versus closed debate
00:57:16.000 | has mostly converged around this.
00:57:18.000 | But given the territory
00:57:19.000 | that we're going with in AI,
00:57:21.000 | where we're uncovering
00:57:22.000 | new capabilities we've never seen,
00:57:24.000 | I think it's okay
00:57:25.000 | that if there's a few months wait
00:57:26.000 | before you have open weights
00:57:28.000 | so you can run on your laptop
00:57:29.000 | as we're discovering what AI can do.
00:57:33.000 | If you look at Maxime's plot
00:57:35.000 | with trend lines on it,
00:57:37.000 | it shows that open models
00:57:38.000 | are getting closer,
00:57:39.000 | but we're not really sure
00:57:40.000 | if open models will stay closer
00:57:42.000 | on chatbot arena in the long term.
00:57:45.000 | There will always be
00:57:46.000 | an open and closed category
00:57:47.000 | because there is demand
00:57:48.000 | to have models that are tuned
00:57:50.000 | to what you want them to do.
00:57:52.000 | So this kind of leans
00:57:53.000 | into my current directions.
00:57:55.000 | Data is the biggest limitation
00:57:56.000 | to alignment,
00:57:57.000 | which is we have
00:57:58.000 | like two or three data sets
00:57:59.000 | that are driving all the research
00:58:01.000 | in open alignment.
00:58:02.000 | Anthropic's HH dataset,
00:58:04.000 | which my friend Deep
00:58:05.000 | got uploaded back in 2022,
00:58:09.000 | I think.
00:58:10.000 | Ultrafeedback from OpenBMB
00:58:12.000 | and Nectar from Berkeley Next
00:58:14.000 | slash Nexus Flow
00:58:15.000 | with the Starling models
00:58:16.000 | are what most people
00:58:17.000 | are focusing on.
00:58:18.000 | We need more,
00:58:19.000 | particularly if humans wrote it,
00:58:21.000 | to add more diversity
00:58:22.000 | to our models and more robustness.
00:58:25.000 | DPO is continuing
00:58:27.000 | in an academic sense.
00:58:28.000 | There is a comedy of papers
00:58:30.000 | extending DPO.
00:58:32.000 | So this is odds ratio
00:58:34.000 | preference optimization,
00:58:35.000 | which doesn't need
00:58:36.000 | a reference model.
00:58:37.000 | Constrained DPO,
00:58:39.000 | identity preference optimization.
00:58:42.000 | I don't remember what BCO is.
00:58:44.000 | And then I can't pronounce
00:58:46.000 | the full KTO name,
00:58:47.000 | but it's Kahneman-Tversky optimization
00:58:50.000 | from Contextual AI and Stanford.
00:58:52.000 | DNO, SDPO,
00:58:54.000 | which is like sequential DPO
00:58:56.000 | and self-reward.
00:58:57.000 | There are so many,
00:58:58.000 | and that's good.
00:58:59.000 | And that trend will continue.
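As one example of the flavor of these variants, here is a rough sketch of the ORPO idea, which drops the reference model by adding an odds-ratio penalty on top of the normal SFT loss (my reading of the method, with length-averaged log-probs assumed; not the authors' code):

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, chosen_nll, lam=0.1):
    # log-odds of a sequence: log(p / (1 - p)), from its length-averaged log-prob.
    def log_odds(logp):
        return logp - torch.log1p(-torch.exp(logp))
    # Prefer the chosen response by pushing its odds above the rejected one's.
    odds_ratio_term = -F.logsigmoid(log_odds(chosen_logps) - log_odds(rejected_logps)).mean()
    # Total: the usual SFT (negative log-likelihood) loss plus the weighted penalty.
    return chosen_nll + lam * odds_ratio_term
```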
00:59:01.000 | And at the same time,
00:59:02.000 | we're seeing more model sizes.
00:59:04.000 | Most alignment happened
00:59:05.000 | at the 7 or 13B scale.
00:59:07.000 | I think there's a large drive
00:59:09.000 | to make smaller models aligned.
00:59:11.000 | Google is releasing
00:59:12.000 | 1 billion parameter models,
00:59:14.000 | but it's also an opportunity
00:59:15.000 | where there aren't that many people
00:59:17.000 | playing in the space.
00:59:18.000 | But it's something
00:59:19.000 | that a lot of people want
00:59:20.000 | just because to run
00:59:21.000 | these models locally,
00:59:22.000 | making them smaller
00:59:23.000 | makes it way easier.
00:59:25.000 | And then kind of running back
00:59:26.000 | to two themes throughout
00:59:27.000 | this lecture is
00:59:28.000 | what are specific evaluations?
00:59:30.000 | That we should be building
00:59:32.000 | and how do we personalize these models?
00:59:34.000 | They kind of go hand in hand.
00:59:36.000 | These are the things
00:59:37.000 | that I'm thinking about.
00:59:38.000 | I welcome feedback from them.
00:59:41.000 | I kind of identified some people
00:59:43.000 | that I'm following
00:59:44.000 | to see where new models come out.
00:59:46.000 | So I try to release models at AI2.
00:59:49.000 | Hugging face quickly turns around
00:59:52.000 | new aligned models
00:59:53.000 | under the Zephyr brand.
00:59:55.000 | These kind of Berkeley NEST
00:59:57.000 | and Nexusflow people
00:59:58.000 | building data sets
00:59:59.000 | and Starling models.
01:00:00.000 | Nous Research is a kind of --
01:00:02.000 | they started as just a guy.
01:00:04.000 | Teknium was fine-tuning models
01:00:05.000 | and now it's a company
01:00:06.000 | for fine-tuning models.
01:00:08.000 | OpenBMB in China
01:00:10.000 | has been doing a lot
01:00:11.000 | of preference data sets.
01:00:12.000 | They've recently released
01:00:13.000 | some data sets called UltraInteract,
01:00:15.000 | which is some math preference data
01:00:17.000 | for doing RLHF and fine-tuning.
01:00:20.000 | Argilla is a startup
01:00:22.000 | around building tools
01:00:23.000 | to annotate data.
01:00:25.000 | It's focused on preference data.
01:00:26.000 | And there's even just individuals
01:00:28.000 | that are driving this narrative.
01:00:30.000 | So Maxime and John,
01:00:32.000 | there's just a lot of people.
01:00:33.000 | Model merging is something
01:00:34.000 | I didn't talk about.
01:00:35.000 | But it's kind of like DPO,
01:00:37.000 | but taking it even farther,
01:00:38.000 | where model merging is so accessible,
01:00:41.000 | you don't need a GPU
01:00:43.000 | to merge models.
01:00:44.000 | It's a for loop.
01:00:45.000 | So people are going to try it
01:00:46.000 | and there's going to be iteration on it.
01:00:48.000 | So in this alignment space,
01:00:50.000 | never bet against people
01:00:52.000 | where they can just try things
01:00:53.000 | and see what's better.
01:00:55.000 | Excuse me.
01:00:56.000 | And then eventually learn.
01:00:57.000 | That's what model merging is.
01:00:58.000 | And it's going to be here to stay.
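That "for loop" is roughly the following (a minimal linear-merge sketch; real tools such as mergekit add fancier schemes like SLERP or TIES, and the paths in the usage comment are placeholders):

```python
import torch

def merge_linear(state_dict_a, state_dict_b, alpha=0.5):
    # Element-wise interpolation of two checkpoints with identical architectures.
    merged = {}
    for name, weight_a in state_dict_a.items():
        merged[name] = alpha * weight_a + (1 - alpha) * state_dict_b[name]
    return merged

# Usage sketch (paths are placeholders):
# sd_a = torch.load("model_a.bin", map_location="cpu")
# sd_b = torch.load("model_b.bin", map_location="cpu")
# torch.save(merge_linear(sd_a, sd_b, alpha=0.5), "merged.bin")
```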
01:01:01.000 | So thanks for listening.
01:01:04.000 | Happy to take questions.
01:01:05.000 | And thanks to my many teammates
01:01:07.000 | at Hugging Face and AI2
01:01:08.000 | that make it look like
01:01:10.000 | I did so many of these things.
01:01:11.000 | But there's a lot of great contributors
01:01:13.000 | that underlie this.
01:01:15.000 | So I'll kind of slow down
01:01:17.000 | and drink some water
01:01:18.000 | and answer some questions.
01:01:19.000 | But thanks for coming again.
01:01:23.000 | Yeah, so the top question by score
01:01:25.000 | (please rate them,
01:01:27.000 | because it's easier for me to see them)
01:01:29.000 | was about odds ratio preference optimization, ORPO.
01:01:31.000 | I think it being agnostic to the method
01:01:33.000 | is the best thing,
01:01:34.000 | but you probably need to be good at engineering
01:01:36.000 | to get really good at one method
01:01:37.000 | to get a specific model out.
01:01:43.000 | And kind of getting these deliverables
01:01:44.000 | is important to getting recognition.
01:01:48.000 | I don't know if people can talk via microphone,
01:01:50.000 | which is a much more natural experience,
01:01:52.000 | but I'm just going to keep talking to myself.
01:01:58.000 | There's a question around the future of alignment,
01:02:00.000 | given simple methods can circumvent fine-tuning.
01:02:03.000 | I think that the future of alignment is,
01:02:06.000 | like safety is not the only thing that matters.
01:02:08.000 | There's a lot of promise
01:02:10.000 | showing that alignment helps
01:02:11.000 | with how much people like the model.
01:02:13.000 | So how much RLHF improves the user experience
01:02:16.000 | and how much it improves code and math abilities.
01:02:19.000 | So like while everyone hates Q*,
01:02:21.000 | like Q* has some things to guide towards,
01:02:24.000 | which are using synthetic data and RL search
01:02:26.000 | and stuff to improve the raw capabilities,
01:02:30.000 | rather than just talking about safety.
01:02:36.000 | Okay, onwards.
01:02:44.000 | Yeah, people are asking about the fact
01:02:46.000 | that Llama3 uses...
01:02:48.000 | Llama3 said that they use instruction fine-tuning,
01:02:51.000 | rejection sampling, DPO and PPO
01:02:54.000 | for their aligned models,
01:02:56.000 | which I was like,
01:02:57.000 | I don't know how they're using all of these things,
01:02:59.000 | but I think they're shifting the abilities incrementally
01:03:01.000 | to provide nice initialization for the next method
01:03:05.000 | and to keep being able to use new human data
01:03:07.000 | and make the metrics go up.
01:03:09.000 | I think over time, that will become simpler.
01:03:12.000 | In the future,
01:03:13.000 | Meta will not have this convoluted
01:03:14.000 | five-stage multi-method process,
01:03:17.000 | and we'll figure out a way
01:03:18.000 | to distill that to one algorithm.
01:03:22.000 | Pitfalls of synthetic data are repetitiveness
01:03:25.000 | and a distribution that isn't robust.
01:03:27.000 | So most of the synthetic data sets out there
01:03:32.000 | have very similar things in them,
01:03:38.000 | which means the models are going to generalize less well
01:03:40.000 | and probably get less of a boost from alignment training
01:03:43.000 | if there's not this kind of general improvement
01:03:46.000 | to capabilities.
01:03:48.000 | So we want to take some in-person questions.
01:03:51.000 | Oh, yeah, that's much better.
01:03:54.000 | Does anyone have some in-person questions to ask Nathan?
01:04:00.000 | Okay, yeah.
01:04:05.000 | Hi, thank you so much for the talk.
01:04:08.000 | What do you think are the greatest hot spots
01:04:10.000 | of research or work
01:04:12.000 | in terms of personalized language models,
01:04:14.000 | and where do you see them having the most impact?
01:04:20.000 | This is one of the things that I'm excited about,
01:04:22.000 | the local LLM community.
01:04:24.000 | Like, I'm not particularly ideologically aligned
01:04:28.000 | with like the effective accelerationist stuff,
01:04:31.000 | but I do think that they have a lot of drive
01:04:33.000 | to create a language model that they like to use,
01:04:36.000 | so that therefore there's going to be things
01:04:38.000 | we learn from them.
01:04:40.000 | And it's kind of a classic,
01:04:42.000 | like how to integrate multiple communities.
01:04:44.000 | So it's like academics aren't used to looking there,
01:04:46.000 | but I'm sure there's a lot to learn there.
01:04:49.000 | Yeah.
01:04:50.000 | I guess there were multiple questions
01:04:51.000 | about advice for the field,
01:04:53.000 | whether it's like grad school or...
01:04:56.000 | I'll give my advice with the caveat
01:04:58.000 | that you should be very wary
01:05:01.000 | of listening to people's advice
01:05:02.000 | because it's based on their situation.
01:05:04.000 | But I think that the most important thing
01:05:06.000 | you can do when the field is crazy
01:05:07.000 | is just keep trying to develop skills
01:05:09.000 | and keep trying to build something
01:05:11.000 | that you think matters.
01:05:12.000 | 'Cause it's like, at the end of the day,
01:05:14.000 | that's what you're making progress on,
01:05:15.000 | and you'll never be able to keep track of everything,
01:05:18.000 | and that's okay,
01:05:19.000 | and I can't keep track of everything,
01:05:21.000 | and I'm still trying to train models
01:05:23.000 | and build data sets.
01:05:24.000 | So it's just like grad school
01:05:25.000 | is about learning to do research,
01:05:27.000 | and that still has value,
01:05:29.000 | but industry is also fun if you want to do a startup.
01:05:32.000 | So there's not like...
01:05:33.000 | You just have to think about what you want to do.
01:05:36.000 | - I think someone sent me...
01:05:39.000 | You can hear me, right?
01:05:40.000 | - Yeah.
01:05:41.000 | - Someone sent me a question through Zoom.
01:05:44.000 | A quick question.
01:05:45.000 | You indicated that making LoRA methods
01:05:47.000 | work with reinforcement learning is tricky.
01:05:49.000 | Do you think LoRA methods work well
01:05:51.000 | with DPO or its variants?
01:05:54.000 | - I haven't seen it be particularly successful,
01:05:57.000 | so that's my general rule of thumb
01:05:59.000 | is I really wait to go deep into a method
01:06:02.000 | until there's been a model release
01:06:04.000 | that's in the relevant ballpark
01:06:07.000 | with that method.
01:06:10.000 | So the fact that it's been around for so long
01:06:12.000 | and hasn't happened could be a blind spot,
01:06:15.000 | but I think that there's some weirdness
01:06:17.000 | that's preventing it from happening.
01:06:21.000 | - Great. Okay, another one.
01:06:24.000 | Thank you for the talk.
01:06:25.000 | You mentioned GPT-4 being used as an evaluation method,
01:06:28.000 | but it causes data contamination.
01:06:31.000 | What are some ways to mitigate this?
01:06:36.000 | - Oh, man. Yeah.
01:06:37.000 | I mean, this is why it's nice to have human evaluation,
01:06:40.000 | but I don't know if I have an answer.
01:06:42.000 | At this point, I'm kind of fried
01:06:44.000 | from reading LLAMA-3 stuff and giving this lecture,
01:06:47.000 | but that's the fundamental problem
01:06:49.000 | is how to disambiguate various biases in evaluation
01:06:54.000 | and still get the signal out of them.
01:06:58.000 | - Right. Okay. One more.
01:07:01.000 | Give me a second.
01:07:04.000 | For stuff like LLAMA-3 training on so many tokens,
01:07:07.000 | like 15 trillion,
01:07:08.000 | would that actually make it harder to align this model
01:07:11.000 | without losing some capabilities
01:07:14.000 | learned from this over-training?
01:07:18.000 | - It's not technically over-trained,
01:07:20.000 | but every model will have kind of a different point
01:07:24.000 | by which they're released,
01:07:26.000 | so that's why you'll need a different learning rate
01:07:28.000 | in batch size and data sets for models,
01:07:30.000 | so you will need a different kind of way of continuing it,
01:07:35.000 | but that is a common confusion on how --
01:07:40.000 | I mean, I don't even have an intuition for it,
01:07:42.000 | just to know that I have bought this thing in the past
01:07:46.000 | and been proven wrong about it,
01:07:47.000 | but it's not that it's over-trained
01:07:50.000 | or harder to fine-tune.
01:07:51.000 | It's just that there's more information into the model,
01:07:53.000 | and as you continue to do this,
01:07:55.000 | the model can keep learning.
01:07:57.000 | It just takes more and more data to get marginal improvements,
01:08:00.000 | so Meta is willing to invest more money into the model
01:08:03.000 | to make it just a bit better,
01:08:05.000 | but that should only help.
01:08:06.000 | That shouldn't hurt.
01:08:09.000 | - Right, great.
01:08:10.000 | Here's another one.
01:08:11.000 | Do you think synthetic data generation, like Cosmopedia,
01:08:15.000 | is the way to go for making controlled
01:08:17.000 | or trusted domain-specific models?
01:08:22.000 | - I think it'll be very good.
01:08:23.000 | I also think it's a good way to get around the fact
01:08:25.000 | that Google is paying Reddit $60 million a year
01:08:31.000 | to use their data so that we can no longer train
01:08:33.000 | on the newest Reddit data.
01:08:35.000 | I think that Cosmopedia and synthetic data sets
01:08:39.000 | at a large scale can be a way around this,
01:08:41.000 | and there are rumors that industry is doing something similar.
01:08:51.000 | - Give me a second.
01:08:52.000 | I think there's one that I missed.
01:09:01.000 | Could you please share some insights
01:09:02.000 | on why you are finding PPO better than DPO?
01:09:07.000 | - It's mostly like it ends up extracting more from the data,
01:09:12.000 | so it's like the benchmarks end up being a little bit better
01:09:15.000 | if we get it set up correctly with the same starting point.
01:09:20.000 | It's like you choose a set of evaluations that you care about
01:09:23.000 | and you look at them, and through fine-tuning,
01:09:26.000 | it's primarily a group of great grad students doing this.
01:09:29.000 | It's just running a ton of models and trainings,
01:09:32.000 | and they're seeing that PPO reliably can be doing
01:09:34.000 | a little bit better, and it's like this is the fine margins
01:09:38.000 | that a lot of AI works on nowadays.
01:09:43.000 | - Great.
01:09:46.000 | Do you foresee a better evaluation method to be determined
01:09:50.000 | by a stronger or more specialized model,
01:09:54.000 | which means rule-based metrics are dead forever?
01:10:01.000 | - Maybe.
01:10:02.000 | I try not to say no to things.
01:10:03.000 | This is becoming philosophical, which is like I'm trying
01:10:05.000 | not to say no to things in the language model space
01:10:08.000 | with how fast things are progressing.
01:10:10.000 | It's like I should try not to bet against progress continuing.
01:10:15.000 | This goes for pre-training and alignment,
01:10:17.000 | and at multiple stages in the last few months
01:10:21.000 | that mindset has come to benefit me.
01:10:22.000 | So it's like if you just assume that things will get better
01:10:24.000 | and they will work, it's like just makes it a little bit easier
01:10:28.000 | to wrap your head around things.
01:10:34.000 | - One last one here from -- give me a sec.
01:10:41.000 | At its core, an LLM is trying to approximate
01:10:43.000 | a complex distribution.
01:10:45.000 | Would you say that alignment is the process
01:10:47.000 | of squashing specific parts of this distribution
01:10:50.000 | according to what humans prefer?
01:10:58.000 | - Yeah, I think that's phrased generally enough
01:11:00.000 | that I could get behind it.
01:11:01.000 | It is.
01:11:02.000 | Alignment is about changing the distribution,
01:11:05.000 | and it can be multiple tokens.
01:11:08.000 | It's like a multi-turn prediction.
01:11:10.000 | RL is not just autoregressive.
01:11:12.000 | It can be these kind of multi-string different things
01:11:16.000 | that are getting shifted around,
01:11:17.000 | and it's a really different loss function.
01:11:23.000 | - Here's one from -- how do you envision the usage
01:11:26.000 | of watermarks for both open and closed language models?
01:11:32.000 | - I think it a lot of times feels like a losing battle.
01:11:35.000 | I think that a practical solution in the future
01:11:38.000 | is that if you want to prove something that is human-made,
01:11:42.000 | you can prove that it was generated by a human
01:11:44.000 | by having a certain tool rather than trying to understand
01:11:48.000 | if a specific content was made by an AI.
01:11:52.000 | So the assumption will be that all content was made
01:11:54.000 | by an AI unless proven to be human.
01:11:57.000 | It's not what I would consider a sociologically good answer.
01:12:02.000 | It just seems like a practical one.
01:12:05.000 | - Makes sense.
01:12:07.000 | I think we have a few more minutes,
01:12:09.000 | so if anybody has any last-minute questions,
01:12:11.000 | feel free to send them over to me on the Zoom chat.
01:12:16.000 | - Yeah, that was much better than me half-reading the question.
01:12:22.000 | - All right, here's one.
01:12:24.000 | What are your thoughts on different optimization functions
01:12:27.000 | to train large-language models rather than using MLE?
01:12:31.000 | What could be good research directions there?
01:12:37.000 | - I think this is the whole idea of what RLHF represents.
01:12:41.000 | And that's why, like, if you ask people who have been in NLP longer,
01:12:44.000 | one of the most compelling arguments for RLHF for me
01:12:47.000 | is, like, you now have extreme flexibility on the loss function
01:12:51.000 | while we were kind of limited on what autoregressive losses could do.
01:12:54.000 | So there's kind of arguments that it's, like, why is there any limit
01:12:57.000 | if we could just keep doing more and more tokens of RL training?
01:13:00.000 | It's a really, like, general framing, but, like, RL's loss function,
01:13:05.000 | you make it so that the training of a language model
01:13:07.000 | can incorporate many different things, and that's very exciting.
01:13:11.000 | That could be, like, the 10-year goal of RLHF.
01:13:16.000 | - To what extent is training on adversarial data effective
01:13:21.000 | for defending against crescendo and other simple multi-turn attacks?
01:13:27.000 | - I haven't spent as much time on safety as I would want to,
01:13:29.000 | but I think that it's, like, it'll be this everlasting dance
01:13:33.000 | where if you have example data, you can defend against it,
01:13:35.000 | but it will not be impossible to generate new data.
01:13:38.000 | So it mostly comes down to the use case that you're looking at protecting.
01:13:42.000 | So if you want to protect something really important,
01:13:44.000 | you need to have layers on that that are not just sensitive
01:13:47.000 | to a new prompting technique, but, like, limit what the model can do.
01:13:50.000 | That's kind of--it's, like, a use-focused theme,
01:13:53.000 | while the kind of whole, like, security is a very complicated thing otherwise.
01:14:01.000 | - Here's one on quantization.
01:14:06.000 | Do you see potential in quantization methods such as BitNet, like 1.58 bit?
01:14:12.000 | If so, do you think BitNet will become popular?
01:14:17.000 | - I have no idea. I wouldn't--this is what I mean.
01:14:20.000 | It's like, okay, sounds cool. Wouldn't rule it out.
01:14:27.000 | - You think there's a need or a way to control large-scale data extraction
01:14:31.000 | from large language models like Cosmopedia?
01:14:38.000 | - I do think there's a lot of wills and a lot of ways to explore
01:14:41.000 | making the synthetic data better. I think it's very early.
01:14:44.000 | I have a project that's going on it, and it is one of the few ways
01:14:48.000 | that can generate more tokens, which is, like--
01:14:51.000 | like, people are actually running out of tokens,
01:14:53.000 | especially if you try not to train on things that you're not supposed to train on.
01:14:56.000 | It's, like, then you can just generate more data,
01:14:59.000 | and as we've seen with LLAMA, if you have the compute, more data will help you.
01:15:08.000 | - Let's see. Self-play-like things.
01:15:12.000 | Any chance you can kind of expand upon or share your opinions
01:15:15.000 | on self-play-like things like OpenAI super alignment work?
01:15:22.000 | - I think people will keep using language models in the loop of training
01:15:25.000 | other language models, but it's a kind of broad field
01:15:29.000 | that doesn't have full agreement on how to do it.
01:15:37.000 | - Okay, great. And I think we're pretty much out of time,
01:15:39.000 | so if folks want to get in touch or have more questions,
01:15:43.000 | can they email you? - Yeah.
01:15:46.000 | - Okay, great. But, yeah, thanks so much again for taking the time
01:15:50.000 | and giving us such a great talk. So, yeah, give it up for Nathan.
01:15:56.000 | - Thanks, everyone. - And I think the slides,
01:15:57.000 | as well as the Hugging Face collection, are all posted on our website
01:16:00.000 | as well as Discord, so in case anybody wants to follow along.
01:16:09.000 | - Sounds good. Thanks a lot for having me.
01:16:11.000 | - Yeah, no worries. Thanks, everyone. - See everyone soon.
01:16:14.000 | - Bye-bye.
01:16:15.000 | - Bye-bye.
01:16:16.000 | [BLANK_AUDIO]