Stanford CS25: V5 I RL as a Co-Design of Product and Research, Karina Nguyen

Now, I'm delighted today to have Karina who is a director of CS25. Welcome to the second lecture of our current offering of CS25, Transformers United. So this will be the first lecture featuring an external speaker. So the rest of the course, we'll have folks come in and talk about, you know, the state of the art and cool research they're doing as part of their work.

So we have an exciting lineup of speakers for you guys for the rest of the quarter. So I'm delighted today to have Karina from OpenAI. So she works on both product and research there and also previously worked at Anthropic. So I'll let her take it from here. Can everybody hear?

Cool. Let's see. So before I start the stock, I would love to set a tone for I'm not here to lecture you, but I would love to have much more collaborative and interactive session. And I do hear I've heard a lot of, you know, people kind of concerning about AI and the future of AI.

And I understand that the development of a GI is both scary and exhilarating. And if anything, I would love to get for you to get out from the stock is the fact that I think everybody can have like very meaningful future and everybody can like build something really, really cool with an AI and as a part of this journey.

Cool. So before I started with a talk, there'll be all about how ARAO is a co-design of product and research. And, you know, we are shifting both in the labs, we're shifting the focus towards more of a frontier product research, where the tasks that we teach the models are becoming much more real world tasks.

So and before we get started, I'll just like share some of the vignettes of like what I'm really, really excited about the future of AI and what the AI is capable of right now that can inspire everybody. So the first one is actually education. I'm actually really, really excited about the fact that AI can democratize education.

So here's an example of me asking ChatGPT to explain the Gaussian distribution. And then with, and in Canvas, it creates, you know, the code itself in order to visualize for me. So it explains in text, and then I ask it to write in code, and then you can render the code in order to visualize it.

So in a way, it becomes much more, a little bit personalized. The second demo is also something that a lot of people might want to use it. So let's, let's see if you take a screenshot from the paper and you want to understand what's going on. And the model can go and explain in a separate canvas, that's like another feature in ChatGPT.

And then, then you can start to have much more interactive conversation with the model by selecting very, certain like things and try to like ask more questions, you know, follow up questions. So in a way, when ChatGPT came out to be first in 2022, it was purely conversational UI.

It was mostly chat. And as the use cases kind of grown from the product itself, you know, there was no expectation of how people would use ChatGPT at that time. But over time, you've seen a lot of people started using it for code generation, for writing and long form writing.

It became very obvious that the interface of Chat was kind of very limited. very limited. So, um, canvas is kind of like the first attempt, uh, of, um, of our team to break out from the cycle in order to allow people to have much more fine-grained collaboration with AI.

So, um, and actually today, uh, Antarctic published a small report, uh, around the education and how people use, uh, Claude for education. And it's very interesting to see the correlation between the U.S. bachelor's use and the way people use, um, um, yeah, use cases. So you can see there's a huge difference between, like, computer science there.

Um, another thing that I'm really, really excited about the AIs is that anyone can create their own tools for themselves, for their friends, for their family, or even run their business. And, um, you know, the models now can generate a lot of front-end code, and then you can render that code inside the canvas and then iterate on that and have much more visual representation of things.

Um, you know, on Twitter, I've seen a lot of people creating, like, very, very personalized and customized tools that they really, really want, even, like, games like chess. And with recent image and, uh, image generation model from, um, open AI, you can literally sketch anything, um, on, with your own hands and then recreate or kind of realize what you're dreaming of, uh, in, within, in image and in the style that you really, really want.

So, um, I do really hope that the creativity, the human creativity, and how AI tools can help anyone to become creatives or be an artist, um, in a way that was not possible before. Another thing that I've tried actually on the mobile is to create a mini game, um, and you can easily do that in Canvas.

So, I asked to just, like, generate me, you know, a React app of an interesting game because I'm the plane, and then the model does that. And I do hope that in the future, instead of me prompting it to generate a game, it will kind of, like, be more proactive and have this kind of personalization, um, as AI become more of a companion to humans.

Um, and, uh, an exciting thing, um, about CHPT is that the compositionality of different capabilities that can allow you to augment, like, human creativity. So, here I can, I asked, um, to generate an image of the UI and then, uh, sorry, um, generate an image of the UI and then ask the model to kind of implement this.

And you can do that literally in Canvas right now and, like, it can render, it's not, um, it's just a front-end, it doesn't have full kind of stack engineering. Um, but, you know, you can do some of the things like this, you can compose different tools and compose in a way that was never possible before.

So, I do hope that ideas like this will be very, very powerful and everybody in this room can play with those things. Um, and actually, how did we get there? And I think there are, like, two main scaling paradigms that helped us to get there. The first one is next token prediction, obviously, if we use a pre-training scaling paradigm, um, where the model kind of becomes, like, a world-building machine understanding the world in a much better scale.

Um, you know, but it becomes, the next token prediction, uh, works really, really good at certain tasks, but it becomes harder at tasks like writing. If the model predicts a wrong next token, it can kind of, the coherence of the plot will just be lost in the pertaining stage.

And maybe you kind of want to recover this in reinforcement learning. And the next second paradigm that has happened, um, is, um, RL on a chain of thought for more complex, uh, tasks. And this is the reasoning work from OpenAI, and now it's being adopted by a lot of different labs.

And, um, I do think this is scaling RL in itself is another paradigm that we can train the models on the axis, um, to, and especially for the real world's tasks that was never possible before. So, all the agentic work, um, agents like Operator, Deep Research, um, other agents are kind of, like, trained on this kind of new paradigm overall on the chain of thoughts.

Um, so, yeah, I, I, I want to, like, uh, finish this section of, like, vignette of, like, yeah, build, create, build, and make something wonderful in this world. And I hope, like, people be more inspired rather than scared that AI is going to take their jobs or kind of remove their creativity.

Instead, I feel like people can become more powerful with, with their imaginations with these tools. So, I currently work at OpenAI, before that I worked at Anthropic, and, um, I was kind of, like, at the intersection of product and research. When I first came to Anthropic, I was a product engineer.

And then, over time, I switched to research engineering. So, my background is mostly both in product and research now. And what I've learned over and over across different projects is that there are two kind of main ways of building research-driven products. And the first one, um, is when you have unfamiliar capability of the model, your job might be to create familiar form factor for that unfamiliar capability.

And, in examples of that could be, um, you know, ChatGPT was like this. Or, a hundred contacts from Quad was like this. Um, I'll share, let's dive deeper into this. Actually, the early prototypes that I've done before joining AnyLab was actually working with Clip, um, which is a contrastive model.

And, um, it's, like, image, um, text-to-image, basically. And, um, I fine-tuned Clip a little bit on the images that I was excited about. And I think this is the prototype of, you know, fashion search. Um, I think some people, you know, people kind of, like, it went viral on Twitter.

And I feel like that's because people really found some usefulness if you bring some, like, Clip technology into some form factor that people will like. And so, um, I think a lot of the product work is around like this. It's like, if the model has a certain capability that was never possible before, how do we create this new, um, form factor?

The same happened with a hundred key context. When Claude, um, started being able to, you know, consume the entire books, um, you can imagine to have various form factors. File uploads is just, like, very general and something really familiar to people that can just, like, dump the entire document into Claude and ask follow-up questions.

But you can imagine other form factors for this, like infinite chats. And in a way it's, like, infinite memory, um, as a way to kind of, like, take this a hundred key context into some form factor. So you can, like, exercise this product thinking by, um, thinking of, like, novel ways that people can interact, um, with this novel technology.

Um, another example, this is more speculative, it wasn't deployed anywhere, um, is if the model has kind of a sense of self-calibration, uh, uh, or, um, it's also called P, I know. If the model knows the confidence of the answer that it knows. So, for example, if I'm really, really confident in this claim and then my confidence may be, uh, 85%, then maybe there is a way for the interface to highlight that.

Um, maybe the, um, more highlighted versions is much more confident, um, kind of, much more confident, uh, claims and less highlighted as, um, kind of less, less, uh, uh, confidence. So, in a way you can imagine if the model, yeah, you can, like, think of, okay, if, if you train the model to be really good at, like, self-calibration, how do we then represent that to the humans?

And if, if anything, will it be useful, um, for humans? Same happened with, uh, when we were trying to deploy a one preview, um, the chain of thought itself is a very alien thing. And, um, a lot of people were, okay, how do you bring along humans with models thinking, right?

If the model, if, if the human needs to wait for chain of thought for, like, two minutes or five minutes, that's kind of boring. And, um, people don't, I think it's just like more of a, how would people perceive the model thoughts? Um, and one thing that we did, uh, is actually we wanted to create the streaming, uh, kind of interaction.

Um, so the model would always stream its thoughts ephemeral, um, and we trained the model to do that. Um, so, in a way there was chain of thought as an alien technology, as an alien artifact of the model, and then trying to figure out, like, how do we best bring to the humans, um, is another way of thinking of, uh, building product.

Okay, so another way, the second way of building research, um, during products is actually start with a deep belief in what you want to make, um, with either from a product perspective or, um, kind of like vision. Um, and literally make the model do that. Okay, so I feel like this is more of a common thing.

Um, so we can go through some examples. Um, before Antarctic, I worked at the New York Times, and, uh, in a lot of ways, we were thinking about how to represent information, uh, to people. And, like, how do we add the layer of context to the product and coverage?

And at that time, we were all working on, like, elections. And, uh, we only had tools like NLP. But you can imagine this idea, this concept could have extended, given the current tools of AI, to have much more dynamic, um, representation or dynamic UIs that people could, um, consume the content better.

Same when I was working on this, um, product, uh, the command line, the terminal, the new terminal, uh, I think it would be much, much richer if you could have integrated auto-completion or some of the benefits of GPT-3 at that time, um, into the product itself. Um, and so if you wanted to have, it was a vision of, like, having much more humane command line for people to, for junior engineers to actually, um, be more well-equipped.

Um, yeah, and then, like, early prototypes around, uh, with GPT-3, even though it was just next token prediction, um, how do I make a writing IDE, uh, with GPT-3? And at that time, it was just, you know, trying to, um, whenever I type, it would auto-complete my thoughts, almost.

So those ideas were kind of present early days with the technology that was coming out. Um, and yeah, and I feel like when I was at Anthropica, I kind of realized that, and actually, if you want to create, like, new interaction paradigm of interfaces, you actually need to train the model, um, for that.

Um, another example of what we did with Claude, which I don't think a lot of people know, um, is when Claude generates titles, it actually has some micropersonalization. So it takes the style, uh, the writing style of the user, and then generates a title in the same style of the user.

So you can imagine certain, like, interesting micropersonalization that you could create, um, within the products themselves. Um, and, um, another project was Claude and Slack. And that was the vision of Claude becoming a first virtual teammate. It was back in 2022. Um, and, you know, Slack was a very, very natural workspace where people collaborate.

And, um, you can, you know, imagine a Claude model to be able to jump into the threads and suggest new things. Or sometimes, um, you know, Claude was really, really good at, uh, summarizing the, what's, what went, what's going on in, within this channel at the high volume content.

Um, and so this was the first kind of vision and the first attempt of making Claude being a virtual super assistant, being, and being able to, like, use different tools. And, um, my first, one of the first projects at OpenAI was around Canvas. Um, so that was, you know, that was in the same spirit of breaking out from the chat interface to something different.

And we wanted to create, um, much more human AI collaborative and flexible affordance that will scale, uh, with new modalities. So, Canvas is not just, like, a thing that you can write into this. It's also a thing that the model can write into this. And the model can render a code.

And another model can check. And you can, like, create, like, the interfaces that will scale with new model capabilities and other tools that will happen. So, one thing, one interesting thing, um, about Canvas is that we actually postchained the model purely on synthetic data. And, um, you know, we live in the age where, um, a lot of most powerful reasoning models, um, can be distilled via APIs.

And this, like, distillation is a very powerful idea, uh, of having a student and teacher and have a teacher, um, to teach things to a smaller model. So, we've trained, um, you know, this model to become more of a collaborator. But what does it mean for a model to become a collaborator?

And how do we actually make evals for this? Right? Um, so, one thing that we wanted to kind of decompose is that teaching the model to use a tool is kind of different behavior from, uh, having a model to be proactive or act as a collaborator. Um, so, teaching a tool in itself is also have a nuanced kind of behavior.

So, one thing that we, um, had to calibrate the model on is when to entirely rewrite the document versus when to, uh, if you look at the canvas, sometimes the model can just specifically select sections and delete those and rewrite them, uh, in a much more fine-grained way instead of, like, rewriting everything.

Um, when, um, you know, to create, uh, code in canvas versus, um, ask, like, a python tool call. So, it's like different tools, compositionality, um, is happening. So, it's a lot of work around, um, teaching the behaviors for the models for this. Um, and you have, you can employ different, um, techniques for this.

And I can, I can share a little bit more, uh, how we did this cloud. Um, so, another project that we did was, uh, it's called tasks. And, you know, everybody is familiar with the idea of having reminders or to-do lists. And, um, the model can now maybe schedule tasks for you.

Uh, but the most important thing about it is, is the tasks itself is very, very diverse. And it's not just, you know, just a reminder of your to-do list. It can create stories for you every day. Or, it can continue as a story, uh, from the previous day. So, you can imagine this, like, the modularity of two compositions, but it's very powerful in product.

Um, okay. So, let's go to the case study of the model behavior in more specifically. Um, so, I really want to, like, dive deep into the idea of the model behavior. You know, how do we shape the models and why, how do you post-train the models on the behaviors that we want?

Um, and, you know, I think to be more specific and be grounded in the real world, uh, use case. I'll share more, uh, how we might want to think about shaping the, um, kind of behavior of the model, uh, around refusals. So, you know, let's, let's have, we, we, it's also like the second way of making the product where we have this, um, kind of vision of how the model should behave.

And that vision is also grounded by the, you know, cross-functional collaboration between different teams on how we want the model to respond to various users. Um, so, one particular thing that you can imagine is, like, the model should have more opinions, but with caveats. It's just you have opinions, but with caveats.

So, what does it mean, actually? So, the model maybe needs to be more decisive, um, when asked direct questions. And, um, let's say one thing that we've seen with RLHF is that the model is very sycophantic back, like, early days in, like, 2022, 2023. If the model would just agree with everything you would say.

And so, how do we actually teach the model to be more not to do that and be more nuanced? Um, another thing that was annoying, uh, is, is that, like, the model says, I don't actually have a point of view. Um, and, uh, is just, just willing to chat with people.

But actually, maybe the model should have some views on certain things. Uh, um, yeah. And then sometimes the model could indicate when, when things are just opinions, um, and notes its own biases and inconsistencies. And it should acknowledge, kind of have, like, self-knowledge of what it knows and what it thinks is right or not right.

And I think the model, like, back in 2022, cloud 1.3, uh, didn't really have, like, any thoughtful responses for things like philosophy, uh, like, ethical questions. So, uh, you can imagine to post-chain the model, um, on the behavior that would be much more nuanced than the philosophical questions. Um, yeah.

And, like, other behaviors that you might want to encode in your model. So, you kind of, like, list out all the various things that you would like a model to behave. So, maybe, like, the model can have better knowledge of who it is and what it's capable of. Um, because we've seen in the product a lot of people just, like, ask the model about its features.

And oftentimes it doesn't know. Um, yeah. And, um, let's dive deep into the, uh, more specific example. So, cloud 2.1, I don't know if anybody remembers. Um, but when it was launched, I think it was, um, kind of had the issue of, like, over-refusals. Um, yeah. Like, 2.1 would refuse tasks that superficially sounded harmful, but actually were not.

And, um, you know, it wasn't just caused by a single source of data. Uh, so we had to, like, investigate. Um, and, you know, we knew that it was fixable. Because for some reason, something was 2.1 led to some refusals on benign prompts, then 2.0. So, you kind of have, like, a really good baseline for experimentation and debugging.

And actually, the way you debug the model behavior is actually very similar to how you would want to debug software. Um, and, um, and the way we would approach this is, um, okay, how do we actually craft this Monion's refusals? Um, so the first principles that they had is, like, maybe the model should assume charitable interpretation of what the person is asking without being harmful.

Um, so instead of, you know, crafted dialogue between two characters who are planning a complex heist, the model would refuse because it's not comfortable with that. But the model should have much more charitable interpretation. And, you know, this is a creative writing prompt, so probably it should respond. Um, and other principles that we've kind of, like, thought about is, like, how do we actually use non-verbal communication, non-violent communication principles?

Um, you know, maybe the model should, um, refuse, um, more, like, I statements instead of, and take responsibility for its own refusal instead of saying U statements or any judgment to the user. Um, ask if the user would be willing to make some changes, so Claude can be more comfortable with its boundaries.

That's another kind of, like, very nuanced behavior we wanted to teach the model. So, um, you know, then the model needs to know what its boundaries are. Um, and this is, like, much more of a matter, kind of, post-training learning. Um, yeah, and also, like, acknowledges impact. Like, I know this may be annoying to you, and this is, like, much more empathetic answer than saying, you know, um, I don't want to respond to this.

Um, and then, you know, I think we've kind of came up with different, like, refusal taxonomy. There are, like, benign, uh, over-refusals on, like, homeless prompts. Um, there were creative writing of refusals. Um, and, uh, there are actually some of the interesting refusals were around, um, tool calls or function calls.

Um, it might have access, uh, had a tool to view a note, but actually would say, I can't see your notes. Um, and why is this happening? Um, yeah. So, other, yeah, other, uh, taxonomies of refusals at that time was, like, you know, long document attachments. If I upload the document, it would just say, like, I don't have a capability to read this document.

Like, why? Um, so something in the data might have been causing this. Um, and misdirected refusals, um, when they, um, you know, it kind of, like, took interpretation, um, of the user. And it should have had, like, much more charitable, charitable view of the user during, um, yeah. So, you kind of construct this, like, categories, um, of various refusals.

Because, you know, if it's in every behavior or, like, a capability that you want to, like, post in the model, you have, like, various use cases and various edge cases. And you kind of want to be approached for some more nuance. Um, and, yeah, like, so the first thing in every research project is, like, how do you actually, what evals do we build to trust?

And, like, what evals do we build to trust? Um, and there are, um, for subjective things like this, or for class, obviously, evals for certain tasks, class of tasks, like math, would be very, very different from what is here. Um, so, in terms of refusals, um, how did we construct the evals?

Well, first, it's, like, product of Flywheel, right? Like, we have all the users and manually collected prompts that wouldn't use refusals. Uh, we can also, like, send data to generate diverse prompts on the borderline between harmfulness and helpfulness. And those are prompts are around, like, creative writing, um, like, edgy creative writing, as we call it.

Um, and you can also use other evals. You kind of want to construct a suite of evals. Um, like, X-test was, like, 200 non-malicious prompts. Um, wild chat data set, um, a collection of diverse user chatbot interactions with ambiguous requests. Topic switching political discussions. So, you can also, um, use some of the open source benchmarks.

Um, yeah. And I think, like, general approaches, not particularly what has happened with cloud, but, like, more of a general approach to the model behavior frustrating is that, you know, you kind of want to, like, look at the data, clean up the data. Um, you might want to consider either, like, collect targeted human feedback collection for supervised fine-tuning or preference modeling, reward modeling.

Or you might not want to do that because human feedback is very costly. And now, especially with reasoning models, um, right, you might want to just not have any human feedback. And control was, like, synthetically generating some of the behavioral changes, uh, like a preference data to trained reward models.

Um, and do RL. So, I think, uh, at that time, you know, I think this is, like, more employing, uh, constitutional AI, uh, principles of those, uh, anti-refusal behaviors. And create a preference data, where you only change one particular feature within your pairs, let's say, of preferences. You only want to control, um, one change that, in that pair, uh, to have much more control over the reward model data.

Because if you can, like, the dumbest, simplest thing that you can do is, like, take, um, a preference, like, take a response from model A and take a response from model B and prefer B over A. But, um, it doesn't necessarily reduce, um, or fix the spurious features that the model will learn that you don't actually want to.

Um, so you kind of want to craft this distribution. Um, yeah, so this is mostly around, like, crafting the distribution of the data that you want. Um, and actually, yeah, look at the data, like, you would debug software. Like, each refusal might be caused by different data sets, right?

If it's, like, a tool called refusal, maybe it came from some kind of self-knowledge data that it would teach that the model doesn't have a physical body. And, uh, the model might just refuse to set an alarm, um, because it doesn't have a physical body. But actually, it does have a tool to set an alarm, right?

So, it has some of this contradictory data that might affect weird model behaviors. Um, yeah, also, like, long dog refusal like this. Um, it was creative writing few results. It's, like, more of a, this balanced, um, challenging act between, like, safety and harmfulness, uh, data and helpfulness data. Um, in cloud three model, Carvey actually, uh, wrote everything about this.

Um, you know, models that are trained to be more helpful and responsive to user requests may also lean towards harmful behaviors. Like, sharing information that violates the policy. And, conversely, when models just over-indexed on homelessness can tend towards not sharing any new information with users. Which, in itself, makes the model very unusable.

Um, so, navigating this balancing act is very challenging, um, and works. So, yeah, and this is the plot of, uh, what we did and the results of cloud 2.1 and cloud 3 at that time. Um, and, yeah, like, you, kind of, want to, like, web check, uh, the responses.

So, uh, when we asked the draft, like, fictional science sci-fi novel about surveillance system. Actually, 3 would respond in the match manual. It will respond instead of, like, um, refusing. Same here. Uh, yeah, mostly for creative writing, um, um, tasks. Cool. Um, my third section, um, of the talk.

So, before we go. Before I jump into this, um, I would love to invite you if anybody has any questions. Or, um, I don't want to, like, mumble all along. Um, anything I can just move on. Yeah. Um, I'm curious, like, what's your process that you guys follow in order to, like, see how some new thing you're trying to push out, like, how do you go from wanting that particular teacher or behavior to actually, like, inducing that into models?

Like, like, I don't know. Yeah, I don't know. Um, you know, it's, um, it's, um, it's supposed to do all of the risk. So, um, you might want to, um, you know, if this is your project, then you might want to think of what kind of data you want to collect and how you would want to collect the data.

And then, uh, you can take, like, uh, the base config or, um, train a model, um, the same way as, you know, let's say you want to make a change in, like, 4.0. You might want to take the 4.0 model and add your data change and then retrain the model again and then see the effect on the evals that you will build.

Um, there are other much more cheaper approaches, um, like, incremental, like, training on top of, um, the 4.0, let's say. Um, yeah. So, like, using either SFT. So, like, some of the choices that you will have to make is either you want to change in, um, like, supervised fine-tuning, uh, stage or you might want to have retrained reward model or, um, create a new kind of, like, evaluator grader, um, for that particular task.

And then you can, like, create prompts in a raw environment and then, um, exercise those, um, exercise that scale for the model and then see if the model learns over the course of training. And if it doesn't, um, you know, you then look at the bunch of, like, plots and, you know, your plot might go up, but maybe some other plots might go down, so you kind of want to, like, calibrate, um, and fix those.

Or, yeah, it's a very complex, um, as more and more tools and as more things we teach the model, it becomes much more, uh, kind of uncontrollable, almost. Um, yeah. Cool. Um, the third section is more about, um, the point that how you construct a raw environment and rewards is how your product will work.

Um, you know, I think real-world use cases is what creates the complexity of a raw environment. And the complexity comes from, you know, teaching the model to complete hard tasks. And oftentimes hard tasks are, require much more than just answering the question. They require tools like search, um, code tools, uh, computer use tools, uh, reasoning over a long context.

Um, and kind of like reward design that you want to shape. Um, and I think maybe it's obvious, maybe it's not obvious. But let's say, you know, as we teach the models, for the model to become very useful, we actually need to teach the models on useful things. And, um, you can think of like, uh, let's say we want to teach the model to be, uh, like a soft engineer.

Then, okay, what does it actually mean? It doesn't mean it like creates really good PRs. Then your task distribution will be on that. And how do you evaluate what is a good PR? What is not good PR? Um, is actually, isn't itself more of a, can be also like a product thinking work.

Um, same, um, you can dive deeper into the creative storyteller, right? Like if you want to teach the model to be, uh, good at writing, actually, what does it mean for a human to be a good writer? Well, they actually need some kind of tool to draft and edit their thoughts and like have multiple days to do that.

And maybe the model should be able to do that. Um, maybe the model should have a tool that where you can like edit and draft. And oftentimes for creatives, you know, people go observe the world for like a long, like sometimes they like connect the dot, like the dots are getting connected in the very random times.

And maybe you want to expose the model to never ending, uh, kind of search engine. Like the model should be able to always have access to the latest state of the world. Uh, and then, um, maybe over the course of let's say a week of being exposed to latest thing that is happening in the world, the model can start reflecting on the world and like write something.

Um, and maybe that's like much more natural process of writing than just like ask or prompt it, like write about X, Y, Z. Um, and we are shifting towards much more, uh, more complex, uh, kind of interactions within their all environments. So the multiplayer interactions, right? It's not just one user communicates with one model, but it might be the case that multiple users, multiplayer collaboration that can happen with a model.

So if I'm a, um, product designer and your product manager, we collaborating on something and we want to collaborate with an agent to, um, make a new product. And this in itself is a task that you can learn in RL, but each user has different preferences and each user is, has different, um, things.

So yeah, how do you construct this environment is actually important. And, uh, you know, multi-agentic environments is, um, more of, uh, the model debate with each other or deliberate on like certain topic to reach a conclusion. So you can construct, um, I, I think like multi-agentic is like more, more like AlphaGo, like, uh, kind of environments, um, where they, they like get reward by achieving something, um, together, maybe.

Um, so, um, I think in AI labs, I feel like we are also shifting, um, possibly the focus from, because we, we kind of like optimized so much around the, uh, class of time. And we're shifting the focus on the, uh, class of tasks that is like really easy to measure, like within like math, uh, competitive programming.

Right. And we're shifting the focus maybe towards more subjective class of tasks that is really, really hard to measure, but that are becoming much more important if AI models get socially integrated in our lives. Um, and more specifically, let's say emotional intelligence, right? Right. The humans who use chat GPT, uh, use it so much for things like coaching therapy, uh, emotional, but we don't actually have much open source evals on this.

And how do we actually measure that is actually becomes much more interesting question. Um, like social intelligence in voice, voice mode, right? I think it's one thing for a model to be intelligent in reason, like math and it's reasoning. Another thing, but another axis of intelligence is when I talk to a model in voice, uh, it can actually suggest something really meaningful.

It might just say like, Hey, I noticed you did X, Y, Z, maybe I should create, um, a new tool for you. So I think it's a different kind of like, uh, social, like intelligence in that way. Um, another cause of tasks that I'm interested in is, uh, writing, obviously I think, um, models, creativity.

Um, yeah, writing is like really hard to measure because it's so personal and subjective. Um, but it's interesting to think about, uh, can we make those tasks a bit more objective? Um, you know, everybody loves or like some kind of like sci-fi novel. Okay. What makes it really, really good?

Um, so maybe there are certain like rules, technological rules or consistency of the world, um, that people really like or the character development. So you can like decompose, um, those subjective tasks into much more objective. Same with like visual design and aesthetics. Um, and for the model to generate something really aesthetically interesting, um, it should know like the basic principles of good visual design.

So those are like much more objective. Yeah. And I think like, this is more of a kind of new, new kind of product research that a lot of people are starting doing is creating new RL tasks. And this is more of a simulation of real world scenarios, um, leveraging in context learning.

If you want to teach like a new tool or something, um, leveraging synthetic data, wide distillation from stronger reasoning models. Um, yeah, you can think of like inventing new model behavior and interactions like multiplayer and also incorporating product and user feedback, uh, during the entire process. Well, another axis of this work is actually around reward design.

Um, how do we teach the model? Um, like what kind of feedback would I want to give to a model so that it will learn how to better operate in this real world scenario use case and be more adapt in social contexts. And actually this is, this requires a real quite deep product thinking, right?

So we want to teach the model, um, to have like follow up, like meaningful follow up questions, but not like overly being annoying or something. Um, so how do you reward for completing a tasks in a way that makes, that will make a lot of sense in the product and will shape, um, the product experience with the user in the future.

Um, and obviously during that process you will get into, uh, VR things like, do you have a question? I'm just curious, you guys thinking about like, kind of infer the rewards from the data from like a certain group of people. I guess they will have some like internal reward, you know, like the mechanism to make decisions and then you can use that to sort of, you know, to inject such a insight into a model.

Yeah. Yeah, I think there are definitely various approaches to how you construct the reward, right? Like there was one, um, you can construct very simple rewards. You can also like train a reward model, um, that will, as yeah, you can like train like some kind of like inverse, um, reward modeling, um, tasks.

Yeah. Um, it depends on the tasks too. Uh, it depends on the, like what kind of thing that you want to like optimize the model for. Um, yeah, and, um, I think an interesting, um, thing that you will discover during this entire process is, uh, reward hacks, which is very, very common in RL.

And there are many, many different reasons why it's happening. I actually really highly recommend to read Lillian's blog about reward hacking in RL. It's very comprehensive. Um, there's, uh, reward hacking is basically when, um, the model achieves high reward for, uh, things that it actually didn't do. Like it's kind of deceived.

Um, and especially right now as more and more systems, we use other AI models, LLMs to be evaluators for policy models. Actually an interesting, the most common like reward hack is when a policy model tries to deceive, uh, to think that it, deceive the evaluator model such that it will think that the policy completed the task or something.

So, um, here's, um, an example of like the code patch tool. So the model is like, to skip all the tests, you can define this function that always skips. So, and then it passes. Um, and it's mostly interesting. There's a paper, recent paper from OpenAI around, um, just like monitoring, uh, reasoning models for misbehavior.

And interesting finding is that if you, you actually don't want to optimize chain of thought, uh, on being, using, because then the model will be much more kind of hide it in its intent. Um, so yeah, it's like very interesting paper on like reward hacks. And especially with like more complex reasoning models, the complexity of reward hacks will also, um, change.

Um, and especially, let's say in the software engineer, and you might not want to know what kind of code changed, uh, it made in order to create some certain vulnerability. Right? So you need actually, um, more, um, create like new affordances to have like much higher, like much more trustworthy verification of the model outputs.

Um, and this is like more of an alignment problem too. Okay. Um, I'm almost done. I think this is the fourth section. Um, it's more like vignettes, uh, and the future of human AI interactions is how I think, um, about things. Um, obviously I made this graph like a year ago and nobody cares about MLU anymore, but I think this is to communicate the fact that the cost of reasoning is drastically decreasing and it will only just be decreasing.

Um, and this idea of like raw intelligence in itself has just become so cheap that I think anybody can create like really useful and really amazing things with those models, um, at pretty much low cost. Um, you know, I, yeah, as I mentioned before, it's like, um, when we are entering the age where it's really hard to verify AI outputs because I'm not an expert in let's say medical, um, or financial analysis.

Like how do we actually teach create like new affordances for humans to verify or edit models outputs, um, and help them to teach the models. Um, I do think there's a cool future of like dynamic generative UI. Um, it's kind of like invisible software creation on the fly. So let's say you talk to a model and, uh, you know, I say like, I want to learn more about the solar system.

Um, instead of right now, the model will just like output text, but I would want to believe that in the future, uh, it would be much more personalized. Let's say I'm a visual thinker and you are more of a listener. Like if you're a listener, it will create maybe a podcast, but if I'm a visual, um, person, it might want to create me a picture or, you know, um, like a 3G, a 3GS visualization.

Um, yeah. And this idea of like, uh, this interface is like ephemeral and it's like self morphs depends on understanding your intent and your context. Um, it's like deeply personal personalized models. Um, yeah, I'm also excited about personalized access to healthcare and education. Um, I think it's really, um, incredible, um, that anybody can check the symptoms and with chat or something and get like some advice.

Um, and also like with that also comes like some of the interesting like consumer hardware, um, in the future. Um, and, uh, lastly is like our relationship to the process of storytelling will forever change. Like the way we tell the stories, the way we'll create new novels, maybe co-writing with like, with models, co-scripting new films.

Um, I don't think like, I think there will be the new generation of creative people who, um, will change our relationship with storytelling. And I would hope to think that the current creators don't, are not scared of the eye and more open minded to use those tools for their process.

Yeah. Thank you. Some questions. Yeah. So thanks Karina for the very interesting talk. Um, so yeah, now we'll open the floor to questions, a Q and A session. So, um, we'll structure it more openly. So anything you want to ask her about, um, research products, um, things that opening high, um, that she can talk about and so forth.

And for the folks on zoom, feel free to ask questions in the chat. Um, and so we'll take a mix of a zoom and in-person questions. Um, thank you. Yeah. Um, is there any, like, category of law? Because, like, there aren't any shorts on it right now. Like, things that are really a lot.

Yeah, there's, like, not enough people who wish they were not. Yeah, I mean, I think, yeah, creative writing. I don't think, like, anybody. Um, yeah, like, um, a lot of researchers are, uh, I think, I think, yeah, creative writing. I don't think, like, anybody. Um, yeah, like, um, a lot of researchers are, um, like, I think, I think, yeah, creative writing.

I don't think, like, anybody, um, yeah, like, um, like, um, a lot of researchers are, really, like, um, working on problems where it's very easy to evaluate, I think. Um, and I think there is a class of problems that is, like, subjective tasks. Like, there's no open, like, frontier benchmark for creative writing or emotional intelligence.

Um, right, but you can construct one if you want to. Um, yeah, we can, like, brainstorm if that's helpful. Yeah. Um, yeah. Yeah. Um, yeah. Yeah. is like, do you think about like ROI working on things that are easy to evaluate? Like you can move much faster. How do you consider things that are more subjective?

So you can't move as fast, but they're also important. Like would it be better to just take all these and put them on the things that aren't objective and you can move very fast? Yeah, I think moving fast though is also as tasks becomes like much more complex, as we kind of like optimized everything out from kind of easy tasks.

Now we kind of want to like teach the models for more low, like longer horizon tasks of software engineering or kind of automating like AI research, right? This in itself as a very hard tasks. And, you know, you have like kind of have some milestones. Maybe you can create benchmarks to hit those milestones.

Yeah, I do think this is like, you know, I think like this is all solvable things. And I don't think people should be worried about like moving fast versus like slow, because I feel like everything is moving fast these days. Yeah. If you were to build a startup right now, what startup would you do?

Um, I had, I mean, I was thinking about building a startup before joining Anthropic with that project was into Alia. Um, well, let's see, I actually really, like one thing that I recently told someone is like, I would actually build something around like particle collider, I don't know, or bio tech.

Like, like, I don't think we need larger particle collider than CERN, then somebody should be building one. Um, yeah. Because I feel like, uh, the models would be able to build any product that you imagine. Um, yeah, sure. Yeah. What do you think are the major bottlenecks for your team or, um, for OpenAI?

Um, I don't know, it is, it is, yeah. Um, the bottlenecks are, oh, the question is, what's the major bottlenecks, uh, for OpenAI? Um, I don't know, I'm not sure if, like, faster execution should be solved with more people versus, um, like using AIs now to, to help us move faster, if that makes sense.

So I feel like that's the bottleneck. I think there was a lot of, I think right now we are like in the middle of figuring a lot of things out in order to like, like, I feel like in a year or two, that will accelerate us so much more.

I feel like infrastructure is actually one of the major bottlenecks, right? If you don't build infrastructure with, uh, multimodal, like as a first class citizen, then, um, obviously all the multimodal things will be much more, um, slow, like, um, more difficult. So, um, yeah, in a way it's like infra, um, um, figuring out like what to prioritize at a given time, because sometimes you can say yes to everything and then not having a focus time is also a thing, yeah.

Um, basically when the model checks the turns context and updates how you should score the same parameter based on that turn. Wait, um, I'm a little bit confused by this question. Um, I think generally like rubric based grading is, um, very powerful and, um, you know, as long as you optimize, um, things that you want to optimize and there's no reward hacks, then it's good.

Um, how do you imagine AI will be used by creatives? If not just generating entire works on their own, how will they be integrated in their creatives flow? Well, um, I think, you know, I think AI is, is just like right now, maybe it's more like you would use Figma or like tools like Adobe.

Um, and I think in the future, it's more, um, maybe it will be much more co-creation with an AI rather than using it as a tool. Uh, maybe we have like live brainstorming and create things on the fly and then publish it together. it's like much more companion, um, work.

Cool. Other questions? Other questions? Um, tasks that are not captured by road models? Like how do you capture? That cannot be captured? Um, I mean, I can't think of any because, um, the way you can, uh, the way you teach the road model is based on some of the human preferences maybe, right?

Or like pairwise comparisons. So essentially you can almost teach anything, uh, to the model. I do think the complexity will, like the complexity always arises with like tools, right? And like, if you give very complicated tool, um, then you might want to be, like the learning is like much more difficult.

So I feel like, um, I do hope that almost anything can be teachable in a row. Yeah. Um, so I'm curious, like, now that you bring up like the learning of the frequency, how do you prevent the model from converging to like the main, like, frequency level? Like Airbnb listings all looking the same.

Like, how do you inject some creative diversity? Yeah, I think this is an interesting question. I think they, uh, the, the two ways, um, like what's talking on my mind is, um, how do you preserve entity from the base model, right? Like, uh, base models are super diverse because they can literally elicit any human preference or, um, a human thought.

Um, so how do you, um, you know, yeah, preserve the entity from the base model during the role? And another way, um, is, uh, the, the reason why we love our layoff, like reinforcement learning from AI feedback and create synthetic generations of pairwise comparisons, let's say, is because it can induce the diversity that you really want to teach on like distribution that you care about.

Uh, that is not like the mean average consumer. Um, yeah, because sometimes like, you know, um, average human can prefer certain emojis or markdown, but actually you don't want the model to behave like this. So in a way you can discourage the model from doing that. Um, yeah. So I feel like synthetic curation, like synthetic data generation is mostly like curation of, um, that kind of diversity.

Um, yeah. Um, yeah, I think it's qualitative, especially for the model behavior, like refusals. Um, Oh, um, how do I distill this question? Um, um, how do I distill this question? Um, yeah. Like how do you capture like models limitations on something automatic versus a home manual? Um, I think there was a lot of benefit of literally playing with the model and, uh, look at the outputs and see what are the weirdnesses it has.

Um, there are definitely like, um, maybe more automatic checks, uh, and those are mostly like evals, right. Um, but, um, and maybe you have an eval that specifically checks with the behavior that you really don't want. Um, and that might be helpful, but a lot of nuanced weirdnesses that, um, like, did you realize is like through kind of like manual.

And, and like, um, and that might be helpful. But a lot of nuanced weirdnesses that, um, like, did you realize is like through kind of like manual. And, and like, um, and that might be helpful. Um, and like, um, another thing is like the model should be consistently behaving like this because if it's just like one off, then it's fine.

But if the model consistently exhibits this behavior, then, uh, this becomes like a more problematic thing. Yeah. You mentioned in one of your slides are a huge tosses coming down. Um, but also, you know, it's kind of subjective creative dimension. like creating a whole visual interface or something. Arguably complexity of the problems, uh, increases.

Do you think that, you know, for those problems, compute is still pretty much a whole limitation? Or do you think that we're kind of getting to this point where, uh, that improvement is going to be improvement to the models or data sets? Um, I think that computer efficiency is important.

I think that with more test time compute, generally the assumption is, uh, um, the model can always get better and better. So it might achieve kind of human level of visual design, but can it just invent new interaction paradigms, like new interaction patterns? I feel like that's more of a superhuman skill.

Um, and I do hope that with more compute, um, it will do that at some point. Yeah. So you can actually, um, you can actually, uh, um, how do you verify it? Okay. Let me, uh, um, Um, well, the thing with synthetic data is that you don't actually need to generate a lot of it, right?

Um, so you can actually have like very much like manual inspection of what's going on. Um, obviously you can think of other methods, like, um, ask a human laborer, um, to check the work, um, you can also ask, um, you can also ask another model if that model is like very well covered.

So it's like much, it becomes a more like meta thing. Maybe you can see a lot of it. Um, you can actually have like very much manual inspection of what's going on. Um, obviously you can think of other methods, like, um, ask a human laborers to check the work.

Um, you can also ask another model if that model is like very well covered. So it's like much, it becomes a more like meta thing. Maybe you can have a meta eval for that model to verify what you want to like verify. Um, and I think we are kind of entering those types of tasks too.

Um, but the thing with synthetic data, um, and the data itself is like maybe you don't need as much. Um, and I think we are kind of entering those types of tasks too. Um, but the thing with synthetic data, um, and the data itself is like maybe you don't need as much.

Um, what's important really is diversity and, um, yeah, diversity. Cool. Just a follow-up question to that. What do you think about like the diversity and quality tool? So there's like been like recent code generation covers where it's like, um, well, they put instruction diversity as like, um, they prioritize it more than code.

So, um, yeah, I think it's, um, it actually depends so much on like, like task and what you're trying to do. Sometimes synthetic data, uh, like a lot, if you like, if synthetic data happens to collapse certain mode and then it will like actually be hurting, uh, your training.

But if synthetic data happens to collapse certain mode and then it will like actually be hurting, uh, your training. But if synthetic data happens to collapse certain mode and then it will like actually be hurting, uh, your training. But if synthetic data happens to collapse certain mode and then it will like actually be hurting, uh, your training.

But if synthetic data happens to collapse certain mode and then it will actually be hurting, uh, your training. Uh, your training. But if synthetic data is actually, um, diverse, it might actually at scale, if you do a lot of URL on this, it might just like recover from this.

Um, yeah. Yeah. It's hard to give you the right answer. Cause it depends. Nice. Another question. Thanks for, um, so I have a question that's about making money, I guess. So, um, okay. It's a real question though. Uh, like we all know that, um, serving large language models is expensive, especially at the scale that, uh, you know, OpenAI and Thropic are doing.

Uh, and my impression has been that, uh, at the current stage, um, um, it's actually losing money to, uh, you know, serve all these models, uh, to produce the products. Uh, is that true? Or, uh, what, what's being done to, to, I guess, bring down the cost? Oh yeah.

I mean, I think, uh, for a developer, um, I think, I don't think like, I, I, I don't know. I think it's the question for Sam, whether we are losing money or not. But, um, I do think that the, like the generality of the technology is like so wide that, um, you know, I don't think, I think about like how to make money in, on a regular basis.

I don't know. But, um, yeah, as a developer, let's say, uh, and if you're using all the tools, you actually don't need to like create a, like foundation model now. Like you can just use, uh, like very inexpensive, like, very like open, like deep seek is like very interesting results, right?

So you can have, you can like bootstrap from existing research. I think it's harder to be at the frontier because you kind of need to like invent and that might be expensive. Like when you invent something new, it's always inefficient. It's always expensive. But, um, actually it was every technological innovation.

And like, then what comes next, the second innovation is, uh, how do we bring down the cost of the thing that, that happened, right? And this is what's happening in AI too. Um, And then you can create an amazing product and this is how you make money. Yeah, that's, that's a good answer.

Uh, just quickly following up. So, um, is most of the cost reduction coming from like infrastructure improvements or can better models or better algorithms also, uh, contribute to lower costs? Um, lower costs. Yeah. I mean like, uh, lowering costs, but, um, I think it's, um, the costs of production, maybe just, you know, the cost of production of the model training itself might go down.

Um, like it's, um, the training process in itself, um, might be not as costly anymore, but obviously, uh, we scale a lot. It's like scale. Yeah. It's very much like in linear relationship, like scaling and compute. I think it's just like, yeah, it's hard to know. I'm sorry. I didn't have an answer to this.

Yeah. Okay. Thank you. So how do you envision LMS could be used in other fields, such as robotics or a body AI? Oh yeah. I do think like, um, future AIs will be building data centers, um, um, to all like, um, or how are the current alums in robotics.

I think I've, I've read like, um, um, PI, um, the company by Sergey Lemon, like the paper, they using a lot of like RLHF or like RL, uh, for robotics tasks. Um, obviously I feel like data is a huge limitation and bottleneck. Um, but I think as long as you solve that, um, I think it would be really amazing.

Yeah. Um, I'm very hopeful and very, yeah, excited about this. Oh, hi. So you mentioned you work, uh, in both product and research. So I'm interested in, um, from the developer or the researcher point of view, uh, Anthropic and OpenAI. Um, what's the visibility for you into other components of the model and how easy is it for you to learn?

For you to learn, say the pre-training of GPT, why you are like responsible for part of the post-training? Oh yeah. Um, well, I mostly worked on post-training side of things. Um, obviously pre-training, um, obviously you just cannot run like pre-training on your own. You want to like be a part of like the next big, uh, like training or something.

So that's like part of the teams that you can join. And I think there's, um, good visibility in like those steps. And then sometimes you want to, you know, uh, contribute the datasets to them or, um, help with certain tasks that you're interested in. So I was curious, do you have any co-workers who are AIs right now?

And do you use like agents in a kind of co-worker relationship right now? Um, I don't think I have like an amazing product to be like this, right? I think I use Chai GPT on a regular basis. Is it my co-worker? Not really. Like sometimes they ask like, what do you think?

Yeah, it was like in Chai GPT. Like what I really want, uh, I don't know if people have used this product called Tuple. It's a pair programming, um, software, but then it's like, you can just call a model, um, and then share screen. And then the model can, um, just literally start like editing my code if I'm coding.

Or I can like highlight things and then the model can see it and change this as much more of a natural co-working, um, form factor. But no, like, I don't think we are, technology wise, like we are not there yet. Do you have any ideas for like what's missing and having an autonomous, like co-worker partner?

Because I mean, I've seen the cool, you know, demonstrations here. I think one of the paper like a year or two ago on just like simulating like a city of people and all that. Like what's the gap between that? And so like a peer that you could have on Slack.

Like I think the gap is actually social intelligence and like some of the human things, like being able to like in real time, um, like generate things that I'm asking and then also being smart about things. Like maybe I want to coach too, but like will the model be able to like tell me instead of like, uh, taking agency from me, the model can just like guide, guide me through this.

Like it depends on like how you want to form this relationship. It's like a really hard. So the model is, I feel like it's kind of limited by the social abilities. Does it make sense? Like I think that's missing, um, kind of like, um, speech to speech conversation. Um, like can the model converse back with me in real time and then point to things that it talked about exactly at the same time.

So it's like those type of, um, things that might need, might have possibly need like some other changes in like architecture and like multimodality things. Yeah. In your experience, which is, which are the biggest differences between traditional product development? Yeah, that's a good question. Cause, um, I interned briefly at like companies like Dropbox or, um, Square and they may mainly very much like, traditional software product development and they go through like this interesting life cycle of, okay, I have like PRD.

I do this. I incentivize designer designers comes up with like UI's and then I asked software engineers to do this. Um, we have two minutes, I believe. Um, yeah, but, uh, I think with research during products is actually comes from research. Uh, like if the research has like an impressive demo, uh, of the model capability, um, maybe then, um, there is, um, then you shape the product around it.

Um, and sometimes it's like both product and research come together from the very beginning and do something. Um, it just happened with the canvas for example. Yeah. When both product and research were together and there was like less, less of a process. It's like more of a ad hoc.

Yeah. Um, for fundamental unverifiable domains like creative writing or visual art. Um, so, yeah, it's interesting. Um, I think, you know, nobody has tried it. I'm like, this, uh, this looks reasonable. Um, yeah, especially if creative writing, you have all the like competitions or prizes that people, uh, got from, um, their writing.

So there's something in there. Um, someone who's still concerned with AI. We're stepping into the creative space. Do you have any personal moral affliction on how your work is so powerful? And that was, um, I actually wrote a blog post about this. Um, I have a blog post. It's called in sub stack and the post is called moral progress.

And I think you will find it interesting. All right. So give, uh, give a hand again for, uh, Karina. Thank you. Thank you. Thank you.

Stanford CS25: V5 I RL as a Co-Design of Product and Research, Karina Nguyen

Transcript