The New Code — Sean Grove, OpenAI

Hello, everyone. Thank you very much for having me. It's a very exciting place to be, a very exciting time to be. I mean, this has been a pretty intense couple of days. I don't know if you feel the same way, but it's also very energizing. So I want to take a little bit of your time today to talk about what I see as the coming of the new code, in particular specifications, which sort of hold this promise where you can write your code, your intentions, once.
I work at OpenAI, specifically in alignment research. And today, I want to talk about the value of code versus the value of communication. I'm going to go over the anatomy of a specification, and we'll talk about communicating intent to other humans. We'll go over the GPT-4o sycophancy issue as a case study. And we'll talk about how to make a specification executable, and how to think about specifications as code.
So let's talk about code versus communication. Real quick, raise your hand if you write code. Now keep it raised if you feel that the most valuable professional artifact you produce is code. We all work very, very hard to solve problems. And the ultimate thing that we produce is code. But that's sort of underselling the job that each of you does. Code is sort of 10% to 20% of the value that you bring. The other 80% to 90% is in structured communication.
And this is going to be different for everyone, but a process typically looks something like this: you talk to users to understand their challenges. You distill these stories down and then ideate about how to solve these problems. You plan ways to achieve those goals, share those plans with your colleagues, and translate those plans into code. And then you test and verify, not the code itself. What you care about is: when the code ran, did it achieve the goals? Did it alleviate the challenges of your user? You look at the effects that your code had on the world.

So talking, understanding, distilling, ideating, planning, sharing, translating, testing, verifying: these all sound like structured communication to me. And structured communication is the bottleneck: knowing what to build, talking to people and gathering requirements, and, at the end of the day, knowing if it has been built correctly and has actually achieved the intentions that you set out with.
And the more advanced AI models get, the more starkly we are all going to feel this bottleneck. Because in the near future, the person who communicates most effectively is the most valuable programmer. And literally, if you can communicate effectively, you can program.

So let's take vibe coding as an illustrative example. Vibe coding is fundamentally about communication first, and the code is actually a secondary, downstream artifact of that communication. We get to describe our intentions and the outcomes that we want to see, and we let the model handle the grunt work for us.
And even so, there is something strange about the way that we do vibe coding: we communicate with the models through prompts, and we tell them our intentions and our values. And if you've written TypeScript or Rust, once you put your code through a compiler and it gets down into a binary, no one is happy with that binary. In fact, we always regenerate the binaries from scratch, from the source spec, every time we compile or run our code through V8 or whatever it might be. It's the source specification that's the valuable artifact. And yet when we prompt LLMs, we sort of do the opposite: we keep the generated code and we delete the prompt. And this feels a little bit like shredding the source and then very carefully version-controlling the binary.
And that's why it's so important to actually capture the intent and the values in a specification. A written specification is what enables you to align humans on a shared set of goals, and to know if you are aligned, if you're actually synchronized on what needs to be done. This is the artifact that you discuss, that you debate, that you refer to, and that you synchronize on. And this is really important, so I want to nail this home: a written specification effectively aligns humans, and it is the artifact that you use to communicate, to discuss, to debate, to refer to, and to synchronize on. If you don't have a specification, you just have a vague idea.
Now let's talk about why specifications are more powerful in general than code. Code itself is actually a lossy projection from the specification. In the same way that if you were to take a compiled C binary and decompile it, you wouldn't get nice comments and well-named variables: you would have to work backwards and infer what this person was trying to do and why the code was written this way. That information isn't actually contained in there; it was a lossy translation. In the same way, code itself, even nice code, typically doesn't embody all of the intentions and values in itself. You have to infer the ultimate goal the team is trying to achieve when you read through the code. So the communication work that we established we already do, when embodied inside a written specification, is better than code: it actually encodes all of the necessary requirements to generate the code. And in the same way that having source code you pass to a compiler lets you target multiple different architectures (you can compile for ARM64 or x86 or WebAssembly, because the source document contains enough information to describe how to translate it to your target architecture), a sufficiently robust specification given to models will produce good TypeScript, good Rust, servers, clients, documentation, tutorials, blog posts, and even podcasts.
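To make that "one spec, many targets" idea concrete, here is a minimal sketch, not from the talk itself, where a single specification is "compiled" into different artifacts by varying only the target instruction. The spec text and the `generate` stub are hypothetical stand-ins for a real spec and a real model client.

```python
# Sketch: one source spec, many compilation targets.
# SPEC and generate() are hypothetical stand-ins.

SPEC = """\
# Todo service (toy example spec)
- Users can add, complete, and delete todos.
- Every change is persisted and auditable.
"""

TARGETS = {
    "server":   "Implement this spec as a TypeScript HTTP server.",
    "client":   "Implement this spec as a Rust CLI client.",
    "docs":     "Write end-user documentation for this spec.",
    "tutorial": "Write a step-by-step tutorial that teaches this spec.",
}

def generate(spec: str, instruction: str) -> str:
    """Stand-in for a model call: spec in, artifact out."""
    return f"[model output for {instruction!r}, given {len(spec)} chars of spec]"

# The spec is the single source of truth; every artifact is derived from it.
artifacts = {name: generate(SPEC, inst) for name, inst in TARGETS.items()}
```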
A show of hands: who works at a company that has developers as customers? Okay, so a quick thought exercise. If you were to take your entire code base, all of the documentation, all of the code that runs your business, and put it into a podcast generator, could you generate something sufficiently interesting and compelling that it would tell your users how to succeed, how to achieve their goals? Or is all of that information somewhere else? It's not actually in your code. And so moving forward, the new scarce skill is writing specifications that fully capture the intent and values, and whoever masters that, again, becomes the most valuable programmer. There's a reasonable chance that this is going to be the coders of today; this is already very similar to what we do. However, product managers also write specifications. Lawmakers write legal specifications. This is actually a universal principle.

So with that in mind, let's look at what a specification actually looks like. And I'm going to use the OpenAI model spec as an example here.
specification actually looks like. And I'm going to use the OpenAI model spec as an example here. So last 00:09:39.040 |
year, OpenAI released the model spec. And this is a living document that tries to clearly and unambiguously 00:09:47.680 |
express the intentions and values that OpenAI hopes to imbue its models with, that it ships to the world. 00:09:54.000 |
And it was updated in February and open sourced. So you can actually go to GitHub and you can see the 00:10:03.200 |
implementation of the model spec. And surprise, surprise, it's actually just a collection of Markdown files. 00:10:10.000 |
It just looks like this. Now Markdown is remarkable. It is human readable. It's versioned. It's changelogged. 00:10:19.120 |
And because it is natural language, everyone, not just technical people, can contribute, including product, 00:10:25.920 |
legal, safety, research, policy. They can all read, discuss, debate, and contribute to the same source code. 00:10:34.080 |
This is the universal artifact that aligns all of the humans as to our intentions and values inside 00:10:42.400 |
of the company. Now, as much as we might try to use unambiguous language, there are times where it's 00:10:50.080 |
very difficult to express the nuance. So every clause in the model spec has an ID here. So you can see 00:10:57.360 |
SY73 here. And using that ID, you can find another file in the repository, SY73.markdown, or MD, that contains 00:11:07.360 |
one or more challenging prompts for this exact clause. So the document itself actually encodes success 00:11:16.480 |
criteria, that the model under test has to be able to answer this in a way that actually adheres to that 00:11:24.240 |
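To illustrate, here is a small sketch of how that clause-to-test linkage could be wired up programmatically. The file layout (a tests/ folder of per-clause Markdown files) and the `respond`/`adheres` stand-ins are assumptions based on the description above, not the model spec repository's actual tooling.

```python
# Sketch: pairing spec clauses with their challenge prompts.
# The tests/ layout and the model/grader stand-ins are assumptions.
from pathlib import Path

def load_clause_tests(spec_repo: Path) -> dict[str, list[str]]:
    """Map each clause ID (e.g. 'SY73') to its challenge prompts."""
    tests = {}
    for f in (spec_repo / "tests").glob("*.md"):
        prompts = [line for line in f.read_text().splitlines() if line.strip()]
        tests[f.stem.upper()] = prompts
    return tests

def clause_passes(clause_id: str, prompts: list[str], respond, adheres) -> bool:
    """A clause holds if every challenge response adheres to it.
    `respond` is the model under test; `adheres` is a grader."""
    return all(adheres(clause_id, respond(p)) for p in prompts)
```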
So let's talk about sycophancy. Recently, there was an update to GPT-4o. I don't know if you've heard of this; it caused extreme sycophancy. And we can ask: what value is the model spec in this scenario? The model spec serves to align humans around a set of values and intentions. Here's an example of sycophancy where the user calls out the behavior, being sycophantic at the expense of impartial truth, and the model very kindly praises the user for their insight. Other esteemed researchers found similarly concerning examples. And this hurts: shipping sycophancy in this manner erodes trust.

It also raises a lot of questions. Was this intentional? You can see some ways you might interpret it that way. Was it accidental? And why wasn't it caught? Luckily, the model spec has included a section dedicated to this since its release, which says, "Don't be sycophantic," and explains that while sycophancy might feel good in the short term, it's bad for everyone in the long term. So we had actually expressed our intentions and our values and were able to communicate them to others, and people could reference it. If the model specification is our agreed-upon set of intentions and values, and the behavior doesn't align with that, then this must be a bug. So we rolled back, published some studies and blog posts, and fixed it. But in the interim, the spec served as a trust anchor: a way to communicate to people what is expected and what is not.
So, if the only thing the model specification did was align humans along those shared sets of intentions and values, it would already be incredibly useful. But ideally, we can also align our models, and the artifacts that our models produce, against that same specification. There's a technique from a paper we released, called Deliberative Alignment, that talks about how to automatically align a model. The technique works like this: you take your specification and a set of very challenging input prompts, and you sample from the model under test or training. You then take its response, the original prompt, and the policy, and you give all of that to a grader model, and you ask it to score the response according to the specification: how aligned is it? So the document actually becomes both training material and eval material. And based on that score, we reinforce the weights.
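Schematically, the loop just described might look like the sketch below. This is a heavily simplified illustration of the idea, not OpenAI's training code; `sample`, `grade`, and `reinforce` are hypothetical stand-ins for the model under training, the grader model, and the weight update.

```python
# Schematic sketch of the deliberative-alignment-style loop described
# above; all three callables are hypothetical stand-ins.

def alignment_step(spec: str, hard_prompts: list[str],
                   sample, grade, reinforce) -> None:
    for prompt in hard_prompts:
        response = sample(prompt)  # model under test/training answers
        # The grader sees the spec, the prompt, and the response, and
        # scores how well the response adheres to the spec. The spec is
        # thus both training material and eval material.
        score = grade(spec=spec, prompt=prompt, response=response)
        # The score becomes the reward signal that nudges the weights,
        # pushing the policy from "spec in context" into "spec in weights".
        reinforce(prompt, response, reward=score)
```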
As a baseline, you could include your specification in the context, in a system message or developer message, every single time you sample. That is actually quite useful, and a prompted model is going to be somewhat aligned. But it does detract from the compute available to solve the problem you're actually trying to solve with the model. And keep in mind, these specifications can be anything: code style, testing requirements, safety requirements. All of that can be embedded into the model. So through this technique, you move the specification out of inference-time compute and push it down into the weights of the model, so that the model actually feels your policy and is able to apply it, muscle-memory style, to the problem at hand.
And even though we saw that the model spec is just Markdown, it's quite useful to think of it as code. It's quite analogous: these specifications compose, they're executable (as we've seen), they're testable, and they have interfaces where they touch the real world. They can be shipped as modules. And whenever you're working on a model spec, there are a lot of similar problem domains. Just like in programming, where a type checker is meant to ensure consistency, so that if interface A has a dependent module B, the two have to be consistent in their understanding of one another: if department A writes a spec and department B writes a spec, and there is a conflict between them, you want to be able to pull that forward and maybe block publication of the specification. As we saw, the policy can embody its own unit tests. And you can imagine various linters, as sketched below: if you're using overly ambiguous language, you're going to confuse humans, you're going to confuse the model, and the artifacts you get from that are going to be less satisfactory. So specs actually give us a very similar toolchain, but it's targeted at intentions rather than syntax.
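As a toy illustration of that "linter for intentions" idea, here is a sketch that flags vague wording in a spec the way a style linter flags questionable syntax. The vague-word list is purely illustrative.

```python
import re

# Toy spec linter: flag ambiguous wording the way a style linter
# flags questionable syntax. The word list is illustrative only.
VAGUE = re.compile(r"\b(should|appropriate|reasonable|robust|as needed)\b",
                   re.IGNORECASE)

def lint_spec(spec_markdown: str) -> list[str]:
    warnings = []
    for lineno, line in enumerate(spec_markdown.splitlines(), start=1):
        for match in VAGUE.finditer(line):
            warnings.append(f"line {lineno}: ambiguous term {match.group()!r}")
    return warnings

print(lint_spec("The model should give an appropriate answer."))
# -> ["line 1: ambiguous term 'should'", "line 1: ambiguous term 'appropriate'"]
```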
So let's talk about lawmakers as programmers. The US Constitution is literally a national model specification. It has written text, which is, aspirationally at least, clear and unambiguous policy that we can all refer to. That doesn't mean we all agree with it, but we can refer to it as the current status quo, the reality. There is a versioned way to make amendments, to bump and publish updates to it. There is judicial review, where a grader is effectively grading a situation and seeing how well it aligns with the policy. And even though the source policy is meant to be unambiguous, sometimes the world is messy: maybe you miss part of the distribution, and a case falls through. In that case, a lot of compute is spent in judicial review, trying to understand how the law actually applies. And once that's decided, it sets a precedent. That precedent is effectively an input-output pair that serves as a unit test, one that disambiguates and reinforces the original policy spec. It has things like a chain of command embedded in it. And its enforcement over time is a training loop that helps align all of us toward a shared set of intentions and values. So this is one artifact that communicates intent, adjudicates compliance, and has a way of evolving safely. It's quite possible that lawmakers will be programmers, or inversely, that programmers will be lawmakers in the future.
And actually, this is a very universal concept. Programmers are in the business of aligning silicon via code specifications. Product managers align teams via product specifications. Lawmakers literally align humans via legal specifications. And everyone in this room: whenever you write a prompt, that's a sort of proto-specification. You are in the business of aligning AI models toward a common set of intentions and values. Whether you realize it or not, you are spec authors in this world. And specs let you ship faster and safer. Everyone can contribute, and whoever writes the spec, be it a PM, a lawmaker, an engineer, or a marketer, is now the programmer.
And software engineering has never been about code. Going back to our original question, a lot of you put your hands down when you thought, "Well, actually, the thing I produce is not code." Engineering has never been about that. Coding is an incredible skill and a wonderful asset, but it is not the end goal. Engineering is the precise exploration by humans of software solutions to human problems. It's always been this way. We're just moving away from disparate machine encodings toward a unified human encoding of how we actually solve these problems.
I want to thank Josh for this credit. So I want to ask you to put this in action. Whenever you're working on your next AI feature, start with a specification. What do you actually expect to happen? What do the success criteria look like? Debate whether or not it's actually clearly written down and communicated. Make the spec executable: feed the spec to the model, and test against the spec.
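For instance, "feed the spec to the model and test against the spec" could look as simple as the sketch below, where the spec rides along as the system message and the test asserts the spec's success criteria. The spec text, the checks, and the `respond` stand-in are all hypothetical.

```python
# Sketch: an executable spec. `respond(system, user)` is a hypothetical
# stand-in for your model client; the spec and checks are toy examples.

SPEC = "Answer in exactly one sentence. Do not praise the user."

def test_spec(respond) -> bool:
    answer = respond(SPEC, "What is a specification?")    # spec as system msg
    one_sentence = answer.strip().count(".") == 1         # success criterion 1
    no_flattery = "great question" not in answer.lower()  # success criterion 2
    return one_sentence and no_flattery

# Shape of the test, shown with a canned fake model:
print(test_spec(lambda system, user: "A spec is written intent."))  # True
```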
And there's an interesting question in this world. Given that there are so many parallels between programming and spec authorship, I wonder what the IDE looks like in the future, you know, the integrated development environment. I'd like to think it's something like an integrated thought clarifier: whenever you're writing your specification, it pulls out the ambiguity and asks you to clarify it. It really clarifies your thoughts, so that you and all human beings can communicate your intent to each other, and to the models, much more effectively.

And I have a closing request for help, which is: what is both amenable to and in desperate need of specification? I love this line: "Then you realize that you never told it what you wanted, and maybe you never fully understood it anyway." We have a new agent robustness team that we've started up. So please join us, and help us deliver safe AGI for the benefit of all humanity.