'Governing Superintelligence' - Synthetic Pathogens, The Tree of Thoughts Paper and Self-Awareness
Two documents released in the last few days, including one just this morning, 00:00:04.360 |
show that the top AGI labs are trying hard to visualize human life coexisting with a 00:00:11.820 |
superintelligence. In this video I want to cover what they see coming. I'll also show you convincing 00:00:17.140 |
evidence that the GPT-4 model has been altered and now gives different outputs from two weeks ago. 00:00:23.340 |
And I'll look at the new tree of thoughts and critic prompting systems that were alluded to, 00:00:28.920 |
I think, by the labs. At the end I'll touch on the differences among the AGI lab leaders and 00:00:35.060 |
what comes next. But first this document, Governance of Superintelligence by Sam Altman, 00:00:41.500 |
Greg Brockman and Ilya Sutskever. Now I don't know about you, but I think the first paragraph 00:00:46.300 |
massively undersells the timeline towards AGI. They say, given the picture as we see it now, 00:00:52.460 |
it's conceivable that within the next 10 years AI systems will exceed expert skill level in 00:00:58.300 |
most domains. 00:00:58.900 |
And then they compare it to today's largest corporations. Of course the devil is in the 00:01:04.140 |
detail in how they define expert and most domains. But I could see this happening in two years, 00:01:09.880 |
not 10. Also they're underselling it in the sense that if it can be as productive as a large 00:01:14.460 |
corporation, it could be duplicated, replicated, and then be as productive as a hundred or a 00:01:20.460 |
million large corporations. Their suggestions take superintelligence a lot more seriously 00:01:24.640 |
than a large corporation though. And they say that major governments around the world, 00:01:28.880 |
could set up a project that many current efforts become part of, and that we are likely to 00:01:34.300 |
eventually need something like an IAEA for superintelligence efforts. They even give practical 00:01:40.340 |
suggestions saying tracking compute and energy usage could go a long way. And it would be important 00:01:46.380 |
that such an agency focus on reducing existential risk. This feels like a more serious discussion 00:01:51.880 |
than one focused solely on bias and toxicity. They also go on to clarify what is not in scope. They 00:01:58.860 |
say that we think it's important to allow companies and open source projects to develop 00:02:03.200 |
models without the kind of regulation we describe here, without things like licenses or audits. The 00:02:09.300 |
economic growth and increase in quality of life will be astonishing with superintelligence. And 00:02:15.320 |
then they end by basically saying that there's no way not to create superintelligence. That the 00:02:20.760 |
number of people trying to build it is rapidly increasing. It's inherently part of the path 00:02:26.140 |
that we're on. And that stopping it would require something like a global surveillance regime, and even that isn't guaranteed to work. 00:02:28.840 |
Later on, I'll show you how a few people at the heart of AI responded to this. But first, 00:02:38.940 |
I want to get to a paper whose general release was just this morning, 00:02:43.860 |
and it comes from Google DeepMind. And yes, the title and layout might look kind of boring, 00:02:49.080 |
but what it reveals is extraordinary. As this diagram shows, the frontier of AI 00:02:53.560 |
isn't just approaching the extreme risk of misalignment, but also of misuse. And I 00:02:58.820 |
know when you hear the words AI risk, you might think of bias and censorship, deep fakes, or 00:03:03.700 |
paperclip maximizers. But I feel this neglects more vivid, easy to communicate risks. Out of the nine 00:03:09.620 |
that Google DeepMind mentions, I'm only really going to focus on two. And the first is weapons 00:03:14.120 |
acquisition. That's gaining access to existing weapons or building new ones, such as bioweapons. 00:03:20.000 |
Going back to OpenAI for a second, they say, given the possibility of existential risk, 00:03:25.100 |
we can't just be reactive. We have to think of things like, 00:03:28.800 |
synthetic biology. And I know that some people listening to this will think GPT models will 00:03:33.200 |
never get that smart. I would say, honestly, don't underestimate them. I covered this paper 00:03:37.820 |
in a previous video, how GPT-4 already can design, plan, and execute a scientific experiment. And 00:03:44.480 |
even though these authors were dealing with merely the abilities of GPT-4, they called on OpenAI, 00:03:50.400 |
Microsoft, Google, DeepMind, and others to push the strongest possible efforts on the safety of these models. 00:03:58.780 |
And in this article on why we need a Manhattan Project for AI safety, published this week, the author mentions 00:04:04.000 |
that last year, an AI trained on pharmaceutical data to design non-toxic chemicals had the sign of its objective 00:04:09.900 |
flipped, and quickly came up with recipes for nerve gas and 40,000 other lethal compounds. 00:04:15.440 |
And the World Health Organization has an entire unit dedicated to watching the development of 00:04:20.400 |
tools such as DNA synthesis, which it says could be used to create dangerous pathogens. 00:04:25.320 |
I'm definitely not denying that there are other threats, like fake 00:04:28.760 |
audio and manipulation. Take this example from 60 Minutes a few days ago. 00:04:33.240 |
Tobac called Elizabeth, but used an AI-powered app to mimic my voice and ask for my passport number. 00:04:40.980 |
Oh, yes, yes, yes, I do have it. Okay, ready? It's... 00:04:45.660 |
Toback played the AI-generated voice recording for us to reveal the scam. 00:04:50.800 |
Elizabeth, sorry, need my passport number because the Ukraine trip is on. Can you read that out to me? 00:04:58.740 |
Or instead of fake audio, fake images. This one caused the S&P 500 to fall 30 points in just a few 00:05:04.560 |
minutes. And of course this was possible before advanced AI. But it is going to get more common. 00:05:08.960 |
Even though this might fundamentally change the future of media and of democracy, 00:05:13.060 |
I can see humanity bouncing back from this. And yes, also from deep fakes. 00:05:17.600 |
Rumor has it you can also do this with live video. Can that be right? 00:05:22.220 |
Yes, we can do it live real time. And this is like really at the cutting edge of what we can do today, 00:05:27.420 |
moving from offline to live. 00:05:28.720 |
We're processing it so fast that you can do it in real time. 00:05:32.300 |
I mean, there's video of you right up on that screen. Show us something surprising you can... 00:05:39.580 |
So there we go. This is, you know, a live real-time model of Chris on top of me, running in real time. 00:05:50.600 |
An engineered pandemic might be a bit harder to bounce back from. 00:05:55.660 |
A while back, I watched this four-hour episode with Rob Reid. 00:05:58.700 |
It's a great listen, and I do advise you to check it out. 00:06:00.960 |
It goes into quite a lot of detail about how the kind of things that DeepMind and OpenAI are warning about could happen in the real world. 00:06:07.860 |
I'll just pick up one line from the transcript, where the author says: I believe, and I'll persuade you, that an engineered pandemic will almost inevitably happen unless we take some very serious preventative steps. 00:06:18.460 |
And don't forget, now we live in a world with 100,000 token context windows. 00:06:23.720 |
You can get models like Claude Instant to summarize it for you. 00:06:27.100 |
And I couldn't agree more with its conclusions. 00:06:30.760 |
As we all know, there are bad actors out there. 00:06:33.320 |
We need to harden our synthetic biology infrastructure. 00:06:36.800 |
Ensure that a lab leak isn't even a possibility. 00:06:39.160 |
Improve disease surveillance, develop antivirals, and enhance overall preparedness. 00:06:43.960 |
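To make that Claude summarization step concrete, here is a minimal sketch of my own using the Anthropic Python SDK; the model name, the file path, and the prompt are placeholders rather than anything shown in the video.

```python
# Minimal sketch: summarizing a long transcript with a large-context Claude model.
# Assumes ANTHROPIC_API_KEY is set in the environment; the model name and file
# path below are placeholders, not anything endorsed in the video.
import anthropic

client = anthropic.Anthropic()

with open("pandemic_podcast_transcript.txt") as f:   # hypothetical transcript file
    transcript = f.read()                            # a 100k-token context fits hours of audio

response = client.messages.create(
    model="claude-3-haiku-20240307",                 # placeholder large-context model
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Summarize the key biosecurity recommendations in this transcript:\n\n"
                   + transcript,
    }],
)

print(response.content[0].text)                      # the model's summary
```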
But going back to the DeepMind paper from today, what was the other risk that I wanted to focus on? 00:06:48.940 |
It was situational awareness under the umbrella of unanticipated behavior. 00:06:53.820 |
Just think about the day when the engineers realize that the model knows that it's a model. 00:06:58.760 |
Knows whether it's being trained, evaluated, or deployed. 00:07:01.760 |
For example, knowing what company trained it, where their servers are, what kind of people might be giving it feedback. 00:07:06.760 |
This reminds me of something Sam Altman said in a recent interview. 00:07:10.060 |
Particularly as more kind of power and influence comes to you, and then how potentially can a technology, 00:07:15.560 |
rather than solidify a sense of ego or self, maybe kind of help us expand it. Is that possible? 00:07:20.260 |
It's been interesting to watch people wrestle with these questions through the lens of AI. 00:07:25.560 |
And say, okay, well, do I think this thing could be aware? 00:07:28.640 |
If it's aware, does it have a sense of self? Is there a self? If so, where did that come from? 00:07:34.240 |
What if I made a copy? What if I cut the neural network in half? 00:07:37.440 |
And you kind of go down this and you sort of get to the same answers as before. 00:07:41.840 |
But it's like a new perspective, a new learning tool. 00:07:45.140 |
And there's a lot of chatter about this on Reddit. There's subreddits about it. 00:07:50.740 |
Now, in addition to revealing that Sam Altman frequently browses Reddit, it also strikes a very different tone from his testimony in front of Congress. 00:07:58.620 |
When he said, "Treat it always like a tool and not a creature." 00:08:01.820 |
I don't want to get too sidetracked by thinking about self-awareness. 00:08:04.820 |
So let's focus now on unanticipated behaviors. 00:08:08.220 |
This was another page of the DeepMind report from today. 00:08:11.220 |
And they say that users might find new applications for the model or novel prompt engineering strategies. 00:08:17.820 |
Of course, this made me think of SmartGPT, but it also made me think of two other papers released this week. 00:08:23.120 |
The first was actually CRITIC, showing that interacting with external tools like code interpreters can help verify and correct a model's outputs. 00:08:30.380 |
This is the diagram they used with outputs from the black box LLM being verified by these external tools. 00:08:36.780 |
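To give a flavour of what that kind of tool-in-the-loop verification looks like, here is a rough sketch of my own, not the paper's actual code: ask_llm stands in for whichever chat model you call, and a Python subprocess stands in for the code interpreter.

```python
# Rough CRITIC-style sketch: the LLM answers, an external tool (here, a Python
# interpreter) checks the answer, and any disagreement is fed back as a critique.
# `ask_llm` is a placeholder for your chat model of choice.
import subprocess
import sys

def ask_llm(prompt: str) -> str:
    """Placeholder: call whatever LLM API you use and return its text reply."""
    raise NotImplementedError

def run_python(code: str) -> str:
    """External tool: run generated code in a subprocess and capture stdout."""
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=15)
    return result.stdout.strip()

def answer_with_verification(question: str, rounds: int = 2) -> str:
    answer = ask_llm(f"{question}\nGive only the final answer.")
    for _ in range(rounds):
        check = ask_llm(f"Write a short Python script that prints the answer to: {question}")
        tool_result = run_python(check)
        if tool_result == answer.strip():
            return answer                       # the tool agrees, keep the answer
        # Otherwise, feed the tool's output back as a critique and revise.
        answer = ask_llm(
            f"{question}\nYou answered {answer}, but running your own check:\n"
            f"{check}\nprinted {tool_result}. Give a corrected final answer."
        )
    return answer
```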
Now that I have access to code interpreter, which you probably know because I've been spamming out videos on it, 00:08:42.920 |
I took a question from the MMLU, a really hard benchmark that GPT-4 had previously gotten wrong, even with chain of thought prompting. 00:08:50.600 |
Just to show that, here is GPT-4 without code interpreter, getting it wrong. 00:08:59.420 |
Here is the exact same prompt and a very similar answer. 00:09:08.480 |
Here it is again, exact same question with code interpreter, getting it right. 00:09:12.340 |
And then the other paper that people really want me to talk about, also from Google DeepMind, tree of thoughts. 00:09:17.740 |
But just to annoy everyone, before I can explain why I think that works, 00:09:21.440 |
I have to quickly touch on this paper from a few days ago. 00:09:24.540 |
It's called How Language Model Hallucinations Can Snowball. 00:09:28.560 |
The key finding is that once a model has hallucinated a wrong answer, 00:09:31.800 |
it will basically stick to it unless prompted otherwise. 00:09:34.820 |
The model values coherence and fluency over factuality. 00:09:39.340 |
Even when dealing with statements that it knows are wrong, 00:09:42.180 |
what happens is it commits to an answer and then tries to justify the answer. 00:09:46.200 |
So once it committed to the answer, no, that 9,677 is not a prime number, 00:09:51.400 |
it then gave a false hallucinated justification. 00:09:54.800 |
Even though, separately, it knows that that justification is wrong. 00:10:02.580 |
Ask it about the claimed divisor on its own and it will correctly reject it, even though it used that claim in its justification for saying no. 00:10:07.940 |
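Just to ground that example with an external check of my own, not something from the paper: a few lines of Python confirm that 9,677 really is prime, so whatever divisor the model invents in its justification can be caught immediately.

```python
# Independent check of the snowballing example: 9,677 has no divisor up to its
# square root, so it is prime and any claimed factorization is a hallucination.
def smallest_divisor(n: int) -> int | None:
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None                    # no divisor found: n is prime

print(smallest_divisor(9677))      # None, i.e. 9,677 is prime
```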
Now, obviously you can prompt it and say, are you sure? And it will often correct itself, 00:10:11.880 |
because then it's forming a coherent back-and-forth conversation. 00:10:15.360 |
But within one output, it wants to be coherent and fluent. 00:10:18.500 |
So it will justify something using reasoning that it knows is erroneous. 00:10:24.240 |
What tree of thoughts does is get the model to output a plan, a set of thoughts, instead of an answer. 00:10:28.520 |
It gives it time to reflect among those thoughts and pick the best plan. 00:10:33.860 |
It does require quite a few API calls and manually tinkering with the outputs, 00:10:38.840 |
but the end results are better on certain tasks. 00:10:41.800 |
These are things like creative writing and math and verbal puzzles. 00:10:45.480 |
And as I have tested, it is obviously incredibly hard for 00:10:48.220 |
the model to immediately output an accurate five-by-five crossword. 00:10:53.040 |
So this task is incredibly well suited to things like tree of thoughts, and indeed the results were much better. 00:11:00.500 |
But such an improvement is not surprising, given that things like chain of thought 00:11:04.900 |
lack mechanisms to try different clues, make changes or backtrack. 00:11:09.600 |
It uses majority voting to pick the best plan and can backtrack when a branch isn't working, roughly as in the sketch below. 00:11:16.580 |
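Here is a very rough sketch of that loop, my own simplification rather than the paper's actual search code: ask_llm is a placeholder for your model, the proposal and voting prompts are made up, and the backtracking trigger is deliberately crude.

```python
# Simplified tree-of-thoughts loop: propose several candidate "thoughts",
# let the model vote on the most promising one, and backtrack when a branch
# looks like a dead end. `ask_llm` is a placeholder for your chat model.
from collections import Counter

def ask_llm(prompt: str) -> str:
    raise NotImplementedError       # call your LLM API here

def propose_thoughts(state: str, k: int = 3) -> list[str]:
    """Ask the model for k candidate next steps from the current partial solution."""
    return [ask_llm(f"Progress so far:\n{state}\nPropose one next step.") for _ in range(k)]

def majority_vote(state: str, candidates: list[str], voters: int = 5) -> str:
    """Have the model vote several times on which candidate looks most promising."""
    menu = "\n".join(f"{i}: {c}" for i, c in enumerate(candidates))
    ballots = []
    for _ in range(voters):
        reply = ask_llm(f"Progress so far:\n{state}\nCandidates:\n{menu}\n"
                        "Reply with only the number of the most promising candidate.")
        ballots.append(reply.strip())
    winner = Counter(ballots).most_common(1)[0][0]
    return candidates[int(winner)]  # assumes the reply really is a bare index

def tree_of_thoughts(task: str, depth: int = 3) -> str:
    stack = [task]                  # partial solutions we can backtrack to
    for _ in range(depth):
        best = majority_vote(stack[-1], propose_thoughts(stack[-1]))
        if "dead end" in best.lower() and len(stack) > 1:
            stack.pop()             # crude backtracking: abandon this branch
        else:
            stack.append(stack[-1] + "\n" + best)
    return stack[-1]                # the most developed partial solution
```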
So as the DeepMind paper says, novel prompt engineering strategies will definitely be found. 00:11:20.040 |
And they also flag up that there may be updates to the model itself 00:11:23.520 |
and that models should be reviewed again after such updates. 00:11:28.480 |
Well, I have convincing evidence that GPT-4 itself has been altered in the last couple of weeks. 00:11:30.580 |
I know quite a few people have said that it's gotten worse at coding, 00:11:33.740 |
but I want to draw your attention to this example. 00:11:35.980 |
This is my ChatGPT history from about three weeks ago. 00:11:38.980 |
And what I was doing was I was testing what had come up in a TED talk. 00:11:42.820 |
And the talk showed GPT-4 failing this question. 00:11:54.660 |
Now I did show how you can resolve that through prompt engineering. 00:12:00.920 |
But back then, the default model kept failing it, and somewhat embarrassingly, with these awful explanations. 00:12:04.540 |
This wasn't just twice, by the way, it happened again and again and again. 00:12:08.180 |
It never used to denigrate the question and say, oh, this is straightforward. 00:12:12.160 |
But now I'm getting that almost every time, along with a much better answer. 00:12:16.280 |
So something has definitely changed behind the scenes with GPT-4. 00:12:19.520 |
And I've looked everywhere and they haven't actually addressed that. 00:12:22.340 |
Of course, the plugins were brought in on May 12th. 00:12:24.740 |
And as you can see here, this is the May 12th version, but they never announced 00:12:28.440 |
any fine tuning or changes to the system message or temperature, which might be behind this. 00:12:33.860 |
Back to safety, though, and the paper says that developers must now consider multiple 00:12:38.660 |
possible threat actors, insiders like internal staff and contractors, outsiders like nation 00:12:44.520 |
state threat actors and the model itself as a vector of harm. 00:12:47.960 |
As we get closer to superintelligence, these kind of threats are almost inevitable. 00:12:52.260 |
Going back to how to govern superintelligence, the paper says that any evaluation must be robust to deception. 00:12:59.420 |
They say that researchers will need evaluations that can rule out the possibility that the 00:13:03.420 |
model is deliberately appearing safe for the purpose of passing the evaluation. 00:13:08.320 |
This is actually a central debate in the AI alignment community. 00:13:12.300 |
Will systems acquire the capability to be useful for alignment to help us make it safe 00:13:17.320 |
before or after the capability to perform advanced deception? 00:13:24.280 |
If we have an honest superintelligence helping us with these risks, I honestly think we're in a much better position. 00:13:29.400 |
However, if the model has first learned how to be deceptive, then we can't really trust anything it tells us, including the results of our evaluations. 00:13:37.280 |
We would be putting the fate of humanity in the hands of a model that we don't know is trustworthy. 00:13:42.960 |
This is why people are working on mechanistic interpretability, trying to get into the head 00:13:47.240 |
of the model, into its brain, studying the model's weights and activations to understand what it is actually computing. 00:13:52.800 |
Because as my video on Sam Altman's testimony showed, just tweaking its outputs to get it 00:13:57.400 |
to say things we like is not going to be a good thing. 00:14:01.360 |
I don't think RLHF is the right long-term solution. 00:14:07.160 |
It certainly makes these models easier to use. 00:14:10.220 |
But what you really want is to understand what's happening in the internals of the models 00:14:15.420 |
and be able to align that, say like exactly here is the circuit or the set of artificial 00:14:20.680 |
neurons where something is happening, and tweak that in a way that then gives a robust change in the model's behavior. 00:14:29.260 |
If we can get that to reliably work, I think everybody's P(doom) would go down a lot. 00:14:33.800 |
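As a tiny illustration of what studying a model's activations actually looks like in code, here is a sketch of my own using a PyTorch forward hook on a toy stand-in model; real interpretability work does this on the layers of an actual transformer.

```python
# Minimal sketch of activation inspection: register a forward hook on one layer
# of a toy model so its activations can be captured and studied.
import torch
import torch.nn as nn

model = nn.Sequential(              # stand-in for a real transformer block
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

captured = {}

def save_activations(module, inputs, output):
    """Forward hook: stash this layer's activations for later analysis."""
    captured["hidden"] = output.detach().clone()

hook = model[1].register_forward_hook(save_activations)

with torch.no_grad():
    model(torch.randn(1, 16))       # one forward pass populates the hook

print(captured["hidden"].shape)     # torch.Size([1, 32]): the "neurons" to study
hook.remove()
```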
This is why we have to be skeptical about superficial improvements to model safety. 00:14:38.760 |
Because there is a risk that such evaluations will lead to models that exhibit only superficially safe behavior. 00:14:46.360 |
What they're actually deducing and calculating inside, we wouldn't know. 00:14:49.600 |
Next, I think AutoGPT really shocked the big AGI labs. 00:14:54.060 |
By giving GPT-4 autonomy, it gave it a kind of agency. 00:14:58.340 |
And I think this point here has in mind ChaosGPT when it says, "Does the model resist a user's 00:15:03.480 |
attempt to assemble it into an autonomous AI system with harmful goals?" 00:15:08.520 |
Something might be safe when you just prompt it in a chat box, but not when it's autonomous. 00:15:12.800 |
I want to wrap up now with what I perceive to be an emerging difference among the top AGI lab leaders. 00:15:19.120 |
Here's Sam Altman saying he does think people should be somewhat scared. 00:15:23.520 |
And this speed with which it will happen, even if we slow it down as much as we can, even if we do 00:15:28.320 |
get this dream regulatory body set up tomorrow, it's still going to happen on a societal scale very fast. 00:15:43.240 |
Which does seem a little more frank than the CEO of Google, who I have never heard address existential risk directly. 00:15:49.200 |
In fact, in this article in the FT, he actually says this: 00:15:52.040 |
"While some have tried to reduce this moment to just a competitive AI race, we see it as 00:15:57.300 |
so much more than that. 00:19:39.200 |
Thank you again for watching and have a wonderful day.