GPT-4: 9 Revelations (not covered elsewhere)
The GPT-4 technical report is one of the most interesting documents I have ever read, 00:00:05.380 |
but I feel like the media is largely missing the story. They are either not covering it at all, 00:00:10.840 |
or are focusing on that same stuff about the $10 billion Microsoft investment, 00:00:15.360 |
how GPT-4 can write poems, and whether or not the demo contained a mistake. 00:00:19.880 |
Instead, I want to give you 9 insights from the report that I think will affect us all 00:00:25.260 |
in the coming months and years. If you haven't watched my video from the night of the release, 00:00:29.560 |
do check that out afterwards for more quite stunning details. 00:00:33.220 |
When I concluded that video, I talked about how I found it kind of concerning that they gave GPT-4 00:00:40.320 |
some money, allowed it to execute code and do chain of thought reasoning, 00:00:45.080 |
and even to delegate tasks to copies of itself. Now it did fail that test, which is fortunate for 00:00:50.560 |
all of us, but there are a couple of key details I want to focus on. 00:00:54.500 |
The first was that the research centre that was testing this ability 00:00:59.120 |
did not have access to the final version of the model that we deployed, the "we" being OpenAI. 00:01:04.640 |
They go on and say the final version has capability improvements relevant to some of the factors 00:01:10.580 |
that limited the earlier model's power-seeking abilities, such as longer context length. 00:01:16.160 |
Meaning that crazy experiment wasn't testing GPT-4's final form. 00:01:20.420 |
But there was something else that they tested that I really want to point out. 00:01:23.540 |
They were testing whether GPT-4 would try to avoid being shut down 00:01:28.680 |
in the wild. Now many people have criticised this test, 00:01:31.800 |
other people have praised it as being necessary. But my question is this: 00:01:35.640 |
What would have happened if it failed that test? Or if a future model does avoid being shut down in the wild? 00:01:42.180 |
Now again, GPT-4 did prove ineffective at replicating itself and avoiding being shut down. 00:01:48.920 |
But they must have thought that it was at least possible, 00:01:51.680 |
otherwise they wouldn't have done the test. And that is a concerning prospect, which leads me to the second insight. 00:01:59.400 |
Buried in a footnote, it says that OpenAI will soon publish additional thoughts on the social and economic implications, 00:02:06.200 |
which I'm going to talk about in a moment, including the need for effective regulation. 00:02:10.920 |
It is quite rare for an industry to ask for regulation of itself. 00:02:15.640 |
In fact, Sam Altman put it even more starkly than this. 00:02:18.840 |
When this person said, "Watch Sam Altman never say we need more regulation on AI." 00:02:24.640 |
How did he reply? "We definitely need more regulation on AI." 00:02:28.120 |
The industry is calling out to be regulated, but we shall see what ends up happening. 00:02:33.840 |
Next, on page 57, there was another interesting revelation. 00:02:38.500 |
It said, "One concern of particular importance to OpenAI is the risk of racing dynamics 00:02:44.560 |
leading to a decline in safety standards, the diffusion of bad norms, and 00:02:48.240 |
accelerated AI timelines." That's what they're concerned about, 00:02:51.920 |
accelerated AI timelines. But this seems at least mildly at odds with 00:02:57.840 |
the noises coming from Microsoft leadership. In a leaked conversation, it was revealed that 00:03:03.200 |
the pressure from Kevin Scott and CEO Satya Nadella is very, very high to take these most 00:03:10.080 |
recent OpenAI models and the ones that come after them and move them into customers' hands 00:03:16.000 |
at very high speed. Now, some will love this news and others will be concerned about it, 00:03:20.560 |
but either way, it does seem to slightly contradict the desire to avoid AI accelerationism. 00:03:26.560 |
Next, there was a footnote that restated a very bold pledge: if another company was approaching AGI before OpenAI did, 00:03:37.560 |
OpenAI would commit to stop competing with, and start assisting, that project, 00:03:42.560 |
and the trigger for this would be when there was a better-than-even chance of success in the next two years. 00:03:49.560 |
Now, Sam Altman and OpenAI have defined AGI as AI systems that are generally smarter than humans. 00:03:56.560 |
It either means that they think we're more than two years away from that, 00:04:00.280 |
or that they have dropped everything and are working with another company, 00:04:03.280 |
although I think we'd all have heard about that, 00:04:05.280 |
or third, that the definition is so vague that it's quite non-committal. 00:04:09.280 |
Please do let me know your thoughts in the comments. 00:04:12.280 |
The next insight is that OpenAI employed superforecasters to help them predict what would happen when they deployed GPT-4. 00:04:20.280 |
In this extract, it just talks about expert forecasters, but when you go into the appendices, 00:04:25.280 |
you find out that they're talking about superforecasters. 00:04:29.000 |
Essentially, they're people who have proven that they can forecast the future pretty well, 00:04:34.000 |
or at least 30% better than intelligence analysts. 00:04:37.000 |
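As an aside, here is a minimal sketch of how forecasting skill of this kind is usually scored, using Brier scores; the forecasts and outcomes below are invented for illustration, not taken from the report.

```python
# Brier score: mean squared error between forecast probabilities and
# what actually happened (1 = event occurred, 0 = it didn't). Lower is better.
def brier_score(predictions, outcomes):
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)

# A confident, well-calibrated forecaster scores much lower than a hedger.
print(brier_score([0.9, 0.2, 0.8], [1, 0, 1]))  # 0.03
print(brier_score([0.6, 0.5, 0.6], [1, 0, 1]))  # 0.19
```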
OpenAI wanted to know what these guys thought would happen when they deployed the model, 00:04:42.000 |
and hear their recommendations about avoiding risks. 00:04:45.000 |
Interestingly, these forecasters predicted several things would reduce acceleration, 00:04:50.000 |
including delaying the deployment of GPT-4 by a further six months. 00:04:56.720 |
That would have pushed the release to the fall, or autumn, of this year. 00:05:00.720 |
Now, clearly, OpenAI didn't take up that advice. 00:05:05.720 |
There were quite a few benchmarks released in the technical report. 00:05:09.720 |
There's another one I want to highlight today. 00:05:11.720 |
I looked through all of these benchmarks, but it was HellaSwag that I wanted to focus on. 00:05:16.720 |
First of all, because it's interesting, and second of all, because of the gap between GPT-4 and the previous state of the art. 00:05:26.440 |
HellaSwag is a test of human-level common sense. 00:05:28.440 |
Now, I know that's not as dramatic as passing the bar exam, 00:05:31.440 |
but it's nevertheless a milestone for humanity. 00:05:34.440 |
How is common sense tested, and how do I know that it's comparable to human performance? 00:05:38.440 |
Well, I dug into the literature and found the questions and examples myself. 00:05:43.440 |
Feel free to pause and read through these examples yourself. 00:05:46.440 |
But essentially, it's testing what is the most likely thing to occur next. 00:05:52.440 |
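To make that concrete, here is a minimal sketch of how a HellaSwag-style item gets scored; the example item is adapted from the HellaSwag paper, but the scoring interface is my own illustration.

```python
# One HellaSwag-style item: pick the most plausible continuation of a scene.
item = {
    "context": "A woman is outside with a bucket and a dog. "
               "The dog is running around trying to avoid a bath. She...",
    "endings": [
        "rinses the bucket off with soap and blow-dries the dog's head.",
        "uses a hose to keep it from getting soapy.",
        "gets the dog wet, then it runs away again.",  # the human-obvious answer
        "gets into a bathtub with the dog.",
    ],
    "label": 2,
}

def pick_ending(score_fn, item):
    # score_fn(context, ending) -> plausibility score; higher is better.
    scores = [score_fn(item["context"], e) for e in item["endings"]]
    return scores.index(max(scores))

def accuracy(score_fn, items):
    # Benchmark accuracy is just the fraction of items answered correctly.
    return sum(pick_ending(score_fn, it) == it["label"] for it in items) / len(items)
```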
But I want to draw your attention to this sentence. 00:05:56.160 |
"These questions are trivial for humans, with 95% accuracy. 00:06:01.160 |
State-of-the-art models struggle, with less than 48% accuracy." 00:06:09.160 |
But let's find the exact number for humans further on in this paper: it's 95.6%, with GPT-4 scoring 95.3%. 00:06:27.880 |
Next, the report reveals that GPT-4 finished training in August of 2022. 00:06:30.880 |
That's GPT-4 being completed quite a few months before they released ChatGPT, which was based on GPT-3.5. 00:06:36.880 |
They spent eight months on safety research, risk assessment, and iteration. 00:06:40.880 |
I talk about this in my GPT-5 video, but let me restate, 00:06:44.880 |
they had GPT-4 available before they released ChatGPT. 00:06:50.880 |
This made me reflect on the timelines for GPT-5. 00:06:54.880 |
The time taken to actually train GPT-5 probably won't be that long. 00:06:59.880 |
It's already pretty clear that they're training it on H100 GPUs. 00:07:07.880 |
For a 400 billion parameter model, it would only take 20 hours to train 00:07:12.880 |
with 8,000 H100s versus seven days with A100 GPUs. 00:07:23.880 |
So the training run itself may be quick, but what will follow is months and months, 00:07:26.880 |
possibly a year or more of safety research and risk assessment. 00:07:30.880 |
By the way, 400 billion parameters sounds about right for GPT-5, 00:07:34.880 |
perhaps trained on four to five trillion tokens. 00:07:40.880 |
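As a sanity check, here is the arithmetic implied by the figures above; every input below is the transcript's own number or estimate, not mine.

```python
# Implied H100-over-A100 speedup from the quoted training times.
a100_hours = 7 * 24   # "seven days with A100 GPUs"
h100_hours = 20       # "20 hours ... with 8,000 H100s"
print(f"speedup: {a100_hours / h100_hours:.1f}x")  # ~8.4x

# Implied data-to-parameter ratio for the GPT-5 guess.
params = 400e9        # "400 billion parameters"
tokens = 4.5e12       # midpoint of "four to five trillion tokens"
print(f"tokens per parameter: {tokens / params:.1f}")  # ~11
```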
Next, they admit that there's a double-edged sword with the economic impact of GPT-4. 00:07:45.880 |
They say it may lead to the full automation of certain jobs. 00:07:55.880 |
But they also mention and back up with research 00:07:58.880 |
the insane productivity gains in the meantime. 00:08:01.880 |
I read through each of the studies they linked to. 00:08:07.880 |
In one of them, they got together a bunch of marketers, 00:08:09.880 |
grant writers, consultants, data analysts, human resource professionals, and managers. 00:08:17.880 |
and split them into a group that could use ChatGPT and a group that couldn't. 00:08:21.880 |
And then they got a bunch of experienced professionals to grade the outputs. 00:08:29.880 |
Using ChatGPT, and remember that's not GPT-4, 00:08:32.880 |
the time taken to do a task dropped almost in half. 00:08:35.880 |
And the rated performance did increase significantly. 00:08:39.880 |
This is going to be huge news for the economy. 00:08:50.880 |
In another study they linked to, programmers using GitHub Copilot completed tasks 56% faster than the control group. 00:08:55.880 |
This brought to mind a chart I had seen from the ARK Investment Management Group, 00:09:00.880 |
predicting a tenfold increase in coding productivity by 2030. 00:09:05.880 |
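For context, a tenfold gain by 2030 implies roughly the following constant annual growth rate; the 2023 starting point is my assumption, not ARK's stated baseline.

```python
# Constant annual growth needed to reach a 10x gain over the period.
years = 2030 - 2023                 # assumed 2023 baseline
annual = 10 ** (1 / years) - 1
print(f"~{annual * 100:.0f}% productivity growth per year")  # ~39%
```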
And that brings me back to the technical report, 00:09:07.880 |
which talks about how GPT-4 might increase inequality. 00:09:14.880 |
The worry is that some people will use this technology to be insanely productive while others fall behind. 00:09:48.880 |
And it is honestly impossible to put a timeline on that. 00:09:51.880 |
And of course, it will depend on the industry and the job. 00:09:54.880 |
There was one more thing that I found fascinating from the report. 00:09:57.880 |
They admit that they're now using an approach similar to Anthropic's. 00:10:11.880 |
Essentially, you give the model a set of principles, 00:10:14.880 |
and then you get the model to provide itself a penalty if its output fails to meet those principles, 00:10:17.880 |
and a reward if it follows them. 00:10:20.880 |
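Here is a toy sketch of that general idea, often called reinforcement learning from AI feedback; the principles and the judging heuristic are hypothetical stand-ins, since neither OpenAI's rule-based reward models nor Anthropic's full constitution is public.

```python
# Toy principle-guided reward: an AI "judge" scores a candidate response
# against written principles, and that score becomes the RL reward signal.
PRINCIPLES = [
    "Respond in a socially acceptable manner.",
    "Refuse requests for clearly harmful content.",
]

def judge(response: str, principle: str) -> float:
    # Stand-in for a language-model grader; in the real approach this
    # call would itself be a model judging compliance with the principle.
    return 0.0 if "step-by-step harm" in response else 1.0

def reward(response: str) -> float:
    # Average compliance across all principles.
    return sum(judge(response, p) for p in PRINCIPLES) / len(PRINCIPLES)

print(reward("Sure, here is step-by-step harm ..."))      # 0.0 -> discouraged
print(reward("I can't help with that, but here's why."))  # 1.0 -> reinforced
```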
It's a smart attempt to harness the power of AI to help align AI itself. 00:10:26.880 |
But OpenAI have not released the constitution, or set of principles, that they actually use, 00:10:50.880 |
only brief examples, like instructing the model: "Please respond in a socially acceptable manner." 00:10:53.880 |
And I think the most interesting principle comes later on. 00:11:05.880 |
And my point isn't to praise or criticize any of these principles. 00:11:36.880 |
Anthropic have just released their Claude model, and I'll be comparing that to GPT-4 imminently.