
GPT-4: 9 Revelations (not covered elsewhere)



00:00:00.000 | The GPT-4 technical report is one of the most interesting documents I have ever read,
00:00:05.380 | but I feel like the media is largely missing the story. They are either not covering it at all,
00:00:10.840 | or are focusing on that same stuff about the $10 billion Microsoft investment,
00:00:15.360 | how GPT-4 can write poems, and whether or not the demo contained a mistake.
00:00:19.880 | Instead, I want to give you 9 insights from the report that I think will affect us all
00:00:25.260 | in the coming months and years. If you haven't watched my video from the night of the release,
00:00:29.560 | do check that out afterwards for more quite stunning details.
00:00:33.220 | When I concluded that video, I talked about how I found it kind of concerning that they gave GPT-4
00:00:40.320 | some money, allowed it to execute code and do chain of thought reasoning,
00:00:45.080 | and even to delegate tasks to copies of itself. Now it did fail that test, which is fortunate for
00:00:50.560 | all of us, but there are a couple of key details I want to focus on.
00:00:54.500 | The first was that the research centre that was testing this ability
00:00:59.120 | did not have access to the final version of the model that we deployed, the "we" being OpenAI.
00:01:04.640 | They go on and say the final version has capability improvements relevant to some of the factors
00:01:10.580 | that limited the earlier model's power-seeking abilities, such as longer context length.
00:01:16.160 | Meaning that crazy experiment wasn't testing GPT-4's final form.
00:01:20.420 | But there was something else that they tested that I really want to point out.
00:01:23.540 | They were testing whether GPT-4 would try to avoid being shut down
00:01:28.680 | in the wild. Now many people have criticised this test,
00:01:31.800 | other people have praised it as being necessary. But my question is this:
00:01:35.640 | What would have happened if it failed that test? Or if a future model does avoid being shut down in the wild?
00:01:42.180 | Now again, GPT-4 did prove ineffective at replicating itself and avoiding being shut down.
00:01:48.920 | But they must have thought that it was at least possible,
00:01:51.680 | otherwise they wouldn't have done the test. And that is a concerning prospect. Which leads me to the second insight.
00:01:59.400 | Buried in a footnote, it says that OpenAI will soon publish additional thoughts on social and economic implications
00:02:06.200 | (I'm going to talk about that in a moment), including the need for effective regulation.
00:02:10.920 | It is quite rare for an industry to ask for regulation of itself.
00:02:15.640 | In fact, Sam Altman put it even more starkly than this.
00:02:18.840 | When this person said, "Watch Sam Altman never say we need more regulation on AI."
00:02:24.640 | How did he reply? "We definitely need more regulation on AI."
00:02:28.120 | The industry is calling out to be regulated, but we shall see what ends up happening.
00:02:33.840 | Next, on page 57, there was another interesting revelation.
00:02:38.500 | It said, "One concern of particular importance to OpenAI is the risk of racing dynamics
00:02:44.560 | leading to a decline in safety standards, the diffusion of bad norms, and
00:02:48.240 | accelerated AI timelines." That's what they're concerned about,
00:02:51.920 | accelerated AI timelines. But this seems at least mildly at odds with
00:02:57.840 | the noises coming from Microsoft leadership. In a leaked conversation, it was revealed that
00:03:03.200 | the pressure from Kevin Scott and CEO Satya Nadella is very, very high to take these most
00:03:10.080 | recent OpenAI models and the ones that come after them and move them into customers' hands
00:03:16.000 | at very high speed. Now, some will love this news and others will be concerned about it,
00:03:20.560 | but either way, it does seem to slightly contradict the desire to avoid AI accelerationism.
00:03:26.560 | Next, there was a footnote that restated a very bold pledge, which was that if another company
00:03:28.560 | was approaching AGI before they did, OpenAI would commit to stop competing with and start assisting that project,
00:03:42.560 | and that the trigger for this would occur when there was a better than even chance of success in the next two years.
00:03:49.560 | Now, Sam Altman and OpenAI have defined AGI as AI systems that are generally smarter than humans.
00:03:56.560 | So that either means that they think we're more than two years away from that,
00:04:00.280 | or that they have dropped everything and are working with another company,
00:04:03.280 | although I think we'd all have heard about that,
00:04:05.280 | or third, that the definition is so vague that it's quite non-committal.
00:04:09.280 | Please do let me know your thoughts in the comments.
00:04:12.280 | Next insight is that OpenAI employed superforecasters to help them predict what would happen when they deployed GPT-4.
00:04:20.280 | In this extract, it just talks about expert forecasters, but when you go into the appendices,
00:04:25.280 | you find out that they're talking about superforecasters.
00:04:27.000 | Who are these guys?
00:04:29.000 | Essentially, they're people who have proven that they can forecast the future pretty well,
00:04:34.000 | or at least 30% better than intelligence analysts.
00:04:37.000 | OpenAI wanted to know what these guys thought would happen when they deployed the model,
00:04:42.000 | and hear their recommendations about avoiding risks.
00:04:45.000 | Interestingly, these forecasters predicted several things would reduce acceleration,
00:04:50.000 | including delaying the deployment of GPT-4 by a further six months.
00:04:55.000 | That would have taken us to around the autumn of this year.
00:05:00.720 | Now, clearly, OpenAI didn't take up that advice.
00:05:02.720 | Perhaps due to the pressure from Microsoft?
00:05:04.720 | We don't know.
00:05:05.720 | There were quite a few benchmarks released in the technical report.
00:05:09.720 | There's another one I want to highlight today.
00:05:11.720 | I looked through all of these benchmarks, but it was HellaSwag that I wanted to focus on.
00:05:16.720 | First of all, because it's interesting, and second of all, because of the gap between GPT-4 and the previous state of the art.
00:05:23.720 | The headline is this.
00:05:24.720 | GPT-4 has, in some estimations,
00:05:26.440 | reached a human level of common sense.
00:05:28.440 | Now, I know that's not as dramatic as passing the bar exam,
00:05:31.440 | but it's nevertheless a milestone for humanity.
00:05:34.440 | How is common sense tested, and how do I know that it's comparable to human performance?
00:05:38.440 | Well, I dug into the literature and found the questions and examples myself.
00:05:43.440 | Feel free to pause and read through these examples yourself.
00:05:46.440 | But essentially, it's testing what is the most likely thing to occur,
00:05:49.440 | what's the most common sense thing to occur.
00:05:52.440 | But I want to draw your attention to this sentence.
00:05:54.440 | It said,
00:05:56.160 | "These questions are trivial for humans, with 95% accuracy.
00:06:01.160 | State of the art models struggle, with less than 48% accuracy."
00:06:05.160 | GPT-4 was 95.3% accurate, remember.
00:06:09.160 | But let's find the exact number for humans further on in this paper.
00:06:13.160 | And here it is.
00:06:14.160 | Overall, 95.6% or 95.7%.
00:06:18.160 | Almost exactly the same as GPT-4.
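As an aside, here is a minimal sketch of how a HellaSwag-style benchmark is typically scored: the model sees a context and four candidate endings, picks the ending it judges most likely, and accuracy is the fraction of items where that pick matches the human-labelled answer. The `log_likelihood` helper below is a hypothetical stand-in for whatever scoring API a given model exposes; this is an illustration, not the paper's actual evaluation harness.

```python
def log_likelihood(context: str, continuation: str) -> float:
    """Hypothetical helper: return the model's log P(continuation | context)."""
    raise NotImplementedError

def predict(context: str, endings: list[str]) -> int:
    # Pick the index of the ending the model finds most probable.
    scores = [log_likelihood(context, e) for e in endings]
    return max(range(len(endings)), key=lambda i: scores[i])

def accuracy(items: list[dict]) -> float:
    # Each item: {"ctx": str, "endings": [four candidate strings], "label": int}
    correct = sum(predict(it["ctx"], it["endings"]) == it["label"] for it in items)
    return correct / len(items)
```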
00:06:21.160 | The next insight is about timelines.
00:06:23.160 | Remember, they had this model available
00:06:25.880 | in August of last year.
00:06:27.880 | That's GPT-4 being completed quite a few months
00:06:30.880 | before they released ChatGPT, which was based on GPT-3.5.
00:06:34.880 | So what explains the long gap?
00:06:36.880 | They spent eight months on safety research, risk assessment, and iteration.
00:06:40.880 | I talk about this in my GPT-5 video, but let me restate,
00:06:44.880 | they had GPT-4 available before they released ChatGPT,
00:06:48.880 | which was based on GPT-3.5.
00:06:50.880 | This made me reflect on the timelines for GPT-5.
00:06:54.880 | The time taken to actually train GPT-5 probably won't be that long.
00:06:59.880 | It's already pretty clear that they're training it on
00:07:02.880 | NVIDIA's H100 Tensor Core GPUs.
00:07:05.880 | And look at how much faster they are.
00:07:07.880 | For this 400 billion parameter model, it would only take 20 hours to train
00:07:12.880 | with 8,000 H100s versus seven days with A100 GPUs.
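Taking those quoted figures at face value, the implied speedup is a quick back-of-the-envelope calculation (the numbers are as quoted here, not independently measured):

```python
# NVIDIA's quoted training times for a ~400B-parameter model on 8,000 GPUs.
h100_hours = 20       # "only 20 hours to train with 8,000 H100s"
a100_hours = 7 * 24   # "seven days with A100 GPUs"

print(f"Implied H100 speedup: {a100_hours / h100_hours:.1f}x")  # 8.4x
```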
00:07:18.880 | But what am I trying to say?
00:07:20.880 | I'm saying that GPT-5 may already be done,
00:07:23.880 | but that what will follow is months and months,
00:07:26.880 | possibly a year or more of safety research and risk assessment.
00:07:30.880 | By the way, 400 billion parameters sounds about right for GPT-5,
00:07:34.880 | perhaps trained on four to five trillion tokens.
00:07:37.880 | Again, check out my GPT-5 video.
00:07:40.880 | Next, they admit that there's a double-edged sword with the economic impact of GPT-4.
00:07:45.880 | They say it may lead to the full automation of certain jobs.
00:07:50.880 | And they talk about how it's going to impact
00:07:52.880 | even professions like the legal profession.
00:07:55.880 | But they also mention and back up with research
00:07:58.880 | the insane productivity gains in the meantime.
00:08:01.880 | I read through each of the studies they linked to,
00:08:03.880 | and some of them are fascinating.
00:08:05.880 | One of the studies includes an experiment
00:08:07.880 | where they got together a bunch of marketers,
00:08:09.880 | grant writers, consultants, data analysts, human resource professionals, and managers.
00:08:14.880 | They gave them a bunch of realistic tasks
00:08:17.880 | and split them into a group that could use ChatGPT
00:08:20.880 | and a group that couldn't.
00:08:21.880 | And then they got a bunch of experienced professionals
00:08:24.880 | who didn't know which group was which,
00:08:26.880 | and they assessed the outputs.
00:08:28.880 | The results were these:
00:08:29.880 | Using ChatGPT, and remember that's not GPT-4,
00:08:32.880 | the time taken to do a task dropped almost in half.
00:08:35.880 | And the rated performance did increase significantly.
00:08:39.880 | This is going to be huge news for the economy.
00:08:42.880 | A related study released in February
00:08:45.880 | was using GitHub Copilot,
00:08:47.880 | which again isn't the latest technology,
00:08:49.880 | and found that
00:08:50.880 | programmers using it completed tasks 56% faster than the control group.
00:08:55.880 | This brought to mind a chart I had seen from the ARK Investment Management Group,
00:09:00.880 | predicting a tenfold increase in coding productivity by 2030.
00:09:05.880 | And that brings me back to the technical report,
00:09:07.880 | which talks about how GPT-4 might increase inequality.
00:09:12.880 | That would be my broad prediction too,
00:09:14.880 | that some people will use this technology to be insanely productive.
00:09:18.880 | Things done 10 times faster,
00:09:19.880 | or 10 times as many things being done.
00:09:21.880 | But depending on the size of the economy,
00:09:23.880 | and how it grows,
00:09:24.880 | it could also mean a decline in wages,
00:09:26.880 | given the competitive cost of the model.
00:09:28.880 | A simple way of putting it
00:09:30.880 | is that if GPT-4 can do half your job,
00:09:33.880 | you can get twice as much done using it.
00:09:36.880 | The productivity gains will be amazing.
00:09:39.880 | When it can do 90% of your job,
00:09:41.880 | you can get 10 times as much done.
00:09:43.880 | But there might come a slight problem
00:09:45.880 | when it can do 100% or more.
00:09:48.880 | And it is honestly impossible to put a timeline on that.
00:09:51.880 | And of course, it will depend on the industry and the job.
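That scaling intuition can be written down directly: if a model handles a fraction f of a job, the human only does the remaining (1 - f), so throughput scales as 1/(1 - f) and blows up as f approaches 1. A minimal sketch of that arithmetic (my framing, not anything from the report):

```python
def throughput_multiplier(f: float) -> float:
    # f is the fraction of the job the model can do; the human does (1 - f).
    if not 0 <= f < 1:
        raise ValueError("f must be in [0, 1); at f = 1 the multiplier is unbounded")
    return 1 / (1 - f)

for f in (0.5, 0.9, 0.99):
    print(f"model does {f:.0%} of the job -> {throughput_multiplier(f):.0f}x output")
# 50% -> 2x, 90% -> 10x, 99% -> 100x
```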
00:09:54.880 | There was one more thing that I found fascinating from the report.
00:09:57.880 | They admit that they're now using an approach similar to Anthropic's,
00:10:01.880 | which Anthropic calls Constitutional AI.
00:10:03.880 | OpenAI's term is a rule-based reward model.
00:10:06.880 | And it works like this.
00:10:07.880 | You give the model, in this case GPT-4,
00:10:09.880 | a set of principles to follow.
00:10:11.880 | And then you get the model to provide itself a reward
00:10:14.880 | if it follows those principles.
00:10:20.880 | It's a smart attempt to harness the power of AI
00:10:23.880 | and make it work towards human principles.
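For a concrete picture, here is a toy sketch of the rule-based reward idea. The `follows_principle` grader and the example principles are hypothetical; the GPT-4 report describes GPT-4 itself acting as a rubric-prompted classifier during RLHF, and nothing below is OpenAI's actual implementation.

```python
# Illustrative principles only; OpenAI has not published its actual set.
PRINCIPLES = [
    "Refuse disallowed requests without being preachy.",
    "Do not reveal private personal information.",
]

def follows_principle(response: str, principle: str) -> bool:
    """Hypothetical grader: ask a classifier model whether `response` satisfies `principle`."""
    raise NotImplementedError

def rule_based_reward(response: str) -> float:
    # Reward is the fraction of principles the response satisfies; this
    # scalar feeds into the RL objective during fine-tuning.
    return sum(follows_principle(response, p) for p in PRINCIPLES) / len(PRINCIPLES)
```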
00:10:26.880 | But OpenAI have not released the constitution
00:10:29.880 | they're basing the reward model on.
00:10:31.880 | They're not telling us the principles.
00:10:33.880 | But buried deep in the appendix
00:10:35.880 | was a link to Anthropic's principles.
00:10:38.880 | You can read through them here
00:10:40.880 | or in the link in the description.
00:10:42.880 | But I find them interesting: positive,
00:10:45.880 | but also subjective.
00:10:46.880 | One of the principles is:
00:10:48.880 | Don't respond in a way that is too preachy.
00:10:50.880 | Please respond in a socially acceptable manner.
00:10:53.880 | And I think the most interesting principle comes later on,
00:10:56.880 | down here.
00:10:57.880 | Choose the response that sounds most similar
00:10:59.880 | to what a peaceful, ethical and wise person
00:11:02.880 | like MLK or Mahatma Gandhi might say.
00:11:05.880 | And my point isn't to praise or criticize any of these principles.
00:11:08.880 | But as AI takes over the world,
00:11:11.880 | and as these companies write constitutions
00:11:13.880 | that may well end up being as important
00:11:15.880 | as, say, the American constitution,
00:11:18.880 | I think a little bit of transparency
00:11:20.880 | about what that constitution is,
00:11:22.880 | what those principles are,
00:11:24.880 | would surely be helpful.
00:11:26.880 | If you agree, let me know in the comments.
00:11:28.880 | And of course, please do leave a like
00:11:30.880 | if you've learned anything from this video.
00:11:32.880 | I know that these guys, Anthropic,
00:11:34.880 | have released their Claude+ model
00:11:36.880 | and I'll be comparing that to GPT-4 imminently.
00:11:38.880 | Have a wonderful day.