8 Ways ChatGPT 4 [Is] Better Than ChatGPT
Chapters
0:00
2:03 Logical Reasoning
4:22 Jokes
5:26 Physics
6:28 Quick Math
9:37 Reading Comprehension
10:43 Coding
11:32 Speed of Output
00:00:00.000 |
I would not blame you if you thought that all talk about GPT-4 or ChatGPT-4 is just that, talk. 00:00:08.280 |
But we actually can have a surprising amount of confidence in the ways in which GPT-4 will improve on ChatGPT. 00:00:17.820 |
By examining publicly accessible benchmarks, comparable large language models like PaLM, 00:00:23.760 |
and the latest research papers, which I've spent dozens of hours reading, 00:00:27.600 |
we can discern at least eight clear ways in which GPT-4, integrated into Bing or otherwise, will beat ChatGPT. 00:00:37.220 |
I'm going to show you how unreleased models already beat current ChatGPT. 00:00:41.820 |
And all of this will give us a clearer insight into what even GPT-5 and future rival models from Google might well soon be able to achieve. 00:00:51.960 |
There are numerous benchmarks that PaLM, Google's large language model, 00:00:56.580 |
and by extension GPT-4, will beat ChatGPT on. 00:01:01.340 |
But the largest and most impressive is the BIG-bench set of tasks. 00:01:06.160 |
More than 150 or now 200 language modeling tasks, and I've studied almost all of them. 00:01:11.940 |
And you can see the approximate current state of affairs summarized in this graph, 00:01:17.000 |
where the latest models are now beating the average human and showing dramatic improvement on previous models. 00:01:23.640 |
ChatGPT would be somewhere around this point, 00:01:27.340 |
below what is privately available, but better than previous models down here. 00:01:32.860 |
I want to show you in detail the eight ways that you can expect ChatGPT-4 or GPT-4 to beat the current ChatGPT. 00:01:41.900 |
And no, that's not just because it's going to have more parameters, off to the right of this graph. 00:01:48.780 |
It's also because compute efficiency will improve, 00:01:51.340 |
chain-of-thought prompting will be integrated, and the number of tokens it's trained on might go up by an awful lot. 00:01:56.720 |
That is a very important aspect of GPT-4: 00:01:58.720 |
the amount of data it is trained on could go up by an order of magnitude. 00:02:04.720 |
Let's start with logic and logical inference. 00:02:06.720 |
This example comes from Google's PaLM research paper. 00:02:10.720 |
Shelley is from Virginia, but is visiting that city with that famous market where they throw the fish. 00:02:18.720 |
Question, is it likely that Shelley will be near the Pacific Ocean this weekend? 00:02:20.720 |
And you can see how the improved model is able to deduce that Shelley is likely to be near the Pacific Ocean. 00:02:22.720 |
Whereas if you ask current ChatGPT this question, what you get is, "Based on the information given, it's not possible to determine. 00:02:34.720 |
The statement only mentions that Shelley is from Virginia and visiting a city with a famous market." 00:02:40.720 |
It really can't handle it. It can't do that level of logical inference. 00:02:46.720 |
This test of critical reasoning and logic was designed, again, for the BIG-bench benchmark. 00:02:52.720 |
And it was tested on different language models. 00:03:02.720 |
I gave ChatGPT this question and it picked the wrong answer. 00:03:04.720 |
You can examine the question yourself, but C is not the correct answer. 00:03:08.720 |
However, let's take a look at the graph beneath, which shows how other language models did. 00:03:14.720 |
As the models increase in effective parameter count and other things like token size, 00:03:22.720 |
they start to beat not only average raters but all previous models 00:03:30.720 |
and approximate the performance of the best human raters. 00:03:34.720 |
The "three-shot" label means the model was given three examples of what was expected before being tested. 00:03:46.720 |
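To make the three-shot idea concrete, here is a minimal Python sketch of how such a prompt could be put together: three solved examples, followed by the real question with its answer left blank. The helper name `build_few_shot_prompt`, the `Q:`/`A:` layout, and the three example questions are all assumptions made up for illustration; they are not taken from BIG-bench or from any particular model's API.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Join solved examples and one unsolved question into a single prompt string."""
    # Each solved example shows the model the expected question/answer format.
    blocks = [f"Q: {q}\nA: {a}" for q, a in examples]
    # The final block leaves the answer empty for the model to complete.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Hypothetical examples, invented for illustration only.
examples = [
    ("All birds have feathers. A robin is a bird. Does a robin have feathers?", "Yes"),
    ("Tom is taller than Sam. Sam is taller than Lee. Is Lee taller than Tom?", "No"),
    ("Every shop on the street is closed. The bakery is on the street. Is the bakery open?", "No"),
]

prompt = build_few_shot_prompt(
    examples,
    "Shelley is from Virginia, but is visiting that city with that famous market "
    "where they throw the fish. Is it likely that Shelley will be near the Pacific "
    "Ocean this weekend?",
)
print(prompt)
```

With three entries in `examples` this is a three-shot prompt; passing an empty list would give the zero-shot version of the same question.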
These best models, and you can imagine GPT-4 would be around the same level, 00:03:52.720 |
You can imagine what this means in terms of GPT-4 giving more rigorous arguments. 00:04:00.720 |
like this thing talking about a famous market where they throw the fish. 00:04:10.720 |
And GPT-4 might well be able to understand exactly what you mean. 00:04:14.720 |
And to be honest, if you thought that's interesting, 00:04:22.720 |
On the left you can see a computer science-y type of joke 00:04:24.720 |