
8 Ways ChatGPT 4 [Is] Better Than ChatGPT


Chapters

0:00 Intro
2:04 Logical Reasoning
4:22 Jokes
5:26 Physics
6:28 Quick Math
9:37 Reading Comprehension
10:43 Coding
11:32 Speed of Output

Whisper Transcript

00:00:00.000 | I would not blame you if you thought that all talk about GPT-4 or ChatGPT-4 is just that, talk.
00:00:08.280 | But we actually can have a surprising amount of confidence in the ways in which GPT-4 will improve on ChatGPT.
00:00:17.820 | By examining publicly accessible benchmarks, comparable large language models like PaLM,
00:00:23.760 | and the latest research papers, which I've spent dozens of hours reading,
00:00:27.600 | we can discern at least eight clear ways in which GPT-4, integrated into Bing or otherwise, will beat ChatGPT.
00:00:37.220 | I'm going to show you how unreleased models already beat current ChatGPT.
00:00:41.820 | And all of this will give us a clearer insight into what even GPT-5 and future rival models from Google might well soon be able to achieve.
00:00:51.960 | There are numerous benchmarks that PaLM, Google's large language model,
00:00:56.580 | and by extension GPT-4, will beat ChatGPT on.
00:01:01.340 | But the largest and most impressive is the BIG-bench set of tasks.
00:01:06.160 | More than 150 or now 200 language modeling tasks, and I've studied almost all of them.
00:01:11.940 | And you can see the approximate current state of affairs summarized in this graph,
00:01:17.000 | where the latest models are now beating the average human and showing dramatic improvement on previous models.
00:01:23.640 | ChatGPT would be somewhere around this point.
00:01:26.340 | Lower than what is actually...
00:01:27.340 | privately available, but better than previous models down here.
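(For context, BIG-bench tasks are published as JSON files in the public google/BIG-bench repository on GitHub. Here is a minimal sketch of what one looks like; the field names follow the public format, but the task name and example content below are invented placeholders.)

```python
import json

# Sketch of the BIG-bench task format: each task ships as a task.json file
# containing a list of examples. Field names mirror the public repo; the
# sample content here is invented for illustration.
task = json.loads("""
{
  "name": "example_task",
  "examples": [
    {"input": "Q: Is Seattle on the Pacific coast?",
     "target_scores": {"yes": 1, "no": 0}}
  ]
}
""")

for ex in task["examples"]:
    print(ex["input"])
    # In multiple-choice tasks, the keys of target_scores are the options
    # and a score of 1 marks the correct answer.
    best = max(ex["target_scores"], key=ex["target_scores"].get)
    print("expected answer:", best)
```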
00:01:31.180 | But this just skims the surface.
00:01:32.860 | I want to show you in detail the eight ways that you can expect ChatGPT-4 or GPT-4 to beat the current ChatGPT.
00:01:41.900 | And no, that's not just because it's going to have more parameters off to the right of this graph,
00:01:46.580 | 10 to the 12, a trillion parameters.
00:01:48.780 | It's also because compute efficiency will improve.
00:01:51.340 | Chain of thought prompting will be integrated, and the number of tokens it's trained on might go up by an awful lot.
00:01:56.720 | This is a very important aspect of ChatGPT-4.
00:01:58.720 | And the amount of data it's trained on could also go up by an order of magnitude.
00:02:02.720 | Lots of reasons why GPT-4 will be better.
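(Since chain-of-thought prompting is one of the levers mentioned here, a minimal sketch of the technique may help. The worked example follows the style of the original chain-of-thought paper; the helper function is mine.)

```python
# Chain-of-thought prompting: rather than asking for an answer directly,
# the prompt includes a worked example whose answer spells out its
# reasoning step by step, nudging the model to do the same.

COT_EXAMPLE = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11.\n"
)

def chain_of_thought_prompt(question: str) -> str:
    """Prepend a worked, step-by-step example to the real question."""
    return COT_EXAMPLE + f"Q: {question}\nA:"

print(chain_of_thought_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
))
```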
00:02:04.720 | Let's start with logic and logical inference.
00:02:06.720 | This example comes from Google's PaLM research paper.
00:02:08.720 | The question or input was this.
00:02:10.720 | Shelley is from Virginia, but is visiting that city with that famous market where they throw the fish.
00:02:14.720 | So vague.
00:02:16.720 | Going home next Tuesday.
00:02:18.720 | Question, is it likely that Shelley will be near the Pacific Ocean this weekend?
00:02:20.720 | And you can see how the improved model is able to deduce that Shelley is indeed likely to be near the Pacific Ocean this weekend.
00:02:32.720 | Whereas if you ask current ChatGPT this question, what you get is, based on the information given, it's not possible to determine.
00:02:38.720 | The statement only mentions that Shelley is from Virginia and visiting a city with a famous market.
00:02:44.720 | It really can't handle it. It can't do that level of logical inference.
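(If you want to try this comparison yourself, here is a rough sketch using the OpenAI Python client as it existed at the time; the model name is an assumption, so substitute whichever model you have access to.)

```python
import openai  # pip install openai (the 0.27-era interface is shown)

openai.api_key = "YOUR_API_KEY"  # placeholder

# The prompt from the PaLM paper example discussed above.
QUESTION = (
    "Shelley is from Virginia, but is visiting that city with that famous "
    "market where they throw the fish. Going home next Tuesday! "
    "Is it likely that Shelley will be near the Pacific Ocean this weekend?"
)

# The model name is an assumption, not a recommendation from the video.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": QUESTION}],
)
print(response["choices"][0]["message"]["content"])
```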
00:02:48.720 | Here is another great example.
00:02:50.720 | This test of critical reasoning and logic was designed again for the BIG-bench benchmark.
00:02:56.720 | And it was tested on different language models.
00:03:00.720 | And most of them fail, including ChatGPT.
00:03:02.720 | I gave it this question and it picked the wrong answer.
00:03:06.720 | You can examine the question yourself, but C is not the correct answer.
00:03:10.720 | It gets it wrong.
00:03:12.720 | However, let's take a look at the graph beneath at other language models.
00:03:16.720 | Ones to come. GPT-4 maybe.
00:03:18.720 | And look what happens.
00:03:20.720 | As the models increase in effective parameter count and other things like token size,
00:03:26.720 | look at the performance.
00:03:28.720 | We start to beat not only average raters but all previous models
00:03:32.720 | and approximate the performance of the best human raters.
00:03:38.720 | The top line is the best human rater.
00:03:40.720 | The blue line is the average human rater.
00:03:42.720 | These unreleased models,
00:03:44.720 | The three shot means it was given three examples of what was expected before being tested.
00:03:50.720 | These best models, and you can imagine GPT-4 would be around the same level,
00:03:54.720 | crush what ChatGPT was capable of.
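(To make "three-shot" concrete, here is a minimal sketch; the demonstration pairs are invented placeholders, not items from the actual benchmark.)

```python
# Few-shot prompting: the model sees k solved examples ("shots") before
# the real question. Three-shot means k = 3.

SHOTS = [
    ("Is 7 an even number?", "No"),
    ("Is 12 an even number?", "Yes"),
    ("Is 9 an even number?", "No"),
]

def three_shot_prompt(question: str) -> str:
    """Format three demonstrations followed by the unanswered question."""
    demos = "".join(f"Q: {q}\nA: {a}\n" for q, a in SHOTS)
    return demos + f"Q: {question}\nA:"

print(three_shot_prompt("Is 20 an even number?"))
```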
00:03:58.720 | You can imagine what this means in terms of GPT-4 giving more rigorous arguments.
00:04:04.720 | Or conversely, you can give vague inputs
00:04:08.720 | like this thing talking about a famous market where they throw the fish.
00:04:12.720 | And GPT-4 might well be able to understand exactly what you mean.
00:04:16.720 | And to be honest, if you thought that's interesting,
00:04:18.720 | we are just getting started.
00:04:20.720 | Next, jokes.
00:04:22.720 | On the left you can see a computer science-y type of joke
00:04:26.720 | that it was able to explain.
00:04:28.720 | But I tested ChatGPT on a variety of jokes
00:04:30.720 | and some of them it could explain,