The Model That Changes Everything: Alpaca Breakthrough (ft. Apple's LLM, BritGPT, Ernie and AlexaTM)
00:00:00.000 |
A little under 72 hours ago, a language model was released that could end up being as consequential 00:00:06.720 |
as GPT-4. Now I know you're thinking that's a bold claim, but let's see if you agree with it 00:00:12.980 |
after watching what happened. I will explain as best as I can what was released and how revelations 00:00:18.520 |
in the last 24 hours from Apple, Amazon, Britain and Baidu make it particularly significant. 00:00:26.200 |
The model was Stanford's Alpaca and here is the key line. Alpaca behaves qualitatively similarly 00:00:33.840 |
to OpenAI's Text DaVinci 3 while being surprisingly small and easy and cheap to reproduce at under $600. 00:00:44.340 |
Now that is cool, but how does that change the world? Well, first it wasn't supposed to get this 00:00:51.080 |
cheap this fast. Just six weeks ago, 00:00:56.080 |
ARK Investment Management put out this prediction that the 2020 cost of GPT-3 at $4.6 million 00:01:04.960 |
would take until 2030 to fall to something as insignificant as $30. If Stanford have done what 00:01:12.660 |
they claim, then 99% of this cost reduction has happened within five weeks of this prediction 00:01:19.760 |
being published, not eight years. As AI researcher Eliezer Yudkowsky puts it, I don't think 00:01:26.160 |
people realize what a big deal it is that Stanford retrained a LLaMA model by cheaply fine-tuning it. 00:01:33.180 |
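To put that cost collapse in numbers, here is a quick back-of-the-envelope check, taking the figures quoted above (ARK's $4.6 million 2020 cost, its $30 target for 2030, and Stanford's roughly $600) at face value:

```python
# Back-of-the-envelope check of the cost collapse described above,
# using the figures quoted in the video.
gpt3_cost_2020 = 4_600_000   # ARK's quoted 2020 cost of training GPT-3
ark_2030_target = 30         # ARK's predicted cost by 2030
alpaca_cost = 600            # Stanford's reported all-in cost for Alpaca

predicted_total_drop = gpt3_cost_2020 - ark_2030_target
actual_drop_so_far = gpt3_cost_2020 - alpaca_cost

share_realized = actual_drop_so_far / predicted_total_drop
print(f"{share_realized:.2%} of the predicted 2020->2030 cost drop")
# -> 99.99% of the predicted reduction, five weeks after the forecast
```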
Now I'm going to explain all of this in a moment. He then goes on, I'm not sure I can convey how 00:01:37.860 |
much this is a brand new idiom of AI as a technology. Now Stanford claim their model 00:01:43.260 |
performs comparably to DaVinci 3, which is GPT-3.5. Of course, I'm going to test and analyze this in 00:01:49.740 |
a moment, but how could it be that a $600 model can compete with ChatGPT? Well, do you remember 00:01:56.140 |
how Meta open sourced their LLaMA models about two weeks ago? Stanford used the weakest of these 00:02:03.040 |
open source models, the 7 billion parameter one, and then essentially they recruited GPT-3.5 00:02:09.640 |
to train that Meta model. How could they possibly do this? Well, they used Self-Instruct, 00:02:16.280 |
and I dug into the literature to find the original paper on Self-Instruct. This was 00:02:21.740 |
released in December of last year, and I'm going to give you the 30-second summary 00:02:26.120 |
of how it works. Essentially, you start off with some human-made examples of exemplar prompts and 00:02:32.400 |
outputs. These are fed into the language model, and then you ask it to generate thousands more 00:02:37.660 |
such instances. You filter out the bad ones, and then put all the good examples back into the 00:02:43.260 |
language model. Then it understands the instructions much better and produces thousands more examples. 00:02:48.080 |
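That loop can be sketched in a few lines of Python. This is only a schematic: the `generate` function is a stub standing in for a real language-model API call, and the quality filter is a toy; the actual Self-Instruct pipeline also does ROUGE-based deduplication and filters malformed generations.

```python
import random

def generate(prompt_examples, n):
    """Stub for a language-model call: in the real pipeline this would
    prompt a model like text-davinci-003 with the pooled examples."""
    return [f"synthetic instruction #{random.randint(0, 10**6)}" for _ in range(n)]

def passes_filter(example, seen):
    """Toy quality filter: real Self-Instruct rejects near-duplicates
    (ROUGE-L overlap) and malformed or unsafe generations."""
    return example not in seen

def self_instruct(seed_examples, rounds=3, per_round=1000):
    pool = list(seed_examples)                   # human-written seed tasks
    for _ in range(rounds):
        candidates = generate(pool, per_round)   # model proposes new tasks
        good = [c for c in candidates if passes_filter(c, set(pool))]
        pool.extend(good)                        # good examples rejoin the prompt pool
    return pool

dataset = self_instruct(["Explain the moon landing to a six-year-old."])
print(len(dataset))  # grows toward thousands of instruction examples
```

The key property is the bootstrap: each round's accepted outputs become part of the prompt for the next round, which is why the paper calls it almost human-annotation free.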
As the paper says, this is almost human annotation free. And remember this stat, it only leaves a 5% 00:02:56.100 |
gap behind InstructGPT. What is InstructGPT? Well, it's the breakthrough that led to ChatGPT 00:03:03.260 |
in the first place. Look at the original GPT-3. If you gave it a prompt like, 00:03:07.700 |
explain the moon landing to a six-year-old in a few sentences, you got this gobbledygook here. 00:03:12.300 |
After months of onerous human training, called reinforcement learning from human feedback, 00:03:17.240 |
it was able to follow instructions much better and produce an outcome like this. 00:03:22.080 |
But this relied on so much human labeling and human 00:03:25.880 |
feedback that it was slow and expensive. And OpenAI's terms of service prohibit using output from 00:03:56.060 |
services like ChatGPT to develop models that compete with OpenAI. So, they knew it was possible 00:04:01.720 |
and even Stanford admit that this breakthrough enables more people, including bad actors, 00:04:06.740 |
to create new cheap models. Yudkowsky also points out that one of the reasons why ChatGPT and GPT-4 00:04:13.680 |
are so good is that they rest on proprietary data and that that was supposed to give them 00:04:18.460 |
a competitive moat, which is now revealed people can quite cheaply steal. Just before I test and 00:04:25.320 |
demonstrate the results, let me summarize how it works. 00:04:26.040 |
Using the self-instruct process, you get GPT-3.5, similar to 00:04:33.640 |
ChatGPT to create thousands and thousands, in this case, 52,000 instruction following examples, 00:04:40.360 |
automatically filtered by quality. Stanford then used an open source model, 00:04:44.920 |
indeed the weakest of the LLaMA models, and trained it using those examples. 00:04:49.720 |
The end result? Alpaca. So, let's see it in action and compare it to ChatGPT and GPT-4. 00:04:55.880 |
Oh, and just quickly, you know that training of the LLaMA model with those 52,000 00:04:59.780 |
examples? It only took three hours and cost less than $100. The first example I'm going to show 00:05:05.300 |
you does not come from me. I found it in this academic paper linked in the description. And 00:05:10.180 |
it's a task which requires understanding detailed and dissonant scenarios, applying appropriate 00:05:16.260 |
legal precedents, and choosing the correct explanation. The correct answer, if you want 00:05:20.500 |
to read through it or not, is B. Alpaca gets this question right. Or I should say it gets it right 00:05:26.000 |
about 80% of the time. You can keep clicking generate and sometimes you do get the answer D, 00:05:30.640 |
but about 80% of the time, four times in five, you get the correct answer B. How about ChatGPT? 00:05:36.000 |
Well, every time I've tried it, it's gotten the wrong answer of C. And GPT-4? Shocking even to me, 00:05:42.160 |
it also gets it wrong and picks C. Now, before you get too excited, I am not saying that it is better 00:05:48.560 |
than or even as good as GPT-4 or ChatGPT. It's not. But remember, it's only 7 billion parameters. 00:05:55.980 |
And 600 dollars worth. Take this example. I asked it for an example of an animal that begins with 00:06:01.360 |
the same letter as the capital city of France. And it said elephant. No idea where it got that. 00:06:06.640 |
Now, in fairness, ChatGPT gave me lion and GPT-4 gave me ferret. But there are other questions 00:06:13.320 |
where alpaca definitely flops. For example, this math question, which ChatGPT and GPT-4 uniformly 00:06:19.960 |
get right, alpaca simply gets it wrong every time. I tried asking it in lots of different ways with 00:06:25.960 |
chain of thought prompting. But no, every time it gets it wrong. It's definitely not better than 00:06:30.760 |
those models. But by the end of the video, you'll see why it's revolutionary anyway. 00:06:34.780 |
At this point, if you're learning anything, please don't forget to leave a like or a comment to let 00:06:39.020 |
me know. Basic addition and subtraction, it does better. And yes, it can crank out poems, 00:06:43.920 |
solve some HellaSwag common sense problems, and generate literary analogies. 00:06:49.400 |
But at this point, I want to remind you of three things. First, that it was using the weakest of the 00:06:55.940 |
Meta open source models. They could have used the 65 billion parameter model for a bit more cost. 00:07:01.420 |
I'm sure the results would have been even more impressive. Next, you remember it was trained 00:07:06.080 |
by examples generated using the DaVinci 3 model. Well, that cost them about $0.03 per 1000 tokens. 00:07:14.240 |
But as of 48 hours ago, they could have used the GPT-4 API at a very similar cost. 00:07:21.720 |
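For a rough sense of what the data-generation bill looks like at that rate: the $0.03-per-1,000-tokens price and the 52,000 examples are from the video, but the average token count per example below is my own illustrative assumption.

```python
# Rough estimate of the API bill for generating the instruction data.
examples = 52_000
avg_tokens_per_example = 270   # assumed: instruction + input + output combined
price_per_1k_tokens = 0.03     # rate quoted in the video for the DaVinci model

total_tokens = examples * avg_tokens_per_example
api_cost = total_tokens / 1000 * price_per_1k_tokens
print(f"~{total_tokens:,} tokens -> ${api_cost:,.0f}")
# -> ~14,040,000 tokens -> $421
```

Under these assumptions, data generation lands in the low hundreds of dollars, which is consistent with the sub-$600 total, since the fine-tuning compute itself was under $100.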
So it wasn't the best open source model, and it wasn't trained by the best 00:07:25.920 |
GPT model. I am genuinely curious as to what the results would have been if it had been trained by 00:07:31.140 |
the 65 billion parameter model using a GPT-4 API. Maybe someone's going to do that, maybe even this 00:07:37.500 |
week. But just before we get on to Apple, Amazon, Britain, and Baidu, I just want to restate this 00:07:42.800 |
was all done for $600 or less. They even say there were training efficiencies they could have done, 00:07:48.540 |
for example, using the H100 GPUs, that would have further reduced the cost. The question is, if it's 00:07:55.900 |
going to facilitate a larger model, what's going to happen when Apple release their large language 00:07:59.940 |
model? It was only revealed yesterday in the New York Times that they are indeed working on one. 00:08:05.140 |
And don't forget, they have far more money than the other companies mentioned. Amazon recently 00:08:10.020 |
stated that they have been working on similar tech to ChatGPT for a long time. And looking in 00:08:16.100 |
the literature, as early as mid last year, they had a model called AlexaTM that outperformed GPT-3. 00:08:25.880 |
And Baidu demonstrated their Ernie bot today, although they didn't allow anyone else to use it. Apparently, 00:08:31.260 |
it's better in the Chinese language than even GPT-4. But because they didn't release a paper 00:08:36.380 |
and we can't check it, we simply don't know. And of course, we can't forget Google, who just two 00:08:41.200 |
days ago announced the PaLM API. What would have happened if Stanford's model had used that one? 00:08:47.000 |
I'm sure we will soon find out. But to take us back to the start, I have one overriding observation: 00:08:55.860 |
these models weren't supposed to get this cheap this fast. That is going to upend the economics of large 00:09:01.680 |
language models. And my questions are these. Does this mean that all incentive is gone for Microsoft 00:09:07.680 |
or Google to pour in billions of dollars producing these cutting edge models if anyone can just 00:09:13.360 |
easily reproduce them? Will they react by making the models even more closed and disallowing GPT-5 00:09:20.040 |
from having an API? We don't know. But as even nation states enter this quote unquote 00:09:25.840 |
arms race, spending hundreds of millions of pounds, in this case to build BritGPT, 00:09:31.300 |
are these companies and governments drifting into a war on two fronts where they compete with each 00:09:37.320 |
other, but also with outsiders who are trying to cheaply imitate their models? If you've learned 00:09:42.960 |
anything in this video, please do leave a like and leave a comment. But either way, have a wonderful day.