back to index

The Model That Changes Everything: Alpaca Breakthrough (ft. Apple's LLM, BritGPT, Ernie and AlexaTM)


Whisper Transcript | Transcript Only Page

00:00:00.000 | A little under 72 hours ago, a language model was released that could end up being as consequential
00:00:06.720 | as GPT-4. Now I know you're thinking that's a bold claim, but let's see if you agree with it
00:00:12.980 | after watching what happened. I will explain as best as I can what was released and how revelations
00:00:18.520 | in the last 24 hours from Apple, Amazon, Britain and Baidu make it particularly significant.
00:00:26.200 | The model was Stanford's Alpaca and here is the key line. Alpaca behaves qualitatively similarly
00:00:33.840 | to OpenAI's Text DaVinci 3 while being surprisingly small and easy and cheap to reproduce at under $600.
00:00:44.340 | Now that is cool, but how does that change the world? Well, first it wasn't supposed to get this
00:00:51.080 | cheap this fast. Just six weeks ago or five weeks before they released
00:00:56.080 | the first version of the Alpaca, the model was Stanford's Alpaca. And here is the key line.
00:00:56.180 | The model was Stanford's Alpaca. And here is the key line.
00:00:56.680 | ARK Investment Management put out this prediction that the 2020 cost of GPT-3 at $4.6 million
00:01:04.960 | would take until 2030 to fall to something as insignificant as $30. If Stanford have done what
00:01:12.660 | they claim, then 99% of this cost reduction has happened within five weeks of this prediction
00:01:19.760 | being published, not eight years. As AI researcher Eliezer Yudkowsky puts it, I don't think
00:01:26.160 | people realize what a big deal it is that Stanford retrained a Lama model by cheaply fine-tuning it.
00:01:33.180 | Now I'm going to explain all of this in a moment. He then goes on, I'm not sure I can convey how
00:01:37.860 | much this is a brand new idiom of AI as a technology. Now Stanford claim their model
00:01:43.260 | performs comparably to DaVinci 3, which is GPT-3.5. Of course, I'm going to test and analyze this in
00:01:49.740 | a moment, but how could it be that a $600 model can compete with ChatGPT? Well, do you remember
00:01:56.140 | how Meta open sourced their Lama models about two weeks ago? Stanford used the weakest of these
00:02:03.040 | open source models, the $7 billion parameter one, and then essentially they recruited GPT-3.5
00:02:09.640 | to train that Meta model. How could they possibly do this? Well, they used Self-Instruct,
00:02:16.280 | and I dug into the literature to find the original paper on Self-Instruct. This was
00:02:21.740 | released in December of last year, and I'm going to give you the 30-second summary,
00:02:26.120 | of how it works. Essentially, you start off with some human-made examples of exemplar prompts and
00:02:32.400 | outputs. These are fed into the language model, and then you ask it to generate thousands more
00:02:37.660 | such instances. You filter out the bad ones, and then put all the good examples back into the
00:02:43.260 | language model. Then it understands the instructions much better and produces thousands more examples.
00:02:48.080 | As the paper says, this is almost human annotation free. And remember this stat, it only leaves a 5%
00:02:56.100 | gap behind Instruct GPT. What is Instruct GPT? Well, it's the breakthrough that led to ChatGPT
00:03:03.260 | in the first place. Look at the original GPT-3. If you gave it a prompt like,
00:03:07.700 | explain the moon landing to a six-year-old in a few sentences, you got this gobbledygook here.
00:03:12.300 | After months of onerous human training, called reinforcement learning with human feedback,
00:03:17.240 | it was able to follow instructions much better and produce an outcome like this.
00:03:22.080 | But this relied on so much human labeling and human
00:03:25.880 | rationale that it was able to do so much better. And so, this is what we're going to
00:03:26.080 | see in the next episode.
00:03:56.060 | the services like ChatGPT to develop models that compete with OpenAI. So, they knew it was possible
00:04:01.720 | and even Stanford admit that this breakthrough enables more people, including bad actors,
00:04:06.740 | to create new cheap models. Yudkowsky also points out that one of the reasons why ChatGPT and GPT-4
00:04:13.680 | are so good is that they rest on proprietary data and that that was supposed to give them
00:04:18.460 | a competitive moat, which is now revealed people can quite cheaply steal. Just before I test and
00:04:25.320 | demonstrate our results, I'm going to show you a video of a chat GPT-3 that I made.
00:04:26.040 | Let me summarize how it works. Using the self-instruct process, you get GPT-3.5 similar to
00:04:33.640 | ChatGPT to create thousands and thousands, in this case, 52,000 instruction following examples,
00:04:40.360 | automatically filtered by quality. Stanford then used an open source model,
00:04:44.920 | indeed the weakest of the LAMA models, and trained it using those examples.
00:04:49.720 | The end result? Alpaca. So, let's see it in action and compare it to ChatGPT and GPT-4.
00:04:55.880 | OpenAI is a very powerful tool for learning and learning. It's a very powerful tool for learning and
00:04:56.020 | learning. Oh, and just quickly, you know that training of the LAMA model with those 52,000
00:04:59.780 | examples? It only took three hours and cost less than $100. The first example I'm going to show
00:05:05.300 | you does not come from me. I found it in this academic paper linked in the description. And
00:05:10.180 | it's a task which requires understanding detailed and dissonant scenarios, applying appropriate
00:05:16.260 | legal precedents, and choosing the correct explanation. The correct answer, if you want
00:05:20.500 | to read through it or not, is B. Alpaca gets this question right. Or I should say it gets it right
00:05:26.000 | about 80% of the time. You can keep clicking generate and sometimes you do get the answer D,
00:05:30.640 | but about 80% of the time, four times in five, you get the correct answer B. How about ChatGPT?
00:05:36.000 | Well, every time I've tried it, it's gotten the wrong answer of C. And GPT-4? Shocking even to me,
00:05:42.160 | it also gets it wrong and picks C. Now, before you get too excited, I am not saying that it is better
00:05:48.560 | than or even as good as GPT-4 or ChatGPT. It's not. But remember, it's only 7 billion parameters.
00:05:55.980 | And 600 dollars worth. Take this example. I asked it for an example of an animal that begins with
00:06:01.360 | the same letter as the capital city of France. And it said elephant. No idea where it got that.
00:06:06.640 | Now, in fairness, ChatGPT gave me lion and GPT-4 gave me ferret. But there are other questions
00:06:13.320 | where alpaca definitely flops. For example, this math question, which ChatGPT and GPT-4 uniformly
00:06:19.960 | get right, alpaca simply gets it wrong every time. I tried asking it in lots of different ways with
00:06:25.960 | chain of thought prompting. But no, every time it gets it wrong. It's definitely not better than
00:06:30.760 | those models. But by the end of the video, you'll see why it's revolutionary anyway.
00:06:34.780 | At this point, if you're learning anything, please don't forget to leave a like or a comment to let
00:06:39.020 | me know. Basic addition and subtraction, it does better. And yes, it can crank out poems,
00:06:43.920 | solve some hella swag common sense problems, and generate literary analogies.
00:06:49.400 | But at this point, I want to remind you of three things. First, that it was using the weakest of the
00:06:55.940 | more open source models. They could have used the 65 billion parameter model for a bit more cost.
00:07:01.420 | I'm sure the results would have been even more impressive. Next, you remember it was trained
00:07:06.080 | by examples generated using the DaVinci 3 model. Well, that cost them about $0.03 per 1000 tokens.
00:07:14.240 | But as of 48 hours ago, they could have used the GPT-4 API at a very similar cost.
00:07:21.720 | So it wasn't the best open source model, and it wasn't trained by the best
00:07:25.920 | GPT model. I am genuinely curious as to what the results would have been if it had been trained by
00:07:31.140 | the 65 billion parameter model using a GPT-4 API. Maybe someone's going to do that, maybe even this
00:07:37.500 | week. But just before we get on to Apple, Amazon, Britain, and Baidu, I just want to restate this
00:07:42.800 | was all done for $600 or less. They even say there were training efficiencies they could have done,
00:07:48.540 | for example, using the H100 GPUs, that would have further reduced the cost. The question is, if it's
00:07:55.900 | going to facilitate a larger model, what's going to happen when Apple release their large language
00:07:59.940 | model? It was only revealed yesterday in the New York Times that they are indeed working on one.
00:08:05.140 | And don't forget, they have far more money than the other companies mentioned. Amazon recently
00:08:10.020 | stated that they have been working on similar tech to ChatGPT for a long time. And looking in
00:08:16.100 | the literature, as early as mid last year, they had a model called Alexa TM that outperformed GPT-3.
00:08:23.560 | And as you may already know, Baidu,
00:08:25.880 | demonstrated their Ernie bot today, although they didn't allow anyone else to use it. Apparently,
00:08:31.260 | it's better in the Chinese language than even GPT-4. But because they didn't release a paper
00:08:36.380 | and we can't check it, we simply don't know. And of course, we can't forget Google, who just two
00:08:41.200 | days ago announced the Palm API. What would have happened if Stanford's model had used that one?
00:08:47.000 | I'm sure we will soon find out. But to take us back to the start, I have one overriding observation
00:08:52.800 | and two questions. First, these models,
00:08:55.860 | weren't supposed to get this cheap this fast. That is going to upend the economics of large
00:09:01.680 | language models. And my questions are these. Does this mean that all incentive is gone for Microsoft
00:09:07.680 | or Google to pour in billions of dollars producing these cutting edge models if anyone can just
00:09:13.360 | easily reproduce them? Will they react by making the models even more closed and disallowing GPT-5
00:09:20.040 | from having an API? We don't know. But as even nation states enter this quote unquote
00:09:25.840 | thumbs race, spending hundreds of millions of pounds, in this case to build Brit GPT,
00:09:31.300 | are these companies and governments drifting into a war on two fronts where they compete with each
00:09:37.320 | other, but also with outsiders who are trying to cheaply imitate their models? If you've learned
00:09:42.960 | anything in this video, please do leave a like and leave a comment. But either way, have a wonderful