Falcon 180B — the Open Source GPT-4?
Chapters
0:00 Most Powerful Open Source LLM
0:38 Falcon 180B Release
2:25 Hugging Face Falcon 180B
3:10 Falcon 180B on LLM Leaderboards
4:55 Falcon 180B Hardware Requirements
5:47 Cost of Running Falcon 180B
8:24 Falcon 180B vs GPT-4
11:23 Falcon 180B vs GPT-4 on Code
17:49 Falcon 180B Summary
00:00:00.000 |
Today, we're going to take a look at the new Falcon 180B large language model. 00:00:08.880 |
It is apparently on par with the Bard LLM and is very close to GPT-4 level performance. 00:00:16.960 |
At the same time, it's licensed for commercial use and we can obviously access it ourselves. 00:00:23.880 |
Although obviously it does take quite a bit of hardware to actually run the thing. 00:00:28.320 |
So let's begin by just taking a look at what this model is and how it compares 00:00:36.080 |
to other models that are available right now. 00:00:38.760 |
OK, so Falcon is, again, it's from the Technology Innovation Institute. 00:00:46.600 |
It's an Abu Dhabi based, I assume, research lab. 00:00:51.280 |
And earlier this year they actually released Falcon 40B. 00:00:58.160 |
At the time, it was the best-performing pre-trained LLM on Hugging Face's LLM leaderboards. 00:01:07.120 |
Now, the 180B model is actually, again, at the top of those leaderboards. 00:01:16.720 |
And you can see here that it ranks just behind OpenAI's GPT-4, 00:01:24.040 |
especially when you consider the hypothesized, or sort of leaked, sizes of these models. 00:01:34.920 |
And GPT-4, if you believe George Hotz and Soumith Chintala of PyTorch, 00:01:42.760 |
who are both, you know, kind of in-the-know people, the total number 00:01:50.160 |
of parameters for GPT-4 would be over a trillion. 00:01:53.400 |
And that's because, according to them, 00:01:56.120 |
GPT-4 is 220 billion parameters in each head, and it's an 8-way mixture model. 00:02:01.520 |
Eight models, each with 220 billion parameters, 00:02:05.880 |
all combined with a mixture of experts approach. 00:02:09.640 |
So the fact that this model is close in performance to GPT-4, 00:02:15.120 |
despite being smaller than just one of those GPT-4 experts, I think that is kind of impressive. 00:02:36.680 |
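To put those rumored numbers in perspective, here is a quick back-of-envelope comparison; the GPT-4 figures are the unconfirmed rumors repeated in the video, not official numbers:

```python
# Rough comparison using the rumored GPT-4 numbers mentioned in the video.
gpt4_experts = 8
gpt4_params_per_expert = 220e9   # 220B per head (rumored, unconfirmed)
falcon_params = 180e9            # Falcon 180B

gpt4_total = gpt4_experts * gpt4_params_per_expert
print(f"Rumored GPT-4 total params: {gpt4_total / 1e12:.2f}T")              # ~1.76T
print(f"Falcon 180B vs one GPT-4 expert: {falcon_params / gpt4_params_per_expert:.0%}")  # ~82%
```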
And we can actually see, OK, they have a base model 00:02:43.280 |
and a chat model, which has been fine-tuned for chat. 00:02:46.600 |
And we can see here, not that many downloads so far. 00:02:50.200 |
I suppose it is pretty early days, but also it's a big model. 00:02:54.800 |
It's going to be hard for many people to actually deploy this. 00:03:00.200 |
OK, so, yeah, they talk about it a little bit here and even show us how we would use it. 00:03:10.040 |
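For reference, using it through the transformers library looks roughly like this. This is a sketch, assuming the tiiuae/falcon-180B model ID on the Hub (the model is gated, so you need to accept the license) and that you actually have hundreds of gigabytes of GPU memory available:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "tiiuae/falcon-180B"  # assumed model ID; gated, requires accepting the license

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 weights alone are roughly 360GB
    device_map="auto",            # shard across all available GPUs
)

inputs = tokenizer("Falcon 180B is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```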
Now, they do mention that it tops the leaderboard for pre-trained open-access models. 00:03:16.560 |
So we can come over to the Open LLM Leaderboard here. 00:03:19.720 |
And then what we can do is we want to look for pre-trained only. 00:03:27.680 |
So for some reason, I still have fine-tuned on there. 00:03:32.560 |
Oh, this is Streamlit or Gradio being annoying. 00:03:35.960 |
But anyway, OK, so all these with the little diamond here are the fine-tuned 00:03:41.160 |
models; we want to look at just pre-trained, right? 00:03:43.760 |
There are fine-tuned models that perform better than 00:03:48.240 |
Falcon 180B, but we come down here, the first model that is actually just 00:03:52.960 |
pre-trained, not fine-tuned, is Falcon 180B here, right? 00:03:56.400 |
So it's the highest performing pre-trained only model on the leaderboards. 00:04:03.840 |
I think there's something going on with the leaderboard at the moment. 00:04:13.480 |
It's at the top if you don't include all these fine-tuned models. 00:04:17.160 |
And I think the idea here is really that, OK, 00:04:21.360 |
yeah, right now there's all these fine-tuned models that are better than 00:04:25.400 |
the Falcon model, the pre-trained Falcon model. 00:04:30.200 |
But these fine-tuned models are fine-tuned from, like, lesser pre-trained models. 00:04:36.560 |
So the idea is that people are going to fine-tune Falcon 180B, 00:04:43.280 |
and we should very soon see fine-tuned models 00:04:48.920 |
that outperform these existing fine-tuned models. 00:04:58.160 |
So we can see here that most people are probably going to go for the minimum: 00:05:08.360 |
INT4 quantization, which will actually slow things down quite a lot. 00:05:22.360 |
That would require eight A100 GPUs, the 40GB ones, OK? 00:05:35.760 |
And that quantization shouldn't degrade performance significantly, or really by a noticeable amount at all. 00:05:53.880 |
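A minimal sketch of what 4-bit loading looks like with transformers and bitsandbytes, again assuming the tiiuae/falcon-180B model ID; the exact speed and quality trade-offs depend on the quantization kernels used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

# 4-bit quantization: roughly 90GB of weights instead of ~360GB in bf16,
# which is what makes an 8x A100 40GB setup feasible.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "tiiuae/falcon-180B"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard across the available GPUs
)
```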
so this is a SageMaker pricing page and we can see all the instances. 00:05:59.160 |
We need to come down to these ones here, right where they have a GPU model. 00:06:21.040 |
even for float16 precision, we could actually use this one here. 00:06:28.800 |
Now, for some reason, they don't have the actual pricing on this page. 00:06:33.360 |
I don't know why. Maybe it's just me, I don't know. 00:06:37.680 |
Which I don't understand, but fine, I'm just going to copy this and look it up. 00:06:44.840 |
Again, no idea why they don't put the pricing on the same page as the instances. 00:06:54.880 |
All right, so I think what we want here is the on-demand pricing. 00:07:13.320 |
OK, so maybe it's not accessible for most of us normal people. 00:07:28.680 |
That is enough to fit our quantized model, like a fully quantized model. 00:07:58.440 |
that works out to around twenty-two thousand dollars a month, and that's on a good month, you know, 00:08:05.720 |
the shortest month you can possibly have, which is quite a lot. 00:08:10.960 |
It makes you appreciate OpenAI's pricing a little bit. 00:08:16.200 |
So, yeah, that would be relatively expensive to run. 00:08:21.520 |
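As a rough sanity check on that figure, here's the back-of-envelope math, assuming an 8x A100 instance at around $33 per hour on-demand (the actual rate depends on the instance type and region):

```python
hourly_rate = 33.0          # assumed on-demand price for an 8x A100 40GB instance, USD/hour
hours_in_month = 28 * 24    # "a good month" = the shortest month, February

monthly_cost = hourly_rate * hours_in_month
print(f"~${monthly_cost:,.0f} per month")  # roughly $22,000
```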
But let's say we do want to run it. 00:08:28.160 |
Let's see, there is a demo somewhere. 00:08:36.040 |
OK, so we have this little demo here, the Falcon 180B demo. 00:08:41.480 |
We can ask it questions, and I think this actually runs pretty quickly. 00:08:48.040 |
I'm going to ask it to tell me about the latest news on LLMs. 00:09:04.400 |
OK, I want to know what its knowledge cut off is. 00:09:14.520 |
What's your knowledge cutoff date? I'm not sure why it struggled with that so much. 00:09:57.400 |
It gives a ChatGPT release date, but misses by a year. 00:10:25.800 |
So GPT-4 also doesn't know; it says its last update was in September 2021, 00:10:37.640 |
although it seems to be a bit confused about when that knowledge is from. 00:10:42.240 |
Are you sure ChatGPT was released on that date? 00:10:55.440 |
So, yeah, they seem to be a little confused about dates here. 00:11:01.000 |
Anyway, nonetheless, it's at least November or December 2022, 00:11:10.360 |
which is at least a year later than GPT-4's cutoff. 00:11:17.680 |
Now, okay, let's ask it something coding related. 00:11:22.560 |
Now, one of the things I always ask GPT-4 about is code. 00:11:33.120 |
And this is from a project I covered earlier this year, 00:11:41.040 |
a conversational agent using OpenAI's function calling. 00:11:45.160 |
I'm just kind of curious how hard it would be. 00:11:47.520 |
You can see here, if you're interested, it's FuncAgent. 00:11:53.040 |
I could be wrong. I think you can pip install FuncAgent. 00:11:56.400 |
Now, I want to see: how does this model do with code? 00:12:02.440 |
Right, so one thing we should do: there are additional inputs here. 00:12:07.000 |
We should reduce the temperature, in my opinion. 00:12:28.720 |
So I'm going to say, can you tell me what this code is doing? 00:12:37.520 |
I'm going to give it the Python, paste my code in here and submit. 00:12:50.880 |
Okay. So there is actually a limitation here. 00:12:53.680 |
It tells us a demo is limited to a session length of 1000 words. 00:12:57.800 |
So I had to remove a few of the functions or methods from the code. 00:13:01.720 |
But that's good, because now we can ask GPT-4 and Falcon 180B: 00:13:08.240 |
What is missing or what needs to be improved? 00:13:16.160 |
Now I have the code here: on the left is the full code. 00:13:20.920 |
It's 114 lines; on the right is my modified code, 48 lines. 00:13:29.240 |
I'm missing the call function and final thought answer methods. 00:13:32.240 |
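For context, here's a very rough sketch of the shape of that class; the attribute and method names are my own guesses based on what's described in the video, not the actual FuncAgent source:

```python
class Agent:
    """Illustrative skeleton only -- not the actual FuncAgent source."""

    def __init__(self, openai_api_key: str, functions: list):
        # Store the API key, the callable tools, chat history, and internal thoughts.
        self.openai_api_key = openai_api_key
        self.functions = {fn.__name__: fn for fn in functions}
        self.chat_history = []
        self.internal_thoughts = []

    def ask(self, query: str) -> dict:
        # Run an internal thought loop that may call functions before answering,
        # stopping when a final answer is produced or a thought limit is reached.
        raise NotImplementedError("illustrative stub")

    def _call_function(self, name: str, **kwargs):
        # One of the methods removed to fit the demo's 1000-word limit:
        # execute the chosen function and append its result to the internal thoughts.
        raise NotImplementedError("illustrative stub")

    def _final_thought_answer(self) -> str:
        # The other removed method: condense the internal thoughts into a final answer.
        raise NotImplementedError("illustrative stub")
```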
So let's say, okay, can you explain this code? 00:13:44.400 |
So I mean, it's really quick to respond, which is cool. 00:13:51.880 |
that initializes an API key and a list of functions to be used with 00:14:00.520 |
OpenAI's API, plus a chat history and internal thoughts. 00:14:03.520 |
So it handles various conditions, like reaching the limit on internal thoughts. 00:14:15.560 |
framework for an AI agent that can respond to user queries and use predefined 00:14:20.800 |
functions to generate more complex responses. 00:14:30.680 |
So, same again, I'm going to ask it to explain this code. 00:14:35.840 |
This code defines a Python class called Agent, 00:14:39.120 |
presumably to utilize models like GPT-4, and so on and so on. 00:14:50.680 |
So the init method initializes the object's attributes... 00:14:59.840 |
It hasn't got the... it just says that this is to generate responses in a chat-like 00:15:06.880 |
manner, which is, yes, kind of true, but it's missing the point of a few items. 00:15:15.200 |
So I think the description of the components of the code is pretty good. 00:15:23.880 |
So Falcon didn't, at least within that initial response, 00:15:29.000 |
which is a lot shorter, to be fair, it didn't mention missing pieces. 00:15:40.560 |
possibly invoking some internal functions, and returns a response back to the user. 00:15:47.720 |
It does identify these things I'm missing, which is cool. 00:15:51.920 |
But it doesn't really give us an idea of what the code is actually doing. 00:15:57.680 |
There's nothing here that's like, oh, this is an agent. 00:16:06.640 |
but it doesn't tell us how those internal thoughts are used. Now, the Falcon one here: 00:16:15.720 |
It uses predefined functions to generate more complex responses. 00:16:20.960 |
The ask method creates an internal thought process for the agent, 00:16:24.760 |
which can include function calls and final answers. 00:16:28.920 |
And it mentions internal thoughts here as well. 00:16:32.560 |
I think overall, this explanation is easy to understand to me, at least. 00:16:37.360 |
But it hasn't mentioned anything about what is missing. 00:16:40.760 |
I'm just going to ask, are there any missing methods in this code? 00:16:53.320 |
However, without additional context about the purpose of this class, 00:16:57.160 |
it's difficult to determine if there are any missing methods. 00:17:05.400 |
Are there any class methods that are referred to, but missing in this code block? 00:17:18.520 |
I don't think I can get any more specific than that. 00:17:26.560 |
So it doesn't seem to do so well at identifying issues with the code. 00:17:31.480 |
Interesting. But I think the actual explanation of the code is pretty good. 00:17:37.080 |
So I think, okay, when you're developing something, 00:17:44.240 |
GPT-4 is probably going to be more useful there because it's way more specific. 00:17:49.720 |
I mean, there's quite a lot to consider here. 00:17:53.440 |
As they said, it's maybe not too far behind GPT-4. 00:18:04.240 |
It's really hard to justify against just paying OpenAI's prices. 00:18:12.000 |
But at the same time, if privacy and keeping your data local matter, 00:18:16.520 |
especially in the EU where you have things like GDPR, 00:18:20.680 |
there are maybe some cases where this sort of model is actually 00:18:26.520 |
the only alternative you would have to something like GPT-4, despite the cost. 00:18:33.080 |
It's going to be an expensive alternative, unfortunately, for now. 00:18:36.680 |
But I'm sure over time it will probably decrease in price. 00:18:40.800 |
There will be more optimized ways of deploying these models. 00:18:44.200 |
But anyway, for now, that's it for this video. 00:18:51.320 |
I'm sure we're going to see a lot of interesting fine-tuned models come out 00:18:55.480 |
of this that will be even cooler, but yeah, I'll leave it there for now. 00:19:00.840 |
I hope it's been useful and interesting, and I will see you again in the next one.