Getting Started with GPT-3 vs. Open Source LLMs - LangChain #1
Chapters
0:00 Getting LangChain
1:14 Four Components of LangChain
6:43 Using Hugging Face and OpenAI LLMs in LangChain
7:13 LangChain Hugging Face LLM
13:51 OpenAI LLMs in LangChain
18:58 Final results from GPT-3
Today we're going to get started with what will be a series of videos, tutorials, and examples covering the LangChain framework. 00:00:10.240 |
Now LangChain is a pretty new NLP framework that has become very popular very quickly. 00:00:18.400 |
At the core of LangChain you have large language models and the idea behind it is that we can 00:00:24.920 |
use the framework to build very cool apps using large language models very quickly. 00:00:31.800 |
We can use it for chatbots, generative question answering, summarization, logic loops that 00:00:38.480 |
include large language models and web search and all these crazy different things that 00:00:44.080 |
we can chain together in some sort of logical fashion. 00:00:48.680 |
In this video what we are going to do is just have a quick introduction to LangChain and 00:00:55.840 |
we're going to take a look at the core components of what will make up our chains in LangChain 00:01:02.240 |
and we're going to look at some very simple generative language examples using both the 00:01:09.520 |
HuggingFace endpoint in LangChain and the OpenAI endpoint in LangChain. 00:01:14.480 |
So let's get started by having a look at what I believe are the main four components of LangChain. 00:01:21.880 |
So we have prompt templates, large language models, agents and memory. 00:01:26.620 |
Now prompt templates are actually pretty straightforward. 00:01:30.760 |
They are templates for different types of prompts. 00:01:34.840 |
Now let me actually show you a couple of examples from an app I built a while ago. 00:01:39.840 |
So in this app here we have all these different styles, so these instructions that we can 00:01:45.360 |
pass to a large language model, we have conservative Q&A, so basically we want to answer a question 00:01:51.280 |
based on the context below and if the question can't be answered based on the context say 00:01:56.040 |
I don't know and then you feed in the context and you feed in these questions. 00:02:00.040 |
We have simple instructions: given the common questions and answers below, I think we would 00:02:05.440 |
be feeding in the questions here, and these would be the answers. 00:02:12.080 |
Extract key libraries and tools, so this is talking about extracting like code libraries 00:02:16.800 |
that you would use, so write a list of libraries and tools present in the context below and 00:02:22.080 |
then this would basically return items from a database and you'd see from that a set of 00:02:28.720 |
libraries that were mentioned in whatever information you've retrieved. 00:02:32.440 |
So these are the types of things I mean when I say we have prompt templates. 00:02:38.120 |
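As a rough illustration, here is how a template like the conservative Q&A style above might be written as a LangChain prompt template; this is a minimal sketch, and the exact wording and the `context`/`question` variable names are my own, not from the app shown in the video.

```python
from langchain import PromptTemplate

# A "conservative Q&A" style template: answer only from the provided
# context, otherwise say "I don't know"
template = """Answer the question based on the context below. If the
question cannot be answered using the information provided, answer
with "I don't know".

Context: {context}

Question: {question}

Answer: """

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)
```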
Next of course we have large language models, I don't think I really need to explain them, 00:02:41.400 |
it's just big models that are capable of doing pretty incredible things, like GPT-3, BLOOM, and so on. 00:02:48.880 |
Next we have agents, now agents are processes that use large language models to decide what 00:02:56.180 |
actions should be taken given a particular query or set of instructions or so on. 00:03:01.660 |
So these can be paired with tools like web search or calculators and we package them 00:03:06.720 |
all into this logical loop of operations, now it sounds pretty complicated so it's probably 00:03:10.800 |
best I just show you an example of what this is. 00:03:13.980 |
So if we go over to the LangChain website, they have a really cool example in agents, 00:03:18.640 |
getting started and we'll just scroll down a little bit and we can see here an example. 00:03:26.360 |
So this is the agent executor chain, so there's a few components in here, the first thing 00:03:31.920 |
that comes in is a thought from the large language model and so we're basing it on this 00:03:37.760 |
query here: who is Olivia Wilde's boyfriend, and what is his current age raised to the 0.23 power? 00:03:44.160 |
So there's a few logical steps in this process, and this is why we might need to use something like an agent. 00:03:49.120 |
So the model, the large language model, says okay, from this, its thought is I need to 00:03:55.200 |
find out who Olivia Wilde's boyfriend is and then calculate his age raised to 0.23 power, 00:04:01.040 |
the action here that the agent is deciding is search, okay and then it decides okay the 00:04:08.460 |
input for this search action must initially be Olivia Wilde's boyfriend. 00:04:14.600 |
Now this here, so this defines that we're going to use a web search component, it goes 00:04:20.200 |
to the web search component, types this in and the result that it gets is this, Harry 00:04:25.200 |
Styles, so that's the observation based on what we have so far and this is part of a 00:04:30.520 |
specific agent framework called ReAct, and at some point in the future we will definitely cover it in more detail. 00:04:38.120 |
For now let's continue with this, based on this observation the language model now thinks 00:04:44.000 |
okay I need to find out Harry Styles' age, so it starts the search again, it searches 00:04:49.060 |
for his age, it gets 28 years and then the next thought is I need to calculate 28 raised 00:04:54.400 |
to the 0.23 power, goes to the calculator action this time, so not search, it calculates 00:04:59.960 |
this and we get the answer here, okay and then the final thought is I know the final 00:05:04.640 |
answer, the final answer is this, okay so that's an example of one of these agents using 00:05:11.600 |
multiple tools, in here we have the calculator and also the search tool as well. 00:05:17.280 |
So I think agents are a pretty exciting and cool use of large language models. 00:05:23.000 |
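To make that concrete, here is a minimal sketch of how an agent like the one in that example could be set up; the `load_tools`/`initialize_agent` calls and the `serpapi`/`llm-math` tool names follow LangChain's agents getting-started docs, and you would need a SerpAPI key for the search tool.

```python
from langchain.agents import load_tools, initialize_agent
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

# "serpapi" gives the agent web search, "llm-math" gives it a calculator
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# "zero-shot-react-description" is the ReAct-style agent from the docs
agent = initialize_agent(
    tools, llm, agent="zero-shot-react-description", verbose=True
)

agent.run(
    "Who is Olivia Wilde's boyfriend? "
    "What is his current age raised to the 0.23 power?"
)
```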
And then the final one is memory, so we have short-term and long-term memory for our 00:05:27.240 |
models, now again this is really interesting, for long-term memory if you have watched my 00:05:34.800 |
videos, if you've read my articles or anything like that in the past, you have probably come across it before. 00:05:40.000 |
We're going to take a look at this getting started, we have this conversation buffer 00:05:44.240 |
memory which essentially you would use in a chatbot and it will just remember all the 00:05:50.660 |
previous inputs and outputs and add them into your next set of generations, so this 00:05:56.240 |
is what you would use to have a conversation with a chatbot where it's remembering the 00:06:02.720 |
previous steps of that conversation, there's some different versions of that like conversation 00:06:08.200 |
summary memory and all of this is essentially what I would refer to as the short-term memory 00:06:15.120 |
and then on the other side, so for long-term memory you have the data augmented generation 00:06:20.640 |
stuff which is essentially where you're retrieving bits of information to feed into your model 00:06:26.440 |
from an external data source and that would just allow it to essentially answer questions 00:06:33.960 |
in a specific domain better or keep more up-to-date information or simply allow us to fact check 00:06:41.300 |
what the large language model is actually saying. 00:06:43.920 |
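As a quick sketch of the short-term memory side, a conversation buffer memory can be attached to a conversation chain roughly like this, assuming the `ConversationChain` and `ConversationBufferMemory` classes from LangChain's getting-started docs:

```python
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

llm = OpenAI(temperature=0)

# The buffer memory stores all previous human/AI messages and feeds
# them back into the prompt on each new turn
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

conversation.predict(input="Hi there!")
conversation.predict(input="What was the first thing I said to you?")
```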
Now those are the main components of LangChain, and what we'll do now is actually just get 00:06:49.160 |
started and we're going to do something really simple which is just using large language 00:06:52.880 |
models in LangChain. So to get started we need to just pip install langchain, so this 00:06:58.720 |
will obviously install the library and what we will do is just go through some really 00:07:03.720 |
basic examples of using LangChain for large language model generation with both OpenAI 00:07:10.960 |
and HuggingFace, so let's get started with HuggingFace, now if you would like to follow 00:07:17.240 |
along with this I'll leave a link to this notebook in the top right of the video right 00:07:22.440 |
now or you can click a link in the video description to take you to this colab, so with HuggingFace 00:07:30.200 |
we need to install the HuggingFace hub as a prerequisite and what's actually going to 00:07:35.720 |
happen here is we're not going to be running HuggingFace models locally, we're actually 00:07:39.560 |
going to be calling their inference API and we're going to be getting results directly 00:07:45.000 |
from that, so to do that we actually do need a HuggingFace API token and this is all free 00:07:52.600 |
by the way, so to get that we need to go to huggingface.co and if you don't have an account 00:08:02.240 |
you'll need to sign up for one, I believe the sign up will be over here on the top right 00:08:06.360 |
of the web page, you need to click here if you have signed in and you need to go to settings 00:08:13.280 |
then you head over to access tokens and you will need to get a token; I think you can actually 00:08:19.200 |
just use a read token, but a write token works as well. In either case, if this is 00:08:24.840 |
your first time you will need to click new token, either choose read or write, I'm going 00:08:30.360 |
to go with write because I know that one does definitely work, you just have to write something 00:08:34.360 |
in here and then you click generate token, I've already created mine so I'm just going 00:08:38.680 |
to copy this okay and then with that you would just put it into here, now I've already set 00:08:46.680 |
my environment variable here, so I'm not going to do it again. 00:08:53.760 |
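The setup cells look roughly like this; a sketch assuming the token is set via the `HUGGINGFACEHUB_API_TOKEN` environment variable, which is the name LangChain expects:

```python
# In a Colab/Jupyter notebook:
# !pip install langchain huggingface_hub

import os

# Paste your HuggingFace access token here (read or write both work)
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."
```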
Then we can come down and start generating text using a HuggingFace model from the HuggingFace Hub. There are a 00:09:00.600 |
few things that we'll need for this, we're going to be using a prompt template which 00:09:04.400 |
is a template for our prompt as I mentioned before, we're going to be using HuggingFace 00:09:09.200 |
hub class from LangChain and we're also going to be using this chain which is like a pipeline 00:09:16.360 |
or a chain of steps from LangChain, now this one is pretty simple it's just a prompt template 00:09:24.280 |
so you create your prompt based on this prompt template and then you generate your text using 00:09:30.120 |
your large language model, now we are going to be initializing a large language model 00:09:36.560 |
from HuggingFace and for that we are going to be using this model here, now this model 00:09:43.120 |
if we go over to HuggingFace we can click on here type in Flan and you can see there 00:09:52.160 |
are a few different models here, the Google Flan T5 XL is not the biggest but it is the 00:09:59.200 |
biggest that will work on the free tier of inference here, so that's what we're using, 00:10:10.520 |
if you try and use the XXL model it will, I think, more likely than not time out; at least it 00:10:16.480 |
did for me, so with that in mind we initialize the model, we set the randomness or the temperature 00:10:24.480 |
of the model to be very low so that we get relatively stable results, if you want more 00:10:29.960 |
creative writing you would want to increase this value and then we create our template, 00:10:35.180 |
so our template is going to just be very simple it's going to be a question answering template 00:10:38.840 |
where you have a question, now this is our input variable that we'll be using in the 00:10:44.640 |
template and then we have our answer and then the model will essentially continue from this 00:10:49.360 |
point, so with that we use our prompt template, we use this template here and we just say 00:10:56.520 |
the input variables within this template, which is question. Because this isn't an f-string 00:11:02.480 |
here (if it was an f-string it would look different), it's actually just a plain string, so 00:11:07.760 |
here we're saying whatever the question input is we're going to put it here, then we create 00:11:13.920 |
our chain prompt followed by our large language model and then we're going to ask the question 00:11:19.520 |
which NFL team won the Super Bowl in the 2010 season and we're just going to print that 00:11:24.480 |
out, so I'm going to run this, okay, and we get "green bay packers". 00:11:29.720 |
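Putting those pieces together, the cell looks roughly like this; a sketch where the `flan_t5` and `llm_chain` names are mine:

```python
from langchain import PromptTemplate, HuggingFaceHub, LLMChain

# Simple Q&A template; {question} is the single input variable
template = """Question: {question}

Answer: """
prompt = PromptTemplate(template=template, input_variables=["question"])

# Flan-T5 XL via the HuggingFace inference API; a near-zero temperature
# keeps results stable rather than creative
flan_t5 = HuggingFaceHub(
    repo_id="google/flan-t5-xl",
    model_kwargs={"temperature": 1e-10}
)

llm_chain = LLMChain(prompt=prompt, llm=flan_t5)

print(llm_chain.run("Which NFL team won the Super Bowl in the 2010 season?"))
```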
Now, if we would like to ask multiple questions together, we have to do this: we get like a list of dictionaries, 00:11:36.960 |
within each one of those dictionaries we need to have the input variables, so if we had 00:11:40.560 |
multiple input variables we would pass them into here, so this question is going to be 00:11:46.040 |
mapped to question in our template and now I'm going to ask the question, so the first 00:11:50.400 |
one same thing again, now I'm going to ask a bit more of like a logical question here, 00:11:55.720 |
some more facts and again like common sense and we can run these, I think this model doesn't 00:12:03.160 |
actually do so well with these, so we have this kind of like format here, first one I 00:12:08.680 |
believe is correct; the second one, 184 centimeters, which is not true, it should be about 193 centimeters 00:12:17.840 |
for this one. Then who's the 12th person on the moon? It's saying John Glenn, who never went 00:12:24.120 |
to the moon, and then how many eyes does a blade of grass have, apparently it has one. 00:12:31.320 |
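That multiple-question call looks roughly like this, assuming the `generate` method that LangChain chains expose; the questions are the ones from the video:

```python
qs = [
    {"question": "Which NFL team won the Super Bowl in the 2010 season?"},
    {"question": "If I am 6 ft 4 inches, how tall am I in centimeters?"},
    {"question": "Who was the 12th person on the moon?"},
    {"question": "How many eyes does a blade of grass have?"},
]

# generate() takes a list of input dicts, one per call; each dict maps
# every input variable in the template (here just "question") to a value
result = llm_chain.generate(qs)
print(result)
```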
So, you know, this model is not the biggest model, it's somewhat limited, and there are 00:12:38.400 |
other models that are open source that will perform much better, like BLOOM, but when we're 00:12:44.240 |
using this endpoint here without running these locally we are kind of restricted in size 00:12:51.540 |
to this model, so that's what we have there. One other thing: now, obviously these haven't 00:12:59.920 |
performed so well so these are not very likely to perform well either but what we can do 00:13:05.240 |
with a lot of large language models is we can actually feed in all these questions at 00:13:10.280 |
once so we wouldn't need to do this like iteratively calling the large language model and asking 00:13:15.560 |
it one question and then another question and another question, some of the better large 00:13:19.920 |
language models as we'll see soon would be able to handle them all at once, now in this 00:13:24.680 |
case we'll see it doesn't quite work but we'll see later that there are models that can do 00:13:31.360 |
that, so the only thing that changed here is I changed my template so I said answer 00:13:36.120 |
the following questions one at a time, pass in those questions and then try to get some 00:13:41.280 |
answers; the model didn't really listen to me, so it just kind of did its own thing. 00:13:48.040 |
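The modified template, sketched out; the `multi_template` and `qs_str` names and the exact template wording are illustrative:

```python
multi_template = """Answer the following questions one at a time.

Questions:
{questions}

Answers:
"""
long_prompt = PromptTemplate(
    template=multi_template, input_variables=["questions"]
)
llm_chain = LLMChain(prompt=long_prompt, llm=flan_t5)

qs_str = (
    "Which NFL team won the Super Bowl in the 2010 season?\n"
    "If I am 6 ft 4 inches, how tall am I in centimeters?\n"
    "Who was the 12th person on the moon?\n"
    "How many eyes does a blade of grass have?"
)
print(llm_chain.run(qs_str))
```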
Nonetheless, that is what we got for that one. Now let's compare that to the OpenAI approach of doing 00:13:54.960 |
things, now for this we again need another prerequisite which is the OpenAI library, 00:14:01.480 |
just say pip install openai, and we come down here. We will also need to pass in our OpenAI 00:14:09.000 |
API key; let me just show you how to get that quickly. So if we go to OpenAI, it was at beta.openai.com 00:14:17.360 |
but I think they've changed it recently so it's no longer beta, it's just openai.com/api, 00:14:24.520 |
you come over here to log in or sign up if you don't have an account, once you have logged 00:14:31.960 |
in you have to head over to the right here, go to account, come down and I think we need 00:14:39.600 |
to just go to settings, API keys and then you can create a new secret key here so you 00:14:46.600 |
just click create new secret key, okay for me it doesn't actually let me create another 00:14:50.960 |
one because I already have too many here but that's fine you just create your new secret 00:14:55.680 |
key, and then you just copy it. For me, I have already added it to my environment variables 00:15:02.120 |
as OPENAI_API_KEY, so I don't need to rerun that. 00:15:10.520 |
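The setup, as a sketch; `OPENAI_API_KEY` is the environment variable name LangChain's OpenAI wrapper reads:

```python
# !pip install openai

import os

# Paste the secret key from your OpenAI account settings here
os.environ["OPENAI_API_KEY"] = "sk-..."
```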
One thing: if you are using OpenAI via Azure, you should also set these things as well, so you should set that you're using 00:15:15.720 |
Azure here, the OpenAI API version, Azure has several API versions apparently so you 00:15:23.440 |
will need to set that and then you also need to set the URL for your Azure OpenAI resource 00:15:30.520 |
here as well, and then this last one here I don't think is relevant, so we can skip that. 00:15:40.320 |
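Sketched out, the Azure configuration would look something like this; the variable names match what the OpenAI Python client reads, but the version string and URL here are placeholders you would replace with your own resource's values:

```python
import os

# Only needed if you access OpenAI through Azure
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2022-12-01"  # placeholder; check your resource
os.environ["OPENAI_API_BASE"] = "https://your-resource-name.openai.azure.com"
```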
After that, we need to decide on which model we're going to use; we're going to be using the text-davinci-003 00:15:44.280 |
model, which is one of the better text generation models from OpenAI, okay, and that 00:15:52.360 |
is our large language model; that is our LangChain large language model there. We can 00:15:57.440 |
actually generate stuff with this directly but as we did before we are going to use the 00:16:04.080 |
large language model chain, again if you're using Azure you'll need to follow this step 00:16:09.240 |
here rather than what I just did, so come down here we use large language model chain 00:16:14.480 |
again and we're using the same prompt as what we initially created before so this is a simple 00:16:19.600 |
question answer prompt, large language model is this time DaVinci and I'm going to run 00:16:25.840 |
this and we get this answer so the Green Bay Packers won the Super Bowl in the 2010 season 00:16:31.400 |
so a little more descriptive than the answer we got from our T5 Flan model but that's to 00:16:37.320 |
be expected; OpenAI's DaVinci model is a lot bigger and pretty advanced. 00:16:44.760 |
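The equivalent cell with OpenAI, sketched under the same API; `davinci` is my name for the LLM object, and the Azure lines are an assumption based on LangChain's `AzureOpenAI` wrapper:

```python
from langchain import LLMChain
from langchain.llms import OpenAI

davinci = OpenAI(model_name="text-davinci-003")

# If you're on Azure, use the AzureOpenAI wrapper instead, e.g.:
# from langchain.llms import AzureOpenAI
# davinci = AzureOpenAI(deployment_name="your-deployment")

# Same simple Q&A prompt as before
llm_chain = LLMChain(prompt=prompt, llm=davinci)

print(llm_chain.run("Which NFL team won the Super Bowl in the 2010 season?"))
```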
So after that, let's try again with multiple questions and let's see what we get. Okay, so we get the 00:16:51.960 |
Green Bay Packers won the Super Bowl in the 2010 season, correct, next we get this which 00:17:00.060 |
again is mostly wrong, so "Eugene A. Cernan was the 12th person to walk on the moon"; as 00:17:07.680 |
far as I know it is Harrison Schmitt, I think, yeah, Harrison Schmitt, so not quite right but 00:17:16.600 |
I think the rest of it was very close so Apollo 17 and I'm pretty sure it was December 1972 00:17:25.920 |
as well although not 100% sure on that so we can assume that is correct so the Apollo 00:17:31.560 |
17 mission in December 1972 and I think this guy is actually his teammate so I suppose 00:17:38.360 |
this would have been the 11th person on the moon so it got, it did get pretty close but 00:17:43.280 |
not quite there and that is actually the third question I've skipped one by accident, if 00:17:47.360 |
I'm 6 foot 4 inches how tall am I in centimeters, very specific, we've got 193.04 centimeters 00:17:56.960 |
which is probably like the exact measurement, but I know for sure 193 is correct, and then 00:18:05.120 |
on to the last question, how many eyes does a blade of grass have, we get a blade of grass 00:18:09.880 |
does not have any eyes, okay so we get a sensible answer this time and then I wanted to very 00:18:15.240 |
quickly just show this: so this is a list of items, and when passing this to the large 00:18:19.640 |
language model chain's run method, this is actually incorrect, so it's actually just going to 00:18:24.240 |
see all of this as a single string, now in this case with our model, with our DaVinci 00:18:29.240 |
model it does pretty well, still got this one wrong but it's actually able to manage 00:18:35.280 |
with this even though it's not in the correct format or when asking these questions one 00:18:39.840 |
by one and you can see it actually does get, sometimes it gets the correct answer and sometimes 00:18:45.400 |
it messes other questions up, which is I think pretty interesting to see. 00:18:53.240 |
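For reference, this is the (not really correct) pattern being described; `run` just stringifies the whole list into one prompt:

```python
qs = [
    "Which NFL team won the Super Bowl in the 2010 season?",
    "If I am 6 ft 4 inches, how tall am I in centimeters?",
    "Who was the 12th person on the moon?",
    "How many eyes does a blade of grass have?",
]

# Incorrect usage: the list is cast to a single string, but DaVinci
# still manages to answer most of the questions anyway
print(llm_chain.run(qs))
```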
Now the final one is what we did before, so where we come up to here where we have the string, and let's 00:19:02.560 |
go ahead and do that, bring it down here, so I'm going to answer multiple questions 00:19:12.440 |
in a single string; for the large language model I actually want to be using DaVinci, okay, and we get 00:19:19.600 |
this: so "The Green Bay Packers won the Super Bowl in the 2010 season", "I am 193 centimeters tall", 00:19:26.400 |
yep; Edwin "Buzz" Aldrin, wrong; and "A blade of grass does not have eyes", so we get some good answers there. 00:19:33.200 |
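That final cell is just the earlier multi-question prompt rebuilt with DaVinci as the LLM, roughly:

```python
llm_chain = LLMChain(prompt=long_prompt, llm=davinci)
print(llm_chain.run(qs_str))
```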
Okay, so that's it for this very quick introduction to LangChain, and as I said, in the 00:19:40.600 |
future we're going to be covering this library in a lot more detail and as you've already 00:19:44.920 |
seen at the start of the video there are some pretty interesting things we can do with this 00:19:49.120 |
library very easily. But for now, that's it for this video; I hope all of this has been interesting 00:19:55.440 |
and useful, so thank you very much for watching, and I will see you again in the next one.