
Getting Started with GPT-3 vs. Open Source LLMs - LangChain #1


Chapters

0:00 Getting LangChain
1:14 Four Components of LangChain
6:43 Using Hugging Face and OpenAI LLMs in LangChain
7:13 LangChain Hugging Face LLM
13:51 OpenAI LLMs in LangChain
18:58 Final results from GPT-3

Whisper Transcript

00:00:00.000 | Today we're going to get started with what will be a series of videos, tutorials, examples,
00:00:06.480 | articles on what is called LangChain.
00:00:10.240 | Now LangChain is a pretty new NLP framework that has become very popular very quickly.
00:00:18.400 | At the core of LangChain you have large language models and the idea behind it is that we can
00:00:24.920 | use the framework to build very cool apps using large language models very quickly.
00:00:31.800 | We can use it for chatbots, generative question answering, summarization, logic loops that
00:00:38.480 | include large language models and web search and all these crazy different things that
00:00:44.080 | we can chain together in some sort of logical fashion.
00:00:48.680 | In this video what we are going to do is just have a quick introduction to LangChain and
00:00:54.840 | how we can use it.
00:00:55.840 | We're going to take a look at the core components of what will make up our chains in LangChain
00:01:02.240 | and we're going to look at some very simple generative language examples using both the
00:01:09.520 | HuggingFace endpoint in LangChain and the OpenAI endpoint in LangChain.
00:01:14.480 | So let's get started by having a look at the, I think the main four components that I believe
00:01:20.360 | need explaining.
00:01:21.880 | So we have prompt templates, large language models, agents and memory.
00:01:26.620 | Now prompt templates are actually pretty straightforward.
00:01:30.760 | They are templates for different types of prompts.
00:01:34.840 | Now let me actually show you a couple of examples from an app I built a while ago.
00:01:39.840 | So in this app here we have all these different styles, so these instructions that we can
00:01:45.360 | pass to a large language model, we have conservative Q&A, so basically we want to answer a question
00:01:51.280 | based on the context below and if the question can't be answered based on the context say
00:01:56.040 | I don't know and then you feed in the context and you feed in these questions.
00:02:00.040 | We have simple instructions, given the common questions and answers below I think we would
00:02:05.440 | be feeding in here which would be the question and these would be the answers.
00:02:12.080 | Extract key libraries and tools, so this is talking about extracting like code libraries
00:02:16.800 | that you would use, so write a list of libraries and tools present in the context below and
00:02:22.080 | then this would basically return items from a database and you'd see from that a set of
00:02:28.720 | libraries that were mentioned in whatever information you've retrieved.
00:02:32.440 | So these are the types of things I mean when I say we have prompt templates.
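As a rough sketch, a template like that conservative Q&A one might be written with LangChain's PromptTemplate like this (the instruction wording is paraphrased from the app shown on screen):

```python
from langchain import PromptTemplate

# "Conservative Q&A": only answer from the supplied context.
conservative_qa = PromptTemplate(
    input_variables=["context", "question"],
    template=(
        "Answer the question based on the context below. If the question "
        "cannot be answered using the information provided, answer with "
        "\"I don't know\".\n\n"
        "Context: {context}\n\n"
        "Question: {question}\n\n"
        "Answer: "
    ),
)

# format() fills in the variables and returns the final prompt string.
print(conservative_qa.format(context="<retrieved passages>", question="<user question>"))
```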
00:02:38.120 | Next of course we have large language models, I don't think I really need to explain them,
00:02:41.400 | it's just big models that are capable of doing pretty incredible things like GPT-3, Bloom
00:02:47.720 | and so on.
00:02:48.880 | Next we have agents, now agents are processes that use large language models to decide what
00:02:56.180 | actions should be taken given a particular query or set of instructions or so on.
00:03:01.660 | So these can be paired with tools like web search or calculators and we package them
00:03:06.720 | all into this logical loop of operations, now it sounds pretty complicated so it's probably
00:03:10.800 | best I just show you an example of what this is.
00:03:13.980 | So if we go over to the LangChain website, they have a really cool example in agents,
00:03:18.640 | getting started and we'll just scroll down a little bit and we can see here an example.
00:03:26.360 | So this is the agent executor chain, so there's a few components in here, the first thing
00:03:31.920 | that comes in is a thought from the large language model and so we're basing it on this
00:03:37.760 | query here, who is Olivia Wilde's boyfriend, what is his current age raised to 0.23 power.
00:03:44.160 | So there's a few logical steps in this process and this is why we might need to use something
00:03:48.120 | like this.
00:03:49.120 | So the model, the large language model says okay from this, it's thought is I need to
00:03:55.200 | find out who Olivia Wilde's boyfriend is and then calculate his age raised to 0.23 power,
00:04:01.040 | the action here that the agent is deciding is search, okay and then it decides okay the
00:04:08.460 | input for this search action must initially be Olivia Wilde's boyfriend.
00:04:14.600 | Now this here, so this defines that we're going to use a web search component, it goes
00:04:20.200 | to the web search component, types this in and the result that it gets is this, Harry
00:04:25.200 | Styles, so that's the observation based on what we have so far and this is part of a
00:04:30.520 | specific agent framework called ReAct, and at some point in the future we will definitely
00:04:35.960 | go into that into a lot more detail.
00:04:38.120 | For now let's continue with this, based on this observation the language model now thinks
00:04:44.000 | okay I need to find out Harry Styles' age, so it starts the search again, it searches
00:04:49.060 | for his age, it gets 28 years and then the next thought is I need to calculate 28 raised
00:04:54.400 | to the 0.23 power, goes to the calculator action this time, so not search, it calculates
00:04:59.960 | this and we get the answer here, okay and then the final thought is I know the final
00:05:04.640 | answer, the final answer is this, okay so that's an example of one of these agents using
00:05:11.600 | multiple tools, in here we have the calculator and also the search tool as well.
00:05:17.280 | So I think they are a pretty exciting and cool use of language models.
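A minimal sketch of how an agent like that is set up, assuming the LangChain API as it looked around the time of this video, and assuming you have a SerpAPI key for the search tool:

```python
from langchain.agents import load_tools, initialize_agent
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

# "serpapi" gives the agent web search (needs a SERPAPI_API_KEY env var),
# "llm-math" gives it a calculator.
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# A zero-shot ReAct agent: the LLM decides which tool to use at each step.
agent = initialize_agent(tools, llm, agent="zero-shot-react-description", verbose=True)

agent.run(
    "Who is Olivia Wilde's boyfriend? "
    "What is his current age raised to the 0.23 power?"
)
```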
00:05:23.000 | Now, the final one is memory, so we have short-term and long-term memory for our
00:05:27.240 | models, now again this is really interesting, for long-term memory if you have watched my
00:05:34.800 | videos, if you've read my articles or anything like that in the past you have probably come
00:05:39.000 | across it.
00:05:40.000 | We're going to take a look at this getting started, we have this conversation buffer
00:05:44.240 | memory which essentially you would use in a chatbot and it will just remember all the
00:05:50.660 | previous inputs and outputs and adds them into your next set of generations, so this
00:05:56.240 | is what you would use to have a conversation with a chatbot where it's remembering the
00:06:02.720 | previous steps of that conversation, there's some different versions of that like conversation
00:06:08.200 | summary memory and all of this is essentially what I would refer to as the short-term memory
00:06:15.120 | and then on the other side, so for long-term memory you have the data augmented generation
00:06:20.640 | stuff which is essentially where you're retrieving bits of information to feed into your model
00:06:26.440 | from an external data source and that would just allow it to essentially answer questions
00:06:33.960 | in a specific domain better or keep more up-to-date information or simply allow us to fact check
00:06:41.300 | what the large language model is actually saying.
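As a short sketch of the short-term memory side, here is roughly what the conversation buffer memory from that getting-started page looks like in use (import paths reflect the LangChain version of the time):

```python
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

# The buffer memory keeps every previous input/output and prepends
# the whole history to each new prompt.
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

conversation.predict(input="Hi there, my name is James.")
conversation.predict(input="What is my name?")  # answered from the remembered history
```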
00:06:43.920 | Now, those are the main components of LangChain, and what we'll do now is actually just get
00:06:49.160 | started and we're going to do something really simple which is just using large language
00:06:52.880 | models in LangChain, so to get started we need to just pip install LangChain, so this
00:06:58.720 | will obviously install the library and what we will do is just go through some really
00:07:03.720 | basic examples of using LangChain for large language model generation with both OpenAI
00:07:10.960 | and HuggingFace, so let's get started with HuggingFace, now if you would like to follow
00:07:17.240 | along with this I'll leave a link to this notebook in the top right of the video right
00:07:22.440 | now or you can click a link in the video description to take you to this colab, so with HuggingFace
00:07:30.200 | we need to install the HuggingFace hub as a prerequisite and what's actually going to
00:07:35.720 | happen here is we're not going to be running HuggingFace models locally, we're actually
00:07:39.560 | going to be calling their inference API and we're going to be getting results directly
00:07:45.000 | from that, so to do that we actually do need a HuggingFace API token and this is all free
00:07:52.600 | by the way, so to get that we need to go to HuggingFace.co and if you don't have an account
00:08:02.240 | you'll need to sign up for one, I believe the sign up will be over here on the top right
00:08:06.360 | of the web page, you need to click here if you have signed in and you need to go to settings
00:08:13.280 | then you head over to access tokens and you will need to get, I think you can actually
00:08:19.200 | just use a read token but a write token you can use as well, in either case if this is
00:08:24.840 | your first time you will need to click new token, either choose read or write, I'm going
00:08:30.360 | to go with write because I know that one does definitely work, you just have to write something
00:08:34.360 | in here and then you click generate token, I've already created mine so I'm just going
00:08:38.680 | to copy this okay and then with that you would just put it into here, now I've already set
00:08:46.680 | my environment variable here, so I'm not going to do it again.
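Setting that token would look something like this (the token value is a placeholder):

```python
import os

# Token from huggingface.co -> Settings -> Access Tokens
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."  # placeholder, paste your own token
```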
00:08:53.760 | Then we can come down and start generating text using a HuggingFace model from the HuggingFace Hub, so there are a
00:09:00.600 | few things that we'll need for this, we're going to be using a prompt template which
00:09:04.400 | is a template for our prompt as I mentioned before, we're going to be using HuggingFace
00:09:09.200 | hub class from LangChain and we're also going to be using this chain which is like a pipeline
00:09:16.360 | or a chain of steps from LangChain, now this one is pretty simple it's just a prompt template
00:09:24.280 | so you create your prompt based on this prompt template and then you generate your text using
00:09:30.120 | your large language model, now we are going to be initializing a large language model
00:09:36.560 | from HuggingFace and for that we are going to be using this model here, now this model
00:09:43.120 | if we go over to HuggingFace we can click on here type in Flan and you can see there
00:09:52.160 | are a few different models here, the Google Flan T5 XL is not the biggest but it is the
00:09:59.200 | biggest that will work on the free tier of inference here, so that's what we're using,
00:10:10.520 | if you try and use the XXL model it will, I think, more likely than not time out, at least it
00:10:16.480 | did for me, so with that in mind we initialize the model, we set the randomness or the temperature
00:10:24.480 | of the model to be very low so that we get relatively stable results, if you want more
00:10:29.960 | creative writing you would want to increase this value and then we create our template,
00:10:35.180 | so our template is going to just be very simple it's going to be a question answering template
00:10:38.840 | where you have a question, now this is our input variable that we'll be using in the
00:10:44.640 | template and then we have our answer and then the model will essentially continue from this
00:10:49.360 | point, so with that we use our prompt template, we use this template here and we just say
00:10:56.520 | the input variables within this template is a question, because this isn't an F string
00:11:02.480 | here, if it was an S string it would look different, it's actually just a string, so
00:11:07.760 | here we're saying whatever the question input is we're going to put it here, then we create
00:11:13.920 | our chain prompt followed by our large language model and then we're going to ask the question
00:11:19.520 | which NFL team won the Super Bowl in the 2010 season and we're just going to print that
00:11:24.480 | out, so I'm going to run this, okay, and we get Green Bay Packers.
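Putting those steps together, the whole HuggingFace example looks roughly like this (a sketch following the LangChain API of the time; the near-zero temperature is the one mentioned above):

```python
from langchain import PromptTemplate, HuggingFaceHub, LLMChain

template = """Question: {question}

Answer: """

prompt = PromptTemplate(template=template, input_variables=["question"])

# This calls the hosted inference API rather than running the model locally.
flan_t5 = HuggingFaceHub(
    repo_id="google/flan-t5-xl",
    model_kwargs={"temperature": 1e-10},  # near-zero temperature for stable answers
)

llm_chain = LLMChain(prompt=prompt, llm=flan_t5)

question = "Which NFL team won the Super Bowl in the 2010 season?"
print(llm_chain.run(question))  # -> green bay packers
```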
00:11:29.720 | Now if we would like to ask multiple questions together we have to do this, so we pass in a list of dictionaries,
00:11:36.960 | within each one of those dictionaries we need to have the input variables, so if we had
00:11:40.560 | multiple input variables we would pass them into here, so this question is going to be
00:11:46.040 | mapped to "question" in our template. Now I'm going to ask the questions: the first
00:11:50.400 | one is the same as before, then a bit more of a logical question, some more
00:11:55.720 | facts, and a common-sense question, and we can run these.
00:12:03.160 | actually do so well with these, so we have this kind of like format here, first one I
00:12:08.680 | believe is correct, the second one 184 centimeters which is not true it should be about 193 centimeters
00:12:17.840 | for this one so who's the 12th person on the moon, it's saying John Glenn who never went
00:12:24.120 | to the moon, and then how many eyes does a blade of grass have, apparently it has one,
00:12:31.320 | so you know this model is, it's not the biggest model, it's somewhat limited and there are
00:12:38.400 | other models that are open source that will perform much better like bloom but when we're
00:12:44.240 | using this endpoint here without running these locally we are kind of restricted in size
00:12:51.540 | to this model, so that's what we have there, one other thing now obviously these haven't
00:12:59.920 | performed so well so these are not very likely to perform well either but what we can do
00:13:05.240 | with a lot of large language models is we can actually feed in all these questions at
00:13:10.280 | once so we wouldn't need to do this like iteratively calling the large language model and asking
00:13:15.560 | it one question and then another question and another question, some of the better large
00:13:19.920 | language models as we'll see soon would be able to handle them all at once, now in this
00:13:24.680 | case we'll see it doesn't quite work but we'll see later that there are models that can do
00:13:31.360 | that, so the only thing that changed here is I changed my template so I said answer
00:13:36.120 | the following questions one at a time, pass in those questions and then try to get some
00:13:41.280 | answers.
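The changed template, sketched out; the exact instruction wording is approximate, and this reuses the flan_t5 model from above:

```python
multi_template = """Answer the following questions one at a time.

Questions:
{questions}

Answers:
"""

long_prompt = PromptTemplate(template=multi_template, input_variables=["questions"])
llm_chain = LLMChain(prompt=long_prompt, llm=flan_t5)

qs_str = (
    "Which NFL team won the Super Bowl in the 2010 season?\n"
    "If I am 6 ft 4 inches, how tall am I in centimeters?\n"
    "Who was the 12th person on the moon?\n"
    "How many eyes does a blade of grass have?"
)
print(llm_chain.run(qs_str))
```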
00:13:48.040 | The model didn't really listen to me, it just kind of did its own thing, but nonetheless that is what we got for that one. Now let's compare that to the OpenAI approach of doing
00:13:54.960 | things, now for this we again need another prerequisite which is the OpenAI library,
00:14:01.480 | just say pip install OpenAI and we come down here, we will also need to pass in our OpenAI
00:14:09.000 | API key, let me just show you how to get that quickly, so if we go to OpenAI, it was at beta.openai.com
00:14:17.360 | but I think they've changed it recently so it's no longer beta, it's just openai.com/api,
00:14:24.520 | you come over here to log in or sign up if you don't have an account, once you have logged
00:14:31.960 | in you have to head over to the right here, go to account, come down and I think we need
00:14:39.600 | to just go to settings, API keys and then you can create a new secret key here so you
00:14:46.600 | just click create new secret key, okay for me it doesn't actually let me create another
00:14:50.960 | one because I already have too many here but that's fine you just create your new secret
00:14:55.680 | key and then you just copy it, for me I have already added it to my environment variables
00:15:02.120 | in OPENAI_API_KEY, so I don't need to rerun that.
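As with HuggingFace, you would set it as an environment variable, something like this (placeholder key):

```python
import os

# Secret key from openai.com -> Account -> API Keys
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder, paste your own key
```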
00:15:10.520 | One thing is, if you are using OpenAI via Azure you should also set these things as well: you should set that you're using
00:15:15.720 | Azure here, the OpenAI API version, Azure has several API versions apparently so you
00:15:23.440 | will need to set that and then you also need to set the URL for your Azure OpenAI resource
00:15:30.520 | here as well, and then this one here I don't think is relevant so we can skip that.
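A sketch of those Azure settings, using the environment variable names the OpenAI Python client recognized at the time (the version string and resource URL are placeholders):

```python
import os

os.environ["OPENAI_API_TYPE"] = "azure"
# Azure exposes several API versions; use the one your resource supports.
os.environ["OPENAI_API_VERSION"] = "2022-12-01"  # placeholder version
# Base URL of your Azure OpenAI resource.
os.environ["OPENAI_API_BASE"] = "https://your-resource-name.openai.azure.com"  # placeholder
```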
00:15:40.320 | After that we need to decide on which model we're going to use; we're going to be using the
00:15:44.280 | text-davinci-003 model, which is one of the better text generation models from OpenAI, okay, and that
00:15:52.360 | is our large language model, that is our LangChain large language model there, we can
00:15:57.440 | actually generate stuff with this directly but as we did before we are going to use the
00:16:04.080 | large language model chain, again if you're using Azure you'll need to follow this step
00:16:09.240 | here rather than what I just did, so come down here we use large language model chain
00:16:14.480 | again and we're using the same prompt as what we initially created before so this is a simple
00:16:19.600 | question answering prompt, the large language model is this time DaVinci, and I'm going to run this.
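A sketch of the equivalent chain with OpenAI, reusing the same question-answering prompt from the HuggingFace example:

```python
from langchain.llms import OpenAI
from langchain import LLMChain

davinci = OpenAI(model_name="text-davinci-003")

llm_chain = LLMChain(prompt=prompt, llm=davinci)
print(llm_chain.run("Which NFL team won the Super Bowl in the 2010 season?"))
```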
00:16:25.840 | Okay, and we get this answer: the Green Bay Packers won the Super Bowl in the 2010 season,
00:16:31.400 | so a little more descriptive than the answer we got from our T5 Flan model but that's to
00:16:37.320 | be expected, the OpenAI's DaVinci model is a lot bigger and pretty advanced so after
00:16:44.760 | that let's try again with multiple questions, let's see what we get, okay so we get the
00:16:51.960 | Green Bay Packers won the Super Bowl in the 2010 season, correct, next we get this which
00:17:00.060 | again is mostly wrong so Eugene A. Cernan was the 12th person to walk on the moon, as
00:17:07.680 | far as I know it is Harrison Schmitt I think, yeah Harrison Schmitt, so not quite right but
00:17:16.600 | I think the rest of it was very close so Apollo 17 and I'm pretty sure it was December 1972
00:17:25.920 | as well although not 100% sure on that so we can assume that is correct so the Apollo
00:17:31.560 | 17 mission in December 1972 and I think this guy is actually his teammate so I suppose
00:17:38.360 | this would have been the 11th person on the moon so it got, it did get pretty close but
00:17:43.280 | not quite there and that is actually the third question I've skipped one by accident, if
00:17:47.360 | I'm 6 foot 4 inches how tall am I in centimetres, very specific we've got 193.04 centimetres
00:17:56.960 | which is probably like the exact measurement but I know for sure 193 is correct and then
00:18:05.120 | on to the last question, how many eyes does a blade of grass have, we get a blade of grass
00:18:09.880 | does not have any eyes, okay so we get a sensible answer this time and then I wanted to very
00:18:15.240 | quickly just show this, so this is a list of items and when passing this to the large
00:18:19.640 | language model chain's .run method, this is actually incorrect, so it's actually just going to
00:18:24.240 | see all of this as a single string, now in this case with our model, with our DaVinci
00:18:29.240 | model it does pretty well, still got this one wrong but it's actually able to manage
00:18:35.280 | with this even though it's not in the correct format or when asking these questions one
00:18:39.840 | by one and you can see it actually does get, sometimes it gets the correct answer and sometimes
00:18:45.400 | it messes other questions up which is I think pretty interesting to see, now the final one
00:18:53.240 | is what we did before so where we come up to here where we have the string and let's
00:19:02.560 | go ahead and do that, bring it down here, so I'm going to answer multiple questions
00:19:12.440 | in a single string, and for the large language model I actually want to be using DaVinci, okay.
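This reuses the multi-question template and question string from the sketch earlier, just swapping in the DaVinci model:

```python
# Same multi-question prompt as before, now answered by DaVinci.
llm_chain = LLMChain(prompt=long_prompt, llm=davinci)
print(llm_chain.run(qs_str))
```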
00:19:19.600 | And we get this: the Green Bay Packers won the Super Bowl in the 2010 season; I am 193 centimeters tall,
00:19:26.400 | yep; Edwin "Buzz" Aldrin, wrong; and a blade of grass does not have eyes. So we get some good answers
00:19:33.200 | there, okay so that's it for this very quick introduction to LangChain, as I said in the
00:19:40.600 | future we're going to be covering this library in a lot more detail and as you've already
00:19:44.920 | seen at the start of the video there are some pretty interesting things we can do with this
00:19:49.120 | library very easily, but for now that's it for this video, I hope all of this has been interesting
00:19:55.440 | and useful, so thank you very much for watching and I will see you again in the next one,
00:20:01.480 | [Music]