Getting Started with GPT-3 vs. Open Source LLMs - LangChain #1
Chapters
0:00 Getting LangChain
1:14 Four Components of LangChain
6:43 Using Hugging Face and OpenAI LLMs in LangChain
7:13 LangChain Hugging Face LLM
13:51 OpenAI LLMs in LangChain
18:58 Final results from GPT-3
Today we're going to get started with what will be a series of videos, tutorials, and examples covering the LangChain framework. 00:00:10.240 |
Now LangChain is a pretty new NLP framework that has become very popular very quickly. 00:00:18.400 |
At the core of LangChain you have large language models and the idea behind it is that we can 00:00:24.920 |
use the framework to build very cool apps using large language models very quickly. 00:00:31.800 |
We can use it for chatbots, generative question answering, summarization, logic loops that 00:00:38.480 |
include large language models and web search and all these crazy different things that 00:00:44.080 |
we can chain together in some sort of logical fashion. 00:00:48.680 |
In this video what we are going to do is just have a quick introduction to LangChain and 00:00:55.840 |
we're going to take a look at the core components of what will make up our chains in LangChain 00:01:02.240 |
and we're going to look at some very simple generative language examples using both the 00:01:09.520 |
HuggingFace endpoint in LangChain and the OpenAI endpoint in LangChain. 00:01:14.480 |
So let's get started by having a look at what I believe are the main four components of LangChain. 00:01:21.880 |
So we have prompt templates, large language models, agents and memory. 00:01:26.620 |
Now prompt templates are actually pretty straightforward. 00:01:30.760 |
They are templates for different types of prompts. 00:01:34.840 |
Now let me actually show you a couple of examples from an app I built a while ago. 00:01:39.840 |
So in this app here we have all these different styles, so these instructions that we can 00:01:45.360 |
pass to a large language model, we have conservative Q&A, so basically we want to answer a question 00:01:51.280 |
based on the context below and if the question can't be answered based on the context say 00:01:56.040 |
I don't know and then you feed in the context and you feed in these questions. 00:02:00.040 |
We have simple instructions: given the common questions and answers below, I think we would 00:02:05.440 |
be feeding in the questions here, and these would be the answers. 00:02:12.080 |
Extract key libraries and tools, so this is talking about extracting like code libraries 00:02:16.800 |
that you would use, so write a list of libraries and tools present in the context below and 00:02:22.080 |
then this would basically return items from a database and you'd see from that a set of 00:02:28.720 |
libraries that were mentioned in whatever information you've retrieved. 00:02:32.440 |
So these are the types of things I mean when I say we have prompt templates. 00:02:38.120 |
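As a rough illustration, here is how a template like the conservative Q&A style above might be written as a LangChain prompt template; this is a minimal sketch, and the exact wording and the `context`/`question` variable names are my own, not from the app shown in the video.

```python
from langchain import PromptTemplate

# A "conservative Q&A" style template: answer only from the provided
# context, otherwise say "I don't know"
template = """Answer the question based on the context below. If the
question cannot be answered using the information provided, answer
with "I don't know".

Context: {context}

Question: {question}

Answer: """

prompt = PromptTemplate(
    template=template,
    input_variables=["context", "question"]
)
```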
Next of course we have large language models, I don't think I really need to explain them, 00:02:41.400 |
it's just big models that are capable of doing pretty incredible things, like GPT-3, BLOOM, and so on. 00:02:48.880 |
Next we have agents, now agents are processes that use large language models to decide what 00:02:56.180 |
actions should be taken given a particular query or set of instructions or so on. 00:03:01.660 |
So these can be paired with tools like web search or calculators and we package them 00:03:06.720 |
all into this logical loop of operations, now it sounds pretty complicated so it's probably 00:03:10.800 |
best I just show you an example of what this is. 00:03:13.980 |
So if we go over to the LangChain website, they have a really cool example in agents, 00:03:18.640 |
getting started and we'll just scroll down a little bit and we can see here an example. 00:03:26.360 |
So this is the agent executor chain, so there's a few components in here, the first thing 00:03:31.920 |
that comes in is a thought from the large language model and so we're basing it on this 00:03:37.760 |
query here: who is Olivia Wilde's boyfriend, and what is his current age raised to the 0.23 power? 00:03:44.160 |
So there's a few logical steps in this process, and this is why we might need to use something like an agent. 00:03:49.120 |
So the model, the large language model, says okay, from this, its thought is I need to 00:03:55.200 |
find out who Olivia Wilde's boyfriend is and then calculate his age raised to 0.23 power, 00:04:01.040 |
the action here that the agent is deciding is search, okay and then it decides okay the 00:04:08.460 |
input for this search action must initially be Olivia Wilde's boyfriend. 00:04:14.600 |
Now this here, so this defines that we're going to use a web search component, it goes 00:04:20.200 |
to the web search component, types this in and the result that it gets is this, Harry 00:04:25.200 |
Styles, so that's the observation based on what we have so far and this is part of a 00:04:30.520 |
specific agent framework called ReAct, and at some point in the future we will definitely cover it in more detail. 00:04:38.120 |
For now let's continue with this, based on this observation the language model now thinks 00:04:44.000 |
okay I need to find out Harry Styles' age, so it starts the search again, it searches 00:04:49.060 |
for his age, it gets 28 years and then the next thought is I need to calculate 28 raised 00:04:54.400 |
to the 0.23 power, goes to the calculator action this time, so not search, it calculates 00:04:59.960 |
this and we get the answer here, okay and then the final thought is I know the final 00:05:04.640 |
answer, the final answer is this, okay so that's an example of one of these agents using 00:05:11.600 |
multiple tools, in here we have the calculator and also the search tool as well. 00:05:17.280 |
So I think agents are a pretty exciting and cool use of large language models. 00:05:23.000 |
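To make that concrete, here is a minimal sketch of how an agent like the one in that example could be set up; the `load_tools`/`initialize_agent` calls and the `serpapi`/`llm-math` tool names follow LangChain's agents getting-started docs, and you would need a SerpAPI key for the search tool.

```python
from langchain.agents import load_tools, initialize_agent
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

# "serpapi" gives the agent web search, "llm-math" gives it a calculator
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# "zero-shot-react-description" is the ReAct-style agent from the docs
agent = initialize_agent(
    tools, llm, agent="zero-shot-react-description", verbose=True
)

agent.run(
    "Who is Olivia Wilde's boyfriend? "
    "What is his current age raised to the 0.23 power?"
)
```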
And then the final one is memory, so we have short-term and long-term memory for our 00:05:27.240 |
models, now again this is really interesting, for long-term memory if you have watched my 00:05:34.800 |
videos, if you've read my articles or anything like that in the past, you have probably come across it before. 00:05:40.000 |
We're going to take a look at this getting started, we have this conversation buffer 00:05:44.240 |
memory which essentially you would use in a chatbot and it will just remember all the 00:05:50.660 |
previous inputs and outputs and add them into your next set of generations, so this 00:05:56.240 |
is what you would use to have a conversation with a chatbot where it's remembering the 00:06:02.720 |
previous steps of that conversation, there's some different versions of that like conversation 00:06:08.200 |
summary memory and all of this is essentially what I would refer to as the short-term memory 00:06:15.120 |
and then on the other side, so for long-term memory you have the data augmented generation 00:06:20.640 |
stuff which is essentially where you're retrieving bits of information to feed into your model 00:06:26.440 |
from an external data source and that would just allow it to essentially answer questions 00:06:33.960 |
in a specific domain better or keep more up-to-date information or simply allow us to fact check 00:06:41.300 |
what the large language model is actually saying. 00:06:43.920 |
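As a quick sketch of the short-term memory side, a conversation buffer memory can be attached to a conversation chain roughly like this, assuming the `ConversationChain` and `ConversationBufferMemory` classes from LangChain's getting-started docs:

```python
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

llm = OpenAI(temperature=0)

# The buffer memory stores all previous human/AI messages and feeds
# them back into the prompt on each new turn
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

conversation.predict(input="Hi there!")
conversation.predict(input="What was the first thing I said to you?")
```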
Now those are the main components of LangChain, and what we'll do now is actually just get 00:06:49.160 |
started and we're going to do something really simple which is just using large language 00:06:52.880 |
models in LangChain. So to get started we need to just pip install langchain, so this 00:06:58.720 |
will obviously install the library and what we will do is just go through some really 00:07:03.720 |
basic examples of using LangChain for large language model generation with both OpenAI 00:07:10.960 |
and HuggingFace, so let's get started with HuggingFace, now if you would like to follow 00:07:17.240 |
along with this I'll leave a link to this notebook in the top right of the video right 00:07:22.440 |
now or you can click a link in the video description to take you to this colab, so with HuggingFace 00:07:30.200 |
we need to install the HuggingFace hub as a prerequisite and what's actually going to 00:07:35.720 |
happen here is we're not going to be running HuggingFace models locally, we're actually 00:07:39.560 |
going to be calling their inference API and we're going to be getting results directly 00:07:45.000 |
from that, so to do that we actually do need a HuggingFace API token and this is all free 00:07:52.600 |
by the way, so to get that we need to go to huggingface.co and if you don't have an account 00:08:02.240 |
you'll need to sign up for one, I believe the sign up will be over here on the top right 00:08:06.360 |
of the web page, you need to click here if you have signed in and you need to go to settings 00:08:13.280 |
then you head over to access tokens and you will need to get a token; I think you can actually 00:08:19.200 |
just use a read token, but a write token works as well. In either case, if this is 00:08:24.840 |
your first time you will need to click new token, either choose read or write, I'm going 00:08:30.360 |
to go with write because I know that one does definitely work, you just have to write something 00:08:34.360 |
in here and then you click generate token, I've already created mine so I'm just going 00:08:38.680 |
to copy this okay and then with that you would just put it into here, now I've already set 00:08:46.680 |
my environment variable here, so I'm not going to do it again. 00:08:53.760 |
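The setup cells look roughly like this; a sketch assuming the token is set via the `HUGGINGFACEHUB_API_TOKEN` environment variable, which is the name LangChain expects:

```python
# In a Colab/Jupyter notebook:
# !pip install langchain huggingface_hub

import os

# Paste your HuggingFace access token here (read or write both work)
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."
```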
Then we can come down and start generating text using a HuggingFace model from the HuggingFace Hub. There are a 00:09:00.600 |
few things that we'll need for this, we're going to be using a prompt template which 00:09:04.400 |
is a template for our prompt as I mentioned before, we're going to be using HuggingFace 00:09:09.200 |
hub class from LangChain and we're also going to be using this chain which is like a pipeline 00:09:16.360 |
or a chain of steps from LangChain, now this one is pretty simple it's just a prompt template 00:09:24.280 |
so you create your prompt based on this prompt template and then you generate your text using 00:09:30.120 |
your large language model, now we are going to be initializing a large language model 00:09:36.560 |
from HuggingFace and for that we are going to be using this model here, now this model 00:09:43.120 |
if we go over to HuggingFace we can click on here type in Flan and you can see there 00:09:52.160 |
are a few different models here, the Google Flan T5 XL is not the biggest but it is the 00:09:59.200 |
biggest that will work on the free tier of inference here, so that's what we're using, 00:10:10.520 |
if you try and use the XXL model it will, I think, more likely than not time out; at least it 00:10:16.480 |
did for me, so with that in mind we initialize the model, we set the randomness or the temperature 00:10:24.480 |
of the model to be very low so that we get relatively stable results, if you want more 00:10:29.960 |
creative writing you would want to increase this value and then we create our template, 00:10:35.180 |
so our template is going to just be very simple it's going to be a question answering template 00:10:38.840 |
where you have a question, now this is our input variable that we'll be using in the 00:10:44.640 |
template and then we have our answer and then the model will essentially continue from this 00:10:49.360 |
point, so with that we use our prompt template, we use this template here and we just say 00:10:56.520 |
the input variables within this template, which is question. Because this isn't an f-string 00:11:02.480 |
here (if it was an f-string it would look different), it's actually just a plain string, so 00:11:07.760 |
here we're saying whatever the question input is we're going to put it here, then we create 00:11:13.920 |
our chain prompt followed by our large language model and then we're going to ask the question 00:11:19.520 |
which NFL team won the Super Bowl in the 2010 season and we're just going to print that 00:11:24.480 |
out, so I'm going to run this, okay, and we get "green bay packers". 00:11:29.720 |
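Putting those pieces together, the cell looks roughly like this; a sketch where the `flan_t5` and `llm_chain` names are mine:

```python
from langchain import PromptTemplate, HuggingFaceHub, LLMChain

# Simple Q&A template; {question} is the single input variable
template = """Question: {question}

Answer: """
prompt = PromptTemplate(template=template, input_variables=["question"])

# Flan-T5 XL via the HuggingFace inference API; a near-zero temperature
# keeps results stable rather than creative
flan_t5 = HuggingFaceHub(
    repo_id="google/flan-t5-xl",
    model_kwargs={"temperature": 1e-10}
)

llm_chain = LLMChain(prompt=prompt, llm=flan_t5)

print(llm_chain.run("Which NFL team won the Super Bowl in the 2010 season?"))
```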
Now, if we would like to ask multiple questions together, we have to do this: we get like a list of dictionaries, 00:11:36.960 |
within each one of those dictionaries we need to have the input variables, so if we had 00:11:40.560 |
multiple input variables we would pass them into here, so this question is going to be 00:11:46.040 |
mapped to question in our template and now I'm going to ask the question, so the first 00:11:50.400 |
one same thing again, now I'm going to ask a bit more of like a logical question here, 00:11:55.720 |
some more facts and again like common sense and we can run these, I think this model doesn't 00:12:03.160 |
actually do so well with these, so we have this kind of like format here, first one I 00:12:08.680 |
believe is correct; the second one, 184 centimeters, which is not true, it should be about 193 centimeters 00:12:17.840 |
for this one. Then who's the 12th person on the moon? It's saying John Glenn, who never went 00:12:24.120 |
to the moon, and then how many eyes does a blade of grass have, apparently it has one. 00:12:31.320 |
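That multiple-question call looks roughly like this, assuming the `generate` method that LangChain chains expose; the questions are the ones from the video:

```python
qs = [
    {"question": "Which NFL team won the Super Bowl in the 2010 season?"},
    {"question": "If I am 6 ft 4 inches, how tall am I in centimeters?"},
    {"question": "Who was the 12th person on the moon?"},
    {"question": "How many eyes does a blade of grass have?"},
]

# generate() takes a list of input dicts, one per call; each dict maps
# every input variable in the template (here just "question") to a value
result = llm_chain.generate(qs)
print(result)
```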
So, you know, this model is not the biggest model, it's somewhat limited, and there are 00:12:38.400 |
other models that are open source that will perform much better, like BLOOM, but when we're 00:12:44.240 |
using this endpoint here without running these locally we are kind of restricted in size 00:12:51.540 |
to this model, so that's what we have there. One other thing: now, obviously these haven't 00:12:59.920 |
performed so well so these are not very likely to perform well either but what we can do 00:13:05.240 |
with a lot of large language models is we can actually feed in all these questions at 00:13:10.280 |
once so we wouldn't need to do this like iteratively calling the large language model and asking 00:13:15.560 |
it one question and then another question and another question, some of the better large 00:13:19.920 |
language models as we'll see soon would be able to handle them all at once, now in this 00:13:24.680 |
case we'll see it doesn't quite work but we'll see later that there are models that can do 00:13:31.360 |
that, so the only thing that changed here is I changed my template so I said answer 00:13:36.120 |
the following questions one at a time, pass in those questions and then try to get some 00:13:41.280 |
answers; the model didn't really listen to me, so it just kind of did its own thing. 00:13:48.040 |
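The modified template, sketched out; the `multi_template` and `qs_str` names and the exact template wording are illustrative:

```python
multi_template = """Answer the following questions one at a time.

Questions:
{questions}

Answers:
"""
long_prompt = PromptTemplate(
    template=multi_template, input_variables=["questions"]
)
llm_chain = LLMChain(prompt=long_prompt, llm=flan_t5)

qs_str = (
    "Which NFL team won the Super Bowl in the 2010 season?\n"
    "If I am 6 ft 4 inches, how tall am I in centimeters?\n"
    "Who was the 12th person on the moon?\n"
    "How many eyes does a blade of grass have?"
)
print(llm_chain.run(qs_str))
```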
Nonetheless, that is what we got for that one. Now let's compare that to the OpenAI approach of doing 00:13:54.960 |
things, now for this we again need another prerequisite which is the OpenAI library, 00:14:01.480 |
just say pip install openai, and we come down here. We will also need to pass in our OpenAI 00:14:09.000 |
API key; let me just show you how to get that quickly. So if we go to OpenAI, it was at beta.openai.com 00:14:17.360 |
but I think they've changed it recently so it's no longer beta, it's just openai.com/api, 00:14:24.520 |
you come over here to log in or sign up if you don't have an account, once you have logged 00:14:31.960 |
in you have to head over to the right here, go to account, come down and I think we need 00:14:39.600 |
to just go to settings, API keys and then you can create a new secret key here so you 00:14:46.600 |
just click create new secret key, okay for me it doesn't actually let me create another 00:14:50.960 |
one because I already have too many here but that's fine you just create your new secret 00:14:55.680 |
key, and then you just copy it. For me, I have already added it to my environment variables 00:15:02.120 |
as OPENAI_API_KEY, so I don't need to rerun that. 00:15:10.520 |
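The setup, as a sketch; `OPENAI_API_KEY` is the environment variable name LangChain's OpenAI wrapper reads:

```python
# !pip install openai

import os

# Paste the secret key from your OpenAI account settings here
os.environ["OPENAI_API_KEY"] = "sk-..."
```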
One thing: if you are using OpenAI via Azure, you should also set these things as well, so you should set that you're using 00:15:15.720 |
Azure here, the OpenAI API version, Azure has several API versions apparently so you 00:15:23.440 |
will need to set that and then you also need to set the URL for your Azure OpenAI resource 00:15:30.520 |
here as well, and then this last one here I don't think is relevant, so we can skip that. 00:15:40.320 |
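Sketched out, the Azure configuration would look something like this; the variable names match what the OpenAI Python client reads, but the version string and URL here are placeholders you would replace with your own resource's values:

```python
import os

# Only needed if you access OpenAI through Azure
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2022-12-01"  # placeholder; check your resource
os.environ["OPENAI_API_BASE"] = "https://your-resource-name.openai.azure.com"
```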
After that, we need to decide on which model we're going to use; we're going to be using the text-davinci-003 00:15:44.280 |
model, which is one of the better text generation models from OpenAI, okay, and that 00:15:52.360 |
is our large language model; that is our LangChain large language model there. We can 00:15:57.440 |
actually generate stuff with this directly but as we did before we are going to use the 00:16:04.080 |
large language model chain, again if you're using Azure you'll need to follow this step 00:16:09.240 |
here rather than what I just did, so come down here we use large language model chain 00:16:14.480 |
again and we're using the same prompt as what we initially created before so this is a simple 00:16:19.600 |
question answer prompt, large language model is this time DaVinci and I'm going to run 00:16:25.840 |
this and we get this answer so the Green Bay Packers won the Super Bowl in the 2010 season 00:16:31.400 |
so a little more descriptive than the answer we got from our T5 Flan model but that's to 00:16:37.320 |
be expected; OpenAI's DaVinci model is a lot bigger and pretty advanced. 00:16:44.760 |
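The equivalent cell with OpenAI, sketched under the same API; `davinci` is my name for the LLM object, and the Azure lines are an assumption based on LangChain's `AzureOpenAI` wrapper:

```python
from langchain import LLMChain
from langchain.llms import OpenAI

davinci = OpenAI(model_name="text-davinci-003")

# If you're on Azure, use the AzureOpenAI wrapper instead, e.g.:
# from langchain.llms import AzureOpenAI
# davinci = AzureOpenAI(deployment_name="your-deployment")

# Same simple Q&A prompt as before
llm_chain = LLMChain(prompt=prompt, llm=davinci)

print(llm_chain.run("Which NFL team won the Super Bowl in the 2010 season?"))
```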
So after that, let's try again with multiple questions and let's see what we get. Okay, so we get the 00:16:51.960 |
Green Bay Packers won the Super Bowl in the 2010 season, correct, next we get this which 00:17:00.060 |
again is mostly wrong, so "Eugene A. Cernan was the 12th person to walk on the moon"; as 00:17:07.680 |
far as I know it is Harrison Schmitt, I think, yeah, Harrison Schmitt, so not quite right but 00:17:16.600 |
I think the rest of it was very close so Apollo 17 and I'm pretty sure it was December 1972 00:17:25.920 |
as well although not 100% sure on that so we can assume that is correct so the Apollo 00:17:31.560 |
17 mission in December 1972 and I think this guy is actually his teammate so I suppose 00:17:38.360 |
this would have been the 11th person on the moon so it got, it did get pretty close but 00:17:43.280 |
not quite there and that is actually the third question I've skipped one by accident, if 00:17:47.360 |
I'm 6 foot 4 inches how tall am I in centimeters, very specific, we've got 193.04 centimeters 00:17:56.960 |
which is probably like the exact measurement, but I know for sure 193 is correct, and then 00:18:05.120 |
on to the last question, how many eyes does a blade of grass have, we get a blade of grass 00:18:09.880 |
does not have any eyes, okay so we get a sensible answer this time and then I wanted to very 00:18:15.240 |
quickly just show this: so this is a list of items, and when passing this to the large 00:18:19.640 |
language model chain's run method, this is actually incorrect, so it's actually just going to 00:18:24.240 |
see all of this as a single string, now in this case with our model, with our DaVinci 00:18:29.240 |
model it does pretty well, still got this one wrong but it's actually able to manage 00:18:35.280 |
with this even though it's not in the correct format or when asking these questions one 00:18:39.840 |
by one and you can see it actually does get, sometimes it gets the correct answer and sometimes 00:18:45.400 |
it messes other questions up, which is I think pretty interesting to see. 00:18:53.240 |
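For reference, this is the (not really correct) pattern being described; `run` just stringifies the whole list into one prompt:

```python
qs = [
    "Which NFL team won the Super Bowl in the 2010 season?",
    "If I am 6 ft 4 inches, how tall am I in centimeters?",
    "Who was the 12th person on the moon?",
    "How many eyes does a blade of grass have?",
]

# Incorrect usage: the list is cast to a single string, but DaVinci
# still manages to answer most of the questions anyway
print(llm_chain.run(qs))
```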
Now the final one is what we did before, so where we come up to here where we have the string, and let's 00:19:02.560 |
go ahead and do that, bring it down here, so I'm going to answer multiple questions 00:19:12.440 |
in a single string; for the large language model I actually want to be using DaVinci, okay, and we get 00:19:19.600 |
this: so "The Green Bay Packers won the Super Bowl in the 2010 season", "I am 193 centimeters tall", 00:19:26.400 |
yep; Edwin "Buzz" Aldrin, wrong; and "A blade of grass does not have eyes", so we get some good answers there. 00:19:33.200 |
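That final cell is just the earlier multi-question prompt rebuilt with DaVinci as the LLM, roughly:

```python
llm_chain = LLMChain(prompt=long_prompt, llm=davinci)
print(llm_chain.run(qs_str))
```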
Okay, so that's it for this very quick introduction to LangChain, and as I said, in the 00:19:40.600 |
future we're going to be covering this library in a lot more detail and as you've already 00:19:44.920 |
seen at the start of the video there are some pretty interesting things we can do with this 00:19:49.120 |
library very easily. But for now, that's it for this video; I hope all of this has been interesting 00:19:55.440 |
and useful, so thank you very much for watching, and I will see you again in the next one.