Create Custom Tools for Chatbots in LangChain — LangChain #8
Chapters
0:00 LangChain agents and tools
1:46 What are LLM tools
3:12 Code notebook setup and prerequisites
5:58 Building a simple LangChain calculator tool
10:08 Updating agent prompts
12:14 Building tools with multiple parameters
15:40 Helping ChatGPT understand images
23:05 What else can LangChain agents do
00:00:00.000 |
Large language models and conversational agents are some of the most interesting technologies to 00:00:07.040 |
have proliferated recently, specifically agents and their use of tools. So using agents we 00:00:16.640 |
essentially just give a large language model access to different tools and tools can mean 00:00:22.080 |
a lot of things as I will explain in a moment. By using agents and tools we have a pretty much 00:00:30.240 |
infinite potential in what we can actually do with large language models. They allow us to search the web, 00:00:38.080 |
run code, do maths and a lot more. Now we can actually get pretty far with some of the pre-built 00:00:44.880 |
tools that are available through LangChain but in a lot of use cases we will probably need to 00:00:52.160 |
either modify those existing tools in some way or actually just build completely custom new tools 00:00:59.360 |
to fit to our particular use case. So that's exactly what we're going to talk about in this video. 00:01:05.440 |
We're going to take a look at a few pretty simple examples to get started of tools that we can 00:01:11.840 |
build before moving on to what I think is a more interesting example of a tool 00:01:17.840 |
that takes inspiration from the recent HuggingGPT paper and will essentially give our 00:01:25.680 |
ChatGPT large language model access to another deep learning model which will allow it to actually 00:01:33.680 |
look at images and caption them. So that should be pretty interesting but before we get into that 00:01:40.560 |
let's just define kind of what I mean by a tool very quickly. So we have our large language models 00:01:49.360 |
and by default they just kind of exist in isolation. You put in some text and you generate 00:01:56.320 |
some text. You get some output text from them. Tools can be a lot of different things 00:02:02.480 |
but the core is that they are something, this box in the middle here, and that something 00:02:09.680 |
takes in something like a string or a couple of parameters and it outputs a string or a couple 00:02:17.920 |
of parameters. Now the input to this tool is going to be controlled by our large language model 00:02:25.920 |
and then the output actually goes back to our large language model. So it's just a way of 00:02:34.000 |
giving new abilities to our large language models. You can think of maths for example. 00:02:40.000 |
Large language models are very bad at maths but we can actually just tell it okay you're bad at maths 00:02:46.080 |
just use this tool and you can calculate whatever you want, right? That is basically what tools are. 00:02:53.520 |
They are anything that we can plug into our large language model and that anything is pretty flexible 00:03:01.120 |
on what we're using. We just need to figure out a way of inputting and outputting relevant information 00:03:08.560 |
that our large language model can understand. So let's get started with our first example. We're 00:03:15.040 |
going to be using this notebook here so there will be a link to this notebook you can follow along 00:03:20.080 |
with me if you want. As I mentioned, later on we're going to be using 00:03:25.200 |
another ML model as one of our tools. So we're going to be using transformers. 00:03:30.080 |
If you have a CUDA-enabled GPU this part will run a lot faster for you if you use that GPU or if 00:03:38.480 |
you're on Colab what you can do is you can go to runtime at the top, change runtime type, hardware 00:03:43.760 |
accelerator and make sure you have GPU selected. Okay so we're going to run those prerequisites 00:03:51.440 |
that'll just take a moment for those to install. Okay once they are installed we're going to come 00:03:57.040 |
to our first example. So we're going to build a simple circle circumference calculator tool. 00:04:03.920 |
Okay so one thing that large language models are bad at is maths even when it's something pretty 00:04:09.920 |
simple like this. So what we're doing is we're importing this BaseTool class from 00:04:15.840 |
langchain.tools and we're using that to initialize our circumference tool class. Okay so we're going 00:04:21.520 |
to inherit the main methods from that BaseTool. Now there's two important things that we 00:04:28.400 |
define here the name which is just the name of our tool circumference calculator and more 00:04:34.000 |
importantly the description. So the description is literally a natural language description that 00:04:39.520 |
is going to be used by our large language model to decide which tool to use. Okay so we need it 00:04:45.600 |
to describe when to use this tool. So we just say use this tool when you need to calculate a 00:04:51.280 |
circumference using the radius of a circle. Okay it's really clear and straightforward and then we 00:04:58.400 |
need to define these two functions. So we have _run and _arun. Okay _arun is when you would run this 00:05:05.840 |
tool asynchronously. I'm not going to talk about that, it's kind of beyond the scope of this 00:05:10.720 |
video. I will talk about this at some point in the future but for now we're going to leave that. 00:05:14.880 |
So we're just going to focus on the standard _run function. Okay so the _run function in this example 00:05:22.240 |
takes a single input radius which is going to be either an int or a float and we're just going to 00:05:28.080 |
return the circumference based on the radius of the circle. Okay that's all this is super simple. 00:05:35.200 |
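A minimal sketch of the tool structure described here, not the exact notebook code. The tool name and description follow the video; the class name `CircumferenceTool` and the stand-in `BaseTool` stub (used only so the snippet still runs when LangChain is not installed) are assumptions for illustration.

```python
from math import pi
from typing import Union

try:
    from langchain.tools import BaseTool
except ImportError:
    # Assumption: a bare stand-in so the sketch runs without LangChain installed.
    class BaseTool:
        pass


class CircumferenceTool(BaseTool):
    # The name and description are what the LLM sees when choosing a tool.
    name: str = "Circumference calculator"
    description: str = (
        "use this tool when you need to calculate a circumference "
        "using the radius of a circle"
    )

    def _run(self, radius: Union[int, float]) -> float:
        # Plain Python arithmetic, so the result does not depend on the LLM.
        return float(radius) * 2.0 * pi

    def _arun(self, radius: Union[int, float]):
        # Async execution is out of scope here, as in the video.
        raise NotImplementedError("This tool does not support async")


tool = CircumferenceTool()
print(round(tool._run(7.81), 2))  # prints 49.07
```

Calling `_run` directly like this is only for checking the arithmetic; in the video the agent is what invokes the tool.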
I just want to show you like the structure of one of these tools. Okay so we run that 00:05:40.480 |
and when our large language model decides to run this tool it is going to go to this. Okay 00:05:48.400 |
so now we've defined that let's go ahead and actually initialize our large language model, 00:05:54.640 |
initialize the conversational agent that we're going to use. So to do that we actually first 00:06:00.160 |
need an OpenAI API key. For that you need to go to platform.openai.com and get your API key. You just 00:06:08.400 |
put it in here or you can set it as an environment variable and this will grab it for you. Then we 00:06:15.440 |
initialize the large language model. So we're going to be using ChatOpenAI because we're using a 00:06:22.080 |
chat model here, so gpt-3.5-turbo. Okay temperature: because we're doing some maths this is pretty 00:06:28.960 |
important it is very good to keep your temperature very low. You can think of this as basically 00:06:35.280 |
randomness so when you're doing maths or writing code or anything like that it's a very good idea 00:06:41.200 |
to keep the temperature or randomness of the model as low as possible because you don't want it to be 00:06:47.760 |
random with code, it's more likely to make mistakes. When you do want it to be more creative 00:06:54.480 |
like with creative writing then you would set temperature to one or higher. Then we initialize 00:07:01.200 |
conversational memory so this is just going to allow us to remember the previous five interactions 00:07:07.440 |
between the AI and us in this conversation. The memory key here we set this to chat history because 00:07:15.680 |
that is what will be used by the agent down here. So we just align those two things. I've spoken 00:07:21.360 |
about that in a previous video on chat in this LangChain video series so if you want to look 00:07:29.680 |
more into that I would recommend you take a look at the chat video of this series. So let's run 00:07:35.600 |
that. Okay cool and now what we want to do is initialize the agent. 00:07:42.640 |
Let's run this. So we're going to be using the tools here, so the circumference tool 00:07:47.760 |
is the only tool we're using, just one, but we need to pass it as a list into the tools parameter 00:07:52.640 |
of initialize_agent. The agent we're going to be using is a chat agent which means we're using a 00:07:59.280 |
chat model. It's conversational meaning it has conversational memory. It uses the ReAct framework 00:08:05.920 |
which is like a reasoning and action framework so basically with this the model is able to reason 00:08:13.280 |
about what steps to take and also take actions like use a tool based on those thoughts, and 00:08:22.000 |
description is referring to the fact that the large language model decides which tool to use 00:08:27.680 |
based on the description of those tools. Alright we pass in our large language model set verbose 00:08:35.120 |
to true because we're developing this right so we want to see all the information, 00:08:39.760 |
everything that is happening. If you are pushing this out to customers you don't 00:08:45.200 |
necessarily need this. And max iterations: this is saying how many steps of reasoning, action 00:08:54.000 |
and observation you are allowed to take before we say stop. Right this is important because 00:08:59.200 |
it can get stuck in an infinite loop of just trying tool after tool after tool so we don't 00:09:03.760 |
want it to do that. It's going to cost money for starters and then we also have the early 00:09:09.520 |
stopping method so generate means that the model is going to decide when it should stop. 00:09:15.440 |
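To make the role of the iteration cap concrete, here is a toy reason, act, observe loop in plain Python. It is not LangChain's implementation: `run_agent`, `decide`, and the stop message are hypothetical names, and `decide` stands in for the LLM choosing either a tool or a final answer.

```python
from typing import Callable, Dict, List, Tuple

STOP_MESSAGE = "Stopped: iteration limit reached."


def run_agent(
    question: str,
    decide: Callable[[str, List[str]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_iterations: int = 3,
) -> str:
    """Loop until `decide` returns a final answer or the cap is hit."""
    observations: List[str] = []
    for _ in range(max_iterations):
        # `decide` returns ("final", answer) or (tool_name, tool_input).
        action, payload = decide(question, observations)
        if action == "final":
            return payload
        # Run the chosen tool and feed its output back as an observation.
        observations.append(tools[action](payload))
    # Without a cap, a model that keeps picking tools would loop forever.
    return STOP_MESSAGE


def never_finishes(question: str, observations: List[str]) -> Tuple[str, str]:
    # Hypothetical worst case: the "model" keeps calling a tool.
    return ("echo", "try again")


print(run_agent("loop forever?", never_finishes, {"echo": lambda text: text}))
# prints "Stopped: iteration limit reached."
```

The cap bounds the number of tool calls, which is the cost concern raised in the video.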
Okay cool so I have already run that and then we can go to our first question which is can 00:09:22.720 |
you calculate the circumference of a circle that has a radius of 7.81 millimeters. Let's see what 00:09:30.720 |
it comes up with. Okay so it gets 49.03 millimeters let's take a look what the actual answer is. Okay 00:09:39.680 |
49.07 so it got close but it's not really accurate. Okay and if we take a look at why 00:09:45.840 |
this isn't accurate even though we've already passed in that tool it's actually because it 00:09:50.000 |
didn't use the tool. Okay it jumped straight to the final answer so the reason that it's doing that 00:09:56.640 |
is because this model is actually overly confident in its own ability to do maths and so 00:10:03.200 |
to fix that we basically need to tell it that it's terrible at maths. So we need to update the 00:10:10.640 |
system message so the initial message that the model is given to tell it that it can't do maths. 00:10:17.440 |
So let's have a look at the existing prompt. Okay so it's just: Assistant is a large language model 00:10:23.680 |
trained by OpenAI designed to assist with a wide range of tasks, so on and so on. 00:10:29.440 |
Basically it's telling it you can do anything right but we don't want it to do anything. We 00:10:34.480 |
want it to not do maths. So I'm just taking that same prompt but I'm adding a new line in saying 00:10:42.000 |
unfortunately Assistant is terrible at maths. When provided with math questions no matter how simple 00:10:48.800 |
Assistant always refers to its trusty tools and absolutely does not try to answer math questions 00:10:55.760 |
by itself. Okay so we just add this in and this is enough for the model to decide okay actually 00:11:03.520 |
I shouldn't just try and guess the answer I should use my circumference calculator tool. 00:11:09.120 |
Okay so we have our new system message here we need to update our prompt so the prompt actually 00:11:17.680 |
contains a system message some other things and also the tools that it has available so we need to 00:11:24.480 |
create a new prompt using both of those things and then we set our prompt to that new prompt. 00:11:32.240 |
Okay now let's try and ask the exact same question again and see what happens. 00:11:37.600 |
Okay great so it's using the circumference calculator this time it inputs 7.81 and the 00:11:44.880 |
output that it returns is 49.07 right which is accurate because it is literally running some 00:11:51.840 |
Python code to calculate that. Okay so then the output is: the circumference of a circle with a 00:11:58.800 |
radius of 7.81 is approximately 49.07 millimeters, which is much more accurate. Okay that's pretty 00:12:07.520 |
cool. Now in that example we just used one single input and returned a single output. What if we would 00:12:15.760 |
like to use a tool with multiple inputs? Okay we can also do that so we import these, these are just 00:12:24.080 |
specific to the tool we're using and let me just describe the tool. Okay so what we're going to do 00:12:31.280 |
is we have our function here and it's a hypotenuse calculator. Okay we can calculate the hypotenuse 00:12:38.720 |
of a triangle in different ways depending on what we have right so if we have the adjacent 00:12:45.840 |
side and the opposite side we can use this here so we take the square root of the adjacent side 00:12:51.760 |
squared plus the opposite side squared. If we have an angle and one of those sides we can use this 00:12:59.120 |
here so we can use the adjacent side over the cosine of the angle or the opposite side over 00:13:04.640 |
the sine of the angle. Right so there's multiple options here multiple ways we can go about this. 00:13:10.320 |
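The three cases just described can be sketched as one function with optional inputs. This is an illustration rather than the notebook's code: the function name `hypotenuse` is hypothetical, the parameter names mirror the ones mentioned in the video, and the angle is assumed to be given in degrees.

```python
from math import cos, radians, sin, sqrt
from typing import Optional, Union

Number = Union[int, float]


def hypotenuse(
    adjacent_side: Optional[Number] = None,
    opposite_side: Optional[Number] = None,
    angle: Optional[Number] = None,
) -> float:
    """Return the hypotenuse from any two of: adjacent side, opposite side, angle."""
    if adjacent_side is not None and opposite_side is not None:
        # Pythagoras: sqrt(adjacent^2 + opposite^2)
        return sqrt(float(adjacent_side) ** 2 + float(opposite_side) ** 2)
    if adjacent_side is not None and angle is not None:
        # cos(angle) = adjacent / hypotenuse
        return float(adjacent_side) / cos(radians(float(angle)))
    if opposite_side is not None and angle is not None:
        # sin(angle) = opposite / hypotenuse
        return float(opposite_side) / sin(radians(float(angle)))
    raise ValueError("Provide at least two of: adjacent_side, opposite_side, angle.")


print(round(hypotenuse(adjacent_side=51, opposite_side=34), 2))  # prints 61.29
```

Making every parameter optional is what lets the language model pick whichever pair of inputs the question provides.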
So because of this we have multiple inputs and we actually want the large language model to decide 00:13:18.800 |
which of these inputs to use. Right so we're telling it use this tool when you need to 00:13:23.840 |
calculate the length of a hypotenuse given one or two sides of a triangle and or an angle in degrees. 00:13:30.720 |
Okay to use the tool you must provide at least two of the following parameters: adjacent side, 00:13:37.120 |
opposite side or angle. Okay so with that that's actually all we need and this should work. Okay 00:13:44.720 |
so again we're not using async run we're just using run. So we run that we set our new tools 00:13:52.160 |
list in here we could also include the previous circumference calculator tool but just you know 00:13:59.600 |
I just want to show you how to build custom tools here so I'm not going to we're just going to 00:14:03.760 |
include the single tool. Then we need to create a new prompt. Okay so we create our new prompt with 00:14:08.960 |
this code here and then we also because we just changed the tools that the agent has access to 00:14:15.280 |
we need to update those in the agent itself. Okay so we also update the agent tools with 00:14:22.080 |
our new tools list. Okay and now we can ask the question if I have a triangle with two sides of 00:14:28.240 |
length 51 and 34 centimeters what is the length of the hypotenuse? Let's try. So it goes into the 00:14:35.120 |
agent executor chain, goes to the correct action, it inputs the correct items, both of the sides, and 00:14:43.200 |
there we go we get our answer so 61.29, so the length of the hypotenuse is approximately 61.29 00:14:51.440 |
centimeters. Now let's try this so rather than giving both sides we're now going to give 00:14:56.880 |
the opposite side and an angle. Okay let's see if that works. I haven't double checked the logic here so 00:15:04.240 |
I'm not actually sure if this is correct but all we really care about here is showing that the 00:15:11.680 |
inputs and the outputs are being used correctly by the model. Yeah it does opposite side and angle 00:15:17.600 |
and the observation so the calculated output there is 55.86 which we get down here. Okay so that 00:15:28.240 |
seems to be working which is pretty cool. Now both of these are pretty simple examples right 00:15:35.840 |
very simple tools. We can do a lot more and what I want to show you now is actually using tools to 00:15:45.040 |
give your GPT model or the large language model access to other deep learning models. Okay because 00:15:53.920 |
large language models can't do everything. Okay we still use other deep learning models for other 00:16:00.080 |
things right so what I want to do is, well, I'm taking inspiration first from this paper, HuggingGPT, 00:16:08.000 |
and the idea is that you use a large language model as a controller and that will refer to other 00:16:17.360 |
models and these are all like open source models from Hugging Face. So one they have here is this 00:16:24.800 |
image captioning model so I thought okay we can try the same. So we're using a different model. 00:16:32.400 |
I found the one that they use in that paper, at least in that visual, wasn't the best so we're 00:16:38.400 |
going to use, I think it's a more recent model, called BLIP and we just initialize that 00:16:45.680 |
and we'll use that for image captioning and actually I didn't even update this here right 00:16:52.560 |
so initially I tried with that model, let's update this. Cool so what we do is from transformers 00:17:00.480 |
we want to import the processor and the actual model itself. The model we're using is this and 00:17:08.000 |
here we're just saying okay if we have a CUDA-enabled GPU available use that rather than CPU 00:17:14.720 |
okay it just means things will be a bit faster. Now the processor will process any text if 00:17:22.800 |
we pass in any text, in this example we don't but it does do that, so it tokenizes the text, 00:17:28.080 |
and it will also pre-process any images okay so BLIP expects a specific dimensionality of images, 00:17:37.440 |
it expects the image pixels to be normalized, so that will be handled by the processor and then 00:17:44.000 |
after that we will pass it to the model here which is BlipForConditionalGeneration. 00:17:48.480 |
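A sketch of the captioning steps covered in this part of the video: pick a device, load the processor and model, preprocess a PIL image, generate up to 20 new tokens, and decode them. The function name `caption_image` is hypothetical, and the checkpoint name `Salesforce/blip-image-captioning-large` is an assumption, since the transcript does not spell out which BLIP checkpoint is used. The heavy imports sit inside the function so that defining it does not require torch or transformers to be installed.

```python
def caption_image(
    url: str,
    model_name: str = "Salesforce/blip-image-captioning-large",  # assumed checkpoint
    max_new_tokens: int = 20,
) -> str:
    """Download an image from `url` and return a short generated caption."""
    import requests
    import torch
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    # Use a CUDA-enabled GPU when one is available, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name).to(device)

    # Download the image and open it as a PIL image, the format the processor expects.
    image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

    # The processor resizes and normalizes the image into model-ready tensors.
    inputs = processor(image, return_tensors="pt").to(device)

    # Generate caption tokens, then decode them into readable text.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Loading the model on every call keeps the sketch short; for repeated use it would make more sense to load the processor and model once and reuse them.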
Conditional generation is just saying okay given like some text it's going to start generating 00:17:55.760 |
some text and this model, I don't know exactly how it works, but I think it uses 00:18:01.360 |
whatever it sees from the images right so converts them into probably an array or a single 00:18:09.760 |
vector and uses that to inform that text generation okay so we initialize that and we move it to our 00:18:19.360 |
CUDA-enabled GPU if it's available. Okay cool so to generate these captions we actually need 00:18:27.600 |
an image so I'm going to start with this image here, I'm going to download it with requests and 00:18:33.280 |
then I'm going to open it with PIL okay so that will just create a Python image object that we can 00:18:38.960 |
then view and this is also the format that the BLIP processor expects to be fed. Okay so that 00:18:48.000 |
is loaded, it takes a little bit of time, and it's just a picture of an orangutan in a tree okay so 00:18:55.040 |
let's go ahead and pass that to our processor here and then what we're going to do is generate 00:19:02.800 |
some text based on the inputs from this. Okay we also limit that to 20 tokens so yeah we use 00:19:10.880 |
that. By default I think this actually does already use 20 tokens but the default value is to be 00:19:17.360 |
deprecated at some point so you should set the number of tokens you want here. And then what we 00:19:24.720 |
do, so this will output I think an array of tokens, so then we use the processor's 00:19:34.560 |
decode to convert those tokens into human readable text. So let's run that and see what 00:19:43.120 |
it thinks we have. Okay processor is not defined, I need to run this. Okay so then we come to here, 00:19:52.560 |
run this and now we should get our output. Okay so we get: there is a monkey that is sitting in a tree, 00:19:59.280 |
which is accurate. Okay so now what we want to do is just use this processing logic in our tool 00:20:10.080 |
so that's what we're going to do here. So all this code is basically just what we just went 00:20:17.040 |
through so I'm not going to go through that again. All we do is we pass in the URL, so a string, 00:20:23.120 |
which is where we're going to be downloading the image from, so this should actually be URL. 00:20:28.960 |
Okay and yeah looks good. So the description that we're using here is that we should use this tool 00:20:37.520 |
when given the URL of an image that you'd like to be described, it will return a simple caption 00:20:43.920 |
describing the image. Okay so we're just telling it when to use it and what it does and then we 00:20:49.440 |
reinitialize our tools list. Okay run this and then, so the system message that we created before 00:20:57.760 |
mentioned that thing about, you know, unfortunately you're terrible at maths. We're not using any math 00:21:03.360 |
tools now so we can remove that, that's all I've done here, and we reinitialize our prompt 00:21:09.600 |
and also reinitialize our tools. Okay and now we can ask about this image URL here. Okay so it enters 00:21:19.600 |
into the agent executor chain, goes straight to the image captioner, passes in the URL, and this is the 00:21:27.200 |
same image that we saw before so the observation is that there is a monkey that is sitting in the 00:21:31.760 |
tree and so the final answer is, if we come down to here, the image shows a monkey sitting in a tree. 00:21:38.240 |
Okay so now our large language model agent can actually tell us what is in an image, which 00:21:47.280 |
is pretty cool and obviously we can pair this with multiple different deep learning models like they 00:21:52.320 |
did in the HuggingGPT paper. Let's try a few more so I'm gonna load this image. Okay so we just have 00:22:01.200 |
some guy surfing and if we ask the agent what is in this image let's see what it comes up with. 00:22:08.720 |
Okay so we get: surfer riding a wave in the ocean on a clear day. Okay so again pretty accurate. I 00:22:17.360 |
probably wouldn't bet on it being a clear day but yeah maybe it is. Okay so let's come 00:22:24.160 |
down to the final one and let's see, this one's a little more difficult, so let's see what it says. 00:22:29.520 |
Okay so in this image you can see there's a baby, I think it's a crocodile, 00:22:36.560 |
it's a crocodile with the narrow snout and it's just in a river sitting on a 00:22:42.720 |
log. Okay so we say what is in this image, it goes, downloads this image, gives us the observation that 00:22:49.920 |
this is a lizard that is sitting on a tree branch in the water. Okay so it hasn't been specific about 00:22:55.920 |
whether it's a crocodile or an alligator, it's just said a lizard, so you know not perfect but I think 00:23:02.800 |
you know it's still pretty good. So naturally I think you can already see there's quite 00:23:09.120 |
a lot that we can do with tools and we don't have to restrict ourselves to just a single tool per 00:23:14.400 |
agent as well, we can just add like a ton of different tools and give our agent you know a ton 00:23:21.760 |
of options on what it can do and that's actually what we see in like the recent AutoGPT and BabyAGI 00:23:28.480 |
models that have kind of gone crazy recently. That's what they do, there's just like a ton of 00:23:34.960 |
different tools that these models can use. They have like an internal dialogue that they go through 00:23:40.320 |
but a big part of it is actually the number of tools that they can use 00:23:44.720 |
so yeah you can really do quite a lot with these and I think for anyone that's actually 00:23:53.200 |
working with large language models and using these you really do need to be using or need to 00:24:01.520 |
get familiar with building tools because they really are like a key component of being able to 00:24:08.960 |
do cool stuff with these models. Now this is pretty introductory, there is a ton of other stuff that we 00:24:16.400 |
can cover with tools and for sure there will be future videos going into more detail, building 00:24:23.360 |
more advanced conversational agents and yeah plenty more so that should be pretty interesting 00:24:31.280 |
but for now we're going to leave it there. So I hope this video has been interesting and useful. 00:24:38.000 |
Thank you very much for watching and I will see you again in the next one. Bye!