Create Custom Tools for Chatbots in LangChain

00:00:00.000 | Large language models and conversational agents are among the most interesting technologies to

00:00:07.040 | have proliferated recently, specifically agents and their use of tools. So using agents we

00:00:16.640 | essentially just give a large language model access to different tools and tools can mean

00:00:22.080 | a lot of things as I will explain in a moment. By using agents and tools we have a pretty much

00:00:30.240 | infinite potential in what we can actually do with large language models. They allow us to search the web,

00:00:38.080 | run code, do maths and a lot more. Now we can actually get pretty far with some of the pre-built

00:00:44.880 | tools that are available through LangChain but in a lot of use cases we will probably need to

00:00:52.160 | either modify those existing tools in some way or actually just build completely custom new tools

00:00:59.360 | to fit to our particular use case. So that's exactly what we're going to talk about in this video.

00:01:05.440 | We're going to take a look at a few pretty simple examples to get started of tools that we can

00:01:11.840 | build before moving on to what I think is a more interesting example of a tool

00:01:17.840 | that takes inspiration from the recent HuggingGPT paper and will essentially give our

00:01:25.680 | ChatGPT large language model access to another deep learning model which will allow it to actually

00:01:33.680 | look at images and caption them. So that should be pretty interesting but before we get into that

00:01:40.560 | let's just define kind of what I mean by a tool very quickly. So we have our large language models

00:01:49.360 | and by default they just kind of exist in isolation. You put in some text and you generate

00:01:56.320 | some text. You get some output text from them. Tools can be a lot of different things

00:02:02.480 | but the core is that they are something, this box in the middle here, and that something

00:02:09.680 | takes in something like a string or a couple of parameters and it outputs a string or a couple

00:02:17.920 | of parameters. Now the input to this tool is going to be controlled by our large language model

00:02:25.920 | and then the output actually goes back to our large language model. So it's just a way of

00:02:34.000 | giving new abilities to our large language models. You can think of maths for example.

00:02:40.000 | Large language models are very bad at maths but we can actually just tell it okay you're bad at maths

00:02:46.080 | just use this tool and you can calculate whatever you want, right? That is basically what tools are.

00:02:53.520 | They are anything that we can plug into our large language model and that anything is pretty flexible

00:03:01.120 | on what we're using. We just need to figure out a way of inputting and outputting relevant information

00:03:08.560 | that our large language model can understand. So let's get started with our first example. We're

00:03:15.040 | going to be using this notebook here so there will be a link to this notebook you can follow along

00:03:20.080 | with me if you want. We are going to be using, later on I mentioned, we're going to be using

00:03:25.200 | another ML model as one of our tools. So we're going to be using transformers.

00:03:30.080 | If you have a CUDA-enabled GPU this part will run a lot faster for you if you use that GPU or if

00:03:38.480 | you're on Colab what you can do is you can go to runtime at the top, change runtime type, hardware

00:03:43.760 | accelerator and make sure you have GPU selected. Okay so we're going to run those prerequisites

00:03:51.440 | that'll just take a moment for those to install. Okay once they are installed we're going to come

00:03:57.040 | to our first example. So we're going to build a simple circle circumference calculator tool.

00:04:03.920 | Okay so one thing that large language models are bad at is maths even when it's something pretty

00:04:09.920 | simple like this. So what we're doing is we're importing this base tool class from

00:04:15.840 | langchain.tools and we're using that to initialize our circumference tool class. Okay so we're going

00:04:21.520 | to inherit the main methods from that base tool. Now there's two important things that we

00:04:28.400 | define here the name which is just the name of our tool circumference calculator and more

00:04:34.000 | importantly the description. So the description is literally a natural language description that

00:04:39.520 | is going to be used by our large language model to decide which tool to use. Okay so we need it

00:04:45.600 | to describe when to use this tool. So we just say use this tool when you need to calculate a

00:04:51.280 | circumference using the radius of a circle. Okay it's really clear and straightforward and then we

00:04:58.400 | need to define these two functions. So we have _run and _arun. Okay _arun is when you would run this

00:05:05.840 | tool asynchronously. I'm not going to talk about that, it's kind of beyond the scope of this

00:05:10.720 | video. I will talk about this at some point in the future but for now we're going to leave that.

00:05:14.880 | So we're just going to focus on the standard run function. Okay so run function in this example

00:05:22.240 | takes a single input radius which is going to be either int or a float and we're just going to

00:05:28.080 | return the circumference based on the radius of the circle. Okay that's all this is super simple.
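
A minimal sketch of the structure just described, written as a plain Python class so that it runs without LangChain installed. In the notebook the class inherits from LangChain's base tool class; the class name `CircumferenceTool` and the exact strings below are assumptions based on what is said in the video, not a copy of the notebook.

```python
from math import pi
from typing import Union


class CircumferenceTool:
    """Plain-Python stand-in for the tool described above (no LangChain dependency)."""

    # The name and description are what the agent reads when choosing a tool.
    name = "Circumference calculator"
    description = (
        "use this tool when you need to calculate a circumference "
        "using the radius of a circle"
    )

    def _run(self, radius: Union[int, float]) -> float:
        # Circumference of a circle: 2 * pi * r.
        return float(radius) * 2.0 * pi

    def _arun(self, radius: Union[int, float]) -> float:
        # Async execution is left out, as in the video.
        raise NotImplementedError("This tool does not support async")


tool = CircumferenceTool()
print(round(tool._run(7.81), 2))  # 49.07
```

The printed value matches the 49.07 that the agent reports later in the video once it actually calls the tool.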

00:05:35.200 | I just want to show you like the structure of one of these tools. Okay so we run that

00:05:40.480 | and when our large language model decides to run this tool it is going to go to this. Okay

00:05:48.400 | so now we've defined that let's go ahead and actually initialize our large language model,

00:05:54.640 | initialize the conversational agent that we're going to use. So to do that we actually first

00:06:00.160 | need an OpenAI API key. For that you need to go to platform.openai.com and your API key you just

00:06:08.400 | put it in here or you can set it as an environment variable and this will grab it for you. Then we

00:06:15.440 | initialize the large language model. So we're going to be using ChatOpenAI because we're using a

00:06:22.080 | chat model here so gpt-3.5-turbo. Okay temperature because we're doing some maths this is pretty

00:06:28.960 | important it is very good to keep your temperature very low. You can think of this as basically

00:06:35.280 | randomness so when you're doing maths or writing code or anything like that it's a very good idea

00:06:41.200 | to keep the temperature or randomness of the model as low as possible because you don't want it to be

00:06:47.760 | random, with code it's more likely to make mistakes. When you do want it to be more creative

00:06:54.480 | like with creative writing then you would set temperature to one or higher. Then we initialize

00:07:01.200 | conversational memory so this is just going to allow us to remember the previous five interactions

00:07:07.440 | between the AI and us in this conversation. The memory key here we set this to chat_history because

00:07:15.680 | that is what will be used by the agent down here. So we just align those two things. I've spoken

00:07:21.360 | about that in a previous video on chat in this LangChain video series so if you want to look

00:07:29.680 | more into that I would recommend you take a look at the chat video of this series. So let's run

00:07:35.600 | that. Okay cool and now what we want to do is initialize the agent. So we're going to be using

00:07:42.640 | let's run this. So we're going to be using the tools here so the circumference tool

00:07:47.760 | is the only tool we're using just one. We need to pass it as a list into the tools parameter

00:07:52.640 | of initialize_agent. The agent we're going to be using is a chat agent which means we're using a

00:07:59.280 | chat model. It's conversational meaning it has conversational memory. It uses a ReAct framework

00:08:05.920 | which is like a reasoning and action framework so basically with this the model is able to reason

00:08:13.280 | about what steps to take and also take actions like use a tool based on those thoughts and

00:08:22.000 | description is referring to the fact that the large language model decides which tool to use

00:08:27.680 | based on the description of those tools. Alright we pass in our large language model set verbose

00:08:35.120 | to true because we're developing this right so we want to see all the information or the

00:08:39.760 | like everything that is happening. If you are like pushing this out to customers you don't

00:08:45.200 | necessarily need this and max iterations so this is saying how many steps of reasoning and action

00:08:54.000 | and observation you are allowed to take before we say stop. Right this is important because

00:08:59.200 | it can get stuck in an infinite loop of just trying tool after tool after tool so we don't

00:09:03.760 | want it to do that. It's going to cost money for starters and then we also have the early

00:09:09.520 | stopping method so generate means that the model is going to decide when it should stop.
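
The iteration cap described above can be sketched as a toy loop in plain Python. This is not LangChain's implementation; the `Step` signature, the function names, and the stop message are all hypothetical and only illustrate why a limit prevents endless tool calls.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical step signature: given the observations so far, return
# (final_answer, observation). final_answer stays None while the agent is still working.
Step = Callable[[List[str]], Tuple[Optional[str], str]]


def run_with_limit(step: Step, max_iterations: int = 3) -> str:
    observations: List[str] = []
    for _ in range(max_iterations):
        final_answer, observation = step(observations)
        if final_answer is not None:
            return final_answer  # the agent decided it is done
        observations.append(observation)
    # Cap reached: stop instead of trying tool after tool forever.
    return "Agent stopped: iteration limit reached"


def stuck_step(observations: List[str]) -> Tuple[Optional[str], str]:
    # A toy step that never produces a final answer.
    return None, "tool output"


print(run_with_limit(stuck_step, max_iterations=3))
```

With the toy `stuck_step`, the loop runs three times and then returns the stop message rather than looping forever.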

00:09:15.440 | Okay cool so I have already run that and then we can go to our first question which is can

00:09:22.720 | you calculate the circumference of a circle that has a radius of 7.81 millimeters. Let's see what

00:09:30.720 | it comes up with. Okay so it gets 49.03 millimeters let's take a look what the actual answer is. Okay

00:09:39.680 | 49.07 so it got close but it's not really accurate. Okay and if we take a look at why

00:09:45.840 | this isn't accurate even though we've already passed in that tool it's actually because it

00:09:50.000 | didn't use the tool. Okay it jumped straight to the final answer so the reason that it's doing that

00:09:56.640 | is because this model is actually overly confident in its own ability to do maths and so

00:10:03.200 | to fix that we basically need to tell it that it's terrible at maths. So we need to update the

00:10:10.640 | system message so the initial message that the model is given to tell it that it can't do maths.

00:10:17.440 | So let's have a look at the existing prompt. Okay so it's just Assistant is a large language model

00:10:23.680 | trained by OpenAI designed to assist with a wide range of tasks and so on and so on.

00:10:29.440 | Basically it's telling it you can do anything right but we don't want it to do anything. We

00:10:34.480 | want it to not do maths. So I'm just taking that same prompt but I'm adding a new line in saying

00:10:42.000 | unfortunately Assistant is terrible at maths. When provided with math questions no matter how simple

00:10:48.800 | Assistant always refers to its trusty tools and absolutely does not try to answer math questions

00:10:55.760 | by itself. Okay so we just add this in and this is enough for the model to decide okay actually

00:11:03.520 | I shouldn't just try and guess the answer I should use my circumference calculator tool.

00:11:09.120 | Okay so we have our new system message here we need to update our prompt so the prompt actually

00:11:17.680 | contains a system message some other things and also the tools that it has available so we need to

00:11:24.480 | create a new prompt using both of those things and then we set our prompt to that new prompt.

00:11:32.240 | Okay now let's try and ask the exact same question again and see what happens.

00:11:37.600 | Okay great so it's using the circumference calculator this time it inputs 7.81 and the

00:11:44.880 | output that it returns is 49.07 right which is accurate because it is literally running some

00:11:51.840 | Python code to calculate that. Okay so then the output is a circumference of a circle with a

00:11:58.800 | radius of 7.81 is approximately 49.07 millimeters which is much more accurate. Okay that's pretty

00:12:07.520 | cool now in that example we just use one single input and return a single output. What if we would

00:12:15.760 | like to use a tool with multiple inputs? Okay we can also do that so we import these, these are just

00:12:24.080 | specific to the tool we're using and let me just describe the tool. Okay so what we're going to do

00:12:31.280 | is we have our function here and it's a hypotenuse calculator. Okay we can calculate the hypotenuse

00:12:38.720 | of a triangle in different ways depending on what we have right so if we have the adjacent

00:12:45.840 | side and the opposite side we can use this here so we take the square root of the adjacent side

00:12:51.760 | squared plus the opposite side squared. If we have an angle and one of those sides we can use this

00:12:59.120 | here so we can use the adjacent side over the cosine of the angle or the opposite side over

00:13:04.640 | the sine of the angle. Right so there's multiple options here multiple ways we can go about this.
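
The three cases just described can be sketched as a plain Python function. The function and parameter names here are assumptions, and the conversion from degrees to radians is added in this sketch because the tool description asks for the angle in degrees; the notebook's own logic may differ.

```python
from math import cos, radians, sin, sqrt
from typing import Optional, Union

Number = Union[int, float]


def hypotenuse(
    adjacent_side: Optional[Number] = None,
    opposite_side: Optional[Number] = None,
    angle: Optional[Number] = None,
) -> float:
    """Return the hypotenuse of a right triangle from any two of the three inputs.

    `angle` is in degrees, measured between the adjacent side and the hypotenuse.
    """
    if adjacent_side is not None and opposite_side is not None:
        # Pythagoras: sqrt(adjacent^2 + opposite^2).
        return sqrt(float(adjacent_side) ** 2 + float(opposite_side) ** 2)
    if adjacent_side is not None and angle is not None:
        # cos(angle) = adjacent / hypotenuse.
        return float(adjacent_side) / cos(radians(float(angle)))
    if opposite_side is not None and angle is not None:
        # sin(angle) = opposite / hypotenuse.
        return float(opposite_side) / sin(radians(float(angle)))
    raise ValueError("Provide at least two of: adjacent_side, opposite_side, angle")


print(round(hypotenuse(adjacent_side=51, opposite_side=34), 2))  # 61.29
```

Making every parameter optional is what lets the language model choose which pair of inputs to supply; the function raises an error when fewer than two are given.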

00:13:10.320 | So because of this we have multiple inputs and we actually want the large language model to decide

00:13:18.800 | which of these inputs to use. Right so we're telling it use this tool when you need to

00:13:23.840 | calculate the length of a hypotenuse given one or two sides of a triangle and or an angle in degrees.

00:13:30.720 | Okay to use the tool you must provide at least two of the following parameters: adjacent side,

00:13:37.120 | opposite side or angle. Okay so with that that's actually all we need and this should work. Okay

00:13:44.720 | so again we're not using async run we're just using run. So we run that we set our new tools

00:13:52.160 | list in here we could also include the previous circumference calculator tool but just you know

00:13:59.600 | I just want to show you how to build custom tools here so I'm not going to we're just going to

00:14:03.760 | include the single tool. Then we need to create a new prompt. Okay so we create our new prompt with

00:14:08.960 | this code here and then we also because we just changed the tools that the agent has access to

00:14:15.280 | we need to update those in the agent itself. Okay so we also update the agent tools with

00:14:22.080 | our new tools list. Okay and now we can ask the question if I have a triangle with two sides of

00:14:28.240 | length 51 and 34 centimeters what is the length of the hypotenuse? Let's try. So it goes into the

00:14:35.120 | agent execute chain goes to the correct action it inputs the correct items both of the sides and

00:14:43.200 | there we go we get our answer so 61.29 so the length of the hypotenuse is approximately 61.29

00:14:51.440 | centimeters. Now let's try this so rather than giving both sides we're now going to give

00:14:56.880 | the opposite side and an angle. Okay let's see if that works I haven't double checked logic here so

00:15:04.240 | I'm not actually sure if this is correct but all we really care about here is showing that the

00:15:11.680 | inputs and the outputs are being used correctly by the model. Yeah it does opposite side and angle

00:15:17.600 | and the observation so the calculated output there is 55.86 which we get down here. Okay so that

00:15:28.240 | seems to be working which is pretty cool. Now both of these are pretty simple examples right

00:15:35.840 | very simple tools. We can do a lot more and what I want to show you now is actually using tools to

00:15:45.040 | give your GPT model or the large language model access to other deep learning models. Okay because

00:15:53.920 | large language models can't do everything. Okay we still use other deep learning models for other

00:16:00.080 | things right so what I want to do is well I'm taking inspiration first from this paper

00:16:08.000 | HuggingGPT and the idea is that you use a large language model as a controller and that will refer to other

00:16:17.360 | models and these are all like open source models from Hugging Face. So one they have here is this

00:16:24.800 | image captioning model so I thought okay we can try the same. So we're using a different model

00:16:32.400 | I found the one that they use in that paper at least in that visual wasn't the best so we're

00:16:38.400 | going to use a more I think it's a more recent model called BLIP and we just initialize that

00:16:45.680 | and we'll use that for image captioning and actually I didn't even update this here right

00:16:52.560 | so initially I tried with that model let's update this cool so what we do is we from transformers

00:17:00.480 | we want to import the processor and the actual model itself the model we're using is this and we

00:17:08.000 | here we're just saying okay if we have a CUDA enabled GPU available use that rather than CPU

00:17:14.720 | okay it just means things will be a bit faster. Now the processor that will process any text if

00:17:22.800 | we pass in any text in this example we don't but it does do that so tokenizes the text

00:17:28.080 | and it will also pre-process any images okay so BLIP expects a specific dimensionality of images

00:17:37.440 | it expects the image pixels to be normalized so that will be handled by the processor and then

00:17:44.000 | after that we will pass it to the model here which is BlipForConditionalGeneration.

00:17:48.480 | Conditional generation is just saying okay given like some text it's going to start generating

00:17:55.760 | some text and it's going to this model I don't know exactly how it works but I think it uses

00:18:01.360 | whatever it sees from the images right so converts them into probably a set of an array or a single

00:18:09.760 | vector and uses that to inform that text generation okay so we initialize that and we move it to our

00:18:19.360 | CUDA enabled GPU if it's available. Okay cool so if we to generate these captions we actually need

00:18:27.600 | an image so I'm going to start with this image here I'm going to download it with requests and

00:18:33.280 | then I'm going to open it with PIL okay so that will just create a Python image object that we can

00:18:38.960 | then view and this is also the format that the BLIP processor expects to be fed. Okay so that

00:18:48.000 | is loaded it takes a little bit of time and it's just a picture of an orangutan in a tree okay so

00:18:55.040 | let's go ahead and pass that to our processor here and then what we're going to do is generate

00:19:02.800 | some text based on the inputs from this okay we also limit that to 20 tokens so yeah we use

00:19:10.880 | that by default I think this actually does already use 20 tokens but the default value is to be

00:19:17.360 | deprecated at some point so you should set the number of tokens you want here and then what we

00:19:24.720 | do so this will output I think it outputs like an array of tokens so then we use the processor

00:19:34.560 | decode to convert those tokens into like human readable text so let's run that and see what

00:19:43.120 | it thinks we have okay processor is not defined I need to run this okay so then we come to here

00:19:52.560 | run this and now we should get our output okay so we get there is a monkey that is sitting in a tree

00:19:59.280 | which is accurate okay so now what we want to do is just use this same processing logic in our tool

00:20:10.080 | so that's what we're going to do here so all this code is basically just what we just went

00:20:17.040 | through so I'm not going to go through that again all we do is we pass in the URL so a string

00:20:23.120 | which is where we're going to be downloading everything so this should actually be URL

00:20:28.960 | okay and yeah looks good so the description that we're using here is that we should use this tool

00:20:37.520 | when given the URL of an image that you'd like to be described it will return a simple caption

00:20:43.920 | describing the image okay so we're just telling it when to use it and what it does and then we

00:20:49.440 | reinitialize our tools list okay run this and then so the system message before that we created it

00:20:57.760 | mentioned that thing about you know unfortunately you're terrible at maths we're not using any math

00:21:03.360 | tools now so we can remove that that's all I've done here and we reinitialize our prompt

00:21:09.600 | and also reinitialize our tools okay and now we can ask about this image URL here okay so it enters

00:21:19.600 | into the agent executor chain goes straight to the image captioner passes in the URL and this is the

00:21:27.200 | same image that we saw before so the observation is that there is a monkey that is sitting in the

00:21:31.760 | tree and so the final answer is if we come down to here the image shows a monkey sitting in a tree

00:21:38.240 | okay so now our large language model agent can actually tell us what is in an image which

00:21:47.280 | is pretty cool and obviously we can pair this with multiple different deep learning models like they

00:21:52.320 | did in the HuggingGPT paper let's try a few more so I'm gonna load this image okay so just have

00:22:01.200 | some guy surfing and if we ask the agent what is in this image let's see what it comes up with

00:22:08.720 | okay so we get surfer riding a wave in the ocean on a clear day okay so again pretty accurate I

00:22:17.360 | wouldn't I probably wouldn't bet on it being a clear day but yeah maybe it is okay so let's come

00:22:24.160 | down to the final one and let's see this one's a little more difficult so let's see what it says

00:22:29.520 | okay so in this image you can see there's a baby, I think it's a crocodile

00:22:36.560 | it's a crocodile with the like narrow snout and it's just like in a river sitting on a

00:22:42.720 | log okay so we say what is in this image it goes down this image gives us the observation that

00:22:49.920 | this is a lizard that is sitting on a tree branch in the water okay so it hasn't been specific about

00:22:55.920 | whether it's a crocodile or an alligator it's just said a lizard so you know not perfect but I think

00:23:02.800 | you know it's still pretty good so naturally I think you can already see there's quite

00:23:09.120 | a lot that we can do with tools and we don't have to restrict ourselves to just a single tool per

00:23:14.400 | agent as well we can just add like a ton of different tools and give our agent you know a ton

00:23:21.760 | of options on what it can do and that's actually what we see in like the recent AutoGPT and

00:23:28.480 | BabyAGI models that have kind of gone crazy recently that's what they do there's just like a ton of

00:23:34.960 | different tools that these models can use they have like an internal dialogue that it goes through

00:23:40.320 | but a big part of it is actually the number of tools that they can use

00:23:44.720 | so yeah you can really do quite a lot with these and I think for anyone that's actually

00:23:53.200 | working with large language models and using these you really do need to be using or need to

00:24:01.520 | get familiar with building tools because they really are like a key component of being able to

00:24:08.960 | do cool stuff with these models now this is pretty introductory there is a ton of other stuff that we

00:24:16.400 | can cover with tools and for sure there will be future videos going into more detail building

00:24:23.360 | more advanced conversational agents and yeah plenty more so that should be pretty interesting

00:24:31.280 | but for now we're going to leave it there so I hope this video has been interesting and useful

00:24:38.000 | thank you very much for watching and I will see you again in the next one. Bye!

Create Custom Tools for Chatbots in LangChain — LangChain #8