
Create Custom Tools for Chatbots in LangChain — LangChain #8


Chapters

0:00 LangChain agents and tools
1:46 What are LLM tools
3:12 Code notebook setup and prerequisites
5:58 Building a simple LangChain calculator tool
10:08 Updating agent prompts
12:14 Building tools with multiple parameters
15:40 Helping ChatGPT understand images
23:05 What else can LangChain agents do

Whisper Transcript

00:00:00.000 | large language models and conversational agents are one of the most interesting technologies to
00:00:07.040 | have proliferated recently, specifically agents and their use of tools. So using agents we
00:00:16.640 | essentially just give a large language model access to different tools and tools can mean
00:00:22.080 | a lot of things as I will explain in a moment. By using agents and tools we have a pretty much
00:00:30.240 | infinite potential in what we can actually do with large language models. They allow us to search the web,
00:00:38.080 | run code, do maths and a lot more. Now we can actually get pretty far with some of the pre-built
00:00:44.880 | tools that are available through LangChain but in a lot of use cases we will probably need to
00:00:52.160 | either modify those existing tools in some way or actually just build completely custom new tools
00:00:59.360 | to fit to our particular use case. So that's exactly what we're going to talk about in this video.
00:01:05.440 | We're going to take a look at a few pretty simple examples to get started of tools that we
00:01:11.840 | can build before moving on to what I think is a more interesting example of a tool
00:01:17.840 | that takes inspiration from the recent HuggingGPT paper and will essentially give our
00:01:25.680 | ChatGPT large language model access to another deep learning model which will allow it to actually
00:01:33.680 | look at images and caption them. So that should be pretty interesting but before we get into that
00:01:40.560 | let's just define kind of what I mean by a tool very quickly. So we have our large language models
00:01:49.360 | and by default they just kind of exist in isolation. You put in some text and you generate
00:01:56.320 | some text. You get some output text from them. Tools can be a lot of different things
00:02:02.480 | but the core is that they are something, this box in the middle here, and that something
00:02:09.680 | takes in some input, like a string or a couple of parameters, and it outputs a string or a couple
00:02:17.920 | of parameters. Now the input to this tool is going to be controlled by our large language model
00:02:25.920 | and then the output actually goes back to our large language model. So it's just a way of
00:02:34.000 | giving new abilities to our large language models. You can think of maths for example.
00:02:40.000 | Large language models are very bad at maths but we can actually just tell it okay you're bad at maths
00:02:46.080 | just use this tool and you can calculate whatever you want, right? That is basically what tools are.
00:02:53.520 | They are anything that we can plug into our large language model and that anything is pretty flexible
00:03:01.120 | on what we're using. We just need to figure out a way of inputting and outputting relevant information
00:03:08.560 | that our large language model can understand. So let's get started with our first example. We're
00:03:15.040 | going to be using this notebook here so there will be a link to this notebook you can follow along
00:03:20.080 | with me if you want. As I mentioned, later on we're going to be using
00:03:25.200 | another ML model as one of our tools. So we're going to be using transformers.
00:03:30.080 | If you have a CUDA-enabled GPU this part will run a lot faster for you if you use that GPU or if
00:03:38.480 | you're on Colab what you can do is you can go to runtime at the top, change runtime type, hardware
00:03:43.760 | accelerator and make sure you have GPU selected. Okay so we're going to run those prerequisites
00:03:51.440 | that'll just take a moment for those to install. Okay once they are installed we're going to come
00:03:57.040 | to our first example. So we're going to build a simple circle circumference calculator tool.
00:04:03.920 | Okay so one thing that large language models are bad at is maths even when it's something pretty
00:04:09.920 | simple like this. So what we're doing is we're importing this BaseTool class from
00:04:15.840 | LangChain's tools and we're using that to initialize our circumference tool class. Okay so we're going
00:04:21.520 | to inherit the main methods from that base tool. Now there are two important things that we
00:04:28.400 | define here the name which is just the name of our tool circumference calculator and more
00:04:34.000 | importantly the description. So the description is literally a natural language description that
00:04:39.520 | is going to be used by our large language model to decide which tool to use. Okay so we need it
00:04:45.600 | to describe when to use this tool. So we just say use this tool when you need to calculate a
00:04:51.280 | circumference using the radius of a circle. Okay it's really clear and straightforward and then we
00:04:58.400 | need to define these two functions. So we have _run and _arun. Okay _arun is when you would run this
00:05:05.840 | tool asynchronously. I'm not going to talk about that, it's kind of beyond the scope of this
00:05:10.720 | video. I will talk about this at some point in the future but for now we're going to leave that.
00:05:14.880 | So we're just going to focus on the standard _run function. Okay so the _run function in this example
00:05:22.240 | takes a single input, radius, which is going to be either an int or a float and we're just going to
00:05:28.080 | return the circumference based on the radius of the circle. Okay that's all this is super simple.
00:05:35.200 | I just want to show you like the structure of one of these tools. Okay so we run that
00:05:40.480 | and when our large language model decides to run this tool it is going to go to this. Okay
00:05:48.400 | so now we've defined that let's go ahead and actually initialize our large language model,
00:05:54.640 | initialize the conversational agent that we're going to use. So to do that we actually first
00:06:00.160 | need an OpenAI API key. For that you need to go to platform.openai.com and your API key you just
00:06:08.400 | put it in here or you can set it as an environment variable and this will grab it for you. Then we
00:06:15.440 | initialize the large language model. So we're going to be using ChatOpenAI because we're using a
00:06:22.080 | chat model here, so gpt-3.5-turbo. Okay temperature, because we're doing some maths this is pretty
00:06:28.960 | important it is very good to keep your temperature very low. You can think of this as basically
00:06:35.280 | randomness so when you're doing maths or writing code or anything like that it's a very good idea
00:06:41.200 | to keep the temperature or randomness of the model as low as possible because you don't want it to be
00:06:47.760 | random with code, it's more likely to make mistakes. When you do want it to be more creative
00:06:54.480 | like with creative writing then you would set temperature to one or higher. Then we initialize
00:07:01.200 | conversational memory so this is just going to allow us to remember the previous five interactions
00:07:07.440 | between the AI and us in this conversation. The memory key here we set this to chat history because
00:07:15.680 | that is what will be used by the agent down here. So we just align those two things. I've spoken
00:07:21.360 | about that in a previous video on chat in this LangChain video series so if you want to look
00:07:29.680 | more into that I would recommend you take a look at the chat video of this series. So let's run
00:07:35.600 | that. Okay cool and now what we want to do is initialize the agent. So we're going to be using
00:07:42.640 | let's run this. So we're going to be using the tools here, the circumference tool
00:07:47.760 | is the only tool we're using, just one. We need to pass it as a list into the tools parameter
00:07:52.640 | of initialize agent. The agent we're going to be using is a chat agent which means we're using a
00:07:59.280 | chat model. It's conversational meaning it has conversational memory. It uses a ReAct framework
00:08:05.920 | which is like a reasoning and action framework so basically with this the model is able to reason
00:08:13.280 | about what steps to take and also take actions like use a tool based on those thoughts and
00:08:22.000 | description is referring to the fact that the large language model decides which tool to use
00:08:27.680 | based on the description of those tools. Alright we pass in our large language model set verbose
00:08:35.120 | to true because we're developing this, right, so we want to see all the information,
00:08:39.760 | like everything that is happening. If you are pushing this out to customers you don't
00:08:45.200 | necessarily need this. And max iterations, this is saying how many steps of reasoning, action
00:08:54.000 | and observation you are allowed to take before we say stop. Right, this is important because
00:08:59.200 | it can get stuck in an infinite loop of just trying tool after tool after tool so we don't
00:09:03.760 | want it to do that. It's going to cost money for starters and then we also have the early
00:09:09.520 | stopping method so generate means that the model is going to decide when it should stop.
00:09:15.440 | Okay cool so I have already run that and then we can go to our first question which is can
00:09:22.720 | you calculate the circumference of a circle that has a radius of 7.81 millimeters. Let's see what
00:09:30.720 | it comes up with. Okay so it gets 49.03 millimeters let's take a look what the actual answer is. Okay
00:09:39.680 | 49.07 so it got close but it's not really accurate. Okay and if we take a look at why
00:09:45.840 | this isn't accurate even though we've already passed in that tool it's actually because it
00:09:50.000 | didn't use the tool. Okay it jumped straight to the final answer so the reason that it's doing that
00:09:56.640 | is because this model is actually overly confident in its own ability to do maths and so
00:10:03.200 | to fix that we basically need to tell it that it's terrible at maths. So we need to update the
00:10:10.640 | system message so the initial message that the model is given to tell it that it can't do maths.
00:10:17.440 | So let's have a look at the existing prompt. Okay so it's just "Assistant is a large language model
00:10:23.680 | trained by OpenAI, designed to assist with a wide range of tasks" and so on and so on.
00:10:29.440 | Basically it's telling it you can do anything right but we don't want it to do anything. We
00:10:34.480 | want it to not do maths. So I'm just taking that same prompt but I'm adding a new line in saying
00:10:42.000 | unfortunately Assistant is terrible at maths. When provided with math questions no matter how simple
00:10:48.800 | Assistant always refers to its trusty tools and absolutely does not try to answer math questions
00:10:55.760 | by itself. Okay so we just add this in and this is enough for the model to decide okay actually
00:11:03.520 | I shouldn't just try and guess the answer I should use my circumference calculator tool.
00:11:09.120 | Okay so we have our new system message here we need to update our prompt so the prompt actually
00:11:17.680 | contains a system message some other things and also the tools that it has available so we need to
00:11:24.480 | create a new prompt using both of those things and then we set our prompt to that new prompt.
00:11:32.240 | Okay now let's try and ask the exact same question again and see what happens.
00:11:37.600 | Okay great so it's using the circumference calculator this time it inputs 7.81 and the
00:11:44.880 | output that it returns is 49.07 right which is accurate because it is literally running some
00:11:51.840 | Python code to calculate that. Okay so then the output is a circumference of a circle with a
00:11:58.800 | radius of 7.81 is approximately 49.07 millimeters which is much more accurate. Okay that's pretty
00:12:07.520 | cool now in that example we just use one single input and return a single output. What if we would
00:12:15.760 | like to use a tool with multiple inputs? Okay we can also do that so we import these, these are just
00:12:24.080 | specific to the tool we're using and let me just describe the tool. Okay so what we're going to do
00:12:31.280 | is we have our function here and it's a hypotenuse calculator. Okay we can calculate the hypotenuse
00:12:38.720 | of a triangle in different ways depending on what we have. Right, so if we have the adjacent
00:12:45.840 | side and the opposite side we can use this here so we take the square root of the adjacent side
00:12:51.760 | squared plus the opposite side squared. If we have an angle and one of those sides we can use this
00:12:59.120 | here so we can use the adjacent side over the cosine of the angle or the opposite side over
00:13:04.640 | the sine of the angle. Right so there's multiple options here multiple ways we can go about this.
00:13:10.320 | So because of this we have multiple inputs and we actually want the large language model to decide
00:13:18.800 | which of these inputs to use. Right so we're telling it use this tool when you need to
00:13:23.840 | calculate the length of a hypotenuse given one or two sides of a triangle and or an angle in degrees.
00:13:30.720 | Okay to use the tool you must provide at least two of the following parameters: adjacent side,
00:13:37.120 | opposite side, or angle. Okay so with that, that's actually all we need and this should work. Okay
00:13:44.720 | so again we're not using the async _arun, we're just using _run. So we run that, we set our new tools
00:13:52.160 | list in here we could also include the previous circumference calculator tool but just you know
00:13:59.600 | I just want to show you how to build custom tools here, so we're just going to
00:14:03.760 | include the single tool. Then we need to create a new prompt. Okay so we create our new prompt with
00:14:08.960 | this code here and then we also because we just changed the tools that the agent has access to
00:14:15.280 | we need to update those in the agent itself. Okay so we also update the agent tools with
00:14:22.080 | our new tools list. Okay and now we can ask the question if I have a triangle with two sides of
00:14:28.240 | length 51 and 34 centimeters what is the length of the hypotenuse? Let's try. So it goes into the
00:14:35.120 | agent execute chain goes to the correct action it inputs the correct items both of the sides and
00:14:43.200 | there we go we get our answer, 61.29, so the length of the hypotenuse is approximately 61.29
00:14:51.440 | centimeters. Now let's try this, so rather than giving both sides we're now going to give
00:14:56.880 | the opposite side and an angle. Okay let's see if that works, I haven't double checked the logic here so
00:15:04.240 | I'm not actually sure if this is correct but all we really care about here is showing that the
00:15:11.680 | inputs and the outputs are being used correctly by the model. Yeah it does opposite side and angle
00:15:17.600 | and the observation so the calculated output there is 55.86 which we get down here. Okay so that
00:15:28.240 | seems to be working which is pretty cool. Now both of these are pretty simple examples, right,
00:15:35.840 | very simple tools. We can do a lot more and what I want to show you now is actually using tools to
00:15:45.040 | give your GPT model or the large language model access to other deep learning models. Okay because
00:15:53.920 | large language models can't do everything. Okay we still use other deep learning models for other
00:16:00.080 | things, right, so what I want to do is, well, I'm taking inspiration first from this paper,
00:16:08.000 | HuggingGPT, and the idea is that you use a large language model as a controller and that will refer to other
00:16:17.360 | models and these are all like open source models from Hugging Face. So one they have here is this
00:16:24.800 | image captioning model so what I thought okay we can try the same. So we're using a different model
00:16:32.400 | I found the one that they use in that paper, at least in that visual, wasn't the best so we're
00:16:38.400 | going to use what I think is a more recent model called BLIP and we just initialize that
00:16:45.680 | and we'll use that for image captioning. And actually I didn't even update this here, right,
00:16:52.560 | so initially I tried with that model, let's update this. Cool, so what we do is, from Transformers,
00:17:00.480 | we want to import the processor and the actual model itself, the model we're using is this, and
00:17:08.000 | here we're just saying okay if we have a CUDA enabled GPU available use that rather than CPU
00:17:14.720 | okay it just means things will be a bit faster. Now the processor that will process any text if
00:17:22.800 | we pass in any text in this example we don't but it does do that so tokenizes the text
00:17:28.080 | and it will also pre-process any images okay so Blip expects a specific dimensionality of images
00:17:37.440 | it expects the image pixels to be normalized so that will be handled by the processor and then
00:17:44.000 | after that we will pass it to the model here which is BLIP for conditional generation.
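The setup might look roughly like this (a sketch; the model ID is the BLIP captioning checkpoint on Hugging Face, and downloading the weights takes a while on the first run):

```python
import torch
from transformers import BlipForConditionalGeneration, BlipProcessor

# use a CUDA-enabled GPU if one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

model_id = "Salesforce/blip-image-captioning-large"

# the processor tokenizes any text prompt and resizes/normalizes images
# into the dimensionality BLIP expects
processor = BlipProcessor.from_pretrained(model_id)

# the model generates caption text conditioned on the image features
model = BlipForConditionalGeneration.from_pretrained(model_id).to(device)
```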
00:17:48.480 | Conditional generation is just saying okay given like some text it's going to start generating
00:17:55.760 | some text. And this model, I don't know exactly how it works, but I think it uses
00:18:01.360 | whatever it sees from the images, right, so it converts them into probably an array or a single
00:18:09.760 | vector and uses that to inform the text generation. Okay so we initialize that and we move it to our
00:18:19.360 | CUDA enabled GPU if it's available. Okay cool, so to generate these captions we actually need
00:18:27.600 | an image so I'm going to start with this image here I'm going to download it with requests and
00:18:33.280 | then I'm going to open it with Pill okay so that will just create a Python image object that we can
00:18:38.960 | then view and this is also the format that the Blip processor expects to be fed. Okay so that
00:18:48.000 | is loaded it takes a little bit of time and it's just a picture of a orangutan in a tree okay so
00:18:55.040 | let's go ahead and pass that to our processor here and then what we're going to do is generate
00:19:02.800 | some text based on the inputs from this okay we also limit that to 20 tokens so yeah
00:19:10.880 | by default I think this actually already uses 20 tokens but the default value is to be
00:19:17.360 | deprecated at some point so you should set the number of tokens you want here and then what we
00:19:24.720 | do, so this will output, I think, an array of tokens, so then we use the processor's
00:19:34.560 | decode to convert those tokens into human readable text so let's run that and see what
00:19:43.120 | it thinks we have. Okay, processor is not defined, I need to run this. Okay so then we come to here,
00:19:52.560 | run this and now we should get our output okay so we get there is a monkey that is sitting in a tree
00:19:59.280 | which is accurate. Okay so now what we want to do is just use this processing logic in our tool
00:20:10.080 | so that's what we're going to do here so all this code is basically just what we just went
00:20:17.040 | through so I'm not going to go through that again. All we do is we pass in the URL, so a string
00:20:23.120 | which is where we're going to be downloading everything so this should actually be URL
00:20:28.960 | okay and yeah looks good so the description that we're using here is that we should use this tool
00:20:37.520 | when given the URL of an image that you'd like to be described it will return a simple caption
00:20:43.920 | describing the image okay so we're just telling it when to use it and what it does and then we
00:20:49.440 | reinitialize our tools list, okay run this, and then the system message that we created before
00:20:57.760 | mentioned that thing about, you know, unfortunately you're terrible at maths. We're not using any math
00:21:03.360 | tools now so we can remove that, that's all I've done here, and we reinitialize our prompt
00:21:09.600 | and also reinitialize our tools okay and now we can ask about this image URL here okay so it enters
00:21:19.600 | into the agent executor chain goes straight to the image captioner passes in the URL and this is the
00:21:27.200 | same image that we saw before so the observation is that there is a monkey that is sitting in the
00:21:31.760 | tree and so the final answer is if we come down to here the image shows a monkey sitting in a tree
00:21:38.240 | okay so now our large language model agent can actually tell us what is in an image which
00:21:47.280 | is pretty cool and obviously we can pair this with multiple different deep learning models like they
00:21:52.320 | did in the HuggingGPT paper. Let's try a few more, so I'm gonna load this image, okay so we just have
00:22:01.200 | some guy surfing, and if we ask the agent what is in this image let's see what it comes up with
00:22:08.720 | okay so we get surfer riding a wave in the ocean on a clear day okay so again pretty accurate I
00:22:17.360 | wouldn't I probably wouldn't bet on it being a clear day but yeah maybe it is okay so let's come
00:22:24.160 | down to the final one and let's see this one's a little more difficult so let's see what it says
00:22:29.520 | okay so in this image you can see there's a baby, I think it's a crocodile,
00:22:36.560 | with the like narrow snout, and it's just in a river sitting on a
00:22:42.720 | log okay so we say what is in this image it goes down this image gives us the observation that
00:22:49.920 | this is a lizard that is sitting on a tree branch in the water okay so it hasn't been specific about
00:22:55.920 | whether it's a crocodile or an alligator it's just said a lizard so you know not perfect but I think
00:23:02.800 | you know it's still still pretty good so naturally I think you can already see there's there's quite
00:23:09.120 | a lot that we can do with tools and we don't have to restrict ourselves to just a single tool per
00:23:14.400 | agent as well we can we can just add like a ton of different tools and give our agent you know a ton
00:23:21.760 | of options on what it can do and that's actually what we see in like the recent auto gbt and baby
00:23:28.480 | agi models that have kind of gone crazy recently that's what they do there's just like a ton of
00:23:34.960 | different tools that these models can use they have like an internal dialogue that it goes through
00:23:40.320 | but a big part of it is actually the number of tools that they can use
00:23:44.720 | so yeah you can really do quite a lot with these and I think for anyone that's actually
00:23:53.200 | working with large language models and using these, you really do need to
00:24:01.520 | get familiar with building tools because they really are like a key component of being able to
00:24:08.960 | do cool stuff with these models now this is pretty introductory there is a ton of other stuff that we
00:24:16.400 | can cover with tools and for sure there will be future videos going into more detail, building
00:24:23.360 | more advanced conversational agents and yeah plenty more so that should be pretty interesting
00:24:31.280 | but for now we're going to leave it there so I hope this video has been interesting and useful
00:24:38.000 | thank you very much for watching and I will see you again in the next one. Bye!