Hugging Face Agents — Building Custom Tools

00:00:00.000 | Today, we're going to be taking another look at Hugging Face Agents.

00:00:03.400 | This time, we're going to focus on how we can actually build our own custom tools

00:00:08.000 | for these agents to use.

00:00:10.200 | So we're going to work through this notebook here.

00:00:12.300 | There will be a link at the top of the video right now for this,

00:00:17.000 | and you can just follow along as we go through it.

00:00:20.200 | One thing before we do start is we're going to be running

00:00:23.900 | Transformer models locally and Diffusion models as well.

00:00:27.800 | So to speed that up, we can go to Runtime, Change Runtime Type,

00:00:32.400 | make sure you have GPU as your hardware accelerator.

00:00:36.100 | For this walkthrough, you can use the free version of Colab.

00:00:40.000 | You just select GPU, the base GPU will work for this.

00:00:44.900 | So we save that, and then all we need to do is run the pip installs up here.

00:00:50.500 | So we've got Transformers, Diffusers because one of the examples

00:00:54.200 | includes image generation, and also Accelerate.

00:00:58.900 | So that just optimizes the way that we use our GPU.

00:01:03.800 | And also OpenAI because we're going to use OpenAI's GPT 3.5 Turbo model

00:01:08.800 | as the controller or the agent itself.

00:01:12.700 | So we run those, and then you'd also want to run this as well.

00:01:17.800 | So we're importing the OpenAI agent.

00:01:20.600 | There's also a Hugging Face agent, which uses Hugging Face endpoints

00:01:24.500 | to give us access to the Hugging Face stored models,

00:01:29.500 | or the models on the Hugging Face Hub.

00:01:31.400 | We can also use that, but it's actually easier just to use OpenAI

00:01:37.100 | and also cheaper to use OpenAI at the moment

00:01:40.100 | until they build out the functionality to use local LLMs.

00:01:45.100 | So yeah, we run that.

00:01:47.700 | You'll need your OpenAI API key,

00:01:49.600 | which you can get from platform.openai.com.

00:01:55.900 | And after you run that, it's just going to download

00:01:58.800 | some tool configurations here.

00:02:01.000 | So obviously HuggingFace agents, it is using a set of tools.

00:02:05.000 | So that is what it's downloading those for.

00:02:08.200 | And then what we're going to do is just run this.

00:02:11.100 | So we're going to make sure this is actually initialized and working.

00:02:16.700 | So the first time you run these,

00:02:18.600 | it's always going to download the models

00:02:20.300 | that it needs to run the tools that the agent will be using.

00:02:25.800 | So we do have to wait a little while the first time,

00:02:29.300 | but then after running it the first time,

00:02:31.700 | we can run it again and it will be much faster.

00:02:34.000 | Okay, and after downloading and running the process,

00:02:36.300 | we get this image of a boat in the water.

00:02:38.700 | We can try running it again and this time it will be much faster.

00:02:41.700 | So we run that. Okay, that processes and we should get our image.

00:02:48.000 | Here we go. All right, so that was 12 seconds.

00:02:50.500 | So it still takes a little bit of time,

00:02:54.100 | but it's so much faster than downloading everything every time.

00:02:57.700 | Okay, now what we've just done is use the default agent

00:03:01.700 | with all the default tools that come with it.

00:03:03.600 | And there are quite a few of those.

00:03:05.100 | And we can actually see them by printing out the agent toolbox.

00:03:10.300 | Okay, so we can see there's this document QA, image captioner,

00:03:14.400 | image QA, image segmenter, all these other things.

00:03:17.600 | And then you can see the details of those tools in there as well.

00:03:21.600 | Now, the default tools are defined as PreTool objects.

00:03:26.800 | Okay, so we can see all that in there.

00:03:29.500 | It tells you what this task is for.

00:03:32.000 | It gives you a description of the tool.

00:03:34.800 | And this is actually used by the agent,

00:03:37.600 | this description in order to decide which tool to use.

00:03:40.900 | So that is actually very important and it's not just for us,

00:03:44.300 | it's actually for the model.

00:03:46.800 | Okay, and we can see there's actually quite a few in there.

00:03:49.900 | I'm not sure how many exactly, but there are a few.

00:03:53.800 | So what we can do is actually define our own tools just like these.

00:04:01.000 | Okay, and then we just add them to the agent toolbox

00:04:04.000 | and then the agent can actually use that tool.

00:04:07.200 | And naturally, being able to build our own tools for these agents to use

00:04:11.900 | makes what these agents can do in scope much broader.

00:04:17.800 | Almost anything we can program, we can do with an agent,

00:04:22.800 | which is pretty cool.

00:04:25.300 | And obviously, when building use cases with these agents,

00:04:30.500 | custom tools are something that I think the vast majority of them

00:04:34.900 | are probably going to need.

00:04:36.400 | So what I want to do is just show you how to build really simple tools.

00:04:40.000 | I mean, nothing complicated,

00:04:42.100 | but it just kind of shows the format or the structure

00:04:46.700 | of what a tool must be.

00:04:49.100 | So for that, we have this meaning of life tool.

00:04:52.800 | You can see here, we have this task, we have a description,

00:04:58.500 | and we have a similar but not exactly the same format here.

00:05:02.700 | So in this case, we actually have a name

00:05:05.500 | and these, if I...

00:05:08.200 | Okay, so the name is actually this here.

00:05:10.200 | So it's a key within that dictionary.

00:05:12.300 | So they do still have that name,

00:05:14.200 | it's just not within the PreTool object here.

00:05:16.700 | So we have a name and then we have the description,

00:05:19.400 | just like what we see here.

00:05:20.800 | And this description, like I mentioned before,

00:05:24.000 | it's for the large language model.

00:05:26.200 | It's not for us to understand,

00:05:28.200 | although if we can understand what this tool does,

00:05:31.100 | it's probably a good indication

00:05:32.500 | that the large language model should understand as well.

00:05:35.100 | But when we're writing these descriptions,

00:05:38.500 | the most important thing to understand or to consider

00:05:43.700 | is that it needs to be really concise

00:05:46.600 | and very specific on what the tool does, right?

00:05:51.300 | Just very simple language, make it very clear.

00:05:54.500 | Okay, so we have our description

00:05:56.800 | and then we also want to specify inputs and outputs of the tool.

00:06:00.900 | So the input format is just some text

00:06:04.600 | and the output format is actually just some text as well.

00:06:08.500 | So we specify that and then we have the call method here.

00:06:13.500 | So for every tool, when the agent refers to that tool for help,

00:06:20.200 | this is what is going to be called.

00:06:22.100 | Okay, so in here, you would write some code,

00:06:25.100 | usually to process whatever it is you're doing here, right?

00:06:29.400 | In this case, we're just doing something really simple.

00:06:31.800 | We're going to return the string 42.

00:06:34.400 | Okay, so whenever the user asks something

00:06:37.300 | like what is the meaning of life

00:06:39.800 | or some other broad unanswerable question,

00:06:43.600 | we're going to return 42.
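
The structure just described can be sketched in plain Python. This is a library-free stand-in, not the notebook's exact code: in the notebook the class would derive from the library's tool base class, and the wording of the description here is only a guess at something suitably concise.

```python
class MeaningOfLifeTool:
    """Stand-in for the custom tool; the real one would subclass the library's tool base class."""

    # Name the agent uses to refer to the tool in the code it generates.
    name = "meaning_of_life"
    # Read by the LLM to decide when to use the tool (hypothetical wording).
    description = (
        "Answers broad, unanswerable questions such as 'what is the meaning of life?'. "
        "Takes the question as text and returns the answer as text."
    )
    # Input and output formats: text in, text out.
    inputs = ["text"]
    outputs = ["text"]

    def __call__(self, query: str) -> str:
        # A real tool would do its processing here; this one always answers "42".
        return "42"


meaning_of_life = MeaningOfLifeTool()
answer = meaning_of_life(query="What is the meaning of life?")
print(answer)
```

The last three lines mirror the kind of code the agent generates: it calls the tool by name with the query and prints what comes back.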

00:06:46.000 | And after we've initialized that tool,

00:06:47.900 | what we're going to do is reinitialize our agent

00:06:51.800 | with this meaning of life tool.

00:06:54.100 | Okay, so we have these additional tools

00:06:56.100 | and we just pass in that meaning of life tool.
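
Conceptually, passing in additional tools means they end up in the same name-keyed toolbox as the defaults. A minimal sketch of that idea, using a plain dictionary and a hypothetical `build_toolbox` helper rather than the library's actual internals (`SimpleTool` and the example descriptions are also made up for illustration):

```python
class SimpleTool:
    """Hypothetical minimal tool: a name, a description, and a callable body."""

    def __init__(self, name, description, fn):
        self.name = name
        self.description = description
        self._fn = fn

    def __call__(self, *args, **kwargs):
        return self._fn(*args, **kwargs)


def build_toolbox(default_tools, additional_tools):
    """Return a dict keyed by tool name; additional tools are added after the defaults."""
    toolbox = {tool.name: tool for tool in default_tools}
    for tool in additional_tools:
        # A custom tool that reuses a default name replaces that default.
        toolbox[tool.name] = tool
    return toolbox


defaults = [SimpleTool("image_captioner", "Captions an image.", lambda image: "a boat")]
custom = [SimpleTool("meaning_of_life", "Answers unanswerable questions.", lambda query: "42")]

toolbox = build_toolbox(defaults, custom)
print(list(toolbox))
```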

00:06:58.900 | So let's run that.

00:07:00.900 | Actually, did I run this?

00:07:02.100 | Okay, run this first and then run this.

00:07:05.400 | Cool, and then we can say, okay, what is the meaning of life?

00:07:09.800 | And we can see this explanation from the agent.

00:07:12.300 | So it explains it's going to use this meaning of life tool

00:07:16.400 | to find the answer to the question.

00:07:18.500 | The code that it generates is this.

00:07:20.900 | So it goes to the meaning of life tool

00:07:22.900 | and it passes in this query, what is the meaning of life?

00:07:26.500 | And it then prints out the answer.

00:07:29.600 | Okay, so the answer is 42.

00:07:32.300 | Okay, perfect.

00:07:34.400 | Now, one other thing that we should just kind of cover here

00:07:40.200 | is that right now there are a lot of tools

00:07:45.200 | that are attached to our agent.

00:07:47.700 | Okay, so if we just print all those out,

00:07:50.800 | we have all of these PreTools.

00:07:52.600 | So it's 14 PreTools in total.

00:07:55.000 | And then we have our meaning of life tool at the end.

00:07:57.500 | Now, in some use cases,

00:07:59.900 | maybe you do want all of these PreTools,

00:08:03.200 | but I think in most we would probably want to define

00:08:07.800 | which tools are open to be used by the model, right?

00:08:12.300 | Because chatbots tend to work better

00:08:15.000 | if you restrict their scope.

00:08:16.800 | And in order for the agent to use these tools,

00:08:19.700 | all of these tools and their descriptions

00:08:23.000 | are passed into every prompt we send to the LLM.

00:08:26.700 | And, I mean, there's a lot of text in there, right?

00:08:30.500 | All of these descriptions are being passed to the LLM.

00:08:33.600 | That's a lot of extra tokens,

00:08:36.600 | which is going to slow down the processing

00:08:38.500 | or the response time for our LLM.

00:08:42.100 | And it can reduce the quality of what it outputs

00:08:46.200 | because if you put in more text,

00:08:49.100 | LLMs can struggle to follow the initial instructions

00:08:53.000 | that you've given them.

00:08:54.100 | And it's also going to cost more money

00:08:55.900 | because there are more tokens that you have to pay for here.
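
To make the cost of extra descriptions concrete, here is a rough sketch that assembles a prompt from tool descriptions and compares its size with and without the default tools. The prompt layout, the `build_prompt` helper, and the placeholder descriptions are all assumptions for illustration; the library's real prompt template differs, and character counts are only a loose proxy for tokens.

```python
def build_prompt(tool_descriptions, task):
    """Assemble a prompt that lists every tool before the user's task (hypothetical format)."""
    tool_lines = [f"- {name}: {desc}" for name, desc in tool_descriptions.items()]
    return "Tools:\n" + "\n".join(tool_lines) + f"\n\nTask: {task}"


# 14 placeholder default tools plus the custom one; descriptions are made up.
all_tools = {
    f"default_tool_{i}": "A placeholder description of a default tool. " * 3
    for i in range(14)
}
all_tools["meaning_of_life"] = "Answers broad, unanswerable questions with text."

# The trimmed toolbox keeps only the custom tool.
only_custom = {"meaning_of_life": all_tools["meaning_of_life"]}

task = "What is the meaning of life?"
full_prompt = build_prompt(all_tools, task)
small_prompt = build_prompt(only_custom, task)

# Every request pays for the full tool listing, so the gap repeats on each call.
print(len(full_prompt), len(small_prompt))
```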

00:08:58.900 | So for those reasons,

00:09:01.000 | it's a good idea to limit the number of tools

00:09:04.300 | that we have available to our agent.

00:09:06.800 | And we can do that.

00:09:09.300 | Okay, we can see the agent toolbox here again.

00:09:12.500 | We can do that by just going through here,

00:09:16.700 | identifying which of these tools are PreTools

00:09:20.200 | and just removing them from the toolbox.

00:09:22.700 | So let's do that.

00:09:24.700 | So I'm just going to initialize this delete list.

00:09:27.000 | We're going to go through each tool in the toolbox

00:09:30.100 | and we're just going to test if it is a PreTool.

00:09:33.000 | If it is a PreTool, we add its name to the delete list.

00:09:37.600 | And then after that, we're just going to go through

00:09:40.000 | that delete list and just delete them from the toolbox.
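
The two-pass cleanup described here can be sketched as follows. The `PreTool` class below is a local stand-in so the example runs on its own, and the toolbox contents are made up; in the notebook the same loop would run over the agent's toolbox with the library's own PreTool type.

```python
class PreTool:
    """Stand-in for the library's default-tool wrapper (task plus description)."""

    def __init__(self, task, description):
        self.task = task
        self.description = description


class MeaningOfLifeTool:
    """Stand-in for the custom tool defined earlier."""

    name = "meaning_of_life"
    description = "Answers broad, unanswerable questions."

    def __call__(self, query):
        return "42"


# Hypothetical toolbox: two default entries plus the custom tool.
toolbox = {
    "document_qa": PreTool("document-question-answering", "Answers questions about a document."),
    "image_captioner": PreTool("image-captioning", "Captions an image."),
    "meaning_of_life": MeaningOfLifeTool(),
}

# First pass: collect the names. Deleting from a dict while iterating
# over it raises a RuntimeError, which is why the delete list exists.
delete_list = [name for name, tool in toolbox.items() if isinstance(tool, PreTool)]

# Second pass: remove the collected default tools.
for name in delete_list:
    del toolbox[name]

print(list(toolbox))
```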

00:09:43.500 | Okay, so we can run that and then this is our toolbox now.

00:09:48.000 | It's just got one item in there, right?

00:09:50.100 | So we've just cleaned up that toolbox

00:09:53.400 | and that will just help our agent focus on the tools

00:09:57.100 | that we actually need rather than all these other tools

00:09:59.300 | that we don't need in most cases.

00:10:01.800 | So that's it for this walkthrough.

00:10:03.500 | I just wanted to show you a little bit

00:10:05.600 | of how we can use those custom tools

00:10:07.900 | and also control or clean up the toolbox

00:10:11.300 | within our agent for Hugging Face.

00:10:13.900 | Naturally, as I mentioned, these agents,

00:10:17.400 | what they can do is massively expanded in scope

00:10:21.300 | when we start building our own custom tools.

00:10:24.300 | And as I said, like if you are actually building projects

00:10:27.800 | with these, I think almost all the time,

00:10:31.700 | you're going to want these custom tools

00:10:34.300 | unless you manage to find tools out there

00:10:36.100 | that have already been built for you to use.

00:10:39.100 | Of which, with Hugging Face Agents being a very new framework,

00:10:43.000 | there are very few.

00:10:45.400 | So yeah, that's it for this video.

00:10:48.200 | I hope this has been useful and interesting.

00:10:51.300 | So thank you very much for watching

00:10:53.200 | and I will see you again in the next one.

00:10:55.200 | Bye.
