back to index

Hugging Face Agents — Building Custom Tools


Chapters

0:0 Custom Hugging Face Agents
0:10 Hugging Face Custom Tools Setup
2:58 How Agents Work in Hugging Face
3:55 Building Custom Tools
7:34 Keeping Agent Toolbox Organized
10:2 Final Thoughts on Agents and Tools

Whisper Transcript | Transcript Only Page

00:00:00.000 | Today, we're going to be taking another look at Hugging Face Agents.
00:00:03.400 | This time, we're going to focus on how we can actually build our own custom tools
00:00:08.000 | for these agents to use.
00:00:10.200 | So we're going to work through this notebook here.
00:00:12.300 | There will be a link at the top of the video right now for this,
00:00:17.000 | and you can just follow along as we go through it.
00:00:20.200 | One thing before we do start is we're going to be running
00:00:23.900 | Transformer models locally and Diffusion models as well.
00:00:27.800 | So to speed that up, we can go to Runtime, Change Runtime Type,
00:00:32.400 | make sure you have GPU as your hardware accelerator.
00:00:36.100 | For this walkthrough, you can use the free version of Colab.
00:00:40.000 | You just select GPU, the base GPU will work for this.
00:00:44.900 | So we save that, and then all we need to do is run the pip installs up here.
00:00:50.500 | So we've got Transformers, Diffusers because one of the examples
00:00:54.200 | includes a image generation, and also Accelerate.
00:00:58.900 | So that just optimizes the way that we use our GPU.
00:01:03.800 | And also OpenAI because we're going to use OpenAI's GPT 3.5 Turbo model
00:01:08.800 | as the controller or the agent itself.
00:01:12.700 | So we run those, and then you'd also want to run this as well.
00:01:17.800 | So we're importing the OpenAI agent.
00:01:20.600 | There's also a HuggingFace agent, which uses HuggingFace endpoints
00:01:24.500 | to give us access to the HuggingFace Sword models
00:01:29.500 | or the models on the HuggingFace Hub.
00:01:31.400 | We can also use that, but it's actually easier just to use OpenAI
00:01:37.100 | and also cheaper to use OpenAI at the moment
00:01:40.100 | until they build out the functionality to use local LLMs.
00:01:45.100 | So yeah, we run that.
00:01:47.700 | You'll need your OpenAI API key,
00:01:49.600 | which you can get from platform.openai.com.
00:01:55.900 | And after you run that, it's just going to download
00:01:58.800 | some tool configurations here.
00:02:01.000 | So obviously HuggingFace agents, it is using a set of tools.
00:02:05.000 | So that is what it's downloading those for.
00:02:08.200 | And then what we're going to do is just run this.
00:02:11.100 | So we're going to make sure this is actually initialized and working.
00:02:16.700 | So the first time you run these,
00:02:18.600 | it's always going to download the models
00:02:20.300 | that it needs to run the tools that the agent will be using.
00:02:25.800 | So we do have to wait a little while the first time,
00:02:29.300 | but then after running it the first time,
00:02:31.700 | we can run it again and it will be much faster.
00:02:34.000 | Okay, and after downloading and running the process,
00:02:36.300 | we get this image of a boat in the water.
00:02:38.700 | We can try running it again and this time it will be much faster.
00:02:41.700 | So we run that. Okay, that processes and we should get our image.
00:02:48.000 | Here we go. All right, so that was 12 seconds.
00:02:50.500 | So it's fairly, takes a little bit of time,
00:02:54.100 | but it's so much faster than downloading everything every time.
00:02:57.700 | Okay, now what we've just done is use the default agent
00:03:01.700 | with all the default tools that come with it.
00:03:03.600 | And there are quite a few of those.
00:03:05.100 | And we can actually see them by printing out the agent toolbox.
00:03:10.300 | Okay, so we can see there's this document QA, image captioner,
00:03:14.400 | image QA, image segmenter, all these other things.
00:03:17.600 | And then you can see the details of those tools in there as well.
00:03:21.600 | Now, for the default tools, they are defined as pre-tool objects.
00:03:26.800 | Okay, so we can see all that in there.
00:03:29.500 | It tells you what this task is for.
00:03:32.000 | It gives you a description of the tool.
00:03:34.800 | And this is actually used by the agent,
00:03:37.600 | this description in order to decide which tool to use.
00:03:40.900 | So that is actually very important and it's not just for us,
00:03:44.300 | it's actually for the model.
00:03:46.800 | Okay, and we can see there's actually quite a few in there.
00:03:49.900 | I'm not sure how many exactly, but there are a few.
00:03:53.800 | So what we can do is actually define our own tools just like these.
00:04:01.000 | Okay, and then we just add them to the agent toolbox
00:04:04.000 | and then the agent can actually use that tool.
00:04:07.200 | And naturally, being able to build our own tools for these agents to use
00:04:11.900 | makes what these agents can do in scope much broader.
00:04:17.800 | We can kind of anything we program, we can almost do with an agent,
00:04:22.800 | which is pretty cool.
00:04:25.300 | And obviously, for building tools or use cases with these agents,
00:04:30.500 | it's something that I think the vast majority of use cases
00:04:34.900 | are probably going to need.
00:04:36.400 | So what I want to do is just show you how to build really simple tools.
00:04:40.000 | I mean, nothing complicated,
00:04:42.100 | but it just kind of shows the format or the structure
00:04:46.700 | of what a tool must be.
00:04:49.100 | So for that, we have this meaning of life tool.
00:04:52.800 | You can see here, we have this task, we have a description,
00:04:58.500 | and we have a similar but not exactly the same format here.
00:05:02.700 | So in this case, we actually have a name
00:05:05.500 | and these, if I...
00:05:08.200 | Okay, so the name is actually this here.
00:05:10.200 | So it's a key within that dictionary.
00:05:12.300 | So they do still have that name,
00:05:14.200 | it's just not within the pretool object here.
00:05:16.700 | So we have a name and then we have the description,
00:05:19.400 | just like what we see here.
00:05:20.800 | And this description, like I mentioned before,
00:05:24.000 | it's for the large language model.
00:05:26.200 | It's not for us to understand,
00:05:28.200 | although if we can understand what this tool does,
00:05:31.100 | it's probably a good indication
00:05:32.500 | that the large language model should understand as well.
00:05:35.100 | But when we're writing these descriptions,
00:05:38.500 | the most important thing to understand or to consider
00:05:43.700 | is that it needs to be really concise
00:05:46.600 | and very specific on what the tool does, right?
00:05:51.300 | Just very simple language, make it very clear.
00:05:54.500 | Okay, so we have our description
00:05:56.800 | and then we also want to specify inputs and outputs of the tool.
00:06:00.900 | So the input format is just some text
00:06:04.600 | and the output format is actually just some text as well.
00:06:08.500 | So we specify that and then we have the call method here.
00:06:13.500 | So every tool, when the agent refers to that tool for help,
00:06:20.200 | this is what it's going to be called.
00:06:22.100 | Okay, so in here, you would write some code,
00:06:25.100 | usually to process whatever it is you're doing here, right?
00:06:29.400 | In this case, we're just doing something really simple.
00:06:31.800 | We're going to return the string 42.
00:06:34.400 | Okay, so whenever the user asks something
00:06:37.300 | like what is the meaning of life
00:06:39.800 | or some other broad unanswerable question,
00:06:43.600 | we're going to return 42.
00:06:46.000 | And after we've initialized that tool,
00:06:47.900 | what we're going to do is reinitialize our agent
00:06:51.800 | with this meaning of life tool.
00:06:54.100 | Okay, so we have these additional tools
00:06:56.100 | and we just pass in that meaning of life tool.
00:06:58.900 | So let's run that.
00:07:00.900 | Actually, did I run this?
00:07:02.100 | Okay, run this first and then run this.
00:07:05.400 | Cool, and then we can say, okay, what is the meaning of life?
00:07:09.800 | And we can see this explanation from the agent.
00:07:12.300 | So it explains it's going to use this meaning of life tool
00:07:16.400 | to find the answer to the question.
00:07:18.500 | The code that it generates is this.
00:07:20.900 | So it goes to the meaning of life tool
00:07:22.900 | and it passes in this query, what is the meaning of life?
00:07:26.500 | And it then prints out the answer.
00:07:29.600 | Okay, so the answer is 42.
00:07:32.300 | Okay, perfect.
00:07:34.400 | Now, one other thing that we should just kind of cover here
00:07:40.200 | is that right now there are a lot of tools
00:07:45.200 | that are attached to our agent.
00:07:47.700 | Okay, so if we just print all those out,
00:07:50.800 | we have all of these pre-tools.
00:07:52.600 | So it's 14 pre-tools in total.
00:07:55.000 | And then we have our meaning of life tool at the end.
00:07:57.500 | Now, in some use cases,
00:07:59.900 | maybe you do want all of these pre-tools,
00:08:03.200 | but I think in most we would probably want to define
00:08:07.800 | which tools are open to be used by the model, right?
00:08:12.300 | Because chatbots tend to work better
00:08:15.000 | if you restrict their scope.
00:08:16.800 | And in order for the agent to use these tools,
00:08:19.700 | all of these tools and their descriptions
00:08:23.000 | are passed into every prompt we send to the LLM.
00:08:26.700 | And if we, I mean, there's a lot of texting, right?
00:08:30.500 | All of these descriptions are being passed to the LLM.
00:08:33.600 | That's a lot of extra tokens,
00:08:36.600 | which is going to slow down the processing
00:08:38.500 | or the response time for our LLM.
00:08:42.100 | And it can reduce the quality of what it outputs
00:08:46.200 | because if you put in more text,
00:08:49.100 | LLMs can struggle to follow the initial instructions
00:08:53.000 | that you've given them.
00:08:54.100 | And it's also going to cost more money
00:08:55.900 | because there's more tokens that you have to pay for here.
00:08:58.900 | So for those reasons,
00:09:01.000 | it's a good idea to limit the number of tools
00:09:04.300 | that we have available to our agent.
00:09:06.800 | And we can do that.
00:09:09.300 | Okay, we can see the agent toolbox here again.
00:09:12.500 | We can do that by just going through here,
00:09:16.700 | identifying which of these tools are pre-tools
00:09:20.200 | and just removing them from the toolbox.
00:09:22.700 | So let's do that.
00:09:24.700 | So I'm just going to initialize this delete list.
00:09:27.000 | We're going to go through each tool in the toolbox
00:09:30.100 | and we're just going to test if it is a pre-tool.
00:09:33.000 | If it is a pre-tool, we add its name to the delete list.
00:09:37.600 | And then after that, we're just going to go through
00:09:40.000 | that delete list and just delete them from the toolbox.
00:09:43.500 | Okay, so we can run that and then this is our toolbox now.
00:09:48.000 | It's just got one item in there, right?
00:09:50.100 | So we've just cleaned up that toolbox
00:09:53.400 | and that will just help our agent focus on the tools
00:09:57.100 | that we actually need rather than all these other tools
00:09:59.300 | that we don't need in most cases.
00:10:01.800 | So that's it for this walkthrough.
00:10:03.500 | I just wanted to show you a little bit
00:10:05.600 | of how we can use those custom tools
00:10:07.900 | and also control or clean up the toolbox
00:10:11.300 | within our agent for Hugging Face.
00:10:13.900 | Naturally, as I mentioned, these agents,
00:10:17.400 | what they can do is massively expanded in scope
00:10:21.300 | when we start building our own custom agents.
00:10:24.300 | And as I said, like if you are actually building projects
00:10:27.800 | with these, I think almost all the time,
00:10:31.700 | you're going to want these custom agents
00:10:34.300 | unless you manage to find agents out there
00:10:36.100 | that have already been built for you to use.
00:10:39.100 | Of which for Hugging Face agents as a very new framework,
00:10:43.000 | there are very few.
00:10:45.400 | So yeah, that's it for this video.
00:10:48.200 | I hope this has been useful and interesting.
00:10:51.300 | So thank you very much for watching
00:10:53.200 | and I will see you again in the next one.
00:10:56.300 | (gentle music)
00:11:01.300 | (gentle music)
00:11:06.300 | (gentle music)