Hugging Face Agents — Building Custom Tools
Chapters
0:00 Custom Hugging Face Agents
0:10 Hugging Face Custom Tools Setup
2:58 How Agents Work in Hugging Face
3:55 Building Custom Tools
7:34 Keeping Agent Toolbox Organized
10:02 Final Thoughts on Agents and Tools
00:00:00.000 |
Today, we're going to be taking another look at Hugging Face Agents. 00:00:03.400 |
This time, we're going to focus on how we can actually build our own custom tools 00:00:10.200 |
So we're going to work through this notebook here. 00:00:12.300 |
There will be a link at the top of the video right now for this, 00:00:17.000 |
and you can just follow along as we go through it. 00:00:20.200 |
One thing before we do start is we're going to be running 00:00:23.900 |
Transformer models locally and Diffusion models as well. 00:00:27.800 |
So to speed that up, we can go to Runtime, Change Runtime Type, 00:00:32.400 |
make sure you have GPU as your hardware accelerator. 00:00:36.100 |
For this walkthrough, you can use the free version of Colab. 00:00:40.000 |
You just select GPU, the base GPU will work for this. 00:00:44.900 |
So we save that, and then all we need to do is run the pip installs up here. 00:00:50.500 |
So we've got Transformers, Diffusers because one of the examples 00:00:54.200 |
includes image generation, and also Accelerate. 00:00:58.900 |
So that just optimizes the way that we use our GPU. 00:01:03.800 |
And also OpenAI because we're going to use OpenAI's GPT-3.5 Turbo model 00:01:12.700 |
So we run those, and then you'd also want to run this as well. 00:01:20.600 |
There's also a Hugging Face agent, which uses Hugging Face endpoints 00:01:24.500 |
to give us access to the models stored on Hugging Face 00:01:31.400 |
We can also use that, but it's actually easier just to use OpenAI 00:01:40.100 |
until they build out the functionality to use local LLMs. 00:01:55.900 |
And after you run that, it's just going to download 00:02:01.000 |
So obviously a Hugging Face agent is using a set of tools. 00:02:08.200 |
And then what we're going to do is just run this. 00:02:11.100 |
So we're going to make sure this is actually initialized and working. 00:02:20.300 |
that it needs to run the tools that the agent will be using. 00:02:25.800 |
So we do have to wait a little while the first time, 00:02:31.700 |
we can run it again and it will be much faster. 00:02:34.000 |
Okay, and after downloading and running the process, 00:02:38.700 |
We can try running it again and this time it will be much faster. 00:02:41.700 |
So we run that. Okay, that processes and we should get our image. 00:02:48.000 |
Here we go. All right, so that was 12 seconds. 00:02:54.100 |
but it's so much faster than downloading everything every time. 00:02:57.700 |
Okay, now what we've just done is use the default agent 00:03:01.700 |
with all the default tools that come with it. 00:03:05.100 |
And we can actually see them by printing out the agent toolbox. 00:03:10.300 |
Okay, so we can see there's this document QA, image captioner, 00:03:14.400 |
image QA, image segmenter, all these other things. 00:03:17.600 |
And then you can see the details of those tools in there as well. 00:03:21.600 |
Now, the default tools are defined as PreTool objects. 00:03:37.600 |
this description in order to decide which tool to use. 00:03:40.900 |
So that is actually very important and it's not just for us, 00:03:46.800 |
Okay, and we can see there's actually quite a few in there. 00:03:49.900 |
I'm not sure how many exactly, but there are a few. 00:03:53.800 |
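As a rough sketch of what printing the toolbox gives you: the snippet below uses a stand-in PreTool dataclass and two made-up entries rather than the real library objects, so the field names (task, description, repo_id), the descriptions and the repo ids are all assumptions for illustration. The point is only that every entry carries a plain-language description, and that description is what the LLM reads when it picks a tool.

```python
from dataclasses import dataclass


@dataclass
class PreTool:
    """Stand-in for the library's default-tool record (field names assumed)."""
    task: str
    description: str
    repo_id: str


# Hypothetical toolbox with two entries; the real one holds more.
toolbox = {
    "image_captioner": PreTool(
        task="image-captioning",
        description="This is a tool that generates a description of an image.",
        repo_id="example/image-captioner",  # placeholder, not a real repo
    ),
    "document_qa": PreTool(
        task="document-question-answering",
        description="This is a tool that answers a question about a document.",
        repo_id="example/document-qa",  # placeholder, not a real repo
    ),
}


def summarize_toolbox(tools):
    """Return one 'name: description' line per tool."""
    return [f"{name}: {tool.description}" for name, tool in tools.items()]


for line in summarize_toolbox(toolbox):
    print(line)
```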
So what we can do is actually define our own tools just like these. 00:04:01.000 |
Okay, and then we just add them to the agent toolbox 00:04:04.000 |
and then the agent can actually use that tool. 00:04:07.200 |
And naturally, being able to build our own tools for these agents to use 00:04:11.900 |
makes the scope of what these agents can do much broader. 00:04:17.800 |
Almost anything we can program, we can do with an agent, 00:04:25.300 |
And obviously, for building tools or use cases with these agents, 00:04:30.500 |
it's something that I think the vast majority of use cases 00:04:36.400 |
So what I want to do is just show you how to build really simple tools. 00:04:42.100 |
but it just kind of shows the format or the structure 00:04:49.100 |
So for that, we have this meaning of life tool. 00:04:52.800 |
You can see here, we have this task, we have a description, 00:04:58.500 |
and we have a similar but not exactly the same format here. 00:05:14.200 |
it's just not within the PreTool object here. 00:05:16.700 |
So we have a name and then we have the description, 00:05:20.800 |
And this description, like I mentioned before, 00:05:28.200 |
although if we can understand what this tool does, 00:05:32.500 |
that the large language model should understand as well. 00:05:38.500 |
the most important thing to understand or to consider 00:05:46.600 |
and very specific on what the tool does, right? 00:05:51.300 |
Just very simple language, make it very clear. 00:05:56.800 |
and then we also want to specify inputs and outputs of the tool. 00:06:04.600 |
and the output format is actually just some text as well. 00:06:08.500 |
So we specify that and then we have the call method here. 00:06:13.500 |
So every tool, when the agent refers to that tool for help, 00:06:25.100 |
usually to process whatever it is you're doing here, right? 00:06:29.400 |
In this case, we're just doing something really simple. 00:06:47.900 |
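Here is a minimal sketch of that structure: a name, a description, inputs, outputs, and a call method. It is written as a plain Python class so it runs on its own; in the notebook the class would subclass the library's tool base class. The class name, the description wording and the returned value are assumptions for illustration, not taken from the notebook.

```python
class MeaningOfLifeTool:
    """Plain-Python stand-in for a custom tool (not the library's base class)."""

    # Short identifier the agent uses to refer to the tool.
    name = "meaning_of_life"

    # Clear, specific, simple language: this text is what the LLM reads.
    description = (
        "Use this tool to find the meaning of life. "
        "It takes a text query and returns the answer as text."
    )

    # Both the input and the output are plain text.
    inputs = ["text"]
    outputs = ["text"]

    def __call__(self, query: str) -> str:
        # Placeholder logic (assumed): a real tool would do its processing here.
        return "42"


tool = MeaningOfLifeTool()
print(tool("what is the meaning of life?"))
```

Keeping the logic inside the call method means the agent only needs to know the tool's name and description; what happens inside can be as simple or as involved as you like.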
what we're going to do is reinitialize our agent 00:06:56.100 |
and we just pass in that meaning of life tool. 00:07:05.400 |
Cool, and then we can say, okay, what is the meaning of life? 00:07:09.800 |
And we can see this explanation from the agent. 00:07:12.300 |
So it explains it's going to use this meaning of life tool 00:07:22.900 |
and it passes in this query, what is the meaning of life? 00:07:34.400 |
Now, one other thing that we should just kind of cover here 00:07:55.000 |
And then we have our meaning of life tool at the end. 00:08:03.200 |
but I think in most we would probably want to define 00:08:07.800 |
which tools are open to be used by the model, right? 00:08:16.800 |
And in order for the agent to use these tools, 00:08:23.000 |
are passed into every prompt we send to the LLM. 00:08:26.700 |
And if we, I mean, there's a lot of text in there, right? 00:08:30.500 |
All of these descriptions are being passed to the LLM. 00:08:42.100 |
And it can reduce the quality of what it outputs 00:08:49.100 |
LLMs can struggle to follow the initial instructions 00:08:55.900 |
because there's more tokens that you have to pay for here. 00:09:01.000 |
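To make the cost point concrete, here is a small sketch of how tool descriptions inflate a prompt. The prompt layout, the build_prompt helper and the placeholder descriptions are all hypothetical; the real prompt template is different. Word count is used as a crude stand-in for tokens.

```python
def build_prompt(tool_descriptions, user_query):
    """Join every tool description into the prompt ahead of the user's query."""
    tool_lines = [f"- {name}: {desc}" for name, desc in tool_descriptions.items()]
    return "Tools:\n" + "\n".join(tool_lines) + "\n\nTask: " + user_query


# Hypothetical toolboxes: one with many tools, one trimmed to a single tool.
full_toolbox = {
    f"tool_{i}": "This is a tool that does something specific." for i in range(15)
}
trimmed_toolbox = {"meaning_of_life": "Use this tool to find the meaning of life."}

query = "what is the meaning of life?"
full_prompt = build_prompt(full_toolbox, query)
trimmed_prompt = build_prompt(trimmed_toolbox, query)

# Word count is only a rough proxy for tokens, but the trend is the same.
print(len(full_prompt.split()), "words with every tool")
print(len(trimmed_prompt.split()), "words with one tool")
```

Every description is paid for on every call, so the gap between the two counts repeats with each prompt you send.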
it's a good idea to limit the number of tools 00:09:09.300 |
Okay, we can see the agent toolbox here again. 00:09:16.700 |
identifying which of these tools are PreTool objects 00:09:24.700 |
So I'm just going to initialize this delete list. 00:09:27.000 |
We're going to go through each tool in the toolbox 00:09:30.100 |
and we're just going to test if it is a PreTool. 00:09:33.000 |
If it is a PreTool, we add its name to the delete list. 00:09:37.600 |
And then after that, we're just going to go through 00:09:40.000 |
that delete list and just delete them from the toolbox. 00:09:43.500 |
Okay, so we can run that and then this is our toolbox now. 00:09:53.400 |
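The clean-up described above can be sketched like this. The toolbox here is an ordinary dict filled with stand-in objects, and the PreTool dataclass and MeaningOfLifeTool class are simplified placeholders, so treat the names and fields as assumptions rather than the library's own definitions.

```python
from dataclasses import dataclass


@dataclass
class PreTool:
    """Stand-in for the library's default-tool record (field names assumed)."""
    task: str
    description: str


class MeaningOfLifeTool:
    """Stand-in custom tool."""
    name = "meaning_of_life"


# Hypothetical toolbox: two default tools plus our custom one.
toolbox = {
    "document_qa": PreTool("document-question-answering", "Answers questions about a document."),
    "image_captioner": PreTool("image-captioning", "Describes an image."),
    "meaning_of_life": MeaningOfLifeTool(),
}

# Collect the names first: deleting from a dict while iterating over it
# raises a RuntimeError.
delete_list = []
for name, tool in toolbox.items():
    if isinstance(tool, PreTool):
        delete_list.append(name)

# Now remove the default tools, leaving only the custom ones.
for name in delete_list:
    del toolbox[name]

print(list(toolbox))
```

Building the delete list in a first pass and deleting in a second is the reason for the two loops: it keeps the dictionary unchanged while it is being iterated.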
and that will just help our agent focus on the tools 00:09:57.100 |
that we actually need rather than all these other tools 00:10:17.400 |
what they can do is massively expanded in scope 00:10:21.300 |
when we start building our own custom agents. 00:10:24.300 |
And as I said, like if you are actually building projects 00:10:39.100 |
Of which for Hugging Face agents as a very new framework,