Hugging Face Agents — Building Custom Tools


Chapters

0:00 Custom Hugging Face Agents
0:10 Hugging Face Custom Tools Setup
2:58 How Agents Work in Hugging Face
3:55 Building Custom Tools
7:34 Keeping Agent Toolbox Organized
10:02 Final Thoughts on Agents and Tools

Transcript

Today, we're going to be taking another look at Hugging Face Agents. This time, we're going to focus on how we can actually build our own custom tools for these agents to use. So we're going to work through this notebook here. There will be a link at the top of the video right now for this, and you can just follow along as we go through it.

One thing before we do start is we're going to be running Transformer models locally and Diffusion models as well. So to speed that up, we can go to Runtime, Change Runtime Type, make sure you have GPU as your hardware accelerator. For this walkthrough, you can use the free version of Colab.

You just select GPU; the base GPU will work for this. So we save that, and then all we need to do is run the pip installs up here. So we've got Transformers, Diffusers because one of the examples includes image generation, and also Accelerate, which just optimizes the way that we use our GPU.

And also OpenAI because we're going to use OpenAI's GPT-3.5 Turbo model as the controller or the agent itself. So we run those, and then you'd also want to run this as well. So we're importing the OpenAI agent. There's also a HuggingFace agent, which uses HuggingFace endpoints to give us access to the models on the HuggingFace Hub.

We can also use that, but it's actually easier just to use OpenAI and also cheaper to use OpenAI at the moment until they build out the functionality to use local LLMs. So yeah, we run that. You'll need your OpenAI API key, which you can get from platform.openai.com. And after you run that, it's just going to download some tool configurations here.

So obviously HuggingFace agents, it is using a set of tools. So that is what it's downloading those for. And then what we're going to do is just run this. So we're going to make sure this is actually initialized and working. So the first time you run these, it's always going to download the models that it needs to run the tools that the agent will be using.

So we do have to wait a little while the first time, but then after running it the first time, we can run it again and it will be much faster. Okay, and after downloading and running the process, we get this image of a boat in the water. We can try running it again and this time it will be much faster.

So we run that. Okay, that processes and we should get our image. Here we go. All right, so that was 12 seconds. So it still takes a little bit of time, but it's much faster than downloading everything every time. Okay, now what we've just done is use the default agent with all the default tools that come with it.

And there are quite a few of those. And we can actually see them by printing out the agent toolbox. Okay, so we can see there's this document QA, image captioner, image QA, image segmenter, all these other things. And then you can see the details of those tools in there as well.

Now, the default tools are defined as PreTool objects. Okay, so we can see all that in there. It tells you what the task is. It gives you a description of the tool. And this description is actually used by the agent in order to decide which tool to use.

So that is actually very important and it's not just for us, it's actually for the model. Okay, and we can see there's actually quite a few in there. I'm not sure how many exactly, but there are a few. So what we can do is actually define our own tools just like these.

Okay, and then we just add them to the agent toolbox and then the agent can actually use that tool. And naturally, being able to build our own tools for these agents to use makes the scope of what these agents can do much broader. Almost anything we can program, we can do with an agent, which is pretty cool.

And obviously, when building use cases with these agents, custom tools are something that I think the vast majority are probably going to need. So what I want to do is just show you how to build really simple tools. I mean, nothing complicated, but it just kind of shows the format or the structure of what a tool must be.

So for that, we have this meaning of life tool. You can see here, we have this task, we have a description, and we have a similar but not exactly the same format here. So in this case, we actually have a name and these, if I... Okay, so the name is actually this here.

So it's a key within that dictionary. So they do still have that name, it's just not within the PreTool object here. So we have a name and then we have the description, just like what we see here. And this description, like I mentioned before, it's for the large language model.

It's not for us to understand, although if we can understand what this tool does, it's probably a good indication that the large language model should understand as well. But when we're writing these descriptions, the most important thing to understand or to consider is that it needs to be really concise and very specific on what the tool does, right?

Just very simple language, make it very clear. Okay, so we have our description and then we also want to specify inputs and outputs of the tool. So the input format is just some text and the output format is actually just some text as well. So we specify that and then we have the __call__ method here.

So every time the agent refers to that tool for help, this is the method that is going to be called. Okay, so in here, you would write some code, usually to process whatever it is you're doing here, right? In this case, we're just doing something really simple. We're going to return the string 42.
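As a rough sketch of the structure just described, the tool below has a name, a description, declared text inputs and outputs, and a __call__ method that returns the string 42. In the notebook the class would inherit from the Tool base class that ships with Transformers; the stand-in base class here is an assumption so that the snippet runs without any installs, and the class name and description wording are illustrative rather than copied from the video.

```python
class Tool:
    """Stand-in for the Transformers Tool base class (assumption, illustration only)."""

    name: str = ""
    description: str = ""
    inputs: list = []
    outputs: list = []

    def __call__(self, *args, **kwargs):
        raise NotImplementedError("Subclasses implement the tool logic here.")


class MeaningOfLifeTool(Tool):
    # The name is how the agent refers to the tool in the code it generates.
    name = "meaning_of_life"
    # The description is written for the LLM: short and specific.
    description = (
        "This is a tool that answers questions about the meaning of life. "
        "It takes an input named `query` containing the question and "
        "returns the answer as text."
    )
    inputs = ["text"]
    outputs = ["text"]

    def __call__(self, query: str) -> str:
        # A real tool would do some processing here; this one always answers 42.
        return "42"


meaning_of_life = MeaningOfLifeTool()
answer = meaning_of_life(query="What is the meaning of life?")
print(answer)
```

The last three lines mirror the kind of call the agent ends up generating: it passes the user's question in as the query and prints whatever the tool returns.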

Okay, so whenever the user asks something like what is the meaning of life or some other broad unanswerable question, we're going to return 42. And after we've initialized that tool, what we're going to do is reinitialize our agent with this meaning of life tool. Okay, so we have these additional tools and we just pass in that meaning of life tool.

So let's run that. Actually, did I run this? Okay, run this first and then run this. Cool, and then we can say, okay, what is the meaning of life? And we can see this explanation from the agent. So it explains it's going to use this meaning of life tool to find the answer to the question.

The code that it generates is this. So it goes to the meaning of life tool and it passes in this query, what is the meaning of life? And it then prints out the answer. Okay, so the answer is 42. Okay, perfect. Now, one other thing that we should just kind of cover here is that right now there are a lot of tools that are attached to our agent.

Okay, so if we just print all those out, we have all of these PreTools. So there are 14 PreTools in total. And then we have our meaning of life tool at the end. Now, in some use cases, maybe you do want all of these PreTools, but I think in most we would probably want to define which tools are open to be used by the model, right?

Because chatbots tend to work better if you restrict their scope. And in order for the agent to use these tools, all of these tools and their descriptions are passed into every prompt we send to the LLM. And, I mean, there's a lot of text in there, right? All of these descriptions are being passed to the LLM.

That's a lot of extra tokens, which is going to slow down the processing or the response time for our LLM. And it can reduce the quality of what it outputs because if you put in more text, LLMs can struggle to follow the initial instructions that you've given them. And it's also going to cost more money because there are more tokens that you have to pay for here.
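To make the overhead concrete, here is a small sketch that compares the size of a tool section for a full toolbox against a trimmed one. Everything in it is an assumption for illustration: the prompt layout is not the agent's real template, the descriptions are placeholders, and counting whitespace-separated words is only a crude proxy for real token counts.

```python
def build_tool_prompt(toolbox: dict) -> str:
    """Join every tool's name and description into one prompt section."""
    lines = [f"- {name}: {description}" for name, description in toolbox.items()]
    return "Tools:\n" + "\n".join(lines)


def rough_token_count(text: str) -> int:
    """Crude proxy for tokens: whitespace-separated words, not a real tokenizer."""
    return len(text.split())


# Hypothetical descriptions standing in for the 14 default tools.
full_toolbox = {
    f"default_tool_{i}": "This is a tool that does one specific default task on its input."
    for i in range(14)
}
full_toolbox["meaning_of_life"] = (
    "This is a tool that answers questions about the meaning of life."
)

# The trimmed toolbox keeps only the custom tool.
trimmed_toolbox = {"meaning_of_life": full_toolbox["meaning_of_life"]}

full_cost = rough_token_count(build_tool_prompt(full_toolbox))
trimmed_cost = rough_token_count(build_tool_prompt(trimmed_toolbox))
print(f"full toolbox: ~{full_cost} words, trimmed toolbox: ~{trimmed_cost} words")
```

Because the descriptions go into every prompt, that difference is paid on every single request, not just once.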

So for those reasons, it's a good idea to limit the number of tools that we have available to our agent. And we can do that. Okay, we can see the agent toolbox here again. We can do that by just going through here, identifying which of these tools are PreTools and just removing them from the toolbox.

So let's do that. So I'm just going to initialize this delete list. We're going to go through each tool in the toolbox and we're just going to test if it is a PreTool. If it is a PreTool, we add its name to the delete list. And then after that, we're just going to go through that delete list and just delete them from the toolbox.
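The clean-up just described could look roughly like the sketch below. It runs on a plain dictionary rather than a live agent: the PreTool class is a stand-in for the one in Transformers, the tool names follow the ones listed earlier in the transcript, and the task and description strings are placeholders. With a real agent, the same loops would run over the agent's toolbox instead.

```python
class PreTool:
    """Stand-in for the Transformers PreTool class (assumption, illustration only)."""

    def __init__(self, task: str, description: str):
        self.task = task
        self.description = description


class MeaningOfLifeTool:
    """Minimal custom tool, included so the toolbox has something to keep."""

    name = "meaning_of_life"
    description = "This is a tool that answers questions about the meaning of life."

    def __call__(self, query: str) -> str:
        return "42"


# A toy toolbox: a few default PreTools plus one custom tool at the end.
toolbox = {
    "document_qa": PreTool("document-question-answering", "placeholder description"),
    "image_captioner": PreTool("image-captioning", "placeholder description"),
    "image_qa": PreTool("image-question-answering", "placeholder description"),
    "image_segmenter": PreTool("image-segmentation", "placeholder description"),
    "meaning_of_life": MeaningOfLifeTool(),
}

# First pass: collect the names of every default tool.
delete_list = []
for name, tool in toolbox.items():
    if isinstance(tool, PreTool):
        delete_list.append(name)

# Second pass: remove them from the toolbox.
for name in delete_list:
    del toolbox[name]

print(list(toolbox))  # ['meaning_of_life']
```

Collecting the names first and deleting in a second loop matters, because Python raises a RuntimeError if a dictionary changes size while it is being iterated over.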

Okay, so we can run that and then this is our toolbox now. It's just got one item in there, right? So we've just cleaned up that toolbox and that will just help our agent focus on the tools that we actually need rather than all these other tools that we don't need in most cases.

So that's it for this walkthrough. I just wanted to show you a little bit of how we can use those custom tools and also control or clean up the toolbox within our agent for Hugging Face. Naturally, as I mentioned, what these agents can do is massively expanded in scope when we start building our own custom tools.

And as I said, if you are actually building projects with these, I think almost all the time, you're going to want these custom tools unless you manage to find tools out there that have already been built for you to use. Of which, for Hugging Face agents as a very new framework, there are very few.

So yeah, that's it for this video. I hope this has been useful and interesting. So thank you very much for watching and I will see you again in the next one. Bye.