OpenAI Agents SDK Handoffs | Deep Dive Tutorial

Chapters
0:00 Agents SDK Handoff
5:06 Code Start
7:12 Web Search Agent
10:14 RAG Agent
11:18 Code Execution Agent
11:46 Defining the Orchestrator
12:36 Agents SDK Handoffs
20:27 Using OpenAI Traces Dashboard
23:26 More Handoff Testing
26:20 Other Handoff Features
28:45 Agents SDK on_handoff
29:51 Agents SDK Handoff input_type
31:28 Agents SDK Handoff input_filter
Today, we're going to be taking a look at OpenAI's Agents SDK and how we can use agent handoffs within the framework. This is the second video on multi-agent systems in the Agents SDK. In the first one, we looked at more of an orchestrator sub-agent pattern and how we achieve that with the agents-as-tools method, but we didn't cover handoffs, and handoffs are different to the as-tools method.

The best way of thinking about and understanding these two methods is this: with the as-tools method, or the orchestrator sub-agent pattern, you always have one of your agents acting as the orchestrator, the controller of the entire workflow. It's that orchestrator that is receiving and sending messages to your user, and it's that orchestrator that decides which sub-agent to use. Those sub-agents always respond back to the orchestrator, so the orchestrator is always in control. With handoffs, it's slightly different; you kind of have it in the name, it's a handoff. Let's say we still had an orchestrator, although that wouldn't be a very good name in this scenario, so let's just call it our main agent. In the handoff setup, if we keep three sub-agents, the main agent hands off full control to a sub-agent. That sub-agent may be able to go back to the main agent if it likes, depending on how you set things up, but in many cases it will go directly back to the user. So the sub-agents in this scenario can actually respond directly to the user.
Now, both of these approaches have their pros and cons. With the orchestrator pattern, you can use the orchestrator agent to keep very fine-grained control over what is happening within the workflow. You can call various agents, in parallel if you need to, and prompt each one of those agents to specialize in whatever it is they are doing. With handoffs you can still do that; you can still specialize each of these agents to be good at specific tasks, but they also need to be capable of answering the user correctly, and generally, because of that, they need a better understanding of the overall context of the system.

One of the biggest differences, one of the biggest pros or cons depending on which way you look at it, is that the orchestrator system is generally going to use more tokens and be slower, because everything goes through the orchestrator. When a user asks for some information from the web, the orchestrator receives those tokens, then generates tokens to say, "Okay, use the web search sub-agent." The web search sub-agent generates tokens to use its internal web search tool, then generates more tokens to return the answer back to the orchestrator, and the orchestrator generates yet more tokens to respond to the user. That is a pretty inefficient approach: there are a lot of tokens being created, and it's expensive. It can benefit you where you really need an accurate system that can do many things, but if, as in the web search example I just described, you only need to use web search, it's super inefficient. With the handoff approach, because you're handing off to that web search agent, the web search agent can respond to the user directly, so you're using one less generation step, which makes a big difference. Which of these approaches you go for is kind of up to you. Handoffs are very useful though.
So, to get started, we are going through this article here on multi-agent systems in the Agents SDK. In this article we covered the orchestrator sub-agent pattern to begin with; that's the video from before. The handoff part of the article comes a little bit later, so click over here and we get to the handoff section, which is what we're going to walk through in this video. Now, this all comes with code, which we'll find over here in the Aurelio Labs Agents SDK course repo. This is part of a broader course on the Agents SDK, and we want the multi-agent notebook; that's what we're going to run through. You can open it in Colab, which is the way I recommend running it. You can also run it locally, which is actually what I'm doing, but again, I'd recommend Colab because it's just easier and simpler. If you are running in Colab, you will want to run this cell here, which installs everything you need to run through the code. If you are running locally, we have setup instructions over in the README, which explain how to get everything set up; we use the uv package manager, and it's pretty simple, but again, not as simple as just using Colab. So, it's up to you.

Now, let's get started. The first thing we want to do here is set up our OpenAI API key. I will note that there are many features in OpenAI's Agents SDK that you can use with other LLM providers, but handoffs are not one of those; handoffs, from what I have seen so far, only work with OpenAI LLMs. So we will need an OpenAI API key. We'll run this cell, and you can get the key from the OpenAI platform at platform.openai.com.
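If you're following along outside the notebook, a minimal sketch of that step looks something like this (the prompt text is just illustrative):

import os
from getpass import getpass

# Set the OpenAI API key that the Agents SDK picks up automatically.
# Handoffs currently require OpenAI models, so this key is needed.
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass("Enter your OpenAI API key: ")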
Now, we will be going through some of the orchestrator sub-agent code, just to set up our agents, because we're using those agents both in the orchestrator sub-agent part of this and in the handoffs section as well. So, we'll go and initialize them. The actual agents themselves are basically the same; we're going to prompt the orchestrator a little bit differently, but otherwise they are the same. I have spoken more about them in the orchestrator sub-agent video, so if you really do want to cover multi-agent systems in the Agents SDK, I would recommend watching that as well; whether before or after this one is completely up to you. Here I will explain the bare minimum needed to understand everything.

We are first going to initialize this web search sub-agent. Just to be clear, this is not the architecture we're building; the architecture we're building is shown in the later graph, where the main agent is basically the same but uses handoffs. So, we're going to go ahead and initialize those. The web search sub-agent uses the LinkUp API. LinkUp is a search provider like Perplexity or Exa, so if you've used either of those services, it's a similar sort of thing, and in general they provide really good search results, so I do really like using these guys. We will need to set up our LinkUp API key. To do that, you click over here, where we have this LinkUp reference, and you just need to sign up if you don't already have an account; if you followed the last video, you will already have one. You sign in and you should find that you have some free credits, which will probably last you a while. Then you copy your API key, we come back over here, and I run this cell and enter my API key. Great. So, we have that.
Now, once we have that, we'll want to perform a search. I do generally say you should always use async, because if you're building AI agents into an application, AI involves a lot of API calls in general. If you're going fully local then it's different, of course, but a lot of the time you're making a lot of API calls, and with API calls, if you're writing synchronous code, your Python process just sits waiting for a response. Your code sends the API request and then does nothing while it waits for the result to come back. That's super inefficient, because that's a lot of time, especially when you think about how much code could be executed in the time it takes an LLM to respond to you, which is something like two or so seconds; you could be doing a lot in that time. So, write everything asynchronously, and while you're waiting for that response from an API, your Python code can go and do other things. AI especially is one of those fields where writing async code is generally very, very useful. So, we use the async search because of that.
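For reference, the async LinkUp call looks roughly like this. This is a sketch assuming the linkup-sdk client and its async_search parameters, so check the LinkUp docs for the exact options your version supports:

import asyncio
import os

from linkup import LinkupClient  # pip install linkup-sdk

client = LinkupClient(api_key=os.environ["LINKUP_API_KEY"])

async def search_news():
    # While we wait on the network, other coroutines are free to run.
    return await client.async_search(
        query="latest world news",
        depth="standard",
        output_type="sourcedAnswer",
    )

print(asyncio.run(search_news()))  # in a notebook, just `await search_news()`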
Great, we get our search results there. We print those out, and you can see here what I searched for, which was the latest world news, and this is everything I'm getting back. Then I create this search web tool; this is what my web search agent is going to be using. You can see here the prompting and everything for that web search agent. And yeah, we have that.
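Putting that together, a minimal sketch of the tool and agent definitions might look like this (the instructions text and names are placeholders rather than the notebook's exact wording):

from agents import Agent, function_tool

@function_tool
async def search_web(query: str) -> str:
    """Search the web via LinkUp and return a sourced answer."""
    # `client` is the LinkupClient created earlier
    result = await client.async_search(
        query=query, depth="standard", output_type="sourcedAnswer"
    )
    return str(result)

web_search_agent = Agent(
    name="Web Search Agent",
    instructions=(
        "You answer questions that need up-to-date information from the web. "
        "Always call the search_web tool and cite your sources as markdown links."
    ),
    tools=[search_web],
)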
So, we can also confirm that it works. I'm going to ask it, "How's the weather in Tokyo?", and it will tell us. Let's see... okay, very nice weather, and everything seems to work. That's great.

Now we move on to the next sub-agent, which is like a dummy RAG agent. We're pretending to have these internal docs, which you can read here. Basically, we have our AI robotics startup, which is Skynet, talking about, I think, its T1000 robots or something; I can't quite remember, it was a while ago when I last read it, but it's talking about how Skynet is doing. You can see it's basically all this information here, and we are essentially creating this dummy RAG search over it. The query that the LLM passes to our dummy search tool doesn't actually get used, but that's fine, because we're really focusing on the agentic architecture here, not on a specific RAG tool, which takes a little bit of setup.
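A rough sketch of that kind of dummy RAG setup, where the document text and names below are stand-ins rather than the notebook's actual content:

from agents import Agent, function_tool

# Stand-in for the internal Skynet docs used in the notebook
INTERNAL_DOCS = (
    "Skynet revenue report, dated April 2, 2025: total revenue figures and the "
    "share of revenue contributed by the T1000 unit line."
)

@function_tool
def search_internal_docs(query: str) -> str:
    """Dummy RAG search: the query is ignored and the single document is returned."""
    return INTERNAL_DOCS

internal_docs_agent = Agent(
    name="Internal Docs Agent",
    instructions=(
        "You answer questions about internal company documents. "
        "Always call search_internal_docs before answering and cite the document."
    ),
    tools=[search_internal_docs],
)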
So, we have that. Then we have our code execution sub-agent, which is just going to execute some code that our LLM generates. Again, it's the same sort of thing: we have our code execution tool and our code execution agent, and we provide that agent with specific prompting based on what we need it to do. We can see everything it's doing here and what that comes out to. Cool.
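As a hedged sketch of what a simple code execution tool and agent could look like; this is not necessarily how the notebook implements it, and a bare exec like this is only acceptable in a sandboxed demo:

from agents import Agent, function_tool

@function_tool
def execute_code(code: str) -> str:
    """Execute Python code generated by the LLM and return its `result` variable."""
    scope: dict = {}
    try:
        exec(code, scope)  # demo only: never exec untrusted code in production
    except Exception as e:
        return f"Error while executing code: {e}"
    return str(scope.get("result", "Code ran, but no `result` variable was set."))

code_execution_agent = Agent(
    name="Code Execution Agent",
    instructions=(
        "You perform calculations by writing Python code and calling execute_code. "
        "Store the final answer in a variable named `result` and report it to the user."
    ),
    tools=[execute_code],
)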
Then we have the orchestrator. I don't think we necessarily need to define everything here, but we'll run through it just in case. I think we might need this get current date tool. This is a tool by itself, not a sub-agent or anything like that; we also define it just to show that our architecture can use both agents-as-tools and plain tools. And when we move on to handoffs in a moment, we'll see that we have agents that can hand off to other agents and an agent that can just use a tool, or do both, so an agent can use a tool and then hand off to another agent, for example.
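For reference, a plain tool like that can be as small as the following; a minimal sketch, and the date format the notebook uses may differ:

from datetime import datetime

from agents import function_tool

@function_tool
def get_current_date() -> str:
    """Return today's date so agents can reason about relative dates."""
    return datetime.now().strftime("%Y-%m-%d")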
Okay, so we have that. I'm not going to run the orchestrator itself, since that's not what we're building here; instead, we're going to move on to the handoff part. We have all of these components basically ready. We're going to redefine the main agent here, but the web search sub-agent, internal docs sub-agent, and code execution sub-agent are all ready. The main difference, as I mentioned near the start of the video, is that each of these sub-agents, as you can see here, can respond to the human user directly. That's one of the main differences. And as I mentioned, one of the biggest positives there is latency, and cost as well: we are removing that additional step of going back to the orchestrator or main agent before getting back to the user, which is not as direct as it could be. So, let's go ahead and implement that handoff.
With handoffs, OpenAI have provided within the SDK a default prompt that you can add to your orchestrator or main agent instructions, and it essentially describes what handoffs are and that broader system context for the LLM. I find this quite useful, because your LLM has no idea where it is. If it doesn't know where it is, if it doesn't know within what system it's being used, it might do things that we wouldn't necessarily want it to do; it might misuse the system or misunderstand how to use it, and of course we don't want that. If we explain to the LLM what the multi-agent architecture around it looks like, generally speaking the LLM is going to be able to better use its tools and its other agents and understand its own place within that system, and we'll generally get better results. So, this is really useful. They have this recommended prompt prefix for handoffs, the handoff prompt RECOMMENDED_PROMPT_PREFIX, and we can read it here: "Handoffs are achieved by calling a handoff function, generally named transfer_to_<agent_name>." So, internally, each of the handoffs we define is going to be exposed under a name like transfer_to_web_search_agent or transfer_to_internal_docs_agent, and all of those are presented to the main agent essentially as tools that it can call.
The prefix continues: "Transfers between agents are handled seamlessly in the background; do not mention or draw attention to these transfers in your conversation with the user." I think that bit is important, because it's very easy for an LLM within a multi-agent system, or even just when using tools, to talk directly to the user about what it's doing. In some cases you might want that: if it's a research agent, you might want the LLM to say, "Okay, I'm just going to use this tool to look up these particular bits of information, and here is what I got back from it." So you might want some level of detail in there, but most likely you don't want too much; you don't want your LLM narrating every internal transfer and tool call to the user with all of that specific information. Instead, you're probably going to want to, maybe through streaming or some other interface, show the user that a tool or various tools are being used. And you're probably going to want to include sources in particular, which we do: we've prompted all of our sub-agents to provide markdown citations, so all of that does get returned to us, but we don't want the agent to return too much extra information. So, we have our recommended prompt prefix.
Depending on what you're doing, you might actually want to use this as a prefix rather than as the whole prompt, although in this case we don't necessarily need to. If you are going to use it as a prefix, which I think most people would, you're going to add more text after it, as shown here. In this scenario, because it's an internal company agent that we're building, it would be something like this: you use this as a prefix and then add more use-case-specific instructions following it.
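In code, that prefix comes from the SDK's handoff prompt helpers. Here's a minimal sketch, where the suffix text is just an illustrative placeholder for the use-case-specific instructions:

from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX

# Prepend the SDK's recommended handoff context, then add our own
# use-case-specific instructions for the internal company assistant.
MAIN_AGENT_INSTRUCTIONS = (
    f"{RECOMMENDED_PROMPT_PREFIX}\n\n"
    "You are the main assistant for an internal company workflow. "
    "Decide whether to answer directly, call a tool, or hand off to a specialist agent."
)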
So, then we have our handoffs, and handoffs are easy. We've defined our agents already, they're standard agents, and we pass them as a list into the handoffs parameter of our agent definition. We also want to provide a handoff description, which is essentially "in general, how should I use these handoffs?". You can get more specific in this as well, depending on how your agent is behaving; if you find it's not using the handoffs correctly in certain cases, you might want to say, "you need to use this handoff in this particular scenario." It just depends on what you're going for, and based on your testing of the agents, where they are lacking and where they are performing well, you would iterate on your instructions and your handoff description. And the final thing is that we also include a tool, and this tool is just for the main agent.
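Putting the pieces together, a rough sketch of the main agent and a test run might look like this, reusing the hypothetical names from the earlier sketches rather than the notebook's exact code:

import asyncio

from agents import Agent, Runner

main_agent = Agent(
    name="Main Agent",
    instructions=MAIN_AGENT_INSTRUCTIONS,
    tools=[get_current_date],  # a plain tool, available only to the main agent
    handoffs=[web_search_agent, internal_docs_agent, code_execution_agent],
)

async def main():
    result = await Runner.run(
        main_agent,
        "How long ago from today was it when we got our last revenue report?",
    )
    print(result.final_output)

asyncio.run(main())  # in a notebook, just `await Runner.run(...)`

One detail worth knowing: in the SDK, the handoff_description field lives on the agent being handed off to, for example Agent(..., handoff_description="Searches internal company documents"), and that description is what tells the main agent, at a high level, when each handoff should be used.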
Okay, cool. So, we have that. We run it and ask, "How long ago from today was it when we got our last revenue report?" Ideally, what is going to happen here is the main agent will use the get current date tool, receive that information, and then hand off to the internal docs sub-agent. The internal docs sub-agent will see the previous tool call for the current date, see the query from the user, and be able to check its own internal docs tool, get the revenue report, and return all of that information to the user. I think the revenue report in our internal docs is dated April 2nd, 2025. So, let's see what happens.

Comparing timings, with the orchestrator sub-agent pattern, so without handoffs, I was getting around 7.5 seconds for this query, and with handoffs I was getting 6.4 seconds. There may also be network latency involved there, and we can try running it again, although we might run into some prompt caching. Let's just see what we get. Interestingly, I'm getting a very long response time here, which is not normal; I assume it's my network latency or something wrong on the OpenAI side. Well, we see it's still running. That was an insanely long time. The answer is correct, but that time is a bit crazy.
But we can have a look. Let's take a look at the tracing dashboard, because we also get tracing by default with the Agents SDK, which is really very useful; I spoke a little about this in one of the previous videos. I'm going to go to the dashboard, go to traces, and see why that took so long. It does seem like it was actually on the OpenAI side, which is pretty wild. When we look at this, we can see what took such a long time. Our main agent step here was just 3.8 seconds, which was normal: it decided to use the get current date tool, got that, and then decided to use the transfer to internal docs agent function, so it created that handoff, and that's what we see here. Then we came to the internal docs agent, and that took an incredibly long time. The internal docs agent first decided to use the search internal docs tool, and that was very quick, about 1.5 seconds to generate, and the docs tool itself isn't really doing anything, so it was really fast to respond. Then this final response took an incredibly long time, and looking at it, in my opinion there's nothing here that should have taken that long. All it did was generate this: it read all this information, and the LLM, GPT-4.1 mini, should be very fast, and this is the response we got. There's nothing crazy in here. I could have understood it if the output were basically an essay, but it's not; it's 55 tokens of output and 615 tokens of input, which is nothing significant. So, to me, this time is a bit of an outlier, very likely on OpenAI's side. If I run it again, it's 11.2 seconds, not as good as what I was seeing before, but at least a lot more reasonable than what we just saw. The correct information has been returned, so that seems to be working, other than the outliers in latency. Just to confirm, we can take a look at that last run and check what actually happened, and we see exactly the same flow, just with a final LLM generation that was not insanely long this time. Great. So, we have that.
Now we have our next question, which is: what is our current revenue, and what percentage of revenue comes from T1000 units? What we need to be aware of here is that this question is really asking for two sub-agents to be used. We have "what is our current revenue", which is the internal docs sub-agent, and then "what percentage of revenue comes from the T1000 units", which is ideally the code execution agent, just to be careful in how we're computing things. What we'll actually find is that the internal docs sub-agent will likely just calculate it itself, because it's not too difficult a calculation, but particularly for more complicated calculations we'd ideally want to hand off to the code execution sub-agent. To be honest, ideally even for simple calculations, because LLMs can fairly easily get calculations wrong; they have gotten much better at it recently, but it's still something they're going to hallucinate every now and again, and we want to avoid that as much as possible, which is much easier to do when we have another agent writing and executing code. If it writes that code wrong, it's not going to run; it will be told, "Hey, you need to write this code correctly," and it will usually fix itself, realize the error, and resolve the issue. So it can generally be much safer to do that.

Let's see how long this one takes. We've got 7.7 seconds, which is more aligned with what I was seeing before, actually. We have our current revenue, and this is the correct percentage from the T1000 units, so the answer is accurate. Although, if we have a look at the agent workflow, the most recent one here, we'll see that it used the internal docs and, I think, tried parallel tool calls: it was searching for the current revenue and for the revenue percentage from the T1000 units, because the internal docs agent thinks it's using an actual RAG tool with access to many documents. It doesn't, it just has access to the one, but it's going to try to use it as if it has access to many. Of course, the response from both of those calls is the same, so it takes that response and uses it to generate the answer, which is what we have here. So, all of that happened as we would expect it to happen.
Nothing surprising there. Great. So that's handoffs at a high level. What I do want to cover now is a few other handoff features. These are not necessarily things we're going to use in production, maybe in some cases but not most of the time; instead, I think most of these features are very useful for development, debugging, and just understanding what is going on. There are three things to take you through.

First, we have on_handoff, which is a callback that gets executed whenever we hand off to a sub-agent. In a production scenario, you'd probably use this to write the fact that a handoff happened, some handoff event log, to a database or to your telemetry provider, whatever you're doing; that's probably where you would use this callback. In development, it can simply be a very good place to put a print statement or a debug log, just to see when a handoff is happening and whatever information you need within that handoff. So it can be really useful for that, and I'll take you through using it in a moment. We then also have input_type, which allows us to define a specific structured format to be used by our agent in the handoff, to pass particular information either to the sub-agent or through the callback. Those two can obviously be used in tandem: you get some structured information that you then print or store in your telemetry somewhere, based on whatever you put into this input type. So that can be really useful. And then we also have input_filter. I expect OpenAI are going to add more to this feature in the future. The way OpenAI describe input_filter, it sounds like you can use it to filter various things going into the agent you're handing off to; the way they phrase it makes it seem like maybe you could filter just the user messages or assistant messages, or filter the tools that the downstream LLM might see. But right now, the only built-in filter removes all tool messages from the conversation so far. That is what it does; we'll see an example of it in a moment, and it will probably make a bit more sense then.
Now, all of these are set via the handoff object. If we start with the on_handoff function, this is the callback that gets called whenever a handoff happens; in here, we're just going to print "handoff called" to the console. Then what we do is wrap each agent that we're going to hand off to inside this handoff object, and we pair that with the on_handoff function we defined. These become the handoff agents that we create.
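A minimal sketch of that wiring, assuming the same hypothetical agents from before and an illustrative print message:

from agents import Agent, RunContextWrapper, handoff

def on_handoff(ctx: RunContextWrapper[None]) -> None:
    # Fired at the moment the main agent hands off to a sub-agent.
    print("handoff called")

handoff_agents = [
    handoff(agent=agent, on_handoff=on_handoff)
    for agent in (web_search_agent, internal_docs_agent, code_execution_agent)
]

main_agent = Agent(
    name="Main Agent",
    instructions=MAIN_AGENT_INSTRUCTIONS,
    tools=[get_current_date],
    handoffs=handoff_agents,
)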
We define those and then we're just going to run this. What we should see is that when the handoff occurs, "handoff called" is printed below the cell we're running. So let's run that. Okay, we can see the print has occurred, and then we get this answer, the same as before. Great, so we have that.

Now let's add a little more to this. One thing here is that we don't actually get much information passed to the callback handler, so I'm going to add some: I know the handoff is happening, but why is it happening, and where is it being handed off to? So I'm defining this Pydantic BaseModel. I want the sub-agent name, which we set as a field described as the name of the sub-agent being called, and then the reason, as in why this sub-agent is being called. The main agent is going to have to generate this for the handoff, so we're going to be able to see exactly why things are happening according to the main agent, and we're just going to print it out so we can read it. So we do that: we use on_handoff again, the same as before, but we've added this handoff info as the input data there, and then we also add the input_type. You can see how both of these together can be really helpful for debugging things.
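Here's a rough sketch of that pattern; the field names and descriptions are assumptions, so the notebook's actual model may be worded differently:

from pydantic import BaseModel, Field

from agents import RunContextWrapper, handoff

class HandoffInfo(BaseModel):
    sub_agent_name: str = Field(description="Name of the sub-agent being called")
    reason: str = Field(description="Why this sub-agent is being called")

def on_handoff(ctx: RunContextWrapper[None], input_data: HandoffInfo) -> None:
    # The main agent generates this structured payload when it triggers the handoff.
    print(f"Handing off to {input_data.sub_agent_name} because: {input_data.reason}")

handoff_agents = [
    handoff(agent=agent, on_handoff=on_handoff, input_type=HandoffInfo)
    for agent in (web_search_agent, internal_docs_agent, code_execution_agent)
]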
So I'm going to go ahead and initialize that, run it, and see what happens. We see that we are handing off to the internal docs agent in order to determine the date of the most recent revenue report for the user, which is what we would expect, and then we get the correct answer again. So that is really, really helpful.

Now, with handoffs, all of our chat history is being passed to the sub-agents, and in some cases we might not want that to happen. In many cases we probably do want it: generally speaking, I think LLMs are going to perform at their best with maximal context and information, up to a certain level. With RAG, for example, I don't think there's any point in just sending everything to an agent; it's better to filter that down. But when it comes to the chat history the agents have seen and the tool calls that have been made, I think it's generally best to keep all of that information available to the LLMs. Maybe in some cases, though, you might want to filter that stuff out, and we can do that with the handoff filters. At least for now, the only thing we can filter out is the tool call messages, so I'm going to add that. We do that with input_filter, and we're adding handoff_filters.remove_all_tools, which is going to remove all the tool messages.
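A minimal sketch of adding that filter on top of the previous handoff setup:

from agents import handoff
from agents.extensions import handoff_filters

handoff_agents = [
    handoff(
        agent=agent,
        on_handoff=on_handoff,
        input_type=HandoffInfo,
        # Strip all tool calls and tool results from the history the sub-agent
        # receives, so e.g. the earlier get_current_date result is not visible to it.
        input_filter=handoff_filters.remove_all_tools,
    )
    for agent in (web_search_agent, internal_docs_agent, code_execution_agent)
]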
Okay. Now, the only tool that can be called here is the get current date tool, and we've been using that to answer all the questions accurately, so we're actually going to see that our workflow can no longer answer this question, because we're filtering out that information. The tool is still going to be called by the main agent, but the fact that it was called is not going to make it through to the sub-agents we hand off to. So let's run this and see what happens. We have the handoff to internal documentation to find the date of the most recent revenue report, and now it's telling me that today is April 27th, 2025, which it isn't; we're in May, so it's a little bit off. It's incorrect, and that's because we're using that filter. There may be cases where you do want to filter those things out; it just depends on your use case and what you're building.

So, that is actually it. That's all I wanted to cover. I think we've really dived into what handoffs are, where we might want to use them and where we might not, and the various tools and features that are included for handoffs. In general, I think handoffs are a really good concept for building multi-agent systems, and I think that's obvious from what we've seen here as well. But of course, there are cases where we might instead want to go with the orchestrator sub-agent pattern or something else, or maybe a mix of both. It really depends on what you're building, but it's very good to be aware of all these different approaches that you can take when building multi-agent systems. That's all I wanted to cover, so I hope it's been useful and interesting. Thank you very much for watching, and I will see you again in the next one. Bye.