
OpenAI Agents SDK Handoffs | Deep Dive Tutorial


Chapters

0:00 Agents SDK Handoff
5:06 Code Start
7:12 Web Search Agent
10:14 RAG Agent
11:18 Code Execution Agent
11:46 Defining the Orchestrator
12:36 Agents SDK Handoffs
20:27 Using OpenAI Traces Dashboard
23:26 More Handoff Testing
26:20 Other Handoff Features
28:45 Agents SDK on_handoff
29:51 Agents SDK Handoff input_type
31:28 Agents SDK Handoff input_filter

Whisper Transcript

00:00:00.000 | Today, we're going to be taking a look at OpenAI's Agents SDK and how we can use agent
00:00:05.940 | handoffs within the framework. Now, this is the second video on multi-agent systems in Agents
00:00:13.840 | SDK. The first one, we were looking at more of an orchestrator-sub-agent pattern and how we achieve
00:00:20.400 | that with the Agents-as-Tools method, but we didn't cover the handoffs, and the handoffs are,
00:00:26.720 | well, they're different to the as-Tools method. And the best way of thinking about these two
00:00:34.880 | different methods and understanding them is this: with the as-Tools method, or the orchestrator-sub-agent pattern,
00:00:40.580 | when you use that, you always have one of your agents as the orchestrator, as the controller of
00:00:47.520 | the entire workflow. It's that orchestrator that is receiving and sending messages to your user.
00:00:54.000 | It's that orchestrator that is deciding which sub-agent to go ahead and use. And those sub-agents
00:01:01.000 | are always going to respond back to that orchestrator. So, the orchestrator is always in control.
00:01:06.420 | Whereas, with handoffs, it is slightly different. I mean, you kind of have it in the name. It's a
00:01:11.820 | handoff. So, let's say we still had an orchestrator, although that wouldn't be a very good name in this
00:01:18.340 | scenario. So, let's just call our main agent the orchestrator. In the handoff setup, if we maintain
00:01:25.220 | three sub-agents, the orchestrator would be handing off full control to the sub-agent. So,
00:01:32.900 | the sub-agent may be able to go back to the orchestrator if it likes. It depends on how you set
00:01:39.340 | things up. But in many cases, it will probably go directly back to the user. So, the sub-agents
00:01:46.340 | in this scenario can actually respond directly to the user. Now, both of these approaches have their
00:01:53.980 | pros and cons. With the orchestrator pattern, you can use this orchestrator agent to have very fine-grained
00:02:04.780 | control over what is happening within the workflow. You can call various agents, also in parallel if you
00:02:10.860 | need, and prompt each one of those agents to specialize in whatever it is they are doing.
00:02:17.820 | With the handoff, you can still do that. You can still specialize each of these agents to be
00:02:23.820 | good at specific tasks. But they also need to be capable of answering correctly to a user. And
00:02:32.620 | generally, because of that, they need to have a better understanding of the overall context of the
00:02:39.520 | system. And one of the biggest differences, one of the biggest pros or cons, depending on which way
00:02:47.600 | you're looking at it, the orchestrator system is generally going to use more tokens, and it's going
00:02:52.320 | to be slower, because everything is going through our orchestrator. So, when a user asks for some
00:02:58.480 | information from the web, the orchestrator is going to be receiving those tokens. The orchestrator is then
00:03:04.000 | going to be generating tokens to say, "Okay, use the web search sub-agent." The web sub-agent is going to be
00:03:09.280 | generating tokens to use its internal web search tool, then it's going to be generating tokens to
00:03:13.840 | return the answer back to the orchestrator. And the orchestrator is going to be generating tokens
00:03:18.400 | to respond to the user. This is a pretty inefficient approach. There are just a lot of tokens being
00:03:24.720 | created. It's expensive. And that can be worth it where you really need an accurate system that can do
00:03:33.600 | many things. But if, like I just explained with the web search example, you just need to use the
00:03:39.760 | web search, it's super inefficient. The handoff approach, because you're handing off to that web
00:03:48.000 | search agent, the web search agent can generate a response directly to the user. So, you are using one
00:03:56.320 | less generation step with that, which makes a big difference. So, which of these approaches
00:04:03.440 | are you going to go for? It's kind of up to you. Handoffs are very useful though. So, to get started,
00:04:09.360 | we are going through this article here, which is Multi-Agent Systems in Agents SDK. And in this
00:04:16.080 | article we covered the orchestrator sub-agent pattern to begin with. That's the video from before.
00:04:21.520 | And the handoff part of this article actually comes a little bit later. So, click over here
00:04:28.880 | and we get to the handoff part. So, this handoff section is what we're going to be walking through
00:04:34.880 | in this video. Now, this all comes with code. So, the code we will find over here, in this Aurelio
00:04:42.880 | Labs Agents SDK course. This is part of a broader course on the Agents SDK. And we want to go to the
00:04:49.040 | multi-agent notebook. This is what we're going to be running through. You can open it in CoLab. That's
00:04:54.320 | the way that I recommend running this. You can also run this locally, which is actually what I am doing.
00:05:00.000 | But again, I would recommend running in Colab. It's just easier and simpler. So, if you are running in Colab,
00:05:08.000 | you will want to run this cell here. This will install everything you need to run through the
00:05:13.200 | code. If you are running it locally, we have setup instructions over in the README, which will explain
00:05:20.240 | how you get everything set up. We use the uv package manager. And it's pretty simple. But again,
00:05:27.040 | not as simple as just using CoLab. So, it's up to you. Now, let's get started. The first thing we do want
00:05:33.280 | to do here is set up our OpenAI API key. I will note that there are many features in OpenAI's Agents SDK that
00:05:43.200 | you can use with other LLM providers. But handoffs are not one of those. So, handoffs, from what I have
00:05:51.200 | seen so far, do only work with OpenAI LLMs. So, we will need an OpenAI API key. So, we'll run this. And you
00:05:59.760 | get this from the OpenAI platform, which is platform.openai.com. Now, we will be going through some
00:06:06.560 | of the orchestrator-sub-agent code, just to set up our agents in particular, because we're using those
00:06:12.240 | both in the orchestrator sub-agent part of this and also in the handoffs section as well. So,
00:06:18.240 | we'll go and initialize those. But the actual agents themselves are basically the same. We're
00:06:24.560 | going to prompt the orchestrator a little bit differently. But otherwise, they are the same. So,
00:06:30.320 | let's go ahead and just initialize all those. I have spoken more about them in the orchestrator
00:06:35.760 | sub-agent video. So, if you really do want to cover multi-agent systems in Agents SDK, I would
00:06:42.320 | recommend watching that as well. Whether that's before or after this, that's completely up to you.
00:06:47.280 | I will explain the bare minimum to understand everything here. So, we are first going to initialize
00:06:53.760 | this web search sub-agent, okay? Just to be clear, this is not the architecture we'll be building. The
00:07:00.960 | architecture we'll be building is shown in this later graph here, with the main agent. It's basically
00:07:07.280 | the same but using handoffs, okay? So, we're going to go ahead and initialize those. So, the web search
00:07:13.600 | sub-agent is using this LinkUp API. LinkUp is a search provider like Perplexity, like EXA. So, if you've
00:07:23.280 | used either of those services, it's a similar sort of thing. But in general, they provide really good
00:07:29.920 | search results. So, I do really like using these guys. So, we will need to set up our LinkUp API key.
00:07:39.120 | So, to do that, you need to click over here. We have this LinkUp reference and we just need to
00:07:46.000 | obviously sign up if you don't already have an account. If you're following the last video,
00:07:50.400 | you will already have one. So, you will sign in and you should find that you'll have some free credits.
00:07:56.080 | So, it will probably last you a while, I think. So, you need to copy your API key and we'll come
00:08:04.000 | back over to here, and I just want to run this, okay, and then add in my API key. Great. So, we have that. Now,
00:08:10.560 | once we have that, we'll want to perform a search. I do generally say, you know, you should always use
00:08:18.000 | async because if you're building AI agents into an application, like AI is using a lot of API calls
00:08:25.520 | in general. If you're going fully local, then it's different, of course. But a lot of the time, you're
00:08:31.680 | using a lot of API calls. And API calls, if you're writing synchronous code, your Python instance is
00:08:40.160 | just going to be waiting for a response. So, your code is sending the API request and it's just doing
00:08:47.840 | nothing whilst it waits for it to return. And that's super inefficient because that's a lot of time,
00:08:53.680 | especially when you think about how much code could be executed in the time it takes for an LLM to respond
00:08:59.280 | to you, which is, you know, it's like two or so seconds. You could be doing a lot in that time.
00:09:03.840 | So, write everything asynchronously and whilst you're waiting for that response from an API,
00:09:10.240 | your Python code can go and it can be doing other things within that time. So, yeah,
00:09:18.720 | AI especially is one of those fields where writing async code is generally very, very useful.
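(To illustrate the idea, here is a minimal sketch of the difference async makes. The `fetch_answer` coroutine and the delays are made up purely for illustration; the point is that `asyncio.gather` lets the waits overlap instead of stacking up.)

```python
import asyncio
import time

async def fetch_answer(name: str, delay: float) -> str:
    # Stand-in for an API call (e.g. an LLM or search request) that we await.
    await asyncio.sleep(delay)  # while we "wait", the event loop can run other work
    return f"{name} done"

async def main():
    start = time.perf_counter()
    # Both "requests" run concurrently, so total time is ~2s, not ~4s.
    results = await asyncio.gather(
        fetch_answer("web search", 2.0),
        fetch_answer("llm call", 2.0),
    )
    print(results, f"took {time.perf_counter() - start:.1f}s")

asyncio.run(main())
```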
00:09:28.320 | So, yeah, we use the async search because of that. Great. We get our search results there. We parse
00:09:35.520 | those out, and you can kind of see here, you know, this is telling me what I searched for, right? I searched
00:09:42.000 | for the latest world news. And this is everything I'm getting there. Then I create this
00:09:48.400 | search web tool. This is what my search web agent is going to be using. Okay. You see here, this is the
00:09:55.760 | prompting and everything for that web search agent. And yeah, we have that. So, we can also confirm,
00:10:02.960 | does that work? I'm going to ask it, how's the weather in Tokyo? It will tell us. So, let's see that.
00:10:09.200 | Okay. Very nice weather. And that is, yeah, all seems to work. That's great. Now, we move on to the next
00:10:16.320 | sub-agent, which is like a dummy RAG agent. So, we're pretending to have these internal docs,
00:10:22.480 | of which you can read those docs here. Basically, we have our AI robotics startup, which is Skynet,
00:10:30.400 | talking about, I think, T1000 robots or something like that. I can't quite remember. It was a
00:10:38.560 | while ago when I last read it. But it's talking about how Skynet is doing. And you can see here,
00:10:46.160 | right? So, it's basically all this information here, right? And we are essentially creating this dummy
00:10:55.200 | RAG search here. So, this query that the LLM is passing to our dummy search tool doesn't actually
00:11:02.160 | get used. But that's fine because we're not, you know, we're focusing really on that agentic
00:11:09.280 | architecture here, not necessarily on a specific RAG tool, which takes a little bit of setup.
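Roughly, the dummy tool looks something like the sketch below (the names and the document text are placeholders, not the exact notebook code): a `function_tool` that ignores the query and always returns the same internal document, attached to an agent.

```python
from agents import Agent, function_tool

# Placeholder for the internal company report used in the notebook.
INTERNAL_DOCS = "Skynet Q1 revenue report (April 2, 2025): ..."

@function_tool
def search_internal_docs(query: str) -> str:
    """Search the internal company docs for relevant information."""
    # Dummy RAG: the query is ignored and the same document is always returned.
    return INTERNAL_DOCS

internal_docs_agent = Agent(
    name="Internal Docs Agent",
    instructions=(
        "Answer questions using the internal docs tool. "
        "Cite your sources as markdown citations."
    ),
    tools=[search_internal_docs],
)
```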
00:11:16.480 | So, yeah, we have that. Then we have our code execution sub-agent. Code execution sub-agent
00:11:22.640 | is just going to execute some code that our LLM generates. Again, yeah, it's the same sort of thing,
00:11:29.760 | right? So, we have our code execution tool, code execution agent. Then we provide that agent with
00:11:37.440 | specific prompting based on what we need it to do. And yeah, we can see everything it's doing here
00:11:43.520 | and what that comes out to. Cool. Then we have the orchestrator. I don't think we necessarily
00:11:50.560 | need to define everything here, but we'll just run through just in case. I think we might need
00:11:56.160 | this get current date tool. This is a tool of itself. This is not a sub-agent or anything like
00:12:02.240 | that. We just also define this get current date tool just to see, okay, our architecture here can
00:12:09.280 | use both agents as tools and just tools. And then when we move on in a moment to the handoffs,
00:12:16.640 | we'll see that we have both agents that can hand off to other agents and our agent that can just use
00:12:23.280 | a tool or do both, right? So, the agent can use a tool then hand off to another agent, for example.
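As a rough sketch (not the exact notebook code), a plain function tool like this might look as follows:

```python
from datetime import datetime
from agents import function_tool

@function_tool
def get_current_date() -> str:
    """Return today's date so the agent can reason about relative dates."""
    return datetime.now().strftime("%Y-%m-%d")
```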
00:12:28.080 | Okay. So, we have that. I'm not going to go and run the orchestrator stuff. That's what we're building here.
00:12:34.320 | Instead, we're going to move on to the handoff parts. We have all of these components basically
00:12:40.000 | ready. We're going to redefine the main agent here, but the web search sub-agent, internal docs sub-agent,
00:12:44.800 | and code execution sub-agent, they are all ready. The main difference, as I mentioned near the start of the
00:12:51.760 | video, is that each of these sub-agents, as you can see here, each of these sub-agents can go ahead
00:12:57.360 | and respond to the human user directly. Okay. That's one of the main differences here.
00:13:03.040 | So, as I mentioned, one of the biggest positives there is latency, right? Latency and cost as well. So,
00:13:11.680 | we are removing that additional step to go back to the orchestrator or main agent to get back to the
00:13:19.680 | user, right? Which is not as direct as it could be. So, let's go ahead and implement that handoff.
00:13:28.080 | So, with the handoffs, OpenAI within the SDK, they have provided this default prompt that you can add to
00:13:38.240 | your orchestrator or main agent prompt or instructions that essentially just describes
00:13:45.680 | what handoffs are and that sort of broader system context for the LLM. And I find this quite useful
00:13:54.880 | because your LLM has no idea where it is. Okay. And what that means is that
00:14:05.360 | if it doesn't know where it is, it doesn't know the context of, you know, within what system it's
00:14:09.120 | being used, it might do things that we wouldn't necessarily want it to do, right? It might misuse
00:14:16.080 | that system, or just not understand how to use the system. And of course, we don't want that.
00:14:21.120 | If we explain to the LLM what the multi-agent architecture around it looks like,
00:14:29.040 | generally speaking, the LLM is going to be able to better use its tools, its other
00:14:38.160 | agents, and understand its own place within that system. And we'll generally get better results from
00:14:45.200 | that. So, this is really useful. They have this recommended prompt prefix here for handoffs, this
00:14:52.400 | handoff prompt, recommended prompt prefix. And we can read it here. Okay. So "handoffs are achieved by
00:14:58.000 | calling a handoff function, generally named transfer_to_<agent_name>". Okay. So, internally, what is
00:15:03.680 | happening here is when we have these handoffs here, each one of these is going to be renamed internally
00:15:12.320 | to something like transfer to, and then I believe it would be web search agent or internal docs agent.
00:15:19.440 | And all those are going to be presented to this main agent as a thing, like essentially almost
00:15:24.400 | tools that it can call. So, transfers between agents are handled seamlessly in the background. Do not
00:15:30.000 | mention or draw attention to these transfers in your conversation with the user. I think that bit is
00:15:33.920 | important because it's very easy for an LLM within the multi-agent system, or even, you know, when it's
00:15:41.440 | using tools to talk directly to the user about what it's doing. And in some cases, you might want to do this.
00:15:48.400 | So, you know, if it's like a research agent, you might want the LLM to say, "Okay,
00:15:54.720 | I'm just going to refer to this tool to look up these particular bits of information. And hey, look,
00:16:02.080 | this is what I got from this tool, this information here". So, you might want some level of detail in
00:16:08.960 | there, but most likely you don't want too much. Like, you don't want your LLM to be saying, "Hey, I'm going to use
00:16:17.360 | the transfer_to_web_agent
00:16:25.680 | and the transfer_to_internal_docs_agent".
00:16:27.680 | You don't want all of that specific information. Instead, you're probably going to want to,
00:16:33.840 | maybe through streaming or some other interface,
00:16:37.680 | potentially show the user, "Oh, we're using these various tools or we're using a tool".
00:16:43.440 | And you're probably going to want to include sources in particular, which we do do. We've
00:16:51.120 | prompted all of our sub-agents to provide markdown citations. So, all that does get returned to us,
00:16:56.880 | but we don't want it to just return too much information. Okay? So, yeah, we have this, we have our recommended prompt prefix.
00:17:05.200 | Depending on what you're doing here, you might actually want to,
00:17:08.240 | well, use this as a prefix rather than just the prompt. But in this case, we don't necessarily need to.
00:17:14.000 | So, if you are going to use it as a prefix, which I think most probably would be doing, you're going
00:17:18.800 | to be adding more text like it kind of has on here. So, yeah, it would be something like this in this
00:17:26.800 | scenario, right? Because it's like an internal company agent that we're building. So, you would
00:17:33.520 | use this as a prefix and then add more use case specific information following that prefix.
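For reference, the prefix ships with the SDK and is typically combined with your own instructions roughly like this (the wording after the prefix is just an illustrative example, not the notebook's exact prompt):

```python
from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX

MAIN_AGENT_INSTRUCTIONS = (
    f"{RECOMMENDED_PROMPT_PREFIX}\n\n"
    "You are an internal company assistant. Use the web search agent for public "
    "information, the internal docs agent for company data, and the code execution "
    "agent for any calculations."
)
```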
00:17:39.760 | So, then we have our handoffs. So, handoffs are easy, right? So, we've defined our agents already,
00:17:45.360 | they're standard agents, and then we pass them as a list into the handoffs parameter of our agent
00:17:51.200 | definition. Then we also want to provide handoff description. So, this is just, okay, in general,
00:17:59.040 | how should I use these handoffs? You can get more specific in this as well if you,
00:18:02.480 | depends on how your agent is behaving. If you find that it's not using the
00:18:06.320 | tools correctly in various cases, you might want to say, oh, you need to use this handoff
00:18:14.400 | in this particular scenario, right? You can get more specific. It just depends on what you're going for.
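As a rough sketch, putting those pieces together might look something like this. It assumes the sub-agent and tool names from the earlier sketches (web_search_agent, internal_docs_agent, code_execution_agent, get_current_date) and the MAIN_AGENT_INSTRUCTIONS string from the prefix sketch above; the notebook's exact names and prompts may differ. Note that in the SDK, the handoff description lives on the agent being handed off to.

```python
from agents import Agent, Runner

main_agent = Agent(
    name="Main Agent",
    instructions=MAIN_AGENT_INSTRUCTIONS,
    # Sub-agents the main agent can hand full control over to. Each sub-agent's
    # handoff_description tells the LLM when that handoff should be used.
    handoffs=[web_search_agent, internal_docs_agent, code_execution_agent],
    # A plain tool that only the main agent itself calls.
    tools=[get_current_date],
)

# Usage (async, e.g. awaited inside a notebook cell):
# result = await Runner.run(main_agent, "How long ago was our last revenue report?")
# print(result.final_output)
```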
00:18:20.480 | And based on your testing of the agents, where is it lacking ability and where is it performing well,
00:18:31.200 | right? You would obviously just iterate on your instructions and your handoff description based
00:18:37.040 | on that. And the final thing is that we do also include a tool. So, this tool is just for the main
00:18:42.880 | agent. Okay. Cool. So, we have that. We run it and we go on and just say, okay, how long ago from today
00:18:52.240 | was it when we got our last revenue report? Okay. So, ideally, what is going to happen here is the main
00:18:59.040 | agent is going to go and use the get current date tool. It's going to receive that information and then
00:19:05.680 | all of that, then it's going to hand off to, I would say, the internal docs sub-agent. And the
00:19:13.120 | internal docs sub-agent will see, okay, the previous tool call for the get current date. It will see the
00:19:17.920 | query from the user and it will be able to go and check its own internal docs tool, get the revenue report,
00:19:28.000 | and return all of that information to the user. So, I think the revenue report in the internal docs
00:19:34.720 | that we have, the date on that is April 2nd, 2025. So, let's see what happens there. When we're comparing
00:19:45.200 | the time between the two patterns, I was getting like 7.5 seconds for this query
00:19:50.240 | with the orchestrator-sub-agent pattern. So, that's without handoffs. And then with the handoffs,
00:19:54.080 | I was getting 6.4 seconds there. There may also be network latency involved there. And we can try
00:20:00.160 | running it again, although we might run into some caching or prompt caching there. So, let's just see
00:20:07.040 | what we get. So, interestingly, I'm getting a very long response time here, which is not normal.
00:20:12.080 | This is, I assume it's my network latency or something wrong with OpenAI. Okay. Well, we see it's still
00:20:18.720 | running. That was an insanely long time. The answer is correct, right? But that time is a bit crazy.
00:20:27.440 | But we can have a look. So, let's just take a look at the tracing dashboard, because we also get tracing
00:20:34.560 | by default with Agents SDK, which is really very useful. I spoke a little bit about this in one
00:20:40.160 | of the previous videos. But I'm going to go to the dashboard here. I'm going to go to traces and we'll
00:20:47.600 | just see, okay, why did that take so long? And it does seem like it was actually on the OpenAI side,
00:20:53.200 | which is pretty wild. Wow. So, when we look at this, we can see, well, what took such a long time.
00:21:04.720 | So, our main agent step here was just 3.8 seconds. Okay. And that was, you know, it was just, it was
00:21:12.720 | normal, right? So, the agent decided, okay, I'm going to use the get current date tool. It did.
00:21:18.960 | It got that, and then it decided, okay, I'm going to use the transfer to internal docs agent. And it did that.
00:21:26.560 | So, it created that handoff. And that's what we see here. Then we came to this internal docs
00:21:31.680 | agent, and it took an incredibly long time. The internal docs agent decided first to use the search internal
00:21:39.440 | docs tool. And that was very quick. So, we had this, it took 1.5 seconds to generate that. The docs tool,
00:21:47.280 | it's not really doing anything. So, it was really fast to respond. Then, this response here took
00:21:55.840 | an incredibly, incredibly long time. And looking at this, there's nothing in here that, in my opinion,
00:22:04.320 | there's nothing in here that should have taken a very long time. All it did was generate this.
00:22:10.640 | So, the LLM read all this information here, and we're using GPT-4.1 mini, which should be very fast. And
00:22:18.560 | this is the response we got. Okay. So, there's nothing crazy in here. Like, I could have understood it if this was
00:22:25.360 | basically an essay, but it's not. It's 55 tokens output and 615 tokens input, which is
00:22:34.720 | nothing significant. So, to me, this time here is a bit of an outlier and very likely
00:22:44.240 | from OpenAI. And if I run it again, it's 11.2 seconds. Not as good as what I was seeing before,
00:22:49.840 | but at least a lot more reasonable than what we saw. So, yeah, the correct information has been
00:22:57.040 | returned is going through that. So, okay. That seems to be working other than the outliers in latency there.
00:23:05.920 | Just to confirm, we can take a look at that last run and just check, okay, what actually happened
00:23:12.640 | there. And we can see exactly the same thing. Just that final post, that final LM generation
00:23:19.440 | was not like insanely long, which is what we see there. Great. So, we have that. And now we have our
00:23:27.200 | next question, which is what is our current revenue and what century of revenue comes from T1000 units?
00:23:32.080 | Now, what we would typically see here, or what we would need to be aware of is in this question,
00:23:38.080 | we're kind of asking for two sub-agents to be used. We have what is our current revenue,
00:23:43.040 | which is the internal docs sub-agent. And then we also have what percentage of revenue comes from
00:23:47.440 | the T1000 units, which is ideally the code execution agent, you know, just to be careful in how we're
00:23:54.400 | computing things. Although what we'll find is that it will, the internal docs sub-agent will likely just
00:24:00.080 | calculate it itself, because it's not too difficult of a calculation, but particularly for those more
00:24:05.280 | complicated calculations, we'd ideally want to hand off to the code execution sub-agent. And to be honest,
00:24:11.360 | even ideally for simple calculations, because LLMs can fairly easily get calculations wrong,
00:24:18.080 | although they have gotten much better at it recently, it's still something that they're going to
00:24:24.960 | hallucinate every now and again. And we want to try and avoid that as much as possible,
00:24:28.880 | which is much easier to do when we have another agent writing and executing code. Because if it
00:24:35.360 | writes that code wrong, it's not going to run. It's going to be told, "Hey, you need to write this code
00:24:42.160 | correctly." And it will usually basically fix itself, realize its error, and resolve the issue. So
00:24:49.760 | it can generally be much safer to do that.
00:24:56.080 | So let's see how long this one takes. So we've got 7.7 seconds. This is more aligned with what I was
00:25:04.400 | seeing before, actually. So we have all of this current revenue, and this is the correct percentage from
00:25:13.920 | the T1000 units. So yeah, this is accurate. Although if we have a look at the agent workflow,
00:25:24.880 | the most recent one here, we will see that, okay, it used the internal docs. It seems to use them, I think it
00:25:34.320 | called them in parallel. Yeah, it tried parallel tool calls here. It was trying to search for current revenue
00:25:40.480 | and revenue percentage from the T1000 units because the internal docs agent thinks that it's using
00:25:46.560 | like an actual RAG tool that has access to many documents. It doesn't, it just has access to the
00:25:50.960 | one, but it's going to try and use that as if it has access to many documents. And of course, the response
00:25:57.680 | from both of those is the same. So we've got that response and it uses that to generate the answer,
00:26:02.880 | which is what we have here. So yeah, all of that happened as we would expect it to happen.
00:26:10.400 | So nothing surprising there. Great. So that's handoffs at a high level. What I do want to cover
00:26:18.080 | is a few other handoff features. And these are not necessarily things that we're going to be using
00:26:23.760 | in production, maybe in some cases, not most of the time. Instead, I think most of these features
00:26:29.680 | are very useful for development and debugging and just understanding what is going on. So we have
00:26:35.760 | three things to take you through. We have on_handoff, which is a callback that gets executed whenever we
00:26:42.480 | hand off to a sub agent. And in a production scenario, you probably use this to write, you know,
00:26:51.840 | like the fact that a handoff happened, like some handoff event log to a database or to your
00:26:59.120 | telemetry provider, you know, whatever you're doing, that's probably where you would use this
00:27:04.960 | callback. And in development, this can just be a very good place to put like a print statement or a
00:27:13.440 | log, a debug statement, or whatever else, just to see, okay, when a handoff is happening and
00:27:21.280 | whatever information you need within that handoff as well. So that can be really useful for that.
00:27:26.800 | And I'll take you through using that in a moment. We then also have input_type. So input_type
00:27:32.560 | allows us to define a specific structured format to be used by our agent in the handoff. So to pass
00:27:40.720 | particular information to either the sub agent or actually through the callback. So those can obviously
00:27:48.240 | be used in tandem. So you can, you'll get some structured information that you then print or
00:27:55.600 | store in your telemetry somewhere, based on whatever you put into this input_type. So that can be really
00:28:00.880 | useful. And then we also have input filter. I expect OpenAI are going to add more to this feature in the
00:28:06.560 | future. The way that OpenAI describe input filter is like you can use it to filter various things that are
00:28:13.440 | going into your agent that you're handing off to. So the way that they phrase it makes it seem like you
00:28:19.520 | can filter. Like maybe you just want to filter your user message or assistant message or you want to
00:28:25.840 | filter the tools that the LLM downstream might see. But right now, the only thing that you can filter
00:28:34.400 | is all tool messages in the conversation so far. So that, that is what it does. We will see in a moment
00:28:41.680 | an example of that and it will probably make a bit more sense. Now, all of these are set via this handoff
00:28:48.560 | object. Okay. So if we start with that on handoff function, so this is the callback that gets called
00:28:56.240 | whenever a handoff happens. So in here, we're just going to print "handoff called"
00:29:01.760 | to the console. And then what we do is we wrap our agent that we're going to be handing off to
00:29:09.440 | inside this handoff object. And we also pair that with the on handoff function that we defined here.
00:29:16.560 | And then these become the handoff agents that we create. We define those and yeah, let's run that.
00:29:26.160 | And then we're just going to run this. Okay. So what we should see is when the handoff occurs,
00:29:31.120 | we should see that we should see handoff called below our cell that we're running. So let's run that.
00:29:36.800 | Okay. So we can see the print has occurred there and then we get this. Okay. So it's the same as
00:29:45.040 | before. Great. So we have that. Let's add a little more to this. So one thing with this information
00:29:53.920 | here is that we don't actually get much information being passed to the callback handler. So what I'm
00:30:00.880 | going to do is I'm going to add a little more information. What I'm going to do is say, okay,
00:30:05.600 | I know the handoff is happening, but why is it happening? And where is it being handed off to?
00:30:11.360 | Okay. So I'm defining this pedantic base model. I'm saying, I want the sub-agent name. We set as field
00:30:18.800 | and we say, okay, this is the name of sub-agent that is being called. Then the reason, like,
00:30:23.840 | why is this sub-agent being called? And the main agent is going to have to generate this
00:30:30.240 | for that handoff. Okay. So we're going to be able to see exactly why things are happening according to
00:30:37.680 | the main agent. Okay. And then we're just going to print it out so we can read
00:30:42.080 | that. So we do that. We use on handoff again, which is the same as before, but we've added this handoff
00:30:50.880 | info via the input_data there. Then we also add input_type. Okay. So you can see how both of these
00:30:59.440 | together, it can be really helpful for debugging things. So I'm going to go ahead and initialize
00:31:05.360 | that and just run and let's see what happens. Okay. So we see that we are handing off to the internal
00:31:15.120 | docs agent, the reason being to determine the date of the most recent revenue report for the user, which is,
00:31:20.800 | okay. That's what we would expect. And then we get the correct answer again. Okay. So that is really,
00:31:27.120 | really helpful. Now with the handoffs, all of our chat history is being passed to the sub agents.
00:31:37.840 | And in some cases, we might actually not want that to happen. In many cases, we probably do want that to
00:31:43.920 | be the case. Generally speaking, I think LLMs are going to perform at their best with maximal context and
00:31:53.120 | information, up to a certain level. For example, for RAG, I don't think there's any point
00:32:01.200 | in just sending everything to an agent. It's better to filter that down. But when it comes to the chat history
00:32:08.000 | and tool calls that have been made, I think it is generally best to keep all that information
00:32:13.280 | there and available to the LLMs, but maybe in some cases, you actually might want to filter that stuff out.
00:32:18.880 | So we can do that with the handoff filters. And we, at least for now, the only thing that we can
00:32:25.280 | filter is all the tool call messages. Okay. So I'm going to add that. So we do that with input_filter,
00:32:32.560 | and we're adding handoff_filters.remove_all_tools. This is going to remove all the tool messages.
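As a sketch, adding the filter might look like this, reusing the hypothetical names from the sketches above:

```python
from agents import handoff
from agents.extensions import handoff_filters

handoff_to_internal_docs = handoff(
    agent=internal_docs_agent,
    on_handoff=on_handoff,
    input_type=HandoffInfo,
    # Strip all tool call/result messages from the history the sub-agent receives.
    input_filter=handoff_filters.remove_all_tools,
)
```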
00:32:38.160 | Okay. Now the only tool that can be called here is the get current date tool. And we've been using
00:32:46.000 | that to answer all the questions accurately. So we're going to actually see now that this,
00:32:50.800 | our workflow won't be able to answer this question because we're going to be filtering out that
00:32:58.160 | information. So that tool is going to be called by the main agent, but then the fact that that tool was
00:33:04.320 | called, that is not going to make it to the sub-agents that we hand off to. So let's run this and we'll see
00:33:11.760 | what happens. So we have the hand off to internal documentation to find the day of the most recent
00:33:17.120 | revenue report. And yeah, so now it's telling me that today is April 27th, 2025, which it is,
00:33:23.200 | it's not, it's, we're in May. So it's a little while, it's a little bit off. So yeah, it's incorrect,
00:33:30.400 | right? And that's because we're using that filter. Now there may be cases where you do want to filter out
00:33:36.400 | those things. It just depends on your use case and what you're building. So that is actually it. That's
00:33:42.000 | all I wanted to cover. We've, I think, really dived into what handoffs are, where we might want to use
00:33:47.600 | them and also where we might not want to use them and the various tools and features that I have also
00:33:55.760 | included for handoffs. I think in general, handoffs are a really good concept for building multi-agent
00:34:04.080 | systems. And I think that's obvious from what we've seen here as well. But of course, there are cases
00:34:08.960 | where we might want to go with that, like orchestrate a sub-agent pattern or something else, or maybe a mix of
00:34:13.520 | both. It really depends on what you're building, but it's very good to just be aware of all these different
00:34:19.040 | approaches that you can take when building these multi-agent systems. But yeah, that's all I wanted
00:34:24.000 | to cover. So I hope all this has been useful and interesting. Thank you very much for watching,
00:34:30.000 | and I will see you again in the next one. Bye.