Today, we're going to be taking a look at OpenAI's Agents SDK and how we can use agent handoffs within the framework. Now, this is the second video on multi-agent systems in Agents SDK. In the first one, we were looking at more of an orchestrator sub-agent pattern and how we achieve that with the Agents as Tools method, but we didn't cover handoffs, and handoffs are, well, they're different to the as Tools method.
And the best way of thinking about these two different methods and understanding them is this: with the as Tools method, or the orchestrator sub-agent pattern, you always have one of your agents as the orchestrator, as the controller of the entire workflow. It's that orchestrator that is receiving and sending messages to your user.
It's that orchestrator that is deciding which sub-agent to go ahead and use. And those sub-agents are always going to respond back to that orchestrator. So, the orchestrator is always in control. Whereas, with handoffs, it is slightly different. I mean, you kind of have it in the name. It's a handoff.
So, let's say we still had an orchestrator, although that wouldn't be a very good name in this scenario. So, let's just call our main agent the orchestrator. In the handoff setup, if we maintain three sub-agents, the orchestrator would be handing off full control to the sub-agent. So, the sub-agent may be able to go back to the orchestrator if it likes.
It depends on how you set things up. But in many cases, it will probably go directly back to the user. So, the sub-agents in this scenario can actually respond directly to the user. Now, both of these approaches have their pros and cons. The orchestrator pattern, you can use this orchestrator agent to have very fine-grained control over what is happening within the workflow.
You can call various agents, also in parallel if you need, and prompt each one of those agents to specialize in whatever it is they are doing. With the handoff, you can still do that. You can still specialize each of these agents to be good at specific tasks. But they also need to be capable of answering correctly to a user.
And generally, because of that, they need to have a better understanding of the overall context of the system. And one of the biggest differences, one of the biggest pros or cons, depending on which way you're looking at it, the orchestrator system is generally going to use more tokens, and it's going to be slower, because everything is going through our orchestrator.
So, when a user asks for some information from the web, the orchestrator is going to be receiving those tokens. The orchestrator is then going to be generating tokens to say, "Okay, use the web search sub-agent." The web sub-agent is going to be generating tokens to use its internal web search tool, then it's going to be generating tokens to return the answer back to the orchestrator.
And the orchestrator is going to be generating tokens to respond to the user. This is a pretty inefficient approach. There are just a lot of tokens being created. It's expensive. And that can benefit you where you really need an accurate system that can do many things. But if, like I just explained with the web search example, you just need to use the web search, it's super inefficient.
The handoff approach, because you're handing off to that web search agent, the web search agent can generate a response directly to the user. So, you are using one less generation step with that, which makes a big difference. So, which of these approaches you're going to go for is kind of up to you.
Handoffs are very useful though. So, to get started, we are going through this article here, which is on multi-agent systems in the Agents SDK. And in this article we covered the orchestrator sub-agent pattern to begin with. That's the video from before. And the handoff part of this article actually comes a little bit later.
So, click over here and we get to the handoff part. So, this handoff section is what we're going to be walking through in this video. Now, this all comes with code. So, the code we will find over here, in the Aurelio Labs Agents SDK course repo. This is part of a broader course on the Agents SDK.
And we want to go to the multi-agent notebook. This is what we're going to be running through. You can open it in Colab. That's the way that I recommend running this. You can also run this locally, which is actually what I am doing. But again, I would recommend running it in Colab.
It's just easier and simpler. So, if you are running in Colab, you will want to run this cell here. This will install everything you need to run through the code. If you are running it locally, we have setup instructions over in the README, which will explain how you get everything set up.
We use the uv package manager. And it's pretty simple. But again, not as simple as just using Colab. So, it's up to you. Now, let's get started. The first thing we do want to do here is set up our OpenAI API key. I will note that there are many features in OpenAI's Agents SDK that you can use with other LLM providers.
But handoffs are not one of those. So, handoffs, from what I have seen so far, do only work with OpenAI LLMs. So, we will need an OpenAI API key. So, we'll run this. And you get this from the OpenAI platform, which is platform.openai.com. Now, we will be going through some of the orchestrator sub-agent code just to set up our sub-agents in particular, because we're using those both in the orchestrator sub-agent part of this and also in the handoffs section.
So, we'll go and initialize those. But the actual agents themselves are basically the same. We're going to prompt the orchestrator a little bit differently. But otherwise, they are the same. So, let's go ahead and just initialize all those. I have spoken more about them in the orchestrator sub-agent video.
So, if you really do want to cover multi-agent systems in Agents SDK, I would recommend watching that as well. Whether that's before or after this, that's completely up to you. I will explain the bare minimum to understand everything here. So, we are first going to initialize this web search sub-agent, okay?
Just to be clear, this is not the architecture we're building. The architecture we're building is shown in this later graph here with the main agent; it's basically the same but using handoffs, okay? So, we're going to go ahead and initialize those. So, the web search sub-agent is using this LinkUp API. LinkUp is a search provider like Perplexity, like Exa.
So, if you've used either of those services, it's a similar sort of thing. But in general, they provide really good search results. So, I do really like using these guys. So, we will need to set up our LinkUp API key. So, to do that, you need to click over here.
We have this LinkUp reference and we just need to obviously sign up if you don't already have an account. If you're following the last video, you will already have one. So, you will sign in and you should find that you'll have some free credits. So, it will probably last you a while, I think.
So, you need to copy your API key, come back over to here, run this, okay, and then enter the API key. Great. So, we have that. Now, once we have that, we'll want to perform a search. I do generally say, you know, you should always use async, because if you're building AI agents into an application, AI is using a lot of API calls in general.
If you're going fully local, then it's different, of course. But a lot of the time, you're using a lot of API calls. And API calls, if you're writing synchronous code, your Python instance is just going to be waiting for a response. So, your code is sending the API request and it's just doing nothing whilst it waits for it to return.
And that's super inefficient because that's a lot of time, especially when you think about how much code could be executed in the time it takes for an LLM to respond to you, which is, you know, it's like two or so seconds. You could be doing a lot in that time.
So, write everything asynchronously and whilst you're waiting for that response from an API, your Python code can go and it can be doing other things within that time. So, yeah, AI especially is one of those fields where writing async code is generally very, very useful. So, yeah, we use the async search because of that.
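Just to illustrate that point, here is a tiny toy example, not from the notebook, of why async matters when you are waiting on several slow API calls; the asyncio.sleep calls are just stand-ins for LLM or search API latency.

```python
import asyncio
import time

async def fake_api_call(name: str, seconds: float) -> str:
    # Stand-in for an LLM or web search request that takes a while to return
    await asyncio.sleep(seconds)
    return f"{name} done"

async def main() -> None:
    start = time.perf_counter()
    # Fire all three "requests" concurrently; run one after another this would take ~6 seconds
    results = await asyncio.gather(
        fake_api_call("llm", 2.0),
        fake_api_call("web search", 2.0),
        fake_api_call("rag", 2.0),
    )
    print(results, f"in {time.perf_counter() - start:.1f}s")  # roughly 2 seconds total

asyncio.run(main())  # in a notebook, use `await main()` instead
```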
Great. We get our search results there. We parse those out, and you can kind of see here, you know, this is telling me what I searched for, right? Search for the latest world news. And this is everything I'm getting there. Then I create this search web tool.
This is what my web search agent is going to be using. Okay. You see here, this is the prompting and everything for that web search agent. And yeah, we have that. So, we can also confirm, does that work? I'm going to ask it, how's the weather in Tokyo? It will tell us.
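Before we look at that output, here is roughly what the tool plus agent definition looks like; this is a simplified sketch rather than the notebook's exact code, and the linkup_search helper here is just a dummy stand-in for the real LinkUp call.

```python
from agents import Agent, function_tool

async def linkup_search(query: str) -> str:
    # Dummy stand-in for the notebook's LinkUp API call
    return f"(search results for: {query})"

@function_tool
async def search_web(query: str) -> str:
    """Search the web for up-to-date information on the given query."""
    return await linkup_search(query)

web_search_agent = Agent(
    name="Web Search Agent",
    instructions=(
        "You answer the user's question using up-to-date information from the web. "
        "Always include markdown citations for your sources."
    ),
    tools=[search_web],
    model="gpt-4.1-mini",
)
```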
So, let's see that. Okay. Very nice weather. And that, yeah, all seems to work. That's great. Now, we move on to the next sub-agent, which is like a dummy RAG agent. So, we're pretending to have these internal docs, which you can read here. Basically, we have our AI robotics startup, which is Skynet, talking about, I think, T1000 robots or something.
I can't quite remember. It was a while ago when I last read it. But it's talking about how Skynet is doing. And you can see here, right? So, it's basically all this information here, right? And we are essentially creating this dummy RAG search here. So, this query that the LLM is passing to our dummy search tool doesn't actually get used.
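So the dummy search tool is essentially just something like this; a sketch, with the document text shortened to a placeholder, where the query argument is accepted but ignored.

```python
from agents import function_tool

# Placeholder for the full internal docs text used in the notebook
SKYNET_DOCS = "Skynet internal report: revenue figures, T1000 unit sales, etc."

@function_tool
def search_internal_docs(query: str) -> str:
    """Search Skynet's internal documents for information relevant to the query."""
    # Dummy RAG: the query is ignored and the same document is always returned
    return SKYNET_DOCS
```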
But that's fine because we're focusing really on the agentic architecture here, not necessarily on a specific RAG tool, which takes a little bit of setup. So, yeah, we have that. Then we have our code execution sub-agent. The code execution sub-agent is just going to execute some code that our LLM generates.
Again, yeah, it's the same sort of thing, right? So, we have our code execution tool, code execution agent. Then we provide that agent with specific prompting based on what we need it to do. And yeah, we can see everything it's doing here and what that comes out to. Cool.
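To give a rough idea of what that code execution tool might look like, here is a very simplified sketch; it is not the notebook's exact implementation, and executing arbitrary generated code like this is only okay in a demo.

```python
from agents import function_tool

@function_tool
def execute_code(code: str) -> str:
    """Execute Python code written by the agent and return the value of `result`."""
    namespace: dict = {}
    try:
        # Run the generated code; we expect it to store its answer in `result`
        exec(code, namespace)
        return str(namespace.get("result", "Code ran, but no `result` variable was set"))
    except Exception as e:
        # Returning the error lets the agent see what went wrong and try again
        return f"Error while executing code: {e}"
```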
Then we have the orchestrator. I don't think we necessarily need to define everything here, but we'll just run through it just in case. I think we might need this get current date tool. This is a standalone tool. This is not a sub-agent or anything like that. We just also define this get current date tool to show that, okay, our architecture here can use both agents as tools and plain tools.
And then when we move on in a moment to the handoffs, we'll see that we have both agents that can hand off to other agents and our agent that can just use a tool or do both, right? So, the agent can use a tool then hand off to another agent, for example.
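For reference, a tool like get current date is just a plain function wrapped with the SDK's function_tool decorator, something along these lines; the exact name and return format in the notebook may differ.

```python
from datetime import datetime
from agents import function_tool

@function_tool
def get_current_date() -> str:
    """Return today's date so the agent can reason about relative dates."""
    return datetime.now().strftime("%Y-%m-%d")
```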
Okay. So, we have that. I'm not going to go and run the orchestrator stuff. That's not what we're building here. Instead, we're going to move on to the handoff parts. We have all of these components basically ready. We're going to redefine the main agent here, but the web search sub-agent, internal docs sub-agent, and code execution sub-agent are all ready.
The main difference, as I mentioned near the start of the video, is that each of these sub-agents, as you can see here, can go ahead and respond to the human user directly. Okay. That's one of the main differences here. So, as I mentioned, one of the biggest positives there is latency, right?
Latency and cost as well. So, we are removing that additional step of going back to the orchestrator or main agent before getting back to the user, right? Which is not as direct as it could be. So, let's go ahead and implement that handoff. Now, for handoffs, OpenAI have provided within the SDK a default prompt that you can add to your orchestrator or main agent prompt or instructions, which essentially just describes what handoffs are and that sort of broader system context for the LLM.
And I find this quite useful because, without it, your LLM has no idea where it is. Okay. And what that means is that if it doesn't know where it is, if it doesn't know the context of, you know, what system it's being used within, it might do things that we wouldn't necessarily want it to do, right?
It might misuse that system or just not understand how to use the system. And of course, we don't want that. If we explain to the LLM what the multi-agent architecture around it looks like, generally speaking, the LLM is going to be able to better use its tools and its other agents, and understand its own place within that system.
And we'll generally get better results from that. So, this is really useful. They have this recommended prompt prefix here for handoffs, this handoff prompt, recommended prompt prefix. And we can read it here. Okay. So "handoffs are achieved by calling a handoff function, generally named transfer to agent name". Okay.
So, internally, what is happening here is when we have these handoffs here, each one of these is going to be renamed internally to something like transfer to, and then I believe it would be web search agent or internal docs agent. And all of those are going to be presented to this main agent essentially as tools that it can call.
So, transfers between agents are handled seamlessly in the background. Do not mention or draw attention to these transfers in your conversation with the user. I think that bit is important because it's very easy for an LLM within a multi-agent system, or even, you know, when it's using tools, to talk directly to the user about what it's doing.
And in some cases, you might want to do this. So, you know, if it's like a research agent and you might want the LLM to say, "Okay, I'm just going to refer to this tool to look up these particular bits of information. And hey, look, this is what I got from this tool, this information here".
So, you might want some level of detail in there, but most likely you don't want too much. Like, you don't want your LLM to be saying, "Hey, I'm going to use the transfer_to_web_agent and the transfer_to_internal_docs_agent". You don't want all of that specific information. Instead, you're probably going to want to, maybe through streaming or some other interface, show the user, "Oh, we're using these various tools" or "we're using a tool".
And you're probably going to want to include sources in particular, which we do. We've prompted all of our sub-agents to provide markdown citations. So, all of that does get returned to us, but we don't want it to return too much information. Okay? So, yeah, we have this, we have our recommended prompt prefix.
Depending on what you're doing here, you might actually want to, well, use this as a prefix rather than as the entire prompt. But in this case, we don't necessarily need to. So, if you are going to use it as a prefix, which I think most people probably would be doing, you're going to be adding more text after it, something along the lines of the sketch below.
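In code, that looks roughly like the below; the SDK exposes the prefix from agents.extensions.handoff_prompt, and the use-case-specific instructions appended after it are just an illustration I am making up here.

```python
from agents.extensions.handoff_prompt import RECOMMENDED_PROMPT_PREFIX

# Start with OpenAI's recommended handoff context, then append our own instructions
MAIN_AGENT_INSTRUCTIONS = (
    RECOMMENDED_PROMPT_PREFIX
    + "\n\nYou are the main assistant for an internal company knowledge system. "
    "Decide whether to answer directly, use your tools, or hand off to a specialist agent."
)
```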
So, yeah, it would be something like this in this scenario, right? Because it's like an internal company agent that we're building. So, you would use this as a prefix and then add more use case specific information following that prefix. So, then we have our handoffs. So, handoffs are easy, right?
So, we've defined our agents already, they're standard agents, and then we pass them as a list into the handoffs parameter of our agent definition. Then we also want to provide a handoff description. So, this is just, okay, in general, how should these handoffs be used? You can get more specific in this as well; it depends on how your agent is behaving.
If you find that it's not using the tools correctly in various cases, you might want to say, oh, you need to use this handoff in this particular scenario, right? You can get more specific. It just depends on what you're going for. And based on your testing of the agents, where is it lacking ability and where is it performing well, right?
You would obviously just iterate on your instructions and your handoff description based on that. And the final thing is that we do also include a tool. So, this tool is just for the main agent. Okay. Cool. So, we have that. We run it and just ask, okay, how long ago from today was it when we got our last revenue report?
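Put together, the main agent definition and that query look something like this; it is a sketch rather than the exact notebook code, and it assumes the web_search_agent, get_current_date, and MAIN_AGENT_INSTRUCTIONS from the earlier sketches, plus internal_docs_agent and code_execution_agent as whatever names you gave those sub-agents.

```python
from agents import Agent, Runner

main_agent = Agent(
    name="Main Agent",
    instructions=MAIN_AGENT_INSTRUCTIONS,
    tools=[get_current_date],  # a plain tool, used directly by the main agent
    handoffs=[
        web_search_agent,
        internal_docs_agent,
        code_execution_agent,
    ],  # agents the main agent can hand full control over to
    model="gpt-4.1-mini",
)

# In a notebook / async context
result = await Runner.run(
    main_agent,
    "How long ago from today was it when we got our last revenue report?",
)
print(result.final_output)
```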
Okay. So, ideally, what is going to happen here is the main agent is going to go and use the get current date tool. It's going to receive that information, and then it's going to hand off to, I would say, the internal docs sub-agent. And the internal docs sub-agent will see, okay, the previous tool call for the current date.
It will see the query from the user and it will be able to go and check its own internal docs tool, get the revenue report, and return all of that information to the user. So, I think the revenue report in the internal docs that we have, the date on that is April 2nd, 2025.
So, let's see what happens there. When comparing times, I was getting something like 7.5 seconds for this query with the orchestrator sub-agent pattern. So, that's without handoffs. And then with the handoffs, I was getting 6.4 seconds. There may also be network latency involved there.
And we can try running it again, although we might run into some caching or prompt caching there. So, let's just see what we get. So, interestingly, I'm getting a very long response time here, which is not normal. This is, I assume, my network latency or something wrong on OpenAI's side.
Okay. Well, we see it's still running. That was an insanely long time. The answer is correct, right? But that time is a bit crazy. But we can have a look. So, let's just take a look at the tracing dashboard, because we also get tracing by default with Agents SDK, which is really very useful.
I talked a little bit about this in one of the previous videos. But I'm going to go to the dashboard here. I'm going to go to traces and we'll just see, okay, why did that take so long? And it does seem like it was actually on the OpenAI side, which is pretty wild.
Wow. So, when we look at this, we can see what took such a long time. So, our main agent step here was just 3.8 seconds. Okay. And that was, you know, normal, right? So, we decided, okay, I'm going to use the get current date tool.
It did. It got that and then it decided, okay, I'm going to use the transfer to internal docs agent. And it did that. So, it created that handoff. And that's what we see here. Then we came to this internal docs agent, and it took an incredibly long time.
The internal docs agent decided first to use the search internal docs tool. And that was very quick. So, it took 1.5 seconds to generate that. The docs tool isn't really doing anything, so it was really fast to respond. Then, this response here took an incredibly, incredibly long time.
And looking at this, there's nothing in here that, in my opinion, should have taken a very long time. All it did was generate this. So, the LLM read all this information here, and we're using GPT-4.1 mini, which should be very fast. And this is the response we got.
Okay. So, there's nothing crazy in here. Like, I could have understood it if this was basically an essay, but it's not. It's 55 tokens of output and 615 tokens of input, which is nothing significant. So, to me, this time here is a bit of an outlier and very likely from OpenAI's side.
And if I run it again, it's 11.2 seconds. Not as good as what I was seeing before, but at least a lot more reasonable than what we saw. And, yeah, the correct information has been returned and it is going through all of that. So, okay. That seems to be working, other than the outliers in latency there.
Just to confirm, we can take a look at that last run and just check, okay, what actually happened there. And we can see exactly the same thing, just that this time the final LLM generation was not insanely long, which is what we see there. Great. So, we have that.
And now we have our next question, which is: what is our current revenue and what percentage of revenue comes from T1000 units? Now, what we would typically see here, or what we need to be aware of, is that in this question we're kind of asking for two sub-agents to be used.
We have "what is our current revenue", which is the internal docs sub-agent. And then we also have "what percentage of revenue comes from the T1000 units", which is ideally the code execution agent, you know, just to be careful in how we're computing things. Although what we'll find is that the internal docs sub-agent will likely just calculate it itself, because it's not too difficult a calculation. But particularly for those more complicated calculations, we'd ideally want to hand off to the code execution sub-agent.
And to be honest, even ideally for simple calculations, because LLMs can fairly easily get calculations wrong. Although they have gotten much better at it recently, it's still something they're going to hallucinate every now and again. And we want to try and avoid that as much as possible, which is much easier to do when we have another agent writing and executing code.
Because if it writes that code wrong, it's not going to run. It's going to be told, "Hey, you need to write this code correctly." And it will usually fix itself, realize its error, and resolve the issue. So it can generally be much safer to do that.
So let's see how long this one takes. So we've got 7.7 seconds. This is more aligned with what I was seeing before, actually. So we have all of this current revenue, and this is the correct percentage from the T1000 units. So yeah, this is accurate. Although if we have a look at the agent workflow, the most recent one here, we will see that, okay, it used the internal docs agent.
It seems to have used it with, I think, parallel calls. Yeah, it tried parallel tool calls here. It was trying to search for current revenue and for the revenue percentage from the T1000 units, because the internal docs agent thinks that it's using an actual RAG tool that has access to many documents.
It doesn't, it just has access to the one, but it's going to try and use that as if it has access to many documents. And of course, the response from both of those is the same. So we've got that response and it uses that to generate the answer, which is what we have here.
So yeah, all of that happened as we would expect it to happen. So nothing surprising there. Great. So that's handoffs at a high level. What I do want to cover is a few other handoff features. And these are not necessarily things that we're going to be using in production, maybe in some cases, but not most of the time.
Instead, I think most of these features are very useful for development and debugging and just understanding what is going on. So we have three things I'll take you through. We have on_handoff, which is a callback that gets executed whenever we hand off to a sub-agent. And in a production scenario, you'd probably use this to write, you know, the fact that a handoff happened, like some handoff event log, to a database or to your telemetry provider, whatever you're doing; that's probably where you would use this callback.
And in development, this can just be a very good place to put, like, a print statement or a debug log or whatever else, just to see when a handoff is happening and whatever information you need within that handoff as well. So that can be really useful for that.
And I'll take you through using that in a moment. We then also have input_type. So input_type allows us to define a specific structured format to be used by our agent in the handoff, so we can pass particular information to either the sub-agent or, actually, through the callback.
So those can obviously be used in tandem. So you'll get some structured information that you then print or store in your telemetry somewhere, based on whatever you put into this input_type. So that can be really useful. And then we also have input_filter. I expect OpenAI are going to add more to this feature in the future.
The way that OpenAI describe input_filter is that you can use it to filter various things that are going into the agent that you're handing off to. So the way that they phrase it makes it seem like you can filter fairly flexibly. Like maybe you just want to filter your user messages or assistant messages, or you want to filter the tool messages that the LLM downstream might see.
But right now, the only thing that you can filter out is all the tool messages in the conversation so far. So that is what it does. We will see an example of that in a moment, and it will probably make a bit more sense. Now, all of these are set via this handoff object.
Okay. So if we start with the on_handoff function, this is the callback that gets called whenever a handoff happens. So in here, we're just going to print "Handoff called" to the console. And then what we do is we wrap the agent that we're going to be handing off to inside this handoff object.
And we also pair that with the on_handoff function that we defined here. And then these become the handoff agents that we create. We define those, and then we're just going to run this. Okay. So what we should see, when the handoff occurs, is "Handoff called" printed below the cell that we're running.
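For reference, that wrapping looks roughly like this; the handoff helper and RunContextWrapper come from the agents package, and the sub-agent names are again just whatever we defined earlier.

```python
from agents import Agent, handoff, RunContextWrapper

def on_handoff(ctx: RunContextWrapper[None]) -> None:
    # Callback fired at the moment the main agent hands off to a sub-agent
    print("Handoff called")

main_agent = Agent(
    name="Main Agent",
    instructions=MAIN_AGENT_INSTRUCTIONS,
    tools=[get_current_date],
    handoffs=[
        handoff(agent=web_search_agent, on_handoff=on_handoff),
        handoff(agent=internal_docs_agent, on_handoff=on_handoff),
        handoff(agent=code_execution_agent, on_handoff=on_handoff),
    ],
    model="gpt-4.1-mini",
)
```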
So let's run that. Okay. So we can see the print has occurred there and then we get this. Okay. So it's the same as before. Great. So we have that. Let's add a little more to this. So one thing with this information here is that we don't actually get much information being passed to the callback handler.
So what I'm going to do is add a little more information. What I'm going to do is say, okay, I know the handoff is happening, but why is it happening? And where is it being handed off to? Okay. So I'm defining this Pydantic base model. I'm saying, I want the sub-agent name.
We set that as a field and we say, okay, this is the name of the sub-agent that is being called. Then the reason, like, why is this sub-agent being called? And the main agent is going to have to generate this for that handoff. Okay. So we're going to be able to see exactly why things are happening, according to the main agent.
Okay. And then we're just going to print it out so we can read that. So we do that. We use on_handoff again, which is the same as before, but we've added this handoff info via the input data there. Then we also add input_type. Okay. So you can see how both of these together can be really helpful for debugging things.
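As a sketch, the Pydantic model and callback together look something like this; the field names are illustrative rather than copied from the notebook.

```python
from pydantic import BaseModel, Field
from agents import handoff, RunContextWrapper

class HandoffInfo(BaseModel):
    # Structured data the main agent must generate when it hands off
    sub_agent_name: str = Field(description="Name of the sub-agent being called")
    reason: str = Field(description="Why this sub-agent is being called")

def on_handoff(ctx: RunContextWrapper[None], input_data: HandoffInfo) -> None:
    # With input_type set, the callback also receives the structured handoff info
    print(f"Handing off to {input_data.sub_agent_name} because: {input_data.reason}")

handoff_to_internal_docs = handoff(
    agent=internal_docs_agent,
    on_handoff=on_handoff,
    input_type=HandoffInfo,
)
```

You would then pass handoff_to_internal_docs, and equivalents for the other sub-agents, into the main agent's handoffs list exactly as before.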
So I'm going to go ahead and initialize that and just run it, and let's see what happens. Okay. So we see that we are handing off to the internal docs agent to determine the date of the most recent revenue report for the user, which is, okay, what we would expect.
And then we get the correct answer again. Okay. So that is really, really helpful. Now with the handoffs, all of our chat history is being passed to the sub agents. And in some cases, we might actually not want that to happen. In many cases, we probably do want that to be the case.
Generally speaking, I think LLMs are going to perform at their best with maximal context and information, up to a certain point. Like, for example, with RAG, I don't think there's any point in just sending everything to an agent; it's better to filter that down. But when it comes to chat history and the tool calls that have been made,
I think it is generally best to keep all that information there and available to the LLMs, but maybe in some cases you actually might want to filter that stuff out. So we can do that with the handoff filters. And, at least for now, the only thing that we can filter is all the tool call messages.
Okay. So I'm going to add that. So we do that with input_filter, and we're adding handoff_filters.remove_all_tools. This is going to remove all the tool messages. Okay. Now, the only tool that can be called here is the get current date tool. And we've been using that to answer all the questions accurately.
So we're actually going to see now that our workflow won't be able to answer this question, because we're going to be filtering out that information. So that tool is going to be called by the main agent, but then the fact that that tool was called is not going to make it to the sub-agents that we hand off to.
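In code, this is just one extra argument on the handoff; handoff_filters lives in agents.extensions, and remove_all_tools strips tool call messages from the history the sub-agent receives.

```python
from agents import handoff
from agents.extensions import handoff_filters

# Hand off to the internal docs agent, but strip all tool call messages
# (e.g. the get current date result) from the conversation it receives
handoff_to_internal_docs = handoff(
    agent=internal_docs_agent,
    input_filter=handoff_filters.remove_all_tools,
)
```

This can be combined with on_handoff and input_type in the same handoff call.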
So let's run this and we'll see what happens. So we have the handoff to internal documentation to find the date of the most recent revenue report. And yeah, so now it's telling me that today is April 27th, 2025, which it is not; we're in May. So it's a little bit off.
So yeah, it's incorrect, right? And that's because we're using that filter. Now, there may be cases where you do want to filter out those things. It just depends on your use case and what you're building. So that is actually it. That's all I wanted to cover. We've, I think, really dived into what handoffs are, where we might want to use them and also where we might not want to use them, and the various tools and features that are also included for handoffs.
I think in general, handoffs are a really good concept for building multi-agent systems. And I think that's obvious from what we've seen here as well. But of course, there are cases where we might want to go with the orchestrator sub-agent pattern instead, or something else, or maybe a mix of both.
It really depends on what you're building, but it's very good to just be aware of all these different approaches that you can take when building these multi-agent systems. But yeah, that's all I wanted to cover. So I hope all this has been useful and interesting. Thank you very much for watching, and I will see you again in the next one.
Bye.