Multi-Agent Systems in OpenAI's Agents SDK | Full Tutorial

Chapters
0:00 OpenAI's Agents SDK
1:38 Python Setup
2:51 Orchestrator Subagent
5:57 Web Search Subagent
11:40 RAG Subagent
17:15 Code Execution Subagent
23:44 Orchestrator Agent
28:44 Evaluating our Multi-Agent Workflow
39:31 Pros and Cons of Orchestrators
00:00:00.000 |
Today, we're going to be taking a look at multi-agent workflows in OpenAI's Agents SDK. 00:00:05.780 |
Now, OpenAI's Agents SDK is the production version of their earlier open source package, Swarm. 00:00:14.120 |
And what Swarm was focused on doing was literally building agent swarms. 00:00:20.100 |
So you could imagine that the successor of OpenAI's agentic Swarm package 00:00:29.380 |
has fairly strong support for multi-agent systems, and that would be accurate. 00:00:35.900 |
Working with multiple agents in the Agents SDK is incredibly easy, works very well, and is generally quite flexible. 00:00:42.740 |
Now, within the SDK, there are two primary approaches that you might take to building a multi-agent system. 00:00:49.820 |
The first of those, which is what we'll be focusing on today, is the orchestrator sub-agent pattern. 00:00:56.420 |
And the other is using agent handoffs, which we will not be covering in this video, but I will talk about in another video. 00:01:04.920 |
So let's begin by taking a look at this orchestrator sub-agent pattern. 00:01:11.820 |
Now, everything we are going to cover is available in a few different places. 00:01:15.940 |
So we have this article on the Aurelio AI site. 00:01:19.180 |
This is a chapter in our upcoming Agents SDK course, and this covers both the orchestrator sub-agent pattern and also using handoffs. 00:01:31.440 |
So you can follow this, or alternatively, and I think this is where most of us will probably go, 00:01:38.540 |
we can go to the Aurelio Labs Agents SDK course, go to chapters, and 0-4 multi-agent. 00:01:45.260 |
In here, we have all the code that I'm going to work through. 00:01:54.560 |
So you would literally click this little button here. 00:01:59.780 |
Run this, and you are set up and ready to go. 00:02:03.540 |
The other way that you can run this is locally. 00:02:06.420 |
So you git clone the repo, and there are setup instructions for how you can do this in the repo readme. 00:02:14.640 |
And I'm actually going to be going and running all this locally, because I already have everything set up. 00:02:19.940 |
And it just looks nicer when everything is on my local code editor. 00:02:27.700 |
So the first thing that we're going to do before we jump into the orchestrator sub-agent is we need to set our API key. 00:02:41.840 |
You need to get this from the OpenAI platform, of course. 00:02:44.680 |
And you just paste your API key in the top here. 00:02:47.600 |
Or I think it's actually below the cell if you're running this in Colab. 00:02:50.980 |
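As a rough sketch, that setup cell is essentially just this (I'm assuming we put the key in the `OPENAI_API_KEY` environment variable, which is where the SDK looks for it by default):

```python
import os
from getpass import getpass

# Use the key from the environment if it's already set, otherwise prompt for it
os.environ["OPENAI_API_KEY"] = os.getenv("OPENAI_API_KEY") or getpass("OpenAI API key: ")
```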
So orchestrator sub-agent, what does that look like? 00:02:56.220 |
So we can see here, we have the human, then the orchestrator, and then we have these sub-agents below the orchestrator. 00:03:02.200 |
The orchestrator sub-agent pattern I'm talking about here is where we have a main agent, i.e. the orchestrator. 00:03:13.700 |
And essentially everything goes through this orchestrator. 00:03:19.300 |
It is orchestrating everything that is going on when we are interacting with this agent workflow. 00:03:25.560 |
The orchestrator is what we communicate with as a human. 00:03:36.600 |
The orchestrator is going to see that and the orchestrator will decide what to do. 00:03:41.640 |
Will it refer to a sub-agent for some additional information? 00:03:52.960 |
So, for example, let's say our question is, based on NVIDIA's latest earnings, what is their P/E ratio right now? 00:04:08.940 |
In that case, we would want to get the financial data from their latest earnings report using the web search sub-agent. 00:04:17.700 |
We would get that information and then we would use that, pass it over to our code execution sub-agent, which would perform some calculations for us to calculate that ratio. 00:04:29.180 |
And then that would be passed back to the orchestrator and the orchestrator would pass that information back to us. 00:04:34.820 |
Now, alternatively, if you say, hello, how are you, the orchestrator doesn't need to do anything. 00:04:40.200 |
It doesn't need to go to any of these sub-agents. 00:04:42.160 |
So it will say, okay, I'm just going to respond to the human directly. 00:04:46.380 |
There's no need to go down and run anything else here. 00:04:50.620 |
Now, this is what the orchestrator sub-agent pattern looks like. 00:04:55.040 |
Everything is controlled by the orchestrator. 00:04:58.640 |
The sub-agents, they are essentially used as tools for that orchestrator. 00:05:04.700 |
The sub-agents do not respond to the user directly and they only do something when the orchestrator tells them to do so. 00:05:15.960 |
Now, we're going to go ahead and actually build what we can see here. 00:05:23.340 |
The first thing we're going to focus on is building these sub-agents. 00:05:28.320 |
We have the web search sub-agent, which you might have guessed is doing a web search for us. 00:05:38.460 |
We also have the internal docs sub-agent, which is almost like a RAG agent, and I will talk a little more about that soon. 00:05:44.260 |
And then we also have the code execution agent, which I'm mostly restricting to doing calculations for us in this example. 00:05:51.920 |
So let's go ahead and start looking at how we would build those in agents SDK. 00:05:59.500 |
The web search sub-agent will take a query from the orchestrator, which in most cases is going to be some form of the user's query, and use it to search the web. 00:06:09.480 |
The agent is going to collect various sources. 00:06:12.940 |
I think by default, there's something like 10 different sources it will collect. 00:06:16.680 |
It's going to collate the information from those sources and then generate a single text response and pass that back to the orchestrator. 00:06:25.700 |
Now, I considered using OpenAI's built-in web search tool for this, but to be completely honest, it is absolutely terrible. 00:06:36.840 |
So I'm not going to use it, and instead we're going to use another web search API called LinkUp. 00:06:43.440 |
Now, LinkUp does require an account, but they give a certain amount of free credits when you sign up, and you'd basically have to run a lot of searches to use up those free credits. 00:06:56.500 |
So it's more than enough for what we're doing here. 00:06:59.160 |
So let's go ahead and just go through and create a LinkUp account. 00:07:03.700 |
Click on this link here, and you should get sent through to a sign-up page, most likely. 00:07:09.680 |
Once you have signed up, you will get to your homepage here. 00:07:22.780 |
I'm going to run this cell, and I'm just going to enter my API key here. 00:07:27.440 |
Okay, so now let's just test this quickly and see what we get. 00:07:32.240 |
I'm going to search for the latest world news. 00:07:35.320 |
So that is running, and we should get a response fairly quickly there. 00:07:40.220 |
So, yeah, we can see there's a pretty big object here, but we can just parse it out into something a bit more usable here. 00:07:51.220 |
Okay, so I'm just looking at the first three results here. 00:07:57.380 |
Okay, so you see that we have a title for the source. 00:08:02.980 |
We have a link for that source, and then you also have the content from that source as well. 00:08:09.160 |
Now, what I'm doing here is the standard search. 00:08:14.360 |
There is also a deep search, which you would use if you want like really detailed results. 00:08:20.760 |
But I just want quick results here, so I'm going with standard. 00:08:25.320 |
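For reference, that search cell looks roughly like this. This is a sketch against the linkup-sdk client; the exact result field names may differ slightly from what I show here:

```python
import os
from linkup import LinkupClient  # pip install linkup-sdk

linkup = LinkupClient(api_key=os.environ["LINKUP_API_KEY"])

# "standard" is the quick search; "deep" trades speed for more detailed results
response = linkup.search(
    query="latest world news",
    depth="standard",
    output_type="searchResults",
)

# Each result carries a title, a URL, and the content scraped from that page
for result in response.results[:3]:
    print(result.name, result.url, result.content[:200], sep="\n", end="\n\n")
```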
So we have that, and well, that's basically all our tool is going to be doing, at least the web search tool. 00:08:32.080 |
So I'm going to go ahead and create a function tool using the logic that we just put together. 00:08:38.860 |
I am going to be using the async search method here, because in general, if I'm building an AI agent, I want to make sure everything is async. 00:08:50.900 |
Especially those operations where you're waiting for an API because, okay, you're waiting for a response from some API request. 00:08:59.480 |
In that time, if you're not using async, your program is just going to be sat there doing nothing. 00:09:04.820 |
If you instead are writing this asynchronously, your program can go and do other things whilst you are waiting for that API request response. 00:09:16.480 |
As for what we do here, nothing really changes; you just have to await the call and make sure you're using async search within an async function. 00:09:16.480 |
And what we're doing here is we're going through those search results, parsing them out into the format I showed you here, 00:09:31.820 |
and just generating or building a single string from those results, which we then return. 00:09:47.800 |
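So the function tool ends up looking something like this, a sketch assuming the linkup-sdk's async search method and the Agents SDK's function_tool decorator:

```python
import os
from agents import function_tool
from linkup import LinkupClient

linkup = LinkupClient(api_key=os.environ["LINKUP_API_KEY"])

@function_tool
async def search_web(query: str) -> str:
    """Search the web for the query and return a collated string of sources."""
    # Awaiting the async client frees the event loop while the request is in flight
    response = await linkup.async_search(
        query=query,
        depth="standard",
        output_type="searchResults",
    )
    # Build a single string in the title / link / content format shown earlier
    return "\n\n".join(
        f"## {result.name}\n{result.url}\n{result.content}"
        for result in response.results
    )
```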
Now that we have that, let's go and define the agent that is going to be using this tool. 00:09:56.780 |
So we have these instructions for how the agent should behave, how it should use the web search tool. 00:10:09.020 |
The only thing that I am specifying is I'm telling it, okay, once it has the required information, which we get from the tool, 00:10:17.440 |
once we have those results, summarize those with cleanly formatted links, sourcing each bit of information that it uses when it creates the summary. 00:10:26.680 |
And I also ask it to use markdown formatting, because markdown formatting is just nicer to work with. 00:10:35.860 |
And LLMs are generally good at both reading and generating markdown. 00:10:45.020 |
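Putting that together, the agent definition is something like this (the instructions are my paraphrase of what I just described, and the model choice here is just an example):

```python
from agents import Agent

web_search_agent = Agent(
    name="Web Search Agent",
    instructions=(
        "You are a web search agent. Use the search_web tool to find the "
        "information you need, then summarize it with cleanly formatted links, "
        "sourcing each piece of information you use. Respond in markdown."
    ),
    model="gpt-4.1-mini",  # example choice; the exact model isn't critical here
    tools=[search_web],
)
```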
Now, we can talk to our agent and just confirm that this web search works. 00:10:56.100 |
So I'm going to ask it, how is the weather in Tokyo? 00:10:59.820 |
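The call itself is roughly this; the Agents SDK puts the agent's final text in result.final_output:

```python
from agents import Runner
from IPython.display import Markdown, display

# In a notebook we can await directly; in a script, wrap this in asyncio.run
result = await Runner.run(web_search_agent, "How is the weather in Tokyo?")
display(Markdown(result.final_output))  # render the markdown summary and sources
```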
So I get the current weather in Tokyo is around 18 degrees Celsius or 64 degrees Fahrenheit, if you want to be difficult, with partly cloudy skies. 00:11:13.680 |
And we also have the sources down here as well. 00:11:21.420 |
So that's why we have done this display in markdown. 00:11:27.580 |
So if I show you this, this is what it actually looks like. 00:11:40.860 |
So now let's move on to the next subagent, which is our internal docs subagent. 00:11:46.440 |
Now, this is a very common use case, especially in corporate environments, but honestly, just in many, many places. 00:11:57.060 |
So when I say internal docs subagent, what this agent is intended to do is answer questions over a set of private information, right? 00:12:08.420 |
So that could be your own personal private information, or it could be information from the company that you work for. 00:12:15.100 |
It could be all of their internal documentation, it could be your team's wiki page, or something along those lines. 00:12:20.860 |
Stuff that is not on the publicly accessible web, and therefore cannot be answered by a web search agent. 00:12:27.860 |
For these types of documentation or information, we very commonly see people using RAG, which is Retrieval Augmented Generation. 00:12:37.380 |
Now, RAG is an incredibly performant way of augmenting your LLM with external information, i.e. information that your LLM does not already know from its pre-training or fine-tuning. 00:12:55.500 |
It's generally very cost-effective, it's fast, it's a really good approach. 00:13:00.740 |
However, it does require a little bit of setup, despite being relatively simple. 00:13:05.160 |
So we're not going to go through and build an entire RAG pipeline in this example, but instead, I'm just going to create almost a dummy RAG tool, which is going to return a specific document for us. 00:13:19.080 |
So, that document discusses the revenue figures for a wildly successful AI robotics company that we have set up. 00:13:29.460 |
That company is called Skynet, and we have the revenue report here that you can read. 00:13:37.640 |
And you can see here, we've included specific bits of information that only our internal docs sub-agent will be able to give us. 00:13:47.780 |
So we're specifying that this is the Q1 revenue report for 2025, that it was released on April 2nd, 2025. 00:13:58.660 |
And we have a small little executive summary, just tells us, okay, what is Skynet doing? 00:14:08.800 |
So a little table that our LLM is going to be able to read. 00:14:11.640 |
It tells us just, okay, what products do we have? 00:14:16.580 |
The revenue from each of those, and various other little bits of information, okay? 00:14:21.000 |
And then we have some revenue insights, some forward guidance, okay? 00:14:25.900 |
So it's like a nice, really simple revenue report. 00:14:30.020 |
Now, if you are running this in Colab, you should download that document and put it in your docs folder in Colab. 00:14:39.440 |
Alternatively, you can actually download it directly from the repo. 00:14:43.360 |
And what I will do is actually make sure I share a little snippet of code to do that so that you can just pull it directly from the repo. 00:14:50.660 |
Or if you're running this locally and you clone the repo, this will work as it is. 00:14:57.360 |
This is just loading in our revenue report now. 00:15:04.500 |
It's a fake RAG tool, just making that very clear. 00:15:11.420 |
So what it's going to do is when the LLM provides a query to search with, it is basically going to ignore that query because this is a fake search function. 00:15:22.320 |
And it is going to just return that one document, that financial report that I just showed you. 00:15:29.260 |
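As a sketch, the fake RAG tool is just this (the report filename is my own placeholder):

```python
from agents import function_tool

# Load the Skynet Q1 2025 revenue report; the path here is a placeholder
with open("skynet_q1_2025_revenue_report.md") as f:
    revenue_report = f.read()

@function_tool
async def search_internal_docs(query: str) -> str:
    """Search internal company documents for information relevant to the query."""
    # A real RAG tool would embed the query and retrieve matching chunks;
    # this fake version ignores the query and always returns the one report.
    return revenue_report
```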
So now what I want to do is we have our tool, our fake tool, and we're going to define our internal docs subagent. 00:15:40.440 |
Now the internal docs subagent again has very simple instructions here, nothing complicated at all. 00:15:46.900 |
I'm just saying you have access to internal company documents. 00:15:50.780 |
When the user asks you questions about the company, you will use the provided internal docs to answer the question. 00:15:56.040 |
Ensure you answer the question accurately and use markdown formatting. 00:15:59.120 |
Similar instructions as to what we use with the other subagent. 00:16:07.260 |
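So the agent definition is something like this (again, the instructions paraphrase what I just described, and the model is just an example):

```python
from agents import Agent

internal_docs_agent = Agent(
    name="Internal Docs Agent",
    instructions=(
        "You have access to internal company documents. When the user asks "
        "questions about the company, use the provided internal docs to answer "
        "the question accurately, and respond in markdown formatting."
    ),
    model="gpt-4.1-mini",  # example choice
    tools=[search_internal_docs],
)
```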
So I'm going to ask it, what was our revenue in quarter one? 00:16:14.160 |
And you can see that's given us a breakdown of each of the various units and how much revenue they provided. 00:16:22.360 |
And this is probably where that code execution subagent would be really useful. 00:16:28.260 |
Because it can actually take all these, put them together. 00:16:31.260 |
Although at the same time, I believe these numbers are not difficult to work with. 00:16:37.120 |
So our LLM could probably put those together by itself. 00:16:40.480 |
However, LLMs doing calculations is generally a bad idea. 00:16:46.140 |
They just hallucinate quite frequently, even for some relatively simple calculations. 00:16:53.680 |
So it's always better to get your LLM to write code that it can then execute to actually perform the calculation, well, any calculation. 00:17:10.020 |
So let's move on to that code execution subagent. 00:17:15.780 |
Now, in our code execution subagent, for this example, we're focusing on relatively simple calculations. 00:17:23.220 |
That is what we want the code execution agent to do here. 00:17:28.100 |
But especially with more state-of-the-art LLMs, a code execution agent could write pretty good code for a lot of different use cases. 00:17:41.120 |
I'm sure most of us are pretty familiar by now with AI code editors. 00:17:45.320 |
So I don't think I need to explain how good LLMs can be at writing code, especially for specific tasks. 00:17:54.420 |
Which is, of course, what we'd be using the LLM for here. 00:17:57.680 |
But I do like to be careful with what I'm giving an LLM when it comes to code execution, especially within a chat interface. 00:18:09.840 |
So with that in mind, this is the tool that our code execution agent will be using. 00:18:16.280 |
You see that I've also explained this to the LLM; if you provide a docstring to your function tool, that will tell the LLM what this tool should be used for and also how to use the tool. 00:18:31.020 |
And I've specifically told it here that the output must be assigned to a variable called result. 00:18:37.380 |
You could change this to output or something else. 00:18:41.840 |
But I believe that, at least for OpenAI models, they've actually been trained to assign code execution results to a variable called result. 00:18:56.280 |
You can change this to output, but what I saw when doing this is that this subagent would very frequently run this tool and it would write it as if it is writing to a variable called output, even though I prompted it not to. 00:19:11.840 |
And then see that it fails and then do it again and then get it right. 00:19:16.840 |
But why retry if we can just get it right the first time? 00:19:23.660 |
So in here, I'm just printing out the code that the LLM is going to execute, so we can see it. 00:19:32.140 |
We can remove it if we wanted to, but it's just for us to understand what is going on. 00:19:35.720 |
Then, because code execution can either work or fail, 00:19:42.880 |
I am putting this code execution within a try/except block. 00:19:48.320 |
And then within this try/except block, I'm setting the global variables for the execution scope that we're running here. 00:19:59.580 |
So what this is doing is it's basically ensuring we're not running our code with any variables that are coming from this environment or some other environment, depending on where we're running this. 00:20:14.180 |
So we're making sure there are just no variables that already exist within this execution context. 00:20:22.880 |
Then what is going to happen is this code is going to run inside this empty namespace. 00:20:30.480 |
And then the empty namespace will gather all the variables that have been created within that code execution. 00:20:35.480 |
So we can actually get the result by accessing it like this. 00:20:48.620 |
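Putting all of that together, the tool looks roughly like this (a sketch, not the exact notebook code):

```python
from agents import function_tool

@function_tool
def execute_code(code: str) -> str:
    """Execute the provided Python code and return the result.

    The final output of the code MUST be assigned to a variable named `result`.
    """
    print(f"Executing code:\n{code}")  # so we can see what the LLM wrote
    try:
        # An empty dict as the globals means no variables from our own
        # environment leak into the execution scope; exec then fills it with
        # whatever the generated code defines.
        namespace: dict = {}
        exec(code, namespace)
        return str(namespace["result"])
    except Exception as e:
        return f"Code execution failed: {e}"
```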
You will also notice here that this is just a normal synchronous function. 00:20:55.860 |
The reason for that is that there's nothing in here that requires network calls, nothing where our code is likely to be waiting. 00:21:08.520 |
With an API request, you are sending that request and then waiting on the response. 00:21:11.960 |
In this case, everything is being run right here. 00:21:17.000 |
So unless something within this code here causes a network request, there's not really any need, in my opinion, to make this async. 00:21:37.240 |
You can see here that I am using GPT-4.1 rather than 4.1 Mini. 00:21:44.220 |
To be honest, it doesn't really matter for this example. 00:21:47.000 |
But I just want to show you that, one, you can use various LLMs in different parts of your multi-agent setup, of course. 00:21:56.060 |
That's one of the benefits of multi-agent setups. 00:21:59.660 |
And two, I just want to be extra careful because we're executing code. 00:22:05.340 |
So I want, ideally, the best LLM that is reasonably priced and reasonably fast for code execution. 00:22:14.720 |
So that's why I've gone with 4.1 rather than Mini here. 00:22:27.300 |
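So the agent itself is just this (instructions are my paraphrase):

```python
from agents import Agent

code_execution_agent = Agent(
    name="Code Execution Agent",
    instructions=(
        "You are a code execution agent. Write Python code to perform the "
        "calculations needed to answer the user's question and run it with "
        "the execute_code tool, assigning the final output to a variable "
        "named `result`."
    ),
    model="gpt-4.1",  # the stronger model, since we're executing code
    tools=[execute_code],
)
```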
So we can run that and then we can test our sub-agent. 00:22:31.380 |
And I'm just going to ask it a nonsense question. 00:22:33.760 |
But it's a nonsense question that you can apply math to. 00:22:42.720 |
So it's telling me this is what we're printing out from the function. 00:22:53.440 |
Four apples, multiplied by 71.1 bananas. 00:23:02.660 |
It's just multiplying those together, and it stores that result here. 00:23:07.700 |
And then we are, of course, extracting that result out here. 00:23:13.380 |
That information or that result gets returned to our LLM. 00:23:17.360 |
And you can see that this is what we get from that. 00:23:22.620 |
And because we're using an LLM, it is telling us that we're not being very sensible here. 00:23:30.180 |
And it says, okay, the result is a mathematical product, but in real life, you can't multiply apples by bananas. 00:23:37.040 |
So nice little bit of telling us we're not being logical there as well. 00:23:43.580 |
So now we have our three subagents and we can move on to defining our orchestrator. 00:23:48.860 |
So as I mentioned a little bit earlier, the orchestrator is what is going to be controlling 00:23:53.980 |
the inputs and outputs throughout our entire workflow. 00:23:57.800 |
And the way that we can think of our subagents in this system is actually as tools. 00:24:04.040 |
And in fact, the way that we implement our orchestrator connected to all these subagents 00:24:11.560 |
is by turning those subagents into tools and then passing them into the tools parameter of 00:24:23.720 |
our orchestrator agent. I'm defining another tool here, by the way, and we'll see later why, but I 00:24:30.980 |
also want to just show you that we can use regular tools as well as agents as tools here. 00:24:39.240 |
It is just an agent in the same way that we earlier defined our subagents. 00:24:44.700 |
The main difference is one, its name is orchestrator. 00:24:51.680 |
In this case, I'm thinking, okay, this is the reasoning engine, so I want 00:24:58.440 |
this to be a good LLM that is powering the orchestrator. 00:25:01.720 |
To be completely honest, you probably don't need it to be 4.1. 00:25:07.780 |
But that is really up to you and your use case and what you need from it. 00:25:18.480 |
These are the subagents that we have defined. 00:25:20.480 |
And we use this as tool method to turn them into tools. 00:25:24.400 |
Now, when we use this as tool method, we also need to provide a tool name. 00:25:32.020 |
So this doesn't need to be anything complicated; it's just, okay, this is our tool name. 00:25:38.800 |
I'm using the function name, and it's worth noting that you cannot include whitespace. 00:25:47.880 |
So if you want spaces, you need underscores instead. 00:25:53.440 |
Then, yeah, you're just giving it a tool description. 00:25:57.160 |
So this is telling your orchestrator when should it use this agent as a tool. 00:26:01.780 |
So we have those tools and then we also have an actual tool. 00:26:06.520 |
Get current date is literally just a tool to get the current date. 00:26:12.160 |
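In code, that looks something like this; the tool names and descriptions here are illustrative, and I'm assuming the agent variable names from earlier:

```python
from datetime import datetime
from agents import function_tool

@function_tool
def get_current_date() -> str:
    """Get the current date."""
    return datetime.now().strftime("%Y-%m-%d")

# Sub-agents become tools via as_tool; tool names must not contain whitespace
web_search_tool = web_search_agent.as_tool(
    tool_name="web_search_agent",
    tool_description="Search the web for up-to-date public information.",
)
internal_docs_tool = internal_docs_agent.as_tool(
    tool_name="internal_docs_agent",
    tool_description="Answer questions using internal company documents.",
)
code_execution_tool = code_execution_agent.as_tool(
    tool_name="code_execution_agent",
    tool_description="Write and execute Python code for calculations.",
)
```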
We also have the orchestrator prompt that is just above. 00:26:17.080 |
So what I'm trying to do in this orchestrator prompt is give the orchestrator LLM context as 00:26:27.140 |
to where it is. It needs to know what type of system it is in. 00:26:32.380 |
And the reason we do that is if it knows what sort of system it is in, it will better understand 00:26:43.040 |
what it is supposed to do and, you know, what all these things that are around it are. 00:26:47.380 |
So we're just giving it context so that it can operate better. 00:26:50.400 |
We also tell it, okay, you're in the system and this is how you should operate. 00:26:56.900 |
So what we're saying is: you take the user's queries and pass them to the appropriate agent tools. 00:27:01.920 |
The agent tools will see the input you provide and use it to get all the information that you need. 00:27:08.580 |
We also want to say, like in my earlier example, where we were using the web search tool followed 00:27:17.200 |
by the calculator tool, we also want to explicitly tell the orchestrator that it can call multiple 00:27:25.920 |
agents, okay, to get all the information it needs. 00:27:28.980 |
Then at the very end here, one thing that we just want to be very clear about with the LLM 00:27:36.420 |
is that it shouldn't be drawing attention to the fact that this is a multi-agent system. 00:27:41.080 |
In some cases, maybe you would want that, but in this case, I want to build a conversational 00:27:48.300 |
interface. I want users to come in and talk, and all they really see is, okay, there's some chat interface 00:27:54.720 |
here and I'm just talking and I have no idea what is really behind the scenes. 00:27:59.040 |
Maybe there's some information like, oh, I've got this from the web, you know, there's a source, 00:28:03.960 |
or I got this from the internal documents that I have access to. 00:28:08.040 |
Maybe we want a little bit of that, but I don't really want the user to be told by our 00:28:14.320 |
orchestrator, hey, I just need to go and use the internal docs sub-agent because that's where I find this information. 00:28:24.220 |
I don't really want it to go into that much depth as to what it's doing. 00:28:30.780 |
So that is why we use this last sentence of do not mention or draw attention to the fact that this 00:28:40.080 |
is a multi-agent system in your conversation with the user. 00:28:47.580 |
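So, as a sketch, the orchestrator ends up looking something like this (the prompt is a paraphrase of what I just described):

```python
from agents import Agent

ORCHESTRATOR_PROMPT = (
    "You are the orchestrator of a multi-agent system. Take the user's queries "
    "and pass them to the appropriate agent tools. The agent tools will see the "
    "input you provide and use it to get all the information you need; you can "
    "call multiple agents to do this. Do not mention or draw attention to the "
    "fact that this is a multi-agent system in your conversation with the user."
)

orchestrator = Agent(
    name="Orchestrator",
    instructions=ORCHESTRATOR_PROMPT,
    model="gpt-4.1",  # a stronger model, since this is our reasoning engine
    tools=[web_search_tool, internal_docs_tool, code_execution_tool, get_current_date],
)
```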
And now if I run both, we can go ahead and just test our agent. 00:28:55.300 |
So I'm going to say first, how long ago from today was it when we got our last revenue report? 00:29:03.120 |
So there are a couple of things that need to happen here. 00:29:07.380 |
So the orchestrator is going to need to find out, okay, what is the current day, which it 00:29:18.780 |
can do with the get current date tool. Then once it has the current date, the orchestrator needs to find out when the last revenue report was released. 00:29:27.700 |
Now, if it is prompted well, the agent should understand, okay, although we didn't 00:29:37.380 |
explicitly say so, that this is about the company we belong to. 00:29:42.000 |
If we're talking to this internal company agent that has access to these internal company 00:29:48.580 |
documents, probably the user is asking about that specific company and not just when were 00:29:57.080 |
the last revenue reports in the entire world released. 00:30:00.220 |
So hopefully we should see that it doesn't use the web search tool to find whenever the 00:30:05.160 |
last revenue report in the entire world was released. 00:30:07.740 |
But instead, it should go into that internal docs subagent and get the information from there. 00:30:15.140 |
So let's run that and we'll see what happens. 00:30:17.480 |
Now, one thing that is kind of hard to see here. 00:30:27.660 |
And it is showing that, okay, today is May 7, 2025. 00:30:34.360 |
It is saying the last revenue report was for the quarter ending May 31st, 2025. 00:30:43.480 |
It doesn't pick up on the April 2nd, but that might just be due to my question not being specific 00:30:50.940 |
enough about when the last revenue report was released versus when the end date for that last quarter was. 00:31:02.420 |
We can see it's using the correct tools and the correct information from various places. 00:31:06.840 |
But I don't actually know that that is the case. 00:31:15.660 |
I just know this information is coming from somewhere. 00:31:20.240 |
And this is particularly important when we have more complex agents, where there's information coming from many different places. 00:31:26.340 |
And we as developers might not necessarily know what all of the correct information is. 00:31:32.180 |
So what we can do in this scenario to have more insight into what has just happened, we 00:31:39.180 |
can go to the traces dashboard in the OpenAI platform. 00:31:42.780 |
So to do that, I'm going to go to platform.openai.com. 00:31:50.240 |
Then I need to make sure I'm in the correct project. 00:31:52.740 |
Okay, so I am in the correct project. 00:31:58.120 |
And you will need to go to dashboard on the right here. 00:32:10.480 |
If you're in a company and you're accessing the company's traces, there's a fairly good 00:32:15.060 |
chance that maybe you can't see anything here. 00:32:19.520 |
The reason for that is that the company administrator or owner needs to go in here and 00:32:26.860 |
set the permissions for you to actually see their traces or logs dashboard, which they can 00:32:34.900 |
do by going over to the settings, organization settings, data controls, and making sure that 00:32:40.800 |
the logs here, which include the traces, are visible, either for selected projects, for everyone, 00:32:48.000 |
or whatever setting works and makes this visible for you. 00:32:51.640 |
So once you can see everything in your dashboard or traces dashboard specifically, you can go 00:32:59.280 |
to your most recent trace, which should ideally be the one that you just ran. 00:33:04.240 |
And we can see, okay, we have agent workflow. 00:33:07.480 |
That's good because we didn't build handoffs into our workflow. 00:33:10.300 |
We can see the number of tools that we use, which is three. 00:33:15.860 |
And we can see the execution time was 13.73 seconds, which is long, but that is, we did 00:33:22.600 |
use an extra tool or one more tool than we needed here. 00:33:25.140 |
So let's go into this and see why that happened or at least have an idea of why that happened. 00:33:33.380 |
So we can see in here, we went in, so we had the orchestrator, started here. 00:33:39.720 |
Then we went to this web search agent, and this took the majority of the time. 00:33:44.360 |
It's like nine seconds, which is pretty long. 00:33:48.480 |
And if we look at that, we can see that we had a POST to, this is OpenAI's v1/responses, the Responses API. 00:34:02.040 |
And we can see that the input here, this is coming from the orchestrator, not actually the user. 00:34:08.800 |
So the orchestrator is providing that query, date of last revenue report, and the LLM, based on this message, has gone and decided, okay, we need to use the search web tool. 00:34:08.800 |
And we're going to provide it with this query, which is last revenue report date, okay? 00:34:25.140 |
And this is, okay, we can see straight away, this has gone to the search web tool. 00:34:30.180 |
So this is like, okay, we need to prompt a little better here in order to make it clearer to this agent, or to our orchestrator agent, that this is an agent that might be used by a particular company. 00:34:42.760 |
And usually company questions or revenue questions and so on would be about the internal docs rather than the web, okay? 00:34:51.700 |
And that will also explain why it told us that the revenue date was the 31st of May, okay? 00:34:58.360 |
So, okay, we can see that was sent to the search web tool. 00:35:05.340 |
And based on that, it's saying, okay, this is the current date. 00:35:08.580 |
I know quarter one ends on the 31st of March. 00:35:12.440 |
So it calculates how long ago that was after looking at the get current date tool here, okay? 00:35:22.940 |
And this is outside the web search agent, sorry. 00:35:25.260 |
So web search agent returned to the orchestrator. 00:35:31.420 |
This output here, the date of the last revenue report, is coming from the web search agent. 00:35:36.140 |
Then the orchestrator decided, okay, I need to use the get current date tool. 00:35:41.780 |
It got that, got the output, and then it generated our final response. 00:35:48.240 |
So we need to prompt it a little better, more likely than not, at least. 00:35:52.220 |
So what we could do here is say, okay, you're the orchestrator of a multi-agent system. 00:36:00.480 |
Note that you are an assistant for the Skynet company. 00:36:11.760 |
If the user asks about company information or finances, 00:36:24.500 |
you should use our internal information rather than public information. 00:36:38.760 |
And this should be enough to guide our orchestrator in a bit of a better direction. 00:36:44.120 |
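As a sketch, the updated orchestrator would be redefined with the new sentences appended to the prompt:

```python
from agents import Agent

orchestrator = Agent(
    name="Orchestrator",
    instructions=ORCHESTRATOR_PROMPT + (
        "\n\nNote that you are an assistant for the Skynet company. If the user "
        "asks about company information or finances, you should use our internal "
        "information rather than public information."
    ),
    model="gpt-4.1",
    tools=[web_search_tool, internal_docs_tool, code_execution_tool, get_current_date],
)
```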
So let's just try again and see what happens. 00:36:46.060 |
Okay, and now we can see it is actually getting that right. 00:36:58.520 |
This is our latest run, 8.5 seconds, much faster. 00:37:03.020 |
And we can see that it went to the internal docs agent. 00:37:11.340 |
And then this is the call from the LLM back up to the orchestrator. 00:37:26.060 |
So this is what the internal docs agent or subagent provided back to the orchestrator. 00:37:34.040 |
The final response, which would be from here, is actually this. 00:37:40.240 |
Okay, exactly the same as what we have in the notebook. 00:37:47.320 |
I'm going to say what is our current revenue and what percentage of revenue comes from the T1000 units. 00:37:57.680 |
We'll see what tools or what agents, subagents, sorry, it decides to use this time. 00:38:13.540 |
Let's switch across to our traces and see what happened. 00:38:20.880 |
You see that it actually tried to use this internal docs agent twice. 00:38:20.880 |
Probably it's trying to find some information that is not within like the dummy tool. 00:38:28.480 |
So it's trying again, and then it probably realizes, oh, okay, this is useless. 00:38:28.480 |
Then it decided it wants to find the percentage of revenue. 00:38:47.800 |
So rather than calculating it itself, it's actually trying to search through our internal docs. 00:38:47.800 |
Which obviously slowed it down a little bit there. 00:38:59.800 |
And then in the end, it did not try to use the calculator. 00:38:59.800 |
It came up with this for the response, which I believe is accurate anyway. 00:39:13.800 |
It could have used the code execution sub-agent. 00:39:25.280 |
And again, this is something where we probably want to prompt it a little better. 00:39:31.020 |
So that is actually all I wanted to go through on the orchestrator sub-agent multi-agent workflow. 00:39:41.240 |
As we've seen, there is a lot you can do with this, of course. 00:39:45.400 |
What I just showed you is a relatively simple pattern. 00:39:52.380 |
There is definitely an argument to be made: do we need sub-agents for these simple tasks? 00:39:59.380 |
In this case, probably, potentially not, depending on what you're looking to do. 00:40:06.160 |
I would say maybe for code execution, you should use a separate sub-agent. 00:40:12.240 |
If you actually have an internal docs use case, it might benefit you to have that sub-agent 00:40:19.720 |
because then you can prompt that sub-agent with additional context and information about 00:40:25.520 |
how to use that internal docs tool, which can be really useful in just getting better results. 00:40:33.080 |
And the same is also true for the web search sub-agent. 00:40:36.080 |
You can prompt it and give it more information about how to get the best results from your web search tool. 00:40:42.160 |
So it really depends on what you're looking to do, how important latency is. 00:40:48.480 |
One thing to be aware of with the orchestrator sub-agent pattern is that everything is going through the orchestrator. 00:40:59.320 |
So in the scenario that you only need a single sub-agent to be used, let's say for a web search, 00:41:06.480 |
the orchestrator sub-agent pattern is not ideal because your user query goes from your user to the orchestrator. 00:41:15.180 |
The orchestrator then needs to decide to use the web search sub-agent, which is then another LLM call. 00:41:23.660 |
And then that other LLM is going to create the web search tool call, get that response. 00:41:30.780 |
That LLM is going to generate another response, send it to your orchestrator, and then the orchestrator 00:41:36.700 |
is going to generate yet another response and send that back to the person. 00:41:44.980 |
We just had four different LLM generation steps. 00:41:49.160 |
Whereas if it was the orchestrator, let's call it the main agent in this scenario, going directly 00:41:55.740 |
to a web search tool, it would be: the orchestrator generates the tool call that goes to your web search tool. 00:42:05.660 |
The web search tool returns its response to the orchestrator. 00:42:08.680 |
The orchestrator generates a response based on that information and sends it back to the user. That's just two LLM generation steps. 00:42:15.840 |
Naturally, given that LLM calls tend to make up the bulk of our waiting time or our latency, 00:42:25.860 |
the orchestrator sub-agent pattern, in these scenarios where we're just expecting a single tool call, is going to be slower. 00:42:35.320 |
However, in this scenario where you do need these sub-agents because they just handle particular 00:42:42.000 |
tasks better than you can do with a generic all-purpose agent, in that scenario, and for queries 00:42:49.480 |
that require more than a single sub-agent to be used, the orchestrator sub-agent pattern works very well. 00:42:57.500 |
The other option that you might consider is where you have an orchestrator with sub-agents, 00:43:04.320 |
but then sub-agents can respond directly to the user. 00:43:07.880 |
In that scenario, again, it becomes more difficult to use multiple sub-agents. 00:43:14.540 |
You could have the sub-agent look at the information that has been provided and decide whether to 00:43:19.320 |
respond directly to the user or the orchestrator. 00:43:21.540 |
That is completely possible, but you need to prompt it well and make sure all of that is going to work. 00:43:27.900 |
So, this pattern can be good, but you do need to be careful with the latency, and it is generally 00:43:33.880 |
better for those cases where latency is not super, super important. 00:43:39.720 |
Although, that being said, you can still make it conversational. 00:43:43.160 |
You just need to be smart about how many tokens you're using, what tools you're using, and which LLMs you're using where. 00:43:55.480 |
So, that is all I wanted to cover in this video. 00:44:01.920 |
I hope all this has been useful and interesting, but for now, I'll leave it there.