Agents SDK from OpenAI! | Full Tutorial

Chapters
0:00 OpenAI Agents SDK
1:05 Agents SDK Code
2:38 Agent and Runner
6:56 Function Tools
12:13 Agents SDK Guardrails
18:29 Conversational Agents
21:02 Thoughts on Agents SDK
00:00:00.000 |
Today, we're going to be taking a look at OpenAI's brand new Agents SDK. 00:00:04.000 |
Now, Agents SDK is OpenAI's version of a GenAI slash Agents framework, 00:00:11.100 |
similar to LangChain, Pydantic AI, and so on. 00:00:15.680 |
Now, let's start by jumping straight into their docs, 00:00:21.940 |
which we'll be covering in an actual code example. 00:00:26.520 |
So they mention here that it is a production-ready upgrade of their previous experimentation for agents, 00:00:34.480 |
And they've added a few things I think are quite useful. 00:00:41.320 |
You're also able to pass off from one agent to another. 00:00:46.400 |
And it's generally a well-built library. 00:00:49.940 |
It still has some limitations that I think most agent frameworks are falling into at the moment, 00:00:56.240 |
which is a very strict definition of what an agent actually is. 00:01:00.100 |
But for the most part, I actually do think this is a good framework. 00:01:11.700 |
There is a link to this in the video description. 00:01:18.760 |
So you can go ahead and open that and follow along with me. 00:01:22.260 |
So here I've just outlined the main features; this is actually coming from their docs, which is over here. 00:01:31.720 |
Agent loop, Python first, handoffs, guardrails, function tools and tracing. 00:01:35.780 |
Well, I'm covering all of these except for handoffs and tracing. 00:01:39.840 |
I'll leave those for later, but yeah, let's jump into the rest. 00:01:44.000 |
So first we are just going to install the library and it will also, of course, need an OpenAI API key. 00:01:54.640 |
So technically I think there shouldn't be any reason why we can't use this framework with other LLMs. 00:02:02.600 |
Although I'm sure they have made that more difficult than it needs to be. 00:02:12.920 |
You, of course, will need an account if you don't already, although I'm sure most of you do. 00:02:17.640 |
And we'll need to go to API keys and just create a new secret key. 00:02:30.400 |
Okay, I'm going to copy that and come over to here and just paste it in here. 00:02:34.760 |
So now we're all sorted and it knows what our API key is. 00:02:37.780 |
Now let's just take a look at the essentials. 00:02:54.460 |
The instructions are "You are a helpful assistant", and I'm using GPT-4o mini. 00:02:58.180 |
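As a rough sketch, that agent definition looks something like this (assuming the `openai-agents` package, with names mirroring the walkthrough):

```python
from agents import Agent

# A minimal agent: a name, system-prompt-style instructions, and a model
agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant",
    model="gpt-4o-mini",
)
```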
Now running our agent, there are a few methods for doing this. 00:03:05.240 |
So we have Runner.run, which we'll be using a fair bit, which is just running our agent in async, but without streaming. 00:03:16.220 |
Then there is Runner.run_sync, if you need to run your agent synchronously rather than asynchronously. 00:03:23.620 |
And then there is Runner.run_streamed, which is going to run in async and also stream the response back to us. 00:03:31.520 |
We will not be using run_sync, but we will be using run_streamed and run. 00:03:39.260 |
There are not many scenarios where I would ever recommend anyone to run AI applications synchronously. 00:03:52.340 |
But anyway, let's try our async run method and say, tell me a short story. 00:03:59.260 |
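A minimal sketch of that call (notebook-style top-level `await`, as in the video; in a script you'd wrap it with `asyncio.run`):

```python
from agents import Runner

# Async, non-streaming: nothing is shown until the full response is ready
result = await Runner.run(agent, "Tell me a short story")
print(result.final_output)
```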
We're not streaming, so we don't actually see anything until the whole response has been generated. 00:04:10.280 |
Now, in most production scenarios, I think you're going to be using method three, which is the run asynchronously with streaming. 00:04:19.520 |
And the reason I say that is because in the outward-facing user application of whatever it is you're building, 00:04:27.880 |
you are probably going to want to, one, use async because async is essentially just not blocking your API. 00:04:35.960 |
If you're implementing this behind an API, it makes your code more scalable, efficient, and so on. 00:04:42.100 |
And two, you are probably going to use streaming, at least if this LLM call is user-facing in any way. 00:04:51.940 |
And the reason I say that is, well, we just ran this and we had to wait quite a while for this to actually show us anything, 00:04:57.780 |
which is going to be a bad user experience in most cases. 00:05:01.740 |
So, what we want to do is just start streaming tokens as soon as we get them, so the user sees that something is happening. 00:05:08.320 |
That also allows us, as I will show you in a moment, to stream tool use updates, which I think are incredibly useful. 00:05:23.080 |
And we're just going to print every single event that is returned to us. 00:05:27.920 |
Now, this is going to be a lot of information. 00:05:30.560 |
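A sketch of that first, noisy pass, dumping every event unfiltered as it arrives (notebook-style again):

```python
# Stream the run and print every single event to see what comes back
result = Runner.run_streamed(agent, "Tell me a short story")
async for event in result.stream_events():
    print(event)
```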
Okay, so we can see there's a lot of stuff there. 00:05:34.260 |
Basically, there is an event for every type of update; for example, there's an event telling us we are using this new agent, the current agent. 00:05:47.060 |
So, these are the tokens that are being generated by your LLM or updates of, okay, I'm going to use this tool or this tool, so on. 00:05:54.540 |
And then we also have this final one here, this run item stream event, which is telling us, okay, the LLM is finished, 00:06:02.260 |
or the agent, LLM, whatever, has finished generating its message output, okay? 00:06:08.860 |
And if we look at these objects, there is quite a lot of information in there. 00:06:15.000 |
So, we need to parse that out and make it a little bit easier to understand, which fortunately we can do quite easily. 00:06:22.900 |
So, first, I'm just going to show you how we can get the raw tokens, which is, we look for the event type, 00:06:31.000 |
and we say if it's a raw response event, that is the LLM-generated tokens streamed back to us, okay? 00:06:37.380 |
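Filtering down to just the token deltas looks roughly like this (the `ResponseTextDeltaEvent` check follows the SDK's streaming docs):

```python
from openai.types.responses import ResponseTextDeltaEvent

result = Runner.run_streamed(agent, "Tell me a short story")
async for event in result.stream_events():
    # Raw response events carry the token-by-token output from the LLM
    if event.type == "raw_response_event" and isinstance(
        event.data, ResponseTextDeltaEvent
    ):
        print(event.data.delta, end="", flush=True)
```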
And you can see we get, straight away, it's streaming, that's pretty, that's pretty nice, okay? 00:06:42.800 |
But this is only going to work for a direct LLM agent output. 00:06:47.060 |
As soon as we start introducing tools, things get a little more complicated. 00:06:50.800 |
So, how do we, well, let's see, I'll show you how we do that. 00:06:56.080 |
Now, you can see that OpenAI have called their tool calling a function tool. 00:07:03.660 |
So, now OpenAI started with function calling when they first introduced the concept into their APIs. 00:07:12.280 |
Then they decided it's not called function calling, it's instead called tool calling. 00:07:16.740 |
And now it seems they have decided they don't know which one they like the most, so it's now the function tool. 00:07:25.860 |
So, thank you, OpenAI for the conciseness and clarity there. 00:07:31.140 |
The way that we use or define tools, I'm just going to call them tools, is how we would in most other AI frameworks, to be honest. 00:07:44.200 |
So, I'm defining a simple tool here, it's a multiply tool, which can take a value of float x, float y, and multiply them together. 00:07:53.840 |
I have a docstring here; this is natural language describing to the LLM or agent what this tool does. 00:08:01.180 |
And you can also put instructions on how to use tools in these doc strings as well, if needed. 00:08:07.260 |
And you can see that we're being very precise in our type annotations here, describing what everything is. 00:08:12.800 |
Essentially providing as much information to our agent as possible. 00:08:16.560 |
Then, we decorate that function with the function_tool decorator from the Agents SDK. 00:08:32.660 |
We simply pass the tool within the list to the tools parameter during our agent definition. 00:08:39.540 |
I also added a little more to the system prompt slash instructions here. 00:08:43.840 |
I just added, do not rely on your own knowledge too much and instead use your tools to help you answer queries. 00:08:49.320 |
So, I basically don't want the LLM slash agent trying to do math by itself. 00:09:02.280 |
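Putting those pieces together, a sketch of the tool and agent definition (the docstring and instructions mirror what's described above):

```python
from agents import Agent, function_tool

@function_tool
def multiply(x: float, y: float) -> float:
    """Multiply x and y together.

    Args:
        x: The first number to multiply.
        y: The second number to multiply.
    """
    return x * y

agent = Agent(
    name="Assistant",
    instructions=(
        "You are a helpful assistant. Do not rely on your own knowledge "
        "too much and instead use your tools to help you answer queries."
    ),
    model="gpt-4o-mini",
    tools=[multiply],
)
```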
So, we execute this in the exact same way as before. 00:09:06.160 |
But I'm just going to ask it to multiply these two numbers together. 00:09:10.360 |
And of course, we would expect our agent to use the multiply tool in this scenario. 00:09:21.680 |
And I'm going to print out all events because now we have a ton of different events coming through. 00:09:26.200 |
So, you can see that we have the raw responses stream event. 00:09:29.680 |
That is covering the LLM generating tokens for our tool calls and also our final response, which is what we see down here. 00:09:37.820 |
We also have these run item stream events, which is, okay, the tool was called. 00:09:41.600 |
And then also here, the tool has been executed and we have the output from the actual function itself. 00:09:47.300 |
And then down here, we have that, okay, I'm done event. 00:09:50.360 |
So, we need to parse all of this in a way that makes it a little easier to understand what is happening. 00:09:59.140 |
Now, that doesn't need to be super complicated, but I've added a lot of stuff in here just so you can see how we can extract different parts of these events. 00:10:11.520 |
So, this segment here, this is all raw response events. 00:10:16.760 |
So, these are the tokens as they are being streamed by our LLM. 00:10:21.000 |
Okay, now, this will output everything in some format for us. 00:10:26.720 |
However, what the Agents SDK also does for us is it provides these other events. 00:10:32.400 |
So, this event here tells us which agent we are currently using because you can use multiple agents in a sequence. 00:10:39.720 |
So, you might see this event pop up if you have one of those multi-agent workflows or if you're just running your first agent, which, of course, we are doing here. 00:10:55.600 |
Then, unlike up here, where we are outputting the tokens from our LLM as they arrive, it waits for our entire tool call to be complete before outputting this event. 00:11:05.140 |
And in this event, it just includes all of that information in one single place, which is easier for us to parse. 00:11:11.580 |
And also, within this segment, we will get our tool output. 00:11:16.660 |
So, this is where we execute our tool function, or function tool, however they've called it. 00:11:25.240 |
So, the X multiplied by Y, we get the answer from that. 00:11:29.520 |
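A sketch of that cleaner parsing loop; the event and item type names follow the SDK's streaming docs, but treat the details as an approximation of the notebook's version:

```python
from openai.types.responses import ResponseTextDeltaEvent

result = Runner.run_streamed(agent, "Multiply 7.814 by 103.892")
async for event in result.stream_events():
    if event.type == "raw_response_event":
        # Token deltas as they stream, for both tool args and the final answer
        if isinstance(event.data, ResponseTextDeltaEvent):
            print(event.data.delta, end="", flush=True)
    elif event.type == "agent_updated_stream_event":
        # Fires when a (new) agent takes over the run
        print(f"\n> Current agent: {event.new_agent.name}")
    elif event.type == "run_item_stream_event":
        if event.item.type == "tool_call_item":
            # The completed tool call: name and args in one place
            raw = event.item.raw_item
            print(f"\n> Tool called: {raw.name}({raw.arguments})")
        elif event.item.type == "tool_call_output_item":
            # The value returned by executing our function
            print(f"\n> Tool output: {event.item.output}")
```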
So, if we run this, we're going to see a much cleaner output. 00:11:33.340 |
Now, we're using GPT-4o mini, so that is actually super fast. 00:11:41.800 |
But you can see here that we have, first, the current agent. 00:11:47.380 |
Then it streams all of our tool parameter tokens. 00:11:51.860 |
Then, after that has completed, our tool call is complete. 00:11:55.380 |
So, the Agents SDK outputs what tool is being called, the name and the args. 00:12:00.440 |
And then it executes the tool and provides us the output from there. 00:12:04.600 |
Then, finally, we stream token by token the final output. 00:12:15.560 |
Guardrails are interesting and they're relatively open, which I like because I would implement guardrails in a slightly different way. 00:12:24.240 |
Or I would like multiple ways to implement guardrails. 00:12:27.960 |
So, if you're not already using guardrails, I would recommend using them more. 00:12:34.740 |
So, we are first, in this example, just going to implement a guardrail powered by an LLM. 00:12:42.640 |
Now, it's also worth noting that the guardrails here, there are two types. 00:12:46.620 |
There is an input guardrail, which I'll show you how to implement here. 00:12:51.120 |
And there is also an output guardrail, which is essentially exactly the same just on the other side. 00:12:56.820 |
So, the input guardrail is checking the input going to your LLM. 00:13:00.820 |
And the output guardrail is checking the output from your LLM that is going to your user. 00:13:07.440 |
So, you can guardrail both sides of the conversation, which again is pretty standard practice and I think it's important. 00:13:15.680 |
So, we are going to implement a guardrail powered by another LLM. 00:13:20.320 |
So, that means we'll just be giving OpenAI all of our money. 00:13:31.740 |
This agent's one and only job is to check if we are hitting a guardrail. 00:13:41.920 |
Okay, and specifically, this agent is checking if the user is asking the agent about its political opinions, which we don't want it to do. 00:13:58.500 |
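Sketched out, that guardrail agent looks something like this, using a Pydantic model as the structured output (field names here follow the walkthrough):

```python
from pydantic import BaseModel

from agents import Agent

class GuardrailOutput(BaseModel):
    # True if the user asked the agent for its political opinions
    is_triggered: bool
    # Why the guardrail was or wasn't triggered (useful during development)
    reasoning: str

politics_agent = Agent(
    name="Politics check",
    instructions="Check if the user is asking you about your political opinions.",
    output_type=GuardrailOutput,
)
```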
And this guardrail output class is being passed as the output_type of our agent. 00:14:02.340 |
And what this is going to do is it's going to force the agent to provide us with a structured output. 00:14:08.900 |
So, the agent is going to generate its answer within this format that we've defined here. 00:14:18.120 |
So, it's going to provide us with an is_triggered field, which is a Boolean value. 00:14:22.200 |
So, it's going to be true if the guardrail has been triggered or false if it has not been triggered. 00:14:27.660 |
Then, we're also going to allow it to explain to us why it thinks the guardrail has been triggered or not. 00:14:33.760 |
I think this can be useful during development. 00:14:38.100 |
You would probably want to turn it off in any production setting because you're just spending more and more tokens. 00:14:43.840 |
So, yeah, it's useful, mainly for understanding the reasoning. 00:14:56.520 |
I don't think there's anything else to say there. 00:14:57.940 |
We move on, and what we can do first is just see, okay, does this agent work? 00:15:07.060 |
So, I'm going to ask it what it thinks about the Labour Party in the UK. 00:15:14.780 |
Oh, we don't know at a glance, because OpenAI returns this mess back to us. 00:15:28.060 |
is_triggered=True appears again, and I think there's another one. 00:15:31.120 |
So, we have the answer multiple times in there, but we need to extract it out because it is hard to read. 00:15:48.460 |
So, the user is asking for an opinion on a political party, which falls under the category of political opinions. 00:15:58.500 |
Now, how do we implement that in another agent? 00:16:01.780 |
So, let's say our original agent, which had the multiply tool. 00:16:08.280 |
Well, we are going to need to use this input_guardrail decorator on a function, which basically is going to run our politics agent that we just defined, 00:16:20.560 |
get the response, and then return that response via this GuardrailFunctionOutput object. 00:16:27.400 |
So, there's a strict format that we need to follow here in order to implement this input guardrail with any other agents. 00:16:35.160 |
So, we need our input parameters to follow this pattern. 00:16:51.740 |
And we have to output this format so that the Agents SDK knows what to do with what we're outputting. 00:17:00.180 |
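That function looks roughly like this; the signature and the `GuardrailFunctionOutput` return type follow the SDK's guardrail docs:

```python
from agents import (
    Agent,
    GuardrailFunctionOutput,
    RunContextWrapper,
    Runner,
    TResponseInputItem,
    input_guardrail,
)

@input_guardrail
async def politics_guardrail(
    ctx: RunContextWrapper,
    agent: Agent,
    input: str | list[TResponseInputItem],
) -> GuardrailFunctionOutput:
    # Run the politics-checking agent over the incoming user input
    result = await Runner.run(politics_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.is_triggered,
    )
```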
And once we define that, we can then plug it in to another agent. 00:17:06.420 |
This other agent is exactly the same as the agent we had before, which looked exactly like this. 00:17:12.620 |
But now, we have just added that politics guardrail. 00:17:16.140 |
And note that input_guardrails here is going to be a list of those input guardrail objects. 00:17:22.480 |
Also, worth noting is if you have an output guardrail, it would just be like this. 00:17:29.020 |
So, you'd have output guardrails, and then you'd put politics guardrail or whatever else. 00:17:34.360 |
The only other difference is that, up here, this would be an output_guardrail decorator. 00:17:47.820 |
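Plugging it in is then just one extra parameter on the agent definition (a sketch; the comment shows where the output-side equivalent would go):

```python
agent = Agent(
    name="Assistant",
    instructions=(
        "You are a helpful assistant. Do not rely on your own knowledge "
        "too much and instead use your tools to help you answer queries."
    ),
    model="gpt-4o-mini",
    tools=[multiply],
    input_guardrails=[politics_guardrail],
    # For the output side: output_guardrails=[...], with the function
    # decorated with @output_guardrail instead of @input_guardrail
)
```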
And we are going to ask it again what these two numbers multiplied together are. 00:17:55.060 |
And we should see that it will answer us using the tool. 00:18:03.580 |
But what if we ask it about the Labour Party in the UK again? 00:18:10.260 |
And we would, of course, in our applications, need to handle this error. 00:18:15.560 |
We can see that the error being raised here is expected. 00:18:18.720 |
It is our InputGuardrailTripwireTriggered error. 00:18:28.800 |
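Handling that in application code is a straightforward try/except (a sketch, with notebook-style top-level `await`):

```python
from agents import InputGuardrailTripwireTriggered

try:
    result = await Runner.run(agent, "What do you think about the Labour Party in the UK?")
    print(result.final_output)
except InputGuardrailTripwireTriggered:
    # The politics guardrail tripped; fail safely instead of answering
    print("Sorry, I can't discuss political opinions.")
```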
Now, the final thing that I do want to cover, because this is obviously very important, 00:18:32.580 |
is so far we've just been feeding in a single input query, a single string into our agents. 00:18:39.920 |
And there are many use cases, probably the vast majority, that are not going to be doing that. 00:18:45.280 |
Instead, they're going to be feeding in a list of interactions between user and assistant over time. 00:18:52.020 |
So, how do we take what we've done so far and make our agents conversational? 00:19:00.000 |
So, first, let's just ask our agent to remember the number 7.814 for us. 00:19:11.080 |
And we get, I cannot store or remember information for future use. 00:19:16.240 |
However, you can save a note or use a reminder app. 00:19:20.900 |
So, the agent is telling us, oh, we can't do that. 00:19:30.340 |
Agents SDK has this nice method, actually, which is to_input_list. 00:19:38.000 |
And we are converting the previous result into an input list for our next query or our next message. 00:19:46.520 |
The first one here is the message from us, the user message, where we ask it to remember that number. 00:19:52.320 |
Then the next one has a lot more information, but it's coming from the agent. 00:20:04.340 |
And we also have the content, which includes these annotations. 00:20:08.720 |
I assume that will be for citations or something else. 00:20:12.220 |
And we have the text content, which is where it's telling us it can't remember anything. 00:20:16.840 |
Okay, so we actually merge that to_input_list output here with our next message. 00:20:23.860 |
Okay, so our next message, we are going to use a dictionary here where we specify we are the user. 00:20:30.260 |
And I'm going to say multiply the last number, so I'm not specifying what number it should remember now, by this 103.892. 00:20:43.060 |
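The whole conversational flow, sketched end to end (notebook-style `await` again):

```python
# First turn: ask the agent to remember a number
result = await Runner.run(agent, "Remember the number 7.814 for me.")
print(result.final_output)

# Convert the first run (user message + agent output) into an input list,
# then append our follow-up as a plain role/content dict
next_input = result.to_input_list() + [
    {"role": "user", "content": "Multiply the last number by 103.892."}
]

result = await Runner.run(agent, next_input)
print(result.final_output)
```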
Let's run that, and we will see our final output. 00:20:47.360 |
Okay, so the final output is the result of multiplying those two numbers is approximately 811.812. 00:20:54.860 |
So it seems like our agent can remember our previous interactions, which is great. 00:21:02.280 |
So that is actually everything I wanted to cover. 00:21:05.340 |
We've, I think, covered the essentials of the library there. 00:21:10.740 |
There are, of course, a lot of other things in there. 00:21:13.680 |
There are, of course, handoffs and tracing. 00:21:15.920 |
And even within the features that we did just cover, there is, there's a lot more nuance and detail to those, which I will definitely, almost definitely cover pretty soon. 00:21:27.960 |
But it's definitely worth looking at the SDK. 00:21:30.000 |
And as I mentioned at the start, I think this is up there as one of my preferred frameworks for building agents, as long as those agents are not too complicated, or as long as I don't need too much flexibility in what they might look like. 00:21:46.080 |
And I think also, as long as I'm using OpenAI, which might not always be the case. 00:21:51.620 |
So, interesting framework, I think, generally well built, and definitely something we'll be covering more in the future. 00:22:01.420 |
So, thank you very much for watching, and I will see you again in the next one.