Agents SDK from OpenAI! | Full Tutorial

00:00:00.000 | Today, we're going to be taking a look at OpenAI's brand new Agents SDK.

00:00:04.000 | Now, Agents SDK is OpenAI's version of a GenAI slash Agents framework,

00:00:11.100 | similar to LineChain, Pydantic AI, and so on.

00:00:15.680 | Now, let's start by jumping straight into their docs.

00:00:19.660 | So they just outline a few things here,

00:00:21.940 | which we'll be covering in an actual code example

00:00:25.380 | I'm going to be taking through everything.

00:00:26.520 | So they mention here that is a production-ready upgrade of the previous experiment for Agents,

00:00:32.480 | which is their Swarm library.

00:00:34.480 | And they've added a few things I think are quite useful.

00:00:37.440 | So, of course, Agents, there is tool use.

00:00:41.320 | You're also able to pass off from one agent to another.

00:00:44.580 | You have input and output guardrails.

00:00:46.400 | And it's generally, generally a well-built library.

00:00:49.940 | It still has some limitations that I think most agent frameworks are falling into at the moment,

00:00:56.240 | which is a very strict definition of what an agent actually is.

00:01:00.100 | But for the most part, I actually do think this is a good framework.

00:01:05.820 | Now, let's take a look at the code.

00:01:10.160 | When we're working through this example,

00:01:11.700 | there is a link to this in the video description

00:01:15.860 | and also the comments below the video.

00:01:18.760 | So you can go ahead and open that and follow along with me.

00:01:22.260 | So here I just outlined this is actually coming from their docs, which is over here.

00:01:28.980 | These are the main features of that SDK.

00:01:31.360 | Okay.

00:01:31.720 | Agent loop, Python first, handoffs, guardrails, function tools and tracing.

00:01:35.780 | Well, I'm covering all of these except from handoffs and tracing.

00:01:39.840 | I'll leave those for later, but yeah, let's jump into those.

00:01:44.000 | So first we are just going to install the library and it will also, of course, need an OpenAI API key.

00:01:51.500 | Although note that this is open source.

00:01:54.640 | So technically I think there shouldn't be any reason why we can't use this framework with other LLMs.

00:02:02.600 | Although I'm sure I have made that more difficult than it needs to be.

00:02:06.680 | So we need our API key.

00:02:08.900 | We go to platform.openai.com.

00:02:12.920 | You, of course, will need an account if you don't already, although I'm sure most of you do.

00:02:17.640 | And we'll need to go to API keys and just create a new secret key.

00:02:22.980 | I'm going to call it agents SDK.

00:02:27.000 | You call it whatever you want, of course.

00:02:30.400 | Okay, I'm going to copy that and come over to here and just paste it in here.

00:02:34.760 | So now we're all sorted and know what API key is.

00:02:37.380 | Great.

00:02:37.780 | Now let's just take a look at the essentials.

00:02:41.860 | Okay, so there is the agent and the runner.

00:02:45.620 | So we initialize a very simple agent here.

00:02:49.400 | We give it a name.

00:02:50.220 | I'm going to call it a system.

00:02:51.320 | A very, very simple system prompt here.

00:02:54.460 | You are a helpful assistant and I'm using GPT-40 mini.

00:02:58.180 | Now running our agent, there are a few methods for doing this.

00:03:02.220 | All of these are through this runner class.

00:03:05.240 | So we have runner.run, which we'll be using a fair bit, which is just running our agent in async, but without streaming.

00:03:16.220 | Then there is a runner.runSync, if you need to run your agent synchronously rather than asynchronously.

00:03:23.620 | And then there is a runner.runStreamed, which is going to run in async and also stream the response back to us.

00:03:31.520 | We will be not using runSync, but we will be using runStreamed and run.

00:03:37.480 | Generally speaking, I would...

00:03:39.260 | There are not many scenarios where I would ever recommend anyone to run AI applications synchronously.

00:03:48.480 | And I've spoken about that a lot before.

00:03:50.100 | I won't talk about it again here.

00:03:52.340 | But anyway, let's try our async run method and say, tell me a short story.

00:03:58.000 | That will take a moment.

00:03:59.260 | We're not streaming, so we don't actually see anything until the whole response has been generated.

00:04:04.600 | Okay, and we get our response.

00:04:06.900 | Pretty standard.

00:04:07.820 | And I think there's a ton to say about that.

00:04:10.280 | Now, in most production scenarios, I think you're going to be using method three, which is the run asynchronously with streaming.

00:04:19.520 | And the reason I say that is because in the outward-facing user application of whatever it is you're building,

00:04:27.880 | you are probably going to want to, one, use async because async is essentially just not blocking your API.

00:04:35.960 | If you're using, if you're implementing this behind the API, it makes your code more scalable, efficient, so on and so on.

00:04:42.100 | And two, you are probably going to use streaming, at least if this LLM call is user-facing in any way.

00:04:51.940 | And the reason I say that is, well, we just ran this and we had to wait quite a while for this to actually show us anything,

00:04:57.780 | which is going to be a bad user experience in most cases.

00:05:01.740 | So, what we want to do is just start streaming tokens as soon as we get them, so the user sees that something is happening.

00:05:08.320 | That also allows us, as I will show you in a moment, to stream tool use updates, which I think are incredibly useful.

00:05:18.200 | So, we're going to run streamed.

00:05:20.800 | Input here is hello there.

00:05:23.080 | And we're just going to print every single event that is returned to us.

00:05:27.920 | Now, this is going to be a lot of information.

00:05:30.560 | Okay, so we can see there's a lot of stuff there.

00:05:34.260 | Basically, for every type of event, so there is an event for, we are using this new agent, this current agent.

00:05:43.440 | That has its own event.

00:05:44.960 | Then there's the stream event.

00:05:47.060 | So, these are the tokens that are being generated by your LLM or updates of, okay, I'm going to use this tool or this tool, so on.

00:05:54.540 | And then we also have this final one here, this run item stream event, which is telling us, okay, the LLM is finished,

00:06:02.260 | or the agent, LLM, whatever, has finished generating its message output, okay?

00:06:08.860 | And if we look at these objects, there is quite a lot of information in there.

00:06:15.000 | So, we need to, we need to parse that out and make it a little bit easier to understand, which we can do quite easily, fortunately.

00:06:22.900 | So, first, I'm just going to show you how we can get the raw tokens, which is, we look for the event type,

00:06:31.000 | and we say if it's a raw response event, that is the LLM generator tokens streamed back to us, okay?

00:06:37.380 | And you can see we get, straight away, it's streaming, that's pretty, that's pretty nice, okay?

00:06:42.800 | But this is only going to work for a direct LLM agent output.

00:06:47.060 | As soon as we start introducing tools, things get a little more complicated.

00:06:50.800 | So, how do we, well, let's see, I'll show you how we do that.

00:06:56.080 | Now, you can see that OpenAI have called their tool calling a function tool.

00:07:03.660 | So, now OpenAI started with function calling when they first introduced the concept into their APIs.

00:07:10.920 | It was called function calling.

00:07:12.280 | Then they decided it's not called function calling, it's instead called tool calling.

00:07:16.740 | And now it seems they have decided they don't know which one they like the most, so it's now the function tool.

00:07:25.860 | So, thank you, OpenAI for the conciseness and clarity there.

00:07:31.140 | The way that we use or define tools, I'm just going to call them tools, is how we would in most other AI frameworks, to be honest.

00:07:42.640 | It's not complicated.

00:07:44.200 | So, I'm defining a simple tool here, it's a multiply tool, which can take a value of float x, float y, and multiply them together.

00:07:52.640 | Super simple.

00:07:53.840 | I have a doc string here, this is natural language describing to the LLM or agent what this tool does.

00:08:01.180 | And you can also put instructions on how to use tools in these doc strings as well, if needed.

00:08:07.260 | And you can see that we're being very precise in our type annotations here, describing what everything is.

00:08:12.800 | Essentially providing as much information to our agent as possible.

00:08:16.560 | Then, we decorate that function with the function tool decorator from the agent's SDK.

00:08:23.220 | And that is how we define a tool.

00:08:25.740 | It's very simple.

00:08:26.860 | So, we have our tool.

00:08:28.840 | Now, how do we run our agent with that tool?

00:08:30.720 | Again, not difficult.

00:08:32.660 | We simply pass the tool within the list to the tools parameter during our agent definition.

00:08:39.540 | I also added a little more to the system prompt slash instructions here.

00:08:43.840 | I just added, do not rely on your own knowledge too much and instead use your tools to help you answer queries.

00:08:49.320 | So, I basically don't want the LLM slash agent trying to do math by itself.

00:08:56.060 | I want it to use my multiply tool.

00:08:57.840 | So, we have that.

00:09:00.260 | And now, we can run it.

00:09:02.280 | So, we execute this in the exact same way as before.

00:09:06.160 | But I'm just going to ask it to multiply these two numbers together.

00:09:10.360 | And of course, that we would expect our agent to use the multiply tool in this scenario.

00:09:17.800 | So, we do that.

00:09:21.680 | And I'm going to print out all events because now we have a ton of different events coming through.

00:09:26.200 | So, you can see that we have the raw responses stream event.

00:09:29.680 | That is covering the LLM generating tokens for our tool calls and also our final response, which is what we see down here.

00:09:37.820 | We also have these run item stream events, which is, okay, the tool was called.

00:09:41.600 | And then also here, the tool has been executed and we have the output from the actual function itself.

00:09:47.300 | And then down here, we have that, okay, I'm done event.

00:09:50.360 | So, we need to pass all of this in a way that makes it a little easier to understand what is happening.

00:09:59.140 | Now, that doesn't need to be super complicated, but I've added a lot of stuff in here just so you can see how we can extract different parts of these events.

00:10:11.520 | So, this segment here, this is all raw response events.

00:10:16.760 | So, these are the tokens as they are being streamed by our LLM.

00:10:21.000 | Okay, now, this will output everything in some format for us.

00:10:26.720 | However, what the agent's SDK also does for us is it provides these other events.

00:10:32.400 | So, this event here tells us which agent we are currently using because you can use multiple agents in a sequence.

00:10:39.720 | So, you might see this event pop up if you have one of those multi-agent workflows or if you're just running your first agent, which, of course, we are doing here.

00:10:50.040 | We also have this run item stream event.

00:10:52.720 | So, this includes the tool calling.

00:10:55.600 | So, where we are outputting the tokens from our LLM, but it waits for our entire tool call to be complete before outputting this event.

00:11:05.140 | And in this event, it just includes all of that information in one single place, which is easier for us to pass.

00:11:11.580 | And also, within this segment, we will get our tool output.

00:11:16.660 | So, this is, we execute our tool function, function tool, however they've called it.

00:11:22.220 | And we have that answer.

00:11:25.240 | So, the X multiplied by Y, we get the answer from that.

00:11:29.520 | So, if we run this, we're going to see a much cleaner output.

00:11:33.340 | Now, we're using GPT-40 mini, so that is actually super fast.

00:11:39.820 | I will slow it down for you now.

00:11:41.800 | But you can see here that we have, first, the current agent.

00:11:45.560 | We can see which agent is being used.

00:11:47.380 | Then it streams all of our tool parameter tokens.

00:11:51.860 | Then, after that has completed, our tool call is complete.

00:11:55.380 | So, the agent SDK outputs what tool is being called, the name and the args.

00:12:00.440 | And then it executes the tool and provides us the output from there.

00:12:04.600 | Then, finally, we stream token by token the final output.

00:12:08.900 | Great.

00:12:10.080 | So, we have that.

00:12:11.160 | That is our streaming.

00:12:13.600 | Now, we also have guardrails.

00:12:15.560 | Guardrails are interesting and they're relatively open, which I like because I would implement guardrails in a slightly different way.

00:12:24.240 | Or I would like multiple ways to implement guardrails.

00:12:26.540 | They are super important though.

00:12:27.960 | So, if you're not already using guardrails, I would recommend using them more.

00:12:34.740 | So, we are first, in this example, just going to implement a guardrail powered by NRLM.

00:12:42.640 | Now, it's also worth noting that the guardrails here, there are two types.

00:12:46.620 | There is a input guardrail, which I'll show you how to implement here.

00:12:51.120 | And there is also an output guardrail, which is essentially exactly the same just on the other side.

00:12:56.820 | So, the input guardrail is checking the input going to your LLM.

00:13:00.820 | And the output guardrail is checking the output from your LLM that is going to your user.

00:13:07.440 | So, you can guardrail both sides of the conversation, which again is pretty standard practice and I think it's important.

00:13:15.680 | So, we are going to implement a guardrail powered by another LLM.

00:13:20.320 | So, that means we'll just be giving OpenAI all of our money.

00:13:25.800 | And to do that, we implement another agent.

00:13:31.740 | This agent's one and only job is to check if we are hitting a guardrail.

00:13:41.920 | Okay, and specifically, this agent is checking if the user is asking the agent about its political opinions, which we don't want it to do.

00:13:51.460 | So, we define this guardrail output item.

00:13:58.500 | And this guardrail output is being passed to the output type of our agent.

00:14:02.340 | And what this is going to do is it's going to force the agent to provide us with a structured output.

00:14:08.900 | So, the agent is going to output, it's going to generate the answer within this format that we've defined here.

00:14:18.120 | So, it's going to provide us with a isTriggered method, which is a Boolean value.

00:14:22.200 | So, it's going to be true if the guardrail has been triggered or false if it has not been triggered.

00:14:27.660 | Then, we're also going to allow it to explain to us why it thinks the guardrail has been triggered or not.

00:14:33.760 | Which, you probably, I think this can be useful during development.

00:14:38.100 | You would probably want to turn it off in any production setting because you're just spending more and more tokens.

00:14:43.840 | So, yeah, it's useful, but it's useful for just understanding the reasoning, of course.

00:14:51.380 | Great.

00:14:52.760 | So, we initialize that.

00:14:56.520 | I don't think there's anything else to say there.

00:14:57.940 | We move on, and what we can do first is just see, okay, does this agent work?

00:15:04.840 | Does it?

00:15:05.820 | I see what it outputs.

00:15:07.060 | So, I'm going to ask it what it thinks about the Labour Party in the UK.

00:15:10.320 | And what does it think?

00:15:14.780 | Oh, we don't know because OpenAI returns this mess back to us.

00:15:18.880 | So, the answer is in here.

00:15:22.440 | It's just hidden, and you can find it.

00:15:25.120 | I think it's even in multiple places.

00:15:26.660 | Look, we have isTriggeredTrue.

00:15:28.060 | IsTriggeredTrue again, and I think there's another one.

00:15:31.120 | So, we have the answer multiple times in there, but we need to extract it out because it is hard to read.

00:15:39.360 | So, we just feel result final output.

00:15:41.500 | And then we get this nice, pedantic class.

00:15:43.920 | Got our outputs.

00:15:45.120 | We have isTriggeredTrue.

00:15:46.680 | And we say the reasoning.

00:15:48.460 | So, the user is asking for an opinion on a political party, which falls under the category of political opinions.

00:15:53.900 | Thank you so much.

00:15:55.520 | So, we have our logic.

00:15:58.500 | Now, how do we implement that in another agent?

00:16:01.780 | So, let's say our original agent, which had the multiply tool.

00:16:05.600 | Let's go and see how we do that.

00:16:08.280 | Well, we are going to need to use this input guardrail decorator on a function, which basically is going to run our politics agent that we just defined.

00:16:20.560 | Get the response, and then return that response via this guardrail function object.

00:16:26.400 | Okay?

00:16:27.400 | So, there's a strict format that we need to follow here in order to implement this input guardrail with any other agents.

00:16:35.160 | So, we need our input parameters to follow this pattern.

00:16:38.700 | All right?

00:16:39.420 | We don't even use these two.

00:16:40.660 | In this example, we're not using these.

00:16:42.260 | We're just using this input.

00:16:43.680 | But we have to have these two parameters.

00:16:46.940 | Otherwise, this will not work.

00:16:50.040 | It won't be an invalid guardrail.

00:16:51.740 | And we have to output this format so that the agent SDK knows what to do with what we're outputting.

00:16:59.660 | Okay?

00:17:00.180 | And once we define that, we can then plug it in to another agent.

00:17:06.420 | This other agent is exactly the same as the agent we had before, which looked exactly like this.

00:17:12.620 | But now, we have just added that politics guardrail.

00:17:16.140 | And know that input guardrails here is going to be a list of those input guardrail objects.

00:17:22.480 | Also, worth noting is if you have an output guardrail, it would just be like this.

00:17:29.020 | So, you'd have output guardrails, and then you'd put politics guardrail or whatever else.

00:17:34.360 | The only other difference is that up here is the only other difference is that up here, this would be a output guardrail.

00:17:39.460 | Worth noting.

00:17:41.140 | So, let's define our new safe agent.

00:17:47.820 | And we are going to ask it again what these two numbers multiplied together are.

00:17:55.060 | And we should see that it will answer us using the tool.

00:17:59.920 | Okay?

00:18:00.940 | That's great.

00:18:01.800 | So, we're not blocking everything.

00:18:03.580 | But what if we ask it about the Labour Party in the UK again?

00:18:07.680 | We will see an error.

00:18:10.260 | And we would, of course, in our applications, need to handle this error.

00:18:15.560 | We can see that the error being raised here is expected.

00:18:18.720 | It is our input guardrail tripwire triggered error.

00:18:22.640 | So, that is pretty useful.

00:18:25.560 | And that is how we use guardrails.

00:18:28.800 | Now, the final thing that I do want to cover, because this is obviously very important,

00:18:32.580 | is so far we've just been feeding in a single input query, a single string into our agents.

00:18:39.920 | And there are many, probably the vast majority of use cases are not going to be doing that.

00:18:45.280 | Instead, they're going to be feeding in a list of interactions between user and assistant over time.

00:18:52.020 | So, how do we take what we've done so far and make our agents conversational?

00:18:55.860 | It is fairly straightforward.

00:18:59.200 | It's not complicated.

00:19:00.000 | So, first, let's just ask our agent to remember the number 7.814 for us.

00:19:08.720 | And we remember to use our manners there.

00:19:11.080 | And we get, I cannot store or remember information for future use.

00:19:16.240 | However, you can save a note or use a reminder app.

00:19:19.500 | Thank you very much.

00:19:20.900 | So, the agent is telling us, oh, we can't do that.

00:19:23.880 | But actually, we can do that.

00:19:25.800 | The agent just doesn't know it.

00:19:27.280 | So, we come down to here.

00:19:30.340 | Agents SDK has this nice method, actually, which is two input lists.

00:19:35.740 | So, we're taking our result here.

00:19:38.000 | And we are converting it into an input list for our next query or our next message.

00:19:43.320 | And we get this list of messages.

00:19:46.520 | The first one here is the message from us, the user message, where we ask it to remember that number.

00:19:52.320 | Then the next one has a lot more information, but it's coming from the agent.

00:19:57.120 | We see that the role here is assistant.

00:19:58.900 | And that is not the name of our agent.

00:20:02.180 | That is just the AI message.

00:20:04.340 | And we also have the content, which includes these annotations.

00:20:08.720 | I assume that will be for citations or something else.

00:20:12.220 | And we have the text content, which where it's telling us, I can't remember anything, which can't miss.

00:20:16.840 | Okay, so we actually merge that two input list here with our next message.

00:20:23.860 | Okay, so our next message, we are going to use a dictionary here where we specify we are the user.

00:20:28.800 | This is the user message.

00:20:30.260 | And I'm going to say multiply the last number, so I'm not specifying what number it should remember now, by this 103.892.

00:20:43.060 | Let's run that, and we will see our final output.

00:20:47.360 | Okay, so the final output is the result of multiplying those two numbers is approximately 811.812.

00:20:54.860 | So it seems like our agent can remember our previous interactions, which is great.

00:21:02.280 | So that is actually everything I wanted to cover.

00:21:05.340 | We've, I think, covered the essentials of the library there.

00:21:10.740 | There are, of course, a lot of other things in there.

00:21:13.680 | There's, of course, the handoff, the tracing.

00:21:15.920 | And even within the features that we did just cover, there is, there's a lot more nuance and detail to those, which I will definitely, almost definitely cover pretty soon.

00:21:27.960 | But it's definitely worth looking at the SDK.

00:21:30.000 | And as I mentioned at the start, I think this is up there as one of my preferred frameworks for building agents, as long as those agents are not too complicated, or as long as I don't need too much flexibility in what they might look like.

00:21:46.080 | And I think also, as long as I'm using OpenAI, which might not always be the case.

00:21:51.620 | So, interesting framework, I think, generally well built, and definitely something we'll be covering more in the future.

00:22:00.240 | For now, I'll leave it there.

00:22:01.420 | So, thank you very much for watching, and I will see you again in the next one.

00:22:07.020 | Bye.

00:22:07.220 | Bye.

00:22:07.500 | Bye.

00:22:08.500 | Bye.

00:22:09.500 | Bye.

00:22:10.500 | Bye.

00:22:11.500 | Bye.

00:22:12.500 | Bye.