Back to Index

Agents SDK from OpenAI! | Full Tutorial


Chapters

0:00 OpenAI Agents SDK
1:05 Agents SDK Code
2:38 Agent and Runner
6:56 Function Tools
12:13 Agents SDK Guardrails
18:29 Conversational Agents
21:02 Thoughts on Agents SDK

Transcript

Today, we're going to be taking a look at OpenAI's brand new Agents SDK. Now, Agents SDK is OpenAI's version of a GenAI/agents framework, similar to LangChain, Pydantic AI, and so on. Now, let's start by jumping straight into their docs. So they just outline a few things here, which we'll be covering in an actual code example as I take you through everything.

So they mention here that this is a production-ready upgrade of their previous experimental framework for agents, which is their Swarm library. And they've added a few things I think are quite useful. So, of course, there are agents, there is tool use. You're also able to hand off from one agent to another. You have input and output guardrails.

And it's generally a well-built library. It still has some limitations that I think most agent frameworks are falling into at the moment, which is a very strict definition of what an agent actually is. But for the most part, I actually do think this is a good framework. Now, let's take a look at the code.

As we work through this example, note that there is a link to it in the video description and also in the comments below the video. So you can go ahead and open that and follow along with me. So here I've just outlined what is actually coming from their docs, which is over here.

These are the main features of the SDK. Okay: agent loop, Python-first, handoffs, guardrails, function tools, and tracing. Now, I'm covering all of these except for handoffs and tracing. I'll leave those for later, but yeah, let's jump into the rest. So first we are just going to install the library, and we will also, of course, need an OpenAI API key.

Although note that this is open source. So technically I think there shouldn't be any reason why we can't use this framework with other LLMs, although I'm sure they have made that more difficult than it needs to be. So we need our API key. We go to platform.openai.com. You, of course, will need an account if you don't already have one, although I'm sure most of you do.

And we'll need to go to API keys and just create a new secret key. I'm going to call it agents SDK. You can call it whatever you want, of course. Okay, I'm going to copy that, come over to here, and just paste it in. So now we're all sorted and it knows what our API key is.

Great. Now let's just take a look at the essentials. Okay, so there is the Agent and the Runner. So we initialize a very simple agent here. We give it a name; I'm going to call it Assistant. A very, very simple system prompt here: you are a helpful assistant. And I'm using GPT-4o mini.
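For reference, a minimal sketch of that agent definition might look like this (the name and wording are illustrative; the exact notebook code may differ):

    from agents import Agent

    # A simple agent: a name, a short system prompt (instructions), and a model.
    agent = Agent(
        name="Assistant",
        instructions="You are a helpful assistant.",
        model="gpt-4o-mini",
    )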

Now, for running our agent, there are a few methods for doing this. All of these are through this Runner class. So we have Runner.run, which we'll be using a fair bit, which just runs our agent asynchronously but without streaming. Then there is Runner.run_sync, if you need to run your agent synchronously rather than asynchronously.

And then there is Runner.run_streamed, which is going to run asynchronously and also stream the response back to us. We will not be using run_sync, but we will be using run_streamed and run. Generally speaking, there are not many scenarios where I would ever recommend anyone run AI applications synchronously.

And I've spoken about that a lot before. I won't talk about it again here. But anyway, let's try our async run method and say, tell me a short story. That will take a moment. We're not streaming, so we don't actually see anything until the whole response has been generated.
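In sketch form, that call is just the following (using the agent defined above; run_sync is shown commented out as the blocking alternative):

    import asyncio
    from agents import Agent, Runner

    agent = Agent(
        name="Assistant",
        instructions="You are a helpful assistant.",
        model="gpt-4o-mini",
    )

    async def main():
        # Async, non-streamed: nothing is shown until the full response exists.
        result = await Runner.run(agent, "Tell me a short story")
        print(result.final_output)

    # Synchronous alternative (blocking):
    # result = Runner.run_sync(agent, "Tell me a short story")

    asyncio.run(main())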

Okay, and we get our response. Pretty standard. I don't think there's a ton to say about that. Now, in most production scenarios, I think you're going to be using method three, which is run asynchronously with streaming. And the reason I say that is because in the outward-facing user application of whatever it is you're building, you are probably going to want to, one, use async, because async is essentially just not blocking your API.

If you're implementing this behind an API, it makes your code more scalable and efficient, and so on. And two, you are probably going to use streaming, at least if this LLM call is user-facing in any way. And the reason I say that is, well, we just ran this and we had to wait quite a while for it to actually show us anything, which is going to be a bad user experience in most cases.

So, what we want to do is just start streaming tokens as soon as we get them, so the user sees that something is happening. That also allows us, as I will show you in a moment, to stream tool use updates, which I think are incredibly useful. So, we're going to use run_streamed.

Input here is hello there. And we're just going to print every single event that is returned to us. Now, this is going to be a lot of information. Okay, so we can see there's a lot of stuff there. Basically, there is an event for every type of update; so there is an event for: we are now using this new agent, this current agent.

That has its own event. Then there are the stream events. So, these are the tokens that are being generated by your LLM, or updates of, okay, I'm going to use this tool or that tool, and so on. And then we also have this final one here, this run item stream event, which is telling us, okay, the LLM, or the agent, whatever you want to call it, has finished generating its message output, okay?

And if we look at these objects, there is quite a lot of information in there. So, we need to parse that out and make it a little bit easier to understand, which we can do quite easily, fortunately. So, first, I'm just going to show you how we can get the raw tokens. We look at the event type, and if it's a raw response event, those are the LLM-generated tokens being streamed back to us, okay?
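That raw-token filter looks roughly like this; a sketch based on the SDK's documented event types, using the same simple agent as before:

    import asyncio
    from openai.types.responses import ResponseTextDeltaEvent
    from agents import Agent, Runner

    agent = Agent(
        name="Assistant",
        instructions="You are a helpful assistant.",
        model="gpt-4o-mini",
    )

    async def main():
        result = Runner.run_streamed(agent, input="hello there")
        async for event in result.stream_events():
            # Raw response events carry token deltas as the LLM generates them.
            if event.type == "raw_response_event" and isinstance(event.data, ResponseTextDeltaEvent):
                print(event.data.delta, end="", flush=True)

    asyncio.run(main())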

And you can see we get it straight away, it's streaming, which is pretty nice, okay? But this is only going to work for a direct LLM/agent output. As soon as we start introducing tools, things get a little more complicated. So, let's see, I'll show you how we do that.

Now, you can see that OpenAI have called their tool calling a function tool. So, now OpenAI started with function calling when they first introduced the concept into their APIs. It was called function calling. Then they decided it's not called function calling, it's instead called tool calling. And now it seems they have decided they don't know which one they like the most, so it's now the function tool.

So, thank you, OpenAI, for the conciseness and clarity there. The way that we use or define tools (I'm just going to call them tools) is how we would in most other AI frameworks, to be honest. It's not complicated. So, I'm defining a simple tool here, a multiply tool, which takes a float x and a float y and multiplies them together.

Super simple. I have a doc string here, this is natural language describing to the LLM or agent what this tool does. And you can also put instructions on how to use tools in these doc strings as well, if needed. And you can see that we're being very precise in our type annotations here, describing what everything is.

Essentially providing as much information to our agent as possible. Then, we decorate that function with the function_tool decorator from the Agents SDK. And that is how we define a tool. It's very simple. So, we have our tool.
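As a sketch, the decorated tool looks something like this (the docstring wording here is my own):

    from agents import function_tool

    @function_tool
    def multiply(x: float, y: float) -> float:
        """Multiplies `x` and `y` together and returns the result.
        Use this whenever you need to multiply two numbers.
        """
        return x * y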

Now, how do we run our agent with that tool? Again, not difficult. We simply pass the tool within a list to the tools parameter during our agent definition. I also added a little more to the system prompt, slash instructions, here. I just added: do not rely on your own knowledge too much and instead use your tools to help you answer queries.

So, I basically don't want the LLM/agent trying to do math by itself; I want it to use my multiply tool. So, we have that. And now, we can run it. So, we execute this in the exact same way as before, but I'm just going to ask it to multiply these two numbers together.
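Putting that together, the agent with the tool attached might look like this; a sketch, where the query is just an example of the kind of question we would ask:

    import asyncio
    from agents import Agent, Runner, function_tool

    @function_tool
    def multiply(x: float, y: float) -> float:
        """Multiplies `x` and `y` together and returns the result."""
        return x * y

    agent = Agent(
        name="Assistant",
        instructions=(
            "You are a helpful assistant. Do not rely on your own knowledge too much "
            "and instead use your tools to help you answer queries."
        ),
        model="gpt-4o-mini",
        tools=[multiply],  # tools are passed as a list
    )

    async def main():
        # run_streamed returns a streaming result; here we just dump every event.
        result = Runner.run_streamed(agent, input="What do you get if you multiply 3.14 by 7.2?")
        async for event in result.stream_events():
            print(event)

    asyncio.run(main())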

And of course, we would expect our agent to use the multiply tool in this scenario. So, we do that. And I'm going to print out all the events, because now we have a ton of different events coming through. So, you can see that we have the raw responses stream event.

That covers the LLM generating tokens for our tool calls and also our final response, which is what we see down here. We also have these run item stream events, which say, okay, the tool was called; and then also here, the tool has been executed and we have the output from the actual function itself.

And then down here, we have that, okay, I'm done event. So, we need to parse all of this in a way that makes it a little easier to understand what is happening. Now, that doesn't need to be super complicated, but I've added a lot of stuff in here just so you can see how we can extract different parts of these events.

So, this segment here, this is all raw response events. So, these are the tokens as they are being streamed by our LLM. Okay, now, this will output everything in some format for us. However, what the Agents SDK also does for us is provide these other events. So, this event here tells us which agent we are currently using, because you can use multiple agents in a sequence.

So, you might see this event pop up if you have one of those multi-agent workflows, or if you're just running your first agent, which, of course, we are doing here. We also have this run item stream event. This includes the tool calling. So, whereas the raw events output the tokens from our LLM as they come, this one waits for our entire tool call to be complete before outputting the event.

And in this event, it just includes all of that information in one single place, which is easier for us to parse. And also, within this segment, we will get our tool output. So, this is where we execute our tool function (function tool, however they've called it) and we have that answer.

So, from x multiplied by y, we get the answer. So, if we run this, we're going to see a much cleaner output. Now, we're using GPT-4o mini, so that is actually super fast. I will slow it down for you now. But you can see here that we have, first, the current agent.

We can see which agent is being used. Then it streams all of our tool parameter tokens. Then, after that has completed, our tool call is complete. So, the Agents SDK outputs what tool is being called, the name and the args. And then it executes the tool and provides us the output from there.
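Roughly what that cleaner parsing can look like; this is a sketch, with the event and item type names taken from the SDK docs, and it assumes the agent and multiply tool defined above:

    import asyncio
    from openai.types.responses import (
        ResponseTextDeltaEvent,
        ResponseFunctionCallArgumentsDeltaEvent,
    )
    from agents import Runner

    async def stream_run(agent, query: str):
        result = Runner.run_streamed(agent, input=query)
        async for event in result.stream_events():
            if event.type == "raw_response_event":
                # Token-by-token deltas: message text and tool-call arguments.
                if isinstance(event.data, ResponseTextDeltaEvent):
                    print(event.data.delta, end="", flush=True)
                elif isinstance(event.data, ResponseFunctionCallArgumentsDeltaEvent):
                    print(event.data.delta, end="", flush=True)
            elif event.type == "agent_updated_stream_event":
                # Which agent is currently running (useful in multi-agent workflows).
                print(f"> Current agent: {event.new_agent.name}")
            elif event.type == "run_item_stream_event":
                if event.item.type == "tool_call_item":
                    # Emitted once the complete tool call (name + args) is available.
                    print(f"\n> Tool called: {event.item.raw_item.name}")
                    print(f"> Args: {event.item.raw_item.arguments}")
                elif event.item.type == "tool_call_output_item":
                    # The value returned by actually executing our function tool.
                    print(f"> Tool output: {event.item.output}")

    # Example usage (the query is illustrative):
    # asyncio.run(stream_run(agent, "What do you get if you multiply 3.14 by 7.2?"))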

Then, finally, we stream token by token the final output. Great. So, we have that. That is our streaming. Now, we also have guardrails. Guardrails are interesting and they're relatively open, which I like because I would implement guardrails in a slightly different way. Or I would like multiple ways to implement guardrails.

They are super important, though. So, if you're not already using guardrails, I would recommend using them more. In this example, we are first just going to implement a guardrail powered by an LLM. Now, it's also worth noting that there are two types of guardrails here. There is an input guardrail, which I'll show you how to implement here.

And there is also an output guardrail, which is essentially exactly the same just on the other side. So, the input guardrail is checking the input going to your LLM. And the output guardrail is checking the output from your LLM that is going to your user. So, you can guardrail both sides of the conversation, which again is pretty standard practice and I think it's important.

So, we are going to implement a guardrail powered by another LLM. So, that means we'll just be giving OpenAI all of our money. And to do that, we implement another agent. This agent's one and only job is to check if we are hitting a guardrail. Okay, and specifically, this agent is checking if the user is asking the agent about its political opinions, which we don't want it to do.

So, we define this guardrail output item. And this guardrail output is being passed to the output_type of our agent. And what this is going to do is force the agent to provide us with a structured output. So, the agent is going to generate its answer within this format that we've defined here.

So, it's going to provide us with an is_triggered field, which is a Boolean value. It's going to be true if the guardrail has been triggered, or false if it has not been triggered. Then, we're also going to allow it to explain to us why it thinks the guardrail has been triggered or not.

I think this can be useful during development, but you would probably want to turn it off in any production setting, because you're just spending more and more tokens. So, yeah, it's useful for understanding the reasoning. Great. So, we initialize that. I don't think there's anything else to say there.
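A sketch of that guardrail agent; the Pydantic model and its field names are my own naming:

    from pydantic import BaseModel
    from agents import Agent

    class GuardrailOutput(BaseModel):
        is_triggered: bool  # True if the user is asking about political opinions
        reasoning: str      # why the guardrail was (or wasn't) triggered

    politics_agent = Agent(
        name="Politics check",
        instructions="Check if the user is asking you about political opinions.",
        model="gpt-4o-mini",
        output_type=GuardrailOutput,  # forces structured output in this shape
    )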

We move on, and what we can do first is just check, okay, does this agent work, and see what it outputs. So, I'm going to ask it what it thinks about the Labour Party in the UK. And what does it think? Oh, we don't know, because OpenAI returns this mess back to us.

So, the answer is in here. It's just hidden, and you can find it. I think it's even in multiple places. Look, we have is_triggered=True. is_triggered=True again, and I think there's another one. So, we have the answer multiple times in there, but we need to extract it out because it is hard to read.

So, we just pull out result.final_output. And then we get this nice Pydantic class. Got our outputs. We have is_triggered=True. And we see the reasoning: the user is asking for an opinion on a political party, which falls under the category of political opinions. Thank you so much. So, we have our logic.
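That check, in sketch form (top-level await works in a notebook; wrap it in asyncio.run in a script):

    from agents import Runner

    # Run the guardrail agent directly and pull out the parsed Pydantic object.
    result = await Runner.run(
        politics_agent, "What do you think about the Labour Party in the UK?"
    )
    output = result.final_output  # an instance of GuardrailOutput
    print(output.is_triggered, output.reasoning)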

Now, how do we implement that in another agent? So, let's say our original agent, which had the multiply tool. Let's go and see how we do that. Well, we are going to need to use this input_guardrail decorator on a function, which basically is going to run the politics agent that we just defined.

Get the response, and then return that response via this GuardrailFunctionOutput object. Okay? So, there's a strict format that we need to follow here in order to implement this input guardrail with any other agents. So, we need our input parameters to follow this pattern. All right? We don't even use two of them.

In this example, we're not using those two; we're just using this input. But we have to have those two parameters, otherwise this will not work; it would be an invalid guardrail. And we have to output this format so that the Agents SDK knows what to do with what we're outputting.
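As a sketch, the guardrail function follows this pattern (assuming the politics_agent and GuardrailOutput from the previous sketch):

    from agents import (
        Agent,
        GuardrailFunctionOutput,
        RunContextWrapper,
        Runner,
        input_guardrail,
    )

    @input_guardrail
    async def politics_guardrail(
        ctx: RunContextWrapper,  # required by the expected signature, unused here
        agent: Agent,            # also required, also unused here
        input: str,              # the user input we actually check
    ) -> GuardrailFunctionOutput:
        # Run the guardrail agent against the incoming user input.
        result = await Runner.run(politics_agent, input)
        output = result.final_output
        return GuardrailFunctionOutput(
            output_info=output,                      # extra info surfaced in the run result
            tripwire_triggered=output.is_triggered,  # True blocks the main agent
        )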

Okay? And once we define that, we can then plug it in to another agent. This other agent is exactly the same as the agent we had before, which looked exactly like this, but now we have just added that politics guardrail. And note that input_guardrails here is going to be a list of those input guardrail objects.
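So the guarded agent is just the earlier tool-using agent plus that list; a sketch, reusing the multiply tool and politics_guardrail defined above:

    from agents import Agent

    safe_agent = Agent(
        name="Assistant",
        instructions=(
            "You are a helpful assistant. Do not rely on your own knowledge too much "
            "and instead use your tools to help you answer queries."
        ),
        model="gpt-4o-mini",
        tools=[multiply],                       # the multiply tool from earlier
        input_guardrails=[politics_guardrail],  # list of input guardrails
    )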

Also worth noting: if you have an output guardrail, it would just be like this. So, you'd have output_guardrails, and then you'd put politics_guardrail or whatever else. The only other difference is that up here, this would be an output_guardrail decorator instead.

Worth noting. So, let's define our new safe agent. And we are going to ask it again what these two numbers multiplied together are. And we should see that it will answer us using the tool. Okay? That's great. So, we're not blocking everything. But what if we ask it about the Labour Party in the UK again?

We will see an error. And we would, of course, in our applications, need to handle this error. We can see that the error being raised here is expected: it is our InputGuardrailTripwireTriggered error. So, that is pretty useful. And that is how we use guardrails.
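Handling that in an application might look like this; a sketch, where the exception name comes from the SDK and the fallback message is just an example:

    from agents import InputGuardrailTripwireTriggered, Runner

    try:
        result = await Runner.run(
            safe_agent, "What do you think about the Labour Party in the UK?"
        )
        print(result.final_output)
    except InputGuardrailTripwireTriggered:
        # The input guardrail fired before the main agent ran.
        print("Sorry, I can't discuss political opinions.")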

Now, the final thing that I do want to cover, because this is obviously very important, is that so far we've just been feeding a single input query, a single string, into our agents. And probably the vast majority of use cases are not going to be doing that. Instead, they're going to be feeding in a list of interactions between user and assistant over time. So, how do we take what we've done so far and make our agents conversational? It is fairly straightforward.

It's not complicated. So, first, let's just ask our agent to remember the number 7.814 for us. And we remember to use our manners there. And we get, I cannot store or remember information for future use. However, you can save a note or use a reminder app. Thank you very much.

So, the agent is telling us, oh, we can't do that. But actually, we can do that; the agent just doesn't know it. So, we come down to here. The Agents SDK actually has this nice method, which is to_input_list. So, we're taking our result here and we are converting it into an input list for our next query or our next message.

And we get this list of messages. The first one here is the message from us, the user message, where we ask it to remember that number. Then the next one has a lot more information, but it's coming from the agent. We see that the role here is assistant. And that is not the name of our agent.

That is just the AI message. And we also have the content, which includes these annotations; I assume those will be for citations or something else. And we have the text content, where it's telling us it can't remember anything, which we can dismiss. Okay, so we then merge that to_input_list output here with our next message.

Okay, so for our next message, we are going to use a dictionary where we specify we are the user; this is the user message. And I'm going to say multiply the last number (so I'm not specifying what number it should remember now) by 103.892. Let's run that, and we will see our final output.
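A sketch of that conversational pattern, assuming the safe_agent from above (again using top-level await, as in a notebook):

    from agents import Runner

    # First turn: ask the agent to remember a number.
    result = await Runner.run(safe_agent, "Please remember the number 7.814 for me.")

    # Convert that run into an input list and append the next user message.
    conversation = result.to_input_list() + [
        {"role": "user", "content": "Multiply the last number by 103.892"}
    ]

    # Second turn: the agent now sees the full conversation history.
    result = await Runner.run(safe_agent, conversation)
    print(result.final_output)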

Okay, so the final output is the result of multiplying those two numbers is approximately 811.812. So it seems like our agent can remember our previous interactions, which is great. So that is actually everything I wanted to cover. We've, I think, covered the essentials of the library there. There are, of course, a lot of other things in there.

There are, of course, the handoffs and the tracing. And even within the features that we did just cover, there is a lot more nuance and detail, which I will almost definitely cover pretty soon. But it's definitely worth looking at the SDK. And as I mentioned at the start, I think this is up there as one of my preferred frameworks for building agents, as long as those agents are not too complicated, or as long as I don't need too much flexibility in what they might look like.

And I think also, as long as I'm using OpenAI, which might not always be the case. So, interesting framework, I think, generally well built, and definitely something we'll be covering more in the future. For now, I'll leave it there. So, thank you very much for watching, and I will see you again in the next one.

Bye.