
You're doing Agentic chat history wrong | OpenAI Agents SDK


Chapters

0:00 OpenAI Agents SDK
0:56 Agents SDK Setup
1:56 Static Instructions
6:03 Dynamic Prompts
8:38 Rethinking Agentic Chat History
11:09 Message Types
20:25 How to Use SDK Message Types
22:09 Developer Messages
24:31 Assistant Messages
26:37 Chat History
27:53 Function Calls
31:03 Conclusion for Agents SDK Prompting

Transcript

Okay, so beginning with chapter one, we're going to be covering prompting. Now, I know prompting isn't the most exciting thing about agents in the entire world, but prompting is a core component of any AI system. You just can't get around or avoid prompting; you have to learn it. And what I want to cover in this chapter is, of course, the hands-on of prompting with Agents SDK, but I also really want to talk about how you might want to think about the various prompts, and about building a conversation with an agent, in a very different way.

And I think that will be very useful for a lot of people out there building with agents. So that part, in my opinion, is actually quite exciting and, for sure, very useful. So we're going to start in the Agents SDK course repo, we're going to go to chapters, and of course, we're going to go to chapter one.

I would highly recommend that you go with the "Open in Colab" approach here. It's going to save you time rather than setting it up locally, but of course, you can set it up locally if you prefer. I actually do have it set up locally, so that's what I'm going to do.

So in Colab, the first thing you'll need to do is just install openai-agents, pinned to version 0.1, as you can see here. Then if you're running this locally, of course, you should have already set up your UV environment, and you would just be clicking up here and clicking here to select that UV environment.
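As a rough sketch, the install cell looks something like this (the exact version pin may differ from the one in your copy of the course):

```python
# Install the OpenAI Agents SDK; -qU keeps output quiet and upgrades if needed.
!pip install -qU "openai-agents==0.1"
```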

Then we're going to come down, and the first thing we're going to do is set up our OpenAI API key. Now, you'll get this from platform.openai.com/api-keys. Once you have it, run this cell here and we'll get a little text box pop up. Locally, at least in Cursor and VS Code, it's going to be here.
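That cell is roughly the following (a sketch, assuming the course uses getpass for the hidden input box):

```python
import getpass
import os

# Prompt for the key without echoing it, then expose it to the SDK.
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")
```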

And in Colab, it will be just under the cell. So now we're going to take a look at the first type of prompt, which is the most basic: static instructions. When I say instructions here, for those of you that have been using agents or LLMs, think of instructions as your system prompt, because that is exactly what it is.

So the instructions are the system prompt. They guide the behavior of your agent, and we require them here. OK, so in these instructions, we're telling our agent to speak like a pirate. And this is how we would initialize our agent. It's very simple: you have the name of the agent, and you have the model that we'd like to use, GPT-4.1 mini.
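In code, that initialization looks roughly like this (a sketch; the agent name is an assumption, the rest follows the parameters just described):

```python
from agents import Agent

agent = Agent(
    name="Pirate Agent",                  # assumed name; shown in traces/logs
    instructions="Speak like a pirate.",  # the system (developer) prompt
    model="gpt-4.1-mini",
)
```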

And it is pretty simple, so we'll run that. Once we've defined our agent, we want to run it. Now, the way that we run an agent feels a little more complicated, but it's still very simple and it makes sense.

So there are these various runner objects in Agents SDK that you would use in various scenarios. I think for the most part, you're going to be using this Runner object. The others that you might be using are for things like synchronous execution, which would be kind of weird if you're building an AI application and writing synchronous code. In general, I don't think I've ever been on a project where we haven't needed to use async code.

So, yes, you can use synchronous code and methods, and I know that might be simpler if you haven't used async before, but I would learn async as quickly as possible. It's very simple, especially when you're using these libraries. So in this case, all we need to do is take this Runner object that we have down here, and we're going to run it.

This will run the async execution of whatever we set up in our job here. And because it's async, we need to await it. Okay, so for the most part, this is all we really need to modify in many cases. Sometimes it can get a little more complicated.

Depends on what you're doing. But anyway, in this case, all we need to do to run our async job is to await it here. Now, there's a little more going on within this job. As you can see, we have a starting agent and we also have our input. The input is the user query.
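Here's roughly what that run cell looks like (a sketch; the haiku input matches the example coming up):

```python
from agents import Runner

# Runner.run is async, so we await it (Runner.run_sync exists for sync code).
result = await Runner.run(
    starting_agent=agent,
    input="Write me a haiku.",
)
print(result.final_output)
```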

Okay, that is your user prompt. You may modify this in some cases; you can have multiple messages, for example. But in this case, it's just a user query. As for the starting agent: the reason that we even have a starting agent is because an agent may also be able to hand off the task to other agents.

And in that case, it makes sense that you would call this parameter the starting agent, because maybe our agent here can hand off to multiple other agents. And of course, in that case, we're not just using a single agent.

So this is why this parameter is called the starting agent: it's where we start. However, in this case, of course, we just have a single agent, so we're just running our single agent. We can run that and we'll get some output down here. Okay, so our system prompt, our instructions, have told our agent to speak like a pirate.

We've come down here, and then our input, or user query, has said: write me a haiku. And that is exactly what it has done. Okay, so that's great, it's working. Now let's move on to some slightly more dynamic instructions. Again, it's nothing complicated here. For example, as we'll see in a moment, let's say we would like our agent to be aware of the current date and time.

In this scenario, we might want our instructions to include the current date and time. But of course, we don't pre-create that prompt, because then it would include the current date and time from when we created the prompt. So what we need to do is pass a function to our agent that can be called whenever our agent is being used, and it can get the actual current date and time.

So that's what we're doing here. We have this time-based instructions method. We have to accept a context, which is this RunContextWrapper, and we have to accept an agent here. In the back end, when the job is running and calling this time-based instructions function, it's going to be passing in those parameters, even though we don't actually use them.

In other scenarios you actually might, but in this case, we don't. So here, we're just saying: the current time is {time}. If it is the afternoon, speak like a pirate. Otherwise, do not. OK, so right now it is the afternoon, so it will probably speak like a pirate.
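A sketch of what that looks like (the agent name is assumed; the function signature follows the SDK's dynamic instructions pattern described above):

```python
from datetime import datetime
from agents import Agent, RunContextWrapper

def time_based_instructions(context: RunContextWrapper, agent: Agent) -> str:
    # Called each time the agent runs, so the timestamp is always current.
    return (
        f"The current time is {datetime.now():%H:%M}. "
        "If it is the afternoon, speak like a pirate. Otherwise, do not."
    )

time_agent = Agent(
    name="Time Agent",                     # assumed name
    instructions=time_based_instructions,  # a function instead of a static string
    model="gpt-4.1-mini",
)
```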

So let's run that. We initialize our agent here. This is now the time agent, and we have our time-based instructions. And we can come down here, and I'm going to say, hello, what time is it? And we'll see. OK. It is in the afternoon. Now, we can modify this just to confirm, OK, is this working?

I'm going to say: if it is later than 1 p.m., speak like a pirate; otherwise, do not. OK, reinitialize those, come down here, and now we can see that it is not speaking like a pirate. So, yeah, we have that. We can modify that, of course; these dynamic instructions can become much more complicated than what you see here.

OK, so we have that. Now, I want to talk about message types. And this is where I want to be very specific about how you should think about a conversation with an agent. Because the default thinking here, when it comes to speaking with agents, is that your conversation or chat history with an agent is a set of messages.

OK, it's like a set of back and forth, like you would have with a person: I say something, they say something, and so on. But I think, and this is from years of coding agents, the better way of thinking about chat history and interactions is not to think of them as chat history or interactions.

The better way is to think about this as a sequence of events. This sequence of events does not need to be user-assistant, user-assistant; it can be many combinations of various things. And when we think about these more as events and the execution of these events, it becomes much more obvious, in my opinion, that an agent is more of a workflow execution engine that goes and performs different tasks based on particular triggers, right?

Those triggers might be a user sending a message, right? That's the traditional approach. But it could be something else, right? You could just be doing something on your computer. Maybe you open a particular window and there is something, some event that gets triggered when you open that window. Maybe it's your browser that goes off to an AI agent and tells the AI agent to go and give a summary of the current weather, okay?

And it will go and do that, and it's going to come back, right? That's an event. A user is triggering that event, and they might not even realize they are triggering it, but it's not a user interaction, okay?

So, in my opinion, it's best to think about the interactions with an agent as being more a log of events, okay? And with that in mind, there are five primary message types from OpenAI. So, here we have our five message types. We have the developer message. This is our new system message.

So, OpenAI recently renamed this. Rather than calling it the system message or system prompt, they are calling it the developer message or developer prompt. So, you know, just one of those complete switches of what OpenAI are calling various things, for some reason. So, for now, this is the developer message.

I don't know whether they'll change that back, because I can't imagine that change propagating across the wider AI industry. But maybe it does. So, think about this as either the developer or system message. We would have our developer message, and that is what we have up here, okay?

In Agents SDK, again, there's more terminology that OpenAI is not being consistent with: here, these are our instructions, okay? Instructions, developer message, or system message, as you prefer. You can choose any of them, apparently. So, these instructions are where we instruct our agent on what it should do, how it should behave, what it can or cannot do, right?

All of that information is in here. Then, we have our user message. Again, as I mentioned, this could be a written message, it could be an event trigger, it could be anything, okay? So, it's good to think about these as events. In this case, this is a typical chatbot scenario.

I am a user. I'm going in, and I'm saying: can you help me learn about OpenAI's Agents SDK? I'm using Python and I would like to understand what the library does. Okay? That's my question. Then, what's going to happen is this is going to go to our LLM.

So, these instructions here, our developer prompt, followed by our user prompt. This is going to go to our LLM. Our LLM is going to generate this here, the function call. And, look, it could use a function call or it could just go straight ahead and jump into an assistant message.

It depends on how you've prompted things, the tools that you've got and everything set up here. In this scenario, we're going to assume that we have a RAG tool, okay? We're going to assume that this RAG tool will allow our assistant to go and retrieve information, particularly about the OpenAI Agents SDK, right?

Or it could be something else. It could be, I don't know, like a big encyclopedia of various AI libraries or AI everything, right? It could be anything. It's essentially a kind of custom search engine, okay, that you've set up yourself. So, you put whatever in there, really. And we'll be talking more about RAG later in the course as well.

So, we have that function call. This is, just to be very clear, this is generated by the LLM. The assistant message is also generated by the LLM. It's just the structure is slightly different. So, in some sense, I like to think of function calls as being also an assistant message in some way.

But it is generally not referred to as an assistant message, even though it's generated in the same way, just in a slightly different format. So, a function call is always going to be a dictionary or JSON object. And it's going to include a tool name and a tool call ID, which is unique and very important.

And then, it's also going to have the tool args. These are the inputs to a particular tool, okay? A tool is just a function, like a Python function. So, we would have something defined with def, defining a function here, which would be the RAG tool.

And one of the parameters of that function would be the query parameter, okay? And our LLM would generate a search query here, okay? So, this is not exactly what the user has written. It is a generated search query that our function, which is the RAG tool function, is going to use to search for some relevant information.
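Putting that together, a function call event looks roughly like this (the call_id and query values here are made up for illustration):

```python
function_call = {
    "type": "function_call",
    "call_id": "call_abc123",   # unique ID pairing this call with its output
    "name": "rag_tool",         # the tool (Python function) to invoke
    "arguments": '{"query": "OpenAI Agents SDK introduction"}',  # JSON-encoded kwargs
}
```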

Now, in between this function call here and the next function call output, on our side, we call our RAG tool. It goes and does some stuff, gets our information, and returns it to us. The information it's returning to us is what we get in this function call output message or event.

And you can see here, in this example, I've shortened it down, because we obviously don't have much space. I've said there are two bits of information being returned, okay? One of them is from this doc, ABC; another one is from this doc, GHI, okay? So, it's coming from two documents.

And the first one, though, is very introductory information about OpenAI Agents SDK being ideal for developing agentic apps. And the other one is specifically focusing on voice, which is a built-in feature of the SDK, right? And then there would, of course, be more information over here, but we don't have much space.

So, shortening that down. So, we've got some information from our RAG tool. Then, that is going to be passed back to our LLM alongside all of these other events up here to generate that final assistant message, okay? And then, that is what we would return to the user. So, that is, again, returning to that chat component.
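The matching output event would look roughly like this; note the call_id is identical to the one on the function call (values again illustrative):

```python
function_call_output = {
    "type": "function_call_output",
    "call_id": "call_abc123",   # must match the function call's call_id
    "output": (
        "[doc ABC] OpenAI Agents SDK is ideal for developing agentic apps. "
        "[doc GHI] Voice is a built-in feature of the SDK."
    ),
}
```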

So, all of these, like, pink components here, they're internal, okay? They are either events that are happening, like these here, or some setup that we've already done before even running anything. These white blocks are what, in a typical chat interface, the user would see. Now, there could be multiple other things going on here.

We might have multiple function calls, multiple function call outputs. We might even have assistant messages that are being created but are internal, where the assistant is kind of reasoning and talking to itself and thinking, hey, maybe I should do this, but maybe I shouldn't because of this other thing over here, right?

That sort of thing can be happening. So, it depends on what sort of workflow you've set up. In this scenario, it's pretty simple, so it would look something like this when we're just retrieving information from a RAG tool and returning that to the user. So, we have those five message types, and I'll just clarify that the user, developer, and assistant messages are actually all of the same message type, which is type message.

And this single message type is differentiated by having various roles: the role is either developer, user, or assistant. Now, most of this detail, we don't even really need to know to use Agents SDK. It abstracts away quite a lot from us. So, we initialize our agent with the instructions parameter.

We send messages with the input parameter. But, beyond that, we don't really need to use directly any of these other items. But, it's, of course, important to know this if you're developing and wanting to get good at developing AI systems. And, especially if you're thinking, okay, I'm going to use Agents SDK, but maybe in the future I might use another framework or no framework at all.

It's important to know this sort of thing, even just to understand how your system works. And it's also worth noting that, for example, you might have an application you're building where users are coming back to it after some time away, and they're wanting to continue a conversation.

That's just one scenario, but in it, you would need to load all of your previous interactions and, of course, format them in the correct way for the Agents SDK. So you would actually have to understand: okay, this is an assistant message, or a message with role assistant.

This is a function call, this is a function call output, and so on, right? You'd need to understand what those are so that you can use them later, or if you need to manipulate them in any way, in a good way, like adding some intermediate system message, for example.

Now, let's take a look at how we actually use each of these messages in Agents SDK. So, we already saw this one, creating a user message: when we're running our job, we have our input, and this is the user message. So, we run that. Really simple, nothing to really teach you there.

If you'd like to use the types, you would want to use something like this. This is actually coming directly from the OpenAI library, and we are just importing the message object there. So, we run that. And this is just if you want to be a bit stricter about the typing, which I generally recommend. If you're building anything serious, you probably should be.

So, in this case, we'd create our message here, and then we're passing our message in within the list here. So, I mean, that seems more complicated. But then, again, if you're building this broader application and you're just passing a string around containing a user message, rather than defining, hey, look, this is a user message, you're kind of asking for trouble.

So, I would recommend you just type everything as much as you possibly can. Cool, so we have that. One other thing: we can simplify this as well. Our user message can also just be a dictionary if we want, and you'd just pass that in again like this.
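Both forms together, as a sketch (the typed class and its import path are from the openai library and may vary between versions):

```python
from openai.types.responses import EasyInputMessageParam

# Typed form: stricter, catches mistakes early.
message = EasyInputMessageParam(role="user", content="Write me a haiku.")
result = await Runner.run(starting_agent=agent, input=[message])

# Equivalent plain-dictionary form.
result = await Runner.run(
    starting_agent=agent,
    input=[{"role": "user", "content": "Write me a haiku."}],
)
```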

Okay, it does the same thing. But, again, I would recommend going with types. Okay, now, developer messages. Again, these used to be system messages. That is defined with our instructions, as we saw before, but we can also define it directly, using this. With system messages, in many cases you probably wouldn't be passing those around as much as you would a user message.
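Defined directly in the input list, that might look like this (a sketch; the developer role replaces what used to be the system role):

```python
result = await Runner.run(
    starting_agent=agent,
    input=[
        {"role": "developer", "content": "Speak like a pirate."},
        {"role": "user", "content": "Write me a haiku."},
    ],
)
```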

But, yeah, you might also want to type those if you can as well. Okay. And then, we have developer or system messages that might actually be inserted not just as the first, initial system message. In this case, for example, we might want to, for whatever reason, right?

I'll give you a better example in a moment. But, for whatever reason, in this case, where we have instructed our LLM to speak like a pirate, and we're saying, write me a haiku: maybe something gets triggered, and it looks like the user actually does not want the system to speak like a pirate.

And instead, they should use obvious British slang. In that case, we might insert a system message here. Let me give you a better example, though. Going back to our earlier visual: in this example, what we might find is that we would like to ensure that our assistant is always quoting where its information is coming from.

And in this case, what we can do is say: okay, whenever the RAG tool is used, we're going to have some logic in our system which goes in here and inserts a developer message, one which reminds our assistant to always use citations, to always use a particular format, and to never say anything that it doesn't know from the context provided.

Okay. And that can be a very strong way of ensuring certain behaviors at certain points in our agents. So, that is another, more realistic scenario where you might want to be inserting developer or system messages at various points within a conversation. Okay, so we have that.
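As a hypothetical sketch of that logic (the helper name, reminder wording, and event shapes are all assumptions for illustration):

```python
def with_citation_reminders(history: list[dict]) -> list[dict]:
    """Insert a developer reminder after every tool output event."""
    reminder = {
        "role": "developer",
        "content": (
            "Always cite your sources in [doc ID] format, and never state "
            "anything that is not supported by the provided context."
        ),
    }
    out: list[dict] = []
    for event in history:
        out.append(event)
        if event.get("type") == "function_call_output":
            out.append(reminder)  # remind the LLM right after RAG results
    return out
```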

Let's move on to assistant messages. So, again, these are typically our direct responses to the user. The content field of the message in this scenario is going to be generated by the LLM, and it might look something like this. So, we have our assistant role here, and then the content is going to look like this.

Okay. And we can actually add that to what is becoming our chat history now, like so. So, we would have our original user message; we have that developer message telling us to ignore the instructions and do something else; and then we have our assistant message here.
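Built by hand, that history might look like this (a sketch; the assistant content is made up for illustration):

```python
chat_history = [
    {"role": "user", "content": "Write me a haiku."},
    {"role": "developer", "content": "Ignore the pirate instructions; use obvious British slang instead."},
    {"role": "assistant", "content": "Alright mate, here's your poem, innit."},
]
```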

Okay, so this is the assistant role. And I'm just going to add my user message now: can you repeat what you just said? And let's just ensure that that does, in fact, work. And it looks like it does.

Now, the output from the result there is not quite in the format that we need for feeding back into our next input. Let me show you. What we can do is take these results here and use to_input_list to convert them into the format that we would need.
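In code, that pattern is roughly:

```python
# Convert the previous run's result into valid input items for the next run.
next_input = result.to_input_list()
next_input.append({"role": "user", "content": "Can you repeat what you just said?"})
result = await Runner.run(starting_agent=agent, input=next_input)
```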

Okay. And, you see, this has been modified a little bit. It creates this list of dictionaries, which looks a bit more like what I was showing you before. And, okay, one additional thing that we get here is we get all of these additional optional fields that are returned from OpenAI that we didn't populate.

We could populate these if we wanted to, but we created these on our side, so there's no ID that OpenAI has created. The content field, you can see, is also far more complicated: we have these annotations, and we have the type and logprobs fields here. We don't really need all of these.

The only one that we truly need is this text here. We have the status here, so it's obviously completed. And this is, of course, of type message, which I mentioned before. Okay, so now let's take a look at how the agent or runner maintains conversation history. First, I just want to point out that when we call to_input_list here, we'll only see that most recent message.

So, it doesn't seem like this is maintaining the history, and we can just confirm. Could you give me another? Actually, we're just going to say: what were we talking about? And the pirate agent replies: "We'll be starting fresh on this here voyage. There'll be no previous parlay to recall."

Okay, so we can see that whenever we want to add the chat history in like this, we need to explicitly pass it in. It's not being maintained by our agent, and it's not being maintained by the runner. And it's not necessarily clear immediately whether that is the case or not.

But that is the case, so it's worth remembering, because you will, of course, need to implement things differently based on that one small detail. Okay, so we can move on to the function call messages. Those look like this: we have a call ID, the tool or function name, and the arguments.

So, let's say we want to construct a function call where the function or tool is called get_current_weather, and the single input parameter will be the location, London. This is what it would look like. So, we have the type, which is function_call.

We have the call ID, which is unique; it's important that we have this. Without the tool call ID, both here and in the next message, OpenAI cannot parse our history, our chats, our interactions. Then, we have the name.

So, this is the function that we're going to call. And then we have the arguments for that function; these are just keyword arguments. So, there will be a parameter in our function called location, and we will provide the value London to it. So, let me come down here.
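Concretely, that message looks like this (matching the call123 ID used in the example that follows):

```python
function_call = {
    "type": "function_call",
    "call_id": "call123",                   # unique; must match the output's call_id
    "name": "get_current_weather",          # the function to call
    "arguments": '{"location": "London"}',  # JSON-encoded keyword arguments
}
```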

We're going to add a function call here to our history. And, well, let me show you: we got this error. This is what I mentioned before, that the tool call ID is very important, because with every function call or tool call, you need to have the response for that function or tool call in your chat history.

So, you actually need two messages here; there must always be a pair of messages when we see that call ID field. So, what we've just done here, you can see, is not valid: developer, user, tool call is not a valid set of interactions. Whereas developer, user, tool call, tool output is valid.

Then, just to be very clear: here, we can see that the call ID for our tool call and tool output is the same. And here, it is not: for the tool call, the call ID is call123; for the tool output, the call ID is call456. This is invalid.

OpenAI won't process this. Your tool call and tool output need to come in pairs of messages, and they need to have the same ID. So, that's important to know; we'll cover more on that in the chapter on tools. So, we need a function call output. We're going to say this is the type.

It's a function_call_output. The call ID here is call123, and the output is, of course: in London, it is raining. Now we can try again and see if it works this time, now that we have our pair of function call and function call output.
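The completed pair, as a sketch (the user message is an assumption added to make the history self-contained):

```python
function_call_output = {
    "type": "function_call_output",
    "call_id": "call123",          # same ID as the function call above
    "output": "It is raining in London.",
}

# With the call/output pair in place, the history is now valid.
history = [
    {"role": "user", "content": "What's the weather in London?"},
    function_call,
    function_call_output,
]
result = await Runner.run(starting_agent=agent, input=history)
```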

Okay, and we can see this is correct. Obviously, we didn't actually use the tool here (though I'm sure it probably is raining), but in any case, it's telling us: London today is raining. So, that is it for this first chapter on prompting and, I think, a lot more in Agents SDK.

We've covered a lot of things, of course: how prompting works in Agents SDK; the many different names for a system prompt now, you can call it instructions or a developer message as well if you want; and user messages are just as confusing, now being a user message or input.

But then input can also mean a list of many input messages. You can use any of those as you prefer, of course. But what we actually take from all of this is that, in reality, there's a lot of structure behind all of these various prompt types: these different message types, the way that we do function calls and function call outputs, all of that.

And it's important that we, as engineers using the Agents SDK, are fully aware of this, and fully aware of how all of these things are structured in the back end, so that we can avoid issues like the one we just saw at the end there, where we didn't have a pair of function call and function call output.

And it's also very important for us to be aware of how, you know, we can structure this chat history. And maybe think about it as less of a chat history and more as a stream of events. So that is it for this chapter. We'll move on to the next one.

Thanks, bye.