Today, we're going to be taking a look at building a minimal agent framework, kind of like LangChain, but without so much overhead. Something really simple that just uses OpenAI's new function calling method and is really minimalistic in terms of what it does, focusing on that single thing: being an agent that can use tools.
Now, OpenAI recently released this function calling feature, and I did do a video on this, so you can check that out. There'll be a link at the top of the video right now. But what it essentially allows us to do is pass a description or a set of instructions on how to use a particular function.
That function can be in Python or any other language; it's language agnostic. And when you pass those instructions to GPT-4 or GPT-3.5, the model will be able to return a JSON response that sets the parameters that should be passed as input to the function, and their values, based on whatever query you asked.
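To make that concrete before we dive in, here's a minimal sketch of that round trip, assuming the pre-v1 openai Python client that was current at the time, and jumping ahead to the circumference example we'll build with in a moment:

```python
import openai

res = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "What is the circumference of a circle with a radius of 5?"}],
    functions=[{
        "name": "circumference_calculator",
        "description": "Calculates the circumference of a circle",
        "parameters": {
            "type": "object",
            "properties": {
                "radius": {"type": "number", "description": "The radius of the circle"}
            },
            "required": ["radius"],
        },
    }],
)
# instead of a plain text reply, the model can return a function_call
# with the arguments encoded as a JSON string, e.g.:
print(res["choices"][0]["message"].get("function_call"))
# {"name": "circumference_calculator", "arguments": "{\"radius\": 5}"}
```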
So in that previous video, I demoed an example of using this to generate a product web page. So there were a couple of items in there: GPT-4 in that case would have to generate a title for the product that I described, and it would have to generate a product description for that product.
And it would then also need to generate a prompt that would be passed to an image generation model that would create an image to be used on that product page. And that was really cool because it was so easy to set up, but straight away you could see the potential of using this sort of thing.
So what I want to do with this minimal agent framework is a similar thing. We're going to be using that OpenAI function calling — of course, we can extend to other models in the future as well. But we want to be using function calling; we want to automatically infer the function instructions from a function's docstring and annotations; we want to include conversational history; and we want to make all of this super robust and just easy to use, with as little overhead as possible.
So before we jump into how we build that agent, I just want to show you how it works. So I'm going to use this notebook here. And all I'm going to do is import from FuncAgent — so it's a function calling agent, hence why I named it FuncAgent.
I don't know if that's a good name or not, but it's what came to my head. I'm going to import the agents file. So in agents, we have right now just a single agent. It's kind of like a ReAct agent, but maybe not as sophisticated for now — but it works.
And again, like I said, I want it to be very minimal. Okay, so what we're going to do is we're going to say agent = agents.Agent. And in here, we have our OpenAI API key — I have set that already, so the OpenAI API key is up here. The model name is gpt-4-0613.
And we also need to include a list of functions that we would like our agent to be able to use. Now, that list of functions is just this circumference calculator up here. So just a really simple example — we'll try something more interesting later. So I'm going to do functions equals, and then I'm just going to pass that in there.
So circumference calculator. Okay. Now, let's run that. And then from there, I'm just going to say agent.ask, and ask: what is the circumference of a circle with a radius of five? Okay. And let's just see what that comes up with.
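To make that concrete, here's a rough sketch of the setup — the import path, the init parameter names, and the function body are my assumptions based on what's verbalized here, so the exact code in the repo may differ slightly:

```python
import os

from func_agent import agents  # assumed import path

def circumference_calculator(radius: float, something: float = 1.0):
    """Calculates the circumference of a circle.

    :param radius: The radius of the circle
    """
    # pi approximated as 3.14 here, which matches the 33.3468 figure the
    # calculator gives later in the video; `something` is just a dummy
    # optional parameter used to demo the schema generation
    return 2 * 3.14 * radius

agent = agents.Agent(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model_name="gpt-4-0613",
    functions=[circumference_calculator],
)
agent.ask("What is the circumference of a circle with a radius of 5?")
```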
Now, you can see here that the function actually has this extra something parameter in there. We'll keep something in there because I want to show you what difference it makes when we're defining the tools, but in reality, we're only going to be using this radius parameter here. So I'm going to take that, and we'll see what it comes back with. At the moment, it's coming back with everything, just because I'm still developing this, but what it will eventually come back with is just the response and not everything.
Okay. So this is what we have: the circumference of a circle is calculated using the formula C = 2πr, where r is the radius of the circle. So if the radius of the circle is five units, then the circumference C would be — and we get this — 31.42 units.
Now, how did it get to that answer? Because these models, LLMs, are just really bad at basic maths. So let's try and answer this question without the agent. I'm going to initialize a new model: just import openai, set the OpenAI API key, and then I'm going to run this.
So the model is what we said before, gpt-4-0613. Messages — let me get those down here. So messages are going to be equal to: first, we have the role system, and I'm going to replicate what we're doing inside the FuncAgent.
So we're going to say content is equal to the agent's system message, okay, here. So let me show you what this looks like — let me print it, maybe that will be nicer. So I think this is essentially just a copy, maybe slightly different, of the LangChain agent system message. So we have this.
That's the system message, and then following that, we're going to have our user question. So user content, and that is just going to be our query from before. So what was our query? It was this. Okay, so there are our messages — oops, let's fix that — and we are going to put those in here.
Now, functions. For this, we actually need to get the function instructions that have been created automatically by our agent. So that will be agent.functions, if I'm not wrong. So you can actually see the description, or the instructions, that are generated automatically when we pass our function into the agent.
We'll explain those in a little more detail soon. So we can run that, but actually, what I want to do first is try it without those functions. And maybe what we'll do is make this a little bit harder by using a radius of 5.31, something like that, and see what we get with and without the function, okay?
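As a sketch, that raw no-tools call looks something like this — agent.sys_msg is my guess at the attribute name for the system message we printed a moment ago:

```python
import openai

query = "What is the circumference of a circle with a radius of 5.31?"

res = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[
        {"role": "system", "content": agent.sys_msg},  # assumed attribute name
        {"role": "user", "content": query},
    ],
    # functions=agent.functions,  # the auto-generated instructions;
    #                             # left out here so the model has no tools
)
print(res["choices"][0]["message"]["content"])
```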
So let's run that, okay, and let's run the circumference calculator to see what we should be getting. So if we do 5.31 — 5.31, okay — we should get 33.3468. And here we get 33.39, so it's close, but it's not actually correct, which is not ideal. So if we come up to here, what I'm going to do is rerun the agent and see what it gives us if we ask the same question, so 5.31.
And we'll see if this answer is any better than what we just got down there, which was not quite accurate, okay? So we come here and we can see that we're getting this 33.35 units this time, okay? So if we compare that to, let me remove these bits here.
If we compare that to the circumference calculator, we get 33.346, right? And the answer we got is actually 33.35, so it's just rounding up. So actually, it's correct because we're actually using that circumference calculator in the tooling. So that's kind of the point of using this agent, like it can do things that a large language model by itself cannot do, so it can rely on these external tools.
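As a quick sanity check on those numbers: they line up if we assume the calculator approximates π as 3.14, since 2 × 3.14 × 5.31 = 33.3468 exactly, and rounding that to two decimal places gives the 33.35 the agent reports. With full-precision π you'd get 2π × 5.31 ≈ 33.364 — so either way, the raw model's 33.39 is just its own faulty mental arithmetic.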
And it also allows us to automatically generate these from just a Python function, right? So let me just go through that a little bit as well. We have our name, the description, and we have these parameters, right? These are all the things that are needed by this functions parameter from OpenAI's chat completion endpoint.
All of these are needed. Now, how did we create those? Well, if you take a look at this, you can kind of see all this information. Let me maybe even bring this up here. You can see that all of this information is contained within the function definition here. So we have the name: it's circumference calculator.
We have the parameters, right? So in parameters, we have radius here, which is a float in the Python function, and that becomes number here in the schema. And we also have this something, right? Something is included here, and that again is a number, or float. So all of that information is contained within there. We also have the description, at least for the radius, because the description is actually contained within the docstring.
We don't include a description for the something variable, so that one is just empty. And something is also not a required parameter, because we set a default value for it here — so in reality, all we need is the radius. Okay? So GPT-4 or GPT-3.5 reads this information, and based on that, it will return instructions on how to use this function when we ask a query, and how to satisfy that query.
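Putting that together, the instructions generated from the circumference_calculator sketched earlier should look roughly like this — my approximation of what's on screen, not a verbatim copy:

```python
{
    "name": "circumference_calculator",
    "description": "Calculates the circumference of a circle.",
    "parameters": {
        "type": "object",
        "properties": {
            # the float annotation maps to "number" in the JSON schema
            "radius": {"type": "number", "description": "The radius of the circle"},
            # no :param line in the docstring, so the description is empty
            "something": {"type": "number", "description": ""},
        },
        # something has a default value, so only radius is required
        "required": ["radius"],
    },
}
```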
And we can kind of see that happening in here. So this, you know, it's really simple. Again, like I said, I want this to be as simple as possible, very minimal, and also easy to read, all this sort of stuff. So if we go to the parser file within our FuncAgent, we can see what we're doing.
Okay? So again, this is like a first iteration; it's definitely not complete. So we have func_to_json, okay, which is used by an agent when it sees a function. We're using inspect to get the function annotations, the docstring, the descriptions, all this sort of stuff. So we're just using all of that information from our function's annotations and docstring to construct the instructions that are required by function calling in OpenAI.
Okay? And that's all we're doing. There's nothing, you know, nothing that complicated going on there. I mean, this whole file is 59 lines of code, and it could probably be much less as well. It's really straightforward. The one thing I will say is that it does require that we use this particular syntax for the docstring, for now.
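Here's a minimal sketch of how a func_to_json-style parser can work — this is my reconstruction of the idea, not the exact FuncAgent code, and it assumes a reST-style ":param name: description" docstring:

```python
import inspect
import re

# Python types -> JSON schema types
TYPE_MAP = {int: "integer", float: "number", str: "string", bool: "boolean"}

def func_to_json(func) -> dict:
    sig = inspect.signature(func)
    docstring = inspect.getdoc(func) or ""
    # the first line of the docstring becomes the function description
    description = docstring.split("\n")[0]
    # pull ":param name: description" pairs out of the docstring
    param_docs = dict(re.findall(r":param (\w+): (.+)", docstring))
    properties, required = {}, []
    for name, param in sig.parameters.items():
        properties[name] = {
            "type": TYPE_MAP.get(param.annotation, "string"),
            "description": param_docs.get(name, ""),
        }
        # parameters without a default value are required
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }
```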
In the future, of course, we'll probably extend that to other, like, common docstring formats as well. Okay. Cool. So we have that. That's our parser. And then we also have the agents file, which contains the agent itself, all of those instructions. So in the agent, what we need for it to work as a fully functional conversational agent is a few things.
First, we need the LLM itself. So we're kind of initializing — not the LLM itself, but we have the model name here, and we pass that to OpenAI when we're generating some text. We need those functions that it can use, right? That's where we're using the parser I just mentioned, right?
So we can see we have that parser, func_to_json, applied to all of the functions that we pass in via the functions list here, right? There don't have to be any functions in there, but obviously, if we want to use an agent with tools, we kind of do want to include them.
Then what we're doing is creating this function mapping. So basically, when the LLM — GPT-4 or GPT-3.5 — comes back to us, it's going to say: you need to use this function, like the circumference calculator, with these parameters, which it will give us in a JSON format. So we need a way of mapping the name of each function to the function itself.
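A sketch of that mapping — it really is just one line, since the model returns the function name as a string and we need the callable back:

```python
# map each function's name (what the LLM returns) to the callable itself
func_mapping = {func.__name__: func for func in functions}

# later, when the model asks for "circumference_calculator":
# func_mapping["circumference_calculator"](radius=5.31)
```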
So that's all we're doing there — again, super simple, nothing complicated at all going on. And then this is the bit that makes it conversational, right? We need to have a chat history. That chat history allows us to have multiple messages and continue a conversation with our agent, rather than just having a single query, getting a single response, and then starting all over again.
So that chat history allows us to have a log of our interactions with the agent, and essentially have that past history of interactions considered with every new query coming in. So all those are super important. And with the chat history, we can actually come over here, and we can access that chat history.
So let me just remove the bits I don't necessarily want here. So let's remove this, this, and this, okay? So we have our agent — let's have a look at agent.chat_history. Okay, so we can see what is happening there. We have our query, and then we're logging the response from our AI.
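For illustration, the chat history is just a plain list of OpenAI-style message dicts — something like this (a hypothetical rendering, not copied from the notebook):

```python
agent.chat_history
# [{'role': 'user',
#   'content': 'What is the circumference of a circle with a radius of 5.31?'},
#  {'role': 'assistant',
#   'content': 'The circumference of a circle with a radius of 5.31 units is approximately 33.35 units.'}]
```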
Now what I can do is say, okay, agent.ask: what is the circumference if we double the radius? And let's see what comes back. We're not specifying the number here, so it's going to have to refer to that conversational history in order to answer the new query.
And we can see, okay, it's explaining what it's doing, and it comes back with: the circumference of the circle with the doubled radius would be 66.7 units, right? And it even says this is simply double the original circumference, because the circumference of a circle scales linearly with the radius, right? And okay, in this case, it doesn't actually use the circumference calculator, because all it needs to do is double the previous calculation that we got, right?
Which was the 33.35 units. So from doubling this, we get this 66.7, right? The reason it can do that, without specifying the radius that we're doubling here is because it's actually just referring to that past conversational log, it has access to this conversational history, okay? And that's why we get that.
So yeah, we can copy this, and now we can see our new chat history, which is slightly longer, of course, okay, so we get this. Now, that conversational history is super important in making our agent more conversational, which is really cool, and pretty simple to do — it's not exactly hard.
But there are other things as well. So you'll see that I have this really simple print of a period here, right? And then here, there were two of them. This is just me making it so I can see what the agent is actually doing. But we can see that this is coming from here.
So, the generate_response method here, okay? Now, what is this doing? Why is it, with one single query, generating more than one response? Well, that is because if we just do one response — let me come back to here — we're going to just get one item here, right?
So in the previous video, where I went through function calling, what I showed is that you send your query to OpenAI, and it doesn't run the function for you — GPT-4 isn't running the function for you — it's returning instructions and parameters that you can then feed into the function in order to get the answer, right?
And the same is true here, right? So we're taking one LLM call to create those parameters for the function. But then after that, we need to feed those parameters into the function, get our answer, and then, if we want to return a conversational response, we need to feed that answer back into the LLM and ask it to give us the final answer, right?
So we actually do that here as well. Let me show you: when we're making a query — when we ask something here — we initialize this internal thoughts list. Now, the internal thoughts are kind of like the conversational history, but just for the LLM — almost like an internal monologue for the LLM, right?
So inside, the LLM is going to go to generate_response up here, right? We're going to generate that response, and it's going to return: you need to use this circumference calculator tool, and here are the parameters that you need to input, okay? So we get that. And the finish reason that we're going to have in that response is not going to be stop, it's going to be function_call, right?
So if the finish reason is function_call, we need to go to handle_function_call, and handle_function_call is essentially just going to take the response from the LLM and feed it into one of our functions here, okay? So here we're loading the parameters that GPT-4 has given us, then we're getting the function that we need to use, and then we're just feeding those parameters into the function and getting our answer, okay?
So then we have our answer, and what we do is feed it back to the LLM — or rather, we append it to those internal thoughts as a new message from the assistant to itself, okay? And within that message, we just say: the answer is this result. So that is the answer produced by our circumference calculator, okay?
So then that's added to the internal thoughts. We come back to this here, right — because this is in a while loop — and that's going to keep going until we get to this stop finish reason, okay, which will probably happen on the next iteration. So we've got our answer from the function, we've fed that back into the LLM, and we're asking it to generate again, okay?
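Pulling those pieces together, here's a rough reconstruction of that ask-and-loop flow as it might appear inside the Agent class — the names and details are my assumptions from what's on screen, not a verbatim copy, and edge cases are glossed over:

```python
import json

import openai

# hypothetical method body, assuming attributes set up in __init__:
# self.model_name, self.functions (the generated instructions),
# self.func_mapping, self.chat_history
def ask(self, query: str) -> str:
    self.internal_thoughts = []  # the agent's internal monologue
    self.chat_history.append({"role": "user", "content": query})
    finish_reason = None
    while finish_reason != "stop":
        print(".", end="")  # the debug period mentioned earlier
        res = openai.ChatCompletion.create(
            model=self.model_name,
            messages=self.chat_history + self.internal_thoughts,
            functions=self.functions,
        )
        finish_reason = res["choices"][0]["finish_reason"]
        message = res["choices"][0]["message"]
        if finish_reason == "function_call":
            # parse the JSON arguments the model gave us and run the tool
            name = message["function_call"]["name"]
            kwargs = json.loads(message["function_call"]["arguments"])
            result = self.func_mapping[name](**kwargs)
            # feed the result back in as a message from the assistant to itself
            self.internal_thoughts.append(
                {"role": "assistant", "content": f"The answer is {result}."}
            )
    answer = self.final_thought_answer()  # sketched below
    self.chat_history.append({"role": "assistant", "content": answer})
    return answer
```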
So the LLM is now going to see that, and it's going to say: okay, here is the answer in my internal thoughts messages, so actually, I can now generate the final answer, right? So it comes to here, and it goes onto the final thought answer step, or function. What we do there is take all those internal thoughts and put them together into a single string — so those are all the thoughts we've been going through, like: I am going to call this function with these parameters; the answer is this result — and then that is followed by this little message: "Based on the above, I will now answer the question. This message will only be seen by me, so answer with the assumption that the user has not seen this message." Now, this is important.
If you don't include this, the LLM is going to respond with something like: hey, you're right, that is the correct answer, well done — which is obviously not what we want, because this is the internal monologue of the AI, not the user responding to the AI. So we need to specify: these internal thoughts that you're having are just for you; the user is not going to see them, so you need to answer with that in mind. And the LLM does actually do that, as we've seen.
So we get our final thought, and we feed that into the chat completion — so we have our chat history plus that final thought; not the whole list of internal thoughts, but just that single formatted final thought — and then we also specify not to use any functions. What I found is that if it got a question like, you know, what is the circumference, it might be tempted to use the circumference tool all over again. So it's like: okay, don't use any functions.
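And a matching sketch of that final step — again a hedged reconstruction rather than the repo's exact code:

```python
def final_thought_answer(self) -> str:
    # flatten the internal monologue into one string
    thoughts = "\n".join(t["content"] for t in self.internal_thoughts)
    final_thought = {
        "role": "assistant",
        "content": (
            f"{thoughts}\n\n"
            "Based on the above, I will now answer the question. This message "
            "will only be seen by me, so answer with the assumption that the "
            "user has not seen this message."
        ),
    }
    res = openai.ChatCompletion.create(
        model=self.model_name,
        messages=self.chat_history + [final_thought],
        functions=self.functions,
        function_call="none",  # explicitly forbid another tool call
    )
    return res["choices"][0]["message"]["content"]
```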
And then we return, okay? And from that, we actually get the answer. So there are many things going on here, even though it's a very simple agent: we have the fact that it's able to call these functions, that it has this conversational history, and that it has this internal monologue, right?
But we've done all of that in, what, 114 lines of code for this agent, and several of those lines are actually just the system message up here. So in reality, it's kind of simple, and it achieves what I wanted, which was to make this as minimal as possible.
Now what I want to do is just try this out on something that is, in my opinion, a little more interesting, okay? So with those very few lines of code, plus OpenAI's function calling, which admittedly is doing most of the heavy lifting here, we get, I don't know, a really cool, minimal agent that we can use that includes all these cool features.
So yeah, I just wanted to share that — something that I worked on that didn't take a huge amount of time to put together. But you can also use it yourself. So, Aurelio Labs on GitHub: you go to FuncAgent, and it's here. It will also be on PyPI, so you'll also be able to just do pip install FuncAgent, and you can use it in the same way that I use it — put some more interesting functions in there and just see what you can do with it.
One thing that I will try — maybe not in this video, because we've already been talking for a little while — is in the Aurelio Labs cookbook. Last week, I created the function calling example there, and I think this one is a little more interesting: this is where we're creating that product page.
I think I'm going to give that a go with this FuncAgent as well and see how it goes. But yeah, I'm not going to do that in this video, just for the sake of time. So yeah, I hope this is interesting. I thought it was just kind of an interesting project, to see how much we can do with very little time and also very few lines of code.
And a fun little experiment, just to better understand how a conversational agent actually works with all these different components, how they interact, and so on. But yeah, for now, that's it for this video. I hope this has been interesting, and maybe useful. Again, like I said, there is the FuncAgent repo on GitHub.
Feel free to go ahead and improve that, submit any issues you have, or PRs, or whatever. And we'll maybe try and make that a little more robust than it currently is. But yeah, I'll leave it there. So thank you very much for watching. And I will see you again in the next one.
Bye.