Building Chatbot Agents from Scratch with OpenAI Functions!
Chapters
0:00 OpenAI Functions Agent
0:26 Recap of OpenAI Function Calling
2:34 How OpenAI Functions Agent Works
5:42 Answering Questions Without Tools
9:50 Generating Function Instructions
13:13 Agent Code
15:13 Giving an Agent Conversational Memory
17:25 Agent Internal Thoughts
23:29 Agent Overview
Today, we're going to be taking a look at building a minimal agent framework, kind of like LangChain, but without so much overhead. Something really simple that just uses OpenAI's new function calling method, is really minimalistic in terms of what it does, and focuses on that single thing: being an agent that can use tools.
Now, OpenAI recently released this function calling feature, and I did do a video on this, so you can check that out; there'll be a link at the top of the video right now. But what it essentially allows us to do is pass a description, or a set of instructions, on how to use a particular function. That function can be in Python or any other language; it's language agnostic. And when you pass those instructions to GPT-4 or GPT-3.5, they will be able to return a JSON response which sets the parameters that should be included as input to this function, and their values, based on whatever query you have asked.
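To make that recap concrete, here is a minimal sketch of a raw function calling request with the pre-v1 openai Python client; the circumference function schema here is just an illustrative example I've written for this sketch, not something taken from the video.

```python
import json
import openai  # assumes OPENAI_API_KEY is set in the environment

# A hand-written function description; later on, FuncAgent generates this for us.
functions = [{
    "name": "circumference_calculator",
    "description": "Calculates the circumference of a circle.",
    "parameters": {
        "type": "object",
        "properties": {
            "radius": {"type": "number", "description": "The radius of the circle"},
        },
        "required": ["radius"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "What is the circumference of a circle with a radius of 5?"}],
    functions=functions,
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # GPT doesn't run the function; it returns the name and JSON arguments for us to use.
    args = json.loads(message["function_call"]["arguments"])
    print(message["function_call"]["name"], args)
```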
So in that previous video, I demoed an example of using this to generate a product web page. GPT-4 in that case had to generate a title for the product I described, a product description for that product, and also a prompt that would be passed to an image generation model to create an image for that product page. And that was really cool, because it was so easy to set up, but straight away you could see the potential of using this sort of thing.
So what I want to do with this minimal agent framework is a similar thing. We're going to be using OpenAI function calling (of course, we can extend that to other models in the future as well), but for now we want to use function calling, automatically infer the function instructions from a function and its docstring or anything else we have in there, include conversational history, and make all of this super robust and easy to use with as little overhead as possible.
So before we jump into how we build that agent, let me quickly show you how we use it. All I'm going to do is import from FuncAgent. It's a function calling agent, hence why I named it FuncAgent; I don't know if that's a good name or not, but it's what came to my head. In agents, we have right now just a single agent. It's kind of like a ReAct agent, but maybe not as sophisticated for now, but it works. And again, like I said, I want it to be very minimal.
Okay, so what we're going to do is say agent equals agents.Agent. We need to pass in an OpenAI API key, and I have set that already up here. And we also need to include a list of functions that we would like our agent to be able to use. Now, those functions are just this circumference calculator up here. So I'm going to do functions equals, and then I'm just going to pass that in there. And then from there, I'm just going to say agent.ask: what is the circumference of a circle with a radius of five?
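Roughly, the usage looks like this; the import path, constructor arguments, and ask method are my best guesses at the library's interface from what's shown on screen, so treat this as a sketch rather than the exact API.

```python
import math
from funcagent import agents  # assumed import path

def circumference_calculator(radius: float, something: float = 0.0) -> float:
    """Calculates the circumference of a circle.

    :param radius: The radius of the circle
    """
    # 'something' is an optional extra parameter kept only to show how defaults
    # and missing docstring descriptions are handled; it is not actually used.
    return 2 * math.pi * radius

agent = agents.Agent(
    openai_api_key="OPENAI_API_KEY",        # assumed parameter name
    functions=[circumference_calculator],   # plain Python functions, no manual JSON schema
)

# 2 * pi * 5 is roughly 31.42, so we can sanity-check the agent's answer.
print(agent.ask("What is the circumference of a circle with a radius of 5?"))
```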
So we can see here, actually, I have this extra "something" parameter in there. We'll keep it in there because I want to show you what difference that makes when we're defining the tools, but in reality, we're only going to be using the radius parameter. So I'm going to run that and we'll see what it comes back with. Now, at the moment it's coming back with everything, just because I'm still developing this, but what it will eventually come back with is just the response and not everything. The circumference of a circle is calculated using the formula C = 2πr, where r is the radius. So if the radius of the circle is five units, then the circumference C would be about 31.42 units. And we do want to double-check that, because these models, LLMs, are just really bad at basic maths.
So let's try and answer this question without the agent. I'm just going to import openai, set the OpenAI API key, and then run this. The model is what we said before, GPT-4, the 0613 version. Then the messages: first we have the system role, and I'm going to replicate what we're doing inside FuncAgent, so the content is equal to the agent's system message, here. I think this is essentially just a copy, maybe slightly different, of the LangChain one. That's the system message, and then following that, we're going to have our user question. So a user role, and the content is just going to be our query from before.
Okay, so those are our messages. Now, functions: for this, we actually need to get the function instructions that have been created automatically by our agent. That will be agent.functions, if I'm not wrong. So you can actually see the description, or the instructions, that are generated automatically from the Python function. We'll explain those in a little more detail soon.
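As a sketch of that manual call (reusing the agent from the earlier snippet for its auto-generated agent.functions; the system message below is a stand-in, not FuncAgent's actual prompt):

```python
import openai

openai.api_key = "OPENAI_API_KEY"

system_message = "You are a helpful assistant."  # stand-in for FuncAgent's system message
query = "What is the circumference of a circle with a radius of 5.31?"

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": query},
]

# Without the functions parameter, GPT-4 has to do the arithmetic itself.
res = openai.ChatCompletion.create(model="gpt-4-0613", messages=messages)
print(res["choices"][0]["message"]["content"])

# With the agent's auto-generated function instructions, the model can instead
# ask us to run circumference_calculator and get an exact answer.
res = openai.ChatCompletion.create(
    model="gpt-4-0613", messages=messages, functions=agent.functions
)
print(res["choices"][0]["message"])
```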
So we can run that, but actually, what I want to do first is try it without those functions. And maybe what we'll do is make this a little bit harder by using a radius of 5.31, something like that, and see what we get with and without the function, okay? So let's run that, and let's use the circumference calculator to see what we should be getting. If we do 5.31, you should get 33.3468. And here, without the function, we get 33.39, so it's close, but it's not actually correct, which is not ideal.
So if we come up to here, what I'm going to do is rerun the agent and see what it gives us if we ask the same question with 5.31, and we'll see if this answer is any better than what we just got down there. We come here and we can see that we're getting 33.35 units this time, okay? So if we compare that to the circumference calculator (let me remove these bits here), we get 33.346, right? And the answer we got is 33.35, so it's just rounding up. So it's actually correct, because we're actually using that circumference calculator in the background. That's kind of the point of using this agent: it can do things that a large language model by itself cannot do, because it can rely on these external tools.
And it also allows us to automatically generate these function instructions from just a Python function, right? So let me go through that a little bit as well. We have the name, the description, and these parameters. These are all things that are needed by the functions parameter of OpenAI's chat completion endpoint. Well, if you take a look at this, you can see that all of this information is contained within the function definition here. We have the name, which is circumference_calculator. For parameters, we have radius here, which is a float in Python and becomes a number in the JSON. The something parameter is included too, and that again is a number, or float. So all of that information is contained within there. We also have the description, at least for radius, because that description is contained in the docstring; we don't include a description for the something variable. And something is not a required parameter, because we set a default value for it here. So GPT-4 or GPT-3.5 reads this information, and based on that, it will return instructions on how to use this function when we ask a query that needs it.
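Putting that together, the auto-generated instructions for the circumference calculator presumably look something like this (a hand-written sketch of the JSON, not output copied from the library):

```python
functions = [
    {
        "name": "circumference_calculator",
        "description": "Calculates the circumference of a circle.",
        "parameters": {
            "type": "object",
            "properties": {
                # radius gets a description because it appears in the docstring
                "radius": {"type": "number", "description": "The radius of the circle"},
                # 'something' has no docstring entry, so no description
                "something": {"type": "number"},
            },
            # 'something' has a default value, so only radius is required
            "required": ["radius"],
        },
    }
]
```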
And we can kind of see that happening in here. Again, like I said, I want this to be as simple as possible and very minimal. So if we go to the parser file within FuncAgent, we can see what we're doing. We have func_to_json, which is used by the agent when it sees a function. We're using inspect to get the function annotations, the docstring, the descriptions, all this sort of thing. So we're just using all of that information from the annotations of our function and from the docstring to construct the instructions that are required by function calling in OpenAI. There's nothing that complicated going on there; this whole file is 59 lines of code, and it could probably be much less as well. The one thing I will say is that it does require we use this syntax for the docstring for now. In the future, of course, we'll probably extend that to other common docstring formats as well.
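To give a feel for what the parser does, here is a minimal sketch in the same spirit; it is not the library's actual code, and it assumes reST-style ":param name: description" docstrings like the one in the earlier sketch.

```python
import inspect
import re

# Map Python annotations to JSON Schema types.
TYPE_MAP = {float: "number", int: "integer", str: "string", bool: "boolean"}

def func_to_json(func):
    """Build OpenAI function-calling instructions from a Python function."""
    sig = inspect.signature(func)
    doc = inspect.getdoc(func) or ""
    # Pull ':param name: description' lines out of the docstring.
    param_docs = dict(re.findall(r":param (\w+): (.+)", doc))

    properties, required = {}, []
    for name, param in sig.parameters.items():
        prop = {"type": TYPE_MAP.get(param.annotation, "string")}
        if name in param_docs:
            prop["description"] = param_docs[name]
        properties[name] = prop
        # Parameters without a default value are required.
        if param.default is inspect.Parameter.empty:
            required.append(name)

    return {
        "name": func.__name__,
        "description": doc.split("\n")[0],  # first docstring line as the description
        "parameters": {"type": "object", "properties": properties, "required": required},
    }
```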
And then we also have the agents file, which contains the agent itself and all of those instructions. In the agent, there are a few things we need for it to work as a fully functional conversational agent. We're kind of initializing, not the LLM itself, but the model name here, and we pass that to OpenAI when we're generating some text. We need those functions that it can use, and that's where we use the parser I just mentioned: we run func_to_json over all of the functions that we're given. There don't have to be any functions there, but obviously, if we want an agent that uses tools, we should pass some in. Then what we're doing is creating this function mapping. Basically, when the LLM, GPT-4 or GPT-3.5, comes back to us, it's going to say: you need to use this function, like the circumference calculator, with these parameters, and it refers to the function by name. So we need a way of mapping the name of each function to the function itself. That's all we're doing there; again, super simple.
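A rough sketch of that initialization, using the func_to_json sketch from above; the attribute names are my guesses rather than the library's.

```python
class Agent:
    def __init__(self, openai_api_key: str, model_name: str = "gpt-4-0613", functions=None):
        self.openai_api_key = openai_api_key
        self.model_name = model_name          # passed to OpenAI whenever we generate text
        functions = functions or []
        # Auto-generated function calling instructions for each Python function.
        self.functions = [func_to_json(f) for f in functions]
        # Map each function's name back to the callable, so we can run whatever GPT asks for.
        self.func_mapping = {f.__name__: f for f in functions}
        # Conversation log, so every new query sees the past interactions.
        self.chat_history = []
```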
And then this is the bit that makes it conversational. The chat history allows us to have multiple messages and continue a conversation with our agent, rather than just having a single query, getting a single response, and then starting over. That chat history gives us a log of our interactions with the agent, and essentially lets that past history of interactions be considered with every new query coming in.
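That log is just a list of chat messages; after the first exchange it would hold something roughly like this (illustrative values, not captured output):

```python
chat_history = [
    {"role": "user", "content": "What is the circumference of a circle with a radius of 5.31?"},
    {"role": "assistant", "content": "The circumference of the circle is approximately 33.35 units."},
]
```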
And with the chat history, we can actually come over here and access it. So let me just remove the bits I don't necessarily want here. We have our agent; let's have a look at agent.chat_history. We have our query, and then we're logging the response from our AI. Now what I can do is say agent.ask: what is the circumference if we double the radius? And let's see what comes back.
We're not specifying the number here, so it's going to have to refer to that conversational history. And we can see it's explaining what it's doing: the circumference of the circle with the doubled radius would be 66.7 units, right? It even says this is simply double the original circumference, because the circumference of a circle scales linearly with the radius. And okay, in this case it doesn't actually use the circumference calculator, because all it needs to do is double the previous calculation that we got. So from doubling this, we get 66.7. The reason it can do that without us specifying the radius we're doubling is that it's just referring to that past conversational log; it has access to this previous interaction. So we can print this out, and now we can see our new chat history, which is slightly longer, of course. Now, that conversational history is super important in making our agent more conversational, which is really cool, and pretty simple to do; it's not exactly hard.
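As a usage sketch, with the same hypothetical interface as before:

```python
# Follow-up question with no radius given; the agent resolves it from the chat history.
print(agent.ask("What is the circumference if we double the radius?"))

# The conversation log now holds both exchanges.
for message in agent.chat_history:
    print(message["role"], ":", message["content"])
```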
Now, you'll see that I have this really simple print statement here that just prints a period; that's just for me, so I can see what the agent is actually doing. And we can see from this that, with one single query, it's generating more than one response. Why is that? Well, it's because a single response wouldn't be enough. In the previous video, where I went through function calling, what I showed is that you send your query to OpenAI, and it doesn't run the function for you. GPT-4 isn't running the function for you; it's returning instructions and parameters that you can then feed into the function in order to get the answer. So we're using one LLM call to create those parameters for the function. But then after that, we need to feed those parameters into the function, get our answer, and then, if we want to return a conversational response, we need to feed that answer back into the LLM and ask it to give us the final answer.
So let me show you. When we make a query, when we ask something here, we initialize the internal thoughts. The internal thoughts are kind of like the conversational history, but just for the agent, almost like an internal monologue for the LLM. Inside, the agent is going to go to the generate response step up here. We generate that response, and it comes back saying: you need to use this circumference calculator tool, and here are the parameters you need to input. And the finish reason we get in that response is not going to be "stop"; it's going to be a function call. So if the response is a function call, we go to handle function call, which essentially just takes the response from the LLM and feeds it into one of our functions. Here we're loading the parameters that GPT-4 has given us, then we're getting the function that we need to use, and then we're just feeding those parameters into that function. So then we have our answer, and what we do is feed it back to the LLM, or rather into those internal thoughts, as a new message from the assistant to itself, and within that message, we just say: the answer is this result. So that is the answer produced by our circumference calculator, and it's added to the internal thoughts. Then we come back to the top, because this is in a while loop, and that's going to keep going until we get to the "stop" finish reason, which will probably happen on the very next call. We've got our answer from the function, we've fed that back into the LLM, and the LLM is now going to see that and say: okay, here is the answer in my internal thoughts, so I can now generate the final answer.
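A condensed sketch of that loop, reconstructed from the description above rather than taken from FuncAgent's source; method and attribute names are guesses.

```python
import json
import openai

class Agent:
    # (continuing the Agent sketch from earlier; __init__ sets model_name,
    #  functions, func_mapping and chat_history)

    def ask(self, query: str) -> str:
        # Internal monologue for this query: the agent's notes to itself.
        self.internal_thoughts = []
        self.chat_history.append({"role": "user", "content": query})

        while True:
            res = openai.ChatCompletion.create(
                model=self.model_name,
                messages=self.chat_history + self.internal_thoughts,
                functions=self.functions,
            )
            choice = res["choices"][0]
            if choice["finish_reason"] == "stop":
                break  # the model no longer wants to call a tool
            if choice["finish_reason"] == "function_call":
                self.handle_function_call(choice["message"])

        answer = self.final_thought_answer()  # final step, covered next
        self.chat_history.append({"role": "assistant", "content": answer})
        return answer

    def handle_function_call(self, message) -> None:
        # GPT gives us a function name and JSON arguments; we run the real Python function.
        name = message["function_call"]["name"]
        args = json.loads(message["function_call"]["arguments"])
        result = self.func_mapping[name](**args)
        # Log the result as the assistant talking to itself.
        self.internal_thoughts.append(
            {"role": "assistant", "content": f"The answer is {result}."}
        )
```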
So it comes to here, and it goes to the final thought answer step, or function. What we do there is take all those internal thoughts and put them all together into a single string, so that's all the thoughts we've been going through, like "I am going to call this function", "Response: the answer is this", and then that is followed by this little message here: "Based on the above, I will now answer the question", and "answer with the assumption that the user has not seen this message". If you don't include this, the LLM is going to respond with "hey, you're right, that is the correct answer, well done", which is obviously not what we want, because this is the internal monologue of the AI, not the user responding to the AI. So we need to specify that these internal thoughts are just for you, they're not for the user, the user is not going to see them, so you need to answer with that in mind; and the LLM does actually do that, as we've seen. So we get our final thought, and we feed that into the chat completion along with our chat history, so not the list of thoughts, but just that single formatted final thought, and then we also specify not to use any functions. What I found is that, if it got a question like "what is the circumference", it might be tempted to use the circumference tool again, so we tell it: okay, don't use any functions here.
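And, continuing the same sketch, the final step might look roughly like this; the prompt wording is paraphrased from the video, not copied from the library.

```python
import openai

class Agent:
    # (continuing the Agent sketch from earlier)

    def final_thought_answer(self) -> str:
        # Collapse the internal monologue into a single assistant message.
        thoughts = "\n".join(t["content"] for t in self.internal_thoughts)
        final_thought = {
            "role": "assistant",
            "content": (
                thoughts
                + "\n\nBased on the above, I will now answer the question. "
                "The user has not seen this message, so I will answer with that in mind."
            ),
        }
        res = openai.ChatCompletion.create(
            model=self.model_name,
            messages=self.chat_history + [final_thought],
            functions=self.functions,
            function_call="none",  # stop it reaching for the circumference tool again
        )
        return res["choices"][0]["message"]["content"]
```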
So there are many things going on here, even though it's a very simple agent: it's able to call these functions, it has this conversational history, and it has this internal monologue. But we've done all of that in, what, 114 lines of code for this agent, and several of those lines are actually just the system message up here. So in reality, it's pretty simple, and it gets me what I wanted, which was to make this as minimal as possible.
Now, what I want to do next is try this out on something that is, in my opinion, a little more interesting. With those very few lines of code, plus OpenAI's function calling, which admittedly is doing most of the heavy lifting here, we get a really cool, minimal agent that we can use, and that includes all these cool features. So yeah, I just wanted to share that, something that I worked on that didn't exactly take a huge amount of time. It's under Aurelio Labs on GitHub: you go to FuncAgent, and it's here. It will also be on PyPI, so you'll also be able to just do pip install FuncAgent, use it in the same way that I've used it, put some more interesting functions in there, and just see what you can do with it.
One thing that I will try, though maybe not in this video because we've already been talking for a little while, is in the Aurelio Labs cookbook. Last week, I created the function calling example there, and I think this is a little more interesting: it's where we create that product page. I think I'm going to give that a go with FuncAgent as well and see how it goes, but not in this video, just for the sake of time. I thought this was just kind of an interesting project, to see how much we can do with very little time and very few lines of code, and a fun little experiment to better understand how a conversational agent actually works, with all these different components and how they interact, and so on.
I hope this has been interesting, and maybe useful. Again, like I said, there is the FuncAgent repo on GitHub. Feel free to go ahead and improve that, submit any issues you have, or PRs, or whatever, and we'll maybe try and make it a little more robust than it currently is.