back to index

Building Chatbot Agents from Scratch with OpenAI Functions!


Chapters

0:0 OpenAI Functions Agent
0:26 Recap of OpenAI Function Calling
2:34 How OpenAI Functions Agent Works
5:42 Answering Questions Without Tools
9:50 Generating Function Instructions
13:13 Agent Code
15:13 Giving an Agent Conversational Memory
17:25 Agent Internal Thoughts
23:29 Agent Overview

Whisper Transcript | Transcript Only Page

00:00:00.000 | Today, we're going to be taking a look at building a minimal agent framework,
00:00:05.540 | kind of like Lang chain, but without so much overhead.
00:00:09.560 | Something really simple that just uses OpenAI's new function calling method
00:00:15.800 | and is really minimalistic in terms of what it does
00:00:20.900 | and just focuses on that single being an agent that can use tools.
00:00:26.040 | Now, OpenAI recently released this function calling feature,
00:00:30.700 | and I did do a video on this, so you can check that out.
00:00:34.100 | There'll be a link at the top of the video right now.
00:00:36.000 | But what it essentially allows us to do is pass a description
00:00:43.240 | or a set of instructions on how to use a particular function.
00:00:47.940 | That function can be in Python or any other language, it's language agnostic.
00:00:54.200 | And when you are passing instructions to GPT-4 or GPT-3.5,
00:01:00.940 | they will be able to actually return a JSON response,
00:01:04.900 | which sets the parameters that should be included as input to this function
00:01:11.200 | and their values based on whatever query you have asked.
00:01:15.500 | So in that previous video, I demoed an example of using this
00:01:20.300 | to actually generate like a product web page.
00:01:23.840 | So there were a couple of items in there.
00:01:26.100 | The GPT-4 in that case would have to generate a title for the product that I described.
00:01:33.400 | It would have to generate like a product description for that product.
00:01:38.700 | And it would then also need to generate a prompt that would be passed
00:01:43.300 | to a image generation model that would create a image to be used on that product page.
00:01:49.500 | And that was really cool because it was so easy to set up,
00:01:53.640 | but straight away you could see the potential of using this sort of thing.
00:01:58.300 | So what I want to do with this minimal agent framework is a similar thing.
00:02:02.400 | We're going to be using those OpenAI function calling.
00:02:05.440 | Of course, we can extend that to other models in the future as well.
00:02:09.000 | But we want to be using the function calling.
00:02:11.600 | We want to automatically infer the function instructions
00:02:17.200 | based on a function and the doc strings or anything else that we have in there
00:02:21.600 | to enable or to include conversational history in there
00:02:26.560 | and to make all this super robust and just easy to use with as little overhead as possible.
00:02:34.800 | So before we jump into how we build that agent,
00:02:40.300 | I just want to show you how it works.
00:02:43.300 | So I'm going to use this notebook here.
00:02:46.700 | And all I'm going to do is I'm going to go from FuncAgent.
00:02:50.540 | So it's a function calling agent, hence why I named it FuncAgent.
00:02:55.800 | I don't know if that's a good name or not, but it's what came to my head.
00:02:59.440 | I'm going to import the agents file.
00:03:03.740 | So in agents, we have right now just a single agent.
00:03:09.700 | It's kind of like a React agent, but maybe not as sophisticated for now, but it works.
00:03:17.600 | And again, like I said, I want it to be very minimal.
00:03:20.640 | Okay, so what we're going to do is we're going to say agent equals agents, agent.
00:03:29.340 | And in here, we have our OpenAI API key.
00:03:33.300 | I have set that already in, so OpenAI API key up here.
00:03:41.500 | The model name is this Jupyter 4.0.6.13.
00:03:46.340 | And we also need to include a list of functions that we would like our agent to be able to use.
00:03:51.160 | Now, those functions is just this circumference calculator up here.
00:03:55.540 | So just a really simple example.
00:03:57.260 | We'll try something more interesting later.
00:03:59.500 | So, I'm going to do functions equals, and then I'm just going to pass that in there.
00:04:05.000 | So circumference calculator.
00:04:08.140 | Okay.
00:04:10.440 | Now, let's run that.
00:04:13.900 | And then from there, I'm just going to say, okay, agent ask, and I'm going to say, what
00:04:20.560 | is the circumference of a circle with a radius of five?
00:04:30.440 | Okay.
00:04:31.960 | And let's just see what that comes up with.
00:04:34.800 | So we can see here, actually, I have something there.
00:04:38.100 | Actually, we'll keep something in there because I want to show you what difference that makes
00:04:43.760 | when we're defining the tools, but in reality, we're only going to be using this parameter
00:04:50.120 | here.
00:04:51.360 | So I'm going to take that and we'll see what it comes back with.
00:04:57.600 | Now at the moment, it's coming back with everything just because I'm still developing this, but
00:05:03.920 | what it will eventually come back with is just the response and not everything.
00:05:09.920 | Okay.
00:05:11.400 | So this is what we have.
00:05:14.160 | Circumference of a circle is calculated using the formula, and this is two pi, where R is
00:05:22.480 | the radius of the circle.
00:05:23.760 | So if the radius of the circle is five units, then the circumference C would be, and we
00:05:31.800 | get this 31.42 units.
00:05:35.680 | Now how did it get to that answer?
00:05:37.920 | Because these models, LLMs, are just really bad at basic maths.
00:05:42.280 | So let's try and answer this question without the agent.
00:05:47.760 | So I'm going to initialize a new model.
00:05:50.800 | So just to import OpenAI, OpenAI API key, and then I'm going to run this.
00:06:04.160 | So the model is what we said before, so set dupty, is it four, zero, six, 13, messages.
00:06:14.160 | So this is, let me get those down here or here.
00:06:18.080 | So messages are going to be equal to, first we have the role system, and I'm going to
00:06:24.160 | replicate what we're doing inside the func agent.
00:06:29.220 | So we're going to say content is equal to agents sys, okay, here.
00:06:37.240 | So let me show you what this looks like.
00:06:39.360 | Let me print, maybe that will be nicer.
00:06:47.200 | So I think this is essentially just a copy, maybe slightly different from the line chain
00:06:52.880 | agent system message.
00:06:54.700 | So we have this.
00:06:57.940 | That's the system message and then following that, we're going to have our user question.
00:07:04.240 | So user content, and that is just going to be our query from before.
00:07:12.960 | So what was our query?
00:07:13.960 | It was this.
00:07:15.760 | Okay, so there are messages, oops, let's fix that.
00:07:22.880 | And we are going to put those in here.
00:07:27.040 | Now functions, so for this, we actually need to get the function again, like the function
00:07:33.640 | instructions that have been created automatically by our agent.
00:07:38.080 | So that will be an agent, I think functions, if I'm not wrong.
00:07:48.080 | So you can actually see the description or the instructions that are generated automatically
00:07:54.060 | when we pass in our function to the agent.
00:07:57.240 | We'll explain those in a little more detail soon.
00:07:59.940 | So we can run that, but actually what I want to do is try it without those functions.
00:08:08.400 | And maybe what we'll do is make this a little bit harder by saying we're radius of 4.31,
00:08:14.940 | something like that, and see what we get with and without the function, okay?
00:08:20.660 | So let's run that, okay, and let's do the circumference calculator to see what we're
00:08:28.940 | getting or what we should be getting.
00:08:31.500 | So if we do 5.31, 5.31, okay, you should get 33.3468.
00:08:40.660 | And here we get 33.39, so it's close, but it's not actually correct, which is not ideal.
00:08:50.140 | So if we come up to here, what I'm going to do is actually rerun this agent and see what
00:08:57.300 | it gives us if we ask the same question, so 5.31.
00:09:03.940 | And we'll see if this answer is any better than what we just got down there, which was
00:09:07.740 | not quite accurate, okay?
00:09:09.900 | So we come here and we can see that we're getting this 33.35 units this time, okay?
00:09:16.540 | So if we compare that to, let me remove these bits here.
00:09:23.620 | If we compare that to the circumference calculator, we get 33.346, right?
00:09:29.660 | And the answer we got is actually 33.35, so it's just rounding up.
00:09:34.180 | So actually, it's correct because we're actually using that circumference calculator in the
00:09:39.260 | tooling.
00:09:40.500 | So that's kind of the point of using this agent, like it can do things that a large
00:09:46.020 | language model by itself cannot do, so it can rely on these external tools.
00:09:50.940 | And it also allows us to automatically generate these from just a Python function, right?
00:09:59.220 | So let me just go through that a little bit as well.
00:10:03.940 | So we have our name, the description, we have these parameters, right?
00:10:10.460 | These are all things that are needed by this here, right?
00:10:15.340 | So this functions parameter from OpenAI's chat completion endpoint.
00:10:21.220 | All of these are needed.
00:10:22.620 | Now, how did we create those?
00:10:25.220 | Well, if you take a look at this, you can kind of see all this information.
00:10:30.260 | And let me even maybe bring this up here.
00:10:35.060 | You can see that all this information is contained within this definition here.
00:10:38.780 | So we have the name, it's circumference calculator.
00:10:42.180 | We have the parameters, right?
00:10:44.020 | So parameters, we have radius here, which is a float, which is actually number here.
00:10:52.460 | And we also have this something, right?
00:10:55.880 | Something is included here, and that again is a number or float.
00:11:00.400 | So all of that information is contained within there.
00:11:04.900 | We also have the description, at least for the radius, because the description is contained
00:11:09.220 | actually within the docstring.
00:11:11.380 | We don't include the description for the something variable.
00:11:16.660 | So actually that is just empty.
00:11:19.100 | And we also, for something, it's not a required parameter, because we set this value here
00:11:25.940 | by default.
00:11:27.020 | So in reality, all we need is the radius.
00:11:29.540 | Okay?
00:11:30.540 | So, GPT-4 or GPT-3.5 reads this information here, and based on that, it will allow us
00:11:38.380 | or it will return instructions on how to use this function when we're asking a query and
00:11:46.300 | how to satisfy that.
00:11:49.020 | And we can kind of see that happening in here.
00:11:51.420 | So this, you know, it's really simple.
00:11:53.740 | Again, like I said, I want this to be as simple as possible, very minimal, and just as well
00:12:00.100 | easy to read, all this sort of stuff.
00:12:02.540 | So if we go to the parser file within our FuncAgent, we can see what we're doing.
00:12:07.620 | Okay?
00:12:08.620 | So again, this is like a first iteration.
00:12:10.420 | It's definitely not complete.
00:12:12.200 | So we are going to FuncToJson, okay, that is used by an agent when it sees a function.
00:12:20.900 | We're using inspect to get function annotations, a docstring, descriptions, all this sort of
00:12:28.180 | stuff.
00:12:29.180 | So we're just using all of that information from the annotations of our function, from
00:12:34.060 | the docstring, to construct the instructions that are required by function calling in OpenAI.
00:12:42.300 | Okay?
00:12:43.300 | And that's all we're doing.
00:12:44.540 | There's nothing, you know, there's nothing that complicated going on there.
00:12:47.900 | I mean, this whole file is 59 lines of code, and it could probably be much less as well.
00:12:53.540 | It's really straightforward.
00:12:54.940 | The one thing I will say is that it does require we use this syntax for the docstring for now.
00:13:01.700 | In the future, of course, we'll probably extend that to other, like, common docstring formats
00:13:05.980 | as well.
00:13:07.380 | Okay.
00:13:08.380 | Cool.
00:13:09.380 | So we have that.
00:13:10.380 | That's our parser.
00:13:11.540 | And then we also have the agents file, which contains the agent itself, all of those instructions.
00:13:18.060 | So in the agent, what we need for it to work as a fully functional conversational agent
00:13:26.040 | is a few things.
00:13:27.980 | First, we need the LLM itself.
00:13:30.740 | So we're kind of initializing, not the LLM, but we have, like, the model name here, and
00:13:36.140 | we pass that to OpenAI when we're generating some text.
00:13:41.220 | We need those functions that it can use, right?
00:13:44.740 | So that's when we're using the parser I just mentioned, right?
00:13:48.980 | So we can see we have that parser func to JSON for all of the functions that we are
00:13:52.880 | passing in the functions list here, right?
00:13:57.900 | That doesn't have to be any functions there, but obviously, if we want to use an agent
00:14:00.780 | with tools, we kind of do want to use that.
00:14:03.280 | Then what we're doing is creating this function mapping.
00:14:06.100 | So basically, when the LLM, gpt4, gpt3.5, comes back to us, it's going to say, you need
00:14:13.960 | to use this function, so, like, the circumference calculator, with these parameters, which it
00:14:19.100 | will give us in, like, a JSON format.
00:14:21.600 | So we need a way of just mapping those names of each function to the function itself.
00:14:28.380 | So that's all we're doing there, again, super simple.
00:14:32.000 | Nothing complicated at all going on there.
00:14:35.260 | And then, so this is the bit that makes it conversational, right?
00:14:39.460 | We need to have a chat history.
00:14:41.800 | So that chat history allows us to have multiple messages and continue a conversation with
00:14:49.060 | our agent, rather than just having a single query, getting a single response, and then
00:14:54.140 | starting all over again.
00:14:55.900 | So that chat history allows us to have a log of our interactions with the agent, and essentially
00:15:02.820 | have that past history of interactions considered with every new query coming in.
00:15:10.780 | So all those are super important.
00:15:13.960 | And with the chat history, we can actually come over here, and we can access that chat
00:15:18.940 | history.
00:15:19.940 | So let me just remove the bits I don't necessarily want here.
00:15:24.520 | So let's remove this, this, and this, okay?
00:15:29.660 | So we have our agent, let's have a look at the agent, is it chat, yeah, chat history.
00:15:36.260 | Okay, so we can see what is happening there.
00:15:39.440 | So we have our query, and then we're logging the, like the response from our AI.
00:15:46.320 | Now what I can do is, okay, maybe I can say, okay, agent asks, what is the circumference
00:15:55.180 | if we double the radius, and let's see what comes back.
00:15:59.940 | So we're not specifying the number here, it's going to have to refer to that conversational
00:16:03.900 | history in order to produce the new query.
00:16:10.020 | And we can see, okay, it's explaining what it's doing, and let's come across the circumference
00:16:16.180 | of the circle, the doubled radius would be 66.7 units, right?
00:16:21.660 | And even says this simply double the original circumference, because the circumference of
00:16:25.500 | a circle scales linearly with the radius, right?
00:16:29.860 | And okay, in this case, it doesn't actually use this circumference calculator, because
00:16:34.940 | all it's needing to do is double the previous calculation that we got, right?
00:16:39.860 | Which was the 33.35 units.
00:16:44.880 | So from doubling this, we get this 66.7, right?
00:16:49.540 | The reason it can do that, without specifying the radius that we're doubling here is because
00:16:54.340 | it's actually just referring to that past conversational log, it has access to this
00:17:00.220 | conversational history, okay?
00:17:02.260 | And that's why we get that.
00:17:04.220 | So yeah, let's, we can copy this, and now we can see our new chat history, which is
00:17:10.180 | slightly longer, of course, okay, so we get this.
00:17:14.300 | Now, that conversational history is super important in making our agent more conversational,
00:17:21.260 | which is really cool, pretty simple to do, it's not exactly hard.
00:17:25.240 | But there are other things as well.
00:17:27.100 | So when we are, you'll see that I have this really simple, just print a period here, right?
00:17:34.900 | And then here, there was two of them.
00:17:37.180 | This is just me, so I can see what the agent is actually doing.
00:17:42.500 | But we can see that this is coming from here.
00:17:47.380 | So generate response method here, okay?
00:17:51.780 | Now what is this doing, okay?
00:17:55.280 | Why is it, with one single query, why is it generating more than one response?
00:18:00.540 | Well, that is because if we just do one response, let me come back to here, we're going to just
00:18:09.780 | get one item here, right?
00:18:12.720 | So in the previous video, where I went through function calling, what I showed is that you
00:18:20.740 | send your query to OpenAI, and it doesn't run the function for you, GPT-4 isn't running
00:18:27.840 | the function for you, it's returning instructions and parameters that show you, or that you
00:18:34.660 | can feed, then feed in to the function in order to get the answer, right?
00:18:40.420 | And the same is true for this, right?
00:18:42.400 | So we're taking one LLM call to create those parameters for the function.
00:18:49.140 | But then after that, we then need to feed those parameters into the function, get our
00:18:54.820 | answer, and then if we want to return a sort of a conversational response, we need to then
00:19:00.740 | feed that answer back in to the LLM and ask it to give us the answer, right?
00:19:09.800 | So we actually do that here as well.
00:19:13.620 | So let me show you, when we are making a query, so when we ask something here, we initialize
00:19:24.140 | this internal thoughts list.
00:19:26.780 | Now the internal thoughts, they're kind of like the conversational history, but it's
00:19:31.080 | just for the, almost like the internal monologue for the LLM, right?
00:19:36.900 | So inside the LLM is going to go to generate response up here, right?
00:19:45.220 | We're going to generate that response, it's going to return, you need to use this circumference
00:19:49.460 | calculator tool, here is the parameters that you need to input, okay?
00:19:53.780 | So we get that.
00:19:55.220 | And the finish reason that we're going to have in that response is not going to be stop,
00:20:00.100 | it's going to be function call here, right?
00:20:03.980 | So if the response is function call, we need to go to handle function call, and handle
00:20:09.940 | function call is essentially just going to take the response from the LLM and it's going
00:20:17.140 | to feed it in to one of our functions here, okay?
00:20:22.440 | So here we're loading the parameters that GPT-4 has given to us, and then we're getting
00:20:28.900 | the function that we need to use, and then we're just feeding those parameters into the
00:20:32.020 | function and getting our answer, okay?
00:20:36.380 | So then we have our answer, and what we do is we feed it back to the LLM, or we feed
00:20:42.380 | it to those internal thoughts as a new message of the assistant to itself, okay?
00:20:52.460 | And within that message, we just say the answer is this result.
00:20:56.140 | So that is the answer produced by our circumference calculator, okay?
00:21:02.260 | So then that's added to the internal thoughts.
00:21:05.180 | We come back to this here, right, because this is in a while loop, and that's going
00:21:11.860 | to keep going, right, until we get to this stop finish reason, okay, which is probably
00:21:17.420 | going to happen with the next iteration.
00:21:19.340 | So we've got our answer from the function, we've fed that back into the LLM, and we're
00:21:24.080 | asking it to generate again, okay?
00:21:26.660 | So the LLM is now going to see that, and it's going to say, okay, here is the answer in
00:21:32.740 | my internal thoughts messages.
00:21:35.380 | So actually, I can then generate the final answer, right?
00:21:40.320 | So it comes to here, and it's like, okay, let's go onto the final thought answer step
00:21:46.740 | or function, and what we do is we take all those internal thoughts, put them all together
00:21:51.740 | into a single string, and we just say, okay, based on the above, so these are all the thoughts
00:21:59.980 | that we've been going through, so that'll be like LLM, I am going to call this function
00:22:04.700 | with these parameters.
00:22:06.020 | Response, the answer is this, and then that is followed by this little message here.
00:22:11.340 | Based on the above, I will now answer the question.
00:22:14.060 | Now this is important.
00:22:15.420 | So this message will only be seen by me.
00:22:19.620 | So answer with the assumption that the user has not seen this message.
00:22:22.580 | If you don't include this, the LLM is going to respond with, hey, you're right, that is
00:22:28.900 | the correct answer, well done, which is obviously not what we want, because this is the internal
00:22:34.580 | monologue of the AI, not the user responding to the AI.
00:22:39.500 | So we need to specify that these internal thoughts that you're having, they're just
00:22:43.900 | for you, they're not for the user, the user is not going to see them, so you need to answer
00:22:48.500 | with that in mind, and the LLM does actually do that, as we've seen.
00:22:52.200 | So we get our final thought, we then feed that into the chat completion, so we have
00:22:59.140 | our chat history, and that final thought, so not the list of final thoughts, but just
00:23:03.940 | that single formatted final thought, and then we also specify not to use any functions.
00:23:11.460 | What I found is that if it got the question of use, you know, what is a circumference,
00:23:16.900 | it might be tempted again to use the circumference tool again, so it's like, okay, don't use
00:23:21.740 | any functions.
00:23:23.820 | And then we return, okay?
00:23:26.260 | And from that, we actually get the answer.
00:23:29.080 | So there are many things kind of going on here, even though it's a very simple agent,
00:23:34.540 | you know, we have the fact that it's able to call these functions, we have that it has
00:23:40.340 | this conversational history, we have that it has this internal monologue, right?
00:23:45.380 | But we've done all of that in, what, so 114 lines of code for this agent one, and several
00:23:53.540 | of those are actually just a system message up here.
00:23:57.300 | So in reality, it's kind of simple, and it gets what I wanted, which was to make this
00:24:04.880 | as minimal as possible.
00:24:07.380 | Now what I want to do is just try this out on something that is, in my opinion, a little
00:24:14.620 | more interesting, okay?
00:24:16.460 | So with those very few lines of code, plus OpenAI's function calling, which admittedly
00:24:23.500 | is doing most of the heavy lifting here, we get, I don't know, a really cool, minimal
00:24:31.060 | agent that we can use that includes all these cool features.
00:24:35.900 | So yeah, I just wanted to share that, something that I worked on, didn't exactly take a huge
00:24:41.540 | amount of time to put together.
00:24:44.100 | But you can also use it yourself.
00:24:45.540 | So Aurelio Labs on GitHub, you go to FuncAgent, and it's here.
00:24:50.820 | It will also be on PyPy, so you will also be able to just do pip install FuncAgent,
00:24:56.820 | and you can use it in the same way that I use it, put some more interesting functions
00:25:01.380 | in there, and just see what you can do with that.
00:25:04.960 | One thing that I will try, maybe I'm not going to do it in this video, because we've already
00:25:11.220 | been talking for a little while, is in the Aurelio Labs cookbook.
00:25:17.580 | Last week, I created the function calling example.
00:25:23.700 | And I think this is a little more interesting.
00:25:26.420 | So this is where we're creating that product page.
00:25:29.380 | I think I'm going to just give that a go with this FuncAgent as well, and see how that goes.
00:25:34.780 | But yeah, I'm not going to do that in this video, just for the sake of time.
00:25:38.380 | So yeah, I hope this is interesting.
00:25:40.580 | I thought it was just kind of an interesting project, just to see how much we can do with
00:25:45.460 | very little time, and also just lines of code.
00:25:49.820 | And a fun little experiment, just to better understand how a conversational agent actually
00:25:56.180 | works with all these different components, how they interact, and so on.
00:26:01.700 | But yeah, for now, that's it for this video.
00:26:04.180 | I hope this has been interesting, and maybe useful.
00:26:08.420 | Again, like I said, there is the FuncAgent repo on GitHub.
00:26:12.740 | Feel free to go ahead and improve that, submit any issues you have, or PRs, or whatever.
00:26:21.980 | And we'll maybe try and make that a little more robust than it currently is.
00:26:26.420 | But yeah, I'll leave it there.
00:26:28.500 | So thank you very much for watching.
00:26:31.460 | And I will see you again in the next one.
00:26:33.980 | [MUSIC PLAYING]
00:26:37.340 | [MUSIC PLAYING]
00:26:40.700 | [MUSIC PLAYING]
00:26:44.060 | [MUSIC PLAYING]