
LangChain Agent Executor Deep Dive


Chapters

0:00 LangChain v0.3 Agent Executor
9:26 Creating an Agent with LCEL
13:53 Executing Tool Calls
16:58 Agentic Final Answers
25:58 Building a Custom Agent Executor
32:47 Executing Multiple Tool Calls

Transcript

In this chapter we're going to take a deep dive into agents with LangChain. We're going to cover what an agent is, talk a little conceptually about agents, the ReAct agent, and the type of agent we're going to be building, and based on that knowledge we'll actually build out our own agent execution logic, which we refer to as the agent executor. In comparison to the previous video on agents in LangChain, which was more of an introduction, this is far more detailed; we'll be getting into the weeds a lot more with both what agents are and how agents work within LangChain.

Now, a significant part of an agent is actually relatively simple code logic that iteratively runs LLM calls and processes their outputs, potentially executing tools along the way. The exact logic for each approach to building an agent varies pretty significantly, but we'll focus on one of them: the ReAct agent. ReAct is a very common pattern, and despite being relatively old now, most of the tool-calling agents we see used by OpenAI and essentially every LLM company follow a very similar pattern.

The ReAct agent follows a pattern like this. We have our user input up here, and the input is a question: "Aside from the Apple Remote, what other device can control the program the Apple Remote was originally designed to interact with?" Most LLMs today would probably be able to answer this directly (the example is from the ReAct paper, which is a few years old now), but in this scenario, assuming our LLM didn't already know the answer, there are multiple steps an agent might take to find it.

The first sub-question is: what was the program the Apple Remote was originally designed to interact with? So the LLM reasons, "I need to search Apple Remote and find the program it was used for." That is the reasoning step, where the LLM reasons about what it needs to do. Then we take an action, which is a tool call: we use the search tool with the query "Apple Remote". The observation is the response we get from executing that tool: the Apple Remote was designed to control the Front Row media center. Now we know the program the Apple Remote was originally designed to interact with.

That was one iteration of reasoning, action, and observation, and, incidentally, that's where the name comes from: the "Re" in ReAct is reasoning, followed by "Act" for action. Now we go through another iteration. The LLM is provided with the new information and reasons that it should search for Front Row. We perform the action (tool: search, query: "Front Row"), and the observation is the response: Front Row is controlled by an Apple Remote or keyboard function keys. So keyboard function keys are the other device we were asking about.
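To make that loop concrete, here is a minimal illustrative sketch of the reason, act, observe cycle in plain Python. It is not LangChain code: `llm_step` and `run_tool` are hypothetical stand-ins for the LLM call and the tool execution.

```python
# Illustrative ReAct loop; `llm_step` and `run_tool` are hypothetical stand-ins.
def react_loop(question: str, max_steps: int = 5) -> str:
    scratchpad = []  # accumulated (reasoning, action, observation) triples
    for _ in range(max_steps):
        # Reasoning + action: the LLM decides which tool to call, with what args.
        step = llm_step(question, scratchpad)
        if step.tool == "answer":
            # The LLM has enough information to answer the user directly.
            return step.args["text"]
        # Observation: execute the chosen tool and record the result.
        observation = run_tool(step.tool, step.args)
        scratchpad.append((step.reasoning, step.tool, step.args, observation))
    return "Stopped: max steps reached"
```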
Now we have all the information we need, and we can provide an answer to our user. We go through one more iteration of reasoning and action: the reasoning is "I can now provide the answer of keyboard function keys to the user." Then we use the answer tool (it's usually called something like "final answer" in more common tool-calling agents), and the answer, keyboard function keys, is output to our user. That is the ReAct loop.

Looking at this, where and in what way are we actually calling an LLM? In the reasoning step, the LLM is generating the text: "okay, what should I do?" Then the LLM generates the input parameters to our action step. Those input parameters, along with the tool being used, are taken by our code logic, our agent executor logic, and used to execute some code, from which we get an output. That output might be taken directly as our observation, or the LLM might take that output and generate an observation based on it; it depends on how you've implemented everything. So the LLM could potentially be used at every single step, and of course that repeats through every iteration. You're potentially calling the LLM multiple times throughout this whole process, which, in terms of latency and token costs, means you're going to be paying more for an agent than for a standard LLM call. That is expected, because you have all of these different things going on, but the idea is that what you can get out of an agent is much better than what you can get out of an LLM alone.

All of this iterative chain of thought and tool use needs to be controlled by what we call the agent executor: the code logic that calls our LLM, processes its outputs, and repeats that process until we reach an answer. Breaking that part down, it looks like this. Our user input goes into our LLM, and then we move on to the reasoning and action steps. Is the action the answer? If it is, we go straight to our output. Otherwise, the agent executor takes the selected tool, executes it, and from that we get our reasoning, action, and observation inputs and outputs, all of which we feed back into our LLM, and we go back through the loop. We could be looping for a little while until we get to that final output.

So let's go across to the code. We open the agent executor notebook in Colab and install our prerequisites. Nothing different here: just langchain, plus (optionally, as before) langsmith and a LangChain API key if you do want to use LangSmith. Then we come down to our first section, where we define a few quick tools. I'm not going to go through these in detail, because we already covered them in the agent introduction, but very quickly: from langchain_core.tools we import the tool decorator, which transforms each of our functions into what we call a StructuredTool object, which we can see with a quick look.
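As a sketch of what those tool definitions look like (the `add`, `multiply`, and `exponentiate` functions here are stand-ins based on the tools mentioned in this walkthrough; the notebook's exact signatures may differ):

```python
from langchain_core.tools import tool

@tool
def add(x: float, y: float) -> float:
    """Add 'x' and 'y' together."""
    return x + y

@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' and 'y' together."""
    return x * y

@tool
def exponentiate(x: float, y: float) -> float:
    """Raise 'x' to the power of 'y'."""
    return x ** y

tools = [add, multiply, exponentiate]

# The decorator wraps each function in a StructuredTool; the name and
# description are what the LLM sees when deciding which tool to call.
print(add.name)         # -> "add"
print(add.description)  # -> description derived from the docstring
```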
If we want, we can extract all of the key information from that StructuredTool using its attributes, the name and the description, which give the LLM essentially everything it needs to know about how to use our function. Again, we covered this in the intro video, so I don't want to go over it in too much detail, but our agent executor logic will need this part: we'll get a string from our LLM, load it into a dictionary object, and use that to actually execute our tool with keyword arguments.

With the tools out of the way, let's take a look at how we create our agent. When I say agent here, I'm specifically talking about the part that generates the reasoning step, then generates which tool to use and what the input parameters to that tool will be. The rest is not actually covered by the agent; the rest is covered by the agent execution logic, which takes the tool to be used and its parameters, executes the tool, gets the response (aka the observation), and iterates through that until the LLM is satisfied and we have enough information to answer the question.

Our agent will look something like this; it's pretty simple. We have our input parameters, which include the chat history and the user query (and would also include any intermediate steps that have happened), we have our prompt template, and then we have our LLM with tools bound to it. Let's see how this looks, starting with the prompt template. We have our system message: "You're a helpful assistant. When answering a user's question you should first use one of the tools provided. After using a tool, the tool output will be provided in the 'scratchpad' below. If you have an answer in the scratchpad you should not use any more tools and instead answer directly to the user." We could obviously modify that based on what we're actually doing. Following the system message we have our chat history, meaning any previous interactions between the user and the AI, then the current message from the user, which is fed into the input field, and then following this we have our agent scratchpad, i.e. the intermediate thoughts. This is where things like the LLM deciding "this is what I need to do and this is how I'm going to do it" (aka the tool call) and the resulting observation will go. Each of those is passed in as a message: any tool call generated by the LLM, i.e. the LLM saying "use this tool, please", will be an assistant message, and the responses from our tools, the observations, will be returned as tool messages.

We run that to define our prompt template, and then we define our LLM. We're using gpt-4o-mini with a temperature of zero, because we want less creativity here; particularly when we're doing tool calling, there's just no need for a high temperature. We enter our OpenAI API key, which we get from platform.openai.com, and then we bind our tools to the LLM, with tool_choice set to "any".
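Putting that together, the agent looks roughly like this. The system prompt is paraphrased from the transcript, and the notebook's exact code may differ slightly:

```python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You're a helpful assistant. When answering a user's question you should "
     "first use one of the tools provided. After using a tool the tool output "
     "will be provided in the 'scratchpad' below. If you have an answer in the "
     "scratchpad you should not use any more tools and instead answer directly "
     "to the user."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad"),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.0)

agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        # .get() lets us omit the scratchpad on the very first call
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", []),
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="any")  # force a tool call every time
)
```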
Setting tool_choice to "any" (I'll go through this in a bit more detail in a second) essentially forces a tool call. You can also put "required", which is actually a bit clearer, but I'm using "any" here, so I'll stick with it. So these are our tools; we have our inputs going into the agent runnable, then our prompt template, and that gets fed into our LLM.

Now we invoke the agent part of all this. Let's see what it outputs; this is important. I'm asking "what is 10 + 10", which obviously should use the addition tool, and we can actually see that happening. The agent message content is empty here, which is where you'd usually get an answer, but if we have a look, we have tool_calls inside the additional keyword args, and within those we have the function name and arguments. The arguments come back as a string, and the way we parse that is with json.loads, which turns it into a dictionary. We can also see which function is being called, and it is the add function. That is all we need in order to actually execute our function, or our tool.

What do we do from here? We map the tool name to the tool function, and then we execute the tool function with the generated args, i.e. those above. I'll also quickly point out that we can get the parsed dictionary directly; I believe LangChain is doing the json.loads for us on its side, so the explicit json.loads step isn't strictly necessary, since we're already getting the parsed arguments. So we create a tool-name-to-function mapping dictionary: we take the tool names and map them back to our tool functions, using the tools list we defined earlier, and that includes each of the tools defined there. Then we execute using our name-to-tool mapping: the mapping gets us the function, and to that function we pass the arguments that we generated. The response, i.e. the observation, is 20.
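Sketched in code, building on the agent above (and assuming the same `tools` list), the parsing and execution steps look roughly like this:

```python
import json

out = agent.invoke({"input": "what is 10 + 10", "chat_history": []})

# The raw OpenAI arguments arrive as a JSON string...
args_str = out.additional_kwargs["tool_calls"][0]["function"]["arguments"]
args = json.loads(args_str)  # -> {"x": 10, "y": 10}

# ...but LangChain also parses them for us via the tool_calls attribute.
tool_call = out.tool_calls[0]

# Map tool names back to the underlying functions and execute.
name2tool = {t.name: t.func for t in tools}
tool_out = name2tool[tool_call["name"]](**tool_call["args"])
print(tool_out)  # -> 20
```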

Now we're going to feed that back into our LLM using the tool message, and we're actually going to wrap it in a little bit of text to make it a bit nicer. To be completely honest, we don't need to do this; we could just return the answer directly, and I don't think there would really be any difference. In some cases the extra text could be useful; in other cases, like here, it doesn't make much difference, particularly because we have this tool_call_id. The tool_call_id is read by the LLM so it knows the response we got here maps back to the tool execution it requested. You can see we have an id on the tool call; the LLM sees that id, sees the id we pass back in, and knows the two are connected: "this is the tool I called, and this is the response I got from it." Because of that, you don't necessarily need to say which tool you used in the message content, but you can; it depends on what you're doing.

So what do we get here? Running everything again: we've added our tool call (the original AI message that says "use the add tool") and the tool execution ToolMessage, which is the observation, and we've mapped those into the agent scratchpad. And what do we get? An AI message whose content is empty again. That's interesting, because we told our LLM up in the system prompt: if you have an answer in the scratchpad, you should not use any more tools, and should answer directly to the user. So why is our LLM not answering? The reason is that down here we specified tool_choice="any", which, again, is the same as tool_choice="required", and that tells the LLM it cannot answer directly; it has to use a tool. And I usually do this: I would usually set tool_choice to "any" or "required" and force the LLM to use a tool every single time. So the question is: if it has to use a tool every time, how does it answer our user? We'll see in a moment. First I just want to show you the two options we essentially have. The second is what I would usually use, but let's start with the first.

The first option is to set tool_choice to "auto". This tells the LLM that it can either use a tool or answer the user directly using the content field. So we redefine the agent with tool_choice="auto" and invoke it. Initially you'll see there's still no content. That's because we didn't add anything to the agent scratchpad; it's empty. The chat history is empty too, and we didn't specify the agent scratchpad at all. The reason we can leave it out is that we're using .get(), which essentially says "try to get agent_scratchpad from this dictionary, but if it hasn't been provided, default to an empty list". That's why we don't need to specify it here, but it means the agent doesn't actually know anything yet; it hasn't used the tool. So we go through our iteration again: we get our tool output, use it to create the tool message, and then add the AI's tool call and the observation into the agent scratchpad.
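A sketch of that feedback step, continuing from the previous snippets, with a second agent bound using tool_choice="auto" so it can answer directly once the observation is in the scratchpad:

```python
from langchain_core.messages import ToolMessage

tool_msg = ToolMessage(
    content=f"The {tool_call['name']} tool returned {tool_out}",
    tool_call_id=tool_call["id"],  # ties this observation to the call above
)

# With tool_choice="auto" the LLM may answer via content instead of a tool.
agent_auto = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x["chat_history"],
        "agent_scratchpad": lambda x: x.get("agent_scratchpad", []),
    }
    | prompt
    | llm.bind_tools(tools, tool_choice="auto")
)

out2 = agent_auto.invoke({
    "input": "what is 10 + 10",
    "chat_history": [],
    "agent_scratchpad": [out, tool_msg],  # AI tool call + observation
})
print(out2.content)  # now a natural-language answer rather than empty content
```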
This time, when we run it, we get content. Now it's not calling a tool (you can see there's no tool call or anything going on); we just get content. So that is the standard way of building a tool-calling agent.

The other option, which I mentioned is what I would usually go with, is number two: creating a final answer tool. Why would we even do that, rather than the method above, which works perfectly well? There are a few reasons. The main one is that with option two, where we're forcing tool calling, we remove the possibility of the agent using the content field directly. The reason I've found this good when building agents in the past is that occasionally, when you do want the LLM to use a tool, it goes with the content field instead, and it can get quite annoying when it uses the content field frequently while you actually want it to be using one of the tools. This is particularly noticeable with smaller models; with bigger models it's not as common, although it does still happen. The second thing I quite like about using a tool as your final answer is that you can enforce a structured output in your answers. This is something we saw in, I think, the first LangChain example, where we used LangChain's structured output feature, and that feature is actually just a tool call. It forces a tool call from your LLM; it's just abstracted away, so you don't realize that's what it's doing. I find structured outputs very useful, particularly when you have a lot of code around your agent, when the output needs to go downstream into some logic, because you have a reliable output format that you know is going to be produced. It's also incredibly useful if you have multiple outputs, or multiple fields you need to generate.

To implement option two, we need to create a final answer tool. As with our other tools, we provide a description. You have a choice in how you implement it: the tool function can simply return None, and you use the generated action itself as what you send out of your agent execution logic, or you can actually execute the tool and pass its output through. In some cases you might have additional post-processing for your final answer, maybe some checks to make sure it hasn't said anything weird, and you could add that inside this tool. In this case, we're just going to pass the generated arguments through directly.
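A sketch of such a final answer tool, including the extra tools_used field that comes up below; the exact docstring and fields in the notebook may differ:

```python
@tool
def final_answer(answer: str, tools_used: list[str]) -> str:
    """Use this tool to provide the final answer to the user. 'tools_used'
    must list the names of the tools you used to produce the answer."""
    # The executor reads the generated args directly, so the return value
    # doesn't matter much; post-processing checks could live here instead.
    return f"{answer}"

tools = [add, multiply, exponentiate, final_answer]
name2tool = {t.name: t.func for t in tools}
```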
So let's run this. We've added the final answer tool to our name-to-tool mapping, so our agent can now use it. We redefine our agent, setting tool_choice to "any", because we're forcing the tool choice here, and let's go with "what is 10 + 10" and see what happens. One nice thing here is that we don't need to check whether our answer is in the content field or in the tool_calls field; we know it's going to be in the tool_calls field, because we're forcing that tool use. So we know we're using the add tool, and we have the arguments. We go through our process again: we create our tool message and add those messages into our scratchpad, or intermediate steps. Then we can see, again, that the content field is empty. That's expected: we're forcing tool use, so there's no way the content field can have anything inside it. But if we come down here to our tool_calls: final_answer, args, answer: "10 + 10 = 20".
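Reading that forced final answer out of the tool call looks roughly like this, assuming `out` is the latest agent response:

```python
tc = out.tool_calls[0]
print(tc["name"])                # -> "final_answer"
print(tc["args"]["answer"])      # -> "10 + 10 = 20"
print(tc["args"]["tools_used"])  # -> e.g. ["add"]
```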

We also have this tools_used field. Where is tools_used coming from? Well, I mentioned before that you can add additional outputs when you're using a tool for your final answer. If you come up to the tool definition, you can see that I asked the LLM to fill in that tools_used field, which I defined as a list of strings: "use this to tell me what tools you used in your answer". So I'm getting the normal answer, but I'm also getting this extra information as well, which is kind of nice. That's where it's coming from. We have our actual answer, and then some additional information whose type we've defined (just a list of strings), which is really nice. It gives us a lot of control over what we're outputting, which is perfect. When you're building with agents, the biggest problem in most cases is control over your LLM, and here we're getting an honestly pretty unbelievable amount of control over what our LLM is going to be doing, which is perfect for when you're building in the real world. So this is everything that we need. This is our answer, and we would of course pass it downstream into whatever logic our AI application uses. Maybe it goes directly to a front end and we display this as our answer, perhaps providing some information about where the answer came from, or maybe there are additional steps downstream where we do some more processing or transformations.

Now, everything we've just done we've been executing one cell at a time, and that's to help us understand the process we go through when building an agent executor. But we're not going to want to do that all the time, are we? Most of the time we'll want to abstract all of this away, and that's what we're going to do now: take everything we've just built and abstract it away into a custom agent executor class.

Let's have a quick look at what we're doing here, although it's literally just what we did above. In the CustomAgentExecutor we initialize with a max_iterations setting (I'll talk about this in a moment), and initialization sets our chat history to empty, because it's a new agent and there should be no chat history yet. Then we define our agent: the part of the logic that takes our inputs and generates what to do next, i.e. which tool call to make. We set everything as attributes of our class, and then we define an invoke method. This invoke method takes an input, which is just a string (the message from the user), and it iterates through essentially everything we just did, until we hit the final answer tool. So what does that mean? We have our tool call, which we get by invoking our agent: it generates which tool to use and what parameters should go into it. That's an AI message, so we append it to our agent scratchpad. Then we use the information from the tool call (the name of the tool, the args, and also the id) to execute our tool and provide the observation back to our LLM. So we execute our tool, then format the tool output into a ToolMessage. Notice that here I'm just using the output directly; I'm not adding the additional descriptive text from before.
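Here is a sketch of that class, reconstructed from this walkthrough. It assumes the `prompt`, `llm`, `tools`, and `name2tool` objects from the earlier snippets, and the notebook's exact code may differ:

```python
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

class CustomAgentExecutor:
    def __init__(self, max_iterations: int = 3):
        self.chat_history = []  # fresh agent, so no prior conversation
        self.max_iterations = max_iterations
        # The "agent": decides the next tool call from input + history + scratchpad.
        self.agent = (
            {
                "input": lambda x: x["input"],
                "chat_history": lambda x: x["chat_history"],
                "agent_scratchpad": lambda x: x.get("agent_scratchpad", []),
            }
            | prompt
            | llm.bind_tools(tools, tool_choice="any")
        )

    def invoke(self, input: str) -> dict:
        count, agent_scratchpad = 0, []
        while count < self.max_iterations:
            out = self.agent.invoke({
                "input": input,
                "chat_history": self.chat_history,
                "agent_scratchpad": agent_scratchpad,
            })
            agent_scratchpad.append(out)   # the AI message with the tool call
            tool_call = out.tool_calls[0]  # assumes a single call per step
            tool_name, tool_args = tool_call["name"], tool_call["args"]
            # Execute the tool and feed the observation back via the scratchpad.
            tool_out = name2tool[tool_name](**tool_args)
            agent_scratchpad.append(
                ToolMessage(content=f"{tool_out}", tool_call_id=tool_call["id"])
            )
            print(f"{count}: {tool_name}({tool_args})")
            count += 1
            if tool_name == "final_answer":
                break
        # NOTE: as discussed below, if max_iterations is hit before final_answer,
        # tool_args won't contain an "answer" field; real code should handle that.
        final_answer = tool_args["answer"]
        self.chat_history.extend([
            HumanMessage(content=input),
            AIMessage(content=final_answer),
        ])
        return tool_args  # both the natural-language answer and tools_used
```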
We do always need to pass in the tool_call_id, though, so that our LLM knows which output maps to which tool. I didn't mention this before (in this video, at least), but it's important when we have multiple tool calls happening in parallel, because that can happen. Let's say we have ten tool calls: all those responses might come back at different times, so the order can get messed up. We wouldn't necessarily always see an AI message beginning a tool call followed by the answer to that tool call; instead it might be an AI message followed by ten different tool call responses. So you need those ids in there. Then we pass our tool output back to our agent scratchpad, or intermediate steps. I'm adding a print in here so we can see what's happening while everything is running, and then we increment this count number, which we'll talk about in a moment.

Coming past that, we say: if the tool name here is final_answer, that means we should stop. Once we get to the final answer, we can extract our answer from that final tool call. In this case we extract the answer that was generated and pass it into our chat history: the user message (the one the user came up with) followed by our answer, which is just the natural-language answer field, as an AI message. But then we actually return all of the information, so the natural-language answer and also the tools_used output, out to whatever downstream process we prefer.

One thing that can happen, if we're not careful, is that our agent executor may run many, many times, particularly if we've done something wrong in our logic as we're building these things. It can happen that maybe we haven't connected the observation back into our agent executor logic, and in that case our agent executor runs again and again and again. That's fine if we stop it, but if we don't realize straight away and we're making a lot of LLM calls, it can get quite expensive quite quickly. So we set a limit. That's what we've done up here with max_iterations: if we go past three iterations (the default), we stop. That's why we have the count: while count is less than max_iterations we keep going, and once we hit max_iterations the while loop simply stops looping. It protects us in that failure case. It also covers the situation where, at some point, your agent might be doing too much to answer a question, so this forces it to stop and just provide an answer. Although, if that does happen (I just realized there's a bit of a fault in the logic here), we wouldn't necessarily have the answer at that point, so we'd probably want to handle that more gracefully. In this very simple use case, though, we're not going to see that happening.

So we initialize our custom agent executor, and then we invoke it. And there we go: that just wrapped everything into a single invoke, and everything is handled for us.
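Usage then collapses to a single call; the printed output here is illustrative:

```python
agent_executor = CustomAgentExecutor()
out = agent_executor.invoke("what is 10 + 10")
print(out)
# e.g. {'answer': '10 + 10 = 20', 'tools_used': ['add']}
```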
We can also modify the question, say to "what is 7 multiplied by 4", and that will go through, use the multiply tool instead, and then come back to the final answer again. So we can see that with this custom agent executor we've built an agent, and we have a lot more control over everything that is going on in here.

One thing we would probably need to add in this scenario: right now I'm assuming that only one tool call will happen at a time. That's also why I'm not asking a complicated question here; I don't want it to go and try to execute multiple tool calls at once, which can happen. But let's just try it. Okay, so this is actually completely fine: it executed the calls one after the other. You can see that when asking this more complicated question, it first used the exponentiate tool, followed by the add tool, and then it gave us our final answer, which is cool. It also told us we used both of those tools, which we did. One thing to be aware of, though, is that OpenAI can actually execute multiple tool calls in parallel. By indexing with that zero (tool_calls[0]), we're assuming we're only ever going to be calling one tool at any one time, which is not always going to be the case. So you would probably need to add a little bit of extra logic there if you're building an agent that's likely to be running parallel tool calls; a sketch of that adjustment follows at the end of this chapter. Here, though, it's completely fine: it's running one tool after the other.

So with that, we've built our agent executor. I know there was a lot to it, and of course you can just use the abstracted AgentExecutor in LangChain, but I think it's very good to understand what is actually going on by building our own agent executor, and it sets you up nicely for building more complicated or use-case-specific agent logic as well. That is it for this chapter.
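As promised above, here is a minimal sketch (not from the notebook) of how the executor loop could handle parallel tool calls: iterate over every entry in tool_calls rather than indexing [0].

```python
# Inside the invoke loop: handle every tool call the LLM emitted this step,
# instead of assuming out.tool_calls[0] is the only one.
for tool_call in out.tool_calls:
    tool_out = name2tool[tool_call["name"]](**tool_call["args"])
    agent_scratchpad.append(
        ToolMessage(
            content=f"{tool_out}",
            # The id keeps each observation matched to its originating call,
            # even if responses come back out of order.
            tool_call_id=tool_call["id"],
        )
    )
```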