
LangChain Agents Deep Dive with GPT 3.5 — LangChain #7


Chapters

0:00 Why LLMs need tools
2:35 What are agents?
3:33 LangChain agents in Python
4:25 Initializing a calculator tool
5:57 Initializing a LangChain agent
8:01 Asking our agent some questions
12:39 Adding more tools to agents
14:29 Custom and prebuilt tools
16:40 Francisco's definition of agents
17:52 Creating a SQL DB tool
19:49 Zero shot ReAct agents in LangChain
24:18 Conversational ReAct agent in LangChain
26:57 ReAct docstore agent in LangChain
28:31 Self-ask with search agent
30:33 Final thoughts on LangChain agents

Transcript

Large language models are incredibly powerful, as we've seen, but they lack some of the abilities that even the dumbest computer programs can handle with ease. Logic, calculations, and search are just a few examples of where large language models fail and where really dumb computer programs can actually perform very well.

We've been using computers to solve incredibly complex calculations for a very long time. Yet, if we ask GPT-4 to tell us the answer to what is 4.1 multiplied by 7.9, it actually fails. Isn't it fascinating that a simple calculator program can do this, but what is probably the most sophisticated AI program in the world right now that is accessible by us cannot?

And that's not all. If I ask GPT-4 my somewhat overused example by now, how do I use the LLMChain in LangChain, it struggles again. It's true that there was a blockchain project called LangChain, yet there didn't seem to be any LLMChain component nor any LANG tokens. Those are both hallucinations.

Granted, the reason that GPT-4 is unable to tell us about these is because it hasn't heard of LangChain, or at least not the LangChain I'm referring to. That is because GPT-4 has no connection to the outside world. The only part of the outside world that GPT-4 has seen is what it saw during its training.

And the training data cutoff of GPT-4 appears to be around September 2021. With what seem to be major weaknesses in today's large language models, we need to find solutions. One suite of potential solutions to these problems comes in the form of agents. These agents don't just solve many of the problems we saw above, but many others as well.

In fact, by using agents, we have an almost unlimited upside in the potential of what we can do with large language models. So we're going to learn what agents are and how we can use them within the LangChain library to superpower our large language models. I'll quickly go through an introduction to agents in LangChain, and then I'll hand it over to Francisco for more of a deep dive into agents in LangChain.

So let's jump straight into it. We can think of agents as enabling tools for large language models. Kind of like how a human like me would use a calculator for maths, or might perform a Google search for information, agents allow a large language model to do the same thing.

Using agents, a large language model can write and execute Python code. It can perform Google search, and it can even perform SQL queries. Let's start with a very simple example of this. What we're going to do is build a calculator agent that can also handle some general knowledge queries.

Now, to use agents in LangChain, we need three key components: a large language model (or multiple large language models), a tool that we will be interacting with, and an agent to control that interaction. Let's start by installing LangChain and initializing our large language model. So we're in Colab here.

There will be a link to this notebook somewhere near the top of the video right now. And we do a pip install of langchain and openai, because we're going to be using OpenAI's large language models here. But you can replace this: you can use Cohere, you can use Hugging Face models.

And I'm not sure if it's implemented yet, but I'm sure pretty soon there'll probably be Google PaLM models in here as well. Okay, cool. So we need to first start by initializing our large language model. We're using text-davinci-003 here. You can replace that, obviously, with more recent models as well.
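
As a rough sketch of that setup, using the classic LangChain API from around the time of this video (the API key is a placeholder):

```python
!pip install -qU langchain openai

from langchain.llms import OpenAI

# text-davinci-003 is used here; swap in a newer model if you prefer
llm = OpenAI(
    model_name="text-davinci-003",
    temperature=0,
    openai_api_key="YOUR_API_KEY"  # placeholder
)
```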

Okay, I've already run this with my API key in there, so let's move on. Next, we want to initialize what is going to be a calculator tool using the LLMMathChain. The LLMMathChain, if you've watched previous videos in this series, you've probably seen it. It is basically a large language model that will call out to Python with some code for performing a calculation.

Okay, that's what this is here, right? And what we're doing here is formatting it into what we call a tool. Now, a tool is simply the run method, so the functionality of this LLMMathChain, plus a name for that chain, for that function, and also a description.

Now, this description is useful. This is essentially the prompt; it will be included in the prompt for the agent executor's large language model. And we're basically going to say, look, this is a tool that you can use, right? It's called Calculator, and you should use it when you think it's relevant.

Okay, so we're saying it's useful for when you need to answer questions about math, right? That's how the large language model will know when to use this tool. Now, when we're passing those tools to the large language model, we will actually pass a list of tools, okay? So for now, there's just one item in that list, but we can add multiple, as we'll see very soon.
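
A minimal sketch of that tool definition; the constructor may be LLMMathChain.from_llm(llm) on newer LangChain versions:

```python
from langchain.chains import LLMMathChain
from langchain.agents import Tool

llm_math = LLMMathChain(llm=llm)  # newer versions: LLMMathChain.from_llm(llm)

math_tool = Tool(
    name="Calculator",
    func=llm_math.run,
    description="Useful for when you need to answer questions about math."
)

# the agent expects a list of tools, even if there is only one for now
tools = [math_tool]
```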

Now, when we're using these tools, we need to initialize an agent that will contain them. It's going to contain a base large language model, which is going to control everything; it's kind of like the orchestrator, the top-level large language model. And there are also these different types of agents, okay?

So this one is a zero-shot react description agent. That means a few things. So zero-shot part of that means that the agent is currently looking at just the current prompt. It doesn't have any memory, right? So if you've got a chatbot, you probably wouldn't necessarily use this one, but we'll cover some alternatives later on.

So, being zero-shot, it doesn't have that memory and it's focusing on the current action only. ReAct is an agent framework. The idea behind it is that the agent will reason about whatever prompt has been assigned to it, decide on an action to take, and then pass some information to that action item.

The action item is one of the tools that we're using here. It will then get a response from that action item, that tool, and then repeat this process of reasoning and action. We're not going to go into too much detail on that here. That kind of deserves its own video, I think.

And as for which action to take, the agent is basing that on the description of each of the tools that we have. Okay, so let's initialize that. One other important thing here is these max iterations. So I kind of mentioned just now, it can go through a loop of reason and action.

We add max iterations so that it doesn't just go into an infinite loop of reasoning and making actions forever, right? So we limit it: it can do three of these loops and then it cuts off. Okay, cool. So let's initialize the agent and ask it some questions.
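
Putting those pieces together, the initialization just described, with verbose output and the three-iteration cap, looks roughly like this (a sketch rather than the exact notebook code):

```python
from langchain.agents import initialize_agent

zero_shot_agent = initialize_agent(
    agent="zero-shot-react-description",
    tools=tools,
    llm=llm,
    verbose=True,       # print the thought / action / observation trace
    max_iterations=3    # cap the reason-and-act loop
)
```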

So the zero shot agent, we can see the thought process of the agent here, right? So saying, huh, okay, I need to calculate this expression. So what is the action I need to take? I need to use a calculator. What's the input I'm going to pass to this tool?

It is this here, right? This is what we've asked it to calculate. Okay, and then the response from that tool, the LLM math chain, is this here, this number. Okay, and the next thought it has is: I now know the final answer. It is this, right?

And then we finished the chain or the agent execution chain. So the input was this and the output we get is this, right? Now the end user doesn't need to see all of this. The reason that we're seeing all of this is because we set verbose equal to true.

Set this to False and we'll just get the output, right? But obviously we're developing this; we want to see what is actually happening. Now, what you can see, since I've already run this, is that the output, the actual value here, is in fact this number here. Okay, so that's much better than when we use large language models on their own and they just give us a completely wrong answer.
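
For reference, the call that produces a trace like this is simply the agent executor itself; the expression below is a made-up example rather than the exact one from the notebook:

```python
result = zero_shot_agent("what is (4.5 * 2.1) ** 2.2?")  # hypothetical query

# with verbose=False you would only see this final answer
print(result["output"])
```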

All right, so that looks good. Let's try something else. Now, here we're using natural language to define the calculation that needs to happen. We say Mary has four apples, Giorgio brings two and a half apple boxes, and an apple box contains eight apples. How many apples do we have, right?
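
As a call, that is just another query to the same agent:

```python
zero_shot_agent(
    "Mary has 4 apples. Giorgio brings 2.5 apple boxes, "
    "where an apple box contains 8 apples. How many apples do we have?"
)
```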

It's just kind of simple logic and calculation here. But naturally it needs to use both the calculator chain, the LLM math chain, and also its own sort of LLM ability to reason about what is happening here. So it starts by I need to figure out how many apples are in the boxes, right?

So it's doing this step by step, which I think is really cool. So it says there's two and a half boxes, and each box contains eight apples. So there's 20 apples in the boxes. And it says I need to add the apples Mary has to the apples in the boxes.

Okay, we do calculate again: it's four plus 20, which is 24. And now it's like, I now have the final answer: we have 24 apples. Cool, so we see the input and we see the output: we have 24 apples, great. All of this that we've done so far, an LLM math chain could probably do; it could definitely do this first calculation.

And there's probably a good chance it could do this one too. It wouldn't go through the multiple reasoning steps, but it might be able to do it in a single step. So with that in mind, why don't we just use the LLM math chain? Well, what if we ask this: what is the capital of Norway?

An LLM math chain by itself is not going to be able to answer this, because it's going to go straight to the calculator, right? So it says, I need to look up the answer. Okay, so what is it doing right now? This is something new. That's because it's seeing "what is the capital of Norway" in the base prompt, the agent executor's prompt.

It's saying to answer this prompt, you need to use one of the following tools, but the only tool that we're given is a calculator. The LLM knows that a calculator isn't going to give it the answer to what is the capital of Norway. So it's actually hallucinating and trying to find another tool that it believes would help.

And a lookup tool would help here, but it doesn't actually have access to a lookup tool. It's just imagining that it does; it's hallucinating that it does. Okay, so it's like, okay, the only tool I actually have here is a calculator. And obviously it passes this to the calculator tool, and it's not going to work, right?

So it actually just says action input: N/A. Like, I don't even know what to give to this calculator. And then we get a value error. Okay, right, that's fine. It's kind of expected; we only have a calculator for this agent to use. It's not going to be able to answer this question.

But what if we do want it to be able to answer general knowledge questions as well as perform calculations? Well, okay, in that case, what we need to do is add another tool to the agent's toolbox. What we're going to do is just initialize a simple LLM chain, right?

So to answer what is the capital of Norway, a simple LLM can do that, right? So all we're going to do is create this LLM chain. We're not really doing anything special here: we just have a prompt template, which is going to pass the query straight through to the LLM.

Okay, and we're going to call this one the language model tool. And we're going to say: use this tool for general purpose queries and logic. Okay, cool. And we just add the new tool to our tools list, like this. Right, so now we've got two tools in that tools list.
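
A sketch of that general-purpose tool, with the pass-through prompt template and the description as just described:

```python
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.agents import Tool

# a template that simply forwards the user's query to the LLM
prompt = PromptTemplate(input_variables=["query"], template="{query}")
llm_chain = LLMChain(llm=llm, prompt=prompt)

llm_tool = Tool(
    name="Language Model",
    func=llm_chain.run,
    description="Use this tool for general purpose queries and logic."
)

tools.append(llm_tool)  # the tools list now holds the calculator and the language model
```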

And we just reinitialize the agent with our two tools, the calculator and the language model. Okay, now we say, what is the capital of Norway? And it should say, ah, okay, I can refer to the language model for this. And we say, what is the capital of Norway? The capital of Norway is Oslo.

Okay, so we get the correct answer this time. And yeah, I now know the final answer, capital of Norway is Oslo. Now we can ask it a math question. Okay, what is this? Okay, so it can answer math questions as well now. So all of a sudden our agent is able to do two completely different things that we would need separate LLM chains or chains for, which I think is really cool.
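
Re-initializing with both tools and asking both kinds of question is then just:

```python
zero_shot_agent = initialize_agent(
    agent="zero-shot-react-description",
    tools=tools,       # calculator + language model
    llm=llm,
    verbose=True,
    max_iterations=3
)

zero_shot_agent("What is the capital of Norway?")
zero_shot_agent("What is 4.1 multiplied by 7.9?")  # e.g. the arithmetic from the start of the video
```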

And these are two super simple examples. Francisco is, in a moment, going to go through what I think are far more interesting examples, and you'll definitely see more of how we can use these agents in LangChain. But before I finish, there is just one thing I should point out, because I kind of missed it for the sake of simplicity, and it wouldn't be fair for me not to mention it: up here, we defined our math tool ourselves, right?

In reality, there is already a math tool among a set of pre-built tools that come with LangChain. And to use those, we would write something like from langchain.agents import load_tools. And from there, we would just do tools = load_tools, and then we pass a list of the tools that we would like to load.

So if we just want the LLM math chain again, we'd write that. Now, the LLM math chain does require an LLM, so we also need to pass the LLM in there as well, right? So let's run this again. If we look at the tool's name and the tool's description, we see we have Calculator, useful for when you need to answer questions about math.
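
That prebuilt route looks roughly like this; note that the llm-math tool needs the LLM passed in:

```python
from langchain.agents import load_tools

tools = load_tools(["llm-math"], llm=llm)

print(tools[0].name)         # Calculator
print(tools[0].description)  # Useful for when you need to answer questions about math.
```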

And if you print out the full tool list, it's going to show you a ton of other things as well, including your OpenAI API key, which is very useful for when you're trying to show people what is in there. So if we try the same again with this, let's just copy it across.

And we'll see that we actually get the same thing: we have the same tool name and the same description. And if you print out the full thing, you'll also see all of the parameters that define the tool and the chain being used. So these two bits of code here do the exact same thing, right?

I'm just saying that you can initialize a tool by yourself or you can use the pre-built tools as well. Now, I think I've talked for long enough. What we'll do is we'll pass it over to Francisco, who's going to take us through these tools and agents in a lot more detail.

So, over to Francisco. - Thanks, James, for that introduction. Now we will be diving into agents. Agents are arguably the most important building block in LangChain, so it's really important that we get them right, and we'll be seeing a few examples to really understand how they work. As always, we need to initialize our OpenAI LLM, and then we will get into the definition here.

So the official definition is that agents use LLMs to determine which actions to take and in what order, and an action can be using a tool or returning a response to the user. But if we think about it more intuitively, what an agent does is apply reasoning to use several tools collectively and in unison to build an answer for the user.

So this is the key behind agents: they reason about what tools they need to use, how they need to use them, and how they need to combine their outputs to actually give the right answer. And this is a really, really powerful framework, as we will see in the next examples.

So now we will be creating a database. It's not really important to understand exactly what we're doing to create this database. The important thing here is that we're going to build an SQL database with one table, which is a table with stocks. And we will add some observations because we will want our agent to interact with these observations.

Here we have a few stocks, two stocks actually, with different prices at different date times. And the important part is where we create the tool for the agent to use. So here we create a chain which uses the database we just created; this is the engine we just created over here.

And we will create a database chain from that engine. Now we will build a tool. And here, just a small definition of what a tool is. A tool is a function or a method that the agent can use when it thinks it is necessary. So how will the agent know if it's necessary?

Well, we're giving it a description here, so we're telling the agent when the tool should be used. Here we're giving the agent the function it should run, and here's the name. So the agent will ask our chain a question about stocks and prices, and our chain will answer that question using the data from the database.
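
A sketch of that chain and tool, assuming engine is the SQLAlchemy engine for the stocks table created above; the tool name and description here are illustrative, and newer LangChain versions construct the chain with SQLDatabaseChain.from_llm(llm, db):

```python
from langchain import SQLDatabase, SQLDatabaseChain
from langchain.agents import Tool

db = SQLDatabase(engine)  # `engine` comes from the database setup above
sql_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

sql_tool = Tool(
    name="Stock DB",  # illustrative name
    func=sql_chain.run,
    description="Useful for when you need to answer questions about stocks and their prices."
)
```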

So, just a few clarifications before we dive into our first agent type. We will see different agent types in this deep dive, but each one has three variables: the tools we want to give the agent, the LLM, because all agents are LLM-based, and the agent type.

So, what type of agent do we want to use? We will start with a zero-shot ReAct agent, and we will first need to initialize it. This agent will be able to basically reason about our question, gather information from tools, and answer. That's what the zero-shot agent does.

So here we will load one tool, which is the LLM math tool, which we already saw with James. And we will append the SQL tool we saw before. And as the name suggests here, we will use this agent to perform zero-shot tasks. So we will not have many interactions with this agent, but only one.

Or at least we'll have different interactions, but they will be isolated from each other. And we will be asking questions that can be answered with the tools, and the agent will help us answer them. One note here, it's important to have in mind that we should always set the max iterations.

So the max iterations with React agents basically means how many thoughts it can have about our question and about using tools. So basically what this agent does is it thinks about what it needs to do, and then it does it. And one of the actions it can do is refer to one of the tools.

So just to avoid the agent getting into an infinite loop and using tools indefinitely, we should always set a max iterations that we're comfortable with. And depending on the use case, that might change, but it's something that is useful to take into account. Here, we will set max iterations to three.
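
A sketch of that setup: load the prebuilt llm-math tool, append the SQL tool, and cap the loop at three iterations:

```python
from langchain.agents import load_tools, initialize_agent

tools = load_tools(["llm-math"], llm=llm)
tools.append(sql_tool)

zero_shot_agent = initialize_agent(
    agent="zero-shot-react-description",
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3
)
```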

All right, so here we create our agent, and we will ask it a very complex question: what is the multiplication of the ratio between the stock prices for ABC and XYZ on two different dates? It involves quite a lot of steps, and we will try to understand what the agent is doing here.

So first it needs to compare the prices, or it knows it needs to compare these prices on two different days. It queries the SQL database for the 3rd and the 4th of January with the prices for ABC and XYZ. Here, we can see that the query is generated, and we get the results.

And then the agent is getting the answer from this tool, which returns the actual data that was requested, which is this piece here. Now, the agent, with this information, is determining that these are the right prices, and now it needs to make a calculation. So how will it make this calculation with the calculator?

Let's see what it's doing. It's first calculating the ratio of the two prices on each day with the calculator, and then, when it has the two ratios, it's using the calculator another time to calculate the multiplication of the two ratios. So this framework is really powerful because, as I said before, it's combining reasoning with the tools, and over several steps it's converging on the right answer by using those tools.

So this is the key takeaway here for the zero-shot ReAct description agent. We can see the prompt, just as a quick pass here. What we're teaching it to do is this: we're sending it, obviously, the tools that it needs to use, and then we're asking it to use this question, thought, action, action input, observation framework, and this framework can be repeated n times until it knows the final answer, or it reaches max iterations, which we referred to earlier.
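
One way to inspect that prompt is shown below; the attribute path follows the classic agent executor and may differ in newer versions:

```python
# the zero-shot agent's prompt, including the tool descriptions and the
# question / thought / action / action input / observation scaffold
print(zero_shot_agent.agent.llm_chain.prompt.template)
```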

This is a very powerful framework. You can check it out in the ReAct paper, which we'll link below, but the important thing to know here is that it enables us to combine reasoning with tools, and this is the type of thing that you can do with agents. The level of abstraction is much higher than using a tool in isolation.

So basically, the LLM now has the ability to reason about how to best use these tools. Another thing that is important here is the agent scratchpad within the prompt. That is where every thought and action that the agent has already performed gets appended, so the agent will know at each point in time what it has found out until that moment, and it will be able to continue its thought process from there.

So that's where the agent notes its previous thoughts during the thought process. Okay, so now we're ready for our second type of agent. This type of agent is really similar to the first one, so let's take a look. This is called conversational react, and this is basically the last agent, but with memory.

So we can interact with it over several interactions, and ask it questions about things that we have already said. It is really useful for building a chatbot; basically, it's the basis for chatbots in LangChain, so it's really useful. Again, we load the same tools here, and we will add memory, as we said. Then we will ask it a similar question, a bit different and a bit simpler than the one we asked the previous agent.
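
A sketch of the memory and agent setup, with the question paraphrased from the one described next; the chat_history memory key matches the prompt variable mentioned below:

```python
from langchain.memory import ConversationBufferMemory
from langchain.agents import initialize_agent

memory = ConversationBufferMemory(memory_key="chat_history")

conversational_agent = initialize_agent(
    agent="conversational-react-description",
    tools=tools,            # the same llm-math and SQL tools as before
    llm=llm,
    memory=memory,
    verbose=True,
    max_iterations=3
)

conversational_agent(
    "What is the ratio between the stock prices for 'ABC' and 'XYZ' on January the 1st?"
)
```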

So we will ask it what the ratio of the stock prices is on January the 1st for these two stocks. Let's see what it answers. So it's getting the stock data, and it calculated the actual ratio we need without the calculator, since it's a simple calculation. So let's see the prompt here.

So we will see the prompt is quite similar, but it includes this prefix, we could say, where we are telling the agent that it is an assistant, that it can assist with a wide range of tasks, and basically that it's here to assist and to answer questions or have conversations.

So this is the main thing behind this agent. And also, we can see here we have a chat history variable, where we will be including the memory for this agent. So these are the main differences. If we ask it exactly the same question as we asked the previous agent, let's see what happens.

It's using the chain and getting the data, and we are getting the right answer, but the action input for the chain was already to get the ratio. And then it didn't use the tool to multiply; it just multiplied on its own, without using a tool. So it made the decision not to use the calculator, whereas our previous agent had decided to use the calculator here.

And these two agents sometimes don't behave exactly the same, because the prompts are different. Perhaps because for this agent we're telling it that it can solve a wide range of tasks, it's getting the confidence to try out some math there. But in essence, what they're doing is quite similar; this agent just adds a conversational aspect and a memory aspect.

All right, now we will see our final two agents. The first one is called the ReAct docstore agent, and it is made to interact with a document store. So let's say that we want to interact with Wikipedia: we will need to give it two tools, one for searching articles, and the other for looking up specific terms in the article it found.

And this is, if we think about it, what we do when we search a document store like Wikipedia. We search for an article that might have the answer to our question, and then we search within the article for the specific paragraph or snippet where the answer is. So we will do exactly that.
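
A sketch of that pairing, using LangChain's Wikipedia docstore (which requires the wikipedia package); the react-docstore agent expects two tools named Search and Lookup:

```python
from langchain import Wikipedia
from langchain.agents import initialize_agent, Tool
from langchain.agents.react.base import DocstoreExplorer

docstore = DocstoreExplorer(Wikipedia())

docstore_tools = [
    Tool(name="Search", func=docstore.search,
         description="Search Wikipedia for an article"),
    Tool(name="Lookup", func=docstore.lookup,
         description="Look up a term in the most recently found article"),
]

docstore_agent = initialize_agent(
    agent="react-docstore",
    tools=docstore_tools,
    llm=llm,
    verbose=True
)

docstore_agent("What were Archimedes' last words?")
```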

So we initialize this agent, and we ask: what were Archimedes' last words? Let's see. So it's entering the chain. It searches for Archimedes, that's interesting, and let's see. So it has its first observation, which is the first paragraph of the article. It didn't find the answer there, so it then looks up "last words", and it finds the answer within that document.

So that's basically how it works. We can see more about this in the prompt; there are a few examples in there. We will not print it because it's really large, but you can do so. And here is the paper for this agent as well. So yeah, this agent is basically useful for interacting with very large document stores.

And we can think of this as basically the same thing that we would do ourselves; the agent just does it for us. And finally, we have our last agent, which is called self-ask with search. Here, the name tells us a lot about how this agent works.

It's asking questions, which are not the user's question, obviously, but intermediate questions that it asks in order to gather all the pieces of data it needs to build the final answer it will give to the user. So why will it ask all these follow-up questions about the user's original query?

Because it will use a search engine to find the intermediate answers, and then build up to the final answer for the user. So, as we can see here, we need to give it one tool, which is called the intermediate answer tool. And it will search; it must be some kind of search.

Here, we are using Google Search. We will not showcase this functionality, because for that you need an API key for this API wrapper. But we will initialize this agent and take a look at its prompt, which is basically enough to understand how it works.
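
A sketch of that agent; the video uses a Google search wrapper, but any search tool works, so this sketch assumes SerpAPIWrapper (which needs a SERPAPI_API_KEY set). The single tool must be named Intermediate Answer:

```python
from langchain import SerpAPIWrapper
from langchain.agents import initialize_agent, Tool

search = SerpAPIWrapper()  # swap in another search wrapper if you prefer

search_tool = Tool(
    name="Intermediate Answer",  # this exact name is expected by the agent type
    func=search.run,
    description="Useful for searching the web for intermediate answers"
)

self_ask_agent = initialize_agent(
    agent="self-ask-with-search",
    tools=[search_tool],
    llm=llm,
    verbose=True
)

# inspect the few-shot prompt that drives the follow-up-question behaviour
print(self_ask_agent.agent.llm_chain.prompt.template)
```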

So here, in the prompt, we can see how it works. The original question would be: who lived longer, Muhammad Ali or Alan Turing? It determines that follow-up questions are needed, and it starts asking them. So: how old was Muhammad Ali when he died? Intermediate answer, found by searching: Muhammad Ali was 74 years old.

How old was Alan Turing? Alan Turing was 41 years old. So the final answer is Muhammad Ali. So this is the kind of logic that the agent follows. And to get these intermediate answers, it needs some kind of search tool. This is one option that LangChain provides. There might be others, but it needs to be able to search.

And that's the main characteristic of this agent, which is that it gets intermediate answers to follow-up questions by searching. So you have the paper here also, if you want to dive deeper. And this is it. We're wrapping up for generic agents. These are the main agents in LangChain. There are others as well.

And as I mentioned briefly earlier, you can check out agent toolkits, for example, and there will be others in the future too. It's really worthwhile to read the docs and closely follow the developments in LangChain. And there are more things you can do: you can create your own agent.

You can use agents with several other tools. And another thing worth mentioning is that there is a tracing UI tool within LangChain, which will allow you to understand, within a beautiful UI, how the agent is thinking and what calls it made to different LLMs within its thought process.

So that is really convenient when you're using complex agents with several tools, where it might be tricky to track the whole thought process and what intermediate answers it got. So that is really recommended if you are using agents in slightly more complex scenarios. And that is it for agents in this LangChain series.

And I hope you really enjoyed this topic. I think, again, this is probably the most important topic in LangChain and the most interesting one. So yeah, see you in the next one.