LangChain Agents Deep Dive with GPT 3.5 — LangChain #7
Chapters
0:00 Why LLMs need tools
2:35 What are agents?
3:33 LangChain agents in Python
4:25 Initializing a calculator tool
5:57 Initializing a LangChain agent
8:01 Asking our agent some questions
12:39 Adding more tools to agents
14:29 Custom and prebuilt tools
16:40 Francisco's definition of agents
17:52 Creating a SQL DB tool
19:49 Zero shot ReAct agents in LangChain
24:18 Conversational ReAct agent in LangChain
26:57 ReAct docstore agent in LangChain
28:31 Self-ask with search agent
30:33 Final thoughts on LangChain agents
Large language models are incredibly powerful, 00:00:40.360 |
that a simple calculator program can do this, 00:00:42.560 |
but what is probably the most sophisticated AI program 00:01:03.080 |
It's true that LangChain was a blockchain project. 00:01:06.280 |
Yet, there didn't seem to be any LLM chain component 00:01:32.180 |
that GPT-4 has seen is what it saw during its training. 00:01:50.600 |
One suite of potential solutions to these problems 00:01:57.500 |
These agents don't just solve many of the problems 00:02:00.540 |
we saw above, but actually many others as well. 00:02:17.220 |
and how we can use them within the LangChain library 00:02:30.380 |
for more of a deep dive into agents in LangChain. 00:02:46.480 |
or I might perform a Google search for information. 00:03:04.260 |
Let's start with a very simple example of this. 00:03:08.620 |
What we're going to do is build a calculator agent 00:03:12.880 |
that can also handle some general knowledge queries. 00:03:43.020 |
somewhere near the top of the video right now. 00:03:45.020 |
And we do a pip install of LangChain and OpenAI, 00:03:54.220 |
You can use Cohere, you can use Hugging Face models. 00:04:03.540 |
there'll probably be Google PaLM models in here as well. 00:04:19.260 |
Okay, I've already run this with my API key in there. 00:04:33.460 |
if you've watched previous videos in this series, 00:04:54.140 |
is we're formatting it into what we call a tool. 00:05:06.180 |
plus a name for that chain, for that function, 00:05:25.820 |
look, this is a tool that you can use, right? 00:05:30.100 |
and you should use it when you think this is relevant. 00:05:36.340 |
for when you need to answer questions about math, right? 00:05:50.780 |
So for now, there's just one item in that list, 00:05:53.860 |
but we can add multiple, as we'll see very soon. 00:05:59.620 |
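As a rough sketch, the calculator tool described here can be put together like this (assuming an older LangChain 0.0.x-style API and an OpenAI API key in the environment; exact imports may differ in newer releases):

```python
from langchain.llms import OpenAI
from langchain.chains import LLMMathChain
from langchain.agents import Tool

# base LLM (expects OPENAI_API_KEY in the environment)
llm = OpenAI(temperature=0)

# LLMMathChain turns a natural-language math question into a computed answer
llm_math = LLMMathChain(llm=llm)

# wrap the chain as a Tool: a name, the function the agent will call,
# and a description the agent uses to decide when the tool is relevant
math_tool = Tool(
    name="Calculator",
    func=llm_math.run,
    description="Useful for when you need to answer questions about math.",
)

tools = [math_tool]  # just one tool for now
```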
we need to initialize an agent that will contain them. 00:06:03.900 |
It's gonna contain like a base large language model, 00:06:13.140 |
And there are also these different types of agents, okay? 00:06:17.940 |
So this one is a zero-shot ReAct description agent. 00:06:35.860 |
you probably wouldn't necessarily use this one, 00:06:41.500 |
So for it being zero-shot, it doesn't have that memory 00:06:45.300 |
and it's focusing on the current action only. 00:07:04.580 |
and then it will pass some information to that action item. 00:07:08.660 |
The action item is the tools that we're using here. 00:07:12.020 |
And then it will get a response from the action item, 00:07:20.500 |
We're not gonna go too into detail on that here. 00:07:23.860 |
That kind of deserves its own video, I think. 00:07:39.420 |
One other important thing here is the max iterations parameter. 00:07:44.700 |
it can go through a loop of reason and action. 00:07:48.620 |
We add max iterations so that it doesn't just go 00:07:51.540 |
into an infinite loop of reasoning and making actions 00:07:58.260 |
you can do three of these loops and that cuts off. 00:08:07.820 |
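Putting that together, a minimal sketch of the agent initialization (the expression in the final call is just an illustrative query, not the exact one used in the video):

```python
from langchain.agents import initialize_agent

zero_shot_agent = initialize_agent(
    agent="zero-shot-react-description",  # the agent type described above
    tools=tools,                          # the list containing our calculator tool
    llm=llm,
    verbose=True,       # print the thought / action / observation loop
    max_iterations=3,   # stop after three reason-and-act loops
)

# illustrative math query
zero_shot_agent("what is (4.5 * 2.1) ** 2.2?")
```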
we can see the thought process of the agent here, right? 00:08:12.260 |
So saying, huh, okay, I need to calculate this expression. 00:08:20.460 |
What's the input I'm going to pass to this tool? 00:08:26.020 |
This is what we've asked it to calculate here. 00:08:37.860 |
Okay, and it says the next thought it has is, 00:08:49.420 |
So the input was this and the output we get is this, right? 00:08:55.060 |
Now the end user doesn't need to see all of this. 00:09:02.340 |
Set this to false, we'll just get the output, right? 00:09:11.660 |
Now, what you can see, I've already run this. 00:09:23.020 |
than when we're using large language models typically, 00:09:26.900 |
and they just give us like a completely wrong answer. 00:09:33.540 |
Now, here we're kind of using natural language 00:09:37.140 |
to kind of define the calculation that needs to happen. 00:09:52.060 |
It's just kind of simple logic and calculation here. 00:09:57.220 |
But naturally it needs to use both the calculator chain, 00:10:01.900 |
the LLM math chain, and also its own sort of LLM ability 00:10:24.700 |
And it says I need to add the apples Mary has 00:10:29.380 |
Okay, we do calculate again, it's four plus 20, 24. 00:10:33.980 |
And now it's like, I now have the final answer. 00:10:39.180 |
Cool, so we see the input and we see the output. 00:10:53.740 |
And there's probably a good chance it could do this. 00:10:57.140 |
It wouldn't go through the multiple reasoning steps, 00:10:59.980 |
but it might be able to do this in a single step. 00:11:08.420 |
Well, what if we ask this, what is the capital of Norway? 00:11:18.580 |
because it's going to go to the calculator, right? 00:11:29.300 |
That's because it's seeing what is the capital of Norway 00:11:41.420 |
but the only tool that it's been given is a calculator. 00:11:51.300 |
and trying to find another tool that it believes would help. 00:11:58.300 |
but it doesn't actually have access to a lookup tool. 00:12:06.540 |
the only tool I actually have here is a calculator. 00:12:09.860 |
And obviously you pass this to the calculator tool 00:12:11.900 |
and the answer is it's not going to work, right? 00:12:27.820 |
we only have a calculator for this agent to use. 00:12:31.140 |
It's not going to be able to answer this question. 00:12:32.900 |
But what if we do want it to be able to answer 00:12:58.540 |
So all we're going to do is create this LLM chain. 00:13:02.260 |
We're not really doing anything here. 00:13:11.460 |
Okay, and we're going to call this one the language model. 00:13:21.780 |
And we just add the new tool to our tools list, like this. 00:13:25.940 |
Right, so now we've got two tools in that tool list. 00:13:29.980 |
And we just reinitialize the agent with our two tools, 00:13:35.900 |
Okay, now we say, what is the capital of Norway? 00:13:48.940 |
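A sketch of that extra general-purpose LLM tool and the re-initialized agent (the prompt here simply passes the query straight through to the model; the tool name and description are illustrative):

```python
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.agents import Tool, initialize_agent

# a pass-through prompt: the tool just hands the query to the base LLM
prompt = PromptTemplate(input_variables=["query"], template="{query}")
llm_chain = LLMChain(llm=llm, prompt=prompt)

llm_tool = Tool(
    name="Language Model",
    func=llm_chain.run,
    description="Use this tool for general purpose queries and logic.",
)

tools.append(llm_tool)  # now two tools: Calculator and Language Model

zero_shot_agent = initialize_agent(
    agent="zero-shot-react-description",
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3,
)

zero_shot_agent("What is the capital of Norway?")
```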
Okay, so we get the correct answer this time. 00:13:58.580 |
Okay, so it can answer math questions as well now. 00:14:08.180 |
that we would need separate LLM chains or chains for, 00:14:17.340 |
Francisco is in a moment going to go through, 00:14:32.980 |
because I kind of missed it for the sake of simplicity, 00:14:35.620 |
but it wouldn't be fair for me to not mention it, 00:14:38.220 |
is up here we defined our math tool, right? 00:14:48.940 |
and a set of pre-built tools that come with LangChain. 00:14:53.940 |
And to use those, we would write something like 00:15:01.940 |
And from there, we would just do tools, load tools. 00:15:11.620 |
And then here we actually pass a list of the tools 00:15:22.740 |
So we also need to pass LLM in there as well, right? 00:15:31.980 |
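A sketch of the prebuilt-tool route, assuming the "llm-math" tool name, which wraps the same LLMMathChain; tools that wrap a chain need the LLM passed in:

```python
from langchain.agents import load_tools

# "llm-math" is the prebuilt calculator tool; it needs the llm because
# it wraps an LLMMathChain under the hood
tools = load_tools(["llm-math"], llm=llm)

# tools[0].name / tools[0].description should match the hand-built Calculator tool
```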
If we look at the tool's name and the tool's description, 00:15:31.980 |
useful for when you need to answer questions about math. 00:15:48.380 |
it's going to show you a ton of other things as well, 00:15:54.380 |
which is very useful for when you're trying to show people 00:16:05.740 |
And we'll see that we actually get the same thing, 00:16:07.380 |
we have the same tool name, we have the same description. 00:16:17.500 |
So these two bits of code here do the exact same thing, 00:16:22.100 |
I'm just saying that you can initialize a tool by yourself 00:16:30.980 |
What we'll do is we'll pass it over to Francisco, 00:16:33.340 |
who's going to take us through these tools and agents 00:16:48.940 |
So it's really important that we get them right. 00:16:55.700 |
As always, we need to initialize our OpenAI LLM 00:17:04.660 |
So the official definition is that agents use LLMs 00:17:08.540 |
to determine which actions to take and in what order. 00:17:11.380 |
And an action can be using a tool or returning to the user. 00:17:23.580 |
to use several tools collectively and in unison 00:17:34.140 |
It's that the agents are reasoning about what tools 00:17:34.140 |
they need to use and how they need to use them 00:17:45.740 |
And this is a really, really powerful framework 00:17:56.220 |
exactly what we're doing to create this database. 00:18:12.220 |
Here we have a few stocks, two stocks actually, 00:18:15.220 |
with different prices at different dates and times. 00:18:15.220 |
And the important part where we will be creating 00:18:30.820 |
This is the engine we just created over here. 00:18:35.820 |
And we will create a database chain from that engine. 00:18:43.980 |
And here, just a small definition of what a tool is. 00:18:47.980 |
A tool is a function or a method that the agent can use 00:18:55.740 |
So how will the agent know if it's necessary? 00:19:00.460 |
So we're telling the agent when it should be used. 00:19:03.300 |
Here we're giving the agent the function it should run. 00:19:29.620 |
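A hedged sketch of that SQL tool, assuming a SQLAlchemy engine already pointing at the stock-price table; the tool name "Stock DB" and the description are illustrative, and SQLDatabaseChain has moved between packages across LangChain versions:

```python
from langchain.sql_database import SQLDatabase
from langchain.chains import SQLDatabaseChain
from langchain.agents import Tool

db = SQLDatabase(engine)  # wrap the SQLAlchemy engine created above
sql_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)

sql_tool = Tool(
    name="Stock DB",      # illustrative name
    func=sql_chain.run,   # the function the agent should run
    description=(
        "Useful for when you need to answer questions about stocks "
        "and their prices stored in the database."
    ),
)
```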
We will see different agent types in this deep dive. 00:19:51.860 |
And we will first need to initialize our agent. 00:19:57.540 |
And this agent will be able to basically reason 00:20:00.980 |
about our question, gather information from tools, 00:20:15.780 |
And we will append the SQL tool we saw before. 00:20:22.740 |
we will use this agent to perform zero-shot tasks. 00:20:24.900 |
So we will not have many interactions with this agent, 00:20:29.260 |
Or at least we'll have different interactions, 00:20:40.860 |
One note here, it's important to have in mind 00:20:44.180 |
that we should always set the max iterations. 00:20:49.820 |
basically means how many thoughts it can have 00:21:06.820 |
So just to avoid the agent getting into an infinite loop 00:21:16.100 |
And depending on the use case, that might change, 00:21:18.580 |
but it's something that is useful to take into account. 00:21:30.300 |
which is what is the multiplication of the ratio 00:21:32.220 |
between stock prices for ABC and XYZ on two different dates? 00:21:32.220 |
and we will try to understand what the agent is doing here. 00:21:48.620 |
It queries the SQL database for the 3rd and the 4th of January 00:21:57.020 |
Here, we can see that the query is generated, 00:22:02.340 |
And then the agent is getting the answer from this tool, 00:22:06.660 |
which returns the actual data that was requested, 00:22:14.740 |
is determining that these are the right prices, 00:22:23.300 |
So how will it make this calculation with the calculator? 00:22:32.780 |
and then when it has the two ratios with the calculator, 00:22:37.580 |
to calculate the multiplication between the two ratios. 00:22:49.220 |
and converging to the right answer by using these tools. 00:22:58.020 |
We can see the prompt, just as a quick pass here. 00:23:08.020 |
and then we're asking it to use this question, 00:23:11.180 |
thought, action, action input, observation framework, 00:23:11.180 |
or it reaches max iterations, which we referred to earlier. 00:23:32.740 |
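If you want to inspect that prompt yourself, the agent's template can usually be printed like this (the attribute path is from older LangChain versions and may differ):

```python
# shows the Question / Thought / Action / Action Input / Observation scaffold
print(zero_shot_agent.agent.llm_chain.prompt.template)
```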
is that it enables us to combine reasoning with tools, 00:23:44.580 |
So basically, the LLM now has the ability to reason 00:23:50.900 |
and basically, another thing that is important here 00:23:59.780 |
that the agent has already performed will be appended, 00:24:02.700 |
and thus, the agent will know at each point in time 00:24:12.260 |
So that's where the agent notes its previous thoughts 00:24:17.700 |
Okay, so now we're ready for our second type of agent. 00:24:22.020 |
This type of agent is really similar to the first one, 00:24:27.820 |
and this is basically the last agent, but with memory. 00:24:32.420 |
So we can interact with it in several interactions, 00:24:36.260 |
and ask it questions from things that we have already said. 00:24:43.980 |
Basically, it's the basis for chatbots in LangChain, 00:24:43.980 |
So we will ask it what the ratio of stock prices is 00:25:27.620 |
where we are telling the agent that it is an assistant, 00:25:31.380 |
that it can assist with a wide range of tasks, 00:25:36.540 |
and to answer questions or have conversations. 00:25:43.380 |
And also, we can see here we have a chat history variable, 00:25:47.660 |
where we will be including the memory for this agent. 00:25:57.500 |
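A minimal sketch of the conversational ReAct agent, reusing the same tools; the memory_key must match the chat_history variable in the prompt, and the import path for the memory class varies by LangChain version:

```python
from langchain.agents import initialize_agent
from langchain.memory import ConversationBufferMemory

# buffer memory that fills the {chat_history} variable in the agent's prompt
memory = ConversationBufferMemory(memory_key="chat_history")

conversational_agent = initialize_agent(
    agent="conversational-react-description",
    tools=tools,      # e.g. the SQL tool plus the calculator from before
    llm=llm,
    memory=memory,    # this is what gives the agent its conversation history
    verbose=True,
    max_iterations=3,
)
```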
as we asked the previous agent, let's see what happens. 00:26:23.660 |
So it made this decision of not using the calculator, 00:26:31.780 |
And these two agents sometimes don't behave exactly the same 00:26:47.860 |
But in essence, what they're doing is quite similar, 00:26:52.460 |
but this agent is including a conversational aspect 00:26:57.420 |
All right, now we will see our final two agents. 00:27:00.980 |
And this first one is called the React Docs Store agent, 00:27:03.660 |
and it is made to interact with a document store. 00:27:07.220 |
So let's say that we want to interact with Wikipedia, 00:27:13.340 |
one for searching articles, and the other one 00:27:16.100 |
is for looking up specific terms in the article it found. 00:27:22.140 |
what we do when we search a document store like Wikipedia. 00:27:24.860 |
We search for an article that might have the answer 00:27:27.500 |
to our question, and then we search within the article 00:27:30.500 |
for the specific paragraph or snippet where the answer is. 00:27:30.500 |
and we will run what were Archimedes' last words. 00:27:52.860 |
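A sketch of that docstore setup using LangChain's Wikipedia wrapper (it requires the `wikipedia` package; the tool names "Search" and "Lookup" are what the react-docstore agent expects, while the descriptions are illustrative):

```python
from langchain.agents import Tool, initialize_agent
from langchain.agents.react.base import DocstoreExplorer
from langchain.docstore.wikipedia import Wikipedia  # needs `pip install wikipedia`

docstore = DocstoreExplorer(Wikipedia())

docstore_tools = [
    Tool(name="Search", func=docstore.search,
         description="Search Wikipedia for an article"),
    Tool(name="Lookup", func=docstore.lookup,
         description="Look up a specific term in the article just found"),
]

docstore_agent = initialize_agent(
    agent="react-docstore", tools=docstore_tools, llm=llm,
    verbose=True, max_iterations=3,
)

docstore_agent("What were Archimedes' last words?")
```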
This observation is the first paragraph of the article. 00:28:09.900 |
We will not print it because it's really large, 00:28:13.340 |
And here is the paper for this agent as well. 00:28:20.100 |
for interacting with very large document sources. 00:28:20.100 |
And we can think here that this is basically the same thing 00:28:36.780 |
And basically, here the name is telling us a lot 00:28:42.180 |
It's asking questions, which are not the user's question, 00:28:42.180 |
that it's asking to understand all the pieces of data 00:28:57.700 |
So why will it ask all these follow-up questions 00:29:08.660 |
and then build to the last final answer to the user. 00:29:15.020 |
one tool, which is called the intermediate answer tool. 00:29:26.420 |
For that, you need an API key to use this API wrapper. 00:29:26.420 |
and we will see its prompt, which is basically enough 00:29:45.340 |
who lived longer, Muhammad Ali or Alan Turing? 00:29:50.260 |
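A sketch of the self-ask-with-search agent, assuming the SerpAPI wrapper mentioned here (you would need a SerpAPI key; the single tool must be named "Intermediate Answer", and the description is illustrative):

```python
from langchain.utilities import SerpAPIWrapper
from langchain.agents import Tool, initialize_agent

# needs `pip install google-search-results` and a SerpAPI key
search = SerpAPIWrapper(serpapi_api_key="...")

# this agent type expects exactly one tool, named "Intermediate Answer"
search_tools = [
    Tool(
        name="Intermediate Answer",
        func=search.run,
        description="Useful for searching the web for factual answers",
    ),
]

self_ask_agent = initialize_agent(
    agent="self-ask-with-search", tools=search_tools, llm=llm, verbose=True,
)

self_ask_agent("Who lived longer, Muhammad Ali or Alan Turing?")
```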
it determines that it needs follow-up questions, 00:30:06.620 |
So this is the kind of logic that the agent follows. 00:30:16.500 |
There might be others, but it needs to be able to search. 00:30:21.220 |
of this agent, which is that it gets intermediate answers 00:30:28.700 |
So you have the paper here also, if you want to dive deeper. 00:30:43.620 |
you can check out agent toolkits, for example. 00:31:15.660 |
how the agent is thinking, or what different calls 00:31:19.580 |
to different LLMs it made within its thought process. 00:31:24.380 |
you're using complex agents with several tools. 00:31:26.980 |
And it might be tricky to track the whole thought process 00:31:37.900 |
are using agents in a little bit more complex scenarios. 00:31:43.940 |
So this is it for agents here in the LangChain series. 00:31:51.740 |
I think, again, this is probably the most important topic