Build Conversational Agents with Vector DBs - LangChain #9
Chapters
0:00 LangChain Agents with Vector DBs
1:27 Code Setup and Data Prep
3:14 Vector DB Pipeline Setup
5:35 Indexing with OpenAI and Pinecone
7:53 Querying via LangChain
9:33 Building the Retrieval Augmented Chatbot
13:52 Using the Conversational Agent Chatbot
17:17 Real-world Usage of this Method
00:00:00.000 |
In one of the early videos in this series on LangChain, we looked at retrieval augmentation, and one question that came 00:00:10.520 |
from that is how can the large language model know 00:00:14.680 |
when to actually search through the vector database? 00:00:21.400 |
Sometimes a query is generic enough that it doesn't need to refer to any external knowledge. 00:00:24.000 |
At that point, there's no reason for the model 00:00:27.480 |
to actually go to a vector database and retrieve information. 00:00:35.400 |
So how do we build a system where we're not always querying our vector database? 00:00:41.520 |
The first option is you just actually stick with plain retrieval but set a similarity score threshold, 00:00:47.540 |
where if a retrieved context is below that threshold, 00:00:51.900 |
you just don't include it as the added information 00:00:54.760 |
within your query to the large language model. 00:01:00.160 |
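To make that idea concrete, here is a minimal, hypothetical sketch of the threshold filter; the helper, scores, and threshold value are all illustrative, not from the video:

```python
def filter_contexts(matches, threshold=0.75):
    # Keep only retrieved contexts whose similarity score clears the cutoff.
    # `matches` is a list of (score, text) pairs from any vector DB query.
    return [text for score, text in matches if score >= threshold]

# If nothing survives the filter, the query goes to the LLM
# with no retrieved context attached at all.
contexts = filter_contexts([(0.82, "a relevant passage"), (0.41, "noise")])
```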
The second option, which is what we're going to talk about today, 00:01:02.360 |
is actually using a retrieval tool as part of an AI agent. 00:01:07.360 |
So if you have been following along with this series, 00:01:12.160 |
we're essentially going to take what we spoke about last, 00:01:15.080 |
which is agents, and what we spoke about earlier 00:01:18.080 |
in the series, which is retrieval augmentation, and put the two together. 00:01:22.600 |
So, I mean, let's jump straight into the code. 00:01:26.020 |
So we have this notebook that you can follow along with; 00:01:30.720 |
there should be a link to it somewhere near the top of the video right now. 00:01:33.360 |
And the first thing we need to do is obviously install our prerequisite libraries. 00:01:37.160 |
So we have OpenAI, Pinecone, LangChain, tiktoken, and so on. 00:01:44.840 |
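The install cell likely looks something like this; the inclusion of the datasets library and the exact package names are assumptions based on what the rest of the walkthrough uses:

```python
# Notebook install cell; library versions as of mid-2023.
!pip install -qU openai "pinecone-client[grpc]" langchain tiktoken datasets
```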
I've already run it, so I'm not going to run it again. 00:01:46.960 |
First thing I'm going to do is actually load our dataset. 00:01:51.160 |
Now, this is the dataset we're going to be using 00:02:08.680 |
because it needs hardly any pre-processing, so we can focus more on the agent side of things. 00:02:13.760 |
So yeah, we're using the Stanford Question Answering Dataset (SQuAD). 00:02:17.240 |
And within this, so the reason that we don't need to do 00:02:20.640 |
any of this data pre-processing that we usually do is that the contexts already come as reasonably sized passages. 00:02:25.840 |
So we're just converting it into a pandas data frame. 00:02:40.000 |
Typically, what you'll find is if you're, for example, 00:02:43.240 |
working with PDFs and you want to store those, you'd need to chunk and clean them first. 00:02:56.640 |
But one thing we do need to do is actually deduplicate this, because many records share the same context. 00:03:07.600 |
We're just going to keep the first of each one of those duplicates. 00:03:11.280 |
And now you can see that the contexts are all different. 00:03:15.560 |
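As a sketch, the load-and-deduplicate step probably looks something like this, assuming the "squad" dataset id on the Hugging Face hub:

```python
from datasets import load_dataset

# Load SQuAD from the Hugging Face hub.
data = load_dataset("squad", split="train")

# Convert to a pandas DataFrame, then deduplicate on the context field,
# keeping the first occurrence of each repeated context.
df = data.to_pandas()
df = df.drop_duplicates(subset="context", keep="first")
print(len(df))
```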
So I mean, that's the data prep side of things done; next is the embedding model. 00:03:28.000 |
So we're going to be using text-embedding-ada-002. 00:03:31.680 |
Again, you can use any embedding model you want. 00:03:39.080 |
Okay, so I'm going to enter my API key there. 00:03:41.640 |
The API key can be obtained from platform.openai.com. 00:03:49.880 |
So I'm going to go into my dashboard and grab it; 00:03:57.360 |
what I want to do is just copy the key value. 00:04:20.000 |
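A minimal sketch of that embedding setup, assuming the LangChain 0.0.x-era API from around the time of this video:

```python
from langchain.embeddings.openai import OpenAIEmbeddings

# text-embedding-ada-002, initialized through LangChain.
embed = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key="YOUR_OPENAI_API_KEY",  # from platform.openai.com
)
```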
Cool, so this is going to first initialize that index. 00:04:26.200 |
Sorry, this initializes the connection to Pinecone, and then creates the index if it doesn't already exist. 00:04:59.560 |
The embedding dimension needs to match the model; again, this is something specific to each model. 00:05:24.240 |
And then we're going to describe the index stats 00:05:28.320 |
And we should see that total vector count is zero 00:05:30.400 |
'cause we haven't added anything in there yet. 00:05:43.640 |
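Here is a sketch of that setup using the classic pinecone-client (v2.x) API from the time of the video; the index name and the metric choice are assumptions:

```python
import pinecone

pinecone.init(
    api_key="YOUR_PINECONE_API_KEY",
    environment="YOUR_PINECONE_ENVIRONMENT",  # e.g. "us-west1-gcp"
)

index_name = "langchain-retrieval-agent"  # hypothetical index name
if index_name not in pinecone.list_indexes():
    # The dimension must match the embedding model:
    # text-embedding-ada-002 produces 1536-dimensional vectors.
    pinecone.create_index(index_name, dimension=1536, metric="cosine")

# The gRPC index is the faster client mentioned in the video.
index = pinecone.GRPCIndex(index_name)
index.describe_index_stats()  # total_vector_count should be 0 for a fresh index
```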
Now, we do this directly with the Pinecone client rather than through LangChain, because it's faster. 00:05:55.080 |
So I find this is just the better way of doing it. 00:06:39.480 |
We take a batch of those text chunks, and then we embed those using text-embedding-ada-002. 00:06:49.360 |
We usually call them either contexts or documents. 00:07:07.200 |
We also set a unique ID for every item; that's important, 00:07:10.160 |
otherwise we're going to overwrite records within Pinecone. 00:07:32.320 |
Then we upsert each batch, passing in each record's ID, embedding, and metadata. 00:07:36.000 |
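A sketch of that indexing loop under the same assumptions as above; the batch size and the exact metadata fields are illustrative:

```python
from tqdm.auto import tqdm

batch_size = 100  # hypothetical batch size

for i in tqdm(range(0, len(df), batch_size)):
    batch = df.iloc[i : i + batch_size]
    # Unique IDs matter: re-using an ID overwrites that record in Pinecone.
    ids = batch["id"].tolist()
    texts = batch["context"].tolist()
    # Embed the whole batch of contexts in one call.
    embeds = embed.embed_documents(texts)
    # Keep the raw text in metadata so it can be retrieved later.
    metadata = [{"text": t, "title": ti} for t, ti in zip(texts, batch["title"])]
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
```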
Okay, so I will fast forward to let this finish. 00:07:44.280 |
Once it's done, we can check the index stats again, and we should see now that it has been populated. 00:07:48.360 |
So we have almost 19,000 vectors in there now, or records. 00:07:53.800 |
So up to here, we've been using the Pinecone client directly, because it's faster 00:07:59.640 |
than using the implementation in LangChain at the moment, 00:08:03.080 |
but now we're going to switch back to LangChain 00:08:05.360 |
because we want to be able to use the conversational agent 00:08:09.120 |
and all the other tooling that comes with LangChain. 00:08:13.360 |
So what we're going to do is reinitialize our index, 00:08:16.640 |
and we're going to use a normal index and not gRPC 00:08:20.040 |
because that is what is implemented in LangChain. 00:08:20.040 |
Then we initialize a vector store, which is basically LangChain's version of the index. 00:08:27.680 |
It just includes the embedding model in there as well. 00:08:33.200 |
And we also pass in the text field; that's important, because it tells LangChain which metadata field contains the text. 00:08:36.440 |
So for us, that is "text", because we set it here. 00:08:46.040 |
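That switch probably looks something like the following sketch, using LangChain's Pinecone vector store wrapper from the same era:

```python
from langchain.vectorstores import Pinecone

# Reconnect with the normal (non-gRPC) index, which LangChain expects.
index = pinecone.Index(index_name)

# "text" is the metadata field where the raw passage text was stored.
text_field = "text"
vectorstore = Pinecone(index, embed.embed_query, text_field)
```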
And like we did before in the previous retrieval video, we can query the vector store 00:08:52.600 |
by using the similarity search method with our query here. 00:09:00.360 |
That returns the top three most similar records; 00:09:15.880 |
passages, contexts, whatever you want to call them. 00:09:18.080 |
And we can see that we get, so this is a document here. 00:09:21.960 |
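As a sketch, with the query string paraphrased from what's used later in the video:

```python
query = (
    "when was the college of engineering in the "
    "University of Notre Dame established?"
)

# Return the top three most similar passages for the query.
vectorstore.similarity_search(query, k=3)
```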
So let's now move on to the agent part of things. 00:09:38.800 |
Okay, so our conversational agent needs our chat LLM, 00:09:44.000 |
conversational memory, and the retrieval QA chain. 00:10:19.440 |
For the memory, the memory key needs to be "chat_history", because that is what the memory is referred to as 00:10:24.440 |
in the, I think the conversational agent component. 00:10:29.320 |
So whenever you're using the conversational agent, you should use that memory key. 00:10:34.760 |
We're going to remember the previous five interactions, 00:10:38.240 |
and yeah, that's our conversational memory, okay? 00:10:41.440 |
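A minimal sketch of those two components, assuming the LangChain 0.0.x import paths; the temperature setting is an assumption:

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

# The chat LLM; gpt-3.5-turbo is the model mentioned later in the video.
llm = ChatOpenAI(
    openai_api_key="YOUR_OPENAI_API_KEY",
    model_name="gpt-3.5-turbo",
    temperature=0.0,
)

# "chat_history" is the memory key the conversational agent looks for;
# k=5 keeps the previous five interactions.
conversational_memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    k=5,
    return_messages=True,
)
```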
So after that, we set up our retrieval Q&A chain. 00:10:50.280 |
We use the "stuff" chain type, so that basically means when you are retrieving the, 00:10:54.320 |
I think the three items from the vector store, they get passed straight into the prompt. 00:11:03.120 |
So we're gonna kind of like stuff them all into the context, 00:11:05.920 |
rather than doing like any fancy summarization or anything like that. 00:11:16.680 |
We did something similar in the previous retrieval video; it was just a slightly different class or object. 00:11:21.680 |
And then with those, we can generate our answer, okay? 00:11:24.360 |
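A sketch of that chain, under the same assumptions:

```python
from langchain.chains import RetrievalQA

# "stuff" simply stuffs the retrieved passages into the prompt,
# with no summarization or map-reduce step in between.
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

qa.run(query)
```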
So we run that, and we're using the same query here. 00:11:34.480 |
We come down, and the answer is the College of Engineering 00:11:38.520 |
was established in 1920 at the University of Notre Dame. 00:11:46.360 |
That answer was generated by our GPT-3.5 Turbo model based on the context 00:11:51.360 |
that we retrieved from our vector store, okay? 00:11:54.760 |
So basically based on these three documents here. 00:11:57.920 |
Cool, now that's good, but that isn't the whole thing yet. 00:12:02.920 |
That's actually just a retrieval Q&A chain, okay? 00:12:11.000 |
To use it with an agent, we actually need to convert our retrieval Q&A chain into a tool. 00:12:20.440 |
We get a tools list, which is what we'll pass to our agent, 00:12:27.520 |
but we're only actually using one tool in this case. 00:12:52.240 |
The tool's description is important, because the agent uses it to decide which tool to use if you have multiple tools. 00:13:02.320 |
Okay, and I think ours is a pretty clear description. 00:13:07.840 |
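A sketch of the tool definition; the tool name and description text are paraphrased guesses at what's shown in the video, not verbatim:

```python
from langchain.agents import Tool

# The description is what the agent reads when deciding whether
# (and when) to call this tool.
tools = [
    Tool(
        name="Knowledge Base",
        func=qa.run,
        description=(
            "use this tool when answering general knowledge queries to get "
            "more information about the topic"
        ),
    )
]
```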
And yeah, so from there, we initialize our agent. 00:13:16.520 |
We set verbose to true, but that just means we're going to get a load of output showing the agent's reasoning. 00:13:25.720 |
We also set a maximum number of iterations, which limits how many times the agent can use a tool, basically go through a tool usage loop. 00:13:32.920 |
Otherwise it can, what can happen is it can keep going over and over again. 00:13:40.960 |
And with the early stopping method set to "generate", the model is going to decide when to stop generation. 00:13:45.520 |
And we also need to include our conversational memory. 00:13:50.640 |
Okay, we run that, and now our agent is ready to use. 00:13:54.680 |
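Putting it together, a sketch of the agent initialization; the max_iterations value is an assumption, since the video doesn't state the number:

```python
from langchain.agents import initialize_agent

agent = initialize_agent(
    agent="chat-conversational-react-description",
    tools=tools,
    llm=llm,
    verbose=True,                      # print the agent's reasoning steps
    max_iterations=3,                  # cap on tool-usage loops (assumed value)
    early_stopping_method="generate",  # let the model decide when to stop
    memory=conversational_memory,
)

# Usage: the agent decides for itself whether to call the Knowledge Base tool.
agent(query)
```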
So we can, let's pass in that query that we used before. 00:13:57.800 |
So the, this one was the, let me run it and we'll see. 00:14:04.680 |
In the verbose output, we can see the generated question that the LLM is passing to our tool. 00:14:15.800 |
Basically, sometimes the agent will reformat this question into one it thinks will retrieve better results. 00:14:24.760 |
So our question is, when was the College of Engineering at the University of Notre Dame established, and it decides to use the knowledge base tool. 00:14:34.120 |
So the observation is the College of Engineering 00:14:37.280 |
at the University of Notre Dame was established in 1920. 00:14:43.880 |
Then the agent's thought is along the lines of: I think I have enough information to answer this question. 00:14:48.400 |
And then the final answer it returns is this. 00:15:08.440 |
And you see, it doesn't decide to use a knowledge base here. 00:15:29.360 |
But when we ask it for facts about Notre Dame, it decides to use the knowledge base and to pass in "University of Notre Dame facts". 00:15:32.160 |
So you can see here that it's not just passing in our exact question; it's passing in a query 00:15:39.560 |
that it thinks will basically return better results. 00:15:45.520 |
From that, obviously it got some of the context that we saw before. 00:15:49.680 |
And based on the information in those contexts, 00:15:52.400 |
it's come up with all of this, all these facts, right? 00:16:33.520 |
Next, we can test the conversational memory with a follow-up question like, can you summarize these facts in two short sentences? 00:16:38.880 |
With that, we're just kind of referring to the previous interaction. 00:16:42.320 |
So let's run that and see what it comes up with. 00:16:47.560 |
I'm just gonna read it here 'cause it's a little bit easier. 00:16:51.840 |
The University of Notre Dame is a Catholic research 00:16:51.840 |
university located in South Bend, Indiana, USA, and 00:16:54.520 |
is consistently ranked among the top 20 universities 00:16:58.760 |
in the United States and as a major global university. 00:17:03.520 |
So that is how we would implement a retrieval augmented conversational agent in LangChain. 00:17:20.880 |
So really kind of taking the previous few videos and putting them together. 00:17:29.480 |
So, you know, we took the retrieval augmentation, 00:17:35.240 |
we took an agent, and we created a tool that, you know, 00:17:41.840 |
could allow us to access our external knowledge base. 00:17:54.200 |
I'm already seeing this approach used actually quite a lot in many use cases. 00:18:08.160 |
Especially if you're building tools or applications with large language models, 00:18:15.000 |
this is probably something that you're gonna come across quite often. 00:18:20.160 |
I hope all of this has been useful and interesting.