
Build Conversational Agents with Vector DBs - LangChain #9


Chapters

0:00 LangChain Agents with Vector DBs
1:27 Code Setup and Data Prep
3:14 Vector DB Pipeline Setup
5:35 Indexing with OpenAI and Pinecone
7:53 Querying via LangChain
9:33 Building the Retrieval Augmented Chatbot
13:52 Using the Conversational Agent Chatbot
17:17 Real-world Usage of this Method


00:00:00.000 | In one of the early videos in this series on LangChain,
00:00:03.180 | we talked about retrieval augmentation.
00:00:05.760 | And one of the most commonly asked questions
00:00:10.520 | from that is how can the large language model know
00:00:14.680 | when to actually search through the vector database?
00:00:17.880 | Because obviously if you're just chatting
00:00:20.400 | with the chatbot,
00:00:21.400 | it doesn't need to refer to any external knowledge.
00:00:24.000 | At that point, there's no reason for the model
00:00:27.480 | to actually go to a vector database and retrieve information.
00:00:31.600 | So how can we make it an optional thing
00:00:35.400 | where we're not always querying our vector database?
00:00:39.200 | Well, I mean, there's kind of two options.
00:00:41.520 | The first option is you just actually stick with that
00:00:44.600 | and you just set like a similarity threshold
00:00:47.540 | where if a retrieved context is below that threshold,
00:00:51.900 | you just don't include it as the added information
00:00:54.760 | within your query to the large language model.
00:00:58.960 | And the second option,
00:01:00.160 | which is what we're going to talk about today
00:01:02.360 | is actually using a retrieval tool as part of an AI agent.
00:01:07.360 | So if you have been following along with this series,
00:01:12.160 | we're essentially going to take what we spoke about last,
00:01:15.080 | which is agents and what we spoke about earlier
00:01:18.080 | in the series, which is retrieval augmentation.
00:01:20.400 | And we're going to put them both together.
00:01:22.600 | So, I mean, let's jump straight into the code.
00:01:26.020 | So we have this notebook that you can follow along with.
00:01:29.160 | There'll be a link to this
00:01:30.720 | somewhere near the top of the video right now.
00:01:33.360 | And the first thing we need to do is obviously
00:01:35.200 | install any prerequisite libraries.
00:01:37.160 | So we have OpenAI, Pinecone, LangChain, tiktoken,
00:01:40.880 | and the Hugging Face datasets library.
00:01:43.000 | So, yeah, we run those.
00:01:44.840 | I've already run it, so I'm not going to run it again.
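
For reference, the install cell looks roughly like this; the exact version pins used in the original notebook may differ:

```python
# Rough sketch of the prerequisite install (version pins may differ from the notebook)
!pip install -qU openai "pinecone-client[grpc]" langchain tiktoken datasets
```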
00:01:46.960 | First thing I'm going to do is actually load our dataset.
00:01:51.160 | Now, this is the dataset we're going to be using
00:01:53.640 | to create our knowledge base.
00:01:57.040 | It's basically a pre-processed dataset.
00:02:00.920 | We won't need to do any of the chunking
00:02:04.400 | or anything that we would usually do.
00:02:06.040 | And that's on purpose.
00:02:07.240 | I want this to be pretty simple
00:02:08.680 | and we can focus more on the agent side of things
00:02:11.600 | rather than the data prep side of things.
00:02:13.760 | So yeah, we're using the Stanford Question Answering Dataset (SQuAD).
00:02:17.240 | And within this, so the reason that we don't need to do
00:02:20.640 | any of this data pre-processing that we usually do
00:02:23.400 | is if we just run this.
00:02:25.840 | So just converting it into a pandas DataFrame
00:02:28.920 | is because we have these contexts
00:02:31.440 | and each context is roughly a paragraph
00:02:34.160 | or a little bit more of text.
00:02:36.520 | And that's what we're going to be indexing
00:02:38.560 | within our knowledge base.
00:02:40.000 | Typically, what you'll find is if you're, for example,
00:02:43.240 | working with PDFs and you want to store those
00:02:45.040 | in your knowledge base,
00:02:46.400 | you'll need to chunk that long piece of text
00:02:49.720 | into smaller chunks.
00:02:51.720 | This basically is already chunked.
00:02:54.400 | It just makes our life a little bit easier.
00:02:56.640 | But one thing we do need to do is actually deduplicate this
00:02:59.480 | because we have many of the same contexts
00:03:01.560 | over and over again.
00:03:02.840 | So we just do that here.
00:03:05.080 | So just drop duplicates on the subset.
00:03:07.600 | We're just going to keep the first of each one of those
00:03:09.680 | and do all that in place.
00:03:11.280 | And now you can see that the contexts are now different.
00:03:14.400 | Okay, cool.
00:03:15.560 | So I mean, that's the data prep side of things done.
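
Sketched out, the data prep described above looks roughly like this, assuming the Hugging Face `squad` dataset and deduplication on the `context` column as in the video:

```python
from datasets import load_dataset

# Load the Stanford Question Answering Dataset (SQuAD) from Hugging Face
data = load_dataset("squad", split="train")

# Convert to a pandas DataFrame so we can inspect and deduplicate it
df = data.to_pandas()

# Many rows repeat the same context, so keep only the first occurrence of each
df.drop_duplicates(subset="context", keep="first", inplace=True)
```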
00:03:19.480 | And what we're going to want to do now
00:03:21.560 | is initialize both the embedding model
00:03:24.320 | and our vector database.
00:03:26.080 | So embedding model first.
00:03:28.000 | So we're going to be using text-embedding-ada-002
00:03:30.400 | from OpenAI.
00:03:31.680 | Again, you can use like any embedding model you want.
00:03:34.560 | It doesn't need to be OpenAI,
00:03:36.440 | doesn't need to be text-embedding-ada-002.
00:03:39.080 | Okay, so I'm going to enter my API key there.
00:03:41.640 | You can get the API key from platform.openai.com.
00:03:45.880 | And then we need our Pinecone API key
00:03:48.600 | and Pinecone environment.
00:03:49.880 | So I'm going to go into my dashboard and grab those.
00:03:53.360 | So that is app.pinecone.io.
00:03:55.840 | You go to API keys,
00:03:57.360 | and what I want to do is just copy the key value
00:04:00.120 | and remember my environment here.
00:04:02.000 | So I've got us-west1-gcp.
00:04:04.280 | Okay, yours might vary.
00:04:05.920 | So make sure you actually check this
00:04:08.520 | for your environment variable.
00:04:11.080 | So now I'm going to run this.
00:04:12.760 | Enter my API key and then my environment.
00:04:16.040 | So us-west1-gcp.
00:04:20.000 | Cool, so this is going to first initialize that index.
00:04:24.400 | All right, so here.
00:04:26.200 | Sorry, this initializes the connection to Pinecone,
00:04:30.000 | not the index.
00:04:31.440 | And then if we don't have an existing index
00:04:35.360 | with this index name,
00:04:37.640 | then it initializes the index.
00:04:40.040 | Now for the metric, we're using dot product.
00:04:42.160 | That is specific to text-embedding-ada-002.
00:04:46.160 | A lot of models actually use cosine.
00:04:48.280 | So if you're not sure what to use,
00:04:51.360 | then I'd recommend you just use cosine
00:04:54.160 | and see how it works, if it works.
00:04:57.640 | And also dimensionality.
00:04:59.560 | So again, this is something specific to each model.
00:05:02.120 | For text-embedding-ada-002, it is 1536.
00:05:06.760 | Okay, cool, I will let that run.
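
A rough sketch of this step, using the pinecone-client and LangChain APIs current at the time of the video; the index name and environment-variable names here are assumptions:

```python
import os
import pinecone
from langchain.embeddings.openai import OpenAIEmbeddings

# Embedding model: text-embedding-ada-002 (1536-dimensional)
embed = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key=os.environ["OPENAI_API_KEY"],
)

# Initialize the connection to Pinecone
pinecone.init(
    api_key=os.environ["PINECONE_API_KEY"],
    environment="us-west1-gcp",  # yours may differ - check your dashboard
)

index_name = "langchain-retrieval-agent"  # assumed name, use whatever you like
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        metric="dotproduct",  # specific to text-embedding-ada-002
        dimension=1536,       # output dimensionality of text-embedding-ada-002
    )
```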
00:05:09.560 | Okay, that's initialized.
00:05:11.120 | And then we connect to the index.
00:05:12.480 | So again, passing the same index name.
00:05:15.080 | I'm using gRPC index.
00:05:16.480 | You can also just use index.
00:05:18.640 | But gRPC index is just more reliable
00:05:21.120 | and a little bit faster.
00:05:22.280 | So I go with that.
00:05:24.240 | And then we're going to describe the index stats
00:05:26.160 | so we can see what's in there at the moment.
00:05:28.320 | And we should see that total vector count is zero
00:05:30.400 | 'cause we haven't added anything in there yet.
00:05:33.360 | So, okay, that's great.
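
Connecting to the index and checking its stats is then roughly:

```python
# gRPC index client - a bit faster and more reliable than the plain Index client
index = pinecone.GRPCIndex(index_name)

# Should show total_vector_count == 0 before we add anything
index.describe_index_stats()
```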
00:05:35.480 | And then we move on to indexing.
00:05:37.480 | So this is just where we're going to add
00:05:39.480 | all of these embeddings into Pinecone.
00:05:43.640 | Now, we do this directly with the Pinecone client
00:05:48.640 | and the gRPC index that we have here,
00:05:50.920 | rather than through LangChain,
00:05:52.400 | because with LangChain, it's just slower.
00:05:55.080 | So I find this is just the better way of doing it.
00:05:58.840 | So we set our batch size to 100.
00:06:01.120 | That means we're going to just encode
00:06:03.040 | a hundred records or contexts at once.
00:06:06.040 | And we're going to add those to Pinecone
00:06:09.920 | in batches of a hundred at once as well.
00:06:12.520 | So then we just loop through our dataset.
00:06:14.480 | We get the batch, we get metadata.
00:06:18.080 | So metadata is just going to contain
00:06:19.760 | the title and the context.
00:06:22.640 | So if we come up here, title is this,
00:06:26.440 | and this is the context.
00:06:27.680 | Okay, looks good.
00:06:28.960 | And then, okay, where are we?
00:06:31.960 | Yeah, let's just run this actually.
00:06:33.440 | Okay, so we're creating our metadata.
00:06:35.840 | We get our context from the current batch,
00:06:39.480 | and then we embed those using text-embedding-ada-002.
00:06:43.680 | Okay, so these are like the chunks of text
00:06:48.520 | that we're passing in.
00:06:49.360 | We usually call them contexts, documents,
00:06:52.440 | or passages.
00:06:54.600 | They get referred to as any of those.
00:06:59.960 | Okay, and then what we do is we get our IDs.
00:07:03.920 | So the ID, again, is just this here.
00:07:07.200 | It's important to have a unique ID for every item.
00:07:10.160 | Otherwise we're going to overwrite records within Pinecone.
00:07:14.400 | And then we just add everything to Pinecone.
00:07:16.120 | So we basically just take our IDs, okay?
00:07:20.080 | IDs, embeddings, and the metadata,
00:07:23.640 | and each of these is a list,
00:07:25.080 | and we zip those all together
00:07:26.480 | so that we get a list of tuples
00:07:29.240 | where each tuple contains a single record
00:07:32.320 | and that record's ID, embedding, and metadata.
00:07:36.000 | Okay, so I will fast forward to let this finish.
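
The indexing loop described above looks roughly like this, assuming the deduplicated DataFrame from earlier with the `id`, `title`, and `context` columns that SQuAD provides:

```python
from tqdm.auto import tqdm

batch_size = 100  # encode and upsert 100 contexts at a time

for i in tqdm(range(0, len(df), batch_size)):
    batch = df.iloc[i:i + batch_size]
    # Metadata stores the title and the raw text of each context
    metadata = [
        {"title": row["title"], "text": row["context"]}
        for _, row in batch.iterrows()
    ]
    # Embed the batch of contexts with text-embedding-ada-002
    contexts = batch["context"].tolist()
    embeds = embed.embed_documents(contexts)
    # Unique ID per record, otherwise we would overwrite existing vectors
    ids = batch["id"].tolist()
    # Upsert (id, embedding, metadata) tuples into Pinecone
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
```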
00:07:40.440 | Okay, so it's finished.
00:07:41.920 | And again, we can describe index stats,
00:07:44.280 | and we should see now that it has been populated
00:07:47.400 | with vectors, okay?
00:07:48.360 | So we have almost 19,000 vectors in there now, or records.
00:07:52.600 | Okay, cool.
00:07:53.800 | So up to here, we've been using the Pinecone client
00:07:57.120 | to do this.
00:07:57.960 | Again, like I said, it's just faster
00:07:59.640 | than using the implementation in LangChain at the moment,
00:08:03.080 | but now we're going to switch back to LangChain
00:08:05.360 | because we want to be able to use the conversational agent
00:08:09.120 | and all the other tooling that comes with LangChain.
00:08:13.360 | So what we're going to do is reinitialize our index,
00:08:16.640 | and we're going to use a normal index and not gRPC
00:08:20.040 | because that is what is implemented with LangChain.
00:08:23.360 | So we initialize that,
00:08:24.840 | and then we initialize a vector store object,
00:08:27.680 | which is basically LangChain's version of the index
00:08:32.080 | that we create here.
00:08:33.200 | It just includes the embedding in there as well.
00:08:36.440 | And we'll also, so the text field, that's important.
00:08:39.680 | That's just the field within your metadata
00:08:42.720 | that contains the text for each record.
00:08:46.040 | So for us, that is text because we set it here.
00:08:49.000 | So let's run this.
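
In code, switching back to LangChain looks roughly like this; the positional `Pinecone(index, embedding_function, text_key)` call matches the LangChain version used at the time:

```python
from langchain.vectorstores import Pinecone

text_field = "text"  # the metadata field that holds the raw text of each record

# Plain (non-gRPC) index, since that's what LangChain's wrapper expects
index = pinecone.Index(index_name)

# LangChain's vector store wrapper around the Pinecone index plus our embedding model
vectorstore = Pinecone(index, embed.embed_query, text_field)
```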
00:08:52.600 | And like we did before in the previous retrieval video,
00:08:57.360 | we can test that this is working
00:09:00.360 | by using the similarity search method with our query here.
00:09:04.000 | So when was the College of Engineering
00:09:05.960 | in the University of Notre Dame established?
00:09:09.520 | And yeah, we pass that,
00:09:11.040 | and we say we want to return
00:09:12.160 | the top three most relevant documents,
00:09:15.880 | passages, context, whatever you want to call them.
00:09:18.080 | And we can see that we get, so this is a document here.
00:09:21.960 | So we have, I think that's probably relevant.
00:09:25.280 | This one is definitely relevant.
00:09:28.240 | And we have another document there as well.
00:09:31.480 | Okay, so we get those three results.
00:09:33.720 | Looks good.
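
The quick sanity check is roughly:

```python
query = "When was the College of Engineering in the University of Notre Dame established?"

# Return the top three most relevant documents from the knowledge base
vectorstore.similarity_search(query, k=3)
```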
00:09:34.560 | So let's now move on to the agent part of things.
00:09:38.800 | Okay, so our conversational agent needs our chat,
00:09:42.840 | large language model,
00:09:44.000 | conversational memory, and the retrieval QA chain.
00:09:47.280 | So we import each of those here, right?
00:09:51.520 | And let me explain what those actually are.
00:09:53.680 | So we have the chat LLM,
00:09:59.840 | that is basically ChatGPT, okay?
00:09:59.840 | So chat LLMs, they just receive the input
00:10:04.200 | in a different format to normal LLMs.
00:10:06.760 | That is more conducive to a chat
00:10:09.640 | like a stream of data or information.
00:10:13.000 | And then we have our conversational memory.
00:10:15.480 | So this is important.
00:10:16.920 | So we have our memory key.
00:10:18.080 | We're using chat history
00:10:19.440 | because that is how the memory is referred to
00:10:24.440 | in, I think, the conversational agent component.
00:10:29.320 | So whenever you're using conversational agent,
00:10:31.360 | you need to make sure you set memory key
00:10:32.920 | equal to chat history here.
00:10:34.760 | We're going to remember the previous five interactions
00:10:38.240 | and yeah, that's our conversational memory, okay?
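
Sketched out, the chat LLM and conversational memory setup is roughly this (import paths reflect the LangChain version of the time):

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory

# Chat LLM - gpt-3.5-turbo with deterministic output
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0.0,
    openai_api_key=os.environ["OPENAI_API_KEY"],
)

# Conversational memory: the agent expects the key "chat_history",
# and we remember the previous five interactions
conversational_memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    k=5,
    return_messages=True,
)
```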
00:10:41.440 | So after that, we set up our retrieval Q&A chain.
00:10:45.680 | So for that, we need our chat LLM.
00:10:48.400 | We set the chain type here to stuff,
00:10:50.280 | so that basically means when you are retrieving the,
00:10:54.320 | I think the three items from the vector store,
00:10:57.200 | we're going to just place them as is
00:11:00.440 | into the retrieval Q&A.
00:11:03.120 | So we're gonna kind of like stuff them all into the context
00:11:05.920 | rather than doing like any fancy summarization
00:11:08.600 | or anything like that, okay?
00:11:10.080 | And then we set our retriever
00:11:11.720 | and the retriever is our vector store,
00:11:14.480 | but as a retriever, okay?
00:11:16.680 | It's just a slightly different class or object.
00:11:19.920 | All right, so we run that.
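
The retrieval Q&A chain and the call that generates the answer look roughly like:

```python
from langchain.chains import RetrievalQA

# "stuff" chain type: retrieved documents are placed into the prompt as-is
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

# Same query as before, now answered by GPT-3.5 Turbo over retrieved context
qa.run(query)
```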
00:11:21.680 | And then with those, we can generate our answer, okay?
00:11:24.360 | So we run and we're using the same query here.
00:11:26.920 | So you see that was a query.
00:11:28.680 | Let me come up here.
00:11:30.280 | When was the College of Engineering
00:11:32.200 | at the University of Notre Dame established?
00:11:34.480 | We come down and the answer is the College of Engineering
00:11:38.520 | was established in 1920 at the University of Notre Dame.
00:11:41.960 | Okay, so cool.
00:11:43.400 | We get the answer and it is generated
00:11:46.360 | by our GPT-3.5 Turbo model based on the context
00:11:51.360 | that we retrieved from our vector store, okay?
00:11:54.760 | So basically based on these three documents here.
00:11:57.920 | Cool, now that's good, but that isn't the whole thing yet.
00:12:02.920 | That's actually just a retrieval Q&A chain, okay?
00:12:06.160 | That isn't a conversational agent.
00:12:08.360 | To create our conversational agent,
00:12:11.000 | we actually need to convert our retrieval Q&A chain
00:12:15.080 | into a tool that the agent can use.
00:12:19.000 | So that's what we're doing here.
00:12:20.440 | We get a tools list, which is what we'll pass to our agent.
00:12:23.800 | And we can include multiple tools in there.
00:12:26.320 | That's why it's a list.
00:12:27.520 | But we're only actually using one tool in this case.
00:12:30.640 | So we define the name of that tool.
00:12:32.560 | We're gonna call it the knowledge base.
00:12:34.320 | We pass in the function that runs
00:12:36.960 | when the agent calls this chain,
00:12:39.800 | which is just Q&A run like we did here.
00:12:42.800 | And then we set a description.
00:12:44.680 | So this description is important
00:12:46.400 | because it is using this description
00:12:48.080 | that the conversational agent will decide
00:12:52.240 | which tool to use if you have multiple tools,
00:12:55.120 | or also just whether to use this tool.
00:12:57.080 | So we say use this tool
00:12:58.400 | when answering general knowledge queries
00:13:00.120 | to get more information about the topic.
00:13:02.320 | Okay, which I think is a pretty clear description
00:13:05.200 | as to when to use this tool, okay?
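
As a sketch, wrapping the chain as a tool looks like this:

```python
from langchain.agents import Tool

tools = [
    Tool(
        name="Knowledge Base",
        # The function the agent calls when it picks this tool
        func=qa.run,
        # The agent uses this description to decide when (and whether) to use the tool
        description=(
            "use this tool when answering general knowledge queries to get "
            "more information about the topic"
        ),
    )
]
```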
00:13:07.840 | And yeah, so from there, we initialize our agent.
00:13:10.800 | We're using this chat conversational
00:13:12.520 | react description agent.
00:13:14.280 | We pass in our tools, our LLM,
00:13:16.520 | and turn on verbose output, which just means we're going to get a load
00:13:18.280 | of printed output, which helps us see
00:13:20.880 | what is actually going on.
00:13:22.560 | Max iterations defines the number of times
00:13:25.720 | the agent can go through a tool usage loop,
00:13:30.720 | which we're going to limit to three.
00:13:32.920 | Otherwise, what can happen is it keeps going
00:13:36.040 | to tools over and over again and gets stuck
00:13:38.280 | in an infinite loop, which we don't want.
00:13:40.960 | The model is going to decide when to stop generation.
00:13:45.520 | And we also need to include our conversational memory
00:13:48.080 | because this is a conversational agent.
00:13:50.640 | Okay, we run that and now our agent is ready to use.
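
Roughly, the agent initialization and a first call look like this; the `early_stopping_method="generate"` setting corresponds to letting the model decide how to stop, as mentioned above:

```python
from langchain.agents import initialize_agent

agent = initialize_agent(
    agent="chat-conversational-react-description",
    tools=tools,
    llm=llm,
    verbose=True,                      # print the agent's reasoning steps
    max_iterations=3,                  # limit the tool-usage loop to three rounds
    early_stopping_method="generate",  # let the model generate a final answer when stopping
    memory=conversational_memory,
)

# Ask the same question again, this time through the agent
agent(query)
```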
00:13:54.680 | So let's pass in that query that we used before.
00:13:57.800 | Let me run it and we'll see.
00:14:01.760 | Okay, so this action input here is actually
00:14:04.680 | the generated question that the LLM is passing to our tool.
00:14:09.400 | So it might not be exactly what we put in
00:14:12.440 | or it might actually be the same.
00:14:14.960 | It depends.
00:14:15.800 | Basically, sometimes the agent will reformat this
00:14:19.720 | into a question that it thinks is going
00:14:22.400 | to get us better results.
00:14:24.760 | So our question is, when was the College of Engineering
00:14:27.440 | at the University of Notre Dame established?
00:14:30.200 | And the observation, because it refers
00:14:32.760 | to the knowledge base for this.
00:14:34.120 | So the observation is the College of Engineering
00:14:37.280 | at the University of Notre Dame was established in 1920.
00:14:40.400 | Okay.
00:14:41.240 | Then the agent is like, okay,
00:14:43.880 | I think I have enough information to answer this question.
00:14:46.360 | So it says final answer.
00:14:48.400 | And then the final answer it returns is this.
00:14:51.040 | Okay.
00:14:51.880 | Which is the same, same thing.
00:14:53.640 | Right, and then we can see that here.
00:14:55.600 | So that final output.
00:14:57.560 | Okay.
00:14:58.480 | Now, what if we ask it
00:15:00.560 | something that is not general knowledge?
00:15:02.400 | So what is two times seven?
00:15:04.560 | See what it will say.
00:15:07.600 | Okay.
00:15:08.440 | And you see, it doesn't decide to use a knowledge base here.
00:15:11.160 | It knows that it doesn't need to.
00:15:13.120 | So it just goes straight to final answer
00:15:15.080 | and it tells us it is 14.
00:15:18.200 | Okay.
00:15:19.480 | Now let's try some more.
00:15:22.400 | So I'm going to ask it to tell me some facts
00:15:24.800 | about the University of Notre Dame.
00:15:26.840 | So it knows to use a knowledge base
00:15:29.360 | and to pass in University of Notre Dame facts.
00:15:32.160 | So you can see here that it's not just passing in
00:15:35.760 | what I wrote here.
00:15:37.200 | It's actually passing in a generated version
00:15:39.560 | that it thinks will basically return better results.
00:15:42.600 | Okay.
00:15:43.440 | And what it got was,
00:15:45.520 | obviously it got some of the context that we saw before.
00:15:49.680 | And based on the information in those contexts,
00:15:52.400 | it's come up with all of this, all these facts, right?
00:15:55.960 | Which is quite a lot.
00:15:56.800 | So it's given us this bullet point list.
00:15:58.520 | And then the final answer.
00:15:59.760 | So based on this bullet point list,
00:16:02.240 | it's given us this like paragraph.
00:16:04.520 | So yeah, you can see.
00:16:07.040 | I haven't been through this.
00:16:08.440 | So I'm not sure how correct it is,
00:16:10.280 | but we can see that it is using the tool
00:16:13.080 | and it looks relatively accurate, I think.
00:16:16.200 | Okay.
00:16:18.200 | You can also see that here.
00:16:19.280 | So this output.
00:16:21.320 | Cool.
00:16:22.160 | Looks good.
00:16:23.000 | And what we can do is,
00:16:25.000 | because this is a conversational agent,
00:16:27.280 | we can actually ask it questions
00:16:29.880 | that are dependent on previous interactions.
00:16:33.520 | Like, can you summarize these facts in two short sentences?
00:16:37.160 | So we're not telling it which facts,
00:16:38.880 | we're just kind of referring to the previous interaction.
00:16:42.320 | So let's run that and see what it comes up with.
00:16:46.720 | Okay.
00:16:47.560 | I'm just gonna read it here 'cause it's a little bit easier.
00:16:50.160 | So yeah.
00:16:51.000 | We got our output here.
00:16:51.840 | The University of Notre Dame is a Catholic research
00:16:54.520 | university located in South Bend, Indiana, USA,
00:16:58.760 | is consistently ranked among the top 20 universities
00:17:03.520 | in the United States and as a major global university.
00:17:07.960 | Cool.
00:17:08.800 | So it managed to kind of summarize it,
00:17:11.040 | obviously not covering everything,
00:17:12.440 | that would be pretty difficult,
00:17:14.920 | but I think it's a good summary there.
00:17:17.840 | Cool.
00:17:18.680 | So actually that's it for this video.
00:17:20.880 | So that is how we would implement a retrieval-
00:17:25.880 | augmented conversational agent in LangChain.
00:17:29.480 | So really kind of taking the previous few videos
00:17:33.080 | and almost merging those all together.
00:17:35.240 | So, you know, we took the retrieval augmentation,
00:17:37.920 | we took an agent and we created a tool that, you know,
00:17:41.840 | could allow us to access our external knowledge base
00:17:45.120 | and implement that sort of long-term memory
00:17:47.840 | for our conversational agents.
00:17:51.000 | So this is a sort of pattern that
00:17:54.200 | I'm already seeing actually quite a lot in many use cases.
00:17:58.840 | So where we have this long-term memory,
00:18:00.520 | where we have agents
00:18:02.200 | and where we have conversational history.
00:18:05.960 | So I think, you know,
00:18:08.160 | especially if you're building tools or applications
00:18:12.800 | that use NLP, large language models,
00:18:15.000 | this is probably something that you're gonna come across
00:18:16.920 | if you haven't already.
00:18:18.200 | But anyway, that's it for this video.
00:18:20.160 | I hope all of this has been useful and interesting.
00:18:23.880 | So thank you very much for watching
00:18:26.400 | and I will see you again in the next one.