Build Conversational Agents with Vector DBs

00:00:00.000 | In one of the early videos in this series on Langtrain,

00:00:03.180 | we talked about retrieval augmentation.

00:00:05.760 | And one of the most commonly asked questions

00:00:10.520 | from that is how can the large language model know

00:00:14.680 | when to actually search through the vector database?

00:00:17.880 | Because obviously if you're just chatting

00:00:20.400 | with the chat button,

00:00:21.400 | it doesn't need to refer to any external knowledge.

00:00:24.000 | At that point, there's no reason for the model

00:00:27.480 | to actually go to a vector database and retrieve information.

00:00:31.600 | So how can we make it an optional thing

00:00:35.400 | where we're not always querying our vector database?

00:00:39.200 | Well, I mean, there's kind of two options.

00:00:41.520 | The first option is you just actually stick with that

00:00:44.600 | and you just set like a similarity threshold

00:00:47.540 | where if a retrieved context is below that threshold,

00:00:51.900 | you just don't include it as the added information

00:00:54.760 | within your query to the large language model.

00:00:58.960 | And the second option,

00:01:00.160 | which is what we're going to talk about today

00:01:02.360 | is actually using a retrieval tool as part of a AI agent.

00:01:07.360 | So if you have been following along with this series,

00:01:12.160 | we're essentially going to take what we spoke about last,

00:01:15.080 | which is agents and what we spoke about earlier

00:01:18.080 | in the series, which is retrieval augmentation.

00:01:20.400 | And we're going to put them both together.

00:01:22.600 | So, I mean, let's jump straight into the code.

00:01:26.020 | So we have this notebook that you can follow along with.

00:01:29.160 | There'll be a link to this

00:01:30.720 | somewhere near the top of the video right now.

00:01:33.360 | And the first thing we need to do is obviously

00:01:35.200 | install any prerequisite libraries.

00:01:37.160 | So we have OpenAI, Pinecone, LangChain, TicToken,

00:01:40.880 | and the HungFace datasets.

00:01:43.000 | So, yeah, we run those.

00:01:44.840 | I've already run it, so I'm not going to run it again.

00:01:46.960 | First thing I'm going to do is actually load our dataset.

00:01:51.160 | Now, this is the dataset we're going to be using

00:01:53.640 | to create our knowledge base.

00:01:57.040 | It's a basically pre-processed dataset.

00:02:00.920 | We won't need to do any of the chunking

00:02:04.400 | or anything that we would usually do.

00:02:06.040 | And that's on purpose.

00:02:07.240 | I want this to be pretty simple

00:02:08.680 | and we can focus more on the agent side of things

00:02:11.600 | rather than the data prep side of things.

00:02:13.760 | So yeah, using the Sanford question answering dataset.

00:02:17.240 | And within this, so the reason that we don't need to do

00:02:20.640 | any of this data pre-processing that we usually do

00:02:23.400 | is if we just run this.

00:02:25.840 | So just conveying it into a pandas data frame

00:02:28.920 | is because we have these contexts

00:02:31.440 | and each context is roughly a paragraph

00:02:34.160 | or a little bit more of text.

00:02:36.520 | And that's what we're going to be indexing

00:02:38.560 | within our knowledge base.

00:02:40.000 | Typically, what you'll find is if you're, for example,

00:02:43.240 | working with PDFs and you want to store those

00:02:45.040 | in your knowledge base,

00:02:46.400 | you'll need to chunk that long piece of text

00:02:49.720 | into smaller chunks.

00:02:51.720 | This basically is already chunked.

00:02:54.400 | It just makes our life a little bit easier.

00:02:56.640 | But one thing we do need to do is actually deduplicate this

00:02:59.480 | because we have many of the same contexts

00:03:01.560 | over and over again.

00:03:02.840 | So we just do that here.

00:03:05.080 | So just drop duplicates on the subset.

00:03:07.600 | We're just going to keep the first of each one of those

00:03:09.680 | and do all that in place.

00:03:11.280 | And now you can see that the contexts are now different.

00:03:14.400 | Okay, cool.

00:03:15.560 | So I mean, that's the data prep side of things done.

00:03:19.480 | And what we're going to want to do now

00:03:21.560 | is initialize both the embedding model

00:03:24.320 | and our vector database.

00:03:26.080 | So embedding model first.

00:03:28.000 | So we're going to be using TextEmbeddingArda002

00:03:30.400 | from OpenAI.

00:03:31.680 | Again, you can use like any embedding model you want.

00:03:34.560 | It doesn't need to be OpenAI,

00:03:36.440 | doesn't need to be TextEmbeddingArda002.

00:03:39.080 | Okay, so I'm going to enter my API key there.

00:03:41.640 | So the API key can go from platform.openai.com.

00:03:45.880 | And then we need our Pinecone API key

00:03:48.600 | and Pinecone environment.

00:03:49.880 | So I'm going to go into my dashboard and grab those.

00:03:53.360 | So that is app.pinecone.io.

00:03:55.840 | You go to API keys,

00:03:57.360 | and what I want to do is just copy the key value

00:04:00.120 | and remember my environment here.

00:04:02.000 | So I've got us-west1-gcp.

00:04:04.280 | Okay, yours might vary.

00:04:05.920 | So make sure you actually check this

00:04:08.520 | for your environment variable.

00:04:11.080 | So now I'm going to run this.

00:04:12.760 | Enter my API key and then my environment.

00:04:16.040 | So us-west1-gcp.

00:04:20.000 | Cool, so this is going to first initialize that index.

00:04:24.400 | All right, so here.

00:04:26.200 | Sorry, this initializes the connection to Pinecone,

00:04:30.000 | not the index.

00:04:31.440 | And then if we don't have an existing index

00:04:35.360 | with this index name,

00:04:37.640 | then it initializes the index.

00:04:40.040 | Now for the metric, we're using dot product.

00:04:42.160 | That is specific to text embedding R002.

00:04:46.160 | A lot of models actually use cosine.

00:04:48.280 | So if you're not sure what to use,

00:04:51.360 | then I'd recommend you just use cosine

00:04:54.160 | and see how it works, if it works.

00:04:57.640 | And also dimensionality.

00:04:59.560 | So again, this is something specific to each model.

00:05:02.120 | For text embedding R002, it is 1536.

00:05:06.760 | Okay, cool, I will let that run.

00:05:09.560 | Okay, that's initialized.

00:05:11.120 | And then we connect to the index.

00:05:12.480 | So again, passing the same index name.

00:05:15.080 | I'm using gRPC index.

00:05:16.480 | You can also just use index.

00:05:18.640 | But gRPC index is just more reliable

00:05:21.120 | and a little bit faster.

00:05:22.280 | So I go with that.

00:05:24.240 | And then we're going to describe the index stats

00:05:26.160 | so we can see what's in there at the moment.

00:05:28.320 | And we should see that total vector count is zero

00:05:30.400 | 'cause we haven't added anything in there yet.

00:05:33.360 | So, okay, that's great.

00:05:35.480 | And then we move on to indexing.

00:05:37.480 | So this is just where we're going to add

00:05:39.480 | all of these embeddings into Pinecone.

00:05:43.640 | Now, we do this directly with the Pinecone client

00:05:48.640 | and the gRPC index that we have here,

00:05:50.920 | rather than through LineChain

00:05:52.400 | because with LineChain, it's just slower.

00:05:55.080 | So I find this is just the better way of doing it.

00:05:58.840 | So we set our batch size to 100.

00:06:01.120 | That means we're going to just encode

00:06:03.040 | a hundred records or contexts at once.

00:06:06.040 | And we're going to add those to Pinecone

00:06:09.920 | in batches of a hundred at once as well.

00:06:12.520 | So then we just loop through our dataset.

00:06:14.480 | We get the batch, we get metadata.

00:06:18.080 | So metadata is just going to contain

00:06:19.760 | the title and the context.

00:06:22.640 | So if we come up here, title is this,

00:06:26.440 | and this is the context.

00:06:27.680 | Okay, looks good.

00:06:28.960 | And then, okay, where are we?

00:06:31.960 | Yeah, let's just run this actually.

00:06:33.440 | Okay, so we're creating our metadata.

00:06:35.840 | We get our context from the current batch,

00:06:39.480 | and then we embed those using text embedding R002.

00:06:43.680 | Okay, so these are like the chunks of text

00:06:48.520 | that we're passing in.

00:06:49.360 | We usually call them either context or documents

00:06:52.440 | or also passages as well.

00:06:54.600 | You can also call them that.

00:06:56.200 | They get referred to as any of those.

00:06:59.960 | Okay, and then what we do is we get our IDs.

00:07:03.920 | So the ID, again, it's just this here.

00:07:07.200 | So as unique ID for every item, that's important.

00:07:10.160 | Otherwise we're going to overwrite records within Pinecone.

00:07:14.400 | And then we just add everything to Pinecone.

00:07:16.120 | So we basically just take our IDs, okay?

00:07:20.080 | IDs, embeddings, and the metadata,

00:07:23.640 | and each of these is a list,

00:07:25.080 | and we zip those all together

00:07:26.480 | so that we get a list of tuples

00:07:29.240 | where each tuple contains a single record

00:07:32.320 | and that record's ID, embedding, and metadata.

00:07:36.000 | Okay, so I will fast forward to let this finish.

00:07:40.440 | Okay, so it's finished.

00:07:41.920 | And again, we can describe index stats,

00:07:44.280 | and we should see now that it has been populated

00:07:47.400 | with vectors, okay?

00:07:48.360 | So we have almost 19,000 vectors in there now, or records.

00:07:52.600 | Okay, cool.

00:07:53.800 | So up to here, we've been using the Pinecone client

00:07:57.120 | to do this.

00:07:57.960 | Again, like I said, it's just faster

00:07:59.640 | than using the implementation in LineChain at the moment,

00:08:03.080 | but now we're going to switch back to LineChain

00:08:05.360 | because we want to be able to use the conversational agent

00:08:09.120 | and all the other tooling that comes with LineChain.

00:08:13.360 | So what we're going to do is reinitialize our index,

00:08:16.640 | and we're going to use a normal index and not gRPC

00:08:20.040 | because that is what is implemented with LineChain.

00:08:23.360 | So we initialize that,

00:08:24.840 | and then we initialize a vector sort object,

00:08:27.680 | which is basically LineChain's version of the index

00:08:32.080 | that we create here.

00:08:33.200 | It just includes the embedding in there as well.

00:08:36.440 | And we'll also, so the text field, that's important.

00:08:39.680 | That's just the field within your metadata

00:08:42.720 | that contains the text for each record.

00:08:46.040 | So for us, that is text because we set it here.

00:08:49.000 | So let's run this.

00:08:52.600 | And like we did before in the previous retrieval video,

00:08:57.360 | we can test that this is working

00:09:00.360 | by using the similarity search method with our query here.

00:09:04.000 | So when was the College of Engineering

00:09:05.960 | in the University of Notre Dame established?

00:09:09.520 | And yeah, we pass that,

00:09:11.040 | and we say you want to return

00:09:12.160 | to top three most relevant documents,

00:09:15.880 | passages, context, whatever you want to call them.

00:09:18.080 | And we can see that we get, so this is a document here.

00:09:21.960 | So we have, I think that's probably relevant.

00:09:25.280 | This one is definitely relevant.

00:09:28.240 | And we have another document there as well.

00:09:31.480 | Okay, so we get those three results.

00:09:33.720 | Looks good.

00:09:34.560 | So let's now move on to the agent part of things.

00:09:38.800 | Okay, so our conversational agent needs our chat,

00:09:42.840 | large language model,

00:09:44.000 | conversational memory, and the retrieval QA chain.

00:09:47.280 | So we import each of those here, right?

00:09:51.520 | And let me explain what those actually are.

00:09:53.680 | So we have the chat LLM,

00:09:56.280 | that is basically is chat GPT, okay?

00:09:59.840 | So chat LLMs, they just receive the input

00:10:04.200 | in a different format to normal LLMs.

00:10:06.760 | That is more inducive to a chat

00:10:09.640 | like a stream of data or information.

00:10:13.000 | And then we have our conversational memory.

00:10:15.480 | So this is important.

00:10:16.920 | So we have our memory key.

00:10:18.080 | We're using chat history

00:10:19.440 | because that is what the memory is referred to

00:10:24.440 | in the, I think the conversational agent component.

00:10:29.320 | So whenever you're using conversational agent,

00:10:31.360 | you need to make sure you set memory key

00:10:32.920 | equal to chat history here.

00:10:34.760 | We're going to remember the previous five interactions

00:10:38.240 | and yeah, that's our conversational memory, okay?

00:10:41.440 | So after that, we set up our retrieval Q&A chain.

00:10:45.680 | So for that, we need our chat LLM.

00:10:48.400 | We set the chain type here to stuff,

00:10:50.280 | so that basically means when you are retrieving the,

00:10:54.320 | I think the three items from the vector store,

00:10:57.200 | we're going to just place them as is

00:11:00.440 | into the retrieval Q&A.

00:11:03.120 | So we're gonna kind of like stuff them all into the context

00:11:05.920 | rather than doing like any fancy summarization

00:11:08.600 | or anything like that, okay?

00:11:10.080 | And then we set our retriever

00:11:11.720 | and the retriever is our vector store,

00:11:14.480 | but as a retriever, okay?

00:11:16.680 | It was just a slightly different class or object.

00:11:19.920 | All right, so we run that.

00:11:21.680 | And then with those, we can generate our answer, okay?

00:11:24.360 | So we run and we're using the same query here.

00:11:26.920 | So you see that was a query.

00:11:28.680 | Let me come up here.

00:11:30.280 | When was the College of Engineering

00:11:32.200 | and University of Notre Dame established?

00:11:34.480 | We come down and the answer is the College of Engineering

00:11:38.520 | was established in 1920 at the University of Notre Dame.

00:11:41.960 | Okay, so cool.

00:11:43.400 | We get the answer and it is generated

00:11:46.360 | by our GPT 3.5 turbo model based on the context

00:11:51.360 | that we retrieved from our vector store, okay?

00:11:54.760 | So basically based on these three documents here.

00:11:57.920 | Cool, now that's good, but that isn't the whole thing yet.

00:12:02.920 | That's actually just a retrieval Q&A chain, okay?

00:12:06.160 | That isn't a conversational agent.

00:12:08.360 | To create our conversational agent,

00:12:11.000 | we actually need to convert our retrieval Q&A chain

00:12:15.080 | into a tool that the agent can use.

00:12:19.000 | So that's what we're doing here.

00:12:20.440 | We get a tools list, which is what we'll pass to our agent.

00:12:23.800 | And we can include multiple tools in there.

00:12:26.320 | That's why it's a list.

00:12:27.520 | But we're only actually using one tool in this case.

00:12:30.640 | So we define the name of that tool.

00:12:32.560 | We're gonna call it the knowledge base.

00:12:34.320 | We pass in the function that runs

00:12:36.960 | when the agent calls this chain,

00:12:39.800 | which is just Q&A run like we did here.

00:12:42.800 | And then we set a description.

00:12:44.680 | So this description is important

00:12:46.400 | because it is using this description

00:12:48.080 | that the conversational agent will decide

00:12:52.240 | which tool to use if you have multiple tools,

00:12:55.120 | or also just whether to use this tool.

00:12:57.080 | So we say use this tool

00:12:58.400 | when answering general knowledge queries

00:13:00.120 | to get more information about the topic.

00:13:02.320 | Okay, which I think is a pretty clear description

00:13:05.200 | as to when to use this tool, okay?

00:13:07.840 | And yeah, so from there, we initialize our agent.

00:13:10.800 | We're using this chat conversational

00:13:12.520 | react description agent.

00:13:14.280 | We pass in our tools, our LLM,

00:13:16.520 | but those just means we're going to get a load

00:13:18.280 | of printed output, which helps us just see

00:13:20.880 | what is actually going on.

00:13:22.560 | Max iterations defines a number of times

00:13:25.720 | the agent can use, basically go through a tool usage loop,

00:13:30.720 | which is, we're going to limit it to three.

00:13:32.920 | Otherwise it can, what can happen is it can keep going

00:13:36.040 | to tools over and over again and get stuck

00:13:38.280 | in an infinite loop, which we don't want.

00:13:40.960 | The model is going to decide when to stop generation.

00:13:45.520 | And we also need to include our conversational memory

00:13:48.080 | because this is a conversational agent.

00:13:50.640 | Okay, we run that and now our agent is ready to use.

00:13:54.680 | So we can, let's pass in that query that we used before.

00:13:57.800 | So the, this one was the, let me run it and we'll see.

00:14:01.760 | Okay, so this action input here is actually

00:14:04.680 | the generated question that the LLM is passing to our tool.

00:14:09.400 | So it might not be exactly what we put in

00:14:12.440 | or it might actually be the same.

00:14:14.960 | It depends.

00:14:15.800 | Basically, sometimes the agent will reformat this

00:14:19.720 | into a question that it thinks is going

00:14:22.400 | to get us better results.

00:14:24.760 | So our question is, when was the College of Engineering

00:14:27.440 | at the University of Notre Dame established?

00:14:30.200 | And the observation, because it refers

00:14:32.760 | to the knowledge base for this.

00:14:34.120 | So the observation is the College of Engineering

00:14:37.280 | at the University of Notre Dame was established in 1920.

00:14:40.400 | Okay.

00:14:41.240 | Then the agent is like, okay,

00:14:43.880 | I think I have enough information to answer this question.

00:14:46.360 | So it says final answer.

00:14:48.400 | And then the final answer it returns is this.

00:14:51.040 | Okay.

00:14:51.880 | Which is the same, same thing.

00:14:53.640 | Right, and then we can see that here.

00:14:55.600 | So that final output.

00:14:57.560 | Okay.

00:14:58.480 | Now, what if we ask it a,

00:15:00.560 | something that is not general knowledge.

00:15:02.400 | So what is two times seven?

00:15:04.560 | See what it will say.

00:15:07.600 | Okay.

00:15:08.440 | And you see, it doesn't decide to use a knowledge base here.

00:15:11.160 | It knows that it doesn't need to.

00:15:13.120 | So it just goes straight to final answer

00:15:15.080 | and it tells us it is 14.

00:15:18.200 | Okay.

00:15:19.480 | Now let's try some more.

00:15:22.400 | So I'm going to ask it to tell me some facts

00:15:24.800 | about the University of Notre Dame.

00:15:26.840 | So it knows to use a knowledge base

00:15:29.360 | and to pass in University of Notre Dame facts.

00:15:32.160 | So you can see here that it's not just passing in

00:15:35.760 | what I wrote here.

00:15:37.200 | It's actually passing in a generated version

00:15:39.560 | that it thinks will basically return better results.

00:15:42.600 | Okay.

00:15:43.440 | And what it got was,

00:15:45.520 | obviously it got some of the context that we saw before.

00:15:49.680 | And based on the information in those contexts,

00:15:52.400 | it's come up with all of this, all these facts, right?

00:15:55.960 | Which is quite a lot.

00:15:56.800 | So it's given us this bullet point list.

00:15:58.520 | And then the final answer.

00:15:59.760 | So based on this bullet point list,

00:16:02.240 | it's given us this like paragraph.

00:16:04.520 | So yeah, you can see.

00:16:07.040 | I haven't been through this.

00:16:08.440 | So I'm not sure how correct it is,

00:16:10.280 | but we can see that it is using the tool

00:16:13.080 | and it looks relatively accurate, I think.

00:16:16.200 | Okay.

00:16:18.200 | You can also see that here.

00:16:19.280 | So this output.

00:16:21.320 | Cool.

00:16:22.160 | Looks good.

00:16:23.000 | And what we can do is,

00:16:25.000 | because this is a conversational agent,

00:16:27.280 | we can actually ask it questions

00:16:29.880 | that are dependent on previous interactions.

00:16:33.520 | Like, can you summarize these facts in two short sentences?

00:16:37.160 | So we're not telling it which facts,

00:16:38.880 | we're just kind of referring to the previous interaction.

00:16:42.320 | So let's run that and see what it comes up with.

00:16:46.720 | Okay.

00:16:47.560 | I'm just gonna read it here 'cause it's a little bit easier.

00:16:50.160 | So yeah.

00:16:51.000 | We got our output here.

00:16:51.840 | The University of Notre Dame is a Catholic research

00:16:54.520 | university located in South Bend, Indiana, USA,

00:16:58.760 | is consistently ranked among the top 20 universities

00:17:03.520 | in the United States and as a major global university.

00:17:07.960 | Cool.

00:17:08.800 | So it managed to kind of summarize,

00:17:11.040 | obviously not good at everything.

00:17:12.440 | That would be pretty difficult,

00:17:14.920 | but I think it has a good summary there.

00:17:17.840 | Cool.

00:17:18.680 | So actually that's it for this video.

00:17:20.880 | So that is how we would implement a retrieval,

00:17:25.880 | augmented conversational agent in Liontrain.

00:17:29.480 | So really kind of taking the previous few videos

00:17:33.080 | and almost merging those all together.

00:17:35.240 | So, you know, we took the retrieval augmentation,

00:17:37.920 | we took an agent and we created a tool that, you know,

00:17:41.840 | could allow us to access our external knowledge base

00:17:45.120 | and implement that sort of long-term memory

00:17:47.840 | for our conversational agents.

00:17:51.000 | So this is a sort of pattern that we,

00:17:54.200 | I'm already seeing actually quite a lot in many use cases.

00:17:58.840 | So where we have this long-term memory,

00:18:00.520 | where we have agents

00:18:02.200 | and where we have conversational history.

00:18:05.960 | So I think, you know,

00:18:08.160 | especially if you're building tools or applications

00:18:12.800 | that use NLP, large language models,

00:18:15.000 | this is probably something that you're gonna come across

00:18:16.920 | if you haven't already.

00:18:18.200 | But anyway, that's it for this video.

00:18:20.160 | I hope all of this has been useful and interesting.

00:18:23.880 | So thank you very much for watching

00:18:26.400 | and I will see you again in the next one.

00:18:28.520 | Bye.

00:18:29.520 | (upbeat music)

00:18:32.960 | (soft music)

00:18:35.640 | (soft music)

00:18:38.040 | (soft music)

00:18:40.440 | (gentle music)

00:18:43.020 | you

Build Conversational Agents with Vector DBs - LangChain #9

Chapters