Back to Index

Build Conversational Agents with Vector DBs - LangChain #9


Chapters

0:00 LangChain Agents with Vector DBs
1:27 Code Setup and Data Prep
3:14 Vector DB Pipeline Setup
5:35 Indexing with OpenAI and Pinecone
7:53 Querying via LangChain
9:33 Building the Retrieval Augmented Chatbot
13:52 Using the Conversational Agent Chatbot
17:17 Real-world Usage of this Method

Transcript

In one of the early videos in this series on LangChain, we talked about retrieval augmentation. And one of the most commonly asked questions from that is how can the large language model know when to actually search through the vector database? Because obviously if you're just chatting with the chatbot, it doesn't need to refer to any external knowledge.

At that point, there's no reason for the model to actually go to a vector database and retrieve information. So how can we make it an optional thing where we're not always querying our vector database? Well, there are kind of two options. The first option is you just stick with that and you set a similarity threshold, where if a retrieved context scores below that threshold, you just don't include it as the added information within your query to the large language model.

And the second option, which is what we're going to talk about today, is actually using a retrieval tool as part of an AI agent. So if you have been following along with this series, we're essentially going to take what we spoke about last, which is agents, and what we spoke about earlier in the series, which is retrieval augmentation.

And we're going to put them both together. So, I mean, let's jump straight into the code. So we have this notebook that you can follow along with. There'll be a link to this somewhere near the top of the video right now. And the first thing we need to do is obviously install any prerequisite libraries.

So we have OpenAI, Pinecone, LangChain, tiktoken, and the Hugging Face datasets library. So, yeah, we run those. I've already run it, so I'm not going to run it again. First thing I'm going to do is actually load our dataset. Now, this is the dataset we're going to be using to create our knowledge base.
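As a rough sketch, the setup cells look something like this (package names are the PyPI ones; the exact versions used in the notebook may differ):

```python
# Install prerequisites (run once in the notebook environment)
# !pip install openai pinecone-client langchain tiktoken datasets

from datasets import load_dataset

# Load the SQuAD training split from Hugging Face datasets; each record has
# an id, title, context paragraph, question, and answers.
data = load_dataset("squad", split="train")
print(data)
```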

It's basically a pre-processed dataset. We won't need to do any of the chunking or anything that we would usually do. And that's on purpose. I want this to be pretty simple so we can focus more on the agent side of things rather than the data prep side of things.

So yeah, we're using the Stanford Question Answering Dataset (SQuAD). And the reason that we don't need to do any of the data pre-processing that we usually do becomes clear if we just run this and convert it into a pandas DataFrame: we have these contexts, and each context is roughly a paragraph or a little bit more of text.

And that's what we're going to be indexing within our knowledge base. Typically, what you'll find is if you're, for example, working with PDFs and you want to store those in your knowledge base, you'll need to chunk that long piece of text into smaller chunks. This basically is already chunked.

It just makes our life a little bit easier. But one thing we do need to do is actually deduplicate this because we have many of the same contexts over and over again. So we just do that here. So just drop duplicates on the subset. We're just going to keep the first of each one of those and do all that in place.
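Roughly, the data prep amounts to this (column names follow the SQuAD dataset):

```python
# Convert the Hugging Face dataset to a pandas DataFrame to inspect the contexts
df = data.to_pandas()

# Several questions share the same context paragraph, so drop the duplicates,
# keeping only the first occurrence of each context, in place.
df.drop_duplicates(subset="context", keep="first", inplace=True)
print(len(df))
```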

And now you can see that the contexts are all different. Okay, cool. So I mean, that's the data prep side of things done. And what we're going to want to do now is initialize both the embedding model and our vector database. So embedding model first. So we're going to be using text-embedding-ada-002 from OpenAI.

Again, you can use any embedding model you want. It doesn't need to be OpenAI, doesn't need to be text-embedding-ada-002. Okay, so I'm going to enter my API key there. The API key comes from platform.openai.com. And then we need our Pinecone API key and Pinecone environment. So I'm going to go into my dashboard and grab those.
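A minimal sketch of the embedding setup, assuming the LangChain version current at the time of recording (the import path has since moved to a separate package):

```python
import os
from getpass import getpass
from langchain.embeddings.openai import OpenAIEmbeddings

# API key from platform.openai.com
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY") or getpass("OpenAI API key: ")

# text-embedding-ada-002 returns 1536-dimensional embeddings
embed = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key=OPENAI_API_KEY,
)
```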

So that is app.pinecone.io. You go to API keys, and what I want to do is just copy the key value and remember my environment here. So I've got us-west1-gcp. Okay, yours might vary. So make sure you actually check this for your environment variable. So now I'm going to run this.

Enter my API key and then my environment. So us-west1-gcp. Cool, so this is going to first initialize that index. All right, so here. Sorry, this initializes the connection to Pinecone, not the index. And then if we don't have an existing index with this index name, then it initializes the index.

Now for the metric, we're using dot product. That is specific to text-embedding-ada-002. A lot of models actually use cosine. So if you're not sure what to use, then I'd recommend you just use cosine and see how it works, if it works. And also dimensionality. So again, this is something specific to each model.

For text-embedding-ada-002, it is 1536. Okay, cool, I will let that run. Okay, that's initialized. And then we connect to the index. So again, passing the same index name. I'm using GRPCIndex. You can also just use Index. But GRPCIndex is just more reliable and a little bit faster.

So I go with that. And then we're going to describe the index stats so we can see what's in there at the moment. And we should see that total vector count is zero 'cause we haven't added anything in there yet. So, okay, that's great. And then we move on to indexing.
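Sketching the index setup with the pinecone-client 2.x API the notebook used (pinecone.init, create_index, and GRPCIndex have since been replaced in newer clients; the index name here is just an example):

```python
import pinecone
from getpass import getpass

# API key and environment come from app.pinecone.io; "us-west1-gcp" is the
# environment used in the video, yours may differ.
PINECONE_API_KEY = getpass("Pinecone API key: ")
pinecone.init(api_key=PINECONE_API_KEY, environment="us-west1-gcp")

index_name = "langchain-retrieval-agent"  # example name

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        metric="dotproduct",  # metric used here for text-embedding-ada-002
        dimension=1536,       # text-embedding-ada-002 output dimensionality
    )

# GRPCIndex is a bit faster and more reliable; pinecone.Index also works
index = pinecone.GRPCIndex(index_name)
print(index.describe_index_stats())  # total_vector_count should be 0 for now
```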

So this is just where we're going to add all of these embeddings into Pinecone. Now, we do this directly with the Pinecone client and the gRPC index that we have here, rather than through LangChain, because with LangChain it's just slower. So I find this is just the better way of doing it.

So we set our batch size to 100. That means we're going to just encode a hundred records or contexts at once. And we're going to add those to Pinecone in batches of a hundred at once as well. So then we just loop through our dataset. We get the batch, we get metadata.

So metadata is just going to contain the title and the context. So if we come up here, title is this, and this is the context. Okay, looks good. And then, okay, where are we? Yeah, let's just run this actually. Okay, so we're creating our metadata. We get our contexts from the current batch, and then we embed those using text-embedding-ada-002.

Okay, so these are like the chunks of text that we're passing in. We usually call them either context or documents or also passages as well. You can also call them that. They get referred to as any of those. Okay, and then what we do is we get our IDs.

So the ID, again, is just this here. So there's a unique ID for every item; that's important. Otherwise we're going to overwrite records within Pinecone. And then we just add everything to Pinecone. So we basically just take our IDs, embeddings, and the metadata, and each of these is a list, and we zip those all together so that we get a list of tuples, where each tuple contains a single record's ID, embedding, and metadata.
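As a sketch, the batched indexing loop looks something like this (it assumes the df, embed, and index objects from the earlier cells):

```python
from tqdm.auto import tqdm

batch_size = 100  # embed and upsert 100 contexts at a time

for i in tqdm(range(0, len(df), batch_size)):
    batch = df.iloc[i:i + batch_size]
    # metadata holds the title and the raw context text for each record
    metadata = [
        {"title": row["title"], "text": row["context"]}
        for _, row in batch.iterrows()
    ]
    # embed the batch of contexts with text-embedding-ada-002
    embeds = embed.embed_documents(batch["context"].tolist())
    # unique IDs, otherwise later upserts would overwrite earlier records
    ids = batch["id"].tolist()
    # zip into (id, embedding, metadata) tuples and upsert to Pinecone
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
```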

Okay, so I will fast forward to let this finish. Okay, so it's finished. And again, we can describe index stats, and we should see now that it has been populated with vectors, okay? So we have almost 19,000 vectors in there now, or records. Okay, cool. So up to here, we've been using the Pinecone client to do this.

Again, like I said, it's just faster than using the implementation in LangChain at the moment, but now we're going to switch back to LangChain because we want to be able to use the conversational agent and all the other tooling that comes with LangChain. So what we're going to do is reinitialize our index, and we're going to use a normal index and not gRPC, because that is what is implemented with LangChain.

So we initialize that, and then we initialize a vector store object, which is basically LangChain's version of the index that we created here. It just includes the embedding in there as well. And we'll also, so the text field, that's important. That's just the field within your metadata that contains the text for each record.

So for us, that is text because we set it here. So let's run this. And like we did before in the previous retrieval video, we can test that this is working by using the similarity search method with our query here. So when was the College of Engineering in the University of Notre Dame established?
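Roughly, the LangChain vector store setup and test query look like this (the Pinecone vector store class has since moved out of langchain.vectorstores, so this follows the older API):

```python
from langchain.vectorstores import Pinecone

# re-connect with the standard (non-gRPC) index, which is what LangChain uses
index = pinecone.Index(index_name)

text_field = "text"  # the metadata field holding the raw context text
vectorstore = Pinecone(index, embed.embed_query, text_field)

query = (
    "when was the College of Engineering in the "
    "University of Notre Dame established?"
)
# return the top three most relevant documents
vectorstore.similarity_search(query, k=3)
```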

And yeah, we pass that, and we say we want to return the top three most relevant documents, passages, contexts, whatever you want to call them. And we can see that we get, so this is a document here. So we have, I think that's probably relevant. This one is definitely relevant.

And we have another document there as well. Okay, so we get those three results. Looks good. So let's now move on to the agent part of things. Okay, so our conversational agent needs our chat, large language model, conversational memory, and the retrieval QA chain. So we import each of those here, right?

And let me explain what those actually are. So we have the chat LLM, that is basically ChatGPT, okay? So chat LLMs just receive the input in a different format to normal LLMs, one that is more conducive to a chat-like stream of messages. And then we have our conversational memory.

So this is important. So we have our memory key. We're using chat history because that is what the memory is referred to as in the conversational agent component, I think. So whenever you're using the conversational agent, you need to make sure you set memory key equal to chat history here.

We're going to remember the previous five interactions and yeah, that's our conversational memory, okay? So after that, we set up our retrieval Q&A chain. So for that, we need our chat LLM. We set the chain type here to stuff, so that basically means when we retrieve the, I think, three items from the vector store, we're going to just place them as-is into the prompt.

So we're gonna kind of like stuff them all into the context rather than doing like any fancy summarization or anything like that, okay? And then we set our retriever and the retriever is our vector store, but as a retriever, okay? It's just a slightly different class or object.
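A sketch of those three components, again assuming the LangChain import paths from the time of recording:

```python
from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory import ConversationBufferWindowMemory
from langchain.chains import RetrievalQA

# chat LLM: gpt-3.5-turbo, temperature 0 for deterministic answers
llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model_name="gpt-3.5-turbo",
    temperature=0.0,
)

# conversational memory: memory_key must be "chat_history" for the
# conversational agent; k=5 remembers the previous five interactions
conversational_memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    k=5,
    return_messages=True,
)

# retrieval QA chain: "stuff" places the retrieved contexts into the prompt as-is
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)
```

Calling qa.run(query) then produces the answer described next.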

All right, so we run that. And then with those, we can generate our answer, okay? So we run it and we're using the same query here. So you see, that was the query. Let me come up here: when was the College of Engineering at the University of Notre Dame established? We come down and the answer is the College of Engineering was established in 1920 at the University of Notre Dame.

Okay, so cool. We get the answer and it is generated by our GPT 3.5 turbo model based on the context that we retrieved from our vector store, okay? So basically based on these three documents here. Cool, now that's good, but that isn't the whole thing yet. That's actually just a retrieval Q&A chain, okay?

That isn't a conversational agent. To create our conversational agent, we actually need to convert our retrieval Q&A chain into a tool that the agent can use. So that's what we're doing here. We get a tools list, which is what we'll pass to our agent. And we can include multiple tools in there.

That's why it's a list. But we're only actually using one tool in this case. So we define the name of that tool. We're gonna call it the Knowledge Base. We pass in the function that runs when the agent calls this tool, which is just the qa.run we used here.

And then we set a description. So this description is important because it is using this description that the conversational agent will decide which tool to use if you have multiple tools, or also just whether to use this tool. So we say use this tool when answering general knowledge queries to get more information about the topic.
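Roughly, the tool wrapping looks like this:

```python
from langchain.agents import Tool

tools = [
    Tool(
        name="Knowledge Base",
        func=qa.run,  # runs the retrieval QA chain when the agent uses this tool
        description=(
            "use this tool when answering general knowledge queries to get "
            "more information about the topic"
        ),
    )
]
```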

Okay, which I think is a pretty clear description as to when to use this tool, okay? And yeah, so from there, we initialize our agent. We're using this chat conversational react description agent. We pass in our tools, our LLM, and verbose, which just means we're going to get a load of printed output, which helps us see what is actually going on.

Max iterations defines the number of times the agent can basically go through a tool usage loop, which we're going to limit to three. Otherwise, what can happen is it can keep going to tools over and over again and get stuck in an infinite loop, which we don't want.

The model is going to decide when to stop generation. And we also need to include our conversational memory because this is a conversational agent. Okay, we run that and now our agent is ready to use. So let's pass in that query that we used before. Let me run it and we'll see.
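Sketched out, the agent initialization and first query look roughly like this:

```python
from langchain.agents import initialize_agent

agent = initialize_agent(
    agent="chat-conversational-react-description",
    tools=tools,
    llm=llm,
    verbose=True,                      # print the agent's reasoning steps
    max_iterations=3,                  # cap the tool-usage loop
    early_stopping_method="generate",  # let the model decide when to stop
    memory=conversational_memory,
)

# ask the same question as before, now through the agent
agent(query)
```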

Okay, so this action input here is actually the generated question that the LLM is passing to our tool. So it might not be exactly what we put in or it might actually be the same. It depends. Basically, sometimes the agent will reformat this into a question that it thinks is going to get us better results.

So our question is, when was the College of Engineering at the University of Notre Dame established? And the observation, because it refers to the knowledge base for this. So the observation is the College of Engineering at the University of Notre Dame was established in 1920. Okay. Then the agent is like, okay, I think I have enough information to answer this question.

So it says final answer. And then the final answer it returns is this. Okay. Which is the same thing. Right, and then we can see that here. So that final output. Okay. Now, what if we ask it something that is not general knowledge? So what is two times seven?

See what it will say. Okay. And you see, it doesn't decide to use the knowledge base here. It knows that it doesn't need to. So it just goes straight to final answer and it tells us it is 14. Okay. Now let's try some more. So I'm going to ask it to tell me some facts about the University of Notre Dame.
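These follow-up queries are just further calls to the agent, something like:

```python
# no retrieval needed: the agent answers directly without the knowledge base
agent("what is 2 * 7?")

# general knowledge: the agent routes this through the Knowledge Base tool
agent("can you tell me some facts about the University of Notre Dame?")
```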

So it knows to use a knowledge base and to pass in University of Notre Dame facts. So you can see here that it's not just passing in what I wrote here. It's actually passing in a generated version that it thinks will basically return better results. Okay. And what it got was, obviously it got some of the context that we saw before.

And based on the information in those contexts, it's come up with all of this, all these facts, right? Which is quite a lot. So it's given us this bullet point list. And then the final answer. So based on this bullet point list, it's given us this like paragraph. So yeah, you can see.

I haven't been through this. So I'm not sure how correct it is, but we can see that it is using the tool and it looks relatively accurate, I think. Okay. You can also see that here. So this output. Cool. Looks good. And what we can do is, because this is a conversational agent, we can actually ask it questions that are dependent on previous interactions.

Like, can you summarize these facts in two short sentences? So we're not telling it which facts, we're just kind of referring to the previous interaction. So let's run that and see what it comes up with. Okay. I'm just gonna read it here 'cause it's a little bit easier. So yeah.
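Because the conversational memory holds the previous interactions, the follow-up is simply:

```python
# "these facts" is resolved from the chat history held in conversational memory
agent("can you summarize these facts in two short sentences?")
```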

We got our output here. The University of Notre Dame is a Catholic research university located in South Bend, Indiana, USA, is consistently ranked among the top 20 universities in the United States and as a major global university. Cool. So it managed to kind of summarize, obviously not covering everything.

That would be pretty difficult, but I think it's a good summary there. Cool. So actually that's it for this video. So that is how we would implement a retrieval augmented conversational agent in LangChain. So we're really kind of taking the previous few videos and almost merging those all together.

So, you know, we took the retrieval augmentation, we took an agent, and we created a tool that could allow us to access our external knowledge base and implement that sort of long-term memory for our conversational agents. So this is a sort of pattern that I'm already seeing quite a lot in many use cases.

So where we have this long-term memory, where we have agents and where we have conversational history. So I think, you know, especially if you're building tools or applications that use NLP, large language models, this is probably something that you're gonna come across if you haven't already. But anyway, that's it for this video.

I hope all of this has been useful and interesting. So thank you very much for watching and I will see you again in the next one. Bye.