
Claude 3 Opus RAG Chatbot (Full Walkthrough)


Chapters

0:00 Claude 3 AI Agent in LangChain
0:33 Finding Claude 3 RAG Code
1:35 Using Voyage AI Embeddings
2:25 Using Pinecone Knowledge Base for RAG
3:55 Claude 3 AI Agent Setup
9:19 Using Claude 3 Agent
10:17 Adding Conversational Memory
12:32 Testing Claude 3 Agent with Memory
14:40 Final Thoughts on AI Agents and Anthropic

Transcript

Today we're going to be taking a look at how we can build a fully conversational agent using Anthropic's new Claude 3 Opus model with Voyage AI embeddings and the Pinecone vector database. We're going to be using all of these services via LangChain, and that is the latest version of LangChain.

We're using 0.1.11 at the moment, which allows us to put all of this together pretty easily. So we're going to jump straight into it. Now I'm going to be using this notebook here, which you can find in the Pinecone examples repo. And I'm just going to go ahead and click the open in Colab button.

Okay, once that has opened, we're going to go click connect. And we're first just going to install all the prerequisites. There are a few here that we need. The main ones, of course, are the LangChain ones. We also use the langchain-anthropic package here so that we can use the Anthropic models, and particularly the latest Claude 3 models.

Without this, we can't use the latest Claude 3 models, at least with the current versions of LangChain Community. And yeah, I think everything else is pretty self-explanatory. So I'm going to go ahead and install that, which will take a moment.
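As a rough sketch, the install cell looks something like this; other than langchain 0.1.11, the exact package list and pins here are assumptions:

```python
# in Colab; package pins other than langchain==0.1.11 are assumptions
!pip install -qU \
    langchain==0.1.11 \
    langchain-community \
    langchain-anthropic \
    langchainhub \
    pinecone-client \
    voyageai \
    datasets
```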

Now once those have installed, we're going to download a dataset. I'm just going to use the AI ArXiv chunked dataset as usual. I'm not going to go through too much of this, but the bit that we do actually need is here: we're going to be using the Voyage embeddings. Voyage AI is a relatively new AI company that, at the moment, focuses on embedding models.

So we will need to go ahead and grab an API key from them. The URL for that is dash.voyageai.com/api-keys. You will need to sign up for an account if you haven't already; if you have, just log in. Okay, I can see I already have my demo API key here.

I'm just going to copy that and pull it in, run this, and enter the key. Then we should be able to initialize our embeddings here.
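A minimal sketch of those two cells, assuming the voyage-2 model (check Voyage's docs for the current model names):

```python
import os
from getpass import getpass
from langchain_community.embeddings import VoyageEmbeddings

# enter the key from dash.voyageai.com/api-keys when prompted
os.environ["VOYAGE_API_KEY"] = getpass("Voyage API key: ")

# initialize the embedding model (the model name is an assumption)
embed = VoyageEmbeddings(model="voyage-2")
```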

And now we need to jump across to Pinecone and get another API key. So this is app.pinecone.io; log in, get your API key from there, copy that, and pull it in. Cool. So we have that. And I'm using Pinecone serverless here, so I'm going to run that. The first thing I want to do is check the embedding size that this model uses; we can see it's 1024 here, and we're going to need that when we initialize our index.

So here we're passing in that dimensionality when we create the index. Now, I've actually already initialized this index, so we'll come here and see that my index is already populated. If you're running through this for the first time, you should see that the vector count is zero.
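Sketching that out, where the index name, cloud, and region are assumptions:

```python
import os
import time
from getpass import getpass
from pinecone import Pinecone, ServerlessSpec

os.environ["PINECONE_API_KEY"] = getpass("Pinecone API key: ")
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# check the dimensionality of the Voyage embeddings (1024 here)
dims = len(embed.embed_documents(["some text"])[0])

index_name = "claude-3-rag"  # hypothetical index name
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=dims,  # 1024 for this model
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-west-2"),
    )
    time.sleep(10)  # give the index a moment to become ready

index = pc.Index(index_name)
index.describe_index_stats()  # total_vector_count is 0 on a fresh index
```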

Now let's come down to here. This is where you'll be populating your index: literally looping through the entire dataset, embedding everything, and throwing it in there. We add some metadata for the actual chunks of text, for the source of those chunks, and for the title, which is the arXiv paper title. But I'm not going to run that, because I already have it in there.
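A sketch of that loop; the field names (id, chunk, source, title) and the batch size are assumptions about the dataset's schema:

```python
from tqdm.auto import tqdm

batch_size = 100  # an assumption; tune to taste

for i in tqdm(range(0, len(dataset), batch_size)):
    batch = dataset[i : i + batch_size]
    # embed this batch of text chunks
    embeds = embed.embed_documents(batch["chunk"])
    # metadata: the chunk text, its source, and the arXiv paper title
    metadata = [
        {"text": chunk, "source": source, "title": title}
        for chunk, source, title in zip(
            batch["chunk"], batch["source"], batch["title"]
        )
    ]
    index.upsert(vectors=list(zip(batch["id"], embeds, metadata)))
```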

You should expect it to take a while; here it took me about 12 minutes, and that is on Colab, so expect something similar if you have a decent internet connection. All right, cool. So now we need to get into the agent component. We've just set up our knowledge base with Voyage AI embeddings.

Now, our agent, as I mentioned before, is using a Claude 3 Opus LLM, and it will have a tool that it can use to retrieve data, which is the knowledge base that we have set up. So we need to initialize, or define, that tool.

So we're going to call it the arXiv search tool here. We're using the tool decorator from LangChain to define this as a tool. The description here, in the docstring, basically tells our agent when to use this tool. And in order to use this tool, it must consume a string, and it will output a string, okay?

And that is it. So we have our arXiv search tool definition there. We pass that into a list of tools. If we had multiple tools, of course, we might have some other search tool or something in there as well. But we just have one.
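As a sketch, the definition looks something like this; the tool name, docstring wording, and top_k are assumptions, and the docstring doubles as the tool description the agent reads:

```python
from langchain.agents import tool

@tool
def arxiv_search(query: str) -> str:
    """Use this tool when answering questions about AI, machine learning, or
    other technical subjects that may be covered by arXiv papers."""
    # embed the query and retrieve the most similar chunks from Pinecone
    xq = embed.embed_query(query)
    out = index.query(vector=xq, top_k=5, include_metadata=True)
    # join the chunk texts into one string for the agent to read
    return "\n---\n".join(m["metadata"]["text"] for m in out["matches"])

tools = [arxiv_search]
```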

So I'm going to run that. And let's just have a look at what this will actually look like when our agent is using it. So if we have the query "can you tell me about Llama 2?", our agent is going to run the tool using arxiv_search.run, passing in this tool_input parameter, okay?
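Calling the tool directly, as a quick check:

```python
result = arxiv_search.run(tool_input="can you tell me about Llama 2?")
print(result)
```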

So it would use the tool like that, and this is the output that it would get, okay? These are the contexts that we've returned. Now, we're using Anthropic models here, and Anthropic models work very well with what is called an XML agent.

An XML agent uses a slightly different format to other agents, in that we have, one, the input defined like this, and we also have tool usage defined like this, using these XML tags. So we first need to initialize our prompt for that. I'm going to use, from the LangChain Hub, this XML agent convo prompt that Harrison has put together.

So I'm going to initialize that. Okay? And you can see that it defines the way that we want the agent to use that XML format: we have observations, and then the agent is instructed to provide a final answer using, again, those XML tags.
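Pulling that prompt looks like this; the hub repo id is my assumption for the prompt mentioned here:

```python
from langchain import hub

# pull the conversational XML agent prompt from the LangChain Hub
prompt = hub.pull("hwchase17/xml-agent-convo")
print(prompt.template)
```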

Okay. Now, what we also need to do is initialize our Anthropic chat LLM. This is slightly different: we're now using the langchain-anthropic package rather than, as before, something like langchain_community.chat_models, or something along those lines. Now we are using the langchain-anthropic library directly, and we need to do that in order to use the Claude 3 Opus model.

If we use the old method and try to use this, we're going to get an error, something along the lines of the messages format being incorrect. So we do need to use this. We also need an Anthropic API key, so let's go and get that. We go to console.anthropic.com.

I'm going to go to Get API Keys and create a new key. We're going to call this rag demo, and I'm just going to paste that key in here. Okay, cool. One thing you can do here, if you would like faster response times, is use Sonnet rather than Opus.
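A sketch of that setup; the exact model id string and the temperature are assumptions:

```python
import os
from getpass import getpass
from langchain_anthropic import ChatAnthropic

# key from console.anthropic.com
os.environ["ANTHROPIC_API_KEY"] = getpass("Anthropic API key: ")

# swap in "claude-3-sonnet-20240229" here for faster responses
llm = ChatAnthropic(
    model="claude-3-opus-20240229",
    temperature=0,  # an assumption
)
```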

But for now, I think we're pretty good. So we've initialized that. We have a few intermediate steps that we need in order to support the XML format this agent requires. So we add that, and we also convert the tool names into the format that, again, we need for this agent.

And then what we do is initialize the inputs to our agent here. Okay? So we have our inputs; that is how everything is pulled into the agent pipeline. They are piped into our prompt transformation here, which we defined with the convert tools function up there. Then they are piped into the LLM.

The LLM will stop whenever it hits either the tool input or final answer closing tags; you can see that here. And then the output from that is piped into our XML agent output parser, so that we get a nice format at the end that we can actually work with.
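Putting that pipeline together, closely following LangChain's XML agent example (the chat_history key is included here because we add memory later; treat this as a sketch):

```python
from langchain.agents.output_parsers import XMLAgentOutputParser

def convert_intermediate_steps(intermediate_steps):
    # format each (action, observation) step as the XML the prompt expects
    log = ""
    for action, observation in intermediate_steps:
        log += (
            f"<tool>{action.tool}</tool><tool_input>{action.tool_input}"
            f"</tool_input><observation>{observation}</observation>"
        )
    return log

def convert_tools(tools):
    # render tool names and descriptions for the prompt
    return "\n".join(f"{t.name}: {t.description}" for t in tools)

agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x.get("chat_history", ""),
        "agent_scratchpad": lambda x: convert_intermediate_steps(
            x["intermediate_steps"]
        ),
    }
    | prompt.partial(tools=convert_tools(tools))
    | llm.bind(stop=["</tool_input>", "</final_answer>"])
    | XMLAgentOutputParser()
)
```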

So that is kind of like our agent flow. Then we need to pass our agent flow as an agent into our agent executor alongside our tools. And we can either say verbose is true or false. It's kind of up to you. Just so that we can see everything, I'm going to say it's true.
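In code, that's roughly:

```python
from langchain.agents import AgentExecutor

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```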

It will just print out a ton of stuff, essentially. Now, let's try our agent. For now, there's going to be no chat history; we'll add that in a moment. So let's try it with the question: can you tell me about Llama 2?
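Invoking it, with an empty chat history for now:

```python
out = agent_executor.invoke({
    "input": "can you tell me about Llama 2?",
    "chat_history": "",  # no conversation history yet
})
```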

So it's using the arXiv search tool, which is good; it should do. The contexts it got back are the same as what we saw before. Okay. So this blue bit here, that's the returned context. And then we get our final answer. That actually did take a little bit of time there, 36 seconds. So we get a final answer here.

And we can actually read it here as well. So Llama 2 is a large language model developed by Meta AI. Good. Code Llama is a version of Llama 2. Yeah, it's all good. Okay, cool. I mean, that all looks pretty good; I think there's nothing weird there.

What we now might want to do: okay, so right now we don't have that chat history, which means our agent is stateless and can't refer to previous interactions. So what we need to do is add those previous interactions as a way of maintaining the state of the conversation over multiple interactions.

So we're going to be using a conversational buffer window memory object from LangChain. So we initialize that. And at the moment, our conversational memory is, of course, empty, because we haven't added anything to it yet. So what we can do is use the add_user_message and add_ai_message methods to add some messages to it.
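A sketch of that; the window size k=5 is an assumption, and I'm seeding it with the earlier Llama 2 interaction:

```python
from langchain.memory import ConversationBufferWindowMemory

# keep only the last k interactions (k=5 is an assumption)
memory = ConversationBufferWindowMemory(k=5, return_messages=True)

# seed the memory with the earlier interaction
memory.chat_memory.add_user_message("can you tell me about Llama 2?")
memory.chat_memory.add_ai_message(out["output"])

memory.chat_memory.messages  # [HumanMessage(...), AIMessage(...)]
```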

So now if we print that out, we can see that we have a human message and an AI message, and that is it; we've only had those two messages so far. So let's see how we can feed that into our XML Claude 3 agent. Now, we can't send these messages into the agent directly.

Instead, we need to pass a string in this format, okay, human and AI. So we're going to create a new function here, memory to string, that takes our conversational buffer window memory, retrieves those messages, and then formats them into the format that we need, with human and AI, and returns that as a string.
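A minimal sketch of that function; the name memory2str is my guess at what the notebook calls it:

```python
from langchain_core.messages import HumanMessage

def memory2str(memory):
    # format the stored messages as "Human: ..." / "AI: ..." lines
    messages = memory.chat_memory.messages
    lines = [
        f"Human: {m.content}" if isinstance(m, HumanMessage)
        else f"AI: {m.content}"
        for m in messages
    ]
    return "\n".join(lines)

print(memory2str(memory))
```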

So let's run this, and see what we get when we print that out. Okay, so we get Human, then AI. Cool, looks good. Now, we're going to have to run that same code every time we invoke our agent and then update the memory afterwards.

So rather than just repeating that every time, let's wrap it all into a single function called chat, and we'll use this to call our agent and also maintain our conversation state. So we have our agent executor's invoke as we did before, passing in our conversational memory, and then after that we add the new interaction to our conversational memory for any future interactions.
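Something like this; a sketch that just wraps the invoke call and the memory updates together:

```python
def chat(text: str) -> str:
    # invoke the agent with the formatted chat history
    out = agent_executor.invoke({
        "input": text,
        "chat_history": memory2str(memory),
    })
    # record the new interaction for future turns
    memory.chat_memory.add_user_message(text)
    memory.chat_memory.add_ai_message(out["output"])
    return out["output"]

chat("was any red teaming done with the model?")
```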

So we run that, and let's see what we get. Okay. So what I'm doing here: we've already spoken about Llama 2 with the model, but now I'm asking a question that doesn't specify that I'm talking about Llama 2. Okay, I'm saying, was any red teaming done with the model?

I don't say Llama 2 at all here. So the model needs to be able to look at those previous interactions in order to understand that I'm actually asking about Llama 2, and in order to perform the search with Llama 2 in there. Because if you just do a search for red teaming of the model, the results are not going to be specific to Llama 2.

So fortunately, the model actually handles this pretty nicely, and you can see that the input it gives to the tool here is "Llama 2 red teaming", so it's looking at those previous interactions correctly and pulling in that information. And we see in blue the results that we're getting, and we see that after conducting red team exercises, they asked participants who had also participated in Llama 2 Chat exercises to provide a qualitative assessment of the safety capabilities of the model.

Okay. So you can see that seems pretty relevant. So let's come down here and have a look at the final answer that the model produced. It said, "Yes, Meta AI conducted red teaming exercises on the Llama 2 and Code Llama models. They conducted three red teaming exercises with 25 Meta employees, including domain experts in responsible AI, malware development, and offensive security engineering." And you can see, okay, I think there's more information here.

I think it all looks pretty good, to be honest. So looking at the summary: Meta AI did conduct red teaming to proactively identify risks in those models, uncovering some potential issues, especially with ambiguous prompts, and it also puts any risks in perspective. All right. I mean, it looks pretty good.

That's a good answer; actually, probably one of the best answers I've seen from running this sort of test. So yeah, that is how we would put together a conversational agent that has access to external knowledge via our knowledge base and is also able to refer to past interactions in order to formulate responses, giving it that more conversational nature.

I mean, it's a pretty simple version, but it works pretty well. So yeah, I hope this is useful in putting together your own Claude 3 Opus or Claude 3 Sonnet RAG agents. But for now, that's all I have on Claude 3 agents. So I will leave it there.

I hope this has been useful and interesting. So thank you very much for watching, and I will see you again in the next one. Bye.