Claude 3 Opus RAG Chatbot (Full Walkthrough)
Chapters
0:00 Claude 3 AI Agent in LangChain
0:33 Finding Claude 3 RAG Code
1:35 Using Voyage AI Embeddings
2:25 Using Pinecone Knowledge Base for RAG
3:55 Claude 3 AI Agent Setup
9:19 Using Claude 3 Agent
10:17 Adding Conversational Memory
12:32 Testing Claude 3 Agent with Memory
14:40 Final Thoughts on AI Agents and Anthropic
00:00:00.000 |
Today we're going to be taking a look at how we can build a fully conversational agent 00:00:06.280 |
using Anthropic's new Claude 3 Opus model with Voyage AI embeddings and the Pinecone vector 00:00:15.720 |
database. We're going to be using all of these services via LangChain, and that is the latest version. 00:00:22.100 |
We're using 0.1.11 at the moment, which allows us to put all of this together pretty easily. 00:00:33.380 |
Now I'm going to be using this notebook here, which you can find in the Pinecone examples repo. 00:00:38.740 |
And I'm just going to go ahead and click the open in Colab button. 00:00:42.580 |
Okay, once that has opened, we're going to go click connect. 00:00:47.100 |
And we're first just going to install all the prerequisites. 00:00:52.940 |
So the main ones, of course, are the LangChain ones. 00:00:56.980 |
We also use the LangChain Anthropic package here so that we can use the Anthropic models. 00:01:06.620 |
Without this, we can't use the latest Claude 3 models, at least with the current versions of LangChain. 00:01:13.340 |
And yeah, I think everything else is pretty self-explanatory. 00:01:24.200 |
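For reference, the install cell looks roughly like this sketch; the exact package list and version pins are assumptions based on what's mentioned in the video (LangChain 0.1.11, langchain-anthropic, Pinecone, Voyage AI, and a Hugging Face dataset):

```python
!pip install -qU \
    langchain==0.1.11 \
    langchain-anthropic \
    langchain-community \
    langchainhub \
    pinecone-client \
    voyageai \
    datasets
```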
Now once those have installed, we're going to download a dataset. 00:01:27.980 |
I'm just going to use the AI ArXiv chunked dataset as usual. 00:01:32.620 |
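Loading that is a one-liner with Hugging Face datasets; here's a minimal sketch, assuming the dataset ID is jamescalam/ai-arxiv-chunked (the exact ID in the notebook may differ):

```python
from datasets import load_dataset

# assumed dataset ID -- check the notebook for the exact one
data = load_dataset("jamescalam/ai-arxiv-chunked", split="train")
```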
I'm not going to go through too much of this, but the bit that we do actually need is here. 00:01:38.620 |
So we're going to be using the Voyage Embeddings. 00:01:42.480 |
Now Voyage AI is a relatively new AI company that focuses on embedding models at the moment. 00:01:52.700 |
So we will need to go ahead and grab an API key from them. 00:02:06.580 |
You will need to sign up for an account if you haven't already, or if you have, you just log in. 00:02:11.540 |
Okay, I can see I already have my demo API key here. 00:02:15.360 |
I'm just going to copy that and pull that in, run this, and enter the key. 00:02:21.980 |
Then we should be able to initialize our embeddings here. 00:02:25.980 |
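A minimal sketch of that initialization, assuming the voyage-2 model and the langchain_community integration (the notebook may use a different model name):

```python
import os
from getpass import getpass
from langchain_community.embeddings import VoyageEmbeddings

# enter the Voyage AI API key when prompted
os.environ["VOYAGE_API_KEY"] = getpass("Voyage API key: ")

# "voyage-2" is an assumption; swap in whichever model the notebook uses
embed = VoyageEmbeddings(model="voyage-2")
```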
And now we need to jump across to Pinecone and get another API key. 00:02:34.860 |
I'm going to get my API key from here, copy that, and pull it in. 00:02:46.100 |
And first thing I want to do is just check the embedding size that this model uses. 00:02:56.580 |
And we're going to need to use that when we initialize our index. 00:03:01.980 |
So we run this here and, yep, here we're passing in that dimensionality. 00:03:07.500 |
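Something like the following sketch, using the embed object from earlier; the index name, cloud, and region are assumptions:

```python
import os
from getpass import getpass
from pinecone import Pinecone, ServerlessSpec

# enter the Pinecone API key when prompted
os.environ["PINECONE_API_KEY"] = getpass("Pinecone API key: ")
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# check the embedding dimensionality by embedding a dummy query
dims = len(embed.embed_query("test"))

index_name = "claude-3-rag"  # hypothetical index name
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=dims,  # must match the embedding model
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-west-2"),
    )

index = pc.Index(index_name)
index.describe_index_stats()  # vector count is 0 for a fresh index
```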
Now I've actually already initialized this index. 00:03:10.160 |
So we'll come here and we'll see that my index is already populated. 00:03:14.660 |
If you're running through this for the first time, you should see that this will be zero. 00:03:23.020 |
This is where you'll be populating your index. 00:03:25.220 |
So literally looping through the entire dataset, embedding everything, and throwing it in there. 00:03:31.020 |
We add some metadata for the actual chunks of text, for the source of those chunks, and so on. 00:03:40.700 |
But I'm not going to run that because I already have it all in there. 00:03:46.980 |
But you should expect it to take a little while; here it took me about 12 minutes, and that is on Colab. 00:03:51.540 |
So expect something similar if you have a decent internet connection. 00:03:56.600 |
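The population loop is roughly the following sketch, using the data, embed, and index objects from earlier; the batch size and the dataset field names (chunk, source, title, doi, chunk-id) are assumptions about the schema:

```python
from tqdm.auto import tqdm

batch_size = 100  # embed and upsert 100 records at a time (an assumption)

for i in tqdm(range(0, len(data), batch_size)):
    batch = data[i:i + batch_size]  # a dict of column name -> list of values
    # unique IDs per chunk -- field names are assumptions
    ids = [f"{d}-{c}" for d, c in zip(batch["doi"], batch["chunk-id"])]
    # embed the text chunks with Voyage AI
    embeds = embed.embed_documents(batch["chunk"])
    # attach the chunk text and its source as metadata
    metadata = [
        {"text": chunk, "source": src, "title": title}
        for chunk, src, title in zip(batch["chunk"], batch["source"], batch["title"])
    ]
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
```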
So now we need to go into the agent component, right? 00:04:00.980 |
We've just set up our knowledge base with Voyage AI embeddings. 00:04:09.540 |
Now our agent is, as I mentioned before, using a Claude 3 Opus LLM, and it will have a tool 00:04:19.500 |
that it can use to retrieve data, which is the knowledge base that we have set up. 00:04:25.940 |
So we need to initialize or define that tool. 00:04:30.160 |
So we're going to call it the arxiv search tool here. 00:04:33.580 |
We're using the tool decorator from LangChain agents to define this as a tool. 00:04:38.580 |
This description here basically gives our agent guidance on when to use this tool. 00:04:45.900 |
And in order to use this tool, it must consume a string, and it will output a string, okay? 00:04:53.940 |
So we initialize our arxiv search tool definition there. 00:05:01.340 |
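A sketch of that tool definition, querying the Pinecone index directly and joining the top chunks into one string; the docstring wording and top_k value are assumptions:

```python
from langchain.agents import tool

@tool
def arxiv_search(query: str) -> str:
    """Use this tool when answering questions about AI, machine learning,
    or other technical topics that may be covered by ArXiv papers."""
    # embed the query, retrieve the top matches, and join the chunk texts
    xq = embed.embed_query(query)
    res = index.query(vector=xq, top_k=5, include_metadata=True)
    return "\n---\n".join(m["metadata"]["text"] for m in res["matches"])

tools = [arxiv_search]
```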
If we had multiple tools, of course, we might have some other search tools in that list as well. 00:05:12.180 |
And let's just have a look at what this will actually look like when our agent is using it. 00:05:17.780 |
So if we have this query, can you tell me about Llama 2, our agent is going to run the 00:05:23.380 |
tool using arxiv_search.run, and then pass in this tool input parameter, okay? 00:05:31.540 |
So it would use the tool like this, and this is the output that it would get, okay? 00:05:39.140 |
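Concretely, that call might look like this sketch:

```python
out = arxiv_search.run(tool_input="can you tell me about Llama 2?")
print(out)  # the retrieved chunks, joined into one string
```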
Now we're using Anthropic models here, and Anthropic models work very well with XML agents. 00:05:47.180 |
An XML agent uses a slightly different format to other agents in that we, one, have 00:05:54.620 |
the input defined like this, and we also have tool usage defined like this. 00:06:02.680 |
So we need to first initialize our prompt for that. 00:06:07.960 |
So I'm going to use, from the LangChain hub, this XML agent convo prompt that Harrison has put together. 00:06:18.060 |
And you can see that's defining the way that we want the agent to use those tools. 00:06:26.720 |
We have observations, and then the agent is being instructed to provide a final answer within final answer tags. 00:06:38.320 |
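Pulling that prompt is one line; this is the public hwchase17/xml-agent-convo prompt on the LangChain hub:

```python
from langchain import hub

# conversational XML agent prompt, with slots for the tool list,
# chat history, user input, and agent scratchpad
prompt = hub.pull("hwchase17/xml-agent-convo")
```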
Now, what we also need to do is initialize our Anthropic chat LLM. 00:06:46.540 |
So now we're using the langchain-anthropic package rather than, as I think we were before, 00:06:50.920 |
something like langchain_community, or sorry, it would be langchain_community.chat_models.ChatAnthropic. 00:07:02.620 |
Now we are using the langchain-anthropic library directly, and we need to do that in order to use the Claude 3 models. 00:07:10.180 |
If we use the old method and try and use this, we're going to get an error. 00:07:14.440 |
Something along the lines of the messages format being incorrect. 00:07:36.660 |
So, grabbing an Anthropic API key, I'm just going to paste that key into here. 00:07:41.780 |
One thing you can do here, if you would like faster response times, is use Sonnet rather than Opus. 00:07:54.860 |
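A sketch of that initialization with langchain-anthropic; the temperature setting is an assumption:

```python
import os
from getpass import getpass
from langchain_anthropic import ChatAnthropic

# enter the Anthropic API key when prompted
os.environ["ANTHROPIC_API_KEY"] = getpass("Anthropic API key: ")

# swap in "claude-3-sonnet-20240229" here for faster responses
chat_llm = ChatAnthropic(model="claude-3-opus-20240229", temperature=0)
```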
We have a few intermediate steps that we need in order to basically support the XML format here. 00:08:03.120 |
So we add that, and also convert the tool names into the format, again, that we need for this XML agent. 00:08:13.060 |
And then what we do is we initialize the inputs to our agent here. 00:08:21.700 |
That is how everything is being pulled into the agent pipeline. 00:08:26.500 |
And they are piped into our prompt transformation here, which we defined with the convert tools function. 00:08:39.420 |
The LLM will stop whenever we hit either the tool input or final answer ending tags. 00:08:48.060 |
And then the output from that is piped into our XML agent output parser. 00:08:54.100 |
So that we get, you know, like a nice format at the end there that we can actually work with. 00:09:02.180 |
Then we need to pass our agent flow as an agent into our agent executor, alongside our tools. 00:09:08.540 |
And we can either say verbose is true or false. 00:09:14.460 |
Just so that we can see everything, I'm going to say it's true. 00:09:16.700 |
It will just print out a ton of stuff, essentially. 00:09:22.580 |
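Pieced together, the agent pipeline looks roughly like this sketch, following LangChain's XML agent pattern; the exact input keys and stop tokens are assumptions based on the xml-agent-convo prompt:

```python
from langchain.agents import AgentExecutor
from langchain.agents.output_parsers import XMLAgentOutputParser

def convert_intermediate_steps(intermediate_steps):
    # render each (action, observation) pair in the XML format the prompt expects
    log = ""
    for action, observation in intermediate_steps:
        log += (
            f"<tool>{action.tool}</tool><tool_input>{action.tool_input}"
            f"</tool_input><observation>{observation}</observation>"
        )
    return log

def convert_tools(tools):
    # list each tool as "name: description" for the tools slot in the prompt
    return "\n".join(f"{t.name}: {t.description}" for t in tools)

agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x.get("chat_history", ""),
        "agent_scratchpad": lambda x: convert_intermediate_steps(
            x["intermediate_steps"]
        ),
    }
    | prompt.partial(tools=convert_tools(tools))
    | chat_llm.bind(stop=["</tool_input>", "</final_answer>"])
    | XMLAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)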
For now, there's going to be no chat history. 00:09:33.180 |
My question is, can you tell me about Llama 2? 00:09:37.620 |
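So, passing an empty chat history for now, something like:

```python
out = agent_executor.invoke({
    "input": "can you tell me about Llama 2?",
    "chat_history": "",  # no previous interactions yet
})
print(out["output"])
```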
The contexts I got back are the same as what we saw before. 00:09:44.020 |
So this blue bit here, they're the returned contexts. 00:09:50.260 |
That actually did take a little bit of time there, 36 seconds. 00:09:56.780 |
So Llama 2 is a large language model developed by Meta AI. 00:10:17.700 |
What we now might want to do, okay, so right now we don't have that chat history, which 00:10:22.340 |
means our agent is stateless and can't refer to previous interactions. 00:10:28.460 |
So what we need to do is add those previous interactions as a way of maintaining the state 00:10:34.020 |
of the conversation over multiple interactions. 00:10:37.260 |
So we're going to be using a conversational memory object, the conversation buffer window memory. 00:10:47.260 |
And at the moment, our conversational memory is, of course, empty because we haven't added anything to it. 00:10:53.860 |
So what we can do is use the add_user_message and add_ai_message methods to add some messages to memory. 00:11:01.820 |
So now, if we print that out, we can see that we have a human message and an AI message in there. 00:11:09.540 |
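A sketch with LangChain's ConversationBufferWindowMemory; the window size k is an assumption:

```python
from langchain.memory import ConversationBufferWindowMemory

# remember only the last k=5 interactions (an assumed window size)
conversational_memory = ConversationBufferWindowMemory(k=5, return_messages=True)

# seed the memory with the first interaction
conversational_memory.chat_memory.add_user_message("can you tell me about Llama 2?")
conversational_memory.chat_memory.add_ai_message(out["output"])

print(conversational_memory.chat_memory.messages)
```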
So let's see how we can feed that into our XML Claude 3 agent. 00:11:17.780 |
So we can't send these messages into the agent directly. 00:11:21.580 |
Instead, we need to pass a string in this format, okay, human and AI. 00:11:27.100 |
So we're going to create a new function here, memory to string, that takes our conversational 00:11:32.340 |
buffer window memory, retrieves those messages, and then formats them into that format that 00:11:40.020 |
we need, with human and AI, and returns a string of that. 00:11:44.600 |
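A minimal sketch of that helper:

```python
def memory_to_str(memory: ConversationBufferWindowMemory) -> str:
    # format the stored messages as "Human: ..." / "AI: ..." lines
    lines = []
    for msg in memory.chat_memory.messages:
        role = "Human" if msg.type == "human" else "AI"
        lines.append(f"{role}: {msg.content}")
    return "\n".join(lines)

print(memory_to_str(conversational_memory))
```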
So let's run this, and let's see what we get when we print that out. 00:11:54.500 |
Now, we're going to have to run this code every time, where we're invoking our agent and 00:11:59.780 |
then adding the new interactions to memory. 00:12:04.920 |
So rather than just repeating that every time, let's wrap that all into a single function 00:12:10.940 |
called chat, and we'll use this to call our agent and also maintain our conversation state. 00:12:18.020 |
So we have our agent executor invoke as we did before, we're passing in our conversational 00:12:23.620 |
memory, and then after that, we are adding the new interactions to our conversational memory. 00:12:39.500 |
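A sketch of that wrapper, using the pieces defined above:

```python
def chat(text: str) -> str:
    # invoke the agent with the current conversation state as a string
    out = agent_executor.invoke({
        "input": text,
        "chat_history": memory_to_str(conversational_memory),
    })
    # record the new interaction so future turns can refer back to it
    conversational_memory.chat_memory.add_user_message(text)
    conversational_memory.chat_memory.add_ai_message(out["output"])
    return out["output"]

chat("was any red teaming done with the model?")
```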
So what I'm doing here: we've already spoken about Llama 2 with the model, and now I'm asking 00:12:45.980 |
a question that doesn't specify that I'm talking about Llama 2. 00:12:51.740 |
I'm saying, was any red teaming done with the model? 00:12:57.960 |
So the model needs to be able to look at those previous interactions in order to understand 00:13:02.180 |
that I'm actually asking about Llama 2, and in order to perform the search with Llama 2 included. 00:13:09.500 |
Because if you just do a search with "red teaming the model", the results are not going to be relevant. 00:13:17.740 |
So fortunately, the model actually handles this pretty nicely, and you can see that the 00:13:22.580 |
input it gives to the tool here is Llama 2 red teaming, so it's looking at those previous 00:13:27.620 |
interactions correctly and pulling in that information. 00:13:32.700 |
And we see in blue the results that we're getting, and we see that after conducting 00:13:37.100 |
red team exercises, we asked participants who had also participated in Llama 2 chat exercises 00:13:43.340 |
to also provide a qualitative assessment of safety capabilities of the model. 00:13:51.620 |
So let's come down here and have a look at the final answer that the agent gave. 00:13:57.580 |
So it said, "Yes, Meta AI conducted red teaming exercises on Llama 2 and Code Llama models. 00:14:03.580 |
They conducted three red teaming exercises with 25 Meta employees, including domain experts 00:14:09.400 |
in responsible AI, malware development, and offensive security engineering." 00:14:13.340 |
And you can see, okay, I think there's more information here. 00:14:17.740 |
I think it all looks pretty good, to be honest. 00:14:19.540 |
So as I look at the summary, "Meta AI did conduct red teaming to proactively identify 00:14:24.620 |
risks in those models, uncovering some potential issues, especially with ambiguous prompts." 00:14:35.780 |
Actually, probably one of the best answers I've seen from running this sort of test. 00:14:40.920 |
So yeah, that is how we would put together a conversational agent that has access to 00:14:47.840 |
external memory using our knowledge base and is also able to refer to past interactions 00:14:56.440 |
in order to formulate responses and have that more sort of conversational nature. 00:15:00.600 |
I mean, it's a pretty simple version, but it works pretty well. 00:15:04.960 |
So yeah, I hope this is useful in just kind of putting together your own Claude 3 Opus agent. 00:15:15.820 |
But for now, that's all I have on Claude 3 agents. 00:15:23.780 |
So thank you very much for watching, and I will see you again in the next one.