
Claude 3 Opus RAG Chatbot (Full Walkthrough)


Chapters

0:00 Claude 3 AI Agent in LangChain
0:33 Finding Claude 3 RAG Code
1:35 Using Voyage AI Embeddings
2:25 Using Pinecone Knowledge Base for RAG
3:55 Claude 3 AI Agent Setup
9:19 Using Claude 3 Agent
10:17 Adding Conversational Memory
12:32 Testing Claude 3 Agent with Memory
14:40 Final Thoughts on AI Agents and Anthropic

Whisper Transcript

00:00:00.000 | Today we're going to be taking a look at how we can build a fully conversational agent
00:00:06.280 | using Anthropic's new Claude 3 Opus model with Voyage AI Embeddings and Pinecone Vector
00:00:14.720 | Database.
00:00:15.720 | We're going to be using all of these services via LangChain, and that is the latest version
00:00:21.100 | of LangChain.
00:00:22.100 | We're using 0.1.11 at the moment, which allows us to put all of this together pretty easily.
00:00:31.020 | So we're going to jump straight into it.
00:00:33.380 | Now I'm going to be using this notebook here, which you can find in the Pinecone examples
00:00:37.740 | repo.
00:00:38.740 | And I'm just going to go ahead and click the open in Colab button.
00:00:42.580 | Okay, once that has opened, we're going to go click connect.
00:00:47.100 | And we're first just going to install all the prerequisites.
00:00:50.460 | There's a few here that we need.
00:00:52.940 | So the main ones, of course, are the LangChain ones.
00:00:56.980 | We also use the langchain-anthropic package here so that we can use the Anthropic models
00:01:02.580 | and particularly the latest Claude 3 models.
00:01:06.620 | Without this, we can't use the latest Claude 3 models, at least with the current versions
00:01:11.060 | of LangChain Community.
00:01:13.340 | And yeah, I think everything else is pretty self-explanatory.
00:01:16.460 | So I'm going to go ahead and install that.
00:01:20.620 | That will take a moment to install.
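
For reference, a minimal sketch of that install cell; beyond langchain 0.1.11 and langchain-anthropic, which are mentioned above, the exact package list is an assumption:

```python
# notebook install cell (sketch) -- package set beyond langchain==0.1.11
# and langchain-anthropic is assumed
!pip install -qU \
    langchain==0.1.11 \
    langchain-community \
    langchain-anthropic \
    langchainhub \
    voyageai \
    pinecone-client \
    datasets
```
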
00:01:24.200 | Now once those have installed, we're going to download a dataset.
00:01:27.980 | I'm just going to use the AI arXiv chunked dataset as usual.
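
Loading a chunked arXiv dataset from Hugging Face would look roughly like this; the dataset ID below is a hypothetical placeholder for the one used in the notebook:

```python
from datasets import load_dataset

# hypothetical dataset ID -- substitute the AI arXiv chunked dataset from the notebook
dataset = load_dataset("jamescalam/ai-arxiv-chunked", split="train")
print(dataset)
```
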
00:01:32.620 | I'm not going to go through too much of this, but the bit that we do actually need is here.
00:01:38.620 | So we're going to be using the Voyage Embeddings.
00:01:42.480 | Now Voyage AI is a relatively new AI company that focuses on embedding models at the moment.
00:01:52.700 | So we will need to go ahead and grab an API key from them.
00:01:58.180 | The URL for that is this.
00:02:02.040 | So dash.voyageai.com/api-keys.
00:02:06.580 | You will need to sign up for an account if you haven't already, or if you have, you just
00:02:10.540 | log in.
00:02:11.540 | Okay, I can see I already have my demo API key here.
00:02:15.360 | I'm just going to copy that and pull that in, run this, and enter the key.
00:02:21.980 | Then we should be able to initialize our embeddings here.
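
A minimal sketch of that embedding setup, assuming the langchain_community VoyageEmbeddings wrapper and the voyage-2 model (both assumptions; check the notebook for the exact names):

```python
import os
from getpass import getpass

from langchain_community.embeddings import VoyageEmbeddings

# prompt for the Voyage AI API key rather than hard-coding it
os.environ["VOYAGE_API_KEY"] = getpass("Voyage AI API key: ")

# model name is an assumption; voyage-2 returns 1024-dimensional vectors
embed = VoyageEmbeddings(
    voyage_api_key=os.environ["VOYAGE_API_KEY"],
    model="voyage-2",
)
```
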
00:02:25.980 | And now we need to jump across to Pinecone and get another API key.
00:02:30.160 | So this is app.pinecone.io, log in, okay.
00:02:34.860 | I'm going to get my API key from here, copy that, and pull it in.
00:02:39.140 | Cool.
00:02:40.140 | So we have that.
00:02:41.140 | And I'm using Pinecone serverless here.
00:02:43.780 | So I'm going to run that.
00:02:46.100 | And first thing I want to do is just check the embedding size that this model uses.
00:02:52.180 | So we can see it's 1024 here.
00:02:56.580 | And we're going to need to use that when we initialize our index.
00:03:01.980 | So we run this here, and yep, here we're passing in that dimensionality.
00:03:07.500 | Now I've actually already initialized this index.
00:03:10.160 | So we'll come here and we'll see that my index is already populated.
00:03:14.660 | If you're running through this for the first time, you should see that this will be zero.
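
With the Pinecone v3 client, the index setup looks roughly like the sketch below; the index name and serverless region are assumptions:

```python
import os
from getpass import getpass

from pinecone import Pinecone, ServerlessSpec

os.environ["PINECONE_API_KEY"] = getpass("Pinecone API key: ")
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# check the embedding dimensionality of the Voyage model (1024 here)
dims = len(embed.embed_query("dimension check"))

index_name = "claude-3-rag"  # hypothetical index name
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=dims,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-west-2"),  # region is an assumption
    )

index = pc.Index(index_name)
index.describe_index_stats()  # total_vector_count is 0 on a fresh index
```
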
00:03:20.780 | Now let's come down to here.
00:03:23.020 | This is where you'll be populating your index.
00:03:25.220 | So literally looping through the entire dataset, embedding everything, and throwing it in there.
00:03:31.020 | We add some metadata for the actual chunks of text, for the source of those chunks, and
00:03:37.540 | for the title being the arXiv paper title.
00:03:40.700 | But I'm not going to run that, because I already have it in there.
00:03:46.980 | But you should expect it to take a while; here it took me about 12 minutes, and that is on Colab.
00:03:51.540 | So expect something similar if you have a decent internet connection.
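
The population loop is along these lines; the dataset field names (chunk text, source, title) are assumptions about the schema:

```python
from tqdm.auto import tqdm

batch_size = 100  # embed and upsert 100 chunks at a time
data = dataset.to_pandas()

for i in tqdm(range(0, len(data), batch_size)):
    batch = data.iloc[i:i + batch_size]
    # field names below are assumptions about the dataset schema
    ids = [f"{r['doi']}-{r['chunk-id']}" for _, r in batch.iterrows()]
    texts = batch["chunk"].tolist()
    embeds = embed.embed_documents(texts)
    metadata = [
        {"text": r["chunk"], "source": r["source"], "title": r["title"]}
        for _, r in batch.iterrows()
    ]
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
```
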
00:03:55.600 | All right, cool.
00:03:56.600 | So now we need to go into the agent component, right?
00:04:00.980 | We've just, we've set up our knowledge base with Voyage AI embeddings.
00:04:06.480 | Now we will get into the agent component.
00:04:09.540 | Now our agent is, as I mentioned before, using a Claude 3 Opus LLM, and it will have a tool
00:04:19.500 | that it can use to retrieve data, which is the knowledge base that we have set up.
00:04:25.940 | So we need to initialize or define that tool.
00:04:30.160 | So we're going to call it the arXiv search tool here.
00:04:33.580 | We're using the tool decorator from LangChain agents to define this as a tool.
00:04:38.580 | This prompt here basically gives our agent a description of when to use this tool.
00:04:45.900 | And in order to use this tool, it must consume a string, and it will output a string, okay?
00:04:52.940 | And that is it.
00:04:53.940 | So we initialize our arXiv search tool definition there.
00:04:58.680 | We pass that into a list for tools.
00:05:01.340 | If we had multiple tools, of course, we might have some other search
00:05:06.580 | tool or something in there as well.
00:05:08.820 | But we just have one.
00:05:09.820 | So I'm going to run that.
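
As a sketch (the tool name and docstring wording are assumptions), the tool definition looks something like this:

```python
from langchain.agents import tool

@tool
def arxiv_search(query: str) -> str:
    """Use this tool when answering questions about AI, machine learning, or
    other technical topics that may be covered by papers in the arXiv
    knowledge base."""
    # embed the query and retrieve the most similar chunks from Pinecone
    xq = embed.embed_query(query)
    res = index.query(vector=xq, top_k=5, include_metadata=True)
    # join the retrieved chunks into a single string for the agent
    return "\n---\n".join(m["metadata"]["text"] for m in res["matches"])

tools = [arxiv_search]
```
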
00:05:12.180 | And let's just have a look at what this will actually look like when our agent is using this tool.
00:05:17.780 | So if we have this query, "can you tell me about Llama 2?", our agent is going to run the
00:05:23.380 | tool via its run method, and then pass in this tool input parameter, okay?
00:05:30.280 | So it would look like this.
00:05:31.540 | So it would use the tool like this, and this is the output that it would get, okay?
00:05:36.740 | These are the context that we've returned.
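
Calling the tool manually, outside the agent, is just (using the hypothetical tool name from the sketch above):

```python
# inspect what the agent would see when it uses the tool
print(arxiv_search.run(tool_input="can you tell me about Llama 2?"))
```
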
00:05:39.140 | Now we're using Anthropic models here, and Anthropic models work very well with
00:05:45.560 | what is called an XML agent.
00:05:47.180 | An XML agent, it's using a slightly different format to other agents in that we, one, have
00:05:54.620 | the input defined like this, and we also have tool usage defined like this.
00:06:00.740 | So using these XML tags.
00:06:02.680 | So we need to first initialize our prompt for that.
00:06:07.960 | So I'm going to use, from the LangChain Hub, this XML agent convo
00:06:15.060 | prompt that Harrison has put together.
00:06:16.060 | So I'm going to initialize that.
00:06:17.060 | Okay?
00:06:18.060 | And you can see that's defining the way that we want the agent to use that
00:06:24.480 | XML format here.
00:06:26.720 | We have observations, and then final answer is being instructed to provide a final answer
00:06:34.200 | using, again, these XML tags.
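
Pulling that prompt from the LangChain Hub is a one-liner (it needs the langchainhub package installed):

```python
from langchain import hub

# Harrison's conversational XML agent prompt
prompt = hub.pull("hwchase17/xml-agent-convo")
```
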
00:06:37.320 | Okay.
00:06:38.320 | Now, what we also need to do is initialize our Anthropic chat LLM.
00:06:45.500 | This is slightly different.
00:06:46.540 | So now we're using the langchain-anthropic package, rather than before, where I think we were
00:06:50.920 | using something like langchain_community, or sorry, it would be langchain_community.chat_models.ChatAnthropic.
00:07:00.580 | Something along those lines.
00:07:02.620 | Now we are using the langchain_anthropic library directly, and we need to do that in
00:07:07.420 | order to use the Claude 3 Opus model.
00:07:10.180 | If we use the old method and try and use this, we're going to get an error.
00:07:14.440 | Something along the lines of the messages format is incorrect.
00:07:19.200 | So we do need to use this.
00:07:21.540 | We do need an Anthropic API key.
00:07:23.340 | So let's go and get that.
00:07:24.420 | We go to console.anthropic.com.
00:07:26.420 | I'm going to go to get API keys.
00:07:29.860 | I will create a new key.
00:07:32.500 | We're going to call this rag demo.
00:07:36.660 | And I'm just going to paste that key into here.
00:07:39.780 | Okay.
00:07:40.780 | Cool.
00:07:41.780 | One thing you can do here, if you would like faster response times, is use Sonnet rather
00:07:48.180 | than Opus.
00:07:50.020 | But for now, I think we're pretty good.
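
A minimal sketch of that initialization; the temperature value is an assumption:

```python
import os
from getpass import getpass

from langchain_anthropic import ChatAnthropic

os.environ["ANTHROPIC_API_KEY"] = getpass("Anthropic API key: ")

# swap in "claude-3-sonnet-20240229" here for faster, cheaper responses
llm = ChatAnthropic(
    model="claude-3-opus-20240229",
    temperature=0.0,
)
```
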
00:07:53.020 | So we've initialized that.
00:07:54.860 | We have a few intermediate steps that we need in order to basically support the XML format
00:08:00.840 | that this agent requires.
00:08:03.120 | So we add that and also convert the tool names into the format, again, that we need for this
00:08:10.860 | agent.
00:08:13.060 | And then what we do is we initialize the inputs to our agent here.
00:08:18.540 | Okay?
00:08:19.540 | So we have our inputs.
00:08:21.700 | That is how everything is being pulled into the agent pipeline.
00:08:26.500 | And they are piped into our prompt transformation here, which we defined with the convert tools
00:08:33.300 | up there.
00:08:34.860 | Then they are piped into the LLM.
00:08:39.420 | The LLM will stop whenever we hit either the tool input or final answer ending tags.
00:08:45.940 | You can see that here.
00:08:48.060 | And then the output from that is piped into our XML agent output parser.
00:08:54.100 | So that we get, you know, like a nice format at the end there that we can actually work
00:08:57.980 | with.
00:08:58.980 | So that is kind of like our agent flow.
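
Put together, the agent runnable follows the standard LangChain XML agent pattern, roughly as below; the chat_history wiring assumes the conversational prompt pulled above:

```python
from langchain.agents.output_parsers import XMLAgentOutputParser

def convert_intermediate_steps(intermediate_steps):
    # format previous tool calls and observations into the XML scratchpad
    log = ""
    for action, observation in intermediate_steps:
        log += (
            f"<tool>{action.tool}</tool><tool_input>{action.tool_input}"
            f"</tool_input><observation>{observation}</observation>"
        )
    return log

def convert_tools(tools):
    # list the tool names and descriptions for the prompt
    return "\n".join(f"{t.name}: {t.description}" for t in tools)

agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x.get("chat_history", ""),
        "agent_scratchpad": lambda x: convert_intermediate_steps(
            x["intermediate_steps"]
        ),
    }
    | prompt.partial(tools=convert_tools(tools))
    | llm.bind(stop=["</tool_input>", "</final_answer>"])
    | XMLAgentOutputParser()
)
```
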
00:09:02.180 | Then we need to pass our agent flow as an agent into our agent executor alongside our
00:09:07.540 | tools.
00:09:08.540 | And we can either say verbose is true or false.
00:09:12.140 | It's kind of up to you.
00:09:14.460 | Just so that we can see everything, I'm going to say it's true.
00:09:16.700 | It will just print out a ton of stuff, essentially.
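
Wrapping the runnable into an executor is then one line:

```python
from langchain.agents import AgentExecutor

# verbose=True prints tool calls and retrieved contexts as the agent runs
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
```
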
00:09:19.940 | Now, let's try our agent.
00:09:22.580 | For now, there's going to be no chat history.
00:09:24.700 | We'll add that after in a moment.
00:09:27.820 | So let's try.
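
With no previous interactions, we pass an empty string for the chat history:

```python
out = agent_executor.invoke({
    "input": "can you tell me about Llama 2?",
    "chat_history": "",  # no previous interactions yet
})
print(out["output"])
```
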
00:09:31.100 | So it's using the arXiv search tool.
00:09:33.180 | My question is, can you tell me about Llama 2?
00:09:35.020 | That's good.
00:09:36.020 | It should do.
00:09:37.620 | The contexts I got back are the same as what we saw before.
00:09:43.020 | Okay.
00:09:44.020 | So this blue bit here, they're the returned contexts.
00:09:48.620 | And then we get our final answer.
00:09:50.260 | That actually did take a little bit of time there, 36 seconds.
00:09:53.340 | So we get a final answer here.
00:09:55.540 | And we can actually read it here as well.
00:09:56.780 | So Llama 2 is a large language model developed by Meta AI.
00:10:00.180 | Good.
00:10:01.180 | Code Llama is a version of Llama 2.
00:10:05.620 | Yeah, it's all good, all good, all good.
00:10:09.860 | Okay.
00:10:10.860 | Cool.
00:10:11.860 | I mean, that all looks pretty good.
00:10:13.900 | I think, yeah, there's nothing weird there.
00:10:17.700 | What we now might want to do, okay, so right now we don't have that chat history, which
00:10:22.340 | means our model, our agent is stateless, can't refer to previous interactions.
00:10:28.460 | So what we need to do is add those previous interactions in a way of maintaining the state
00:10:34.020 | of the conversation over multiple interactions.
00:10:37.260 | So we're going to be using a conversational memory object, or the conversation buffer window
00:10:41.740 | memory object, from LangChain.
00:10:44.980 | So we initialize that.
00:10:47.260 | And at the moment, our conversational memory is, of course, empty because we haven't added
00:10:52.140 | anything to it yet.
00:10:53.860 | So what we can do is use the add_user_message and add_ai_message methods to add some memory
00:11:00.460 | or messages into that.
00:11:01.820 | So now if we print that out, we can see that we have a human message, AI message, and that
00:11:06.620 | is it.
00:11:07.620 | We've only had those two messages so far.
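
A sketch of that memory setup; the window size k is an assumption:

```python
from langchain.memory import ConversationBufferWindowMemory

# keep only the last k interactions in the buffer
conversational_memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    k=5,  # window size is an assumption
    return_messages=True,
)

# seed the memory with the first interaction from above
conversational_memory.chat_memory.add_user_message("can you tell me about Llama 2?")
conversational_memory.chat_memory.add_ai_message(out["output"])

print(conversational_memory.chat_memory.messages)
```
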
00:11:09.540 | So let's see how we can feed that into our XML Claude 3 agent.
00:11:17.780 | So we can't send these messages into the agent directly.
00:11:21.580 | Instead, we need to pass a string in this format, okay, human and AI.
00:11:27.100 | So we're going to create a new function here, memory to string, that takes our conversational
00:11:32.340 | buffer window memory, retrieves those messages, and then formats them into that format that
00:11:39.020 | we need.
00:11:40.020 | So with human and AI, and returns a string of that.
00:11:44.600 | So let's run this, and let's see what we get when we print that out.
00:11:48.700 | Okay.
00:11:49.700 | So we get human, AI.
00:11:51.700 | Cool.
00:11:52.700 | Looks good.
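
The helper is a small formatting function, roughly:

```python
def memory_to_str(memory: ConversationBufferWindowMemory) -> str:
    """Format stored messages as the "Human: ..." / "AI: ..." string the
    XML agent prompt expects for its chat_history."""
    lines = []
    for msg in memory.chat_memory.messages:
        role = "Human" if msg.type == "human" else "AI"
        lines.append(f"{role}: {msg.content}")
    return "\n".join(lines)

print(memory_to_str(conversational_memory))
```
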
00:11:54.500 | Now, we're going to have to run this code every time, where we're invoking our agent and
00:11:59.780 | then updating the memory.
00:12:04.920 | So rather than just repeating that every time, let's wrap that all into a single function
00:12:10.940 | called chat, and we'll use this to call our agent and also maintain our conversation state.
00:12:18.020 | So we have our agent executor invoke as we did before, we're passing in our conversational
00:12:23.620 | memory, and then after that, we are adding the previous interactions to our conversational
00:12:30.140 | memory for any future interactions.
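
Wrapped up, the chat helper looks something like this sketch:

```python
def chat(text: str) -> str:
    # invoke the agent with the formatted chat history
    out = agent_executor.invoke({
        "input": text,
        "chat_history": memory_to_str(conversational_memory),
    })
    # store the new interaction so future calls can refer back to it
    conversational_memory.chat_memory.add_user_message(text)
    conversational_memory.chat_memory.add_ai_message(out["output"])
    return out["output"]
```
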
00:12:33.020 | So we run that, and let's see what we get.
00:12:38.500 | Okay.
00:12:39.500 | So what I'm doing here, we've already spoken about Llama 2 with the model, we're asking
00:12:45.980 | about Llama 2, but now I'm asking a question that doesn't specify that I'm talking about
00:12:49.740 | Llama 2.
00:12:50.740 | Okay.
00:12:51.740 | I'm saying was any red teaming done with the model?
00:12:54.540 | I don't say Llama 2 at all here.
00:12:57.960 | So the model needs to be able to look at those previous interactions in order to understand
00:13:02.180 | that I'm actually asking about Llama 2, and in order to perform the search with Llama 2
00:13:08.500 | in there.
00:13:09.500 | Because if you just do a search with red teaming the model, the results are not going to be
00:13:14.240 | specific to Llama 2.
00:13:17.740 | So fortunately, the model actually handles this pretty nicely, and you can see that the
00:13:22.580 | input it gives to the tool here is "Llama 2 red teaming", so it's looking at those previous
00:13:27.620 | interactions correctly and pulling in that information.
00:13:32.700 | And we see in blue the results that we're getting, and we see that after conducting
00:13:37.100 | red team exercises, we asked participants who had also participated in Llama 2 Chat exercises
00:13:43.340 | to also provide a qualitative assessment of safety capabilities of the model.
00:13:47.540 | Okay.
00:13:48.540 | So you can see that seems pretty relevant.
00:13:51.620 | So let's come down here and have a look at the final answer that the
00:13:56.580 | model produced.
00:13:57.580 | So it said, "Yes, Meta AI conducted red teaming exercises on Llama 2 and Code Llama models.
00:14:03.580 | They conducted three red teaming exercises with 25 Meta employees, including domain experts
00:14:09.400 | in responsible AI, malware development, and offensive security engineering."
00:14:13.340 | And you can see, okay, I think there's more information here.
00:14:17.740 | I think it all looks pretty good, to be honest.
00:14:19.540 | So as I look at the summary, "Meta AI did conduct red teaming to proactively identify
00:14:24.620 | risks in those models, uncovering some potential issues, especially with ambiguous prompts."
00:14:30.540 | It also puts any risks in perspective.
00:14:32.180 | All right.
00:14:33.180 | I mean, it looks pretty good.
00:14:34.780 | That's a good answer.
00:14:35.780 | Actually, probably one of the best answers I've seen from running this sort of test.
00:14:40.920 | So yeah, that is how we would put together a conversational agent that has access to
00:14:47.840 | external knowledge via our knowledge base and is also able to refer to past interactions
00:14:56.440 | in order to formulate responses and have that more sort of conversational nature.
00:15:00.600 | I mean, it's a pretty simple version, but it works pretty well.
00:15:04.960 | So yeah, I hope this is useful in just kind of putting together your own Claude 3 Opus
00:15:12.900 | or Claude 3 Sonnet RAG agents.
00:15:15.820 | But for now, that's all I have on Claude 3 agents.
00:15:20.060 | So I will leave it there.
00:15:21.060 | I hope this has been useful and interesting.
00:15:23.780 | So thank you very much for watching, and I will see you again in the next one.
00:15:27.400 | [MUSIC PLAYING]
00:15:29.400 | [END PLAYBACK]