Claude 3 Opus RAG Chatbot (Full Walkthrough)
Chapters
0:00 Claude 3 AI Agent in LangChain
0:33 Finding Claude 3 RAG Code
1:35 Using Voyage AI Embeddings
2:25 Using Pinecone Knowledge Base for RAG
3:55 Claude 3 AI Agent Setup
9:19 Using Claude 3 Agent
10:17 Adding Conversational Memory
12:32 Testing Claude 3 Agent with Memory
14:40 Final Thoughts on AI Agents and Anthropic
00:00:00.000 |
Today we're going to be taking a look at how we can build a fully conversational agent 00:00:06.280 |
using Anthropic's new Claude 3 Opus model with Voyage AI embeddings and the Pinecone vector 00:00:15.720 |
database. We're going to be using all of these services via LangChain, and that is the latest version. 00:00:22.100 |
We're using 0.1.11 at the moment, which allows us to put all of this together pretty easily. 00:00:33.380 |
Now I'm going to be using this notebook here, which you can find in the Pinecone examples repo. 00:00:38.740 |
And I'm just going to go ahead and click the open in Colab button. 00:00:42.580 |
Okay, once that has opened, we're going to go click connect. 00:00:47.100 |
And we're first just going to install all the prerequisites. 00:00:52.940 |
So the main ones, of course, are the LangChain ones. 00:00:56.980 |
We also use the LangChain Anthropic package here so that we can use the Anthropic models. 00:01:06.620 |
Without this, we can't use the latest Claude 3 models, at least with the current versions of LangChain. 00:01:13.340 |
And yeah, I think everything else is pretty self-explanatory. 00:01:24.200 |
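For reference, the install cell looks roughly like this sketch; the exact package list and version pins are assumptions based on what's mentioned in the video (LangChain 0.1.11, langchain-anthropic, Pinecone, Voyage AI, and a Hugging Face dataset):

```python
!pip install -qU \
    langchain==0.1.11 \
    langchain-anthropic \
    langchain-community \
    langchainhub \
    pinecone-client \
    voyageai \
    datasets
```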
Now once those have installed, we're going to download a dataset. 00:01:27.980 |
I'm just going to use the AI ArXiv chunked dataset as usual. 00:01:32.620 |
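Loading that is a one-liner with Hugging Face datasets; here's a minimal sketch, assuming the dataset ID is jamescalam/ai-arxiv-chunked (the exact ID in the notebook may differ):

```python
from datasets import load_dataset

# assumed dataset ID -- check the notebook for the exact one
data = load_dataset("jamescalam/ai-arxiv-chunked", split="train")
```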
I'm not going to go through too much of this, but the bit that we do actually need is here. 00:01:38.620 |
So we're going to be using the Voyage Embeddings. 00:01:42.480 |
Now Voyage AI is a relatively new AI company that focuses on embedding models at the moment. 00:01:52.700 |
So we will need to go ahead and grab an API key from them. 00:02:06.580 |
You will need to sign up for an account if you haven't already, or if you have, you just log in. 00:02:11.540 |
Okay, I can see I already have my demo API key here. 00:02:15.360 |
I'm just going to copy that and pull that in, run this, and enter the key. 00:02:21.980 |
Then we should be able to initialize our embeddings here. 00:02:25.980 |
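A minimal sketch of that initialization, assuming the voyage-2 model and the langchain_community integration (the notebook may use a different model name):

```python
import os
from getpass import getpass
from langchain_community.embeddings import VoyageEmbeddings

# enter the Voyage AI API key when prompted
os.environ["VOYAGE_API_KEY"] = getpass("Voyage API key: ")

# "voyage-2" is an assumption; swap in whichever model the notebook uses
embed = VoyageEmbeddings(model="voyage-2")
```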
And now we need to jump across to Pinecone and get another API key. 00:02:34.860 |
I'm going to get my API key from here, copy that, and pull it in. 00:02:46.100 |
And first thing I want to do is just check the embedding size that this model uses. 00:02:56.580 |
And we're going to need to use that when we initialize our index. 00:03:01.980 |
So we run this here and, yep, here we're passing in that dimensionality. 00:03:07.500 |
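Something like the following sketch, using the embed object from earlier; the index name, cloud, and region are assumptions:

```python
import os
from getpass import getpass
from pinecone import Pinecone, ServerlessSpec

# enter the Pinecone API key when prompted
os.environ["PINECONE_API_KEY"] = getpass("Pinecone API key: ")
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# check the embedding dimensionality by embedding a dummy query
dims = len(embed.embed_query("test"))

index_name = "claude-3-rag"  # hypothetical index name
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=dims,  # must match the embedding model
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-west-2"),
    )

index = pc.Index(index_name)
index.describe_index_stats()  # vector count is 0 for a fresh index
```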
Now I've actually already initialized this index. 00:03:10.160 |
So we'll come here and we'll see that my index is already populated. 00:03:14.660 |
If you're running through this for the first time, you should see that this will be zero. 00:03:23.020 |
This is where you'll be populating your index. 00:03:25.220 |
So literally looping through the entire dataset, embedding everything, and throwing it in there. 00:03:31.020 |
We add some metadata for the actual chunks of text, for the source of those chunks, and so on. 00:03:40.700 |
But I'm not going to run that because I already have it all in there. 00:03:46.980 |
But you should expect it to take a little while; here it took me about 12 minutes, and that is on Colab. 00:03:51.540 |
So expect something similar if you have a decent internet connection. 00:03:56.600 |
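The population loop is roughly the following sketch, using the data, embed, and index objects from earlier; the batch size and the dataset field names (chunk, source, title, doi, chunk-id) are assumptions about the schema:

```python
from tqdm.auto import tqdm

batch_size = 100  # embed and upsert 100 records at a time (an assumption)

for i in tqdm(range(0, len(data), batch_size)):
    batch = data[i:i + batch_size]  # a dict of column name -> list of values
    # unique IDs per chunk -- field names are assumptions
    ids = [f"{d}-{c}" for d, c in zip(batch["doi"], batch["chunk-id"])]
    # embed the text chunks with Voyage AI
    embeds = embed.embed_documents(batch["chunk"])
    # attach the chunk text and its source as metadata
    metadata = [
        {"text": chunk, "source": src, "title": title}
        for chunk, src, title in zip(batch["chunk"], batch["source"], batch["title"])
    ]
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
```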
So now we need to go into the agent component, right? 00:04:00.980 |
We've just set up our knowledge base with Voyage AI embeddings. 00:04:09.540 |
Now our agent is, as I mentioned before, using a Claude 3 Opus LLM, and it will have a tool 00:04:19.500 |
that it can use to retrieve data, which is the knowledge base that we have set up. 00:04:25.940 |
So we need to initialize or define that tool. 00:04:30.160 |
So we're going to call it the arxiv search tool here. 00:04:33.580 |
We're using the tool decorator from LangChain agents to define this as a tool. 00:04:38.580 |
This description here basically gives our agent guidance on when to use this tool. 00:04:45.900 |
And in order to use this tool, it must consume a string, and it will output a string, okay? 00:04:53.940 |
So we initialize our arxiv search tool definition there. 00:05:01.340 |
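A sketch of that tool definition, querying the Pinecone index directly and joining the top chunks into one string; the docstring wording and top_k value are assumptions:

```python
from langchain.agents import tool

@tool
def arxiv_search(query: str) -> str:
    """Use this tool when answering questions about AI, machine learning,
    or other technical topics that may be covered by ArXiv papers."""
    # embed the query, retrieve the top matches, and join the chunk texts
    xq = embed.embed_query(query)
    res = index.query(vector=xq, top_k=5, include_metadata=True)
    return "\n---\n".join(m["metadata"]["text"] for m in res["matches"])

tools = [arxiv_search]
```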
If we had multiple tools, of course, we might have some other search tools in that list as well. 00:05:12.180 |
And let's just have a look at what this will actually look like when our agent is using it. 00:05:17.780 |
So if we have this query, can you tell me about Llama 2, our agent is going to run the 00:05:23.380 |
tool using arxiv_search.run, and then pass in this tool input parameter, okay? 00:05:31.540 |
So it would use the tool like this, and this is the output that it would get, okay? 00:05:39.140 |
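Concretely, that call might look like this sketch:

```python
out = arxiv_search.run(tool_input="can you tell me about Llama 2?")
print(out)  # the retrieved chunks, joined into one string
```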
Now we're using Anthropic models here, and Anthropic models work very well with XML agents. 00:05:47.180 |
An XML agent uses a slightly different format to other agents in that we, one, have 00:05:54.620 |
the input defined like this, and we also have tool usage defined like this. 00:06:02.680 |
So we need to first initialize our prompt for that. 00:06:07.960 |
So I'm going to use, from the LangChain hub, this XML agent convo prompt that Harrison has put together. 00:06:18.060 |
And you can see that's defining the way that we want the agent to use those tools. 00:06:26.720 |
We have observations, and then the agent is being instructed to provide a final answer within final answer tags. 00:06:38.320 |
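Pulling that prompt is one line; this is the public hwchase17/xml-agent-convo prompt on the LangChain hub:

```python
from langchain import hub

# conversational XML agent prompt, with slots for the tool list,
# chat history, user input, and agent scratchpad
prompt = hub.pull("hwchase17/xml-agent-convo")
```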
Now, what we also need to do is initialize our Anthropic chat LLM. 00:06:46.540 |
So now we're using the langchain-anthropic package rather than, as I think we were before, 00:06:50.920 |
something like langchain_community, or sorry, it would be langchain_community.chat_models.ChatAnthropic. 00:07:02.620 |
Now we are using the langchain-anthropic library directly, and we need to do that in order to use the Claude 3 models. 00:07:10.180 |
If we use the old method and try and use this, we're going to get an error. 00:07:14.440 |
Something along the lines of the messages format being incorrect. 00:07:36.660 |
So, grabbing an Anthropic API key, I'm just going to paste that key into here. 00:07:41.780 |
One thing you can do here, if you would like faster response times, is use Sonnet rather than Opus. 00:07:54.860 |
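A sketch of that initialization with langchain-anthropic; the temperature setting is an assumption:

```python
import os
from getpass import getpass
from langchain_anthropic import ChatAnthropic

# enter the Anthropic API key when prompted
os.environ["ANTHROPIC_API_KEY"] = getpass("Anthropic API key: ")

# swap in "claude-3-sonnet-20240229" here for faster responses
chat_llm = ChatAnthropic(model="claude-3-opus-20240229", temperature=0)
```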
We have a few intermediate steps that we need in order to basically support the XML format here. 00:08:03.120 |
So we add that, and also convert the tool names into the format, again, that we need for this XML agent. 00:08:13.060 |
And then what we do is we initialize the inputs to our agent here. 00:08:21.700 |
That is how everything is being pulled into the agent pipeline. 00:08:26.500 |
And they are piped into our prompt transformation here, which we defined with the convert tools function. 00:08:39.420 |
The LLM will stop whenever we hit either the tool input or final answer ending tags. 00:08:48.060 |
And then the output from that is piped into our XML agent output parser. 00:08:54.100 |
So that we get, you know, like a nice format at the end there that we can actually work with. 00:09:02.180 |
Then we need to pass our agent flow as an agent into our agent executor, alongside our tools. 00:09:08.540 |
And we can either say verbose is true or false. 00:09:14.460 |
Just so that we can see everything, I'm going to say it's true. 00:09:16.700 |
It will just print out a ton of stuff, essentially. 00:09:22.580 |
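Pieced together, the agent pipeline looks roughly like this sketch, following LangChain's XML agent pattern; the exact input keys and stop tokens are assumptions based on the xml-agent-convo prompt:

```python
from langchain.agents import AgentExecutor
from langchain.agents.output_parsers import XMLAgentOutputParser

def convert_intermediate_steps(intermediate_steps):
    # render each (action, observation) pair in the XML format the prompt expects
    log = ""
    for action, observation in intermediate_steps:
        log += (
            f"<tool>{action.tool}</tool><tool_input>{action.tool_input}"
            f"</tool_input><observation>{observation}</observation>"
        )
    return log

def convert_tools(tools):
    # list each tool as "name: description" for the tools slot in the prompt
    return "\n".join(f"{t.name}: {t.description}" for t in tools)

agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: x.get("chat_history", ""),
        "agent_scratchpad": lambda x: convert_intermediate_steps(
            x["intermediate_steps"]
        ),
    }
    | prompt.partial(tools=convert_tools(tools))
    | chat_llm.bind(stop=["</tool_input>", "</final_answer>"])
    | XMLAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)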
For now, there's going to be no chat history. 00:09:33.180 |
My question is, can you tell me about Llama 2? 00:09:37.620 |
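So, passing an empty chat history for now, something like:

```python
out = agent_executor.invoke({
    "input": "can you tell me about Llama 2?",
    "chat_history": "",  # no previous interactions yet
})
print(out["output"])
```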
The contexts I got back are the same as what we saw before. 00:09:44.020 |
So this blue bit here, they're the returned contexts. 00:09:50.260 |
That actually did take a little bit of time there, 36 seconds. 00:09:56.780 |
So Llama 2 is a large language model developed by Meta AI. 00:10:17.700 |
What we now might want to do, okay, so right now we don't have that chat history, which 00:10:22.340 |
means our agent is stateless and can't refer to previous interactions. 00:10:28.460 |
So what we need to do is add those previous interactions as a way of maintaining the state 00:10:34.020 |
of the conversation over multiple interactions. 00:10:37.260 |
So we're going to be using a conversational memory object, the conversation buffer window memory. 00:10:47.260 |
And at the moment, our conversational memory is, of course, empty because we haven't added anything to it. 00:10:53.860 |
So what we can do is use the add_user_message and add_ai_message methods to add some messages to memory. 00:11:01.820 |
So now, if we print that out, we can see that we have a human message and an AI message in there. 00:11:09.540 |
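A sketch with LangChain's ConversationBufferWindowMemory; the window size k is an assumption:

```python
from langchain.memory import ConversationBufferWindowMemory

# remember only the last k=5 interactions (an assumed window size)
conversational_memory = ConversationBufferWindowMemory(k=5, return_messages=True)

# seed the memory with the first interaction
conversational_memory.chat_memory.add_user_message("can you tell me about Llama 2?")
conversational_memory.chat_memory.add_ai_message(out["output"])

print(conversational_memory.chat_memory.messages)
```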
So let's see how we can feed that into our XML Claude 3 agent. 00:11:17.780 |
So we can't send these messages into the agent directly. 00:11:21.580 |
Instead, we need to pass a string in this format, okay, human and AI. 00:11:27.100 |
So we're going to create a new function here, memory to string, that takes our conversational 00:11:32.340 |
buffer window memory, retrieves those messages, and then formats them into that format that 00:11:40.020 |
we need, with human and AI, and returns a string of that. 00:11:44.600 |
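A minimal sketch of that helper:

```python
def memory_to_str(memory: ConversationBufferWindowMemory) -> str:
    # format the stored messages as "Human: ..." / "AI: ..." lines
    lines = []
    for msg in memory.chat_memory.messages:
        role = "Human" if msg.type == "human" else "AI"
        lines.append(f"{role}: {msg.content}")
    return "\n".join(lines)

print(memory_to_str(conversational_memory))
```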
So let's run this, and let's see what we get when we print that out. 00:11:54.500 |
Now, we're going to have to run this code every time, where we're invoking our agent and 00:11:59.780 |
then adding the new interactions to memory. 00:12:04.920 |
So rather than just repeating that every time, let's wrap that all into a single function 00:12:10.940 |
called chat, and we'll use this to call our agent and also maintain our conversation state. 00:12:18.020 |
So we have our agent executor invoke as we did before, we're passing in our conversational 00:12:23.620 |
memory, and then after that, we are adding the new interactions to our conversational memory. 00:12:39.500 |
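A sketch of that wrapper, using the pieces defined above:

```python
def chat(text: str) -> str:
    # invoke the agent with the current conversation state as a string
    out = agent_executor.invoke({
        "input": text,
        "chat_history": memory_to_str(conversational_memory),
    })
    # record the new interaction so future turns can refer back to it
    conversational_memory.chat_memory.add_user_message(text)
    conversational_memory.chat_memory.add_ai_message(out["output"])
    return out["output"]

chat("was any red teaming done with the model?")
```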
So what I'm doing here: we've already spoken about Llama 2 with the model, and now I'm asking 00:12:45.980 |
a question that doesn't specify that I'm talking about Llama 2. 00:12:51.740 |
I'm saying, was any red teaming done with the model? 00:12:57.960 |
So the model needs to be able to look at those previous interactions in order to understand 00:13:02.180 |
that I'm actually asking about Llama 2, and in order to perform the search with Llama 2 included. 00:13:09.500 |
Because if you just do a search with "red teaming the model", the results are not going to be relevant. 00:13:17.740 |
So fortunately, the model actually handles this pretty nicely, and you can see that the 00:13:22.580 |
input it gives to the tool here is Llama 2 red teaming, so it's looking at those previous 00:13:27.620 |
interactions correctly and pulling in that information. 00:13:32.700 |
And we see in blue the results that we're getting, and we see that after conducting 00:13:37.100 |
red team exercises, we asked participants who had also participated in Llama 2 chat exercises 00:13:43.340 |
to also provide a qualitative assessment of safety capabilities of the model. 00:13:51.620 |
So let's come down here and have a look at the final answer that the agent gave. 00:13:57.580 |
So it said, "Yes, Meta AI conducted red teaming exercises on Llama 2 and Code Llama models. 00:14:03.580 |
They conducted three red teaming exercises with 25 Meta employees, including domain experts 00:14:09.400 |
in responsible AI, malware development, and offensive security engineering." 00:14:13.340 |
And you can see, okay, I think there's more information here. 00:14:17.740 |
I think it all looks pretty good, to be honest. 00:14:19.540 |
So as I look at the summary, "Meta AI did conduct red teaming to proactively identify 00:14:24.620 |
risks in those models, uncovering some potential issues, especially with ambiguous prompts." 00:14:35.780 |
Actually, probably one of the best answers I've seen from running this sort of test. 00:14:40.920 |
So yeah, that is how we would put together a conversational agent that has access to 00:14:47.840 |
external memory using our knowledge base and is also able to refer to past interactions 00:14:56.440 |
in order to formulate responses and have that more sort of conversational nature. 00:15:00.600 |
I mean, it's a pretty simple version, but it works pretty well. 00:15:04.960 |
So yeah, I hope this is useful in just kind of putting together your own Claude 3 Opus agent. 00:15:15.820 |
But for now, that's all I have on Claude 3 agents. 00:15:23.780 |
So thank you very much for watching, and I will see you again in the next one.