New LangChain XML Agents

00:00:00.000 | Today we're going to be taking a look at one of the alternative agent types that you can use in the

00:00:04.960 | lang chain and specifically we'd want to use this agent with anthropic LLMs. So we're going to be

00:00:11.440 | taking a look at this agent it's called the XML agent and we're going to see how to use it with

00:00:16.560 | some simple tools. We're going to be adding in a rag pipeline using pyngon serverless and we're also

00:00:23.680 | going to be using coheres embed v3 embeddings so three relatively new models and services there so

00:00:33.680 | that will be pretty interesting and we'll see what we get. So we're going to start here on the

00:00:38.080 | agent types page on in the lang chain docs so we come down to here and we can see some information

00:00:44.160 | okay so you have like the openai ones here and then we have xml right so this is one we're going

00:00:48.400 | to focus on and it literally says if you're using anthropic models or other models good at xml so

00:00:54.080 | maybe they have a example of xml somewhere which yes would look like this so you can see that the

00:01:02.720 | different format that xml uses is literally like xml with the sort of html like tags you have the

00:01:11.280 | tool name here so that's like the action that you would get within the json that you'd pass

00:01:17.040 | through a react agent you have the tool input that's the action input and this would be the

00:01:24.480 | response that the agent will get right then it must answer like this so it passes the final

00:01:31.600 | answer tags like so it's a little more compact than the react approach and obviously with models

00:01:39.760 | that have been trained to use this it's going to work better it's going to be more reliable which

00:01:44.080 | is important so yeah it's a good thing to use especially if you are using anthropic models

00:01:52.000 | so for the example notebook i'm in the pinecone examples learn generation line chain i've started

00:01:59.280 | a new directory for the v1 stuff because we're using line chain v1 here which again that's

00:02:05.360 | another kind of new thing so we're going to go to xml agents and we're going to go to open in colab

00:02:13.600 | cool so we're here i'm going to connect and yeah we can start going through it so these are the

00:02:19.840 | versions that we're using so we have line chain long chain community line chain hub where we're

00:02:24.480 | going to get the prompt from for this model anthropic as mentioned cohere pinecone client

00:02:29.760 | and hugging face data sets a few libraries here but all pretty lightweight so okay okay cool

00:02:38.400 | uh i think this is probably fine and here we are so oh this is another new thing

00:02:45.280 | so we also have the ai archive 2 data set now if you've been watching a few of my videos you

00:02:52.960 | will have seen me using the ai archive data set this is a new one uh so if i go over here i just

00:03:00.080 | came off the notebook but if you come over here you can kind of see what that looks like

00:03:06.000 | we have a lot more data in there now and that's also growing i literally have the process running

00:03:13.120 | right now pulling in more archive papers and yeah it's a lot cleaner the the text that's been

00:03:22.160 | processed using a the yolo x model and the unstructured library for that that's why it

00:03:29.200 | takes so long to actually it's been processing this for like a week uh but yeah it's a much

00:03:36.000 | better quality which is great so that's exciting and yeah let's uh so let's start here we'll get

00:03:44.160 | this little warning we can ignore it it's not they just want us they basically want us to use the

00:03:49.760 | token but we don't need the token for this data set so sorry hugging face now while we're waiting

00:03:57.200 | for that we can go get our first api key we need a couple so let's go over to cohere so it's

00:04:04.160 | dashboard

00:04:05.680 | dot cohere dot com go over to api keys and you create an api key so just copy your api key and

00:04:16.640 | we're gonna get to where you enter in a moment so we've just downloaded just 20 000 rows here

00:04:22.240 | we don't need tons for this example and obviously it costs more and it's gonna take a while if we

00:04:28.640 | download the full data set as well so i'm just taking the first 20 000 there then we're gonna

00:04:36.400 | take the cohere api key so that's one we just got we're gonna enter that in this little nice

00:04:43.200 | little text box again great and then we can initialize the coherent beddings that we're

00:04:49.600 | going to be using so we have coherent beddings we're using the embed english v3 model so let's

00:04:55.520 | run that okay and then pinecone api key so we go app.pinecone.io and we'll end up here well you'll

00:05:06.080 | probably have like a default project so i already have my xml agents example there so i will i'm

00:05:15.120 | gonna leave it so that i don't need to wait for it to recreate everything but mainly i want to go to

00:05:22.240 | api keys here and i'm going to copy this okay and i'm going to run this out and again i'm going to

00:05:28.960 | enter my api key okay cool so that looks good i am using pinecone serverless here so i would

00:05:37.600 | recommend doing the same it's what one you get i think it's a hundred dollars right now obviously

00:05:44.560 | i don't know when you're watching this but as i recall this you get a hundred dollars

00:05:48.080 | uh free credits there and i know that very soon there will also be a free tier for serverless

00:05:55.760 | and it's just when when you do if you come to paying if you ever do i mean you have a hundred

00:06:04.000 | dollars so probably not for a while anyway uh you it's like nothing it's crazy cheap so anyway uh

00:06:13.040 | yes so this is how we would create an embedding maybe i should have put that further up but fine

00:06:17.840 | so this is using coherent model we're embedding documents or i'm just embedding hello and yeah

00:06:25.280 | you get this dimensional vector out of it that is the dimensionality of the coherent embedding

00:06:29.680 | model the reason i'm showing you that is because we need to use this just here when we're initializing

00:06:35.040 | our index now yeah we pass in our serverless spec if you wanted to use pods you would swap

00:06:41.200 | that for a pod spec and this is the index name you saw it in my dashboard a moment ago

00:06:47.760 | and yeah let's we can run that with the metric interestingly for the coherent models the embedded

00:06:55.680 | v3 models anyway you can actually use you can use euclidean cosine or dot product apparently

00:07:02.880 | they all give the same the same similarity which is it is kind of cool i don't know how exactly

00:07:11.280 | that is possible but that's cool interesting so yeah uh we word okay right now or when you're on

00:07:20.480 | this you should probably see zero for your total vector count it's because i already have the index

00:07:24.960 | and then after that you would just create your index like this the id we can actually actually

00:07:31.120 | do that because we have unique ideas in here now where is it we have this so yeah just a little

00:07:39.440 | quick fix there and yeah you run that uh last time i did it was 11 minutes so it's a little bit of

00:07:46.480 | time not too significant now while that is running for you let's jump over to grabbing our anthropic

00:07:55.120 | api keys so this one's always a little hard to find at least for me i always find it hard to find

00:08:00.640 | so you have to go to console anthropic.com so console.anthropic.com and you have to create an

00:08:11.600 | account if you don't already okay cool so you should get logged in i i'm gonna go to get api

00:08:20.000 | keys and yeah you can go to create key i'm gonna create a new one and we'll copy that okay so

00:08:28.560 | let's continue we're not actually going to be using the anthropic api key for a little while

00:08:34.480 | but i wanted to initialize it quickly now anyway so what we are going to be doing is setting up

00:08:41.040 | our agent or everything that our agent needs which is actually quite a few things you have to think

00:08:46.240 | okay we need our tool which is going to be our search we need our prompts we need some form of

00:08:54.320 | memory because we're going to make a conversational agent here and i think there may be some other

00:09:00.560 | thing oh the lm of course so anthropic and yeah you know there's a few things that we need there

00:09:07.040 | so let's start with our tool so slightly different syntax here to maybe what i've shown in the past

00:09:16.080 | so using now using the tool decorator tool decorator when we use it we need to make sure

00:09:20.720 | we pass a description here this description is going to be how the lm decides whether to use

00:09:27.760 | this tool or another tool or no tool so we do need something good here something descriptive

00:09:33.440 | but concise within our tool so we're going to pass a string query uh we're going to embed that

00:09:39.520 | using cohere we're going to search using pinecone making sure that we return our metadata because

00:09:46.480 | that will contain our actual plain text and then we return a single string containing all of our

00:09:54.080 | responses okay so yeah let's run that we pack that into this tools list and then what we

00:10:03.600 | need is a few different formats for this tools list and yeah so we have that and then to so when

00:10:12.080 | our agent is actually using the tool it's going to use it like this so it's going to run the tool

00:10:17.360 | and then it's going to input a query and let's say our query is can you tell me about llama 2

00:10:24.560 | okay so we're going to be asking those questions again let's see what we get so we we get a good

00:10:29.280 | response okay so this is the the output from our tool that our agent may see depending on the

00:10:38.160 | question that it that it asks so we now can go and define our xml agent so we come down to here

00:10:47.120 | you know i'm describing a little bit what i already described about you know how the xml thing

00:10:52.640 | works and here we go so we want to download a prompt so this is a xml agent conversational

00:11:04.000 | prompt and you can see here it's like okay you are a helpful assistant and then it tells it

00:11:08.640 | about the different tags that it should use the xml tags so on and so on okay so it's the it's

00:11:14.320 | what i showed you before and you can also see here that it allows a few inputs so the agent scratch

00:11:19.680 | pad it's like it's internal thoughts the input so our query and some tools okay another one that we

00:11:28.560 | may use is the chat history now which would end up somewhere around somewhere here chat history

00:11:35.680 | gets inside there so we'll need to add that as well now we get to our anthropic chat lm so we

00:11:43.440 | initialize this and we want to enter our api key that we copied from before okay so we now have

00:11:51.920 | we have our tools we have our lm we have our prompt there's a few more steps that we need

00:11:58.240 | so one thing that we need is a way of converting our intermediate steps into text in the correct

00:12:06.080 | format so this is what we get all right so this goes into the scratch pad i.e the internal thoughts

00:12:13.600 | of the model so it's basically going to take okay the the tool that was decided to be used

00:12:21.280 | uh the input to that tool and what it got from that tool okay so this is coming from in the

00:12:27.120 | intermediate steps so formats that into a nice string format for the model of the agent we have

00:12:34.480 | another one here so this is when it is so for the initial prompt how it will decide to use different

00:12:42.800 | tools so we have tool name and that maps to a particular tool description so we also need that

00:12:48.240 | so we also need that format and then with that i think we have pretty much everything uh yeah so

00:12:55.840 | yeah you can see so this is like the agent logic itself so the input that is going into the agent

00:13:04.000 | then we can see the tool descriptions that are being passed into there and then we have this

00:13:12.560 | so this is telling our lm when it sees tool input tell it the ending tool input tag or the ending

00:13:19.440 | final answer tag it should stop and we should use the xml output agent output parser okay which is

00:13:26.240 | just going to pass whatever the agent is generated into something that is usable okay so yeah we we

00:13:33.680 | have that one thing i should know is that you could technically remove this but later on what

00:13:41.200 | you will see is that the agent when deciding what tools to use and what information passes to those

00:13:46.480 | tools will have no context of what's happened before so it's not a very good conversational

00:13:51.360 | agent so you basically you do want to have that in there otherwise you're gonna run into issues

00:13:58.080 | okay so that's our agent logic definition and now we need to define our agent executor there's a few

00:14:06.720 | steps to this i know we're there now so we define our agent executor we pass in the agent logic that

00:14:12.720 | we just uh defined we pass in the tools and we're going to set verbose to true so that we can see

00:14:18.640 | what's happening when we're running everything and now what we do is we invoke the agent executor

00:14:25.760 | we pass out input and chat history we don't have any chat history right now we'll handle that soon

00:14:32.160 | and yeah let's see what we get so we're just passing this input in can you tell me about

00:14:36.800 | llama 2 and we'll see what happens okay so we can see that uses the archive search tool

00:14:41.680 | the input is llama 2 it's a little bit weird because we're dropping that end token but that's

00:14:47.840 | fine and then you can see it this blue text here is what is returned from the tool okay so it's

00:14:52.960 | the observation then it decides okay i'm going to use the final answer now i'm going to generate

00:14:59.520 | well generate a final answer and this is the final answer that it generates okay this is what gets

00:15:06.080 | returned to us and we can see here this is the output right based on the information provided

00:15:10.640 | so on and so on and okay uh so it does get get the answer okay it's not a hard one to get an answer

00:15:18.160 | for so that's good now we would like to add some conversational memory here okay we didn't we don't

00:15:24.400 | have any right now we just have chat history it's empty so let's do that we will use a conversational

00:15:32.160 | buffer window memory it's like the super basic one there are obviously many other ways of

00:15:38.880 | implementing memory as well but this one's just nice and easy and what we will do to start with

00:15:45.920 | is create some chat history okay so actually why am i so i want to use a message into here

00:15:52.880 | and again we're going to start with no chat history okay so we just get fine lines straight

00:15:57.600 | away it doesn't need to use the tool here now we need to extract what we have here and create some

00:16:05.120 | chat history with it because right now we haven't connected that conversational memory to our agent

00:16:12.240 | whatsoever and we don't connect it directly we instead what we will do is we'll use these

00:16:17.600 | methods add user message and add ai message to manually add everything okay but we're going to

00:16:24.480 | wrap it up into a nice little function soon so after we do that we'll see that our conversational

00:16:30.320 | memory now does have some history in there so we have this okay that's great but what we actually

00:16:37.520 | need for our xml agent is conversational memory it looks like this so we need a string in this format

00:16:44.880 | and you know we're not far off with this it's it's not exactly hard to pass so let's create a

00:16:51.920 | helper function to help us do that so our memory input into this is going to be the conversation

00:16:57.520 | buffer window memory object we extract the messages so basically what you see here we're going to

00:17:04.240 | extract those and we're going to create a list with human and ai depending on whether it's a human

00:17:10.960 | message or not which would be an ai message and then we're just going to join those together to

00:17:16.720 | create a single string in the format that we need okay and now if we print that we see that we get

00:17:21.760 | the format that we need cool so let's wrap all that into yet another helper function called chat

00:17:28.960 | and this is going to help us deal with the state of our agent or keeping maintaining state in our

00:17:34.640 | agent so run that and let's continue the conversation now we're going to say can you tell

00:17:40.880 | me about llama 2 okay cool so we can see the typical stuff here and it outputs this which is

00:17:50.880 | it's actually a pretty nice summary so then we want to continue and i'm going to say was any

00:17:58.960 | red teaming done and the reason so the reason i'm asking this is this is a hard question at

00:18:04.800 | least it has been in the past with the old data set so we should hopefully get something better

00:18:10.000 | now because it's cleaner and we're also using these you know these different agents so let's

00:18:14.960 | see what we get now one thing that you will notice here is i'm not saying llama 2 so this the context

00:18:23.200 | to this question relies on our conversational memory and you can see that it works right so

00:18:29.120 | we decide to use the archive search tool and the the look at the query it's llama 2 red team i

00:18:35.360 | didn't mention llama 2 here i mentioned llama 2 in the previous interaction so it's looking at the

00:18:40.000 | previous conversational memory and pulling that in to the query and that's good because we actually

00:18:47.280 | get some relevant context here so you can see okay risk score output by llama 2 safety reward

00:18:53.040 | model on prompts so on and so on okay cool and we can see we come down to here it says yes red

00:19:00.480 | teaming was done on llama 2 models to evaluate risk from evaluating generating malicious code

00:19:04.880 | so on and so on which is this is the best response i've had on this question so far by by a long

00:19:15.280 | shot usually it's pretty bad and maybe that's a data set issue but it's also we're using a good

00:19:20.560 | agent here so that is it for this little tutorial on html agents with lang chain v1 we've gone

00:19:30.560 | through a few things using a few different models which has been interesting and cool so one we use

00:19:37.920 | pinecone serverless which obviously kind of new and interesting we also use anthropic for the llm

00:19:45.360 | and we also use coherent bennings and all those together made something that works pretty well

00:19:51.520 | in my opinion so yeah that's it for this video i hope this has been useful and interesting but for

00:19:59.280 | now i'll leave it there so thank you very much for watching and i will see you again in the next one

00:20:05.600 | bye

New LangChain XML Agents

Chapters