New LangChain XML Agents
Chapters
0:00 LangChain v1 XML Agents
0:36 LangChain Agent Types
2:13 LangChain Python Setup
2:43 AI ArXiv 2 Dataset
4:34 Building Index with Cohere and Pinecone
8:37 Building a LangChain XML Agent
15:19 Giving our Agent Conversation Memory
19:21 XML Agents Conclusion
00:00:00.000 |
Today we're going to be taking a look at one of the alternative agent types that you can use in 00:00:04.960 |
LangChain, and specifically we'd want to use this agent with Anthropic LLMs. So we're going to be 00:00:11.440 |
taking a look at this agent, it's called the XML agent, and we're going to see how to use it with 00:00:16.560 |
some simple tools. We're going to be adding in a RAG pipeline using Pinecone serverless, and we're also 00:00:23.680 |
going to be using Cohere's Embed v3 embeddings, so three relatively new models and services there, so 00:00:33.680 |
that will be pretty interesting, and we'll see what we get. So we're going to start here on the 00:00:38.080 |
agent types page in the LangChain docs. We come down to here and we can see some information. 00:00:44.160 |
Okay, so you have the OpenAI ones here, and then we have XML, right? So this is the one we're going 00:00:48.400 |
to focus on, and it literally says "if you're using Anthropic models, or other models good at XML". So 00:00:54.080 |
maybe they have an example of XML somewhere, which, yes, would look like this. So you can see that the 00:01:02.720 |
different format that XML uses is literally XML, with the sort of HTML-like tags. You have the 00:01:11.280 |
tool name here, so that's like the action that you would get within the JSON that you'd pass 00:01:17.040 |
through a ReAct agent. You have the tool input, that's the action input, and this would be the 00:01:24.480 |
response that the agent will get, right? Then it must answer like this, so it uses the final 00:01:31.600 |
answer tags, like so. It's a little more compact than the ReAct approach, and obviously with models 00:01:39.760 |
that have been trained to use this it's going to work better, it's going to be more reliable, which 00:01:44.080 |
is important. So yeah, it's a good thing to use, especially if you are using Anthropic models. 00:01:52.000 |
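As a rough illustration of the mapping described above (ReAct's action and action input becoming the XML agent's tool and tool input), here is a minimal sketch in plain Python. The tool name `arxiv_search`, the query, and the placeholder observation text are assumptions chosen for illustration; the tag names are the ones read out from the docs page.

```python
import json

# Hypothetical tool name and query, used only for illustration.
tool_name = "arxiv_search"
tool_input = "llama 2"

# ReAct-style step: a JSON blob with "action" and "action_input" keys.
react_step = json.dumps({"action": tool_name, "action_input": tool_input})

# XML-style step: the same two fields become <tool> and <tool_input> tags.
xml_step = f"<tool>{tool_name}</tool><tool_input>{tool_input}</tool_input>"

# The tool result is handed back to the agent inside <observation> tags,
# and the agent ends its turn by wrapping its reply in <final_answer> tags.
observation = "<observation>...tool output goes here...</observation>"
final_answer = "<final_answer>...the agent's reply goes here...</final_answer>"

print(react_step)
print(xml_step)
print(observation)
print(final_answer)
```

The point is only that both formats carry the same two fields; which one works better depends on what the model was trained to produce.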
So for the example notebook, I'm in the Pinecone examples, under learn, generation, LangChain. I've started 00:01:59.280 |
a new directory for the v1 stuff, because we're using LangChain v1 here, which again, that's 00:02:05.360 |
another kind of new thing. So we're going to go to XML agents and we're going to go to open in Colab. 00:02:13.600 |
Cool, so we're here. I'm going to connect and yeah, we can start going through it. So these are the 00:02:19.840 |
versions that we're using, so we have LangChain, LangChain Community, LangChain Hub, where we're 00:02:24.480 |
going to get the prompt from for this model, Anthropic as mentioned, Cohere, Pinecone client, 00:02:29.760 |
and Hugging Face Datasets. A few libraries here, but all pretty lightweight. So okay, cool. 00:02:38.400 |
I think this is probably fine, and here we are. So, oh, this is another new thing. 00:02:45.280 |
We also have the AI ArXiv 2 dataset. Now, if you've been watching a few of my videos, you 00:02:52.960 |
will have seen me using the AI ArXiv dataset. This is a new one. So if I go over here, I just 00:03:00.080 |
came off the notebook, but if you come over here you can kind of see what that looks like. 00:03:06.000 |
We have a lot more data in there now and that's also growing. I literally have the process running 00:03:13.120 |
right now pulling in more arXiv papers. And yeah, it's a lot cleaner. The text has been 00:03:22.160 |
processed using the YOLOX model and the Unstructured library, and that's why it 00:03:29.200 |
takes so long. It's actually been processing this for like a week. But yeah, it's much 00:03:36.000 |
better quality, which is great, so that's exciting. So let's start here. We'll get 00:03:44.160 |
this little warning, we can ignore it. They basically want us to use the 00:03:49.760 |
token, but we don't need the token for this dataset, so sorry, Hugging Face. Now, while we're waiting 00:03:57.200 |
for that, we can go get our first API key, we need a couple. So let's go over to Cohere, so it's 00:04:05.680 |
on cohere.com. Go over to API keys and create an API key, then just copy your API key, and 00:04:16.640 |
we're going to get to where you enter it in a moment. So we've downloaded just 20,000 rows here. 00:04:22.240 |
We don't need tons for this example, and obviously it costs more and it's going to take a while if we 00:04:28.640 |
download the full dataset as well, so I'm just taking the first 20,000 there. Then we're going to 00:04:36.400 |
take the Cohere API key, so that's the one we just got. We're going to enter that in this nice 00:04:43.200 |
little text box. Great, and then we can initialize the Cohere embeddings that we're 00:04:49.600 |
going to be using. So we have Cohere embeddings, we're using the Embed English v3 model, so let's 00:04:55.520 |
run that. Okay, and then the Pinecone API key. So we go to app.pinecone.io and we'll end up here. Well, you'll 00:05:06.080 |
probably have a default project. I already have my XML agents example there, so I'm 00:05:15.120 |
going to leave it so that I don't need to wait for it to recreate everything, but mainly I want to go to 00:05:22.240 |
API keys here, and I'm going to copy this. Okay, and I'm going to run this, and again I'm going to 00:05:28.960 |
enter my API key. Okay, cool, so that looks good. I am using Pinecone serverless here, so I would 00:05:37.600 |
recommend doing the same. For one, you get, I think it's a hundred dollars right now. Obviously 00:05:44.560 |
I don't know when you're watching this, but as I record this you get a hundred dollars 00:05:48.080 |
of free credits there, and I know that very soon there will also be a free tier for serverless. 00:05:55.760 |
And if you come to paying, if you ever do, I mean you have a hundred 00:06:04.000 |
dollars so probably not for a while anyway, it's like nothing, it's crazy cheap. So anyway, 00:06:13.040 |
yes, so this is how we would create an embedding. Maybe I should have put that further up, but fine. 00:06:17.840 |
So this is using the Cohere model. We're embedding documents, or I'm just embedding "hello", and yeah, 00:06:25.280 |
you get this vector out of it, and its length is the dimensionality of the Cohere embedding 00:06:29.680 |
model. The reason I'm showing you that is because we need to use this just here when we're initializing 00:06:35.040 |
our index. Now, we pass in our serverless spec. If you wanted to use pods, you would swap 00:06:41.200 |
that for a pod spec. And this is the index name, you saw it in my dashboard a moment ago. 00:06:47.760 |
And yeah, we can run that with the metric. Interestingly, for the Cohere models, the Embed 00:06:55.680 |
v3 models anyway, you can use Euclidean, cosine, or dot product. Apparently 00:07:02.880 |
they all give the same similarity, which is kind of cool. I don't know how exactly 00:07:11.280 |
that is possible, but that's cool, interesting. So yeah, right now, or when you run 00:07:20.480 |
this, you should probably see zero for your total vector count. Mine isn't zero because I already have the index. 00:07:24.960 |
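On the earlier remark that Euclidean, cosine, and dot product all behave the same: one possible explanation, assuming the embeddings are normalised to unit length (an assumption here, not something stated in the video), is that for unit vectors cosine similarity equals the dot product, and squared Euclidean distance equals 2 - 2 * dot. The three metrics then produce the same ranking. A small sketch with made-up toy vectors:

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors, invented for illustration and normalised to unit length.
query = normalize([1.0, 2.0, 3.0])
docs = {
    "a": normalize([1.0, 2.0, 2.5]),
    "b": normalize([3.0, 1.0, 0.5]),
    "c": normalize([-1.0, 0.5, 2.0]),
}

# Higher is better for dot and cosine; lower is better for Euclidean distance.
by_dot = sorted(docs, key=lambda k: dot(query, docs[k]), reverse=True)
by_cos = sorted(docs, key=lambda k: cosine(query, docs[k]), reverse=True)
by_euc = sorted(docs, key=lambda k: euclidean(query, docs[k]))

print(by_dot, by_cos, by_euc)
```

The raw scores differ between metrics, but the order of the results does not, which is what matters for retrieval.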
And then after that you would just create your index like this. The ID, we can actually 00:07:31.120 |
do that because we have unique IDs in here now. Where is it? We have this. So yeah, just a little 00:07:39.440 |
quick fix there, and yeah, you run that. Last time I did it, it was 11 minutes, so it's a little bit of 00:07:46.480 |
time, not too significant. Now, while that is running for you, let's jump over to grabbing our Anthropic 00:07:55.120 |
API keys. This one's always a little hard to find, at least for me. 00:08:00.640 |
So you have to go to console.anthropic.com, and you have to create an 00:08:11.600 |
account if you don't already have one. Okay, cool, so you should get logged in. I'm going to go to get API 00:08:20.000 |
keys and yeah, you can go to create key. I'm going to create a new one and we'll copy that. Okay, so 00:08:28.560 |
let's continue. We're not actually going to be using the Anthropic API key for a little while, 00:08:34.480 |
but I wanted to initialize it quickly now anyway. So what we are going to be doing is setting up 00:08:41.040 |
our agent, or everything that our agent needs, which is actually quite a few things. You have to think: 00:08:46.240 |
okay, we need our tool, which is going to be our search, we need our prompt, we need some form of 00:08:54.320 |
memory because we're going to make a conversational agent here, and I think there may be some other 00:09:00.560 |
thing. Oh, the LLM of course, so Anthropic. And yeah, there's a few things that we need there. 00:09:07.040 |
So let's start with our tool. Slightly different syntax here to maybe what I've shown in the past, 00:09:16.080 |
so now using the tool decorator. When we use the tool decorator we need to make sure 00:09:20.720 |
we pass a description here. This description is going to be how the LLM decides whether to use 00:09:27.760 |
this tool or another tool or no tool, so we do need something good here, something descriptive 00:09:33.440 |
but concise. Within our tool, we're going to pass a string query, we're going to embed that 00:09:39.520 |
using Cohere, we're going to search using Pinecone, making sure that we return our metadata because 00:09:46.480 |
that will contain our actual plain text, and then we return a single string containing all of our 00:09:54.080 |
responses. Okay, so yeah, let's run that. We pack that into this tools list, and then what we 00:10:03.600 |
need is a few different formats for this tools list, and yeah, so we have that. And then, when 00:10:12.080 |
our agent is actually using the tool, it's going to use it like this. So it's going to run the tool 00:10:17.360 |
and then it's going to input a query, and let's say our query is "can you tell me about Llama 2". 00:10:24.560 |
Okay, so we're going to be asking those questions again. Let's see what we get. So we get a good 00:10:29.280 |
response. Okay, so this is the output from our tool that our agent may see, depending on the 00:10:38.160 |
question that it asks. So we can now go and define our XML agent, so we come down to here. 00:10:47.120 |
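The flow of the search tool described above (embed the query, search the index, pull the plain text out of the metadata, return one joined string) can be sketched without any external services. This is not the notebook's code: `toy_embed`, `toy_query`, the keyword list, the records, and the `"\n---\n"` separator are all hypothetical stand-ins for the Cohere embedding call and the Pinecone query, which need API keys to run.

```python
from typing import Callable, Dict, List

def make_search_tool(
    embed: Callable[[str], List[float]],
    query_index: Callable[[List[float], int], List[Dict]],
    top_k: int = 2,
) -> Callable[[str], str]:
    """Build a search function from an embedder and an index query function."""
    def arxiv_search(query: str) -> str:
        """Use this tool when answering questions about AI or arXiv papers."""
        vector = embed(query)                             # 1. embed the query
        matches = query_index(vector, top_k)              # 2. search the index
        texts = [m["metadata"]["text"] for m in matches]  # 3. plain text lives in metadata
        return "\n---\n".join(texts)                      # 4. one string for the agent
    return arxiv_search

# Stand-ins so the sketch runs offline. Invented for illustration only.
KEYWORDS = ["llama", "red", "bert"]
RECORDS = [
    "llama 2 is a family of chat models",
    "red teaming was used to probe llama 2",
    "bert is an encoder model",
]

def toy_embed(text: str) -> List[float]:
    words = text.lower().split()
    return [float(words.count(k)) for k in KEYWORDS]

def toy_query(vector: List[float], top_k: int) -> List[Dict]:
    scored = [
        {"score": sum(a * b for a, b in zip(vector, toy_embed(t))), "metadata": {"text": t}}
        for t in RECORDS
    ]
    scored.sort(key=lambda m: m["score"], reverse=True)
    return scored[:top_k]

search = make_search_tool(toy_embed, toy_query)
result = search("can you tell me about llama 2")
print(result)
```

In the real notebook the function's docstring or description is what the LLM reads when deciding whether to call the tool, which is why it needs to be descriptive but concise.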
You know, I'm describing a little bit what I already described about how the XML thing 00:10:52.640 |
works, and here we go. So we want to download a prompt. This is an XML agent conversational 00:11:04.000 |
prompt, and you can see here it's like, okay, "you are a helpful assistant", and then it tells it 00:11:08.640 |
about the different tags that it should use, the XML tags, so on and so on. Okay, so it's 00:11:14.320 |
what I showed you before. And you can also see here that it allows a few inputs, so the agent scratchpad, 00:11:19.680 |
which is like its internal thoughts, the input, so our query, and some tools. Okay, another one that we 00:11:28.560 |
may use is the chat history, which would end up somewhere around here. Chat history 00:11:35.680 |
gets inserted there, so we'll need to add that as well. Now we get to our Anthropic chat LLM. We 00:11:43.440 |
initialize this and we want to enter our API key that we copied from before. Okay, so we now have, 00:11:51.920 |
we have our tools, we have our LLM, we have our prompt. There's a few more steps that we need. 00:11:58.240 |
So one thing that we need is a way of converting our intermediate steps into text in the correct 00:12:06.080 |
format, so this is what we get. All right, so this goes into the scratchpad, i.e. the internal thoughts 00:12:13.600 |
of the model. So it's basically going to take, okay, the tool that was decided to be used, 00:12:21.280 |
the input to that tool, and what it got from that tool. Okay, so this is coming from the 00:12:27.120 |
intermediate steps, and it formats that into a nice string format for the model of the agent. We have 00:12:34.480 |
another one here. This is for the initial prompt, how it will decide to use different 00:12:42.800 |
tools. So we have the tool name, and that maps to a particular tool description. 00:12:48.240 |
So we also need that format, and then with that I think we have pretty much everything. 00:12:55.840 |
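The two formatting helpers described above can be sketched as follows. The `Action` and `Tool` dataclasses are hypothetical stand-ins for LangChain's own objects, used so the example runs without the library; the helper names `convert_intermediate_steps` and `convert_tools` are also assumptions, since the transcript does not name them.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Action:
    """Stand-in for an agent action: which tool was chosen, with what input."""
    tool: str
    tool_input: str

@dataclass
class Tool:
    """Stand-in for a tool: just the fields the prompt needs."""
    name: str
    description: str

def convert_intermediate_steps(steps: List[Tuple[Action, str]]) -> str:
    """Render (action, observation) pairs as XML text for the agent scratchpad."""
    log = ""
    for action, observation in steps:
        log += (
            f"<tool>{action.tool}</tool>"
            f"<tool_input>{action.tool_input}</tool_input>"
            f"<observation>{observation}</observation>"
        )
    return log

def convert_tools(tools: List[Tool]) -> str:
    """Render tools as 'name: description' lines for the initial prompt."""
    return "\n".join(f"{tool.name}: {tool.description}" for tool in tools)

steps = [(Action("arxiv_search", "llama 2"), "...retrieved text...")]
tools = [Tool("arxiv_search", "Search arXiv papers about AI.")]

scratchpad = convert_intermediate_steps(steps)
tool_text = convert_tools(tools)
print(scratchpad)
print(tool_text)
```

The first string fills the scratchpad input of the prompt on every loop iteration, and the second fills the tools input once at the start.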
Yeah, you can see, so this is the agent logic itself. So the input that is going into the agent, 00:13:04.000 |
then we can see the tool descriptions that are being passed into there, and then we have this. 00:13:12.560 |
So this is telling our LLM that when it sees the ending tool input tag or the ending 00:13:19.440 |
final answer tag, it should stop, and we should use the XML agent output parser. Okay, which is 00:13:26.240 |
just going to parse whatever the agent has generated into something that is usable. Okay, so yeah, we 00:13:33.680 |
have that. One thing I should note is that you could technically remove this, but later on what 00:13:41.200 |
you will see is that the agent, when deciding what tools to use and what information to pass to those 00:13:46.480 |
tools, will have no context of what's happened before, so it's not a very good conversational 00:13:51.360 |
agent. So basically you do want to have that in there, otherwise you're going to run into issues. 00:13:58.080 |
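The stop-and-parse step in the agent logic above can be illustrated with a small parser. This is a sketch, not LangChain's output parser: `parse_agent_output` and the dict shapes it returns are hypothetical. It assumes generation stops at the closing tool input tag or the closing final answer tag, so the closing tag itself may be missing from the text.

```python
def parse_agent_output(text: str) -> dict:
    """Turn raw XML-agent output into a tool call or a final answer."""
    if "</tool>" in text:
        tool_part, rest = text.split("</tool>", 1)
        tool = tool_part.split("<tool>", 1)[1].strip()
        tool_input = ""
        if "<tool_input>" in rest:
            tool_input = rest.split("<tool_input>", 1)[1]
            # The stop sequence usually removes this tag, so only strip it if present.
            tool_input = tool_input.split("</tool_input>", 1)[0].strip()
        return {"type": "action", "tool": tool, "tool_input": tool_input}
    if "<final_answer>" in text:
        answer = text.split("<final_answer>", 1)[1]
        answer = answer.split("</final_answer>", 1)[0].strip()
        return {"type": "finish", "output": answer}
    raise ValueError(f"Could not parse agent output: {text!r}")

# Output cut off by the stop sequence: no closing </tool_input> tag.
step = parse_agent_output("<tool>arxiv_search</tool><tool_input>llama 2")
done = parse_agent_output("<final_answer>Llama 2 is a family of LLMs.")
print(step)
print(done)
```

Stopping at the closing tags keeps the model from inventing its own observation after a tool call; the executor runs the real tool and supplies the observation instead.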
Okay, so that's our agent logic definition, and now we need to define our agent executor. There's a few 00:14:06.720 |
steps to this, I know, but we're there now. So we define our agent executor, we pass in the agent logic that 00:14:12.720 |
we just defined, we pass in the tools, and we're going to set verbose to true so that we can see 00:14:18.640 |
what's happening when we're running everything. And now what we do is we invoke the agent executor, 00:14:25.760 |
we pass our input and chat history. We don't have any chat history right now, we'll handle that soon. 00:14:32.160 |
And yeah, let's see what we get. So we're just passing this input in, "can you tell me about 00:14:36.800 |
Llama 2", and we'll see what happens. Okay, so we can see that it uses the arXiv search tool, 00:14:41.680 |
the input is "Llama 2". It's a little bit weird because we're dropping that end token, but that's 00:14:47.840 |
fine. And then you can see this blue text here is what is returned from the tool. Okay, so it's 00:14:52.960 |
the observation. Then it decides, okay, I'm going to use the final answer now, I'm going to 00:14:59.520 |
generate a final answer, and this is the final answer that it generates. Okay, this is what gets 00:15:06.080 |
returned to us, and we can see here this is the output, right, "based on the information provided", 00:15:10.640 |
so on and so on. And okay, so it does get the answer. Okay, it's not a hard one to get an answer 00:15:18.160 |
for, so that's good. Now we would like to add some conversational memory here. Okay, we don't 00:15:24.400 |
have any right now, we just have chat history and it's empty. So let's do that. We will use a conversation 00:15:32.160 |
buffer window memory, it's like the super basic one. There are obviously many other ways of 00:15:38.880 |
implementing memory as well, but this one's just nice and easy. And what we will do to start with 00:15:45.920 |
is create some chat history. Okay, so I want to send a message in here, 00:15:52.880 |
and again we're going to start with no chat history. Okay, so we just get the final answer straight 00:15:57.600 |
away, it doesn't need to use the tool here. Now we need to extract what we have here and create some 00:16:05.120 |
chat history with it, because right now we haven't connected that conversational memory to our agent 00:16:12.240 |
whatsoever. And we don't connect it directly. Instead, what we will do is use these 00:16:17.600 |
methods, add_user_message and add_ai_message, to manually add everything. Okay, but we're going to 00:16:24.480 |
wrap it up into a nice little function soon. So after we do that, we'll see that our conversational 00:16:30.320 |
memory now does have some history in there, so we have this. Okay, that's great, but what we actually 00:16:37.520 |
need for our XML agent is conversational memory that looks like this, so we need a string in this format. 00:16:44.880 |
And you know, we're not far off with this, it's not exactly hard to parse. So let's create a 00:16:51.920 |
helper function to help us do that. So our memory input into this is going to be the conversation 00:16:57.520 |
buffer window memory object. We extract the messages, so basically what you see here, we're going to 00:17:04.240 |
extract those and we're going to create a list with "Human" and "AI" depending on whether it's a human 00:17:10.960 |
message or not, in which case it would be an AI message, and then we're just going to join those together to 00:17:16.720 |
create a single string in the format that we need. Okay, and now if we print that, we see that we get 00:17:21.760 |
the format that we need. Cool, so let's wrap all that into yet another helper function called chat. 00:17:28.960 |
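The memory-to-string helper and the chat wrapper can be sketched in plain Python. Everything here is a stand-in: `Message` and `WindowMemory` imitate the conversation buffer window memory only loosely, `memory2str` is a hypothetical name for the helper, and `fake_agent` replaces the real agent executor so the example runs offline. The "Human:" and "AI:" prefixes are assumed to be the string format the conversational prompt expects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Message:
    role: str      # "human" or "ai"
    content: str

@dataclass
class WindowMemory:
    """Keeps the full history but only exposes the last k exchanges."""
    k: int = 5
    messages: List[Message] = field(default_factory=list)

    def add_user_message(self, text: str) -> None:
        self.messages.append(Message("human", text))

    def add_ai_message(self, text: str) -> None:
        self.messages.append(Message("ai", text))

    def window(self) -> List[Message]:
        return self.messages[-2 * self.k:] if self.k > 0 else []

def memory2str(memory: WindowMemory) -> str:
    """Flatten the message window into 'Human: ...' / 'AI: ...' lines."""
    lines = [
        f"Human: {m.content}" if m.role == "human" else f"AI: {m.content}"
        for m in memory.window()
    ]
    return "\n".join(lines)

def chat(text: str, memory: WindowMemory, agent: Callable[[Dict], Dict]) -> str:
    """Call the agent with the current history, then record the new exchange."""
    out = agent({"input": text, "chat_history": memory2str(memory)})
    memory.add_user_message(text)
    memory.add_ai_message(out["output"])
    return out["output"]

def fake_agent(inputs: Dict) -> Dict:
    """Stand-in for the agent executor: reports how much history it saw."""
    n_lines = len(inputs["chat_history"].splitlines())
    return {"output": f"(reply to '{inputs['input']}', saw {n_lines} history lines)"}

memory = WindowMemory(k=5)
reply1 = chat("can you tell me about llama 2", memory, fake_agent)
reply2 = chat("was any red teaming done", memory, fake_agent)
print(memory2str(memory))
```

The second call receives the first exchange as history, which is what lets a real agent resolve a follow-up question that never mentions the subject by name.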
This is going to help us deal with the state of our agent, or maintaining state in our 00:17:34.640 |
agent. So run that and let's continue the conversation. Now we're going to say "can you tell 00:17:40.880 |
me about Llama 2". Okay, cool, so we can see the typical stuff here and it outputs this, which 00:17:50.880 |
is actually a pretty nice summary. So then we want to continue, and I'm going to say "was any 00:17:58.960 |
red teaming done", and the reason I'm asking this is this is a hard question, at 00:18:04.800 |
least it has been in the past with the old dataset. So we should hopefully get something better 00:18:10.000 |
now because it's cleaner, and we're also using these different agents. So let's 00:18:14.960 |
see what we get. Now, one thing that you will notice here is I'm not saying Llama 2, so the context 00:18:23.200 |
to this question relies on our conversational memory, and you can see that it works, right? So 00:18:29.120 |
we decide to use the arXiv search tool, and look at the query, it's "Llama 2 red team". I 00:18:35.360 |
didn't mention Llama 2 here, I mentioned Llama 2 in the previous interaction, so it's looking at the 00:18:40.000 |
previous conversational memory and pulling that into the query. And that's good, because we actually 00:18:47.280 |
get some relevant context here. So you can see, okay, "risk score output by Llama 2 safety reward 00:18:53.040 |
model on prompts", so on and so on. Okay, cool, and we can see we come down to here, it says "yes, red 00:19:00.480 |
teaming was done on Llama 2 models to evaluate risk from generating malicious code", 00:19:04.880 |
so on and so on, which is the best response I've had on this question so far, by a long 00:19:15.280 |
shot. Usually it's pretty bad, and maybe that's a dataset issue, but it's also that we're using a good 00:19:20.560 |
agent here. So that is it for this little tutorial on XML agents with LangChain v1. We've gone 00:19:30.560 |
through a few things using a few different models, which has been interesting and cool. So one, we used 00:19:37.920 |
Pinecone serverless, which is obviously kind of new and interesting, we also used Anthropic for the LLM, 00:19:45.360 |
and we also used Cohere embeddings, and all those together made something that works pretty well 00:19:51.520 |
in my opinion. So yeah, that's it for this video. I hope this has been useful and interesting, but for 00:19:59.280 |
now I'll leave it there. So thank you very much for watching, and I will see you again in the next one.