Local LangGraph Agents with Llama 3.1 + Ollama
Chapters
0:00 Local Agents with LangGraph and Ollama
1:00 Setting up Ollama and Python
5:35 Reddit API Tool
12:40 Overview of the Graph
17:11 Final Answer Tool
18:33 Agent State
19:09 Ollama Llama 3.1 Setup
26:21 Organizing Agent Tool Use
35:21 Creating Agent Nodes
39:14 Building the Agent Graph
43:10 Testing the Llama 3.1 Agent
46:07 Final Notes on Local Agents
Today we are going to take a look at how we can build our own LangGraph agent with Ollama. LangGraph, if you don't know it, is an open source library from LangChain and my preferred way of building agents, whether they run on OpenAI or locally, wherever. Ollama is another open source project that allows us to run LLMs locally. We're going to be running Llama 3.1, the 8 billion parameter model, which is tiny, and yet I get reasonable responses and reasonable actions coming from my agent, which is actually pretty cool. So with that, let's jump straight into it.
We're going to be running everything locally here, which is slightly different to what I usually do. We usually go through a Colab notebook, and you probably could run this in a Colab notebook, but I want to show you how to run it locally on a Mac, because generally speaking Ollama works well on Mac thanks to the unified memory, which means we can run fairly large models quite easily and actually really fast as well.

The first thing we want to do is go to the Ollama website and install Ollama. Rather than remembering the URL, you can just search for "install ollama", download the macOS version, and install it. I already have mine installed; you can see the little icon up here. Once you have it downloaded, you want to run this command in your terminal: ollama pull llama3.1:8b. Note that 3.1 is the version I'm using, the 8B model. I'm going to paste that in, and it downloads the model, which will take a little while. In my case it just updated the model, since I already had it installed.
Then we're going to want to set up a Python environment. You don't have to do this, it's up to you, but I would recommend it. The reason I say that is that we're going to be working from this repo; there will be a link to it in the comments below. First, git clone it. That will download the entire repo for you, and there is quite a lot in there to be fair, but we only actually need this one part, so you want to navigate into this directory. If I show you the full path from the root of the repo, I go into learn, then generation, langchain, langgraph, and then the Ollama LangGraph agent directory. If you run ls you can see that we have a poetry.lock and a pyproject.toml file. All of the prerequisites we need to run this little project are contained in there, and the way we install them is with Poetry inside a Python environment. So I'm going to create one; I'm using conda here. I know people have opinions on which package manager they like to use; I've been using this forever, so I just stick with it. That is exactly how I would create my environment, and you can now see that I'm in my new Python environment. Then I'm going to go ahead and install Poetry, so pip install poetry, and then we run poetry install. You probably will not see this warning when you run it, but just in case, you can run poetry lock --no-update to fix it and then run poetry install again. With that, everything has been installed and we can go over and start running the notebook.
You can ignore this first cell if you're running locally; if you are going with Colab, I think you can install everything with it. What we need to do first in VS Code (let me close the sidebar and zoom in) is select kernel, then select another Python environment. If you don't see your environment in this list, you might have to restart VS Code (I'm actually in Cursor, but it's the same thing). Trying again, it has picked up the environment I would like to use, so now we can go ahead and start running everything.
The first thing we're going to need is the Reddit API, which we'll use to get some suggestions for our agent. What our agent is going to be doing is recommending pizza for me in Rome, and the way it does that is by searching; in my case it's going to be searching for good pizza in Rome. If you ask something like "what is the best pizza in Rome?", it's going to decide: okay, I'm going to go use the search tool. We tell it it's a search tool, we don't necessarily tell it that it's Reddit. It will return information from a few different submissions on Reddit, find what people are saying and what their recommendations are, and then it will tell me where I should go.

To use the Reddit API we do need to sign up for it. You get a certain amount of free usage, so we don't need to pay for anything, which is great. You can follow this video on setting that up; I will leave a link to it as well, though you don't necessarily need to follow it. This is what I always do when I've done something in the past and have kind of forgotten how to do it: I search "reddit API" and add my name onto the end, and because I created this in the past, it makes sense to me, which I suppose is a positive side of doing all these videos and articles.
That was already three and a half years ago now, so it has been a while, but I think everything is still up to date. You go to app preferences here. I have these two apps already; I don't even remember what they were for, which is interesting. And then there's this one, which I created just now for this video. If I go into it I can see all the information I'm going to use. I have my secret key (you can try to use this as well if you like, see what happens), and I have my app key, or ID; I'm not sure exactly what it's called. Client ID, okay. You just put them in here when initializing our Reddit instance, or client. For the user agent I think you just put the name of your app; it has been quite a while, I should rewatch my own video on it, but that's what you do. If I scroll down a little bit more there's the code part, which is the part I'm not going to follow, because that was written using requests rather than the PRAW library. PRAW is what I would recommend: it's a lot easier, and it's what we're going to use here.
So you can see we're importing praw, which is the Python Reddit API Wrapper. What we're going to do is gather a load of data. For each submission that we find when searching for a particular question, we're going to get the title of that submission, the description (I think that's essentially the first, submission-level text), and then the more top-rated comments within the thread of that submission. Then I define this object so that when we call the string method on it (we just call str on the object), it outputs a more LLM-friendly representation of that information. I'm defining that to keep things a little cleaner when we're building everything here. Otherwise I'd be passing dictionaries around, and that's fine, nothing wrong with that, but it's a little less clean, or less organized is probably the right way to put it.
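As a rough sketch of what that looks like (my reconstruction, not the exact code from the repo, and the field names are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class RedditSubmission:
    title: str
    description: str
    comments: list[str] = field(default_factory=list)

    def __str__(self) -> str:
        # An LLM-friendly, human-readable block of text
        comment_block = "\n".join(f"- {c}" for c in self.comments)
        return (
            f"Title: {self.title}\n"
            f"Description: {self.description}\n"
            f"Comments:\n{comment_block}"
        )
```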
I don't want to focus too much on the Reddit API side here, because it's not really what we're here for, but very quickly: we take reddit.subreddit("all"), so we're looking through all of the subreddits, and we search across Reddit for "best pizza in Rome". We initialize a list, go through the results, get the relevant information, and get all of the comments, including the upvotes, which I thought was important, and we also filter by the number of upvotes that the comments have. The logic here could be better: we could say, okay, these are the top-rated comments, but I don't actually go with the top comments, I just take three of them that have at least 20 upvotes. It works, but it could be better. Anyway, it doesn't matter, this is just a quick example.
So you can see we run this and it goes through: okay, I found these submissions. "This is Rome"... "since pizza is an American food"; yeah, there are a lot of Italians who would not be very happy with that. We can then see what we got out of it. I'm calling the string method here to get the recommendations, and you can see the title and the recommendations they're coming up with, from this user Sisyphus rock. Cool. We have another one here that is angering a lot of Romans right now, and we have some other ones as well. That's roughly what we're getting out of that tool.
What we're going to do now is wrap all of that in a function. That's all I'm doing here: given a particular query, it reruns everything we just did. I don't want to go through all of it again; this is all just API setup, it's not really the agent itself. Cool, so we have all of that.
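As a minimal sketch, that search function looks something like the following, assuming the Reddit credentials from above and reusing the RedditSubmission sketch; the limits and thresholds here are illustrative, and the notebook's values may differ:

```python
import praw

# Assumed credentials; use your own Reddit app's values
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_SECRET_KEY",
    user_agent="pizza-agent",  # I believe this is just the name of your app
)

def search(query: str) -> str:
    """Search Reddit and return an LLM-friendly string of results."""
    results = []
    # Search across all of Reddit for the query
    for submission in reddit.subreddit("all").search(query, limit=3):
        submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
        # Keep a few comments that have at least 20 upvotes
        comments = [c.body for c in submission.comments if c.score >= 20][:3]
        results.append(RedditSubmission(
            title=submission.title,
            description=submission.selftext,
            comments=comments,
        ))
    return "\n\n".join(str(r) for r in results)
```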
It might help if I visualize this a little. We have our Reddit search API tool here, which I'm just going to call search. In LangGraph you will see this pattern quite a lot: rather than allowing the LLM to respond with free text, I like to use a structured output format, and the way I do that is usually with a tool. Here we're not technically using a tool, we're using the JSON output format (I'll talk a bit more about the difference soon), but you can think of it as basically using a tool. So we have these two tools (technically, kind of, not really, but it's fine), and then we have our LLM, which I usually call the Oracle. I think they called it the Oracle in some LangChain documentation a long time ago, and I just like the name, so I'm sticking with it. It's the decision maker: it makes a decision based on what is going on. The user query comes in, and the Oracle decides, okay, based on this query, what do I need to do?
Could it just answer the user directly? If the user is just making small talk, there's no need to use the search tool; why would you? So instead the Oracle should be able to go directly to the final answer tool, and we do give it that option. If it goes to the final answer tool, it outputs a structured format that looks a little like this: an answer, which is a natural language answer, and then some other parameters. I think there's a phone number, i.e. the phone number for the restaurant if the agent has seen one within the data (honestly, using Reddit comments it probably isn't going to come up with that, but we can try), and an address. Of course, it doesn't need to output the phone number and address every time; they're more like optional parameters, and I think in the prompting we just tell the agent to keep them empty if it doesn't know them. But the answer it should provide every time.
On the other hand, if I ask something like "tell me where to find the best pizza in Rome", then the Oracle will hopefully use the search tool. When it uses the search tool we go here: it gets some information, and then it goes back to the Oracle with that new information. It has to go back, which is why I'm making this line a solid line, whereas these other lines are dotted: those paths are optional (it could go to final answer, it could go to search), but once it goes to search it has to go back to the Oracle component. Then the Oracle, given this new information, decides what it needs to do next; ideally, it should just go to the final answer every single time.

This is, I think, quite a simple agent; it is really not complex. But again, we're using a tiny LLM for this. Llama 3.1 is very good, but we're using the 8 billion parameter version, which is tiny. Honestly, the fact that this works at all is kind of surprising to me, but pretty cool. And it's relatively reliable as well: there's the odd hallucination, and the odd case of going straight to final answer when it should go to search, but I don't see those issues all that often, so it is really not too bad. Okay, cool.
Now that I've explained that, we can jump back into the code. We have our final answer tool that we are going to be using. All this does is format the output; it's not actually doing anything, it just returns its input, so the input to this function will literally be the same as the output. There's nothing really going on there.
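A minimal sketch of that pass-through function, assuming the three fields described above:

```python
def final_answer(answer: str, phone_number: str = "", address: str = "") -> dict:
    """Pass-through tool: simply returns the structured fields it was given."""
    return {
        "answer": answer,
        "phone_number": phone_number,
        "address": address,
    }
```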
Once we have our search function and our final answer function, note that I'm not using LangChain tools here, and there is a reason for that: we're not calling these functions through LangChain at all, we're going direct to Ollama. That's because, honestly, I just found the Ollama implementation via LangChain to be lacking in some places, particularly with tool calling. We're not actually using tool calling anymore, but even so, I couldn't get it to work whatsoever. Because of that I switched to using Ollama directly, and there's not really much need to use the LangChain wrapper for Ollama in my opinion; I kind of prefer doing it this way.
I'm not going to go into too much detail on what all of these parts are, because I covered them in an earlier video on LangGraph; I'll make sure there's a link to that in the description and in the comments if you are interested in more. But basically this is the agent state: an object that is persisted within every step of our graph.
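As a sketch, that state is typically a TypedDict along these lines; the field names here are my assumptions, and the repo defines the real thing. The operator.add annotation is what later lets new intermediate steps be appended rather than overwriting the list:

```python
import operator
from typing import Annotated, TypedDict

class AgentState(TypedDict):
    input: str                     # the user's current query
    chat_history: list[dict]       # prior conversation messages
    # Annotated with operator.add so each node's output gets appended
    intermediate_steps: Annotated[list, operator.add]
```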
The LLM, as I mentioned before, is the Oracle; it's our decision maker, so we're just setting it up here. We have our system prompt: you are the Oracle, the AI decision maker; you are to provide the user with the best possible restaurant recommendation, including key information about why they should consider it, and so on. I mention returning the address, phone number, and website.

Now, this next bit is important. Ollama tool calling at the moment can't be forced, and I think because you can't force the tool call, I found tool calling to be really hit and miss, especially when you start adding multiple tools, and even more so when you add multiple steps to the agent, where it can use one tool and then another tool and then another. For example, in the agentic flow I showed you, where it uses a search tool and then a final answer tool, I could not get Ollama working so that it would use both one after the other, so in the end I had to switch back to JSON formatting, and with JSON formatting it works really well. You just need to make sure that you prompt it within your system prompt to use the JSON format. That's what I'm doing here: when using a tool, you provide the tool name and the arguments to use in JSON format; you must only use one tool, and the response format must always be in this pattern, and then you give it that JSON pattern right here. I also say don't use the search tool more than three times. Actually, I try to get it not to use it more than once, but I wanted to leave a little bit of flexibility there. We tell it that if it uses it more than three times, we threaten it with nuclear annihilation, and that seems to work some of the time; it's quite a daring LLM, for sure. Then, after using the search tool, you must summarize your findings with the final answer tool, which is what I want it to do. So I'm just giving it as much context as I possibly can in the setup.
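For illustration, the kind of JSON pattern the model is asked to emit looks something like this; the exact keys used in the repo's prompt may differ:

```
{
  "name": "search",
  "parameters": {"query": "best pizza in rome"}
}
```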
One other thing I wanted to get in here is the function schemas, or tool schemas. I'm using some utils from Semantic Router for this, so you can see what we're doing. Hopefully you won't need this extra step; I think it's a bug in the library at the moment, so you should not need it soon, but for the moment it's there. You can use this with Ollama tool calling, but you can also just use it to provide the JSON schema of the tools that you would like the agent to use. Basically you take your function (this is the search function we described earlier), you create a FunctionSchema object with it, and then you just use the to_ollama method and it outputs this. That's it: it takes your docstring and your parameters, and I think that's basically all we need from it, to be honest. All right, cool.
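A sketch of that step; I'm assuming the import path below from memory, so check the semantic-router docs or the repo if it has moved:

```python
# Assumed import path for semantic-router's schema utilities
from semantic_router.utils.function_call import FunctionSchema

# Build an Ollama-compatible JSON schema from each function's signature and docstring
search_schema = FunctionSchema(search).to_ollama()
final_answer_schema = FunctionSchema(final_answer).to_ollama()
print(search_schema)
```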
Then we also do the same for the final answer function. So those are our tools, set up for JSON mode, and we can go ahead and actually try using the model. One thing I mentioned earlier is that you do need to have run ollama pull llama3.1:8b first. That's the 8 billion parameter model; I don't remember what the largest size is, but you can just modify the tag, so if you wanted, say, the 70 billion parameter version, you would just put that in there instead. Pretty simple. The quantization options can also go into the tag, around here. Then we have messages and format. This is important, and it's what I mentioned before: we always want the LLM to output in JSON format, so that we have structured output we can then process. We do want that for everything to work.
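A sketch of that call using the ollama Python client; the message contents here are placeholders:

```python
import ollama

res = ollama.chat(
    model="llama3.1:8b",
    messages=[
        {"role": "system", "content": "You are the Oracle..."},  # placeholder system prompt
        {"role": "user", "content": "hello there"},
    ],
    format="json",  # constrain the response to valid JSON
)
print(res["message"]["content"])
```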
Then what I do is have this get_system_tools_prompt function. I'm basically combining the system prompt that we defined earlier with the tools that we have defined here, putting them together in this little function. So we take the system prompt and the tools, which is a list of dictionaries; we create a tools string, and the result is the system prompt, a few newline characters, "you can use the following tools", and then the descriptions of those tools. That is our combined system prompt.
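Roughly, that helper looks like this; it's a sketch, and the joining text is paraphrased from the description above rather than copied from the repo:

```python
import json

def get_system_tools_prompt(system_prompt: str, tools: list[dict]) -> str:
    """Append the JSON schemas of the available tools to the system prompt."""
    tools_str = "\n".join(json.dumps(tool) for tool in tools)
    return (
        f"{system_prompt}\n\n"
        f"You can use the following tools:\n{tools_str}"
    )
```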
Now I try saying "hello there". What we should see when I say hello there is that it does not use the search tool; it should just use the final answer tool. Okay, cool, let's see what we got. Yep, it went straight to final answer, and you can see the final answer outputs everything in a string: the tool that we'd like to use, and then the parameters we want to feed in, which are just "hello, what kind of cuisine are you in the mood for?", and then of course phone number and address, which it doesn't need to answer here, so it just left them as none. Okay, cool.
Now let's see if we can get it to use the... I put "web search" here, but it's actually a Reddit search tool, I suppose. And Rome: I'm actually not going to use the exact place I'm in, because it's very specific and I think there's basically no one on Reddit talking about pizza there, so let's just go with Rome. So the agent, based on this (you see we have the chat messages, we pass all of this in and ask for that), said: okay, I'm going to use the search tool, and I'm going to use it with this query, "best pizzeria in Rome". So it decided to use the right tool, which is pretty cool, especially given the model size.
We're going to be using this for the agent actions. The agent actions are what we just saw: the agent is deciding it's going to use the tool named search, the tool input is going to be this dictionary here, and the tool output we don't have yet, because we need to run the tool to get it; we handle that later in another little chunk of code. So, from Ollama we have the Ollama response, and again, the Ollama response is never going to include the tool output. What's happening here is that we're just parsing the Ollama response into this AgentAction object. Then, here, we're getting the text: which tool was used (the tool name), the input (the parameters), and, if we have the tool output (because we add that later), we also return the output. Now we return that text, and you can see that we have an AgentAction object: tool name search, this tool input, and tool output None, because we haven't set that yet. So that is good. And why do I care about doing this? Again, I just want to keep things organized.
Also, when it comes to parsing multiple steps, where an agent might be doing different things (it may use search and then the final answer tool, or maybe it uses search a couple of times and then the final answer tool), we want to keep a log of what is happening. The way I've set it up here (I don't know if this is the best way of doing it with Llama 3.1) is that it takes these agent actions and formats each one into two messages, which makes it appear like a conversation happening between the assistant and the user. The assistant provides the function call, and the user "answers" with the output of that call. Let me give you an example. We have our agent action here, and then we have the action_to_message function. This is just an example: fake tool name, fake query, fake output from the function call. And we get this: an assistant message with the inputs, and a user message representing the output. We then feed that into our agent as it goes through this process of using tools.
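A sketch of those two pieces, with assumed field and function names matching the description above:

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentAction:
    tool_name: str
    tool_input: dict
    tool_output: Optional[str] = None  # filled in only after the tool has run

def action_to_message(action: AgentAction) -> list[dict]:
    """Format one agent action as an assistant/user message pair."""
    call = json.dumps({"name": action.tool_name, "parameters": action.tool_input})
    return [
        {"role": "assistant", "content": call},                 # the "function call"
        {"role": "user", "content": action.tool_output or ""},  # the tool's output
    ]
```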
That is what the create_scratchpad function is handling: it turns the intermediate steps into those messages and places them after the current user input. We then add a little bit of additional logic around that as well. If the scratchpad has been filled at least once, so there's at least one tool use, then, because it's a small LLM and it needs a little bit of extra guidance, I add basically another user message saying: please continue; as a reminder, my query was this original query. The reason I added this is that I would find the agent going off and searching about, you know, the best food in... it would start with Rome, and then it would be like, okay, what's the best food in LA, and then what is the best food in Austin; it would just start asking what the best food is in different places, and of course I don't want it to do that. So I'm reminding it again: look, this is my original query, this is what I want to know about, only answer the original query and nothing else. I found that to be relatively important for this model; otherwise it would kind of view those other messages, the fake messages created by the scratchpad, as something it should respond to. There's probably a better way of doing that, but this is just for this example.
Another thing I found is that it would be quite brief. I wanted it to give me more of a human-sounding description, like "oh, you should try this because of X, Y and Z, and you should try this other place because...". What I would get instead was "hey, you should try X", and that would be it, which was not very interesting. So I added a little bit of prompting here and that seemed to improve things. Then I also asked it to remember to leave the contact details of the recommended restaurant if possible.

Another thing that I still found it doing, even after adding all of this, is that it might not search for the best food in LA or the best food in New York and so on, but it might start saying, okay, what is the best food in Rome, then the most recommended food in Rome, or even just repeat the exact same query again. So I added another little bit to the scratchpad: as soon as it has used the search tool, it is told you must now use the final answer tool, to basically say, okay, just use the final answer tool and stop using the search tool. That helps. It does limit the agent a little bit, in that it can't use the search tool a few times to try different search terms, but I found it didn't really need that anyway, so this was fine.
Then we put everything together as we did before. We have the system prompt as before, the chat history, the user's query, and then the scratchpad, we make the query to Ollama, and we return an agent action from the response.
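Putting those pieces together, the call_llm function looks roughly like this; the names and signature are my assumptions based on the description, not copied from the repo, and it assumes the system_prompt string and tools list defined earlier:

```python
def call_llm(user_input: str, chat_history: list[dict],
             intermediate_steps: list[AgentAction]) -> AgentAction:
    """Ask the Oracle what to do next and parse its JSON reply into an AgentAction."""
    # Scratchpad: previous tool calls and their outputs, formatted as messages
    scratchpad = [m for a in intermediate_steps for m in action_to_message(a)]
    messages = [
        {"role": "system", "content": get_system_tools_prompt(system_prompt, tools)},
        *chat_history,
        {"role": "user", "content": user_input},
        *scratchpad,
    ]
    res = ollama.chat(model="llama3.1:8b", messages=messages, format="json")
    output = json.loads(res["message"]["content"])
    return AgentAction(tool_name=output["name"], tool_input=output["parameters"])
```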
We run this call_llm function, which is the one we just went through, with some fake chat history. One important thing to note here: in the history I say that I'm currently in Rome, and then I ask "hey, I'm looking for the best pizzeria near me". So I'm mentioning my current location in the history and then asking for the pizzeria; I'm testing that the chat history is actually considered, which is important. And you can see that sometimes it's not perfect: this time it decided to go with final answer straight away, but we can see that the chat history was considered ("considering your location, Rome, and your desire to try a local pizzeria, I would recommend trying out..."), except the place it kept recommending me turned out to be in Switzerland. That was a hallucination. But then we run it one more time and it worked. So there's a little bit of work it could do in some places, but the second time it works: best pizza near Rome, that's what I'm looking for, and it used the search tool.
That is our core LLM function. I'm going to get into the graph stuff in a moment, but first let's try taking this output and feeding it into our search function to see what we get. Cool, so that looks pretty relevant; very similar results to what we saw earlier as well. Someone who recently traveled to Italy, a decent-sized city, and yeah, Rome: Italy has good pizza, I agree.
So we have those results. What we've just done is set up all of the core logic for the different components of the graph-based agent, but we still need to put everything together; we need to connect everything, which we haven't started to do yet. The components we define next are almost like wrappers for our functions, and the reason we need these wrappers is that we're actually passing this state object around: the agent state from earlier gets passed directly into each of them. By the way, it's not true that this is a list; my typing is wrong there. Let me check what it actually is... okay, it's a TypedDict, so let me fix that quickly.
Okay, cool. Let's have a quick look at these. run_oracle is just running our LLM: it takes the input and chat history from the state and calls the call_llm function we already went through. Then we have our router. If you remember, the Oracle can go one of two ways: it can go to final answer, or it can go to search. The way that's handled is that there's an intermediate step here, the router, which says: based on what the Oracle outputs, I will send you in one of these two directions, to one of these two places. We also include some error handling in here: if we see something unexpected, we go directly to the final answer. This might not be the best error handling; actually, I'm not sure whether it would even work, and I don't know if I handle it properly or not, but it doesn't really matter, I haven't seen it fail.
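A sketch of those two node functions, assuming the AgentState fields sketched earlier:

```python
def run_oracle(state: AgentState) -> dict:
    """One Oracle step: decide the next action and append it to intermediate_steps."""
    action = call_llm(
        user_input=state["input"],
        chat_history=state["chat_history"],
        intermediate_steps=state["intermediate_steps"],
    )
    return {"intermediate_steps": [action]}

def router(state: AgentState) -> str:
    """Route to whichever tool the most recent agent action named."""
    if state["intermediate_steps"]:
        return state["intermediate_steps"][-1].tool_name
    # Fallback error handling: if something went wrong, just finish
    return "final_answer"
```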
Next is the part that maps the tool name to the function: if we see the term "search" we know we need to use the search function, and if we see "final_answer" we know we need to use the final answer function. We use that in here. This tool_str_to_func mapping is given the tool name from the Oracle's output, and from that we get the function. So if the Oracle passes in the "search" string, what I'm highlighting right here becomes the search function, and into the search function we pass the tool arguments from the Oracle. From that we get everything we need to construct an agent action. Then, if the tool name is final_answer, we output in this particular format; I'm not too sure why we need to do that, I will leave it, but maybe question whether it's actually needed. Otherwise, we add our action to the intermediate steps, as we should see: if we're using the search tool, we add that to our intermediate steps. The way the state works here is that this single item doesn't replace the entirety of intermediate_steps; it actually gets added to the intermediate steps, which is a little bit weird in terms of syntax, in my opinion.
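Roughly, and again with assumed names, the tool node looks like this:

```python
# Map the tool name emitted by the Oracle to the actual Python function
tool_str_to_func = {
    "search": search,
    "final_answer": final_answer,
}

def run_tool(state: AgentState) -> dict:
    """Execute the tool chosen by the Oracle and record its output."""
    action = state["intermediate_steps"][-1]
    out = tool_str_to_func[action.tool_name](**action.tool_input)
    completed = AgentAction(
        tool_name=action.tool_name,
        tool_input=action.tool_input,
        tool_output=str(out),
    )
    # Because intermediate_steps is annotated with operator.add, this list is
    # appended to the existing steps rather than replacing them.
    return {"intermediate_steps": [completed]}
```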
Now our components for the graph are ready and we can construct the graph. We initialize the graph with our agent state, that earlier state object, which is a TypedDict, not a list. Then we add some nodes to our graph: our Oracle, our search, and our final answer. What that looks like is literally the diagram from earlier, ignoring the little router in the middle. The router does exist, it just isn't included in the nodes, and the reason is that it actually lives within this conditional edge object here. The conditional edge is basically the dotted line, while the actual edges further down, from the Excalidraw drawing I did, are the solid lines. Mine kind of looks like a dotted line as well; how do I fix it... there we go, perfect. So that's the conditional edge: it goes from the Oracle, based on the router logic. The other thing I'm missing here is the entry point of our graph, which is the Oracle; that's the starting point, where we insert the query.

Then we create our actual edges. Here we're only adding the edge from search back to the Oracle; in reality I don't even need this extra bit here, I believe. We check that if the tool name is not equal to final_answer, we add the edge; honestly, I don't even know why I have that, I could just remove it, but it's fine, I don't want to break it now. Once something does go to the final answer tool, the final answer tool, as we see in our graph, has one line coming out of it, which is the answer: it goes to the END block, and the output from that is our result. Once all of that is done we can compile our graph, and if everything is set up in a functional way it will compile and we will not get an error.
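In LangGraph terms, that construction looks roughly like this; the node names are my assumptions, chosen to be consistent with the description:

```python
from langgraph.graph import StateGraph, END

graph = StateGraph(AgentState)

# Nodes: the Oracle plus one node per tool
graph.add_node("oracle", run_oracle)
graph.add_node("search", run_tool)
graph.add_node("final_answer", run_tool)

# The query enters at the Oracle
graph.set_entry_point("oracle")

# Conditional (dotted) edges: the router decides where the Oracle's output goes
graph.add_conditional_edges(source="oracle", path=router)

# Solid edges: search always returns to the Oracle; final answer ends the run
graph.add_edge("search", "oracle")
graph.add_edge("final_answer", END)

runnable = graph.compile()
```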
I have a little bit of code here that we can use to visualize our graph, so I'm going to run that. I think LangGraph always adds this extra dotted line from the Oracle, this optional line, basically allowing the LLM, if it decides to, to return a direct answer. In reality we've prompted it not to do that, and we've tried to set things up so it doesn't, but this is what we get. So we have our entry point, it goes to the Oracle, the Oracle can go to final answer or search, and if it goes to search it comes back to the Oracle. Super simple, but again, we're using a tiny model, so something overly complicated probably won't work.
So I'm going to ask our graph agent: where is the best pizza in Rome? Let's see what it comes up with.
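Invoking the compiled graph looks something like this, using the state keys assumed above:

```python
out = runnable.invoke({
    "input": "where is the best pizza in rome?",
    "chat_history": [],
    "intermediate_steps": [],
})
# The final answer action (tool name, inputs, output) ends up in intermediate_steps
print(out["intermediate_steps"][-1])
```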
The query was "best pizza in Rome". It got these three submissions from Reddit, it then went back to the Oracle, and the Oracle decided it should go to the final answer (the router identified that decision), so we went to the final answer tool and we have our output: "based on your question, I would recommend trying...". I keep getting this same recommendation. It's not what I would actually expect, and it also isn't a Roman pizza, it's a Neapolitan pizza, so the top recommended pizza in Rome seems to be a Naples-style pizza.

I also tried: where is the best gluten-free pizza in Rome? I'm not gluten-free, but my girlfriend is, so I thought, okay, let's see what we get; I didn't get a good response before. Here it says something like "unfortunately, I was unable to find specific recommendations for gluten-free pizza in Rome; it seems it's only generally considered a good destination for gluten-free options" (which, generally, it actually is). But then it mentions Pizza in Trevi, so maybe we can try that. Oh, there we go, we did have some options here; I don't know why it hedged. So, Pizza in Trevi, let's have a look. Gluten-free beer, gluten-free pizza, there we go. It's a little further out, just south of Trastevere and the center over here, but if you look, they have some pretty cool-looking pizza, like what I would not usually expect to find in Italy.
So the pizza-recommending agent does work. It's not perfect, but it does work, and it has some good recommendations there. Again, I'm running this on a MacBook, so I don't have any sort of powerful hardware running it, and because of the limitations of the memory on my MacBook I can only run the smaller models, like the 8 billion parameter model. But you could also see how quickly it was responding: it did not take long to go through multiple agent steps, perform a search, read everything from that search, and produce an answer for us, and the answers were generally pretty good. Honestly, I think it's actually quite impressive that you can do that.

JSON mode seems to be the best approach for agents, at least for now. I know that the Ollama team are planning to add forced function calling, which I think will make a difference, but for now JSON mode works well. And that's how we would build a local agent with LangGraph, Llama 3.1 and Ollama. That's it for this video. I hope it's been useful and interesting, but I'll leave it there for now. Thank you very much for watching, and I will see you again in the next one. Bye.