
Local LangGraph Agents with Llama 3.1 + Ollama


Chapters

0:00 Local Agents with LangGraph and Ollama
1:00 Setting up Ollama and Python
5:35 Reddit API Tool
12:40 Overview of the Graph
17:11 Final Answer Tool
18:33 Agent State
19:09 Ollama Llama 3.1 Setup
26:21 Organizing Agent Tool Use
35:21 Creating Agent Nodes
39:14 Building the Agent Graph
43:10 Testing the Llama 3.1 Agent
46:07 Final Notes on Local Agents

Whisper Transcript

00:00:00.000 | Today we are going to be taking a look at how we can build our own
00:00:03.400 | local agents using
00:00:06.440 | LangGraph with Ollama. Now LangGraph, if you don't know it, is an open source library from LangChain
00:00:14.160 | that allows us to build
00:00:16.600 | agents within a graph-like structure, and
00:00:21.000 | it is currently my
00:00:24.320 | preferred way of building agents, whether they are on OpenAI or local, wherever. Now
00:00:30.960 | Ollama is another open source project that allows us to run
00:00:35.880 | LLMs locally just very, very easily
00:00:40.120 | So we're going to be running llama 3.1 the 8 billion parameter model
00:00:45.840 | Which is tiny and yet I get like reasonable responses and reasonable actions
00:00:54.120 | Coming from my agent, which is actually pretty cool. So with that, let's jump straight into it
00:01:00.040 | So we're going to be running everything locally here. This is slightly different to what I usually do
00:01:04.600 | We usually go through, like, a Colab notebook, and I'm sure you probably could run this in a Colab notebook,
00:01:09.520 | but I want to show you how to run this locally on Mac, because
00:01:13.180 | generally speaking, Ollama works well with Mac because we have the unified memory,
00:01:19.680 | which means we can run fairly large models quite easily and
00:01:23.160 | actually really fast as well. So the first thing that we would want to do is go to the Ollama website
00:01:27.800 | and install Ollama. So rather than going to the URL, just search "install ollama"
00:01:35.280 | Okay, and we have Mac OS
00:01:39.280 | Download Mac OS and we just download it. I already have mine downloaded. You can see the little icon up here
00:01:46.040 | So yeah, I'm not going to download it again
00:01:49.400 | But once you have downloaded it, you want to run this command in your terminal. So, ollama pull llama...
00:01:56.280 | actually, 3.1 is the one I'm using, not this one:
00:02:01.120 | llama3.1:8b. So I'm gonna paste that in here: ollama pull llama3.1:8b
00:02:08.080 | Right, so that has downloaded the model. It will take a little while
00:02:11.440 | It's literally I think it just maybe updated the model on my side here as I already had it installed
00:02:17.320 | Then we're going to want to set up a Python environment. I mean, you don't have to do this. It's up to you
00:02:22.200 | I would recommend it. The reason I say that is because we're going to be working from this
00:02:28.120 | Repository here. So this is the examples
00:02:31.400 | repo; there'll be a link to that in the comments below. So step one, I would just git clone that.
00:02:38.800 | So you can copy here,
00:02:45.280 | come here and just run git clone.
00:02:50.520 | That will download the entire repo for you, which is it's quite a lot in there to be fair
00:02:56.120 | But we only actually need this one bit and then you want to navigate into this directory
00:03:01.080 | so if I just show you the full path to where I am from the root of
00:03:06.760 | this directory. So I want to go into learn,
00:03:12.680 | generation, langchain, langgraph, and then the Ollama LangGraph agent directory.
00:03:17.500 | Okay, there we are now in here
00:03:21.460 | Do ls, and you can see that we have this poetry.lock and pyproject.toml file.
00:03:26.620 | So all the prerequisites that we need to run this little project
00:03:32.140 | are contained within here, right? And the way that we install these is...
00:03:41.160 | If we go to our readme
00:03:44.120 | Well, actually we need to set up our
00:03:47.320 | Python environment, so I'm gonna do that. I'm using conda here. I know people have opinions on which
00:03:55.280 | Package manager they like to use I've been using this forever. So I just stick with it
00:04:00.400 | So that was that is exactly how I would create my environment there
00:04:10.140 | So you can now see that I'm in my new Python environment
00:04:14.140 | Then I'm going to go ahead and install poetry. So pip install poetry then we're going to do poetry install
00:04:21.740 | Okay, you probably will not see this when you run this,
00:04:28.300 | but just in case, you can run poetry lock --no-update to kind of fix that, and then run poetry install again.
00:04:38.340 | So you can see everything has just been installed and then with that we can go over and start running notebook
00:04:43.940 | Now the notebook is here
00:04:46.020 | again, same little directory here and
00:04:49.540 | You can ignore this; if you were going with Colab, you could install everything, I think, with this.
00:04:58.280 | But we are not so
00:05:02.820 | What we need to do first is, so, in VS Code... actually, let me close the sidebar
00:05:07.300 | And zoom in so in VS code select kernel select another
00:05:13.860 | Python environment if you don't see your environment on this list
00:05:20.140 | You might have to restart VS code or I'm actually in cursor, but same thing and now I'm going to try again
00:05:26.980 | So it's picked up on my environment that I would like to use
00:05:31.980 | So now we can go ahead and start running everything
00:05:34.500 | So the first thing we're going to need to do... actually, we're gonna be using the
00:05:39.940 | Reddit API to get some suggestions for our agent. What our agent is going to be doing is
00:05:47.020 | recommending pizza for me in Rome, and the way that it's going to do that is it's going to
00:05:53.260 | search for something in my case here is going to be searching for good pizza in Rome and
00:06:01.740 | It is then going to
00:06:03.740 | Decide like if you ask something like what is the best pizza in Rome? It's going to decide. Okay, I'm gonna go use
00:06:09.660 | Reddit to search, or rather the search tool; that's what we tell it.
00:06:14.020 | We don't necessarily tell it Reddit. And it's going to return that information from a few different submissions on Reddit,
00:06:21.060 | Find what people are saying what their recommendations are and then it's going to tell me where I should go. So
00:06:27.900 | To use the reddit API we do need to sign up for it. You get a certain amount of free
00:06:32.900 | Calls every hour I think
00:06:35.660 | So we don't need to do I don't need to pay for anything, which is great
00:06:39.980 | you can follow this video, which is just I will leave a link again to this in the
00:06:46.380 | comments or
00:06:48.980 | You don't necessarily need to follow that. Instead, so, this is what I always do when I've
00:06:55.100 | done something in the past and I've kind of forgotten how to do it: I
00:06:59.340 | search reddit API and then I just add my name on to the end and then because I
00:07:05.860 | Created this in the past. I'm like, okay, this seems to make sense to me
00:07:10.540 | Which is I suppose a positive thing of doing all these videos and articles
00:07:21.220 | Come down here and this is quite old
00:07:24.980 | This is already three and a half years ago now
00:07:28.640 | So it's a while but still everything I think is still up to date so you can go to app preferences here
00:07:36.380 | So it's reddit.com slash
00:07:38.380 | prefs slash apps. I copy that, put it in here.
00:07:43.020 | We go here. I have these I don't even remember what these two were
00:07:50.420 | So it's interesting and then I create this one. I
00:07:55.100 | Think no. No, sorry. I created this one just now for this video
00:08:00.100 | So if I go into here and see all this information here, which I'm going to use so
00:08:05.700 | have my secret key. You can try and use this as well if you like; see what happens.
00:08:15.620 | I have my app key or ID. I'm not sure what is exactly
00:08:20.780 | What's it, client ID? Okay, and you just put them into here. So here we're initializing our,
00:08:26.900 | like, Reddit instance, or client. User agent: I think you just put the name of your app here.
00:08:35.420 | It has been quite a while. I should rewatch my video on it
00:08:42.140 | in any case
00:08:44.060 | That's what you do. And then if I scroll down a little bit more I
00:08:48.180 | Think yeah here we get into the actual code
00:08:52.260 | Now the code part is the part that I'm not going to follow, because I wanted to do this, like, from...
00:08:57.820 | like, using requests rather than using the PRAW library,
00:09:02.260 | which is what I would probably recommend doing. It's a lot easier, and
00:09:08.300 | it's what we're going to use here. So you see we're importing praw, which is the Python Reddit API Wrapper,
00:09:15.260 | so we're using that and
00:09:17.580 | What we're going to do is basically gather loads of data. So we're going to be getting,
00:09:21.980 | Well this information actually so for each submission that we search for and find for a particular question
00:09:29.260 | we're going to get the title of that submission the description, so I think that's like a first comment and
00:09:35.740 | Then the sort of like more top rated comments that were within the thread of
00:09:43.620 | that submission. And then I'm going to define a dunder str method here, which is basically just going to get called,
00:09:50.700 | like, let's say this is my object: to call the string
00:09:57.560 | method here, obviously we just do str(obj), right, and
00:10:03.060 | that will output a more, like, LLM-friendly representation of the information there. Okay, so
00:10:10.100 | I'm just
00:10:12.740 | defining that so that it kind of keeps things a little cleaner when we're
00:10:16.660 | building everything here. Otherwise, I'm using dictionaries and whatever else, and that's fine,
00:10:22.420 | nothing wrong with that, but it's a little bit not clean; or not organized is probably the right way to put it.
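The idea being described, a small data container whose dunder str method renders an LLM-friendly block of text, can be sketched roughly like this (the class and field names are my guess for illustration, not the repo's exact code):

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    """Hypothetical container for one Reddit submission's data."""
    title: str
    description: str
    comments: list[str] = field(default_factory=list)

    def __str__(self) -> str:
        # Render the submission as plain text that slots cleanly into a prompt
        comment_block = "\n".join(f"- {c}" for c in self.comments)
        return f"## {self.title}\n\n{self.description}\n\n{comment_block}"

sub = Submission(
    title="Best pizza in Rome?",
    description="Looking for recommendations.",
    comments=["Try the place near the Pantheon (45 upvotes)"],
)
print(str(sub))
```

Calling str(...) on the object (or just printing it) is then all the cleanup needed before handing the data to the model.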
00:10:29.060 | So let me go through this a little bit
00:10:31.060 | I don't want to focus too much on the reddit API stuff here because it's not really what we're here for
00:10:35.860 | But just very quickly, right so reddit subreddits
00:10:40.340 | We're looking through all of them and then we're searching across reddit best pizza in Rome. All right, so
00:10:48.940 | We initialize that list we go through we get, you know relevant information. We get all the comments. We include the upvotes
00:10:55.660 | I thought this is important and we also filter by the number of upvotes that the comments have
00:11:01.760 | The logic here could be better. We could, like, say: okay, these are the top rated comments here.
00:11:08.500 | I don't actually go with the top comments. I'm just going for kind of three of them that have you know, at least 20 upvotes
00:11:14.940 | which
00:11:17.020 | works, but, you know, it could be better. But anyway, it doesn't matter. This is just
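The filtering just described, taking the first few comments that clear an upvote threshold rather than the actual top-rated ones, looks something like this sketch (names and dict shape are my own, not the repo's):

```python
def filter_comments(comments: list[dict], min_upvotes: int = 20, limit: int = 3) -> list[dict]:
    """Keep up to `limit` comments that have at least `min_upvotes` upvotes.

    As in the video, this does NOT sort by score; it just takes the first
    few comments that clear the threshold, which works but could be better.
    """
    kept = []
    for comment in comments:
        if comment["upvotes"] >= min_upvotes:
            kept.append(comment)
        if len(kept) == limit:
            break
    return kept
```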
00:11:22.500 | Like a quick example, so you can see we run this and it's going through it's like, okay. I found these submissions
00:11:29.940 | This is Rome. Yep, since pizza is an American food. Yeah, it's a lot of Italians here
00:11:37.660 | That would not be very happy with that. We can then see what we got out from that
00:11:43.460 | So I'm doing the string method here to get the recommendations. You can see: title,
00:11:48.300 | Description I was a little disappointed. No
00:11:52.260 | after
00:11:53.380 | pasta and gelato
00:11:55.380 | nice and
00:11:57.580 | Yeah, so we have some
00:12:00.300 | recommendations here that they are coming up with, this guy Sisyphus rock. Cool. We have another one here:
00:12:08.380 | Go to Naples
00:12:11.100 | Angering a lot of Romans right now. And yeah, we have some other ones as well
00:12:18.820 | Okay, that's kind of what we're getting out from that tool
00:12:20.980 | What we're going to do is just wrap all of that in a function here. That's all I'm doing here:
00:12:24.940 | so given a particular query, I'm going to rerun all of what we just did again.
00:12:29.580 | I don't want to go through all this too much. This is all like, you know set up API stuff
00:12:34.340 | It's not really the agent itself. Cool. So we have all of that
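That wrapper function might look roughly like this; the `reddit` argument is assumed to be the PRAW client initialized earlier (the exact code in the repo differs, and the helper name here is mine):

```python
def search(query: str, reddit, max_results: int = 3) -> str:
    """Search r/all for `query` and return an LLM-friendly text blob.

    `reddit` is assumed to be an initialized praw.Reddit instance (or
    anything with the same .subreddit(...).search(...) shape); comment
    filtering mirrors the ">= 20 upvotes, max 3 comments" idea above.
    """
    results = []
    for submission in reddit.subreddit("all").search(query, limit=max_results):
        comments = [
            f"{comment.body} ({comment.score} upvotes)"
            for comment in submission.comments
            if getattr(comment, "score", 0) >= 20
        ][:3]
        results.append(
            f"## {submission.title}\n\n{submission.selftext}\n\n" + "\n".join(comments)
        )
    return "\n\n".join(results)
```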
00:12:41.260 | Now, in our graph...
00:12:43.260 | it might help if I visualize this a little bit. Okay, so we have our,
00:12:48.580 | Like reddit search API tool thing here. I'm just gonna call it search
00:12:54.500 | then we're also going to have
00:12:57.340 | If you've watched previous videos from me on
00:13:00.860 | LangGraph, you will see this pattern quite a lot: rather than me allowing
00:13:06.100 | the agent to
00:13:08.780 | Return an answer directly
00:13:11.100 | I actually like to use a structured output format and the way that I do that is using a tool
00:13:16.060 | well, using a tool usually. Here, we're not technically using a tool; we're using the JSON output format.
00:13:22.580 | I'll talk a bit more about the difference there soon
00:13:25.060 | But you can think of this as basically we're using a tool. So we have these two tools
00:13:32.100 | Technically kind of, not really, but it's fine. And then we have our LLM, which I usually call the Oracle,
00:13:41.860 | thanks to like
00:13:45.060 | the I
00:13:46.580 | think they called it the Oracle in some LangChain documentation a long time ago, and I just like the name, so I'm sticking with it.
00:13:52.660 | so the Oracle
00:13:55.060 | like, the Oracle is
00:13:57.380 | the decision maker, right? So it makes a decision based on, you know, what is going on, right?
00:14:03.540 | So we're gonna have our query coming in
00:14:05.540 | So it's our user query comes in then the Oracle is like, okay based on this query, what do I need to do?
00:14:14.540 | Alright, could I just answer the user directly right if I'm just saying?
00:14:18.860 | Hello, how are you?
00:14:21.460 | whatever else, right? It's just, like, small talk. I don't need to use... you know,
00:14:28.140 | why would you use the search tool? There's no need.
00:14:30.900 | So instead the Oracle should be able to just go directly to the final answer tool
00:14:37.580 | Okay, so we do give it that option and then if it goes to the final answer tool
00:14:43.260 | it's gonna output a structured output format, which is gonna look a little bit like: we have the answer,
00:14:49.300 | which is like a natural language answer, and then we have some other parameters. I think it's like phone number,
00:14:56.180 | So it's like a phone number for the restaurant if the if the agent has seen that within the data
00:15:04.780 | I think honestly using reddit comments. It probably isn't gonna come up with that, but you can you know, we can try and
00:15:12.420 | address
00:15:15.540 | Again, it's like Street address
00:15:17.540 | So that will be output formats, but of course, you know the phone number and address it doesn't need to output that every time
00:15:25.180 | Okay, so it's like they're more like optional parameters. I think in the prompting
00:15:29.580 | We just tell the agent to keep those empty if it doesn't know
00:15:33.180 | But answer it should provide every time now on the other hand if I ask
00:15:38.620 | okay, tell me where to find the best pizza in Rome. In that case, alright, the Oracle will hopefully,
00:15:45.660 | Hopefully use a search tool
00:15:48.620 | Right. So when it uses the search tool we go here
00:15:53.820 | It will get some information and then it will actually go back with that new information
00:16:00.420 | To the Oracle, right? So this is a like it has to go back
00:16:05.860 | so that's why I'm making this line a solid line, whereas these lines are dotted because
00:16:11.620 | these are, like, optional: it could go to final answer, it could go to search.
00:16:16.080 | Once it goes to search it has to go back to the Oracle component
00:16:20.860 | Then the Oracle component given this new information is like, okay. Now, what do I need to do?
00:16:25.960 | Ideally, it should just go to the final answer every single time
00:16:29.900 | Like this is a I think quite a simple agent like it is really not complex
00:16:36.200 | whatsoever
00:16:38.260 | But again, we're using a tiny LLM for this, right?
00:16:43.180 | Llama 3.1 is very good, but we're using the 8 billion parameter version of that, which is tiny. So
00:16:50.200 | Honestly the fact that this works at all is actually kind of surprising to me, but pretty cool, right?
00:16:57.220 | And it's relatively reliable as well. Like, there's the odd hallucination,
00:17:01.900 | There's the odd like going straight to final answer when it should go to search
00:17:06.060 | But I don't see those issues all that often. So it is really not too bad. Okay, cool
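Stripped of LangGraph itself, the control flow in that diagram boils down to a loop like this (all names here are illustrative, not the repo's actual code): the Oracle picks a tool, search loops back to the Oracle, and final answer terminates.

```python
def run_agent(query: str, oracle, tools: dict, max_steps: int = 5) -> dict:
    """Oracle picks a tool each step; `final_answer` ends the run,
    any other tool's output is fed back to the oracle."""
    scratchpad = []
    for _ in range(max_steps):
        action = oracle(query, scratchpad)  # -> {"name": ..., "input": ...}
        if action["name"] == "final_answer":
            return tools["final_answer"](**action["input"])
        # e.g. the search tool: run it, then go back to the oracle (solid line)
        output = tools[action["name"]](**action["input"])
        scratchpad.append((action, output))
    raise RuntimeError("agent never reached final_answer")
```

The dotted lines in the diagram are the two branches of the `if`; the solid line back to the Oracle is simply the next loop iteration.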
00:17:12.620 | So now that I've explained that we can jump back into this
00:17:15.980 | So we have our final answer tool that we are going to be using
00:17:20.320 | okay, so we have all of this so yeah, we
00:17:25.060 | initialize that. All this is doing is formatting the output there; it's not actually doing anything.
00:17:29.300 | It's just returning: like, the input to this will literally be the same as the output. So there's nothing,
00:17:34.980 | nothing going on there, really. Then, once we have our two, like, we have our search function and
00:17:42.900 | we have our final answer function. Note that I'm not using LangChain tools here, and
00:17:49.060 | there is a reason for that. Basically, we're not using all of the
00:17:54.100 | LangGraph or LangChain
00:17:56.100 | Functions directly here. We're actually going direct to
00:17:59.860 | Ollama.
00:18:01.900 | That's because, honestly, I just found the Ollama implementation via LangChain to be lacking in some places,
00:18:08.960 | particularly with the tool calling, which we're not actually using anymore; but the tool calling I couldn't even get to work whatsoever. So
00:18:16.900 | because of that I switched to using Ollama directly and
00:18:21.540 | just stuck with it, because honestly,
00:18:24.300 | there's not really much need to use the wrapper
00:18:27.860 | from LangChain for Ollama, in my opinion. I kind of prefer doing it this way. So
00:18:33.580 | We initialize the agent state
00:18:36.620 | I'm not gonna go too into detail on what all of these parts are here because I covered all this I think
00:18:42.980 | Pretty well in my previous video on
00:18:47.820 | LangGraph, so I would just recommend
00:18:49.820 | having a look at that again. I'll make sure there's a link to that in the description and in the comments if you are
00:18:58.040 | Interested in more but basically this is an object that is persisted within every step of our
00:19:06.340 | our agent graph
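For reference, a LangGraph-style agent state is typically a TypedDict along these lines; the field names here are my guess based on the video, so check the repo or the previous video for the exact definition:

```python
import operator
from typing import Annotated, TypedDict

class AgentState(TypedDict):
    """State persisted across every step of the agent graph."""
    input: str                  # the user's current query
    chat_history: list[dict]    # earlier turns of the conversation
    # Annotated with operator.add so each node APPENDS its actions
    # to the list instead of overwriting the previous steps
    intermediate_steps: Annotated[list, operator.add]

state = AgentState(
    input="where is the best pizza in Rome?",
    chat_history=[],
    intermediate_steps=[],
)
```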
00:19:09.060 | So the LLM, as I mentioned before: the LLM is the Oracle. It's our decision maker.
00:19:15.300 | So we're just setting it up. We have our system prompt here. Like you are the Oracle AI decision maker
00:19:21.580 | You are to provide the user the best possible restaurant recommendation including key information about why it should consider
00:19:29.020 | visiting or ordering
00:19:31.700 | So and so on I mentioned here returning the address phone number websites
00:19:36.540 | Now this bit's important, because the Ollama tool calling at the moment is,
00:19:44.780 | not that... like, it works, but you can't force
00:19:48.500 | Tool calling and I think because you can't force the tool calling
00:19:52.900 | I found the tool calling to be really hit and miss especially when you start adding multiple tools and even more so
00:19:59.680 | When you start adding multiple
00:20:02.180 | steps to the agent, where it can use one tool and another tool and another tool, right? So for example,
00:20:10.020 | In that agentic flow that I showed you where it uses a search tool and it uses a final answer tool
00:20:15.980 | I could not get Ollama working where it would use
00:20:20.220 | both one after the other so I had to in the end just switch back to the
00:20:27.140 | like, JSON formatting, and with JSON formatting it works really well.
00:20:31.260 | You just need to make sure that you prompt it within your system prompt to use the JSON format
00:20:36.380 | So that's what I'm doing here, right when using a tool you provide the tool name and the arguments to use in JSON formats
00:20:42.760 | You must only use one tool and the response form must always be in the pattern and then you give it that
00:20:49.000 | The JSON output for that right here. I said don't use the search tool more than three times
00:20:56.300 | actually, I try and get it not to use it more than once but I
00:21:00.180 | Wanted to leave a little bit of flexibility there
00:21:02.980 | if we tell it that if it uses more than three times, then we threaten it with nuclear annihilation and
00:21:09.820 | That seems to work some of the time. It's quite
00:21:13.460 | Yeah, it's a daring LLM for sure. Then what I do is, after using the search tool,
00:21:20.180 | you must summarize your findings with the final answer tool. That's what I want it to do.
00:21:23.340 | Okay, so I'm just telling it like giving it as much context as I possibly can
00:21:30.580 | Okay, so that is
00:21:32.580 | Set up one other thing. I wanted to get in here is the function schemas or tool schemas. So
00:21:39.660 | I'm using some utils from semantic router for this so you can
00:21:45.320 | see what we're doing. Hopefully you won't need to; this is, like, I think, a bug in the library at the moment,
00:21:51.260 | so you should not need to do that soon, but at the moment it's there.
00:21:56.420 | So yeah, you have this. And also, sorry, so: this you can use with Ollama tool calling.
00:22:04.060 | That that's what it's built for
00:22:06.180 | But you can also just use it to provide like the JSON schema of the tools that you would like the agent to use
00:22:13.740 | When you're using JSON mode
00:22:16.220 | So it basically you take your function. So this is a function we described earlier
00:22:22.740 | we create this FunctionSchema object with it, and then we just use the to_ollama method, and then we output this.
00:22:29.360 | Okay, that's it, right? So it's taking your docstring, taking your parameters, and
00:22:35.980 | I think that's basically all we need from that to be honest. All right, cool
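If you want to see the idea without the semantic-router dependency, the same thing (derive a schema from a function's docstring and signature) can be sketched with the standard library. This is a simplified stand-in, not the actual to_ollama output:

```python
import inspect

def function_schema(func) -> dict:
    """Build a minimal JSON-style schema from a function's signature
    and docstring (a simplified imitation of what semantic-router's
    FunctionSchema produces; the real field layout differs)."""
    sig = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: str(param.annotation)
            for name, param in sig.parameters.items()
        },
    }

def search(query: str) -> str:
    """Search Reddit for restaurant recommendations."""
    ...

schema = function_schema(search)
print(schema)
```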
00:22:40.460 | And then we also do the same for the final answer. Yeah
00:22:44.020 | Okay. Yeah, and that's it. So we we have our
00:22:48.980 | tools to use with JSON mode, and we can go ahead and actually try using
00:22:57.180 | the model. Again, one thing I mentioned earlier is that you do need to have run this: ollama pull llama3.1
00:23:07.260 | So let's see 8 billion parameters. I don't remember what the largest size is, but you can just modify it
00:23:12.940 | So I'm pretty sure it's not this size
00:23:15.460 | But if it was like 38 billion parameters, you just you just put that in there
00:23:21.060 | Pretty simple. Also the quantization stuff, you can put it into there, like, around here. So we have
00:23:29.180 | model
00:23:31.460 | we have messages and format. So this is important; this is what I mentioned before:
00:23:34.700 | we always want the LLM to be outputting in JSON format, so we have that structured
00:23:40.100 | output that we can then process. So yeah, we do want that.
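The call itself then looks roughly like the following. I'm only building the kwargs dict here so the shape is visible; the real call would be ollama.chat(**request), and you should check the exact argument names against the ollama package's docs:

```python
def build_chat_request(model: str, messages: list[dict]) -> dict:
    """Assemble the kwargs for the Ollama chat call. Setting
    format='json' is what forces the model to emit parseable JSON."""
    return {"model": model, "messages": messages, "format": "json"}

request = build_chat_request(
    "llama3.1:8b",
    [
        {"role": "system", "content": "You are the Oracle, the AI decision maker."},
        {"role": "user", "content": "hello there"},
    ],
)
# with the ollama package installed and the model pulled, you would run:
# response = ollama.chat(**request)
```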
00:23:45.740 | That's how we get the tools and
00:23:48.420 | everything to work. And then what I do is, so, we have this get system tools prompt:
00:23:54.420 | so I'm basically combining the system prompts that we defined earlier and
00:23:58.620 | I'm also taking the tools that we have defined here and
00:24:02.100 | Then putting them together right in this little
00:24:05.940 | Function so here the system prompt the tools which is a list of dictionaries
00:24:10.260 | We create a tools string and then we have system prompt few newline characters
00:24:15.180 | And then you can use the following tools and then we describe those tools there. So that is our sort of
00:24:22.500 | tool augmented system prompt
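A minimal sketch of that prompt-assembly step, system prompt plus a newline-separated dump of the tool schemas (the wording and function name are mine; the repo's version differs):

```python
import json

def get_system_tools_prompt(system_prompt: str, tools: list[dict]) -> str:
    """Append each tool's JSON schema below the base system prompt,
    so the model knows exactly which tools it may call."""
    tools_str = "\n".join(json.dumps(tool) for tool in tools)
    return f"{system_prompt}\n\nYou may use the following tools:\n{tools_str}"

prompt = get_system_tools_prompt(
    "You are the Oracle, the AI decision maker.",
    [
        {"name": "search", "description": "Search Reddit for recommendations"},
        {"name": "final_answer", "description": "Return the final answer"},
    ],
)
```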
00:24:26.180 | just passing our tools there and
00:24:29.140 | Then this is a simplified version here
00:24:33.740 | So I'm trying to say hello there, right and what we should see is when I say hello there
00:24:37.660 | It should not use the search tool. It should just use a final answer tool
00:24:40.980 | I'm missing something
00:24:43.460 | system prompt
00:24:45.460 | Okay, I need to run this
00:24:48.700 | And run this again
00:24:51.220 | Okay, cool. Let's see what we got. So yep. I went straight to final answer and
00:24:57.380 | Okay, you can see... so the final answer outputs everything in a string,
00:25:02.100 | but we just parse it. Okay, so we have
00:25:06.260 | Message
00:25:08.820 | Content. Ah, okay. Perfect
00:25:10.820 | So we have the name, final_answer;
00:25:13.260 | that's the tool that we'd like to use, and then the parameters that we want to feed in there. It's just, like: hello,
00:25:18.060 | I'm here to help you find a great restaurant.
00:25:19.860 | What kind of cuisine are you in the mood for? And then, of course, phone number and address:
00:25:25.260 | you know, it doesn't need to answer those, so it just left them as None.
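Parsing that JSON string back into a tool name and its parameters is a couple of lines; the "name"/"parameters" keys below match what shows up in this demo output, though treat the helper itself as a sketch rather than the repo's code:

```python
import json

def parse_tool_call(message_content: str) -> tuple[str, dict]:
    """Turn the model's JSON output string into (tool_name, tool_input)."""
    data = json.loads(message_content)
    return data["name"], data["parameters"]

name, params = parse_tool_call(
    '{"name": "final_answer", "parameters": {"answer": "Hello! '
    'I\'m here to help you find a great restaurant.", '
    '"phone_number": null, "address": null}}'
)
```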
00:25:29.980 | Okay, cool. Now, let's see if we can get it to use the... I put web search here; it's actually just
00:25:37.100 | Reddit search. I suppose it's a Reddit search tool. And:
00:25:40.980 | Hi, I'm looking for the best pizzeria in el...
00:25:44.620 | Rome. So I'm actually not going to go with that, because it's a very specific place and I think there's, like, no one on
00:25:52.460 | Reddit talking about pizza there. So let's just go with Rome
00:25:58.340 | Okay, so the agent based on this so you see we have chats
00:26:03.620 | We pass all this stuff in and asking for that
00:26:06.460 | It said okay, I'm going to use search tool and I'm going to use search tool with this query best pizzeria in Rome
00:26:13.700 | Okay, so that worked; it decided to use the right tool, which is pretty cool, especially given the model size.
00:26:20.380 | Okay now
00:26:22.420 | we're gonna use a Pydantic BaseModel again.
00:26:26.620 | So we're gonna be using this for the agent actions
00:26:30.580 | So the agent actions are, well, actually, what we just saw:
00:26:34.660 | that's this, an agent action. The agent is deciding it's going to use the tool name of search;
00:26:39.840 | The tool inputs is going to be this dictionary here and then tool output
00:26:45.760 | We don't have that yet because we need to run the tool to get that
00:26:48.760 | We handle that later in some other a little chunk of code. So
00:26:53.460 | from Ollama. So we have the Ollama response; again, the Ollama response is never going to include the tool output,
00:27:00.380 | so we just include tool name and tool input.
00:27:02.660 | So basically, what we've got here is what's happening here, right?
00:27:08.540 | So we're just passing the Ollama response into this AgentAction object.
00:27:14.540 | Then what we are doing here. We're getting the text
00:27:19.660 | so: what tool was used, the tool name, the input, so the parameters, and,
00:27:25.460 | If we have the tool output because we add that later
00:27:29.300 | We're also going to pass the we're going to return the output. Now we return that text so
00:27:35.980 | we can see, alright, we create that: we have now an AgentAction object, tool name search, tool input
00:27:43.260 | this, and tool output None, because we haven't set that yet.
00:27:47.820 | So that is good. And why do I care about doing that again? I just want to keep things organized and
00:27:54.160 | two, when it does come to passing this, like, multiple steps of where an agent might be doing different things,
00:28:01.620 | Like it may use search and then it may use the final answer tool or maybe it's going to use search
00:28:06.140 | three times hopefully not anymore and
00:28:08.780 | then use the final answer tool. We want to keep a log of what is happening, and the way I've set it up here,
00:28:16.340 | I don't know if this is the best way of doing it with llama 3.1
00:28:19.800 | but the way that I set it up here is that it's going to take these
00:28:23.380 | Agent actions and it's going to format them into like a single agent action. It's going to format it into two
00:28:30.220 | Messages which makes it appear like it's a conversation happening between the assistant and the user
00:28:35.820 | Okay, so it's like the assistant is providing the function call, and then the user is answering based on the
00:28:44.500 | Output of that function call. Okay, so
00:28:47.480 | Let's see if we can if I can give you an example
00:28:50.540 | so action to
00:28:53.300 | message or messages
00:28:55.300 | Okay, we have our agent action here and then we have the action to message function
00:29:02.300 | Okay, so this is just an example, right fake tool name fake query fake output from the function call
00:29:10.580 | And we will get this so we're gonna get an assistant message with the inputs and the user message
00:29:16.620 | Representing the output and then we're gonna feed that into our agent as it's kind of going through this process of using tools
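That action-to-messages conversion can be sketched like so; the dict field names are assumptions on my part, not copied from the repo:

```python
import json

def action_to_message(action: dict) -> list[dict]:
    """Render one past tool use as an assistant/user message pair, so the
    model sees its own tool call and the tool's result as a mini-dialogue."""
    return [
        {
            "role": "assistant",
            "content": json.dumps(
                {"name": action["tool_name"], "parameters": action["tool_input"]}
            ),
        },
        {"role": "user", "content": str(action["tool_output"])},
    ]

msgs = action_to_message(
    {
        "tool_name": "fake_tool",
        "tool_input": {"query": "fake query"},
        "tool_output": "fake output from the function call",
    }
)
```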
00:29:24.220 | So that's what we're doing here. So the create scratchpad function is basically handling this
00:29:30.120 | conversion for us for multiple actions and
00:29:34.300 | Then that scratch pad gets inserted
00:29:38.740 | Into here, right? So after the previous like the current user input
00:29:45.020 | We then add a little bit of additional logic around that as well
00:29:49.960 | So if the scratchpad has been called at least once, so there's at least one tool use...
00:29:55.820 | You know, it's a small LLM, so it needs a little bit of extra
00:30:00.540 | Guidance, so that is what I've done here. So I've added basically another user message
00:30:06.340 | I append onto the scratchpad
00:30:09.260 | Messages saying okay, please continue as a reminder. My query was this original query
00:30:16.020 | The reason I added this is because it tended to... I
00:30:18.940 | would find that the agent would go off and start searching about, you know,
00:30:23.660 | The best food in it would start with Rome and then it would be like, okay
00:30:27.220 | what's the best food in LA and then what is the best food in like Austin like it would just kind of
00:30:34.660 | start asking, like, what is the best food in all these different places? And of course, I don't want it to do that.
00:30:40.100 | So I'm just reminding it again. Look, this is my original query. This is what I want to know about
00:30:45.180 | So yeah, I found that to be relatively important for this model and like only answer the original query and nothing else
00:30:52.740 | So I'm trying to encourage it to not
00:30:55.020 | Kind of view those other messages as something that it should respond to
00:31:00.620 | Right that the kind of fake messages I created by the scratch pad again
00:31:04.980 | there's probably a better way of doing that but it's just you know for this example and
00:31:09.140 | Then another thing that I found is that it would be quite
00:31:12.820 | Loose on details in the answer field
00:31:17.500 | Like I wanted it to give me a bit of more of like a human sounding description like oh
00:31:22.140 | you should try this because you know X Y & Z and you should try this other place because
00:31:26.200 | so on and so on and
00:31:28.820 | what I would find is I'd be like, hey, you should try X and
00:31:32.540 | That would be it. All right, and so it was like not very interesting
00:31:38.660 | So I added this a little bit of prompting here and that seemed to improve things
00:31:43.540 | Then I just asked it to remember to leave the contact details are prompting looking restaurant if possible now
00:31:50.500 | another thing that I still found it was doing, even after adding all of this, is
00:31:56.060 | it would maybe not search for what is the best food in LA or what is the best food in New York,
00:32:01.140 | and so on, but it might start saying, okay, what is the best food in Rome?
00:32:05.660 | Cool, what is the
00:32:08.540 | most recommended food in Rome? Like that, or even just repeat the exact same query again. So I
00:32:15.580 | added another little bit to the scratchpad, as soon as it has used the search tool, to say you must now use the final answer
00:32:22.580 | tool. To kind of be like, okay, just use the final answer tool and stop using the search tool
00:32:28.620 | So, yeah, that helps. It does limit the agent a little bit in,
00:32:34.060 | okay, maybe using the search tool a few times to try different search terms, but I found it didn't really need that
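The scratchpad logic just described can be sketched roughly as below. This is a minimal reconstruction of the idea, not the video's exact code; names like `create_scratchpad`, `AgentAction`, and `intermediate_steps` are assumptions based on the walkthrough.

```python
# Sketch of the scratchpad logic described above. The function and field
# names (create_scratchpad, AgentAction, intermediate_steps) are assumptions
# based on the walkthrough, not the exact code from the video.
from dataclasses import dataclass


@dataclass
class AgentAction:
    tool: str
    tool_input: dict
    log: str  # the tool's output, stored after execution


def create_scratchpad(query: str, intermediate_steps: list[AgentAction]) -> list[dict]:
    """Turn past tool calls into fake chat messages, with extra guidance."""
    messages = []
    for action in intermediate_steps:
        # Replay each tool call and its result as assistant/user turns
        messages.append({"role": "assistant",
                         "content": f"{action.tool}({action.tool_input})"})
        messages.append({"role": "user", "content": action.log})
    # Remind the small model of the original query so it stays on topic
    reminder = (f"Please continue. As a reminder, my query was: {query}. "
                "Only answer the original query and nothing else.")
    # Once the search tool has been used, force the final_answer tool
    if any(a.tool == "search" for a in intermediate_steps):
        reminder += " You must now use the final_answer tool."
    messages.append({"role": "user", "content": reminder})
    return messages
```

The key point is that the reminder is appended on every turn, and the "you must now use the final answer tool" constraint only kicks in after a search has happened.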
00:32:41.860 | Anyway, so this was fine. Then, yeah, we put everything together as we did before,
00:32:47.060 | so we have the system prompt as before, the chat history, the user's query, and then the scratchpad, and
00:32:54.540 | then, yeah, we
00:32:57.180 | just make the query, once I remove this, and
00:33:02.860 | yeah, we return the agent action, and run this
00:33:07.780 | Cool, so we're gonna try the
00:33:11.420 | call LLM function, which is actually this one we just went through, and, yeah, we create some, like, fake chat history
00:33:18.900 | So: hi there, how are you? I'm currently in Rome
00:33:21.220 | So actually, one important thing here, look, that I mentioned:
00:33:25.300 | I'm currently in Rome, and then I'm like, hey, I'm looking for the best pizzeria near me
00:33:30.580 | Right, so I'm mentioning this in the history, like, my current location, and then I'm asking for the pizzeria
00:33:37.900 | So I'm just testing here that, okay, chat history is actually considered. It's important
00:33:42.900 | Okay, so you see that sometimes it's not perfect
00:33:46.820 | So this time it decides to go with final answer straight away, and then we can see that the chat history was
00:33:52.340 | considered. So: considering your location, Rome, and your desire to try a local pizzeria, I would recommend trying out Pizzeria
00:34:00.180 | la Monte Carlo. Now,
00:34:02.900 | this doesn't exist, or I think it's a
00:34:07.940 | pizzeria in
00:34:09.940 | Switzerland, because it kept recommending me this all the time
00:34:13.500 | I've been through many
00:34:16.220 | iterations of getting this to work
00:34:19.460 | Yeah, that was a hallucination. But then we run it one more time, and it worked
00:34:24.780 | Okay, so there's a little bit... you know, it could do with a little bit of work in some places, but
00:34:30.060 | the second time it works: best pizza in Rome, that's what I'm looking for, okay, using the search tool
00:34:38.820 | That is our core LLM function. I'm gonna be getting into the graph stuff in a moment, but let's try
00:34:46.180 | Let's try taking this and feeding it into our search function and seeing what we get
00:34:52.740 | Cool, so that looks pretty relevant. I think very similar results to what we would have seen earlier as well
00:34:59.600 | Yeah, it's the Rome "best pizza of my life"
00:35:02.660 | post, with 202 upvotes. I
00:35:07.020 | recently traveled to Italy as well; the week after I came back,
00:35:10.260 | hardcore cravings took pizza from there
00:35:12.900 | Decent-sized city, and, yeah, Rome, Italy has good pizza. I agree
00:35:18.140 | So we have those results. Now, what we've just done is we've kind of set up all the core logic of the different
00:35:26.700 | components of our
00:35:29.020 | graph-based agent, but we need to, like, put everything together
00:35:33.820 | We need to connect everything, which we haven't started to do yet. So
00:35:38.220 | to do that, we are going to set up a few
00:35:41.500 | components that are going to be, they're almost like wrappers for our functions,
00:35:46.380 | that will be used within the graph itself
00:35:50.100 | And the reason we need these wrappers is because we're actually using this state object
00:35:54.380 | Remember the agent state from earlier? That gets passed directly into here
00:35:58.460 | It's not true that this is a list; my typing is wrong there
00:36:03.320 | Let me just check what it actually is. Okay, it's a TypedDict,
00:36:09.500 | so let me take that and fix that quickly. Okay, so
00:36:14.040 | TypedDict here, and
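A minimal sketch of what that state object typically looks like, assuming the common LangGraph pattern of a `TypedDict` whose `intermediate_steps` field accumulates rather than being replaced. The field names here are assumptions, not the video's exact code.

```python
# A minimal sketch of the agent state: a TypedDict whose intermediate_steps
# field accumulates (via operator.add) rather than being overwritten.
# Field names are assumptions based on the walkthrough.
import operator
from typing import Annotated, TypedDict


class AgentState(TypedDict):
    input: str                   # the user's query
    chat_history: list[dict]     # prior conversation turns
    # Annotated with operator.add: the graph appends node outputs to this
    # list instead of replacing it
    intermediate_steps: Annotated[list, operator.add]


state: AgentState = {
    "input": "where is the best pizza in Rome?",
    "chat_history": [],
    "intermediate_steps": [],
}
```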
00:36:19.780 | Okay, cool. So let's just have a quick look at these. Run oracle:
00:36:24.080 | so this is just running our LLM. Okay, so call LLM, chat history, state
00:36:29.980 | So, okay,
00:36:32.580 | yeah, I mean, it is what it is; we already went through the call LLM function. We then have our router, so,
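The wrapper pattern being described, a node that unpacks the state, calls the LLM, and returns a partial state update, can be sketched like this. `call_llm` here is a trivial stand-in for the function built earlier; all names are assumptions.

```python
# Sketch of the node wrapper pattern: graph nodes take the state dict and
# return a partial state update. call_llm is a stand-in for the real LLM
# call built earlier; names are assumptions, not the video's exact code.
def call_llm(query: str, chat_history: list, intermediate_steps: list) -> dict:
    # Placeholder for the real LLM call that picks a tool and its arguments
    return {"tool": "search", "tool_input": {"query": query}}


def run_oracle(state: dict) -> dict:
    """Wrapper node: unpack state, call the LLM, return a state update."""
    action = call_llm(
        query=state["input"],
        chat_history=state["chat_history"],
        intermediate_steps=state["intermediate_steps"],
    )
    # The returned dict is merged into the state; because of the
    # operator.add annotation, intermediate_steps accumulates
    return {"intermediate_steps": [action]}
```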
00:36:40.580 | if you remember, here, right, the oracle can go one of two ways:
00:36:47.420 | it can go to final answer, or it can go to search. The way that this is handled is actually,
00:36:52.160 | there's more like a...
00:36:54.240 | there's like an intermediate step here, which is the router, okay? And
00:36:59.380 | that is like, okay, based on what the oracle outputs,
00:37:04.480 | I will send you to one of these two directions, one of these two places
00:37:09.020 | So that is what this is doing, and it also,
00:37:12.420 | so we also include some error handling in here. So if we see a
00:37:16.980 | tool name that we don't recognize,
00:37:21.260 | we go directly to the final answer. Okay, which might not be the best error handling, actually. I'm not sure if it would work;
00:37:28.980 | I don't know if I handle it or not, but, yeah, it doesn't really matter. I haven't seen it fail,
00:37:33.820 | so it's okay
00:37:35.180 | Then we have this dictionary here,
00:37:37.460 | which will go from... so if we see the term search, we know we need to use the search function; if we see the term
00:37:44.660 | final answer, we know we need to use the final answer function, and we use that in here
00:37:50.180 | So we have this tool string to function, which is
00:37:53.980 | provided with the tool name based on the output from the
00:37:58.580 | oracle, and
00:38:01.180 | based on that we're gonna get the function, right?
00:38:03.860 | So if it passes in the search string from the oracle, here, this,
00:38:08.060 | what I'm highlighting right here, is going to become the search function, and then in the search function
00:38:14.240 | we're going to pass in the tool arguments from the oracle, right? Then from that we're going to get
00:38:20.640 | everything we need to construct an agent action, and
00:38:25.040 | then, if the tool name is final answer, we output in this format
00:38:31.360 | I'm not too sure why we need to do that; I will leave it, but maybe just
00:38:35.440 | question that. I'm not sure if it's needed
00:38:40.320 | Otherwise, we're going to add our action output to the intermediate steps. We should see, yeah,
00:38:47.120 | what we've output, right? So if we're using the search tool,
00:38:50.720 | we're going to add that to our intermediate steps. The way that the state works here is that this
00:38:55.680 | single item here doesn't replace the entirety of the intermediate steps;
00:38:59.720 | it actually gets added to the intermediate steps. A little bit weird in terms of syntax, in my opinion,
00:39:06.200 | but that is actually what is happening,
00:39:08.440 | just to make you aware
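The router and tool dispatch just described can be sketched with plain Python stand-ins. The real tools hit Reddit and the LLM; names like `tool_str_to_func` and `run_tool` are assumptions based on the walkthrough.

```python
# Sketch of the router and tool dispatch. The real tools hit Reddit and the
# LLM; these are trivial stand-ins, and the names are assumptions.
def search(query: str) -> str:
    return f"reddit results for: {query}"     # stand-in for the Reddit tool


def final_answer(answer: str) -> str:
    return answer                             # stand-in for the final answer tool


tool_str_to_func = {"search": search, "final_answer": final_answer}


def router(state: dict) -> str:
    """Read the oracle's latest tool choice and pick the next node."""
    tool_name = state["intermediate_steps"][-1]["tool"]
    if tool_name in tool_str_to_func:
        return tool_name
    return "final_answer"                     # fallback for unrecognized tools


def run_tool(state: dict) -> dict:
    """Execute the chosen tool and append the result to intermediate_steps."""
    action = state["intermediate_steps"][-1]
    out = tool_str_to_func[action["tool"]](**action["tool_input"])
    # Single item in a list: it gets *added* to intermediate_steps,
    # not substituted for them
    return {"intermediate_steps": [{"tool": action["tool"], "log": out}]}
```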
00:39:10.480 | So we have that I need to run this
00:39:13.600 | Then we have
00:39:17.160 | you know, our components for the graph; they're ready. We can construct the graph
00:39:21.960 | Okay, so we initialize the graph with our agent state,
00:39:26.720 | so that earlier state object, which is a TypedDict, not a list
00:39:31.360 | Then we add some nodes to our graph, so we have our oracle, our search, our final answer
00:39:38.840 | What that will look like is literally this here, ignoring the little router in the middle. That does exist,
00:39:46.240 | you know, it just isn't included in the nodes, and the reason it doesn't exist in the nodes is that it actually exists more
00:39:53.680 | like within this conditional edge object here. So the conditional edge is basically, that's like the dotted line,
00:40:02.320 | right, whereas the
00:40:04.320 | actual edges down here, from, you know, the Excalidraw drawing I did, are like actual lines
00:40:11.760 | It kind of looks like a dotted line as well, but I don't know how to do it... How can I do it?
00:40:17.720 | Oh, there we go. Perfect. So that is what that would look like
00:40:21.560 | Versus that okay
00:40:24.520 | So that's a conditional edge. So that's going from the oracle, based on the router logic, which is like, okay,
00:40:31.800 | go to search or go to final answer
00:40:33.720 | The other thing that I'm missing here is the entry point of our graph, which is the oracle
00:40:38.560 | So that's the starting point that we go to; that's where we insert the query
00:40:47.640 | Then we create our actual edges. So for the actual edges, it's only...
00:40:53.000 | okay, so here we're only adding the edge from the
00:40:59.280 | search tool back to the oracle
00:41:01.580 | All right, and in reality, I don't even need this bit here, I don't believe,
00:41:08.160 | yeah, because I can leave this here
00:41:11.400 | But then we're gonna see, if tool name is not equal to final answer, we add the edge. So,
00:41:16.880 | honestly, I don't even know why I have that. I could just remove it, and I could even remove that
00:41:23.080 | Hey, it's fine. Whatever. I don't want to break it now
00:41:27.080 | So once something does go to the final answer tool, the final answer tool, as
00:41:32.520 | we see in our graph here, has one line coming out of it, which is answer, right?
00:41:39.440 | It actually goes to the end block,
00:41:42.040 | right? And the end block is kind of like here, and then the output from that is this, okay?
00:41:47.440 | And that's what we're doing there
00:41:48.400 | And then, once all that is done, we can compile our graph, and if everything is set up in a
00:41:54.040 | functional way, it will compile and we will not get an error
00:41:57.320 | So, yeah, that is our graph
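The wiring just described, entry point at the oracle, a conditional edge out of it, a plain edge from search back to the oracle, and final answer going to END, can be sketched as a tiny pure-Python runner, standing in for LangGraph's `StateGraph`. The node bodies are trivial stand-ins; only the wiring mirrors the walkthrough.

```python
# The graph wiring sketched as a tiny pure-Python runner, standing in for
# langgraph's StateGraph so it runs standalone. Node bodies are stand-ins.
END = "__end__"


def oracle(state):
    # First pass: choose search; once a search has happened, choose final_answer
    done = any(s.get("tool") == "search" for s in state["steps"])
    state["steps"].append({"tool": "final_answer" if done else "search"})
    return state


def search(state):
    state["steps"][-1]["log"] = "reddit results..."   # stand-in tool output
    return state


def final_answer(state):
    state["answer"] = "Try a pizzeria in Trastevere"  # stand-in answer
    return state


nodes = {"oracle": oracle, "search": search, "final_answer": final_answer}
edges = {"search": "oracle", "final_answer": END}     # the plain (solid) edges


def router(state):
    """Conditional edge out of the oracle: the dotted line in the diagram."""
    return state["steps"][-1]["tool"]


def invoke(state, entry="oracle"):
    node = entry                                      # entry point: the oracle
    while node != END:
        state = nodes[node](state)
        # After the oracle, the router decides; otherwise follow plain edges
        node = router(state) if node == "oracle" else edges[node]
    return state


result = invoke({"steps": [], "answer": None})
```

Running it walks the same path as the compiled graph: oracle, search, back to the oracle, final answer, end.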
00:42:01.160 | I have this little bit of code here that we can use in order to visualize
00:42:06.460 | Everything or visualize our graph. So I'm gonna take that
00:42:10.160 | let's pull it in, and
00:42:12.960 | it is...
00:42:18.400 | see what we...
00:42:20.040 | see what we get. So we have this, basically
00:42:23.320 | So the oracle... I think LangGraph is always adding this extra line here, this optional line,
00:42:30.360 | because it is
00:42:33.920 | basically allowing the LLM, if it decides to, to return a direct answer
00:42:38.680 | I think that's what that is
00:42:40.240 | But in reality, we've prompted it not to do that, and we try to set it up so it doesn't. But this is what we get
00:42:46.440 | All right
00:42:47.160 | So it's our entry point; it goes to the oracle. The oracle can go to final answer or search; if it goes to search,
00:42:52.040 | it goes back to the oracle, and
00:42:54.040 | then it would go to final answer, and we end
00:42:57.320 | Okay, and that that is our graph
00:43:01.080 | super simple, but again, we're using a tiny model, so
00:43:05.240 | Something overly complicated probably won't work
00:43:09.380 | So I'm gonna ask our graph agent: where is the best pizza in Rome? And let's see what it comes up with
00:43:20.200 | So we see it
00:43:22.200 | Invoked the search tool here
00:43:24.400 | So the query was best pizza in Rome; it got these three submissions from Reddit
00:43:31.120 | It then went back to the oracle, which went to the router; the router identified that it should go to the final
00:43:37.680 | answer, or, sorry, the oracle decided this, the router identified that, and
00:43:42.040 | we went to the final answer tool, and then we have our
00:43:47.280 | outputs. Okay: based on your question, I would recommend trying... I keep getting this recommendation; I
00:43:52.640 | will actually try it and see how it is
00:43:56.000 | It looks kind of interesting. It is
00:43:59.480 | not what I would actually expect, and it also isn't a Roman pizza; it's a Neapolitan pizza,
00:44:07.360 | which
00:44:09.160 | Yeah, I'm sure the Romans would not be happy
00:44:12.520 | that their top recommended pizza in Rome seems to be a
00:44:17.360 | Naples-style pizza. And I also tried this: where is the best gluten-free pizza in Rome? I'm not gluten-free, but
00:44:25.800 | my girlfriend is, so I thought, okay, let's see, and
00:44:31.960 | let me see,
00:44:36.000 | Let's see what we get. I didn't get a good response before
00:44:41.280 | So here it's like: unfortunately, I was unable to find specific recommendations for gluten-free pizza in Rome;
00:44:47.160 | it seems like it's only generally considered a good destination for gluten-free options. Generally, actually, it is
00:44:58.600 | Pizza in Trevi, maybe we can, you know, we can try that. Oh, there we go
00:45:04.360 | So, okay, so we did have some options here. I don't know why
00:45:09.600 | Expensive
00:45:11.200 | So pizza and Trevi. Let's have a look. Okay, so pizza in Trevi. Let's see
00:45:17.360 | Where you are?
00:45:21.760 | looks
00:45:23.360 | Interesting. So near the center
00:45:25.360 | Maybe it's a good option
00:45:27.360 | Gluten-free beer gluten-free pizza. There we go. So
00:45:30.640 | That is an option
00:45:33.480 | The other one we can also try, it's... so Pizza
00:45:38.960 | Bonatti,
00:45:40.120 | which is really interesting-looking
00:45:42.120 | Here so a little further out just south of Trastevere and the center over here
00:45:48.880 | But if you look they have some kind of cool-looking pizza
00:45:54.840 | Like what I would not usually expect to find in Italy like this
00:46:00.320 | unique for sure
00:46:06.160 | Looks good. So it seems like our
00:46:08.920 | fully local
00:46:11.920 | pizza-recommending agent does work. It's not perfect, but it does work, and it has some good recommendations there. Again,
00:46:20.160 | I will just point out that
00:46:22.160 | this is a tiny LLM,
00:46:24.240 | all right, so 8 billion parameters,
00:46:26.760 | running fully locally on my M1, I think,
00:46:32.400 | MacBook. So I don't have any sort of powerful thing running this, and
00:46:36.200 | because of the sort of limitations of the memory on my MacBook,
00:46:41.200 | I can only run the smaller models, like the 8 billion parameter model
00:46:45.200 | But you could also see how quickly that was responding; like, it did not take long to go through multiple agent steps,
00:46:52.000 | perform a search, read everything from that search, and produce an answer for us, and the answers were generally pretty good
00:46:59.440 | So, honestly, I think it's actually quite impressive that you can do that, for sure
00:47:04.440 | JSON mode seems to be the best approach for agents, at least for now
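The reason JSON mode helps is that the model is constrained to emit JSON, which parses straight into a tool call. A minimal sketch of that parsing step, with an illustrative output string and an assumed helper name `parse_tool_call`:

```python
# Sketch of why JSON mode helps: the constrained output parses directly
# into (tool name, tool arguments). The raw string is illustrative and
# parse_tool_call is an assumed name, not the video's exact code.
import json


def parse_tool_call(llm_output: str) -> tuple[str, dict]:
    """Parse a JSON-mode response into (tool name, tool arguments)."""
    try:
        data = json.loads(llm_output)
        return data["name"], data.get("parameters", {})
    except (json.JSONDecodeError, KeyError):
        # Malformed output: fall back to the final answer tool
        return "final_answer", {"answer": llm_output}


# What a JSON-mode response from Llama 3.1 might look like
raw = '{"name": "search", "parameters": {"query": "best pizza in Rome"}}'
name, args = parse_tool_call(raw)
```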
00:47:10.520 | I know that the Ollama team are planning to add forced function calling, which I think will make a difference,
00:47:18.240 | but for now JSON mode works perfectly. And, yeah, so that's how we would build a
00:47:24.800 | local agent using LangGraph and
00:47:28.480 | Ollama. So that's it for this video. I hope this has been useful and interesting, but I'll leave it there for now
00:47:34.880 | So thank you very much for watching, and I will see you again in the next one. Bye