
Local LangGraph Agents with Llama 3.1 + Ollama


Chapters

0:00 Local Agents with LangGraph and Ollama
1:00 Setting up Ollama and Python
5:35 Reddit API Tool
12:40 Overview of the Graph
17:11 Final Answer Tool
18:33 Agent State
19:09 Ollama Llama 3.1 Setup
26:21 Organizing Agent Tool Use
35:21 Creating Agent Nodes
39:14 Building the Agent Graph
43:10 Testing the Llama 3.1 Agent
46:07 Final Notes on Local Agents

Whisper Transcript

00:00:00.000 | Today we are going to be taking a look at how we can build our own
00:00:03.400 | local agents using
00:00:06.440 | LangGraph with Ollama. Now LangGraph, if you don't know it, is an open source library from LangChain
00:00:14.160 | that allows us to build
00:00:16.600 | agents within a graph-like structure, and
00:00:21.000 | it is currently my
00:00:24.320 | preferred way of building agents, whether they are on OpenAI or local, wherever. Now
00:00:30.960 | Ollama is another open source project that allows us to run
00:00:35.880 | LLMs locally just very, very easily
00:00:40.120 | So we're going to be running llama 3.1 the 8 billion parameter model
00:00:45.840 | Which is tiny and yet I get like reasonable responses and reasonable actions
00:00:54.120 | Coming from my agent, which is actually pretty cool. So with that, let's jump straight into it
00:01:00.040 | So we're going to be running everything locally here. This is slightly different to what I usually do
00:01:04.600 | We usually go through, like, a Colab notebook, and I'm sure you probably could run this in a Colab notebook,
00:01:09.520 | but I want to show you how to run this locally on Mac, because
00:01:13.180 | generally speaking, Ollama works well with Mac because we have the unified memory,
00:01:19.680 | which means we can run fairly large models quite easily and
00:01:23.160 | actually really fast as well. So the first thing that we would want to do is go to the Ollama website
00:01:27.800 | and install Ollama. So rather than going to the URL, just search "install ollama"
00:01:35.280 | Okay, and we have Mac OS
00:01:39.280 | Download Mac OS and we just download it. I already have mine downloaded. You can see the little icon up here
00:01:46.040 | So yeah, I'm not going to download it again
00:01:49.400 | But once you have downloaded it, you want to run this command in your terminal. So, ollama pull llama...
00:01:56.280 | actually, 3.1 is the one I'm using, not this one:
00:02:01.120 | llama3.1:8b. So I'm gonna paste that in here: ollama pull llama3.1:8b
00:02:08.080 | Right, so that has downloaded the model. It will take a little while
00:02:11.440 | It's literally I think it just maybe updated the model on my side here as I already had it installed
00:02:17.320 | Then we're going to want to set up a Python environment. I mean, you don't have to do this. It's up to you
00:02:22.200 | I would recommend it. The reason I say that is because we're going to be working from this
00:02:28.120 | Repository here. So this is the examples
00:02:31.400 | repo; there'll be a link to that in the comments below. So step one, I would just git clone that.
00:02:38.800 | So you can copy here,
00:02:45.280 | come here and just run git clone.
00:02:50.520 | That will download the entire repo for you, which is it's quite a lot in there to be fair
00:02:56.120 | But we only actually need this one bit and then you want to navigate into this directory
00:03:01.080 | so if I just show you the full path to where I am from the root of
00:03:06.760 | this directory. So I want to go into learn,
00:03:12.680 | generation, langchain, langgraph, and then the Ollama LangGraph agent directory.
00:03:17.500 | Okay, there we are now in here
00:03:21.460 | Do ls, and you can see that we have this poetry.lock and pyproject.toml file.
00:03:26.620 | So all the prerequisites that we need to run this little project
00:03:32.140 | are contained within here, right? And the way that we install these is...
00:03:41.160 | If we go to our readme
00:03:44.120 | Well, actually we need to set up our
00:03:47.320 | Python environment, so I'm gonna do that. I'm using conda here. I know people have opinions on which
00:03:55.280 | Package manager they like to use I've been using this forever. So I just stick with it
00:04:00.400 | So that was that is exactly how I would create my environment there
00:04:10.140 | So you can now see that I'm in my new Python environment
00:04:14.140 | Then I'm going to go ahead and install poetry. So pip install poetry then we're going to do poetry install
00:04:21.740 | Okay, you probably will not see this when you run this,
00:04:28.300 | but just in case, you can run poetry lock --no-update to kind of fix that, and then run poetry install again.
00:04:38.340 | So you can see everything has just been installed and then with that we can go over and start running notebook
00:04:43.940 | Now the notebook is here
00:04:46.020 | again, same little directory here and
00:04:49.540 | You can ignore this; if you were going with Colab, you could install everything, I think, with this.
00:04:58.280 | But we are not so
00:05:02.820 | What we need to do first is, so, in VS Code... actually, let me close the sidebar
00:05:07.300 | And zoom in so in VS code select kernel select another
00:05:13.860 | Python environment if you don't see your environment on this list
00:05:20.140 | You might have to restart VS code or I'm actually in cursor, but same thing and now I'm going to try again
00:05:26.980 | So it's picked up on my environment that I would like to use
00:05:31.980 | So now we can go ahead and start running everything
00:05:34.500 | So the first thing we're going to need to do... actually, we're gonna be using the
00:05:39.940 | Reddit API to get some suggestions for our agent. What our agent is going to be doing is
00:05:47.020 | recommending pizza for me in Rome, and the way that it's going to do that is it's going to
00:05:53.260 | search for something in my case here is going to be searching for good pizza in Rome and
00:06:01.740 | It is then going to
00:06:03.740 | Decide like if you ask something like what is the best pizza in Rome? It's going to decide. Okay, I'm gonna go use
00:06:09.660 | Reddit to search, or rather the search tool; that's what we tell it.
00:06:14.020 | We don't necessarily tell it Reddit. And it's going to return that information from a few different submissions on Reddit,
00:06:21.060 | Find what people are saying what their recommendations are and then it's going to tell me where I should go. So
00:06:27.900 | To use the reddit API we do need to sign up for it. You get a certain amount of free
00:06:32.900 | Calls every hour I think
00:06:35.660 | So we don't need to do I don't need to pay for anything, which is great
00:06:39.980 | you can follow this video, which is just I will leave a link again to this in the
00:06:46.380 | comments or
00:06:48.980 | You don't necessarily need to follow that. Instead, so, this is what I always do when I've
00:06:55.100 | done something in the past and I've kind of forgotten how to do it: I
00:06:59.340 | search reddit API and then I just add my name on to the end and then because I
00:07:05.860 | Created this in the past. I'm like, okay, this seems to make sense to me
00:07:10.540 | Which is I suppose a positive thing of doing all these videos and articles
00:07:21.220 | Come down here and this is quite old
00:07:24.980 | This is already three and a half years ago now
00:07:28.640 | So it's a while but still everything I think is still up to date so you can go to app preferences here
00:07:36.380 | So it's reddit.com slash
00:07:38.380 | prefs slash apps. I copy that, put it in here.
00:07:43.020 | We go here. I have these I don't even remember what these two were
00:07:50.420 | So it's interesting and then I create this one. I
00:07:55.100 | Think no. No, sorry. I created this one just now for this video
00:08:00.100 | So if I go into here and see all this information here, which I'm going to use so
00:08:05.700 | have my secret key. You can try and use this as well if you like; see what happens.
00:08:15.620 | I have my app key or ID. I'm not sure what is exactly
00:08:20.780 | What's it, client ID? Okay, and you just put them into here. So here we're initializing our,
00:08:26.900 | like, Reddit instance, or client. User agent: I think you just put the name of your app here.
00:08:35.420 | It has been quite a while. I should rewatch my video on it
00:08:42.140 | in any case
00:08:44.060 | That's what you do. And then if I scroll down a little bit more I
00:08:48.180 | Think yeah here we get into the actual code
00:08:52.260 | Now the code part is the part that I'm not going to follow, because I wanted to do this, like, from...
00:08:57.820 | like, using requests rather than using the PRAW library,
00:09:02.260 | which is what I would probably recommend doing. It's a lot easier, and
00:09:08.300 | it's what we're going to use here. So you see we're importing praw, which is the Python Reddit API Wrapper,
00:09:15.260 | so we're using that and
00:09:17.580 | What we're going to do is basically gather loads of data. So we're going to be getting,
00:09:21.980 | Well this information actually so for each submission that we search for and find for a particular question
00:09:29.260 | we're going to get the title of that submission the description, so I think that's like a first comment and
00:09:35.740 | Then the sort of like more top rated comments that were within the thread of
00:09:43.620 | that submission. And then I'm going to define a dunder str method here, which is basically just going to get called,
00:09:50.700 | like, let's say this is my object: to call the string
00:09:57.560 | method here, obviously we just do str(obj), right, and
00:10:03.060 | that will output a more, like, LLM-friendly representation of the information there. Okay, so
00:10:10.100 | I'm just
00:10:12.740 | defining that so that it kind of keeps things a little cleaner when we're
00:10:16.660 | building everything here. Otherwise, I'm using dictionaries and whatever else, and that's fine,
00:10:22.420 | nothing wrong with that, but it's a little bit not clean; or not organized is probably the right way to put it.
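The idea being described, a small data container whose dunder str method renders an LLM-friendly block of text, can be sketched roughly like this (the class and field names are my guess for illustration, not the repo's exact code):

```python
from dataclasses import dataclass, field

@dataclass
class Submission:
    """Hypothetical container for one Reddit submission's data."""
    title: str
    description: str
    comments: list[str] = field(default_factory=list)

    def __str__(self) -> str:
        # Render the submission as plain text that slots cleanly into a prompt
        comment_block = "\n".join(f"- {c}" for c in self.comments)
        return f"## {self.title}\n\n{self.description}\n\n{comment_block}"

sub = Submission(
    title="Best pizza in Rome?",
    description="Looking for recommendations.",
    comments=["Try the place near the Pantheon (45 upvotes)"],
)
print(str(sub))
```

Calling str(...) on the object (or just printing it) is then all the cleanup needed before handing the data to the model.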
00:10:29.060 | So let me go through this a little bit
00:10:31.060 | I don't want to focus too much on the reddit API stuff here because it's not really what we're here for
00:10:35.860 | But just very quickly, right so reddit subreddits
00:10:40.340 | We're looking through all of them and then we're searching across reddit best pizza in Rome. All right, so
00:10:48.940 | We initialize that list we go through we get, you know relevant information. We get all the comments. We include the upvotes
00:10:55.660 | I thought this is important and we also filter by the number of upvotes that the comments have
00:11:01.760 | The logic here could be better. We could, like, say: okay, these are the top rated comments here.
00:11:08.500 | I don't actually go with the top comments. I'm just going for kind of three of them that have you know, at least 20 upvotes
00:11:14.940 | which
00:11:17.020 | works, but, you know, it could be better. But anyway, it doesn't matter. This is just
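The filtering just described, taking the first few comments that clear an upvote threshold rather than the actual top-rated ones, looks something like this sketch (names and dict shape are my own, not the repo's):

```python
def filter_comments(comments: list[dict], min_upvotes: int = 20, limit: int = 3) -> list[dict]:
    """Keep up to `limit` comments that have at least `min_upvotes` upvotes.

    As in the video, this does NOT sort by score; it just takes the first
    few comments that clear the threshold, which works but could be better.
    """
    kept = []
    for comment in comments:
        if comment["upvotes"] >= min_upvotes:
            kept.append(comment)
        if len(kept) == limit:
            break
    return kept
```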
00:11:22.500 | Like a quick example, so you can see we run this and it's going through it's like, okay. I found these submissions
00:11:29.940 | This is Rome. Yep, since pizza is an American food. Yeah, it's a lot of Italians here
00:11:37.660 | That would not be very happy with that. We can then see what we got out from that
00:11:43.460 | So I'm doing the string method here to get the recommendations. You can see: title,
00:11:48.300 | Description I was a little disappointed. No
00:11:52.260 | after
00:11:53.380 | pasta and gelato
00:11:55.380 | nice and
00:11:57.580 | Yeah, so we have some
00:12:00.300 | recommendations here that they are coming up with, this guy Sisyphus rock. Cool. We have another one here:
00:12:08.380 | Go to Naples
00:12:11.100 | Angering a lot of Romans right now. And yeah, we have some other ones as well
00:12:18.820 | Okay, that's kind of what we're getting out from that tool
00:12:20.980 | What we're going to do is just wrap all of that in a function here. That's all I'm doing here:
00:12:24.940 | so given a particular query, I'm going to rerun all of what we just did again.
00:12:29.580 | I don't want to go through all this too much. This is all like, you know set up API stuff
00:12:34.340 | It's not really the agent itself. Cool. So we have all of that
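That wrapper function might look roughly like this; the `reddit` argument is assumed to be the PRAW client initialized earlier (the exact code in the repo differs, and the helper name here is mine):

```python
def search(query: str, reddit, max_results: int = 3) -> str:
    """Search r/all for `query` and return an LLM-friendly text blob.

    `reddit` is assumed to be an initialized praw.Reddit instance (or
    anything with the same .subreddit(...).search(...) shape); comment
    filtering mirrors the ">= 20 upvotes, max 3 comments" idea above.
    """
    results = []
    for submission in reddit.subreddit("all").search(query, limit=max_results):
        comments = [
            f"{comment.body} ({comment.score} upvotes)"
            for comment in submission.comments
            if getattr(comment, "score", 0) >= 20
        ][:3]
        results.append(
            f"## {submission.title}\n\n{submission.selftext}\n\n" + "\n".join(comments)
        )
    return "\n\n".join(results)
```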
00:12:41.260 | Now, in our graph...
00:12:43.260 | it might help if I visualize this a little bit. Okay, so we have our,
00:12:48.580 | Like reddit search API tool thing here. I'm just gonna call it search
00:12:54.500 | then we're also going to have
00:12:57.340 | If you've watched previous videos from me on
00:13:00.860 | LangGraph, you will see this pattern quite a lot: rather than me allowing
00:13:06.100 | the agent to
00:13:08.780 | Return an answer directly
00:13:11.100 | I actually like to use a structured output format and the way that I do that is using a tool
00:13:16.060 | well, using a tool usually. Here, we're not technically using a tool; we're using the JSON output format.
00:13:22.580 | I'll talk a bit more about the difference there soon
00:13:25.060 | But you can think of this as basically we're using a tool. So we have these two tools
00:13:32.100 | Technically kind of, not really, but it's fine. And then we have our LLM, which I usually call the Oracle,
00:13:41.860 | thanks to like
00:13:45.060 | the I
00:13:46.580 | think they called it the Oracle in some LangChain documentation a long time ago, and I just like the name, so I'm sticking with it.
00:13:52.660 | so the Oracle
00:13:55.060 | like, the Oracle is
00:13:57.380 | the decision maker, right? So it makes a decision based on, you know, what is going on, right?
00:14:03.540 | So we're gonna have our query coming in
00:14:05.540 | So it's our user query comes in then the Oracle is like, okay based on this query, what do I need to do?
00:14:14.540 | Alright, could I just answer the user directly right if I'm just saying?
00:14:18.860 | Hello, how are you?
00:14:21.460 | whatever else, right? It's just, like, small talk. I don't need to use... you know,
00:14:28.140 | why would you use the search tool? There's no need.
00:14:30.900 | So instead the Oracle should be able to just go directly to the final answer tool
00:14:37.580 | Okay, so we do give it that option and then if it goes to the final answer tool
00:14:43.260 | it's gonna output a structured output format, which is gonna look a little bit like: we have the answer,
00:14:49.300 | which is like a natural language answer, and then we have some other parameters. I think it's like phone number,
00:14:56.180 | So it's like a phone number for the restaurant if the if the agent has seen that within the data
00:15:04.780 | I think honestly using reddit comments. It probably isn't gonna come up with that, but you can you know, we can try and
00:15:12.420 | address
00:15:15.540 | Again, it's like Street address
00:15:17.540 | So that will be output formats, but of course, you know the phone number and address it doesn't need to output that every time
00:15:25.180 | Okay, so it's like they're more like optional parameters. I think in the prompting
00:15:29.580 | We just tell the agent to keep those empty if it doesn't know
00:15:33.180 | But answer it should provide every time now on the other hand if I ask
00:15:38.620 | okay, tell me where to find the best pizza in Rome. In that case, alright, the Oracle will hopefully,
00:15:45.660 | Hopefully use a search tool
00:15:48.620 | Right. So when it uses the search tool we go here
00:15:53.820 | It will get some information and then it will actually go back with that new information
00:16:00.420 | To the Oracle, right? So this is a like it has to go back
00:16:05.860 | so that's why I'm making this line a solid line, whereas these lines are dotted because
00:16:11.620 | these are, like, optional: it could go to final answer, it could go to search.
00:16:16.080 | Once it goes to search it has to go back to the Oracle component
00:16:20.860 | Then the Oracle component given this new information is like, okay. Now, what do I need to do?
00:16:25.960 | Ideally, it should just go to the final answer every single time
00:16:29.900 | Like this is a I think quite a simple agent like it is really not complex
00:16:36.200 | whatsoever
00:16:38.260 | But again, we're using a tiny LLM for this, right?
00:16:43.180 | Llama 3.1 is very good, but we're using the 8 billion parameter version of that, which is tiny. So
00:16:50.200 | Honestly the fact that this works at all is actually kind of surprising to me, but pretty cool, right?
00:16:57.220 | And it's relatively reliable as well. Like, there's the odd hallucination,
00:17:01.900 | There's the odd like going straight to final answer when it should go to search
00:17:06.060 | But I don't see those issues all that often. So it is really not too bad. Okay, cool
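Stripped of LangGraph itself, the control flow in that diagram boils down to a loop like this (all names here are illustrative, not the repo's actual code): the Oracle picks a tool, search loops back to the Oracle, and final answer terminates.

```python
def run_agent(query: str, oracle, tools: dict, max_steps: int = 5) -> dict:
    """Oracle picks a tool each step; `final_answer` ends the run,
    any other tool's output is fed back to the oracle."""
    scratchpad = []
    for _ in range(max_steps):
        action = oracle(query, scratchpad)  # -> {"name": ..., "input": ...}
        if action["name"] == "final_answer":
            return tools["final_answer"](**action["input"])
        # e.g. the search tool: run it, then go back to the oracle (solid line)
        output = tools[action["name"]](**action["input"])
        scratchpad.append((action, output))
    raise RuntimeError("agent never reached final_answer")
```

The dotted lines in the diagram are the two branches of the `if`; the solid line back to the Oracle is simply the next loop iteration.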
00:17:12.620 | So now that I've explained that we can jump back into this
00:17:15.980 | So we have our final answer tool that we are going to be using
00:17:20.320 | okay, so we have all of this so yeah, we
00:17:25.060 | initialize that. All this is doing is formatting the output there; it's not actually doing anything.
00:17:29.300 | It's just returning: like, the input to this will literally be the same as the output. So there's nothing,
00:17:34.980 | nothing going on there, really. Then, once we have our two, like, we have our search function and
00:17:42.900 | we have our final answer function. Note that I'm not using LangChain tools here, and
00:17:49.060 | there is a reason for that. Basically, we're not using all of the
00:17:54.100 | LangGraph or LangChain
00:17:56.100 | Functions directly here. We're actually going direct to
00:17:59.860 | Ollama.
00:18:01.900 | That's because, honestly, I just found the Ollama implementation via LangChain to be lacking in some places,
00:18:08.960 | particularly with the tool calling, which we're not actually using anymore; but the tool calling I couldn't even get to work whatsoever. So
00:18:16.900 | because of that I switched to using Ollama directly and
00:18:21.540 | just stuck with it, because honestly,
00:18:24.300 | there's not really much need to use the wrapper
00:18:27.860 | from LangChain for Ollama, in my opinion. I kind of prefer doing it this way. So
00:18:33.580 | We initialize the agent state
00:18:36.620 | I'm not gonna go too into detail on what all of these parts are here because I covered all this I think
00:18:42.980 | Pretty well in my previous video on
00:18:47.820 | LangGraph, so I would just recommend
00:18:49.820 | having a look at that again. I'll make sure there's a link to that in the description and in the comments if you are
00:18:58.040 | Interested in more but basically this is an object that is persisted within every step of our
00:19:06.340 | our agent graph
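For reference, a LangGraph-style agent state is typically a TypedDict along these lines; the field names here are my guess based on the video, so check the repo or the previous video for the exact definition:

```python
import operator
from typing import Annotated, TypedDict

class AgentState(TypedDict):
    """State persisted across every step of the agent graph."""
    input: str                  # the user's current query
    chat_history: list[dict]    # earlier turns of the conversation
    # Annotated with operator.add so each node APPENDS its actions
    # to the list instead of overwriting the previous steps
    intermediate_steps: Annotated[list, operator.add]

state = AgentState(
    input="where is the best pizza in Rome?",
    chat_history=[],
    intermediate_steps=[],
)
```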
00:19:09.060 | So the LLM, as I mentioned before: the LLM is the Oracle. It's our decision maker.
00:19:15.300 | So we're just setting it up. We have our system prompt here. Like you are the Oracle AI decision maker
00:19:21.580 | You are to provide the user the best possible restaurant recommendation including key information about why it should consider
00:19:29.020 | visiting or ordering
00:19:31.700 | So and so on I mentioned here returning the address phone number websites
00:19:36.540 | Now this bit's important, because the Ollama tool calling at the moment is,
00:19:44.780 | not that... like, it works, but you can't force
00:19:48.500 | Tool calling and I think because you can't force the tool calling
00:19:52.900 | I found the tool calling to be really hit and miss especially when you start adding multiple tools and even more so
00:19:59.680 | When you start adding multiple
00:20:02.180 | steps to the agent, where it can use one tool and another tool and another tool, right? So for example,
00:20:10.020 | In that agentic flow that I showed you where it uses a search tool and it uses a final answer tool
00:20:15.980 | I could not get Ollama working where it would use
00:20:20.220 | both one after the other so I had to in the end just switch back to the
00:20:27.140 | like, JSON formatting, and with JSON formatting it works really well.
00:20:31.260 | You just need to make sure that you prompt it within your system prompt to use the JSON format
00:20:36.380 | So that's what I'm doing here, right when using a tool you provide the tool name and the arguments to use in JSON formats
00:20:42.760 | You must only use one tool and the response form must always be in the pattern and then you give it that
00:20:49.000 | The JSON output for that right here. I said don't use the search tool more than three times
00:20:56.300 | actually, I try and get it not to use it more than once but I
00:21:00.180 | Wanted to leave a little bit of flexibility there
00:21:02.980 | if we tell it that if it uses more than three times, then we threaten it with nuclear annihilation and
00:21:09.820 | That seems to work some of the time. It's quite
00:21:13.460 | Yeah, it's a daring LLM for sure. Then what I do is, after using the search tool,
00:21:20.180 | you must summarize your findings with the final answer tool. That's what I want it to do.
00:21:23.340 | Okay, so I'm just telling it like giving it as much context as I possibly can
00:21:30.580 | Okay, so that is
00:21:32.580 | Set up one other thing. I wanted to get in here is the function schemas or tool schemas. So
00:21:39.660 | I'm using some utils from semantic router for this so you can
00:21:45.320 | see what we're doing. Hopefully you won't need to; this is, like, I think, a bug in the library at the moment,
00:21:51.260 | so you should not need to do that soon, but at the moment it's there.
00:21:56.420 | So yeah, you have this. And also, sorry, so: this you can use with Ollama tool calling.
00:22:04.060 | That that's what it's built for
00:22:06.180 | But you can also just use it to provide like the JSON schema of the tools that you would like the agent to use
00:22:13.740 | When you're using JSON mode
00:22:16.220 | So it basically you take your function. So this is a function we described earlier
00:22:22.740 | we create this FunctionSchema object with it, and then we just use the to_ollama method, and then we output this.
00:22:29.360 | Okay, that's it, right? So it's taking your docstring, taking your parameters, and
00:22:35.980 | I think that's basically all we need from that to be honest. All right, cool
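If you want to see the idea without the semantic-router dependency, the same thing (derive a schema from a function's docstring and signature) can be sketched with the standard library. This is a simplified stand-in, not the actual to_ollama output:

```python
import inspect

def function_schema(func) -> dict:
    """Build a minimal JSON-style schema from a function's signature
    and docstring (a simplified imitation of what semantic-router's
    FunctionSchema produces; the real field layout differs)."""
    sig = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: str(param.annotation)
            for name, param in sig.parameters.items()
        },
    }

def search(query: str) -> str:
    """Search Reddit for restaurant recommendations."""
    ...

schema = function_schema(search)
print(schema)
```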
00:22:40.460 | And then we also do the same for the final answer. Yeah
00:22:44.020 | Okay. Yeah, and that's it. So we we have our
00:22:48.980 | tools to use with JSON mode, and we can go ahead and actually try using
00:22:57.180 | the model. Again, one thing I mentioned earlier is that you do need to have run this: ollama pull llama3.1
00:23:07.260 | So let's see 8 billion parameters. I don't remember what the largest size is, but you can just modify it
00:23:12.940 | So I'm pretty sure it's not this size
00:23:15.460 | But if it was like 38 billion parameters, you just you just put that in there
00:23:21.060 | Pretty simple. Also the quantization stuff, you can put it into there, like, around here. So we have
00:23:29.180 | model
00:23:31.460 | we have messages and format. So this is important; this is what I mentioned before:
00:23:34.700 | we always want the LLM to be outputting in JSON format, so we have that structured
00:23:40.100 | output that we can then process. So yeah, we do want that.
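The call itself then looks roughly like the following. I'm only building the kwargs dict here so the shape is visible; the real call would be ollama.chat(**request), and you should check the exact argument names against the ollama package's docs:

```python
def build_chat_request(model: str, messages: list[dict]) -> dict:
    """Assemble the kwargs for the Ollama chat call. Setting
    format='json' is what forces the model to emit parseable JSON."""
    return {"model": model, "messages": messages, "format": "json"}

request = build_chat_request(
    "llama3.1:8b",
    [
        {"role": "system", "content": "You are the Oracle, the AI decision maker."},
        {"role": "user", "content": "hello there"},
    ],
)
# with the ollama package installed and the model pulled, you would run:
# response = ollama.chat(**request)
```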
00:23:45.740 | That's how we get the tools and
00:23:48.420 | everything to work. And then what I do is, so, we have this get system tools prompt:
00:23:54.420 | so I'm basically combining the system prompts that we defined earlier and
00:23:58.620 | I'm also taking the tools that we have defined here and
00:24:02.100 | Then putting them together right in this little
00:24:05.940 | Function so here the system prompt the tools which is a list of dictionaries
00:24:10.260 | We create a tools string and then we have system prompt few newline characters
00:24:15.180 | And then you can use the following tools and then we describe those tools there. So that is our sort of
00:24:22.500 | tool augmented system prompt
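A minimal sketch of that prompt-assembly step, system prompt plus a newline-separated dump of the tool schemas (the wording and function name are mine; the repo's version differs):

```python
import json

def get_system_tools_prompt(system_prompt: str, tools: list[dict]) -> str:
    """Append each tool's JSON schema below the base system prompt,
    so the model knows exactly which tools it may call."""
    tools_str = "\n".join(json.dumps(tool) for tool in tools)
    return f"{system_prompt}\n\nYou may use the following tools:\n{tools_str}"

prompt = get_system_tools_prompt(
    "You are the Oracle, the AI decision maker.",
    [
        {"name": "search", "description": "Search Reddit for recommendations"},
        {"name": "final_answer", "description": "Return the final answer"},
    ],
)
```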
00:24:26.180 | just passing our tools there and
00:24:29.140 | Then this is a simplified version here
00:24:33.740 | So I'm trying to say hello there, right and what we should see is when I say hello there
00:24:37.660 | It should not use the search tool. It should just use a final answer tool
00:24:40.980 | I'm missing something
00:24:43.460 | system prompt
00:24:45.460 | Okay, I need to run this
00:24:48.700 | And run this again
00:24:51.220 | Okay, cool. Let's see what we got. So yep. I went straight to final answer and
00:24:57.380 | Okay, you can see... so the final answer outputs everything in a string,
00:25:02.100 | but we just parse it. Okay, so we have
00:25:06.260 | Message
00:25:08.820 | Content. Ah, okay. Perfect
00:25:10.820 | So we have the name, final_answer;
00:25:13.260 | that's the tool that we'd like to use, and then the parameters that we want to feed in there. It's just, like: hello,
00:25:18.060 | I'm here to help you find a great restaurant.
00:25:19.860 | What kind of cuisine are you in the mood for? And then, of course, phone number and address:
00:25:25.260 | you know, it doesn't need to answer those, so it just left them as None.
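Parsing that JSON string back into a tool name and its parameters is a couple of lines; the "name"/"parameters" keys below match what shows up in this demo output, though treat the helper itself as a sketch rather than the repo's code:

```python
import json

def parse_tool_call(message_content: str) -> tuple[str, dict]:
    """Turn the model's JSON output string into (tool_name, tool_input)."""
    data = json.loads(message_content)
    return data["name"], data["parameters"]

name, params = parse_tool_call(
    '{"name": "final_answer", "parameters": {"answer": "Hello! '
    'I\'m here to help you find a great restaurant.", '
    '"phone_number": null, "address": null}}'
)
```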
00:25:29.980 | Okay, cool. Now, let's see if we can get it to use the... I put web search here; it's actually just
00:25:37.100 | Reddit search. I suppose it's a Reddit search tool. And:
00:25:40.980 | Hi, I'm looking for the best pizzeria in el...
00:25:44.620 | Rome. So I'm actually not going to go with that, because it's a very specific place and I think there's, like, no one on
00:25:52.460 | Reddit talking about pizza there. So let's just go with Rome
00:25:58.340 | Okay, so the agent based on this so you see we have chats
00:26:03.620 | We pass all this stuff in and asking for that
00:26:06.460 | It said okay, I'm going to use search tool and I'm going to use search tool with this query best pizzeria in Rome
00:26:13.700 | Okay, so that worked; it decided to use the right tool, which is pretty cool, especially given the model size.
00:26:20.380 | Okay now
00:26:22.420 | we're gonna use a Pydantic BaseModel again.
00:26:26.620 | So we're gonna be using this for the agent actions
00:26:30.580 | So the agent actions are, well, actually, what we just saw:
00:26:34.660 | that's this, an agent action. The agent is deciding it's going to use the tool name of search;
00:26:39.840 | The tool inputs is going to be this dictionary here and then tool output
00:26:45.760 | We don't have that yet because we need to run the tool to get that
00:26:48.760 | We handle that later in some other a little chunk of code. So
00:26:53.460 | from Ollama. So we have the Ollama response; again, the Ollama response is never going to include the tool output,
00:27:00.380 | so we just include tool name and tool input.
00:27:02.660 | So basically, what we've got here is what's happening here, right?
00:27:08.540 | So we're just passing the Ollama response into this AgentAction object.
00:27:14.540 | Then what we are doing here. We're getting the text
00:27:19.660 | so: what tool was used, the tool name, the input, so the parameters, and,
00:27:25.460 | If we have the tool output because we add that later
00:27:29.300 | We're also going to pass the we're going to return the output. Now we return that text so
00:27:35.980 | we can see, alright, we create that: we have now an AgentAction object, tool name search, tool input
00:27:43.260 | this, and tool output None, because we haven't set that yet.
00:27:47.820 | So that is good. And why do I care about doing that again? I just want to keep things organized and
00:27:54.160 | two, when it does come to passing this, like, multiple steps of where an agent might be doing different things,
00:28:01.620 | Like it may use search and then it may use the final answer tool or maybe it's going to use search
00:28:06.140 | three times hopefully not anymore and
00:28:08.780 | then use the final answer tool. We want to keep a log of what is happening, and the way I've set it up here,
00:28:16.340 | I don't know if this is the best way of doing it with llama 3.1
00:28:19.800 | but the way that I set it up here is that it's going to take these
00:28:23.380 | Agent actions and it's going to format them into like a single agent action. It's going to format it into two
00:28:30.220 | Messages which makes it appear like it's a conversation happening between the assistant and the user
00:28:35.820 | Okay, so it's like the assistant is providing the function call, and then the user is answering based on the
00:28:44.500 | Output of that function call. Okay, so
00:28:47.480 | Let's see if we can if I can give you an example
00:28:50.540 | so action to
00:28:53.300 | message or messages
00:28:55.300 | Okay, we have our agent action here and then we have the action to message function
00:29:02.300 | Okay, so this is just an example, right fake tool name fake query fake output from the function call
00:29:10.580 | And we will get this so we're gonna get an assistant message with the inputs and the user message
00:29:16.620 | Representing the output and then we're gonna feed that into our agent as it's kind of going through this process of using tools
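That action-to-messages conversion can be sketched like so; the dict field names are assumptions on my part, not copied from the repo:

```python
import json

def action_to_message(action: dict) -> list[dict]:
    """Render one past tool use as an assistant/user message pair, so the
    model sees its own tool call and the tool's result as a mini-dialogue."""
    return [
        {
            "role": "assistant",
            "content": json.dumps(
                {"name": action["tool_name"], "parameters": action["tool_input"]}
            ),
        },
        {"role": "user", "content": str(action["tool_output"])},
    ]

msgs = action_to_message(
    {
        "tool_name": "fake_tool",
        "tool_input": {"query": "fake query"},
        "tool_output": "fake output from the function call",
    }
)
```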
00:29:24.220 | So that's what we're doing here. So the create scratchpad function is basically handling this
00:29:30.120 | conversion for us for multiple actions and
00:29:34.300 | Then that scratch pad gets inserted
00:29:38.740 | Into here, right? So after the previous like the current user input
00:29:45.020 | We then add a little bit of additional logic around that as well
00:29:49.960 | So if the scratchpad has been called at least once, so there's at least one tool use...
00:29:55.820 | You know, it's a small LLM, so it needs a little bit of extra
00:30:00.540 | Guidance, so that is what I've done here. So I've added basically another user message
00:30:06.340 | I append onto the scratchpad
00:30:09.260 | Messages saying okay, please continue as a reminder. My query was this original query
00:30:16.020 | The reason I added this is because it tended to... I
00:30:18.940 | would find that the agent would go off and start searching about, you know,
00:30:23.660 | The best food in it would start with Rome and then it would be like, okay
00:30:27.220 | what's the best food in LA and then what is the best food in like Austin like it would just kind of
00:30:34.660 | start asking, like, what is the best food in all these different places? And of course, I don't want it to do that.
00:30:40.100 | So I'm just reminding it again. Look, this is my original query. This is what I want to know about
00:30:45.180 | So yeah, I found that to be relatively important for this model and like only answer the original query and nothing else
00:30:52.740 | So I'm trying to encourage it to not
00:30:55.020 | Kind of view those other messages as something that it should respond to
00:31:00.620 | Right that the kind of fake messages I created by the scratch pad again
00:31:04.980 | there's probably a better way of doing that but it's just you know for this example and
00:31:09.140 | Then another thing that I found is that it would be quite
00:31:12.820 | Loose on details in the answer field
00:31:17.500 | Like I wanted it to give me a bit of more of like a human sounding description like oh
00:31:22.140 | you should try this because you know X Y & Z and you should try this other place because
00:31:26.200 | so on and so on and
00:31:28.820 | what I would find is I'd be like, hey, you should try X and
00:31:32.540 | That would be it. All right, and so it was like not very interesting
00:31:38.660 | So I added this a little bit of prompting here and that seemed to improve things
00:31:43.540 | Then I just asked it to remember to leave the contact details are prompting looking restaurant if possible now
00:31:50.500 | another thing that I still found it was doing, even after adding all of this, is
00:31:56.060 | it would maybe not search for what is the best food in LA or what is the best food in New York,
00:32:01.140 | and so on, but it might start saying, okay, what is the best food in Rome?
00:32:05.660 | Cool, what is the
00:32:08.540 | most recommended food in Rome? Like that, or even just repeat the exact same query again. So I
00:32:15.580 | added another little bit to the scratchpad, as soon as it has used the search tool, to say you must now use the final answer
00:32:22.580 | tool. To kind of be like, okay, just use the final answer tool and stop using the search tool
00:32:28.620 | So, yeah, that helps. It does limit the agent a little bit in,
00:32:34.060 | okay, maybe using the search tool a few times to try different search terms, but I found it didn't really need that
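The scratchpad logic just described can be sketched roughly as below. This is a minimal reconstruction of the idea, not the video's exact code; names like `create_scratchpad`, `AgentAction`, and `intermediate_steps` are assumptions based on the walkthrough.

```python
# Sketch of the scratchpad logic described above. The function and field
# names (create_scratchpad, AgentAction, intermediate_steps) are assumptions
# based on the walkthrough, not the exact code from the video.
from dataclasses import dataclass


@dataclass
class AgentAction:
    tool: str
    tool_input: dict
    log: str  # the tool's output, stored after execution


def create_scratchpad(query: str, intermediate_steps: list[AgentAction]) -> list[dict]:
    """Turn past tool calls into fake chat messages, with extra guidance."""
    messages = []
    for action in intermediate_steps:
        # Replay each tool call and its result as assistant/user turns
        messages.append({"role": "assistant",
                         "content": f"{action.tool}({action.tool_input})"})
        messages.append({"role": "user", "content": action.log})
    # Remind the small model of the original query so it stays on topic
    reminder = (f"Please continue. As a reminder, my query was: {query}. "
                "Only answer the original query and nothing else.")
    # Once the search tool has been used, force the final_answer tool
    if any(a.tool == "search" for a in intermediate_steps):
        reminder += " You must now use the final_answer tool."
    messages.append({"role": "user", "content": reminder})
    return messages
```

The key point is that the reminder is appended on every turn, and the "you must now use the final answer tool" constraint only kicks in after a search has happened.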
00:32:41.860 | Anyway, so this was fine. Then, yeah, we put everything together as we did before,
00:32:47.060 | so we have the system prompt as before, the chat history, the user's query, and then the scratchpad, and
00:32:54.540 | then, yeah, we
00:32:57.180 | just make the query, once I remove this, and
00:33:02.860 | yeah, we return the agent action, and run this
00:33:07.780 | Cool, so we're gonna try the
00:33:11.420 | call LLM function, which is actually this one we just went through, and, yeah, we create some, like, fake chat history
00:33:18.900 | So: hi there, how are you? I'm currently in Rome
00:33:21.220 | So actually, one important thing here, look, that I mentioned:
00:33:25.300 | I'm currently in Rome, and then I'm like, hey, I'm looking for the best pizzeria near me
00:33:30.580 | Right, so I'm mentioning this in the history, like, my current location, and then I'm asking for the pizzeria
00:33:37.900 | So I'm just testing here that, okay, chat history is actually considered. It's important
00:33:42.900 | Okay, so you see that sometimes it's not perfect
00:33:46.820 | So this time it decides to go with final answer straight away, and then we can see that the chat history was
00:33:52.340 | considered. So: considering your location, Rome, and your desire to try a local pizzeria, I would recommend trying out Pizzeria
00:34:00.180 | la Monte Carlo. Now,
00:34:02.900 | this doesn't exist, or I think it's a
00:34:07.940 | pizzeria in
00:34:09.940 | Switzerland, because it kept recommending me this all the time
00:34:13.500 | I've been through many
00:34:16.220 | iterations of getting this to work
00:34:19.460 | Yeah, that was a hallucination. But then we run it one more time, and it worked
00:34:24.780 | Okay, so there's a little bit... you know, it could do with a little bit of work in some places, but
00:34:30.060 | the second time it works: best pizza in Rome, that's what I'm looking for, okay, using the search tool
00:34:38.820 | That is our core LLM function. I'm gonna be getting into the graph stuff in a moment, but let's try
00:34:46.180 | Let's try taking this and feeding it into our search function and seeing what we get
00:34:52.740 | Cool, so that looks pretty relevant. I think very similar results to what we would have seen earlier as well
00:34:59.600 | Yeah, it's the Rome "best pizza of my life"
00:35:02.660 | post, with 202 upvotes. I
00:35:07.020 | recently traveled to Italy as well; the week after I came back,
00:35:10.260 | hardcore cravings took pizza from there
00:35:12.900 | Decent-sized city, and, yeah, Rome, Italy has good pizza. I agree
00:35:18.140 | So we have those results. Now, what we've just done is we've kind of set up all the core logic of the different
00:35:26.700 | components of our
00:35:29.020 | graph-based agent, but we need to, like, put everything together
00:35:33.820 | We need to connect everything, which we haven't started to do yet. So
00:35:38.220 | to do that, we are going to set up a few
00:35:41.500 | components that are going to be, they're almost like wrappers for our functions,
00:35:46.380 | that will be used within the graph itself
00:35:50.100 | And the reason we need these wrappers is because we're actually using this state object
00:35:54.380 | Remember the agent state from earlier? That gets passed directly into here
00:35:58.460 | It's not true that this is a list; my typing is wrong there
00:36:03.320 | Let me just check what it actually is. Okay, it's a TypedDict,
00:36:09.500 | so let me take that and fix that quickly. Okay, so
00:36:14.040 | TypedDict here, and
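A minimal sketch of what that state object typically looks like, assuming the common LangGraph pattern of a `TypedDict` whose `intermediate_steps` field accumulates rather than being replaced. The field names here are assumptions, not the video's exact code.

```python
# A minimal sketch of the agent state: a TypedDict whose intermediate_steps
# field accumulates (via operator.add) rather than being overwritten.
# Field names are assumptions based on the walkthrough.
import operator
from typing import Annotated, TypedDict


class AgentState(TypedDict):
    input: str                   # the user's query
    chat_history: list[dict]     # prior conversation turns
    # Annotated with operator.add: the graph appends node outputs to this
    # list instead of replacing it
    intermediate_steps: Annotated[list, operator.add]


state: AgentState = {
    "input": "where is the best pizza in Rome?",
    "chat_history": [],
    "intermediate_steps": [],
}
```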
00:36:19.780 | Okay, cool. So let's just have a quick look at these. Run oracle:
00:36:24.080 | so this is just running our LLM. Okay, so call LLM, chat history, state
00:36:29.980 | So, okay,
00:36:32.580 | yeah, I mean, it is what it is; we already went through the call LLM function. We then have our router, so,
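The wrapper pattern being described, a node that unpacks the state, calls the LLM, and returns a partial state update, can be sketched like this. `call_llm` here is a trivial stand-in for the function built earlier; all names are assumptions.

```python
# Sketch of the node wrapper pattern: graph nodes take the state dict and
# return a partial state update. call_llm is a stand-in for the real LLM
# call built earlier; names are assumptions, not the video's exact code.
def call_llm(query: str, chat_history: list, intermediate_steps: list) -> dict:
    # Placeholder for the real LLM call that picks a tool and its arguments
    return {"tool": "search", "tool_input": {"query": query}}


def run_oracle(state: dict) -> dict:
    """Wrapper node: unpack state, call the LLM, return a state update."""
    action = call_llm(
        query=state["input"],
        chat_history=state["chat_history"],
        intermediate_steps=state["intermediate_steps"],
    )
    # The returned dict is merged into the state; because of the
    # operator.add annotation, intermediate_steps accumulates
    return {"intermediate_steps": [action]}
```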
00:36:40.580 | if you remember, here, right, the oracle can go one of two ways:
00:36:47.420 | it can go to final answer, or it can go to search. The way that this is handled is actually,
00:36:52.160 | there's more like a...
00:36:54.240 | there's like an intermediate step here, which is the router, okay? And
00:36:59.380 | that is like, okay, based on what the oracle outputs,
00:37:04.480 | I will send you to one of these two directions, one of these two places
00:37:09.020 | So that is what this is doing, and it also,
00:37:12.420 | so we also include some error handling in here. So if we see a
00:37:16.980 | tool name that we don't recognize,
00:37:21.260 | we go directly to the final answer. Okay, which might not be the best error handling, actually. I'm not sure if it would work;
00:37:28.980 | I don't know if I handle it or not, but, yeah, it doesn't really matter. I haven't seen it fail,
00:37:33.820 | so it's okay
00:37:35.180 | Then we have this dictionary here,
00:37:37.460 | which will go from... so if we see the term search, we know we need to use the search function; if we see the term
00:37:44.660 | final answer, we know we need to use the final answer function, and we use that in here
00:37:50.180 | So we have this tool string to function, which is
00:37:53.980 | provided with the tool name based on the output from the
00:37:58.580 | oracle, and
00:38:01.180 | based on that we're gonna get the function, right?
00:38:03.860 | So if it passes in the search string from the oracle, here, this,
00:38:08.060 | what I'm highlighting right here, is going to become the search function, and then in the search function
00:38:14.240 | we're going to pass in the tool arguments from the oracle, right? Then from that we're going to get
00:38:20.640 | everything we need to construct an agent action, and
00:38:25.040 | then, if the tool name is final answer, we output in this format
00:38:31.360 | I'm not too sure why we need to do that; I will leave it, but maybe just
00:38:35.440 | question that. I'm not sure if it's needed
00:38:40.320 | Otherwise, we're going to add our action output to the intermediate steps. We should see, yeah,
00:38:47.120 | what we've output, right? So if we're using the search tool,
00:38:50.720 | we're going to add that to our intermediate steps. The way that the state works here is that this
00:38:55.680 | single item here doesn't replace the entirety of the intermediate steps;
00:38:59.720 | it actually gets added to the intermediate steps. A little bit weird in terms of syntax, in my opinion,
00:39:06.200 | but that is actually what is happening,
00:39:08.440 | just to make you aware
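The router and tool dispatch just described can be sketched with plain Python stand-ins. The real tools hit Reddit and the LLM; names like `tool_str_to_func` and `run_tool` are assumptions based on the walkthrough.

```python
# Sketch of the router and tool dispatch. The real tools hit Reddit and the
# LLM; these are trivial stand-ins, and the names are assumptions.
def search(query: str) -> str:
    return f"reddit results for: {query}"     # stand-in for the Reddit tool


def final_answer(answer: str) -> str:
    return answer                             # stand-in for the final answer tool


tool_str_to_func = {"search": search, "final_answer": final_answer}


def router(state: dict) -> str:
    """Read the oracle's latest tool choice and pick the next node."""
    tool_name = state["intermediate_steps"][-1]["tool"]
    if tool_name in tool_str_to_func:
        return tool_name
    return "final_answer"                     # fallback for unrecognized tools


def run_tool(state: dict) -> dict:
    """Execute the chosen tool and append the result to intermediate_steps."""
    action = state["intermediate_steps"][-1]
    out = tool_str_to_func[action["tool"]](**action["tool_input"])
    # Single item in a list: it gets *added* to intermediate_steps,
    # not substituted for them
    return {"intermediate_steps": [{"tool": action["tool"], "log": out}]}
```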
00:39:10.480 | So we have that I need to run this
00:39:13.600 | Then we have
00:39:17.160 | you know, our components for the graph; they're ready. We can construct the graph
00:39:21.960 | Okay, so we initialize the graph with our agent state,
00:39:26.720 | so that earlier state object, which is a TypedDict, not a list
00:39:31.360 | Then we add some nodes to our graph, so we have our oracle, our search, our final answer
00:39:38.840 | What that will look like is literally this here, ignoring the little router in the middle. That does exist,
00:39:46.240 | you know, it just isn't included in the nodes, and the reason it doesn't exist in the nodes is that it actually exists more
00:39:53.680 | like within this conditional edge object here. So the conditional edge is basically, that's like the dotted line,
00:40:02.320 | right, whereas the
00:40:04.320 | actual edges down here, from, you know, the Excalidraw drawing I did, are like actual lines
00:40:11.760 | It kind of looks like a dotted line as well, but I don't know how to do it... How can I do it?
00:40:17.720 | Oh, there we go. Perfect. So that is what that would look like
00:40:21.560 | Versus that okay
00:40:24.520 | So that's a conditional edge. So that's going from the oracle, based on the router logic, which is like, okay,
00:40:31.800 | go to search or go to final answer
00:40:33.720 | The other thing that I'm missing here is the entry point of our graph, which is the oracle
00:40:38.560 | So that's the starting point that we go to; that's where we insert the query
00:40:47.640 | Then we create our actual edges. So for the actual edges, it's only...
00:40:53.000 | okay, so here we're only adding the edge from the
00:40:59.280 | search tool back to the oracle
00:41:01.580 | All right, and in reality, I don't even need this bit here, I don't believe,
00:41:08.160 | yeah, because I can leave this here
00:41:11.400 | But then we're gonna see, if tool name is not equal to final answer, we add the edge. So,
00:41:16.880 | honestly, I don't even know why I have that. I could just remove it, and I could even remove that
00:41:23.080 | Hey, it's fine. Whatever. I don't want to break it now
00:41:27.080 | So once something does go to the final answer tool, the final answer tool, as
00:41:32.520 | we see in our graph here, has one line coming out of it, which is answer, right?
00:41:39.440 | It actually goes to the end block,
00:41:42.040 | right? And the end block is kind of like here, and then the output from that is this, okay?
00:41:47.440 | And that's what we're doing there
00:41:48.400 | And then, once all that is done, we can compile our graph, and if everything is set up in a
00:41:54.040 | functional way, it will compile and we will not get an error
00:41:57.320 | So, yeah, that is our graph
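The wiring just described, entry point at the oracle, a conditional edge out of it, a plain edge from search back to the oracle, and final answer going to END, can be sketched as a tiny pure-Python runner, standing in for LangGraph's `StateGraph`. The node bodies are trivial stand-ins; only the wiring mirrors the walkthrough.

```python
# The graph wiring sketched as a tiny pure-Python runner, standing in for
# langgraph's StateGraph so it runs standalone. Node bodies are stand-ins.
END = "__end__"


def oracle(state):
    # First pass: choose search; once a search has happened, choose final_answer
    done = any(s.get("tool") == "search" for s in state["steps"])
    state["steps"].append({"tool": "final_answer" if done else "search"})
    return state


def search(state):
    state["steps"][-1]["log"] = "reddit results..."   # stand-in tool output
    return state


def final_answer(state):
    state["answer"] = "Try a pizzeria in Trastevere"  # stand-in answer
    return state


nodes = {"oracle": oracle, "search": search, "final_answer": final_answer}
edges = {"search": "oracle", "final_answer": END}     # the plain (solid) edges


def router(state):
    """Conditional edge out of the oracle: the dotted line in the diagram."""
    return state["steps"][-1]["tool"]


def invoke(state, entry="oracle"):
    node = entry                                      # entry point: the oracle
    while node != END:
        state = nodes[node](state)
        # After the oracle, the router decides; otherwise follow plain edges
        node = router(state) if node == "oracle" else edges[node]
    return state


result = invoke({"steps": [], "answer": None})
```

Running it walks the same path as the compiled graph: oracle, search, back to the oracle, final answer, end.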
00:42:01.160 | I have this little bit of code here that we can use in order to visualize
00:42:06.460 | Everything or visualize our graph. So I'm gonna take that
00:42:10.160 | let's pull it in, and
00:42:12.960 | it is...
00:42:18.400 | see what we...
00:42:20.040 | see what we get. So we have this, basically
00:42:23.320 | So the oracle... I think LangGraph is always adding this extra line here, this optional line,
00:42:30.360 | because it is
00:42:33.920 | basically allowing the LLM, if it decides to, to return a direct answer
00:42:38.680 | I think that's what that is
00:42:40.240 | But in reality, we've prompted it not to do that, and we try to set it up so it doesn't. But this is what we get
00:42:46.440 | All right
00:42:47.160 | So it's our entry point; it goes to the oracle. The oracle can go to final answer or search; if it goes to search,
00:42:52.040 | it goes back to the oracle, and
00:42:54.040 | then it would go to final answer, and we end
00:42:57.320 | Okay, and that that is our graph
00:43:01.080 | super simple, but again, we're using a tiny model, so
00:43:05.240 | Something overly complicated probably won't work
00:43:09.380 | So I'm gonna ask our graph agent: where is the best pizza in Rome? And let's see what it comes up with
00:43:20.200 | So we see it
00:43:22.200 | Invoked the search tool here
00:43:24.400 | So the query was best pizza in Rome; it got these three submissions from Reddit
00:43:31.120 | It then went back to the oracle, which went to the router; the router identified that it should go to the final
00:43:37.680 | answer, or, sorry, the oracle decided this, the router identified that, and
00:43:42.040 | we went to the final answer tool, and then we have our
00:43:47.280 | outputs. Okay: based on your question, I would recommend trying... I keep getting this recommendation; I
00:43:52.640 | will actually try it and see how it is
00:43:56.000 | It looks kind of interesting. It is
00:43:59.480 | not what I would actually expect, and it also isn't a Roman pizza; it's a Neapolitan pizza,
00:44:07.360 | which
00:44:09.160 | Yeah, I'm sure the Romans would not be happy
00:44:12.520 | that their top recommended pizza in Rome seems to be a
00:44:17.360 | Naples-style pizza. And I also tried this: where is the best gluten-free pizza in Rome? I'm not gluten-free, but
00:44:25.800 | my girlfriend is, so I thought, okay, let's see, and
00:44:31.960 | let me see,
00:44:36.000 | Let's see what we get. I didn't get a good response before
00:44:41.280 | So here it's like: unfortunately, I was unable to find specific recommendations for gluten-free pizza in Rome;
00:44:47.160 | it seems like it's only generally considered a good destination for gluten-free options. Generally, actually, it is
00:44:58.600 | Pizza in Trevi, maybe we can, you know, we can try that. Oh, there we go
00:45:04.360 | So, okay, so we did have some options here. I don't know why
00:45:09.600 | Expensive
00:45:11.200 | So pizza and Trevi. Let's have a look. Okay, so pizza in Trevi. Let's see
00:45:17.360 | Where you are?
00:45:21.760 | looks
00:45:23.360 | Interesting. So near the center
00:45:25.360 | Maybe it's a good option
00:45:27.360 | Gluten-free beer gluten-free pizza. There we go. So
00:45:30.640 | That is an option
00:45:33.480 | The other one we can also try, it's... so Pizza
00:45:38.960 | Bonatti,
00:45:40.120 | which is really interesting-looking
00:45:42.120 | Here so a little further out just south of Trastevere and the center over here
00:45:48.880 | But if you look they have some kind of cool-looking pizza
00:45:54.840 | Like what I would not usually expect to find in Italy like this
00:46:00.320 | unique for sure
00:46:06.160 | Looks good. So it seems like our
00:46:08.920 | fully local
00:46:11.920 | pizza-recommending agent does work. It's not perfect, but it does work, and it has some good recommendations there. Again,
00:46:20.160 | I will just point out that
00:46:22.160 | this is a tiny LLM,
00:46:24.240 | all right, so 8 billion parameters,
00:46:26.760 | running fully locally on my M1, I think,
00:46:32.400 | MacBook. So I don't have any sort of powerful thing running this, and
00:46:36.200 | because of the sort of limitations of the memory on my MacBook,
00:46:41.200 | I can only run the smaller models, like the 8 billion parameter model
00:46:45.200 | But you could also see how quickly that was responding; like, it did not take long to go through multiple agent steps,
00:46:52.000 | perform a search, read everything from that search, and produce an answer for us, and the answers were generally pretty good
00:46:59.440 | So, honestly, I think it's actually quite impressive that you can do that, for sure
00:47:04.440 | JSON mode seems to be the best approach for agents, at least for now
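The reason JSON mode helps is that the model is constrained to emit JSON, which parses straight into a tool call. A minimal sketch of that parsing step, with an illustrative output string and an assumed helper name `parse_tool_call`:

```python
# Sketch of why JSON mode helps: the constrained output parses directly
# into (tool name, tool arguments). The raw string is illustrative and
# parse_tool_call is an assumed name, not the video's exact code.
import json


def parse_tool_call(llm_output: str) -> tuple[str, dict]:
    """Parse a JSON-mode response into (tool name, tool arguments)."""
    try:
        data = json.loads(llm_output)
        return data["name"], data.get("parameters", {})
    except (json.JSONDecodeError, KeyError):
        # Malformed output: fall back to the final answer tool
        return "final_answer", {"answer": llm_output}


# What a JSON-mode response from Llama 3.1 might look like
raw = '{"name": "search", "parameters": {"query": "best pizza in Rome"}}'
name, args = parse_tool_call(raw)
```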
00:47:10.520 | I know that the Ollama team are planning to add forced function calling, which I think will make a difference,
00:47:18.240 | but for now JSON mode works perfectly. And, yeah, so that's how we would build a
00:47:24.800 | local agent using LangGraph and
00:47:28.480 | Ollama. So that's it for this video. I hope this has been useful and interesting, but I'll leave it there for now
00:47:34.880 | So thank you very much for watching, and I will see you again in the next one. Bye