Llama Index Workflows | Building Async AI Agents
Chapters
0:00 Llama Index Workflows
0:53 Llama Index vs. LangGraph
5:27 Python Prerequisites
6:40 Building Knowledge Base
8:20 Defining Agent Tools
11:02 Defining the LLM
12:31 Llama Index Workflow Events
14:00 Llama Index Agent Workflow
24:25 Debugging our Workflow
26:47 Using and Tweaking our Agent
30:05 Testing Llama Index Async
00:00:00.000 |
Today, we're going to be taking a look at Llama Index's workflows for building agents. 00:00:06.720 |
Now, Llama Index's workflows, as explained in their introductory article here, are a new event-driven way of building agentic flows. 00:00:18.880 |
So even this visual here just gives you a good understanding of what they're trying to do. 00:00:23.480 |
So you're defining these steps that your agentic flow can take, and you also define the events that trigger those steps. 00:00:35.280 |
Now, a lot of people would compare this to LangGraph, and I think that is completely justified. 00:00:42.160 |
So before we dive into building stuff with Llama Index, I just want to talk a little 00:00:48.080 |
bit about what I think the main differences between the two libraries are. 00:00:53.600 |
Now, there are many differences, but the most fundamental, in my opinion, is that Llama Index, 00:00:59.680 |
when compared to LangGraph, seems to offer higher-level abstractions that seem to be 00:01:05.120 |
structured a little bit better than what LangGraph does. 00:01:09.080 |
And I wouldn't say that's either a pro or a con, because honestly, one of the things 00:01:13.780 |
that I like about LangGraph is that you can go a little more low-level. 00:01:23.200 |
There's still a ton of abstractions in LangGraph, and it can be very confusing. 00:01:30.160 |
There's a lot of abstractions, but I feel with LangGraph, it's a little bit easier to 00:01:34.360 |
remove at least a few of those abstractions, which I usually prefer, but it's not always needed. 00:01:43.660 |
Especially if you don't need to know every single thing that is going into your LLM or 00:01:47.540 |
coming out of these different steps, that's not always needed. 00:01:51.080 |
So whether this is a pro or a con really depends, in my opinion, on the developer. 00:02:00.720 |
Now, conceptually, I think the structure of the agents that you're building is quite similar between these two libraries. 00:02:10.800 |
However, with LangGraph, the focus is on kind of like building a graph, so you're connecting 00:02:16.340 |
these nodes and edges, whereas with Llama Index, it is event-driven. 00:02:23.520 |
So one of your steps could output event type A, and that would trigger step type A. Or 00:02:33.020 |
it could instead output event type B, and that would trigger step type B. Okay? 00:02:39.940 |
So it's more looking at what events trigger which steps. 00:02:47.120 |
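For reference, here is a minimal sketch of what that event-driven pattern looks like in code. This isn't the agent we build later; the event and step names are made up purely to illustrate how the event a step returns decides which step runs next.

```python
from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step

class EventA(Event):
    payload: str

class EventB(Event):
    payload: str

class BranchingFlow(Workflow):
    @step
    async def route(self, ev: StartEvent) -> EventA | EventB:
        # Whichever event we return here determines which step runs next.
        if "a" in ev.input:
            return EventA(payload=ev.input)
        return EventB(payload=ev.input)

    @step
    async def handle_a(self, ev: EventA) -> StopEvent:
        return StopEvent(result=f"step A handled: {ev.payload}")

    @step
    async def handle_b(self, ev: EventB) -> StopEvent:
        return StopEvent(result=f"step B handled: {ev.payload}")

# result = await BranchingFlow(timeout=10).run(input="apples")
```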
Now that, I think, mostly makes a difference when you're building stuff. 00:02:52.700 |
I think the actual outcome of what you get from that, at least from my experience, is pretty much the same. 00:03:02.220 |
I think you can build pretty similar structures with both of those approaches. 00:03:11.780 |
It's more about how you're thinking about what you're building. 00:03:14.280 |
I think that's the main difference in terms of the structure. 00:03:17.860 |
And finally, one thing that I have noticed with Llama Index, which I really like actually, 00:03:24.740 |
is that they prioritize the use of async a lot. 00:03:30.260 |
Now, that may also come with pros and cons, but in my opinion, it's mainly pros. 00:03:35.980 |
So the learning curve for developers that are not too familiar with writing asynchronous code is a bit steeper. 00:03:44.340 |
However, the end product that you will build is almost definitely going to be better if 00:03:52.020 |
you're writing async code, particularly with agents. 00:03:56.460 |
And I say that because agents contain LLMs, and LLMs take a little bit of time to process and respond. 00:04:05.980 |
If you're writing synchronous code, whilst your LLM is doing that, your code is just sitting there waiting. 00:04:15.000 |
If you're writing asynchronous code, your code can be, you know, doing other things in the meantime. 00:04:21.140 |
And it's not just waiting for the LLM; there are other things, of course, that you're waiting on as well. 00:04:25.940 |
So what that translates to when you are kind of more forced to write async code is something 00:04:32.660 |
that's probably a lot more scalable and more performant. 00:04:36.380 |
Now, those are the main things to take note of, in my opinion, when you're comparing the two libraries. 00:04:46.120 |
But one thing I should note here as well is I've used LangGraph a lot more than I've used Llama Index workflows. 00:04:54.580 |
I have gone to prod with LangGraph, I have not with Llama Index workflows. 00:05:02.080 |
So there may be a lot of stuff that I'm missing. 00:05:04.540 |
And I just want to caveat everything I'm saying with that fact. 00:05:09.700 |
Nonetheless, at least from what I've seen so far, that's what I've noticed. 00:05:15.180 |
So I think it's relatively representative of the two libraries. 00:05:20.020 |
In any case, let's jump straight into building something with Llama Index workflows. 00:05:26.700 |
Okay, so we're first just going to install the prerequisite libraries. 00:05:32.540 |
This example we're going to go through, we did something very similar in the past with LangGraph. 00:05:38.200 |
So it might be useful, particularly for going from LangGraph to Llama Index or the other way around. 00:05:43.420 |
It might be useful to have a look at that video as well. 00:05:47.940 |
So I'll just make sure there's a link to that in the description and comments. 00:05:54.180 |
But essentially what we're going to be doing is building an AI research agent using Llama 00:06:01.060 |
Index, of course, with a few components, right? 00:06:08.300 |
Those components or tools are going to be a RAG search tool, a RAG search filter, a web 00:06:15.560 |
search tool, and I think an arXiv search tool, if I remember correctly. 00:06:22.820 |
So all those are going to come together and basically be tools that our research agent can use. 00:06:27.700 |
And we're going to construct all of the links between everything using Llama Index workflows. 00:06:40.700 |
And I will go through this bit kind of quickly because I want to mainly focus on Llama Index workflows. 00:06:46.140 |
So we're going to be using this dataset for our rag components. 00:06:52.100 |
So I'm going to come to here and we're going to just initialize our, like, our connection to OpenAI. 00:06:59.460 |
We're going to be using the text-embedding-3-small embedding model to construct our embeddings. 00:07:07.940 |
Then we come down to Pinecone here and also initialize our connection to Pinecone. 00:07:15.380 |
You know, for both of these, we do need API keys. 00:07:17.900 |
So for OpenAI, there's a link here, it's just, I think, platform.openai.com. 00:07:24.380 |
And then here, we're going to app.pinecone.io, and then we just insert our API key there. 00:07:34.700 |
Then we're going to set up our index; I just want to check what the free tier serverless region is. 00:07:42.980 |
You can see it on the pricing page for Pinecone. 00:07:45.740 |
So it says US East 1 is the free tier region. 00:07:52.940 |
Then what we're going to do is get the dimensions for our embedding model, which is 1536. 00:08:01.540 |
And you'll see that I already have my index defined here. 00:08:04.580 |
So I ran this before, but otherwise what you will see here is that your vector count will be zero. 00:08:10.980 |
So to populate your index, you will need to run this next cell here. 00:08:18.660 |
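As a rough sketch of what that knowledge base setup typically looks like (the index name and metric here are placeholders, and the exact notebook code may differ slightly):

```python
import os
from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # key from platform.openai.com
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])          # key from app.pinecone.io

# text-embedding-3-small produces 1536-dimensional vectors by default;
# embedding a dummy string is an easy way to grab that number programmatically.
dims = len(
    openai_client.embeddings.create(
        model="text-embedding-3-small", input=["dimension check"]
    ).data[0].embedding
)

index_name = "ai-research-agent"  # placeholder name
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=dims,
        metric="dotproduct",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # free tier region
    )
index = pc.Index(index_name)
print(index.describe_index_stats())  # vector count is 0 until the dataset is upserted
```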
That is it for the RAG setup part or knowledge base setup. 00:08:26.400 |
So all we're doing here is defining a few functions. 00:08:34.340 |
I will say some of the later functions are not written in a fully async way. 00:08:40.600 |
We can pretend they're fully async, but they're not. 00:08:45.060 |
So we have this fetch arxiv function which, as you can see from the description here, 00:08:50.200 |
gets the abstract from an arXiv paper given the arXiv ID. 00:08:54.180 |
Worth spending a little bit of time on what we're doing here. 00:08:57.420 |
So we are, of course, defining an async function because we're using Llama index workflows. 00:09:05.900 |
You don't have to, you can actually use synchronous as well if you prefer. 00:09:13.820 |
This is a description for the tool that will be passed to our agent later. 00:09:19.140 |
So, yeah, you want to write something in natural language that is descriptive there. 00:09:25.020 |
And then the agent will also be looking at your parameters here to decide, okay, you 00:09:31.220 |
know, what do I need to input or generate for using this tool? 00:09:52.540 |
So, you know, this is, we're defining async here, but you need to do a bit more work to 00:09:57.300 |
actually make this asynchronous, but it's fine for the example. 00:10:09.540 |
I add this print just so later on we can see what is being run when. 00:10:16.820 |
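A hedged sketch of what a tool like that can look like; the function name, docstring, and the arXiv export API call are illustrative rather than the notebook's exact code:

```python
import re
import requests

async def fetch_arxiv(arxiv_id: str) -> str:
    """Gets the abstract from an ArXiv paper given the arXiv ID."""
    # The docstring above doubles as the tool description the agent sees.
    # Declared async so it slots into the workflow, but requests.get itself is
    # blocking -- an httpx.AsyncClient call would make it genuinely asynchronous.
    print("fetch_arxiv called")  # so we can see later which tool runs when
    res = requests.get(f"http://export.arxiv.org/api/query?id_list={arxiv_id}")
    match = re.search(r"<summary>(.*?)</summary>", res.text, re.DOTALL)
    return match.group(1).strip() if match else "No abstract found."
```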
Similar things for our rag endpoints or rag tools. 00:10:20.700 |
So we're going to have rag search filter and rag search. 00:10:23.300 |
So all the same, again, not async in the way that we're running this, but that's fine. 00:10:31.940 |
We have some async, but the index querying here is not, this is fine. 00:10:48.900 |
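And a similar sketch for the two RAG tools, reusing the `openai_client` and Pinecone `index` from the setup sketch above; the metadata field names are assumptions:

```python
def _embed(query: str) -> list[float]:
    # Embed the query with the same model used to build the knowledge base.
    return openai_client.embeddings.create(
        model="text-embedding-3-small", input=[query]
    ).data[0].embedding

async def rag_search(query: str) -> str:
    """Finds general information about AI using a natural language query."""
    print("rag_search called")
    # Pinecone's query call is synchronous, so this is only nominally async.
    res = index.query(vector=_embed(query), top_k=5, include_metadata=True)
    return "\n---\n".join(m.metadata.get("text", "") for m in res.matches)

async def rag_search_filter(query: str, arxiv_id: str) -> str:
    """Finds information from a specific ArXiv paper given a query and that paper's arXiv ID."""
    print("rag_search_filter called")
    res = index.query(
        vector=_embed(query),
        top_k=5,
        include_metadata=True,
        filter={"arxiv_id": arxiv_id},  # metadata filter; the field name is an assumption
    )
    return "\n---\n".join(m.metadata.get("text", "") for m in res.matches)
```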
So this description is basically just telling the agent what we use this tool for. 00:11:06.380 |
So the Oracle LLM, as I call it here, is essentially the decision maker, right? 00:11:15.780 |
So Oracle makes decisions on what tool to use. 00:11:19.140 |
So we have the system prompt that just tells it kind of what to do. 00:11:26.780 |
We defined tools that it will have access to here. 00:11:30.880 |
So here we're using Llama Index's FunctionTool object to basically take the functions we've 00:11:38.100 |
generated and convert them into Llama Index tools. 00:11:43.940 |
So what you can do here, if you don't define async functions, you can just go with sync 00:11:51.380 |
function like that, which in this case might be a good idea for some of those. 00:11:59.340 |
But anyway, I just went with async for all. So what I'm doing here is just defining our LLM. 00:12:08.660 |
So OpenAI, we're using GPT-4o; of course, use whatever you like there. 00:12:14.180 |
The one thing that I did add here is just to, like, enforce the use of tools; we need the LLM to always respond with a tool call. 00:12:27.380 |
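Roughly, turning those functions into tools and defining the Oracle LLM looks something like this; `web_search` and `final_answer` are assumed to be defined the same way as the tools above, and the exact way tool use is enforced may differ from the notebook:

```python
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

tools = [
    FunctionTool.from_defaults(async_fn=fetch_arxiv),
    FunctionTool.from_defaults(async_fn=rag_search),
    FunctionTool.from_defaults(async_fn=rag_search_filter),
    FunctionTool.from_defaults(async_fn=web_search),    # assumed web search tool
    FunctionTool.from_defaults(async_fn=final_answer),  # assumed structured final-answer tool
]
# For a plain synchronous function you would pass fn=my_sync_fn instead of async_fn=...

# The Oracle LLM: tool use gets enforced later by passing tool_choice="required"
# when we call achat_with_tools inside the workflow.
llm = OpenAI(model="gpt-4o", temperature=0)
```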
Then I think that's everything on the LLM side, and we can come down into basically the workflow itself, 00:12:36.860 |
where we have our steps and the events that trigger each one of those steps. 00:12:45.400 |
So, events: every workflow in Llama Index workflows has a start event, which triggers 00:12:52.460 |
the workflow, and a stop event, which ends the workflow. 00:12:58.280 |
We also define a couple of extra events here, and those are the input event and the tool call event. 00:13:07.240 |
I think here I have, yeah, this one, you can ignore, I just simplified it so that we don't 00:13:17.200 |
Now what is probably quite important with these events is that you have good typing on them. 00:13:23.180 |
That's where that sort of good structure or enforced structure comes from, which I think 00:13:29.240 |
is ideal for any code, to be honest, but I think it's a very important thing here. 00:13:34.800 |
So for the input event, we are expecting a list of ChatMessage objects from Llama Index. 00:13:45.520 |
With the tool call event, we're expecting a, like a tool ID, tool name, and tool parameters. 00:13:53.300 |
This is decided by the LLM, and the logic for it is handled by Llama Index. 00:14:04.940 |
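A sketch of those event definitions; the field names (`input`, `tool_id`, `tool_name`, `tool_kwargs`) are my guesses at what the notebook uses, but the typed-attribute pattern is the important part:

```python
from llama_index.core.llms import ChatMessage
from llama_index.core.workflow import Event

class InputEvent(Event):
    # The running chat history, passed between steps via the event itself.
    input: list[ChatMessage]

class ToolCallEvent(Event):
    # Populated from the tool call that the Oracle LLM generated.
    tool_id: str
    tool_name: str
    tool_kwargs: dict
```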
The workflow is defined as a class that inherits from the Workflow object here. 00:14:12.680 |
We define everything that we initialize our workflow with, so our research agent takes the Oracle LLM and the tools. 00:14:21.040 |
We set the tools, and then I define this get tool attribute, which is just a dictionary 00:14:27.040 |
that allows us to more quickly look up tools when we're given the name of a tool. 00:14:35.520 |
We use this basically to map between strings to actual functions. 00:14:43.240 |
Then we initialize our chat history and our memory. 00:14:53.640 |
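The constructor side of that might look roughly like this; the class and attribute names are assumptions:

```python
from llama_index.core.llms import ChatMessage
from llama_index.core.llms.function_calling import FunctionCallingLLM
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.tools import BaseTool
from llama_index.core.workflow import Workflow

class ResearchAgent(Workflow):
    def __init__(self, llm: FunctionCallingLLM, tools: list[BaseTool], **kwargs):
        super().__init__(**kwargs)
        self.llm = llm
        self.tools = tools
        # Map tool names (strings) to tool objects so run_tool can look them up.
        self.get_tool = {tool.metadata.get_name(): tool for tool in tools}
        self.chat_history: list[ChatMessage] = []
        self.memory = ChatMemoryBuffer.from_defaults(llm=llm)
```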
Then what we do here is, so this step is the step that is taken at the very beginning of the workflow. 00:15:04.120 |
Now one thing I made sure to add in here was to clear memory, right? 00:15:14.840 |
It kind of depends on what you're trying to do, right? 00:15:17.200 |
If you want to maintain your sort of chat memory between runs of your agent, then you 00:15:27.240 |
So you could just comment this out because you don't want to clear the memory. 00:15:30.600 |
But you do need a way of, I don't know, like if you are starting a new conversation with 00:15:36.440 |
your workflow agent, you would need to redefine or reinitialize your research agent. 00:15:42.880 |
In this case, what I've done here is that every time you start a new workflow run, it reinitializes the memory. 00:15:52.680 |
It's up to you and the thing that you are building as to whether you want that behavior. 00:15:58.560 |
I think in a lot of cases, particularly chatbots, you wouldn't do that. 00:16:04.240 |
Now that start event is the default event type from Llama index, and it includes an 00:16:11.600 |
input parameter, which is like the user query. 00:16:16.240 |
So we're going to take that and we're going to format it into a chat message, and then we add it to our memory. 00:16:24.120 |
So now in our memory, we're going to have the user message here, but actually I just noticed something. 00:16:36.200 |
And I'll leave this bit here, but essentially what I want to do is, after defining 00:16:45.800 |
or reinitializing my memory, I'm going to do self.memory.put, and we're going to put the user message in there. 00:16:54.400 |
I think that is required, unless I'm misunderstanding what I already did. 00:17:14.320 |
So now in our chat history, the first message is always going to be the system message. 00:17:31.640 |
That's what initializes our agent with every new call. 00:17:45.440 |
So if we come up here, we know the input event should consist of a list of chat messages. 00:17:53.980 |
So the chat history is the event input, right? 00:17:57.400 |
So it's actually, we separate it out from here, from this, and what we do is we actually 00:18:04.840 |
extract the chat history and sort of manually insert it into our event. 00:18:09.160 |
You don't, I mean, how you do that is kind of up to you. 00:18:12.920 |
I like that we pass the information between events, between steps via the event. 00:18:30.240 |
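Continuing the class sketch from above, that first step might look something like this; the system prompt content is a placeholder and the exact attribute names are assumptions:

```python
from llama_index.core.workflow import StartEvent, step

SYSTEM_PROMPT = "You are the oracle, an AI research assistant..."  # placeholder prompt

class ResearchAgent(Workflow):
    ...  # __init__ as in the earlier sketch

    @step
    async def prepare_chat_history(self, ev: StartEvent) -> InputEvent:
        # Start each run with a clean slate -- comment this out if you want to
        # keep chat memory between runs of the same agent instance.
        self.memory.reset()
        self.memory.put(ChatMessage(role="system", content=SYSTEM_PROMPT))
        # ev.input is the user query passed to workflow.run(input=...).
        self.memory.put(ChatMessage(role="user", content=ev.input))
        # Pass the chat history along inside the event itself.
        return InputEvent(input=self.memory.get())
```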
Then we just pass the tools to our achat_with_tools call. 00:18:33.040 |
We have the chat history and we have tool choice set to required. 00:18:42.600 |
I'm not sure which of those you actually should use. 00:18:46.300 |
If both of those are considered or not, you probably only need one of them, but I'm not 00:18:51.840 |
In any case, I put in both places, it's fine. 00:19:02.840 |
After it's been generated, then we get the message from that response and we put it into our memory. 00:19:09.040 |
So the message format here is actually already a chat message object with the role of assistant. 00:19:15.600 |
And of course the message content generated by our LLM. 00:19:19.200 |
Now there is always going to be a tool call in our response because we've set tool choice to required. 00:19:28.560 |
And even when the LLM is supposed to be answering directly to the user, it has to answer via the final answer tool. 00:19:36.040 |
So there's going to be, or there should be a tool call. 00:19:42.320 |
So what we do is if that tool call is final answer, we return a stop event, which stops 00:19:49.200 |
everything and we return what the LLM generated to be input into the final answer tool, which 00:19:55.560 |
is going to follow that structure that we defined earlier for that tool. 00:20:00.840 |
So yes, the stop event stops the workflow and returns everything. 00:20:05.760 |
Otherwise we're going to return a tool call event, right? 00:20:09.040 |
So this is where we're returning different types of events depending on what step we'd like to trigger next. 00:20:16.040 |
So if we are triggering a tool call event, we need to pass in our ID, name and params for the tool call. 00:20:39.440 |
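That branching step, sketched out; `achat_with_tools` and `get_tool_calls_from_response` are LlamaIndex function-calling LLM methods, though whether `tool_choice` is accepted as a keyword can vary by version:

```python
from llama_index.core.workflow import StopEvent

class ResearchAgent(Workflow):
    ...  # earlier steps omitted

    @step
    async def handle_llm_input(self, ev: InputEvent) -> ToolCallEvent | StopEvent:
        chat_history = ev.input
        response = await self.llm.achat_with_tools(
            self.tools, chat_history=chat_history, tool_choice="required"
        )
        self.memory.put(response.message)  # assistant message, including its tool call

        tool_calls = self.llm.get_tool_calls_from_response(
            response, error_on_no_tool_call=False
        )
        tool_call = tool_calls[0]  # tool_choice="required" means there should always be one
        if tool_call.tool_name == "final_answer":
            # The args generated for final_answer become the workflow's final result.
            return StopEvent(result=tool_call.tool_kwargs)
        return ToolCallEvent(
            tool_id=tool_call.tool_id,
            tool_name=tool_call.tool_name,
            tool_kwargs=tool_call.tool_kwargs,
        )
```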
If this happens, we do need to handle that and we handle that using the run tool method 00:20:45.960 |
So that step, triggered by the tool call event, outputs an input event. 00:20:50.380 |
So basically we'll go back to this step here, where you're going back to the LLM to decide what to do next. 00:21:01.800 |
And what we're doing, okay, we're getting the tool name. 00:21:09.600 |
So this is using the get tool dictionary that we defined before. 00:21:20.520 |
So this maps us from the tool name as a string to the actual tool function. 00:21:30.520 |
So if there is no tool with that tool name, or no key in our dictionary with that tool name, it returns none. 00:21:40.320 |
So if it returns none, what we return is this chat message, the role is tool and the content 00:21:46.600 |
is just saying, okay, we couldn't find this tool. 00:21:49.440 |
This should not, ideally this shouldn't happen. 00:21:53.240 |
But just in case it does, this is kind of like error handling, but for the agent. 00:22:01.480 |
Actually, if it does happen, we should probably handle things differently. 00:22:05.560 |
So another little bug I just found, which is this should be here. 00:22:21.720 |
So basically if there is no tool, we're going to return this tool message, and that will be added to our memory. 00:22:28.040 |
So now our agent or our Oracle LLM can see that we tried to use this tool that doesn't 00:22:32.640 |
exist and it will hopefully not call it again. 00:22:35.000 |
It shouldn't call it in the first place, but just in case. Otherwise, let's say the tool does exist. 00:22:41.760 |
We call the tool using acall, we await the response, we're passing in the parameters 00:22:48.880 |
that our LLM generated and we then create a tool message, right? 00:22:56.680 |
So let's say it does rag, it's going to return all the rag content that it searched for. 00:23:02.120 |
If it decides it wanted to use a web search tool, it will return the results it got from 00:23:10.680 |
We then again, add that to the memory and return it to this step here. 00:23:17.520 |
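And the run_tool step, again as a sketch continuing the same class; the "tool does not exist" message is the agent-facing error handling described above:

```python
class ResearchAgent(Workflow):
    ...  # earlier steps omitted

    @step
    async def run_tool(self, ev: ToolCallEvent) -> InputEvent:
        tool = self.get_tool.get(ev.tool_name)
        if tool is None:
            # Error handling for the agent: tell the LLM the tool doesn't exist
            # so it can choose a different one, rather than crashing the workflow.
            content = f"Tool {ev.tool_name} does not exist"
        else:
            tool_output = await tool.acall(**ev.tool_kwargs)
            content = tool_output.content
        self.memory.put(
            ChatMessage(
                role="tool",
                content=content,
                additional_kwargs={"tool_call_id": ev.tool_id, "name": ev.tool_name},
            )
        )
        # Hand the updated history back to handle_llm_input for the next decision.
        return InputEvent(input=self.memory.get())
```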
So basically this workflow can keep looping between handle LLM input and run tool. 00:23:26.000 |
Once it has enough information, the Oracle LLM here will decide to use the final answer tool. 00:23:56.600 |
We have our start event, prepare chat history, and we go to our input event, right? 00:24:02.440 |
Then that will go to handle LLM input, which can either use the tool call event to run a tool, or the stop event to finish. 00:24:10.880 |
And it will go, it can go in this loop for quite a few turns if it needs to. 00:24:17.160 |
And then eventually go to the stop event and we're done. 00:24:17.160 |
So initializing the workflow, our research agent, I defined that we need to pass in our 00:24:32.480 |
Oracle LLM, and we need to pass in the tools that we're going to use. 00:24:39.600 |
You can basically have some sort of, like, if you want to change a number of tools that 00:24:46.480 |
you have access to, you can do that super easily. 00:24:51.880 |
You don't need to change the research agent code itself. 00:24:55.560 |
In most cases, unless there's some sort of dramatic change. 00:25:05.660 |
So here you can see self.memory.put, and that expects a message item, right? 00:25:05.660 |
Here I just tried to, like, throw in the role and content directly there. 00:25:20.360 |
So this, we will write message equals chat message like so, okay. 00:25:44.240 |
So invalid type expected object, we've got a string. 00:25:53.560 |
Not sure how I managed to break everything, but it's good to at least go through a bit of debugging. 00:26:03.040 |
So here we've already defined the system message. 00:26:16.960 |
So there I was putting, like, a chat message within a chat message. 00:26:33.880 |
Even though I put something really stupid, it still sticks to that, like, strict final 00:26:39.720 |
answer structure that I wanted it to, which is intentional. 00:26:39.720 |
Then what we can do is just say, tell me about AI, see what, see what it comes up with. 00:26:53.160 |
So actually this, you can ignore, we don't need to do that. 00:27:00.440 |
So the reason that I got that is actually something I should change. 00:27:05.080 |
So here it's using, and this is a good thing, right? 00:27:09.320 |
It's using the web search, it's then going to reg search, it's then going to fetch archive, 00:27:15.720 |
So it's going through, it's like checking like a ton of information, which is what I 00:27:18.920 |
wanted to do, but I am a little bit stupid and put the timeout for our workflow to be too low. 00:27:32.880 |
So to adjust the timeout, we need to modify the self._timeout attribute. 00:27:42.460 |
So we're going to set that to a timeout value that we will insert here. 00:27:47.800 |
I'm going to set a default value of like, not 600, that's insane, like 20. 00:27:56.040 |
So a little bit higher, and you can modify if you want when you're initializing the agent. 00:28:01.920 |
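In code, that change is roughly the following; the video pokes the private `_timeout` attribute, and passing `timeout=...` to `super().__init__` achieves the same thing:

```python
class ResearchAgent(Workflow):
    def __init__(self, llm, tools, timeout: int = 20, **kwargs):
        # Workflow.run raises a timeout error once this many seconds have elapsed.
        super().__init__(timeout=timeout, **kwargs)
        self.llm = llm
        self.tools = tools
        self.get_tool = {tool.metadata.get_name(): tool for tool in tools}
        self.memory = ChatMemoryBuffer.from_defaults(llm=llm)

# research_agent = ResearchAgent(llm=llm, tools=tools, timeout=30)
```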
So run that, run everything again, initialize our agent, and we'll just ask it to tell us about AI again. 00:28:15.840 |
So it's going to go through; I don't know if it's long enough, we might need to increase it again. 00:28:22.920 |
Some of these tools also take a little while to actually respond to get the information. 00:28:30.240 |
Let's go a little bit longer, but you can see here, right, operation time left 20 seconds 00:28:38.760 |
Let's go with timeout of 30 and hope that's long enough, otherwise it's more a case of 00:28:46.880 |
maybe prompting the agent to not use too many steps. 00:29:00.060 |
So we have our introduction, which looks good. 00:29:03.920 |
We have, sorry, introduction is just this bit. 00:29:07.360 |
We have the research steps, it's like conduct a web search, gather information, utilize 00:29:14.400 |
a specialized AI research tool from academic papers, retrieve specific historical contexts 00:29:19.400 |
and developments in AI from both general and specialized sources. 00:29:22.600 |
So that's what it's doing with each one of these tools here, so it's going back and forth. 00:29:29.920 |
So then it's like, okay, it's giving us a ton of text here. 00:29:33.480 |
Look, we can see that a British person pioneered AI, of course. 00:29:44.440 |
Yeah, nice little history lesson here, actually. 00:29:54.040 |
So this is, I think this is really useful, just having the sources of where everything 00:29:57.640 |
Ideally, it'd be nicer if you insert links and everything here, but time for that later. 00:30:08.560 |
So we can just try running our research agent. 00:30:23.220 |
So because we did everything in async, we should see a speed up when we run everything 00:30:29.560 |
in parallel, even though I didn't fully write everything in async. 00:30:33.080 |
It should at least be slightly less blocking than it would be otherwise. 00:30:38.080 |
So I define all these calls, I'm putting them in a list and I'm gathering 00:30:44.080 |
them with asyncio. So basically, it's going to take that list of async tasks 00:30:54.320 |
And then it's going to run them all together. 00:31:00.960 |
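A sketch of that parallel test; the queries other than the two mentioned in the video are placeholders, and `research_agent` is assumed to be the workflow instance initialized earlier:

```python
import asyncio

queries = [
    "tell me about AI",
    "what is the latest in LLMs from OpenAI",
    "what is retrieval augmented generation",   # placeholder query
    "who are the main labs training LLMs",      # placeholder query
]

async def run_all():
    # Each .run() call is a coroutine; gather schedules them concurrently, so four
    # workflow runs take roughly as long as the slowest one, not the sum of all four.
    tasks = [research_agent.run(input=q) for q in queries]
    return await asyncio.gather(*tasks)

results = await run_all()  # top-level await works in a notebook; use asyncio.run() in a script
```

Note that because all four runs share the same agent instance, they also share its memory, which is likely what causes the crossed-over answers seen a little later.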
It should take a similar amount of time to how long it took to run just the one workflow last time. 00:31:07.840 |
We're doing everything in parallel here, right? 00:31:09.400 |
So we're printing like four web searches because all of them are going to web search first. 00:31:16.000 |
Then most of them, or three of them, went to RAG search, and one of them went to fetch arxiv. 00:31:22.540 |
And then we had this group and then we still have them continuing here. 00:31:27.680 |
So 25 seconds, slightly longer than just running one, but faster than running four of them sequentially. 00:31:45.720 |
So I'm probably doing something wrong with the code there. 00:31:52.520 |
I wonder if it's somehow getting the history from the first agent. 00:32:06.900 |
Number one is actually, okay, tell me about AI. 00:32:18.120 |
And this one, three, is talking about, what did I ask you about? 00:32:26.640 |
What is the latest in LLMs from OpenAI, which it talks about, right? 00:32:36.540 |
Probably I should have reinitialized the agent, but that's fine. 00:32:41.200 |
But yeah, generally, other than that little hiccup, it looks pretty good. 00:32:48.480 |
I mean, we've just seen how to use the library, which is great. 00:32:54.440 |
The agents are actually going through a lot of different steps and, yeah, it's functioning well. 00:33:03.840 |
I think obviously, yeah, we've seen Llama Index, and it looks pretty nice. 00:33:06.920 |
I think there are definitely a lot of pros, a lot of cons, depends on what you're wanting 00:33:12.560 |
But honestly, I think just the async part of it, like going with async first is pretty 00:33:19.720 |
valuable because you're going to see performance like we saw at the end there, right? 00:33:24.440 |
You can run four in roughly the same time that you would run one workflow, just by using async. 00:33:35.040 |
So yeah, just that one thing about Llama Index workflows is by itself super valuable, in my opinion. 00:33:44.040 |
I can't say that I would have a general preference yet between LangGraph and Llama Index workflows. 00:33:51.880 |
They're both different, both good in different ways. 00:33:55.120 |
I think both do some things better than the other, that's for certain. 00:34:01.520 |
I mean, for now, you know, I have LangGraph in production, I'll probably just leave that as it is. 00:34:07.320 |
I'm not planning to switch everything over yet, but maybe at some point in the future. 00:34:12.240 |
It's something where I need to try and use Llama Index workflows more before I can say 00:34:19.880 |
anything about jumping over or which one I prefer. 00:34:25.440 |
I hope all this has been useful and interesting, but for now I'll leave it there. 00:34:28.960 |
So thank you very much for watching and I will see you again in the next one.