Today, we're going to be taking a look at LlamaIndex's workflows for building agents. Now, LlamaIndex workflows, as explained in their introductory article here, are a new approach to building agentic flows. Even this visual here gives you a good understanding of what they're trying to do.
You're defining the steps that your agentic flow can take, and you also define the events which trigger each one of those steps. Now, a lot of people would compare this to LangGraph, and I think that is completely justified. So before we dive into building stuff with LlamaIndex, I just want to talk a little bit about what I think the main differences between the two libraries are.
Now, there are many differences, but the most fundamental, in my opinion, is that LlamaIndex, when compared to LangGraph, offers higher-level abstractions that seem to be structured a little bit better. I wouldn't say that's either a pro or a con, because honestly, one of the things that I like about LangGraph is that you can go a little more low-level.
There can be fewer abstractions. There's still a ton of abstractions in LangGraph, and it can be confusing, and that is also the case here: there are a lot of abstractions. But I feel that with LangGraph it's a little bit easier to strip away at least a few of those abstractions, which I usually prefer, though it's not going to be everyone's preference.
Especially if you don't need to see every single thing that is going into your LLM or coming out of these different steps, that level of control isn't always needed. So whether this is a pro or a con really depends, in my opinion, on the developer and on what you're actually building.
Now, conceptually, I think the structure of the agents you're building with these two frameworks is also very similar. However, with LangGraph the focus is on building a graph, so you're connecting nodes and edges, whereas with LlamaIndex it is event-driven. So one of your steps could output event type A, and that would trigger step A; or it could instead output event type B, and that would trigger step B. It's more about which events trigger which steps. That mostly makes a difference while you're building; the actual outcome you get, at least in my experience, is not going to be a whole lot different. A minimal sketch of that event-driven branching is below.
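To make that concrete, here is a minimal sketch using the llama_index.core.workflow API; the event and step names are made up for illustration, not taken from the video.

```python
# Minimal sketch of LlamaIndex's event-driven branching (illustrative names,
# not from the video). A step returns an event, and whichever step is typed
# to accept that event runs next.
from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step


class EventA(Event):
    payload: str


class EventB(Event):
    payload: str


class BranchingFlow(Workflow):
    @step
    async def decide(self, ev: StartEvent) -> EventA | EventB:
        # The return type annotation tells the workflow which steps can follow.
        if "a" in str(ev.input):
            return EventA(payload=ev.input)
        return EventB(payload=ev.input)

    @step
    async def handle_a(self, ev: EventA) -> StopEvent:
        return StopEvent(result=f"step A handled: {ev.payload}")

    @step
    async def handle_b(self, ev: EventB) -> StopEvent:
        return StopEvent(result=f"step B handled: {ev.payload}")


# e.g. in a notebook: result = await BranchingFlow(timeout=10).run(input="a please")
```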
I think you can build pretty similar structures with both approaches. It's more of a mental framework: how are you thinking about what you're building? I think that's the main difference in terms of structure. And finally, one thing that I have noticed with LlamaIndex, which I really like, is that they heavily prioritize the use of async.
Now, that may also come with pros and cons, but in my opinion it's mainly pros. Yes, the learning curve for developers who are not too familiar with writing asynchronous code is going to be higher. However, the end product you build is almost certainly going to be better if you're writing async code, particularly with agents.
And I say that because agents contain LLMs, and LLMs take a little bit of time to process whatever they're doing. If you're writing synchronous code, whilst your LLM is doing that, your code is just waiting for a response from the LLM. If you're writing asynchronous code, your code can be doing other things whilst it's waiting.
And it's not just the LLM; there are other things, of course, that you're waiting for as well. So what that translates to, when you are somewhat forced to write async code, is something that's probably a lot more scalable and more performant. Now, those are the main things to take note of, in my opinion, when you're comparing LangGraph to LlamaIndex.
But one thing I should note here as well is that I've used LangGraph a lot more than I've used LlamaIndex workflows. I have gone to prod with LangGraph; I have not with LlamaIndex workflows. So there may be a lot of stuff that I'm missing, and I just want to caveat everything I'm saying with that fact.
Nonetheless, at least from what I've seen so far, that's what I've noticed, and I think it's relatively representative of the two libraries. In any case, let's jump straight into building something with LlamaIndex workflows. Okay, so we're first just going to install the prerequisite libraries. The example we're going to go through is something we did very similarly in the past with LangGraph.
So it might be useful, particularly if you're going from LangGraph to LlamaIndex or the other way around, to have a look at that video as well; I'll make sure there's a link to it in the description and comments. But essentially what we're going to be doing is building an AI research agent using LlamaIndex, of course, with a few components.
So there's a bit of complexity here. Those components, or tools, are going to be a RAG search tool, a RAG search filter tool, a web search tool, and an arXiv search tool. All of those are going to come together and basically be tools that our research agent can use.
And we're going to construct all of the links between everything using LlamaIndex workflows. So let's jump into the code. I will go through this bit fairly quickly because I mainly want to focus on LlamaIndex here. We're going to be using this dataset for our RAG components.
So I'm going to come to here, and we're going to initialize our connection to OpenAI. We're going to be using the text-embedding-3-small embedding model to construct our index. Then we come down to Pinecone here and also initialize our connection to Pinecone. For both of these, we do need API keys.
So for OpenAI, there's a link here; it's platform.openai.com. And for Pinecone, we go to app.pinecone.io and insert our API key there. Cool. Then we're going to set up our index. I just want to check what the free-tier serverless region is, which I can see on the pricing page for Pinecone.
It says us-east-1 is the free-tier region, so I'm going to put that in here. Then what we're going to do is get the dimensionality for our embedding model, which is 1536. You'll see that I already have my index defined here because I ran this before; otherwise, you will see that your vector count is zero. The setup looks roughly like the sketch below.
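For reference, here's a rough sketch of what that index setup looks like, assuming the modern Pinecone client; the index name and metric here are assumptions, so check the notebook for the exact values.

```python
# Rough sketch of the Pinecone index setup (index name and metric are assumed).
import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "llamaindex-research-agent"  # hypothetical name
dims = 1536  # dimensionality of text-embedding-3-small

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=dims,
        metric="dotproduct",  # may differ in the notebook
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # free-tier region
    )

index = pc.Index(index_name)
print(index.describe_index_stats())  # vector count is 0 until you upsert
```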
So to populate your index, you will need to run this next cell here. That is it for the RAG, or knowledge base, setup. Then we define our agent components. All we're doing here is defining a few functions, and these functions are asynchronous, although I will say some of the later ones are not truly written in an async way.
We can pretend they're fully async, but they're not; this one is fine, though. So we have this fetch_arxiv function, which, as the description says, gets the abstract from an arXiv paper given the arXiv ID. It's worth spending a little bit of time on what we're doing here.
So we are, of course, defining an async function. Because we're using LlamaIndex workflows, it's best to do this; you don't have to, you can actually use synchronous functions as well if you prefer, but async is best. This is the description for the tool that will be passed to our agent later.
So yes, you want to write something in natural language there. The agent will also be looking at your parameters here to decide what it needs to input or generate to use this tool. Okay, so we have that tool. Then we also have a web search tool here.
Okay, so the web search tool is a similar thing again. It's declared async, although is this one actually async? Nope. We're defining it with async here, but you'd need to do a bit more work to actually make it asynchronous; it's fine for the example, though. We are using SerpAPI for it. A hedged sketch of both tool patterns is below.
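Here's a sketch of that tool-function pattern, not the notebook's exact code: the docstring becomes the tool description and the typed parameters tell the agent what to generate. The `arxiv` and `serpapi` packages and the `asyncio.to_thread` wrapping are my assumptions, the last being one way to make the sync clients genuinely non-blocking.

```python
# Sketch of the tool-function pattern (assumed details, not the notebook's code).
import asyncio
import os

import arxiv  # assumed: the `arxiv` package for paper metadata
from serpapi import GoogleSearch  # assumed: the `google-search-results` package


async def fetch_arxiv(arxiv_id: str) -> str:
    """Gets the abstract from an ArXiv paper given the arXiv ID."""
    # The arxiv client is synchronous, so push it onto a thread so it doesn't
    # block the event loop (the notebook version skips this extra step).
    def _fetch() -> str:
        search = arxiv.Search(id_list=[arxiv_id])
        paper = next(arxiv.Client().results(search))
        return paper.summary

    return await asyncio.to_thread(_fetch)


async def web_search(query: str) -> str:
    """Finds general knowledge information using a Google web search."""
    print("> web_search")  # so we can see later which tool runs when

    def _search() -> str:
        results = GoogleSearch(
            {"q": query, "api_key": os.environ["SERPAPI_API_KEY"]}
        ).get_dict()
        organic = results.get("organic_results", [])
        return "\n---\n".join(
            f"{r.get('title')}\n{r.get('snippet')}" for r in organic
        )

    return await asyncio.to_thread(_search)
```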
We define that, we write our little description here, and I add this print just so later on we can see what is being run and when. Then we do similar things for our RAG tools: we're going to have rag_search_filter and rag_search.
They're all much the same; again, not truly async in the way we're running them, but that's fine. The functions are declared async, it's just the index querying inside that isn't. Then we have our final answer tool. Now, do you need to do this? Not necessarily, but you can if you like.
This is basically a tool we give the agent so it can construct that final, structured output. You don't need to do this; it's just something I like to do. Okay, and then we come to defining our LLM. The Oracle LLM, as I call it here, is essentially the decision maker.
So the Oracle makes decisions on which tool to use. We have the system prompt that tells it what to do, and we define the tools it will have access to here. We're using LlamaIndex's FunctionTool object to take the functions we've written and convert them into LlamaIndex tools.
What you can do here, if you don't define async functions, is just pass them in as sync functions, which in this case might be a good idea for some of them; but anyway, I just went with async for all of them. Then what I'm doing here is defining our LLM.
So OpenAI, we're using gpt-4o; of course, use whatever you like there. The one thing that I did add here is to enforce the use of a tool every time, so tool choice is set to required. Then I think that's everything on the LLM side, and we can come down to setting up the workflow itself; a sketch of this LLM and tool setup is below.
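As a rough sketch of that setup, assuming FunctionTool.from_defaults accepts async functions via async_fn and that tool_choice can be forwarded through additional_kwargs (both assumptions on my part):

```python
# Sketch of the tool wrapping and Oracle LLM setup (parameter names assumed).
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

# Wrap the plain async functions as LlamaIndex tools; a regular (sync) function
# would be passed via fn=... instead.
tools = [
    FunctionTool.from_defaults(async_fn=fetch_arxiv),
    FunctionTool.from_defaults(async_fn=web_search),
    # ...plus rag_search, rag_search_filter, and final_answer in the notebook
]

# The Oracle only ever decides which tool to call next, so force a tool call
# on every turn (tool_choice is also passed at call time further down).
oracle_llm = OpenAI(
    model="gpt-4o",
    temperature=0.0,
    additional_kwargs={"tool_choice": "required"},  # assumption: forwarded to the API
)
```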
Okay, so this is where we have our steps and the events that trigger each one of those steps. Let's take a look at those. Starting with events: every workflow in LlamaIndex workflows has a start event, which triggers the workflow, and a stop event, which ends the workflow. We also define a couple of extra events here: an input event and a tool call event.
This one here you can ignore; I simplified things so that we don't need it, so we just have these two. Now, what is quite important with these events is that you give them good typing. That's where that enforced structure comes from, which I think is ideal for any code, to be honest, but it's especially important here.
For the input event, we expect a list of ChatMessage objects from LlamaIndex, passed to the input parameter. For the tool call event, we expect a tool ID, tool name, and tool parameters. These are generated by the LLM; the LLM decides them, and the plumbing is handled by LlamaIndex. A sketch of these events is below.
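Here's roughly what those event definitions look like; the exact field names are my best guess from the video, not copied from the notebook.

```python
# Sketch of the custom events (field names inferred, not copied from the notebook).
from llama_index.core.llms import ChatMessage
from llama_index.core.workflow import Event


class InputEvent(Event):
    # The full chat history handed back to the Oracle LLM for its next decision.
    input: list[ChatMessage]


class ToolCallEvent(Event):
    # Populated from the tool call the LLM decided to make.
    tool_id: str
    tool_name: str
    tool_kwargs: dict
```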
Okay, cool. So we're going to build our workflow. The workflow is a class that inherits from the Workflow object here. We define everything that our research agent workflow is initialized with: we set the oracle, which is our LLM; we set the tools; and then I define this get_tool attribute, which is just a dictionary that lets us quickly look a tool up by its name, mapping from strings to the actual functions. Then we initialize our chat history and our memory. So we have that. The next thing we define is the step that is taken at the very beginning of our workflow.
Now, one thing I made sure to add in here was to clear the memory. You don't need to clear the memory; it depends on what you're trying to do. If you want to maintain your chat memory between runs of your agent, then you probably would not do this.
So you could just comment this out if you don't want to clear the memory, but then you would need some other way of starting a new conversation with your workflow agent, such as redefining or reinitializing your research agent. In this case, what I've done is that every time you start a new workflow run, it reinitializes the conversation.
Whether you do that or not is up to you and the thing you're building; in a lot of cases, particularly chatbots, you wouldn't. Again, it's up to you. Now, that start event is the default event type from LlamaIndex, and it includes an input parameter, which is the user query.
So we take that, format it into a ChatMessage, and add it to our memory. Now our memory contains the user message, but actually, I just realized there's an error: we should also have our system message in here.
I'll leave this bit in, but essentially what I want to do, after defining or reinitializing my memory, is call self.memory.put with the system message. I think that is required, unless I'm misunderstanding what I already did. The role here would be system, and this would be the content; sorry, my bad. There we go. So now in our chat history, the first message is always going to be the system message, followed by the user's input message. Great. I think that's everything: this step initializes our agent on every new call. The sketch below reflects the fixed version.
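Pieced together, the class and its first step look something like this. It's a sketch, assuming the attribute names and an abridged system prompt, and it builds on the InputEvent defined above.

```python
# Sketch of the workflow class and its first step (names and prompt abridged).
from llama_index.core.llms import ChatMessage
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.workflow import StartEvent, Workflow, step

SYSTEM_PROMPT = "You are the oracle: decide which tool to use next..."  # abridged


class ResearchAgent(Workflow):
    def __init__(self, *args, oracle, tools, **kwargs):
        super().__init__(*args, **kwargs)
        self.oracle = oracle
        self.tools = tools
        # Map tool name -> tool so run_tool can look tools up by string.
        self.get_tool = {t.metadata.name: t for t in tools}
        self.memory = ChatMemoryBuffer.from_defaults()

    @step
    async def prepare_chat_history(self, ev: StartEvent) -> InputEvent:
        # Start each run with a fresh conversation; comment the reset out if
        # you want memory to persist between runs.
        self.memory.reset()
        self.memory.put(ChatMessage(role="system", content=SYSTEM_PROMPT))
        # ev.input is whatever was passed to .run(input=...)
        self.memory.put(ChatMessage(role="user", content=ev.input))
        return InputEvent(input=self.memory.get())
```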
Then what do we have next? Handle LLM input. This step takes our input event, the one we defined, which we know should consist of a list of chat messages. So the chat history is the event's input. We extract the chat history and pass it along via the event. How you do that is kind of up to you; I like passing information between steps via the event.
So that's what we do here. We get our Oracle response using achat_with_tools, and we await that because it's asynchronous. We pass the tools to achat_with_tools, along with the chat history, and we set tool_choice to required. Again, that's just doing what we did earlier.
I'm not sure which of those two places you actually need, or whether both are considered; you probably only need one of them, but I'm not sure which, so I put it in both places, and that's fine. Now we want to get our response. After it's been generated, we take the message from that response and put it into our memory.
The message here is already a ChatMessage object with the role of assistant and, of course, the content generated by our LLM. Now, there is always going to be a tool call in our response, because we've set tool_choice to required. Even when the LLM is supposed to be answering the user directly, it has to answer via our final answer tool.
So there's going to be, or there should be, a tool call; if there isn't, there's a problem. What we do is this: if that tool call is final answer, we return a stop event, which stops everything and returns what the LLM generated as input to the final answer tool, following the structure we defined earlier for that tool.
So yes, the stop event stops the workflow and returns everything. Otherwise, we return a tool call event; this is where we return different types of events depending on which step we'd like to trigger. If we are returning a tool call event, we need to pass in our ID, name, and params as we defined them. A sketch of this step is below.
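Continuing the sketch of the same class (extended here so each piece stays short), the decision step might look like this; the final_answer tool name and the exact tool-call handling are my assumptions.

```python
# Sketch of the decision step, extending the ResearchAgent sketch from above.
from llama_index.core.workflow import StopEvent, step


class ResearchAgent(ResearchAgent):  # notebook-style extension of the earlier sketch
    @step
    async def handle_llm_input(self, ev: InputEvent) -> ToolCallEvent | StopEvent:
        chat_history = ev.input

        # Ask the Oracle which tool to use next; tool_choice is passed here as
        # well as on the LLM itself, as in the video.
        response = await self.oracle.achat_with_tools(
            tools=self.tools,
            chat_history=chat_history,
            tool_choice="required",
        )
        self.memory.put(response.message)

        tool_calls = self.oracle.get_tool_calls_from_response(
            response, error_on_no_tool_call=False
        )
        if not tool_calls:
            # Shouldn't happen with tool_choice="required", but just in case.
            return StopEvent(result=response.message.content)

        tool_call = tool_calls[0]
        if tool_call.tool_name == "final_answer":  # assumed registered tool name
            # Done: hand back whatever the LLM generated for the final answer.
            return StopEvent(result=tool_call.tool_kwargs)

        return ToolCallEvent(
            tool_id=tool_call.tool_id,
            tool_name=tool_call.tool_name,
            tool_kwargs=tool_call.tool_kwargs,
        )
```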
Okay, so that's our custom structure. And then, yep, the stop event is fine as it is; we don't need a step to handle it, it just ends the workflow. The tool call event we do need to handle, and we handle it with the run_tool method here. That step, triggered by the tool call event, outputs an input event.
So basically we go back to the earlier step, where the LLM makes another decision. And what we're doing here is this: we get the tool name, we have the keyword arguments, and we get our chosen tool using the get_tool dictionary that we defined before.
So that is this here; it maps us from the tool name as a string to the actual tool function. We fetch it using a .get(), so if there is no tool with that name, no key in our dictionary with that tool name, it returns None.
If it returns None, what we return is a chat message with the role of tool whose content just says we couldn't find that tool. Ideally this shouldn't happen, but just in case it does, this is error handling, but for the agent.
Hopefully that doesn't happen; if it does, we should probably handle things differently. There's another little bug I just found: this should be here, yes, like that. Okay, that will work now. So basically, if there is no tool, we return this tool message, it gets added to our memory, and then it's returned.
So now our agent, our Oracle LLM, can see that we tried to use a tool that doesn't exist and will hopefully not call it again; it shouldn't call it in the first place, but just in case. Otherwise, let's say the tool does exist: we call the tool using acall, await the response, passing in the parameters that our LLM generated, and we then create a tool message.
So the role is tool, and the content is the tool's output. If it does a RAG search, it returns all the content it retrieved; if it decides to use the web search tool, it returns the results it got from that. And that's all it's doing.
We then add that to the memory and return it to the earlier step. So this workflow can keep looping between handle LLM input and run tool until it has enough information. Once it does, the Oracle LLM will decide to use the final answer tool to respond to the user. A sketch of the run tool step is below.
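Here's a sketch of that tool-execution step, again extending the class from above; pairing the tool message with the tool call ID via additional_kwargs is how the OpenAI API expects tool results, though the exact fields in the notebook may differ.

```python
# Sketch of the tool-execution step, extending the ResearchAgent sketch above.
from llama_index.core.llms import ChatMessage
from llama_index.core.workflow import step


class ResearchAgent(ResearchAgent):
    @step
    async def run_tool(self, ev: ToolCallEvent) -> InputEvent:
        tool = self.get_tool.get(ev.tool_name)

        if tool is None:
            # "Error handling for the agent": tell the LLM the tool doesn't
            # exist instead of crashing, then let it choose again.
            content = f"Tool {ev.tool_name} does not exist"
        else:
            output = await tool.acall(**ev.tool_kwargs)
            content = str(output)

        self.memory.put(
            ChatMessage(
                role="tool",
                content=content,
                # OpenAI needs the tool_call_id to pair the result with the call.
                additional_kwargs={"tool_call_id": ev.tool_id, "name": ev.tool_name},
            )
        )
        return InputEvent(input=self.memory.get())
```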
So we have that, and we can draw the workflow we just defined. I'll do that; let's open it and see what it looks like. We're in Colab, so we'll just open this on the side here; I think I have to download it first. Okay, so it looks like this.
We have our start event, then prepare chat history, and we go to our input event. That goes to handle LLM input, which can either use the tool call event to run a tool, or go to the stop event. And it can go around this loop for quite a few turns if it needs to.
Then eventually it goes to the stop event and it's done. So that is our workflow, a nice little graph; it looks like this. Cool. So, initializing the workflow: for our research agent, I defined that we need to pass in our Oracle LLM and the tools that we're going to use.
This is nice, because if you want to change the set of tools the agent has access to, you can do that super easily: you just change the inputs here. You don't need to change the research agent code itself in most cases, unless there's some sort of dramatic change. A sketch of drawing and running the workflow is below.
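For reference, a sketch of drawing, initializing, and running the workflow; draw_all_possible_flows lives in the llama-index-utils-workflow extra, and the filename here is arbitrary.

```python
# Sketch of drawing and running the workflow (filename is arbitrary).
from llama_index.utils.workflow import draw_all_possible_flows

# Writes an interactive graph of the steps and events to an HTML file.
draw_all_possible_flows(ResearchAgent, filename="research_agent.html")

research_agent = ResearchAgent(oracle=oracle_llm, tools=tools)

# In Colab/notebooks you can await directly; elsewhere wrap it in asyncio.run(...).
out = await research_agent.run(input="tell me about AI")
print(out)
```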
So I'm going to run this. Okay, I made a mistake; let me fix that quickly. Here you can see self.memory.put expects a message item; it expects a ChatMessage, and I just tried to throw the role and content in directly.
So here we'll write message equals ChatMessage, like so. Rerun that, rerun the rest. Okay, and then we have another error: invalid type, expected object, got a string. Let's work through that. Not sure how I managed to break everything, but it's good to go through a bit of debugging as well.
Okay, my bad. So here we've already defined the system message; why did I do that? We just need to go here and here. Okay, cool. So there I was putting a ChatMessage within a ChatMessage, which obviously we don't want.
Great, so now it works. Just a silly little input here; I said "hi". Even though I put something really unhelpful in, it still sticks to that strict final answer structure that I wanted, which is intentional, so that's good. Then what we can do is say "tell me about AI" and see what it comes up with.
Actually, this bit you can ignore; we don't need to do that. Here I got a workflow timeout error. The reason I got that, and this is something I should change, is that the agent is using the web search tool, then going to RAG search, then to fetch arxiv, then to the RAG search filter, which is a good thing.
So it's going through and checking a ton of information, which is what I wanted it to do, but I set the timeout for our workflow a little bit too tight. To add a timeout, we need to set the workflow's _timeout attribute; a sketch is below.
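One hedged way to do that, extending the earlier sketch: accept a timeout argument with a default and set it on the workflow (passing timeout straight through to super().__init__ should work just as well).

```python
# Sketch of exposing the timeout on the research agent (attribute name assumed).
class ResearchAgent(ResearchAgent):
    def __init__(self, *args, oracle, tools, timeout: float = 20.0, **kwargs):
        super().__init__(*args, oracle=oracle, tools=tools, **kwargs)
        self._timeout = timeout  # seconds before .run(...) raises a timeout error
```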
So we're going to make that a timeout value that we pass in here, and I'm going to set a default value of, not 600, that's insane, but 20. A little bit higher than before, and you can override it if you want when you're initializing the agent. Run that, run everything again, initialize our agent, and we'll ask it to tell us about AI again.
So we run this and see what happens. It's going to go through; I don't know if the timeout is long enough, we might need to increase it a little further, but let's see. Some of these tools also take a little while to actually respond and get the information.
Okay, let's go a little bit longer; you can see here the run was right up against that 20-second limit. Let's go with a timeout of 30 and hope that's long enough; otherwise it's more a case of prompting the agent not to use too many steps.
Let's see. Okay, cool, that worked, and it took just about 20 seconds. So we have our introduction, which looks good; the introduction is just this bit. Then we have the research steps: conduct a web search, gather information, utilize a specialized AI research tool for academic papers, and retrieve specific historical context and developments in AI from both general and specialized sources.
That's what it's doing with each one of these tools here, going back and forth. Then the main body; it's giving us a ton of text here. Look, we can see that a British person pioneered AI, of course. So it looks generally quite good.
It's a nice little history lesson, actually. Very good. Then we have the conclusion, AI has come a long way, and then the sources. I think this is really useful, just having the sources of where everything came from. Ideally it'd be nicer to insert links and so on here, but there's time for that later.
Then, oh, we can also try this as well: running several research agent queries at once. Let me add that timeout here, a timeout of 30 for each one of these. Because we did everything in async, we should see a speed-up when we run everything in parallel, even though I didn't fully write everything in an async way.
It should at least be somewhat less blocking than it would be otherwise. So I define all these calls, put them in a list, and gather them with asyncio; it takes that list of awaitables and runs them all together. A sketch is below.
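A sketch of that pattern with asyncio.gather; the prompts here are placeholders, and I've created a fresh agent per query, since a single shared instance would interleave its self.memory across concurrent runs (which may be the oddity that shows up in the outputs below).

```python
# Sketch of running several workflow queries concurrently.
import asyncio

queries = [
    "tell me about AI",            # placeholder prompts; the video uses four
    "please don't talk about AI",
    "what is the latest in LLMs from OpenAI?",
    "tell me about retrieval augmented generation",
]

# A fresh agent per query avoids sharing one memory buffer across runs.
tasks = [
    ResearchAgent(oracle=oracle_llm, tools=tools, timeout=30).run(input=q)
    for q in queries
]

# gather schedules the coroutines together, so waiting on one LLM or tool call
# overlaps with the others instead of blocking them.
results = await asyncio.gather(*tasks)

for r in results:
    print(r)
```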
Okay, so I'll run that and see how long it takes. It should be pretty similar to how long it took to run the single query last time, because we're doing everything in parallel here. So we're printing four web searches, because all of them go to web search first.
Then three of them went to RAG search and one went to fetch arxiv, then we have this next group, and they're still continuing here. And it's just finished: 25 seconds, slightly longer than running one, but much faster than running four of them one after the other.
So that's why we use async. Cool, so we have our outputs. This is output zero, and it's actually responding to "hi", so there's probably something wrong with the code there; I wonder if it's somehow picking up the history from the first agent run. Something to check. Let's just have a look at number one.
I don't remember that happening last time. Number one was "tell me about AI", so we'd expect that to talk about AI. Number two was "please don't talk about AI", and this one's talking about RAG. And number three, what did I ask again? "What is the latest in LLMs from OpenAI?", which it does talk about.
So we ran all of those in parallel, with async. Something weird happened there; I probably should have reinitialized the agent, but that's fine. Generally, other than that little hiccup, it looks pretty good: we can see async working really nicely, and we've just seen how to use the library, which is great.
And it was working really well; the agents are actually going through a lot of different steps, and it's functioning as you would expect, which is pretty nice. So I think that is it for this video. We've seen LlamaIndex workflows, and they look pretty nice. There are definitely a lot of pros and some cons, depending on what you want to do.
But honestly, I think just the async-first part of it is pretty valuable, because you get the kind of performance we saw at the end there: you can run four workflows in roughly the time it takes to run one, just by using async. And of course that scales.
You can run way more than four as well. So that one aspect of LlamaIndex workflows is by itself super valuable, in my opinion. I can't say that I have a general preference yet between LangGraph and LlamaIndex workflows; they're both different, both good in different ways.
I think each does some things better than the other, that's for certain, so it's kind of hard to say. For now, I have LangGraph in production and I'll probably just leave that as it is; I'm not planning to switch everything over yet. Maybe at some point in the future, but I need to use LlamaIndex workflows more before I can say anything about jumping ship or which one I prefer.
So yeah, that's it for this video. I hope all of this has been useful and interesting, but for now I'll leave it there. Thank you very much for watching, and I will see you again in the next one. Bye.