Llama Index Workflows | Building Async AI Agents
Chapters
0:00 Llama Index Workflows
0:53 Llama Index vs. LangGraph
5:27 Python Prerequisites
6:40 Building Knowledge Base
8:20 Defining Agent Tools
11:02 Defining the LLM
12:31 Llama Index Workflow Events
14:00 Llama Index Agent Workflow
24:25 Debugging our Workflow
26:47 Using and Tweaking our Agent
30:05 Testing Llama Index Async
00:00:00.000 |
Today, we're going to be taking a look at Llama Index's workflows for building agents. 00:00:06.720 |
Now, Llama Index's workflows, as explained in their introductory article here, are a new event-driven way of building agentic flows. 00:00:18.880 |
So even this visual here just gives you a good understanding of what they're trying to do. 00:00:23.480 |
So you're defining these steps that your agentic flow can take, and you also define the events that trigger those steps. 00:00:35.280 |
Now, a lot of people would compare this to LangGraph, and I think that is completely justified. 00:00:42.160 |
So before we dive into building stuff with Llama Index, I just want to talk a little 00:00:48.080 |
bit about what I think the main differences between the two libraries are. 00:00:53.600 |
Now, there are many differences, but the most fundamental, in my opinion, is that Llama Index, 00:00:59.680 |
when compared to LangGraph, seems to offer higher-level abstractions that seem to be 00:01:05.120 |
structured a little bit better than what LangGraph does. 00:01:09.080 |
And I wouldn't say that's either a pro or a con, because honestly, one of the things 00:01:13.780 |
that I like about LangGraph is that you can go a little more low-level. 00:01:23.200 |
There's still a ton of abstractions in LangGraph, and it can be very confusing. 00:01:30.160 |
There's a lot of abstractions, but I feel with LangGraph, it's a little bit easier to 00:01:34.360 |
remove at least a few of those abstractions, which I usually prefer, but it's not always needed. 00:01:43.660 |
Especially if you don't need to know every single thing that is going into your LLM or 00:01:47.540 |
coming out of these different steps, that's not always needed. 00:01:51.080 |
So whether this is a pro or a con really depends, in my opinion, on the developer. 00:02:00.720 |
Now, conceptually, I think the structure of the agents that you're building is quite similar between these two libraries. 00:02:10.800 |
However, with LangGraph, the focus is on kind of like building a graph, so you're connecting 00:02:16.340 |
these nodes and edges, whereas with Llama Index, it is event-driven. 00:02:23.520 |
So one of your steps could output event type A, and that would trigger step type A. Or 00:02:33.020 |
it could instead output event type B, and that would trigger step type B. Okay? 00:02:39.940 |
So it's more looking at what events trigger which steps. 00:02:47.120 |
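For reference, here is a minimal sketch of what that event-driven pattern looks like in code. This isn't the agent we build later; the event and step names are made up purely to illustrate how the event a step returns decides which step runs next.

```python
from llama_index.core.workflow import Event, StartEvent, StopEvent, Workflow, step

class EventA(Event):
    payload: str

class EventB(Event):
    payload: str

class BranchingFlow(Workflow):
    @step
    async def route(self, ev: StartEvent) -> EventA | EventB:
        # Whichever event we return here determines which step runs next.
        if "a" in ev.input:
            return EventA(payload=ev.input)
        return EventB(payload=ev.input)

    @step
    async def handle_a(self, ev: EventA) -> StopEvent:
        return StopEvent(result=f"step A handled: {ev.payload}")

    @step
    async def handle_b(self, ev: EventB) -> StopEvent:
        return StopEvent(result=f"step B handled: {ev.payload}")

# result = await BranchingFlow(timeout=10).run(input="apples")
```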
Now that, I think, mostly makes a difference when you're building stuff. 00:02:52.700 |
I think the actual outcome of what you get from that, at least from my experience, is pretty much the same. 00:03:02.220 |
I think you can build pretty similar structures with both of those approaches. 00:03:11.780 |
It's more about how you're thinking about what you're building. 00:03:14.280 |
I think that's the main difference in terms of the structure. 00:03:17.860 |
And finally, one thing that I have noticed with Llama Index, which I really like actually, 00:03:24.740 |
is that they prioritize the use of async a lot. 00:03:30.260 |
Now, that may also come with pros and cons, but in my opinion, it's mainly pros. 00:03:35.980 |
So the learning curve for developers that are not too familiar with writing asynchronous code is a bit steeper. 00:03:44.340 |
However, the end product that you will build is almost definitely going to be better if 00:03:52.020 |
you're writing async code, particularly with agents. 00:03:56.460 |
And I say that because agents contain LLMs, and LLMs take a little bit of time to process and respond. 00:04:05.980 |
If you're writing synchronous code, whilst your LLM is doing that, your code is just sitting there waiting. 00:04:15.000 |
If you're writing asynchronous code, your code can be, you know, doing other things in the meantime. 00:04:21.140 |
And it's not just waiting for the LLM; there are other things, of course, that you're waiting on as well. 00:04:25.940 |
So what that translates to when you are kind of more forced to write async code is something 00:04:32.660 |
that's probably a lot more scalable and more performant. 00:04:36.380 |
Now, those are the main things to take note of, in my opinion, when you're comparing the two libraries. 00:04:46.120 |
But one thing I should note here as well is I've used LangGraph a lot more than I've used Llama Index workflows. 00:04:54.580 |
I have gone to prod with LangGraph, I have not with Llama Index workflows. 00:05:02.080 |
So there may be a lot of stuff that I'm missing. 00:05:04.540 |
And I just want to caveat everything I'm saying with that fact. 00:05:09.700 |
Nonetheless, at least from what I've seen so far, that's what I've noticed. 00:05:15.180 |
So I think it's relatively representative of the two libraries. 00:05:20.020 |
In any case, let's jump straight into building something with Llama Index workflows. 00:05:26.700 |
Okay, so we're first just going to install the prerequisite libraries. 00:05:32.540 |
This example we're going to go through, we did something very similar in the past with LangGraph. 00:05:38.200 |
So it might be useful, particularly for going from LangGraph to Llama Index or the other way around. 00:05:43.420 |
It might be useful to have a look at that video as well. 00:05:47.940 |
So I'll just make sure there's a link to that in the description and comments. 00:05:54.180 |
But essentially what we're going to be doing is building an AI research agent using Llama 00:06:01.060 |
Index, of course, with a few components, right? 00:06:08.300 |
Those components or tools are going to be a RAG search tool, a RAG search filter, a web 00:06:15.560 |
search tool, and I think an arXiv search tool, if I remember correctly. 00:06:22.820 |
So all those are going to come together and basically be tools that our research agent can use. 00:06:27.700 |
And we're going to construct all of the links between everything using Llama Index workflows. 00:06:40.700 |
And I will go through this bit kind of quickly because I want to mainly focus on Llama Index workflows. 00:06:46.140 |
So we're going to be using this dataset for our rag components. 00:06:52.100 |
So I'm going to come to here and we're going to just initialize our, like, our connection to OpenAI. 00:06:59.460 |
We're going to be using the text-embedding-3-small embedding model to construct our embeddings. 00:07:07.940 |
Then we come down to Pinecone here and also initialize our connection to Pinecone. 00:07:15.380 |
You know, for both of these, we do need API keys. 00:07:17.900 |
So for OpenAI, there's a link here, it's just, I think, platform.openai.com. 00:07:24.380 |
And then here, we're going to app.pinecone.io, and then we just insert our API key there. 00:07:34.700 |
Then we're going to set up our index; I just want to check what the free tier serverless region is. 00:07:42.980 |
You can see it on the pricing page for Pinecone. 00:07:45.740 |
So it says US East 1 is the free tier region. 00:07:52.940 |
Then what we're going to do is get the dimensions for our embedding model, which is 1536. 00:08:01.540 |
And you'll see that I already have my index defined here. 00:08:04.580 |
So I ran this before, but otherwise what you will see here is that your vector count will be zero. 00:08:10.980 |
So to populate your index, you will need to run this next cell here. 00:08:18.660 |
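As a rough sketch of what that knowledge base setup typically looks like (the index name and metric here are placeholders, and the exact notebook code may differ slightly):

```python
import os
from openai import OpenAI
from pinecone import Pinecone, ServerlessSpec

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # key from platform.openai.com
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])          # key from app.pinecone.io

# text-embedding-3-small produces 1536-dimensional vectors by default;
# embedding a dummy string is an easy way to grab that number programmatically.
dims = len(
    openai_client.embeddings.create(
        model="text-embedding-3-small", input=["dimension check"]
    ).data[0].embedding
)

index_name = "ai-research-agent"  # placeholder name
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=dims,
        metric="dotproduct",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # free tier region
    )
index = pc.Index(index_name)
print(index.describe_index_stats())  # vector count is 0 until the dataset is upserted
```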
That is it for the RAG setup part or knowledge base setup. 00:08:26.400 |
So all we're doing here is defining a few functions. 00:08:34.340 |
I will say some of the later functions are not written in a fully async way. 00:08:40.600 |
We can pretend they're fully async, but they're not. 00:08:45.060 |
So we have this fetch arxiv function which, as you can see from the description here, 00:08:50.200 |
gets the abstract from an arXiv paper given the arXiv ID. 00:08:54.180 |
Worth spending a little bit of time on what we're doing here. 00:08:57.420 |
So we are, of course, defining an async function because we're using Llama index workflows. 00:09:05.900 |
You don't have to, you can actually use synchronous as well if you prefer. 00:09:13.820 |
This is a description for the tool that will be passed to our agent later. 00:09:19.140 |
So, yeah, you want to write something in natural language that is descriptive there. 00:09:25.020 |
And then the agent will also be looking at your parameters here to decide, okay, you 00:09:31.220 |
know, what do I need to input or generate for using this tool? 00:09:52.540 |
So, you know, this is, we're defining async here, but you need to do a bit more work to 00:09:57.300 |
actually make this asynchronous, but it's fine for the example. 00:10:09.540 |
I add this print just so later on we can see what is being run when. 00:10:16.820 |
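A hedged sketch of what a tool like that can look like; the function name, docstring, and the arXiv export API call are illustrative rather than the notebook's exact code:

```python
import re
import requests

async def fetch_arxiv(arxiv_id: str) -> str:
    """Gets the abstract from an ArXiv paper given the arXiv ID."""
    # The docstring above doubles as the tool description the agent sees.
    # Declared async so it slots into the workflow, but requests.get itself is
    # blocking -- an httpx.AsyncClient call would make it genuinely asynchronous.
    print("fetch_arxiv called")  # so we can see later which tool runs when
    res = requests.get(f"http://export.arxiv.org/api/query?id_list={arxiv_id}")
    match = re.search(r"<summary>(.*?)</summary>", res.text, re.DOTALL)
    return match.group(1).strip() if match else "No abstract found."
```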
Similar things for our rag endpoints or rag tools. 00:10:20.700 |
So we're going to have rag search filter and rag search. 00:10:23.300 |
So all the same, again, not async in the way that we're running this, but that's fine. 00:10:31.940 |
We have some async, but the index querying here is not, this is fine. 00:10:48.900 |
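And a similar sketch for the two RAG tools, reusing the `openai_client` and Pinecone `index` from the setup sketch above; the metadata field names are assumptions:

```python
def _embed(query: str) -> list[float]:
    # Embed the query with the same model used to build the knowledge base.
    return openai_client.embeddings.create(
        model="text-embedding-3-small", input=[query]
    ).data[0].embedding

async def rag_search(query: str) -> str:
    """Finds general information about AI using a natural language query."""
    print("rag_search called")
    # Pinecone's query call is synchronous, so this is only nominally async.
    res = index.query(vector=_embed(query), top_k=5, include_metadata=True)
    return "\n---\n".join(m.metadata.get("text", "") for m in res.matches)

async def rag_search_filter(query: str, arxiv_id: str) -> str:
    """Finds information from a specific ArXiv paper given a query and that paper's arXiv ID."""
    print("rag_search_filter called")
    res = index.query(
        vector=_embed(query),
        top_k=5,
        include_metadata=True,
        filter={"arxiv_id": arxiv_id},  # metadata filter; the field name is an assumption
    )
    return "\n---\n".join(m.metadata.get("text", "") for m in res.matches)
```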
So this description is basically just telling the agent what we use this tool for. 00:11:06.380 |
So the Oracle LLM, as I call it here, is essentially the decision maker, right? 00:11:15.780 |
So Oracle makes decisions on what tool to use. 00:11:19.140 |
So we have the system prompt that just tells it kind of what to do. 00:11:26.780 |
We defined tools that it will have access to here. 00:11:30.880 |
So here we're using Llama Index's FunctionTool object to basically take the functions we've 00:11:38.100 |
generated and convert them into Llama Index tools. 00:11:43.940 |
So what you can do here, if you don't define async functions, you can just go with sync 00:11:51.380 |
function like that, which in this case might be a good idea for some of those. 00:11:59.340 |
But anyway, I just went with async for all. So what I'm doing here is just defining our LLM. 00:12:08.660 |
So OpenAI, we're using GPT-4o; of course, use whatever you like there. 00:12:14.180 |
The one thing that I did add here is just to, like, enforce the use of tools; we need the LLM to always respond with a tool call. 00:12:27.380 |
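Roughly, turning those functions into tools and defining the Oracle LLM looks something like this; `web_search` and `final_answer` are assumed to be defined the same way as the tools above, and the exact way tool use is enforced may differ from the notebook:

```python
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI

tools = [
    FunctionTool.from_defaults(async_fn=fetch_arxiv),
    FunctionTool.from_defaults(async_fn=rag_search),
    FunctionTool.from_defaults(async_fn=rag_search_filter),
    FunctionTool.from_defaults(async_fn=web_search),    # assumed web search tool
    FunctionTool.from_defaults(async_fn=final_answer),  # assumed structured final-answer tool
]
# For a plain synchronous function you would pass fn=my_sync_fn instead of async_fn=...

# The Oracle LLM: tool use gets enforced later by passing tool_choice="required"
# when we call achat_with_tools inside the workflow.
llm = OpenAI(model="gpt-4o", temperature=0)
```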
Then I think that's everything on the LLM side, and we can come down into basically the workflow itself, 00:12:36.860 |
where we have our steps and the events that trigger each one of those steps. 00:12:45.400 |
So, events: every workflow in Llama Index workflows has a start event, which triggers 00:12:52.460 |
the workflow, and a stop event, which ends the workflow. 00:12:58.280 |
We also define a couple of extra events here, and those are the input event and the tool call event. 00:13:07.240 |
I think here I have, yeah, this one, you can ignore, I just simplified it so that we don't 00:13:17.200 |
Now what is probably quite important with these events is that you have good typing on them. 00:13:23.180 |
That's where that sort of good structure or enforced structure comes from, which I think 00:13:29.240 |
is ideal for any code, to be honest, but I think it's a very important thing here. 00:13:34.800 |
So for the input event, we are expecting a list of ChatMessage objects from Llama Index. 00:13:45.520 |
With the tool call event, we're expecting a, like a tool ID, tool name, and tool parameters. 00:13:53.300 |
This is decided by the LLM, and the logic for it is handled by Llama Index. 00:14:04.940 |
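A sketch of those event definitions; the field names (`input`, `tool_id`, `tool_name`, `tool_kwargs`) are my guesses at what the notebook uses, but the typed-attribute pattern is the important part:

```python
from llama_index.core.llms import ChatMessage
from llama_index.core.workflow import Event

class InputEvent(Event):
    # The running chat history, passed between steps via the event itself.
    input: list[ChatMessage]

class ToolCallEvent(Event):
    # Populated from the tool call that the Oracle LLM generated.
    tool_id: str
    tool_name: str
    tool_kwargs: dict
```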
The workflow is defined as a class that inherits from the Workflow object here. 00:14:12.680 |
We define everything that we initialize our workflow with, so our research agent takes the Oracle LLM and the tools. 00:14:21.040 |
We set the tools, and then I define this get tool attribute, which is just a dictionary 00:14:27.040 |
that allows us to more quickly look up tools when we're given the name of a tool. 00:14:35.520 |
We use this basically to map between strings to actual functions. 00:14:43.240 |
Then we initialize our chat history and our memory. 00:14:53.640 |
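The constructor side of that might look roughly like this; the class and attribute names are assumptions:

```python
from llama_index.core.llms import ChatMessage
from llama_index.core.llms.function_calling import FunctionCallingLLM
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.tools import BaseTool
from llama_index.core.workflow import Workflow

class ResearchAgent(Workflow):
    def __init__(self, llm: FunctionCallingLLM, tools: list[BaseTool], **kwargs):
        super().__init__(**kwargs)
        self.llm = llm
        self.tools = tools
        # Map tool names (strings) to tool objects so run_tool can look them up.
        self.get_tool = {tool.metadata.get_name(): tool for tool in tools}
        self.chat_history: list[ChatMessage] = []
        self.memory = ChatMemoryBuffer.from_defaults(llm=llm)
```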
Then what we do here is, so this step is the step that is taken at the very beginning of the workflow. 00:15:04.120 |
Now one thing I made sure to add in here was to clear memory, right? 00:15:14.840 |
It kind of depends on what you're trying to do, right? 00:15:17.200 |
If you want to maintain your sort of chat memory between runs of your agent, then you 00:15:27.240 |
So you could just comment this out because you don't want to clear the memory. 00:15:30.600 |
But you do need a way of, I don't know, like if you are starting a new conversation with 00:15:36.440 |
your workflow agent, you would need to redefine or reinitialize your research agent. 00:15:42.880 |
In this case, what I've done here is that every time you start a new workflow run, it reinitializes the memory. 00:15:52.680 |
It's up to you and the thing that you are building as to whether you want that behavior. 00:15:58.560 |
I think in a lot of cases, particularly chatbots, you wouldn't do that. 00:16:04.240 |
Now that start event is the default event type from Llama index, and it includes an 00:16:11.600 |
input parameter, which is like the user query. 00:16:16.240 |
So we're going to take that and we're going to format it into a chat message, and then we add it to our memory. 00:16:24.120 |
So now in our memory, we're going to have the user message here, but actually I just noticed something. 00:16:36.200 |
And I'll leave this bit here, but essentially what I want to do is, after defining 00:16:45.800 |
or reinitializing my memory, I'm going to do self.memory.put, and we're going to put the user message in there. 00:16:54.400 |
I think that is required, unless I'm misunderstanding what I already did. 00:17:14.320 |
So now in our chat history, the first message is always going to be the system message. 00:17:31.640 |
That's what initializes our agent with every new call. 00:17:45.440 |
So if we come up here, we know the input event should consist of a list of chat messages. 00:17:53.980 |
So the chat history is the event input, right? 00:17:57.400 |
So it's actually, we separate it out from here, from this, and what we do is we actually 00:18:04.840 |
extract the chat history and sort of manually insert it into our event. 00:18:09.160 |
You don't, I mean, how you do that is kind of up to you. 00:18:12.920 |
I like that we pass the information between events, between steps via the event. 00:18:30.240 |
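Continuing the class sketch from above, that first step might look something like this; the system prompt content is a placeholder and the exact attribute names are assumptions:

```python
from llama_index.core.workflow import StartEvent, step

SYSTEM_PROMPT = "You are the oracle, an AI research assistant..."  # placeholder prompt

class ResearchAgent(Workflow):
    ...  # __init__ as in the earlier sketch

    @step
    async def prepare_chat_history(self, ev: StartEvent) -> InputEvent:
        # Start each run with a clean slate -- comment this out if you want to
        # keep chat memory between runs of the same agent instance.
        self.memory.reset()
        self.memory.put(ChatMessage(role="system", content=SYSTEM_PROMPT))
        # ev.input is the user query passed to workflow.run(input=...).
        self.memory.put(ChatMessage(role="user", content=ev.input))
        # Pass the chat history along inside the event itself.
        return InputEvent(input=self.memory.get())
```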
Then we just pass the tools to our achat_with_tools call. 00:18:33.040 |
We have the chat history and we have tool choice set to required. 00:18:42.600 |
I'm not sure which of those you actually should use. 00:18:46.300 |
If both of those are considered or not, you probably only need one of them, but I'm not 00:18:51.840 |
In any case, I put in both places, it's fine. 00:19:02.840 |
After it's been generated, then we get the message from that response and we put it into our memory. 00:19:09.040 |
So the message format here is actually already a chat message object with the role of assistant. 00:19:15.600 |
And of course the message content generated by our LLM. 00:19:19.200 |
Now there is always going to be a tool call in our response because we've set tool choice to required. 00:19:28.560 |
And even when the LLM is supposed to be answering directly to the user, it has to answer via the final answer tool. 00:19:36.040 |
So there's going to be, or there should be a tool call. 00:19:42.320 |
So what we do is if that tool call is final answer, we return a stop event, which stops 00:19:49.200 |
everything and we return what the LLM generated to be input into the final answer tool, which 00:19:55.560 |
is going to follow that structure that we defined earlier for that tool. 00:20:00.840 |
So yes, the stop event stops the workflow and returns everything. 00:20:05.760 |
Otherwise we're going to return a tool call event, right? 00:20:09.040 |
So this is where we're returning different types of events depending on what step we'd like to trigger next. 00:20:16.040 |
So if we are triggering a tool call event, we need to pass in our ID, name and params for the tool call. 00:20:39.440 |
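That branching step, sketched out; `achat_with_tools` and `get_tool_calls_from_response` are LlamaIndex function-calling LLM methods, though whether `tool_choice` is accepted as a keyword can vary by version:

```python
from llama_index.core.workflow import StopEvent

class ResearchAgent(Workflow):
    ...  # earlier steps omitted

    @step
    async def handle_llm_input(self, ev: InputEvent) -> ToolCallEvent | StopEvent:
        chat_history = ev.input
        response = await self.llm.achat_with_tools(
            self.tools, chat_history=chat_history, tool_choice="required"
        )
        self.memory.put(response.message)  # assistant message, including its tool call

        tool_calls = self.llm.get_tool_calls_from_response(
            response, error_on_no_tool_call=False
        )
        tool_call = tool_calls[0]  # tool_choice="required" means there should always be one
        if tool_call.tool_name == "final_answer":
            # The args generated for final_answer become the workflow's final result.
            return StopEvent(result=tool_call.tool_kwargs)
        return ToolCallEvent(
            tool_id=tool_call.tool_id,
            tool_name=tool_call.tool_name,
            tool_kwargs=tool_call.tool_kwargs,
        )
```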
If this happens, we do need to handle that and we handle that using the run tool method 00:20:45.960 |
So that step, triggered by the tool call event, outputs an input event. 00:20:50.380 |
So basically we'll go back to this step here, where you're going back to the LLM to decide what to do next. 00:21:01.800 |
And what we're doing, okay, we're getting the tool name. 00:21:09.600 |
So this is using the get tool dictionary that we defined before. 00:21:20.520 |
So this maps us from the tool name as a string to the actual tool function. 00:21:30.520 |
So if there is no tool with that tool name, or no key in our dictionary with that tool name, it returns none. 00:21:40.320 |
So if it returns none, what we return is this chat message, the role is tool and the content 00:21:46.600 |
is just saying, okay, we couldn't find this tool. 00:21:49.440 |
This should not, ideally this shouldn't happen. 00:21:53.240 |
But just in case it does, this is kind of like error handling, but for the agent. 00:22:01.480 |
Actually, if it does happen, we should probably handle things differently. 00:22:05.560 |
So another little bug I just found, which is this should be here. 00:22:21.720 |
So basically if there is no tool, we're going to return this tool message, and that will be added to our memory. 00:22:28.040 |
So now our agent or our Oracle LLM can see that we tried to use this tool that doesn't 00:22:32.640 |
exist and it will hopefully not call it again. 00:22:35.000 |
It shouldn't call it in the first place, but just in case. Otherwise, let's say the tool does exist. 00:22:41.760 |
We call the tool using acall, we await the response, we're passing in the parameters 00:22:48.880 |
that our LLM generated and we then create a tool message, right? 00:22:56.680 |
So let's say it does rag, it's going to return all the rag content that it searched for. 00:23:02.120 |
If it decides it wanted to use a web search tool, it will return the results it got from 00:23:10.680 |
We then again, add that to the memory and return it to this step here. 00:23:17.520 |
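And the run_tool step, again as a sketch continuing the same class; the "tool does not exist" message is the agent-facing error handling described above:

```python
class ResearchAgent(Workflow):
    ...  # earlier steps omitted

    @step
    async def run_tool(self, ev: ToolCallEvent) -> InputEvent:
        tool = self.get_tool.get(ev.tool_name)
        if tool is None:
            # Error handling for the agent: tell the LLM the tool doesn't exist
            # so it can choose a different one, rather than crashing the workflow.
            content = f"Tool {ev.tool_name} does not exist"
        else:
            tool_output = await tool.acall(**ev.tool_kwargs)
            content = tool_output.content
        self.memory.put(
            ChatMessage(
                role="tool",
                content=content,
                additional_kwargs={"tool_call_id": ev.tool_id, "name": ev.tool_name},
            )
        )
        # Hand the updated history back to handle_llm_input for the next decision.
        return InputEvent(input=self.memory.get())
```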
So basically this workflow can keep looping between handle LLM input and run tool. 00:23:26.000 |
Once it has enough information, the Oracle LLM here will decide to use the final answer tool. 00:23:56.600 |
We have our start event, prepare chat history, and we go to our input event, right? 00:24:02.440 |
Then that will go to handle LLM input, which can either use the tool call event to run a tool, or the stop event to finish. 00:24:10.880 |
And it will go, it can go in this loop for quite a few turns if it needs to. 00:24:17.160 |
And then eventually go to the stop event and we're done. 00:24:17.160 |
So initializing the workflow, our research agent, I defined that we need to pass in our 00:24:32.480 |
Oracle LLM, and we need to pass in the tools that we're going to use. 00:24:39.600 |
You can basically have some sort of, like, if you want to change a number of tools that 00:24:46.480 |
you have access to, you can do that super easily. 00:24:51.880 |
You don't need to change the research agent code itself. 00:24:55.560 |
In most cases, unless there's some sort of dramatic change. 00:25:05.660 |
So here you can see self.memory.put, and that expects a message item, right? 00:25:05.660 |
Here I just tried to, like, throw in the role and content directly there. 00:25:20.360 |
So this, we will write message equals chat message like so, okay. 00:25:44.240 |
So invalid type expected object, we've got a string. 00:25:53.560 |
Not sure how I managed to break everything, but it's good to at least go through a bit of debugging. 00:26:03.040 |
So here we've already defined the system message. 00:26:16.960 |
So there I was putting, like, a chat message within a chat message. 00:26:33.880 |
Even though I put something really stupid, it still sticks to that, like, strict final 00:26:39.720 |
answer structure that I wanted it to, which is intentional. 00:26:39.720 |
Then what we can do is just say, tell me about AI, see what, see what it comes up with. 00:26:53.160 |
So actually this, you can ignore, we don't need to do that. 00:27:00.440 |
So the reason that I got that is actually something I should change. 00:27:05.080 |
So here it's using, and this is a good thing, right? 00:27:09.320 |
It's using the web search, it's then going to reg search, it's then going to fetch archive, 00:27:15.720 |
So it's going through, it's like checking like a ton of information, which is what I 00:27:18.920 |
wanted to do, but I am a little bit stupid and put the timeout for our workflow to be too low. 00:27:32.880 |
So to adjust the timeout, we need to modify the self._timeout attribute. 00:27:42.460 |
So we're going to set that to a timeout value that we will insert here. 00:27:47.800 |
I'm going to set a default value of like, not 600, that's insane, like 20. 00:27:56.040 |
So a little bit higher, and you can modify if you want when you're initializing the agent. 00:28:01.920 |
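In code, that change is roughly the following; the video pokes the private `_timeout` attribute, and passing `timeout=...` to `super().__init__` achieves the same thing:

```python
class ResearchAgent(Workflow):
    def __init__(self, llm, tools, timeout: int = 20, **kwargs):
        # Workflow.run raises a timeout error once this many seconds have elapsed.
        super().__init__(timeout=timeout, **kwargs)
        self.llm = llm
        self.tools = tools
        self.get_tool = {tool.metadata.get_name(): tool for tool in tools}
        self.memory = ChatMemoryBuffer.from_defaults(llm=llm)

# research_agent = ResearchAgent(llm=llm, tools=tools, timeout=30)
```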
So run that, run everything again, initialize our agent, and we'll just ask it to tell us about AI again. 00:28:15.840 |
So it's going to go through; I don't know if it's long enough, we might need to increase it again. 00:28:22.920 |
Some of these tools also take a little while to actually respond to get the information. 00:28:30.240 |
Let's go a little bit longer, but you can see here, right, operation time left 20 seconds 00:28:38.760 |
Let's go with timeout of 30 and hope that's long enough, otherwise it's more a case of 00:28:46.880 |
maybe prompting the agent to not use too many steps. 00:29:00.060 |
So we have our introduction, which looks good. 00:29:03.920 |
We have, sorry, introduction is just this bit. 00:29:07.360 |
We have the research steps, it's like conduct a web search, gather information, utilize 00:29:14.400 |
a specialized AI research tool from academic papers, retrieve specific historical contexts 00:29:19.400 |
and developments in AI from both general and specialized sources. 00:29:22.600 |
So that's what it's doing with each one of these tools here, so it's going back and forth. 00:29:29.920 |
So then it's like, okay, it's giving us a ton of text here. 00:29:33.480 |
Look, we can see that a British person pioneered AI, of course. 00:29:44.440 |
Yeah, nice little history lesson here, actually. 00:29:54.040 |
So this is, I think this is really useful, just having the sources of where everything 00:29:57.640 |
Ideally, it'd be nicer if you insert links and everything here, but time for that later. 00:30:08.560 |
So we can just try running our research agent. 00:30:23.220 |
So because we did everything in async, we should see a speed up when we run everything 00:30:29.560 |
in parallel, even though I didn't fully write everything in async. 00:30:33.080 |
It should at least be slightly less blocking than it would be otherwise. 00:30:38.080 |
So I define all these calls, I'm putting them in a list and I'm gathering 00:30:44.080 |
them with asyncio. So basically, it's going to take that list of async tasks 00:30:54.320 |
And then it's going to run them all together. 00:31:00.960 |
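A sketch of that parallel test; the queries other than the two mentioned in the video are placeholders, and `research_agent` is assumed to be the workflow instance initialized earlier:

```python
import asyncio

queries = [
    "tell me about AI",
    "what is the latest in LLMs from OpenAI",
    "what is retrieval augmented generation",   # placeholder query
    "who are the main labs training LLMs",      # placeholder query
]

async def run_all():
    # Each .run() call is a coroutine; gather schedules them concurrently, so four
    # workflow runs take roughly as long as the slowest one, not the sum of all four.
    tasks = [research_agent.run(input=q) for q in queries]
    return await asyncio.gather(*tasks)

results = await run_all()  # top-level await works in a notebook; use asyncio.run() in a script
```

Note that because all four runs share the same agent instance, they also share its memory, which is likely what causes the crossed-over answers seen a little later.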
It should take a similar amount of time to how long it took to run just the one workflow last time. 00:31:07.840 |
We're doing everything in parallel here, right? 00:31:09.400 |
So we're printing like four web searches because all of them are going to web search first. 00:31:16.000 |
Then most of them, or three of them, went to RAG search, and one of them went to fetch arxiv. 00:31:22.540 |
And then we had this group and then we still have them continuing here. 00:31:27.680 |
So 25 seconds, slightly longer than just running one, but faster than running four of them sequentially. 00:31:45.720 |
So I'm probably doing something wrong with the code there. 00:31:52.520 |
I wonder if it's somehow getting the history from the first agent. 00:32:06.900 |
Number one is actually, okay, tell me about AI. 00:32:18.120 |
And this one, three, is talking about, what did I ask you about? 00:32:26.640 |
What is the latest in LLMs from OpenAI, which it talks about, right? 00:32:36.540 |
Probably I should have reinitialized the agent, but that's fine. 00:32:41.200 |
But yeah, generally, other than that little hiccup, it looks pretty good. 00:32:48.480 |
I mean, we've just seen how to use the library, which is great. 00:32:54.440 |
The agents are actually going through a lot of different steps and, yeah, it's functioning well. 00:33:03.840 |
I think obviously, yeah, we've seen Llama Index, and it looks pretty nice. 00:33:06.920 |
I think there are definitely a lot of pros, a lot of cons, depends on what you're wanting 00:33:12.560 |
But honestly, I think just the async part of it, like going with async first is pretty 00:33:19.720 |
valuable because you're going to see performance like we saw at the end there, right? 00:33:24.440 |
You can run four in roughly the same time that you would run one workflow, just by using async. 00:33:35.040 |
So yeah, just that one thing about Llama Index workflows is by itself super valuable, in my opinion. 00:33:44.040 |
I can't say that I would have a general preference yet between LangGraph and Llama Index workflows. 00:33:51.880 |
They're both different, both good in different ways. 00:33:55.120 |
I think both do some things better than the other, that's for certain. 00:34:01.520 |
I mean, for now, you know, I have LangGraph in production, I'll probably just leave that as it is. 00:34:07.320 |
I'm not planning to switch everything over yet, but maybe at some point in the future. 00:34:12.240 |
It's something where I need to try and use Llama Index workflows more before I can say 00:34:19.880 |
anything about jumping over or which one I prefer. 00:34:25.440 |
I hope all this has been useful and interesting, but for now I'll leave it there. 00:34:28.960 |
So thank you very much for watching and I will see you again in the next one.