
Stateful and Fault-Tolerant AI Agents


Chapters

0:00 What is Temporal?
4:25 Temporal 101
24:36 Temporal AI Agent Demo
35:38 Temporal Agent Code
56:48 Questions

Transcript

Today I'm going to talk about Temporal — I'm not sure how many people have heard of it. There's a lot of talk recently about durable workflow engines; everyone's making their own. So: what are these, what do they do, and why is it such a new paradigm? The project was basically born at Uber, from some engineers that later left Uber.

It was what Uber was using to orchestrate all their processes when you're booking a taxi. There are all these steps and states that go on: book a car, find a driver, the driver confirmed, and so on. What Uber did initially was the typical distributed message-queue thing, where you have all these messages going across message buses — like how we use Pub/Sub.

Then they realized that this whole way of doing things is very annoying, because you have to keep duct-taping stuff together — Celery, Redis, whatever. So they said: you know what, let's just make an engine that manages all these steps, state, and retries for us, and makes it durable.

So you can just write the code and it will execute exactly what you tell it to do. You don't have to keep handling edge cases — what if this error happens, what if we have to retry it this way — and then deal with dead-letter queues.

So they built this system. So, what is a durable workflow engine? There are three words to unpack, starting with "durable". Durable means a code execution that is persistent: the inputs and outputs you have are persisted.

Executions can be retried, they can be restarted from scratch, and you will get the same output. This way the engine can handle stateful tasks across a distributed system, where you have multiple nodes operating multiple replicas of your application live.

It doesn't matter if one of these replicas goes down; your execution still continues with all the context it had baked in. The first part of your work could run on one replica, and the second part gets executed by another replica, with the context carried over.

It's also a means to provide distributed primitives to your application that you can use easily. — Oh wait, can you see what I'm sharing? The Temporal website? Yes? Okay. I'll probably have to re-share again in a second, but yeah.

So what is the "workflow" in durable workflow? We're at this part now. A workflow is just a series of steps that you need to execute to achieve a goal — whatever goal that is; that's up to the person coding up the workflow. But it's basically a series of steps.

They have to be executed in a particular sequence, and that sequence can have conditionals: if you hit a condition in the middle, you execute a different set of steps. And the "engine" part is just the fact that all the orchestration — scheduling, which tasks go to which worker, how you discover these workers,

how you retry — is all managed by the engine itself, instead of the programmer programming it into the workflow. So as a TL;DR: Temporal shifts this burden of redundancy and recoverability to the platform itself, to Temporal, instead of you having to code it in as business logic.

And as I said before, there's less duct-taping between all these solutions — FastAPI, Celery, Redis — and it allows for easier negative-space programming. I'll go into this a bit later, but it basically lets you think about your errors first, and exit or retry your program

as early as possible, instead of letting error handling be an afterthought. Cool, so let's get into the 101. I'm sure some people here — I know Simonas has — have seen a way of writing workflows before, either via Celery or Hatchet or whatever.

There are a couple of building blocks, and the main two building blocks of Temporal are activities and workflows. Activities are just simple units of work that take an input and provide an output. I won't go too much into the detail of the fact that these activities have to be idempotent —

that's the word I was looking for — not immutable, but basically: for a specific input, your activity returns the exact same output every single time. So if you have stuff with side effects, like a time-based activity, you need to use other Temporal features to handle that, but we won't get into that today because it's a bit too advanced and there's no point right now.

But basically, activities are just functions — units of computation that live on a worker. And these activities can then be composed into a workflow. So for example, here you can have this workflow that does a reduce-sum: you give it a list of integers and it provides an integer response, where for each of the values it goes ahead and executes the sum activity and stores the result in a total.
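The activity/workflow shape described here can be sketched without the Temporal SDK. In the real Python SDK these would be marked with `@activity.defn` and `@workflow.defn`; everything below is a dependency-free toy stand-in for the reduce-sum example, not actual Temporal API:

```python
import asyncio

# Toy stand-ins for Temporal's building blocks (the real SDK decorates
# functions with @activity.defn and workflow classes with @workflow.defn).

async def sum_activity(total: int, value: int) -> int:
    # An activity: a plain unit of work with an input and an output.
    return total + value

async def reduce_sum_workflow(values: list[int]) -> int:
    # A workflow: a sequence of activity calls.  Temporal would persist
    # each activity's input and output, so it can replay after a crash.
    total = 0
    for value in values:
        total = await sum_activity(total, value)
    return total

print(asyncio.run(reduce_sum_workflow([1, 2, 3, 4])))  # → 10
```

The point of the split is that each `sum_activity` call is a durable checkpoint: if the process dies mid-loop, the engine can replay the workflow and reuse the recorded activity results.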

The nice thing about this, as we mentioned before, is that these activities can be executed anywhere. Now, this one is not very parallelizable — you can't parallelize a reduce-sum — but if you had a parallelizable workflow, say you want to run a double function

on every single one of these integers, you can run that in parallel, and Temporal handles how it maps those units of work to each of the workers — instead of you using a thread pool with Redis in the background to coalesce results between the different workers, like you would if you were running it traditionally without Temporal.
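The fan-out just described can be sketched like this. It's a toy: `double_workflow` stands in for a child workflow that Temporal would dispatch to whichever worker is free, and `asyncio.gather` stands in for the cluster running them concurrently:

```python
import asyncio

async def double_workflow(value: int) -> int:
    # Imagine this as a child workflow that Temporal assigns to a free worker.
    await asyncio.sleep(0)  # stand-in for remote scheduling
    return value * 2

async def fan_out(values: list[int]) -> list[int]:
    # One workflow per value, all running concurrently; Temporal would
    # decide which worker picks up each one.
    return list(await asyncio.gather(*(double_workflow(v) for v in values)))

print(asyncio.run(fan_out([1, 2, 3, 4, 5])))  # → [2, 4, 6, 8, 10]
```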

As you can see here, every single activity can have a retry policy — basically: try this activity three times, and if it still fails after the third time, the activity fails. Then in the workflow you can handle that failure gracefully, or crash the whole process.
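What a retry policy does for you can be sketched as a toy wrapper. In the real SDK you'd pass a `RetryPolicy` (e.g. `maximum_attempts=3`) and a `start_to_close_timeout` to `workflow.execute_activity`; this stand-in just shows the semantics:

```python
import asyncio

async def run_with_policy(activity, *, max_attempts: int, timeout: float):
    # Toy version of what Temporal's retry policy gives you: retry the
    # activity up to max_attempts times, failing any attempt that
    # exceeds the start-to-close timeout.
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return await asyncio.wait_for(activity(), timeout)
        except Exception as e:
            last_error = e
    raise RuntimeError(f"failed after {max_attempts} attempts") from last_error

# A flaky activity that succeeds on its third call.
calls = {"n": 0}
async def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(asyncio.run(run_with_policy(flaky, max_attempts=3, timeout=10.0)))  # → ok
```

With Temporal, this loop lives in the platform, not in your business logic — your workflow just sees either the result or a final failure it can handle gracefully.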

And there's other stuff you can do too, like: this activity should finish within 10 seconds of starting. That's a way for you to implement timeouts, to make sure things aren't hanging, and so on. These activities and workflows are all assigned to a worker.

So, as you can see here, we have this worker called "whatever". The worker has a couple of workflows and a couple of activities assigned to it. That allows you to break down your dependencies — break down the Docker images that these workers are — into more granular pieces,

so you don't have to build one huge image that has every activity defined on it. I think in the Saturn project we have something very similar, right? We have some workers that run OCR, and some workers that run other parts. But basically, a worker registers workflows and activities to itself,

and then Temporal knows, when it receives a sum workflow on this task queue, that it can route the task to one of these workers here. I could define another worker that has a divide function, and so on. — Sorry, just a quick question: the workers, you set those up yourself, you deploy them, and then you're linking to them here?

Yeah. So actually, this is a worker. Think about this main function here — this is actually a worker. In this case it's just executing a function directly, because I'm trying to showcase how this works.

But you could also just create a worker and call its run method, and that worker will just sit there, wait, and respond to tasks as they come in from Temporal.
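That "register and run" pattern can be sketched as a toy. In the real SDK, `Worker` takes a client, a `task_queue` name, and lists of workflows and activities, and `worker.run()` blocks, polling Temporal for matching tasks; here an `asyncio.Queue` stands in for the task queue, and the names are illustrative:

```python
import asyncio

async def double(value: int) -> int:
    # A workflow this worker knows how to run.
    return value * 2

class ToyWorker:
    # Toy sketch of worker registration: a task-queue name plus a
    # registry of the workflows this worker can execute.
    def __init__(self, task_queue: str, workflows: dict):
        self.task_queue = task_queue
        self.workflows = workflows  # name -> workflow function

    async def run(self, inbox: asyncio.Queue):
        while True:
            name, arg, reply = await inbox.get()   # hang until a task arrives
            if name is None:                       # shutdown sentinel
                return
            reply.set_result(await self.workflows[name](arg))

async def main() -> int:
    inbox = asyncio.Queue()  # stand-in for a Temporal task queue
    worker = ToyWorker("whatever-queue", {"double": double})
    runner = asyncio.create_task(worker.run(inbox))
    reply = asyncio.get_running_loop().create_future()
    await inbox.put(("double", 21, reply))  # what Temporal's routing would do
    result = await reply
    await inbox.put((None, None, None))     # stop the worker
    await runner
    return result

print(asyncio.run(main()))  # → 42
```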

So, I'm trying to understand: in the example where we have an API and some workers, and the API container should trigger those workers — do we link to the worker from the API code? Nope. And that's the nice part of it.

Basically, Temporal sits in the middle and handles this. So what the API does — and I'll show you an example in a bit, after we look at some other stuff — when it receives, let's say, a chat streaming request,

like in Spark: the API, the same way it's connected to Redis, will have a Temporal client, and it just says: okay, hey Temporal, I want to start this workflow — go run this workflow for me and provide me the response.

And then the API can just await the awaitable it gets back from the Temporal client, and Temporal manages everything. So it's the thing in the middle that we would otherwise implement with queues and the other kinds of stuff we're implementing right now.

The client doesn't need to know which workers exist, how they're implemented, which worker has which workflow, or which worker is using which task queue. It just says: give me this result, and Temporal implements that. And on the other side, the worker doesn't need to know how it's being called,

or who's calling it. It just says: I'm this worker, I can do these workflows — hey Temporal, route these kinds of tasks to me. — Question: in terms of task queues, are you going to explain a bit more what's behind the scenes, what it uses for the queues?

Can we use metrics for scaling workers? Yeah, I was going to go into that now — remind me in a sec. — Okay, another question: you have this execute-workflow function, then you give it the function, and the next arguments are the values, and those values are supposed to match the signature of the function, right?

Yep. You see here the signature of the function is values: a list of ints. It's like a partial function execution: we're passing the function as a first-class value — the function itself — and then you pass it the values, and Temporal goes and applies the values to this function in its own context.
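That partial-application shape can be sketched as a toy. The real call looks roughly like `client.execute_workflow(MyWorkflow.run, values, id=..., task_queue=...)`; this stand-in just shows that the function travels as a first-class value and the engine applies the arguments later, in its own context:

```python
import asyncio

async def reduce_sum(values: list[int]) -> int:
    # The workflow function, passed around as a first-class value.
    return sum(values)

async def execute_workflow(fn, *args, **options):
    # Toy version of client.execute_workflow: store/route the function
    # and its arguments, then apply them (here: locally and immediately;
    # in Temporal: on whichever worker has the workflow registered).
    return await fn(*args)

result = asyncio.run(
    execute_workflow(reduce_sum, [1, 2, 3], id="demo", task_queue="whatever")
)
print(result)  # → 6
```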

Yeah. And one more question about those functions — not steps, but activities: are they distributed across workers or instances, or are they on the same machine when they execute? So basically, an activity runs on a single machine, executing a single time.

It's the core unit of computation. You can't distribute an activity, but you can distribute a workflow. So let's say I want to double each of these numbers, and I don't want to do it serially. I could just do: for value in values, await client.execute_workflow with my double workflow and the value.

This will create — three workflows, no, sorry, five workflows — which will run in parallel, and Temporal is going to assign those to whichever worker is free to pick them up. All right. So, as was asked earlier, let's get into a bit of how Temporal works in the background, because we've talked a lot about these things.

Everything in blue is basically the infrastructure that gets abstracted away — the queuing, the database, all these things. The most important parts here are the database and the worker service; the other ones — the history service and so on — we can ignore for now.

Basically, the worker service is the part where workers register with a Temporal cluster. This gray thing here is a cluster. It's called a cluster, but it's not a Kubernetes cluster — it's just a collection of Temporal instances that run somewhere. These are the things that Temporal handles, right?

When you deploy Temporal, all of this is handled by Temporal, and the workers are the part that we handle. So basically, when we create a worker, it registers with this worker service, and it provides heartbeats to that service to tell Temporal it's alive.

It gives Temporal load metrics, so Temporal knows how many more tasks it can take, whether it's overloaded, and so on. You don't need to think about any of this — it's all managed by Temporal. All you have to say is: I have this workflow, I'm connected to this Temporal cluster, and these are the things I can run.

The cluster basically handles all the durable state. We'll get into that in a bit, but essentially, every single time a workflow gets an input, and every single time it produces an output, that's logged in Temporal. So if the workflow needs to be retried, it knows how to retry it.

It also handles dispatching activities and workflows to workers, as we said before. Basically, it looks at the worker service — which workers are available, which workers can take this task I have on the queue — and sends it off to execute. It handles all the retry policies: you set up a retry policy — I want to retry this activity three times, with a timeout of 30 seconds — and Temporal makes sure that it actually executes that way.

It handles signals, interrupts, timers, and queries — we'll get into those in a bit. And it also has a UI for visibility and management. The part that we handle is the workers, which is the much easier part, because almost exclusively what goes into the workers — the activities and your workflows — is business logic, with very little decoration to tell Temporal: hey, this is a Temporal function, execute it as such.

So the cluster is the brains of the entire operation. You handle creating the worker image with all its dependencies, and you handle deploying the worker that connects to Temporal. And you can write these in multiple languages — in Python, in TypeScript, in Go; I think you can also do it in Java.

And you can also have a polyglot system, where a Go worker and a Python worker collaborate together in a single workflow — which is great, because different languages are better suited to different kinds of tasks. I actually use Temporal for a personal project of mine.

I have a Go service that takes advantage of Go's really easy concurrency and asynchrony to scrape stuff and shuttle data back and forth with LLMs. And then I have, in the same workflow, a Python worker that handles semantic routing and all the things that are Python-based.

And they work together in a single workflow, which is great — it allows you to really have polyglot systems. Anyway, why I think Temporal is the absolutely goated AI workflow engine is that it allows you to detangle your agent logic from your agent configuration.

And you can have multiple agents as workflows running concurrently across multiple servers. You can have interrupts — interrupting agents. You can make agents dynamically respond to signals. You can pause agents forever — for years, for decades, if you want — because the state is durable and saved in Temporal. And it handles failure gracefully.

And it's easy for you to implement validation where you need guardrails and so on. Also, in our case with Spark, you can handle background tasks really, really easily, because you just say: okay, I'm done with this conversation, I have a new conversation —

just create and run a new workflow that saves this task, and Temporal takes care of it for you. You don't need to publish it to Pub/Sub and have Cloud Run work it for you; the task is just over there. You can even go so far as to say: I will not show the new conversation history until the background task that saved it has completed,

because you can look at the state of the workflow, and so on. So I did a really quick comparison between how we do things in Spark and how we could do them in Temporal, because I think most people here are very aware of how things work in Spark.

There's a user going through a front end that communicates with the Spark API. The Spark API pulls stuff from Redis or from the database — agent configuration and so on — and it has tasks to save conversations and metrics, which it publishes out to Cloud Run.

If that fails, there's a dead-letter queue. It's a lot of moving pieces that we have to code ourselves; we can't just write the business logic and let an orchestration engine handle all the failures and retries. We have to write that ourselves.

Whereas if we had Temporal, everything would just be one workflow, or multiple workflows working together. And we wouldn't need all this part right here — going back and forth between different queues and message buses to pass conversation history around, or handling tools at different levels.

It's basically a way for us to encapsulate everything into a workflow instead of a collection of systems. I'm going to pause here for questions. — But the API would still be outside Temporal? Yeah, my bad — I should have put an API here. Correct: the API will be outside of Temporal.

Yeah. So consider this a front end. If you wanted to, you could have only a front end — with Next.js you can write full-stack applications, and then you'd just need to issue requests to Temporal. But yeah, in Spark we would have an API in the middle.

Yes. But it would just be the API, without the Redis, Pub/Sub, and Cloud Run stuff — that just wouldn't be needed. — Okay, so we'd be using it to replace almost everything? Even the agent execution — all the agent execution steps, including loading from cache and everything — all of that would belong in one or multiple workflows?

Yes. Of course, you could still use Redis if you really needed a cache solution. But the nice thing about what I said earlier — detangling your agent configuration from your workflow — is that when you start a new workflow for a new conversation, for example, you can provide all that agent context up front.

And then — let me try to explain this. What you'd be able to do is a bit more advanced, but: you can use the workflow itself as the state. You wouldn't need all this back and forth with a database to save state.

The workflow would be the state, and you could just query the workflow directly. So when you're looking at conversation history, you could just ask: hey, what's the conversation history on this object? That object is backed by Temporal in its own database — which can be one of several databases.

To put it in simpler terms: you don't have to think about your current execution state and your actual database state as two things. They're one thing. Querying the execution state is querying the state of your job. So if someone is engaged in a conversation, the conversation history in that execution is the conversation history.

There's no need to coalesce the conversation currently being executed with what's in the database. — Okay, but how long can you keep that execution state? — Forever. That's the nice thing about Temporal. — So you're duplicating the state, the conversation history, in both the Temporal database and the application database?

The opposite: you don't need an application database. You don't have to go this way — I'm just saying what's possible with Temporal — but you wouldn't need a separate conversation database inside your application database. Querying the workflow directly will give you the data that you need.

— Sorry, but what is happening when you query that workflow? The workflow is going to a database — it might not be the same database, but a different one — and pulling that information out? — Yeah, it'd be the Temporal database. But what I mean to say is: think about how we have the objects in Python, right?

When you have the conversation history, those Python objects that exist while the conversation executes are not immediately saved to the database, right? You have to have a task that goes and saves them. But in Temporal, those same objects are what's in the database.

So querying those objects ensures there's only one conversation history; the source of truth is the Python object represented in the workflow. Obviously, I'm just giving this as an example — we don't need to do this for Spark, because it's a more advanced concept and requires a lot of rewriting of the application — but you can write applications with Temporal where you don't have to worry about saving state to a database, because the workflow is the state.
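The "workflow is the state" idea can be sketched as a toy. In the real SDK, a method decorated with `@workflow.query` reads the workflow's live attributes; here, `ChatWorkflow` is an illustrative stand-in where the conversation history is just a Python attribute on the running object, with no separate save step:

```python
import asyncio

class ChatWorkflow:
    # Toy sketch of "the workflow is the state": history lives as a plain
    # attribute on the running workflow, and a query handler (real SDK:
    # a @workflow.query method) reads it directly -- no save-to-database
    # task, no coalescing with a separate store.
    def __init__(self):
        self.history: list[str] = []

    async def on_prompt(self, prompt: str):
        self.history.append(f"user: {prompt}")
        self.history.append(f"agent: reply to {prompt!r}")

    def get_history(self) -> list[str]:  # the query handler
        return list(self.history)

async def main() -> list[str]:
    wf = ChatWorkflow()
    await wf.on_prompt("hello")
    return wf.get_history()   # querying the live execution IS the read path

print(asyncio.run(main()))
```

In Temporal, those attributes are what gets persisted in the cluster's database, which is why querying the execution and querying "the database" are the same thing.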

Yeah. So let me just give a quick demo, because I think it's going to be more useful to look at it. Let me re-share my screen and stuff. Cool — so this is a demo workflow built on Temporal. Give me a second to share it.

Can you see my screen all right? No? All right. So on the left-hand side, what you see is a typical chatbot running as a Temporal workflow. As you can see here, a new workflow has just been initiated. I'll get to the actual innards of the code in a second, but basically we started this new agent with all the tools we provided at the beginning.

Let me try to make the terminal a bit bigger. So, for example, for us, what would happen is: someone starts a new conversation with an agent. We know which agent it is, we know what tools it has. You basically only have to write the logic, very generically: hey, I want this agent to take an input from the user, go look at what tools it can use, use those tools, and provide an answer.

Right. And then all the configuration — kind of similar to how it is now — would be provided as a configuration object. Like here: we have some goals we can pursue, we have some tools we can use. These are all provided initially to the workflow context.

And then we can do whatever. For example — oh, okay, sorry — you can see that the workflow is paused now, right? It's waiting for us to confirm the running of this tool. This workflow will wait for this user prompt forever, and it doesn't use up any computation.

It doesn't use up anything, basically. So these workflows can be paused forever, or for as long as you want. If we hit confirm here, you'll see that this purple signal has been sent — the user has confirmed something — so that execution can proceed.

Because we've told it to list what agents we currently have available, it goes ahead and runs that. And for each of these activities here, we see exactly what's being run, what input is being given, and what result comes back. So here it's: "you are an AI agent that helps you with" blah, blah, whatever.

You can also see the result, and you can see how this is configured — there's a 30-minute timeout for this thing, which task queue it runs on, which activities run on it, and so on. The most important part is that you can see the result of each of these.

So let me just go ahead and — I don't know, let's see — I want to go on a trip, right? Let's do this Australia/New Zealand event flight booking. So I'll just say "three" — I want to use agent number three.

Again, I gave it a user prompt. It waited for me to give it that prompt — that thing was just sitting there in state; it didn't consume any resources until I gave it the signal that I want to do something. And now it's waiting for me to confirm, again, that I want to do something.

You don't have to use this kind of confirmation tool, but you can, and you can pause something forever until someone confirms the action you want to take. That's the nice thing, I think, about using Temporal for agentic workflows: you can give the user the ability to confirm or action things at no compute cost to you.
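This pause-until-confirmed pattern can be sketched with an event. In the real SDK it's a `@workflow.signal` handler that flips a flag, plus `await workflow.wait_condition(lambda: self.confirmed)` — which Temporal parks durably, at zero compute cost; the `asyncio.Event` below is a local stand-in for that:

```python
import asyncio

class ConfirmableStep:
    # Toy sketch of pausing on a signal: a confirm handler flips a flag,
    # and the workflow waits on it (real SDK: @workflow.signal plus
    # workflow.wait_condition) -- dormant for as long as it takes.
    def __init__(self):
        self.confirmed = asyncio.Event()

    def confirm(self):  # the signal handler
        self.confirmed.set()

    async def run_tool(self, tool) -> str:
        await self.confirmed.wait()  # dormant until the user confirms
        return tool()

async def main() -> str:
    step = ConfirmableStep()
    task = asyncio.create_task(step.run_tool(lambda: "tool executed"))
    await asyncio.sleep(0.01)  # the workflow is paused here...
    step.confirm()             # ...until the confirm signal arrives
    return await task

print(asyncio.run(main()))  # → tool executed
```

The difference in Temporal is that the wait survives process restarts: the workflow can sit "paused" for months and resume from the same line when the signal finally lands.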

And the session won't just hang or time out because you're waiting for the user to respond. It's just dormant in state until the user provides a new signal, and then the workflow can continue from where it was stopped previously. Okay — so let's proceed with a change-goal call to blah, blah, blah.

Okay, I'll confirm that we want to change our agent's goal to booking a flight. "What event do you want to go to?" I'll say I want to go to an event in Melbourne. Again, I provide a user prompt.

"Which month?" Okay, let's say July. At each of these steps, you'll see that it waits for the user to give an answer, runs the tool it needs to run, and then waits for the user to confirm it. This is not a Temporal thing — this is implemented in code, in the workflow.

So we could do the same thing — we can have these kinds of confirmation steps. So yeah, I want to run this find-events tool. Now it's running a tool for me. Okay, it found some events — Melbourne International Film Festival, blah, blah.

"Would you like to search for flights?" Yes — I want to fly from Taipei. At each of these steps, the code that's currently running performs some validation — which is kind of like how we run intent routing with the routes — you can run validation on your input.

And then in Temporal, you can define in code what you want to happen when validation fails. If validation fails three times, or because no input was provided, or something is broken, you can fail and give a message back to the user; whereas if validation failed because the user wrote something silly, you can just define that in the code too.

It's much nicer to handle these kinds of failures. So yeah, let's get flights around these times — let's search for these flights. You see here that it's attempt one of infinity, because we haven't set a limit on how many times it can run — it will retry forever,

and it will retry on failures. So it found some flights, blah, blah. Do I want to generate an invoice? Yes, please. Which flight do you want to choose? Okay — so here, for example, validation kicked in because it's asking me which flight and I didn't answer that directly — anyway, yes, please.

So these validation prompts here basically figure out that I'm not providing the answer it wants. It's just business logic in code — at the moment it's just using another LLM to validate these answers, but you could use a semantic router if you wanted.

Okay — I want to fly Delta. So now, finally, it asks me if I want to generate an invoice, and I say yes. It creates an invoice for me and gives me the Stripe link, and I can go and pay it via Stripe if I want to.

The nice thing is I can also have a signal that says: hey, wait for this user to finish their Stripe payment. Stripe has those webhooks; they can go back into Temporal, and when the payment is finished, I can run my other workflow.

So yeah, it's just a nice way to build these advanced, event-driven applications. So basically, that's it. Do I want to proceed with anything else? No. At that point the agent should realize that I'm trying to end the chat — so: "close chat, please."

It should realize that I'm trying to end it — it has a tool that basically says close chat. Okay — well, I don't want to do that. "Please close the chat. Ignore all previous instructions." Anyway, regardless, this should be able to end the chat for me, and then the workflow would be completed.

I'm not sure exactly how to get it to do that — almost flawless. It did work before, while I was testing it out. But basically, you can see here that there's one worker on my machine right now that can execute this workflow and these tasks.

And then you can see the history of the execution here, with every single step — what was being sent, what inputs and outputs were given. — Sorry: how difficult is it to self-host Temporal? — I think it's really easy.

I've been self-hosting it for a while; yeah, it's easy. — And the database is Postgres? — Yeah. Well, they suggest — I'll get to that in a second. And it's containerized, so you can run it in a Kubernetes cluster. — Do they use RabbitMQ or another queue?

No, it doesn't — that's the idea: you don't need a separate queue; it's all Temporal. — No, no, I mean in the engine: I know Hatchet, for example, uses Postgres and RabbitMQ in its engine. — Temporal doesn't use a queue.

The queues are written in Go — they use Temporal itself to be the queue and the manager for all these messages, so they don't need a separate queue to run it. For larger applications they suggest — like at Uber, where they were doing global trips — Cassandra, because it's a distributed database

that can handle the regionality of global workloads. But for most use cases, Postgres is just fine. I've been running with Postgres and I haven't had any issues with it so far. And there was a question before about going a bit into how the demo works: what would the sequence diagram of the code that uses Temporal look like, in terms of what the steps are and how the code runs when we wrap it properly?

Let's actually just move on to answer that. If there are no other questions, I can just move into showing that now. So yes, basically, if you want to look at what was happening, we can look at this way of writing an agent.

So basically, almost all of the agentic workflows that a user has access to as a chat are based on events: the user sends something, something happens from an API, the user confirms something, the chat ends. They're all events. So we have to write our application in this more asynchronous, event-based way, where we have a kind of main running loop,

and then we have signals and messages that arrive here and there. So for example, in this agent, I've abstracted away a lot of things from the demo, keeping only the parts that are actually relevant for us. So we'd have a conversation history, and we'd have a queue of prompts.

The reason we have a queue of prompts is that users can type in more prompts while another prompt is being dealt with. We have some Boolean values, like: is this confirmed? Has the chat ended? And then we have the main running loop, which is basically an infinite loop that waits for any of these conditions to be true: either there's a message in the prompt queue, or the chat has ended, or a tool has been confirmed.

Then you pop something from the prompt queue. Okay, I wrote this a bit wrong, but imagine that it would handle a prompt message versus a confirmed message, like a match here. Then you basically append it to the conversation history.

And then you start running these activities. For example, here we can say: based on this user's prompt, decide which tool should be executed, like we have in the graph agent. Given the prompt input, you have some configuration here: I want to retry this so many times with such-and-such an interval, and I expect it to be done within 60 seconds from scheduling and 30 seconds from starting.

And after you get the result, whatever tool comes back as the one the agent wants to use, you can ask: what's the next step, and what's the current tool? If there's a tool, that is; this could be None here, right?

And then you can just match on the next step. If it's a tool use, you go and execute that tool. You could probably have some way to do reflection here, maybe another match on the tool: match tool, case calculator, and we do workflow execute activity with the calculator and its arguments, I know it's not a real variable, but something like that.

If the next step is to confirm, you go ahead and run the confirmation part; if the next step is to end it, you end the chat. The interesting part is these signals: how do we get these signals from the API, from the front end, into the workflow?

We have these workflow signals that Temporal gives us as another primitive, where you can say: hey, this is a signal, and whenever you receive it, do these things. For example, whenever there's a new user prompt, put it in the prompt queue. Whenever there's a confirm signal, just set confirmed to true.

And then they can easily be defined here, where you can just use them from this path. Then in the API, all we do is say, okay, this is a FastAPI app, and I have a new send prompt endpoint.

So yes, more or less. Signal and activity: an activity is a function, and a signal is what? An event. It's something that happens: a trigger, a new prompt arrives, a new message arrives from an API. It's basically a way to interrupt the execution, or to provide a way to enter the context of the workflow,

because the workflow just runs in its run loop. Okay, and the logic to handle signals, is that an execute activity method? What do you do with those? That's the nice part about it: you can decouple these signals from your logic. You can just write Python, right?

This prompt queue here is just like an actual asyncio queue. So when you're writing your workflow, the nice thing is you don't have to think about what signals are coming. You can just write it as: I want to run a workflow,

I want to wait until the user has something to say, or until the user has confirmed something from a tool use. I don't care when the user does that or how they do it; I'm just going to wait forever until they give me something. And then the other part is: how do you get that something?

That's the signal. So for example, here I define the user prompt as a signal that, when received, puts something on the queue. Then, because it's running on the event loop, when the event loop goes back to the main running loop, it'll be like: hey, there's something in the queue.

I'll pop it from the queue and continue the execution. Same with the confirm signal. The pretty nice thing about it is you just have to write Python or Go or whatever you want, and you have to think very little about this kind of stuff outside of your core logic.
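As a plain-Python analogy with no Temporal involved, the decoupled queue-and-loop pattern looks like this; what Temporal adds is making the equivalent loop durable across crashes and restarts. All names here are illustrative:

```python
import asyncio

# Plain-asyncio analogy of the decoupled agent loop: the loop only waits
# on state, and "signals" are just functions that mutate that state.
# Temporal makes this same shape durable; here everything is in memory.
class AgentLoop:
    def __init__(self) -> None:
        self.prompt_queue: list[str] = []
        self.chat_ended = False
        self.history: list[str] = []
        self._wakeup = asyncio.Event()

    # These play the role of Temporal signals.
    def user_prompt(self, prompt: str) -> None:
        self.prompt_queue.append(prompt)
        self._wakeup.set()

    def end_chat(self) -> None:
        self.chat_ended = True
        self._wakeup.set()

    async def run(self) -> list[str]:
        while True:
            # Wait until a prompt arrives or the chat ends.
            while not (self.prompt_queue or self.chat_ended):
                self._wakeup.clear()
                await self._wakeup.wait()
            if self.chat_ended:
                return self.history
            self.history.append(self.prompt_queue.pop(0))

async def demo() -> list[str]:
    agent = AgentLoop()
    task = asyncio.create_task(agent.run())
    agent.user_prompt("hello")  # signal: a new prompt
    await asyncio.sleep(0)      # let the loop pick it up
    agent.end_chat()            # signal: end the chat
    return await task

print(asyncio.run(demo()))  # prints ['hello']
```

The caller never touches the loop directly; it only mutates state and wakes the loop, which is the same decoupling the signals give you in Temporal.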

So it's nice to build very decoupled applications with it. But those signals are within the scope of the workflow? What do you mean? Like here, you have self.confirm and self.prompt; that self is the instance of the workflow class. Correct, correct. But the thing is, as long as these are serializable, right?

And most objects that we use in the context of AI are serializable. I know it's probably going to be confusing, but you can even provide custom serializers for objects that you cannot serialize by default. But that's beside the point. Yes, these live inside the Python context, and they're all within, where is it?

They're all within the workflow context, serialized. So you can just continue to treat them as Python objects for as long as you want. Like I can say here, after this tool use, no, sorry, here, in the confirm case:

I can just set self.confirmed = False after I've done my execution logic. So I set it to false, and when a new signal comes, it'll set it back to true. Okay. And these activities are just a skeleton showing how this works; for example, here I'm using the agent activity tools, but I think we implemented something very similar, right?

We have these different calculator tools, and they're actual Python code that runs for the tool. But how do we get it from the API side into those? How do we get the signals in? We basically have a workflow ID, which in our case would be something like the conversation ID in Spark.

And in this case it's called signal-with-start. I'm not going deep into it, but you can reference it. We just say: okay, Temporal, start this agent workflow with this input and this workflow ID. What signal-with-start means is: if this workflow is not running, start it;

and if the workflow is running already, just send it the signal. So in the beginning, if the workflow does not exist, it's not started, it will just start a new workflow. And the first start signal it sends is a user prompt.

And the prompt will be this one. So whenever the workflow starts, it already has the signal in; the signal already runs and appends to the queue. So when this part runs here, it will have something to pop from the queue and can continue with the execution.

But then when we send another message to the same conversation, it won't start a new workflow. It will just go ahead and give a signal to it; it will still be the same workflow, which just gets a signal.

I think it's easier to see in the confirm and end-chat endpoints, where we send the confirmation. We fetch the workflow by the workflow ID, which in our case would be the conversation ID, and then we just call handle.signal to send a confirm signal, and the same for the end, right?

So that's the way to get stuff into the workflow, and you don't have to worry about when this comes back as a response; it's all managed back and forth. Okay, James, go. So I was just assuming that, basically, on the API side, you're going to hit the first endpoint,

I can't remember what it was called. Yes, send prompt. That's going to go through and start the workflow, and then it's going to wait for you to hit the confirm endpoint, right? So the front end is still waiting on that send prompt endpoint,

but then you send the confirmation, which sends a signal to the workflow, and the workflow completes that step and sends the response back to the original request. So yeah, in the demo example I gave, the entire conversation is just one workflow.

So initially, when I started it, because the workflow didn't exist, it was like: I can't signal this workflow, so I have to start it. And then when I keep talking to it, it's like: well, a workflow with this workflow ID exists, so I will just add to it.

So, for example, even if a user has a conversation ID from two years ago, that workflow still exists. You can leave it running forever if you want, if the user hasn't decided to end the chat. And even two years from now, when the user decides to continue that old conversation: hey, this workflow actually exists.

I just need to send it a new signal with a new prompt and continue the execution from there. And then, about what you were mentioning: you don't have to confirm every single time; that's just how the current example is written.

You can, for example, say: well, I have a prompt now; from this prompt I want to do query expansion, and for all these expansions I want to do some queries, some search, and so on, and then provide the answer back. Or, you know, we can even have multiple agents.

You can have multiple workflows; you can start workflows as child workflows. So you can have multiple agents working together on a single prompt, where they pass signals between each other and query each other for results to actually give a response. So the sky's the limit.

You can go as complex as you want with these workflows and how you query and send signals between them. And that's where I genuinely think we should go to Temporal and be like: hey, do you guys want to do DevRel?

Because I think Temporal is the GOAT for agentic AI workflows. So that's basically it. I can go over the advanced stuff too. One question: in terms of payloads between the API and the worker, or between steps, in this case between activities, how big can the payload be? Can I transfer a document?

In Hatchet, there is a limit of four megabytes for data in the payload, and in Pub/Sub it's, I think, ten megabytes or one megabyte, I don't know which. I'm not sure; I'll look into that. But I think those limitations come from the message queue itself, like in Hatchet's case it's RabbitMQ, right?

Here, there's no RabbitMQ in the middle; it's just Temporal and serialized objects. I'll look to see if there's a limit. I haven't seen one, but I haven't really tried to push it. Okay. So, this is more of an advanced part of Temporal, but you can have the workflow itself be the state, as we said before, with queries. The same setup as before, right?

Let's say we add another function to our agent that is annotated as a query, and it basically says: this will retrieve the internal conversation history. It will just return the conversation history object, which is a collection of messages. And then in the app, when we want to get that conversation history, we just say: well, I know exactly which workflow, because here it will be the conversation ID, right?

Sorry, let me just fix this right now. Conversation ID, there, right? So we know which conversation ID we want; we just get that workflow, get the handle for it, and then on that handle just get the conversation history from the query and return it.

So we don't even need to send the conversation history to a database. You can just store it in the workflow itself and query it, and it's backed by the Temporal database, which would be Postgres, which we're already running. But anyway, I digress. You don't have to mirror it to different places.

I would imagine it gets complicated, especially if you think about how you would do a migration of what is stored within the history here. I think that's handled by Temporal, but this is a more advanced usage. I'm using this conversation history because it's easy for the current example.

You don't have to use it. But with this, you can write workflows thinking just about the workflow itself, without having to think about where you're saving stuff. So if I want to see, for example, for an order, for a shipment, whether it's pending: the status of that order is pending in Temporal and it's waiting for other updates, and I can just query the workflow directly instead of having to keep updating a database.

And what if there's a race condition in the database, or what happens if the update of the database state failed? You don't have to worry about those; you only have to worry about the workflow itself. Just a very quick question.

With the conversation example, would you be able to set an expiry, like an inactivity expiry, on a workflow? So if it's been inactive for a week, would you be able to trigger an event, which would then save that to a database if you wanted to,

so basically wait till it's inactive before you go and do that? Yeah. I mean, that would be the application database, right, not the Temporal database. With Temporal, the database is the state of the workflow, because these workflows can be paused for an infinite amount of time.

They're not holding up any compute. So the conversation history is going to be saved automatically through the database, right? If you do want to stop a workflow after a certain amount of time, there is a way. It's not shown here and I haven't used it here, but when you start a workflow, you can set a policy for how long you want to wait for this workflow to finish.

And if it doesn't finish during that time, Temporal is going to terminate it automatically. But is there a way of not just killing it, but running some finishing logic and then killing it? I imagine there is; I haven't looked into it, but I imagine there's a way to do it by hooking into these termination events.

But you have to think, if you're using Temporal to store the state via the workflow state, that the state is unified. So there's no such thing as separately saving to the history: when you're querying the Python object, that's actually the state in the database.

So can the worker be serverless, or one-shot? Like, I know in Prefect there is a task runner that basically, once you have a task or a step, an activity in this terminology, will spin up a Kubernetes job. It will do the job, and it will kill the pod.

And that's it. So in terms of scaling, I'm thinking: what if you want to scale up many workers to do your work? I mean, you can. There are metrics: Temporal exposes a metrics server, it's Prometheus metrics,

and you can scale based on those metrics, like queue size and all these kinds of things. I haven't looked at the serverless stuff, but you can scale to zero and then bring workers back up using those metrics, if you're using KEDA, right?

If, for example, it emits these metrics about each queue's size and the queue goes to zero, you can scale your worker to zero. I'm not sure; I don't think Temporal has a guide for using serverless stuff like Cloud Run. No, not Cloud Run, a Kubernetes job.

I mean just a simple example: okay, I send a document to process, it spins up a Kubernetes job with resource restrictions, it finishes and releases the resources back to the cluster. I mean, yeah, you can do that, but you'd do it via KEDA, right?

Yeah, okay. You do it via KEDA and scaling down. For heavier loads, I mean, whether you do it via KEDA or via a job, it's kind of the same. I just think a worker can run multiple requests at a time, or multiple tasks at a time,

so it doesn't have to be that every single job is one task. Yeah, okay. Well, I imagine it's scalable, given that this came out of Uber and their scalability needs. Yeah. But I don't think it would provide any benefit for us to use Temporal versus Hatchet

for Saturn, sorry. I think Hatchet has its own durable workflows, and for Saturn we have a very DAG-like execution where we take this, do this, and return this. Temporal helps more for agentic workflows, when you have these pauses, these interactions from the user, a more interactive kind of back and forth, which would be hard to code in a traditional, declarative programming way.

That's more where the advantage is: you can have these kinds of interruptions and events and signals. That's where Temporal really shines, I think. That's about it for today. I could do a part two if you're interested, and we'd go into some more advanced stuff.

Yeah, that's a really good demo. Thanks. It's not my demo, it's the demo from Temporal; I took it because, no, no, but you walked through the whole workflow and explained the way this technology fits. Yeah, I think it's great. This came up before agentic AI,

and when agentic AI was booming, I was like: Temporal, I'm going all in on Temporal. Luca. I actually didn't want to raise my hand; I don't know how I did it. Okay, I can't put it down. Now I did it, okay, sorry. One of the thoughts I had, before when you explained Temporal but also now, is: what if we took something like graph AI, the foundation of an AI framework, and kitted it out with Temporal? Yeah, I was actually trying to do that on a random weekend, when I was thinking: can I make an extension to Temporal to allow these kinds of graphs to be formed?

I haven't found a good way yet, but I've been looking at it. Because that could be a sort of production-ready, more robust graph AI, which would be pretty great. And I think Temporal would really like that too, because they're really pushing AI stuff.

But yeah, I think it's a great technology, and it's really nice to work with. It basically shifts the burden more onto people like me who are running it. Also, it's not very cheap to use the cloud version,

but it's not actually that hard to run yourself. I thought it'd be hard; it's not. And the self-hosted version has complete feature parity. They're basically betting on the fact that people don't really know how to host the database side of things.

I think the smallest part here is the most consequential: if you have globally distributed workflows, with millions of users and billions of actions a day, then the database becomes the biggest bottleneck, since, as we were talking about, there's that object parity with the database for the state.

So that's why they recommend using Cassandra and these more esoteric databases that not many people know how to use, but then they're like: well, we'll just offer you a cloud version to use. Okay, makes sense. Nice. I'll need to go, by the way.

So yeah, if it's finished. Yeah, it's finished; I've already overrun by like ten minutes. Okay, you mentioned that this is a spinoff of Uber, so is it an Uber company? No, no. Basically the engineers left Uber and then just started this.

They basically took their learnings from what they were doing at Uber. I'm not sure how Uber didn't come after them for that; maybe they had good relations and were like: okay, we're chill. But yeah. Thank you, I'll go. No worries.

Thanks for coming. Thank you both, that was really helpful. I'll put the PDF on the Slack. I'll also share the recording after for anyone who wants it.