Stateful and Fault-Tolerant AI Agents

Chapters
0:00 What is Temporal?
4:25 Temporal 101
24:36 Temporal AI Agent Demo
35:38 Temporal Agent Code
56:48 Questions
Today I'm going to talk about Temporal; I'm not sure how many people have heard of it. There's a lot of talk recently about everyone making their own durable workflow engines, so: what are these, what do they do, and why is it such a new paradigm?

This project was originally born at Uber. It was what Uber used to orchestrate all the processes that go on when you're booking a taxi: book a car, find a driver, the driver confirms, all these kinds of steps and states. What Uber did initially was the typical distributed message-queue approach, where you have all these messages going across message buses, like we use Pub/Sub. Then they realized that this whole way of doing things is very annoying, because you have to keep duct-taping stuff together with Celery, Redis, and whatever. So they said: you know what, let's just make an engine that manages all these steps, states, and retries for us, and makes execution durable. You just write the code and it executes exactly what you tell it to do; you don't have to keep handling edge cases (what if this error happens, what if we have to retry it this way) and doing all these things with dead-letter queues. So they just built this system. So, basically:
what is a durable workflow engine? There are three words to unpack, starting with "durable". Durable just means you have a code execution that is persistent: the inputs and outputs are persisted, so the execution can be retried, or restarted from scratch, and you will get the same output. That way it can handle stateful tasks across a distributed system where you have multiple nodes running multiple replicas of your application live. It doesn't matter if one of these replicas goes down; your execution still continues with all the context it had baked in. The first part of your work can be executed by one replica and the second part by another replica, with the context preserved. It's also a means of providing distributed primitives to your application that can be easily exercised.

Oh wait, can you see my screen? What am I sharing, the Temporal website?
Yeah, I'll probably have to share again in a second. So, what is a durable workflow? We're at this part now. A workflow is just a series of steps that you need to execute to achieve a goal. Whatever that goal is, it's up to the person coding up the workflow, but it's a series of steps that have to be executed in a particular sequence, and that sequence can have conditionals: a condition in the middle that makes you execute a different set of steps. And the engine part is just the fact that all the orchestration and scheduling (which tasks go to which worker, how you discover these workers, how you retry) is managed by the engine itself, instead of by the programmer programming the workflow.

As a TL;DR: Temporal shifts the burden of redundancy and recoverability onto the platform itself, onto Temporal, instead of you having to code it in as business logic. And as I said before, there's less duct-taping between all these solutions like FastAPI, Celery, and Redis, and it allows for easier negative-space programming. I'll go into this a bit later, but basically it allows you to start thinking about your errors first, exiting or retrying your program as early as possible, instead of letting errors be an afterthought. Right. Cool. So let's get into the 101.
I'm sure some people here (I know Simonas has) have seen some way of writing workflows, via Celery or Hatchet or whatever. There are a couple of building blocks, and the main two building blocks of Temporal are activities and workflows.

Activities are simple units of work that take an input and provide an output. I won't go too much into the detail, but these activities should be, what's the word, not immutable, but idempotent and deterministic: for a specific input, your activity should return the exact same output every single time. So if you have stuff with side effects, like a time-based activity or something like that, you need to use other Temporal features to handle it; we won't get into that today because it's a bit too advanced and there's no point right now. But basically, activities are just functions. They're units of computation that live on a worker. And these activities can then be composed into a workflow.
So for example, here you can have a workflow that does a reduce-sum: you give it a list of integers and it provides you an integer response, and for each of the values it goes ahead and executes the sum activity and stores the result in the total. The nice thing, as we mentioned before, is that these activities can be executed anywhere. Now, this particular one isn't very parallelizable (you can't parallelize it if you want a reduced sum), but if you had parallelizable work, for example if you just wanted to run a double function on every single one of these integers, you could run that in parallel, and Temporal would handle how it maps those units of work to each of the workers. You wouldn't have to use a thread pool with Redis in the background coalescing between the different workers you're running, as you would if you were doing this traditionally without Temporal.

As you can see here, every single activity can have a retry policy: basically, try this activity three times, and if it still fails after the third time, fail the activity. Then in the workflow you can handle that failure gracefully, or crash the whole process. And there's other stuff you can do, like saying this activity should finish within 10 seconds of starting. That's a way for you to implement timeouts, to make sure stuff isn't hanging, and so on and so forth.
These activities and workflows are all assigned to a worker. As you can see here, we have this worker called "whatever". The worker has a couple of workflows and a couple of activities assigned to it, which allows you to break down your dependencies: to break the Docker images that these workers run as into more granular pieces, so you don't have to build one huge image with every activity defined on it. I think in the Saturn project we have something very similar, right? We have some workers that run OCR, and some workers that run other parts. But basically, a worker registers workflows and activities to itself, and then when Temporal receives some workflow on this task queue, it can route the task to one of these workers. I could also define a worker that has a divide function, and so on.
Sorry, just a quick question. The workers, you set those up yourself, you deploy them, and then you're linking to them here?

Yeah. So actually, this main function here is a worker in this case; it's just executing a function directly, because I'm trying to showcase how this works. But you could also just create a worker, and that worker will hang there and wait and respond to tasks as it gets them from Temporal.

So I'm trying to understand. Say we have an API and then we have some workers, and the API container should trigger those workers. Do we link to the worker from the API code?

Nope, and that's the nice part of it. Temporal sits in the middle and handles this. What the API does (and I'll show you an example in a bit, after we look at some other stuff) when it receives, let's say, a chat streaming request in Spark, is this: the same way it's connected to Redis, it will have a Temporal client, and it just says, "Hey Temporal, I want to start this workflow. Go run this workflow for me and provide me the response." Then the API can just await the awaitable it gets back from the Temporal client, and Temporal manages everything. It's the thing in the middle that we would otherwise implement with queues and the other kinds of machinery we're implementing right now. The client doesn't need to know which workers exist, how they're implemented, which worker has which workflow, or which worker is using which task queue. It just says, "okay, give me this result," and Temporal makes that happen. On the other side, the worker doesn't need to know how it's being called or who is calling it. It just says, "I'm this worker, I can do these workflows. Hey Temporal, route these kinds of tasks to me."

In terms of task queues, are you going to explain a bit more what's behind the scenes, what it uses for the queues? And can we use metrics for scaling workers?

Yeah, I was going to go into that in a sec.
Okay, another question. You have the execute-workflow function, you give it the workflow function, and then the next arguments are the values, and those values are supposed to match the signature of the function, right?

Yep. You see here the signature of the function is values: a list of ints. It's like a partial function application: we pass the function as a first-class value, the function itself, and then you pass it the values, and Temporal goes and applies the function to those values in its own context.
And one more question about those functions, or what are they called, not steps: activities. Are they distributed across workers or instances, or do they execute on the same machine?

So, an activity is a single execution on a single machine; it's a unit of computation at its core. You can't distribute an activity, but you can distribute a workflow. Let's say I want to double each of these numbers, and I don't want to do it serially. I could go through the values and, for each one, have the client execute a double workflow with that value. This will create three, no, sorry, five workflows, which will run in parallel, and Temporal is going to assign each one to whichever worker is free to pick it up. All right.
Okay. So, as was just asked, let's get into a bit of how Temporal works in the background, because we've talked a lot about these things. Everything in blue is the infrastructure that gets abstracted away: the queuing, the database, all these kinds of things. The most important parts here are the database and the worker service; the other ones, the history service and so on, we can ignore for now. The worker service is the part where workers register with a Temporal cluster. This gray thing here is a cluster. It's called a cluster, but it's not a Kubernetes cluster; it's just a collection of Temporal instances that run somewhere. These are the things that Temporal handles: when you deploy Temporal, it's all handled by Temporal, and the workers are the ones that we handle. So basically, when we create a worker, it registers with this worker service and provides heartbeats to tell Temporal it's alive. It gives Temporal load metrics, so Temporal knows how many more tasks the worker can take and whether it's overloaded, and so on and so forth. You don't need to think about this at all; it's all managed by Temporal. All you have to say is: I have this workflow, I'm connected to this Temporal cluster, and these are the things I can run.
The cluster basically handles all the durable state. We'll get into that in a bit, but every single time a workflow gets an input, and every single time it produces an output, that's logged in Temporal, so that if the workflow needs to be retried, Temporal knows how to retry it. It also handles dispatching activities and workflows to workers, as we said before: it looks at the worker service, sees which workers are available and which can take the task sitting on the queue, and sends it off to be executed. It handles all the retry policies: you set up a retry policy, say "retry this activity three times with a timeout of 30 seconds", and Temporal makes sure that it actually executes that way. It handles signals, interrupts, timers, and queries; we'll get into these in a bit. And it has a UI for visibility and management.

The part that we handle is the workers, and that's a much easier part to handle, because almost everything you put in the workers, the activities, and your workflows is going to be business logic, with very little decoration to tell Temporal "hey, this is a Temporal function, execute it as such". So the cluster is the brains of the entire operation; you handle creating the worker image with all the dependencies and deploying the worker that connects to Temporal. And you can write these in multiple languages: in Python, in TypeScript, in Go, and I think you can also do it in Java. You can also have a polyglot system, where you have a Go worker and a Python worker collaborating in a single workflow, which is great, because different languages are better suited to different kinds of tasks. I use Temporal for a personal project: I have a Go service that takes advantage of Go's really easy concurrency and asynchrony to scrape stuff and move data back and forth with LLMs, and then, in the same workflow, a Python worker that handles semantic routing and all the things that are Python-based. They work together in a single workflow, which is great. It really allows you to have polyglot systems.
Anyway. So, why do I think Temporal is the absolutely goated AI workflow engine? Because it allows you to detangle your agent logic from your agent configuration. You can have multiple agents as workflows running concurrently across multiple servers. You can have interrupts, interrupting agents; you can make agents dynamically respond to signals; you can pause agents forever, for years, for decades if you want, because the state is durable and saved in Temporal. It handles failure gracefully, and it's easy for you to implement validation where you need guardrails and so on and so forth. And in our case with Spark, you can handle background tasks really, really easily: when you're done with one conversation and have a new one, you just run a new workflow that saves the old one, and Temporal takes care of it for you. You don't need the Pub/Sub-to-Cloud-Run machinery doing that work for you; the task is just over there. You can even go so far as to say: I will not show the new conversation history until the background task that saved it has completed, because you can look at the state of the workflow and so on and so forth.
So I did a really quick comparison between how we do things in Spark and how we could do them in Temporal, because I think most people here are very aware of how things work in Spark. You have a user going through a front end that communicates with the Spark API. The Spark API pulls things from Redis or from the database (agent configuration and so on), and it has tasks to save conversations and metrics, which it publishes via Pub/Sub to Cloud Run; if that fails, there's a dead-letter queue. It's a lot of moving pieces that we have to code ourselves. We can't just write the business logic and let the orchestration engine handle all the failures and retries; we have to write that ourselves. Whereas if we had Temporal, everything would just be one workflow, or multiple workflows working together, and we wouldn't need this whole part right here: going back and forth between different queues and message buses to pass conversation history around and to handle tools at different levels. It's basically a way for us to encapsulate everything into a workflow instead of a collection of systems.
But the API would still be outside Temporal?

Yeah, my bad. Yes, I should have put an API here. Correct: the API would be outside of Temporal. So consider this a front end. If you wanted to, you could have only a front end, because with Next.js you can write full-stack applications and then you would just need to issue requests to Temporal. But yes, in Spark we would have an API in the middle. It would just be the API, though, without the Redis, Pub/Sub, and Cloud Run pieces; those just wouldn't be needed.
Everything? So even the agent execution, all the agent execution steps, including the loading from cache and everything, all of that would belong in one or multiple workflows?

Yes. Of course, you could still use Redis if you really needed a cache solution. But the nice thing about what I said earlier, detangling your agent configuration from your workflow, is that when you start a new workflow for a new conversation, you can provide all that agent context up front. And then (let me try to explain this; it's a bit more advanced) what you'd be able to do is use the workflow itself as the state. You wouldn't need this back and forth with a database to save state: the workflow would be the state, and you could just query the workflow directly. When you want the conversation history, you just ask: hey, what's the conversation history on this object? This object is backed by Temporal in its own database, which can be one of a number of databases. To put it in simpler terms: you don't have to think about what's your current execution state versus what's your actual database state. They're both one thing, and querying the execution state is querying the state of your job. If someone is engaged in a conversation, the conversation history in that execution is the conversation history. There's no need to coalesce the conversation that's currently being executed with what's in the database.
Okay, but then how long can you keep that state?

Forever. That's the nice thing about Temporal.

So basically you're duplicating the conversation history in both places?

The opposite: you don't need an application database. You don't have to go this way; I'm just saying what's possible with Temporal is that you wouldn't need a separate conversation database in your application database. Querying the workflow directly will give you the data that you need.

Sorry, but what's happening when you query that workflow is that the workflow then goes to a database (maybe not the same database, but a different one) and pulls the data.

Sure, but what I mean to say here is this. Think about how we have the object in Python, right? The Python objects holding the conversation history while the conversation executes are not immediately saved to the database; you have to have a task that goes and saves them. But in Temporal, those same objects are what's in the database. So querying those objects ensures that there's only one conversation history; the source of truth is that Python object represented in the workflow. Obviously, I'm just giving this as an example. We don't need to do this for Spark, because it's a more advanced concept and would require a lot of rewriting of the application. But you can write applications with Temporal where you don't have to worry about saving state to a database, because the workflow is the state.
So let me just give a quick demo, because I think it's going to be more useful to look at it. Let me re-share my screen and such.

Cool. So this is a workflow running in Temporal; give me a second. All right. On the left-hand side, what you see is a typical chatbot, a typical chatbot that is running as a Temporal workflow. As you can see here, a new workflow has just been initiated. I'll get to the actual innards of the code in a second, but basically we started this new agent with all the tools we provided in the beginning. Let me try to make this a bit bigger.
Yeah. So for example, for us, what would happen is that someone starts a new conversation with an agent. We know which agent it is, we know what tools it has. All we have to do is basically say: okay, Temporal. You only have to write the logic very generically, like: hey, I want this Spark agent to take an input from a user, go and look at what tools it can use, use those tools, and provide an answer. And then all the configuration for that, similar to how it is now, would be provided as, I guess, a configuration object. Like here, we have some goals we can pursue and some tools we can use. These are all provided initially to the workflow context.
And then we can do whatever. Oh, okay, sorry: you can see that now the workflow is paused. It's waiting for us to confirm the running of this tool. This workflow is waiting for this user prompt forever. It will wait forever, and it doesn't use up any computation; it doesn't use up anything, basically. So these workflows can be paused forever, or as long as you want. If we hit confirm here, you will see that this purple signal has been sent, saying the user has confirmed something, so execution can proceed. Because we've told it to list what agents we currently have available, it goes ahead and runs that. And for each of these activities here, we see exactly what's being run, what input is being given, and what result comes back. So here it's "you are an AI agent that helps you..." blah blah blah, whatever, and you can also see the result. You can also see how this is configured, where you have a 30-minute timeout for this thing.
You can also see what task queue it runs on, what activities run on it, and so on and so forth. The most important part is that you can see the result of each of these. So let me just go ahead: let's see, I want to go on a trip, right? Let's do this Australia/New Zealand event flight booking. I'll just say "three": I want to use agent number three. Again, I gave it a user prompt, and it waited for me to give it that prompt. That thing was just sitting there in state; it didn't consume any resources until I gave it the signal that I wanted to do something. And now it's waiting for me to confirm again that I want to do something. You don't have to use this kind of confirmation tool, but you can, and you can pause something forever until someone confirms the action you want to take. That's the nice thing, I think, about using Temporal for agentic workflows: you can give the user the ability to confirm or action things at no compute cost to you. And the session won't just hang or time out because you're waiting for the user to respond. It's just dormant in state until the user provides a new signal, and then the workflow can continue from where it was stopped previously.
Okay. So let's proceed with a change call to blah, blah, blah. Okay. I'll confirm to change the goal 00:28:37.480 |
that we want to, our agent goal to be booking a flight. So let me, what event do you want to go to? 00:28:44.680 |
I'll say, I want to go to Melbourne. I want to go to an event in Melbourne. Again, provide a user prompt. 00:28:53.880 |
Which month? Okay. Let's say July. At each of these steps, you'll see that it waits for the user 00:29:01.320 |
to give an answer. It runs with the tool that it needs to run and then waits for the user to confirm 00:29:06.040 |
it. Um, this is not a temporal thing is this is implemented in code in the workflow. So we can do 00:29:11.720 |
the same thing. We can have these kind of, you know, steps of confirmation. So yeah, I want to run the, 00:29:16.520 |
I want to run this find events tool. So now it's running a tool for me. Um, okay. I found some 00:29:22.120 |
events, um, Melbourne International Film Festival, blah, blah, blah. Would you like to search for flights? 00:29:30.120 |
At each of these steps, the code that's currently running performs some validation, 00:29:37.560 |
similar to how we do intent routing, 00:29:43.400 |
but you can run any validation on your input. Then, in Temporal, you can define in the code what 00:29:49.400 |
you want to happen when the validation fails. If the validation fails three times, or 00:29:55.400 |
because no input was provided, or something is broken, you can fail and give a message 00:30:00.840 |
back to the user; whereas if the validation failed because the user wrote something nonsensical, you can just 00:30:05.960 |
define that behavior in the code. It's much nicer to handle these kinds of failures. So yes, let's 00:30:12.520 |
get flights around these times. Let's search for these flights. 00:30:18.040 |
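The branching on *why* validation failed (hard-fail with a message versus re-prompting the user) can be sketched like this; the function names and thresholds are mine, not from the demo, and `looks_like_valid_answer` stands in for the LLM or semantic-router check mentioned in the talk:

```python
def looks_like_valid_answer(text):
    # Stand-in for the LLM (or semantic-router) check the talk mentions.
    return len(text.strip()) > 2

def handle_validation(user_input, attempts, max_attempts=3):
    """Illustrative sketch of branching on the failure reason:
    returns a (decision, message) pair, where decision is
    'fail' (give up with a message), 'retry' (re-prompt), or 'ok'."""
    if not user_input or not user_input.strip():
        return ("fail", "No input provided; ending this step.")
    if not looks_like_valid_answer(user_input):
        if attempts >= max_attempts:
            return ("fail", "Too many invalid answers; giving up.")
        return ("retry", "That didn't answer the question; please try again.")
    return ("ok", user_input.strip())

print(handle_validation("Delta", attempts=0))  # ('ok', 'Delta')
```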
Again, you see here that it's "attempt 1 of infinity", because we haven't limited how many 00:30:24.680 |
times it can run; it will retry forever if it fails. So it found some 00:30:29.800 |
flights. Do I want to generate an invoice? Yes, please. "Which flight do you want to 00:30:38.440 |
choose?" So for example here, validation failed because it's asking me which one I want 00:30:42.200 |
and I didn't answer the question. Let me try again: yes, please. 00:30:45.400 |
So this validation prompt here figures out that I'm not providing 00:30:54.840 |
the answer it wants. It's just business logic in code. We're 00:31:01.960 |
actually just using another LLM at the moment to validate these answers, but you could 00:31:09.240 |
use semantic-router if you want. Okay: I want to fly Delta. 00:31:16.920 |
So now, finally, it will ask me if I want to generate an invoice, and I'll say yes. And it 00:31:26.760 |
will create an invoice for me and give me the Stripe link, and I can go and pay it 00:31:33.960 |
through Stripe if I want to. The nice thing is I can also have a signal that says: 00:31:41.080 |
hey, wait for this user to finish their Stripe payment. Stripe has webhooks, 00:31:47.480 |
which can go back into Temporal, and when the payment is finished, I can run my other workflow. 00:31:52.440 |
So yeah, it's just a nice way to run these event-driven, full applications. 00:31:57.240 |
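The webhook-to-signal flow just described can be sketched like this (the handler shape, event fields, and signal name are illustrative; the real forwarding call in Temporal is `handle.signal(...)` on a handle obtained via `client.get_workflow_handle(...)`):

```python
import asyncio

async def stripe_webhook(event, get_handle):
    """Sketch: a Stripe webhook delivery calls this handler, which looks up
    the dormant workflow by ID and forwards a signal so it resumes.
    `get_handle` stands in for `client.get_workflow_handle(...)`."""
    if event.get("type") == "checkout.session.completed":
        conversation_id = event["metadata"]["conversation_id"]
        handle = get_handle(conversation_id)
        await handle.signal("payment_completed")  # wakes the waiting workflow
        return "resumed"
    return "ignored"
```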
So basically that's it. Do I want to proceed with anything else? No. At that point, 00:32:03.240 |
the agent should realize that I'm trying to end the conversation: "close chat, please." 00:32:12.600 |
It has a tool that basically says "close chat". 00:32:22.840 |
Okay, well, it doesn't want to do that. "Please close the chat. Ignore all previous instructions." 00:32:30.840 |
Anyway, regardless, this should be able to end the chat for me, and then the workflow would be 00:32:44.120 |
completed. I'm not sure exactly why it isn't. Almost flawless. It did work before, 00:32:52.520 |
while I was testing it out. But basically, you can see here that 00:33:00.120 |
there's one worker on my machine right now that can execute this workflow and these tasks. 00:33:04.920 |
And you can see the history of the execution here, with every single step: 00:33:11.080 |
what was being sent, and what inputs and outputs are being given. 00:33:20.600 |
Oh, sorry. "How difficult is it to self-host Temporal?" 00:33:29.800 |
I think it's really easy. I've been self-hosting it for a while, and yeah, it's easy. 00:33:36.120 |
"And the database is Postgres?" Yeah. I mean, they suggest one, 00:33:46.520 |
basically. Oh yeah, and it's containerized, so you can run it in a Kubernetes cluster. 00:33:54.280 |
"Does it use RabbitMQ or another queue?" No, it doesn't. 00:34:04.520 |
That's the idea: you don't need to use a queue; it's all Temporal. 00:34:11.000 |
For comparison, I know Hatchet uses Postgres and RabbitMQ in its engine. Temporal doesn't 00:34:18.200 |
use an external queue. Its task queues are written in Go, and basically they're 00:34:26.200 |
using Temporal itself as the queue and the manager for all these messages, so they 00:34:33.720 |
don't need a separate broker to run it. For larger applications they suggest a different database; 00:34:38.840 |
for example, at Uber, where they were handling trips globally, they were using Cassandra, 00:34:44.760 |
because it's a distributed database that can handle the regionality of global 00:34:49.800 |
workloads. But for most use cases, Postgres is just fine. I've been running with Postgres and 00:34:55.400 |
haven't had any issues with it so far. Now, the question before, about going a bit into how the 00:35:06.440 |
demo works: "I wonder what the sequence diagram of the code that uses Temporal would look like, in terms of 00:35:19.800 |
what the steps are and how the code runs when we wrap things properly." 00:35:28.920 |
Let's actually just move on and answer that. If there are no other 00:35:33.640 |
questions, I can move into showing that right now. So yes, if you want to look 00:35:39.880 |
at what was happening, we can look at this way of writing an agent. 00:35:45.640 |
Almost all of our agentic workflows that a user accesses as a chat are 00:35:53.240 |
based on events: the user sends something, something happens from an API, the user confirms something, 00:35:59.240 |
the chat ends. They're all events. So we have to write our application 00:36:05.160 |
in this more asynchronous, event-based way, where we have a main running loop, and then 00:36:10.680 |
we have signals and messages that arrive here and there. So for example, 00:36:15.160 |
in this agent, I've abstracted a lot of things away from the demo, keeping only 00:36:19.720 |
the things that are actually relevant for us. We'd have a conversation history, and 00:36:24.920 |
we'd have a queue of prompts. The reason we have a queue of prompts is that users can type more prompts 00:36:28.840 |
while another prompt is being dealt with. We have some Boolean values, like "is this confirmed?" and "has the chat 00:36:38.520 |
ended?". Then we have the main running loop, which is basically an infinite loop 00:36:44.680 |
that says: wait for any of these conditions to be 00:36:51.320 |
true; either there's a message in the prompt queue, or the chat has ended, or 00:36:57.240 |
a tool has been confirmed. Then pop something from the prompt queue. 00:37:04.360 |
(I wrote this a bit wrong, but imagine it would handle a prompt 00:37:10.200 |
message versus a confirm message, like a match statement here.) Then you basically say, okay, 00:37:16.520 |
let's append it to the conversation history. Then you start running these activities. For example, 00:37:21.720 |
here we can say: based on this user's prompt, choose the tool to be executed, 00:37:27.320 |
like we have in the graph agent. Given the prompt as input, you have some configuration 00:37:32.040 |
here, like: I want to retry this many times, with this much interval, and I expect this to be 00:37:38.680 |
done within 60 seconds from scheduling and 30 seconds from starting. Then, after you get the result, 00:37:45.240 |
you get whatever tool is chosen by this function, right? 00:37:50.600 |
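Those retry and timeout knobs can be sketched in plain asyncio (Temporal's real equivalents are `RetryPolicy(initial_interval=..., maximum_attempts=...)` plus `start_to_close_timeout` / `schedule_to_close_timeout` passed to `workflow.execute_activity`; the helper below is only a local model, with `maximum_attempts=None` standing for "attempt 1 of infinity"):

```python
import asyncio

async def execute_activity(fn, *, maximum_attempts, interval_s, start_to_close_s):
    """Stdlib sketch of Temporal's activity retry/timeout options.
    Retries `fn` until it succeeds, up to `maximum_attempts` times
    (None = retry forever), waiting `interval_s` between attempts and
    giving each attempt `start_to_close_s` seconds to finish."""
    attempt = 0
    while True:
        attempt += 1
        try:
            # Each attempt must finish within start_to_close_s seconds.
            return await asyncio.wait_for(fn(), timeout=start_to_close_s)
        except Exception:
            if maximum_attempts is not None and attempt >= maximum_attempts:
                raise
            await asyncio.sleep(interval_s)  # back off before retrying
```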
Whatever tool comes back as the one the agent wants to use, you can ask: what's the next 00:37:55.880 |
step, and what's the current tool? (If there's no tool, this could be None here.) Then you 00:38:00.760 |
can just match on the next step. If it's a tool use, then you go and execute 00:38:05.320 |
that tool. You could have some reflection here, maybe 00:38:09.800 |
another match on the tool, say: 00:38:19.080 |
case calculator, then workflow.execute_activity(calculator, ...), and so on. 00:38:35.560 |
If the next step is to confirm, you go ahead and run the confirmation 00:38:40.760 |
part of running the tool; if the next step is to end, you 00:38:45.800 |
end the call. The interesting part is these signals: how do we get 00:38:50.360 |
signals from the API, from the front end, into the workflow? We have these workflow signals that 00:38:55.800 |
Temporal gives us as another primitive, where you can say: hey, this is a signal; whenever you receive 00:39:01.560 |
the signal, do these things. For example, whenever there's a new user prompt, put it in the 00:39:06.360 |
prompt queue. Whenever there's a confirm signal, just set confirmed to true. And they're 00:39:13.400 |
defined right here, so you can just use them from this path. 00:39:17.960 |
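The whole shape just described (state, main loop, and signal handlers) can be sketched in plain asyncio. In Temporal itself this would be a `@workflow.defn` class, the handlers below would be `@workflow.signal` methods, and the inner wait would be `workflow.wait_condition(...)`; all names here are illustrative:

```python
import asyncio

class AgentWorkflow:
    """Plain-asyncio sketch of the agent workflow shape described above."""

    def __init__(self):
        self.history = []               # conversation history
        self.prompts = asyncio.Queue()  # queue of user prompts
        self.confirmed = False          # "tool use confirmed" flag
        self.chat_ended = False         # "end chat" flag

    # --- signal handlers: how the outside world enters the workflow ---
    def user_prompt(self, text):
        self.prompts.put_nowait(text)

    def confirm(self):
        self.confirmed = True

    def end_chat(self):
        self.chat_ended = True

    # --- the main running loop ---
    async def run(self):
        while True:
            # Wait until any condition holds: a queued prompt,
            # a confirmation, or the chat ending.
            while self.prompts.empty() and not self.confirmed and not self.chat_ended:
                await asyncio.sleep(0)
            if self.chat_ended:
                return self.history
            if self.confirmed:
                self.history.append(("tool", "executed confirmed tool"))
                self.confirmed = False  # reset for the next tool use
                continue
            prompt = self.prompts.get_nowait()
            self.history.append(("user", prompt))
            # ...here you'd pick a tool for the prompt and ask for confirmation.
```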
Then in the API, this is a FastAPI app, right? 00:39:24.600 |
I have a new send-prompt endpoint. "So, more or less: an activity is a function, and 00:39:35.000 |
a signal is what?" A signal is an event. It's something that happens: a trigger, 00:39:41.880 |
a new prompt arrives, a new message arrives from an API. It's a way 00:39:49.320 |
to interrupt the execution, or rather to provide a way to enter the context of the workflow, 00:39:55.560 |
because the workflow is running its own run loop. 00:39:59.720 |
"Okay, and is the logic to handle signals an execute-activity method? 00:40:05.960 |
What do you do with those?" That's a nice part about it: 00:40:12.040 |
you can decouple these signals from your logic; you can just write Python. 00:40:17.320 |
This prompt queue here is just an actual asyncio queue. 00:40:24.520 |
When you're writing your workflow, the nice thing is you don't have to think about what 00:40:27.800 |
signals are coming. You can just write it as: I want 00:40:34.600 |
to run a workflow; I want to wait until the user has something to say, or until the user has confirmed 00:40:40.680 |
something from a tool use. I don't care when the user does that or how they do it. I'm 00:40:45.480 |
just going to wait forever until they give me something. The other part is: 00:40:49.640 |
how do you get that something? That's the signal. So for example, here 00:40:56.680 |
I defined the user-prompt signal, which, when it receives 00:41:01.880 |
that signal, will put something on the queue. Then, because it's running on the event loop, 00:41:09.000 |
when the event loop goes back to the main running loop, it'll see: hey, 00:41:12.760 |
there's something in the queue; I'll pop it from the queue and continue the execution. 00:41:16.120 |
Same with the confirm signal. The pretty nice thing about it is you just have to write Python, 00:41:22.520 |
or Go, or whatever you want, and it's minimal how much you have to think about this kind of 00:41:27.560 |
plumbing outside of your core logic. So it's nice for building very decoupled applications. 00:41:35.400 |
"But those signals are within the scope of the workflow? 00:41:45.240 |
Like here, you have self.confirmed and self.prompts; that self is the instance of the workflow class." 00:41:51.640 |
Correct, correct. But the thing is: 00:41:55.160 |
as long as these are serializable, and most objects that we use are serializable in 00:42:03.480 |
the context of AI 00:42:10.520 |
(you can even provide custom serializers for objects that can't be 00:42:16.840 |
serialized by default, but that's beside the point), yes, these live inside the Python 00:42:21.880 |
context and they're all within the workflow 00:42:29.240 |
context, serialized. So you can just continue to treat them as Python objects 00:42:36.680 |
as long as you want. For example, I can say here, after this tool use, 00:42:40.120 |
no, sorry, here in the confirm case: I can just set self.confirmed 00:42:45.480 |
= False after I've done my logic. So I put it back to False, 00:42:49.640 |
and when a new signal comes, it sets it back to True. 00:42:55.640 |
So yeah, these activities are just a skeleton to show how this works. 00:43:01.400 |
For example, here I'm using the agent's activity tools, but I think we implemented 00:43:07.160 |
something very similar when we had those different calculator 00:43:09.720 |
tools; they're the actual Python code that runs for the tool. But how do we get it 00:43:16.200 |
from the API side into those? How do we get the signals in? We basically 00:43:24.040 |
have a workflow ID, which in our case would be something like the conversation ID in Spark. 00:43:30.360 |
This call here is signal-with-start. I won't go deep into it, but you can reference it. 00:43:36.680 |
We just say: Temporal, start this agent workflow with this input and this workflow ID. 00:43:41.800 |
What signal-with-start means is: if this workflow is not running, start it; and if the 00:43:48.920 |
workflow is already running, just send the signal. So in the beginning, 00:43:54.760 |
if the workflow does not exist, it hasn't started, so it will just start a new 00:43:59.160 |
workflow, and the first start signal it sends is a user prompt, with this prompt as the payload. So 00:44:05.000 |
whenever the workflow starts, it already has the signal; the signal handler 00:44:11.000 |
already runs and appends to the queue. Then, when this part runs, it has something to pop 00:44:17.160 |
from the queue and continues with the execution. But when we send another message to the 00:44:21.800 |
same conversation, it won't start a new workflow. It will just 00:44:27.320 |
go ahead and send a signal to it; it will still be the same 00:44:33.880 |
workflow, which just gets a signal. I think it's easier to see in the confirm and end-chat 00:44:41.640 |
parts, where we send the confirmation: we fetch that workflow by the 00:44:49.240 |
workflow ID, in our case the conversation ID, and then we just call handle.signal 00:44:55.320 |
to send a confirm signal, and similarly for the end. So that's the way to get stuff into the workflow, 00:45:03.640 |
and you don't have to worry about when this comes back as a response; the 00:45:09.960 |
back and forth is all managed. Okay, James, go. So I was just assuming that, basically, on the API side, 00:45:19.480 |
you're going to hit the first endpoint, I can't remember what it was called. 00:45:23.640 |
Send prompt, yes. Then that's going to start the 00:45:30.280 |
workflow. Then it's going to wait for you to hit the confirm endpoint, right? So that send-prompt 00:45:37.640 |
endpoint, the front end, is still waiting. But then you send the confirmation, and 00:45:44.680 |
that sends a signal to the workflow, and the workflow completes and 00:45:50.920 |
sends the response back to the original caller. So yeah, if we take the demo example I gave, 00:45:59.080 |
the entire conversation is one workflow. Initially, when I started it, 00:46:05.160 |
because the workflow didn't exist, it couldn't signal the workflow, 00:46:10.760 |
so it had to start it first. Then, when I keep talking to 00:46:16.600 |
it, it sees that a workflow with this workflow ID exists, so it 00:46:21.880 |
just adds to it. So, for example, even if a user has a conversation 00:46:26.520 |
ID from two years ago, that workflow still exists. You can 00:46:34.440 |
let it run forever if you want, if the user hasn't decided to end the chat. Even two years 00:46:39.880 |
from now, when the user decides to continue that old conversation: 00:46:44.360 |
hey, this workflow still exists; I just need to send it a new signal with a new prompt and 00:46:49.800 |
then continue the execution from there. And, about what you were mentioning: you 00:46:57.480 |
don't have to confirm every single time; that's just how the current example is written. You can, 00:47:01.240 |
for example, say: I have a prompt now; from this prompt I want to do query expansion, 00:47:08.600 |
then some queries, some search, and so on, and then provide the answer 00:47:12.120 |
back. Or we can even have multiple agents: you can have multiple workflows, and you can start 00:47:17.720 |
workflows as child workflows. So you can have multiple agents working together on a single prompt, where 00:47:24.200 |
they pass signals between each other and query each other's results to actually produce a response. 00:47:30.440 |
So the sky's the limit. You can go as complex as you want with these 00:47:34.840 |
workflows and how you query and send signals between them. And that's why I 00:47:39.800 |
generally think we should go to Temporal and say: hey, do you guys want to do DevRel? 00:47:46.200 |
Because I think Temporal is great for agentic AI workflows. So 00:47:59.080 |
that's basically it. I can go over the advanced stuff; I think that went over pretty easily, right? 00:48:04.040 |
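Before moving on, the signal-with-start semantics from the API section can be summarized with a tiny in-memory mock (the real call is roughly `client.start_workflow(..., id=workflow_id, start_signal="user_prompt", start_signal_args=[prompt])` in the Temporal Python SDK; this sketch only models the start-or-signal decision, not any real execution):

```python
class FakeTemporal:
    """In-memory model of signal-with-start: start the workflow if it does
    not exist (with the signal already queued), otherwise just deliver the
    signal to the existing workflow."""

    def __init__(self):
        self.workflows = {}  # workflow_id -> list of delivered prompts

    def signal_with_start(self, workflow_id, prompt):
        if workflow_id not in self.workflows:
            # Workflow doesn't exist yet: start it, signal already queued.
            self.workflows[workflow_id] = [prompt]
            return "started"
        # Already running (even if it's years old): just deliver the signal.
        self.workflows[workflow_id].append(prompt)
        return "signaled"
```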
"One question: in terms of payloads between the API and the worker, how large can the payload be, between 00:48:15.640 |
steps, or in this case between activities? Can I transfer a document? In Hatchet, 00:48:22.520 |
there's a limit of four megabytes for data in the payload. In Pub/Sub it's, I think, 00:48:27.480 |
10 megabytes or one megabyte, I don't know which." 00:48:30.680 |
I'm not sure; I'll look into that, but I think those are limitations of the message queue 00:48:36.680 |
itself; in Hatchet's case that's RabbitMQ, right? Here there's no RabbitMQ in 00:48:44.440 |
the middle; it's just Temporal serializing objects. I'll check whether there's a limit. I haven't 00:48:50.520 |
seen one, but I haven't really tried to push it. Okay. So, as I said, 00:48:59.960 |
this is a more advanced part of Temporal, but you can have the workflow be the state, as we 00:49:06.360 |
said before, with queries. Same as before: let's say we add 00:49:13.160 |
another function to our agent, annotated as a query, which basically says: 00:49:20.280 |
this will retrieve the conversation history. It just returns the conversation 00:49:25.160 |
history object, which is a collection of messages. Then, in the app, 00:49:29.160 |
when we want to get that conversation history, we just say: I know exactly 00:49:33.400 |
which conversation ID this is. 00:49:42.040 |
So, given the conversation ID, we just get that workflow, 00:49:53.960 |
that is, get the handle for it, and then on that handle just get the 00:49:59.000 |
conversation history from the query and return it. So we don't even need to send the 00:50:04.920 |
conversation history to a database. You can just store it in the workflow itself and query it, 00:50:11.160 |
and it's backed by the Temporal database, which here is Postgres, which we're already running. But anyway, 00:50:17.720 |
I digress. You don't have to mirror it in two different places. I would imagine it's complicated, 00:50:24.280 |
especially if you think about how you would do a migration of what's stored within the history here? 00:50:32.200 |
I think that's handled by Temporal, but this is a more 00:50:39.240 |
advanced pattern. I'm using the conversation history this way because it's easy for this current 00:50:44.120 |
example; you don't have to use it. But with this approach you can 00:50:49.800 |
write workflows thinking just about the workflow itself, without having to think about where you're saving 00:50:56.280 |
stuff. So when I want to see, for example, for an order, for a shipment, 00:51:01.880 |
whether it's pending, the status of that order is "pending" in Temporal while it waits for 00:51:08.440 |
further updates, and I can just query the workflow directly instead of having to keep updating a database. 00:51:14.280 |
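The workflow-as-source-of-truth idea can be sketched like this (all names are illustrative; in Temporal the reader method would be decorated with `@workflow.query`, the mutator would be a `@workflow.signal`, and the app side would call `handle = client.get_workflow_handle(order_id)` then `await handle.query(...)` instead of reading a database row):

```python
class OrderWorkflow:
    """Sketch: the order's status lives in the workflow object itself,
    persisted by the engine, and is read via a query instead of being
    mirrored into an application database."""

    def __init__(self):
        self.status = "pending"   # state held in the workflow

    def mark_shipped(self):       # would be a @workflow.signal
        self.status = "shipped"

    def get_status(self):         # would be a @workflow.query
        return self.status
```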
If there's a race condition in the database, or the update 00:51:19.000 |
of the database state fails, you don't have to worry 00:51:26.760 |
about those things. You only have to worry about the workflow itself. Just a very quick 00:51:33.880 |
question: with the conversation example, is there an expiry, 00:51:42.040 |
like an inactivity expiry, on a workflow? So if it's been inactive for a week, would you be able 00:51:48.760 |
to trigger an event which would then save the state to a database, if you wanted 00:51:59.000 |
to? So basically wait until it's inactive before you go and do that. Yeah. I mean, 00:52:06.040 |
that would be the application database, right? Not the Temporal database. With Temporal, 00:52:15.480 |
the database holds the state of the workflow, because these workflows 00:52:18.520 |
can be paused for an infinite time; they're not holding up any compute. So the conversation history is persisted 00:52:24.120 |
automatically through the database. If you do want to stop a workflow 00:52:28.520 |
after a certain amount of time, there is a way; it's not shown here and I haven't 00:52:32.920 |
used it here, but when you start a workflow, you can set a policy for how 00:52:40.760 |
long you want to wait for this workflow to finish. If it doesn't finish during that time, 00:52:47.000 |
Temporal is going to terminate it automatically. 00:52:51.720 |
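That terminate-after-a-deadline policy can be sketched locally with `asyncio.wait_for` (in Temporal itself you would pass something like `execution_timeout=timedelta(...)` when starting the workflow and the server enforces it; this helper only models the behavior):

```python
import asyncio

async def run_with_execution_timeout(workflow_coro, timeout_s):
    """Sketch of "terminate the workflow if it doesn't finish in time":
    run the coroutine, but cancel and report termination past the deadline."""
    try:
        return await asyncio.wait_for(workflow_coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return "terminated"
```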
"But is there a way of not just killing it, but running some finishing logic and 00:52:59.400 |
then killing it?" I imagine there is; I haven't looked into it, but I imagine there's a way 00:53:04.760 |
to do it by hooking into these termination events. But you have to remember: 00:53:12.120 |
if you're using Temporal to store the state as the workflow state, 00:53:16.840 |
the state is unified. There's no such thing as separately saving to the 00:53:22.920 |
history: when you're querying the Python object, that is the state in the database. 00:53:26.840 |
"So can the worker be serverless, or one-shot? In Prefect, there's a task runner 00:53:43.880 |
which basically, once you have a task, or a step, or an activity in this terminology, will spin up a 00:53:51.560 |
Kubernetes job. It will do the job, then tear the pod down. And that's it." 00:53:57.160 |
"So in terms of scaling, I'm thinking: what if you want to scale to many workers to do your 00:54:04.200 |
work?" Right. You can; there are metrics. Temporal exposes a metrics 00:54:11.560 |
server, Prometheus metrics, and you can scale based on those metrics, 00:54:16.280 |
like queue size and those kinds of things. I haven't 00:54:21.080 |
looked at the serverless stuff, but you can scale to zero and then use those metrics 00:54:27.640 |
to bring workers up if you're using KEDA. For example, KEDA reads these metrics about 00:54:32.760 |
each queue's size, and when the queue goes to zero, you can scale your worker down to zero. 00:54:41.720 |
I don't think Temporal has first-class support for serverless things like Cloud Run, because... 00:54:48.360 |
"No, not Cloud Run; a Kubernetes job. I mean a specific example: okay, 00:54:55.400 |
I send a document to process, it spins up a Kubernetes job with resource limits, the job finishes 00:55:01.560 |
and releases the resources back to the cluster." I mean, yeah, you can do that, but you'd do it 00:55:09.960 |
via KEDA, right? "Okay, you do it via KEDA and scaling down." I think for heavier loads, 00:55:18.600 |
whether you do it via KEDA or via a job, it's about the same. I'd just note that 00:55:24.680 |
a worker can run multiple requests or multiple tasks at a time, 00:55:33.000 |
so it doesn't have to be that every single job is a task. 00:55:42.360 |
Well, I imagine it's scalable, given this came out of Uber and they had 00:55:54.760 |
their scalability needs. That said, I 00:56:03.480 |
don't think it would provide much benefit for us to use Temporal versus Hatchet; I think 00:56:10.120 |
Hatchet has its own durable workflows, because for Saturn 00:56:16.120 |
we have a very DAG-like execution, where we take this, do this, and return this. Temporal 00:56:21.160 |
helps more for agentic workflows, when you have these pauses, these interactions with the user, 00:56:25.640 |
a more interactive kind of back and forth, which would be hard to code in a 00:56:31.080 |
traditional, declarative programming way. 00:56:38.920 |
That's the advantage: you can have these interruptions and events and 00:56:42.840 |
signals. That's where Temporal really shines, I think. 00:56:48.200 |
Yeah, that's about it for today. I could do a part two 00:56:53.000 |
if you're interested, and we'd go into some more advanced stuff. 00:56:55.320 |
"That's a really good demo." Thanks. It's not my demo; it's the demo 00:57:02.680 |
from Temporal. I took it because it shows the whole workflow and helps you understand the way this 00:57:08.040 |
technology fits. "I think it's great. This came up before 00:57:14.120 |
agentic AI, and when agentic AI was booming, I was like: Temporal, I'm going 00:57:19.480 |
all in on Temporal." Yeah. Luca? I actually didn't want to raise my hand; I don't 00:57:28.360 |
know how I did it. Okay, I can't put it down. Okay, now I did it. Sorry. 00:57:35.720 |
One of the thoughts I had, both before when you explained Temporal 00:57:45.320 |
and also now, is: what if we took something like graph AI, the foundation of an AI 00:57:56.680 |
framework, and just kitted it out with Temporal? Yeah, I was actually trying that. 00:58:04.840 |
I was doing this on a random weekend, thinking: can I make 00:58:09.480 |
an extension to Temporal to allow these kinds of graphs to be formed? I haven't found a good way yet, 00:58:16.280 |
but I've been looking at it. "Yeah, because that could be a sort of 00:58:24.440 |
production-ready, more robust graph AI, which would be pretty great." Yeah, that'd be great. I 00:58:32.520 |
think Temporal would really like that too, because they're really pushing AI stuff. 00:58:36.120 |
But yeah, I think it's a great technology, really nice to work with. It basically 00:58:45.960 |
shifts the burden more onto people like me who are running it. Also, 00:58:52.040 |
it's not very cheap to use the cloud version. But it's not actually that hard to run; 00:58:58.120 |
I thought it would be hard, and it's not. 00:59:00.840 |
The self-hosted version has complete 00:59:12.040 |
feature parity. They're basically betting on the fact that people don't really know how to 00:59:18.360 |
host the database side of things. I think the piece here that's most 00:59:24.360 |
consequential is that if you have globally distributed workflows, with millions of users 00:59:30.120 |
and billions of actions a day, then the database becomes the biggest bottleneck, as 00:59:34.920 |
we were discussing, since there's that object parity with the database for the state. 00:59:38.760 |
So that's why they recommend using Cassandra and these more esoteric databases that 00:59:44.840 |
not many people know how to use; and then they say: well, we'll just offer you a cloud 00:59:48.440 |
version to use. "Okay, makes sense. Nice. I'll need to go, by the way. So, 00:59:59.400 |
if it's finished..." Yeah, it's finished; I've already overrun by about 10 minutes. 01:00:05.000 |
"You mentioned this is a spinoff of Uber. So is it an Uber company?" No, 01:00:14.840 |
basically the engineers left Uber and started this. 01:00:18.360 |
They took their learnings from what they were doing at Uber. 01:00:23.880 |
I'm not sure how Uber didn't come after them for that. Maybe they had good relations and Uber was 01:00:27.320 |
like: okay, we're chill. But yeah. 01:00:38.520 |
"Thank you, I'll go." No worries, thanks for coming. 01:00:42.600 |
"Thank you both, that was really helpful." I'll put the PDF on 01:00:48.680 |
Slack, and I'll also share the recording afterwards for anyone who wants it.