
How Monday.com Built Their Digital Workforce with LangGraph



00:00:00.000 | It's great to meet you, I'm Asaf, and today I'm going to talk to you about how we're building our digital workforce at monday.com.
00:00:17.360 | So very quickly about myself, I'm the head of AI at Monday, I'm a scout for Sequoia, I currently live in Israel, so I'm super jet-lagged right now.
00:00:27.360 | And I've also been building AI products for the past decade, including some you may know, like GPT Researcher and Tavily.
00:00:32.820 | So monday.com, maybe some of you know a bit about us, but basically we're a public company that builds a Work OS where you can manage and do all your work in one single platform, whether it's your CRM, dev, service, or work management.
00:00:51.620 | We've actually crossed $1 billion ARR just this year, and there's one important fact here that I think is worth noting, and it's that we're actually processing around 1 billion tasks per year.
00:01:04.080 | And I want to just think about this for a second, because when you think about 1 billion tasks, work tasks per year, just think about the opportunity for agents and AI that can actually take on those tasks.
00:01:19.840 | And this is a huge opportunity that we see at Monday when we think about AI.
00:01:23.380 | And we've actually launched our first AI feature around September last year, and we've seen insane hypergrowth.
00:01:35.520 | We've been growing 100% month over month with our AI usage, and just recently we launched our digital workforce.
00:01:41.940 | So when you think about what is our digital workforce, think about agents working around the clock.
00:01:50.700 | Whether you're an SMB looking to scale up, or an enterprise, imagine agents working within the Monday ecosystem on any given task you can think of.
00:02:04.520 | And what I'm going to show you today is very powerful lessons learned that we had in our experience building agents.
00:02:13.300 | And it was said earlier here today by the Harvey team, and I think others, that to build very successful agents, you have to focus on product and user experience.
00:02:28.760 | And we have a saying at Monday, that the biggest barrier to adoption is trust.
00:02:34.320 | It's actually not technology.
00:02:36.180 | And I want to show you a few examples of things that we've learned.
00:02:41.880 | So when we think about autonomy, I think we're all engineers, and we love to think about autonomous agents, and agents doing everything around the clock.
00:02:53.240 | But actually, the opposite is true.
00:02:54.720 | When we see how our users are using agents and what they think, imagine that every company, every user, has a different risk appetite.
00:03:03.660 | And when you build AI agents, you should give users that control.
00:03:08.000 | And what we've learned by applying this is that we've actually increased adoption dramatically, by putting the control in the users' hands to decide how they want to control their agents.
00:03:22.140 | Secondly, is entry points.
00:03:23.540 | Now, if you're building a startup from scratch, that's something else.
00:03:26.600 | But as a huge company like Monday, one thing that we've learned is don't rebuild a new user experience.
00:03:35.020 | Try to think how you can create these experiences within your existing products.
00:03:41.100 | So when you think about how agents can work at Monday, we already have people working at Monday, on Monday.
00:03:47.420 | We just assign people.
00:03:48.880 | So we can do the same with agents.
00:03:51.340 | Just think about how you can just assign digital workers or agents to actual tasks.
00:03:56.020 | And by doing that, our users have no new habits that they have to learn.
00:04:01.040 | It's seamless within their experience.
00:04:02.800 | Another super important thing that we've learned.
00:04:07.920 | So originally, when we released those agents, imagine that you can ask in the chat and say things like, create this board, create this project, modify this item.
00:04:19.940 | For our users, Monday boards are production data.
00:04:26.100 | I think a very good example I'd like to give is think about Cursor AI, which is an amazing product.
00:04:33.280 | We all vibe code, as Andrew Ng said earlier.
00:04:36.200 | But imagine that with Cursor AI, instead of you as developers seeing the code, imagine it was pushed straight to production.
00:04:43.920 | I assume that none of you, or at least most of you, would not have used it, right?
00:04:48.880 | And that is just how important user experience is versus technology.
00:04:54.700 | Because technology-wise, we could do that.
00:04:58.060 | Cursor could have done that.
00:04:59.020 | And what we saw was users onboarding and testing the agents out.
00:05:06.160 | And once the time came to actually push content to the board, that's where they froze.
00:05:13.280 | So we introduced a preview, and this preview increased adoption dramatically, because users now have the confidence of this guardrail before they actually save.
00:05:26.460 | They know what the outputs are going to be before they're saved.
00:05:30.040 | So when you think about building AI experiences, think about previews, think about human in the loop, think about how users can have that control and understanding before the AI releases to production.
00:05:44.940 | And lastly is explainability.
00:05:48.020 | Now, we've heard a lot about explainability, and when I talk with people it kind of feels like a nice-to-have, but explainability is much more than that.
00:05:58.220 | Think about explainability as a way for your users to learn how to improve their experience with the AI over time.
00:06:08.220 | Because when they have an understanding of why the outputs happened, they have an ability to change the outcomes.
00:06:16.780 | So these four are super-important components that we've actually introduced in our product that have increased adoption very, very nicely.
00:06:25.860 | Now let's talk about the tech.
00:06:28.880 | So we actually built our entire agent ecosystem on LangGraph and LangSmith.
00:06:37.340 | And we've tested out various frameworks, and we found LangGraph to be the number one by far.
00:06:44.260 | And just a few examples, so what's great about LangGraph is that it's not really opinionated, but it still does everything you don't want to deal with as an engineer,
00:06:55.740 | like interrupts and checkpoints, persistent memory, human in the loop.
00:07:03.840 | Those are critical components that we don't want to deal with, but we have that.
00:07:07.340 | On the other hand, we have super great options to customize it just for what we need, and we'll show you an example in one second.
00:07:15.480 | And additionally, native integration: we now process millions of requests per month using LangGraph, and it's proven to be super scalable.
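As a rough, hypothetical sketch (not Monday's actual code) of the pieces mentioned above, here is a minimal LangGraph graph with a persistent checkpointer and an interrupt, so a human can review a drafted board change before it is applied, much like the preview described earlier. The node names and state fields are invented for illustration.

```python
# Hypothetical sketch: draft a board change, pause for human review, then apply.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver


class State(TypedDict):
    request: str   # what the user asked for
    draft: str     # proposed board change, shown as a preview
    applied: bool  # whether the change was committed


def draft_change(state: State) -> dict:
    # In a real system an LLM would produce this draft.
    return {"draft": f"Proposed change for: {state['request']}"}


def apply_change(state: State) -> dict:
    # Only reached after the human approves the preview.
    return {"applied": True}


builder = StateGraph(State)
builder.add_node("draft_change", draft_change)
builder.add_node("apply_change", apply_change)
builder.add_edge(START, "draft_change")
builder.add_edge("draft_change", "apply_change")
builder.add_edge("apply_change", END)

# Checkpointer gives persistent state; interrupt_before pauses for the human.
graph = builder.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["apply_change"],
)

config = {"configurable": {"thread_id": "demo-thread"}}
graph.invoke({"request": "Create a sprint board", "draft": "", "applied": False}, config)
# ...show the draft to the user as a preview; on approval, resume from the checkpoint:
graph.invoke(None, config)
```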
00:07:29.300 | So let's take a look at how this works under the hood.
00:07:32.580 | So we have LangGraph at the center of everything we're building.
00:07:36.560 | And around our LangGraph engine, which also uses LangSmith for monitoring,
00:07:43.080 | we also have what we have built as what we call AI blocks, which is basically internal AI actions that we've developed at Monday.
00:07:51.360 | We've actually built our own evaluation framework, because we believe that evaluation is one of the most important aspects when you're building AI.
00:07:58.560 | And I think today was a lot about evaluations, as you guys have seen.
00:08:02.400 | So I'm not going to dive into that.
00:08:03.940 | And then we also have our AI gateway, which is our way of controlling what kinds of inputs and outputs are enabled in the system.
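As a rough illustration of what such a gateway layer could look like (a hypothetical sketch, not Monday's actual implementation; the action names and policy are invented), think of a thin wrapper that checks every request against an allow-list and every response against a simple output policy before anything reaches the agents or the user.

```python
# Hypothetical AI-gateway sketch: validate what goes into and out of the agent system.
ALLOWED_ACTIONS = {"create_board", "create_item", "update_item", "search_knowledge_base"}
BLOCKED_OUTPUT_MARKERS = ("api_key", "password")  # naive stand-in for an output policy


class GatewayError(Exception):
    """Raised when a request or response violates gateway policy."""


def check_request(action: str, payload: dict) -> dict:
    # Reject actions the system has not explicitly enabled.
    if action not in ALLOWED_ACTIONS:
        raise GatewayError(f"action '{action}' is not enabled")
    return payload


def check_response(text: str) -> str:
    # Block outputs that violate the policy before they reach the user.
    if any(marker in text.lower() for marker in BLOCKED_OUTPUT_MARKERS):
        raise GatewayError("response blocked by output policy")
    return text
```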
00:08:13.080 | Now let's take an example of our first digital worker that we released, which is the Monday Expert.
00:08:21.200 | So basically what you see here is a conversational agent using the supervisor methodology, where the system holds four different agents.
00:08:35.220 | We have a supervisor, we have a data retrieval agent, which is in charge of retrieving all data across Monday.
00:08:41.680 | For example, knowledge base, board data, we also use web search.
00:08:47.680 | Then we have our board actions agent, that does actual actions on Monday.
00:08:53.440 | And last, we have the answer composer, that based on the user, the past conversations, tone of voice,
00:09:01.940 | and all kind of other parameters that are defined by the Monday user, actually composes the final answer.
00:09:09.560 | And we've even added a really awesome tool, which is called undo.
00:09:14.360 | So basically, we give the supervisor the ability to dynamically decide what to undo within the actions based on the user feedback.
00:09:24.340 | Which, by the way, has proved to be one of the coolest use cases we've built.
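A rough sketch of that supervisor shape in LangGraph might look like the following. The node names, routing logic, and undo mechanism here are invented for illustration and are not Monday's actual code; a real supervisor would use an LLM to route, not keyword matching.

```python
# Hypothetical supervisor sketch: four workers plus an undo capability.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END


class AgentState(TypedDict):
    messages: list     # conversation so far
    next_agent: str    # where the supervisor routes next
    action_log: list   # board actions taken, so "undo" knows what to revert


def supervisor(state: AgentState) -> dict:
    # Stand-in routing; a real supervisor would call an LLM to pick the next worker.
    last = state["messages"][-1].lower()
    if "undo" in last:
        return {"next_agent": "undo"}
    if "create" in last or "update" in last:
        return {"next_agent": "board_actions"}
    return {"next_agent": "data_retrieval"}


def data_retrieval(state: AgentState) -> dict:
    # Would query the knowledge base, board data, or web search.
    return {"messages": state["messages"] + ["retrieved context"]}


def board_actions(state: AgentState) -> dict:
    # Would perform the action on Monday and record it for a possible undo.
    return {"action_log": state["action_log"] + ["created item"],
            "messages": state["messages"] + ["action done"]}


def undo(state: AgentState) -> dict:
    # Reverts the most recent recorded action based on user feedback.
    return {"action_log": state["action_log"][:-1],
            "messages": state["messages"] + ["last action undone"]}


def answer_composer(state: AgentState) -> dict:
    # Would compose the final answer using tone of voice and past conversation.
    return {"messages": state["messages"] + ["final answer"]}


builder = StateGraph(AgentState)
for name, node in [("supervisor", supervisor), ("data_retrieval", data_retrieval),
                   ("board_actions", board_actions), ("undo", undo),
                   ("answer_composer", answer_composer)]:
    builder.add_node(name, node)

builder.add_edge(START, "supervisor")
builder.add_conditional_edges("supervisor", lambda s: s["next_agent"],
                              ["data_retrieval", "board_actions", "undo"])
for worker in ("data_retrieval", "board_actions", "undo"):
    builder.add_edge(worker, "answer_composer")
builder.add_edge("answer_composer", END)

graph = builder.compile()
```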
00:09:31.460 | And I want to share a bit of our lessons learned as we built this agent and what we're seeing.
00:09:38.380 | So, when you're building conversational agents, assume that you're not going to know how to handle 99% of user interactions.
00:09:50.260 | And it's proven statistically, right?
00:09:53.560 | When you think about the infinite amount of things users can ask, probably you've only handled 1%.
00:10:00.660 | And for this, we learned to start with a fallback.
00:10:06.160 | What happens in the 99% of interactions that we don't know how to handle?
00:10:10.880 | So, for example, what we did was, if we detect that the user is asking for some action that we don't know how to handle,
00:10:18.540 | we would search our knowledge base and give them an answer for how they can do it themselves.
00:10:23.200 | Just an example of one way of resolving fallback.
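As one hedged illustration of that fallback pattern (the function and action names here are hypothetical, and the knowledge-base search is passed in as a stub):

```python
# Hypothetical fallback sketch: when the requested action isn't supported,
# answer from the knowledge base with instructions for doing it manually.
SUPPORTED_ACTIONS = {"create_board", "create_item", "update_item"}


def handle_request(action: str, query: str, search_knowledge_base) -> str:
    if action in SUPPORTED_ACTIONS:
        return f"Running supported action: {action}"
    # Fallback for the ~99% of requests we don't explicitly handle:
    articles = search_knowledge_base(query)
    return ("I can't do that for you yet, but here's how to do it yourself:\n"
            + "\n".join(articles))


# Example usage with a stubbed knowledge-base search:
print(handle_request("archive_board", "archive a board",
                     lambda q: ["Help Center: How to archive a board"]))
```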
00:10:26.560 | Evals, we've talked so much today, so I'm not going to dive into it.
00:10:32.100 | But I think the bottom line with evals is that evals are your IP.
00:10:34.920 | Because models change.
00:10:36.980 | Technology is going to change so much over the next few years.
00:10:40.500 | But if you have very strong evaluation, that is your IP.
00:10:44.940 | That will allow you to move much faster than your competitors.
00:10:47.960 | Human in the loop.
00:10:50.840 | Critical.
00:10:52.120 | We've talked about this a lot at the beginning.
00:10:54.180 | I think, for those who have really shipped AI to production,
00:11:00.820 | I think you've seen that it's one thing to bring AI to 80%.
00:11:05.660 | But then it takes another year to get to 99%.
00:11:10.120 | And this is a very important rule, because we really felt confident when we were working locally.
00:11:17.580 | And once we shipped to production, we realized how far we are from an actual product.
00:11:22.560 | I can see some of the audience resonate with me on that one.
00:11:30.300 | All guardrails, we highly recommend that you build outside the LLM.
00:11:35.080 | Right?
00:11:36.520 | We've seen things like LLM as a judge.
00:11:38.280 | Even back to the Cursor idea.
00:11:40.660 | By the way, I think Cursor is such a great example of a way to build a good product experience.
00:11:46.020 | Because I don't know if you guys have noticed, especially with vibe coding, but after 25 runs, it stops.
00:11:50.520 | Right?
00:11:51.440 | This is an external guardrail they put in.
00:11:53.480 | No matter if it's actually running successfully.
00:11:55.940 | 25 runs and it stops.
00:11:57.940 | So just think about how you can create those guardrails outside the LLM.
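A minimal sketch of that kind of guardrail, enforced entirely in plain code outside the model (the names and the limit are illustrative, echoing the 25-run example):

```python
# Hypothetical external guardrail: cap agent iterations regardless of what the LLM "wants".
MAX_RUNS = 25


def run_agent_loop(step, is_done) -> int:
    """Run `step()` until `is_done()` is true or the hard cap is hit."""
    for run in range(1, MAX_RUNS + 1):
        step()
        if is_done():
            return run
    # The cap is enforced by ordinary code, not by the model judging itself.
    raise RuntimeError(f"stopped after {MAX_RUNS} runs by external guardrail")
```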
00:12:02.940 | And then lastly, and this is a very interesting one, is that it might be obvious that it's smart
00:12:11.420 | to break your agent into sub-agents.
00:12:14.940 | Obviously, when you have specialized agents they work better, but what we've learned is
00:12:20.240 | that there is a very important balance because when you have too many agents, what happens
00:12:27.180 | is what we like to call compound hallucination.
00:12:31.280 | So basically, it's a mathematical problem, right?
00:12:35.780 | 90% accuracy for the first agent, times 90% for the second, times a third, times a fourth; even if they're
00:12:41.940 | all at 90%, you're now at around 65%.
00:12:45.820 | And it's proven mathematically, right?
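To make the arithmetic concrete:

```python
# Compounding accuracy across sequential agents: 0.9 ** n
for n in range(1, 5):
    print(n, "agents:", round(0.9 ** n, 3))
# 1 agents: 0.9
# 2 agents: 0.81
# 3 agents: 0.729
# 4 agents: 0.656  -> roughly two thirds, even though each agent is 90% accurate
```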
00:12:49.720 | So I think there's a very important balance in how many agents you want in your multi-agent
00:12:56.300 | system, versus having too many or too few.
00:12:59.700 | And it's something that I think there is no rule of thumb, it's something you have to iterate
00:13:03.280 | on based on your use case.
00:13:07.420 | So let's talk about the future of work.
00:13:11.600 | And we believe that the future of work, as what we're working on at Monday, is all about
00:13:18.080 | orchestration.
00:13:20.200 | And I want to give you an example.
00:13:21.300 | So this is a real use case that we try to work on internally.
00:13:26.480 | We just had our earnings report, just a few days ago.
00:13:31.160 | And for those of you working in large public companies, or if you've been
00:13:36.440 | involved in these earnings reports, you probably know it's a tedious process.
00:13:41.220 | There is so much data, narrative, and information across the company that you have to gather.
00:13:46.720 | There's so many people involved.
00:13:48.520 | So we said, what if we automated this?
00:13:50.680 | What if we had a way to automate and create an entire workflow that would automatically create
00:13:56.320 | everything we need for earnings?
00:13:57.540 | That would be a dream, right?
00:13:59.980 | But there's one problem with this.
00:14:01.360 | And the problem is that it will only run once a quarter.
00:14:07.740 | We'd invest an entire month building an amazing workflow, and then it would run once.
00:14:14.300 | And the next time we run, AI is going to change dramatically, new models are going to come out,
00:14:19.160 | everything is going to change in the world, and then we're going to have to rebuild everything,
00:14:21.540 | right?
00:14:22.760 | So it got us thinking about how can we solve this.
00:14:27.200 | So I want you to imagine: what if there was a finite set of agents that could do an infinite
00:14:32.980 | amount of tasks?
00:14:34.980 | Now, the irony is that this is not some big dream.
00:14:38.740 | This is exactly how we work as humans, right?
00:14:40.920 | When you think about us, we each have our specialized skills, some are engineers, some are data analysts,
00:14:49.520 | and then every time there is a task at work, some of us do A and some of us do B. So there's
00:14:55.800 | no reason why this shouldn't work with agents and AI.
00:15:02.280 | So when we think about the future, we think about what you see here.
00:15:08.660 | Imagine that for the same given task I showed you earlier, we had a dynamic
00:15:14.020 | way to orchestrate and create a dynamic workflow with dynamic edges and dynamic rules,
00:15:22.300 | choosing very specific agents that are perfect for the task, running the task, and
00:15:29.860 | then dynamically dissolving.
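One hedged way to picture that idea in code (purely illustrative; the agent registry, the planner, and the selection logic are invented, and a real planner would use an LLM rather than a fixed list):

```python
# Hypothetical dynamic orchestration: a finite registry of specialized agents,
# a planner picks the ones needed for a task, they run, and the workflow dissolves.
AGENT_REGISTRY = {
    "financial_data": lambda task: f"gathered financials for {task}",
    "narrative_writer": lambda task: f"drafted narrative for {task}",
    "reviewer": lambda task: f"reviewed outputs for {task}",
}


def plan(task: str) -> list[str]:
    # Stand-in planner; a real one would choose agents dynamically per task.
    return ["financial_data", "narrative_writer", "reviewer"]


def run_dynamic_workflow(task: str) -> list[str]:
    selected = plan(task)                                         # choose agents suited to this task
    results = [AGENT_REGISTRY[name](task) for name in selected]   # run them
    return results                                                # nothing persists; the workflow "dissolves"


print(run_dynamic_workflow("Q1 earnings report"))
```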
00:15:34.780 | So this is super exciting and one of the things that we're working on with LangChain, and we really
00:15:39.280 | want to see this come to life in the future.
00:15:43.160 | So lastly, we're actually opening our marketplace of agents to all of you, and we'd love to see
00:15:49.560 | you join the waitlist and join us in building and trying to tackle those 1 billion tasks that
00:15:54.480 | we are trying to complete.
00:15:57.640 | So thank you very much, everyone.
00:15:58.800 | This was a pleasure.
00:15:59.800 | Thank you.