How to Build Planning Agents without losing control

00:00:00.000 | Yogi Dhani: Hi, everyone.

00:00:16.320 | I'm Yogi.

00:00:17.240 | I work at FactSet, a financial data and software company.

00:00:21.320 | And today I'll be sharing some of my experience

00:00:24.160 | while building Agent.

00:00:27.200 | In last few years, we have seen tremendous growth

00:00:29.560 | in AI.

00:00:30.920 | And especially in last couple of years,

00:00:34.360 | we are on exponential curve of intelligence growth.

00:00:38.700 | And yet, it feels like when we develop AI applications,

00:00:44.260 | driving a monster truck through a crowded mall

00:00:46.680 | with the tiny joysticks.

00:00:48.320 | So AI applications have not seen its ChatGPT moment yet.

00:00:53.720 | There are many reasons why agents don't behave.

00:00:57.020 | But probably one reason that strikes out

00:01:00.220 | is it misses the right context.

00:01:03.920 | And in case of enterprises, often it

00:01:06.160 | means that it does not have knowledge of enterprise-specific

00:01:10.540 | workflows.

00:01:13.100 | But before that, we will see some common context.

00:01:17.040 | And just like agents, humans also need a common context.

00:01:20.680 | So let's start with some key definitions.

00:01:24.920 | So as you know, LLMs are limited by their knowledge

00:01:28.380 | at the time of training.

00:01:29.720 | So we enhance their functionality by increasing it by tool.

00:01:36.360 | And when you combine this LLM with tool and memory,

00:01:39.700 | we call it augmented LLM.

00:01:41.860 | When you place this augmented LLM on a static and predefined path,

00:01:46.180 | we call it a workflow.

00:01:48.080 | And if these augmented LLMs have high autonomy and feedback loop,

00:01:53.340 | we call it as an agent.

00:01:56.880 | Now workflows are controllable and reliable,

00:02:00.460 | while agents have flexibility and they are highly autonomous.

00:02:04.680 | So the question is, can we get best of both worlds?

00:02:07.980 | So the answer is yes.

00:02:09.620 | With agentic workflows, we can plan and execute

00:02:12.980 | the workflows based on the goal, context, and feedback.

00:02:16.220 | I see these terms being used very loosely,

00:02:22.140 | and at times interchangeably.

00:02:24.800 | So I would like to make a key distinction

00:02:26.440 | between workflow agent and agentic workflow.

00:02:31.740 | Workflow agent is a predefined workflow run by agent,

00:02:37.360 | while agentic workflow is a workflow planned

00:02:40.860 | and run by an agent.

00:02:43.960 | I know these terms are quite confusing,

00:02:46.220 | and in AI we are very bad at naming things.

00:02:49.560 | So if you are confused, don't worry.

00:02:52.860 | In case of workflow agent, just remember

00:02:54.940 | that workflow is in control and workflow is static.

00:02:58.460 | In case of agentic workflow, agent is always in control,

00:03:02.780 | and the workflow is dynamic.

00:03:07.740 | It is also important to view these systems as agentic system,

00:03:11.680 | as Andrewing pointed out correctly.

00:03:15.040 | On agentic spectrum, agentic workflows

00:03:18.220 | have more agenticness than workflow agents,

00:03:21.460 | generally speaking.

00:03:22.260 | So why all of this matter?

00:03:28.920 | Apart from control, reliability, predictability,

00:03:33.100 | for enterprises, agentic workflows provide a way

00:03:37.340 | to automate the workflows at scale.

00:03:40.400 | And perhaps most important thing is enterprises

00:03:45.520 | can use their existing enterprises microservices

00:03:50.520 | to build on top of it.

00:03:52.680 | And in some cases, these enterprises

00:03:56.000 | have invested years, if not decades.

00:04:02.020 | So before diving deep, I would like

00:04:03.620 | to say that even though I'm speaking

00:04:05.120 | in terms of enterprise context here,

00:04:07.260 | the concepts are generally applicable.

00:04:08.960 | So where do we begin?

00:04:13.500 | In last few years, the focus really

00:04:15.600 | has been on the React-based agent.

00:04:18.680 | And in building agentic workflow,

00:04:20.640 | we need to move on from React-based agent

00:04:23.740 | to proactive agents.

00:04:25.540 | By the way, great philosophy for life as well.

00:04:30.740 | So for building agentic workflows,

00:04:32.340 | you need tools, memory, and reflection.

00:04:37.140 | But more importantly, you will need a design pattern

00:04:41.020 | called planning by sub-goal division,

00:04:46.020 | sometimes also referred as a task decomposition.

00:04:49.120 | And it is just a fancy way of saying that take your goal

00:04:52.860 | and break it down into simpler steps.

00:04:57.500 | So here are some specific agentic architecture

00:05:00.000 | and research papers that you will find useful.

00:05:02.840 | And each of that has its own pros and cons.

00:05:06.600 | And LangChain has done a fantastic job

00:05:08.780 | of creating a blog from this and also given the code.

00:05:14.520 | So I highly recommend checking it out.

00:05:18.020 | So how does it look in practice?

00:05:21.100 | So in fact, what we have done is we

00:05:23.900 | are taking this LLM compiler architecture

00:05:25.740 | and trying to adapt for our problems.

00:05:29.800 | And you can see some components here

00:05:32.220 | that you also find that in your organization.

00:05:35.280 | Microservices.

00:05:37.140 | And you build tools around those microservices.

00:05:39.960 | And when a user question asks, it goes to Blueprint Generator.

00:05:43.120 | And I will get to that in a bit.

00:05:45.320 | But consider it as a high level plan.

00:05:49.560 | What we call it is a Blueprint that gets fed to Planner.

00:05:53.820 | Planner is your low level task.

00:05:57.040 | Planner, it gives the plan to the executor.

00:06:00.460 | An executor is supposed to execute it.

00:06:03.540 | And Joiner combines the outputs from different tasks.

00:06:09.040 | Based on your replanning logic, either you do replanning again,

00:06:12.980 | or you just terminate and give the response back to the user.

00:06:17.960 | Sometimes you also set some recursion limits

00:06:20.080 | so that your agent just doesn't go into loop.

00:06:24.800 | On LangGraph, we are using each of these components as nodes.

00:06:28.400 | So Blueprint Generator, Planner, Executor, and Joiner

00:06:32.360 | are all nodes on the LangGraph.

00:06:36.980 | When building these tools in your enterprises

00:06:42.020 | around your microservices, probably this is where you will

00:06:45.020 | spend most of your time.

00:06:47.620 | And it's important to consider how this relation between tools

00:06:51.560 | and microservices goes.

00:06:53.700 | And here, the relationship is definitely not one-to-one or end-to-end.

00:06:57.860 | It's end-to-end.

00:06:59.140 | It's up to you how you want to design your tools according

00:07:02.900 | to your microservices so that your agent knows how to use this tool.

00:07:07.640 | Perhaps this is like the most key point here,

00:07:10.640 | that you need to make-- really put yourself into agent's shoes

00:07:14.580 | so that agent really understand what tool to use,

00:07:18.220 | and it has that knowledge of your microservices.

00:07:22.720 | Always follow standard.

00:07:23.720 | I know MCP is everyone's favorite.

00:07:25.320 | So build the MCP tool server for your tools.

00:07:28.580 | And for providing the tool details,

00:07:32.260 | just think from agent's point of view

00:07:33.820 | that you need to provide a tool purpose, description,

00:07:38.080 | and input/output contracts.

00:07:40.660 | So tool purpose will help you what tools to be selected.

00:07:45.240 | Detail description will tell you when

00:07:46.920 | this tool needs to be invoked.

00:07:49.140 | And input/output contracts will tell you

00:07:50.920 | how to use this tool.

00:07:52.940 | And lastly, add some validation checks,

00:07:55.580 | which acts as a break for your agent.

00:07:57.920 | Now, I would like to a little bit zoom in into this Blueprint

00:08:04.800 | because this is one of the key architecture chains that we made.

00:08:09.000 | Blueprint is just a series of steps for workflow

00:08:11.900 | as for tool capabilities in natural language.

00:08:15.300 | And it gets fed to Planner, but why we are doing it.

00:08:21.140 | What we realized was Planner really gets cognitively loaded when

00:08:28.400 | you try to just put too much onto it.

00:08:30.980 | So introducing a Blueprint, which is just a natural language

00:08:34.780 | of breaking down of a task, is very helpful.

00:08:38.840 | But we also noticed that it brings a lot of other benefits

00:08:41.920 | as well.

00:08:43.260 | For example, it achieves the finer control over task planning.

00:08:47.980 | It limits the in-context tool for the Planner.

00:08:50.820 | So when Blueprint, you can select what tools need

00:08:55.080 | to be given to the Planner.

00:08:56.640 | And sometimes this Planner has a lot of tool description,

00:09:01.520 | and you run all sort of problems as context window limit

00:09:05.300 | and Planner getting very much overloaded.

00:09:08.780 | So using Blueprint, you can limit what tools really

00:09:13.040 | goes to the Planner.

00:09:15.900 | And thus, it really helps in the planning.

00:09:21.660 | It also helps interpreting the agentic behavior.

00:09:24.920 | And lastly, when you need to collaborate

00:09:27.160 | with non-technical people, it's really helpful

00:09:30.980 | because natural language is less intimidating.

00:09:34.500 | Let's see a concrete example.

00:09:36.600 | So in financial research, preparing for a company's

00:09:40.920 | earning call is a common workflow.

00:09:43.920 | So this is a very, very simplified version

00:09:46.300 | of a workflow of preparing for a company's earning call.

00:09:50.640 | And for example, we are showing you preparing

00:09:52.340 | for NVIDIA's earning call.

00:09:55.480 | Now, you can see in the Blueprint, there is a tool

00:09:58.180 | and there is task.

00:09:59.580 | And in the plan, there is a tool and the function call.

00:10:03.080 | So how does it look in the Blueprint

00:10:05.620 | is you have two tools, and then your first step

00:10:09.360 | is summarizing the NVIDIA's previous earning call.

00:10:12.080 | And the next step is retrieval, gathering

00:10:14.140 | some of the financial data for NVIDIA.

00:10:17.300 | And then your reasoning, suggesting some questions

00:10:19.580 | for the earning call, and finally reporting

00:10:22.320 | a general data competency report from all the information.

00:10:26.520 | And there are corresponding function calls.

00:10:28.040 | And as you can see, context is being fed from a task.

00:10:31.460 | A concrete example of the response

00:10:37.340 | is before you implement agentic workflow,

00:10:39.740 | the response is pretty much vanilla.

00:10:41.960 | But after this, it can easily capture your workflow

00:10:45.960 | and give a very structured response.

00:10:49.000 | So whatever we talked about, none of this

00:10:51.440 | will really work without writing a proper evals.

00:10:54.860 | So always make sure to invest and build and maintain

00:10:58.880 | your eval framework.

00:11:00.800 | You should have at least component and end-to-end evals.

00:11:04.560 | You should really use the correct techniques,

00:11:07.440 | like code-based, LLMS-judge, human-in-the-loop.

00:11:10.040 | And more importantly, write evals for metrics

00:11:13.580 | that you really care for.

00:11:16.020 | Aspect-based eval is something we should really think about.

00:11:20.320 | And for example, for Blueprint, you

00:11:23.840 | can check an aspect like how many Blueprint,

00:11:26.620 | whether it resembles a golden Blueprint or not.

00:11:28.760 | And you can use LLMS-judge.

00:11:31.320 | If you want to see whether tools are selected correct or not,

00:11:34.700 | you should leverage code-based evals.

00:11:37.540 | If you want to check whether a plan is in line

00:11:39.720 | with the Blueprint or not, LLMS-judge, probably

00:11:42.740 | the right technique.

00:11:44.460 | And for some cases, leveraging human-in-the-loop

00:11:47.500 | is good, because report formatting,

00:11:50.160 | that's the best approach to deal with report formatting.

00:11:55.440 | So when not to use agentic workflows?

00:11:58.060 | So in some cases, definitely agentic workflow

00:12:00.100 | doesn't make sense.

00:12:01.080 | In case of fixed and repeated tasks,

00:12:03.120 | just probably go for ETL pipelines.

00:12:06.300 | If your workflow cannot be really captured,

00:12:08.700 | you cannot really capture use case-in workflows,

00:12:11.220 | agentic workflows are probably not worked.

00:12:13.740 | And if deterministic outcome is paramount,

00:12:16.620 | in case of strict compliance and a safety-critical context,

00:12:21.200 | you probably should not go with agentic workflow.

00:12:24.100 | And in case of low latency and cost-centered environment

00:12:27.380 | also, you should probably try to avoid agentic workflow.

00:12:32.820 | So wrapping up some learnings, start with simple Blueprints.

00:12:38.380 | Work your way up building a complex RAC system.

00:12:43.020 | For the Blueprints, use Blueprint to reduce the in-context tools

00:12:49.280 | and provide the high-level plan to the planner.

00:12:52.720 | Design tools from agent point of view.

00:12:55.880 | Always aim for the tool use of simplicity.

00:12:58.640 | Implement safety guardrails.

00:13:01.780 | And evals, observability, and all the good software engineering.

00:13:07.080 | And that should help you a lot.

00:13:11.200 | And from the whole presentation, the key takeaways are,

00:13:14.540 | agentic workflow is planned and run by agent.

00:13:18.080 | Agentic workflows bring the reliability at scale.

00:13:23.200 | And planning by sub-goal division is a key design pattern.

00:13:27.040 | Plan and execute is a key agentic architecture.

00:13:30.840 | And build your tools to complement your microservices.

00:13:35.940 | Always try to leverage your microservices in the tools.

00:13:40.240 | And modify your architecture to solve the problems.

00:13:43.700 | Don't really shy away from changing, taking research paper,

00:13:47.380 | and experimenting on it.

00:13:49.680 | And finally, treat your evals like first-class citizen.

00:13:54.580 | And with that, thank you very much for your time.

00:13:56.620 | All right.

00:14:03.200 | Thank you.

00:14:04.160 | Any questions?

00:14:05.200 | We have a little bit of time to spare.

00:14:09.040 | I have a question.

00:14:09.900 | Sure.

00:14:11.120 | Do you have, on top of your mind, any GitHub project

00:14:16.760 | or reference that we can follow?

00:14:19.800 | Sure.

00:14:20.300 | Sure.

00:14:20.800 | So if you just go back here, I kind

00:14:25.380 | of shared some of the links for the Langchain.

00:14:32.860 | It should have all the code for these research paper.

00:14:36.760 | And that's probably the most best place

00:14:39.780 | to start with this plan and execute kind of agents.

00:14:42.560 | Thank you.

00:14:43.260 | Yeah.

00:14:43.760 | Any other questions?

00:14:47.920 | Any other questions?

00:14:50.520 | All right.

00:14:52.040 | I guess one question I would have for you

00:14:53.540 | is when you talk about MCP and other forms

00:14:57.520 | of orchestration, what do you foresee

00:14:59.660 | being the primary method of orchestration going forward?

00:15:02.980 | Is it going to be a lane graph or some other--

00:15:07.500 | Yeah.

00:15:07.720 | I think the answer is probably everything.

00:15:10.780 | MCP, you use it so that you provide a standard across the arc.

00:15:15.140 | And MCP will really help for organization

00:15:17.920 | to build once, use it everywhere.

00:15:21.320 | You can have-- oftentimes, in organizations,

00:15:23.940 | we see that people just trying to just use this functionality

00:15:28.480 | in different AI apps.

00:15:29.640 | But if you can build an MCP around it, you can keep using it.

00:15:33.500 | And obviously, for orchestration, Langraph is great.

00:15:36.920 | And whatever the other tools that you

00:15:38.980 | find to solve your problem, that will be also--

00:15:42.020 | so the answer is probably there will be multiple things

00:15:44.520 | that is useful.

00:15:45.640 | It depends on your use case, what is the most optimal framework

00:15:49.400 | that you want to use.

00:15:50.420 | Amazing.

00:15:51.200 | Thank you so much, Yuri.

00:15:52.900 | you

00:15:54.960 | We'll be right back.

How to Build Planning Agents without losing control - Yogendra Miraje, Factset