back to index

How to Build Planning Agents without losing control - Yogendra Miraje, Factset


Whisper Transcript | Transcript Only Page

00:00:00.000 | Yogi Dhani: Hi, everyone.
00:00:16.320 | I'm Yogi.
00:00:17.240 | I work at FactSet, a financial data and software company.
00:00:21.320 | And today I'll be sharing some of my experience
00:00:24.160 | while building Agent.
00:00:27.200 | In last few years, we have seen tremendous growth
00:00:29.560 | in AI.
00:00:30.920 | And especially in last couple of years,
00:00:34.360 | we are on exponential curve of intelligence growth.
00:00:38.700 | And yet, it feels like when we develop AI applications,
00:00:44.260 | driving a monster truck through a crowded mall
00:00:46.680 | with the tiny joysticks.
00:00:48.320 | So AI applications have not seen its ChatGPT moment yet.
00:00:53.720 | There are many reasons why agents don't behave.
00:00:57.020 | But probably one reason that strikes out
00:01:00.220 | is it misses the right context.
00:01:03.920 | And in case of enterprises, often it
00:01:06.160 | means that it does not have knowledge of enterprise-specific
00:01:10.540 | workflows.
00:01:13.100 | But before that, we will see some common context.
00:01:17.040 | And just like agents, humans also need a common context.
00:01:20.680 | So let's start with some key definitions.
00:01:24.920 | So as you know, LLMs are limited by their knowledge
00:01:28.380 | at the time of training.
00:01:29.720 | So we enhance their functionality by increasing it by tool.
00:01:36.360 | And when you combine this LLM with tool and memory,
00:01:39.700 | we call it augmented LLM.
00:01:41.860 | When you place this augmented LLM on a static and predefined path,
00:01:46.180 | we call it a workflow.
00:01:48.080 | And if these augmented LLMs have high autonomy and feedback loop,
00:01:53.340 | we call it as an agent.
00:01:56.880 | Now workflows are controllable and reliable,
00:02:00.460 | while agents have flexibility and they are highly autonomous.
00:02:04.680 | So the question is, can we get best of both worlds?
00:02:07.980 | So the answer is yes.
00:02:09.620 | With agentic workflows, we can plan and execute
00:02:12.980 | the workflows based on the goal, context, and feedback.
00:02:16.220 | I see these terms being used very loosely,
00:02:22.140 | and at times interchangeably.
00:02:24.800 | So I would like to make a key distinction
00:02:26.440 | between workflow agent and agentic workflow.
00:02:31.740 | Workflow agent is a predefined workflow run by agent,
00:02:37.360 | while agentic workflow is a workflow planned
00:02:40.860 | and run by an agent.
00:02:43.960 | I know these terms are quite confusing,
00:02:46.220 | and in AI we are very bad at naming things.
00:02:49.560 | So if you are confused, don't worry.
00:02:52.860 | In case of workflow agent, just remember
00:02:54.940 | that workflow is in control and workflow is static.
00:02:58.460 | In case of agentic workflow, agent is always in control,
00:03:02.780 | and the workflow is dynamic.
00:03:07.740 | It is also important to view these systems as agentic system,
00:03:11.680 | as Andrewing pointed out correctly.
00:03:15.040 | On agentic spectrum, agentic workflows
00:03:18.220 | have more agenticness than workflow agents,
00:03:21.460 | generally speaking.
00:03:22.260 | So why all of this matter?
00:03:28.920 | Apart from control, reliability, predictability,
00:03:33.100 | for enterprises, agentic workflows provide a way
00:03:37.340 | to automate the workflows at scale.
00:03:40.400 | And perhaps most important thing is enterprises
00:03:45.520 | can use their existing enterprises microservices
00:03:50.520 | to build on top of it.
00:03:52.680 | And in some cases, these enterprises
00:03:56.000 | have invested years, if not decades.
00:04:02.020 | So before diving deep, I would like
00:04:03.620 | to say that even though I'm speaking
00:04:05.120 | in terms of enterprise context here,
00:04:07.260 | the concepts are generally applicable.
00:04:08.960 | So where do we begin?
00:04:13.500 | In last few years, the focus really
00:04:15.600 | has been on the React-based agent.
00:04:18.680 | And in building agentic workflow,
00:04:20.640 | we need to move on from React-based agent
00:04:23.740 | to proactive agents.
00:04:25.540 | By the way, great philosophy for life as well.
00:04:30.740 | So for building agentic workflows,
00:04:32.340 | you need tools, memory, and reflection.
00:04:37.140 | But more importantly, you will need a design pattern
00:04:41.020 | called planning by sub-goal division,
00:04:46.020 | sometimes also referred as a task decomposition.
00:04:49.120 | And it is just a fancy way of saying that take your goal
00:04:52.860 | and break it down into simpler steps.
00:04:57.500 | So here are some specific agentic architecture
00:05:00.000 | and research papers that you will find useful.
00:05:02.840 | And each of that has its own pros and cons.
00:05:06.600 | And LangChain has done a fantastic job
00:05:08.780 | of creating a blog from this and also given the code.
00:05:14.520 | So I highly recommend checking it out.
00:05:18.020 | So how does it look in practice?
00:05:21.100 | So in fact, what we have done is we
00:05:23.900 | are taking this LLM compiler architecture
00:05:25.740 | and trying to adapt for our problems.
00:05:29.800 | And you can see some components here
00:05:32.220 | that you also find that in your organization.
00:05:35.280 | Microservices.
00:05:37.140 | And you build tools around those microservices.
00:05:39.960 | And when a user question asks, it goes to Blueprint Generator.
00:05:43.120 | And I will get to that in a bit.
00:05:45.320 | But consider it as a high level plan.
00:05:49.560 | What we call it is a Blueprint that gets fed to Planner.
00:05:53.820 | Planner is your low level task.
00:05:57.040 | Planner, it gives the plan to the executor.
00:06:00.460 | An executor is supposed to execute it.
00:06:03.540 | And Joiner combines the outputs from different tasks.
00:06:09.040 | Based on your replanning logic, either you do replanning again,
00:06:12.980 | or you just terminate and give the response back to the user.
00:06:17.960 | Sometimes you also set some recursion limits
00:06:20.080 | so that your agent just doesn't go into loop.
00:06:24.800 | On LangGraph, we are using each of these components as nodes.
00:06:28.400 | So Blueprint Generator, Planner, Executor, and Joiner
00:06:32.360 | are all nodes on the LangGraph.
00:06:36.980 | When building these tools in your enterprises
00:06:42.020 | around your microservices, probably this is where you will
00:06:45.020 | spend most of your time.
00:06:47.620 | And it's important to consider how this relation between tools
00:06:51.560 | and microservices goes.
00:06:53.700 | And here, the relationship is definitely not one-to-one or end-to-end.
00:06:57.860 | It's end-to-end.
00:06:59.140 | It's up to you how you want to design your tools according
00:07:02.900 | to your microservices so that your agent knows how to use this tool.
00:07:07.640 | Perhaps this is like the most key point here,
00:07:10.640 | that you need to make-- really put yourself into agent's shoes
00:07:14.580 | so that agent really understand what tool to use,
00:07:18.220 | and it has that knowledge of your microservices.
00:07:22.720 | Always follow standard.
00:07:23.720 | I know MCP is everyone's favorite.
00:07:25.320 | So build the MCP tool server for your tools.
00:07:28.580 | And for providing the tool details,
00:07:32.260 | just think from agent's point of view
00:07:33.820 | that you need to provide a tool purpose, description,
00:07:38.080 | and input/output contracts.
00:07:40.660 | So tool purpose will help you what tools to be selected.
00:07:45.240 | Detail description will tell you when
00:07:46.920 | this tool needs to be invoked.
00:07:49.140 | And input/output contracts will tell you
00:07:50.920 | how to use this tool.
00:07:52.940 | And lastly, add some validation checks,
00:07:55.580 | which acts as a break for your agent.
00:07:57.920 | Now, I would like to a little bit zoom in into this Blueprint
00:08:04.800 | because this is one of the key architecture chains that we made.
00:08:09.000 | Blueprint is just a series of steps for workflow
00:08:11.900 | as for tool capabilities in natural language.
00:08:15.300 | And it gets fed to Planner, but why we are doing it.
00:08:21.140 | What we realized was Planner really gets cognitively loaded when
00:08:28.400 | you try to just put too much onto it.
00:08:30.980 | So introducing a Blueprint, which is just a natural language
00:08:34.780 | of breaking down of a task, is very helpful.
00:08:38.840 | But we also noticed that it brings a lot of other benefits
00:08:41.920 | as well.
00:08:43.260 | For example, it achieves the finer control over task planning.
00:08:47.980 | It limits the in-context tool for the Planner.
00:08:50.820 | So when Blueprint, you can select what tools need
00:08:55.080 | to be given to the Planner.
00:08:56.640 | And sometimes this Planner has a lot of tool description,
00:09:01.520 | and you run all sort of problems as context window limit
00:09:05.300 | and Planner getting very much overloaded.
00:09:08.780 | So using Blueprint, you can limit what tools really
00:09:13.040 | goes to the Planner.
00:09:15.900 | And thus, it really helps in the planning.
00:09:21.660 | It also helps interpreting the agentic behavior.
00:09:24.920 | And lastly, when you need to collaborate
00:09:27.160 | with non-technical people, it's really helpful
00:09:30.980 | because natural language is less intimidating.
00:09:34.500 | Let's see a concrete example.
00:09:36.600 | So in financial research, preparing for a company's
00:09:40.920 | earning call is a common workflow.
00:09:43.920 | So this is a very, very simplified version
00:09:46.300 | of a workflow of preparing for a company's earning call.
00:09:50.640 | And for example, we are showing you preparing
00:09:52.340 | for NVIDIA's earning call.
00:09:55.480 | Now, you can see in the Blueprint, there is a tool
00:09:58.180 | and there is task.
00:09:59.580 | And in the plan, there is a tool and the function call.
00:10:03.080 | So how does it look in the Blueprint
00:10:05.620 | is you have two tools, and then your first step
00:10:09.360 | is summarizing the NVIDIA's previous earning call.
00:10:12.080 | And the next step is retrieval, gathering
00:10:14.140 | some of the financial data for NVIDIA.
00:10:17.300 | And then your reasoning, suggesting some questions
00:10:19.580 | for the earning call, and finally reporting
00:10:22.320 | a general data competency report from all the information.
00:10:26.520 | And there are corresponding function calls.
00:10:28.040 | And as you can see, context is being fed from a task.
00:10:31.460 | A concrete example of the response
00:10:37.340 | is before you implement agentic workflow,
00:10:39.740 | the response is pretty much vanilla.
00:10:41.960 | But after this, it can easily capture your workflow
00:10:45.960 | and give a very structured response.
00:10:49.000 | So whatever we talked about, none of this
00:10:51.440 | will really work without writing a proper evals.
00:10:54.860 | So always make sure to invest and build and maintain
00:10:58.880 | your eval framework.
00:11:00.800 | You should have at least component and end-to-end evals.
00:11:04.560 | You should really use the correct techniques,
00:11:07.440 | like code-based, LLMS-judge, human-in-the-loop.
00:11:10.040 | And more importantly, write evals for metrics
00:11:13.580 | that you really care for.
00:11:16.020 | Aspect-based eval is something we should really think about.
00:11:20.320 | And for example, for Blueprint, you
00:11:23.840 | can check an aspect like how many Blueprint,
00:11:26.620 | whether it resembles a golden Blueprint or not.
00:11:28.760 | And you can use LLMS-judge.
00:11:31.320 | If you want to see whether tools are selected correct or not,
00:11:34.700 | you should leverage code-based evals.
00:11:37.540 | If you want to check whether a plan is in line
00:11:39.720 | with the Blueprint or not, LLMS-judge, probably
00:11:42.740 | the right technique.
00:11:44.460 | And for some cases, leveraging human-in-the-loop
00:11:47.500 | is good, because report formatting,
00:11:50.160 | that's the best approach to deal with report formatting.
00:11:55.440 | So when not to use agentic workflows?
00:11:58.060 | So in some cases, definitely agentic workflow
00:12:00.100 | doesn't make sense.
00:12:01.080 | In case of fixed and repeated tasks,
00:12:03.120 | just probably go for ETL pipelines.
00:12:06.300 | If your workflow cannot be really captured,
00:12:08.700 | you cannot really capture use case-in workflows,
00:12:11.220 | agentic workflows are probably not worked.
00:12:13.740 | And if deterministic outcome is paramount,
00:12:16.620 | in case of strict compliance and a safety-critical context,
00:12:21.200 | you probably should not go with agentic workflow.
00:12:24.100 | And in case of low latency and cost-centered environment
00:12:27.380 | also, you should probably try to avoid agentic workflow.
00:12:32.820 | So wrapping up some learnings, start with simple Blueprints.
00:12:38.380 | Work your way up building a complex RAC system.
00:12:43.020 | For the Blueprints, use Blueprint to reduce the in-context tools
00:12:49.280 | and provide the high-level plan to the planner.
00:12:52.720 | Design tools from agent point of view.
00:12:55.880 | Always aim for the tool use of simplicity.
00:12:58.640 | Implement safety guardrails.
00:13:01.780 | And evals, observability, and all the good software engineering.
00:13:07.080 | And that should help you a lot.
00:13:11.200 | And from the whole presentation, the key takeaways are,
00:13:14.540 | agentic workflow is planned and run by agent.
00:13:18.080 | Agentic workflows bring the reliability at scale.
00:13:23.200 | And planning by sub-goal division is a key design pattern.
00:13:27.040 | Plan and execute is a key agentic architecture.
00:13:30.840 | And build your tools to complement your microservices.
00:13:35.940 | Always try to leverage your microservices in the tools.
00:13:40.240 | And modify your architecture to solve the problems.
00:13:43.700 | Don't really shy away from changing, taking research paper,
00:13:47.380 | and experimenting on it.
00:13:49.680 | And finally, treat your evals like first-class citizen.
00:13:54.580 | And with that, thank you very much for your time.
00:13:56.620 | All right.
00:14:03.200 | Thank you.
00:14:04.160 | Any questions?
00:14:05.200 | We have a little bit of time to spare.
00:14:09.040 | I have a question.
00:14:09.900 | Sure.
00:14:11.120 | Do you have, on top of your mind, any GitHub project
00:14:16.760 | or reference that we can follow?
00:14:19.800 | Sure.
00:14:20.300 | Sure.
00:14:20.800 | So if you just go back here, I kind
00:14:25.380 | of shared some of the links for the Langchain.
00:14:32.860 | It should have all the code for these research paper.
00:14:36.760 | And that's probably the most best place
00:14:39.780 | to start with this plan and execute kind of agents.
00:14:42.560 | Thank you.
00:14:43.260 | Yeah.
00:14:43.760 | Any other questions?
00:14:47.920 | Any other questions?
00:14:50.520 | All right.
00:14:52.040 | I guess one question I would have for you
00:14:53.540 | is when you talk about MCP and other forms
00:14:57.520 | of orchestration, what do you foresee
00:14:59.660 | being the primary method of orchestration going forward?
00:15:02.980 | Is it going to be a lane graph or some other--
00:15:07.500 | Yeah.
00:15:07.720 | I think the answer is probably everything.
00:15:10.780 | MCP, you use it so that you provide a standard across the arc.
00:15:15.140 | And MCP will really help for organization
00:15:17.920 | to build once, use it everywhere.
00:15:21.320 | You can have-- oftentimes, in organizations,
00:15:23.940 | we see that people just trying to just use this functionality
00:15:28.480 | in different AI apps.
00:15:29.640 | But if you can build an MCP around it, you can keep using it.
00:15:33.500 | And obviously, for orchestration, Langraph is great.
00:15:36.920 | And whatever the other tools that you
00:15:38.980 | find to solve your problem, that will be also--
00:15:42.020 | so the answer is probably there will be multiple things
00:15:44.520 | that is useful.
00:15:45.640 | It depends on your use case, what is the most optimal framework
00:15:49.400 | that you want to use.
00:15:50.420 | Amazing.
00:15:51.200 | Thank you so much, Yuri.
00:15:54.960 | We'll be right back.