How to Build Planning Agents without losing control - Yogendra Miraje, Factset

00:00:17.240 |
I work at FactSet, a financial data and software company. 00:00:21.320 |
And today I'll be sharing some of my experience 00:00:27.200 |
In the last few years, we have seen tremendous growth 00:00:34.360 |
we are on an exponential curve of intelligence growth. 00:00:38.700 |
And yet, it feels like when we develop AI applications, 00:00:44.260 |
driving a monster truck through a crowded mall 00:00:48.320 |
So AI applications have not seen their ChatGPT moment yet. 00:00:53.720 |
There are many reasons why agents don't behave. 00:01:06.160 |
means that it does not have knowledge of enterprise-specific 00:01:13.100 |
But before that, we will see some common context. 00:01:17.040 |
And just like agents, humans also need a common context. 00:01:24.920 |
So as you know, LLMs are limited by their knowledge 00:01:29.720 |
So we enhance their functionality by augmenting them with tools. 00:01:36.360 |
And when you combine this LLM with tools and memory, 00:01:41.860 |
When you place this augmented LLM on a static and predefined path, 00:01:48.080 |
And if these augmented LLMs have high autonomy and a feedback loop, 00:02:00.460 |
while agents have flexibility and they are highly autonomous. 00:02:04.680 |
So the question is, can we get the best of both worlds? 00:02:09.620 |
With agentic workflows, we can plan and execute 00:02:12.980 |
the workflows based on the goal, context, and feedback. 00:02:31.740 |
A workflow agent is a predefined workflow run by an agent, 00:02:54.940 |
that the workflow is in control and the workflow is static. 00:02:58.460 |
In the case of an agentic workflow, the agent is always in control, 00:03:07.740 |
It is also important to view these systems as agentic systems, 00:03:28.920 |
Apart from control, reliability, predictability, 00:03:33.100 |
for enterprises, agentic workflows provide a way 00:03:40.400 |
And perhaps the most important thing is that enterprises 00:03:45.520 |
can use their existing enterprise microservices 00:04:25.540 |
By the way, great philosophy for life as well. 00:04:37.140 |
But more importantly, you will need a design pattern 00:04:46.020 |
sometimes also referred to as task decomposition. 00:04:49.120 |
And it is just a fancy way of saying: take your goal 00:04:57.500 |
So here are some specific agentic architectures 00:05:00.000 |
and research papers that you will find useful. 00:05:08.780 |
of creating a blog from this and also given the code. 00:05:32.220 |
that you also find that in your organization. 00:05:37.140 |
And you build tools around those microservices. 00:05:39.960 |
And when a user asks a question, it goes to the Blueprint Generator. 00:05:49.560 |
We call it a Blueprint, and it gets fed to the Planner. 00:06:03.540 |
And Joiner combines the outputs from different tasks. 00:06:09.040 |
Based on your replanning logic, either you do replanning again, 00:06:12.980 |
or you just terminate and give the response back to the user. 00:06:20.080 |
so that your agent doesn't go into a loop. 00:06:24.800 |
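The flow described above (Blueprint Generator, Planner, Executor, Joiner, then a replanning decision with a cap so the agent cannot loop forever) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: the function name `run_agentic_workflow`, the `RunState` fields, the `needs_replan` callback, and the `max_replans` default are all hypothetical, and each component is passed in as an ordinary callable rather than an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunState:
    """Hypothetical state carried between components."""
    question: str
    blueprint: list[str] = field(default_factory=list)
    plan: list[str] = field(default_factory=list)
    results: list[str] = field(default_factory=list)
    replans: int = 0


def run_agentic_workflow(
    question: str,
    blueprint_generator: Callable[[str], list[str]],
    planner: Callable[[list[str], list[str]], list[str]],
    executor: Callable[[str], str],
    joiner: Callable[[list[str]], str],
    needs_replan: Callable[[str], bool],
    max_replans: int = 2,  # assumed cap; tune for your use case
) -> tuple[str, RunState]:
    state = RunState(question=question)
    # The blueprint is generated once from the user question.
    state.blueprint = blueprint_generator(question)
    while True:
        # The planner sees the blueprint plus results of the previous round.
        state.plan = planner(state.blueprint, state.results)
        state.results = [executor(task) for task in state.plan]
        answer = joiner(state.results)
        # Terminate when the answer is acceptable OR the replan budget is spent.
        if not needs_replan(answer) or state.replans >= max_replans:
            return answer, state
        state.replans += 1
```

The hard cap on `replans` is the part that keeps the agent from going into a loop: even if the replanning check never passes, the run ends after `max_replans + 1` planning rounds.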
In LangGraph, we are using each of these components as a node. 00:06:28.400 |
So Blueprint Generator, Planner, Executor, and Joiner 00:06:36.980 |
When building these tools in your enterprises 00:06:42.020 |
around your microservices, probably this is where you will 00:06:47.620 |
And it's important to consider how this relation between tools 00:06:53.700 |
And here, the relationship is definitely not one-to-one or N-to-N. 00:06:59.140 |
It's up to you how you want to design your tools according 00:07:02.900 |
to your microservices so that your agent knows how to use this tool. 00:07:07.640 |
Perhaps this is the key point here, 00:07:10.640 |
that you need to really put yourself in the agent's shoes 00:07:14.580 |
so that the agent really understands what tool to use, 00:07:18.220 |
and it has that knowledge of your microservices. 00:07:33.820 |
that you need to provide a tool purpose, description, 00:07:40.660 |
So the tool purpose helps determine which tools get selected. 00:07:57.920 |
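One way to keep purpose and description separate is a small tool record. The sketch below is an assumption about how that might look: `ToolSpec`, `selection_catalog`, and the two example tools are hypothetical names invented for illustration, and only the two fields named in the talk (purpose and description) are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    purpose: str      # one line; used when deciding WHICH tool to pick
    description: str  # longer notes; used when deciding HOW to call it


def selection_catalog(tools: list[ToolSpec]) -> str:
    """Render only names and purposes, e.g. for a tool-selection prompt."""
    return "\n".join(f"- {t.name}: {t.purpose}" for t in tools)


# Hypothetical tools wrapping existing microservices.
TOOLS = [
    ToolSpec(
        name="transcript_summarizer",
        purpose="Summarize an earnings call transcript.",
        description="Takes a company and a fiscal period; returns a summary.",
    ),
    ToolSpec(
        name="reasoning",
        purpose="Reason over earlier outputs and suggest questions.",
        description="Takes free text plus context from earlier tasks.",
    ),
]

print(selection_catalog(TOOLS))
```

Keeping the purpose short means the selection step reads a compact list, while the longer description is only needed once a tool has been chosen.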
Now, I would like to zoom in a little on this Blueprint 00:08:04.800 |
because this is one of the key architecture changes that we made. 00:08:09.000 |
A Blueprint is just a series of steps for the workflow 00:08:11.900 |
as per tool capabilities, in natural language. 00:08:15.300 |
And it gets fed to the Planner. But why are we doing this? 00:08:21.140 |
What we realized was that the Planner really gets cognitively loaded when 00:08:30.980 |
So introducing a Blueprint, which is just a natural language 00:08:38.840 |
But we also noticed that it brings a lot of other benefits 00:08:43.260 |
For example, it achieves finer control over task planning. 00:08:47.980 |
It limits the in-context tools for the Planner. 00:08:50.820 |
So with a Blueprint, you can select what tools need 00:08:56.640 |
And sometimes this Planner has a lot of tool descriptions, 00:09:01.520 |
and you run into all sorts of problems, such as the context window limit 00:09:08.780 |
So using a Blueprint, you can limit what tools really 00:09:21.660 |
It also helps with interpreting the agentic behavior. 00:09:27.160 |
with non-technical people, it's really helpful 00:09:30.980 |
because natural language is less intimidating. 00:09:36.600 |
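To make the "limit the in-context tools" benefit concrete, here is a minimal sketch, assuming each Blueprint step names the tool it needs. `BlueprintStep`, `tools_in_context`, and the registry contents are hypothetical; the talk does not show the actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlueprintStep:
    text: str  # the step in natural language
    tool: str  # name of the tool this step relies on


def tools_in_context(
    blueprint: list[BlueprintStep], registry: dict[str, str]
) -> dict[str, str]:
    """Return descriptions only for tools the blueprint mentions.

    Order follows first use in the blueprint; duplicates collapse.
    """
    selected: dict[str, str] = {}
    for step in blueprint:
        if step.tool not in registry:
            raise KeyError(f"blueprint references unknown tool: {step.tool}")
        selected[step.tool] = registry[step.tool]
    return selected


# Hypothetical registry: tool name -> full description for the planner prompt.
REGISTRY = {
    "transcript_summarizer": "Summarizes an earnings call transcript.",
    "reasoning": "Reasons over earlier outputs.",
    "price_history": "Returns historical prices for a ticker.",
}

BLUEPRINT = [
    BlueprintStep("Summarize the previous earnings call.", "transcript_summarizer"),
    BlueprintStep("Suggest questions based on the summary.", "reasoning"),
]

planner_tools = tools_in_context(BLUEPRINT, REGISTRY)
```

Here the planner prompt would carry two tool descriptions instead of three; with a large registry the same filter is what keeps the prompt within the context window.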
So in financial research, preparing for a company's 00:09:46.300 |
of a workflow of preparing for a company's earnings call. 00:09:50.640 |
And for example, we are showing you preparing 00:09:55.480 |
Now, you can see in the Blueprint, there is a tool 00:09:59.580 |
And in the plan, there is a tool and the function call. 00:10:05.620 |
is you have two tools, and then your first step 00:10:09.360 |
is summarizing NVIDIA's previous earnings call. 00:10:17.300 |
And then your reasoning, suggesting some questions 00:10:22.320 |
a general data competency report from all the information. 00:10:28.040 |
And as you can see, context is being fed from a task. 00:10:41.960 |
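The idea that a later task receives context from an earlier one can be sketched as a small dependency-ordered executor. This is an illustration only: the `Task` fields, `execute_plan`, and the stand-in tool functions are hypothetical, and real tools would call microservices or an LLM instead of formatting strings.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable


@dataclass(frozen=True)
class Task:
    id: str
    tool: str
    instruction: str
    context_from: tuple[str, ...] = ()  # ids of tasks whose output is needed


ToolFn = Callable[[str, list[str]], str]


def execute_plan(tasks: list[Task], tools: dict[str, ToolFn]) -> dict[str, str]:
    """Run tasks so that every task runs after the tasks it depends on."""
    by_id = {t.id: t for t in tasks}
    # Map each task to its predecessors; static_order yields predecessors first.
    order = TopologicalSorter(
        {t.id: set(t.context_from) for t in tasks}
    ).static_order()
    outputs: dict[str, str] = {}
    for task_id in order:
        task = by_id[task_id]
        context = [outputs[dep] for dep in task.context_from]
        outputs[task_id] = tools[task.tool](task.instruction, context)
    return outputs


# Stand-in tools, purely for illustration.
TOOLS = {
    "transcript_summarizer": lambda instruction, context: f"summary({instruction})",
    "reasoning": lambda instruction, context: f"{instruction} <- {', '.join(context)}",
}

PLAN = [
    Task("t2", "reasoning", "Suggest questions", context_from=("t1",)),
    Task("t1", "transcript_summarizer", "Summarize previous call"),
]

outputs = execute_plan(PLAN, TOOLS)
```

Tasks with no shared dependencies could in principle run in parallel; this sketch keeps them sequential for clarity.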
But after this, it can easily capture your workflow 00:10:51.440 |
will really work without writing proper evals. 00:10:54.860 |
So always make sure to invest and build and maintain 00:11:00.800 |
You should have at least component and end-to-end evals. 00:11:04.560 |
You should really use the correct techniques, 00:11:07.440 |
like code-based, LLM-as-judge, human-in-the-loop. 00:11:10.040 |
And more importantly, write evals for metrics 00:11:16.020 |
Aspect-based eval is something we should really think about. 00:11:26.620 |
whether it resembles a golden Blueprint or not. 00:11:31.320 |
If you want to see whether tools are selected correctly or not, 00:11:37.540 |
If you want to check whether a plan is in line 00:11:39.720 |
with the Blueprint or not, LLM-as-judge, probably 00:11:44.460 |
And for some cases, leveraging human-in-the-loop 00:11:50.160 |
that's the best approach to deal with report formatting. 00:11:58.060 |
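For an aspect such as tool selection, a code-based eval can be as small as a set comparison. The sketch below assumes that a code-based check suits this aspect and that a golden list of expected tools exists per test case; `tool_selection_scores` and the tool names are hypothetical and this is not the evaluation code used in the talk.

```python
def tool_selection_scores(expected: list[str], selected: list[str]) -> dict[str, float]:
    """Compare the tools an agent selected against a golden list.

    precision: share of selected tools that were expected.
    recall:    share of expected tools that were selected.
    Either score is 0.0 when its denominator would be empty.
    """
    expected_set, selected_set = set(expected), set(selected)
    hits = len(expected_set & selected_set)
    precision = hits / len(selected_set) if selected_set else 0.0
    recall = hits / len(expected_set) if expected_set else 0.0
    return {"precision": precision, "recall": recall}


scores = tool_selection_scores(
    expected=["transcript_summarizer", "reasoning"],
    selected=["transcript_summarizer", "price_history"],
)
```

Aspects that are harder to pin down in code, such as whether a plan follows the Blueprint or whether a report is formatted well, are where LLM-as-judge or human review fit better.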
So in some cases, definitely agentic workflow 00:12:08.700 |
you cannot really capture the use case in workflows, 00:12:16.620 |
in the case of strict compliance or a safety-critical context, 00:12:21.200 |
you probably should not go with an agentic workflow. 00:12:24.100 |
And in low-latency and cost-sensitive environments 00:12:27.380 |
also, you should probably try to avoid agentic workflows. 00:12:32.820 |
So wrapping up some learnings, start with simple Blueprints. 00:12:38.380 |
Work your way up building a complex RAC system. 00:12:43.020 |
For the Blueprints, use the Blueprint to reduce the in-context tools 00:12:49.280 |
and provide the high-level plan to the Planner. 00:13:01.780 |
And evals, observability, and all the good software engineering. 00:13:11.200 |
And from the whole presentation, the key takeaways are: 00:13:14.540 |
an agentic workflow is planned and run by an agent. 00:13:18.080 |
Agentic workflows bring reliability at scale. 00:13:23.200 |
And planning by sub-goal division is a key design pattern. 00:13:27.040 |
Plan-and-execute is a key agentic architecture. 00:13:30.840 |
And build your tools to complement your microservices. 00:13:35.940 |
Always try to leverage your microservices in the tools. 00:13:40.240 |
And modify your architecture to solve the problems. 00:13:43.700 |
Don't shy away from changing things, taking a research paper, 00:13:49.680 |
And finally, treat your evals like first-class citizens. 00:13:54.580 |
And with that, thank you very much for your time. 00:14:11.120 |
Do you have, off the top of your head, any GitHub project 00:14:25.380 |
of shared some of the links for LangChain. 00:14:32.860 |
It should have all the code for these research papers. 00:14:39.780 |
to start with this plan-and-execute kind of agents. 00:14:59.660 |
being the primary method of orchestration going forward? 00:15:02.980 |
Is it going to be LangGraph or some other-- 00:15:10.780 |
MCP, you use it so that you provide a standard across the org. 00:15:23.940 |
we see people just trying to use this functionality 00:15:29.640 |
But if you can build an MCP around it, you can keep using it. 00:15:33.500 |
And obviously, for orchestration, LangGraph is great. 00:15:38.980 |
find to solve your problem, that will be also-- 00:15:42.020 |
so the answer is probably there will be multiple things 00:15:45.640 |
It depends on your use case, what is the most optimal framework