How to Build Planning Agents without losing control

Yogi Dhani: Hi, everyone. I'm Yogi. I work at FactSet, a financial data and software company. Today I'll be sharing some of my experience building agents. In the last few years, we have seen tremendous growth in AI. And especially in the last couple of years, we have been on an exponential curve of intelligence growth.

And yet, developing AI applications feels like driving a monster truck through a crowded mall with a tiny joystick. So AI applications have not seen their ChatGPT moment yet. There are many reasons why agents don't behave. But probably the one reason that stands out is that they lack the right context.

And in the case of enterprises, that often means the agent has no knowledge of enterprise-specific workflows. But before that, we will establish some common context, because just like agents, humans also need a common context. So let's start with some key definitions. As you know, LLMs are limited by their knowledge at the time of training.

So we enhance their functionality by augmenting them with tools. When you combine an LLM with tools and memory, we call it an augmented LLM. When you place this augmented LLM on a static, predefined path, we call it a workflow. And if these augmented LLMs have high autonomy and a feedback loop, we call it an agent.

Now, workflows are controllable and reliable, while agents are flexible and highly autonomous. So the question is, can we get the best of both worlds? The answer is yes. With agentic workflows, we can plan and execute workflows based on the goal, context, and feedback. I see these terms being used very loosely, and at times interchangeably.

So I would like to make a key distinction between a workflow agent and an agentic workflow. A workflow agent is a predefined workflow run by an agent, while an agentic workflow is a workflow planned and run by an agent. I know these terms are quite confusing, and in AI we are very bad at naming things.

So if you are confused, don't worry. In the case of a workflow agent, just remember that the workflow is in control and the workflow is static. In the case of an agentic workflow, the agent is always in control, and the workflow is dynamic. It is also important to view these systems as agentic systems, as Andrew Ng correctly pointed out.
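To make the distinction concrete, here is a minimal sketch in Python. It is not FactSet's implementation: the tool names, `toy_planner`, and the string outputs are all hypothetical stand-ins, and the planner is a plain function where a real system would call an LLM.

```python
from typing import Callable, Dict, List

# Hypothetical tools: plain functions standing in for real services.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda q: f"search results for {q}",
    "summarize": lambda text: f"summary of [{text}]",
    "report": lambda text: f"report based on [{text}]",
}


def run_steps(steps: List[str], goal: str) -> str:
    """Run tool names in order, feeding each output into the next tool."""
    data = goal
    for name in steps:
        data = TOOLS[name](data)
    return data


# Workflow agent: the path is fixed ahead of time, so the workflow is in control.
STATIC_WORKFLOW = ["search", "summarize"]


def workflow_agent(goal: str) -> str:
    return run_steps(STATIC_WORKFLOW, goal)


def toy_planner(goal: str) -> List[str]:
    """Stand-in for an LLM planner: chooses the steps from the goal itself."""
    steps = ["search", "summarize"]
    if "report" in goal.lower():
        steps.append("report")
    return steps


# Agentic workflow: the agent plans the path per goal, so the workflow is dynamic.
def agentic_workflow(goal: str) -> str:
    return run_steps(toy_planner(goal), goal)
```

The only difference between the two functions is where the list of steps comes from: a constant in the first case, a planning call in the second.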

On the agentic spectrum, agentic workflows have more agenticness than workflow agents, generally speaking. So why does all of this matter? Apart from control, reliability, and predictability, agentic workflows give enterprises a way to automate workflows at scale. And perhaps the most important thing is that enterprises can build on top of their existing enterprise microservices.

And in some cases, these enterprises have invested years, if not decades, in them. Before diving deep, I would like to say that even though I'm speaking in an enterprise context here, the concepts are generally applicable. So where do we begin? In the last few years, the focus really has been on ReAct-based agents.

And in building agentic workflows, we need to move on from ReAct-based agents to proactive agents. By the way, a great philosophy for life as well. So for building agentic workflows, you need tools, memory, and reflection. But more importantly, you will need a design pattern called planning by sub-goal division, sometimes also referred to as task decomposition.

And it is just a fancy way of saying: take your goal and break it down into simpler steps. Here are some specific agentic architectures and research papers that you will find useful. Each of them has its own pros and cons. And LangChain has done a fantastic job of writing a blog post about these and also providing the code.

So I highly recommend checking it out. So how does it look in practice? What we have done is take the LLMCompiler architecture and adapt it to our problems. And you can see some components here that you will also find in your organization.

Microservices, for example, and you build tools around those microservices. When a user asks a question, it goes to the Blueprint Generator. I will get to that in a bit, but consider its output a high-level plan. What we call a Blueprint gets fed to the Planner. The Planner is your low-level task planner.

The Planner gives the plan to the Executor, and the Executor is supposed to execute it. The Joiner combines the outputs from the different tasks. Based on your replanning logic, you either replan or you terminate and give the response back to the user. You also set a recursion limit so that your agent doesn't go into a loop.
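The control loop described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function signatures, the `(tool name, argument)` task format, and every `toy_` component are hypothetical, and a real Blueprint Generator and Planner would be LLM calls rather than simple functions.

```python
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, str]  # (tool name, argument); a deliberately tiny task format


def run_agentic_workflow(
    question: str,
    blueprint_generator: Callable[[str], List[str]],
    planner: Callable[[List[str], List[str]], List[Task]],
    tools: Dict[str, Callable[[str], str]],
    should_replan: Callable[[List[str]], bool],
    recursion_limit: int = 3,
) -> str:
    blueprint = blueprint_generator(question)  # high-level plan in natural language
    results: List[str] = []
    for _ in range(recursion_limit):
        plan = planner(blueprint, results)  # low-level tasks for this round
        outputs = [tools[name](arg) for name, arg in plan]  # executor
        results.extend(outputs)  # joiner: collect the task outputs
        if not should_replan(results):
            return " | ".join(results)  # joiner: combine into one response
    # The recursion limit keeps the agent from looping forever.
    raise RuntimeError("recursion limit reached without a final answer")


# Toy stand-ins so the loop can run without any LLM.
def toy_blueprint_generator(question: str) -> List[str]:
    return [f"summarize background for: {question}", f"gather data for: {question}"]


def toy_planner(blueprint: List[str], results: List[str]) -> List[Task]:
    # Plan only the next unfinished Blueprint step.
    return [("echo", step) for step in blueprint[len(results):len(results) + 1]]


TOY_TOOLS: Dict[str, Callable[[str], str]] = {"echo": lambda arg: f"done({arg})"}


def toy_should_replan(results: List[str]) -> bool:
    return len(results) < 2  # replan until both Blueprint steps have run


answer = run_agentic_workflow(
    "NVIDIA earnings call",
    toy_blueprint_generator,
    toy_planner,
    TOY_TOOLS,
    toy_should_replan,
)
```

With the toy components, the loop runs two rounds and then terminates. Lowering `recursion_limit` to 1 makes it raise instead, which is the behavior you want from a guard against runaway replanning.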

In LangGraph, we implement each of these components as a node. So the Blueprint Generator, Planner, Executor, and Joiner are all nodes in the graph. Building the tools around your enterprise microservices is probably where you will spend most of your time. And it's important to consider how the tools relate to the microservices.

And here, the relationship is definitely not one-to-one; it's N-to-N. It's up to you how you design your tools around your microservices so that your agent knows how to use them. Perhaps this is the key point here: you need to really put yourself in the agent's shoes, so that the agent understands which tool to use and has that knowledge of your microservices.

Always follow standards. I know MCP is everyone's favorite, so build an MCP tool server for your tools. And when providing the tool details, think from the agent's point of view: you need to provide a tool purpose, a description, and input/output contracts. The tool purpose helps the agent decide which tools to select.

The detailed description tells it when the tool needs to be invoked. And the input/output contracts tell it how to use the tool. Lastly, add some validation checks, which act as a brake for your agent. Now, I would like to zoom in a little on the Blueprint, because this is one of the key architecture changes that we made.
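The tool details listed above (purpose, description, input/output contracts, validation checks) could be captured in a small structure like the following. This is a sketch, not an MCP server and not the speaker's code: `ToolSpec`, `get_revenue`, and the placeholder return values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolSpec:
    name: str
    purpose: str                     # helps the agent decide WHICH tool to select
    description: str                 # tells the agent WHEN to invoke the tool
    input_schema: Dict[str, type]    # input contract: HOW to call the tool
    output_type: type                # output contract: what comes back
    func: Callable[..., Any]

    def call(self, **kwargs: Any) -> Any:
        # Validation checks act as a brake before and after the real call.
        missing = set(self.input_schema) - set(kwargs)
        extra = set(kwargs) - set(self.input_schema)
        if missing or extra:
            raise ValueError(
                f"{self.name}: missing={sorted(missing)} unexpected={sorted(extra)}"
            )
        for key, expected in self.input_schema.items():
            if not isinstance(kwargs[key], expected):
                raise TypeError(f"{self.name}: '{key}' must be {expected.__name__}")
        result = self.func(**kwargs)
        if not isinstance(result, self.output_type):
            raise TypeError(f"{self.name}: output must be {self.output_type.__name__}")
        return result


def get_revenue(ticker: str, quarters: int) -> List[str]:
    """Hypothetical wrapper around a microservice; returns placeholder strings."""
    return [f"{ticker} revenue, {i} quarter(s) back" for i in range(1, quarters + 1)]


revenue_tool = ToolSpec(
    name="get_revenue",
    purpose="Retrieve historical quarterly revenue for a company.",
    description="Use when a task needs revenue figures for a known ticker.",
    input_schema={"ticker": str, "quarters": int},
    output_type=list,
    func=get_revenue,
)
```

A call such as `revenue_tool.call(ticker="NVDA", quarters=2)` goes through, while a call with a string for `quarters` or a missing `ticker` is rejected before the underlying service is ever reached.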

A Blueprint is just a series of workflow steps, written in natural language in terms of tool capabilities. It gets fed to the Planner. But why are we doing this? What we realized was that the Planner gets cognitively overloaded when you put too much onto it. So introducing a Blueprint, which is just a natural-language breakdown of the task, is very helpful.

But we also noticed that it brings a lot of other benefits. For example, it gives finer control over task planning. It limits the in-context tools for the Planner: with a Blueprint, you can select which tools are given to the Planner. Sometimes the Planner has a lot of tool descriptions, and you run into all sorts of problems, such as context window limits and the Planner getting overloaded.
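One way to picture how a Blueprint limits the in-context tools is the sketch below. The registry, the tool names, and the prompt layout are hypothetical; the Blueprint steps loosely mirror the earnings-call example from the talk.

```python
from typing import Dict, List

# Hypothetical registry: every tool the organization exposes, with its description.
TOOL_REGISTRY: Dict[str, str] = {
    "transcript_summarizer": "Summarizes an earnings call transcript.",
    "financial_data": "Retrieves financial data for a company.",
    "question_suggester": "Suggests analyst questions from prior context.",
    "report_writer": "Writes a structured report from collected context.",
    "news_search": "Searches recent news articles.",
    "chart_renderer": "Renders a chart from a data series.",
}

# Hypothetical Blueprint: each step pairs a tool with a natural-language task.
BLUEPRINT: List[Dict[str, str]] = [
    {"tool": "transcript_summarizer", "task": "Summarize NVIDIA's previous earnings call."},
    {"tool": "financial_data", "task": "Gather financial data for NVIDIA."},
    {"tool": "question_suggester", "task": "Suggest questions for the earnings call."},
    {"tool": "report_writer", "task": "Generate a report from all the information."},
]


def select_tools(blueprint: List[Dict[str, str]], registry: Dict[str, str]) -> Dict[str, str]:
    """Keep only the tools the Blueprint mentions; fail loudly on unknown names."""
    selected: Dict[str, str] = {}
    for step in blueprint:
        name = step["tool"]
        if name not in registry:
            raise KeyError(f"Blueprint references unknown tool: {name}")
        selected[name] = registry[name]
    return selected


def build_planner_prompt(question: str, blueprint: List[Dict[str, str]],
                         registry: Dict[str, str]) -> str:
    """Assemble a Planner prompt that contains only the Blueprint's tools."""
    tools = select_tools(blueprint, registry)
    tool_lines = [f"- {name}: {desc}" for name, desc in tools.items()]
    step_lines = [f"{i}. [{s['tool']}] {s['task']}" for i, s in enumerate(blueprint, 1)]
    return "\n".join(
        [f"Question: {question}", "Available tools:", *tool_lines, "Blueprint:", *step_lines]
    )


prompt = build_planner_prompt("Prepare for NVIDIA's earnings call", BLUEPRINT, TOOL_REGISTRY)
```

Here the Planner sees four tool descriptions instead of six. With a registry of hundreds of tools, the same filter is what keeps the prompt inside the context window.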

So using a Blueprint, you can limit which tools go to the Planner, and that really helps the planning. It also helps in interpreting the agentic behavior. And lastly, when you need to collaborate with non-technical people, it's really helpful because natural language is less intimidating. Let's see a concrete example.

So in financial research, preparing for a company's earnings call is a common workflow. This is a very, very simplified version of that workflow. As an example, we are showing you preparing for NVIDIA's earnings call. Now, you can see that in the Blueprint, there is a tool and there is a task.

And in the plan, there is a tool and a function call. So in the Blueprint, you have two tools, and your first step is summarizing NVIDIA's previous earnings call. The next step is retrieval: gathering some financial data for NVIDIA.

Then comes reasoning: suggesting some questions for the earnings call. And finally reporting: generating a comprehensive report from all the information. And there are corresponding function calls. As you can see, context is being fed from one task into another. As a concrete example of the response: before you implement the agentic workflow, the response is pretty much vanilla.

But after this, it can easily capture your workflow and give a very structured response. Now, none of what we talked about will really work without writing proper evals. So always make sure to invest in building and maintaining your eval framework. You should have at least component-level and end-to-end evals.

You should use the right techniques: code-based, LLM-as-judge, human-in-the-loop. And more importantly, write evals for the metrics that you really care about. Aspect-based evals are something we should really think about. For example, for the Blueprint, you can check an aspect such as whether it resembles a golden Blueprint or not.

And for that you can use LLM-as-judge. If you want to see whether the tools are selected correctly, you should leverage code-based evals. If you want to check whether a plan is in line with the Blueprint, LLM-as-judge is probably the right technique. And in some cases, such as report formatting, human-in-the-loop is the best approach.
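A code-based eval for tool selection can be as small as a set comparison against a golden list. The sketch below assumes you keep such a golden list per test question; the function name, the metric choice, and the example tool names are illustrative, not something the talk prescribes.

```python
from typing import Dict, Iterable


def tool_selection_scores(selected: Iterable[str], golden: Iterable[str]) -> Dict[str, float]:
    """Compare the tools an agent selected against a golden set for one test case."""
    selected_set, golden_set = set(selected), set(golden)
    hits = len(selected_set & golden_set)
    # Precision: share of selected tools that were expected.
    precision = hits / len(selected_set) if selected_set else 0.0
    # Recall: share of expected tools that were actually selected.
    recall = hits / len(golden_set) if golden_set else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "exact_match": float(selected_set == golden_set),
    }


# Hypothetical test case: the agent picked one tool that the golden set does not contain.
scores = tool_selection_scores(
    selected=["transcript_summarizer", "financial_data", "news_search"],
    golden=["transcript_summarizer", "financial_data"],
)
```

Because the check is deterministic, it can run on every change to the prompts or tool descriptions, leaving LLM-as-judge and human review for the aspects that code cannot score.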

So when should you not use agentic workflows? In some cases, an agentic workflow definitely doesn't make sense. For fixed and repeated tasks, just go for ETL pipelines. If you cannot really capture your use cases in workflows, agentic workflows will probably not work. And if a deterministic outcome is paramount, as in strict compliance and safety-critical contexts, you probably should not go with an agentic workflow.

And in low-latency and cost-sensitive environments too, you should probably avoid agentic workflows. So, wrapping up with some learnings: start with simple Blueprints. Work your way up to building a complex RAC system. Use Blueprints to reduce the in-context tools and to provide the high-level plan to the Planner.

Design tools from the agent's point of view. Always aim for simplicity of tool use. Implement safety guardrails. Add evals, observability, and all the good software engineering practices, and that should help you a lot. From the whole presentation, the key takeaways are: an agentic workflow is planned and run by an agent.

Agentic workflows bring reliability at scale. Planning by sub-goal division is a key design pattern. Plan-and-execute is a key agentic architecture. Build your tools to complement your microservices, and always try to leverage your microservices in the tools. Modify your architecture to solve your problems. Don't shy away from taking a research paper, changing it, and experimenting with it.

And finally, treat your evals like first-class citizens. And with that, thank you very much for your time.

All right. Thank you. Any questions? We have a little bit of time to spare.

I have a question. Sure. Do you have, off the top of your head, any GitHub project or reference that we can follow?

Sure. Sure. If you go back here, I shared some of the links to the LangChain material. It should have all the code for these research papers, and that's probably the best place to start with these plan-and-execute agents. Thank you. Yeah. Any other questions?

All right. I guess one question I would have for you is: when you talk about MCP and other forms of orchestration, what do you foresee being the primary method of orchestration going forward? Is it going to be LangGraph or some other-- Yeah. I think the answer is probably everything.

You use MCP so that you provide a standard across the org. MCP really helps an organization build once and use everywhere. Oftentimes, in organizations, we see people trying to use the same functionality in different AI apps. If you build an MCP server around it, you can keep reusing it.

And obviously, for orchestration, LangGraph is great. And whatever other tools you find that solve your problem will also work. So the answer is probably that multiple things will be useful. It depends on your use case which framework is the most optimal for you.

Amazing. Thank you so much, Yogi.

How to Build Planning Agents without losing control - Yogendra Miraje, FactSet
