back to indexExposing Agents as MCP servers with mcp-agent: Sarmad Qadri

00:00:00.160 |
My name is Sarmad and today I want to talk about building effective agents with model context 00:00:05.120 |
protocol or MCP. So a lot has changed in the last year especially as far as agent development is 00:00:13.440 |
concerned. I think 2025 is the year of agents and things like MCP make agent design simpler and more 00:00:21.280 |
robust than ever before. So I want to talk about what the agent tech stack looks like in 2025. 00:00:26.800 |
The second thing is a lot of MCP servers today are just one-to-one mappings of existing REST API 00:00:34.160 |
services to MCP tools but MCP servers can be a lot more than that. They could even be agents and so 00:00:43.520 |
I want to show how agents can be represented as MCP servers. And the last thing is a little bit of a 00:00:49.920 |
look into agent architecture and modeling agents as asynchronous workflows with workflow orchestration 00:00:57.920 |
infrastructure like Airflow, Temporal, etc. So a little bit about me. I'm the CEO of Last Mile AI 00:01:06.400 |
and I've in the past been working on developer tools for a while for many years and back in 00:01:14.320 |
2016 to 2018 I was working on language server protocol and language servers at Microsoft. 00:01:20.640 |
LSB revolutionized IDEs. Here on the right you can kind of see the list of hundreds and hundreds of 00:01:28.720 |
language servers that are now available but before this every IDE had a unique API surface and so every 00:01:37.040 |
language server had to implement a VS code specific way of doing things or an Eclipse specific way 00:01:44.320 |
and it was very fragmented as an ecosystem. And LSB completely changed that by standardizing a single 00:01:53.040 |
interface API interface for how language services should be exposed in IDEs. And so when LLMs took off even 00:02:01.440 |
before tool calling was a thing I've been thinking about what it would take to make a LSB style protocol 00:02:07.600 |
for LLMs and I've been thinking about this for a long time. Here you have this like scratch pad from 2023 00:02:14.480 |
where this is the era of you know ChatGPT plugins and I was thinking of how you know 00:02:23.120 |
agent authentication should work or how LLMs should be connected to tools, resources, data in some way. 00:02:29.360 |
And so model context protocol which Anthropic created a few months ago has been a godsend and 00:02:36.880 |
I think it incorporates a lot of the things that are really necessary to to get you know agents into 00:02:43.840 |
production and we'll talk a little bit about that. Like I stated before I think 2025 is the year that 00:02:48.880 |
agents hit production on mass. Until now there have been a lot of high impact use cases that our customers 00:02:58.800 |
see that have been stalled in proof of concept stage. Things like you know people want to work 00:03:03.680 |
do workflow automation, they want to deal with unstructured data and process it in interesting ways, 00:03:10.720 |
they want to do information retrieval and you're starting to see agents appear in each of these 00:03:16.320 |
categories already and I think that pattern will accelerate in the coming months. So what does this 00:03:23.920 |
tech stack look like for agents in 2025? There are three big kind of updates or changes that are happening 00:03:31.520 |
which I think allow you to build effective agents much more easily than ever before. So the first thing is 00:03:39.360 |
better models. We have reasoning models and LLMs that are pretty reliable for a lot of use cases and with 00:03:48.400 |
test time compute a lot of the complexity things like you know chain of thought reasoning or react or other 00:03:56.320 |
kind of patterns that had been implemented at the framework layer are actually now shifting left into the inference layer 00:04:02.960 |
and all that allows is for less complexity and less burden for app developers because they can get a lot 00:04:11.280 |
more done by just invoking a model API than ever before. The second thing is model context protocol 00:04:19.280 |
or MCP. For folks who are not familiar, MCP is basically a standardized interface for connecting LLMs to 00:04:25.920 |
tools to data to resources to the world around them and so the really the revolutionary thing about it is that 00:04:34.960 |
it is a single way it provides a single interface to connect and give context to LLMs whereas in the past 00:04:43.360 |
there used to be you know a multitude of data connectors that were platform specific that you would have to 00:04:50.800 |
integrate with and MCP has taken off you know like Google, OpenAI, Microsoft, many other companies potentially 00:05:01.040 |
competitors have all kind of coalesced around MCP and so it is going to become the de facto standard for how 00:05:10.400 |
LLMs connect to the world around them. And the last part that's really changed in the last few months 00:05:18.000 |
is there are simpler architectures for how agent applications should look. Agents today unlike the past 00:05:26.000 |
are now simply you know orchestrators of better models and MCP and connecting LLMs to these tools and resources 00:05:36.400 |
using these standard protocols in some well-defined patterns. There's no longer a need for monolithic 00:05:43.120 |
AI frameworks that did a lot of heavy lifting at the framework layer in the past. Now you can have simple agent 00:05:50.320 |
patterns you implement them with standard protocols with good LLMs and you can get a long way. And just to 00:05:57.920 |
show you Anthropic at the end of last year beginning of this year released this very influential blog post called 00:06:05.680 |
Building Effective Agents and in it they highlighted a couple of agent patterns that work well in production 00:06:12.080 |
from their experience with you know deploying agents into enterprises. And so the simplest example of this 00:06:19.760 |
pattern is this thing called an augmented LLM which is basically an LLM that has access to tools and resources 00:06:26.720 |
or data. And you basically you know it's the base building block you run this LLM in a loop it gets an 00:06:33.360 |
input it may call tools it may retrieve data in order to do its job and it runs you know several iterations 00:06:41.760 |
iterations and returns a response at the end. And then you can build more interesting patterns on top of 00:06:48.400 |
that. So then you can have an augmented LLM which is the optimizer that generates a response and you can 00:06:54.720 |
connect it to another augmented LLM which is the evaluator that evaluates the quality of the generated 00:07:00.960 |
response and gives feedback to the to the generator LLM to see what it could do better. And this process 00:07:08.400 |
happens over like a set of iterations until the evaluator LLM is happy with the quality of the 00:07:15.200 |
response and then it you know returns the final response to the user. You could have you know distributed 00:07:21.920 |
systems practices like fanning out to multiple sub agents and then fanning back in to aggregate the results. 00:07:29.920 |
And perhaps the most sophisticated one which we're starting to see in tools like cloud code and 00:07:36.560 |
other you know agentic systems is this idea of an orchestrator where you have one LLM that does 00:07:44.160 |
that generates a plan and assigns tasks to sub agents dynamically and then synthesizes the results 00:07:52.720 |
before responding back to the user. And this process can also run in a loop but really the idea is that 00:07:59.520 |
that there's a planner that is reasoning and deciding what to do next kind of dynamically. 00:08:06.160 |
So what I did towards the end of last year as part of my Christmas break was I wanted to build an agent 00:08:16.480 |
library that implemented all of the patterns that this Building Effective Agents blog post had 00:08:24.160 |
and basically was very opinionated about the world being MCP native in the very near future and so 00:08:30.000 |
that's what I built it's called MCP Agent it's on GitHub you can check it out and it is basically making 00:08:36.800 |
a few very key opinionated choices. One is that MCP is going to be everywhere so every line of business 00:08:43.680 |
application think like you know Notion, Google Docs, Cursor or Cloud is soon going to be an MCP compatible client. 00:08:52.160 |
So that means that it could connect to MCP servers and on the flip side I think every service this is 00:08:58.160 |
already starting to happen is going to have an MCP server equivalent for it and so you're going to see 00:09:04.640 |
things like you know a linear MCP server a GitHub MCP server and any kind of like SaaS product that needs 00:09:10.560 |
to expose itself to LLMs will have an MCP server. The second thing that I'm going to show in a little 00:09:16.800 |
bit is that agents should be thought of as microservices and they can be deployed as MCP servers 00:09:24.480 |
themselves and as we'll talk about in a little bit that actually gives a lot of benefits on how multi-agent 00:09:32.160 |
interactions can work. And the last part is agents are async workflows and they should be modeled as such 00:09:39.280 |
because they can be paused, resumed, retried you may have a human in the loop and that's really 00:09:45.040 |
a workflow orchestration that's asynchronous instead of something that's you know happening in your chat 00:09:51.040 |
session in proc. You know if you think of agentic behavior in the MCP world today it all happens on the 00:09:59.280 |
client side so you use cloud or cursor and they in turn use MCP servers to solve your the tasks 00:10:06.000 |
you give them. But what if agents themselves were exposed as MCP servers? In that case if you connect 00:10:13.200 |
an agent as an MCP server to an MCP client then that client can invoke that agent it could coordinate 00:10:20.000 |
across multiple agents it could orchestrate similar to the patterns I showed you the same as it does today with any other MCP server. 00:10:27.760 |
Also you could do multi-agent communication also over MCP so agents can then invoke other agents. In this diagram 00:10:38.560 |
you kind of see an MCP client that's connected to regular MCP servers like github, slack, linear etc. 00:10:46.080 |
But it's also connected to agent servers and these agent servers in turn can connect to other MCP servers 00:10:53.440 |
just over the base MCP protocol and so then you can kind of get multi-agent collaboration and coordination 00:11:00.880 |
for free. The MCP client can invoke in this case MCP agent server A which in turn may invoke other MCP servers 00:11:10.080 |
or it may even invoke other agents and as a result you basically have this network of agents that may get 00:11:17.280 |
activated from a single command that a user sends through Claude, Cursor or some other MCP client. 00:11:23.440 |
So what are the benefits of this? If you expose agents as MCP servers the first thing you get is 00:11:30.800 |
composable agents. Like I mentioned you have complex multi-agent systems that can operate over the same 00:11:37.120 |
base protocol that everybody's adopting. We know MCP is going to be a common standard and so we can 00:11:43.280 |
safely build on top of it. The second thing is you get platform agnostic agents. You can build these 00:11:49.600 |
agents once and then you can reuse them anywhere that is MCP compatible. And finally you get scalable agents. 00:11:57.120 |
If you run agent workflows on dedicated infrastructure then you can kind of separate that where the agent is 00:12:04.800 |
where the agent compute is happening from the client that is being used to invoke the agent. 00:12:10.160 |
And that gives enormous benefits in terms of you know scalability, performance and durability as well. 00:12:17.440 |
So I've talked about agents as async workflows. What I mean by that is that agents can be paused and resumed. 00:12:24.960 |
They need to await on human feedback in some cases. They may fail and then they need to be retried. 00:12:31.440 |
Agents could be triggered or scheduled. It's not just a chat application that is agentic. You could have 00:12:37.760 |
a webhook that triggers an agent or a cron job that you know triggers an agent every every day or every 00:12:43.120 |
week or something. And so the right way to model all of this is as asynchronous workflows. And so that's 00:12:50.880 |
what we do in MCP agent as well. We use temporal as the durable execution backend to the compute or the 00:12:59.040 |
orchestration of agent execution. So let's do a quick demo to show what all of this looks like just to 00:13:05.920 |
make it more real. So the first thing you'll see here is I have this task that I want to build an agent 00:13:12.960 |
for. In this case it's a fairly complex task. I'm asking an agent to load the student's short story 00:13:19.120 |
from a markdown file which is this. But we assume this is a student's short story. And then I want 00:13:24.640 |
to generate a report. Basically grade this short story across proofreading, factual and logical 00:13:31.920 |
consistency, as well as style adherence. And by the way for the style adherence I want to use the APA 00:13:38.160 |
style guide from this URL. And finally I want to write that graded report to the markdown file, 00:13:44.640 |
gradedreport.md. So the agent that I've created here is actually going to do a couple of things. 00:13:51.680 |
But first I connect it to a couple of MCP servers. I have the fetch MCP server which can connect to URLs 00:13:58.800 |
and get fetch data from the internet. And I have the file system MCP server to interact with the file 00:14:04.880 |
system. So right off the bat because of MCP I don't need to interact with the file system or interact 00:14:13.440 |
with the internet and fetch URLs in a unique way. It's all over the same base protocol and it's all 00:14:21.680 |
exposed as tools from these MCP servers. And then I define a couple of these agents where I have a finder 00:14:28.000 |
agent that can fetch content from the internet or from disk. I have a writer agent that can write 00:14:33.200 |
stuff to disk. I have a proofreader, a fact checker, a style enforcer, and then I have an orchestrator. 00:14:39.840 |
Recall from you know those agent patterns I showed you. This one will basically generate a plan given the 00:14:46.240 |
task and it will use it will orchestrate these agents that I've defined in a way that it sees fit. 00:14:53.760 |
So this workflow is about like a hundred lines of code and that it's still doing something fairly 00:15:01.280 |
sophisticated. So if we run this we're going to use temporal to run this and so I'll kick this off 00:15:10.960 |
and you'll see that the worker job has triggered and it's going to start executing. Workflow UI you see 00:15:17.840 |
that there's a workflow that's been triggered and the first thing you'll see that the agent does is it 00:15:23.360 |
actually generates a plan. So over here you see that it's broken down the task I gave it the fairly 00:15:30.720 |
complex multi-step task into a series of steps that it's going to do. First it's going to load the 00:15:36.720 |
student short story and it's going to use the finder agent for that. In turn the finder agent is going 00:15:42.240 |
to use the file system MCP server. Then it's going to analyze the short story using the proofreader, 00:15:49.360 |
the fact checker, the style enforcer and finally it's going to generate the graded report dot markdown file 00:15:56.720 |
and write with the writer agent. And so then you see that the agent is executing. There's a whole 00:16:03.600 |
workflow graph. This can fail at any step and can be retried. It can be terminated. It can await for human 00:16:12.160 |
feedback. And here you see that it already completed. So we should have a graded report dot markdown file 00:16:19.200 |
that's generated for us. And if we see what's in it, you can kind of see that it did what I asked it to 00:16:27.040 |
factual consistency, APA style guide. It was able to do all this correctly. Lastly, you can actually do the same 00:16:37.120 |
thing now by exposing this agent as an MCP server. You can connect it to an MCP client like cloud desktop. 00:16:44.480 |
Here I have the agent exposed as an MCP server and you see that it exposes itself as workflows. 00:16:50.640 |
And so I gave it the same short story here and I asked it to grade it use with this basic agent. 00:16:57.760 |
And so what it does is it runs the workflow. It gives the input of the story and then it pulls for 00:17:06.560 |
the status of that workflow job because note that the agent is executing in a different execution 00:17:13.040 |
environment. I could close my cloud desktop and come back and it can check the status of that workflow 00:17:19.360 |
and get me the results at a later date. And so the asynchronous nature of this work of this agent 00:17:25.920 |
helps me kind of, you know, kick off agent tasks from anywhere. And then it like once the agent 00:17:33.200 |
completes, it presents me the report over here. And so I can still use this agent in a chat bot environment. 00:17:41.840 |
I can run this agent anywhere that is MCP compatible. Thank you all for listening to this. There's a lot more 00:17:48.560 |
that you can do with agents because of the revolution that MCP is causing. I'd love to chat more in general 00:17:56.800 |
about the future of agents. So you can come find me over email, Twitter or GitHub. Thank you.