Exposing Agents as MCP servers with mcp-agent: Sarmad Qadri

My name is Sarmad and today I want to talk about building effective agents with model context protocol or MCP. So a lot has changed in the last year especially as far as agent development is concerned. I think 2025 is the year of agents and things like MCP make agent design simpler and more robust than ever before.

So I want to talk about what the agent tech stack looks like in 2025. The second thing is a lot of MCP servers today are just one-to-one mappings of existing REST API services to MCP tools but MCP servers can be a lot more than that. They could even be agents and so I want to show how agents can be represented as MCP servers.

And the last thing is a little bit of a look into agent architecture and modeling agents as asynchronous workflows with workflow orchestration infrastructure like Airflow, Temporal, etc. So a little bit about me. I'm the CEO of Last Mile AI and I've in the past been working on developer tools for a while for many years and back in 2016 to 2018 I was working on language server protocol and language servers at Microsoft.

LSB revolutionized IDEs. Here on the right you can kind of see the list of hundreds and hundreds of language servers that are now available but before this every IDE had a unique API surface and so every language server had to implement a VS code specific way of doing things or an Eclipse specific way and it was very fragmented as an ecosystem.

And LSB completely changed that by standardizing a single interface API interface for how language services should be exposed in IDEs. And so when LLMs took off even before tool calling was a thing I've been thinking about what it would take to make a LSB style protocol for LLMs and I've been thinking about this for a long time.

Here you have this like scratch pad from 2023 where this is the era of you know ChatGPT plugins and I was thinking of how you know agent authentication should work or how LLMs should be connected to tools, resources, data in some way. And so model context protocol which Anthropic created a few months ago has been a godsend and I think it incorporates a lot of the things that are really necessary to to get you know agents into production and we'll talk a little bit about that.

Like I stated before I think 2025 is the year that agents hit production on mass. Until now there have been a lot of high impact use cases that our customers see that have been stalled in proof of concept stage. Things like you know people want to work do workflow automation, they want to deal with unstructured data and process it in interesting ways, they want to do information retrieval and you're starting to see agents appear in each of these categories already and I think that pattern will accelerate in the coming months.

So what does this tech stack look like for agents in 2025? There are three big kind of updates or changes that are happening which I think allow you to build effective agents much more easily than ever before. So the first thing is better models. We have reasoning models and LLMs that are pretty reliable for a lot of use cases and with test time compute a lot of the complexity things like you know chain of thought reasoning or react or other kind of patterns that had been implemented at the framework layer are actually now shifting left into the inference layer and all that allows is for less complexity and less burden for app developers because they can get a lot more done by just invoking a model API than ever before.

The second thing is model context protocol or MCP. For folks who are not familiar, MCP is basically a standardized interface for connecting LLMs to tools to data to resources to the world around them and so the really the revolutionary thing about it is that it is a single way it provides a single interface to connect and give context to LLMs whereas in the past there used to be you know a multitude of data connectors that were platform specific that you would have to integrate with and MCP has taken off you know like Google, OpenAI, Microsoft, many other companies potentially competitors have all kind of coalesced around MCP and so it is going to become the de facto standard for how LLMs connect to the world around them.

And the last part that's really changed in the last few months is there are simpler architectures for how agent applications should look. Agents today unlike the past are now simply you know orchestrators of better models and MCP and connecting LLMs to these tools and resources using these standard protocols in some well-defined patterns.

There's no longer a need for monolithic AI frameworks that did a lot of heavy lifting at the framework layer in the past. Now you can have simple agent patterns you implement them with standard protocols with good LLMs and you can get a long way. And just to show you Anthropic at the end of last year beginning of this year released this very influential blog post called Building Effective Agents and in it they highlighted a couple of agent patterns that work well in production from their experience with you know deploying agents into enterprises.

And so the simplest example of this pattern is this thing called an augmented LLM which is basically an LLM that has access to tools and resources or data. And you basically you know it's the base building block you run this LLM in a loop it gets an input it may call tools it may retrieve data in order to do its job and it runs you know several iterations iterations and returns a response at the end.

And then you can build more interesting patterns on top of that. So then you can have an augmented LLM which is the optimizer that generates a response and you can connect it to another augmented LLM which is the evaluator that evaluates the quality of the generated response and gives feedback to the to the generator LLM to see what it could do better.

And this process happens over like a set of iterations until the evaluator LLM is happy with the quality of the response and then it you know returns the final response to the user. You could have you know distributed systems practices like fanning out to multiple sub agents and then fanning back in to aggregate the results.

And perhaps the most sophisticated one which we're starting to see in tools like cloud code and other you know agentic systems is this idea of an orchestrator where you have one LLM that does that generates a plan and assigns tasks to sub agents dynamically and then synthesizes the results before responding back to the user.

And this process can also run in a loop but really the idea is that that there's a planner that is reasoning and deciding what to do next kind of dynamically. So what I did towards the end of last year as part of my Christmas break was I wanted to build an agent library that implemented all of the patterns that this Building Effective Agents blog post had and basically was very opinionated about the world being MCP native in the very near future and so that's what I built it's called MCP Agent it's on GitHub you can check it out and it is basically making a few very key opinionated choices.

One is that MCP is going to be everywhere so every line of business application think like you know Notion, Google Docs, Cursor or Cloud is soon going to be an MCP compatible client. So that means that it could connect to MCP servers and on the flip side I think every service this is already starting to happen is going to have an MCP server equivalent for it and so you're going to see things like you know a linear MCP server a GitHub MCP server and any kind of like SaaS product that needs to expose itself to LLMs will have an MCP server.

The second thing that I'm going to show in a little bit is that agents should be thought of as microservices and they can be deployed as MCP servers themselves and as we'll talk about in a little bit that actually gives a lot of benefits on how multi-agent interactions can work.

And the last part is agents are async workflows and they should be modeled as such because they can be paused, resumed, retried you may have a human in the loop and that's really a workflow orchestration that's asynchronous instead of something that's you know happening in your chat session in proc.

You know if you think of agentic behavior in the MCP world today it all happens on the client side so you use cloud or cursor and they in turn use MCP servers to solve your the tasks you give them. But what if agents themselves were exposed as MCP servers?

In that case if you connect an agent as an MCP server to an MCP client then that client can invoke that agent it could coordinate across multiple agents it could orchestrate similar to the patterns I showed you the same as it does today with any other MCP server. Also you could do multi-agent communication also over MCP so agents can then invoke other agents.

In this diagram you kind of see an MCP client that's connected to regular MCP servers like github, slack, linear etc. But it's also connected to agent servers and these agent servers in turn can connect to other MCP servers just over the base MCP protocol and so then you can kind of get multi-agent collaboration and coordination for free.

The MCP client can invoke in this case MCP agent server A which in turn may invoke other MCP servers or it may even invoke other agents and as a result you basically have this network of agents that may get activated from a single command that a user sends through Claude, Cursor or some other MCP client.

So what are the benefits of this? If you expose agents as MCP servers the first thing you get is composable agents. Like I mentioned you have complex multi-agent systems that can operate over the same base protocol that everybody's adopting. We know MCP is going to be a common standard and so we can safely build on top of it.

The second thing is you get platform agnostic agents. You can build these agents once and then you can reuse them anywhere that is MCP compatible. And finally you get scalable agents. If you run agent workflows on dedicated infrastructure then you can kind of separate that where the agent is where the agent compute is happening from the client that is being used to invoke the agent.

And that gives enormous benefits in terms of you know scalability, performance and durability as well. So I've talked about agents as async workflows. What I mean by that is that agents can be paused and resumed. They need to await on human feedback in some cases. They may fail and then they need to be retried.

Agents could be triggered or scheduled. It's not just a chat application that is agentic. You could have a webhook that triggers an agent or a cron job that you know triggers an agent every every day or every week or something. And so the right way to model all of this is as asynchronous workflows.

And so that's what we do in MCP agent as well. We use temporal as the durable execution backend to the compute or the orchestration of agent execution. So let's do a quick demo to show what all of this looks like just to make it more real. So the first thing you'll see here is I have this task that I want to build an agent for.

In this case it's a fairly complex task. I'm asking an agent to load the student's short story from a markdown file which is this. But we assume this is a student's short story. And then I want to generate a report. Basically grade this short story across proofreading, factual and logical consistency, as well as style adherence.

And by the way for the style adherence I want to use the APA style guide from this URL. And finally I want to write that graded report to the markdown file, gradedreport.md. So the agent that I've created here is actually going to do a couple of things. But first I connect it to a couple of MCP servers.

I have the fetch MCP server which can connect to URLs and get fetch data from the internet. And I have the file system MCP server to interact with the file system. So right off the bat because of MCP I don't need to interact with the file system or interact with the internet and fetch URLs in a unique way.

It's all over the same base protocol and it's all exposed as tools from these MCP servers. And then I define a couple of these agents where I have a finder agent that can fetch content from the internet or from disk. I have a writer agent that can write stuff to disk.

I have a proofreader, a fact checker, a style enforcer, and then I have an orchestrator. Recall from you know those agent patterns I showed you. This one will basically generate a plan given the task and it will use it will orchestrate these agents that I've defined in a way that it sees fit.

So this workflow is about like a hundred lines of code and that it's still doing something fairly sophisticated. So if we run this we're going to use temporal to run this and so I'll kick this off and you'll see that the worker job has triggered and it's going to start executing.

Workflow UI you see that there's a workflow that's been triggered and the first thing you'll see that the agent does is it actually generates a plan. So over here you see that it's broken down the task I gave it the fairly complex multi-step task into a series of steps that it's going to do.

First it's going to load the student short story and it's going to use the finder agent for that. In turn the finder agent is going to use the file system MCP server. Then it's going to analyze the short story using the proofreader, the fact checker, the style enforcer and finally it's going to generate the graded report dot markdown file and write with the writer agent.

And so then you see that the agent is executing. There's a whole workflow graph. This can fail at any step and can be retried. It can be terminated. It can await for human feedback. And here you see that it already completed. So we should have a graded report dot markdown file that's generated for us.

And if we see what's in it, you can kind of see that it did what I asked it to factual consistency, APA style guide. It was able to do all this correctly. Lastly, you can actually do the same thing now by exposing this agent as an MCP server. You can connect it to an MCP client like cloud desktop.

Here I have the agent exposed as an MCP server and you see that it exposes itself as workflows. And so I gave it the same short story here and I asked it to grade it use with this basic agent. And so what it does is it runs the workflow.

It gives the input of the story and then it pulls for the status of that workflow job because note that the agent is executing in a different execution environment. I could close my cloud desktop and come back and it can check the status of that workflow and get me the results at a later date.

And so the asynchronous nature of this work of this agent helps me kind of, you know, kick off agent tasks from anywhere. And then it like once the agent completes, it presents me the report over here. And so I can still use this agent in a chat bot environment.

I can run this agent anywhere that is MCP compatible. Thank you all for listening to this. There's a lot more that you can do with agents because of the revolution that MCP is causing. I'd love to chat more in general about the future of agents. So you can come find me over email, Twitter or GitHub.

Thank you.

Exposing Agents as MCP servers with mcp-agent: Sarmad Qadri

Transcript