
Exposing Agents as MCP servers with mcp-agent: Sarmad Qadri



00:00:00.160 | My name is Sarmad and today I want to talk about building effective agents with the Model Context
00:00:05.120 | Protocol, or MCP. So a lot has changed in the last year, especially as far as agent development is
00:00:13.440 | concerned. I think 2025 is the year of agents and things like MCP make agent design simpler and more
00:00:21.280 | robust than ever before. So I want to talk about what the agent tech stack looks like in 2025.
00:00:26.800 | The second thing is a lot of MCP servers today are just one-to-one mappings of existing REST API
00:00:34.160 | services to MCP tools but MCP servers can be a lot more than that. They could even be agents and so
00:00:43.520 | I want to show how agents can be represented as MCP servers. And the last thing is a little bit of a
00:00:49.920 | look into agent architecture and modeling agents as asynchronous workflows with workflow orchestration
00:00:57.920 | infrastructure like Airflow, Temporal, etc. So a little bit about me. I'm the CEO of LastMile AI
00:01:06.400 | and I've been working on developer tools for many years. Back in
00:01:14.320 | 2016 to 2018 I was working on the Language Server Protocol and language servers at Microsoft.
00:01:20.640 | LSP revolutionized IDEs. Here on the right you can see the list of hundreds and hundreds of
00:01:28.720 | language servers that are now available, but before this every IDE had a unique API surface, and so every
00:01:37.040 | language server had to implement a VS Code-specific way of doing things or an Eclipse-specific way,
00:01:44.320 | and it was a very fragmented ecosystem. LSP completely changed that by standardizing a single
00:01:53.040 | API interface for how language services should be exposed in IDEs. And so when LLMs took off, even
00:02:01.440 | before tool calling was a thing, I had been thinking about what it would take to make an LSP-style protocol
00:02:07.600 | for LLMs, and I've been thinking about this for a long time. Here you have this scratch pad from 2023
00:02:14.480 | (this was the era of ChatGPT plugins) where I was thinking about how
00:02:23.120 | agent authentication should work or how LLMs should be connected to tools, resources, and data in some way.
00:02:29.360 | And so the Model Context Protocol, which Anthropic created a few months ago, has been a godsend, and
00:02:36.880 | I think it incorporates a lot of the things that are really necessary to get agents into
00:02:43.840 | production, and we'll talk a little bit about that. Like I stated before, I think 2025 is the year that
00:02:48.880 | agents hit production en masse. Until now there have been a lot of high-impact use cases that our customers
00:02:58.800 | see that have been stalled in the proof-of-concept stage. Things like: people want to
00:03:03.680 | do workflow automation, they want to deal with unstructured data and process it in interesting ways,
00:03:10.720 | they want to do information retrieval and you're starting to see agents appear in each of these
00:03:16.320 | categories already and I think that pattern will accelerate in the coming months. So what does this
00:03:23.920 | tech stack look like for agents in 2025? There are three big changes happening
00:03:31.520 | which I think allow you to build effective agents much more easily than ever before. So the first thing is
00:03:39.360 | better models. We have reasoning models and LLMs that are pretty reliable for a lot of use cases, and with
00:03:48.400 | test-time compute a lot of the complexity, things like chain-of-thought reasoning, ReAct, or other
00:03:56.320 | patterns that had been implemented at the framework layer, is now shifting left into the inference layer.
00:04:02.960 | That allows for less complexity and less burden for app developers, because they can get a lot
00:04:11.280 | more done by just invoking a model API than ever before. The second thing is the Model Context Protocol,
00:04:19.280 | or MCP. For folks who are not familiar, MCP is basically a standardized interface for connecting LLMs to
00:04:25.920 | tools, to data, to resources, to the world around them. The really revolutionary thing about it is that
00:04:34.960 | it provides a single interface to connect and give context to LLMs, whereas in the past
00:04:43.360 | there used to be a multitude of platform-specific data connectors that you would have to
00:04:50.800 | integrate with. And MCP has taken off: Google, OpenAI, Microsoft, and many other companies, potentially
00:05:01.040 | competitors, have all coalesced around MCP, and so it is going to become the de facto standard for how
00:05:10.400 | LLMs connect to the world around them.
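To make that concrete, a minimal MCP server with the FastMCP helper from the official Python SDK looks roughly like the sketch below. The get_weather tool is just a placeholder example; any MCP-compatible client can discover and call it without a bespoke integration.

```python
# Illustrative sketch: a tiny MCP server using the FastMCP helper from the
# official Python SDK (package `mcp`). The get_weather tool is a made-up example.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a (fake) weather report for a city."""
    return f"It is sunny in {city} today."

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio so any MCP client can call it
```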
00:05:18.000 | And the last part that's really changed in the last few months is that there are simpler architectures for how agent applications should look. Agents today, unlike in the past,
00:05:26.000 | are simply orchestrators of better models and MCP, connecting LLMs to these tools and resources
00:05:36.400 | using these standard protocols in some well-defined patterns. There's no longer a need for the monolithic
00:05:43.120 | AI frameworks that did a lot of heavy lifting at the framework layer. Now you can take simple agent
00:05:50.320 | patterns, implement them with standard protocols and good LLMs, and get a long way. Just to
00:05:57.920 | show you: at the end of last year and the beginning of this year, Anthropic released a very influential blog post called
00:06:05.680 | Building Effective Agents, and in it they highlighted a couple of agent patterns that work well in production
00:06:12.080 | from their experience deploying agents into enterprises. The simplest example of these
00:06:19.760 | patterns is the augmented LLM, which is basically an LLM that has access to tools and resources
00:06:26.720 | or data. It's the base building block: you run this LLM in a loop, it gets an
00:06:33.360 | input, it may call tools, it may retrieve data in order to do its job, and it runs several
00:06:41.760 | iterations and returns a response at the end. You can then build more interesting patterns on top of that.
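As a rough sketch of that loop, where call_llm and call_tool are hypothetical stand-ins for a real model API and MCP tool calls rather than any particular library:

```python
# Hypothetical sketch of an augmented LLM: call_llm and call_tool are stand-ins,
# not real library APIs. The model runs in a loop, calling tools until it is done.
async def augmented_llm(task: str, tools: list[dict], max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = await call_llm(messages, tools=tools)   # model may request tool calls
        if not reply.tool_calls:                        # no more tools needed: done
            return reply.content
        for call in reply.tool_calls:                   # execute each requested MCP tool
            result = await call_tool(call.name, call.arguments)
            messages.append({"role": "tool", "name": call.name, "content": result})
    return "Stopped after reaching max_turns."
```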
00:06:48.400 | For example, you can have an augmented LLM acting as the optimizer that generates a response, and you can
00:06:54.720 | connect it to another augmented LLM acting as the evaluator, which evaluates the quality of the generated
00:07:00.960 | response and gives feedback to the generator LLM on what it could do better. This process
00:07:08.400 | repeats over a number of iterations until the evaluator LLM is happy with the quality of the
00:07:15.200 | response, and then it returns the final response to the user.
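A compressed sketch of that evaluator-optimizer loop, again with the hypothetical call_llm helper:

```python
# Hypothetical evaluator-optimizer sketch; call_llm is a stand-in, not a real API.
async def evaluator_optimizer(task: str, max_rounds: int = 5) -> str:
    draft = await call_llm(f"Complete this task:\n{task}")
    for _ in range(max_rounds):
        verdict = await call_llm(
            f"Task: {task}\nDraft: {draft}\n"
            "Rate the draft EXCELLENT, GOOD, or POOR and give concrete feedback."
        )
        if "EXCELLENT" in verdict:      # the evaluator is satisfied
            break
        draft = await call_llm(         # the optimizer revises using the feedback
            f"Task: {task}\nPrevious draft: {draft}\nFeedback: {verdict}\n"
            "Write an improved draft."
        )
    return draft
```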
00:07:21.920 | You could also apply distributed-systems practices like fanning out to multiple sub-agents and then fanning back in to aggregate the results.
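The fan-out/fan-in part is mostly plain asyncio; something like this, where sub_agent and call_llm are again hypothetical:

```python
import asyncio

# Hypothetical fan-out/fan-in: run sub-agents concurrently, then aggregate.
async def fan_out_fan_in(task: str, angles: list[str]) -> str:
    partial_results = await asyncio.gather(
        *(sub_agent(task, angle) for angle in angles)   # sub_agent is a stand-in
    )
    return await call_llm(
        "Combine these partial answers into one response:\n" + "\n---\n".join(partial_results)
    )
```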
00:07:29.920 | And perhaps the most sophisticated one, which we're starting to see in tools like Claude Code and
00:07:36.560 | other agentic systems, is the idea of an orchestrator, where you have one LLM that
00:07:44.160 | generates a plan and assigns tasks to sub-agents dynamically, and then synthesizes the results
00:07:52.720 | before responding back to the user. This process can also run in a loop, but really the idea is
00:07:59.520 | that there's a planner that is reasoning and deciding what to do next dynamically.
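Very roughly, and again with a hypothetical call_llm helper and a hypothetical Agent type, the orchestrator pattern looks something like this:

```python
import json

# Hypothetical orchestrator-workers sketch: a planner LLM breaks the task into steps,
# assigns each step to a named sub-agent, then synthesizes the results.
# call_llm and the Agent type are stand-ins, not real APIs.
async def orchestrate(task: str, agents: dict[str, "Agent"]) -> str:
    plan = json.loads(await call_llm(        # assumes the planner returns valid JSON
        f"Task: {task}\nAvailable agents: {list(agents)}\n"
        'Return a JSON list of steps: [{"agent": "<name>", "step": "<instruction>"}, ...]'
    ))
    results = []
    for item in plan:                        # steps could also be run in parallel
        results.append(await agents[item["agent"]].run(item["step"]))
    return await call_llm(
        f"Task: {task}\nStep results: {results}\nSynthesize the final answer."
    )
```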
00:08:06.160 | So towards the end of last year, as part of my Christmas break, I wanted to build an agent
00:08:16.480 | library that implemented all of the patterns in this Building Effective Agents blog post
00:08:24.160 | and was very opinionated about the world being MCP-native in the very near future. So
00:08:30.000 | that's what I built. It's called mcp-agent, it's on GitHub, you can check it out, and it makes
00:08:36.800 | a few key opinionated choices. One is that MCP is going to be everywhere, so every line-of-business
00:08:43.680 | application, think Notion, Google Docs, Cursor, or Claude, is soon going to be an MCP-compatible client.
00:08:52.160 | That means it can connect to MCP servers. And on the flip side, I think every service (this is
00:08:58.160 | already starting to happen) is going to have an MCP server equivalent, so you're going to see
00:09:04.640 | things like a Linear MCP server, a GitHub MCP server, and any kind of SaaS product that needs
00:09:10.560 | to expose itself to LLMs will have an MCP server. The second thing, which I'm going to show in a little
00:09:16.800 | bit is that agents should be thought of as microservices and they can be deployed as MCP servers
00:09:24.480 | themselves, and as we'll talk about in a little bit, that actually gives a lot of benefits for how multi-agent
00:09:32.160 | interactions can work. And the last part is that agents are async workflows and they should be modeled as such,
00:09:39.280 | because they can be paused, resumed, retried, and you may have a human in the loop. That's really
00:09:45.040 | asynchronous workflow orchestration, instead of something that's happening in your chat
00:09:51.040 | session in-process. If you think of agentic behavior in the MCP world today, it all happens on the
00:09:59.280 | client side: you use Claude or Cursor, and they in turn use MCP servers to solve the tasks
00:10:06.000 | you give them. But what if agents themselves were exposed as MCP servers? In that case, if you connect
00:10:13.200 | an agent as an MCP server to an MCP client, then that client can invoke that agent, it can coordinate
00:10:20.000 | across multiple agents, and it can orchestrate, similar to the patterns I showed you, the same as it does today with any other MCP server.
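One simple way to picture this is wrapping the agent behind an MCP tool. Here is a sketch with the official Python SDK's FastMCP helper, where run_grading_agent is a hypothetical entry point standing in for whatever agent loop actually does the work:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("grader-agent")

@mcp.tool()
async def grade_story(path: str) -> str:
    """Grade the short story at `path` and return a report."""
    # run_grading_agent is a hypothetical entry point: behind this one tool you could
    # run a full agent workflow (orchestrator, sub-agents, other MCP servers).
    return await run_grading_agent(path)

if __name__ == "__main__":
    mcp.run()  # to any MCP client, this agent now looks like an ordinary server
```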
00:10:27.760 | You could also do multi-agent communication over MCP, so agents can then invoke other agents. In this diagram
00:10:38.560 | you see an MCP client that's connected to regular MCP servers like GitHub, Slack, Linear, etc.
00:10:46.080 | But it's also connected to agent servers, and these agent servers in turn can connect to other MCP servers
00:10:53.440 | just over the base MCP protocol, and so you get multi-agent collaboration and coordination
00:11:00.880 | for free. The MCP client can invoke, in this case, MCP agent server A, which in turn may invoke other MCP servers
00:11:10.080 | or even other agents, and as a result you basically have a network of agents that may get
00:11:17.280 | activated from a single command that a user sends through Claude, Cursor, or some other MCP client.
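From the client side, invoking such an agent server is the same as calling any other MCP server. A rough sketch with the official Python SDK; the server script name and tool name come from the hypothetical grader example above:

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the (hypothetical) agent server from the earlier sketch over stdio.
    params = StdioServerParameters(command="python", args=["grader_agent_server.py"])
    async with stdio_client(params) as (read_stream, write_stream):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            # To the client, the agent is just another tool it can call.
            result = await session.call_tool("grade_story", arguments={"path": "short_story.md"})
            print(result.content)

if __name__ == "__main__":
    asyncio.run(main())
```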
00:11:23.440 | So what are the benefits of this? If you expose agents as MCP servers the first thing you get is
00:11:30.800 | composable agents. Like I mentioned you have complex multi-agent systems that can operate over the same
00:11:37.120 | base protocol that everybody's adopting. We know MCP is going to be a common standard and so we can
00:11:43.280 | safely build on top of it. The second thing is you get platform agnostic agents. You can build these
00:11:49.600 | agents once and then you can reuse them anywhere that is MCP compatible. And finally you get scalable agents.
00:11:57.120 | If you run agent workflows on dedicated infrastructure, then you can separate where
00:12:04.800 | the agent compute is happening from the client that is being used to invoke the agent.
00:12:10.160 | And that gives enormous benefits in terms of scalability, performance, and durability as well.
00:12:17.440 | So I've talked about agents as async workflows. What I mean by that is that agents can be paused and resumed.
00:12:24.960 | They need to await on human feedback in some cases. They may fail and then they need to be retried.
00:12:31.440 | Agents could be triggered or scheduled. It's not just a chat application that is agentic: you could have
00:12:37.760 | a webhook that triggers an agent, or a cron job that triggers an agent every day or every
00:12:43.120 | week. And so the right way to model all of this is as asynchronous workflows. That's
00:12:50.880 | what we do in mcp-agent as well. We use Temporal as the durable execution backend for the
00:12:59.040 | orchestration of agent execution.
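This isn't mcp-agent's actual internals, but as a sketch of the idea, a durable agent step in the Temporal Python SDK looks roughly like this; the workflow state survives worker restarts, and each activity can be retried or left waiting:

```python
from datetime import timedelta

from temporalio import activity, workflow

# Sketch only: run_agent_step stands in for whatever LLM / MCP tool calls a real
# agent step would make. Temporal retries it on failure and persists workflow state.
@activity.defn
async def run_agent_step(task: str) -> str:
    return f"result for: {task}"  # placeholder for the actual agent logic

@workflow.defn
class GradeStoryWorkflow:
    @workflow.run
    async def run(self, task: str) -> str:
        # Durable execution: this workflow can be paused, resumed, retried, or
        # left waiting (e.g. on a human-input signal) without losing progress.
        return await workflow.execute_activity(
            run_agent_step,
            task,
            start_to_close_timeout=timedelta(minutes=10),
        )
```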
00:13:05.920 | So let's do a quick demo to show what all of this looks like, just to make it more real. The first thing you'll see here is the task that I want to build an agent
00:13:12.960 | for. In this case it's a fairly complex task. I'm asking an agent to load a student's short story
00:13:19.120 | from a markdown file, which is this one (we'll assume it's a student's short story). And then I want
00:13:24.640 | to generate a report. Basically grade this short story across proofreading, factual and logical
00:13:31.920 | consistency, as well as style adherence. And by the way for the style adherence I want to use the APA
00:13:38.160 | style guide from this URL. And finally I want to write that graded report to the markdown file,
00:13:44.640 | gradedreport.md. So the agent that I've created here is actually going to do a couple of things.
00:13:51.680 | But first I connect it to a couple of MCP servers. I have the fetch MCP server, which can connect to URLs
00:13:58.800 | and fetch data from the internet, and I have the filesystem MCP server to interact with the file
00:14:04.880 | system. So right off the bat, because of MCP, I don't need to interact with the file system or interact
00:14:13.440 | with the internet and fetch URLs in a unique way. It's all over the same base protocol and it's all
00:14:21.680 | exposed as tools from these MCP servers. And then I define a few agents: I have a finder
00:14:28.000 | agent that can fetch content from the internet or from disk. I have a writer agent that can write
00:14:33.200 | stuff to disk. I have a proofreader, a fact checker, a style enforcer, and then I have an orchestrator.
00:14:39.840 | Recall those agent patterns I showed you: this one will generate a plan given the
00:14:46.240 | task and it will orchestrate the agents that I've defined in whatever way it sees fit.
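Compressed a lot, the shape of that workflow in mcp-agent looks something like the sketch below. It is paraphrased from memory of the project's examples, so the import paths, class names, and parameters (MCPApp, Agent, Orchestrator, OpenAIAugmentedLLM, server_names, llm_factory, generate_str) may not match the current API exactly; treat it as pseudocode for the structure rather than a drop-in script.

```python
# Approximate sketch of the demo workflow; names and import paths are paraphrased
# from mcp-agent's examples and may not match the current API exactly.
from mcp_agent.app import MCPApp
from mcp_agent.agents.agent import Agent
from mcp_agent.workflows.llm.augmented_llm_openai import OpenAIAugmentedLLM
from mcp_agent.workflows.orchestrator.orchestrator import Orchestrator

app = MCPApp(name="story_grader")

async def grade_story(task: str) -> str:
    async with app.run():
        # Worker agents; server_names refer to MCP servers configured for the app
        # (fetch for URLs, filesystem for reading and writing files).
        finder = Agent(name="finder",
                       instruction="Find and read content from URLs or from disk.",
                       server_names=["fetch", "filesystem"])
        writer = Agent(name="writer",
                       instruction="Write content to disk.",
                       server_names=["filesystem"])
        proofreader = Agent(name="proofreader",
                            instruction="Check grammar, spelling, and punctuation.")
        fact_checker = Agent(name="fact_checker",
                             instruction="Check factual and logical consistency.")
        style_enforcer = Agent(name="style_enforcer",
                               instruction="Check adherence to the APA style guide.")

        # The orchestrator plans the steps and delegates to the agents above.
        orchestrator = Orchestrator(
            llm_factory=OpenAIAugmentedLLM,
            available_agents=[finder, writer, proofreader, fact_checker, style_enforcer],
        )
        return await orchestrator.generate_str(task)
```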
00:14:53.760 | So this workflow is about a hundred lines of code, and it's still doing something fairly
00:15:01.280 | sophisticated. If we run this, we're going to use Temporal to run it, so I'll kick this off
00:15:10.960 | and you'll see that the worker job has triggered and it's going to start executing. In the workflow UI you see
00:15:17.840 | that there's a workflow that's been triggered and the first thing you'll see that the agent does is it
00:15:23.360 | actually generates a plan. So over here you see that it's broken down the fairly
00:15:30.720 | complex, multi-step task I gave it into a series of steps that it's going to take. First it's going to load the
00:15:36.720 | student short story and it's going to use the finder agent for that. In turn the finder agent is going
00:15:42.240 | to use the file system MCP server. Then it's going to analyze the short story using the proofreader,
00:15:49.360 | the fact checker, and the style enforcer, and finally it's going to generate the graded report markdown file
00:15:56.720 | and write it with the writer agent. And so then you see that the agent is executing. There's a whole
00:16:03.600 | workflow graph. This can fail at any step and be retried. It can be terminated. It can await human
00:16:12.160 | feedback. And here you see that it has already completed. So we should have a graded report markdown file
00:16:19.200 | that's generated for us. And if we see what's in it, you can see that it did what I asked it to:
00:16:27.040 | factual consistency, the APA style guide. It was able to do all this correctly. Lastly, you can actually do the same
00:16:37.120 | thing now by exposing this agent as an MCP server. You can connect it to an MCP client like Claude Desktop.
00:16:44.480 | Here I have the agent exposed as an MCP server and you see that it exposes itself as workflows.
00:16:50.640 | And so I gave it the same short story here and I asked it to grade it using this agent.
00:16:57.760 | What it does is it runs the workflow. It gives it the story as input, and then it polls for
00:17:06.560 | the status of that workflow job, because note that the agent is executing in a different execution
00:17:13.040 | environment. I could close my Claude Desktop and come back, and it can check the status of that workflow
00:17:19.360 | and get me the results at a later date. And so the asynchronous nature of this agent
00:17:25.920 | helps me kick off agent tasks from anywhere. And then once the agent
00:17:33.200 | completes, it presents me the report over here. And so I can still use this agent in a chatbot environment.
00:17:41.840 | I can run this agent anywhere that is MCP compatible. Thank you all for listening to this. There's a lot more
00:17:48.560 | that you can do with agents because of the revolution that MCP is causing. I'd love to chat more in general
00:17:56.800 | about the future of agents. So you can come find me over email, Twitter or GitHub. Thank you.