
Ubertool MCPs


Chapters

0:00 Introduction
0:50 Ubertool MCPs
3:13 Pexpect MCP
5:55 Playwright MCP

Whisper Transcript

00:00:00.960 | Hello, this is a recording that I'm doing after the fact on a presentation I gave in London at
00:00:07.920 | Claude Code Anonymous. It's not the exact same presentation; the voiceover is a bit different,
00:00:14.640 | mostly just because of the setting. But the idea is the same, the slides are the same.
00:00:19.920 | The motivation for this talk was a blog post that I wrote, which came out
00:00:29.280 | give or take at the same time, where I was investigating, or proposing, alternative ways
00:00:34.640 | to write MCP servers, because MCP servers compose kind of badly today, and they expose so many tools
00:00:44.000 | that when you load a sufficient amount of them, the ability of an agent to actually select the tools
00:00:49.440 | goes down. And so what I'm proposing here is what I call an ubertool MCP, which is effectively an MCP
00:00:58.240 | server with a single tool that is multi-function, multi-purpose, and takes code as input.
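The core mechanic described here can be sketched in a few lines of standard-library Python. This is a simplified stand-in, not the actual server from the talk: a single stateful session whose one entry point, a hypothetical `run_code`, is what the MCP server would expose as its only tool.

```python
import io
import contextlib

# One stateful session: a single namespace that survives across calls.
# A real ubertool MCP server would wrap run_code() as its only MCP tool;
# here it is just a plain method so the mechanic is visible.
class StatefulSession:
    def __init__(self):
        self.namespace = {}

    def run_code(self, code: str) -> str:
        """Execute Python code in the persistent namespace, capture stdout."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, self.namespace)
        return buf.getvalue()

session = StatefulSession()
session.run_code("x = 40")               # state persists between calls
print(session.run_code("print(x + 2)"))  # → 42
```

Because the namespace persists, the agent can build up variables, loops, and helper functions across many small tool calls instead of passing everything through a flat JSON schema.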
00:01:04.160 | I will skip over this, but I wrote some articles about MCP before, and you can find them on our blog.
00:01:13.280 | So I am generally kind of conflicted on MCPs, and I have largely replaced them with command line tools,
00:01:20.880 | because in the context of Claude, Claude is very capable of using bash, and many of the command line
00:01:26.960 | tools that can replace MCPs are in the training set and quite helpful. So for instance, the GitHub gh command
00:01:35.360 | is a very good replacement for the GitHub MCP, and overall in my experience it performs better.
00:01:40.800 | And the reason they perform better is in parts because they're composable through bash, which is a
00:01:46.720 | programming language. And if we think about what these coding agents do, they really just write code. So
00:01:52.880 | the idea here was: can we provide a way for an agent to use an MCP to execute code on the fly
00:02:04.720 | in a stateful session? And again, just look at how many tools MCPs currently pull in. So it doesn't
00:02:13.600 | quite work to extend this to an unlimited amount of tools, even with modern approaches like loadouts,
00:02:21.520 | where you basically use a RAG search to select a subset of tools for a task.
00:02:25.120 | The biggest reason why I think it is interesting to look at alternatives to
00:02:33.120 | at least the current approach of MCP is because MCP calls currently cannot be chained.
00:02:37.760 | And so you're required in many ways to rely on inference for all of the work.
00:02:44.800 | And in some of those cases, it's actually very appealing to chain multiple tool calls together
00:02:50.960 | through some meta language. And this is basically an approach to do this within one MCP.
00:02:59.200 | At the moment you cannot have this meta language, at least not to
00:03:02.000 | chain different types of MCPs together. But I want to show two MCPs that I wrote,
00:03:07.840 | which have a single tool and then use a programming language as an input and a stateful session.
00:03:12.880 | And the first one is the pexpect MCP.
00:03:15.840 | And just for your understanding, pexpect is a Python library. It's a kind of well-known,
00:03:24.800 | very old one that implements the functionality of the old expect Unix command.
00:03:31.440 | It emulates a terminal and provides an API to make expectations against the output, to await certain outputs,
00:03:42.480 | and to send inputs into it. And so in this case, I'm demonstrating how to use the pexpect MCP
00:03:48.720 | to remote control an LLDB process, which is debugging a crashed application.
00:03:56.560 | And you can see from the tool usage that it is really just one pexpect tool, and the input is Python code.
00:04:04.560 | And this works because the prompt of this MCP tells it that there is a stateful Python session available that it can use to remote control this LLDB process.
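The code the agent writes into that session follows pexpect's spawn/expect/sendline pattern against an `lldb` child process. Since neither pexpect nor LLDB can be assumed to be installed, here is a standard-library approximation of the same expect-style loop, driving a toy interactive child over pipes. Real pexpect uses a pseudo-terminal, and the child would be something like `lldb ./crashed-app`; the `(dbg)` child below is entirely made up for illustration.

```python
import subprocess
import sys

# Toy stand-in for a debugger: a child that prints a prompt and
# evaluates each input line. Stands in for an LLDB process.
child_code = r"""
import sys
while True:
    sys.stdout.write("(dbg) ")
    sys.stdout.flush()
    line = sys.stdin.readline()
    if not line or line.strip() == "quit":
        break
    sys.stdout.write("result: %s\n" % eval(line))
    sys.stdout.flush()
"""

proc = subprocess.Popen(
    [sys.executable, "-u", "-c", child_code],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)

def expect(marker: str) -> str:
    """Read child output until `marker` appears (like pexpect's expect())."""
    out = ""
    while marker not in out:
        out += proc.stdout.read(1)
    return out

def sendline(line: str) -> None:
    """Send a command to the child (like pexpect's sendline())."""
    proc.stdin.write(line + "\n")
    proc.stdin.flush()

expect("(dbg) ")           # wait for the first prompt
sendline("6 * 7")          # issue a "debugger" command
out = expect("(dbg) ")     # everything up to the next prompt
print(out.splitlines()[0])  # → result: 42
sendline("quit")
proc.wait()
```

The stateful session is what makes this work: the `proc` handle and the helper functions stay alive between tool calls, so each new tool invocation can continue the same debugger conversation.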
00:04:15.360 | And what is kind of interesting about this approach is that, while you can see that it uses a lot of tool calls, so it's not overly efficient,
00:04:23.600 | you could get it to prompt a lot less. It's just being very thorough here.
00:04:28.000 | But what is interesting about it is that once it has the root cause, you have all the code that it used to remote control this process in the context window,
00:04:39.280 | which means that you can, for instance, tell it afterwards to dump all of these outputs into another Python tool that you can then run in a single iteration to perform most of what this debug session was.
00:04:54.720 | And this is what I'm doing here.
00:04:56.320 | Tell it to dump the commands into a Python script.
00:05:00.400 | And this will rather quickly go through the already written code that it has in the context and dump it into a reusable Python script,
00:05:09.840 | which we can then run and have it explain the output.
00:05:15.120 | Now, you might ask: okay, how often are you going to debug something more than once?
00:05:19.760 | Well, for instance, you could do it in a sub-agent and then have the resulting script be loaded by the main agent, populating less of the context window.
00:05:32.080 | Or you might want to restart that debug session a couple of times.
00:05:35.600 | Of course, in this case, it would be better to still have some interactivity at the end, and you can actually try to do that.
00:05:43.120 | But I mostly just want to demonstrate the idea behind it, which is because we already have this code in the context, we can then instruct the LLM to dump it out.
00:05:53.280 | And a second kind of interesting version of this is the Playwright MCP, which is
00:06:00.320 | the same idea.
00:06:02.720 | In this case, it exposes a JavaScript session.
00:06:06.560 | There's a stateful JavaScript executor that has the Playwright library loaded into it.
00:06:13.680 | And that allows it to remote control a browser from input that the agent writes.
00:06:23.520 | And what is particularly interesting about this MCP is that it automatically also gets all the console log messages out.
00:06:32.320 | In this case, it wrote a bunch of code to print the links that it finds and explain them afterwards.
00:06:40.720 | And in this case, I'm also demonstrating looping.
00:06:45.040 | So I'm telling it to go to my GitHub profile and create a list of all the repositories.
00:06:57.600 | And this means that it has to paginate eleven times through my GitHub profile.
00:07:03.840 | And in this case, it looks at the first page, it figures out the structure, and then it starts writing
00:07:09.280 | the loop directly into one single tool call because it's all JavaScript to click the button a couple of
00:07:15.520 | times and extract all the links into a list, which it then at the end processes.
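The loop-in-one-tool-call pattern is the interesting part here. As a sketch of its shape, in Python rather than the session's actual JavaScript, with `MockPage` standing in for Playwright's real page object and `find_links`/`has_next`/`click_next` as hypothetical helpers:

```python
# The pagination loop the agent wrote as a single tool call, sketched
# with a mock. In the real session this was JavaScript clicking the
# browser's "Next" button via Playwright.
class MockPage:
    def __init__(self, pages):
        self._pages = pages   # list of pages, each a list of link texts
        self._index = 0

    def find_links(self):
        return list(self._pages[self._index])

    def has_next(self):
        return self._index + 1 < len(self._pages)

    def click_next(self):
        self._index += 1

def collect_all_links(page):
    """Walk every page in one go, collecting links as we click through."""
    links = page.find_links()
    while page.has_next():
        page.click_next()
        links.extend(page.find_links())
    return links

page = MockPage([["repo-a", "repo-b"], ["repo-c"], ["repo-d"]])
print(collect_all_links(page))  # → ['repo-a', 'repo-b', 'repo-c', 'repo-d']
```

Because the whole loop runs inside one tool call, there is no inference round-trip per page: the agent pays for inference once to write the loop, and the rest is just browser latency.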
00:07:20.800 | And I think this is kind of an interesting case because you can see how quickly it actually loads through
00:07:24.880 | all of those tabs.
00:07:25.760 | It's basically just latency bound at this point.
00:07:30.000 | And it's just, it was a single tool use, right?
00:07:31.840 | And now it has 313 repositories and will start collecting them into a YAML file.
00:07:37.600 | And in fact, the slowest part about the repository collection at the end is actually dumping it into
00:07:43.760 | the YAML file.
00:07:45.520 | And you can also see here that it does another Playwright call at the end
00:07:49.440 | to do a spot check on the first and the last 10 repositories.
00:07:55.280 | And then it figures out that it actually has some white space here.
00:07:58.000 | And now it uses inference to clean it up.
00:07:59.760 | But again, I think it's an interesting way of doing this.
00:08:04.960 | And with Playwright in particular, and because it can now dump out the Playwright script,
00:08:10.640 | you can then also use this to create integration tests for browser interactions,
00:08:15.200 | because you already have all these inference calls in the backlog of the context.
00:08:22.880 | And so you could dump out these scripts to do it in fewer turns.
00:08:30.240 | Is it any good?
00:08:32.480 | Well, I don't know for sure, but I think it's an experiment worth doing.
00:08:37.280 | And one of the ways in which we could start looking at this is to say that
00:08:41.280 | what coding agents should actually expose is a meta language that can be used
00:08:50.080 | to trigger MCP calls.
00:08:53.200 | So they could do something similar where you create a proxy that loads all the MCPs
00:08:57.440 | and then exposes a single tool,
00:08:59.200 | which is basically just a Python function that can call all the loaded MCP tools,
00:09:05.600 | with maybe some sort of guidance about which tools are available.
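Such a proxy could look roughly like the sketch below. The two tools, `github_search` and `read_file`, are made-up local stand-ins; in a real proxy they would be stubs forwarding to the loaded MCP servers.

```python
import io
import contextlib

# Pretend these came from two different MCP servers; in reality they
# would be RPC stubs generated by the proxy at load time.
def github_search(query):
    return ["repo-1", "repo-2"]

def read_file(path):
    return "contents of %s" % path

LOADED_TOOLS = {"github_search": github_search, "read_file": read_file}

def call_tools(code: str) -> str:
    """The proxy's single exposed tool: run Python with every loaded
    MCP tool in scope as a plain function, so calls can be chained."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, dict(LOADED_TOOLS))
    return buf.getvalue()

# The agent can now chain two "MCP calls" inside one tool invocation,
# with no inference round-trip between them:
out = call_tools("print(read_file(github_search('mcp')[0]))")
print(out.strip())  # → contents of repo-1
```

The point of the chaining is visible in the last call: the output of one tool feeds the input of the next inside a single tool invocation, which is exactly what plain MCP calls cannot do today.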
00:09:08.400 | But just because it is not in the training set,
00:09:12.320 | it is kind of tricky to do today, I think. Still, I encourage you to experiment with it.
00:09:16.640 | If you want to play with it, the two MCPs are on GitHub.
00:09:20.000 | And there's a companion blog post that you can read, which explains the concept in more detail.