
MCP is all you need — Samuel Colvin, Pydantic


Chapters

0:00 Introduction: Speaker Samuel Colvin introduces himself as the creator of Pydantic.
0:42 Pydantic Ecosystem: Introduction to Pydantic the company, the Pydantic AI agent framework, and the Logfire observability platform.
1:18 Talk Thesis: Explaining the title "MCP is all you need" and the main argument that MCP simplifies agent communication.
2:05 MCP's Focus: Clarifying that the talk focuses on MCP for autonomous agents and custom code, not its original desktop automation use case.
2:48 Tool Calling Primitive: Highlighting that "tool calling" is the most relevant MCP primitive for this context.
3:10 MCP vs. OpenAPI: Listing the advantages MCP has over a simple OpenAPI specification for tool calls.
3:21 Feature 1: Dynamic Tools: Tools can appear and disappear based on server state.
3:26 Feature 2: Streaming Logs: The ability to return log data to the user while a tool is still executing.
3:33 Feature 3: Sampling: A mechanism for a tool (server) to request an LLM call back through the agent (client).
4:01 MCP Architecture Diagram: Visualizing the basic agent-to-tool communication flow.
4:43 Complex Architecture: Discussing scenarios where tools are themselves agents that need LLM access.
5:24 Explaining Sampling: Detailing how sampling solves the problem of every agent needing its own LLM by allowing tools to "piggyback" on the client's LLM access.
6:42 Pydantic AI's Role in Sampling: How the Pydantic AI library supports sampling on both the client and server side.
7:10 Demo Start: Beginning the demonstration of a research agent that uses an MCP tool to query BigQuery.
8:23 Code Walkthrough: Validation: Showing how Pydantic is used for output validation and automatic retries (ModelRetry).
9:00 Code Walkthrough: Context Logging: Demonstrating the use of mcp_context.log to send progress updates back to the client.
10:51 MCP Server Setup: Showing the code for setting up an MCP server using FastMCP.
11:54 Design Pattern: Inference Inside the Tool: Explaining the benefit of having the tool perform its own LLM inference to reduce the context burden on the main agent.
12:27 Main Application Code: Reviewing the client-side code that defines the agent and registers the MCP tool.
13:16 Observability with Logfire: Switching to the Logfire UI to trace the execution of the agent's query.
14:09 Observing Sampling in Action: Pointing out the specific span in the trace that shows the tool making an LLM call back through the client via sampling.
14:48 Inspecting the SQL Query: Showing how the observability tool can be used to see the exact SQL query that was generated by the internal agent.
15:15 Conclusion: Final summary of the talk's points.

Whisper Transcript

00:00:00.000 | I'm talking about MCP is all you need. A bit about who I am before we get
00:00:19.440 | started. I'm best known as the creator of Pydantic, the data validation library for
00:00:25.000 | Python that is fairly ubiquitous, downloaded about 360 million times a
00:00:30.280 | month. So someone pointed out to me that's like 140 times a second. Pydantic is
00:00:35.920 | used in general Python development everywhere, but also in GenAI. So it's
00:00:40.120 | used in all of the SDKs and agent frameworks in Python basically. Pydantic
00:00:46.360 | became a company at the beginning of 2023, and we have built two things beyond Pydantic
00:00:52.260 | since then. Pydantic AI, an agent framework for Python built on the same
00:00:58.100 | principles as Pydantic, and Pydantic Logfire, an observability platform, which is
00:01:04.740 | the commercial part of what we do. I'm also a somewhat inactive co-maintainer of the
00:01:12.100 | MCP Python SDK. So, "MCP is all you need". It's obviously a play on Jason Liu's talks: "Pydantic is all you need", which he gave at AI Engineer, I think, nearly two years ago, and then the second one, "Pydantic is still all you need", maybe this time last year. And it has the same basic idea: that people are overcomplicating something that we can use a single tool for. And I guess also,
00:01:39.940 | similarly, the title is completely unrealistic. Of course, Pydantic is not all you need, and neither is MCP, for everything. But where I think we agree is that there are an awful lot of things that MCP can do, and that people sometimes overcomplicate the situation, trying to come up with new ways of doing agent-to-agent communication.
00:02:01.940 | I'm talking here specifically about autonomous agents, or code that you're writing. I'm not talking about the Claude Desktop, Cursor, Zed, Windsurf, etc. use case of coding agents; those are what MCP was originally primarily designed for. I don't know whether David Soria Parra would say that what we're doing, using MCP from Python, is that, you know,
00:02:29.940 | he definitely wouldn't say it was the primary use case for MCP. So two of the primitives of MCP, prompts and resources, probably don't come into this use case that much.
00:02:51.940 | They're very useful, or should be very useful, in the kind of Cursor-type use case, but they don't really apply to what we're talking about here.
00:02:58.940 | But tool calling, the third primitive, is extremely useful for what we're trying to do here.
00:03:05.940 | Tool calling is a lot more complicated than you might at first think.
00:03:10.940 | A lot of people say to me about MCP: but couldn't it just be OpenAPI? Why do we need this custom protocol for doing it?
00:03:19.940 | And there are a number of reasons. The idea of dynamic tools: tools that come and go during an agent execution, depending on the state of the server. Logging:
00:03:27.940 | being able to return data to the user while the tool is still executing. Sampling, which I'm going to talk about a lot today,
00:03:36.940 | perhaps the most confusingly named part of MCP, if not tech in general right now. And stuff like tracing and observability.
00:03:44.940 | And I would also add to that, actually, MCP's ability to operate as effectively a subprocess over standard in and standard out, which is extremely useful for lots of use cases.
00:03:55.940 | And OpenAPI wouldn't solve those problems.
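As a sketch of that last point: with the official MCP Python SDK, a client can spawn a server as a subprocess and talk to it over stdin/stdout. This is a minimal sketch, not the demo's code; the server script name my_server.py and the tool name my_tool are invented for illustration.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Hypothetical server script; any MCP server that speaks stdio will do.
server_params = StdioServerParameters(command='python', args=['my_server.py'])


async def main() -> None:
    # Spawn the server as a subprocess and connect over stdin/stdout.
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # Call a (hypothetical) tool by name with its arguments.
            result = await session.call_tool('my_tool', {'question': 'hello'})
            print(result.content)


asyncio.run(main())
```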
00:03:58.940 | So this is the kind of prototypical image that you will see from lots of people of what MCP is all about.
00:04:07.940 | The idea is we have some agent, we have any number of different tools that we can connect to that agent.
00:04:13.940 | And then the point is that like the agent doesn't need to be designed with those particular tools in mind.
00:04:17.940 | And those tools can be designed without knowing anything about the agent.
00:04:20.940 | And we can just compose the two together, in the same way that I can go and use a browser, and the web application of the website I'm going to doesn't need to know anything about the browser.
00:04:29.940 | I mean, I know we live in a kind of monoculture of browsers now, but at least the ideal originally was that we could have many different browsers all connecting over the same protocol.
00:04:36.940 | MCP is following the same idea, but it can get more complicated than this.
00:04:42.940 | We can have situations like this, where we have tools within our system which are themselves agents, are doing agentic things, and need access to an LLM.
00:04:52.940 | And they of course can then in turn connect to other tools over MCP, or connect to tools directly.
00:04:58.940 | This works nicely; this is elegant. But there's a problem: every single agent in our system needs access to an LLM.
00:05:06.940 | And so we need to go and configure that; we need to work out resources for that. And if we are using remote MCP servers, and that remote MCP server needs to use an LLM, well, now it's worried about what the cost of doing that is going to be.
00:05:22.940 | What if the remote agent that's operating as a tool could effectively piggyback off the model that the original agent has access to?
00:05:34.940 | That's what sampling gives us. So as I say, I think sampling is somewhat confusingly named.
00:05:43.940 | Sampling is the idea of a way, within MCP the protocol, for the server to effectively make a request back through the client to the LLM.
00:06:00.940 | So in this case, the client makes a request, starts some sort of agentic query, and makes a call to the LLM.
00:06:06.940 | The LLM comes back and says: I want to call that particular tool, which is an MCP server. The client takes care of making that call to the MCP server.
00:06:14.940 | The MCP server now says: hey, I actually need to be able to use an LLM to answer whatever this question is.
00:06:20.940 | So that then gets sent back to the client. The client proxies that request to the LLM, receives the response from the LLM, and sends that on to the MCP server.
00:06:30.940 | And the MCP server then returns, and we can continue on our way. Sampling is very powerful, but not that widely supported at the moment.
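To make that flow concrete, here is a minimal sketch of the server side of sampling with the MCP Python SDK: inside a tool, the server sends a create_message request back through the session, and the client is responsible for proxying it to an actual LLM. The tool name and prompt are invented for illustration.

```python
from mcp.server.fastmcp import Context, FastMCP
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP('research-helper')


@mcp.tool()
async def summarize(text: str, ctx: Context) -> str:
    """Summarize text, piggybacking on the client's LLM via sampling."""
    # This request travels server -> client -> LLM and back again.
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role='user',
                content=TextContent(type='text', text=f'Summarize this:\n\n{text}'),
            )
        ],
        max_tokens=500,
    )
    return result.content.text if isinstance(result.content, TextContent) else ''


if __name__ == '__main__':
    mcp.run()  # stdio by default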
00:06:41.940 | I'm going to demo it today with Pydantic AI, where we have support for sampling. Well, I'll be honest, it's a PR right now, but it will be merged soon.
00:06:50.940 | We have support for sampling both as the client, so knowing how to proxy those LLM calls, and as the server, basically being able to use the MCP client as the LLM.
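On the client side, Pydantic AI wires this proxying up for you; with the low-level MCP SDK, the equivalent moving part is a sampling_callback passed to ClientSession, which receives the server's request and must answer it with an LLM of the client's choosing. A minimal sketch, with call_my_llm standing in for a real model call:

```python
from mcp import ClientSession, StdioServerParameters, types
from mcp.client.stdio import stdio_client
from mcp.shared.context import RequestContext

server_params = StdioServerParameters(command='python', args=['my_server.py'])


async def call_my_llm(messages: list[types.SamplingMessage]) -> str:
    """Hypothetical helper: forward the messages to whatever LLM this client uses."""
    raise NotImplementedError


async def sampling_callback(
    context: RequestContext, params: types.CreateMessageRequestParams
) -> types.CreateMessageResult:
    # The server asked us to run an LLM call on its behalf; proxy it.
    answer = await call_my_llm(params.messages)
    return types.CreateMessageResult(
        role='assistant',
        content=types.TextContent(type='text', text=answer),
        model='my-model',
        stopReason='endTurn',
    )


async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write, sampling_callback=sampling_callback) as session:
            await session.initialize()
            await session.list_tools()  # list and call tools as usual; sampling is now handled
```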
00:07:03.940 | So this example is obviously, like all examples, trivialized, or simplified to fit on screen.
00:07:12.940 | The idea is that we're building a research agent, which is going to go and research open source packages or libraries for us.
00:07:19.940 | And we have implemented one of the many tools that you would in fact need for this.
00:07:24.940 | And that tool is... I will switch now to code and show you the one tool that we have. I'm in completely the wrong file.
00:07:38.940 | Here we are. So this tool is querying BigQuery, the BigQuery public dataset for PyPI, to get numbers about the downloads of a particular package.
00:07:52.940 | So this is pretty standard Pydantic AI code.
00:07:57.940 | We hook up Logfire, which I'll show you in a moment.
00:07:59.940 | We have the dependencies that the agent has access to while it's running.
00:08:05.940 | We've said we can do some retries,
00:08:07.940 | so if the LLM returns the wrong data, we can send a retry.
00:08:11.940 | A big system prompt, where we give it basically the schema of the table, tell it what to do, give it a few examples, yadda yadda.
00:08:19.940 | But then we get to what is probably the powerful bit.
00:08:22.940 | So as an output validator, we are going to go and, first of all, strip out markdown fences from the SQL,
00:08:30.940 | if they're there. Then we will check that the table name it's querying against is right, and tell it if it shouldn't be querying that table.
00:08:38.940 | And then we're going to go and run the query.
00:08:41.940 | And critically, if the query fails, we're going to raise ModelRetry within Pydantic AI to go and retry making the request to the LLM again,
00:08:57.940 | asking the LLM to attempt this again.
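The shape of that validator, reconstructed as a hedged sketch rather than the demo's actual code: run_bigquery is a hypothetical stand-in for a real BigQuery call, the deps are simplified, and on older Pydantic AI versions the decorator was result_validator rather than output_validator. The table name is BigQuery's real public PyPI downloads table.

```python
from dataclasses import dataclass

from mcp.server.fastmcp import Context
from pydantic_ai import Agent, ModelRetry, RunContext


@dataclass
class Deps:
    mcp_context: Context  # the MCP request context, so we can log back to the client


sql_agent = Agent('openai:gpt-4o', deps_type=Deps, retries=3)


def run_bigquery(sql: str) -> list[dict[str, object]]:
    """Hypothetical stand-in for actually executing the query against BigQuery."""
    raise NotImplementedError


@sql_agent.output_validator
async def validate_sql(ctx: RunContext[Deps], output: str) -> str:
    # Strip markdown fences if the model wrapped the SQL in them.
    sql = output.strip().removeprefix('```sql').removesuffix('```').strip()
    if 'bigquery-public-data.pypi.file_downloads' not in sql:
        raise ModelRetry('Query the bigquery-public-data.pypi.file_downloads table.')
    await ctx.deps.mcp_context.info('running query...')  # streamed to the MCP client
    try:
        rows = run_bigquery(sql)
    except Exception as exc:
        # Feed the failure back to the LLM and ask it to fix the query.
        raise ModelRetry(f'Query failed: {exc!r}. Please correct the SQL.') from exc
    return str(rows)  # the real demo formats rows as XML, sketched further down
```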
00:09:01.940 | And the other thing we're doing throughout this, you'll see here, is we have this ctx.deps.mcp_context.log.
00:09:07.940 | So you'll see here, when we define the deps type, we said that it was going to be an instance of this MCP Context, which is what we get when the MCP server is called.
00:09:17.940 | So what we're doing here is providing a type-safe way to access that context, in this case within the agent's output validator, but it could be in a tool call if you wanted it to be.
00:09:30.940 | And so we can see here that we know, from the type hint, that the type is the MCP Context.
00:09:39.940 | And so we have this log function, we know its signature, and we can go and make this log call.
00:09:43.940 | The point is this is going to be returned to the client, and ultimately to the user watching, before the whole thing is completed.
00:09:51.940 | So you can get kind of progress updates as we go.
00:09:53.940 | MCP also has a concept of progress, which I'm not using here, but you can imagine that also being valuable.
00:09:59.940 | If you knew how far through the query you were, you could show an update in progress.
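As a concrete sketch of those two mechanisms, here is a hypothetical FastMCP tool that streams log lines and progress updates to the client while it is still running; the tool and its contents are invented for illustration.

```python
import asyncio

from mcp.server.fastmcp import Context, FastMCP

mcp = FastMCP('progress-demo')


@mcp.tool()
async def slow_research(topics: list[str], ctx: Context) -> str:
    """Pretend to research each topic, streaming logs and progress as we go."""
    for i, topic in enumerate(topics):
        await ctx.info(f'researching {topic}...')      # log line, visible before the tool returns
        await ctx.report_progress(i + 1, len(topics))  # optional progress update
        await asyncio.sleep(1)                         # stand-in for real work
    return f'researched {len(topics)} topics'


if __name__ == '__main__':
    mcp.run()
```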
00:10:03.940 | So I think the original principle of logging like this is that you have the Cursor-style agent running,
00:10:10.940 | and we want to be able to give updates to the user:
00:10:13.940 | "don't worry,
00:10:14.940 | I'm still going", before it's finished, showing exactly what's happening.
00:10:17.940 | But you could also imagine this being useful if you were using MCP
00:10:20.940 | where this research agent was running as a web application, and you wanted to show the user what was going on.
00:10:26.940 | This deep research might take, you know, minutes to run.
00:10:29.940 | We can give these logs while the tool call is still executing.
00:10:32.940 | And then we're just going to take the output, turn it into a list of dicts, and then format it as XML.
00:10:39.940 | Models are very good at basically reviewing XML data,
00:10:45.940 | so we basically return whatever the query results are as that kind of XML-ish data, which the LLM will then be good at interpreting.
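That formatting step might look something like this hand-rolled sketch; recent Pydantic AI versions also include a format_as_xml helper for this kind of thing.

```python
def rows_to_xml(rows: list[dict[str, object]]) -> str:
    """Render query rows as simple XML-ish text that models parse reliably."""
    lines = ['<results>']
    for row in rows:
        lines.append('  <row>')
        for key, value in row.items():
            lines.append(f'    <{key}>{value}</{key}>')
        lines.append('  </row>')
    lines.append('</results>')
    return '\n'.join(lines)


print(rows_to_xml([{'project': 'pydantic', 'downloads': 1_600_000_000}]))
```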
00:10:52.940 | Now we get to the MCP bit.
00:10:55.940 | So in this code, we are setting up an MCP server using FastMCP.
00:11:00.940 | There are two versions of FastMCP right now,
00:11:02.940 | confusingly.
00:11:03.940 | This is the one from inside the MCP SDK.
00:11:07.940 | Now, the docstring for our function matters.
00:11:11.940 | We're registering one tool here, pypi_downloads,
00:11:14.940 | and the docstring from that function will end up becoming the description on the tool that is ultimately fed to the LLM that chooses to go and call it.
00:11:22.940 | And we're going to pass in the user's question.
00:11:26.940 | I think one of the important things to say here is that, of course, you could set this up to generate the SQL within your central agent.
00:11:37.940 | You could include all of the description of the SQL, the instructions, within the description of the tool.
00:11:47.940 | Models don't seem to like that much data inside a tool description, but more to the point, we would just blow up the context window of our main agent.
00:11:54.940 | If we're going to ship all of this context on how to make these queries into our main agent, that's just all overhead in all of our calls to that agent, regardless of whether we're going to call this particular tool.
00:12:04.940 | So doing this kind of thing, where we're doing the inference inside a tool, is a powerful way of effectively limiting the context window of the main running agent.
00:12:13.940 | And then we're just going to return this output, which will be a string, the value returned from here.
00:12:19.940 | And we'll just run the MCP server, and by default the MCP server will run over standard IO.
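Putting that together, the server's shape is roughly this sketch: sql_agent and Deps are the hypothetical names from the validator sketch above, pypi_downloads is a reconstruction of the tool name, and on older Pydantic AI versions the result attribute is .data rather than .output.

```python
from mcp.server.fastmcp import Context, FastMCP

# sql_agent and Deps come from the validator sketch above.

mcp = FastMCP('pypi-downloads')


@mcp.tool()
async def pypi_downloads(question: str, ctx: Context) -> str:
    """Answer a natural-language question about PyPI package download counts.

    This docstring becomes the tool description that the calling LLM sees.
    """
    # Run the inner SQL-writing agent, handing it the MCP context so it can
    # log back to the client while it works.
    result = await sql_agent.run(question, deps=Deps(mcp_context=ctx))
    return result.output


if __name__ == '__main__':
    mcp.run()  # defaults to the stdio transport
```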
00:12:25.940 | And then we come to our main application.
00:12:28.940 | So here we have the definition of our agent, and you see we've defined one MCP server.
00:12:34.940 | That's just going to run the script I just showed you, the PyPI MCP server.
00:12:40.940 | And so then this agent will act as the client, and has that registered as the tool it's able to call.
00:12:46.940 | We're also going to give it the current date,
00:12:49.940 | so it doesn't assume it's 2023, as models often do.
00:12:54.940 | And now we can go and ultimately run our main agent, and ask it, for example, how many downloads Pydantic has had this year.
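The client side then looks roughly like this sketch: the script name is invented, and depending on your Pydantic AI version the keyword may be toolsets rather than mcp_servers, and the result attribute .data rather than .output.

```python
import asyncio
from datetime import date

from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerStdio

# Run the (hypothetical) server script from above as a stdio subprocess.
server = MCPServerStdio('python', args=['pypi_mcp_server.py'])

agent = Agent(
    'openai:gpt-4o',
    mcp_servers=[server],
    instructions=f'You research open source packages. Today is {date.today():%Y-%m-%d}.',
)


async def main() -> None:
    async with agent.run_mcp_servers():  # starts the subprocess for the duration
        result = await agent.run('How many downloads has pydantic had this year?')
        print(result.output)


asyncio.run(main())
```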
00:13:03.940 | And I'm going to be brave and run it and see what happens.
00:13:06.940 | And it has succeeded.
00:13:07.940 | And it has gone and told us that we had, whatever, 1.6 billion downloads this year.
00:13:13.940 | But probably more interesting is to come and look at what that looks like in Logfire.
00:13:16.940 | So if we look at... is it going to come through to Logfire, or are we having a failure here as well?
00:13:21.940 | This, I will admit, is the run from just before I came on stage.
00:13:25.940 | But it would look exactly the same.
00:13:26.940 | So I'm not going to talk too much about observability, and how MCP observability or tracing works within MCP,
00:13:36.940 | because I know there's a talk coming up directly after me about that.
00:13:38.940 | So think of this as a kind of spoiler for what's going to come up.
00:13:42.940 | But you can see, we run our outer agent.
00:13:45.940 | It calls GPT-4o, which decides, sure enough: I'm going to go and call this tool.
00:13:54.940 | It doesn't need to think about generating the SQL.
00:13:56.940 | It can just have a natural language description of the query that we're trying to make.
00:14:00.940 | We then... this is the MCP client.
00:14:03.940 | As you can see here, the MCP client then calls into the MCP server,
00:14:07.940 | which then in turn runs a different Pydantic AI agent, which then makes a call to an LLM, and that happens by proxying it through the client.
00:14:19.940 | So that's where you can see it going client, server, client, server.
00:14:25.940 | Ultimately, if you look at the top-level exchange with the model, you'll see it here.
00:14:30.940 | Yeah, the ultimate output, the response returned from running the query, was this kind of XML data.
00:14:39.940 | And then the LLM was able to turn that into a human description of what was going on.
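For reference, hooking an application up to Logfire to get traces like these amounts to a couple of lines. A hedged sketch: instrument_pydantic_ai exists in recent Logfire SDKs, while instrument_mcp is an assumption about what your SDK version provides.

```python
import logfire

logfire.configure()               # requires a Logfire project/token
logfire.instrument_pydantic_ai()  # spans for agent runs, model calls and tool calls
logfire.instrument_mcp()          # assumption: MCP client/server spans, in recent SDK versions
```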
00:14:44.940 | So, I think the other interesting thing, probably, is we can go and look in;
00:14:47.940 | we should be able to see the actual SQL that was run.
00:14:50.940 | So this is the agent call inside the MCP server.
00:14:54.940 | And you can see here the SQL it wrote.
00:14:56.940 | And you can confirm that it indeed looks correct.
00:14:59.940 | I am going to go on from there and say: thank you very much.
00:15:07.940 | We are at the booth, the Pydantic booth.
00:15:10.940 | So if anyone has any questions on this, or wants to see this fail in numerous other exciting ways,
00:15:14.940 | I'm very happy to talk to you.
00:15:15.940 | Yeah.
00:15:16.940 | Come and say hi.